Techniques for accessing and selecting relevant information from LLM memory stores
Core Idea: Memory retrieval methods enable LLMs to efficiently search through stored information and extract the most contextually relevant content for the current interaction.
Key Elements
Retrieval Approaches:
- Recency-Based: Prioritizing recent interactions (most basic)
- Keyword Matching: Finding memories containing specific terms
- Semantic Search: Retrieving conceptually similar content via embeddings
- Hybrid Retrieval: Combining multiple retrieval methods
- Multi-hop Retrieval: Finding connected information across multiple memory entries
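To make the hybrid idea concrete, here is a minimal sketch that blends two of the signals above, keyword matching and recency, into one score. The weights and half-life are illustrative assumptions, not values from any particular system:

```python
import math

def keyword_score(query, memory_text):
    # Keyword matching: fraction of query terms that appear in the memory
    query_terms = set(query.lower().split())
    memory_terms = set(memory_text.lower().split())
    return len(query_terms & memory_terms) / len(query_terms) if query_terms else 0.0

def recency_score(age_seconds, half_life=3600.0):
    # Recency-based signal: exponential decay, halving every `half_life` seconds
    return 0.5 ** (age_seconds / half_life)

def hybrid_score(query, memory_text, age_seconds, w_keyword=0.6, w_recency=0.4):
    # Hybrid retrieval: a weighted sum of the individual retrieval signals
    return w_keyword * keyword_score(query, memory_text) + w_recency * recency_score(age_seconds)
```

In practice the keyword term would be replaced by BM25 or embedding similarity, but the pattern of combining normalized scores with tunable weights is the same.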
Retrieval Mechanics:
- Vector Similarity: Using cosine similarity between embeddings
- Relevance Scoring: Ranking memories by computed relevance
- Top-k Selection: Choosing the most relevant subset of memories
- Contextual Filtering: Narrowing retrieval based on topic or domain
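The four mechanics above compose naturally into one retrieval function. The sketch below is self-contained and illustrative (the memory schema with `embedding` and `topic` fields is an assumption for the example):

```python
import heapq
import math

def cosine_similarity(a, b):
    # Vector similarity: cosine of the angle between two embeddings
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def top_k_memories(query_vec, memories, k=3, topic=None):
    # Contextual filtering: keep only memories tagged with the requested topic
    candidates = [m for m in memories if topic is None or m.get("topic") == topic]
    # Relevance scoring: score each candidate against the query embedding
    scored = [(cosine_similarity(query_vec, m["embedding"]), m) for m in candidates]
    # Top-k selection: return the k highest-scoring memories
    return [m for _, m in heapq.nlargest(k, scored, key=lambda s: s[0])]
```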
Optimization Techniques:
- Embedding Compression: Reducing vector dimensionality
- Approximate Nearest Neighbors: Fast search in high-dimensional spaces
- Caching: Storing frequent retrievals for quick access
- Query Reformulation: Improving retrieval with better queries
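Of these optimizations, caching is the easiest to demonstrate in isolation. This sketch uses Python's standard `functools.lru_cache`; the counter is only there to make the cache's effect observable, and the retrieval body is a placeholder for a real embed-and-search call:

```python
from functools import lru_cache

CALLS = {"count": 0}  # instrumentation only, to show when the cache is bypassed

@lru_cache(maxsize=256)
def retrieve(query):
    # In a real system this would embed the query and search the vector store;
    # repeated identical queries are served from the cache instead.
    CALLS["count"] += 1
    return f"results for {query!r}"
```

Note that `lru_cache` only helps for exact-match queries; paraphrased queries miss, which is one motivation for query reformulation or semantic caching.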
Implementation Example
```python
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# Set up a vector store for memory
embeddings = OpenAIEmbeddings()
memory_vectorstore = Chroma(embedding_function=embeddings)

# Store memories with embeddings (`memories` is assumed to be an iterable
# of objects with `content` and `timestamp` attributes)
for memory in memories:
    memory_vectorstore.add_texts(
        [memory.content],
        metadatas=[{"timestamp": memory.timestamp}],
    )

# Retrieve relevant memories for the current query
def retrieve_relevant_memories(query, k=5, fetch_k=20):
    # Fetch a larger candidate pool, then rerank it down to the top k
    candidates = memory_vectorstore.similarity_search_with_score(query, k=fetch_k)
    # Rerank results (optional advanced step; rerank_by_relevance defined elsewhere)
    reranked_results = rerank_by_relevance(candidates, query)
    return reranked_results[:k]
```
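The example above leaves `rerank_by_relevance` undefined. One lightweight possibility, sketched here over plain `(text, distance)` pairs rather than LangChain documents, is a lexical booster standing in for a cross-encoder reranker (the 0.1 boost weight is an illustrative assumption):

```python
def rerank_by_relevance(results, query, boost=0.1):
    # results: (text, distance) pairs; smaller distance = more similar.
    # Boost candidates that share exact terms with the query, then sort by
    # the combined score, highest first.
    query_terms = set(query.lower().split())
    def score(item):
        text, distance = item
        overlap = len(query_terms & set(text.lower().split()))
        return -distance + boost * overlap
    return sorted(results, key=score, reverse=True)
```

A production system would typically swap the term-overlap heuristic for a learned reranker, but the interface (candidates in, reordered candidates out) stays the same.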
Applications
- Contextual Question Answering: Finding relevant past information to answer queries
- Personalized Responses: Retrieving user preferences and history
- Complex Problem Solving: Connecting related pieces of information from different times
- Knowledge Synthesis: Combining information across multiple memory entries
Connections
- Related Concepts: LLM Memory Systems (basic memory infrastructure), Agentic Memory Organization (self-organizing memory)
- Broader Context: Vector Search Techniques (technology powering semantic retrieval), Retrieval-Augmented Generation (broader pattern)
- Applications: Conversational Search (enhanced by memory retrieval)
- Components: Vector Databases (technology for storing and retrieving embeddings)
References
- Lewis, P., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks"
- Guu, K., et al. (2020). "REALM: Retrieval-Augmented Language Model Pre-Training"
- Maharana, A., et al. (2024). "Evaluating Very Long-Term Conversational Memory of LLM Agents"
#memory-retrieval #vector-search #semantic-similarity #information-retrieval #llm-agents