
An AI agent without memory is a very expensive stateless function. Every call starts from scratch — no context about who the user is, what was discussed previously, or what the agent has already tried. For transactional use cases, this is fine. For anything requiring multi-turn reasoning, personalization, or learning from past interactions, memory is not optional.
Before choosing a memory type, clarify what you need memory for: continuity within a single conversation, personalization across sessions, structured tracking of entities and facts, or long-term knowledge retrieval.
Different requirements call for different memory architectures. Conflating them leads to systems that are bloated, slow, or hallucinate recalled facts.
Buffer memory is the simplest form: the entire conversation history is stored and passed as context to the LLM on every call.
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4-turbo")
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)
chain.predict(input="My name is Alex")
chain.predict(input="What's my name?")  # Agent recalls 'Alex'
Strengths: Zero configuration, perfect recall, no information loss. Weaknesses: Context window fills fast — a 30-turn conversation in GPT-4 Turbo can cost $0.30+ per call. Use when: Short-lived conversations (5-10 turns) where perfect recall matters more than cost.
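To see why costs climb, here is a back-of-the-envelope model; the tokens-per-exchange figure and the $0.01 per 1K input tokens price are illustrative assumptions, not current pricing:
# Rough cost model for buffer memory (all numbers are illustrative assumptions)
TOKENS_PER_EXCHANGE = 1000  # assumed size of one user turn plus one agent reply
PRICE_PER_1K_INPUT = 0.01   # assumed input price in $ per 1K tokens
for turn in (5, 15, 30):
    context_tokens = turn * TOKENS_PER_EXCHANGE  # full history is resent on each call
    cost = context_tokens / 1000 * PRICE_PER_1K_INPUT
    print(f"turn {turn}: ~{context_tokens} context tokens, ~${cost:.2f} per call")
At turn 30 this comes to roughly 30,000 context tokens per call, which is where the $0.30+ figure above comes from.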
Summary memory periodically compresses conversation history into a summary, replacing the full history with a condensed version that fits in fewer tokens.
from langchain.memory import ConversationSummaryMemory
memory = ConversationSummaryMemory(llm=llm)
# Each new exchange is folded into a running summary instead of raw history
Strengths: Handles long conversations without exploding context costs. Scales to hundreds of turns. Weaknesses: Lossy — details dropped in summarization may become relevant later. Adds summarization latency and cost. Use when: Long multi-turn conversations, customer support agents, unbounded conversation length.
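A short usage sketch, with made-up exchanges and the llm defined earlier:
memory.save_context({"input": "I'm planning a trip to Japan in April"},
                    {"output": "Great timing, that's cherry blossom season."})
memory.save_context({"input": "I want to visit Kyoto and Osaka"},
                    {"output": "Both are easy to reach by train."})
# The history variable now holds a condensed summary, not raw transcripts
print(memory.load_memory_variables({})["history"])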
A hybrid approach: keep the last N interactions verbatim for recent accuracy, and summarize everything older than that threshold. This is the pragmatic default for most conversational agents.
from langchain.memory import ConversationSummaryBufferMemory
memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=500,  # recent messages kept verbatim until 500 tokens, then summarized
)
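A sketch of the hybrid behavior; the turn content is invented, and moving_summary_buffer is the attribute that holds the condensed portion:
# Push enough turns to exceed the 500-token limit
for i in range(20):
    memory.save_context({"input": f"Here is detail number {i} about my project"},
                        {"output": "Understood, noted."})
print(memory.moving_summary_buffer)      # older turns, condensed into a summary
print(memory.load_memory_variables({}))  # summary plus recent verbatim messages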
Vector store memory stores conversations or facts as embeddings in a vector database. When the agent needs context, it retrieves the most semantically relevant memories — not just the most recent ones.
from langchain.memory import VectorStoreRetrieverMemory
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
memory = VectorStoreRetrieverMemory(retriever=retriever)
Strengths: Scales to unlimited memory. Retrieves relevant context regardless of when it occurred. Weaknesses: May miss critical non-semantic context. Adds embedding and retrieval latency. Requires a vector database. Use when: Long-term user personalization, knowledge bases, large interaction histories.
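A usage sketch showing retrieval by relevance rather than recency; the stored facts are invented for the example:
memory.save_context({"input": "My favorite color is blue"}, {"output": "Noted."})
memory.save_context({"input": "I work as a data engineer"}, {"output": "Interesting!"})
# Any number of unrelated turns could happen in between; retrieval is semantic
print(memory.load_memory_variables({"input": "What do I do for a living?"}))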
In LangGraph (the recommended agent runtime for LangChain in 2025), memory is part of the graph state, not a separate memory object. State persists across invocations using checkpointers.
from typing import TypedDict
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END
class AgentState(TypedDict):
    messages: list  # conversation state carried across invocations
def agent_node(state: AgentState) -> dict:
    return {"messages": state["messages"]}  # call the LLM here in a real agent
checkpointer = MemorySaver()
graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_edge(START, "agent")
graph.add_edge("agent", END)
app = graph.compile(checkpointer=checkpointer)
# Pass a thread_id so the checkpointer scopes state to one conversation
config = {"configurable": {"thread_id": user_id}}
app.invoke({"messages": ["Hi"]}, config=config)
For production, replace MemorySaver with PostgresSaver or RedisSaver to persist state across server restarts.
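A minimal sketch of the swap, assuming the langgraph-checkpoint-postgres package is installed; the connection string is a placeholder:
from langgraph.checkpoint.postgres import PostgresSaver
DB_URI = "postgresql://user:pass@localhost:5432/agent_memory"  # placeholder
with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first use
    app = graph.compile(checkpointer=checkpointer)
    app.invoke({"messages": ["Hi"]}, config=config)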
| Use Case | Recommended Memory | Why |
|---|---|---|
| Short chat (5-10 turns) | Buffer | Perfect recall, simple |
| Long chat (50+ turns) | Summary Buffer | Accuracy + cost balance |
| User personalization | Vector Store | Semantic retrieval across sessions |
| Entity tracking | Entity Memory | Structured fact maintenance |
| Complex agent workflows | LangGraph State + Checkpointer | State is first-class, production-ready |
| Multi-session agents | Vector + PostgresSaver | Retrieval + persistence |
Related: LangChain Memory Optimization for AI Workflows
What is the difference between buffer memory and summary memory?
Buffer memory stores complete conversation history verbatim and passes it to the LLM on every call — perfect recall but context window fills quickly. Summary memory compresses older history into a summary — handles unlimited conversation length but may lose specific details in the summarization process.
How does vector store memory retrieve relevant context?
When the agent receives a new input, it's embedded using an embedding model, and the vector store is queried for the k nearest embeddings. The retrieved memories are included as context in the LLM prompt alongside the current input — surfacing relevant facts regardless of when they occurred.
Can LangGraph memory persist across server restarts?
Yes. Replace the default MemorySaver (in-memory, lost on restart) with PostgresSaver or RedisSaver. State is serialized and stored in the external database, surviving server restarts and horizontal scaling.
How do I isolate memory between concurrent users?
Always use a unique thread_id or session_id when invoking the agent. In LangGraph, the thread_id in the config dict scopes the checkpointer to a specific conversation. Never share a single memory instance across concurrent requests.
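For example, two users invoking the same compiled graph stay isolated as long as their thread_id values differ; the IDs here are placeholders:
config_alex = {"configurable": {"thread_id": "user-alex"}}
config_sam = {"configurable": {"thread_id": "user-sam"}}
app.invoke({"messages": ["My name is Alex"]}, config=config_alex)
app.invoke({"messages": ["My name is Sam"]}, config=config_sam)
# Each thread_id resumes from its own checkpoint, so histories never mix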