Agent Memory in LangChain: Short-Term, Long-Term, and Episodic

Feb 21, 2026
9 min read

An AI agent without memory is a very expensive stateless function. Every call starts from scratch — no context about who the user is, what was discussed previously, or what the agent has already tried. For transactional use cases, this is fine. For anything requiring multi-turn reasoning, personalization, or learning from past interactions, memory is not optional.

Why Memory Architecture Matters

Before choosing a memory type, clarify what you need memory for:

  • Conversation continuity: The agent should remember what was said earlier in this session.
  • User personalization: The agent should remember facts about this user across sessions.
  • Task state: The agent should remember what it's already tried, what worked, and what failed.
  • Knowledge accumulation: The agent should store and retrieve information from external sources or past runs.

Different requirements call for different memory architectures. Conflating them leads to systems that are bloated or slow, or that hallucinate recalled facts.

Buffer Memory (ConversationBufferMemory)

Buffer memory is the simplest form: the entire conversation history is stored and passed as context to the LLM on every call.

from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # any chat model works here
memory = ConversationBufferMemory()
chain = ConversationChain(llm=llm, memory=memory)

chain.predict(input="My name is Alex")
chain.predict(input="What's my name?")  # Agent recalls 'Alex'

  • Strengths: Zero configuration, perfect recall, no information loss.
  • Weaknesses: Context window fills fast — a 30-turn conversation in GPT-4 Turbo can cost $0.30+ per call.
  • Use when: Short-lived conversations (5-10 turns) where perfect recall matters more than cost.

Summary Memory (ConversationSummaryMemory)

Summary memory periodically compresses conversation history into a summary, replacing the full history with a condensed version that fits in fewer tokens.

from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm)
# Each new exchange is folded into a running summary instead of being kept verbatim

  • Strengths: Handles long conversations without exploding context costs. Scales to hundreds of turns.
  • Weaknesses: Lossy — details dropped in summarization may become relevant later. Adds summarization latency and cost.
  • Use when: Long multi-turn conversations, customer support agents, unbounded conversation length.
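
For a sense of what the compression looks like, the memory object can also be driven directly; a minimal sketch, assuming llm is the chat model from the earlier example (the exact summary wording depends on the model):

memory.save_context(
    {"input": "My name is Alex and I run a small bakery."},
    {"output": "Nice to meet you, Alex!"},
)
memory.save_context(
    {"input": "I want to automate my order tracking."},
    {"output": "Happy to help with that."},
)
# Returns the condensed running summary, not the raw transcript
print(memory.load_memory_variables({})["history"])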

Summary Buffer Memory (Hybrid)

A hybrid approach: keep the most recent messages verbatim (up to a token threshold) for recent accuracy, and summarize everything older than that. This is the pragmatic default for most conversational agents.

from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=500  # keep up to ~500 tokens of recent messages verbatim; older turns get summarized
)
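
Wired into a chain, the recent window stays verbatim while older turns collapse into a running summary; a hedged sketch reusing llm from earlier (moving_summary_buffer is the attribute name in the legacy langchain.memory API):

from langchain.chains import ConversationChain

chain = ConversationChain(llm=llm, memory=memory)
chain.predict(input="My name is Alex")
# ...after enough turns to exceed max_token_limit...
print(memory.moving_summary_buffer)  # condensed older history
print(memory.chat_memory.messages)   # recent messages kept verbatim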

Vector Store Memory

Vector store memory stores conversations or facts as embeddings in a vector database. When the agent needs context, it retrieves the most semantically relevant memories — not just the most recent ones.

from langchain.memory import VectorStoreRetrieverMemory
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={'k': 3})  # fetch the 3 most relevant memories per query
memory = VectorStoreRetrieverMemory(retriever=retriever)
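
A short usage sketch, continuing with the memory object above (inputs and outputs are illustrative):

memory.save_context({"input": "My favorite color is blue"}, {"output": "Noted."})
memory.save_context({"input": "I work as a data engineer"}, {"output": "Got it."})

# Retrieval is driven by semantic similarity to the new input, not recency:
# the color fact surfaces even if it was stored hundreds of turns ago
memory.load_memory_variables({"prompt": "What color should the dashboard be?"})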

  • Strengths: Scales to unlimited memory. Retrieves relevant context regardless of when it occurred.
  • Weaknesses: May miss critical non-semantic context. Adds embedding and retrieval latency. Requires a vector database.
  • Use when: Long-term user personalization, knowledge bases, large interaction histories.

Memory in LangGraph Agents

In LangGraph (the recommended agent runtime for LangChain in 2025), memory is part of the graph state, not a separate memory object. State persists across invocations using checkpointers.

from typing import Annotated, TypedDict
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]  # conversation history lives in graph state

def agent_node(state: AgentState):
    return {"messages": [llm.invoke(state["messages"])]}  # llm as in the earlier examples

checkpointer = MemorySaver()
graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_edge(START, "agent")
app = graph.compile(checkpointer=checkpointer)

# Pass thread_id to maintain conversation continuity
config = {"configurable": {"thread_id": user_id}}
app.invoke({"messages": [("user", "Hi, I'm Alex")]}, config=config)

For production, replace MemorySaver with PostgresSaver or RedisSaver to persist state across server restarts.
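
For example, with the langgraph-checkpoint-postgres package installed, the compile step above can be pointed at Postgres instead (the connection string is a placeholder):

from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:pass@localhost:5432/agent_memory"  # placeholder

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    app = graph.compile(checkpointer=checkpointer)
    app.invoke({"messages": [("user", "Hi again")]}, config=config)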

Memory Architecture Decision Table

Use Case                | Recommended Memory             | Why
Short chat (5-10 turns) | Buffer                         | Perfect recall, simple
Long chat (50+ turns)   | Summary Buffer                 | Accuracy + cost balance
User personalization    | Vector Store                   | Semantic retrieval across sessions
Entity tracking         | Entity Memory                  | Structured fact maintenance
Complex agent workflows | LangGraph State + Checkpointer | State is first-class, production-ready
Multi-session agents    | Vector + PostgresSaver         | Retrieval + persistence

Production Considerations

  • Memory isolation: Always scope memory by user/session ID. Sharing a memory object across users leaks context — a serious data privacy issue. A minimal per-session sketch follows this list.
  • Memory size limits: Set explicit token or turn limits. Unbounded memory growth causes latency creep and cost overruns.
  • Memory TTL: Implement time-to-live for cached memories. User preferences from 18 months ago may no longer be valid.
  • Testing memory: Agents that work in single-turn testing often fail in multi-turn production because memory state wasn't accounted for in tests.
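
A minimal sketch of the isolation point, for the legacy memory classes (names are illustrative; in LangGraph the thread_id config plays the same role):

from langchain.memory import ConversationBufferMemory

_session_memories: dict[str, ConversationBufferMemory] = {}

def get_memory(session_id: str) -> ConversationBufferMemory:
    # Each session ID gets its own memory object, so one user's history
    # can never be injected into another user's prompt
    if session_id not in _session_memories:
        _session_memories[session_id] = ConversationBufferMemory()
    return _session_memories[session_id]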

Related: LangChain Memory Optimization for AI Workflows

FAQs

What is the difference between ConversationBufferMemory and ConversationSummaryMemory?

Buffer memory stores complete conversation history verbatim and passes it to the LLM on every call — perfect recall but context window fills quickly. Summary memory compresses older history into a summary — handles unlimited conversation length but may lose specific details in the summarization process.

How does vector store memory retrieve relevant memories?

When the agent receives a new input, it's embedded using an embedding model, and the vector store is queried for the K nearest embeddings. The retrieved memories are included as context in the LLM prompt alongside the current input — surfacing relevant facts regardless of when they occurred.

Can LangGraph agents maintain memory across server restarts?

Yes. Replace the default MemorySaver (in-memory, lost on restart) with PostgresSaver or RedisSaver. State is serialized and stored in the external database, surviving server restarts and horizontal scaling.

How do you prevent memory from leaking between users?

Always use a unique thread_id or session_id when invoking the agent. In LangGraph, the thread_id in the config dict scopes the checkpointer to a specific conversation. Never share a single memory instance across concurrent requests.
