
Zep vs Mem0 vs Letta Agent Memory API (2026)

Zep, Mem0, and Letta compared for agent memory APIs: graph vs vector vs stateful memory, retrieval quality, latency, and production fit.

By APIScout Team

Why Memory Is the Hardest Part of Production Agents

By 2026, "stateless agent" is a euphemism for "agent that forgets every conversation." Real users expect agents to remember preferences, prior decisions, ongoing projects, and relationships. The memory layer is the part of the stack most teams underbuild and most users notice first.

Three platforms dominate the agent memory space in 2026: Zep, Mem0, and Letta (formerly MemGPT). They take fundamentally different architectural bets, and the right one for you depends on what kind of memory you actually need.

TL;DR

  • Zep is a temporal knowledge graph. It extracts entities and facts from conversations, attaches time validity, and lets you query "what was true on date X." Best for agents that need a coherent world model over weeks or months.
  • Mem0 is the simplest abstraction: an extract-and-retrieve memory layer that fits in front of any LLM call. Lightest integration, broadest framework support, ships in an afternoon.
  • Letta is a full agent runtime built around the MemGPT paper's idea of memory hierarchy. Memory blocks, archival memory, recall memory — it gives you stateful agents as the unit of computation, not a layer attached to a stateless one.

If you are adding memory to an existing chatbot, start with Mem0. If you need temporal reasoning ("the user used to prefer X, but switched to Y after April"), use Zep. If you are building a fundamentally stateful agent, Letta is the architecture you want.

Key Takeaways

  • Architecture: Mem0 is vector + LLM extraction; Zep is a temporal knowledge graph; Letta is a stateful agent runtime with structured memory tiers.
  • Retrieval latency: Mem0 is fastest (~80ms p50), Zep ~150ms (graph traversals), Letta varies based on memory tier accessed.
  • Query model: Mem0 returns relevant memories, Zep answers structured queries against a graph, Letta exposes memory as part of the agent's working context.
  • Pricing: Mem0 has the most generous free tier; Zep has self-host and managed; Letta is open-core with hosted plans.
  • Framework fit: Mem0 plugs into anything; Zep ships official LangChain/LlamaIndex integrations; Letta replaces parts of your agent loop.

Decision Table

Need | Pick | Why
Drop-in memory for a chatbot | Mem0 | Lowest integration cost
"What did the user prefer 3 months ago?" | Zep | Temporal graph excels here
Long-running, autonomous agents | Letta | Built around stateful execution
Multi-tenant SaaS with per-user memory | Mem0 or Zep | Strong isolation primitives
Self-host required | Zep or Letta | Both ship OSS cores
RAG-adjacent applications | Zep | Graph + retrieval combined

Mem0

Mem0's superpower is integration speed. You wrap your LLM calls with a few lines of code, and the system extracts memories from conversations, stores them as vectors with metadata, and retrieves the relevant ones on the next turn. The model doing the extraction is yours — Mem0 just orchestrates.

import os

from mem0 import MemoryClient

m = MemoryClient(api_key=os.environ["MEM0_API_KEY"])
m.add("I'm vegetarian and allergic to peanuts", user_id="alice")

memories = m.search("What can I eat?", user_id="alice")
# Returns vegetarian + peanut allergy facts ranked by relevance

Mem0's update-in-place model is an underrated detail. When a user says "I switched to vegan," Mem0 doesn't just append — it reconciles the new fact against existing memories and updates them. That avoids the classic problem where retrieval surfaces stale preferences.
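The reconcile-then-store idea can be illustrated with a toy sketch. This is plain Python, not Mem0's actual implementation — the `reconcile` helper, its topic-keyed store, and the ADD/UPDATE event labels are invented here for illustration:

```python
# Toy sketch of update-in-place memory: a new fact about the same topic
# replaces the old one instead of piling up alongside it.
memories: dict[str, str] = {}  # topic -> current fact

def reconcile(topic: str, fact: str) -> str:
    """Store a fact, replacing any earlier fact on the same topic."""
    event = "UPDATE" if topic in memories else "ADD"
    memories[topic] = fact
    return event

reconcile("diet", "vegetarian")     # first mention: ADD
event = reconcile("diet", "vegan")  # contradiction: UPDATE, not a second entry
assert memories == {"diet": "vegan"}  # retrieval can never surface the stale fact
```

The point of the sketch is the invariant at the end: after an update, the stale preference is gone from the store entirely, so no ranking function can accidentally resurface it.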

What is good:

  • Five-minute integration in any agent stack.
  • Strong free tier and predictable pricing.
  • The reconciliation behavior is genuinely smart.

What is mid:

  • No first-class temporal reasoning. If you need "what was true on date X," Mem0 will retrieve relevant memories but not answer the structured question.
  • Search quality depends heavily on the extraction model you configure.

Zep

Zep takes memory more seriously as data. Their core abstraction is a temporal knowledge graph: entities (people, places, concepts), facts about them (relationships, attributes), and time validity (when did this become true, when did it stop being true). The graph is built incrementally from conversations.

import { ZepClient } from "@getzep/zep-cloud";

const zep = new ZepClient({ apiKey: process.env.ZEP_API_KEY });

await zep.memory.add("session-42", {
  messages: [
    { role: "user", content: "We hired Maya as our new head of product last week." },
  ],
});

const facts = await zep.memory.search("session-42", {
  query: "Who is the head of product?",
});

The graph approach pays off when you need coherent answers across many conversations and the world is changing under the agent. Customer support agents, executive assistants, and sales agents are the natural fits.

What is good:

  • Temporal reasoning that other tools fake or skip.
  • Open-source core with a strong managed offering.
  • Strong RAG story — Zep's retrieval is competitive even against pure vector solutions.

What is mid:

  • Higher learning curve. The graph model is more powerful, but you have to think in entities, facts, and validity windows rather than "store text, retrieve text."
  • Latency is higher than Mem0 because graph traversals do real work.

Letta

Letta (MemGPT's commercial form) takes the most ambitious bet: agents should be stateful, period. Instead of attaching memory to a stateless LLM, Letta gives you an agent runtime where the memory is the agent. Working memory blocks, archival memory, recall memory, and core memory are all primitives the agent itself can read and write.

import os

from letta_client import Letta

client = Letta(token=os.environ["LETTA_API_KEY"])

agent = client.agents.create(
    name="research-assistant",
    memory_blocks=[
        {"label": "persona", "value": "Expert research assistant who tracks ongoing topics."},
        {"label": "human", "value": "Currently researching transformer architectures."},
    ],
)

response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "What papers should I read next?"}],
)

Because Letta agents are durable, the value compounds. An agent you create today has the same memory tomorrow, next week, next month — without you orchestrating that persistence. For long-running autonomous workflows, the architecture matches the problem.
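The memory-hierarchy idea behind this can be sketched in a few lines. This is a toy model of the two-tier split, not Letta's implementation — `remember`, `build_context`, and the keyword-overlap retrieval are invented stand-ins (real systems would use embedding search over archival memory):

```python
# Toy sketch of a two-tier memory hierarchy: core memory is always in the
# prompt; archival memory is durable and searched on demand.
core = {"persona": "Research assistant", "human": "Studying transformers"}
archival: list[str] = []

def remember(note: str) -> None:
    archival.append(note)  # durable, unbounded store

def build_context(query: str, k: int = 2) -> str:
    """Prompt = full core memory + top-k archival notes matching the query."""
    words = query.lower().split()
    hits = [n for n in archival if any(w in n.lower() for w in words)][:k]
    return "\n".join([*core.values(), *hits])

remember("Read 'Attention Is All You Need' on 2026-01-03")
remember("Prefers survey papers over primary sources")

ctx = build_context("survey papers")
assert "survey papers" in ctx and "Research assistant" in ctx
```

Core memory stays small and always visible; archival memory grows without bound but only the relevant slice is paid for in context tokens on each turn.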

What is good:

  • Stateful by default, which removes a class of bugs.
  • Open-source heritage with strong research lineage.
  • Memory tiers map cleanly to "what should be in context now vs available on demand."

What is mid:

  • Higher commitment than the others. You're adopting a runtime, not a library.
  • Less ergonomic if you only need short-conversation memory.

When Each One Wins

  • Customer support chatbot remembering preferences across visits: Mem0.
  • Sales agent that needs "the customer's stated budget changed from $50k to $200k in March": Zep.
  • Autonomous research agent that runs for weeks accumulating findings: Letta.
  • Multi-user SaaS where each user has isolated memory: Mem0 or Zep, both have solid namespacing.
  • Voice agent with sub-second latency budget: Mem0 (lowest retrieval overhead).
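The isolation primitive that makes the multi-tenant case work is simple to state: every read and write is scoped by a user ID. A toy sketch of that contract (plain Python, not either vendor's implementation — `add`, `search`, and the substring matching are invented for illustration):

```python
from collections import defaultdict

# Toy sketch of per-user memory isolation: every operation is namespaced by
# user_id, so one tenant's memories can never appear in another's retrieval.
store: dict[str, list[str]] = defaultdict(list)

def add(user_id: str, memory: str) -> None:
    store[user_id].append(memory)

def search(user_id: str, query: str) -> list[str]:
    return [m for m in store[user_id] if query.lower() in m.lower()]

add("alice", "Vegetarian, allergic to peanuts")
add("bob", "Loves peanut butter")

assert search("alice", "peanut") == ["Vegetarian, allergic to peanuts"]
assert search("bob", "peanut") == ["Loves peanut butter"]
```

In Mem0 this scoping is the `user_id` parameter shown earlier; in Zep it is the session/user hierarchy. Either way, the key property is that isolation is enforced at the API boundary rather than left to prompt discipline.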

Who Should Choose What

  • Pick Mem0 if you want memory in your agent today and the bar is "remember the user's preferences across sessions."
  • Pick Zep if time is a meaningful dimension in your domain — preferences change, facts expire, decisions are made on date X — and you want the graph to encode that.
  • Pick Letta if you are designing a fundamentally stateful agent and the lifecycle of the agent is the lifecycle of the memory.

The Verdict

Memory has matured in 2026 from "throw conversations into a vector DB" to three real architectural choices. Most teams will start with Mem0 because the integration cost is near zero. The teams that grow into Zep or Letta usually do so because the simple model started returning stale or contradictory memories at scale, and they realized the data shape of the problem was actually a graph or a stateful agent.

Related reading: RAG pipeline: Pinecone vs Weaviate vs pgvector for the retrieval substrate, and Best AI agent APIs for the wider agent ecosystem.
