
How to Add Persistent Memory to Your AI Agent in 5 Minutes

A step-by-step tutorial showing how to integrate MetaMemory into any AI agent using the REST API. Includes curl examples and Python snippets for storing and retrieving memories.

Emmanuel O. · 7 min read

Most AI agents today are stateless. Every conversation starts from zero, and when the session ends, everything the agent learned disappears. Adding persistent memory changes that fundamentally — your agent can remember user preferences, recall past interactions, and build context over time.

This tutorial shows you how to add persistent memory to any AI agent using MetaMemory's API. We'll go from zero to a working memory-enabled agent in about five minutes.

Prerequisites

You'll need:

  • A MetaMemory account (sign up at app.metamemory.tech)
  • Your API key from the dashboard
  • An existing AI agent or chatbot (any framework — we'll use generic HTTP calls)

Step 1: Start an Episode

MetaMemory organizes memories around episodes — coherent periods of interaction between your agent and a user. Start by creating an episode:

curl -X POST https://api.metamemory.tech/v1/episodes \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "user_123",
    "agent_id": "support-bot",
    "metadata": {
      "channel": "web",
      "topic": "onboarding"
    }
  }'

The response gives you an episode_id that you'll use for all subsequent operations in this conversation:

{
  "episode_id": "ep_abc123",
  "user_id": "user_123",
  "created_at": "2026-03-21T10:30:00Z"
}
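If you'd rather drive this step from Python than curl, the same call can be sketched with the `requests` library. This is a minimal sketch, not official client code: the endpoint and fields come from the curl example above, and `YOUR_API_KEY` is a placeholder.

```python
import requests

API_BASE = "https://api.metamemory.tech/v1"
API_KEY = "YOUR_API_KEY"

def build_episode_payload(user_id: str, agent_id: str, metadata: dict) -> dict:
    # Mirrors the JSON body from the curl example above.
    return {"user_id": user_id, "agent_id": agent_id, "metadata": metadata}

def start_episode(user_id: str, agent_id: str, metadata: dict) -> str:
    # POST /episodes and return the new episode_id for this conversation.
    resp = requests.post(
        f"{API_BASE}/episodes",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=build_episode_payload(user_id, agent_id, metadata),
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["episode_id"]
```

You'd call `start_episode("user_123", "support-bot", {"channel": "web", "topic": "onboarding"})` once at the start of each conversation and keep the returned ID around.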

Step 2: Store Memories During Conversation

As your agent processes messages, store important interactions as memories. You don't need to store every message — focus on information the agent should remember later:

curl -X POST https://api.metamemory.tech/v1/memories \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "episode_id": "ep_abc123",
    "content": "User prefers Python and is deploying on AWS Lambda. They are migrating from a monolithic Flask app to serverless microservices.",
    "metadata": {
      "type": "preference",
      "importance": "high"
    }
  }'

Behind the scenes, MetaMemory encodes this across four vector spaces simultaneously — semantic (the factual content), emotional (detected sentiment), process (any workflow knowledge), and context (when this happened and what surrounded it). This multi-vector encoding is what enables richer retrieval later.
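The store call above translates to Python the same way. Again a sketch using `requests`, with the endpoint and fields taken from the curl example; the default metadata values here are illustrative placeholders, not documented defaults:

```python
import requests

API_BASE = "https://api.metamemory.tech/v1"
API_KEY = "YOUR_API_KEY"

def build_memory_payload(episode_id: str, content: str,
                         mem_type: str = "observation",
                         importance: str = "normal") -> dict:
    # Mirrors the JSON body from the curl example above.
    return {
        "episode_id": episode_id,
        "content": content,
        "metadata": {"type": mem_type, "importance": importance},
    }

def store_memory(episode_id: str, content: str, **meta) -> dict:
    # POST /memories; raises on HTTP errors so failures aren't silent.
    resp = requests.post(
        f"{API_BASE}/memories",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=build_memory_payload(episode_id, content, **meta),
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```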

Step 3: Retrieve Relevant Memories

Before generating a response to a user message, query MetaMemory for relevant context:

curl -X POST https://api.metamemory.tech/v1/memories/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "user_123",
    "query": "How should I structure my Lambda functions?",
    "top_k": 5
  }'

The response returns ranked memories with relevance scores:

{
  "memories": [
    {
      "id": "mem_xyz789",
      "content": "User prefers Python and is deploying on AWS Lambda. They are migrating from a monolithic Flask app to serverless microservices.",
      "relevance_score": 0.94,
      "retrieval_channel": "semantic",
      "created_at": "2026-03-21T10:31:00Z"
    },
    {
      "id": "mem_xyz790",
      "content": "User had issues with cold start latency on previous Lambda deployment. Resolved by using provisioned concurrency.",
      "relevance_score": 0.87,
      "retrieval_channel": "temporal",
      "created_at": "2026-03-14T15:22:00Z"
    }
  ]
}

Notice that the second memory came from a previous session two weeks ago. This is persistent memory in action: context that spans conversations.

Step 4: Inject Memories Into Your Prompt

Take the retrieved memories and include them in your LLM prompt. Here's a minimal Python example:

import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_memories(user_id: str, query: str) -> list[dict]:
    resp = requests.post(
        "https://api.metamemory.tech/v1/memories/search",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"user_id": user_id, "query": query, "top_k": 5},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["memories"]

def generate_response(user_id: str, user_message: str) -> str:
    # Retrieve relevant memories and format them as a bulleted list
    memories = get_memories(user_id, user_message)
    memory_context = "\n".join(
        f"- {m['content']}" for m in memories
    )

    # Build the prompt with the memory context in the system message
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"""You are a helpful assistant.
Here is what you remember about this user:
{memory_context}

Use these memories to personalize your response."""
            },
            {"role": "user", "content": user_message}
        ]
    )
    return response.choices[0].message.content

Step 5: Complete the Episode

When the conversation ends, complete the episode. This triggers MetaMemory's consolidation process, which merges related memories and compresses redundant information — typically achieving around 70% compression while preserving recall quality:

curl -X POST https://api.metamemory.tech/v1/episodes/ep_abc123/complete \
  -H "Authorization: Bearer YOUR_API_KEY"
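In Python, completing the episode is a single POST with no body. Same sketch assumptions as the earlier snippets: `requests`, the endpoint shown in the curl example, and a placeholder API key:

```python
import requests

API_BASE = "https://api.metamemory.tech/v1"
API_KEY = "YOUR_API_KEY"

def complete_episode_url(episode_id: str) -> str:
    # Builds the completion endpoint shown in the curl example above.
    return f"{API_BASE}/episodes/{episode_id}/complete"

def complete_episode(episode_id: str) -> None:
    # Triggers consolidation for the episode; raises on HTTP errors.
    resp = requests.post(
        complete_episode_url(episode_id),
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
```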

What Happens Behind the Scenes

When you store a memory, MetaMemory does several things simultaneously:

  1. Multi-vector encoding: The content is embedded across four vector spaces (semantic, emotional, process, context), each capturing a different dimension of the information.
  2. Episode detection: The system automatically groups related memories into episodes — coherent sequences of interactions that belong together.
  3. Emotional analysis: Sentiment and emotional state are detected and encoded, allowing future retrieval to account for the user's emotional context.
  4. Importance scoring: Each memory is scored for importance based on content, context, and emotional weight. High-importance memories are preserved during consolidation.

When you retrieve memories, five specialized channels — semantic, temporal, emotional, keyword, and graph — compete to surface the best results. Thompson Sampling learns which channels work best over time, so retrieval quality improves with usage.
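To build intuition for how Thompson Sampling picks among channels, here's a toy, purely illustrative sketch. This is not MetaMemory's actual implementation: each channel keeps win/loss counts, a Beta(wins+1, losses+1) value is sampled per channel, and the channel with the highest draw is tried next, so channels that keep producing useful results get picked more often:

```python
import random

CHANNELS = ["semantic", "temporal", "emotional", "keyword", "graph"]

def pick_channel(stats: dict, rng: random.Random) -> str:
    # Thompson Sampling: draw from Beta(wins+1, losses+1) per channel
    # and pick the channel with the highest sample.
    draws = {
        ch: rng.betavariate(s["wins"] + 1, s["losses"] + 1)
        for ch, s in stats.items()
    }
    return max(draws, key=draws.get)

def record_feedback(stats: dict, channel: str, helpful: bool) -> None:
    # Reinforce channels whose results turned out to be useful.
    stats[channel]["wins" if helpful else "losses"] += 1

# Example: after many interactions where semantic retrieval kept
# winning, the sampler strongly favors the semantic channel.
stats = {ch: {"wins": 1, "losses": 1} for ch in CHANNELS}
stats["semantic"] = {"wins": 50, "losses": 2}
rng = random.Random(0)
picks = [pick_channel(stats, rng) for _ in range(100)]
```

The key property is that selection stays probabilistic: under-explored channels still get occasional tries, so the system can adapt if a different channel starts performing better.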

Going Further: MCP Integration

If you're using Claude Code or any MCP-compatible client, MetaMemory has a first-class MCP server that makes integration even simpler. Instead of HTTP calls, your agent calls tools like store_memory and retrieve_context directly:

// In your MCP client configuration
{
  "mcpServers": {
    "metamemory": {
      "command": "npx",
      "args": ["-y", "@metamemory/mcp-server"],
      "env": {
        "METAMEMORY_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}

With MCP, the LLM itself decides when to store and retrieve memories, making the integration more natural and requiring less manual orchestration.

Summary

Adding persistent memory to your AI agent takes three API calls: start an episode, store memories during the conversation, and retrieve them when you need context. The result is an agent that actually remembers — user preferences, past interactions, emotional context, and learned workflows. No more starting from scratch every session.
