
Tutorial 4: Agent Memory and State

  • Contributor
  • Jun 10
  • 4 min read

A stateless agent can't learn from previous interactions. Memory makes the agent useful across turns and sessions. This tutorial walks through the three layers.

What You'll Build

An agent with three memory types: working (current task), conversation (current session), and persistent (across sessions).

Step 1: Working Memory (5 min)

Working memory is the agent's loop state: the messages array you pass to the API on every call. You get it implicitly by running the loop.

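messages = []  # Current task's working memory
done = False

while not done:
    response = client.messages.create(messages=messages, ...)
    # Store the assistant turn as a message dict, not the raw response object
    # (assumes the SDK exposes the reply as response.content)
    messages.append({"role": "assistant", "content": response.content})
    # ... handle tool calls, append their results, set done when finished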

Working memory is discarded when the task completes.
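
To make the loop concrete, here is a minimal sketch that runs offline. The names `run_task` and `fake_model` are hypothetical, `fake_model` stands in for the real API call, and the "DONE" prefix is an invented stop signal rather than part of any API.

```python
from typing import Callable

Message = dict[str, str]


def run_task(
    user_input: str,
    model: Callable[[list[Message]], str],
    max_turns: int = 5,
) -> list[Message]:
    """Run one task and return its working memory (the messages list)."""
    messages: list[Message] = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        reply = model(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("DONE"):
            break
        # Feed a placeholder tool result back so the next call can see it.
        messages.append({"role": "user", "content": f"tool result for: {reply}"})
    return messages


def fake_model(messages: list[Message]) -> str:
    """Stand-in for the real model: finishes on its second turn."""
    assistant_turns = sum(1 for m in messages if m["role"] == "assistant")
    return "DONE: answer" if assistant_turns >= 1 else "call search"


history = run_task("Find the report", fake_model)
```

The list returned by `run_task` is the whole of working memory. Nothing survives once the caller drops it, which is why the next layers exist.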

Step 2: Conversation Memory (15 min)

Conversation memory is the history of the current session. It is loaded before each agent call and saved after it, so it survives between calls.

class Session:
    def __init__(self, session_id):
        self.session_id = session_id
        self.messages = self.load_messages()  # From DB
    
    def chat(self, user_message):
        self.messages.append({"role": "user", "content": user_message})
        response = call_agent(self.messages)
        self.messages.append({"role": "assistant", "content": response})
        self.save_messages()
        return response

Within a session, the agent now sees every earlier turn.
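
The `load_messages` and `save_messages` methods are left undefined above. One possible way to back them is sketched below, assuming SQLite and a JSON-encoded message list; the `SqliteSession` class and the `sessions` table are illustrative names, not part of the tutorial's codebase.

```python
import json
import sqlite3


class SqliteSession:
    """Conversation memory stored as one JSON blob per session."""

    def __init__(self, conn: sqlite3.Connection, session_id: str):
        self.conn = conn
        self.session_id = session_id
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(session_id TEXT PRIMARY KEY, messages TEXT)"
        )
        self.messages = self.load_messages()

    def load_messages(self) -> list:
        row = self.conn.execute(
            "SELECT messages FROM sessions WHERE session_id = ?",
            (self.session_id,),
        ).fetchone()
        # A session that has never been saved starts with empty history.
        return json.loads(row[0]) if row else []

    def save_messages(self) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (session_id, messages) VALUES (?, ?)",
            (self.session_id, json.dumps(self.messages)),
        )
        self.conn.commit()


conn = sqlite3.connect(":memory:")
first = SqliteSession(conn, "s1")
first.messages.append({"role": "user", "content": "hello"})
first.save_messages()

# A new object with the same session ID picks up the saved history.
second = SqliteSession(conn, "s1")
```

Storing the whole list as one blob keeps the sketch short. A row-per-message table would make it easier to trim or page through long sessions.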

Step 3: Persistent (User) Memory (30 min)

Facts about the user that persist across sessions:

-- Schema (PostgreSQL)
CREATE TABLE user_memory (
    user_id TEXT,
    fact TEXT,
    confidence FLOAT,
    last_referenced TIMESTAMPTZ,
    PRIMARY KEY (user_id, fact)
);

Insert facts as discovered:

def extract_and_store_facts(user_id, user_message):
    facts = extract_facts(user_message)  # LLM call
    for fact in facts:
        db.execute("""
            INSERT INTO user_memory (user_id, fact, confidence, last_referenced)
            VALUES (%s, %s, %s, NOW())
            ON CONFLICT (user_id, fact) DO UPDATE
            SET last_referenced = NOW()
        """, [user_id, fact, 0.8])

Inject into agent's system prompt:

def get_user_context(user_id):
    facts = db.query("SELECT fact FROM user_memory WHERE user_id = %s", [user_id])
    return "\n".join(f["fact"] for f in facts)
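
The upsert in `extract_and_store_facts` is worth seeing in isolation. The sketch below swaps in SQLite so it runs without a database server; this is an assumption for illustration, since the tutorial's SQL targets PostgreSQL. SQLite uses `?` placeholders and needs version 3.24 or later for `ON CONFLICT ... DO UPDATE`. The `store_fact` helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_memory (
        user_id TEXT,
        fact TEXT,
        confidence REAL,
        last_referenced TEXT,
        PRIMARY KEY (user_id, fact)
    )
    """
)


def store_fact(conn, user_id, fact, confidence, now):
    """Insert a fact, or refresh last_referenced if it is already stored."""
    conn.execute(
        """
        INSERT INTO user_memory (user_id, fact, confidence, last_referenced)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (user_id, fact) DO UPDATE
        SET last_referenced = excluded.last_referenced
        """,
        (user_id, fact, confidence, now),
    )


# Storing the same fact twice leaves one row with the newer timestamp.
store_fact(conn, "u1", "works in marketing", 0.8, "2024-01-01")
store_fact(conn, "u1", "works in marketing", 0.8, "2024-06-10")
rows = conn.execute("SELECT fact, last_referenced FROM user_memory").fetchall()
```

Because the primary key is the exact fact text, a paraphrase such as "is a marketer" becomes a second row. Normalizing facts before storing them, or deduplicating periodically, is left to the reader.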

Step 4: Wire Memory Into the Agent (20 min)

def run_agent(session_id, user_message):
    session = load_session(session_id)
    user_id = session.user_id
    
    # Load user context
    user_context = get_user_context(user_id)
    
    system_prompt = f"""
    {BASE_SYSTEM_PROMPT}
    
    Known about this user:
    {user_context}
    """
    
    # Run with conversation history
    response = agent_loop(
        system=system_prompt,
        messages=session.messages + [{"role": "user", "content": user_message}],
    )
    
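    # Extract new facts; store. Facts taken from the agent's own response
    # may be guesses, so consider validating them before trusting them.
    extract_and_store_facts(user_id, user_message)
    extract_and_store_facts(user_id, response)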
    
    # Update conversation memory
    session.append_messages([
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": response},
    ])
    
    return response

All three memory layers are now in play.

Step 5: Memory Decay (15 min)

Old facts may become stale:

def cleanup_stale_memory():
    db.execute("""
        DELETE FROM user_memory
        WHERE last_referenced < NOW() - INTERVAL '6 months'
    """)

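Or keep only the most recently referenced facts:

def get_user_context(user_id, limit=20):
    facts = db.query("""
        SELECT fact FROM user_memory 
        WHERE user_id = %s 
        ORDER BY last_referenced DESC 
        LIMIT %s
    """, [user_id, limit])
    # Return a string, as in Step 3, so it can be dropped into the system prompt
    return "\n".join(f["fact"] for f in facts)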

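A softer alternative to deleting or truncating is to score each fact by both confidence and age. The sketch below uses exponential decay with a half-life; the 90-day default, the `recency_score` and `top_facts` names, and the dict-shaped rows are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def recency_score(confidence, last_referenced, now, half_life_days=90.0):
    """Confidence halved for every half_life_days since the last reference."""
    age_days = (now - last_referenced).total_seconds() / 86400
    return confidence * 0.5 ** (age_days / half_life_days)


def top_facts(facts, now, limit=20):
    """Return the highest-scoring fact strings, best first."""
    ranked = sorted(
        facts,
        key=lambda f: recency_score(f["confidence"], f["last_referenced"], now),
        reverse=True,
    )
    return [f["fact"] for f in ranked[:limit]]


now = datetime(2024, 6, 10, tzinfo=timezone.utc)
facts = [
    {"fact": "Lives in Austin", "confidence": 1.0,
     "last_referenced": now - timedelta(days=360)},
    {"fact": "Works in marketing", "confidence": 0.8,
     "last_referenced": now - timedelta(days=1)},
]
ordered = top_facts(facts, now)
```

With these settings, a fully confident fact that has not been referenced for a year ranks below a less confident one mentioned yesterday.
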
Step 6: Episodic Memory (advanced, varies)

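Episodic memory stores summaries of past interactions. The schema below assumes PostgreSQL with the pgvector extension, which provides the VECTOR type and the <=> (cosine distance) operator used later in this step: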

CREATE TABLE episode_summaries (
    user_id TEXT,
    session_id TEXT,
    summary TEXT,
    embedding VECTOR(1536),
    created_at TIMESTAMPTZ
);

When a session ends, summarize:

def end_session(session):
    summary = summarize_session(session.messages)
    embedding = embed(summary)
    
    db.execute("""
        INSERT INTO episode_summaries 
        (user_id, session_id, summary, embedding, created_at)
        VALUES (%s, %s, %s, %s, NOW())
    """, [session.user_id, session.session_id, summary, embedding])

Retrieve relevant past episodes for current context:

def relevant_episodes(user_id, current_question, top_k=3):
    embedding = embed(current_question)
    return db.query("""
        SELECT summary FROM episode_summaries
        WHERE user_id = %s
        ORDER BY embedding <=> %s
        LIMIT %s
    """, [user_id, embedding, top_k])

The agent remembers relevant past interactions.
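
To show what the `ORDER BY embedding <=> %s` clause computes, here is the same ranking in plain Python. Treat it as a sketch: the two-dimensional vectors are toy data standing in for real embeddings, and `cosine_distance` and `nearest_summaries` are hypothetical helper names.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest_summaries(query_embedding, episodes, top_k=3):
    """Return the summaries whose embeddings are closest to the query."""
    ranked = sorted(
        episodes,
        key=lambda e: cosine_distance(query_embedding, e["embedding"]),
    )
    return [e["summary"] for e in ranked[:top_k]]


episodes = [
    {"summary": "Asked about billing", "embedding": [1.0, 0.0]},
    {"summary": "Onboarding walkthrough", "embedding": [0.0, 1.0]},
    {"summary": "Disputed an invoice", "embedding": [0.9, 0.1]},
]
closest = nearest_summaries([1.0, 0.0], episodes, top_k=2)
```

In production the database does this ranking, ideally with a vector index, so the application never loads every embedding into memory.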

Step 7: Tool-Triggered Memory (10 min)

Tools can also write to memory:

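def remember(user_id: str, fact: str) -> str:
    """Explicitly remember something about the user."""
    # Upsert: the fact may already exist from auto-extraction, and a plain
    # INSERT would then violate the (user_id, fact) primary key.
    db.execute("""
        INSERT INTO user_memory (user_id, fact, confidence, last_referenced)
        VALUES (%s, %s, 1.0, NOW())
        ON CONFLICT (user_id, fact) DO UPDATE
        SET confidence = 1.0, last_referenced = NOW()
    """, [user_id, fact])
    return f"Remembered: {fact}"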

# Add to tools
TOOLS.append({
    "name": "remember",
    "description": "Save something important about the user for future conversations.",
    # ...
})
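
One detail the tool definition leaves open is where `user_id` comes from. A reasonable approach, sketched below, is to expose only `fact` to the model and have the server fill in `user_id` from the session. The `input_schema` shape assumes a JSON-Schema-style tool format such as the one in Anthropic's Messages API, so check your SDK; `REMEMBER_TOOL`, `handle_tool_call`, and the dict-based store are hypothetical.

```python
REMEMBER_TOOL = {
    "name": "remember",
    "description": "Save something important about the user for future conversations.",
    # Assumed schema: only the fact is model-supplied, never the user ID.
    "input_schema": {
        "type": "object",
        "properties": {
            "fact": {"type": "string", "description": "One short fact about the user."},
        },
        "required": ["fact"],
    },
}


def handle_tool_call(name, tool_input, user_id, store):
    """Dispatch a tool call; user_id comes from the session, not the model."""
    if name != "remember":
        raise ValueError(f"Unknown tool: {name}")
    fact = tool_input["fact"].strip()
    if not fact:
        return "Nothing to remember."
    # store is a dict of user_id -> set of facts, standing in for user_memory.
    store.setdefault(user_id, set()).add(fact)
    return f"Remembered: {fact}"


store = {}
result = handle_tool_call("remember", {"fact": " prefers email "}, "u1", store)
```

Keeping `user_id` out of the schema means a confused or manipulated model cannot write into another user's memory.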

The agent can explicitly say "I should remember this."

Step 8: Test Memory Across Sessions (varies)

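# Both sessions must belong to the same user: persistent memory is keyed
# by user_id, not by session ID.

# Session 1
agent.chat("session1", "My name is Pat and I work in marketing.")

# Session 2 (different session ID, same user)
response = agent.chat("session2", "What's my role again?")
assert "marketing" in response.lower()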

Memory should persist across sessions.
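
An assertion on model output can fail for reasons unrelated to memory. A complementary check, sketched here, tests the plumbing deterministically: store a fact in one "session" and confirm it reaches the system prompt in the next. The in-memory `memory` dict stands in for the `user_memory` table, and `store_fact` and `build_system_prompt` are hypothetical names.

```python
memory = {}  # user_id -> list of facts; stand-in for the user_memory table

BASE_PROMPT = "You are a helpful assistant."


def store_fact(user_id, fact):
    """Record a fact once per user."""
    facts = memory.setdefault(user_id, [])
    if fact not in facts:
        facts.append(fact)


def build_system_prompt(user_id, base=BASE_PROMPT):
    """Append known facts to the base prompt, if there are any."""
    facts = memory.get(user_id, [])
    if not facts:
        return base
    return base + "\n\nKnown about this user:\n" + "\n".join(facts)


# Session 1: a fact is extracted and stored.
store_fact("pat", "Works in marketing")

# Session 2: a fresh prompt is built for the same user.
prompt = build_system_prompt("pat")
```

The same setup also lets you assert that one user's facts never appear in another user's prompt.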

Step 9: Privacy Considerations (15 min)

Memory has privacy implications:

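  • Users should be able to see what is remembered about them

  • Users should be able to delete individual facts or everything

  • Sensitive data (PII, financial details) shouldn't be auto-extracted

  • A request to forget should be honored everywhere the data lives, including episode summaries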

def list_user_memory(user_id):
    return db.query("SELECT fact FROM user_memory WHERE user_id = %s", [user_id])

def forget_user_memory(user_id, fact=None):
    if fact:
        db.execute("DELETE FROM user_memory WHERE user_id=%s AND fact=%s", [user_id, fact])
    else:
        db.execute("DELETE FROM user_memory WHERE user_id=%s", [user_id])

Build the UI for these operations.
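
For the point about not auto-extracting sensitive data, one option is to screen facts before they are stored. The sketch below uses a few deliberately crude regular expressions. They are illustrative assumptions that will miss many cases and flag some harmless ones, so treat them as a first filter rather than a compliance control. The names `SENSITIVE_PATTERNS`, `sensitive_kinds`, and `filter_facts` are hypothetical.

```python
import re

# Rough patterns only; real PII detection needs far more than this.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def sensitive_kinds(fact):
    """Return the names of every pattern that matches the fact."""
    return [kind for kind, pattern in SENSITIVE_PATTERNS.items() if pattern.search(fact)]


def filter_facts(facts):
    """Keep only facts that match none of the sensitive patterns."""
    return [fact for fact in facts if not sensitive_kinds(fact)]


candidates = [
    "Works in marketing",
    "Email is pat@example.com",
    "Card 4111 1111 1111 1111",
]
safe = filter_facts(candidates)
```

Calling `filter_facts` on the output of `extract_facts` before the insert keeps flagged facts out of `user_memory` entirely, which is simpler than deleting them afterwards.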

Step 10: Monitor Memory Quality (ongoing)

  • Are stored facts useful?

  • Does memory inclusion improve responses?

  • Are facts accurate?

Periodically review memory contents. Stale or wrong facts degrade the agent.
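
A periodic review is easier with a few numbers to look at first. The sketch below computes simple health metrics over rows shaped like the `user_memory` table. The 180-day staleness window, the 0.5 confidence threshold, and the `memory_stats` name are assumptions; these metrics flag where to look and do not measure accuracy or usefulness directly.

```python
from datetime import datetime, timedelta, timezone


def memory_stats(rows, now, stale_after_days=180, min_confidence=0.5):
    """Summarize how much stored memory is stale or low-confidence."""
    if not rows:
        return {"facts": 0, "users": 0, "stale_fraction": 0.0,
                "low_confidence_fraction": 0.0}
    cutoff = now - timedelta(days=stale_after_days)
    stale = sum(1 for r in rows if r["last_referenced"] < cutoff)
    low = sum(1 for r in rows if r["confidence"] < min_confidence)
    return {
        "facts": len(rows),
        "users": len({r["user_id"] for r in rows}),
        "stale_fraction": stale / len(rows),
        "low_confidence_fraction": low / len(rows),
    }


now = datetime(2024, 6, 10, tzinfo=timezone.utc)
rows = [
    {"user_id": "u1", "confidence": 0.8, "last_referenced": now - timedelta(days=2)},
    {"user_id": "u1", "confidence": 0.3, "last_referenced": now - timedelta(days=10)},
    {"user_id": "u2", "confidence": 1.0, "last_referenced": now - timedelta(days=400)},
    {"user_id": "u2", "confidence": 0.9, "last_referenced": now - timedelta(days=30)},
]
stats = memory_stats(rows, now)
```

Tracking these fractions over time shows whether cleanup and extraction changes are having any effect.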

What You Just Did

You added three-layer memory to your agent. Working memory handles the current task; conversation memory handles the session; user memory persists across sessions.

Common Failure Modes

Unbounded memory. Token cost grows; eventually exceeds context.

Auto-extracted PII. Sensitive data stored without consent.

No forget mechanism. Privacy violation; can't update wrong facts.

Stale memory injected. Old facts confuse the agent.

Memory without testing. Don't know if it's actually helping.
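
For the unbounded-memory failure mode, one common mitigation is to cap how much history is sent on each call. The sketch below keeps the most recent messages that fit a character budget. Character count is used here as a rough stand-in for tokens, and `trim_history` and its 8000-character default are hypothetical.

```python
def trim_history(messages, max_chars=8000):
    """Keep the newest messages whose combined content fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):
        size = len(message["content"])
        # Always keep the newest message, even if it alone exceeds the budget.
        if kept and used + size > max_chars:
            break
        kept.append(message)
        used += size
    kept.reverse()
    # Drop leading assistant turns so the history starts with a user message.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 10},
    {"role": "assistant", "content": "b" * 10},
    {"role": "user", "content": "c" * 10},
    {"role": "assistant", "content": "d" * 10},
]
recent = trim_history(history, max_chars=35)
```

Dropped turns are simply lost in this version. Summarizing them into a single message before trimming is a natural next step if older context matters.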

Next Tutorial

Real agents fail. Handle it: Tutorial 5: Handle Agent Failures.

Related reading

Keep learning. This article is part of the AI in Quality & Delivery path in the ShiftQuality Learning Center. Use AI in delivery — and evaluate it honestly — without the hype.
