4 min

Memory & Context

Conversations That Remember

The Memory Problem

Every Claude API call is stateless. Claude doesn't remember your last conversation, your preferences, or even your name. The messages array you send *is* the memory — and it's your job to manage it.

Conversation History

The simplest memory: keep a growing array of messages and send it with every API call.

Turn 1: messages = [user: "What meetings do I have?"]
Turn 2: messages = [user: "What meetings?", assistant: "You have 3...", user: "Cancel the 2pm"]
Turn 3: messages = [user, assistant, user, assistant, user: "Thanks"]

Claude sees the full conversation and responds in context. "Cancel the 2pm" works because Claude can see the earlier list of meetings.

The Context Window Limit

The messages array can't grow forever. Claude's context window is large (200K tokens) but not infinite. A 60-turn conversation with tool calls can easily reach 50K+ tokens.

Strategy	How It Works	Trade-off
Truncation	Drop oldest messages	Loses early context
Sliding window	Keep last N messages	Predictable but lossy
Summarization	Summarize old messages, keep recent	Best balance

Summarization is the best approach for assistants:

When history exceeds a threshold (e.g., 20 messages), summarize the oldest half

Replace them with a single assistant message: "Summary of earlier conversation: ..."

Keep recent messages in full for immediate context

This preserves key facts while staying within token limits.

User Preferences

A preferences file makes the assistant personal:

{
  "name": "Alex",
  "role": "Engineering Manager",
  "communication_style": "concise, direct",
  "priority_rules": {
    "critical": ["production", "outage", "security"],
    "low": ["newsletter", "social", "digest"]
  },
  "working_hours": "9am-6pm EST"
}

Load this into the system prompt: "The user's name is Alex. They prefer concise, direct communication. They work 9am-6pm EST."

Preferences affect everything:

Email triage uses priority_rules to classify messages

Reply drafts match the communication_style

Calendar queries respect working_hours

Contact Awareness

Loading contacts.csv into the assistant's context changes how it talks about people:

Without contacts: "You have an email from sarah.chen@globex.com"

With contacts: "You have an email from Sarah Chen, VP of Partnerships at Globex — she's your main contact for the analytics integration"

The trick is loading contacts into the system prompt so Claude always has them available, without the user needing to ask.

Persistence Between Sessions

For a local CLI assistant, you can save conversation summaries and preferences to disk:

What to Persist	Where	Format
User preferences	`config/preferences.json`	JSON
Conversation summary	`config/history.json`	JSON with summary + key facts
Contact context	`data/contacts.csv`	Loaded at startup

On startup, load the previous session's summary and inject it: "In our last conversation, we discussed the Q3 budget review and you asked me to follow up on the Globex proposal."

Memory Changes Everything

Without memory, every conversation starts cold:

> "Hi, I'm your assistant. How can I help?"

With memory:

> "Good morning, Alex. You have 3 unread emails — one critical alert from the monitoring system. Your first meeting is at 10am with Sarah Chen about the Globex integration. Want me to start with the alert?"

That's the difference between a chatbot and an assistant.

Key Takeaways

Conversation history is a growing messages array. You manage it — Claude doesn't persist anything.

Summarize old messages to stay within context limits while preserving key facts.

User preferences in the system prompt make the assistant personal and consistent.

Contact awareness transforms impersonal email addresses into meaningful context.

This is chapter 5 of Build Your AI Assistant with Claude.

Get the full hands-on course — free during early access. Build the complete system. Your projects become your portfolio.

View course details

Ch. 4: Research Agent

Ch. 6: Deploy Locally