Claude API Fundamentals

Messages, Streaming & Models

Your First Claude API Call

The Anthropic SDK is the official way to talk to Claude from code. Unlike chat interfaces, the API gives you full control: you choose the model, set the temperature, define system prompts, and stream responses token by token.

The Messages API

Every Claude interaction is a messages request. You send an array of messages (alternating user/assistant turns) and get back a response:

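import Anthropic from "@anthropic-ai/sdk";

// By default the client reads the API key from the ANTHROPIC_API_KEY environment variable.
const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello, Claude!" }],
});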

The response contains an array of content blocks — usually a single text block with Claude's reply.
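Because the reply is an array of blocks rather than a plain string, it helps to pull the text out explicitly. The following is a minimal sketch, not the SDK's own helper: `ContentBlock` is a simplified stand-in type and `extractText` is a hypothetical name introduced here for illustration.

```typescript
// Simplified stand-in for the SDK's content block types (assumption: only
// the fields used in this sketch are modelled).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

type TextBlock = Extract<ContentBlock, { type: "text" }>;

// Concatenate the text of every text block and skip any other block type.
function extractText(content: ContentBlock[]): string {
  return content
    .filter((block): block is TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("");
}

// Hypothetical value shaped like `response.content`, for illustration only.
const sampleContent: ContentBlock[] = [
  { type: "text", text: "Hello! How can I help?" },
];
const replyText = extractText(sampleContent);
```

In a real call you would pass `response.content` instead of `sampleContent`.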

Message Roles

| Role | Purpose | Example |
| --- | --- | --- |
| `system` | Sets persistent behavior and context | "You are a helpful personal assistant" |
| `user` | The human's input | "What meetings do I have today?" |
| `assistant` | Claude's responses (or pre-filled) | "You have 3 meetings today..." |

The system prompt is special — it's not part of the messages array but a separate parameter. It shapes every response Claude gives.
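To make the distinction concrete, here is a small sketch of how a request object might be assembled with the system prompt as a top-level field. `buildRequest` and the local `MessageRequest` type are hypothetical names for this example; they mirror only the parameters mentioned in this guide.

```typescript
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Only the request fields discussed in this guide (assumption: simplified).
interface MessageRequest {
  model: string;
  max_tokens: number;
  system?: string;
  messages: Message[];
}

// Build request parameters; the system prompt sits beside `messages`,
// never inside it.
function buildRequest(messages: Message[], system?: string): MessageRequest {
  const request: MessageRequest = {
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages,
  };
  if (system !== undefined) {
    request.system = system;
  }
  return request;
}

const exampleRequest = buildRequest(
  [{ role: "user", content: "What meetings do I have today?" }],
  "You are a helpful personal assistant",
);
```

The resulting object has the shape you would pass to `client.messages.create`.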

Streaming: Why It Matters

Without streaming, you wait for the entire response before showing anything. With streaming, text appears as it is generated, so the user starts reading the first words while the rest of the response is still being produced.

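// Uses the same `client` as the first example.
const stream = client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain streaming." }],
});

for await (const event of stream) {
  // Only text deltas carry a `text` field; other delta types are skipped.
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}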

Streaming is essential for assistants because:

  • Users see progress immediately instead of staring at a blank screen
  • Long responses feel interactive, not like waiting for a page load
  • You can abort early if the response is going off track
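
The early-abort point can be sketched without touching the network. The helper below is an illustration under stated assumptions: `collectUntil` is a hypothetical name, the stream is modelled as a plain async iterable of text chunks, and `abort` is whatever cancellation hook your streaming client exposes.

```typescript
// Read text chunks until `shouldStop` says the response has gone off track,
// then call `abort` and stop consuming.
async function collectUntil(
  chunks: AsyncIterable<string>,
  shouldStop: (textSoFar: string) => boolean,
  abort: () => void,
): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk;
    if (shouldStop(text)) {
      abort(); // cancel the underlying request (hook supplied by the caller)
      break;
    }
  }
  return text;
}

// Stand-in for a real stream, used only for illustration.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Step 1. ";
  yield "Step 2. ";
  yield "Step 3. ";
}
```

With a real stream, `shouldStop` might check for a length limit or an unwanted phrase, and `abort` would cancel the in-flight request.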

Model Selection

| Model | Speed | Intelligence | Best For |
| --- | --- | --- | --- |
| Haiku | Fastest | Good | Quick lookups, classification |
| Sonnet | Fast | Very good | Most assistant tasks |
| Opus | Slower | Best | Complex analysis, nuanced reasoning |

For a personal assistant, Sonnet is the sweet spot: fast enough for interactive use, smart enough for complex tasks like email triage and research synthesis.
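
If you later want to route different tasks to different tiers, the choice can be written as a small function. This is only a sketch: the `Task` categories and `pickTier` are hypothetical names for this example, and it returns a tier name rather than a concrete model ID, which you would look up in the current model list.

```typescript
type Tier = "haiku" | "sonnet" | "opus";

// Illustrative task categories (assumption: not an official taxonomy).
type Task =
  | "classification"
  | "lookup"
  | "email_triage"
  | "research_synthesis"
  | "complex_analysis";

// Map a task to a model tier following the rough speed/intelligence
// trade-off described in this section.
function pickTier(task: Task): Tier {
  switch (task) {
    case "classification":
    case "lookup":
      return "haiku";
    case "complex_analysis":
      return "opus";
    default:
      return "sonnet"; // the default for most assistant tasks
  }
}
```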

Temperature for Assistants

  • 0.0–0.3 — Factual lookups, data extraction, classification
  • 0.3–0.5 — General assistant tasks, email drafts, scheduling
  • 0.5–0.8 — Creative suggestions, brainstorming

For your assistant, start with a temperature of 0.3 for most tasks. You can adjust it per task later.
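
Per-task adjustment can be as simple as a lookup table. The sketch below assumes three hypothetical task kinds, and the specific values are illustrative picks from within the ranges in this section, not recommendations from the SDK.

```typescript
type TaskKind = "extraction" | "general" | "brainstorm";

// One illustrative value per range from this section.
const TEMPERATURE_BY_TASK: Record<TaskKind, number> = {
  extraction: 0.0, // factual lookups, data extraction, classification
  general: 0.3,    // general assistant tasks, email drafts, scheduling
  brainstorm: 0.7, // creative suggestions, brainstorming
};

const DEFAULT_TEMPERATURE = 0.3;

// Return the temperature for a task, falling back to the default.
function temperatureFor(task?: TaskKind): number {
  return task === undefined ? DEFAULT_TEMPERATURE : TEMPERATURE_BY_TASK[task];
}
```

The returned number would be passed as the `temperature` parameter of a messages request.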

Building a Chat Loop

A chat loop is the simplest interactive assistant pattern:

  1. Read user input from the terminal
  2. Append it to the conversation history
  3. Send the full history to Claude (with streaming)
  4. Print the response as it streams
  5. Append Claude's response to the history
  6. Repeat

The conversation history is just an array that grows with each turn. Claude sees the entire conversation every time, which is how it "remembers" what you said earlier.
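
One turn of that loop can be sketched as a function that takes the history and a streaming callback. This is a simplified illustration, not the course's implementation: `chatTurn`, `StreamReply`, and `echoModel` are hypothetical names, and the API call is abstracted as an async iterable of text chunks so the sketch runs without network access.

```typescript
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Hypothetical abstraction over the streaming API call: given the full
// history, yield the reply as text chunks.
type StreamReply = (history: Message[]) => AsyncIterable<string>;

// Run one turn: record the user input, stream the reply, record the reply.
async function chatTurn(
  history: Message[],
  userInput: string,
  streamReply: StreamReply,
  onText: (chunk: string) => void,
): Promise<string> {
  history.push({ role: "user", content: userInput }); // append user input
  let reply = "";
  for await (const chunk of streamReply(history)) {    // send full history
    onText(chunk);                                     // print as it streams
    reply += chunk;
  }
  history.push({ role: "assistant", content: reply }); // append the reply
  return reply;
}

// Stand-in model that echoes the latest user message, for illustration only.
const echoModel: StreamReply = async function* (history) {
  yield "You said: ";
  yield history[history.length - 1]?.content ?? "";
};
```

A terminal program would call `chatTurn` inside a loop that reads each line of input, with `streamReply` wrapping the real streaming request and `onText` writing to standard output.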

Key Takeaways

  • The Messages API takes a model, max_tokens, optional system prompt, and a messages array.
  • Streaming shows tokens as they generate — essential for interactive assistants.
  • Sonnet is the best model for most assistant tasks. Use temperature 0.3 as a default.
  • Conversation history is just a growing array of messages sent with each request.

This is chapter 1 of Build Your AI Assistant with Claude.
