Claude API Fundamentals

Messages, Streaming & Models

Your First Claude API Call

The Anthropic SDK is the official client library for calling Claude from code. Unlike chat interfaces, the API gives you full control: you choose the model, set the temperature, define system prompts, and stream responses token by token.

The Messages API

Every Claude interaction is a messages request. You send an array of messages (alternating user/assistant turns) and get back a response:

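import Anthropic from "@anthropic-ai/sdk";

// By default the client reads the API key from the ANTHROPIC_API_KEY environment variable.
const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024, // upper bound on the length of the reply
  messages: [{ role: "user", content: "Hello, Claude!" }],
});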

The response contains an array of content blocks — usually a single text block with Claude's reply.
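To make the response shape concrete, here is a minimal sketch of pulling the reply text out of a content array. The `ContentBlock` type is a simplified, hand-written stand-in for the SDK's own types, and `extractText` is a hypothetical helper rather than an SDK function; both are assumptions for illustration.

```typescript
// Simplified stand-in for the SDK's content block types (assumption for illustration).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Hypothetical helper: join the text of every text block and skip other block types.
function extractText(content: ContentBlock[]): string {
  return content
    .filter((block): block is Extract<ContentBlock, { type: "text" }> => block.type === "text")
    .map((block) => block.text)
    .join("");
}

// Hand-built array shaped like `response.content`; no API call is made here.
const sampleContent: ContentBlock[] = [{ type: "text", text: "Hello! How can I help?" }];
console.log(extractText(sampleContent));
```

Filtering by block type, instead of reading the first element directly, keeps the helper working if the array ever holds more than one block or a non-text block.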

Message Roles

Role	Purpose	Example
`system`	Sets persistent behavior and context	"You are a helpful personal assistant"
`user`	The human's input	"What meetings do I have today?"
`assistant`	Claude's responses (or pre-filled)	"You have 3 meetings today..."

The system prompt is special: it is not a role inside the messages array but a separate top-level `system` parameter on the request. It shapes every response Claude gives.
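As a rough sketch of where the system prompt sits, the snippet below builds a request object with `system` next to `messages` rather than inside it. The `MessageRequest` interface is a trimmed-down assumption covering only the fields named in this chapter, and `buildRequest` is a hypothetical helper.

```typescript
// Trimmed-down request shape (assumption: only the fields discussed in this chapter).
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

interface MessageRequest {
  model: string;
  max_tokens: number;
  system?: string; // top-level field, not an entry in `messages`
  messages: Message[];
}

// Hypothetical helper that places the system prompt in the top-level field.
function buildRequest(system: string, messages: Message[]): MessageRequest {
  return { model: "claude-sonnet-4-20250514", max_tokens: 1024, system, messages };
}

const request = buildRequest("You are a helpful personal assistant", [
  { role: "user", content: "What meetings do I have today?" },
]);
console.log(JSON.stringify(request, null, 2));
```

An object of this shape is what you would pass to `client.messages.create`; the messages array itself only ever contains user and assistant turns.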

Streaming: Why It Matters

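Without streaming, you wait for the entire response before showing anything. With streaming, text appears as it is generated, so the first words often reach the user within a few hundred milliseconds of the request (the exact latency depends on the model and the prompt length) rather than after the full reply is done.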

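// stream() returns a stream object right away; iterate it to receive events.
const stream = client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain streaming." }],
});

for await (const event of stream) {
  // Only text deltas carry a `text` field; other delta types do not.
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}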

Streaming is essential for assistants because:

Users see progress immediately instead of staring at a blank screen

Long responses feel interactive, not like waiting for a page load

You can abort early if the response is going off track
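The last point, stopping early, can be sketched without touching the network. The example below uses a fake event stream and a character budget as the stop condition; `StreamEvent`, `applyEvent`, `collectText`, and `fakeStream` are all hypothetical names, and the event shapes are simplified assumptions modeled on the streaming loop above.

```typescript
// Simplified stand-ins for streaming events (assumption for illustration).
type StreamEvent =
  | { type: "content_block_delta"; delta: { type: "text_delta"; text: string } }
  | { type: "message_stop" };

// Pure step: fold one event into the text collected so far.
function applyEvent(textSoFar: string, event: StreamEvent): string {
  return event.type === "content_block_delta" ? textSoFar + event.delta.text : textSoFar;
}

// Read events until the stream ends or the character budget is reached.
async function collectText(events: AsyncIterable<StreamEvent>, maxChars: number): Promise<string> {
  let text = "";
  for await (const event of events) {
    text = applyEvent(text, event);
    if (text.length >= maxChars) break; // stop reading once we have enough
  }
  return text;
}

// Fake stream for local experimentation; no API call is made.
async function* fakeStream(): AsyncGenerator<StreamEvent> {
  for (const chunk of ["Stream", "ing ", "works"]) {
    yield { type: "content_block_delta", delta: { type: "text_delta", text: chunk } };
  }
  yield { type: "message_stop" };
}

collectText(fakeStream(), 10).then((text) => console.log(text));
```

Breaking out of the loop only stops reading. With the real SDK you would likely also want to cancel the underlying request, so check the SDK documentation for its abort mechanism.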

Model Selection

Model	Speed	Intelligence	Best For
Haiku	Fastest	Good	Quick lookups, classification
Sonnet	Fast	Very good	Most assistant tasks
Opus	Slower	Best	Complex analysis, nuanced reasoning

For a personal assistant, Sonnet is the sweet spot — fast enough for interactive use, smart enough for complex tasks like email triage and research synthesis.
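One way to encode the table is a small lookup from task type to model tier. This is only a sketch: the `Task` names and the `pickTier` helper are hypothetical, and the mapping returns a tier name rather than a model ID, since concrete IDs should come from the current models list.

```typescript
type Tier = "haiku" | "sonnet" | "opus";

// Hypothetical task categories drawn from the "Best For" column above.
type Task = "lookup" | "classification" | "email_triage" | "research_synthesis" | "complex_analysis";

const TIER_FOR_TASK: Record<Task, Tier> = {
  lookup: "haiku",
  classification: "haiku",
  email_triage: "sonnet",
  research_synthesis: "sonnet",
  complex_analysis: "opus",
};

// Hypothetical helper: choose a tier for a task.
function pickTier(task: Task): Tier {
  return TIER_FOR_TASK[task];
}

console.log(pickTier("email_triage"));
```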

Temperature for Assistants

0.0–0.3 — Factual lookups, data extraction, classification

0.3–0.5 — General assistant tasks, email drafts, scheduling

0.5–0.8 — Creative suggestions, brainstorming

For your assistant, start with 0.3 for most tasks. You can adjust per-task later.
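Adjusting per task can be as simple as a lookup with a default. In the sketch below, the `TaskKind` names and `temperatureFor` helper are hypothetical, and the specific values are just picks from inside the ranges listed above, not recommendations from the API documentation.

```typescript
type TaskKind = "factual" | "general" | "creative";

const DEFAULT_TEMPERATURE = 0.3;

// Illustrative picks from within the ranges above (assumptions, tune for your own tasks).
const TEMPERATURE_FOR_KIND: Record<TaskKind, number> = {
  factual: 0.2,
  general: 0.3,
  creative: 0.7,
};

// Hypothetical helper: fall back to the default when the task kind is unknown.
function temperatureFor(kind?: TaskKind): number {
  return kind === undefined ? DEFAULT_TEMPERATURE : TEMPERATURE_FOR_KIND[kind];
}

console.log(temperatureFor("creative"), temperatureFor());
```

The returned number would go in the request's `temperature` field alongside `model` and `max_tokens`.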

Building a Chat Loop

A chat loop is the simplest interactive assistant pattern:

Read user input from the terminal

Append it to the conversation history

Send the full history to Claude (with streaming)

Print the response as it streams

Append Claude's response to the history

Repeat

The conversation history is just an array that grows with each turn. Claude sees the entire conversation every time, which is how it "remembers" what you said earlier.
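The steps above can be sketched with the network call stubbed out, so the history handling is easy to see and run offline. `chatTurn`, `SendFn`, and `stubSend` are hypothetical names. In a real assistant, `send` would wrap the streaming request shown earlier and the inputs would come from the terminal instead of a fixed list.

```typescript
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Hypothetical transport: receives the full history and resolves to the reply text.
type SendFn = (history: Message[]) => Promise<string>;

// One loop iteration: append the user turn, send everything, append the reply.
async function chatTurn(history: Message[], userInput: string, send: SendFn): Promise<string> {
  history.push({ role: "user", content: userInput });
  const reply = await send(history);
  history.push({ role: "assistant", content: reply });
  return reply;
}

// Stub that stands in for the API call; it only counts the user turns it was given.
const stubSend: SendFn = async (history) => {
  const userTurns = history.filter((m) => m.role === "user").length;
  return `You have sent ${userTurns} message(s).`;
};

async function demo(): Promise<Message[]> {
  const history: Message[] = [];
  for (const input of ["Hello", "What did I just say?"]) {
    const reply = await chatTurn(history, input, stubSend);
    console.log(`> ${input}\n${reply}`);
  }
  return history;
}

const demoHistory = demo();
```

Because the stub sees the whole array on every call, its second reply reflects both user turns, which mirrors how Claude "remembers" earlier messages only because they are resent each time.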

Key Takeaways

The Messages API takes a model, max_tokens, optional system prompt, and a messages array.

Streaming shows tokens as they generate — essential for interactive assistants.

Sonnet is the best model for most assistant tasks. Use temperature 0.3 as a default.

Conversation history is just a growing array of messages sent with each request.

This is chapter 1 of Build Your AI Assistant with Claude.
