How LLMs Think

Tokens, Context & Probability

What Happens When You Type a Prompt

Large Language Models don't "understand" language the way you do. They predict the next token (a chunk of text, averaging roughly four characters in English) based on everything that came before it. A response is a chain of these predictions, one per token, often hundreds or thousands in a row.

This matters because once you understand the prediction engine, you can steer it.
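To make the prediction loop concrete, here is a minimal sketch of token-by-token generation. The `TOY_MODEL` table and the `generate` function are hypothetical stand-ins invented for illustration: a real LLM conditions on the entire context and scores a vocabulary of many thousands of tokens, while this toy only looks at the previous token.

```python
import random

# Hypothetical toy "model": maps the previous token to next-token probabilities.
# A real LLM conditions on the whole context, not just the last token.
TOY_MODEL = {
    "<start>": {"The": 0.6, "A": 0.4},
    "The": {" cat": 0.5, " dog": 0.5},
    "A": {" cat": 0.5, " dog": 0.5},
    " cat": {" sat": 0.7, " ran": 0.3},
    " dog": {" sat": 0.4, " ran": 0.6},
    " sat": {"<end>": 1.0},
    " ran": {"<end>": 1.0},
}


def generate(rng, max_tokens=10):
    """Build a response one token at a time by sampling from the toy model."""
    tokens = []
    current = "<start>"
    for _ in range(max_tokens):
        distribution = TOY_MODEL[current]
        candidates = list(distribution.keys())
        weights = list(distribution.values())
        # Pick the next token in proportion to its probability.
        current = rng.choices(candidates, weights=weights, k=1)[0]
        if current == "<end>":
            break
        tokens.append(current)
    return tokens


print("".join(generate(random.Random(0))))
```

Each pass through the loop is one "probability calculation"; the response is whatever the chain of picks adds up to.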

Tokens: The Atoms of Language

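LLMs don't see words. They see tokens: fragments of text drawn from a fixed vocabulary, often on the order of 100,000 entries. Exact splits depend on the tokenizer, but common words are usually a single token each, so "Artificial intelligence" is typically two tokens and "AI" is one. A rare word like "Pneumonoultramicroscopicsilicovolcanoconiosis" breaks into many pieces (nine here; the exact count varies by tokenizer).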

Why this matters for prompting:

Context windows have token limits (e.g., 200K tokens for Claude). Your prompt + the response must fit.

Longer prompts cost more — APIs charge per token.

Unusual words split into more tokens, which can affect how the model processes them.

Input:  "Summarize this quarterly report"
Tokens: ["Sum", "mar", "ize", " this", " quarterly", " report"]
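The budgeting points above can be sketched with a rough estimator. This is an approximation under stated assumptions, not a real tokenizer: `CHARS_PER_TOKEN` uses the common four-characters-per-token rule of thumb for English, `CONTEXT_WINDOW` reuses the 200K example from the text, and `PRICE_PER_MILLION_INPUT` is a made-up price for illustration only. For exact counts, use the tokenizer or token-counting endpoint your provider supplies.

```python
import math

CHARS_PER_TOKEN = 4              # rough English average; an assumption, not a tokenizer
CONTEXT_WINDOW = 200_000         # example limit taken from the text above
PRICE_PER_MILLION_INPUT = 3.00   # hypothetical USD price, for illustration only


def estimate_tokens(text):
    """Approximate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(prompt, max_response_tokens, window=CONTEXT_WINDOW):
    """Check that the prompt plus the reserved response budget fits the window."""
    return estimate_tokens(prompt) + max_response_tokens <= window


def estimate_input_cost(prompt, price_per_million=PRICE_PER_MILLION_INPUT):
    """Estimate the input cost in USD at the assumed per-token price."""
    return estimate_tokens(prompt) * price_per_million / 1_000_000


prompt = "Summarize this quarterly report"
print(estimate_tokens(prompt), fits_in_context(prompt, max_response_tokens=1_000))
```

The estimate will drift for code, non-English text, and unusual words, which tend to split into more tokens than the average suggests.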

Temperature: Controlling Randomness

When the model predicts the next token, it generates probabilities across the entire vocabulary. Temperature controls how it picks from those probabilities:

Temperature	Behavior	Best For
0.0	Always picks the highest-probability token	Factual extraction, classification
0.3–0.7	Slight variation, mostly predictable	Business writing, analysis
0.8–1.0	More creative, occasionally surprising	Brainstorming, creative copy
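One common way temperature is applied is to divide the model's raw scores (logits) by the temperature before converting them to probabilities with softmax. The sketch below assumes that approach; the `logits` values and the function names are hypothetical, and a temperature of 0 is treated as a special case that always picks the top token.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the highest-scoring token.
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample(logits, temperature, rng):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Hypothetical scores for three candidate next tokens.
logits = {" report": 2.0, " summary": 1.0, " poem": 0.1}
for t in (0.0, 0.5, 1.0):
    probs = apply_temperature(logits, t)
    print(t, {tok: round(p, 2) for tok, p in probs.items()})
```

Lower temperatures concentrate probability on the top candidate, while higher ones spread it out, which is why the low end of the table suits extraction and the high end suits brainstorming.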

What LLMs Are Good At

Pattern matching — recognizing structures in text (emails, JSON, tables)

Translation between formats — converting prose to bullet points, CSV to JSON, code to explanation

Summarization — compressing long documents while retaining key points

Following templates — generating output that matches examples you provide

Reasoning through steps — when explicitly asked to think step by step

What LLMs Are Bad At

Math — they approximate, not calculate. "What's 7,849 × 3,271?" gets close but not exact.

Counting — "How many r's in strawberry?" is famously unreliable.

Real-time knowledge — they know what was in training data, not what happened yesterday.

Guaranteed consistency — the same prompt can produce different outputs on different runs.

Citing sources accurately — they pattern-match plausible-sounding citations.
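The math and counting weaknesses have a simple workaround: let ordinary code do the exact part and keep the model for the language part. A minimal illustration, using the two examples from the list above:

```python
# Exact arithmetic: computed, not predicted.
product = 7849 * 3271
print(product)  # 25674079

# Exact counting: the string method inspects every character.
r_count = "strawberry".count("r")
print(r_count)  # 3
```

In practice this usually means asking the model to write or call code for such tasks rather than answer them directly.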

The Mental Model

Think of an LLM as an extremely well-read autocomplete engine. It has read billions of documents and learned patterns about how text follows other text. Your job as a prompt engineer is to set up the right context so the autocomplete engine produces exactly what you need.

The next five modules will teach you specific techniques to do this — starting with the simplest approach (just ask) and building up to structured, reusable prompt systems.

Key Takeaways

LLMs predict tokens, not words. Context windows and costs are measured in tokens.

Temperature controls randomness — lower for facts, higher for creativity.

LLMs excel at pattern matching and format conversion. They struggle with math, counting, and real-time facts.

Better prompts work by giving the prediction engine better context to work with.

This is chapter 1 of Prompt Engineering Essentials.
