How LLMs Think

Tokens, Context & Probability

What Happens When You Type a Prompt

Large Language Models don't "understand" language the way you do. They assign a probability to every possible next token (a chunk of text, typically about 3-4 characters of English) based on everything that came before it, then pick one. Every response is a chain of these predictions, one per token, often hundreds or thousands in total.

This matters because once you understand the prediction engine, you can steer it.

Tokens: The Atoms of Language

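LLMs don't see words. They see tokens: fragments of text drawn from a fixed vocabulary, typically on the order of 100,000 entries. Exact splits depend on the model's tokenizer, but as a rough guide, "Artificial intelligence" is about two tokens, "AI" is one, and a rare word like "Pneumonoultramicroscopicsilicovolcanoconiosis" breaks into roughly nine.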

Why this matters for prompting:

  • Context windows have token limits (e.g., 200K tokens for Claude). Your prompt + the response must fit.
  • Longer prompts cost more — APIs charge per token.
  • Unusual words split into more tokens, which can affect how the model processes them.

For example, a tokenizer might split a short prompt like this:

  Input:  "Summarize this quarterly report"
  Tokens: ["Sum", "mar", "ize", " this", " quarterly", " report"]
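To make the token arithmetic concrete, here is a minimal sketch of a greedy longest-match tokenizer over a tiny made-up vocabulary, plus a context-budget check and a cost estimate. Real tokenizers use learned vocabularies and different splitting rules, so treat this as an illustration only. `TOY_VOCAB`, `tokenize`, `fits_context`, `estimate_cost`, and the price used below are all hypothetical, not part of any library or price list.

```python
# Hypothetical toy vocabulary, chosen so the example prompt splits as shown above.
TOY_VOCAB = {"Sum", "mar", "ize", " this", " quarterly", " report"}


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split; unknown characters become one-character tokens."""
    max_len = max(len(entry) for entry in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def fits_context(prompt_tokens: int, max_response_tokens: int, context_window: int) -> bool:
    """Prompt and response share one window, so their sum must fit."""
    return prompt_tokens + max_response_tokens <= context_window


def estimate_cost(token_count: int, usd_per_million_tokens: float) -> float:
    """Per-token pricing: cost scales linearly with the number of tokens."""
    return token_count / 1_000_000 * usd_per_million_tokens


tokens = tokenize("Summarize this quarterly report", TOY_VOCAB)
print(tokens)
print(fits_context(len(tokens), 4_096, 200_000))
print(estimate_cost(len(tokens), 3.00))  # 3.00 USD per million tokens is an assumed price
```

The budget check is the part that carries over to real systems: count the prompt's tokens with the provider's own tokenizer, reserve room for the response, and compare the sum against the model's context window.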

Temperature: Controlling Randomness

When the model predicts the next token, it generates probabilities across the entire vocabulary. Temperature controls how it picks from those probabilities:

Temperature   Behavior                                     Best For
0.0           Always picks the highest-probability token   Factual extraction, classification
0.3–0.7       Slight variation, mostly predictable         Business writing, analysis
0.8–1.0       More creative, occasionally surprising       Brainstorming, creative copy
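The effect of temperature can be sketched with the standard softmax-with-temperature formula: each raw score is divided by the temperature before being turned into a probability, so low values sharpen the distribution and high values flatten it. This is a minimal sketch under that assumption; the three-token vocabulary and its scores are made up, and real APIs may combine temperature with other sampling settings.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(vocab: list[str], logits: list[float], temperature: float, rng: random.Random) -> str:
    """Pick the next token: greedy at temperature 0, weighted random otherwise."""
    if temperature == 0.0:
        # Dividing by zero is undefined, so treat 0 as "always take the top score".
        best = max(range(len(logits)), key=lambda i: logits[i])
        return vocab[best]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical candidates for the token after "Summarize this quarterly".
vocab = [" report", " memo", " poem"]
logits = [2.0, 1.0, 0.1]

for t in (0.3, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print(pick_token(vocab, logits, 0.0, random.Random(0)))
```

At 0.3 nearly all of the probability sits on the top candidate, while at 1.0 the alternatives get a meaningful share, which is where the occasional surprise comes from.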

What LLMs Are Good At

  • Pattern matching — recognizing structures in text (emails, JSON, tables)
  • Translation between formats — converting prose to bullet points, CSV to JSON, code to explanation
  • Summarization — compressing long documents while retaining key points
  • Following templates — generating output that matches examples you provide
  • Reasoning through steps — when explicitly asked to think step by step

What LLMs Are Bad At

  • Math: they approximate rather than calculate. "What's 7,849 × 3,271?" often gets an answer that is close but not exact.
  • Counting: "How many r's in strawberry?" is famously unreliable.
  • Real-time knowledge: they know what was in their training data, not what happened yesterday.
  • Guaranteed consistency: the same prompt can produce different outputs on different runs.
  • Citing sources accurately: they can produce plausible-sounding citations that don't exist.
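Exact arithmetic and counting are trivial for ordinary code, which is why these tasks are often handed to a tool instead of being left to token prediction. A minimal sketch using the two examples from the list above (the function names are illustrative):

```python
def exact_product(a: int, b: int) -> int:
    """Integer multiplication in code is exact, not an approximation."""
    return a * b


def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter, ignoring case."""
    return word.lower().count(letter.lower())


print(exact_product(7_849, 3_271))      # 25674079
print(count_letter("strawberry", "r"))  # 3
```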

The Mental Model

Think of an LLM as an extremely well-read autocomplete engine. It has read billions of documents and learned patterns about how text follows other text. Your job as a prompt engineer is to set up the right context so the autocomplete engine produces exactly what you need.
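The autocomplete picture can be made concrete with a toy model that only counts which word follows which. This is a deliberately tiny sketch, nothing like a real LLM in scale or architecture; the corpus, `train_bigrams`, and `autocomplete` are made up for illustration. It does show the core loop, though: look at the context, pick a likely continuation, append it, repeat.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus."""
    follows: dict[str, Counter] = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def autocomplete(follows: dict[str, Counter], prompt: str, max_new_words: int) -> str:
    """Repeatedly append the most frequent follower of the last word."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # nothing in the corpus ever followed this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


corpus = "the report is due friday . the report is due monday . the memo is short ."
model = train_bigrams(corpus)
print(autocomplete(model, "the", 3))   # the report is due
print(autocomplete(model, "memo", 2))  # memo is due
```

Changing the prompt from "the" to "memo" changes the continuation, which is the whole idea behind prompting: the context you supply determines what the engine considers likely next.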

The next five modules will teach you specific techniques to do this, starting with the simplest approach (just ask) and building up to structured, reusable prompt systems.

Key Takeaways

  • LLMs predict tokens, not words. Context windows and costs are measured in tokens.
  • Temperature controls randomness — lower for facts, higher for creativity.
  • LLMs excel at pattern matching and format conversion. They struggle with math, counting, and real-time facts.
  • Better prompts work by giving the prediction engine better context to work with.

This is chapter 1 of Prompt Engineering Essentials.