
Chain-of-Thought

Making LLMs Show Their Work

Why "Think Step by Step" Works

When you ask an LLM a complex question directly, it has to commit to an answer immediately, with no intermediate work to build on. When you ask it to reason through steps, each step it writes becomes context for the next, giving the model more signal to work with.

# Without Chain-of-Thought
Q: A company has 3 departments. Sales has 12 people, Engineering has 28, and Marketing has 8. If the company adds 15% more people to each department, how many total employees will there be?

A: 56 (wrong: with no intermediate steps, the model often slips on the arithmetic or the rounding)

# With Chain-of-Thought
Q: A company has 3 departments. Sales has 12 people, Engineering has 28, and Marketing has 8. If the company adds 15% more people to each department, how many total employees will there be? Think through this step by step.

A: Let me work through this:
1. Sales: 12 × 1.15 = 13.8 → 14 people
2. Engineering: 28 × 1.15 = 32.2 → 32 people
3. Marketing: 8 × 1.15 = 9.2 → 9 people
4. Total: 14 + 32 + 9 = 55 people
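As a quick check on the step-by-step answer above, the arithmetic can be reproduced in a few lines of Python. This is only a sketch: the function name `headcount_after_growth` is made up for illustration, and rounding each department to the nearest whole person is an assumption taken from the worked example rather than a rule stated in the question.

```python
def headcount_after_growth(departments, growth_rate):
    """Grow each department by growth_rate, rounding to whole people."""
    # Assumption: the rate is a whole-number percentage (15% here), so we
    # convert it to an integer percent to keep the arithmetic clean.
    percent = round(growth_rate * 100)
    grown = {}
    for name, size in departments.items():
        grown[name] = round(size * (100 + percent) / 100)
    return grown


departments = {"Sales": 12, "Engineering": 28, "Marketing": 8}
grown = headcount_after_growth(departments, 0.15)
total = sum(grown.values())
print(grown, total)
```

Rounding per department gives 14 + 32 + 9 = 55, which matches the chain-of-thought answer.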

The magic phrase "think step by step" is the simplest form of chain-of-thought (CoT) prompting. But you can do much more.

Structured Chain-of-Thought

Instead of a vague "think step by step," give the model a specific reasoning framework:

Analyze this customer review and determine the appropriate response action.

Review: "The product itself is great but your shipping took 3 weeks and the box arrived damaged. I had to call support twice before getting a replacement."

Please reason through:
1. SENTIMENT: What is the overall sentiment? What specific aspects are positive vs negative?
2. ISSUES: List each distinct issue mentioned.
3. SEVERITY: Rate each issue (low/medium/high) based on customer impact.
4. ROOT CAUSE: What likely caused each issue?
5. ACTION: What specific response should we take?

This typically produces a more complete and consistent analysis than asking "What should we do about this review?"
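If you reuse the same framework across many reviews, it can help to build the prompt in code. The sketch below assembles the prompt shown above from a list of steps. The names `REASONING_STEPS` and `build_structured_prompt` are illustrative, not part of any library, and the sample review is shortened.

```python
# Step labels and questions copied from the prompt above.
REASONING_STEPS = [
    ("SENTIMENT", "What is the overall sentiment? What specific aspects are positive vs negative?"),
    ("ISSUES", "List each distinct issue mentioned."),
    ("SEVERITY", "Rate each issue (low/medium/high) based on customer impact."),
    ("ROOT CAUSE", "What likely caused each issue?"),
    ("ACTION", "What specific response should we take?"),
]


def build_structured_prompt(review, steps=REASONING_STEPS):
    """Return a structured chain-of-thought prompt for one review."""
    lines = [
        "Analyze this customer review and determine the appropriate response action.",
        "",
        f'Review: "{review}"',
        "",
        "Please reason through:",
    ]
    # Number the steps so the model's answer can be matched back to them.
    for number, (label, question) in enumerate(steps, start=1):
        lines.append(f"{number}. {label}: {question}")
    return "\n".join(lines)


prompt = build_structured_prompt("Shipping took 3 weeks and the box arrived damaged.")
print(prompt)
```

Keeping the steps in a list makes it easy to add, remove, or reorder them without retyping the whole prompt.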

When CoT Helps vs. Hurts

CoT Helps

  • Multi-step reasoning — math, logic, comparisons across data points
  • Analysis tasks — evaluating pros/cons, assessing risk, making recommendations
  • Complex extraction — pulling structured data from messy, ambiguous text
  • Decision-making — choosing between options with trade-offs

CoT Hurts

  • Simple lookup tasks — "What's the capital of France?" Adding CoT wastes tokens.
  • Creative generation — "Write a poem." Reasoning steps can make creative output feel mechanical.
  • Speed-critical tasks: CoT responses are often several times longer (roughly 2-5x), which means higher latency and cost.
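To get a feel for the cost side of that trade-off, here is a back-of-the-envelope sketch. The baseline length and the price are placeholder numbers chosen for illustration, not any provider's real rates, and `output_cost` is a hypothetical helper.

```python
def output_cost(output_tokens, price_per_1k_tokens):
    """Cost of the generated tokens only (prompt tokens are ignored)."""
    return output_tokens / 1000 * price_per_1k_tokens


BASELINE_TOKENS = 100   # hypothetical length of a direct answer
PRICE_PER_1K = 0.01     # placeholder price per 1,000 output tokens

for multiplier in (1, 2, 5):
    tokens = BASELINE_TOKENS * multiplier
    cost = output_cost(tokens, PRICE_PER_1K)
    print(f"{multiplier}x length: {tokens} tokens, cost {cost:.4f}")
```

Output cost scales linearly with response length. Because output tokens are generated one after another, latency tends to grow with length as well.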

Decomposing Complex Tasks

One of the most effective CoT techniques is to break a big task into explicit sub-tasks, as in this prompt:

    You are analyzing a business invoice. Complete these steps in order:
    
    STEP 1 — EXTRACT: Pull out invoice_id, vendor, line_items, subtotal, tax, total
    STEP 2 — VALIDATE: Check if the line items sum to the subtotal. Flag any discrepancies.
    STEP 3 — CLASSIFY: Categorize the expense (software, services, hardware, travel, other)
    STEP 4 — FLAG: Note anything unusual (duplicate invoice number, amount over $10K, vendor not in approved list)
    STEP 5 — SUMMARY: One-line summary suitable for an expense report
    
    Invoice data:
    [paste invoice here]

Each step builds on the previous one. Because each step's output feeds the next, the model is much less likely to skip ahead.
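The checks in STEP 2 and STEP 4 are deterministic, so you can also re-run them in code on whatever the model extracted in STEP 1. The sketch below assumes the extraction arrives as a dict with the field names from STEP 1 and that each line item has an `amount` string. That shape, the `check_invoice` helper, and the approved-vendor set are all hypothetical.

```python
from decimal import Decimal

APPROVED_VENDORS = {"Acme Corp"}   # hypothetical approved-vendor list
LARGE_AMOUNT = Decimal("10000")    # the "$10K" threshold from STEP 4


def check_invoice(invoice, seen_ids):
    """Return a list of flags for one extracted invoice."""
    flags = []
    # STEP 2: do the line items sum to the stated subtotal?
    line_total = sum(
        (Decimal(item["amount"]) for item in invoice["line_items"]),
        Decimal("0"),
    )
    if line_total != Decimal(invoice["subtotal"]):
        flags.append(f"subtotal mismatch: line items sum to {line_total}")
    # STEP 4: unusual conditions.
    if invoice["invoice_id"] in seen_ids:
        flags.append("duplicate invoice number")
    if Decimal(invoice["total"]) > LARGE_AMOUNT:
        flags.append("amount over $10K")
    if invoice["vendor"] not in APPROVED_VENDORS:
        flags.append("vendor not in approved list")
    return flags


# Made-up extraction result used only to exercise the checks.
extracted = {
    "invoice_id": "INV-001",
    "vendor": "Acme Corp",
    "line_items": [{"amount": "1200.00"}, {"amount": "3000.00"}],
    "subtotal": "4500.00",
    "tax": "360.00",
    "total": "4860.00",
}
flags = check_invoice(extracted, seen_ids=set())
print(flags)
```

`Decimal` is used instead of floats so that currency amounts compare exactly.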

Reasoning Traces for Debugging

Chain-of-thought isn't just for accuracy; it also makes outputs debuggable. When a model gives a wrong answer with CoT, you can usually see where the reasoning went wrong:

    Step 1: Extracted vendor as "Acme Corp" ✓
    Step 2: Calculated subtotal as $4,500 ✗ (actual: $4,200 — model misread a line item)
    Step 3: Classified as "software" based on wrong subtotal

Without CoT, you just get a wrong answer with no way to diagnose it.
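When you have known-good values for some of the steps, finding the first bad step can be automated. The sketch below assumes the trace has already been parsed into ordered (name, value) pairs. The parsing itself and the `first_wrong_step` helper are hypothetical, and the values mirror the trace above.

```python
def first_wrong_step(trace, expected):
    """Return (step_number, got, want) for the earliest mismatch, or None."""
    for step, (name, got) in enumerate(trace, start=1):
        want = expected.get(name)
        # Steps without a known-good value are skipped, not treated as errors.
        if want is not None and got != want:
            return step, got, want
    return None


trace = [("vendor", "Acme Corp"), ("subtotal", 4500), ("category", "software")]
expected = {"vendor": "Acme Corp", "subtotal": 4200}
print(first_wrong_step(trace, expected))
```

Here the function reports step 2 (got 4500, expected 4200), so you know to inspect the subtotal calculation before looking at anything downstream.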

Practice Tasks

Using the data in your project:

  • Use CoT to analyze each email in data/emails.json — extract sentiment, urgency, required action, and suggested response
  • Use structured decomposition to validate the invoices in data/invoices.csv — extract, validate totals, classify expenses, flag anomalies
  • Compare zero-shot vs CoT on the same review analysis task using data/reviews.json
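For the zero-shot vs CoT comparison, a small harness like the one below may be a useful starting point. It is a sketch under several assumptions: `ask_model` stands for whatever function calls your model, the examples are (prompt, expected label) pairs you prepare yourself, and a reply counts as correct if it contains the expected label. The layout of data/reviews.json is not assumed here, and `fake_model` is a stub that exists only to make the example runnable. Its behavior says nothing about how a real model performs.

```python
COT_SUFFIX = " Think through this step by step."


def accuracy(ask_model, examples, suffix=""):
    """Fraction of examples whose reply contains the expected label."""
    correct = 0
    for prompt, expected in examples:
        reply = ask_model(prompt + suffix)
        if expected in reply:
            correct += 1
    return correct / len(examples)


def fake_model(prompt):
    # Stub only: replace with a real model call in your project.
    if "step by step" in prompt:
        return "Positive on product, negative on shipping. Sentiment: mixed"
    return "Sentiment: positive"


examples = [("Review: 'Great product, slow shipping.' What is the sentiment?", "mixed")]
zero_shot = accuracy(fake_model, examples)
with_cot = accuracy(fake_model, examples, suffix=COT_SUFFIX)
print(zero_shot, with_cot)
```

Running both variants over the same examples keeps the comparison fair, because the only thing that changes is the suffix.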

Key Takeaways

  • "Think step by step" works because intermediate tokens give the model more context for each prediction.
  • Structured CoT with explicit steps tends to outperform vague "think about it" instructions.
  • Use CoT for reasoning and analysis. Skip it for simple lookups and creative tasks.
  • Reasoning traces make wrong answers debuggable — you can see where the logic broke.

This is chapter 3 of Prompt Engineering Essentials.
