Chain-of-Thought

Making LLMs Show Their Work

Why "Think Step by Step" Works

When you ask an LLM a complex question directly, it has to commit to an answer immediately, with no intermediate results to build on. When you ask it to reason through steps, each intermediate step becomes context for the next, giving the model more signal to work with.

# Without Chain-of-Thought
Q: A company has 3 departments. Sales has 12 people, Engineering has 28, and Marketing has 8. If the company adds 15% more people to each department, how many total employees will there be?

A: 63 (wrong; without intermediate steps the model can, for example, add 15 people instead of 15%)

# With Chain-of-Thought
Q: A company has 3 departments. Sales has 12 people, Engineering has 28, and Marketing has 8. If the company adds 15% more people to each department, how many total employees will there be? Think through this step by step.

A: Let me work through this:
1. Sales: 12 × 1.15 = 13.8 → 14 people
2. Engineering: 28 × 1.15 = 32.2 → 32 people
3. Marketing: 8 × 1.15 = 9.2 → 9 people
4. Total: 14 + 32 + 9 = 55 people

The magic phrase "think step by step" is the simplest form of chain-of-thought (CoT) prompting. But you can do much more.
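The arithmetic in the worked example can be checked in a few lines of Python, and the "think step by step" cue is easy to append programmatically. This is a minimal sketch: the names `add_cot` and `headcount_after_growth` are illustrative and not part of any library, and rounding each department to the nearest whole person is the same assumption the example answer makes.

```python
# Illustrative helpers (hypothetical names, not from any library).

COT_SUFFIX = "Think through this step by step."


def add_cot(question: str) -> str:
    """Append the simplest chain-of-thought cue to a question."""
    return f"{question.rstrip()} {COT_SUFFIX}"


def headcount_after_growth(departments: dict, percent: int) -> dict:
    """Grow each department by `percent` and round to the nearest whole person."""
    # Multiplying by (100 + percent) and dividing by 100 avoids the small
    # float error that multiplying by 1.15 directly would introduce.
    return {
        name: round(size * (100 + percent) / 100)
        for name, size in departments.items()
    }


departments = {"Sales": 12, "Engineering": 28, "Marketing": 8}
grown = headcount_after_growth(departments, 15)
total = sum(grown.values())

print(grown)  # {'Sales': 14, 'Engineering': 32, 'Marketing': 9}
print(total)  # 55
```

The per-department rounding matters: growing the combined headcount of 48 by 15% gives 55.2, which happens to round to the same total here, but the two approaches can disagree for other inputs.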

Structured Chain-of-Thought

Instead of a vague "think step by step," give the model a specific reasoning framework:

Analyze this customer review and determine the appropriate response action.

Review: "The product itself is great but your shipping took 3 weeks and the box arrived damaged. I had to call support twice before getting a replacement."

Please reason through:
1. SENTIMENT: What is the overall sentiment? What specific aspects are positive vs negative?
2. ISSUES: List each distinct issue mentioned.
3. SEVERITY: Rate each issue (low/medium/high) based on customer impact.
4. ROOT CAUSE: What likely caused each issue?
5. ACTION: What specific response should we take?

This typically produces a more complete and consistent analysis than asking "What should we do about this review?"
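If you reuse the same framework across many reviews, it can help to generate the prompt from a list of steps. The sketch below rebuilds the prompt shown above; `build_structured_prompt` is a hypothetical helper name, and the layout of the output is simply copied from the example.

```python
def build_structured_prompt(task: str, input_label: str, input_text: str,
                            steps: list) -> str:
    """Assemble a structured chain-of-thought prompt from (label, question) pairs."""
    lines = [task, "", f'{input_label}: "{input_text}"', "", "Please reason through:"]
    for number, (label, question) in enumerate(steps, start=1):
        lines.append(f"{number}. {label}: {question}")
    return "\n".join(lines)


REVIEW_STEPS = [
    ("SENTIMENT", "What is the overall sentiment? What specific aspects are positive vs negative?"),
    ("ISSUES", "List each distinct issue mentioned."),
    ("SEVERITY", "Rate each issue (low/medium/high) based on customer impact."),
    ("ROOT CAUSE", "What likely caused each issue?"),
    ("ACTION", "What specific response should we take?"),
]

prompt = build_structured_prompt(
    task="Analyze this customer review and determine the appropriate response action.",
    input_label="Review",
    input_text="The product itself is great but your shipping took 3 weeks.",
    steps=REVIEW_STEPS,
)
print(prompt)
```

Keeping the steps in a list makes it easy to add, remove, or reorder them and compare the resulting outputs.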

When CoT Helps vs. Hurts

CoT Helps

Multi-step reasoning — math, logic, comparisons across data points

Analysis tasks — evaluating pros/cons, assessing risk, making recommendations

Complex extraction — pulling structured data from messy, ambiguous text

Decision-making — choosing between options with trade-offs

CoT Hurts

Simple lookup tasks — "What's the capital of France?" Adding CoT wastes tokens.

Creative generation — "Write a poem." Reasoning steps can make creative output feel mechanical.

Speed-critical tasks — CoT responses are 2-5x longer, meaning higher latency and cost.
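In an application that handles several kinds of requests, the two lists above can be turned into a simple routing rule. This is only a sketch of that idea: the category strings and the `should_use_cot` function are hypothetical names chosen for illustration, and a real system would need its own way of deciding which category a request belongs to.

```python
# Categories mirror the two lists above; the string keys are hypothetical
# labels invented for this sketch.
COT_HELPS = {"multi_step_reasoning", "analysis", "complex_extraction", "decision_making"}
COT_HURTS = {"simple_lookup", "creative_generation"}


def should_use_cot(task_type: str, speed_critical: bool = False) -> bool:
    """Return True when a chain-of-thought prompt is likely worth the extra tokens."""
    if task_type not in COT_HELPS | COT_HURTS:
        raise ValueError(f"unknown task type: {task_type!r}")
    # Longer responses mean higher latency and cost, so speed wins over CoT.
    if speed_critical:
        return False
    return task_type in COT_HELPS


print(should_use_cot("analysis"))                       # True
print(should_use_cot("analysis", speed_critical=True))  # False
print(should_use_cot("simple_lookup"))                  # False
```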

Decomposing Complex Tasks

The most powerful CoT technique: break a big task into explicit sub-tasks.

You are analyzing a business invoice. Complete these steps in order:

STEP 1 — EXTRACT: Pull out invoice_id, vendor, line_items, subtotal, tax, total
STEP 2 — VALIDATE: Check if the line items sum to the subtotal. Flag any discrepancies.
STEP 3 — CLASSIFY: Categorize the expense (software, services, hardware, travel, other)
STEP 4 — FLAG: Note anything unusual (duplicate invoice number, amount over $10K, vendor not in approved list)
STEP 5 — SUMMARY: One-line summary suitable for an expense report

Invoice data:
[paste invoice here]

Each step builds on the previous one. Because each step's output feeds the next, the model is far less likely to skip ahead.
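The prompt above asks for all five steps in one response. A variation on it is to send one prompt per step and pass the earlier outputs forward explicitly, so that each step really does receive the previous results. The sketch below assumes a `call_model` function that takes a prompt string and returns the model's text; that function is a placeholder you would supply, not a real library call.

```python
from typing import Callable

STEPS = [
    ("EXTRACT", "Pull out invoice_id, vendor, line_items, subtotal, tax, total"),
    ("VALIDATE", "Check if the line items sum to the subtotal. Flag any discrepancies."),
    ("CLASSIFY", "Categorize the expense (software, services, hardware, travel, other)"),
    ("FLAG", "Note anything unusual (duplicate invoice number, amount over $10K, "
             "vendor not in approved list)"),
    ("SUMMARY", "One-line summary suitable for an expense report"),
]


def run_steps(invoice_text: str, call_model: Callable[[str], str]) -> dict:
    """Run the steps in order, feeding every earlier output into the next prompt."""
    results = {}
    for number, (name, instruction) in enumerate(STEPS, start=1):
        earlier = "\n".join(f"{done}: {output}" for done, output in results.items())
        prompt = (
            "You are analyzing a business invoice.\n"
            f"STEP {number} - {name}: {instruction}\n\n"
            f"Invoice data:\n{invoice_text}\n\n"
            f"Results of earlier steps:\n{earlier or '(none)'}"
        )
        # call_model is hypothetical: any function mapping prompt -> text.
        results[name] = call_model(prompt)
    return results
```

The trade-off is five model calls instead of one, in exchange for being able to inspect, validate, or retry each step independently.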

Reasoning Traces for Debugging

Chain-of-thought isn't just for accuracy; it also makes answers debuggable. When a model gives a wrong answer with CoT, you can usually see where the reasoning went wrong:

Step 1: Extracted vendor as "Acme Corp" ✓
Step 2: Calculated subtotal as $4,500 ✗ (actual: $4,200 — model misread a line item)
Step 3: Classified as "software" based on wrong subtotal

Without CoT, you just get a wrong answer with no way to diagnose it.
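When some intermediate values can be recomputed independently, the check can be automated. The sketch below compares the values a model reported at each step against known-good values and returns the first step that disagrees. The trace format, the `first_failing_step` name, and the two line-item amounts are all assumptions made for illustration; only the $4,200 and $4,500 subtotals come from the example above.

```python
from typing import Optional


def first_failing_step(trace: list, expected: dict) -> Optional[str]:
    """Return the first step whose reported value differs from the known-good one."""
    for step_name, reported in trace:
        # Steps without a known-good value are skipped rather than guessed at.
        if step_name in expected and reported != expected[step_name]:
            return step_name
    return None


# Hypothetical line items that sum to the actual subtotal of 4200.
line_items = [1200, 3000]

# Values the model reported at each step, in order.
trace = [("vendor", "Acme Corp"), ("subtotal", 4500), ("category", "software")]
expected = {"vendor": "Acme Corp", "subtotal": sum(line_items)}

print(first_failing_step(trace, expected))  # subtotal
```

Everything after the first failing step is suspect, so fixing the earliest error is usually the right place to start.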

Practice Tasks

Using the data in your project:

Use CoT to analyze each email in data/emails.json — extract sentiment, urgency, required action, and suggested response

Use structured decomposition to validate the invoices in data/invoices.csv — extract, validate totals, classify expenses, flag anomalies

Compare zero-shot vs CoT on the same review analysis task using data/reviews.json
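For the comparison task, a small harness like the one below is one way to start. It is a sketch under several assumptions: the schema of data/reviews.json is not shown here, so the code takes already-loaded examples as dictionaries with hypothetical "text" and "label" keys; `call_model` is a placeholder for whatever function sends a prompt and returns text; and the sentiment labels and the "label on the last line" convention are choices made for this example.

```python
from typing import Callable

BASE = ("Classify the sentiment of this review as positive, negative, or mixed.\n"
        "Review: {text}\n")
ZERO_SHOT = BASE + "Reply with the label only."
COT = BASE + "Think through this step by step, then give the label alone on the last line."


def last_line(response: str) -> str:
    """Take the final non-empty line of a response as the model's answer."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    return lines[-1].lower() if lines else ""


def compare_prompts(examples: list, call_model: Callable[[str], str]) -> dict:
    """Return the accuracy of the zero-shot and CoT prompts on labeled examples."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = {"zero_shot": 0, "cot": 0}
    for example in examples:
        for name, template in (("zero_shot", ZERO_SHOT), ("cot", COT)):
            answer = last_line(call_model(template.format(text=example["text"])))
            if answer == example["label"].lower():
                correct[name] += 1
    return {name: count / len(examples) for name, count in correct.items()}
```

Running both prompts on the same examples keeps the comparison fair; with a small dataset, treat the resulting numbers as a rough signal rather than a measurement.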

Key Takeaways

"Think step by step" works because intermediate tokens give the model more context for each prediction.

Structured CoT with explicit steps outperforms vague "think about it" instructions.

Use CoT for reasoning and analysis. Skip it for simple lookups and creative tasks.

Reasoning traces make wrong answers debuggable — you can see where the logic broke.

This is chapter 3 of Prompt Engineering Essentials.
