Annealing Solver

Simulated Annealing & Solution Analysis

How Simulated Annealing Works

Simulated annealing (SA) is a metaheuristic inspired by the physical process of cooling metal. When metal is heated and slowly cooled (annealed), atoms settle into a low-energy crystalline structure. SA applies the same principle to optimization.

The Algorithm

1. Start with a random binary state x
2. Set temperature T = T_start (high)
3. Repeat for num_steps:
   a. Pick a random variable i
   b. Flip it: x_i = 1 - x_i
   c. Compute energy change: dE = E(new) - E(old)
   d. If dE <= 0: accept (it's an improvement)
   e. If dE > 0: accept with probability exp(-dE / T)
   f. Cool: T = T * cooling_rate
4. Return the final state and its energy
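The steps above can be sketched as a minimal Python annealer. This is a toy for clarity, not a production solver: it recomputes the full QUBO energy x^T Q x after every flip, where a real implementation would compute dE incrementally from row i of Q.

```python
import math
import random

def simulated_annealing(Q, num_steps=1000, T_start=10.0, T_end=0.01, seed=0):
    """Minimize x^T Q x over binary vectors x with geometric cooling."""
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]       # 1. random start

    def energy(state):
        return sum(Q[i][j] * state[i] * state[j]
                   for i in range(n) for j in range(n))

    E = energy(x)
    T = T_start                                     # 2. start hot
    cooling_rate = (T_end / T_start) ** (1.0 / num_steps)
    for _ in range(num_steps):                      # 3. anneal
        i = rng.randrange(n)                        # 3a. pick a variable
        x[i] = 1 - x[i]                             # 3b. flip it
        dE = energy(x) - E                          # 3c. energy change
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            E += dE                                 # 3d/3e. accept
        else:
            x[i] = 1 - x[i]                         # reject: undo the flip
        T *= cooling_rate                           # 3f. cool
    return x, E                                     # 4. final state, energy

# Tiny 2-variable example: the off-diagonal +2 penalizes picking both
Q = [[-1, 2], [0, -1]]
state, energy_found = simulated_annealing(Q)
print(state, energy_found)  # exactly one variable selected, energy -1
```

At the final temperature only improving flips are accepted, so the returned state is a local minimum of the QUBO.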

Why It Works

The acceptance probability exp(-dE / T) is the key insight:

  • High temperature: exp(-dE / T) ≈ 1 for moderate dE. Almost any move is accepted. The solver explores the landscape freely, jumping over barriers.
  • Low temperature: exp(-dE / T) ≈ 0 for positive dE. Only improvements are accepted. The solver exploits the current region, settling into the nearest minimum.
  • Goldilocks zone: During cooling, the solver transitions from exploration to exploitation. If the schedule is right, it discovers good regions early and refines them late.
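To make the two regimes concrete, here is the acceptance probability for a moderately uphill move (dE = 1) at the two ends of a typical schedule:

```python
import math

dE = 1.0  # a moderately uphill move

print(math.exp(-dE / 10.0))   # high T = 10:   ~0.90, almost always accepted
print(math.exp(-dE / 0.01))   # low T = 0.01:  ~4e-44, effectively never accepted
```

The same move goes from near-certain acceptance to effectively impossible as the solver cools, which is exactly the exploration-to-exploitation handoff.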

     Temperature Schedule

    The geometric cooling schedule multiplies temperature by a constant factor each step:

    cooling_rate = (T_end / T_start) ^ (1 / num_steps)
    T_next = T_current * cooling_rate

    This produces exponential decay: fast cooling at high temperatures (where there's less to gain from exploration) and slow cooling at low temperatures (where precision matters).
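A quick sanity check of the schedule with the parameters below: after exactly num_steps multiplications, the temperature lands on T_end up to floating-point error.

```python
T_start, T_end, num_steps = 10.0, 0.01, 1000
cooling_rate = (T_end / T_start) ** (1.0 / num_steps)  # ~0.9931 per step

T = T_start
for _ in range(num_steps):
    T *= cooling_rate

print(cooling_rate, T)  # T ends at 0.01, up to floating-point error
```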

    Key parameters:

  • T_start = 10: High enough to accept most moves initially
  • T_end = 0.01: Low enough to settle into a minimum
  • num_steps = 1000: More steps = finer cooling = better solutions (with diminishing returns)
  • num_reads = 100: Independent runs from different random starts

     Multiple Reads

    SA is stochastic — different starting points lead to different solutions. Running multiple independent chains (num_reads) and keeping the best increases your chance of finding the global optimum.

    The energy distribution across reads tells you about the landscape:

  • Tight distribution (all reads find similar energy): the landscape has few minima, SA converges reliably
  • Wide distribution (reads find very different energies): the landscape is rugged, more reads help
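The multi-read pattern can be sketched by reusing a bare-bones annealer on a toy 3-variable QUBO (the numbers are hypothetical, chosen only for illustration); in practice you would pass num_reads to your solver instead of looping by hand:

```python
import math
import random

def anneal_once(Q, seed, num_steps=500, T_start=10.0, T_end=0.01):
    """One independent SA read; returns the final energy."""
    rng = random.Random(seed)
    n = len(Q)
    x = [rng.randint(0, 1) for _ in range(n)]

    def energy():
        return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

    E = energy()
    T = T_start
    rate = (T_end / T_start) ** (1.0 / num_steps)
    for _ in range(num_steps):
        i = rng.randrange(n)
        x[i] = 1 - x[i]                 # propose a flip
        dE = energy() - E
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            E += dE                     # accept
        else:
            x[i] = 1 - x[i]             # reject: undo
        T *= rate
    return E

Q = [[-2, 1, 0], [0, -2, 1], [0, 0, -2]]              # toy QUBO
energies = [anneal_once(Q, seed) for seed in range(100)]  # num_reads = 100
best, worst = min(energies), max(energies)
print(f"best={best}, worst={worst}, spread={worst - best}")
```

A tight spread here means the reads agree; on a rugged landscape the spread widens, and that is your signal to add reads.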

     Simulated vs Quantum Annealing

     Property            | Simulated Annealing                             | Quantum Annealing
     --------------------|-------------------------------------------------|------------------------------
     Escape mechanism    | Thermal fluctuation                             | Quantum tunneling
     Hardware            | Any CPU                                         | D-Wave quantum processor
     Problem size        | No hard limit                                   | ~5000 qubits (with embedding)
     Speed               | O(num_steps × num_reads)                        | Fixed anneal time (~20 µs)
     Guarantee           | Converges to optimal as T → 0 (infinitely slow) | No formal guarantee
     Practical advantage | Well understood, reliable                       | Debated; active research

    The honest assessment: for problems under ~100 variables, well-tuned SA is competitive with or better than current quantum hardware. The QUBO framework is the valuable part — it works with both backends.

    Analyzing Solutions

    After running the solver, you should check:

  • Best energy: Is it significantly better than random? Than greedy?
  • Selected variables: Do they make domain sense? (e.g., are the selected features actually informative?)
  • Consistency: Do multiple reads converge to similar solutions?
  • Sensitivity: How much do results change when you tweak QUBO weights?

     Baseline Comparisons

    Always compare against simple baselines:

  • Random selection: Pick K variables uniformly at random. This is the "no intelligence" baseline.
  • Greedy selection: Iteratively pick the variable that most improves the objective. This is the "local intelligence" baseline.
  • SA selection: The annealing result. Should beat both, especially when interactions (off-diagonal terms) matter.

     Greedy fails when features interact: it might greedily pick two correlated features because each is individually good, missing that together they're redundant. SA considers the full QUBO structure.
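The failure mode above can be demonstrated with a small sketch. The QUBO below is contrived for illustration: variable 0 has the best diagonal term (-3) but a +3 interaction penalty with both others, so the optimal pair is {1, 2}.

```python
import random

def qubo_energy(Q, x):
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def random_baseline(Q, K, trials=1000, seed=0):
    """Best energy over `trials` uniformly random K-subsets."""
    rng = random.Random(seed)
    n = len(Q)
    best = float("inf")
    for _ in range(trials):
        picks = set(rng.sample(range(n), K))
        x = [1 if i in picks else 0 for i in range(n)]
        best = min(best, qubo_energy(Q, x))
    return best

def greedy_baseline(Q, K):
    """Repeatedly add the single variable that lowers the energy most."""
    n = len(Q)
    x = [0] * n
    for _ in range(K):
        best_i, best_E = None, float("inf")
        for i in range(n):
            if x[i] == 0:
                x[i] = 1               # try adding variable i
                E = qubo_energy(Q, x)
                x[i] = 0               # undo the trial
                if E < best_E:
                    best_i, best_E = i, E
        x[best_i] = 1                  # commit the best single addition
    return qubo_energy(Q, x)

Q = [[-3, 3, 3], [0, -2.5, 0], [0, 0, -2.5]]
print(greedy_baseline(Q, K=2))   # -2.5: greedy grabs variable 0 and gets stuck
print(random_baseline(Q, K=2))   # -5.0: even random sampling finds {1, 2}
```

Here greedy underperforms even the random baseline because the off-diagonal penalty turns its best first pick into a trap; with no interactions, greedy would match the optimum.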

    Convergence Analysis

    Plot best energy vs num_reads and best energy vs num_steps:

  • If the curve is still improving → increase that parameter
  • If the curve has plateaued → you've found the practical limit
  • The elbow point is your sweet spot for compute budget
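A minimal sketch of finding that elbow numerically, given a parameter sweep (the sweep values below are hypothetical, for illustration only):

```python
def plateau_point(param_values, best_energies, tol=1e-3):
    """Return the first parameter value after which the best energy
    stops improving by more than tol (the 'elbow')."""
    for k in range(1, len(best_energies)):
        if best_energies[k - 1] - best_energies[k] <= tol:
            return param_values[k - 1]
    return param_values[-1]

# Hypothetical sweep: best energy found vs num_steps
steps = [100, 200, 400, 800, 1600]
best = [-3.1, -3.8, -4.0, -4.0, -4.0]
print(plateau_point(steps, best))  # → 400: doubling past 400 steps no longer helps
```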

     What's Next

    In Module 5, you'll build an interactive dashboard that makes this analysis visual — sliders for parameters, charts for energy distributions, side-by-side comparisons with baselines.

    This is chapter 4 of Quantum Optimization for AI.

    Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.
