
Foundations

Energy Landscapes & QUBO Basics

Why Quantum Optimization?

Every time you train a machine learning model, you're solving an optimization problem. Gradient descent finds good weights. Grid search finds good hyperparameters. Feature selection finds the right inputs. But some optimization problems are combinatorial — the solution is a binary choice (include or exclude), and the number of possible combinations grows exponentially.

Feature selection with 30 features has 2^30 = 1,073,741,824 possible subsets. Graph partitioning with 50 nodes has 2^50 ≈ 10^15 possibilities. You can't try them all.

Quantum annealing is an approach to these problems. Instead of searching the solution space algorithmically, it encodes the problem as a physical energy landscape and lets physics (quantum mechanics) find the lowest-energy state.

Energy Landscapes

Think of a combinatorial optimization problem as a hilly terrain. Each point on the terrain is a possible solution. The height at that point is the cost (or energy) of that solution. You want to find the lowest valley.

Greedy algorithms walk downhill from wherever they start. They find a valley, but it might be a local minimum — a small dip, not the deepest valley.

Random search teleports to random points. Given enough jumps, it might land in the deepest valley, but it's inefficient.

Simulated annealing starts by exploring freely (high temperature = accept uphill moves), then gradually becomes pickier (low temperature = only accept downhill). The early exploration helps it discover promising regions; the late exploitation helps it settle into the best one.

Quantum annealing takes this further: quantum tunneling lets the system pass through energy barriers instead of climbing over them. In theory, this finds deeper valleys faster than any classical method. In practice, this advantage is still being proven for real-world problems.

The QUBO Framework

QUBO stands for Quadratic Unconstrained Binary Optimization. It's a way to express a wide range of combinatorial problems (with constraints folded in as penalty terms) as:

Minimize E(x) = x^T Q x

Where:

  • x is a vector of binary variables (0 or 1)
  • Q is a matrix of coefficients
  • E is the energy (cost) we want to minimize
Each binary variable represents a decision: include this feature or not, assign this node to community A or B, take this route or not.

The diagonal of Q (Q[i,i]) represents the linear cost of selecting variable i. A negative diagonal means selecting i is rewarding.

The off-diagonal (Q[i,j]) represents the interaction cost of selecting both i and j. A positive off-diagonal means selecting both is penalized.
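In code, evaluating a QUBO energy is a one-liner. A minimal sketch in Python with NumPy (the upper-triangular convention and the toy 2-variable matrix below are illustrative choices, not from any particular library):

```python
import numpy as np

def qubo_energy(x, Q):
    """E(x) = x^T Q x for a binary vector x and coefficient matrix Q."""
    x = np.asarray(x)
    return float(x @ Q @ x)

# Diagonal entries act as linear terms (x_i^2 = x_i for binary x);
# off-diagonal entries are pairwise interaction costs.
Q = np.array([[-1.0,  2.0],
              [ 0.0, -1.0]])  # upper-triangular convention

print(qubo_energy([1, 0], Q))  # -1.0: variable 0 alone is rewarded
print(qubo_energy([1, 1], Q))  # 0.0: the +2.0 interaction cancels both rewards
```

Note that because x_i is 0 or 1, x_i squared equals x_i, which is why the diagonal can double as the linear term.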

A Tiny Example

Consider selecting 2 features from 4:

  • Feature 0: high relevance (MI = 0.9)
  • Feature 1: medium relevance (MI = 0.5)
  • Feature 2: high relevance (MI = 0.8) but correlated with Feature 0
  • Feature 3: low relevance (MI = 0.1)

The QUBO matrix might look like:

         f0     f1     f2     f3
    f0 [-0.9   0.5    1.2    0.5  ]
    f1 [ 0     -0.5   0.5    0.5  ]
    f2 [ 0      0    -0.8    0.5  ]
    f3 [ 0      0     0      0.4  ]

Reading this: selecting f0 is rewarding (diagonal = -0.9). Selecting both f0 and f2 is expensive (off-diagonal = 1.2) because they're correlated. The solver should pick f0 and f1 — two uncorrelated, informative features — with f1 and f2 a close second.

With 4 variables, there are only 16 possible states. We can check them all:

  • x = [1,1,0,0] → E = -0.9 + (-0.5) + 0.5 = -0.9
  • x = [1,0,1,0] → E = -0.9 + (-0.8) + 1.2 = -0.5
  • x = [0,1,1,0] → E = -0.5 + (-0.8) + 0.5 = -0.8

Among two-feature subsets, the optimal solution selects features 0 and 1 — high relevance, low redundancy.
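That exhaustive check is easy to script. A small sketch using the matrix above, restricting the search to exactly-two-feature subsets (the QUBO as written doesn't itself encode the "pick 2" constraint, so without that restriction, selecting f0 alone ties at -0.9):

```python
import numpy as np
from itertools import product

# Upper-triangular QUBO from the example above
Q = np.array([
    [-0.9,  0.5,  1.2,  0.5],
    [ 0.0, -0.5,  0.5,  0.5],
    [ 0.0,  0.0, -0.8,  0.5],
    [ 0.0,  0.0,  0.0,  0.4],
])

def energy(x):
    x = np.asarray(x)
    return float(x @ Q @ x)

# Enumerate all two-feature subsets and keep the lowest-energy one.
pairs = [x for x in product([0, 1], repeat=4) if sum(x) == 2]
best = min(pairs, key=energy)
print(best, energy(best))  # (1, 1, 0, 0) -0.9
```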

Why Brute Force Fails

This brute-force approach works for 4 variables (16 states). But:

  • 20 variables → 1,048,576 states (still feasible, ~1 second)
  • 30 variables → 1,073,741,824 states (~minutes)
  • 50 variables → 1,125,899,906,842,624 states (infeasible in practice)
  • 100 variables → about 10^30 states; even checking a billion per second, enumeration would take longer than the age of the universe

This exponential growth is why we need smarter methods. Annealing — whether simulated or quantum — is one such method.
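As a point of comparison, classical simulated annealing for a QUBO fits in a few lines. A minimal single-bit-flip sketch (the linear cooling schedule, step count, and starting temperature are arbitrary illustrative choices, not tuned values):

```python
import numpy as np

def simulated_annealing(Q, n_steps=5000, T0=1.0, seed=0):
    """Minimize x^T Q x over binary x via single-bit-flip moves."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    x = rng.integers(0, 2, size=n)             # random starting state
    energy = float(x @ Q @ x)
    best_x, best_e = x.copy(), energy
    for step in range(n_steps):
        T = T0 * (1 - step / n_steps) + 1e-9   # linear cooling toward zero
        i = rng.integers(n)                    # propose flipping one bit
        x[i] ^= 1
        new_e = float(x @ Q @ x)
        # Accept downhill moves always; uphill moves with prob exp(-dE/T).
        if new_e <= energy or rng.random() < np.exp((energy - new_e) / T):
            energy = new_e
            if energy < best_e:
                best_x, best_e = x.copy(), energy
        else:
            x[i] ^= 1                          # reject: undo the flip
    return best_x, best_e
```

On the 4-feature matrix from the example it reliably reaches the -0.9 minimum; the point is that the same few lines scale to thousands of variables where enumeration cannot.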

What's Next

In Module 2, you'll learn how to construct QUBO matrices for real problems: feature selection (using mutual information and correlation) and graph partitioning (using edge weights and balance constraints). The math is more intuitive than it looks.

This is chapter 1 of Quantum Optimization for AI.

Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.

View course details