Two Paths, One Framework
You've learned the QUBO framework. Now it's time to apply it to a real problem. This course offers two paths — both use the same solver, but they encode fundamentally different optimization problems.
Path A: Feature Selection for Machine Learning
The Problem: You have a dataset with 30 features and a binary classification target. 8 features are truly informative, 5 are redundant (correlated copies), and 17 are noise. Can the QUBO solver find the right 8?
Why This Matters:
- Dimensionality reduction improves model speed, interpretability, and often accuracy
- Traditional methods (Lasso, recursive feature elimination) are greedy — they make local decisions
- QUBO-based selection considers global interactions between features (the redundancy penalty makes correlated pairs expensive)
- At scale (1000+ features), the combinatorial explosion makes exhaustive search impossible

What You'll Build:
- A QUBO with three terms: relevance (MI), redundancy (correlation), and cardinality (select K)
- Evaluation: compare selected features against ground truth and measure classifier accuracy
- Sensitivity analysis: how do alpha, beta, and gamma affect which features get selected?

The Dataset:
classification-30features.json contains:
- 500 samples with 30 features each
- Pre-computed mutual information scores (feature → target relevance)
- Pre-computed correlation matrix (feature × feature redundancy)
- Ground truth: features 0-7 are informative, 8-12 are redundant copies

Success Looks Like:
The solver selects ~8 features that overlap heavily with the ground truth (features 0-7), avoids redundant pairs, and achieves comparable or better classifier accuracy than using all 30 features.
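The three-term objective described above can be sketched as a small NumPy routine that assembles the QUBO matrix. This is an illustrative sketch, not the Module 3 lab code: the function name `build_feature_qubo` and the default weights are placeholders, and the cardinality term is the standard penalty gamma * (sum_i x_i - K)^2 expanded into QUBO coefficients.

```python
import numpy as np

def build_feature_qubo(mi, corr, k, alpha=1.0, beta=0.5, gamma=2.0):
    """Build a QUBO matrix for selecting k of n features.

    mi    : length-n array of mutual information scores (relevance)
    corr  : n x n correlation matrix (redundancy)
    k     : target number of selected features (cardinality)
    alpha, beta, gamma : placeholder term weights
    """
    n = len(mi)
    Q = np.zeros((n, n))
    # Relevance: reward informative features (negative diagonal = lower energy).
    Q[np.diag_indices(n)] -= alpha * mi
    # Redundancy: penalize selecting correlated pairs (upper triangle).
    for i in range(n):
        for j in range(i + 1, n):
            Q[i, j] += beta * abs(corr[i, j])
    # Cardinality: gamma * (sum_i x_i - k)^2 expands (using x_i^2 = x_i) to
    # (1 - 2k) on the diagonal and +2 on each off-diagonal pair;
    # the constant gamma * k**2 is dropped from the matrix.
    Q[np.diag_indices(n)] += gamma * (1 - 2 * k)
    for i in range(n):
        for j in range(i + 1, n):
            Q[i, j] += 2 * gamma
    return Q
```

The energy of a binary selection vector x is x @ Q @ x (plus the dropped constant gamma * k**2); lower energy means a better selection, so an informative, uncorrelated feature set minimizes the objective.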
Path B: Graph Partitioning for Clustering
The Problem: You have a social network with 50 nodes and ~200 edges. There are 3 natural communities. Can the QUBO solver find them?
Why This Matters:
- Community detection is fundamental to social network analysis, recommendation systems, and fraud detection
- Graph partitioning is NP-hard — no polynomial-time algorithm guarantees optimal solutions
- QUBO-based partitioning considers the global edge structure, not just local density
- Multi-way partitioning (3+ communities) adds complexity: you need multiple binary variables per node

What You'll Build:
- A QUBO with two terms: an edge reward (maximize intra-community connections) and a balance constraint
- Start with a 2-way partition, then extend to 3-way (requires encoding tricks)
- Evaluation: compute the modularity score and compare the partition against the ground truth communities

The Dataset:
social-graph.json contains:
- 50 nodes with ~200 weighted edges
- Higher edge density within communities (0.35 probability) than between them (0.05)
- 3 ground truth communities of ~17 nodes each
- Full adjacency matrix for computation

Success Looks Like:
The solver produces a partition that aligns with the ground truth communities, achieves high modularity (>0.3), and correctly separates the dense intra-community connections from sparse inter-community ones.
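Modularity, the evaluation metric above, can be computed directly from the adjacency matrix using Newman's definition: Q = (1/2m) * sum over node pairs of [A_ij - k_i*k_j/(2m)] where both nodes share a community. A minimal NumPy sketch (the `modularity` helper is illustrative, not the lab's evaluation code):

```python
import numpy as np

def modularity(adj, labels):
    """Newman modularity of a node partition.

    adj    : symmetric n x n (weighted) adjacency matrix
    labels : length-n array of community labels, one per node
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)                       # (weighted) node degrees
    two_m = adj.sum()                               # 2m: total edge weight, counted twice
    expected = np.outer(degrees, degrees) / two_m   # null-model expected weights
    same = labels[:, None] == labels[None, :]       # True where nodes share a community
    return float(((adj - expected) * same).sum() / two_m)
```

A partition matching planted communities should clear the 0.3 threshold comfortably, while putting every node in one community scores exactly 0 under this definition.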
Choosing Your Path
Both paths are equally rigorous. Choose based on what connects to your work:
- If you work with tabular data, ML pipelines, or high-dimensional datasets → Feature Selection
- If you work with networks, graphs, social data, or clustering → Graph Partitioning
- If neither resonates → Feature Selection is more broadly applicable to ML

There's no wrong choice. The QUBO framework transfers between problems — once you've solved one, encoding the other is straightforward.
What's Next
After choosing your path, you'll:
1. Build the problem-specific QUBO (Module 3 lab)
2. Run the annealing solver and analyze results (Module 4)
3. Build an interactive dashboard (Module 5)
4. Benchmark against classical methods and document limitations (Module 6)