
Supervisor Pattern

The Orchestrator That Never Does the Work

What Is the Supervisor Pattern?

The supervisor pattern separates coordination from execution. One entity (the supervisor) decides what needs to happen and who should do it. Other entities (the agents) do the actual work. The supervisor never writes a memo, never runs an analysis, never touches the code. It plans, routes, monitors, and assembles.

This separation is critical because coordination and execution require fundamentally different capabilities. Executing a research task requires domain knowledge, access to data sources, and the ability to synthesize findings. Coordinating four agents requires understanding task dependencies, monitoring progress, handling failures, and deciding when the overall job is done.

The analogy: A conductor doesn't play any instrument. They manage tempo, dynamics, and transitions. An orchestra without a conductor is four musicians playing different songs at different speeds.

Task Decomposition: Query to DAG

When a user submits "research our competitor's latest product launch, analyze the market impact, and draft a response strategy memo," the supervisor's first job is decomposition — breaking this into discrete sub-tasks.

The output is a Directed Acyclic Graph (DAG):

        Research & Data Gathering (researcher)
           ↓                          ↓
Data Analysis (analyst)    Technical Implementation (coder)
           ↓                          ↓
              Document Drafting (writer)

Dependencies enforce order:

The researcher starts immediately (no dependencies)

The analyst and coder depend on research (they run after, potentially in parallel)

The writer depends on ALL other tasks (it needs everyone's output)

The decomposer produces Task[] where each task has a dependencies array of task IDs. The orchestrator checks dependencies before executing — if the analyst's dependencies aren't met, it skips to the next ready task.
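The dependency check can be sketched roughly as follows. This is a minimal illustration rather than the course's implementation: only the `dependencies` array of task IDs comes from the text, while the other `Task` fields, the status values, and the `readyTasks` helper are assumptions.

```typescript
// Hypothetical task shape: only `dependencies` (an array of task IDs) is from the text.
type TaskStatus = "pending" | "running" | "completed" | "failed";

interface Task {
  id: string;
  description: string;
  dependencies: string[]; // IDs of tasks that must complete first
  status: TaskStatus;
}

// Return the pending tasks whose dependencies have all completed.
function readyTasks(tasks: Task[]): Task[] {
  const completed = new Set(
    tasks.filter((t) => t.status === "completed").map((t) => t.id),
  );
  return tasks.filter(
    (t) => t.status === "pending" && t.dependencies.every((d) => completed.has(d)),
  );
}

// The example DAG from this section (the IDs are made up for the sketch).
const exampleTasks: Task[] = [
  { id: "research", description: "Research & Data Gathering", dependencies: [], status: "pending" },
  { id: "analysis", description: "Data Analysis", dependencies: ["research"], status: "pending" },
  { id: "implementation", description: "Technical Implementation", dependencies: ["research"], status: "pending" },
  { id: "draft", description: "Document Drafting", dependencies: ["research", "analysis", "implementation"], status: "pending" },
];

const initiallyReady = readyTasks(exampleTasks).map((t) => t.id); // ["research"]
```

Once the research task is marked completed, the same call returns the analysis and implementation tasks together, which is what makes parallel execution possible.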

Routing: Task to Agent

With the task DAG defined, the supervisor routes each task to the best agent. Routing combines two signals:

Agent self-assessment (60% weight) — Each agent's canHandle(task) returns a confidence score. The researcher scores high on research tasks, low on coding tasks.

Keyword matching (40% weight) — Words in the task description are compared to the agent's capability descriptions. "Market analysis" matches the analyst's "Market Analysis" capability.

The router scores all agents for each task and picks the highest:

"Research & Data Gathering"
  researcher: 0.72 ← winner
  writer:     0.31
  analyst:    0.48
  coder:      0.22
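One way to combine the two signals is a weighted sum. The sketch below assumes that `canHandle` returns a confidence between 0 and 1 and that the keyword score is the share of task words found in the agent's capability descriptions; the text does not specify the keyword metric, so that formula, the `Agent` shape, and the sample capabilities and confidence values are all hypothetical.

```typescript
interface RoutableTask {
  description: string;
}

// Hypothetical agent shape; `canHandle` is the only name taken from the text.
interface Agent {
  id: string;
  capabilities: string[];
  canHandle(task: RoutableTask): number; // confidence, assumed to lie in [0, 1]
}

const SELF_ASSESSMENT_WEIGHT = 0.6; // weights stated in the text
const KEYWORD_WEIGHT = 0.4;

// Lowercase a string and split it into alphanumeric words.
function words(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Assumed metric: fraction of task words that appear in any capability description.
function keywordScore(task: RoutableTask, agent: Agent): number {
  const taskWords = words(task.description);
  if (taskWords.length === 0) return 0;
  const capabilityWords = new Set(words(agent.capabilities.join(" ")));
  const hits = taskWords.filter((w) => capabilityWords.has(w)).length;
  return hits / taskWords.length;
}

function scoreAgent(task: RoutableTask, agent: Agent): number {
  return (
    SELF_ASSESSMENT_WEIGHT * agent.canHandle(task) +
    KEYWORD_WEIGHT * keywordScore(task, agent)
  );
}

// Score every agent and keep the highest.
function route(task: RoutableTask, agents: Agent[]): { agent: Agent; score: number } {
  if (agents.length === 0) throw new Error("no agents registered");
  return agents
    .map((agent) => ({ agent, score: scoreAgent(task, agent) }))
    .reduce((best, candidate) => (candidate.score > best.score ? candidate : best));
}

// Illustrative agents with made-up capabilities and confidence values.
const researcher: Agent = {
  id: "researcher",
  capabilities: ["Web Research", "Data Gathering"],
  canHandle: (task) => (/research|gather/i.test(task.description) ? 0.9 : 0.2),
};
const analyst: Agent = {
  id: "analyst",
  capabilities: ["Market Analysis", "Data Analysis"],
  canHandle: (task) => (/analy/i.test(task.description) ? 0.9 : 0.3),
};

const winner = route({ description: "Research & Data Gathering" }, [researcher, analyst]);
// winner.agent.id is "researcher"
```

The numbers this sketch produces will not match the example scores above, since the real keyword metric and the agents' self-assessments are not given in the text.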

The decomposer can suggest an agent, but the router validates the suggestion. If the decomposer says "researcher" but the researcher scores below 0.5, the router overrides. This prevents bad decomposition from cascading into bad execution.
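The validation step might look like the sketch below. It assumes that an override falls back to the highest-scoring agent and that a suggestion scoring at least 0.5 is kept even if another agent scores higher; the text only states the 0.5 threshold, so both behaviors and the `resolveAgent` name are assumptions.

```typescript
const MIN_SUGGESTION_SCORE = 0.5; // threshold stated in the text

// `scores` maps agent ID to its combined routing score for one task.
function resolveAgent(scores: Record<string, number>, suggested?: string): string {
  // Keep the decomposer's suggestion only if it clears the threshold.
  if (suggested !== undefined) {
    const suggestedScore = scores[suggested];
    if (suggestedScore !== undefined && suggestedScore >= MIN_SUGGESTION_SCORE) {
      return suggested;
    }
  }
  // Otherwise override with the highest-scoring agent (assumed fallback).
  let bestId: string | undefined;
  let bestScore = -Infinity;
  for (const id of Object.keys(scores)) {
    const score = scores[id] ?? -Infinity;
    if (score > bestScore) {
      bestId = id;
      bestScore = score;
    }
  }
  if (bestId === undefined) throw new Error("no agents scored");
  return bestId;
}

// Scores from the "Research & Data Gathering" example.
const researchScores = { researcher: 0.72, writer: 0.31, analyst: 0.48, coder: 0.22 };

const kept = resolveAgent(researchScores, "researcher"); // "researcher" (0.72 clears 0.5)
const overridden = resolveAgent(researchScores, "analyst"); // "researcher" (0.48 is below 0.5)
```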

State Management

The supervisor maintains a SupervisorState that tracks everything:

Field	Purpose
`phase`	Current pipeline phase (decomposing → routing → executing → reviewing → assembling → complete)
`tasks`	All sub-tasks with status, assignment, and results
`messages`	Full communication log between agents and supervisor
`artifacts`	All outputs produced by agents
`tokenBudget`	Total and per-agent token usage
`iteration`	Current iteration (for retry/refinement loops)
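As a type, the state could be sketched like this. The six top-level field names come from the table; the shapes of the nested records, the `createInitialState` and `recordTokenUsage` helpers, and the choice to return a new state instead of mutating are assumptions for illustration.

```typescript
// Phase names follow the pipeline described in the Phase Machine section.
type Phase = "decomposing" | "routing" | "executing" | "reviewing" | "assembling" | "complete";

// The nested shapes below are hypothetical; the text only names the top-level fields.
interface TaskRecord {
  id: string;
  status: "pending" | "running" | "completed" | "failed";
  dependencies: string[];
  assignedAgent?: string;
  result?: string;
}
interface Message { from: string; to: string; content: string; }
interface Artifact { taskId: string; producedBy: string; content: string; }
interface TokenBudget {
  total: number; // overall allowance
  used: number; // tokens consumed so far
  perAgent: Record<string, number>; // tokens consumed per agent ID
}

interface SupervisorState {
  phase: Phase;
  tasks: TaskRecord[];
  messages: Message[];
  artifacts: Artifact[];
  tokenBudget: TokenBudget;
  iteration: number;
}

function createInitialState(totalTokens: number): SupervisorState {
  return {
    phase: "decomposing",
    tasks: [],
    messages: [],
    artifacts: [],
    tokenBudget: { total: totalTokens, used: 0, perAgent: {} },
    iteration: 0,
  };
}

// Return a new state with the agent's token usage added (the input is not mutated).
function recordTokenUsage(state: SupervisorState, agentId: string, tokens: number): SupervisorState {
  const previous = state.tokenBudget.perAgent[agentId] ?? 0;
  return {
    ...state,
    tokenBudget: {
      ...state.tokenBudget,
      used: state.tokenBudget.used + tokens,
      perAgent: { ...state.tokenBudget.perAgent, [agentId]: previous + tokens },
    },
  };
}

const freshState = createInitialState(100_000);
const afterResearchCall = recordTokenUsage(freshState, "researcher", 1200);
```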

The StateManager provides context assembly — when an agent starts a task, it receives a curated context including:

The original user query

Results from completed tasks it depends on

Recent relevant messages

The current artifact count and iteration

This is information scoping. The writer doesn't need the full 50-message conversation history — it needs the researcher's findings, the analyst's conclusions, and the original query. The StateManager filters appropriately.
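A scoped context could be assembled along these lines. This is a sketch under stated assumptions: the text does not say how the StateManager decides which messages are "recent" and "relevant", so here relevance means the message was sent to or from the task's assigned agent, and recency means the last `maxMessages` of those. All type and function names are hypothetical.

```typescript
interface ContextTask { id: string; assignedAgent: string; dependencies: string[]; result?: string; }
interface ContextMessage { from: string; to: string; content: string; }
interface ContextState {
  query: string; // original user query
  tasks: ContextTask[];
  messages: ContextMessage[];
  artifacts: string[];
  iteration: number;
}
interface AgentContext {
  query: string;
  dependencyResults: Record<string, string>; // task ID -> result
  recentMessages: ContextMessage[];
  artifactCount: number;
  iteration: number;
}

function assembleContext(state: ContextState, task: ContextTask, maxMessages = 5): AgentContext {
  // Only results from completed tasks this task depends on.
  const dependencyResults: Record<string, string> = {};
  for (const other of state.tasks) {
    if (task.dependencies.includes(other.id) && other.result !== undefined) {
      dependencyResults[other.id] = other.result;
    }
  }
  // Assumed relevance filter: messages involving the assigned agent.
  const relevant = state.messages.filter(
    (m) => m.from === task.assignedAgent || m.to === task.assignedAgent,
  );
  return {
    query: state.query,
    dependencyResults,
    recentMessages: maxMessages > 0 ? relevant.slice(-maxMessages) : [],
    artifactCount: state.artifacts.length,
    iteration: state.iteration,
  };
}

// Usage: the writer sees the research result, but not unrelated messages.
const writerTask: ContextTask = { id: "draft", assignedAgent: "writer", dependencies: ["research"] };
const scopedState: ContextState = {
  query: "Draft a response strategy memo",
  tasks: [
    { id: "research", assignedAgent: "researcher", dependencies: [], result: "findings" },
    writerTask,
  ],
  messages: [
    { from: "supervisor", to: "researcher", content: "start research" },
    { from: "supervisor", to: "writer", content: "draft the memo" },
  ],
  artifacts: ["research-notes"],
  iteration: 0,
};
const writerContext = assembleContext(scopedState, writerTask);
```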

Phase Machine

The orchestration pipeline is a state machine:

decomposing → routing → executing → reviewing → assembling → complete

Each phase has clear entry and exit criteria:

Decomposing exits when all tasks are created with dependencies

Routing exits when all tasks have assigned agents

Executing exits when all tasks are completed or failed

Reviewing exits when quality checks pass (or max iterations hit)

Assembling exits when the final output is produced

Complete is the terminal state

The phase machine prevents the orchestrator from getting stuck. If executing takes too long, the maxIterations cap forces a transition to reviewing. If reviewing fails quality checks, the system can loop back to executing for refinement — but only up to maxIterations times.
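The transitions can be sketched as a pure function from the current phase and a few signals to the next phase. The signal names are hypothetical, and the sketch assumes a single `iteration` counter that the caller increments each time the pipeline loops from reviewing back to executing; the text does not say exactly how iterations are counted.

```typescript
type PipelinePhase = "decomposing" | "routing" | "executing" | "reviewing" | "assembling" | "complete";

// Hypothetical signals corresponding to the exit criteria listed above.
interface PhaseSignals {
  tasksCreated: boolean; // all tasks created with dependencies
  allAssigned: boolean; // every task has an agent
  allSettled: boolean; // every task completed or failed
  qualityPassed: boolean; // review checks passed
  outputProduced: boolean; // final output assembled
  iteration: number;
  maxIterations: number;
}

function nextPhase(phase: PipelinePhase, s: PhaseSignals): PipelinePhase {
  const capReached = s.iteration >= s.maxIterations;
  switch (phase) {
    case "decomposing":
      return s.tasksCreated ? "routing" : "decomposing";
    case "routing":
      return s.allAssigned ? "executing" : "routing";
    case "executing":
      // The cap forces a move to reviewing even if tasks are still open.
      return s.allSettled || capReached ? "reviewing" : "executing";
    case "reviewing":
      // Failed checks loop back to executing, but only while under the cap.
      if (s.qualityPassed || capReached) return "assembling";
      return "executing";
    case "assembling":
      return s.outputProduced ? "complete" : "assembling";
    case "complete":
      return "complete"; // terminal state
  }
}

const baseSignals: PhaseSignals = {
  tasksCreated: true, allAssigned: true, allSettled: true,
  qualityPassed: false, outputProduced: false, iteration: 1, maxIterations: 3,
};

const retry = nextPhase("reviewing", baseSignals); // "executing"
const forced = nextPhase("reviewing", { ...baseSignals, iteration: 3 }); // "assembling"
```

Keeping the transition function free of side effects makes each exit criterion easy to test in isolation.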

Why Not Just Chain Prompts?

The obvious alternative to a supervisor is prompt chaining: call the researcher, pass its output to the analyst, pass both to the writer. Simple, linear, no routing needed.

Prompt chaining fails for three reasons:

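No parallelism. The analyst and coder could run simultaneously, but a chain forces sequential execution. In the example above, a chain runs all four tasks back to back, while the DAG's longest path is only three tasks long; the gap grows as more independent tasks are added.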

No recovery. If the analyst fails in a chain, the writer gets nothing. The supervisor can retry the analyst, route to a fallback, or skip the analysis and let the writer work with what's available.

No adaptation. Chains are static — always the same sequence. The supervisor adapts: simple queries might skip analysis entirely, while complex queries add extra review rounds. The task DAG is dynamic.
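To make the parallelism point concrete, the sketch below groups a DAG into "waves" of tasks whose dependencies are already satisfied and runs each wave with `Promise.all`. Wave-based scheduling is a simplification chosen for brevity, not necessarily how the course's orchestrator works; `planWaves`, `runWaves`, and the task IDs are hypothetical names.

```typescript
interface DagTask {
  id: string;
  dependencies: string[];
}

// Group tasks into waves: every task in a wave depends only on earlier waves.
function planWaves(tasks: DagTask[]): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = tasks.slice();
  while (remaining.length > 0) {
    const wave = remaining.filter((t) => t.dependencies.every((d) => done.has(d)));
    if (wave.length === 0) throw new Error("cycle or missing dependency in task graph");
    for (const t of wave) done.add(t.id);
    waves.push(wave.map((t) => t.id));
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves;
}

// Run each wave concurrently; `execute` stands in for calling an agent.
async function runWaves(
  tasks: DagTask[],
  execute: (taskId: string) => Promise<string>,
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  for (const wave of planWaves(tasks)) {
    await Promise.all(
      wave.map(async (id) => {
        results[id] = await execute(id);
      }),
    );
  }
  return results;
}

const memoDag: DagTask[] = [
  { id: "research", dependencies: [] },
  { id: "analysis", dependencies: ["research"] },
  { id: "implementation", dependencies: ["research"] },
  { id: "draft", dependencies: ["research", "analysis", "implementation"] },
];

const memoWaves = planWaves(memoDag);
// [["research"], ["analysis", "implementation"], ["draft"]]
```

A chain would need four sequential steps for these tasks; the wave plan needs three, because analysis and implementation share a wave. A fuller scheduler might start each task as soon as its own dependencies finish, and would wrap `execute` with the retry and fallback logic described above.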

The supervisor pattern costs more engineering upfront, but it's the pattern that scales from 2 agents to 20 agents without architectural changes.

This is chapter 2 of Multi-Agent Orchestration.
