
Consensus & Handoff

When Agents Disagree and When They Pass the Baton

The Agreement Problem

Four agents working on related tasks will inevitably produce conflicting outputs. The researcher finds NovaTech's market share at 24.5%. The analyst models it at 19.3%. The writer cites the researcher's number. The analyst objects. Which is right? Who decides?

Without a consensus mechanism, the last agent to write wins — or worse, both numbers end up in the final deliverable, confusing the reader. Consensus protocols make disagreements explicit, structured, and resolvable.

Multi-Agent Voting

Voting is the simplest consensus mechanism. After an agent produces output, other agents review it and vote: approve or reject, with a confidence score and reasoning.

Three voting strategies serve different use cases:

Majority Vote

More approvals than rejections wins. Fast, cheap, and appropriate for most tasks. A research report approved by 2 of 3 reviewers is probably good enough.

Unanimous Vote

Every voter must approve. No abstentions allowed. Use for high-stakes outputs — customer-facing documents, financial projections, anything where a single dissenting voice should block publication. Unanimous voting is expensive (every agent must review) but provides the strongest quality guarantee.

Weighted Vote

Each vote is scaled by the voter's confidence. A researcher's 90%-confident approval outweighs a coder's 40%-confident rejection. This respects domain expertise: the researcher's opinion on data quality matters more than the coder's.

// Weighted voting example
const votes: Vote[] = [
  { agentRole: "researcher", approve: true, confidence: 0.9, reasoning: "Sources properly cited" },
  { agentRole: "analyst", approve: false, confidence: 0.4, reasoning: "Market size seems high" },
  { agentRole: "coder", approve: true, confidence: 0.7, reasoning: "Code examples are valid" },
];

// Weighted: (0.9 + 0.7) approve = 1.6 vs (0.4) reject = 0.4 → approved

The average confidence across all votes is your quality signal. A vote that passes with 0.9 average confidence is a strong approval. A vote that passes with 0.5 average confidence is borderline — flag it for human review.
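
To make that concrete, here's a minimal sketch covering all three strategies plus the average-confidence signal, building on a Vote shape like the one above. The Strategy type and the function names are illustrative assumptions, not a fixed API:

// A sketch of the three voting strategies and the quality signal.
interface Vote {
  agentRole: string;
  approve: boolean;
  confidence: number; // 0-1
  reasoning: string;
}

type Strategy = "majority" | "unanimous" | "weighted";

function passes(votes: Vote[], strategy: Strategy): boolean {
  const approvals = votes.filter((v) => v.approve);
  const rejections = votes.filter((v) => !v.approve);
  const weight = (vs: Vote[]) => vs.reduce((sum, v) => sum + v.confidence, 0);
  switch (strategy) {
    case "majority": // more approvals than rejections wins
      return approvals.length > rejections.length;
    case "unanimous": // every voter must approve
      return approvals.length === votes.length;
    case "weighted": // confidence-scaled approvals vs rejections
      return weight(approvals) > weight(rejections);
  }
}

// Quality signal: mean confidence across all votes.
function averageConfidence(votes: Vote[]): number {
  return votes.reduce((sum, v) => sum + v.confidence, 0) / votes.length;
}

For the three votes above, passes(votes, "weighted") returns true (1.6 vs 0.4), and averageConfidence(votes) is about 0.67, a solid but not overwhelming approval.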

Quality Gates

Before expensive agent-to-agent voting, automated quality checks catch structural problems:

Check            Applies To          Rule
Length           All                 Output > 100 characters
Structure        Reports, Memos      Must include section headers (`##`)
Citations        Reports, Analysis   Must reference data sources
Findings         Reports, Analysis   Must include explicit findings/recommendations
Error handling   Code                Must include try/catch or error patterns

Quality gates have type-specific thresholds. Code needs to pass at 0.8 (broken code is immediately visible). Research needs 0.7 (incomplete research is still partially useful). The default is 0.65 — it catches catastrophic failures without blocking everything.

The quality checker returns three outputs: a score (0-1), issues (must fix), and suggestions (nice to have). Issues reduce the score significantly; suggestions are gentle nudges. The orchestrator uses this to decide: retry the task (score below threshold), proceed with warnings (score above threshold but has suggestions), or proceed clean (high score, no issues).
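
A sketch of that decision logic, assuming the score/issues/suggestions shape described above; the QualityResult type, the gate function, and its names are illustrative:

// Type-specific thresholds mirror the values in the text.
interface QualityResult {
  score: number;         // 0-1
  issues: string[];      // must fix
  suggestions: string[]; // nice to have
}

type GateDecision = "retry" | "proceed_with_warnings" | "proceed_clean";

const THRESHOLDS: Record<string, number> = {
  code: 0.8,     // broken code is immediately visible
  research: 0.7, // incomplete research is still partially useful
  default: 0.65, // catches catastrophic failures without blocking everything
};

function gate(result: QualityResult, taskType: string): GateDecision {
  const threshold = THRESHOLDS[taskType] ?? THRESHOLDS.default;
  if (result.score < threshold) return "retry";
  if (result.suggestions.length > 0) return "proceed_with_warnings";
  return "proceed_clean";
}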

Agent Handoff Protocol

Handoffs happen when an agent realizes mid-task that another agent should take over. This is normal — routing isn't perfect, and tasks evolve as agents work on them.

A handoff carries five things (a type sketch follows the list):

  • From/To agents — Who's handing off, who's receiving
  • Task ID — Which task is being transferred
  • Reason — Why the handoff is happening ("Requires quantitative modeling beyond my expertise")
  • Context — What the originating agent learned (partial findings, observations, dead ends)
  • Artifacts — Any outputs already produced (the receiving agent builds on these)
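
As a rough sketch, that payload might look like the type below; the field names are illustrative assumptions, not a fixed schema:

// Illustrative handoff payload; field names are assumptions.
interface Handoff {
  id: string;          // doubles as the idempotency token (see next section)
  fromAgent: string;   // who's handing off
  toAgent: string;     // who's receiving
  taskId: string;      // which task is being transferred
  reason: string;      // e.g. "Requires quantitative modeling beyond my expertise"
  context: string;     // partial findings, observations, dead ends
  artifacts: string[]; // outputs already produced, for the receiver to build on
}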
Idempotency Tokens

The handoff protocol includes idempotency tokens — unique identifiers that prevent duplicate processing. If the supervisor retries a handoff (network error, timeout), the token ensures the receiving agent doesn't process the same handoff twice.

This is the same pattern payment systems use to prevent double-charges:

// First attempt
handoff.acceptHandoff("ho-123"); // → true (processed)

// Retry (same handoff)
handoff.acceptHandoff("ho-123"); // → false (duplicate detected)

Without idempotency, a retried handoff could cause the analyst to run the same analysis twice, wasting tokens and potentially producing inconsistent results if the data changed between runs.
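
A minimal sketch of the duplicate check behind acceptHandoff, assuming an in-memory token set; a production system would persist processed tokens:

// Tracks processed handoff tokens so retries become no-ops.
class HandoffReceiver {
  private processed = new Set<string>();

  acceptHandoff(token: string): boolean {
    if (this.processed.has(token)) {
      return false; // duplicate detected, skip processing
    }
    this.processed.add(token);
    // ...process the handoff exactly once...
    return true;
  }
}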

Handoff Rejection

The receiving agent can reject a handoff. "I can't handle this — I don't have the web_search tool needed for external research." The rejection reason is logged, and the supervisor tries routing to another agent.

Handoffs are the escape valve for bad routing. No router is perfect, and handoffs let agents self-correct without failing the task entirely.
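
One way to implement that check is to compare the task's required tools against the receiving agent's toolset; the shapes here are illustrative assumptions:

// Capability-based rejection: reject if any required tool is missing.
interface HandoffDecision {
  accepted: boolean;
  reason?: string; // logged so the supervisor can reroute
}

function checkCapability(
  agentTools: Set<string>,
  requiredTools: string[],
): HandoffDecision {
  const missing = requiredTools.filter((tool) => !agentTools.has(tool));
  if (missing.length > 0) {
    return { accepted: false, reason: `Missing tools: ${missing.join(", ")}` };
  }
  return { accepted: true };
}

// checkCapability(new Set(["code_exec"]), ["web_search"])
// → { accepted: false, reason: "Missing tools: web_search" }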

Conflict Resolution

When agents genuinely disagree — not quality issues, but different conclusions — you need resolution strategies:

Supervisor Decides

The supervisor picks the position with the highest confidence. Fast and cheap. Risk: the supervisor might miss nuances that the lower-confidence agent caught.

Agent Debate

Agents present positions and counter-arguments in structured rounds. Produces the best outcomes but costs multiple API calls per round. Reserve for high-stakes conflicts.

Weighted Merge

Both positions are preserved, weighted by confidence. "The researcher (0.8) found 24.5% market share using Q4 reports. The analyst (0.65) modeled 19.3% using a different methodology." This gives the reader transparency about uncertainty.

Escalate to Human

When agent confidence is low across the board, punt to a human reviewer. This is the honest answer: sometimes AI agents genuinely don't know, and pretending otherwise is worse than asking for help.

The pattern is progressive escalation: supervisor_decides (cheap) → weighted_merge (transparent) → escalate_to_human (safe). Start cheap, escalate only when needed. Most conflicts resolve at the first level.
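
A sketch of that escalation ladder, assuming each position carries a confidence score; the Position shape and the thresholds are illustrative assumptions:

// Progressive escalation: cheap first, human last.
interface Position {
  agentRole: string;
  claim: string;
  confidence: number; // 0-1
}

type Resolution =
  | { strategy: "supervisor_decides"; chosen: Position }
  | { strategy: "weighted_merge"; merged: string }
  | { strategy: "escalate_to_human" };

function resolveConflict(a: Position, b: Position): Resolution {
  const gap = Math.abs(a.confidence - b.confidence);
  const maxConf = Math.max(a.confidence, b.confidence);

  // Clear winner: the supervisor picks the higher-confidence position.
  if (gap >= 0.3) {
    return {
      strategy: "supervisor_decides",
      chosen: a.confidence > b.confidence ? a : b,
    };
  }
  // Close and reasonably confident: preserve both, weighted.
  if (maxConf >= 0.5) {
    return {
      strategy: "weighted_merge",
      merged: `${a.agentRole} (${a.confidence}): ${a.claim} / ` +
        `${b.agentRole} (${b.confidence}): ${b.claim}`,
    };
  }
  // Low confidence across the board: punt to a human reviewer.
  return { strategy: "escalate_to_human" };
}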

Integrating Consensus into the Pipeline

The orchestrator's review phase is where consensus happens:

  • Task completes → quality checker runs (automated gate)
  • If quality passes → voting round (2-3 agents review)
  • If vote passes → output moves to assembly
  • If vote fails → task is retried or handed off to a different agent
  • If retry fails → conflict resolver determines outcome

This adds latency but dramatically improves output quality. In production, you'd make this configurable per task type — research gets a quick quality check, customer-facing memos get full voting rounds.
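
Wiring it together, here's a sketch of the review phase; the injected helpers stand in for the components from earlier sections and are assumptions rather than a fixed API:

// Review phase: automated gate, then voting, then retry/conflict fallback.
type ReviewOutcome = "assembled" | "retried" | "resolved_by_conflict_resolver";

async function reviewPhase(
  output: string,
  taskType: string,
  qualityGate: (o: string, t: string) => "retry" | "proceed_with_warnings" | "proceed_clean",
  votingRound: (o: string) => Promise<{ approved: boolean }>,
  retryTask: () => Promise<string | null>,
  resolveConflict: () => Promise<string>,
): Promise<ReviewOutcome> {
  // 1. Cheap automated gate first.
  if (qualityGate(output, taskType) === "retry") {
    return (await retryTask()) ? "retried" : "resolved_by_conflict_resolver";
  }
  // 2. Agent voting round (2-3 reviewers).
  const vote = await votingRound(output);
  if (vote.approved) return "assembled"; // output moves to assembly
  // 3. Vote failed: retry or hand off, then fall back to the resolver.
  if (await retryTask()) return "retried";
  await resolveConflict();
  return "resolved_by_conflict_resolver";
}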

This is chapter 4 of Multi-Agent Orchestration.

Get the full hands-on course — free during early access. Build the complete system. Your projects become your portfolio.
