
Agent Design

Building Specialist Agents That Know Their Limits

Why Specialists Beat Generalists

A single AI agent given the prompt "research competitors, analyze the market, and write a strategy memo" will produce mediocre results in all three areas. It context-switches between research and writing, loses track of data sources, and produces analysis that's neither thorough nor well-presented.

Four specialist agents — each with a focused system prompt, relevant tools, and a narrow scope — produce dramatically better results. The researcher knows how to find and cite sources. The writer knows how to structure a memo for executives. The analyst knows how to benchmark data and flag assumptions. The coder knows how to build tools and validate implementations.

The principle: In multi-agent systems, specialization is a feature, not a limitation. Each agent should do one thing well and explicitly declare what it cannot do.

Anatomy of a Specialist Agent

Every agent in a multi-agent system has four components:

| Component | Purpose | Example (Researcher) |
| --- | --- | --- |
| System Prompt | Defines expertise and rules | "You are a research specialist. Always cite sources. Flag low-confidence findings." |
| Capabilities | Declares what it can do | Market Research, Competitive Analysis, Data Gathering |
| Tools | What it has access to | web_search, document_reader |
| Self-Assessment | Can it handle a given task? | canHandle() returns a confidence score |

The system prompt is the most important piece. A vague prompt ("You are a helpful agent") produces vague results. A specific prompt with rules, output format, and explicit limitations produces consistent, high-quality work.

const researcherPrompt = `You are a research specialist.

Rules:
- Always cite sources with document IDs
- Distinguish facts (from data) from inferences (your analysis)
- Flag low-confidence findings explicitly
- Structure as: Key Findings → Evidence → Gaps
- Never fabricate data — if you can't find it, say so`;

Capability Declaration

Capabilities tell the supervisor what the agent can do and what tools it needs for each capability. This is the contract the router reads when deciding where to send a task.

capabilities: [
  {
    name: "Market Research",
    description: "Analyze market reports and industry trends",
    tools: ["document_reader", "web_search"]
  },
  {
    name: "Competitive Analysis",
    description: "Research competitor products and positioning",
    tools: ["document_reader", "web_search"]
  }
]

When the supervisor needs to route "analyze NovaTech's pricing strategy," it scans capability descriptions. "Competitive Analysis" matches. The researcher gets the task.
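A minimal sketch of this kind of routing, using keyword overlap between the task and the declared capability descriptions (the Capability interface mirrors the declaration above; matchCapability is a hypothetical helper, not the course's implementation):

```typescript
interface Capability {
  name: string;
  description: string;
  tools: string[];
}

// Hypothetical router step: score each capability by how many task words
// appear in its name or description, then return the best match.
function matchCapability(task: string, capabilities: Capability[]): Capability | undefined {
  const words = task.toLowerCase().split(/\W+/).filter(w => w.length > 3);
  let best: { cap: Capability; score: number } | undefined;
  for (const cap of capabilities) {
    const text = `${cap.name} ${cap.description}`.toLowerCase();
    const score = words.filter(w => text.includes(w)).length;
    if (score > 0 && (!best || score > best.score)) best = { cap, score };
  }
  return best?.cap;
}
```

Production routers usually let the model judge the match rather than relying on string overlap, but the contract is the same: the router reads only the capability declarations, never the agent internals.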

Self-Assessment: canHandle()

Not every task is a clear match. "Build a pricing comparison chart" involves both analysis (pricing data) and visualization (chart). The analyst and the coder could both argue they should handle it.

Self-assessment resolves this. Each agent evaluates a task and returns a confidence score:

canHandle(task: Task): { capable: boolean; confidence: number; reason: string } {
  // Score based on capability matching, tool availability, and keyword relevance
  return { capable: true, confidence: 0.75, reason: "Chart generation matches my capabilities" };
}

The supervisor asks every agent for its assessment, then picks the highest-scoring one. This is dynamic routing: no hardcoded rules decide which agent handles which task type.
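The selection step can be sketched like this (the assessment shape follows canHandle() above; the AssessableAgent interface and route() function are assumptions for illustration):

```typescript
interface Assessment { capable: boolean; confidence: number; reason: string }

interface AssessableAgent {
  name: string;
  canHandle(task: string): Assessment;
}

// Hypothetical supervisor step: poll every agent, keep the most confident
// one that declares itself capable.
function route(task: string, agents: AssessableAgent[]): AssessableAgent | undefined {
  let best: { agent: AssessableAgent; confidence: number } | undefined;
  for (const agent of agents) {
    const assessment = agent.canHandle(task);
    if (assessment.capable && (!best || assessment.confidence > best.confidence)) {
      best = { agent, confidence: assessment.confidence };
    }
  }
  return best?.agent;
}
```

Ties and low scores are worth handling explicitly in practice: if no agent clears a confidence threshold, the supervisor can decompose the task instead of forcing a bad match.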

The Base Agent Pattern

An abstract base class enforces consistency across all agents:

  • process(task, context) — Every agent must implement this. Takes a task and returns a typed result with artifacts.
  • createArtifact() — Shared utility that produces consistently-shaped artifacts with the agent's role, timestamp, and version tracking.
  • buildPrompt() — Assembles the system prompt + task context into a ready-to-send prompt.

Agents are stateless: they receive everything they need via parameters and keep no internal memory between calls. This makes them safe to run in parallel, easy to test (replay the same inputs, get the same output), and simple to replace (swap one researcher implementation for another without touching the supervisor).
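Put together, a base class along these lines enforces the pattern (the exact types and field names are assumptions; only the three methods come from the list above):

```typescript
interface Task { id: string; description: string }
interface Artifact { type: string; content: string; producedBy: string; createdAt: string; version: number }

abstract class BaseAgent {
  constructor(readonly role: string, protected systemPrompt: string) {}

  // Every concrete agent implements this; stateless, so everything arrives via parameters.
  abstract process(task: Task, context: Artifact[]): Promise<Artifact[]>;

  // Shared utility: consistently shaped artifacts with role, timestamp, and version.
  protected createArtifact(type: string, content: string, version = 1): Artifact {
    return { type, content, producedBy: this.role, createdAt: new Date().toISOString(), version };
  }

  // Assembles the system prompt plus task context into a ready-to-send prompt.
  protected buildPrompt(task: Task, context: Artifact[]): string {
    const ctx = context.map(a => `[${a.type} v${a.version} from ${a.producedBy}]\n${a.content}`).join("\n\n");
    return `${this.systemPrompt}\n\nTask: ${task.description}\n\nContext:\n${ctx}`;
  }
}
```

Because process() receives the full context each call, replaying a run is just replaying its inputs.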

Artifact Contracts

Every agent produces typed artifacts: report, memo, analysis, code, chart, or summary. The type is an enum, not a free-form string. This means the assembler always knows what it's working with.

Artifacts have versions. When the analyst reviews the writer's memo and the writer revises, you get version 1 and version 2. The artifact store tracks this automatically — the latest version is what gets assembled, but all versions are preserved for audit.

The key rule: agents produce artifacts, not plain text. A researcher that returns a string is just a chatbot. A researcher that returns a typed report artifact with metadata (sources consulted, confidence level, gaps identified) is a reliable component of a larger system.

This is chapter 1 of Multi-Agent Orchestration.

Get the full hands-on course — free during early access. Build the complete system. Your projects become your portfolio.
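A sketch of the artifact side of this contract: a closed set of types plus a store that keeps every version (the ArtifactStore API is an assumption; the six type names come from the text above):

```typescript
type ArtifactType = "report" | "memo" | "analysis" | "code" | "chart" | "summary";

interface Artifact {
  id: string;          // stable across revisions
  type: ArtifactType;  // closed enum, not a free-form string
  content: string;
  version: number;
}

// Hypothetical store: the latest version is what gets assembled,
// but every version is preserved for audit.
class ArtifactStore {
  private versions = new Map<string, Artifact[]>();

  save(id: string, type: ArtifactType, content: string): Artifact {
    const history = this.versions.get(id) ?? [];
    const artifact: Artifact = { id, type, content, version: history.length + 1 };
    history.push(artifact);
    this.versions.set(id, history);
    return artifact;
  }

  latest(id: string): Artifact | undefined {
    return this.versions.get(id)?.at(-1);
  }

  history(id: string): Artifact[] {
    return this.versions.get(id) ?? [];
  }
}
```

Because ArtifactType is a union rather than string, passing a typo like "repot" fails at compile time, which is exactly the point of the contract.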
