
Agent Orchestration

The Plan-Execute-Observe Loop

From Chatbots to Agents

A chatbot takes a message and returns a response. An agent takes a *goal* and figures out the steps. The difference is a loop:

  • Plan — The LLM decides what tool to call (or whether to respond directly)
  • Execute — The system runs the chosen tool with the LLM's parameters
  • Observe — The tool result is fed back to the LLM as context
  • Repeat — The LLM plans again with the new information, until the goal is met

This loop is the core of every agent framework — LangChain, CrewAI, AutoGen, Claude's tool use. The frameworks differ in how they structure the loop, but the pattern is universal.

    User: "Find the Globex account and prep me for tomorrow's renewal call."
    
      ┌─────────────────────────────────────┐
      │ PLAN: I need to find the account    │
      │ → Call: search_crm_contacts         │
      └──────────────┬──────────────────────┘
                     │
      ┌──────────────▼──────────────────────┐
      │ EXECUTE: search_crm_contacts({      │
      │   company: "Globex" })              │
      └──────────────┬──────────────────────┘
                     │
      ┌──────────────▼──────────────────────┐
      │ OBSERVE: Found 3 contacts,          │
      │ main contact: Jane Smith            │
      └──────────────┬──────────────────────┘
                     │
      ┌──────────────▼──────────────────────┐
      │ PLAN: Now I need deal history       │
      │ → Call: get_deal_history            │
      └──────────────┬──────────────────────┘
                     │
      ┌──────────────▼──────────────────────┐
      │ EXECUTE: get_deal_history({         │
      │   contact: "Jane Smith" })          │
      └──────────────┬──────────────────────┘
                     │
      ┌──────────────▼──────────────────────┐
      │ OBSERVE: $240K ARR, renewal in 2d,  │
      │ open support ticket re: latency     │
      └──────────────┬──────────────────────┘
                     │
      ┌──────────────▼──────────────────────┐
      │ PLAN: I have enough context.        │
      │ → Generate call prep brief          │
      └─────────────────────────────────────┘
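
In code, the loop above can be sketched in a few lines. Here `fakeModel` and the `tools` map are stand-ins for a real LLM call and tool registry, scripted so the example is self-contained:

```typescript
// Minimal plan-execute-observe loop. `fakeModel` and `tools` are
// illustrative stand-ins, not a real model or registry API.
type ToolCall = { name: string; args: Record<string, string> };
type ModelOutput = { toolCall?: ToolCall; answer?: string };

const tools: Record<string, (args: Record<string, string>) => string> = {
  search_crm_contacts: () => "Found 3 contacts, main contact: Jane Smith",
};

// Scripted model: plans a tool call first, answers once it has an observation
function fakeModel(history: string[]): ModelOutput {
  if (!history.some((m) => m.startsWith("OBSERVE:"))) {
    return { toolCall: { name: "search_crm_contacts", args: { company: "Globex" } } };
  }
  return { answer: "Prep brief for the Globex renewal call..." };
}

function runAgent(goal: string, maxIterations = 10): string {
  const history = [`USER: ${goal}`];
  for (let i = 0; i < maxIterations; i++) {
    const output = fakeModel(history);          // PLAN
    if (output.answer) return output.answer;    // goal met: exit the loop
    const tc = output.toolCall!;
    const result = tools[tc.name](tc.args);     // EXECUTE
    history.push(`OBSERVE: ${result}`);         // OBSERVE, then repeat
  }
  return "Reached iteration limit without an answer.";
}
```

A real implementation replaces `fakeModel` with a chat-model call that returns tool-call requests, which is exactly what LangGraph structures below.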

    LangGraph.js Fundamentals

    LangGraph.js models agents as state graphs. Instead of a linear chain of prompts, you define nodes (functions that transform state) and edges (transitions between them). This gives you explicit control over the agent's flow.

    Core Concepts

    Concept          | What It Is                                                     | Example
    -----------------+----------------------------------------------------------------+---------
    State            | A typed object that flows through the graph                    | `{ messages: Message[], toolResults: ToolResult[] }`
    Node             | A function that reads state, does work, returns updated state  | `callModel`, `executeTool`, `checkApproval`
    Edge             | A transition from one node to another                          | `callModel → executeTool`
    Conditional Edge | A transition that depends on state                             | If tool call requested → `executeTool`; else → `respond`

    import { StateGraph, MessagesAnnotation } from "@langchain/langgraph";
    import { ToolMessage } from "@langchain/core/messages";
    
    // Define the state shape (MessagesAnnotation provides a `messages` array
    // with an append reducer)
    const AgentState = MessagesAnnotation;
    
    // Define nodes: `model` is your tool-enabled chat model and `registry`
    // is your tool registry, both defined elsewhere
    async function callModel(state: typeof AgentState.State) {
      const response = await model.invoke(state.messages);
      return { messages: [response] };
    }
    
    async function executeTool(state: typeof AgentState.State) {
      const lastMessage = state.messages[state.messages.length - 1];
      const toolCall = lastMessage.tool_calls[0];
      const result = await registry.execute(toolCall.name, toolCall.args);
      return {
        messages: [
          new ToolMessage({
            content: JSON.stringify(result),
            tool_call_id: toolCall.id,
          }),
        ],
      };
    }
    
    // Build the graph
    const graph = new StateGraph(AgentState)
      .addNode("agent", callModel)
      .addNode("tools", executeTool)
      .addEdge("__start__", "agent")
      .addConditionalEdges("agent", (state) => {
        const last = state.messages[state.messages.length - 1];
        return last.tool_calls?.length ? "tools" : "__end__";
      })
      .addEdge("tools", "agent")
      .compile();

    State Management

    State is the memory of your agent. Every node reads it, transforms it, and passes it forward. LangGraph uses immutable state updates — nodes return patches, not mutations:

    // Good: return a patch
    return { messages: [newMessage] }; // appended to existing messages
    
    // Bad: mutate in place
    state.messages.push(newMessage); // breaks checkpointing and replay

    Why immutable? Three reasons:

  • Debugging — You can inspect the state at every step. "What did the agent know when it made that decision?"
  • Checkpointing — Save and restore agent state mid-conversation. Critical for human-in-the-loop approval gates.
  • Replay — Re-run a failed conversation from any checkpoint with different tools or prompts.
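
    To see why patches matter for checkpointing and replay, here is a toy sketch. It is not LangGraph's actual checkpointer API; the names are illustrative. Because each node returns a patch rather than mutating state, every intermediate state can be snapshotted and later replayed with a different continuation:

```typescript
// Toy checkpointer: immutable patches make snapshots and replay trivial.
// All names here are illustrative, not LangGraph's actual API.
type State = { messages: string[] };

const checkpoints: State[] = [];

function applyPatch(state: State, patch: Partial<State>): State {
  // Immutable update: build a new state, never mutate in place
  return { messages: [...state.messages, ...(patch.messages ?? [])] };
}

function runStep(state: State, patch: Partial<State>): State {
  const next = applyPatch(state, patch);
  checkpoints.push(next); // snapshot after every node
  return next;
}

let state: State = { messages: ["user: find Globex"] };
state = runStep(state, { messages: ["assistant: calling search_crm_contacts"] });
state = runStep(state, { messages: ["tool: found Jane Smith"] });

// Replay from the first checkpoint with a different continuation
const replayed = applyPatch(checkpoints[0], { messages: ["tool: fallback result"] });
```

    Had `runStep` mutated `state.messages` in place, every saved checkpoint would silently point at the same array, and replay would be impossible.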

    Parallel Execution

    Some tool calls are independent. If the agent needs both a contact's deal history and their support tickets, it doesn't need to wait for one to finish before starting the other. LangGraph supports fan-out for parallel execution:

                    ┌──────────────────┐
                    │    callModel     │
                    │  (requests 2     │
                    │   tool calls)    │
                    └────────┬─────────┘
                    ┌────────┴─────────┐
            ┌───────▼──────┐   ┌───────▼───────┐
            │ get_deal     │   │ get_tickets   │
            │ _history     │   │               │
            └───────┬──────┘   └───────┬───────┘
                    └────────┬─────────┘
                    ┌────────▼─────────┐
                    │    callModel     │
                    │  (both results   │
                    │   in context)    │
                    └──────────────────┘

    async function executeTools(state: typeof AgentState.State) {
      const lastMessage = state.messages[state.messages.length - 1];
      const toolCalls = lastMessage.tool_calls ?? [];
    
      // Execute all tool calls in parallel
      const results = await Promise.all(
        toolCalls.map(async (tc) => {
          const result = await registry.execute(tc.name, tc.args);
          return new ToolMessage({
            content: JSON.stringify(result),
            tool_call_id: tc.id,
          });
        })
      );
    
      return { messages: results };
    }
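
    One caveat: `Promise.all` rejects as soon as any tool call fails, discarding results that already succeeded. A sketch using `Promise.allSettled` so partial failures still reach the model as structured errors (the helper and shapes here are illustrative, not part of the registry API above):

```typescript
// Promise.allSettled keeps surviving results when one tool call fails,
// returning a structured error payload for the failed call instead.
type ToolCall = { name: string; args: unknown };

async function executeToolsSettled(
  toolCalls: ToolCall[],
  execute: (name: string, args: unknown) => Promise<unknown>
): Promise<string[]> {
  const settled = await Promise.allSettled(
    toolCalls.map((tc) => execute(tc.name, tc.args))
  );
  return settled.map((s, i) =>
    s.status === "fulfilled"
      ? JSON.stringify(s.value)
      : JSON.stringify({ success: false, error: String(s.reason), tool: toolCalls[i].name })
  );
}
```

    The model then sees one successful observation and one error observation, and can decide whether to retry, degrade, or tell the user.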

    Error Recovery

    Agents fail. Tools time out, APIs return errors, the LLM misunderstands context. Your orchestration layer needs a strategy:

    Retry with Backoff

    For transient failures (network timeouts, rate limits), retry the same tool call with exponential backoff. Cap at 3 retries to avoid infinite loops.
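
    A minimal sketch of that policy; the `sleep` parameter is injectable so the delay can be skipped in tests, and the wrapper is generic over any async call (wrapping something like `registry.execute` is the intended use):

```typescript
// Retry a flaky async call with exponential backoff, capped at 3 retries.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      await sleep(baseDelayMs * 2 ** attempt); // 500ms, 1s, 2s, ...
    }
  }
  throw lastError; // exhausted retries: surface the last failure
}
```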

    Fallback Tools

    If the primary CRM search fails, fall back to a cached local index. The results might be stale, but they're better than nothing. Register fallback tools and route to them on primary failure.
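
    The routing itself can be as small as this sketch (tool names and shapes are illustrative); the `degraded` flag matters so the model can tell the user the data may be stale:

```typescript
// Route to a fallback tool when the primary fails.
type Tool = (args: unknown) => Promise<unknown>;

async function withFallback(
  primary: Tool,
  fallback: Tool,
  args: unknown
): Promise<{ result: unknown; degraded: boolean }> {
  try {
    return { result: await primary(args), degraded: false };
  } catch {
    // Stale cache beats no answer, but flag it so the model can say so
    return { result: await fallback(args), degraded: true };
  }
}
```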

    Graceful Degradation

    If a tool is completely unavailable, tell the model. Return a structured error so the LLM can inform the user and adjust its plan:

    return {
      success: false,
      error: "service_unavailable",
      message: "CRM search is currently down. I can still help with information from our knowledge base.",
      available_alternatives: ["search_knowledge_base", "search_cached_contacts"],
    };

    Loop Detection

    Set a maximum number of iterations (typically 10-15). If the agent keeps calling tools without converging on an answer, break the loop and ask the user for clarification. This prevents runaway API costs and infinite cycles.
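
    A plain sketch of the cap, independent of any framework; `step` stands in for one full plan-execute-observe cycle:

```typescript
// Cap agent iterations and bail out with a clarification request
// instead of looping forever.
function runWithCap(
  step: (i: number) => { done: boolean; answer?: string },
  maxIterations = 12
): string {
  for (let i = 0; i < maxIterations; i++) {
    const { done, answer } = step(i);
    if (done) return answer!;
  }
  // Never converged: stop burning tokens and ask the user
  return "I wasn't able to converge on an answer. Could you clarify the goal?";
}
```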

    Putting It Together

    In the capstone, your Sales Companion agent will use this exact pattern: a LangGraph state graph with tool nodes for CRM search, document retrieval, email drafting, and meeting prep. The graph structure makes it easy to add new capabilities later — just register a tool and add a node.

    The key insight: the graph is the product. Users don't see nodes and edges, but they experience the difference between an agent that flails and one that methodically works through a problem. Good orchestration is invisible. Bad orchestration is immediately obvious.

    This is chapter 2 of Production AI Agents.

    Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.

    View course details