
AI Gateway

Route & Guard

Why Not Just Call the LLM Directly?

In a prototype, you send the user's question straight to an LLM with your retrieved context. In production, that's a recipe for:

  • Runaway costs — one user hammering the API burns through your budget
  • Data leaks — PII or sensitive data flowing into prompts unfiltered
  • Inconsistent quality — simple questions and complex queries get the same treatment
  • No observability — you can't debug what you can't see

    The AI Gateway is the control plane between your users and the LLM. It classifies, routes, guards, caches, and tracks every request.

    Key Concepts

    Why LangGraph.js?

    Traditional if/else chains work for simple routing, but AI gateways have conditional, branching logic that's hard to express linearly:

  • If the query is a greeting → skip retrieval, respond directly
  • If the query mentions a competitor → add competitor context + guardrail check
  • If the query is complex → use a stronger (more expensive) model
  • If the response mentions pricing → check against approved pricing docs

    LangGraph.js models this as a state graph — each step is a node, conditions are edges. The graph is:

  • Visual — you can draw and debug the flow
  • Composable — add a new node without rewriting the pipeline
  • Traceable — every node execution is logged with inputs/outputs

    State Graph Architecture

                    ┌──→ Cache Hit ──→ Return Cached ──→ Track
                    │
    Query ──→ Classify ──→ Route ──→ Guardrails ──→ LLM ──→ Format ──→ Track
                    │         │           │
                    │    simple/complex   PII check
                    │    greeting/search  input validation
                    │                    output check
                    └──→ Greeting ──→ Direct Response ──→ Track

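    To make the flow above concrete, here is a minimal hand-rolled sketch of the same pattern — nodes as functions over shared state, plus one conditional edge after classification. This is illustration only, not the LangGraph.js API: the library provides this plumbing (plus persistence and tracing) for you, and all names here are hypothetical.

```typescript
// Hand-rolled state-graph sketch (illustrative; LangGraph.js does this for you).
type State = { query: string; queryType?: string; response?: string };

type GraphNode = (s: State) => State;

const nodes: Record<string, GraphNode> = {
  // Classify: a toy rule standing in for the real classifier node.
  classify: (s) => ({
    ...s,
    queryType: /^(hi|hello|hey)\b/i.test(s.query) ? "greeting" : "search",
  }),
  greeting: (s) => ({ ...s, response: "Hello! How can I help?" }),
  search: (s) => ({ ...s, response: `Searching for: ${s.query}` }),
  track: (s) => s, // placeholder: log cost/latency here
};

// Conditional edge: route based on the state the classify node just wrote.
function route(s: State): string {
  return s.queryType === "greeting" ? "greeting" : "search";
}

function run(query: string): State {
  let state: State = { query };
  state = nodes.classify(state);
  state = nodes[route(state)](state); // the branch in the diagram
  state = nodes.track(state);
  return state;
}
```

    Adding a new branch (say, a cache-hit path) means adding one node and one condition — the existing nodes stay untouched, which is the composability point made above.
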
    GatewayState

    The state object flows through every node, accumulating data:

    interface GatewayState {
      query: string;
      queryType: "greeting" | "simple" | "complex" | "sensitive";
      context: RetrievedChunk[];
      response: string;
      model: string;
      cached: boolean;
      cost: { inputTokens: number; outputTokens: number; usd: number };
      userId: string;
      guardrailFlags: string[];
    }

    Each node reads what it needs, writes what it produces. The graph framework handles the plumbing.
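
    As a sketch of that plumbing: a node can return only the fields it produces, and the framework merges the partial update into the shared state. The node and helper names below are illustrative, not the LangGraph.js API, and the word-count heuristic is a stand-in for a real classifier.

```typescript
// A node returns a partial update; the framework merges it into GatewayState.
interface GatewayState {
  query: string;
  queryType?: "greeting" | "simple" | "complex" | "sensitive";
  model?: string;
  guardrailFlags: string[];
}

function classifyNode(state: GatewayState): Partial<GatewayState> {
  // Toy heuristic: long queries get the stronger (more expensive) tier.
  const isComplex = state.query.split(/\s+/).length > 12;
  return {
    queryType: isComplex ? "complex" : "simple",
    model: isComplex ? "large-model" : "small-model", // tier labels, not real model ids
  };
}

// The merge step the framework performs under the hood:
function applyUpdate(state: GatewayState, update: Partial<GatewayState>): GatewayState {
  return { ...state, ...update };
}
```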

    Query Classification

    The classify node analyzes the incoming query and determines:

  • Type — greeting, simple factual, complex analytical, sensitive
  • Required context — which data sources to search
  • Model tier — fast/cheap for simple, powerful for complex

    This is often done with a small, fast LLM call or even a rule-based classifier for common patterns.
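
    A rule-based first pass might look like the sketch below — the patterns are illustrative, and anything the rules can't label falls through (here, `null`) to the small LLM classifier.

```typescript
// Rule-based classification for common patterns; unknowns defer to an LLM.
type QueryType = "greeting" | "simple" | "complex" | "sensitive";

const GREETING = /^(hi|hello|hey|thanks|thank you)\b/i;
const SENSITIVE = /\b(salary|ssn|social security|credit card)\b/i;

function classifyByRules(query: string): QueryType | null {
  const q = query.trim();
  if (GREETING.test(q)) return "greeting";
  if (SENSITIVE.test(q)) return "sensitive";
  // Heuristic: long or analytical queries route to the stronger model tier.
  if (q.length > 120 || /\b(compare|analyze|why|forecast)\b/i.test(q)) {
    return "complex";
  }
  if (/^(what|who|when|where)\b/i.test(q)) return "simple";
  return null; // unknown — hand off to the LLM classifier
}
```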

    Guardrails

    Guardrails are graph nodes that inspect inputs and outputs:

    Input guardrails:

  • PII detection — block or redact social security numbers, credit cards, personal emails
  • Prompt injection detection — catch attempts to override system instructions
  • Topic boundaries — reject queries outside the system's domain

    Output guardrails:

  • Hallucination check — does the response cite sources from the retrieved context?
  • Sensitive data check — is the response leaking internal pricing, roadmap, or HR data?
  • Format validation — does the response match the expected structure?
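
    An input guardrail node for PII detection can be as simple as the sketch below: redact matches before they reach the prompt and record a flag so the track node can log the event. The regexes are rough illustrations — production systems typically use a dedicated PII detection library.

```typescript
// PII redaction sketch: patterns are illustrative, not exhaustive.
const PII_PATTERNS: Array<[name: string, re: RegExp]> = [
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/g],
  ["credit_card", /\b(?:\d[ -]?){13,16}\b/g],
  ["email", /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g],
];

function redactPII(text: string): { text: string; flags: string[] } {
  const flags: string[] = [];
  let out = text;
  for (const [name, re] of PII_PATTERNS) {
    // match() (not test()) so the global regex's lastIndex is reset each call.
    if (out.match(re)) {
      flags.push(`pii:${name}`);
      out = out.replace(re, "[REDACTED]");
    }
  }
  return { text: out, flags };
}
```

    Whether to redact-and-continue or block the request outright is a policy decision; the flags let you enforce either downstream.
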

    Semantic Caching

    If someone asks "What's Acme's deal status?" and another rep asked the same thing 5 minutes ago, why run the full pipeline again? Semantic caching matches queries by meaning (not exact string) and returns cached responses when similarity exceeds a threshold.
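
    The core mechanics fit in a few lines: embed each query (the embedding call itself is omitted here), then compare embeddings by cosine similarity against cached entries. The 0.92 threshold and 5-minute TTL are illustrative values you would tune.

```typescript
// Semantic cache sketch: match cached queries by embedding similarity.
type CacheEntry = { embedding: number[]; response: string; at: number };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: CacheEntry[] = [];
  constructor(private threshold = 0.92, private ttlMs = 5 * 60_000) {}

  get(embedding: number[], now = Date.now()): string | null {
    for (const e of this.entries) {
      // A hit must be both fresh and semantically close enough.
      if (now - e.at < this.ttlMs && cosine(embedding, e.embedding) >= this.threshold) {
        return e.response;
      }
    }
    return null;
  }

  set(embedding: number[], response: string, now = Date.now()): void {
    this.entries.push({ embedding, response, at: now });
  }
}
```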

    Cost Controls

    Per-user budgets prevent runaway spending:

  • Token caps — max tokens per request and per day per user
  • Model routing — simple queries use cheap models, complex queries use expensive ones
  • Rate limiting — max requests per minute per user
  • Budget alerts — notify admins when spending approaches limits
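
    The first two token caps can be enforced with a small in-memory tracker like the sketch below (limits are illustrative; a real deployment would persist usage and reset it daily).

```typescript
// Per-user token budget sketch: per-request and per-day caps.
interface Budget { maxTokensPerRequest: number; maxTokensPerDay: number }

class BudgetTracker {
  private usedToday = new Map<string, number>();
  constructor(private budget: Budget) {}

  check(userId: string, requestedTokens: number): { allowed: boolean; reason?: string } {
    if (requestedTokens > this.budget.maxTokensPerRequest) {
      return { allowed: false, reason: "request exceeds per-request cap" };
    }
    const used = this.usedToday.get(userId) ?? 0;
    if (used + requestedTokens > this.budget.maxTokensPerDay) {
      return { allowed: false, reason: "daily budget exhausted" };
    }
    return { allowed: true };
  }

  // Call after the LLM responds, with actual token usage.
  record(userId: string, tokens: number): void {
    this.usedToday.set(userId, (this.usedToday.get(userId) ?? 0) + tokens);
  }
}
```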

    What You'll Build

  • LangGraph.js state graph with classify → route → guardrails → LLM → format → track
  • Conditional edges based on query complexity
  • PII detection and input validation as graph nodes
  • Per-user cost tracking and budget enforcement

    Glossary

  • State graph — A directed graph where data flows through nodes via edges
  • LangGraph.js — Framework for building stateful AI workflows as graphs
  • Node — A processing step in the graph (classify, route, guard, etc.)
  • Conditional edge — An edge that routes to different nodes based on state
  • PII — Personally Identifiable Information (SSN, emails, phone numbers)
  • Prompt injection — Malicious input trying to override system instructions
  • Semantic cache — Cache that matches by meaning similarity, not exact strings
  • Token budget — Maximum tokens a user can consume per time period

    This is chapter 4 of AI Sales Companion.

    Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.

    View course details