
AI Gateway

Route & Guard

Why Not Just Call the LLM Directly?

In a prototype, you send the user's question straight to an LLM with your retrieved context. In production, that's a recipe for:

  • Runaway costs — one user hammering the API burns through your budget
  • Data leaks — PII or sensitive data flowing into prompts unfiltered
  • Inconsistent quality — simple questions and complex queries get the same treatment
  • No observability — you can't debug what you can't see

    The AI Gateway is the control plane between your users and the LLM. It classifies, routes, guards, caches, and tracks every request.

    Key Concepts

    Why LangGraph.js?

    Traditional if/else chains work for simple routing, but AI gateways have conditional, branching logic that's hard to express linearly:

  • If the query is a greeting → skip retrieval, respond directly
  • If the query mentions a competitor → add competitor context + guardrail check
  • If the query is complex → use a stronger (more expensive) model
  • If the response mentions pricing → check against approved pricing docs

    LangGraph.js models this as a state graph — each step is a node, conditions are edges. The graph is:

  • Visual — you can draw and debug the flow
  • Composable — add a new node without rewriting the pipeline
  • Traceable — every node execution is logged with inputs/outputs

    State Graph Architecture

                    ┌──→ Cache Hit ──→ Return Cached ──→ Track
                    │
    Query ──→ Classify ──→ Route ──→ Guardrails ──→ LLM ──→ Format ──→ Track
                    │         │           │
                    │    simple/complex   PII check
                    │    greeting/search  input validation
                    │                    output check
                    └──→ Greeting ──→ Direct Response ──→ Track

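    To make the flow above concrete, here is a minimal hand-rolled sketch of the same pattern — nodes as functions over shared state, plus one conditional edge after classification. This is illustration only, not the LangGraph.js API: the library provides this plumbing (plus persistence and tracing) for you, and all names here are hypothetical.

```typescript
// Hand-rolled state-graph sketch (illustrative; LangGraph.js does this for you).
type State = { query: string; queryType?: string; response?: string };

type GraphNode = (s: State) => State;

const nodes: Record<string, GraphNode> = {
  // Classify: a toy rule standing in for the real classifier node.
  classify: (s) => ({
    ...s,
    queryType: /^(hi|hello|hey)\b/i.test(s.query) ? "greeting" : "search",
  }),
  greeting: (s) => ({ ...s, response: "Hello! How can I help?" }),
  search: (s) => ({ ...s, response: `Searching for: ${s.query}` }),
  track: (s) => s, // placeholder: log cost/latency here
};

// Conditional edge: route based on the state the classify node just wrote.
function route(s: State): string {
  return s.queryType === "greeting" ? "greeting" : "search";
}

function run(query: string): State {
  let state: State = { query };
  state = nodes.classify(state);
  state = nodes[route(state)](state); // the branch in the diagram
  state = nodes.track(state);
  return state;
}
```

    Adding a new branch (say, a cache-hit path) means adding one node and one condition — the existing nodes stay untouched, which is the composability point made above.
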
    GatewayState

    The state object flows through every node, accumulating data:

    interface GatewayState {
      query: string;
      queryType: "greeting" | "simple" | "complex" | "sensitive";
      context: RetrievedChunk[];
      response: string;
      model: string;
      cached: boolean;
      cost: { inputTokens: number; outputTokens: number; usd: number };
      userId: string;
      guardrailFlags: string[];
    }

    Each node reads what it needs, writes what it produces. The graph framework handles the plumbing.
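
    As a sketch of that plumbing: a node can return only the fields it produces, and the framework merges the partial update into the shared state. The node and helper names below are illustrative, not the LangGraph.js API, and the word-count heuristic is a stand-in for a real classifier.

```typescript
// A node returns a partial update; the framework merges it into GatewayState.
interface GatewayState {
  query: string;
  queryType?: "greeting" | "simple" | "complex" | "sensitive";
  model?: string;
  guardrailFlags: string[];
}

function classifyNode(state: GatewayState): Partial<GatewayState> {
  // Toy heuristic: long queries get the stronger (more expensive) tier.
  const isComplex = state.query.split(/\s+/).length > 12;
  return {
    queryType: isComplex ? "complex" : "simple",
    model: isComplex ? "large-model" : "small-model", // tier labels, not real model ids
  };
}

// The merge step the framework performs under the hood:
function applyUpdate(state: GatewayState, update: Partial<GatewayState>): GatewayState {
  return { ...state, ...update };
}
```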

    Query Classification

    The classify node analyzes the incoming query and determines:

  • Type — greeting, simple factual, complex analytical, sensitive
  • Required context — which data sources to search
  • Model tier — fast/cheap for simple, powerful for complex

    This is often done with a small, fast LLM call or even a rule-based classifier for common patterns.
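
    A rule-based first pass might look like the sketch below — the patterns are illustrative, and anything the rules can't label falls through (here, `null`) to the small LLM classifier.

```typescript
// Rule-based classification for common patterns; unknowns defer to an LLM.
type QueryType = "greeting" | "simple" | "complex" | "sensitive";

const GREETING = /^(hi|hello|hey|thanks|thank you)\b/i;
const SENSITIVE = /\b(salary|ssn|social security|credit card)\b/i;

function classifyByRules(query: string): QueryType | null {
  const q = query.trim();
  if (GREETING.test(q)) return "greeting";
  if (SENSITIVE.test(q)) return "sensitive";
  // Heuristic: long or analytical queries route to the stronger model tier.
  if (q.length > 120 || /\b(compare|analyze|why|forecast)\b/i.test(q)) {
    return "complex";
  }
  if (/^(what|who|when|where)\b/i.test(q)) return "simple";
  return null; // unknown — hand off to the LLM classifier
}
```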

    Guardrails

    Guardrails are graph nodes that inspect inputs and outputs:

    Input guardrails:

  • PII detection — block or redact social security numbers, credit cards, personal emails
  • Prompt injection detection — catch attempts to override system instructions
  • Topic boundaries — reject queries outside the system's domain

    Output guardrails:

  • Hallucination check — does the response cite sources from the retrieved context?
  • Sensitive data check — is the response leaking internal pricing, roadmap, or HR data?
  • Format validation — does the response match the expected structure?
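
    An input guardrail node for PII detection can be as simple as the sketch below: redact matches before they reach the prompt and record a flag so the track node can log the event. The regexes are rough illustrations — production systems typically use a dedicated PII detection library.

```typescript
// PII redaction sketch: patterns are illustrative, not exhaustive.
const PII_PATTERNS: Array<[name: string, re: RegExp]> = [
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/g],
  ["credit_card", /\b(?:\d[ -]?){13,16}\b/g],
  ["email", /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g],
];

function redactPII(text: string): { text: string; flags: string[] } {
  const flags: string[] = [];
  let out = text;
  for (const [name, re] of PII_PATTERNS) {
    // match() (not test()) so the global regex's lastIndex is reset each call.
    if (out.match(re)) {
      flags.push(`pii:${name}`);
      out = out.replace(re, "[REDACTED]");
    }
  }
  return { text: out, flags };
}
```

    Whether to redact-and-continue or block the request outright is a policy decision; the flags let you enforce either downstream.
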

    Semantic Caching

    If someone asks "What's Acme's deal status?" and another rep asked the same thing 5 minutes ago, why run the full pipeline again? Semantic caching matches queries by meaning (not exact string) and returns cached responses when similarity exceeds a threshold.
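
    The core mechanics fit in a few lines: embed each query (the embedding call itself is omitted here), then compare embeddings by cosine similarity against cached entries. The 0.92 threshold and 5-minute TTL are illustrative values you would tune.

```typescript
// Semantic cache sketch: match cached queries by embedding similarity.
type CacheEntry = { embedding: number[]; response: string; at: number };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: CacheEntry[] = [];
  constructor(private threshold = 0.92, private ttlMs = 5 * 60_000) {}

  get(embedding: number[], now = Date.now()): string | null {
    for (const e of this.entries) {
      // A hit must be both fresh and semantically close enough.
      if (now - e.at < this.ttlMs && cosine(embedding, e.embedding) >= this.threshold) {
        return e.response;
      }
    }
    return null;
  }

  set(embedding: number[], response: string, now = Date.now()): void {
    this.entries.push({ embedding, response, at: now });
  }
}
```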

    Cost Controls

    Per-user budgets prevent runaway spending:

  • Token caps — max tokens per request and per day per user
  • Model routing — simple queries use cheap models, complex queries use expensive ones
  • Rate limiting — max requests per minute per user
  • Budget alerts — notify admins when spending approaches limits
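
    The first two token caps can be enforced with a small in-memory tracker like the sketch below (limits are illustrative; a real deployment would persist usage and reset it daily).

```typescript
// Per-user token budget sketch: per-request and per-day caps.
interface Budget { maxTokensPerRequest: number; maxTokensPerDay: number }

class BudgetTracker {
  private usedToday = new Map<string, number>();
  constructor(private budget: Budget) {}

  check(userId: string, requestedTokens: number): { allowed: boolean; reason?: string } {
    if (requestedTokens > this.budget.maxTokensPerRequest) {
      return { allowed: false, reason: "request exceeds per-request cap" };
    }
    const used = this.usedToday.get(userId) ?? 0;
    if (used + requestedTokens > this.budget.maxTokensPerDay) {
      return { allowed: false, reason: "daily budget exhausted" };
    }
    return { allowed: true };
  }

  // Call after the LLM responds, with actual token usage.
  record(userId: string, tokens: number): void {
    this.usedToday.set(userId, (this.usedToday.get(userId) ?? 0) + tokens);
  }
}
```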

    What You'll Build

  • LangGraph.js state graph with classify → route → guardrails → LLM → format → track
  • Conditional edges based on query complexity
  • PII detection and input validation as graph nodes
  • Per-user cost tracking and budget enforcement

    Glossary

  • State graph — A directed graph where data flows through nodes via edges
  • LangGraph.js — Framework for building stateful AI workflows as graphs
  • Node — A processing step in the graph (classify, route, guard, etc.)
  • Conditional edge — An edge that routes to different nodes based on state
  • PII — Personally Identifiable Information (SSN, emails, phone numbers)
  • Prompt injection — Malicious input trying to override system instructions
  • Semantic cache — Cache that matches by meaning similarity, not exact strings
  • Token budget — Maximum tokens a user can consume per time period

    This is chapter 4 of AI Sales Companion.

    Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.

    View course details