
AI Gateway

Guardrails, Routing & LangGraph.js

Why an AI Gateway?

In Module 3, you built a retrieval system that finds the right HR documents. But between the retrieval system and the employee's screen sits a critical layer: the AI Gateway. This is the control plane that decides:

  • What kind of question is this? (classification)
  • Should we answer it at all? (guardrails)
  • Which model should handle it? (routing)
  • Have we answered this before? (caching)
  • How much is this costing us? (usage tracking)

For a generic chatbot, you might skip most of this. For an HR assistant, every one of these is mandatory. The gateway is what makes the system trustworthy enough for employees to rely on.

    HR-Specific Guardrails

    This is where an HR assistant fundamentally differs from other RAG systems. The guardrails aren't optional — they're the reason the system can be deployed at all.

    No Salary Disclosure

    An employee asks: "How much does Sarah in Engineering make?" The system has access to compensation data (or could infer it from org chart seniority). Without guardrails, it might answer. With them:

    → Detect: query references specific employee compensation
    → Block: "I can help with compensation bands and structures.
       For specific salary information, please contact your
       HR Business Partner."

    This isn't a bug — it's the system correctly protecting confidential information.
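The detect → block shape above can be sketched as a guardrail check. The patterns here are illustrative assumptions — a production system would use an LLM classifier or named-entity recognition rather than keyword heuristics:

```typescript
// Hypothetical salary-disclosure guardrail. Assumption: a query is suspect
// when it combines a compensation term with a named person mid-sentence.
const COMP_TERMS = /\b(salary|compensation|pay|paid|make|makes|earn|earns)\b/i;
// A capitalized word after a lowercase word, e.g. "does Sarah" — crude stand-in
// for real person detection.
const NAMED_PERSON_MID = /[a-z]\s+[A-Z][a-z]+/;

const SALARY_BLOCK_MESSAGE =
  "I can help with compensation bands and structures. For specific salary " +
  "information, please contact your HR Business Partner.";

function checkSalaryGuardrail(query: string): { blocked: boolean; message?: string } {
  if (COMP_TERMS.test(query) && NAMED_PERSON_MID.test(query)) {
    return { blocked: true, message: SALARY_BLOCK_MESSAGE };
  }
  return { blocked: false };
}
```

Note the block message still offers a path forward (bands and structures) rather than a flat refusal.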

    No Legal Advice

    An employee asks: "Can they fire me for taking FMLA leave? That's illegal, right?" The system has the FMLA policy and could explain protections. But interpreting whether a specific situation constitutes illegal retaliation is legal advice.

    → Detect: query asks for legal interpretation
    → Redirect: "Our FMLA policy protects eligible employees from
       retaliation for taking qualified leave. For guidance on your
       specific situation, please consult with Legal or contact
       the ethics hotline at 1-800-555-0188."

    The system shares the policy but draws the line at legal interpretation.
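The same detect → redirect shape works for legal interpretation. The signal list below is an assumption; real deployments would tune it against logged queries:

```typescript
// Hypothetical legal-advice detector: the policy itself is still shared,
// but these signals trigger the redirect message instead of interpretation.
const LEGAL_SIGNALS = /\b(illegal|sue|lawsuit|lawyer|retaliation|against the law|my rights)\b/i;

function needsLegalRedirect(query: string): boolean {
  return LEGAL_SIGNALS.test(query);
}
```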

    Confidentiality Enforcement

    Not all employees should see all data. The gateway enforces role-based access:

    User Role   Access Level
    Employee    Public + Internal (handbook, policies, benefits, org chart)
    Manager     Above + team PTO balances, team org data
    HR Team     All data including confidential (all PTO, all org)
    HR Admin    All data including restricted (compensation, investigations)

    This is enforced at the gateway level, not the UI level. Even if someone crafts a clever prompt, the gateway filters results before they reach the LLM.
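A minimal sketch of that gateway-level filter, assuming each indexed document carries a sensitivity label (the label names and clearance ladder here are assumptions):

```typescript
type Role = "employee" | "manager" | "hr" | "hr_admin";
type Label = "public" | "internal" | "team" | "confidential" | "restricted";

// Clearance ladder: a role sees its level and everything below it.
const CLEARANCE: Record<Role, number> = { employee: 1, manager: 2, hr: 3, hr_admin: 4 };
const LABEL_LEVEL: Record<Label, number> = {
  public: 1, internal: 1, team: 2, confidential: 3, restricted: 4,
};

interface SearchResult { docId: string; label: Label; text: string; }

// Runs on retrieval results BEFORE they are placed in the LLM prompt,
// so a clever prompt cannot surface documents the role cannot see.
function filterByRole(results: SearchResult[], role: Role): SearchResult[] {
  return results.filter(r => LABEL_LEVEL[r.label] <= CLEARANCE[role]);
}
```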

    PII Detection

    Queries containing Social Security numbers, bank account numbers, or other PII are intercepted. The PII is masked before the query proceeds to the LLM, and the employee is warned about sharing sensitive information in chat.
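A regex-based masking sketch. Production systems typically use a dedicated PII detector; the patterns below (SSN format, 9-17 digit account numbers) are illustrative assumptions:

```typescript
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;
const ACCOUNT = /\b\d{9,17}\b/g; // US account numbers commonly run 9-17 digits

// Masks PII in place and reports whether anything was found,
// so the gateway can also warn the employee.
function maskPII(query: string): { masked: string; found: boolean } {
  let found = false;
  const masked = query
    .replace(SSN, () => { found = true; return "[SSN REDACTED]"; })
    .replace(ACCOUNT, () => { found = true; return "[ACCOUNT REDACTED]"; });
  return { masked, found };
}
```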

    LangGraph.js Architecture

    We implement the gateway as a LangGraph.js state graph — a directed graph where each node is a function and edges are conditional transitions.

    Why LangGraph?

    Instead of nested if/else chains that become unmaintainable:

    // BAD: Spaghetti orchestration
    if (isCached(query)) return cached;
    if (hasPII(query)) query = maskPII(query);
    if (isSensitive(query)) { /* special handling */ }
    if (isSimple(query)) model = "haiku";
    // ... 200 more lines of branching logic

    LangGraph makes the flow visible and composable:

    classify → cache_check → [hit] → return
                           → [miss] → guardrails → route → llm → format → track

    Each node is a pure function. Edges are conditional. You can trace exactly which path a query took. Adding a new step (say, sentiment analysis) means adding one node and two edges — nothing else changes.
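The node/edge idea can be sketched in plain TypeScript. This is not the LangGraph.js API (which provides `StateGraph`, `addNode`, and conditional edges); it is a hand-rolled runner that only illustrates the shape — pure node functions, edge functions that pick the next node, and a loop that walks the graph:

```typescript
interface State { query: string; cached?: string; response?: string; }

type NodeFn = (s: State) => State;     // each node: pure function over state
type EdgeFn = (s: State) => string;    // each edge: name of the next node, or "END"

const nodes: Record<string, NodeFn> = {
  classify: s => s, // a real classify node would set category/complexity
  cache_check: s =>
    s.query === "What's the 401k match?" ? { ...s, cached: "We match 4%." } : s,
  llm: s => ({ ...s, response: `LLM answer for: ${s.query}` }),
  format: s => ({ ...s, response: s.cached ?? s.response }),
};

const edges: Record<string, EdgeFn> = {
  classify: () => "cache_check",
  cache_check: s => (s.cached ? "format" : "llm"), // [hit] path skips the LLM
  llm: () => "format",
  format: () => "END",
};

function run(start: string, state: State): State {
  let current = start;
  while (current !== "END") {
    state = nodes[current](state);
    current = edges[current](state);
  }
  return state;
}
```

Adding a node here means one entry in `nodes` and adjusting the relevant edges — the same property the real library gives you, with tracing and typed state built in.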

    The State Object

    Every node reads and updates a typed state:

    interface GatewayState {
      query: string;
      category: "policy" | "benefits" | "org" | "leave" | "compliance" | "general";
      complexity: "simple" | "moderate" | "complex";
      sensitivity: "normal" | "sensitive" | "restricted";
      userRole: "employee" | "manager" | "hr" | "hr_admin";
      retrievedContext: SearchResult[];
      cachedResponse?: string;
      guardrailFlags: string[];
      selectedModel: string;
      response: string;
      citations: Citation[];
      confidence: "high" | "medium" | "low";
      tokensUsed: number;
      latencyMs: number;
    }

    This state is the single source of truth. The classify node sets category and complexity. The guardrails node may add to guardrailFlags. The route node reads complexity and sensitivity to set selectedModel. Every decision is traceable.
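As an example of one such transition, a route node might look like this — reading `complexity` and `sensitivity`, writing `selectedModel` (the model names and the sensitivity override are assumptions for illustration):

```typescript
type GatewayStateSlice = {
  complexity: "simple" | "moderate" | "complex";
  sensitivity: "normal" | "sensitive" | "restricted";
  selectedModel?: string;
};

function routeNode(state: GatewayStateSlice): GatewayStateSlice {
  // Assumed policy: sensitive or restricted queries always get the most
  // capable model, regardless of complexity.
  if (state.sensitivity !== "normal") return { ...state, selectedModel: "opus" };
  const byComplexity = { simple: "haiku", moderate: "sonnet", complex: "opus" } as const;
  return { ...state, selectedModel: byComplexity[state.complexity] };
}
```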

    Query Classification

    The classify node categorizes each query:

    Category     Example Query                       Routing Implication
    Policy       "What's the remote work policy?"    Search policies + handbook
    Benefits     "What's the 401k match?"            Search benefits guide
    Leave        "How much PTO do I have?"           Search PTO + leave policies
    Org          "Who reports to David?"             Search org chart
    Compliance   "How do I report harassment?"       Search policies + flag as sensitive
    General      "When is open enrollment?"          Search all sources

    Classification determines which sources to search, which model to use, and which guardrails to apply.
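A keyword-heuristic sketch of the classify node, using the categories above. The rules are illustrative assumptions; a production gateway would typically use a small LLM call or a trained classifier instead:

```typescript
type Category = "policy" | "benefits" | "org" | "leave" | "compliance" | "general";

// First matching rule wins; compliance is checked first so sensitive
// queries are never misfiled.
const RULES: [RegExp, Category][] = [
  [/\b(harassment|ethics|discriminat)/i, "compliance"],
  [/\b(pto|vacation|leave|fmla|sick)\b/i, "leave"],
  [/\b(401k|insurance|benefit|dental)\b/i, "benefits"],
  [/\b(report(s)? to|manager|org chart)\b/i, "org"],
  [/\b(policy|handbook|remote work)\b/i, "policy"],
];

function classify(query: string): Category {
  for (const [pattern, category] of RULES) {
    if (pattern.test(query)) return category;
  }
  return "general";
}
```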

    Model Routing

    Not every query needs the same model. Simple lookups are fast and cheap; complex compliance questions need the best available model.

    Complexity   Model    Cost             Use Case
    Simple       Haiku    $0.25/M tokens   "Who's my manager?"
    Moderate     Sonnet   $3/M tokens      "Explain our PTO carryover rules"
    Complex      Opus     $15/M tokens     "Compare our CA vs TX leave provisions"

    This routing typically saves 60-70% on token costs compared to using the most capable model for every query.
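The savings figure can be checked with the prices from the table. The traffic mix below is an assumption chosen for illustration:

```typescript
type Complexity = "simple" | "moderate" | "complex";

const MODEL_FOR: Record<Complexity, { model: string; usdPerMTokens: number }> = {
  simple:   { model: "haiku",  usdPerMTokens: 0.25 },
  moderate: { model: "sonnet", usdPerMTokens: 3 },
  complex:  { model: "opus",   usdPerMTokens: 15 },
};

// Blended $/M tokens for a given traffic mix (fractions summing to 1).
function blendedCost(mix: Record<Complexity, number>): number {
  return (Object.keys(mix) as Complexity[])
    .reduce((sum, c) => sum + mix[c] * MODEL_FOR[c].usdPerMTokens, 0);
}

// Assumed mix 30% simple / 40% moderate / 30% complex:
// 0.3*0.25 + 0.4*3 + 0.3*15 = 5.775 $/M, vs 15 $/M for Opus-only —
// roughly 62% saved, consistent with the 60-70% range.
```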

    Semantic Caching

    Policy questions are highly repetitive. "What's the 401k match?" gets asked hundreds of times with slight variations. Semantic caching detects these:

  • Exact match — hash the normalized query, check the cache
  • Semantic match — embed the query, check cosine similarity against cached queries (threshold > 0.95)
  • Category-aware TTL — policies rarely change (24h cache), PTO balances change daily (1h cache)

    Cache hit rates of 30-50% are common for HR systems, which dramatically reduces cost and latency.
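The three mechanisms above can be sketched together. The `embedding` vectors here would come from a real embedding model; the TTL values mirror the category-aware rules above, and the rest is assumption-level illustration:

```typescript
interface CacheEntry { embedding: number[]; response: string; expiresAt: number; }

// Category-aware TTL: policies rarely change, PTO balances change daily.
const TTL_MS: Record<string, number> = {
  policy: 24 * 3600 * 1000,
  leave: 1 * 3600 * 1000,
};

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private exact = new Map<string, CacheEntry>();
  private entries: CacheEntry[] = [];

  private normalize(q: string): string {
    return q.toLowerCase().trim().replace(/[?!.\s]+$/, "");
  }

  set(query: string, embedding: number[], response: string, category: string, now: number): void {
    const entry = { embedding, response, expiresAt: now + (TTL_MS[category] ?? 3600 * 1000) };
    this.exact.set(this.normalize(query), entry);
    this.entries.push(entry);
  }

  get(query: string, embedding: number[], now: number): string | undefined {
    const hit = this.exact.get(this.normalize(query));     // 1. exact match
    if (hit && hit.expiresAt > now) return hit.response;
    for (const e of this.entries) {                        // 2. semantic match
      if (e.expiresAt > now && cosine(embedding, e.embedding) > 0.95) return e.response;
    }
    return undefined;
  }
}
```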

    What You'll Build

  • Design and implement the LangGraph state graph
  • Build HR guardrails (salary blocking, legal advice detection, PII masking)
  • Implement query classification and model routing
  • Add semantic caching with category-aware TTL
  • Run the full gateway end-to-end with different query types

    Glossary

    Term               Meaning
    AI Gateway         Control plane between retrieval and LLM
    LangGraph          Graph-based AI orchestration framework
    State graph        Directed graph where nodes are functions and edges are transitions
    Guardrail          Rule that blocks, redirects, or modifies queries/responses
    Model routing      Selecting the right LLM based on query characteristics
    Semantic caching   Caching responses for semantically similar queries
    RBAC               Role-Based Access Control — permissions tied to user roles

    This is chapter 4 of AI HR Assistant.

    Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.

    View course details