
AI Gateway

Guardrails, Routing & LangGraph.js

Why an AI Gateway?

In Module 3, you built a retrieval system that finds the right HR documents. But between the retrieval system and the employee's screen sits a critical layer: the AI Gateway. This is the control plane that decides:

  • What kind of question is this? (classification)
  • Should we answer it at all? (guardrails)
  • Which model should handle it? (routing)
  • Have we answered this before? (caching)
  • How much is this costing us? (usage tracking)

For a generic chatbot, you might skip most of this. For an HR assistant, every one of these is mandatory. The gateway is what makes the system trustworthy enough for employees to rely on.

    HR-Specific Guardrails

    This is where an HR assistant fundamentally differs from other RAG systems. The guardrails aren't optional — they're the reason the system can be deployed at all.

    No Salary Disclosure

    An employee asks: "How much does Sarah in Engineering make?" The system has access to compensation data (or could infer it from org chart seniority). Without guardrails, it might answer. With them:

    → Detect: query references specific employee compensation
    → Block: "I can help with compensation bands and structures.
       For specific salary information, please contact your
       HR Business Partner."

    This isn't a bug — it's the system correctly protecting confidential information.
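The detect → block shape above can be sketched as a guardrail check. The patterns here are illustrative assumptions — a production system would use an LLM classifier or named-entity recognition rather than keyword heuristics:

```typescript
// Hypothetical salary-disclosure guardrail. Assumption: a query is suspect
// when it combines a compensation term with a named person mid-sentence.
const COMP_TERMS = /\b(salary|compensation|pay|paid|make|makes|earn|earns)\b/i;
// A capitalized word after a lowercase word, e.g. "does Sarah" — crude stand-in
// for real person detection.
const NAMED_PERSON_MID = /[a-z]\s+[A-Z][a-z]+/;

const SALARY_BLOCK_MESSAGE =
  "I can help with compensation bands and structures. For specific salary " +
  "information, please contact your HR Business Partner.";

function checkSalaryGuardrail(query: string): { blocked: boolean; message?: string } {
  if (COMP_TERMS.test(query) && NAMED_PERSON_MID.test(query)) {
    return { blocked: true, message: SALARY_BLOCK_MESSAGE };
  }
  return { blocked: false };
}
```

Note the block message still offers a path forward (bands and structures) rather than a flat refusal.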

    No Legal Advice

    An employee asks: "Can they fire me for taking FMLA leave? That's illegal, right?" The system has the FMLA policy and could explain protections. But interpreting whether a specific situation constitutes illegal retaliation is legal advice.

    → Detect: query asks for legal interpretation
    → Redirect: "Our FMLA policy protects eligible employees from
       retaliation for taking qualified leave. For guidance on your
       specific situation, please consult with Legal or contact
       the ethics hotline at 1-800-555-0188."

    The system shares the policy but draws the line at legal interpretation.
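The same detect → redirect shape works for legal interpretation. The signal list below is an assumption; real deployments would tune it against logged queries:

```typescript
// Hypothetical legal-advice detector: the policy itself is still shared,
// but these signals trigger the redirect message instead of interpretation.
const LEGAL_SIGNALS = /\b(illegal|sue|lawsuit|lawyer|retaliation|against the law|my rights)\b/i;

function needsLegalRedirect(query: string): boolean {
  return LEGAL_SIGNALS.test(query);
}
```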

    Confidentiality Enforcement

    Not all employees should see all data. The gateway enforces role-based access:

    User Role   Access Level
    Employee    Public + Internal (handbook, policies, benefits, org chart)
    Manager     Above + team PTO balances, team org data
    HR Team     All data including confidential (all PTO, all org)
    HR Admin    All data including restricted (compensation, investigations)

    This is enforced at the gateway level, not the UI level. Even if someone crafts a clever prompt, the gateway filters results before they reach the LLM.
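A minimal sketch of that gateway-level filter, assuming each indexed document carries a sensitivity label (the label names and clearance ladder here are assumptions):

```typescript
type Role = "employee" | "manager" | "hr" | "hr_admin";
type Label = "public" | "internal" | "team" | "confidential" | "restricted";

// Clearance ladder: a role sees its level and everything below it.
const CLEARANCE: Record<Role, number> = { employee: 1, manager: 2, hr: 3, hr_admin: 4 };
const LABEL_LEVEL: Record<Label, number> = {
  public: 1, internal: 1, team: 2, confidential: 3, restricted: 4,
};

interface SearchResult { docId: string; label: Label; text: string; }

// Runs on retrieval results BEFORE they are placed in the LLM prompt,
// so a clever prompt cannot surface documents the role cannot see.
function filterByRole(results: SearchResult[], role: Role): SearchResult[] {
  return results.filter(r => LABEL_LEVEL[r.label] <= CLEARANCE[role]);
}
```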

    PII Detection

    Queries containing Social Security numbers, bank account numbers, or other PII are intercepted. The PII is masked before the query proceeds to the LLM, and the employee is warned about sharing sensitive information in chat.
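A regex-based masking sketch. Production systems typically use a dedicated PII detector; the patterns below (SSN format, 9-17 digit account numbers) are illustrative assumptions:

```typescript
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;
const ACCOUNT = /\b\d{9,17}\b/g; // US account numbers commonly run 9-17 digits

// Masks PII in place and reports whether anything was found,
// so the gateway can also warn the employee.
function maskPII(query: string): { masked: string; found: boolean } {
  let found = false;
  const masked = query
    .replace(SSN, () => { found = true; return "[SSN REDACTED]"; })
    .replace(ACCOUNT, () => { found = true; return "[ACCOUNT REDACTED]"; });
  return { masked, found };
}
```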

    LangGraph.js Architecture

    We implement the gateway as a LangGraph.js state graph — a directed graph where each node is a function and edges are conditional transitions.

    Why LangGraph?

    Instead of nested if/else chains that become unmaintainable:

    // BAD: Spaghetti orchestration
    if (isCached(query)) return cached;
    if (hasPII(query)) query = maskPII(query);
    if (isSensitive(query)) { /* special handling */ }
    if (isSimple(query)) model = "haiku";
    // ... 200 more lines of branching logic

    LangGraph makes the flow visible and composable:

    classify → cache_check → [hit] → return
                           → [miss] → guardrails → route → llm → format → track

    Each node is a pure function. Edges are conditional. You can trace exactly which path a query took. Adding a new step (say, sentiment analysis) means adding one node and two edges — nothing else changes.
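The node/edge idea can be sketched in plain TypeScript. This is not the LangGraph.js API (which provides `StateGraph`, `addNode`, and conditional edges); it is a hand-rolled runner that only illustrates the shape — pure node functions, edge functions that pick the next node, and a loop that walks the graph:

```typescript
interface State { query: string; cached?: string; response?: string; }

type NodeFn = (s: State) => State;     // each node: pure function over state
type EdgeFn = (s: State) => string;    // each edge: name of the next node, or "END"

const nodes: Record<string, NodeFn> = {
  classify: s => s, // a real classify node would set category/complexity
  cache_check: s =>
    s.query === "What's the 401k match?" ? { ...s, cached: "We match 4%." } : s,
  llm: s => ({ ...s, response: `LLM answer for: ${s.query}` }),
  format: s => ({ ...s, response: s.cached ?? s.response }),
};

const edges: Record<string, EdgeFn> = {
  classify: () => "cache_check",
  cache_check: s => (s.cached ? "format" : "llm"), // [hit] path skips the LLM
  llm: () => "format",
  format: () => "END",
};

function run(start: string, state: State): State {
  let current = start;
  while (current !== "END") {
    state = nodes[current](state);
    current = edges[current](state);
  }
  return state;
}
```

Adding a node here means one entry in `nodes` and adjusting the relevant edges — the same property the real library gives you, with tracing and typed state built in.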

    The State Object

    Every node reads and updates a typed state:

    interface GatewayState {
      query: string;
      category: "policy" | "benefits" | "org" | "leave" | "compliance" | "general";
      complexity: "simple" | "moderate" | "complex";
      sensitivity: "normal" | "sensitive" | "restricted";
      userRole: "employee" | "manager" | "hr" | "hr_admin";
      retrievedContext: SearchResult[];
      cachedResponse?: string;
      guardrailFlags: string[];
      selectedModel: string;
      response: string;
      citations: Citation[];
      confidence: "high" | "medium" | "low";
      tokensUsed: number;
      latencyMs: number;
    }

    This state is the single source of truth. The classify node sets category and complexity. The guardrails node may add to guardrailFlags. The route node reads complexity and sensitivity to set selectedModel. Every decision is traceable.
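As an example of one such transition, a route node might look like this — reading `complexity` and `sensitivity`, writing `selectedModel` (the model names and the sensitivity override are assumptions for illustration):

```typescript
type GatewayStateSlice = {
  complexity: "simple" | "moderate" | "complex";
  sensitivity: "normal" | "sensitive" | "restricted";
  selectedModel?: string;
};

function routeNode(state: GatewayStateSlice): GatewayStateSlice {
  // Assumed policy: sensitive or restricted queries always get the most
  // capable model, regardless of complexity.
  if (state.sensitivity !== "normal") return { ...state, selectedModel: "opus" };
  const byComplexity = { simple: "haiku", moderate: "sonnet", complex: "opus" } as const;
  return { ...state, selectedModel: byComplexity[state.complexity] };
}
```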

    Query Classification

    The classify node categorizes each query:

    Category     Example Query                       Routing Implication
    Policy       "What's the remote work policy?"    Search policies + handbook
    Benefits     "What's the 401k match?"            Search benefits guide
    Leave        "How much PTO do I have?"           Search PTO + leave policies
    Org          "Who reports to David?"             Search org chart
    Compliance   "How do I report harassment?"       Search policies + flag as sensitive
    General      "When is open enrollment?"          Search all sources

    Classification determines which sources to search, which model to use, and which guardrails to apply.
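A keyword-heuristic sketch of the classify node, using the categories above. The rules are illustrative assumptions; a production gateway would typically use a small LLM call or a trained classifier instead:

```typescript
type Category = "policy" | "benefits" | "org" | "leave" | "compliance" | "general";

// First matching rule wins; compliance is checked first so sensitive
// queries are never misfiled.
const RULES: [RegExp, Category][] = [
  [/\b(harassment|ethics|discriminat)/i, "compliance"],
  [/\b(pto|vacation|leave|fmla|sick)\b/i, "leave"],
  [/\b(401k|insurance|benefit|dental)\b/i, "benefits"],
  [/\b(report(s)? to|manager|org chart)\b/i, "org"],
  [/\b(policy|handbook|remote work)\b/i, "policy"],
];

function classify(query: string): Category {
  for (const [pattern, category] of RULES) {
    if (pattern.test(query)) return category;
  }
  return "general";
}
```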

    Model Routing

    Not every query needs the same model. Simple lookups are fast and cheap; complex compliance questions need the best available model.

    Complexity   Model    Cost             Use Case
    Simple       Haiku    $0.25/M tokens   "Who's my manager?"
    Moderate     Sonnet   $3/M tokens      "Explain our PTO carryover rules"
    Complex      Opus     $15/M tokens     "Compare our CA vs TX leave provisions"

    This routing typically saves 60-70% on token costs compared to using the most capable model for every query.
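The savings figure can be checked with the prices from the table. The traffic mix below is an assumption chosen for illustration:

```typescript
type Complexity = "simple" | "moderate" | "complex";

const MODEL_FOR: Record<Complexity, { model: string; usdPerMTokens: number }> = {
  simple:   { model: "haiku",  usdPerMTokens: 0.25 },
  moderate: { model: "sonnet", usdPerMTokens: 3 },
  complex:  { model: "opus",   usdPerMTokens: 15 },
};

// Blended $/M tokens for a given traffic mix (fractions summing to 1).
function blendedCost(mix: Record<Complexity, number>): number {
  return (Object.keys(mix) as Complexity[])
    .reduce((sum, c) => sum + mix[c] * MODEL_FOR[c].usdPerMTokens, 0);
}

// Assumed mix 30% simple / 40% moderate / 30% complex:
// 0.3*0.25 + 0.4*3 + 0.3*15 = 5.775 $/M, vs 15 $/M for Opus-only —
// roughly 62% saved, consistent with the 60-70% range.
```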

    Semantic Caching

    Policy questions are highly repetitive. "What's the 401k match?" gets asked hundreds of times with slight variations. Semantic caching detects these:

  • Exact match — hash the normalized query, check the cache
  • Semantic match — embed the query, check cosine similarity against cached queries (threshold > 0.95)
  • Category-aware TTL — policies rarely change (24h cache), PTO balances change daily (1h cache)

    Cache hit rates of 30-50% are common for HR systems, which dramatically reduces cost and latency.
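The three mechanisms above can be sketched together. The `embedding` vectors here would come from a real embedding model; the TTL values mirror the category-aware rules above, and the rest is assumption-level illustration:

```typescript
interface CacheEntry { embedding: number[]; response: string; expiresAt: number; }

// Category-aware TTL: policies rarely change, PTO balances change daily.
const TTL_MS: Record<string, number> = {
  policy: 24 * 3600 * 1000,
  leave: 1 * 3600 * 1000,
};

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private exact = new Map<string, CacheEntry>();
  private entries: CacheEntry[] = [];

  private normalize(q: string): string {
    return q.toLowerCase().trim().replace(/[?!.\s]+$/, "");
  }

  set(query: string, embedding: number[], response: string, category: string, now: number): void {
    const entry = { embedding, response, expiresAt: now + (TTL_MS[category] ?? 3600 * 1000) };
    this.exact.set(this.normalize(query), entry);
    this.entries.push(entry);
  }

  get(query: string, embedding: number[], now: number): string | undefined {
    const hit = this.exact.get(this.normalize(query));     // 1. exact match
    if (hit && hit.expiresAt > now) return hit.response;
    for (const e of this.entries) {                        // 2. semantic match
      if (e.expiresAt > now && cosine(embedding, e.embedding) > 0.95) return e.response;
    }
    return undefined;
  }
}
```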

    What You'll Build

  • Design and implement the LangGraph state graph
  • Build HR guardrails (salary blocking, legal advice detection, PII masking)
  • Implement query classification and model routing
  • Add semantic caching with category-aware TTL
  • Run the full gateway end-to-end with different query types

    Glossary

    Term               Meaning
    AI Gateway         Control plane between retrieval and LLM
    LangGraph          Graph-based AI orchestration framework
    State graph        Directed graph where nodes are functions and edges are transitions
    Guardrail          Rule that blocks, redirects, or modifies queries/responses
    Model routing      Selecting the right LLM based on query characteristics
    Semantic caching   Caching responses for semantically similar queries
    RBAC               Role-Based Access Control — permissions tied to user roles

    This is chapter 4 of AI HR Assistant.

    Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.

    View course details