
Guardrails & Human-in-the-Loop

Safety as a Feature, Not a Constraint

Why Guardrails Are the Enterprise Differentiator

Every team can wire an LLM to tools. The difference between a demo and a product customers trust is guardrails. Enterprise buyers don't ask "can your agent draft an email?" They ask "what happens when your agent drafts the wrong email?"

Guardrails aren't about limiting your agent. They're about making it safe enough to give it real authority. An agent with strong guardrails can be trusted with higher-stakes actions — sending emails, updating CRM records, creating support tickets. An agent without them stays a toy.

Defense in Depth

No single check catches everything. Layer your defenses:

User Input
    │
    ▼
┌───────────────────┐
│ Input Validation  │  ← Block prompt injection, enforce length limits
└────────┬──────────┘
         ▼
┌───────────────────┐
│ PII Detection     │  ← Flag/redact sensitive data before it reaches the LLM
└────────┬──────────┘
         ▼
┌───────────────────┐
│ LLM Processing    │  ← Model generates tool calls
└────────┬──────────┘
         ▼
┌───────────────────┐
│ Approval Gate     │  ← Human reviews high-risk actions before execution
└────────┬──────────┘
         ▼
┌───────────────────┐
│ Output Validation │  ← Check response for hallucinations, policy violations
└────────┬──────────┘
         ▼
┌───────────────────┐
│ Audit Log         │  ← Record everything for compliance and debugging
└───────────────────┘

Input Validation

The first line of defense. Catch problems before they reach the model:

Prompt Injection Detection

Users (or attackers) may try to override the system prompt: "Ignore all previous instructions and..." Pattern-match against known injection phrasings and flag suspicious inputs:

const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior|above) instructions/i,
  /you are now/i,
  /new system prompt/i,
  /disregard (your|the) (rules|instructions|guidelines)/i,
];

function detectInjection(input: string): boolean {
  return INJECTION_PATTERNS.some((p) => p.test(input));
}

Length Limits

Extremely long inputs can be used for context stuffing attacks or just waste tokens. Set reasonable per-message limits (e.g., 4,000 characters for a sales assistant).

Encoding Attacks

Watch for Unicode tricks, zero-width characters, and Base64-encoded payloads that try to sneak past pattern matching.
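Normalizing input before pattern matching closes some of these gaps. A sketch (the zero-width character list is a common starting set, not exhaustive): NFKC folds Unicode lookalikes to canonical forms, and stripping zero-width characters defeats attempts to split keywords like "ig​nore".

```typescript
// Normalize input before running injection/PII patterns against it.
// NFKC folds compatibility characters (e.g. "ﬁ" → "fi", fullwidth forms);
// the regex strips zero-width space/joiners, word joiner, and BOM that
// attackers insert to break up flagged keywords.
function normalizeInput(input: string): string {
  return input
    .normalize("NFKC")
    .replace(/[\u200B-\u200D\u2060\uFEFF]/g, "");
}
```

Run `detectInjection` and PII checks on the normalized text, not the raw input.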

PII Detection

Your agent handles sales data — names, emails, phone numbers, deal amounts. PII leaking into logs, third-party APIs, or model training is a compliance disaster.

const PII_PATTERNS = {
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
  credit_card: /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/,
  phone: /\b\(\d{3}\)\s?\d{3}-\d{4}\b/,
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}\b/i,
};

function detectPII(text: string): { type: string; match: string }[] {
  const findings: { type: string; match: string }[] = [];
  for (const [type, pattern] of Object.entries(PII_PATTERNS)) {
    // Rebuild from source + flags: new RegExp(pattern, "g") would silently
    // discard the pattern's own flags (e.g. the /i on email)
    const global = new RegExp(pattern.source, pattern.flags + "g");
    const matches = text.match(global);
    if (matches) findings.push(...matches.map((m) => ({ type, match: m })));
  }
  return findings;
}

Rule of thumb: False positives are better than misses. Flag aggressively, then let the approval gate decide. A false positive costs a human 5 seconds of review. A miss costs a compliance investigation.
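Flagging is half the job; the text also needs redacting before it reaches logs or third-party APIs. A sketch that works with a pattern table shaped like PII_PATTERNS above (the `[REDACTED:type]` placeholder format is an illustrative choice):

```typescript
// Replace every PII match with a typed placeholder so downstream
// systems see what kind of data was removed without seeing the data.
function redactPII(text: string, patterns: Record<string, RegExp>): string {
  let redacted = text;
  for (const [type, pattern] of Object.entries(patterns)) {
    // Preserve the pattern's own flags while ensuring global replacement
    const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
    redacted = redacted.replace(new RegExp(pattern.source, flags), `[REDACTED:${type}]`);
  }
  return redacted;
}
```

Redact copies bound for logs and external services; the approval gate can still show the human the original when they need it to make a decision.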

Risk Classification

Not all tool calls are equal. Classify actions by reversibility:

Risk Level | Criteria                               | Approval Required         | Examples
Low        | Read-only, no side effects             | No                        | Search contacts, view deal history, read docs
Medium     | Reversible writes                      | Optional (configurable)   | Update CRM notes, create draft email, add a tag
High       | Irreversible or external-facing        | Always                    | Send email to customer, delete record, create invoice
Critical   | Financial, legal, or compliance impact | Always + manager approval | Approve discount > 20%, modify contract terms

type RiskLevel = "low" | "medium" | "high" | "critical";

const TOOL_RISK: Record<string, RiskLevel> = {
  search_contacts: "low",
  get_deal_history: "low",
  update_crm_notes: "medium",
  draft_email: "medium",
  send_email: "high",
  create_invoice: "critical",
};

Human-in-the-Loop Approval

When the agent plans a high-risk action, pause execution and ask a human to approve:

async function executeWithApproval(toolCall: ToolCall, state: AgentState) {
  const risk = TOOL_RISK[toolCall.name] ?? "high";

  if (risk === "low") {
    return await registry.execute(toolCall.name, toolCall.args);
  }

  // Pause and request approval
  return {
    type: "approval_required",
    tool: toolCall.name,
    args: toolCall.args,
    risk,
    reason: `This action will ${describeAction(toolCall)}. Please review and approve.`,
  };
}

The approval response is fed back into the agent's state. If approved, execution continues. If rejected, the model receives the rejection reason and replans.
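That feedback step can be sketched as a pure function over the message history (the message shape is an assumption; real agent loops vary):

```typescript
interface Msg {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Fold the reviewer's decision back into the conversation. On approval
// the tool result re-enters the history; on rejection the reason does,
// so the model can replan instead of retrying blindly.
function applyApprovalDecision(
  messages: Msg[],
  decision: { approved: boolean; reason?: string },
  executeTool: () => string
): Msg[] {
  if (decision.approved) {
    return [...messages, { role: "tool", content: executeTool() }];
  }
  return [
    ...messages,
    { role: "tool", content: `Rejected by reviewer: ${decision.reason ?? "no reason given"}` },
  ];
}
```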

Key UX principle: Show the human *exactly* what will happen. "Send email to jane@globex.com with subject 'Renewal Proposal'" is actionable. "Execute send_email tool" is not.
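A `describeAction` helper like the one referenced above might look like this (the per-tool templates are hypothetical examples):

```typescript
// Render a planned tool call as a human-readable sentence for the
// approval prompt. Unknown tools fall back to showing the raw
// arguments rather than hiding them.
function describeAction(toolCall: { name: string; args: Record<string, unknown> }): string {
  switch (toolCall.name) {
    case "send_email":
      return `send an email to ${toolCall.args.to} with subject "${toolCall.args.subject}"`;
    case "create_invoice":
      return `create an invoice for ${toolCall.args.amount} for ${toolCall.args.customer}`;
    default:
      return `run ${toolCall.name} with ${JSON.stringify(toolCall.args)}`;
  }
}
```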

Audit Logging

Every tool call, approval decision, and agent response gets logged:

interface AuditEntry {
  timestamp: string;
  session_id: string;
  user_id: string;
  action: "tool_call" | "approval_requested" | "approved" | "rejected" | "response";
  tool_name?: string;
  tool_args?: Record<string, unknown>;
  risk_level?: RiskLevel;
  result?: unknown;
  approver_id?: string;
  latency_ms: number;
}
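A minimal append-only writer for entries of this shape might look like the sketch below. Writing JSON Lines to stdout is illustrative; production systems want durable, tamper-evident storage:

```typescript
// Serialize one audit entry per line (JSONL): greppable, streamable,
// and never mutated in place. Timestamp is stamped at write time.
function logAudit(entry: Record<string, unknown>): string {
  const line = JSON.stringify({ timestamp: new Date().toISOString(), ...entry });
  console.log(line);
  return line;
}
```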

Audit logs serve three purposes:

  • Compliance — Prove to auditors exactly what the agent did and who approved it
  • Debugging — When something goes wrong, trace the exact sequence of decisions
  • Training data — Approved actions become positive examples; rejected actions become negative examples for improving the agent's planning
Cost Caps

Agents can loop and run up API costs. Set budgets:

  • Per-user daily limit — No single user can trigger more than $5 of API calls per day
  • Per-session limit — A single conversation can't exceed 50 tool calls
  • Per-action limit — Any tool call estimated to cost more than $0.50 requires approval

async function checkBudget(userId: string, estimatedCost: number): Promise<boolean> {
  const dailySpend = await getDailySpend(userId);
  if (dailySpend + estimatedCost > DAILY_LIMIT) {
    return false; // Budget exceeded
  }
  await recordSpend(userId, estimatedCost);
  return true;
}
In the capstone, you'll implement all of these layers for your Sales Companion — input validation, PII detection, risk-based approval gates, and audit logging. This is what makes the difference between an internal prototype and a tool your sales team actually trusts.

This is chapter 4 of Production AI Agents.

Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.

View course details