Guardrails & Human-in-the-Loop
Safety as a Feature, Not a Constraint
Why Guardrails Are the Enterprise Differentiator
Every team can wire an LLM to tools. The difference between a demo and a product customers trust is guardrails. Enterprise buyers don't ask "can your agent draft an email?" They ask "what happens when your agent drafts the wrong email?"
Guardrails aren't about limiting your agent. They're about making it safe enough to give it real authority. An agent with strong guardrails can be trusted with higher-stakes actions — sending emails, updating CRM records, creating support tickets. An agent without them stays a toy.
Defense in Depth
No single check catches everything. Layer your defenses:
     User Input
          │
          ▼
┌────────────────────┐
│  Input Validation  │ ← Block prompt injection, enforce length limits
└─────────┬──────────┘
          ▼
┌────────────────────┐
│   PII Detection    │ ← Flag/redact sensitive data before it reaches the LLM
└─────────┬──────────┘
          ▼
┌────────────────────┐
│   LLM Processing   │ ← Model generates tool calls
└─────────┬──────────┘
          ▼
┌────────────────────┐
│   Approval Gate    │ ← Human reviews high-risk actions before execution
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Output Validation  │ ← Check response for hallucinations, policy violations
└─────────┬──────────┘
          ▼
┌────────────────────┐
│     Audit Log      │ ← Record everything for compliance and debugging
└────────────────────┘

Input Validation
The first line of defense. Catch problems before they reach the model:
Prompt Injection Detection
Users (or attackers) may try to override the system prompt: "Ignore all previous instructions and..." Pattern-match against known injection patterns and flag suspicious inputs:
const INJECTION_PATTERNS = [
  /ignore (all )?(previous|prior|above) instructions/i,
  /you are now/i,
  /new system prompt/i,
  /disregard (your|the) (rules|instructions|guidelines)/i,
];

function detectInjection(input: string): boolean {
  return INJECTION_PATTERNS.some((p) => p.test(input));
}

Length Limits
Extremely long inputs can be used for context stuffing attacks or just waste tokens. Set reasonable per-message limits (e.g., 4,000 characters for a sales assistant).
Encoding Attacks
Watch for Unicode tricks, zero-width characters, and Base64-encoded payloads that try to sneak past pattern matching.
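A normalization pass can run before the pattern checks above. This is a sketch, not the chapter's implementation; `normalizeInput` and `looksLikeBase64` are illustrative names:

```typescript
// Sketch: fold Unicode look-alikes and strip zero-width characters so that
// "i\u200Bgnore" can't slip past the injection patterns. Names are illustrative.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

function normalizeInput(raw: string): string {
  // NFKC folds compatibility characters (e.g., fullwidth letters) into ASCII forms
  return raw.normalize("NFKC").replace(ZERO_WIDTH, "");
}

function looksLikeBase64(input: string): boolean {
  // Long unbroken runs of Base64 characters may hide an encoded payload
  return /[A-Za-z0-9+/]{40,}={0,2}/.test(input);
}
```

Run `normalizeInput` first, then the injection and PII checks; flag rather than silently drop anything `looksLikeBase64` matches.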
PII Detection
Your agent handles sales data — names, emails, phone numbers, deal amounts. PII leaking into logs, third-party APIs, or model training is a compliance disaster.
const PII_PATTERNS = {
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
  credit_card: /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/,
  phone: /\b\(\d{3}\)\s?\d{3}-\d{4}\b/,
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}\b/i,
};

function detectPII(text: string): { type: string; match: string }[] {
  const findings: { type: string; match: string }[] = [];
  for (const [type, pattern] of Object.entries(PII_PATTERNS)) {
    // Rebuild with the pattern's own flags plus "g"; new RegExp(pattern, "g")
    // would silently drop the email pattern's "i" flag
    const matches = text.match(new RegExp(pattern.source, pattern.flags + "g"));
    if (matches) findings.push(...matches.map((m) => ({ type, match: m })));
  }
  return findings;
}

Rule of thumb: False positives are better than misses. Flag aggressively, then let the approval gate decide. A false positive costs a human 5 seconds of review. A miss costs a compliance investigation.
Risk Classification
Not all tool calls are equal. Classify actions by reversibility:
| Risk Level | Criteria | Approval Required | Examples |
|---|---|---|---|
| Low | Read-only, no side effects | No | Search contacts, view deal history, read docs |
| Medium | Reversible writes | Optional (configurable) | Update CRM notes, create draft email, add a tag |
| High | Irreversible or external-facing | Always | Send email to customer, delete record, create invoice |
| Critical | Financial, legal, or compliance impact | Always + manager approval | Approve discount > 20%, modify contract terms |
type RiskLevel = "low" | "medium" | "high" | "critical";

const TOOL_RISK: Record<string, RiskLevel> = {
  search_contacts: "low",
  get_deal_history: "low",
  update_crm_notes: "medium",
  draft_email: "medium",
  send_email: "high",
  create_invoice: "critical",
};

Human-in-the-Loop Approval
When the agent plans a high-risk action, pause execution and ask a human to approve:
async function executeWithApproval(toolCall: ToolCall, state: AgentState) {
  // Unknown tools default to "high" so nothing unclassified runs unreviewed
  const risk = TOOL_RISK[toolCall.name] ?? "high";
  if (risk === "low") {
    return await registry.execute(toolCall.name, toolCall.args);
  }
  // Pause and request approval (make the medium tier configurable per the table above)
  return {
    type: "approval_required",
    tool: toolCall.name,
    args: toolCall.args,
    risk,
    reason: `This action will ${describeAction(toolCall)}. Please review and approve.`,
  };
}

The approval response is fed back into the agent's state. If approved, execution continues. If rejected, the model receives the rejection reason and replans.
Key UX principle: Show the human *exactly* what will happen. "Send email to jane@globex.com with subject 'Renewal Proposal'" is actionable. "Execute send_email tool" is not.
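The describeAction helper referenced earlier can be built from per-tool templates. A sketch (the templates and the ToolCall shape here are assumptions):

```typescript
// Sketch: render a planned tool call as the plain-English sentence shown to the approver.
// The per-tool templates are illustrative.
type ToolCall = { name: string; args: Record<string, unknown> };

const DESCRIBERS: Record<string, (a: Record<string, unknown>) => string> = {
  send_email: (a) => `send an email to ${a.to} with subject "${a.subject}"`,
  create_invoice: (a) => `create an invoice for ${a.customer} totaling ${a.amount}`,
};

function describeAction(call: ToolCall): string {
  const describe = DESCRIBERS[call.name];
  // Fall back to a raw dump so unknown tools are still reviewable
  return describe ? describe(call.args) : `execute ${call.name} with ${JSON.stringify(call.args)}`;
}
```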
Audit Logging
Every tool call, approval decision, and agent response gets logged:
interface AuditEntry {
  timestamp: string;
  session_id: string;
  user_id: string;
  action: "tool_call" | "approval_requested" | "approved" | "rejected" | "response";
  tool_name?: string;
  tool_args?: Record<string, unknown>;
  risk_level?: RiskLevel;
  result?: unknown;
  approver_id?: string;
  latency_ms: number;
}

Audit logs serve three purposes: compliance (prove what the agent did and who approved it), debugging (replay a failing session step by step), and accountability (tie every action to a user and an approver).
Cost Caps
Agents can loop and run up API costs. Set budgets:
async function checkBudget(userId: string, estimatedCost: number): Promise<boolean> {
  const dailySpend = await getDailySpend(userId);
  if (dailySpend + estimatedCost > DAILY_LIMIT) {
    return false; // Budget exceeded
  }
  await recordSpend(userId, estimatedCost);
  return true;
}

In the capstone, you'll implement all of these layers for your Sales Companion — input validation, PII detection, risk-based approval gates, and audit logging. This is what makes the difference between an internal prototype and a tool your sales team actually trusts.
This is chapter 4 of Production AI Agents.
Get the full hands-on course for $100 and build the complete system. Your projects become your portfolio.