
Safety & Guardrails

Permissions, Sandboxing & Audit

Defense-in-Depth for AI Agents

No single safety measure is enough. A firewall without monitoring is blind. An audit log without permission rules is just a record of damage. Defense-in-depth means layering multiple protections so that no single failure can cause serious harm.

For AI agents, the layers are: permission rules, command sandboxing, audit logging, and cost controls.

Layer 1: Permission Rules

Permission rules define what the agent is allowed to do. There are two philosophies:

| Approach | How It Works | Safety Level |
| --- | --- | --- |
| Block-list | Allow everything except what's listed | Weak: you can't anticipate every dangerous action |
| Allow-list | Block everything except what's listed | Strong: the agent can only do what you explicitly permit |

Always use allow-lists. With a block-list, you have to think of every possible dangerous command. With an allow-list, the agent can only do what you've explicitly approved.

Your safety-rules.json uses an allow-list approach:

  • Allowed directories are explicitly listed
  • Allowed commands are explicitly listed
  • Everything else is denied by default
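
The deny-by-default behavior described above can be sketched in a few lines. This is a minimal illustration, not the actual schema of safety-rules.json: the keys `allowed_directories` and `allowed_commands`, the example paths, and both function names are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical rules, shaped like what a safety-rules.json file might contain.
RULES = {
    "allowed_directories": ["/home/agent/workspace", "/tmp/agent"],
    "allowed_commands": ["ls", "cat", "mkdir", "cp", "mv"],
}


def is_path_allowed(path: str, rules: dict = RULES) -> bool:
    """Return True only if the path resolves inside an allowed directory."""
    resolved = Path(path).resolve()  # collapses ".." and follows symlinks
    for allowed in rules["allowed_directories"]:
        root = Path(allowed).resolve()
        if resolved == root or root in resolved.parents:
            return True
    return False  # nothing matched: deny by default


def is_command_allowed(command: str, rules: dict = RULES) -> bool:
    """Return True only if the command name is explicitly listed."""
    return command in rules["allowed_commands"]
```

Resolving the path before comparing it matters: a plain string prefix check would accept a path such as `/home/agent/workspace/../secrets.txt`, which actually points outside the workspace.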

Layer 2: Command Sandboxing

Even with permission rules, shell commands deserve extra scrutiny. An agent with shell access can:

  • Delete files (rm -rf)
  • Change permissions (chmod 777)
  • Download and execute code (curl | sh)
  • Exfiltrate data (curl -X POST with your files as payload)

Command sandboxing restricts *which* shell commands the agent can run and *how*:

  • Command allow-list — only ls, cat, mkdir, cp, mv (no rm, curl, chmod)
  • Argument validation — even allowed commands can be dangerous with wrong arguments (cp to a system directory)
  • No pipe chains — prevent cat /etc/passwd | curl ...
  • No shell expansion — prevent rm -rf $HOME where $HOME expands unexpectedly
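
One way to combine the four checks above is a validator that either returns a safe argument list or raises. This is a sketch under stated assumptions: `WORKSPACE`, `ALLOWED_FLAGS`, and the function names are hypothetical, the flag list is deliberately tiny, and the path check is purely lexical, so it does not detect symlinks that point outside the workspace.

```python
import posixpath
import shlex
from typing import List

ALLOWED_COMMANDS = {"ls", "cat", "mkdir", "cp", "mv"}
ALLOWED_FLAGS = {"-l", "-a", "-p", "-r"}  # hypothetical; tune per command
WORKSPACE = "/home/agent/workspace"       # hypothetical allowed root
# Characters that enable pipes, chaining, redirection, or shell expansion.
FORBIDDEN_CHARS = set("|&;<>$`*?~(){}[]!\n")


def _path_inside_workspace(arg: str) -> bool:
    """Normalize the path lexically and check that it stays under WORKSPACE."""
    # join() keeps absolute arguments as-is and anchors relative ones.
    normalized = posixpath.normpath(posixpath.join(WORKSPACE, arg))
    return normalized == WORKSPACE or normalized.startswith(WORKSPACE + "/")


def validate_command(command_line: str) -> List[str]:
    """Return an argv list if the command passes every check, else raise ValueError."""
    # No pipe chains, no shell expansion: reject metacharacters outright.
    if any(ch in FORBIDDEN_CHARS for ch in command_line):
        raise ValueError("shell metacharacter not allowed")

    argv = shlex.split(command_line)
    if not argv:
        raise ValueError("empty command")

    # Command allow-list.
    if argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {argv[0]}")

    # Argument validation: known flags only, paths only inside the workspace.
    for arg in argv[1:]:
        if arg.startswith("-"):
            if arg not in ALLOWED_FLAGS:
                raise ValueError(f"flag not allowed: {arg}")
        elif not _path_inside_workspace(arg):
            raise ValueError(f"path outside workspace: {arg}")
    return argv
```

The returned list is meant to be passed to something like `subprocess.run(argv, shell=False, cwd=WORKSPACE)`, so that no shell ever interprets the string and relative paths resolve where the validator assumed they would.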

Layer 3: Audit Logging

Every action the agent takes must be logged. The audit log serves three purposes:

  • Forensics — what happened, when, and why (after something goes wrong)
  • Monitoring — real-time visibility into what the agent is doing (detect problems early)
  • Accountability — proof that the agent followed its safety rules (or didn't)

A good audit log entry includes:

| Field | Purpose |
| --- | --- |
| `timestamp` | When the action occurred |
| `action` | What the agent did (e.g., "file_move", "api_call") |
| `category` | Action type (read, write, delete, network, shell) |
| `target` | What was acted on (file path, URL, command) |
| `result` | Success, failure, or denied |
| `reasoning` | Why the agent chose this action |

Review your audit logs regularly. An agent that reads ~/.ssh/id_rsa on every run is a red flag even if it's technically allowed.
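
As an illustration, the six fields listed above map naturally onto one JSON object per line in an append-only file. The sketch below assumes that format; the function names and the choice of JSON Lines are assumptions for the example, not something the course prescribes.

```python
import json
from datetime import datetime, timezone

VALID_CATEGORIES = {"read", "write", "delete", "network", "shell"}
VALID_RESULTS = {"success", "failure", "denied"}


def make_audit_entry(action, category, target, result, reasoning):
    """Build one log entry containing the six fields described above."""
    if category not in VALID_CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    if result not in VALID_RESULTS:
        raise ValueError(f"unknown result: {result}")
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "category": category,
        "target": target,
        "result": result,
        "reasoning": reasoning,
    }


def append_audit_entry(log_path, entry):
    """Append the entry as one JSON line; earlier lines are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")


def count_targets(log_path, category):
    """Count how often each target appears for one category, to support review."""
    counts = {}
    with open(log_path, encoding="utf-8") as log_file:
        for line in log_file:
            entry = json.loads(line)
            if entry["category"] == category:
                counts[entry["target"]] = counts.get(entry["target"], 0) + 1
    return counts
```

Calling `count_targets(path, "read")` during a review would surface a file such as `~/.ssh/id_rsa` that keeps showing up across runs, including attempts whose `result` was `denied`.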

Layer 4: Cost Controls

API-based agents spend money on every request. Without cost controls:

  • A loop bug can send thousands of API calls in minutes
  • A verbose agent can use 10x more tokens than necessary
  • A multi-step workflow can chain expensive calls

Cost controls to implement:

  • Per-request limit — max tokens per single API call
  • Per-run limit — total budget for one agent task execution
  • Per-day limit — hard cap on daily spending
  • Alert thresholds — notifications at 50%, 80%, and 95% of limits
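
The four controls above could be combined in a small guard object that is consulted before each API call and updated after it. This is a sketch only: `CostGuard`, `BudgetExceeded`, and the dollar-based accounting are hypothetical names and choices, and a real agent would need its own way of estimating the cost of a request.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request would break one of the configured limits."""


@dataclass
class CostGuard:
    max_tokens_per_request: int
    run_budget_usd: float
    daily_budget_usd: float
    alert_thresholds: tuple = (0.5, 0.8, 0.95)
    run_spent: float = 0.0
    day_spent: float = 0.0
    alerts_fired: set = field(default_factory=set)

    def start_run(self) -> None:
        """Reset the per-run counter at the start of a new agent task."""
        self.run_spent = 0.0

    def check_request(self, tokens: int, estimated_cost_usd: float) -> None:
        """Call before each API request; raises instead of letting it through."""
        if tokens > self.max_tokens_per_request:
            raise BudgetExceeded("per-request token limit exceeded")
        if self.run_spent + estimated_cost_usd > self.run_budget_usd:
            raise BudgetExceeded("per-run budget exceeded")
        if self.day_spent + estimated_cost_usd > self.daily_budget_usd:
            raise BudgetExceeded("daily budget exceeded")

    def record(self, cost_usd: float) -> list:
        """Record actual spend; return daily thresholds crossed for the first time."""
        self.run_spent += cost_usd
        self.day_spent += cost_usd
        newly_crossed = []
        for threshold in self.alert_thresholds:
            crossed = self.day_spent >= threshold * self.daily_budget_usd
            if crossed and threshold not in self.alerts_fired:
                self.alerts_fired.add(threshold)
                newly_crossed.append(threshold)
        return newly_crossed
```

Because `check_request` raises before the call is made, a loop bug stops at the budget boundary rather than after the money is spent. The list returned by `record` is where a notification hook would attach.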

Testing Your Guardrails

The most important step: try to break your own safety system. Adversarial testing means:

  • Try to access a blocked directory — does the agent refuse?
  • Try to run a blocked command — does the permission check catch it?
  • Try to exceed cost limits — does the budget cap trigger?
  • Check the audit log — are all attempts (including denied ones) recorded?

If you can break your guardrails, so can a confused LLM. Fix the gap before it matters.
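
Adversarial checks like these are easy to automate as a table of inputs and expected verdicts. The sketch below uses a deliberately naive guard to show how such a table exposes gaps; `run_adversarial_cases`, `naive_path_guard`, and the example paths are all hypothetical.

```python
from typing import Callable, List, Tuple


def run_adversarial_cases(
    guard: Callable[[str], bool],
    cases: List[Tuple[str, bool]],
) -> List[str]:
    """Return every input where the guard's verdict differs from the expected one."""
    gaps = []
    for candidate, should_allow in cases:
        if guard(candidate) != should_allow:
            gaps.append(candidate)
    return gaps


def naive_path_guard(path: str) -> bool:
    """Deliberately weak guard: a string prefix check with no path normalization."""
    return path.startswith("/home/agent/workspace")


CASES = [
    ("/home/agent/workspace/notes.txt", True),          # normal use
    ("/etc/passwd", False),                             # obviously blocked
    ("/home/agent/workspace/../.ssh/id_rsa", False),    # path traversal
    ("/home/agent/workspace-backup/secrets", False),    # sibling with same prefix
]

gaps = run_adversarial_cases(naive_path_guard, CASES)
for path in gaps:
    print("guardrail gap:", path)
```

Here the naive guard passes the two easy cases and lets the traversal and sibling-prefix paths through, which is exactly the kind of gap this testing is meant to find before deployment.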

Key Takeaways

  • Defense-in-depth: layer permission rules, sandboxing, audit logging, and cost controls.
  • Always use allow-lists over block-lists — you can't anticipate every dangerous action.
  • Log every action, including denied attempts. Review logs regularly for anomalies.
  • Cost controls prevent runaway API spending from loops and verbose agents.
  • Test your guardrails adversarially — try to break them before deploying.

This is chapter 3 of Open Source AI Agents (OpenClaw).
