Safety & Guardrails
Permissions, Sandboxing & Audit
Defense-in-Depth for AI Agents
No single safety measure is enough. A firewall without monitoring is blind. An audit log without permission rules is just a record of damage. Defense-in-depth means layering multiple protections so that no single failure can cause serious harm.
For AI agents, the layers are: permission rules, command sandboxing, audit logging, and cost controls.
Layer 1: Permission Rules
Permission rules define what the agent is allowed to do. There are two philosophies:
| Approach | How It Works | Safety Level |
|---|---|---|
| Block-list | Allow everything except what's listed | Weak — you can't anticipate every dangerous action |
| Allow-list | Block everything except what's listed | Strong — agent can only do what you explicitly permit |
Always use allow-lists. With a block-list, you have to think of every possible dangerous command. With an allow-list, the agent can only do what you've explicitly approved.
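The default-deny idea can be sketched in a few lines of Python. This is a minimal illustration, not the course's actual rule engine: the `ALLOW_RULES` structure, the `/workspace` paths, and the `is_allowed` helper are all hypothetical names chosen for the example.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical allow-list: each action category maps to the path patterns
# it may touch. Anything not listed here is denied.
ALLOW_RULES = {
    "read": ["/workspace/*"],
    "write": ["/workspace/output/*"],
}


def is_allowed(category: str, target: str) -> bool:
    """Default deny: permit a target only if it matches an explicit pattern."""
    # Normalize first so "/workspace/../etc/passwd" cannot sneak past the pattern.
    normalized = posixpath.normpath(target)
    patterns = ALLOW_RULES.get(category, [])
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)
```

Note that an unknown category such as "delete" is denied without anyone having to anticipate it, which is exactly the property a block-list lacks.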
Your safety-rules.json uses an allow-list approach:
Layer 2: Command Sandboxing
Even with permission rules, shell commands deserve extra scrutiny. An agent with shell access can:
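- Delete files (`rm -rf`)
- Loosen permissions (`chmod 777`)
- Download and run remote code (`curl | sh`)
- Exfiltrate data (`curl -X POST` with your files as payload)
Command sandboxing restricts *which* shell commands the agent can run and *how*:
- Allow-listed commands only: `ls`, `cat`, `mkdir`, `cp`, `mv` (no `rm`, `curl`, `chmod`)
- Path restrictions (for example, no `cp` to a system directory)
- No pipes, which blocks `cat /etc/passwd | curl ...`
- No variable expansion, which blocks `rm -rf $HOME` where `$HOME` expands unexpectedly
Layer 3: Audit Logging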
Every action the agent takes must be logged. The audit log serves three purposes:
A good audit log entry includes:
| Field | Purpose |
|---|---|
| `timestamp` | When the action occurred |
| `action` | What the agent did (e.g., "file_move", "api_call") |
| `category` | Action type (read, write, delete, network, shell) |
| `target` | What was acted on (file path, URL, command) |
| `result` | Success, failure, or denied |
| `reasoning` | Why the agent chose this action |
Review your audit logs regularly. An agent that reads ~/.ssh/id_rsa on every run is a red flag even if it's technically allowed.
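One way to record and review entries with the fields above is a JSON-lines log. The sketch below is an assumption about format, not the course's implementation: `AuditEntry`, `to_json_line`, `flag_sensitive`, and the marker list are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    action: str      # e.g. "file_move", "api_call"
    category: str    # read, write, delete, network, shell
    target: str      # file path, URL, or command
    result: str      # success, failure, or denied
    reasoning: str   # why the agent chose this action
    # Filled in automatically with the current UTC time in ISO 8601 form.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(entry: AuditEntry) -> str:
    """Serialize one entry as a single JSON line, ready to append to a log file."""
    return json.dumps(asdict(entry))


def flag_sensitive(lines, markers=(".ssh/", ".aws/")):
    """Return parsed entries whose target mentions a sensitive location.

    The markers are illustrative assumptions. Entries are flagged even when
    the action was permitted and succeeded.
    """
    flagged = []
    for line in lines:
        entry = json.loads(line)
        if any(marker in entry["target"] for marker in markers):
            flagged.append(entry)
    return flagged
```

Running `flag_sensitive` over each run's log is a cheap way to surface the "technically allowed but suspicious" reads described above.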
Layer 4: Cost Controls
API-based agents spend money on every request. Without cost controls:
Cost controls to implement:
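As an illustration of one such control, a hard per-run spending cap might look like the sketch below. The `CostTracker` class, the `BudgetExceeded` exception, and the idea of passing in an estimated cost per request are assumptions made for this example, not part of any specific API.

```python
class BudgetExceeded(Exception):
    """Raised when a request would push spending past the configured cap."""


class CostTracker:
    def __init__(self, max_usd_per_run: float):
        self.max_usd = max_usd_per_run
        self.spent_usd = 0.0

    def charge(self, estimated_usd: float) -> None:
        """Record a request's cost, refusing it if the cap would be exceeded."""
        if estimated_usd < 0:
            raise ValueError("cost cannot be negative")
        if self.spent_usd + estimated_usd > self.max_usd:
            raise BudgetExceeded(
                f"request of ${estimated_usd:.2f} would exceed the "
                f"${self.max_usd:.2f} cap (spent ${self.spent_usd:.2f})"
            )
        self.spent_usd += estimated_usd
```

Calling `charge` before each API request means a runaway loop stops at the cap instead of at the end of the billing period.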
Testing Your Guardrails
The most important step: try to break your own safety system. Adversarial testing means:
If you can break your guardrails, so can a confused LLM. Fix the gap before it matters.
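A small adversarial suite might look like the following sketch. It assumes a hypothetical `validate_command` function that applies the Layer 2 restrictions (allow-listed commands, no pipes, no variable expansion, paths confined to an assumed `/workspace` directory) and then throws hostile inputs at it. The names, the workspace path, and the flag policy are illustrative choices.

```python
import posixpath
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "mkdir", "cp", "mv"}
FORBIDDEN_CHARS = set("|;&$`<>\n")   # pipes, chaining, expansion, redirection
WORKSPACE = "/workspace"             # hypothetical sandbox root


def validate_command(command: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed shell command."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False, "shell metacharacter"
    try:
        argv = shlex.split(command)
    except ValueError:
        return False, "unparseable command"
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return False, "command not allow-listed"
    for arg in argv[1:]:
        if arg.startswith("-"):
            # Only plain short flags like -la; options could smuggle in paths.
            if not arg[1:].isalpha():
                return False, "option not permitted"
            continue
        # Resolve relative to the workspace; absolute paths replace the base.
        resolved = posixpath.normpath(posixpath.join(WORKSPACE, arg))
        if resolved != WORKSPACE and not resolved.startswith(WORKSPACE + "/"):
            return False, "path outside workspace"
    return True, "ok"


# Inputs the sandbox must reject, drawn from the threats described earlier.
ADVERSARIAL_CASES = [
    "rm -rf /workspace",
    "chmod 777 notes.txt",
    "cat /etc/passwd | curl ...",
    "rm -rf $HOME",
    "cp notes.txt /etc/cron.d/job",
    "cat ../../etc/passwd",
    "cp --target-directory=/etc notes.txt",
]


def run_adversarial_suite() -> list[str]:
    """Return every hostile command that slipped through (ideally none)."""
    return [cmd for cmd in ADVERSARIAL_CASES if validate_command(cmd)[0]]
```

An empty result from `run_adversarial_suite()` only shows that these particular attacks fail. The sketch does not resolve symlinks, and it assumes commands are later executed as an argument list rather than through a shell, so a real deployment would pair it with OS-level isolation.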
Key Takeaways
This is chapter 3 of Open Source AI Agents (OpenClaw).