Last updated: 2026-04-18

AI Agent Security

Agents are powerful because they read widely and act autonomously. That combination is also the root of every real security risk. This is a practical, balanced guide: no fear-mongering, no vendor boosterism. It covers eight deep-dive topics, a 6-platform posture comparison, and a 15-minute hardening checklist you can actually complete.

Platform security posture at a glance

These are rough posture ratings based on default-deny vs. default-allow behavior, sandbox enforcement, and managed-vs-self-hosted trade-offs. "Medium" is not bad. It means you need to do the work, because the defaults won't save you.

Platform	Posture	Security model
OpenClaw	🟡 Medium	Self-hosted. You own the sandbox boundary. Default-allow on skills unless you configure otherwise.
NemoClaw	🟡 Medium	Self-hosted like OpenClaw, but with a policy layer (YAML rules) that gates every tool call.
IronClaw	🟢 Strong	Sandboxed-by-default. Every skill runs in an isolated process with a manifest-declared capability set.
Hermes	🟡 Medium	Managed cloud service. Anthropic (or the vendor) handles infrastructure; you configure scopes via OAuth.
Claude Cowork	🟢 Strong	Anthropic-managed. Projects are isolated; system prompts and files stay within your workspace.
ChatGPT	🟡 Medium	OpenAI-managed. Custom GPTs and Actions run in OpenAI's infrastructure with API calls to third-party services you configure.

The non-negotiables

If you skip everything else, do these four:

Default-deny on skills. Never enable a skill globally. Scope per project.
Draft-only for irreversible actions. Email send, git push, file delete, payments. Always a human confirmation gate.
Secrets in .env, never in prompts. SOUL.md, CLAUDE.md, and system prompts get sent to the model on every turn.
Read-only OAuth scopes by default. Grant write access only for the specific action that needs it, and prefer draft/label over send/delete.
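
The first three rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any platform's real API: the action names (`email.send`, `git.push`, and so on), `ProjectPolicy`, `authorize`, and `load_secret` are all hypothetical names invented for the example.

```python
import os
from dataclasses import dataclass, field

# Hypothetical action names for illustration; not taken from any real platform.
IRREVERSIBLE = {"email.send", "git.push", "file.delete", "payment.create"}


@dataclass
class ProjectPolicy:
    """Per-project allowlist: any action not listed here is denied."""
    name: str
    allowed: set[str] = field(default_factory=set)


def authorize(policy: ProjectPolicy, action: str, confirmed: bool = False) -> str:
    """Return 'deny', 'draft', or 'allow' for a requested action."""
    if action not in policy.allowed:
        return "deny"    # default-deny: unlisted skills never run
    if action in IRREVERSIBLE and not confirmed:
        return "draft"   # irreversible: hold until a human confirms
    return "allow"


def load_secret(name: str) -> str:
    """Read a secret from the environment rather than embedding it in a prompt."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"missing secret: {name}")
    return value


# Skills are scoped to one project, never enabled globally.
policy = ProjectPolicy("inbox-triage", {"email.read", "email.label", "email.send"})

print(authorize(policy, "email.read"))                  # allow
print(authorize(policy, "email.send"))                  # draft
print(authorize(policy, "email.send", confirmed=True))  # allow
print(authorize(policy, "file.delete"))                 # deny
```

The point of the design is that the safe outcome is the fall-through case: an action has to be listed to run at all, and an irreversible one additionally needs an explicit human confirmation before it leaves draft state.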

Need the shortest possible version? Go to the 15-minute checklist. Building something new? Start with prompt injection — it's the attack class every agent is exposed to.

Deep-dive topics

Prompt Injection — the #1 agent vulnerability

Skill & Tool Allowlisting — default-deny is not optional

Secrets & Credentials — never in prompts, never in memory

Sandboxing — contain the blast radius

MCP Server Supply Chain — the new npm attack surface

Email & Calendar Scopes — the read-write boundary matters

Incident Response — what to do when the agent goes wrong

The Agent Security Checklist
