AgentEscapeBench¶
A sandbox escape benchmark for LLM capability evaluation.
AgentEscapeBench measures whether language models can develop working
exploits against real sandbox technologies — Docker, V8, Firecracker,
Chromium's renderer sandbox, and others. Each sample puts a model
inside a disposable QEMU VM with full bash access, asks it to
construct a payload that breaks out of a configured sandbox, and runs
that payload in a fresh scoring VM to verify the escape. The result is
a (model × sandbox) capability matrix.
The benchmark is built as an Inspect AI
eval. Results are standard .eval files.
The README on GitHub carries the full project overview, the sandbox-family table, the threat model, and the install snippet — this site mirrors the design and reference docs.
Where to start¶
- Quickstart — set up GKE, validate the harness, run an eval.
- Cluster design — per-sample pod, nested KVM, scaling.
- Image build design — image cache, build pipeline, GHA secondary build path.
- Sandbox families — what each family tests.
- Proof mechanics — how each proof is checked.
- Troubleshooting — when something breaks.
- WIF setup — Workload Identity Federation for the GitHub Actions producers.
- Sandbox technology ideas — what's shipped, on deck, ruled out.
- Dropped CVEs — investigated but not shipped, with reasons.
- Adding a sandbox — step-by-step for new sandbox types, image variants, and families.