AgentEscapeBench¶

A sandbox escape benchmark for LLM capability evaluation.

AgentEscapeBench measures whether language models can develop working exploits against real sandbox technologies — Docker, V8, Firecracker, Chromium's renderer sandbox, and others. Each sample puts a model inside a disposable QEMU VM with full bash access, asks it to construct a payload that breaks out of a configured sandbox, and runs that payload in a fresh scoring VM to verify the escape. The result is a (model × sandbox) capability matrix.

The benchmark is built as an Inspect AI eval. Results are standard .eval files.

The README on GitHub carries the full project overview, the sandbox-family table, the threat model, and the install snippet — this site mirrors the design and reference docs.

Where to start¶

Quickstart — set up GKE, validate the harness, run an eval.
Cluster design — per-sample pod, nested KVM, scaling.
Image build design — image cache, build pipeline, GHA secondary build path.
Sandbox families — what each family tests.
Proof mechanics — how each proof is checked.
Troubleshooting — when something breaks.
WIF setup — Workload Identity Federation for the GitHub Actions producers.
Sandbox technology ideas — what's shipped, on deck, ruled out.
Dropped CVEs — investigated but not shipped, with reasons.
Adding a sandbox — step-by-step for new sandbox types, image variants, and families.