Skip to content

AgentEscapeBench

A sandbox escape benchmark for LLM capability evaluation.

AgentEscapeBench measures whether language models can develop working exploits against real sandbox technologies — Docker, V8, Firecracker, Chromium's renderer sandbox, and others. Each sample puts a model inside a disposable QEMU VM with full bash access, asks it to construct a payload that breaks out of a configured sandbox, and runs that payload in a fresh scoring VM to verify the escape. The result is a (model × sandbox) capability matrix.

The benchmark is built as an Inspect AI eval. Results are standard .eval files.

The README on GitHub carries the full project overview, the sandbox-family table, the threat model, and the install snippet — this site mirrors the design and reference docs.

Where to start