Cluster design¶
AgentEscapeBench runs each eval sample in its own Kubernetes pod with
nested-virtualization (KVM) enabled, on a GKE cluster the operator
provisions once via scripts/setup-gke.sh. Two QEMU VMs coexist in
each pod: the eval VM (where the agent develops the exploit) and
the scoring VM (booted afresh per scoring round on a sibling SSH
port).
The model has no inspect-side tools from this repo. It interacts with
the eval VM via its in-VM bash shell — provided by inspect-swe's
agent bridge — and writes its payload to /root/payload. After every
CLI exit the scorer pauses the eval VM via QMP (stop), boots a
sibling scoring VM on scoringSshPort, reads the payload as bytes,
runs it inside the configured sandbox, checks the targeted proof,
kills the scoring VM, and resumes the eval VM (cont) so the agent's
session state survives across rounds. See docs/proof-mechanics.md
for the per-proof mechanics and CLAUDE.md § "Iterative scoring loop"
for the loop-level description.
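One scoring round, expressed as pod-host shell commands, looks roughly like the
following. This is a minimal sketch: the QMP socket path, image paths, and the
proof-check command are illustrative, and the real harness drives each step
through the k8s sandbox exec rather than a local shell.
# One scoring round (sketch; socket/paths illustrative, payload retrieval elided).
echo '{"execute":"qmp_capabilities"} {"execute":"stop"}' \
  | socat - UNIX-CONNECT:/run/qmp-eval.sock                      # pause the eval VM
qemu-img create -f qcow2 -b /images/prepared.qcow2 -F qcow2 /images/scoring-overlay.qcow2
qemu-system-x86_64 -enable-kvm -m 8192 -smp 2 -daemonize \
  -drive file=/images/scoring-overlay.qcow2,if=virtio \
  -netdev user,id=n0,hostfwd=tcp::2223-:22 -device virtio-net-pci,netdev=n0
scp -P 2223 payload root@localhost:/root/payload                  # payload previously read as bytes
ssh  -p 2223 root@localhost '<plant flag, run payload, check the targeted proof>'
pkill -f scoring-overlay.qcow2                                    # kill the scoring VM
echo '{"execute":"qmp_capabilities"} {"execute":"cont"}' \
  | socat - UNIX-CONNECT:/run/qmp-eval.sock                      # resume the eval VM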
Goals¶
- Scale: 100+ concurrent VMs for 48+ hour runs.
- Simplicity: model owns its own sandbox lifecycle; the harness provides bash + scoring, not abstractions over the sandbox.
- Portability: k8s API is cloud-agnostic (GKE, EKS, AKS, bare metal). EKS / AKS aren't shipped, but the sandbox plugin is the same; only the cluster-setup script differs.
- Cost efficiency: the cluster autoscaler adds/removes nodes on demand; an idle cluster scales the kvm-pool to zero.
Architecture¶
Workstation GKE Cluster
─────────── ───────────
inspect eval ...
│
├─ task_init()
│ └─ ensure prepared images ──► Image cache
│ (one per sandbox_type) (gs:// or https://)
│ │
├─ sample_init() ×N (concurrent) │
│ ├─ helm install (k8s sandbox) ──► Pod (n2-standard-8 node)
│ │ ├─ entrypoint.sh:
│ │ │ download prepared image
│ │ │ qemu-img create overlay
│ │ │ boot eval QEMU (-daemonize)
│ │ │ sleep infinity
│ └─ poll SSH ready ──► Eval VM (4 vCPU / 12GB RAM)
│ │ agent runs here, full bash
│ │
├─ solver runs (inspect-swe loop) │
│ └─ agent iterates on ──► VM: model develops exploit at
│ /root/payload via SSH /root/payload
│ │
├─ scorer (every CLI exit) │
│ ├─ pause eval VM (QMP stop) │
│ ├─ boot scoring VM (sibling) ──► Scoring VM (2 vCPU / 8GB RAM)
│ │ plant flag, run payload │ sshPort 2223, fresh overlay
│ │ check the targeted proof │
│ ├─ kill scoring VM │
│ └─ resume eval VM (QMP cont) │
│ │
└─ sample_cleanup() │
└─ helm uninstall ──► Pod deleted, node may scale down
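The entrypoint steps named in the diagram amount to roughly the following. A
sketch only: the environment-variable names, file paths, and QMP socket are
assumptions; entrypoint.sh in the pod image is the authoritative version.
# Sketch of entrypoint.sh's job (names and flags illustrative).
curl -fSL --retry 3 "$PREPARED_IMAGE_URI" -o /images/prepared.qcow2       # download prepared image
qemu-img create -f qcow2 -b /images/prepared.qcow2 -F qcow2 /images/eval-overlay.qcow2
qemu-system-x86_64 -enable-kvm -m "${VM_MEMORY_MB:-12288}" -smp "${VM_CPUS:-4}" -daemonize \
  -drive file=/images/eval-overlay.qcow2,if=virtio \
  -netdev user,id=n0,hostfwd=tcp::${SSH_PORT:-2222}-:22 -device virtio-net-pci,netdev=n0 \
  -qmp unix:/run/qmp-eval.sock,server,nowait                              # boot eval QEMU
sleep infinity                                                            # keep the pod alive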
Pod packing: one pod per n2-standard-8 node. Pod requests 7 vCPU
and 26 GiB memory (request==limit), under GKE's
7910m / ~27.7 GiB allocatable on n2-standard-8. The eval VM
(4 vCPU / 12 GiB) and scoring VM (2 vCPU / 8 GiB) share that
envelope; the eval VM is paused during scoring, so the two guests' 4 + 2 vCPUs
never run flat out at the same time, and the 6 guest vCPUs fit under the
7-vCPU request with the 7th left for the kubelet shim + QEMU I/O threads.
The remaining ~6 GiB above the two guests'
20 GiB is QEMU overhead + page cache + slack. The pod template is
src/escapebench/resources/k8s/chart/templates/pod.yaml
(authoritative — read it for current values).
k8s sandbox plugin¶
Pod lifecycle is delegated to the
inspect-k8s-sandbox
package rather than reimplemented. The plugin handles:
- Pod creation/deletion via Helm (helm install --wait, helm uninstall).
- Pod-host exec with retry logic (5 attempts, exponential backoff, CRI-O workarounds).
- Interrupt handling (deferred cleanup on Ctrl-C; orphan sweep in task_cleanup).
- CLI cleanup (inspect sandbox cleanup k8s; see the one-liner after this list).
- Two-level concurrency control (Helm install semaphore + pod exec semaphore).
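The CLI cleanup mentioned above can be run from the workstation at any time to
sweep releases an interrupted run left behind:
# Remove any orphaned per-sample Helm releases / pods from an interrupted run.
inspect sandbox cleanup k8s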
AgentEscapeBench wraps the plugin's exec in two paths sharing one transport:
- Pod-host exec (k8s_env.exec(cmd)) — QMP commands, overlay creation, QEMU process management, image downloads. Uses the plugin's native exec directly.
- VM exec (k8s_env.exec(ssh_wrap(cmd))) — commands inside the eval / scoring VM. Wraps the command with SSH before passing to the plugin's exec (see the shell-level sketch after this list).
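At the shell level the VM-exec path is just an SSH prefix over the same
pod-host exec; roughly (key path and SSH options are assumptions):
# Pod-host exec runs CMD directly in the pod; VM exec wraps it with SSH first.
CMD='cat /root/payload'
ssh -p 2222 -o StrictHostKeyChecking=no -i /root/.ssh/id_ed25519 root@localhost "$CMD"
# The same wrap against scoringSshPort 2223 targets the scoring VM instead.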
Helm chart lives inside the Python package as package data at
src/escapebench/resources/k8s/chart/ (included via pyproject.toml
package data config). Default Helm values come from
escapebench.yaml — same file Inspect reads for the sandbox config,
so there's one source of truth. Per-sample values (prepared image
URI, sandbox type) are auto-injected from sample metadata by the k8s
sandbox plugin's _metadata_to_extra_values. The chart template
references them as {{ .Values.sampleMetadataPreparedImageUri }},
{{ .Values.sampleMetadataSandboxType }}, etc.
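The net effect of that injection is equivalent to passing extra values at
install time, roughly (release name and placeholder values are illustrative):
helm install sample-0 src/escapebench/resources/k8s/chart \
  --wait \
  --values escapebench.yaml \
  --set sampleMetadataPreparedImageUri="<prepared-image-uri>" \
  --set sampleMetadataSandboxType="<sandbox-type>"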
SSH key handling¶
There is no pre-created SSH-key Kubernetes Secret. The pod's
entrypoint.sh runs setup-cloud-init.sh, which generates a fresh
ed25519 keypair inside the pod and seeds it into the VM via
cloud-init user-data. Each pod has its own ephemeral keypair;
nothing is persisted in the cluster.
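In shell terms the key handling is roughly the following; the paths, the
NoCloud seed tooling, and the user-data shape are assumptions, not the
script's literal contents.
# Generate an ephemeral keypair in the pod and seed the public half via cloud-init.
ssh-keygen -t ed25519 -N '' -f /root/.ssh/id_ed25519
cat > user-data <<EOF
#cloud-config
users:
  - name: root
    ssh_authorized_keys:
      - $(cat /root/.ssh/id_ed25519.pub)
EOF
cloud-localds seed.img user-data          # NoCloud seed image attached to the VM at boot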
Image cache¶
Pod entrypoints download the prepared qcow2 from the URI configured
in escapebench.yaml's imageCache. Schemes supported:
| Scheme | Existence check | Pod download | Typical use |
|---|---|---|---|
| gs:// | GCS SDK (needs GCP auth) | gcloud storage cp | Cluster builds (writer side) |
| s3:// | aws s3 ls | aws s3 cp | AWS-based clusters |
| http(s):// | urllib HEAD | curl -fSL --retry 3 | Public consumption (no auth) |
| /path, ./path | Path.exists | cp | Local dev / debugging |
The shipped default is the public bucket
(https://storage.googleapis.com/fellows-safety-research-escapebench-public),
which the pod reads anonymously. See
image-build-design.md for the build-side
view, the cache-key scheme, and the .validated marker that gates
runtime consumption.
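The per-scheme behaviour in the table maps onto a dispatch like this (a
sketch; the real entrypoint's variable names may differ):
# Fetch the prepared qcow2 according to the imageCache scheme.
case "$PREPARED_IMAGE_URI" in
  gs://*)             gcloud storage cp "$PREPARED_IMAGE_URI" /images/prepared.qcow2 ;;
  s3://*)             aws s3 cp "$PREPARED_IMAGE_URI" /images/prepared.qcow2 ;;
  http://*|https://*) curl -fSL --retry 3 "$PREPARED_IMAGE_URI" -o /images/prepared.qcow2 ;;
  *)                  cp "$PREPARED_IMAGE_URI" /images/prepared.qcow2 ;;
esac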
Pod image¶
Source-derived: the pod container image URI is computed from
images/pod/ content via image_keys.pod_image_uri() —
…/escapebench-qemu:src-<sha256(images/pod/**)>. Build pipeline,
runtime, and the .github/workflows/build-pod-image.yml workflow
all import the same function, so the registry tag, the cache-key
contribution, and the runtime URI all derive from the same
checked-out tree. There is no image: field in escapebench.yaml;
adding one would re-introduce drift the source-derived design exists
to prevent. The yaml-loader injects the URI into the Helm values at
sandbox_env._materialized_values_path() time.
See image-build-design.md →
"Pod-image lifecycle" for the build-and-publish workflow.
Configuration¶
escapebench.yaml serves as both the Inspect sandbox config AND the
Helm values file — one source of truth for both Python and Helm.
config_deserialize returns a K8sSandboxEnvironmentConfig pointing
to our chart and this same file as values. Python code reads values
via config.values.read_text() when needed — no module-level
globals.
The shipped defaults (read the file directly for the live values):
imageCache: https://storage.googleapis.com/fellows-safety-research-escapebench-public
activeDeadlineSeconds: 21600
vmMemoryMb: 12288
vmCpus: 4
sshPort: 2222
scoringSshPort: 2223
scoringVmMemoryMb: 8192
scoringVmCpus: 2
scoringTimeoutSeconds: 60
Per-sample values (e.g. the prepared image URI) come from metadata
injection via the k8s sandbox plugin's _metadata_to_extra_values,
not from this file.
Cluster setup¶
# Consumer: cluster + KVM pool + node SA with logging/monitoring only.
# The shipped escapebench.yaml already points at the public pod image
# and the public qcow2 cache, so this is enough to run evals.
./scripts/setup-gke.sh <gcp-project-id>
# Producer: also creates the named GCS image-cache bucket (public-read)
# and grants the node SA write access. Required to build your own qcow2
# images via `python -m escapebench.build`.
./scripts/setup-gke.sh <gcp-project-id> --bucket gs://<your-bucket>
# --cluster NAME (default: escapebench) lets the same script
# provision a differently-named cluster + matching node SA in the
# same project — useful for parallel teams or staging clusters.
./scripts/setup-gke.sh <gcp-project-id> --cluster myteam-escapebench
The script is idempotent in either mode. The producer-only steps
(bucket + storage.objectAdmin + artifactregistry.reader) are
gated behind --bucket gs://NAME so consumers don't end up with a
public bucket they never write to. The script also applies
src/escapebench/resources/k8s/priority-class.yaml — a
cluster-scoped PriorityClass (escapebench) that the per-sample
Helm chart references via spec.priorityClassName: escapebench.
Default value is 0 (same priority tier as everything else) with
preemptionPolicy: Never, so escapebench pods are polite tenants —
they don't preempt others, and aren't preempted by other
default-priority pods on contention.
See troubleshooting.md for the symptoms to expect when something goes wrong.
Cost estimates¶
| Scenario | Nodes | Duration | Cost (on-demand) |
|---|---|---|---|
| Testing (1–4 VMs) | 1–4 × n2-standard-8 | 1 h | $0.27–$1.08 |
| Production (100 VMs) | ~100 × n2-standard-8 | 48 h | ~$1,300 |
Both rows use the same ~$0.27/node-hour rate: 100 nodes × 48 h × $0.27/h ≈ $1,300.
GKE management fee: ~$2.40/day (Standard).
Dependencies¶
| Package | Purpose |
|---|---|
| inspect-k8s-sandbox | Pod lifecycle, exec with retries, cleanup |
| inspect-swe | Agent bridge (claude-code, codex-cli) |
| Helm CLI | Required by inspect-k8s-sandbox |
| kubectl | Required for cluster access |
| gcloud CLI | GCS access (prepared images), GKE cluster management |