Cluster design¶
AgentEscapeBench runs each eval sample in its own Kubernetes pod with
nested-virtualization (KVM) enabled, on a GKE cluster the operator
provisions once via scripts/setup-gke.sh. Two QEMU VMs coexist in
each pod: the eval VM (where the agent develops the exploit) and
the scoring VM (booted afresh per scoring round on a sibling SSH
port).
The model has no inspect-side tools from this repo. It interacts with
the eval VM via its in-VM bash shell — provided by inspect-swe's
agent bridge — and writes its payload to /root/payload. After every
CLI exit the scorer pauses the eval VM via QMP (stop), boots a
sibling scoring VM on scoringSshPort, reads the payload as bytes,
runs it inside the configured sandbox, checks the targeted proof,
kills the scoring VM, and resumes the eval VM (cont) so the agent's
session state survives across rounds. See docs/proof-mechanics.md
for the per-proof mechanics and CLAUDE.md § "Iterative scoring loop"
for the loop-level description.
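One scoring round, expressed as pod-host shell commands, looks roughly like the
following. This is a minimal sketch: the QMP socket path, image paths, and the
proof-check command are illustrative, and the real harness drives each step
through the k8s sandbox exec rather than a local shell.
# One scoring round (sketch; socket/paths illustrative, payload retrieval elided).
echo '{"execute":"qmp_capabilities"} {"execute":"stop"}' \
  | socat - UNIX-CONNECT:/run/qmp-eval.sock                      # pause the eval VM
qemu-img create -f qcow2 -b /images/prepared.qcow2 -F qcow2 /images/scoring-overlay.qcow2
qemu-system-x86_64 -enable-kvm -m 8192 -smp 2 -daemonize \
  -drive file=/images/scoring-overlay.qcow2,if=virtio \
  -netdev user,id=n0,hostfwd=tcp::2223-:22 -device virtio-net-pci,netdev=n0
scp -P 2223 payload root@localhost:/root/payload                  # payload previously read as bytes
ssh  -p 2223 root@localhost '<plant flag, run payload, check the targeted proof>'
pkill -f scoring-overlay.qcow2                                    # kill the scoring VM
echo '{"execute":"qmp_capabilities"} {"execute":"cont"}' \
  | socat - UNIX-CONNECT:/run/qmp-eval.sock                      # resume the eval VM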
Goals¶
- Scale: 100+ concurrent VMs for 48+ hour runs.
- Simplicity: model owns its own sandbox lifecycle; the harness provides bash + scoring, not abstractions over the sandbox.
- Portability: k8s API is cloud-agnostic (GKE, EKS, AKS, bare metal). EKS / AKS aren't shipped, but the sandbox plugin is the same; only the cluster-setup script differs.
- Cost efficiency: the cluster autoscaler adds/removes nodes on demand; an idle cluster scales the kvm-pool to zero.
Architecture¶
Workstation GKE Cluster
─────────── ───────────
inspect eval ...
│
├─ task_init()
│ └─ ensure prepared images ──► Image cache
│ (one per sandbox_type) (gs:// or https://)
│ │
├─ sample_init() ×N (concurrent) │
│ ├─ helm install (k8s sandbox) ──► Pod (n2-standard-8 node)
│ │ ├─ entrypoint.sh:
│ │ │ download prepared image
│ │ │ qemu-img create overlay
│ │ │ boot eval QEMU (-daemonize)
│ │ │ sleep infinity
│ └─ poll SSH ready ──► Eval VM (4 vCPU / 12GB RAM)
│ │ agent runs here, full bash
│ │
├─ solver runs (inspect-swe loop) │
│ └─ agent iterates on ──► VM: model develops exploit at
│ /root/payload via SSH /root/payload
│ │
├─ scorer (every CLI exit) │
│ ├─ pause eval VM (QMP stop) │
│ ├─ boot scoring VM (sibling) ──► Scoring VM (2 vCPU / 8GB RAM)
│ │ plant flag, run payload │ sshPort 2223, fresh overlay
│ │ check the targeted proof │
│ ├─ kill scoring VM │
│ └─ resume eval VM (QMP cont) │
│ │
└─ sample_cleanup() │
└─ helm uninstall ──► Pod deleted, node may scale down
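The entrypoint steps named in the diagram amount to roughly the following. A
sketch only: the environment-variable names, file paths, and QMP socket are
assumptions; entrypoint.sh in the pod image is the authoritative version.
# Sketch of entrypoint.sh's job (names and flags illustrative).
curl -fSL --retry 3 "$PREPARED_IMAGE_URI" -o /images/prepared.qcow2       # download prepared image
qemu-img create -f qcow2 -b /images/prepared.qcow2 -F qcow2 /images/eval-overlay.qcow2
qemu-system-x86_64 -enable-kvm -m "${VM_MEMORY_MB:-12288}" -smp "${VM_CPUS:-4}" -daemonize \
  -drive file=/images/eval-overlay.qcow2,if=virtio \
  -netdev user,id=n0,hostfwd=tcp::${SSH_PORT:-2222}-:22 -device virtio-net-pci,netdev=n0 \
  -qmp unix:/run/qmp-eval.sock,server,nowait                              # boot eval QEMU
sleep infinity                                                            # keep the pod alive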
Pod packing: one pod per n2-standard-8 node. Pod requests 7 vCPU
and 26 GiB memory (request==limit), under GKE's
7910m / ~27.7 GiB allocatable on n2-standard-8. The eval VM
(4 vCPU / 12 GiB) and scoring VM (2 vCPU / 8 GiB) share that
envelope; the eval VM is paused during scoring, so the two guests' 4 + 2 vCPUs
never run flat out at the same time, and the 6 guest vCPUs fit under the
7-vCPU request with the 7th left for the kubelet shim + QEMU I/O threads.
The remaining ~6 GiB above the two guests'
20 GiB is QEMU overhead + page cache + slack. The pod template is
src/escapebench/resources/k8s/chart/templates/pod.yaml
(authoritative — read it for current values).
k8s sandbox plugin¶
Pod lifecycle is delegated to the
inspect-k8s-sandbox
package rather than reimplemented. The plugin handles:
- Pod creation/deletion via Helm (helm install --wait, helm uninstall).
- Pod-host exec with retry logic (5 attempts, exponential backoff, CRI-O workarounds).
- Interrupt handling (deferred cleanup on Ctrl-C; orphan sweep in task_cleanup).
- CLI cleanup (inspect sandbox cleanup k8s; see the one-liner after this list).
- Two-level concurrency control (Helm install semaphore + pod exec semaphore).
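The CLI cleanup mentioned above can be run from the workstation at any time to
sweep releases an interrupted run left behind:
# Remove any orphaned per-sample Helm releases / pods from an interrupted run.
inspect sandbox cleanup k8s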
AgentEscapeBench wraps the plugin's exec in two paths sharing one transport:
- Pod-host exec (k8s_env.exec(cmd)) — QMP commands, overlay creation, QEMU process management, image downloads. Uses the plugin's native exec directly.
- VM exec (k8s_env.exec(ssh_wrap(cmd))) — commands inside the eval / scoring VM. Wraps the command with SSH before passing to the plugin's exec (see the shell-level sketch after this list).
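At the shell level the VM-exec path is just an SSH prefix over the same
pod-host exec; roughly (key path and SSH options are assumptions):
# Pod-host exec runs CMD directly in the pod; VM exec wraps it with SSH first.
CMD='cat /root/payload'
ssh -p 2222 -o StrictHostKeyChecking=no -i /root/.ssh/id_ed25519 root@localhost "$CMD"
# The same wrap against scoringSshPort 2223 targets the scoring VM instead.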
Helm chart lives inside the Python package as package data at
src/escapebench/resources/k8s/chart/ (included via pyproject.toml
package data config). Default Helm values come from
escapebench.yaml — same file Inspect reads for the sandbox config,
so there's one source of truth. Per-sample values (prepared image
URI, sandbox type) are auto-injected from sample metadata by the k8s
sandbox plugin's _metadata_to_extra_values. The chart template
references them as {{ .Values.sampleMetadataPreparedImageUri }},
{{ .Values.sampleMetadataSandboxType }}, etc.
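The net effect of that injection is equivalent to passing extra values at
install time, roughly (release name and placeholder values are illustrative):
helm install sample-0 src/escapebench/resources/k8s/chart \
  --wait \
  --values escapebench.yaml \
  --set sampleMetadataPreparedImageUri="<prepared-image-uri>" \
  --set sampleMetadataSandboxType="<sandbox-type>"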
SSH key handling¶
There is no pre-created SSH-key Kubernetes Secret. The pod's
entrypoint.sh runs setup-cloud-init.sh, which generates a fresh
ed25519 keypair inside the pod and seeds it into the VM via
cloud-init user-data. Each pod has its own ephemeral keypair;
nothing is persisted in the cluster.
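In shell terms the key handling is roughly the following; the paths, the
NoCloud seed tooling, and the user-data shape are assumptions, not the
script's literal contents.
# Generate an ephemeral keypair in the pod and seed the public half via cloud-init.
ssh-keygen -t ed25519 -N '' -f /root/.ssh/id_ed25519
cat > user-data <<EOF
#cloud-config
users:
  - name: root
    ssh_authorized_keys:
      - $(cat /root/.ssh/id_ed25519.pub)
EOF
cloud-localds seed.img user-data          # NoCloud seed image attached to the VM at boot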
Image cache¶
Pod entrypoints download the prepared qcow2 from the URI configured
in escapebench.yaml's imageCache. Schemes supported:
| Scheme | Existence check | Pod download | Typical use |
|---|---|---|---|
| gs:// | GCS SDK (needs GCP auth) | gcloud storage cp | Cluster builds (writer side) |
| s3:// | aws s3 ls | aws s3 cp | AWS-based clusters |
| http(s):// | urllib HEAD | curl -fSL --retry 3 | Public consumption (no auth) |
| /path, ./path | Path.exists | cp | Local dev / debugging |
The shipped default is the public bucket
(https://storage.googleapis.com/fellows-safety-research-escapebench-public),
which the pod reads anonymously. See
image-build-design.md for the build-side
view, the cache-key scheme, and the .validated marker that gates
runtime consumption.
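The per-scheme behaviour in the table maps onto a dispatch like this (a
sketch; the real entrypoint's variable names may differ):
# Fetch the prepared qcow2 according to the imageCache scheme.
case "$PREPARED_IMAGE_URI" in
  gs://*)             gcloud storage cp "$PREPARED_IMAGE_URI" /images/prepared.qcow2 ;;
  s3://*)             aws s3 cp "$PREPARED_IMAGE_URI" /images/prepared.qcow2 ;;
  http://*|https://*) curl -fSL --retry 3 "$PREPARED_IMAGE_URI" -o /images/prepared.qcow2 ;;
  *)                  cp "$PREPARED_IMAGE_URI" /images/prepared.qcow2 ;;
esac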
Pod image¶
Source-derived: the pod container image URI is computed from
images/pod/ content via image_keys.pod_image_uri() —
…/escapebench-qemu:src-<sha256(images/pod/**)>. Build pipeline,
runtime, and the .github/workflows/build-pod-image.yml workflow
all import the same function, so the registry tag, the cache-key
contribution, and the runtime URI all derive from the same
checked-out tree. There is no image: field in escapebench.yaml;
adding one would re-introduce drift the source-derived design exists
to prevent. The yaml-loader injects the URI into the Helm values at
sandbox_env._materialized_values_path() time.
See image-build-design.md →
"Pod-image lifecycle" for the build-and-publish workflow.
Configuration¶
escapebench.yaml serves as both the Inspect sandbox config AND the
Helm values file — one source of truth for both Python and Helm.
config_deserialize returns a K8sSandboxEnvironmentConfig pointing
to our chart and this same file as values. Python code reads values
via config.values.read_text() when needed — no module-level
globals.
The shipped defaults (read the file directly for the live values):
imageCache: https://storage.googleapis.com/fellows-safety-research-escapebench-public
activeDeadlineSeconds: 21600
vmMemoryMb: 12288
vmCpus: 4
sshPort: 2222
scoringSshPort: 2223
scoringVmMemoryMb: 8192
scoringVmCpus: 2
scoringTimeoutSeconds: 60
Per-sample values (e.g. the prepared image URI) come from metadata
injection via the k8s sandbox plugin's _metadata_to_extra_values,
not from this file.
Cluster setup¶
# Consumer: cluster + KVM pool + node SA with logging/monitoring only.
# The shipped escapebench.yaml already points at the public pod image
# and the public qcow2 cache, so this is enough to run evals.
./scripts/setup-gke.sh <gcp-project-id>
# Producer: also creates the named GCS image-cache bucket (public-read)
# and grants the node SA write access. Required to build your own qcow2
# images via `python -m escapebench.build`.
./scripts/setup-gke.sh <gcp-project-id> --bucket gs://<your-bucket>
# --cluster NAME (default: escapebench) lets the same script
# provision a differently-named cluster + matching node SA in the
# same project — useful for parallel teams or staging clusters.
./scripts/setup-gke.sh <gcp-project-id> --cluster myteam-escapebench
The script is idempotent in either mode. The producer-only steps
(bucket + storage.objectAdmin + artifactregistry.reader) are
gated behind --bucket gs://NAME so consumers don't end up with a
public bucket they never write to. The script also applies
src/escapebench/resources/k8s/priority-class.yaml — a
cluster-scoped PriorityClass (escapebench) that the per-sample
Helm chart references via spec.priorityClassName: escapebench.
Default value is 0 (same priority tier as everything else) with
preemptionPolicy: Never, so escapebench pods are polite tenants —
they don't preempt others, and aren't preempted by other
default-priority pods on contention.
See troubleshooting.md for the symptoms to expect when something goes wrong.
Cost estimates¶
| Scenario | Nodes | Duration | Cost (on-demand) |
|---|---|---|---|
| Testing (1–4 VMs) | 1–4 × n2-standard-8 | 1 h | $0.27–$1.08 |
| Production (100 VMs) | ~100 × n2-standard-8 | 48 h | ~$1,300 |
Both rows use the same ~$0.27/node-hour rate: 100 nodes × 48 h × $0.27/h ≈ $1,300.
GKE management fee: ~$2.40/day (Standard).
Dependencies¶
| Package | Purpose |
|---|---|
| inspect-k8s-sandbox | Pod lifecycle, exec with retries, cleanup |
| inspect-swe | Agent bridge (claude-code, codex-cli) |
| Helm CLI | Required by inspect-k8s-sandbox |
| kubectl | Required for cluster access |
| gcloud CLI | GCS access (prepared images), GKE cluster management |