Eval-time FileNotFoundError mentioning a <key>.qcow2 URL
The qcow2 for that variant isn't in the bucket, or it's there but missing the .validated marker
python -m escapebench.build <variant> --image-cache gs://<bucket> builds and validates. If a qcow2 already exists from a GHA run, python -m escapebench.build --validate-only <variant> --image-cache gs://<bucket> finishes the round-trip.
Eval-time error: "image present but unvalidated; rebuild"
<key>.qcow2 is in the bucket but <key>.qcow2.validated is not
Same as above — most likely a previous build aborted between upload and validation.
Pod stuck in ImagePullBackOff, registry returns 404
The :src-<hash> tag isn't published yet
Wait for .github/workflows/build-pod-image.yml to finish on this revision, or trigger it via workflow_dispatch. Check with gcloud artifacts docker images describe "$(uv run python -c 'from escapebench.image_keys import pod_image_uri; print(pod_image_uri())')".
--validate-only fails with FileNotFoundError pointing at the full-build command
The qcow2 isn't in the bucket
Run a full build instead: python -m escapebench.build <variant> --image-cache gs://<bucket>.
python -m escapebench.build --status is silent or wrong
--status only supports gs:// URIs
Pass --image-cache gs://<bucket> explicitly; HTTPS existence checks happen on the eval path, not via --status.
Build script error: "Input listed in VariantSpec.inputs doesn't exist"
Cache-key inputs reference a file that's been moved/deleted
Update _VARIANT_SPECS in src/escapebench/build.py to match disk.
Pod scheduling stuck — autoscaler refuses to add a node
Pod requests don't fit n2-standard-8 allocatable (7910m / ~27.7 GiB)
Confirm pod.yaml requests are 7 vCPU / 26 GiB (request==limit). Anything closer to the ceiling (8/28 Gi) is unschedulable.
QEMU boot fails with "no /dev/kvm" or KVM permission errors
Node doesn't have nested virtualization, or the kvm device plugin isn't installed
Re-run ./scripts/setup-gke.sh <project> — it provisions KVM-capable nodes. Other clouds (EKS, AKS) need equivalent provisioning; the requirement is the same.
Build pod errors writing .validated marker
Node SA lacks storage.objectAdmin on the bucket
Re-run ./scripts/setup-gke.sh <project> --bucket gs://<bucket>. Producer mode (gated on --bucket) is what grants the bucket-write permission.
helm install fails with "PriorityClass not found: escapebench"
The cluster-scoped PriorityClass from priority-class.yaml wasn't applied
Re-run ./scripts/setup-gke.sh (it applies the manifest), or kubectl apply -f src/escapebench/resources/k8s/priority-class.yaml.
Sample fails with LimitExceededError after 3 identical scoring rounds
Model resubmitted the same payload bytes 3× in a row
Expected behaviour — cleanroom scoring is deterministic, so the sample ends. Bump the message budget if the model legitimately needs more attempts at different payloads.
Sample times out at activeDeadlineSeconds
Pod TTL safety net kicked in
Default is 21600s (6h). Bump in escapebench.yaml if you're running long agent runs.
Sample's mid-run state isn't readable in inspect view
--log-shared not enabled, or eval still finalizing
Add --log-shared 60 --display log to every agent invocation (already in CLAUDE.md / quickstart.md examples). The .eval zip's central directory only lands at finalize without it.
Backgrounded run's captured stdout is empty
--display full (default) is a Textual TUI that doesn't capture
Add --display log.
Multiple samples across different variants all error with "ESCAPEBENCH_ALLOW_UNVALIDATED_IMAGE" pulled images that aren't yours
The flag was leaked into the parent process env outside _run_validation
Only python -m escapebench.build's subprocess should set this. Search your shell env / CI for stray exports; the runtime's marker check exists exactly to keep un-validated bytes inert.
Validation eval inside python -m escapebench.build fails on the negative control
The qcow2 boots but the scoring pipeline broke (Helm chart, payload runner, etc.)
Inspect the validation eval's .eval log; the build's marker is not written, so the qcow2 stays inert. Fix the underlying issue and rerun the build.
Validation eval fails on the positive control (reference exploit)
Exploit drift — kernel update / package bump moved the target
Build still completes (positive control is warn-only); reference exploit needs regeneration. For pinned exploits with embedded binaries, see CLAUDE.md → "Pinned reference exploits".