Skip to content

Troubleshooting

Common failure modes when running escapebench, organized by where the error surfaces.

Image cache (qcow2 or pod image not found)

Symptom Cause Fix
Eval-time FileNotFoundError mentioning a <key>.qcow2 URL The qcow2 for that variant isn't in the bucket, or it's there but missing the .validated marker python -m escapebench.build <variant> --image-cache gs://<bucket> builds and validates. If a qcow2 already exists from a GHA run, python -m escapebench.build --validate-only <variant> --image-cache gs://<bucket> finishes the round-trip.
Eval-time error: "image present but unvalidated; rebuild" <key>.qcow2 is in the bucket but <key>.qcow2.validated is not Same as above — most likely a previous build aborted between upload and validation.
Pod stuck in ImagePullBackOff, registry returns 404 The :src-<hash> tag isn't published yet Wait for .github/workflows/build-pod-image.yml to finish on this revision, or trigger it via workflow_dispatch. Check with gcloud artifacts docker images describe "$(uv run python -c 'from escapebench.image_keys import pod_image_uri; print(pod_image_uri())')".
--validate-only fails with FileNotFoundError pointing at the full-build command The qcow2 isn't in the bucket Run a full build instead: python -m escapebench.build <variant> --image-cache gs://<bucket>.
python -m escapebench.build --status is silent or wrong --status only supports gs:// URIs Pass --image-cache gs://<bucket> explicitly; HTTPS existence checks happen on the eval path, not via --status.
Build script error: "Input listed in VariantSpec.inputs doesn't exist" Cache-key inputs reference a file that's been moved/deleted Update _VARIANT_SPECS in src/escapebench/build.py to match disk.

Cluster setup / KVM availability

Symptom Cause Fix
Pod scheduling stuck — autoscaler refuses to add a node Pod requests don't fit n2-standard-8 allocatable (7910m / ~27.7 GiB) Confirm pod.yaml requests are 7 vCPU / 26 GiB (request==limit). Anything closer to the ceiling (8/28 Gi) is unschedulable.
QEMU boot fails with "no /dev/kvm" or KVM permission errors Node doesn't have nested virtualization, or the kvm device plugin isn't installed Re-run ./scripts/setup-gke.sh <project> — it provisions KVM-capable nodes. Other clouds (EKS, AKS) need equivalent provisioning; the requirement is the same.
Build pod errors writing .validated marker Node SA lacks storage.objectAdmin on the bucket Re-run ./scripts/setup-gke.sh <project> --bucket gs://<bucket>. Producer mode (gated on --bucket) is what grants the bucket-write permission.
helm install fails with "PriorityClass not found: escapebench" The cluster-scoped PriorityClass from priority-class.yaml wasn't applied Re-run ./scripts/setup-gke.sh (it applies the manifest), or kubectl apply -f src/escapebench/resources/k8s/priority-class.yaml.

GitHub Actions producers

See wif-setup.md for the underlying SA / WIF setup; the items below are runtime symptoms.

Symptom Cause Fix
build-pod-image.yml step "Authenticate to Google Cloud" fails WIF binding is missing or scoped wrong Walk through wif-setup.md → Pod-image SA. The repo binding only allows the safety-research/agent-escape-bench repo to impersonate.
build-qcow2.yml upload step succeeds for .qcow2 but the marker never appears Expected — GHA can't write .validated markers Run python -m escapebench.build --validate-only <variant> from the cluster afterwards to finish the round-trip.
build-qcow2.yml step "Authenticate to Google Cloud" fails Same as the pod-image case but for gha-qcow2-builder See wif-setup.md → qcow2 builder SA.
build-qcow2.yml enumerate job fails for a single-variant request Variant exceeds 12 GiB guest RAM (v8-release, v8-no-sandbox) Build that variant from the cluster instead; GHA runners can't host the larger guests.

Eval-time / scoring

Symptom Cause Fix
Sample fails with LimitExceededError after 3 identical scoring rounds Model resubmitted the same payload bytes 3× in a row Expected behaviour — cleanroom scoring is deterministic, so the sample ends. Bump the message budget if the model legitimately needs more attempts at different payloads.
Sample times out at activeDeadlineSeconds Pod TTL safety net kicked in Default is 21600s (6h). Bump in escapebench.yaml if you're running long agent runs.
Sample's mid-run state isn't readable in inspect view --log-shared not enabled, or eval still finalizing Add --log-shared 60 --display log to every agent invocation (already in CLAUDE.md / quickstart.md examples). The .eval zip's central directory only lands at finalize without it.
Backgrounded run's captured stdout is empty --display full (default) is a Textual TUI that doesn't capture Add --display log.
Multiple samples across different variants all error with "ESCAPEBENCH_ALLOW_UNVALIDATED_IMAGE" pulled images that aren't yours The flag was leaked into the parent process env outside _run_validation Only python -m escapebench.build's subprocess should set this. Search your shell env / CI for stray exports; the runtime's marker check exists exactly to keep un-validated bytes inert.
Validation eval inside python -m escapebench.build fails on the negative control The qcow2 boots but the scoring pipeline broke (Helm chart, payload runner, etc.) Inspect the validation eval's .eval log; the build's marker is not written, so the qcow2 stays inert. Fix the underlying issue and rerun the build.
Validation eval fails on the positive control (reference exploit) Exploit drift — kernel update / package bump moved the target Build still completes (positive control is warn-only); reference exploit needs regeneration. For pinned exploits with embedded binaries, see CLAUDE.md → "Pinned reference exploits".

Local development

Symptom Cause Fix
uv run pytest hits import errors on the kernel family Stale references to a removed family Search for KERNEL in your branch; the kernel family was dropped 2026-04-27 (see dropped-cves.md → "Kernel family").
Pre-commit hook fails on ruff errors during commit Hook caught lint or format issues uv run ruff check src/ --fix && uv run ruff format src/, re-stage, re-commit. Don't --no-verify past it.
Tests in tests/test_build.py::TestPkrHclPathRefs fail A ${path.root}/X reference in a Packer template doesn't resolve under the assembled build dir Either fix the template or update _VARIANT_SPECS[variant].inputs to include the missing file.