Troubleshooting¶

Common failure modes when running escapebench, organized by where the error surfaces.

Image cache (qcow2 or pod image not found)¶

Symptom	Cause	Fix
Eval-time `FileNotFoundError` mentioning a `<key>.qcow2` URL	The qcow2 for that variant isn't in the bucket, or it's there but missing the `.validated` marker	`python -m escapebench.build <variant> --image-cache gs://<bucket>` builds and validates. If a qcow2 already exists from a GHA run, `python -m escapebench.build --validate-only <variant> --image-cache gs://<bucket>` finishes the round-trip.
Eval-time error: "image present but unvalidated; rebuild"	`<key>.qcow2` is in the bucket but `<key>.qcow2.validated` is not	Same as above — most likely a previous build aborted between upload and validation.
Pod stuck in `ImagePullBackOff`, registry returns 404	The `:src-<hash>` tag isn't published yet	Wait for `.github/workflows/build-pod-image.yml` to finish on this revision, or trigger it via `workflow_dispatch`. Check with `gcloud artifacts docker images describe "$(uv run python -c 'from escapebench.image_keys import pod_image_uri; print(pod_image_uri())')"`.
`--validate-only` fails with FileNotFoundError pointing at the full-build command	The qcow2 isn't in the bucket	Run a full build instead: `python -m escapebench.build <variant> --image-cache gs://<bucket>`.
`python -m escapebench.build --status` is silent or wrong	`--status` only supports `gs://` URIs	Pass `--image-cache gs://<bucket>` explicitly; HTTPS existence checks happen on the eval path, not via `--status`.
Build script error: "Input listed in `VariantSpec.inputs` doesn't exist"	Cache-key inputs reference a file that's been moved/deleted	Update `_VARIANT_SPECS` in `src/escapebench/build.py` to match disk.

Cluster setup / KVM availability¶

Symptom	Cause	Fix
Pod scheduling stuck — autoscaler refuses to add a node	Pod requests don't fit `n2-standard-8` allocatable (7910m / ~27.7 GiB)	Confirm `pod.yaml` requests are 7 vCPU / 26 GiB (request==limit). Anything closer to the ceiling (8/28 Gi) is unschedulable.
QEMU boot fails with "no /dev/kvm" or KVM permission errors	Node doesn't have nested virtualization, or the kvm device plugin isn't installed	Re-run `./scripts/setup-gke.sh <project>` — it provisions KVM-capable nodes. Other clouds (EKS, AKS) need equivalent provisioning; the requirement is the same.
Build pod errors writing `.validated` marker	Node SA lacks `storage.objectAdmin` on the bucket	Re-run `./scripts/setup-gke.sh <project> --bucket gs://<bucket>`. Producer mode (gated on `--bucket`) is what grants the bucket-write permission.
`helm install` fails with "PriorityClass not found: escapebench"	The cluster-scoped `PriorityClass` from `priority-class.yaml` wasn't applied	Re-run `./scripts/setup-gke.sh` (it applies the manifest), or `kubectl apply -f src/escapebench/resources/k8s/priority-class.yaml`.

GitHub Actions producers¶

See wif-setup.md for the underlying SA / WIF setup; the items below are runtime symptoms.

Symptom	Cause	Fix
`build-pod-image.yml` step "Authenticate to Google Cloud" fails	WIF binding is missing or scoped wrong	Walk through `wif-setup.md` → Pod-image SA. The repo binding only allows the `safety-research/agent-escape-bench` repo to impersonate.
`build-qcow2.yml` upload step succeeds for `.qcow2` but the marker never appears	Expected — GHA can't write `.validated` markers	Run `python -m escapebench.build --validate-only <variant>` from the cluster afterwards to finish the round-trip.
`build-qcow2.yml` step "Authenticate to Google Cloud" fails	Same as the pod-image case but for `gha-qcow2-builder`	See `wif-setup.md` → qcow2 builder SA.
`build-qcow2.yml` `enumerate` job fails for a single-variant request	Variant exceeds 12 GiB guest RAM (`v8-release`, `v8-no-sandbox`)	Build that variant from the cluster instead; GHA runners can't host the larger guests.

Eval-time / scoring¶

Symptom	Cause	Fix
Sample fails with `LimitExceededError` after 3 identical scoring rounds	Model resubmitted the same payload bytes 3× in a row	Expected behaviour — cleanroom scoring is deterministic, so the sample ends. Bump the message budget if the model legitimately needs more attempts at different payloads.
Sample times out at `activeDeadlineSeconds`	Pod TTL safety net kicked in	Default is 21600s (6h). Bump in `escapebench.yaml` if you're running long agent runs.
Sample's mid-run state isn't readable in `inspect view`	`--log-shared` not enabled, or eval still finalizing	Add `--log-shared 60 --display log` to every agent invocation (already in CLAUDE.md / quickstart.md examples). The `.eval` zip's central directory only lands at finalize without it.
Backgrounded run's captured stdout is empty	`--display full` (default) is a Textual TUI that doesn't capture	Add `--display log`.
Multiple samples across different variants all error with "ESCAPEBENCH_ALLOW_UNVALIDATED_IMAGE" pulled images that aren't yours	The flag was leaked into the parent process env outside `_run_validation`	Only `python -m escapebench.build`'s subprocess should set this. Search your shell env / CI for stray exports; the runtime's marker check exists exactly to keep un-validated bytes inert.
Validation eval inside `python -m escapebench.build` fails on the negative control	The qcow2 boots but the scoring pipeline broke (Helm chart, payload runner, etc.)	Inspect the validation eval's `.eval` log; the build's marker is not written, so the qcow2 stays inert. Fix the underlying issue and rerun the build.
Validation eval fails on the positive control (reference exploit)	Exploit drift — kernel update / package bump moved the target	Build still completes (positive control is warn-only); reference exploit needs regeneration. For pinned exploits with embedded binaries, see `CLAUDE.md` → "Pinned reference exploits".

Local development¶

Symptom	Cause	Fix
`uv run pytest` hits import errors on the kernel family	Stale references to a removed family	Search for `KERNEL` in your branch; the kernel family was dropped 2026-04-27 (see `dropped-cves.md` → "Kernel family").
Pre-commit hook fails on `ruff` errors during commit	Hook caught lint or format issues	`uv run ruff check src/ --fix && uv run ruff format src/`, re-stage, re-commit. Don't `--no-verify` past it.
Tests in `tests/test_build.py::TestPkrHclPathRefs` fail	A `${path.root}/X` reference in a Packer template doesn't resolve under the assembled build dir	Either fix the template or update `_VARIANT_SPECS[variant].inputs` to include the missing file.