fix(eval): exit non-zero when any sheet hard-fails by ebootheee · Pull Request #73 · ebootheee/excel-to-engine

ebootheee · 2026-06-10T16:39:39Z

What

Observed live on tonight's A-1 canonical eval: the 17-sheet cluster child OOMed its 12GB heap, only the 3 standalone sheets were scored, and the harness printed "Overall accuracy: 99.9%" and exited 0. That is a confidently wrong summary from the canonical harness itself. A crashed sheet contributes zero tested cells, so the accuracy-only exit gate (>= 85%) never saw it.

Hard failures (status crash/oom/error) now force exit 1, with a loud summary line. The report still records surviving sheets' accuracy and the failed sheets' status — honest and visible, not masked.
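
The gate described above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: the report shape, the function names (`summarize`, `exitCodeFor`), the summary wording, and the cell counts are all assumptions made for the example. Only the hard-fail statuses and the 85% threshold come from the description.

```javascript
// Sketch only: the report shape and every name below are hypothetical.
const HARD_FAIL_STATUSES = new Set(["crash", "oom", "error"]);
const ACCURACY_THRESHOLD = 0.85; // the >= 85% gate from the description

// Aggregate accuracy over the sheets that were actually scored.
function summarize(sheets) {
  let tested = 0;
  let matched = 0;
  const hardFailed = [];
  for (const sheet of sheets) {
    if (HARD_FAIL_STATUSES.has(sheet.status)) {
      hardFailed.push(sheet.name); // contributes zero tested cells
      continue;
    }
    tested += sheet.tested;
    matched += sheet.matched;
  }
  const accuracy = tested === 0 ? 0 : matched / tested;
  return { accuracy, hardFailed };
}

// Post-fix gate: any hard failure forces a non-zero exit code.
function exitCodeFor(summary) {
  if (summary.hardFailed.length > 0) return 1;
  return summary.accuracy >= ACCURACY_THRESHOLD ? 0 : 1;
}

// Made-up fixture: one OOMed cluster next to three healthy standalone sheets.
const report = [
  { name: "cluster-17", status: "oom", tested: 0, matched: 0 },
  { name: "standalone-a", status: "ok", tested: 1000, matched: 999 },
  { name: "standalone-b", status: "ok", tested: 500, matched: 500 },
  { name: "standalone-c", status: "ok", tested: 500, matched: 499 },
];

const summary = summarize(report);
console.log(`Overall accuracy: ${(summary.accuracy * 100).toFixed(1)}%`);
if (summary.hardFailed.length > 0) {
  console.error(`HARD FAILURES: ${summary.hardFailed.join(", ")}`);
}
// A real harness would pass this value to process.exit().
console.log(`exit code: ${exitCodeFor(summary)}`);
```

In this fixture the accuracy over scored cells is still high, because the OOMed cluster adds nothing to either counter. The exit code is 1 only because the gate checks `hardFailed` before it looks at accuracy.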

Tests

New test-per-sheet-eval-exit-honesty.mjs: builds a cluster and a healthy standalone sheet through the real rust-parser, then kills the cluster child via EVAL_CLUSTER_TIMEOUT_MS=10. Negative-controlled via stash: pre-fix, the 100% standalone sheet hides the dead cluster (exit 0); post-fix, the harness exits 1. 5/5 passing; eval suites green.
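
The negative control can be illustrated in memory, without the real rust-parser or a child process. The sketch below is only an analogue of the test described above: `gateBeforeFix`, `gateAfterFix`, `check`, and the fixture are hypothetical names and data, and the pre-fix gate is reconstructed from the description rather than copied from the repository.

```javascript
// Sketch only: both gates and the fixture are hypothetical stand-ins.
const THRESHOLD = 0.85;

// Pre-fix behaviour as described: accuracy is the only input to the gate.
const gateBeforeFix = (s) => (s.accuracy >= THRESHOLD ? 0 : 1);

// Post-fix behaviour: hard-failed sheets force exit 1 regardless of accuracy.
const gateAfterFix = (s) =>
  s.hardFailed.length > 0 || s.accuracy < THRESHOLD ? 1 : 0;

// Fixture mirroring the test: a dead cluster next to a 100% standalone sheet.
const fixture = { accuracy: 1.0, hardFailed: ["cluster"] };

function check(label, actual, expected) {
  if (actual !== expected) {
    throw new Error(`${label}: expected exit ${expected}, got ${actual}`);
  }
  console.log(`ok - ${label}`);
}

// Negative control: the old gate must miss the failure, otherwise the
// regression test would not be exercising the fix at all.
check("pre-fix gate hides the dead cluster", gateBeforeFix(fixture), 0);
check("post-fix gate reports the dead cluster", gateAfterFix(fixture), 1);
```

Running the same fixture through both gates is what makes the test meaningful: if the pre-fix gate already returned 1, a passing post-fix assertion would prove nothing about the change.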

🤖 Generated with Claude Code

…s a dishonest gate

A crashed/OOMed sheet contributes ZERO tested cells, so the accuracy-only exit gate (>=85%) never saw it. Observed live on the A-1 canonical eval: the 17-sheet cluster child OOMed its 12GB heap, only the 3 standalone sheets were scored, and the harness printed an overall accuracy of 99.9% and exited 0, a confidently wrong summary from the canonical harness itself.

Hard failures (status crash/oom/error) now force exit 1; the report still records the surviving sheets' accuracy and the failed sheets' status (honest and visible, not masked). Regression test kills the cluster child via a 10ms EVAL_CLUSTER_TIMEOUT_MS next to a healthy 100% standalone sheet: red pre-fix (exit 0), green post-fix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ebootheee merged commit 04f7ea3 into main Jun 10, 2026
2 checks passed

ebootheee merged 1 commit into main from fix/eval-exit-honesty
