Audit remediation Phase 1: F-6 exit codes, F-5a model override, F-1 yq guard#32
Audit remediation Phase 1: F-6 exit codes, F-5a model override, F-1 yq guard#32alanshurafa wants to merge 3 commits into
Conversation
die() ignored its optional second argument and always exited 1, so the
~50 call sites that pass a meaningful code (2,3,5,6,8,9,10) collapsed to a
generic failure -- e.g. pr-emitter.sh's "yq required" guard intended exit
2. Honor "${2:-1}" in both die() definitions (the canonical one in lib/
and the guarded fallback in evals/) so a caller's intent reaches the
shell; the default stays 1.
Add tests/die-exit-code-simulation.sh covering both definitions. The full
existing suite shows exact baseline parity -- no path regressed on the
now-activated codes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
invoke_claude hardcoded "--model claude-opus-4-6" in both the WSL and the direct branch, so the model could not change without editing source. Read $CLAUDE_MODEL instead (default unchanged at claude-opus-4-6) and add a --claude-model flag to co-evolve-bouncer.sh. Precedence: flag > CLAUDE_MODEL env > default. agent-bouncer.sh has no option parser, so it inherits the env var only; a flag there is deferred (4.3 may retire that runner). Add tests/claude-model-override-simulation.sh (default, env override, flag wiring, --help). Full suite parity holds -- only the pre-existing template-proposer failure remains, unrelated to this change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All three yq guards did presence-only checks (command -v yq), so the incompatible python yq (Debian/Ubuntu apt install yq) passed and then failed later with opaque jq-ish errors. Add require_mikefarah_yq to lib/co-evolution.sh -- discriminate on `yq --version | grep -qi mikefarah` -- and call it from evals ensure_yq; inline the same check at the two PEL sites that cannot source lib by their self-containment invariants (proposer/policy keeps echo+exit 2, pr-emitter keeps die ... 2). Document the apt install yq trap in README, evals/README, lab/pel/README. Chose the version string over the plan's functional-probe preference: a behavioral probe is harder to make robust here, while mikefarah reliably prints "mikefarah" in --version and the python yq does not. Add tests/yq-guard-simulation.sh (helper accept/reject, drift coverage at all three sites, end-to-end python rejection at exit 2 -- which also exercises the F-6 exit-code fix). Full suite parity holds; only the pre-existing template-proposer failure remains. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b2631ad72d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| command -v yq >/dev/null 2>&1 || die "yq not found. Install mikefarah/yq: 'scoop install yq' (Windows), 'brew install yq' (macOS), 'apt install yq' (Linux go-install)." | ||
| # F-1: delegate to the shared guard in lib/co-evolution.sh (in scope at runtime | ||
| # via the eval-harness callers, which source both libs); rejects the python yq. | ||
| require_mikefarah_yq |
There was a problem hiding this comment.
Keep evals yq guard self-contained
When the eval library is executed directly via its supported --self-test entry point, this call now fails with require_mikefarah_yq: command not found because the self-test does not source lib/co-evolution.sh before calling ensure_yq. The main eval scripts source both libraries, but the library's own self-test and any standalone source of evals/lib/co-evolution-evals.sh now regress before reaching the actual yq check.
Useful? React with 👍 / 👎.
What this does
Phase 1 of the performance-audit remediation (
REMEDIATION-PLAN.md, #31) -- the correctness and portability fixes. Three findings, each its own commit, each test-first.F-6 --
die()now honors its exit codedie()ignored its optional second argument and always exited 1, silently collapsing ~51 call sites across 8 files that pass meaningful codes (2, 3, 5, 6, 8, 9, 10). For example,pr-emitter's "yq required" guard meant exit 2 but signalled a generic failure. Bothdie()definitions nowexit "${2:-1}"; the default stays 1.F-5a -- Claude model is overridable
invoke_claudehardcoded--model claude-opus-4-6in two places. It now reads$CLAUDE_MODEL(default unchanged) with a--claude-modelflag onco-evolve-bouncer.sh. Precedence: flag >CLAUDE_MODELenv > default.agent-bouncer.shhas no option parser, so it inherits the env var only.F-1 -- reject the wrong
yqAll three yq guards were presence-only (
command -v yq), so Debian/Ubuntu's incompatible pythonyq(fromapt install yq) passed and then failed later with opaque jq-ish errors. A sharedrequire_mikefarah_yqhelper now checksyq --versionfor the mikefarah signature; the two PEL guards that can't source the lib (self-containment invariants) inline the same check. Theapttrap is documented in three READMEs.Verification
Each fix has a hermetic test that fails before the change and passes after. The full existing suite was run before and after against master in a clean worktree -- exact parity, no regressions across the ~50 exit-code paths F-6 activated.
Notes
template-proposer-simulation.shfails from non-main worktrees. This is pre-existing (the test hardcodes a path to one checkout instead of using$REPO_ROOT) and is not introduced here; tracked separately.review-verdict.jsoncopies and enforces the loose shape, while the strict copies feed the PowerShell runner -- a design decision with cross-runner risk, not a mechanical dedup.shellcheckisn't wired yet (that lands in Phase 2);bash -nis clean on every touched file.🤖 Generated with Claude Code