-
Notifications
You must be signed in to change notification settings - Fork 0
Audit remediation Phase 1: F-6 exit codes, F-5a model override, F-1 yq guard #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
alanshurafa
wants to merge
3
commits into
master
Choose a base branch
from
fix/audit-phase-1-correctness
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| #!/usr/bin/env bash | ||
| # tests/claude-model-override-simulation.sh | ||
| # Hermetic gate for the Claude model override (audit finding F-5a). | ||
| # | ||
| # invoke_claude() previously hardcoded "--model claude-opus-4-6" in two places. | ||
| # After the fix it must read $CLAUDE_MODEL (default claude-opus-4-6), so the | ||
| # model is overridable via the CLAUDE_MODEL env var and the --claude-model flag | ||
| # without changing the default. Precedence: --claude-model > CLAUDE_MODEL > default. | ||
| # | ||
| # Coverage: | ||
| # 1. default model is the unchanged claude-opus-4-6 | ||
| # 2. CLAUDE_MODEL env override reaches the claude invocation | ||
| # 3. co-evolve-bouncer.sh wires --claude-model to CLAUDE_MODEL | ||
| # 4. --help documents --claude-model | ||
| # | ||
| # Pattern: a PATH-injected claude stub records the --model value it receives. | ||
|
|
||
| set -uo pipefail | ||
|
|
||
| SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" | ||
| LIB="$REPO_ROOT/lib/co-evolution.sh" | ||
| BOUNCER="$REPO_ROOT/co-evolve-bouncer.sh" | ||
|
|
||
| TEST_DIR="$(mktemp -d -t claude-model-XXXXXX)" | ||
| trap 'rm -rf "$TEST_DIR"' EXIT | ||
|
|
||
| TOTAL=0 | ||
| FAILURES=0 | ||
| pass() { printf "PASS: %s\n" "$1"; } | ||
| fail() { printf "FAIL: %s\n" "$1" >&2; FAILURES=$((FAILURES + 1)); } | ||
|
|
||
| # claude stub: record the value following --model, then emit a document body. | ||
| mkdir -p "$TEST_DIR/bin" | ||
| cat > "$TEST_DIR/bin/claude" <<'STUB' | ||
| #!/usr/bin/env bash | ||
| if [[ "$*" == *"--version"* ]]; then echo "claude 1.0.0 (model-stub)"; exit 0; fi | ||
| model="" | ||
| while [[ $# -gt 0 ]]; do | ||
| if [[ "$1" == "--model" ]]; then model="${2:-}"; shift 2; continue; fi | ||
| shift | ||
| done | ||
| [[ -n "${MODEL_MARKER:-}" ]] && printf '%s' "$model" > "$MODEL_MARKER" | ||
| cat > /dev/null # consume stdin | ||
| echo "Stub document body with enough plain words to clear any downstream size check." | ||
| STUB | ||
| chmod +x "$TEST_DIR/bin/claude" | ||
|
|
||
| # --- Scenario 1: default model is claude-opus-4-6 --- | ||
| TOTAL=$((TOTAL + 1)) | ||
| marker="$TEST_DIR/m_default" | ||
| ( | ||
| unset CLAUDE_MODEL | ||
| export MODEL_MARKER="$marker" | ||
| export PATH="$TEST_DIR/bin:$PATH" | ||
| source "$LIB" | ||
| invoke_claude /dev/null "$TEST_DIR/out1" "$TEST_DIR/err1" false | ||
| ) >/dev/null 2>&1 | ||
| got="$(cat "$marker" 2>/dev/null || true)" | ||
| if [[ "$got" == "claude-opus-4-6" ]]; then | ||
| pass "default model is claude-opus-4-6 (got '$got')" | ||
| else | ||
| fail "default: expected claude-opus-4-6, got '$got'" | ||
| fi | ||
|
|
||
| # --- Scenario 2: CLAUDE_MODEL env override reaches the invocation --- | ||
| TOTAL=$((TOTAL + 1)) | ||
| marker="$TEST_DIR/m_env" | ||
| ( | ||
| export CLAUDE_MODEL="claude-test-env-xyz" | ||
| export MODEL_MARKER="$marker" | ||
| export PATH="$TEST_DIR/bin:$PATH" | ||
| source "$LIB" | ||
| invoke_claude /dev/null "$TEST_DIR/out2" "$TEST_DIR/err2" false | ||
| ) >/dev/null 2>&1 | ||
| got="$(cat "$marker" 2>/dev/null || true)" | ||
| if [[ "$got" == "claude-test-env-xyz" ]]; then | ||
| pass "CLAUDE_MODEL env override honored (got '$got')" | ||
| else | ||
| fail "env override: expected claude-test-env-xyz, got '$got'" | ||
| fi | ||
|
|
||
| # --- Scenario 3: runner wires --claude-model to CLAUDE_MODEL --- | ||
| TOTAL=$((TOTAL + 1)) | ||
| if grep -Eq -- '--claude-model\)' "$BOUNCER" && grep -Fq 'CLAUDE_MODEL="$2"' "$BOUNCER"; then | ||
| pass "co-evolve-bouncer.sh wires --claude-model to CLAUDE_MODEL" | ||
| else | ||
| fail "co-evolve-bouncer.sh missing --claude-model -> CLAUDE_MODEL wiring" | ||
| fi | ||
|
|
||
| # --- Scenario 4: --help documents --claude-model --- | ||
| TOTAL=$((TOTAL + 1)) | ||
| help_out="$(bash "$BOUNCER" --help 2>/dev/null || true)" | ||
| if [[ "$help_out" == *"--claude-model"* ]]; then | ||
| pass "--help documents --claude-model" | ||
| else | ||
| fail "--help does not document --claude-model" | ||
| fi | ||
|
|
||
| passed=$((TOTAL - FAILURES)) | ||
| if (( FAILURES == 0 )); then | ||
| echo "$passed/$TOTAL scenarios passed" | ||
| exit 0 | ||
| else | ||
| echo "$passed/$TOTAL scenarios passed ($FAILURES failed)" >&2 | ||
| exit 1 | ||
| fi |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| #!/usr/bin/env bash | ||
| # tests/die-exit-code-simulation.sh | ||
| # Hermetic unit gate for die() exit-code propagation (audit finding F-6). | ||
| # | ||
| # die() historically ignored its optional second argument and always exited 1, | ||
| # silently collapsing ~50 call sites that pass meaningful codes (2,3,5,6,8,9,10) | ||
| # down to 1. After the fix die() must `exit "${2:-1}"`: honor an explicit code, | ||
| # default to 1 when none is given. Both definitions are covered — the canonical | ||
| # one in lib/co-evolution.sh and the guarded fallback in | ||
| # evals/lib/co-evolution-evals.sh — so the two cannot drift. | ||
| # | ||
| # Pattern: each case runs in a pristine `bash -c` that sources one lib and calls | ||
| # die, so no function definition leaks between cases and the | ||
| # sourced-in-production path is exercised faithfully. | ||
|
|
||
| set -uo pipefail # NOT -e: cases capture their own exit codes | ||
|
|
||
| SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" | ||
|
|
||
| LIB="$REPO_ROOT/lib/co-evolution.sh" | ||
| EVALS_LIB="$REPO_ROOT/evals/lib/co-evolution-evals.sh" | ||
|
|
||
| TOTAL=0 | ||
| FAILURES=0 | ||
|
|
||
| pass() { printf "PASS: %s\n" "$1"; } | ||
| fail() { printf "FAIL: %s\n" "$1" >&2; FAILURES=$((FAILURES + 1)); } | ||
|
|
||
| # run_die <lib-path> <die-call> -> prints the exit code die produced. | ||
| run_die() { | ||
| local lib="$1" call="$2" | ||
| bash -c "source \"$lib\"; $call" >/dev/null 2>&1 | ||
| printf '%s' "$?" | ||
| } | ||
|
|
||
| # check <description> <lib-path> <die-call> <expected-code> | ||
| check() { | ||
| TOTAL=$((TOTAL + 1)) | ||
| local got | ||
| got="$(run_die "$2" "$3")" | ||
| if [[ "$got" == "$4" ]]; then | ||
| pass "$1 (exit $got)" | ||
| else | ||
| fail "$1: expected exit $4, got $got" | ||
| fi | ||
| } | ||
|
|
||
| check "lib die: no code defaults to 1" "$LIB" 'die "boom"' 1 | ||
| check "lib die: explicit 2 honored" "$LIB" 'die "boom" 2' 2 | ||
| check "lib die: explicit 5 honored" "$LIB" 'die "boom" 5' 5 | ||
| check "lib die: explicit 10 honored" "$LIB" 'die "boom" 10' 10 | ||
| check "evals die: no code defaults to 1" "$EVALS_LIB" 'die "boom"' 1 | ||
| check "evals die: explicit 3 honored" "$EVALS_LIB" 'die "boom" 3' 3 | ||
|
|
||
| # die() must still log the message before exiting (fix preserves logging). | ||
| TOTAL=$((TOTAL + 1)) | ||
| msg_out="$(bash -c "source \"$LIB\"; die \"distinct-marker-xyz\" 2" 2>&1 || true)" | ||
| if [[ "$msg_out" == *"ERROR: distinct-marker-xyz"* ]]; then | ||
| pass "lib die: logs ERROR message before exit" | ||
| else | ||
| fail "lib die: expected ERROR message in output, got: $msg_out" | ||
| fi | ||
|
|
||
| passed=$((TOTAL - FAILURES)) | ||
| if (( FAILURES == 0 )); then | ||
| echo "$passed/$TOTAL scenarios passed" | ||
| exit 0 | ||
| else | ||
| echo "$passed/$TOTAL scenarios passed ($FAILURES failed)" >&2 | ||
| exit 1 | ||
| fi |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the eval library is executed directly via its supported
--self-testentry point, this call now fails withrequire_mikefarah_yq: command not foundbecause the self-test does not sourcelib/co-evolution.shbefore callingensure_yq. The main eval scripts source both libraries, but the library's own self-test and any standalone source ofevals/lib/co-evolution-evals.shnow regress before reaching the actual yq check.Useful? React with 👍 / 👎.