v1.5: claude-build (Codex orchestrates, Claude executes) + Opus default + Windows status-reader fix#37
Conversation
… research notes Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…hase 1) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…Phase 2) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… + status reader (v1.5 Phase 3) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s (v1.5 Phase 5) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ier seat now real (v1.5 Phase 6) Found by live dogfood: (1) --preset codex-build --verifier codex leaked VERIFIER_MODEL=fable into CODEX_MODEL (ChatGPT plan rejects with HTTP 400); apply_seat_env now drops a cross-agent-kind model+effort pair as a unit. (2) review-verdict.json lacked additionalProperties:false on issues.items and complete required lists (OpenAI strict structured output) — the codex --output-schema verify path could never have produced a verdict. Proven E2E: real run exits 0 with verdict APPROVED@96, verify tokens captured. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ATE live, token evidence note Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Native jq.exe cannot open Git Bash process-substitution paths (/proc/<pid>/fd/N), so dev-review-status.sh --json failed only on Windows. Replace --slurpfile <(jq ...) with --argjson "$(jq -c ...)". Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Fable access was lost, so the codex-build preset pointed its two Claude seats (plan + review) at an unreachable model. Default them through a best/opus alias (-> claude-opus-4-8) so a future model bump is one line in resolve_claude_model_alias. Extend the cross-agent leak guard to drop best|opus from codex seats, document a session-follow opt-in, and move the docs-sync + preset-expansion drift guards in lockstep. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Mirror of codex-build: a Codex-hosted orchestrator that plans and reviews while Claude executes the build. Adds the claude-build preset arm (Codex plans/reviews at xhigh, model unpinned; Claude executes at best/high), help + unknown-preset text, the Codex-facing protocol doc dev-review/codex/claude-build.md (synchronous gate; HARD-requires claude auth since Claude is the executor; detached+polling documented as future design only), routing + AGENTS contract, and drift-guard scenarios pinning both preset triples. Built via codex-build (Codex executed, reviewed in-session). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7934bcdb2b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| write_state_field "$STATE_JSON" ".completed_at" "string" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" | ||
| # v1.5 Phase 3: plan-only runs also reach a clean terminal — clear the in-flight | ||
| # phase so a status reader does not see a stale compose/bounce as "still running". | ||
| write_state_field "$STATE_JSON" ".current_phase" "null" |
There was a problem hiding this comment.
Mark plan-only runs terminal in state
In the --plan-only exit path, this records completed_at and clears current_phase but leaves the initial .status as pending. I verified --skip-plan --plan ... --plan-only exits 0 with state.status still "pending"; dev-review-status.sh --json then treats the finished run as non-terminal/dead and exits 4. Set .status based on PLAN_EXIT before cleanup so completed plan-only runs are reported as completed/partial instead of presumed dead.
Useful? React with 👍 / 👎.
| --claude-model) | ||
| [[ $# -gt 1 ]] || die "--claude-model requires a value" | ||
| # v1.5: resolve friendly alias (fable -> claude-fable-5) else passthrough. | ||
| CLAUDE_MODEL=$(resolve_claude_model_alias "$2") |
There was a problem hiding this comment.
Let --claude-model override preset seats
When a preset has already set a Claude seat model such as COMPOSER_MODEL, EXECUTOR_MODEL, or VERIFIER_MODEL to best, this flag updates only the global CLAUDE_MODEL; apply_seat_env later prefers the seat-specific model, so --preset claude-build --claude-model claude-opus-4-6 still invokes Claude with --model claude-opus-4-8 (reproduced with stubs: seat_models.executor stayed opus:claude-opus-4-8@high). That contradicts the new last-wins preset help and prevents callers from pinning Claude via the documented flag; update or clear the active Claude seat overrides here.
Useful? React with 👍 / 👎.
The execute/verify auth gate scanned both the output and the full stderr log for auth substrings unconditionally, so any run whose working log echoed auth strings — plan text, or the auth-detection source itself — was misread as "authentication failed" and aborted with no verdict (observed building claude-build via codex-build). Mirror lib's robust validate_agent_artifact discriminator: an auth banner in the output counts only when the output is short (<50 words), and a banner in stderr counts only when the output is empty (a real work product means the CLI authenticated). Adds reliability sim S5a-S5d covering the regression. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three changes stack here, smallest blast radius first.
What's in it
Status reader works under native Windows jq (
733972a)dev-review-status.sh --jsonfailed only on Windows: nativejq.execan't open Git Bash process-substitution paths (/proc/<pid>/fd/N). Swapped--slurpfile <(jq ...)for--argjson "$(jq -c ...)". Every other--slurpfilein the repo reads a real file, so this was the only offender.codex-build defaults to Opus, not Fable (
e3d7fdc)Fable access was lost, so the preset's two Claude seats pointed at an unreachable model. They now default through a
best/opusalias (→claude-opus-4-8), so the next model bump is a one-line edit inresolve_claude_model_alias. The cross-agent leak guard learned to dropbest|opusfrom codex seats, and the docs-sync + preset-expansion drift guards moved in lockstep. A session-follow opt-in is documented for callers who want the seats to match their session model.claude-build— the mirror of codex-build (7934bcd)A Codex-hosted orchestrator: Codex plans and reviews while Claude executes the build. The reverse ladder already worked via role flags; this names it as a preset (Codex plans/reviews at xhigh model-unpinned, Claude executes at best/high) and adds the Codex-facing protocol doc
dev-review/codex/claude-build.md. Because Codex has no harness wake-on-exit, the orchestration is synchronous; a detached+polling variant is documented as future design only. The Claude-executor path hard-requiresclaudeauth (no degrade — degrading the executor to Codex is just codex-build).This feature was itself built through codex-build: Codex executed the plan in an isolated worktree, reviewed in-session.
Verification
Full suite green locally on Windows (25/25). This PR exists to run the same suite on the macOS (BSD / bash 3.2) and Ubuntu legs of the CI matrix — macOS is the GNU-ism canary. Static audit of the new code found no portability hazards.
🤖 Generated with Claude Code