
v1.5: claude-build (Codex orchestrates, Claude executes) + Opus default + Windows status-reader fix #37

Merged: alanshurafa merged 12 commits into master from feat/v1.5-codex-build on Jun 13, 2026

@alanshurafa (Owner)

Three changes stack here, smallest blast radius first.

What's in it

Status reader works under native Windows jq (733972a)
dev-review-status.sh --json failed only on Windows: native jq.exe can't open Git Bash process-substitution paths (/proc/<pid>/fd/N). Swapped --slurpfile <(jq ...) for --argjson "$(jq -c ...)". Every other --slurpfile in the repo reads a real file, so this was the only offender.
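A minimal sketch of that swap, assuming a made-up state payload and filter (the real filter in dev-review-status.sh is not shown in this PR text). One detail worth keeping in mind: `--slurpfile` binds an array of the JSON values it reads, while `--argjson` binds the value itself, so the `[0]` index disappears from the filter.

```shell
#!/usr/bin/env bash
# Sketch only: `state` and the filter are illustrative, not the script's real ones.
state='{"status":"completed","current_phase":null}'

# Before: process substitution hands jq a /proc/<pid>/fd/N path, which
# native jq.exe cannot open under Git Bash.
#   jq -n -c --slurpfile s <(printf '%s' "$state" | jq -c .) '{run: $s[0]}'

# After: the inner jq output is passed as an argument, so jq opens no path.
build_summary() {
  jq -n -c --argjson s "$(printf '%s' "$1" | jq -c .)" '{run: $s}'
}

summary=$(build_summary "$state")
printf '%s\n' "$summary"
```

Because the JSON now travels on the command line, this form suits small payloads; a very large document could run into the platform's argument-length limit.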

codex-build defaults to Opus, not Fable (e3d7fdc)
Fable access was lost, so the preset's two Claude seats pointed at an unreachable model. They now default through a best/opus alias (→ claude-opus-4-8), so the next model bump is a one-line edit in resolve_claude_model_alias. The cross-agent leak guard learned to drop best|opus from codex seats, and the docs-sync + preset-expansion drift guards moved in lockstep. A session-follow opt-in is documented for callers who want the seats to match their session model.
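The alias indirection could look roughly like the sketch below. The function name and the `best`/`opus` and `fable` mappings come from this PR; the body is an assumption, and the real function may handle more names.

```shell
#!/usr/bin/env bash
# Sketch only: the real resolve_claude_model_alias may differ.
resolve_claude_model_alias() {
  case "$1" in
    best|opus) printf '%s\n' "claude-opus-4-8" ;;  # the one line to edit on a model bump
    fable)     printf '%s\n' "claude-fable-5" ;;
    *)         printf '%s\n' "$1" ;;               # unknown names pass through unchanged
  esac
}

resolve_claude_model_alias best             # prints claude-opus-4-8
resolve_claude_model_alias claude-opus-4-6  # prints claude-opus-4-6
```

A `case` statement rather than an associative array keeps the sketch usable on bash 3.2, which the macOS CI leg runs.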

claude-build — the mirror of codex-build (7934bcd)
A Codex-hosted orchestrator: Codex plans and reviews while Claude executes the build. The reverse ladder already worked via role flags; this names it as a preset (Codex plans/reviews at xhigh model-unpinned, Claude executes at best/high) and adds the Codex-facing protocol doc dev-review/codex/claude-build.md. Because Codex has no harness wake-on-exit, the orchestration is synchronous; a detached+polling variant is documented as future design only. The Claude-executor path hard-requires claude auth (no degrade — degrading the executor to Codex is just codex-build).
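As a rough illustration of what the preset arm pins, here is a sketch of the claude-build expansion. `COMPOSER_MODEL`, `EXECUTOR_MODEL` and `VERIFIER_MODEL` are named in the review below; the `*_AGENT` and `*_EFFORT` variables and the `expand_preset` function are hypothetical stand-ins for however the script actually stores the seat triples.

```shell
#!/usr/bin/env bash
# Sketch only: *_AGENT, *_EFFORT and expand_preset are hypothetical names.
expand_preset() {
  case "$1" in
    claude-build)
      # Codex plans and reviews at xhigh, with no model pinned.
      COMPOSER_AGENT=codex;  COMPOSER_MODEL="";    COMPOSER_EFFORT=xhigh
      VERIFIER_AGENT=codex;  VERIFIER_MODEL="";    VERIFIER_EFFORT=xhigh
      # Claude executes at best/high.
      EXECUTOR_AGENT=claude; EXECUTOR_MODEL=best;  EXECUTOR_EFFORT=high
      ;;
    *)
      echo "unknown preset: $1" >&2
      return 1
      ;;
  esac
}

expand_preset claude-build
printf 'plan=%s@%s execute=%s:%s@%s review=%s@%s\n' \
  "$COMPOSER_AGENT" "$COMPOSER_EFFORT" \
  "$EXECUTOR_AGENT" "$EXECUTOR_MODEL" "$EXECUTOR_EFFORT" \
  "$VERIFIER_AGENT" "$VERIFIER_EFFORT"
```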

This feature was itself built through codex-build: Codex executed the plan in an isolated worktree, reviewed in-session.

Verification

Full suite green locally on Windows (25/25). This PR exists to run the same suite on the macOS (BSD / bash 3.2) and Ubuntu legs of the CI matrix — macOS is the GNU-ism canary. Static audit of the new code found no portability hazards.

🤖 Generated with Claude Code

alanshurafa and others added 11 commits June 12, 2026 12:40
… research notes

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…hase 1)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…Phase 2)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… + status reader (v1.5 Phase 3)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s (v1.5 Phase 5)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ier seat now real (v1.5 Phase 6)

Found by live dogfood: (1) --preset codex-build --verifier codex leaked
VERIFIER_MODEL=fable into CODEX_MODEL (ChatGPT plan rejects with HTTP 400);
apply_seat_env now drops a cross-agent-kind model+effort pair as a unit.
(2) review-verdict.json lacked additionalProperties:false on issues.items
and complete required lists (OpenAI strict structured output) — the codex
--output-schema verify path could never have produced a verdict. Proven
E2E: real run exits 0 with verdict APPROVED@96, verify tokens captured.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
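The leak guard described in this commit can be sketched as follows. Only the behaviour (a cross-agent model and its effort are dropped together) comes from the commit message; `is_claude_model` and `seat_pair_for` are hypothetical helpers, and the sketch covers only the Claude-model-into-Codex-seat direction.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not the script's real ones.
is_claude_model() {
  case "$1" in
    best|opus|fable|claude-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints "model|effort" for a seat. When a Claude model would leak into a
# Codex seat, the model and its effort are blanked as a unit.
seat_pair_for() {
  local agent="$1" model="$2" effort="$3"
  if [ "$agent" = codex ] && is_claude_model "$model"; then
    model=""
    effort=""
  fi
  printf '%s|%s\n' "$model" "$effort"
}

seat_pair_for codex fable high   # prints |
seat_pair_for claude best high   # prints best|high
```

Dropping the pair together avoids handing Codex an effort level that was chosen for a different agent's model.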
…ATE live, token evidence note

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Native jq.exe cannot open Git Bash process-substitution paths
(/proc/<pid>/fd/N), so dev-review-status.sh --json failed only on
Windows. Replace --slurpfile <(jq ...) with --argjson "$(jq -c ...)".

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Fable access was lost, so the codex-build preset pointed its two Claude
seats (plan + review) at an unreachable model. Default them through a
best/opus alias (-> claude-opus-4-8) so a future model bump is one line
in resolve_claude_model_alias. Extend the cross-agent leak guard to drop
best|opus from codex seats, document a session-follow opt-in, and move
the docs-sync + preset-expansion drift guards in lockstep.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Mirror of codex-build: a Codex-hosted orchestrator that plans and reviews
while Claude executes the build. Adds the claude-build preset arm (Codex
plans/reviews at xhigh, model unpinned; Claude executes at best/high),
help + unknown-preset text, the Codex-facing protocol doc
dev-review/codex/claude-build.md (synchronous gate; HARD-requires claude
auth since Claude is the executor; detached+polling documented as future
design only), routing + AGENTS contract, and drift-guard scenarios
pinning both preset triples. Built via codex-build (Codex executed,
reviewed in-session).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7934bcdb2b


Comment on lines 1559 to +1562
write_state_field "$STATE_JSON" ".completed_at" "string" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
# v1.5 Phase 3: plan-only runs also reach a clean terminal — clear the in-flight
# phase so a status reader does not see a stale compose/bounce as "still running".
write_state_field "$STATE_JSON" ".current_phase" "null"

P2: Mark plan-only runs terminal in state

In the --plan-only exit path, this records completed_at and clears current_phase but leaves the initial .status as pending. I verified --skip-plan --plan ... --plan-only exits 0 with state.status still "pending"; dev-review-status.sh --json then treats the finished run as non-terminal/dead and exits 4. Set .status based on PLAN_EXIT before cleanup so completed plan-only runs are reported as completed/partial instead of presumed dead.
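One way the suggested fix might look is sketched below. `write_state_field` is stubbed so the sketch runs on its own (the real helper writes into `$STATE_JSON`), `finish_plan_only` is a hypothetical wrapper, and the mapping of exit 0 to `completed` and anything else to `partial` is an assumption based on the wording above.

```shell
#!/usr/bin/env bash
# Sketch only: write_state_field is a stub that records calls in STATE_LOG.
STATE_JSON="state.json"   # hypothetical path, unused by the stub
STATE_LOG=""
write_state_field() {     # args: file, jq path, type, [value]
  STATE_LOG="${STATE_LOG}$2=${4:-null};"
}

finish_plan_only() {
  local plan_exit="$1" status
  # Assumed mapping: a clean planner exit is "completed", otherwise "partial".
  if [ "$plan_exit" -eq 0 ]; then status=completed; else status=partial; fi
  write_state_field "$STATE_JSON" ".status" "string" "$status"
  write_state_field "$STATE_JSON" ".completed_at" "string" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  write_state_field "$STATE_JSON" ".current_phase" "null"
}

finish_plan_only 0
printf '%s\n' "$STATE_LOG"
```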


--claude-model)
[[ $# -gt 1 ]] || die "--claude-model requires a value"
# v1.5: resolve friendly alias (fable -> claude-fable-5) else passthrough.
CLAUDE_MODEL=$(resolve_claude_model_alias "$2")

P2: Let --claude-model override preset seats

When a preset has already set a Claude seat model such as COMPOSER_MODEL, EXECUTOR_MODEL, or VERIFIER_MODEL to best, this flag updates only the global CLAUDE_MODEL; apply_seat_env later prefers the seat-specific model, so --preset claude-build --claude-model claude-opus-4-6 still invokes Claude with --model claude-opus-4-8 (reproduced with stubs: seat_models.executor stayed opus:claude-opus-4-8@high). That contradicts the new last-wins preset help and prevents callers from pinning Claude via the documented flag; update or clear the active Claude seat overrides here.
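A sketch of the "clear the active Claude seat overrides" option, under stated assumptions: the `*_AGENT` variables, `set_claude_model`, and `effective_executor_model` are hypothetical, and the last of these only stands in for the seat-over-global preference that the comment attributes to `apply_seat_env`.

```shell
#!/usr/bin/env bash
# Sketch only: *_AGENT variables and both functions are hypothetical.
set_claude_model() {
  CLAUDE_MODEL="$1"
  # Clear seat-level pins on Claude seats so the later flag wins.
  if [ "${COMPOSER_AGENT:-}" = claude ]; then COMPOSER_MODEL=""; fi
  if [ "${EXECUTOR_AGENT:-}" = claude ]; then EXECUTOR_MODEL=""; fi
  if [ "${VERIFIER_AGENT:-}" = claude ]; then VERIFIER_MODEL=""; fi
}

# Stand-in for the seat-specific-then-global lookup described in the review.
effective_executor_model() {
  printf '%s\n' "${EXECUTOR_MODEL:-$CLAUDE_MODEL}"
}

# Simulate --preset claude-build followed by --claude-model claude-opus-4-6.
COMPOSER_AGENT=codex;  COMPOSER_MODEL=""
VERIFIER_AGENT=codex;  VERIFIER_MODEL=""
EXECUTOR_AGENT=claude; EXECUTOR_MODEL=best
set_claude_model claude-opus-4-6
effective_executor_model   # prints claude-opus-4-6
```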


The execute/verify auth gate scanned both the output and the full stderr
log for auth substrings unconditionally, so any run whose working log
echoed auth strings — plan text, or the auth-detection source itself —
was misread as "authentication failed" and aborted with no verdict
(observed building claude-build via codex-build). Mirror lib's robust
validate_agent_artifact discriminator: an auth banner in the output
counts only when the output is short (<50 words), and a banner in stderr
counts only when the output is empty (a real work product means the CLI
authenticated). Adds reliability sim S5a-S5d covering the regression.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
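The discriminator in this commit can be sketched as below. The two rules (an output banner counts only under 50 words, a stderr banner counts only when the output is empty) come from the commit message; the banner patterns and function names are hypothetical, and this is not the code of `validate_agent_artifact`.

```shell
#!/usr/bin/env bash
# Sketch only: banner patterns and function names are hypothetical.
has_auth_banner() {
  printf '%s' "$1" | grep -qiE 'not logged in|authentication failed|invalid api key'
}

# Returns 0 when the run should be treated as an authentication failure.
is_auth_failure() {
  local output="$1" stderr_log="$2" words
  words=$(printf '%s' "$output" | wc -w | tr -d ' ')
  if [ "$words" -eq 0 ]; then
    # No work product at all: a banner in stderr is credible.
    has_auth_banner "$stderr_log"
    return
  fi
  if [ "$words" -lt 50 ]; then
    # Short output: it may itself be the banner.
    has_auth_banner "$output"
    return
  fi
  # A real work product means the CLI authenticated; ignore echoed strings.
  return 1
}

if is_auth_failure "" "Error: not logged in"; then
  echo "auth failure"
fi
```

With these rules, a long plan that merely quotes an auth string no longer aborts the run, while an empty output paired with a stderr banner still does.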
@alanshurafa alanshurafa merged commit 14b684a into master Jun 13, 2026
6 checks passed
@alanshurafa alanshurafa deleted the feat/v1.5-codex-build branch June 13, 2026 22:05