From e062774286a187ad176499fdadb8d2de42cdcaf4 Mon Sep 17 00:00:00 2001
From: Claude Auto-Fix
Date: Wed, 20 May 2026 14:00:26 -0400
Subject: [PATCH 1/4] =?UTF-8?q?docs:=20codex-pivot=20spec=20rev=202=20?=
 =?UTF-8?q?=E2=80=94=20dual-surface=20architecture=20(App=20+=20Action)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Rev 2 incorporates spike #275 findings and the project-reviewer feedback
on the conceptual v1. Architectural change: pr-review retires entirely
(handled by the Codex GitHub App under subscription, zero in-repo YAML);
the four write-side workflows migrate to openai/codex-action with API
billing; the verb router collapses entirely since the App handles
@codex review / @codex address feedback natively. Adds a mandatory
shadow-mode phase with written kill criteria between App enablement and
Claude pr-review retirement. Quality gate becomes a thin workflow
filtered on the Codex bot's review state, escaping the severity-regex
trap from #271.
Refs #273, #275, #277
---
 .../specs/2026-05-20-codex-pivot.md | 577 ++++++++++++++++++
 1 file changed, 577 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-20-codex-pivot.md

diff --git a/docs/superpowers/specs/2026-05-20-codex-pivot.md b/docs/superpowers/specs/2026-05-20-codex-pivot.md
new file mode 100644
index 0000000..c369c78
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-20-codex-pivot.md
@@ -0,0 +1,577 @@
+---
+title: Codex Pivot — Dual-Surface (App for Review, codex-action for Write-Side)
+revision: 2
+supersedes_conceptual: v1 (this same file, prior revision — no PR ever landed)
+touches:
+  # Workflows — migrate (write-side; openai/codex-action)
+  - .github/workflows/claude-apply-fix.yml
+  - .github/workflows/claude-lint-failure.yml
+  - .github/workflows/claude-ci-failure.yml
+  - .github/workflows/claude-tag-respond.yml
+  - .github/workflows/claude-lint-fix.yml
+  - .github/workflows/ci-failure.yaml
+  - .github/workflows/apply-fix.yml
+  # Workflows — RETIRE entirely (handled by Codex GitHub App, no replacement file)
+  - .github/workflows/claude-pr-review.yml
+  # Workflows — new gate (one new file)
+  - .github/workflows/codex-gate.yml
+  # Workflows — runtime tree (delete all)
+  - .github/workflows/overlay-smoke.yml
+  - .github/workflows/runtime-build.yml
+  - .github/workflows/runtime-check-private-freshness.yml
+  - .github/workflows/runtime-prune-pending.yml
+  - .github/workflows/runtime-rollback.yml
+  - .github/workflows/marker-emission-aggregate.yml
+  # Composite actions — RETIRE (App-handled)
+  - pr-review/action.yml
+  - pr-review/lib/severity-regex.sh
+  # Composite actions — RETIRE (verb router collapses)
+  - tag-claude/action.yml
+  - claude-command-router/action.yml
+  - check-auth/action.yml
+  # Composite actions — migrate/rename
+  - apply-fix/action.yml
+  - lint-failure/action.yml
+  - lint-diagnose/action.yml
+  - lint-apply/action.yml
+  # Runtime tree (delete)
+  - runtime/**
+  # Docs and consumer surface
+  - CLAUDE.md
+  - README.md
+  - AGENTS.md
+  - docs/superpowers/specs/2026-04-21-ci-claude-runtime-design.md
+  - docs/superpowers/plans/2026-04-22-ci-claude-runtime.md
+skills_relevant:
+  - github-actions
+  - hook-authoring
+---
+
+# Codex Pivot — Spec (Rev 2)
+
+**Epic:** #273 • **Milestone:** `codex-pivot` (#10) • **Date:** 2026-05-20
+
+> **Rev 2 note.** This revision supersedes the v1 architectural assumption that
+> all five workflows must migrate to `openai/codex-action`. Spike #275 confirmed
+> the Codex GitHub App's subscription path works end-to-end on `glitchwerks/github-actions`,
+> which restructures the migration into two surfaces (App for review; action for write-side).
+> Rev 2 also addresses 4 BLOCKING / 5 CONCERN / 3 NIT findings from project-reviewer on v1.
+
+---
+
+## 1. Overview
+
+Replace this repo's CI Claude surface with a **dual-surface Codex architecture:**
+
+- **PR review** → the **Codex GitHub App** (`chatgpt-codex-connector[bot]`),
+  configured cloud-side. No workflow file. No composite action. No in-repo
+  secret. Subscription-covered (ChatGPT Plus / Pro / Business / Edu / Enterprise).
+- **Write-side workflows** (apply-fix, lint-failure, ci-failure, tag-respond) →
+  `openai/codex-action@v1` with `OPENAI_API_KEY`. API-billed; per-token spend.
+
+The entire `runtime/` Docker overlay tree (base + 3 overlays + 6 support workflows)
+retires. The local dev loop (Claude Code in the IDE) is unaffected.
+
+Success criterion (verbatim from user): **"context-aware reviewing"** — confirmed
+achievable under the App path. Spike #275 observed Codex running
+`git ls-files | rg 'codex-(pivot|evaluation)\.md'` from inside its sandbox to
+verify a reference claim ([spike #275 closing comment](https://github.com/glitchwerks/github-actions/issues/275), fetched 2026-05-20).
+The App has full repo-tree access during review, not diff-only.
+
+## 2. Drivers
+
+1. **OAuth deprecation deadline ~2026-06-20.** Anthropic is deprecating OAuth for
+   non-interactive use; `CLAUDE_CODE_OAUTH_TOKEN` stops working ~30 days from
+   2026-05-20. Hard cutover deadline.
+2. **Quality-gate fragility.** Prose-regex severity matching (#271) has produced
+   self-blocking PRs (#270). The App's structured review state replaces regex
+   entirely.
+3. **Cost.** The subscription covers the review surface at zero marginal cost.
+   Pre-Rev-2 estimate (whole surface on `openai/codex-action`): ~$232/mo on
+   gpt-5.2-codex assuming pr-review volume on all five workflows. Actual Rev 2
+   spend is bounded to the four write-side workflows, all of which are failure-
+   triggered or on-demand (low volume) — likely <$50/mo. **unverified:** actual
+   monthly spend will only be measurable post-cutover.
+4. **Cross-model review.** Claude Code authors locally; Codex/GPT-5.x reviews in
+   CI. Cross-model review catches a different failure set than same-model self-
+   review.
+
+## 3. Codex Surface Inventory — Spike-Verified Facts
+
+Spike #275 ([closing comment](https://github.com/glitchwerks/github-actions/issues/275), fetched 2026-05-20) verified the following on real
+`chatgpt-codex-connector[bot]` reviews against PR #276:
+
+| Fact | Source |
+|---|---|
+| Bot identity is `chatgpt-codex-connector[bot]`, `author_association: NONE` | [PR #276 review](https://github.com/glitchwerks/github-actions/pull/276) (fetched 2026-05-20) |
+| Subscription tier (ChatGPT Plus) is sufficient for org-owned repo review | Spike #275 + PR #276 |
+| Review state on a P2-only finding is `COMMENTED` (not `CHANGES_REQUESTED`) | PR #276 |
+| Severity surfaced as P0/P1/P2/P3 markdown shield-image badges + prose | PR #276 |
+| Context-aware: Codex runs shell against full repo tree in its sandbox | Spike #275 closing comment |
+| Triggers confirmed: PR open, draft → ready-for-review, `@codex review` comment | Codex disclosure footer on PR #276 |
+| `synchronize`-event auto-trigger | **unverified:** not in Codex's disclosed trigger list; observe during shadow mode |
+| `CHANGES_REQUESTED` threshold (which severity emits which state) | **unverified:** undocumented; observe during shadow mode |
+| Review-depth modes (default vs exhaustive): exhaustive ~2× slower, identical 1-finding output on the spike PR | Spike #275 |
+| Default-vs-exhaustive finding-density delta on real diffs | **unverified:** synthetic stimulus inconclusive; resolved by shadow-mode observation |
+
+For descriptive Phase 0 research (cookbook patterns, action inputs, pricing),
+see `docs/superpowers/research/2026-05-20-codex-evaluation.md` (PR #277). The
+spec retains only the load-bearing facts.
+
+### 3.1 `openai/codex-action@v1.8` — write-side action
+
+Currently published at **`v1.8`**; `@v1` floating tag tracks the v1 major.
+**Pinning policy:** Rev 2 pins workflows at `openai/codex-action@v1` (floating
+major) **only after** SHA-pinning every other third-party action in the same
+file; if `openai/codex-action` ships a breaking change inside v1, all migrated
+workflows fail loudly at once rather than silently drift. Revisit after the
+first three months in production.
+
+Key inputs (unchanged from v1 spec, retained verbatim for reference):
+
+| Input | Purpose |
+|---|---|
+| `openai-api-key` | Secret for Responses API proxy (required) |
+| `prompt` / `prompt-file` | Inline or file-based instructions |
+| `sandbox` | `workspace-write` \| `read-only` \| `danger-full-access` |
+| `output-schema` / `output-schema-file` | JSON Schema for structured output |
+| `output-file` | Path Codex writes its final message to |
+| `model`, `effort` | Model selection and reasoning effort |
+| `codex-args` | Passthrough flags to `codex exec` |
+| `safety-strategy` | `drop-sudo` (default) \| `unprivileged-user` \| `read-only` \| `unsafe` |
+| `allow-users`, `allow-bots`, `allow-bot-users` | Built-in authorization gating |
+
+Source: [openai/codex-action README](https://github.com/openai/codex-action) (fetched 2026-05-20).
+
+The action **does not post PR comments itself** — workflows consume
+`final-message` or read `output-file` and post via `actions/github-script` or
+`gh pr comment` ([Codex GitHub Action docs](https://developers.openai.com/codex/github-action), fetched 2026-05-20).
+
+## 4. Architecture Decision — Dual Surface
+
+**Decision:** **Codex GitHub App for PR review; `openai/codex-action` for write-side workflows.**
+
+Reasoning chain:
+
+1. The App is subscription-covered. Cost on the review surface drops to $0 marginal.
+2. The App's cloud sandbox provides full-repo context-aware reviewing without
+   any CI plumbing on our side (no diff-walk, no severity regex, no
+   `track_progress`, no overlay images).
+3. The App ships severity filtering, formatting, comment-posting, and re-review
+   on `@codex review` natively — none of this needs to be re-implemented.
+4. Write-side workflows can't use the App (it only reviews; it doesn't apply
+   patches or post diagnoses). They need the action with an API key.
+5. Coexistence is supported: nothing in OpenAI's docs prohibits the App
+   reviewing a PR while a workflow_call-shaped action job also runs against the
+   same repo ([Codex GitHub integration](https://developers.openai.com/codex/integrations/github), fetched 2026-05-20).
+
+**Trade-off accepted:** the user pays for ChatGPT subscription **and**
+`OPENAI_API_KEY` consumption. Mitigated by the four write-side workflows being
+low-volume (failure-triggered / on-demand).
+
+**Reviewer BLOCKING #1 resolved:** v1's `codex-review-gate.yml` race condition
+(unfiltered `pull_request_review` listener could fire on any reviewer's comment,
+not Codex's) is **moot under the App path** because we no longer write a gate
+workflow that listens for review events. The App's own review state is read by
+branch protection directly — see § 7.
+
+## 5. Workflow-by-Workflow Migration Plan
+
+### 5.1 `pr-review` — RETIRE entirely
+
+**No replacement workflow file. No composite action.** The Codex GitHub App,
+configured in `chatgpt.com/codex/settings/code-review` with **Automatic reviews
+ON** for `glitchwerks/github-actions`, handles PR review end-to-end.
+
+**Files deleted:**
+
+- `.github/workflows/claude-pr-review.yml`
+- `pr-review/action.yml`
+- `pr-review/lib/severity-regex.sh`
+
+**Configuration carried over:** the inline prompt in `pr-review/action.yml` and
+the repo-specific review guidance is translated into a top-level `AGENTS.md`
+file (Codex reads the nearest `AGENTS.md` to each changed file — [source](https://developers.openai.com/codex/integrations/github), fetched 2026-05-20).
+
+### 5.2 `apply-fix` → `codex-apply-fix` (migrate; **contract preserved**)
+
+**Reviewer BLOCKING #2 resolution.** `codex-apply-fix/` keeps the current
+contract: **the caller workflow passes a pre-produced `fix_diff` input.** The
+composite action validates the diff against protected paths and applies it.
+The agent (Codex) is **not** invoked inside `codex-apply-fix/`; Codex is invoked
+upstream (in `codex-lint-failure/`, `codex-ci-failure/`, etc.) and **those**
+workflows produce the diff that they pass into `codex-apply-fix/`.
+
+This preserves the v2 consumer-facing contract:
+
+- `apply-fix/` (today) takes `fix_diff` as input.
+- `codex-apply-fix/` (Rev 2) takes `fix_diff` as input.
+- **Breaking change for external consumers:** the action **path** changes
+  (`glitchwerks/github-actions/apply-fix@v2` → `glitchwerks/github-actions/codex-apply-fix@v3`)
+  and the workflow filename changes (`claude-apply-fix.yml` → `codex-apply-fix.yml`).
+  The **input/output schema is unchanged.** See § 10 for migration notes.
+
+**Legacy `apply-fix.yml`** (the `workflow_dispatch` manual-trigger wrapper) is
+**deleted**, not migrated. The user can re-derive a manual path on demand from
+the new `codex-apply-fix.yml` (which is `workflow_call`-shaped — invoke it from
+a local `workflow_dispatch` shim if needed).
+
+### 5.3 `lint-failure` → `codex-lint-failure` (migrate)
+
+Same shape as today: fetches failed lint logs, invokes the agent for
+diagnosis, optionally auto-applies a fix when `auto_apply: true`. Replace the
+`claude-code-action@v1` step with `openai/codex-action@v1` using
+`sandbox: workspace-write` and a verb-specific prompt file.
+
+**Reviewer CONCERN #9 (sandbox vs `/tmp/` log pre-writes):** the existing flow
+writes the failed-lint logs to `/tmp/lint_logs.txt` before invoking the agent.
+Under `openai/codex-action` with `sandbox: workspace-write`, Codex can read
+files inside the workspace; `/tmp/` is outside the workspace. **Resolution:**
+move log pre-writes to `${{ github.workspace }}/.tmp/lint_logs.txt` (matches
+CLAUDE.md § Agent Scratch / Temp Files anyway) and reference that path in the
+prompt. Carry this same fix into 5.4 and 5.5.
+
+**Legacy `claude-lint-fix.yml`** (two-job `lint-diagnose` + `lint-apply`) is
+**deleted**. Composite action directories `lint-diagnose/` and `lint-apply/`
+are deleted. The unified `codex-lint-failure/` is the single supported shape.
+
+### 5.4 `ci-failure` → `codex-ci-failure` (migrate)
+
+Both `claude-ci-failure.yml` (container-pinned) and legacy `ci-failure.yaml`
+collapse to a single `codex-ci-failure.yml` reusing `codex-lint-failure/` (per
+today's pattern where both reuse `lint-failure/`).
+
+**Reviewer NIT #10 resolution (direct-commit vs PR-open):** **keep direct-commit
+semantics.** Rationale: today's `ci-failure` workflow direct-commits to the PR
+branch via `codex-apply-fix/`, the user has the branch-protection quality gate
+plus the Codex App's PR review as safety nets, and switching to "open a fix PR"
+introduces a new PR-management burden (closing, merging, syncing) for a path
+that fires only on CI failure. The OpenAI cookbook's PR-open pattern is the
+default for *external* consumers who lack our gate infrastructure; we have
+that infrastructure.
+
+### 5.5 `tag-respond` → RETIRE (verb router collapses)
+
+**Reviewer CONCERN #6 resolved:** `check-auth/` retires alongside `tag-claude/`,
+`claude-command-router/`, and `claude-tag-respond.yml` — the Codex GitHub App
+handles `@codex review` and `@codex address that feedback` natively (the disclosure
+footer on PR #276 documents both). There is no remaining verb that justifies
+the verb-router infrastructure.
+
+**Files deleted:**
+
+- `.github/workflows/claude-tag-respond.yml`
+- `tag-claude/action.yml` (entire directory)
+- `claude-command-router/action.yml` (entire directory, including `lib/parse.sh`)
+- `check-auth/action.yml` (entire directory)
+- `.github/workflows/test.yml` (the parse.sh corpus test)
+
+**Reviewer NIT for verification:** spec author confirmed via Glob that
+`tests/cases.json` exists at `claude-command-router/tests/cases.json` and will
+be deleted as part of the `claude-command-router/` directory deletion. Run
+`git ls-tree HEAD -- claude-command-router/tests/cases.json` to verify at PR
+time.
+
+**What we lose:** a `@codex fix` or `@codex explain` UX. If the user needs these
+later, file them as fresh sub-issues; do **not** preserve the verb router on
+spec. The Codex App's built-in feedback-addressing covers the most common
+"act on my review" case.
+
+## 6. `runtime/` Retirement Plan
+
+Entire `runtime/` tree retires. No overlay images needed — the action runs on
+vanilla `ubuntu-latest`; the App runs in OpenAI's cloud.
+
+**Files/directories deleted:**
+
+- `runtime/ci-manifest.yaml`, `runtime/ci-manifest.schema.json`
+- `runtime/base/Dockerfile`
+- `runtime/overlays/{review,fix,explain}/{Dockerfile,CLAUDE.md,expected.yaml}`
+- `runtime/shared/CLAUDE-ci.md`
+- `runtime/scripts/*` (9 scripts + `tests/expected-matcher-fixture/`)
+
+**Reviewer NIT #11 resolution:** `runtime/scripts/tests/` contains durable
+matcher-fixture guidance. Before deletion, extract any non-obvious test patterns
+into a memory file or into a comment on the closing issue for sub-issue #9
+(below). Per CLAUDE.md `# Document Files § Lifecycle`, this is the
+extract-then-delete drill.
+
+**Workflows deleted (six):**
+
+- `.github/workflows/runtime-build.yml`
+- `.github/workflows/runtime-check-private-freshness.yml`
+- `.github/workflows/runtime-prune-pending.yml`
+- `.github/workflows/runtime-rollback.yml`
+- `.github/workflows/overlay-smoke.yml`
+- `.github/workflows/marker-emission-aggregate.yml`
+
+**GHCR images:** `ghcr.io/glitchwerks/claude-runtime-{base,review,fix,explain}`
+deleted after the cutover lands and a **30-day grace window** passes (in case
+any external consumer pinned digests directly). Manual GHCR UI operation.
+**unverified:** no audit of external consumers' digest pins; the grace window
+is a defensive default, not measured.
+
+**Secrets retired:** `GH_PAT` (only used by `runtime-build.yml`).
+
+**Reviewer BLOCKING #1 secondary resolution:** the legacy spec/plan files
+(`docs/superpowers/specs/2026-04-21-ci-claude-runtime-design.md` and
+`docs/superpowers/plans/2026-04-22-ci-claude-runtime.md`) **delete** after the
+durable rationale ("why containerization was chosen, and why it's now
+discarded") is captured as a decision-log entry on the closing PR for sub-issue
+#9 below. Per CLAUDE.md `# Document Files § Lifecycle`.
+
+## 7. Quality Gate Replacement
+
+**Today (delete):**
+- `pr-review/action.yml:L330–L412` posts `claude-pr-review/quality-gate`.
+- `:L414–L518` synthesizes structured marker HTML comments.
+- `:L520–L642` posts `claude-pr-review/quality-gate-shadow`.
+- Severity regex in `pr-review/lib/severity-regex.sh`.
+
+**Rev 2 design — three options, recommendation = option 2:**
+
+| Option | Mechanism | Trade-off |
+|---|---|---|
+| 1 | Branch-protection requires `chatgpt-codex-connector[bot]` review state = `APPROVED` | Strict; `COMMENTED` reviews on P2-only findings would falsely block |
+| **2 (recommended)** | Small `codex-gate.yml` reads the App's review state; posts `codex-pr-review/quality-gate = success` unless state is `CHANGES_REQUESTED` | Matches GitHub native required-reviewer semantics; permissive on `COMMENTED` |
+| 3 | Parse P0/P1 badge presence in review body | Returns to the regex problem #271 was escaping; NOT recommended |
+
+**Option 2 design (`codex-gate.yml`):**
+
+```yaml
+on:
+  pull_request_review:
+    types: [submitted, edited, dismissed]
+jobs:
+  gate:
+    if: github.event.review.user.login == 'chatgpt-codex-connector[bot]'
+    permissions: { statuses: write, pull-requests: read }
+    runs-on: ubuntu-latest
+    steps:
+      - name: Post quality-gate status
+        run: |
+          STATE='${{ github.event.review.state }}'
+          if [ "$STATE" = "changes_requested" ]; then
+            CONCLUSION=failure
+          else
+            CONCLUSION=success
+          fi
+          gh api repos/${{ github.repository }}/statuses/${{ github.event.pull_request.head.sha }} \
+            -f context='codex-pr-review/quality-gate' \
+            -f state="$CONCLUSION"
+```
+
+The bot-login filter (`github.event.review.user.login == 'chatgpt-codex-connector[bot]'`)
+addresses Reviewer BLOCKING #1 / CONCERN #5: only Codex's reviews trip the gate,
+not arbitrary reviewers. Human approvers can still `APPROVE` / `REQUEST CHANGES`
+in parallel — those route through branch protection's normal required-reviewer
+config, not this status check.
+
+**Branch-protection ruleset rename** (in lockstep with cutover): required check
+`claude-pr-review/quality-gate` → `codex-pr-review/quality-gate`. Both names
+required during the shadow-mode phase (§ 8); old name dropped only after the
+go decision is made.
+
+## 8. Shadow Mode (NEW — addresses BLOCKING #3)
+
+**Duration:** **≥7 days OR ≥30 real (non-synthetic) PRs, whichever is greater.**
+
+**Setup:**
+
+1. Codex GitHub App is enabled with Automatic reviews ON for `glitchwerks/github-actions`.
+2. `claude-pr-review.yml` (Claude) continues to run unchanged.
+3. `codex-gate.yml` posts `codex-pr-review/quality-gate`.
+4. Branch protection requires **both** `claude-pr-review/quality-gate` (Claude
+   path) and `codex-pr-review/quality-gate` (Codex path) during the window.
+   PRs cannot merge unless both pass.
+5. The user manually compares Codex's and Claude's reviews on every PR that
+   merges during the window.
+
+**Kill criteria (written, must be in spec):**
+
+**Stay on Codex (cutover proceeds)** if and only if:
+- False-positive rate ≤ Claude's observed rate during the same window
+  (false-positive = a Codex finding the user judges incorrect or out-of-scope), AND
+- No genuinely-blocking issue Claude flagged was missed by Codex within the
+  window, AND
+- Review latency ≤ 30 min on ≥80% of PRs.
+
+**Revert to Claude (and reconsider the architecture)** if any of:
+- Codex misses any genuinely-blocking issue Claude flagged.
+- Codex false-positive rate > 2× Claude's observed rate.
+- Review latency > 30 min on >20% of PRs.
+
+**Decision gate:** at the end of the window, the user makes an explicit go/no-go
+call. The spec **records the criteria**; it does not pre-judge the outcome.
+
+**Failure mode if no-go:** because shadow mode preserves both gates and both
+workflows, reverting is zero-cost — disable the Codex App in cloud config,
+drop `codex-pr-review/quality-gate` from branch protection, and revisit the
+architecture. The OAuth deadline (~2026-06-20) compresses this window — if
+shadow mode starts ≤2026-05-27 it completes by ≤2026-06-03, leaving 2+ weeks
+of buffer.
+
+## 9. Migration Sequencing
+
+| Order | Sub-issue | Why this order |
+|---|---|---|
+| 1 | #A (App + `AGENTS.md`) | No code change; observable side-by-side with Claude |
+| 2 | #B (`codex-gate.yml`) | Provides replacement signal before removing old one |
+| 3 | #C (branch protection — add new) | Both names required during transition |
+| 4 | **SHADOW MODE WINDOW (§ 8)** | Both gates required; ≥7 days / ≥30 PRs |
+| 5 | **DECISION GATE** (go/no-go) | Explicit user call |
+| 6 | #D, #E, #F (write-side migrations) — parallel | apply-fix, lint-failure, ci-failure |
+| 7 | #G (retire verb router) | Last workflow migration |
+| 8 | #H (drop `claude-pr-review/quality-gate` requirement) | After every Claude workflow gone |
+| 9 | #I (delete `runtime/` tree + six workflows) | After every consumer of overlay images migrated |
+| 10 | #J (docs: CLAUDE.md, README.md, examples) | Stabilize before tagging |
+| 11 | #K (cut `v3.0.0`) | Cutover complete |
+| 12 | #L, #M (GHCR image deletion +30d, external-consumer audit) | Post-release cleanup |
+
+**Hard deadline:** Anthropic OAuth EOL ~2026-06-20. Shadow mode must start by
+~2026-05-27 to keep the window inside the deadline with buffer.
+
+**Rollback path:** every migrated workflow lands on its own PR. If a Codex
+workflow misbehaves between merge and OAuth EOL, revert the single PR — the
+Claude-era workflow file returns and `CLAUDE_CODE_OAUTH_TOKEN` still works
+until EOL. After EOL there is no rollback; the Claude path is dead substrate.
+
+## 10. Naming / Versioning / Consumer-Facing
+
+**Major version bump to `v3`.** Breaking changes for external consumers:
+- `claude-*.yml` workflow filenames → `codex-*.yml` (where retained).
+- `apply-fix/`, `lint-failure/` composite-action paths → `codex-apply-fix/`,
+  `codex-lint-failure/`.
+- `pr-review/`, `tag-claude/`, `claude-command-router/`, `check-auth/` paths
+  → **deleted** (no replacement at that path).
+- Required secret `CLAUDE_CODE_OAUTH_TOKEN` → `OPENAI_API_KEY` (for write-side
+  workflows) + cloud-side App install (for review).
+- `claude-pr-review/quality-gate` status name → `codex-pr-review/quality-gate`.
+
+Keep `v2` floating tag frozen at the last Claude-era commit so existing
+consumers don't surprise-upgrade. `v3` is the Codex line.
+
+**Workflow naming convention:** `codex-*.yml` (symmetric with today's
+`claude-*.yml`). Discussed alternatives (`ai-*.yml`, prefix-less) rejected as
+either vague at consumer site or collision-prone with retained legacy names.
+
+**Reviewer BLOCKING #4 — consumer onboarding files:**
+
+- **`examples/` directory:** verified to **not exist** as of 2026-05-20 (Glob
+  returned no matches). No rewrite scope here, but if the user creates this
+  directory before cutover, it must be added to the migration plan.
+- **`docs/consumer-onboarding.md`:** verified to **not exist**. Same.
+- **`README.md`:** verified via Grep to contain `CLAUDE_CODE_OAUTH_TOKEN`,
+  `ghcr.io/glitchwerks/claude-runtime-*` digest pins, and `claude-*.yml@v2`
+  `uses:` examples. **Full rewrite required** as part of sub-issue #J.
+
+## 11. Proposed Sub-Issues (file under milestone `codex-pivot`)
+
+The spec author proposes the breakdown below. Identifiers are letters (not
+numbers) to avoid clashing with existing GitHub issue numbers — the router /
+ops agent will assign actual issue numbers when filing.
+
+- **#A — Stand up Codex GitHub App + initial `AGENTS.md`.** Cloud-side App
+  install, configure auto-review, write top-level `AGENTS.md` translating the
+  inline prompt from `pr-review/action.yml`. Verify on a throwaway PR.
+- **#B — Build `codex-gate.yml`.** Posts `codex-pr-review/quality-gate` based
+  on Codex App review state. Filter to bot login per § 7. Depends on #A.
+- **#C — Branch-protection ruleset update (transition).** Add
+  `codex-pr-review/quality-gate` as required alongside the existing
+  `claude-pr-review/quality-gate`. Both required during shadow mode.
+- **#D — Migrate `apply-fix` → `codex-apply-fix`.** Rename composite action +
+  workflow; contract preserved per § 5.2. Delete legacy `apply-fix.yml`.
+- **#E — Migrate `lint-failure` → `codex-lint-failure`.** Rename, swap agent,
+  unify with `lint-fix` two-job legacy. Delete `lint-diagnose/`, `lint-apply/`,
+  `claude-lint-fix.yml`. Apply the `/tmp/` → `${{ github.workspace }}/.tmp/`
+  workspace-sandbox fix from § 5.3.
+- **#F — Migrate `ci-failure` → `codex-ci-failure`.** Direct-commit semantics
+  retained per § 5.4 (NIT #10 resolution). Delete legacy `ci-failure.yaml`.
+- **#G — Retire `@claude` verb router.** Delete `claude-command-router/`,
+  `tag-claude/`, `check-auth/`, `claude-tag-respond.yml`, `test.yml`.
+- **#H — Drop `claude-pr-review/quality-gate` branch-protection requirement.**
+  Final cutover step after all Claude workflows removed.
+- **#I — Delete `runtime/` tree.** All files in § 6, all six runtime workflows.
+  Extract durable matcher-test guidance and runtime decision-log per § 6 first.
+- **#J — Update `CLAUDE.md`, `README.md`, retire legacy runtime spec/plan.**
+  Architecture rewrite, secrets table refresh, consumer-facing examples
+  rewritten. Mark and delete legacy runtime spec + plan files per CLAUDE.md
+  lifecycle rule (durable content extracted first).
+- **#K — Cut `v3.0.0` release.** Move `v3` floating tag, freeze `v2` at last
+  Claude-era commit, write release notes documenting all breaking changes
+  from § 10.
+- **#L — (Post-cutover, +30 days) Delete GHCR images.** Manual GHCR UI on
+  `claude-runtime-{base,review,fix,explain}`.
+- **#M — Audit external consumers pinned to `v2`.** Survey before `v3`
+  release; notify of breaking-change nature.
+- **#N — Document shadow-mode kill-criteria observations.** A short results
+  log (PR count, false-positive comparison, latency, decision) extracted from
+  shadow mode and attached to whichever sub-issue closes the decision gate.
+
+## 12. Out of Scope
+
+- **`lint.yml` (actionlint)** — unaffected.
+- **`verify-app-secrets.yml`** — keep; App secrets are still in use for write-side workflows.
+- **Local dev loop** — Claude Code on the user's machine is unchanged.
+- **Codex memory / persona files beyond `AGENTS.md`** — `AGENTS.md` is the only
+  mechanism Codex respects ([source](https://developers.openai.com/codex/integrations/github), fetched 2026-05-20).
+  Deletion of `runtime/shared/CLAUDE-ci.md` is permanent.
+- **`@codex fix` / `@codex explain` verbs** — out of scope. If the user wants
+  these later, file fresh issues; do not preserve the verb router on spec.
+
+## 13. Open Questions for the User (gating sub-issue filing)
+
+These are questions Rev 2 cannot answer in writing. The user should resolve
+before #A is filed.
+
+1. **Shadow-mode window length.** Spec proposes **≥7 days OR ≥30 PRs, whichever
+   is greater.** The PR-volume threshold may be unreachable in a 7-day window
+   for this repo's actual cadence. Confirm or adjust.
+2. **Severity threshold for `CHANGES_REQUESTED`.** Spike #275 observed
+   `COMMENTED` on a P2 finding; the spec recommends gating on `!= CHANGES_REQUESTED`
+   (option 2 in § 7). Confirm this matches the user's risk tolerance — if a P1
+   finding emits `COMMENTED` (undocumented behavior), option 2 would let it
+   merge.
+3. **External consumer audit timing.** Sub-issue #M is listed post-release in
+   § 9. Should it run **before** the `v3` cut instead, to give consumers a
+   migration window?
+4. **`AGENTS.md` voice and detail.** The PR-review prompt at
+   `pr-review/action.yml:L1–L696` is verbose and Claude-specific. How much
+   of it translates verbatim vs. gets rewritten for Codex's conventions? (The
+   App's docs note Codex follows "Review guidelines" sections specifically.)
+5. **`OPENAI_API_KEY` budget cap.** Recommend setting a hard cap (OpenAI's
+   billing dashboard supports this) so a runaway workflow can't burn through
+   the user's account. Acceptable cap value?
+
+## 14. Definition of Done
+
+- All five Claude-Code-Action workflows replaced (4) or retired (1) per § 5.
+- `runtime/` tree fully deleted; GHCR images scheduled for deletion (+30d).
+- `CLAUDE_CODE_OAUTH_TOKEN` removed from repo secrets.
+- `OPENAI_API_KEY` added; identity model documented in `CLAUDE.md`.
+- `AGENTS.md` exists at repo root with review guidelines.
+- `codex-pr-review/quality-gate` is the **sole** required PR-review check on
+  branch protection; `claude-pr-review/quality-gate` removed.
+- `codex-gate.yml` posts the new status filtered to the Codex bot login.
+- `v3.0.0` tag cut, `v3` floating tag moved, `v2` frozen at last Claude-era
+  commit, release notes published.
+- `CLAUDE.md` Architecture + Required Secrets sections accurate; `README.md`
+  consumer examples accurate.
+- Legacy runtime spec + plan files deleted per CLAUDE.md doc lifecycle rule
+  (durable content extracted to memory or decision log first).
+- Shadow-mode results documented (sub-issue #N).
+
+---
+
+## Sources
+
+- [openai/codex-action README](https://github.com/openai/codex-action) (fetched 2026-05-20) — action inputs, outputs, auth model, sandbox, safety-strategy, `@v1` floating tag
+- [Codex GitHub Action docs](https://developers.openai.com/codex/github-action) (fetched 2026-05-20) — required permissions, OS support, posting reviews via `actions/github-script`
+- [Codex GitHub integration (App)](https://developers.openai.com/codex/integrations/github) (fetched 2026-05-20) — App installation, `@codex review`, automatic reviews, P0/P1 filtering, `AGENTS.md` customization
+- [Codex pricing](https://developers.openai.com/codex/pricing) (fetched 2026-05-20) — subscription coverage, API-key billing
+- [Codex CLI autofix cookbook](https://developers.openai.com/cookbook/examples/codex/autofix-github-actions) (fetched 2026-05-20) — `workflow_run`-triggered autofix pattern
+- [Spike #275 closing comment](https://github.com/glitchwerks/github-actions/issues/275) (fetched 2026-05-20) — bot identity, subscription tier, context-aware reading, trigger list, severity-badge format, review-state semantics, depth-mode timing
+- [PR #276](https://github.com/glitchwerks/github-actions/pull/276) (fetched 2026-05-20) — real `chatgpt-codex-connector[bot]` review evidence, `author_association: NONE`, `state: COMMENTED` on P2 finding
+- `docs/superpowers/research/2026-05-20-codex-evaluation.md` (PR #277, pending merge as of spec write) — Phase 0 descriptive research; spec retains only load-bearing facts
+- Repo internals: `pr-review/action.yml:L1–L696`, `apply-fix/action.yml:L1–L93`, `claude-command-router/action.yml:L1–L98`, `check-auth/action.yml:L1–L62`, `runtime/` tree, `CLAUDE.md § Architecture`, `CLAUDE.md § CI Runtime (Phase 1+)`, `CLAUDE.md § Required secrets`, `README.md` (Grep-verified 2026-05-20 to contain Claude-era token + image references)
+- Epic #273; quality-gate context #271, #270; ruleset enforcement #176; container `packages: read` discovery #192; git safe.directory baking #199; spike #275; spike PR #276
From 07a08f04542882799b0db4828b93dc45344750f1 Mon Sep 17 00:00:00 2001
From: Claude Auto-Fix
Date: Wed, 20 May 2026 14:20:30 -0400
Subject: [PATCH 2/4] docs: address project-reviewer findings on spec v2 (3 BLOCKING + 4 CONCERN + 2 NIT)

BLOCKING:
- examples/** and docs/consumer-onboarding.md DO exist on main (false
  negative in planner's verification step); add to touches, correct false
  claim, expand sub-issue #J's scope to cover them.
- Shadow-mode duration logic rewritten: 7 days AND >=7 PRs binding floor;
  >=30 PRs is advisory, not blocking -- closes the deadline trap.
- codex-gate.yml YAML draft hardened: empty-SHA guard + description field +
  workflow-level statuses:write permission + filter to
  chatgpt-codex-connector[bot].

CONCERN:
- openai/codex-action @v1 pinning policy rescoped (precondition was
  unsatisfiable).
- AGENTS.md sub-issue #A gains concrete acceptance test.
- New sub-issue #O gates synchronize-event verification before Claude
  retirement.

NIT:
- S6 GHCR deletion sequencing note added.
- S9 verb-router-retirement atomicity note added.

Refs #273, #278 (this PR), #275 (spike findings unchanged).
---
 .../specs/2026-05-20-codex-pivot.md | 131 ++++++++++++++----
 1 file changed, 103 insertions(+), 28 deletions(-)

diff --git a/docs/superpowers/specs/2026-05-20-codex-pivot.md b/docs/superpowers/specs/2026-05-20-codex-pivot.md
index c369c78..d8bcf8a 100644
--- a/docs/superpowers/specs/2026-05-20-codex-pivot.md
+++ b/docs/superpowers/specs/2026-05-20-codex-pivot.md
@@ -36,6 +36,9 @@ touches:
   - lint-apply/action.yml
   # Runtime tree (delete)
   - runtime/**
+  # Consumer-facing surface (rewrite for v3)
+  - examples/**
+  - docs/consumer-onboarding.md
   # Docs and consumer surface
   - CLAUDE.md
   - README.md
@@ -121,11 +124,14 @@ spec retains only the load-bearing facts.
 ### 3.1 `openai/codex-action@v1.8` — write-side action
 
 Currently published at **`v1.8`**; `@v1` floating tag tracks the v1 major.
-**Pinning policy:** Rev 2 pins workflows at `openai/codex-action@v1` (floating -major) **only after** SHA-pinning every other third-party action in the same -file; if `openai/codex-action` ships a breaking change inside v1, all migrated -workflows fail loudly at once rather than silently drift. Revisit after the -first three months in production. +**Pinning policy.** `openai/codex-action` is pinned at `@v1` as a starting +point. If a breaking change is observed in the first 90 days post-cutover +(defined: any composite-action behavior regression, input/output schema change, +or sandbox-mode default flip), promote the pin to a SHA digest captured at the +last-known-good run. SHA-pinning of `actions/checkout`, `actions/create-github-app-token`, +and other third-party actions in the same composite action is a separate concern +tracked under sub-issue #L (or file a new sub-issue if #L doesn't cover it) and +is NOT a prerequisite for the `@v1` major-tag pin. Key inputs (unchanged from v1 spec, retained verbatim for reference): @@ -309,7 +315,10 @@ extract-then-delete drill. deleted after the cutover lands and a **30-day grace window** passes (in case any external consumer pinned digests directly). Manual GHCR UI operation. **unverified:** no audit of external consumers' digest pins; the grace window -is a defensive default, not measured. +is a defensive default, not measured. **Sequencing:** sub-issue #M +(external-consumer audit) MUST complete before the 30-day grace window starts, +so digest-pinned consumers receive a full 30 days of notice from the +audit-completion date rather than from the v3 release date. **Secrets retired:** `GH_PAT` (only used by `runtime-build.yml`). 
@@ -342,30 +351,54 @@ discarded") is captured as a decision-log entry on the closing PR for sub-issue on: pull_request_review: types: [submitted, edited, dismissed] + +permissions: + statuses: write + pull-requests: read + jobs: gate: if: github.event.review.user.login == 'chatgpt-codex-connector[bot]' - permissions: { statuses: write, pull-requests: read } runs-on: ubuntu-latest steps: + - name: Fail if head SHA was empty + if: github.event.pull_request.head.sha == '' + run: | + echo "::error::pull_request.head.sha was empty — gate cannot post a commit status." + exit 1 + - name: Post quality-gate status + if: github.event.pull_request.head.sha != '' + env: + GH_TOKEN: ${{ github.token }} + REVIEW_STATE: ${{ github.event.review.state }} + BOT_LOGIN: ${{ github.event.review.user.login }} + HEAD_SHA: ${{ github.event.pull_request.head.sha }} run: | - STATE='${{ github.event.review.state }}' - if [ "$STATE" = "changes_requested" ]; then + if [ "$REVIEW_STATE" = "changes_requested" ]; then CONCLUSION=failure + DESCRIPTION="Codex requested changes — see the bot review for details" else CONCLUSION=success + DESCRIPTION="Codex review state: $REVIEW_STATE" fi - gh api repos/${{ github.repository }}/statuses/${{ github.event.pull_request.head.sha }} \ + gh api repos/${{ github.repository }}/statuses/$HEAD_SHA \ -f context='codex-pr-review/quality-gate' \ - -f state="$CONCLUSION" + -f state="$CONCLUSION" \ + -f description="$DESCRIPTION" ``` The bot-login filter (`github.event.review.user.login == 'chatgpt-codex-connector[bot]'`) -addresses Reviewer BLOCKING #1 / CONCERN #5: only Codex's reviews trip the gate, -not arbitrary reviewers. Human approvers can still `APPROVE` / `REQUEST CHANGES` -in parallel — those routes through branch protection's normal required-reviewer -config, not this status check. +on the `gate` job's `if:` condition addresses Reviewer BLOCKING #1 / CONCERN #5: +only Codex's reviews trip the gate, not arbitrary reviewers. 
Human approvers can +still `APPROVE` / `REQUEST CHANGES` in parallel — those route through branch +protection's normal required-reviewer config, not this status check. + +**Permissions** are declared at the **workflow level** (`permissions: { statuses: +write, pull-requests: read }` at the top of the file, not inside the job) per the +key convention in CLAUDE.md — GitHub ignores job-level permissions when calling +reusable workflows, and consistent top-level placement makes the grant surface +auditable at a glance. **Branch-protection ruleset rename** (in lockstep with cutover): required check `claude-pr-review/quality-gate` → `codex-pr-review/quality-gate`. Both names @@ -374,7 +407,7 @@ go decision is made. ## 8. Shadow Mode (NEW — addresses BLOCKING #3) -**Duration:** **≥7 days OR ≥30 real (non-synthetic) PRs, whichever is greater.** +**Duration:** ≥7 days AND ≥N real (non-synthetic) PRs observed, where N is the lesser of (a) 30 PRs and (b) whatever volume the repo organically produces in a 14-day window. Concretely: shadow mode ends at the later of the 7-day mark and the moment at least 7 PRs have been processed by both reviewers. If 30+ PRs are achieved in that window (unlikely at observed cadence), so much the better — but the count threshold is advisory, not blocking. If after 14 days the PR count is still below 7, the cutover proceeds on time alone with an explicit caveat recorded on the decision-gate sub-issue noting the limited sample size. **Setup:** @@ -419,7 +452,7 @@ of buffer. 
| 2 | #B (`codex-gate.yml`) | Provides replacement signal before removing old one | | 3 | #C (branch protection — add new) | Both names required during transition | | 4 | **SHADOW MODE WINDOW (§ 8)** | Both gates required; ≥7 days / ≥30 PRs | -| 5 | **DECISION GATE** (go/no-go) | Explicit user call | +| 5 | **DECISION GATE** (go/no-go) | Explicit user call; requires #O resolved (synchronize-event behavior confirmed) | | 6 | #D, #E, #F (write-side migrations) — parallel | apply-fix, lint-failure, ci-failure | | 7 | #G (retire verb router) | Last workflow migration | | 8 | #H (drop `claude-pr-review/quality-gate` requirement) | After every Claude workflow gone | @@ -428,6 +461,12 @@ of buffer. | 11 | #K (cut `v3.0.0`) | Cutover complete | | 12 | #L, #M (GHCR image deletion +30d, external-consumer audit) | Post-release cleanup | +**Atomicity note — #G (retire verb router).** Sub-issue #G's PR must delete +`claude-tag-respond.yml`, `tag-claude/`, `claude-command-router/`, and +`check-auth/` in a single PR — not piecemeal. These four paths (one workflow, three composite actions) form an +interconnected verb router; partial deletion leaves a partially routed surface +in place until the final piece lands. + **Hard deadline:** Anthropic OAuth EOL ~2026-06-20. Shadow mode must start by ~2026-05-27 to keep the window inside the deadline with buffer. @@ -457,10 +496,14 @@ either vague at consumer site or collision-prone with retained legacy names. **Reviewer BLOCKING #4 — consumer onboarding files:** -- **`examples/` directory:** verified to **not exist** as of 2026-05-20 (Glob - returned no matches). No rewrite scope here, but if the user creates this - directory before cutover, it must be added to the migration plan. -- **`docs/consumer-onboarding.md`:** verified to **not exist**. Same. +- **`examples/` directory and `docs/consumer-onboarding.md`:** Both exist on + main.
`examples/` contains 5 caller-workflow templates plus a README + (`examples/{README.md,claude-apply-fix.yml,claude-ci-failure.yml,claude-lint-failure.yml,claude-pr-review.yml,claude-tag-respond.yml}`). + `docs/consumer-onboarding.md` also exists. Both reference + `CLAUDE_CODE_OAUTH_TOKEN`, GHCR overlay packages, and `claude-*.yml@v2` + `uses:` lines — all of which become invalid post-v3. They are added to + `touches:` and to sub-issue #J's scope explicitly. (The planner's original + Glob verification was a false negative; corrected 2026-05-20 by project-reviewer.) - **`README.md`:** verified via Grep to contain `CLAUDE_CODE_OAUTH_TOKEN`, `ghcr.io/glitchwerks/claude-runtime-*` digest pins, and `claude-*.yml@v2` `uses:` examples. **Full rewrite required** as part of sub-issue #J. @@ -474,6 +517,13 @@ ops agent will assign actual issue numbers when filing. - **#A — Stand up Codex GitHub App + initial `AGENTS.md`.** Cloud-side App install, configure auto-review, write top-level `AGENTS.md` translating the inline prompt from `pr-review/action.yml`. Verify on a throwaway PR. + **Acceptance:** AGENTS.md is verified adequate by triggering Codex review on + a throwaway test PR that contains at least one defect of a class the existing + Claude `pr-review` prompt's domain-specific guidance was designed to catch + (e.g. an unquoted shell variable expansion in a composite action's bash step, + or a missing `packages: read` on a container-pinned workflow). If Codex's + review surfaces that finding, AGENTS.md is adequate. If not, iterate on the + AGENTS.md content and re-test before merging. - **#B — Build `codex-gate.yml`.** Posts `codex-pr-review/quality-gate` based on Codex App review state. Filter to bot login per § 7. Depends on #A. - **#C — Branch-protection ruleset update (transition).** Add @@ -493,10 +543,19 @@ ops agent will assign actual issue numbers when filing. Final cutover step after all Claude workflows removed. 
- **#I — Delete `runtime/` tree.** All files in § 6, all six runtime workflows. Extract durable matcher-test guidance and runtime decision-log per § 6 first. -- **#J — Update `CLAUDE.md`, `README.md`, retire legacy runtime spec/plan.** - Architecture rewrite, secrets table refresh, consumer-facing examples - rewritten. Mark and delete legacy runtime spec + plan files per CLAUDE.md - lifecycle rule (durable content extracted first). +- **#J — Rewrite consumer examples + onboarding for v3.** Rewrite + `examples/{README.md,claude-apply-fix.yml,claude-ci-failure.yml,claude-lint-failure.yml,claude-pr-review.yml,claude-tag-respond.yml}` + and `docs/consumer-onboarding.md` to remove `CLAUDE_CODE_OAUTH_TOKEN` + references, GHCR overlay package install steps, and `claude-*.yml@v2` + `uses:` lines. Replace with the post-cutover surface (`codex-*.yml@v3`, + `OPENAI_API_KEY` secret, and the App-handled PR-review path which requires + no caller-workflow file). Also update `CLAUDE.md` and `README.md` + (architecture rewrite, secrets table refresh) and retire legacy runtime + spec + plan files per CLAUDE.md lifecycle rule (durable content extracted + first). Acceptance: a fresh consumer following the rewritten + examples/onboarding can complete setup against + `glitchwerks/github-actions@v3` without any reference to retired surfaces. + **Gating sub-issue — must merge before `v3` tag is cut.** - **#K — Cut `v3.0.0` release.** Move `v3` floating tag, freeze `v2` at last Claude-era commit, write release notes documenting all breaking changes from § 10. @@ -507,6 +566,22 @@ ops agent will assign actual issue numbers when filing. - **#N — Document shadow-mode kill-criteria observations.** A short results log (PR count, false-positive comparison, latency, decision) extracted from shadow mode and attached to whichever sub-issue closes the decision gate. 
+- **#O — Verify Codex App `synchronize`-event behavior before retiring + `claude-pr-review.yml`.** The current Claude pr-review handles `synchronize` + events by reviewing only new commits (`git diff before..after`). Codex's + documented triggers are PR open, draft→ready, and `@codex review` comment — + `synchronize` is not explicitly listed. Before the cutover decision-gate in + §9, run a controlled test: open a PR with the Codex App enabled, push a + follow-up commit, and observe whether the App posts a fresh review automatically. + **Outcomes and fallbacks:** + - **App auto-reviews on `synchronize`:** no action needed; Claude can be + retired on schedule. + - **App does NOT auto-review on `synchronize`:** the cutover plan must + either (a) add a small `codex-synchronize-trigger.yml` workflow that posts + `@codex review` on synchronize events, or (b) accept reduced review + coverage on push-after-open and document it in `examples/README.md`. + **Gating:** this sub-issue MUST resolve before the §9 decision gate. Do NOT + delete `claude-pr-review.yml` without an answer. ## 12. Out of Scope @@ -524,9 +599,9 @@ ops agent will assign actual issue numbers when filing. These are questions Rev 2 cannot answer in writing. The user should resolve before #A is filed. -1. **Shadow-mode window length.** Spec proposes **≥7 days OR ≥30 PRs, whichever - is greater.** The PR-volume threshold may be unreachable in a 7-day window - for this repo's actual cadence. Confirm or adjust. +1. **Shadow-mode window length (resolved 2026-05-20 by reviewer feedback).** + Volume threshold is advisory, not blocking. 7 days + 7 observed PRs is + the binding floor; ≥30 PRs is aspirational. See §8 for the rationale. 2. **Severity threshold for `CHANGES_REQUESTED`.** Spike #275 observed `COMMENTED` on a P2 finding; the spec recommends gating on `!= CHANGES_REQUESTED` (option 2 in § 7). Confirm this matches the user's risk tolerance — if a P1
Confirm this matches the user's risk tolerance — if a P1 From 640e20a1d28549a50e8cddf154feedb6ac69ef11 Mon Sep 17 00:00:00 2001 From: Claude Auto-Fix Date: Wed, 20 May 2026 14:44:59 -0400 Subject: [PATCH 3/4] docs: address Claude pr-review findings on spec v2 (#278) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MAJOR: - Shadow mode duration logic clarified (7d AND 7 PRs binding; 14d max; 30 PRs aspirational) - #M sequencing contradiction resolved — 30d GHCR clock gated on #M completion, not v3 release Medium: - #O sub-issue gains explicit timing (run during #A/#B, before shadow window) Nits: - Empty-SHA error message includes recovery hint - Redundant if-condition retained with explanatory comment - §8 observation methodology specifies structured PR-by-PR logging schema - #J acceptance test method specifies throwaway consumer-test repo - BLOCKING-#4 verification step now shows the git ls-tree command + 2026-05-20 date Findings from claude-action-runner[bot] review of e062774 → 07a08f0. Refs #273, #278 Co-Authored-By: Claude Sonnet 4.6 --- .../specs/2026-05-20-codex-pivot.md | 40 ++++++++++++++++--- 1 file changed, 34 insertions(+), 6 deletions(-) diff --git a/docs/superpowers/specs/2026-05-20-codex-pivot.md b/docs/superpowers/specs/2026-05-20-codex-pivot.md index d8bcf8a..0cf1845 100644 --- a/docs/superpowers/specs/2026-05-20-codex-pivot.md +++ b/docs/superpowers/specs/2026-05-20-codex-pivot.md @@ -364,10 +364,12 @@ jobs: - name: Fail if head SHA was empty if: github.event.pull_request.head.sha == '' run: | - echo "::error::pull_request.head.sha was empty — gate cannot post a commit status." + echo "::error::pull_request.head.sha was empty (possible webhook payload issue or unusual event type). Gate cannot post commit status. Check the PR event payload or re-trigger the review." 
exit 1 - name: Post quality-gate status + # defensive: prior step already fails on empty SHA, but this guards + # against step-skipping due to manual workflow_dispatch or unusual event shape if: github.event.pull_request.head.sha != '' env: GH_TOKEN: ${{ github.token }} @@ -407,7 +409,11 @@ go decision is made. ## 8. Shadow Mode (NEW — addresses BLOCKING #3) -**Duration:** ≥7 days AND ≥N real (non-synthetic) PRs observed, where N is the lesser of (a) 30 PRs and (b) whatever volume the repo organically produces in a 14-day window. Concretely: shadow mode ends at the later of the 7-day mark and the moment at least 7 PRs have been processed by both reviewers. If 30+ PRs are achieved in that window (unlikely at observed cadence), so much the better — but the count threshold is advisory, not blocking. If after 14 days the PR count is still below 7, the cutover proceeds on time alone with an explicit caveat recorded on the decision-gate sub-issue noting the limited sample size. +**Duration:** ≥7 days AND ≥7 real (non-synthetic) PRs observed, with a maximum window of 14 days. + +- **Minimum:** 7 days + 7 PRs — both must be met before the decision gate opens. +- **Maximum:** 14 days — if the PR count is still below 7 at day 14, the cutover proceeds on time alone with an explicit caveat recorded on the decision-gate sub-issue noting the limited sample size. +- **Aspirational:** 30 PRs if organically achieved within the window; additional data strengthens the decision but is not a blocking requirement. **Setup:** @@ -420,6 +426,13 @@ go decision is made. 5. The user manually compares Codex's and Claude's reviews on every PR that merges during the window. +**Observation methodology.** During shadow mode, log observations on sub-issue +#N in a table with at minimum these columns: `PR#` | `Claude finding count +(by severity)` | `Codex finding count (by severity)` | `Findings unique to +each` | `False-positive count (each side)` | `Codex latency (trigger → +review-posted)`. 
The kill criteria (below) reference these columns directly — +without structured logging, the criteria are unverifiable at decision-gate time. + **Kill criteria (written, must be in spec):** **Stay on Codex (cutover proceeds)** if and only if: @@ -451,7 +464,7 @@ of buffer. | 1 | #A (App + `AGENTS.md`) | No code change; observable side-by-side with Claude | | 2 | #B (`codex-gate.yml`) | Provides replacement signal before removing old one | | 3 | #C (branch protection — add new) | Both names required during transition | -| 4 | **SHADOW MODE WINDOW (§ 8)** | Both gates required; ≥7 days / ≥30 PRs | +| 4 | **SHADOW MODE WINDOW (§ 8)** | Both gates required; ≥7 days + ≥7 PRs (14-day max) | | 5 | **DECISION GATE** (go/no-go) | Explicit user call; requires #O resolved (synchronize-event behavior confirmed) | | 6 | #D, #E, #F (write-side migrations) — parallel | apply-fix, lint-failure, ci-failure | | 7 | #G (retire verb router) | Last workflow migration | @@ -459,7 +472,7 @@ of buffer. | 9 | #I (delete `runtime/` tree + six workflows) | After every consumer of overlay images migrated | | 10 | #J (docs: CLAUDE.md, README.md, examples) | Stabilize before tagging | | 11 | #K (cut `v3.0.0`) | Cutover complete | -| 12 | #L, #M (GHCR image deletion +30d, external-consumer audit) | Post-release cleanup | +| 12 | #L, #M (GHCR image deletion +30d, external-consumer audit) | Post-release cleanup. **Note:** the 30-day grace window does NOT start at v3 release; it starts when #M completes. This guarantees digest-pinned consumers receive a full 30 days of notice from audit-completion regardless of when #M finishes. | **Atomicity note — #G (retire verb router).** Sub-issue #G's PR must delete `claude-tag-respond.yml`, `tag-claude/`, `claude-command-router/`, and @@ -504,6 +517,9 @@ either vague at consumer site or collision-prone with retained legacy names. `uses:` lines — all of which become invalid post-v3. They are added to `touches:` and to sub-issue #J's scope explicitly. 
(The planner's original Glob verification was a false negative; corrected 2026-05-20 by project-reviewer.) + Verified via `git ls-tree -r origin/main` on 2026-05-20: returned + `examples/{README.md,claude-apply-fix.yml,claude-ci-failure.yml,claude-lint-failure.yml,claude-pr-review.yml,claude-tag-respond.yml}` + and `docs/consumer-onboarding.md`. - **`README.md`:** verified via Grep to contain `CLAUDE_CODE_OAUTH_TOKEN`, `ghcr.io/glitchwerks/claude-runtime-*` digest pins, and `claude-*.yml@v2` `uses:` examples. **Full rewrite required** as part of sub-issue #J. @@ -555,6 +571,13 @@ ops agent will assign actual issue numbers when filing. first). Acceptance: a fresh consumer following the rewritten examples/onboarding can complete setup against `glitchwerks/github-actions@v3` without any reference to retired surfaces. + **Test method:** Use a throwaway repo (e.g., `cbeaulieu-gt/codex-pivot-consumer-test` + or a fresh personal repo) as the consumer test bed. The acceptance is met + when a clean checkout of that test repo, following only the rewritten + `examples/README.md` and `docs/consumer-onboarding.md`, can install the + Codex GitHub App, open a test PR, and observe a Codex review post — with + no reference to retired Claude/GHCR/OAuth surfaces in the consumer's + workflow. **Gating sub-issue — must merge before `v3` tag is cut.** - **#K — Cut `v3.0.0` release.** Move `v3` floating tag, freeze `v2` at last Claude-era commit, write release notes documenting all breaking changes @@ -583,6 +606,11 @@ ops agent will assign actual issue numbers when filing. **Gating:** this sub-issue MUST resolve before the §9 decision gate. Do NOT delete `claude-pr-review.yml` without an answer. + **Timing:** Run this test as part of sub-issue #A (App setup) or #B + (`codex-gate.yml`), before the shadow mode window begins. 
This ensures any + fallback workflow (#O option (a) `codex-synchronize-trigger.yml`) can be + built in parallel with shadow-mode setup without delaying the decision gate. + ## 12. Out of Scope - **`lint.yml` (actionlint)** — unaffected. @@ -600,8 +628,8 @@ These are questions Rev 2 cannot answer in writing. The user should resolve before #A is filed. 1. **Shadow-mode window length (resolved 2026-05-20 by reviewer feedback).** - Volume threshold is advisory, not blocking. 7 days + 7 observed PRs is - the binding floor; ≥30 PRs is aspirational. See §8 for the rationale. + Binding floor: ≥7 days AND ≥7 real PRs. Maximum: 14 days (cutover proceeds + on time even if <7 PRs, with caveat). ≥30 PRs is aspirational. See §8. 2. **Severity threshold for `CHANGES_REQUESTED`.** Spike #275 observed `COMMENTED` on a P2 finding; the spec recommends gating on `!= CHANGES_REQUESTED` (option 2 in § 7). Confirm this matches the user's risk tolerance — if a P1 From 243a4eedf904bbca4e58e4ee7b376310b47c4002 Mon Sep 17 00:00:00 2001 From: Claude Auto-Fix Date: Wed, 20 May 2026 15:05:46 -0400 Subject: [PATCH 4/4] docs: complete Option-B propagation for #M audit timing (#278) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The fix-up in 640e20a applied Option B to §9 row 12 (conditional 30-day clock) but left two references with Option-A "before-release" framing: #M's body and §13 Q3. This commit propagates Option B to both — #M runs post-release in parallel with #L; the GHCR-deletion clock is gated on #M completion; §13 Q3 marked resolved. 
Refs #273, #278 --- .../specs/2026-05-20-codex-pivot.md | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/docs/superpowers/specs/2026-05-20-codex-pivot.md b/docs/superpowers/specs/2026-05-20-codex-pivot.md index 0cf1845..aa64ca2 100644 --- a/docs/superpowers/specs/2026-05-20-codex-pivot.md +++ b/docs/superpowers/specs/2026-05-20-codex-pivot.md @@ -584,8 +584,13 @@ ops agent will assign actual issue numbers when filing. from § 10. - **#L — (Post-cutover, +30 days) Delete GHCR images.** Manual GHCR UI on `claude-runtime-{base,review,fix,explain}`. -- **#M — Audit external consumers pinned to `v2`.** Survey before `v3` - release; notify of breaking-change nature. +- **#M — Audit external consumers pinned to `v2`.** Survey post-release in + parallel with `#L` (GHCR-deletion clock prep). Notify each consumer of the + breaking-change nature of `v3` and the upcoming GHCR image deletions. + **Audit completion gates the start of the 30-day GHCR-deletion grace window + (see §9 row 12).** This decouples release timing from audit completion — the + cutover can proceed against the OAuth deadline without `#M` becoming a + release blocker. - **#N — Document shadow-mode kill-criteria observations.** A short results log (PR count, false-positive comparison, latency, decision) extracted from shadow mode and attached to whichever sub-issue closes the decision gate. @@ -635,9 +640,12 @@ before #A is filed. (option 2 in § 7). Confirm this matches the user's risk tolerance — if a P1 finding emits `COMMENTED` (undocumented behavior), option 2 would let it merge. -3. **External consumer audit timing.** Sub-issue #M is listed post-release in - § 9. Should it run **before** the `v3` cut instead, to give consumers a - migration window? +3. **External consumer audit timing (resolved 2026-05-20).** Decided: #M stays + post-release in §9 row 12, but the 30-day GHCR-deletion grace window does + NOT start at v3 release — it starts when #M completes. 
This gives consumers + a full 30 days of notice from audit-completion without making #M a release + blocker on the OAuth-deadline path. See §9 row 12 and `#M`'s body for the + resolved semantics. 4. **`AGENTS.md` voice and detail.** The PR-review prompt at `pr-review/action.yml:L1–L696` is verbose and Claude-specific. How much of it translates verbatim vs. gets rewritten for Codex's conventions? (The