codex-stack

codex-stack turns Codex from a generic coding assistant into a team of workflow specialists you can call on demand.

Eighteen opinionated workflow modes for Codex: product framing, technical planning, paranoid diff review, scored QA, regression triage, preview verification, deploy verification, release shipping, browser automation, engineering retrospectives, upgrade audits, fleet rollout plus auto-remediation control, local agent staffing, goal and queue tracking, heartbeat scheduling, approval gates, and local MCP interoperability.

Inspired by gstack, codex-stack adapts the same specialist-workflow idea for Codex. If gstack is the Claude Code version of this pattern, codex-stack is the Codex-native version. This project is independently maintained and is not affiliated with gstack.

Without codex-stack

Requests stay vague, so the agent executes before the scope is really clear.
Review depth varies from run to run because there is no shared checklist or report shape.
Browser QA lives in one-off prompts instead of reusable checked-in flows.
Shipping still needs manual PR setup, reviewer routing, labels, assignees, and project metadata.
Deployment validation is disconnected from the shipping path.
Weekly updates for stakeholders are assembled by hand.

With codex-stack

Skill	Mode	What it does
`product`	Product thinker	Reframes a request into user outcomes, scope, and acceptance criteria.
`tech`	Tech lead	Locks architecture, trust boundaries, rollout risks, and the test plan.
`review`	Paranoid staff engineer	Audits the diff for structural production risks instead of style noise.
`qa`	QA lead	Runs browser flows, diff-aware route probes, and snapshot checks, then scores release readiness.
`qa-decide`	Regression triage operator	Records explicit approvals, suppressions, and refresh-required decisions for known regressions.
`preview`	Preview verifier	Resolves a PR preview URL, waits for readiness, runs QA, and reports whether the preview is safe to merge.
`deploy`	Deploy verifier	Verifies a preview or staging deploy across path and device checks, flows, snapshots, screenshots, and console evidence.
`ship`	Release engineer	Validates the branch, prepares PR metadata, and can run QA before opening the PR.
`browse`	QA engineer	Drives a real browser with persistent sessions, portable session bundles, named flows, snapshots, and artifacts.
`retro`	Engineering manager	Summarizes delivery patterns from git history and optional GitHub PR analytics.
`upgrade`	Repo maintainer	Audits Bun, dependency drift, workflow action drift, and install health for codex-stack itself.
`fleet`	Control plane operator	Pushes shared policy packs across repos, collects normalized health, and renders a GitHub-native rollout dashboard.
`agents`	Agent manager	Registers named engineering agents, reporting lines, and staffing status for local orchestration.
`goals`	Program lead	Tracks goal hierarchy and assignable work queues for multiple agents.
`heartbeat`	Loop operator	Schedules agent wakeups, records heartbeat runs, and preserves per-agent continuity state.
`approvals`	Governance lead	Manages approval requests and gates high-risk autonomous actions.
`mcp`	Interop engineer	Exposes read-only codex-stack tools and evidence resources to MCP-capable clients over local stdio.

Default workflow

Use the repo in this order:

Open an issue
Create a branch from that issue
Open a PR from the issue branch
Let pr-review comment and gate the PR automatically
Add the automerge label when the PR is ready to merge after checks

What ships today

Installable Codex skills under skills/
Checked-in root CLI under src/cli.ts
Playwright-backed browser runtime under browse/src/cli.ts with semantic selectors, device presets, iframe targeting, request mocking/blocking, download capture, upload, dialog, wait-state, and element-state assertions
Persistent named browser sessions
Portable session import/export with cookie and storage-state bundles
Checked-in and local browser flows with import/export for JSON, YAML, and Markdown
Page snapshots and snapshot comparison artifacts with self-contained visual packs, diff heatmaps, and image-diff scores
QA reports with typed categories, severity, health score, diff-aware route inference, stale-baseline detection, visual-risk scoring, saved evidence, annotated screenshots, and published visual packs for snapshot failures
Repo-tracked baseline decision files under .codex-stack/baseline-decisions/ so expected regressions stay explicit, reviewable, and expirable
Opt-in accessibility scans and performance budget checks that feed the same QA, preview, deploy, ship, PR review, and Pages report surfaces
Historical QA trend artifacts under .codex-stack/qa/trends.json and .codex-stack/qa/trends.md
Preview verification with URL template resolution, readiness polling, deploy/page verification, QA execution, and PR comment output for preview deployments
Deploy verification with page and device matrices, screenshot manifests, console capture, tracked evidence, and visual/index.html review packs with ranked regressions plus a consolidated visual-risk score
Shipping automation with PR body generation, labels, reviewers, assignees, projects, and optional deploy verification
PR comments with deploy verification summaries and artifact references after ship --pr
Tracked QA evidence published under docs/qa/<branch>/ during shipping so PR comments can link to real files
GitHub Pages publishing for docs/qa/ so merged QA reports keep a stable URL after branch cleanup
Issue-first workflow automation with PR review comments and opt-in auto-merge
Fleet rollout controls for multi-repo policy packs, policy-aware health expectations, rollout PR planning, auto-remediation issues, and org-level dashboard rendering
Local control-plane state under .codex-stack/control-plane/ with named agents, goals, delegated task queues, schedules with retry and cooldown policy, continuity sessions, budget policies, and approval queues
Local stdio MCP server with read-only and dry-run wrappers for review, QA, preview, deploy, ship planning, fleet planning, retro, and upgrade workflows
Retrospective analytics plus weekly digest publishing outputs for markdown, Slack, and email, including visual regression rollups from published QA evidence
Upgrade auditing via CLI plus a daily scheduled update-check workflow that syncs a stable issue

Quick start

bun --version
./setup
bun run typecheck
bunx playwright install chromium
bash scripts/install-skills.sh user
bun src/cli.ts list

./setup runs environment checks, installs Bun dependencies when needed, and creates local wrappers under .codex-stack/bin/ for:

codex-stack
codex-stack-browse
product
tech
review
qa
qa-decide
preview
deploy
ship
browse
retro
upgrade
fleet
agents
goals
heartbeat
approvals
mcp

If you want shell-level commands, link those wrappers into your PATH:

bash scripts/link-commands.sh

Swarm multiple agents

You can run multiple Codex sessions in parallel across separate worktrees or terminals.

Typical split:

one agent in review
one agent in qa
one agent in preview
one agent in deploy
one agent in ship

Because the command contracts are shared, those agents stay aligned on the same review, QA, and shipping workflow.

Demo the sample app

The repo includes a release-readiness demo app at examples/customer-portal-demo/. It is built for technical evaluators who need to see whether codex-stack can support a real merge decision, not just automate a toy page.

Start it:

bun run demo:start
bun run demo:publish-qa

Then run the core live sequence:

bun src/cli.ts browse run-flow http://127.0.0.1:4173/login release-full-demo --session friend-demo
bun src/cli.ts browse snapshot http://127.0.0.1:4173/dashboard release-dashboard --session friend-demo
bun src/cli.ts qa http://127.0.0.1:4173/dashboard --flow release-dashboard --snapshot release-dashboard --session friend-demo
bun src/cli.ts deploy --url http://127.0.0.1:4173 --path /dashboard --path /changes --device desktop --device mobile --flow release-dashboard --flow release-changes --snapshot release-dashboard --publish-dir docs/qa/demo/deploy
bun src/cli.ts ship --dry-run --pr --verify-url http://127.0.0.1:4173 --verify-path /dashboard --verify-path /changes --verify-device desktop --verify-device mobile --verify-flow release-dashboard --verify-flow release-changes --verify-snapshot release-dashboard

What that sequence demonstrates:

authenticated browser automation
QA evidence tied to release confidence
page and device verification against a believable change-impact surface
shipping decisions backed by the same evidence

The checked-in release-login flow clears the demo app's stored login state before navigation so you can re-run it safely on the same named browser session. bun run demo:publish-qa refreshes the tracked sample report at docs/qa/release-readiness-demo/, which is what keeps the public QA Pages landing view useful before you ship a real branch report.

Issue to merge flow

Create the work item and branch:

bun src/cli.ts issue start --title "Add issue-first PR workflow" --label automation --prefix feat

This creates a GitHub issue and a local branch like feat/123-add-issue-first-pr-workflow.
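
The branch name above can be derived from the prefix, issue number, and title. A minimal sketch, assuming a simple lowercase-and-dash slug rule; `branchNameForIssue` is a hypothetical helper, and the real issue start command may slug titles differently:

```typescript
// Hypothetical helper: build "<prefix>/<issue-number>-<slug>" from an issue title.
function branchNameForIssue(prefix: string, issueNumber: number, title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every run of non-alphanumerics into one dash
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return `${prefix}/${issueNumber}-${slug}`;
}

// prints "feat/123-add-issue-first-pr-workflow"
console.log(branchNameForIssue("feat", 123, "Add issue-first PR workflow"));
```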

Ship the branch as a PR:

bun src/cli.ts ship --message "feat: add issue-first workflow" --push --pr

What happens next:

pr-review.yml runs codex-stack review on the PR diff
for same-repo PRs, the review workflow publishes a GitHub Pages preview at https://<owner>.github.io/<repo>/pr-preview/pr-<number>/ and verifies that live preview before merging
the same workflow republishes review evidence under https://<owner>.github.io/<repo>/pr-preview/pr-<number>/__codex/
the workflow posts or updates a PR comment with structural findings plus hosted preview visual evidence
the job fails if critical findings are detected in either structural review or preview verification
if the PR has the automerge label, pr-automerge.yml enables GitHub auto-merge

Branch naming matters: when the branch follows <prefix>/<issue-number>-slug, ship includes Closes #<issue-number> in the generated PR body so the issue closes on merge.
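
The naming rule can be illustrated with a small parser. This is a sketch under the assumption that the issue number is the run of digits immediately after the first slash; `closingLineForBranch` is a hypothetical name, not the function ship actually uses:

```typescript
// Hypothetical helper: return the "Closes #N" line for a branch that follows
// <prefix>/<issue-number>-slug, or null when the branch does not match.
function closingLineForBranch(branch: string): string | null {
  const match = /^[^/]+\/(\d+)-.+$/.exec(branch);
  return match ? `Closes #${match[1]}` : null;
}

// prints "Closes #123"
console.log(closingLineForBranch("feat/123-add-issue-first-pr-workflow"));
// prints null, because there is no issue number after the prefix
console.log(closingLineForBranch("feat/add-issue-first-pr-workflow"));
```

A branch that does not match simply produces no closing line, so the issue would have to be linked by hand.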

Root CLI

bun src/cli.ts list
bun src/cli.ts show qa
bun src/cli.ts review --json --base origin/main
bun src/cli.ts qa http://127.0.0.1:4173/dashboard --flow release-dashboard --snapshot release-dashboard --session demo --json
bun src/cli.ts qa http://127.0.0.1:4173/dashboard --flow release-dashboard --snapshot release-dashboard --a11y --a11y-scope main --perf --perf-budget lcp=2s --perf-budget cls=0.1 --session demo --json
bun src/cli.ts qa https://preview.example.com --mode diff-aware --base-ref origin/main --session preview --json
bun src/cli.ts qa https://preview.example.com/dashboard --flow release-dashboard --session preview-auth --session-bundle .codex-stack/private/preview-auth.json --json
bun src/cli.ts qa-decide approve --snapshot release-dashboard --route /dashboard --device desktop --kind snapshot-drift --reason "Intentional redesign approved in PR #123"
bun src/cli.ts qa-decide suppress --category accessibility --kind accessibility-rule --route /checkout --device desktop --rule color-contrast --reason "Vendor widget pending upstream fix" --expires-at 2026-03-29T00:00:00Z
bun src/cli.ts preview --url "https://anup4khandelwal.github.io/codex-stack/pr-preview/pr-42/" --pr 42 --branch feat/42-preview --sha abcdef123 --path /login --path /dashboard --device desktop --device mobile --flow release-full-demo
bun src/cli.ts preview --url-template "https://preview-{pr}.example.com" --pr 42 --branch feat/42-preview --sha abcdef123 --path / --path /dashboard --device desktop --device mobile --flow landing-smoke --snapshot landing-home
bun src/cli.ts preview --url "https://anup4khandelwal.github.io/codex-stack/pr-preview/pr-42/" --pr 42 --branch feat/42-preview --sha abcdef123 --path /dashboard --device desktop --flow release-dashboard --session preview-auth --session-bundle .codex-stack/private/preview-auth.json
bun src/cli.ts deploy --url https://staging.example.com --path / --path /dashboard --path /changes --device desktop --device mobile --flow release-dashboard --flow release-changes --snapshot release-dashboard
bun src/cli.ts deploy --url https://staging.example.com --path /dashboard --device desktop --flow release-dashboard --snapshot release-dashboard --a11y --a11y-scope main --perf --perf-budget lcp=2s --perf-budget cls=0.1
bun src/cli.ts deploy --url https://staging.example.com --path /dashboard --device desktop --flow release-dashboard --session staging-auth --session-bundle .codex-stack/private/staging-auth.json
bun src/cli.ts ship --message "feat: ready for review" --push --pr --reviewer octocat --assignee @me --project "Engineering Roadmap"
bun src/cli.ts ship --dry-run --pr --verify-url http://127.0.0.1:4173 --verify-path /dashboard --verify-path /changes --verify-device mobile --verify-console-errors --verify-flow release-dashboard --verify-flow release-changes --verify-snapshot release-dashboard
bun src/cli.ts ship --dry-run --pr --verify-url http://127.0.0.1:4173 --verify-path /dashboard --verify-path /changes --verify-device mobile --verify-flow release-dashboard --verify-flow release-changes --verify-snapshot release-dashboard --verify-a11y --verify-a11y-scope main --verify-perf --verify-perf-budget lcp=2s
bun src/cli.ts fleet validate --manifest .codex-stack/fleet.example.json
bun src/cli.ts fleet sync --manifest .codex-stack/fleet.example.json --dry-run --json
bun src/cli.ts fleet collect --manifest .codex-stack/fleet.example.json --json
bun src/cli.ts fleet dashboard --manifest .codex-stack/fleet.example.json --out .fleet-site
bun src/cli.ts fleet remediate --manifest .codex-stack/fleet.example.json --dry-run --json
bun src/cli.ts agents add --name lead-1 --runtime codex --role manager --team platform --status working
bun src/cli.ts agents dashboard --out .codex-stack/control-plane/dashboard
bun src/cli.ts goals add --id release-q2 --title "Release Q2 hardening" --type initiative --owner lead-1 --status active
bun src/cli.ts goals task add --id review-contracts --goal release-q2 --title "Review agent contracts" --assignee reviewer-1 --action-kind review --action-arg --base --action-arg origin/main --expected-minutes 10
bun src/cli.ts goals queue --json
bun src/cli.ts agents budget set --agent reviewer-1 --window daily --max-runs 8 --max-minutes 120 --max-cost-units 20
bun src/cli.ts heartbeat schedule add --agent reviewer-1 --task review-contracts --trigger cron --expression "*/30 * * * *" --summary "Review new queue items"
bun src/cli.ts heartbeat run-due --agent reviewer-1 --max-agents 1 --max-tasks 1 --json
bun src/cli.ts approvals gate --agent reviewer-1 --kind ship-pr --target review-contracts --json
bun src/cli.ts mcp inspect --json
bun src/cli.ts mcp serve
bun src/cli.ts retro --since "7 days ago" --repo anup4khandelwal/codex-stack
bun src/cli.ts upgrade --offline --json
bun src/cli.ts upgrade --offline --apply
bun src/cli.ts browse doctor
bun src/cli.ts browse flows
bun src/cli.ts browse export-session ./tmp/staging-session.json --session staging
bun src/cli.ts browse import-session ./tmp/staging-session.json --session staging-copy
bun src/cli.ts browse import-browser-cookies chrome --session staging --profile Default
bun src/cli.ts browse probe https://example.com/settings --session staging
bun src/cli.ts browse upload https://example.com/profile "input[type=file]" ./fixtures/avatar.png --session staging
bun src/cli.ts browse dialog https://example.com/settings accept "#delete-confirm" --session staging
bun src/cli.ts browse click https://example.com/login "role:button:Continue" --session staging --device mobile
bun src/cli.ts browse fill https://example.com/login "label:Email" demo@example.com --session staging
bun src/cli.ts browse assert-visible https://example.com/home "testid:hero" --session staging
bun src/cli.ts browse click https://example.com/checkout "role:button:Pay now" --session staging --frame "name:payment"
bun src/cli.ts browse mock https://example.com/app "**/api/profile" '{"status":503,"json":{"error":"offline"}}' --session staging
bun src/cli.ts browse download https://example.com/reports "role:button:Export CSV" ./artifacts/report.csv --session staging
bun src/cli.ts browse assert-focused https://example.com/login "input[name=email]" --session staging
bun src/cli.ts browse snapshot https://example.com marketing-home --session staging
bun src/cli.ts browse compare-snapshot https://example.com marketing-home --session staging

Useful Bun scripts:

bun run doctor
bun run typecheck
bun run smoke
bun run demo:start
bun run demo:smoke
bun run review
bun run qa -- http://127.0.0.1:4173/dashboard --flow release-dashboard --snapshot release-dashboard --session demo
bun src/cli.ts qa-decide list --active-only
bun run preview -- --url-template "https://preview-{pr}.example.com" --pr 42 --branch feat/42-preview --sha abcdef123 --path / --device desktop --flow landing-smoke --snapshot landing-home
bun run deploy -- --url https://staging.example.com --path /dashboard --device desktop --flow release-dashboard --snapshot release-dashboard
bun run ship:dry
bun run agents -- list --json
bun run goals -- queue --json
bun run heartbeat -- list --json
bun run approvals -- list --json
bun run mcp -- inspect --json
bun run retro
bun run upgrade
bun run upgrade:apply
bun run weekly

Control plane

codex-stack includes a local control-plane layer for coordinating multi-agent engineering work, useful until you need a full remote scheduler.

Core ideas:

register named agents with roles, runtimes, teams, and managers
track initiative and repo goals explicitly
assign persistent queued work and delegate parent tasks into child tasks instead of only running one-shot commands
schedule heartbeats with continuity state, retry limits, and cooldown windows per agent
execute one bounded workflow step per agent heartbeat, capture execution history, and resume from continuity state on the next tick
enforce budget limits and explicit approvals for high-risk actions, including ship and rollout remediation preflight
render a static dashboard under .codex-stack/control-plane/dashboard/
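
To make the retry-limit and cooldown idea concrete, here is a minimal sketch of how a scheduler might decide whether a heartbeat should run. The state shape, the decision names, and the exact semantics (one initial attempt plus up to retryLimit retries, each retry waiting out the cooldown) are assumptions for illustration, not the codex-stack implementation:

```typescript
type HeartbeatDecision = "run" | "cooling-down" | "retries-exhausted";

// Hypothetical per-schedule state; field names are illustrative only.
interface ScheduleState {
  retryLimit: number; // e.g. --retry-limit 2
  cooldownMinutes: number; // e.g. --cooldown-minutes 30
  consecutiveFailures: number; // failures since the last successful run
  lastFailureAt: Date | null;
}

function decideHeartbeat(state: ScheduleState, now: Date): HeartbeatDecision {
  // No recent failure: nothing to wait for.
  if (state.consecutiveFailures === 0 || state.lastFailureAt === null) return "run";
  // The initial attempt and every allowed retry have failed.
  if (state.consecutiveFailures > state.retryLimit) return "retries-exhausted";
  // Otherwise retry only once the cooldown window has passed.
  const cooldownEndsAt = state.lastFailureAt.getTime() + state.cooldownMinutes * 60_000;
  return now.getTime() < cooldownEndsAt ? "cooling-down" : "run";
}

const failedOnce: ScheduleState = {
  retryLimit: 2,
  cooldownMinutes: 30,
  consecutiveFailures: 1,
  lastFailureAt: new Date("2026-01-01T10:00:00Z"),
};
// prints "cooling-down" (only 10 minutes after the failure)
console.log(decideHeartbeat(failedOnce, new Date("2026-01-01T10:10:00Z")));
```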

Useful commands:

bun src/cli.ts agents add --name lead-1 --runtime codex --role manager --team platform --status working
bun src/cli.ts agents add --name reviewer-1 --runtime claude-code --role reviewer --team platform --manager lead-1
bun src/cli.ts goals add --id release-q2 --title "Release Q2 hardening" --type initiative --owner lead-1 --status active
bun src/cli.ts goals task add --id review-contracts --goal release-q2 --title "Review agent contracts" --assignee reviewer-1 --action-kind review --action-arg --base --action-arg origin/main --expected-minutes 10
bun src/cli.ts goals task add --id release-train --goal release-q2 --title "Coordinate release" --assignee lead-1 --action-kind custom-command --action-arg node --action-arg -e --action-arg "console.log(JSON.stringify({summary:'lead ok',nextAction:'complete'}))" --delegate-id qa-contracts --delegate-title "Run delegated QA" --delegate-assignee qa-1 --delegate-action-kind qa --delegate-action-arg http://127.0.0.1:4173/dashboard --delegate-action-arg --flow --delegate-action-arg release-dashboard --delegate-expected-minutes 15
bun src/cli.ts agents budget set --agent reviewer-1 --window daily --max-runs 8 --max-minutes 120 --max-cost-units 20
bun src/cli.ts heartbeat schedule add --agent reviewer-1 --task review-contracts --trigger cron --expression "*/30 * * * *" --summary "Review queue" --retry-limit 2 --cooldown-minutes 30
bun src/cli.ts heartbeat inspect --agent reviewer-1 --json
bun src/cli.ts heartbeat run-due --agent reviewer-1 --max-agents 1 --max-tasks 1 --json
bun src/cli.ts heartbeat run-agent --agent lead-1 --json
bun src/cli.ts approvals approve <id> --by lead-1 --note "Approved release work"
bun src/cli.ts ship --dry-run --pr --control-agent reviewer-1 --control-state .codex-stack/control-plane/state.json
bun src/cli.ts fleet remediate --manifest .codex-stack/fleet.example.json --dry-run --open-prs --control-agent lead-1 --control-state .codex-stack/control-plane/state.json --json
bun src/cli.ts goals queue --assignee reviewer-1
bun src/cli.ts agents dashboard --out .codex-stack/control-plane/dashboard
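
The budget flags above (--max-runs, --max-minutes, --max-cost-units) suggest a check before each run. A minimal sketch of such a check follows; the interfaces and the `budgetViolations` helper are hypothetical, and how codex-stack actually accounts for minutes and cost units is not specified here:

```typescript
// Hypothetical shapes mirroring the agents budget set flags.
interface AgentBudget {
  maxRuns: number;
  maxMinutes: number;
  maxCostUnits: number;
}

interface WindowUsage {
  runs: number;
  minutes: number;
  costUnits: number;
}

// Returns the limits that would be exceeded if one more task ran in this window.
function budgetViolations(
  budget: AgentBudget,
  used: WindowUsage,
  next: { expectedMinutes: number; expectedCostUnits: number },
): string[] {
  const violations: string[] = [];
  if (used.runs + 1 > budget.maxRuns) violations.push("max-runs");
  if (used.minutes + next.expectedMinutes > budget.maxMinutes) violations.push("max-minutes");
  if (used.costUnits + next.expectedCostUnits > budget.maxCostUnits) violations.push("max-cost-units");
  return violations;
}

const daily: AgentBudget = { maxRuns: 8, maxMinutes: 120, maxCostUnits: 20 };
const usedToday: WindowUsage = { runs: 7, minutes: 115, costUnits: 5 };
// prints [ "max-minutes" ]: an 8th run is allowed, but 115 + 10 minutes exceeds 120
console.log(budgetViolations(daily, usedToday, { expectedMinutes: 10, expectedCostUnits: 1 }));
```

An empty array means the run fits the budget; a non-empty one is the point where a scheduler could skip the task or raise an approval request instead.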

MCP interop

codex-stack can expose its workflow layer to MCP-capable clients through a local stdio server.

v1 policy:

stdio transport only
read-only plus dry-run tools only
no repo mutation, GitHub mutation, issue creation, PR creation, or baseline updates through MCP

Useful commands:

bun src/cli.ts mcp inspect --json
bun src/cli.ts mcp serve

Example client wiring:

{
  "mcpServers": {
    "codex-stack": {
      "command": "bun",
      "args": [
        "/Users/anup.khandelwal/Desktop/codex/codex/codex-stack/src/cli.ts",
        "mcp",
        "serve"
      ]
    }
  }
}
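
The absolute path in that example is specific to one machine. As a sketch, the same config can be generated for any checkout; `mcpClientConfig` is a hypothetical helper, and the only assumption is that the CLI entrypoint lives at src/cli.ts under the repo root, as described above:

```typescript
import { resolve } from "node:path";

// Hypothetical helper: build the client wiring for a given codex-stack checkout.
function mcpClientConfig(repoRoot: string) {
  return {
    mcpServers: {
      "codex-stack": {
        command: "bun",
        args: [resolve(repoRoot, "src/cli.ts"), "mcp", "serve"],
      },
    },
  };
}

// Print a config for the current working directory, ready to paste into a client.
console.log(JSON.stringify(mcpClientConfig(process.cwd()), null, 2));
```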

Fleet rollout

Use fleet when one repo needs to manage codex-stack policy across many repos.

Example:

bun src/cli.ts fleet validate --manifest .codex-stack/fleet.example.json
bun src/cli.ts fleet sync --manifest .codex-stack/fleet.example.json --dry-run --json
bun src/cli.ts fleet collect --manifest .codex-stack/fleet.example.json --json
bun src/cli.ts fleet dashboard --manifest .codex-stack/fleet.example.json --out .fleet-site

Checked-in real control plane for this workspace:

bun src/cli.ts fleet validate --manifest .codex-stack/fleet.anup4khandelwal.json
bun src/cli.ts fleet sync --manifest .codex-stack/fleet.anup4khandelwal.json --dry-run --json
bun src/cli.ts fleet sync --manifest .codex-stack/fleet.anup4khandelwal.json --open-prs
bun src/cli.ts fleet remediate --manifest .codex-stack/fleet.anup4khandelwal.json --dry-run --json
bun src/cli.ts fleet remediate --manifest .codex-stack/fleet.anup4khandelwal.json --open-prs --issue-repo anup4khandelwal/codex-stack
bun src/cli.ts fleet remediate --manifest .codex-stack/fleet.anup4khandelwal.json --dry-run --open-prs --control-agent fleet-1 --control-state .codex-stack/control-plane/state.json --json

The control repo owns:

a fleet manifest listing managed repos
shared policy packs under .codex-stack/policies/
generated member config and fleet-status workflow for each target repo
an org-level dashboard that ranks rollout drift and unresolved risk across repos

fleet sync generates three files in each managed repo:

.codex-stack/fleet-member.json
.github/codex-stack/fleet-status.js
.github/workflows/codex-stack-fleet-status.yml

That workflow emits a normalized codex-stack-fleet-status artifact so fleet collect can aggregate repo health without inventing a new backend. fleet remediate builds on the same status feed to open rollout PRs for drifted repos, maintain one stable remediation issue per unhealthy repo in the control plane, and optionally require an approval gate before rollout PRs open.

Policy packs also define whether a repo is expected to publish a codex-stack QA/deploy report. review-only repos are considered healthy when they are installed and in sync even if they do not publish docs/qa/ artifacts. Full rollout packs can still require a latest report before fleet health turns green.
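
The health rule in the previous paragraph can be sketched as a small predicate. The `RepoStatus` and `PolicyPack` shapes, the `requiresReport` field, and the example repo names are assumptions for illustration; the real status artifact and policy pack schema may differ:

```typescript
// Hypothetical shapes; the real fleet status feed may use different fields.
interface PolicyPack {
  requiresReport: boolean; // false for a review-only style pack
}

interface RepoStatus {
  repo: string;
  installed: boolean;
  inSync: boolean;
  hasLatestReport: boolean;
}

// A repo is healthy when it is installed and in sync, and it has a report
// whenever its policy pack requires one.
function isHealthy(status: RepoStatus, pack: PolicyPack): boolean {
  return status.installed && status.inSync && (!pack.requiresReport || status.hasLatestReport);
}

const reviewOnly: PolicyPack = { requiresReport: false };
const fullRollout: PolicyPack = { requiresReport: true };
const status: RepoStatus = {
  repo: "example-org/service-a", // placeholder repo name
  installed: true,
  inSync: true,
  hasLatestReport: false,
};

console.log(isHealthy(status, reviewOnly)); // prints true
console.log(isHealthy(status, fullRollout)); // prints false
```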

Current checked-in targets:

anup4khandelwal/autopilot-multi-agent-loop via review-only
anup4khandelwal/awesome-codex-skills via review-only
anup4khandelwal/anup4khandelwal via review-only

Current checked-in fleet state:

3/3 repos installed
3/3 repos healthy
0 repos drifted

Browser QA model

browse is the runtime. qa is the report layer.

Use browse when you want raw control:

sessions
portable session bundles
named flows
snapshots
route probes
ad hoc assertions
semantic selectors and responsive viewport presets
iframe targeting by frame name, URL fragment, or iframe selector
request blocking and mocked responses for repeatable QA and failure-path testing
download capture and filename assertions for export flows
local browser-profile import for Chrome, Arc, Brave, and Edge on macOS
screenshots and artifacts

Use qa when you want a decision-ready report:

pass / warning / critical status
health score
findings with category + evidence
diff-aware route inference from changed files
annotated SVG evidence for snapshot-based failures
optional accessibility and performance findings with dedicated artifacts and markdown summaries
saved markdown/json report under .codex-stack/qa/
automatic trend summaries across saved QA runs
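
Performance findings depend on budgets passed as strings such as lcp=2s or cls=0.1. A minimal sketch of parsing and checking such a budget is shown below; the accepted suffixes (s and ms) and the `PerfBudget` shape are assumptions, since the exact grammar qa accepts is not documented here:

```typescript
// Hypothetical normalized form: timings in milliseconds, scores left unitless.
interface PerfBudget {
  metric: string;
  limit: number;
  unit: "ms" | "unitless";
}

function parsePerfBudget(spec: string): PerfBudget {
  const match = /^([a-z]+)=(\d+(?:\.\d+)?)(ms|s)?$/i.exec(spec.trim());
  if (!match) throw new Error(`Invalid perf budget: ${spec}`);
  const [, metric, rawValue, suffix] = match;
  const value = Number(rawValue);
  if (suffix === undefined) {
    return { metric: metric.toLowerCase(), limit: value, unit: "unitless" };
  }
  // Convert seconds to milliseconds so all timing budgets share one unit.
  const limit = suffix.toLowerCase() === "s" ? value * 1000 : value;
  return { metric: metric.toLowerCase(), limit, unit: "ms" };
}

// The measured value must use the same unit as the parsed budget.
function exceedsBudget(budget: PerfBudget, measured: number): boolean {
  return measured > budget.limit;
}

const lcp = parsePerfBudget("lcp=2s");
console.log(lcp.limit); // prints 2000
console.log(exceedsBudget(lcp, 2400)); // prints true
console.log(exceedsBudget(parsePerfBudget("cls=0.1"), 0.05)); // prints false
```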

Preview verification

Use preview when the branch already has a preview deployment and you want merge readiness against the live preview, not only against the code diff.

Example:

bun src/cli.ts preview \
  --url "https://anup4khandelwal.github.io/codex-stack/pr-preview/pr-42/" \
  --pr 42 \
  --branch feat/42-preview \
  --sha abcdef1234567890 \
  --path /login \
  --path /dashboard \
  --device desktop \
  --device mobile \
  --flow release-full-demo \
  --a11y \
  --a11y-scope main \
  --perf \
  --perf-budget lcp=2s

For same-repo PRs, pr-review.yml publishes this preview site automatically to GitHub Pages before it verifies the deployment. preview-verify.yml remains available as a manual rerun or for external preview URLs.
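
For external previews, the --url-template form (for example "https://preview-{pr}.example.com") has to be resolved to a concrete URL and then polled until the deployment responds. The sketch below illustrates both steps. Only the {pr} placeholder appears in this README; the helper names, the probe callback, and the polling interval are assumptions, not the preview command's actual internals:

```typescript
// Hypothetical helper: substitute {name} placeholders in a preview URL template.
function resolvePreviewUrl(template: string, values: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (placeholder: string, key: string) => {
    const value = values[key];
    if (value === undefined) throw new Error(`No value for ${placeholder}`);
    return String(value);
  });
}

// Hypothetical helper: poll until the probe reports ready or the timeout elapses.
async function waitUntilReady(
  url: string,
  timeoutSeconds: number,
  probe: (url: string) => Promise<boolean>,
  intervalMs = 5000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutSeconds * 1000;
  while (Date.now() < deadline) {
    try {
      if (await probe(url)) return true;
    } catch {
      // Errors are expected while the preview is still deploying; keep polling.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}

// prints "https://preview-42.example.com"
console.log(resolvePreviewUrl("https://preview-{pr}.example.com", { pr: 42 }));
```

In a Bun or recent Node runtime, a probe such as `async (u) => (await fetch(u)).ok` would treat any 2xx response as ready.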

Authenticated previews use the same session bundle format as browse export-session:

bun src/cli.ts browse import-browser-cookies chrome --session preview-auth --profile Default
bun src/cli.ts browse export-session .codex-stack/private/preview-auth.json --session preview-auth
bun src/cli.ts preview \
  --url "https://anup4khandelwal.github.io/codex-stack/pr-preview/pr-42/" \
  --pr 42 \
  --branch feat/42-preview \
  --sha abcdef1234567890 \
  --path /dashboard \
  --device desktop \
  --flow release-dashboard \
  --session preview-auth \
  --session-bundle .codex-stack/private/preview-auth.json

For CI, base64-encode that bundle and save it as the repo secret CODEX_STACK_PREVIEW_SESSION_BUNDLE_B64. pr-review.yml and preview-verify.yml decode it into a temp file, pass --session-bundle into preview verification, and delete the temp file before the job exits.
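
The decode, use, and delete pattern described above can be sketched as follows. This is not the workflow's actual script; `withDecodedBundle` is a hypothetical helper and the temp-file naming is an assumption. The point is that the decoded bundle lives only for the duration of the callback:

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: decode a base64 session bundle into a private temp file,
// hand its path to the callback, and always remove it afterwards.
function withDecodedBundle<T>(bundleB64: string, run: (bundlePath: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "codex-stack-bundle-"));
  const bundlePath = join(dir, "session-bundle.json");
  try {
    writeFileSync(bundlePath, Buffer.from(bundleB64, "base64"), { mode: 0o600 });
    return run(bundlePath);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Stand-in for the CODEX_STACK_PREVIEW_SESSION_BUNDLE_B64 secret.
const exampleSecret = Buffer.from('{"cookies":[]}', "utf8").toString("base64");
let usedPath = "";
const decoded = withDecodedBundle(exampleSecret, (bundlePath) => {
  usedPath = bundlePath; // a real job would pass this path to --session-bundle
  return readFileSync(bundlePath, "utf8");
});
console.log(decoded); // prints {"cookies":[]}
console.log(existsSync(usedPath)); // prints false
```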

Ship verification

ship can run qa before it pushes the branch or opens the PR.

Example:

bun src/cli.ts ship \
  --message "feat: ready for review" \
  --push \
  --pr \
  --verify-url https://staging.example.com/dashboard \
  --verify-flow landing-smoke \
  --verify-snapshot landing-home \
  --verify-a11y \
  --verify-a11y-scope main \
  --verify-perf \
  --verify-perf-budget lcp=2s

This keeps QA in the shipping path instead of as a manual follow-up. When verification runs during ship, the QA report and evidence are published into docs/qa/<branch>/ before the branch is pushed, so the PR comment can point at tracked files on GitHub. After merge, the qa-pages workflow renders those tracked artifacts into a GitHub Pages site so the report, annotation, and screenshot links remain stable even after the feature branch is deleted.

Upgrade checks

Use upgrade when you want to audit the repo itself instead of a feature branch.

Examples:

bun src/cli.ts upgrade --offline
bun src/cli.ts upgrade --json
bun src/cli.ts upgrade --offline --apply
bun src/cli.ts upgrade --markdown-out docs/daily-update-check.md --json-out docs/daily-update-check.json

The upgrade report covers:

Bun runtime alignment against packageManager and engines.bun
dependency drift from npm when network access is available
GitHub Actions uses: ref drift
local wrapper and installed Codex skill link health
optional safe local refresh results when --apply is used

--apply is intentionally narrow. It regenerates .codex-stack/bin wrappers with dependency install skipped and refreshes project skill links under .codex/skills. It does not mutate dependency versions or workflow refs.

.github/workflows/daily-update-check.yml runs that same report on a daily schedule, uploads the markdown/json artifacts, writes the markdown into the workflow summary, and syncs a stable GitHub issue titled Daily codex-stack update check.

QA Pages

Build the static QA site locally:

bun run demo:publish-qa
bun run qa:site
open .site/index.html

On GitHub, .github/workflows/qa-pages.yml deploys the merged docs/qa/ reports to Pages. ship --pr now emits two classes of QA links:

branch artifact links that work immediately on the PR branch
stable Pages links that activate after the branch is merged to main

When a published QA report includes snapshot evidence, the Pages site surfaces visual/index.html as the primary review-evidence entrypoint alongside the raw markdown, JSON, annotation, and screenshot files. Each visual pack includes a diff heatmap and an image-diff score, so regressions can be ranked rather than treated as binary drift. The merged QA Pages site also renders history charts for risk score, image-diff score, and baseline age, so drift over time is visible without opening each report individually. When a report includes accessibility or performance data, the site also exposes the latest violation counts, perf-budget failures, LCP/CLS summaries, and matching history charts. Snapshot baselines carry route and device metadata, and QA, deploy, and preview runs automatically flag a baseline as stale when the saved reference is too old. The repo also keeps a tracked sample report at docs/qa/release-readiness-demo/ so the Pages root is never empty for evaluators arriving at the project cold.
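
Ranking by image-diff score and flagging stale baselines can be sketched like this. The `SnapshotResult` shape, the assumption that a higher score means a larger visual difference, and the 30-day threshold in the example are all illustrative; codex-stack's actual scoring and staleness cutoff are not specified here:

```typescript
// Hypothetical result shape for one snapshot comparison.
interface SnapshotResult {
  name: string;
  imageDiffScore: number; // assumed: higher means more visual change
  baselineCapturedAt: Date;
}

// Largest visual change first, without mutating the input array.
function rankRegressions(results: SnapshotResult[]): SnapshotResult[] {
  return [...results].sort((a, b) => b.imageDiffScore - a.imageDiffScore);
}

function isBaselineStale(result: SnapshotResult, now: Date, maxAgeDays: number): boolean {
  const ageDays = (now.getTime() - result.baselineCapturedAt.getTime()) / 86_400_000;
  return ageDays > maxAgeDays;
}

const now = new Date("2026-03-01T00:00:00Z");
const results: SnapshotResult[] = [
  { name: "release-dashboard", imageDiffScore: 0.02, baselineCapturedAt: new Date("2026-02-20T00:00:00Z") },
  { name: "landing-home", imageDiffScore: 0.31, baselineCapturedAt: new Date("2026-01-01T00:00:00Z") },
];

for (const result of rankRegressions(results)) {
  const stale = isBaselineStale(result, now, 30) ? " (stale baseline)" : "";
  console.log(`${result.name}: ${result.imageDiffScore}${stale}`);
}
```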

The same gh-pages branch also hosts PR previews under pr-preview/pr-<number>/. Configure these repo variables if you want richer automatic preview coverage in pr-review.yml:

CODEX_STACK_PREVIEW_PATHS=/login,/dashboard
CODEX_STACK_PREVIEW_DEVICES=desktop,mobile
CODEX_STACK_PREVIEW_FLOW=release-full-demo
CODEX_STACK_PREVIEW_SNAPSHOT=<optional snapshot name>
CODEX_STACK_PREVIEW_WAIT_TIMEOUT=300
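
As an illustration of how those variables map onto preview options, the sketch below parses them from an environment-like object. `parsePreviewConfig` and the `PreviewConfig` shape are hypothetical, the timeout is assumed to be in seconds, and unset values are left undefined rather than guessing at the workflow's defaults:

```typescript
interface PreviewConfig {
  paths: string[];
  devices: string[];
  flow: string | undefined;
  snapshot: string | undefined;
  waitTimeoutSeconds: number | undefined;
}

// Split a comma-separated variable into trimmed, non-empty entries.
function splitList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

function parsePreviewConfig(env: Record<string, string | undefined>): PreviewConfig {
  const timeout = Number(env.CODEX_STACK_PREVIEW_WAIT_TIMEOUT);
  return {
    paths: splitList(env.CODEX_STACK_PREVIEW_PATHS),
    devices: splitList(env.CODEX_STACK_PREVIEW_DEVICES),
    flow: env.CODEX_STACK_PREVIEW_FLOW || undefined,
    snapshot: env.CODEX_STACK_PREVIEW_SNAPSHOT || undefined,
    waitTimeoutSeconds: Number.isFinite(timeout) && timeout > 0 ? timeout : undefined,
  };
}

console.log(
  parsePreviewConfig({
    CODEX_STACK_PREVIEW_PATHS: "/login,/dashboard",
    CODEX_STACK_PREVIEW_DEVICES: "desktop,mobile",
    CODEX_STACK_PREVIEW_FLOW: "release-full-demo",
    CODEX_STACK_PREVIEW_WAIT_TIMEOUT: "300",
  }),
);
```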

Optional authenticated preview secret:

CODEX_STACK_PREVIEW_SESSION_BUNDLE_B64=<base64 of browse export-session output>

For same-repo PRs, pr-review.yml republishes the preview subtree with hosted review evidence under pr-preview/pr-<number>/__codex/, including the deploy visual pack at __codex/visual/index.html. The PR review comment uses that hosted pack to show direct visual-summary links and inline screenshots for the highest-signal regressions.

When a PR closes, .github/workflows/preview-cleanup.yml removes only gh-pages/pr-preview/pr-<number>/ and keeps the root QA site plus other active PR previews intact.

Install skills for Codex

User-level install:

bash scripts/install-skills.sh user

Project-level install:

bash scripts/install-skills.sh project /path/to/repo

This creates links such as:

~/.codex/skills/codex-stack-product
~/.codex/skills/codex-stack-qa
~/.codex/skills/codex-stack-review
~/.codex/skills/codex-stack-browse
~/.codex/skills/codex-stack-setup-browser-cookies
~/.codex/skills/codex-stack-upgrade

Example prompts after installation:

Use codex-stack-product to tighten this feature request into acceptance criteria.
Use codex-stack-review to audit the current branch against main and focus on production risk.
Use codex-stack-qa to verify the staging dashboard flow and tell me if it is safe to ship.
Use codex-stack-browse to capture a baseline snapshot for the new onboarding page.
Use codex-stack-setup-browser-cookies to import my signed-in Chrome session and prepare a preview auth bundle for CI.
Use codex-stack-upgrade to audit whether this codex-stack install needs dependency, workflow, or skill-link refreshes.

Repository layout

codex-stack/
  browse/              Browser runtime, flows, and artifacts helpers
  docs/                Install, command, and example docs
  examples/            Sample apps for demos
  scripts/             Setup, review, qa, ship, retro, and digest helpers
  skills/              Installable Codex skills
  src/                 TypeScript source for the root CLI

Documentation

License

MIT
