anup4khandelwal/codex-stack

codex-stack

codex-stack turns Codex from a generic coding assistant into a team of workflow specialists you can call on demand.

Seventeen opinionated workflow modes for Codex: product framing, technical planning, paranoid diff review, scored QA, regression triage, preview verification, deploy verification, release shipping, browser automation, engineering retrospectives, upgrade audits, fleet rollout plus auto-remediation control, local agent staffing, goal and queue tracking, heartbeat scheduling, approval gates, and local MCP interoperability.

Inspired by gstack, codex-stack adapts the same specialist-workflow idea for Codex. If gstack is the Claude Code version of this pattern, codex-stack is the Codex-native version. This project is independently maintained and is not affiliated with gstack.

Without codex-stack

  • Requests stay vague, so the agent executes before the scope is really clear.
  • Review depth varies from run to run because there is no shared checklist or report shape.
  • Browser QA lives in one-off prompts instead of reusable checked-in flows.
  • Shipping still needs manual PR setup, reviewer routing, labels, assignees, and project metadata.
  • Deployment validation is disconnected from the shipping path.
  • Weekly updates for stakeholders are assembled by hand.

With codex-stack

Skill | Mode | What it does
product | Product thinker | Reframes a request into user outcomes, scope, and acceptance criteria.
tech | Tech lead | Locks architecture, trust boundaries, rollout risks, and the test plan.
review | Paranoid staff engineer | Audits the diff for structural production risks instead of style noise.
qa | QA lead | Runs browser flows, diff-aware route probes, and snapshot checks, then scores release readiness.
qa-decide | Regression triage operator | Records explicit approvals, suppressions, and refresh-required decisions for known regressions.
preview | Preview verifier | Resolves a PR preview URL, waits for readiness, runs QA, and reports whether the preview is safe to merge.
deploy | Deploy verifier | Verifies a preview or staging deploy across path and device checks, flows, snapshots, screenshots, and console evidence.
ship | Release engineer | Validates the branch, prepares PR metadata, and can run QA before opening the PR.
browse | QA engineer | Drives a real browser with persistent sessions, portable session bundles, named flows, snapshots, and artifacts.
retro | Engineering manager | Summarizes delivery patterns from git history and optional GitHub PR analytics.
upgrade | Repo maintainer | Audits Bun, dependency drift, workflow action drift, and install health for codex-stack itself.
fleet | Control plane operator | Pushes shared policy packs across repos, collects normalized health, and renders a GitHub-native rollout dashboard.
agents | Agent manager | Registers named engineering agents, reporting lines, and staffing status for local orchestration.
goals | Program lead | Tracks goal hierarchy and assignable work queues for multiple agents.
heartbeat | Loop operator | Schedules agent wakeups, records heartbeat runs, and preserves per-agent continuity state.
approvals | Governance lead | Manages approval requests and gates high-risk autonomous actions.
mcp | Interop engineer | Exposes read-only codex-stack tools and evidence resources to MCP-capable clients over local stdio.

Default workflow

Use the repo in this order:

  1. Open an issue
  2. Create a branch from that issue
  3. Open a PR from the issue branch
  4. Let pr-review comment and gate the PR automatically
  5. Add the automerge label when the PR is ready to merge after checks

What ships today

  • Installable Codex skills under skills/
  • Checked-in root CLI under src/cli.ts
  • Playwright-backed browser runtime under browse/src/cli.ts with semantic selectors, device presets, iframe targeting, request mocking/blocking, download capture, upload, dialog, wait-state, and element-state assertions
  • Persistent named browser sessions
  • Portable session import/export with cookie and storage-state bundles
  • Checked-in and local browser flows with import/export for JSON, YAML, and Markdown
  • Page snapshots and snapshot comparison artifacts with self-contained visual packs, diff heatmaps, and image-diff scores
  • QA reports with typed categories, severity, health score, diff-aware route inference, stale-baseline detection, visual-risk scoring, saved evidence, annotated screenshots, and published visual packs for snapshot failures
  • Repo-tracked baseline decision files under .codex-stack/baseline-decisions/ so expected regressions stay explicit, reviewable, and expirable
  • Opt-in accessibility scans and performance budget checks that feed the same QA, preview, deploy, ship, PR review, and Pages report surfaces
  • Historical QA trend artifacts under .codex-stack/qa/trends.json and .codex-stack/qa/trends.md
  • Preview verification with URL template resolution, readiness polling, deploy/page verification, QA execution, and PR comment output for preview deployments
  • Deploy verification with page and device matrices, screenshot manifests, console capture, tracked evidence, and visual/index.html review packs with ranked regressions plus a consolidated visual-risk score
  • Shipping automation with PR body generation, labels, reviewers, assignees, projects, and optional deploy verification
  • PR comments with deploy verification summaries and artifact references after ship --pr
  • Tracked QA evidence published under docs/qa/<branch>/ during shipping so PR comments can link to real files
  • GitHub Pages publishing for docs/qa/ so merged QA reports keep a stable URL after branch cleanup
  • Issue-first workflow automation with PR review comments and opt-in auto-merge
  • Fleet rollout controls for multi-repo policy packs, policy-aware health expectations, rollout PR planning, auto-remediation issues, and org-level dashboard rendering
  • Local control-plane state under .codex-stack/control-plane/ with named agents, goals, delegated task queues, schedules with retry and cooloff policy, continuity sessions, budget policies, and approval queues
  • Local stdio MCP server with read-only and dry-run wrappers for review, QA, preview, deploy, ship planning, fleet planning, retro, and upgrade workflows
  • Retrospective analytics plus weekly digest publishing outputs for markdown, Slack, and email, including visual regression rollups from published QA evidence
  • Upgrade auditing via CLI plus a daily scheduled update-check workflow that syncs a stable issue
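
The baseline decision files listed above are described as expirable, and the qa-decide examples in this README pass --expires-at. A minimal sketch of how an active-only filter could behave, assuming a hypothetical BaselineDecision shape (the real schema under .codex-stack/baseline-decisions/ is not shown in this README):

```typescript
// Hypothetical shape for illustration; the real decision files may differ.
interface BaselineDecision {
  kind: string; // e.g. "snapshot-drift"
  route: string; // e.g. "/dashboard"
  reason: string;
  expiresAt?: string; // ISO 8601 timestamp; absent means no expiry
}

// Keep only the decisions that have not expired at `now`.
function activeDecisions(decisions: BaselineDecision[], now: Date): BaselineDecision[] {
  return decisions.filter((d) => {
    if (d.expiresAt === undefined) return true;
    const expiry = Date.parse(d.expiresAt);
    // An unparseable timestamp counts as expired, so a typo cannot
    // keep a suppression alive indefinitely.
    return !Number.isNaN(expiry) && expiry > now.getTime();
  });
}
```

Treating a malformed expiry as expired is a design choice in this sketch, not documented codex-stack behavior.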

Quick start

bun --version
./setup
bun run typecheck
bunx playwright install chromium
bash scripts/install-skills.sh user
bun src/cli.ts list

./setup runs environment checks, installs Bun dependencies when needed, and creates local wrappers under .codex-stack/bin/ for:

  • codex-stack
  • codex-stack-browse
  • product
  • tech
  • review
  • qa
  • qa-decide
  • preview
  • deploy
  • ship
  • browse
  • retro
  • upgrade
  • fleet
  • agents
  • goals
  • heartbeat
  • approvals
  • mcp

If you want shell-level commands, link those wrappers into your PATH:

bash scripts/link-commands.sh

Swarm multiple agents

You can run multiple Codex sessions in parallel across separate worktrees or terminals.

Typical split:

  • one agent in review
  • one agent in qa
  • one agent in preview
  • one agent in deploy
  • one agent in ship

Because the command contracts are shared, those agents stay aligned on the same review, QA, and shipping workflow.

Demo the sample app

The repo includes a release-readiness demo app at examples/customer-portal-demo/. It is built for technical evaluators who need to see whether codex-stack can support a real merge decision, not just automate a toy page.

Start it:

bun run demo:start
bun run demo:publish-qa

Then run the core live sequence:

bun src/cli.ts browse run-flow http://127.0.0.1:4173/login release-full-demo --session friend-demo
bun src/cli.ts browse snapshot http://127.0.0.1:4173/dashboard release-dashboard --session friend-demo
bun src/cli.ts qa http://127.0.0.1:4173/dashboard --flow release-dashboard --snapshot release-dashboard --session friend-demo
bun src/cli.ts deploy --url http://127.0.0.1:4173 --path /dashboard --path /changes --device desktop --device mobile --flow release-dashboard --flow release-changes --snapshot release-dashboard --publish-dir docs/qa/demo/deploy
bun src/cli.ts ship --dry-run --pr --verify-url http://127.0.0.1:4173 --verify-path /dashboard --verify-path /changes --verify-device desktop --verify-device mobile --verify-flow release-dashboard --verify-flow release-changes --verify-snapshot release-dashboard

What that sequence demonstrates:

  • authenticated browser automation
  • QA evidence tied to release confidence
  • page and device verification against a believable change-impact surface
  • shipping decisions backed by the same evidence

The checked-in release-login flow clears the demo app's stored login state before navigation so you can re-run it safely on the same named browser session. bun run demo:publish-qa refreshes the tracked sample report at docs/qa/release-readiness-demo/, which is what keeps the public QA Pages landing view useful before you ship a real branch report.

Issue to merge flow

Create the work item and branch:

bun src/cli.ts issue start --title "Add issue-first PR workflow" --label automation --prefix feat

This creates a GitHub issue and a local branch like feat/123-add-issue-first-pr-workflow.
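
To make the naming concrete, here is a small sketch of how such a branch name could be derived from the prefix, issue number, and title. The slugify rules are an assumption chosen to reproduce the example above; the CLI's actual rules may differ.

```typescript
// Assumed slug rules: lowercase, collapse non-alphanumerics to "-", trim hyphens.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build "<prefix>/<issue-number>-<slug>" as described in this README.
function branchName(prefix: string, issueNumber: number, title: string): string {
  return `${prefix}/${issueNumber}-${slugify(title)}`;
}

console.log(branchName("feat", 123, "Add issue-first PR workflow"));
// prints "feat/123-add-issue-first-pr-workflow"
```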

Ship the branch as a PR:

bun src/cli.ts ship --message "feat: add issue-first workflow" --push --pr

What happens next:

  • pr-review.yml runs codex-stack review on the PR diff
  • for same-repo PRs, the review workflow publishes a GitHub Pages preview at https://<owner>.github.io/<repo>/pr-preview/pr-<number>/ and verifies that live preview before merging
  • the same workflow republishes review evidence under https://<owner>.github.io/<repo>/pr-preview/pr-<number>/__codex/
  • the workflow posts or updates a PR comment with structural findings plus hosted preview visual evidence
  • the job fails if critical findings are detected in either structural review or preview verification
  • if the PR has the automerge label, pr-automerge.yml enables GitHub auto-merge

Branch naming matters: when the branch follows <prefix>/<issue-number>-slug, ship includes Closes #<issue-number> in the generated PR body so the issue closes on merge.
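
The branch convention above can be sketched as a single pattern match. This is an illustration of the rule as stated, not the ship implementation; the function name is hypothetical.

```typescript
// Return the closing line for a branch named "<prefix>/<issue-number>-slug",
// or null when the branch does not follow that convention.
function closingLine(branch: string): string | null {
  const match = /^[^/]+\/(\d+)-.+$/.exec(branch);
  return match ? `Closes #${match[1]}` : null;
}

console.log(closingLine("feat/123-add-issue-first-pr-workflow")); // prints "Closes #123"
console.log(closingLine("main")); // prints null
```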

Root CLI

bun src/cli.ts list
bun src/cli.ts show qa
bun src/cli.ts review --json --base origin/main
bun src/cli.ts qa http://127.0.0.1:4173/dashboard --flow release-dashboard --snapshot release-dashboard --session demo --json
bun src/cli.ts qa http://127.0.0.1:4173/dashboard --flow release-dashboard --snapshot release-dashboard --a11y --a11y-scope main --perf --perf-budget lcp=2s --perf-budget cls=0.1 --session demo --json
bun src/cli.ts qa https://preview.example.com --mode diff-aware --base-ref origin/main --session preview --json
bun src/cli.ts qa https://preview.example.com/dashboard --flow release-dashboard --session preview-auth --session-bundle .codex-stack/private/preview-auth.json --json
bun src/cli.ts qa-decide approve --snapshot release-dashboard --route /dashboard --device desktop --kind snapshot-drift --reason "Intentional redesign approved in PR #123"
bun src/cli.ts qa-decide suppress --category accessibility --kind accessibility-rule --route /checkout --device desktop --rule color-contrast --reason "Vendor widget pending upstream fix" --expires-at 2026-03-29T00:00:00Z
bun src/cli.ts preview --url "https://anup4khandelwal.github.io/codex-stack/pr-preview/pr-42/" --pr 42 --branch feat/42-preview --sha abcdef123 --path /login --path /dashboard --device desktop --device mobile --flow release-full-demo
bun src/cli.ts preview --url-template "https://preview-{pr}.example.com" --pr 42 --branch feat/42-preview --sha abcdef123 --path / --path /dashboard --device desktop --device mobile --flow landing-smoke --snapshot landing-home
bun src/cli.ts preview --url "https://anup4khandelwal.github.io/codex-stack/pr-preview/pr-42/" --pr 42 --branch feat/42-preview --sha abcdef123 --path /dashboard --device desktop --flow release-dashboard --session preview-auth --session-bundle .codex-stack/private/preview-auth.json
bun src/cli.ts deploy --url https://staging.example.com --path / --path /dashboard --path /changes --device desktop --device mobile --flow release-dashboard --flow release-changes --snapshot release-dashboard
bun src/cli.ts deploy --url https://staging.example.com --path /dashboard --device desktop --flow release-dashboard --snapshot release-dashboard --a11y --a11y-scope main --perf --perf-budget lcp=2s --perf-budget cls=0.1
bun src/cli.ts deploy --url https://staging.example.com --path /dashboard --device desktop --flow release-dashboard --session staging-auth --session-bundle .codex-stack/private/staging-auth.json
bun src/cli.ts ship --message "feat: ready for review" --push --pr --reviewer octocat --assignee @me --project "Engineering Roadmap"
bun src/cli.ts ship --dry-run --pr --verify-url http://127.0.0.1:4173 --verify-path /dashboard --verify-path /changes --verify-device mobile --verify-console-errors --verify-flow release-dashboard --verify-flow release-changes --verify-snapshot release-dashboard
bun src/cli.ts ship --dry-run --pr --verify-url http://127.0.0.1:4173 --verify-path /dashboard --verify-path /changes --verify-device mobile --verify-flow release-dashboard --verify-flow release-changes --verify-snapshot release-dashboard --verify-a11y --verify-a11y-scope main --verify-perf --verify-perf-budget lcp=2s
bun src/cli.ts fleet validate --manifest .codex-stack/fleet.example.json
bun src/cli.ts fleet sync --manifest .codex-stack/fleet.example.json --dry-run --json
bun src/cli.ts fleet collect --manifest .codex-stack/fleet.example.json --json
bun src/cli.ts fleet dashboard --manifest .codex-stack/fleet.example.json --out .fleet-site
bun src/cli.ts fleet remediate --manifest .codex-stack/fleet.example.json --dry-run --json
bun src/cli.ts agents add --name lead-1 --runtime codex --role manager --team platform --status working
bun src/cli.ts agents dashboard --out .codex-stack/control-plane/dashboard
bun src/cli.ts goals add --id release-q2 --title "Release Q2 hardening" --type initiative --owner lead-1 --status active
bun src/cli.ts goals task add --id review-contracts --goal release-q2 --title "Review agent contracts" --assignee reviewer-1 --action-kind review --action-arg --base --action-arg origin/main --expected-minutes 10
bun src/cli.ts goals queue --json
bun src/cli.ts agents budget set --agent reviewer-1 --window daily --max-runs 8 --max-minutes 120 --max-cost-units 20
bun src/cli.ts heartbeat schedule add --agent reviewer-1 --task review-contracts --trigger cron --expression "*/30 * * * *" --summary "Review new queue items"
bun src/cli.ts heartbeat run-due --agent reviewer-1 --max-agents 1 --max-tasks 1 --json
bun src/cli.ts approvals gate --agent reviewer-1 --kind ship-pr --target review-contracts --json
bun src/cli.ts mcp inspect --json
bun src/cli.ts mcp serve
bun src/cli.ts retro --since "7 days ago" --repo anup4khandelwal/codex-stack
bun src/cli.ts upgrade --offline --json
bun src/cli.ts upgrade --offline --apply
bun src/cli.ts browse doctor
bun src/cli.ts browse flows
bun src/cli.ts browse export-session ./tmp/staging-session.json --session staging
bun src/cli.ts browse import-session ./tmp/staging-session.json --session staging-copy
bun src/cli.ts browse import-browser-cookies chrome --session staging --profile Default
bun src/cli.ts browse probe https://example.com/settings --session staging
bun src/cli.ts browse upload https://example.com/profile "input[type=file]" ./fixtures/avatar.png --session staging
bun src/cli.ts browse dialog https://example.com/settings accept "#delete-confirm" --session staging
bun src/cli.ts browse click https://example.com/login "role:button:Continue" --session staging --device mobile
bun src/cli.ts browse fill https://example.com/login "label:Email" demo@example.com --session staging
bun src/cli.ts browse assert-visible https://example.com/home "testid:hero" --session staging
bun src/cli.ts browse click https://example.com/checkout "role:button:Pay now" --session staging --frame "name:payment"
bun src/cli.ts browse mock https://example.com/app "**/api/profile" '{"status":503,"json":{"error":"offline"}}' --session staging
bun src/cli.ts browse download https://example.com/reports "role:button:Export CSV" ./artifacts/report.csv --session staging
bun src/cli.ts browse assert-focused https://example.com/login "input[name=email]" --session staging
bun src/cli.ts browse snapshot https://example.com marketing-home --session staging
bun src/cli.ts browse compare-snapshot https://example.com marketing-home --session staging

Useful Bun scripts:

bun run doctor
bun run typecheck
bun run smoke
bun run demo:start
bun run demo:smoke
bun run review
bun run qa -- http://127.0.0.1:4173/dashboard --flow release-dashboard --snapshot release-dashboard --session demo
bun src/cli.ts qa-decide list --active-only
bun run preview -- --url-template "https://preview-{pr}.example.com" --pr 42 --branch feat/42-preview --sha abcdef123 --path / --device desktop --flow landing-smoke --snapshot landing-home
bun run deploy -- --url https://staging.example.com --path /dashboard --device desktop --flow release-dashboard --snapshot release-dashboard
bun run ship:dry
bun run agents -- list --json
bun run goals -- queue --json
bun run heartbeat -- list --json
bun run approvals -- list --json
bun run mcp -- inspect --json
bun run retro
bun run upgrade
bun run upgrade:apply
bun run weekly

Control plane

codex-stack now has a local control-plane layer for multi-agent engineering work before you need a full remote scheduler.

Core ideas:

  • register named agents with roles, runtimes, teams, and managers
  • track initiative and repo goals explicitly
  • assign persistent queued work and delegate parent tasks into child tasks instead of only running one-shot commands
  • schedule heartbeats with continuity state, retry limits, and cooloff windows per agent
  • execute one bounded workflow step per agent heartbeat, capture execution history, and resume from continuity state on the next tick
  • enforce budget limits and explicit approvals for high-risk actions, including ship and rollout remediation preflight
  • render a static dashboard under .codex-stack/control-plane/dashboard/
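
The budget, retry, and cooloff ideas above combine into a simple gate before each heartbeat. The sketch below shows one way that gate could work. All type names, field names, and the exact retry semantics are assumptions for illustration; the state format under .codex-stack/control-plane/ is not documented here.

```typescript
// Hypothetical shapes mirroring the CLI flags (--max-runs, --max-minutes, --retry-limit).
interface Budget { maxRuns: number; maxMinutes: number }
interface Usage { runs: number; minutes: number }
interface RetryPolicy { retryLimit: number; cooloffMinutes: number }
interface Continuity { consecutiveFailures: number; lastFailureAt?: Date }

type Decision = { run: true } | { run: false; reason: string };

function shouldRunHeartbeat(
  budget: Budget,
  usage: Usage,
  policy: RetryPolicy,
  state: Continuity,
  expectedMinutes: number,
  now: Date,
): Decision {
  // Budget checks: the next run must fit inside the current window.
  if (usage.runs + 1 > budget.maxRuns) return { run: false, reason: "run budget exhausted" };
  if (usage.minutes + expectedMinutes > budget.maxMinutes) {
    return { run: false, reason: "minute budget exhausted" };
  }
  // Assumed semantics: retryLimit counts retries allowed after the first failure.
  if (state.consecutiveFailures > policy.retryLimit) {
    return { run: false, reason: "retry limit reached" };
  }
  // After a failure, wait out the cooloff window before trying again.
  if (state.consecutiveFailures > 0 && state.lastFailureAt !== undefined) {
    const elapsedMinutes = (now.getTime() - state.lastFailureAt.getTime()) / 60_000;
    if (elapsedMinutes < policy.cooloffMinutes) {
      return { run: false, reason: "cooling off after failure" };
    }
  }
  return { run: true };
}
```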

Useful commands:

bun src/cli.ts agents add --name lead-1 --runtime codex --role manager --team platform --status working
bun src/cli.ts agents add --name reviewer-1 --runtime claude-code --role reviewer --team platform --manager lead-1
bun src/cli.ts goals add --id release-q2 --title "Release Q2 hardening" --type initiative --owner lead-1 --status active
bun src/cli.ts goals task add --id review-contracts --goal release-q2 --title "Review agent contracts" --assignee reviewer-1 --action-kind review --action-arg --base --action-arg origin/main --expected-minutes 10
bun src/cli.ts goals task add --id release-train --goal release-q2 --title "Coordinate release" --assignee lead-1 --action-kind custom-command --action-arg node --action-arg -e --action-arg "console.log(JSON.stringify({summary:'lead ok',nextAction:'complete'}))" --delegate-id qa-contracts --delegate-title "Run delegated QA" --delegate-assignee qa-1 --delegate-action-kind qa --delegate-action-arg http://127.0.0.1:4173/dashboard --delegate-action-arg --flow --delegate-action-arg release-dashboard --delegate-expected-minutes 15
bun src/cli.ts agents budget set --agent reviewer-1 --window daily --max-runs 8 --max-minutes 120 --max-cost-units 20
bun src/cli.ts heartbeat schedule add --agent reviewer-1 --task review-contracts --trigger cron --expression "*/30 * * * *" --summary "Review queue" --retry-limit 2 --cooldown-minutes 30
bun src/cli.ts heartbeat inspect --agent reviewer-1 --json
bun src/cli.ts heartbeat run-due --agent reviewer-1 --max-agents 1 --max-tasks 1 --json
bun src/cli.ts heartbeat run-agent --agent lead-1 --json
bun src/cli.ts approvals approve <id> --by lead-1 --note "Approved release work"
bun src/cli.ts ship --dry-run --pr --control-agent reviewer-1 --control-state .codex-stack/control-plane/state.json
bun src/cli.ts fleet remediate --manifest .codex-stack/fleet.example.json --dry-run --open-prs --control-agent lead-1 --control-state .codex-stack/control-plane/state.json --json
bun src/cli.ts goals queue --assignee reviewer-1
bun src/cli.ts agents dashboard --out .codex-stack/control-plane/dashboard

MCP interop

codex-stack can expose its workflow layer to MCP-capable clients through a local stdio server.

v1 policy:

  • stdio transport only
  • read-only plus dry-run tools only
  • no repo mutation, GitHub mutation, issue creation, PR creation, or baseline updates through MCP

Useful commands:

bun src/cli.ts mcp inspect --json
bun src/cli.ts mcp serve

Example client wiring:

{
  "mcpServers": {
    "codex-stack": {
      "command": "bun",
      "args": [
        "/absolute/path/to/codex-stack/src/cli.ts",
        "mcp",
        "serve"
      ]
    }
  }
}

Fleet rollout

Use fleet when one repo needs to manage codex-stack policy across many repos.

Example:

bun src/cli.ts fleet validate --manifest .codex-stack/fleet.example.json
bun src/cli.ts fleet sync --manifest .codex-stack/fleet.example.json --dry-run --json
bun src/cli.ts fleet collect --manifest .codex-stack/fleet.example.json --json
bun src/cli.ts fleet dashboard --manifest .codex-stack/fleet.example.json --out .fleet-site

Checked-in real control plane for this workspace:

bun src/cli.ts fleet validate --manifest .codex-stack/fleet.anup4khandelwal.json
bun src/cli.ts fleet sync --manifest .codex-stack/fleet.anup4khandelwal.json --dry-run --json
bun src/cli.ts fleet sync --manifest .codex-stack/fleet.anup4khandelwal.json --open-prs
bun src/cli.ts fleet remediate --manifest .codex-stack/fleet.anup4khandelwal.json --dry-run --json
bun src/cli.ts fleet remediate --manifest .codex-stack/fleet.anup4khandelwal.json --open-prs --issue-repo anup4khandelwal/codex-stack
bun src/cli.ts fleet remediate --manifest .codex-stack/fleet.anup4khandelwal.json --dry-run --open-prs --control-agent fleet-1 --control-state .codex-stack/control-plane/state.json --json

The control repo owns:

  • a fleet manifest listing managed repos
  • shared policy packs under .codex-stack/policies/
  • generated member config and fleet-status workflow for each target repo
  • an org-level dashboard that ranks rollout drift and unresolved risk across repos

fleet sync generates three files in each managed repo:

  • .codex-stack/fleet-member.json
  • .github/codex-stack/fleet-status.js
  • .github/workflows/codex-stack-fleet-status.yml

That workflow emits a normalized codex-stack-fleet-status artifact so fleet collect can aggregate repo health without inventing a new backend. fleet remediate builds on the same status feed to open rollout PRs for drifted repos, maintain one stable remediation issue per unhealthy repo in the control repo, and optionally require an approval gate before rollout PRs open.

Policy packs also define whether a repo is expected to publish a codex-stack QA/deploy report. review-only repos are considered healthy when they are installed and in sync even if they do not publish docs/qa/ artifacts. Full rollout packs can still require a latest report before fleet health turns green.
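
The health rule in the paragraph above can be written as a short predicate. The field names (requiresReport, inSync, hasLatestReport) are hypothetical; the real policy pack and status schemas are not shown in this README.

```typescript
// Hypothetical shapes for illustration only.
interface PolicyPack { name: string; requiresReport: boolean }
interface RepoStatus { installed: boolean; inSync: boolean; hasLatestReport: boolean }

// A repo must be installed and in sync. A published report is only
// required when the policy pack asks for one.
function isHealthy(status: RepoStatus, pack: PolicyPack): boolean {
  if (!status.installed || !status.inSync) return false;
  return pack.requiresReport ? status.hasLatestReport : true;
}

const reviewOnly: PolicyPack = { name: "review-only", requiresReport: false };
console.log(isHealthy({ installed: true, inSync: true, hasLatestReport: false }, reviewOnly));
// prints true
```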

Current checked-in targets:

  • anup4khandelwal/autopilot-multi-agent-loop via review-only
  • anup4khandelwal/awesome-codex-skills via review-only
  • anup4khandelwal/anup4khandelwal via review-only

Current checked-in fleet state:

  • 3/3 repos installed
  • 3/3 repos healthy
  • 0 repos drifted
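
A summary like the one above is a plain count over per-repo status. The sketch below assumes a hypothetical FleetRepoHealth record; the normalized artifact that fleet collect reads may carry different fields.

```typescript
// Hypothetical per-repo record for illustration.
interface FleetRepoHealth { repo: string; installed: boolean; healthy: boolean; drifted: boolean }

// Produce the three summary lines shown in this README.
function summarizeFleet(repos: FleetRepoHealth[]): string[] {
  const total = repos.length;
  const count = (pred: (r: FleetRepoHealth) => boolean): number => repos.filter(pred).length;
  return [
    `${count((r) => r.installed)}/${total} repos installed`,
    `${count((r) => r.healthy)}/${total} repos healthy`,
    `${count((r) => r.drifted)} repos drifted`,
  ];
}
```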

Browser QA model

browse is the runtime. qa is the report layer.

Use browse when you want raw control:

  • sessions
  • portable session bundles
  • named flows
  • snapshots
  • route probes
  • ad hoc assertions
  • semantic selectors and responsive viewport presets
  • iframe targeting by frame name, URL fragment, or iframe selector
  • request blocking and mocked responses for repeatable QA and failure-path testing
  • download capture and filename assertions for export flows
  • local browser-profile import for Chrome, Arc, Brave, and Edge on macOS
  • screenshots and artifacts
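
The semantic selectors used in the Root CLI examples (role:button:Continue, label:Email, testid:hero) sit alongside plain CSS selectors such as input[type=file]. A minimal parsing sketch follows. It assumes only the three prefixes seen in those examples and treats everything else as CSS; the browse runtime may support more forms.

```typescript
type ParsedSelector =
  | { kind: "role"; role: string; name?: string }
  | { kind: "label"; text: string }
  | { kind: "testid"; id: string }
  | { kind: "css"; selector: string };

function parseSelector(input: string): ParsedSelector {
  const idx = input.indexOf(":");
  const prefix = idx === -1 ? "" : input.slice(0, idx);
  const rest = idx === -1 ? "" : input.slice(idx + 1);
  switch (prefix) {
    case "role": {
      // "role:button:Pay now" -> role "button", accessible name "Pay now"
      const sep = rest.indexOf(":");
      return sep === -1
        ? { kind: "role", role: rest }
        : { kind: "role", role: rest.slice(0, sep), name: rest.slice(sep + 1) };
    }
    case "label":
      return { kind: "label", text: rest };
    case "testid":
      return { kind: "testid", id: rest };
    default:
      // Unknown prefixes (including CSS pseudo-classes like "a:hover") stay CSS.
      return { kind: "css", selector: input };
  }
}
```

Falling back to CSS for unknown prefixes keeps selectors with colons, such as pseudo-classes, working unchanged.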

Use qa when you want a decision-ready report:

  • pass / warning / critical status
  • health score
  • findings with category + evidence
  • diff-aware route inference from changed files
  • annotated SVG evidence for snapshot-based failures
  • optional accessibility and performance findings with dedicated artifacts and markdown summaries
  • saved markdown/json report under .codex-stack/qa/
  • automatic trend summaries across saved QA runs
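
The optional performance findings are driven by budgets such as lcp=2s and cls=0.1, as passed with --perf-budget in the Root CLI examples. The sketch below shows one way such a spec could be parsed and checked. The accepted grammar, the unit handling, and the function names are assumptions, not the documented format.

```typescript
// Assumed normalized form: durations in milliseconds, unitless values as scores.
interface PerfBudget { metric: string; limit: number; unit: "ms" | "score" }

function parsePerfBudget(spec: string): PerfBudget {
  const match = /^([a-z]+)=(\d+(?:\.\d+)?)(ms|s)?$/.exec(spec);
  if (!match) throw new Error(`Invalid perf budget: ${spec}`);
  const [, metric = "", value = "", suffix] = match;
  const amount = Number(value);
  if (suffix === "s") return { metric, limit: amount * 1000, unit: "ms" };
  if (suffix === "ms") return { metric, limit: amount, unit: "ms" };
  return { metric, limit: amount, unit: "score" };
}

// A measurement fails the budget only when it is strictly above the limit.
function exceedsBudget(budget: PerfBudget, measured: number): boolean {
  return measured > budget.limit;
}

console.log(parsePerfBudget("lcp=2s")); // { metric: "lcp", limit: 2000, unit: "ms" }
```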

Preview verification

Use preview when the branch already has a preview deployment and you want merge readiness against the live preview, not only against the code diff.

Example:

bun src/cli.ts preview \
  --url "https://anup4khandelwal.github.io/codex-stack/pr-preview/pr-42/" \
  --pr 42 \
  --branch feat/42-preview \
  --sha abcdef1234567890 \
  --path /login \
  --path /dashboard \
  --device desktop \
  --device mobile \
  --flow release-full-demo \
  --a11y \
  --a11y-scope main \
  --perf \
  --perf-budget lcp=2s

For same-repo PRs, pr-review.yml publishes this preview site automatically to GitHub Pages before it verifies the deployment. preview-verify.yml remains available as a manual rerun or for external preview URLs.
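
For external previews, the Root CLI examples pass --url-template "https://preview-{pr}.example.com" instead of a fixed URL. The sketch below illustrates template resolution and readiness polling in general terms. Only the {pr} placeholder appears in this README; the function names and the injected probe are assumptions.

```typescript
// Replace each {name} placeholder with the matching value, failing loudly on gaps.
function resolvePreviewUrl(template: string, values: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (_match, key: string) => {
    const value = values[key];
    if (value === undefined) throw new Error(`No value for placeholder {${key}}`);
    return String(value);
  });
}

// Poll an injected readiness probe until it succeeds or the timeout elapses.
// The probe is a parameter so callers decide what "ready" means (for example, an HTTP 200).
async function waitUntilReady(
  probe: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await probe()) return true;
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

console.log(resolvePreviewUrl("https://preview-{pr}.example.com", { pr: 42 }));
// prints "https://preview-42.example.com"
```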

Authenticated previews use the same session bundle format as browse export-session:

bun src/cli.ts browse import-browser-cookies chrome --session preview-auth --profile Default
bun src/cli.ts browse export-session .codex-stack/private/preview-auth.json --session preview-auth
bun src/cli.ts preview \
  --url "https://anup4khandelwal.github.io/codex-stack/pr-preview/pr-42/" \
  --pr 42 \
  --branch feat/42-preview \
  --sha abcdef1234567890 \
  --path /dashboard \
  --device desktop \
  --flow release-dashboard \
  --session preview-auth \
  --session-bundle .codex-stack/private/preview-auth.json

For CI, base64-encode that bundle and save it as the repo secret CODEX_STACK_PREVIEW_SESSION_BUNDLE_B64. pr-review.yml and preview-verify.yml decode it into a temp file, pass --session-bundle into preview verification, and delete the temp file before the job exits.
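
The decode, use, and delete sequence described above can be sketched with Node's standard library. This illustrates the pattern only and is not the code the workflows run; the helper name and temp-file naming are hypothetical.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Decode a base64 bundle into a private temp file, hand its path to `run`,
// and remove the file afterwards even if `run` throws.
// `run` must be synchronous here; an async callback would outlive the cleanup.
function withDecodedBundle<T>(bundleB64: string, run: (bundlePath: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "codex-stack-bundle-"));
  const bundlePath = join(dir, "session-bundle.json");
  try {
    writeFileSync(bundlePath, Buffer.from(bundleB64, "base64"), { mode: 0o600 });
    return run(bundlePath);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// Usage with a stand-in bundle; in CI the input would come from the repo secret.
const encoded = Buffer.from('{"cookies":[]}').toString("base64");
const seen = withDecodedBundle(encoded, (path) => ({ path, body: readFileSync(path, "utf8") }));
console.log(seen.body, existsSync(seen.path)); // the temp file is gone once the callback returns
```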

Ship verification

ship can call qa before push/PR creation.

Example:

bun src/cli.ts ship \
  --message "feat: ready for review" \
  --push \
  --pr \
  --verify-url https://staging.example.com/dashboard \
  --verify-flow landing-smoke \
  --verify-snapshot landing-home \
  --verify-a11y \
  --verify-a11y-scope main \
  --verify-perf \
  --verify-perf-budget lcp=2s

This keeps QA in the shipping path instead of as a manual follow-up. When verification runs during ship, the QA report and evidence are published into docs/qa/<branch>/ before the branch is pushed, so the PR comment can point at tracked files on GitHub. After merge, the qa-pages workflow renders those tracked artifacts into a GitHub Pages site so the report, annotation, and screenshot links remain stable even after the feature branch is deleted.

Upgrade checks

Use upgrade when you want to audit the repo itself instead of a feature branch.

Examples:

bun src/cli.ts upgrade --offline
bun src/cli.ts upgrade --json
bun src/cli.ts upgrade --offline --apply
bun src/cli.ts upgrade --markdown-out docs/daily-update-check.md --json-out docs/daily-update-check.json

The upgrade report covers:

  • Bun runtime alignment against packageManager and engines.bun
  • dependency drift from npm when network access is available
  • GitHub Actions uses: ref drift
  • local wrapper and installed Codex skill link health
  • optional safe local refresh results when --apply is used

--apply is intentionally narrow. It regenerates .codex-stack/bin wrappers with dependency install skipped and refreshes project skill links under .codex/skills. It does not mutate dependency versions or workflow refs.

.github/workflows/daily-update-check.yml runs that same report on a daily schedule, uploads the markdown/json artifacts, writes the markdown into the workflow summary, and syncs a stable GitHub issue titled Daily codex-stack update check.

QA Pages

Build the static QA site locally:

bun run demo:publish-qa
bun run qa:site
open .site/index.html

On GitHub, .github/workflows/qa-pages.yml deploys the merged docs/qa/ reports to Pages. ship --pr now emits two classes of QA links:

  • branch artifact links that work immediately on the PR branch
  • stable Pages links that activate after the branch is merged to main

When a published QA report includes snapshot evidence, the Pages site surfaces visual/index.html as the primary review-evidence entrypoint alongside the raw markdown, JSON, annotation, and screenshot files. Those visual packs include a diff heatmap and an image-diff score so regressions can be ranked instead of treated as binary drift.

The merged QA Pages site also renders visual history charts for risk score, image-diff score, and baseline age, so drift over time is visible without opening each report. When a report includes accessibility or performance data, the site also exposes the latest violation counts, perf-budget failures, LCP/CLS summaries, and matching history charts.

Snapshot baselines carry route and device metadata, and QA, deploy, and preview runs automatically flag a baseline as stale when the saved reference is too old. The repo also keeps a tracked sample report at docs/qa/release-readiness-demo/ so the Pages root is never empty for evaluators landing on the project cold.

The same gh-pages branch also hosts PR previews under pr-preview/pr-<number>/. Configure these repo variables if you want richer automatic preview coverage in pr-review.yml:

  • CODEX_STACK_PREVIEW_PATHS=/login,/dashboard
  • CODEX_STACK_PREVIEW_DEVICES=desktop,mobile
  • CODEX_STACK_PREVIEW_FLOW=release-full-demo
  • CODEX_STACK_PREVIEW_SNAPSHOT=<optional snapshot name>
  • CODEX_STACK_PREVIEW_WAIT_TIMEOUT=300

Optional authenticated preview secret:

  • CODEX_STACK_PREVIEW_SESSION_BUNDLE_B64=<base64 of browse export-session output>
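
The comma-separated variables above correspond to the repeated --path and --device flags that preview accepts. The sketch below shows how such variables could be expanded into CLI arguments. The mapping is an assumption about what pr-review.yml does, and CODEX_STACK_PREVIEW_WAIT_TIMEOUT is left out because this README does not show its flag.

```typescript
// Expand the preview repo variables into `preview` CLI arguments.
function previewArgsFromEnv(env: Record<string, string | undefined>): string[] {
  // Split a comma-separated value, dropping blanks and surrounding whitespace.
  const list = (value: string | undefined): string[] =>
    (value ?? "")
      .split(",")
      .map((item) => item.trim())
      .filter((item) => item.length > 0);

  const args: string[] = [];
  for (const path of list(env["CODEX_STACK_PREVIEW_PATHS"])) args.push("--path", path);
  for (const device of list(env["CODEX_STACK_PREVIEW_DEVICES"])) args.push("--device", device);
  const flow = env["CODEX_STACK_PREVIEW_FLOW"];
  if (flow) args.push("--flow", flow);
  const snapshot = env["CODEX_STACK_PREVIEW_SNAPSHOT"];
  if (snapshot) args.push("--snapshot", snapshot);
  return args;
}

console.log(previewArgsFromEnv({
  CODEX_STACK_PREVIEW_PATHS: "/login,/dashboard",
  CODEX_STACK_PREVIEW_DEVICES: "desktop,mobile",
  CODEX_STACK_PREVIEW_FLOW: "release-full-demo",
}).join(" "));
// prints "--path /login --path /dashboard --device desktop --device mobile --flow release-full-demo"
```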

For same-repo PRs, pr-review.yml republishes the preview subtree with hosted review evidence under pr-preview/pr-<number>/__codex/, including the deploy visual pack at __codex/visual/index.html. The PR review comment uses that hosted pack to show direct visual-summary links and inline screenshots for the highest-signal regressions.

When a PR closes, .github/workflows/preview-cleanup.yml removes only gh-pages/pr-preview/pr-<number>/ and keeps the root QA site plus other active PR previews intact.
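
The preview layout described in this README (preview root, __codex/ evidence, and the per-PR cleanup directory) follows one naming scheme, sketched below. The helper names are hypothetical; the URL shapes are taken from the text above.

```typescript
interface PrPreviewLocations {
  previewUrl: string; // hosted preview root
  evidenceUrl: string; // review evidence under __codex/
  visualPackUrl: string; // deploy visual pack entrypoint
  cleanupDir: string; // subtree of gh-pages removed when the PR closes
}

function prPreviewLocations(owner: string, repo: string, prNumber: number): PrPreviewLocations {
  const subtree = `pr-preview/pr-${prNumber}/`;
  const previewUrl = `https://${owner}.github.io/${repo}/${subtree}`;
  return {
    previewUrl,
    evidenceUrl: `${previewUrl}__codex/`,
    visualPackUrl: `${previewUrl}__codex/visual/index.html`,
    cleanupDir: subtree,
  };
}

// Guard for cleanup: the trailing slash keeps pr-4 from matching pr-42.
function isInsideCleanupDir(path: string, prNumber: number): boolean {
  return path.startsWith(`pr-preview/pr-${prNumber}/`);
}

console.log(prPreviewLocations("anup4khandelwal", "codex-stack", 42).previewUrl);
// prints "https://anup4khandelwal.github.io/codex-stack/pr-preview/pr-42/"
```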

Install skills for Codex

User-level install:

bash scripts/install-skills.sh user

Project-level install:

bash scripts/install-skills.sh project /path/to/repo

This creates links such as:

  • ~/.codex/skills/codex-stack-product
  • ~/.codex/skills/codex-stack-qa
  • ~/.codex/skills/codex-stack-review
  • ~/.codex/skills/codex-stack-browse
  • ~/.codex/skills/codex-stack-setup-browser-cookies
  • ~/.codex/skills/codex-stack-upgrade

Example prompts after installation:

Use codex-stack-product to tighten this feature request into acceptance criteria.
Use codex-stack-review to audit the current branch against main and focus on production risk.
Use codex-stack-qa to verify the staging dashboard flow and tell me if it is safe to ship.
Use codex-stack-browse to capture a baseline snapshot for the new onboarding page.
Use codex-stack-setup-browser-cookies to import my signed-in Chrome session and prepare a preview auth bundle for CI.
Use codex-stack-upgrade to audit whether this codex-stack install needs dependency, workflow, or skill-link refreshes.

Repository layout

codex-stack/
  browse/              Browser runtime, flows, and artifacts helpers
  docs/                Install, command, and example docs
  examples/            Sample apps for demos
  scripts/             Setup, review, qa, ship, retro, and digest helpers
  skills/              Installable Codex skills
  src/                 TypeScript source for the root CLI

License

MIT
