pokayokay
AI-assisted development orchestration
pokayokay is a Claude Code and Codex plugin that orchestrates AI-assisted development sessions with configurable human oversight. It bridges the gap between hands-on control and full automation through skills, hooks, agents, and integration with ohno for task management.
- PRD to Tasks - Automatically break down requirements into epics, stories, and tasks
- Orchestrated Sessions - Work across multiple sessions without losing context
- Human Checkpoints - Choose your autonomy level: supervised, semi-auto, auto, or unattended
- Multi-Dimensional Auditing - Verify accessibility, testing, security, docs, and observability
- 24 Specialized Skills - Route work to domain-specific workflows automatically
- Evidence-Based Completion - Require fresh verification before "done", "fixed", or "passing" claims
- Root-Cause Debugging - Reproduce, diagnose, and regression-test bugs before fixes are marked complete
- Spike Protocol - Time-boxed investigations with mandatory decisions
- Claude Code v1.0.0 or later, or Codex
- Node.js v18 or later (for ohno CLI)
- Git (for version control integration)
The setup wizard is the easiest path for Claude Code from npm:
npx pokayokay
For current Codex support, run the setup wizard from the repository checkout so Codex can use that checkout as the marketplace source:
cd ~/Projects/stevestomp/pokayokay
npm --prefix cli install
node cli/bin/cli.js
The public npm package is the setup CLI and does not contain the Codex plugin payload. The local setup wizard will:
- Install or register the pokayokay plugin for Claude Code, Codex, or both
- Configure the ohno MCP server
- Initialize ohno in your project
- Wire runtime hooks where supported
- Optionally set up kaizen integration
Run npx pokayokay doctor anytime to verify your installation.
Manual installation steps:
Claude Code:
# 1. Add the marketplace (one-time setup)
claude plugin marketplace add srstomp/pokayokay
# 2. Install the plugin
claude plugin install pokayokay@pokayokay
Or from inside the Claude Code REPL:
/plugin marketplace add srstomp/pokayokay
/plugin install pokayokay@pokayokay
Codex:
# Codex installs plugins in two steps: register the marketplace, then add the
# plugin from it. Run this from the pokayokay repository checkout:
cd ~/Projects/stevestomp/pokayokay
codex plugin marketplace add .
codex plugin add pokayokay@pokayokay
# Optional: run the local setup wizard to wire ohno MCP and hooks.
npm --prefix cli install
node cli/bin/cli.js
Codex stores the marketplace entry in ~/.codex/config.toml under
[marketplaces.pokayokay] and the install record under
[plugins."pokayokay@pokayokay"]. Registering the marketplace alone does not
install the plugin; codex plugin add is also required (the command is add, not
install). The local setup wizard runs both steps and adds a pokayokay-owned
hook block to ~/.codex/config.toml. The hook block sets
codex_hooks = true, routes tool lifecycle events to hooks/actions/bridge.py,
and adds conservative PermissionRequest approval handling.
Add to your MCP configuration.
Claude Code (~/.claude/settings.json):
{
"mcpServers": {
"ohno": {
"command": "npx",
"args": ["@stevestomp/ohno-mcp"]
}
}
}
Codex (~/.codex/config.toml):
[mcp_servers.ohno]
command = "npx"
args = ["@stevestomp/ohno-mcp"]

npx @stevestomp/ohno-cli init

# 1. Run setup wizard (if not done already)
npx pokayokay
# 2. Restart your configured AI runtime to activate MCP server
# 3. Plan from a PRD or concept brief
/pokayokay:plan docs/prd.md
# 4. View kanban board
npx @stevestomp/ohno-cli serve
# 5. Start working
/pokayokay:work supervised
# 6. Audit completeness
/pokayokay:audit --full
For a quick one-off change, use /pokayokay:quick <task>. For a bug, use
/pokayokay:fix <bug> so pokayokay reproduces the issue, records root cause,
adds or identifies a regression test, and verifies the fix before completion.
| Command | Description |
|---|---|
| /pokayokay:plan [--headless] [--review] <path> | Analyze PRD and create tasks with skill routing |
| /pokayokay:revise [--direct] | Revise existing plan with impact analysis |
| /pokayokay:work [mode] [-n N] | Start/continue work session (supervised/semi-auto/auto/unattended) |
| /pokayokay:audit [feature] | Audit feature completeness across 5 dimensions |
| /pokayokay:review | Analyze session patterns and skill effectiveness |
| /pokayokay:handoff | Prepare session handoff with context preservation |
| /pokayokay:hooks | View and manage hook configuration |
| /pokayokay:worktrees | List, cleanup, switch, or remove worktrees |
| Command | Description |
|---|---|
| /pokayokay:quick <task> | Create task and work inline with TDD/verification gates |
| /pokayokay:fix [--thorough] <bug> | Root-cause bug fix with regression verification (--thorough for full pipeline) |
| /pokayokay:spike <question> | Time-boxed technical investigation |
| /pokayokay:hotfix <incident> | Production incident response |
| Command | Description |
|---|---|
| /pokayokay:api <task> | API design - REST/GraphQL patterns |
| /pokayokay:arch <area> | Architecture review and refactoring |
| /pokayokay:db <task> | Database schema and migrations |
| /pokayokay:test <task> | Testing strategy and implementation |
| /pokayokay:integrate <api> | Third-party API integration |
| /pokayokay:sdk <task> | SDK creation and extraction |
| Command | Description |
|---|---|
| /pokayokay:cicd <task> | CI/CD pipeline creation and optimization |
| /pokayokay:security <area> | Security audit and vulnerability scanning |
| /pokayokay:observe <task> | Logging, metrics, and tracing |
| Command | Description |
|---|---|
| /pokayokay:research <topic> | Extended technical research |
| /pokayokay:docs <task> | Technical documentation |
The plugin includes 24 specialized skills loaded on demand via commands or planner routing:
- work-session - Coordinator workflow, modes, agent dispatch
- planning - PRD-to-task breakdown with ohno integration
- plan-revision - Impact analysis on existing plans
- spike - Time-boxed investigation with structured output
- deep-research - Multi-day technology evaluation
- session-review - Post-session analysis and handoff prep
- feature-audit - L0-L5 completeness verification
- worktrees - Git worktree management
- browser-verification - Playwright UI verification
- systematic-debugging - Root-cause-first bug and failure diagnosis
- verification-before-completion - Fresh evidence before completion claims
- finishing-branch - Verified merge/PR/keep/discard branch finish flow

- api-design - REST/GraphQL endpoint design
- api-integration - Third-party API consumption
- database-design - Schema design, migrations, optimization
- architecture-review - Code structure, module boundaries
- ci-cd - GitHub Actions, GitLab CI, deployment strategies
- cloud-infrastructure - AWS service selection, CDK patterns
- observability - Logging, metrics, tracing, alerting
- testing-strategy - Test architecture, coverage, E2E patterns
- security-audit - OWASP Top 10, dependency scanning
- error-handling - Error hierarchies, recovery patterns
- sdk-development - TypeScript SDK extraction and publishing
- documentation - READMEs, API docs, ADRs
pokayokay deliberately adds process where coding agents tend to drift:
| Gate | What it prevents | Where it applies |
|---|---|---|
| Brainstorm gate | Vague tasks becoming wrong implementations | yokay-brainstormer, /work |
| TDD gate | Behavior changes without tests | /quick, yokay-implementer, domain skills |
| Systematic debugging | Symptom patches and guess-and-check fixes | /fix, yokay-fixer |
| Spec then quality review | Good-looking code that misses requirements | /work task completion |
| Verification before completion | Unverified "done/fixed/passing" claims | Agents, commands, handoffs |
| Finishing branch | Ambiguous cleanup or accidental discard | Worktree and branch completion |
The /pokayokay:audit command checks 5 dimensions:
| Dimension | Levels | Description |
|---|---|---|
| Accessibility | L0-L5 | Is the feature user-accessible? |
| Testing | T0-T4 | Test coverage and types |
| Documentation | D0-D4 | Code comments to user docs |
| Security | S0-S4 | Input validation to hardened |
| Observability | O0-O4 | Logging to full telemetry |
/pokayokay:audit # Quick (accessibility only)
/pokayokay:audit --dimension testing # Specific dimension
/pokayokay:audit --full # All dimensions
| Mode | Task | Story | Epic |
|---|---|---|---|
| supervised | PAUSE | PAUSE | PAUSE |
| semi-auto | log | PAUSE | PAUSE |
| auto | skip | log | PAUSE |
| unattended | skip | skip | skip |
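The mode table above can be read as a lookup from (mode, boundary) to an action. The sketch below is only an illustration of that reading, not pokayokay's implementation; the names `Action`, `CHECKPOINTS`, and `checkpoint_action` are hypothetical, and the comments on what each action means are an interpretation of the table.

```python
from enum import Enum


class Action(Enum):
    PAUSE = "PAUSE"  # assumed meaning: stop and wait for a human
    LOG = "log"      # assumed meaning: record the boundary and continue
    SKIP = "skip"    # assumed meaning: continue without a checkpoint


# Rows mirror the mode table; columns are the completion boundaries.
CHECKPOINTS = {
    "supervised": {"task": Action.PAUSE, "story": Action.PAUSE, "epic": Action.PAUSE},
    "semi-auto": {"task": Action.LOG, "story": Action.PAUSE, "epic": Action.PAUSE},
    "auto": {"task": Action.SKIP, "story": Action.LOG, "epic": Action.PAUSE},
    "unattended": {"task": Action.SKIP, "story": Action.SKIP, "epic": Action.SKIP},
}


def checkpoint_action(mode: str, boundary: str) -> Action:
    """Look up what happens when `boundary` completes under `mode`."""
    try:
        return CHECKPOINTS[mode][boundary]
    except KeyError:
        raise ValueError(f"unknown mode/boundary: {mode!r}/{boundary!r}") from None
```

For example, `checkpoint_action("auto", "story")` returns `Action.LOG`, while every boundary in `supervised` returns `Action.PAUSE`.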
Run multiple tasks simultaneously for faster throughput:
# Run up to 3 tasks in parallel
/pokayokay:work semi-auto -n 3
# Adaptive sizing (starts at 2, adjusts based on outcomes)
/pokayokay:work semi-auto -n auto
Note: -p is reserved for the Claude CLI --prompt flag. Use -n for parallel count.
How it works:
- Coordinator dispatches N implementer agents in a single message
- Each agent works independently with fresh context
- Results processed as they complete
- Dependency graph prevents unsafe parallelization
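One way to picture the dependency check is a function that only offers tasks whose dependencies are already complete. This is a minimal sketch under assumptions: the `tasks` mapping (task id to the ids it depends on) and the `next_batch` helper are hypothetical and do not reflect how ohno stores tasks.

```python
def next_batch(tasks: dict[str, set[str]], done: set[str], n: int) -> list[str]:
    """Pick up to n unfinished tasks whose dependencies are all complete.

    Tasks chosen this way cannot depend on each other, because every
    dependency of a chosen task is already in `done`.
    """
    ready = [
        task_id
        for task_id, deps in tasks.items()
        if task_id not in done and deps <= done
    ]
    return sorted(ready)[:n]


# Hypothetical backlog: c waits on a, d waits on c.
tasks = {"a": set(), "b": set(), "c": {"a"}, "d": {"c"}}
print(next_batch(tasks, done=set(), n=3))       # ['a', 'b']
print(next_batch(tasks, done={"a", "b"}, n=3))  # ['c']
```

Even with `-n 3`, the first batch holds only two tasks because `c` and `d` are not yet safe to start.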
Adaptive mode (-n auto):
- Starts at 2 parallel tasks
- Scales up (max 4) when tasks succeed consecutively
- Scales down (min 2) when failures occur
- Displays batch size changes during session
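The adaptive rules above can be sketched as a small policy object. The bounds (start 2, minimum 2, maximum 4) come from the list; the number of consecutive successes needed to grow is not stated, so `grow_after=2` is an assumption, as are the class and method names.

```python
class AdaptiveBatchSize:
    """Hypothetical sizing policy: start at 2, grow to at most 4, never below 2."""

    def __init__(self, start: int = 2, minimum: int = 2, maximum: int = 4,
                 grow_after: int = 2) -> None:
        self.size = start
        self.minimum = minimum
        self.maximum = maximum
        self.grow_after = grow_after  # assumed: successes in a row before growing
        self._streak = 0

    def record(self, succeeded: bool) -> int:
        """Record one task outcome and return the batch size to use next."""
        if succeeded:
            self._streak += 1
            if self._streak >= self.grow_after and self.size < self.maximum:
                self.size += 1
                self._streak = 0
        else:
            # Any failure resets the streak and shrinks the batch, down to the minimum.
            self._streak = 0
            self.size = max(self.minimum, self.size - 1)
        return self.size
```

With these assumed defaults, four consecutive successes raise the batch size from 2 to 4, and two subsequent failures bring it back to 2.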
Recommended settings:
- Default: 1 (sequential, safest)
- Independent tasks: 2-3
- Adaptive: auto (recommended for most sessions)
- Maximum: 5
Tradeoffs:
- Higher token usage (N concurrent contexts)
- Potential git conflicts (auto-resolved when possible)
- No shared learning between parallel agents
pokayokay tracks subagent usage when runtime telemetry is available and prints it in the session summary. Use the numbers as a session-review signal, not as a hard billing source: some runtimes do not report token counts.
Practical defaults:
| Work type | Token-aware default |
|---|---|
| Tiny edit or support request | /pokayokay:quick inline |
| Bug with unclear cause | /pokayokay:fix with root-cause debugging before edits |
| Broad codebase question | Explorer/test-runner style agents before full pipelines |
| Clear implementation task | One implementer, then spec/quality review |
| Independent backlog batch | /pokayokay:work -n 2 or -n auto before larger fan-out |
Codex skills use progressive disclosure, so keep skill descriptions concise and let references stay lazy-loaded. Claude Code and Codex subagents both preserve the main context, but they add separate model/tool work; spend that budget when isolation, review quality, or wall-clock speed is worth it.
Resume interrupted work sessions without losing context:
# Resume the last session, picking up where you left off
/pokayokay:work --continue
This loads tasks with saved WIP data from ohno, skips brainstorming for resumed tasks, and dispatches the implementer with the previous context.
When context fills during auto-mode work, sessions can chain automatically: the current session finishes gracefully and spawns a new session that resumes from WIP. Chaining is configured in .pokayokay/config.json; .claude/pokayokay.json is still supported for existing Claude Code projects:
{
"headless": {
"max_chains": 10,
"report": "on_complete",
"notify": "terminal"
}
}
Chaining requires an explicit scope to prevent runaway sessions:
# Scope to a story: chains continue until the story's tasks are done
/pokayokay:work auto --story story-abc123
# Scope to an epic
/pokayokay:work auto --epic epic-def456
Chain reports are generated to .ohno/reports/. The max_chains limit (default 10) prevents runaway execution.
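The chain limit can be thought of as a bounded loop around sessions. The sketch below is illustrative only: `run_session` is a hypothetical callback, and treating `max_chains` as the total number of sessions (rather than the number of re-spawns) is an assumption.

```python
import json
from typing import Callable


def max_chains_from(config_text: str, default: int = 10) -> int:
    """Read headless.max_chains from a config document shaped like the one above."""
    config = json.loads(config_text)
    return int(config.get("headless", {}).get("max_chains", default))


def run_chain(run_session: Callable[[int], bool], max_chains: int = 10) -> int:
    """Run sessions until the scope is finished or max_chains is reached.

    run_session(i) is a hypothetical callback that returns True once the
    scoped story or epic has no remaining tasks. Returns how many sessions ran.
    """
    for i in range(1, max_chains + 1):
        if run_session(i):
            return i
    return max_chains
```

A scope that finishes in its third session stops the chain there; a scope that never finishes stops at the limit instead of running indefinitely.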
Tasks automatically run in isolated git worktrees based on type:
| Task Type | Behavior | Override |
|---|---|---|
| feature, bug, spike | Worktree | --in-place |
| chore, docs | In-place | --worktree |
# Default: smart based on task type
/pokayokay:work
# Force worktree for a chore
/pokayokay:work --worktree
# Force in-place for a feature
/pokayokay:work --in-place
Story-based reuse: Tasks in the same story share a worktree, keeping related changes together.
On completion: Use finishing-branch to verify the branch and choose to
merge, create PR, keep the worktree, or discard work.
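The task-type defaults and overrides above amount to a small decision function. This is a sketch under assumptions: the function name is hypothetical, and rejecting both flags together is a guess about how conflicting overrides would be handled.

```python
WORKTREE_TYPES = {"feature", "bug", "spike"}
IN_PLACE_TYPES = {"chore", "docs"}


def use_worktree(task_type: str, force_worktree: bool = False,
                 force_in_place: bool = False) -> bool:
    """Return True if the task should run in an isolated git worktree."""
    if force_worktree and force_in_place:
        # Assumption: the two override flags are mutually exclusive.
        raise ValueError("--worktree and --in-place cannot be combined")
    if force_worktree:
        return True
    if force_in_place:
        return False
    if task_type in WORKTREE_TYPES:
        return True
    if task_type in IN_PLACE_TYPES:
        return False
    raise ValueError(f"unknown task type: {task_type!r}")
```

For example, `use_worktree("chore")` is `False`, while `use_worktree("chore", force_worktree=True)` is `True`.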
pokayokay includes 14 Claude Code sub-agents that run in isolated context
windows for verbose operations. The model column below reflects Claude Code
agent frontmatter aliases (haiku, sonnet, opus).
| Agent | Claude model alias | Purpose |
|---|---|---|
| yokay-auditor | Sonnet | L0-L5 completeness scanning (read-only) |
| yokay-brainstormer | Sonnet | Refines ambiguous tasks into clear requirements |
| yokay-browser-verifier | Sonnet | Browser verification for UI changes (read-only) |
| yokay-design-reviewer | Sonnet | Pre-implementation design and codebase-pattern review (read-only) |
| yokay-explorer | Haiku | Fast codebase exploration (read-only, 5-10x cheaper) |
| yokay-fixer | Sonnet | Auto-retry on test failures with targeted fixes |
| yokay-implementer | Opus | TDD implementation with fresh context |
| yokay-planner | Opus | PRD analysis and structured plan generation |
| yokay-reviewer | Sonnet | Code review and analysis (read-only) |
| yokay-security-scanner | Sonnet | OWASP vulnerability scanning (read-only) |
| yokay-spec-reviewer | Opus | Adversarial spec compliance review |
| yokay-quality-reviewer | Sonnet | Code quality review (after spec passes) |
| yokay-spike-runner | Sonnet | Time-boxed investigations |
| yokay-test-runner | Haiku | Test execution with concise output |
Codex does not use the Claude haiku / sonnet / opus aliases from these
Markdown agent files. In Codex, pokayokay currently relies on skills, hooks,
and MCP integration. A Codex-native agent layer would use .codex/agents/*.toml
files with OpenAI model IDs such as gpt-5.4, gpt-5.4-mini, or Codex models,
plus optional model_reasoning_effort. If a Codex subagent omits model
settings, it inherits the parent session model and reasoning effort.
- Context isolation - Verbose scan output stays separate from main conversation
- Cost optimization - Lightweight Claude aliases are used for simple exploration and test-running agents
- Enforced constraints - Read-only agents can't accidentally modify files
- Parallel execution - Run multiple investigations simultaneously
Commands like /pokayokay:audit, /pokayokay:security, and /pokayokay:spike automatically delegate to the appropriate agent.
For ambiguous or under-specified tasks, yokay-brainstormer runs before implementation:
- Detects vague descriptions or missing acceptance criteria
- Explores codebase for context
- Produces clear requirements and technical approach
- Requests confirmation before implementation proceeds
This prevents wasted work from misunderstood requirements.
After implementation, two sequential reviewers check the work:
| Stage | Agent | Checks |
|---|---|---|
| 1 | yokay-spec-reviewer | Adversarial spec compliance (requirements met? no scope creep?) |
| 2 | yokay-quality-reviewer | Code quality (well-written and tested?) |
Stage 2 only runs if Stage 1 passes. Both must PASS before a task is marked complete, and the quality reviewer must cite fresh automated-check evidence or state why a check was unavailable.
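The sequencing rule can be sketched as a short-circuiting gate. The `Review` type and `review_task` function are hypothetical names for illustration; the real reviewers are agents, not Python callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    stage: str
    passed: bool
    notes: str = ""


def review_task(spec_review: Callable[[], Review],
                quality_review: Callable[[], Review]) -> tuple[bool, list[Review]]:
    """Run the spec review, then the quality review only if the spec review passed.

    Returns (complete, reviews); complete is True only when both stages passed.
    """
    reviews = [spec_review()]
    if reviews[0].passed:
        reviews.append(quality_review())
    complete = len(reviews) == 2 and all(r.passed for r in reviews)
    return complete, reviews
```

If the spec review fails, the quality reviewer is never invoked and the task stays incomplete.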
pokayokay includes a guaranteed hook system that executes actions at key lifecycle points through Claude Code and Codex hooks:
| Hook | Trigger | Actions |
|---|---|---|
| pre-session | Session starts | Verify clean, pre-flight (unattended), recover |
| pre-task | Task starts | Check blockers, suggest skills, setup worktree |
| post-task | Task completes | Sync, commit, detect spike, capture knowledge |
| post-story | Story completes | Test, story integration, audit gate |
| post-epic | Epic completes | Audit gate |
| on-blocker | Task blocked | Notification |
| pre-commit | Before git commit | Lint, check ref sizes |
| permission-request | Codex approval prompt | Allow obvious read-only/test/ohno commands, deny destructive/deploy commands |
| post-session | Session ends | Sync, session summary, curate memory, session chain |
Beyond lifecycle automation, hooks provide intelligent guidance:
| Action | Hook | Purpose |
|---|---|---|
| suggest-skills | pre-task | Suggests relevant skills based on task keywords |
| detect-spike | post-task | Detects uncertainty signals, suggests spike conversion |
| capture-knowledge | post-task | Auto-suggests docs for spike/research tasks |
| audit-gate | post-story/epic | Checks quality thresholds at boundaries |
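As a rough illustration of keyword-based suggestion, a hook like suggest-skills could map words in a task title to skill names. The keyword table below is invented for the example and is not pokayokay's actual routing data; only the skill names come from the skills list in this README.

```python
import re

# Hypothetical keyword-to-skill table; the real hook's keywords may differ.
KEYWORD_SKILLS = {
    "endpoint": "api-design",
    "graphql": "api-design",
    "migration": "database-design",
    "schema": "database-design",
    "owasp": "security-audit",
    "logging": "observability",
    "pipeline": "ci-cd",
}


def suggest_skills(task_title: str) -> list[str]:
    """Return the sorted, de-duplicated skills whose keywords appear in the title."""
    words = set(re.findall(r"[a-z0-9-]+", task_title.lower()))
    return sorted({skill for keyword, skill in KEYWORD_SKILLS.items() if keyword in words})


print(suggest_skills("Add GraphQL endpoint and schema migration"))
# ['api-design', 'database-design']
```

Matching whole words rather than substrings avoids accidental hits inside longer words.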
Hooks are registered through the plugin system and routed by bridge.py. Claude
Code loads plugin hooks from hooks/hooks.json; Codex setup writes equivalent
hook wiring to ~/.codex/config.toml because Codex hooks are configured
through config files. The ohno MCP server provides boundary metadata when
tasks complete, enabling automatic detection of story/epic completion.
Codex approval handling is intentionally conservative. pokayokay may auto-allow read-only inspection, pokayokay tests, and ohno bookkeeping. Destructive, deployment, publishing, push, and history-rewrite commands are denied or left to the normal human approval flow.
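A conservative approval policy of this kind can be sketched as deny-list first, allow-list second, and everything else left to a human. The command prefixes below are illustrative assumptions, not the rules that bridge.py applies, and `decide` is a hypothetical name.

```python
import shlex

# Hypothetical prefix lists; the real hook's rules may differ.
ALLOW_PREFIXES = [("git", "status"), ("git", "diff"), ("git", "log"), ("ls",), ("cat",)]
DENY_PREFIXES = [("rm",), ("git", "push"), ("git", "rebase"), ("git", "reset"),
                 ("npm", "publish")]
SHELL_METACHARACTERS = set(";|&<>`$")


def _matches(argv: tuple[str, ...], prefixes: list[tuple[str, ...]]) -> bool:
    return any(argv[:len(prefix)] == prefix for prefix in prefixes)


def decide(command: str) -> str:
    """Return 'allow', 'deny', or 'ask' (leave it to the normal human approval flow)."""
    if SHELL_METACHARACTERS & set(command):
        return "ask"  # compound or redirected commands are not classified
    try:
        argv = tuple(shlex.split(command))
    except ValueError:
        return "ask"  # unbalanced quotes and similar parse errors
    if _matches(argv, DENY_PREFIXES):
        return "deny"
    if _matches(argv, ALLOW_PREFIXES):
        return "allow"
    return "ask"
```

Under these assumed lists, `decide("git status")` is `"allow"`, `decide("git push origin main")` is `"deny"`, and anything unrecognized, such as `decide("make deploy")`, falls through to `"ask"`.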
Use /pokayokay:hooks to view and manage hook configuration.
See HOOKS.md for configuration and customization.
For high-uncertainty work:
- Time-box: 2-4 hours (max 1 day)
- 50% Checkpoint: Assess progress
- Mandatory Decision: GO / NO-GO / PIVOT / MORE-INFO
- Output: .claude/spikes/[name].md
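The time-box rules above can be sketched as a status check against a budget. The function and status names are hypothetical; the 4-hour default and the one-day cap follow the list, and treating the 50% checkpoint as half of the budget is an assumption.

```python
from datetime import datetime, timedelta
from enum import Enum


class Decision(Enum):
    GO = "GO"
    NO_GO = "NO-GO"
    PIVOT = "PIVOT"
    MORE_INFO = "MORE-INFO"


def spike_status(started: datetime, now: datetime,
                 budget: timedelta = timedelta(hours=4)) -> str:
    """Return 'investigate', 'checkpoint' (past 50% of budget), or 'decide' (budget spent)."""
    if budget > timedelta(days=1):
        raise ValueError("spike budget must not exceed one day")
    elapsed = now - started
    if elapsed >= budget:
        return "decide"  # a Decision value is now mandatory
    if elapsed >= budget / 2:
        return "checkpoint"
    return "investigate"
```

With the default budget, a spike started at 09:00 reaches the checkpoint at 11:00 and must end in one of the four `Decision` values at 13:00.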
- GUIDE.md - Detailed usage guide
- CHEATSHEET.md - Quick reference card
- CHANGELOG.md - Version history
See GUIDE.md for:
- Command relationships diagram
- Skill routing patterns
- Keyword detection
- Integration with ohno
git clone https://github.com/srstomp/pokayokay.git
# Claude Code development
claude --plugin-dir ./plugins/pokayokay
# Codex local marketplace development
codex plugin marketplace add .
codex plugin add pokayokay@pokayokay
npm --prefix cli install
node cli/bin/cli.js
Useful checks while developing:
bash plugins/pokayokay/tests/codex-compatibility.test.sh
node plugins/pokayokay/tests/cli-dual-runtime.test.mjs
bash plugins/pokayokay/tests/bridge-runtime-normalization.test.sh
- ohno - Task management via MCP
MIT