Skip to content

[FORGE-96] feat(orchestrator): refactor CLI verb suite (P2.5-T05; absorbs FORGE-79)#166

Merged
firatcand merged 10 commits into
mainfrom
feat/FORGE-96-refactor-cli-verb-suite
May 17, 2026
Merged

[FORGE-96] feat(orchestrator): refactor CLI verb suite (P2.5-T05; absorbs FORGE-79)#166
firatcand merged 10 commits into
mainfrom
feat/FORGE-96-refactor-cli-verb-suite

Conversation

@firatcand
Copy link
Copy Markdown
Owner

Summary

Refactor forge orchestrate from the v0.2.x flat 6-verb surface into the v2 verb-table dispatcher with a strict read-only / user-approved-mutating split (per spec/ORCHESTRATOR.md §CLI surface, rewritten 2026-05-17). Absorbs FORGE-79 (file-glob overlap library) per plan fork F10.

  • 17 verbs registered across two bands plus a nested run sub-tree.
    • Read-only (no lease, no tracker mutation): phases, status, questions, doctor, attach, spec-diff, run list.
    • User-approved mutating: claim, dispatch, heartbeat, question, answer, event, complete, cancel, reconcile, gc, run start.
  • Stable JSON envelope on every --json invocation: { ok, data? | error: { code, message, retriable, details? } }.
  • Verb-table dispatcher (src/cli/orchestrate/index.ts) — Map<string, VerbHandler | Map<string, VerbHandler>>. --help walks the registry so the read/mutate classification table cannot drift from the implementation (guarded by a unit test).
  • Centralized arg validation in src/schemas/cli-args.ts — one zod schema per verb plus shared primitives (TaskIdSchema, UUIDv7 schemas for run/claim/attempt/question, DecisionKeySchema, PhaseSchema, RoutingHintSchema).
  • src/orchestrator/overlap.ts (absorbed FORGE-79) — pure file-glob classifier consumed by phases --ready to tag candidates no-overlap | soft-overlap | hard-overlap. Tasks without declared write_globs default to worst-case-assume-overlap-on-hard-lock-only.
  • TaskSchema.status + TaskSchema.write_globs — optional fields on phases.yaml task entries; the loader was previously silently dropping these.
  • 3-task e2e integration test spawns dist/bin/forge.cjs and drives the full lifecycle (run start → phases → claim → dispatch → heartbeat → question → cancel | complete) against a fixture. FORGE_NOOP_TRACKER=1 keeps it hermetic.
  • Dropped without alias: next, suggest-next, session-check, intent-detect. Replacements: phases --ready for listing, status for re-grounding, /amend-roadmap for intent.

Why

P2.5-T05 ships the simplified CLI surface that the rest of Phase 2.5 sits on top of:

  • Dispatch skill rewrite (FORGE-98) builds on the read/mutate split.
  • Doctor scopes (FORGE-99) extend the --scope flag.
  • Reconcile (FORGE-100, just merged) now flows through the same dispatcher.

The dropped verbs were redundant in the simplified design — sole-user decision, no compat shim per plan F4.

Forks asked & answered (in plan)

Fork Answer
F1 doctor scope A — SPEC↔code only; ADR scopes return SCOPE_NOT_IMPLEMENTED
F2 file layout A — migrate all 6 existing verb files into src/cli/orchestrate/ (via git mv so blame follows)
F4 run depth A — 3-level `run start
F10 overlap B — absorb FORGE-79 into this PR
F6 zod schemas A — centralized src/schemas/cli-args.ts
F7 e2e test A — node --test spawns dist/bin/forge.cjs subprocess

Known limitations (documented in code)

  1. Tracker wiring deferred to FORGE-98: claim/dispatch/heartbeat use a ClaimableTracker interface. Production wiring reads .forge/settings.yaml but currently returns NO_TRACKER_CONFIGURED for real configs; FORGE_NOOP_TRACKER=1 or missing settings yields a NoopTracker. This keeps FORGE-96 scope tight and unblocks the dispatch-skill PR that will provide the real factory.
  2. Heartbeat fencing: spec line 181 has no --claim flag, so the verb reads identity from lease.json to renew. The lower-level leases.heartbeat() still has the proper LEASE_STOLEN fence; CLI-level detection of stale workers is a follow-up.
  3. verdict.verified.json: CLI self-attest only in v0.4; full re-verification (re-running tests + lint via spawnSync) is a follow-up.

Test plan

  • npm run typecheck — green
  • npm test — 1034 / 0 / 11 (skipped) — includes 83 new tests across overlap (18), envelope (8), cli-args schemas (16), phases/doctor/run-list/run-start (15), claim/dispatch/heartbeat (12), question/event/complete/cancel (10), help-drift (4), 3-task e2e (3)
  • npm run build — green
  • npm run test:pack — 0 forbidden paths in tarball
  • node dist/bin/forge.cjs --version — 0.2.2
  • node dist/bin/forge.cjs orchestrate — classification table renders
  • gitleaks detect — no leaks
  • Merged origin/main (FORGE-100 reconcile + FORGE-115 worktree std) and rewired reconcile through the new verb table

Linked

Closes FORGE-96 (P2.5-T05).
Closes FORGE-79 (P2-T19) — absorbed per plan F10.

🤖 Generated with Claude Code

firatcand and others added 10 commits May 17, 2026 21:20
…5-T05 step 1)

Lays the ground for FORGE-96 by reorganizing the orchestrate CLI surface
into a single directory and a single shared dispatcher.

- Migrate 6 existing flat verb files into src/cli/orchestrate/{verb}.ts via
  git mv so blame follows. Bump all relative imports one level deeper.
- Same migration for the 5 unit-test files and the spec-diff e2e file.
- New src/cli/envelope.ts: ok/fail/emit helpers + EnvelopeSchema (zod)
  producing the spec'd { ok, data? | error?: { code, message, retriable } }
  envelope on --json.
- New src/cli/orchestrate/flags.ts: extracts parseFlag / hasFlag /
  firstPositional from src/bin/forge.ts so every verb can reuse them; adds
  nthPositional + resolveForgeDir.
- New src/schemas/cli-args.ts: centralized zod schemas per verb plus shared
  primitives (TaskIdSchema, RunIdSchema/ClaimIdSchema/AttemptIdSchema as
  UUIDv7, DecisionKeySchema, PhaseSchema, RoutingHintSchema). Per-fork
  decision F6.
- New src/cli/orchestrate/index.ts: verb-table dispatcher
  (Map<string, VerbHandler | Map<string, VerbHandler>>) with adapter
  wrappers for the 6 existing verbs, --help walking the registry, and a
  registerVerb hook for subsequent step files. Per-fork decision F3 + F2.
- src/bin/forge.ts simplifies to a single dispatchOrchestrate call.
- test/unit/bin/forge.test.ts: update two assertions pinned to the prior
  if-chain (usage line layout + UNKNOWN_VERB envelope code).

All 876 tests still pass.

Refs FORGE-96.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…P2-T19)

Absorbed into FORGE-96 per plan fork F10. Pure library — no I/O, no event
emission, no CLI surface — consumed by phases --ready (Step 3) to surface
hard-overlap and soft-overlap classification when listing ready tasks.

- src/orchestrator/overlap.ts: classifyOverlap({ activeAttempts, candidate,
  hardLockGlobs? }) → { classification: 'no'|'soft'|'hard', offendingGlobs,
  conflictingTaskIds }.
- Tiny pure-JS minimatch-style compileGlob (literals + ? + * + **) to avoid
  pulling a glob dependency.
- DEFAULT_HARD_LOCK_GLOBS list from ORCHESTRATOR.md §File-glob declarations
  (package.json, lockfiles, tsconfig.json, plans/phases.yaml, src/index.ts,
  migrations/**, prisma/schema.prisma).
- Worst-case-assume-overlap-on-hard-lock-only for tasks without declared
  write_globs.
- 18 unit tests covering: pair classification matrix, hard-lock category
  matching, undeclared globs (both sides + one side), multi-active sets,
  hardLockGlobs override, deduplication, spec-default content.

Closes FORGE-79.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…step 3)

Wires the read-only band of the v2 orchestrate CLI surface.

- src/cli/orchestrate/phases.ts: `phases [--ready] [--phase] [--blocked-by]
  [--limit]`. With --ready, filters by deps-shipped + non-deferred status,
  then tags every candidate with overlap classification (no/soft/hard) via
  the FORGE-79 library. Returns { tasks, overlap_check: 'enabled' }.
- src/cli/orchestrate/doctor.ts: `doctor [--scope spec-code]`. Default scope
  scans spec/SPEC.md (+ ORCHESTRATOR.md / PRD.md if present) for src/...
  path references and reports drift when the file is missing on disk.
  --scope adr-drafts and --scope apply-journal explicitly fail with
  SCOPE_NOT_IMPLEMENTED (deferred to FORGE-99 / v0.5 per plan F1).
  Exit codes: 0 clean, 1 warnings, 2 drift detected.
- src/cli/orchestrate/run-start.ts + run-list.ts: the `run` sub-tree. Start
  writes runs/<uuidv7>/manifest.json + touches notifications.jsonl; list
  enumerates manifests sorted most-recent-first with optional --active
  filter (heuristic: zero notifications = quiesced).
- src/cli/orchestrate/index.ts: registers phases/doctor/run-start/run-list
  in the verb table. `run` is a nested Map so `forge orchestrate run start`
  routes correctly (plan F4).
- src/schemas/phases.ts: extend TaskSchema with optional `status`
  (TASK_STATUSES = active|paused|done|deferred-v0.5|dropped), `write_globs`
  (consumed by overlap), and the deferred/dropped metadata keys that
  phases.yaml already carries.
- src/schemas/index.ts: barrel re-exports TASK_STATUSES + TaskStatus type.
- src/cli/envelope.ts: tighten EnvelopeSchema so a malformed
  `{ ok: true }` (missing data) fails validation (Codex-style nit caught
  by the new envelope test).
- Tests: envelope (8), cli-args schemas (16), phases (5), doctor (4),
  run-start (3), run-list (3), help-table drift guard (4). Total: +43.

All 934 tests pass.

Refs FORGE-96.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… step 4)

Implements the state-machine entry path of the v2 orchestrate CLI surface:
unclaimed → claimed → dispatched → running.

- src/cli/orchestrate/claim.ts: atomic tracker.claim + leases.acquire +
  writeTaskState('claimed'). On tracker refusal → ALREADY_CLAIMED.
  On LEASE_EXISTS → tracker.releaseClaim rollback + LEASE_CONFLICT.
  Decision F-binary-tracker-wiring: claim is dependency-injectable
  (ClaimableTracker interface — claim/releaseClaim only) so unit tests
  can stub. Production wiring reads .forge/settings.yaml; when no
  settings exist OR FORGE_NOOP_TRACKER=1, a NoopTracker is returned.
  Real tracker construction from settings (Linear/GitHub/Notion) is
  deferred to FORGE-98 (dispatch skill rollout) — surfaces as a clear
  NO_TRACKER_CONFIGURED error rather than silently noop-ing.
- src/cli/orchestrate/dispatch.ts: validates --claim against lease.json,
  mints attempt_id (UUIDv7), writes attempt manifest.json atomically
  (wx flag), appends 'attempt_started' event, transitions state to
  'dispatched'. Refuses non-claimed states + wrong claim_id.
- src/cli/orchestrate/heartbeat.ts: wraps leases.heartbeat to renew
  lease; first invocation also transitions dispatched → running.
  Spec line 181 documents the verb signature without --claim/--run;
  the verb reads the lease from disk to identify the renewer.
  Limitation: cannot distinguish a stale worker from a fresh holder
  without an explicit identity flag — documented in commentary; a
  future ticket may add --claim for stricter fencing.
- src/cli/orchestrate/tracker-factory.ts: ClaimableTracker interface +
  NoopTracker + resolveTrackerForCLI factory.
- Registered claim/dispatch/heartbeat in the verb table; updated the
  help-drift guard test.
- Tests: claim (5), dispatch (4), heartbeat (3). Total: +12.

All 946 tests pass.

Refs FORGE-96.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cel (FORGE-96 step 5)

Completes the mutating band of the v2 orchestrate CLI surface — the
worker-side write path plus terminal-state verbs.

- src/cli/orchestrate/question-write.ts: wraps writeQuestionAtomic. Loads
  options from --options-file (JSON array; defaults to yes/no when absent).
  Accepts --decision-key, --question, optional --drift-event-id,
  --routing-hint apply-decision|amend-roadmap (echoed in response). Builds a
  full Question record (UUIDv7 question_id, ts/expires +24h, default
  classification=architectural/other/medium/module/ask). State transitions
  running → blocked_on_question; appends 'question_written' event.
- src/cli/orchestrate/event.ts: pure append. Accepts --type + --data JSON
  payload; validates against AttemptEventSchema discriminated union; rejects
  unknown event types with INVALID_EVENT. Used for 'drift', 'heartbeat',
  'worktree_inspected', etc. No state transition.
- src/cli/orchestrate/complete.ts: reads --verdict-file (VerdictSchema),
  writes verdict.json (worker-reported) + verdict.verified.json (CLI
  self-attest; full re-verification deferred to follow-up). Transitions
  state per (phase, verdict):
    implement+ready_for_review → ready_for_review
    review+ready_for_review → reviewed
    ship+ready_for_review → shipped (terminal)
    *+changes_needed/blocked → running (loop-back)
  Appends 'attempt_completed' event.
- src/cli/orchestrate/cancel.ts: cancellable from claimed/dispatched/
  running/blocked_on_question/awaiting_respawn. Transitions to 'cancelled'
  (terminal), appends 'attempt_cancelled' event (best-effort), releases
  lease. Refuses terminal-state input with INVALID_STATE.
- Registered question/event/complete/cancel in verb table.
- Tests: question-write (3), event (2), complete (3), cancel (2). +10.

All 956 tests pass. The v2 verb suite is now feature-complete in this PR.
Next: 3-task fixture + e2e subprocess integration test (Step 6).

Refs FORGE-96.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…step 6)

Satisfies the spec acceptance criterion 'Integration test: 3-task fixture
run end-to-end via CLI verbs from shell script' by spawning dist/bin/forge.cjs
as a subprocess and driving the full v2 orchestrate lifecycle:

- test/fixtures/orchestrator/3-task-phases.yaml: 3 independent tasks
  (ETOE-1/2/3) — one for happy path, one for question/cancel, one for
  cancel-mid-flight.
- test/integration/cli/orchestrate/e2e.test.ts:
  - run start → captures run_id
  - phases --ready --phase implement → asserts 3 ready tasks
  - ETOE-1: claim → dispatch → heartbeat → complete (verdict ready_for_review,
    phase implement) → state ready_for_review
  - ETOE-2: claim → dispatch → heartbeat → question → cancel → state cancelled
  - ETOE-3: claim → dispatch → cancel (no heartbeat) → state cancelled
  - run list shows the run

before() hook auto-runs npm run build if dist/bin/forge.cjs is missing so
the test is runnable in any environment. FORGE_NOOP_TRACKER=1 makes the
test hermetic (no Linear/GitHub/Notion deps).

Drive-by fix: src/bin/forge.ts only matches --help / --version when they
are the FIRST argument, so `forge orchestrate run --help` reaches the
dispatcher's scoped help instead of getting swallowed by the global one.

All 959 tests pass (3 new e2e tests).

Refs FORGE-96.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…RGE-96 step 7)

Unreleased section now documents:
- The v2 read-only / mutating verb taxonomy (17 verbs) with their flag sets.
- The verb-table dispatcher with the classification-table drift guard.
- src/orchestrator/overlap.ts (absorbed FORGE-79) + DEFAULT_HARD_LOCK_GLOBS.
- TaskSchema.status + TaskSchema.write_globs (optional fields the loader was
  previously dropping silently).
- 3-task e2e fixture + integration test against dist/bin/forge.cjs.

Removed section: forge orchestrate next | suggest-next | session-check |
intent-detect dropped without a deprecation alias (sole-user decision).
Replacements documented inline.

Smoke checks before commit:
- npm run build: green
- node dist/bin/forge.cjs --version → 0.2.2
- node dist/bin/forge.cjs orchestrate → classification table renders
- npm test: 959 pass / 0 fail / 11 skip
- npm run test:pack: 0 forbidden paths in tarball

Refs FORGE-96.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both parallel sessions merged to main while FORGE-96 was in flight.
Conflicts + structural integration resolved:

- src/bin/forge.ts: took the FORGE-96 dispatcher migration (delegates to
  dispatchOrchestrate); deleted the legacy if-chain that origin/main had
  extended to include a 'reconcile' branch (now routed through the verb
  table instead).
- src/schemas/phases.ts: combined the two TaskSchema extensions —
  FORGE-96's strict TASK_STATUSES enum + write_globs field, and main's
  paused_at/paused_reason metadata + broader TASK_TYPES (skill, docs) +
  PHASE_STATUSES ('paused') + id regex (P2.5-style decimals).
- CHANGELOG.md: auto-merged cleanly; both 'Added' entries coexist.

Migrated the FORGE-100 reconcile verb into the new directory layout:
- src/cli/orchestrate-reconcile.ts → src/cli/orchestrate/reconcile.ts
- test/unit/cli/orchestrate-reconcile.test.ts → .../orchestrate/reconcile.test.ts
- test/integration/cli/reconcile.e2e.test.ts → .../orchestrate/reconcile.e2e.test.ts
- Relative imports bumped one level deeper; references to the old flat
  module path rewritten.
- Registered reconcileHandler (band: mutate) in the verb table + added to
  HELP_ORDER + the help-drift guard test.

Smoke checks:
- npm run typecheck: green
- npm test: 1034 pass / 0 fail / 11 skip (was 959 in FORGE-96 alone +
  ~75 from FORGE-100's reconcile suite)
- npm run build: green
- node dist/bin/forge.cjs orchestrate: classification table renders
  'reconcile' in the mutating band

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The git mv + perl import-rewrite sequence in the prior merge commit
landed the moves but not the rewrites — the moved files still referenced
their old relative paths. Bump them now:

- src/cli/orchestrate/reconcile.ts: ../core/* → ../../core/*, etc.
- test/unit/cli/orchestrate/reconcile.test.ts: ../../../src/cli/orchestrate-reconcile
  → ../../../../src/cli/orchestrate/reconcile
- test/integration/cli/orchestrate/reconcile.e2e.test.ts: same

Tests + typecheck were green before this commit because tsx resolves
relative paths from the moved file location on every run — the earlier
test passes were against the post-edit content even though git hadn't
recorded the edit. This commit closes the gap so the published tree
matches what was tested.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three correctness/UX gaps surfaced by /codex review on PR #166:

1. **claim.ts atomic rollback** (Codex HIGH/8) — if writeTaskState throws
   after leases.acquire + tracker.claim both succeeded, the lease and
   tracker claim were left in place; the next claim would see
   ALREADY_CLAIMED + LEASE_EXISTS and required manual gc. Now explicitly
   releases the lease + tracker claim before returning the IO_ERROR
   envelope. Reports `rolled_back: true | false` so the caller can tell
   whether retry is safe. Surfaces partial-rollback failures in
   `rollback_errors` for gc visibility.

2. **tracker-factory.ts NoopTracker bootstrap warning** (Codex MEDIUM/9)
   — when settings.yaml is missing, the factory still returns NoopTracker
   (preserves local-dev workflows) but now writes a stderr warning so the
   silent fallback can't be mistaken for a real claim. Suppressed when
   FORGE_NOOP_TRACKER=1 is set explicitly (test path).

3. **complete.ts per-phase state pre-check** (Codex HIGH/9) — `complete
   --phase ship` from state='running' threw an opaque ILLEGAL_TRANSITION
   from writeTaskState. Added an upfront state-vs-phase check that
   refuses with INVALID_STATE_FOR_PHASE before writing the verdict file,
   giving the caller a clear error: `current_state`, `required_state`,
   `phase` in details. Map per spec §State machine:
     implement+ready_for_review fires from 'running'
     review+ready_for_review fires from 'ready_for_review'
     ship+ready_for_review fires from 'reviewed'

Tests:
- claim.test.ts: +1 test (writeTaskState failure → full rollback;
  verifies lease.json removed + tracker.releaseClaim called).
- complete.test.ts: rewrote the ship-from-running test to assert
  INVALID_STATE_FOR_PHASE refusal with helpful details (was previously
  asserting only that verdict_path was written).
- tracker-factory.test.ts: new file with 3 tests for the FORGE_NOOP
  override, the bootstrap warning, and malformed-settings → TRACKER_
  INIT_FAILED path.

All 1038 tests pass.

Codex finding #2 (overlap correctness) and #5 (heartbeat LEASE_STOLEN
dead-code) confirmed as no-action — overlap was correct; LEASE_STOLEN
dead path is a known/documented limitation, not a bug.

Refs FORGE-96, Codex review on PR #166.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@firatcand firatcand merged commit 9626946 into main May 17, 2026
10 checks passed
@firatcand firatcand deleted the feat/FORGE-96-refactor-cli-verb-suite branch May 17, 2026 20:18
firatcand added a commit that referenced this pull request May 17, 2026
Promote three orphan Linear tickets surfaced by /sync-status into
plans/phases.yaml. Codex-reviewed before commit.

- P2.5-T21 (FORGE-122) — fix e2e/perf tests failing from worktree;
  hardcoded ${repoRoot}/node_modules/.bin/tsx path doesn't resolve
  from .forge/worktrees/<id>/. infra · S · P1.
- P2.5-T22 (FORGE-81) — fix /learn back-propagation; writes hit the
  worktree but /pickup-task hydrates from main, so worktree-written
  learnings never sync. skill · S · P1. Implementation shape not
  pre-bound; design options A–E live in the Linear ticket body.
- P3-T05 (FORGE-117) — refactor NotionTracker to ntn CLI transport
  + implement updateIssueBody. Required for P3-T03 greenfield-notion
  e2e to exercise the closed-loop tracker-body-mutation path.
  Linear milestone moved to Phase 3 to match. backend · M · P1.

Also bump P3-T03 depends_on to include P3-T05 so the Notion e2e
waits for the Notion adapter to ship updateIssueBody.

Companion to /sync-status orphan triage 2026-05-18: cancelled 8
superseded pre-rescope Phase-3 tickets (FORGE-24..30, FORGE-75) in
Linear with explainer comments pointing at FORGE-108..111; amended
FORGE-79 body to point at PR #166 / src/orchestrator/overlap.ts
(absorbed into FORGE-96 rather than shipped standalone).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant