All notable changes to this project will be documented in this file.
Format: Semantic Versioning. Bump per PR:
- Patch (0.0.x): bug fixes, doc updates
- Minor (0.x.0): new features, backward-compatible
- Major (x.0.0): breaking changes
4-story patch closing v0.9.2's documented operational gap (coordinator subagent stuck mid-eval for 3+ hours during dogfood iteration 3). v0.9.3 self-validated: tier=LOW (G30 in retrospective). 24 consecutive LOW-tier self-validations (v0.6.5..v0.9.3).
Headline result: v0.9.2 iter-3 hang — CLOSED (engineered). New parent-side wallclock timeout wraps eval "$COORD_CMD" with timeout(1) from coreutils. Default 30 min ceiling; configurable via QL_COORDINATOR_TIMEOUT_S env var. On timeout's rc=124 (SIGTERM kill), parent forces SIGNAL_RESULT=WAVE_FAILED so per-story review-field aggregation runs.
- US-001 + US-002 (atomic) — parent-side wallclock timeout + KEEP-OFF rationale documentation —
quantum-loop.sh:~1592wraps coord dispatch withtimeout --kill-after=10s ${QL_COORDINATOR_TIMEOUT_S:-1800}s bash -c "$COORD_CMD". Afterrunner_parse_output: ifRUNNER_EXIT==124, forceSIGNAL_RESULT=WAVE_FAILED+ ERROR (pattern:Coordinator subagent exceeded.*timeout).ql_wrap_subagent_dispatchgate stays OFF under coordinator mode (US-002 comment expansion documents rationale: STALE detection unsafe under coordinator mode + cross-ref toQL_COORDINATOR_TIMEOUT_S). Newtests/test_coordinator_e2e.shTest 6: stubhung_coordinatormode (sleep 30s) +QL_COORDINATOR_TIMEOUT_S=5; asserts ERROR + per-story aggregation marks both stories failed. 13/13 → 16/16 (+3). - US-003 — multi-perspective post-merge review + 2 score-≥85 inline fixes — 3 parallel reviewers (architect + code-reviewer + security). Verdicts: architect SHIP, code-reviewer REQUEST CHANGES, security SHIP. 2 inline fixes: (1) HIGH score 90 — missing
timeoutavailability guard per FR-1; addedcommand -v timeoutcheck matchinglib/dep-manifest.sh:255; on missing, WARN + degrade to plainbash -c(v0.9.2 behavior). (2) MEDIUM score 88 — numeric validation onQL_COORDINATOR_TIMEOUT_S; non-numeric input → WARN + default 1800. New Test 7 validates the fixes. 16/16 → 18/18 (+2). - US-004 — retrospective + IDEA_REPORT_v30 + version bump 0.9.2 → 0.9.3 — this entry.
| Test file | v0.9.2 | v0.9.3 | delta |
|---|---|---|---|
tests/test_coordinator_e2e.sh (+Test 6 + Test 7) |
13 | 18 | +5 |
| Total v0.9.3 added: | +5 |
v0.9.x track has reached operational + structural maturity. v0.9.0 shipped per-wave coordinator dispatch wires. v0.9.1 validated empirically (streak BROKEN). v0.9.2 closed 5a HIGH (engineered guard). v0.9.3 closes operational iter-3 hang (engineered timeout). The next significant cycle would be v0.10.0 (architect-recommended outer-loop replacement; deferred from v0.9.0).
v0.9.4 candidate slate (in idea-stage/IDEA_REPORT_v30.md): cosmetic housekeeping + 1 optional real-feature dogfood. Mostly LOW severity. May skip and pivot to v0.10.0 design phase.
- US-003 review surfaced 1 HIGH (FR-1 violation). Reviewer originally said REQUEST CHANGES; INLINE FIX brought it to SHIP. The HIGH was a real PRD AC violation (FR-1 said "fallback if
timeoutmissing" but the v0.9.3 commit-1 implementation lacked the guard). Inline fix closed it cleanly.
4/4 user-facing stories shipped first-attempt PASS. Multi-cycle CSV milestone: 58 → 61 rows (3 advisory rows). G30 self-validation captured. Manual-takeover streak BROKEN through v0.9.3 — cron-driven /loop pattern handled v0.9.3 end-to-end with no operator intervention. Full retrospective: idea-stage/PIPELINE_REPORT_v30.md. v0.9.4 backlog: idea-stage/IDEA_REPORT_v30.md.
5-story patch closing v0.9.1's known-issue advisory and 2 reviewer MEDIUMs from v0.9.1 US-004 review trio. v0.9.2 self-validated: tier=LOW (G30 in retrospective). 23 consecutive LOW-tier self-validations (v0.6.5..v0.9.2).
Headline result: v0.9.1 finding 5a HIGH — EMPIRICALLY CLOSED. New lib/coordinator-guard.sh::guard_head_advance uses git merge-base --is-ancestor to detect destructive git reset --hard by implementer subagents. The v0.9.2 dogfood deliberately instructed an implementer to run a reset; across 2 iterations the guard fired correctly each time, marked the misbehaving story failed in review fields, and emitted <quantum>WAVE_FAILED</quantum>. See idea-stage/dogfood-v0.9.2-findings.md for full evidence.
- US-001 — coordinator HEAD-snapshot guard —
lib/coordinator-guard.sh(NEW; ~50 LOC).guard_head_advance HEAD_BEFORE [HEAD_AFTER]returns 0 if HEAD_BEFORE is ancestor of HEAD_AFTER, else 1 with stderr "HEAD reset detected". Per v0.9.1 US-004 security review: ancestry check is mandatory (ordinal SHA comparison would falsely accept reset-and-recommit siblings).agents/coordinator.mdstep 2 amended: HEAD_BEFORE pre-dispatch capture + post-dispatch guard invocation; failure path = mark review fields failed + emit WAVE_FAILED. Newtests/test_coordinator_guard.sh(4 cases / 8 asserts; all PASS). - US-002 — gate legacy STORY_ case branches under coordinator mode* — Defense-in-depth: under
COORDINATOR_MODE=true, ifSIGNAL_RESULTmatchesSTORY_PASSED|STORY_FAILED|BLOCKED, log WARNING (pattern "Unexpected STORY_.*coordinator mode") and redirect toWAVE_FAILEDbranch for per-story review-field aggregation. 9-line gate inquantum-loop.shbeforecase "$SIGNAL_RESULT" in. Legacy non-coord path byte-for-byte unchanged. New Test 5 intests/test_coordinator_e2e.sh(3 asserts). Closes v0.9.1 code-reviewer MEDIUM. - US-003 — filePaths validation gap —
lib/quantum-validate.sh(NEW; ~40 LOC).validate_story_filepathsadvisory hook emits stderr WARNING for any eligible story (status pending|failed|in_progress) whosetasks[].filePathsis empty in aggregate.lib/dag-query.sh::next_waveinvokes once per call as advisory preamble (warnings to stderr; never blocks). Newtests/test_quantum_validate.sh(5 cases / 11 asserts). Closes v0.9.1 architect MEDIUM (filter_file_conflictssilent bypass). - US-004 — real-LLM dogfood validates HEAD-snapshot guard — Synthetic 2-story bundle in
.ql-wt/dogfood-v092/. US-A normal flow; US-B task fixture explicitly instructsgit reset --hard HEAD~1to test guard. Result: guard fired correctly across 2 iterations. Reflog confirms each iteration: implementer reset + commit (sibling, not descendant); coordinator detected; marked failed; emitted WAVE_FAILED; per-story aggregation routed correctly. Iteration 3 hung (operator killed); v0.9.3 candidate. Findings doc:idea-stage/dogfood-v0.9.2-findings.md. - US-005 — retrospective + IDEA_REPORT_v29 + version bump 0.9.1 → 0.9.2 + worktree cleanup — this entry.
| Test file | v0.9.1 | v0.9.2 | delta |
|---|---|---|---|
tests/test_coordinator_guard.sh (NEW) |
— | 8 | +8 |
tests/test_quantum_validate.sh (NEW) |
— | 11 | +11 |
tests/test_coordinator_e2e.sh (+Test 5) |
10 | 13 | +3 |
| Total v0.9.2 added: | +22 |
v0.9.1 finding 5a HIGH closed engineered (not just emergent). v0.9.1's coordinator self-healed via emergent recovery; v0.9.2 ships an engineered safety net that detects destructive ops deterministically. The fix is small (~50 LOC in lib + 1 instruction amend) but the empirical validation across 2 dogfood iterations is the real proof.
v0.9.3 candidate slate (in idea-stage/IDEA_REPORT_v29.md): primary item is iter-3 hang — coordinator subagent stuck mid-eval for 3+ hours during dogfood iteration 3. Operational, not architectural. v0.9.3 will add a parent-side wallclock timeout on eval "$COORD_CMD" and re-evaluate ql_wrap_subagent_dispatch STALE detection under coordinator mode.
- US-004 dogfood iteration 3 hung > 3 hours; operator killed parent shell to capture findings. Iterations 1+2 are conclusive evidence of guard correctness; iter-3 hang documented as v0.9.3 candidate.
- Multi-perspective review trio NOT run for v0.9.2 (US-004 was dogfood; review trio deferred to v0.9.3 US-004 per the v0.9.1 pattern).
5/5 user-facing stories shipped first-attempt PASS. Multi-cycle CSV milestone: 55 → 58 rows (3 advisory rows). G30 self-validation captured. HEAD-snapshot guard validated empirically — first cycle to ship engineered defense against implementer destructive ops. Full retrospective: idea-stage/PIPELINE_REPORT_v29.md. v0.9.3 backlog: idea-stage/IDEA_REPORT_v29.md.
5-story validation patch confirming v0.9.0 N42 (per-wave coordinator dispatch under --coordinator opt-in) works end-to-end with a real LLM. v0.9.1 self-validated: tier=LOW (G30 captured in retrospective). 22 consecutive LOW-tier self-validations (v0.6.5..v0.9.1).
Headline result: 18-cycle manual-takeover streak BROKEN. First cycle in 19 (v0.6.7..v0.9.0 was the streak) where the autonomous loop drove a complete 2-story plan to <quantum>COMPLETE</quantum> exit 0 without manual takeover. See idea-stage/dogfood-v0.9.1-findings.md for full evidence.
Operators using --coordinator with real (non-synthetic) plans should be aware: Implementer subagents may run destructive git operations (e.g., git reset --hard) in the shared coordinator worktree. The coordinator's self-healing observed in v0.9.1 dogfood was emergent, not engineered — there is no HEAD_BEFORE/HEAD_AFTER guard in the coordinator contract. The single dogfood incident (US-B implementer wiped US-A's commit; coordinator restored via fix commit) recovered cleanly only because the synthetic plan was trivially simple. Real-feature dispatch with multi-file diffs may not recover. Per-story isolated worktrees OR coordinator HEAD-snapshot guard is planned for v0.9.2.
- US-001 — real-LLM dogfood — synthetic 2-story bundle in
.ql-wt/dogfood-v091/worktree on throwaway branchdogfood-v091-runtime(off master at v0.9.0d66aaf8). Firedbash quantum-loop.sh --coordinator --max-iterations 5. Iteration 1 dispatched coordinator with both stories; coordinator emittedWAVE_PASSED(exact confidence); per-story aggregation marked bothpassed. Iteration 2 returned COMPLETE. End-to-end successful. - US-002 — capture findings —
idea-stage/dogfood-v0.9.1-findings.md(6 sections, evidence-cited). Per-claim verdict: 4 minimum claims (next_wave invocation, spawn_coordinator output reachability, per-story aggregation routing, coordinator scope boundedness) all WORKED. One real defect surfaced (5a HIGH) + one cosmetic (5b LOW). - US-003 — inline fix for finding 5b —
quantum-loop.sh:1569-1577legacySpawning %s for story %s...printf gated under[[ "$COORDINATOR_MODE" != "true" ]]. Cosmetic only; coordinator-mode branch already prints accurateSpawning coordinator for wave-N with K story/stories: .... 5a deferred to v0.9.2 with explicit rationale. - US-004 — multi-perspective post-merge review — 3 parallel reviewers (architect + code-reviewer + security). Code-reviewer: APPROVE with 1 MEDIUM latent (
STORY_PASSED/STORY_FAILED/BLOCKEDcase branches still use scalar$STORY_IDunder coordinator mode) → v0.9.2 candidate. Architect: streak-break HONEST; bumped 5a severity from MEDIUM-HIGH to HIGH; surfaced 3 new v0.9.2 risks (per-story worktree isolation conflicts with--coordinator --parallel;filePathsomission silent bypass; N46 deferred-debt). Security: 0 active findings. - US-005 — retrospective + IDEA_REPORT_v28 + version bump 0.9.0 → 0.9.1 + worktree cleanup — this entry.
No new test files. All 6 v0.9.0 test suites green post-US-003: test_next_wave.sh 18/18, test_coordinator_e2e.sh 10/10, test_signal_parsing.sh 15/15, test_orchestrator_liveness.sh 34/34, test_quantum_loop_recovery.sh 5/5, test_coordinator_dispatch.sh 7/7 (89/89 total).
v0.9.0 N42's wires worked on first real-LLM exercise. This is the third architectural dogfood-pattern validation: v0.7.x (manual takeover required) → v0.8.x (4-layer N33 closure shipped wires structurally) → v0.9.0/v0.9.1 (wires fired correctly under real LLM dispatch).
v0.9.2 candidate slate (from US-004 review synthesis): primary item is finding 5a (HIGH) — coordinator HEAD-snapshot guard OR per-story worktree isolation. Secondary items: code-reviewer's MEDIUM (gate STORY_PASSED/FAILED/BLOCKED case branches under coordinator mode); architect's filePaths validation gap (silent conflict-filter bypass on empty arrays); architect's per-story worktree isolation conflict with --coordinator --parallel mutual exclusion; N46 (respawn output not re-parsed). See idea-stage/IDEA_REPORT_v28.md.
- US-001 expanded from "small dogfood" to "small dogfood + emergent coordinator self-healing diagnostic" when the implementer subagent ran
git reset --hard. Diagnostic-only ship per Option C semantics; fix deferred to v0.9.2. - US-005 includes worktree cleanup (operator-staged in quantum.json plan).
5/5 user-facing stories shipped first-attempt PASS. 0 retries on the bundle branch. Multi-cycle CSV milestone: 52 → 55 rows (3 advisory rows; design/prd/plan all 1 LOW). G30 self-validation captured in retrospective. First cycle since v0.6.7 to break the manual-takeover streak. Full retrospective: idea-stage/PIPELINE_REPORT_v28.md. v0.9.2 backlog: idea-stage/IDEA_REPORT_v28.md.
7-story minor cycle replacing single-spawn dispatch with coordinator-driven per-wave dispatch when operators opt in via --coordinator. First minor since v0.8.0 (which shipped N33 closure infrastructure). v0.8.x closed the N33 anti-pattern at 4 layers + verified all v0.9.0 prerequisites; v0.9.0 ships the actual cure. v0.9.0 self-validated: tier=LOW score=20 files=8 sensitive=0 → skip with automated:true. 21 consecutive LOW-tier self-validations (v0.6.5..v0.9.0).
Architectural newness justifying minor-tier: new CLI behavior path (--coordinator actually does something different from --legacy-orchestrator for the first time), new dispatch architecture in the sequential path, new function (lib/dag-query.sh::next_wave), and new field-ownership enforcement.
- US-000 — coordinator.md field-ownership amendment —
agents/coordinator.md:25previously instructed the coordinator to "set each story's status to passed or failed" while line 105 said "Parent loop ownsstories[].status" (direct contradiction caught by architect 3). v0.9.0 amends step 4 to "Update each story'sreview.specComplianceandreview.codeQualityfields based on signals. Do NOT writestories[].status— the parent shell owns that field." Step 1 also amended (parent now pre-marksin_progress). - US-001 + US-003 + US-004 — per-wave coordinator dispatch wired (atomic commit; all touch quantum-loop.sh):
- US-001 — outer
COORDINATOR_MODEbranch in story selection: callsnext_wave(rc=0 → wave; 1 → COMPLETE; 2 → BLOCKED); legacy path synthesizes 1-elementWAVE_STORY_IDS_JSONfor uniform downstream. Multi-storyin_progresspre-mark. Spawn block: new outer if for coordinator mode →spawn_coordinatorreturns command STRING;evalsynchronously. Legacy path verbatim.ql_wrap_subagent_dispatchsoft-fire skipped under coordinator mode (coordinator handles internal retries; respawn would re-spawn whole wave with stale args; N46 deferred).*)unknown-signal branch now applies retry accounting to ALLWAVE_STORY_IDS(architect 1's HIGH risk: prior$STORY_ID-only logic would orphan other wave membersin_progress). - US-003 —
WAVE_PASSED)bulk-updates ALL wave stories tostatus=passedvia single jq pass.WAVE_FAILED)per-story aggregation via Option A: derives outcome from coordinator-writtenreview.specCompliance+review.codeQualityfields. Stories with both passed → status=passed; else status=failed + retries.attempts++ + failureLog append. No signal-protocol change. - US-004 — replaced v0.8.1 warn-only message at line 672 with hard exit when
--coordinator --parallelboth set. Documented as policy inagents/coordinator.md§ "Interaction with --parallel" (added in v0.8.2 US-004).
- US-001 — outer
- US-002 —
lib/dag-query.sh::next_wavethin composer + 8-case tests — composes existingget_executable_stories+filter_file_conflicts(no logic duplication). 3-way exit code (0=wave, 1=COMPLETE, 2=BLOCKED) gives callers a clean rc-based dispatch (no string-sentinel parsing). Defensive type-check onget_executable_storiesoutput guards against silent contract drift. Newtests/test_next_wave.sh(18 assertions) covers happy path, COMPLETE, BLOCKED (exhausted retries on dep gate), file conflict reduces wave to one, multi-dependency gate, in_progress exclusion, cross-wave file conflict, empty stories. - US-005 — real-fire coordinator-dispatch integration tests — new
tests/test_coordinator_e2e.sh(10 assertions). Each test invokes the realbash quantum-loop.sh --coordinator ...binary against a 2-story stub plan and asserts on post-run quantum.json mutation + exit codes. NOT presence-only — the v0.8.x retrospective burned this lesson home four times. 4 cases: WAVE_PASSED happy path, WAVE_FAILED partial-pass (per-story aggregation from review fields),--coordinator --parallelrejection, COMPLETE path. - US-006 — retrospective + IDEA_REPORT_v27 + version bump 0.8.4 → 0.9.0 — this entry.
+2 new test files, +28 new assertions:
- 1 new:
tests/test_next_wave.sh(18 asserts) — DAG wave-eligibility composer - 1 new:
tests/test_coordinator_e2e.sh(10 asserts) — real-fire coordinator dispatch end-to-end
v0.9.0 ships the actual cure for N33's manual-takeover streak. The 17-cycle streak (v0.6.7..v0.8.4) was driven by orchestrator drift in single-spawn dispatch. v0.9.0 replaces single-spawn with per-wave coordinator dispatch under --coordinator opt-in. Empirical proof requires real-LLM dogfood (deferred to v0.9.1 like v0.8.0→v0.8.1's pattern); v0.9.0 ships the wires + integration tests.
Three architect agents designed the sub-problems independently before this cycle (inner-dispatch + next_wave + per-story aggregation). Their independent reviews caught the field-ownership contradiction (architect 3) and the coordinator-crash retry-accounting gap (architect 1's HIGH risk). Both became prerequisite stories (US-000, US-001 *) gating).
- US-001 + US-003 + US-004 committed atomically (all touch quantum-loop.sh; logically separable but file-coupled).
- US-006 retrospective lifted some boilerplate from v0.8.4 retrospective; the v0.8.x saga summary now lives in v0.8.x track memory not duplicated here.
6/6 user-facing stories shipped first-attempt PASS under manual takeover (18th consecutive cycle). 0 retries. Multi-cycle CSV milestone: 49 → 52 rows. G30 self-validation (21st consecutive correct LOW): tier=LOW score=20 files=8 sensitive=0 → skip; recorded with automated:true. First minor-tier in 1 cycle (immediately after v0.8.4 closed v0.8.x). Full retrospective: idea-stage/PIPELINE_REPORT_v27.md. v0.9.x backlog: idea-stage/IDEA_REPORT_v27.md.
v0.9.1 (next cycle) is the empirical proof point: real-LLM dogfood through --coordinator. The infrastructure works in stub-driven integration tests; whether it breaks the manual-takeover streak in production is the v0.9.1 validation question.
4 implementation stories closing all residual findings from the v0.8.3 multi-perspective post-merge review (architect + code-reviewer + security agents). After v0.8.4, the v0.8.x track is fully closed with zero deferred findings. v0.9.0 N42 (real per-wave coordinator dispatch) is the next minor-tier scope, awaiting operator confirmation. v0.8.4 self-validated: tier=LOW score=≤25 files=≤7 sensitive=0 → skip with automated:true. 20 consecutive LOW-tier self-validations (v0.6.5..v0.8.4).
- US-001 — PS
WAVE_FAILEDretry accounting parity (MEDIUM from v0.8.3 code-reviewer) —quantum-loop.ps1:415-418"WAVE_FAILED"arm now mirrors the bash branch's full retry-accounting jq expression (status=failed,retries.attempts++, appendfailureLogwithphase: "wave_failed"). Prior arm only clearedstartedAt, leaving an infinite-retry loop in the PS path. Same defect class as v0.8.1 US-006 fix on bash side. - US-002 — PS
$storyIddefense-in-depth regex validation (LOW from v0.8.3 security-reviewer) — Added regex validation after$storyIdextraction (around line 285) mirroringquantum-loop.sh:1471(^[A-Za-z0-9_-]+$).jq --argalready treats values as strings (no jq injection), but defense-in-depth guards against future code paths. - US-003 — lib/spawn.sh idempotency source guard (LOW from v0.8.3 code-reviewer) — Added
_QL_SPAWN_SHguard at top oflib/spawn.sh. The v0.8.3 source-gate comment claimed "lib/spawn.sh has its own source-guard"; v0.8.4 makes the comment accurate. Idempotent re-sourcing now guaranteed. - US-004 — update 4 implementer-scoped docs to reference WAVE_ signals (cosmetic from v0.8.3 architect)* —
CLAUDE.mdSignal Reference,skills/ql-execute/SKILL.mdsignal table,runners/preamble.mdsignal protocol, andtemplates/quantum-loop.ps1user-template comment all now reference WAVE_PASSED/WAVE_FAILED with appropriate context (coordinator pattern; implementer-scoped vs wave-scoped clarification). - US-005 — retrospective + IDEA_REPORT_v26 + version bump 0.8.3 → 0.8.4 — this entry.
No new tests; the source-guard idempotency was verified inline (double-source returns 0 silently with no error).
v0.8.x track FINAL CLOSE. Four cycles shipped: v0.8.0 (architectural minor — N33 closure infrastructure), v0.8.1 (validation patch — found 2 inert pieces), v0.8.2 (review hotfix — primary signal regex + CRLF + ceilings + docs), v0.8.3 (4th-layer hotfix — heuristic regex + case switch + source gate + PS parity + test tightening), v0.8.4 (final hotfix — PS retry parity + storyId validation + spawn.sh guard + 4 cosmetic doc gaps). All findings from all 4 multi-perspective post-merge reviews are now closed.
The 4-layer N33 anti-pattern saga is the strongest validation of p012 (anti-presence-only AC) in the codebase. The pattern is reusable: when introducing a recovery wrapper, opt-in flag, OR a signal that another component must recognize, search ALL parallel consumer sites before declaring closure.
- US-001 + US-002 committed atomically (same file, both small, no logical separation worth a commit boundary).
- US-004 added a comment-only annotation to
templates/quantum-loop.ps1rather than extending its signal-handling branches — the user template is implementer-scoped and never sees WAVE_* signals at runtime.
4/4 user-facing stories shipped first-attempt PASS under manual takeover. 0 retries (17th consecutive cycle). Multi-cycle CSV milestone: 46 → 49 rows (3 advisory rows; design/prd/plan all 1 LOW). G30 self-validation (20th consecutive correct LOW): tier=LOW score=≤25 files=≤7 sensitive=0 → skip; recorded with automated:true. v0.8.x track FULLY CLOSED. v0.9.0 N42 prerequisites all met; backlog catalogued in idea-stage/IDEA_REPORT_v26.md. Full retrospective: idea-stage/PIPELINE_REPORT_v26.md.
3 implementation stories closing the fourth, fifth, and sixth layers of the N33 root-cause #1 anti-pattern (presence-only AC at parallel signal-wire sites). v0.8.2 fixed ONE signal regex; multi-perspective post-v0.8.2 review (architect + code-reviewer agents) found 3 parallel sites that were missed, plus a PowerShell parity gap and 2 trivially-passing test assertions. v0.8.3 closes them all atomically. v0.8.3 self-validated: tier=LOW score=≤25 files=≤6 sensitive=0 → skip with automated:true. 19 consecutive LOW-tier self-validations (v0.6.5..v0.8.3).
- US-001 — close 3 WAVE_ wire sites in bash code (HIGH + HIGH + MEDIUM)* — (1)
lib/signal-heuristics.sh:33regex extended to 6 signals (mirrorslib/runner.sh:283); the heuristic-fallback path no longer drops wave signals on non-Claude runners. (2)quantum-loop.sh:1570case "$SIGNAL_RESULT"switch gains explicitWAVE_PASSED)andWAVE_FAILED)branches before the*)wildcard — story-progressing and retry semantics respectively (v0.9.0 N42 may refine wave-to-story mapping). (3)quantum-loop.sh:687source gate now also fireslib/spawn.shandlib/dag-query.shunderCOORDINATOR_MODE=true(parallel toPARALLEL_MODE); v0.9.0 N42'sspawn_coordinator()will be defined at call time. - US-002 — PowerShell parity (MEDIUM) —
quantum-loop.ps1:364regex extended to 6 signals; switch block gains"WAVE_PASSED"and"WAVE_FAILED"case arms before thedefaultbranch. PowerShell entry-point preserved in lockstep with bash. - US-003 — tighten test_signal_parsing.sh negative assertions (LOW) — Tests 9 and 10 each gain an explicit
SIGNAL_CONFIDENCE != "exact"assertion. Guards against the trivial-pass regression where a wholly-broken regex would still make the negative tests pass via the no-signal default. Test count: 13 → 15. - US-004 — retrospective + IDEA_REPORT_v25 + version bump 0.8.2 → 0.8.3 — this entry.
+2 new assertions (test_signal_parsing.sh non-exact-confidence guards); 0 new test files.
v0.8.3 closes the FOURTH and final layer of the N33 root-cause #1 anti-pattern. The pattern saga across v0.8.x:
| Layer | Cycle | Defect | Caught by |
|---|---|---|---|
| 1 (function) | v0.7.x | wrap_orchestrator_dispatch defined; zero callers |
v0.8.0 5-agent research synthesis |
| 2 (caller) | v0.8.0 | ql_wrap_subagent_dispatch defined; zero callers in dispatch loop |
v0.8.1 dogfood (static grep) |
| 3 (signal parser) | v0.8.0 | runner_parse_output regex missing WAVE_* |
v0.8.2 architect review |
| 4 (parallel consumer sites) | v0.8.2 | signal-heuristics.sh regex + case switch + source gate + PowerShell mirror all missing WAVE_* |
v0.8.3 architect + code-reviewer review |
p012 (anti-presence-only AC) reinforced again. The pattern is now mature: when introducing a recovery wrapper, opt-in flag, OR a signal that another component must recognize, search ALL parallel consumer sites before declaring closure — not just the primary one.
- US-001 grew to 3 atomic fixes in 1 story (was originally proposed as ≥3 separate stories in the design). Bundling kept the diff coherent and reverted-as-a-unit.
- The architect's commit-order recommendation (US-002 DAG → US-001 dispatch → US-003 signal switch) didn't apply to this v0.8.3 cycle since v0.8.3 doesn't ship the dispatch wire — that's still v0.9.0 N42.
4/4 user-facing stories shipped first-attempt PASS under manual takeover. 0 retries (16th consecutive cycle). Multi-cycle CSV milestone: 43 → 46 rows (3 advisory rows; design/prd/plan all 1 LOW). G30 self-validation (19th consecutive correct LOW): tier=LOW score=≤25 files=≤6 sensitive=0 → skip; recorded with automated:true. N33 root-cause #1 fully closed across 4 layers; v0.9.0 N42 prerequisites genuinely complete (no more parallel wire sites discovered). Full retrospective: idea-stage/PIPELINE_REPORT_v25.md. v0.9.0+ backlog: idea-stage/IDEA_REPORT_v25.md.
5 stories addressing the multi-perspective post-v0.8.1 review (architect + code-reviewer + security-reviewer sub-agents). Closes 2 CRITICAL items (CRLF hygiene + signal-protocol gap) plus 3 supporting follow-ups (wall-clock flake + architectural-debt docs + retrospective). Sets up v0.9.0 N42 (real per-wave coordinator dispatch) to land cleanly. v0.8.2 self-validated: tier=LOW score=25 files=10 sensitive=0 → skip with automated:true. 18 consecutive LOW-tier self-validations (v0.6.5..v0.8.2).
- US-001 — fix CRLF on copilot-hooks.sh + test_runner.sh; add
.gitattributes— v0.8.1 US-004 imported these files from a Windows-authored stash; they shipped CRLF-encoded. Strict Linux/CI environments reject the shebang withbad interpreter: No such file or directory. Re-encoded both files viased s/\r$//(line-ending-only delta; verified by 294-in/294-out diff stat on test_runner.sh). New.gitattributesenforces*.shand*.bashtext eol=lfto prevent recurrence, with text annotation for*.md/*.txt/*.json/*.yml/*.yaml. - US-002 — extend
runner_parse_outputto recognizeWAVE_PASSED/WAVE_FAILED— v0.9.0 N42 prerequisite. Architect post-v0.8.1 review caught thatrunner_parse_output(lib/runner.sh:283) regex matched only the 4 existing signals (STORY_PASSED|STORY_FAILED|COMPLETE|BLOCKED). The coordinator agent (agents/coordinator.md:36-37) emitsWAVE_PASSED/WAVE_FAILED. Without the parser extension, every coordinator dispatch in v0.9.0 N42 would route to the unknown-signal failure branch — exactly the N33 root-cause #1 anti-pattern repeating one layer deeper. Regex extended to 6 signals (additive only; existing 4-signal recognition preserved). Newtests/test_signal_parsing.sh(13 assertions) covers all 6 signals + regression guards (whitespace tolerance, last-signal-wins) + negative tests (substringWAVE_PASSINGand bare untaggedWAVE_PASSEDdo NOT match). - US-003 — bump Test 5 wall-clock ceiling 6s → 10s —
tests/test_orchestrator_liveness.shTest 5 (interval_sec=0 guard) was intermittently failing on Git Bash with elapsed=7s. Same Git Bash subprocess-fork-overhead jitter class as v0.8.1 US-003's Test 2 (12 → 20s) fix. Bumped to 10s with reference annotation toreferences/test-wallclock-baselines.md. - US-004 — document
--coordinator --parallelrejection + quantum.json field ownership inagents/coordinator.md— Architect surfaced two architectural-debt items needing documentation before v0.9.0 ships per-wave dispatch: (1)--coordinatorand--parallelare mutually exclusive (parallel path already does worktree-based wave dispatch; coordinator spawns implementer subagents internally; combining produces nested worktree-of-worktrees that doesn't compose); (2) quantum.json field ownership — coordinator ownsexecution.completedWaves/progress/stories[].review.*; parent loop ownsstories[].status/stories[].retries.*/startedAt/updatedAt. v0.9.0 N42 will enforce the rejection at CLI parse time and respect the ownership boundary. - US-005 — retrospective + IDEA_REPORT_v24 + version bump 0.8.1 → 0.8.2 — this entry.
+1 new test file, +13 new assertions (signal parsing); 1 ceiling adjustment (test_orchestrator_liveness Test 5):
- 1 new:
tests/test_signal_parsing.sh(13 asserts) — exact-match coverage for all 6 quantum signals; regression guards + negative tests. - Ceiling bump: liveness Test 5 (6 → 10s).
v0.8.2 closes the v0.8.1 review findings and lays the v0.9.0 N42 foundation. The signal-protocol extension in US-002 closes the third layer of the N33 root-cause #1 anti-pattern (after v0.8.0 US-001 added the wrapper definition and v0.8.1 US-001 added the call site). v0.9.0 can now wire spawn_coordinator into the dispatch loop with confidence that WAVE_PASSED/WAVE_FAILED will route correctly.
- US-002 expanded slightly to cover not just regex extension but a 7-line comment block above the function explaining the v0.9.0 framing — useful future-context for the next architect to read the file.
- US-001 added more
.gitattributesrules than strictly required (5 file-type annotations vs 2 strict requirements). Over-engineering on a one-time hygiene fix.
5/5 user-facing stories shipped first-attempt PASS under manual takeover. 0 retries (15th consecutive cycle). Multi-cycle CSV milestone: 40 → 43 rows (3 advisory rows; design/prd/plan all 1 LOW). G30 self-validation (18th consecutive correct LOW): tier=LOW score=25 files=10 sensitive=0 → skip; recorded with automated:true. v0.9.0 N42 prerequisites met: signal parser recognizes 6 signals; CRLF hygiene enforced; architectural-debt docs in place. Full retrospective: idea-stage/PIPELINE_REPORT_v24.md. v0.9.0+ backlog: idea-stage/IDEA_REPORT_v24.md.
5 stories validating v0.8.0's N33 closure + 2 inline fixes surfaced during pre-flight. Honest finding: v0.8.0 shipped two pieces of inert infrastructure with presence-only ACs — exactly N33 root cause #1 repeating one layer deeper. v0.8.1 dogfood caught both and ships the minimal viable wires + a regression-guard test. v0.8.1 self-validated: tier=LOW score=25 files=10 sensitive=0 → skip with automated:true. 17 consecutive LOW-tier self-validations (v0.6.5..v0.8.1).
- US-003 — Test 2 wall-clock ceiling 12s → 20s —
tests/test_orchestrator_liveness.shTest 2 was failing 33/34 on Git Bash with elapsed=18s against the 12s ceiling. Bumped to 20s with reference annotation toreferences/test-wallclock-baselines.md(Git Bash fork-overhead jitter documented there). Verified 34/34 passes. - US-004 — copilot-hooks defensive flags + paired test updates —
runners/hooks/copilot-hooks.sh::pre_spawnnow appends 5 additional defensive flags:--max-autopilot-continues 0(cap autopilot recursion),--silent(reduce CLI noise),--no-color(strip ANSI for clean stdout parsing),--stream off(disable progressive output),--no-remote(disable remote control surface).tests/test_runner.shpaired updates: copilot_hook test gains 5 assertions; codex test now references manifest-driven flag arrays.tests/test_copilot_dispatch.shceiling bumped 60s → 150s for real-network LLM dispatch. Verified test_runner.sh 43/43, test_copilot_runner_smoke.sh 5/5, test_copilot_dispatch.sh 3/3. - US-001 — wire
ql_wrap_subagent_dispatch+COORDINATOR_MODEin quantum-loop.sh — The dogfood finding. Static-analysis grep againstquantum-loop.shexposed two defects in <30s: (1)COORDINATOR_MODEwas set by--coordinator/--legacy-orchestratorbut never read; (2)ql_wrap_subagent_dispatchwas defined but had zero callers. Both are exactly the same anti-pattern as N33 root cause #1. Fixes: (a)COORDINATOR_MODE=truenow emits a WARN to stderr at startup; (b)ql_wrap_subagent_dispatch 5 1 ""is invoked post-runner_parse_outputwhenSIGNAL_RESULTis empty (drift suspect). Newtests/test_v081_wiring.sh(4 assertions) guards against the "presence-only AC" anti-pattern: counts CALL sites not just definitions, and smoke-tests the actual--coordinatorinvocation against a stub repo. - US-002 — capture v0.8.1 dogfood synthesis —
idea-stage/dogfood-v0.8.1-findings.mdincludes a "Final verdict" block addressing each of the 4 minimum claims (no UNKNOWN entries). Two BROKEN→fixed (wrap_orchestrator_dispatchinvoked: BROKEN→WORKED; coordinator dispatch: BROKEN→DEGRADED with WARN). Two remain DEGRADED pending the larger N42 work (real per-wave dispatch). - US-005 — retrospective + IDEA_REPORT_v23 + version bump 0.8.0 → 0.8.1 — this entry.
- US-006 — post-PR-review inline fix (correctness) — Soliton
/soliton:pr-reviewon PR #84 caught two CRITICAL findings (confidence 97 + 88) that v0.8.1 itself was supposed to surface — same anti-pattern repeating one layer deeper. Finding 1: the v0.8.1 US-001 guardif [[ -z "$SIGNAL_RESULT" ]]was always false becauserunner_parse_outputalways setsSIGNAL_RESULTbefore returning. Fix: corrected to[[ "$SIGNAL_RESULT" == "STORY_FAILED" && "$SIGNAL_CONFIDENCE" != "exact" ]]— the actual "drift suspect" condition. Finding 3: the*(unknown-signal) case marked status=failed without incrementingretries.attempts, creating an effective infinite retry loop. Fix: mirrored theSTORY_FAILEDbranch's retry accounting. Test 1b added totests/test_v081_wiring.shto assert the guard is the reachable form, NOT the dead-zform — guards against future regression. Soliton security agent: FINDINGS_NONE.
+1 new test file, +5 new assertions (test_runner.sh copilot_hook), +ceiling adjustments (test_orchestrator_liveness Test 2 + test_copilot_dispatch Test 1):
- 1 new:
test_v081_wiring.sh(4 asserts) — anti-presence-only regression guard - Ceiling bumps: liveness Test 2 (12 → 20s), copilot dispatch Test 1 (60 → 150s)
- Paired: test_runner.sh +5 copilot_hook assertions
v0.8.1 is the diagnostic; v0.9.0 should be the cure. v0.8.0's N33 closure was structurally complete but functionally inert. v0.8.1 ships minimal viable wires + a regression-guard test, but the real coordinator-driven dispatch (per-wave spawn, bounded context per invocation) remains future work. Tracked as N42 in idea-stage/IDEA_REPORT_v23.md.
- US-001 grew from "small dogfood + observe" to "ship a real fix" per the PRD's escalation clause. Wire-fix scope was bounded (~15 LOC across
quantum-loop.sh+ 1 new test); did not need a full re-architecting. - US-004 expanded to also bump
tests/test_copilot_dispatch.shceiling 60→150s (real-network jitter; not strictly the new defensive flags' fault but exposed by them).
5/5 user-facing stories shipped first-attempt PASS under manual takeover. 0 retries (14th consecutive cycle). Multi-cycle CSV milestone: 37 → 40 rows (3 advisory rows; design/prd/plan all 1 LOW). G30 self-validation (17th consecutive correct LOW): tier=LOW score=25 files=10 sensitive=0 → skip; recorded with automated:true. N39 dogfood verdict: v0.8.0 N33 closure was inert; v0.8.1 ships the wires + regression-guard test; v0.9.0+ will deliver the real per-wave dispatch (N42). Full retrospective: idea-stage/PIPELINE_REPORT_v23.md. v0.9.0+ backlog: idea-stage/IDEA_REPORT_v23.md.
7 stories closing N33 (12+ consecutive cycles of orchestrator drift requiring manual takeover). First minor-tier release since v0.7.0 (8 cycles). Twelfth multi-cycle populated-CSV run. v0.8.0 self-validated: tier=LOW score=25 files=22 sensitive=0 → skip with automated:true. 16 consecutive LOW-tier self-validations (v0.6.5..v0.8.0).
Root cause synthesis (5 parallel research agents): the recovery infrastructure (5 layers shipped v0.6.8..v0.7.2) was inert — wrap_orchestrator_dispatch had zero production callers. quantum-loop.sh did not source lib/orchestrator-liveness.sh. The agents/orchestrator.md agent definition consumed ~30% of context window (1,743 lines / ~39,900 tokens) before any work began. No per-story re-prompt for orchestrator. Signal protocol asymmetry between shell-side and agent-side parsing. git rev-parse HEAD mismatch in worktree mode.
- US-001 — wire recovery infrastructure into
quantum-loop.sh— sourceslib/orchestrator-liveness.sh; newql_wrap_subagent_dispatchhelper for callers (US-004 coordinator + external supervisor scripts). Recovery layers are now reachable in production code. - US-002 — worktree-aware
poll_orchestrator_commits— extends signature to accept optionalWORKTREE_PATH4th arg (andwrap_orchestrator_dispatch3rd arg). Eliminates false-positive STALE in worktree-parallel mode where orchestrator commits land under.ql-wt/<story>/on feature branches. Tests 12 + 13 (4 new assertions on worktree-path STALE + LIVE detection). 22 → 26 liveness assertions. - US-003 — modularize
agents/orchestrator.md— extracts 13 optional/conditional sections toagents/orchestrator-modules/*.md(slop-cleanup, dead-code-detection, intent-graph, skeleton-drift, constant-scan, hyclone, type-audit, full-feature-review, init-and-routing, observations, reground, contract-promotion, quantum-json-fields). Each section inorchestrator.mdreplaced with a stub reference + activation guard. 1,743 → 1,007 lines (42% reduction), exceeds initial AC drift target (≤700 lines) but delivers substantive context-window relief. - US-004 — per-wave coordinator pattern — new
agents/coordinator.mddefines a thin per-wave subagent (one wave per invocation, then exits). Newlib/spawn.sh::spawn_coordinator+build_coordinator_prompthelpers. New--coordinatoropt-in flag inquantum-loop.sh(default--legacy-orchestratorpreserves existing single-spawn behavior). v0.8.1 dogfood will validate end-to-end. - US-005 — signal protocol unification — new
runner_parse_subagent_outputwrapper inlib/runner.shroutes subagent output through the canonicalrunner_parse_outputchain (claim-check + heuristic fallback).agents/orchestrator.mdandagents/coordinator.mdupdated to cite the wrapper instead of ad-hoc<quantum>...</quantum>grep on TaskOutput. - US-006 — integration tests — new
tests/test_quantum_loop_recovery.sh(5 asserts) verifies recovery wiring fires on stale stub orchestrator. Newtests/test_coordinator_dispatch.sh(7 asserts) verifies coordinator agent file + spawn helpers + flag parsing. Plus +4 worktree-path assertions in extended liveness test. - US-007 — retrospective + IDEA_REPORT_v22 + version bump 0.7.10 → 0.8.0 — first minor since v0.7.0 (8 cycles).
+16 new assertions across 3 test files:
- 1 extended:
test_orchestrator_liveness.sh22 → 26 (+4) - 2 new:
test_quantum_loop_recovery.sh(5) +test_coordinator_dispatch.sh(7)
This is the first new architectural concept since the multi-runner foundation (v0.7.4 N5) — introduces (a) production wiring for previously-inert infrastructure, (b) a per-wave coordinator pattern (vs the long-running monolithic orchestrator), (c) modular orchestrator definition (13 lazy-loaded modules), (d) unified signal handling. Genuinely minor-tier, validated by the operator.
- US-003 AC said
agents/orchestrator.md≤ 700 lines; actual = 1,007 lines. The 42% reduction is substantive but doesn't hit the original target. Further reduction would require extracting Step 3B (Parallel Execution) core logic, which carries regression risk and was deferred. - US-006 ships unit/integration coverage; full coordinator-pattern dogfood validation is deferred to v0.8.1 (running a real cycle through the new path).
7/7 user-facing stories shipped first-attempt PASS under manual takeover. 0 retries. Multi-cycle CSV milestone: 33 → 36 rows (3 advisory rows). Aggregate severity now includes 1 MEDIUM finding (first MEDIUM in 8+ cycles — healthy signal for substantive minor work). G30 self-validation: tier=LOW score=25 files=22 sensitive=0 → skip; recorded with automated:true. N33 root-cause closure: 12+ consecutive manual-takeover cycles addressed via 4 layers of fixes; v0.8.1 validation will confirm whether auto-recovery actually fires in a real cycle. Full retrospective: idea-stage/PIPELINE_REPORT_v22.md. v0.8.x backlog: idea-stage/IDEA_REPORT_v22.md.
7 stories closing N35 (real-task dispatch beyond version probe) + multi-runner test layer reframe + v0.7.9 housekeeping. Patch-tier; operator-driven extension of v0.7.4 N5 + v0.7.7 N30 multi-runner foundation. Eleventh multi-cycle populated-CSV run (30 → 33 rows). v0.7.10 self-validated: tier=LOW score=22 files=9 sensitive=0 → skip with automated:true. 15 consecutive LOW-tier self-validations (v0.6.5..v0.7.10).
- US-001 —
runner_dispatchwrapper —lib/runner.shaddsrunner_dispatch <runner_name> <prompt>function composingrunner_load+runner_build_cmd+eval. Newtests/test_runner_dispatch.shcovers function presence, mock-echo manifest dispatch, unknown-runner error, and missing-args error (5 assertions). - US-002 — codex dispatch test + manifest fix —
tests/test_codex_dispatch.sh(skip-aware on missing codex CLI). Real-task dispatch uncovered codex manifest drift:runners/codex.jsonupdated to use theexecsubcommand and--full-autoflag (codex CLI removed-qand--approval-mode). Test asserts rc=0 + non-empty stdout + ≤60s wall-clock. - US-003 — copilot dispatch test —
tests/test_copilot_dispatch.shmirrors codex pattern; skip-aware on missing copilot CLI. - US-004 — multi-runner E2E dispatch test —
tests/test_multi_runner_dispatch_e2e.shiterates claude+codex+copilot, dispatching a tiny prompt against each. Three accounting buckets:dispatched(rc=0 + non-empty),skipped(binary not on PATH),timed_out(rc=124 fromtimeout --kill-after=5 90wrapper — environmental, not a code defect). Final invariant: all 3 runners accounted for. 5/5 assertions on this dogfood machine (claude+codex dispatched, copilot timed-out due to rate-limit). - US-005 — multi-runner test layers documentation —
CLAUDE.mdgains a "Multi-runner test layers" subsection distinguishing smoke (version-probe health check) from dispatch (real-task end-to-end).lib/runner.shheader lists all three top-level entry-points (runner_load,runner_build_cmd,runner_dispatch). - US-006 — smoke test docstring reframe —
tests/test_codex_runner_smoke.shandtests/test_copilot_runner_smoke.shheaders updated to identify themselves as the SMOKE/health-check layer and cross-link to the new DISPATCH layer. - US-007 — retrospective + v0.7.9 housekeeping —
idea-stage/PIPELINE_REPORT_v21.md+idea-stage/IDEA_REPORT_v21.md. Commitsdocs/plans/2026-04-28-v0.7.9-bundle-design.md+tasks/prd-v0.7.9-bundle.mdthat were authored in v0.7.9 cycle but never staged (untracked-design-prd housekeeping, recurring pattern from v0.7.5/v0.7.6 N29/N34 audits).
+5 new test files, +12 new assertions approximate:
- 4 new:
test_runner_dispatch.sh(5 asserts),test_codex_dispatch.sh(3 asserts),test_copilot_dispatch.sh(3 asserts),test_multi_runner_dispatch_e2e.sh(5 asserts)
runners/codex.jsoninvocation:headlessFlags: ["-q"]→["exec"];autoApproveFlags: ["--approval-mode", "full-auto"]→["--full-auto"](codex CLI flag migration).
7/7 user-facing stories shipped first-attempt PASS under manual takeover. 0 retries. Multi-cycle CSV milestone: 30 → 33 rows. G30 self-validation (15th consecutive correct LOW classification): tier=LOW score=22 files=9 sensitive=0 → skip; recorded with automated:true. Real-task dispatch validated end-to-end for claude + codex; copilot timed-out due to environmental rate-limit (not a code defect — TIMEOUT bucket added to E2E for visibility). Full retrospective: idea-stage/PIPELINE_REPORT_v21.md. v0.9.0+ backlog: idea-stage/IDEA_REPORT_v21.md.
3 stories closing 6 issues from external code review of lib/multi-runner-manifest.sh (v0.7.4 N5 shipment). Patch-tier; operator-driven reactive (not autonomous loop). Loop-stop signal in IDEA_REPORT_v19 stands for autonomous track. Tenth multi-cycle populated-CSV run — ledger 27 → 30 rows. v0.7.9 self-validated: tier=LOW score=5 files=2 sensitive=0 → skip with automated:true. 14 consecutive LOW-tier self-validations (v0.6.5..v0.7.9).
- US-001 — multi-runner-manifest hardening (Issues 1-4 + 6) —
lib/multi-runner-manifest.sh5 fixes: (1) yq backend distinguishes empty-success from parse-failure (empty/null output emits "is empty or has no top-level structure"); (2) python backend captures stderr to a tmpfile separately from stdout, preventing JSON contamination by DeprecationWarnings or other diagnostic output; (3) shell parser strips leading whitespace before pattern matching, accepting tab-indented and arbitrary-space-indented manifests; (4) shell parser rtrims trailing whitespace from each field value via new_rtrimhelper; (6)validate_manifestdistinguishes jq exit codes (rc=1 → missing-runners-or-wrong-type; rc≥2 → "malformed JSON input"). - US-002 — backend-selection tests (Issue 5) —
lib/multi-runner-manifest.shadds 3 test-only env-var hooks:MR_DISABLE_YQ=1skips yq even if installed;MR_DISABLE_PYTHON=1skips python+yaml;MR_DEBUG=1emits[manifest] backend: <name>to stderr per parse.tests/test_multi_runner_manifest.shadds Tests 7-10: Test 7 (yq backend, skips if yq not installed), Test 8 (python backend withMR_DISABLE_YQ, skips if no python+yaml), Test 9 (shell backend with both disable flags), Test 10 (tab-indented + trailing-whitespace fixture forced through shell backend, verifies Issues 3+4 end-to-end). 6 → 14 multi-runner-manifest test assertions.
+8 new assertions in 1 extended test file:
- 1 extended:
test_multi_runner_manifest.sh6 assertions → 14 (+8 from Tests 7-10 + +1 from Test 10's trim assertion)
3/3 user-facing stories shipped first-attempt PASS under manual takeover (12th consecutive cycle if counting autonomous runs in between). 0 retries. Multi-cycle CSV milestone: 27 → 30 rows. G30 self-validation (14th consecutive correct LOW classification): tier=LOW score=5 files=2 sensitive=0 → skip; recorded with automated:true. Operator-driven reactive distinction: this cycle does NOT contradict IDEA_REPORT_v19's "stop the patch-tier loop" recommendation — that signal targets autonomous loop-discovered patches. External code review by the operator is materially different and is allowed to surface follow-up patch work. Full retrospective: idea-stage/PIPELINE_REPORT_v20.md. v0.9.0+ backlog: idea-stage/IDEA_REPORT_v20.md.
3-story compact reactive closing N36 + N37 from IDEA_REPORT_v18. Trivial scope (1-line lib edit + 7-line yaml header note). G30 score=9 LOW (13th consecutive).
- N36 — runner_load error message hint (US-001) —
lib/runner.sh::runner_loadnow appends a hint sentence when given an invalid name (e.g., a manifest path): "Hint: pass the runner name (e.g., 'codex'), not the manifest path. The manifest is auto-resolved from runners/.json." Closes the v0.7.7 US-001/US-002 onboarding paper-cut. - N37 — manifest.example.yaml deprecation note (US-002) —
runners/manifest.example.yamlheader now explicitly states it is documentation-only;runners/*.jsonper-runner manifests are the runtime-authoritative source. Surfaced in v0.7.7 PIPELINE_REPORT_v18.
v0.7.4 → v0.7.5 → v0.7.6 → v0.7.7 → v0.7.8 has closed every reactive item from the v0.7.4 dogfood + v0.7.7 multi-runner integration. Patch-tier track is again drained. v0.9.0 minor needs operator scope (N35 real-task dispatch as natural anchor).
4-story patch-tier closing N30 (v0.7.7 anchor). First end-to-end smoke validation of codex (tier=tested) + copilot (tier=experimental) CLI runners against the existing multi-runner infrastructure. v0.7.7 self-validation: tier=LOW score=7. 12 consecutive LOW-tier self-validations (v0.6.5..v0.7.7).
- Codex CLI smoke test (US-001) — NEW
tests/test_codex_runner_smoke.sh(5 assertions). Loadsrunners/codex.jsonviarunner_load codex; assertsRUNNER_NAME=codex+RUNNER_BINARY=codex+RUNNER_TIER=tested; verifies_provider_version codexreturns non-empty (e.g.,codex-cli 0.118.0); wall-clock ≤15s. Skip-pass ifcodexCLI absent. - Copilot CLI smoke test (US-002) — NEW
tests/test_copilot_runner_smoke.sh(5 assertions). Mirrors US-001 for copilot; assertsRUNNER_TIER=experimental; verifies version (e.g.,GitHub Copilot CLI 1.0.37). - Routing E2E multi-runner (US-003) —
tests/test_routing_e2e.shTest 5:resolve_routing claude codex copilotproduces 3-distinct-providers snapshot withversions.codex+versions.copilotpopulated. 6 → 9 assertions. Skip-pass when any CLI absent.
+13 new assertions across 2 new test files + 1 extended:
- 2 new:
test_codex_runner_smoke.sh(5),test_copilot_runner_smoke.sh(5) - 1 extended:
test_routing_e2e.sh6 → 9 (+3 Test 5 multi-runner)
This is the first end-to-end validation that quantum-loop integrates with non-claude runners. Multi-runner is now real: codex (tested) + copilot (experimental). Full retrospective: idea-stage/PIPELINE_REPORT_v18.md. Next: v0.9.0 minor for real-task dispatch (N35).
2-story compact reactive patch closing N34 (untracked design+PRD docs audit). Smallest cycle yet (2 stories, ~15 min wall-clock). 11 consecutive LOW-tier self-validations expected.
- N34 —
--audituntracked-design-prd check (US-001) —quantum-loop.shadds_audit_untracked_design_prd_docshelper usinggit ls-files --others --exclude-standard docs/plans/ tasks/. WARN-level when matching files are untracked; OK otherwise. Mirrors v0.7.5 N29 csv-uncommitted pattern. +2 audit tests intests/test_audit.sh. Closes the v0.7.4 + v0.7.5 process gap where design+PRD docs were repeatedly authored locally but never staged before squash-merge (caught at v0.7.5 housekeeping).
tests/test_audit.sh+2 (Tests 39 + 39b)quantum-loop.sh+1 helper + 1 ROWS invocation
v0.7.4 (dogfood) → v0.7.5 (N29+N31) → v0.7.6 (N34) sequence has now closed all known process gaps surfaced by the dogfood cycle. Patch-tier track is genuinely drained. v0.8.0 minor-tier requires operator-defined substantive scope (N30 multi-runner first integration is the natural anchor). Full retrospective: idea-stage/PIPELINE_REPORT_v17.md.
3-story compact reactive patch closing 2 LOW process gaps surfaced in v0.7.4 dogfood. v0.7.5 self-validation: tier=LOW score=7. 10 consecutive LOW-tier self-validations (v0.6.5..v0.7.5).
- N29 —
--auditcsv-uncommitted check (US-001) —quantum-loop.shadds_audit_csv_uncommittedhelper. WARN-level whenmetrics/pre-impl-review-findings.csvhas uncommitted changes; OK otherwise. +2 audit tests intests/test_audit.sh. Closes the v0.7.2/v0.7.3 process gap where advisory-hook CSV updates never reached master. - N31 —
/ql-specSKILL grep-verify instruction (US-002) —skills/ql-spec/SKILL.mdStep 1 (Gather Context) adds bullet 5 with grep/ls verification instruction for cited file paths +[NEW]annotation convention. Doc-only edit (1 line). Closes v0.7.4 US-004 PRD-AC-drift class of bugs.
tests/test_audit.sh+2 (Tests 38 + 38b)quantum-loop.sh+1 audit check + 1 helper
3/3 stories shipped first-attempt PASS under manual takeover (8th consecutive cycle). 0 retries. Multi-cycle CSV milestone: 24 → 27 committed rows. G30 self-validation (10th consecutive LOW): tier=LOW score=7 — smallest score yet (4 files at audit time). Full retrospective: idea-stage/PIPELINE_REPORT_v16.md. v0.8.0+ backlog: idea-stage/IDEA_REPORT_v16.md.
6 user-facing changes shipping operator combined scope (#1/2/3/4/6/7) framed as patch-tier + 1 retrospective. First end-to-end dogfood of /ql-brainstorm → /ql-spec → /ql-plan skill pipeline at v0.7.x scale. v0.7.4 diff self-validated: tier=LOW score=22 files=9 sensitive=0 → skip recorded with automated:true. 9 consecutive LOW-tier self-validations (v0.6.5..v0.7.4).
- N25 —
QL_RESPAWN_CMDreal-CLI smoke test (US-001) —tests/test_orchestrator_liveness.shTest 11: detectsclaudeviacommand -v. Skip-pass branch emits stderr WARN +.handoffs/n25-deferred.mdwhen absent; present-path setsQL_RESPAWN_CMD="claude --version", triggers stale, asserts rc=0 + version regex + wall-clock ≤15s. 27 → 30 liveness assertions. - Provider-routing E2E (US-002) — NEW
tests/test_routing_e2e.shwith 4 tests coveringresolve_routing→write_routing_snapshot→read_routing_snapshotround-trip + empty/missing fallback paths. 6 assertions. Complements existing unit-leveltest_per_role_routing.sh. - Sensitive-path bundle test fixture (US-003) —
tests/test_deep_review_dispatch.shTest 9: real-diff fixture withauth/login.js,config/.env,payment/charge.js,config/api-token.js→ score=38 tier=MEDIUM. Differs from N19 Test 8 by exercising the full git diff construction step. 19 → 22 dispatch assertions. - Worktree-isolation re-test (US-004) —
tests/test_type_audit.shadds worktree-style divergence test: 2 simulated worktrees each definingHandoffEnvelopewith conflicting fields →grep_duplicate_definitionssurfaces both. Discovered + reconciled PRD path bugs (test_worktree_isolation.shandlib/type-auditor.shdon't exist; real names aretest_type_audit.sh+lib/type-audit.sh). - Multi-runner-manifest foundation (US-005) — NEW
lib/multi-runner-manifest.sh(parse/validate/list, 3-tier backend chain: yq → python+yaml → handcrafted shell parser) + NEWrunners/manifest.example.yaml(claude/codex/gemini-cli stubs) + NEWtests/test_multi_runner_manifest.sh(6 tests, 10 assertions). Foundation only — no actual runner integrations (deferred to v0.8.0). Includes cygpath translation for Windows-python path-with-spaces compatibility. - G22 third calibration pass (US-006) — NEW
references/severity-rubric-calibration-v0.7.4.md(8-committed-cycle / 24-row / 58-finding baseline). Documents CSV-commit gap (v0.7.2 + v0.7.3 hooks fired but never reached master). Patch-tier-skew hypothesis confirmed (patch HIGH=4.0% vs minor HIGH=12.5%). Plan-review still 12/12 LOW. CLAUDE.md ref updated v0.7.2 → v0.7.4.
+25 new assertions across 5 affected test files + 2 new files + 1 new lib + 1 new manifest:
- 4 extended:
test_orchestrator_liveness.sh27 → 30 (+3 N25);test_deep_review_dispatch.sh19 → 22 (+3 sensitive-path);test_type_audit.sh+3 (worktree-divergence);CLAUDE.mdref updated. - 2 new:
test_routing_e2e.sh(6 assertions);test_multi_runner_manifest.sh(10 assertions). - 1 new lib:
lib/multi-runner-manifest.sh. - 1 new schema:
runners/manifest.example.yaml.
6/7 stories shipped first-attempt PASS, 1 second-attempt PASS (US-005 yaml-backend cygpath fix). 0 retries beyond first inline fix. First end-to-end skill-pipeline dogfood at v0.7.x scale: /ql-brainstorm + /ql-spec + /ql-plan skills exercised; /ql-execute deferred to manual-takeover (7th consecutive cycle of manual-takeover, consistent with established 100% drift baseline). Per-story dual-review (soliton + copilot:copilot-rescue) consolidated to PR-time. Multi-cycle CSV milestone: 21 → 24 committed rows (v0.7.2/v0.7.3 hooks never reached master — gap captured for v0.7.5+ recovery). Aggregate across 8 committed cycles: 58 findings (0 critical / 3 high / 13 medium / 42 low; LOW share 72.4%). G30 self-validation (9th consecutive correct LOW classification): tier=LOW score=22 files=9 sensitive=0 → skip; recorded with automated:true. Full retrospective: idea-stage/PIPELINE_REPORT_v15.md. v0.8.0+ backlog: idea-stage/IDEA_REPORT_v15.md.
1 user-facing change closing N28 (3 test-adequacy gaps surfaced by internal code review of v0.7.2) + 1 housekeeping/retrospective story. Ninth multi-cycle populated-CSV run — ledger 24 → 27 rows. Patch-tier; 2-story compact bundle (smallest yet). v0.7.3 self-validated: tier=LOW score=2 files=1 sensitive=0 → skip recorded with automated:true (US-001-only diff; US-002 docs/manifests bumps the file count). 9 consecutive LOW-tier self-validations (v0.6.5..v0.7.3).
- N28 — test-adequacy fixes (US-001) —
tests/test_orchestrator_liveness.shTest 8 gains a 4th assertion grepingout8for[QL-EXECUTE] orchestrator-stale — respawning via QL_RESPAWN_CMD(lib's stderr diagnostic). Test 10 gains 2 new assertions: wall-clock guard(( elapsed <= 10 ))matching Test 8 pattern, and grep forQL_RESPAWN_CMD exited 42 — respawn may have failed(lib's failing-respawn diagnostic added in v0.7.2 soliton fix). 24 → 27 liveness tests. First v0.7.x cycle triggered by internal code review (vs operator/soliton). - Housekeeping + retrospective (US-002) —
docs/plans/2026-04-28-v0.7.2-bundle-design.mdandtasks/prd-v0.7.2-bundle.mdcommitted (housekeeping; were authored during v0.7.2 cycle but never staged). PIPELINE_REPORT_v14 + IDEA_REPORT_v14 written. CHANGELOG [0.7.3] entry. 3-manifest version bump 0.7.2 → 0.7.3.
+3 new assertions in 1 modified test file:
- 1 extended:
test_orchestrator_liveness.sh24 → 27 (+3 N28 — Test 8 +1 / Test 10 +2)
2/2 user-facing stories shipped first-attempt PASS under manual takeover (7th consecutive cycle). 0 retries. Multi-cycle CSV milestone: 24 → 27 rows. Aggregate across 9 cycles: 61 findings (0 critical / 3 high / 13 medium / 45 low; LOW share 73.8%). G30 self-validation (9th consecutive correct LOW classification): tier=LOW score=2 files=1 sensitive=0 → skip; recorded with automated:true. Smallest bundle yet (2 stories) — patch-tier track maturity continues. First internal-review-driven cycle — N28 was surfaced by an internal code review pass on v0.7.2 (vs operator/soliton in prior cycles). Full retrospective: idea-stage/PIPELINE_REPORT_v14.md. v0.8.0+ backlog: idea-stage/IDEA_REPORT_v14.md.
2 user-facing changes closing the IDEA_REPORT_v12 v0.7.2 slate (N24, G22-second-pass) + 1 retrospective. Eighth multi-cycle populated-CSV run — ledger 21 → 24 rows. Patch-tier; 3-story compact bundle. v0.7.2 diff self-validated: tier=LOW score=10 files=4 sensitive=0 → skip recorded with automated:true. 8 consecutive LOW-tier self-validations (v0.6.5..v0.7.2).
- N24 —
wrap_orchestrator_dispatchQL_RESPAWN_CMD auto-respawn (US-001) —lib/orchestrator-liveness.sh::wrap_orchestrator_dispatchgainsQL_RESPAWN_CMDenv-var branch. When non-empty and STALE detected, executesbash -c "${QL_RESPAWN_CMD}"and returns its exit code instead of emitting the manual-takeover handoff. When empty/unset, v0.7.1 behavior unchanged (handoff + return 1). Operator setsQL_RESPAWN_CMDto a fully-specified orchestrator invocation to enable unattended auto-respawn. Tests 8 + 9 + 10 (6 new assertions, 2 added in post-soliton inline fix): Test 8 —QL_RESPAWN_CMD="echo respawned"+ stale repo → "respawned" in stdout + rc=0 + wall-clock ≤10s; Test 9 — unset + stale → handoff + rc=1 (regression guard); Test 10 —QL_RESPAWN_CMD="exit 42"+ stale → rc=42 (exit-code propagation). 18 → 24 liveness tests. - G22 second calibration pass (US-002) —
references/severity-rubric-calibration-v0.7.2.mdcreated: updated empirical tables (8-cycle / 57-finding baseline), bundle-tier comparison axis (patch-tier 7 cycles vs minor-tier 1 cycle — v0.7.0). Key finding: patch-tier HIGH=4.1% vs minor-tier HIGH=12.5% — directionally confirms v0.7.0 first-pass hypothesis that HIGH under-classification was patch-tier-skew driven, not rubric miscalibration. MEDIUM stable across tiers (~25%).CLAUDE.mdProcess references updated from v0.7.0 to v0.7.2 calibration doc.
+4 new assertions in 1 extended test file:
- 1 extended:
test_orchestrator_liveness.sh18 → 24 (+6 N24 Tests 8+9+10 + soliton wall-clock fix)
2/2 user-facing stories shipped first-attempt PASS under manual takeover (6th consecutive cycle). 0 retries. Multi-cycle CSV milestone: 21 → 24 rows. Aggregate across 8 cycles: 57 findings (0 critical / 3 high / 13 medium / 41 low; LOW share 71.9%). G30 self-validation (8th consecutive correct LOW classification): tier=LOW score=10 files=4 sensitive=0 → skip; recorded with automated:true. Full retrospective: idea-stage/PIPELINE_REPORT_v13.md. v0.8.0+ backlog: idea-stage/IDEA_REPORT_v13.md.
4 user-facing changes addressing v0.7.0's IDEA_REPORT_v11 v0.7.1 slate (N18, N19, N20, N21) + 1 retrospective. Seventh multi-cycle populated-CSV run — ledger 18 → 21 rows. Patch-tier; 5-story compact bundle. v0.7.1's own diff was self-validated: tier=LOW score=25 files=10 sensitive=0 → skip recorded with automated:true. 7 consecutive LOW-tier self-validations (v0.6.5..v0.7.1).
- N19 — G30 dispatch gate end-to-end MEDIUM-tier fixture (US-001) —
tests/test_deep_review_dispatch.shTest 8 (4 new assertions): synthesizes patch with sensitive_hits=2 (auth/login.js + .env) → should_dispatch_deep_review returns 0 (dispatch); dispatch_set MEDIUM returns 4 canonical reviewers (oh-my-claudecode:code-reviewer, soliton:synthesizer, oh-my-claudecode:security-reviewer, oh-my-claudecode:test-engineer). Closes the v0.7.0 PIPELINE_REPORT_v11 calibration-insight gap (MEDIUM/HIGH/CRITICAL branches now exercised end-to-end). 15 → 19 dispatch tests. - N20 —
wrap_orchestrator_dispatchruntime extraction (US-002) —lib/orchestrator-liveness.shadds NEWwrap_orchestrator_dispatch [timeout_sec] [interval_sec]function. HonorsQL_LIVENESS_ENABLEenv var (default true; false silent skip). Invokespoll_orchestrator_commits; on STALE emits canonical handoff message (cross-link toreferences/orchestrator-takeover.md+ numbered recovery steps) and returns 1.skills/ql-execute/SKILL.mdreplaces inlineif [[ ${QL_LIVENESS_ENABLE:-true} == "true" ]]; then ...example with single-linewrap_orchestrator_dispatch || exit 1. Tests 6 + 7 (6 new assertions): opt-out silent rc=0; default + stale tmp repo → handoff stdout + rc=1 + cross-link. 12 → 18 liveness tests.tests/test_ql_execute_liveness_wrapping.shTest 6 grep updated to verify the function reference (replacing the v0.7.0 poll_orchestrator_commits inline grep). - N18 — plan-review MEDIUM second example (US-003) —
references/finding-severity.mdplan-review MEDIUM row gains a second worked example joined by**OR**:single-story-wave-bottleneck-masked(dag-validator single-story-wave warning that's structurally correct but informational, masking a real serialization risk). Closes v0.7.0 G22 calibration insight that plan-review emitted ONLY LOW findings (9/9 across 6 cycles). - N21 — parse-script aggregate suppress zero-row (US-004) —
references/severity-rubric-calibration-parse.shper-stage aggregate awk addsrows++counter andif (rows == 0) exitbefore printing**Aggregate**:line. v0.7.0 PR #71 soliton conf-75 sub-threshold carry-over closed.
+10 new assertions across 2 extended test files + 0 new files:
- 2 extended:
test_deep_review_dispatch.sh15 → 19 (+4 N19 Test 8);test_orchestrator_liveness.sh12 → 18 (+6 N20 Tests 6 + 7) - 1 modified (count unchanged):
test_ql_execute_liveness_wrapping.sh6 → 6 (Test 6 grep target updated)
4/4 user-facing stories shipped first-attempt PASS under manual takeover (5th consecutive cycle). 0 retries. Multi-cycle CSV milestone: 18 → 21 rows. Aggregate across 7 cycles: 53 findings (0 critical / 3 high / 12 medium / 38 low; LOW share 71.7%). G30 self-validation (7th consecutive correct LOW classification): tier=LOW score=25 files=10 sensitive=0 → skip; recorded with automated:true. 3-layer + extraction recovery infrastructure complete: v0.6.8 prose / v0.6.9 lib helper / v0.7.0 SKILL prose / v0.7.1 SKILL → callable function. Bundle size pattern: 7-7-7-7-5-6-5; patch-tier track increasingly drained. v0.8.0 minor-tier framing recommended for next substantive cycle (G22 second calibration pass + bundle-tier comparison data). Full retrospective: idea-stage/PIPELINE_REPORT_v12.md. v0.8.0+ backlog: idea-stage/IDEA_REPORT_v12.md.
MINOR bump rationale: First non-patch release in 5 cycles. Patch-tier backlog drained over v0.6.5..v0.6.9 (5 consecutive LOW-tier patch releases). v0.7.0 ships substantive G22 calibration analysis + N14 SKILL-level runtime control-flow change + 3 LOW-priority cleanups. The version bump deliberately skips 0.6.10 to signal v0.6.x patch-track closure.
5 user-facing changes addressing v0.6.9's IDEA_REPORT_v10 v0.7.0 slate (G22, N14, N15, N16, N17) + 1 retrospective. Sixth multi-cycle populated-CSV run — ledger 15 → 18 rows. v0.7.0's own diff was self-validated: tier=LOW score=25 files=12 sensitive=0 → skip recorded with automated:true. 6 consecutive LOW-tier self-validations across v0.6.5..v0.7.0 — calibration insight: G30 score formula caps blast-radius at 25 for files_changed ≥ 10 + 0 sensitive + 0 cg = 25 = LOW; conservative threshold means patch-tier and small-minor-tier bundles correctly skip the deep-review pipeline.
-
G22 — severity rubric calibration first pass (US-001) — new
references/severity-rubric-calibration-v0.7.0.md(~150 lines, 6 sections: Methodology / Empirical distribution / Expected distribution / Drift analysis / Rubric language updates / Future work). Companionreferences/severity-rubric-calibration-parse.sh(NEW awk parser cited by Methodology). 6-cycle aggregate: 49 findings → 0/3/13/33 (critical/high/medium/low). Drift: HIGH mild under-classification (-4 to -9pp); LOW mild over-classification (+7 to +17pp); plan-review emits ONLY LOW (9/9 = 100%). Verdict: no urgent rubric edits at this baseline; plan-review MEDIUM-example tweak queued for v0.7.1 N18. Re-snapshot at v0.7.x retrospectives via the parse-script. Cross-linked from CLAUDE.md## Process references(under N15's new Process-related sub-categorization). 6 assertions intests/test_severity_rubric_calibration.sh. -
N14 —
/ql-executeSKILL-level liveness wrapping (US-002) —skills/ql-execute/SKILL.md## Orchestrator liveness gatesubsection wraps orchestrator dispatch withpoll_orchestrator_commits(timeout 600s, interval 60s). HonorsQL_LIVENESS_ENABLEenv var (defaulttrue; setfalsepreserves v0.6.9 dispatch semantics for backwards compat). On STALE signal (rc=1), the SKILL emits a structured handoff message (header + pointer toreferences/orchestrator-takeover.md+ numbered recovery steps) and exits with rc=1 (orchestrator-stalesignal). Closes the v0.6.7+v0.6.8+v0.6.9 manual-takeover recovery loop: v0.6.8 N6 prose advisory → v0.6.9 N6-followup lib helper → v0.7.0 N14 SKILL wrapping (3-layer recovery infrastructure complete). 6 PRESENCE-ONLY assertions intests/test_ql_execute_liveness_wrapping.sh(matches v0.6.8 N6 pattern). -
N15 — CLAUDE.md
## Process referencesre-categorization (US-003) — refactored into 3###sub-headers (Orchestrator-related / Test-related / Process-related). 4 entries placed in natural categories (orchestrator-takeover.md → Orchestrator; test-wallclock-baselines.md → Test; soliton-finding-triage.md + severity-rubric-calibration-v0.7.0.md → Process). Existing 14 cross-link assertions remain green post-refactor. -
N16 —
poll_orchestrator_commitsinterval_sec=0 guard (US-004) — addsif (( interval_sec <= 0 ))fail-fast guard at top of function. Emits[LIVENESS] ERROR: interval_sec must be > 0 (got %s)log + returns 1, preventing the infinite-loop hazard (sleep 0 + elapsed never increases). v0.6.9 PR #70 soliton conf-82 carry-over closed. Test 5 intests/test_orchestrator_liveness.sh(4 new assertions: exit 1 + ERROR log within 2s + no STALE log on guard path). 8 → 12 dispatch tests. -
N17 —
tests/bench_wallclock_baseline_drift.shREPO_ROOT printf %q quoting (US-005) — replaceseval "(cd '$REPO_ROOT' && $cmd)"withsafe_cmd=$(printf '(cd %q && %s)' "$REPO_ROOT" "$cmd"); time eval "$safe_cmd". Defensive against future relocations to single-quote-containing paths (e.g./home/user/andy's-repos/). v0.6.9 PR #70 soliton conf-65 carry-over closed.
+16 new assertions across 2 new test files + 1 extended:
- 2 new files:
test_severity_rubric_calibration.sh(6),test_ql_execute_liveness_wrapping.sh(6) - 1 extended:
test_orchestrator_liveness.sh8 → 12 (+4: Test 5 N16 guard) - 1 implicit (cross-link assertions remain green post-refactor): 14 unchanged across 3 existing test files
5/5 user-facing stories shipped first-attempt PASS under manual takeover (4th consecutive cycle; orchestrator subagent's 3-layer recovery infrastructure now complete but applies only to runs starting AFTER v0.7.0 merges — the v0.7.0 dogfood ran on v0.6.9 master HEAD which has the lib helper but no SKILL wrapping). 0 retries. Multi-cycle CSV milestone: 15 → 18 rows. Aggregate across 6 cycles: 49 findings (0 critical / 3 high / 13 medium / 33 low). G30 self-validation (6th consecutive correct LOW classification): tier=LOW score=25 files=12 sensitive=0 → skip; recorded with automated:true. G30 score-formula calibration insight: blast-radius caps at 25 for files_changed ≥ 10; the dispatch gate's threshold (score>30) is conservative — only sensitive-path or coverage-gap diffs trigger MEDIUM+. v0.7.x candidates: N18 (plan-review MEDIUM example), N19 (G30 dispatch gate end-to-end test fixture), N20 (N14 SKILL wrapping runtime extraction). Full retrospective: idea-stage/PIPELINE_REPORT_v11.md. v0.7.1+ backlog: idea-stage/IDEA_REPORT_v11.md.
4 user-facing changes addressing v0.6.8's IDEA_REPORT_v9 priority list (N6-followup orchestrator-liveness lib helper, N13 orchestrator-takeover SOP doc, N9-followup wall-clock baseline-drift bench, N12 helper rename) + 1 retrospective. Patch-tier; 5-story bundle (smaller than the typical 7 — clean LOW-tier slate, patch-track backlog drained). Fifth multi-cycle populated-CSV run — ledger 12 → 15 rows. v0.6.9's own diff was self-validated: tier=LOW score=25 files=11 sensitive=0 → skip recorded in quantum.json.reviews[v0.6.9-bundle].deepReview with automated:true. 5 consecutive LOW-tier self-validations across v0.6.5..v0.6.9 — G30 dispatch gate routes patch-tier bundles with 100% accuracy across the established baseline.
- N6-followup —
lib/orchestrator-liveness.sh::poll_orchestrator_commits(US-001) — new parent-side commit-poll helper. Defaults: timeout=600s, interval=60s, base=git rev-parse HEAD at call time. Returns 0 (live) on new-commit observation; 1 (stale) on timeout. Stderr log:[LIVENESS] new commit XXXXXXXX observed at +Ns(live) /[LIVENESS] STALE: no commits in Ns (base=XXXXXXXX)(stale). Library contract: no shell flags at source time.agents/orchestrator.mdStep 1.0.4 prose subsection points operators at the helper for unattended/ql-executemode (helper invocation is operator-side; SKILL-level wrapping queued as v0.7.0 N14). 8 assertions intests/test_orchestrator_liveness.sh(function defined, stale-path within timeout-jitter ceiling, live-path with pre-staged HEAD-advance, default-arg behavior). - N13 —
references/orchestrator-takeover.md(US-002) — new manual-takeover SOP for parent agents detecting orchestrator drift mid-cycle. 4 sections: When to detect drift / What to verify / How to take over without corrupting state / Recovery from N6-followup STALE signal. Documents the verification-failure-driven amendment rule (preserve orchestrator edits unless a check proves them broken; v0.6.7 Pattern C → Pattern A worked example). Cross-linked fromCLAUDE.md## Process referencessection (joining N7 soliton-finding-triage + N9 wallclock-baselines). 5 assertions intests/test_orchestrator_takeover_doc.sh. - N9-followup —
tests/bench_wallclock_baseline_drift.sh(US-003) — new opt-in benchmark. Runs 7 documented baseline commands withtime, parses real wall-clock seconds, compares against hardcoded BASELINES (curated subset ofreferences/test-wallclock-baselines.md). EmitsWARN: <cmd> took Ns (baseline Xs, threshold Ys — drift > 50%)if measured > 1.5× baseline. Always exits 0 (informational). File-naming usesbench_*prefix (NOTtest_*) sotests/run_all.sh'stests/test_*.shglob deliberately skips it; operators must invoke directly:bash tests/bench_wallclock_baseline_drift.sh. - N12 — helper rename in
tests/test_audit.sh(US-004) —extract_function_comments→extract_function_header_comments(clarifies: returns ONLY the function-header comment block);extract_function_full_comments→extract_function_all_comments(clarifies: returns header AND body comments). Mechanical rename across 9 occurrences (2 function defs + 4 call sites + 3 doc-comment refs). Replacement order: longer name substituted first to avoid sub-string overlap. 45/45 audit assertions unchanged.
+13 new assertions across 2 new test files (the bench file is opt-in and not counted toward run_all):
- 2 new files:
test_orchestrator_liveness.sh(8),test_orchestrator_takeover_doc.sh(5) - 1 new opt-in bench:
bench_wallclock_baseline_drift.sh(informational; not run bytests/run_all.shdue to glob mismatch) - 1 modified (assertion count unchanged):
test_audit.sh45 → 45 (mechanical rename)
5/5 user-facing stories shipped first-attempt PASS under manual takeover (3rd consecutive cycle; orchestrator-liveness helper SHIPS in this bundle and applies only to runs starting AFTER v0.6.9 merges — the v0.6.9 dogfood itself ran on v0.6.8 master HEAD which had no runtime liveness). 0 retries. Multi-cycle CSV milestone: 12 → 15 rows. Aggregate across 5 cycles: 41 findings (0 critical / 2 high / 11 medium / 28 low) — patch-tier baseline solidly stable; G22 first calibration pass becomes meaningful at v0.7.x. G30 self-validation: v0.6.9's own master..HEAD diff (11 files) classified by its own should_dispatch_deep_review rule as tier=LOW score=25 → skip; decision recorded with automated:true. Bundle size shrinking: v0.6.5/6/7/8 all had 7 stories; v0.6.9 had 5 — the natural endpoint of a patch-tier track. v0.7.0 minor-tier is the right next move. v0.7.0 candidates: G22 calibration first pass, N14 SKILL-level wrapping of the liveness helper, N15 CLAUDE.md Process references re-categorization. Full retrospective: idea-stage/PIPELINE_REPORT_v10.md. v0.7.0 backlog: idea-stage/IDEA_REPORT_v10.md.
6 user-facing changes addressing v0.6.7's IDEA_REPORT_v8 priority list (N6 orchestrator stale-detection prose guard, N7 soliton-finding-triage doc, N8 narrow Test 37a awk to function-header range, N9 wall-clock baselines reference, N10 compute_risk_score comment correction, N11 orchestrator Step 4B.5 cleanup-line move). Patch-tier; all changes are doc/cleanup-additive (no schema deltas, no breaking changes, 2 new committed reference files). Fourth multi-cycle populated-CSV run — ledger 9 → 12 rows. v0.6.8's own diff was self-validated: tier=LOW score=25 files=13 sensitive=0 → skip recorded in quantum.json.reviews[v0.6.8-bundle].deepReview with automated:true.
- N6 — orchestrator Self-monitoring guard prose (US-001) —
agents/orchestrator.mdadds a new### Self-monitoring guardsubsection between Step 4 and Step 4B. Documents the rule (verify in_progress story commit landed before reasoning about other stories), 3 forbidden idioms ("while that runs", "let me proactively", "let me prepare US-XXX in parallel"), self-recovery action ([ORCH] STALE-DETECT log + reset to current in_progress story's task list), and explicit prose-only enforcement model. Runtime enforcement (parent-side liveness check) queued as v0.6.9 N6-followup. Newtests/test_orchestrator_self_monitor.sh(5 assertions: presence header, >=3 idioms, STALE-DETECT marker, negative-control regex against legitimate cross-story phrasing, positive-control regex match against true drift phrase). - N7 — soliton-finding-triage doc (US-002) — new
references/soliton-finding-triage.md(Workflow / Repro template / Examples) documenting the validate-before-design workflow for sub-threshold (<85)/soliton:pr-reviewfindings between cycles. v0.6.7 G36 documented as worked HALLUCINATION example. Cross-linked from new## Process referencessection in CLAUDE.md. Newtests/test_soliton_triage_doc.sh(5 assertions). - N8 — narrow
extract_function_commentsawk to header range (US-003) —tests/test_audit.sh::extract_function_commentssimplified to emit accumulated header buffer ONLY (noin_bodystate machine). Matches G34 stated scope ("trim PR-metadata bloat from function-HEADER comments"). DISCOVERED: Tests 36b/37b WHY-phrase checks were relying on body comments; SPLIT INTO 2 HELPERS —extract_function_comments(header-only) for Tests 36a/37a (bloat); newextract_function_full_comments(header+body) for Tests 36b/37b (WHY). 45/45 audit assertions preserved. - N9 — wall-clock baselines reference (US-004) — new
references/test-wallclock-baselines.md(6-row platform-conditional table: Git Bash vs Linux/CI for the major test commands). Cross-linked from CLAUDE.md## Process references. Newtests/test_wallclock_baselines_doc.sh(4 assertions). Baseline-drift WARN-test deferred to v0.6.9 N9-followup per design-review. - N10 — compute_risk_score comment correction (US-005) —
lib/deep-review.shcomment block above v0.6.7's G36 defense-in-depth guard rewritten. Removed misleading "Mirrors the structure used in compute_risk_score above" claim (compute_risk_score uses outer SHA-presence gate; should_dispatch_deep_review uses inner files_changed-count gate — different mechanisms, same intent). Comment-only edit. 14/14 dispatch tests preserved. - N11 — orchestrator Step 4B.5 cleanup-line move (US-006) —
agents/orchestrator.mdStep 4B.5 else-branch:rm -f .quantum-feature-diff.patchmoved from start-of-branch to end-of-branch (after the verdict-case block, before closing fi). BLOCKS_MERGE intentional skip-via-exit-1 documented in comment near new location — patch file remains for forensic inspection by the operator triaging the blocked merge. New Test 7 intests/test_deep_review_dispatch.sh(awk-line-numbering assertion: rm-f line index AFTER case-VERDICT line index).
+15 new assertions across 3 new test files + 1 extended:
- 3 new files:
test_orchestrator_self_monitor.sh(5),test_soliton_triage_doc.sh(5),test_wallclock_baselines_doc.sh(4) - 1 extended:
test_deep_review_dispatch.sh14 → 15 (+1: Test 7 N11 cleanup ordering) - 1 modified (assertion count unchanged):
test_audit.sh45 → 45 (awk simplification + helper split)
6/6 user-facing stories shipped first-attempt PASS under manual takeover (parent agent executed all stories — orchestrator subagent's N6 prose guard does not yet apply since the dogfood ran on v0.6.7 master HEAD). 0 retries. Multi-cycle CSV milestone: metrics/pre-impl-review-findings.csv now has 12 rows (3 v0.6.5 + 3 v0.6.6 + 3 v0.6.7 + 3 v0.6.8) — first calibration pass becomes meaningful at v0.7.x with bundle-tier comparison data (33 findings total: 0 critical / 2 high / 9 medium / 22 low). G30 self-validation: v0.6.8's own master..HEAD diff (13 files) classified by its own should_dispatch_deep_review rule as tier=LOW score=25 → skip; decision recorded in quantum.json.reviews[v0.6.8-bundle].deepReview with all 4 required keys. Audit log shows split summary AND pre-impl-review-coverage: 3/3 stages OK (deterministic given v0.6.7 + v0.6.8 rows fall inside the 7d rolling window). 2 mid-cycle design-improvements: US-001 regex form needed [A-Z0-9]+ (not [0-9]+) for placeholder match; US-003 needed a split into 2 helpers (extract_function_comments header-only vs extract_function_full_comments for WHY-checks) to match G34's actual semantics. v0.6.9+ candidates: N6-followup (parent-side liveness check), N13 (references/orchestrator-takeover.md), N9-followup (baseline-drift WARN-test), N12 (helper rename). Full retrospective: idea-stage/PIPELINE_REPORT_v9.md. v0.6.9+ backlog: idea-stage/IDEA_REPORT_v9.md.
6 user-facing changes addressing v0.6.6's IDEA_REPORT_v7 priority list (G35, G36, G37, N1, N2, N5) plus 1 discovered v0.6.6 commit-hygiene fix (c47e038's body comment trimmed). Patch-tier; all changes are bug fixes, doc clarifications, or process-additive (no schema deltas, no breaking changes, no new files). Third multi-cycle populated-CSV run — the v0.6.6 ledger of 6 rows extended to 9 rows from this cycle's planning hooks. First N1-gated dispatch wiring — should_dispatch_deep_review now load-bearing in agents/orchestrator.md Step 4B.5 via explicit if/else containment (pre-N1 the gate was informational; post-N1 the live pipeline runs only on dispatch path). v0.6.7's own diff was self-validated: tier=LOW score=25 files=10 sensitive=0 → skip recorded in quantum.json.reviews[v0.6.7-bundle].deepReview with automated:true.
- G35 —
tests/test_audit.shTest 4 hang fix (US-001) — wraps theout=$(do_audit 2>&1); rc=$?capture in an explicitset +e ... set -eblock so inheritedset -euo pipefail(from sourcing quantum-loop.sh) doesn't propagate into the subshell capture. Runs do_audit against a clean tmp repo (matching Test 5's pattern) for determinism — prevents the test from depending on the developer's working repo state. Bundled inline: v0.6.6 c47e038 regression fix — do_audit body comment had "confidence 95" + "Soliton-pr-review caught at" tripping Test 37a (whose awkextract_function_commentsincludes body comments). Rephrased to load-bearing-WHY only ("mapfile is the right tool because it treats newlines as record separators by default"). 0 assertion delta (45 → 45); was: hang past 30s indefinitely → now: completes in ~3-4min wall-clock (Git Bash subprocess overhead is the new bottleneck, not the test logic). - G36 —
should_dispatch_deep_reviewempty-input guard (US-002) — addsif (( files_changed > 0 ))short-circuit around theprod_countgrep computation inlib/deep-review.sh. Defense-in-depth: the existing regex^(tests?/|$)|...already excludes empty input via the|$alternative (verified empirically — soliton's confidence-82 finding was a false positive). The newtests/test_deep_review_dispatch.shTest 5 fixture (3 assertions: empty-diff exit 1, files=0, score=0) is the durable regression-guard locking in current correct behavior. 8 → 11 dispatch tests. - G37 —
tests/run_all.sh --parallelxargs_rc capture (US-003) — replaces|| truewith|| xargs_rc=$?in the--parallelxargs invocation; ORs(( xargs_rc != 0 ))into OVERALL_RC alongside the existing: 0/N passedgrep check. Catches the missed surface: a worker that exits non-zero AFTER printing a passing "P/N passed" Results line (partial-run-then-crash; post-assertion cleanup-failure).tests/test_run_all.shTest 4 (RED-tested before fix → GREEN after). 9 → 10 run_all tests. - N1 — wire
should_dispatch_deep_reviewinto orchestrator Step 4B.5 control flow (US-004) —agents/orchestrator.mdStep 4B.5 pseudocode block restructured with explicitif/elsecontainment. Skip branch records the deepReview decision via jq AND removes the diff patch file AND falls through to Step 4C. Else branch holds the 7-step dispatch pipeline (score-from-quantum, dispatch-set, prepare-context, agent dispatch, aggregate, persist, verdict-case). Cleanup symmetric across branches. Pre-N1 the dispatch pipeline ran unconditionally; post-N1 it runs only on the dispatch path.tests/test_deep_review_dispatch.shTest 6 (3 structural assertions via awk-bounded code-fence extraction) verifies gate-then-else containment +score-from-quantumlives betweenelseandfi. 11 → 14 dispatch tests. - N2 —
_audit_test_suitesdoc clarification (US-005) —quantum-loop.shfunction-header comment gains a 4-line clarification: helper reads the.omc/phase-*-evidence/LEDGER, not the live test corpus; row answers "did the most recent recorded test run pass?" not "do tests pass right now?"; for live state, runbash tests/run_all.sh. Existing FR-10 fail-when-no-evidence WHY preserved. Comment-only change. Closes the operator-confusion gap from IDEA_REPORT_v7 §N2. - N5 —
quantum.json.codebasePatternsre-seed (US-006) — appended p009/p010/p011 verbatim fromidea-stage/PIPELINE_REPORT_v7.md§ "codebasePatterns harvested" (the committed canonical record from v0.6.6 dogfood). Final state: 8 → 11 patterns. quantum.json is gitignored — committed as audit-trail marker only. IDEA_REPORT_v8 § "Persistent canon" cross-links to PIPELINE_REPORT_v7 as the durable single source-of-truth across cycles.
+7 new assertions across 3 extended test files. Zero new test files; zero regressions:
- 3 extended:
test_audit.sh45 → 45 (Test 4 wrapper change; no count delta),test_deep_review_dispatch.sh8 → 14 (+6: Test 5 + Test 6 add 3 each),test_run_all.sh9 → 10 (+1: Test 4)
7/7 user-facing stories shipped first-attempt PASS (5 stories Wave 0 + 1 story Wave 1 [US-004 deps US-002] + 1 story Wave 2 [US-007 deps all]). 0 retries. Multi-cycle CSV milestone: metrics/pre-impl-review-findings.csv now has 9 rows (3 v0.6.5 + 3 v0.6.6 + 3 v0.6.7) — calibration histograms become meaningful at v0.7.x with 1-3 more populated releases (current 9-row baseline shows 0 critical / 2 high / 7 medium / 16 low across 25 findings; LOW dominates at 64%). G30 self-validation: v0.6.7's own master..HEAD diff (10 files) classified by its own should_dispatch_deep_review rule as tier=LOW score=25 → skip; decision recorded in quantum.json.reviews[v0.6.7-bundle].deepReview with all 4 required keys. Audit log shows split summary AND pre-impl-review-coverage: <N>/3 stages row. 3 mid-cycle discoveries: (a) v0.6.6 c47e038 regression in do_audit body comment (fixed inline in US-001), (b) soliton G36 finding was a false positive (defensive guard + regression-guard test still shipped), (c) orchestrator subagent context-drift mid-task (parent agent took over manually for stories US-001 through US-007 — first-attempt-PASS pattern preserved despite agent failure). v0.6.8+ candidates: N6 (orchestrator stale-detection heuristic), N7 (soliton-validate-before-design process), N8 (Test 37a scope decision), N9 (wall-clock target calibration). Full retrospective: idea-stage/PIPELINE_REPORT_v8.md. v0.6.8+ backlog: idea-stage/IDEA_REPORT_v8.md.
6 user-facing changes addressing v0.6.5's IDEA_REPORT_v6 priority list (G30, G31, G32, G33, G34, p008-driven test-helper audit). Patch-tier; all changes are cleanup or additive (no schema deltas, no breaking changes). Second multi-cycle populated-CSV run — the v0.6.5 ledger of 3 rows extended to 6 rows from this cycle's planning hooks. Calibration histograms become possible at v0.7.x with 3-5 more populated runs. First release with auto-gated ql-deep-review dispatch — should_dispatch_deep_review is the canonical decision rule going forward; v0.6.6's own diff was self-validated against the new rule (tier=LOW score=25 → skip recorded in quantum.json.reviews).
- G32 —
agents/spec-reviewer.mdnegative-assertion regression guard (US-001) —tests/test_sprint_contract.shTest 7i asserts 0 uncited inline regex enumerations inagents/spec-reviewer.md(the inline regex shape(test_|\.test\.excluding the citation line). Regression-guard property verified manually: injecting an inline copy fails the test; reverting passes. 1 new assertion (24 → 25). - G33 —
tests/test_audit.shTests 34/35 invoke realdo_auditviaQL_AUDIT_TEST_ROWS(US-002) —quantum-loop.sh::do_auditnow honorsQL_AUDIT_TEST_ROWSenv var (gated onQL_AUDIT_TEST_MODE=1) for synthetic newline-delimited ROWS injection. Tests 34/35 rewritten to subprocess-invokebash quantum-loop.sh --auditinstead of inlining a private_t34_run/_t35_runre-implementation. Test-mode guard tightened to require$#==0so--auditsubprocesses bypass the source-mode short-circuit. -23 net lines intests/test_audit.sh. Tests 34/35 now catch any future regression indo_audit's case-pattern switch. - G34 — trim PR-metadata bloat from
quantum-loop.shfunction comments (US-003) —_audit_format_rowinline comment +do_auditfunction-header trimmed to load-bearing WHY ("FAIL OR WARN because both signal something the operator should see"; "split counters by status because a single combined counter would silently treat WARN as on-target"). Removed:confidence,soliton-pr-review,v0.6.5 post-merge,G18/G29/G33tags, version-tag headers, before-and-after summary string. 4 new meta-assertions intests/test_audit.sh(Tests 36a/b + 37a/b) using awk-based function-comment-range extraction enforce 0 bloat strings + ≥1 WHY phrase per function. 41 → 45 audit assertions. - G30 —
lib/deep-review.sh::should_dispatch_deep_review+agents/orchestrator.mdStep 4B.5 documentation (US-004) — newshould_dispatch_deep_review(diff_path)helper returns 0 (dispatch) on tier ≥ MEDIUM, 1 (skip) on LOW. HonorsQL_DEEP_REVIEW=force(always dispatch) andQL_DEEP_REVIEW=skip(always skip) env-var overrides. Diff-path entry-point lets retrospective callers invoke without live SHAs (US-007 self-validation depends on this). Orchestrator Step 4B.5 documents the gate with override-syntax table. 8 new assertions intests/test_deep_review_dispatch.sh(NEW) covering 4 fixture cases × 2 assertions each. - G31 —
tests/run_all.shrunner with--quickand--parallel Nmodes (US-005) — new test-suite runner. Default: sequential;--quick: filter togit diff master..HEAD --name-only -- 'tests/test_*.sh';--parallel N(default N=4): xargs -P dispatch via private--__oneself-recursive entry-point (sidesteps export-f portability quirks on MSYS/Git Bash); combined--quick --parallel N. Per-file output formattests/test_<name>.sh: <P>/<T> passed. Exit 0 iff all PASS; 1 iff any FAIL. xargs -P fallback to sequential when unavailable. PARALLEL_UNSAFE allowlist convention documented (currently empty). 9 new assertions intests/test_run_all.sh(NEW) covering all 4 modes via 3-test fixture. - p008 —
tests/test_test_helpers.shtest-helper audit (US-006) — new audit asserts everytests/test_*.shfile uses one of four safe sourced-script-errexit patterns: A (function-extracted subshell + two-invocation idiom), B (|| trueafter substitution), C (enclosingset +e...set -eblock), or D (file does not enableset -e— the hazard is errexit-specific). Opt-out via# pragma test-helper-audit: opt-out (rationale: ...)with rationale enforcement. Per-file PASS/FAIL withfile:linecitation for any unsafe substitution. 9 new assertions; 0 unsafe substitutions across all 76 test files in current corpus; 0 opt-outs needed (target met).
+30 new assertions across 6 test files (1 modified + 5 NEW, including the runner from G31 and the audit from p008). Zero regressions:
- 4 new files:
test_deep_review_dispatch.sh(8),test_run_all.sh(9),test_test_helpers.sh(9),tests/run_all.sh(runner; not a test file) - 2 extended:
test_sprint_contract.sh24 → 25 (+1: Test 7i G32 negative assertion),test_audit.sh41 → 45 (+4: Tests 36a/b + 37a/b G34 meta-assertions)
7/7 user-facing stories shipped first-attempt PASS across 3 waves (Wave 0: 5 sequential — US-001/US-002/US-004/US-005/US-006; Wave 1: 1 sequential — US-003 depends on US-002; Wave 2: 1 sequential — US-007 retrospective). 0 retries, 0 cross-story contract violations, 0 merge conflicts. Multi-cycle CSV milestone: metrics/pre-impl-review-findings.csv now has 6 rows (3 v0.6.5 + 3 v0.6.6) — calibration histograms feasible at v0.7.x. G30 self-validation: v0.6.6's own master..HEAD diff (12 files, 1663 lines) classified by its own should_dispatch_deep_review rule as tier=LOW score=25 → skip; decision recorded in quantum.json.reviews[v0.6.6-bundle].deepReview with all 4 required keys (tier/decision/rationale/automated). Audit log shows split summary 7/7 OK, 0 WARN, 0 FAIL AND pre-impl-review-coverage: 3/3 stages OK (deterministic given v0.6.5 + v0.6.6 rows fall inside the 7d rolling window). Full retrospective: idea-stage/PIPELINE_REPORT_v7.md. v0.6.7+ backlog: idea-stage/IDEA_REPORT_v7.md.
6 user-facing changes addressing v0.6.4's IDEA_REPORT_v5 priority list (G18, G20, G25, G26, G27, G28). Patch-tier; all changes are doc / drill-text / agent-prose / new-doc additions. No breaking changes; no schema changes; no new lib code. First release whose dogfood produced real metrics/pre-impl-review-findings.csv data — 3 rows representing all 3 advisory pre-impl-review stages on this v0.6.5 planning cycle. Closes IDEA_REPORT_v5 §G18/G20/G25/G26/G27/G28; unblocks G22 (severity-rubric calibration) for v0.7.x retrospectives. The orchestrator that drove this dogfood ran on v0.6.4 master HEAD semantics — see README.md ## Self-modifying execution (NEW in this release) for the recurring caveat.
do_auditsummary split into OK / WARN / FAIL counters (US-001, G26) —quantum-loop.sh::do_auditsummary line now readsSummary: <ok>/<total> OK, <warn> WARN, <fail> FAIL.instead ofSummary: <ok>/<total> metrics on target.which silently counted WARN rows as on-target. Thecase-style row scan accumulates 3 counters in a single pass; exit-code semantics preserved (returns 1 iff any row contains|FAIL|; WARN rows do not trip exit). Fixes the misleading wording exposed by v0.6.4's introduction of the first WARN-capable metric (pre-impl-review-coverage). 4 new test assertions intests/test_audit.sh(Tests 33-35 split-summary fixtures + Test 23 fixed-format update); existing 35 audit assertions preserved.agents/spec-reviewer.mdplan-review checklist cites SPRINT_CONTRACT_TEST_REGEX (US-002, G27) — closes G14's 5th call site that v0.6.4 missed. The plan-review checklist's "testFirst command consistency" rule now referenceslib/handoff.sh::SPRINT_CONTRACT_TEST_REGEXas the canonical pattern source. Inline regex characters preserved with explicit "see ..." framing for self-documentation.lib/handoff.shconsumer comment updated 4 → 5 sites. 1 new test assertion (sub-test 7h intests/test_sprint_contract.sh).agents/conflict-auditor.mdStep 4 sort enumerates all 5 severities (US-003, G25) — sort instruction now reads "high → medium → low → warning → none" with a 1-sentence rationale (high most-impactful; warning informational-but-real-signal; none informational-only). Closes the gap opened by v0.7.0 G15 (warning) and Rule 0 (none) which had no deterministic ordering before. Pure documentation-only fix. 1 new test assertion (Test 7 intests/test_changelog_ownership.sh).references/risk-mitigation-language.md(US-004, G28) — NEW 153-line design-craft checklist for Risk-section authors. 3 sections: (1) the rule (enumerate operations, not just techniques) + cautionary tale citing v0.6.4 commitc89ba13and the soliton finding at confidence 90 that surfaced the flock-bootstrap-race; (2) 4-item concurrency checklist (shared mutable state / observably-coupled operations / race window without mitigation / race window with mitigation, each with concrete interleaving examples); (3) short-form patterns for non-concurrency mitigations (validate input X / handle malformed Y / atomic update Z). Bidirectional cross-link withreferences/finding-severity.md(review-craft pair). Operationalizes codebasePattern p007 from v0.6.4. 10 new test assertions intests/test_risk_mitigation_language.sh(NEW)._audit_pre_impl_review_coveragemissing-csv drill text gains operator guidance (US-005, G18) — drill text now reads "missing-csv — no metrics/pre-impl-review-findings.csv yet (expected on first run after install — invoke /ql-brainstorm/spec/plan to populate)". Other WARN states (no-recent-runs,partial-coverage) keep their existing drill messages. Closes the v0.6.4-flagged operator-confusion gap. One follow-up surfaced:_audit_format_rowonly renders drills for FAIL rows, so the new guidance is captured in tests but not yet visible at runtime; queued as v0.6.6 G29. 1 new test assertion (Test 28b intests/test_audit.sh).README.md ## Self-modifying executionsection (US-006, G20) — NEW 2-paragraph section between How It Works and Quick Start. Explains: each release ships a bundle that modifies the very orchestrator/agent/skill prompts the orchestrator itself uses, so each dogfood runs on the PREVIOUS release's prompt semantics, and this release's wires only apply to runs starting AFTER the bundle merges to master. Concrete v0.6.4 example (CSV-persistence wires applied to v0.6.5+ runs, not the v0.6.4 dogfood that introduced them). Cross-link toidea-stage/PIPELINE_REPORT_v5.md. Closes the recurring "is the audit broken?" reaction for new operators surfaced in PIPELINE_REPORT_v5 and the v0.6.5 PRD-review missing-measurement finding. 7 new test assertions intests/test_readme.sh(NEW).
+23 new assertions across 5 test files. Zero regressions in pre-existing suites (apart from a single pre-existing flake in tests/test_timeout.sh "Agent A still running after B killed" that is byte-identical to master and unrelated to any v0.6.5 change):
- 2 new files:
test_risk_mitigation_language.sh(10),test_readme.sh(7) - 3 extended:
test_audit.sh35 → 39 (+4: Tests 33-35 split-summary + Test 28b drill-substring),test_sprint_contract.sh23 → 24 (+1: sub-test 7h),test_changelog_ownership.sh9 → 10 (+1: Test 7)
6/6 user-facing stories shipped first-attempt PASS across 2 waves (5 sequential + 1 sequential). 0 retries, 0 cross-story contract violations, 0 merge conflicts. Total wall-clock to Wave 1 completion: ~85 minutes across 2 orchestrator instances (Run 1 hit iteration cap mid-US-001 with T-001 RED tests in place; Run 2 resumed by fixing one stale grep assertion and proceeded through US-001..US-006 + US-007 retrospective without further interruption). First populated-CSV milestone: 3 rows in metrics/pre-impl-review-findings.csv (design + prd + plan from this cycle's planning) — the first release whose dogfood produced real CSV data via manual hook invocation pre-execution. Audit log shows the new G26 split summary AND pre-impl-review-coverage: 3/3 stages OK (was 0/3 stages WARN missing-csv in v0.6.4 retrospectives). 1 new codebasePattern harvested (p008: sourced-script errexit propagation in test files; mitigation via function-extraction + two-invocation idiom). The G18-G28 cluster from IDEA_REPORT_v5 is now fully closed. Full retrospective: idea-stage/PIPELINE_REPORT_v6.md. v0.6.6+ backlog: idea-stage/IDEA_REPORT_v6.md.
6 user-facing changes maturing v0.6.3's advisory pre-impl-reviewers via instrumentation (G12-G17). All mechanisms additive/opt-in; no breaking changes — patch-tier per strict semver (no API/schema breaks; advisory mechanisms remain advisory; retrospective artifacts use the planning-time v0.7.0 label but the release ships as v0.6.4). Closes IDEA_REPORT_v4 §G12/G13/G14/G15/G16/G17. v0.6.5's first run will be the first end-to-end populated metrics/pre-impl-review-findings.csv — this v0.6.4 dogfood ran on v0.6.3 master HEAD (self-modifying caveat) so the new persistence wires apply only to NEXT runs. Promotion of any pre-impl-review stage from advisory → blocking should wait until ≥1 release of CSV baseline data accumulates.
lib/finding-synth.sh(US-001, G12) — pre-impl-review FINDING-block parser.parse_findings(stage)reads stdin, accumulates lines betweenFINDING_START/FINDING_ENDmarkers, parses key:value pairs (category/severity/file/line/evidence/suggestion), emits a structured JSON array.summarize_findings(stage, findings_json)returns{stage,count,by_severity,by_category}.format_summary_line(summary_json)emits the[REVIEW] <stage>-review complete: <N> findings (<crit>/<high>/<med>/<low>)line. CLI subcommand mode supported. Malformed blocks warn-and-drop; remaining well-formed blocks parse. Library follows lib/handoff.sh's no-flags-at-source-time + double-source-guard convention. 31 new test assertions.lib/finding-persist.sh+ 3 SKILL wires (US-002, G13) — persistence layer for parsed findings.persist_review_findings(stage, source_path, summary_json, findings_json)writes (a).handoffs/<stage>-review-findings.jsonper-run snapshot (idempotent overwrite per stage) and (b) appends a row to the aggregatemetrics/pre-impl-review-findings.csvledger (header on first write, thenflock -xguarded append; rename-replace fallback on systems withoutflock).read_review_findings(stage)round-trips snapshot or emits{}+ stderr WARN on missing-file.skills/ql-brainstorm/SKILL.mdPhase 4d,skills/ql-spec/SKILL.mdpost-prd-review, andskills/ql-plan/SKILL.mdStep 9 each gain a wrapper that captures reviewer stderr → parses → summarizes → persists. Advisory contract preserved across all 3 stages — wrappers never abort..gitignoreadds.handoffs/*-review-findings.json(per-run snapshots are runtime state);metrics/remains tracked (CSV ledger is committed baseline data). 37 new test assertions.lib/handoff.sh::SPRINT_CONTRACT_TEST_REGEXconstant (US-003, G14) — DRY refactor: the test-pattern regex'(test_|\.test\.|spec|pytest|^bash tests/|^npm test)'previously inlined in 4 places (orchestrator.md Step 2.5, ql-plan SKILL.md Step 8, test_sprint_contract.sh, test_sprint_contract_ql_plan.sh) is now a single readonly shell constant near the top oflib/handoff.sh. All 4 consumers source the lib + pass viajq --arg pattern.references/sprint-contract.mdreferences the constant by name. Future regex changes touch one file. Pure refactor — same behavior, single source of truth. +2 no-inline-regex assertions in test_sprint_contract.sh; existing 16 + 13 baseline assertions preserved.- CHANGELOG ownership convention (US-006, G15) —
agents/dag-validator.md§5d documents: stories that touchCHANGELOG.mdSHOULD defer to a single retrospective story per release. If >1 story hasCHANGELOG.mdintasks[].filePaths, the conflict-auditor MUST classify the conflict asseverity: warning(notseverity: none) and emit a Health Report line:WARNING: <N> stories touch CHANGELOG.md — consolidate to a single retrospective story per the v0.6.3 convention.agents/conflict-auditor.mdRule 0.5 codifies the per-file override (positioned between universal-pre-empt Rule 0 and category Rule 1). 9 new assertions intests/test_changelog_ownership.sh. references/finding-severity.mdrubric (US-004, G16) — new severity-calibration document with 3 mode sections (## design-review,## prd-review,## plan-review), each with a 4-row Severity / Rubric / Example table calibrated to the mode's existing checklist categories.agents/spec-reviewer.mddesign-review / prd-review / plan-review sections each gain a single cross-link line above their### Output formatsubsection (kebab-case anchor IDs). Helps operators distinguish "critical for design-review" from "critical for quality-reviewer downstream" — different bars. 14 new assertions intests/test_finding_severity.sh.quantum-loop.sh --audit pre-impl-review-coveragemetric (US-005, G17) — 7th audit row tracks how many of the 3 advisory pre-impl-review stages have run in the last 7 days. 4 states:missing-csvWARN (no CSV),no-recent-runsWARN (CSV exists but all rows >7d old),partial-coverageWARN (1-2/3 stages recent),full-coverageOK (3/3 stages recent). WARN never trips audit exit code — the metric measures operator pipeline-engagement, not codebase health. Cross-platform date math: GNUdate -d '7 days ago'first, BSDdate -v-7dfallback, epoch-0 fallback otherwise (degrades to "all rows count as recent" rather than crash).quantum-loop.ps1documents the PS1/SH parity divergence in its.DESCRIPTIONblock —--auditremains bash-only per AC. 6 new assertions intests/test_audit.sh(Tests 28-33); existing 30 baseline assertions preserved.
+97 new assertions across 7 test files. Zero regressions:
- 4 new files:
test_finding_synth.sh(31),test_finding_persist.sh(37),test_finding_severity.sh(14),test_changelog_ownership.sh(9) - 3 extended:
test_sprint_contract.sh16 → 18 (+2 no-inline-regex),test_sprint_contract_ql_plan.sh13 → 13 (rewired, count unchanged),test_audit.sh30 → 36 (+6 G17 4-state coverage)
7/7 user-facing stories shipped first-attempt PASS across 4 waves (4 parallel + 1 + 1 + retrospective). 0 retries, 0 cross-story contract violations, 0 merge conflicts. The new G15 CHANGELOG-ownership convention validated itself: only US-007 touches CHANGELOG.md (count=1, no warning emitted). G14's SPRINT_CONTRACT_TEST_REGEX constant exercised by both test_sprint_contract.sh and test_sprint_contract_ql_plan.sh — both green via the shared lib source. 1 new codebasePattern harvested (p006: single source of truth for shell constants). The execution itself ran on v0.6.3 master HEAD semantics — v0.6.4's persistence wires apply to v0.6.5+ runs only (self-modifying-orchestrator caveat). Plus 3 post-merge fixes (commit c89ba13, /soliton:pr-review-driven) addressing race conditions in lib/finding-persist.sh: atomic header bootstrap inside flock, fallback path guards, jq-failure abort. 44/44 tests GREEN (37 baseline + 7 new). Full retrospective: idea-stage/PIPELINE_REPORT_v5.md. v0.6.5+ backlog: idea-stage/IDEA_REPORT_v5.md.
8 user-facing changes covering G-track cleanup (G2/G3/G8/G9/G11) + 3-stage pre-impl spec-review (P5.B4 design / PRD / plan exits, advisory-only). Patch-tier; new mechanisms are opt-out via QL_SKIP_PRE_IMPL_REVIEW env var. No breaking changes. Closes IDEA_REPORT_v3 §G2/G3/G8/G9/G11 + §P5.B4 (expanded to 3 stages).
lib/api-rename.sh(US-005, G2) — symbol-migration helper.find_rename_targets <old> <new> [--exclude <glob>]emits<file>:<line>:<context>for every occurrence (code AND doc-comments);validate_rename_complete <old> [--exclude <glob>]exits 0 iff no occurrences remain. Addresses the v0.6.0 US-001 dogfood where thekill_agent_process → reap_agentrename initially missed line-4 module-header doc-comments.- Sprint-Contract
expectedTestsfilter +otherCommandsschema split (US-002, G9) — orchestrator Step 2.5 jq splits.commandsby test-pattern regex (test_|\.test\.|spec|pytest|^bash tests/|^npm test); test-pattern matches →expectedTests, rest → new siblingotherCommands. Backward-compat: existing readers ignore the new optional field. /ql-planexit writes Sprint-Contract per story (US-004, G3) —skills/ql-plan/SKILL.mdStep 8 iterates stories and callswrite_sprint_contract, materializing.handoffs/sprint-<storyId>.jsonfiles at plan time instead of lazily during orchestrator's first run. Idempotent on re-run.write_routing_snapshotcanonicalization (US-003, G11) —lib/runner.sh::write_routing_snapshotnow composes withlib/json-atomic.sh::write_quantum_jsonfor atomic write + JSON-validation gate, replacing its inlinejq > tmp && mvpattern. Inheritscleanup_stale_tmpcoordination consistently with all other quantum.json writers.- Critic fallback unification (US-001, G8) — deleted dead
quantum-loop.sh::parse_critic_argshim;lib/runner.sh::_availability_checknow role-aware:criticfalls back tonone(preserving US-002's "downgrade-not-substitute" intent),planner/executorfall back toclaude. PowerShell already atnonefor critic. Operator-visible behavior change:--critic=codexwith codex absent now produces critic disabled (wasclaudesince v0.6.0). spec-reviewerdesign-review mode (US-006, P5.B4-design) — new## Mode: design-reviewsection inagents/spec-reviewer.md+ Phase 4d hook inskills/ql-brainstorm/SKILL.md. Reads the just-saved design doc and reports structural gaps (missing sections, TBD/FIXME markers, hedge phrases, missing non-goals). Advisory: emits to stderr; does not abort the skill. Opt out viaQL_SKIP_PRE_IMPL_REVIEW=design.spec-reviewerprd-review mode (US-007, P5.B4-PRD) — new## Mode: prd-reviewsection + post-exit hook inskills/ql-spec/SKILL.md. Reads the just-saved PRD and reports non-testable ACs, vague FRs, missing measurement methods. Advisory. Opt out viaQL_SKIP_PRE_IMPL_REVIEW=prd(or comma-combined, e.g.design,prd).spec-reviewerplan-review mode (US-008, P5.B4-plan) — new## Mode: plan-reviewsection + Step 9 hook inskills/ql-plan/SKILL.md(after dag-validator + US-004 sprint-contract write). Cross-references quantum.json against the PRD; reports AC coverage gaps, command-test mismatches, missing wiring tasks. Advisory. Opt out viaQL_SKIP_PRE_IMPL_REVIEW=plan.
+67 new assertions across 8 test files. Zero regressions:
- 5 new files:
test_api_rename.sh(14),test_spec_review_design.sh(14),test_spec_review_prd.sh(13),test_spec_review_plan.sh(13),test_sprint_contract_ql_plan.sh(13) - 3 extended:
test_cross_provider_critic_flag.sh13 → 20 (+7),test_sprint_contract.sh11 → 16 (+5),test_per_role_routing.sh26 → 29 (+3 G11 composition + validation-gate)
8/8 user-facing stories shipped first-attempt PASS across 4 waves (5 + 2 + 1 + retrospective). Total wall-clock to wave-2 completion: ~34 minutes. Zero retries, zero cross-story contract violations, zero merge conflicts. Rule 0 fileConflicts severity=none classification held perfectly across all 4 conflicts. Per-story two-stage review gate applied; cross-story constant scan + typecheck + full test suite + barrel/dep-manifest regen ran at each wave boundary. 5 new codebasePatterns harvested (opt-out env-var pattern; RED-test-first when refactoring with validation gates; backward-compat schema additions default to empty array; inline-vs-subshell arg-parser globals; defensive \r stripping for heredoc-fed JSON on Git Bash). Full retrospective: idea-stage/PIPELINE_REPORT_v4.md. v0.7.0 backlog: idea-stage/IDEA_REPORT_v4.md.
Two follow-up items from the v0.6.x cycle's /soliton:pr-review post-merge findings (IDEA_REPORT_v3 §G10 + score-100 .cursor-plugin gap):
- G10 —
lib/json-atomic.shPRD-sha migration shim (compute_prd_sha_legacy+verify_prd_sha). Before v0.6.1,compute_prd_shadid not normalize CRLF; Windows users withautocrlf=truewho ran/ql-planunder v0.6.0 stored CRLF-era hashes. After upgrading to v0.6.1, the same PRD now hashes differently — every story'sprdShawould be markedstaleand force a full/ql-planre-run. v0.6.2 adds a transparent migration:verify_prd_shatries the new (LF-normalized) hash first; on mismatch, falls back to the legacy (v0.6.0) hash; if THAT matches, the orchestrator's Step 1.1 updates the stored value in-place with a single one-line "MIGRATE" log message and no re-plan. Real drift still marks the story stale as before. 3 new tests added (match / migrate / drift paths) + 1 orchestrator-wiring test. .cursor-plugin/plugin.jsonversion catch-up bump 0.5.1 → 0.6.2 — the v0.6.0 + v0.6.1 cycle bumped.claude-plugin/plugin.jsonand.claude-plugin/marketplace.jsonbut the Cursor manifest was missed. PR #58 established the convention of moving all three plugin manifests in lockstep on every release;/soliton:pr-reviewflagged this at score 100 on PR #61 but the fix landed here. Cursor marketplace consumers will now see v0.6.2 in sync with the Claude side.
The migration path is transparent: existing v0.6.0/v0.6.1 quantum.json files load and run without operator action. The first orchestrator pass after upgrade silently rewrites legacy prdSha values to the new format. Stories without a prdSha field continue to behave exactly as before (back-compat warning logged once).
Three correctness bugs surfaced by soliton:pr-review on the v0.6.0 bundle (1 at-threshold + 2 below-threshold; all confirmed as real cross-environment hazards rather than nits):
lib/runner.sh:write_routing_snapshotorphan.tmpcleanup — the inlinejq ... > "$qj.tmp" && mv ...pattern left a 0-byte$qj.tmpon jq failure, inconsistent withlib/json-atomic.sh:write_quantum_json's canonicalrm -f "$tmp_path"failure-branch cleanup. Now the function captures the jq exit code and removes the tmp file before propagating, mirroring the canonical pattern.lib/json-atomic.sh:compute_prd_shaCRLF cross-platform sha mismatch — therstrip(b' \t\n\r')only stripped trailing whitespace; on Windows withautocrlf=true, internal\r\nbytes throughout a CRLF-checked-out PRD were retained, producing a different sha256 than the same file on Linux/LF. Every story'sprdShawould false-positive asstatus: "stale"in cross-platform setups. Now normalizes\r\n→\nbefore hashing.quantum-loop.sh--critic/--planner/--executorspace-form$2guards — underset -euo pipefail, a trailing flag (e.g.--criticat end-of-args) crashed with bash'sunbound variableerror; a flag-following pattern (e.g.--critic --parallel) consumed the next flag as a value and emitted a generic enum error. Each space-form now guards[[ $# -lt 2 || "${2:-}" == --* ]]and emits a user-friendlyError: --<flag> requires a value (...)exit-2 message.
No new tests required — these fixes are tightening existing code paths exercised by tests/test_per_role_routing.sh (26), tests/test_prd_hash_pinning.sh (12), and tests/test_runner_integration.sh baseline. Test count and assertion totals unchanged from v0.6.0.
P5.A cleanup bundle (8 items) + P5.B1 per-role provider routing + P5.Z1 dogfood retrospective. Bigger dogfood than v0.5.1's --audit (10 stories, 5 waves, multi-runner dispatch). Closes P2.9 fully via OMC v4.12 mechanism port.
agents/orchestrator.mdStep 3B.3 watchdog wiring (US-001) — 3 explicit calls (poll, circuit, reset on STORY_PASSED) with reap_agent migration for platform-aware kills via taskkill on Windows.--critic=auto|codex|gemini|claude|none(US-002) — operator-facing critic provider flag with availability detection and fallback (subsumed by --planner/--critic/--executor in US-009).lib/deslop.shregex fallback (US-003) — when knip/ts-prune/vulture/cargo-udeps/staticcheck are absent, dispatches tolib/dead-code.shwith normalized{file, line, kind, severity}schema.- 5 new runner manifests (US-004) —
runners/{opencode,devin,kiro,goose,cline}.json, allexperimental: true.opencode.jsonincludes skill_discovery_paths quirk for Superpowers v5 plugin pattern compatibility. prdShafield per story (US-005) — RAGShield Level-1 drift detection (arXiv:2604.00387).lib/json-atomic.sh:compute_prd_shaproduces a stable sha256; orchestrator Step 1.1 hash-check marks mismatched storiesstatus: "stale"for re-plan.- Sprint-Contract handoff (US-006) — per-story
.handoffs/sprint-<storyId>.jsonwritten by/ql-planand consumed by/ql-execute+/ql-review. Mirrors Anthropic's 2026-03-24 Generator-Evaluator contract pattern. Schema documented inreferences/sprint-contract.md. - Inline self-review checklists (US-007) —
[INLINE-REVIEW] typecheck OK / lint OK / all assigned tests pass / file-org follows project conventionstokens in implementer prompt before STORY_PASSED. Subagent dispatch reserved for adversarial review (cross-story conflict, intent drift, security). 25min -> 30s on routine path per Superpowers v5.0.6. complexityfield per story +runner_select_model(US-008) — formulamin(100, task_count*10 + dependsOn_depth*15 + (security_tag ? 30 : 0) + filePaths_count*2). Routes <=30 -> haiku, 31-60 -> sonnet, 61+ -> opus. Story-levelmodel:'<override>'wins.--planner / --critic / --executorper-role routing (US-009) — ports OMC v4.12 mechanism.lib/runner.sh:resolve_routingresolves each role with availability check + fallback to claude. Snapshot persisted toquantum.json.routingfor replay determinism. Closes P2.9 fully.idea-stage/PIPELINE_REPORT_v3.md+idea-stage/IDEA_REPORT_v3.md(US-010) — v0.6.0 dogfood findings: 9/9 user-facing stories first-attempt PASS across 5 waves (5 parallel + 2 parallel + 3 sequential). 5 NEW codebasePatterns logged. P5.B2-B5 + P5.C frontier remain open for v0.7+. Test-suite delta: ~+110 new assertions, zero regressions.
110+ new assertions across 8 new test files. Zero regressions in pre-existing suites:
tests/test_watchdog_wiring.sh(9),tests/test_cross_provider_critic_flag.sh(13),tests/test_deslop_regex_fallback.sh(7)tests/test_runner_manifests.shextended +28 assertions,tests/test_complexity_routing.sh(19)tests/test_prd_hash_pinning.sh(12),tests/test_sprint_contract.sh(11)tests/test_per_role_routing.sh(26),tests/test_per_role_routing_integration.sh(6)tests/test_orchestrator_wiring.shextended +7 assertions for inline-checklist tokens
The pipeline executed its largest fan-out yet: 5-story parallel wave-0 with worktree isolation, zero file-conflict resolution failures. The DAG validator's Rule 0 fileConflicts severity=none classification held perfectly across all 10 conflicts. 5 NEW codebasePatterns surfaced (cross-module rename doc-comment scanning, jq validator gaps, PATH manipulation in tests, test-guard carve-outs, set -uo pipefail return-1 termination). All retrospective material captured in idea-stage/PIPELINE_REPORT_v3.md + idea-stage/IDEA_REPORT_v3.md.
quantum-loop.sh --audit(#56) — read-only repo-hygiene check that prints the six IDEA_REPORT §6 measurement metrics with drill-down on failures. Exit 0 all-OK, exit 1 any off-target, exit 2 misuse. Env-tunable thresholds viaQL_AUDIT_BRANCH_MAX/QL_AUDIT_ORPHAN_MAX/QL_AUDIT_CONFLICT_MAX/QL_AUDIT_CPC_MAX.docs/plans/2026-04-24-audit-flag-design.md+tasks/prd-audit-flag.md(#57) — pipeline artifacts from dogfooding/ql-brainstorm+/ql-specon the audit feature. Preserves the IDEA_REPORT §6 → design → PRD → shipped-feature traceback.
First real pipeline self-use. The /ql-brainstorm → /ql-spec → /ql-plan → /ql-execute cycle drove a complete 4-story feature end-to-end. Findings captured in #56 PR body for follow-up skill refinements (question-count rigidity in ql-spec, placeholder-drift across stories, file-conflict serialization heuristics).
All ten wedges from idea-stage/IDEA_REPORT.md §P3 landed with a consistent contract pattern (no shell flags at source time, CLI block enables strict mode, env-var tunables, readonly arrays guarded against re-source).
lib/constitution.sh(P3.11, arXiv:2602.02584) — regex-based invariants on generated code: hardcoded-secret scan, SQL-injection pattern, input-validation presence, immutable-schema rule.lib/deep-review.shfar_filter(P3.3/P3.4, arXiv:2505.17928 + arXiv:2604.03196) — KBI→FAR reviewer split with agreement boost, confidence cutoff, known-false-positive regex suppression.lib/trajectory.sh(P3.5, arXiv:2511.00197) — tool-shape thrashing detection:parse_trajectory/classify_trajectory(productive | searching | thrashing | stuck) /should_early_kill.lib/hyclone.sh(P3.7, arXiv:2508.01357) — Stage-1 semantic-clone fingerprint: alpha-normalize + sha256 +find_clonesgrouping.lib/conflict-grade.sh(P3.2, ConGra arXiv:2409.14121) — per-hunk conflict severity grading 1-5 + routing toauto-git | diff3 | llm-merge | escalate.lib/tracecoder.sh(P3.8, arXiv:2602.06875) — Observe-Analyze-Repair primitives:observe/extract_error_markers/build_analysis_context/should_repair.lib/reground.sh(P3.9, arXiv:2603.00492) — session-level drift mitigation: re-inject PRD + progress + iron-law reminder every N stories.lib/skeleton.sh(P3.1 SSAT, arXiv:2303.06689) — signature-level API surface:extract_skeleton/skeleton_text/skeleton_diffacross TS/JS/Python/Go/Rust.lib/intent-graph.sh(P3.6, arXiv:2604.11209) — formal semantic-intent extraction:(verb, object)triples from stories + code with bidirectional drift reporting.lib/dead-code.sh(P3.10, arXiv:2604.07291) — regex-based unused-import + unused-private-helper detection across TS/JS/Python/Go/Rust.
Every new lib wired into the orchestrator via grep-assertion-covered integration points. test_orchestrator_wiring.sh grew from 40 → 106 assertions, none can silently unwire a lib.
- Step 1C (reground) — periodic re-grounding, gate on
REGROUND_INTERVALstories. - Step 3A.1 sub-5 (skeleton) — pre-task API-surface preview.
- Step 3A.3 (tracecoder) — Observe-Analyze-Repair wrapper on typecheck/lint/test gates.
- Step 3A.5C (dead-code) — post-generation advisory unused-import/private scan.
- Step 3A.5D (intent-graph) — post-generation advisory verb-object drift check.
- Step 3A.5E (skeleton) — post-task skeleton-diff drift report.
- Step 3A.6 (trailers) — advisory trailers appended to commit message for durability in
git log. - Step 3B.3 (trajectory) — monitor-loop tick alongside watchdog; kill path via
reap_agent. - Step 3C.NEG0 (hyclone) — wave-boundary cross-story clone detection.
lib/merge-strategy.sh(conflict-grade) — grade 5 short-circuits to escalation; grades 1-4 logged alongside category routing.
- String-unsafe comment stripper in
lib/hyclone.shandlib/conflict-grade.sh— awk passes now track string state so//,/*,#inside a string literal are preserved verbatim (e.g.,"http://x","/regex/","/* not a comment */"). - Trajectory wiring log-path mismatch — orchestrator now reads
.ql-wt/$sid/.ql-agent-output.txt(spawn.sh convention) instead of the non-existent.ql-wt/$sid/agent.log. - TraceCoder wiring pseudocode — removed undefined
GATE_CMD[$gate]/mark_story_failed/apply_focused_fixidentifiers; replaced with prose agent-action comments matching the pattern used elsewhere (watchdog mark-failed, etc.). - test_typecheck_gate 12 failures — added
TYPECHECK_EXTRA_ALLOWED_PREFIXESenv var to extend the security allowlist for test fixtures without weakening the runtime gate. - Advisory trailers dead code — Steps 3A.5B/C/D/E each set a trailer variable but none were appended to the 3A.6 commit.
git loggrep workflow was unusable. Now each trailer is guarded-appended toCOMMIT_MSGbefore commit.
| Target | Goal | Achieved |
|---|---|---|
| CPC variant files | 0 | ✓ 0 |
| README conflict markers | 0 | ✓ 0 |
Orphan .claude/worktrees/agent-* |
0 | ✓ 0 |
| Remote branch count | ≤10 | ✓ 1 (master) |
| Local branch count | ≤10 | ✓ 1 (master) |
| Archive tags preserved | — | 49 |
| Master test suites green | 100% | ✓ 54/54 (~1,400 tests) |
- Multi-runner support — universal runner adapter lets quantum-loop drive Claude, Codex, Copilot, Cursor, Gemini, Aider, Cline, Amp, Devin, Kiro, Goose, and OpenCode through a shared manifest contract. Design doc:
docs/plans/2026-04-01-multi-runner-support-design.md. - Runner JSON schema (
schemas/runner.schema.json) + validator (schemas/validate.sh) for manifest linting. - Runner library (
lib/runner.sh) with load / ensure_instructions / command_builder / hook helpers. Manifests underrunners/*.json. - Signal protocol preamble injected into non-Claude runners so they emit
<quantum>STORY_PASSED/FAILED/COMPLETE/BLOCKED</quantum>markers in a shared format. - Signal heuristic fallback (
lib/signal-heuristics.sh) — if no explicit<quantum>signal is present, infer from commit evidence, test results, and hedge-phrase filters. - Instruction-file auto-copy — replicates
CLAUDE.mdto each runner's convention (AGENTS.md,GEMINI.md, etc.). - Runner manifests:
claude.json(guaranteed),codex.json(tested end-to-end), plus experimental manifests foramp,aider,copilot,cursor,gemini. - Sequential mode, PowerShell mode, and
templates/quantum-loop.shall wired to the runner framework. - 22-test signal-heuristic suite and an integration test-suite for Codex CLI dispatch.
- Sequential-mode status updates no longer lose state on runner switch.
- Heuristic false-positive filter for ambiguous runner output.
- Runner name + template argument validation hardened against injection.
- Hardening-v2: init-guard + AST-aware merge + resilience — design doc
docs/plans/2026-03-28-hardening-v2-design.md. lib/init-guard.sh— environment pre-flight: OneDrive / long-path detection, tmpdir writability check, orphan worktree prune.lib/merge-semantic.sh— AST-aware 3-way merge (ts-morph for TypeScript, libcst for Python, diff3 fallback) routed vialib/merge-strategy.sh.lib/resilience.sh— WIP commits per task, squash-on-merge at story boundary, crash recovery vialastWipCommit+completedTasksfields. Supersedeslib/crash-recovery.sh(to be removed).- Stash exclusion for
quantum.jsonduring merges to prevent schema corruption. - Integration tests covering init → merge flow, crash recovery with WIP commits, semantic merge conflict, quantum.json stash isolation.
- Stash-ordering race in
merge-strategy.sh. - Trap cleanup on early abort paths in
resilience.sh. - stderr redirect typo on ts-morph merge fallback path.
- Modular Hardening (7 independent modules) — design doc
docs/plans/2026-03-25-modular-hardening-design.md. lib/barrel-regen.sh— auto-regenerate barrel exports (_barrel.ts/__init__.py/ etc.) post-merge so new story files become consumer-importable.lib/dep-manifest.sh— detect dependency-manifest changes (npm / pip / go / cargo) and run appropriate install post-merge.lib/known-failures.sh— baseline + delta tracking of test failures per wave. Alerts on new regressions, not pre-existing red.- Worktree lifecycle —
worktree.shextended with lifecycle-tracking functions;execution.worktreeTrackingfields{activeWorktrees, cleanedThisSession, maxWorktrees}. - Category-based merge strategy (
lib/merge-strategy.sh) — routes conflicts by file kind:dependency_manifest → ours+install,barrel_export → regenerate,new_story_file → theirs,shared_infrastructure → ours,contract_stub → theirs, default escalate. - Interface cascade guard extending the L5 type audit.
contractBreakingflag +fixesfield in ql-plan for intentional breaking changes.- Unit tests for barrel-regen, dep-manifest, known-failures, worktree lifecycle, merge-strategy, plus integration tests for known-failures lifecycle and escalation retry.
- DAG Intelligence — parallel specialist validators — design doc
docs/plans/2026-03-24-dag-intelligence-design.md. dag-validatorcoordinator agent that spawns three specialists in parallel:bottleneck-analyzer— Kahn's-algorithm wave assignment; detects sequential bottlenecks.duplication-detector— Jaccard keyword pre-filter + LLM semantic check for overlapping stories.conflict-auditor— computes completefileConflictsfromfilePathsintersections with severity classification.
storyTypefield on stories (feature / refactor / fix / test / docs / skeleton / integration).dagValidationblock +severityfield inquantum.jsonschema.dag-validation.mdreference doc.ql-planintegratesdag-validatorinvocation before plan confirmation; shows DAG Health Report.
- PR-review findings from soliton review (prompt hardening, context-window optimization).
- Progressive Materialization (5-layer type-divergence defense) — design doc
docs/plans/2026-03-18-worktree-isolation-fix-design.md. lib/materialize.sh— detects language; materializes real interface files for contracts consumed by ≥2 stories (threshold configurable). Smart-materialization threshold logic added in this release.lib/type-audit.sh— grep-based duplicate-definition detection at wave boundary; spawnstype-auditoragent on hits and feeds findings back into contracts.- Post-merge typecheck gate — auto-detects project typechecker (tsc, pyright, mypy, etc.) and reverts on failure.
- Auto-promotion of discovered contracts — orchestrator promotes wave-end audited types into
materializedContracts. typecheckCommandfield inquantum.jsonschema; new execution-metadata fields for materialization and audit tracking.- Shared-types directory inference +
contract-shapesreference doc forql-plan. - Contract Effectiveness section in orchestrator post-mortem output.
- Extended
merge_worktree_branch()with conflict classification. - Unit tests for
materialize.sh,type-audit.sh, merge escalation, typecheck gate.
- Command-injection hardening in merged scripts.
- Path-traversal checks in materialization paths.
- Merge revert safety when a post-merge gate fails.
- L5 audit feedback loop, error counting, Python pattern recognition.
- File-conflict-aware DAG scheduling — new
filter_file_conflicts()indag-query.shprevents spawning parallel agents that share file paths. Uses greedy priority-ordered selection with exact-match comparison. Wired into both dispatch sites inquantum-loop.sh(initial wave + mid-wave top-up). Also detects cross-wave conflicts by seeding with in_progress stories' files. - Worktree nesting prevention — new
_resolve_repo_root()helper inworktree.shresolves nested worktree paths to the top-level repo root viagit rev-parse --git-common-dir. Used by all three public functions (create, remove, list). Falls back gracefully on Git < 2.31 and warns when resolution fails inside.ql-wt/paths. - Windows long-path fallback —
create_worktreedetects paths > 200 chars and falls back to a repo-namespaced temp directory (/tmp/ql-wt-<hash>/). Re-checks fallback length with emergency/tmplast resort.remove_worktreeandlist_worktreescheck both locations. - Editable install race condition — PYTHONPATH injection guidance added to
implementer.md,orchestrator.md, andspawn.sh(both src-layout and flat-layout variants). Agents no longer runpip install -e .in parallel worktrees. - Worktree cleanup on Windows — retry loop (3 attempts, 2s delay) for file locks from OneDrive sync /
__pycache__.rm -rffallback only runs whengit worktree removeactually fails.git worktree pruneruns after cleanup. - Unescaped shell expansions —
orchestrator.mdprompt template now uses\$(pwd)and\$PYTHONPATH(matchingspawn.shpattern) to prevent premature expansion. - echo → printf —
detect_cyclesindag-query.shnow usesprintfmatching project convention.
- Inline review gate for parallel mode (Step 3B.4) — orchestrator runs spec compliance + code quality checks after each worktree merge, matching sequential mode's quality bar. Defers to wave-end when accumulated diff exceeds 2000 lines.
- Full-feature code review (Step 4B) — holistic review of entire branch diff after all stories pass. Checks cross-story consistency (naming, duplicates, type mismatches), architecture coherence (PRD goals, data flow, backward compatibility), and security (secrets, TODOs, error handling).
- quantum.json merge guidance — documented stash/merge/restore pattern and recommended
.gitignorebest practice. - Input validation —
filter_file_conflicts()validates file path and eligible array before processing. - Cross-repo collision prevention —
_short_path_base()uses repo-root hash to namespace/tmpworktrees per repository. - 14 new tests:
filter_file_conflicts(8 tests covering filePaths overlap, fileConflicts entries, empty/null, three-way conflict, transitive chains, similar paths),_resolve_repo_root(3 tests: identity, from-worktree, nested-create),_short_path_base(2 tests: determinism, cross-repo uniqueness), PYTHONPATH in spawn prompt (4 assertions). 79 total tests, all passing.
- Mandatory worktree isolation —
isolation: "worktree"is now documented as MANDATORY for parallel execution, with specific failure modes listed (bash contention, file conflicts, quantum.json races) - Correct tool naming — orchestrator now references "Agent tool" (not "Task tool") with exact parameter names (
subagent_type,isolation,mode,run_in_background) - Atomic quantum.json updates — new Step 3B.1 batches all
in_progressstatus writes into a single atomic update before spawning agents - Monitor loop — changed from polling to waiting for Claude Code completion notifications
- State management discipline — only the orchestrator writes quantum.json; Edit tool banned (use Python/jq); multi-story updates batched into one write
- Implementer parallel mode — implementer agents in worktrees no longer edit quantum.json (stale copy); report via output message instead
- Anti-rationalization guards — 2 new entries blocking "skip worktree" and "worktrees won't work on this OS" excuses
- Cross-story contracts —
contractsfield in quantum.json for shared values (secret keys, env vars, types) across parallel stories. ql-plan generates them; implementer reads and enforces them with propose-and-wait on disagreements. - Wiring verification —
wiring_verificationfield on tasks with grep-based mechanical check in spec-reviewer. No agent judgment: missing string = fail, present = pass. consumedByfield — declares cross-story consumption so implementers import existing components instead of inlining duplicates.- Coverage gate — configurable
coverageThresholdin quantum.json; quality-reviewer runs coverage tool and fails stories below threshold (tool detection: c8 → nyc → pytest-cov → go test → JaCoCo). - Coding standards enforcement — quality-reviewer reads CLAUDE.md,
.claude/rules/, andcodebasePatterns; violations of documented rules are CRITICAL severity. - Stale story detection —
startedAttimestamp +staleThresholdMinutes(default 20, CLI-overridable). Implemented in orchestrator, quantum-loop.sh, quantum-loop.ps1, and templates/quantum-loop.sh. - Final verification sweep — scripts run full test suite + import smoke test before declaring COMPLETE. Test failure blocks COMPLETE.
- Execution observations — auto-generated post-mortem doc after every run with failure summary, patterns, and raw data. Optional GitHub issue filing with user confirmation.
- CLAUDE.md defensive check — Step 3 warns and skips stories that are in_progress but not assigned to the current agent.
- Lifecycle awareness — ql-brainstorm question #7 (LIFECYCLE) and ql-spec Pre-Save lifecycle checklist (first-run, returning-user, update, error recovery, empty state, uninstall).
testFirstmandate — ql-plan defaultstestFirst: truefor all tasks; exempt tasks requirenotesjustification.- Shell tests for stale detection, startedAt, final sweep, and observations (4 new test files, 14 tests)
- Atomic in_progress + startedAt write — single jq operation prevents crash window where story is in_progress without startedAt
- startedAt cleared on all exit paths — BLOCKED, unrecognized signal, timeout, merge conflict, non-zero exit, and crash recovery
- Null-guard failureLog — observations jq uses
(.retries.failureLog // [])[]to handle null vs empty array
- Cross-story integration review — Stage 3 in ql-review traces call chains across story boundaries using LSP (grep fallback). Runs after dependency chains complete and as a final gate before COMPLETE.
- Final integration gate — orchestrator runs import smoke test, full test suite, and dead code scan before declaring COMPLETE
- File-touch conflict detection — ql-plan Step 5 flags parallel stories modifying the same file, adds reconciliation tasks, stores conflicts in
quantum.jsonmetadata (fileConflicts) - Consumer verification pattern — wiring acceptance criteria belong on the consumer story, not the creator
- Edge case test requirements — boundary values, type variations, collision scenarios, scale tests required for all testFirst tasks
- Edge case reference doc —
references/edge-cases.mdwith Python, JS, Go, Rust testing gotchas. Implementer reads it at the start of every testFirst task. - Import chain verification — ql-verify requires integration evidence for multi-story features
- Cursor marketplace manifest —
.cursor-plugin/plugin.jsonfor cross-platform publishing
- Orchestrator Step 4 split into Step 4 (Final Integration Gate) and Step 5 (Completion)
- Implementer always reads
references/edge-cases.mdfor testFirst tasks (not on-demand)
- Orchestrator agent (
agents/orchestrator.md) — manages full execution lifecycle inside Claude Code with DAG query, sequential/parallel dispatch, two-stage review, retry logic - Native PowerShell script (
quantum-loop.ps1) — Windows overnight runs without bash/WSL - SkillsMP compatibility —
namefield in all SKILL.md frontmatter - ql-plan runner copy — copies quantum-loop.sh/ps1 into project after planning
- Lost work in parallel mode — agents must commit before signaling; orchestrator adds safety commit before merge
- Merge failure on dirty tree — stash working tree before merge, pop after
- Stale worktree branches — delete existing branch before
git worktree add -b
- Simplified
skills/ql-execute/SKILL.mdfrom ~300 lines to ~50 line dispatcher CLAUDE.mdparallel mode: agents explicitly told to commit before signalinglib/spawn.shprompt includes commit instruction
- Parallel execution via DAG-driven worktree agents
- 7 shell library modules (
lib/) for DAG query, worktree lifecycle, agent spawning, monitoring, atomic JSON writes, crash recovery - 7 test suites with 110 tests
--paralleland--max-parallelflags forquantum-loop.sh/ql-executeparallel orchestration via Task subagents- Crash recovery for orphaned worktrees
CLAUDE.mdparallel mode instructions
- Initial release
- 6 skills: brainstorm, spec, plan, execute, verify, review
- 3 agents: implementer, spec-reviewer, quality-reviewer
quantum-loop.shsequential autonomous loopCLAUDE.mdagent template- Dependency DAG execution
- Two-stage review gates (spec compliance + code quality)
- Iron Law verification
- Anti-rationalization guards