Project Context
FocusBlox is a productivity iOS/macOS app built with SwiftUI (iOS 26.2+). The project uses a custom Claude Code workflow with 6 phases, 3 human checkpoints, Python hook gates, and a multi-agent Orchestrator-Developer-Adversary pattern. The workflow has been in active use for several months across ~711 commits and 37+ completed workflow cycles.
What Works (with Evidence)
1. Hard Gates via Hook Scripts — 100% Compliance
Evidence: feedback_workflow-discipline.md: "CLAUDE.md rules are followed with ~60-70% probability. Hooks with 100%."
The Python hook system (edit_gate.py, test_quality_gate.py, bash_gate.py, phase_listener.py) enforces workflow discipline reliably. Any rule documented in CLAUDE.md alone gets skipped eventually. The same rule enforced via a blocking hook never gets skipped.
Key insight: Documentation is a suggestion. Hooks are law. This is the single most important finding from this project.
2. Human Checkpoint System (3-gate approval)
Evidence: reference_workflow-gate-map.md — 3 explicit human checkpoints: "stimmt" (analysis approved), "go" (TDD RED approved), "commit" (implementation approved).
Claude cannot advance checkpoints itself. Only the user can trigger phase transitions via phase_listener.py. This successfully prevents Claude from self-approving its own work.
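The checkpoint rule can be illustrated with a minimal sketch. It is not the actual phase_listener.py: the function name, the `role` argument, and the checkpoint labels are assumptions made for illustration. Only the three keywords come from the gate map.

```python
from typing import Optional

# Keywords are taken from the gate map; the checkpoint labels are hypothetical.
CHECKPOINT_KEYWORDS = {
    "stimmt": "analysis_approved",
    "go": "tdd_red_approved",
    "commit": "implementation_approved",
}


def checkpoint_for_message(role: str, text: str) -> Optional[str]:
    """Return the checkpoint a message unlocks, or None if it unlocks nothing."""
    # Only messages typed by the human may advance a checkpoint.
    if role != "user":
        return None
    # Assumption: the whole message must be the keyword, so a phrase like
    # "go ahead and refactor" does not trigger a transition by accident.
    return CHECKPOINT_KEYWORDS.get(text.strip().lower())
```

Requiring an exact keyword match is one possible design choice. It trades convenience for fewer accidental phase transitions.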
3. TDD Phase Separation via edit_gate
Evidence: edit_gate.py enforces that test files can only be written in phase4_tdd_red and source code only in phase5_implement. This effectively prevents retroactive test writing.
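A minimal sketch of such a phase check follows. The phase names come from the report. The test-file heuristic (a `Tests` suffix on the file or a parent directory) and the decision to leave non-Swift files ungated are assumptions, not the behavior of the real edit_gate.py.

```python
from pathlib import PurePosixPath


def is_test_file(path: str) -> bool:
    """Heuristic (assumption): test code lives in *Tests dirs or *Tests.swift files."""
    p = PurePosixPath(path)
    in_test_dir = any(part.endswith("Tests") for part in p.parts[:-1])
    return in_test_dir or p.name.endswith("Tests.swift")


def edit_allowed(phase: str, path: str) -> bool:
    """Decide whether an edit to `path` is permitted in the current phase."""
    if is_test_file(path):
        # Tests may only be written while the workflow is in TDD RED.
        return phase == "phase4_tdd_red"
    if path.endswith(".swift"):
        # Production source may only change during implementation.
        return phase == "phase5_implement"
    # Assumption: other files (docs, config) are not gated in this sketch.
    return True
```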
4. Silent-Pass Test Detection (INFRA_016)
Evidence: test_quality_gate.py — blocks guard let x = ... else { return } and similar patterns in test files that would make tests always-green regardless of implementation. 17 hook tests verify this. Commit 8359792 introduced this hardening.
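The following sketch shows one way to detect this pattern. It covers only the `guard ... else { return }` form, and the regular expression and function name are illustrative assumptions. The real test_quality_gate.py also covers the "similar patterns" mentioned above, which are not reproduced here.

```python
import re

# Matches a guard whose else-branch contains nothing but a bare `return`.
# [^{}] keeps the match inside a single guard statement.
SILENT_GUARD = re.compile(r"\bguard\b[^{}]*?\belse\s*\{\s*return\s*\}", re.DOTALL)


def find_silent_passes(swift_source: str) -> list[int]:
    """Return the 1-based line numbers where a silent guard-return begins."""
    return [
        swift_source.count("\n", 0, match.start()) + 1
        for match in SILENT_GUARD.finditer(swift_source)
    ]
```

A guard that calls `XCTFail(...)` before returning is not flagged, because its else-branch contains more than the bare `return`.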
5. Conventional Commits + Issue Linking (bash_gate)
Evidence: git log shows consistent feat:, fix:, refactor: prefixes across 711 commits. bash_gate.py blocks commits without a GitHub issue reference for feat/fix commits.
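A commit-message check along these lines could look like the sketch below. It is not the real bash_gate.py. The `#123` issue-reference format, the accepted prefix grammar, and the error strings are assumptions.

```python
import re
from typing import Optional

# Conventional-commit subject: type, optional (scope), optional "!", then ": text".
PREFIX = re.compile(r"^(?P<type>[a-z]+)(\([^)]+\))?!?: \S")
# Assumption: a GitHub issue is referenced as "#<number>" anywhere in the message.
ISSUE_REF = re.compile(r"#\d+")
NEEDS_ISSUE = {"feat", "fix"}


def commit_message_error(message: str) -> Optional[str]:
    """Return a reason to block the commit, or None if the message passes."""
    subject = message.splitlines()[0] if message else ""
    match = PREFIX.match(subject)
    if match is None:
        return "missing conventional-commit prefix"
    if match.group("type") in NEEDS_ISSUE and not ISSUE_REF.search(message):
        return "feat/fix commits must reference a GitHub issue"
    return None
```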
What Doesn't Work (with Evidence)
1. Adversary Produces False Positives
Evidence: feedback_adversary-false-positives.md — In session 2026-04-17, 3 of 6 Adversary findings were false positives. The Adversary read the spec to generate findings but didn't verify against actual code. Example: "adoptIdea deletes on error" — code already had do/catch.
Root cause: the Adversary prompt lets the agent assert "spec requires X" without first reading the current implementation.
2. Early Phases (Context + Analysis) Have No Hard Gates
Evidence: feedback_workflow-discipline.md incident 2026-04-13 (FEATURE_185): "Context/Analysis skipped, Fresh-Eyes faked (only mark-result-inspection set without Agent), macOS build forgotten."
Phase 1 (Context) and Phase 2 (Analysis) have no hard blocking gates. Claude can proceed without completing them. The memory explicitly notes: "Skill text: 'Not gate-enforced but still best practice' — that's an invitation to skip."
3. Mandatory Steps Not Hook-Enforced
Evidence: feedback_agent-orchestration-gaps.md — 4 mandatory steps documented in CLAUDE.md but not enforced:
- /inspect-ui before UI tests → Trial-and-error on element IDs
- Screenshots at Checkpoint 3 → Text description instead of visual proof
- /13-localize check → Hardcoded strings shipped
- Context file requirement → Phase 1 bypassed
4. Adversary Isolation Bug
Evidence: feedback_workflow-gate-ux.md — When Adversary runs with isolation: "worktree", it receives a clean repo copy and reviews the old code state, not the current uncommitted changes.
5. No Execution Metrics
Evidence: No systematic log exists. When asked "what works and what doesn't?", the only available data is anecdotal memory entries (corrections from past sessions). There is no structured record of: how often phases were skipped, adversary true/false positive rate, scope compliance, TDD discipline rate.
6. Self-Policing Agents Removed
Evidence: reference_workflow-gate-map.md lists 5 self-policing agents that were removed: analysis-challenger, fresh-eyes, implementation-validator, spec-validator, user-advocate. Reason: they weren't technically enforceable and added overhead without reliability.
Gaps / Blind Spots
- No Observability Standard: No schema for recording workflow execution outcomes. Makes cross-session and cross-project learning impossible.
- Scope Guard Not Enforced: CLAUDE.md documents max 5 files / ±250 LoC per workflow. No hook enforces this limit. Violations go undetected.
- AskUserQuestion Tool Not Detected by phase_listener: per feedback_workflow-gate-ux.md, the phase listener only reacts to direct chat messages, not AskUserQuestion tool responses. User must type keywords twice.
- No Reproducibility Spec: The workflow exists as a collection of Python scripts, JSON state files, and CLAUDE.md rules. There is no machine-readable spec that describes the workflow in a way another project could implement it independently.
- Adversary Pre-Fix Validation Not Enforced: The adversary is required to stash changes, run the test without the fix (to confirm it fails), then restore. This is documented but not hook-enforced.
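Of the gaps above, the scope guard is the most mechanical to close. The sketch below checks the documented limits against the text output of `git diff --numstat` (tab-separated added, deleted, path). No such hook exists in the project today. The function name is hypothetical, and reading "±250 LoC" as added plus deleted lines is an assumption.

```python
def scope_violations(numstat: str, max_files: int = 5, max_loc: int = 250) -> list[str]:
    """Return human-readable violations of the scope limits (empty if none)."""
    files = 0
    churn = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        added, deleted, _path = parts
        files += 1
        # Binary files are reported as "-" for both counts; count the file only.
        if added.isdigit() and deleted.isdigit():
            churn += int(added) + int(deleted)
    problems = []
    if files > max_files:
        problems.append(f"{files} files changed (limit {max_files})")
    if churn > max_loc:
        problems.append(f"{churn} lines changed (limit {max_loc})")
    return problems
```

A blocking hook could run this before the "commit" checkpoint and refuse to proceed while the returned list is non-empty.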
Concrete Recommendations for agent-os-openspec
R1: Establish a "Hooks are Law" Principle
Any workflow rule that is not enforced by a hook/gate should be explicitly marked as "aspirational" in the spec. The spec should recommend that every mandatory step has a corresponding machine-enforced gate. CLAUDE.md-style documentation alone achieves ~60-70% compliance. Gates achieve 100%.
R2: Define a Workflow Execution Log Schema
Standardize a minimal YAML/JSON schema for recording each workflow execution. Example fields:
```yaml
workflow_id: FEATURE_123
project: FocusBlox
date: 2026-05-09
phases_completed: [1, 2, 3, 4, 5]
phases_skipped: []
tdd_red_confirmed: true
adversary_findings: 3
adversary_false_positives: 1
override_used: false
files_changed: 4
loc_diff: +187
outcome: success  # success | partial | reverted
```
This would make cross-project learning possible. Currently there is no comparable data available.
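To show what such records would enable, here is a minimal sketch that models a subset of the example fields and derives an aggregate adversary false-positive rate. The class name, the validation rules, and the aggregation are assumptions and not part of any existing schema.

```python
from dataclasses import dataclass, field
from typing import Optional

VALID_OUTCOMES = {"success", "partial", "reverted"}


@dataclass
class WorkflowRun:
    """One workflow execution record (subset of the example fields)."""
    workflow_id: str
    phases_completed: list[int]
    phases_skipped: list[int] = field(default_factory=list)
    adversary_findings: int = 0
    adversary_false_positives: int = 0
    outcome: str = "success"

    def __post_init__(self) -> None:
        if self.outcome not in VALID_OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")
        if not 0 <= self.adversary_false_positives <= self.adversary_findings:
            raise ValueError("false positives must be between 0 and total findings")


def false_positive_rate(runs: list[WorkflowRun]) -> Optional[float]:
    """Share of adversary findings that were false positives, or None without data."""
    total = sum(run.adversary_findings for run in runs)
    if total == 0:
        return None
    return sum(run.adversary_false_positives for run in runs) / total
```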
R3: Specify Adversary Code-First Verification
The spec should require that Adversary agents verify each finding against actual code (with file:line reference) before reporting it as a finding. A finding without a code reference is not a finding.
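A gate for this rule might be sketched as follows. The `Finding` type, the `file:line` format check, and the `sources` mapping (path to file contents) are hypothetical. The sketch only confirms that a reference points at an existing line. It does not judge whether the finding itself is correct.

```python
import re
from dataclasses import dataclass

# A reference looks like "IdeaStore.swift:42" (path without spaces, 1-based line).
CODE_REF = re.compile(r"^(?P<file>[^\s:]+):(?P<line>[1-9]\d*)$")


@dataclass
class Finding:
    claim: str
    code_ref: str = ""


def has_valid_ref(finding: Finding, sources: dict[str, str]) -> bool:
    """True if the finding cites a line that exists in the current code."""
    match = CODE_REF.match(finding.code_ref)
    if match is None:
        return False
    text = sources.get(match.group("file"))
    if text is None:
        return False
    return int(match.group("line")) <= len(text.splitlines())


def reportable(findings: list[Finding], sources: dict[str, str]) -> list[Finding]:
    """Drop findings that lack a verifiable code reference."""
    return [f for f in findings if has_valid_ref(f, sources)]
```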
R4: Define Phase Gate Requirements
The spec should define which phases require hard (blocking) gates vs. soft (documented) gates. Projects should be able to implement the spec and know which transitions are enforceable by tooling.
R5: Address Multi-Agent Isolation
When a sub-agent (e.g., Adversary, Fresh-Eyes) needs to review uncommitted changes, it must not run in worktree isolation. The spec should address how multi-agent workflows handle uncommitted state.
Project: FocusBlox | Stack: iOS/macOS, SwiftUI, Xcode 26.2 | Workflow maturity: ~6 months, 711 commits, 37+ completed cycles