[Feedback] FocusBlox (iOS/macOS SwiftUI) — Workflow Analysis #3

@henemm

Project Context

FocusBlox is a productivity iOS/macOS app built with SwiftUI (iOS 26.2+). The project uses a custom Claude Code workflow with 6 phases, 3 human checkpoints, Python hook gates, and a multi-agent Orchestrator-Developer-Adversary pattern. The workflow has been in active use for several months across ~711 commits and 37+ completed workflow cycles.


What Works (with Evidence)

1. Hard Gates via Hook Scripts — 100% Compliance

Evidence: feedback_workflow-discipline.md: "CLAUDE.md rules are followed with ~60-70% probability. Hooks with 100%."

The Python hook system (edit_gate.py, test_quality_gate.py, bash_gate.py, phase_listener.py) enforces workflow discipline reliably. Any rule documented in CLAUDE.md alone gets skipped eventually. The same rule enforced via a blocking hook never gets skipped.

Key insight: Documentation is a suggestion. Hooks are law. This is the single most important finding from this project.
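A minimal sketch of what "enforced via a blocking hook" can look like. It assumes the agent runtime hands the pending tool call to the hook as JSON and treats exit code 2 as "block". The `run_hook` helper and the `no_force_push` rule are hypothetical illustrations, not the project's actual hook code.

```python
import json
from typing import Callable, Optional

# Assumption: exit code 2 means "block the tool call"; adjust to the runtime in use.
EXIT_ALLOW = 0
EXIT_BLOCK = 2

# A rule returns a reason string to block the call, or None to allow it.
Rule = Callable[[dict], Optional[str]]


def run_hook(raw_event: str, rules: list[Rule]) -> tuple[int, str]:
    """Evaluate every rule against one tool-call event; the first violation blocks."""
    event = json.loads(raw_event)
    for rule in rules:
        reason = rule(event)
        if reason is not None:
            return EXIT_BLOCK, reason
    return EXIT_ALLOW, ""


def no_force_push(event: dict) -> Optional[str]:
    """Hypothetical example rule: refuse `git push --force` in shell commands."""
    command = event.get("tool_input", {}).get("command", "")
    if "git push" in command and "--force" in command:
        return "force pushes are not allowed"
    return None
```

In a real hook script the JSON would be read from stdin, the reason printed to stderr, and the code passed to `sys.exit`. The point is that the rule is evaluated on every call, independent of whether the model remembers the documentation.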

2. Human Checkpoint System (3-gate approval)

Evidence: reference_workflow-gate-map.md — 3 explicit human checkpoints: "stimmt" (analysis approved), "go" (TDD RED approved), "commit" (implementation approved).

Claude cannot advance checkpoints itself. Only the user can trigger phase transitions via phase_listener.py. This successfully prevents Claude from self-approving its own work.
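The checkpoint logic can be sketched as a small lookup table. This is an illustration under assumptions, not the contents of phase_listener.py: only phase4_tdd_red and phase5_implement are phase names from this write-up, the other phase names are placeholders, and `from_user` stands in for however the listener distinguishes user messages from agent output.

```python
from typing import Optional

# Keyword -> (phase in which it is valid, phase it unlocks).
# "phase2_analysis", "phase3_spec" and "phase6_commit" are placeholder names.
CHECKPOINTS = {
    "stimmt": ("phase2_analysis", "phase3_spec"),
    "go": ("phase4_tdd_red", "phase5_implement"),
    "commit": ("phase5_implement", "phase6_commit"),
}


def next_phase(current_phase: str, message: str, from_user: bool) -> Optional[str]:
    """Return the unlocked phase, or None if the message is not a valid approval."""
    if not from_user:
        # The agent can never approve its own work.
        return None
    checkpoint = CHECKPOINTS.get(message.strip().lower())
    if checkpoint is None or checkpoint[0] != current_phase:
        return None
    return checkpoint[1]
```

For example, `next_phase("phase4_tdd_red", "go", from_user=True)` returns `"phase5_implement"`, while the same keyword in any other phase, or from the agent itself, returns `None`.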

3. TDD Phase Separation via edit_gate

Evidence: edit_gate.py enforces that test files can only be written in phase4_tdd_red and source code only in phase5_implement. This effectively prevents retroactive test writing.
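The phase rule itself fits in a few lines. The sketch below assumes test files can be recognised by a directory ending in "Tests" or a file name ending in "Tests.swift"; the real edit_gate.py may classify files differently.

```python
from pathlib import PurePosixPath


def is_test_file(path: str) -> bool:
    """Assumed convention: a '...Tests' directory or a '...Tests.swift' file name."""
    p = PurePosixPath(path)
    in_test_dir = any(part.endswith("Tests") for part in p.parts[:-1])
    return in_test_dir or p.name.endswith("Tests.swift")


def edit_allowed(phase: str, path: str) -> bool:
    """Tests may change only in phase4_tdd_red, sources only in phase5_implement."""
    if is_test_file(path):
        return phase == "phase4_tdd_red"
    return phase == "phase5_implement"
```

As written, everything that is not a test file counts as source, so documentation edits would also be blocked outside phase5_implement. A real gate would likely need an allow-list for such files.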

4. Silent-Pass Test Detection (INFRA_016)

Evidence: test_quality_gate.py — blocks guard let x = ... else { return } and similar patterns in test files that would make tests always-green regardless of implementation. 17 hook tests verify this. Commit 8359792 introduced this hardening.
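A rough sketch of how such a pattern check might work, using a regular expression rather than a Swift parser. The pattern and function name are assumptions for illustration; test_quality_gate.py may detect more patterns or use a different approach.

```python
import re

# Matches a guard whose else-branch contains only a bare `return`, possibly
# spread over several lines. It does not parse Swift, so closures in the guard
# condition or comments inside the braces will slip through.
SILENT_GUARD = re.compile(r"\bguard\b[^{}]*?\belse\s*\{\s*return\s*\}", re.DOTALL)


def find_silent_passes(swift_source: str) -> list[int]:
    """Return 1-based line numbers where a silently returning guard starts."""
    return [
        swift_source.count("\n", 0, match.start()) + 1
        for match in SILENT_GUARD.finditer(swift_source)
    ]
```

A guard that fails loudly, for example `else { XCTFail("nil"); return }`, is not flagged, because the else-branch contains more than a bare `return`.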

5. Conventional Commits + Issue Linking (bash_gate)

Evidence: git log shows consistent feat:, fix:, refactor: prefixes across 711 commits. bash_gate.py blocks commits without a GitHub issue reference for feat/fix commits.
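The commit rule can be approximated as follows. The accepted type list and the `#123` issue-reference format are assumptions; the write-up only names feat, fix, and refactor, and does not show how bash_gate.py parses the message.

```python
import re

CONVENTIONAL = re.compile(r"^(?P<type>[a-z]+)(\([\w\-./]+\))?!?: \S")
ISSUE_REF = re.compile(r"#\d+")
# Assumption: docs, test and chore are accepted in addition to the types named above.
ALLOWED_TYPES = {"feat", "fix", "refactor", "docs", "test", "chore"}
NEEDS_ISSUE = {"feat", "fix"}


def check_commit_message(message: str) -> list[str]:
    """Return a list of violations; an empty list means the commit may proceed."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    match = CONVENTIONAL.match(subject)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return ["subject must look like 'type(scope): summary'"]
    commit_type = match.group("type")
    if commit_type in NEEDS_ISSUE and not ISSUE_REF.search(message):
        return [f"{commit_type} commits must reference an issue, e.g. #123"]
    return []
```

The issue reference is searched in the whole message, so it may appear in the subject or in the body.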


What Doesn't Work (with Evidence)

1. Adversary Produces False Positives

Evidence: feedback_adversary-false-positives.md — In the session on 2026-04-17, 3 of 6 Adversary findings were false positives. The Adversary generated its findings from the spec but did not verify them against the actual code. Example: it reported "adoptIdea deletes on error" although the code already had a do/catch.

Root cause: The Adversary prompt reasons from "spec requires X" without reading the current implementation first.

2. Early Phases (Context + Analysis) Have No Hard Gates

Evidence: feedback_workflow-discipline.md incident 2026-04-13 (FEATURE_185): "Context/Analysis skipped, Fresh-Eyes faked (only mark-result-inspection set without Agent), macOS build forgotten."

Phase 1 (Context) and Phase 2 (Analysis) have no hard blocking gates. Claude can proceed without completing them. The memory explicitly notes: "Skill text: 'Not gate-enforced but still best practice' — that's an invitation to skip."

3. Mandatory Steps Not Hook-Enforced

Evidence: feedback_agent-orchestration-gaps.md — Four mandatory steps are documented in CLAUDE.md but not enforced. Each is listed with the consequence observed when it was skipped:

  • /inspect-ui before UI tests → Trial-and-error on element IDs
  • Screenshots at Checkpoint 3 → Text description instead of visual proof
  • /13-localize check → Hardcoded strings shipped
  • Context file requirement → Phase 1 bypassed

4. Adversary Isolation Bug

Evidence: feedback_workflow-gate-ux.md — When Adversary runs with isolation: "worktree", it receives a clean repo copy and reviews the old code state, not the current uncommitted changes.

5. No Execution Metrics

Evidence: No systematic log exists. When asked "what works and what doesn't?", the only available data is anecdotal memory entries (corrections from past sessions). There is no structured record of: how often phases were skipped, adversary true/false positive rate, scope compliance, TDD discipline rate.

6. Self-Policing Agents Removed

Evidence: reference_workflow-gate-map.md lists 5 self-policing agents that were removed: analysis-challenger, fresh-eyes, implementation-validator, spec-validator, user-advocate. Reason: they weren't technically enforceable and added overhead without reliability.


Gaps / Blind Spots

  1. No Observability Standard — No schema for recording workflow execution outcomes. Makes cross-session and cross-project learning impossible.

  2. Scope Guard Not Enforced — CLAUDE.md documents max 5 files / ±250 LoC per workflow. No hook enforces this limit. Violations go undetected.

  3. AskUserQuestion Tool Not Detected by phase_listener — Evidence: feedback_workflow-gate-ux.md. The phase listener only reacts to direct chat messages, not to AskUserQuestion tool responses, so the user must type the keyword twice.

  4. No Reproducibility Spec — The workflow exists as a collection of Python scripts, JSON state files, and CLAUDE.md rules. There is no machine-readable spec that describes the workflow in a way another project could implement it independently.

  5. Adversary Pre-Fix Validation Not Enforced — The adversary is required to stash changes, run the test without the fix (to confirm it fails), then restore. This is documented but not hook-enforced.
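The scope guard from item 2 is one of the easier gaps to close with a hook. A minimal sketch, assuming the hook is fed the output of `git diff --numstat` and that "±250 LoC" means added plus deleted lines combined (the write-up does not say whether the limit is net or total):

```python
MAX_FILES = 5
MAX_LOC = 250  # assumption: added + deleted lines, not the net difference


def check_scope(numstat: str) -> list[str]:
    """Check `git diff --numstat` output against the file and LoC limits."""
    files, changed = 0, 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        files += 1
        # Binary files are reported as "-"; count them as files but not as lines.
        if added != "-":
            changed += int(added) + int(deleted)
    problems = []
    if files > MAX_FILES:
        problems.append(f"{files} files changed (limit {MAX_FILES})")
    if changed > MAX_LOC:
        problems.append(f"{changed} lines changed (limit {MAX_LOC})")
    return problems
```

A commit gate could run this against the diff since the workflow's starting commit and block when the returned list is non-empty.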


Concrete Recommendations for agent-os-openspec

R1: Establish a "Hooks are Law" Principle

Any workflow rule that is not enforced by a hook/gate should be explicitly marked as "aspirational" in the spec. The spec should recommend that every mandatory step has a corresponding machine-enforced gate. CLAUDE.md-style documentation alone achieves ~60-70% compliance. Gates achieve 100%.

R2: Define a Workflow Execution Log Schema

Standardize a minimal YAML/JSON schema for recording each workflow execution. Example fields:

workflow_id: FEATURE_123
project: FocusBlox
date: 2026-05-09
phases_completed: [1, 2, 3, 4, 5]
phases_skipped: []
tdd_red_confirmed: true
adversary_findings: 3
adversary_false_positives: 1
override_used: false
files_changed: 4
loc_diff: +187
outcome: success  # success | partial | reverted

This would make cross-project learning possible. Currently there is no comparable data available.
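Once records like this exist, the missing metrics fall out of a simple aggregation. The sketch below models a subset of the fields above as a dataclass; `WorkflowRun` and `summarize` are hypothetical names, and loading the records from YAML or JSON is left out.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowRun:
    workflow_id: str
    adversary_findings: int = 0
    adversary_false_positives: int = 0
    phases_skipped: list[int] = field(default_factory=list)
    outcome: str = "success"  # success | partial | reverted


def summarize(runs: list[WorkflowRun]) -> dict[str, float]:
    """Aggregate two of the rates that are currently only known anecdotally."""
    findings = sum(r.adversary_findings for r in runs)
    false_positives = sum(r.adversary_false_positives for r in runs)
    skipped = sum(1 for r in runs if r.phases_skipped)
    return {
        "false_positive_rate": false_positives / findings if findings else 0.0,
        "runs_with_skipped_phases": skipped / len(runs) if runs else 0.0,
    }
```

With one run of 6 findings and 3 false positives and a second run of 2 findings and 1 false positive that skipped phases, `summarize` reports a false positive rate of 0.5 and skipped phases in half the runs.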

R3: Specify Adversary Code-First Verification

The spec should require that Adversary agents verify each finding against actual code (with file:line reference) before reporting it as a finding. A finding without a code reference is not a finding.
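The mechanical half of this requirement can be checked by tooling. A minimal sketch, assuming findings are plain text and cite code as `path:line`; `verify_finding` is a hypothetical helper. It only confirms that the cited location exists, not that the finding is correct.

```python
import re
from pathlib import Path

CODE_REF = re.compile(r"(?P<file>[\w./\-]+\.\w+):(?P<line>\d+)")


def verify_finding(finding: str, repo_root: Path) -> bool:
    """Accept a finding only if it cites file:line and that line exists."""
    match = CODE_REF.search(finding)
    if match is None:
        return False
    path = repo_root / match.group("file")
    if not path.is_file():
        return False
    line_count = len(path.read_text(encoding="utf-8").splitlines())
    return 1 <= int(match.group("line")) <= line_count
```

Findings that fail this check could be dropped before they reach the human checkpoint. That filters out spec-only claims like the "adoptIdea deletes on error" example.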

R4: Define Phase Gate Requirements

The spec should define which phases require hard (blocking) gates vs. soft (documented) gates. Projects should be able to implement the spec and know which transitions are enforceable by tooling.
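One possible machine-readable shape for this, as a sketch: each transition names the hook that enforces it, or nothing if it is documentation-only. The `Transition` type and the phase names other than phase4_tdd_red and phase5_implement are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    enforced_by: Optional[str]  # hook script name, or None for a documented-only gate


def soft_gates(transitions: list[Transition]) -> list[Transition]:
    """Return the transitions that no hook enforces."""
    return [t for t in transitions if t.enforced_by is None]


# Example declaration; "phase1_context" and "phase2_analysis" are placeholder names.
TRANSITIONS = [
    Transition("phase1_context", "phase2_analysis", None),
    Transition("phase4_tdd_red", "phase5_implement", "phase_listener.py"),
]
```

Here `soft_gates(TRANSITIONS)` returns only the phase1_context transition, which makes the unenforced part of a workflow visible at a glance.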

R5: Address Multi-Agent Isolation

When a sub-agent (e.g., Adversary, Fresh-Eyes) needs to review uncommitted changes, it must not run in worktree isolation. The spec should address how multi-agent workflows handle uncommitted state.


Project: FocusBlox | Stack: iOS/macOS, SwiftUI, Xcode 26.2 | Workflow maturity: ~6 months, 711 commits, 37+ completed cycles
