[Feedback] Meditationstimer: Workflow Analysis (Workflow v6, Agent Orchestrator Pattern) #4

@henemm

Project Context

Meditationstimer / Healthy Habits Haven is an iOS + watchOS SwiftUI meditation and wellness app (Swift 6.2, Xcode 26, iOS 18.5+). The project uses Claude Code with a custom Workflow v6 — an 8-phase, multi-agent orchestrator pattern with hook-enforced checkpoints, TDD RED/GREEN discipline, and a dedicated Adversary agent.

Stack: SwiftUI, HealthKit, XCTest/XCUITest, GitHub Issues for backlog.


What Works (Evidence-Based)

1. Hook-enforced Phase Gating — Highly Effective

The edit_gate.py hook enforces strict file access per phase:

  • Phase 4 (phase4_tdd_red): only test files editable
  • Phase 5 (phase5_implement): only source files editable, test changes blocked
  • Phase 6+: scope limit of 5 code files checked
  • bash_gate.py blocks git commit without checkpoint3_approved and a GitHub Issue reference

Evidence: Both archived workflows (bug-background-meditation.json, bug-7-remaining-tests.json) completed all 3 checkpoints correctly. The guard has dedicated logic for __infra__ overrides, stop-locks, and build-lock serialization.
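The per-phase rules above can be sketched as a small decision function. This is only an illustration, not the actual edit_gate.py: the function names, the test-file heuristic (a parent directory whose name ends in "Tests"), and the treatment of every other phase as "Phase 6+" are assumptions.

```python
from pathlib import PurePosixPath

MAX_CODE_FILES = 5  # scope limit cited in the issue


def is_test_file(path: str) -> bool:
    # Assumption: test files live under a directory whose name ends in "Tests".
    return any(part.endswith("Tests") for part in PurePosixPath(path).parts[:-1])


def edit_allowed(phase: str, path: str, code_affected: set) -> tuple:
    """Return (allowed, reason) for an edit to `path` in the given phase."""
    if phase == "phase4_tdd_red":
        if is_test_file(path):
            return True, "ok"
        return False, "RED phase: only test files are editable"
    if phase == "phase5_implement":
        if is_test_file(path):
            return False, "implementation phase: test changes are blocked"
        return True, "ok"
    # Simplification: any other phase is handled like Phase 6+ (scope check only).
    if len(code_affected | {path}) > MAX_CODE_FILES:
        return False, f"scope limit: more than {MAX_CODE_FILES} code files affected"
    return True, "ok"
```

Returning a reason alongside the boolean lets the hook print a message that tells the agent which rule blocked the edit.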

2. 3-Checkpoint System — Fully Used in Practice

Both archived workflows show checkpoint1_approved, checkpoint2_approved, and checkpoint3_approved all set. The phase_listener.py hook sets these flags when it detects natural-language keywords in the user's messages ("stimmt", German for "that's right"; "go"; "commit"). Claude cannot set them directly.

Evidence from bug-background-meditation.json:

"checkpoint1_notes": "User approved at 2026-04-22T12:44:43",
"checkpoint2_notes": "User approved at 2026-04-22T14:41:54",
"checkpoint3_notes": "User approved at 2026-04-22T16:13:38"
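A minimal sketch of how keyword detection could produce notes in this format. The mapping from keyword to checkpoint is an assumption (the issue lists three keywords and three checkpoints but not which belongs to which), and apply_user_message is a hypothetical name, not a function from phase_listener.py.

```python
import re
from datetime import datetime

# Keywords quoted in the issue; which keyword sets which checkpoint is assumed.
KEYWORD_TO_CHECKPOINT = {
    "stimmt": "checkpoint1",
    "go": "checkpoint2",
    "commit": "checkpoint3",
}


def apply_user_message(state: dict, message: str, now: datetime) -> dict:
    """Set checkpoint flags for approval keywords found as whole words."""
    words = set(re.findall(r"\w+", message.lower()))
    for keyword, checkpoint in KEYWORD_TO_CHECKPOINT.items():
        if keyword in words:
            state[f"{checkpoint}_approved"] = True
            state[f"{checkpoint}_notes"] = (
                f"User approved at {now.isoformat(timespec='seconds')}"
            )
    return state
```

Matching whole words rather than substrings keeps a message such as "going on" from being read as "go".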

3. TDD RED/GREEN Cycle — Consistently Enforced

Evidence from bug-background-meditation.json:

"ui_test_red_result": "3/3 runs FAILED - test_backgroundForeground_sessionStillRunning consistently fails",
"green_test_result": "UI Tests GREEN after Finding-1-Fix: both BackgroundMeditationUITests passed"

The post_bash.py hook auto-warns when tests pass during RED phase or fail during GREEN phase.
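The warning logic might look roughly like this. It assumes that RED corresponds to phase4_tdd_red and GREEN to phase5_implement, and that the hook already knows whether the test run passed. The function name and the message wording are illustrative.

```python
from typing import Optional


def phase_warning(phase: str, tests_passed: bool) -> Optional[str]:
    """Return a warning if the test result contradicts the TDD phase."""
    if phase == "phase4_tdd_red" and tests_passed:
        return "Tests passed during RED: the new test does not fail yet."
    if phase == "phase5_implement" and not tests_passed:
        return "Tests failed during GREEN: the implementation is incomplete."
    return None  # the result matches what the phase expects
```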

4. Adversary Findings System — Granular, Traceable

The add-finding CLI creates structured records with id, title, impact, proof, and a per-finding resolution status (fix/accept/defer). Evidence: bug-background-meditation.json contains 3 findings, each resolved separately.
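For illustration, a finding record with these fields could be modeled as follows. The field types, the class name, and the resolve helper are assumptions. The actual JSON layout written by add-finding is not shown in this issue.

```python
from dataclasses import dataclass
from typing import Optional

RESOLUTIONS = {"fix", "accept", "defer"}  # statuses named in the issue


@dataclass
class Finding:
    id: str
    title: str
    impact: str
    proof: str
    resolution: Optional[str] = None  # None until the finding is resolved

    def resolve(self, status: str) -> None:
        # Reject anything other than the three known statuses.
        if status not in RESOLUTIONS:
            raise ValueError(f"unknown resolution: {status!r}")
        self.resolution = status

    @property
    def is_resolved(self) -> bool:
        return self.resolution in RESOLUTIONS
```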

5. Information Isolation Between Agents — Designed and Followed

The orchestrator commands (10-bug.md, 11-feature.md) explicitly define what each agent receives and what it must not see (e.g., User-Advocate gets no code, Investigators don't see each other's results, Adversary gets no analysis or Developer report). This is enforced by convention, not hooks.

6. Localization Gate at Commit

bash_gate.py blocks git commit unless localize_checked: true OR no_user_strings: true. This prevents shipping untranslated strings.
Evidence: memory entry bug38-uppercase-investigation.md — 163 missing DE/EN translations were the root cause of uppercase debug keys; the gate would have caught this if committed through the workflow.
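Taken together with the checkpoint and issue-reference checks from section 1, the commit gate can be sketched as one function that collects every blocking reason. The "#123" pattern for an issue reference and the function name are assumptions. The flag names are the ones quoted above.

```python
import re


def commit_block_reasons(state: dict, commit_message: str) -> list:
    """Return the reasons a `git commit` would be blocked (empty list = allowed)."""
    reasons = []
    if not state.get("checkpoint3_approved"):
        reasons.append("checkpoint3_approved is not set")
    # Assumption: an issue reference looks like "#123" in the commit message.
    if not re.search(r"#\d+", commit_message):
        reasons.append("commit message has no GitHub Issue reference")
    if not (state.get("localize_checked") or state.get("no_user_strings")):
        reasons.append("neither localize_checked nor no_user_strings is set")
    return reasons
```

Reporting all reasons at once, rather than stopping at the first, saves a round trip when several conditions are unmet.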


What Doesn't Work (Evidence-Based)

1. red_test_done Flag Never Set (Only UI Variant)

Both archived workflows show "red_test_done": false alongside "ui_test_red_done": true. The standard unit test RED artifact is never registered because the project uses UI tests exclusively for TDD. This means the edit_gate RED-check (step 11 in edit_gate.py) always falls through to the ui_test_red_done fallback. The primary flag is effectively dead code.
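The fallback behaviour described here amounts to a single "either flag" check. A sketch, using the two flag names from the archived workflows (the function name is hypothetical):

```python
def tdd_red_done(state: dict) -> bool:
    """True if either the unit-test or the UI-test RED artifact is registered."""
    return bool(state.get("red_test_done") or state.get("ui_test_red_done"))
```

With the archived values ("red_test_done": false, "ui_test_red_done": true) this returns True, which is why the gate passes even though the primary flag is never set.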

2. Adversary "AMBIGUOUS" Verdict Has No Enforcement Path

bug-background-meditation.json shows "adversary_verdict": "AMBIGUOUS", yet the workflow still proceeded to checkpoint3_approved and commit. bash_gate.py blocks only on unresolved findings, not on the AMBIGUOUS verdict itself. At the gate level, an AMBIGUOUS verdict with all findings resolved is indistinguishable from VERIFIED.
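A gate that closes this gap could look like the sketch below. The "findings" list, the "resolution" key, and the "verdict_override" flag are assumed names. Only "adversary_verdict" and the verdict values appear in the archived workflow.

```python
RESOLVED = {"fix", "accept", "defer"}


def verdict_blocks_commit(state: dict) -> bool:
    """Block on unresolved findings, and on BROKEN/AMBIGUOUS without an override."""
    findings = state.get("findings", [])  # assumed key
    if any(f.get("resolution") not in RESOLVED for f in findings):
        return True
    if state.get("adversary_verdict") in {"BROKEN", "AMBIGUOUS"}:
        # Assumed flag, to be set only by an explicit user override.
        return not state.get("verdict_override", False)
    return False
```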

3. Trial-and-Error Despite Analysis-First Principle

Memory entry for Bug 38 (xcstrings uppercase keys):

"12 attempts with .textCase, NavigationView, scheme flags, etc. were all dead ends"

Despite the analysis-first standard, the root cause (missing xcstrings translations) was found only after exhaustive trial-and-error. The workflow's Analysis phase did not prevent this. The phase2_analyse structure does not mandate "falsification of candidates before fixing."

4. Checkpoint Bypass Was Attempted

Memory entry feedback_no_checkpoint_bypass.md states: "Never set WORKFLOW_CALLER=phase_listener manually". This feedback exists because the bypass was actually attempted. The phase_listener.py guard (os.environ.get("CLAUDE_ADMIN")) is the only protection, and WORKFLOW_CALLER itself is not validated by workflow.py.

5. Scope Enforcement Is File-Count-Only, Not LoC

edit_gate.py checks len(code_affected) > 5 but does not count lines changed. The CLAUDE.md rule of ±250 LoC has no automated enforcement. Commits like f1b9a4a fix: Add 163 missing DE/EN translations (likely >250 LoC) could pass the gate undetected.
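A LoC check could parse the summary line that git diff --shortstat prints. The sketch below counts insertions plus deletions against the limit, which is one possible reading of "±250 LoC". The function names and the use of --cached (staged changes only) are assumptions.

```python
import re
import subprocess

LOC_LIMIT = 250  # project limit from CLAUDE.md, as cited in the issue


def parse_shortstat(output: str) -> int:
    """Sum insertions and deletions from a `git diff --shortstat` summary line."""
    ins = re.search(r"(\d+) insertions?\(\+\)", output)
    dels = re.search(r"(\d+) deletions?\(-\)", output)
    return (int(ins.group(1)) if ins else 0) + (int(dels.group(1)) if dels else 0)


def staged_loc_delta() -> int:
    """LoC delta of the staged changes in the current repository."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--shortstat"],
        capture_output=True, text=True, check=True,
    )
    return parse_shortstat(result.stdout)


def exceeds_loc_limit(delta: int, limit: int = LOC_LIMIT) -> bool:
    return delta > limit
```

Keeping the parser separate from the subprocess call makes the threshold logic testable without a repository.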


Gaps / Blind Spots

1. No Workflow Metrics / Observability

No system measures how often each phase catches issues, the average time per phase, how often the Adversary verdict is BROKEN vs. VERIFIED, or how many findings are deferred vs. fixed. Without this data, it is impossible to know which phases add the most value.

2. Phase Transitions Are Unguarded

Hooks block code edits per phase, but phase transitions themselves are not enforced by hooks — the orchestrator calls workflow.py phase phase5_implement directly. Claude could skip from phase2_analyse to phase5_implement without checkpoints. The only enforcement is the keyword-gated checkpoint flags (which block the commit but not the transition).
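One way to guard transitions is to compare the phase numbers embedded in the phase names and refuse forward jumps of more than one phase, while logging every attempt. This is a sketch under assumptions: the "phase" and "phase_log" keys, the function names, and the "at most one phase forward" rule are not taken from workflow.py.

```python
import re
from datetime import datetime, timezone


def phase_number(name: str) -> int:
    """Extract N from a phase name such as 'phase5_implement'."""
    match = re.match(r"phase(\d+)", name)
    if match is None:
        raise ValueError(f"not a phase name: {name!r}")
    return int(match.group(1))


def request_transition(state: dict, target: str, session: str) -> bool:
    """Allow moving back or one phase forward; record every attempt."""
    current = state["phase"]
    allowed = phase_number(target) <= phase_number(current) + 1
    state.setdefault("phase_log", []).append({
        "at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "session": session,
        "from": current,
        "to": target,
        "allowed": allowed,
    })
    if allowed:
        state["phase"] = target
    return allowed
```

With this rule, a jump from phase2_analyse to phase5_implement is rejected and still leaves a log entry, so skipped phases become visible after the fact.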

3. Agent Information Isolation Is Convention-Only

The critical isolation rules (User-Advocate sees no code, Adversary sees no analysis) are in the orchestrator's markdown instructions, not technically enforced. A future Claude session, or a new orchestrator, could violate these easily.

4. No Session Resilience Documentation

The .sessions.json multi-session mapping exists, but there's no documented recovery procedure when a session ends mid-phase. The user_override_token.json (currently untracked in git) is the escape hatch, but its use is not documented in the workflow commands.

5. Developer Agent Worktree Isolation Unclear

The 10-bug.md command says isolation: "worktree" for the Developer Agent, but .claude/worktrees/ exists as an empty directory and the archived workflows show no worktree artifact. Unclear whether worktree isolation is actually being used or silently skipped.

6. No Spec Quality Gate

The Spec-Writer agent produces a spec, Henning approves it with a keyword, and the QA-Writer writes tests against it. But there's no check that the spec is testable — no acceptance criteria format, no "must include at least N acceptance tests" requirement. Vague specs produce vague tests.


Concrete Recommendations for agent-os-openspec

  1. Distinguish red_test_done from ui_test_red_done in standards — projects that use only UI tests for TDD should have a single unified tdd_red_done flag, or the standard should clarify the difference and which gate checks which.

  2. Adversary AMBIGUOUS must block commit: add a gate that treats adversary_verdict == "AMBIGUOUS" the same as BROKEN even when zero findings remain unresolved, requiring an explicit override from Henning. "AMBIGUOUS but all findings resolved" is a false green.

  3. Analysis-First standard should include candidate falsification — before fixing, the analyst must list alternative root cause candidates and explicitly eliminate each. This prevents the "12 attempts" anti-pattern.

  4. Add LoC counting to scope guard — the edit_gate already tracks affected_files; extend it with a git diff --shortstat check on modified files. Block if total LoC delta exceeds the project limit (250 in this project).

  5. Spec acceptance criteria format standard — define a mandatory section in specs: ## Acceptance Criteria with Given/When/Then or similar testable format. The QA-Writer should refuse to write tests without it.

  6. Phase transition audit trail: workflow.py phase <phase> should log the calling context (timestamp, session) so that skipped phases can be audited.

  7. Information isolation as hook — consider a lightweight context_gate.py hook that reads agent role from session metadata and blocks tool calls that violate isolation rules (e.g., User-Advocate trying to Read .swift files).
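As a rough sketch of recommendation 7, a context_gate.py hook could map each agent role to path patterns it must not read. The role identifier "user-advocate", the table layout, and the function name are hypothetical. Only the example rule (User-Advocate must not read .swift files) comes from this issue.

```python
from fnmatch import fnmatchcase

# Assumed role name and patterns; extend per agent as isolation rules require.
FORBIDDEN_READS = {
    "user-advocate": ["*.swift"],
}


def read_allowed(role: str, path: str) -> bool:
    """False if `path` matches a pattern the given role must not read."""
    patterns = FORBIDDEN_READS.get(role, [])
    return not any(fnmatchcase(path, pattern) for pattern in patterns)
```

The hook would still need a trustworthy source for the role (session metadata, as suggested above). The lookup itself is the easy part.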


🤖 Generated with Claude Code


Labels: workflow-feedback (Feedback from a project implementing the workflow spec)
