# Spec Enhancement: Workflow Observability & Measurement (#6)

## Summary

Three projects running the OpenSpec 8-phase workflow independently identified the same structural gap: **zero machine-readable record of workflow execution**. The projects are FocusBlox (711 commits, 37+ cycles), Meditationstimer (iOS/watchOS, Workflow v6), and gregor\_zwanzig (Python service, 170 commits since April). Without execution logs, it is impossible to measure which phases add value, calibrate the Adversary, detect chronic problem features, or learn across projects. The projects also identified two related spec gaps: (1) specs have no standardised acceptance-criteria format, so tests cannot be traced to criteria (all three projects), and (2) scope enforcement checks file count but not LoC delta (two of three projects, see Gap 3).

This proposal defines the minimal additions to the OpenSpec spec that would close these gaps.

---

## Evidence (from 3 projects)

### Gap 1 — No Workflow Execution Log (3/3 projects)

**FocusBlox:** *"No systematic log exists. When asked 'what works and what doesn't?', the only available data is anecdotal memory entries. There is no structured record of: how often phases were skipped, adversary true/false positive rate, scope compliance, TDD discipline rate."*

**Meditationstimer:** *"No system measures: how often each phase catches issues, average time per phase, how often the Adversary is BROKEN vs VERIFIED, how many findings are deferred vs fixed. Without data, it's impossible to know which phases add the most value."*

**gregor\_zwanzig:** *"Phase 6 fix-loop has no telemetry. `5-implement.md` says 'max 3 iterations' but state file does not count them. There's no audit trail of how often a workflow bounced between developer and validator, so chronic problem features can't be detected."*

### Gap 2 — No Testable Acceptance Criteria Format (3/3 projects)

**FocusBlox:** *"No Reproducibility Spec — no machine-readable spec that describes acceptance criteria in a way another project could implement independently."*

**Meditationstimer:** *"No Spec Quality Gate — no check that the spec is testable — no acceptance criteria format, no 'must include at least N acceptance tests' requirement. Vague specs produce vague tests."*

**gregor\_zwanzig:** *"No machine-readable spec format. Approval is grep-detected; acceptance criteria are not parsed; no spec → test traceability map."* The approval mechanism is `- [ ] Approved` / `- [x] Approved` toggled by grep — no validation that AC are present, distinct from boilerplate, or traceable to tests.

### Gap 3 — Scope Enforcement Incomplete: File Count Only, No LoC (2/3 projects)

**FocusBlox:** *"Scope Guard Not Enforced — CLAUDE.md documents max 5 files / ±250 LoC. No hook enforces the LoC limit. Violations go undetected."*

**Meditationstimer:** *"Scope Enforcement Is File-Count-Only, Not LoC — `edit_gate.py` checks `len(code_affected) > 5` but does not count lines changed."*

### Gap 4 — Adversary Code-First Verification Not Enforced (2/3 projects)

**FocusBlox:** *"Adversary Produces False Positives — 3 of 6 findings in one session were false positives. The Adversary read the spec to generate findings but didn't verify against actual code."* Root cause: Adversary prompt does not require reading current implementation before reporting.

**Meditationstimer:** *"Adversary AMBIGUOUS verdict has no enforcement path — workflow still proceeded to `checkpoint3_approved` and commit. An AMBIGUOUS verdict with all findings resolved is indistinguishable from VERIFIED at the gate level."*

### Gap 5 — Phase Transition Audit Trail Missing (2/3 projects)

**Meditationstimer:** *"Phase transitions themselves are not enforced by hooks — the orchestrator calls `workflow.py phase phase5_implement` directly. Claude could skip from `phase2_analyse` to `phase5_implement` without checkpoints."*

**gregor\_zwanzig:** *"Track fix-loop iterations as first-class state. Persist a counter per workflow per phase, surface it in `/status`, fail closed at the configured max."*

### Cross-cutting confirmation — "Hooks are Law" (3/3 projects)

**FocusBlox (strongest evidence):** *"CLAUDE.md rules are followed with ~60–70% probability. Hooks with 100%. Documentation is a suggestion. Hooks are law."*

All three projects independently confirm: any spec requirement without a corresponding hook will eventually be skipped.

---

## Proposed Spec Additions

### S1 — Workflow Execution Log Schema (closes Gap 1)

Add a standard section to the workflow spec defining a mandatory execution log entry per completed workflow. The log is written to `.claude/workflows/_log/YYYY-MM-DD_<workflow-id>.yaml` and committed alongside the work.

Minimum required fields:

```yaml
workflow_id: FEAT_001
project: <project-name>
completed_at: 2026-05-09T14:22:00Z
phases_completed: [phase1_context, phase2_analyse, phase3_spec, phase4_approved, phase5_tdd_red, phase6_implement, phase7_validate]
phases_skipped: []
override_used: false
tdd_red_confirmed: true
adversary_verdict: VERIFIED        # VERIFIED | BROKEN | AMBIGUOUS
adversary_findings_total: 2
adversary_fix_loop_iterations: 1
scope_files_changed: 3
scope_loc_delta: +142
outcome: success                   # success | partial | reverted
```

`workflow.py complete` must refuse to archive a workflow without a valid log entry. The git-commit check in `bash_gate.py` must verify that the log file is staged when the workflow is in `phase8_complete`.
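The "valid log entry" check could be sketched roughly as follows. This is a minimal illustration, not the actual `workflow.py` code: the function name `validate_log_entry`, the error-list return style, and the assumption that the YAML file has already been loaded into a plain dict (with `completed_at` kept as a string) are all hypothetical. Only the field names and allowed values come from the schema above.

```python
from datetime import datetime

# Field names and enum values mirror the S1 schema; everything else is assumed.
REQUIRED_FIELDS = {
    "workflow_id": str,
    "project": str,
    "completed_at": str,
    "phases_completed": list,
    "phases_skipped": list,
    "override_used": bool,
    "tdd_red_confirmed": bool,
    "adversary_verdict": str,
    "adversary_findings_total": int,
    "adversary_fix_loop_iterations": int,
    "scope_files_changed": int,
    "scope_loc_delta": int,
    "outcome": str,
}
ALLOWED_VALUES = {
    "adversary_verdict": {"VERIFIED", "BROKEN", "AMBIGUOUS"},
    "outcome": {"success", "partial", "reverted"},
}


def validate_log_entry(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry is valid."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in entry:
            errors.append(f"missing field: {name}")
            continue
        value = entry[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}")
        elif name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]:
            errors.append(f"{name}: unexpected value {value!r}")
    stamp = entry.get("completed_at")
    if isinstance(stamp, str):
        try:
            # Older Python versions do not accept a trailing "Z".
            datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        except ValueError:
            errors.append("completed_at: not an ISO 8601 timestamp")
    return errors
```

A gate built on this would refuse to archive whenever the returned list is non-empty, which keeps the failure reason visible to the user.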

### S2 — Acceptance Criteria Format Standard (closes Gap 2)

Mandatory `## Acceptance Criteria` section in every spec. Each criterion must:

1. Have a unique ID: `AC-<N>`
2. Use testable format: `Given <precondition> / When <action> / Then <observable outcome>`
3. Reference at least one test (populated after TDD RED phase)

Example:
```markdown
## Acceptance Criteria

- **AC-1:** Given the app is in background / When the session timer fires / Then the session continues and elapsed time is correct on foreground.
  - Test: `BackgroundMeditationUITests.test_backgroundForeground_sessionStillRunning`
```

`edit_gate.py` must parse the spec file for `## Acceptance Criteria` before allowing phase6 edits. Block if section is missing or contains zero `AC-N:` entries. `qa_gate.py` must link adversary findings to AC IDs — a finding without an AC reference is flagged unverifiable.
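The section check described above might look something like the sketch below. The function names and regular expressions are assumptions for illustration; the real `edit_gate.py` may parse specs differently. The patterns are written to match the bullet style shown in the example (`- **AC-1:** ...`).

```python
import re
from typing import List, Optional

# Patterns are assumptions based on the example spec format above.
AC_HEADING = re.compile(r"^##[ \t]+Acceptance Criteria[ \t]*$", re.MULTILINE)
NEXT_HEADING = re.compile(r"^#{1,2}[ \t]", re.MULTILINE)
AC_ENTRY = re.compile(r"^[ \t]*[-*][ \t]+\**AC-(\d+):", re.MULTILINE)


def acceptance_criteria_ids(spec_text: str) -> Optional[List[str]]:
    """Return the AC IDs in the section, or None if the section is missing."""
    heading = AC_HEADING.search(spec_text)
    if heading is None:
        return None
    rest = spec_text[heading.end():]
    # The section ends at the next level-1 or level-2 heading, if any.
    following = NEXT_HEADING.search(rest)
    section = rest[:following.start()] if following else rest
    return [f"AC-{number}" for number in AC_ENTRY.findall(section)]


def phase6_edits_allowed(spec_text: str) -> bool:
    """Block when the section is missing or has zero AC-N entries."""
    ids = acceptance_criteria_ids(spec_text)
    return bool(ids)
```

Returning `None` for a missing section and an empty list for an empty one lets the gate print a different message for each case while still blocking both.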

### S3 — LoC Delta Enforcement (closes Gap 3)

Extend `edit_gate.py` to track cumulative LoC delta per workflow session via `git diff --shortstat` on modified files. Block when `max_loc_delta` threshold is exceeded (default: 250). Surface in `workflow.py status`:

```
Scope: 3/5 files, +142 LoC (limit: 250)
```

Per-workflow override via `workflow.py set-field loc_limit_override 500` for legitimate exceptions (e.g. bulk translation commits). Configurable `loc_exclude_patterns` in `openspec.yaml` for generated files.
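As a rough sketch of the LoC check, the summary line printed by `git diff --shortstat` can be parsed and compared against the limits. The helper names are hypothetical, and the proposal does not say whether "LoC delta" means net change or total churn; this sketch assumes net change (insertions minus deletions), which is an assumption worth settling before implementation.

```python
import re
import subprocess
from typing import Tuple


def parse_shortstat(output: str) -> Tuple[int, int, int]:
    """Extract (files, insertions, deletions) from `git diff --shortstat`."""
    def grab(pattern: str) -> int:
        match = re.search(pattern, output)
        return int(match.group(1)) if match else 0

    return (
        grab(r"(\d+) files? changed"),
        grab(r"(\d+) insertions?\(\+\)"),
        grab(r"(\d+) deletions?\(-\)"),
    )


def scope_status(output: str, max_files: int = 5,
                 max_loc_delta: int = 250) -> Tuple[bool, str]:
    """Return (blocked, status line) for a shortstat summary."""
    files, insertions, deletions = parse_shortstat(output)
    delta = insertions - deletions  # assumption: net change, not churn
    line = f"Scope: {files}/{max_files} files, {delta:+d} LoC (limit: {max_loc_delta})"
    blocked = files > max_files or abs(delta) > max_loc_delta
    return blocked, line


def current_shortstat() -> str:
    """Run git in the current repository and return the raw summary line."""
    result = subprocess.run(["git", "diff", "--shortstat"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Under the net-change reading, a refactor that adds 300 lines and removes 300 reports a delta of 0 and passes. A stricter gate could compare `insertions + deletions` against the limit instead.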

### S4 — Adversary Code-First Requirement (closes Gap 4)

Add to `implementation-validator.md`: **every finding must include a file:line reference** obtained by reading the actual current implementation, not only the spec. A finding without a code reference is rejected as malformed.

Required finding format:
```
Finding #N — <title>
Severity: CRITICAL | HIGH | MEDIUM | LOW
Code reference: path/to/file.py:42
Evidence: <what the code actually does>
Spec requirement: AC-N says <X>
Conflict: <why this is a violation>
```

Confirmations (AC satisfied) must be listed explicitly alongside findings to prove coverage.
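Rejecting malformed findings could be approximated with a small validator like the one below. It is only a sketch: the function names, the error messages, and the exact `path:line` pattern are assumptions, and only the labels and severity values come from the required format above.

```python
import re
from typing import Dict, List

# Labels and severities follow the required finding format above.
REQUIRED_LABELS = ("Severity", "Code reference", "Evidence",
                   "Spec requirement", "Conflict")
SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}
CODE_REF = re.compile(r"^\S+:\d+$")  # assumed shape: path/to/file.py:42


def parse_finding(text: str) -> Dict[str, str]:
    """Collect 'Label: value' lines for the labels the format requires."""
    fields = {}
    for line in text.strip().splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() in REQUIRED_LABELS:
            fields[label.strip()] = value.strip()
    return fields


def finding_errors(text: str) -> List[str]:
    """Return problems with one finding; an empty list means well-formed."""
    fields = parse_finding(text)
    errors = [f"{label}: missing" for label in REQUIRED_LABELS
              if not fields.get(label)]
    severity = fields.get("Severity")
    if severity and severity not in SEVERITIES:
        errors.append("Severity: unknown value")
    reference = fields.get("Code reference")
    if reference and not CODE_REF.match(reference):
        errors.append("Code reference: expected path:line")
    requirement = fields.get("Spec requirement")
    if requirement and not re.search(r"\bAC-\d+\b", requirement):
        errors.append("Spec requirement: no AC ID")
    return errors
```

A syntactic check like this cannot prove the Adversary actually read the file, so it only guards against findings that omit the reference entirely.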

Handling of the AMBIGUOUS verdict: at `bash_gate.py`, AMBIGUOUS must be treated identically to BROKEN even when zero findings remain open. Only `workflow.py override-ambiguous "<reason>"` (which requires a user keyword via `phase_listener.py`) may unlock the commit.
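The gate rule above reduces to a small decision function. The sketch below is illustrative only: `commit_allowed` and its parameters are hypothetical names, and it assumes the override reason recorded by `workflow.py override-ambiguous` is available to the gate as a string.

```python
from typing import Optional


def commit_allowed(verdict: str,
                   ambiguous_override_reason: Optional[str] = None) -> bool:
    """Decide whether the commit gate opens for a given adversary verdict."""
    if verdict == "VERIFIED":
        return True
    if verdict == "AMBIGUOUS":
        # Treated like BROKEN unless a non-empty override reason was recorded.
        return bool(ambiguous_override_reason
                    and ambiguous_override_reason.strip())
    # BROKEN and any unknown verdict fail closed.
    return False
```

Failing closed on unknown verdict strings means a typo or a new, unhandled verdict value blocks the commit instead of slipping through.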

### S5 — Phase Transition Audit Trail (closes Gap 5)

`workflow.py phase <new_phase>` must append to `phase_transitions` in workflow state:

```json
{
  "phase_transitions": [
    {"from": "phase2_analyse", "to": "phase3_spec", "at": "2026-05-09T10:00:00Z", "trigger": "command"},
    {"from": "phase3_spec", "to": "phase4_approved", "at": "2026-05-09T10:31:00Z", "trigger": "user_keyword"}
  ]
}
```

`trigger` values: `user_keyword` | `command` | `manual`. Transitions with `trigger: manual` that skip phases emit a warning (not a block). The fix-loop iteration counter is incremented each time `phase6_implement` is re-entered after `phase6b_adversary`. The counter is surfaced in `workflow.py status` and included in the execution log.
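A minimal sketch of the append-and-count behaviour, assuming the workflow state is a plain dict. The keys `current_phase` and `fix_loop_iterations` and the function name `record_transition` are hypothetical; only `phase_transitions`, its entry fields, and the trigger values come from the text above. "Re-entered after `phase6b_adversary`" is read here as a direct transition from that phase.

```python
from datetime import datetime, timezone
from typing import Optional

TRIGGERS = {"user_keyword", "command", "manual"}


def record_transition(state: dict, new_phase: str, trigger: str,
                      now: Optional[datetime] = None) -> dict:
    """Append one transition to the state and update the fix-loop counter."""
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger}")
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    previous = state.get("current_phase")
    state.setdefault("phase_transitions", []).append({
        "from": previous,
        "to": new_phase,
        "at": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "trigger": trigger,
    })
    # One fix-loop iteration per bounce from the adversary back to implement.
    if previous == "phase6b_adversary" and new_phase == "phase6_implement":
        state["fix_loop_iterations"] = state.get("fix_loop_iterations", 0) + 1
    state["current_phase"] = new_phase
    return state
```

Passing `now` explicitly keeps the function deterministic in tests; in normal use it would be omitted so the current UTC time is recorded.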

---

## Open Questions

1. **Log storage:** `.claude/workflows/_log/` (project-local, committed) vs. optional central aggregation endpoint. Recommendation: local-first, add optional `log_export_url` config field later.

2. **False-positive annotation:** A field such as `adversary_findings_false_positives`, which is not yet among the S1 minimum fields, requires human judgement after the session. Options: (a) manual annotation in a PR comment; (b) a `workflow.py annotate-finding <id> false-positive` command enforced by `bash_gate.py`. A decision is needed before S1 is implemented.

3. **LoC exclusions for generated files:** Translation files (`*.xcstrings`, `*.po`) and codegen output should be excludable. Should the values of `loc_exclude_patterns` be defined per project, or should the standard config ship a default set?

4. **AC-N enforcement on existing specs:** Should `edit_gate.py` enforce the AC-N format for existing specs (warn-only) or only for new specs (hard-block)? Recommendation: warn for existing, block for new.

5. **AMBIGUOUS verdict resolution rule:** If 4/5 findings are FIXED/ACCEPT and 1 is DEFER, is the verdict AMBIGUOUS or BROKEN? Suggest: AMBIGUOUS only if all findings are FIXED or ACCEPT; BROKEN if any finding remains DEFER with no explicit user override.

---

## Source Issues

- #3 — [Feedback] FocusBlox (iOS/macOS SwiftUI) — Workflow Analysis
- #4 — [Feedback] Meditationstimer — Workflow-Analyse (Workflow v6, Agent Orchestrator Pattern)
- #5 — [Feedback] gregor\_zwanzig — Workflow analysis
