[Feedback] gregor_zwanzig — Workflow analysis

## Project context

`gregor_zwanzig` is a headless Python service that normalises weather data and emits compact reports (SMS ≤160 chars, HTML email) for long-distance hikers. Stack: Python + uv + pytest, Go API binary, Svelte frontend, NiceGUI admin UI, deployed via systemd to a Hetzner VPS with a separate staging environment. The project has been running the OpenSpec 8-phase workflow with adversary verification since Epic #77 (commit `7e86270`, 2026-04-18). 170 commits have landed on `main` since April; almost every commit is tied to a GitHub Issue ID.

## What works (with evidence)

- **Hook-enforced phase gates actually block bypass attempts.** `.claude/settings.json` chains 14 PreToolUse hooks on `Edit|Write` (workflow_gate, spec_enforcement, tdd_enforcement, red_test_gate, post_implementation_gate, scope_guard, …). The system caught at least two real bypass attempts that ended up in long-term memory: editing `legacy_entities.txt` to whitelist a missing spec (memory `feedback_no_workflow_bypass.md`) and trying to "hot-fix" a worktree-routing bug directly in main (memory `feedback_workflow_strict.md`, 2026-05-02). In both cases the hooks won.
- **External validator as isolated `claude --print` session caught real bugs that internal tests missed.** `.claude/validate-external.sh` spawns a fresh Claude with no conversation history, only spec + running app. Issue #111 result: first run was AMBIGUOUS because Python-loader output wasn't HTTP-observable. After adding an internal endpoint (Issue #115, commit `23be83c`), a second validator run found **3 real bugs**, one CRITICAL HTTP 500 on `aggregation.profile=null` (commit `4d57330`). Documented in memory `feedback_validator_observability_first.md`.
- **Product Owner Pattern with worktree-isolated developer agent.** Main context delegates implementation to a developer agent (Opus, isolated git worktree). 44 entries in `.claude/worktrees/` show heavy use. Role split is enforced by the Phase 6 command (`5-implement.md` lines 230–236, translated from the German original: "You DO … spawn the developer agent … You do NOT … write or edit code").
- **Adversary verification with tri-state verdict.** Phase 6b (`implementation-validator` agent on Sonnet) returns VERIFIED / BROKEN / AMBIGUOUS. AMBIGUOUS verdicts are *not* waved through: memory captures the rule that AMBIGUOUS triggers an observability-first response, not dismissal.
- **Real integration tests, no mocks.** CLAUDE.md prohibits `Mock()/patch()/MagicMock`. Email tests round-trip through Gmail SMTP + IMAP. API tests hit the real provider. Result: zero "tests passed but prod broke" incidents in the last 50 commits.
- **Memory system distils repeated mistakes into actionable rules.** 13 feedback memories in `~/.claude/projects/-home-hem-gregor-zwanzig/memory/`, each with a concrete originating session and dated incident. Examples: `feedback_post_push_workflow.md` (5-step Push → staging-wait → staging-validate → prod-deploy → validator-vs-prod, anchored in Issues #113 and #114), `feedback_validator_after_push.md` (validator order matters; pre-push runs hit old code).
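
The tri-state verdict rule above can be sketched as a small routing function. This is only an illustration under stated assumptions: `Verdict`, `next_action`, and the action strings are hypothetical names, not the project's code. Only the three verdict values and the rule that AMBIGUOUS leads to an observability fix rather than a pass come from the findings above.

```python
from enum import Enum


class Verdict(Enum):
    VERIFIED = "VERIFIED"
    BROKEN = "BROKEN"
    AMBIGUOUS = "AMBIGUOUS"


def next_action(verdict: Verdict) -> str:
    """Map a validator verdict to a follow-up step (hypothetical action names)."""
    if verdict is Verdict.VERIFIED:
        return "proceed"
    if verdict is Verdict.BROKEN:
        return "return-to-developer"
    # AMBIGUOUS is never treated as a pass: the validator could not observe
    # the behaviour, so the observability gap is closed before re-validating.
    return "add-observability-then-revalidate"
```

Keeping AMBIGUOUS as the fall-through branch means any verdict that is neither a clear pass nor a clear failure ends up on the observability path by default.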

## What does not work (with evidence)

- **Worktree state-routing was repeatedly broken.** `workflow_state.json` is a single 137KB central file shared across worktrees. Issue #112 (commit `510717b`) fixed the initial routing. Then `tdd_enforcement` still resolved artifact paths against the worktree root instead of main repo root (commit `28d5b22`). Then `active_workflow` drift between parallel worktrees pushed artifacts into the wrong workflow **twice in one session** (memory `feedback_workflow_state_explicit_name.md`, 2026-05-04). The shared-state design is the root cause; per-fix patches haven't fully eliminated drift.
- **Validator order vs. push direction is a footgun.** Default validator URL is production. Running validator pre-push hits old prod code → false-negative AMBIGUOUS. Documented in memory `feedback_validator_after_push.md` and Issue #113. Fix is procedural, not enforced — nothing in the workflow stops you from running it in the wrong order.
- **`systemctl restart` ≠ deploy.** Three Python-only commits restarted services without rebuilding the Go binary or frontend → 23 h of code drift before BetterStack alerted (memory `feedback_post_push_workflow.md`). Required adding `deploy-gregor-prod.sh` and a drift monitor (`check-gregor20.sh`). The workflow had no concept of "what counts as deployed".
- **Schema rework caused silent data loss.** Issue #102 (commit `b0a3576`): a refactor lost 3 of 4 stages of the GR221 trip. Recovery only worked because GPX files happened to survive in a stash. Triggered a new `data_schema_backup.py` hook and a "Daten-Schema-Reworks" section in CLAUDE.md. The original spec/TDD-RED loop did not catch the regression because acceptance criteria didn't cover persistence-survives-edit semantics.
- **Spec-Approval gating is grep-fragile.** Approval is encoded as `- [ ] Approved` / `- [x] Approved` in the spec markdown. There is no schema validation that acceptance criteria are present, traceable to tests, or distinct from boilerplate. Several specs in `docs/specs/modules/` use the template structure inconsistently.
- **Phase 6 fix-loop has no telemetry.** `5-implement.md` says "max 3 iterations" but the state file does not count them. There's no audit trail of how often a workflow bounced between developer and validator, so chronic problem features can't be detected.
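
The missing fix-loop telemetry could be closed with a persisted counter. The sketch below is an assumption about how that might look, not existing project code: `record_iteration`, the `fix_loop_iterations` key, and the state file layout are hypothetical. Only the limit of 3 iterations comes from `5-implement.md` as quoted above.

```python
import json
from pathlib import Path

MAX_ITERATIONS = 3  # the limit quoted from 5-implement.md


def record_iteration(state_path: Path, phase: str) -> int:
    """Increment and persist the fix-loop counter for one phase.

    Raises RuntimeError once the maximum would be exceeded, so the loop
    fails closed instead of silently continuing.
    """
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    # Hypothetical key: one counter per phase inside the workflow's state.
    counters = state.setdefault("fix_loop_iterations", {})
    count = counters.get(phase, 0) + 1
    if count > MAX_ITERATIONS:
        raise RuntimeError(f"{phase}: fix loop exceeded {MAX_ITERATIONS} iterations")
    counters[phase] = count
    state_path.write_text(json.dumps(state, indent=2))
    return count
```

Because the counter lives in the state file rather than in the agent's context, it survives session restarts and can be read later to spot features that repeatedly bounce between developer and validator.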

## Gaps / blind spots

- **No machine-readable spec format.** Specs are free-form markdown. Approval is grep-detected; acceptance criteria are not parsed; no spec → test traceability map. RED-test artifacts are registered with a free-text description, not linked to specific criteria.
- **Validator observability is bespoke per project.** The Issue #111 → #115 lesson ("AMBIGUOUS means observability gap, not 'tests cover it'") is encoded as a memory, not a workflow contract. A standard spec should declare its required black-box surface.
- **Single-file workflow state.** 137KB JSON, mixes per-workflow phase data, artifacts, approvals, and active-workflow pointer. Concurrent worktrees cause drift. Per-workflow files would be more robust and would not require the "always pass workflow name explicitly" workaround documented in memory.
- **No standard deploy contract.** "What counts as deployed" had to be discovered the hard way (Issue #113). The workflow does not check that a binary build, a frontend build, and a smoke test all happened before declaring "done".
- **Memory rules can rot.** Feedback memories reference file paths and module names that may move; nothing audits them against the current repo. `feedback_workflow_state_explicit_name.md` is already a workaround, not a permanent fix.
- **Cross-repo dependency contract is informal.** This project notifies sibling Claude instances (`claude-mq`) via free-text messages. Specs that span multiple repos (e.g. nginx config in `henemm-infra` for a new endpoint) have no formal link.
- **GitHub Issue ↔ Spec linkage is by convention only.** Commits cite issue numbers, but specs in `docs/specs/modules/` don't always reference issues, and issues don't list their spec path. Bidirectional traceability would help auditing.
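
To make the spec → test traceability gap concrete, here is a minimal sketch of the check a machine-readable spec would allow. Everything in it is assumed for illustration: the `trace_gaps` helper, the criterion IDs, and the test paths are hypothetical, and the spec is shown as an already-parsed dict rather than in any particular file format.

```python
def trace_gaps(
    criteria: list[str], tests: dict[str, list[str]]
) -> tuple[list[str], list[str]]:
    """Return (criteria without tests, test entries pointing at unknown criteria)."""
    untested = [c for c in criteria if not tests.get(c)]
    orphaned = [key for key in tests if key not in criteria]
    return untested, orphaned


# Hypothetical parsed spec; IDs and test paths are made up for the example.
spec = {
    "acceptance_criteria": ["AC-1", "AC-2", "AC-3"],
    "tests": {
        "AC-1": ["tests/test_report.py::test_sms_length"],
        "AC-3": [],
        "AC-9": ["tests/test_report.py::test_removed_feature"],
    },
}

untested, orphaned = trace_gaps(spec["acceptance_criteria"], spec["tests"])
print(untested)  # ['AC-2', 'AC-3']
print(orphaned)  # ['AC-9']
```

A gate built on this would reject a spec whose `untested` list is non-empty, which free-form markdown with a grep-detected checkbox cannot express.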

## Concrete recommendations to agent-os-openspec

1. **Adopt a structured spec frontmatter** with parsable fields: `acceptance_criteria` (list of IDs), `observability_requirements` (HTTP endpoints / log signatures the validator can check), `tests` (mapping criterion → test file/name). Replace grep-based approval with explicit `approved_at` timestamp + signer.
2. **Define a standard validator contract.** Every spec must declare which black-box surfaces the external validator can rely on. AMBIGUOUS verdict on a non-declared surface should auto-block; on a declared surface it falls back to user review. This codifies the Issue #111 lesson.
3. **Move workflow state to per-workflow files** (`docs/artifacts/<workflow>/state.json`) plus a tiny `active.json` pointer. Eliminates the central-file drift class entirely.
4. **Add a deploy contract phase** with explicit gates: artifact-built, smoke-tested, drift-monitor-clear. `systemctl restart` alone must not satisfy "done".
5. **Track fix-loop iterations as first-class state.** Persist a counter per workflow per phase, surface it in `/status`, fail closed at the configured max.
6. **Schema-rework template:** mandatory pre-snapshot, post-restore-test, and acceptance criterion "no field of any persisted record is unreadable after migration". Generalise from the GR221 incident.
7. **Bi-directional GitHub Issue ↔ Spec link** as a workflow primitive: spec frontmatter has `issue:` field, `gh issue` is annotated with `spec:` path. CI fails if mismatch.
8. **Memory audit hook:** when memories reference file paths, periodically check the paths still exist; flag stale memories. Prevents the "rule about a deleted file" failure mode.
9. **Adopt the "tech-free user-facing language" convention as an explicit role contract** in agent definitions, with examples of bad vs. good phrasing; currently it lives in memory only and is re-learned each session.
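
Recommendation 3 could look roughly like the sketch below. It is a sketch under assumptions, not a proposed final design: the helper names are hypothetical, the location of `active.json` next to the workflow directories is a guess, and `repo_root` is assumed to be the main repository root rather than a worktree root. Only the `docs/artifacts/<workflow>/state.json` layout, the `active.json` pointer, and the `active_workflow` name come from the text above.

```python
import json
import os
from pathlib import Path


def state_path(repo_root: Path, workflow: str) -> Path:
    """Per-workflow state file, as proposed in recommendation 3."""
    return repo_root / "docs" / "artifacts" / workflow / "state.json"


def _write_json_atomic(path: Path, data: dict) -> None:
    # Write to a temporary sibling file, then rename over the target, so a
    # reader never sees a half-written state file.
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(data, indent=2))
    os.replace(tmp, path)


def save_state(repo_root: Path, workflow: str, state: dict) -> None:
    _write_json_atomic(state_path(repo_root, workflow), state)


def load_state(repo_root: Path, workflow: str) -> dict:
    path = state_path(repo_root, workflow)
    return json.loads(path.read_text()) if path.exists() else {}


def set_active(repo_root: Path, workflow: str) -> None:
    # Assumed location for the pointer file; it holds only the workflow name.
    pointer = repo_root / "docs" / "artifacts" / "active.json"
    _write_json_atomic(pointer, {"active_workflow": workflow})
```

With this layout, two worktrees writing to different workflows touch different files, so a stale active-workflow pointer can at worst select the wrong workflow; it cannot overwrite another workflow's artifacts.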
