
docs(roles): document FIX-LANDED protocol for clearing stand-down state #8

Open
veryviolet wants to merge 1 commit into main from docs/stand-down-clearance-protocol


@veryviolet

Owner

Summary

  • STAND-KEEPER, TESTER, and DEVELOPER role docs now describe the canonical inbox-wake body, "FIX-LANDED for <task-id> stand-profile <profile>. Worktree ... carries iter-<N> changes: ...", which SK reads when deciding whether to transition the stand from down to free.
  • Pure documentation change. No CLI surface, no schema, no behavior changes in code.
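
For illustration, here is a minimal sketch of how a reader of the wake body could pattern-match the shape quoted above. It is not code from this PR: `FixLanded`, `FIX_LANDED_RE`, and `parse_fix_landed` are hypothetical names. The two elided parts of the shape (the text after "Worktree" and after "changes:") are matched loosely because the summary does not spell them out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for the quoted body shape. The elided segments
# ("Worktree ..." and "changes: ...") are captured as free text.
FIX_LANDED_RE = re.compile(
    r"^FIX-LANDED for (?P<task_id>\S+) stand-profile (?P<profile>\S+)\.\s+"
    r"Worktree (?P<worktree>.+?) carries iter-(?P<iteration>\d+) changes:"
    r"\s*(?P<changes>.*)$",
    re.DOTALL,
)


@dataclass
class FixLanded:
    task_id: str
    profile: str
    worktree: str
    iteration: int
    changes: str


def parse_fix_landed(body: str) -> Optional[FixLanded]:
    """Return the parsed fields, or None if the body is not FIX-LANDED shaped."""
    match = FIX_LANDED_RE.match(body.strip())
    if match is None:
        return None
    return FixLanded(
        task_id=match["task_id"],
        profile=match["profile"],
        worktree=match["worktree"].strip(),
        iteration=int(match["iteration"]),
        changes=match["changes"].strip(),
    )
```

Under this sketch, a lease-shaped body such as "FOR THIS LEASE: load_profile(coord, 'mlgpu2'). Run mlgpu2.yaml." returns `None`, so it would not count as a clearance signal.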

Why

Empirically, the FSM already has a working clearance path: DEVELOPER (or TESTER) sends SK an inbox message announcing that an iter-N fix has landed in the relevant worktree, and SK then runs greatminds stand up. SK's behavior here has been consistent across many task cycles. But the protocol was never written down — TESTER.md and DEVELOPER.md mention stand lease / stand release but stop there; STAND-KEEPER.md says "after recovery, run stand up" without telling SK how to recognize recovery.

In a recent nginarea session this gap caused a multi-hour stall:

  • 0010 mlgpu2 deploy hit a real playbook bug (wrong model name + *.pt rsync exclude). SK correctly transitioned the stand to down with a detailed down_reason.
  • TESTER sent 3 wake-overrides to SK with bodies like "FOR THIS LEASE: load_profile(coord, 'mlgpu2'). Run mlgpu2.yaml." This was the wrong shape: it carried no FIX-LANDED signal, and it referenced an active lease that does not exist while the stand is down.
  • SK correctly ignored them (no active lease + no fix-landed semantic).
  • Both TESTER and DEVELOPER then asked MAINTAINER to manually run greatminds stand up --profile mlgpu2. The CLI is SK-only; MAINTAINER has no override.
  • Stand stayed down for ~3 hours until DEV sent a correctly-shaped wake.

This patch makes the canonical body discoverable from the three role docs that need it, and explicitly tells implementers not to route through MAINTAINER.

Design choices

  • FIX-LANDED prefix as the semantic marker. Plain English so SK can pattern-match the body without a structured CLI change. A future PR could add greatminds inbox send --kind fix-landed for fully structured signalling, but that's a larger surface change and not needed to unblock the immediate documentation gap.
  • SK should verify the worktree before transitioning. A small grep / diff check is enough — the friction is intentional and prevents stray re-deploys of unfixed code if the body is fabricated.
  • Explicit don't-ask-MAINTAINER note in DEVELOPER.md. stand up access_control is canon-gated to SK; documenting that the override doesn't exist closes the speculative escalation path.
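
The clearance decision described in these bullets could be sketched roughly as follows. This is an assumption-laden illustration, not SK's actual logic: `worktree_contains` and `should_clear_stand` are hypothetical helpers, the state name "down" is taken from the text, and the grep-style check stands in for whatever "small grep / diff check" SK really performs.

```python
from pathlib import Path


def worktree_contains(worktree: Path, needle: str) -> bool:
    """Grep-style check: does any regular file under `worktree` contain `needle`?"""
    if not worktree.is_dir():
        return False
    for path in worktree.rglob("*"):
        if not path.is_file():
            continue
        try:
            if needle in path.read_text(errors="ignore"):
                return True
        except OSError:
            # Unreadable files are skipped rather than treated as evidence.
            continue
    return False


def should_clear_stand(stand_state: str, body: str, worktree_verified: bool) -> bool:
    """Clear only a down stand, only on a FIX-LANDED body, only after verification."""
    return (
        stand_state == "down"
        and body.lstrip().startswith("FIX-LANDED ")
        and worktree_verified
    )
```

Keeping verification as a separate input makes the intended friction explicit: a well-shaped body alone is not enough to bring the stand back up if the claimed change cannot be found in the worktree.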

Test plan

  • Verify greatminds render-role STAND-KEEPER / TESTER / DEVELOPER includes the new sections in the rendered bootstrap text.
  • No behavior tests needed — pure documentation change.
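
The render check in the first bullet could be scripted along these lines. This is a sketch under two assumptions that the PR does not confirm: that `greatminds render-role <ROLE>` prints the bootstrap text to stdout, and that the new sections contain the literal string "FIX-LANDED". The helper names are hypothetical.

```python
import subprocess
from typing import Dict, List

ROLES = ("STAND-KEEPER", "TESTER", "DEVELOPER")


def render_role(role: str) -> str:
    # Assumption: the rendered bootstrap text is written to stdout.
    result = subprocess.run(
        ["greatminds", "render-role", role],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def roles_missing_marker(rendered: Dict[str, str], marker: str = "FIX-LANDED") -> List[str]:
    """Return the roles whose rendered text lacks the marker, in ROLES order."""
    return [role for role in ROLES if marker not in rendered.get(role, "")]
```

With the CLI installed, `roles_missing_marker({role: render_role(role) for role in ROLES})` would be expected to return an empty list once the patch is applied.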

🤖 Generated with Claude Code

Empirically, STAND-KEEPER clears a `down` stand via `greatminds stand
up` only after receiving an inbox wake whose body carries semantic
"fix landed" content from DEVELOPER or TESTER. This protocol was not
documented in any role doc, and the resulting discoverability gap
caused multi-hour stalls when implementers sent misshaped wakes
(e.g. "FOR THIS LEASE: load_profile..." while there was no active
lease) and then asked MAINTAINER to override the SK-only `stand up`
CLI.

This patch documents the canonical `FIX-LANDED` body shape in three
places:

- STAND-KEEPER.md: SK reads the body verbatim, verifies the worktree
  path, then transitions. Explicitly states SK should ignore bare
  wake pings and lease-shaped wakes while down — that friction is
  what prevents stray re-deploys of unfixed code.

- TESTER.md / DEVELOPER.md: holders / implementers learn the
  canonical body shape and are told NOT to route to MAINTAINER —
  `stand up` is SK-only and MAINTAINER has no override.

Pure documentation change. No CLI surface or schema modified.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>