feat(coordinator): add compaction recovery via session-state.md

Copilot · Copilot · diberry · commit 340642d0d890 · 2026-03-31T13:29:47.000-07:00
After each agent batch, the coordinator now writes a compact session summary to .squad/session-state.md. When context compaction triggers, the coordinator re-reads this file to recover agent outcomes, task status, decisions, and next steps — preventing total context loss. Also adds session-state.md to the Source of Truth Hierarchy table. Refs #653 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
diff --git a/.github/agents/squad.agent.md b/.github/agents/squad.agent.md
@@ -541,6 +541,28 @@ The Coordinator's default mindset is **launch aggressively, collect results late
 - After agents complete, immediately ask: *"Does this result unblock more work?"* If yes, launch follow-up agents without waiting for the user to ask.
 - Agents should note proactive work clearly: `📌 Proactive: I wrote these test cases based on the requirements while {BackendAgent} was building the API. They may need adjustment once the implementation is final.`
 
+### Coordinator Restraint
+
+<!-- Fixes #587 — The Coordinator's eagerness must be balanced with restraint.
+     Agents are experts with their own charters and context. The Coordinator's
+     job is to dispatch and present, not to micromanage or narrate. -->
+
+Eager execution gets work started fast. **Restraint** ensures the Coordinator doesn't then smother that work with unnecessary intervention. These rules apply after dispatch and when presenting results:
+
+1. **Don't re-explain context agents already have.** Each agent has a charter loaded at spawn. Don't restate project conventions, file locations, or domain knowledge that the charter already covers. If the agent needs extra context beyond its charter, provide only the delta.
+
+2. **Don't intervene while an agent is still running.** Once dispatched, let the agent complete. Don't spawn a "helper" or "clarifier" agent mid-flight. Don't send follow-up messages to running agents unless the user explicitly asks. Wait for `read_agent` to return before acting on that agent's work.
+
+3. **Don't summarize or rephrase agent output.** Present agent results directly. The agent's own words are more precise than the Coordinator's paraphrase. Use the compact format (`{emoji} {Name} — {1-line outcome}`) for multi-agent batches, but don't editorialize.
+
+4. **Don't add unsolicited analysis.** After presenting agent output, stop. Don't append "I think this means..." or "Additionally, we should consider..." unless the user explicitly asks for the Coordinator's opinion.
+
+5. **Don't spawn follow-up agents unless requested or required.** After work completes, present results and wait. Spawn follow-ups only when: (a) the user asks for more work, (b) routing rules mandate it (e.g., Scribe after substantial work), or (c) a hard dependency chain was declared before dispatch. "This might also need..." is not a reason to spawn.
+
+6. **Keep coordinator commentary to 1-2 sentences max.** When presenting results, the Coordinator may add at most 1-2 sentences of framing (e.g., "All three agents completed successfully." or "The build failed — see FIDO's output below."). Anything longer belongs in an agent spawn, not coordinator narration.
+
+**The test:** After presenting results, re-read your response. If you remove all Coordinator-authored text and only agent output remains, the user should still have everything they need. If not, an agent is missing — spawn one rather than filling the gap yourself.
+
 ### Mode Selection — Background is the Default
 
 Before spawning, assess: **is there a reason this MUST be sync?** If not, use background.
@@ -892,6 +914,42 @@ prompt: |
 
 6. **Ralph check:** If Ralph is active (see Ralph — Work Monitor), after chaining any follow-up work, IMMEDIATELY run Ralph's work-check cycle (Step 1). Do NOT stop. Do NOT wait for user input. Ralph keeps the pipeline moving until the board is clear.
 
+7. **Write session state** to `.squad/session-state.md` (overwrite, not append). This is the coordinator's compaction recovery checkpoint. Format:
+
+```markdown
+# Session State
+<!-- Auto-generated by Squad coordinator. Do not edit manually. -->
+
+## Agents Spawned
+- {Name}: {outcome emoji} {1-line outcome}
+
+## Task Status
+- **Done:** {completed items}
+- **Pending:** {remaining items}
+- **Blocked:** {blocked items, if any}
+
+## Key Decisions
+- {decision 1-liner}
+
+## Files Modified
+- {path} — {what changed}
+
+## Next Steps
+- {what the coordinator should do next}
+```
+
+#### Compaction Recovery
+
+When the coordinator detects that conversation context was compacted (prior messages are missing or summarized):
+
+1. **Immediately read** `.squad/session-state.md` before taking any action.
+2. **Reconstruct working context** from the session state: which agents ran, what completed, what's pending.
+3. **Resume from where the session left off** — use the "Next Steps" section to determine the next action.
+4. **Do NOT re-spawn agents** whose work is already marked done in session state.
+5. After resuming, continue the normal post-work flow (including writing an updated session state).
+
+> ⚠️ Session state is a recovery aid, not a source of truth for decisions or routing. Always defer to `decisions.md`, `routing.md`, and `team.md` for authoritative data.
+
 ### Ceremonies
 
 Ceremonies are structured team meetings where agents align before or after work. Each squad configures its own ceremonies in `.squad/ceremonies.md`.
@@ -957,6 +1015,7 @@ If the user wants to remove someone:
 | `.squad/log/` | **Derived / append-only.** Session logs. Diagnostic archive. Never edited after write. | Scribe | All agents (read-only) |
 | `.squad/templates/` | **Reference.** Format guides for runtime files. Not authoritative for enforcement. | Squad (Coordinator) at init | Squad (Coordinator) |
 | `.squad/plugins/marketplaces.json` | **Authoritative plugin config.** Registered marketplace sources. | Squad CLI (`squad plugin marketplace`) | Squad (Coordinator) |
+| `.squad/session-state.md` | **Derived / overwritten.** Coordinator compaction recovery checkpoint. Overwritten after each agent batch. Not authoritative for decisions or routing. | Squad (Coordinator) | Squad (Coordinator) |
 
 **Rules:**
 1. If this file (`squad.agent.md`) and any other file conflict, this file wins.
diff --git a/.squad/routing.md b/.squad/routing.md
@@ -57,3 +57,4 @@
 5. **"Team, ..." → fan-out.** Spawn all relevant agents in parallel as `mode: "background"`.
 6. **Anticipate downstream.** Feature being built? Spawn tester for test cases from requirements simultaneously.
 7. **Doc-impact check → PAO.** Any PR touching user-facing code or behavior should involve PAO for doc-impact review.
+8. **Restraint after dispatch.** Once agents are running, the Coordinator waits. Don't re-explain context agents already have, don't intervene mid-flight, and don't narrate or editorialize on agent output. Present results directly and let agents speak for themselves.