This guideline is for specifying multi-component systems — orchestrators, daemons, long-running services, and anything with non-trivial state, concurrency, or cross-component contracts. It complements the Backend and Frontend guidelines, which are optimized for single-feature PRDs (a Lambda handler, a React component).

When to use this instead of the Backend or Frontend guidelines:
- You're specifying a service that runs continuously (a poller, a scheduler, a daemon).
- The system has ≥ 3 components that communicate with each other.
- There is a non-trivial state machine (issue lifecycles, retry backoff, reconciliation).
- Observability, concurrency control, or safety boundaries are first-class concerns.
- The system is deployable as its own unit (not a handler inside a larger app).
If you are specifying "a Lambda function" or "a React component," use the Backend or Frontend guideline instead. This one is for "a service that orchestrates Lambda functions" or "a dashboard that coordinates multiple React apps."
Inspiration: The structural spine of this guideline is adapted from OpenAI's Symphony `SPEC.md` — a reference specification for a long-running agent-orchestration service. Symphony itself is not a guideline; it is an example of what a rigorous system-level spec looks like. We've extracted its structure into a reusable template here.
System-level work is the most natural home for agent orchestration:
- Spawn Plan subagents early. System specs benefit from a second opinion on the state machine and the domain model. Delegate architectural reviews to a Plan subagent before writing.
- Use Explore subagents for cross-cutting concerns. "Does any existing service already do X?" is a recon question perfect for a subagent.
- Hooks enforce the test matrix. A complete test matrix (per component) should be enforced by a `Stop` hook that refuses to archive the implementation log until each row is green.
- Workspace isolation is mandatory. System specs produce large, cross-cutting changes. Never work on `main`; always use a dedicated branch or worktree.
- The `WORKFLOW.md` contract. At the system level, `WORKFLOW.md` may also encode deployment targets, observability endpoints, and on-call rotation — the agent should read it before writing the spec.
- Receive Prompt – User describes a system they want built (e.g., "a service that polls an issue tracker and runs a coding agent on each ticket in an isolated workspace").
- Read the Landscape – Before anything else, use the Explore subagent to survey existing adjacent services, shared libraries, and deployment patterns.
- Clarify Boundaries First – Same "build the fence" discipline as Backend/Frontend: non-goals, phasing, integration seams. System specs fail hardest when scope is not nailed down.
- Draft the Spec – Use the section spine below. Write for extensibility: unknown configuration keys should be ignored, new components should be addable without breaking existing ones.
- Delegate Review – Before finalizing, send the draft to a Plan subagent for independent architectural review. Fold findings back in.
- Review, Approve, Commit – Same critical checkpoint as all other guidelines. Do not begin implementation until the spec is committed.
- Archive on Completion – Follow the ARCHIVAL PROTOCOL in `implementation-tasks-creation-guidelines.md`.
Use these sections in order. Each section has a purpose — do not skip, even if it seems short.
State the operational problems the system solves. Be concrete:
- What is broken or manual today?
- What does success look like (measurable, verifiable)?
- What is explicitly out of scope?

Example (from Symphony):
"Turns issue execution into a repeatable daemon workflow, isolates agent execution per issue, keeps workflow policy in-repo, and provides observability for concurrent runs."
Name the components. Draw the boundaries. For each component, state:
- Responsibility (one sentence)
- Inputs (what it reads)
- Outputs (what it produces)
- Lifecycle (started by what, stopped by what)

A system spec without a component list is a wish list. Be explicit. Template:

| Component | Responsibility | Inputs | Outputs | Lifecycle |
|-----------|----------------|--------|---------|-----------|
| ... | ... | ... | ... | ... |
List the entities the system reasons about. For each:
- Name (stable identifier — the thing a log line would reference)
- Fields (what it contains)
- Identity (how it is uniquely identified — internal ID vs. human-readable ID)
- Lifecycle (created, transitioned, retired)

Distinguish stable internal IDs from display-facing identifiers. This matters for logs, retries, and reconciliation.
Naming note. This section is about the service's own in-repository config/policy file — the thing this service reads at startup. It is not the project-level `WORKFLOW.md` that tells the coding agent how to operate on this codebase. A project has one `WORKFLOW.md` (or `CLAUDE.md` / `AGENTS.md`) that lives alongside all of its code; a service has its own configuration file that encodes operational policy. In some small projects they collapse to the same file. In larger projects (multiple services in one repo), keep them separate and name them distinctly (e.g., `config/scheduler.yaml`, `config/poller.yaml`).
If the service reads operational policy from a file in the repository (highly recommended), specify:
- Filename and location (`config/[service-name].yaml`, `[service-name]-policy.md`, etc.)
- Schema (required fields, optional fields, unknown-key policy)
- Reload semantics (does the service re-read at runtime? restart-only? SIGHUP?)
- Validation (what errors fail startup vs. warn)
- Relationship to the project `WORKFLOW.md` (if any rules overlap, which file wins?)
Forward compatibility rule: Unknown keys should be ignored with a warning, not rejected. This lets the spec evolve without breaking deployed instances.
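A minimal sketch of that rule in Node.js, assuming the config has already been parsed from YAML into a plain object; the key names here are hypothetical placeholders for your service's schema:

```js
// Hypothetical schema: adapt REQUIRED and OPTIONAL to your service.
const REQUIRED = ['trackerUrl', 'maxConcurrentRuns'];
const OPTIONAL = ['pollIntervalSeconds', 'retryBackoffSeconds'];

function validateConfig(raw) {
  for (const key of REQUIRED) {
    if (!(key in raw)) {
      // Missing required field: fail startup loudly.
      throw new Error(`config: missing required key "${key}"`);
    }
  }
  const known = new Set([...REQUIRED, ...OPTIONAL]);
  for (const key of Object.keys(raw)) {
    if (!known.has(key)) {
      // Unknown key: warn and continue (forward compatibility).
      console.warn(`config: ignoring unknown key "${key}"`);
    }
  }
  return raw;
}
```

Failing startup on missing required keys while merely warning on unknown ones is what lets already-deployed instances survive new schema versions.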
For any component with non-trivial state:
- States (enumerated)
- Transitions (from → to, and the event that triggers each)
- Invariants (things that must be true in each state)
- Concurrency rules (max parallel runs, rate limits, queueing)
- Retry policy (backoff, max attempts, give-up condition)
- Reconciliation (how the system recovers after a restart — does it resume, replay, or drop in-flight work?)
A state diagram is worth writing out even if it feels obvious.
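One lightweight way to make the diagram executable is a transition table plus a guard. The sketch below assumes the queued/running/completed/failed lifecycle used in the worked example later in this guideline:

```js
// Enumerated transition table: states are keys, events map to next states.
const TRANSITIONS = {
  queued:    { dispatch: 'running' },
  running:   { succeed: 'completed', fail: 'failed', retry: 'queued' },
  completed: {}, // terminal: no outgoing transitions
  failed:    {}, // terminal
};

function transition(entity, event) {
  const next = (TRANSITIONS[entity.state] || {})[event];
  if (next === undefined) {
    // Illegal transition: fail loudly rather than silently corrupt state.
    throw new Error(`illegal transition: ${entity.state} + ${event}`);
  }
  entity.state = next;
  return entity;
}

// transition({ state: 'queued' }, 'dispatch').state === 'running'
```

A table like this is trivially unit-testable, which is what earns it a row in the Testing Matrix.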
When to write this section. Only if the system exposes or consumes a stream: Server-Sent Events, WebSocket, long-polling, gRPC server-streaming, Kafka/NATS/Redis PubSub fan-out, or any other "push" pattern where a client subscribes once and receives many messages. Frontend guidelines route streaming questions here; this is the canonical home for the design.
For each stream the system owns or consumes, specify:
6.1 Transport choice and justification
- SSE — one-way server-to-client, survives most proxies, browser EventSource is reliable. Good default for dashboards and live feeds.
- WebSocket — bidirectional, lower overhead, needs explicit reconnect logic. Choose when the client needs to send messages back over the same channel.
- Long-polling — fallback for restrictive environments; request/response loop with a hold timeout. Accept the per-request latency.
- gRPC streaming — for service-to-service streams in controlled networks; not for public browser clients.
- Broker fan-out (Kafka/NATS/Redis) — when multiple independent subscribers need the same feed and you want to decouple producer from consumer.
Document the tradeoff you accepted. "We chose SSE because [X, Y, Z]" beats "We use SSE."
6.2 Message schema and framing
- Wire format (JSON, protobuf, CBOR) and versioning policy.
- Required envelope fields: message ID (monotonic or ULID), timestamp, type, payload, schema version.
- Forward-compatibility rule: unknown fields ignored; unknown message types logged and dropped, not fatal.
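A hedged sketch of such an envelope in Node.js, using JSON as the wire format; `randomUUID` stands in for a ULID generator, which you would swap in if you need monotonic, sortable IDs:

```js
const { randomUUID } = require('node:crypto');

function encodeFrame(type, payload) {
  return JSON.stringify({
    id: randomUUID(),             // message ID (ULID in a real system)
    ts: new Date().toISOString(), // timestamp
    v: 1,                         // schema version
    type,
    payload,
  });
}

function decodeFrame(wire, knownTypes) {
  const frame = JSON.parse(wire);
  if (!knownTypes.has(frame.type)) {
    // Unknown message type: log and drop, never crash the consumer.
    console.warn(`stream: dropping unknown type "${frame.type}"`);
    return null;
  }
  // Destructure only the fields we know; unknown fields are ignored,
  // which gives the forward-compatibility rule for free.
  const { id, ts, v, type, payload } = frame;
  return { id, ts, v, type, payload };
}
```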
6.3 Backpressure and buffering
- Producer strategy when the consumer is slow: drop oldest, drop newest, block, disconnect?
- Buffer bounds — how many frames does the server hold per subscriber? What happens at the limit?
- Slow-consumer policy — disconnect after N seconds of full buffer? Log + count the drops?
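As one illustration of a stated policy, here is drop-oldest with a bounded buffer and a drop counter; the 256-frame bound is an assumption, not a recommendation:

```js
// Drop-oldest backpressure: a bounded per-subscriber buffer.
class SubscriberBuffer {
  constructor(limit = 256) {
    this.limit = limit;
    this.frames = [];
    this.dropped = 0; // export as a metric so drops are visible (§9)
  }
  push(frame) {
    if (this.frames.length >= this.limit) {
      this.frames.shift(); // drop-oldest: live feeds prefer fresh frames
      this.dropped += 1;
    }
    this.frames.push(frame);
  }
  drain() {
    const out = this.frames;
    this.frames = [];
    return out;
  }
}
```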
6.4 Reconnect and resume semantics
- Who initiates reconnect — client always, client-with-server-nudge, or server-push retry?
- Backoff curve — exponential with jitter, capped at N seconds; document the numbers.
- Resume point — last-seen message ID echoed on reconnect so the server can replay; or "no resume, clients accept gaps."
- Idempotency guarantee — if the server replays, are messages idempotent on the client side (keyed by message ID)?
- Ordering on resume — strictly ordered, per-partition ordered, or best-effort?
6.5 Heartbeat and liveness
- Heartbeat frequency and form (ping frame, empty SSE event, NOP message).
- Client-side idle timeout — how long without a heartbeat before the client assumes dead and reconnects.
- Server-side dead-connection detection — TCP keepalive settings or application-layer ping-pong.
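A client-side sketch combining the reconnect and heartbeat rules above. It assumes a global `WebSocket` (browser, or Node 22+); the URL, the `resume_after` parameter, and the 30s/45s numbers are placeholders for whatever the spec actually documents:

```js
function handle(frame) { /* application-level dispatch goes here */ }

function connect(lastSeenId = null, attempt = 0) {
  const url = new URL('wss://example.internal/stream');
  if (lastSeenId) url.searchParams.set('resume_after', lastSeenId);
  const ws = new WebSocket(url);
  let idleTimer;

  const resetWatchdog = () => {
    clearTimeout(idleTimer);
    // No frame for 45s (1.5x a 30s heartbeat interval): assume dead.
    idleTimer = setTimeout(() => ws.close(), 45_000);
  };

  ws.onopen = () => { attempt = 0; resetWatchdog(); };
  ws.onmessage = (event) => {
    resetWatchdog();
    const frame = JSON.parse(event.data);
    if (frame.type === 'heartbeat') return; // liveness only, no payload
    lastSeenId = frame.id; // resume point for the next reconnect
    handle(frame);
  };
  ws.onclose = () => {
    clearTimeout(idleTimer);
    // Exponential backoff with full jitter, capped at 30s.
    const delay = Math.random() * Math.min(30_000, 1_000 * 2 ** attempt);
    setTimeout(() => connect(lastSeenId, attempt + 1), delay);
  };
}
```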
6.6 Fan-out and multi-subscriber semantics
- One-to-one (per-user feed) vs. one-to-many (shared feed): different design.
- Subscription granularity — does the server filter, or does the client?
- Auth scope on each message — does every frame re-verify entitlement, or is subscription-time auth sufficient?
6.7 Shutdown and graceful drain
- Shutdown signal (SIGTERM, HTTP admin endpoint) and what it triggers.
- Drain behavior — stop accepting new subscriptions; continue serving existing ones for N seconds; then close with a reason code.
- Close code taxonomy — if the service emits WebSocket close codes, define them here.
6.8 Testing hooks

Every streaming contract should have a row in the Testing Matrix (§10):
- Unit test for the message encoder/decoder.
- Integration test that subscribes, receives N messages, validates ordering and schema.
- Chaos test that kills the connection mid-stream and verifies reconnect + resume.
- Slow-consumer test that confirms the documented backpressure policy.
6.9 Frontend handoff

The frontend PRD that consumes this stream should not repeat the transport, backpressure, or reconnect design — it should reference this section by name. The frontend PRD owns: the render path, the stale-frame detection UI, the reconnect-status indicator, and which user action surfaces a transport error.
When to write this section. Only if the system produces records that must be retained for regulatory, contractual, or forensic reasons: financial trades, authentication events, consent changes, administrative actions, data access logs, model-output provenance. Frontend guidelines route audit questions here. If the system has no such records, say so explicitly in one line and move on.
An audit record is not a log line. Logs are operational and can be rotated; audit records have retention obligations and integrity guarantees. Treat them as a first-class output of the system.
7.1 Scope — what is and is not an audit event

Enumerate the audit-relevant actions. For each, state:
- Triggering event (e.g., "order submitted", "user consent changed", "admin role grant", "model inference over threshold")
- Who emits the record (which component)
- When (synchronously on the action, or asynchronously from a queue)
- Whether the originating action blocks on successful audit write (see §7.7)
7.2 Record schema

Specify the canonical schema. Required fields for most audit records:
- Record ID — immutable, globally unique (ULID, UUIDv7).
- Timestamp — server-authoritative, monotonic, UTC, ISO-8601 with fractional seconds.
- Actor — who did the thing (user ID, service ID, system).
- Action — enumerated verb from the schema's action vocabulary.
- Subject — what the action was performed on (entity type + ID).
- Outcome — success, failure, partial, denied.
- Context — request ID, session ID, source IP, user agent (as applicable).
- Before/after state (where relevant) — the minimum state delta needed to reconstruct what changed.
- Schema version — for forward compatibility.
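For concreteness, one conforming record might look like the following; all values are illustrative, and `randomUUID` stands in for UUIDv7/ULID:

```js
const { randomUUID } = require('node:crypto');

// Illustrative record only: the action vocabulary, ID shapes, and context
// fields must come from your own schema.
const record = {
  recordId: randomUUID(),              // immutable, globally unique
  ts: new Date().toISOString(),        // server-authoritative UTC
  actor: { type: 'user', id: 'u_123' },
  action: 'admin.role.grant',          // from the enumerated vocabulary
  subject: { type: 'role_binding', id: 'rb_456' },
  outcome: 'success',
  context: { requestId: 'req_789', sourceIp: '203.0.113.7' },
  delta: { before: { role: 'viewer' }, after: { role: 'admin' } },
  schemaVersion: 1,
};
```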
7.3 Storage, retention, and immutability
- Storage tier (database, append-only log, object storage) and durability guarantee.
- Retention period (e.g., 7 years for SOX trade records, 2 years for auth events) — state the regulatory source if applicable.
- Immutability mechanism — WORM (write-once-read-many) storage, append-only log, hash chaining, or cryptographic notarization. Document which guarantee you can actually prove.
- Deletion policy — beyond retention end, where do records go? Are there GDPR "right to erasure" carve-outs that override the retention floor?
7.4 PII and secret handling
- What is redacted before the record is written, and by whom.
- Hashing and tokenization — if a field is a hashed user ID, document the salt/pepper source.
- Secrets policy — no secrets, keys, tokens, or full card numbers in audit records. Ever. State this explicitly.
- Access to identifying joins — if the audit store holds hashed IDs, which system holds the hash→identity mapping, and who can access it?
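A sketch of the hash-with-pepper pattern in Node.js. The environment variable name is hypothetical; the point is that the pepper lives outside the audit store, so the store alone cannot reverse the mapping:

```js
const { createHmac } = require('node:crypto');

function pseudonymizeUserId(userId) {
  const pepper = process.env.AUDIT_ID_PEPPER; // never stored with the records
  if (!pepper) throw new Error('AUDIT_ID_PEPPER is not configured');
  return createHmac('sha256', pepper).update(userId).digest('hex');
}
```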
7.5 Access control
- Who can read the records (enumerate roles or services).
- Who can write (only the emitting components).
- Who can delete — typically no one; only retention-policy enforcement.
- Audit of audit — if a human reads audit records, is that read itself audited? For regulated workloads, yes.
7.6 Export and replay path
- Export format (JSONL, Parquet, CSV) and cadence (on-demand, scheduled).
- Replay — can you reconstruct system state from the audit log alone? If not, document the gaps.
- Legal hold — how is a subset of records marked non-deletable for an open investigation?
7.7 Failure semantics — the critical design decision

When the audit write fails, does the originating action proceed?
- "Fail closed" — the action blocks and fails if the audit record cannot be written. Required for most regulated workloads (trades, consent, admin actions). Document the user-facing error and the operational remediation.
- "Fail open with catchup" — the action proceeds; audit is queued for asynchronous write with retry. Acceptable only when the action is reversible or the audit obligation permits eventual consistency. Document the maximum acceptable lag and the alert that fires if the queue grows unbounded.
There is no third option. Choose explicitly; do not leave this ambiguous.
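Fail-closed reduces to a small, testable shape; `auditStore.append` and `runAction` are hypothetical stand-ins for your storage client and business logic:

```js
// Fail-closed: the action runs only after the audit record is durably
// written. Retries belong inside auditStore.append.
async function withAudit(record, auditStore, runAction) {
  try {
    await auditStore.append(record);
  } catch (err) {
    // Audit write failed after retries: the action must NOT proceed.
    throw new Error(`action blocked: audit write failed (${err.message})`);
  }
  return runAction(); // reachable only once the record is committed
}
```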
7.8 Regulatory mapping (if applicable)

One short table naming the obligation each audit event satisfies:
| Event | Framework | Clause / Rule | Retention |
|---|---|---|---|
| Trade submitted | SOX / FINRA | 17 CFR 240.17a-4 | 6 years |
| Consent changed | GDPR | Article 7(1) | Duration of processing + 3 years |
| Admin role grant | SOC 2 | CC6.1 | 1 year |
Leave the table empty if no regulatory obligation applies, but the section header stays so reviewers can't skip the question.
7.9 Frontend handoff

The frontend PRD must not be the sole source of truth for an audit event. If a user action is audit-relevant, the record is written on the server side of that action, not from a fire-and-forget client call. The frontend's role is to surface success/failure to the user based on the server's response.
- Trust boundaries: Which components have which permissions? What can external input reach?
- Workspace isolation: Where does work happen? How is it cleaned up? Who owns the filesystem?
- External integrations: APIs called, authentication model, rate-limit handling, timeout policy.
- Failure domains: If external service X is down, which components degrade and which fail?
- Structured logging: What fields appear on every log line? (run ID, component, issue ID, etc.)
- Metrics: What counts, rates, and latencies are exported?
- Dashboards / status surfaces: HTTP endpoints, CLI commands, or dashboards for operators.
- Operator intervention points: How does a human pause, cancel, or drain the system safely?
- Alerts: What conditions should page a human?
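A minimal structured-logging helper that guarantees the stable fields appear on every line; the field names are illustrative:

```js
// Every line is one JSON object carrying the same stable context fields.
function makeLogger(base) { // base: { component, runId, issueId, ... }
  return (level, msg, extra = {}) =>
    process.stdout.write(JSON.stringify({
      ts: new Date().toISOString(),
      level,
      msg,
      ...base,  // stable context on every line
      ...extra, // event-specific fields
    }) + '\n');
}

const log = makeLogger({ component: 'orchestrator', runId: 'run_01' });
log('info', 'run scheduled', { issueId: 'ISS-42' });
```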
Every component needs a row. Every row needs a verification command.
| Component | Test Type | Verification Command | Green When |
|---|---|---|---|
| Configuration Loader | Unit | `node --test tests/config-loader.test.js` | All assertions pass |
| Tracker Client | Integration | `node --test tests/tracker-client.test.js` | Returns normalized issue list |
| Orchestrator | State-machine | `node --test tests/orchestrator.test.js` | All transitions fire correctly |
| Stream Encoder | Unit | `node --test tests/stream-encoder.test.js` | Round-trips all message types |
| Audit Writer | Integration | `node --test tests/audit-writer.test.js` | Fail-closed path blocks on error |
| End-to-end | Real integration | `scripts/e2e-smoke.sh` | Exit code 0 with expected log lines |
Adapt test-file extensions to your project's conventions. The examples above assume Node.js native test; use `.test.ts`, `.spec.js`, `.test.py`, or whatever fits your stack.
A test matrix is the agent-era replacement for "we'll write tests later."
- Branch/worktree: `system/[service-name]`
- Subagent delegation: Which components can be built in parallel by independent subagents?
- Hook requirements: `PreToolUse(Bash:git commit)` runs the full test matrix; `Stop` blocks archival if any matrix row is red.
- Project `WORKFLOW.md` reference: Point to the file; do not restate its contents.
Document how to add:
- A new component
- A new field to the service policy/configuration file
- A new state to the state machine
- A new streaming message type (forward compatibility)
- A new audit event type (schema migration)
Follow the rule: "Unknown keys are ignored for forward compatibility."
List what the system will not do. Be specific. Common non-goals for system specs:
- No persistent database (use filesystem + restart recovery)
- No mandatory approval model (trust the tracker)
- No built-in secret management (delegate to the platform)
A system spec for "a service that orchestrates coding agents against an issue tracker" would use this spine like so:
- Problem & Goals: Eliminate manual ticket pickup; run agents in isolated workspaces; keep policy in-repo.
- Architecture: Configuration Loader → Tracker Client → Orchestrator → Workspace Manager → Agent Runner → Logging Surface.
- Domain Model: Issue, WorkflowDefinition, ServiceConfig, Workspace, RunAttempt, LiveSession, RetryEntry, RuntimeState.
- Service Policy / Configuration File: `config/orchestrator.yaml` with required and optional fields; unknown keys ignored. Separate from the project `WORKFLOW.md`.
- State Machine: IssueState (queued → running → completed/failed); RunAttempt (scheduled → active → finished); retry with exponential backoff.
- Streaming Transports: SSE feed of live run-log frames to the operator dashboard; last-seen-ID resume; 30s heartbeat; slow-consumer disconnect after 60s.
- Audit & Compliance Records: Immutable record per RunAttempt (actor=agent-id, subject=issue-id, outcome=success/failure); 1-year retention; fail-closed on audit-write error.
- Safety: Each workspace is a disposable directory; agents never touch `main`; tracker writes are agent-owned.
- Observability: Structured logs with run ID; optional HTTP status surface; metrics for active runs and retry counts.
- Testing Matrix: Per-component unit tests + real-tracker integration smoke test + SSE chaos test + audit-write fail-closed test.
- Agent Execution Plan: Branch `system/symphony`, subagent per component, hook-enforced test matrix.
This is not a reprint of Symphony's spec — it's what the spec looks like when compressed through this guideline's spine.
A worked pseudocode block for the state transitions described above. Pseudocode is the right level of detail for a system specification: it nails down ordering, error paths, and invariants without committing to a language or a specific concurrency primitive. See the README's "The Case for Pseudocode in Specifications" for the broader rationale.
```
function dispatch(issue):
    assert issue.state == "queued"
    if active_run_count() >= service_config.maxConcurrentRuns:
        return  // wait for a slot; called again on the next tick
    workspace = workspace_manager.allocate(issue.issueId)
    run = RunAttempt {
        runId: ulid(),
        issueId: issue.issueId,
        state: "scheduled",
        attempt: prior_attempt_count(issue),  // 0 on first dispatch; assumed helper
        workflowName: pick_workflow(issue),
    }
    audit_writer.write(run, "run_scheduled")  // FAIL-CLOSED — see §7.7
    // if the audit write throws after retries, dispatch aborts;
    // workspace is released; issue stays "queued"; operator alerted.
    transition(issue, "queued" -> "running")
    transition(run, "scheduled" -> "active")
    spawn_async {
        outcome = agent_runner.execute(run, workspace)
        // the agent_runner streams LiveSession frames during execution;
        // those are operational logs, not audit events.
        audit_writer.write(run, outcome.action)  // FAIL-CLOSED again
        transition(run, "active" -> "finished")
        if outcome.success:
            transition(issue, "running" -> "completed")
        else if outcome.retriable and run.attempt < 3:
            schedule_retry(run, backoff = [60, 300, 900][run.attempt])
        else:
            transition(issue, "running" -> "failed")
        workspace_manager.release(workspace)
    }

function reconcile_on_restart():
    for workspace in workspace_manager.list_dangling():
        last_audit = audit_writer.last_record_for(workspace.runAttemptId)
        if last_audit.state == "active" and recent_heartbeat(last_audit):
            resume(workspace)
        else:
            mark_failed(workspace, reason = "restarted_during_run")
            schedule_retry_for(workspace.runAttemptId)
```
What this pseudocode pins down that prose alone obscured: the audit-write precedes the state transition, not the other way around (so a failed audit aborts cleanly); the workspace release happens in the finished branch only after the audit row is committed; reconciliation distinguishes "in-flight with recent heartbeat" from "in-flight but stale," which would otherwise be a footnote that the implementer might miss. If you cannot write the pseudocode, you do not yet understand the state machine — that is the forcing-function value of this section.
Before finalizing the specification, run this consolidated checklist. It is the system-spec counterpart to the Final Audit in backend-feature-specification-guidelines.md and frontend-feature-specification-guidelines.md — one structured pass covering structure, cross-cutting design, execution plan, and red flags. The audit is a review step the agent runs on the finished specification before presenting it; its contents do not appear in the specification itself.
Each red-flag bullet below carries a stable identifier from specification-validation-vocabulary.md in parentheses. A reviewer subagent or a hook can cite those identifiers directly when reporting findings.
- ☐ Does §1 state the problem in concrete operational terms with measurable success criteria? (`missing_non_goals`; the corollary — explicit out-of-scope statements live in §13)
- ☐ Does §2 have a populated component table with Responsibility, Inputs, Outputs, and Lifecycle for every component? (`missing_component_table`)
- ☐ Does §3 distinguish stable internal identifiers from display-facing identifiers for every entity? (`missing_stable_ids`)
- ☐ Does §4 specify the service's own configuration file with schema, reload semantics, validation, and a forward-compatibility rule, distinct from the project-level `WORKFLOW.md`? (`missing_forward_compat_rule`, `service_policy_confused_with_workflow`)
- ☐ Does §5 enumerate states, transitions, invariants, concurrency rules, retry policy, and reconciliation? (`missing_state_machine`, `missing_concurrency_rules`, `missing_reconciliation_design`)
- ☐ If §6 Streaming Transports is present, does it specify backpressure, resume semantics, and heartbeat? (`streaming_no_backpressure_policy`, `streaming_no_resume_semantics`, `streaming_no_heartbeat`)
- ☐ If §7 Audit & Compliance Records is present, does it specify retention, PII redaction, access control, and the fail-closed vs. fail-open decision explicitly? (`audit_no_retention`, `audit_no_pii_policy`, `audit_no_access_control`, `audit_semantics_not_chosen`)
- ☐ If §6 or §7 is genuinely not applicable, is it explicitly marked so rather than silently omitted?
- ☐ Does every component named in §2 have a row in the §10 Testing Matrix? (`missing_test_matrix_row`)
- ☐ Does every row in §10 have a runnable verification command, not a prose description? (`test_matrix_row_no_verification_command`)
- ☐ Does §11 name the branch or worktree, the subagent delegation strategy, the required hooks, and reference `WORKFLOW.md` rather than restating its rules? (`missing_branch_name`, `missing_delegatable_research`, `prose_deterministic_rule`, `workflow_content_restated`)
- ☐ Does §12 document how to add a new component, a new configuration field, a new state, a new streaming message type, and a new audit event type?
- ☐ Does §14 list open questions with owners and resolution dates, not just question marks?
- (`missing_component_table`) §2 is a paragraph, not a table. A system spec without a component table is a wish list.
- (`missing_stable_ids`) Domain entities use the tracker's display ID as the log/retry key. The first time the tracker renumbers, every log line lies.
- (`missing_state_machine`) A component with non-trivial state has no enumerated state machine. "It runs to completion" is not a design.
- (`missing_concurrency_rules`) No `maxConcurrentRuns`, no rate limit, no queue. The first hot run will starve the rest.
- (`missing_reconciliation_design`) "On restart, start fresh." The first restart during an active run is now silent data loss.
- (`audit_semantics_not_chosen`) §7.7 hedges between fail-closed and fail-open. Pick one. Document the rationale. There is no third option.
- (`missing_forward_compat_rule`) The service configuration schema does not state its unknown-key policy. The first schema evolution breaks every old deployment.
- (`streaming_no_backpressure_policy`) §6 specifies the transport but not the slow-consumer policy. The first slow subscriber hangs the whole stream.
- (`test_matrix_row_no_verification_command`) Matrix rows describe what is tested instead of how. "We'll write the tests later" is the v3.x mistake the matrix exists to prevent.
- (`workflow_content_restated`) §11 restates test commands or commit style instead of pointing at `WORKFLOW.md`.
- (`prose_deterministic_rule`) Deterministic rules ("the agent must always run the test matrix before commit") are stated as prose instead of being wired into a hook.
- Structure Over Prose: Tables, state diagrams, and enumerated lists beat narrative paragraphs at the system level.
- Stable IDs First: Before writing the state machine, nail down how entities are identified.
- Forward Compatibility by Default: Unknown configuration keys should warn, not fail.
- Extensibility Is a First-Class Section: Document how to extend the system in the spec itself.
- Test Matrix Is Non-Negotiable: Every component gets a row; every row gets a verification command.
- Observability Is Part of Design: If you can't name what to log, you can't name what went wrong.
- Isolation Over Coordination: Prefer designs where components can restart independently.
IMPORTANT: When archiving implementation logs, follow the ARCHIVAL PROTOCOL in implementation-tasks-creation-guidelines.md.
Key requirements:
- Mark ALL tasks complete (`- [ ]` → `- [x]`)
- Rename from `implementing-` to `implementation-log-`
- Move to `documentation/tasks/completed/`

For system specs, the archival hook should additionally verify that every row in the Testing Matrix is green before allowing archival to complete.
- Format: Markdown (`.md`)
- Location: `/documentation/specifications/active/` (during development), `/documentation/specifications/completed/` (after completion)
- Filename: `system-specification-[service-name].md`
- Service Configuration File: A service-specific configuration file describing runtime policy (see §4). Distinct from the project-level `WORKFLOW.md`. In a single-service repository the two may collapse into one file; in multi-service repositories, keep them separate so every service owns its configuration file and the project owns `WORKFLOW.md`.