OpenAI Symphony on Restate — a durable substrate for the spec.
Status: pre-alpha, slice 5 done (2026-05-06). Pivoted 2026-05-05. Built on
restate-elixir v0.2.0 (local path-dep until Hex-published). Implements Symphony's SPEC.md external contract; internal architecture is Restate-native. See docs/architecture.md and docs/demo-script.md for the chaos-beat narrative.
Symphony's published SPEC.md openly states:
> Support tracker/filesystem-driven restart recovery without requiring a persistent database; exact in-memory scheduler state is not restored.
That's the gap Restate fills. symphony-restate is the same Symphony contract — same
WORKFLOW.md, same Linear adapter, same Codex app-server protocol — running on a
substrate where the orchestrator state is durable. Kill the host mid-flow; every
issue resumes on a different node, no double-dispatch, no lost claims.
That closes the "in-memory orchestrator" caveat in SPEC.md §1. Drop in any OpenAI
Symphony WORKFLOW.md; symphony-restate runs it, and the claim state and
conversation journal survive node death.
External contract = SPEC.md. Internal architecture = Restate-native, with BEAM/OTP
and Restate as co-stars. Each Linear issue maps to a Restate Virtual Object
(IssueVO) that owns the claim (single-writer per issue ID). The actual 1..max_turns
turn loop lives in a separate Workflow service (RunAttemptWorkflow), keyed by
"#{identifier}::a#{attempt_n}", with one workflow journal per attempt holding the
pinned WORKFLOW.md content hash, the conversation, and per-turn comment ids.
IssueVO.dispatch synchronously calls into the workflow via ctx.call, so a
failover of either VO or workflow resumes from the journal it owns. The codex stdio
session itself is owned by an OTP-supervised Codex.Session GenServer pinned to
one BEAM node via Registry — fast turn-to-turn handoff while the node is healthy.
On node death, Restate retries the invocation on a different node; Codex.Manager
there spawns a fresh Session, whose cold-path seeding rebuilds codex's thread
context from the durable conversation state in one extra round-trip. Per-turn Linear
comments are idempotent via deterministic markers (identifier, attempt_n, turn_n) —
a replayed ctx.run after a lost response cannot duplicate them. The codex thread is
given a linear_graphql dynamic tool so the agent can read and write its own ticket.
Both substrates do what they're best at.
See docs/architecture.md for diagrams + the full mapping against SPEC.md.
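
The keying and idempotency rules above can be sketched in plain Elixir. This is a minimal illustration, not the project's code: the `AttemptKeys` module, its function names, and the HTML-comment marker format are all assumptions. Only the `"#{identifier}::a#{attempt_n}"` key shape and the `(identifier, attempt_n, turn_n)` tuple come from the text.

```elixir
defmodule AttemptKeys do
  @moduledoc "Hypothetical helpers illustrating per-attempt keys and idempotent comments."

  # Workflow key in the "#{identifier}::a#{attempt_n}" shape described above.
  def workflow_key(identifier, attempt_n)
      when is_binary(identifier) and is_integer(attempt_n) do
    "#{identifier}::a#{attempt_n}"
  end

  # Deterministic marker derived from (identifier, attempt_n, turn_n).
  # The HTML-comment format is an assumption made for this sketch.
  def comment_marker(identifier, attempt_n, turn_n) do
    "<!-- symphony:#{identifier}:a#{attempt_n}:t#{turn_n} -->"
  end

  # Post only when no existing comment body already carries the marker.
  # `post_fun` stands in for the real tracker call and receives the final body.
  def post_once(existing_bodies, marker, body, post_fun) do
    case Enum.find(existing_bodies, &String.contains?(&1, marker)) do
      nil -> {:posted, post_fun.(body <> "\n" <> marker)}
      found -> {:already_posted, found}
    end
  end
end

# Usage: a second attempt to post the same turn finds the marker and skips.
marker = AttemptKeys.comment_marker("SYM-1", 2, 3)
{:posted, first} = AttemptKeys.post_once([], marker, "turn 3 done", & &1)
{:already_posted, ^first} = AttemptKeys.post_once([first], marker, "turn 3 done", & &1)
```

Because the marker depends only on values that are stable across replays, a retried post after a lost response can detect the earlier comment instead of duplicating it.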
Shipped through slice 3:
- Virtual Object state on IssueVO (per-issue claim_status, last_attempt_n, last_attempt_result; worker_node is persisted for observability only, never read for routing)
- Workflow state on RunAttemptWorkflow (per-attempt workflow_content_hash, workspace_path, durable conversation, turn_count, last_comment_id)
- Virtual Object state on SchedulerVO (per-project running, interval_ms, last_tick_at_ms, dispatched_total)
- ctx.call from IssueVO into RunAttemptWorkflow (durable cross-service call; retries journaled, terminal failures propagate)
- ctx.call_async + Awaitable.any for the codex-turn-vs-stall race in RunAttemptWorkflow
- ctx.call_async + Awaitable.all for the per-issue VO-state fan-out in SchedulerVO.reconcile
- ctx.send_async(invoke_at_ms:) for SchedulerVO's self-rescheduling tick loop and for fire-and-forget per-issue dispatches
- ctx.timer for the stall timer
- ctx.run for journaled side effects (WORKFLOW.md load + content-hash pin, Linear fetch / idempotent comment, prompt render, system-time capture, port-kill on stall)
- ctx.set_state for the per-turn conversation append + per-tick scheduler bookkeeping
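
The turn-vs-stall race in the list above has a rough plain-OTP analogue, sketched here with `Task`. This is an analogy only: it does not use the Restate SDK, it is not durable, and the `StallRace` module name and return shapes are assumptions made for the illustration.

```elixir
defmodule StallRace do
  @moduledoc "Hypothetical in-process analogue of racing a turn against a stall timer."

  # Run `turn_fun` in a task; if it has not replied within `stall_ms`,
  # kill it and report a stall. Whichever side finishes first wins.
  def run_turn(turn_fun, stall_ms) when is_function(turn_fun, 0) do
    task = Task.async(turn_fun)

    case Task.yield(task, stall_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      # Only reachable when the caller traps exits, since Task.async links.
      {:exit, reason} -> {:error, {:turn_crashed, reason}}
      nil -> {:error, :stalled}
    end
  end
end

# Usage: a fast turn returns its result, a slow one is cut off by the timer.
{:ok, :done} = StallRace.run_turn(fn -> :done end, 1_000)
{:error, :stalled} = StallRace.run_turn(fn -> Process.sleep(5_000) end, 50)
```

The difference in the Restate version is that both the timer and the call are journaled, so the race survives a node restart, whereas this in-process version loses both sides if the BEAM dies.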
All five planned slices shipped. The next layer (cancellation across the
call tree, stall-fired mid-attempt cancellation of the live
CodexTurnService invocation, multi-node Restate cluster for the
"physical Restate failover" beat) is open work — see
docs/architecture.md for the residual list.
# Prereqs: mix 1.19+, Erlang/OTP 28+, restate-server 1.6+, LINEAR_API_KEY in env,
# `codex` CLI on PATH (slice 1.5+ drives a real codex app-server stdio session).
mix deps.get
mix compile
# Run the pure-layer test suite (parser + Liquid render):
cd apps/symphony_core && mix test
# Boot the BEAM endpoint on :9082 (handlers register at start):
mix run --no-halt
# In another shell, register the deployment with restate-server:
restate --yes deployments register http://localhost:9082
# Edit WORKFLOW.md → set tracker.project_slug to your Linear project slug.
# Trigger one issue end-to-end (slice 1+):
mix symphony.dispatch SYM-1 --exec
# OR boot the per-project poll loop (slice 3+):
mix symphony.scheduler start your-project-slug --interval 30000 --exec
# Snapshot the per-issue VO states across the project at any time:
mix symphony.scheduler reconcile your-project-slug --exec
# Halt the loop:
mix symphony.scheduler stop your-project-slug --exec
# `mix run --no-halt` also boots the Phoenix LiveView dashboard (slice 4+)
# at http://localhost:4000 — auto-refreshes every 2s via SchedulerVO.reconcile,
# expand any issue row to see its current attempt's conversation.

mix symphony.dispatch SYM-1 prints the curl, or pass --exec to run it. The
handler will: load WORKFLOW.md (ctx.run), fetch the issue from Linear (ctx.run),
ensure a workspace clone (ctx.run), then drive the 1..max_turns loop — for each
turn it renders the prompt with Solid Liquid (ctx.run), drives a real codex turn
through the per-issue Codex.Session (ctx.run), appends %{turn, prompt, response}
to durable conversation state, posts a per-turn Linear comment (ctx.run), and
re-fetches the issue to break early on terminal tracker state. Kill the BEAM
mid-flow and Restate replays completed ctx.run blocks on resume; if the next turn
runs on a different node, the fresh Codex.Session rehydrates context from the
durable conversation via cold-path seeding.
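
The replay behavior described above can be illustrated with a toy journal in plain Elixir. It is a conceptual sketch under stated assumptions, not how Restate or restate-elixir is implemented: `ToyJournal` and its map-based journal are hypothetical, and a real journal lives in restate-server rather than in process memory.

```elixir
defmodule ToyJournal do
  @moduledoc "Hypothetical model of journaled side effects: completed steps are replayed, not re-run."

  # `journal` maps a step name to its recorded result.
  # Returns {result, updated_journal}. A step already recorded is not executed again.
  def run(journal, step, fun) when is_map(journal) and is_function(fun, 0) do
    case Map.fetch(journal, step) do
      {:ok, recorded} ->
        {recorded, journal}

      :error ->
        result = fun.()
        {result, Map.put(journal, step, result)}
    end
  end
end

# First execution: the side effect runs and its result is recorded.
{hash, journal} = ToyJournal.run(%{}, :load_workflow, fn -> "abc123" end)

# Simulated resume after a crash: the recorded result is returned and the
# function is never invoked, so the raise below does not fire.
{^hash, ^journal} =
  ToyJournal.run(journal, :load_workflow, fn -> raise "must not re-run" end)
```

This is why side effects such as posting a comment belong inside a journaled step: on resume, only steps with no recorded result execute.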
docs/demo-script.md walks through three chaos beats
(pkill codex / kill the BEAM / kill the Restate container), each
zeroed in on a single architectural invariant. The beats are driven via
mix symphony.chaos {kill_codex,kill_beam,kill_restate} and the
shell scripts in scripts/.
./scripts/dev/up.sh # restate-server in docker
mix run --no-halt # BEAM + dashboard on host
./scripts/dev/register.sh # one-time register w/ Restate
mix symphony.scheduler start <SLUG> --interval 30000 --exec
# then in a separate shell:
mix symphony.chaos kill_codex

| Slice | Scope | Status |
|---|---|---|
| 0 | Umbrella scaffold, Apache-2.0, deps wired | done |
| 1 | WORKFLOW.md parser + Liquid render in :symphony_core; IssueVO.dispatch w/ Linear fetch + post-comment + stub codex turn; mix symphony.dispatch task; endpoint registered on :9082 | done (2026-05-05) |
| 1.5 | Real codex app-server stdio session (single-shot port of upstream SymphonyElixir.Codex.AppServer); per-turn workspace ensure; auto-approval policy for non-interactive runs | done (2026-05-05) |
| 2 | 1..max_turns continuation loop in IssueVO; per-issue Codex.Session GenServer pinned to one BEAM node via Registry + DynamicSupervisor; cold-path conversation seeding rebuilds codex thread on cross-node failover from durable conversation state; per-turn Linear comments + tracker re-fetch between turns | done (2026-05-05) |
| 2.5 | IssueVO slimmed to claim/dispatch; turn loop extracted into RunAttemptWorkflow (Workflow service, keyed ${id}::a${n}); WORKFLOW.md content-hash pinned per attempt; node-local workspace preflight outside ctx.run; idempotent Linear comments via (identifier, attempt_n, turn_n) markers; idle-timeout for Codex.Session; linear_graphql dynamic tool so the agent can drive its own ticket | done (2026-05-05) |
| 3 | Codex turn extracted as CodexTurnService (Restate Service); workflow uses ctx.call_async + ctx.timer + Awaitable.any for the turn-vs-stall race; on stall, kills Codex.Session port and raises terminal failure; IssueVO.read_state shared handler for snapshots; Linear.list_issues_in_project!; SchedulerVO poll loop self-rescheduling via ctx.send_async(invoke_at_ms:) + Awaitable.all reconciliation across N issues; mix symphony.scheduler {start,stop,tick,reconcile} driver | done (2026-05-06) |
| 4 | New :symphony_dashboard umbrella app: Phoenix 1.7 + LiveView 1.0 on :4000, single OverviewLive with 2s auto-refresh; RestateClient HTTP wrapper hitting Restate ingress for SchedulerVO.reconcile + IssueVO.readState + RunAttemptWorkflow.readState; click-to-expand attempt panels showing conversation, content-hash pin, workspace_path; stale-state badge surfaces ingress failure; no asset pipeline (Phoenix/LV JS served from each dep's priv/static) | done (2026-05-06) |
| 5 | Root docker-compose.yml (single-node Restate); scripts/dev/{up,down,register}.sh boot helpers; scripts/chaos/{kill-codex,kill-beam,kill-restate}.sh three chaos beats with banners that name the invariant they exercise; mix symphony.chaos {kill_codex,kill_beam,kill_restate} wrapper; docs/demo-script.md narrative the operator runs from. | done (2026-05-06) |

Demo readiness gate: see demo-engineering.md §6.
Apache-2.0 (matching upstream Symphony). Portions of this code are derived from
OpenAI Symphony; see NOTICE for
the per-module attribution.