Ryanontheinside/feat/refactor app#220

Open
ryanontheinside wants to merge 11 commits into main from ryanontheinside/feat/refactor-app

Collaborator

@ryanontheinside ryanontheinside commented Jun 3, 2026

Backend-owned control contract + demon-client SDK

Depends on #224. This branch contains the golden-harness branch (merged at 7d6301e). Merge #224 into main first; this diff then reduces to the refactor itself plus the test port. Do not merge this PR before #224.

Makes the realtime-motion-graph control surface a backend-owned, self-describing contract so a re-skinned UI, a vibecoded frontend, or an MCP agent can build against the backend instead of re-declaring shapes. The OCP standard this branch enforces: adding a control means editing exactly one backend place.

The two registries

Parameter surface (acestep/streaming/knobs.py): knob_specs -> knob_catalog(), served at GET /api/knobs with KNOB_SCHEMA_VERSION. Values are enforced through coerce_knob_values (silent clamp on the hot path in session.set_knobs, raise in MCP). KnobState(specs) seeds all registry defaults so the snapshot is complete from t=0; the runtime LoRA-strength knob is registry-derived (lora_strength_spec), and the dead KnobDef/KnobBank surface is deleted.
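
As a rough illustration of the clamp-versus-raise split described above, here is a minimal sketch. It is not the code in knobs.py: the `KnobSpec` fields, the two example knobs, and the `coerce_values` / `default_snapshot` helpers and their signatures are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnobSpec:
    """Hypothetical spec: one numeric knob with a range and a default."""
    name: str
    lo: float
    hi: float
    default: float


# Illustrative registry; the real knob names and ranges are not given in this PR.
KNOB_SPECS = {
    "gain": KnobSpec("gain", 0.0, 2.0, 1.0),
    "mix": KnobSpec("mix", 0.0, 1.0, 0.5),
}


def coerce_values(values, specs=KNOB_SPECS, strict=False):
    """Validate values against the registry (assumed signature).

    strict=False clamps out-of-range values silently (hot-path style);
    strict=True raises ValueError instead (agent-facing style).
    Unknown knob names are dropped when lenient and rejected when strict.
    """
    out = {}
    for name, raw in values.items():
        spec = specs.get(name)
        if spec is None:
            if strict:
                raise ValueError(f"unknown knob: {name}")
            continue
        value = float(raw)
        if not spec.lo <= value <= spec.hi:
            if strict:
                raise ValueError(f"{name}={value} outside [{spec.lo}, {spec.hi}]")
            value = min(max(value, spec.lo), spec.hi)
        out[name] = value
    return out


def default_snapshot(specs=KNOB_SPECS):
    """Seed every registry default so a snapshot is complete from the start."""
    return {name: spec.default for name, spec in specs.items()}
```

With one function and a strictness flag, both callers share the same range data, so a new knob only needs a registry entry.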

Wire vocabulary (demos/realtime_motion_graph_web/protocol.py): a CommandSpec/EventSpec registry (16 commands, 18 events, plus config and upload-handshake sections) -> wire_contract(), served at GET /api/protocol with PROTOCOL_VERSION and exposed as the MCP describe_protocol tool. The config section is derived from the SessionConfig dataclass via typing.get_type_hints. coerce_command_payload gives type coercion, enum checks, and nullable handling; the WS dispatcher derives from the registry at runtime (unknown types rejected via COMMAND_NAMES, payloads coerced before the arms, arms unchanged), and MCP derives its command validation from the same function.
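
To make the "config section derived from the dataclass" idea concrete, here is a small sketch of projecting a dataclass into catalog entries with typing.get_type_hints. The `SessionConfigSketch` fields and the entry layout are hypothetical; the real SessionConfig and the real catalog shape are not shown in this PR.

```python
import typing
from dataclasses import MISSING, dataclass, fields


@dataclass
class SessionConfigSketch:
    """Hypothetical stand-in; the real SessionConfig fields are not shown here."""
    sample_rate: int = 48000
    prompt: str = ""
    seed: typing.Optional[int] = None


def config_catalog(cls):
    """Project a dataclass into a list of {name, type, nullable, default} entries."""
    hints = typing.get_type_hints(cls)
    catalog = []
    for f in fields(cls):
        hint = hints[f.name]
        args = typing.get_args(hint)
        nullable = typing.get_origin(hint) is typing.Union and type(None) in args
        if nullable:
            # Optional[X] is Union[X, None]; report X as the payload type.
            hint = next(a for a in args if a is not type(None))
        entry = {"name": f.name, "type": hint.__name__, "nullable": nullable}
        if f.default is not MISSING:
            entry["default"] = f.default
        catalog.append(entry)
    return catalog
```

Because the catalog is computed from the dataclass, adding a field there is enough for it to appear in the served contract under this scheme. The sketch only handles plain types and `Optional[...]`; generics such as `list[int]` would need extra handling.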

Generated client types

scripts/gen_wire_types.py projects the contract into web/sdk/types/wireContract.gen.ts: name unions, payload interfaces, SessionConfigPayload, and the WireCommand/WireEvent unions. The client's senders are typed against their generated command interfaces and the inbound ladder is a compile-checked switch (msg.type as EventName).

Drift guards in tests/unit/test_wire_contract.py keep all of it honest: dispatcher arm fields vs specs (AST), emitted event fields vs specs (AST over dict literals), client-handled events vs EVENT_NAMES (case-label scan), MCP dict fields vs specs, config vs dataclass, handshake registration, and a byte-compare freshness test that regenerates the TS in memory.
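
For readers unfamiliar with AST-based drift guards, the following sketch shows the general technique on a toy dispatcher. The dispatcher source, the `kind` variable name, and the three command names are invented for illustration; the real tests in test_wire_contract.py may walk the tree differently.

```python
import ast

# Hypothetical dispatcher source; the real dispatcher's shape is not shown in this PR.
DISPATCHER_SRC = '''
def dispatch(msg):
    kind = msg["type"]
    if kind == "set_knobs":
        return 1
    elif kind == "set_prompt":
        return 2
    elif kind == "swap":
        return 3
'''

COMMAND_NAMES = {"set_knobs", "set_prompt", "swap"}  # illustrative registry names


def handled_types(source, var="kind"):
    """Collect string literals compared with == against `var` anywhere in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        if not (isinstance(node.left, ast.Name) and node.left.id == var):
            continue
        for op, right in zip(node.ops, node.comparators):
            is_str = isinstance(right, ast.Constant) and isinstance(right.value, str)
            if isinstance(op, ast.Eq) and is_str:
                found.add(right.value)
    return found


def check_no_drift(source, registry):
    """Raise AssertionError naming arms without a spec and specs without an arm."""
    handled = handled_types(source)
    assert handled == registry, (
        f"unregistered arms: {sorted(handled - registry)}; "
        f"unhandled specs: {sorted(registry - handled)}"
    )
```

Parsing the source instead of importing it keeps such a guard free of the dispatcher's runtime dependencies, and the set difference in the failure message points at the exact verb that skipped the registry.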

demon-client SDK

Everything a frontend needs to talk to a pod now lives in web/sdk/ (package demon-client, consumed in-tree via the @demon/client tsconfig alias): RemoteBackend, the slice-decode worker, AudioPlayer + LUFS, WsReconnector, manifest fetchers, and the generated + hand-written types. Three dependency inversions keep it host-free (store writes moved to app-side listeners, LoRA trigger prefixing injected via RemoteBackendOptions.promptTransform, AudioPlayer takes a loudness-config provider and worklet URL), enforced by a boundary guard test. packPcmFrame() replaces three hand-rolled copies of the PCM framing. sdk/README.md documents the protocol state machine.

Deliberately still hand-coded (inherent logic, not vocabulary): dispatcher arm -> session-method bindings, sender business logic, binary framing.

Rebase + test harness

Rebased onto main @ 96c57bb; main's init_ack WS telemetry (#219) was folded into the contract architecture (registered EventSpec, SDK store writes moved to ws_trace_update/ws_init_ack listeners). The golden-harness branch is merged (7d6301e) and its tests ported to the SDK layout (ec32f60): pure mechanical import renames to @demon/client (the RemoteBackend surface is unchanged, so the replay send-mapping needed zero logic changes), plus the replay server learned the two new capability probes (/api/knobs, /api/protocol) served from the real registries, both torch-free.

Validation (local RTX 5090, 2026-06-04)

Web tiers (GPU-free, bit-exact policy):

  • tsc --noEmit clean; next build clean.
  • vitest Tiers A+B: 41 passed, 1 skipped by design. All six replay scenarios reconstruct the canonical region bit-exactly through the real RemoteBackend; every re-driven send byte/JSON-equals the recording.
  • Playwright Tier C: passed (2088 slices, 0 playhead stalls, 0 stale drops, bit-exact in-browser SHA-256 of the canonical region, clean console).

Golden GPU tier: 11/12. The one failure (knob_step, win_cos_min 0.98532 < 0.995) was A/B-verified to not be a regression: the pre-refactor baseline build fails identically on the same day, and the two builds' run audio matches each other at win_cos_min 0.99999988 (max abs diff 5.7e-4, the float16 wire floor). The divergence is a ~2 s patch straddling the knob transition vs a reference captured earlier the same day; see the harness PR for the diagnosis.

Latency vs the pre-refactor baseline reports, all six scenarios: flat or slightly better. Ready 5.8 -> 5.4 s warm, slice gap p50/p95 unchanged (16/32 ms; knob_step 31/47 ms), decode p95 down ~0.2-0.4 ms, realtime factor up ~0.5-1.4x. Action acks: knob 0 ms (unchanged), prompt 156 -> 141 ms, swap 156 -> 141 ms, LoRA 1484 -> 1422 ms. Knob-to-ear unchanged: 234/594 ms first/full at depth 4.

Python unit suite: 132 passed, 4 skipped, 8 failed; the 8 are the pre-existing test_lora_refit failures (_transpose_for_engine AttributeError), present before this branch. Wire/SDK/session contract tests: 29/29.

Live GPU session spot-checked after the SDK extraction (playback, knobs, prompt sends).

@ryanontheinside ryanontheinside marked this pull request as draft June 3, 2026 22:12
Give the command/event vocabulary the same backend-owned, self-describing
treatment the knob registry gave the parameter surface, so a re-skinned UI
or MCP agent builds against the contract instead of reverse-engineering
ws_adapter.py / engine/protocol.ts.

- protocol.py: CommandSpec/EventSpec registry as the single source of truth
  (16 commands, 18 events) + command_catalog()/event_catalog()/
  wire_contract() projections + PROTOCOL_VERSION
- server.py: serve GET /api/protocol (ACAO:*, mirrors /api/knobs)
- mcp_server.py: describe_protocol tool over the same contract
- web: types/wireContract.ts + engine/wire/fetchWireContract.ts +
  useWireContractStore, fetched at boot alongside the knob manifest
- tests/unit/test_wire_contract.py: drift guard — AST-parses the dispatcher
  and asserts handled message types == COMMAND_NAMES, and every event the
  browser client handles is registered, so a new verb can't skip the contract

Additive only: no change to the live dispatcher or protocol.ts senders.
origin_sensitive narrowed to params/set_prompt_blend with echo_event as contract data; vestigial KnobDef/KnobBank/build_banks layer and dead KnobState surface deleted (differential-tested identical); snapshot carries sde and MCP validates rcfg_mode via the registry; packPcmFrame unifies the SDK PCM framings; codegen emits KNOB_SCHEMA_VERSION and fetchKnobManifest returns the versioned envelope; DynamicKnobPanel enums validate through app type guards; config catalog surfaces factory defaults.
@ryanontheinside ryanontheinside force-pushed the ryanontheinside/feat/refactor-app branch from bbb8e10 to 33a53bf Compare June 4, 2026 12:09
tests/golden drives a live streaming server over the WS protocol (never
imports server/app internals): 6 declarative scenarios, position-anchored
canonical audio comparison (tier-1 sha short-circuit, tier-2 calibrated
log-mel/RMS/spectral-cosine thresholds), coarse latency ceilings plus
diffable per-build reports, and full wire transcripts recorded for
upcoming browser-client replay tests.

References live on the HF dataset daydreamlive/demon-test-refs,
sha256-pinned by refs.json (baseline: main @ 96c57bb, RTX 5090, trt).
Local GPU is first-class: pytest tests/golden spawns the server itself;
without CUDA or a pod URL every test skips. The harness's own protocol
logic is covered GPU-free by tests/unit/test_golden_harness.py against
a scripted fake server.
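
The two-tier comparison described above can be sketched roughly as follows. This is an illustration only: the harness's tier-2 metrics are log-mel/RMS/spectral-cosine based, while this sketch substitutes a plain time-domain windowed cosine as a stand-in, and the helper names, the window size, and the way the 0.995 threshold is applied are assumptions.

```python
import hashlib
import math
from array import array


def sha256_f32(samples):
    """Hash the float32 byte representation of a sample sequence."""
    return hashlib.sha256(array("f", samples).tobytes()).hexdigest()


def win_cos_min(a, b, win=4):
    """Minimum cosine similarity over consecutive non-overlapping windows."""
    worst = 1.0
    for start in range(0, min(len(a), len(b)) - win + 1, win):
        x, y = a[start:start + win], b[start:start + win]
        dot = sum(p * q for p, q in zip(x, y))
        norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
        if norm > 0.0:  # skip silent windows, where cosine is undefined
            worst = min(worst, dot / norm)
    return worst


def compare(run, ref, threshold=0.995):
    """Tier 1: exact hash match short-circuits. Tier 2: thresholded similarity."""
    if sha256_f32(run) == sha256_f32(ref):
        return "tier1-exact"
    return "tier2-pass" if win_cos_min(run, ref) >= threshold else "tier2-fail"
```

Taking the minimum over windows, not the mean, means a short localized divergence cannot be averaged away by a long stretch of matching audio.
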
Tier A (vitest): float16 decode exhaustive against native Float16Array,
slice-epoch swap-bleed guards, AudioPlayer swap playhead clamping,
WsReconnector backoff/cancel, session store lifecycle.

Tier B (vitest): the real RemoteBackend driven by recorded golden
transcripts through a fake WebSocket; every recorded send re-issued
through the real send methods and wire-compared; buffer reconstruction
bit-exact against canonical.f32.raw; main-thread decode perf artifacts
to runs/web-replay-reports/.

Tier C (Playwright): tests/golden/replay_server.py serves a transcript
over a real WebSocket (recorded client sends are causality gates) so
the full app boots CPU-only; start -> ready -> knob -> swap smoke,
playhead stall detection, no console errors, and an in-browser SHA-256
of the canonical buffer region against the reference bundle through the
real worker decode path. Observation hooks on window.__demonTest.

Knob-to-ear latency: each fired golden action is post-processed into
audible_first_ms (playhead reaches the first post-action slice ahead of
it) and audible_full_ms (playhead reaches the action-time generation
frontier), with a coarse DEMON_LAT_CEILING_ACTION_AUDIBLE_MS ceiling.
knob_step baseline on the 5090 at depth 4: 234 / 594 ms; the
playback-lead buffer dominates wire+apply by ~20x.
…de/feat/refactor-app

# Conflicts:
#	demos/realtime_motion_graph_web/web/app/RTMGBoot.tsx
Mechanical import-path renames after the SDK extraction (engine/protocol.ts
et al. -> sdk/, re-exported via the @demon/client alias): tests/**,
engine/testHooks.ts, and the standalone verifyReconnect.mts. vitest.config.ts
gains the @demon/client alias mirroring tsconfig. No test logic changes.

The replay server now answers the two capability probes the refactored
boot added (/api/knobs, /api/protocol), served from the real registries
(torch-free), keeping the Tier C clean-console assertion meaningful.