Ryanontheinside/feat/refactor app#220
Open
ryanontheinside wants to merge 11 commits into
Give the command/event vocabulary the same backend-owned, self-describing treatment the knob registry gave the parameter surface, so a re-skinned UI or MCP agent builds against the contract instead of reverse-engineering ws_adapter.py / engine/protocol.ts.
- protocol.py: CommandSpec/EventSpec registry as the single source of truth (16 commands, 18 events) + command_catalog()/event_catalog()/wire_contract() projections + PROTOCOL_VERSION
- server.py: serve GET /api/protocol (ACAO:*, mirrors /api/knobs)
- mcp_server.py: describe_protocol tool over the same contract
- web: types/wireContract.ts + engine/wire/fetchWireContract.ts + useWireContractStore, fetched at boot alongside the knob manifest
- tests/unit/test_wire_contract.py: drift guard that AST-parses the dispatcher and asserts handled message types == COMMAND_NAMES, and that every event the browser client handles is registered, so a new verb can't skip the contract
Additive only: no change to the live dispatcher or protocol.ts senders.
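The drift guard in the last bullet can be pictured with a small sketch. This is not the code in tests/unit/test_wire_contract.py: the dispatcher source, the variable name `t`, and the command set below are hypothetical stand-ins (only `params` and `set_prompt_blend` are names that appear in this PR), and the real dispatcher may be shaped differently.

```python
import ast

# Hypothetical stand-in for the dispatcher source; the real one lives in ws_adapter.py.
DISPATCHER_SRC = '''
def dispatch(session, msg):
    t = msg["type"]
    if t == "params":
        session.set_knobs(msg["values"])
    elif t == "set_prompt_blend":
        session.set_prompt_blend(msg["blend"])
    elif t == "swap":
        session.swap(msg["slot"])
'''

# Hypothetical registry projection; in the PR this would be derived from protocol.py.
COMMAND_NAMES = frozenset({"params", "set_prompt_blend", "swap"})


def handled_types(source, var="t"):
    """Collect every string literal that appears in a `var == "<literal>"` test."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        if not (isinstance(node.left, ast.Name) and node.left.id == var):
            continue
        for op, right in zip(node.ops, node.comparators):
            if (
                isinstance(op, ast.Eq)
                and isinstance(right, ast.Constant)
                and isinstance(right.value, str)
            ):
                found.add(right.value)
    return found


def drift(source, names):
    """Return (registered but unhandled, handled but unregistered)."""
    handled = handled_types(source)
    return names - handled, handled - names
```

A test would then assert that both sets returned by `drift(DISPATCHER_SRC, COMMAND_NAMES)` are empty, which fails as soon as an arm is added without a registry entry or the reverse.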
origin_sensitive narrowed to params/set_prompt_blend with echo_event as contract data; vestigial KnobDef/KnobBank/build_banks layer and dead KnobState surface deleted (differential-tested identical); snapshot carries sde and MCP validates rcfg_mode via the registry; packPcmFrame unifies the SDK PCM framings; codegen emits KNOB_SCHEMA_VERSION and fetchKnobManifest returns the versioned envelope; DynamicKnobPanel enums validate through app type guards; config catalog surfaces factory defaults.
tests/golden drives a live streaming server over the WS protocol (never imports server/app internals): 6 declarative scenarios, position-anchored canonical audio comparison (tier-1 sha short-circuit, tier-2 calibrated log-mel/RMS/spectral-cosine thresholds), coarse latency ceilings plus diffable per-build reports, and full wire transcripts recorded for upcoming browser-client replay tests. References live on the HF dataset daydreamlive/demon-test-refs, sha256-pinned by refs.json (baseline: main @ 96c57bb, RTX 5090, trt). Local GPU is first-class: pytest tests/golden spawns the server itself; without CUDA or a pod URL every test skips. The harness's own protocol logic is covered GPU-free by tests/unit/test_golden_harness.py against a scripted fake server.
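As a rough illustration of the two-tier comparison, the sketch below short-circuits on a matching SHA-256 and otherwise falls back to a windowed similarity score. It is a simplification under stated assumptions: the harness is described as using calibrated log-mel/RMS/spectral-cosine thresholds, whereas this uses a plain time-domain cosine per window; the function names, the 1024-sample window, and the float32 little-endian layout are illustrative choices. The 0.995 threshold is the `win_cos_min` bound quoted in the validation notes of this PR.

```python
import hashlib
import math
import struct


def decode_f32(raw):
    """Interpret raw bytes as little-endian float32 samples."""
    n = len(raw) // 4
    return struct.unpack(f"<{n}f", raw[: n * 4])


def min_window_cosine(a, b, win=1024):
    """Smallest per-window cosine similarity over the shared length."""
    n = min(len(a), len(b))
    worst = 1.0
    for start in range(0, n, win):
        end = min(start + win, n)
        x, y = a[start:end], b[start:end]
        nx = math.sqrt(sum(v * v for v in x))
        ny = math.sqrt(sum(v * v for v in y))
        if nx == 0.0 and ny == 0.0:
            cos = 1.0  # both windows silent: treat as identical
        elif nx == 0.0 or ny == 0.0:
            cos = 0.0  # only one window silent: treat as a mismatch
        else:
            cos = sum(p * q for p, q in zip(x, y)) / (nx * ny)
        worst = min(worst, cos)
    return worst


def compare(run_raw, ref_raw, ref_sha256, threshold=0.995):
    """Tier 1: exact hash match. Tier 2: thresholded similarity."""
    if hashlib.sha256(run_raw).hexdigest() == ref_sha256:
        return {"tier": 1, "passed": True, "win_cos_min": 1.0}
    score = min_window_cosine(decode_f32(run_raw), decode_f32(ref_raw))
    return {"tier": 2, "passed": score >= threshold, "win_cos_min": score}
```

Taking the minimum over windows rather than a global score is what lets a short localized divergence fail the comparison even when the rest of the audio matches.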
Tier A (vitest): float16 decode exhaustive against native Float16Array, slice-epoch swap-bleed guards, AudioPlayer swap playhead clamping, WsReconnector backoff/cancel, session store lifecycle.
Tier B (vitest): the real RemoteBackend driven by recorded golden transcripts through a fake WebSocket; every recorded send re-issued through the real send methods and wire-compared; buffer reconstruction bit-exact against canonical.f32.raw; main-thread decode perf artifacts to runs/web-replay-reports/.
Tier C (Playwright): tests/golden/replay_server.py serves a transcript over a real WebSocket (recorded client sends are causality gates) so the full app boots CPU-only; start -> ready -> knob -> swap smoke, playhead stall detection, no console errors, and an in-browser SHA-256 of the canonical buffer region against the reference bundle through the real worker decode path. Observation hooks on window.__demonTest.
Knob-to-ear latency: each fired golden action is post-processed into audible_first_ms (playhead reaches the first post-action slice ahead of it) and audible_full_ms (playhead reaches the action-time generation frontier), with a coarse DEMON_LAT_CEILING_ACTION_AUDIBLE_MS ceiling. knob_step baseline on the 5090 at depth 4: 234 / 594 ms; the playback-lead buffer dominates wire+apply by ~20x.
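The Tier A float16 check runs in vitest against the native Float16Array. The same idea, an exhaustive sweep of all 65,536 bit patterns against a trusted reference decoder, can be sketched in Python, where `struct`'s `e` format plays the role of the reference. The hand-written `decode_half` below is an illustrative decoder, not the SDK's worker code.

```python
import math
import struct


def decode_half(bits):
    """Decode a 16-bit IEEE 754 half-precision pattern by hand."""
    sign = -1.0 if bits & 0x8000 else 1.0
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp == 0:
        # Zero and subnormals: no implicit leading 1.
        return sign * frac * 2.0 ** -24
    if exp == 0x1F:
        # All-ones exponent: infinity when the fraction is zero, else NaN.
        return sign * math.inf if frac == 0 else math.nan
    return sign * (1 + frac / 1024) * 2.0 ** (exp - 15)


def reference_half(bits):
    """Reference decode through struct's native half-precision format."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def mismatches():
    """Return every bit pattern where the two decoders disagree."""
    bad = []
    for bits in range(0x10000):
        got, want = decode_half(bits), reference_half(bits)
        if math.isnan(want):
            ok = math.isnan(got)
        else:
            # Compare sign explicitly so +0.0 and -0.0 are distinguished.
            ok = got == want and math.copysign(1.0, got) == math.copysign(1.0, want)
        if not ok:
            bad.append(bits)
    return bad
```

Because the input space is only 16 bits wide, the sweep covers subnormals, signed zeros, infinities, and NaNs without needing hand-picked edge cases.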
…de/feat/refactor-app # Conflicts: # demos/realtime_motion_graph_web/web/app/RTMGBoot.tsx
Mechanical import-path renames after the SDK extraction (engine/protocol.ts et al. -> sdk/, re-exported via the @demon/client alias): tests/**, engine/testHooks.ts, and the standalone verifyReconnect.mts. vitest.config.ts gains the @demon/client alias mirroring tsconfig. No test logic changes. The replay server now answers the two capability probes the refactored boot added (/api/knobs, /api/protocol), served from the real registries (torch-free), keeping the Tier C clean-console assertion meaningful.
Backend-owned control contract + demon-client SDK
Makes the realtime-motion-graph control surface a backend-owned, self-describing contract so a re-skinned UI, a vibecoded frontend, or an MCP agent can build against the backend instead of re-declaring shapes. The OCP standard this branch enforces: adding a control means editing exactly one backend place.
The two registries
Parameter surface (acestep/streaming/knobs.py): knob_specs -> knob_catalog(), served at GET /api/knobs with KNOB_SCHEMA_VERSION. Values are enforced through coerce_knob_values (silent clamp on the hot path in session.set_knobs, raise in MCP). KnobState(specs) seeds all registry defaults so the snapshot is complete from t=0; the runtime LoRA-strength knob is registry-derived (lora_strength_spec), and the dead KnobDef/KnobBank surface is deleted.
Wire vocabulary (demos/realtime_motion_graph_web/protocol.py): a CommandSpec/EventSpec registry (16 commands, 18 events, plus config and upload-handshake sections) -> wire_contract(), served at GET /api/protocol with PROTOCOL_VERSION and exposed as the MCP describe_protocol tool. The config section is derived from the SessionConfig dataclass via typing.get_type_hints. coerce_command_payload gives type coercion, enum checks, and nullable handling; the WS dispatcher derives from the registry at runtime (unknown types rejected via COMMAND_NAMES, payloads coerced before the arms, arms unchanged), and MCP derives its command validation from the same function.
Generated client types
scripts/gen_wire_types.py projects the contract into web/sdk/types/wireContract.gen.ts: name unions, payload interfaces, SessionConfigPayload, and the WireCommand/WireEvent unions. The client's senders are typed against their generated command interfaces and the inbound ladder is a compile-checked switch (msg.type as EventName).
Drift guards in tests/unit/test_wire_contract.py keep all of it honest: dispatcher arm fields vs specs (AST), emitted event fields vs specs (AST over dict literals), client-handled events vs EVENT_NAMES (case-label scan), MCP dict fields vs specs, config vs dataclass, handshake registration, and a byte-compare freshness test that regenerates the TS in memory.
demon-client SDK
Everything a frontend needs to talk to a pod now lives in web/sdk/ (package demon-client, consumed in-tree via the @demon/client tsconfig alias): RemoteBackend, the slice-decode worker, AudioPlayer + LUFS, WsReconnector, manifest fetchers, and the generated + hand-written types. Three dependency inversions keep it host-free (store writes moved to app-side listeners, LoRA trigger prefixing injected via RemoteBackendOptions.promptTransform, AudioPlayer takes a loudness-config provider and worklet URL), enforced by a boundary guard test. packPcmFrame() replaces three hand-rolled copies of the PCM framing. sdk/README.md documents the protocol state machine.
Deliberately still hand-coded (inherent logic, not vocabulary): dispatcher arm -> session-method bindings, sender business logic, binary framing.
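To make the registry-driven validation described for the wire vocabulary more concrete, here is a minimal sketch of a spec registry with a coercion function and a contract projection. Everything in it is an assumption for illustration: the `FieldSpec` shape, the field names (`blend`, `prompt_b`, `mode`), the enum values, the `set_rcfg_mode` command name, and the version string are hypothetical, and the real CommandSpec and coerce_command_payload in protocol.py may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    type: type              # target Python type for coercion
    nullable: bool = False  # whether None / absent is acceptable
    enum: tuple = ()        # allowed values, empty means unrestricted


@dataclass(frozen=True)
class CommandSpec:
    name: str
    fields: dict            # field name -> FieldSpec


# Hypothetical registry entries; names and fields are illustrative only.
REGISTRY = {
    spec.name: spec
    for spec in (
        CommandSpec("set_prompt_blend", {
            "blend": FieldSpec(float),
            "prompt_b": FieldSpec(str, nullable=True),
        }),
        CommandSpec("set_rcfg_mode", {
            "mode": FieldSpec(str, enum=("full", "self", "initialize")),
        }),
    )
}
COMMAND_NAMES = frozenset(REGISTRY)


def coerce_payload(msg):
    """Reject unknown types, then coerce each declared field per its spec."""
    name = msg.get("type")
    if name not in COMMAND_NAMES:
        raise ValueError(f"unknown command type: {name!r}")
    out = {"type": name}
    for key, spec in REGISTRY[name].fields.items():
        value = msg.get(key)
        if value is None:
            if not spec.nullable:
                raise ValueError(f"{name}.{key} is required")
            out[key] = None
            continue
        try:
            value = spec.type(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"{name}.{key}: cannot coerce {value!r}") from exc
        if spec.enum and value not in spec.enum:
            raise ValueError(f"{name}.{key}: {value!r} not in {spec.enum}")
        out[key] = value
    return out


def wire_contract(version="0-sketch"):
    """Project the registry into a JSON-serialisable description."""
    return {
        "version": version,
        "commands": [
            {
                "name": spec.name,
                "fields": {
                    key: {"type": f.type.__name__, "nullable": f.nullable, "enum": list(f.enum)}
                    for key, f in spec.fields.items()
                },
            }
            for spec in REGISTRY.values()
        ],
    }
```

With this shape, a dispatcher and an MCP tool can both call the same `coerce_payload` before acting, and the endpoint that serves the contract only has to return `wire_contract()`; adding a command then means adding one registry entry.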
Rebase + test harness
Rebased onto main @ 96c57bb; main's init_ack WS telemetry (#219) was folded into the contract architecture (registered EventSpec, SDK store writes moved to ws_trace_update/ws_init_ack listeners). The golden-harness branch is merged (7d6301e) and its tests ported to the SDK layout (ec32f60): pure mechanical import renames to @demon/client (the RemoteBackend surface is unchanged, so the replay send-mapping needed zero logic changes), plus the replay server learned the two new capability probes (/api/knobs, /api/protocol) served from the real registries, both torch-free.
Validation (local RTX 5090, 2026-06-04)
Web tiers (GPU-free, bit-exact policy):
tsc --noEmit clean; next build clean.
RemoteBackend; every re-driven send byte/JSON-equals the recording.
Golden GPU tier: 11/12. The one failure (knob_step, win_cos_min 0.98532 < 0.995) was A/B-verified to not be a regression: the pre-refactor baseline build fails identically on the same day, and the two builds' run audio matches each other at win_cos_min 0.99999988 (max abs diff 5.7e-4, the float16 wire floor). The divergence is a ~2 s patch straddling the knob transition vs a reference captured earlier the same day; see the harness PR for the diagnosis.
Latency vs the pre-refactor baseline reports, all six scenarios: flat or slightly better. Ready 5.8 -> 5.4 s warm, slice gap p50/p95 unchanged (16/32 ms; knob_step 31/47 ms), decode p95 down ~0.2-0.4 ms, realtime factor up ~0.5-1.4x. Action acks: knob 0 ms (unchanged), prompt 156 -> 141 ms, swap 156 -> 141 ms, LoRA 1484 -> 1422 ms. Knob-to-ear unchanged: 234/594 ms first/full at depth 4.
Python unit suite: 132 passed, 4 skipped, 8 failed; the 8 are the pre-existing test_lora_refit failures (_transpose_for_engine AttributeError), present before this branch. Wire/SDK/session contract tests: 29/29.
Live GPU session spot-checked after the SDK extraction (playback, knobs, prompt sends).