Multi-Deck #218

Draft
BuffMcBigHuge wants to merge 17 commits into main from marco/feat/multi-deck-sessions

@BuffMcBigHuge commented Jun 3, 2026

Summary

This PR replaces the single "now playing" input with a four-deck DJ-style mixer in the realtime performance UI. Each deck owns one track, can play full / vocals / instruments, routes to bus A or bus B, and blends through a crossfader. The mixed result drives live inference, not just local preview.

Session persistence, custom track stores, and local saved sessions are inherited from the session-mods line of work; this writeup covers only the deck mixer, bus/crossfader model, and server-side parametric mix path.

Replaces #206 and builds on #150 improvements. Draft / WIP.

What changed in the performance surface

The old primary input controls (AudioSourceCrate, TrackPicker, standalone reference controls) are removed from the main performance shell. Decks in the Advanced drawer (DecksPanel) are now the primary way to:

  • Load and swap tracks per deck (fixtures or uploads)
  • Add/remove decks (1–4 active; at least one always remains)
  • Pick stem mode per deck
  • Set volume, mute, bus side, and timbre/structure reference roles
  • Crossfade between bus A and bus B
  • Toggle Monitor (browser audition) and Infer (server mix sync)

Deck runtime lives in PerformanceShell via useDeckRuntime, so mixing and server sync continue when the drawer tab is closed.


Deck model

useDeckStore is the browser source of truth. Each slot (A–D) holds:

| Field | Role |
| --- | --- |
| trackName | Loaded fixture or upload |
| sourcePart | full, vocals, or instruments |
| volume, muted, solo | Per-deck level (solo affects monitor only today) |
| crossfadeSide | left → bus A, right → bus B |
| playing, positionSec, cueSec | Local monitor transport only |

Global state: active deckIds, crossfade (0 = full A, 1 = full B), timbreDeckId / structureDeckId, monitorEnabled, inferenceEnabled, and mixRevision (bumps on mix-relevant edits for sync).

Invariant: a deck is always a loaded track slot. Adding a deck requires choosing or uploading a track first; removing a deck clears that slot but never drops below one active deck. Deck 1 seeds from the current/default track.
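
The invariant above can be sketched as a small state model. This is a Python illustration only, not the actual useDeckStore (which lives in the frontend): the class names, the snake_case fields, the slot ids "A" to "D", and the choice to bump a revision counter on add/remove are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

SLOT_IDS = ("A", "B", "C", "D")  # assumed slot ids; the PR only says four decks


@dataclass
class Deck:
    track_name: str
    source_part: str = "full"      # "full", "vocals", or "instruments"
    volume: float = 1.0
    muted: bool = False
    solo: bool = False             # monitor-only today
    crossfade_side: str = "left"   # "left" -> bus A, "right" -> bus B


@dataclass
class DeckState:
    decks: Dict[str, Deck] = field(default_factory=dict)
    crossfade: float = 0.0         # 0 = full A, 1 = full B
    mix_revision: int = 0          # bumped on mix-relevant edits

    def add_deck(self, deck_id: str, track_name: Optional[str]) -> bool:
        """Add a deck only if the slot is free and a track was chosen."""
        if deck_id not in SLOT_IDS or deck_id in self.decks or not track_name:
            return False
        self.decks[deck_id] = Deck(track_name=track_name)
        self.mix_revision += 1
        return True

    def remove_deck(self, deck_id: str) -> bool:
        """Clear a slot, but never drop below one active deck."""
        if deck_id not in self.decks or len(self.decks) <= 1:
            return False
        del self.decks[deck_id]
        self.mix_revision += 1
        return True
```

Rejecting the edit (returning False) rather than raising keeps the sketch close to how a UI store would typically ignore an invalid action; the real store may handle this differently.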


Bus system and crossfader

Decks assign to one of two buses:

  • Bus A: crossfadeSide = "left"
  • Bus B: crossfadeSide = "right"

The crossfader is linear:

  • 0.0 → A-side decks at full bus weight, B-side decks at zero
  • 0.5 → equal A/B bus weighting
  • 1.0 → B-side decks at full bus weight, A-side decks at zero

Per-deck effective gain (shared by monitor and server):

gain = volume × bus_gain

where bus_gain = (1 - crossfade) on bus A and crossfade on bus B. Muted decks contribute 0. Active deck gains are normalized on the server so partial crossfades do not collapse latent energy toward zero.

The UI crossfader shows which decks sit on each side. Multiple decks on the same bus sum through normalized weighting after per-deck gain.
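
As a rough illustration of the gain rule, here is a minimal Python sketch. The function names and signatures are hypothetical (the real helper is _deck_gain in session.py, whose signature is not shown here). Clamping the inputs and normalizing weights to sum to 1 are assumptions about what "normalized" means.

```python
from typing import Dict


def deck_gain(volume: float, muted: bool, side: str, crossfade: float) -> float:
    """gain = volume x bus_gain, with muted decks contributing 0."""
    if muted:
        return 0.0
    x = min(max(crossfade, 0.0), 1.0)            # clamp (assumption)
    bus_gain = (1.0 - x) if side == "left" else x  # "left" -> bus A
    return max(volume, 0.0) * bus_gain


def normalize_weights(gains: Dict[str, float]) -> Dict[str, float]:
    """Scale positive gains so they sum to 1; empty dict if nothing contributes."""
    active = {deck_id: g for deck_id, g in gains.items() if g > 0.0}
    total = sum(active.values())
    if total <= 0.0:
        return {}
    return {deck_id: g / total for deck_id, g in active.items()}
```

For example, with deck A on the left at volume 1.0, deck B on the right at volume 0.5, and the crossfader at 0.5, the raw gains are 0.5 and 0.25, which normalize to weights of 2/3 and 1/3.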


Two audio paths (monitor vs inference)

1. Browser monitor (useDeckMonitor)

  • Web Audio: one looping AudioBufferSourceNode per playing deck from loaded assets (useDeckAssets + deckAssets.ts)
  • Gain from volume, mute, solo, bus, and crossfade
  • Play/pause and cue are monitor-only — they gate local audition (requirePlaying: true)
  • Does not define what the model hears

2. Server inference (useDeckServerSync → deck_mix_state)

  • Sends a compact, debounced (~40ms) payload over the main WebSocket
  • Per deck: id, track_name, source_part, volume, muted, playing (wire parity only), side, plus global crossfade
  • Handled in ws_adapter.py → StreamingSession.set_deck_mix_state()

Important behavior: inference is decoupled from monitor transport. A loaded, unmuted deck contributes to the mix whether or not its monitor is playing. That fixes the earlier bug where inference stayed on the default track until every deck was "playing."
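
To make the wire shape concrete, the sketch below builds a message from the per-deck fields listed above. The envelope (a "type" key and a "decks" array) and the "left"/"right" values for side are assumptions; the actual schema is defined in protocol.ts and ws_adapter.py. The helper contributes_to_inference is hypothetical and only restates the rule that playing does not gate inference.

```python
import json
from typing import Any, Dict, List


def build_deck_mix_state(decks: List[Dict[str, Any]], crossfade: float) -> Dict[str, Any]:
    """Assemble a compact deck_mix_state message (envelope keys are assumed)."""
    return {
        "type": "deck_mix_state",
        "crossfade": crossfade,
        "decks": [
            {
                "id": d["id"],
                "track_name": d["track_name"],
                "source_part": d["source_part"],
                "volume": d["volume"],
                "muted": d["muted"],
                "playing": d.get("playing", False),  # wire parity only
                "side": d["side"],
            }
            for d in decks
        ],
    }


def contributes_to_inference(deck: Dict[str, Any]) -> bool:
    """Loaded and unmuted is enough; 'playing' is deliberately not consulted."""
    return bool(deck.get("track_name")) and not deck.get("muted", False)


def encode(message: Dict[str, Any]) -> str:
    """Serialize with compact separators to keep the payload small."""
    return json.dumps(message, separators=(",", ":"))
```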


Server-side parametric deck mix (core architecture)

Problem with the first approach

An earlier implementation rendered mixed PCM in the browser (renderDeckMix in deckMixer.ts) and pushed it through the heavy swap_source path (useDeckInferenceSync). That made crossfader, volume, and bus changes feel like full source swaps: VAE prep, stale queued snapshots, and lag compared to knobs like strength.

That path remains in the tree but is no longer wired from useDeckRuntime.

Current approach: blend prepared latents, not PCM

Each distinct (track_name, source_part) is prepared once into a session cache entry:

  • Source latent + context latent
  • Prompt conditioning pairs (including stem-aware cond_pair_b where applicable)

On each deck_mix_state update the server:

  1. Keeps only loaded, unmuted decks with gain > 0 (playing ignored)
  2. Loads or reuses cache entries (skips missing stem sidecars with a warning instead of blocking live extraction)
  3. Resizes latents to the active stream length
  4. Weighted-blends source latents, context latents, and source-conditioned prompt tensors
  5. Writes the result into stream.source / conditioning and marks hint blending dirty

Parametric edits (crossfade, volume, mute, bus assignment, stem part) hit the cache and re-blend — no swap_source after the first prepare for that (track, mode).
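
The update steps can be sketched roughly as follows. This is a simplified illustration, not the code in session.py: latents are modeled as flat lists of floats rather than tensors, only one latent per cache entry is blended, linear interpolation stands in for whatever _resize_latent_tensor does, and blend_decks, resize_latent, and the prepare callback are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

CacheKey = Tuple[str, str]   # (track_name, source_part)
Latent = List[float]         # stand-in for a real latent tensor


def resize_latent(latent: Latent, length: int) -> Latent:
    """Linearly interpolate a 1-D latent to the target length."""
    if length <= 0 or not latent:
        return []
    if len(latent) == 1 or length == 1:
        return [latent[0]] * length
    scale = (len(latent) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(latent) - 1)
        frac = pos - lo
        out.append(latent[lo] * (1.0 - frac) + latent[hi] * frac)
    return out


def blend_decks(
    decks: List[Dict[str, Any]],
    cache: Dict[CacheKey, Latent],
    prepare: Callable[[CacheKey], Optional[Latent]],
    length: int,
) -> Optional[Latent]:
    """Return the weighted blend, or None when no deck contributes."""
    weighted = []
    for deck in decks:
        gain = deck["gain"]
        if not deck["track_name"] or deck["muted"] or gain <= 0.0:
            continue                      # step 1: 'playing' is never checked
        key = (deck["track_name"], deck["source_part"])
        if key not in cache:              # step 2: prepare once per key
            prepared = prepare(key)
            if prepared is None:          # e.g. missing stem sidecar: skip
                continue
            cache[key] = prepared
        weighted.append((gain, resize_latent(cache[key], length)))  # step 3
    total = sum(g for g, _ in weighted)
    if total <= 0.0:
        return None
    out = [0.0] * length
    for gain, latent in weighted:         # step 4: normalized weighted sum
        w = gain / total
        for i, value in enumerate(latent):
            out[i] += w * value
    return out
```

Returning None instead of a zero latent leaves the decision about what to do with an empty mix to the caller, which is where the safety invariants below come in.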

Safety invariants

  • No all-zero latent injection when the crossfader points at an empty or fully muted bus. Zero latents are not silence for this model and can cause harsh transients; the stream holds the last valid source until a contributing deck returns.
  • Normalized weights across active decks.
  • Missing stems skip that deck/mode rather than forcing synchronous separation on the hot path.
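
The hold-last-source rule can be illustrated with a tiny wrapper. This is a sketch under assumptions: SourceHolder is a hypothetical name, latents are plain lists, and "no contributing deck" is signaled by None or an empty blend rather than by whatever mechanism the real session uses.

```python
from typing import List, Optional


class SourceHolder:
    """Keeps the last valid source so an empty mix never injects zeros."""

    def __init__(self, initial: List[float]) -> None:
        self.current = list(initial)

    def update(self, blended: Optional[List[float]]) -> List[float]:
        # No contributing deck (empty or fully muted bus): hold the previous
        # source instead of writing an all-zero latent into the stream.
        if blended is None or len(blended) == 0:
            return self.current
        self.current = list(blended)
        return self.current
```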

Unit coverage: tests/unit/test_deck_mix.py (16 tests) for _deck_gain and _resize_latent_tensor.


Track assets and stems (deck-facing only)

Decks consume HTTP asset routes on the demo server (/api/track_asset, /api/track_stem) for fixtures and uploads. Upload persistence and stem packets are part of the broader upload/session work; from the deck perspective, the UI only needs manifests and WAV sidecars for monitor loading and server prepare.

Per-deck full / vocals / instruments buttons switch sourcePart, which changes both monitor buffers and the cache key on the server.


Timbre and structure references

Timbre and structure are chosen per deck (replacing standalone ref controls). Fixture refs still use server fixture messages; custom tracks can send PCM from loaded deck assets. These paths still use the existing timbre/structure commands and are not yet fully folded into the parametric latent mixer.


Architecture

flowchart LR
  subgraph UI["Browser"]
    Store["useDeckStore"]
    Panel["DecksPanel"]
    Runtime["useDeckRuntime"]
    Monitor["useDeckMonitor\n(Web Audio)"]
    Sync["useDeckServerSync"]
    Store --> Panel
    Store --> Runtime
    Runtime --> Monitor
    Runtime --> Sync
  end

  subgraph Server["StreamingSession"]
    WS["deck_mix_state"]
    Cache["_deck_cache\n(track, part)"]
    Mix["set_deck_mix_state\nblend latents + cond"]
    Stream["active stream.source"]
    WS --> Mix
    Cache --> Mix
    Mix --> Stream
  end

  Sync -->|debounced params| WS
  Monitor -->|assets only| Assets["/api/track_*"]
  Mix -->|prepare on miss| Assets

Known limitations (WIP)

| Area | Status |
| --- | --- |
| solo | Honored in browser monitor; not in server deck_mix_state yet |
| Playhead / offset | Monitor respects position; server blends full-track prepared latents without per-deck timeline offset |
| Monitor vs server | Separate implementations; monitor still gates on playing, server does not |
| Timbre/structure | Deck UI wired; not fully integrated into latent mixer |
| E2E tests | Gain math covered; no automated routing test for deck_mix_state or asset HTTP yet |
| Legacy | useDeckInferenceSync.ts unused but still present |

Branch technical notes: docs/DECK_SYSTEM_BRANCH_TECHNICAL.md.


Test plan

  • Start performance session; confirm deck A loads default track and Infer toggles sync mix to backend
  • Add decks B–D with different tracks; verify crossfader moves inference blend without full source-swap lag
  • Mute / volume / bus reassignment while generating; confirm live response
  • Switch full / vocals / instruments per deck when sidecars exist
  • Crossfade to empty bus (all muted or wrong side); confirm no harsh transient (hold-last-source)
  • Monitor play/pause: audible locally; inference unchanged when Infer on and decks loaded+unmuted
  • Timbre/structure deck roles with fixture vs upload
  • uv run pytest tests/unit/test_deck_mix.py -q

Key files (deck system)

Frontend: useDeckStore.ts, DecksPanel.tsx, useDeckRuntime.ts, useDeckAssets.ts, useDeckMonitor.ts, useDeckServerSync.ts, deckAssets.ts, deckMixer.ts, protocol.ts

Backend: session.py (set_deck_mix_state, _deck_cache, _deck_gain), ws_adapter.py, demo server.py asset routes

Signed-off-by: BuffMcBigHuge <marco@bymar.co>
…sions, major work on deck system reliability.
