realtime_motion_graph_web: env-driven cap on the TRT profile ceiling #93

Open
gioelecerati wants to merge 1 commit into main from gio/feat/trt-profile-cap

@gioelecerati
Collaborator

Summary

  • Adds RTMG_TRT_PROFILE_CAP_S, an environment variable that lets a pod refuse to load the 240 s decoder + vae_encode profile when it lacks the VRAM headroom for it.
  • Audio longer than the cap is clipped to the cap, using the same code path as the existing largest-registered-profile ceiling: waveform = waveform[:, :int(max_seconds * SAMPLE_RATE)].
  • The clamp is applied where max_seconds is computed, so the swap-source path picks it up automatically: both the existing new_wf[:, :int(max_seconds * SAMPLE_RATE)] clip in apply_swap_if_pending and the ensure_profile() call that follows it honor the cap.
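
As a rough illustration of the clamp described above, the following is a minimal sketch, not the code in this PR. The helper names (parse_profile_cap, effective_max_seconds), the logger name, the wording of the warning, and the treatment of non-positive or non-finite values are all assumptions made for the example; only RTMG_TRT_PROFILE_CAP_S and max_seconds come from the description.

```python
import logging
import math
import os

log = logging.getLogger("rtmg.trt_profile_cap")  # hypothetical logger name

ENV_VAR = "RTMG_TRT_PROFILE_CAP_S"


def parse_profile_cap(raw):
    """Return the cap in seconds, or None when unset or unusable."""
    if raw is None or raw.strip() == "":
        return None  # unset: keep current behavior
    try:
        cap = float(raw)
    except ValueError:
        # Illustrative wording; the real warning text is not shown in the PR.
        log.warning("ignoring non-numeric %s=%r", ENV_VAR, raw)
        return None
    # Assumption: nan, inf, and non-positive values are treated as unset.
    if not math.isfinite(cap) or cap <= 0:
        log.warning("ignoring out-of-range %s=%r", ENV_VAR, raw)
        return None
    return cap


def effective_max_seconds(registered_profiles_s, cap_s):
    """Clamp the ceiling (largest registered profile) to the cap, if any."""
    ceiling = max(registered_profiles_s)
    if cap_s is None or cap_s >= ceiling:
        return ceiling
    log.info("%s=%gs active (profile ceiling was %gs)", ENV_VAR, cap_s, ceiling)
    return cap_s


if __name__ == "__main__":
    cap = parse_profile_cap(os.environ.get(ENV_VAR))
    # The profile lengths here are placeholders for the registered set.
    print(effective_max_seconds([60, 120, 240], cap))
```

Because every later trim reads max_seconds, clamping it once at this point is what lets both the upload path and the swap path inherit the cap without further changes.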

Motivation

32 GB 5090 pods are OOM-ing on tiny (~20 MiB) allocations at session start. The steady-state VRAM baseline keeps growing (text encoder resident by default, LUFS buffers, dreamvae, rcfg/rescale state, server-side LoRA injection), and the 240 s decoder + 240 s vae_encode together push the 5090 to single-digit MiB of headroom. The next session's scratch tensor trips the allocator → Python crashes → supervisor respawns → next client sees WS 1011 ("Server restarted to clear memory").

Capping the profile at 120 s on 5090s lowers VRAM use enough to give the allocator real headroom, at the cost of trimming longer uploads to the cap. Losing the tail of a long upload is better than losing the session entirely.

How to roll out

Set RTMG_TRT_PROFILE_CAP_S=120 in /workspace/demon.env on 5090 pods (and any other card class that needs the same trade). Unset → current behavior, ceiling = largest registered profile.

Test plan

  • Backend boots with no env var set → log shows no cap message, behavior identical to current.
  • Backend boots with RTMG_TRT_PROFILE_CAP_S=120 → log shows RTMG_TRT_PROFILE_CAP_S=120s active (profile ceiling was 240s).
  • Upload a 30 s track with the cap on → 60 s profile picked (existing smallest-fit logic).
  • Upload a 200 s track with the cap on → trimmed to 120 s, 120 s profile picked.
  • Swap to a 180 s track mid-session with the cap on → swap succeeds, trimmed to 120 s, no profile swap if 120 s was already loaded.
  • Backend boots with RTMG_TRT_PROFILE_CAP_S=garbage → log shows the ignoring-non-numeric warning, behavior unchanged.
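
The upload and swap cases in this plan can be reasoned about with a small model of trim-then-smallest-fit. This is a sketch under assumptions: plan_upload and needs_profile_swap are hypothetical names, and the registered profile set of 60 s, 120 s, and 240 s is inferred from the cases above rather than taken from the code.

```python
REGISTERED_PROFILES_S = (60, 120, 240)  # assumed set, inferred from the test plan


def plan_upload(duration_s, max_seconds, profiles_s=REGISTERED_PROFILES_S):
    """Return (kept_seconds, profile_seconds) for a track of duration_s.

    The track is first trimmed to max_seconds, then the smallest profile
    that still fits the trimmed length is chosen.
    """
    kept = min(duration_s, max_seconds)
    # max_seconds never exceeds the largest profile, so a fit always exists.
    profile = min(p for p in profiles_s if p >= kept)
    return kept, profile


def needs_profile_swap(loaded_profile_s, wanted_profile_s):
    """A profile swap is only needed when the chosen profile changes."""
    return loaded_profile_s != wanted_profile_s


if __name__ == "__main__":
    cap = 120  # as if RTMG_TRT_PROFILE_CAP_S=120 were set
    for duration in (30, 200, 180):
        print(duration, plan_upload(duration, cap))
```

With the cap at 120 s, a 180 s swap source maps to the 120 s profile, so needs_profile_swap(120, 120) is false and the mid-session swap should not reload an engine.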

🤖 Generated with Claude Code

Adds RTMG_TRT_PROFILE_CAP_S so a pod can refuse to load the 240 s
decoder + vae_encode profile when its VRAM headroom doesn't support
it. Audio longer than the cap gets clipped to the cap (same path as
the existing largest-registered-profile ceiling).

Why now: 32 GB 5090 pods are OOM-ing on ~20 MiB allocations at
session start because the steady-state baseline (240 s decoder +
240 s vae_encode + resident text encoder + LUFS buffers + dreamvae
+ rcfg) leaves single-digit MiB of headroom. The cap lets ops trade
long-upload support for headroom without code changes.

How to apply: set RTMG_TRT_PROFILE_CAP_S=120 in /workspace/demon.env
on 5090 pods (and any future card class that needs the same trade).
Unset => current behavior, max_seconds = largest registered profile.

The clamp lives where max_seconds is computed, so swap_source picks
up the same cap automatically via the existing new_wf trim and the
ensure_profile() call that follows.