realtime_motion_graph_web: env-driven cap on the TRT profile ceiling by gioelecerati · Pull Request #93 · daydreamlive/DEMON

gioelecerati · 2026-05-13T05:52:55Z

Summary

Adds RTMG_TRT_PROFILE_CAP_S so a pod can refuse to load the 240 s decoder + vae_encode profile when its VRAM headroom doesn't support it.
Audio longer than the cap gets clipped to the cap (same code path as the existing largest-registered-profile ceiling — waveform = waveform[:, :int(max_seconds * SAMPLE_RATE)]).
The clamp is applied where max_seconds is computed, so the swap-source path picks it up automatically: both the existing new_wf[:, :int(max_seconds * SAMPLE_RATE)] clip in apply_swap_if_pending and the ensure_profile() call that follows it honor the capped value.
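
For reviewers, a minimal sketch of how the cap could resolve into max_seconds. This is not the code in the diff: resolve_max_seconds, the logger name, the (60, 120, 240) profile list, and the choice to ignore non-positive values are all assumptions made for illustration. Only the env var name, the "unset means largest registered profile" rule, and the non-numeric warning come from this PR.

```python
import logging
import os

log = logging.getLogger("rtmg")  # hypothetical logger name

ENV_VAR = "RTMG_TRT_PROFILE_CAP_S"


def resolve_max_seconds(registered_profiles_s, environ=None):
    """Return the profile ceiling in seconds, lowered by the env cap if set.

    registered_profiles_s: durations (seconds) of the registered TRT profiles.
    environ: mapping to read the cap from; defaults to os.environ.
    """
    if environ is None:
        environ = os.environ
    ceiling = max(registered_profiles_s)

    raw = environ.get(ENV_VAR)
    if raw is None or raw.strip() == "":
        # Unset: current behavior, ceiling = largest registered profile.
        return ceiling

    try:
        cap = float(raw)
    except ValueError:
        log.warning("%s=%r is non-numeric, ignoring", ENV_VAR, raw)
        return ceiling

    if not cap > 0:
        # Assumption: zero, negative, and NaN caps are ignored as well.
        log.warning("%s=%r is not a positive number, ignoring", ENV_VAR, raw)
        return ceiling

    if cap < ceiling:
        log.info("%s=%gs active (profile ceiling was %gs)", ENV_VAR, cap, ceiling)
    # The cap can only lower the ceiling, never raise it.
    return min(ceiling, cap)


if __name__ == "__main__":
    profiles = (60, 120, 240)  # assumed profile set
    print(resolve_max_seconds(profiles, {}))
    print(resolve_max_seconds(profiles, {ENV_VAR: "120"}))
    print(resolve_max_seconds(profiles, {ENV_VAR: "garbage"}))
```

Because every downstream clip already reads max_seconds, resolving the cap once at this point is what lets the upload and swap paths share it without further changes.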

Motivation

32 GB 5090 pods are OOM-ing on tiny (~20 MiB) allocations at session start. The steady-state VRAM baseline keeps growing (text encoder resident by default, LUFS buffers, dreamvae, rcfg/rescale state, server-side LoRA injection), and the 240 s decoder + 240 s vae_encode together push the 5090 to single-digit MiB of headroom. The next session's scratch tensor trips the allocator → Python crashes → supervisor respawns → next client sees WS 1011 ("Server restarted to clear memory").

Capping the profile to 120 s on 5090s drops VRAM use enough to give the allocator real headroom, at the cost of long uploads being trimmed to the cap. Better to lose up to 120 s of audio (the difference between the current 240 s ceiling and the cap) than to lose the session entirely.

How to roll out

Set RTMG_TRT_PROFILE_CAP_S=120 in /workspace/demon.env on 5090 pods (and on any other card class that needs the same trade). Leaving the variable unset keeps current behavior: the ceiling is the largest registered profile.

Test plan

Backend boots with no env var set → log shows no cap message, behavior identical to current.
Backend boots with RTMG_TRT_PROFILE_CAP_S=120 → log shows RTMG_TRT_PROFILE_CAP_S=120s active (profile ceiling was 240s).
Upload a 30 s track with the cap on → 60 s profile picked (existing smallest-fit logic).
Upload a 200 s track with the cap on → trimmed to 120 s, 120 s profile picked.
Swap to a 180 s track mid-session with the cap on → swap succeeds, trimmed to 120 s, no profile swap if 120 s was already loaded.
Backend boots with RTMG_TRT_PROFILE_CAP_S=garbage → log shows the ignoring-non-numeric warning, behavior unchanged.
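
The upload and swap cases above can be checked against a small model of clip-then-smallest-fit. This is a sketch under stated assumptions, not the backend code: plan_upload is a hypothetical helper, and the (60, 120, 240) profile set is inferred from the durations mentioned in this PR. It also assumes the cap coincides with a registered profile; a cap that falls between two profiles is not covered by this description.

```python
def plan_upload(duration_s, registered_profiles_s, max_seconds):
    """Return (clipped_duration_s, profile_s) for a track of duration_s.

    The track is first clipped to max_seconds, then the smallest registered
    profile that fits the clipped duration is picked.
    """
    clipped = min(duration_s, max_seconds)
    for profile_s in sorted(registered_profiles_s):
        if profile_s >= clipped:
            return clipped, profile_s
    raise ValueError("no registered profile fits %r s" % (clipped,))


if __name__ == "__main__":
    profiles = (60, 120, 240)  # assumed profile set
    cap = 120                  # RTMG_TRT_PROFILE_CAP_S=120

    print(plan_upload(30, profiles, cap))   # (30, 60)
    print(plan_upload(200, profiles, cap))  # (120, 120)
    print(plan_upload(180, profiles, cap))  # (120, 120)
```

In the swap case the result is the same 120 s profile that a prior capped upload would already have loaded, which is why no profile swap is expected there.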

🤖 Generated with Claude Code


gioelecerati wants to merge 1 commit into main from gio/feat/trt-profile-cap
