realtime_motion_graph_web: env-driven cap on the TRT profile ceiling #93
Open
gioelecerati wants to merge 1 commit into
Adds RTMG_TRT_PROFILE_CAP_S so a pod can refuse to load the 240 s decoder + vae_encode profile when its VRAM headroom doesn't support it. Audio longer than the cap gets clipped to the cap (same path as the existing largest-registered-profile ceiling).

Why now: 32 GB 5090 pods are OOM'ing on 20 MiB allocations at session-start because the steady-state baseline (240 s decoder + 240 s vae_encode + resident text encoder + LUFS buffers + dreamvae + rcfg) leaves single-digit MiB of headroom. The cap lets ops trade long-upload support for headroom without code changes.

How to apply: set RTMG_TRT_PROFILE_CAP_S=120 in /workspace/demon.env on 5090 pods (and any future card class that needs the same trade). Unset => current behavior, max_seconds = largest registered profile.

The clamp lives where max_seconds is computed, so swap_source picks up the same cap automatically via the existing new_wf trim and the ensure_profile() call that follows.
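The clamp described above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper name `resolve_max_seconds`, its signature, the logger name, and the exact log wording are hypothetical. Only the variable name RTMG_TRT_PROFILE_CAP_S, the unset behavior, and the ignore-on-non-numeric behavior come from the description. Treating non-positive values as invalid and never raising the ceiling above the largest registered profile are assumptions.

```python
import logging
import os

log = logging.getLogger("rtmg")  # hypothetical logger name


def resolve_max_seconds(registered_profiles_s, env=None):
    """Return the effective profile ceiling in seconds.

    Hypothetical helper for illustration only; the real code may be
    structured differently.
    """
    env = os.environ if env is None else env
    ceiling = max(registered_profiles_s)  # largest registered profile

    raw = env.get("RTMG_TRT_PROFILE_CAP_S")
    if raw is None or raw.strip() == "":
        return ceiling  # unset => current behavior

    try:
        cap = float(raw)
    except ValueError:
        log.warning("ignoring non-numeric RTMG_TRT_PROFILE_CAP_S=%r", raw)
        return ceiling

    if cap <= 0:
        # Assumption: a zero or negative cap is treated as invalid.
        log.warning("ignoring non-positive RTMG_TRT_PROFILE_CAP_S=%r", raw)
        return ceiling

    if cap < ceiling:
        log.info(
            "RTMG_TRT_PROFILE_CAP_S=%gs active (profile ceiling was %gs)",
            cap,
            ceiling,
        )
        return cap

    # Assumption: a cap at or above the ceiling changes nothing.
    return ceiling
```

Because every downstream consumer reads the single resolved value, a clamp placed at this point needs no separate handling in the swap path.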
Summary
- Adds RTMG_TRT_PROFILE_CAP_S so a pod can refuse to load the 240 s decoder + vae_encode profile when its VRAM headroom doesn't support it.
- Audio longer than the cap gets clipped to the cap, on the same path as the existing largest-registered-profile ceiling (waveform = waveform[:, :int(max_seconds * SAMPLE_RATE)]).
- The clamp is applied where max_seconds is computed, so the swap-source path picks it up automatically (the existing new_wf[:, :int(max_seconds * SAMPLE_RATE)] clip in apply_swap_if_pending + the ensure_profile() call that follows both honor it).

Motivation
32 GB 5090 pods are OOM-ing on tiny (~20 MiB) allocations at session start. The steady-state VRAM baseline keeps growing (text encoder resident by default, LUFS buffers, dreamvae, rcfg/rescale state, server-side LoRA injection), and the 240 s decoder + 240 s vae_encode together push the 5090 to single-digit MiB of headroom. The next session's scratch tensor trips the allocator → Python crashes → supervisor respawns → next client sees WS 1011 ("Server restarted to clear memory").
Capping the profile to 120 s on 5090s drops VRAM use enough to give the allocator real headroom, at the cost of long uploads being trimmed to the cap. Better to lose 120 s of audio than to lose the session entirely.
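The trimming trade can be made concrete with a small sketch. It uses plain Python lists of per-channel samples as a stand-in for the waveform[:, :int(max_seconds * SAMPLE_RATE)] slice quoted in the summary. The SAMPLE_RATE value and the helper names `clip_to_cap` and `seconds_lost` are assumptions for illustration; the PR does not state the sample rate.

```python
SAMPLE_RATE = 48_000  # assumed value; not stated in the PR


def clip_to_cap(waveform, max_seconds, sample_rate=SAMPLE_RATE):
    """Trim every channel to at most max_seconds of audio.

    `waveform` is a list of channels, each a list of samples. This mirrors
    a [:, :n] slice on a (channels, samples) array.
    """
    max_samples = int(max_seconds * sample_rate)
    return [channel[:max_samples] for channel in waveform]


def seconds_lost(upload_seconds, max_seconds):
    """Seconds of audio dropped by the clip (zero if the upload fits)."""
    return max(0.0, upload_seconds - max_seconds)
```

With a 120 s cap, a 240 s upload keeps its first 120 s and `seconds_lost(240, 120)` is 120.0; an upload at or under the cap passes through unchanged.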
How to roll out
Set RTMG_TRT_PROFILE_CAP_S=120 in /workspace/demon.env on 5090 pods (and any other card class that needs the same trade). Unset → current behavior, ceiling = largest registered profile.

Test plan
- RTMG_TRT_PROFILE_CAP_S=120 → log shows RTMG_TRT_PROFILE_CAP_S=120s active (profile ceiling was 240s).
- RTMG_TRT_PROFILE_CAP_S=garbage → log shows the ignoring-non-numeric warning, behavior unchanged.

🤖 Generated with Claude Code