Question about cudagraph_runtime_mode=None #46
Unanswered
brandonmmusic-max asked this question in Q&A
Replies: 0 comments
I've been experimenting with trying to train a dflash model on qwen 3.5 397b running through vLLM. Training results were pretty good, but I get a very buggy 0% acceptance rate during actual inference. I've been debugging, and I think the CUDA graphs capture the dflash forward with 0 context tokens during warmup and then replay that frozen graph during real inference, so the actual context hidden states never reach the model. Has anyone encountered that during testing? I was thinking of disabling CUDA graphs just during the dflash propose call, but I'm out of my league on this one, lol. The results of your models look great, though, and thank you all for sharing!
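For context on the suspicion above: this is consistent with how CUDA graph capture/replay generally works. A replayed graph always reads from the exact buffers it was bound to at capture time, so new inputs only take effect if they are copied into those static buffers before replay; otherwise the graph keeps computing on the warmup-time data. A minimal, GPU-free Python sketch of that semantics (the `Graph` class and names here are illustrative stand-ins, not the vLLM or PyTorch API):

```python
# Sketch of CUDA-graph capture/replay semantics: a "graph" records ops
# bound to fixed buffers at capture time and replays them verbatim,
# ignoring any freshly allocated tensors. Illustrative only.

class Graph:
    def __init__(self):
        self.ops = []

    def capture(self, fn, *buffers):
        # Record the op bound to these exact buffer objects.
        self.ops.append((fn, buffers))
        fn(*buffers)

    def replay(self):
        # Re-run the recorded ops on the captured buffers only.
        for fn, buffers in self.ops:
            fn(*buffers)

static_input = [0.0]   # buffer identity is frozen at capture time
static_output = [0.0]

def forward(inp, out):
    out[0] = inp[0] * 2.0

g = Graph()
static_input[0] = 0.0                  # warmup with "0 context tokens"
g.capture(forward, static_input, static_output)

# Wrong: a fresh tensor never reaches the captured graph.
fresh_input = [21.0]
g.replay()
print(static_output[0])                # still 0.0

# Right: copy the real context into the static buffer before replay.
static_input[0] = fresh_input[0]
g.replay()
print(static_output[0])                # 42.0
```

If the dflash proposer's warmup captures a graph whose input buffers are never refilled with the real context hidden states, replay would reproduce the warmup forward exactly, which would show up as 0% acceptance.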