harness/nemotron: allow DiagGather + ScaledMaskedSoftmax on GPU op-only assert #2259

Merged
kali merged 3 commits into sonos:main from czoli1976:fix/nemotron-gpu-op-allowlist-roi
May 21, 2026

@czoli1976
Contributor

Summary

Unblocks the macOS / Nemotron speech streaming en 0.6b (and cuda-lovelace) Large-models job, which currently fails on main with:

ERROR tract] Model has 48 unexpected op(s):
   _N_selfAttn_matrixBdSlice0_dyn_slice_output (DiagGather)
   _N_selfAttn_softmax0.scaled_masked_softmax (ScaledMaskedSoftmax)

The encoder's ROI attention emits DiagGather + ScaledMaskedSoftmax. These have no Metal/Cuda implementation, so they fall back to CPU — but the nemotron --metal / --cuda --assert-op-only allowlists in ci.sh don't list them, so the strict op-coverage assert fails on the GPU runtimes. (The CPU -O pass of all four sub-models passes fine.)

harness/parakeet-tdt-600m-v3/ci.sh already allows DiagGather; this mirrors that for nemotron and additionally adds ScaledMaskedSoftmax (nemotron's chunk-window-mask encoder uses it; parakeet doesn't).
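
To illustrate the kind of strict op-coverage check described above, here is a minimal bash sketch. It is not tract's implementation of --assert-op-only: the function name check_ops_only, the comma-separated glob allowlist format, and the FakeGpu* op names are all assumptions made for the example. Only DiagGather and ScaledMaskedSoftmax come from this PR.

```bash
#!/usr/bin/env bash
# Hypothetical illustration only; not how tract implements --assert-op-only.
# $1: comma-separated allowlist of glob patterns (assumed format)
# $2: space-separated op names found in the model
# Returns 0 when every op matches a pattern; otherwise lists offenders, returns 1.
check_ops_only() {
    local allowlist="$1" ops="$2"
    local -a patterns
    local -a unexpected=()
    local op pattern ok

    # Split the allowlist on commas into an array of patterns.
    IFS=',' read -r -a patterns <<< "$allowlist"

    for op in $ops; do
        ok=0
        for pattern in "${patterns[@]}"; do
            # The unquoted right-hand side makes [[ == ]] perform glob matching.
            if [[ "$op" == $pattern ]]; then
                ok=1
                break
            fi
        done
        if [[ $ok -eq 0 ]]; then
            unexpected+=("$op")
        fi
    done

    if (( ${#unexpected[@]} > 0 )); then
        echo "Model has ${#unexpected[@]} unexpected op(s):"
        printf '   %s\n' "${unexpected[@]}"
        return 1
    fi
    return 0
}

# Ops that stay on CPU fail the check until they are allowlisted.
ops="FakeGpuMatMul DiagGather ScaledMaskedSoftmax"
check_ops_only "FakeGpu*,Const" "$ops" || echo "assert failed"
check_ops_only "FakeGpu*,Const,DiagGather,ScaledMaskedSoftmax" "$ops" && echo "assert passed"
```

The first call reports two unexpected ops and fails. The second passes because the two CPU-fallback ops are listed explicitly, which is the same shape of change this PR makes to the nemotron allowlists.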

If you'd rather these get Metal/Cuda impls than be CPU fallbacks, happy to drop this — but as-is it unblocks every PR branched off current main (the job is currently red on e.g. #2257 and task/onnx-dtype-panic, neither of which touch these ops).

Test plan

  • macOS / Nemotron speech streaming en 0.6b green
  • cuda-lovelace / Nemotron speech streaming en 0.6b green

🤖 Generated with Claude Code

The encoder's ROI attention emits DiagGather and ScaledMaskedSoftmax,
which have no Metal/Cuda impl and fall back to CPU. The nemotron GPU
--assert-op-only allowlists didn't list them, so the macOS/Metal (and
cuda) Large-models job fails with "Model has 48 unexpected op(s)".
Mirrors parakeet's ci.sh (already allows DiagGather); nemotron's
chunk-window-mask encoder additionally emits ScaledMaskedSoftmax.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kali
Collaborator

kali commented May 20, 2026

Sorry, took a shortcut. Thanks for the fix.

kali previously approved these changes May 20, 2026
@kali
Collaborator

kali commented May 20, 2026

There's a second one hidden behind. I'm on it.

@kali kali merged commit 203304e into sonos:main May 21, 2026
55 checks passed
@kali
Collaborator

kali commented May 21, 2026

Apologies.
