harness/nemotron: allow DiagGather + ScaledMaskedSoftmax on GPU op-only assert by czoli1976 · Pull Request #2259 · sonos/tract

czoli1976 · 2026-05-20T20:43:43Z

Summary

Unblocks the macOS / Nemotron speech streaming en 0.6b (and cuda-lovelace) Large-models job, which currently fails on main with:

ERROR tract] Model has 48 unexpected op(s):
   _N_selfAttn_matrixBdSlice0_dyn_slice_output (DiagGather)
   _N_selfAttn_softmax0.scaled_masked_softmax (ScaledMaskedSoftmax)

The encoder's ROI attention emits DiagGather + ScaledMaskedSoftmax. These have no Metal/Cuda implementation, so they fall back to CPU — but the nemotron --metal / --cuda --assert-op-only allowlists in ci.sh don't list them, so the strict op-coverage assert fails on the GPU runtimes. (The CPU -O pass of all four sub-models passes fine.)
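To illustrate why the assert trips, here is a minimal sketch of an op-only allowlist check. It assumes the check simply compares each node's op kind against a list of permitted kinds; this is not tract's actual implementation. The function name `unexpected_ops`, the `HypotheticalGpuOp` entry, and the abridged node list are all made up for the example; only the two node names and op kinds are taken from the error log above.

```python
def unexpected_ops(model_ops, allowed_ops):
    """Return the (node_name, op_kind) pairs whose op kind is not allowlisted."""
    allowed = set(allowed_ops)
    return [(name, kind) for name, kind in model_ops if kind not in allowed]


# Hypothetical, heavily abridged node list. The first two entries come from
# the error log; the third stands in for ops that do run on the GPU.
model_ops = [
    ("_N_selfAttn_matrixBdSlice0_dyn_slice_output", "DiagGather"),
    ("_N_selfAttn_softmax0.scaled_masked_softmax", "ScaledMaskedSoftmax"),
    ("hypothetical_gpu_node", "HypotheticalGpuOp"),
]

# Allowlist before the fix: the two CPU-fallback ops are missing.
allowlist_before = ["HypotheticalGpuOp"]
# Allowlist after the fix: both CPU-fallback ops are explicitly permitted.
allowlist_after = allowlist_before + ["DiagGather", "ScaledMaskedSoftmax"]

failing = unexpected_ops(model_ops, allowlist_before)
passing = unexpected_ops(model_ops, allowlist_after)

if failing:
    print(f"Model has {len(failing)} unexpected op(s):")
    for name, kind in failing:
        print(f"   {name} ({kind})")
```

Under this assumption, an op that falls back to CPU is not an error in itself; it only fails the strict assert when the allowlist does not name it, which is the gap this PR closes.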

harness/parakeet-tdt-600m-v3/ci.sh already allows DiagGather; this mirrors that for nemotron and additionally adds ScaledMaskedSoftmax (nemotron's chunk-window-mask encoder uses it; parakeet doesn't).

If you'd rather these get Metal/Cuda impls than be CPU fallbacks, happy to drop this. As-is, though, it unblocks every PR branched off current main (the job is currently red on e.g. #2257 and task/onnx-dtype-panic, neither of which touches these ops).

Test plan

macOS / Nemotron speech streaming en 0.6b green
cuda-lovelace / Nemotron speech streaming en 0.6b green

🤖 Generated with Claude Code

The encoder's ROI attention emits DiagGather and ScaledMaskedSoftmax, which have no Metal/Cuda impl and fall back to CPU. The nemotron GPU --assert-op-only allowlists didn't list them, so the macOS/Metal (and cuda) Large-models job fails with "Model has 48 unexpected op(s)". Mirrors parakeet's ci.sh (already allows DiagGather); nemotron's chunk-window-mask encoder additionally emits ScaledMaskedSoftmax.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kali · 2026-05-20T20:47:19Z

Sorry, took a shortcut. Thanks for the fix.

kali · 2026-05-20T20:58:46Z

There's a second one hidden behind. I'm on it.

kali · 2026-05-21T07:34:42Z

Apologies.

czoli1976 mentioned this pull request May 20, 2026

core/ops/scan: reuse body state across iterations (skip per-timestep plan churn) #2257

Draft

4 tasks

kali previously approved these changes May 20, 2026

fix pulsification prereq on dry run

f05539c

kali dismissed their stale review via f05539c May 20, 2026 21:18

pulse meets large models in ci !

f50a4cc

kali merged commit 203304e into sonos:main May 21, 2026
55 checks passed

kali merged 3 commits into sonos:main from czoli1976:fix/nemotron-gpu-op-allowlist-roi

2 participants
