onnx: add com.microsoft MultiHeadAttention handler by czoli1976 · Pull Request #2291 · sonos/tract

czoli1976 · 2026-05-26T12:24:57Z

MultiHeadAttention: standard (bidirectional) multi-head attention over unpacked query/key/value, lowered onto tract's Sdpa, with optional present_key/present_value outputs. Bias, attention/padding masks, packed QKV and past KV cache are rejected with clear errors. Validated bit-close vs onnxruntime (output ~1e-7, present-KV bit-exact) across self-/cross-attention, num_heads variations, and the f16 compute path; no node-suite regression; clippy+fmt clean. Part of com.microsoft contrib-op coverage for ORT-exported LLMs.
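For readers unfamiliar with the operator, the computation described above can be sketched as a naive reference in plain Rust. This is not the handler from this PR and does not use tract's Sdpa. The function name `multi_head_attention`, the row-major `[batch, seq, hidden]` layout, and the `1/sqrt(head_dim)` scale are assumptions made for illustration. The present_key/present_value outputs are left out.

```rust
/// Naive reference for bidirectional (unmasked) multi-head attention over
/// unpacked Q/K/V. Assumed layouts for this sketch (row-major):
///   q:       [batch, seq_q,  hidden]
///   k and v: [batch, seq_kv, hidden]
/// with hidden = num_heads * head_dim and scale = 1 / sqrt(head_dim).
pub fn multi_head_attention(
    q: &[f32],
    k: &[f32],
    v: &[f32],
    batch: usize,
    seq_q: usize,
    seq_kv: usize,
    hidden: usize,
    num_heads: usize,
) -> Result<Vec<f32>, String> {
    // Reject inconsistent configurations with an explicit message.
    if num_heads == 0 || hidden % num_heads != 0 {
        return Err(format!(
            "hidden size {} is not divisible by num_heads {}",
            hidden, num_heads
        ));
    }
    if q.len() != batch * seq_q * hidden
        || k.len() != batch * seq_kv * hidden
        || v.len() != k.len()
    {
        return Err("input length does not match the declared shapes".to_string());
    }

    let head_dim = hidden / num_heads;
    let scale = 1.0 / (head_dim as f32).sqrt();
    let mut out = vec![0.0f32; batch * seq_q * hidden];
    let mut scores = vec![0.0f32; seq_kv];

    for b in 0..batch {
        for h in 0..num_heads {
            let off = h * head_dim; // column offset of this head's slice
            for i in 0..seq_q {
                let q_row = &q[(b * seq_q + i) * hidden + off..][..head_dim];

                // Scaled dot-product scores against every key position.
                for j in 0..seq_kv {
                    let k_row = &k[(b * seq_kv + j) * hidden + off..][..head_dim];
                    let dot: f32 = q_row.iter().zip(k_row).map(|(x, y)| x * y).sum();
                    scores[j] = scale * dot;
                }

                // Numerically stable softmax over the key axis (no mask applied).
                let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
                let mut denom = 0.0f32;
                for s in scores.iter_mut() {
                    *s = (*s - max).exp();
                    denom += *s;
                }

                // Weighted sum of the value rows for this head.
                let o_row = &mut out[(b * seq_q + i) * hidden + off..][..head_dim];
                for j in 0..seq_kv {
                    let w = scores[j] / denom;
                    let v_row = &v[(b * seq_kv + j) * hidden + off..][..head_dim];
                    for (o, x) in o_row.iter_mut().zip(v_row) {
                        *o += w * x;
                    }
                }
            }
        }
    }
    Ok(out)
}
```

Because `seq_q` and `seq_kv` are independent parameters, the same routine covers both self-attention and cross-attention. A quick sanity check: when every key is identical, the softmax weights are uniform, so each head's output is the mean of its value rows.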

Standard (bidirectional) multi-head attention over unpacked query/key/value, lowered onto tract Sdpa, with optional present_key/present_value outputs. Bias, attention/padding masks, packed QKV and past KV cache are rejected with clear errors. Validated bit-close against onnxruntime (output + present_key/value).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

czoli1976 force-pushed the feature/onnx-mha branch from c975883 to 8636e5e on May 26, 2026, 12:35

czoli1976 mentioned this pull request May 26, 2026, in "onnx: add com.microsoft GroupQueryAttention handler" #2292 (Open)

kali approved these changes May 27, 2026

kali merged commit 7b2ea86 into sonos:main May 27, 2026
55 checks passed

kali merged 1 commit into sonos:main from czoli1976:feature/onnx-mha

2 participants