onnx: add com.microsoft MultiHeadAttention handler #2291

Merged
kali merged 1 commit into sonos:main from czoli1976:feature/onnx-mha
May 27, 2026

Conversation

@czoli1976
Contributor

Adds a handler for the com.microsoft MultiHeadAttention contrib op: standard (bidirectional) multi-head attention over unpacked query/key/value, lowered onto tract's Sdpa, with optional present_key/present_value outputs. Bias, attention/padding masks, packed QKV, and past KV cache are rejected with clear errors.

Validated against onnxruntime: the attention output matches to about 1e-7 and present_key/present_value are bit-exact, across self- and cross-attention, several num_heads values, and the f16 compute path. No regressions in the node test suite; clippy and fmt are clean. Part of the com.microsoft contrib-op coverage for ORT-exported LLMs.
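For readers unfamiliar with the op, here is a minimal reference sketch of the computation described above: unmasked multi-head attention over unpacked query/key/value, returning the per-head key and value as present_key/present_value. It is not the handler code from this PR and does not use tract's API. The function names `split_heads` and `mha` are hypothetical, and it assumes a single batch item, f32 buffers in row-major layout, value hidden size equal to query hidden size, and the usual default scale of 1/sqrt(head_size).

```rust
/// Rearrange a row-major [seq, num_heads * head_size] matrix into
/// [num_heads, seq, head_size]. (Hypothetical helper, for illustration only.)
fn split_heads(x: &[f32], seq: usize, num_heads: usize, head_size: usize) -> Vec<f32> {
    let hidden = num_heads * head_size;
    assert_eq!(x.len(), seq * hidden);
    let mut out = vec![0.0f32; x.len()];
    for s in 0..seq {
        for h in 0..num_heads {
            for d in 0..head_size {
                out[(h * seq + s) * head_size + d] = x[s * hidden + h * head_size + d];
            }
        }
    }
    out
}

/// Unmasked (bidirectional) multi-head attention for one batch item.
/// Inputs: q is [q_seq, hidden], k and v are [kv_seq, hidden].
/// Returns (output [q_seq, hidden], present_key, present_value), where the
/// present tensors are [num_heads, kv_seq, head_size].
/// Assumption: scale = 1 / sqrt(head_size); no bias, no mask, no past KV.
fn mha(
    q: &[f32],
    k: &[f32],
    v: &[f32],
    q_seq: usize,
    kv_seq: usize,
    num_heads: usize,
    head_size: usize,
) -> (Vec<f32>, Vec<f32>, Vec<f32>) {
    let hidden = num_heads * head_size;
    let qh = split_heads(q, q_seq, num_heads, head_size);
    let kh = split_heads(k, kv_seq, num_heads, head_size);
    let vh = split_heads(v, kv_seq, num_heads, head_size);
    let scale = 1.0 / (head_size as f32).sqrt();
    let mut out = vec![0.0f32; q_seq * hidden];

    for h in 0..num_heads {
        for i in 0..q_seq {
            let q_row = &qh[(h * q_seq + i) * head_size..][..head_size];
            // Scaled dot product of this query row with every key row.
            let mut scores: Vec<f32> = (0..kv_seq)
                .map(|j| {
                    let k_row = &kh[(h * kv_seq + j) * head_size..][..head_size];
                    scale * q_row.iter().zip(k_row).map(|(a, b)| a * b).sum::<f32>()
                })
                .collect();
            // Numerically stable softmax over the key axis.
            let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let mut denom = 0.0;
            for s in scores.iter_mut() {
                *s = (*s - max).exp();
                denom += *s;
            }
            // Weighted sum of value rows, written back in [q_seq, hidden] layout.
            for j in 0..kv_seq {
                let w = scores[j] / denom;
                let v_row = &vh[(h * kv_seq + j) * head_size..][..head_size];
                for d in 0..head_size {
                    out[i * hidden + h * head_size + d] += w * v_row[d];
                }
            }
        }
    }
    // With no past KV cache, present_key/present_value are just the
    // head-split key and value.
    (out, kh, vh)
}
```

In this sketch the present tensors involve only a layout change and no arithmetic, which is one plausible reason a bit-exact match on present_key/present_value is achievable while the attention output agrees only to floating-point tolerance.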

Standard (bidirectional) multi-head attention over unpacked query/key/value,
lowered onto tract Sdpa, with optional present_key/present_value outputs.
Bias, attention/padding masks, packed QKV and past KV cache are rejected
with clear errors.

Validated bit-close against onnxruntime (output + present_key/value).
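
The two comparison criteria used in the validation (tolerance-based for the output, bit-exact for present_key/value) can be sketched as below. This is an illustrative sketch, not the test harness from this PR; `max_abs_diff` and `bit_exact` are hypothetical helper names, and the buffers are assumed to be flat f32 slices.

```rust
/// Largest absolute element-wise difference between two equally sized buffers.
/// Note: f32::max ignores NaN operands, so NaN differences are not reported here.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

/// True when both buffers have the same length and identical bit patterns.
/// Stricter than `==` on floats: 0.0 and -0.0 compare as different.
fn bit_exact(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```

A check in this style would assert something like `max_abs_diff(out, reference) < 1e-6` for the attention output and `bit_exact(present_key, reference_key)` for the present tensors; the threshold shown is an assumption, not the one used in the PR.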

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@kali merged commit 7b2ea86 into sonos:main on May 27, 2026
55 checks passed

2 participants