onnx: add com.microsoft MatMulNBits (4-bit weight quantized matmul) by czoli1976 · Pull Request #2290 · sonos/tract

czoli1976 · 2026-05-26T12:24:48Z

Adds MatMulNBits, the 4-bit blocked weight-quantized matmul (the dominant quantization in edge-LLM exports). The op dequantizes the constant quantized weight (B, scales, optional packed zero_points; default zero point 8) to a float [N, K] weight and emits a plain matmul (EinSum) against the activation, with an optional bias. A fused int4 kernel would be a natural follow-up perf optimization. Validated bit-exact against onnxruntime across block_size 16/32/128, symmetric and asymmetric quantization, and 2D/3D activations. No node-suite regressions; clippy and fmt are clean. Part of the com.microsoft contrib-op coverage for ORT-exported LLMs.

Dequantizes the constant 4-bit blocked weight (B, scales, optional packed zero_points; default zero point 8) to a float [N, K] weight in Rust and emits a plain matmul (EinSum) against the activation, with an optional bias. A fused int4 kernel would be a follow-up perf optimization. Validated bit-exact against onnxruntime MatMulNBits for both symmetric (no zero_points) and asymmetric quantization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
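For readers unfamiliar with the op, the dequantize-then-matmul approach described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function names `dequantize_4bit_blocked` and `matmul_nt` are hypothetical, and the packing layout (B stored as [N, blocks per row, block_size / 2] bytes, even-indexed elements in the low nibble, zero points packed the same way per row) is an assumption based on the ONNX Runtime contrib-op description. Bias handling is omitted.

```rust
/// Dequantize a 4-bit blocked weight to a row-major [n, k] f32 matrix.
/// Assumed layout (not taken from the PR):
///   packed:      [n, blocks_per_row, block_size / 2] bytes, low nibble first
///   scales:      [n, blocks_per_row]
///   zero_points: [n, ceil(blocks_per_row / 2)] bytes, low nibble first
/// When zero_points is None, the default zero point 8 is used.
fn dequantize_4bit_blocked(
    packed: &[u8],
    scales: &[f32],
    zero_points: Option<&[u8]>,
    n: usize,
    k: usize,
    block_size: usize,
) -> Vec<f32> {
    let blocks_per_row = (k + block_size - 1) / block_size;
    let blob_size = block_size / 2; // two 4-bit values per byte
    let zp_row_bytes = (blocks_per_row + 1) / 2;
    let mut out = vec![0.0f32; n * k];
    for row in 0..n {
        for col in 0..k {
            let block = col / block_size;
            let in_block = col % block_size;
            let byte = packed[(row * blocks_per_row + block) * blob_size + in_block / 2];
            let q = if in_block % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            let zp = match zero_points {
                Some(z) => {
                    let b = z[row * zp_row_bytes + block / 2];
                    if block % 2 == 0 { b & 0x0F } else { b >> 4 }
                }
                None => 8, // default zero point for symmetric quantization
            };
            let scale = scales[row * blocks_per_row + block];
            out[row * k + col] = (q as f32 - zp as f32) * scale;
        }
    }
    out
}

/// Plain matmul of a row-major activation a[m, k] against the dequantized
/// weight w[n, k], i.e. y[m, n] = sum over k of a[m, k] * w[n, k].
fn matmul_nt(a: &[f32], w: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut y = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * w[j * k + p];
            }
            y[i * n + j] = acc;
        }
    }
    y
}
```

Because the weight is a constant, the dequantization cost in such a scheme is paid once at model load rather than per inference; the trade-off is that the float weight occupies several times the memory of the packed 4-bit form, which is what a fused int4 kernel would avoid.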

czoli1976 force-pushed the feature/onnx-matmulnbits branch from 2b3d915 to 1bb5e6f Compare May 26, 2026 12:35

czoli1976 wants to merge 1 commit into sonos:main from czoli1976:feature/onnx-matmulnbits
