
onnx: add com.microsoft MatMulNBits (4-bit weight quantized matmul) #2290

Open
czoli1976 wants to merge 1 commit into sonos:main from czoli1976:feature/onnx-matmulnbits


@czoli1976
Contributor

Adds the com.microsoft MatMulNBits contrib op: a 4-bit, block-wise weight-quantized matmul (the dominant quantization in edge-LLM exports).

The op dequantizes the constant quantized weight (B, scales, and optional packed zero_points; the zero point defaults to 8) to a float [N, K] weight and emits a plain matmul (EinSum) against the activation, with an optional bias. A fused int4 kernel would be a natural follow-up performance optimization.

Validated bit-exact against onnxruntime across block_size 16/32/128, symmetric and asymmetric quantization, and 2D/3D activations. No node-suite regressions; clippy and fmt are clean.

Part of com.microsoft contrib-op coverage for ORT-exported LLMs.
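For readers who want to see the dequantize-then-matmul path in concrete terms, here is a minimal standalone sketch. It is not the code in this PR: the function names `dequantize_4bit` and `matmul_nk` are hypothetical, and the byte layout is an assumption (two 4-bit values per byte with the low nibble first, B stored as N x n_blocks x block_size/2 bytes, one scale per block, and zero points packed two per byte with each row padded to a whole byte). Each weight is recovered as `(q - zero_point) * scale`.

```rust
/// Dequantize a 4-bit blocked weight into a row-major float [N, K] matrix.
/// ASSUMED layout (illustrative, not taken from the PR diff):
/// - `b`: N * n_blocks * (block_size / 2) bytes, low nibble holds the even element
/// - `scales`: N * n_blocks values, one per block
/// - `zero_points`: optional, two per byte, each row padded to ceil(n_blocks / 2) bytes
fn dequantize_4bit(
    b: &[u8],
    scales: &[f32],
    zero_points: Option<&[u8]>,
    n: usize,
    k: usize,
    block_size: usize,
) -> Vec<f32> {
    let n_blocks = (k + block_size - 1) / block_size;
    let blob_size = block_size / 2; // bytes per block at 4 bits per value
    let zp_row_bytes = (n_blocks + 1) / 2;
    let mut w = vec![0f32; n * k];
    for row in 0..n {
        for blk in 0..n_blocks {
            let scale = scales[row * n_blocks + blk];
            // Default zero point is 8 when no zero_points input is given.
            let zp: u8 = match zero_points {
                Some(zps) => {
                    let byte = zps[row * zp_row_bytes + blk / 2];
                    if blk % 2 == 0 { byte & 0x0f } else { byte >> 4 }
                }
                None => 8,
            };
            let blob = &b[(row * n_blocks + blk) * blob_size..][..blob_size];
            for i in 0..block_size {
                let col = blk * block_size + i;
                if col >= k {
                    break; // the last block may be padded past K
                }
                let byte = blob[i / 2];
                let q = if i % 2 == 0 { byte & 0x0f } else { byte >> 4 };
                w[row * k + col] = (q as f32 - zp as f32) * scale;
            }
        }
    }
    w
}

/// Plain matmul against the [N, K] weight: y[m, n] = sum_k a[m, k] * w[n, k] (+ bias[n]).
fn matmul_nk(a: &[f32], w: &[f32], bias: Option<&[f32]>, m: usize, n: usize, k: usize) -> Vec<f32> {
    let mut y = vec![0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0f32;
            for p in 0..k {
                acc += a[i * k + p] * w[j * k + p];
            }
            y[i * n + j] = acc + bias.map_or(0.0, |b| b[j]);
        }
    }
    y
}
```

In this sketch a 3D activation would be handled by flattening its leading dimensions into `m`. Materializing the full float weight once keeps the downstream matmul generic, at the cost of the memory a fused int4 kernel would save.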

Dequantizes the constant 4-bit blocked weight (B, scales, optional packed
zero_points; default zero point 8) to a float [N, K] weight in Rust and
emits a plain matmul (EinSum) against the activation, with an optional
bias. A fused int4 kernel would be a follow-up perf optimization.

Validated bit-exact against onnxruntime MatMulNBits for both symmetric
(no zero_points) and asymmetric quantization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@czoli1976 force-pushed the feature/onnx-matmulnbits branch from 2b3d915 to 1bb5e6f on May 26, 2026 12:35
