W8A8/W4A8 inference on Apple Silicon — unlocking unused INT8 TensorOps in M5 for 1.2–1.9× faster LLM prefill, built as MLX custom primitives.
Updated May 3, 2026 - Python