Fast LLM speculative inference server for consumer hardware.
Updated Jun 8, 2026 - C++
Air.rs: 70B+ parameter LLM inference on a consumer GPU, written in Rust.
A light, transparent, and modular inference & quantization engine for studying LLMs.
An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it to outperform cuBLAS at batch-1 LLM decode.