Model Compatibility

TurboQuant compresses standard transformer KV caches.

Model Family	Status	Notes
Llama-3 / 3.1 / 3.2	Full support	GQA-aware mode recommended
Mistral / Mixtral	Full support	Sliding window auto-detected
Gemma / Gemma 2	Full support
Qwen2.5 / Qwen3	Full support
Phi-3 / Phi-4	Full support
Command-R	Full support
DeepSeek-V2/V3	Skip MLA layers	KV already compressed by MLA
Qwen3.5 / Jamba	Attention layers only	Non-attention layers skipped
T5 / BART / mBART	Partial	Self-attention KV only
Mamba / RWKV	Not applicable	No KV cache (SSM/RNN)

Auto-Detection

turboquant.wrap() and TurboQuantDynamicCache.from_model() automatically:

For unsupported or exotic architectures:

from turboquant import TurboQuantKVCache

cache = TurboQuantKVCache(
    head_dim=128,     # set manually
    bit_width=3,
    residual_length=0,
)