This repository provides a reliable, patch-based wrapper to inject GFX906 (Radeon VII / MI50) Turbo optimizations and TurboQuant KV Cache Compression into a known-stable version of the upstream llama.cpp repository.
By using a patch approach, you get the latest features (like MTP, Medusa, and Eagle speculative decoding) without having to manually maintain a heavily modified fork.
- GFX906 Wave64 Kernels: Highly optimized warp-cooperative kernels tailored specifically for the Radeon VII / MI50 hardware architecture, drastically improving Prompt Processing and Token Generation speed.
- TurboQuant: Support for 2-bit, 3-bit (`turbo3`), and 4-bit KV cache compression to save up to 78% of Context VRAM with minimal quality loss.
- Shadow Cache: A persistent FP16 shadow cache workaround to resolve the known ROCm 6.0+ instability issues on GFX906 during FlashAttention dequantization.
- FWHT Rotation: A fast $O(d \log d)$ Walsh-Hadamard Transform kernel (`GGML_OP_TURBO_WHT`) to rotate the KV cache into a compression-friendly space.
- HIP Graphs: Fully integrated and enabled via `-DGGML_HIP_GRAPHS=ON` to reduce CPU overhead during decoding.
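
To make the $O(d \log d)$ claim for the FWHT rotation concrete, here is a minimal CPU-side sketch of the butterfly structure behind a fast Walsh-Hadamard transform. It is not the `GGML_OP_TURBO_WHT` kernel itself: the function name `fwht` is hypothetical, it works on integers only, and it skips the normalization and any sign or permutation steps the real kernel may apply.

```bash
#!/usr/bin/env bash
# Illustrative sketch only (hypothetical helper, not part of the patch).
# Unnormalized in-place fast Walsh-Hadamard transform on integers.
# The outer loop runs log2(d) times and each pass does d/2 butterflies,
# which is where the O(d log d) cost comes from.
fwht() {
  local -a a=("$@")
  local n=${#a[@]} h=1 i j x y
  # The length must be a non-zero power of two.
  if (( n == 0 || (n & (n - 1)) != 0 )); then
    echo "fwht: length must be a power of two" >&2
    return 1
  fi
  while (( h < n )); do
    for (( i = 0; i < n; i += 2 * h )); do
      for (( j = i; j < i + h; j++ )); do
        x=${a[j]}
        y=${a[j + h]}
        a[j]=$(( x + y ))      # butterfly: sum
        a[j + h]=$(( x - y ))  # butterfly: difference
      done
    done
    (( h *= 2 ))
  done
  echo "${a[@]}"
}

fwht 1 0 1 0 0 1 1 0   # prints: 4 2 0 -2 0 2 0 2
```

Applying the unnormalized transform twice returns the input scaled by `d`, so the rotation is cheap to invert, which is one reason this family of transforms suits a cache that has to be read back on every decode step.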
We provide a simple Bash script that automatically clones the upstream repository, checks out the exact commit this patch was built for, and applies the optimizations.

    chmod +x apply-turbo.sh
    ./apply-turbo.sh

If successful, you will see a new directory called `llama.cpp-gfx906-turbo`.
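
For readers who want to see what the script does before running it, the clone, checkout, and apply steps can be sketched roughly as follows. This is an assumption about the script's structure, not a copy of `apply-turbo.sh`: the function name `apply_patch_at_commit` is hypothetical, and the upstream URL in the usage comment is assumed rather than taken from this repository.

```bash
#!/usr/bin/env bash
# Sketch only: the real apply-turbo.sh may be organized differently.
# Usage: apply_patch_at_commit <repo-url-or-path> <commit> <patch-file> <dest-dir>
apply_patch_at_commit() {
  local repo="$1" commit="$2" patch="$3" dest="$4"
  # Resolve the patch to an absolute path, because git runs inside $dest below.
  patch="$(cd "$(dirname "$patch")" && pwd)/$(basename "$patch")" || return 1
  git clone --quiet "$repo" "$dest" &&
    git -C "$dest" checkout --quiet "$commit" &&
    # --reject applies the hunks that fit and writes *.rej files for the rest.
    git -C "$dest" apply --reject "$patch"
}

# Example invocation (the URL is an assumption, not taken from this README):
# apply_patch_at_commit https://github.com/ggml-org/llama.cpp \
#   acd604fb277044e07c2bff01f4c169167b45f478 \
#   turbo-gfx906-mtp.patch llama.cpp-gfx906-turbo
```

Pinning the checkout to one commit is what keeps the patch applying cleanly: every hunk was generated against that exact tree.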
Move into the new directory and build using CMake. The script will output these exact commands:

    cd llama.cpp-gfx906-turbo
    mkdir build && cd build
    cmake .. -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx906 -DGGML_HIP_GRAPHS=ON
    make -j llama-cli

Once the build finishes, run the binary, for example:

    ./bin/llama-cli -m your_model.gguf --mtp 1 --ctk turbo3 --ctv turbo3 -fa on ...

This patch is tied to a specific upstream commit (`acd604fb277044e07c2bff01f4c169167b45f478`).
If you want to update to a newer upstream commit in the future:
- Change the `STABLE_COMMIT` variable in `apply-turbo.sh`.
- Run the script.
- If the script fails (because upstream code changed significantly), Git will generate `*.rej` files indicating which parts of the patch failed.
- Manually fix the `.rej` conflicts in the `llama.cpp-gfx906-turbo` directory.
- Create a new patch using `git diff > turbo-gfx906-mtp.patch` and overwrite the old one.
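
Before regenerating the patch, it can help to confirm that no reject files are left behind. The helper below is a small sketch of that check; `check_rejects` is a hypothetical name and is not part of `apply-turbo.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report any unresolved *.rej files under a directory.
# Returns 0 when none remain, 1 otherwise.
check_rejects() {
  local dir="${1:-.}" rejects
  rejects="$(find "$dir" -type f -name '*.rej')"
  if [ -n "$rejects" ]; then
    printf 'Unresolved rejects:\n%s\n' "$rejects" >&2
    return 1
  fi
  return 0
}

# Example (run from the directory that contains llama.cpp-gfx906-turbo):
# check_rejects llama.cpp-gfx906-turbo &&
#   (cd llama.cpp-gfx906-turbo && git diff > ../turbo-gfx906-mtp.patch)
```

Note that plain `git diff` only covers tracked files, so any files the patch creates would need to be staged or marked intent-to-add before they show up in the regenerated patch.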