Releases · Maxritz/ollama-ROCM

This build of Ollama is tuned for the AMD RDNA4 architecture, specifically the Radeon RX 9070 XT (gfx1201). It includes 20 optimizations, among them Paged KV Cache, Split-K Matmul, MoE Top-K routing, RoPE Cache, and TurboQuant.

Benchmarks (8B Q8_0 model on RX 9070 XT)

Prefill Rate: ~3463 tokens/sec
Generate Rate: ~78 tokens/sec
Time to First Token (TTFT): ~130 ms

Full Changelog: https://github.com/Maxritz/ollama-ROCM/commits/v0.1

Complete optimization suite for AMD Radeon RX 9070 XT (gfx1201) targeting ROCm 7.1.

HIP Backend Extensions (ggml_hip_ext.cu)

Paged KV cache with LRU eviction and device memory pools
MoE top-k routing kernel (fused softmax + selection)
Split-K matrix multiply with rocBLAS auto-tuning
RoPE/YaRN cache (32K precomputed cos/sin in fp16)
Q8_0 quantized KV cache + fused attention kernel
Persistent batch buffers (zero-allocation decode)
Speculative N-gram decoding predictor
Async H2D upload ring buffer with HIP events
Q4_K fused dequantization kernel
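
To illustrate the first item in the list above, here is a minimal host-side sketch of how a paged KV cache with LRU eviction can map logical token blocks to a fixed pool of physical pages. It is a toy model written in Python, not the code in ggml_hip_ext.cu; the class name `PagedKVCache`, the method `page_for`, and the page size `PAGE_TOKENS` are all hypothetical, and a plain free list stands in for the device memory pool.

```python
from collections import OrderedDict

PAGE_TOKENS = 16  # hypothetical page size in tokens (assumption, not from the release)


class PagedKVCache:
    """Toy host-side model of a paged KV cache with LRU eviction."""

    def __init__(self, num_pages):
        # The free list stands in for a preallocated device memory pool.
        self.free_pages = list(range(num_pages))
        # (seq_id, logical_block) -> physical page, ordered oldest-first.
        self.table = OrderedDict()

    def page_for(self, seq_id, token_pos):
        """Return the physical page for token_pos, allocating or evicting as needed."""
        key = (seq_id, token_pos // PAGE_TOKENS)
        if key in self.table:
            self.table.move_to_end(key)  # mark as most recently used
            return self.table[key]
        if self.free_pages:
            page = self.free_pages.pop()
        else:
            # Pool exhausted: evict the least recently used mapping, reuse its page.
            _, page = self.table.popitem(last=False)
        self.table[key] = page
        return page


cache = PagedKVCache(num_pages=2)
a = cache.page_for(seq_id=0, token_pos=0)    # block 0 of sequence 0
b = cache.page_for(seq_id=0, token_pos=16)   # block 1 of sequence 0
cache.page_for(seq_id=0, token_pos=3)        # touch block 0 again
c = cache.page_for(seq_id=1, token_pos=0)    # pool is full: evicts block 1 of sequence 0
```

In this run the third lookup refreshes block 0, so the new sequence takes over the page that held block 1. A real implementation would also have to handle pages that are still in use by an in-flight kernel, which this sketch ignores.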

TurboQuant KV (ggml_turboquant.cu)

Walsh-Hadamard Transform (WHT) decorrelation
Lloyd-Max optimal quantization (2/3/4-bit)
QJL (Quantized Johnson-Lindenstrauss) projection for K-cache
Fused WHT+quantize and dequant+IWHT kernels
Formats: TBQ2_0, TBQ3_0/1/2, TBQ4_0/1/2, TBQP3_0/1/2
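
The transform-then-quantize idea behind the items above can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the kernels in ggml_turboquant.cu: it uses a plain symmetric uniform quantizer in place of Lloyd-Max, omits the QJL projection entirely, and the function names `fwht`, `quantize_uniform`, and `dequantize_uniform` are hypothetical. It shows how an orthonormal Walsh-Hadamard transform spreads a single outlier across all entries before quantization, and how applying the same transform again inverts it.

```python
import math


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly stage: combine pairs that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = out[i], out[i + h]
                out[i], out[i + h] = x + y, x - y
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def quantize_uniform(values, bits):
    """Symmetric uniform quantizer (a stand-in for Lloyd-Max, not equivalent to it)."""
    levels = (1 << (bits - 1)) - 1            # e.g. 7 for 4-bit codes
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / levels
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize_uniform(codes, scale):
    """Map integer codes back to floats."""
    return [c * scale for c in codes]


x = [4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # one outlier channel
spread = fwht(x)                                   # energy spread evenly over 8 entries
codes, scale = quantize_uniform(spread, bits=4)
restored = fwht(dequantize_uniform(codes, scale))  # orthonormal WHT is its own inverse
```

Because the transform is orthonormal and self-inverse, the dequantize path needs no separate inverse routine. The fused kernels listed above presumably combine these steps into a single pass, which this sketch does not attempt to model.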

rocWMMA Fixes

WMMA warp mask corrections for gfx1201 Wave32 mode
fattn-mma-f16 cooperative matrix tile alignment fixes

Tested on: AMD RX 9070 XT, ROCm 7.1, Windows 11, LLVM 23.0
Build: cmake -DAMDGPU_TARGETS=gfx1201 -DGGML_HIP_ROCWMMA_FATTN=ON -DGGML_HIP_TURBOQUANT=ON

Ollama — RDNA4 gfx1201 + DARS v2.0 Fork

OLLAMA ROCM rdna4-gfx1201 v0.1
