fix(ggml-cuda): skip sm_120→sm_120a for consumer Blackwell (no FP4 MMA) by easel · Pull Request #3 · Luce-Org/llama.cpp-dflash-ggml

easel · 2026-04-27T19:56:28Z

Problem

Consumer Blackwell GPUs (RTX 5090, SM 12.0) lack FP4 tensor core hardware.
ggml-cuda/CMakeLists.txt unconditionally replaces sm_12X with sm_12Xa
and compiles mmq-instance-mxfp4/nvfp4 with BLACKWELL_MMA_AVAILABLE, which
emits .block_scale / mxf4 PTX instructions that fault with
CUDA_ERROR_ILLEGAL_INSTRUCTION on consumer hardware at runtime.

Changes

ggml/src/ggml-cuda/CMakeLists.txt

- Add a GGML_CUDA_BLACKWELL_CONSUMER CMake option (default OFF)
- When ON, skip the 12X → 12Xa arch replacement and exclude mmq-instance-mxfp4.cu / mmq-instance-nvfp4.cu from the build
- Add a GGML_CUDA_BLACKWELL_CONSUMER compile definition so mmq.cu can gate dispatch
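To make the ON/OFF behaviour of the arch rewrite concrete, here is a hypothetical C++ model of it. This is not the CMakeLists.txt code: map_cuda_archs and its inputs are illustrative names, and keeping a -real/-virtual suffix intact is an assumption about how the upstream rewrite treats such entries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the 12X -> 12Xa rewrite in ggml-cuda/CMakeLists.txt.
// Entries follow CMAKE_CUDA_ARCHITECTURES syntax, e.g. "86", "120", "120-real".
std::vector<std::string> map_cuda_archs(const std::vector<std::string>& archs,
                                        bool blackwell_consumer) {
    std::vector<std::string> out;
    out.reserve(archs.size());
    for (const std::string& arch : archs) {
        // Split off an optional "-real"/"-virtual" suffix (assumed to be preserved).
        const std::size_t dash = arch.find('-');
        std::string num = arch.substr(0, dash);
        const std::string suffix =
            (dash == std::string::npos) ? std::string() : arch.substr(dash);

        // Plain 12X: exactly three digits starting with "12", no existing 'a'.
        const bool plain_12x = num.size() == 3 && num.compare(0, 2, "12") == 0 &&
                               num[2] >= '0' && num[2] <= '9';

        // Default build: promote to the arch-specific target (sm_12Xa).
        // GGML_CUDA_BLACKWELL_CONSUMER=ON: leave plain sm_12X untouched.
        if (plain_12x && !blackwell_consumer) {
            num += 'a';
        }
        out.push_back(num + suffix);
    }
    return out;
}
```

With the option OFF, {"86", "120"} becomes {"86", "120a"}; with it ON, the list passes through unchanged, so ptxas targets plain sm_120.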

ggml/src/ggml-cuda/mmq.cu

- Guard the MXFP4/NVFP4 switch cases with #ifndef GGML_CUDA_BLACKWELL_CONSUMER to prevent linker errors when those instance files are excluded
- Return mmq_supported = false for those types when the flag is set
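A minimal sketch of the mmq.cu gating pattern, assuming simplified stand-ins: the qtype enum, the mmq_supported(qtype) function and the string-returning mmq_dispatch are hypothetical and do not match ggml's real types or kernel launch code. It shows both halves together. The capability check reports FP4 types as unsupported when the flag is set, and the dispatch switch compiles their cases out so nothing references the excluded instance files. Anything unexpected falls through to an abort.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical, simplified quantization-type enum (ggml's real enum is ggml_type).
enum class qtype { Q4_0, Q8_0, MXFP4, NVFP4 };

// Hypothetical capability check mirroring "return mmq_supported = false".
bool mmq_supported(qtype t) {
    switch (t) {
        case qtype::Q4_0:
        case qtype::Q8_0:
            return true;
        case qtype::MXFP4:
        case qtype::NVFP4:
#ifdef GGML_CUDA_BLACKWELL_CONSUMER
            return false;  // FP4 instance files are not compiled in this build
#else
            return true;
#endif
    }
    return false;  // unreachable for valid enum values
}

// Hypothetical dispatch: each string stands in for a call into a kernel
// defined in a separate mmq-instance-*.cu file.
std::string mmq_dispatch(qtype t) {
    switch (t) {
        case qtype::Q4_0: return "mmq_q4_0";
        case qtype::Q8_0: return "mmq_q8_0";
#ifndef GGML_CUDA_BLACKWELL_CONSUMER
        // Compiled out under the flag, so no symbol from the excluded files is needed.
        case qtype::MXFP4: return "mmq_mxfp4";
        case qtype::NVFP4: return "mmq_nvfp4";
#endif
        default:
            // Reached when an FP4 type is requested in a consumer build,
            // i.e. it slipped past mmq_supported().
            std::fprintf(stderr, "fatal: MMQ type not available in this build\n");
            std::abort();
    }
}
```

Compiling the same file with -DGGML_CUDA_BLACKWELL_CONSUMER makes mmq_supported return false for MXFP4/NVFP4, and mmq_dispatch with those types aborts instead of referencing a missing symbol.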

Usage

Set GGML_CUDA_BLACKWELL_CONSUMER=ON at cmake configure time for builds
targeting consumer Blackwell (RTX 5080/5090). The parent repo
Luce-Org/lucebox-hub sets this automatically via nvidia-smi detection
(see companion PR Luce-Org/lucebox-hub#48).
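The detection logic itself lives in the parent repo and is not reproduced here. As a rough sketch of the decision it makes, the hypothetical helper below takes one line of output from nvidia-smi --query-gpu=compute_cap --format=csv,noheader (assuming a driver recent enough to support the compute_cap field) and returns whether to configure with -DGGML_CUDA_BLACKWELL_CONSUMER=ON. Treating every 12.x device as consumer Blackwell is an assumption of this sketch, and it does not model an explicitly requested 'a' variant.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: decide whether to enable GGML_CUDA_BLACKWELL_CONSUMER
// from one compute-capability line such as "12.0".
// Assumption: any compute capability 12.x is consumer Blackwell.
bool wants_blackwell_consumer(const std::string& compute_cap) {
    std::istringstream in(compute_cap);
    int major = 0;
    int minor = 0;
    char dot = 0;
    if (!(in >> major >> dot >> minor) || dot != '.') {
        return false;  // unparseable output: leave the option at its default (OFF)
    }
    return major == 12;
}
```

Data-center Blackwell reports compute capability 10.x and is therefore left on the default path.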

Test plan

- cmake -B build -DGGML_CUDA_BLACKWELL_CONSUMER=ON -DCMAKE_CUDA_ARCHITECTURES=120 -S . completes without error
- No "ptxas: Feature '.block_scale' not supported on .target 'sm_120'" errors
- Runtime kernels execute without CUDA_ERROR_ILLEGAL_INSTRUCTION

🤖 Generated with Claude Code

Consumer Blackwell GPUs (RTX 5090, SM 12.0) do not have FP4 tensor core instructions. The existing code unconditionally replaces sm_120 with sm_120a and compiles mmq-instance-mxfp4/nvfp4 with BLACKWELL_MMA_AVAILABLE, which emits .block_scale / mxf4 PTX that faults on sm_120 hardware.

Add a GGML_CUDA_BLACKWELL_CONSUMER option (set by the parent build when nvidia-smi reports SM 12.x without an explicit 'a' variant):
- Skip the 12X→12Xa arch replacement so ggml-cuda compiles for plain sm_120
- Exclude mmq-instance-mxfp4.cu and mmq-instance-nvfp4.cu from the build
- Guard their dispatch cases in mmq.cu to prevent linker errors and surface a clear abort if FP4 types are somehow requested at runtime

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

davide221 merged commit 5776d4d into Luce-Org:luce-dflash May 4, 2026

easel deleted the fix/consumer-blackwell-sm120 branch May 10, 2026 17:20

davide221 merged 1 commit into Luce-Org:luce-dflash from easel:fix/consumer-blackwell-sm120