fix(ggml-cuda): skip sm_120→sm_120a for consumer Blackwell (no FP4 MMA)#3
Merged
davide221 merged 1 commit on May 4, 2026
Consumer Blackwell GPUs (RTX 5090, SM 12.0) do not have FP4 tensor core instructions. The existing code unconditionally replaces sm_120 with sm_120a and compiles mmq-instance-mxfp4/nvfp4 with BLACKWELL_MMA_AVAILABLE, which emits .block_scale / mxf4 PTX that faults on sm_120 hardware.

Add GGML_CUDA_BLACKWELL_CONSUMER option (set by parent build when nvidia-smi reports SM 12.x without an explicit 'a' variant):
- Skip the 12X→12Xa arch replacement so ggml-cuda compiles for plain sm_120
- Exclude mmq-instance-mxfp4.cu and mmq-instance-nvfp4.cu from the build
- Guard their dispatch cases in mmq.cu to prevent linker errors and surface a clear abort if FP4 types are somehow requested at runtime

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
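The dispatch guard described in this commit can be pictured roughly as follows. This is a minimal standalone sketch, not the actual mmq.cu code: the QuantType enum, the function signatures, and the returned kernel names are hypothetical stand-ins, and the macro is defined inline only to show the consumer path (in the real build it would arrive as a compile definition from CMake).

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Normally injected by CMake as a compile definition when the option is ON;
// defined here only so the sketch exercises the consumer-Blackwell path.
#define GGML_CUDA_BLACKWELL_CONSUMER 1

// Hypothetical stand-in for ggml's quantization type enum.
enum class QuantType { Q4_0, MXFP4, NVFP4 };

// Reports whether an MMQ kernel instance exists in this build for the type.
bool mmq_supported(QuantType type) {
    switch (type) {
#ifdef GGML_CUDA_BLACKWELL_CONSUMER
        // The FP4 instance files were excluded from the build.
        case QuantType::MXFP4:
        case QuantType::NVFP4:
            return false;
#endif
        default:
            return true;
    }
}

// Returns the name of the kernel instance that would be launched.
std::string mmq_dispatch(QuantType type) {
    switch (type) {
        case QuantType::Q4_0:
            return "q4_0";
#ifndef GGML_CUDA_BLACKWELL_CONSUMER
        // Compiled out on consumer builds so no symbol from the excluded
        // instance files is referenced (avoids linker errors).
        case QuantType::MXFP4:
            return "mxfp4";
        case QuantType::NVFP4:
            return "nvfp4";
#endif
        default:
            // Clear abort if an unsupported type is requested at runtime.
            std::fprintf(stderr, "mmq: type not supported in this build\n");
            std::abort();
    }
}
```

With the macro defined, mmq_supported(QuantType::MXFP4) returns false, so a caller that checks it first never reaches the aborting default branch; the abort is a last line of defence rather than the expected path.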
Problem
Consumer Blackwell GPUs (RTX 5090, SM 12.0) lack FP4 tensor core hardware. ggml-cuda/CMakeLists.txt unconditionally replaces sm_12X with sm_12Xa and compiles mmq-instance-mxfp4/nvfp4 with BLACKWELL_MMA_AVAILABLE, which emits .block_scale / mxf4 PTX instructions that fault with CUDA_ERROR_ILLEGAL_INSTRUCTION on consumer hardware at runtime.

Changes
ggml/src/ggml-cuda/CMakeLists.txt
- Add GGML_CUDA_BLACKWELL_CONSUMER CMake option (default OFF)
- When set, skip the 12X → 12Xa arch replacement and exclude mmq-instance-mxfp4.cu / mmq-instance-nvfp4.cu from the build
- Add GGML_CUDA_BLACKWELL_CONSUMER compile definition so mmq.cu can gate dispatch

ggml/src/ggml-cuda/mmq.cu
- Guard MXFP4/NVFP4 switch cases with #ifndef GGML_CUDA_BLACKWELL_CONSUMER to prevent linker errors when those instance files are excluded
- Set mmq_supported = false for those types when the flag is set

Usage
Set GGML_CUDA_BLACKWELL_CONSUMER=ON at cmake configure time for builds targeting consumer Blackwell (RTX 5080/5090). The parent repo Luce-Org/lucebox-hub sets this automatically via nvidia-smi detection (see companion PR Luce-Org/lucebox-hub#48).
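The detection rule can be sketched as a small predicate. The actual logic lives in the parent repo's build and is not shown in this PR, so the helper name is_consumer_blackwell and its input format are assumptions: it takes a compute capability string such as "12.0" (for example, what a query like nvidia-smi --query-gpu=compute_cap --format=csv,noheader prints on drivers that support that field) and applies the "SM 12.x without an explicit 'a' variant" rule from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns true when GGML_CUDA_BLACKWELL_CONSUMER should
// be switched ON for the reported compute capability string.
bool is_consumer_blackwell(std::string compute_cap) {
    // Drop trailing whitespace or newlines left over from command output.
    while (!compute_cap.empty() &&
           (compute_cap.back() == '\n' || compute_cap.back() == '\r' ||
            compute_cap.back() == ' ')) {
        compute_cap.pop_back();
    }
    if (compute_cap.empty()) {
        return false;
    }
    // An explicit 'a' suffix means the arch-specific variant was requested.
    if (compute_cap.back() == 'a') {
        return false;
    }
    // Any SM 12.x device counts as consumer Blackwell in this sketch.
    return compute_cap.compare(0, 3, "12.") == 0;
}
```

A build script could call this once per detected GPU and pass -DGGML_CUDA_BLACKWELL_CONSUMER=ON to cmake when it returns true; setting the option by hand remains the fallback when detection is unavailable.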
Test plan
- cmake -B build -DGGML_CUDA_BLACKWELL_CONSUMER=ON -DCMAKE_CUDA_ARCHITECTURES=120 -S . completes without error
- No ptxas: Feature '.block_scale' not supported on .target 'sm_120' errors
- No CUDA_ERROR_ILLEGAL_INSTRUCTION

🤖 Generated with Claude Code