Skip to content

chore(dflash): enforce sm_89 user override and keep BSA enabled#137

Open
javierpazo wants to merge 1 commit into
Luce-Org:mainfrom
javierpazo:xabicasa/dflash-build-cmake-sm89-bsa
Open

chore(dflash): enforce sm_89 user override and keep BSA enabled#137
javierpazo wants to merge 1 commit into
Luce-Org:mainfrom
javierpazo:xabicasa/dflash-build-cmake-sm89-bsa

Conversation

@javierpazo

Copy link
Copy Markdown
Contributor

chore(dflash): enforce sm_89 user override and keep BSA enabled

Two CMake-side rough edges that bit me on Windows MSVC + CUDA 12.x
on RTX 6000 Ada (sm_89, Ada-only):

  1. CUDA architectures: when no explicit override is provided,
    the previous CMakeLists could fall back to 75;86, which
    caused silent build issues on Ada-only setups. This change
    respects DFLASH27B_USER_CUDA_ARCHITECTURES (e.g. 89) and
    uses it consistently across the dflash and submodule
    ggml/llama.cpp consumers.

  2. BSA was sometimes silently disabled depending on detection
    order. DFLASH27B_ENABLE_BSA is now respected as an explicit
    opt-in/opt-out and a clear status line is printed at
    configure time.

Net effect: a single-arch Ada-only build with BSA enabled is
reproducible from a clean checkout. Default behaviour (no
DFLASH27B_USER_CUDA_ARCHITECTURES set, BSA on) is preserved for
existing users.

Validation:
cmake -S dflash -B dflash/build/Release
-DCMAKE_BUILD_TYPE=Release
-DDFLASH27B_USER_CUDA_ARCHITECTURES=89
-DDFLASH27B_ENABLE_BSA=ON
cmake --build dflash/build/Release --target test_dflash --parallel 8
-> BUILD_EXIT_CODE=0, sm_89 single-arch confirmed.

Verification vs existing community PRs:

COMP-COMPL with #48 ("auto-detect GPU arch to prevent sm_120a on
consumer Blackwell", open) and #91 ("expose BSA config as CLI
flags with safety warnings", merged 2026-05-04). #48 covers
auto-detect; #91 covers runtime CLI. This PR covers the
build-time CMake side: respect the user's explicit
DFLASH27B_USER_CUDA_ARCHITECTURES override and keep
DFLASH27B_ENABLE_BSA honest. The three PRs together give
sensible defaults per hardware tier.

Author: Javier Pazo xabicasa@gmail.com

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="dflash/CMakeLists.txt">

<violation number="1" location="dflash/CMakeLists.txt:193">
P2: Removing the SM80 guard can make `test_flashprefill_kernels` link fail when FlashPrefill sources are not compiled into `dflash27b`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread dflash/CMakeLists.txt Outdated
Two CMake-side rough edges that bit me on Windows MSVC + CUDA 12.x
on RTX 6000 Ada (sm_89, Ada-only):

  1. CUDA architectures: when no explicit override is provided,
     the previous CMakeLists could fall back to `75;86`, which
     caused silent build issues on Ada-only setups. This change
     respects DFLASH27B_USER_CUDA_ARCHITECTURES (e.g. `89`) and
     uses it consistently across the dflash and submodule
     ggml/llama.cpp consumers.

  2. BSA was sometimes silently disabled depending on detection
     order. DFLASH27B_ENABLE_BSA is now respected as an explicit
     opt-in/opt-out and a clear status line is printed at
     configure time.

Net effect: a single-arch Ada-only build with BSA enabled is
reproducible from a clean checkout. Default behaviour (no
DFLASH27B_USER_CUDA_ARCHITECTURES set, BSA on) is preserved for
existing users.

Validation:
  cmake -S dflash -B dflash/build/Release \
    -DCMAKE_BUILD_TYPE=Release \
    -DDFLASH27B_USER_CUDA_ARCHITECTURES=89 \
    -DDFLASH27B_ENABLE_BSA=ON
  cmake --build dflash/build/Release --target test_dflash --parallel 8
  -> BUILD_EXIT_CODE=0, sm_89 single-arch confirmed.

Verification vs existing community PRs:

  COMP-COMPL with Luce-Org#48 ("auto-detect GPU arch to prevent sm_120a on
  consumer Blackwell", open) and Luce-Org#91 ("expose BSA config as CLI
  flags with safety warnings", merged 2026-05-04). Luce-Org#48 covers
  auto-detect; Luce-Org#91 covers runtime CLI. This PR covers the
  build-time CMake side: respect the user's explicit
  DFLASH27B_USER_CUDA_ARCHITECTURES override and keep
  DFLASH27B_ENABLE_BSA honest. The three PRs together give
  sensible defaults per hardware tier.

Author: Javier Pazo <xabicasa@gmail.com>
@javierpazo javierpazo force-pushed the xabicasa/dflash-build-cmake-sm89-bsa branch from d423264 to 297fc74 Compare May 10, 2026 19:25
@javierpazo

Copy link
Copy Markdown
Contributor Author

@cubic-dev-ai good catch — restored the SM80 guard around \ est_flashprefill_kernels. The kernel sources are gated on _dflash27b_min_sm GREATER_EQUAL 80\ further up (\DFLASH27B_HAVE_FLASHPREFILL), so the test would have failed to link on legacy arches. Force-pushed; only the test target is now conditional, the rest of the CMake cleanup is unchanged.

@cubic-dev-ai

cubic-dev-ai Bot commented May 10, 2026

Copy link
Copy Markdown
Contributor

@cubic-dev-ai good catch — restored the SM80 guard around \ est_flashprefill_kernels. The kernel sources are gated on _dflash27b_min_sm GREATER_EQUAL 80\ further up (\DFLASH27B_HAVE_FLASHPREFILL), so the test would have failed to link on legacy arches. Force-pushed; only the test target is now conditional, the rest of the CMake cleanup is unchanged.

@javierpazo Makes sense — the DFLASH27B_HAVE_FLASHPREFILL guard at the top of the file already skips compiling the kernel sources into the dflash27b target when _dflash27b_min_sm < 80, so any test linking against that target would fail with unresolved symbols on pre-Ampere arches. Wrapping only the test target in the SM80 check (vs. the whole block) keeps the other tests like test_kv_quant and test_dflash available on legacy hardware. Clean approach.

The force-push looks good.

easel pushed a commit to easel/lucebox-hub that referenced this pull request May 27, 2026
Record the 2026-05-27 18:17 scheduled run, including fresh PR classification and targeted worktree/Codex probes for stale PRs Luce-Org#137 and Luce-Org#135.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 28, 2026
Revalidate PRs Luce-Org#137 and Luce-Org#135 in isolated worktrees, record conflict shape and current-layout recommendations. Upstream main remains unchanged at 4f4d82e.
easel pushed a commit to easel/lucebox-hub that referenced this pull request May 31, 2026
Keep the current server/CMakeLists.txt CUDA architecture and BSA handling; the old dflash/CMakeLists.txt file modified by PR Luce-Org#137 has been removed from the current tree, so resolving the modify/delete conflict by preserving the deletion records the PR head as represented without reintroducing the retired layout.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant