Strip out hermetic llvm from the rocm toolchain. Use specific target for rocm gpu#251
Draft
alekstheod wants to merge 1 commit into
Conversation
652aabc to
a41717e
Compare
eea81de to
da8e920
Compare
e0c44e9 to
434825e
Compare
434825e to
cb1d447
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR strips out hermetic llvm from the rocm toolchain. It removes the wrapper with the dynamic
selection of the compiler. Instead it introduces a new type of a target for a rocm gpu compilation.
Following is the analysis from AI:
ROCm Toolchain Architecture: Feature-based vs Wrapper-based Compilation
Overview
This document compares two approaches for ROCm/HIP compilation in Bazel:
cc_commonAPI and customrocm_compileruleTL;DR
The current feature-based approach is 67% less code, faster, more maintainable, and uses idiomatic Bazel patterns. The wrapper approach was functional but reimplemented functionality that Bazel already provides.
Approach Comparison
Previous: hipcc_wrapper (590 lines)
Architecture:
Implementation:
Pros:
cc_libraryfor GPU code (with-x rocmflag)Cons:
Current: rocm_compile Rule + Features (~180 lines)
Architecture:
Implementation:
rocm_compilerule (~80 lines): Custom Bazel rule that usescc_commonAPIrocm_hipcc_feature(~100 lines): Declarative feature defining compiler flags and environmentrocm_librarymacro (~50 lines): User-friendly wrapperPros:
cc_commonAPI, feature system, compilation contextsrocm_libraryCons:
rocm_librarymacro instead of plaincc_libraryfor GPU codecc_toolchaininfrastructure defined but not used for automatic resolutionDetailed Comparison
1. Code Complexity
2. Performance
Previous:
Current:
3. User Experience
Previous:
Current:
The current approach is more intuitive - the macro name clearly indicates GPU code.
4. Real-world Usage Pattern
In practice, GPU kernels are always separate from CPU code:
The "limitation" of requiring
rocm_librarymatches actual usage patterns:.cu.ccfiles5. Maintainability
Adding a new compiler flag:
Previous:
Current:
The current approach is dramatically easier to maintain.
6. Correctness
Previous: Manual flag parsing
Current: Bazel API handles it
7. Debuggability
Previous:
Current:
Fewer layers = easier debugging.
8. What cc_toolchain Actually Provides
The current approach uses the
cc_toolchaininfrastructure for many critical things:✅ File dependencies (
cc_toolchain.all_files)✅ Compilation context integration (
cc_common.merge_compilation_contexts)✅ Built-in include directories (
cxx_builtin_include_directories)✅ Feature system
✅ Environment variables (
cc_common.get_environment_variables)✅ Compiler flags (
cc_common.get_memory_inefficient_command_line)The only thing we don't use is automatic toolchain resolution. But manual selection is actually more explicit and clearer - the
rocm_compilerule states exactly which toolchain it uses.Without the cc_toolchain infrastructure, we'd need to manually track all of the above. The wrapper approach had to reimplement much of this logic.
Architecture Details
rocm_compile Rule
The
rocm_compilerule is the heart of the implementation:Key points:
cc_commonAPI throughoutrocm_hipcc_feature
Declarative feature defining all ROCm-specific configuration:
Benefits:
rocm_library Macro
User-friendly wrapper:
Compiles GPU code with
rocm_compile, then wraps.ofiles in standardcc_libraryfor linking.Local Sysroot Support
Both approaches face the same fundamental constraint: You cannot mix hermetic headers with system libraries (ABI mismatch → undefined symbols or segfaults).
The current approach handles this correctly:
ROCm toolchain: Supports local_sysroot (for system ROCm case)
local_sysroot_default_libsfeatureHermetic LLVM toolchain: Stays fully hermetic
This is an architectural constraint, not an implementation issue.
Migration Path
Converting from previous to current approach is straightforward:
Before:
After:
The new syntax is actually clearer about intent.
Conclusion
The current feature-based approach is superior in every meaningful metric:
The wrapper was a reasonable first attempt, but the feature-based architecture is the correct long-term solution.
References
cc/rocm/rocm_compile.bzl- Main compilation rulecc/rocm/features/rocm_hipcc_feature.bzl- Feature definitioncc/rocm/rocm_library.bzl- User-facing macrocc/impls/linux_x86_64_linux_x86_64_rocm/BUILD- Toolchain configuration