[FEA] Introduce LibRTCX#21625
Conversation
- Introduced a new `jit_bundle` class to encapsulate JIT installation and management. - Updated `cache_statistics` to include disk hit/miss counters for blobs and fragments. - Removed the custom `rw_spinlock_t` implementation in favor of standard synchronization mechanisms. - Enhanced the `cache_t` class to manage JIT bundle installation and caching more effectively. - Refactored file handling functions to improve clarity and error handling. - Updated kernel compilation logic to utilize the new JIT bundle structure. - Improved test cases for fragment creation to reflect changes in the JIT compilation process.
- Change LZ4 library linking from static to object files for better flexibility. - Update LZ4 compression method in Python script to use block compression. - Enhance JIT installation logging for better debugging. - Modify CMake configuration to use the latest LZ4 development version.
…pdate related functions
…timing in JIT compilation
bdice
left a comment
There was a problem hiding this comment.
Submitting partial feedback -- apologies, I did not realize my earlier review was not yet submitted.
wence-
left a comment
There was a problem hiding this comment.
Tiny docstring format, but from my point of view this looks good now. Thanks for bearing with me!
There was a problem hiding this comment.
A few suggestions from me and a few from my agent.
I don't see much about the plans for testing this in issue #22496. Before merging, please share your plan for building and testing this in CI. I'd like for us to include a robust test suite and include builds of librtcx in CI before we get too deep into integration with libcudf. We want to keep this component separable from libcudf, as it's possible that we may upstream this to CCCL or make it its own repo/library.
vyasr
left a comment
There was a problem hiding this comment.
This is really great work. I'm flushing my first pass of review, which focused pretty much exclusively on the compile-time infrastructure for embedding pieces into the binary. I'll do a second pass looking at the runtime usage of librtcx in a follow-up.
vyasr
left a comment
There was a problem hiding this comment.
Here's my review of the runtime code. That side of things looks very solid at this point.
- Corrected NVJitLink reference in README.md - Enhanced CMake functions to accept TARGET as an argument - Refactored embed functions to streamline embedding logic - Improved type handling in embed.hpp for better clarity - Renamed load_dll to load_dso for consistency - Added checks for zero-length files in blob_t::from_file - Enhanced cache_t documentation for clarity on directory requirements
There was a problem hiding this comment.
🧹 Nitpick comments (3)
cpp/librtcx/rtcx.hpp (3)
130-131: ⚡ Quick winAdd
@briefdocumentation forbinary_typeenum.This public enum represents the type of compiled binary output. Adding documentation would help users understand the purpose and available options.
📝 Suggested documentation
+/** + * `@brief` Specifies the type of compiled binary output from NVRTC or NVJITLink. + */ enum class binary_type : std::int8_t { LTO_IR = 0, CUBIN = 2, FATBIN = 3, PTX = 4 };As per coding guidelines, "C++/CUDA code must include proper
doxygendocumentation comments".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/librtcx/rtcx.hpp` around lines 130 - 131, Add a doxygen brief for the public enum binary_type to explain its purpose and enumerate values; update the declaration of enum class binary_type to include a Doxygen comment like "@brief Represents the type of compiled binary output" and a short description of each enumerator (LTO_IR, CUBIN, FATBIN, PTX) so users know what each value means when reading the API docs.
558-572: ⚡ Quick winAdd
@briefdocumentation forcache_statsandcache_limits.These public structs represent cache configuration and metrics that users will interact with. Brief documentation would clarify their purpose.
📝 Suggested documentation
+/** + * `@brief` Container for cache performance statistics (hit/miss counts for memory and disk caches). + */ struct [[nodiscard]] cache_stats { std::uint64_t blob_mem_hits = 0; // ... existing members }; +/** + * `@brief` Configuration limits for the in-memory cache capacity. + */ struct [[nodiscard]] cache_limits { std::uint32_t num_mem_blobs = 16'384; std::uint32_t num_mem_libraries = 16'384; };As per coding guidelines, "C++/CUDA code must include proper
doxygendocumentation comments".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/librtcx/rtcx.hpp` around lines 558 - 572, Add missing Doxygen `@brief` comments for the two public structs so their purpose is documented for users: add a brief description above struct cache_stats explaining it contains runtime cache metrics (blob and library memory/disk hits and misses) and above struct cache_limits explaining it contains configurable cache size limits (num_mem_blobs and num_mem_libraries). Use Doxygen comment style (/** ... */) with `@brief` and a short one-line explanation; you may also add a short sentence per struct about units/defaults if desired.
107-128: ⚡ Quick winAdd
@briefdocumentation forsha256_hasher.This is a public type in the API that users may use or encounter. Per coding guidelines, C++/CUDA code must include proper doxygen documentation comments.
📝 Suggested documentation
+/** + * `@brief` Hash functor for sha256 values, suitable for use with std::unordered_map and similar + * containers. + */ struct [[nodiscard]] sha256_hasher {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/librtcx/rtcx.hpp` around lines 107 - 128, Add a Doxygen `@brief` for the public struct sha256_hasher (and mention its call operator operator()) by inserting a /** `@brief` ... */ comment block immediately above the struct declaration; the brief should state that sha256_hasher provides a constexpr hash functor for sha256 objects (returns a 64-bit hash by mixing four 64-bit words) and may note its use with unordered containers so users and tools see proper API docs for sha256_hasher and its operator().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cpp/librtcx/rtcx.hpp`:
- Around line 130-131: Add a doxygen brief for the public enum binary_type to
explain its purpose and enumerate values; update the declaration of enum class
binary_type to include a Doxygen comment like "@brief Represents the type of
compiled binary output" and a short description of each enumerator (LTO_IR,
CUBIN, FATBIN, PTX) so users know what each value means when reading the API
docs.
- Around line 558-572: Add missing Doxygen `@brief` comments for the two public
structs so their purpose is documented for users: add a brief description above
struct cache_stats explaining it contains runtime cache metrics (blob and
library memory/disk hits and misses) and above struct cache_limits explaining it
contains configurable cache size limits (num_mem_blobs and num_mem_libraries).
Use Doxygen comment style (/** ... */) with `@brief` and a short one-line
explanation; you may also add a short sentence per struct about units/defaults
if desired.
- Around line 107-128: Add a Doxygen `@brief` for the public struct sha256_hasher
(and mention its call operator operator()) by inserting a /** `@brief` ... */
comment block immediately above the struct declaration; the brief should state
that sha256_hasher provides a constexpr hash functor for sha256 objects (returns
a 64-bit hash by mixing four 64-bit words) and may note its use with unordered
containers so users and tools see proper API docs for sha256_hasher and its
operator().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: acaaddec-6719-4966-be42-9d63bc3e237f
📒 Files selected for processing (5)
cpp/librtcx/README.mdcpp/librtcx/embed.cmakecpp/librtcx/embed.hppcpp/librtcx/rtcx.cppcpp/librtcx/rtcx.hpp
🚧 Files skipped from review as they are similar to previous changes (4)
- cpp/librtcx/README.md
- cpp/librtcx/embed.cmake
- cpp/librtcx/embed.hpp
- cpp/librtcx/rtcx.cpp
…OPY_DIRECTORY for clarity Remove unnecessary comment in initialize function documentation Change anonymous namespace to inline namespace detail in sha256.hpp for consistency
vyasr
left a comment
There was a problem hiding this comment.
I'm happy enough with the current state, let's move forward.
|
/ok to test a0ec040 |
bdice
left a comment
There was a problem hiding this comment.
Thank you for your care and attention to detail!
|
/merge |
Description
This pull request introduces new library called
librtcxfor JIT compilation & linking (replacing JITIFY).Jitify previously helped bootstrap and get JIT compilation working quickly but, it has proven to be difficult for continued usage, due to multiple factors:
_FILE_OFFSET_BITSto 64-bits to enable large arrays of binaries in C++ code rather than embedding it-minifyoption). The preprocessed headers are duplicated across each kernel, this doesn't scale as we use more existing CUDF, CCCL, & CUB headers and expand the number of JIT kernelsThis Pull Request:
librtcx, a wrapper on top of NVRTC, NVJITLINK, and CUDART to provide full control of the JIT compilation pipelineFuture Work
Checklist