Skip to content

[FEA] Introduce LibRTCX#21625

Open
lamarrr wants to merge 274 commits into
rapidsai:mainfrom
lamarrr:drop-jitify
Open

[FEA] Introduce LibRTCX#21625
lamarrr wants to merge 274 commits into
rapidsai:mainfrom
lamarrr:drop-jitify

Conversation

@lamarrr
Copy link
Copy Markdown
Contributor

@lamarrr lamarrr commented Mar 2, 2026

Description

This pull request introduces new library called librtcx for JIT compilation & linking (replacing JITIFY).
Jitify previously helped bootstrap and get JIT compilation working quickly but, it has proven to be difficult for continued usage, due to multiple factors:

  • Lacks support for JIT-linking of AOT-compiled fragments, We've had to convert PTX into inline assembly in CUDA, and the current interface provides no concrete way to integrate AOT-compiled LTO-IR fragments
  • It packs each CUDA kernel as preprocessed source strings, which is large, slow, and doesn't scale (in JIT compilation-time & binary size)
  • It represents the source code as large hex arrays which slows down syntax highlighting, and forced workarounds like setting compiler _FILE_OFFSET_BITS to 64-bits to enable large arrays of binaries in C++ code rather than embedding it
  • Lacks support for source code compression (even with the -minify option). The preprocessed headers are duplicated across each kernel, this doesn't scale as we use more existing CUDF, CCCL, & CUB headers and expand the number of JIT kernels
  • Lacks support for user-managed cache or fragments
  • It doesn't expose or provide any interface into the underlying library handles (NVRTC, NVJITLINK, CUDART)
  • Most of the abstractions and advantages provided by JITIFY are not suitable and have become a barrier compared to calling NVRTC, NVJITLink, and CUDART directly
  • Lacks support for pre-loading/pre-populating the in-memory JIT cache, which has been requested by our partners and is necessary for accurate benchmarking

This Pull Request:

  • Implements librtcx, a wrapper on top of NVRTC, NVJITLINK, and CUDART to provide full control of the JIT compilation pipeline

Future Work

  • Unit tests for the LibRTCX abstractions

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

- Introduced a new `jit_bundle` class to encapsulate JIT installation and management.
- Updated `cache_statistics` to include disk hit/miss counters for blobs and fragments.
- Removed the custom `rw_spinlock_t` implementation in favor of standard synchronization mechanisms.
- Enhanced the `cache_t` class to manage JIT bundle installation and caching more effectively.
- Refactored file handling functions to improve clarity and error handling.
- Updated kernel compilation logic to utilize the new JIT bundle structure.
- Improved test cases for fragment creation to reflect changes in the JIT compilation process.
- Change LZ4 library linking from static to object files for better flexibility.
- Update LZ4 compression method in Python script to use block compression.
- Enhance JIT installation logging for better debugging.
- Modify CMake configuration to use the latest LZ4 development version.
Copy link
Copy Markdown
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitting partial feedback -- apologies, I did not realize my earlier review was not yet submitted.

Comment thread cpp/include/cudf/detail/utilities/cuda.cuh
Comment thread cpp/src/librtcx/libcxx/README.md Outdated
Comment thread cpp/librtcx/README.md Outdated
Comment thread cpp/librtcx/sha256.hpp Outdated
Comment thread cpp/include/cudf/fixed_point/fixed_point.hpp Outdated
Comment thread cpp/src/jit/jit.cpp Outdated
Comment thread cpp/src/jit/jit.cpp Outdated
Comment thread cpp/librtcx/libcxx/cassert Outdated
Comment thread cpp/librtcx/libcxx/cstddef
Comment thread cpp/librtcx/embed.hpp
Copy link
Copy Markdown
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny docstring format, but from my point of view this looks good now. Thanks for bearing with me!

Comment thread cpp/librtcx/rtcx.hpp Outdated
Copy link
Copy Markdown
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions from me and a few from my agent.

I don't see much about the plans for testing this in issue #22496. Before merging, please share your plan for building and testing this in CI. I'd like for us to include a robust test suite and include builds of librtcx in CI before we get too deep into integration with libcudf. We want to keep this component separable from libcudf, as it's possible that we may upstream this to CCCL or make it its own repo/library.

Comment thread cpp/librtcx/embed.hpp Outdated
Comment thread cpp/librtcx/README.md Outdated
Comment thread cpp/librtcx/README.md Outdated
Comment thread cpp/librtcx/README.md Outdated
Comment thread cpp/librtcx/README.md Outdated
Comment thread cpp/librtcx/rtcx.cpp
Comment thread cpp/librtcx/rtcx.cpp Outdated
Comment thread cpp/librtcx/rtcx.cpp
Comment thread cpp/librtcx/rtcx.cpp
Comment thread cpp/librtcx/rtcx.cpp
Copy link
Copy Markdown
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great work. I'm flushing my first pass of review, which focused pretty much exclusively on the compile-time infrastructure for embedding pieces into the binary. I'll do a second pass looking at the runtime usage of librtcx in a follow-up.

Comment thread cpp/librtcx/libcxx/cassert
Comment thread cpp/librtcx/README.md Outdated
Comment thread cpp/librtcx/embed.cmake Outdated
Comment thread cpp/librtcx/embed.cmake Outdated
Comment thread cpp/librtcx/embed.cmake Outdated
Comment thread cpp/librtcx/embed.in.cpp
Comment thread cpp/librtcx/embed.hpp
Comment thread cpp/librtcx/embed.hpp
Comment thread cpp/librtcx/embed.hpp
Comment thread cpp/librtcx/embed.hpp
Copy link
Copy Markdown
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my review of the runtime code. That side of things looks very solid at this point.

Comment thread cpp/librtcx/rtcx.hpp
Comment thread cpp/librtcx/rtcx.hpp
Comment thread cpp/librtcx/rtcx.cpp
Comment thread cpp/librtcx/rtcx.cpp
Comment thread cpp/librtcx/rtcx.cpp
Comment thread cpp/librtcx/rtcx.cpp Outdated
Comment thread cpp/librtcx/rtcx.cpp
Comment thread cpp/librtcx/rtcx.cpp Outdated
Comment thread cpp/librtcx/sha256.hpp Outdated
Comment thread cpp/librtcx/rtcx.hpp Outdated
lamarrr added 2 commits May 15, 2026 00:55
- Corrected NVJitLink reference in README.md
- Enhanced CMake functions to accept TARGET as an argument
- Refactored embed functions to streamline embedding logic
- Improved type handling in embed.hpp for better clarity
- Renamed load_dll to load_dso for consistency
- Added checks for zero-length files in blob_t::from_file
- Enhanced cache_t documentation for clarity on directory requirements
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
cpp/librtcx/rtcx.hpp (3)

130-131: ⚡ Quick win

Add @brief documentation for binary_type enum.

This public enum represents the type of compiled binary output. Adding documentation would help users understand the purpose and available options.

📝 Suggested documentation
+/**
+ * `@brief` Specifies the type of compiled binary output from NVRTC or NVJITLink.
+ */
 enum class binary_type : std::int8_t { LTO_IR = 0, CUBIN = 2, FATBIN = 3, PTX = 4 };

As per coding guidelines, "C++/CUDA code must include proper doxygen documentation comments".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/librtcx/rtcx.hpp` around lines 130 - 131, Add a doxygen brief for the
public enum binary_type to explain its purpose and enumerate values; update the
declaration of enum class binary_type to include a Doxygen comment like "@brief
Represents the type of compiled binary output" and a short description of each
enumerator (LTO_IR, CUBIN, FATBIN, PTX) so users know what each value means when
reading the API docs.

558-572: ⚡ Quick win

Add @brief documentation for cache_stats and cache_limits.

These public structs represent cache configuration and metrics that users will interact with. Brief documentation would clarify their purpose.

📝 Suggested documentation
+/**
+ * `@brief` Container for cache performance statistics (hit/miss counts for memory and disk caches).
+ */
 struct [[nodiscard]] cache_stats {
   std::uint64_t blob_mem_hits       = 0;
   // ... existing members
 };

+/**
+ * `@brief` Configuration limits for the in-memory cache capacity.
+ */
 struct [[nodiscard]] cache_limits {
   std::uint32_t num_mem_blobs     = 16'384;
   std::uint32_t num_mem_libraries = 16'384;
 };

As per coding guidelines, "C++/CUDA code must include proper doxygen documentation comments".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/librtcx/rtcx.hpp` around lines 558 - 572, Add missing Doxygen `@brief`
comments for the two public structs so their purpose is documented for users:
add a brief description above struct cache_stats explaining it contains runtime
cache metrics (blob and library memory/disk hits and misses) and above struct
cache_limits explaining it contains configurable cache size limits
(num_mem_blobs and num_mem_libraries). Use Doxygen comment style (/** ... */)
with `@brief` and a short one-line explanation; you may also add a short sentence
per struct about units/defaults if desired.

107-128: ⚡ Quick win

Add @brief documentation for sha256_hasher.

This is a public type in the API that users may use or encounter. Per coding guidelines, C++/CUDA code must include proper doxygen documentation comments.

📝 Suggested documentation
+/**
+ * `@brief` Hash functor for sha256 values, suitable for use with std::unordered_map and similar
+ * containers.
+ */
 struct [[nodiscard]] sha256_hasher {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/librtcx/rtcx.hpp` around lines 107 - 128, Add a Doxygen `@brief` for the
public struct sha256_hasher (and mention its call operator operator()) by
inserting a /** `@brief` ... */ comment block immediately above the struct
declaration; the brief should state that sha256_hasher provides a constexpr hash
functor for sha256 objects (returns a 64-bit hash by mixing four 64-bit words)
and may note its use with unordered containers so users and tools see proper API
docs for sha256_hasher and its operator().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cpp/librtcx/rtcx.hpp`:
- Around line 130-131: Add a doxygen brief for the public enum binary_type to
explain its purpose and enumerate values; update the declaration of enum class
binary_type to include a Doxygen comment like "@brief Represents the type of
compiled binary output" and a short description of each enumerator (LTO_IR,
CUBIN, FATBIN, PTX) so users know what each value means when reading the API
docs.
- Around line 558-572: Add missing Doxygen `@brief` comments for the two public
structs so their purpose is documented for users: add a brief description above
struct cache_stats explaining it contains runtime cache metrics (blob and
library memory/disk hits and misses) and above struct cache_limits explaining it
contains configurable cache size limits (num_mem_blobs and num_mem_libraries).
Use Doxygen comment style (/** ... */) with `@brief` and a short one-line
explanation; you may also add a short sentence per struct about units/defaults
if desired.
- Around line 107-128: Add a Doxygen `@brief` for the public struct sha256_hasher
(and mention its call operator operator()) by inserting a /** `@brief` ... */
comment block immediately above the struct declaration; the brief should state
that sha256_hasher provides a constexpr hash functor for sha256 objects (returns
a 64-bit hash by mixing four 64-bit words) and may note its use with unordered
containers so users and tools see proper API docs for sha256_hasher and its
operator().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: acaaddec-6719-4966-be42-9d63bc3e237f

📥 Commits

Reviewing files that changed from the base of the PR and between 03ec682 and 00b32b3.

📒 Files selected for processing (5)
  • cpp/librtcx/README.md
  • cpp/librtcx/embed.cmake
  • cpp/librtcx/embed.hpp
  • cpp/librtcx/rtcx.cpp
  • cpp/librtcx/rtcx.hpp
🚧 Files skipped from review as they are similar to previous changes (4)
  • cpp/librtcx/README.md
  • cpp/librtcx/embed.cmake
  • cpp/librtcx/embed.hpp
  • cpp/librtcx/rtcx.cpp

Copy link
Copy Markdown
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy enough with the current state, let's move forward.

@lamarrr lamarrr removed Python Affects Python cuDF API. Java Affects Java cuDF API. labels May 21, 2026
@bdice
Copy link
Copy Markdown
Contributor

bdice commented May 21, 2026

/ok to test a0ec040

Copy link
Copy Markdown
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your care and attention to detail!

@lamarrr
Copy link
Copy Markdown
Contributor Author

lamarrr commented May 22, 2026

/merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

8 participants