[FEA] Introduce LibRTCX by lamarrr · Pull Request #21625 · rapidsai/cudf

lamarrr · 2026-03-02T21:27:16Z

Description

This pull request introduces a new library, librtcx, for JIT compilation and linking, replacing Jitify.
Jitify helped us bootstrap JIT compilation and get it working quickly, but it has proven difficult to keep using, for several reasons:

- It lacks support for JIT-linking AOT-compiled fragments: we've had to convert PTX into inline assembly in CUDA, and the current interface provides no concrete way to integrate AOT-compiled LTO-IR fragments.
- It packs each CUDA kernel as a preprocessed source string, which is large and slow, and doesn't scale in either JIT compilation time or binary size.
- It represents the source code as large hex arrays, which slows down syntax highlighting and forced workarounds, such as setting `_FILE_OFFSET_BITS` to 64, to allow large binary arrays in C++ code rather than embedding the binaries directly.
- It lacks support for source code compression (even with the `-minify` option). The preprocessed headers are duplicated across every kernel, which doesn't scale as we use more existing cuDF, CCCL, and CUB headers and add more JIT kernels.
- It lacks support for a user-managed cache or user-managed fragments.
- It doesn't expose any interface to the underlying library handles (NVRTC, nvJitLink, CUDART).
- Most of the abstractions and conveniences Jitify provides no longer suit our needs and have become a barrier compared with calling NVRTC, nvJitLink, and CUDART directly.
- It lacks support for pre-loading (pre-populating) the in-memory JIT cache, which our partners have requested and which is necessary for accurate benchmarking.
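As a rough illustration of why duplicated headers scale poorly, the sketch below compares the bytes embedded when every kernel carries its own preprocessed copy of each header against a store where each distinct header is kept once and kernels refer to it by name. The `kernel_sources` type, both helper functions, and the header sizes used below are hypothetical; they do not describe Jitify's or librtcx's actual storage format.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of a JIT kernel: its name and the headers it includes.
struct kernel_sources {
  std::string name;
  std::vector<std::string> headers;
};

// Bytes embedded when every kernel carries its own preprocessed copy of each header.
inline std::size_t duplicated_bytes(std::vector<kernel_sources> const& kernels,
                                    std::map<std::string, std::size_t> const& header_sizes)
{
  std::size_t total = 0;
  for (auto const& kernel : kernels) {
    for (auto const& header : kernel.headers) { total += header_sizes.at(header); }
  }
  return total;
}

// Bytes embedded when each distinct header is stored once and kernels refer to it by name.
inline std::size_t deduplicated_bytes(std::vector<kernel_sources> const& kernels,
                                      std::map<std::string, std::size_t> const& header_sizes)
{
  std::set<std::string> unique_headers;
  for (auto const& kernel : kernels) {
    unique_headers.insert(kernel.headers.begin(), kernel.headers.end());
  }
  std::size_t total = 0;
  for (auto const& header : unique_headers) { total += header_sizes.at(header); }
  return total;
}
```

With illustrative sizes of 400,000 bytes for `cub.cuh`, 50,000 for `column_device_view.cuh`, and 20,000 for `type_traits`, and three kernels that include {all three}, {the first two}, and {the first and third}, `duplicated_bytes` returns 1,340,000 while `deduplicated_bytes` returns 470,000. The gap grows with every kernel that pulls in the same large headers.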

This pull request:

- Implements librtcx, a wrapper over NVRTC, nvJitLink, and CUDART that provides full control over the JIT compilation pipeline.

Future Work

- Unit tests for the LibRTCX abstractions

Checklist

- I am familiar with the Contributing Guidelines.
- New or existing tests cover these changes.
- The documentation is up to date with these changes.

- Introduced a new `jit_bundle` class to encapsulate JIT installation and management.
- Updated `cache_statistics` to include disk hit/miss counters for blobs and fragments.
- Removed the custom `rw_spinlock_t` implementation in favor of standard synchronization mechanisms.
- Enhanced the `cache_t` class to manage JIT bundle installation and caching more effectively.
- Refactored file handling functions to improve clarity and error handling.
- Updated kernel compilation logic to use the new JIT bundle structure.
- Improved test cases for fragment creation to reflect changes in the JIT compilation process.
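A minimal sketch of the locking pattern described above, assuming a string-keyed cache and a `std::shared_mutex` in place of the removed `rw_spinlock_t`: lookups take a shared lock, inserts take an exclusive lock, compilation runs outside any lock, and relaxed atomic counters record memory hits and misses. `blob_cache`, `get_or_compile`, and `preload` are hypothetical names, not librtcx's `cache_t` API, and the disk tier (with its own hit/miss counters) is omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled artifact such as a CUBIN or LTO-IR blob.
using blob = std::vector<std::uint8_t>;

// Sketch: an in-memory cache guarded by std::shared_mutex rather than a custom spinlock.
class blob_cache {
 public:
  // Returns the cached blob for `key`, calling `compile` only on a miss.
  blob get_or_compile(std::string const& key, std::function<blob()> const& compile)
  {
    {
      std::shared_lock lock{mutex_};  // many threads may look up concurrently
      if (auto it = blobs_.find(key); it != blobs_.end()) {
        mem_hits_.fetch_add(1, std::memory_order_relaxed);
        return it->second;
      }
    }
    mem_misses_.fetch_add(1, std::memory_order_relaxed);
    blob compiled = compile();      // run the expensive step without holding any lock
    std::unique_lock lock{mutex_};  // exclusive access for the insert
    // If another thread inserted `key` in the meantime, keep its entry.
    return blobs_.try_emplace(key, std::move(compiled)).first->second;
  }

  // Pre-populates an entry, e.g. before a benchmark so timed runs exclude JIT cost.
  void preload(std::string const& key, blob value)
  {
    std::unique_lock lock{mutex_};
    blobs_.insert_or_assign(key, std::move(value));
  }

  std::uint64_t mem_hits() const { return mem_hits_.load(); }
  std::uint64_t mem_misses() const { return mem_misses_.load(); }

 private:
  std::shared_mutex mutex_;
  std::unordered_map<std::string, blob> blobs_;
  std::atomic<std::uint64_t> mem_hits_{0};
  std::atomic<std::uint64_t> mem_misses_{0};
};
```

`preload` mirrors the cache pre-population requested for benchmarking: an entry inserted ahead of time turns the first timed lookup into a hit. Two threads that miss on the same key may both compile; `try_emplace` keeps whichever result lands first.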

- Change LZ4 library linking from static to object files for better flexibility.
- Update LZ4 compression method in Python script to use block compression.
- Enhance JIT installation logging for better debugging.
- Modify CMake configuration to use the latest LZ4 development version.

bdice

Submitting partial feedback -- apologies, I did not realize my earlier review was not yet submitted.

wence-

Just a tiny docstring formatting nit, but from my point of view this looks good now. Thanks for bearing with me!

bdice

A few suggestions from me and a few from my agent.

I don't see much about the plans for testing this in issue #22496. Before merging, please share your plan for building and testing this in CI. I'd like for us to include a robust test suite and include builds of librtcx in CI before we get too deep into integration with libcudf. We want to keep this component separable from libcudf, as it's possible that we may upstream this to CCCL or make it its own repo/library.

vyasr

This is really great work. I'm flushing my first pass of review, which focused pretty much exclusively on the compile-time infrastructure for embedding pieces into the binary. I'll do a second pass looking at the runtime usage of librtcx in a follow-up.

vyasr

Here's my review of the runtime code. That side of things looks very solid at this point.

- Corrected NVJitLink reference in README.md
- Enhanced CMake functions to accept TARGET as an argument
- Refactored embed functions to streamline embedding logic
- Improved type handling in embed.hpp for better clarity
- Renamed load_dll to load_dso for consistency
- Added checks for zero-length files in blob_t::from_file
- Enhanced cache_t documentation for clarity on directory requirements
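The zero-length check mentioned above could look roughly like this. `read_blob_from_file` is a hypothetical free function, not the actual `blob_t::from_file`; it assumes errors are reported by throwing `std::runtime_error`, which may differ from librtcx's real error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper modeled on the idea behind `blob_t::from_file`: read a whole file.
// A zero-length file is rejected on the assumption that an empty cached binary or
// source fragment is truncated or corrupt.
inline std::vector<std::uint8_t> read_blob_from_file(std::filesystem::path const& path)
{
  std::error_code ec;
  auto const size = std::filesystem::file_size(path, ec);
  if (ec) { throw std::runtime_error("cannot stat " + path.string()); }
  if (size == 0) { throw std::runtime_error("zero-length file: " + path.string()); }

  std::ifstream file{path, std::ios::binary};
  if (!file) { throw std::runtime_error("cannot open " + path.string()); }

  std::vector<std::uint8_t> bytes(static_cast<std::size_t>(size));
  file.read(reinterpret_cast<char*>(bytes.data()), static_cast<std::streamsize>(size));
  // Guard against the file shrinking between the size query and the read.
  if (file.gcount() != static_cast<std::streamsize>(size)) {
    throw std::runtime_error("short read from " + path.string());
  }
  return bytes;
}
```

Checking the size up front gives a clear error message for an empty cache entry instead of silently returning an empty blob that would only fail later, at load or link time.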

coderabbitai

🧹 Nitpick comments (3)

cpp/librtcx/rtcx.hpp (3)

130-131: ⚡ Quick win

Add @brief documentation for binary_type enum.

This public enum represents the type of compiled binary output. Adding documentation would help users understand the purpose and available options.

📝 Suggested documentation

```diff
+/**
+ * @brief Specifies the type of compiled binary output from NVRTC or nvJitLink.
+ */
 enum class binary_type : std::int8_t { LTO_IR = 0, CUBIN = 2, FATBIN = 3, PTX = 4 };
```

As per coding guidelines, "C++/CUDA code must include proper doxygen documentation comments".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/librtcx/rtcx.hpp` around lines 130 - 131, Add a doxygen brief for the
public enum binary_type to explain its purpose and enumerate values; update the
declaration of enum class binary_type to include a Doxygen comment like "@brief
Represents the type of compiled binary output" and a short description of each
enumerator (LTO_IR, CUBIN, FATBIN, PTX) so users know what each value means when
reading the API docs.
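A sketch of what the fully documented enum might look like, with one line per enumerator as the prompt suggests. The enumerator descriptions reflect general CUDA terminology; the `to_string` and `needs_link_step` helpers are hypothetical additions for illustration and are not part of rtcx.hpp.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

/**
 * @brief Specifies the type of compiled binary output from NVRTC or nvJitLink.
 */
enum class binary_type : std::int8_t {
  LTO_IR = 0,  ///< Link-time-optimization IR; must be linked (e.g. by nvJitLink) before loading
  CUBIN  = 2,  ///< Device code compiled for one specific GPU architecture
  FATBIN = 3,  ///< Container that can bundle CUBIN, PTX, or LTO-IR images for several targets
  PTX    = 4   ///< Virtual ISA text; the driver JIT-compiles it for the running GPU
};

// Hypothetical helper: human-readable name, e.g. for logging.
constexpr std::string_view to_string(binary_type type)
{
  switch (type) {
    case binary_type::LTO_IR: return "LTO_IR";
    case binary_type::CUBIN: return "CUBIN";
    case binary_type::FATBIN: return "FATBIN";
    case binary_type::PTX: return "PTX";
  }
  return "unknown";
}

// Hypothetical helper: whether a binary needs a link step before it can be loaded.
// (A FATBIN containing only LTO-IR would also need linking; that case is ignored here.)
constexpr bool needs_link_step(binary_type type) { return type == binary_type::LTO_IR; }
```

The helper reflects that LTO-IR fragments must go through a link step (for example with nvJitLink) before they can be loaded, whereas CUBIN and PTX images can be handed to the CUDA loader as-is.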

558-572: ⚡ Quick win

Add @brief documentation for cache_stats and cache_limits.

These public structs represent cache configuration and metrics that users will interact with. Brief documentation would clarify their purpose.

📝 Suggested documentation

```diff
+/**
+ * @brief Container for cache performance statistics (hit/miss counts for memory and disk caches).
+ */
 struct [[nodiscard]] cache_stats {
   std::uint64_t blob_mem_hits       = 0;
   // ... existing members
 };
 
+/**
+ * @brief Configuration limits for the in-memory cache capacity.
+ */
 struct [[nodiscard]] cache_limits {
   std::uint32_t num_mem_blobs     = 16'384;
   std::uint32_t num_mem_libraries = 16'384;
 };
```

As per coding guidelines, "C++/CUDA code must include proper doxygen documentation comments".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/librtcx/rtcx.hpp` around lines 558 - 572, Add missing Doxygen `@brief`
comments for the two public structs so their purpose is documented for users:
add a brief description above struct cache_stats explaining it contains runtime
cache metrics (blob and library memory/disk hits and misses) and above struct
cache_limits explaining it contains configurable cache size limits
(num_mem_blobs and num_mem_libraries). Use Doxygen comment style (/** ... */)
with `@brief` and a short one-line explanation; you may also add a short sentence
per struct about units/defaults if desired.
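A sketch of the two structs with the suggested documentation. Only `blob_mem_hits`, `num_mem_blobs`, and `num_mem_libraries` (with their defaults) appear in the review; the remaining counter names follow the prompt's description of blob and library memory/disk hits and misses and are assumptions, as is the `hit_rate` helper.

```cpp
#include <cassert>
#include <cstdint>

/**
 * @brief Cache performance statistics: hit/miss counts for the in-memory and on-disk
 * caches, tracked separately for blobs and libraries.
 */
struct [[nodiscard]] cache_stats {
  // Only blob_mem_hits is confirmed by the review; the other counter names are assumed.
  std::uint64_t blob_mem_hits       = 0;
  std::uint64_t blob_mem_misses     = 0;
  std::uint64_t blob_disk_hits      = 0;
  std::uint64_t blob_disk_misses    = 0;
  std::uint64_t library_mem_hits    = 0;
  std::uint64_t library_mem_misses  = 0;
  std::uint64_t library_disk_hits   = 0;
  std::uint64_t library_disk_misses = 0;
};

/**
 * @brief Capacity limits for the in-memory cache (assumed here to be entry counts).
 */
struct [[nodiscard]] cache_limits {
  std::uint32_t num_mem_blobs     = 16'384;
  std::uint32_t num_mem_libraries = 16'384;
};

// Hypothetical helper: fraction of lookups served by a cache tier; 0 if there were none.
constexpr double hit_rate(std::uint64_t hits, std::uint64_t misses)
{
  auto const lookups = hits + misses;
  return lookups == 0 ? 0.0 : static_cast<double>(hits) / static_cast<double>(lookups);
}
```

Keeping memory and disk counters separate makes it possible to tell whether a slow run was caused by cold in-memory caches (disk hits) or by genuine recompilation (disk misses).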

107-128: ⚡ Quick win

Add @brief documentation for sha256_hasher.

This is a public type in the API that users may use or encounter. Per coding guidelines, C++/CUDA code must include proper doxygen documentation comments.

📝 Suggested documentation

```diff
+/**
+ * @brief Hash functor for sha256 values, suitable for use with std::unordered_map and similar
+ * containers.
+ */
 struct [[nodiscard]] sha256_hasher {
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/librtcx/rtcx.hpp` around lines 107 - 128, Add a Doxygen `@brief` for the
public struct sha256_hasher (and mention its call operator operator()) by
inserting a /** `@brief` ... */ comment block immediately above the struct
declaration; the brief should state that sha256_hasher provides a constexpr hash
functor for sha256 objects (returns a 64-bit hash by mixing four 64-bit words)
and may note its use with unordered containers so users and tools see proper API
docs for sha256_hasher and its operator().
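A possible shape for the documented functor, assuming the digest is stored as four 64-bit words (the `sha256` struct here is a hypothetical stand-in). The mixing step borrows the `hash_combine` pattern with the 64-bit golden-ratio constant; the actual rtcx.hpp implementation may fold the words differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical digest type: 256 bits stored as four 64-bit words.
struct sha256 {
  std::array<std::uint64_t, 4> words{};

  friend constexpr bool operator==(sha256 const& a, sha256 const& b)
  {
    for (std::size_t i = 0; i < 4; ++i) {
      if (a.words[i] != b.words[i]) { return false; }
    }
    return true;
  }
};

/**
 * @brief Hash functor for sha256 values, suitable for use with std::unordered_map and similar
 * containers. Folds the four 64-bit words of the digest into a single 64-bit hash.
 */
struct [[nodiscard]] sha256_hasher {
  constexpr std::uint64_t operator()(sha256 const& digest) const noexcept
  {
    std::uint64_t h = 0;
    for (std::uint64_t w : digest.words) {
      // hash_combine-style mixing; unsigned arithmetic wraps, so this is constexpr-safe
      h ^= w + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
    }
    return h;
  }
};
```

Because SHA-256 output is already uniformly distributed, even a simple fold spreads keys well across buckets; the order-dependent mixing mainly makes digests whose words are permutations of each other unlikely to collide.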

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: acaaddec-6719-4966-be42-9d63bc3e237f

📥 Commits

Reviewing files that changed from the base of the PR and between 03ec682 and 00b32b3.

📒 Files selected for processing (5)

cpp/librtcx/README.md
cpp/librtcx/embed.cmake
cpp/librtcx/embed.hpp
cpp/librtcx/rtcx.cpp
cpp/librtcx/rtcx.hpp

🚧 Files skipped from review as they are similar to previous changes (4)

cpp/librtcx/README.md
cpp/librtcx/embed.cmake
cpp/librtcx/embed.hpp
cpp/librtcx/rtcx.cpp

vyasr

I'm happy enough with the current state, let's move forward.

bdice · 2026-05-21T17:48:49Z

/ok to test a0ec040

bdice

Thank you for your care and attention to detail!

lamarrr · 2026-05-22T11:26:39Z

/merge

lamarrr added 30 commits January 13, 2026 09:10

todos and refactoring

b42b998

update

ff2f88f

Enhance JIT compilation with ASM support and integrate LZ4 compression

86c0010

Merge remote-tracking branch 'upstream/main' into lto-ir-expt

ae39c7c

Update copyright year in get_lz4.cmake and improve formatting

a955a04

Integrate Zstandard (zstd) compression support in JIT embedding and update related functions

82ddae0

Add python-zstd dependency to environment YAML files

43a4e08

update

37976fc

refactoring + multi-threaded setup fix

17ebfa0

enhance teardown documentation with usage warnings and clarify purpose

6fbc5a4

update copyright year to 2026 in header files

5aa1ea5

enhance teardown function to reset context and improve resource management

fab6e6e

fix: adjust concurrency calculation in multithreaded test utility

5738218

Merge branch 'main' into context-mt

33b93a4

fix: adjust concurrency calculation in multithreaded utility

c34420a

Merge branch 'context-mt' of https://github.com/lamarrr/cudf into context-mt

6be76aa

Merge branch 'context-mt' into lto-ir-expt

d83a397

feat: enhance initialization flags and context management for JIT and RTC caches

8b58392

feat: add get_bool_env_or specialization for boolean environment variable handling

b2be86b

Merge branch 'context-mt' into lto-ir-expt

3ecf30d

feat: improve cache management and blob handling in RTC

925171d

fix: replace high_resolution_clock with steady_clock for more stable timing in JIT compilation

8fd14de

formatting

c9aedc9

feat: add new accessors and device span structures for improved LTO functionality

bae7fde

feature parity for column_device_view core types

e2915e7

refactoring

4480033

update

ba8f921

Merge remote-tracking branch 'upstream/main' into lto-ir-expt

f68d4f0

lamarrr added 2 commits May 13, 2026 21:11

fix coderabbit nits

eb3ef79

Merge branch 'main' into drop-jitify

1bbc0f8

bdice reviewed May 13, 2026

wence- approved these changes May 14, 2026

Comment thread cpp/librtcx/rtcx.hpp Outdated

bdice reviewed May 14, 2026

vyasr reviewed May 14, 2026

lamarrr added 2 commits May 15, 2026 00:55

revert: code rabbit suggestion

03ec682

coderabbitai Bot reviewed May 15, 2026

lamarrr added 5 commits May 15, 2026 01:10

Refactor embed_includes function to use SOURCE_DIRECTORY instead of COPY_DIRECTORY for clarity

8bde405

Remove unnecessary comment in initialize function documentation.
Change anonymous namespace to inline namespace detail in sha256.hpp for consistency.

Add template specialization for reflect function to handle various types

463d008

WAR: pre-commit namespace decl

2a26428

Fix formatting in generate_cxx_source_files_data function for improved readability

df6c97d

Merge branch 'main' into drop-jitify

2662703

vyasr approved these changes May 15, 2026

lamarrr added 6 commits May 18, 2026 13:47

replace error-prone parent scope variable assignment with target properties

ac0caaa

pre-commit: formatting

908cf50

fix: correct variable name for embed properties in JIT functions

48d1ec0

fix: update std::call_once to use value() for initialization flags

cbfc086

fix: change namespace declaration to inline for detail

56ac249

fix: update reflect_enum to use reflect instead of reflect_int

a138cb7

lamarrr mentioned this pull request May 21, 2026

[FEA] Expand JIT functionality in libcudf #18023

Open

lamarrr removed the Python (Affects Python cuDF API) and Java (Affects Java cuDF API) labels May 21, 2026

Merge branch 'main' into drop-jitify

a0ec040

bdice approved these changes May 21, 2026

Merge branch 'main' into drop-jitify

59593ed
