Skip to content

[FEA] ANSI SQL Operator JIT Support (2) : Implement Operator Library#22514

Open
lamarrr wants to merge 29 commits into
rapidsai:mainfrom
lamarrr:ansi-jit-2--operator-library
Open

[FEA] ANSI SQL Operator JIT Support (2) : Implement Operator Library#22514
lamarrr wants to merge 29 commits into
rapidsai:mainfrom
lamarrr:ansi-jit-2--operator-library

Conversation

@lamarrr
Copy link
Copy Markdown
Contributor

@lamarrr lamarrr commented May 14, 2026

Description

Split from #22224

Preceded by #22511

Story: #22598

This Pull request:

  • Implements error codes for row operators
  • Ports AST's operators to re-usable functions that can be used with JIT codegen
  • Adds new ANSI-compliant operators (ANSI_ *, ANSI_TRY_ *)
  • Adds the coalesce operator

Each operator is shaped as:

errc operator(T * ... outputs, T const * ... inputs);

Each operator returns an error code to signal to the kernel if a failure occurred.

Choice of operator shape:

  • Non-ambiguous type-resolution and non-implicit casting (e.g., floats casting to double, ints casting to floats)
  • Easy JIT codegen -- all outputs & input arguments are pointers, which makes PTX/LTO-IR codegen easy as they do not need to know the argument types to combine/fuse operators
  • There's no support for exceptions in device code, so we use error codes instead

Future Work

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@lamarrr lamarrr requested a review from a team as a code owner May 14, 2026 20:12
@lamarrr lamarrr requested review from devavret and mhaseeb123 May 14, 2026 20:12
@github-actions github-actions Bot added the libcudf Affects libcudf (C++/CUDA) code. label May 14, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a device-side operator library under cudf::ops: error codes, type utilities, promotion helpers, and many CUDA device operator headers implementing arithmetic, ANSI-arithmetic, bitwise, comparison, logic, math, casts, trigonometric, and null-handling functions with optional/null semantics.

Changes

Device Operator Library

Layer / File(s) Summary
Foundation: error codes, type aliases, utilities, and promotion
cpp/include/cudf/operators/error.hpp, cpp/include/cudf/operators/types.cuh, cpp/include/cudf/operators/detail/promote.cuh
New errc enum (OK, OVERFLOW, DIVISION_BY_ZERO), optional/nullopt/decimal/duration aliases, ipow10 helper, identity overloads, and compile-time integral promote specializations.
Arithmetic operators
cpp/include/cudf/operators/arithmetic.cuh
abs, add, div, floor_div, mod, pymod, mul, neg, sub, true_div for signed/unsigned integrals, floating-point, decimal<R>, and optional<T> with null propagation.
Comparison operators
cpp/include/cudf/operators/comparison.cuh
equal, not_equal, less, greater, less_equal, greater_equal, and null_equal with scalar and optional-aware overloads implementing null-aware semantics.
Bitwise operators
cpp/include/cudf/operators/bitwise.cuh
bit_and, bit_or, bit_xor, bit_invert, bit_shift_left, bit_shift_right for scalar and optional operands; optionals produce nullopt when inputs are missing.
Logic operators
cpp/include/cudf/operators/logic.cuh
null_logical_and, null_logical_or (three-valued semantics), logical_and, logical_or, logical_not, and if_else with scalar and optional overloads.
Null handling operators
cpp/include/cudf/operators/null_handling.cuh
is_null, nullify_if, coalesce (scalar and optional), and predicate conversion for optional booleans.
Casting operators
cpp/include/cudf/operators/casts.cuh
cast_to_b8, cast_to_i8i64, cast_to_u8u64, cast_to_f32/f64, cast_to_dec32/dec64/dec128, rescale, and internal decimal_cast, with optional overloads.
Math operators
cpp/include/cudf/operators/math.cuh
cbrt, ceil, exp, floor, log, pow, rint, sqrt for float/double and optional inputs; decimal ceil/floor that preserve scale.
Trigonometric operators
cpp/include/cudf/operators/trigonometric.cuh
Inverse and forward trig/hyperbolic functions (arccos, arcsin, arctan, arccosh, arcsinh, arctanh, cos, sin, tan, cosh, sinh, tanh) for float/double and optional overloads.
ANSI arithmetic operators
cpp/include/cudf/operators/ansi_arithmetic.cuh
ANSI-compliant ansi_add, ansi_sub, ansi_mul, ansi_div, ansi_mod, unary ansi_abs/ansi_neg, ansi_precision_check, and ansi_try_* wrappers to convert arithmetic errors into nullopt for optional outputs.

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Suggested labels:
    3 - Ready for Review

  • Suggested reviewers:

  • bdice

  • mhaseeb123

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title '[FEA] ANSI SQL Operator JIT Support (2) : Implement Operator Library' clearly describes the main objective of implementing an operator library for ANSI SQL JIT support, which aligns with the changeset's introduction of multiple operator headers (arithmetic, comparison, bitwise, etc.) in the cudf::ops namespace.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description clearly relates to the changeset, detailing the implementation of an operator library for ANSI SQL operator JIT support with specific design rationale.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (6)
cpp/include/cudf/ast/expressions.hpp (1)

517-527: 💤 Low value

Minor documentation style inconsistency.

The Doxygen comment at line 517 uses /// style without a proper @brief tag, while other class documentation in this file uses multi-line /** ... */ format with explicit tags. For consistency with the rest of the file (e.g., generic_scalar_device_view, literal), consider using the standard format.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/cudf/ast/expressions.hpp` around lines 517 - 527, Replace the
triple-slash doc comment for the predicate class with the project's standard
Doxygen block style: convert the leading `///` comment into a `/** ... */` block
containing an explicit `@brief` (and any other relevant tags such as `@internal`
or a short note that this is for internal filter operations) so it matches the
formatting used by other classes like `generic_scalar_device_view` and
`literal`; update the comment above the `class predicate : public expression`
declaration and keep the existing description and param notes (reference symbol:
predicate, constructor predicate(expression const& source)).
cpp/src/jit/row_ir.hpp (1)

9-11: ⚖️ Poor tradeoff

Consider moving factory includes to implementation file.

Including <cudf/column/column_factories.hpp> and <cudf/scalar/scalar_factories.hpp> in this header adds compilation overhead for all translation units that include row_ir.hpp. Since these are only used by instance_context::add_input(scalar const&) which is defined inline at lines 139-143, consider either:

  1. Moving that method implementation to the .cpp file, or
  2. Forward-declaring the necessary types if possible
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/jit/row_ir.hpp` around lines 9 - 11, The header currently pulls in
heavy factory headers for cudf used only by the inline method
instance_context::add_input(scalar const&); to reduce compile overhead, move the
implementation of instance_context::add_input(scalar const&) out of the header
into the corresponding .cpp and add `#include` <cudf/column/column_factories.hpp>
and `#include` <cudf/scalar/scalar_factories.hpp> to that .cpp (or alternatively
forward-declare the minimal cudf types if feasible); update the header to keep
only the declaration of instance_context::add_input(scalar const&) and ensure
the .cpp defines the method with the required includes.
cpp/src/stream_compaction/filter/filter.cu (1)

63-86: 💤 Low value

Missing CUDF_FUNC_RANGE() in public function.

The public cudf::filter function should include CUDF_FUNC_RANGE() before delegating to detail functions for proper NVTX profiling, consistent with filter_extended at line 99.

Suggested fix
 std::unique_ptr<table> filter(table_view const& predicate_table,
                               ast::expression const& predicate_expr,
                               table_view const& filter_table,
                               rmm::cuda_stream_view stream,
                               rmm::device_async_resource_ref mr)
 {
+  CUDF_FUNC_RANGE();
   auto args = cudf::detail::row_ir::ast_converter::filter(cudf::detail::row_ir::target::CUDA,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/stream_compaction/filter/filter.cu` around lines 63 - 86, The public
filter function is missing a CUDF_FUNC_RANGE() NVTX scope; add a
CUDF_FUNC_RANGE() call at the start of the filter(table_view const&
predicate_table, ast::expression const& predicate_expr, table_view const&
filter_table, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr)
function (same placement as in filter_extended) before performing the
ast_converter call and delegating to detail::filter so profiling/stack traces
include this public API call.
cpp/src/join/filter_join_indices_jit.cu (1)

51-54: 💤 Low value

Parameter should be std::span of const elements for const-correctness.

The table_sources parameter is not modified by these functions, so it should be std::span<std::optional<int32_t> const> rather than std::span<std::optional<int32_t>>.

Suggested fix
 jitify2::StringVec build_join_filter_template_params(
   std::span<transform_input const> inputs,
-  std::span<std::optional<int32_t>> table_sources,
+  std::span<std::optional<int32_t> const> table_sources,
   null_aware is_null_aware)
 jitify2::ConfiguredKernel build_join_filter_kernel(std::string const& predicate_code,
                                                    std::span<transform_input const> inputs,
-                                                   std::span<std::optional<int32_t>> table_sources,
+                                                   std::span<std::optional<int32_t> const> table_sources,
                                                    bool is_ptx,

Also applies to: 91-94

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/join/filter_join_indices_jit.cu` around lines 51 - 54, Change the
parameter type for table_sources to a span of const optionals to preserve
const-correctness: update the signature of build_join_filter_template_params to
take std::span<std::optional<int32_t> const> (instead of
std::span<std::optional<int32_t>>) and do the same for the other
overload/variant in this file that accepts table_sources; adjust any related
variable names or usages to compile with the new const-qualified span but do not
modify the function body logic.
cpp/include/cudf/operators/error.hpp (1)

13-13: ⚡ Quick win

Add Doxygen documentation for the public enum.

The errc enum is a public API type but lacks Doxygen documentation. As per coding guidelines, all public API entities in cpp/include/cudf/ should have Doxygen documentation tags.

📝 Suggested documentation
+/**
+ * `@brief` Error codes returned by device operators.
+ */
 enum errc : int { OK = 0, OVERFLOW = 1, DIVISION_BY_ZERO = 2 };

As per coding guidelines: "Add Doxygen documentation tags (@param, @return, @throw, @tparam, @brief) on all public API functions" (applies to public types as well).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/cudf/operators/error.hpp` at line 13, The public enum errc (OK,
OVERFLOW, DIVISION_BY_ZERO) is missing Doxygen documentation; add a Doxygen
comment block (/** ... */) immediately above the enum declaration that includes
a `@brief` describing the purpose of errc and short descriptions for each
enumerator (OK, OVERFLOW, DIVISION_BY_ZERO) so tools and users can understand
the API; ensure the block uses standard tags (e.g., `@brief` and either `@enum` or
inline descriptions for each enumerator) and follows the project's Doxygen style
for public API types.
cpp/include/cudf/operators/casts.cuh (1)

471-471: Address the TODO comment or track as an issue.

The TODO indicates that casting from int and float to decimal32 is not yet implemented.

Do you want me to open a new issue to track this missing functionality, or is it already being handled elsewhere?

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/cudf/operators/casts.cuh` at line 471, Resolve the TODO for
CAST_TO_DEC32: either implement casting logic from int and float to decimal32
inside the CAST_TO_DEC32 specialization in cudf/operators/casts.cuh (add
conversion rules, handle scale/precision, overflow/rounding behavior consistent
with other decimal casts, and add unit tests) or remove the TODO and open a
tracked issue referencing CAST_TO_DEC32 and this file describing the missing
int/float→decimal32 work and expected behavior; update the comment to point to
the issue if you choose the latter.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/include/cudf/operators/ansi_arithmetic.cuh`:
- Around line 622-636: Add a doc comment for the optional overload of ansi_abs
to match the other functions in the file: document the template<typename T>
__device__ inline errc ansi_abs(optional<T>* out, optional<T> const* a)
behavior, parameters (out and a), and return value (errc::OK or propagated
error) and note that nullopt is written to out when input has no value or an
error occurs; follow the same style/format used by the other optional overloads
in this file.

In `@cpp/include/cudf/operators/casts.cuh`:
- Around line 610-623: The optional overload of rescale is calling rescale(&r,
&a->value(), new_scale->value()) with new_scale->value() (int32_t const&) but
the called rescale expects an int32_t const*; fix by passing the address of the
scale value instead (i.e., pass &new_scale->value()) so the call matches the
signature; update the call in the rescale(optional<decimal<R>>*,
optional<decimal<R>> const*, optional<int32_t> const*) function that currently
invokes rescale(&r, &a->value(), new_scale->value()) to use &new_scale->value()
as the third argument.

In `@cpp/include/cudf/operators/logic.cuh`:
- Around line 244-255: The if_else(optional<T>* out, ...) implementation invokes
out->value() before out is engaged, causing undefined behavior; fix by creating
a local T result variable (e.g., T result), call the internal
if_else<T>(&result, &true_value->value(), &false_value->value(), &pred->value())
when the optionals are engaged, then assign the result into out via
out->emplace(result) or *out = result; keep the else branch setting *out =
nullopt and return errc::OK.

In `@cpp/include/cudf/operators/math.cuh`:
- Around line 96-109: The ceil and floor implementations use detail::ipow10 with
a->scale(), but fixed_point::scale() is negative for decimal places; update both
functions (template <typename R> __device__ inline errc ceil(decimal<R>* out,
decimal<R> const* a) and the corresponding floor implementation) to pass the
negated scale to ipow10 (i.e., use detail::ipow10(-a->scale())) so the
multiplier factor is computed correctly for positive exponents and
division/remainder logic remains valid for decimal<R>.

In `@cpp/include/cudf/operators/types.cuh`:
- Around line 41-57: Document that ipow10 requires a non-negative exponent by
adding a clear precondition comment above the template for ipow10(T exponent)
stating exponent must be >= 0 (or change the API to take an unsigned type). Also
consider adding a runtime check/assert (e.g., assert(exponent >= 0)) inside
ipow10 or switching the template parameter to an unsigned integer to enforce the
precondition; reference the ipow10 function to locate the change.

In `@cpp/src/join/jit/filter_join_kernel.cuh`:
- Around line 21-35: The Doxygen comment above the template function
filter_join_kernel is out of sync with its signature: replace or update the
`@param` entries that mention left_table, right_table, and scalars with correct
descriptions for the actual parameters (num_rows, left_indices, right_indices,
columns, predicate_results, and user_data) and keep any notes about the template
parameters (has_user_data, is_null_aware, Accessors) if relevant; specifically,
document num_rows as the number of rows to process, left_indices/right_indices
as device spans of indices, columns as the device column view array,
predicate_results as the output boolean array, and user_data as optional user
data for the predicate. Ensure parameter names in the comment exactly match the
function parameter names (filter_join_kernel signature) to avoid confusion.

---

Nitpick comments:
In `@cpp/include/cudf/ast/expressions.hpp`:
- Around line 517-527: Replace the triple-slash doc comment for the predicate
class with the project's standard Doxygen block style: convert the leading `///`
comment into a `/** ... */` block containing an explicit `@brief` (and any other
relevant tags such as `@internal` or a short note that this is for internal
filter operations) so it matches the formatting used by other classes like
`generic_scalar_device_view` and `literal`; update the comment above the `class
predicate : public expression` declaration and keep the existing description and
param notes (reference symbol: predicate, constructor predicate(expression
const& source)).

In `@cpp/include/cudf/operators/casts.cuh`:
- Line 471: Resolve the TODO for CAST_TO_DEC32: either implement casting logic
from int and float to decimal32 inside the CAST_TO_DEC32 specialization in
cudf/operators/casts.cuh (add conversion rules, handle scale/precision,
overflow/rounding behavior consistent with other decimal casts, and add unit
tests) or remove the TODO and open a tracked issue referencing CAST_TO_DEC32 and
this file describing the missing int/float→decimal32 work and expected behavior;
update the comment to point to the issue if you choose the latter.

In `@cpp/include/cudf/operators/error.hpp`:
- Line 13: The public enum errc (OK, OVERFLOW, DIVISION_BY_ZERO) is missing
Doxygen documentation; add a Doxygen comment block (/** ... */) immediately
above the enum declaration that includes a `@brief` describing the purpose of errc
and short descriptions for each enumerator (OK, OVERFLOW, DIVISION_BY_ZERO) so
tools and users can understand the API; ensure the block uses standard tags
(e.g., `@brief` and either `@enum` or inline descriptions for each enumerator) and
follows the project's Doxygen style for public API types.

In `@cpp/src/jit/row_ir.hpp`:
- Around line 9-11: The header currently pulls in heavy factory headers for cudf
used only by the inline method instance_context::add_input(scalar const&); to
reduce compile overhead, move the implementation of
instance_context::add_input(scalar const&) out of the header into the
corresponding .cpp and add `#include` <cudf/column/column_factories.hpp> and
`#include` <cudf/scalar/scalar_factories.hpp> to that .cpp (or alternatively
forward-declare the minimal cudf types if feasible); update the header to keep
only the declaration of instance_context::add_input(scalar const&) and ensure
the .cpp defines the method with the required includes.

In `@cpp/src/join/filter_join_indices_jit.cu`:
- Around line 51-54: Change the parameter type for table_sources to a span of
const optionals to preserve const-correctness: update the signature of
build_join_filter_template_params to take std::span<std::optional<int32_t>
const> (instead of std::span<std::optional<int32_t>>) and do the same for the
other overload/variant in this file that accepts table_sources; adjust any
related variable names or usages to compile with the new const-qualified span
but do not modify the function body logic.

In `@cpp/src/stream_compaction/filter/filter.cu`:
- Around line 63-86: The public filter function is missing a CUDF_FUNC_RANGE()
NVTX scope; add a CUDF_FUNC_RANGE() call at the start of the filter(table_view
const& predicate_table, ast::expression const& predicate_expr, table_view const&
filter_table, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr)
function (same placement as in filter_extended) before performing the
ast_converter call and delegating to detail::filter so profiling/stack traces
include this public API call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6cb67a67-631a-4678-88d6-2acbe2289ca7

📥 Commits

Reviewing files that changed from the base of the PR and between d7c7891 and b8872f8.

📒 Files selected for processing (25)
  • cpp/include/cudf/ast/detail/operator_functor.cuh
  • cpp/include/cudf/ast/expressions.hpp
  • cpp/include/cudf/operators/ansi_arithmetic.cuh
  • cpp/include/cudf/operators/arithmetic.cuh
  • cpp/include/cudf/operators/bitwise.cuh
  • cpp/include/cudf/operators/casts.cuh
  • cpp/include/cudf/operators/comparison.cuh
  • cpp/include/cudf/operators/detail/promote.cuh
  • cpp/include/cudf/operators/error.hpp
  • cpp/include/cudf/operators/logic.cuh
  • cpp/include/cudf/operators/math.cuh
  • cpp/include/cudf/operators/null_handling.cuh
  • cpp/include/cudf/operators/trigonometric.cuh
  • cpp/include/cudf/operators/types.cuh
  • cpp/src/ast/expressions.cpp
  • cpp/src/jit/column_accessor.cuh
  • cpp/src/jit/join_column_accessor.cuh
  • cpp/src/jit/row_ir.cpp
  • cpp/src/jit/row_ir.hpp
  • cpp/src/join/filter_join_indices_jit.cu
  • cpp/src/join/jit/filter_join_kernel.cu
  • cpp/src/join/jit/filter_join_kernel.cuh
  • cpp/src/stream_compaction/filter/filter.cu
  • cpp/src/transform/transform.cu
  • cpp/tests/jit/row_ir.cpp
💤 Files with no reviewable changes (1)
  • cpp/src/jit/join_column_accessor.cuh

Comment thread cpp/include/cudf/operators/ansi_arithmetic.cuh
Comment thread cpp/include/cudf/operators/casts.cuh
Comment thread cpp/include/cudf/operators/logic.cuh Outdated
Comment thread cpp/include/cudf/operators/math.cuh
Comment thread cpp/include/cudf/operators/types.cuh Outdated
Comment thread cpp/src/join/jit/filter_join_kernel.cuh
@lamarrr lamarrr added feature request New feature or request non-breaking Non-breaking change labels May 20, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/include/cudf/operators/ansi_arithmetic.cuh`:
- Around line 622-629: The Doxygen block for the ANSI absolute-value operator
uses backticks around commands which breaks parsing; edit the comment above the
function that "Computes absolute value for optional inputs with ANSI overflow
checks" (the docblock containing `@brief`, `@tparam`, `@param`, `@return`) and
remove all surrounding backticks so the tags are plain `@brief`, `@tparam`, `@param`,
and `@return` to match the rest of the file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b457725b-4b7e-40f5-9d0e-a14c47f85dd3

📥 Commits

Reviewing files that changed from the base of the PR and between b8872f8 and 655689a.

📒 Files selected for processing (4)
  • cpp/include/cudf/operators/ansi_arithmetic.cuh
  • cpp/include/cudf/operators/casts.cuh
  • cpp/include/cudf/operators/logic.cuh
  • cpp/include/cudf/operators/types.cuh

Comment thread cpp/include/cudf/operators/ansi_arithmetic.cuh
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/include/cudf/operators/ansi_arithmetic.cuh (2)

389-401: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return DIVISION_BY_ZERO for decimal zero divisors.

Line 400 currently reports b->value() == 0 as errc::OVERFLOW. That collapses two distinct failure modes into one and prevents callers from distinguishing divide-by-zero from numeric overflow.

Suggested fix
 /**
  * `@brief` Divides fixed-point decimal operands with ANSI checks.
  *
  * `@tparam` R Decimal representation type.
  * `@param` out Destination decimal value.
  * `@param` a Dividend.
  * `@param` b Divisor.
- * `@return` errc::OVERFLOW on overflow or zero divisor, else errc::OK.
+ * `@return` errc::DIVISION_BY_ZERO on zero divisor, errc::OVERFLOW on overflow, else errc::OK.
  */
 template <typename R>
 __device__ inline errc ansi_div(decimal<R>* out, decimal<R> const* a, decimal<R> const* b)
 {
-  if (numeric::division_overflow<R>(a->value(), b->value()) || b->value() == 0) {
-    return errc::OVERFLOW;
-  }
+  if (b->value() == 0) { return errc::DIVISION_BY_ZERO; }
+  if (numeric::division_overflow<R>(a->value(), b->value())) { return errc::OVERFLOW; }
 
   *out = decimal<R>{numeric::scaled_integer<R>{a->value() / b->value(),
                                                numeric::scale_type{a->scale() - b->scale()}}};
   return errc::OK;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 389 - 401, In
ansi_div (template<typename R>) change the error handling so that a zero divisor
returns errc::DIVISION_BY_ZERO and only
numeric::division_overflow<R>(a->value(), b->value()) returns errc::OVERFLOW;
specifically, check b->value() == 0 first and return errc::DIVISION_BY_ZERO,
otherwise check division_overflow and return errc::OVERFLOW, then errc::OK on
success.

404-405: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The decimal quotient is truncated before the result scale is applied.

Line 404 performs integer division on the raw representations, then Line 405 applies the scale difference. This truncates the quotient before scaling, losing precision. For decimal division to be correct, the dividend must first be scaled to account for the divisor's scale before performing integer division. For example, 1.0 / 0.03 should yield approximately 33.33, but the current implementation truncates 10 / 3 to 3, yielding an incorrect result.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 404 - 405, The
current decimal division computes numeric::scaled_integer<R>{a->value() /
b->value(), numeric::scale_type{a->scale() - b->scale()}}, which truncates
before applying scale; instead scale the dividend first by 10^(b->scale() -
a->scale()) (or the equivalent power-of-10 adjustment) and then perform integer
division so precision is preserved. Locate the division logic around
decimal<R>{numeric::scaled_integer<R>{...}} using symbols a->value(),
b->value(), a->scale(), b->scale(), and numeric::scaled_integer, compute a
scaled numerator (promoting to a wider integer type if needed to avoid
overflow), divide by b->value(), and construct the result with the correct
resulting scale (a->scale() - b->scale() or adjusted target scale). Ensure
overflow checks or promotion to a larger R (or intermediate type) when
multiplying by the scale factor.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@cpp/include/cudf/operators/ansi_arithmetic.cuh`:
- Around line 389-401: In ansi_div (template<typename R>) change the error
handling so that a zero divisor returns errc::DIVISION_BY_ZERO and only
numeric::division_overflow<R>(a->value(), b->value()) returns errc::OVERFLOW;
specifically, check b->value() == 0 first and return errc::DIVISION_BY_ZERO,
otherwise check division_overflow and return errc::OVERFLOW, then errc::OK on
success.
- Around line 404-405: The current decimal division computes
numeric::scaled_integer<R>{a->value() / b->value(),
numeric::scale_type{a->scale() - b->scale()}}, which truncates before applying
scale; instead scale the dividend first by 10^(b->scale() - a->scale()) (or the
equivalent power-of-10 adjustment) and then perform integer division so
precision is preserved. Locate the division logic around
decimal<R>{numeric::scaled_integer<R>{...}} using symbols a->value(),
b->value(), a->scale(), b->scale(), and numeric::scaled_integer, compute a
scaled numerator (promoting to a wider integer type if needed to avoid
overflow), divide by b->value(), and construct the result with the correct
resulting scale (a->scale() - b->scale() or adjusted target scale). Ensure
overflow checks or promotion to a larger R (or intermediate type) when
multiplying by the scale factor.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 387797a5-61d7-4a9c-a0c6-2a40b69319ec

📥 Commits

Reviewing files that changed from the base of the PR and between 655689a and 4775b56.

📒 Files selected for processing (1)
  • cpp/include/cudf/operators/ansi_arithmetic.cuh

@lamarrr lamarrr requested a review from bdice May 20, 2026 14:29
Comment thread cpp/include/cudf/operators/ansi_arithmetic.cuh Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
cpp/include/cudf/operators/ansi_arithmetic.cuh (4)

311-315: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return DIVISION_BY_ZERO for decimal zero divisors.

This path currently reports errc::OVERFLOW when b->value() == 0. Besides losing the right ANSI error, it also means ansi_mod(decimal<R>...) can never surface DIVISION_BY_ZERO even though its doc comment says it can.

Suggested fix
 __device__ inline errc ansi_div(decimal<R>* out, decimal<R> const* a, decimal<R> const* b)
 {
-  if (numeric::division_overflow<R>(a->value(), b->value()) || b->value() == 0) {
-    return errc::OVERFLOW;
-  }
+  if (b->value() == 0) { return errc::DIVISION_BY_ZERO; }
+  if (numeric::division_overflow<R>(a->value(), b->value())) { return errc::OVERFLOW; }
 
   *out = decimal<R>{numeric::scaled_integer<R>{a->value() / b->value(),
                                                numeric::scale_type{a->scale() - b->scale()}}};
   return errc::OK;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 311 - 315, The
current ansi_div(decimal<R>*) returns errc::OVERFLOW when b->value() == 0,
losing the correct ANSI error; update the logic in ansi_div so that if
b->value() == 0 it returns errc::DIVISION_BY_ZERO, otherwise check
numeric::division_overflow<R>(a->value(), b->value()) and return errc::OVERFLOW
only for an overflow; ensure the change also allows ansi_mod(decimal<R>...) to
surface errc::DIVISION_BY_ZERO as documented.

430-438: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Decimal modulus is wrong for negative non-integer quotients.

ansi_div(decimal<R>...) already truncates toward zero, so flooring that truncated quotient cannot implement the floor-based modulus here. For example, -1.23 % 0.10 should yield 0.07, but this code computes -0.03. Compute the quotient directly on aligned unscaled values with floor-division semantics instead of ansi_div(...) followed by floor(...).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 430 - 438, The
ansi_mod implementation incorrectly computes remainder by calling ansi_div then
floor; replace that approach in ansi_mod(decimal<R>* out, decimal<R> const* a,
decimal<R> const* b) by computing the quotient from the aligned unscaled integer
representations with floor-division semantics directly (handle sign of operands
so division floors toward -infinity for negative results), then multiply that
integer quotient by b and subtract from a to produce the remainder; ensure you
align scales of a and b before integer division, construct the decimal<R>
quotient appropriately, and remove the ansi_div(...) + floor(...) sequence so
negative non-integer quotients produce correct mod results.

631-636: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Precision check undercounts digits when scale() > 0.

This compares only the unscaled rep against 10^precision, which is not enough once the decimal has a positive scale. That makes values produced by ansi_div slip through incorrectly; e.g. a decimal representing 100 as rep=1, scale=2 passes precision=2 here even though it needs three digits.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 631 - 636, The
precision check must account for the decimal scale: instead of comparing the raw
rep against 10^precision, compute an effective exponent = precision - scale and
compare abs_value against 10^exponent (treat exponent <= 0 as only allowing
abs_value == 0). Update the check in the function around value/abs_value to
compute exponent = static_cast<R>(*precision) - static_cast<R>(*scale), use
detail::ipow10(exponent) when exponent > 0, and return errc::OVERFLOW if
abs_value >= ipow10(exponent) (or if exponent <= 0 and abs_value != 0). Keep the
existing numeric_limits<R>::min() overflow check intact.

565-570: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restrict this overload to signed integrals and add a floating-point overload.

cuda::std::is_signed_v<T> is true for float and double, so this template incorrectly accepts floating-point inputs. Line 569 then treats numeric_limits<float>::min() / double::min() as overflow even though those are the smallest positive normalized values, not the most negative. Floating-point types need a separate overload without the overflow check.

Suggested fix
 template <typename T>
-  requires(cuda::std::is_signed_v<T>)
+  requires(cuda::std::is_integral_v<T> && cuda::std::is_signed_v<T>)
 __device__ inline errc ansi_neg(T* out, T const* a)
 {
   if (*a == cuda::std::numeric_limits<T>::min()) { return errc::OVERFLOW; }
   *out = -(*a);
   return errc::OK;
 }
+
+template <typename T>
+  requires(cuda::std::is_floating_point_v<T>)
+__device__ inline errc ansi_neg(T* out, T const* a)
+{
+  *out = -(*a);
+  return errc::OK;
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 565 - 570, The
template for ansi_neg wrongly accepts floating-point types because it only
checks cuda::std::is_signed_v<T>; restrict the current overload to signed
integrals by using a trait that combines cuda::std::is_integral_v<T> &&
cuda::std::is_signed_v<T> (so ansi_neg<T> for integrals keeps the overflow check
against numeric_limits<T>::min()), and add a separate ansi_neg<T> overload
constrained to floating-point types (cuda::std::is_floating_point_v<T>) that
performs *out = -(*a) without the min-value overflow check; update the template
constraints on the existing ansi_neg and add the floating-point overload
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/include/cudf/operators/ansi_arithmetic.cuh`:
- Around line 274-280: The ansi_div device function currently maps any
cuda::div_overflow() failure to errc::DIVISION_BY_ZERO; change it to distinguish
zero divisors from true arithmetic overflow: first check if *b == 0 and return
errc::DIVISION_BY_ZERO, otherwise call cuda::div_overflow(r, *a, *b) and return
errc::OVERFLOW on true, then store *out = r; also update the function docstring
(the comment at the top of ansi_div) to mention both divide-by-zero and overflow
cases. Use the function name ansi_div and the call to cuda::div_overflow to
locate and modify the code.

---

Outside diff comments:
In `@cpp/include/cudf/operators/ansi_arithmetic.cuh`:
- Around line 311-315: The current ansi_div(decimal<R>*) returns errc::OVERFLOW
when b->value() == 0, losing the correct ANSI error; update the logic in
ansi_div so that if b->value() == 0 it returns errc::DIVISION_BY_ZERO, otherwise
check numeric::division_overflow<R>(a->value(), b->value()) and return
errc::OVERFLOW only for an overflow; ensure the change also allows
ansi_mod(decimal<R>...) to surface errc::DIVISION_BY_ZERO as documented.
- Around line 430-438: The ansi_mod implementation incorrectly computes
remainder by calling ansi_div then floor; replace that approach in
ansi_mod(decimal<R>* out, decimal<R> const* a, decimal<R> const* b) by computing
the quotient from the aligned unscaled integer representations with
floor-division semantics directly (handle sign of operands so division floors
toward -infinity for negative results), then multiply that integer quotient by b
and subtract from a to produce the remainder; ensure you align scales of a and b
before integer division, construct the decimal<R> quotient appropriately, and
remove the ansi_div(...) + floor(...) sequence so negative non-integer quotients
produce correct mod results.
- Around line 631-636: The precision check must account for the decimal scale:
instead of comparing the raw rep against 10^precision, compute an effective
exponent = precision - scale and compare abs_value against 10^exponent (treat
exponent <= 0 as only allowing abs_value == 0). Update the check in the function
around value/abs_value to compute exponent = static_cast<R>(*precision) -
static_cast<R>(*scale), use detail::ipow10(exponent) when exponent > 0, and
return errc::OVERFLOW if abs_value >= ipow10(exponent) (or if exponent <= 0 and
abs_value != 0). Keep the existing numeric_limits<R>::min() overflow check
intact.
- Around line 565-570: The template for ansi_neg wrongly accepts floating-point
types because it only checks cuda::std::is_signed_v<T>; restrict the current
overload to signed integrals by using a trait that combines
cuda::std::is_integral_v<T> && cuda::std::is_signed_v<T> (so ansi_neg<T> for
integrals keeps the overflow check against numeric_limits<T>::min()), and add a
separate ansi_neg<T> overload constrained to floating-point types
(cuda::std::is_floating_point_v<T>) that performs *out = -(*a) without the
min-value overflow check; update the template constraints on the existing
ansi_neg and add the floating-point overload accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e1a65ab7-304a-494a-b12f-dffdd56568f2

📥 Commits

Reviewing files that changed from the base of the PR and between 29bec09 and c7c8745.

📒 Files selected for processing (1)
  • cpp/include/cudf/operators/ansi_arithmetic.cuh

Comment thread cpp/include/cudf/operators/ansi_arithmetic.cuh
@PointKernel
Copy link
Copy Markdown
Member

Another high-level question we probably want to resolve is the naming. From the very beginning of the ANSI support effort in cudf (e.g. #19243), the original idea was to introduce something like SUM_ANSI as an aggregation kind. Through the issue/PR discussions, it was interesting to see the naming evolve from SUM_ANSI to SUM_WITH_OVERFLOW.

The main motivation behind this change was that ANSI is not particularly self-descriptive, and reviewers raised concerns that developers without a SQL background might not find the name meaningful. At the same time, as pointed out in #21676, ANSI semantics have effectively been the de facto standard for a long time, and the more we discussed it, the less awkward the naming felt in libcudf.

So from my perspective, the ANSI naming is not necessarily bad, but I still slightly prefer something like _OVERFLOW over _ANSI because it is more explicit about the behavior. I'm mostly just sharing my observations here, and hopefully we can converge on a naming scheme that everyone is happy with.

Copy link
Copy Markdown
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick pass on the code. @lamarrr Do you think we could completely drop our custom logic and raw CUDA built-ins in favor of standard CCCL utilities? I'm thinking of things like switching from ::sin to cuda::std::sin, or using cuda::std::bit_and instead of raw & operators to clean it up.

Comment thread cpp/include/cudf/operators/types.cuh Outdated
Comment thread cpp/include/cudf/operators/types.cuh Outdated
Comment thread cpp/include/cudf/operators/types.cuh Outdated
Comment thread cpp/include/cudf/operators/error.hpp Outdated
Comment thread cpp/include/cudf/operators/error.hpp Outdated
Comment thread cpp/include/cudf/operators/types.cuh Outdated
Comment thread cpp/include/cudf/operators/types.cuh Outdated
@lamarrr
Copy link
Copy Markdown
Contributor Author

lamarrr commented May 22, 2026

The main motivation behind this change was that ANSI is not particularly self-descriptive, and reviewers raised concerns that developers without a SQL background might not find the name meaningful.

Agreed, I had the impression ANSI_* precisely communicates the expected error handling behaviour, just like how FLOATS can be described using ISO specifications. This removes behavioural ambiguity and sets expectations for users.
cc: @igorpeshansky @revans2

So from my perspective, the ANSI naming is not necessarily bad, but I still slightly prefer something like _OVERFLOW

_OVERFLOW is probably not a good name here because users would think it returns an overflow flag and the value, which isn't the case. The returned error in the operator is used for propagating errors to exceptions.

Alternatively, we could use checked_* prefixes just as seen in Velox, i.e. checked_add, checked_sum, etc.

…:optional

- Updated functions in null_handling.cuh and trigonometric.cuh to replace custom optional handling with cuda::std::optional.
- Removed the types.cuh file as it was no longer needed after the refactor.
- Changed return types of several functions from errc to void, simplifying the interface.
- Ensured null handling is consistent across all operations by using cuda::std::nullopt.
@lamarrr
Copy link
Copy Markdown
Contributor Author

lamarrr commented May 22, 2026

Just a quick pass on the code. @lamarrr Do you think we could completely drop our custom logic and raw CUDA built-ins in favor of standard CCCL utilities? I'm thinking of things like switching from ::sin to cuda::std::sin, or using cuda::std::bit_and instead of raw & operators to clean it up.

Yes, but it doesn't provide any perf or correctness advantage, it is only overhead. Especially for bit_and.

@PointKernel
Copy link
Copy Markdown
Member

Just a quick pass on the code. @lamarrr Do you think we could completely and raw CUDA built-ins in favor of standard CCCL utilities? I'm thinking of things like switching from ::sin to cuda::std::sin, or using cuda::std::bit_and instead of raw & operators to clean it up.

Yes, but it doesn't provide any perf or correctness advantage, it is only overhead. Especially for bit_and.

Right, bit_and is a bad example.

What I mean by “drop our custom logic” is avoiding reimplementation of existing functionality, such as ipow10 or device overflow handling. Since this PR introduces many fundamental operators at once, we'd like to make sure every new piece of logic is necessary and that we are not duplicating existing implementations in either cuDF or CCCL. There may or may not still be cases like this, but with ~3.5k lines added, they are not obvious for reviewers to spot, so it would be great if you could double-check.

Using CCCL utilities instead of CUDA C-style functions is meant to align with C++ best practices: avoid calling C-style functions directly in C++ code when a suitable C++ utility exists e.g. https://godbolt.org/z/Pc7YYGe36 the underlying SASS are the same.

@lamarrr
Copy link
Copy Markdown
Contributor Author

lamarrr commented May 22, 2026

I'm on board with you; we should re-use existing functionality where possible. I had no idea ipow10 existed (I knew it existed in CCCL as a detail:: function). The overflow check APIs are new (CCCL 3.2), so I wasn't aware they existed either.
By overhead, I mean compilation time overhead: more work for the compiler and more header code to inline and process when the CUDA builtin will do, they are CUDA C++ functions and always available, care has to be taken, given that it will be used for JIT compilation.
Yes, some of the operators are duplicated, especially the ones in AST. I've added deduplication as future work in #22598.
Deduplication is outside the scope of this PR and will explode the diff.
This PR is necessary to ensure the JIT API has a consistently-shaped set of functions it can dispatch to without complicated logic and to allow it to extend the operators beyond the capability of AST.
We initially decided to bootstrap JIT by using the existing AST operators but we need to separate them so JIT can have new operators.

@lamarrr
Copy link
Copy Markdown
Contributor Author

lamarrr commented May 22, 2026

/ok to test 0393ffc

@lamarrr lamarrr requested review from igorpeshansky and revans2 May 22, 2026 18:54
@PointKernel
Copy link
Copy Markdown
Member

The ask was for a broader sweep over the rest of the newly added code. I looked closely at three headers, and two of them contained logic that duplicates existing utilities in cuDF or CCCL, which is why I wanted to check whether similar cases exist elsewhere in the PR.

Existing AST duplication is reasonable to defer to #22598, but where there is a clean drop-in replacement, e.g. cuda::ipow, cuda::add_overflow, numeric::detail::ipow you did, folding it into this PR seems relatively cheap and helps avoid growing the deduplication backlog later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants