[FEA] ANSI SQL Operator JIT Support (2) : Implement Operator Library#22514
[FEA] ANSI SQL Operator JIT Support (2) : Implement Operator Library#22514lamarrr wants to merge 29 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a device-side operator library under cudf::ops: error codes, type utilities, promotion helpers, and many CUDA device operator headers implementing arithmetic, ANSI-arithmetic, bitwise, comparison, logic, math, casts, trigonometric, and null-handling functions with optional/null semantics. ChangesDevice Operator Library
🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (6)
cpp/include/cudf/ast/expressions.hpp (1)
517-527: 💤 Low valueMinor documentation style inconsistency.
The Doxygen comment at line 517 uses
///style without a proper@brieftag, while other class documentation in this file uses multi-line/** ... */format with explicit tags. For consistency with the rest of the file (e.g.,generic_scalar_device_view,literal), consider using the standard format.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/cudf/ast/expressions.hpp` around lines 517 - 527, Replace the triple-slash doc comment for the predicate class with the project's standard Doxygen block style: convert the leading `///` comment into a `/** ... */` block containing an explicit `@brief` (and any other relevant tags such as `@internal` or a short note that this is for internal filter operations) so it matches the formatting used by other classes like `generic_scalar_device_view` and `literal`; update the comment above the `class predicate : public expression` declaration and keep the existing description and param notes (reference symbol: predicate, constructor predicate(expression const& source)).cpp/src/jit/row_ir.hpp (1)
9-11: ⚖️ Poor tradeoffConsider moving factory includes to implementation file.
Including
<cudf/column/column_factories.hpp>and<cudf/scalar/scalar_factories.hpp>in this header adds compilation overhead for all translation units that includerow_ir.hpp. Since these are only used byinstance_context::add_input(scalar const&)which is defined inline at lines 139-143, consider either:
- Moving that method implementation to the .cpp file, or
- Forward-declaring the necessary types if possible
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/jit/row_ir.hpp` around lines 9 - 11, The header currently pulls in heavy factory headers for cudf used only by the inline method instance_context::add_input(scalar const&); to reduce compile overhead, move the implementation of instance_context::add_input(scalar const&) out of the header into the corresponding .cpp and add `#include` <cudf/column/column_factories.hpp> and `#include` <cudf/scalar/scalar_factories.hpp> to that .cpp (or alternatively forward-declare the minimal cudf types if feasible); update the header to keep only the declaration of instance_context::add_input(scalar const&) and ensure the .cpp defines the method with the required includes.cpp/src/stream_compaction/filter/filter.cu (1)
63-86: 💤 Low valueMissing
CUDF_FUNC_RANGE()in public function.The public
cudf::filterfunction should includeCUDF_FUNC_RANGE()before delegating to detail functions for proper NVTX profiling, consistent withfilter_extendedat line 99.Suggested fix
std::unique_ptr<table> filter(table_view const& predicate_table, ast::expression const& predicate_expr, table_view const& filter_table, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) { + CUDF_FUNC_RANGE(); auto args = cudf::detail::row_ir::ast_converter::filter(cudf::detail::row_ir::target::CUDA,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/stream_compaction/filter/filter.cu` around lines 63 - 86, The public filter function is missing a CUDF_FUNC_RANGE() NVTX scope; add a CUDF_FUNC_RANGE() call at the start of the filter(table_view const& predicate_table, ast::expression const& predicate_expr, table_view const& filter_table, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) function (same placement as in filter_extended) before performing the ast_converter call and delegating to detail::filter so profiling/stack traces include this public API call.cpp/src/join/filter_join_indices_jit.cu (1)
51-54: 💤 Low valueParameter should be
std::spanof const elements for const-correctness.The
table_sourcesparameter is not modified by these functions, so it should bestd::span<std::optional<int32_t> const>rather thanstd::span<std::optional<int32_t>>.Suggested fix
jitify2::StringVec build_join_filter_template_params( std::span<transform_input const> inputs, - std::span<std::optional<int32_t>> table_sources, + std::span<std::optional<int32_t> const> table_sources, null_aware is_null_aware)jitify2::ConfiguredKernel build_join_filter_kernel(std::string const& predicate_code, std::span<transform_input const> inputs, - std::span<std::optional<int32_t>> table_sources, + std::span<std::optional<int32_t> const> table_sources, bool is_ptx,Also applies to: 91-94
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/join/filter_join_indices_jit.cu` around lines 51 - 54, Change the parameter type for table_sources to a span of const optionals to preserve const-correctness: update the signature of build_join_filter_template_params to take std::span<std::optional<int32_t> const> (instead of std::span<std::optional<int32_t>>) and do the same for the other overload/variant in this file that accepts table_sources; adjust any related variable names or usages to compile with the new const-qualified span but do not modify the function body logic.cpp/include/cudf/operators/error.hpp (1)
13-13: ⚡ Quick winAdd Doxygen documentation for the public enum.
The
errcenum is a public API type but lacks Doxygen documentation. As per coding guidelines, all public API entities incpp/include/cudf/should have Doxygen documentation tags.📝 Suggested documentation
+/** + * `@brief` Error codes returned by device operators. + */ enum errc : int { OK = 0, OVERFLOW = 1, DIVISION_BY_ZERO = 2 };As per coding guidelines: "Add Doxygen documentation tags (
@param,@return,@throw,@tparam,@brief) on all public API functions" (applies to public types as well).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/cudf/operators/error.hpp` at line 13, The public enum errc (OK, OVERFLOW, DIVISION_BY_ZERO) is missing Doxygen documentation; add a Doxygen comment block (/** ... */) immediately above the enum declaration that includes a `@brief` describing the purpose of errc and short descriptions for each enumerator (OK, OVERFLOW, DIVISION_BY_ZERO) so tools and users can understand the API; ensure the block uses standard tags (e.g., `@brief` and either `@enum` or inline descriptions for each enumerator) and follows the project's Doxygen style for public API types.cpp/include/cudf/operators/casts.cuh (1)
471-471: Address the TODO comment or track as an issue.The TODO indicates that casting from int and float to decimal32 is not yet implemented.
Do you want me to open a new issue to track this missing functionality, or is it already being handled elsewhere?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/cudf/operators/casts.cuh` at line 471, Resolve the TODO for CAST_TO_DEC32: either implement casting logic from int and float to decimal32 inside the CAST_TO_DEC32 specialization in cudf/operators/casts.cuh (add conversion rules, handle scale/precision, overflow/rounding behavior consistent with other decimal casts, and add unit tests) or remove the TODO and open a tracked issue referencing CAST_TO_DEC32 and this file describing the missing int/float→decimal32 work and expected behavior; update the comment to point to the issue if you choose the latter.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/include/cudf/operators/ansi_arithmetic.cuh`:
- Around line 622-636: Add a doc comment for the optional overload of ansi_abs
to match the other functions in the file: document the template<typename T>
__device__ inline errc ansi_abs(optional<T>* out, optional<T> const* a)
behavior, parameters (out and a), and return value (errc::OK or propagated
error) and note that nullopt is written to out when input has no value or an
error occurs; follow the same style/format used by the other optional overloads
in this file.
In `@cpp/include/cudf/operators/casts.cuh`:
- Around line 610-623: The optional overload of rescale is calling rescale(&r,
&a->value(), new_scale->value()) with new_scale->value() (int32_t const&) but
the called rescale expects an int32_t const*; fix by passing the address of the
scale value instead (i.e., pass &new_scale->value()) so the call matches the
signature; update the call in the rescale(optional<decimal<R>>*,
optional<decimal<R>> const*, optional<int32_t> const*) function that currently
invokes rescale(&r, &a->value(), new_scale->value()) to use &new_scale->value()
as the third argument.
In `@cpp/include/cudf/operators/logic.cuh`:
- Around line 244-255: The if_else(optional<T>* out, ...) implementation invokes
out->value() before out is engaged, causing undefined behavior; fix by creating
a local T result variable (e.g., T result), call the internal
if_else<T>(&result, &true_value->value(), &false_value->value(), &pred->value())
when the optionals are engaged, then assign the result into out via
out->emplace(result) or *out = result; keep the else branch setting *out =
nullopt and return errc::OK.
In `@cpp/include/cudf/operators/math.cuh`:
- Around line 96-109: The ceil and floor implementations use detail::ipow10 with
a->scale(), but fixed_point::scale() is negative for decimal places; update both
functions (template <typename R> __device__ inline errc ceil(decimal<R>* out,
decimal<R> const* a) and the corresponding floor implementation) to pass the
negated scale to ipow10 (i.e., use detail::ipow10(-a->scale())) so the
multiplier factor is computed correctly for positive exponents and
division/remainder logic remains valid for decimal<R>.
In `@cpp/include/cudf/operators/types.cuh`:
- Around line 41-57: Document that ipow10 requires a non-negative exponent by
adding a clear precondition comment above the template for ipow10(T exponent)
stating exponent must be >= 0 (or change the API to take an unsigned type). Also
consider adding a runtime check/assert (e.g., assert(exponent >= 0)) inside
ipow10 or switching the template parameter to an unsigned integer to enforce the
precondition; reference the ipow10 function to locate the change.
In `@cpp/src/join/jit/filter_join_kernel.cuh`:
- Around line 21-35: The Doxygen comment above the template function
filter_join_kernel is out of sync with its signature: replace or update the
`@param` entries that mention left_table, right_table, and scalars with correct
descriptions for the actual parameters (num_rows, left_indices, right_indices,
columns, predicate_results, and user_data) and keep any notes about the template
parameters (has_user_data, is_null_aware, Accessors) if relevant; specifically,
document num_rows as the number of rows to process, left_indices/right_indices
as device spans of indices, columns as the device column view array,
predicate_results as the output boolean array, and user_data as optional user
data for the predicate. Ensure parameter names in the comment exactly match the
function parameter names (filter_join_kernel signature) to avoid confusion.
---
Nitpick comments:
In `@cpp/include/cudf/ast/expressions.hpp`:
- Around line 517-527: Replace the triple-slash doc comment for the predicate
class with the project's standard Doxygen block style: convert the leading `///`
comment into a `/** ... */` block containing an explicit `@brief` (and any other
relevant tags such as `@internal` or a short note that this is for internal
filter operations) so it matches the formatting used by other classes like
`generic_scalar_device_view` and `literal`; update the comment above the `class
predicate : public expression` declaration and keep the existing description and
param notes (reference symbol: predicate, constructor predicate(expression
const& source)).
In `@cpp/include/cudf/operators/casts.cuh`:
- Line 471: Resolve the TODO for CAST_TO_DEC32: either implement casting logic
from int and float to decimal32 inside the CAST_TO_DEC32 specialization in
cudf/operators/casts.cuh (add conversion rules, handle scale/precision,
overflow/rounding behavior consistent with other decimal casts, and add unit
tests) or remove the TODO and open a tracked issue referencing CAST_TO_DEC32 and
this file describing the missing int/float→decimal32 work and expected behavior;
update the comment to point to the issue if you choose the latter.
In `@cpp/include/cudf/operators/error.hpp`:
- Line 13: The public enum errc (OK, OVERFLOW, DIVISION_BY_ZERO) is missing
Doxygen documentation; add a Doxygen comment block (/** ... */) immediately
above the enum declaration that includes a `@brief` describing the purpose of errc
and short descriptions for each enumerator (OK, OVERFLOW, DIVISION_BY_ZERO) so
tools and users can understand the API; ensure the block uses standard tags
(e.g., `@brief` and either `@enum` or inline descriptions for each enumerator) and
follows the project's Doxygen style for public API types.
In `@cpp/src/jit/row_ir.hpp`:
- Around line 9-11: The header currently pulls in heavy factory headers for cudf
used only by the inline method instance_context::add_input(scalar const&); to
reduce compile overhead, move the implementation of
instance_context::add_input(scalar const&) out of the header into the
corresponding .cpp and add `#include` <cudf/column/column_factories.hpp> and
`#include` <cudf/scalar/scalar_factories.hpp> to that .cpp (or alternatively
forward-declare the minimal cudf types if feasible); update the header to keep
only the declaration of instance_context::add_input(scalar const&) and ensure
the .cpp defines the method with the required includes.
In `@cpp/src/join/filter_join_indices_jit.cu`:
- Around line 51-54: Change the parameter type for table_sources to a span of
const optionals to preserve const-correctness: update the signature of
build_join_filter_template_params to take std::span<std::optional<int32_t>
const> (instead of std::span<std::optional<int32_t>>) and do the same for the
other overload/variant in this file that accepts table_sources; adjust any
related variable names or usages to compile with the new const-qualified span
but do not modify the function body logic.
In `@cpp/src/stream_compaction/filter/filter.cu`:
- Around line 63-86: The public filter function is missing a CUDF_FUNC_RANGE()
NVTX scope; add a CUDF_FUNC_RANGE() call at the start of the filter(table_view
const& predicate_table, ast::expression const& predicate_expr, table_view const&
filter_table, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr)
function (same placement as in filter_extended) before performing the
ast_converter call and delegating to detail::filter so profiling/stack traces
include this public API call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6cb67a67-631a-4678-88d6-2acbe2289ca7
📒 Files selected for processing (25)
cpp/include/cudf/ast/detail/operator_functor.cuhcpp/include/cudf/ast/expressions.hppcpp/include/cudf/operators/ansi_arithmetic.cuhcpp/include/cudf/operators/arithmetic.cuhcpp/include/cudf/operators/bitwise.cuhcpp/include/cudf/operators/casts.cuhcpp/include/cudf/operators/comparison.cuhcpp/include/cudf/operators/detail/promote.cuhcpp/include/cudf/operators/error.hppcpp/include/cudf/operators/logic.cuhcpp/include/cudf/operators/math.cuhcpp/include/cudf/operators/null_handling.cuhcpp/include/cudf/operators/trigonometric.cuhcpp/include/cudf/operators/types.cuhcpp/src/ast/expressions.cppcpp/src/jit/column_accessor.cuhcpp/src/jit/join_column_accessor.cuhcpp/src/jit/row_ir.cppcpp/src/jit/row_ir.hppcpp/src/join/filter_join_indices_jit.cucpp/src/join/jit/filter_join_kernel.cucpp/src/join/jit/filter_join_kernel.cuhcpp/src/stream_compaction/filter/filter.cucpp/src/transform/transform.cucpp/tests/jit/row_ir.cpp
💤 Files with no reviewable changes (1)
- cpp/src/jit/join_column_accessor.cuh
…rrr/cudf into ansi-jit-1--refactor-row-ir
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/include/cudf/operators/ansi_arithmetic.cuh`:
- Around line 622-629: The Doxygen block for the ANSI absolute-value operator
uses backticks around commands which breaks parsing; edit the comment above the
function that "Computes absolute value for optional inputs with ANSI overflow
checks" (the docblock containing `@brief`, `@tparam`, `@param`, `@return`) and
remove all surrounding backticks so the tags are plain `@brief`, `@tparam`, `@param`,
and `@return` to match the rest of the file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b457725b-4b7e-40f5-9d0e-a14c47f85dd3
📒 Files selected for processing (4)
cpp/include/cudf/operators/ansi_arithmetic.cuhcpp/include/cudf/operators/casts.cuhcpp/include/cudf/operators/logic.cuhcpp/include/cudf/operators/types.cuh
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/include/cudf/operators/ansi_arithmetic.cuh (2)
389-401:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn
DIVISION_BY_ZEROfor decimal zero divisors.Line 400 currently reports
b->value() == 0aserrc::OVERFLOW. That collapses two distinct failure modes into one and prevents callers from distinguishing divide-by-zero from numeric overflow.Suggested fix
/** * `@brief` Divides fixed-point decimal operands with ANSI checks. * * `@tparam` R Decimal representation type. * `@param` out Destination decimal value. * `@param` a Dividend. * `@param` b Divisor. - * `@return` errc::OVERFLOW on overflow or zero divisor, else errc::OK. + * `@return` errc::DIVISION_BY_ZERO on zero divisor, errc::OVERFLOW on overflow, else errc::OK. */ template <typename R> __device__ inline errc ansi_div(decimal<R>* out, decimal<R> const* a, decimal<R> const* b) { - if (numeric::division_overflow<R>(a->value(), b->value()) || b->value() == 0) { - return errc::OVERFLOW; - } + if (b->value() == 0) { return errc::DIVISION_BY_ZERO; } + if (numeric::division_overflow<R>(a->value(), b->value())) { return errc::OVERFLOW; } *out = decimal<R>{numeric::scaled_integer<R>{a->value() / b->value(), numeric::scale_type{a->scale() - b->scale()}}}; return errc::OK; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 389 - 401, In ansi_div (template<typename R>) change the error handling so that a zero divisor returns errc::DIVISION_BY_ZERO and only numeric::division_overflow<R>(a->value(), b->value()) returns errc::OVERFLOW; specifically, check b->value() == 0 first and return errc::DIVISION_BY_ZERO, otherwise check division_overflow and return errc::OVERFLOW, then errc::OK on success.
404-405:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftThe decimal quotient is truncated before the result scale is applied.
Line 404 performs integer division on the raw representations, then Line 405 applies the scale difference. This truncates the quotient before scaling, losing precision. For decimal division to be correct, the dividend must first be scaled to account for the divisor's scale before performing integer division. For example,
1.0 / 0.03should yield approximately33.33, but the current implementation truncates10 / 3to3, yielding an incorrect result.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 404 - 405, The current decimal division computes numeric::scaled_integer<R>{a->value() / b->value(), numeric::scale_type{a->scale() - b->scale()}}, which truncates before applying scale; instead scale the dividend first by 10^(b->scale() - a->scale()) (or the equivalent power-of-10 adjustment) and then perform integer division so precision is preserved. Locate the division logic around decimal<R>{numeric::scaled_integer<R>{...}} using symbols a->value(), b->value(), a->scale(), b->scale(), and numeric::scaled_integer, compute a scaled numerator (promoting to a wider integer type if needed to avoid overflow), divide by b->value(), and construct the result with the correct resulting scale (a->scale() - b->scale() or adjusted target scale). Ensure overflow checks or promotion to a larger R (or intermediate type) when multiplying by the scale factor.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@cpp/include/cudf/operators/ansi_arithmetic.cuh`:
- Around line 389-401: In ansi_div (template<typename R>) change the error
handling so that a zero divisor returns errc::DIVISION_BY_ZERO and only
numeric::division_overflow<R>(a->value(), b->value()) returns errc::OVERFLOW;
specifically, check b->value() == 0 first and return errc::DIVISION_BY_ZERO,
otherwise check division_overflow and return errc::OVERFLOW, then errc::OK on
success.
- Around line 404-405: The current decimal division computes
numeric::scaled_integer<R>{a->value() / b->value(),
numeric::scale_type{a->scale() - b->scale()}}, which truncates before applying
scale; instead scale the dividend first by 10^(b->scale() - a->scale()) (or the
equivalent power-of-10 adjustment) and then perform integer division so
precision is preserved. Locate the division logic around
decimal<R>{numeric::scaled_integer<R>{...}} using symbols a->value(),
b->value(), a->scale(), b->scale(), and numeric::scaled_integer, compute a
scaled numerator (promoting to a wider integer type if needed to avoid
overflow), divide by b->value(), and construct the result with the correct
resulting scale (a->scale() - b->scale() or adjusted target scale). Ensure
overflow checks or promotion to a larger R (or intermediate type) when
multiplying by the scale factor.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 387797a5-61d7-4a9c-a0c6-2a40b69319ec
📒 Files selected for processing (1)
cpp/include/cudf/operators/ansi_arithmetic.cuh
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
cpp/include/cudf/operators/ansi_arithmetic.cuh (4)
311-315:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn
DIVISION_BY_ZEROfor decimal zero divisors.This path currently reports
errc::OVERFLOWwhenb->value() == 0. Besides losing the right ANSI error, it also meansansi_mod(decimal<R>...)can never surfaceDIVISION_BY_ZEROeven though its doc comment says it can.Suggested fix
__device__ inline errc ansi_div(decimal<R>* out, decimal<R> const* a, decimal<R> const* b) { - if (numeric::division_overflow<R>(a->value(), b->value()) || b->value() == 0) { - return errc::OVERFLOW; - } + if (b->value() == 0) { return errc::DIVISION_BY_ZERO; } + if (numeric::division_overflow<R>(a->value(), b->value())) { return errc::OVERFLOW; } *out = decimal<R>{numeric::scaled_integer<R>{a->value() / b->value(), numeric::scale_type{a->scale() - b->scale()}}}; return errc::OK; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 311 - 315, The current ansi_div(decimal<R>*) returns errc::OVERFLOW when b->value() == 0, losing the correct ANSI error; update the logic in ansi_div so that if b->value() == 0 it returns errc::DIVISION_BY_ZERO, otherwise check numeric::division_overflow<R>(a->value(), b->value()) and return errc::OVERFLOW only for an overflow; ensure the change also allows ansi_mod(decimal<R>...) to surface errc::DIVISION_BY_ZERO as documented.
430-438:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftDecimal modulus is wrong for negative non-integer quotients.
ansi_div(decimal<R>...)already truncates toward zero, so flooring that truncated quotient cannot implement the floor-based modulus here. For example,-1.23 % 0.10should yield0.07, but this code computes-0.03. Compute the quotient directly on aligned unscaled values with floor-division semantics instead ofansi_div(...)followed byfloor(...).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 430 - 438, The ansi_mod implementation incorrectly computes remainder by calling ansi_div then floor; replace that approach in ansi_mod(decimal<R>* out, decimal<R> const* a, decimal<R> const* b) by computing the quotient from the aligned unscaled integer representations with floor-division semantics directly (handle sign of operands so division floors toward -infinity for negative results), then multiply that integer quotient by b and subtract from a to produce the remainder; ensure you align scales of a and b before integer division, construct the decimal<R> quotient appropriately, and remove the ansi_div(...) + floor(...) sequence so negative non-integer quotients produce correct mod results.
631-636:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrecision check undercounts digits when
scale() > 0.This compares only the unscaled rep against
10^precision, which is not enough once the decimal has a positive scale. That makes values produced byansi_divslip through incorrectly; e.g. a decimal representing100asrep=1, scale=2passesprecision=2here even though it needs three digits.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 631 - 636, The precision check must account for the decimal scale: instead of comparing the raw rep against 10^precision, compute an effective exponent = precision - scale and compare abs_value against 10^exponent (treat exponent <= 0 as only allowing abs_value == 0). Update the check in the function around value/abs_value to compute exponent = static_cast<R>(*precision) - static_cast<R>(*scale), use detail::ipow10(exponent) when exponent > 0, and return errc::OVERFLOW if abs_value >= ipow10(exponent) (or if exponent <= 0 and abs_value != 0). Keep the existing numeric_limits<R>::min() overflow check intact.
565-570:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestrict this overload to signed integrals and add a floating-point overload.
cuda::std::is_signed_v<T>istrueforfloatanddouble, so this template incorrectly accepts floating-point inputs. Line 569 then treatsnumeric_limits<float>::min()/double::min()as overflow even though those are the smallest positive normalized values, not the most negative. Floating-point types need a separate overload without the overflow check.Suggested fix
template <typename T> - requires(cuda::std::is_signed_v<T>) + requires(cuda::std::is_integral_v<T> && cuda::std::is_signed_v<T>) __device__ inline errc ansi_neg(T* out, T const* a) { if (*a == cuda::std::numeric_limits<T>::min()) { return errc::OVERFLOW; } *out = -(*a); return errc::OK; } + +template <typename T> + requires(cuda::std::is_floating_point_v<T>) +__device__ inline errc ansi_neg(T* out, T const* a) +{ + *out = -(*a); + return errc::OK; +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/include/cudf/operators/ansi_arithmetic.cuh` around lines 565 - 570, The template for ansi_neg wrongly accepts floating-point types because it only checks cuda::std::is_signed_v<T>; restrict the current overload to signed integrals by using a trait that combines cuda::std::is_integral_v<T> && cuda::std::is_signed_v<T> (so ansi_neg<T> for integrals keeps the overflow check against numeric_limits<T>::min()), and add a separate ansi_neg<T> overload constrained to floating-point types (cuda::std::is_floating_point_v<T>) that performs *out = -(*a) without the min-value overflow check; update the template constraints on the existing ansi_neg and add the floating-point overload accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/include/cudf/operators/ansi_arithmetic.cuh`:
- Around line 274-280: The ansi_div device function currently maps any
cuda::div_overflow() failure to errc::DIVISION_BY_ZERO; change it to distinguish
zero divisors from true arithmetic overflow: first check if *b == 0 and return
errc::DIVISION_BY_ZERO, otherwise call cuda::div_overflow(r, *a, *b) and return
errc::OVERFLOW on true, then store *out = r; also update the function docstring
(the comment at the top of ansi_div) to mention both divide-by-zero and overflow
cases. Use the function name ansi_div and the call to cuda::div_overflow to
locate and modify the code.
---
Outside diff comments:
In `@cpp/include/cudf/operators/ansi_arithmetic.cuh`:
- Around line 311-315: The current ansi_div(decimal<R>*) returns errc::OVERFLOW
when b->value() == 0, losing the correct ANSI error; update the logic in
ansi_div so that if b->value() == 0 it returns errc::DIVISION_BY_ZERO, otherwise
check numeric::division_overflow<R>(a->value(), b->value()) and return
errc::OVERFLOW only for an overflow; ensure the change also allows
ansi_mod(decimal<R>...) to surface errc::DIVISION_BY_ZERO as documented.
- Around line 430-438: The ansi_mod implementation incorrectly computes
remainder by calling ansi_div then floor; replace that approach in
ansi_mod(decimal<R>* out, decimal<R> const* a, decimal<R> const* b) by computing
the quotient from the aligned unscaled integer representations with
floor-division semantics directly (handle sign of operands so division floors
toward -infinity for negative results), then multiply that integer quotient by b
and subtract from a to produce the remainder; ensure you align scales of a and b
before integer division, construct the decimal<R> quotient appropriately, and
remove the ansi_div(...) + floor(...) sequence so negative non-integer quotients
produce correct mod results.
- Around line 631-636: The precision check must account for the decimal scale:
instead of comparing the raw rep against 10^precision, compute an effective
exponent = precision - scale and compare abs_value against 10^exponent (treat
exponent <= 0 as only allowing abs_value == 0). Update the check in the function
around value/abs_value to compute exponent = static_cast<R>(*precision) -
static_cast<R>(*scale), use detail::ipow10(exponent) when exponent > 0, and
return errc::OVERFLOW if abs_value >= ipow10(exponent) (or if exponent <= 0 and
abs_value != 0). Keep the existing numeric_limits<R>::min() overflow check
intact.
- Around line 565-570: The template for ansi_neg wrongly accepts floating-point
types because it only checks cuda::std::is_signed_v<T>; restrict the current
overload to signed integrals by using a trait that combines
cuda::std::is_integral_v<T> && cuda::std::is_signed_v<T> (so ansi_neg<T> for
integrals keeps the overflow check against numeric_limits<T>::min()), and add a
separate ansi_neg<T> overload constrained to floating-point types
(cuda::std::is_floating_point_v<T>) that performs *out = -(*a) without the
min-value overflow check; update the template constraints on the existing
ansi_neg and add the floating-point overload accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e1a65ab7-304a-494a-b12f-dffdd56568f2
📒 Files selected for processing (1)
cpp/include/cudf/operators/ansi_arithmetic.cuh
|
Another high-level question we probably want to resolve is the naming. From the very beginning of the ANSI support effort in cudf (e.g. #19243), the original idea was to introduce something like The main motivation behind this change was that So from my perspective, the |
PointKernel
left a comment
There was a problem hiding this comment.
Just a quick pass on the code. @lamarrr Do you think we could completely drop our custom logic and raw CUDA built-ins in favor of standard CCCL utilities? I'm thinking of things like switching from ::sin to cuda::std::sin, or using cuda::std::bit_and instead of raw & operators to clean it up.
Agreed, I had the impression ANSI_* precisely communicates the expected error handling behaviour, just like how FLOATS can be described using ISO specifications. This removes behavioural ambiguity and sets expectations for users.
_OVERFLOW is probably not a good name here because users would think it returns an overflow flag and the value, which isn't the case. The returned error in the operator is used for propagating errors to exceptions. Alternatively, we could use |
…:optional - Updated functions in null_handling.cuh and trigonometric.cuh to replace custom optional handling with cuda::std::optional. - Removed the types.cuh file as it was no longer needed after the refactor. - Changed return types of several functions from errc to void, simplifying the interface. - Ensured null handling is consistent across all operations by using cuda::std::nullopt.
Yes, but it doesn't provide any perf or correctness advantage, it is only overhead. Especially for |
Right, What I mean by “drop our custom logic” is avoiding reimplementation of existing functionality, such as Using CCCL utilities instead of CUDA C-style functions is meant to align with C++ best practices: avoid calling C-style functions directly in C++ code when a suitable C++ utility exists e.g. https://godbolt.org/z/Pc7YYGe36 the underlying SASS are the same. |
|
I'm on board with you; we should re-use existing functionality where possible. I had no idea |
|
/ok to test 0393ffc |
|
The ask was for a broader sweep over the rest of the newly added code. I looked closely at three headers, and two of them contained logic that duplicates existing utilities in cuDF or CCCL, which is why I wanted to check whether similar cases exist elsewhere in the PR. Existing AST duplication is reasonable to defer to #22598, but where there is a clean drop-in replacement, e.g. |
Description
Split from #22224
Preceded by #22511
Story: #22598
This Pull request:
coalesceoperatorEach operator is shaped as:
errc operator(T * ... outputs, T const * ... inputs);Each operator returns an error code to signal to the kernel if a failure occurred.
Choice of operator shape:
Future Work
Checklist