Opt-in overflow tracking for decimal fixed-point arithmetic by Avinash-Raj · Pull Request #22356 · rapidsai/cudf

Avinash-Raj · 2026-05-02T07:58:27Z

Summary

Introduces an opt-in path for detecting overflow in decimal fixed-point arithmetic, at two layers:

Value-type level – new numeric::decimal32_safe / decimal64_safe / decimal128_safe aliases backed by a new fixed_point<Rep, Rad, overflow_tracking::on> instantiation. These carry a sticky per-value bool that is set whenever a constructor, arithmetic operator, rescaled(), or float→decimal conversion would overflow the underlying integer representation. The flag propagates through +, -, *, % and rescaled(), so any expression built on the value-level operators (binaryop transforms, scans, reductions
expressed through value ops) carries it for free.
Column / binaryop level – new public API cudf::binary_operation_safe(..., rmm::device_scalar<uint32_t>& overflow_flag, ...) for the seven base-10 decimal arithmetic operators (ADD, SUB, MUL, DIV, MOD, PMOD, PYMOD). It updates a single device-side flag via atomicMax whenever any active (non-null) row overflows during arithmetic or rescale to the output scale. Designed for callers (e.g. velox-cudf) that need a one-shot “did any row overflow?” signal without per-row storage.

Motivation

Existing decimal32 / decimal64 / decimal128 silently wrap on overflow. Downstream engines that need stricter semantics today have to either implement their own checked arithmetic or rebuild libcudf. This PR keeps the default behavior unchanged but lets opt-in callers ask the same libcudf build for overflow-aware decimals.

… arithmetic

copy-pr-bot · 2026-05-02T07:59:46Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

bdice · 2026-05-02T16:00:40Z

I would favor adding new operators/functions rather than a compile-time flag — we want to be able to ship one build of cuDF that works universally.

devavret · 2026-05-04T12:32:28Z

@bdice I'd imagine this is not limited to scalar functions like binaryops and such. We'd also need to detect overflow in aggregations like groupby and reduce. And those would require overloading the existing +,* operators.

What's your opinion on adding a new type maybe called safe_decimal.

Spark should also have a concept of arithmetic exception for overflows. How do they support this with libcudf decimal?

mattgara · 2026-05-04T17:17:27Z

What's your opinion on adding a new type maybe called safe_decimal.

Inline with this thinking, we could extend cudf's fixed_point to implement this:

enum class overflow_tracking { off, on };

template <typename Rep, Radix Rad,
          overflow_tracking Track = overflow_tracking::off>
class fixed_point{ ...

and use if constexpr (Track == overflow_tracking::on) to branch the implementation based on this configuration. Effectively, this should be compile-time semantically equivalent to having a safe_decimal, but with less code duplication.

ttnghia · 2026-05-04T17:22:18Z

This is related to ANSI support that I've recently filed: #21676
We can use some kind of input enum like mentioned above, and custom return type as my suggestion.

…safe

Avinash-Raj · 2026-05-05T17:10:08Z

Ok, replaced the CMake CUDF_TRACK_FIXED_POINT_OVERFLOW flag with an opt-in overflow_tracking NTTP on numeric::fixed_point and add decimal*_safe / type_id::DECIMAL*_SAFE so velox-cudf gets per-element overflow detection without requiring a separate libcudf build.

pmattione-nvidia · 2026-05-05T17:23:41Z

You might want to put overflow tracking into the floating <--> decimal conversion code as well (include/cudf/fixed_point/detail/floating_conversion.hpp). E.g. there are the guarded_left_shift() and guarded_right_shift() functions, and there are potential overflows in convert_floating_to_integral_shifting(), shift_to_decimal_pospow(), and shift_to_decimal_negpow(). All potential overflow sites are mentioned explicitly in the code comments in these functions.

…or decimals

davidwendt · 2026-05-11T16:47:52Z

Could this be done without modifying the fixed-point class? The operator overloads are not really required for this to work since the dispatcher logic creates specialized code paths automatically. And somehow the overflow-awareness is communicated through to the appropriate operation/function and would generally require a special code path either way.

It seems the operators could be just free functions and not member functions or member operators. The return type would include an overflow flag along with the result. Similar to how the https://github.com/rapidsai/cudf/blob/main/cpp/include/cudf/fixed_point/detail/floating_conversion.hpp is a separate file handling specific operations for fixed-point.

The new free functions would be called as part of the specialized overflow-aware aggregation types for reduce, groupby, etc.
This would better isolate the specialized operation functions away from the current fixed-point implementations.

…ions for overflow tracking

Avinash-Raj · 2026-05-16T15:52:11Z

@davidwendt I have addressed your comment by adding free functions that return the operation result and has_overflow_occurred as a tuple, without making or reverting any changes to the fixed_point class.

davidwendt

Seems the gtests only execute on decimal64.

davidwendt · 2026-05-18T17:28:08Z

+ * @return true if the shift would overflow `Rep`, false otherwise
+ */
+template <typename Rep, Radix Rad, typename T>
+CUDF_HOST_DEVICE inline constexpr bool shift_overflows(T const& val, scale_type const& scale)


Can this function be moved to safe_arithmetic.hpp? Or does it need to be public?
Perhaps move it or create a new header file.

Yes, we could move the shift_overflows function to safe_arithmetic.hpp since that is the only place where it is called.

Done! the relevant function gets moved to safe_arithmetic.hpp and also added safe arithmetic tests to deal with decimal64 overflows.

mhaseeb123 · 2026-05-18T17:36:55Z

Is this PR and #22261 related or overlapping?

pmattione-nvidia · 2026-05-18T17:40:38Z

  return guarded_left_shift(static_cast<UnsignedRep>(shifting_rep), pow2);
 }

+/**


duplicating the functionality by adding new high level functions for all of this code is brutal; we shouldn't have to duplicate all of this code logic. It's too easy for one function to get updated but not the other, etc. I would instead add a template parameter bool for whether to check or not. Then that bool can be passed to e.g. checked_narrow_cast() (potentially renaming) and if it's constexpr-false we just do the straight up cast.

In addition I feel the overflow bool should be returned and not passed in.

Like you suggested, instead of duplicating the high-level functions, I used compile-time branching inside each function based on the CheckOverflow template parameter. For example,

safe_arithmetic.hpp

template <typename Fixed, typename Floating, CUDF_ENABLE_IF(cuda::std::is_floating_point_v<Floating> && cudf::is_fixed_point<Fixed>())> CUDF_HOST_DEVICE inline safe_result<typename Fixed::rep, Fixed::rad> safe_convert_floating_to_fixed(Floating floating, scale_type scale) { using Rep = typename Fixed::rep; if constexpr (Fixed::rad == Radix::BASE_10) { auto const [value, overflow] = convert_floating_to_integral<Rep, true>(floating, scale); return safe_result<Rep, Fixed::rad>{Fixed{scaled_integer<Rep>{value, scale}}, overflow}; } else { Rep const value = static_cast<Rep>(shift<Rep, Fixed::rad>(floating, scale)); return safe_result<Rep, Fixed::rad>{Fixed{scaled_integer<Rep>{value, scale}}, false}; } }

floating_conversion.hpp

/** * @brief Perform floating-point -> integer decimal conversion * * @tparam Rep The type of integer we are converting to, to store the decimal value * @tparam CheckOverflow Whether to return overflow detection with the converted value * @tparam FloatingType The type of floating-point object we are converting from * @param floating The floating point value to convert * @param scale The desired base-10 scale factor: decimal value = returned value * 10^scale * @return Integer representation of the floating-point value, or `{value, overflow}` when * `CheckOverflow` is true */ template <typename Rep, bool CheckOverflow = false, typename FloatingType, CUDF_ENABLE_IF(cuda::std::is_floating_point_v<FloatingType>)> CUDF_HOST_DEVICE inline auto convert_floating_to_integral(FloatingType const& floating, scale_type const& scale) { // Extract components of the floating point number using converter = floating_converter<FloatingType>; auto const integer_rep = converter::bit_cast_to_integer(floating); if (converter::is_zero(integer_rep)) { return maybe_with_overflow<CheckOverflow>(Rep{0}, false); } // Note that the significand here is an unsigned integer with sizeof(FloatingType) auto const is_negative = converter::get_is_negative(integer_rep); auto const [significand, floating_pow2] = converter::get_significand_and_pow2(integer_rep); // Add half a bit if truncating to yield expected value, see function for discussion. auto const pow10 = static_cast<int>(scale); auto const [base2_value, pow2] = add_half_if_truncates(floating, significand, floating_pow2, pow10); // Apply the powers of 2 and 10 to convert to decimal. auto const [magnitude_u, overflow] = convert_floating_to_integral_shifting<Rep, FloatingType, CheckOverflow>( base2_value, pow10, pow2); if constexpr (!CheckOverflow) { // Reapply the sign and return // NOTE: Cast can overflow! auto const signed_magnitude = static_cast<Rep>(magnitude_u); return is_negative ? -signed_magnitude : signed_magnitude; } else { // Reapply sign with saturation on representational overflow. using UnsignedRep = cuda::std::make_unsigned_t<Rep>; auto const umax = static_cast<UnsignedRep>(cuda::std::numeric_limits<Rep>::max()); if (!is_negative) { if (magnitude_u > umax) { return cuda::std::pair<Rep, bool>{cuda::std::numeric_limits<Rep>::max(), true}; } return cuda::std::pair<Rep, bool>{static_cast<Rep>(magnitude_u), overflow}; } // Negative range has one extra representable value for two's complement. // magnitude == max+1 maps to min. auto const umin_mag = umax + UnsignedRep{1}; if (magnitude_u > umin_mag) { return cuda::std::pair<Rep, bool>{cuda::std::numeric_limits<Rep>::min(), true}; } if (magnitude_u == umin_mag) { return cuda::std::pair<Rep, bool>{cuda::std::numeric_limits<Rep>::min(), overflow}; } return cuda::std::pair<Rep, bool>{static_cast<Rep>(-static_cast<Rep>(magnitude_u)), overflow}; } }

I hope this is the right way, correct me if I'm wrong.

My point is we should not have both shift_to_decimal_pospow_checked() and shift_to_decimal_pospow(). It should just be one function with template bool on whether to check or not. Same for all of the others. Duplicating all of this code will just cause maintenance nightmares.

Agreed and it got fixed on floating_conversion.hpp.

@davidwendt “In addition, I feel the overflow bool should be returned and not passed in.” If my understanding is correct, you don’t want to pass a sticky overflow flag, rmm::device_scalar<std::uint32_t>& overflow_flag, to the high-level safe functions (refer below). Instead, the corresponding overflow value (bool) should be returned as the second output parameter. Is that correct?

template <typename LhsType, typename RhsType> std::unique_ptr<column> binary_operation_safe(LhsType const& lhs, RhsType const& rhs, binary_operator op, data_type output_type, rmm::device_scalar<std::uint32_t>& overflow_flag, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) {

Yes. Should this not return a bool per row?
I would've expected the result to be 2 columns. One for the fixed-point value and one of bool values.

…n w.r.t template param

feat(fixed_point): optional sticky overflow tracking on doing decimal…

38de09b

… arithmetic

Avinash-Raj requested review from a team as code owners May 2, 2026 07:58

Avinash-Raj requested review from pmattione-nvidia and vuule May 2, 2026 07:58

github-actions Bot assigned Avinash-Raj May 2, 2026

github-actions Bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels May 2, 2026

Avinash-Raj changed the title ~~Add optional sticky fixed-point overflow tracking (CUDF_TRACK_FIXED_POINT_OVERFLOW)~~ [Draft] Add optional sticky fixed-point overflow tracking (CUDF_TRACK_FIXED_POINT_OVERFLOW) May 2, 2026

Avinash-Raj marked this pull request as draft May 2, 2026 07:59

Avinash-Raj changed the title ~~[Draft] Add optional sticky fixed-point overflow tracking (CUDF_TRACK_FIXED_POINT_OVERFLOW)~~ Add optional sticky fixed-point overflow tracking (CUDF_TRACK_FIXED_POINT_OVERFLOW) May 2, 2026

feat(fixed_point): NTTP-based opt-in overflow tracking with decimal*_…

cb1e7c9

…safe

Avinash-Raj added 2 commits May 7, 2026 11:28

Track float→decimal overflow for decimal*_safe

927fe61

feat(binaryop): add binary_operation_safe with device overflow flag f…

6a33a77

…or decimals

Avinash-Raj changed the title ~~Add optional sticky fixed-point overflow tracking (CUDF_TRACK_FIXED_POINT_OVERFLOW)~~ Opt-in overflow tracking for decimal fixed-point arithmetic May 11, 2026

Avinash-Raj added 2 commits May 16, 2026 11:10

reverted modifications to fixed_point class and introduced free funct…

281dce6

…ions for overflow tracking

Merge branch 'main' into avi/decimal-overflow

063f1d7

davidwendt requested changes May 18, 2026

View reviewed changes

pmattione-nvidia reviewed May 18, 2026

View reviewed changes

introduced compile-time branching in convert-float-to-integer functio…

f874b5e

…n w.r.t template param

Avinash-Raj requested review from davidwendt and pmattione-nvidia May 22, 2026 06:49

Conversation

Avinash-Raj commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Uh oh!

copy-pr-bot Bot commented May 2, 2026

Uh oh!

bdice commented May 2, 2026

Uh oh!

devavret commented May 4, 2026

Uh oh!

mattgara commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ttnghia commented May 4, 2026

Uh oh!

Avinash-Raj commented May 5, 2026

Uh oh!

pmattione-nvidia commented May 5, 2026

Uh oh!

davidwendt commented May 11, 2026

Uh oh!

Avinash-Raj commented May 16, 2026

Uh oh!

davidwendt left a comment

Choose a reason for hiding this comment

Uh oh!

davidwendt May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Avinash-Raj May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Avinash-Raj May 22, 2026

Choose a reason for hiding this comment

Uh oh!

mhaseeb123 commented May 18, 2026

Uh oh!

pmattione-nvidia May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidwendt May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Avinash-Raj May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pmattione-nvidia May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Avinash-Raj May 22, 2026

Choose a reason for hiding this comment

Uh oh!

Avinash-Raj May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidwendt May 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

Avinash-Raj commented May 2, 2026 •

edited

Loading

mattgara commented May 4, 2026 •

edited

Loading

pmattione-nvidia May 18, 2026 •

edited

Loading

Avinash-Raj May 21, 2026 •

edited

Loading

Avinash-Raj May 22, 2026 •

edited

Loading