
Decimal: use UInt128 significand to speed up operations #2022

Open
xwu wants to merge 18 commits into swiftlang:main from xwu:decimal-performance

Conversation

xwu commented Jun 3, 2026

This PR introduces a new internal computed property of type UInt128 (named _significand to distinguish it from _mantissa), which allows us to perform arithmetic operations without going through VariableLengthInteger.
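For illustration, a minimal sketch of such a property, assuming Swift 6's UInt128 (on Apple platforms this requires a recent OS). `SketchDecimalStorage` and its members are hypothetical stand-ins, not Foundation's Decimal; the limb order (least significant first) is assumed to mirror `_mantissa`. This version assembles the value with shifts, which is endian-independent, rather than the bitwise copy the PR describes.

```swift
// Hypothetical stand-in for Decimal's storage: eight 16-bit limbs,
// least significant limb first.
struct SketchDecimalStorage {
    var mantissa: (UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16)

    // A UInt128 view of the same 128 bits, so arithmetic can use native
    // 128-bit operations instead of limb-by-limb loops.
    var significand: UInt128 {
        get {
            let m = mantissa
            // Start from the most significant limb and shift the rest in.
            var result = UInt128(m.7)
            for limb in [m.6, m.5, m.4, m.3, m.2, m.1, m.0] {
                result = (result << 16) | UInt128(limb)
            }
            return result
        }
        set {
            // Extract 16 bits at a time, least significant first.
            let limb = { (shift: Int) in UInt16(truncatingIfNeeded: newValue >> shift) }
            mantissa = (limb(0), limb(16), limb(32), limb(48),
                        limb(64), limb(80), limb(96), limb(112))
        }
    }
}

var storage = SketchDecimalStorage(mantissa: (0, 1, 0, 0, 0, 0, 0, 0))
print(storage.significand)  // 65536
```

Once the limbs are viewed as one UInt128, comparison, addition, multiplication, and division can use the standard library's fixed-width integer operations directly.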

Although conceptually low-hanging fruit, threading the change fully through the implementation amounts to an overhaul of some scale, with performance gains to match. Fortunately, the resulting implementations (written by hand) are eminently readable. Latent bugs are addressed along the way, substantially improving the precision of mathematical operations on Decimal.

Motivation:

#1754 demonstrated that making VariableLengthInteger non-allocating (and not really variable) dramatically improves performance. Even so, Decimal operations are still by no means optimized for performance. Sadly, this state of affairs encourages the mistaken impression that decimal floating-point is intrinsically far less performant than alternative numeric representations.

The prior PR was a fantastic and inspiring first move. However, lacking context about other advances in Swift, LLM-driven efforts overlook that performing arithmetic limb by limb (which is what VariableLengthInteger encapsulates) is no longer necessary for implementing basic operations: the 128-bit mantissa can be bitwise copied into a UInt128, letting us leverage more performant compiler primitives.
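To make the contrast concrete, here is a hedged sketch (hypothetical helper names, not VariableLengthInteger's actual code) of 128-bit addition done limb by limb with explicit carry propagation, next to the single `addingReportingOverflow(_:)` call that Swift 6's UInt128 provides. Both produce the same sum and overflow flag.

```swift
// Illustrative limb-by-limb addition: eight 16-bit limbs, least significant
// first, with the carry propagated by hand.
func addLimbs(_ a: [UInt16], _ b: [UInt16]) -> (sum: [UInt16], overflow: Bool) {
    precondition(a.count == 8 && b.count == 8, "expected eight limbs")
    var sum = [UInt16](repeating: 0, count: 8)
    var carry: UInt32 = 0
    for i in 0..<8 {
        let total = UInt32(a[i]) + UInt32(b[i]) + carry
        sum[i] = UInt16(truncatingIfNeeded: total)
        carry = total >> 16
    }
    return (sum, carry != 0)
}

// The same computation as one native 128-bit add with overflow reporting.
func addWide(_ a: UInt128, _ b: UInt128) -> (sum: UInt128, overflow: Bool) {
    let result = a.addingReportingOverflow(b)
    return (result.partialValue, result.overflow)
}

// Split a UInt128 into eight 16-bit limbs, least significant first.
func limbs(of value: UInt128) -> [UInt16] {
    (0..<8).map { UInt16(truncatingIfNeeded: value >> ($0 * 16)) }
}

let x: UInt128 = 0x0001_FFFF_0000_1234_FFFF_FFFF_0000_ABCD
let y: UInt128 = 0x0000_0001_FFFF_0000_0000_0001_FFFF_8000
let slow = addLimbs(limbs(of: x), limbs(of: y))
let fast = addWide(x, y)
print(slow.sum == limbs(of: fast.sum) && slow.overflow == fast.overflow)  // true
```

The loop is what a limb-based representation has to spell out for every operation; with UInt128 the compiler can lower the add to a couple of machine instructions.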

Modifications:

This PR replaces VariableLengthInteger operations with UInt128 operations, rewriting comparison, addition (and subtraction), multiplication, and division. Normalization is also rewritten to remove the last consumer of VariableLengthInteger, although it is now called only by the NSDecimalNormalize shim.
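As one example of what a UInt128-based operation can look like, the sketch below compares two nonnegative values of the form significand × 10^exponent. It is an illustration under stated assumptions (hypothetical function name; signs, NaN, and Decimal's exponent limits are ignored), not the PR's implementation. The key observation: if scaling the operand with the larger exponent by 10 overflows UInt128, that operand must be strictly larger, since the other significand fits in 128 bits.

```swift
// Compare two nonnegative decimals given as significand × 10^exponent.
// Returns -1, 0, or 1. Hypothetical helper, not Foundation's code.
func compareMagnitudes(_ lhs: (significand: UInt128, exponent: Int),
                       _ rhs: (significand: UInt128, exponent: Int)) -> Int {
    // Zero equals zero regardless of exponent.
    if lhs.significand == 0 || rhs.significand == 0 {
        if lhs.significand == rhs.significand { return 0 }
        return lhs.significand == 0 ? -1 : 1
    }
    // Let "big" be the operand with the larger exponent; remember if swapped.
    let swapped = lhs.exponent < rhs.exponent
    var (bigSig, bigExp) = swapped ? (rhs.significand, rhs.exponent)
                                   : (lhs.significand, lhs.exponent)
    let (smallSig, smallExp) = swapped ? (lhs.significand, lhs.exponent)
                                       : (rhs.significand, rhs.exponent)
    // Align exponents by scaling big's significand up by powers of ten.
    while bigExp > smallExp {
        let (product, overflow) = bigSig.multipliedReportingOverflow(by: 10)
        // Overflow means big's value exceeds UInt128.max >= smallSig.
        if overflow { return swapped ? -1 : 1 }
        bigSig = product
        bigExp -= 1
    }
    let result = bigSig == smallSig ? 0 : (bigSig < smallSig ? -1 : 1)
    return swapped ? -result : result
}

print(compareMagnitudes((15, -1), (150, -2)))  // 0, since 1.5 == 1.50
```

Because overflow settles the comparison immediately, no wider-than-128-bit arithmetic is needed.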

Along the way, latent bugs are either annotated or fixed outright (see the added tests). For example:

  • The existing implementation truncates the 'refitted' mantissa in the case of arithmetic overflow during addition, which is not correct for the documented default .plain rounding mode (it also makes no attempt to behave correctly for other rounding modes). The revised implementation now respects rounding mode.

  • The existing implementation exhibits unexpected behavior when multiplying two values with small exponents that should lead to an underflow result. (Reading the code suggests there should be a runtime trap, but in the REPL there's just a very large arbitrary result.) The revised implementation now correctly throws underflow.

  • The existing implementation always rounds towards zero (i.e., truncates) for division. The revised implementation now respects rounding mode (crucially, the documented default rounding mode, .plain).

  • The existing implementation normalizes dividend and divisor by an arbitrary criterion chosen in 1999, which has been associated with bugs; code comments reference rdar://problem/5197585 and rdar://problem/2354750. The revised implementation now scales the dividend's significand appropriately to fill 128 bits.

  • The existing implementation produces a NaN value during normalization if the smaller of the two inputs has a finite, negative value that truncates to zero. The revised implementation now respects rounding mode and, if rounding up such a negative value, produces zero rather than spurious NaN.

  • In the existing implementation, legacy NSDecimal* functions other than Add never signal loss of precision, as such information was neither consistently computed nor plumbed through. The revised implementation now indicates loss of precision whenever an inexact result is returned.
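The rounding-mode and scaling fixes for division can be sketched as follows. This is a hypothetical illustration, not the PR's code: `SketchRoundingMode` mirrors the names of Decimal's rounding modes, with `.plain` taken as round half away from zero, `.bankers` as round half to even, and `.down`/`.up` assumed to round toward negative/positive infinity (hence the sign parameter, since the sketch works on magnitudes). Scaling the dividend by ten until one more step would overflow is one simple way to fill 128 bits; the PR's exact strategy may differ, and exponent-range checks are omitted.

```swift
// Hypothetical mirror of Decimal's rounding-mode names.
enum SketchRoundingMode {
    case plain    // round half away from zero
    case down     // toward negative infinity (assumed)
    case up       // toward positive infinity (assumed)
    case bankers  // round half to even
}

// Divide magnitude n by d and round the quotient. `negative` is the sign of
// the true result, needed because .down/.up depend on direction.
func divideRounded(_ n: UInt128, by d: UInt128, negative: Bool,
                   mode: SketchRoundingMode) -> UInt128 {
    precondition(d != 0, "division by zero")
    let (q, r) = n.quotientAndRemainder(dividingBy: d)
    if r == 0 { return q }
    // Compare r with d - r instead of computing 2 * r, which could overflow.
    let complement = d - r
    let roundMagnitudeUp: Bool
    switch mode {
    case .plain:   roundMagnitudeUp = r >= complement
    case .bankers: roundMagnitudeUp = r > complement || (r == complement && q % 2 == 1)
    case .down:    roundMagnitudeUp = negative
    case .up:      roundMagnitudeUp = !negative
    }
    // q + 1 cannot overflow: r != 0 implies d >= 2, so q <= UInt128.max / 2.
    return roundMagnitudeUp ? q + 1 : q
}

// Divide significand × 10^exponent values: scale the dividend by ten while it
// still fits in UInt128, so the integer quotient keeps more significant
// digits, then divide once with rounding. Exponent limits are ignored.
func divideDecimal(_ lhs: (significand: UInt128, exponent: Int),
                   _ rhs: (significand: UInt128, exponent: Int),
                   negative: Bool,
                   mode: SketchRoundingMode) -> (significand: UInt128, exponent: Int) {
    var dividend = lhs.significand
    var exponent = lhs.exponent - rhs.exponent
    while dividend != 0 {
        let (scaled, overflow) = dividend.multipliedReportingOverflow(by: 10)
        if overflow { break }
        dividend = scaled
        exponent -= 1
    }
    let quotient = divideRounded(dividend, by: rhs.significand, negative: negative, mode: mode)
    return (quotient, exponent)
}
```

For example, 1 ÷ 3 scales the dividend to 10^38 and yields a significand of 38 threes with exponent -38. Under `.up` the last digit becomes 4, whereas truncation would always leave it at 3.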

Result:

Using benchmarks added in #1754, this PR yields a ~500% boost in addition performance, a ~950% boost in multiplication performance, and a ~7000% boost in division performance, as measured by throughput.

And, as described above, arithmetic operations now have improved precision and latent bugs have been fixed. VariableLengthInteger is removed entirely.

----------------------------------------------------------------------------------------------------------------------------
Decimal add metrics
----------------------------------------------------------------------------------------------------------------------------

╒══════════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│          Throughput (# / s) (M)          │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞══════════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│                   main                   │        13 │        13 │        13 │        13 │        13 │        13 │        13 │        26 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│               Current_run                │        80 │        78 │        78 │        77 │        76 │        71 │        69 │       154 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│                    Δ                     │        67 │        65 │        65 │        64 │        63 │        58 │        56 │       128 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│              Improvement %               │       515 │       500 │       500 │       492 │       485 │       446 │       431 │       128 │
╘══════════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛


----------------------------------------------------------------------------------------------------------------------------
Decimal divide metrics
----------------------------------------------------------------------------------------------------------------------------

╒══════════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│          Throughput (# / s) (M)          │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞══════════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│                   main                   │         1 │         1 │         1 │         1 │         1 │         1 │         1 │         2 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│               Current_run                │        74 │        73 │        73 │        72 │        72 │        71 │        71 │       145 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│                    Δ                     │        73 │        72 │        72 │        71 │        71 │        70 │        70 │       143 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│              Improvement %               │      7300 │      7200 │      7200 │      7100 │      7100 │      7000 │      7000 │       143 │
╘══════════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛


----------------------------------------------------------------------------------------------------------------------------
Decimal multiply metrics
----------------------------------------------------------------------------------------------------------------------------

╒══════════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│          Throughput (# / s) (M)          │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞══════════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│                   main                   │         8 │         8 │         8 │         8 │         8 │         8 │         8 │        17 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│               Current_run                │        85 │        84 │        84 │        83 │        82 │        80 │        80 │       166 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│                    Δ                     │        77 │        76 │        76 │        75 │        74 │        72 │        72 │       149 │
├──────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│              Improvement %               │       962 │       950 │       950 │       938 │       925 │       900 │       900 │       149 │
╘══════════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛
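To read the tables: Improvement % is Δ divided by the main baseline, so a 500% improvement means roughly 6× the baseline throughput. A quick check against the p50 column (the helper names are just for illustration). Because throughput is reported in whole millions per second, the division baseline of 1 makes that percentage especially coarse.

```swift
// Improvement % as reported in the tables: (current - baseline) / baseline × 100.
func improvementPercent(baseline: Double, current: Double) -> Double {
    (current - baseline) / baseline * 100
}

// Speedup factor: new throughput as a multiple of the baseline.
func speedup(baseline: Double, current: Double) -> Double {
    current / baseline
}

// p50 throughputs (millions of operations per second) from the tables.
print(improvementPercent(baseline: 13, current: 78), speedup(baseline: 13, current: 78))  // 500.0 6.0
print(improvementPercent(baseline: 8, current: 84), speedup(baseline: 8, current: 84))    // 950.0 10.5
print(improvementPercent(baseline: 1, current: 73), speedup(baseline: 1, current: 73))    // 7200.0 73.0
```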

Testing:

All 33 existing unit tests for Decimal pass (with modifications to account for the now-corrected rounding in division and the improved precision; see comments below). Additional unit tests are added for the corrected behavior.
All 5 existing benchmark tests show improved performance compared to the current baseline as described above.

xwu changed the title from "Decimal: use UInt128 significand to speed up comparison and addition" to "Decimal: use UInt128 significand to speed up operations" on Jun 3, 2026
xwu force-pushed the decimal-performance branch from 152a2d4 to 4a26c4a on June 4, 2026 14:54
Comment thread Tests/FoundationEssentialsTests/DecimalTests.swift Outdated
Comment thread Tests/FoundationEssentialsTests/DecimalTests.swift
Comment thread Tests/FoundationEssentialsTests/DecimalTests.swift
Comment thread Tests/FoundationEssentialsTests/DecimalTests.swift
Comment thread Tests/FoundationEssentialsTests/DecimalTests.swift
xwu marked this pull request as ready for review on June 5, 2026 15:50
xwu requested a review from a team as a code owner on June 5, 2026 15:50
xwu (Author) commented Jun 5, 2026

@swift-ci test macOS

xwu (Author) commented Jun 5, 2026

cc @stephentyrone :)

Comment thread Sources/FoundationEssentials/Decimal/Decimal+Conformances.swift Outdated
Comment thread Sources/FoundationEssentials/Decimal/Decimal+Math.swift
Comment thread Sources/FoundationEssentials/Decimal/Decimal+Math.swift
Comment thread Sources/FoundationEssentials/Decimal/Decimal+Math.swift Outdated
xwu force-pushed the decimal-performance branch from 42c3295 to a4e1441 on June 6, 2026 13:23
A comment by xwu was marked as outdated.

Comment thread Sources/FoundationEssentials/Decimal/Decimal+Math.swift Outdated

Four further comments by xwu were marked as outdated.

        let result = try lhs._multiply(by: rhs, roundingMode: .plain)
        lhs = result
    } catch _CalculationError.underflow {
        lhs = .zero

xwu (Author) commented Jun 10, 2026

Note that underflowing to zero here may formally amount to a policy change relative to NSDecimal* guarantees, but it is also probably the more (only?) reasonable behavior.

The prior implementation never threw .underflow, so there is no actual precedent for this specific operation. In practice, that implementation also had enough problems with correctness, precision, and rounding-mode handling that I'm not sure users could rely on it to produce zero or NaN (or, sometimes, an arbitrarily large and totally unspecified result; see above).

Existing code already underflows to zero for at least some operations:

    if actual == .underflow {
        self = 0
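For context on where underflow arises, here is a minimal sketch of UInt128 multiplication on nonnegative significand × 10^exponent values, using `multipliedFullWidth(by:)` for the exact 256-bit product and `dividingFullWidth(_:)` to discard digits. It is not the PR's `_multiply(by:roundingMode:)`: the names `multiplyDecimal` and `SketchCalculationError` are hypothetical, only `.plain`-style rounding (half away from zero) is implemented, and the exponent range -128...127 is an assumption modeled on Decimal's 8-bit exponent. A nonzero product whose digits all round away is reported as underflow, which a caller could then map to zero as discussed above.

```swift
// Hypothetical error type for the sketch.
enum SketchCalculationError: Error { case overflow, underflow }

// Multiply nonnegative significand × 10^exponent values, rounding half away
// from zero whenever digits must be discarded.
func multiplyDecimal(_ a: (significand: UInt128, exponent: Int),
                     _ b: (significand: UInt128, exponent: Int),
                     minExponent: Int = -128,
                     maxExponent: Int = 127) throws -> (significand: UInt128, exponent: Int) {
    // The exact product needs up to 256 bits.
    var (high, low) = a.significand.multipliedFullWidth(by: b.significand)
    if high == 0 && low == 0 { return (0, 0) }
    var exponent = a.exponent + b.exponent
    var lastDigit: UInt128 = 0  // most recently discarded decimal digit

    // Discard digits while the product needs more than 128 bits or the
    // exponent is below the representable range.
    while high != 0 || exponent < minExponent {
        // Divide the 256-bit value (high, low) by 10 in two steps.
        let (qHigh, rHigh) = high.quotientAndRemainder(dividingBy: 10)
        let (qLow, r) = UInt128(10).dividingFullWidth((high: rHigh, low: low))
        (high, low, lastDigit) = (qHigh, qLow, r)
        exponent += 1
    }

    // Round half away from zero using the last (most significant) discarded digit.
    if lastDigit >= 5 {
        let (rounded, carry) = low.addingReportingOverflow(1)
        if carry {
            // The value is exactly 2^128: drop one more digit (remainder 6 rounds up).
            low = UInt128(10).dividingFullWidth((high: 1, low: 0)).quotient + 1
            exponent += 1
        } else {
            low = rounded
        }
    }

    // A nonzero product that rounded to zero has underflowed.
    if low == 0 { throw SketchCalculationError.underflow }
    if exponent > maxExponent { throw SketchCalculationError.overflow }
    return (low, exponent)
}
```

With these assumptions, 7 × 10^-65 times 1 × 10^-64 rounds to 1 × 10^-128, while 10^-100 × 10^-100 throws `.underflow` instead of producing an arbitrary result.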

xwu force-pushed the decimal-performance branch from bb37ca9 to 135ec4e on June 12, 2026 17:01
xwu force-pushed the decimal-performance branch from 135ec4e to 92c1a86 on June 12, 2026 17:02
xwu (Author) commented Jun 12, 2026

@swift-ci test macOS
