Skip to content

Comments

Improved radix division, helpers, sqrt(2) and exp(1) profile program#2583

Merged
fredrik-johansson merged 5 commits intoflintlib:mainfrom
fredrik-johansson:radix3
Feb 20, 2026
Merged

Improved radix division, helpers, sqrt(2) and exp(1) profile program#2583
fredrik-johansson merged 5 commits intoflintlib:mainfrom
fredrik-johansson:radix3

Conversation

@fredrik-johansson
Copy link
Collaborator

The main improvement in this PR is to speed up radix_divrem in the average case by incorporating the quotient trick: instead of computing the full remainder to determine the rounding of the quotient, compute an extra fraction limb of the quotient and see if this allows determining the correct rounding. With high probability, we then just need to compute the low half of the remainder, and we can skip the remainder completely when we want just the quotient.

In order to make this legal, I had to work out an error bound for radix_inv_approx and for the division code. The derivation of the bound is sketched in comments. I'm reasonably confident that the bound is correct, but I'll probably redo it more carefully in the future when the division code is more final (Karp-Markstein being the big missing optimization).

Additional new helper functions added in this PR:

  • radix_mul_1
  • radix_mul_two
  • radix_divrem_two
  • radix_rsqrt_1_approx_basecase
  • radix_rsqrt_1_approx

New profile for division with remainder:

$ build/radix/profile/p-divrem
   decimal      mpn  decimal        time        time    relative
    digits    limbs    limbs flint_mpn_divrem radix_divrem  time

        19        1        1    1.11e-08     5.4e-09    0.486x
        38        2        2    1.71e-08    7.23e-08    4.228x
        57        3        3    4.05e-08    1.22e-07    3.012x
        76        4        4    4.52e-08    1.67e-07    3.695x
       114        6        6    6.48e-08    2.67e-07    4.120x
       171        9        9    1.18e-07    4.29e-07    3.636x
       247       13       13    1.85e-07    7.48e-07    4.043x
       361       19       19    3.18e-07    1.39e-06    4.371x
       532       28       28    5.54e-07    2.07e-06    3.736x
       798       42       42    1.06e-06    3.34e-06    3.151x
      1197       63       63    2.15e-06    5.82e-06    2.707x
      1786       93       94    4.03e-06    1.51e-05    3.747x
      2679      140      141    7.91e-06    2.73e-05    3.451x
      4009      209      211    1.56e-05     4.2e-05    2.692x
      6004      312      316    2.93e-05    6.17e-05    2.106x
      9006      468      474    5.53e-05    9.25e-05    1.673x
     13509      702      711    0.000108    0.000151    1.398x
     20254     1052     1066    0.000193    0.000236    1.223x
     30381     1577     1599    0.000336    0.000331    0.985x
     45562     2365     2398    0.000502    0.000507    1.010x
     68343     3548     3597    0.000784    0.000765    0.976x
    102505     5321     5395     0.00127     0.00118    0.929x
    153748     7981     8092      0.0019     0.00173    0.911x
    230622    11971    12138     0.00285     0.00278    0.975x
    345933    17956    18207     0.00438      0.0043    0.982x
    518890    26934    27310     0.00657     0.00646    0.983x
    778335    40400    40965      0.0104     0.01002    0.963x
   1167493    60599    61447      0.0155      0.0152    0.981x
   1751230    90898    92170      0.0251      0.0243    0.968x
   2626845   136347   138255      0.0366      0.0392    1.071x
   3940258   204520   207382      0.0592      0.0598    1.010x
   5910387   306780   311073      0.0923       0.096    1.040x
   8865571   460169   466609       0.145       0.148    1.021x
  13298347   690253   699913       0.226       0.255    1.128x
  19947511  1035379  1049869       0.348       0.424    1.218x
  29921257  1553067  1574803       0.562       0.625    1.112x
  44881876  2329600  2362204       0.862       1.167    1.354x
  67322814  3494400  3543306       1.404       1.699    1.210x
 100984221  5241599  5314959       2.265       2.866    1.265x
 151476322  7862398  7972438       3.627       4.223    1.164x
 227214483 11793597 11958657       5.581       7.127    1.277x
 340821715 17690395 17937985        8.92      11.347    1.272x
 511232563 26535591 26906977       15.61      17.108    1.096x

New profile for truncating division:

$ build/radix/profile/p-tdiv_q 
   decimal      mpn  decimal        time        time    relative
    digits    limbs    limbs flint_mpn_tdiv_q radix_tdiv_q  time

        19        1        1    1.13e-08    5.37e-09    0.475x
        38        2        2    1.73e-08    5.96e-08    3.445x
        57        3        3    4.08e-08    9.51e-08    2.331x
        76        4        4    4.58e-08    1.35e-07    2.948x
       114        6        6     6.6e-08    2.08e-07    3.152x
       171        9        9    1.18e-07    3.45e-07    2.924x
       247       13       13    1.86e-07    6.12e-07    3.290x
       361       19       19    3.21e-07    1.15e-06    3.583x
       532       28       28     5.6e-07    1.68e-06    3.000x
       798       42       42    1.07e-06    2.62e-06    2.449x
      1197       63       63    2.19e-06    4.37e-06    1.995x
      1786       93       94    4.07e-06       1e-05    2.457x
      2679      140      141       8e-06    1.78e-05    2.225x
      4009      209      211    1.57e-05    3.12e-05    1.987x
      6004      312      316    2.95e-05    4.61e-05    1.563x
      9006      468      474    5.56e-05    7.09e-05    1.275x
     13509      702      711     0.00011    0.000118    1.073x
     20254     1052     1066     0.00015    0.000171    1.140x
     30381     1577     1599     0.00025    0.000256    1.024x
     45562     2365     2398    0.000359    0.000384    1.070x
     68343     3548     3597    0.000578    0.000588    1.017x
    102505     5321     5395    0.000932    0.000897    0.962x
    153748     7981     8092     0.00145     0.00133    0.917x
    230622    11971    12138     0.00215     0.00213    0.991x
    345933    17956    18207     0.00324     0.00329    1.015x
    518890    26934    27310     0.00497       0.005    1.006x
    778335    40400    40965     0.00759      0.0077    1.014x
   1167493    60599    61447      0.0115      0.0116    1.009x
   1751230    90898    92170      0.0181      0.0187    1.033x
   2626845   136347   138255      0.0274      0.0298    1.088x
   3940258   204520   207382      0.0437      0.0457    1.046x
   5910387   306780   311073      0.0696      0.0722    1.037x
   8865571   460169   466609       0.109       0.112    1.028x
  13298347   690253   699913       0.168       0.193    1.149x
  19947511  1035379  1049869       0.253       0.316    1.249x
  29921257  1553067  1574803       0.414       0.479    1.157x
  44881876  2329600  2362204       0.664       0.833    1.255x
  67322814  3494400  3543306        1.05       1.241    1.182x
 100984221  5241599  5314959       1.718       2.121    1.235x
 151476322  7862398  7972438       2.717       3.211    1.182x
 227214483 11793597 11958657       4.281        5.49    1.282x
 340821715 17690395 17937985       6.861       8.584    1.251x
 511232563 26535591 26906977      10.845      13.094    1.207x

The radix division code is now quite competitive with our mpn/fmpz code and even slightly faster around 100000 digits, despite the Karp-Markstein trick not even being used yet. This hints that some decent improvements are possible for the mpn/fmpz code.

This PR also adds a profile program for computing the constants $\sqrt{2}$ and $e$ with radix fixed-point arithmetic and comparing these to the current Arb algorithms. (This will be more interesting with a future radix ball arithmetic type.)

Sample run:

$ build/radix/profile/p-constants 

10^5 digits (332193 bits; 5264 decimal limbs = 100016 digits)
Arb compute sqrt(2):     cpu/wall(s): 0.000494 0.000495
Arb to string:           cpu/wall(s): 0.00199 0.00199
[1.41421356237309504880{...99959 digits...}18377008180561014752 +/- 3.47e-100000]
Radix compute sqrt(2):   cpu/wall(s): 0.000506 0.000505
Radix to string:         cpu/wall(s): 3.39e-05 3.39e-05
1.41421356237309504880{...99959 digits...}1837700818056101475234303909623742186

10^6 digits (3321929 bits; 52632 decimal limbs = 1000008 digits)
Arb compute sqrt(2):     cpu/wall(s): 0.00538 0.00539
Arb to string:           cpu/wall(s): 0.026 0.026
[1.41421356237309504880{...999959 digits...}42044193016904841204 +/- 4.16e-1000000]
Radix compute sqrt(2):   cpu/wall(s): 0.00534 0.00534
Radix to string:         cpu/wall(s): 0.000341 0.000341
1.41421356237309504880{...999959 digits...}42044193016904841204390646280

10^7 digits (33219281 bits; 526316 decimal limbs = 10000004 digits)
Arb compute sqrt(2):     cpu/wall(s): 0.0638 0.0638
Arb to string:           cpu/wall(s): 0.376 0.376
[1.41421356237309504880{...9999959 digits...}41235727278721315897 +/- 2.55e-10000000]
Radix compute sqrt(2):   cpu/wall(s): 0.0707 0.0707
Radix to string:         cpu/wall(s): 0.00342 0.00342
1.41421356237309504880{...9999959 digits...}4123572727872131589714262

10^8 digits (332192810 bits; 5263158 decimal limbs = 100000002 digits)
Arb compute sqrt(2):     cpu/wall(s): 1.117 1.118
Arb to string:           cpu/wall(s): 5.108 5.108
[1.41421356237309504880{...99999959 digits...}23443287604232894971 +/- 2.30e-100000000]
Radix compute sqrt(2):   cpu/wall(s): 1.104 1.104
Radix to string:         cpu/wall(s): 0.0746 0.0745
1.41421356237309504880{...99999959 digits...}23443287604232894971132

10^9 digits (3321928095 bits; 52631579 decimal limbs = 1000000001 digits)
Arb compute sqrt(2):     cpu/wall(s): 15.605 15.607
Arb to string:           cpu/wall(s): 69.715 69.721
[1.41421356237309504880{...999999959 digits...}17477180528158585274 +/- 2.88e-1000000000]
Radix compute sqrt(2):   cpu/wall(s): 13.37 13.371
Radix to string:         cpu/wall(s): 0.663 0.663
1.41421356237309504880{...999999959 digits...}1747718052815858527422

10^5 digits (332193 bits; 5264 decimal limbs = 100016 digits)
Arb compute e:     cpu/wall(s): 0.00507 0.00507
Arb to string:     cpu/wall(s): 0.00217 0.00216
[2.71828182845904523536{...99959 digits...}05429107972100427166 +/- 1.85e-100000]
Radix compute e:   cpu/wall(s): 0.00543 0.00543
Radix to string:   cpu/wall(s): 3.54e-05 3.54e-05
2.71828182845904523536{...99959 digits...}0542910797210042716581577830892298892

10^6 digits (3321929 bits; 52632 decimal limbs = 1000008 digits)
Arb compute e:     cpu/wall(s): 0.057 0.057
Arb to string:     cpu/wall(s): 0.0279 0.0279
[2.71828182845904523536{...999959 digits...}01379817644769422819 +/- 1.17e-1000000]
Radix compute e:   cpu/wall(s): 0.0627 0.0627
Radix to string:   cpu/wall(s): 0.000354 0.000354
2.71828182845904523536{...999959 digits...}01379817644769422818883747115

10^7 digits (33219281 bits; 526316 decimal limbs = 10000004 digits)
Arb compute e:     cpu/wall(s): 0.675 0.676
Arb to string:     cpu/wall(s): 0.375 0.375
[2.71828182845904523536{...9999959 digits...}54442929856139670538 +/- 3.84e-10000000]
Radix compute e:   cpu/wall(s): 0.794 0.793
Radix to string:   cpu/wall(s): 0.00356 0.00356
2.71828182845904523536{...9999959 digits...}5444292985613967053761648

10^8 digits (332192810 bits; 5263158 decimal limbs = 100000002 digits)
Arb compute e:     cpu/wall(s): 8.9 8.902
Arb to string:     cpu/wall(s): 5.098 5.098
[2.71828182845904523536{...99999959 digits...}08296062831449211820 +/- 2.56e-100000000]
Radix compute e:   cpu/wall(s): 10.686 10.686
Radix to string:   cpu/wall(s): 0.0731 0.0731
2.71828182845904523536{...99999959 digits...}08296062831449211820255

10^9 digits (3321928095 bits; 52631579 decimal limbs = 1000000001 digits)
Arb compute e:     cpu/wall(s): 120.079 120.086
Arb to string:     cpu/wall(s): 68.829 68.833
[2.71828182845904523536{...999999959 digits...}95103567871085820019 +/- 1.62e-1000000000]
Radix compute e:   cpu/wall(s): 138.655 138.665
Radix to string:   cpu/wall(s): 0.67 0.67
2.71828182845904523536{...999999959 digits...}9510356787108582001916

For $\sqrt{2}$, radix is essentially as fast arb (suggesting that arb_sqrt indeed can be improved), and much faster when one includes the time to convert to decimal.

For $e$, radix is slightly slower than arb, but slightly faster overall when one includes the time to convert to decimal.

@fredrik-johansson fredrik-johansson merged commit 9812f32 into flintlib:main Feb 20, 2026
37 of 40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant