
Use binary search for the S(alpha,beta) interval lookup in thermr #401

Open: ramic-k wants to merge 1 commit into njoy:develop from ramic-k:perf/thermr-sig-binary-search


ramic-k commented Jun 7, 2026


What

sig() located the bracketing α and β intervals with linear scans on
every evaluation:

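```fortran
! beta scan (the alpha scan is analogous): ib ends as the first i
! with bbb < beta(i+1), or nb1 if the loop runs to completion
do i=1,nb1
   ib=i
   if (bbb.lt.beta(i+1)) exit
enddo
```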

For modern fine-grid tabulations these scans dominate THERMR runtime:
profiling a coherent-crystal evaluation with a 399 α × 5001 β law put
~70% of all samples inside the two scan loops. This PR replaces both
with binary searches that select exactly the same interval (the first
i for which the query value is < grid(i+1), or n-1 if there is none),
making the lookup O(log n) instead of O(n) per evaluation.
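A minimal standalone sketch of the equivalence, in Fortran to match the excerpt above. The module and function names (`interval_search`, `find_linear`, `find_binary`) and the `kr` kind parameter are hypothetical and are not the identifiers used in the actual diff; `grid` stands in for `alpha` or `beta`, and the sketch assumes a non-decreasing grid with at least two points.

```fortran
module interval_search
   ! Hypothetical illustration only; names are not taken from NJOY.
   implicit none
   integer, parameter :: kr = selected_real_kind(12)
contains

   ! Linear scan with the same logic as the loop in sig():
   ! first i in 1..n-1 with x < grid(i+1), else n-1.
   integer function find_linear(x, grid, n) result(ib)
      integer, intent(in) :: n
      real(kr), intent(in) :: x, grid(n)
      integer :: i
      ib = 1
      do i = 1, n-1
         ib = i
         if (x .lt. grid(i+1)) exit
      end do
   end function find_linear

   ! Binary search returning the same index. On a non-decreasing grid the
   ! predicate x < grid(i+1) is false for small i and true for large i,
   ! so we bisect for the first i where it becomes true.
   integer function find_binary(x, grid, n) result(ib)
      integer, intent(in) :: n
      real(kr), intent(in) :: x, grid(n)
      integer :: lo, hi, mid
      lo = 1
      hi = n - 1
      do while (lo < hi)
         mid = (lo + hi) / 2
         if (x .lt. grid(mid+1)) then
            hi = mid       ! predicate true: answer is mid or to its left
         else
            lo = mid + 1   ! predicate false: answer is to the right
         end if
      end do
      ib = lo   ! if no i satisfies the predicate, lo saturates at n-1
   end function find_binary

end module interval_search
```

Because the predicate can only flip from false to true as i increases, bisecting for the first true index reproduces the linear scan's choice, including queries equal to a grid point and the saturation at n-1.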

Effect

  • Output is bit-for-bit identical: the selected (ia, ib) pair is the
    same for every query, including the equality and saturation edge
    cases, so all downstream arithmetic is unchanged. Verified by
    byte-for-byte comparison of the full RECONR/BROADR/THERMR/ACER output
    chain on three different TSL tapes (incoherent and coherent laws,
    grids from 399×400 to 399×5001).
  • On the small tabulations in the repo test suite the change is
    performance-neutral, and the suite results are identical with and
    without it (same pass/fail sets on macOS/arm64).
  • On the 399×5001 research tape above, THERMR wall time dropped from
    39 to 23 minutes at -O0 (the scans shrink from ~70% of samples to a
    negligible share; the remaining time is interpolation and
    quadrature). Large-grid tapes need nwscr enlarged to fit; that is
    independent of this change.
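As a rough illustration of the O(log n) claim, here is the worst-case number of grid comparisons per lookup. This assumes one comparison per loop iteration in both versions and n-1 candidate intervals for an n-point grid:

```math
\begin{aligned}
C_{\mathrm{lin}}(n) &= n - 1, & C_{\mathrm{bin}}(n) &= \lceil \log_2 (n - 1) \rceil \\
\beta,\ n = 5001:\quad C_{\mathrm{lin}} &= 5000, & C_{\mathrm{bin}} &= 13 \\
\alpha,\ n = 399:\quad C_{\mathrm{lin}} &= 398, & C_{\mathrm{bin}} &= 9
\end{aligned}
```

These are per-lookup comparison counts only. Wall time also includes the interpolation and quadrature noted above.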

No new dependencies, no interface changes; the two loop replacements
and two scratch integers are the entire diff.

sig() located the bracketing alpha and beta intervals with linear
scans on every evaluation. For modern tabulations with hundreds to
thousands of grid points the scans dominate THERMR runtime (~70% of
profile samples on a 399x5001-point law). Replace both scans with
binary searches that select exactly the same interval (first i with
value < grid(i+1), saturating at n-1). Output is bit-for-bit
identical; on a 399x5001-point tape THERMR wall time drops from 39 to
23 minutes at -O0.