Miscellaneous performance improvements by PerezHz · Pull Request #243 · PerezHz/TaylorIntegration.jl

PerezHz · 2026-06-07T22:35:26Z

This PR aims to improve the overall performance of the package, mainly from two angles:

On JT (jet transport) integrations, stepsize allocates noticeably. Here, we fix this by "taylorizing" this method by hand.
Avoid deepcopy and rely instead on TaylorSeries.identity!: this skips the extra work done by deepcopy that is not needed here, while keeping objects fully independent in memory where needed (i.e., avoiding aliasing) and allocating as little as possible.
Other scattered fixes.

PerezHz · 2026-06-07T22:43:23Z

This PR is not really dependent on JuliaDiff/TaylorSeries.jl#415, but that branch is checked out here temporarily to see how they work together.

PerezHz · 2026-06-07T22:45:00Z

Also, fixing JuliaDiff/TaylorSeries.jl#411 fixed a case here that previously errored in the tests, so that case can actually be tested now (see the changes around kepler1! in taylorize.jl).

coveralls · 2026-06-08T00:15:28Z

Coverage is 94.135% — jp/TS-PR-415 into main. No base build found for main.

lbenet · 2026-06-12T18:14:24Z

+    length(dest) == length(src) || return _stored_value(src)
+    TS.identity!(dest, src)
+    return dest
+end


I don't think I understand this function.

If the lengths are the same, which I would think is the common case, _stored_value is some sort of alias to _stored_taylor, which at the end of the day returns a new object(!) using identity!, which is identical to src. I don't see that dest is updated in this case. Moreover, if the lengths are not identical, identity! is also called. So, why not simply use identity! directly...

You're right that _stored_taylor was essentially an alias for _stored_value; I have removed the former. The difference is precisely the allocation of a new object; I'm trying to find an easier way to do this. The problem is that, even though the values of dest and src are known in the method body, the place (slot) in memory where dest will be saved is not known, so maybe this mechanism should be changed so that the slot is passed and updated accordingly.

I reorganized this part of the code a bit, but left _stored_value and _copy_value! for now, since they still represent two genuinely different operations: _stored_value(src) means "an independent stored value is needed, and there may not be a destination yet"; _copy_value!(dest, src) means "there is already an available destination in memory; reuse it if compatible". So yes, both paths use identity!, but one copies into fresh storage and the other copies into existing storage. I would argue that distinction is important enough to keep. The direct way to use this functionality is now _store_value!, which hides that choice:

_store_value!(dest_array, src, i)
_store_value!(dest_matrix, src, i, j)
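To make the distinction concrete, here is a minimal sketch that uses plain `Vector{Float64}` objects in place of Taylor series and `copyto!` in place of `identity!`. The names `fresh_copy`, `copy_into!` and `store_slot!` are hypothetical stand-ins for the three roles described above; they are not the package's helpers, and the real implementations may differ.

```julia
# Hypothetical stand-ins: plain vectors play the role of Taylor series objects.

# "An independent stored value is needed": always allocates fresh storage.
fresh_copy(src::Vector{Float64}) = copy(src)

# "A destination already exists": reuse it when compatible, otherwise fall
# back to a fresh copy. The caller must keep the returned object.
function copy_into!(dest::Vector{Float64}, src::Vector{Float64})
    length(dest) == length(src) || return fresh_copy(src)
    copyto!(dest, src)
    return dest
end

# Slot-based front end: picks one of the two paths and updates the slot.
function store_slot!(slots::Vector{Vector{Float64}}, src::Vector{Float64}, i::Int)
    if isassigned(slots, i)
        slots[i] = copy_into!(slots[i], src)
    else
        slots[i] = fresh_copy(src)
    end
    return slots[i]
end

slots = Vector{Vector{Float64}}(undef, 2)   # both slots start unassigned
src = [1.0, 2.0, 3.0]

store_slot!(slots, src, 1)                  # empty slot: fresh storage
first_buffer = slots[1]
store_slot!(slots, [4.0, 5.0, 6.0], 1)      # compatible slot: reused in place
```

Passing the slot index lets the helper handle both the "no destination yet" case and the "reuse the destination" case without the caller having to know which one applies.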

lbenet · 2026-06-12T18:14:44Z


+When `copy_solution` is `Val(false)`, the returned solution borrows views of the
+given arrays. When `copy_solution` is `Val(true)`, `t`, `x`, `p`, `tevents`,
+`xevents` and `gresids` are copied into independent owned storage.


I would guess that the views are more performant, so what's a use case for the Val(true) case?

This is mainly about object ownership: taylorinteg! (with the bang !) lets users reuse caches, i.e., it uses a shared-memory model. Since the .psol field is part of the cache, reusing that cache to construct another solution overwrites the psol of the previous solution with the new one. In that case we want copy_solution = Val(true), which essentially says: reuse caches, but construct independent solution objects that "own" all their fields (so an independent copy is made when the solution object is constructed).

The case copy_solution = Val(false) means users will not necessarily reuse the cache (e.g., when using taylorinteg, the non-bang version); psol in the cache can then be shared with the corresponding field in the solution object, and it is safe to alias (i.e., not allocate).

There are probably other approaches to this problem, but this was a good compromise I found between not using deepcopy for everything and still being able to reuse caches for propagation (in NEOs.jl caches are reused a lot, and for good reason).
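The ownership issue can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCache`, `finish` and `toy_solve!` are hypothetical names invented for illustration, and a plain vector stands in for psol. It does not reproduce the package's internals.

```julia
# Toy model: a cache that owns a reusable buffer, and a "solver" that fills it.
struct ToyCache
    psol::Vector{Float64}
end

# Val(false): the returned solution aliases the cache buffer (no allocation).
finish(cache::ToyCache, ::Val{false}) = cache.psol
# Val(true): the returned solution owns an independent copy.
finish(cache::ToyCache, ::Val{true}) = copy(cache.psol)

function toy_solve!(cache::ToyCache, x0::Float64; copy_solution::Val = Val(false))
    for i in eachindex(cache.psol)
        cache.psol[i] = x0 + i          # overwrites the shared buffer
    end
    return finish(cache, copy_solution)
end

cache = ToyCache(zeros(3))

aliased = toy_solve!(cache, 0.0)                             # borrows cache.psol
owned = toy_solve!(cache, 10.0; copy_solution = Val(true))   # independent copy
toy_solve!(cache, 20.0)                                      # reuse the cache again

println("aliased: ", aliased)   # reflects the latest run, not the first one
println("owned:   ", owned)     # still holds the result of the second run
```

In this toy, the aliased result silently changes each time the cache is reused, while the owned result keeps its values at the cost of one extra allocation per solve.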

See e.g. here

lbenet · 2026-06-12T18:17:57Z

I see this is still a work in progress, so I left a few comments on things whose motivation I do not understand. Have you run any benchmarks to see the improvements?

PerezHz · 2026-06-13T02:55:24Z

@lbenet thank you for your comments; to help clarify how JuliaDiff/TaylorSeries.jl#415 affects TaylorIntegration I have opened #245, which contains the minimal changes in TI needed because of JuliaDiff/TaylorSeries.jl#415. I propose to merge those other two PRs, and then we can come back to this one, which includes further changes. Sorry for the back and forth.

PerezHz · 2026-06-14T20:22:29Z

@lbenet I have prepared benchmarks using the Kepler problem for three cases:

current main
this PR, with copy_solution = Val(false): reuse caches, solutions share memory
this PR, with copy_solution = Val(true): reuse caches, and copy (allocate) an independent solution for each integration; this is what current main does, but with an expensive deepcopy

Every integration uses 1,000 steps, a Taylor expansion of order 20 in time, and JT order 2, saving dense output.

These benchmarks can be run with the following Julia code in a fresh session with TaylorIntegration and BenchmarkTools installed, and can be run directly on either branch (main or this PR):

using TaylorIntegration
using BenchmarkTools

@taylorize function kepler!(dq, q, p, t)
    r2 = q[1]^2 + q[2]^2
    r3 = r2^1.5

    dq[1] = q[3]
    dq[2] = q[4]
    dq[3] = -q[1] / r3
    dq[4] = -q[2] / r3

    return nothing
end

const order = 20
const abstol = 1.0e-18
const reltol = 0.0
const t0 = 0.0
const tf = 31.39          # gives 1000 accepted steps with these settings
const maxsteps = 2_000
const varorder = 2

const q0 = [0.2, 0.0, 0.0, 3.0]

dq = variables!("xi", numvars = 4, order = varorder, nowarn = true)

const q0TN = q0 .+ dq

make_cache() = TaylorIntegration.init_cache(
    Val(true),
    t0,
    q0TN,
    maxsteps,
    order,
    kepler!,
    nothing;
    parse_eqs = true,
)

# run functions for each case
function run_main_or_default!(cache)
    TaylorIntegration.taylorinteg!(
        Val(true), kepler!, q0TN, t0, tf, abstol, cache, nothing;
        maxsteps = maxsteps,
        reltol = reltol,
    )
end
function run_copy_false!(cache)
    TaylorIntegration.taylorinteg!(
        Val(true), kepler!, q0TN, t0, tf, abstol, cache, nothing;
        maxsteps = maxsteps,
        reltol = reltol,
        copy_solution = Val(false),
    )
end
function run_copy_true!(cache)
    TaylorIntegration.taylorinteg!(
        Val(true), kepler!, q0TN, t0, tf, abstol, cache, nothing;
        maxsteps = maxsteps,
        reltol = reltol,
        copy_solution = Val(true),
    )
end

# run function selector based on copy_solution kwarg
function has_copy_solution()
    cache = make_cache()
    try
        run_copy_false!(cache)
        return true
    catch err
        err isa MethodError || rethrow()
        return false
    end
end

# useful info
println("TaylorIntegration source: ", pathof(TaylorIntegration))
println("copy_solution keyword available: ", has_copy_solution())

if has_copy_solution()
    cache_false = make_cache()
    sol_false = run_copy_false!(cache_false)
    println("Val(false): steps = ", length(sol_false.t) - 1)
    @btime run_copy_false!($cache_false) samples = 5 evals = 1

    cache_true = make_cache()
    sol_true = run_copy_true!(cache_true)
    println("Val(true):  steps = ", length(sol_true.t) - 1)
    @btime run_copy_true!($cache_true) samples = 5 evals = 1
else
    cache_main = make_cache()
    sol_main = run_main_or_default!(cache_main)
    println("main/default: steps = ", length(sol_main.t) - 1)
    @btime run_main_or_default!($cache_main) samples = 5 evals = 1
end

On my laptop I have obtained the following results:

Case	Time	Allocations	Memory
main	421.613 ms	3,871,504	170.56 MiB
current branch, copy_solution = Val(false)	119.937 ms	31,078	1.74 MiB
current branch, copy_solution = Val(true)	155.796 ms	1,179,139	76.50 MiB

Relative to main:

Case	Time	Allocations	Memory
branch Val(false)	~3.5x faster	~125x fewer	~98x less
branch Val(true)	~2.7x faster	~3.3x fewer	~2.2x less
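The relative figures follow directly from the absolute results; a short sketch that recomputes them is given here. The variable and function names (`on_main`, `val_false`, `val_true`, `improvement`) are chosen only for this illustration.

```julia
# Figures copied from the absolute-results table above.
on_main   = (time_ms = 421.613, allocs = 3_871_504, mem_mib = 170.56)
val_false = (time_ms = 119.937, allocs = 31_078,    mem_mib = 1.74)
val_true  = (time_ms = 155.796, allocs = 1_179_139, mem_mib = 76.50)

# Ratio of the main figure to the branch figure for each metric.
improvement(base, new) = (
    time = base.time_ms / new.time_ms,
    allocs = base.allocs / new.allocs,
    mem = base.mem_mib / new.mem_mib,
)

r_false = improvement(on_main, val_false)
r_true = improvement(on_main, val_true)

for (label, r) in (("Val(false)", r_false), ("Val(true)", r_true))
    println(label, ": ",
        round(r.time; digits = 1), "x faster, ",
        round(r.allocs; digits = 1), "x fewer allocations, ",
        round(r.mem; digits = 1), "x less memory")
end
```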

EDIT: see this comment with benchmarks for NEOs.gravityonly!.

PerezHz · 2026-06-14T21:11:34Z

@lbenet thank you for your comments; I tried to simplify some of the redundant code in the new internal helper functions and to document things better; I also added benchmarks. This is now ready for review.

PerezHz · 2026-06-15T01:39:28Z

I ran similar benchmarks for NEOs.gravityonly!, and found the following (1,000 steps):

For Float64 initial conditions:

Case	Time	Allocations	Memory
main	901.456 ms	242005	30.06 MiB
current branch, copy_solution = Val(false)	895.824 ms	188002	25.76 MiB
current branch, copy_solution = Val(true)	903.788 ms	194011	27.28 MiB

For order 2 JT initial conditions:

Case	Time	Allocations	Memory
main	12.990 s	14204052	975.27 MiB
current branch, copy_solution = Val(false)	12.544 s	7176104	636.48 MiB
current branch, copy_solution = Val(true)	12.492 s	8964179	793.84 MiB

Given these results, my read is the following: the equations of motion for the Kepler problem are "simple" enough that runtime is dominated by dense output storage and allocation rather than by TaylorSeries arithmetic operations and other computations in jetcoeffs!; hence the improvements seen in those benchmarks. The equations of motion in NEOs.jl are far more computationally intensive, i.e., those problems are dominated by computations in jetcoeffs!, so the runtime improvements are much smaller. That said, the changes are still worth having, since memory usage improves by roughly 9-35% in the tables above.
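The memory reduction relative to main can be computed from the two NEOs.gravityonly! tables above. The helper name `saving_pct` is hypothetical and used only in this sketch.

```julia
# Percentage reduction in memory relative to main (values in MiB, from the tables).
saving_pct(main_mib, branch_mib) = 100 * (1 - branch_mib / main_mib)

float64_false = saving_pct(30.06, 25.76)     # Float64, copy_solution = Val(false)
float64_true  = saving_pct(30.06, 27.28)     # Float64, copy_solution = Val(true)
jt_false      = saving_pct(975.27, 636.48)   # order 2 JT, copy_solution = Val(false)
jt_true       = saving_pct(975.27, 793.84)   # order 2 JT, copy_solution = Val(true)

for (label, pct) in (
    ("Float64, Val(false)", float64_false),
    ("Float64, Val(true)", float64_true),
    ("JT, Val(false)", jt_false),
    ("JT, Val(true)", jt_true),
)
    println(rpad(label, 22), round(pct; digits = 1), " % less memory than main")
end
```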

PerezHz added 10 commits June 6, 2026 16:03

Test TS PR 415

be19961

Remove test that no longer errors

7a60d50

Optimize dense copies

7ca650c

Optimize state storage

8c06ec0

Optimize root storage

27dafe1

Optimize stepsize

d926dbc

Update tests

56fd6f9

Fix dense psol storage; add regression test

3159ffa

Implement safe reusable psol cache option

0436167

Add docstrings for new methods

b6d2be2

Fix docs

783ec11

Update ci.yml

70c53b9

PerezHz mentioned this pull request Jun 12, 2026

Improve Taylor1 performance JuliaDiff/TaylorSeries.jl#415

Merged

lbenet reviewed Jun 12, 2026

Comment thread test/many_ode.jl Outdated

PerezHz marked this pull request as draft June 13, 2026 02:30

PerezHz mentioned this pull request Jun 13, 2026

Minimal updates due to JuliaDiff/TaylorSeries.jl#415 #245

Merged

PerezHz changed the title ~~Miscellaneous performance improvements~~ WIP: Miscellaneous performance improvements Jun 13, 2026

PerezHz and others added 6 commits June 13, 2026 14:17

Improve documentation of dense output and storage model

d5c625f

Test TS PR 415

ca349e5

Remove test that no longer errors

df9a463

Add regression test

064aa9e

Import test from #243

9dbdf06

Merge remote-tracking branch 'origin/jp/misc-perf-improvements' into …

b4cb01c

…jp/TS-PR-415

PerezHz added 5 commits June 13, 2026 20:04

Merge branch 'main' into jp/TS-PR-415

0adbf0a

Undo unwanted changes in ci.yml

ea6ce98

Remove _stored_taylor in favor of _stored_value

4be375b

Use nowarn=true in all variables! calls in tests

c79c711

Update docstrings

18b27bf

PerezHz added 3 commits June 14, 2026 16:33

Add test for re-used caches, independent solution case

5c2dea6

Remove _store_taylor!; add _store_value!

f9b706d

Fix tests (suggestion by @lbenet)

41d11a3

PerezHz marked this pull request as ready for review June 14, 2026 21:09

Bump patch version

0e71323

PerezHz changed the title ~~WIP: Miscellaneous performance improvements~~ Miscellaneous performance improvements Jun 14, 2026
