
Miscellaneous performance improvements #243

Open
PerezHz wants to merge 27 commits into main from jp/TS-PR-415


PerezHz commented Jun 7, 2026

Owner

This PR aims to improve the overall performance of the package, mainly from two angles:

  • In jet transport (JT) integrations, the stepsize method allocates noticeably. We fix this by "taylorizing" this method by hand.
  • Avoid deepcopy and instead rely on TaylorSeries.identity!: this avoids extra work done by deepcopy that is not needed here, while keeping objects fully independent in memory where needed (i.e., avoiding aliasing) and allocating as little as possible.
  • Other scattered fixes.
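A minimal sketch of the allocation difference behind the second bullet, using plain `Vector{Float64}` values as stand-ins for Taylor coefficients. It uses Base's `deepcopy` and `copyto!`, not `TaylorSeries.identity!`, and `bytes_deepcopy`/`bytes_inplace` are hypothetical helpers; the point is only that copying into storage you already own can avoid the per-call allocation that `deepcopy` pays.

```julia
# Hypothetical helpers: plain vectors stand in for Taylor coefficient storage.

# deepcopy route: builds a brand-new object on every call
function bytes_deepcopy(src::Vector{Float64})
    deepcopy(src)                       # warm-up call, so compilation is not measured
    return @allocated deepcopy(src)
end

# in-place route: overwrite storage that already exists and has the right size
function bytes_inplace(dest::Vector{Float64}, src::Vector{Float64})
    copyto!(dest, src)                  # warm-up call
    return @allocated copyto!(dest, src)
end

src  = rand(21)                         # e.g. the 21 coefficients of an order-20 expansion
dest = zeros(21)                        # preallocated destination of the same length

println("bytes allocated by deepcopy: ", bytes_deepcopy(src))
println("bytes allocated by copyto!:  ", bytes_inplace(dest, src))
```

In a long integration the in-place route pays its allocation once, when `dest` is created, rather than on every step.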

PerezHz commented Jun 7, 2026

Owner Author

This PR is not really dependent on JuliaDiff/TaylorSeries.jl#415, but that branch is checked out here temporarily to see how they work together.

PerezHz commented Jun 7, 2026

Owner Author

Fixing JuliaDiff/TaylorSeries.jl#411 also fixed a test here that previously expected an error, so that case can now actually be tested (see the changes around kepler1! in taylorize.jl).

coveralls commented Jun 8, 2026

Coverage is 94.135% (jp/TS-PR-415 into main). No base build found for main.

```julia
# excerpt shown in the review thread; the start of the function is not included
    length(dest) == length(src) || return _stored_value(src)
    TS.identity!(dest, src)
    return dest
end
```
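Read literally, the excerpt takes the in-place branch when the lengths match and falls back to fresh storage only when they differ. The sketch below mimics that control flow with plain vectors; `copyto!` and `copy` stand in for `TS.identity!` and `_stored_value`, and the `*_sketch` names are hypothetical, not the PR's code.

```julia
# Hypothetical stand-ins for the helpers in the excerpt:
#   stored_value_sketch(src)       ~ fresh, independent storage
#   copy_value_sketch!(dest, src)  ~ reuse dest when it is compatible
stored_value_sketch(src::Vector{Float64}) = copy(src)

function copy_value_sketch!(dest::Vector{Float64}, src::Vector{Float64})
    # incompatible storage: fall back to a freshly allocated copy
    length(dest) == length(src) || return stored_value_sketch(src)
    # compatible storage: overwrite dest in place (copyto! plays the role of identity!)
    copyto!(dest, src)
    return dest
end

src = [1.0, 2.0, 3.0]

dest_same = zeros(3)
r1 = copy_value_sketch!(dest_same, src)   # same length: dest_same is updated and returned

dest_diff = zeros(5)
r2 = copy_value_sketch!(dest_diff, src)   # different length: new object, dest_diff untouched
```

Because the fallback returns a new object instead of mutating `dest`, the caller must keep the return value, which is why knowing where the result will be stored matters.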

Collaborator

I don't think I understand this function.

If the lengths are the same, which I would think is the common case, _stored_value is some sort of alias to _stored_taylor, which at the end of the day returns a new object(!) using identity!, which is identical to src. I don't see that dest is updated in this case. Moreover, if the lengths are not identical, identity! is also called. So, why not simply use identity! directly...

Owner Author

You're right that _stored_taylor was essentially an alias for _stored_value; I have removed the former. The difference is precisely the allocation of a new object; I'm trying to find a simpler way to do this. The problem is that, even though the values of dest and src are known in the method body, the place (slot) in memory where dest will be stored is not, so maybe this mechanism should be changed so that the slot is passed and updated accordingly.

Owner Author

I reorganized this part of the code a bit, but left _stored_value and _copy_value! for now, since they still represent two genuinely different operations: _stored_value(src) means "an independent stored value is needed, and there may not be a destination yet"; _copy_value!(dest, src) means "there is already an available destination in memory; reuse it if compatible". So yes, both paths use identity!, but one copies into fresh storage and the other copies into existing storage. I would argue that distinction is important enough to keep. The direct way to use this functionality is now _store_value!, which hides that choice:

```julia
_store_value!(dest_array, src, i)
_store_value!(dest_matrix, src, i, j)
```
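Passing the container and index, as `_store_value!` does, lets the helper see whether the slot already holds usable storage. The sketch below is a hypothetical illustration of that idea with a vector of plain vectors; `store_value_sketch!` is not the PR's implementation, and the two-index form would do the same with `dest[i, j]`.

```julia
# Hypothetical slot-aware store: reuse the slot's storage when it exists and fits,
# otherwise put a fresh, independent copy into the slot.
function store_value_sketch!(dest::AbstractVector{Vector{Float64}}, src::Vector{Float64}, i::Int)
    if isassigned(dest, i) && length(dest[i]) == length(src)
        copyto!(dest[i], src)          # existing, compatible storage: reuse it
    else
        dest[i] = copy(src)            # no (compatible) destination yet: store a fresh copy
    end
    return dest
end

slots = Vector{Vector{Float64}}(undef, 2)   # both slots start unassigned
x = [1.0, 2.0]

store_value_sketch!(slots, x, 1)            # slot 1 receives a fresh copy
first_storage = slots[1]
x .= [3.0, 4.0]
store_value_sketch!(slots, x, 1)            # slot 1's existing storage is reused
```

The caller never has to decide between the "fresh" and "reuse" paths; the slot itself carries the information.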

Comment thread src/rootfinding.jl

When `copy_solution` is `Val(false)`, the returned solution borrows views of the
given arrays. When `copy_solution` is `Val(true)`, `t`, `x`, `p`, `tevents`,
`xevents` and `gresids` are copied into independent owned storage.

Collaborator

I would guess that the views are more performant, so what's a use case for the Val(true) case?

PerezHz Jun 13, 2026

Owner Author

This is mainly about object ownership. taylorinteg! (the banged version) allows users to reuse caches, i.e., it uses a shared-memory model. But since the .psol field is part of the cache, when that cache is reused to construct another solution, the psol of the previous solution is overwritten by the new one. So in this case we want copy_solution = Val(true), which essentially says: reuse caches, but construct independent solution objects that "own" all their fields (i.e., an independent copy is made when the solution object is constructed).

The case copy_solution = Val(false) means users will not necessarily reuse the cache (e.g., when using taylorinteg, the non-banged version), so psol in the cache can be shared with the corresponding field in the solution object; in that case it's safe to alias (i.e., not allocate).

There are probably other approaches to this problem, but this was a good compromise I found between not using deepcopy for everything and still being able to reuse caches for propagation (in NEOs.jl caches are reused a lot, and for good reason).
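The ownership argument can be sketched with a toy cache whose buffer every run overwrites. `DemoCache`, `make_solution` and `integrate_demo!` are hypothetical and only mimic the `psol` situation described above with plain vectors; they are not TaylorIntegration's types or API.

```julia
# Hypothetical toy cache: one buffer that every "integration" overwrites.
struct DemoCache
    psol::Vector{Float64}
end

# Val(false): borrow a view of the cache buffer (the data itself is not copied)
make_solution(cache::DemoCache, n::Int, ::Val{false}) = view(cache.psol, 1:n)
# Val(true): copy into independent storage owned by the solution
make_solution(cache::DemoCache, n::Int, ::Val{true}) = cache.psol[1:n]

# Pretend integration: fill the first n entries of the shared buffer, then build a solution.
function integrate_demo!(cache::DemoCache, value::Float64, n::Int, copy_solution::Val)
    fill!(view(cache.psol, 1:n), value)
    return make_solution(cache, n, copy_solution)
end

cache = DemoCache(zeros(4))

borrowed = integrate_demo!(cache, 1.0, 4, Val(false))   # aliases cache.psol
owned    = integrate_demo!(cache, 2.0, 4, Val(true))    # independent copy of this run
integrate_demo!(cache, 3.0, 4, Val(false))              # reusing the cache again
```

After the third run, `borrowed` silently shows the third run's values, while `owned` still holds the values from its own run.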

Owner Author

See e.g. here

Comment thread test/many_ode.jl Outdated
lbenet commented Jun 12, 2026

Collaborator

I see this is still a work in progress, so I left a few comments on things whose motivation I don't understand. Have you run any benchmarks to see the improvements?

PerezHz marked this pull request as draft June 13, 2026 02:30

PerezHz commented Jun 13, 2026

Owner Author

@lbenet thank you for your comments; to help clarify how JuliaDiff/TaylorSeries.jl#415 affects TaylorIntegration I have opened #245, which contains the minimal changes in TI needed because of JuliaDiff/TaylorSeries.jl#415. I propose to merge those other two PRs, and then we can come back to this one, which includes further changes. Sorry for the back and forth.

PerezHz changed the title from "Miscellaneous performance improvements" to "WIP: Miscellaneous performance improvements" Jun 13, 2026
PerezHz added a commit that referenced this pull request Jun 13, 2026
* Test TS PR 415

* Remove test that no longer errors

* Add regression test

* Import test from #243

* Fix test

* Update ci.yml; bump patch version fix TS compat
PerezHz commented Jun 14, 2026

Owner Author

@lbenet I have prepared benchmarks using the Kepler problem for three cases:

  1. current main
  2. this PR, with copy_solution = Val(false): reuse caches; solutions share memory
  3. this PR, with copy_solution = Val(true): reuse caches, and copy (allocate) an independent solution for each integration. This is what current main does, but main uses the expensive deepcopy

Each integration takes 1,000 steps, with Taylor expansion order 20 in time, jet transport (JT) order 2, and dense output saved.

The benchmarks can be run with the following Julia code in a fresh session with TaylorIntegration and BenchmarkTools installed; the same script runs unchanged on either branch (main or this PR):

```julia
using TaylorIntegration
using BenchmarkTools

@taylorize function kepler!(dq, q, p, t)
    r2 = q[1]^2 + q[2]^2
    r3 = r2^1.5

    dq[1] = q[3]
    dq[2] = q[4]
    dq[3] = -q[1] / r3
    dq[4] = -q[2] / r3

    return nothing
end

const order = 20
const abstol = 1.0e-18
const reltol = 0.0
const t0 = 0.0
const tf = 31.39          # gives 1000 accepted steps with these settings
const maxsteps = 2_000
const varorder = 2

const q0 = [0.2, 0.0, 0.0, 3.0]

# jet transport: perturb each initial condition with its own TaylorN variable
dq = variables!("xi", numvars = 4, order = varorder, nowarn = true)

const q0TN = q0 .+ dq

make_cache() = TaylorIntegration.init_cache(
    Val(true),
    t0,
    q0TN,
    maxsteps,
    order,
    kepler!,
    nothing;
    parse_eqs = true,
)

# run functions for each case
function run_main_or_default!(cache)
    TaylorIntegration.taylorinteg!(
        Val(true), kepler!, q0TN, t0, tf, abstol, cache, nothing;
        maxsteps = maxsteps,
        reltol = reltol,
    )
end
function run_copy_false!(cache)
    TaylorIntegration.taylorinteg!(
        Val(true), kepler!, q0TN, t0, tf, abstol, cache, nothing;
        maxsteps = maxsteps,
        reltol = reltol,
        copy_solution = Val(false),
    )
end
function run_copy_true!(cache)
    TaylorIntegration.taylorinteg!(
        Val(true), kepler!, q0TN, t0, tf, abstol, cache, nothing;
        maxsteps = maxsteps,
        reltol = reltol,
        copy_solution = Val(true),
    )
end

# detect whether the loaded TaylorIntegration accepts the copy_solution keyword
# (on main, passing it throws a MethodError)
function has_copy_solution()
    cache = make_cache()
    try
        run_copy_false!(cache)
        return true
    catch err
        err isa MethodError || rethrow()
        return false
    end
end

# run the detection once and reuse the result
const HAS_COPY_SOLUTION = has_copy_solution()

# useful info
println("TaylorIntegration source: ", pathof(TaylorIntegration))
println("copy_solution keyword available: ", HAS_COPY_SOLUTION)

# each case: one warm-up run (also reports the step count), then the benchmark
if HAS_COPY_SOLUTION
    cache_false = make_cache()
    sol_false = run_copy_false!(cache_false)
    println("Val(false): steps = ", length(sol_false.t) - 1)
    @btime run_copy_false!($cache_false) samples = 5 evals = 1

    cache_true = make_cache()
    sol_true = run_copy_true!(cache_true)
    println("Val(true):  steps = ", length(sol_true.t) - 1)
    @btime run_copy_true!($cache_true) samples = 5 evals = 1
else
    cache_main = make_cache()
    sol_main = run_main_or_default!(cache_main)
    println("main/default: steps = ", length(sol_main.t) - 1)
    @btime run_main_or_default!($cache_main) samples = 5 evals = 1
end
```

On my laptop I have obtained the following results:

| Case | Time | Allocations | Memory |
|---|---|---|---|
| main | 421.613 ms | 3,871,504 | 170.56 MiB |
| current branch, copy_solution = Val(false) | 119.937 ms | 31,078 | 1.74 MiB |
| current branch, copy_solution = Val(true) | 155.796 ms | 1,179,139 | 76.50 MiB |

Relative to main:

| Case | Time | Allocations | Memory |
|---|---|---|---|
| branch Val(false) | ~3.5x faster | ~125x fewer | ~98x less |
| branch Val(true) | ~2.7x faster | ~3.3x fewer | ~2.2x less |
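The "relative to main" ratios follow directly from the first table. This short Julia snippet recomputes them from the reported numbers only (nothing is re-measured); the variable names are arbitrary.

```julia
# Numbers copied from the Kepler benchmark table (time in ms, memory in MiB).
main_run  = (time = 421.613, allocs = 3_871_504, mem = 170.56)
val_false = (time = 119.937, allocs = 31_078,    mem = 1.74)
val_true  = (time = 155.796, allocs = 1_179_139, mem = 76.50)

# Ratio main / branch for each metric: values above 1 mean the branch does better.
ratios(b) = (speedup      = main_run.time / b.time,
             fewer_allocs = main_run.allocs / b.allocs,
             less_mem     = main_run.mem / b.mem)

r_false = ratios(val_false)
r_true  = ratios(val_true)
println("Val(false): ", r_false)
println("Val(true):  ", r_true)
```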

EDIT: see this comment with benchmarks for NEOs.gravityonly!.

PerezHz marked this pull request as ready for review June 14, 2026 21:09

PerezHz commented Jun 14, 2026

Owner Author

@lbenet thank you for your comments; I simplified some of the redundant code in the new internal helper functions, tried to document things better, and added benchmarks. This is now ready for review.

PerezHz changed the title from "WIP: Miscellaneous performance improvements" to "Miscellaneous performance improvements" Jun 14, 2026

PerezHz commented Jun 15, 2026

Owner Author

I ran similar benchmarks for NEOs.gravityonly!, and found the following (1,000 steps):

  • For Float64 initial conditions:

| Case | Time | Allocations | Memory |
|---|---|---|---|
| main | 901.456 ms | 242,005 | 30.06 MiB |
| current branch, copy_solution = Val(false) | 895.824 ms | 188,002 | 25.76 MiB |
| current branch, copy_solution = Val(true) | 903.788 ms | 194,011 | 27.28 MiB |

  • For order 2 JT initial conditions:

| Case | Time | Allocations | Memory |
|---|---|---|---|
| main | 12.990 s | 14,204,052 | 975.27 MiB |
| current branch, copy_solution = Val(false) | 12.544 s | 7,176,104 | 636.48 MiB |
| current branch, copy_solution = Val(true) | 12.492 s | 8,964,179 | 793.84 MiB |

Given these results, my read is the following: the equations of motion for the Kepler problem are simple enough that runtime is dominated by dense-output storage and allocation rather than by TaylorSeries arithmetic and the other computations in jetcoeffs!; hence the large improvements in the Kepler benchmarks. The equations of motion in NEOs.jl are far more computationally intensive, i.e., those problems are dominated by the computations in jetcoeffs!, so the runtime changes there are small (within about 4%). That said, the changes are still worth having, since memory usage drops by roughly 9-35%, depending on the case.
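The runtime and memory figures for NEOs.gravityonly! can be checked with plain arithmetic on the two tables. This is only a small Julia script over the reported numbers; the variable names are arbitrary.

```julia
# Values copied from the NEOs.gravityonly! tables.
# Float64 times are in ms, JT times in s; only ratios within one table are used.
f64 = (main      = (t = 901.456, mem = 30.06),
       val_false = (t = 895.824, mem = 25.76),
       val_true  = (t = 903.788, mem = 27.28))
jt  = (main      = (t = 12.990, mem = 975.27),
       val_false = (t = 12.544, mem = 636.48),
       val_true  = (t = 12.492, mem = 793.84))

# Percent change relative to main (negative means a reduction).
pct(new, old) = 100 * (new - old) / old

for (label, r) in (("Float64", f64), ("JT order 2", jt)), case in (:val_false, :val_true)
    b = getfield(r, case)
    println(rpad(label, 11), rpad(string(case), 10),
            "time ", round(pct(b.t, r.main.t), digits = 1), "%  ",
            "memory ", round(pct(b.mem, r.main.mem), digits = 1), "%")
end
```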

