Miscellaneous performance improvements#243
Conversation
|
This PR is not really dependent on JuliaDiff/TaylorSeries.jl#415, but that branch is checked out here temporarily to see how they work together. |
|
And fixing JuliaDiff/TaylorSeries.jl#411 did cause a error test here to be fixed, so that case can actually be tested now (see changes around |
| length(dest) == length(src) || return _stored_value(src) | ||
| TS.identity!(dest, src) | ||
| return dest | ||
| end |
There was a problem hiding this comment.
I don't think I understand this function.
If the lengths are the same, which I would think is the common case, _stored_value is some sort of alias to _stored_taylor, which at the end of the day returns a new object(!) using identity!, which is identical to src. I don't see that dest is updated in this case. Moreover, if the lengths are not identical, identity! is also called. So, why not simply use identity! directly...
There was a problem hiding this comment.
You're right that _stored_taylor was essentially an alias for _stored_value; I have removed the former. The difference is precisely the allocation of a new object; I'm trying to find an easier way to do this. The problem is, even two in the method body the values of dest and src are known, the place (slot) in memory where dest will be saved is not known, so maybe this mechanism should be changed so that the slot is passed and updated accordingly.
There was a problem hiding this comment.
I reorganized this part of the code a bit, but left _stored_value and _copy_value! for now since still represent two genuinely different operations: _stored_value(src) means "an independent stored value is needed, and there may not be a destination yet"; _copy_value!(dest, src) means "there's already a available destination in memory; reuse it if compatible". So yes, both paths use identity!, but one copies into fresh storage and the other copies into existing storage. I would argue that distinction is important enough to keep. The direct way to use this functionality is now _store_value!, which hides that choice:
_store_value!(dest_array, src, i)
_store_value!(dest_matrix, src, i, j)|
|
||
| When `copy_solution` is `Val(false)`, the returned solution borrows views of the | ||
| given arrays. When `copy_solution` is `Val(true)`, `t`, `x`, `p`, `tevents`, | ||
| `xevents` and `gresids` are copied into independent owned storage. |
There was a problem hiding this comment.
I would guess that the views are more performant, so what's an use case for the Val(true) case?
There was a problem hiding this comment.
This is mainly related to object ownership: taylorinteg! (with the bang !) allows users to reuse caches, i.e., it uses a shared memory model, but since the .psol field is included in the cache, when that cache is reused to construct another solution, psol of the previous solution is overwritten with the new solution. So in this case we want copy_solution = Val(true), which is essentially saying reuse caches, but construct independent solution objects which "own" all their fields (thus, an independent copy is made at solution object construction).
The case copy_solution = Val(false) means users will not necessarily reuse the cache (e.g., by using taylorinteg (non-banged ! version), and thus psol in the cache can be shared with the corresponding field in the solution object, and in that case it's safe to alias (i.e., not allocate).
There's probably other approaches to this problem, but this was a good compromise I found between not using deepcopy for everything, while being able to reuse caches for propagation, (in NEOs.jl caches are reused a lot, and for good reason).
|
I see this is still work in process, so I left few comments of things I do not understand the motivation of. Have you performed some benchmarks, to see the improvements? |
|
@lbenet thank you for your comments; to help clarify how JuliaDiff/TaylorSeries.jl#415 affects TaylorIntegration I have opened #245, which contains the minimal changes in TI needed because of JuliaDiff/TaylorSeries.jl#415. I propose to merge those other two PRs, and then we can come back to this one, which includes further changes. Sorry for the back and forth. |
* Test TS PR 415 * Remove test that no longer errors * Add regression test * Import test from #243 * Fix test * Update ci.yml; bump patch version fix TS compat
|
@lbenet I have prepared benchmarks using the Kepler problem on three cases:
Every integration is done with 1,000 steps, order 20 of expansion wrt time, and JT order 2, saving dense output. These benchmarks can be run using the following julia code on a fresh session with TaylorIntegration and BenchmarkTools installed, and can be run directly on either branch ( using TaylorIntegration
using BenchmarkTools
@taylorize function kepler!(dq, q, p, t)
r2 = q[1]^2 + q[2]^2
r3 = r2^1.5
dq[1] = q[3]
dq[2] = q[4]
dq[3] = -q[1] / r3
dq[4] = -q[2] / r3
return nothing
end
const order = 20
const abstol = 1.0e-18
const reltol = 0.0
const t0 = 0.0
const tf = 31.39 # gives 1000 accepted steps with these settings
const maxsteps = 2_000
const varorder = 2
const q0 = [0.2, 0.0, 0.0, 3.0]
dq = variables!("xi", numvars = 4, order = varorder, nowarn = true)
const q0TN = q0 .+ dq
make_cache() = TaylorIntegration.init_cache(
Val(true),
t0,
q0TN,
maxsteps,
order,
kepler!,
nothing;
parse_eqs = true,
)
# run functions for each case
function run_main_or_default!(cache)
TaylorIntegration.taylorinteg!(
Val(true), kepler!, q0TN, t0, tf, abstol, cache, nothing;
maxsteps = maxsteps,
reltol = reltol,
)
end
function run_copy_false!(cache)
TaylorIntegration.taylorinteg!(
Val(true), kepler!, q0TN, t0, tf, abstol, cache, nothing;
maxsteps = maxsteps,
reltol = reltol,
copy_solution = Val(false),
)
end
function run_copy_true!(cache)
TaylorIntegration.taylorinteg!(
Val(true), kepler!, q0TN, t0, tf, abstol, cache, nothing;
maxsteps = maxsteps,
reltol = reltol,
copy_solution = Val(true),
)
end
# run function selector based on copy_solution kwarg
function has_copy_solution()
cache = make_cache()
try
run_copy_false!(cache)
return true
catch err
err isa MethodError || rethrow()
return false
end
end
# useful info
println("TaylorIntegration source: ", pathof(TaylorIntegration))
println("copy_solution keyword available: ", has_copy_solution())
if has_copy_solution()
cache_false = make_cache()
sol_false = run_copy_false!(cache_false)
println("Val(false): steps = ", length(sol_false.t) - 1)
@btime run_copy_false!($cache_false) samples = 5 evals = 1
cache_true = make_cache()
sol_true = run_copy_true!(cache_true)
println("Val(true): steps = ", length(sol_true.t) - 1)
@btime run_copy_true!($cache_true) samples = 5 evals = 1
else
cache_main = make_cache()
sol_main = run_main_or_default!(cache_main)
println("main/default: steps = ", length(sol_main.t) - 1)
@btime run_main_or_default!($cache_main) samples = 5 evals = 1
endOn my laptop I have obtained the following results:
Relative to
EDIT: see this comment with benchmarks for |
|
@lbenet thank you for your comments; I tried to simplify some of the redundant code in the new internal helper functions and tried document things better; also added benchmarks. This is now ready for review |
|
I ran similar benchmarks for
Given these results, my read is the following: the equations of motion for Kepler problem are "simple" enough that runtime is dominated by dense output storage and allocation rather that TaylorSeries arithmetic operations and other computations in |
This PR aims at improving overall performance of the package from mostly two different angles:
stepsizenoticeably allocates. Here, we fix this by "taylorizing" by hand this method.deepcopyand instead rely onTaylorSeries.identity!: this avoids extra work bydeepcopynot needed here, while maintaning full independency of objects in memory where needed (i.e., avoiding aliasing) and minimally allocating as much as possible.