move wake-stage to (sync) prim. #1854
Because wake-stage does only a trivial amount of work, the job overhead amplifies costs, especially startup latency when sourcing many files. Additionally, the point of staging is to capture the required information as soon as possible so as not to race with concurrent modifications. Putting it into the job queue costs more than just doing the reflink directly, and as a primitive the staging is guaranteed to happen immediately rather than getting stuck behind a queue of hashing jobs. This was especially noticeable with concurrent runs (multi-wake). Issue diagnosed with help of `wake --ps`! :)
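The description doesn't include wake-stage's code, so as a rough, Linux-only illustration of how little work "just doing the reflink" is, here is a sketch assuming the `FICLONE` ioctl (see `ioctl_ficlone(2)`). `reflink_file` is a hypothetical name, not wake's API, and the real primitive may stage files differently.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <linux/fs.h>  // FICLONE
#include <sys/ioctl.h>
#include <unistd.h>

// Hypothetical helper (not wake's API): stage `src` at `dst` by reflinking it,
// i.e. asking the filesystem to share src's data blocks with dst instead of
// copying bytes. Linux-only; assumes a reflink-capable filesystem such as
// Btrfs or XFS (with reflink enabled). Returns 0 on success or an errno value.
int reflink_file(const char *src, const char *dst) {
  assert(src != nullptr && dst != nullptr);

  int in = open(src, O_RDONLY | O_CLOEXEC);
  if (in < 0) return errno;

  int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
  if (out < 0) {
    int err = errno;
    close(in);
    return err;
  }

  // The whole "copy" is this one ioctl: dst now shares src's extents.
  // Filesystems without reflink support fail here (typically EOPNOTSUPP,
  // or EXDEV when src and dst are on different mounted filesystems).
  int err = ioctl(out, FICLONE, in) < 0 ? errno : 0;

  close(out);
  close(in);
  if (err != 0) unlink(dst);  // don't leave an empty stub behind
  return err;
}
```

Two opens and one ioctl are cheap enough that wrapping them in a queued job can plausibly cost more than the work itself, and running them synchronously captures the file's contents at call time rather than whenever a worker gets to it.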
Compared to v49 on a pathological "source every file in a large repo" test case (not necessarily representative, and not ruling out regressions in other situations):

Nuking cache/db between runs:

Nuking cache/db between batches and priming with a warmup run:

With this change, wake (with `WAKE_CAS=1`, of course) does considerably less work (user/system time) in less wall time.

Benchmark was this (quoted for passing to hyperfine):
```cpp
  RETURN(claim_result(runtime.heap, false, err));
};

std::stringstream paths_stream(std::string(paths_arg->c_str(), paths_arg->size()));
```
As noted in detail elsewhere, the input paths here use an arguably inappropriate separator character (more importantly, one inconsistent with what's used elsewhere), and processing them makes many full or partial copies of the string.

If not in this PR, then in a follow-up:
- Use a NUL separator, explicitly including one at the end if needed
- Don't use `stringstream` (!)
- Split into `std::string_view`s using `string::find` (-> `memchr`)

Besides being zero-copy, one neat bonus of this approach is that we can pass the (NUL-terminated) C strings to syscalls directly without copying!
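The suggested follow-up could look roughly like this sketch, assuming the paths arrive as one buffer with a `'\0'` after every path, including the last. `split_nul_separated` and `as_c_str` are hypothetical names, not existing wake functions, and the sketch ignores how the buffer is obtained from `paths_arg`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: split a buffer of NUL-separated paths, where every path
// (including the last) is followed by '\0', into views into that buffer.
// No path bytes are copied; only small (pointer, length) views are stored.
// Returns std::nullopt if the final path is missing its terminator.
std::optional<std::vector<std::string_view>>
split_nul_separated(std::string_view buf) {
  std::vector<std::string_view> paths;
  std::size_t start = 0;
  while (start < buf.size()) {
    // find() on a single char is typically implemented with memchr.
    std::size_t end = buf.find('\0', start);
    if (end == std::string_view::npos) return std::nullopt;
    paths.emplace_back(buf.data() + start, end - start);
    start = end + 1;
  }
  return paths;
}

// Each view produced above is immediately followed by '\0' in the original
// buffer, so its data() is already a valid C string for open(2), stat(2), etc.
// Only valid for views from split_nul_separated, while that buffer is alive.
inline const char *as_c_str(std::string_view path) {
  assert(path.data()[path.size()] == '\0');
  return path.data();
}
```

The views are only valid as long as the underlying buffer (e.g. the string behind `paths_arg`) stays alive and unmodified, so it must outlive any syscalls made with them.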
Assisted by Claude.
Commit 1347089 shows wake-stage as a "move", which is the bulk of the work done here. While touching this code I couldn't help but unify it to prefer `wcl::result` over exceptions, which apparently crosses the line.