Skip to content

move wake-stage to (sync) prim.#1854

Open
Will Dietz (dtzSiFive) wants to merge 2 commits into
feature/multi-wakefrom
feature/stage-as-prim
Open

move wake-stage to (sync) prim.#1854
Will Dietz (dtzSiFive) wants to merge 2 commits into
feature/multi-wakefrom
feature/stage-as-prim

Conversation

@dtzSiFive
Copy link
Copy Markdown
Contributor

@dtzSiFive Will Dietz (dtzSiFive) commented May 14, 2026

The job overhead for wake-stage's trivial amount of work amplifies costs
especially of startup latency when sourcing many files.

Additionally, the point of staging is to get the required information
ASAP so as to not race with concurrent modifications.

Putting into job queue is more overhead than just doing the reflink
directly, and as a primitive we ensure the staging is done immediately
not stuck behind a queue of hashing jobs.

This was especially noticeable with concurrent runs (multi-wake).

Issue diagnosed with help of wake --ps! :)

Assisted by Claude.


1347089 shows wake-stage as a "move" which is the bulk of the work being done here; I couldn't help but unify this code to preferring wcl::result instead of exceptions while touching which apparently crosses the line.

The job overhead for wake-stage's trivial amount of work amplifies costs
especially of startup latency when sourcing many files.

Additionally, the point of staging is to get the required information
ASAP so as to not race with concurrent modifications.

Putting into job queue is more overhead than just doing the reflink
directly, and as a primitive we ensure the staging is done immediately
not stuck behind a queue of hashing jobs.

This was especially noticeable with concurrent runs (multi-wake).

Issue diagnosed with help of `wake --ps`! :)
@dtzSiFive
Copy link
Copy Markdown
Contributor Author

Compared to v49 on a "source every file in a large repo" pathological (if not representative or necessarily disproving possible regressions in other situations) test case:

Nuking cache/db between runs:

Benchmark 1: mwprim-single
  Time (mean ± σ):     74.711 s ±  2.151 s    [User: 121.918 s, System: 79.676 s]
  Range (min … max):   72.555 s … 78.342 s    10 runs

Benchmark 2: w49-single
  Time (mean ± σ):     109.129 s ±  1.928 s    [User: 221.944 s, System: 138.339 s]
  Range (min … max):   107.217 s … 112.847 s    10 runs

Nuking cache/db between batches and priming with warmup run:

Benchmark 1: mwprim-single
  Time (mean ± σ):     14.928 s ±  0.214 s    [User: 10.183 s, System: 4.539 s]
  Range (min … max):   14.722 s … 15.478 s    10 runs

Benchmark 2: w49-single
  Time (mean ± σ):     23.402 s ±  0.463 s    [User: 18.275 s, System: 4.887 s]
  Range (min … max):   22.964 s … 24.239 s    10 runs

With this change, wake (with WAKE_CAS=1, of course) does considerably less work (user/system) in less wall time.

Benchmark was this (quoted for passing to hyperfine):

'WAKE_CAS=1 wake -x "sources \".\" \`.*\`|rmap len"'

RETURN(claim_result(runtime.heap, false, err));
};

std::stringstream paths_stream(std::string(paths_arg->c_str(), paths_arg->size()));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted in detail elsewhere, the input path here both uses an arguably inappropriate separator character (more importantly, inconsistent with elsewhere) and makes many copies of the string in full or in part while processing.

If not in this PR, then in a follow-up:

  1. Use null-terminator separator, including one at end if needed explicitly
  2. Don't use stringstream (!)
  3. Split into std::string_views using string::find (-> memchr)

One neat bonus of this approach besides being zero-copy is that we can pass C-strings (null-terminated) to syscalls directly without copying!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant