feat: add per-step timing instrumentation for non-GPU server store path #375

Draft
Copilot wants to merge 3 commits into copilot/ww24-pr-async-again from copilot/add-detailed-timing-non-gpu

Copilot AI commented Jun 18, 2026

Previously, the non-GPU (async data) server-side store path emitted only a single "Stored N tokens in X seconds" log line with no per-step breakdown, unlike the GPU IPC path, which logs detailed stage timings. This PR adds per-step timing logs to the non-GPU path.

New log output

non_gpu_transfer.py — outer module layer:

[SRV-PREPARE-STORE] req=<id> resolve_keys=0.012 prepare=1.234 total=1.246 ms (strategy=shm)
[SRV-COMMIT-STORE]  req=<id> commit=0.987 total_since_prepare=2.345 ms (strategy=shm, num_tokens=256)

server_transfer.py — pickle path:

[PICKLE-COMMIT] req=<id> deserialize=0.543 reserve_write=0.211 copy_loop=1.102 finish_write=0.089 total=1.945 ms (num_chunks=4)

server_transfer.py — SHM path:

[SHM-PREPARE] req=<id> resolve_keys=0.234 reserve_write=0.156 slots=0.043 total=0.433 ms (num_slots=4)
[SHM-COMMIT]  req=<id> finish_write=0.078 total=0.134 ms (num_keys=4)
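
For comparing these lines against the GPU IPC timings, the key=value fields can be pulled out of a log file with a small parser. The sketch below is not part of this PR; `parse_timing_line` is a hypothetical helper, and it assumes the lines keep exactly the layout shown above (a hyphenated upper-case tag in brackets followed by `key=value` tokens), possibly preceded by a logger prefix.

```python
import re

# Hypothetical helper, not part of the PR: assumes the tag/key=value layout shown above.
TAG_RE = re.compile(r"\[([A-Z]+(?:-[A-Z]+)+)\]")   # e.g. [SHM-PREPARE]
FIELD_RE = re.compile(r"(\w+)=([^\s,()]+)")         # key=value, also inside "( ... )"
TEXT_KEYS = {"req", "strategy"}                     # fields kept as strings


def parse_timing_line(line):
    """Return (tag, fields) for one timing log line; numeric fields become floats."""
    tag_match = TAG_RE.search(line)
    if tag_match is None:
        raise ValueError(f"not a timing line: {line!r}")
    fields = {}
    # Only look at text after the tag so a logger prefix cannot contribute fields.
    for key, raw in FIELD_RE.findall(line[tag_match.end():]):
        fields[key] = raw if key in TEXT_KEYS else float(raw)
    return tag_match.group(1), fields


tag, fields = parse_timing_line(
    "[SHM-PREPARE] req=abc resolve_keys=0.234 reserve_write=0.156 "
    "slots=0.043 total=0.433 ms (num_slots=4)"
)
print(tag, fields["total"])
```

All timing fields are in milliseconds, so the parsed values can be aggregated directly per tag.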

Changes

  • TransferStrategy (ABC): adds abstract strategy_name property; PickleTransferStrategy returns "pickle", ShmTransferStrategy returns "shm".
  • PickleTransferStrategy.commit_store(): instruments pickle.loads → reserve_write → tensor copy loop → finish_write with [PICKLE-COMMIT].
  • ShmTransferStrategy.prepare_store(): instruments resolve_obj_keys → reserve_write → slot-descriptor loop with [SHM-PREPARE].
  • ShmTransferStrategy.commit_store(): instruments finish_write on the SHM fast path (non-fallback branch) with [SHM-COMMIT]; fallback goes through [PICKLE-COMMIT].
  • NonGPUTransferModule.prepare_store(): wraps strategy.prepare_store() with [SRV-PREPARE-STORE] (context-lookup + strategy-call breakdown).
  • NonGPUTransferModule.commit_store(): wraps strategy.commit_store() with [SRV-COMMIT-STORE] including total_since_prepare end-to-end latency.
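
The strategy_name change in the first bullet can be sketched as follows. This is a minimal illustration, not the code in the diff: the real classes also define prepare_store(), commit_store(), and other members that are omitted here, and `describe` is a hypothetical caller standing in for the logging code in non_gpu_transfer.py.

```python
from abc import ABC, abstractmethod


class TransferStrategy(ABC):
    """Stripped-down stand-in for the real base class (other methods omitted)."""

    @property
    @abstractmethod
    def strategy_name(self) -> str:
        """Short label that appears in the timing logs."""


class PickleTransferStrategy(TransferStrategy):
    @property
    def strategy_name(self) -> str:
        return "pickle"


class ShmTransferStrategy(TransferStrategy):
    @property
    def strategy_name(self) -> str:
        return "shm"


def describe(strategy: TransferStrategy) -> str:
    # Hypothetical caller: no isinstance() check and no import of a concrete class.
    return "strategy=%s" % strategy.strategy_name


print(describe(ShmTransferStrategy()))
```

Because the property is abstract, a new strategy subclass that forgets to define strategy_name cannot be instantiated, so the log label cannot silently go missing.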

All timings use time.perf_counter(); all log calls use %s/%.3f format strings. No functional logic changed.
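
As a rough illustration of that pattern (not the code in this PR), the sketch below times two steps with time.perf_counter() and logs them with lazy %-style arguments. The function `timed_commit`, the `[EXAMPLE-COMMIT]` tag, and the byte-copy loop standing in for the tensor copy are all assumptions made for the example.

```python
import logging
import pickle
import time

logger = logging.getLogger(__name__)


def timed_commit(req_id, payload):
    """Hypothetical example: deserialize, copy, and log per-step timings in ms."""
    t0 = time.perf_counter()
    chunks = pickle.loads(payload)
    t1 = time.perf_counter()
    copied = [bytes(chunk) for chunk in chunks]  # stand-in for the tensor copy loop
    t2 = time.perf_counter()

    # perf_counter() returns seconds; multiply by 1e3 to report milliseconds.
    # Passing values as arguments defers string formatting until the record is emitted.
    logger.info(
        "[EXAMPLE-COMMIT] req=%s deserialize=%.3f copy_loop=%.3f total=%.3f ms (num_chunks=%d)",
        req_id,
        (t1 - t0) * 1e3,
        (t2 - t1) * 1e3,
        (t2 - t0) * 1e3,
        len(copied),
    )
    return copied


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    timed_commit("demo", pickle.dumps([b"ab", b"cd"]))
```

time.perf_counter() is monotonic and high-resolution, which makes it a better fit for sub-millisecond intervals than time.time().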

Copilot AI added 2 commits June 18, 2026 02:45
Add per-step timing logs to the server-side non-GPU store path so that
performance can be profiled comparably to the GPU IPC path.

non_gpu_transfer.py:
- [SRV-PREPARE-STORE]: times context/strategy lookup (resolve_keys) and
  strategy.prepare_store() call, logs strategy name (shm/pickle)
- [SRV-COMMIT-STORE]: times strategy.commit_store() call and total time
  since prepare, logs strategy name and token count
- Imports ShmTransferStrategy for isinstance strategy detection

server_transfer.py:
- Adds `import time`
- [PICKLE-COMMIT]: per-step breakdown of deserialize / reserve_write /
  copy_loop / finish_write with total (PickleTransferStrategy.commit_store)
- [SHM-PREPARE]: per-step breakdown of resolve_keys / reserve_write /
  slot-descriptor loop with total (ShmTransferStrategy.prepare_store)
- [SHM-COMMIT]: finish_write and total timing for the SHM fast path
  (ShmTransferStrategy.commit_store, non-fallback branch only)

All timing uses time.perf_counter(); all logs use %s/%.3f format strings.

Add abstract strategy_name property to TransferStrategy base class,
overridden as "pickle" in PickleTransferStrategy and "shm" in
ShmTransferStrategy. Update non_gpu_transfer.py to call
strategy.strategy_name directly, removing the isinstance coupling and
the now-unnecessary ShmTransferStrategy import.
Copilot AI changed the title from "[WIP] Add detailed timing instrumentation for non-GPU store path" to "feat: add per-step timing instrumentation for non-GPU server store path" Jun 18, 2026
Copilot AI requested a review from hlin99 June 18, 2026 02:47