feat: add per-step timing instrumentation for non-GPU server store path #375
Draft
Copilot wants to merge 3 commits into
Conversation
Add per-step timing logs to the server-side non-GPU store path so that performance can be profiled comparably to the GPU IPC path.

non_gpu_transfer.py:
- [SRV-PREPARE-STORE]: times context/strategy lookup (resolve_keys) and strategy.prepare_store() call, logs strategy name (shm/pickle)
- [SRV-COMMIT-STORE]: times strategy.commit_store() call and total time since prepare, logs strategy name and token count
- Imports ShmTransferStrategy for isinstance strategy detection

server_transfer.py:
- Adds `import time`
- [PICKLE-COMMIT]: per-step breakdown of deserialize / reserve_write / copy_loop / finish_write with total (PickleTransferStrategy.commit_store)
- [SHM-PREPARE]: per-step breakdown of resolve_keys / reserve_write / slot-descriptor loop with total (ShmTransferStrategy.prepare_store)
- [SHM-COMMIT]: finish_write and total timing for the SHM fast path (ShmTransferStrategy.commit_store, non-fallback branch only)

All timing uses time.perf_counter(); all logs use %s/%.3f format strings.
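The per-step breakdown described for [PICKLE-COMMIT] can be illustrated with a small, self-contained sketch. This is not the PR's code: `commit_store_sketch`, the list-based destination, the stand-ins for reserve_write and finish_write, and the exact log message layout (including the millisecond unit) are all assumptions made for illustration. Only the use of `time.perf_counter()` and `%s`/`%.3f` format strings comes from the commit message.

```python
import logging
import pickle
import time

logger = logging.getLogger("timing_sketch")  # hypothetical logger name


def commit_store_sketch(blob: bytes, dest: list) -> dict:
    """Time each stage of a toy pickle-style commit; returns seconds per step."""
    t0 = time.perf_counter()
    chunks = pickle.loads(blob)      # deserialize
    t1 = time.perf_counter()
    dest.clear()                     # stand-in for reserve_write (assumption)
    t2 = time.perf_counter()
    for chunk in chunks:             # stand-in for the tensor copy loop
        dest.append(bytes(chunk))
    t3 = time.perf_counter()
    # stand-in for finish_write: nothing to finalize in this toy example
    t4 = time.perf_counter()

    timings = {
        "deserialize": t1 - t0,
        "reserve_write": t2 - t1,
        "copy_loop": t3 - t2,
        "finish_write": t4 - t3,
        "total": t4 - t0,
    }
    # Lazy %-style arguments: the message is only formatted if DEBUG is enabled.
    # The message layout and the ms unit are assumptions, not the PR's format.
    logger.debug(
        "[%s] deserialize=%.3f ms reserve_write=%.3f ms copy_loop=%.3f ms "
        "finish_write=%.3f ms total=%.3f ms",
        "PICKLE-COMMIT",
        *(timings[k] * 1e3 for k in
          ("deserialize", "reserve_write", "copy_loop", "finish_write", "total")),
    )
    return timings
```

Taking one `perf_counter()` reading at each stage boundary, rather than a start/stop pair per stage, means the per-step values add up to the total with no untimed gaps between steps.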
Add abstract strategy_name property to TransferStrategy base class, overridden as "pickle" in PickleTransferStrategy and "shm" in ShmTransferStrategy. Update non_gpu_transfer.py to call strategy.strategy_name directly, removing the isinstance coupling and the now-unnecessary ShmTransferStrategy import.
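A minimal sketch of the abstract-property approach follows. The class names carry a `Sketch` suffix because they are hypothetical stand-ins, not the repository's classes, and `describe` is an invented helper that shows a caller reading the name without any `isinstance` check.

```python
from abc import ABC, abstractmethod


class TransferStrategySketch(ABC):
    """Hypothetical stand-in for the TransferStrategy base class."""

    @property
    @abstractmethod
    def strategy_name(self) -> str:
        """Short label used in log lines."""


class PickleStrategySketch(TransferStrategySketch):
    @property
    def strategy_name(self) -> str:
        return "pickle"


class ShmStrategySketch(TransferStrategySketch):
    @property
    def strategy_name(self) -> str:
        return "shm"


def describe(strategy: TransferStrategySketch) -> str:
    # The caller depends only on the base-class contract, so it needs
    # no import of (or isinstance check against) any concrete strategy.
    return "strategy=%s" % strategy.strategy_name
```

Because the property is abstract, a subclass that forgets to define `strategy_name` cannot be instantiated, so a missing label surfaces at construction time rather than in a log line.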
Copilot AI changed the title from "[WIP] Add detailed timing instrumentation for non-GPU store path" to "feat: add per-step timing instrumentation for non-GPU server store path" on Jun 18, 2026.
The non-GPU (async data) server-side store path only emitted a single "Stored N tokens in X seconds" log with no per-step breakdown, unlike the GPU IPC path, which has detailed stage timings.

New log output

- `non_gpu_transfer.py` (outer module layer)
- `server_transfer.py` (pickle path)
- `server_transfer.py` (SHM path)

Changes

- `TransferStrategy(ABC)`: adds abstract `strategy_name` property; `PickleTransferStrategy` returns `"pickle"`, `ShmTransferStrategy` returns `"shm"`.
- `PickleTransferStrategy.commit_store()`: instruments `pickle.loads` → `reserve_write` → tensor copy loop → `finish_write` with `[PICKLE-COMMIT]`.
- `ShmTransferStrategy.prepare_store()`: instruments `resolve_obj_keys` → `reserve_write` → slot-descriptor loop with `[SHM-PREPARE]`.
- `ShmTransferStrategy.commit_store()`: instruments `finish_write` on the SHM fast path (non-fallback branch) with `[SHM-COMMIT]`; fallback goes through `[PICKLE-COMMIT]`.
- `NonGPUTransferModule.prepare_store()`: wraps `strategy.prepare_store()` with `[SRV-PREPARE-STORE]` (context-lookup + strategy-call breakdown).
- `NonGPUTransferModule.commit_store()`: wraps `strategy.commit_store()` with `[SRV-COMMIT-STORE]`, including `total_since_prepare` end-to-end latency.

All timings use `time.perf_counter()`; all log calls use `%s`/`%.3f` format strings. No functional logic changed.
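One way a `total_since_prepare` figure could be produced is to record a start timestamp at prepare time and subtract it at commit time. The sketch below assumes a per-request dictionary keyed by a request id; the PR does not say how the prepare timestamp is carried over, so `StoreTimingSketch`, `request_id`, and the injectable `clock` parameter are all hypothetical.

```python
import time


class StoreTimingSketch:
    """Hypothetical helper tracking elapsed time from prepare to commit."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock            # injectable so tests can use a fake clock
        self._prepare_started = {}     # request_id -> timestamp at prepare

    def prepare_store(self, request_id):
        # Record when prepare began for this request.
        self._prepare_started[request_id] = self._clock()

    def commit_store(self, request_id):
        """Return seconds since the matching prepare, or None if there was none."""
        now = self._clock()
        # pop() removes the entry, so finished requests do not accumulate.
        started = self._prepare_started.pop(request_id, None)
        return None if started is None else now - started
```

With a deterministic clock the behavior is easy to check:

```python
ticks = iter([1.0, 3.5])
timing = StoreTimingSketch(clock=lambda: next(ticks))
timing.prepare_store("req-1")
elapsed = timing.commit_store("req-1")   # 3.5 - 1.0 = 2.5 seconds
```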