Skip to content

Replace per-step store timing logs with unified StoreTimer utility#376

Draft
Copilot wants to merge 4 commits into
copilot/ww24-pr-async-againfrom
copilot/create-store-timer-util
Draft

Replace per-step store timing logs with unified StoreTimer utility#376
Copilot wants to merge 4 commits into
copilot/ww24-pr-async-againfrom
copilot/create-store-timer-util

Conversation

Copilot AI commented Jun 18, 2026

Copy link
Copy Markdown

The four store paths (GPU IPC C++ ops, GPU IPC Python ops, Async SHM, Async Pickle) each emitted verbose per-step logger.info timing lines ([FWD ...], [BG ...], [GPU-STORE], [PICKLE-COMMIT], [SHM-PREPARE], etc.) unconditionally at INFO level, adding noise and time.perf_counter() overhead on every request in production.

Changes

New: lmcache/v1/multiprocess/store_timer.py

StoreTimer — zero-overhead timing utility gated on logger.isEnabledFor(logging.DEBUG) at construction. No timestamps are taken in production. Emits a single [STORE-TIMING] debug line with four standardized metrics across all paths:

[STORE-TIMING] req=<id> path=<gpu_ipc|shm|pickle>
    fwd_return=X copy_submitted=X kv_releasable=X e2e_complete=X ms

Metrics:

  • fwd_return — time until the vLLM forward thread is unblocked
  • copy_submitted — time until all GPU copies are enqueued and CPU returns
  • kv_releasable — time until GPU has finished reading paged KV (safe to evict)
  • e2e_complete — time until data is committed and retrievable

Integration

  • gpu_transfer.pypath="gpu_ipc": marks fwd_return at entry (FWD already returned on server side), copy_submitted after event.record(), kv_releasable at gpu_sync point, e2e_complete after submit_callback_to_stream for finish_write
  • async_data.pypath="shm" or "pickle": marks fwd_return when submit_store returns to the forward thread; background closure marks copy_submitted after stream exits, kv_releasable after gather_done.synchronize(), e2e_complete after commit_store()

Cleanup

  • Removes all verbose per-step logger.info timing blocks from gpu_transfer.py, async_data.py, non_gpu_transfer.py, and server_transfer.py
  • Removes now-unused time.perf_counter() variables; removes import time from server_transfer.py
  • Retains existing "Stored %d tokens in %.3f seconds" info-level logs in gpu_transfer.py and non_gpu_transfer.py

Copilot AI changed the title [WIP] Add StoreTimer utility for unified timing logs Replace per-step store timing logs with unified StoreTimer utility Jun 18, 2026
Copilot AI requested a review from hlin99 June 18, 2026 03:27
* Initial plan

* Fix StoreTimer placement for async stores

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants