Skip to content

Add comprehensive profiling instrumentation to async_data.py#371

Draft
Copilot wants to merge 3 commits into
copilot/ww24-pr-async-againfrom
copilot/profilingasync-data-timing
Draft

Add comprehensive profiling instrumentation to async_data.py#371
Copilot wants to merge 3 commits into
copilot/ww24-pr-async-againfrom
copilot/profilingasync-data-timing

Conversation

Copilot AI commented Jun 16, 2026

Copy link
Copy Markdown

Adds time.perf_counter() timing to both the forward thread and background gather/commit thread in AsyncDataTransferContext to pinpoint where latency is spent in the async KV store path.

Forward thread — [FWD <request_id>] log per submit_store() call

Field Measures
lock inflight_lock acquire + _is_closing check
prepare prepare_store()
block_ids _single_group_block_ids()
alloc SHM view lookup or pinned staging allocation
submit commit_executor.submit()
bookkeep inflight_commits add + callback registration
total full forward thread time (_t_entry_t_exit)

Background thread — [BG <request_id>] log per gather/commit task

thread_start · stream_enter · event_wait · gather_launch · record · stream_exit · lock · sync · commit · total

Also instrumented

  • flush_inflight_gathers() — logs event count and sync duration
  • close() — logs gather drain and executor shutdown durations

Fixes

  • import time moved to module level (was inside submit_store)
  • Fixed missing total argument in [FWD] log call (format string had 10 specifiers, only 9 args — would raise TypeError at runtime)
  • used_shm_direct (outer scope) used in log instead of _used_shm_direct (closure-captured copy)

No functional changes.

@hlin99

hlin99 commented Jun 16, 2026

Copy link
Copy Markdown
Owner

(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] EngineCore encountered a fatal error.
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] Traceback (most recent call last):
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1158, in run_engine_core
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] engine_core.run_busy_loop()
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1199, in run_busy_loop
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] self._process_engine_step()
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1238, in _process_engine_step
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] outputs, model_executed = self.step_fn()
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 499, in step_with_batch_queue
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] exec_future = self.model_executor.execute_model(
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 119, in execute_model
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] output.result()
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] return self.__get_result()
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] raise self._exception
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 97, in collective_rpc
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 510, in run_method
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] return func(*args, **kwargs)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 345, in execute_model
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] return self.worker.execute_model(scheduler_output)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] return func(*args, **kwargs)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 841, in execute_model
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] output = self.model_runner.execute_model(
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] return func(*args, **kwargs)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4213, in execute_model
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] with (
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/lib/python3.12/contextlib.py", line 144, in exit
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] next(self.gen)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/kv_connector_model_runner_mixin.py", line 107, in _get_kv_connector_output
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] kv_connector.wait_for_save()
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/workspace/LMCache/lmcache/integration/vllm/lmcache_mp_connector.py", line 745, in wait_for_save
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] self.worker_adapter.batched_submit_store_requests(
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/workspace/LMCache/lmcache/integration/vllm/vllm_multi_process_adapter.py", line 1263, in batched_submit_store_requests
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] self.submit_store_request(request_id, op, event, cache_salt=salt)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/workspace/LMCache/lmcache/integration/vllm/vllm_multi_process_adapter.py", line 1177, in submit_store_request
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] future = self.transfer_ctx.submit_store(
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/workspace/LMCache/lmcache/v1/multiprocess/transfer_context/async_data.py", line 314, in submit_store
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] logger.info(
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/lib/python3.12/logging/init.py", line 1539, in info
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] self._log(INFO, msg, args, **kwargs)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/lib/python3.12/logging/init.py", line 1684, in _log
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] self.handle(record)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/lib/python3.12/logging/init.py", line 1700, in handle
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] self.callHandlers(record)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/lib/python3.12/logging/init.py", line 1762, in callHandlers
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] hdlr.handle(record)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/lib/python3.12/logging/init.py", line 1028, in handle
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] self.emit(record)
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/opentelemetry/sdk/_logs/_internal/init.py", line 617, in emit
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] logger.emit(self._translate(record))
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/local/lib/python3.12/dist-packages/opentelemetry/sdk/_logs/_internal/init.py", line 591, in _translate
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] body = record.getMessage()
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] File "/usr/lib/python3.12/logging/init.py", line 392, in getMessage
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] msg = msg % self.args
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] ~~~~^~~~~~~~~~~
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] TypeError: not enough arguments for format string
(EngineCore pid=108285) [2026-06-16 08:21:43,582] LMCache INFO: Unregistering kv caches (vllm_multi_process_adapter.py:1449:lmcache.integration.vllm.vllm_multi_process_adapter)
(APIServer pid=108019) ERROR 06-16 08:21:43 [async_llm.py:704] AsyncLLM output_handler failed.
(APIServer pid=108019) ERROR 06-16 08:21:43 [async_llm.py:704] Traceback (most recent call last):
(APIServer pid=108019) ERROR 06-16 08:21:43 [async_llm.py:704] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 660, in output_handler
(APIServer pid=108019) ERROR 06-16 08:21:43 [async_llm.py:704] outputs = await engine_core.get_output_async()
(APIServer pid=108019) ERROR 06-16 08:21:43 [async_llm.py:704] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=108019) ERROR 06-16 08:21:43 [async_llm.py:704] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 1030, in get_output_async
(APIServer pid=108019) ERROR 06-16 08:21:43 [async_llm.py:704] raise self._format_exception(outputs) from None
(APIServer pid=108019) ERROR 06-16 08:21:43 [async_llm.py:704] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=108019) INFO: 127.0.0.1:47432 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error

@copilot

Copilot AI changed the title [WIP] Add comprehensive profiling instrumentation to async_data.py Add comprehensive profiling instrumentation to async_data.py Jun 16, 2026
Copilot AI requested a review from hlin99 June 16, 2026 08:09

Copilot AI commented Jun 16, 2026

Copy link
Copy Markdown
Author

(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] EngineCore encountered a fatal error.
(EngineCore pid=108285) ERROR 06-16 08:21:43 [core.py:1167] Traceback (most recent call last):
(Engi...

This was already fixed in commit 5d74074 — the [FWD] log call was missing the total argument (10 specifiers but only 9 args). The current HEAD has (_t_exit - _t_entry) * 1000 as the 8th positional arg, making the count correct. The error you hit was from the intermediate commit 5101739 that had the bug. Please pull the latest and retry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants