feat(storage): add write-back to local CPU for non-blocking get paths by jooho-XCENA · Pull Request #3 · jooho-XCENA/LMCache

jooho-XCENA · 2026-03-31T06:24:52Z

get_non_blocking: add a done callback that writes fetched data back to LocalCPUBackend, matching the existing get() behavior
prefetch_single_done_callback: write prefetched data back to LocalCPUBackend after the async prefetch completes
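
The done-callback pattern described above can be sketched roughly as follows. This is an illustration only, not LMCache's code: `FakeLocalCPU`, `get_non_blocking_with_write_back`, and the dict-backed remote store are hypothetical stand-ins. The only idea taken from the description is attaching a callback to the returned future.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Optional


class FakeLocalCPU:
    """Hypothetical stand-in for a local CPU cache tier."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self.store[key] = value


def get_non_blocking_with_write_back(
    executor: ThreadPoolExecutor,
    remote: Dict[str, bytes],
    local_cpu: FakeLocalCPU,
    key: str,
) -> Future:
    """Fetch `key` asynchronously and copy a hit into the local tier."""
    future = executor.submit(remote.get, key)

    def _write_back(fut: Future) -> None:
        value: Optional[bytes] = fut.result()
        if value is not None:  # a miss leaves the local tier untouched
            local_cpu.put(key, value)

    future.add_done_callback(_write_back)
    return future
```

The caller still receives the future right away. The write-back runs once the fetch has finished, so the non-blocking path ends up populating the local tier the same way a blocking get would.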

What this PR does / why we need it:

Special notes for your reviewers:

If applicable:

this PR contains user facing changes - docs added
this PR contains unit tests

- get_non_blocking: add done callback to write-back fetched data to LocalCPUBackend, matching existing get() behavior
- prefetch_single_done_callback: write-back prefetched data to LocalCPUBackend after async prefetch completes
Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

Align error handling with prefetch_single_done_callback for consistency. Prevents unhandled exceptions in Future callbacks. Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
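
A guarded callback of the kind this commit describes might look like the sketch below. The helper name `make_guarded_callback` and the log message are assumptions made for illustration; the actual callback in the PR may be structured differently.

```python
import logging
from concurrent.futures import Future
from typing import Callable

logger = logging.getLogger(__name__)


def make_guarded_callback(write_back: Callable[[bytes], None]) -> Callable[[Future], None]:
    """Wrap `write_back` so a failure is logged instead of propagating."""

    def _callback(fut: Future) -> None:
        try:
            write_back(fut.result())
        except Exception:
            # Write-back is best-effort: the fetched data is still delivered
            # to whoever is waiting on the future.
            logger.exception("write-back to local CPU failed")

    return _callback
```

With this wrapper, a failing write-back does not change the future's result and does not stop other callbacks registered on the same future.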

Align with existing get() and batched_get() which exclude MaruBackend from write-back to LocalCPUBackend. Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

* add new workload to cli bench Signed-off-by: deng451e <838677410@qq.com>

…che#2922) Add a top-level `gds_path_sharding` config field (default: "by_gpu") that controls how GPUs are assigned to storage paths when multiple comma-separated paths are provided in `gds_path`. This replaces the previously hardcoded by_gpu logic with an explicit, extensible setting. Currently only "by_gpu" is supported (selects path via `device_id % num_paths`); unsupported values raise AssertionError. Generated with [Devin](https://cli.devin.ai/docs) Signed-off-by: Boris Glimcher <Boris.Glimcher@emc.com> Co-authored-by: Devin <noreply@cognition.ai>
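
The "by_gpu" rule quoted above (`device_id % num_paths`) can be illustrated with a small helper. `select_gds_path` is a hypothetical name, and the whitespace handling is an assumption rather than the backend's real parsing logic.

```python
def select_gds_path(gds_path: str, device_id: int, sharding: str = "by_gpu") -> str:
    """Pick one storage path for a GPU from a comma-separated list."""
    # Only "by_gpu" is described in the commit message; anything else is rejected.
    assert sharding == "by_gpu", f"unsupported gds_path_sharding: {sharding}"
    paths = [p.strip() for p in gds_path.split(",") if p.strip()]
    assert paths, "gds_path must contain at least one path"
    return paths[device_id % len(paths)]
```

With two paths, GPUs 0 and 2 map to the first path and GPUs 1 and 3 to the second.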

…he#2949) * [Feat]: Add environment variable support for RESP adapter auth Support LMCACHE_RESP_USERNAME, LMCACHE_RESP_PASSWORD, LMCACHE_RESP_HOST, and LMCACHE_RESP_PORT environment variables in both MP and non-MP modes. Env vars are read inside the adapter at creation time so credentials are never stored in the config object or printed in startup logs. Signed-off-by: Samuel Shen <slshen@tensormesh.ai> * [Feat]: Fix env var precedence and add unit tests for RESP env vars Change precedence so config/CLI args override env vars (env vars serve as defaults). Add unit tests for the precedence logic in both MP and non-MP modes. Signed-off-by: Samuel Shen <slshen@tensormesh.ai> --------- Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
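
The precedence described in the second commit (config/CLI values win, environment variables act as defaults) could be expressed as below. `resolve_resp_setting` is a hypothetical helper, not the adapter's API. Only the `LMCACHE_RESP_*` variable names come from the commit message.

```python
import os
from typing import Mapping, Optional


def resolve_resp_setting(
    field: str,
    configured: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the configured value, falling back to LMCACHE_RESP_<FIELD>."""
    if configured is not None:
        return configured  # explicit config or CLI argument takes precedence
    return environ.get(f"LMCACHE_RESP_{field.upper()}")
```

Resolving the value at adapter creation time, rather than copying it into the config object, matches the stated goal of keeping credentials out of startup logs.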

)
* Refactor: Auto-align pd_buffer_size down to nearest chunk size multiple
  - Add buffer size alignment logic to prevent assertion error
  - Calculate aligned_buffer_size as (origin_size // chunk_size) * chunk_size
  - Add informative logging when buffer size is adjusted
  - Release excess buffer memory that can't be aligned
  - Follows the same pattern as local_cpu_backend.py
  Signed-off-by: Tony Lin <tony.lin@intel.com>
* refine the code per gemini's suggestions Signed-off-by: Tony Lin <tony.lin@intel.com>
* refine log msg Signed-off-by: Tony Lin <tony.lin@intel.com>
* streamline pd backend buffer alignment Signed-off-by: Tony Lin <tony.lin@intel.com>
* Fix test hang by adding backend cleanup Signed-off-by: Tony Lin <tony.lin@intel.com>
* remove UT Signed-off-by: Tony Lin <tony.lin@intel.com>
* add UT Signed-off-by: Tony Lin <tony.lin@intel.com>
* doc update Signed-off-by: Tony Lin <tony.lin@intel.com>
* rename function name Signed-off-by: Tony Lin <tony.lin@intel.com>
---------
Signed-off-by: Tony Lin <tony.lin@intel.com>
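
The alignment formula in the first commit is a round-down to a multiple of the chunk size. A small worked version follows; the function name `align_buffer_size` and the input validation are assumptions.

```python
def align_buffer_size(origin_size: int, chunk_size: int) -> int:
    """Round `origin_size` down to the nearest multiple of `chunk_size`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return (origin_size // chunk_size) * chunk_size
```

For example, a 1000-byte buffer with 256-byte chunks aligns to 768 bytes, leaving 232 bytes of excess to release.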

) * [Chore] Add CODEOWNERS for automated PR review assignments Signed-off-by: Samuel Shen <slshen@uchciago.edu> * [Chore] Add sammshen to resp L2 adapter ownership Signed-off-by: Samuel Shen <slshen@uchciago.edu> * [Chore] Add sammshen to csrc/storage_backends and native connector L2 adapters Signed-off-by: Samuel Shen <slshen@uchciago.edu> * [Chore] Add YaoJiayi to L2 eviction ownership Signed-off-by: Samuel Shen <slshen@uchciago.edu> * [Chore] Add OasisGit to multiprocess and http_server ownership Signed-off-by: Samuel Shen <slshen@uchciago.edu> --------- Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu>

- Add type hints to _write_back closure (Future, CacheEngineKey) - Update prefetch_single_done_callback docstring to reflect write-back behavior Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

…#2958) chore(ci): push nightly baselines to LMCache-CI repo The GitHub PAT for the main repo expired, causing nightly baseline uploads to fail. Switch the upload target to the dedicated LMCache/LMCache-CI repository instead of pushing to benchmarks-main on the main LMCache repo. Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu>

Move write-back logic from prefetch_single_done_callback to prefetch_all_done_callback to avoid caching non-contiguous chunks. When a middle tier partially fails, subsequent tiers' chunks break prefix continuity and are discarded by prefetch_all_done_callback. Previously, prefetch_single_done_callback would have already cached those invalid chunks. Now write-back only happens after prefix continuity is validated, ensuring only valid chunks are cached. Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
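
The ordering described here (validate prefix continuity first, write back second) can be sketched as follows. The data model is an assumption made for illustration: each tier is represented as a pair of the number of chunks requested from it and the keys it actually retrieved, and a tier is assumed to retrieve a prefix of its own slice. `validated_prefix` and `write_back_after_validation` are hypothetical names.

```python
from typing import Dict, List, Tuple


def validated_prefix(tier_results: List[Tuple[int, List[str]]]) -> List[str]:
    """Return the chunk keys that still form a contiguous prefix."""
    valid: List[str] = []
    for num_requested, retrieved in tier_results:
        valid.extend(retrieved)
        if len(retrieved) < num_requested:
            break  # a gap: chunks from later tiers are no longer a prefix
    return valid


def write_back_after_validation(
    tier_results: List[Tuple[int, List[str]]],
    payloads: Dict[str, bytes],
    local_cpu: Dict[str, bytes],
) -> None:
    """Cache only the chunks that survived the continuity check."""
    for key in validated_prefix(tier_results):
        local_cpu[key] = payloads[key]
```

If the middle tier returns one of two requested chunks, the chunks from the last tier are never written to the local store, which is the behavior the commit message aims for.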

Add source-backend filtering to prefetch_all_done_callback write-back, matching the sync paths (get, batched_get). Chunks from LocalCPUBackend, PDBackend, and MaruBackend are now skipped during write-back, avoiding redundant re-submission. Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
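
Source-backend filtering of this kind might look like the sketch below. The three backend names are taken from the commit message. The helper name `chunks_to_write_back` and the assumption that one backend name is supplied per chunk are illustrative and may not match the real `tier_backend_names` parameter.

```python
from typing import List, Sequence

# Backends named in the commit message as excluded from write-back.
SKIP_WRITE_BACK = frozenset({"LocalCPUBackend", "PDBackend", "MaruBackend"})


def chunks_to_write_back(chunks: Sequence[str], backend_names: Sequence[str]) -> List[str]:
    """Keep only chunks whose source backend is eligible for write-back."""
    return [
        chunk
        for chunk, name in zip(chunks, backend_names)
        if name not in SKIP_WRITE_BACK
    ]
```

A chunk that already came from the local CPU tier is dropped from the list, so it is not re-submitted to the tier it was read from.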

Add parameter descriptions and write-back behavior documentation to reflect the new tier_backend_names parameter and write-back logic. Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

* Refactor remote plugin to accept multiple connectors. Signed-off-by: baoloongmao <baoloongmao@tencent.com>
* Skip module_path/class_name check for built-in adapters Signed-off-by: baoloongmao <baoloongmao@tencent.com>
* Add related documentation Signed-off-by: baoloongmao <baoloongmao@tencent.com>
* Fix to use DynamicConnectorAdapter to load external connector plugin Signed-off-by: baoloongmao <baoloongmao@tencent.com>
---------
Signed-off-by: baoloongmao <baoloongmao@tencent.com>

…MCache#2926)
* multiprocess: support per-group KV cache transfer with group_idx
  - gpu_ops: add group_idx param to lmcache_memcpy_async_h2d/d2h, use memory_obj.get_tensor(group_idx) instead of memory_obj.tensor
  - kv_layer_groups: add build_kv_layer_groups_from_list() to group layers by (shape, dtype) from a plain tensor list
  - gpu_context: introduce per-group shape_descs_, hidden_dim_sizes_, group_kv_pointers_, and tmp_gpu_buffers_; update get_kv_buffer_shape, get_tmp_gpu_buffer, get_tmp_gpu_buffer_batched to accept group_idx; add get_shape_desc(group_idx) and get_group_kv_pointers(group_idx)
  - server: update get_layout_desc, _store_loop, _retrieve_loop to iterate over all groups; fix skip_tokens_in_chunk upper bound to use batch_len instead of _BATCH_SIZE
  Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
* fix: support vectorized KV transfer for non-16B-aligned head sizes
  Add scalar type fallback hierarchy for block KV transfer kernel:
    head_bytes % 16 == 0 -> uint4 (16B, fastest)
    head_bytes % 4 == 0 -> uint32_t (4B)
    head_bytes % 2 == 0 -> uint16_t (2B)
  This fixes the runtime error for MLA models where head_size=132 (uint8), giving head_bytes=132 which is not divisible by 16 but is divisible by 4.
  Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
---------
Signed-off-by: liuyumoye <adeline_ly2023@outlook.com> Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>
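
The fallback hierarchy in the second commit amounts to choosing the widest access size that divides the head size in bytes. The Python sketch below mirrors that selection for illustration. `pick_vector_width` is a hypothetical name, and the final byte-wise fallback is an assumption, since the commit message lists only the 16-, 4-, and 2-byte cases.

```python
def pick_vector_width(head_bytes: int) -> int:
    """Return the widest access size in bytes that evenly divides `head_bytes`."""
    for width in (16, 4, 2):  # uint4, uint32_t, uint16_t in the commit message
        if head_bytes % width == 0:
            return width
    return 1  # assumption: byte-wise copy when no wider type fits
```

For the MLA case cited above, `head_bytes=132` is not divisible by 16 but is divisible by 4, so the 4-byte path is selected.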

12.9 Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu>

…Cache#2847) Signed-off-by: Ziwen Ning <ningziwe@amazon.com>

LMCache#2801) feat(disk): support multi-path local disk backend with path sharding Allow `local_disk` to accept comma-separated paths (e.g. "/mnt/nvme0/,/mnt/nvme1/") to use multiple NVMe devices. Each GPU worker selects one path at init time via the `local_disk_path_sharding` strategy (currently only "by_gpu": device_id % num_paths), matching the GDS backend approach LMCache#2817 and NIXL approach LMCache#2418. - Path selected once in __init__; _key_to_path, write_file, read_file unchanged from upstream - _parse_local_disk now uses startswith("file://") instead of regex, fixing file:// URIs without a trailing slash - All directories created at startup - Added local_disk_path_sharding config field (default: "by_gpu") - Added tests and updated docs Before this change the only way to increase performance was to use any of the linux multi-pathing technologies to aggregate IOs. Signed-off-by: Boris Glimcher <Boris.Glimcher@emc.com>
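
The prefix-based parsing mentioned above can be sketched as follows. `parse_local_disk_paths` is a hypothetical helper, not the `_parse_local_disk` function itself, whose signature is not shown in this PR. The point illustrated is that a `startswith("file://")` check handles URIs with or without a trailing slash.

```python
from typing import List


def parse_local_disk_paths(value: str) -> List[str]:
    """Split a comma-separated `local_disk` value into plain filesystem paths."""
    prefix = "file://"
    paths: List[str] = []
    for item in value.split(","):
        item = item.strip()
        if item.startswith(prefix):
            item = item[len(prefix):]  # no trailing slash is required
        if item:
            paths.append(item)
    return paths
```

Each GPU worker would then pick one entry from the resulting list, for instance with the `device_id % num_paths` rule the commit message describes.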

vLLM nightly now requires PyTorch 2.11.0 which is built against CUDA 13.0. Update the CI base image to match. Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu>

document long-doc-permutator workload Signed-off-by: deng451e <838677410@qq.com>

* update csrc to support native launch host func * add deadlock ci test Signed-off-by: ApostaC <yihua98@uchicago.edu>

* fix typo bug Signed-off-by: princepride <wangzhipeng628@gmail.com> * fix: rename hidden_dim_size to hidden_dim_sizes in describe and server Align with the rename introduced in LMCache#2926 where hidden_dim_size was changed to hidden_dim_sizes (List[int]) to support kv_groups. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: princepride <wangzhipeng628@gmail.com> * fix: update test fixture to use hidden_dim_sizes key Update test fixture and assertion in test_describe.py to match the hidden_dim_size -> hidden_dim_sizes rename from LMCache#2926. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: princepride <wangzhipeng628@gmail.com> --------- Signed-off-by: princepride <wangzhipeng628@gmail.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* pin version * pin cu128 wheel Signed-off-by: deng451e <838677410@qq.com>

Signed-off-by: idellzheng <idellzheng@tencent.com>

* update prometheus version to fix ut Signed-off-by: ApostaC <yihua98@uchicago.edu> * fix otel sdk version Signed-off-by: ApostaC <yihua98@uchicago.edu> --------- Signed-off-by: ApostaC <yihua98@uchicago.edu>

Changing office hours from Thursdays to Wednesdays Signed-off-by: Nicolas (Nick) Barcet <nijaba@tensormesh.ai>

…V-cache (LMCache#3195) Signed-off-by: Samuel Shen <slshen@tensormesh.ai>

Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>

…3194) Reuses the existing L2 long_doc_qa step instead of paying for a second model load. Two changes to that script: 1. Bump --metrics-sample-rate to 1.0 on the L2 relaunch so the histograms record on every event. The default 0.01 would leave them empty in this short workload and flake the assertions. 2. After the existing L2 data-flow checks, add a "Step 5" block that asserts every metric we publish from MP mode actually advances: - newer counters with label dimensions advance > 0 and carry the expected label (l2_store_completed/l2_load_completed by l2_name, lookup_requested_tokens/lookup_hit_tokens by model_name, num_chunks_loaded by worker_id) - the four throughput histograms record at least one observation (lmcache_mp_l0_l1_*_throughput_gbs and lmcache_mp_l2_*_throughput_gbs) The label-presence check catches the case where a counter fires but the attribute plumbing broke — e.g. a future refactor that drops the attribute at emit time but still ticks the counter. Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix: missing lock in HFBucketConnector.close() when clearing metadata cache Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>

Signed-off-by: aeon-x <talexcao@gmail.com>

Signed-off-by: deng451e <838677410@qq.com>

Signed-off-by: ApostaC <yihua98@uchicago.edu>

Signed-off-by: Sangyoon Kwon <syk0905.kwon@samsung.com>

* [Feat]: Implement batch operations in MooncakeConnector for improved efficiency and error handling Signed-off-by: fangchizheng <fangchizheng@mail.ustc.edu.cn> * [Test]: isolate Mooncake RDMA adapter integration test Close the default TCP adapter before creating the RDMA adapter in the buffer-backed Mooncake integration test so Mooncake master does not allocate test replicas on a TCP segment. Also use the native `rdma_devices` config key expected by Mooncake. Signed-off-by: fangchizheng <fangchizheng@mail.ustc.edu.cn> * [Fix]: Treat Mooncake exists errors as misses Signed-off-by: fangchizheng <fangchizheng@mail.ustc.edu.cn> * [Feat]: Implement delete operations in MooncakeConnector Add do_single_delete and do_batch_delete to the Mooncake storage backend, with integration tests covering key deletion, mixed existing/missing batch deletes, and usage tracking updates. Signed-off-by: fangchizheng <fangchizheng@mail.ustc.edu.cn> * [Fix]: resolve ruff F841 and minor formatting cleanups Signed-off-by: fangchizheng <fangchizheng@mail.ustc.edu.cn> --------- Signed-off-by: fangchizheng <fangchizheng@mail.ustc.edu.cn> Co-authored-by: maobaolong <baoloongmao@tencent.com>

* Add DAX L2 adapter for MP mode Signed-off-by: DongDongJu <commisori28@gmail.com> Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com> * Document DAX MP adapter APIs Signed-off-by: DongDongJu <commisori28@gmail.com> Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com> * Use global eviction for DAX storage Remove DAX core's internal victim selection so slot pressure is handled by LMCache's global MP L2 eviction controller. Update DAX tests to cover full-arena behavior, slot-based cache_salt accounting, and StorageManager-driven L2 eviction. Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com> Signed-off-by: DongDongJu <commisori28@gmail.com> --------- Signed-off-by: DongDongJu <commisori28@gmail.com> Signed-off-by: Dongjoo Seo <dongjoo.seo1@samsung.com> Signed-off-by: Dongjoo Seo <commisori28@gmail.com>

) * [Obs] Expose blend token-level hit-rate counters Signed-off-by: Samuel Shen <slshen@tensormesh.ai>

) Signed-off-by: ApostaC <yihua98@uchicago.edu>

…Cache#3159) Signed-off-by: ApostaC <yihua98@uchicago.edu>

…che#3211) Signed-off-by: elliotz <elliot@character.ai>

Signed-off-by: ApostaC <yihua98@uchicago.edu>

…ndpoints in TP=1 non-MP mode (LMCache#3146) * fix(LMCache#3104): use per-instance FastAPI app to fix 503 on cache endpoints in TP=1 non-MP mode Signed-off-by: baoloongmao <baoloongmao@tencent.com> * fix Signed-off-by: baoloongmao <baoloongmao@tencent.com> --------- Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* Add support for AZURE_BLOB NIXL backend The NIXL plugin uses Azure Blob Storage as an object store backend instead of S3. It is designed as a drop-in replacement for the OBJ backend, behaving functionally the same and only differing by required configurations. Currently, it only supports CPU to object store offloading. There is currently no GPU direct support Most of the work was allow listing the AZURE_BLOB plugin in code paths where the OBJ plugin was configured. Specifically, updated the following LMCache interfaces to support AZURE_BLOB: * KV cache offloading with the NIXL storage backend for both static and dynamic pools * L2 storage for nixl_store. Note it was not added to the nixl_store_dynamic because the OBJ plugin was not supported there either Signed-off-by: Kyle Knapp <kyleknapp@microsoft.com> * Fix indent in azure config sample Signed-off-by: Kyle Knapp <kyleknapp@microsoft.com> --------- Signed-off-by: Kyle Knapp <kyleknapp@microsoft.com>

LMCache#3174) Signed-off-by: ApostaC <yihuac@vllm.ai>

Signed-off-by: ApostaC <yihua98@uchicago.edu>

Signed-off-by: idellzheng <idellzheng@tencent.com>

…che#3092) * [ROCm] Add Triton block-sparse attention backend for CacheBlend Adds LMCTritonSparseBackend as a drop-in replacement for LMCFlashInferSparseBackend that works on both CUDA and ROCm via Triton kernels (no flashinfer dependency). Signed-off-by: Andy Luo <andyluo7@users.noreply.github.com>

…he#3185) Signed-off-by: baoloongmao <baoloongmao@tencent.com>

ci(k3-unit-tests): route unit job to k8s queue The k3-unit-tests pipeline-level config targets the k8s queue for the upload step, but the inner job spec still pinned agents.queue to k3-h200-local. With no agents on k3-h200-local, every spawned unit-test job sat indefinitely as 'waiting'. Align with the other k3 pipelines (blend, multiprocess, integration, comprehensive, correctness) which all run on k8s. Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu>

ci(comprehensive/pd): write prefiller/decoder/proxy logs to repo root Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu>

* feat(pd_backend): fully async PD KV transfer backend
  Replace sync PDBackend with async implementation:
  - Non-blocking transfer: batched_submit_put_task returns immediately (fire-and-forget enqueue) instead of blocking vLLM worker thread until remote alloc + RDMA write complete
  - Event-driven flow control: replace time.sleep busy-wait polling with Condition-based notification, waking immediately when resources are freed
  - Self-contained resource release: remove() internally calls ref_count_down() and decrements inflight counter, eliminating caller responsibility for manual cleanup. cache_engine.py updated with _is_sync_pd_backend() guard to prevent double-free
  - Startup capacity validation: new pd_max_prefill_len config raises ValueError at init if buffer cannot hold the max prefill length, catching misconfiguration before runtime
  - Configurable timeouts: pd_allocation_timeout_sec, pd_shutdown_timeout_sec, pd_condition_poll_interval_sec replace scattered hardcoded constants
  - Backward compatible: split into pd_backend.py (sync) and pd_backend_async.py (async), selectable per-instance via pd_backend_mode config (default: "async"). Sync and async instances can coexist in the same cluster (e.g. sender on async while receiver on sync, or vice versa) with no wire protocol incompatibility
  Signed-off-by: Tony Lin <tony.lin@intel.com>
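
The move from sleep-based polling to Condition-based notification can be illustrated with a generic in-flight limiter. `InflightLimiter` and its methods are hypothetical and are not taken from pd_backend_async.py; the sketch only shows the general technique using `threading.Condition`.

```python
import threading


class InflightLimiter:
    """Bound the number of in-flight transfers without busy-wait polling."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._inflight = 0
        self._cond = threading.Condition()

    def acquire(self, timeout: float) -> bool:
        """Wait until a slot is free; return False if `timeout` expires first."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: self._inflight < self._capacity, timeout=timeout
            )
            if ok:
                self._inflight += 1
            return ok

    def release(self) -> None:
        """Free a slot and wake one waiter."""
        with self._cond:
            self._inflight -= 1
            self._cond.notify()
```

A waiter blocked in `acquire` wakes when `release` calls `notify`, rather than at the next polling tick, and the timeout still bounds how long an allocation can wait.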

jooho-XCENA force-pushed the feat/write-back-non-blocking branch from 00642cc to 989e036 Compare April 3, 2026 07:39

jooho-XCENA added 2 commits April 3, 2026 07:55

fix: add try-except to get_non_blocking write-back callback

e419eaf

Align error handling with prefetch_single_done_callback for consistency. Prevents unhandled exceptions in Future callbacks. Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

jooho-XCENA force-pushed the feat/write-back-non-blocking branch from 989e036 to e419eaf Compare April 3, 2026 07:57

jooho-XCENA and others added 26 commits April 3, 2026 08:10

fix: add MaruBackend to write-back exclusion list

8ac9ece

Align with existing get() and batched_get() which exclude MaruBackend from write-back to LocalCPUBackend. Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

style: apply ruff-format

7a62bad

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

[CLI]Add long-doc-permutator CLI bench workload (LMCache#2937)

6ceed5e

* add new workload to cli bench Signed-off-by: deng451e <838677410@qq.com>

fix: add type hints to write-back callback and update docstring

8686a8b

- Add type hints to _write_back closure (Future, CacheEngineKey) - Update prefetch_single_done_callback docstring to reflect write-back behavior Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

Merge branch 'dev' into feat/write-back-non-blocking

23d0ffb

docs: update prefetch_all_done_callback docstring

f59c147

Add parameter descriptions and write-back behavior documentation to reflect the new tier_backend_names parameter and write-back logic. Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

[Chore][CI]: K3 base CI image 12.9 CUDA (LMCache#2975)

8cd378f

12.9 Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu>

fix: use pin=False in _allocate_and_put to prevent pd_buffer leak (LM…

b4d95dc

…Cache#2847) Signed-off-by: Ziwen Ning <ningziwe@amazon.com>

[Chore][CI] Upgrade CI base image to CUDA 13.0 (LMCache#2981)

8885a41

vLLM nightly now requires PyTorch 2.11.0 which is built against CUDA 13.0. Update the CI base image to match. Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu>

[doc] document long-doc-permutator workload in cli bench (LMCache#2963)

5155663

document long-doc-permutator workload Signed-off-by: deng451e <838677410@qq.com>

[MP][Bugfix] Fix deadlock caused by cuda launch host func (LMCache#2952)

801b016

* update csrc to support native launch host func * add deadlock ci test Signed-off-by: ApostaC <yihua98@uchicago.edu>

[CI] Pin cu128 nightly wheel for blend ci test (LMCache#2987)

3810ae7

* pin version * pin cu128 wheel Signed-off-by: deng451e <838677410@qq.com>

[MP][optimize] optimize save when mla enabled (LMCache#2935)

06981d6

Signed-off-by: idellzheng <idellzheng@tencent.com>

[hotfix] fix prometheus version for UT failure (LMCache#3000)

b02289e

* update prometheus version to fix ut Signed-off-by: ApostaC <yihua98@uchicago.edu> * fix otel sdk version Signed-off-by: ApostaC <yihua98@uchicago.edu> --------- Signed-off-by: ApostaC <yihua98@uchicago.edu>

Update LMCache Office Hours to Wednesday (LMCache#2990)

1dccc7e

Changing office hours from Thursdays to Wednesdays Signed-off-by: Nicolas (Nick) Barcet <nijaba@tensormesh.ai>

sammshen and others added 30 commits May 5, 2026 11:55

[CLI][Bench] Add prefix-suffix-tuner workload for tiered + Blending K…

a5df0ab

…V-cache (LMCache#3195) Signed-off-by: Samuel Shen <slshen@tensormesh.ai>

[Docs] Add doc for fs native connector (LMCache#3204)

1badbb4

Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>

fix: missing lock when clearing metadata cache (LMCache#3197)

a67d33a

fix: missing lock in HFBucketConnector.close() when clearing metadata cache Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>

Merge branch 'dev' into feat/write-back-non-blocking

93c1b89

[Docs] Update LMCache Recipes (LMCache#3209)

3d8888d

Signed-off-by: aeon-x <talexcao@gmail.com>

[observability]update dashboard metrics (LMCache#3205)

59eb024

Signed-off-by: deng451e <838677410@qq.com>

docs: daily drift check — multi-process mode (2026-05-05) (LMCache#3202)

f9a3b1c

Signed-off-by: ApostaC <yihua98@uchicago.edu>

[Feat] Add option to skip raw block checkpoint load (LMCache#3169)

5ff3fe3

Signed-off-by: Sangyoon Kwon <syk0905.kwon@samsung.com>

[Observability] Expose blend token-level hit-rate counters (LMCache#3196

375243f

) * [Obs] Expose blend token-level hit-rate counters Signed-off-by: Samuel Shen <slshen@tensormesh.ai>

[MP] Make vLLM be able to reconnect after LMCache restarts (LMCache#3208

730e8f9

) Signed-off-by: ApostaC <yihua98@uchicago.edu>

[Chore][docs] daily drift check — multi-process mode (2026-04-28) (LM…

c15e064

…Cache#3159) Signed-off-by: ApostaC <yihua98@uchicago.edu>

[05/06/2026] [operator] Add gpuVendor field to support AMD GPUs (LMCa…

c7394b8

…che#3211) Signed-off-by: elliotz <elliot@character.ai>

[CI] Fix flaky comprehensive tests (LMCache#3222)

f76c725

Signed-off-by: ApostaC <yihua98@uchicago.edu>

docs: daily drift check — multi-process mode (2026-04-30 + 2026-05-01) (

2e05f66

LMCache#3174) Signed-off-by: ApostaC <yihuac@vllm.ai>

docs: daily drift check — multi-process mode (2026-05-02) (LMCache#3184)

57b9f53

Signed-off-by: ApostaC <yihua98@uchicago.edu>

docs: daily drift check — multi-process mode (2026-05-03) (LMCache#3186)

7556ee0

Signed-off-by: ApostaC <yihua98@uchicago.edu>

docs: daily drift check — multi-process mode (2026-05-07) (LMCache#3219)

5811710

Signed-off-by: ApostaC <yihua98@uchicago.edu>

docs: daily drift check — multi-process mode (2026-05-08) (LMCache#3230)

13aca20

Signed-off-by: ApostaC <yihua98@uchicago.edu>

[MP] add new mp connector snapshot for vllm 0.20.1 (LMCache#3224)

87bfe8f

Signed-off-by: idellzheng <idellzheng@tencent.com>

[MP] Remove the middle /api/ in all endpoints of http_server (LMCac…

d945fbb

…he#3185) Signed-off-by: baoloongmao <baoloongmao@tencent.com>

[CI]: expose prefiller/decoder/proxy logs as artifacts (LMCache#3240)

a7bc968

ci(comprehensive/pd): write prefiller/decoder/proxy logs to repo root Signed-off-by: Samuel Shen <slshen@uchciago.edu> Co-authored-by: Samuel Shen <slshen@uchciago.edu>

Merge branch 'dev' into feat/write-back-non-blocking

4a268bd

jooho-XCENA wants to merge 182 commits into dev from feat/write-back-non-blocking


20 participants
