[Core] Add write-back to local CPU for non-blocking get paths by jooho-XCENA · Pull Request #17 · xcena-dev/LMCache

jooho-XCENA · 2026-04-03T08:27:40Z

What this PR does / why we need it:

Implement write-back logic for async (non-blocking) get paths in StorageManager.

The synchronous paths (get(), batched_get()) already write back fetched data to LocalCPUBackend when retrieving from remote backends. However, the async paths (get_non_blocking(), prefetch_single_done_callback()) were left as TODOs without write-back, meaning data fetched asynchronously from remote backends was never cached locally.

This PR adds write-back callbacks to both async paths, matching the existing sync behavior:

get_non_blocking(): adds a Future done callback to write back to LocalCPUBackend
prefetch_single_done_callback(): writes back prefetched data after async prefetch completes
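
The callback pattern described above can be sketched roughly as follows. This is a minimal illustration, not the StorageManager code from this PR: `LocalCache`, `RemoteStore`, `Manager`, and `_write_back` are hypothetical stand-ins, and only `concurrent.futures` from the standard library is used. The try/except around the callback body mirrors the later "add try-except" commit.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Optional

logger = logging.getLogger(__name__)


class LocalCache:
    """Hypothetical stand-in for LocalCPUBackend: a plain in-memory dict."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def contains(self, key: str) -> bool:
        return key in self._data

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class RemoteStore:
    """Hypothetical remote backend that counts how often it is hit."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self._data = dict(data)
        self.fetches = 0

    def fetch(self, key: str) -> Optional[bytes]:
        self.fetches += 1
        return self._data.get(key)


class Manager:
    """Hypothetical manager showing write-back on the non-blocking path."""

    def __init__(self, local: LocalCache, remote: RemoteStore) -> None:
        self.local = local
        self.remote = remote
        self._pool = ThreadPoolExecutor(max_workers=2)

    def get_non_blocking(self, key: str) -> Future:
        if self.local.contains(key):
            # Local hit: return an already-completed Future, no remote call.
            done: Future = Future()
            done.set_result(self.local.get(key))
            return done
        fut = self._pool.submit(self.remote.fetch, key)
        # Write the fetched value back to the local cache once it arrives.
        fut.add_done_callback(lambda f: self._write_back(key, f))
        return fut

    def _write_back(self, key: str, fut: Future) -> None:
        try:
            value = fut.result()
            if value is not None and not self.local.contains(key):
                self.local.put(key, value)
        except Exception:
            # Keep errors inside the callback; the caller still holds the Future.
            logger.exception("write-back failed for key %s", key)

    def close(self) -> None:
        # Waits for worker threads, so pending done callbacks have finished.
        self._pool.shutdown(wait=True)
```

In this sketch the first `get_non_blocking("k")` goes to the remote store and the second is served locally. Note that the done callback runs on the worker thread, so a caller that reads the Future's result may briefly observe the local cache before the write-back lands.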

Special notes for your reviewers:

Tested with 2 instances using p2p_backend:

Instance 1: Generate cache by running a prompt
Instance 2: Run the same prompt twice to use the cache from Instance 1

	Without write-back	With write-back
1st request	Remote transfer from Instance 1	Remote transfer from Instance 1
2nd request	Remote transfer again (no local cache)	Local CPU hit (no remote transfer)

	With write-back	Without write-back
inst1 warmup (cache generation)	2282.89ms	2278.85ms
inst1 query (2nd, local hit)	157.77ms	157.81ms
inst2 warmup (1st, remote transfer)	991.17ms	1051.75ms
inst2 query (2nd prompt)	157.50ms	1045.66ms

With write-back enabled, inst2's 2nd query (157.50ms) matches inst1's local hit (157.77ms): the cache is served from local CPU instead of through a remote transfer. Without write-back, inst2 still fetches remotely every time (1045.66ms).
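
As a quick worked check of the numbers above, the following snippet computes the saving on inst2's 2nd query. The variable names are illustrative only; the two latencies are copied from the table.

```python
# Latencies in milliseconds for inst2's 2nd query, copied from the table above.
without_write_back_ms = 1045.66
with_write_back_ms = 157.50

saved_ms = without_write_back_ms - with_write_back_ms
speedup = without_write_back_ms / with_write_back_ms
reduction_pct = 100.0 * saved_ms / without_write_back_ms

print(f"saved {saved_ms:.2f} ms, {speedup:.2f}x faster, {reduction_pct:.1f}% lower latency")
```

This works out to roughly 888 ms saved per repeated request, about a 6.6x speedup for this single measurement.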

Discussion: Should async write-back always be enabled?

There may be cases where write-back is unnecessary or even wasteful:

Memory-constrained environments: local CPU cache may evict more valuable entries
Low-reuse prompts: if a prompt is unlikely to be requested again on this instance, write-back adds latency (memory allocation + copy) with no future benefit, making the request slower than skipping it entirely

Proposal: Make write-back configurable as an option (e.g., enable_async_write_back), defaulting to True to maintain current behavior while allowing users to disable it when not needed.
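
One possible shape for such an option is sketched below. Only the flag name `enable_async_write_back` comes from the proposal; `WriteBackConfig` and `make_write_back` are hypothetical helpers, and a plain dict stands in for LocalCPUBackend.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class WriteBackConfig:
    # Flag name taken from the proposal; defaults to True as suggested.
    enable_async_write_back: bool = True


def make_write_back(
    config: WriteBackConfig, local_cache: Dict[str, bytes]
) -> Optional[Callable[[str, bytes], None]]:
    """Return a write-back function, or None when the option is disabled."""
    if not config.enable_async_write_back:
        return None

    def write_back(key: str, value: bytes) -> None:
        # Do not overwrite an entry that is already cached locally.
        local_cache.setdefault(key, value)

    return write_back
```

A caller would attach the returned function as a done callback only when it is not None, so disabling the option skips the allocation and copy entirely.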

If applicable:

this PR contains user facing changes - docs added
this PR contains unit tests

* Add submit_batch_delete to native connector stack for L2 eviction

  Adds a DELETE operation through all layers of the native connector framework (C++ IStorageConnector -> ConnectorBase -> pybind -> Python NativeConnectorL2Adapter) so that native remote backends (Redis, FS, plugins) can participate in L2 eviction.

  C++ layer:
  - Add BATCH_TILE_DELETE to Op enum
  - Add submit_batch_delete to IStorageConnector interface
  - Implement in ConnectorBase with tiling, per-key results, and per-key error tolerance (like GET)
  - Add do_single_delete virtual with default no-op for backward compat
  - Implement for Redis (RESP DEL) and FS (std::filesystem::remove)

  Pybind layer:
  - Add bind_submit_batch_delete template (keys-only, GIL release)
  - Update LMCACHE_BIND_CONNECTOR_METHODS macro

  Python layer:
  - NativeConnectorL2Adapter.delete() submits batch delete and blocks on threading.Event until demux thread signals completion
  - Backward compatible: detects submit_batch_delete via hasattr
  - Fix missing super().__init__() call for listener support
  - Add warning in native_plugin_l2_adapter for plugins without delete

  Tests:
  - Add submit_batch_delete to MockNativeConnector
  - Add TestDeleteInterface: existing key, nonexistent, empty, batch
  - Add TestDeleteBackwardCompatibility: no-op without the method

* Add client-side size tracking for NativeConnectorL2Adapter.get_usage()

  Enables the L2EvictionController to automatically trigger eviction for native remote backends by tracking stored bytes client-side.

  - Track per-key sizes in _key_sizes dict, populated on store completion
  - Increment _current_size_bytes on successful store, decrement on delete
  - Idempotent: duplicate stores for same key don't double-count
  - get_usage() returns usage fraction when max_capacity_bytes > 0, or (-1.0, -1.0) when not configured (preserves backward compat)

  Add max_capacity_bytes config parameter to:
  - NativeConnectorL2Adapter.__init__
  - RESPL2AdapterConfig, FSNativeL2AdapterConfig, NativePluginL2AdapterConfig
  - All three factory functions

  Tests: 6 new tests covering zero-capacity, store tracking, delete tracking, store-delete cycles, and idempotent store deduplication.

* Update docs and examples for native connector eviction support

  - l2_eviction.md: Update adapter support matrix to show NativeConnectorL2Adapter now supports delete and get_usage; add configuration example with max_capacity_bytes + eviction
  - resp.rst: Add max_capacity_bytes to L2 adapter config table; add L2 Eviction section with full configuration example
  - native_connectors.rst: Add do_single_delete to connector interface; add submit_batch_delete to protocol; add max_capacity_bytes to config examples and native_plugin table; update checklist and method counts
  - resp/README.md: Add max_capacity_bytes to config table
  - Config help() strings: Add max_capacity_bytes documentation to RESP, FS native, and native plugin adapter configs

* Rename max_capacity_bytes to max_capacity_gb (float) in config

  Change the L2 adapter capacity config from bytes (int) to GB (float) for consistency with MockL2Adapter's max_size_gb and better ergonomics. The internal _max_capacity_bytes field stays as bytes -- the conversion happens once in NativeConnectorL2Adapter.__init__. Updated: adapter configs, factories, tests, docs, and examples.

* Fix eviction listener notifications and delete timeout cleanup

  Fixes three issues found in code review:
  1. (Critical) Add _notify_keys_stored on store completion and _notify_keys_accessed on load completion in the demux loop. Without these, the LRU eviction policy never learns about stored/accessed keys, making eviction non-functional.
  2. Clean up _pending_delete_events and _pending_ops on delete timeout to prevent memory leaks.
  3. Add docstring to delete() method.

  Also store keys in _pending_ops for load operations (was None) so _notify_keys_accessed can report which keys were loaded. Listener notifications are fired outside the lock to avoid potential deadlocks with listener callbacks.
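
The client-side size tracking described in this commit message might look roughly like the sketch below. It is not the NativeConnectorL2Adapter code: `SizeTracker`, `on_store`, `on_delete`, and `usage_fraction` are hypothetical names, the field names follow the commit message, and the GB conversion assumes 1 GB = 1024**3 bytes. The commit says the real get_usage() returns a pair; since the source does not spell out the two components, this sketch returns a single fraction (or -1.0 when no capacity is configured).

```python
import threading
from typing import Dict


class SizeTracker:
    """Hypothetical client-side byte accounting for a remote backend."""

    def __init__(self, max_capacity_gb: float = 0.0) -> None:
        # Assumption: 1 GB = 1024**3 bytes; converted once at construction.
        self._max_capacity_bytes = int(max_capacity_gb * 1024**3)
        self._key_sizes: Dict[str, int] = {}
        self._current_size_bytes = 0
        self._lock = threading.Lock()

    def on_store(self, key: str, size_bytes: int) -> None:
        """Record a completed store; repeated stores of a key count once."""
        with self._lock:
            if key in self._key_sizes:
                return
            self._key_sizes[key] = size_bytes
            self._current_size_bytes += size_bytes

    def on_delete(self, key: str) -> None:
        """Record a completed delete; unknown keys are ignored."""
        with self._lock:
            size = self._key_sizes.pop(key, 0)
            self._current_size_bytes -= size

    def usage_fraction(self) -> float:
        """Return used/capacity, or -1.0 when no capacity is configured."""
        with self._lock:
            if self._max_capacity_bytes <= 0:
                return -1.0
            return self._current_size_bytes / self._max_capacity_bytes
```

Keeping the per-key sizes is what makes deletes and duplicate stores cheap to account for without querying the remote backend.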

…MCache#2705)

* feat: Add MaruBackend as a storage backend for CXL shared memory

  Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
  Co-authored-by: youngrok-XCENA <yr.song@xcena.com>
  Co-authored-by: hyunyul-XCENA <hyunyul.cho@xcena.com>
  Co-authored-by: seohui-XCENA <seohui.son@xcena.com>
  Co-authored-by: kihwan-XCENA <kihwan.kim@xcena.com>

* fix: capture store() return value and correct pin docstring

  - _async_store now uses handler.store() return value instead of unconditionally setting success=True, preventing CXL memory leak on server-side rejection
  - Fix batched_async_contains docstring to reflect actual batch_pin RPC support

  Signed-off-by: youngrok-XCENA <yr.song@xcena.com>

* style: fix ruff-format in maru_backend.py

  Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

---------

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Signed-off-by: youngrok-XCENA <yr.song@xcena.com>
Co-authored-by: youngrok-XCENA <yr.song@xcena.com>
Co-authored-by: hyunyul-XCENA <hyunyul.cho@xcena.com>
Co-authored-by: seohui-XCENA <seohui.son@xcena.com>
Co-authored-by: kihwan-XCENA <kihwan.kim@xcena.com>
Co-authored-by: Rocky Song <167060552+youngrok-XCENA@users.noreply.github.com>

* Introduce l2 mooncake adapter

  Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* Remove extra files

  Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* Reduce redundant code with setup.py

  Signed-off-by: baoloongmao <baoloongmao@tencent.com>

---------

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

- get_non_blocking: add done callback to write-back fetched data to LocalCPUBackend, matching existing get() behavior
- prefetch_single_done_callback: write-back prefetched data to LocalCPUBackend after async prefetch completes

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

Oasis-Git and others added 10 commits April 2, 2026 15:40

vllm block event (LMCache#2930)

a060b4b

* fix

  Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>

* fix

  Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>

---------

Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>

[MP] Fix UT after merge LMCache#2851 (LMCache#2931)

ba3ba51

Fix UT after merge LMCache#2851

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

[Bugfix]: fix get_num_heads for MLA format (LMCache#2941)

45d4d36

MLA format (NL_X_NB_BS_HS) absorbs heads into the hidden dim, so get_num_heads should return 1 instead of raising ValueError. This was preventing all MLA models (e.g. DeepSeek-V2-Lite) from launching.

fix: add try-except to get_non_blocking write-back callback

e419eaf

Align error handling with prefetch_single_done_callback for consistency. Prevents unhandled exceptions in Future callbacks.

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

fix: add MaruBackend to write-back exclusion list

8ac9ece

Align with existing get() and batched_get() which exclude MaruBackend from write-back to LocalCPUBackend.

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

style: apply ruff-format

7a62bad

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

jooho-XCENA closed this Apr 3, 2026

jooho-XCENA wants to merge 10 commits into xcena-dev:dev from jooho-XCENA:feat/write-back-non-blocking

4 participants
