[Core] Add write-back to local CPU for non-blocking get paths #17

Closed
jooho-XCENA wants to merge 10 commits into xcena-dev:dev from jooho-XCENA:feat/write-back-non-blocking


@jooho-XCENA


What this PR does / why we need it:

Implement write-back logic for async (non-blocking) get paths in StorageManager.

The synchronous paths (get(), batched_get()) already write back fetched data to LocalCPUBackend when retrieving from remote backends. However, the async paths (get_non_blocking(),
prefetch_single_done_callback()) were left as TODOs without write-back, meaning data fetched asynchronously from remote backends was never cached locally.

This PR adds write-back callbacks to both async paths, matching the existing sync behavior:

  • get_non_blocking(): adds a Future done callback to write back to LocalCPUBackend
  • prefetch_single_done_callback(): writes back prefetched data after async prefetch completes
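The callback pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `LocalCache` and `attach_write_back` are hypothetical names standing in for LocalCPUBackend and the StorageManager logic, and only the standard-library `concurrent.futures.Future` API is assumed.

```python
import logging
from concurrent.futures import Future
from typing import Dict

logger = logging.getLogger(__name__)


class LocalCache:
    """Hypothetical stand-in for a local CPU cache backend."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self.data[key] = value


def attach_write_back(future: Future, key: str, cache: LocalCache) -> None:
    """Register a done callback that copies a successful result into the cache."""

    def _on_done(fut: Future) -> None:
        try:
            if fut.cancelled():
                return
            value = fut.result()  # re-raises if the remote fetch failed
            if value is not None:
                cache.put(key, value)
        except Exception:
            # Keep failures inside the callback; the caller still sees the
            # original error through the future itself.
            logger.exception("write-back failed for key %s", key)

    future.add_done_callback(_on_done)
```

With this shape, the caller receives the same Future as before and the write-back happens as a side effect once the remote fetch completes; a failed or cancelled fetch leaves the local cache untouched.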

Special notes for your reviewers:

Tested with 2 instances using p2p_backend:

  1. Instance 1: Generate cache by running a prompt
  2. Instance 2: Run the same prompt twice to use the cache from Instance 1
| | Without write-back | With write-back |
|---|---|---|
| 1st request | Remote transfer from Instance 1 | Remote transfer from Instance 1 |
| 2nd request | Remote transfer again (no local cache) | Local CPU hit (no remote transfer) |

| | With write-back | Without write-back |
|---|---|---|
| inst1 warmup (cache generation) | 2282.89ms | 2278.85ms |
| inst1 query (2nd, local hit) | 157.77ms | 157.81ms |
| inst2 warmup (1st, remote transfer) | 991.17ms | 1051.75ms |
| inst2 query (2nd prompt) | 157.50ms | 1045.66ms |

With write-back enabled, inst2's 2nd query (157.50ms) matches inst1's local hit (157.77ms): the cache is served from local CPU instead of a remote transfer. Without write-back, inst2 still fetches remotely every time (1045.66ms).

Discussion: Should async write-back always be enabled?

There may be cases where write-back is unnecessary or even wasteful:

  • Memory-constrained environments: local CPU cache may evict more valuable entries
  • Low-reuse prompts: if a prompt is unlikely to be requested again on this instance,
    write-back adds latency (memory allocation + copy) with no future benefit,
    making the request slower than skipping it entirely

Proposal: Make write-back configurable as an option (e.g., enable_async_write_back), defaulting to True so the behavior in this PR is unchanged while allowing users to disable it when not needed.
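One way the proposed option could gate the write-back is sketched below. Only the flag name `enable_async_write_back` comes from the proposal; `WriteBackConfig`, `maybe_write_back`, and the dict used as a local cache are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WriteBackConfig:
    """Hypothetical config holder; the flag defaults to True as proposed."""

    enable_async_write_back: bool = True


def maybe_write_back(
    config: WriteBackConfig,
    local_cache: Dict[str, bytes],
    key: str,
    value: Optional[bytes],
) -> bool:
    """Copy value into the local cache only when the option is enabled.

    Returns True if a write-back happened, False if it was skipped.
    """
    if not config.enable_async_write_back or value is None:
        return False
    local_cache[key] = value
    return True
```

Checking the flag before any allocation or copy means a disabled write-back adds no extra cost on the request path, which is the case the low-reuse scenario above is concerned with.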

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Oasis-Git and others added 10 commits April 2, 2026 15:40
* fix

Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>

* fix

Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>

---------

Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
* Add submit_batch_delete to native connector stack for L2 eviction

Adds a DELETE operation through all layers of the native connector
framework (C++ IStorageConnector -> ConnectorBase -> pybind -> Python
NativeConnectorL2Adapter) so that native remote backends (Redis, FS,
plugins) can participate in L2 eviction.

C++ layer:
- Add BATCH_TILE_DELETE to Op enum
- Add submit_batch_delete to IStorageConnector interface
- Implement in ConnectorBase with tiling, per-key results, and
  per-key error tolerance (like GET)
- Add do_single_delete virtual with default no-op for backward compat
- Implement for Redis (RESP DEL) and FS (std::filesystem::remove)

Pybind layer:
- Add bind_submit_batch_delete template (keys-only, GIL release)
- Update LMCACHE_BIND_CONNECTOR_METHODS macro

Python layer:
- NativeConnectorL2Adapter.delete() submits batch delete and blocks
  on threading.Event until demux thread signals completion
- Backward compatible: detects submit_batch_delete via hasattr
- Fix missing super().__init__() call for listener support
- Add warning in native_plugin_l2_adapter for plugins without delete

Tests:
- Add submit_batch_delete to MockNativeConnector
- Add TestDeleteInterface: existing key, nonexistent, empty, batch
- Add TestDeleteBackwardCompatibility: no-op without the method

* Add client-side size tracking for NativeConnectorL2Adapter.get_usage()

Enables the L2EvictionController to automatically trigger eviction for
native remote backends by tracking stored bytes client-side.

- Track per-key sizes in _key_sizes dict, populated on store completion
- Increment _current_size_bytes on successful store, decrement on delete
- Idempotent: duplicate stores for same key don't double-count
- get_usage() returns usage fraction when max_capacity_bytes > 0,
  or (-1.0, -1.0) when not configured (preserves backward compat)

Add max_capacity_bytes config parameter to:
- NativeConnectorL2Adapter.__init__
- RESPL2AdapterConfig, FSNativeL2AdapterConfig, NativePluginL2AdapterConfig
- All three factory functions

Tests: 6 new tests covering zero-capacity, store tracking, delete
tracking, store-delete cycles, and idempotent store deduplication.

* Update docs and examples for native connector eviction support

- l2_eviction.md: Update adapter support matrix to show
  NativeConnectorL2Adapter now supports delete and get_usage;
  add configuration example with max_capacity_bytes + eviction
- resp.rst: Add max_capacity_bytes to L2 adapter config table;
  add L2 Eviction section with full configuration example
- native_connectors.rst: Add do_single_delete to connector
  interface; add submit_batch_delete to protocol; add
  max_capacity_bytes to config examples and native_plugin table;
  update checklist and method counts
- resp/README.md: Add max_capacity_bytes to config table
- Config help() strings: Add max_capacity_bytes documentation
  to RESP, FS native, and native plugin adapter configs

* Rename max_capacity_bytes to max_capacity_gb (float) in config

Change the L2 adapter capacity config from bytes (int) to GB (float)
for consistency with MockL2Adapter's max_size_gb and better ergonomics.

The internal _max_capacity_bytes field stays as bytes -- the conversion
happens once in NativeConnectorL2Adapter.__init__.

Updated: adapter configs, factories, tests, docs, and examples.

* Fix eviction listener notifications and delete timeout cleanup

Fixes three issues found in code review:

1. (Critical) Add _notify_keys_stored on store completion and
   _notify_keys_accessed on load completion in the demux loop.
   Without these, the LRU eviction policy never learns about
   stored/accessed keys, making eviction non-functional.

2. Clean up _pending_delete_events and _pending_ops on delete
   timeout to prevent memory leaks.

3. Add docstring to delete() method.

Also store keys in _pending_ops for load operations (was None)
so _notify_keys_accessed can report which keys were loaded.

Listener notifications are fired outside the lock to avoid
potential deadlocks with listener callbacks.
…MCache#2705)

* feat: Add MaruBackend as a storage backend for CXL shared memory

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Co-authored-by: youngrok-XCENA <yr.song@xcena.com>
Co-authored-by: hyunyul-XCENA <hyunyul.cho@xcena.com>
Co-authored-by: seohui-XCENA <seohui.son@xcena.com>
Co-authored-by: kihwan-XCENA <kihwan.kim@xcena.com>

* fix: capture store() return value and correct pin docstring

- _async_store now uses handler.store() return value instead of
  unconditionally setting success=True, preventing CXL memory leak
  on server-side rejection
- Fix batched_async_contains docstring to reflect actual batch_pin
  RPC support

Signed-off-by: youngrok-XCENA <yr.song@xcena.com>

* style: fix ruff-format in maru_backend.py

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

---------

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Signed-off-by: youngrok-XCENA <yr.song@xcena.com>
Co-authored-by: youngrok-XCENA <yr.song@xcena.com>
Co-authored-by: hyunyul-XCENA <hyunyul.cho@xcena.com>
Co-authored-by: seohui-XCENA <seohui.son@xcena.com>
Co-authored-by: kihwan-XCENA <kihwan.kim@xcena.com>
Co-authored-by: Rocky Song <167060552+youngrok-XCENA@users.noreply.github.com>
Fix UT after merge LMCache#2851

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
MLA format (NL_X_NB_BS_HS) absorbs heads into the hidden dim,
so get_num_heads should return 1 instead of raising ValueError.
This was preventing all MLA models (e.g. DeepSeek-V2-Lite) from launching.
* Introduce l2 mooncake adapter

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* Remove extra files

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

* Reduce redundant code with setup.py

Signed-off-by: baoloongmao <baoloongmao@tencent.com>

---------

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
- get_non_blocking: add done callback to write-back fetched data
  to LocalCPUBackend, matching existing get() behavior
- prefetch_single_done_callback: write-back prefetched data to
  LocalCPUBackend after async prefetch completes

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Align error handling with prefetch_single_done_callback for
consistency. Prevents unhandled exceptions in Future callbacks.

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Align with existing get() and batched_get() which exclude
MaruBackend from write-back to LocalCPUBackend.

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
@jooho-XCENA jooho-XCENA closed this Apr 3, 2026