[Core] Add write-back to local CPU for non-blocking get paths #17
Closed
jooho-XCENA wants to merge 10 commits into
* fix
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
* fix
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
---------
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
* Add submit_batch_delete to native connector stack for L2 eviction

Adds a DELETE operation through all layers of the native connector framework (C++ IStorageConnector -> ConnectorBase -> pybind -> Python NativeConnectorL2Adapter) so that native remote backends (Redis, FS, plugins) can participate in L2 eviction.

C++ layer:
- Add BATCH_TILE_DELETE to Op enum
- Add submit_batch_delete to IStorageConnector interface
- Implement in ConnectorBase with tiling, per-key results, and per-key error tolerance (like GET)
- Add do_single_delete virtual with default no-op for backward compat
- Implement for Redis (RESP DEL) and FS (std::filesystem::remove)

Pybind layer:
- Add bind_submit_batch_delete template (keys-only, GIL release)
- Update LMCACHE_BIND_CONNECTOR_METHODS macro

Python layer:
- NativeConnectorL2Adapter.delete() submits batch delete and blocks on threading.Event until demux thread signals completion
- Backward compatible: detects submit_batch_delete via hasattr
- Fix missing super().__init__() call for listener support
- Add warning in native_plugin_l2_adapter for plugins without delete

Tests:
- Add submit_batch_delete to MockNativeConnector
- Add TestDeleteInterface: existing key, nonexistent, empty, batch
- Add TestDeleteBackwardCompatibility: no-op without the method

* Add client-side size tracking for NativeConnectorL2Adapter.get_usage()

Enables the L2EvictionController to automatically trigger eviction for native remote backends by tracking stored bytes client-side.

- Track per-key sizes in _key_sizes dict, populated on store completion
- Increment _current_size_bytes on successful store, decrement on delete
- Idempotent: duplicate stores for same key don't double-count
- get_usage() returns usage fraction when max_capacity_bytes > 0, or (-1.0, -1.0) when not configured (preserves backward compat)

Add max_capacity_bytes config parameter to:
- NativeConnectorL2Adapter.__init__
- RESPL2AdapterConfig, FSNativeL2AdapterConfig, NativePluginL2AdapterConfig
- All three factory functions

Tests: 6 new tests covering zero-capacity, store tracking, delete tracking, store-delete cycles, and idempotent store deduplication.

* Update docs and examples for native connector eviction support

- l2_eviction.md: Update adapter support matrix to show NativeConnectorL2Adapter now supports delete and get_usage; add configuration example with max_capacity_bytes + eviction
- resp.rst: Add max_capacity_bytes to L2 adapter config table; add L2 Eviction section with full configuration example
- native_connectors.rst: Add do_single_delete to connector interface; add submit_batch_delete to protocol; add max_capacity_bytes to config examples and native_plugin table; update checklist and method counts
- resp/README.md: Add max_capacity_bytes to config table
- Config help() strings: Add max_capacity_bytes documentation to RESP, FS native, and native plugin adapter configs

* Rename max_capacity_bytes to max_capacity_gb (float) in config

Change the L2 adapter capacity config from bytes (int) to GB (float) for consistency with MockL2Adapter's max_size_gb and better ergonomics. The internal _max_capacity_bytes field stays as bytes -- the conversion happens once in NativeConnectorL2Adapter.__init__. Updated: adapter configs, factories, tests, docs, and examples.

* Fix eviction listener notifications and delete timeout cleanup

Fixes three issues found in code review:
1. (Critical) Add _notify_keys_stored on store completion and _notify_keys_accessed on load completion in the demux loop. Without these, the LRU eviction policy never learns about stored/accessed keys, making eviction non-functional.
2. Clean up _pending_delete_events and _pending_ops on delete timeout to prevent memory leaks.
3. Add docstring to delete() method.

Also store keys in _pending_ops for load operations (was None) so _notify_keys_accessed can report which keys were loaded.

Listener notifications are fired outside the lock to avoid potential deadlocks with listener callbacks.
…MCache#2705)

* feat: Add MaruBackend as a storage backend for CXL shared memory

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Co-authored-by: youngrok-XCENA <yr.song@xcena.com>
Co-authored-by: hyunyul-XCENA <hyunyul.cho@xcena.com>
Co-authored-by: seohui-XCENA <seohui.son@xcena.com>
Co-authored-by: kihwan-XCENA <kihwan.kim@xcena.com>

* fix: capture store() return value and correct pin docstring

- _async_store now uses handler.store() return value instead of unconditionally setting success=True, preventing CXL memory leak on server-side rejection
- Fix batched_async_contains docstring to reflect actual batch_pin RPC support

Signed-off-by: youngrok-XCENA <yr.song@xcena.com>

* style: fix ruff-format in maru_backend.py

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>

---------

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Signed-off-by: youngrok-XCENA <yr.song@xcena.com>
Co-authored-by: youngrok-XCENA <yr.song@xcena.com>
Co-authored-by: hyunyul-XCENA <hyunyul.cho@xcena.com>
Co-authored-by: seohui-XCENA <seohui.son@xcena.com>
Co-authored-by: kihwan-XCENA <kihwan.kim@xcena.com>
Co-authored-by: Rocky Song <167060552+youngrok-XCENA@users.noreply.github.com>
Fix UT after merge LMCache#2851 Signed-off-by: baoloongmao <baoloongmao@tencent.com>
MLA format (NL_X_NB_BS_HS) absorbs heads into the hidden dim, so get_num_heads should return 1 instead of raising ValueError. This was preventing all MLA models (e.g. DeepSeek-V2-Lite) from launching.
* Introduce l2 mooncake adapter
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
* Remove extra files
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
* Reduce redundant code with setup.py
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
---------
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
- get_non_blocking: add done callback to write-back fetched data to LocalCPUBackend, matching existing get() behavior
- prefetch_single_done_callback: write-back prefetched data to LocalCPUBackend after async prefetch completes

Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Align error handling with prefetch_single_done_callback for consistency. Prevents unhandled exceptions in Future callbacks. Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Align with existing get() and batched_get() which exclude MaruBackend from write-back to LocalCPUBackend. Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
Signed-off-by: jooho-xcena <jooho.lee@xcena.com>
What this PR does / why we need it:
Implement write-back logic for async (non-blocking) get paths in StorageManager.

The synchronous paths (get(), batched_get()) already write back fetched data to LocalCPUBackend when retrieving from remote backends. However, the async paths (get_non_blocking(), prefetch_single_done_callback()) were left as TODOs without write-back, meaning data fetched asynchronously from remote backends was never cached locally.

This PR adds write-back callbacks to both async paths, matching the existing sync behavior:
- get_non_blocking(): adds a Future done callback to write back to LocalCPUBackend
- prefetch_single_done_callback(): writes back prefetched data after async prefetch completes

Special notes for your reviewers:
Tested with 2 instances using p2p_backend:
With write-back enabled, inst2's 2nd query (157.50 ms) matches inst1's local hit (157.77 ms): the cache is served from local CPU instead of a remote transfer. Without write-back, inst2 still fetches remotely every time (1045.66 ms), roughly 6.6x slower.
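For readers unfamiliar with the pattern, the done-callback write-back described for get_non_blocking() can be sketched roughly as below. This is a minimal illustration, not the code in this PR: LocalCache and the dict-backed remote store are hypothetical stand-ins for LocalCPUBackend and a remote backend, and a thread pool stands in for whatever async machinery StorageManager actually uses.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Optional


class LocalCache:
    """Hypothetical stand-in for LocalCPUBackend (not the real class)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def contains(self, key: str) -> bool:
        return key in self._data

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


def get_non_blocking(
    executor: ThreadPoolExecutor,
    remote: Dict[str, bytes],
    local: LocalCache,
    key: str,
) -> Future:
    """Fetch `key` from the remote store and write it back on completion."""
    future: Future = executor.submit(remote.get, key)

    def write_back(done: Future) -> None:
        # Exceptions raised inside a done callback are only logged by
        # concurrent.futures, so failures are handled here explicitly.
        try:
            value: Optional[bytes] = done.result()
            if value is not None and not local.contains(key):
                local.put(key, value)
        except Exception as exc:
            print(f"write-back skipped for {key!r}: {exc}")

    future.add_done_callback(write_back)
    return future


remote_store = {"chunk-0": b"kv-bytes"}
local_cache = LocalCache()

# Leaving the `with` block joins the worker thread, so both callbacks have run.
with ThreadPoolExecutor(max_workers=1) as pool:
    fetched = get_non_blocking(pool, remote_store, local_cache, "chunk-0")
    missing = get_non_blocking(pool, remote_store, local_cache, "chunk-1")

print(fetched.result(), local_cache.contains("chunk-0"), local_cache.contains("chunk-1"))
```

The caller still receives the Future immediately; the local copy happens as a side effect once the fetch completes, which is why a second lookup of the same key can be served locally.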
Discussion: Should async write-back always be enabled?
There may be cases where write-back is unnecessary or even wasteful:
write-back adds latency (memory allocation + copy) with no future benefit,
making the request slower than skipping it entirely
Proposal: Make write-back configurable as an option (e.g., enable_async_write_back), defaulting to True to maintain current behavior while allowing users to disable it when not needed.
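One way the proposed option could gate the write-back is sketched below. Only the option name enable_async_write_back comes from the proposal; WriteBackConfig, maybe_write_back, and the dict-based cache are hypothetical names used purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WriteBackConfig:
    # Option name taken from the proposal; True keeps the current behavior.
    enable_async_write_back: bool = True


def maybe_write_back(
    config: WriteBackConfig,
    local_cache: Dict[str, bytes],
    key: str,
    value: Optional[bytes],
) -> bool:
    """Copy a fetched value into the local cache; return True if it was written."""
    if not config.enable_async_write_back:
        return False  # user opted out: skip the allocation and copy
    if value is None or key in local_cache:
        return False  # nothing was fetched, or the key is already cached
    local_cache[key] = value
    return True


cache_on: Dict[str, bytes] = {}
cache_off: Dict[str, bytes] = {}
wrote_on = maybe_write_back(WriteBackConfig(), cache_on, "chunk-0", b"kv")
wrote_off = maybe_write_back(
    WriteBackConfig(enable_async_write_back=False), cache_off, "chunk-0", b"kv"
)
```

Checking the flag first means a disabled write-back costs a single branch, so users who opt out pay none of the allocation or copy latency discussed above.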