Closed
32 commits
5f9de57
feat: add Maru connector
jooho-XCENA Mar 6, 2026
3f9f3c0
Merge branch 'dev' into feat/maru-connector
jooho-XCENA Mar 6, 2026
3590a47
Address code review comments on Maru connector (#1)
hyunyul-XCENA Mar 9, 2026
670e825
refactor: use string keys instead of int64 hashed keys in MaruConnect…
seohui-XCENA Mar 9, 2026
7ed6210
Merge branch 'dev' into feat/maru-connector
jooho-XCENA Mar 9, 2026
a18cf52
Merge branch 'dev' into feat/maru-connector
jooho-XCENA Mar 13, 2026
e15f00b
feat: maru storage backend bring-up
youngrok-XCENA Mar 15, 2026
e29b248
chore: fix lint error
youngrok-XCENA Mar 15, 2026
c6d5dbf
refactor: update MaruBackend to use CxlMemoryAdapter facade API
youngrok-XCENA Mar 16, 2026
61e4b62
feat/maru backend (#5)
youngrok-XCENA Mar 17, 2026
7c021a6
feat(maru): use batch RPC APIs for MaruHandler operations (#11)
hyunyul-XCENA Mar 18, 2026
eb5afe0
feat: MaruBackend allocator fallback, ImportError, batch RPC (#7)
jooho-XCENA Mar 19, 2026
b5599d6
feat: MaruBackend pin/unpin and ref_count management (#10)
seohui-XCENA Mar 19, 2026
2be19a9
fix: rename MaruHandler pin/unpin RPC method calls (#12)
seohui-XCENA Mar 20, 2026
d6f47e7
tests: add maru backend test
youngrok-XCENA Mar 20, 2026
ebca1a9
fix: fix ruff fails
youngrok-XCENA Mar 20, 2026
5eceac6
refactor: move maru ImportError handling to __init__.py (#13)
jooho-XCENA Mar 20, 2026
c4f1053
fix: rename handler.pin_kv() to handler.pin() and document ref_count …
hyunyul-XCENA Mar 20, 2026
3b64cd4
docs: update maru.rst for new MaruBackend config
hyunyul-XCENA Mar 20, 2026
193b535
docs: add local_cpu: False to maru config example
hyunyul-XCENA Mar 20, 2026
ddc5460
Merge remote-tracking branch 'upstream/dev' into feat/maru-backend
youngrok-XCENA Mar 20, 2026
fd47482
fix: fix handler method name mismatch
youngrok-XCENA Mar 20, 2026
9507130
Merge branch 'feat/maru-backend' of github.com:xcena-dev/LMCache into…
youngrok-XCENA Mar 20, 2026
796f184
fix: skip put in MLA worker_id_as0 mode, fix test pin_kv→pin
youngrok-XCENA Mar 20, 2026
1425daf
fix: propagate async store failures to Future callers
youngrok-XCENA Mar 20, 2026
000134b
fix: pin memory_obj in async retrieve to balance cleanup unpin
seohui-XCENA Mar 20, 2026
ceedb9e
Merge branch 'feat/maru-backend' of github.com:xcena-dev/LMCache into…
youngrok-XCENA Mar 20, 2026
a73e8ba
fix: MaruBackend test failures, config cleanup, close() drain (#16)
jooho-XCENA Mar 20, 2026
f13580a
docs: tcp -> maru
hyunyul-XCENA Mar 20, 2026
6cb8f9b
Merge branch 'dev' into feat/maru-connector
hyunyul-XCENA Mar 20, 2026
f106204
Merge remote-tracking branch 'origin/feat/maru-connector' into feat/m…
hyunyul-XCENA Mar 20, 2026
cdb7133
tests: add store failure ref_count_down tests for MaruBackend
youngrok-XCENA Mar 20, 2026
Binary file added docs/source/assets/maru-kvcache.gif
1 change: 1 addition & 0 deletions docs/source/kv_cache/storage_backends/index.rst
@@ -15,6 +15,7 @@ Supported Backends
gds
infinistore
local_storage
maru
mock
mooncake
nixl
113 changes: 113 additions & 0 deletions docs/source/kv_cache/storage_backends/maru.rst
@@ -0,0 +1,113 @@
Maru
====

.. _maru-overview:

Overview
--------

`Maru <https://github.com/xcena-dev/maru>`_ is a high-performance KV cache storage engine built on CXL shared memory,
designed for LLM inference scenarios where multiple instances need to share a KV cache with minimal latency.

.. image:: ../../assets/maru-kvcache.gif
   :alt: KV Cache Sharing: Without vs With Maru

For architecture details, see the `Maru documentation <https://xcena-dev.github.io/maru/>`_.

Quick Start
-----------

Install Maru:

.. code-block:: bash

   git clone https://github.com/xcena-dev/maru.git
   cd maru
   ./install.sh

This installs ``maru-server``, ``maru-resourced``, and the ``maru`` Python package.

Deploy Model With Maru
~~~~~~~~~~~~~~~~~~~~~~

**Prerequisites:** a CXL device exposed as ``/dev/dax*``, Python 3.12 or later, and vLLM and LMCache installed.

**1. Start the Maru Server**

.. code-block:: bash

   maru-server

**2. Create a configuration file** (``maru-config.yaml``):

.. code-block:: yaml

   chunk_size: 256
   local_cpu: False
   max_local_cpu_size: 0
   save_unfull_chunk: True

   # Maru backend
   maru_path: "maru://localhost:5555"
   maru_pool_size: 4

**3. Start vLLM with Maru**

.. code-block:: bash

   LMCACHE_CONFIG_FILE="maru-config.yaml" \
   vllm serve \
       meta-llama/Llama-3.1-8B-Instruct \
       --max-model-len 65536 \
       --kv-transfer-config \
       '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'

Configuration
-------------

**LMCache Parameters:**

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Default
     - Description
   * - ``maru_path``
     - ``None``
     - Maru server URL (format: ``maru://host:port``). Required: the Maru backend is created only when this is set.
   * - ``maru_pool_size``
     - ``4.0``
     - CXL memory pool size per instance in GB (e.g., ``4``, ``0.5``)
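
The ``maru://host:port`` format can be checked before it reaches the backend. The following is a minimal sketch using only the Python standard library; ``parse_maru_path`` is a hypothetical helper written for illustration and is not part of LMCache or Maru, whose actual parsing may differ.

```python
from urllib.parse import urlsplit


def parse_maru_path(maru_path: str) -> tuple[str, int]:
    """Split a ``maru://host:port`` URL into (host, port).

    Hypothetical helper for illustration; not an LMCache or Maru API.
    """
    parts = urlsplit(maru_path)
    if parts.scheme != "maru":
        raise ValueError(f"expected a maru:// URL, got {maru_path!r}")
    # urlsplit exposes hostname and port as None when they are missing.
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"maru_path must include host and port: {maru_path!r}")
    return parts.hostname, parts.port


host, port = parse_maru_path("maru://localhost:5555")
print(host, port)  # localhost 5555
```

Validating the URL up front turns a typo such as ``tcp://localhost:5555`` into an immediate ``ValueError`` instead of a connection failure later on.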

**Advanced Parameters (via** ``extra_config`` **):**

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Default
     - Description
   * - ``maru_instance_id``
     - auto UUID
     - Unique client instance identifier
   * - ``maru_timeout_ms``
     - ``5000``
     - ZMQ RPC socket timeout in milliseconds
   * - ``maru_use_async_rpc``
     - ``true``
     - Async DEALER-ROUTER RPC (``false`` for synchronous REQ-REP)
   * - ``maru_max_inflight``
     - ``64``
     - Max concurrent async RPC requests
   * - ``maru_eager_map``
     - ``true``
     - Pre-map all shared regions on connect
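
To illustrate how the advanced defaults combine with user overrides, here is a small sketch. The default values are copied from the table; ``resolve_maru_extra_config`` and its merge and validation rules are assumptions made for this example, not the backend's actual logic.

```python
from typing import Any, Dict, Optional

# Defaults copied from the table above. None is a placeholder for the
# table's "auto UUID"; how the ID is really generated is not shown here.
MARU_EXTRA_DEFAULTS: Dict[str, Any] = {
    "maru_instance_id": None,
    "maru_timeout_ms": 5000,
    "maru_use_async_rpc": True,
    "maru_max_inflight": 64,
    "maru_eager_map": True,
}


def resolve_maru_extra_config(extra_config: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Overlay user-supplied maru_* keys on the documented defaults.

    Hypothetical helper for illustration; not an LMCache or Maru API.
    """
    overrides = extra_config or {}
    # Reject misspelled maru_* keys instead of silently ignoring them.
    unknown = {k for k in overrides if k.startswith("maru_")} - set(MARU_EXTRA_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown Maru options: {sorted(unknown)}")
    resolved = dict(MARU_EXTRA_DEFAULTS)
    resolved.update({k: v for k, v in overrides.items() if k in MARU_EXTRA_DEFAULTS})
    return resolved


settings = resolve_maru_extra_config({"maru_timeout_ms": 10000, "maru_use_async_rpc": False})
print(settings["maru_timeout_ms"], settings["maru_max_inflight"])  # 10000 64
```

Keys in ``extra_config`` that do not start with ``maru_`` are left alone in this sketch, since ``extra_config`` may carry settings for other components.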

Additional Resources
--------------------

- `Maru GitHub Repository <https://github.com/xcena-dev/maru>`_
- `Maru Documentation <https://xcena-dev.github.io/maru/>`_
7 changes: 7 additions & 0 deletions lmcache/v1/config.py
@@ -236,6 +236,13 @@
        "default": None,
        "env_converter": int,
    },
    # Maru CXL shared memory backend
    "maru_path": {"type": Optional[str], "default": None, "env_converter": str},
    "maru_pool_size": {
        "type": float,
        "default": 4.0,
        "env_converter": float,
    },
    # Other configurations
    # (Deprecated) The url of the actual remote lmcache instance for auditing.
    # Please use extra_config['audit_actual_remote_url'] instead.
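
The role of `env_converter` in the two new entries can be sketched as follows. The entries are copied from the hunk above; `load_from_env`, the `LMCACHE_` prefix, and the upper-casing of names are assumptions made for this illustration and may not match how LMCache actually reads its environment.

```python
from typing import Any, Dict, Mapping, Optional

# The two definitions added in this hunk.
MARU_DEFINITIONS: Dict[str, Dict[str, Any]] = {
    "maru_path": {"type": Optional[str], "default": None, "env_converter": str},
    "maru_pool_size": {"type": float, "default": 4.0, "env_converter": float},
}


def load_from_env(
    definitions: Mapping[str, Mapping[str, Any]],
    environ: Mapping[str, str],
    prefix: str = "LMCACHE_",  # assumed prefix, not confirmed by the diff
) -> Dict[str, Any]:
    """Convert raw environment strings with each entry's env_converter.

    Hypothetical stand-in for the real loader.
    """
    values: Dict[str, Any] = {}
    for name, spec in definitions.items():
        raw = environ.get(prefix + name.upper())
        # Fall back to the declared default when the variable is unset.
        values[name] = spec["default"] if raw is None else spec["env_converter"](raw)
    return values


print(load_from_env(MARU_DEFINITIONS, {"LMCACHE_MARU_POOL_SIZE": "0.5"}))
# {'maru_path': None, 'maru_pool_size': 0.5}
```

Because `maru_path` defaults to `None`, leaving it unset keeps the Maru backend disabled, which matches the `config.maru_path is not None` check in `CreateStorageBackends`.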
14 changes: 14 additions & 0 deletions lmcache/v1/storage_backend/__init__.py
@@ -218,6 +218,20 @@ def CreateStorageBackends(
        )
        storage_backends[str(gds_backend)] = gds_backend

    if config.maru_path is not None and "MaruBackend" not in _skip:
        try:
            # First Party
            from lmcache.v1.storage_backend.maru_backend import MaruBackend
        except ImportError as e:
            raise ImportError(
                "The 'maru' and 'maru_lmcache' packages are required "
                "to use MaruBackend. Please install them according to "
                "the Maru setup documentation."
            ) from e

        maru_backend = MaruBackend(config, metadata, loop, dst_device)
        storage_backends[str(maru_backend)] = maru_backend

    if config.remote_url is not None and "RemoteBackend" not in _skip:
        assert local_cpu_backend is not None, (
            "Remote backend requires local CPU backend as a buffer."
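
The guarded import above follows a common optional-dependency pattern: import lazily, and re-raise `ImportError` with an actionable message while chaining the original error. A self-contained sketch of that pattern is below; `load_optional_backend` and the module name `hypothetical_missing_backend` are invented for illustration and do not exist in LMCache.

```python
import importlib


def load_optional_backend(module_name: str, class_name: str, install_hint: str):
    """Import a backend class lazily, replacing the error with an install hint.

    Hypothetical helper for illustration; not an LMCache API.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as e:
        # "from e" keeps the original failure available as __cause__.
        raise ImportError(install_hint) from e
    return getattr(module, class_name)


# A standard-library module stands in for an installed backend.
ordered_dict_cls = load_optional_backend("collections", "OrderedDict", "unused hint")

try:
    load_optional_backend(
        "hypothetical_missing_backend",
        "Backend",
        "Install the optional package to use this backend.",
    )
except ImportError as err:
    print(err)  # Install the optional package to use this backend.
    print(type(err.__cause__).__name__)  # ModuleNotFoundError
```

Keeping the import inside the branch means users who never set `maru_path` are not required to install the Maru packages.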