Merged
README.md (1 addition, 1 deletion)
@@ -146,7 +146,7 @@ Maru works as a drop-in remote storage backend for [LMCache](https://github.com/
# LMCache config
remote_url: "maru://localhost:5555"
extra_config:
maru_pool_size: "4G"
maru_pool_size: 4
```

For details on LMCache integration, see the [documentation](https://xcena-dev.github.io/maru/source/integration/lmcache.html).
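This hunk changes `maru_pool_size` from the string `"4G"` to the number `4`. The updated `lmcache.md` table in this PR documents the new field as a pool size in GB, while the old `extra_config` table accepted human-readable strings (`"4G"`, `"500M"`) or an integer byte count. A minimal sketch for converting an old value to the new form follows; `legacy_pool_size_to_gb` is a hypothetical helper, not part of Maru, and it assumes binary units (1G = 1024³ bytes), which the docs do not state.

```python
import re

# Assumption: binary units, i.e. "1G" == 1024**3 bytes.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def legacy_pool_size_to_gb(value):
    """Hypothetical helper: old maru_pool_size value -> whole-GB integer."""
    if isinstance(value, int):
        num_bytes = value  # legacy integers were raw byte counts
    else:
        match = re.fullmatch(r"\s*(\d+)\s*([KMGT])\s*", str(value).upper())
        if match is None:
            raise ValueError(f"unrecognized pool size: {value!r}")
        num_bytes = int(match.group(1)) * _UNITS[match.group(2)]
    gb, remainder = divmod(num_bytes, _UNITS["G"])
    # Refuse sizes with no exact whole-GB equivalent instead of rounding.
    if remainder or gb == 0:
        raise ValueError(f"{value!r} is not a whole number of GB")
    return gb
```

With this sketch, `"4G"` and `"4096M"` both map to `4`, while `"500M"` is rejected so that someone chooses the new value deliberately rather than having it rounded silently.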
docs/source/api_reference/api.md (2 additions, 2 deletions)
@@ -43,8 +43,8 @@ with MaruHandler(config) as handler:

```{eval-rst}
.. autoclass:: maru_handler.MaruHandler
:members: connect, close, alloc, store, retrieve, exists, delete,
batch_store, batch_retrieve, batch_exists,
:members: connect, close, alloc, free, store, retrieve, exists, pin, unpin, delete,
batch_store, batch_retrieve, batch_exists, batch_pin, batch_unpin,
healthcheck, get_stats
:noindex:
:no-undoc-members:
docs/source/design_doc/maru_server.md (4 additions, 0 deletions)
@@ -119,10 +119,14 @@ The server exposes the following message types:
| `REGISTER_KV` | Register a KV entry at a given location |
| `LOOKUP_KV` | Look up a KV entry's location and handle |
| `EXISTS_KV` | Check whether a key exists |
| `PIN_KV` | Atomically check existence and pin a KV entry |
| `UNPIN_KV` | Unpin a KV entry |
| `DELETE_KV` | Delete a KV entry |
| `BATCH_REGISTER_KV` | Batch register multiple KV entries |
| `BATCH_LOOKUP_KV` | Batch look up multiple keys |
| `BATCH_EXISTS_KV` | Batch check existence of multiple keys |
| `BATCH_PIN_KV` | Batch check existence and pin multiple entries |
| `BATCH_UNPIN_KV` | Batch unpin multiple entries |
| `GET_STATS` | Retrieve server statistics |
| `HEARTBEAT` | Connection health check |
| `HANDSHAKE` | Reserved — initial client-server handshake |
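`PIN_KV` is described as atomically checking existence and pinning an entry, and `BATCH_PIN_KV` / `BATCH_UNPIN_KV` do the same for several keys. The toy model below is not MaruServer's code. It only shows what "atomically" buys: the existence check and the pin increment happen under one lock, so no concurrent request can change the entry between the two steps. `ToyKvIndex`, the reference-count representation of pins, and the choice that unpinning an unpinned key returns `False` are all assumptions. This PR does not say what a pin protects an entry from.

```python
import threading


class ToyKvIndex:
    """Hypothetical in-memory stand-in for the server's KV table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> (region_id, offset, length)
        self._pins = {}     # key -> pin count

    def register_kv(self, key, region_id, offset, length):
        with self._lock:
            self._entries[key] = (region_id, offset, length)
            self._pins.setdefault(key, 0)
            return True

    def pin_kv(self, key):
        # Check and pin under one lock: a concurrent delete cannot
        # slip in between "it exists" and "it is pinned".
        with self._lock:
            if key not in self._entries:
                return False
            self._pins[key] += 1
            return True

    def unpin_kv(self, key):
        with self._lock:
            if self._pins.get(key, 0) == 0:
                return False
            self._pins[key] -= 1
            return True

    def batch_pin_kv(self, keys):
        # One lock acquisition for the whole batch, one result per key.
        with self._lock:
            results = []
            for key in keys:
                found = key in self._entries
                if found:
                    self._pins[key] += 1
                results.append(found)
            return results

    def batch_unpin_kv(self, keys):
        with self._lock:
            results = []
            for key in keys:
                pinned = self._pins.get(key, 0) > 0
                if pinned:
                    self._pins[key] -= 1
                results.append(pinned)
            return results

    def pin_count(self, key):
        with self._lock:
            return self._pins.get(key, 0)
```

Pinning a missing key fails without side effects, and each successful pin must be matched by one unpin before the count returns to zero.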
Binary file modified docs/source/design_doc/resource/lmcache_component_arch.png
docs/source/getting_started/examples/lmcache/p2p.md (6 additions, 10 deletions)
@@ -19,26 +19,22 @@ Both instances share a single configuration file (`maru-config.yaml`):

```yaml
chunk_size: 256
local_cpu: True
max_local_cpu_size: 5
local_cpu: False
max_local_cpu_size: 0
enable_async_loading: True

enable_p2p: False
enable_controller: False

remote_url: "maru://localhost:${MARU_SERVER_PORT}"
remote_serde: "naive"
remote_storage_plugins: ["maru"]
# Maru backend
maru_path: "maru://localhost:${MARU_SERVER_PORT}"
maru_pool_size: 4

extra_config:
remote_storage_plugin.maru.module_path: maru_lmcache.adapter
remote_storage_plugin.maru.class_name: MaruConnectorAdapter
maru_pool_size: "4G"
save_chunk_meta: False
lookup_backoff_time: 0.001
```

Maru is loaded as an LMCache [remote storage plugin](https://docs.lmcache.ai/developer_guide/extending_lmcache/remote_storage_plugins.html). For details on each configuration field, see {doc}`../../../integration/lmcache`.
Maru is loaded as an LMCache [storage backend](https://docs.lmcache.ai/kv_cache/storage_backends/index.html). For details on each configuration field, see {doc}`../../../integration/lmcache`.

## How to Run

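The example configs in this PR all receive the same Maru-specific substitution: `remote_url` becomes `maru_path`, `remote_serde` and `remote_storage_plugins` are dropped, the two `remote_storage_plugin.maru.*` entries leave `extra_config`, and `maru_pool_size` moves to the top level as a number of GB. A minimal sketch of that rewrite on an already-parsed config dict follows. `migrate_plugin_config` is a hypothetical helper. It converts only `"<n>G"` pool sizes and leaves every other key untouched, including `local_cpu` and `max_local_cpu_size`, which this PR tunes separately.

```python
import copy

# Keys that existed only to load Maru as an LMCache remote storage plugin.
_PLUGIN_TOP_LEVEL = ("remote_url", "remote_serde", "remote_storage_plugins")
_PLUGIN_EXTRA = (
    "remote_storage_plugin.maru.module_path",
    "remote_storage_plugin.maru.class_name",
)


def _legacy_size_to_gb(value):
    # Only "<n>G" strings are converted; "500M" or raw byte counts are
    # rejected so a human picks the new value instead of silent rounding.
    text = str(value).strip().upper()
    if text.endswith("G") and text[:-1].isdigit():
        return int(text[:-1])
    raise ValueError(f"convert maru_pool_size {value!r} to whole GB by hand")


def migrate_plugin_config(old: dict) -> dict:
    """Hypothetical helper: plugin-style Maru config -> native-backend layout."""
    new = copy.deepcopy(old)
    url = new.get("remote_url")
    if not isinstance(url, str) or not url.startswith("maru://"):
        return new  # not a Maru plugin config; leave it alone
    for key in _PLUGIN_TOP_LEVEL:
        new.pop(key, None)
    new["maru_path"] = url
    extra = new.pop("extra_config", None) or {}
    for key in _PLUGIN_EXTRA:
        extra.pop(key, None)
    if "maru_pool_size" in extra:
        new["maru_pool_size"] = _legacy_size_to_gb(extra.pop("maru_pool_size"))
    else:
        # Old default was "1G"; the new field defaults to 4, so pin the old size.
        new["maru_pool_size"] = 1
    if extra:
        new["extra_config"] = extra  # remaining entries carried over unchanged
    return new
```

Writing `maru_pool_size: 1` when the old config omitted it is a deliberate choice in this sketch: the old parameter table lists a `"1G"` default, the new one lists `4`, and a silent jump from 1 GB to 4 GB of CXL memory per client seems worth avoiding.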
docs/source/getting_started/examples/lmcache/pd.md (4 additions, 8 deletions)
@@ -22,22 +22,18 @@ Both prefiller and decoder use the same configuration:
```yaml
enable_pd: False
chunk_size: 256
remote_url: "maru://localhost:${MARU_SERVER_PORT}"
remote_serde: "naive"
remote_storage_plugins: ["maru"]
local_cpu: False
max_local_cpu_size: 100
save_unfull_chunk: True
# Maru backend
maru_path: "maru://localhost:${MARU_SERVER_PORT}"
maru_pool_size: 4

extra_config:
remote_storage_plugin.maru.module_path: maru_lmcache.adapter
remote_storage_plugin.maru.class_name: MaruConnectorAdapter
maru_pool_size: "4G"
save_chunk_meta: False
lookup_backoff_time: 0.001
```

Maru is loaded as an LMCache [remote storage plugin](https://docs.lmcache.ai/developer_guide/extending_lmcache/remote_storage_plugins.html). For details on each configuration field, see {doc}`../../../integration/lmcache`.
Maru is loaded as an LMCache [storage backend](https://docs.lmcache.ai/kv_cache/storage_backends/index.html). For details on each configuration field, see {doc}`../../../integration/lmcache`.

## How to Run

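The new `maru_path` field takes the form `maru://<host>:<port>`, and the example configs fill in the port from `${MARU_SERVER_PORT}`. A minimal validation sketch follows. It assumes the placeholder is substituted from the environment before the URL is used, which this PR does not spell out, and `parse_maru_path` is a hypothetical helper rather than anything LMCache or Maru exposes.

```python
import os
from string import Template
from urllib.parse import urlsplit


def parse_maru_path(path, environ=None):
    """Hypothetical check that maru_path looks like maru://<host>:<port>."""
    env = os.environ if environ is None else environ
    # Substitute ${VAR} placeholders such as ${MARU_SERVER_PORT};
    # unknown placeholders are left in place.
    expanded = Template(path).safe_substitute(env)
    parts = urlsplit(expanded)
    if parts.scheme != "maru":
        raise ValueError(f"expected a maru:// URL, got {expanded!r}")
    try:
        port = parts.port  # raises ValueError for a non-numeric port
    except ValueError:
        raise ValueError(
            f"port in {expanded!r} is not a number; is the variable set?"
        ) from None
    if not parts.hostname or port is None:
        raise ValueError(f"expected maru://<host>:<port>, got {expanded!r}")
    return parts.hostname, port
```

Failing early on an unexpanded `${MARU_SERVER_PORT}` turns a confusing connection error into a message that points at the missing variable.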
docs/source/integration/lmcache.md (50 additions, 63 deletions)
@@ -18,30 +18,32 @@ The full stack from inference engine to shared memory:

| Layer | Responsibility | Scope |
|-------|---------------|-------|
| **LMCache stack** | Inference engine → CacheEngine → StorageManager → RemoteBackend | LMCache (external) |
| **MaruConnector** | Adapts LMCache's RemoteConnector to MaruHandler's API | Integration boundary |
| **LMCache stack** | Inference engine → CacheEngine → StorageManager → MaruBackend | LMCache (external) |
| **MaruBackend** | LMCache `AllocatorBackendInterface` — allocates directly on CXL, async store, sync get | Integration boundary |
| **CxlMemoryAdapter** | LMCache `MemoryAllocatorInterface` — translates Maru pages to `TensorMemoryObj` pool | Integration boundary |
| **MaruHandler** | Client-side KV operations, memory mapping, connection management | Maru client |
| **MaruServer** | Central metadata store, memory allocation coordinator | Maru server |

The **integration boundary** sits at MaruConnector. Everything above is LMCache;
everything below is Maru. MaruConnector is the only component that imports from
both projects.
The **integration boundary** sits at MaruBackend + CxlMemoryAdapter. Everything above is LMCache;
everything below is Maru. These two classes are the only components that import from both projects.

## Connector Design
## Backend Design

LMCache defines a `RemoteConnector` interface that all remote storage backends
must implement (`exists`, `get`, `put`, `close`, and batch variants). MaruConnector
implements this interface by delegating to MaruHandler.
### Two-layer integration

**Why the connector pattern:** LMCache's RemoteBackend is designed for pluggable
storage. The same StorageManager can use Redis, S3, Mooncake, or Maru without
any change to the cache engine logic. MaruConnector slots in as one such plugin.

The key translation between the two APIs involves:
```
MaruBackend (AllocatorBackendInterface)
├── CxlMemoryAdapter (MemoryAllocatorInterface)
│ ├── _pool: {region_id: [TensorMemoryObj per page]}
│ └── address encoding: (rid << 32) | pid
└── MaruHandler (Maru client)
├── RpcClient → MaruServer
├── DaxMapper (mmap management)
└── OwnedRegionManager (page allocation)
```

- **Key conversion** — LMCache uses structured `CacheEngineKey` objects; MaruHandler uses string keys (`CacheEngineKey.to_string()`).
- **Zero-copy bridging** — MaruHandler returns `MemoryInfo` (a memoryview wrapper) which the connector wraps as LMCache's `MemoryObj` without copying data.
- **Batch optimization** — The connector maps LMCache's batch operations to MaruHandler's batch RPC calls, reducing round-trip overhead.
**MaruHandler** manages CXL memory (regions, pages, mmap). **CxlMemoryAdapter** translates
pages into LMCache's `TensorMemoryObj` format.

## Data Path

@@ -53,21 +55,23 @@ When the inference engine produces new KV cache data:
sequenceDiagram
participant IE as Inference Engine
participant CE as CacheEngine
participant MC as MaruConnector
participant MB as MaruBackend
participant MH as MaruHandler
participant MS as MaruServer
participant CXL as CXL Memory

IE->>CE: KV tensors (GPU)
CE->>MC: put(key, MemoryObj)
MC->>MH: alloc(size)
MH-->>MC: handle (page in CXL region)
MC->>CXL: write data via handle buffer (zero-copy)
MC->>MH: store(key, handle)
CE->>MB: allocate(size)
MB->>MH: alloc(size)
MH-->>MB: handle (page in CXL region)
MB-->>CE: MemoryObj (CXL-backed)
CE->>CXL: GPU → CXL direct copy (only data copy)
CE->>MB: put(key, MemoryObj)
MB->>MH: store(key, handle)
MH->>MS: register_kv(key, region_id, offset, length)
MS-->>MH: success
MH-->>MC: True
MC-->>CE: done
MH-->>MB: True
MB-->>CE: done
```

### Retrieve Path (read)
@@ -78,20 +82,19 @@ When the inference engine needs cached KV data:
sequenceDiagram
participant IE as Inference Engine
participant CE as CacheEngine
participant MC as MaruConnector
participant MB as MaruBackend
participant MH as MaruHandler
participant MS as MaruServer
participant CXL as CXL Memory

IE->>CE: Request KV for prompt prefix
CE->>MC: get(key)
MC->>MH: retrieve(key)
CE->>MB: get(key)
MB->>MH: retrieve(key)
MH->>MS: lookup_kv(key)
MS-->>MH: region_id, offset, length
MH->>CXL: Map shared region (if not already mapped)
MH-->>MC: MemoryInfo (zero-copy memoryview)
MC->>MC: Wrap as MemoryObj (zero-copy)
MC-->>CE: MemoryObj
MH-->>MB: MemoryInfo (zero-copy memoryview)
MB-->>CE: MemoryObj (points to CXL mmap, zero-copy)
CE-->>IE: KV tensors
```

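Both sequence diagrams rely on one property: bytes written into a mapped CXL page are the same bytes a later reader sees through its own view, so the GPU → CXL write is the only data copy. The sketch below illustrates that aliasing with Python's `mmap` and `memoryview`. `ToyRegion`, the anonymous mapping standing in for a DAX device, and the 4096-byte page size are assumptions for illustration, not Maru's implementation.

```python
import mmap


class ToyRegion:
    """Hypothetical stand-in for one mmap'ed CXL region."""

    def __init__(self, num_pages: int, page_size: int = 4096):
        self.page_size = page_size
        # Anonymous mapping in place of a DAX device mapping (assumption).
        self._mm = mmap.mmap(-1, num_pages * page_size)
        self._view = memoryview(self._mm)

    def page_view(self, page_id: int, length: int) -> memoryview:
        """Return a writable slice over part of one page; no bytes are copied."""
        if not 0 <= length <= self.page_size:
            raise ValueError("length must fit within one page")
        start = page_id * self.page_size
        if not 0 <= start < len(self._view):
            raise IndexError("page_id out of range")
        return self._view[start:start + length]

    def close(self) -> None:
        # Every view handed out by page_view must be released first,
        # otherwise mmap.close() raises BufferError.
        self._view.release()
        self._mm.close()
```

In a short usage run, a `reader` view created before the write still sees the written bytes afterward, which shows that both views alias the same mapped memory rather than holding copies:

```python
region = ToyRegion(num_pages=4)
with region.page_view(1, 5) as reader:
    with region.page_view(1, 5) as writer:
        writer[:] = b"hello"  # the single write into the page
    print(reader.tobytes())   # b'hello'
region.close()
```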
@@ -101,59 +104,43 @@ accessed directly from CXL shared memory through memory-mapped regions.

## Configuration

Maru is loaded as an LMCache [remote storage plugin](https://docs.lmcache.ai/developer_guide/extending_lmcache/remote_storage_plugins.html) (requires LMCache >= v0.3.14). Configuration is done via the LMCache YAML config file.
Maru is configured as a native LMCache storage backend via the `maru_path` and `maru_pool_size`
config fields. No plugin registration is needed.

```yaml
chunk_size: 256
local_cpu: True
max_local_cpu_size: 5
enable_async_loading: True
local_cpu: False
max_local_cpu_size: 0
save_unfull_chunk: True

# Disable P2P for Maru shared storage mode
enable_p2p: False
enable_controller: False

# Maru backend — format: maru://<host>:<port>[?pool_size=&pool_id=&...]
remote_url: "maru://localhost:5555"
remote_serde: "naive"
remote_storage_plugins: ["maru"]
# Maru backend
maru_path: "maru://localhost:5555"
maru_pool_size: 4

extra_config:
remote_storage_plugin.maru.module_path: maru_lmcache.adapter
remote_storage_plugin.maru.class_name: MaruConnectorAdapter
maru_pool_size: "4G" # CXL memory pool size ("1G", "500M", etc.)
# maru_pool_id: 1 # Pin to specific DAX pool (default: any)
# maru_pool_id: "0,1" # Multi-pool fallback (try pool 0, then 1)
save_chunk_meta: False
lookup_backoff_time: 0.001
# maru_instance_id: "my-id" # Unique client ID (default: auto UUID)
# maru_operation_timeout: 10.0 # Per-operation timeout in seconds
# maru_timeout_ms: 2000 # ZMQ socket timeout (ms)
# maru_timeout_ms: 5000 # ZMQ socket timeout (ms)
# maru_use_async_rpc: true # Async DEALER-ROUTER RPC
# maru_max_inflight: 64 # Max in-flight async requests
# maru_eager_map: true # Pre-map shared regions on connect
```

### Plugin settings
### MaruBackend settings

| Field | Description |
| --- | --- |
| `remote_storage_plugins: ["maru"]` | Registers Maru as a plugin backend |
| `remote_storage_plugin.maru.module_path` | Python module containing the adapter class |
| `remote_storage_plugin.maru.class_name` | Adapter class name (`MaruConnectorAdapter`) |
| Field | Default | Description |
| --- | --- | --- |
| `maru_path` | (required) | MaruServer address. Format: `maru://<host>:<port>` |
| `maru_pool_size` | `4` | CXL memory pool size in GB |

### Maru extra_config parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `maru_pool_size` | `"1G"` | CXL memory pool size. Supports human-readable strings (`"4G"`, `"500M"`) or integer bytes |
| `maru_pool_id` | `None` (any pool) | Pin allocations to specific DAX device pool(s). Single int (`1`) or comma-separated (`"0,1"`) for ordered fallback. Can also be set via URL query: `maru://host:port?pool_id=1` |
| `maru_instance_id` | auto-generated UUID | Unique client instance identifier |
| `maru_operation_timeout` | `10.0` | Timeout in seconds for individual KV operations |
| `maru_timeout_ms` | `2000` | ZMQ socket timeout in milliseconds for RPC communication |
| `maru_timeout_ms` | `5000` | ZMQ socket timeout in milliseconds for RPC communication |
| `maru_use_async_rpc` | `true` | Use async DEALER-ROUTER pattern for higher throughput |
| `maru_max_inflight` | `64` | Max concurrent in-flight async RPC requests |
| `maru_server_url` | (from `remote_url`) | Override server URL. Normally not needed |
| `maru_auto_connect` | `true` | Auto-connect to MaruServer on initialization |
| `maru_eager_map` | `true` | Pre-map all shared regions on connect |

For runnable examples, see
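The component tree in `lmcache.md` records that `CxlMemoryAdapter` keeps a `_pool` of per-page objects keyed by region and encodes an address as `(rid << 32) | pid`. A minimal sketch of that encoding, and of resolving an address back to its per-page object, follows. The helper names, the check that page IDs fit in 32 bits, and the use of plain strings in place of `TensorMemoryObj` are assumptions for illustration.

```python
from typing import Dict, List, Tuple, TypeVar

T = TypeVar("T")

PID_BITS = 32
PID_MASK = (1 << PID_BITS) - 1


def encode_address(region_id: int, page_id: int) -> int:
    """Pack (region_id, page_id) into one int as (rid << 32) | pid."""
    if region_id < 0 or not 0 <= page_id <= PID_MASK:
        raise ValueError("region_id must be >= 0 and page_id must fit in 32 bits")
    return (region_id << PID_BITS) | page_id


def decode_address(address: int) -> Tuple[int, int]:
    """Inverse of encode_address: high bits are the region, low 32 bits the page."""
    return address >> PID_BITS, address & PID_MASK


def lookup(pool: Dict[int, List[T]], address: int) -> T:
    """Find the per-page object in a {region_id: [obj per page]} pool."""
    region_id, page_id = decode_address(address)
    return pool[region_id][page_id]
```

Because the page ID occupies only the low 32 bits, two distinct (region, page) pairs can never encode to the same address as long as every page ID stays below 2³². That is why the sketch rejects larger page IDs instead of letting them bleed into the region bits.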
examples/lmcache/disagg_prefill/1p1d/.gitignore (2 additions, 1 deletion)
@@ -1,3 +1,4 @@
.logs/
.results/
bench_results/
bench_results/
.test_pids
@@ -1,17 +1,11 @@
enable_pd: False
chunk_size: 256
# Maru remote backend
remote_url: "maru://localhost:${MARU_SERVER_PORT}"
remote_serde: "naive"
remote_storage_plugins: ["maru"]
local_cpu: False
max_local_cpu_size: 100
save_unfull_chunk: True

# Maru backend
maru_path: "maru://localhost:${MARU_SERVER_PORT}"
maru_pool_size: 4

extra_config:
remote_storage_plugin.maru.module_path: maru_lmcache.adapter
remote_storage_plugin.maru.class_name: MaruConnectorAdapter
maru_pool_size: "4G"
save_chunk_meta: False
lookup_backoff_time: 0.001

@@ -1,17 +1,11 @@
enable_pd: False
chunk_size: 256
# Maru remote backend
remote_url: "maru://localhost:${MARU_SERVER_PORT}"
remote_serde: "naive"
remote_storage_plugins: ["maru"]
local_cpu: False
max_local_cpu_size: 100
save_unfull_chunk: True

# Maru backend
maru_path: "maru://localhost:${MARU_SERVER_PORT}"
maru_pool_size: 4

extra_config:
remote_storage_plugin.maru.module_path: maru_lmcache.adapter
remote_storage_plugin.maru.class_name: MaruConnectorAdapter
maru_pool_size: "4G"
save_chunk_meta: False
lookup_backoff_time: 0.001
