fix: support latest vLLM KV cache layout by youngrok-XCENA · Pull Request #53 · xcena-dev/maru

youngrok-XCENA · 2026-06-02T10:15:32Z

🤔 Background & Motivation (Why)

Latest vLLM versions no longer expose ForwardContext.virtual_engine on the load path.
The Maru vLLM connector currently assumes layer.kv_cache is indexed by forward_context.virtual_engine. On latest vLLM this assumption crashes the connector when a repeated prompt hits Maru and start_load_kv() runs.
Reproduced with vLLM 0.22.0, after working around an unrelated FlashInfer sampler startup issue by setting VLLM_USE_FLASHINFER_SAMPLER=0.

🏗️ Design Changes

Behavioral change: MaruWorkerConnector.start_load_kv() now supports both older list/tuple KV cache containers and newer direct tensor-style layer.kv_cache values.
No public API change.

📝 Implementation Details

If layer.kv_cache is a list or tuple, keep the previous virtual-engine indexing behavior.
Use getattr(forward_context, "virtual_engine", 0) so that list/tuple layouts still fall back to virtual engine 0 when the forward context does not have the field.
If layer.kv_cache is already a tensor, use it directly for KV injection.
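The branching described above can be sketched as a small standalone function. This is a minimal sketch, not the connector's actual code: it assumes the selection logic can be factored out of start_load_kv(), the name select_kv_cache is hypothetical, and plain Python values stand in for torch tensors.

```python
from types import SimpleNamespace
from typing import Any


def select_kv_cache(kv_cache: Any, forward_context: Any) -> Any:
    """Return the KV cache object to inject into for one layer.

    Hypothetical helper illustrating the PR description; it is not
    the actual MaruWorkerConnector implementation.
    """
    if isinstance(kv_cache, (list, tuple)):
        # Older layout: one cache entry per virtual engine.
        # Fall back to virtual engine 0 if the field is absent.
        virtual_engine = getattr(forward_context, "virtual_engine", 0)
        return kv_cache[virtual_engine]
    # Newer layout: layer.kv_cache is already the tensor to use.
    return kv_cache


# Usage with stand-in objects instead of real tensors/ForwardContext:
legacy = select_kv_cache(["cache_ve0", "cache_ve1"], SimpleNamespace(virtual_engine=1))
no_field = select_kv_cache(("cache_ve0",), SimpleNamespace())
direct = select_kv_cache("tensor", SimpleNamespace())
print(legacy, no_field, direct)  # cache_ve1 cache_ve0 tensor
```

Checking for list/tuple rather than for a tensor type keeps the sketch free of a torch import; a torch.Tensor is neither a list nor a tuple, so it takes the direct path.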

✅ Tests

Unit tests
Integration tests
Manual tests
No tests needed (reason: )

Manual validation:

python -m py_compile maru_vllm/connector.py

Note: pytest -q maru-private/tests/unit/test_vllm_connector.py was not runnable in ~/.venv_vllm_latest because the local test conftest requires maru_shm, which is not installed in that environment.

🔗 Related Issues (optional)

🌿 Related PRs (optional)

🌿 Related Branches (optional)

📦 Release Note (for auto-generation / write in English)

NEW

CHANGED

FIXED

Maru vLLM connector now handles latest vLLM KV cache layouts that do not expose ForwardContext.virtual_engine.

IMPORTANT NOTES

github-actions · 2026-06-02T10:16:24Z

Coverage Report

File	Stmts	Miss	Cover	Missing
__init__.py	6	0	100%
__main__.py	3	3	0%	5, 7–8
allocation_manager.py	102	10	90%	30–31, 44–45, 52, 207, 211–214
client.py	184	8	95%	88, 130, 147–148, 311–313, 319
config.py	46	1	97%	72
constants.py	9	0	100%
device_scanner.py	94	31	67%	25, 100, 102–104, 106, 114–123, 125, 127, 129, 134–143, 145, 147
handler.py	524	79	84%	125, 140, 151–158, 166–168, 176, 187–192, 197, 222, 249–250, 254, 294, 298–299, 304, 326, 333–334, 338–344, 347, 351–353, 369, 371–372, 429, 450, 460–461, 567–569, 576, 698–701, 704, 715–718, 724, 1055–1059, 1065, 1076–1080, 1086, 1165, 1170, 1212
ipc.py	275	2	99%	365, 441
kv_manager.py	102	0	100%
logging_setup.py	19	0	100%
protocol.py	216	0	100%
resource_manager_installer.py	103	13	87%	80–86, 167, 169–172, 187
rpc_async_client.py	190	0	100%
rpc_async_server.py	111	0	100%
rpc_client.py	66	0	100%
rpc_client_base.py	100	10	90%	183, 219–220, 231–232, 304–305, 309–310, 340
rpc_handler_mixin.py	102	19	81%	153–155, 158–160, 218–221, 226, 230–234, 245–246, 252
rpc_server.py	64	0	100%
serializer.py	81	0	100%
server.py	145	20	86%	44, 54–59, 64–65, 73, 168, 172, 243, 247, 265–266, 284–286, 371
stats_manager.py	95	0	100%
types.py	60	1	98%	145
uds_helpers.py	13	0	100%
memory
__init__.py	5	0	100%
allocator.py	55	0	100%
mapper.py	128	2	98%	229, 296
owned_region_manager.py	101	1	99%	212
types.py	62	0	100%
TOTAL	3081	200	93%

Tests	Skipped	Failures	Errors	Time
660	4 💤	0 ❌	0 🔥	6.295s ⏱️

fix: support latest vllm kv cache layout

56e78c6

youngrok-XCENA changed the title ~~[codex] Support latest vLLM KV cache layout~~ Support latest vLLM KV cache layout Jun 2, 2026

youngrok-XCENA closed this Jun 2, 2026

youngrok-XCENA deleted the codex/vllm-kv-cache-compat branch June 2, 2026 11:14

youngrok-XCENA changed the title ~~Support latest vLLM KV cache layout~~ fix: support latest vLLM KV cache layout Jun 2, 2026

youngrok-XCENA wants to merge 1 commit into main from codex/vllm-kv-cache-compat
