fix: support latest vLLM KV cache layout#53

Closed
youngrok-XCENA wants to merge 1 commit into main from codex/vllm-kv-cache-compat

youngrok-XCENA (Collaborator) commented Jun 2, 2026

🤔 Background & Motivation (Why)

  • Latest vLLM versions no longer expose ForwardContext.virtual_engine on the load path.
  • The Maru vLLM connector currently assumes layer.kv_cache is indexed by forward_context.virtual_engine, so it crashes when a repeated prompt hits the Maru cache and start_load_kv() runs.
  • Reproduced with vLLM 0.22.0, after working around an unrelated FlashInfer sampler startup issue by setting VLLM_USE_FLASHINFER_SAMPLER=0.

🏗️ Design Changes

  • Behavioral change: MaruWorkerConnector.start_load_kv() now supports both the older layout, where layer.kv_cache is a list/tuple of per-virtual-engine caches, and the newer layout, where layer.kv_cache is a tensor.
  • No public API change.

📝 Implementation Details

  • If layer.kv_cache is a list or tuple, keep the previous virtual-engine indexing behavior.
  • Use getattr(forward_context, "virtual_engine", 0) so list/tuple layouts still fall back to virtual engine 0 when ForwardContext has no virtual_engine field.
  • If layer.kv_cache is already a tensor, use it directly for KV injection.
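The branching described above can be sketched as a small pure function. This is a minimal sketch, not the code in maru_vllm/connector.py: resolve_layer_kv_cache is a hypothetical name, SimpleNamespace objects stand in for vLLM's layer and ForwardContext objects, and strings stand in for KV cache tensors.

```python
from types import SimpleNamespace
from typing import Any


def resolve_layer_kv_cache(layer: Any, forward_context: Any) -> Any:
    """Return the KV cache object that KV injection should write into.

    Hypothetical helper illustrating the dispatch described in this PR;
    in the connector this logic sits inside start_load_kv().
    """
    kv_cache = layer.kv_cache
    if isinstance(kv_cache, (list, tuple)):
        # Older layout: one KV cache per virtual engine. If ForwardContext
        # has no virtual_engine attribute, fall back to engine 0.
        virtual_engine = getattr(forward_context, "virtual_engine", 0)
        return kv_cache[virtual_engine]
    # Newer layout: layer.kv_cache is already the tensor to inject into.
    return kv_cache


# Stand-ins for vLLM objects (assumptions, not real vLLM types).
old_layer = SimpleNamespace(kv_cache=["ve0-cache", "ve1-cache"])
new_layer = SimpleNamespace(kv_cache="tensor-cache")

print(resolve_layer_kv_cache(old_layer, SimpleNamespace(virtual_engine=1)))  # ve1-cache
print(resolve_layer_kv_cache(old_layer, SimpleNamespace()))  # ve0-cache
print(resolve_layer_kv_cache(new_layer, SimpleNamespace()))  # tensor-cache
```

Keeping the dispatch in a pure function like this would also let it be exercised with stub objects, without importing maru_shm.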

✅ Tests

  • Unit tests
  • Integration tests
  • Manual tests
  • No tests needed (reason: )

Manual validation:

  • python -m py_compile maru_vllm/connector.py

Note: pytest -q maru-private/tests/unit/test_vllm_connector.py was not runnable in ~/.venv_vllm_latest because the local test conftest requires maru_shm, which is not installed in that environment.

🔗 Related Issues (optional)

🌿 Related PRs (optional)

🌿 Related Branches (optional)

📦 Release Note (for auto-generation / write in English)

NEW

CHANGED

FIXED

  • Maru vLLM connector now handles latest vLLM KV cache layouts that do not expose ForwardContext.virtual_engine.

IMPORTANT NOTES

github-actions Bot commented Jun 2, 2026

youngrok-XCENA changed the title from "[codex] Support latest vLLM KV cache layout" to "Support latest vLLM KV cache layout" on Jun 2, 2026
youngrok-XCENA deleted the codex/vllm-kv-cache-compat branch on June 2, 2026, 11:14
youngrok-XCENA changed the title from "Support latest vLLM KV cache layout" to "fix: support latest vLLM KV cache layout" on Jun 2, 2026