fix: support latest vLLM KV cache layout #54

Open
youngrok-XCENA wants to merge 1 commit into main from fix/vllm-kv-cache-layout

@youngrok-XCENA youngrok-XCENA commented Jun 2, 2026

Collaborator

🤔 Background & Motivation (Why)

  • Latest vLLM versions no longer expose ForwardContext.virtual_engine on the load path.
  • The Maru vLLM connector currently assumes layer.kv_cache is indexed by forward_context.virtual_engine, which crashes when a repeated prompt hits Maru and start_load_kv() runs.

🏗️ Design Changes

  • Behavioral change: MaruWorkerConnector.start_load_kv() now supports both older list/tuple KV cache containers and newer direct tensor-style layer.kv_cache values.
  • No public API change.

📝 Implementation Details

  • If layer.kv_cache is a list or tuple, keep the previous virtual-engine indexing behavior.
  • In that list/tuple path, read the index with getattr(forward_context, "virtual_engine", 0), so a forward context that lacks the field falls back to virtual engine 0.
  • If layer.kv_cache is already a tensor, use it directly for KV injection.
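
The three rules above can be sketched roughly as follows. This is a minimal illustration, not the code in maru_vllm/connector.py: the helper name select_kv_cache and the SimpleNamespace stand-ins for the layer, the forward context, and the cache tensors are hypothetical.

```python
from types import SimpleNamespace
from typing import Any


def select_kv_cache(layer: Any, forward_context: Any) -> Any:
    """Return the KV cache object to inject into for one attention layer.

    Hypothetical helper: the real connector may structure this differently.
    """
    kv_cache = layer.kv_cache
    if isinstance(kv_cache, (list, tuple)):
        # Older container layout: one entry per virtual engine.
        # Fall back to engine 0 when the context has no virtual_engine field.
        virtual_engine = getattr(forward_context, "virtual_engine", 0)
        return kv_cache[virtual_engine]
    # Newer layout: kv_cache is already the tensor-like object, use it directly.
    return kv_cache


# Stand-ins for real vLLM objects and tensors (assumptions for illustration).
old_layer = SimpleNamespace(kv_cache=["engine0_cache", "engine1_cache"])
new_layer = SimpleNamespace(kv_cache=object())
ctx_with_ve = SimpleNamespace(virtual_engine=1)
ctx_without_ve = SimpleNamespace()

selected_old = select_kv_cache(old_layer, ctx_with_ve)
selected_fallback = select_kv_cache(old_layer, ctx_without_ve)
selected_new = select_kv_cache(new_layer, ctx_without_ve)
```

Branching on the container type rather than on the vLLM version keeps the check local to the value actually being indexed, which is presumably why no public API change is needed.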

✅ Tests

  • Unit tests
  • Integration tests
  • Manual tests
  • No tests needed (reason: )

Manual validation:

  • python -m py_compile maru_vllm/connector.py

Note: pytest -q maru-private/tests/unit/test_vllm_connector.py was not runnable in ~/.venv_vllm_latest because the local test conftest requires maru_shm, which is not installed in that environment.

🔗 Related Issues (optional)

🌿 Related PRs (optional)

🌿 Related Branches (optional)

📦 Release Note (for auto-generation / write in English)

NEW

CHANGED

FIXED

  • Maru vLLM connector now handles latest vLLM KV cache layouts that do not expose ForwardContext.virtual_engine.

IMPORTANT NOTES

github-actions Bot commented Jun 2, 2026

@youngrok-XCENA youngrok-XCENA marked this pull request as ready for review June 2, 2026 11:40

@jooho-XCENA jooho-XCENA left a comment

Contributor

LGTM

2 participants