fix: support latest vLLM KV cache layout #54
Open
youngrok-XCENA wants to merge 1 commit into
🤔 Background & Motivation (Why)
- `ForwardContext.virtual_engine` on the load path.
- `layer.kv_cache` is indexed by `forward_context.virtual_engine`, which crashes when a repeated prompt hits Maru and `start_load_kv()` runs.
🏗️ Design Changes
- `MaruWorkerConnector.start_load_kv()` now supports both older list/tuple KV cache containers and newer direct tensor-style `layer.kv_cache` values.
📝 Implementation Details
- If `layer.kv_cache` is a list or tuple, keep the previous virtual-engine indexing behavior.
- Use `getattr(forward_context, "virtual_engine", 0)` so older layouts without the field still fall back to virtual engine 0.
- If `layer.kv_cache` is already a tensor, use it directly for KV injection.
✅ Tests
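The container handling listed under Implementation Details can be exercised in isolation, without vLLM, PyTorch, or `maru_shm` installed. The sketch below is illustrative only: `select_kv_cache` and `FakeTensor` are hypothetical names, not code from `maru_vllm/connector.py`, and `SimpleNamespace` objects stand in for the real layer and forward context.

```python
from types import SimpleNamespace


class FakeTensor:
    """Stand-in for a tensor so the sketch runs without PyTorch (assumption)."""

    def __init__(self, name):
        self.name = name


def select_kv_cache(layer, forward_context):
    """Hypothetical helper: pick the KV cache object to inject into."""
    kv_cache = layer.kv_cache
    if isinstance(kv_cache, (list, tuple)):
        # Container layout: one entry per virtual engine.
        # Fall back to index 0 when the context has no virtual_engine field.
        virtual_engine = getattr(forward_context, "virtual_engine", 0)
        return kv_cache[virtual_engine]
    # Direct layout: kv_cache is already the tensor itself.
    return kv_cache


# Container layout, context carries virtual_engine.
list_layer = SimpleNamespace(kv_cache=[FakeTensor("ve0"), FakeTensor("ve1")])
picked_indexed = select_kv_cache(list_layer, SimpleNamespace(virtual_engine=1))

# Container layout, context lacks virtual_engine: falls back to entry 0.
picked_fallback = select_kv_cache(list_layer, SimpleNamespace())

# Direct tensor layout: returned unchanged, virtual_engine is never read.
direct = FakeTensor("direct")
picked_direct = select_kv_cache(SimpleNamespace(kv_cache=direct), SimpleNamespace())
```

Checking for the container types first means a direct tensor is never indexed, so the load path does not depend on `virtual_engine` being present in that case.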
Manual validation:
- `python -m py_compile maru_vllm/connector.py`
Note:
- `pytest -q maru-private/tests/unit/test_vllm_connector.py` was not runnable in `~/.venv_vllm_latest` because the local test conftest requires `maru_shm`, which is not installed in that environment.
🔗 Related Issues (optional)
🌿 Related PRs (optional)
🌿 Related Branches (optional)
📦 Release Note (for auto-generation / write in English)
NEW
CHANGED
FIXED
- `ForwardContext.virtual_engine`.
IMPORTANT NOTES