feat(M87): Automatic KV Cache Export from vLLM by hlin99 · Pull Request #187 · xPyD-hub/xPyD-acc

hlin99 · 2026-04-06T08:03:53Z

Summary

Add capture-kv CLI command and capture_kv module for automatic KV cache extraction from vLLM endpoints.

What's included:

CaptureConfig: validated configuration (url, prompt, layers, capture points, TP size)
CaptureResult / LayerCapture: structured result with serialization
reconstruct_tp_shards(): reconstruct full tensors from TP-sharded arrays
save_capture() / load_capture(): .npz round-trip with metadata
capture_kv_mock(): mock capture for testing without a live vLLM instance
CLI capture-kv: --url, --prompt, --output, --layers, --capture-points, --tp-size, --max-tokens, --mock, --json
26 tests covering config validation, shard reconstruction, npz I/O, filtering, mock capture, CLI integration
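For readers who have not opened the diff, the kind of validation described for CaptureConfig might look roughly like the sketch below. It is not the code from this PR: the class name `CaptureConfigSketch`, the field names, the defaults, and the specific checks are all assumptions inferred from the bullet list above.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CaptureConfigSketch:
    """Hypothetical stand-in for CaptureConfig; fields and defaults are assumed."""

    url: str
    prompt: str
    layers: tuple[int, ...] = ()  # assumption: empty means "all layers"
    capture_points: tuple[str, ...] = ("key", "value")  # assumed names
    tp_size: int = 1
    max_tokens: int = 1

    def __post_init__(self) -> None:
        # Reject anything that is not a plain http(s) endpoint.
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"url must be an http(s) endpoint, got {self.url!r}")
        if not self.prompt:
            raise ValueError("prompt must be non-empty")
        if any(layer < 0 for layer in self.layers):
            raise ValueError("layer indices must be non-negative")
        if not self.capture_points:
            raise ValueError("at least one capture point is required")
        if self.tp_size < 1:
            raise ValueError("tp_size must be >= 1")
        if self.max_tokens < 1:
            raise ValueError("max_tokens must be >= 1")
```

Validating in `__post_init__` means an invalid configuration fails at construction time, before any request is sent to an endpoint.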

Design decisions:

Mock-first approach: real vLLM integration requires monkey-patching, which is version-specific; the module provides clean interfaces for when that integration is added
Reuses existing .npz format compatible with check-kv command
TP shard reconstruction validates shapes before concatenation
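To illustrate the "validate shapes before concatenation" decision, here is a minimal sketch of shard reconstruction with NumPy. The function name `reconstruct_tp_shards_sketch`, its signature, and the choice of the sharded axis are assumptions; the PR description does not show the real signature of reconstruct_tp_shards().

```python
import numpy as np


def reconstruct_tp_shards_sketch(shards, axis=0):
    """Concatenate per-rank shards along `axis` after checking they are compatible.

    Hypothetical stand-in for reconstruct_tp_shards(); which axis is sharded
    in the real module is not stated in this PR.
    """
    if not shards:
        raise ValueError("no shards given")
    ref = shards[0]
    if not -ref.ndim <= axis < ref.ndim:
        raise ValueError(f"axis {axis} out of range for {ref.ndim}-d shards")
    axis %= ref.ndim  # normalise negative axes so the slicing below is valid

    def other_dims(arr):
        # Shape with the concatenation axis removed.
        return arr.shape[:axis] + arr.shape[axis + 1:]

    for rank, shard in enumerate(shards):
        if shard.ndim != ref.ndim:
            raise ValueError(f"rank {rank}: expected {ref.ndim} dims, got {shard.ndim}")
        if shard.dtype != ref.dtype:
            raise ValueError(f"rank {rank}: dtype {shard.dtype} != {ref.dtype}")
        if other_dims(shard) != other_dims(ref):
            raise ValueError(
                f"rank {rank}: shape {shard.shape} incompatible with {ref.shape}"
            )
    return np.concatenate(shards, axis=axis)


# Example: a (heads, seq, head_dim) tensor split across 2 ranks along the head axis.
full = np.arange(24, dtype=np.float32).reshape(4, 3, 2)
rebuilt = reconstruct_tp_shards_sketch(np.split(full, 2, axis=0), axis=0)
assert np.array_equal(rebuilt, full)
```

Checking explicitly before calling `np.concatenate` lets the error name the offending rank, which NumPy's own shape-mismatch message would not do.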

Closes #186

Add capture_kv module with:
- CaptureConfig, CaptureResult, LayerCapture dataclasses
- vLLM KV cache capture (mock mode for testing)
- TP shard reconstruction via reconstruct_tp_shards()
- save/load .npz with layer/head/position metadata
- capture-kv CLI subcommand with --url, --prompt, --output, --layers, --capture-points, --tp-size, --max-tokens, --mock, --json flags
- 26 tests covering config validation, capture logic, shard reconstruction, npz round-trip, layer filtering, mock capture, CLI integration

Closes #186
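As a rough illustration of an .npz round-trip that carries metadata alongside the arrays, the sketch below stores the metadata as a JSON string inside the archive. The helper names, the array key naming (`layer0_key`), and the JSON encoding are assumptions made for this example; the on-disk layout actually shared with check-kv is not shown in this PR description.

```python
import json
import os
import tempfile

import numpy as np


def save_capture_sketch(path, arrays, metadata):
    """Write arrays plus JSON-encoded metadata into one .npz file (assumed layout)."""
    if "metadata" in arrays:
        raise ValueError("'metadata' is reserved for the metadata entry")
    payload = dict(arrays)
    # A 0-d unicode array avoids object arrays, so no pickling is needed on load.
    payload["metadata"] = np.array(json.dumps(metadata))
    np.savez(path, **payload)


def load_capture_sketch(path):
    """Read back the arrays and metadata written by save_capture_sketch()."""
    with np.load(path, allow_pickle=False) as data:
        metadata = json.loads(data["metadata"].item())
        arrays = {name: data[name] for name in data.files if name != "metadata"}
    return arrays, metadata


# Round-trip a single fake layer through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "capture.npz")
    arrays_in = {"layer0_key": np.ones((2, 3), dtype=np.float16)}
    meta_in = {"layers": [0], "tp_size": 1}
    save_capture_sketch(path, arrays_in, meta_in)
    arrays_out, meta_out = load_capture_sketch(path)

assert meta_out == meta_in
assert np.array_equal(arrays_out["layer0_key"], arrays_in["layer0_key"])
```

Loading with `allow_pickle=False` keeps the reader from executing pickled objects, which is a sensible default for capture files exchanged between machines.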

hlin99-Review-Bot

✅ Approved by hlin99-Review-Bot

Idea Value: Good — KV cache export is a natural complement to the existing check-kv command, and the mock-first approach is pragmatic given vLLM version-specific hooks.

Code Quality:

Clean dataclass design (CaptureConfig, CaptureResult, LayerCapture)
TP shard reconstruction with proper shape validation
npz round-trip with metadata — compatible with existing format
26 tests covering config validation, capture, shard reconstruction, I/O, CLI
CI all green (lint + tests on 3.10/3.11/3.12)
docs/iterations/current.md updated

No issues found. LGTM.

hlin99-Review-BotX

✅ Approved by hlin99-Review-BotX

Idea Value: Good — KV cache export is a natural extension of the existing check-kv workflow. Mock-first approach is pragmatic.

Code Quality:

Clean dataclass design with proper validation
TP shard reconstruction with shape validation before concat
npz round-trip compatible with existing format
26 comprehensive tests
CI all green (lint + tests 3.10/3.11/3.12)
docs/iterations/current.md updated

Second approval — should trigger auto-merge. LGTM.

hlin99-Review-Bot approved these changes Apr 6, 2026

hlin99-Review-BotX approved these changes Apr 6, 2026

hlin99 merged commit 424059a into main Apr 6, 2026
5 checks passed

hlin99 merged 1 commit into main from feat/m87-capture-kv
