@huwenjie333
Collaborator

This PR adds the vLLM cache directory, /root/.cache/vllm, to the deployment's persistent storage. This reduces cold start time from 1.5 minutes to 1 minute.
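
For readers who want to try something similar, here is a minimal sketch of one way to keep a cache directory on persistent storage: make the cache path a symlink to a directory on a mounted volume. This is an illustration only, not necessarily how this PR does it (the deployment may simply mount the volume at the cache path). The helper name `link_cache_to_volume` and the volume path in the usage note are hypothetical.

```python
import os
from pathlib import Path


def link_cache_to_volume(cache_dir: str, volume_dir: str) -> Path:
    """Make cache_dir a symlink to a directory on persistent storage.

    Hypothetical helper for illustration; both paths are supplied by the caller.
    """
    cache = Path(cache_dir)
    target = Path(volume_dir)

    # Create the directory on the persistent volume if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)

    if cache.is_symlink():
        # Already linked to the right place: nothing to do.
        if cache.resolve() == target.resolve():
            return cache
        # Linked somewhere else: drop the stale link and relink below.
        cache.unlink()
    elif cache.exists():
        # Leave an existing real directory alone rather than delete its data.
        raise FileExistsError(f"{cache} already exists and is not a symlink")

    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.symlink_to(target, target_is_directory=True)
    return cache
```

It would be called once at container startup, before the engine is loaded, for example `link_cache_to_volume("/root/.cache/vllm", "/persistent/vllm-cache")`, where `/persistent` stands in for wherever the platform mounts its volume (an assumption).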

[Image: Screenshot 2026-01-09 at 2 16 52 PM]

@huwenjie333 requested a review from jqug January 9, 2026 11:26
@huwenjie333 assigned PatrickCmd and unassigned PatrickCmd Jan 9, 2026
@huwenjie333 requested a review from PatrickCmd January 9, 2026 11:26

3 participants