lmcache

Here are 5 public repositories matching this topic...

High-Performance KV Cache Storage Engine on CXL Shared Memory for LLM Inference

python zero-copy shared-memory kv-cache cxl llm vllm lmcache

cache llm vllm lmcache

Multimodal LLM inference gateway with KV-cache-aware routing and LMCache offload. OpenAI-compatible, benchmarked on GPUs.

gateway inference prometheus openai multimodal fastapi kv-cache llm llmops vllm ai-infrastructure lmcache

Benchmarking LMCache under simulated RTT

docker benchmark simulation traffic-control vllm lmcache

Provide fast, memory-efficient caching for language models to improve response times and reduce redundant computations.

python docker fast benchmark simulation amd zero-copy cuda inference pytorch speed shared-memory traffic-control rocm kv-cache cxl llm vllm lmcache
