Open-source multi-cluster AI inference platform. Define functions once, deploy anywhere with KV-cache-aware smart routing, multi-tenancy, and autoscaling. Built on SkyPilot, vLLM, and KAI Scheduler.
kubernetes open-source gpu multi-tenancy autoscaling inference-server mlops smart-routing multi-cluster kv-cache ai-inference llm vllm skypilot kai-scheduler
Updated Mar 19, 2026 - Go