Kubernetes + Data + AI = Cubed AI
Open-source blueprints for running high-performance data and AI workloads on Kubernetes.
Kube-dAI is where I experiment with emerging tech, build benchmark tools, and create production-ready patterns for data and AI on Kubernetes. Think of it as a lab for scalable data infrastructure—mostly on AWS, always evolving.
What you'll find here:
- Benchmark tools for Spark, GPU acceleration, and distributed compute
- Infrastructure patterns using Terraform, Helm, and GitOps
- Performance analysis for real-world workloads (TPC-DS, RAPIDS, shuffle services)
- Operator utilities like Spark History Server integrations
- Agentic AI tools for data platforms—troubleshooting agents, upgrade agents, and autonomous optimization for Spark, Kubernetes, and distributed systems
If you're running Apache Spark at scale, training models on Kubernetes, or just curious about what's next in cloud-native data—this is the place.
| Repository | Description | Status |
|---|---|---|
| spark-rapids-on-kubernetes | GPU-accelerated Spark with RAPIDS on EKS | ✅ Live |
| spark-k8s-benchmarks | TPC-DS benchmark suite for Spark on K8s | ✅ Live |
| spark-history-server | Production-grade Helm chart for Spark History Server | ✅ Live |
More coming soon: Celeborn benchmarks, Dynamic Resource Allocation (DRA) experiments, and agent orchestrators.
- Orchestration: Kubernetes (EKS), Karpenter, Argo CD
- Data Processing: Apache Spark, RAPIDS, Velox, Celeborn
- AI/ML: KServe, Ray, Triton Inference Server
- IaC: Terraform, Crossplane, Helm
- Observability: Prometheus, Fluent Bit, Spark UI
Explore the projects above—each repository has detailed setup instructions, architecture diagrams, and deployment guides.
Got ideas? Found a bug? Want to add a new benchmark?
- Fork the repo
- Create a feature branch
- Submit a PR
No bureaucracy—just useful contributions. Check individual repos for specific guidelines.
- 📝 Blog posts on Medium
- 💬 Open an issue for questions or advanced use cases
Apache 2.0 — use it, modify it, ship it.
Disclaimer: This is an independent project, not affiliated with AWS, the Apache Software Foundation, or NVIDIA. All trademarks belong to their respective owners.