# AI Deployment Manager

**Homebrew for AI infrastructure**: a simple CLI/UI tool that helps companies deploy and manage their own GPU infrastructure (bare metal, cloud, or hybrid) without needing deep DevOps expertise.

## Features
- 🚀 Deployment Automation: Easily deploy GPU clusters on bare metal, cloud, or hybrid infrastructure
- 📊 Workload Scheduling: Intelligent job queue management and resource allocation
- 📈 Resource Monitoring: Real-time GPU utilization, memory, and performance monitoring
- 🔧 AI Framework Integration: Support for PyTorch, TensorFlow, JAX, and more
- ☁️ Multi-Cloud Support: Deploy to AWS, GCP, Azure, or on-premises
- 💰 Cost Tracking: Track and allocate infrastructure costs across teams and projects
## Installation

```bash
git clone https://github.com/dewitt4/ai-deployment-manager.git
cd ai-deployment-manager
make build
make install
```

## Quick Start

```bash
# Initialize configuration
aidm init

# Deploy a GPU cluster
aidm deploy create

# Submit a workload
aidm schedule submit

# Monitor resources
aidm monitor resources

# Check costs
aidm cost report
```

## Configuration

Initialize the AI Deployment Manager configuration:

```bash
aidm init
```

This creates a configuration file at `~/.aidm/config.yaml` with default settings.
## Usage

### Deployments

```bash
# Create a new GPU cluster deployment
aidm deploy create

# List all deployments
aidm deploy list

# Check deployment status
aidm deploy status

# Delete a deployment
aidm deploy delete
```

### Workload Scheduling

```bash
# Submit a job to the queue
aidm schedule submit

# List all jobs
aidm schedule list

# Cancel a job
aidm schedule cancel <job-id>

# Check queue status
aidm schedule queue
```

### Monitoring

```bash
# View resource utilization
aidm monitor resources

# Check GPU status
aidm monitor gpu

# Run optimization
aidm monitor optimize
```

### Cost Management

```bash
# Generate cost report
aidm cost report

# Update cost tracking
aidm cost track

# View cost allocations
aidm cost allocate
```

## Project Structure

```
ai-deployment-manager/
├── cmd/
│   └── aidm/            # CLI entry point
├── pkg/
│   ├── deployment/      # GPU cluster deployment automation
│   ├── scheduler/       # Workload scheduling and queue management
│   ├── monitor/         # Resource monitoring and optimization
│   ├── integration/     # AI framework integrations
│   ├── cloud/           # Multi-cloud provider support
│   └── cost/            # Cost tracking and allocation
└── internal/
    ├── config/          # Configuration management
    └── utils/           # Utility functions
```
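For illustration, the kind of logic `pkg/cost` might contain can be sketched as a proportional split of spend by GPU-hours. This is an assumption about the approach, not the package's actual API; `allocate` and its signature are hypothetical:

```go
package main

import "fmt"

// allocate splits a total cost across teams in proportion to their
// GPU-hours. Hypothetical sketch; pkg/cost's real API is not shown here.
func allocate(total float64, gpuHours map[string]float64) map[string]float64 {
	var sum float64
	for _, h := range gpuHours {
		sum += h
	}
	out := make(map[string]float64, len(gpuHours))
	if sum == 0 {
		return out // nothing ran, nothing to allocate
	}
	for team, h := range gpuHours {
		out[team] = total * h / sum
	}
	return out
}

func main() {
	// $1000 of spend, research used 3x the GPU-hours of platform.
	costs := allocate(1000, map[string]float64{"research": 300, "platform": 100})
	fmt.Println(costs["research"], costs["platform"]) // 750 250
}
```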
## Supported Platforms

### Cloud Providers

- Amazon Web Services (AWS)
- Google Cloud Platform (GCP)
- Microsoft Azure

### AI Frameworks

- PyTorch
- TensorFlow
- JAX

### Hardware

- NVIDIA GPUs (A100, V100, T4, etc.)
- Support for CUDA-enabled workloads
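GPU monitoring on NVIDIA hardware commonly builds on `nvidia-smi`'s CSV query output. The query flags below are standard `nvidia-smi` options; the parsing code is an illustrative sketch, not aidm's implementation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gpuSample holds one GPU's stats from an nvidia-smi query.
type gpuSample struct {
	index       int
	utilization int // percent
	memoryMiB   int
}

// parseSMI parses the CSV produced by:
//   nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
func parseSMI(out string) ([]gpuSample, error) {
	var samples []gpuSample
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		fields := strings.Split(line, ",")
		if len(fields) != 3 {
			return nil, fmt.Errorf("unexpected line: %q", line)
		}
		nums := make([]int, 3)
		for i, f := range fields {
			n, err := strconv.Atoi(strings.TrimSpace(f))
			if err != nil {
				return nil, err
			}
			nums[i] = n
		}
		samples = append(samples, gpuSample{index: nums[0], utilization: nums[1], memoryMiB: nums[2]})
	}
	return samples, nil
}

func main() {
	// Sample output from a two-GPU machine (hardcoded for illustration).
	out := "0, 87, 34500\n1, 12, 2048"
	samples, err := parseSMI(out)
	if err != nil {
		panic(err)
	}
	for _, s := range samples {
		fmt.Printf("GPU %d: %d%% util, %d MiB used\n", s.index, s.utilization, s.memoryMiB)
	}
}
```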
Example `~/.aidm/config.yaml`:

```yaml
provider: local
gpu_type: nvidia
framework: pytorch
cloud:
  aws:
    region: us-west-2
  gcp:
    project: ""
  azure:
    subscription: ""
deployment:
  cluster_size: 1
  gpu_count: 1
monitoring:
  enabled: true
  interval: 60s
cost:
  tracking_enabled: true
  currency: USD
```

## Development

```bash
# Build the binary
make build

# Run tests
make test

# Format code
make fmt
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## License

This project is licensed under the Apache 2.0 License; see the LICENSE file for details.

## Support

For issues, questions, or contributions, please open an issue on GitHub.