thephimart/private-ai-cloud

Private AI Cloud

Reproducible infrastructure deployment on Ubuntu Server 24.04, starting from a prepared host with SSH access.

Project Goals

Single-command deployment from a configured base host to a fully operational AI hosting platform.

  • Single-command reproducibility — sudo bash scripts/deploy.sh from base system to running cluster
  • Deterministic infrastructure — same input, same output; rebuild > mutate
  • LXD + kubeadm on a single host — VMs as Kubernetes nodes, no external dependencies
  • Self-hosted AI services — Ollama, Qdrant, and Open WebUI for LAN users
  • Documentation-first engineering — ADRs, runbooks, and phase documentation

Getting Started

  1. Prepare your host — Ubuntu Server 24.04 with SSH access (see docs/runbooks/prerequisites.md)
  2. Run preflight checks — sudo bash scripts/preflight.sh
  3. Deploy — sudo bash scripts/deploy.sh
  4. Configure DNS — sudo bash scripts/setup-hosts.sh (see docs/runbooks/prerequisites.md)
  5. Validate — bash tests/smoke/run.sh (see tests/smoke/README.md)
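The steps above can be chained so that a failure in one stops the rest. The following is a minimal sketch, assuming it is run from the repository root; the `run_step` and `bootstrap` functions and the `DRY_RUN` variable are hypothetical helpers for illustration and are not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented steps; not shipped with the repo.

# Print the command for a step, then run it unless DRY_RUN=1 is set.
run_step() {
  local label="$1"; shift
  echo "==> ${label}: $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Run steps 2-5 in order; the && chain stops at the first failing step.
bootstrap() {
  run_step "preflight" sudo bash scripts/preflight.sh &&
  run_step "deploy"    sudo bash scripts/deploy.sh &&
  run_step "dns"       sudo bash scripts/setup-hosts.sh &&
  run_step "validate"  bash tests/smoke/run.sh
}
```

Calling `DRY_RUN=1 bootstrap` prints the four commands without executing them, which is a cheap way to review the sequence before running it with `sudo`.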

Architecture Overview

Bare Metal → Ubuntu Server → LXD → VMs → Kubernetes → Platform Services → AI Services

Detailed documentation: docs/architecture/system-overview.md
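To see which layers of the stack are reachable from a given shell, one option is to check for the CLI of each layer. This is a rough sketch only: it assumes the `lxc` and `kubectl` clients are installed on the machine where it runs, which the README does not state, and `tool_status` and `report_layers` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the layer-to-tool mapping is an assumption.

# Print "ok" if the named CLI is on PATH, otherwise "missing".
tool_status() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "missing"
  fi
}

# One line per layer that has an obvious client tool.
report_layers() {
  printf '%-12s %s\n' "LXD"        "$(tool_status lxc)"
  printf '%-12s %s\n' "Kubernetes" "$(tool_status kubectl)"
}
```

Where both tools are present, `lxc list` shows the VMs and `kubectl get nodes` shows the cluster nodes built on top of them.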

After Deployment

sudo bash scripts/setup-hosts.sh   # configure *.ai.local DNS on infra host
bash tests/smoke/run.sh            # validate the deployment

Smoke tests verify the system as a user would experience it. See tests/smoke/README.md for details.
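Before running the smoke tests, it can help to confirm that a name under `*.ai.local` is actually present in the hosts file. The sketch below assumes `setup-hosts.sh` writes entries to `/etc/hosts` (inferred from the script name, not confirmed here); `has_hosts_entry` is a hypothetical helper and `grafana.ai.local` is an example hostname, not one taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical check; assumes entries live in a hosts-format file.

# Return 0 if NAME appears as a hostname in FILE (default: /etc/hosts).
has_hosts_entry() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v n="$name" '
    /^[[:space:]]*#/ { next }        # skip full-line comments
    {
      for (i = 2; i <= NF; i++) {    # field 1 is the IP address
        if ($i ~ /^#/) break         # stop at an inline comment
        if ($i == n) found = 1
      }
    }
    END { exit found ? 0 : 1 }
  ' "$file"
}
```

For example, `has_hosts_entry grafana.ai.local && echo "entry present"` prints a message only when the entry exists.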

Default credentials:

| Service    | Username                  | Password |
|------------|---------------------------|----------|
| Grafana    | admin                     | admin    |
| Prometheus | (no auth)                 |          |
| Open WebUI | (register on first visit) |          |

Recovery

If a phase fails or you need to reset:

sudo bash scripts/cleanup.sh all      # full reset
sudo bash scripts/cleanup.sh phase-3  # reset through Phase 3, then re-run Phase 3

See docs/runbooks/rebuild-from-scratch.md for details.
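Because cleanup is destructive, a small guard around the target argument can catch typos before anything is removed. This is a sketch under assumptions: the accepted forms (`all` and `phase-N`) are inferred only from the two examples above, the real script may accept others, and `is_valid_cleanup_target` and `safe_cleanup` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical guard; accepted target forms are inferred, not documented.

# Accept "all" or "phase-N" where N is an integer from 1 to 99.
is_valid_cleanup_target() {
  case "$1" in
    all) return 0 ;;
    phase-[1-9]|phase-[1-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Refuse unknown targets (exit status 2) before invoking the cleanup script.
safe_cleanup() {
  local target="$1"
  if ! is_valid_cleanup_target "$target"; then
    echo "unrecognised cleanup target: ${target}" >&2
    return 2
  fi
  sudo bash scripts/cleanup.sh "$target"
}
```

With this in place, `safe_cleanup phase-3` forwards to the cleanup script, while a mistyped target such as `safe_cleanup phase3` is rejected without running anything.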

Project Status

Current: Deployment pipeline complete. Run smoke tests after deployment to validate.

Planned:

  • VM resource right-sizing — assign role-appropriate CPU/memory per node
  • Self-healing infrastructure — golden image snapshots and automated node recovery
  • Horizontal node scaling — launch additional worker nodes from golden images

See docs/development-plan.md for implementation details.

Repository Structure

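| Directory    | Purpose                                                      |
|--------------|--------------------------------------------------------------|
| scripts/     | Deployment scripts (orchestrator, per-phase init, cleanup)   |
| config/      | Configuration (defaults.yaml)                                |
| docs/        | Phase documentation, ADRs, runbooks                          |
| tests/smoke/ | Post-deployment validation tests                             |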

Architectural Decisions

Key design decisions are documented in docs/decisions/:

| ADR  | Topic                                      |
|------|--------------------------------------------|
| 0001 | Project scope and objectives               |
| 0002 | Directory storage + local-path-provisioner |
| 0003 | Single-node first architecture             |
| 0004 | Routed/NAT mode for LXD bridge             |
| 0005 | Deterministic infrastructure deployment    |
| 0006 | VM-based Kubernetes node compilation       |
| 0007 | Kubeadm init strategy                      |
| 0008 | Phase 5/6 service infrastructure           |

Intended Audience

  • Platform engineers
  • DevOps / SRE practitioners
  • Infrastructure architects
  • Homelab enthusiasts
  • Researchers exploring self-hosted AI systems

License

MIT License
