Commit 2267fe2

Setup aiwb/airm local env for development
1 parent b0614c9 commit 2267fe2

5 files changed

Lines changed: 1867 additions & 0 deletions

docs/local-dev-setup.md

Lines changed: 330 additions & 0 deletions
@@ -0,0 +1,330 @@
# Local Development Setup Guide (`setup-local-dev.sh`)

This guide explains the **idempotent** local development setup script that deploys the full AIRM + AIWB stack on a Kind cluster, building images from your local `~/core` repository.

## How It Differs from `bootstrap-kind-cluster.sh`

| | `bootstrap-kind-cluster.sh` | `setup-local-dev.sh` |
|---|---|---|
| **Idempotent** | No, regenerates secrets on re-run | Yes, skips healthy components |
| **Local images** | Optional flag | Builds from `~/core` by default |
| **AIRM + AIWB** | Deploys published images | Builds and deploys from local source |
| **Keycloak** | Base realm import only | Configures dev users, redirect URIs, syncs secrets |
| **Access** | Port-forward only | NodePort services on fixed localhost ports |
| **Re-run safe** | Requires cluster deletion | Safe to re-run anytime (including after laptop restart) |

## Prerequisites

- **Docker** with at least 16 GB memory allocated
- **Kind**: `kind` CLI installed
- **kubectl**, **helm** (v3+), **yq**, **openssl**
- **Core repository** cloned at `~/core` (or set `LLM_STUDIO_CORE_PATH`)
- **GHCR access**: either `docker login ghcr.io` or set the `GHCR_TOKEN` env var
- **`dev_helpers/`**: must be in your home directory (`~/dev_helpers/`) with AMD CA certificates:

```
~/dev_helpers/
└── certs/
    ├── AMD_ROOT.crt
    ├── AMD_ISSUER.crt
    └── AMD_COMBINED.crt
```

These certificates are injected into the Kind node so containerd can pull images from registries behind the corporate TLS-intercepting proxy. Without them, image pulls will fail with `x509: certificate signed by unknown authority`. The script also checks `scripts/certs/` if you prefer to keep them in the repo.

## Quick Start

```bash
# 1. Create the Kind cluster (only needed once)
kind create cluster --name cluster-forge-local --config scripts/utils/kind-cluster-config.yaml

# 2. Run the setup
./scripts/setup-local-dev.sh
```

The script takes about 5 to 10 minutes on the first run. Re-runs skip healthy components and finish much faster.

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_STUDIO_CORE_PATH` | `~/core` | Path to the core repository |
| `GHCR_TOKEN` | *(none)* | GitHub token with `read:packages` scope |
| `GHCR_USERNAME` | `git config user.name` | GitHub username for GHCR auth |
| `SKIP_LOCAL_BUILD=1` | *(off)* | Skip building local AIRM/AIWB images |
| `SKIP_IMAGE_PRELOAD=1` | *(off)* | Skip pre-loading container images into Kind |
| `FORCE_REDEPLOY=1` | *(off)* | Ignore readiness checks, redeploy everything |

### Examples

```bash
# Default: builds from ~/core, sets up everything
./scripts/setup-local-dev.sh

# Custom core repo path
LLM_STUDIO_CORE_PATH=/home/me/projects/core ./scripts/setup-local-dev.sh

# Skip image builds (use published images via GHCR)
SKIP_LOCAL_BUILD=1 ./scripts/setup-local-dev.sh

# Force full redeploy after config changes
FORCE_REDEPLOY=1 ./scripts/setup-local-dev.sh
```

## What the Script Does (Step by Step)

### 1. Cluster Check & Certificates

Verifies a Kind cluster is running and applies corporate CA certificates if `scripts/fix_kind_certs.sh` exists. Checks that all prerequisites (`kubectl`, `helm`, `yq`, `openssl`) are installed.

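The prerequisite check in this step can be approximated as follows. This is a minimal sketch rather than the script's actual code; the helper name `missing_tools` is hypothetical.

```bash
# Hypothetical helper: print each required CLI that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example use (tool list taken from the step above):
#   missing=$(missing_tools kubectl helm yq openssl)
#   [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```

Collecting all missing tools before exiting lets a developer install everything in one pass instead of fixing one error per run.
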
### 2. Namespaces

Creates all required namespaces idempotently:
`argocd`, `cf-gitea`, `cf-openbao`, `airm`, `aim-system`, `keycloak`, `aiwb`

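A common way to create namespaces idempotently is to render the manifest client-side and pipe it to `kubectl apply`. The sketch below assumes that idiom; the script may use a different one, and `ensure_namespace` is a hypothetical name.

```bash
# Hypothetical helper: create a namespace, or leave it unchanged if it exists.
# `create --dry-run=client -o yaml` only renders the manifest;
# `apply` then creates the namespace or reports it as unchanged.
ensure_namespace() {
  kubectl create namespace "$1" --dry-run=client -o yaml | kubectl apply -f -
}

# Example use with the namespaces listed above:
#   for ns in argocd cf-gitea cf-openbao airm aim-system keycloak aiwb; do
#     ensure_namespace "$ns"
#   done
```

Unlike a bare `kubectl create namespace`, this does not fail with `AlreadyExists` on re-runs.
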
### 3. GHCR Pull Secrets

Resolves GHCR access by trying three options in order:
1. `GHCR_TOKEN` environment variable
2. Existing `docker login ghcr.io` credentials in `~/.docker/config.json`
3. If neither is available, falls back to building all images locally

Creates `ghcr-pull-secret` in the `airm`, `aim-system`, `keycloak`, and `aiwb` namespaces and patches the `default` ServiceAccount in each.

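The detection order above can be sketched as a small function. This is an illustration under assumptions: the name `detect_ghcr_auth`, its output strings, and the simple `grep` for a `ghcr.io` entry are not taken from the script.

```bash
# Hypothetical helper: report which GHCR auth path applies.
# Prints "token", "docker-config", or "local-build".
detect_ghcr_auth() {
  local docker_config=${1:-$HOME/.docker/config.json}
  if [ -n "${GHCR_TOKEN:-}" ]; then
    echo "token"
  elif [ -f "$docker_config" ] && grep -q '"ghcr.io"' "$docker_config"; then
    # An entry for ghcr.io exists, so `docker login ghcr.io` was run earlier.
    echo "docker-config"
  else
    echo "local-build"
  fi
}

# Example use:
#   case "$(detect_ghcr_auth)" in
#     local-build) echo "No GHCR credentials, building images locally" ;;
#   esac
```
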
### 4. Storage Class

Creates a `default` StorageClass using Kind's `local-path` provisioner so PVCs bind correctly.

### 5. ArgoCD

Deploys ArgoCD from vendored Helm charts. Skipped if already running. Waits for the application controller, Redis, and repo server to be ready.

### 6. OpenBao (Secrets Management)

Deploys OpenBao and runs the initialization job (creates unseal keys, root token, seeds secrets). On re-run:
- If already running but **sealed** (e.g. after laptop restart), **automatically unseals** it
- Triggers an ExternalSecrets refresh so all secrets sync immediately

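One way to detect the sealed state is to inspect the JSON status output. The sketch below assumes `bao status -format=json` is available in the container and reports a `sealed` field, as in the Vault CLI that OpenBao is forked from; `is_sealed` is a hypothetical helper, not a function from the script.

```bash
# Hypothetical helper: succeed if the status JSON on stdin reports "sealed": true.
is_sealed() {
  grep -Eq '"sealed"[[:space:]]*:[[:space:]]*true'
}

# Example use (pod name taken from the Troubleshooting section):
#   if kubectl exec openbao-0 -n cf-openbao -- bao status -format=json | is_sealed; then
#     echo "OpenBao is sealed, unsealing"
#   fi
```
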
### 7. Pre-load Container Images

Pulls common images on the Docker host and loads them into Kind to avoid Docker Hub rate limits and GHCR auth issues inside the cluster:

| Image | Purpose |
|-------|---------|
| `ghcr.io/silogen/keycloak-init:0.1` | Keycloak realm initialization |
| `quay.io/keycloak/keycloak:26.0.0` | Keycloak server |
| `busybox:1.37.0` | Init containers (readiness checks) |
| `postgres:17-alpine` | CNPG init containers |
| `docker.io/liquibase/liquibase:4.31` | Database migrations |
| `rabbitmq:4.1.1-management` | AIRM message broker |

Skips images already present in Kind. Disable with `SKIP_IMAGE_PRELOAD=1`.

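A detail worth knowing when checking whether an image is already present: containerd stores short Docker Hub names in fully qualified form (for example `busybox:1.37.0` becomes `docker.io/library/busybox:1.37.0`). The sketch below normalizes names before comparison. How the script performs its own check is not shown in this guide, so treat `normalize_image` as a hypothetical helper.

```bash
# Hypothetical helper: expand a short image name to its fully qualified form.
normalize_image() {
  local image=$1 first
  case "$image" in
    */*)
      first=${image%%/*}
      case "$first" in
        # First path component looks like a registry host: keep as-is.
        *.*|*:*|localhost) printf '%s\n' "$image" ;;
        # Docker Hub user or organization repository.
        *) printf 'docker.io/%s\n' "$image" ;;
      esac
      ;;
    # Docker Hub official image.
    *) printf 'docker.io/library/%s\n' "$image" ;;
  esac
}

# Possible presence check inside the Kind node (node name follows Kind's
# "<cluster>-control-plane" convention):
#   docker exec cluster-forge-local-control-plane \
#     crictl inspecti "$(normalize_image busybox:1.37.0)" >/dev/null 2>&1
```
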
### 8. Build Local Images

Calls `scripts/build-local-images.sh` to build AIRM and AIWB images from the core repo and load them into Kind with the `:local` tag. Handles corporate proxy CA certificates automatically.

Triggered when:
- Core repo is found at `LLM_STUDIO_CORE_PATH` (default `~/core`), OR
- No GHCR credentials are available (local build is the only path)

Skipped with `SKIP_LOCAL_BUILD=1`.

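The trigger rules above amount to a small decision function. The following is a sketch of that logic only; `should_build_local` and its `yes`/`no` credential argument are hypothetical and not taken from the script.

```bash
# Hypothetical helper: succeed (exit 0) if local images should be built.
#   $1: path to the core repository
#   $2: "yes" if GHCR credentials were found, anything else otherwise
should_build_local() {
  local core_path=$1 have_ghcr=$2
  if [ "${SKIP_LOCAL_BUILD:-0}" = "1" ]; then
    return 1                   # explicit opt-out wins
  fi
  if [ -d "$core_path" ]; then
    return 0                   # core repo is present
  fi
  [ "$have_ghcr" != "yes" ]    # no credentials: local build is the only path
}

# Example use:
#   if should_build_local "${LLM_STUDIO_CORE_PATH:-$HOME/core}" yes; then
#     ./scripts/build-local-images.sh
#   fi
```
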
### 9. Gitea (Internal Git)

Deploys Gitea and runs initialization jobs. Generates admin credentials on first run, preserves them on re-run.

### 10. Push Repositories to Gitea

Pushes `cluster-forge` and `core` repositories into the internal Gitea instance. ArgoCD watches these repos for application definitions.

### 11. Deploy ArgoCD Applications

Renders the root Helm template with `values_local_kind.yaml` and applies all ArgoCD Application resources. This triggers the full deployment of all enabled components through ArgoCD's GitOps sync.

### 12. NodePort Services

Creates NodePort services that map to fixed host ports via Kind's `extraPortMappings`:

| Service | NodePort | Host Port | URL |
|---------|----------|-----------|-----|
| AIRM UI | 30080 | 8000 | http://localhost:8000 |
| AIWB UI | 30081 | 8001 | http://localhost:8001 |
| Keycloak | 30082 | 8080 | http://localhost:8080 |
| AIRM API | 30083 | 8083 | http://localhost:8083 |
| AIWB API | 30084 | 8084 | http://localhost:8084 |

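To illustrate the first row of the table, the sketch below renders a NodePort Service manifest. The Service name suffix, the `app: <name>` selector, and the container port are assumptions made for illustration; the actual manifests live in the script. The host port comes from the Kind cluster config, which maps node port 30080 to host port 8000.

```bash
# Hypothetical helper: print a NodePort Service manifest.
#   $1: workload name  $2: namespace  $3: service/container port  $4: node port
render_nodeport_service() {
  local name=$1 namespace=$2 port=$3 node_port=$4
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}-nodeport
  namespace: ${namespace}
spec:
  type: NodePort
  selector:
    app: ${name}   # assumed label, check the real Deployment labels
  ports:
    - port: ${port}
      targetPort: ${port}
      nodePort: ${node_port}
EOF
}

# Example use:
#   render_nodeport_service airm-ui airm 8000 30080 | kubectl apply -f -
```

Because `kubectl apply` is declarative, re-running this against an existing Service leaves it unchanged.
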
### 13. Bootstrap AIRM Agent

Creates RabbitMQ vhosts, users, and permissions needed by the AIRM agent. Seeds the `airm-rabbitmq-common-vhost-user` secret. Waits for RabbitMQ to be ready first.

### 14. Configure Keycloak

Configures Keycloak for local development via its Admin API:
- Creates an `admin`/`admin` user in the master realm (for Keycloak console access)
- Creates or updates `devuser@amd.com` / `password` in the `airm` realm
- Assigns the `Platform Administrator` role to devuser
- Adds `http://localhost:*` redirect URIs to OIDC clients (for Swagger, UI login)
- Syncs the Keycloak client secret to `airm` and `aiwb` namespaces (so UI auth works)

### 15. Patch NEXTAUTH_URL

Patches the `NEXTAUTH_URL` environment variable on `airm-ui` and `aiwb-ui` deployments to use `http://localhost:8000` and `http://localhost:8001` respectively. Disables ArgoCD `selfHeal` on those apps so the patches persist.

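The two actions in this step could be done manually with standard `kubectl` commands, sketched below. The function names are hypothetical, and the namespace of each deployment and the ArgoCD Application names are assumptions; adjust them to match your cluster.

```bash
# Hypothetical helper: set NEXTAUTH_URL on a UI deployment.
patch_nextauth_url() {
  local deployment=$1 namespace=$2 url=$3
  kubectl set env "deployment/$deployment" -n "$namespace" "NEXTAUTH_URL=$url"
}

# Hypothetical helper: turn off selfHeal so ArgoCD does not revert the patch.
disable_self_heal() {
  local app=$1
  kubectl patch application "$app" -n argocd --type merge \
    -p '{"spec":{"syncPolicy":{"automated":{"selfHeal":false}}}}'
}

# Example use (Application and namespace names are assumed):
#   disable_self_heal airm-ui
#   patch_nextauth_url airm-ui airm http://localhost:8000
```

Note that the merge patch adds an `automated` block if the Application has none, which would enable automated sync, so apply it only to apps that already sync automatically.
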
### 16. AIRM Demo Onboarding

Registers a local cluster with the AIRM API using `devuser@amd.com` credentials. Updates the agent secret with API-issued credentials so heartbeats are tracked.

## Port Map

```
Host Port  →  Service
─────────────────────────────────────
8000       →  AIRM UI
8001       →  AIWB UI
8080       →  Keycloak
8083       →  AIRM API (Swagger: /docs)
8084       →  AIWB API (Swagger: /docs)
5432       →  PostgreSQL (direct access)
9090*      →  ArgoCD (requires port-forward)
3000*      →  Gitea (requires port-forward)
8200*      →  OpenBao (requires port-forward)

* These services still require manual port-forward
```

## Accessing Services

### Direct Access (NodePort, no port-forward needed)

| Service | URL | Credentials |
|---------|-----|-------------|
| AIRM UI | http://localhost:8000 | `devuser@amd.com` / `password` |
| AIWB UI | http://localhost:8001 | `devuser@amd.com` / `password` |
| AIRM Swagger | http://localhost:8083/docs | OAuth via Keycloak |
| AIWB Swagger | http://localhost:8084/docs | OAuth via Keycloak |
| Keycloak Admin | http://localhost:8080/admin | `admin` / `admin` |

### Port-Forward Required

```bash
# ArgoCD
kubectl port-forward svc/argocd-server -n argocd 9090:443
# Open: https://localhost:9090
# Password: kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

# Gitea
kubectl port-forward svc/gitea-http -n cf-gitea 3000:3000
# Open: http://localhost:3000

# OpenBao
kubectl port-forward svc/openbao-active -n cf-openbao 8200:8200
# Open: http://localhost:8200
# Token: kubectl -n cf-openbao get secret openbao-keys -o jsonpath='{.data.root_token}' | base64 -d
```

## Idempotency Behavior

The script is safe to re-run at any time. Here's what happens on each re-run:

| Component | Already Healthy | Unhealthy / Missing |
|-----------|----------------|---------------------|
| Namespaces | No-op | Created |
| GHCR secrets | Overwritten (safe) | Created |
| ArgoCD | Skipped | Deployed + waited |
| OpenBao | Skipped (unsealed if needed) | Deployed + initialized |
| Image pre-load | Skipped per image | Pulled + loaded |
| Local image build | Rebuilt (always) | Built |
| Gitea | Skipped | Deployed + initialized |
| Git push | Fast push (no-op if up to date) | Full push |
| ArgoCD apps | Re-applied (idempotent) | Created |
| NodePort services | Re-applied (idempotent) | Created |
| AIRM agent bootstrap | Skipped if secret exists | Created |
| Keycloak config | Re-applied (idempotent) | Configured |
| NEXTAUTH_URL patch | Skipped if correct | Patched |
| AIRM onboarding | Skipped if cluster registered | Registered |

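The "skip if healthy, otherwise deploy" pattern in the table, together with the `FORCE_REDEPLOY=1` override, can be expressed as a small wrapper. This is a sketch of the general pattern, not the script's implementation; `ensure_component` and the function names in the usage comment are hypothetical.

```bash
# Hypothetical wrapper: run the deploy command only if the health check fails
# or FORCE_REDEPLOY=1 is set.
#   $1: component name  $2: health-check command  $3: deploy command
ensure_component() {
  local name=$1 check=$2 deploy=$3
  if [ "${FORCE_REDEPLOY:-0}" != "1" ] && "$check"; then
    echo "skip: $name"
    return 0
  fi
  echo "deploy: $name"
  "$deploy"
}

# Example use (both functions are placeholders you would define):
#   argocd_healthy() { kubectl -n argocd rollout status deploy/argocd-server --timeout=5s; }
#   deploy_argocd()  { echo "helm install ..."; }
#   ensure_component ArgoCD argocd_healthy deploy_argocd
```

Keeping the check and the deploy action as separate commands makes each component's re-run behavior easy to audit against the table.
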
## After Laptop Restart

The Kind cluster persists across Docker restarts. After a reboot:

1. Start Docker
2. Run `./scripts/setup-local-dev.sh`

The script will detect the existing cluster, **unseal OpenBao**, refresh ExternalSecrets, and skip everything else that's already healthy. This typically takes under a minute.

## Resetting the Cluster

```bash
# Full reset
kind delete cluster --name cluster-forge-local
kind create cluster --name cluster-forge-local --config scripts/utils/kind-cluster-config.yaml
./scripts/setup-local-dev.sh
```

## Configuration Files

| File | Purpose |
|------|---------|
| `scripts/utils/kind-cluster-config.yaml` | Kind cluster definition with `extraPortMappings` |
| `root/values_local_kind.yaml` | Helm values for all ArgoCD applications |
| `scripts/setup-local-dev.sh` | Main setup script (described in this doc) |
| `scripts/build-local-images.sh` | Builds AIRM/AIWB Docker images from local source |
| `scripts/fix_kind_certs.sh` | Injects corporate CA certificates into Kind node |

## Troubleshooting

### OpenBao Sealed After Restart

The script handles this automatically. If you need to unseal manually:

```bash
UNSEAL_KEY=$(kubectl get secret openbao-keys -n cf-openbao -o jsonpath='{.data.unseal_key}' | base64 -d)
kubectl exec openbao-0 -n cf-openbao -- bao operator unseal "${UNSEAL_KEY}"
```

### ExternalSecrets Stuck in SecretSyncedError

Usually caused by OpenBao being sealed. After unsealing, trigger a refresh:

```bash
# The setup script does this automatically, but if needed manually.
# Read one "namespace name" pair per line; looping over $(...) would split
# each row into separate words and lose the pairing.
kubectl get externalsecrets -A --no-headers \
  -o custom-columns="NS:.metadata.namespace,NAME:.metadata.name" |
while read -r ns name; do
  kubectl annotate externalsecret "$name" -n "$ns" force-sync="$(date +%s)" --overwrite
done
```

### Docker Hub Rate Limits (ImagePullBackOff)

The script pre-loads common Docker Hub images. If you still hit limits:

```bash
# Login to Docker Hub for higher limits
docker login

# Manually pull and load an image
docker pull <image>
kind load docker-image <image> --name cluster-forge-local
```

### Keycloak Login Redirect Errors

If you see `Invalid parameter: redirect_uri`, re-run the setup script, which adds all `localhost:*` redirect URIs to Keycloak clients.

### Pods Stuck in ImagePullBackOff for GHCR Images

Ensure GHCR credentials are set up:

```bash
# Option A: Docker login
docker login ghcr.io

# Option B: Token
GHCR_TOKEN=ghp_xxx ./scripts/setup-local-dev.sh
```

### NEXTAUTH_URL Errors / DNS_PROBE_STARTED

The script patches `NEXTAUTH_URL` to `http://localhost:8000` (AIRM) and `http://localhost:8001` (AIWB). If ArgoCD reverts this, re-run the script, which disables selfHeal before patching.
