From 41317232a0517b53cf729a7d1a9cc38734adf1f2 Mon Sep 17 00:00:00 2001
From: monadic
Date: Tue, 14 Apr 2026 21:16:31 +0100
Subject: [PATCH 1/3] examples: add live AICR flux demo helper

---
 incubator/global-app-layer/README.md          |   9 +
 .../gpu-eks-h100-training/AI_START_HERE.md    |  16 +
 .../gpu-eks-h100-training/README.md           |  31 +-
 .../gpu-eks-h100-training/contracts.md        |  31 ++
 .../gpu-eks-h100-training/demo-flux-oci.sh    | 391 ++++++++++++++++++
 .../gpu-eks-h100-training/lib.sh              |  70 ++++
 .../gpu-eks-h100-training/prompts.md          |  19 +
 7 files changed, 565 insertions(+), 2 deletions(-)
 create mode 100755 incubator/global-app-layer/gpu-eks-h100-training/demo-flux-oci.sh

diff --git a/incubator/global-app-layer/README.md b/incubator/global-app-layer/README.md
index 3f60a68..1fb6416 100644
--- a/incubator/global-app-layer/README.md
+++ b/incubator/global-app-layer/README.md
@@ -110,6 +110,15 @@ This package can express NVIDIA's [AICR](https://developer.nvidia.com/blog/valid
 
 The GPU example (`gpu-eks-h100-training`) is a **structural proof** using stub images. Swap in real NVIDIA images for functional GPU deployment.
 
+For the fastest proven local Flux OCI lane on the dedicated `demo-flux` kind cluster, use:
+
+```bash
+cd gpu-eks-h100-training
+./demo-flux-oci.sh --cleanup-first --target demo-flux/flux-renderer-worker-fluxoci-kubernetes-yaml-cluster
+```
+
+That helper keeps the short-prefix budget, repeat-run cleanup, GUI URLs, Flux proof, and cluster proof in one place.
+
 For AICR details:
 - [confighub-aicr-value-add.md](./confighub-aicr-value-add.md)
 - [01-nvidia-aicr-fit.md](./01-nvidia-aicr-fit.md)
diff --git a/incubator/global-app-layer/gpu-eks-h100-training/AI_START_HERE.md b/incubator/global-app-layer/gpu-eks-h100-training/AI_START_HERE.md
index 4465fd7..eddde6a 100644
--- a/incubator/global-app-layer/gpu-eks-h100-training/AI_START_HERE.md
+++ b/incubator/global-app-layer/gpu-eks-h100-training/AI_START_HERE.md
@@ -20,6 +20,22 @@ Pause after every stage. Show full output.
 Give GUI links where possible. Do not continue until I say continue.
 ```
 
+## Fastest Proven Live Lane
+
+If the human explicitly wants the known-good local Flux OCI proof path, prefer:
+
+```bash
+./demo-flux-oci.sh --cleanup-first --target demo-flux/flux-renderer-worker-fluxoci-kubernetes-yaml-cluster
+```
+
+What to explain before running it:
+
+- this mutates ConfigHub and the dedicated `demo-flux` kind cluster
+- it auto-picks a safe short prefix because the Flux live path has a Kubernetes label budget
+- it cleans old local Flux bridge objects on repeat runs
+- it waits for Flux `Kustomization Ready=True` before closeout
+- it is still structural proof with stub images, not functional GPU proof
+
 ## What This Example Teaches
 
 This is an NVIDIA AICR-shaped layered recipe: `gpu-operator` + `nvidia-device-plugin` with three deployment variants. After the demo, the human will understand:
diff --git a/incubator/global-app-layer/gpu-eks-h100-training/README.md b/incubator/global-app-layer/gpu-eks-h100-training/README.md
index 22b5230..5615558 100644
--- a/incubator/global-app-layer/gpu-eks-h100-training/README.md
+++ b/incubator/global-app-layer/gpu-eks-h100-training/README.md
@@ -21,7 +21,7 @@ The point is not to recreate all of NVIDIA AICR. The point is to show how Config
 | Delivery Mode | Status | Notes |
 |---------------|--------|-------|
 | **Direct Kubernetes** | Fully working | Worker applies YAML via `kubectl apply`. |
-| **Flux OCI** | Fully working | Explicit Flux deployment variant. Current standard controller path. |
+| **Flux OCI** | Fully working | Explicit Flux deployment variant. Use `./demo-flux-oci.sh` for the proven local lane and safe short prefix handling. |
 | **Argo OCI** | Implemented | Explicit Argo deployment variant. Requires ArgoCD v3.1+. See [`07-argo-oci-spec.md`](../07-argo-oci-spec.md). |
 | **ArgoCDRenderer** | Incompatible | Expects Argo `Application` payloads, not raw manifests. |
 
@@ -83,7 +83,9 @@ In ConfigHub-only mode:
 In live mode:
 - deployment variants bound to compatible targets
 - successful `cub unit apply`
-- live resources or delegated delivery objects visible
+- Flux `OCIRepository` objects visible
+- Flux `Kustomization` objects reaching `Ready=True`
+- live workload resources visible in the cluster
 
 ## AI-Safe Path
 
@@ -217,6 +219,31 @@ cd incubator/global-app-layer/gpu-eks-h100-training
 
 After `./setup.sh`, prefer the printed clickable GUI URLs and `.logs/*.latest.log` files over terminal scrollback alone.
 
+## Fastest Proven Flux OCI Demo
+
+For the known-good local lane on the dedicated `demo-flux` kind cluster:
+
+```bash
+./demo-flux-oci.sh --cleanup-first --target demo-flux/flux-renderer-worker-fluxoci-kubernetes-yaml-cluster
+```
+
+This helper:
+
+- preflights the live target first
+- auto-picks a safe short prefix for the Flux/Kubernetes label budget
+- cleans old local Flux bridge objects on repeat runs
+- materializes and verifies the layered chain
+- approves and applies the Flux deployment units
+- waits for Flux `Kustomization Ready=True`
+- prints ConfigHub GUI URLs plus Flux and cluster proof surfaces
+
+Current local truth for this lane:
+
+- this is structural and delivery proof, not real NVIDIA functional GPU proof
+- repeated local runs should use `--cleanup-first`
+- the proven local Flux lane currently applies the demo workloads into `default`
+- the helper keeps the 63-character Flux label limit visible by choosing a short prefix automatically
+
 ## Upgrade Flow
 
 This example also demonstrates how base image updates propagate through the layered chains without flattening the higher-level recipe choices.
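Reviewer note: the README text above says the helper waits for Flux `Kustomization Ready=True` before closeout. The underlying pattern is a poll-until-deadline loop. The following is a minimal sketch of that pattern only, not the helper's actual code; `wait_until` and its arguments are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the patch): poll a check command until it
# succeeds or a deadline passes.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout_seconds="$1" interval_seconds="$2"
  shift 2
  local start_seconds
  start_seconds="$(date +%s)"

  while true; do
    # Success as soon as the check command exits 0.
    if "$@"; then
      return 0
    fi
    # Give up once the elapsed time reaches the timeout.
    if (( $(date +%s) - start_seconds >= timeout_seconds )); then
      echo "Timed out after ${timeout_seconds}s waiting for: $*" >&2
      return 1
    fi
    sleep "${interval_seconds}"
  done
}
```

A caller would pass its own readiness check as the command, for example a small function that reads the `Ready` condition of a Kustomization; that check is left out here because its exact shape depends on the cluster access the caller already has.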
diff --git a/incubator/global-app-layer/gpu-eks-h100-training/contracts.md b/incubator/global-app-layer/gpu-eks-h100-training/contracts.md
index c6e50ad..aabfb19 100644
--- a/incubator/global-app-layer/gpu-eks-h100-training/contracts.md
+++ b/incubator/global-app-layer/gpu-eks-h100-training/contracts.md
@@ -17,6 +17,8 @@
   - `.spaces | length == 8`
   - `.components | length == 2`
   - `.recipeManifest.unit == "recipe-eks-h100-ubuntu-training-stack"`
+  - `.liveConstraints.fluxPrefixMaxLength == 5`
+  - `.liveConstraints.knownGoodPrefixExample == "nfx05"`
 
 ### `./verify.sh --json`
 
@@ -37,6 +39,35 @@
 
 ## ConfigHub State Contracts
 
+## Live Delivery Contract
+
+### `./demo-flux-oci.sh --cleanup-first --target demo-flux/flux-renderer-worker-fluxoci-kubernetes-yaml-cluster`
+
+- mutates: yes
+- output shape: human-readable live-run transcript with ConfigHub, Flux, and cluster proof blocks
+- proves:
+  - the target passed read-only preflight before apply
+  - the helper chose or enforced a Flux-safe prefix
+  - the layered recipe was materialized and verified
+  - the Flux deployment units were approved and applied
+  - ConfigHub GUI URLs were surfaced for review
+  - Flux `OCIRepository` objects were fetched and Flux `Kustomization` objects reached `Ready=True`
+  - the demo workloads became visible in the local cluster
+- expected anchors:
+  - `Mode: live delivery`
+  - `==> Waiting for Flux Kustomization Ready=True`
+  - `Flux Kustomizations are Ready=True.`
+  - `==> ConfigHub unit status`
+  - `==> Flux controller proof`
+  - `==> Cluster workload proof`
+  - `Completed live Flux OCI demo for prefix `
+
+Known current local truth:
+
+- the proven local Flux lane currently applies the demo workloads into namespace `default`
+- `--cleanup-first` is the repeat-run safe path for the dedicated local `demo-flux` cluster
+- this proves layered recipe plus Flux OCI delivery, not functional NVIDIA GPU runtime
+
 ### `cub space get -recipe-eks-h100-ubuntu-training --json`
 
 - mutates: no
diff --git a/incubator/global-app-layer/gpu-eks-h100-training/demo-flux-oci.sh b/incubator/global-app-layer/gpu-eks-h100-training/demo-flux-oci.sh
new file mode 100755
index 0000000..eb6be12
--- /dev/null
+++ b/incubator/global-app-layer/gpu-eks-h100-training/demo-flux-oci.sh
@@ -0,0 +1,391 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+# shellcheck source=./lib.sh
+source "${SCRIPT_DIR}/lib.sh"
+
+PREFLIGHT_SCRIPT="${SCRIPT_DIR}/../preflight-live.sh"
+DEFAULT_TARGET_REF="demo-flux/flux-renderer-worker-fluxoci-kubernetes-yaml-cluster"
+LOCAL_FLUX_WORKLOAD_NAMESPACE="default"
+
+target_ref="${DEFAULT_TARGET_REF}"
+prefix=""
+cleanup_first=0
+skip_cluster_proof=0
+kubeconfig_path=""
+kube_context=""
+apply_timeout="15m"
+flux_ready_timeout_seconds=120
+GENERATED_KUBECONFIG=""
+CLUSTER_PROOF_REASON=""
+KUBECTL_BASE=()
+
+usage() {
+  cat <<'EOF_USAGE'
+Usage:
+  ./demo-flux-oci.sh [options]
+
+Run the proven NVIDIA AICR-shaped Flux OCI lane end to end:
+- preflight the target
+- pick a safe short prefix for the Flux live naming budget
+- materialize the layered recipe in ConfigHub
+- verify the structure
+- approve and apply the Flux deployment units
+- print ConfigHub GUI URLs plus Flux and cluster proof surfaces
+
+Options:
+  --target               Flux OCI target ref (default: demo-flux local lane)
+  --prefix               Prefix to use instead of auto-picking a safe short one
+  --cleanup-first        Remove existing local example state before running
+  --kubeconfig           Optional kubeconfig for cluster proof on non-local lanes
+  --kube-context         Optional kubectl context name
+  --skip-cluster-proof   Skip kubectl proof even if a cluster is reachable
+  --apply-timeout        Timeout for cub unit apply --wait (default: 15m)
+  --flux-ready-timeout   Seconds to wait for Flux Kustomization Ready=True (default: 120)
+  -h, --help             Show this help
+EOF_USAGE
+}
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --target)
+      target_ref="${2:-}"
+      shift 2
+      ;;
+    --prefix)
+      prefix="${2:-}"
+      shift 2
+      ;;
+    --cleanup-first)
+      cleanup_first=1
+      shift
+      ;;
+    --kubeconfig)
+      kubeconfig_path="${2:-}"
+      shift 2
+      ;;
+    --kube-context)
+      kube_context="${2:-}"
+      shift 2
+      ;;
+    --skip-cluster-proof)
+      skip_cluster_proof=1
+      shift
+      ;;
+    --apply-timeout)
+      apply_timeout="${2:-}"
+      shift 2
+      ;;
+    --flux-ready-timeout)
+      flux_ready_timeout_seconds="${2:-}"
+      shift 2
+      ;;
+    -h|--help)
+      usage
+      exit 0
+      ;;
+    *)
+      echo "Unknown option: $1" >&2
+      usage >&2
+      exit 1
+      ;;
+  esac
+done
+
+cleanup_temp_artifacts() {
+  if [[ -n "${GENERATED_KUBECONFIG}" && -f "${GENERATED_KUBECONFIG}" ]]; then
+    rm -f "${GENERATED_KUBECONFIG}"
+  fi
+}
+trap cleanup_temp_artifacts EXIT
+
+choose_unique_flux_demo_prefix() {
+  local attempt=0
+  local candidate=""
+  local candidate_space=""
+
+  while (( attempt < 32 )); do
+    candidate="$(generate_safe_flux_demo_prefix)"
+    candidate_space="${candidate}-${DEPLOY_FLUX_SPACE_SUFFIX}"
+    if ! cub space get "${candidate_space}" >/dev/null 2>&1; then
+      printf '%s\n' "${candidate}"
+      return 0
+    fi
+    attempt=$((attempt + 1))
+  done
+
+  echo "Unable to generate a unique safe Flux demo prefix after ${attempt} attempts." >&2
+  exit 1
+}
+
+run_kubectl() {
+  if [[ "${#KUBECTL_BASE[@]}" -eq 0 ]]; then
+    return 1
+  fi
+
+  if [[ -n "${kube_context}" ]]; then
+    "${KUBECTL_BASE[@]}" --context "${kube_context}" "$@"
+    return
+  fi
+
+  "${KUBECTL_BASE[@]}" "$@"
+}
+
+ensure_clean_local_state() {
+  if ! state_exists; then
+    return
+  fi
+
+  if [[ "${cleanup_first}" -eq 1 ]]; then
+    echo "==> Cleaning up prior local example state"
+    bash "${SCRIPT_DIR}/cleanup.sh"
+    return
+  fi
+
+  cat >&2 </dev/null 2>&1; then
+    return
+  fi
+
+  if ! kind get clusters 2>/dev/null | grep -qx 'demo-flux'; then
+    return
+  fi
+
+  local tmp_kubeconfig
+  tmp_kubeconfig="$(mktemp "${TMPDIR:-/tmp}/demo-flux-cleanup-kubeconfig.XXXXXX")"
+  KUBECONFIG="${tmp_kubeconfig}" kind export kubeconfig --name demo-flux >/dev/null
+
+  echo "==> Cleaning old Flux bridge objects from the dedicated demo-flux cluster"
+  KUBECONFIG="${tmp_kubeconfig}" kubectl --context kind-demo-flux delete ocirepositories,kustomizations \
+    -n flux-system \
+    -l app.kubernetes.io/managed-by=flux-oci-bridge \
+    --ignore-not-found >/dev/null 2>&1 || true
+
+  echo "==> Cleaning old demo workloads from the dedicated demo-flux cluster"
+  KUBECONFIG="${tmp_kubeconfig}" kubectl --context kind-demo-flux delete \
+    deployment/gpu-operator \
+    daemonset/nvidia-device-plugin \
+    service/gpu-operator \
+    -n "${LOCAL_FLUX_WORKLOAD_NAMESPACE}" \
+    --ignore-not-found >/dev/null 2>&1 || true
+
+  rm -f "${tmp_kubeconfig}"
+}
+
+preflight_target() {
+  local preflight_json
+
+  echo "==> Preflighting Flux live target"
+  preflight_json="$(bash "${PREFLIGHT_SCRIPT}" "${target_ref}" --json)"
+  echo "${preflight_json}" | jq '{targetRef, applyReady, providerType, deliveryMode, bridgeWorker}'
+
+  if ! jq -e '.applyReady == true' >/dev/null <<<"${preflight_json}"; then
+    echo "${preflight_json}" | jq '{targetRef, reasons, nextSteps}'
+    echo "Target is not ready for live delivery." >&2
+    exit 2
+  fi
+
+  if ! jq -e '.deliveryMode == "flux-oci"' >/dev/null <<<"${preflight_json}"; then
+    echo "${preflight_json}" | jq '{targetRef, providerType, deliveryMode, reasons, nextSteps}'
+    echo "This helper is only for Flux OCI live targets." >&2
+    exit 2
+  fi
+}
+
+prepare_cluster_proof_lane() {
+  local access_error=""
+
+  if [[ "${skip_cluster_proof}" -eq 1 ]]; then
+    CLUSTER_PROOF_REASON="cluster proof was skipped by request"
+    return
+  fi
+
+  if [[ -n "${kubeconfig_path}" ]]; then
+    if [[ ! -f "${kubeconfig_path}" ]]; then
+      echo "Kubeconfig not found: ${kubeconfig_path}" >&2
+      exit 1
+    fi
+    KUBECTL_BASE=(env "KUBECONFIG=${kubeconfig_path}" kubectl)
+  elif [[ "${target_ref}" == demo-flux/* ]] && command -v kind >/dev/null 2>&1; then
+    if kind get clusters 2>/dev/null | grep -qx 'demo-flux'; then
+      GENERATED_KUBECONFIG="$(mktemp "${TMPDIR:-/tmp}/demo-flux-kubeconfig.XXXXXX")"
+      KUBECONFIG="${GENERATED_KUBECONFIG}" kind export kubeconfig --name demo-flux >/dev/null
+      KUBECTL_BASE=(env "KUBECONFIG=${GENERATED_KUBECONFIG}" kubectl)
+      if [[ -z "${kube_context}" ]]; then
+        kube_context="kind-demo-flux"
+      fi
+    else
+      CLUSTER_PROOF_REASON="demo-flux kind cluster was not found locally"
+      return
+    fi
+  else
+    CLUSTER_PROOF_REASON="no kubeconfig was provided for cluster proof"
+    return
+  fi
+
+  if access_error="$(run_kubectl get ns --request-timeout=5s -o name 2>&1 >/dev/null)"; then
+    return
+  fi
+
+  KUBECTL_BASE=()
+  CLUSTER_PROOF_REASON="$(printf '%s\n' "${access_error}" | sed -n '1p')"
+}
+
+approve_and_apply_flux_units() {
+  local unit_csv
+
+  unit_csv="$(deployment_unit_name gpu-operator flux),$(deployment_unit_name nvidia-device-plugin flux)"
+
+  echo "==> Approving Flux deployment units"
+  cub unit approve --space "$(flux_deploy_space)" --unit "${unit_csv}"
+
+  echo "==> Applying Flux deployment units"
+  cub unit apply --space "$(flux_deploy_space)" --unit "${unit_csv}" --wait --timeout "${apply_timeout}"
+}
+
+wait_for_flux_kustomizations_ready() {
+  if [[ "${#KUBECTL_BASE[@]}" -eq 0 ]]; then
+    return
+  fi
+
+  local start_seconds now_seconds elapsed_seconds
+  local gpu_name device_name
+  local gpu_ready device_ready
+
+  gpu_name="$(flux_deploy_space)-$(deployment_unit_name gpu-operator flux)"
+  device_name="$(flux_deploy_space)-$(deployment_unit_name nvidia-device-plugin flux)"
+  start_seconds="$(date +%s)"
+
+  echo "==> Waiting for Flux Kustomization Ready=True"
+
+  while true; do
+    gpu_ready="$(
+      run_kubectl get kustomization -n flux-system "${gpu_name}" -o json 2>/dev/null \
+        | jq -r '[.status.conditions[]? | select(.type == "Ready")][0].status // "Unknown"' 2>/dev/null || true
+    )"
+    device_ready="$(
+      run_kubectl get kustomization -n flux-system "${device_name}" -o json 2>/dev/null \
+        | jq -r '[.status.conditions[]? | select(.type == "Ready")][0].status // "Unknown"' 2>/dev/null || true
+    )"
+
+    if [[ "${gpu_ready}" == "True" && "${device_ready}" == "True" ]]; then
+      echo "Flux Kustomizations are Ready=True."
+      return
+    fi
+
+    now_seconds="$(date +%s)"
+    elapsed_seconds=$((now_seconds - start_seconds))
+    if (( elapsed_seconds >= flux_ready_timeout_seconds )); then
+      echo "Timed out waiting for Flux Kustomization Ready=True." >&2
+      echo "- ${gpu_name}: ${gpu_ready:-Unknown}" >&2
+      echo "- ${device_name}: ${device_ready:-Unknown}" >&2
+      return 1
+    fi
+
+    echo "- ${gpu_name}: ${gpu_ready:-Unknown}"
+    echo "- ${device_name}: ${device_ready:-Unknown}"
+    sleep 5
+  done
+}
+
+show_confighub_proof() {
+  echo "==> ConfigHub unit status"
+  cub unit list --space "$(flux_deploy_space)" --quiet --json \
+    | jq '.[] | {
+      slug: .Unit.Slug,
+      headRevision: (.Unit.HeadRevisionNum // null),
+      lastAppliedRevision: (.Unit.LastAppliedRevisionNum // null),
+      liveRevision: (.Unit.LiveRevisionNum // null),
+      status: (.UnitStatus.Status // null),
+      actionResult: (.UnitStatus.ActionResult // null)
+    }'
+
+  echo "==> GUI review URLs"
+  echo "- Flux deploy space: $(gui_space_url "$(flux_deploy_space)")"
+  echo "- Flux unit (gpu-operator): $(gui_unit_url "$(flux_deploy_space)" "$(deployment_unit_name gpu-operator flux)")"
+  echo "- Flux unit (nvidia-device-plugin): $(gui_unit_url "$(flux_deploy_space)" "$(deployment_unit_name nvidia-device-plugin flux)")"
+  echo "- Recipe manifest: $(gui_unit_url "$(recipe_space)" "${RECIPE_MANIFEST_UNIT}")"
+}
+
+show_cluster_proof() {
+  if [[ "${#KUBECTL_BASE[@]}" -eq 0 ]]; then
+    echo "==> Cluster proof skipped"
+    echo "- reason: ${CLUSTER_PROOF_REASON:-unavailable}"
+    echo "- review later:"
+    echo "  kubectl get ocirepositories,kustomizations -A | grep -F '$(flux_deploy_space)'"
+    echo "  kubectl get deployment/gpu-operator daemonset/nvidia-device-plugin service/gpu-operator -n ${LOCAL_FLUX_WORKLOAD_NAMESPACE}"
+    return
+  fi
+
+  echo "==> Flux controller proof"
+  run_kubectl get ocirepositories,kustomizations -A | grep -F "$(flux_deploy_space)" || true
+
+  echo "==> Cluster workload proof"
+  run_kubectl get deployment/gpu-operator daemonset/nvidia-device-plugin service/gpu-operator -n "${LOCAL_FLUX_WORKLOAD_NAMESPACE}"
+}
+
+require_cub
+require_jq
+begin_log_capture demo-flux-oci
+ensure_clean_local_state
+cleanup_local_demo_flux_cluster_state
+preflight_target
+prepare_cluster_proof_lane
+
+if [[ -z "${prefix}" ]]; then
+  prefix="$(choose_unique_flux_demo_prefix)"
+fi
+
+assert_flux_prefix_budget "${prefix}"
+
+echo "Mode: live delivery"
+echo "Using safe Flux demo prefix: ${prefix}"
+echo "Flux label-value prefix budget: $(max_flux_prefix_length) characters"
+echo "Target: ${target_ref}"
+
+echo "==> Materializing layered recipe and binding Flux target"
+bash "${SCRIPT_DIR}/setup.sh" "${prefix}" "${target_ref}"
+
+echo "==> Verifying ConfigHub structure"
+bash "${SCRIPT_DIR}/verify.sh"
+
+load_state
+approve_and_apply_flux_units
+wait_for_flux_kustomizations_ready
+show_confighub_proof
+show_cluster_proof
+
+cat < longest_suffix )); then
+      longest_suffix=${#current_suffix}
+    fi
+  done
+
+  echo "${longest_suffix}"
+}
+
+max_flux_prefix_length() {
+  echo $(( K8S_LABEL_VALUE_MAX_LEN - $(flux_tracking_suffix_length) ))
+}
+
+generate_safe_flux_demo_prefix() {
+  printf 'nfx%02x\n' $((RANDOM % 256))
+}
+
+assert_flux_prefix_budget() {
+  local prefix="$1"
+  local max_prefix_len
+  max_prefix_len="$(max_flux_prefix_length)"
+
+  if (( ${#prefix} <= max_prefix_len )); then
+    return 0
+  fi
+
+  cat >&2 <
+EOF_BUDGET
+  exit 1
+}
+
 setup_usage() {
   cat <<'EOF_USAGE'
 Usage:
@@ -532,6 +584,12 @@ Layer mutations:
 - recipe: set training intent and validation/plugin profile
 - deployment variants: all leaves set namespace=${DEPLOY_NAMESPACE} and CLUSTER=${DEPLOY_NAMESPACE}
 
+Flux live constraint:
+- Flux copies the Kustomization name into the kustomize.toolkit.fluxcd.io/name label
+- Kubernetes label values are limited to ${K8S_LABEL_VALUE_MAX_LEN} characters
+- with the current component names, keep the prefix length <= $(max_flux_prefix_length) for the Flux live lane
+- use ./demo-flux-oci.sh to auto-pick a safe Flux demo prefix
+
 Recipe manifest:
 - $(recipe_space)/${RECIPE_MANIFEST_UNIT}
 - source template: ${RECIPE_BASE_TEMPLATE}
@@ -588,6 +646,9 @@ show_setup_plan_json() {
     --arg manifestUnit "${RECIPE_MANIFEST_UNIT}" \
     --arg manifestTemplate "${RECIPE_BASE_TEMPLATE}" \
     --arg manifestRendered "${STATE_DIR}/recipe-eks-h100-ubuntu-training-stack.rendered.yaml" \
+    --argjson fluxPrefixMaxLength "$(max_flux_prefix_length)" \
+    --arg fluxPrefixReason "Flux copies the Kustomization name into the kustomize.toolkit.fluxcd.io/name label, and Kubernetes label values are limited to 63 characters." \
+    --arg fluxPrefixExample "${SAFE_FLUX_PREFIX_EXAMPLE}" \
     '{
       example: $example,
       mode: "setup-plan",
@@ -659,6 +720,11 @@ show_setup_plan_json() {
         template: $manifestTemplate,
         renderedOutput: $manifestRendered
       },
+      liveConstraints: {
+        fluxPrefixMaxLength: $fluxPrefixMaxLength,
+        fluxReason: $fluxPrefixReason,
+        knownGoodPrefixExample: $fluxPrefixExample
+      },
       commands: [
         "cub space create",
         "cub unit create ",
@@ -852,6 +918,9 @@ set_target_for_compatible_units() {
   variant="$(deployment_variant_for_provider_type "${provider_type}")"
   case "${variant}" in
     direct|flux|argo)
+      if [[ "${variant}" == "flux" ]]; then
+        assert_flux_prefix_budget "$(state_prefix)"
+      fi
       set_target_for_deploy_variant "${variant}" "${target_ref}"
       remember_target_ref_for_variant "${target_ref}"
       ;;
@@ -948,6 +1017,7 @@ Next steps:
 10. cub unit approve --space $(argo_deploy_space) $(deployment_unit_name gpu-operator argo) && cub unit approve --space $(argo_deploy_space) $(deployment_unit_name nvidia-device-plugin argo)
 11. cub unit apply --space $(argo_deploy_space) $(deployment_unit_name gpu-operator argo) && cub unit apply --space $(argo_deploy_space) $(deployment_unit_name nvidia-device-plugin argo)
 12. Review recipe manifest: cub unit get --space $(recipe_space) --data-only ${RECIPE_MANIFEST_UNIT}
+13. Fastest local Flux proof: ./demo-flux-oci.sh --cleanup-first --target demo-flux/flux-renderer-worker-fluxoci-kubernetes-yaml-cluster
 EOF_SUMMARY
 }
 
diff --git a/incubator/global-app-layer/gpu-eks-h100-training/prompts.md b/incubator/global-app-layer/gpu-eks-h100-training/prompts.md
index d950af5..2efcb55 100644
--- a/incubator/global-app-layer/gpu-eks-h100-training/prompts.md
+++ b/incubator/global-app-layer/gpu-eks-h100-training/prompts.md
@@ -45,6 +45,25 @@ After running `gpu-eks-h100-training`, verify:
 - live apply state if used
 - summarize what definitely happened, what did not happen, and what still depends on missing infrastructure
 
+## Prove NVIDIA AICR With Flux OCI
+
+Use the proven local AICR Flux lane in `gpu-eks-h100-training`.
+
+Run:
+
+```bash
+./demo-flux-oci.sh --cleanup-first --target demo-flux/flux-renderer-worker-fluxoci-kubernetes-yaml-cluster
+```
+
+While doing it:
+
+- surface the ConfigHub GUI URLs early
+- show the ConfigHub unit status block after apply
+- show Flux `OCIRepository` and `Kustomization` output, not just narration
+- show the cluster workload objects after apply
+- keep structural proof distinct from functional GPU proof
+- harvest any product or docs gaps you notice at the end
+
 ## Whole Lifecycle Walkthrough
 
 Guide me through the full `gpu-eks-h100-training` lifecycle.
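Reviewer note: this patch enforces a prefix budget because Flux copies the Kustomization name into a label and Kubernetes label values are limited to 63 characters. The arithmetic can be sketched as below. This is an illustration under stated assumptions, not the code in `lib.sh`: the two suffix strings are hypothetical stand-ins (the real suffixes are not visible in this patch), and the function names are chosen so they do not collide with the helpers added above.

```bash
#!/usr/bin/env bash
# Sketch: derive the longest safe prefix from the Kubernetes label-value limit.
K8S_LABEL_VALUE_MAX_LEN=63

# Hypothetical name suffixes appended after the prefix; placeholders only.
EXAMPLE_SUFFIXES=(
  "-example-deploy-flux-gpu-operator-flux"
  "-example-deploy-flux-nvidia-device-plugin-flux"
)

# Print the length of the longest suffix passed as arguments.
longest_suffix_length() {
  local longest=0 suffix
  for suffix in "$@"; do
    if (( ${#suffix} > longest )); then
      longest=${#suffix}
    fi
  done
  echo "${longest}"
}

# Print how many characters remain for the prefix once the longest suffix is reserved.
max_prefix_length() {
  echo $(( K8S_LABEL_VALUE_MAX_LEN - $(longest_suffix_length "$@") ))
}

# Succeed only if the prefix fits inside the remaining budget.
prefix_fits() {
  local prefix="$1"
  shift
  (( ${#prefix} <= $(max_prefix_length "$@") ))
}
```

With the real component names the patch records a much tighter budget (`fluxPrefixMaxLength == 5` in `contracts.md`), which is why the helper generates five-character prefixes of the form `nfx` plus two hex digits.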
From 679fc7c74136047601d703052e2a9acb4473df7b Mon Sep 17 00:00:00 2001
From: monadic
Date: Wed, 15 Apr 2026 08:13:39 +0100
Subject: [PATCH 2/3] docs: update renderer import contract

---
 gitops-import/README.md                       | 15 +++++++++++++--
 gitops-import/gitops-import/README.md         | 15 +++++++++++++--
 incubator/gitops-import-argo/AI_START_HERE.md |  4 ++++
 incubator/gitops-import-argo/README.md        | 14 ++++++++++++++
 incubator/gitops-import-argo/contracts.md     | 10 ++++++++++
 incubator/gitops-import-flux/AI_START_HERE.md | 11 +++++++++--
 incubator/gitops-import-flux/README.md        |  7 +++++++
 incubator/gitops-import-flux/contracts.md     | 10 ++++++++++
 8 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/gitops-import/README.md b/gitops-import/README.md
index db54ddc..a5eef3d 100644
--- a/gitops-import/README.md
+++ b/gitops-import/README.md
@@ -35,10 +35,21 @@ After setup, the worker registers two targets:
 # Discover ArgoCD Applications
 cub gitops discover --space worker-kubernetes-yaml-cluster
 
-# Import into ConfigHub
-cub gitops import --space worker-kubernetes-yaml-cluster worker-argocdrenderer-kubernetes-yaml-cluster
+# Preferred current import path
+cub gitops import --space worker-kubernetes-yaml-cluster
+
+# If your build still requires an explicit renderer target, make the renderer
+# authoritative first:
+cub target update --space --patch --option IsAuthoritative=true \
+  worker-argocdrenderer-kubernetes-yaml-cluster
+cub gitops import --space worker-kubernetes-yaml-cluster \
+  worker-argocdrenderer-kubernetes-yaml-cluster
 ```
 
+If the source Argo `Application` is actively syncing and the renderer target is
+unset or non-authoritative, import should now fail visibly. That failure is
+correct, not flaky.
+
 ## Applying ConfigHub units to this cluster
 
 After setup, you can also use this cluster to apply ConfigHub-managed units directly (e.g. from the layered recipe examples).
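Reviewer note: the README change above depends on ordering. The renderer target is made authoritative first, and the import runs only if that step succeeded. A minimal sketch of that sequence follows. The command shapes are copied from this patch series; the function name, its three positional arguments, and the `CUB_CMD` override (used here so the sequence can be dry-run with `echo`) are assumptions introduced for illustration.

```bash
#!/usr/bin/env bash
# Sketch: mark the renderer target authoritative, then import.
# CUB_CMD is a hypothetical override; it defaults to the real `cub` binary.
import_with_authoritative_renderer() {
  local space="$1" source_target="$2" renderer_target="$3"
  local cub_cmd="${CUB_CMD:-cub}"

  # Step 1: make the renderer authoritative. Stop if this fails, so the import
  # never runs against a non-authoritative renderer.
  "${cub_cmd}" target update --space "${space}" --patch \
    --option IsAuthoritative=true "${renderer_target}" || return 1

  # Step 2: import, waiting for the renderer stage to complete or fail visibly.
  "${cub_cmd}" gitops import --space "${space}" \
    "${source_target}" "${renderer_target}" --wait
}
```

Running it with `CUB_CMD=echo` prints the two commands in order without touching ConfigHub, which is a cheap way to review the arguments before a real run.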
diff --git a/gitops-import/gitops-import/README.md b/gitops-import/gitops-import/README.md
index 42bbf99..57e1b73 100644
--- a/gitops-import/gitops-import/README.md
+++ b/gitops-import/gitops-import/README.md
@@ -35,10 +35,21 @@ After setup, the worker registers two targets:
 # Discover ArgoCD Applications
 cub gitops discover --space worker-kubernetes-yaml-cluster
 
-# Import into ConfigHub
-cub gitops import --space worker-kubernetes-yaml-cluster worker-argocdrenderer-kubernetes-yaml-cluster
+# Preferred current import path
+cub gitops import --space worker-kubernetes-yaml-cluster
+
+# If your build still requires an explicit renderer target, make the renderer
+# authoritative first:
+cub target update --space --patch --option IsAuthoritative=true \
+  worker-argocdrenderer-kubernetes-yaml-cluster
+cub gitops import --space worker-kubernetes-yaml-cluster \
+  worker-argocdrenderer-kubernetes-yaml-cluster
 ```
 
+If the source Argo `Application` is actively syncing and the renderer target is
+unset or non-authoritative, import should now fail visibly. That failure is
+correct, not flaky.
+
 ## Teardown
 
 ```bash
diff --git a/incubator/gitops-import-argo/AI_START_HERE.md b/incubator/gitops-import-argo/AI_START_HERE.md
index c0cd219..9ca98d5 100644
--- a/incubator/gitops-import-argo/AI_START_HERE.md
+++ b/incubator/gitops-import-argo/AI_START_HERE.md
@@ -148,6 +148,7 @@ Ask: "This will import discovered resources into ConfigHub. Ready to proceed?"
 Run:
 
 ```bash
+cub target update --space "$CUB_SPACE" --patch --option IsAuthoritative=true worker-argocdrenderer-kubernetes-yaml-cluster
 cub gitops import --space "$CUB_SPACE" worker-kubernetes-yaml-cluster worker-argocdrenderer-kubernetes-yaml-cluster --wait
 cub unit list --space "$CUB_SPACE" --json | jq
 cub unit-action list --space "$CUB_SPACE"
@@ -165,6 +166,9 @@ What to explain:
 - ConfigHub proves import and renderer facts
 - `cub-scout` proves live ownership and GitOps context
 - Contrast path is intentionally mixed — healthy and unhealthy apps reported separately
+- The authoritative renderer patch is deliberate
+- If the source `Application` were still actively syncing and the renderer
+  target stayed non-authoritative, import should now fail visibly
 
 GUI now: Open ConfigHub space and review units, links, and recent actions.
 
diff --git a/incubator/gitops-import-argo/README.md b/incubator/gitops-import-argo/README.md
index 6f6f5df..3345d5e 100644
--- a/incubator/gitops-import-argo/README.md
+++ b/incubator/gitops-import-argo/README.md
@@ -90,6 +90,8 @@ export CUB_SPACE=
 ./verify.sh
 cub target list --space "$CUB_SPACE" --json | jq
 cub gitops discover --space "$CUB_SPACE" worker-kubernetes-yaml-cluster --json | jq
+cub target update --space "$CUB_SPACE" --patch --option IsAuthoritative=true \
+  worker-argocdrenderer-kubernetes-yaml-cluster
 cub gitops import --space "$CUB_SPACE" worker-kubernetes-yaml-cluster worker-argocdrenderer-kubernetes-yaml-cluster --wait
 ```
 
@@ -139,9 +141,16 @@ Use those for the import path:
 ```bash
 cub target list --space "$CUB_SPACE" --json | jq
 cub gitops discover --space "$CUB_SPACE" worker-kubernetes-yaml-cluster --json | jq
+cub target update --space "$CUB_SPACE" --patch --option IsAuthoritative=true \
+  worker-argocdrenderer-kubernetes-yaml-cluster
 cub gitops import --space "$CUB_SPACE" worker-kubernetes-yaml-cluster worker-argocdrenderer-kubernetes-yaml-cluster --wait
 ```
 
+This authoritative step matters now. If the source Argo `Application` is still
+actively syncing and the renderer target is unset or non-authoritative, import
+should now fail visibly. Treat that as a correct protection, not a flaky
+warning.
+
 The worker runs locally and records its pid and log in `var/worker.pid` and `var/worker.log`. The helper also keeps a local ArgoCD API port-forward for the renderer worker in `var/argocd-port-forward.pid` and `var/argocd-port-forward.log`.
 
 ## What Success Looks Like
@@ -234,6 +243,11 @@ It can prove that:
 - the renderer target accepted it
 - ArgoCD refreshed or reconciled its view of that application
 
+For current renderer behavior, that statement assumes the renderer target is
+authoritative or the source `Application` is no longer actively syncing. A
+non-authoritative renderer pointed at an actively syncing source should now
+fail instead of only warning.
+
 It does not by itself prove that the specific ConfigHub action created new workloads. If the workloads already existed and ArgoCD was already managing them, the strongest honest conclusion is usually that ConfigHub successfully triggered or refreshed Argo-side reconciliation, not that ConfigHub created the workloads through Argo.
 
 ## Optional Contrast Path
diff --git a/incubator/gitops-import-argo/contracts.md b/incubator/gitops-import-argo/contracts.md
index 93f8e63..3c3cea3 100644
--- a/incubator/gitops-import-argo/contracts.md
+++ b/incubator/gitops-import-argo/contracts.md
@@ -73,6 +73,15 @@ This file documents the safest stable inspection paths for `gitops-import-argo`.
   - discover ran against the Kubernetes target
   - GitOps resources were found and serialized into ConfigHub discover state
 
+### `cub target update --space --patch --option IsAuthoritative=true worker-argocdrenderer-kubernetes-yaml-cluster`
+
+- mutates: yes, ConfigHub only
+- output shape: text
+- proves:
+  - the renderer target is configured to manage Argo `Application` resources
+    authoritatively
+  - the import path will not rely on the old non-authoritative warning behavior
+
 ### `cub gitops import --space worker-kubernetes-yaml-cluster worker-argocdrenderer-kubernetes-yaml-cluster --wait`
 
 - mutates: yes, ConfigHub only
@@ -80,6 +89,7 @@ This file documents the safest stable inspection paths for `gitops-import-argo`.
 - proves:
   - renderer and wet units were created
   - the renderer stage completed or failed visibly
+  - a surfaced autosync conflict is treated as a real block, not as ignorable
 
 ### `cub unit list --space --json`
 
diff --git a/incubator/gitops-import-flux/AI_START_HERE.md b/incubator/gitops-import-flux/AI_START_HERE.md
index a97c5e6..52738ed 100644
--- a/incubator/gitops-import-flux/AI_START_HERE.md
+++ b/incubator/gitops-import-flux/AI_START_HERE.md
@@ -113,17 +113,20 @@ GUI feature ask: Pre-import discovery view in ConfigHub. No issue filed yet.
 
 **PAUSE.** Wait for the human.
 
-## Stage 4: "Confirm ConfigHub Auth" (read-only gate)
+## Stage 4: "Confirm ConfigHub Access" (read-only gate)
 
 Run:
 
 ```bash
 cub info
+cub space list --names
 ```
 
 What to explain:
 
-- This is a read-only auth gate before any ConfigHub mutation
+- `cub info` proves server details, not authenticated access by itself
+- `cub space list --names` is the read-only authenticated access gate before any
+  ConfigHub mutation
 - If auth is expired, stop here
 - The next stage should not start until auth is valid
 
@@ -176,6 +179,7 @@ Ask: "This will import discovered resources into ConfigHub. Ready to proceed?"
 Run:
 
 ```bash
+cub target update --space "$CUB_SPACE" --patch --option IsAuthoritative=true
 cub gitops import --space "$CUB_SPACE" --wait
 cub target list --space "$CUB_SPACE" --json | jq
 cub unit list --space "$CUB_SPACE" --json | jq
@@ -194,6 +198,9 @@ What to explain:
 
 - ConfigHub proves discover and renderer facts
 - `cub-scout` proves live ownership and GitOps context
+- The authoritative renderer patch is deliberate
+- If the source Flux object were still actively syncing and the renderer target
+  stayed non-authoritative, import should now fail visibly
 
 GUI now: Open ConfigHub space and review targets, units, and actions.
 
diff --git a/incubator/gitops-import-flux/README.md b/incubator/gitops-import-flux/README.md
index 006e9ff..577e1b2 100644
--- a/incubator/gitops-import-flux/README.md
+++ b/incubator/gitops-import-flux/README.md
@@ -87,6 +87,7 @@ export CUB_SPACE=
 ./verify.sh
 cub target list --space "$CUB_SPACE" --json | jq
 cub gitops discover --space "$CUB_SPACE" --json | jq
+cub target update --space "$CUB_SPACE" --patch --option IsAuthoritative=true
 cub gitops import --space "$CUB_SPACE" --wait
 ```
 
@@ -134,9 +135,15 @@ Use those for the import path:
 ```bash
 cub target list --space "$CUB_SPACE" --json | jq
 cub gitops discover --space "$CUB_SPACE" --json | jq
+cub target update --space "$CUB_SPACE" --patch --option IsAuthoritative=true
 cub gitops import --space "$CUB_SPACE" --wait
 ```
 
+This authoritative step matters now. If the source Flux `Kustomization` or
+`HelmRelease` is still actively syncing and the renderer target is unset or
+non-authoritative, import should now fail visibly. Treat that as a correct
+protection, not a flaky warning.
+
 The worker installer preloads the Flux controller images and the ConfigHub Flux worker image into the kind cluster before waiting on rollout. That avoids long or flaky first-time image pulls from becoming the main story of the example.
 
 The `fluxoci` target is useful beyond this import example. It is the deployment bridge that raw-manifest examples such as `incubator/global-app-layer/gpu-eks-h100-training` can bind to when they want a Flux-managed deployment variant.
diff --git a/incubator/gitops-import-flux/contracts.md b/incubator/gitops-import-flux/contracts.md
index d4bfc0a..6c9f75c 100644
--- a/incubator/gitops-import-flux/contracts.md
+++ b/incubator/gitops-import-flux/contracts.md
@@ -98,6 +98,15 @@ This file documents the safest stable inspection paths for `gitops-import-flux`.
   - discover ran against the Kubernetes target
   - Flux deployers were found and serialized into ConfigHub discover state
 
+### `cub target update --space --patch --option IsAuthoritative=true `
+
+- mutates: yes, ConfigHub only
+- output shape: text
+- proves:
+  - the Flux renderer target is configured to manage source objects
+    authoritatively
+  - the import path will not rely on the old non-authoritative warning behavior
+
 ### `cub gitops import --space --wait`
 
 - mutates: yes, ConfigHub only
@@ -106,6 +115,7 @@ This file documents the safest stable inspection paths for `gitops-import-flux`.
- renderer and wet units were created - the renderer stage completed or failed visibly - the healthy `podinfo` path can render successfully even when contrast paths fail for real source reasons + - a surfaced source-sync conflict is treated as a real block, not as ignorable ### `cub unit list --space --json` From 39d71dba6221532f9fbb11e7b185e084f0f5422e Mon Sep 17 00:00:00 2001 From: monadic Date: Wed, 15 Apr 2026 08:25:00 +0100 Subject: [PATCH 3/3] docs: simplify top-level onboarding --- AGENTS.md | 6 +- AI-README-FIRST.md | 218 +++++++-------------------- AI_START_HERE.md | 106 ------------- EXAMPLE_CONTRACT_STANDARD.md | 4 +- README.md | 40 ++--- START_HERE.md | 283 ----------------------------------- incubator/README.md | 2 +- 7 files changed, 83 insertions(+), 576 deletions(-) delete mode 100644 AI_START_HERE.md delete mode 100644 START_HERE.md diff --git a/AGENTS.md b/AGENTS.md index c3f57c3..cbe5624 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -134,7 +134,7 @@ Use these shared meanings: - **preview** = read-only orientation only - **fast preview** = the example's read-only path (`--explain`, `--explain-json`, read-only demo/report scripts) - **operational evaluation** = run the smallest real setup/proof path that shows the example actually works -- **guided walkthrough** = pause-heavy presenter mode following the stage structure in `AI_START_HERE.md` +- **guided walkthrough** = pause-heavy presenter mode following the stage structure in the selected example's `AI_START_HERE.md` If the user says "evaluate it quickly" or "use the fast path" and does **not** explicitly say "read-only": @@ -142,7 +142,7 @@ If the user says "evaluate it quickly" or "use the fast path" and does **not** e 2. 
then continue into the smallest real operational proof if the example has: - `./setup.sh` - `./verify.sh` - - a representative proof action documented in `contracts.md` or `AI_START_HERE.md` + - a representative proof action documented in `contracts.md` or the example's `AI_START_HERE.md` 3. stop before cleanup unless the user asks for cleanup Only stay fully read-only when the user explicitly asks for: @@ -166,7 +166,7 @@ Do not parse human table output if JSON is available. ## 9. Most useful files -- human entry path: [START_HERE.md](./START_HERE.md) +- human entry path: [README.md](./README.md) - AI context: [AI-README-FIRST.md](./AI-README-FIRST.md) - incubator AI path: [incubator/AI_START_HERE.md](./incubator/AI_START_HERE.md) - layered examples: [incubator/global-app-layer/README.md](./incubator/global-app-layer/README.md) diff --git a/AI-README-FIRST.md b/AI-README-FIRST.md index 67b9109..a59f1f7 100644 --- a/AI-README-FIRST.md +++ b/AI-README-FIRST.md @@ -1,20 +1,20 @@ # AI README FIRST -If you are an AI assistant, read this file before exploring the rest of the repo. - -This repo is meant to be understandable and usable by both humans and AI assistants, but you will avoid a lot of confusion if you start here. +This is the canonical repo-level guide for AI assistants working in `confighub/examples`. If you want the shortest strict protocol first, read [AGENTS.md](./AGENTS.md) before this file. -## 1. What This Repo Is +## 1. What This File Is For + +Use this file for: -This is the public `confighub/examples` repo. +- repo-level AI operating rules +- CLI and mutation safety guidance +- choosing the right example family -It contains: +Do not treat this file as the walkthrough script. When the user wants pause-heavy demo mode, use the selected example's `AI_START_HERE.md`, or [incubator/AI_START_HERE.md](./incubator/AI_START_HERE.md) for incubator-wide walkthrough conventions. 
-- stable examples such as `promotion-demo-data`, `global-app`, `helm-platform-components`, and `vm-fleet` -- incubator work in [`incubator/`](./incubator/README.md), especially the layered-recipe package in [`incubator/global-app-layer/`](./incubator/global-app-layer/README.md) -- verifier scripts in `./scripts/` +## 2. Resolve The Repo Root First Do not hardcode the checkout path. This repo may be checked out as `examples`, `confighub-examples`, or another folder name. @@ -24,11 +24,11 @@ Resolve the repo root first: git rev-parse --show-toplevel ``` -## 2. How To Access Live ConfigHub +## 3. Access Live ConfigHub Through `cub` If you need live ConfigHub state, use the `cub` CLI. -Do **not** assume that fetching `https://hub.confighub.com` directly will give you meaningful data. It is a browser application. For machine use, `cub` is the right interface. +Do not assume that fetching `https://hub.confighub.com` directly will give you meaningful data. It is a browser application. For machine use, `cub` is the right interface. Use: @@ -39,14 +39,14 @@ cub space list --json cub target list --space "*" --json ``` -## 3. Important CLI Gotchas +## 4. Important CLI Gotchas Avoid these common mistakes: - use `cub version`, not `cub --version` - use `cub context list`, not `cub auth status` - use `--json` and `--jq` when you want machine-readable output -- use `--where "Labels.Key = 'value'"` for label filtering on `cub space list`, not `--label` +- use `--where "Labels.Key = 'value'"` for label filtering on list commands, not guessed `--label` filters - use `--dry-run` before `cub function do` or `cub unit apply` if you want a non-mutating preview Good discovery commands: @@ -58,7 +58,7 @@ cub info cub context list --json ``` -## 4. Default Operating Mode +## 5. Default Operating Mode Start in read-only mode. @@ -69,31 +69,7 @@ Recommended order: 3. inspect live ConfigHub state in JSON 4. only then run mutating example flows if the human wants that -## 4a. 
Shared meanings for "evaluate quickly" - -Use these meanings consistently across examples: - -- **preview** = read-only orientation -- **fast preview** = read-only example-specific path such as `./setup.sh --explain`, `./setup.sh --explain-json`, and any non-mutating demo/report script -- **fast operational evaluation** = preview plus the smallest real setup/proof sequence that demonstrates the example actually works - -If a user says "help me evaluate it quickly" and does not explicitly ask for read-only mode, the default should be: - -1. run the fast preview -2. run the smallest real operational proof path for that example -3. stop before cleanup unless asked - -For a runnable example, that usually means: - -- `./setup.sh` -- one isolation/verification command -- `./verify.sh` -- one representative proof action -- stop before cleanup - -Do not conclude that an example is "ready" based only on preview output if the example provides a real setup path and a representative proof action. - -## 5. Safe First Commands +## 6. Safe First Commands These commands do not mutate ConfigHub or any cluster: @@ -112,36 +88,38 @@ If you already know a specific unit: cub unit get --space --json ``` -What these commands do **not** mutate: +## 7. Choose The Right Example Family -- spaces -- units -- workers -- targets -- clusters -- Git history +Pick the family that matches the user's goal: -## 6. 
Mutating Commands: Use Care +- Stable no-cluster intro: [promotion-demo-data](./promotion-demo-data/README.md) and [campaigns-demo](./campaigns-demo/README.md) +- GitOps import or brownfield discovery: [gitops-import](./gitops-import/README.md), [import-from-live](./incubator/import-from-live/README.md), [gitops-import-argo](./incubator/gitops-import-argo/README.md), [gitops-import-flux](./incubator/gitops-import-flux/README.md) +- App mutation and platform flow: [springboot-platform-app-centric](./spring-platform/springboot-platform-app-centric/README.md) +- Worker extensibility: [custom-workers](./custom-workers/) +- Layered and advanced model: [incubator/global-app-layer](./incubator/global-app-layer/README.md) +- Full experimental catalog: [incubator/README.md](./incubator/README.md) -Examples often use: +If the user wants a human-oriented overview of the repo, [README.md](./README.md) is the public front door, but it is not the primary AI entry point. -- `./setup.sh` -- `./set-target.sh` -- `cub unit apply` -- `cub function do` +## 8. Shared Meanings For “Evaluate Quickly” -Those **do** create or mutate ConfigHub objects, and in some cases can affect a live cluster. +Use these meanings consistently across examples: -Before you run them, say clearly: +- **preview** = read-only orientation +- **fast preview** = read-only example-specific path such as `./setup.sh --explain`, `./setup.sh --explain-json`, or other non-mutating preview/report commands +- **fast operational evaluation** = preview plus the smallest real setup and proof sequence that demonstrates the example actually works -- what space(s) will be created or changed -- what unit(s) will be created or changed -- whether any target or cluster will be touched -- what read-only or dry-run command could be used first +If a user says "help me evaluate it quickly" and does not explicitly ask for read-only mode, the default should be: + +1. run the fast preview +2. 
run the smallest real operational proof path for that example +3. stop before cleanup unless asked -## 7. Machine-Readable Contracts +Do not conclude that an example is ready based only on preview output if the example provides a real setup path and proof path. -Use these as your default contracts: +## 9. Machine-Readable Contracts + +Treat these as stable repo-wide contracts: | Command | Output contract | Mutates anything? | |---|---|---| @@ -153,93 +131,33 @@ Use these as your default contracts: | `cub function do --dry-run --json ...` | JSON invocation response | no config write | | `cub unit apply --dry-run --json ...` | JSON apply preview | no live apply | -## 8. Best First Reading Order - -If the user is asking about ConfigHub generally: - -1. [START_HERE.md](./START_HERE.md) -2. [`promotion-demo-data/README.md`](./promotion-demo-data/README.md) -3. [`incubator/global-app-layer/00-config-hub-hello-world.md`](./incubator/global-app-layer/00-config-hub-hello-world.md) - -If the user is asking about NVIDIA AICR, recipes, or layered variants: - -1. [`incubator/global-app-layer/README.md`](./incubator/global-app-layer/README.md) -2. [`incubator/global-app-layer/confighub-aicr-value-add.md`](./incubator/global-app-layer/confighub-aicr-value-add.md) -3. [`incubator/global-app-layer/how-it-works.md`](./incubator/global-app-layer/how-it-works.md) -4. one worked example under `incubator/global-app-layer/` -5. run `./setup.sh --explain-json | jq` inside that example before reading shell code or mutating anything -6. use `./find-runs.sh --json` in `incubator/global-app-layer/` to discover active live runs - -If the user is asking about incubator-only work: - -1. [`incubator/README.md`](./incubator/README.md) -2. [`incubator/AI_START_HERE.md`](./incubator/AI_START_HERE.md) -3. [`incubator/ai-machine-seams-first.md`](./incubator/ai-machine-seams-first.md) +Example-specific seams vary. 
Do not assume `./verify.sh --json`, `./setup.sh --explain-json`, `./find-runs.sh --json`, or similar flags exist unless the example README, AI guide, script help, or source confirms them. -## 9. Repo Layout You Will Actually Need +## 10. Mutating Commands: Use Care -Most assistants only need these locations: - -- stable repo landing: [README.md](./README.md) -- human entry path: [START_HERE.md](./START_HERE.md) -- AI entry path: [AI_START_HERE.md](./AI_START_HERE.md) -- incubator landing: [`incubator/README.md`](./incubator/README.md) -- incubator machine seams: [`incubator/ai-machine-seams-first.md`](./incubator/ai-machine-seams-first.md) -- incubator eval prompts: [`incubator/ai-cold-eval-prompt-pack.md`](./incubator/ai-cold-eval-prompt-pack.md) -- layered recipes package: [`incubator/global-app-layer/README.md`](./incubator/global-app-layer/README.md) -- package mechanics: [`incubator/global-app-layer/how-it-works.md`](./incubator/global-app-layer/how-it-works.md) -- package value-add story: [`incubator/global-app-layer/confighub-aicr-value-add.md`](./incubator/global-app-layer/confighub-aicr-value-add.md) - -## 10. Placeholder Conventions - -When you see placeholders like: - -- `` -- `` -- `` -- `` - -they mean: - -- these commands are real -- but the specific connected object name depends on the user’s environment - -To discover the actual value, prefer: - -```bash -cub space list --json -cub target list --space "*" --json -``` - -## 11. What To Do If A User Says “Can You Access ConfigHub?” - -Do not answer by trying to fetch the web app. - -Instead: +Examples often use: -1. say that live ConfigHub access is through `cub` -2. run a read-only CLI check such as: +- `./setup.sh` +- `./set-target.sh` +- `cub unit apply` +- `cub function do` -```bash -cub context list --json -cub space list --json -``` +Those do create or mutate ConfigHub objects, and in some cases can affect a live cluster. -3. report what you found -4. 
only then ask whether they want you to inspect or mutate a specific example +Before you run them, say clearly: -## 12. Skills / Special Instructions +- what space(s) will be created or changed +- what unit(s) will be created or changed +- whether any target or cluster will be touched +- what read-only or dry-run command could be used first -There are no repo-local “skills” files in this repo that you need to load first. +## 11. Walkthrough Mode -Your main tools here are: +Use the selected example's `AI_START_HERE.md` when the user wants a guided demo with pauses after each stage. -- the example READMEs -- the shell scripts in each example -- the `cub` CLI -- JSON output from `cub` +Use [incubator/AI_START_HERE.md](./incubator/AI_START_HERE.md) when the work is specifically in the incubator and you need incubator-specific walkthrough conventions. -## 13. Next Step +## 12. Next Step If you are starting fresh, the best single next step is: @@ -248,28 +166,4 @@ cd ./scripts/verify.sh ``` -Then choose one path: - -- human-friendly overview: [START_HERE.md](./START_HERE.md) -- incubator AI path: [incubator/AI_START_HERE.md](./incubator/AI_START_HERE.md) -- layered recipes and AICR mapping: [incubator/global-app-layer/README.md](./incubator/global-app-layer/README.md) - -If you land in a layered example and want to know what it does, use: - -```bash -cd incubator/global-app-layer/realistic-app -./setup.sh --explain -./setup.sh --explain-json | jq -``` - -Those commands do not mutate ConfigHub. - -If you need to discover live runs for the layered examples, use: - -```bash -cd incubator/global-app-layer -./find-runs.sh -./find-runs.sh realistic-app --json | jq -``` - -This is also read-only. +Then choose the example family that matches the user's goal. 
diff --git a/AI_START_HERE.md b/AI_START_HERE.md deleted file mode 100644 index 9e822d1..0000000 --- a/AI_START_HERE.md +++ /dev/null @@ -1,106 +0,0 @@ -# AI Start Here - -Use the stronger repo-level AI guide first: - -- [AI-README-FIRST.md](./AI-README-FIRST.md) - -That file explains: - -- how to access live ConfigHub through `cub` -- which commands are read-only -- which commands have stable JSON output -- common CLI gotchas -- where the important docs and examples live - -Then use this file for the incubator-specific path: - -- [`incubator/AI_START_HERE.md`](./incubator/AI_START_HERE.md) - -## CRITICAL: Demo Pacing - -When walking a human through any example in this repo, you MUST pause after every stage. - -After each stage: - -1. run only the command or commands for that stage -2. print the full output on screen; do not abbreviate or summarize it away -3. explain what the output means in plain English -4. if there is a GUI URL or GUI checkpoint, print it explicitly -5. say what the GUI shows today -6. say what the GUI does not show yet -7. name the GUI feature ask and cite the issue number if one exists; if no issue exists yet, say that explicitly -8. tell the human to open the GUI and give them time to click through it -9. stop and ask `Ready to continue?` or `Want to inspect this more?` -10. do not proceed until the human tells you to continue - -If there is no GUI checkpoint for a stage, say that explicitly instead of skipping the note. 
- -When a stage has GUI relevance, prefer this shape: - -- `GUI now:` exact URL or click path and what is visible today -- `GUI gap:` what the GUI cannot show yet -- `GUI feature ask:` what the GUI should show next, with issue number if known -- `PAUSE:` tell the human to open the GUI and inspect it before continuing - -## Suggested Prompt For Humans - -Use this prompt when you want the AI to behave like a demo guide rather than a fast script runner: - -```text -Read the relevant AI_START_HERE.md file and walk me through the demo. -Pause after every stage. Show full output. -For each stage, tell me what the GUI shows today, what it does not show yet, and the feature ask. -Give me time to click through the GUI before continuing. -Do not continue until I say continue. -``` - -## Where To Start - -If the user is asking for GitHub import with Argo or Flux, start with the published docs and the runnable incubator examples in this repo: - -- [Official GitOps Import docs](https://docs.confighub.com/get-started/examples/gitops-import/) -- [`connect-and-compare`](./incubator/connect-and-compare/README.md) -- [`import-from-live`](./incubator/import-from-live/README.md) -- [`import-from-bundle`](./incubator/import-from-bundle/README.md) -- [`connected-summary-storage`](./incubator/connected-summary-storage/README.md) -- [`artifact-workflow`](./incubator/artifact-workflow/README.md) -- [`incubator/fleet-import`](./incubator/fleet-import/README.md) -- [`incubator/demo-data-adt`](./incubator/demo-data-adt/README.md) -- [`graph-export`](./incubator/graph-export/README.md) -- [`incubator/gitops-import-argo`](./incubator/gitops-import-argo/README.md) -- [`incubator/gitops-import-flux`](./incubator/gitops-import-flux/README.md) - -Use the first six when the user wants no-cluster evidence, reporting, or offline import paths before moving into the live Argo and Flux examples. 
- -If the user wants an app-style GitOps layout rather than an import flow, start here: - -- [`apptique-flux-monorepo`](./incubator/apptique-flux-monorepo/README.md) -- [`apptique-argo-applicationset`](./incubator/apptique-argo-applicationset/README.md) - -Use it when the user wants: - -- one app -- one shared base -- dev and prod overlays -- Flux-managed rollout without the extra complexity of the Argo siblings - -Use `apptique-argo-applicationset` when the user wants: - -- directory-driven environment discovery -- generated Argo Applications per environment -- the clearest incubator Argo app-style layout in the repo - -If the user is asking about worker extensibility, validation, policy checks, or custom execution paths, use the official worker examples in this repo: - -- [`custom-workers/hello-world-bridge`](./custom-workers/hello-world-bridge/README.md) -- [`custom-workers/hello-world-function`](./custom-workers/hello-world-function/README.md) -- [`custom-workers/kube-score`](./custom-workers/kube-score/README.md) -- [`custom-workers/kyverno`](./custom-workers/kyverno/README.md) -- [`custom-workers/kyverno-server`](./custom-workers/kyverno-server/README.md) -- [`custom-workers/opa-gatekeeper`](./custom-workers/opa-gatekeeper/README.md) - -Use `cub-scout` as companion material when the local examples here are not enough, especially for Helm-first workflows or microservice app-style comparisons: - -- [cub-scout examples index](https://github.com/confighub/cub-scout/tree/main/examples) -- [Helm quickstart](https://github.com/confighub/cub-scout/blob/main/docs/reference/cub-track-quickstart-helm.md) -- [Apptique microservice examples](https://github.com/confighub/cub-scout/tree/main/examples/apptique-examples) diff --git a/EXAMPLE_CONTRACT_STANDARD.md b/EXAMPLE_CONTRACT_STANDARD.md index 69a59d9..807c9a7 100644 --- a/EXAMPLE_CONTRACT_STANDARD.md +++ b/EXAMPLE_CONTRACT_STANDARD.md @@ -11,7 +11,9 @@ Runnable examples need two entry points: | Human-readable | Humans, 
AI assistants | Understand the example before running | | Machine-readable | Scripts, CI, AI assistants | Inspect the plan programmatically | -The human-readable entry point is `README.md` + `AI_START_HERE.md`. +The human-readable entry point is `README.md`. + +The paced AI walkthrough entry point is `AI_START_HERE.md`. The machine-readable entry point is `setup.sh --explain-json` + `contracts.md`. diff --git a/README.md b/README.md index e1016bb..3ddffe7 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,14 @@ # ConfigHub Examples -This repo contains runnable examples for ConfigHub. +Runnable examples for ConfigHub. -## Start Here +## Where To Start -- AI assistants: [`AGENTS.md`](./AGENTS.md) -- AI assistants with more context: [`AI-README-FIRST.md`](./AI-README-FIRST.md) -- Example contract standard: [`EXAMPLE_CONTRACT_STANDARD.md`](./EXAMPLE_CONTRACT_STANDARD.md) +- Quickest no-cluster intro: [promotion-demo-data](./promotion-demo-data/README.md) +- GitOps import, compare, and experimental flows: [incubator/README.md](./incubator/README.md) +- App mutation and platform flow: [spring-platform/springboot-platform-app-centric](./spring-platform/springboot-platform-app-centric/README.md) +- Worker extensibility: [custom-workers](./custom-workers/) +- Example authoring contract: [EXAMPLE_CONTRACT_STANDARD.md](./EXAMPLE_CONTRACT_STANDARD.md) ## Safe First Checks @@ -21,24 +23,22 @@ cub target list --space "*" --json If you are not logged in yet, run `cub auth login` before the `cub` commands. -## Stable Paths +## Stable Examples -- [`campaigns-demo`](./campaigns-demo/README.md): 10 compliance campaigns backed by Kyverno CEL policies, with sample Kubernetes units to evaluate. -- [`promotion-demo-data`](./promotion-demo-data/README.md): no-cluster demo data for learning ConfigHub's App-Deployment-Target model and promotion flow. -- [`gitops-import`](./gitops-import/README.md): canonical Argo CD GitOps import example and docs companion. 
-- [`custom-workers`](./custom-workers/): worker extension examples, including bridge, function, and policy workers. -- [`global-app`](./global-app/README.md): classic multi-service app example. -- [`helm-platform-components`](./helm-platform-components/README.md): platform component example for Helm-managed infrastructure. -- [`vm-fleet`](./vm-fleet/README.md): VM fleet operations example. +- [campaigns-demo](./campaigns-demo/README.md): compliance campaigns backed by Kyverno CEL policies, with sample Kubernetes units to evaluate +- [promotion-demo-data](./promotion-demo-data/README.md): no-cluster demo data for learning ConfigHub's App-Deployment-Target model and promotion flow +- [gitops-import](./gitops-import/README.md): canonical Argo CD GitOps import example and docs companion +- [custom-workers](./custom-workers/): worker extension examples, including bridge, function, and policy workers +- [global-app](./global-app/README.md): classic multi-service app example +- [helm-platform-components](./helm-platform-components/README.md): platform component example for Helm-managed infrastructure +- [vm-fleet](./vm-fleet/README.md): VM fleet operations example -## Recommended Starting Points +## Good First Choices -- If you want to explore Campaigns and compliance workflows, start with [`campaigns-demo`](./campaigns-demo/README.md). ([AI guide](./campaigns-demo/AI_START_HERE.md)) -- If you want the quickest no-cluster path, start with [`promotion-demo-data`](./promotion-demo-data/README.md). ([AI guide](./promotion-demo-data/AI_START_HERE.md)) -- If you want the platform/generator model, start with [`spring-platform`](./spring-platform/). ([AI guide](./spring-platform/springboot-platform-app-centric/AI_START_HERE.md)) -- If you want GitOps import, start with [`gitops-import`](./gitops-import/README.md) and the [Official GitOps Import docs](https://docs.confighub.com/get-started/examples/gitops-import/). 
-- If you want worker extensibility, start with [`custom-workers`](./custom-workers/). -- If you want a classic multi-service example, use [`global-app`](./global-app/README.md). +- Want the simplest ConfigHub model first: [promotion-demo-data](./promotion-demo-data/README.md) +- Want compare, import, or app-layout work: [incubator/README.md](./incubator/README.md) +- Want a live app-centric mutation story: [springboot-platform-app-centric](./spring-platform/springboot-platform-app-centric/README.md) +- Want a classic multi-service example: [global-app](./global-app/README.md) ## Companion Material diff --git a/START_HERE.md b/START_HERE.md deleted file mode 100644 index c5741eb..0000000 --- a/START_HERE.md +++ /dev/null @@ -1,283 +0,0 @@ -# Start Here - -This is the human entry point for the `confighub/examples` repo. - -If you want the shortest path to understanding, use this order: - -1. look at one no-cluster evidence or offline import example -2. look at one live brownfield discovery example -3. look at one app-style GitOps layout example -4. look at one live GitOps import example with direct evidence -5. look at one worker extension example -6. look at one example that teaches the core ConfigHub object model -7. 
only then move to the bigger layered or fleet-style examples - -## First Path: No Cluster Required - -Start here if you want visible value quickly without depending on a live cluster: - -- [`connect-and-compare`](./incubator/connect-and-compare/README.md) -- [`import-from-bundle`](./incubator/import-from-bundle/README.md) -- [`connected-summary-storage`](./incubator/connected-summary-storage/README.md) -- [`artifact-workflow`](./incubator/artifact-workflow/README.md) -- [`graph-export`](./incubator/graph-export/README.md) -- [`incubator/fleet-import`](./incubator/fleet-import/README.md) -- [`incubator/demo-data-adt`](./incubator/demo-data-adt/README.md) - -Why: - -- they show evidence, compare, reporting, import proposal, offline bundle replay, aggregation, and scanning -- they also include one small shareable topology artifact path -- they are easy to review -- they make mutation boundaries obvious -- they answer “why does this matter?” quickly - -Typical flow: - -```bash -cd incubator/connect-and-compare -./setup.sh --explain -./setup.sh -./verify.sh -``` - -For the offline import sibling: - -```bash -cd incubator/import-from-bundle -./setup.sh --explain -./setup.sh -./verify.sh -``` - -For the reporting sibling: - -```bash -cd incubator/connected-summary-storage -./setup.sh --explain -./setup.sh -./verify.sh -``` - -For the offline bundle sibling: - -```bash -cd incubator/artifact-workflow -./setup.sh --explain -./setup.sh -./verify.sh -``` - -For the topology-artifact sibling: - -```bash -cd incubator/graph-export -./setup.sh --explain -./setup.sh -./verify.sh -``` - -## Second Path: Brownfield Discovery From A Live Cluster - -If you already have a cluster and want a dry-run ConfigHub proposal before any ConfigHub mutation, use: - -- [`import-from-live`](./incubator/import-from-live/README.md) - -Why: - -- it starts from live cluster reality -- it keeps the first ConfigHub mutation optional -- it is the cleanest single-player bridge from "what is running?" 
to "what would ConfigHub organize?"
-
-Typical flow:
-
-```bash
-cd incubator/import-from-live
-./setup.sh --explain
-./setup.sh
-./verify.sh
-```
-
-## App-Centric Mutation Story
-
-If you want to understand how ConfigHub routes operational changes through one app, use:
-
-- [`spring-platform/springboot-platform-app-centric`](./spring-platform/springboot-platform-app-centric/README.md)
-
-Why:
-
-- it shows one app (`inventory-api`) with three deployments (dev, stage, prod)
-- it shows three target modes: unbound, noop, real
-- it shows the three mutation outcomes: apply here, lift upstream, block/escalate
-- it works out of the box with noop targets (no cluster required)
-
-Typical flow:
-
-```bash
-cd spring-platform/springboot-platform-app-centric
-./setup.sh --explain
-./setup.sh
-./verify.sh
-```
-
-## Third Path: Standard GitOps Import Stories
-
-Start with the published GitOps docs and then use the runnable examples in this repo:
-
-- [Official GitOps Import docs](https://docs.confighub.com/get-started/examples/gitops-import/)
-- [`incubator/gitops-import-argo`](./incubator/gitops-import-argo/README.md)
-- [`incubator/gitops-import-flux`](./incubator/gitops-import-flux/README.md)
-
-Why:
-
-- they show the current GitHub + Argo/Flux + AI/CLI + ConfigHub wedge
-- they focus on import, rendered manifests, and evidence
-- they do not depend on ConfigHub being the workload applier
-- they use direct cluster inspection and `cub-scout` as verification layers
-- they are now the standard GitOps stories to optimize for first
-
-Use them like this:
-
-- Argo standard: start with the healthy guestbook path in [`incubator/gitops-import-argo`](./incubator/gitops-import-argo/README.md). Add the brownfield contrast fixtures only after the guestbook path has created value.
-- Flux standard: start with the healthy `podinfo` path in [`incubator/gitops-import-flux`](./incubator/gitops-import-flux/README.md). Add the D2 contrast fixtures only after `podinfo` has created value.
-- Readiness bar: if either story cannot create value in 5-10 minutes, it is not ready as the standard front door.
-
-Typical flow:
-
-```bash
-cd incubator/gitops-import-argo
-./setup.sh --explain
-./setup.sh
-./verify.sh
-```
-
-For the Flux sibling:
-
-```bash
-cd incubator/gitops-import-flux
-./setup.sh --explain
-./setup.sh
-./verify.sh
-```
-
-## Fourth Path: App-Style GitOps Layout
-
-If you want an app-style GitOps layout rather than an import flow, use:
-
-- [`apptique-flux-monorepo`](./incubator/apptique-flux-monorepo/README.md)
-- [`apptique-argo-applicationset`](./incubator/apptique-argo-applicationset/README.md)
-
-Why:
-
-- it is the cleanest incubator "one app, multiple environments" GitOps example in the repo
-- it shows one base plus two environment overlays
-- it is self-contained and live-validated
-- it now has both a Flux and Argo incubator path
-
-Typical flow:
-
-```bash
-cd incubator/apptique-flux-monorepo
-./setup.sh --explain
-./setup.sh --with-prod
-./verify.sh --with-prod
-```
-
-For the Argo sibling:
-
-```bash
-cd incubator/apptique-argo-applicationset
-./setup.sh --explain
-./setup.sh
-./verify.sh
-```
-
-## Fifth Path: Worker Extensibility
-
-If you want to understand how ConfigHub workers are built and extended, go to:
-
-- [`custom-workers/hello-world-bridge`](./custom-workers/hello-world-bridge/README.md)
-- [`custom-workers/hello-world-function`](./custom-workers/hello-world-function/README.md)
-- [`custom-workers/kube-score`](./custom-workers/kube-score/README.md)
-- [`custom-workers/kyverno`](./custom-workers/kyverno/README.md)
-- [`custom-workers/kyverno-server`](./custom-workers/kyverno-server/README.md)
-- [`custom-workers/opa-gatekeeper`](./custom-workers/opa-gatekeeper/README.md)
-
-These show simple bridge and function workers, plus policy and validation examples using the SDK as normal Go modules.
-
-## Sixth Path: Stable ConfigHub Model
-
-Then look at [`promotion-demo-data`](./promotion-demo-data/README.md).
-
-Why:
-
-- it is stable
-- it is easy to review
-- it shows ConfigHub’s multi-environment promotion model
-- it does not need a live Kubernetes cluster
-
-Typical flow:
-
-```bash
-cd promotion-demo-data
-./setup.sh
-./cleanup.sh
-```
-
-## Seventh Path: Learn The Core Object Model
-
-Then go to the layered examples package:
-
-- [`incubator/global-app-layer`](./incubator/global-app-layer/README.md)
-
-If you are brand new to ConfigHub, start with:
-
-- [`incubator/global-app-layer/00-config-hub-hello-world.md`](./incubator/global-app-layer/00-config-hub-hello-world.md)
-
-Then continue to:
-
-- [`incubator/global-app-layer/confighub-aicr-value-add.md`](./incubator/global-app-layer/confighub-aicr-value-add.md)
-
-That gives you:
-
-- one simple unit in one space
-- then one layered recipe model
-- then the value-add on top of NVIDIA AICR
-
-When you enter one of the worked examples, use the non-mutating plan first:
-
-```bash
-cd incubator/global-app-layer/realistic-app
-./setup.sh --explain
-```
-
-## Eighth Path: Pick The Right Layered Worked Example
-
-Inside [`incubator/global-app-layer`](./incubator/global-app-layer/README.md):
-
-- [`single-component`](./incubator/global-app-layer/single-component/README.md): smallest layered example
-- [`frontend-postgres`](./incubator/global-app-layer/frontend-postgres/README.md): small multi-component app
-- [`realistic-app`](./incubator/global-app-layer/realistic-app/README.md): fuller app example
-- [`gpu-eks-h100-training`](./incubator/global-app-layer/gpu-eks-h100-training/README.md): NVIDIA-style layered recipe
-
-## Prerequisites
-
-At minimum:
-
-```bash
-cub auth login
-```
-
-For real deployments, you also need:
-
-- a Kubernetes cluster
-- a ConfigHub worker
-- one or more targets
-
-## If You Prefer an AI-Guided Path
-
-Use:
-
-- [AI_START_HERE.md](./AI_START_HERE.md)
-
-That page starts in read-only mode and gives exact commands plus machine-readable JSON paths before suggesting mutating flows.
diff --git a/incubator/README.md b/incubator/README.md
index 1accf20..c9e3ddf 100644
--- a/incubator/README.md
+++ b/incubator/README.md
@@ -14,7 +14,7 @@ Library of experimental ConfigHub examples and sample apps. No promises. Shoul
 
 ## Entry Paths
 
-- For humans: [`../START_HERE.md`](../START_HERE.md)
+- For humans: [`../README.md`](../README.md)
 - For AI assistants: [`AI_START_HERE.md`](./AI_START_HERE.md)
 - For shared AI-safe evaluation flow: [`ai-machine-seams-first.md`](./ai-machine-seams-first.md)
 - Why ConfigHub: [`WHY_CONFIGHUB.md`](./WHY_CONFIGHUB.md)