Add a new playground demonstrating DocumentDB on k3s clusters running on Azure VMs, integrated with KubeFleet for cluster membership and Istio for cross-cluster networking.

Key features:
- k3s on Azure VMs (lightweight Kubernetes for edge scenarios)
- AKS hub cluster with KubeFleet for fleet management
- Istio service mesh for cross-cluster replication
- Azure VM Run Command for all VM operations (no SSH required)
- Multi-region deployment across 3 Azure regions
- Comprehensive troubleshooting and lessons learned docs

Files: Bicep infrastructure, 8 deployment scripts, CRP manifests, README
This deploys k3s in Azure and adds scripts to install documentdb-operator, etc.
Pull request overview
Adds a new “k3s on Azure VMs + AKS hub” playground under documentdb-playground/k3s-azure-fleet, including IaC, multi-cluster (Fleet) setup, Istio multi-primary networking, and scripts to install/deploy DocumentDB across clusters.
Changes:
- Introduces Azure Bicep/ARM templates and parameterization for AKS hub + per-region k3s VMs.
- Adds end-to-end automation scripts (deploy infra, install Istio, setup Fleet, install cert-manager/operator, deploy DocumentDB, test, cleanup).
- Adds a public doc on reserving nodes for DocumentDB workloads.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| documentdb-playground/k3s-azure-fleet/test-connection.sh | Cluster-by-cluster health/ready checks for namespaces, DocumentDB, services, secrets, operator |
| documentdb-playground/k3s-azure-fleet/setup-fleet.sh | Installs KubeFleet hub-agent and joins member clusters; installs fleet-networking |
| documentdb-playground/k3s-azure-fleet/parameters.bicepparam | Bicep parameter file intended to drive infra deployment |
| documentdb-playground/k3s-azure-fleet/main.json | Generated ARM template for the infra (from Bicep) |
| documentdb-playground/k3s-azure-fleet/main.bicep | Infra definition: AKS hub + per-region k3s VM, VNets, NSGs, public IPs |
| documentdb-playground/k3s-azure-fleet/install-istio.sh | Installs Istio multi-cluster (shared CA, east-west gateway, remote secrets) |
| documentdb-playground/k3s-azure-fleet/install-documentdb-operator.sh | Installs DocumentDB operator on hub via Helm and on k3s via Azure Run Command |
| documentdb-playground/k3s-azure-fleet/install-cert-manager.sh | Installs cert-manager via Helm across clusters; applies CRP |
| documentdb-playground/k3s-azure-fleet/documentdb-resource-crp.yaml | Namespace/Secret/DocumentDB plus Fleet placement resources for propagation |
| documentdb-playground/k3s-azure-fleet/documentdb-operator-crp.yaml | Reference CRP for operator propagation (documented as not applied) |
| documentdb-playground/k3s-azure-fleet/deploy-infrastructure.sh | Provisions infra + fetches kubeconfigs/contexts (AKS + k3s) |
| documentdb-playground/k3s-azure-fleet/deploy-documentdb.sh | Generates and applies DocumentDB + Fleet placement config and verifies rollout |
| documentdb-playground/k3s-azure-fleet/delete-resources.sh | Cleanup for Kubernetes resources and Azure resource group |
| documentdb-playground/k3s-azure-fleet/cert-manager-crp.yaml | Fleet ClusterResourcePlacement for cert-manager resources |
| documentdb-playground/k3s-azure-fleet/README.md | Full walkthrough, architecture, troubleshooting, and operational notes |
| documentdb-playground/k3s-azure-fleet/.gitignore | Ignores generated deployment info, certs, chart packages, SSH key |
| docs/operator-public-documentation/reserving-nodes-for-documentdb.md | Guidance on labeling/tainting nodes for DocumentDB workloads |

    for cmd in kubectl helm git jq; do
      if ! command -v "$cmd" &>/dev/null; then
        echo "Error: Required command '$cmd' not found."
        exit 1
      fi

This script uses curl later (for GitHub tag discovery) but curl isn’t included in the prerequisites check, so it can fail with a confusing error. Add curl (and any other required tools used below) to the prerequisites list.
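One way to address this is to collect every missing tool in a single pass, so the user sees the full list at once. The sketch below is only an illustration: `missing_tools` and `require_tools` are hypothetical helper names, and the tool list in the usage comment is assumed from this comment (the existing list plus `curl`).

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the names of the commands that are not on PATH,
# separated by spaces. Prints nothing when every command is available.
missing_tools() {
  local cmd
  local missing=""
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      missing="${missing:+$missing }$cmd"
    fi
  done
  printf '%s' "$missing"
}

# Hypothetical wrapper: report all missing commands and return non-zero.
require_tools() {
  local missing
  missing=$(missing_tools "$@")
  if [ -n "$missing" ]; then
    echo "Error: required commands not found: $missing" >&2
    return 1
  fi
}

# Assumed usage near the top of a script:
#   require_tools kubectl helm git jq curl || exit 1
```

The same helper could cover the `make`/`openssl` and `az`/`base64`/`awk` checks in the other scripts by changing only the argument list.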

    @@ -0,0 +1,28 @@
    using './main.bicep'

    param aksRegions = [

parameters.bicepparam sets aksRegions, but main.bicep does not declare an aksRegions parameter (it only has hubLocation). Deployments using this .bicepparam file will fail until the parameter names match.

    - param aksRegions = [
    + param hubLocation = [


    pushd "$CERT_DIR" > /dev/null
    if [ ! -d "istio-${ISTIO_VERSION}" ]; then
      curl -sL "https://github.com/istio/istio/archive/refs/tags/${ISTIO_VERSION}.tar.gz" | tar xz
    fi
    make -f "istio-${ISTIO_VERSION}/tools/certs/Makefile.selfsigned.mk" root-ca

Root CA generation relies on make (and typically openssl) being present locally, but the script doesn’t check for them. Add a prerequisites check near the top so the failure mode is clearer.

    param aksNodeCount = 2

    param k3sVmSize = 'Standard_D2s_v3'

aksNodeCount and k3sVmSize are set here but main.bicep hardcodes the AKS node count and uses vmSize (not k3sVmSize) for k3s VMs. Either wire these parameters into the template or remove them from the param file to avoid failed/ineffective configuration.

    - param aksNodeCount = 2
    - param k3sVmSize = 'Standard_D2s_v3'


      STATUS=$(kubectl --context "$cluster" get documentdb documentdb-preview -n documentdb-preview-ns -o jsonpath='{.status.phase}' 2>/dev/null || echo "Unknown")
      echo "✓ (Status: $STATUS)"
    else
      echo "✗ Not found"

The DocumentDB existence check prints "✗ Not found" but does not increment FAILED, so the script can report success even when the DocumentDB resource is missing on a cluster. Treat this as a failed cluster check (and consider skipping subsequent checks that depend on it).

    - echo "✗ Not found"
    + echo "✗ Not found"
    + ((FAILED++))
    + continue

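One caveat with the suggested `((FAILED++))`: in bash, `(( ))` returns a non-zero status when the expression evaluates to 0, which is the case for a post-increment from 0. If the script runs under `set -e` (not visible in this excerpt, so treat that as an assumption), the first failure would abort the script instead of being counted. A minimal sketch of a counter that avoids this, using a hypothetical `documentdb_exists` stub and placeholder cluster names in place of the real `kubectl` lookup:

```shell
#!/usr/bin/env bash
set -e

FAILED=0

# Hypothetical stand-in for the kubectl lookup in test-connection.sh:
# succeeds only for the cluster names listed in PRESENT.
PRESENT="hub member-1"
documentdb_exists() {
  case " $PRESENT " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

for cluster in hub member-1 member-2 member-3; do
  if ! documentdb_exists "$cluster"; then
    echo "$cluster: DocumentDB not found"
    # An assignment with arithmetic expansion exits 0, so it is safe under set -e.
    FAILED=$((FAILED + 1))
    continue  # skip the checks that depend on the resource
  fi
  echo "$cluster: DocumentDB found"
done

echo "Failed clusters: $FAILED"
```

With the stub above, two of the four placeholder clusters are counted as failed and the loop still runs to completion.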

    echo "Joining member clusters to fleet..."
    chmod +x ./hack/membership/joinMC.sh
    ./hack/membership/joinMC.sh "$TAG" "$HUB_CLUSTER_NAME" $ALL_MEMBERS

./hack/membership/joinMC.sh ... $ALL_MEMBERS relies on word-splitting and can be affected by glob expansion; it also makes it easy to accidentally pass a single concatenated string if ALL_MEMBERS formatting changes. Prefer building an array of member contexts and passing it as "${members[@]}".
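The array approach can be sketched as follows. This is an illustration under assumptions: `count_args` is a hypothetical stand-in for `joinMC.sh`, and the tag, hub name, and member context names are placeholders rather than the playground's real values.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for ./hack/membership/joinMC.sh:
# reports how many arguments it received.
count_args() {
  echo "$#"
}

# Placeholder member context names.
ALL_MEMBERS="k3s-member-1 k3s-member-2 k3s-member-3"

# Split the string once, explicitly. read -a splits on IFS but does not
# perform glob expansion, unlike an unquoted $ALL_MEMBERS.
read -r -a members <<< "$ALL_MEMBERS"

# Each array element is passed as exactly one argument.
count_args "placeholder-tag" "placeholder-hub" "${members[@]}"
```

If the member list is built in a loop, appending with `members+=("$name")` avoids the intermediate string entirely.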

    param hubRegion = 'westus3'

    param kubernetesVersion = ''

hubRegion/kubernetesVersion are defined here, but main.bicep expects hubLocation and does not currently expose a kubernetesVersion parameter. This .bicepparam file won’t work as-is unless the parameter names match the template.

    - param hubRegion = 'westus3'
    - param kubernetesVersion = ''
    + param hubLocation = 'westus3'


    protocol: 'Tcp'
    sourceAddressPrefix: '*'
    sourcePortRange: '*'
    destinationAddressPrefix: '*'
    destinationPortRange: '6443'

The AllowKubeAPI NSG rule allows inbound 6443/TCP from * (public Internet) to the VM. This is a high-risk default for a Kubernetes API server; restrict sourceAddressPrefix to a known CIDR/IP, or make the API private and require VPN/Bastion/Jumpbox access.

    # Generate manifest with substitutions
    TEMP_YAML=$(mktemp)

    sed -e "s/{{DOCUMENTDB_PASSWORD}}/$DOCUMENTDB_PASSWORD/g" \

The sed substitution injects the password directly into YAML without escaping. If the user supplies a password containing /, &, or newlines, this will produce an invalid manifest or the wrong secret value. Escape replacement strings (or use a safer templating approach like envsubst with properly quoted YAML) before generating the final YAML.

    - sed -e "s/{{DOCUMENTDB_PASSWORD}}/$DOCUMENTDB_PASSWORD/g" \
    + # Escape password for safe use in sed replacement
    + ESCAPED_DOCUMENTDB_PASSWORD=${DOCUMENTDB_PASSWORD//\\/\\\\}
    + ESCAPED_DOCUMENTDB_PASSWORD=${ESCAPED_DOCUMENTDB_PASSWORD//&/\\&}
    + ESCAPED_DOCUMENTDB_PASSWORD=${ESCAPED_DOCUMENTDB_PASSWORD//\//\\/}
    + sed -e "s/{{DOCUMENTDB_PASSWORD}}/$ESCAPED_DOCUMENTDB_PASSWORD/g" \

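To see the escaping in isolation, the three substitutions can be wrapped in a function and run against a value that contains all three special characters. This is a sketch, not the script's actual code: `escape_sed_replacement` and `render_password` are hypothetical names, and the one-line template stands in for the real manifest. It does not handle newlines or YAML quoting, which the comment above also raises.

```shell
#!/usr/bin/env bash

# Hypothetical helper: escape a value for use as the replacement part of a
# sed "s/pattern/replacement/" command that uses "/" as its delimiter.
escape_sed_replacement() {
  local value=$1
  value=${value//\\/\\\\}   # backslash first, so later escapes are not doubled
  value=${value//&/\\&}     # "&" would otherwise expand to the matched text
  value=${value//\//\\/}    # "/" would otherwise end the replacement early
  printf '%s' "$value"
}

# Hypothetical wrapper around the substitution; the one-line template is a
# stand-in for the real manifest.
render_password() {
  local escaped
  escaped=$(escape_sed_replacement "$1")
  printf '%s\n' 'value: {{DOCUMENTDB_PASSWORD}}' |
    sed -e "s/{{DOCUMENTDB_PASSWORD}}/$escaped/g"
}

render_password 'p/a&s\s'
```

The backslash must be escaped before `&` and `/`; otherwise the backslashes introduced by those two steps would themselves be doubled.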

    properties: {
      dnsPrefix: aksClusterName
      kubernetesVersion: '1.32'
      enableRBAC: true

AKS kubernetesVersion is hardcoded to 1.32, but this repo’s other Bicep templates accept an optional kubernetesVersion parameter and omit it when empty to use the region default GA version (see documentdb-playground/aks-fleet-deployment/main.bicep). Consider following the same pattern here (and/or wiring up the kubernetesVersion parameter in parameters.bicepparam) to avoid deployments failing in regions where that version isn’t available.
- Fix parameters.bicepparam: align param names with main.bicep, remove unused params
- Parameterize kubernetesVersion in main.bicep (was hardcoded to 1.32)
- Add allowedSourceIP param to NSG rule for Kube API (was open to *)
- Add missing prerequisite checks: curl in setup-fleet.sh, make/openssl in install-istio.sh, az/base64/awk/curl in install-documentdb-operator.sh
- Fix test-connection.sh: increment FAILED counter for missing DocumentDB resource, service, and credentials secret
- Escape password for sed substitution in deploy-documentdb.sh
- Document intentional word-splitting in setup-fleet.sh joinMC.sh call

    # - AKS hub: installed via Helm from local chart package
    # - k3s VMs: installed via Azure VM Run Command (CNPG from upstream, operator manifests via base64)

Can't we install from our official published helm chart instead of building from local?
…EADME
- Pre-generate Istio CA certificates locally (openssl) and inject via cloud-init
- Auto-generate Istio remote secrets on k3s VMs via cloud-init runcmd
- Add NSGs to Bicep for AKS and k3s subnets (prevents NRMS auto-creation)
- Open all required Istio ports (15010/15012/15017/15021/15443)
- Use all-Helm approach for k3s Istio install with --skip-schema-validation
- Use istio-remote-reader SA (avoids conflict with Helm istio-base chart)
- Remove main.json (Bicep is the source of truth)
- Update README with deployment architecture details and lessons learned
- Make kubernetesVersion optional in main.bicep (empty = region default)
- Add security warning for allowedSourceIP NSG default
- Support official OCI Helm chart install via BUILD_CHART=false