2 changes: 1 addition & 1 deletion README.md
@@ -129,7 +129,7 @@ The autopilot uses a **"Patched Baseline"** approach:
- **Convention over Configuration**: Opinionated defaults, customizable when needed

**Three-Tier Management:**
1. **Always-On**: Critical baseline configurations (MachineConfig, NodeHealthCheck, Kubelet settings)
1. **Always-On**: Critical baseline configurations (MachineConfig, Kubelet settings)
2. **Context-Aware**: Activated based on conditions (KubeDescheduler, CPU Manager)
3. **Advanced**: Specialized features (VFIO, USB passthrough, AAQ operator)

9 changes: 0 additions & 9 deletions assets/active/metadata.yaml
@@ -76,15 +76,6 @@ assets:
component: KubeletConfig
reconcile_order: 1

# Phase 1: Always installed - NodeHealthCheck
- name: node-health-check
path: active/node-health/standard-remediation.yaml
phase: 1
install: always
component: NodeHealthCheck
reconcile_order: 1
conditions: []

# Phase 1: Optional Operators (opt-in for clusters with CRDs)
- name: mtv-operator
path: active/operators/mtv.yaml.tpl
4 changes: 0 additions & 4 deletions assets/active/node-health/OWNERS

This file was deleted.

18 changes: 0 additions & 18 deletions assets/active/node-health/standard-remediation.yaml

This file was deleted.

7 changes: 7 additions & 0 deletions assets/tombstones/v0-cleanup/node-health-check.yaml
@@ -0,0 +1,7 @@
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
name: virt-node-health-check
namespace: openshift-operators
labels:
platform.kubevirt.io/managed-by: virt-platform-autopilot
2 changes: 1 addition & 1 deletion cmd/rbac-gen/README.md
@@ -127,7 +127,7 @@ Templates use Go's `text/template` syntax. The generator replaces all `{{ ... }}
### Resource Pluralization
The generator pluralizes resource kinds using simple heuristics:
- Standard: `Example` → `examples`
- Special cases: `MachineConfig` → `machineconfigs`, `NodeHealthCheck` → `nodehealthchecks`
- Special cases: `MachineConfig` → `machineconfigs`
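
As an illustration of the kind of heuristic described here, the following Go sketch lowercases the kind and appends `s`, with an override table for kinds that need special handling. The names `pluralize` and `overrides` are hypothetical and are not taken from the generator's source; the actual rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// overrides is a hypothetical table for kinds whose plural should not be
// derived by the default rule. The single entry mirrors the special case
// listed above.
var overrides = map[string]string{
	"MachineConfig": "machineconfigs",
}

// pluralize returns a lowercase plural resource name for a kind.
// Assumption: the default rule is "lowercase the kind and append s".
func pluralize(kind string) string {
	if plural, ok := overrides[kind]; ok {
		return plural
	}
	return strings.ToLower(kind) + "s"
}

func main() {
	for _, kind := range []string{"Example", "MachineConfig"} {
		fmt.Printf("%s -> %s\n", kind, pluralize(kind))
	}
}
```

An explicit override table keeps irregular plurals visible in one place instead of hiding them in suffix rules.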

### Deduplication
Resources are deduplicated by `apiVersion/kind` to avoid duplicate RBAC rules even if the same resource type appears in multiple assets.
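
A minimal sketch of deduplication keyed on `apiVersion/kind`, assuming each scanned asset yields one entry. The `resource` type and `dedupe` function are hypothetical names used only for illustration, not the generator's actual code.

```go
package main

import "fmt"

// resource is a hypothetical record for one resource type found in an asset.
type resource struct {
	APIVersion string
	Kind       string
}

// dedupe keeps the first occurrence of each apiVersion/kind pair and
// preserves the input order of the survivors.
func dedupe(in []resource) []resource {
	seen := make(map[string]struct{}, len(in))
	out := make([]resource, 0, len(in))
	for _, r := range in {
		key := r.APIVersion + "/" + r.Kind
		if _, ok := seen[key]; ok {
			continue // already emitted a rule for this type
		}
		seen[key] = struct{}{}
		out = append(out, r)
	}
	return out
}

func main() {
	in := []resource{
		{"machineconfiguration.openshift.io/v1", "MachineConfig"},
		{"machineconfiguration.openshift.io/v1", "KubeletConfig"},
		{"machineconfiguration.openshift.io/v1", "MachineConfig"},
	}
	for _, r := range dedupe(in) {
		fmt.Println(r.APIVersion, r.Kind)
	}
}
```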
3 changes: 2 additions & 1 deletion config/rbac/role.yaml
@@ -187,13 +187,14 @@ rules:
- patch
- update
- watch
# NodeHealthCheck
# NodeHealthCheck (includes tombstone cleanup)
- apiGroups:
- remediation.medik8s.io
resources:
- nodehealthchecks
verbs:
- create
- delete
- get
- list
- patch
7 changes: 2 additions & 5 deletions docs/ARCHITECTURE.md
Expand Up @@ -66,12 +66,12 @@ Only the named assets are considered for reconciliation. All other assets — in

```yaml
annotations:
platform.kubevirt.io/autopilot: "swap-enable,descheduler-loadaware,node-health-check"
platform.kubevirt.io/autopilot: "swap-enable,descheduler-loadaware"
```

```bash
kubectl annotate hyperconverged kubevirt-hyperconverged -n openshift-cnv \
"platform.kubevirt.io/autopilot=swap-enable,descheduler-loadaware,node-health-check"
"platform.kubevirt.io/autopilot=swap-enable,descheduler-loadaware"
```
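
One way to turn such an annotation value into an allowlist is to split it on commas. The sketch below is an assumption about how this could be done, not the controller's actual implementation; `parseAllowlist` is a hypothetical name, and trimming whitespace and skipping empty entries are illustrative choices.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowlist splits a comma-separated annotation value into a set of
// asset names. Surrounding whitespace is trimmed and empty entries are
// ignored (both are assumptions made for this sketch).
func parseAllowlist(value string) map[string]bool {
	allow := make(map[string]bool)
	for _, name := range strings.Split(value, ",") {
		name = strings.TrimSpace(name)
		if name != "" {
			allow[name] = true
		}
	}
	return allow
}

func main() {
	allow := parseAllowlist("swap-enable,descheduler-loadaware")
	for _, name := range []string{"swap-enable", "kubelet-perf-settings"} {
		fmt.Printf("%s selected: %t\n", name, allow[name])
	}
}
```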

Asset names correspond to the `name` field in `assets/active/metadata.yaml`. The current set includes:
@@ -84,7 +84,6 @@ Asset names correspond to the `name` field in `assets/active/metadata.yaml`. The
| `pci-passthrough` | | MachineConfig | Opt-in: hardware + annotation condition |
| `kubelet-perf-settings` | | KubeletConfig | Always-on baseline |
| `kubelet-cpu-manager` | | KubeletConfig | Opt-in: CPUManager feature gate |
| `node-health-check` | | NodeHealthCheck | Always-on baseline |
| `descheduler-loadaware` | | KubeDescheduler | Soft dependency on KubeDescheduler CRD |
| `monitoring-ui-plugin` | | UIPlugin | Soft dependency on COO CRD; enables Perses dashboards in the OpenShift console |
| `mtv-operator` | | ForkliftController | Opt-in: annotation condition |
@@ -120,7 +119,6 @@ The autopilot manages resources across three tiers based on criticality and acti

Critical baseline configurations applied to all clusters:

- **NodeHealthCheck**: Automatic node remediation for failed hosts
- **MachineConfig**: OS-level optimizations
- Swap optimization for memory management
- NUMA topology awareness
@@ -479,7 +477,6 @@ virt-platform-autopilot/
│ │ ├── machine-config/ # OS-level configs
│ │ ├── kubelet/ # Kubelet settings
│ │ ├── descheduler/ # KubeDescheduler
│ │ ├── node-health/ # NodeHealthCheck
│ │ ├── observability/ # PrometheusRules
│ │ ├── operators/ # Third-party operator CRs (UIPlugin, MetalLB, MTV…)
│ │ └── metadata.yaml # Asset catalog
42 changes: 10 additions & 32 deletions docs/adding-assets.md
@@ -25,7 +25,6 @@ assets/active/
├── hco/ # HyperConverged resource (only one, order: 0)
├── machine-config/ # MachineConfig resources
├── kubelet/ # KubeletConfig resources
├── node-health/ # NodeHealthCheck resources
├── descheduler/ # Descheduler resources
└── operators/ # Third-party operator CRs

@@ -98,32 +97,18 @@ This scans all assets and generates the necessary ClusterRole permissions.

For resources that don't need dynamic values:

**File:** `assets/active/node-health/standard-remediation.yaml`
**File:** `assets/active/operators/monitoring-uiplugin.yaml`

```yaml
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
apiVersion: observability.openshift.io/v1alpha1
kind: UIPlugin
metadata:
name: virt-node-health-check
namespace: openshift-operators
name: monitoring
spec:
minHealthy: 51%
remediationTemplate:
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationTemplate
name: self-node-remediation-automatic-strategy-template
namespace: openshift-operators
selector:
matchExpressions:
- key: node-role.kubernetes.io/worker
operator: Exists
unhealthyConditions:
- duration: 5m
status: "False"
type: Ready
- duration: 5m
status: Unknown
type: Ready
type: Monitoring
monitoring:
perses:
enabled: true
```

No templating needed - this is applied as-is.
@@ -321,14 +306,13 @@ The `assets/active/metadata.yaml` catalog defines all managed assets.
- `HyperConverged`
- `MachineConfig`
- `KubeletConfig`
- `NodeHealthCheck`
- `KubeDescheduler`
- `ForkliftController`
- `MetalLB`

**reconcile_order**: Processing order (lower numbers first).
- `0`: HCO only (must be first - serves as RenderContext source)
- `1-9`: Critical baseline (MachineConfig, Kubelet, NodeHealthCheck)
- `1-9`: Critical baseline (MachineConfig, Kubelet)
- `10-19`: Scheduling and placement (Descheduler)
- `20+`: Optional operators and advanced features
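
The "lower numbers first" rule can be sketched as a stable sort on the order value. The `asset` struct and `sortByReconcileOrder` function are hypothetical, and the order values in `main` are illustrative picks from the ranges above rather than values read from the catalog.

```go
package main

import (
	"fmt"
	"sort"
)

// asset is a hypothetical, reduced view of one catalog entry.
type asset struct {
	Name           string
	ReconcileOrder int
}

// sortByReconcileOrder orders assets so that lower reconcile_order values
// come first. The sort is stable, so assets sharing a value keep their
// catalog order.
func sortByReconcileOrder(assets []asset) {
	sort.SliceStable(assets, func(i, j int) bool {
		return assets[i].ReconcileOrder < assets[j].ReconcileOrder
	})
}

func main() {
	assets := []asset{
		{Name: "descheduler-loadaware", ReconcileOrder: 10},
		{Name: "kubelet-perf-settings", ReconcileOrder: 1},
		{Name: "hco-golden-config", ReconcileOrder: 0},
	}
	sortByReconcileOrder(assets)
	for _, a := range assets {
		fmt.Printf("%d %s\n", a.ReconcileOrder, a.Name)
	}
}
```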

@@ -610,7 +594,7 @@ rules:
### 2. Set Appropriate Reconcile Order

- `0`: HCO only
- `1-9`: Infrastructure (MachineConfig, Kubelet, NodeHealthCheck)
- `1-9`: Infrastructure (MachineConfig, Kubelet)
- `10-19`: Scheduling and placement
- `20+`: Optional features

@@ -810,12 +794,6 @@ createdAt: {{ $timestamp }}

Production-ready HCO configuration with opinionated defaults. Must have `reconcile_order: 0`.

### NodeHealthCheck

**File:** `assets/active/node-health/standard-remediation.yaml`

Simple static YAML - no templating needed.

### Descheduler (Conditional)

**File:** `assets/active/descheduler/recommended.yaml.tpl`
3 changes: 1 addition & 2 deletions docs/debug-endpoints.md
@@ -327,14 +327,13 @@ swap-enable INCLUDED MachineConfig -
pci-passthrough EXCLUDED MachineConfig Conditions not met
numa-topology EXCLUDED MachineConfig Conditions not met
kubelet-perf-settings INCLUDED KubeletConfig -
node-health-check INCLUDED NodeHealthCheck -
mtv-operator EXCLUDED ForkliftController Conditions not met
metallb-operator EXCLUDED MetalLB Conditions not met
observability-operator EXCLUDED UIPlugin Conditions not met
descheduler-loadaware FILTERED KubeDescheduler Root exclusion
kubelet-cpu-manager EXCLUDED KubeletConfig Conditions not met
----------------------------------------------------------------------------------------------------
Summary: 3 included, 7 excluded, 1 filtered, 0 errors
Summary: 2 included, 7 excluded, 1 filtered, 0 errors
```

## Use Cases
55 changes: 6 additions & 49 deletions docs/runbooks/VirtPlatformDependencyMissing.md
@@ -74,49 +74,7 @@ spec:
EOF
```

### 2. NodeHealthCheck (Node Auto-Remediation)

**CRD:** `nodehealthchecks.remediation.medik8s.io`
**Provided by:** `node-healthcheck-operator` (via MediK8s)

**Features affected:**
- Automatic node health monitoring
- Self-node remediation integration
- Fence-agents remediation integration
- Unhealthy node detection

**Check if installed:**
```bash
# Check CRD exists
kubectl get crd nodehealthchecks.remediation.medik8s.io

# Check operator is installed
oc get csv -n openshift-operators | grep node-healthcheck

# Check NHC instance
kubectl get nodehealthcheck -A
```

**Install if missing:**
```bash
# Install via OperatorHub (OpenShift Console)
# Search for "Node Health Check Operator"
# Or via CLI:
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: node-healthcheck-operator
namespace: openshift-operators
spec:
channel: stable
name: node-healthcheck-operator
source: redhat-operators
sourceNamespace: openshift-marketplace
EOF
```

### 3. MetalLB (LoadBalancer Services)
### 2. MetalLB (LoadBalancer Services)

**CRD:** `metallbs.metallb.io`
**Provided by:** `metallb-operator`
@@ -156,7 +114,7 @@ spec:
EOF
```

### 5. Forklift (VM Migration)
### 3. Forklift (VM Migration)

**CRD:** `forkliftcontrollers.forklift.konveyor.io`
**Provided by:** `forklift-operator` (Migration Toolkit for Virtualization)
@@ -238,7 +196,7 @@ kubectl logs -n openshift-cnv -l app=virt-platform-autopilot | \
grep -i "crd.*missing\|dependency\|not found"

# Example log:
# "Skipping asset NodeHealthCheck/virt-workers: CRD nodehealthchecks.remediation.medik8s.io not found"
# "Skipping asset KubeDescheduler/cluster: CRD kubedeschedulers.operator.openshift.io not found"
```

## Resolution Procedures
@@ -308,8 +266,8 @@ EOF
If you want to manage the affected resources manually:

```bash
# Example: Unmanage NodeHealthCheck resources
kubectl annotate nodehealthcheck virt-workers -n openshift-operators \
# Example: Unmanage KubeDescheduler resources
kubectl annotate kubedescheduler cluster -n openshift-kube-descheduler-operator \
platform.kubevirt.io/mode=unmanaged \
--overwrite

@@ -354,7 +312,6 @@ For production clusters, install the full stack:
```bash
# Install all recommended operators during cluster setup
# - cluster-kube-descheduler-operator
# - node-healthcheck-operator
# - metallb-operator (if bare-metal)
# - forklift-operator (if migration needed)
```
@@ -363,7 +320,7 @@ For production clusters, install the full stack:

```bash
# List all CRDs required by virt-platform-autopilot
kubectl get crd | grep -E "kubedescheduler|nodehealthcheck|metallb|machineconfig|forklift"
kubectl get crd | grep -E "kubedescheduler|metallb|machineconfig|forklift"
```

## Related Alerts
8 changes: 4 additions & 4 deletions pkg/controller/platform_controller_test.go
@@ -398,9 +398,9 @@ func TestIsManagedCRD(t *testing.T) {
expected: true,
},
{
name: "NodeHealthCheck is managed",
name: "NodeHealthCheck is not managed (removed)",
crdName: "nodehealthchecks.remediation.medik8s.io",
expected: true,
expected: false,
},
{
name: "ForkliftController is managed",
@@ -612,7 +612,7 @@ func TestAssetSelectionWithAutopilotAnnotation(t *testing.T) {
wantInAllowlist: []string{"swap-enable"},
wantNotInAllowlist: []string{
"hco-golden-config", "prometheus-alerts", "psi-enable",
"kubelet-perf-settings", "node-health-check", "descheduler-loadaware",
"kubelet-perf-settings", "descheduler-loadaware",
"pci-passthrough", "kubelet-cpu-manager",
"mtv-operator", "metallb-operator", "observability-operator",
},
@@ -623,7 +623,7 @@ func TestAssetSelectionWithAutopilotAnnotation(t *testing.T) {
wantInAllowlist: []string{"descheduler-loadaware", "psi-enable"},
wantNotInAllowlist: []string{
"hco-golden-config", "prometheus-alerts",
"kubelet-perf-settings", "node-health-check", "swap-enable",
"kubelet-perf-settings", "swap-enable",
},
},
{
8 changes: 4 additions & 4 deletions pkg/engine/patcher_test.go
@@ -179,11 +179,11 @@ func TestNamespaceNotFoundDoesNotConsumeTokens(t *testing.T) {
loader := pkgassets.NewLoader()
renderer := NewRenderer(loader)

// node-health-check is a static asset in namespace openshift-operators.
// metrics-service is a namespaced asset (namespace comes from HCO).
assetMeta := &pkgassets.AssetMetadata{
Name: "node-health-check",
Path: "active/node-health/standard-remediation.yaml",
Component: "NodeHealthCheck",
Name: "metrics-service",
Path: "active/observability/metrics-service.yaml.tpl",
Component: "Service",
}

hco := pkgcontext.NewMockHCO("kubevirt-hyperconverged", "kubevirt-hyperconverged")
8 changes: 0 additions & 8 deletions test/e2e/assets_test.go
@@ -42,14 +42,6 @@ var assetsUnderTest = []testAsset{
GateCRD: "kubeletconfigs.machineconfiguration.openshift.io",
ClusterScoped: true,
},
{
GVK: schema.GroupVersionKind{Group: "remediation.medik8s.io", Version: "v1alpha1", Kind: "NodeHealthCheck"},
Plural: "nodehealthchecks",
Name: "virt-node-health-check",
Namespace: "openshift-operators",
GateCRD: "nodehealthchecks.remediation.medik8s.io",
ClusterScoped: true,
},
{
GVK: schema.GroupVersionKind{Group: "observability.openshift.io", Version: "v1alpha1", Kind: "UIPlugin"},
Plural: "uiplugins",