feat(e2e): add user-override E2E tests and fix OCP-specific test issues by rlobillo · Pull Request #228 · openshift-virtualization/virt-platform-autopilot

rlobillo · 2026-07-01T14:04:20Z

Short Description

Add comprehensive E2E tests for the user-override feature (patch, ignore-fields, unmanaged modes) and fix OCP-specific test issues.

More details

This PR introduces a full E2E test suite for the user-override feature and addresses issues uncovered during testing on OpenShift clusters. It also strengthens metric-cleanup assertions for the selective-activation feature.

What this PR does / why we need it

1. User-override E2E tests (CNV-89267, CNV-89805)

New test file user_override_test.go covering three override modes:

Patch mode: concurrent application with drift detection/correction, security blocks on sensitive kinds, forbidden patch paths, invalid patch syntax
Ignore-fields mode: active field modification without reconciliation
Unmanaged mode: full drift preservation and re-management

Supporting changes:

Extend assets_test.go with UserOverrideFieldSpec (per-asset real operator-controlled fields), sensitiveKinds auto-derivation via initAssets(), and 3 observability assets (Service, ServiceMonitor, PrometheusRule)
Extract shared helpers (setAnnotation, removeAnnotation, setLabel, tamperField, readOverrideFieldValue, findCustomizationMetric, pollResourceField) and generic metric lookup functions into helpers_test.go

2. Skip PrometheusRule drift tests on OCP (fix)

The anti-thrashing and alert suites set PrometheusRule to unmanaged mode so they can patch alert "for" durations without the operator reverting them. This causes all drift-detection tests for PrometheusRule to time out on OpenShift. Added skipIfUnmanagedOnOCP() to skip these specs on OCP clusters.

3. Static KubeDescheduler asset for namespace guard tests (fix)

The namespace guard test was iterating over all assetsUnderTest and deleting each asset's namespace. After adding observability assets in openshift-cnv, this destroyed the operator's own namespace. Replaced with a single static KubeDescheduler asset which is safe to delete.

4. Wait for MachineConfigPools to stabilize (fix)

On OCP clusters, patching the autopilot annotation can trigger MachineConfig rollouts that leave nodes cordoned/updating. Tests that run before the rollout completes hit spurious failures. Added waitForMCPStable() in BeforeSuite and after every autopilot patch to block until all MCPs report Updated=True, Updating=False, Degraded=False. On non-OCP clusters (e.g. Kind) the function returns immediately.

5. Explicit metric-cleanup assertions for selective activation (CNV-89268)

Strengthened the selective-activation E2E tests to explicitly verify that per-asset metrics (compliance_status, paused_resources, reconcile_duration, customization_info) are deleted (series value = -1) when an asset leaves the allowlist. The previous test only compared metrics at two points after exclusion, which would pass even without the fix.

Other

Increased Ginkgo timeout from 10m to 30m in run-e2e.sh to accommodate the expanded test suite

The namespace guard test was iterating over all assetsUnderTest and deleting each asset's namespace. After adding Service, ServiceMonitor, and PrometheusRule assets in openshift-cnv, this destroyed the operator's own namespace, killing the autopilot pod with no way to recover. Replace the dynamic loop with a single static KubeDescheduler asset (namespace openshift-kube-descheduler-operator) which is safe to delete without affecting the operator. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…nmanaged The anti-thrashing and alert suites set PrometheusRule to unmanaged mode (platform.kubevirt.io/mode=unmanaged) so they can patch alert "for" durations without the operator reverting them. This makes the operator skip PrometheusRule entirely during reconciliation, causing all drift- detection tests for that asset to time out on OpenShift. Add skipIfUnmanagedOnOCP() to skip PrometheusRule specs on OCP clusters where unmanaged mode is active. On kind this is a non-issue because the unmanaged block is never set. Also removes the informer cache sleep workaround from touchHCO() which is no longer needed now that the real root cause is addressed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…unmanaged mode (CNV-89267, CNV-89805) Add a comprehensive E2E test suite for the user-override feature covering three override modes: patch (concurrent application with drift detection/correction), ignore-fields (active field modification without reconciliation), and unmanaged (full drift preservation and re-management). Key changes: - New test file user_override_test.go with tests for concurrent patch application, security blocks on sensitive kinds, forbidden patch paths, invalid patch syntax, ignore-fields with active field tampering, and unmanaged mode with drift preservation and re-management. - Extend assets_test.go with UserOverrideFieldSpec (per-asset real operator-controlled fields), sensitiveKinds auto-derivation via initAssets(), and 3 observability assets (Service, ServiceMonitor, PrometheusRule). - Extract shared helpers (setAnnotation, removeAnnotation, setLabel, tamperField, readOverrideFieldValue, findCustomizationMetric, pollResourceField) and generic metric lookup functions into helpers_test.go for reuse across all test suites. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci · 2026-07-01T14:04:25Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fabiand for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions · 2026-07-01T14:05:04Z

❌ Generated Files Verification Failed

One or more generated files in this PR are out of sync:

CRDs: Run make update-crds if CRD verification failed
RBAC: Run make generate-rbac if RBAC verification failed

Please regenerate the files locally and commit the changes.

On OCP clusters, patching the autopilot annotation can trigger MachineConfig rollouts that leave nodes cordoned/updating. Tests that run before the rollout completes hit spurious failures. Add waitForMCPStable() in BeforeSuite and after every autopilot patch to block until all MCPs report Updated=True, Updating=False, Degraded=False. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…t (CNV-89268) The existing test only checked that metrics didn't change between two points after the allowlist was narrowed, which would pass even without the fix. Now explicitly assert that compliance_status, paused_resources, reconcile_duration, and customization_info are -1 (series deleted) after the asset is excluded from the allowlist. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

rlobillo and others added 3 commits July 1, 2026 15:38

rlobillo and others added 2 commits July 2, 2026 12:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(e2e): add user-override E2E tests and fix OCP-specific test issues#228

feat(e2e): add user-override E2E tests and fix OCP-specific test issues#228
rlobillo wants to merge 5 commits into
openshift-virtualization:mainfrom
rlobillo:e2e-user-override

rlobillo commented Jul 1, 2026 •

edited

Loading

Uh oh!

openshift-ci Bot commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

rlobillo commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Short Description

More details

What this PR does / why we need it

1. User-override E2E tests (CNV-89267, CNV-89805)

2. Skip PrometheusRule drift tests on OCP (fix)

3. Static KubeDescheduler asset for namespace guard tests (fix)

4. Wait for MachineConfigPools to stabilize (fix)

5. Explicit metric-cleanup assertions for selective activation (CNV-89268)

Other

Uh oh!

openshift-ci Bot commented Jul 1, 2026

Uh oh!

github-actions Bot commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rlobillo commented Jul 1, 2026 •

edited

Loading