feat(e2e): add user-override E2E tests and fix OCP-specific test issues #228
Open
rlobillo wants to merge 5 commits into
The namespace guard test was iterating over all assetsUnderTest and deleting each asset's namespace. After adding Service, ServiceMonitor, and PrometheusRule assets in openshift-cnv, this destroyed the operator's own namespace, killing the autopilot pod with no way to recover. Replace the dynamic loop with a single static KubeDescheduler asset (namespace openshift-kube-descheduler-operator) which is safe to delete without affecting the operator. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nmanaged The anti-thrashing and alert suites set PrometheusRule to unmanaged mode (platform.kubevirt.io/mode=unmanaged) so they can patch alert "for" durations without the operator reverting them. This makes the operator skip PrometheusRule entirely during reconciliation, causing all drift-detection tests for that asset to time out on OpenShift. Add skipIfUnmanagedOnOCP() to skip PrometheusRule specs on OCP clusters where unmanaged mode is active. On kind this is a non-issue because the unmanaged block is never set. Also removes the informer cache sleep workaround from touchHCO(), which is no longer needed now that the real root cause is addressed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
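The skip decision described in this commit can be sketched as a pure function. This is only an illustration: `shouldSkip` and its parameters are hypothetical, the real `skipIfUnmanagedOnOCP()` presumably queries the cluster and calls the test framework's skip function, and whether `platform.kubevirt.io/mode` is stored as a label or an annotation is not stated here, so the sketch takes a generic metadata map.

```go
package main

import "fmt"

// The key and value are quoted from the commit message. Treating them as
// entries of a plain metadata map is an assumption of this sketch.
const (
	modeKey        = "platform.kubevirt.io/mode"
	unmanagedValue = "unmanaged"
)

// shouldSkip is a hypothetical helper: it reports whether a drift-detection
// spec for the given kind should be skipped. Only PrometheusRule specs on OCP
// clusters with unmanaged mode active are skipped.
func shouldSkip(kind string, isOCP bool, metadata map[string]string) bool {
	if !isOCP || kind != "PrometheusRule" {
		return false
	}
	return metadata[modeKey] == unmanagedValue
}

func main() {
	unmanaged := map[string]string{modeKey: unmanagedValue}

	fmt.Println("PrometheusRule on OCP, unmanaged:", shouldSkip("PrometheusRule", true, unmanaged))
	fmt.Println("PrometheusRule on kind:", shouldSkip("PrometheusRule", false, nil))
	fmt.Println("Service on OCP:", shouldSkip("Service", true, unmanaged))
}
```

Keeping the decision separate from the skip call makes the condition easy to check without a cluster.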
…unmanaged mode (CNV-89267, CNV-89805)
Add a comprehensive E2E test suite for the user-override feature covering three override modes: patch (concurrent application with drift detection/correction), ignore-fields (active field modification without reconciliation), and unmanaged (full drift preservation and re-management).
Key changes:
- New test file user_override_test.go with tests for concurrent patch application, security blocks on sensitive kinds, forbidden patch paths, invalid patch syntax, ignore-fields with active field tampering, and unmanaged mode with drift preservation and re-management.
- Extend assets_test.go with UserOverrideFieldSpec (per-asset real operator-controlled fields), sensitiveKinds auto-derivation via initAssets(), and 3 observability assets (Service, ServiceMonitor, PrometheusRule).
- Extract shared helpers (setAnnotation, removeAnnotation, setLabel, tamperField, readOverrideFieldValue, findCustomizationMetric, pollResourceField) and generic metric lookup functions into helpers_test.go for reuse across all test suites.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
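To illustrate the kind of helper this commit extracts, here is a minimal sketch of tampering with and reading back a nested field on an unstructured object. The signatures are assumptions, not the ones in `helpers_test.go`; only the name `tamperField` is borrowed from the commit message, and `readField` and the object layout are hypothetical. The real helpers presumably work against live cluster resources rather than an in-memory map.

```go
package main

import "fmt"

// readField walks a nested map along path and returns the value found there.
// The boolean is false when any segment is missing or is not an object.
func readField(obj map[string]any, path ...string) (any, bool) {
	var cur any = obj
	for _, key := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// tamperField overwrites the value at path, creating intermediate objects as
// needed. It fails if an intermediate segment exists but is not an object.
func tamperField(obj map[string]any, value any, path ...string) error {
	if len(path) == 0 {
		return fmt.Errorf("empty field path")
	}
	cur := obj
	for _, key := range path[:len(path)-1] {
		next, ok := cur[key]
		if !ok {
			child := map[string]any{}
			cur[key] = child
			cur = child
			continue
		}
		child, ok := next.(map[string]any)
		if !ok {
			return fmt.Errorf("field %q is not an object", key)
		}
		cur = child
	}
	cur[path[len(path)-1]] = value
	return nil
}

func main() {
	// Hypothetical asset with one operator-controlled field.
	asset := map[string]any{
		"spec": map[string]any{"mode": "Automatic"},
	}

	if err := tamperField(asset, "Tampered", "spec", "mode"); err != nil {
		fmt.Println("tamper failed:", err)
		return
	}
	got, _ := readField(asset, "spec", "mode")
	fmt.Println("spec.mode is now:", got)
}
```

In an E2E test, the tampered value would be written to the cluster and then polled: patch mode expects the operator to revert it, while ignore-fields and unmanaged modes expect it to persist.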
[APPROVALNOTIFIER] This PR is NOT APPROVED
❌ Generated Files Verification Failed
One or more generated files in this PR are out of sync:
Please regenerate the files locally and commit the changes.
On OCP clusters, patching the autopilot annotation can trigger MachineConfig rollouts that leave nodes cordoned/updating. Tests that run before the rollout completes hit spurious failures. Add waitForMCPStable() in BeforeSuite and after every autopilot patch to block until all MCPs report Updated=True, Updating=False, Degraded=False. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
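The wait described in this commit amounts to a poll loop over pool conditions. The following is a rough sketch under stated assumptions: the `condition` type, `poolStable`, `waitForStable`, and the `listPools` callback are all hypothetical stand-ins, and the real `waitForMCPStable()` presumably reads MachineConfigPool objects from the cluster. Only the three condition values come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// condition is a hypothetical stand-in for one MachineConfigPool status condition.
type condition struct {
	Type   string
	Status string
}

// stableConditions lists the condition values the commit message waits for.
var stableConditions = map[string]string{
	"Updated":  "True",
	"Updating": "False",
	"Degraded": "False",
}

// poolStable reports whether one pool shows every required condition value.
// A condition that is absent counts as not stable.
func poolStable(conds []condition) bool {
	got := make(map[string]string, len(conds))
	for _, c := range conds {
		got[c.Type] = c.Status
	}
	for typ, want := range stableConditions {
		if got[typ] != want {
			return false
		}
	}
	return true
}

// waitForStable polls listPools until every pool is stable or timeout expires.
// When isOCP is false it returns immediately without calling listPools.
func waitForStable(isOCP bool, listPools func() [][]condition, interval, timeout time.Duration) error {
	if !isOCP {
		return nil
	}
	deadline := time.Now().Add(timeout)
	for {
		stable := true
		for _, conds := range listPools() {
			if !poolStable(conds) {
				stable = false
				break
			}
		}
		if stable {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for MachineConfigPools to stabilize")
		}
		time.Sleep(interval)
	}
}

func main() {
	polls := 0
	// Simulated cluster: the single pool finishes its rollout on the third poll.
	listPools := func() [][]condition {
		polls++
		if polls < 3 {
			return [][]condition{{{"Updated", "False"}, {"Updating", "True"}, {"Degraded", "False"}}}
		}
		return [][]condition{{{"Updated", "True"}, {"Updating", "False"}, {"Degraded", "False"}}}
	}

	err := waitForStable(true, listPools, time.Millisecond, time.Second)
	fmt.Println("error:", err, "polls:", polls)
}
```

Requiring all three conditions, rather than only `Updated=True`, guards against a pool that reports updated while a new rollout has already started or while it is degraded.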
…t (CNV-89268) The existing test only checked that metrics didn't change between two points after the allowlist was narrowed, which would pass even without the fix. Now explicitly assert that compliance_status, paused_resources, reconcile_duration, and customization_info are -1 (series deleted) after the asset is excluded from the allowlist. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
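The "-1 means the series was deleted" convention from this commit can be sketched with a small lookup over Prometheus text-format output. This is an assumption-laden illustration: `findMetricValue`, the `asset` label, and the asset names are hypothetical, the metric name is used exactly as written in the commit message (the real series may carry a prefix), and the sketch assumes sample lines without trailing timestamps.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// findMetricValue scans Prometheus text-format output for the first sample of
// the named metric whose line contains labelPair. It returns -1 when no such
// series exists, which is the sentinel the test asserts on.
func findMetricValue(exposition, name, labelPair string) float64 {
	for _, line := range strings.Split(exposition, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		if !strings.HasPrefix(line, name+"{") && !strings.HasPrefix(line, name+" ") {
			continue // different metric, or a longer name sharing the prefix
		}
		if labelPair != "" && !strings.Contains(line, labelPair) {
			continue
		}
		fields := strings.Fields(line)
		value, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			continue
		}
		return value
	}
	return -1
}

func main() {
	// Hypothetical scrape results before and after narrowing the allowlist.
	before := `# TYPE compliance_status gauge
compliance_status{asset="asset-a"} 1
compliance_status{asset="asset-b"} 1
`
	after := `# TYPE compliance_status gauge
compliance_status{asset="asset-b"} 1
`
	fmt.Println("before:", findMetricValue(before, "compliance_status", `asset="asset-a"`))
	fmt.Println("after: ", findMetricValue(after, "compliance_status", `asset="asset-a"`))
}
```

A sentinel like this is only unambiguous for metrics that never legitimately report -1. Asserting on it directly, instead of comparing two post-exclusion snapshots, is what lets the test fail when the series is left behind.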
Short Description
Add comprehensive E2E tests for the user-override feature (patch, ignore-fields, unmanaged modes) and fix OCP-specific test issues.
More details
This PR introduces a full E2E test suite for the user-override feature and addresses issues uncovered during testing on OpenShift clusters. It also strengthens metric-cleanup assertions for the selective-activation feature.
What this PR does / why we need it
1. User-override E2E tests (CNV-89267, CNV-89805)
New test file `user_override_test.go` covering three override modes:
- patch: concurrent application with drift detection/correction
- ignore-fields: active field modification without reconciliation
- unmanaged: full drift preservation and re-management
Supporting changes:
- Extended `assets_test.go` with `UserOverrideFieldSpec` (per-asset real operator-controlled fields), `sensitiveKinds` auto-derivation via `initAssets()`, and 3 observability assets (Service, ServiceMonitor, PrometheusRule)
- Extracted shared helpers (`setAnnotation`, `removeAnnotation`, `setLabel`, `tamperField`, `readOverrideFieldValue`, `findCustomizationMetric`, `pollResourceField`) and generic metric lookup functions into `helpers_test.go`
2. Skip PrometheusRule drift tests on OCP (fix)
The anti-thrashing and alert suites set PrometheusRule to unmanaged mode so they can patch alert "for" durations without the operator reverting them. In that mode the operator skips PrometheusRule during reconciliation, so all drift-detection tests for PrometheusRule time out on OpenShift. Added `skipIfUnmanagedOnOCP()` to skip these specs on OCP clusters where unmanaged mode is active.
3. Static KubeDescheduler asset for namespace guard tests (fix)
The namespace guard test was iterating over all `assetsUnderTest` and deleting each asset's namespace. After adding observability assets in `openshift-cnv`, this destroyed the operator's own namespace. Replaced the loop with a single static KubeDescheduler asset (namespace `openshift-kube-descheduler-operator`), which is safe to delete without affecting the operator.
4. Wait for MachineConfigPools to stabilize (fix)
On OCP clusters, patching the autopilot annotation can trigger MachineConfig rollouts that leave nodes cordoned/updating. Tests that run before the rollout completes hit spurious failures. Added `waitForMCPStable()` in `BeforeSuite` and after every autopilot patch to block until all MCPs report `Updated=True`, `Updating=False`, `Degraded=False`. On non-OCP clusters (e.g. Kind) the function returns immediately.
5. Explicit metric-cleanup assertions for selective activation (CNV-89268)
Strengthened the selective-activation E2E tests to explicitly verify that per-asset metrics (`compliance_status`, `paused_resources`, `reconcile_duration`, `customization_info`) are deleted (series value = -1) when an asset leaves the allowlist. The previous test only compared metrics at two points in time after exclusion, which would pass even without the fix.
Other
- `run-e2e.sh` to accommodate the expanded test suite