ACM-34978: apply disk encryption defaults on cluster create and update#10442
Conversation
In OCPBUGS-86731 the issue was likelly caused by DiskEncryptionEnabledOn was an empty string and the validation threated empty string as enabled. A new test is also added to test the robustness.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR refactors disk encryption configuration logic by introducing a new ChangesDisk Encryption Refactoring
Sequence DiagramssequenceDiagram
participant Inventory
participant Controller
participant DiskEncryption as diskencryption pkg
participant Database
Inventory->>DiskEncryption: ApplyDiskEncryptionDefaults(de)
DiskEncryption-->>Inventory: (normalized EnableOn, Mode)
Inventory->>Inventory: setDiskEncryptionUsage(IsConfigured)
Inventory->>Database: Persist normalized config
Controller->>DiskEncryption: IsEnabled(enable_on)
DiskEncryption-->>Controller: bool
Controller->>DiskEncryption: DiskEncryptionFieldDefaults(spec values)
DiskEncryption-->>Controller: (defaulted fields)
Controller->>Database: Update with defaults
sequenceDiagram
participant Validator as Host/Hardware Validator
participant DiskEncryption as diskencryption pkg
participant HostRole
Validator->>DiskEncryption: IsConfigured(de)
DiskEncryption-->>Validator: bool
Validator->>DiskEncryption: IsSetWithTpm(de)
DiskEncryption-->>Validator: bool
Validator->>DiskEncryption: EnabledForRole(de, role)
DiskEncryption->>HostRole: Parse enable_on string
DiskEncryption-->>Validator: bool (role matched)
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 12 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@andrej1991: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/common/disk_encryption.go (1)
8-19: 💤 Low valueConsider clarifying that empty strings are treated as nil.
The function correctly uses
swag.StringValue()which converts bothniland empty string pointers to"", then applies defaults. This behavior is robust but not explicitly documented. Consider updating the godoc to mention that empty strings are normalized to defaults.📝 Suggested documentation enhancement
-// DiskEncryptionFieldDefaults returns enable_on and mode with defaults for nil or empty values. +// DiskEncryptionFieldDefaults returns enable_on and mode with defaults applied. +// Both nil pointers and empty strings are treated as unset and replaced with defaults: +// - enableOn defaults to "none" (DiskEncryptionEnableOnNone) +// - mode defaults to "tpmv2" (DiskEncryptionModeTpmv2) func DiskEncryptionFieldDefaults(enableOn, mode *string) (string, string) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/common/disk_encryption.go` around lines 8 - 19, Update the godoc for DiskEncryptionFieldDefaults to explicitly state that both nil pointers and pointers to empty strings are normalized by swag.StringValue to the empty string and therefore will receive the default values (models.DiskEncryptionEnableOnNone for enable_on and models.DiskEncryptionModeTpmv2 for mode); mention the use of swag.StringValue in the description so readers know empty string inputs are treated the same as nil and will be replaced with defaults.internal/common/disk_encryption_test.go (1)
11-23: ⚡ Quick winConsider testing mixed scenarios for better edge case coverage.
The current test covers "all nil", "all empty", and "all set" cases, but doesn't test scenarios where only one field is nil/empty while the other is explicitly set. These are valid real-world scenarios.
🧪 Suggested edge case tests
func TestDiskEncryptionFieldDefaults_MixedInputs(t *testing.T) { // enableOn set, mode nil enableOn, mode := DiskEncryptionFieldDefaults(swag.String(models.DiskEncryptionEnableOnMasters), nil) assert.Equal(t, models.DiskEncryptionEnableOnMasters, enableOn) assert.Equal(t, models.DiskEncryptionModeTpmv2, mode) // enableOn nil, mode set enableOn, mode = DiskEncryptionFieldDefaults(nil, swag.String(models.DiskEncryptionModeTang)) assert.Equal(t, models.DiskEncryptionEnableOnNone, enableOn) assert.Equal(t, models.DiskEncryptionModeTang, mode) // enableOn set, mode empty enableOn, mode = DiskEncryptionFieldDefaults(swag.String(models.DiskEncryptionEnableOnWorkers), swag.String("")) assert.Equal(t, models.DiskEncryptionEnableOnWorkers, enableOn) assert.Equal(t, models.DiskEncryptionModeTpmv2, mode) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/common/disk_encryption_test.go` around lines 11 - 23, Add tests covering mixed nil/empty/value combinations for DiskEncryptionFieldDefaults: call DiskEncryptionFieldDefaults with (1) enableOn set + mode nil and assert enableOn stays the provided value and mode defaults to DiskEncryptionModeTpmv2, (2) enableOn nil + mode set and assert enableOn defaults to DiskEncryptionEnableOnNone while mode stays provided, and (3) enableOn set + mode empty string and assert enableOn stays provided while mode defaults to DiskEncryptionModeTpmv2; place these cases in a new TestDiskEncryptionFieldDefaults_MixedInputs in internal/common/disk_encryption_test.go referencing DiskEncryptionFieldDefaults and the models constants used in existing tests.internal/hardware/validator_test.go (1)
1880-1897: ⚡ Quick winAdd explicit empty-string
enable_onregression coverage.This case validates
EnableOn == nil; please add a sibling case withEnableOn: swag.String("")to lock the exact empty-string regression path.Suggested test addition
+ It("TPM - empty enable_on with tpmv2 mode is treated as disabled", func() { + diskEncryptionClusterID := strfmt.UUID(uuid.New().String()) + diskEncryptionCluster := &common.Cluster{Cluster: models.Cluster{ + ID: &diskEncryptionClusterID, + OpenshiftVersion: openShiftVersionNotInConfig, + DiskEncryption: &models.DiskEncryption{ + EnableOn: swag.String(""), + Mode: swag.String(models.DiskEncryptionModeTpmv2), + }, + }} + + operatorsMock.EXPECT().GetPreflightRequirementsBreakdownForCluster(gomock.Any(), gomock.Eq(diskEncryptionCluster)).Return(operatorRequirements, nil) + + result, err := hwvalidator.GetPreflightHardwareRequirements(context.TODO(), diskEncryptionCluster) + Expect(err).ToNot(HaveOccurred()) + Expect(result.Ocp.Master.Quantitative.TpmEnabledInBios).To(BeFalse()) + Expect(result.Ocp.Worker.Quantitative.TpmEnabledInBios).To(BeFalse()) + })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/hardware/validator_test.go` around lines 1880 - 1897, Add a sibling test in internal/hardware/validator_test.go mirroring the "TPM - unset enable_on with tpmv2 mode is treated as disabled" case but set DiskEncryption.EnableOn to swag.String("") (empty string) instead of nil; create a cluster instance with DiskEncryption: &models.DiskEncryption{Mode: swag.String(models.DiskEncryptionModeTpmv2), EnableOn: swag.String("")}, keep the same operatorsMock.EXPECT().GetPreflightRequirementsBreakdownForCluster(...) return and call hwvalidator.GetPreflightHardwareRequirements(context.TODO(), <that cluster>), then assert result.Ocp.Master.Quantitative.TpmEnabledInBios and result.Ocp.Worker.Quantitative.TpmEnabledInBios are false to lock the empty-string regression path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/common/disk_encryption_test.go`:
- Around line 11-23: Add unit tests for ApplyDiskEncryptionDefaults covering nil
input, nil fields, empty-string fields, and explicit values: call
ApplyDiskEncryptionDefaults(nil) to ensure it doesn't panic; create de :=
&models.DiskEncryption{} and call ApplyDiskEncryptionDefaults(de) and assert
*de.EnableOn == models.DiskEncryptionEnableOnNone and *de.Mode ==
models.DiskEncryptionModeTpmv2; create de with EnableOn and Mode set to
swag.String("") and assert defaults are applied; create de with
EnableOn/swagt.String(models.DiskEncryptionEnableOnMasters) and
Mode/swagt.String(models.DiskEncryptionModeTang) and assert those explicit
values are preserved.
---
Nitpick comments:
In `@internal/common/disk_encryption_test.go`:
- Around line 11-23: Add tests covering mixed nil/empty/value combinations for
DiskEncryptionFieldDefaults: call DiskEncryptionFieldDefaults with (1) enableOn
set + mode nil and assert enableOn stays the provided value and mode defaults to
DiskEncryptionModeTpmv2, (2) enableOn nil + mode set and assert enableOn
defaults to DiskEncryptionEnableOnNone while mode stays provided, and (3)
enableOn set + mode empty string and assert enableOn stays provided while mode
defaults to DiskEncryptionModeTpmv2; place these cases in a new
TestDiskEncryptionFieldDefaults_MixedInputs in
internal/common/disk_encryption_test.go referencing DiskEncryptionFieldDefaults
and the models constants used in existing tests.
In `@internal/common/disk_encryption.go`:
- Around line 8-19: Update the godoc for DiskEncryptionFieldDefaults to
explicitly state that both nil pointers and pointers to empty strings are
normalized by swag.StringValue to the empty string and therefore will receive
the default values (models.DiskEncryptionEnableOnNone for enable_on and
models.DiskEncryptionModeTpmv2 for mode); mention the use of swag.StringValue in
the description so readers know empty string inputs are treated the same as nil
and will be replaced with defaults.
In `@internal/hardware/validator_test.go`:
- Around line 1880-1897: Add a sibling test in
internal/hardware/validator_test.go mirroring the "TPM - unset enable_on with
tpmv2 mode is treated as disabled" case but set DiskEncryption.EnableOn to
swag.String("") (empty string) instead of nil; create a cluster instance with
DiskEncryption: &models.DiskEncryption{Mode:
swag.String(models.DiskEncryptionModeTpmv2), EnableOn: swag.String("")}, keep
the same operatorsMock.EXPECT().GetPreflightRequirementsBreakdownForCluster(...)
return and call hwvalidator.GetPreflightHardwareRequirements(context.TODO(),
<that cluster>), then assert result.Ocp.Master.Quantitative.TpmEnabledInBios and
result.Ocp.Worker.Quantitative.TpmEnabledInBios are false to lock the
empty-string regression path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 990aa00e-6d40-4350-bcf1-b9e382996bfb
📒 Files selected for processing (6)
internal/bminventory/inventory.gointernal/common/disk_encryption.gointernal/common/disk_encryption_test.gointernal/controller/controllers/clusterdeployments_controller.gointernal/hardware/validator.gointernal/hardware/validator_test.go
|
Skipping CI for Draft Pull Request. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10442 +/- ##
==========================================
+ Coverage 44.34% 45.20% +0.86%
==========================================
Files 420 423 +3
Lines 73064 78322 +5258
==========================================
+ Hits 32401 35409 +3008
- Misses 37730 39909 +2179
- Partials 2933 3004 +71
🚀 New features to boost your workflow:
|
|
/retest |
13d5d2b to
c50cd6c
Compare
|
/retest e2e-agent-compac-ipv4 |
|
/retest e2e-agent-compact-ipv4 |
|
/hold |
|
/test e2e-agent-compact-ipv4 |
|
There are other parts of the code that are handling what appears to be non-normalize data (i.e. https://github.com/andrej1991/assisted-service/blob/c50cd6cb2d576daf449dd14210582df95240dadb/internal/network/manifests_generator.go#L378-L381) How are clusters already in the DB handled? We do not seem to normalize data coming from the DB |
| @@ -0,0 +1,29 @@ | |||
| package common | |||
There was a problem hiding this comment.
unrelated to this PR, as this is a common pattern in this repo.
However I think adding anything to the already tangled common package helps the probability of getting circular imports.
As this is a clean helper, with little dependencies (and no internal dependencies apart from models), I think it's a great candidate to move to its own package, something like internal/diskencryption.
There are other helpers around the codebase that could be moved in there as well:
- IsDiskEncryptionEnabledForRole in internal/host/hostutil/ — checks if encryption is enabled for a given host role. Only depends on models, swag, go-funk.
- isDiskEncryptionSetWithTpm in internal/hardware/validator.go — checks if TPM encryption is configured. Only models, swag.
- isDiskEncryptionEnabled in clusterdeployments_controller.go — checks if the CRD spec has encryption enabled. Only swag, models.
However this is probably not part of this PR: would you mind either creating a new task about it or a follow up PR with this simple refactor? 🙏
There was a problem hiding this comment.
"There are other parts of the code that are handling what appears to be non-normalize data (i.e. https://github.com/andrej1991/assisted-service/blob/c50cd6cb2d576daf449dd14210582df95240dadb/internal/network/manifests_generator.go#L378-L381)
How are clusters already in the DB handled? We do not seem to normalize data coming from the DB"
From now on only normalized data will end up in the DB. The new validation can handle the already existing data. The normalization was added due to API consistency.
The second comment is fixed in the last commit.
Normalize nil or empty disk_encryption enable_on and mode to none and tpmv2 when syncing from AgentClusterInstall and when persisting cluster updates. Prevents writing an empty enable_on while mode stays tpmv2, which could leave hosts stuck in Discovering during hardware validation.
c50cd6c to
65c9520
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/host/validator.go (1)
492-494:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winPotential nil pointer dereference at line 492 (existing code affecting changed segment)
Line 492 accesses
c.cluster.DiskEncryption.EnableOnwithout checking ifc.cluster.DiskEncryptionis nil. If disk encryption is not configured for the cluster, this will panic before reaching the changed code at line 533.Other call sites in this PR include defensive nil checks:
manifests_generator.go:353:c.DiskEncryption == nil || ...tang_connectivity_check_cmd.go:74:cluster.DiskEncryption == nil || ...This suggests
DiskEncryptioncan be nil in practice.Note: This issue exists in surrounding code, not introduced by the PR. However, since this PR refactors disk-encryption logic, it's a good opportunity to add the missing guard.
🛡️ Proposed fix
- if c.infraEnv != nil || swag.StringValue(c.cluster.DiskEncryption.EnableOn) == models.DiskEncryptionEnableOnNone { + if c.infraEnv != nil || c.cluster.DiskEncryption == nil || swag.StringValue(c.cluster.DiskEncryption.EnableOn) == models.DiskEncryptionEnableOnNone { return ValidationSuccessSuppressOutput, "" }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/host/validator.go` around lines 492 - 494, The code checks c.cluster.DiskEncryption.EnableOn without guarding against c.cluster.DiskEncryption being nil; update the conditional that currently reads "if c.infraEnv != nil || swag.StringValue(c.cluster.DiskEncryption.EnableOn) == models.DiskEncryptionEnableOnNone" to first check "c.cluster.DiskEncryption == nil" (e.g., "if c.infraEnv != nil || c.cluster.DiskEncryption == nil || swag.StringValue(c.cluster.DiskEncryption.EnableOn) == models.DiskEncryptionEnableOnNone") so the nil pointer is avoided and the function (returning ValidationSuccessSuppressOutput, "") behaves the same when DiskEncryption is absent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@internal/host/validator.go`:
- Around line 492-494: The code checks c.cluster.DiskEncryption.EnableOn without
guarding against c.cluster.DiskEncryption being nil; update the conditional that
currently reads "if c.infraEnv != nil ||
swag.StringValue(c.cluster.DiskEncryption.EnableOn) ==
models.DiskEncryptionEnableOnNone" to first check "c.cluster.DiskEncryption ==
nil" (e.g., "if c.infraEnv != nil || c.cluster.DiskEncryption == nil ||
swag.StringValue(c.cluster.DiskEncryption.EnableOn) ==
models.DiskEncryptionEnableOnNone") so the nil pointer is avoided and the
function (returning ValidationSuccessSuppressOutput, "") behaves the same when
DiskEncryption is absent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 85d850a2-ab1d-4a55-be9a-74b7bf249bec
📒 Files selected for processing (11)
internal/bminventory/inventory.gointernal/controller/controllers/clusterdeployments_controller.gointernal/diskencryption/disk_encryption.gointernal/diskencryption/disk_encryption_test.gointernal/hardware/validator.gointernal/host/hostcommands/tang_connectivity_check_cmd.gointernal/host/hostutil/host_utils.gointernal/host/hostutil/host_utils_test.gointernal/host/validator.gointernal/network/manifests_generator.gosubsystem/kubeapi/kubeapi_test.go
💤 Files with no reviewable changes (2)
- internal/host/hostutil/host_utils_test.go
- internal/host/hostutil/host_utils.go
✅ Files skipped from review due to trivial changes (1)
- internal/diskencryption/disk_encryption.go
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/bminventory/inventory.go
- internal/controller/controllers/clusterdeployments_controller.go
- subsystem/kubeapi/kubeapi_test.go
| @@ -0,0 +1,138 @@ | |||
| package diskencryption | |||
There was a problem hiding this comment.
Please stick to ginkgo tests, as the rest of the codebase
| "strings" | ||
| "text/template" | ||
|
|
||
| "github.com/go-openapi/swag" |
There was a problem hiding this comment.
There are others places in the code where we inline this check. Even if it still working OK in that case, please change it to be consistent with the new way, so we don't have 2 ways of achieving the same throughout the codebase
Use Ginkgo for diskencryption tests so CI ginkgo flags are accepted, and remove the unused ginkgo table import from hostutil tests.
|
/override ci/prow/edge-e2e-ai-operator-disconnected-capi |
|
/lgtm |
|
@rccrdpccl: Overrode contexts on behalf of rccrdpccl: ci/prow/edge-e2e-ai-operator-ztp-capi DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andrej1991, rccrdpccl The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@rccrdpccl: Overrode contexts on behalf of rccrdpccl: ci/prow/edge-e2e-ai-operator-disconnected-capi DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retitle ACM-34978: apply disk encryption defaults on cluster create and update |
|
@andrej1991: This pull request references ACM-34978 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@gamli75: This pull request references ACM-34978 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target either version "5.0." or "openshift-5.0.", but it targets "ACM 5.0.0" instead. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
|
/override ci/prow/edge-e2e-ai-operator-ztp-capi |
|
@andrej1991: andrej1991 unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers openshift-sustaining-engineers. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/override ci/prow/edge-e2e-ai-operator-disconnected-capi ci/prow/edge-e2e-ai-operator-ztp-capi |
|
@rccrdpccl: Overrode contexts on behalf of rccrdpccl: ci/prow/edge-e2e-ai-operator-disconnected-capi, ci/prow/edge-e2e-ai-operator-ztp-capi DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/override ci/prow/edge-e2e-ai-operator-ztp-capi |
|
@rccrdpccl: Overrode contexts on behalf of rccrdpccl: ci/prow/edge-e2e-ai-operator-ztp-capi DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@andrej1991: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/cherry-pick release-ocm-2.17 |
|
/cherry-pick release-ocm-2.16 |
|
@andrej1991: new pull request created: #10513 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@andrej1991: new pull request created: #10514 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Normalize nil or empty disk_encryption enable_on and mode to none and
tpmv2 when syncing from AgentClusterInstall and when persisting cluster
updates. Prevents writing an empty enable_on while mode stays tpmv2,
which could leave hosts stuck in Discovering during hardware validation.
List all the issues related to this PR
What environments does this code impact?
How was this code tested?
Checklist
docs, README, etc)Reviewers Checklist
Summary by CodeRabbit