Skip to content

MGMT-24571: Fix initrd PPI reboot loop by comparing correct image URL#10456

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:masterfrom
carbonin:fix-initrd-ppi-reboot-loop
Jun 12, 2026
Merged

MGMT-24571: Fix initrd PPI reboot loop by comparing correct image URL#10456
openshift-merge-bot[bot] merged 1 commit into
openshift:masterfrom
carbonin:fix-initrd-ppi-reboot-loop

Conversation

@carbonin

@carbonin carbonin commented Jun 11, 2026

Copy link
Copy Markdown
Member

When a PreprovisioningImage uses InitRD format, the image-updated check was always comparing the current ImageUrl against ISODownloadURL instead of the InitrdURL. This caused the BMH to be rebooted on every reconcile even when the initrd image had not changed.

Extract getExpectedImageURL to select the correct URL (ISO or initrd) based on the PPI's AcceptFormats, and use it in the imageUpdated check.

Assisted-by: Claude Code noreply@anthropic.com

List all the issues related to this PR

Resolves https://redhat.atlassian.net/browse/MGMT-24571

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

@minmzzhang is testing. Will hold until she signs off on it.

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see [CONTRIBUTING] guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Summary by CodeRabbit

  • Bug Fixes

    • Fixed image update detection to correctly handle multiple image formats (ISO and InitRD), ensuring proper reboot and update behavior based on the requested format.
  • Tests

    • Added comprehensive test coverage for InitRD format image handling and reboot annotation updates.

When a PreprovisioningImage uses InitRD format, the image-updated check
was always comparing the current ImageUrl against ISODownloadURL instead
of the InitrdURL. This caused the BMH to be rebooted on every reconcile
even when the initrd image had not changed.

Extract getExpectedImageURL to select the correct URL (ISO or initrd)
based on the PPI's AcceptFormats, and use it in the imageUpdated check.

Resolves https://redhat.atlassian.net/browse/MGMT-24571

Assisted-by: Claude Code <noreply@anthropic.com>
@carbonin carbonin requested a review from CrystalChun June 11, 2026 19:58
@carbonin carbonin added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2026
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8107811f-4365-48bb-bab4-65211cdf62d3

📥 Commits

Reviewing files that changed from the base of the PR and between bc2a643 and b06897f.

📒 Files selected for processing (2)
  • internal/controller/controllers/preprovisioningimage_controller.go
  • internal/controller/controllers/preprovisioningimage_controller_test.go

Walkthrough

This PR modifies the PreprovisioningImage controller to support format-aware image URL comparison. Instead of always comparing against the ISO download URL, it now selects the appropriate URL (ISO or InitRD) from InfraEnv based on the PreprovisioningImage's accepted format, ensuring reboot and update behavior aligns with the requested image type.

Changes

Format-aware PreprovisioningImage URL comparison

Layer / File(s) Summary
Format-aware URL selection implementation
internal/controller/controllers/preprovisioningimage_controller.go
A new getExpectedImageURL helper selects the correct download URL from InfraEnv based on whether the PreprovisioningImage accepts ISO or InitRD format. The handleImageUpdate method now uses this helper to determine image updates instead of always comparing against infraEnv.Status.ISODownloadURL.
InitRD format test coverage
internal/controller/controllers/preprovisioningimage_controller_test.go
Two Ginkgo tests validate InitRD-format PreprovisioningImage behavior: one confirms that when the initrd URL is already current, no BMH reboot is triggered; the other confirms that when the initrd URL changes, the PPI is updated and the reboot.metal3.io annotation is set on the BareMetalHost.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning FAIL: setInfraEnvIronicConfig has Expect(err).NotTo(HaveOccurred()) with no message, and ClusterVersion (cluster-scoped) is created inside It blocks without cleanup. Add a message to Expect(err).NotTo(HaveOccurred(), "...") and ensure any cluster-scoped test resources (e.g., ClusterVersion) are created in BeforeEach or cleaned in AfterEach.
✅ Passed checks (13 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: fixing a reboot loop bug in InitRD PreprovisioningImage by comparing against the correct image URL.
Description check ✅ Passed The description follows the template structure with all critical sections completed: clear problem statement, solution explanation, issue resolution link, bug fix categorization, environment impact selection, testing status, and checklist completion.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The PR adds two new Ginkgo test cases with stable, deterministic names: "doesn't reboot the host when initrd PreprovisioningImage ImageUrl is up to date" and "reboots the host when the initrd image...
Microshift Test Compatibility ✅ Passed InitRD-focused new Ginkgo It blocks in preprovisioningimage_controller_test.go don’t reference forbidden MicroShift-unsupported APIs/resources (e.g., no ClusterVersion/config.openshift.io/v1) or di...
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds unit tests (not e2e tests) to preprovisioningimage_controller_test.go using fakeclient and gomock. SNO compatibility check applies only to e2e tests.
Topology-Aware Scheduling Compatibility ✅ Passed PR updates PPI image URL comparison (getExpectedImageURL) and related tests only; no new scheduling constraints (affinity/topology spread/nodeSelector/PDB) detected in touched controller/test files.
Ote Binary Stdout Contract ✅ Passed The PR adds a getExpectedImageURL helper and modifies handleImageUpdate to use it. Neither the new helper function nor the modified code contains any stdout writes (fmt.Print/Println/Printf), klog...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR #10456 changes only internal/controller/controller go code + unit tests (2 files); no new Ginkgo e2e tests or IPv6/disconnected connectivity assumptions were added.
No-Weak-Crypto ✅ Passed In PR #10456, the modified controller and test files contain no occurrences of md5/sha1/DES/RC4/3DES/Blowfish/ECB or crypto-related patterns; no custom crypto or constant-time compare usage detected.
Container-Privileges ✅ Passed PR #10456 modifies only the two Go controller/test files; scanning shows no K8s privilege fields (privileged, hostPID/Network/IPC, SYS_ADMIN, allowPrivilegeEscalation).
No-Sensitive-Data-In-Logs ✅ Passed No logging of sensitive data found. All log statements use static strings without format parameters. URL values are assigned to status fields but not logged.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 11, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 11, 2026

Copy link
Copy Markdown

@carbonin: This pull request references MGMT-24571 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.

Details

In response to this:

When a PreprovisioningImage uses InitRD format, the image-updated check was always comparing the current ImageUrl against ISODownloadURL instead of the InitrdURL. This caused the BMH to be rebooted on every reconcile even when the initrd image had not changed.

Extract getExpectedImageURL to select the correct URL (ISO or initrd) based on the PPI's AcceptFormats, and use it in the imageUpdated check.

Assisted-by: Claude Code noreply@anthropic.com

List all the issues related to this PR

Resolves https://redhat.atlassian.net/browse/MGMT-24571

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

@minmzzhang is testing. Will hold until she signs off on it.

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see [CONTRIBUTING] guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Summary by CodeRabbit

  • Bug Fixes

  • Fixed image update detection to correctly handle multiple image formats (ISO and InitRD), ensuring proper reboot and update behavior based on the requested format.

  • Tests

  • Added comprehensive test coverage for InitRD format image handling and reboot annotation updates.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 11, 2026

@CrystalChun CrystalChun left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2026
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carbonin, CrystalChun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [CrystalChun,carbonin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2026
@minmzzhang

minmzzhang commented Jun 11, 2026

Copy link
Copy Markdown

With @carbonin's designer image, and without the inspect.metal3.io=false, I was able to pass the inspecting and provisioning phase, now the two spoke workers are provisioned, the fix works!

@openshift-ci openshift-ci Bot requested review from rccrdpccl and shay23bra June 11, 2026 20:54
@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 44.36%. Comparing base (a9cfe09) to head (b06897f).
⚠️ Report is 16 commits behind head on master.

Files with missing lines Patch % Lines
...ler/controllers/preprovisioningimage_controller.go 87.50% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10456      +/-   ##
==========================================
+ Coverage   44.34%   44.36%   +0.01%     
==========================================
  Files         420      420              
  Lines       73064    73107      +43     
==========================================
+ Hits        32399    32432      +33     
- Misses      37731    37741      +10     
  Partials     2934     2934              
Files with missing lines Coverage Δ
...ler/controllers/preprovisioningimage_controller.go 80.16% <87.50%> (+0.08%) ⬆️

... and 7 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@carbonin carbonin removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2026
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

/retest-required

Remaining retests: 0 against base HEAD bc2a643 and 2 for PR HEAD b06897f in total

@CrystalChun

Copy link
Copy Markdown
Contributor

/override ci/prow/edge-e2e-ai-operator-disconnected-capi ci/prow/edge-e2e-ai-operator-ztp-capi

@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown

@CrystalChun: Overrode contexts on behalf of CrystalChun: ci/prow/edge-e2e-ai-operator-disconnected-capi, ci/prow/edge-e2e-ai-operator-ztp-capi

Details

In response to this:

/override ci/prow/edge-e2e-ai-operator-disconnected-capi ci/prow/edge-e2e-ai-operator-ztp-capi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown

@carbonin: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit aa192af into openshift:master Jun 12, 2026
24 checks passed
@carbonin carbonin deleted the fix-initrd-ppi-reboot-loop branch June 12, 2026 13:48
@carbonin

Copy link
Copy Markdown
Member Author

/cherry-pick release-ocm-2.17

@carbonin

Copy link
Copy Markdown
Member Author

/cherry-pick release-ocm-2.16

@openshift-cherrypick-robot

Copy link
Copy Markdown

@carbonin: new pull request created: #10473

Details

In response to this:

/cherry-pick release-ocm-2.17

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-cherrypick-robot

Copy link
Copy Markdown

@carbonin: new pull request created: #10474

Details

In response to this:

/cherry-pick release-ocm-2.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@carbonin

Copy link
Copy Markdown
Member Author

/cherry-pick release-ocm-2.15

@carbonin

Copy link
Copy Markdown
Member Author

/cherry-pick release-ocm-2.14

@openshift-cherrypick-robot

Copy link
Copy Markdown

@carbonin: new pull request created: #10505

Details

In response to this:

/cherry-pick release-ocm-2.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants