Skip to content

stp, sig-virt: Introduce the AllowWorkloadDisruption migration STP#82

Open
SamAlber wants to merge 1 commit into
RedHatQE:mainfrom
SamAlber:stp/sig-virt/workload-disruption-migration
Open

stp, sig-virt: Introduce the AllowWorkloadDisruption migration STP#82
SamAlber wants to merge 1 commit into
RedHatQE:mainfrom
SamAlber:stp/sig-virt/workload-disruption-migration

Conversation

@SamAlber

@SamAlber SamAlber commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

Add Software Test Plan for the AllowWorkloadDisruption (AWD) migration feature, covering PostCopy and Paused migration modes across RHEL and Windows guests with migration, node drain, and CPU/memory hotplug triggers.

Special notes for your reviewer

Summary by CodeRabbit

  • Documentation
    • Added a new QE Software Test Plan for “Allow Workload Disruption Migration,” defining AWD/PostCopy/Paused terminology and providing quality review checklists with acceptance criteria (expected final completion mode, guest background process PID preservation, and CPU/memory hotplug visibility for supported RHEL and Windows guests). Also documents known limitations, tuning/validation guidance, detailed test scope (what’s included and excluded), required environment/entry requirements, and risk/mitigation tracking with reviewer/approver sign-off.

@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Added a new Software Test Plan at stps/sig-virt/workload-disruption-migration.md covering AWD migration terminology, acceptance criteria, test scope, environment requirements, risks, traceability, and sign-off.

Changes

AWD Migration Quality Plan

Layer / File(s) Summary
Feature terminology and acceptance criteria
stps/sig-virt/workload-disruption-migration.md
Defines AWD/PostCopy/Paused terminology and QE acceptance criteria for migration completion mode, guest background process PID preservation, and hotplugged CPU/memory visibility in supported guest OSes. Records the s390x memory hotplug limitation and Paused-mode tuning dependency.
Test scope and environment baseline
stps/sig-virt/workload-disruption-migration.md
Defines migration test scope and exclusions, then specifies the required topology and cluster, storage, networking, virtualization, and Windows infrastructure conditions.
Entry criteria, risks, and mitigations
stps/sig-virt/workload-disruption-migration.md
Adds tools and entry criteria, plus risks and mitigations centered on final-mode verification, Paused-mode tuning sensitivity, and infrastructure constraints.
Requirement traceability and sign-off
stps/sig-virt/workload-disruption-migration.md
Maps CNV requirements to RHEL and Windows scenarios with tier and priority, then adds reviewer and approver sign-off details.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

commented-coderabbitai[bot]

Suggested reviewers

  • kbidarkar
  • dshchedr
  • vsibirsk
  • akri3i
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: introducing the AllowWorkloadDisruption migration STP for sig-virt.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@openshift-virtualization-qe-bot-2

Copy link
Copy Markdown

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest all - Run all available tests

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)

📊 Review Process

Approvers and Reviewers

Approvers:

  • dshchedr
  • vsibirsk

Reviewers:

  • SamAlber
  • SiboWang1997
  • akri3i
  • dshchedr
  • jerry7z
  • kbidarkar
  • vsibirsk
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
AI Features
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6[1m])

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is removed on new commits unless the push is detected as a clean rebase
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

@openshift-virtualization-qe-bot-2

Copy link
Copy Markdown

SiboWang1997 can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@openshift-virtualization-qe-bot-2

Copy link
Copy Markdown

jerry7z can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-virt/workload-disruption-migration.md`:
- Line 175: Update the PM approver line that currently only lists an email
("Product Manager/Owner: [Ronen Sde-Or](ronen@redhat.com)") to include the
GitHub handle per STP Section IV requirements; locate the "Product
Manager/Owner" entry in workload-disruption-migration.md and replace or augment
the email-only link with the approver's GitHub handle (e.g., include the
`@github-username` or a link to their GitHub profile) so the entry contains both
the name and GitHub handle.
- Around line 46-49: The Known Limitations section under the heading "#### **2.
Known Limitations**" currently lists s390x memory hotplug but lacks the
mandatory sign-off; add an explicit sign-off line for that limitation in the
form "*Sign-off:* [Name/Date]" or replace it with the required phrase "None —
reviewed and confirmed by [Name/Date]" if appropriate, and do the same for the
Test Limitations block referenced around lines 86-88 (ensure each listed
limitation has its own sign-off or the explicit "None — reviewed and confirmed…"
statement) so the STP sections I.2 and II.1 comply with the sign-off guideline.
- Around line 24-30: Summary: The paragraph uses internal API/field names
(allowWorkloadDisruption, MigrationPolicy, allowPostCopy) and implementation
mechanics (precopy, PostCopy, Paused) instead of user-facing behavior; rewrite
to describe only observable user experience and acceptance criteria.
Instructions: Remove all internal identifiers and implementation terms (e.g.,
allowWorkloadDisruption, MigrationPolicy, allowPostCopy, precopy, PostCopy,
Paused) and replace with plain-language descriptions of what users will see
(e.g., "migration may use more disruptive methods to ensure completion", "VM may
be briefly stunned or live state resumed on the destination", "migration will
complete even if memory is rapidly changing"); update acceptance criteria to be
product-observable (e.g., successful node drain completes, VM state visible as
paused or resumed to users, migration completes rather than failing) and apply
the same rewrite style at the other referenced locations (lines ~39, ~57, ~137).
- Around line 41-45: Convert the two items under the "Customer Use Cases"
section into user-story format (e.g., "As an admin, I want to drain a node for
maintenance so that VMs with AWD policy migrate using PostCopy or Paused mode
and resume without process loss" and "As an admin, I want to hotplug CPU/memory
to a running VM so AWD migration reflects new resources without process loss"),
and update the "Non-Functional Requirements (NFRs)" section to explicitly list
and address Monitoring, Observability, UI, Documentation, Performance, Security,
and Scalability (each with a short measurable acceptance criterion or owner),
instead of deferring them to section II.3, so reviewers can clearly verify NFR
coverage.
- Around line 141-149: The Risks table in workload-disruption-migration.md is
missing the required "Other" risk category and lacks explicit sign-off details
for each risk; update the table to include a seventh row labeled "Other" with a
clear risk description and mitigation strategy, and add a "Sign-off" column (or
append sign-off cells) for every risk row containing the approver's name/role
and date (or a placeholder like "TBD by [ROLE] - YYYY-MM-DD") so each risk entry
includes Risk description, Mitigation strategy, and Sign-off; ensure you edit
the existing table header and rows (the risk table block around the Risk
Category/Test Coverage/Test Environment entries) to add these fields
consistently.
- Around line 92-107: The Test Strategy table is missing a "Scale Testing" row
and the "Upgrade Testing" row lacks an explicit statement about upgrade path
evaluation; add a new table row labeled "Scale Testing" with Description
covering scale/volume considerations and a clear Applicable (Y/N or N/A) and
Comments noting planned approach or deferred rationale, and update the existing
"Upgrade Testing" row (the cell under Description or Comments for "Upgrade
Testing") to explicitly state whether the upgrade path was evaluated or deferred
and include any mitigation/plan; locate and edit the rows named "Scale Testing"
and "Upgrade Testing" in the existing markdown table (the entries currently
titled "Upgrade Testing" and the nearby "Performance Testing") to make these
additions/clarifications.
- Around line 79-85: Replace the four TODO PM/Lead Agreement entries in the
Out-of-Scope table for the rows containing "Windows node drain", "Windows memory
hotplug", "s390x memory hotplug", and "Performance / scale testing" with a
proper "PM/Lead Agreement" value including the approver's name and date (e.g.,
"Name / YYYY-MM-DD"), and convert the table into the required itemized format by
ensuring each item has explicit Rationale and PM/Lead Agreement fields (move the
rationale text into the Rationale field and replace each TODO with the name/date
agreement), so the Out-of-Scope section meets the STP Section II.1 requirements.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 090197de-e007-45a6-80c5-dbbc6fba13a7

📥 Commits

Reviewing files that changed from the base of the PR and between 424a4f9 and 7cb614b.

📒 Files selected for processing (1)
  • stps/sig-virt/workload-disruption-migration.md

Comment on lines +24 to +30
When a VM is live-migrated, precopy may fail to converge if the guest dirties pages faster than the migration
bandwidth can transfer them. With `allowWorkloadDisruption` enabled on a MigrationPolicy, KubeVirt is permitted
to use disruptive migration modes (PostCopy or Paused) to guarantee the migration completes. This feature is
critical for operations that require migration — such as node drain and CPU/memory hotplug — where a failed
migration would block the operation entirely. The mode selection is controlled by the `allowPostCopy` field:
when enabled, the migration switches to PostCopy; when disabled, the VM is stunned and migrated in Paused mode.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Rewrite feature description and requirements in user-facing terms only.

This STP repeatedly uses API/field names and internal mechanics. That breaks the repo rule for user-perspective-only STPs and makes acceptance criteria less product-observable.

As per coding guidelines, "STPs must describe what users experience from a user perspective only — no API field names, CRD names, internal component references, or implementation mechanisms".

Also applies to: 39-39, 57-57, 137-137

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 24 - 30,
Summary: The paragraph uses internal API/field names (allowWorkloadDisruption,
MigrationPolicy, allowPostCopy) and implementation mechanics (precopy, PostCopy,
Paused) instead of user-facing behavior; rewrite to describe only observable
user experience and acceptance criteria. Instructions: Remove all internal
identifiers and implementation terms (e.g., allowWorkloadDisruption,
MigrationPolicy, allowPostCopy, precopy, PostCopy, Paused) and replace with
plain-language descriptions of what users will see (e.g., "migration may use
more disruptive methods to ensure completion", "VM may be briefly stunned or
live state resumed on the destination", "migration will complete even if memory
is rapidly changing"); update acceptance criteria to be product-observable
(e.g., successful node drain completes, VM state visible as paused or resumed to
users, migration completes rather than failing) and apply the same rewrite style
at the other referenced locations (lines ~39, ~57, ~137).

Comment thread stps/sig-virt/workload-disruption-migration.md Outdated
Comment thread stps/sig-virt/workload-disruption-migration.md
Comment thread stps/sig-virt/workload-disruption-migration.md
Comment thread stps/sig-virt/workload-disruption-migration.md
Comment thread stps/sig-virt/workload-disruption-migration.md Outdated
@openshift-virtualization-qe-bot

Copy link
Copy Markdown

jerry7z can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@openshift-virtualization-qe-bot

Copy link
Copy Markdown

SiboWang1997 can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (7)
stps/sig-virt/workload-disruption-migration.md (7)

175-175: ⚠️ Potential issue | 🟡 Minor

MEDIUM: PM approver entry needs GitHub handle.

Line 175 includes email only; Section IV requires reviewer/approver identity with GitHub handle.

As per coding guidelines, "STP Sign-off and Approval (Section IV) must list Reviewers with names and GitHub handles and Approvers (QE Lead, PM, Dev Lead at minimum) with no placeholder text remaining".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` at line 175, The PM approver
entry currently shows only an email ("Product Manager/Owner: [Ronen
Sde-Or](rsdeor@redhat.com)"); update the Section IV approvers/reviewers list to
include the PM's GitHub handle alongside their name (for example replace the
email-only link with a GitHub-style reference such as "Product Manager/Owner:
Ronen Sde‑Or (`@rsdeor`)" or a Markdown link to their GitHub profile), ensuring
the Approvers block (QE Lead, PM, Dev Lead) contains real names and GitHub
handles with no placeholder text remaining.

18-20: ⚠️ Potential issue | 🔴 Critical

CRITICAL: Rewrite internal/API-centric wording into user-observable behavior only.

The STP still relies on API fields/CRD terms and internal mechanics (e.g., MigrationPolicy, field names, VMI status internals) across core requirement, strategy, and scenario text. This blocks approval for this repo’s STP standard.

As per coding guidelines, "STPs must describe what users experience from the user perspective only — no API field names, CRD names, internal component references, or implementation mechanisms".

Also applies to: 24-30, 39-39, 42-43, 57-57, 66-66, 70-76, 88-88, 98-98, 137-137, 145-148

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 18 - 20, Rewrite
the affected sections to describe only user-observable behavior and outcomes,
removing all API/CRD/internal names and implementation mechanics: replace
references like "AWD", "MigrationPolicy", "PostCopy", "Paused", and any mentions
of VMI status internals with plain-language descriptions of what the user will
see (e.g., "the VM may be briefly paused to finish transfer", "remaining memory
is fetched on-demand after start on target", "a migration mode that resumes
immediately once final data arrives"), and update the core requirement,
strategy, and scenario paragraphs accordingly so they describe behaviors,
triggers, and user-facing states rather than field names or internal components.

48-48: ⚠️ Potential issue | 🟠 Major

HIGH: Known/Test limitations are missing mandatory sign-off lines.

Each limitation item must include explicit sign-off metadata (*Sign-off:* [Name/Date]) or the required “None — reviewed and confirmed…” statement when applicable.

As per coding guidelines, "STP Known Limitations (Section I.2) must include sign-off with *Sign-off:* [Name/Date] for each limitation" and "STP Test Limitations (Section II.1) must each have sign-off: *Sign-off:* [Name/Date]."

Also applies to: 88-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` at line 48, The Known
Limitations and Test Limitations entries are missing the mandatory sign-off
metadata; update each limitation item in "Known Limitations (Section I.2)" and
"Test Limitations (Section II.1)" — including the s390x memory hotplug note (the
line stating "s390x does not support maxGuest memory...") — to append a sign-off
line in the exact format "*Sign-off:* [Name/Date]" or, if there truly are none,
replace the absence with the required statement "None — reviewed and confirmed
[Name/Date]"; ensure every listed limitation has one of these sign-offs so the
document meets the STP sign-off requirements.

92-107: ⚠️ Potential issue | 🟠 Major

HIGH: Test Strategy is still incomplete (missing Scale Testing).

The table does not include a dedicated Scale Testing entry, which is mandatory even if deferred/N/A with justification.

As per coding guidelines, "STP Test Strategy (Section II.2) must address all testing types (Functional, Automation, Regression, Performance, Scale, Security, Usability, Monitoring, Compatibility, Upgrade, Dependencies, Cross Integrations, Cloud Testing)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 92 - 107, Add a
missing "Scale Testing" row to the test matrix table: insert a row labeled
"Scale Testing" under the other test types with an Applicable (Y/N or N/A) value
and a concise comment that either describes the planned scale tests (e.g.,
node/pod/migration concurrency, VMI counts, storage IO at scale) or justifies
why scale testing is deferred/N/A for this STP; ensure the new row follows the
existing table formatting and references "Scale Testing" so the STP Test
Strategy (Section II.2) explicitly addresses this required testing type.

79-85: ⚠️ Potential issue | 🔴 Critical

CRITICAL: Out-of-Scope section still contains TODO approvals and non-compliant structure.

PM/ Lead Agreement cannot remain TODO, and each out-of-scope item must be in the required itemized format with rationale and named/date agreement.

As per coding guidelines, "STP Out of Scope items (Section II.1) must have Rationale and PM/Lead Agreement with name and date (format: - **Item** / *Rationale:* / *PM/Lead Agreement:*)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 79 - 85, Replace
the table under "Out-of-Scope Item" in workload-disruption-migration.md (Section
II.1) so that no PM/ Lead Agreement cells contain "TODO": convert each row into
the required itemized format "- **Item** / *Rationale:* / *PM/Lead Agreement:*"
and fill the PM/Lead Agreement with an actual name and date (e.g., "Jane Doe,
2026-04-19") for each of the four items (Windows node drain, Windows memory
hotplug, s390x memory hotplug, Performance / scale testing); ensure the
rationale text from the table is preserved and that the header/table is removed
or replaced so the section conforms exactly to the prescribed itemized
structure.

141-149: ⚠️ Potential issue | 🟠 Major

HIGH: Risks section missing required category and sign-off data.

The table omits the Other risk category and does not include explicit per-risk sign-off, which are both required.

As per coding guidelines, "STP Risks (Section II.5) must address ALL 7 risk categories (Timeline/Schedule, Test Coverage, Test Environment, Untestable Aspects, Resource Constraints, Dependencies, Other), even if N/A with justification" and "STP Risks must each have: Risk description, Mitigation strategy, and Sign-off."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 141 - 149, The
Risks table in STP Risks (Section II.5) is missing the required "Other" risk
category and per-risk sign-off entries; update the markdown table to include a
row for "Other" (even if N/A with justification) and add a Sign-off column entry
for every existing row (Timeline/Schedule, Test Coverage, Test Environment,
Untestable Aspects, Resource Constraints, Dependencies, and Other) so each risk
has Risk description, Mitigation strategy, and Sign-off; ensure the new column
values clearly indicate who signs off (or "N/A — justification") and use the
same table format/pipe alignment as the existing rows.

41-44: ⚠️ Potential issue | 🟠 Major

HIGH: Section I.1 still misses required use-case and NFR structure.

Line 41 is not in user-story format (As a [role], I want...), and Line 44 does not explicitly and verifiably address all required NFR categories (Monitoring, Observability, UI, Documentation, Performance, Security, Scalability).

As per coding guidelines, "STP Customer use cases must be in user story format ("As a [role], I want to [action]")" and "STP NFRs must explicitly address: Monitoring, Observability, UI, Documentation, Performance, Security, Scalability. NFRs not covered must have justification".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 41 - 44, Convert
the "Customer Use Cases" entry into one or more user stories using the required
format (e.g., "As an admin, I want to drain a node so that VMs with AWD policy
migrate using PostCopy or Paused mode and resume without process loss"; "As an
admin, I want to hotplug CPU/memory so that AWD migration preserves processes
and reflects new resources in the guest"), and update the "Non-Functional
Requirements (NFRs)" row to explicitly list each NFR category (Monitoring,
Observability, UI, Documentation, Performance, Security, Scalability) with
either a short requirement or a justification why it is not applicable (for
example: Monitoring: none required / justification; Observability: covered by X
tests; UI: none; Documentation: covered in KubeVirt user-guide; Performance:
completionTimeoutPerGiB and bandwidth caps defined for testability; Security:
RBAC covered by core KubeVirt tests; Scalability: state impact limited to
node-level), referencing the table headings "Customer Use Cases" and
"Non-Functional Requirements (NFRs)" and terms like MigrationPolicy and
completionTimeoutPerGiB to keep context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-virt/workload-disruption-migration.md`:
- Around line 110-122: The test-environment table contains empty cells; update
the table rows (notably the "Required Operators", "Platform", and "Special
Configurations" columns and any other blank example cells) so every cell
contains either an explicit value or "N/A", and make the "OCP & OpenShift
Virtualization Version(s)" entry explicitly state both the supported OCP version
and the OpenShift Virtualization operator version (e.g., "OCP 4.17+, OVS X.Y.Z"
or "OCP 4.17+, OpenShift Virtualization X.Y.Z"); ensure the "Specification
Examples" column entries are completed for the affected rows (Cluster Topology,
Required Operators, Platform, Special Configurations) so there are no empty
cells remaining.

---

Duplicate comments:
In `@stps/sig-virt/workload-disruption-migration.md`:
- Line 175: The PM approver entry currently shows only an email ("Product
Manager/Owner: [Ronen Sde-Or](rsdeor@redhat.com)"); update the Section IV
approvers/reviewers list to include the PM's GitHub handle alongside their name
(for example replace the email-only link with a GitHub-style reference such as
"Product Manager/Owner: Ronen Sde‑Or (`@rsdeor`)" or a Markdown link to their
GitHub profile), ensuring the Approvers block (QE Lead, PM, Dev Lead) contains
real names and GitHub handles with no placeholder text remaining.
- Around line 18-20: Rewrite the affected sections to describe only
user-observable behavior and outcomes, removing all API/CRD/internal names and
implementation mechanics: replace references like "AWD", "MigrationPolicy",
"PostCopy", "Paused", and any mentions of VMI status internals with
plain-language descriptions of what the user will see (e.g., "the VM may be
briefly paused to finish transfer", "remaining memory is fetched on-demand after
start on target", "a migration mode that resumes immediately once final data
arrives"), and update the core requirement, strategy, and scenario paragraphs
accordingly so they describe behaviors, triggers, and user-facing states rather
than field names or internal components.
- Line 48: The Known Limitations and Test Limitations entries are missing the
mandatory sign-off metadata; update each limitation item in "Known Limitations
(Section I.2)" and "Test Limitations (Section II.1)" — including the s390x
memory hotplug note (the line stating "s390x does not support maxGuest
memory...") — to append a sign-off line in the exact format "*Sign-off:*
[Name/Date]" or, if there truly are none, replace the absence with the required
statement "None — reviewed and confirmed [Name/Date]"; ensure every listed
limitation has one of these sign-offs so the document meets the STP sign-off
requirements.
- Around line 92-107: Add a missing "Scale Testing" row to the test matrix
table: insert a row labeled "Scale Testing" under the other test types with an
Applicable (Y/N or N/A) value and a concise comment that either describes the
planned scale tests (e.g., node/pod/migration concurrency, VMI counts, storage
IO at scale) or justifies why scale testing is deferred/N/A for this STP; ensure
the new row follows the existing table formatting and references "Scale Testing"
so the STP Test Strategy (Section II.2) explicitly addresses this required
testing type.
- Around line 79-85: Replace the table under "Out-of-Scope Item" in
workload-disruption-migration.md (Section II.1) so that no PM/ Lead Agreement
cells contain "TODO": convert each row into the required itemized format "-
**Item** / *Rationale:* / *PM/Lead Agreement:*" and fill the PM/Lead Agreement
with an actual name and date (e.g., "Jane Doe, 2026-04-19") for each of the four
items (Windows node drain, Windows memory hotplug, s390x memory hotplug,
Performance / scale testing); ensure the rationale text from the table is
preserved and that the header/table is removed or replaced so the section
conforms exactly to the prescribed itemized structure.
- Around line 141-149: The Risks table in STP Risks (Section II.5) is missing
the required "Other" risk category and per-risk sign-off entries; update the
markdown table to include a row for "Other" (even if N/A with justification) and
add a Sign-off column entry for every existing row (Timeline/Schedule, Test
Coverage, Test Environment, Untestable Aspects, Resource Constraints,
Dependencies, and Other) so each risk has Risk description, Mitigation strategy,
and Sign-off; ensure the new column values clearly indicate who signs off (or
"N/A — justification") and use the same table format/pipe alignment as the
existing rows.
- Around line 41-44: Convert the "Customer Use Cases" entry into one or more
user stories using the required format (e.g., "As an admin, I want to drain a
node so that VMs with AWD policy migrate using PostCopy or Paused mode and
resume without process loss"; "As an admin, I want to hotplug CPU/memory so that
AWD migration preserves processes and reflects new resources in the guest"), and
update the "Non-Functional Requirements (NFRs)" row to explicitly list each NFR
category (Monitoring, Observability, UI, Documentation, Performance, Security,
Scalability) with either a short requirement or a justification why it is not
applicable (for example: Monitoring: none required / justification;
Observability: covered by X tests; UI: none; Documentation: covered in KubeVirt
user-guide; Performance: completionTimeoutPerGiB and bandwidth caps defined for
testability; Security: RBAC covered by core KubeVirt tests; Scalability: state
impact limited to node-level), referencing the table headings "Customer Use
Cases" and "Non-Functional Requirements (NFRs)" and terms like MigrationPolicy
and completionTimeoutPerGiB to keep context.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ef2e93af-35f0-4f4f-9f48-ea1140c1d9d7

📥 Commits

Reviewing files that changed from the base of the PR and between 7cb614b and df64602.

📒 Files selected for processing (1)
  • stps/sig-virt/workload-disruption-migration.md

Comment thread stps/sig-virt/workload-disruption-migration.md
@openshift-virtualization-qe-bot-5

Copy link
Copy Markdown

jerry7z can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@openshift-virtualization-qe-bot-5

Copy link
Copy Markdown

SiboWang1997 can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@SamAlber SamAlber force-pushed the stp/sig-virt/workload-disruption-migration branch from 7f3a3f4 to 8af640b Compare April 19, 2026 14:58
@openshift-virtualization-qe-bot-2

Copy link
Copy Markdown

jerry7z can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@openshift-virtualization-qe-bot-2

Copy link
Copy Markdown

SiboWang1997 can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@SamAlber SamAlber force-pushed the stp/sig-virt/workload-disruption-migration branch from 8af640b to 3573a3b Compare June 22, 2026 12:46
@SamAlber SamAlber force-pushed the stp/sig-virt/workload-disruption-migration branch from a599b71 to 997c156 Compare June 25, 2026 14:33
@SamAlber SamAlber force-pushed the stp/sig-virt/workload-disruption-migration branch from 997c156 to 432dd10 Compare June 25, 2026 14:39
@SamAlber SamAlber force-pushed the stp/sig-virt/workload-disruption-migration branch from 432dd10 to 4055f06 Compare June 25, 2026 15:07
| Out-of-Scope Item | Rationale | PM/ Lead Agreement |
|:------------------------------|:----------------------------------------------------------------------------------------------------------------|:------------------|
| Windows node drain | Already validated on RHEL; node drain is a cluster-level trigger. | TODO |
| Windows memory hotplug | Already validated on RHEL. | TODO |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would Windows memory hotplug play a role here?
If this is about migrating a VM with AWD that had memory hotplugged, I believe we should include this, as this might be different in that case?
Also what does "Already validated on RHEL mean?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
memory hotplug in windows works differently from rhel

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @mtessun and @vsibirsk . Added a Windows memory hotplug test, the mechanism is indeed different. Windows uses the VirtIO balloon driver while Linux uses native ACPI memory hotplug.

| **Understand Value** | [x] | Customers running latency-sensitive or memory-intensive workloads need guaranteed migration completion for maintenance operations (node drain, hotplug) without losing in-guest processes. | |
| **Customer Use Cases** | [x] | 1. As an admin, I want to drain a node for maintenance so that VMs with AWD policy migrate and resume without process loss.<br>2. As an admin, I want to hotplug CPU/memory to a running VM so that AWD migration completes and the guest reflects the new resources without process loss. | |
| **Testability** | [x] | Testable by configuring an AWD migration policy with a tight completion timeout and capped bandwidth, then verifying the migration mode after migration. | |
| **Acceptance Criteria** | [x] | 1. Migration completes in the expected mode (PostCopy or Paused) under AWD policy.<br>2. Guest background process PID is preserved after migration.<br>3. Hotplugged CPU/memory is reflected in the guest OS after AWD migration.<br>4. Both RHEL and Windows guests are supported. | |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2-4 are implementation details. for STP 1 is enough

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +70 to +75
- **[P0]** AWD Migration Mode (RHEL): Verify explicit migration completes in PostCopy and Paused modes with process preservation.
- **[P0]** AWD Migration Mode (Windows): Verify explicit migration completes in PostCopy and Paused modes with process preservation.
- **[P0]** AWD Node Drain (RHEL): Verify node drain triggers AWD migration in the expected mode with process preservation.
- **[P1]** AWD CPU Hotplug (RHEL): Verify CPU hotplug triggers AWD migration and guest reports new CPU count with process preservation.
- **[P1]** AWD CPU Hotplug (Windows): Verify CPU hotplug triggers AWD migration and guest reports new CPU count with process preservation.
- **[P1]** AWD Memory Hotplug (RHEL): Verify memory hotplug triggers AWD migration and guest reports new memory amount with process preservation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to split between guest os here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

| Out-of-Scope Item | Rationale | PM/ Lead Agreement |
|:------------------------------|:----------------------------------------------------------------------------------------------------------------|:------------------|
| Windows node drain | Already validated on RHEL; node drain is a cluster-level trigger. | TODO |
| Windows memory hotplug | Already validated on RHEL. | TODO |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
memory hotplug in windows works differently from rhel

Comment on lines +156 to +161
| CNV-15225 | As an admin, I want AWD migration to complete in the expected mode (PostCopy/Paused) on RHEL VMs. | Migrate RHEL VM with AWD policy; verify migration mode is PostCopy or Paused and background process PID is preserved. | Tier 1 | P0 |
| CNV-15245 | As an admin, I want node drain to trigger AWD migration in the expected mode on RHEL VMs. | Drain the node hosting a RHEL VM with AWD policy; verify migration mode and background process PID after drain completes. | Tier 2 | P0 |
| CNV-15234 | As an admin, I want CPU hotplug to trigger AWD migration and reflect the new CPU count in the RHEL guest. | Hotplug CPU sockets on RHEL VM with AWD policy; verify migration mode, guest CPU count, and background process PID. | Tier 2 | P1 |
| CNV-15235 | As an admin, I want memory hotplug to trigger AWD migration and reflect the new memory amount in the RHEL guest. | Hotplug memory on RHEL VM with AWD policy; verify migration mode, guest memory amount, and background process PID. | Tier 2 | P1 |
| CNV-15246 | As an admin, I want AWD migration to complete in the expected mode (PostCopy/Paused) on Windows VMs. | Migrate Windows VM with AWD policy; verify migration mode is PostCopy or Paused and background process PID is preserved. | Tier 1 | P0 |
| CNV-15247 | As an admin, I want CPU hotplug to trigger AWD migration and reflect the new CPU count in the Windows guest. | Hotplug CPU sockets on Windows VM with AWD policy; verify migration mode, guest CPU count, and background process PID. | Tier 2 | P1 |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. from current description all these cases are t2
  2. splitting requirements on guest types is redundant (it's already test descriptions level of details)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread stps/sig-virt/workload-disruption-migration.md Outdated
Add Software Test Plan for the AllowWorkloadDisruption (AWD) migration
feature, covering PostCopy and Paused migration modes across RHEL and
Windows guests with migration, node drain, and CPU/memory hotplug
triggers.

Signed-off-by: Samuel Albershtein <salbersh@redhat.com>
Assisted-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants