Workload Name
migration-stress
Workload Description
High memory-dirty-rate workload purpose-built to stress KubeVirt live migration. Continuously writes to memory pages at a configurable rate, forcing the live migration pre-copy phase to iterate more rounds before convergence. Produces a sustained, measurable memory dirtying pattern that makes the VM actively hard to migrate — the exact scenario that storage, networking, and monitoring partners need to validate their products against during migration events.
This is uniquely KubeVirt/CNV — no equivalent exists in generic VM workload tooling. Live migration is a first-class operation in OpenShift Virtualization (triggered by node drain, maintenance, rebalancing), and partner products must continue functioning correctly while VMs migrate. The existing memory workload uses stress-ng to allocate and hold memory at 80% pressure — it tests memory capacity, not memory mutation rate. migration-stress targets a completely different axis: how fast pages are being dirtied, which directly determines migration difficulty and duration.
Tooling and Packages
- Tool: stress-ng with the vm stressor in aggressive write mode, or a custom mmap/write loop
- RPM packages: stress-ng
- systemd service command: stress-ng --vm 2 --vm-bytes 75% --vm-method write64 --vm-keep --aggressive
  - --vm 2: two VM stressor workers
  - --vm-bytes 75%: each worker maps 75% of available memory
  - --vm-method write64: write 64-bit values (high dirty rate)
  - --vm-keep: keep mappings (re-dirty same pages rather than remap)
  - --aggressive: maximize throughput
- Alternative: custom C/shell program using mmap(MAP_PRIVATE|MAP_ANONYMOUS) + sequential write loop for maximum dirty rate control
- Configurable parameters:
  - dirty-rate-target: target MB/s of memory dirtying (default: unbounded, as fast as possible)
  - vm-workers: number of stressor workers (default: 2)
  - vm-bytes-percent: percentage of memory to map per worker (default: 75)
  - vm-method: stress-ng vm method, one of write64 (fast), write1024 (faster), zero (fastest) (default: write64)
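The custom-loop alternative and the dirty-rate-target parameter could look roughly like the sketch below. It is written in Python for readability rather than as the C/shell program proposed above, and the function name dirty_pages, its arguments, and the 10 ms pacing slice are illustrative assumptions, not part of any existing tool. It relies on dirty-page tracking working at page granularity, so one byte written per page is enough to mark the whole page dirty.

```python
import mmap
import time


def dirty_pages(total_bytes, target_mb_per_s=None, duration_s=1.0):
    """Repeatedly write one byte per page of an anonymous private mapping.

    target_mb_per_s=None means unbounded (as fast as the loop can go).
    Returns the number of page writes issued.
    """
    page = mmap.PAGESIZE
    n_pages = max(1, total_bytes // page)
    # Same mapping type the proposal names: MAP_PRIVATE | MAP_ANONYMOUS.
    buf = mmap.mmap(-1, n_pages * page,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

    slice_s = 0.01  # pacing interval (assumption, not a tuned value)
    per_slice = None
    if target_mb_per_s is not None:
        # Pages to dirty per slice so that pages * page size matches the target.
        per_slice = max(1, int(target_mb_per_s * 1024 * 1024 * slice_s / page))

    written = 0
    idx = 0
    stamp = 1
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            slice_start = time.monotonic()
            batch = per_slice if per_slice is not None else n_pages
            for _ in range(batch):
                buf[idx * page] = stamp  # one byte dirties the whole page
                idx += 1
                if idx == n_pages:
                    # Wrap around and re-dirty the same pages (like --vm-keep).
                    idx = 0
                    stamp = stamp % 255 + 1
            written += batch
            if per_slice is not None:
                remaining = slice_s - (time.monotonic() - slice_start)
                if remaining > 0:
                    time.sleep(remaining)
    finally:
        buf.close()
    return written
```

For a long-running service, a call such as dirty_pages(3 * 1024**3, target_mb_per_s=500, duration_s=float("inf")) would keep re-dirtying a 3 GiB region at roughly the requested rate; the sizes here are placeholders.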
VM Count Model
Single VM (like cpu, memory, disk)
Required Resources
Memory should be sized larger than defaults (e.g., 4Gi+) to produce meaningful migration difficulty. With only 1Gi of memory, even a high dirty rate converges quickly. At 4-8Gi with aggressive dirtying, migration takes noticeably longer and may require multiple pre-copy rounds.
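The relationship between memory size, dirty rate, and pre-copy rounds can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not QEMU's or KubeVirt's actual convergence logic: it assumes constant link bandwidth and dirty rate, and the function name, the 300 ms downtime budget, and the round cap are all illustrative.

```python
def precopy_rounds(mem_bytes, dirty_bytes_per_s, link_bytes_per_s,
                   working_set_bytes=None, max_downtime_s=0.3, max_rounds=50):
    """Estimate pre-copy rounds before the final stop-and-copy.

    Returns None if the model does not converge within max_rounds.
    """
    if working_set_bytes is None:
        # Assume the workload can dirty all of guest memory.
        working_set_bytes = mem_bytes
    to_send = mem_bytes  # the first round copies everything
    for rounds in range(1, max_rounds + 1):
        elapsed = to_send / link_bytes_per_s
        # Bytes dirtied while this round was in flight, capped by the working set.
        to_send = min(dirty_bytes_per_s * elapsed, working_set_bytes)
        # Converged once the remainder fits inside the allowed downtime.
        if to_send / link_bytes_per_s <= max_downtime_s:
            return rounds
    return None
```

With a 1 GiB/s link and a 0.5 GiB/s dirty rate, this model gives 2 rounds for a 1 GiB guest, 4 rounds for 4 GiB, and 5 rounds for 8 GiB. When the dirty rate meets or exceeds the link bandwidth it returns None, which corresponds to a migration that cannot converge by pre-copy alone.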
Cloud-Init Details
packages:
  - stress-ng
write_files:
  - path: /etc/systemd/system/virtwork-migration-stress.service
    content: |
      [Unit]
      Description=Virtwork migration stress workload (high memory dirty rate)
      After=multi-user.target

      [Service]
      Type=simple
      ExecStart=/usr/bin/stress-ng \
        --vm 2 \
        --vm-bytes 75%% \
        --vm-method write64 \
        --vm-keep \
        --aggressive \
        --metrics-brief
      Restart=always
      RestartSec=5

      [Install]
      WantedBy=multi-user.target
runcmd:
  - systemctl enable --now virtwork-migration-stress.service
Use Case
- Storage partners (CSI drivers): During live migration, the storage layer must handle the VM's disk being accessed from a new node. A VM that is actively dirtying memory at high rates creates longer migration windows and more complex storage handoff scenarios. Partners need to validate that their CSI driver maintains I/O consistency during extended migration — not just the quick migrations that idle VMs produce.
- Network partners (CNI, SDN, OVN-Kubernetes): Live migration moves a VM's network identity (IP, MAC) to a new node. A VM with active memory writes produces a longer migration window during which network traffic must be correctly routed to both the source and destination. Partners need sustained migrations to validate network cutover correctness and downtime measurement.
- Monitoring/Observability partners: Partners need to validate that their agents correctly handle metric continuity during migration — no gaps, no duplicate data points, correct host/node attribution before and after migration. A VM that migrates quickly (idle VM, <1 second) doesn't exercise this. A VM that takes 30-60 seconds to migrate because of dirty memory exposes real-world monitoring gaps.
- Platform engineering / Node lifecycle: Validates that oc adm drain with live migration works correctly under realistic conditions. An idle cluster drains instantly; a cluster with migration-stress VMs reveals the real-world drain duration and any timeout issues.
- Red Hat CNV engineering: Provides a reproducible, tunable migration difficulty benchmark. Adjusting vm-bytes-percent and vm-method produces a spectrum from "easy migration" to "convergence-challenged migration", which is useful for regression testing migration improvements.
Additional Context
- This workload is distinct from the existing memory workload in purpose and behavior:
  - memory: allocates and holds memory at target pressure; tests memory capacity and OOM behavior
  - migration-stress: continuously writes to memory pages; tests migration convergence and duration
- A VM can run both simultaneously (memory pressure + high dirty rate), but they test different things.
- The key metric to observe is migration duration: run virtctl migrate <vm-name>, then watch oc get vmim (VirtualMachineInstanceMigration) until the migration completes. With this workload active, migration should take measurably longer than for an idle VM.
- Consider pairing this workload with a monitoring dashboard that shows: migration start time, dirty page rate during pre-copy iterations, convergence point, total migration duration, and post-migration workload resumption. This is the demo that sells the value to partners.
- stress-ng's --vm-method options provide a tunable dirty rate spectrum:
  - write64: moderate dirty rate (~2-5 GB/s depending on hardware)
  - write1024: higher dirty rate
  - zero: maximum dirty rate (memset to zero)
- This lets users tune migration difficulty from "slightly harder than idle" to "convergence-challenged."
- The --vm-keep flag is critical: without it, stress-ng remaps memory on each iteration, which tests page fault handling rather than page dirtying. --vm-keep ensures the same pages are repeatedly dirtied, which is what stresses the migration pre-copy algorithm.
- Memory recommendation: 4Gi minimum, 8Gi for meaningful migration stress. At 2Gi with write64, migration typically converges in 1-2 pre-copy rounds. At 8Gi, it may take 5+ rounds or require post-copy migration fallback.
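
Measuring migration duration, the key metric named above, could be scripted along these lines. This is a sketch, not a tested tool. Instead of watching oc get vmim, it polls the VMI's status.migrationState and assumes the fields startTimestamp, endTimestamp, completed, failed, and migrationUid are present there; check these names against the KubeVirt version in use. The function names are hypothetical, and oc and virtctl are assumed to be on PATH with an active cluster login.

```python
import json
import subprocess
import time
from datetime import datetime, timezone


def parse_k8s_time(ts):
    """Parse a Kubernetes timestamp such as 2024-01-01T00:00:42Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def migration_seconds(state):
    """Seconds between startTimestamp and endTimestamp, or None if unfinished."""
    start, end = state.get("startTimestamp"), state.get("endTimestamp")
    if not start or not end:
        return None
    return (parse_k8s_time(end) - parse_k8s_time(start)).total_seconds()


def get_migration_state(vm_name, namespace):
    """Return the VMI's status.migrationState as a dict (empty if absent)."""
    out = subprocess.run(
        ["oc", "get", "vmi", vm_name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out).get("status", {}).get("migrationState") or {}


def marker(state):
    """Identify one migration so a stale, earlier result is not reported."""
    return (state.get("migrationUid"), state.get("startTimestamp"))


def time_migration(vm_name, namespace, poll_s=2.0, timeout_s=900.0):
    """Trigger a live migration and return (duration_seconds, failed)."""
    previous = marker(get_migration_state(vm_name, namespace))
    subprocess.run(["virtctl", "migrate", vm_name, "-n", namespace], check=True)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_migration_state(vm_name, namespace)
        finished = state.get("completed") or state.get("failed")
        if finished and marker(state) != previous:
            return migration_seconds(state), bool(state.get("failed"))
        time.sleep(poll_s)
    raise TimeoutError(f"migration of {vm_name} did not finish in {timeout_s}s")
```

Running time_migration once against an idle VM and once with migration-stress active (VM and namespace names supplied by the user) gives the two durations to compare.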