[Workload]: realtime #161

Description

@mrhillsman

Workload Name

realtime

Workload Description

Latency-sensitive workload that runs cyclictest on a real-time-capable VM to measure scheduling jitter. It continuously reports per-thread latency statistics (min/avg/max in microseconds); the maximum is the worst-case figure that shows whether the VM's CPU pinning, NUMA topology, and hugepages configuration actually deliver deterministic scheduling under load.

This workload consumes the OpenShift Virtualization Cookbook's performance tuning recipes (CPU pinning, NUMA, hugepages, real-time kernel) and produces a quantitative signal — maximum scheduling jitter in microseconds. Without this workload, the Cookbook's tuning recommendations are validated only by inspection ("the VM spec has dedicated CPUs") rather than by measurement ("the VM achieves <20μs worst-case jitter").

This is critical for Red Hat's telco vertical — the largest deployment base for OpenShift Virtualization in production. Telco VNFs (Virtual Network Functions) running in VMs require deterministic scheduling with jitter below 20-50μs. Every telco partner validating on OPL needs to prove their VNF achieves this target on the tuned VM.

Tooling and Packages

  • Tool: cyclictest from the rt-tests package
  • RPM packages: rt-tests
  • systemd service command: cyclictest --mlockall --smp --priority=99 --interval=1000 --distance=0 --duration=0
    • --mlockall: lock all memory to prevent page faults
    • --smp: one thread per CPU
    • --priority=99: highest real-time priority
    • --interval=1000: 1000μs (1ms) measurement interval
    • --distance=0: all threads use the same interval
    • --duration=0: run indefinitely
  • Output: periodic max-latency reports to stdout/journald
  • Configurable parameters:
    • rt-priority: scheduling priority (default: 99)
    • rt-interval: measurement interval in microseconds (default: 1000)
    • rt-duration: test duration in seconds (default: 0 = indefinite)
    • rt-histogram: enable latency histogram output (default: false)

VM Count Model

Single VM (like cpu, memory, disk)

Required Resources

  • Persistent storage (DataVolume)
  • Kubernetes Service (for inter-VM communication)
  • Kubernetes Secret (for credentials or config)
  • Additional CPU/memory beyond defaults
  • GPU or special device passthrough

The VM spec should request dedicated CPUs (spec.domain.cpu.dedicatedCpuPlacement: true on the VMI, which is spec.template.spec.domain.cpu.dedicatedCpuPlacement on a VirtualMachine object) and ideally be configured with NUMA topology and hugepages. Without dedicated CPU placement, cyclictest results are not meaningful: the point is to measure jitter on pinned cores, not the expected high jitter of shared cores.
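
As a rough illustration of the VM-side settings described above, the sketch below prints a JSON merge patch that turns on dedicated CPU placement, guest NUMA passthrough, and hugepages on an existing VirtualMachine. The function name render_rt_patch, the 1Gi default page size, and the VM name in the usage line are assumptions for illustration only. It also assumes the VM already defines CPU cores and a memory size that is a multiple of the page size.

```shell
# render_rt_patch: print a JSON merge patch for a KubeVirt VirtualMachine.
# Hypothetical helper, not part of virtwork.
# $1 = hugepage size (assumed default: 1Gi)
render_rt_patch() {
  page_size="${1:-1Gi}"
  cat <<EOF
{"spec":{"template":{"spec":{"domain":{
  "cpu":{"dedicatedCpuPlacement":true,"numa":{"guestMappingPassthrough":{}}},
  "memory":{"hugepages":{"pageSize":"${page_size}"}}
}}}}}
EOF
}
```

It could be applied with something like kubectl patch vm my-realtime-vm --type merge -p "$(render_rt_patch 1Gi)", where my-realtime-vm is a placeholder name.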

Cloud-Init Details

packages:
  - rt-tests
  - tuned
  - tuned-profiles-realtime
write_files:
  - path: /usr/local/bin/virtwork-realtime.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      set -euo pipefail
      PRIORITY="${RT_PRIORITY:-99}"
      INTERVAL="${RT_INTERVAL:-1000}"
      DURATION="${RT_DURATION:-0}"
      HISTOGRAM="${RT_HISTOGRAM:-}"

      # Apply real-time tuned profile if available
      if command -v tuned-adm &>/dev/null; then
        tuned-adm profile realtime 2>/dev/null || true
      fi

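      # Build the argument list as an array so each option stays one word
      ARGS=(--mlockall --smp "--priority=$PRIORITY" "--interval=$INTERVAL" --distance=0)
      # DURATION=0 (or a non-numeric value) means run indefinitely, so --duration is omitted
      if [ "$DURATION" -gt 0 ] 2>/dev/null; then
        ARGS+=("--duration=$DURATION")
      fi
      # Enable the histogram only when explicitly requested (RT_HISTOGRAM=true)
      if [ "$HISTOGRAM" = "true" ]; then
        ARGS+=(--histogram=200)
      fi

      exec cyclictest "${ARGS[@]}"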
  - path: /etc/systemd/system/virtwork-realtime.service
    content: |
      [Unit]
      Description=Virtwork real-time latency measurement (cyclictest)
      After=multi-user.target
      [Service]
      Type=simple
      Environment=RT_PRIORITY=99
      Environment=RT_INTERVAL=1000
      Environment=RT_DURATION=0
      ExecStart=/usr/local/bin/virtwork-realtime.sh
      Restart=always
      RestartSec=5
      LimitMEMLOCK=infinity
      LimitRTPRIO=99
      CPUSchedulingPolicy=fifo
      CPUSchedulingPriority=99
      [Install]
      WantedBy=multi-user.target
runcmd:
  - systemctl enable --now virtwork-realtime.service
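
The cloud-init above starts cyclictest regardless of which kernel the guest booted, and the expected jitter differs a lot between a standard and a real-time kernel. A minimal guest-side check is sketched below, assuming the kernel exposes /sys/kernel/realtime (PREEMPT_RT kernels generally do). The function names are hypothetical, and the path argument exists only so the check can be exercised outside a real guest.

```shell
# is_rt_kernel: succeed if the sysfs flag reports a PREEMPT_RT kernel.
# $1 = path to the flag file (assumed default: /sys/kernel/realtime)
is_rt_kernel() {
  rt_flag="${1:-/sys/kernel/realtime}"
  [ -r "$rt_flag" ] && [ "$(cat "$rt_flag")" = "1" ]
}

# kernel_kind: print "rt" or "standard" so the result can be logged
# next to the cyclictest numbers.
kernel_kind() {
  if is_rt_kernel "$@"; then
    echo "rt"
  else
    echo "standard"
  fi
}
```

Logging the output of kernel_kind before cyclictest starts would make it easier to tell a tuning regression from a run on the wrong base image.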

Use Case

  • Telco partners (VNF vendors — Ericsson, Nokia, Mavenir, Samsung): The primary audience. Telco VNFs running signal processing, packet forwarding, and baseband functions in VMs require worst-case scheduling jitter below 20-50μs. Every telco partner validating on OPL needs a reproducible way to prove their VNF achieves this target on the configured VM. cyclictest is the industry-standard tool for this measurement.
  • Red Hat CNV performance validation: Validates that the platform's real-time features — CPU manager static policy, NUMA-aware scheduling, hugepages, RT kernel — actually deliver deterministic scheduling when composed together. The Cookbook documents how to configure these features; this workload measures whether they work.
  • Performance tuning partners: Partners building performance profiling or tuning tools need a workload with a clear, measurable performance target (max jitter < Xμs). cyclictest provides exactly this — a single number that improves or regresses as tuning changes are applied.
  • Hardware/platform partners (Intel, AMD, ARM server vendors): Need to validate that their hardware's real-time capabilities are correctly exposed through KubeVirt to the VM guest. cyclictest on a properly configured VM should produce results comparable to bare-metal cyclictest — any significant deviation indicates a platform issue.
  • Cookbook integration: Direct feedback loop — apply Cookbook performance tuning recipes, run virtwork run --workloads realtime, check if max jitter meets the target. This is the missing "verify" step in the Cookbook's performance tuning chapter.

Additional Context

  • VM spec requirements: This workload is only meaningful on VMs with dedicated CPU placement. The implementation should either:
    1. Automatically set spec.domain.cpu.dedicatedCpuPlacement: true on the VM spec (via VMResources() or a new method), or
    2. Warn/document that results on shared-CPU VMs are not representative.
      Consider adding a --dedicated-cpus flag or auto-detecting the cluster's CPU manager policy.
  • Base image consideration: For best results, the VM should run a real-time kernel (kernel-rt). The default Fedora container disk doesn't include kernel-rt. Options:
    1. Use a RHEL 9 container disk with RT kernel available
    2. Document that the standard kernel produces higher jitter (typically 50-200μs vs <20μs on RT kernel)
    3. Install kernel-rt via cloud-init and reboot — adds significant boot time
  • Output interpretation:
    • Max latency <20μs: excellent, meets telco requirements
    • Max latency 20-50μs: acceptable for many VNFs
    • Max latency >100μs: CPU pinning or NUMA configuration issue
    • Max latency >1000μs: no dedicated CPUs or severe interference
  • Pairing with other workloads: The most valuable test is running cyclictest alongside other workloads (cpu, disk, network) on the same cluster. Real-time scheduling should maintain low jitter even when other VMs are generating heavy load on adjacent cores. This is the "noisy neighbor" test that telco partners care most about.
  • tuned profile: The realtime tuned profile configures kernel parameters (transparent hugepages disabled, CPU frequency governor set to performance, IRQ balancing tuned) that significantly improve cyclictest results. Installing and activating it via cloud-init is recommended.
  • cyclictest output format: T: 0 ( 1234) P:99 I:1000 C: 50000 Min: 1 Act: 3 Avg: 2 Max: 12 — the Max column is the critical number for telco validation.
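
The output format and the interpretation thresholds above can be combined into a small check. The sketch below pulls the largest Max value out of cyclictest status lines and maps it onto the ranges listed under "Output interpretation". The function names and the category labels are hypothetical. The issue does not classify 51-100μs, so that range is reported as unclassified rather than guessed.

```shell
# worst_max: read cyclictest status lines on stdin and print the largest
# Max value (in microseconds) across all threads. Prints nothing if no
# status line is found.
worst_max() {
  awk '
    /T:/ && match($0, /Max:[ ]*[0-9]+/) {
      v = substr($0, RSTART + 4, RLENGTH - 4) + 0
      if (!seen || v > max) { max = v; seen = 1 }
    }
    END { if (seen) print max }
  '
}

# classify_max: map a max latency in whole microseconds onto the ranges
# from the issue. $1 = max latency (integer).
classify_max() {
  max_us="$1"
  if [ "$max_us" -lt 20 ]; then
    echo "excellent"
  elif [ "$max_us" -le 50 ]; then
    echo "acceptable"
  elif [ "$max_us" -le 100 ]; then
    echo "unclassified"   # 51-100: not covered by the listed thresholds
  elif [ "$max_us" -le 1000 ]; then
    echo "pinning-or-numa-issue"
  else
    echo "no-dedicated-cpus-or-severe-interference"
  fi
}
```

On a guest this might be fed from the journal, for example journalctl -u virtwork-realtime.service -o cat | worst_max, assuming the status lines reach journald in the format shown above.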

Labels

needs-triage, size/L, workload-request, workload/tier-2
