[Workload]: object-storage #163

Description

@mrhillsman

Workload Name

object-storage

Workload Description

S3-compatible object storage workload that performs sustained PUT, GET, DELETE, and LIST operations against an S3-compatible endpoint. Produces continuous object lifecycle traffic — uploading objects of configurable sizes, reading them back, listing buckets, and deleting old objects — to stress object storage backends from inside a KubeVirt VM.

This fills a fundamentally different storage niche than the existing disk workload. The disk workload tests block storage via the CSI driver — fio issues read/write syscalls against a mounted filesystem backed by a PersistentVolume. object-storage tests the S3 API path — HTTP PUT/GET requests over the network to an object storage service. These are completely different data planes: different protocols (POSIX vs HTTP), different access patterns (random block I/O vs whole-object upload/download), different backend implementations, and different partner products.

The primary targets are OpenShift Data Foundation (ODF) with its Ceph-backed S3 (via RADOS Gateway), MinIO, and cloud storage gateway partners. ODF is a core Red Hat product and validating its S3 performance from VMs is a gap today.

Tooling and Packages

  • Tool: warp (MinIO's S3 performance benchmark) or s5cmd (fast S3 client) or AWS CLI (aws s3)
  • RPM packages: none — warp and s5cmd are single Go binaries; AWS CLI available via pip3 install awscli
  • systemd service command: warp mixed --host=<endpoint> --access-key=<key> --secret-key=<secret> --bucket=virtwork-bench --duration=0 --obj.size=1MiB --concurrent=16
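    • --duration=0: intended to run indefinitely; confirm that the pinned warp release treats 0 as unlimited, since otherwise a long fixed duration combined with Restart=always is needed for the same effect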
    • --obj.size=1MiB: 1 MiB object size (configurable)
    • --concurrent=16: 16 concurrent operations
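    • mixed mode: warp's default distribution is 45% GET, 30% STAT, 15% PUT, 10% DELETE, adjustable with --get-distrib, --stat-distrib, --put-distrib, and --delete-distrib. Mixed mode does not issue LIST requests; warp benchmarks listing separately with warp list.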
  • Configurable parameters:
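    • s3-endpoint: S3-compatible endpoint (required; no default, must point to ODF/MinIO/external). warp's --host flag takes host:port rather than a full URL and uses --tls for HTTPS, so a URL-style value must be converted before it is passed to warp.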
    • s3-access-key / s3-secret-key: credentials (via Secret)
    • s3-bucket: target bucket name (default: virtwork-bench)
    • s3-object-size: object size (default: 1MiB)
    • s3-concurrency: concurrent operations (default: 16)
    • s3-op-mix: operation distribution (default: warp's mixed-mode defaults of 45% GET, 30% STAT, 15% PUT, 10% DELETE; LIST is not part of warp's mixed mode)
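
Because s3-endpoint may be supplied as a URL while warp's --host flag expects host:port (with --tls selecting HTTPS), a small conversion step may be useful. The following is a minimal sketch under that assumption; the function name endpoint_to_warp_flags is hypothetical and not part of virtwork or warp.

```shell
# Hypothetical helper: turn an s3-endpoint value into warp connection flags.
# Assumption: warp takes --host=<host:port> and enables HTTPS with --tls.
endpoint_to_warp_flags() {
  local endpoint="$1"
  local tls=""
  case "$endpoint" in
    https://*) tls=" --tls"; endpoint="${endpoint#https://}" ;;
    http://*)  endpoint="${endpoint#http://}" ;;
  esac
  endpoint="${endpoint%%/*}"   # drop any trailing slash or path component
  if [ -z "$endpoint" ]; then
    echo "s3-endpoint is empty or malformed" >&2
    return 1
  fi
  printf '%s%s\n' "--host=$endpoint" "$tls"
}

# Example values below are placeholders, not real endpoints.
endpoint_to_warp_flags "https://s3.example.com:443/"
endpoint_to_warp_flags "http://minio.example:9000"
```

In the wrapper script the output could be word-split into the warp command line, for example warp mixed $(endpoint_to_warp_flags "$S3_ENDPOINT") followed by the remaining flags.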

VM Count Model

Single VM (like cpu, memory, disk)

Required Resources

  • Persistent storage (DataVolume)
  • Kubernetes Service (for inter-VM communication)
  • Kubernetes Secret (for credentials or config)
  • Additional CPU/memory beyond defaults
  • GPU or special device passthrough

The Secret holds S3 access key and secret key credentials. The S3 endpoint is typically a Kubernetes Service (e.g., rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc.cluster.local) or an external URL.
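
As a rough illustration of how the Secret values could end up in the EnvironmentFile that the service reads, the sketch below writes KEY=value lines with owner-only permissions. The function name write_s3_env_file and the example values are hypothetical; how virtwork actually renders the Secret into cloud-init is not specified in this issue.

```shell
# Hypothetical helper: write S3 settings in the KEY=value form that
# systemd's EnvironmentFile= reads, with the file restricted to mode 0600.
write_s3_env_file() {
  local path="$1" endpoint="$2" access_key="$3" secret_key="$4"
  (
    umask 077   # a newly created file is readable by its owner only
    printf 'S3_ENDPOINT=%s\nS3_ACCESS_KEY=%s\nS3_SECRET_KEY=%s\n' \
      "$endpoint" "$access_key" "$secret_key" > "$path"
  )
  chmod 600 "$path"   # also tighten a file that already existed
}
```

A call such as write_s3_env_file /etc/virtwork/s3-credentials "$endpoint" "$key" "$secret" keeps the credentials out of the unit file and the process arguments of the wrapper script itself, though warp still receives them as command-line flags.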

Cloud-Init Details

write_files:
  - path: /usr/local/bin/virtwork-object-storage.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      set -euo pipefail
      S3_ENDPOINT="${S3_ENDPOINT:?S3_ENDPOINT is required}"
      S3_ACCESS_KEY="${S3_ACCESS_KEY:?S3_ACCESS_KEY is required}"
      S3_SECRET_KEY="${S3_SECRET_KEY:?S3_SECRET_KEY is required}"
      S3_BUCKET="${S3_BUCKET:-virtwork-bench}"
      S3_OBJ_SIZE="${S3_OBJ_SIZE:-1MiB}"
      S3_CONCURRENCY="${S3_CONCURRENCY:-16}"

      exec /usr/local/bin/warp mixed \
        --host="$S3_ENDPOINT" \
        --access-key="$S3_ACCESS_KEY" \
        --secret-key="$S3_SECRET_KEY" \
        --bucket="$S3_BUCKET" \
        --obj.size="$S3_OBJ_SIZE" \
        --concurrent="$S3_CONCURRENCY" \
        --duration=0 \
        --autoterm \
        --noclear
  - path: /etc/systemd/system/virtwork-object-storage.service
    content: |
      [Unit]
      Description=Virtwork S3 object storage workload
      After=network-online.target
      Wants=network-online.target
      [Service]
      Type=simple
      EnvironmentFile=/etc/virtwork/s3-credentials
      ExecStart=/usr/local/bin/virtwork-object-storage.sh
      Restart=always
      RestartSec=10
      [Install]
      WantedBy=multi-user.target
  - path: /etc/virtwork/s3-credentials
    permissions: '0600'
    content: |
      S3_ENDPOINT=<from-config>
      S3_ACCESS_KEY=<from-secret>
      S3_SECRET_KEY=<from-secret>
      S3_BUCKET=virtwork-bench
      S3_OBJ_SIZE=1MiB
      S3_CONCURRENCY=16
runcmd:
  # -f makes curl exit non-zero on an HTTP error instead of saving the error page as the binary
  - curl -fsSLo /usr/local/bin/warp https://github.com/minio/warp/releases/download/v1.1.4/warp_Linux_x86_64
  - chmod +x /usr/local/bin/warp
  - systemctl enable --now virtwork-object-storage.service
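
The wrapper script above already fails fast on missing required variables. A similar check on the optional settings could catch typos before warp starts; the sketch below is one possible shape. The function names are hypothetical, and the accepted size syntax (an integer followed by KiB, MiB, or GiB) is an assumption limited to the forms used in this issue rather than everything warp may accept.

```shell
# Hypothetical pre-flight checks for the wrapper script.
# Assumption: sizes look like <positive integer><KiB|MiB|GiB>, e.g. 1MiB.
validate_obj_size() {
  local num
  case "$1" in
    *KiB|*MiB|*GiB) ;;
    *) return 1 ;;
  esac
  num="${1%?iB}"                      # strip the three-character unit suffix
  case "$num" in
    ''|0*|*[!0-9]*) return 1 ;;       # empty, leading zero, or non-digit
  esac
}

validate_concurrency() {
  case "$1" in
    ''|0*|*[!0-9]*) return 1 ;;       # must be a positive integer
  esac
}

preflight() {
  validate_obj_size "${S3_OBJ_SIZE:-1MiB}" \
    || { echo "invalid S3_OBJ_SIZE: ${S3_OBJ_SIZE:-}" >&2; return 1; }
  validate_concurrency "${S3_CONCURRENCY:-16}" \
    || { echo "invalid S3_CONCURRENCY: ${S3_CONCURRENCY:-}" >&2; return 1; }
}
```

Calling preflight just before the exec line would make a bad value show up as a clear message in the journal instead of a warp usage error on every restart.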

Use Case

  • ODF (OpenShift Data Foundation) validation: ODF provides S3-compatible object storage via Ceph RADOS Gateway. Validating S3 throughput and latency from VMs is a gap — most ODF testing comes from pods. VMs accessing ODF's S3 endpoint exercise a different network path (VM pod network → Service → RGW pod) and are the access pattern that enterprise customers migrating from VMware will use.
  • MinIO partners: MinIO is widely deployed on OpenShift as an alternative object store. Partners need sustained S3 traffic to validate throughput, multipart upload handling, and consistency guarantees under load from VMs.
  • Cloud storage gateway partners (NetApp StorageGRID, Scality, Cloudian): These products provide S3-compatible interfaces to enterprise storage. Partners need to validate that their gateway handles sustained object operations from VM-based applications — the typical deployment pattern when migrating legacy applications that use S3 libraries.
  • Backup/DR partners: Many backup products (Velero, Cohesity, Commvault) write backup data to S3-compatible storage. This workload simulates the I/O pattern of a backup job — large sequential PUTs followed by periodic GETs for restore validation and DELETEs for retention policy enforcement.
  • Data pipeline partners: Applications running in VMs that produce data (logs, telemetry, ETL output) often write to object storage. Partners building data pipeline products need to validate that their S3 ingestion handles sustained PUT traffic from VM-based producers.

Additional Context

  • Endpoint configuration is required: Unlike other workloads that are self-contained, object-storage requires an external S3 endpoint. The implementation should fail fast with a clear error if s3-endpoint is not configured. Consider detecting ODF's default RGW endpoint automatically if ODF is installed on the cluster.
  • warp vs alternatives:
    • warp (MinIO): purpose-built S3 benchmark, single binary, excellent statistics output, supports mixed workloads. Preferred for initial implementation.
    • s5cmd: fast S3 client, good for simple PUT/GET loops, less configurable workload profiles.
    • AWS CLI: universally available but slower and less suitable for sustained benchmarking.
    • cosbench (Intel): comprehensive but heavyweight (Java, requires orchestrator). Overkill for a VM-based workload.
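  • Bucket lifecycle: The --noclear flag tells warp not to clear the bucket before or after a run. Combined with Restart=always, this means objects accumulate across service restarts. The --autoterm flag lets warp end a run early once it judges the results stable, so with Restart=always the service starts a fresh run after RestartSec; drop --autoterm if a single uninterrupted run is wanted. During indefinite operation the mixed mode's DELETE operations reclaim some space, but the bucket still grows whenever PUTs outnumber DELETEs, so monitor bucket size on very long runs.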
  • Object size spectrum: The default 1MiB is a balanced choice. For backup/DR validation, use larger objects (64MiB-1GiB). For metadata-heavy workloads (many small files), use smaller objects (4KiB-64KiB). Configurable via s3-object-size.
  • Multi-VM scaling: At --vm-count 5 with 16 concurrent ops each, this produces 80 concurrent S3 operations — enough to stress most object storage deployments. The operations are naturally distributed across the object namespace (warp uses random object keys), so multiple VMs don't create hot-spot contention.
  • Credential management: S3 credentials should be injected via a Kubernetes Secret, written to /etc/virtwork/s3-credentials as an EnvironmentFile, and never logged. This follows the same pattern as SSH credentials in the existing workloads.
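
The multi-VM and bucket-lifecycle points above come down to simple arithmetic, sketched below. Both function names are hypothetical, and the operation rate and PUT/DELETE shares in the example are illustrative inputs rather than measurements or confirmed warp defaults.

```shell
# Hypothetical sizing helpers (integer arithmetic only).

# Aggregate concurrency across VMs: vm_count * per-VM concurrency.
total_concurrency() {
  echo $(( $1 * $2 ))
}

# Net bucket growth in MiB per hour, given total ops/s, the PUT and DELETE
# shares in percent, and the object size in MiB. Growth is positive whenever
# the PUT share exceeds the DELETE share.
net_growth_mib_per_hour() {
  local ops_per_sec="$1" put_pct="$2" delete_pct="$3" obj_mib="$4"
  echo $(( ops_per_sec * (put_pct - delete_pct) * obj_mib * 3600 / 100 ))
}

total_concurrency 5 16                 # prints 80
net_growth_mib_per_hour 100 30 10 1    # prints 72000
```

With these example inputs the bucket would gain about 70 GiB per hour, which is why DELETE traffic alone should not be relied on to bound bucket size.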

Metadata

Labels

  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • priority/awaiting-more-evidence: Lowest priority. Possibly useful, but not yet enough support to actually get it done.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • workload-request: Request for a new workload type
  • workload/tier-3: Specialized or novel. Significant engineering or new platform support.
