Skip to content

[Workload]: database-replication #158

Description

@mrhillsman

Workload Name

database-replication

Workload Description

PostgreSQL streaming replication workload with a primary VM running pgbench against a local PostgreSQL instance and a replica VM consuming the WAL (Write-Ahead Log) stream via streaming replication. Produces sustained replication I/O patterns — WAL generation, network-based WAL shipping, replay lag metrics — alongside the OLTP transaction signals from the existing database workload.

This extends the single-node database workload into a multi-VM pattern that exercises a fundamentally different storage and network profile. WAL shipping is sequential, append-only, and latency-sensitive — completely different from pgbench's random read/write OLTP pattern. Storage partners need both patterns because their CSI drivers optimize differently for sequential vs random I/O. HA and database partners need to validate failover behavior, replication lag monitoring, and data consistency across VMs.

Tooling and Packages

Primary VM:

  • Tool: PostgreSQL + pgbench (same as existing database workload)
  • RPM packages: postgresql-server, postgresql
  • systemd services:
    • postgresql.service — database server configured for streaming replication (wal_level=replica, max_wal_senders=5)
    • virtwork-database.service — pgbench continuous OLTP loop (reused from existing database workload)
  • Configuration: pg_hba.conf updated to allow replication connections from replica VMs; postgresql.conf tuned for WAL generation (wal_level=replica, max_wal_senders=5, wal_keep_size=256MB)
  • Exposes port 5432 via Kubernetes Service for replica connections

Replica VM:

  • Tool: PostgreSQL (streaming replication standby)
  • RPM packages: postgresql-server, postgresql
  • Setup: pg_basebackup from primary on first boot, then continuous WAL replay via streaming replication
  • systemd service: postgresql.service configured as hot standby (hot_standby=on, primary_conninfo pointing to primary Service)
  • Monitoring: replication lag available via pg_stat_replication (primary) and pg_stat_wal_receiver (replica)

VM Count Model

VM pair - server and client (like network)

Required Resources

  • Persistent storage (DataVolume)
  • Kubernetes Service (for inter-VM communication)
  • Kubernetes Secret (for credentials or config)
  • Additional CPU/memory beyond defaults
  • GPU or special device passthrough

The Secret holds the replication user credentials. The Service exposes the primary's port 5432 for the replica's primary_conninfo. Both VMs need persistent DataVolumes — the primary for the database and WAL, the replica for the base backup and replayed data.

Cloud-Init Details

Primary VM:

packages:
  - postgresql-server
  - postgresql
write_files:
  - path: /usr/local/bin/virtwork-db-replication-setup.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      set -euo pipefail
      # Initialize and configure for streaming replication
      postgresql-setup --initdb
      cat >> /var/lib/pgsql/data/postgresql.conf <<PGEOF
      listen_addresses = '*'
      wal_level = replica
      max_wal_senders = 5
      wal_keep_size = 256MB
      hot_standby = on
      PGEOF
      # Allow replication connections from cluster network
      echo "host replication replicator 0.0.0.0/0 md5" >> /var/lib/pgsql/data/pg_hba.conf
      echo "host all all 0.0.0.0/0 md5" >> /var/lib/pgsql/data/pg_hba.conf
      systemctl enable --now postgresql
      # Create replication user and pgbench database
      sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD '<from-secret>';"
      sudo -u postgres createdb pgbench
      sudo -u postgres pgbench -i -s 10 pgbench
runcmd:
  - /usr/local/bin/virtwork-db-replication-setup.sh
  - systemctl enable --now virtwork-database.service

Replica VM:

packages:
  - postgresql-server
  - postgresql
write_files:
  - path: /usr/local/bin/virtwork-db-replica-setup.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      set -euo pipefail
      # Base backup from primary
      PGPASSWORD='<from-secret>' pg_basebackup \
        -h <primary-service> -p 5432 -U replicator \
        -D /var/lib/pgsql/data -Fp -Xs -R
      chown -R postgres:postgres /var/lib/pgsql/data
      chmod 0700 /var/lib/pgsql/data
      # -R flag creates standby.signal and sets primary_conninfo automatically
      systemctl enable --now postgresql
runcmd:
  - /usr/local/bin/virtwork-db-replica-setup.sh

Use Case

  • Storage partners (CSI drivers): WAL shipping produces sequential, append-only write patterns that stress storage differently than pgbench's random OLTP I/O. Partners need to validate that their storage class handles both patterns simultaneously — the primary doing random reads/writes while continuously generating sequential WAL, and the replica doing sequential WAL replay. This is the I/O profile of every production PostgreSQL deployment.
  • Database partners (EDB, Percona, Crunchy Data): Need to validate their PostgreSQL operators and management tools correctly detect replication topology, monitor replication lag, and handle failover in a KubeVirt VM environment. The streaming replication setup provides the pg_stat_replication and pg_stat_wal_receiver views that these tools rely on.
  • HA/DR partners: Need sustained replication to validate that their failover automation works correctly when the primary becomes unavailable — the replica should be promotable, and the partner product should detect the topology change.
  • Monitoring partners: Replication lag (sent_lsn vs replay_lsn) is a critical metric for any PostgreSQL monitoring product. This workload provides a continuously updating replication lag signal that monitoring agents should detect and alert on.
  • Network partners: WAL streaming is a sustained, latency-sensitive TCP connection between VMs. Network disruption (jitter, packet loss) directly impacts replication lag — this pairs naturally with chaos-network for validating network product resilience.

Additional Context

  • Follows the same multi-VM pattern as network and tps — implements MultiVMWorkload with RoleDistribution() []RoleSpec for primary/replica roles.
  • The primary VM reuses cloud-init patterns from the existing database workload (PostgreSQL setup + pgbench loop). The replication configuration layers on top.
  • The Secret for replication credentials follows the same pattern as the existing SSH key Secret — created by the resources package, referenced in cloud-init.
  • Consider exposing configurable parameters: wal-keep-size (WAL retention), pgbench-scale (database size / transaction complexity), replica-count (number of replicas per primary — could extend beyond a simple pair).
  • Replication lag can be queried with: SELECT pid, state, sent_lsn, replay_lsn, replay_lag FROM pg_stat_replication; on the primary — useful for demo and monitoring validation.
  • This workload naturally composes with chaos-network — injecting latency on the replica's network interface directly increases replication lag, providing a controlled test for monitoring alerting thresholds.

Metadata

Metadata

Assignees

No one assigned

    Labels

    needs-triageIndicates an issue or PR lacks a `triage/foo` label and requires one.size/XLDenotes a PR that changes 500-999 lines, ignoring generated files.workload-requestRequest for a new workload typeworkload/tier-2High impact, introduces new patterns or requires domain knowledge.

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions