[Workload]: database-replication

### Workload Name

database-replication

### Workload Description

PostgreSQL streaming replication workload with a primary VM running pgbench against a local PostgreSQL instance and a replica VM consuming the WAL (Write-Ahead Log) stream via streaming replication. Produces sustained replication I/O patterns — WAL generation, network-based WAL shipping, replay lag metrics — alongside the OLTP transaction signals from the existing `database` workload.

This extends the single-node `database` workload into a multi-VM pattern that exercises a fundamentally different storage and network profile. WAL shipping is sequential, append-only, and latency-sensitive — completely different from pgbench's random read/write OLTP pattern. Storage partners need both patterns because their CSI drivers optimize differently for sequential vs random I/O. HA and database partners need to validate failover behavior, replication lag monitoring, and data consistency across VMs.

### Tooling and Packages

**Primary VM:**
- Tool: PostgreSQL + pgbench (same as existing `database` workload)
- RPM packages: `postgresql-server`, `postgresql`
- systemd services:
  - `postgresql.service` — database server configured for streaming replication (`wal_level=replica`, `max_wal_senders=5`)
  - `virtwork-database.service` — pgbench continuous OLTP loop (reused from existing `database` workload)
- Configuration: `pg_hba.conf` updated to allow replication connections from replica VMs; `postgresql.conf` tuned for WAL generation (`wal_level=replica`, `max_wal_senders=5`, `wal_keep_size=256MB`)
- Exposes port 5432 via Kubernetes Service for replica connections

**Replica VM:**
- Tool: PostgreSQL (streaming replication standby)
- RPM packages: `postgresql-server`, `postgresql`
- Setup: `pg_basebackup` from primary on first boot, then continuous WAL replay via streaming replication
- systemd service: `postgresql.service` configured as hot standby (`hot_standby=on`, `primary_conninfo` pointing to primary Service)
- Monitoring: replication lag available via `pg_stat_replication` (primary) and `pg_stat_wal_receiver` (replica)

### VM Count Model

VM pair - server and client (like network)

### Required Resources

- [x] Persistent storage (DataVolume)
- [x] Kubernetes Service (for inter-VM communication)
- [x] Kubernetes Secret (for credentials or config)
- [ ] Additional CPU/memory beyond defaults
- [ ] GPU or special device passthrough

The Secret holds the replication user credentials. The Service exposes the primary's port 5432 for the replica's `primary_conninfo`. Both VMs need persistent DataVolumes — the primary for the database and WAL, the replica for the base backup and replayed data.

### Cloud-Init Details

**Primary VM:**
```yaml
packages:
  - postgresql-server
  - postgresql
write_files:
  - path: /usr/local/bin/virtwork-db-replication-setup.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      set -euo pipefail
      # Initialize and configure for streaming replication
      postgresql-setup --initdb
      cat >> /var/lib/pgsql/data/postgresql.conf <<PGEOF
      listen_addresses = '*'
      wal_level = replica
      max_wal_senders = 5
      wal_keep_size = 256MB
      hot_standby = on
      PGEOF
      # Allow replication connections from cluster network
      echo "host replication replicator 0.0.0.0/0 md5" >> /var/lib/pgsql/data/pg_hba.conf
      echo "host all all 0.0.0.0/0 md5" >> /var/lib/pgsql/data/pg_hba.conf
      systemctl enable --now postgresql
      # Create replication user and pgbench database
      sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD '<from-secret>';"
      sudo -u postgres createdb pgbench
      sudo -u postgres pgbench -i -s 10 pgbench
runcmd:
  - /usr/local/bin/virtwork-db-replication-setup.sh
  - systemctl enable --now virtwork-database.service
```

**Replica VM:**
```yaml
packages:
  - postgresql-server
  - postgresql
write_files:
  - path: /usr/local/bin/virtwork-db-replica-setup.sh
    permissions: '0755'
    content: |
      #!/bin/bash
      set -euo pipefail
      # Base backup from primary
      PGPASSWORD='<from-secret>' pg_basebackup \
        -h <primary-service> -p 5432 -U replicator \
        -D /var/lib/pgsql/data -Fp -Xs -R
      chown -R postgres:postgres /var/lib/pgsql/data
      chmod 0700 /var/lib/pgsql/data
      # -R flag creates standby.signal and sets primary_conninfo automatically
      systemctl enable --now postgresql
runcmd:
  - /usr/local/bin/virtwork-db-replica-setup.sh
```

### Use Case

- **Storage partners (CSI drivers):** WAL shipping produces sequential, append-only write patterns that stress storage differently than pgbench's random OLTP I/O. Partners need to validate that their storage class handles both patterns simultaneously — the primary doing random reads/writes while continuously generating sequential WAL, and the replica doing sequential WAL replay. This is the I/O profile of every production PostgreSQL deployment.
- **Database partners (EDB, Percona, Crunchy Data):** Need to validate their PostgreSQL operators and management tools correctly detect replication topology, monitor replication lag, and handle failover in a KubeVirt VM environment. The streaming replication setup provides the `pg_stat_replication` and `pg_stat_wal_receiver` views that these tools rely on.
- **HA/DR partners:** Need sustained replication to validate that their failover automation works correctly when the primary becomes unavailable — the replica should be promotable, and the partner product should detect the topology change.
- **Monitoring partners:** Replication lag (`sent_lsn` vs `replay_lsn`) is a critical metric for any PostgreSQL monitoring product. This workload provides a continuously updating replication lag signal that monitoring agents should detect and alert on.
- **Network partners:** WAL streaming is a sustained, latency-sensitive TCP connection between VMs. Network disruption (jitter, packet loss) directly impacts replication lag — this pairs naturally with `chaos-network` for validating network product resilience.

### Additional Context

- Follows the same multi-VM pattern as `network` and `tps` — implements `MultiVMWorkload` with `RoleDistribution() []RoleSpec` for primary/replica roles.
- The primary VM reuses cloud-init patterns from the existing `database` workload (PostgreSQL setup + pgbench loop). The replication configuration layers on top.
- The Secret for replication credentials follows the same pattern as the existing SSH key Secret — created by the `resources` package, referenced in cloud-init.
- Consider exposing configurable parameters: `wal-keep-size` (WAL retention), `pgbench-scale` (database size / transaction complexity), `replica-count` (number of replicas per primary — could extend beyond a simple pair).
- Replication lag can be queried with: `SELECT pid, state, sent_lsn, replay_lsn, replay_lag FROM pg_stat_replication;` on the primary — useful for demo and monitoring validation.
- This workload naturally composes with `chaos-network` — injecting latency on the replica's network interface directly increases replication lag, providing a controlled test for monitoring alerting thresholds.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Workload]: database-replication #158

Workload Name

Workload Description

Tooling and Packages

VM Count Model

Required Resources

Cloud-Init Details

Use Case

Additional Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[Workload]: database-replication #158

Description

Workload Name

Workload Description

Tooling and Packages

VM Count Model

Required Resources

Cloud-Init Details

Use Case

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions