- Quick Start
- Project Layout and Architecture
- Idempotency and Safety
- Testing
- Makefile Targets
- CI/CD Pipeline
- Workload Development
- SSH and Audit Quick Reference
- Conventions and References
- Go 1.26+
- Ginkgo CLI (for BDD test runner)
go install github.com/onsi/ginkgo/v2/ginkgo@latestgo mod download# Build the binary
go build -o virtwork ./cmd/virtwork
# Run without building
go run ./cmd/virtwork --help
go run ./cmd/virtwork run --dry-run
go run ./cmd/virtwork run --dry-run --ssh-user virtwork --ssh-key-file ~/.ssh/id_ed25519.pubcmd/virtwork/ # Entry point (Cobra root + subcommands)
internal/ # Application packages (not importable externally)
constants/ # API coordinates, labels, defaults
config/ # Config struct, Viper priority chain
cluster/ # controller-runtime client init + scheme
cloudinit/ # Cloud-config YAML builder
logging/ # Structured slog logger (verbose -> DEBUG)
vm/ # VM spec construction + typed CRUD + retry
resources/ # Namespace + Service + Secret helpers
wait/ # VMI readiness polling (errgroup)
cleanup/ # Label-based teardown (VMs, Services, Secrets)
audit/ # SQLite audit tracking (Auditor interface, schema, records)
orchestrator/ # Run + Cleanup orchestrators (errgroup concurrency, catalog loading)
workloads/ # Workload + MultiVMWorkload interfaces, built-in + generic implementations, registry, catalog
testutil/ # Shared test helpers for integration and E2E tests
tests/ # Tests requiring external infrastructure
e2e/ # E2E acceptance tests (//go:build e2e)
build/
golden-image/ # Optional Fedora container disk with pre-installed tools
deploy/ # Kustomize manifests for OpenShift deployment
docs/ # Documentation
README.md # Documentation index
architecture.md # Layered architecture and mermaid diagrams
development.md # This file
configuration.md # Complete configuration reference
deployment.md # OpenShift deployment deep-dive
audit-schema.md # SQLite audit schema reference
chaos-workloads.md # Chaos engineering workload guide
guide/ # Hands-on guides (overview, deploying, adding workloads)
The codebase follows a strict layered architecture where each layer depends only on layers below it. See architecture.md for full diagrams.
| Layer | Packages | Goroutines | Purpose |
|---|---|---|---|
| 0 | constants |
No | Pure values — API coordinates, labels, defaults |
| 1 | config, cloudinit, cluster, logging, audit |
No | Configuration, cloud-init YAML, K8s client init, structured logging, audit tracking |
| 2 | vm, resources, wait |
Yes | K8s CRUD operations with retry, readiness polling |
| 3 | workloads |
No | Pure data producers — cloud-init specs, resource structs |
| 4 | cmd/virtwork, orchestrator, cleanup |
Yes | Dependency wiring, orchestration, teardown |
Go's native concurrency is used throughout. The internal/orchestrator package drives parallel VM creation and readiness polling via errgroup.Group for structured error handling, with context.Context for timeouts and cancellation.
g, ctx := errgroup.WithContext(ctx)
for _, vmName := range vmNames {
name := vmName
g.Go(func() error {
return vm.CreateVM(ctx, c, spec)
})
}
if err := g.Wait(); err != nil {
return err
}The registry (internal/workloads/registry.go) ships 9 built-in workloads:
| Name | Type | Purpose | Reference |
|---|---|---|---|
cpu |
single-role | CPU stress via stress-ng | configuration.md |
memory |
single-role | Memory pressure via stress-ng | configuration.md |
disk |
single-role | Mixed I/O benchmark via fio | configuration.md |
database |
single-role | PostgreSQL + pgbench continuous loop | configuration.md |
network |
multi-role | iperf3 server/client bandwidth test | configuration.md |
tps |
multi-role | Multi-protocol transactions per second | configuration.md |
chaos-disk |
single-role | Disk fill/release chaos loop | chaos-workloads.md |
chaos-network |
single-role | Network latency/loss injection | chaos-workloads.md |
chaos-process |
single-role | Random process signal injection | chaos-workloads.md |
Additional workloads can be added without modifying Go code via the catalog system.
apierrors.IsAlreadyExists()responses are treated as success (resource already exists)apierrors.IsTooManyRequests()and server errors trigger retry with exponential backoffapierrors.IsNotFound()is fatal for CRUD (CNV not installed?)apierrors.IsUnauthorized()/apierrors.IsForbidden()are fatal (auth errors)- All created resources are labeled with
app.kubernetes.io/managed-by: virtworkfor cleanup tracking --dry-runprints specs without any cluster interaction- OpenShift HAProxy load balancers may drop the first TLS connection when connection pools are cold, causing transient failures on the first API call after an idle period. The retry logic (backoff on
IsTooManyRequests()and server errors) covers this. If running against remote clusters and seeing intermittent first-call failures, this is expected behavior — the retry will succeed.
| Location | Build Tag | Cluster Required | Coverage | Description |
|---|---|---|---|---|
internal/*/_test.go |
(none) | No | ~60-80% | Unit tests alongside source, all K8s calls use fake client |
internal/*/_integration_test.go |
integration |
Yes | ~40-60% | Integration tests alongside source, real cluster interactions |
tests/e2e/ |
e2e |
Yes | Black-box | E2E/acceptance tests, CLI binary testing against real cluster |
internal/testutil/ |
mixed | Conditional | 58.2% | Shared test helpers: unit tests (no cluster) + integration tests (requires cluster) |
Unit tests use controller-runtime's fake client:
fake.NewClientBuilder().
WithScheme(cluster.NewScheme()).
WithObjects(existingResources...).
Build()Integration tests use a real cluster connection:
c = testutil.MustConnect("")
namespace = testutil.UniqueNamespace("integ-prefix")
DeferCleanup(func() { testutil.CleanupNamespace(ctx, c, namespace) })E2E tests invoke the built binary:
stdout, stderr, exitCode, err := testutil.RunVirtwork("run", "--dry-run", "--workloads", "cpu")
Expect(exitCode).To(Equal(0))# Full unit test suite
go test ./...
# With race detector
go test -race ./...
# Specific package
go test ./internal/vm/...
# With verbose output
go test -v ./...
# With coverage report
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out# Run all tests recursively
ginkgo -r
# Run specific package
ginkgo ./internal/vm/
# Verbose with labels
ginkgo -r -v
# With race detector
ginkgo -r -race
# Run in parallel (multiple Ginkgo nodes)
ginkgo -r -p
# Focus on specific tests (use FDescribe/FIt in code)
ginkgo -r --focus "BuildVMSpec"The internal/testutil/ package provides shared test helpers used by integration and E2E tests. As of the test coverage improvements, it now has comprehensive unit and integration test coverage.
Core Functions:
MustConnect(kubeconfigPath string) client.Client- Connects to cluster, panics on failure (suitable for BeforeEach setup)UniqueNamespace(prefix string) string- Generates collision-proof names likevirtwork-test-<prefix>-<random>EnsureTestNamespace(ctx, c, namespace) error- Creates namespace with managed-by labelsCleanupNamespace(ctx, c, namespace)- Deletes all managed resources + namespace (error-tolerant for AfterEach)ManagedLabels() map[string]string- Returns standard virtwork labels for resource trackingDefaultVMOpts(name, namespace) VMSpecOpts- Returns minimal VM spec (1 CPU, 512Mi, Fedora disk, basic cloud-init)WaitForVMRunning(ctx, c, name, namespace, timeout) error- Polls until VMI phase is RunningBinaryPath() (string, error)- Returns path to built virtwork binary (checksVIRTWORK_BINARYenv var, builds on first call)RunVirtwork(args...) (stdout, stderr string, exitCode int, err error)- Executes virtwork binary for E2E tests
Example Usage Pattern:
var _ = Describe("MyFeature", func() {
var ctx context.Context
var c client.Client
var namespace string
BeforeEach(func() {
ctx = context.Background()
c = testutil.MustConnect("")
namespace = testutil.UniqueNamespace("my-feature")
Expect(testutil.EnsureTestNamespace(ctx, c, namespace)).To(Succeed())
})
AfterEach(func() {
testutil.CleanupNamespace(ctx, c, namespace)
})
It("should deploy a VM", func() {
opts := testutil.DefaultVMOpts("test-vm", namespace)
vmObj := vm.BuildVMSpec(opts)
Expect(vm.CreateVM(ctx, c, vmObj)).To(Succeed())
// Wait for VM to boot
Expect(testutil.WaitForVMRunning(ctx, c, "test-vm", namespace, 5*time.Minute)).To(Succeed())
})
})Testing the testutil package:
# Unit tests (no cluster required)
go test ./internal/testutil -v
# Integration tests (cluster required)
go test -tags integration ./internal/testutil -v
# Coverage report
go test ./internal/testutil -coverprofile=coverage.out
go tool cover -html=coverage.outIntegration and E2E tests require a live OpenShift cluster with specific operators and configurations.
- OpenShift: 4.12+ (tested on 4.14, 4.15, 4.16)
- OpenShift Virtualization (CNV): 4.12+
- KubeVirt API compatibility: v1.7.0+
- CDI (Containerized Data Importer) API: v1.64.0+
- Storage: Default StorageClass with ReadWriteOnce support
- Networking: Pod network with masquerade networking support
- Permissions: Cluster admin or namespace admin with permissions to:
- Create/delete namespaces
- Create/delete VirtualMachines, VirtualMachineInstances
- Create/delete Services, Secrets, DataVolumes
- List namespaces cluster-wide (for cleanup tests)
Tests use the following kubeconfig resolution order:
KUBECONFIGenvironment variable (recommended for CI)~/.kube/config(default for local development)- In-cluster config (when running inside a pod)
# Set kubeconfig for tests
export KUBECONFIG=/path/to/your/kubeconfig
# Verify cluster access
oc whoami
oc get nodesBefore running integration or E2E tests, verify operators are installed:
# Check KubeVirt/CNV operator
oc get csv -n openshift-cnv | grep kubevirt
# Check CDI operator
oc get csv -n openshift-cnv | grep containerized-data-importer
# Verify default StorageClass exists
oc get sc | grep default
# Test namespace creation permission
oc create namespace virtwork-test-check
oc delete namespace virtwork-test-check| Test Type | Namespace Count | VM Count | Disk Usage | Runtime | Cluster Load |
|---|---|---|---|---|---|
| Unit tests | 0 | 0 | ~50MB (built binaries) | ~5-10s | None (fake client) |
| Integration tests | ~15-20 | 0-5 | ~100MB | ~2-5min | Low (namespace/resource CRUD) |
| E2E tests | ~5-10 | 2-10 | ~500MB-2GB (containerDisks) | ~10-30min | Medium (VM boots, workload execution) |
| Full suite | ~25-30 | 10-15 | ~2-3GB | ~15-40min | Medium |
Notes:
- Integration tests create namespaces and minimal resources but rarely boot VMs
- E2E tests deploy full workloads with VM boot (slower, higher resource usage)
- Tests use unique namespace names (
virtwork-test-*-<random>) to avoid collisions - Cleanup is automatic via
DeferCleanup()but KubeVirt finalizers may delay namespace deletion
"connection refused" errors:
# Check kubeconfig is set and cluster is accessible
echo $KUBECONFIG
oc cluster-info
# Verify API server connectivity
oc get nodes"no matches for kind VirtualMachine":
# KubeVirt CRDs not installed
oc get crd | grep kubevirt
# Install OpenShift Virtualization operator via OperatorHub
oc get csv -n openshift-cnv"timeout waiting for VM to be running":
- Check cluster has sufficient resources (CPU, memory, storage)
- Verify default StorageClass is available and bound
- Check VM events:
oc get events -n <namespace> --sort-by='.lastTimestamp' - Increase timeout in test if cluster is resource-constrained
"namespace stuck in Terminating":
- KubeVirt finalizers are cleaning up VM resources
- Wait up to 60 seconds for automatic cleanup
- Force delete if stuck:
oc delete namespace <name> --grace-period=0 --force
Tests fail with "AlreadyExists" errors:
- Previous test run cleanup incomplete
- Clean up manually:
virtwork cleanuporoc delete namespace virtwork-test-<prefix>-*
Integration tests live alongside source code with //go:build integration build tags. They are excluded from go test ./... (no tag).
Prerequisites:
KUBECONFIGset or~/.kube/configavailable- Cluster with KubeVirt/CNV and CDI operators installed
- Permissions to create/delete namespaces, VMs, Services, Secrets
# Run all integration tests
go test -tags integration ./internal/...
# Run integration tests for a specific package
go test -tags integration ./internal/vm/...
# Via Ginkgo
ginkgo -r --build-tags integration ./internal/
# Skip slow tests (VM boot required)
ginkgo -r --build-tags integration --label-filter='!slow' ./internal/E2E tests live in tests/e2e/ and exercise the CLI binary as a black box. The binary is built automatically in BeforeSuite, or you can provide a pre-built binary via VIRTWORK_BINARY.
Prerequisites:
- All integration test prerequisites above
- Go toolchain (for binary build) or
VIRTWORK_BINARYenv var
# Run all E2E tests
go test -tags e2e ./tests/e2e/...
# Via Ginkgo
ginkgo -r --build-tags e2e ./tests/e2e/
# Skip slow tests (cluster deployment)
ginkgo -r --build-tags e2e --label-filter='!slow' ./tests/e2e/
# Use a pre-built binary
VIRTWORK_BINARY=./virtwork go test -tags e2e ./tests/e2e/...
# Run everything (unit + integration + e2e)
go test -tags "integration e2e" ./...When testing cloud-init or any YAML output, always parse the YAML string before asserting on values:
// GOOD: Parse, then assert on structure
userdata, err := wl.CloudInitUserdata()
Expect(err).NotTo(HaveOccurred())
var parsed map[string]interface{}
Expect(yaml.Unmarshal([]byte(userdata), &parsed)).To(Succeed())
Expect(parsed).To(HaveKey("packages"))
// BAD: Assert on raw string (fragile — key order, whitespace, line folding)
Expect(userdata).To(ContainSubstring("packages:\n- stress-ng"))Each workload writes a systemd .service file via cloud-init write_files, then enables/starts it via runcmd. This ensures workloads survive VM reboots and can be managed with standard systemd tooling.
For workloads with initialization (database), use ExecStartPre for setup and ExecStart for the main loop. For workloads with multiple configurations (disk/fio), write job files as separate write_files entries.
A Makefile provides convenient shortcuts for common development tasks:
# Show all available targets
make help
# Run unit tests (excludes integration/e2e)
make test
# Run integration tests (requires cluster)
make test-integration
# Run e2e tests (requires cluster)
make test-e2e
# Run all tests (unit + integration + e2e)
make test-all
# Run go vet
make vet
# Run golangci-lint
make lint
# Format code
make fmt
# Build the binary
make build
# Run full CI validation locally (vet + test + build)
make ci
# Run all verification checks (fmt + vet + lint + test)
make verify
# Clean build artifacts
make clean
# Install development tools (golangci-lint)
make install-tools
# Build container image locally (uses podman by default)
make container-build
# Build with docker instead of podman
CONTAINER_RUNTIME=docker make container-buildThe project uses GitHub Actions for automated validation on push and pull requests.
-
Build & Test (
.github/workflows/build.yml) — Runs on every push/PR to main- Go vet checks
- Unit tests with race detector
- Binary build verification
- Only runs unit tests (integration/e2e require live cluster)
-
Linting (
.github/workflows/lint.yml) — Runs golangci-lint on every push/PR- Code quality checks
- Static analysis
- Best practices enforcement
-
Container Image (
.github/workflows/image.yml) — Builds and pushes container images- Triggered on main branch pushes and tags
-
Release (
.github/workflows/release.yml) — Automated releases via GoReleaser- Triggered on version tags (e.g.,
v1.0.0) - Builds multi-platform binaries
- Generates release notes
- Triggered on version tags (e.g.,
Before pushing code, run the same checks that CI will execute:
# Quick validation (vet + test + build)
make ci
# Full verification (includes fmt and lint)
make verify
# Or run individual checks
make vet
make test
make lintCreate internal/workloads/<name>.go:
package workloads
type MyWorkload struct {
BaseWorkload
}
func NewMyWorkload(cfg config.WorkloadConfig, sshUser, sshPassword string, sshKeys []string) *MyWorkload {
return &MyWorkload{BaseWorkload: BaseWorkload{
Config: cfg,
SSHUser: sshUser,
SSHPassword: sshPassword,
SSHAuthorizedKeys: sshKeys,
}}
}
func (w *MyWorkload) Name() string {
return "my-workload"
}
func (w *MyWorkload) CloudInitUserdata() (string, error) {
// Use BaseWorkload's helper to inject SSH credentials automatically
return w.BuildCloudConfig(cloudinit.CloudConfigOpts{
Packages: []string{"my-package"},
// ...
})
}BaseWorkload provides defaults via embedding. Override only what you need:
| Method | Default | Override When |
|---|---|---|
ExtraVolumes() |
nil |
VM needs additional volume mounts |
ExtraDisks() |
nil |
VM needs additional disk definitions |
DataVolumeTemplates() |
nil, nil |
Workload needs persistent storage (returns []DataVolumeTemplateSpec, error) |
RequiresService() |
false |
VMs need a K8s Service for communication |
ServiceSpec() |
nil |
Define the Service when RequiresService() is true |
VMCount() |
1 |
Workload needs multiple VMs (e.g., server/client) |
Add a RegistryEntry to internal/workloads/registry.go. Each entry pairs a WorkloadFactory with a ParamSchema so the registry can validate user-supplied params at deploy time. DefaultRegistry() currently has nine entries; add yours alongside:
func DefaultRegistry() Registry {
return Registry{
"cpu": {
Factory: func(cfg config.WorkloadConfig, opts *RegistryOpts) Workload {
return NewCPUWorkload(cfg, opts.SSHUser, opts.SSHPassword, opts.SSHAuthorizedKeys)
},
ParamSchema: CPUParamSchema,
},
// ... other built-in entries ...
"my-workload": {
Factory: func(cfg config.WorkloadConfig, opts *RegistryOpts) Workload {
return NewMyWorkload(cfg, opts.SSHUser, opts.SSHPassword, opts.SSHAuthorizedKeys)
},
ParamSchema: MyParamSchema,
},
}
}Important: When adding a new workload, expect these ripple effects:
- Registry tests will fail (entry count, name list assertions)
- Orchestration BDD tests will fail (total VM count assertions)
- Update both before considering the feature complete
If your workload needs more than one role of VM (e.g., a server and a client), implement the MultiVMWorkload interface in addition to Workload:
// In internal/workloads/workload.go
type MultiVMWorkload interface {
Workload
RoleDistribution() []RoleSpec
UserdataForRole(role string, namespace string) (string, error)
}
type RoleSpec struct {
Role string
VMCount int
}Pattern:
- Embed
BaseWorkloadand store any per-workload state (e.g., aNamespacefield for in-cluster DNS). - Implement
RoleDistribution()to return per-role VM counts (e.g.,[]RoleSpec{{Role: "server", VMCount: 1}, {Role: "client", VMCount: 1}}). - Override
VMCount()to sum theRoleSpec.VMCountvalues fromRoleDistribution(). - Implement
UserdataForRole(role, namespace)to return role-specific cloud-init. The defaultCloudInitUserdata()typically delegates toUserdataForRole("server", namespace). - Set
RequiresService()totrueand provide aServiceSpec()selecting server VMs by thevirtwork/role: serverlabel that the orchestrator applies automatically. - Clients reach servers via the in-cluster DNS name
<service-name>.<namespace>.svc.cluster.local— never poll for pod IPs.
The canonical references are internal/workloads/network.go (simplest — one port, iperf3) and internal/workloads/tps.go (multi-port Service). All workloads support configurable Params via a typed param schema (see Configurable Params below).
If your workload needs persistent storage inside the VM:
- Override
DataVolumeTemplates()to return a CDIDataVolumefor each volume needed. Usevm.BuildDataVolumeTemplate(name, size)frominternal/vm. - Override
ExtraVolumes()andExtraDisks()to wire the DataVolume into the VM. Always set theSerialfield on theDisk— the in-VM script discovers the device through/dev/disk/by-id/virtio-<serial>, which is deterministic across reboots (unlike/dev/vdX, which is not). - In your cloud-init userdata, write the shared
diskSetupScript(serial, mountPoint)helper (frominternal/workloads/workload.go) as the first script. It waits for the symlink, formats with XFS if empty, mounts, and writes/etc/fstabfor persistence across reboots. - The
NamespaceDataVolumeshelper ininternal/orchestrator/types.goautomatically suffixes DV template names with the VM name to avoid collisions across multiple VMs of the same workload. Your template name should be the un-suffixed base (e.g.,virtwork-chaos-disk-data).
Reference workloads: disk.go (single fio mount), database.go (PostgreSQL data dir), chaos_disk.go (fill/release loop). All three follow the same pattern.
All workloads expose tunable knobs through WorkloadConfig.Params map[string]string. Users set these in YAML config under workloads.<name>.params. Each workload declares a typed param schema — a slice of ParamDef entries that define the key, type, default, and description for every supported param:
var CPUParamSchema = ParamSchema{
{Key: "cpu-load-percent", Type: ParamInt, Default: "100", Desc: "Target CPU load percentage for stress-ng (--cpu-load)"},
{Key: "cpu-method", Type: ParamString, Default: "all", Desc: "CPU stressor method for stress-ng (--cpu-method)"},
}Set the schema in your constructor on the embedded BaseWorkload:
func NewMyWorkload(cfg config.WorkloadConfig, sshUser, sshPassword string, sshKeys []string) *MyWorkload {
return &MyWorkload{
BaseWorkload: BaseWorkload{
Config: cfg,
ParamSchema: MyParamSchema,
SSHUser: sshUser, SSHPassword: sshPassword, SSHAuthorizedKeys: sshKeys,
},
}
}Then call w.GetParam("key") to retrieve the value — it returns the user's override if set, otherwise the schema default:
func (w *MyWorkload) CloudInitUserdata() (string, error) {
unit := fmt.Sprintf(mySystemdUnitTemplate, w.GetParam("concurrency"), w.GetParam("duration"))
return w.BuildCloudConfig(CloudConfigOpts{
WriteFiles: []WriteFile{{
Path: "/etc/systemd/system/virtwork-my-workload.service",
Content: unit,
}},
})
}GetParam panics on unknown keys — this is intentional; it catches typos in workload code at test time. The orchestrator calls registry.ValidateParams() before constructing workloads, rejecting unknown keys (with "did you mean?" suggestions) and type-mismatched values at deploy time.
Five param types are available: ParamString, ParamInt, ParamBool, ParamList (semicolon-separated), and ParamDict (semicolon-separated key=value pairs). Register your workload in DefaultRegistry() as a RegistryEntry pairing the factory with the schema so validation applies automatically. Catalog workloads declare the same param schema in workload.yaml and use {{key}} placeholders in their service files instead of GetParam() calls — see Catalog Workloads for details.
Every workload should have a Context("param wiring") test block with three cases:
- Nil params —
Paramsfield omitted, output contains default values - Full override — all param keys set, output reflects custom values
- Partial override — some keys set, unset keys fall back to defaults
See internal/workloads/cpu_test.go for the simplest example, or disk_test.go for a workload with multiple output files to verify. Document new param keys in docs/configuration.md — both in the YAML example and the params table.
The internal/logging package provides a shared *slog.Logger returned by NewLogger(w io.Writer, verbose bool). Use it instead of fmt.Fprintf or log.Printf in any code path under cmd/ or in packages that perform I/O (internal/wait is the current example):
logger := logging.NewLogger(cmd.OutOrStdout(), verbose)
logger.Info("vm created",
slog.String("vm_name", name),
slog.String("namespace", ns),
slog.String("workload", component))The output is JSON. --verbose flips the level from INFO to DEBUG. Workload constructors and the pure data layer (internal/workloads, internal/cloudinit) should not log — they remain pure data producers.
The catalog system lets users and operators add new workloads without modifying Go code. A catalog workload is a directory containing one or more systemd .service files and an optional workload.yaml manifest. At deploy time the orchestrator loads catalog entries, registers them alongside the built-in workloads, and creates VMs with the same cloud-init pipeline.
Use the catalog when:
- You want to deploy a custom workload without rebuilding the binary
- You want to iterate on workload definitions without a Go development environment
- You want to share workload definitions as portable directories
Catalog entries also support declarative storage (DataVolumes, extra disks) and K8s Services via the workload.yaml manifest — no Go code required for these either.
~/.virtwork/catalog/ # default catalog-dir (override with --catalog-dir)
├── my-stress/ # single-role entry
│ ├── workload.yaml # optional manifest (packages, params)
│ └── workload.service # systemd unit file
└── my-benchmark/ # multi-role entry
├── workload.yaml # required for multi-role (declares roles)
├── server.service # one service file per role
└── client.service
Each subdirectory is a catalog entry. The directory name becomes the workload name.
Rules:
- At least one
*.servicefile is required - Single-role entries: all
.servicefiles are written to every VM - Multi-role entries: each declared role must have a matching
{role}.servicefile workload.yamlis optional for single-role entries, required for multi-role entries
description: "Human-readable description of the workload"
packages:
- stress-ng # system packages to install via cloud-init
params:
- key: cpu-load # parameter name (used as {{cpu-load}} in service files)
type: int # string | int | bool | list | dict
default: "50" # default value (always a string)
desc: "CPU load percentage for stress-ng"
roles: # omit for single-role entries
- name: server
vm-count: 1 # 0 means "use the global --vm-count flag"
- name: client
vm-count: 2
storage: # optional — persistent volumes attached to VMs
- name: data # DataVolume name (suffixed with VM name automatically)
size: 10Gi # volume size
serial: vw-data # virtio serial (device discovered via /dev/disk/by-id/virtio-<serial>)
mount: /mnt/data # mount point inside the VM
service: # optional — K8s Service for inter-VM communication
ports:
- name: iperf # port name
port: 5201 # port number
protocol: TCP # TCP or UDP
selector-role: server # role whose VMs the Service selects (multi-role entries)All fields are optional for single-role entries. When roles: is present the entry becomes multi-role, which requires a manifest and one {role}.service file per declared role.
A single-role catalog entry produces a GenericWorkload — one VM type running all the declared service files.
Example — a custom stress-ng workload:
~/.virtwork/catalog/my-stress/
├── workload.yaml
└── workload.service
workload.yaml:
description: "Custom CPU stress test"
packages:
- stress-ng
params:
- key: cpu-load
type: int
default: "50"
desc: "CPU load percentage"
- key: method
type: string
default: "all"
desc: "stress-ng CPU method"workload.service:
[Unit]
Description=Virtwork custom stress
After=network-online.target
[Service]
Type=simple
ExecStart=/usr/bin/stress-ng --cpu 0 --cpu-load {{cpu-load}} --cpu-method {{method}} --timeout 0
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.targetDeploy it:
virtwork run --from-catalog my-stress --dry-run
virtwork run --from-catalog my-stress --params my-stress.cpu-load=75The generated cloud-init will install stress-ng, write the service file to /etc/systemd/system/workload.service, substitute {{cpu-load}} and {{method}} with param values, and enable the service.
A multi-role catalog entry produces a GenericMultiWorkload — multiple VM roles with per-role service files. This is the catalog equivalent of implementing the MultiVMWorkload interface.
Example — an iperf3 server/client benchmark:
~/.virtwork/catalog/my-benchmark/
├── workload.yaml
├── server.service
└── client.service
workload.yaml:
description: "iperf3 server/client benchmark"
packages:
- iperf3
params:
- key: duration
type: int
default: "60"
desc: "Test duration in seconds"
roles:
- name: server
vm-count: 1
- name: client
vm-count: 2server.service:
[Unit]
Description=iperf3 server
After=network-online.target
[Service]
Type=simple
ExecStart=/usr/bin/iperf3 -s
Restart=always
[Install]
WantedBy=multi-user.targetclient.service:
[Unit]
Description=iperf3 client
After=network-online.target
[Service]
Type=simple
ExecStart=/usr/bin/iperf3 -c server -t {{duration}} --forceflush
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.targetDeploy it:
virtwork run --from-catalog my-benchmark --dry-runThis creates 3 VMs total (1 server + 2 clients). Each role gets its own cloud-init with only that role's service file.
VM count resolution: If a role's vm-count is 0 (or omitted), the global --vm-count flag is used instead, with a minimum of 1 VM per role. Roles with an explicit vm-count > 0 always use the manifest value.
Catalog workloads use {{key}} placeholders in service file content. At deploy time, each placeholder is replaced with:
- The user-supplied value (via
--params entry-name.key=value), if provided - The
defaultfromworkload.yaml, otherwise
This is the catalog equivalent of calling w.GetParam("key") in a Go workload. The same five param types are supported: string, int, bool, list (semicolon-separated), and dict (semicolon-separated key=value pairs). The orchestrator validates param types before VM creation, rejecting unknown keys with "did you mean?" suggestions.
Catalog entries can declare persistent storage via the storage: field in workload.yaml. Each storage entry creates a CDI DataVolume, attaches it as an extra disk, and injects a diskSetupScript into the cloud-init userdata that waits for the device, formats it with XFS if empty, mounts it, and writes an /etc/fstab entry for persistence across reboots.
storage:
- name: data
size: 10Gi
serial: vw-data
mount: /mnt/dataField reference:
| Field | Required | Description |
|---|---|---|
name |
Yes | DataVolume base name (automatically suffixed with the VM name to avoid collisions) |
size |
Yes | Volume size (e.g., 10Gi, 500Mi) |
serial |
Yes | Virtio serial — the in-VM device is discovered at /dev/disk/by-id/virtio-<serial>, which is deterministic across reboots |
mount |
Yes | Mount point inside the VM (e.g., /mnt/data) |
This is the catalog equivalent of overriding DataVolumeTemplates(), ExtraDisks(), and ExtraVolumes() in a built-in workload. The same serial-based discovery pattern used by disk.go, database.go, and chaos_disk.go is applied automatically.
Example — fio workload with persistent storage:
workload.yaml:
description: "fio benchmark with persistent volume"
packages:
- fio
storage:
- name: virtwork-fio-data
size: 20Gi
serial: vw-fio
mount: /mnt/fio
params:
- key: runtime
type: int
default: "300"
desc: "fio runtime in seconds"workload.service:
[Unit]
Description=fio benchmark
After=network-online.target
[Service]
Type=simple
ExecStart=/usr/bin/fio --name=randrw --directory=/mnt/fio --rw=randrw --bs=4k --size=1G --runtime={{runtime}} --time_based --ioengine=libaio --direct=1 --numjobs=4 --group_reporting
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.targetCatalog entries can declare a K8s Service via the service: field in workload.yaml. This creates a ClusterIP Service that other VMs (or pods) can reach via in-cluster DNS.
service:
ports:
- name: iperf
port: 5201
protocol: TCP
selector-role: serverField reference:
| Field | Required | Description |
|---|---|---|
ports[].name |
Yes | Port name |
ports[].port |
Yes | Port number |
ports[].protocol |
Yes | TCP or UDP |
selector-role |
No | For multi-role entries, the role whose VMs the Service selects. Omit for single-role entries (selects all VMs of this workload). |
This is the catalog equivalent of overriding RequiresService() and ServiceSpec() in a built-in workload. Clients reach the service via <workload-name>.<namespace>.svc.cluster.local.
Both --workloads and --from-catalog can be specified in the same invocation:
virtwork run --workloads cpu,memory --from-catalog my-stress --dry-runBehavior:
- If
--from-catalogis set but--workloadsis not explicitly provided, the default built-in workloads are cleared — only catalog entries run - If both flags are set, all specified workloads (built-in + catalog) run together
- A catalog entry cannot shadow a built-in workload name (e.g., naming a catalog entry
cpuproduces an error)
| Flag | Default | Description |
|---|---|---|
--catalog-dir |
~/.virtwork/catalog |
Path to catalog directory |
--from-catalog |
(none) | Catalog entry names to load (repeatable, comma-separated) |
| Environment Variable | Description |
|---|---|
VIRTWORK_FROM_CATALOG |
Same as --from-catalog |
YAML config:
catalog_dir: /path/to/catalog
from_catalog:
- my-stress
- my-benchmark- CLI parses
--from-catalogand--catalog-dirintoConfig.FromCatalogandConfig.CatalogDir - The orchestrator starts with
DefaultRegistry()(built-in workloads only) - For each catalog entry name,
LoadCatalogEntry()reads the directory, parses the manifest, and discovers service files CatalogEntry.Factory()returns aWorkloadFactorythat produces either aGenericWorkload(single-role) orGenericMultiWorkload(multi-role)- The entry is injected into the registry as a
RegistryEntrywith its factory and param schema - From this point, catalog workloads follow the same VM planning, creation, and cloud-init pipeline as built-in workloads
Whether adding a built-in workload or a catalog entry, write tests that cover the workload's cloud-init output and configuration.
Create internal/workloads/my_workload_test.go using Ginkgo:
var _ = Describe("MyWorkload", func() {
var wl *MyWorkload
BeforeEach(func() {
wl = NewMyWorkload(config.WorkloadConfig{CPUCores: 2, Memory: "2Gi"}, "virtwork", "", nil)
})
It("should return correct name", func() {
Expect(wl.Name()).To(Equal("my-workload"))
})
It("should produce valid cloud-init YAML", func() {
userdata, err := wl.CloudInitUserdata()
Expect(err).NotTo(HaveOccurred())
Expect(userdata).To(HavePrefix("#cloud-config"))
var parsed map[string]interface{}
Expect(yaml.Unmarshal([]byte(userdata), &parsed)).To(Succeed())
})
It("should reflect config in VMResources", func() {
res := wl.VMResources()
Expect(res.CPUCores).To(Equal(2))
Expect(res.Memory).To(Equal("2Gi"))
})
})Catalog entries are tested through GenericWorkload and GenericMultiWorkload. Create a temporary catalog directory in your test, write the manifest and service files, then verify loading and cloud-init output:
var _ = Describe("My catalog entry", func() {
var catalogDir string
BeforeEach(func() {
catalogDir, _ = os.MkdirTemp("", "virtwork-catalog-*")
entryDir := filepath.Join(catalogDir, "my-entry")
os.MkdirAll(entryDir, 0o755)
os.WriteFile(filepath.Join(entryDir, "workload.yaml"), []byte(`
description: "test entry"
packages:
- curl
params:
- key: url
type: string
default: "http://example.com"
desc: "Target URL"
`), 0o644)
os.WriteFile(filepath.Join(entryDir, "workload.service"), []byte(`[Service]
ExecStart=/usr/bin/curl {{url}}
`), 0o644)
})
AfterEach(func() {
os.RemoveAll(catalogDir)
})
It("should load and substitute params", func() {
entry, err := workloads.LoadCatalogEntry(catalogDir, "my-entry")
Expect(err).NotTo(HaveOccurred())
factory := entry.Factory()
wl := factory(config.WorkloadConfig{CPUCores: 1, Memory: "1Gi"}, &workloads.RegistryOpts{})
userdata, err := wl.CloudInitUserdata()
Expect(err).NotTo(HaveOccurred())
Expect(userdata).To(ContainSubstring("http://example.com"))
})
})See internal/workloads/catalog_test.go, generic_test.go, and generic_multi_test.go for comprehensive examples covering error cases, multi-role entries, and param overrides.
VMs can be configured with SSH access for debugging. Every execution is tracked in a local SQLite database for operational visibility. For the complete configuration reference (all flags, environment variables, YAML keys, and precedence rules), see configuration.md.
SSH key flags: --ssh-user, --ssh-password, --ssh-key, --ssh-key-file
When no SSH flags are provided, no user account is created in the VM. When any SSH credential is set, BaseWorkload.BuildCloudConfig() automatically injects a users block into the cloud-init output. Prefer SSH key-only authentication (--ssh-key-file) for anything beyond test/lab environments — passwords appear in plaintext in the VM spec.
Audit flags: --audit (default: true), --no-audit, --audit-db (default: virtwork.db)
Each execution gets a UUID applied as a virtwork/run-id label on all K8s resources. The audit database tracks execution parameters, workload details, VM details, resource details, and events across 5 tables. See audit-schema.md for the full schema reference.
Accessing VMs:
# Via virtctl (after deploying with --ssh-key-file)
virtctl ssh --ssh-key ~/.ssh/id_rsa virtwork@virtwork-cpu-0
# Via oc (port forward then SSH)
oc port-forward vmi/virtwork-cpu-0 2222:22
ssh -p 2222 virtwork@localhostQuerying the audit database:
# Recent executions
sqlite3 virtwork.db "SELECT id, run_id, command, status, started_at FROM audit_log ORDER BY id DESC LIMIT 10;"
# VMs created in a specific run
sqlite3 virtwork.db "SELECT vm_name, component, cpu_cores, memory, status FROM vm_details WHERE audit_id = 1;"
# Events timeline
sqlite3 virtwork.db "SELECT event_type, message, occurred_at FROM events WHERE audit_id = 1 ORDER BY occurred_at;"This project uses Conventional Commits:
| Prefix | Use For |
|---|---|
feat: |
New functionality |
fix: |
Bug fixes |
test: |
Test additions or changes |
docs: |
Documentation only |
refactor: |
Code restructuring without behavior change |
chore: |
Build, tooling, or maintenance |
Every commit must include a DCO Signed-off-by trailer matching the author's identity; a commit-msg hook enforces this.
Architecture and Design:
- architecture.md — Layered architecture, mermaid diagrams, key design decisions
Configuration:
- configuration.md — Complete config reference (flags, env vars, YAML keys, ConfigMap, SSH, audit)
Deployment:
- deployment.md — OpenShift deployment via Kustomize
Workloads:
- chaos-workloads.md — Operator guide for chaos-disk, chaos-network, chaos-process
Audit:
- audit-schema.md — SQLite schema for the audit database
Guides:
- guide/03-adding-a-workload.md — TDD walkthrough that builds a new Go workload from scratch
- guide/04-adding-a-catalog-workload.md — Hands-on tutorial for creating catalog workloads without Go code
Historical (frozen — not updated piecemeal):
implementation-plan.md— Original phased build planopenshift-virtualization-workload-automation.md— Initial design document