voomd is an experimental GPU pressure daemon for Linux systems running AMD GPUs.
The goal is simple: treat VRAM pressure a bit more like RAM pressure, with policy, hysteresis, workload classification, and reclaim decisions that are better than "hope the driver recovers".
Important
This is a dry-run prototype published as a useful snapshot, not a production-ready daemon. It is currently good for observation, policy iteration, and homelab experimentation. It is not yet something I would tell strangers to trust with real enforcement on important workloads.
What exists today:
- a runnable daemon script: voomd (dry-run pressure tracking)
- workload classification by cgroup/unit/cmdline
- per-device pressure states with hysteresis
- structured decision logging
- multi-source telemetry strategy: amd-smi as primary, DRM sysfs as fallback
- stale workload reuse when richer telemetry stalls
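The multi-source telemetry idea can be sketched as a small fallback chain. This is a hedged illustration, not the repo's implementation: the function names (`vram_fraction`, `drm_sysfs_fraction`) are hypothetical, while the `mem_info_vram_used` / `mem_info_vram_total` files are the counters the amdgpu driver exposes in DRM sysfs.

```python
from typing import Callable, Optional

def vram_fraction(sources: list[Callable[[], Optional[float]]]) -> Optional[float]:
    """Return a VRAM usage fraction from the first source that answers.

    Each source is a zero-arg callable returning a fraction in [0, 1],
    or None (or raising OSError) when its backend is unavailable.
    """
    for source in sources:
        try:
            value = source()
        except OSError:
            continue  # backend down; try the next provider
        if value is not None:
            return value
    return None

def drm_sysfs_fraction(card: str = "card0") -> Optional[float]:
    """Fallback reader: amdgpu publishes VRAM counters in DRM sysfs."""
    base = f"/sys/class/drm/{card}/device"
    try:
        used = int(open(f"{base}/mem_info_vram_used").read())
        total = int(open(f"{base}/mem_info_vram_total").read())
    except (OSError, ValueError):
        return None
    return used / total if total else None
```

A caller would then pass the richer provider first, e.g. `vram_fraction([amd_smi_fraction, drm_sysfs_fraction])`, where `amd_smi_fraction` is a hypothetical wrapper around amd-smi output; ordering the list is the whole "primary vs. fallback" policy.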
What does not exist yet:
- production-grade enforcement
- strong guarantees across different AMD stacks
- polished packaging/service/install UX
- broad hardware validation
Because the idea and the current implementation are already useful, I would rather publish an honest experimental repo than wait for a level of completeness that may take months while my attention moves to other projects. Maintenance may be sporadic. README statements about future work are intent, not a promise.
voomd currently models:
NORMAL → GUARDED → PRESSURE → CRITICAL
and classifies workloads roughly into:
critical-no-kill, graceful-reclaim, and killable
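Classification by cgroup/unit/cmdline can be as simple as a first-match ruleset mapping identifying strings to the three classes above. A minimal sketch, with a hypothetical rule table (the actual rules live in the daemon's config, not here):

```python
import re

# Hypothetical ruleset: (pattern, class), first match wins.
# Patterns are searched in "unit cmdline" so either field can trigger.
RULES = [
    (re.compile(r"display-manager|gdm|sddm"), "critical-no-kill"),
    (re.compile(r"ollama|llama|comfyui"), "graceful-reclaim"),
]

def classify(unit: str, cmdline: str) -> str:
    """Map a workload to a reclaim class; anything unmatched is killable."""
    haystack = f"{unit} {cmdline}"
    for pattern, cls in RULES:
        if pattern.search(haystack):
            return cls
    return "killable"
```

Defaulting unmatched workloads to killable is one possible policy choice; a more conservative default would be graceful-reclaim.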
It watches GPU pressure, keeps state in ~/.local/state/voomd, and logs what it would reclaim in dry-run mode.
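The per-device states with hysteresis plus the dry-run decision log can be sketched like this. The thresholds, function names, and log format here are illustrative assumptions, not the repo's actual values: the point is that each state is entered above one threshold but only left below a lower one, so readings oscillating around a boundary do not flap.

```python
import json
import time

# Hypothetical thresholds: (state, enter fraction, exit fraction),
# ordered from most to least severe.
STATES = [
    ("CRITICAL", 0.95, 0.90),
    ("PRESSURE", 0.85, 0.80),
    ("GUARDED", 0.70, 0.65),
]

def next_state(current: str, vram: float) -> str:
    """One hysteresis step: escalate eagerly, de-escalate reluctantly."""
    for name, enter, exit_ in STATES:
        if vram >= enter:
            return name
        if current == name and vram >= exit_:
            return name  # stay put inside the hysteresis band
    return "NORMAL"

def log_decision(path: str, device: str, state: str, action: str) -> None:
    """Append one structured dry-run record of what *would* be reclaimed."""
    record = {"ts": time.time(), "device": device, "state": state,
              "would": action}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

In a real loop the log path would live under the daemon's state directory, and `next_state` would run once per device per polling tick.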
In this repo:
- voomd: current daemon script
- examples/voomd.json: example config from the live homelab prototype
- docs/VOOMD_MVP.md: original MVP design notes
- telemetry/provider mapping is still being refined
- per-process attribution is only as good as the active provider
- current behavior is tuned on one homelab, not generalized across many AMD systems
- the daemon should be considered "policy research with working code", not finished infrastructure
Use it if you want to:
- inspect GPU pressure behavior
- prototype reclaim policy on AMD/Linux
- compare telemetry sources under load
- build your own GPU oomd-style control loop
Do not use it yet if you need:
- low-risk autonomous kill decisions
- polished deployment
- wide hardware support guarantees
Licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).