Releases: AI-Ocean/gpu-usage-audit
v1.0.3
v1.0.2
v1.0.1
Changes since v1.0.0
Full Changelog: v1.0.0...v1.0.1
v1.0.0
Changes since v0.4.1
- Prepare bare-metal 1.0 release (#12) (59e06c7)
- Prune post-1.0 runtime planning (#11) (944583d)
- Document bare metal 1.0 status (3fd8745)
- PR B: Packaging And Install UX (#10) (06b6599)
- Bare Metal Scope Reset (#9) (dd1d41d)
- Document bare metal 1.0 scope (856e0e0)
- Default daemon and report DB to tmp path (#8) (f15afdc)
- RuntimePlan and doctor v1 (#7) (239a333)
- Document uv tool update and uninstall (#6) (d2cee3b)
Full Changelog: v0.4.1...v1.0.0
v0.4.1
v0.4.0
Changes since v0.3.0
- Release hardening and download path (#3) (18a371f)
- PR 2: Command Surface Skeleton (#2) (3708b11)
- Merge pull request #1 from AI-Ocean/proposal/auto-runtime-architecture (e2880b8)
- ci(release): replace --generate-notes with git-log-based notes (0315928)
Full Changelog: v0.3.0...v0.4.0
v0.3.0
Changes since v0.2.0
- docs+bump: 0.3.0 — separate demo/daemon, document --since semantics (3c8ade9)
- feat(cli): add 'demo' subcommand for self-contained fake-tier demos (804121c)
- docs(readme): trim out-of-scope; add Usage section with daemon/report semantics (0e90d3c)
- docs(readme): revive Quick demo, simplify install, add NVML section (ffa4556)
- chore: drop CHANGELOG.md; release workflow uses --generate-notes (6040b22)
Full Changelog: v0.2.0...v0.3.0
v0.2.0
First stable Python release. Adds real NVIDIA NVML telemetry support: the daemon
now runs on real GPU hosts via --tier nvml.
Added
- NVMLTier (gpu_usage_audit.nvml): real telemetry via pynvml
  (nvidia-ml-py), covering compute-running processes, per-card UUID and
  utilization, and bytes→MB memory conversion. The import is late-bound, so
  the package works without the [nvml] extra; --tier nvml raises a
  friendly install hint if the extra is missing.
- daemon --tier {fake,nvml} flag (default fake). The fake source
  remains usable on any host for the funnel demo; the NVML source is
  for real GPU hosts.
- [nvml] optional dependency: pip install gpu-usage-audit[nvml] or
  uvx --with nvidia-ml-py gpu-usage-audit ...
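The late-bound import described above can be sketched as follows. This is a minimal illustration, not the project's code: load_optional and MissingExtraError are hypothetical names, and the hint wording is an assumption. It relies only on the fact that the nvidia-ml-py distribution provides the pynvml module.

```python
import importlib
from types import ModuleType


class MissingExtraError(RuntimeError):
    """Hypothetical error raised when an optional extra is not installed."""


def load_optional(module_name: str, extra: str, dist: str) -> ModuleType:
    """Import module_name on first use instead of at package import time.

    Because the import happens inside this function, importing the package
    itself never fails on hosts that lack the optional dependency.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Turn a bare ImportError into an actionable install hint.
        raise MissingExtraError(
            f"{module_name} is not installed; install the [{extra}] extra with "
            f"pip install 'gpu-usage-audit[{extra}]' "
            f"or run via uvx --with {dist} gpu-usage-audit ..."
        ) from exc
```

A --tier nvml code path would call load_optional("pynvml", "nvml", "nvidia-ml-py") only after the flag is parsed, so --tier fake never touches the NVML bindings.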
Fixed
- FakeTier now pins synthetic loginuid_user values (alice / bob /
  None) so the daemon's system_user_lookup no longer accidentally
  resolves a real local user when a synthetic PID happens to exist on
  the host.
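The fix above hinges on keeping identity lookup injectable. The sketch below illustrates loginuid-based resolution with a pluggable lookup; v0.1.0 describes the same idea as a UserLookupFunc abstraction with a table-based lookup for tests. resolve_login_user, passwd_lookup and table_lookup are hypothetical names, not the project's API. The only kernel fact relied on is that /proc/<pid>/loginuid reads 4294967295 ((uint32)-1) when no login UID was set. With table_lookup injected, resolution never consults the host's passwd database, even if a synthetic PID collides with a real process.

```python
import pwd
from pathlib import Path
from typing import Callable, Mapping, Optional

# Hypothetical counterpart of a UserLookupFunc: uid -> user name, or None.
UserLookup = Callable[[int], Optional[str]]

# /proc/<pid>/loginuid reads as (uint32)-1 when no login UID was ever set.
LOGINUID_UNSET = 4294967295


def passwd_lookup(uid: int) -> Optional[str]:
    """Production lookup: ask the host's passwd database."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def table_lookup(table: Mapping[int, str]) -> UserLookup:
    """Test lookup: answers only from a fixed table, never from the host."""
    return lambda uid: table.get(uid)


def resolve_login_user(
    pid: int, lookup: UserLookup, proc_root: Path = Path("/proc")
) -> Optional[str]:
    """Map a PID to its audit login user, or None if it cannot be resolved."""
    try:
        raw = (proc_root / str(pid) / "loginuid").read_text().strip()
        uid = int(raw)
    except (OSError, ValueError):
        return None  # process exited, /proc unreadable, or unexpected content
    if uid == LOGINUID_UNSET:
        return None
    return lookup(uid)
```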
v0.2.0a1
First Python alpha. The 5-section report and daemon loop are ported
from the Go v0.1.0 design; no real NVML yet.
Added
- daemon subcommand: FakeTier sampling into SQLite with anti-drift
  scheduling and signal-driven shutdown (a threading.Event cancels
  stop.wait(delay)).
- report subcommand: five-section retrospective report:
  - §1 Headline three-bar (active / idle-held / truly-idle).
  - §2 Waste (idle GPU-hours, equivalently-unused GPUs).
  - §3 Per-GPU breakdown.
  - §4 Top identities (loginuid-resolved or unknown).
  - §5 Day-of-week × hour activity heatmap.
- FakeTier: deterministic 5-tick GPU-0 cycle
  (active → idle-held → truly-idle → repeat), invariant GPU-1/2.
- Classify / Summarize / detect_env_kind (bare/docker/k8s): ports of
  the Go v0.1.0 decisions.
- SQLite layer: journal_mode=WAL, busy_timeout=5000, indexes on
  (gpu_uuid, ts), transactional write_snapshot.
- version / help subcommands alongside --version / --help.
- _duration argparse type: "30s" / "1h" / "200ms" parsing
  (a subset of Go's time.ParseDuration).
- Test suite (85 tests, standard testing-style with pytest): unit
  tests for every domain module plus CLI smoke and integration fixtures.
- GitHub Actions CI: ruff + mypy --strict + pytest on every push
  / PR via uv sync --all-groups --locked.
- Release workflow: tag push (v*) → uv build → GitHub Release with
  sdist + wheel attached, release notes extracted from this CHANGELOG.
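The anti-drift scheduling and Event-based shutdown from the daemon entry can be illustrated with a minimal sketch. It is not the project's loop: run_ticks and install_shutdown are hypothetical, and skipping missed slots after an overrun (rather than firing a burst of catch-up ticks) is an assumed policy. Deadlines sit on a fixed grid start + k × interval, and stop.wait(delay) is the sleep itself, so setting the event ends the wait early.

```python
import signal
import threading
import time
from types import FrameType
from typing import Callable, Optional


def run_ticks(
    tick: Callable[[int], None],
    interval: float,
    stop: threading.Event,
    max_ticks: Optional[int] = None,
) -> int:
    """Call tick(n) on a fixed time grid until stop is set; return ticks run."""
    start = time.monotonic()
    slot = 0
    ticks = 0
    while not stop.is_set():
        tick(ticks)
        ticks += 1
        if max_ticks is not None and ticks >= max_ticks:
            break
        now = time.monotonic()
        # Next grid point start + slot * interval that is still in the future.
        # Deriving it from start (not from "now + interval") keeps slow ticks
        # from accumulating drift; overrun slots are skipped, not replayed.
        slot = max(slot + 1, int((now - start) // interval) + 1)
        # Event.wait returns True as soon as stop is set, ending the sleep early.
        if stop.wait(start + slot * interval - now):
            break
    return ticks


def install_shutdown(stop: threading.Event) -> None:
    """Set stop on SIGINT/SIGTERM (must be called from the main thread)."""

    def handler(signum: int, frame: Optional[FrameType]) -> None:
        stop.set()

    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
```

In a daemon, install_shutdown(stop) would run once at startup, followed by run_ticks(sample_once, 30.0, stop), where sample_once is a hypothetical per-tick sampling function.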
Notes
- This is an alpha: the binary's --help works and daemon/report run
  end-to-end on fake telemetry, but real NVML is not wired yet.
- PyPI distribution requires trusted-publishing setup; until then the
  wheel/sdist are downloadable from the GitHub Release page.
- Go v0.1.0 remains downloadable at the v0.1.0 tag / go-archive
  branch.
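The _duration argparse type listed under Added above can be approximated as follows. This sketch, parse_duration, is hypothetical: it assumes the accepted subset is ms, s, m and h (optionally concatenated, as in 1h30m) and that the result is a float number of seconds. Go's time.ParseDuration additionally accepts ns, us and µs, a leading sign, and a bare 0.

```python
import argparse
import re

# Assumed subset of Go's time.ParseDuration units, in seconds.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
# "ms" is listed before "m" and "s" so "200ms" is read as milliseconds.
_SEGMENT = r"(\d+(?:\.\d+)?)(ms|s|m|h)"
_WHOLE = re.compile(rf"(?:{_SEGMENT})+")


def parse_duration(text: str) -> float:
    """argparse type: turn "30s", "1h", "200ms" or "1h30m" into seconds."""
    if not _WHOLE.fullmatch(text):
        # argparse turns this into a clean usage error for the flag.
        raise argparse.ArgumentTypeError(
            f"invalid duration {text!r}; expected e.g. 30s, 1h or 200ms"
        )
    segments = re.findall(_SEGMENT, text)
    return sum((float(n) * _UNIT_SECONDS[u] for n, u in segments), 0.0)
```

Because argparse also runs string defaults through type, a hypothetical parser.add_argument("--interval", type=parse_duration, default="30s") yields 30.0 when the flag is omitted.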
v0.1.0
First public release.
Added
- daemon subcommand: periodic GPU/process sampling into SQLite with
  anti-drift scheduling, signal-driven shutdown, and a single transaction
  per tick.
- report subcommand: five-section retrospective report from any
  accumulated database file:
  - §1 Headline: active / idle-held / truly-idle proportions with a
    glyph-differentiated three-bar.
  - §2 Waste: idle GPU-hours and equivalent unused GPU count.
  - §3 Per-GPU: idle-held breakdown by card.
  - §4 Top identities: by-user GPU-hours and idle-held share.
  - §5 Heatmap: day-of-week × hour activity grid.
- FakeTier: deterministic time-varying fake telemetry source, so the
  daemon is exercisable on any host (no NVIDIA driver required).
- Identity resolution via /proc/<pid>/loginuid with a UserLookupFunc
  abstraction; pluggable table-based lookup for tests.
- Host environment auto-detection (bare/docker/k8s) from
  /proc/1/cgroup.
- Three-table schema (host, gpu_sample, proc_sample): a minimal
  surface aligned to the idle-held question.
- SQLite journal_mode=WAL + busy_timeout=5000, so the daemon and
  report can share the same database file without SQLITE_BUSY.
- Indexes idx_gpu_sample_uuid_ts and idx_proc_sample_uuid_ts on
  (gpu_uuid, ts) for card-keyed time-window queries.
- help / version subcommands (alongside --help / --version).
- Unit and DB-layer test coverage (standard testing only, no
  third-party deps): Classify, DetectEnvKind, Summarize,
  FakeTier phase cycle, and all Load* report queries against
  a real on-disk SQLite fixture.
- GitHub Actions CI: vet + race-enabled test + build on every
  push and pull request.
- Apache 2.0 license, Makefile (build/run/test/clean),
  --version injected at link time.
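A minimal Python sketch of the SQLite setup described above; the v0.2.0a1 Python port lists the same pragmas, (gpu_uuid, ts) indexes and a transactional write. The column layout, open_audit_db and write_tick are assumptions; only the table names, index names, index keys and pragma values come from these notes. Using the connection as a context manager wraps each tick in one transaction.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Assumed columns; table names, index names and index keys follow the notes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS host (
    hostname TEXT PRIMARY KEY,
    env_kind TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS gpu_sample (
    ts REAL NOT NULL,
    gpu_uuid TEXT NOT NULL,
    util_pct INTEGER NOT NULL,
    mem_used_mb INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS proc_sample (
    ts REAL NOT NULL,
    gpu_uuid TEXT NOT NULL,
    pid INTEGER NOT NULL,
    login_user TEXT
);
CREATE INDEX IF NOT EXISTS idx_gpu_sample_uuid_ts ON gpu_sample (gpu_uuid, ts);
CREATE INDEX IF NOT EXISTS idx_proc_sample_uuid_ts ON proc_sample (gpu_uuid, ts);
"""

GpuRow = Tuple[str, int, int]  # (gpu_uuid, util_pct, mem_used_mb)
ProcRow = Tuple[str, int, Optional[str]]  # (gpu_uuid, pid, login_user)


def open_audit_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets a reader (report) run while the writer (daemon) appends.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait up to 5000 ms for a lock instead of failing at once with SQLITE_BUSY.
    conn.execute("PRAGMA busy_timeout=5000")
    conn.executescript(SCHEMA)
    return conn


def write_tick(conn: sqlite3.Connection, ts: float,
               gpus: Iterable[GpuRow], procs: Iterable[ProcRow]) -> None:
    """Persist one sampling tick atomically."""
    with conn:  # one transaction: commit on success, roll back on any exception
        conn.executemany(
            "INSERT INTO gpu_sample (ts, gpu_uuid, util_pct, mem_used_mb) "
            "VALUES (?, ?, ?, ?)",
            [(ts, *row) for row in gpus],
        )
        conn.executemany(
            "INSERT INTO proc_sample (ts, gpu_uuid, pid, login_user) "
            "VALUES (?, ?, ?, ?)",
            [(ts, *row) for row in procs],
        )
```

Because the whole tick is one transaction, a report query never sees GPU rows for a tick whose process rows failed to insert.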
Notes
- v0.1.0 ships fake telemetry only, so the daemon is exercisable on any
  host. Real NVML support is targeted for v0.2.0.
- The legacy gpu-usage-audit (v0.1.x) project is archived in favour
  of this rewrite.
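The host environment auto-detection from /proc/1/cgroup listed above can be sketched as a substring heuristic. This is an assumption, not the project's rule: guess_env_kind and guess_host_env_kind are hypothetical, and they simply look for kubepods and then docker in the cgroup paths. The heuristic has a known blind spot: with cgroup v2 and a private cgroup namespace, a container's PID 1 often sees only 0::/, which this sketch reports as bare.

```python
from pathlib import Path


def guess_env_kind(cgroup_text: str) -> str:
    """Return "k8s", "docker" or "bare" from the text of /proc/1/cgroup."""
    # Each line is "hierarchy-ID:controller-list:cgroup-path"; keep the path.
    paths = [line.split(":", 2)[-1] for line in cgroup_text.splitlines()]
    # Check Kubernetes first: pods on a Docker runtime can mention both markers.
    if any("kubepods" in p for p in paths):
        return "k8s"
    if any("docker" in p for p in paths):
        return "docker"
    return "bare"


def guess_host_env_kind(path: Path = Path("/proc/1/cgroup")) -> str:
    """Read the cgroup file and classify it; fall back to "bare"."""
    try:
        return guess_env_kind(path.read_text())
    except OSError:
        return "bare"  # e.g. non-Linux host or restricted /proc
```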