diff --git a/README.md b/README.md
index 2b3538d..f9d08a2 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ Jupyter notebook open with an 8 GB tensor on the GPU and went to
 lunch — `nvidia-smi` will show 1% utilization, but the card is
 *unusable* by anyone else. This tool measures that.
 
-> **Status:** main is being reset around the bare-metal 1.0 scope.
+> **Status:** main tracks the bare-metal 1.0 scope.
 > `gua doctor` checks only the current machine. `daemon` records NVML
 > telemetry from the current NVIDIA host, `report` reads the resulting
 > SQLite database, and `demo` runs anywhere with fake telemetry. The Go
diff --git a/projects/auto-runtime-audit/plan.ko.md b/projects/auto-runtime-audit/plan.ko.md
deleted file mode 100644
index 0f159ec..0000000
--- a/projects/auto-runtime-audit/plan.ko.md
+++ /dev/null
@@ -1,357 +0,0 @@
-# Auto Runtime Audit Development Plan
-
-Status: on hold
-Scope: development plan for implementing the auto-runtime architecture proposal
-
-> 2026-05-14 scope reset: the 1.0 product diagnoses and collects from only
-> **the currently installed bare-metal machine**, not auto-runtime or cluster-wide
-> audit. The 1.0 plan of record is `projects/bare-metal-1.0/plan.ko.md`. This document
-> remains as a deferred reference for expanding back into Kubernetes, Slurm,
-> Docker/Podman, and scheduler allocation-aware reports.
-
-## Goal
-
-Build `gpu-usage-audit` as a retrospective audit tool that joins actual GPU
-telemetry with scheduler allocation context.
-
-The product should answer the following questions.
-
-- Who was allocated GPU capacity?
-- Did they actually use the allocated GPUs?
-- Who used GPUs without a scheduler allocation?
-- Which GPUs were memory-held but compute-idle?
-
-The implementation proceeds top-down. First define the user-facing report, runtime plan,
-data model, and fake end-to-end flow. Then attach the real host, Kubernetes,
-and Slurm adapters.
-
-## Expected Architecture
-
-Expected module boundaries:
-
-```text
-gpu_usage_audit/
-  cli/              # gua doctor/start/status/report/stop
-  doctor/           # environment checks and RuntimePlan creation
-  runtime/          # where the collector runs
-  telemetry/        # actual GPU facts, usually NVML
-  scheduler/        # allocation and ownership context
-  attribution/      # PID -> pod/job/user mapping
-  storage/          # SQLite schema, migration, export, rollup
-  report/           # classification, aggregation, rendering
-  packaging/        # systemd unit, k8s manifest, OCI image
-```
-
-The core separation:
-
-```text
-Runtime placement: where does the collector process run?
-Telemetry source: how is actual GPU state observed?
-Scheduler context: who was allocated GPU capacity?
-Attribution: how are observed PIDs mapped back to owners?
-Report model: how are telemetry and allocation combined?
-```
-
-Kubernetes and Slurm are scheduler context providers. They are not telemetry sources.
-The default telemetry source remains NVML.
-
-## Supported Areas
-
-| Area | Runtime | Telemetry | Scheduler | Expected capability |
-|---|---|---|---|---|
-| Bare metal | host systemd or foreground | NVML | none | active / idle-held / truly-idle |
-| Bare metal + Slurm | host systemd | NVML | Slurm | job, user, account audit |
-| Kubernetes / GPU Operator | DaemonSet | NVML inside pod | Kubernetes | pod and namespace audit |
-| Local Docker/Podman | local container | NVML inside container | none | fallback when host execution is unavailable |
-| Demo/test | foreground | fake | fake or none | validate product semantics without GPU access |
-
-## Delivery Principles
-
-- Every PR must be independently mergeable, and the project must be in a working
-  state after merge.
-- Existing commands may be kept as compatibility aliases while the new `gua`
-  command surface is introduced.
-- Detection must be read-only. It must not install packages or change system/cluster
-  state.
-- `start` must show a concrete plan before changing system or cluster
-  state.
-- Runtime placement and scheduler context must be detected independently.
-- Fake telemetry and fake scheduler flows must validate report semantics before
-  real cluster integration.
-
-## PR Plan
-
-### PR 1: Proposal And Roadmap
-
-Current PR.
-
-Deliver:
-
-- Auto-runtime architecture proposal.
-- Korean translation.
-- This PR-based development plan.
-
-Working state:
-
-- Documentation-only change.
-- No runtime behavior changes.
-
-Before merge:
-
-- Clarify that runtime placement and scheduler context are independent.
-- Base Kubernetes owner identity on the stable UID, with namespace/name as
-  display fields.
-- Treat `NVIDIA_VISIBLE_DEVICES=all` without a GPU request as an anomaly, but
-  explicitly exempt GPU management agents such as this collector, DCGM, and
-  NVIDIA device/plugin components.
-- Remove unintended Markdown trailing whitespace, except where a hard line break
-  is intended.
-
-### PR 2: Command Surface Skeleton
-
-Deliver:
-
-- `gua` console entry point.
-- Top-level commands:
-
-```sh
-gua doctor
-gua start --dry-run
-gua status
-gua report
-gua stop
-gua uninstall
-```
-
-- Existing `gpu-usage-audit daemon/report/demo` compatibility path.
-- Clear placeholder behavior for unsupported or not-yet-installed modes.
-- CLI smoke test.
-
-Working state:
-
-- Users can try the new command surface.
-- Existing documented commands keep working.
-- `start/status/stop` do not silently change anything.
-
-### PR 3: RuntimePlan And Doctor V1
-
-Deliver:
-
-- `RuntimePlan` model.
-- `gua doctor` human-readable output.
-- `gua doctor --json`.
-- `gua start --dry-run` prints the recommended plan.
-- Read-only checks for:
-  - OS/kernel/Python.
-  - `/dev/nvidia*`.
-  - NVML load/init/device count.
-  - `kubectl` presence and auth.
-  - Kubernetes runtime signal.
-  - Slurm command/config signal.
-  - Docker/Podman NVIDIA fallback signal.
-
-Working state:
-
-- Users can understand which runtime path is recommended for the current machine
-  without installing anything.
-
-### PR 4: Data Model V2 And Migration
-
-Deliver:
-
-- Schema versioning and migration.
-- `node`.
-- Expanded `gpu_sample`.
-- `gpu_process_sample`.
-- `allocation_sample`.
-- `owner_sample`.
-- legacy DB read compatibility.
-
-Working state:
-
-- Existing host daemon/report behavior keeps working on the new schema.
-- Even without scheduler allocation, reports still output the existing active / idle-held / truly-idle
-  view.
-
-### PR 5: Combined Classification And Fake Scheduler
-
-Deliver:
-
-- Allocation-aware classification:
-
-```text
-allocated-active
-allocated-idle-held
-allocated-unused
-unallocated-active
-unallocated-idle-held
-truly-idle
-unknown-active
-unknown-idle-held
-unknown-unused
-```
-
-- Fake scheduler adapter.
-- Demo data covering allocated, unallocated, and unknown allocation states.
-- Combined class report section.
-- Classification and report aggregation tests.
-
-Working state:
-
-- The final product semantics can be validated without real GPUs, Kubernetes, or Slurm.
-
-### PR 6: Install State And Local Host Runtime
-
-Deliver:
-
-- Local install state file.
-- Default DB path.
-- Host foreground runtime adapter.
-- `gua start --mode host --foreground`.
-- `gua status`.
-- `gua report --since ...` that uses state when `--db` is omitted.
-- `gua stop` for foreground/state-aware flows where possible.
-
-Working state:
-
-- Single-host users can use the new `gua` workflow without passing `--db` manually
-  to every command.
-
-### PR 7: Systemd Host Runtime
-
-Deliver:
-
-- systemd unit template.
-- `gua start --mode host`.
-- `gua stop`.
-- `gua uninstall`.
-- `gua uninstall --delete-data`.
-- `--dry-run` and `--yes`.
-- root/permission diagnostic.
-- Data preserved by default.
-
-Working state:
-
-- Bare-metal host collection can be installed, stopped, and removed through the new UX.
-
-### PR 8: Kubernetes Manifest Dry Run
-
-Deliver:
-
-- Embedded Kubernetes manifest templates.
-- Namespace, ServiceAccount, RBAC, ConfigMap, DaemonSet rendering.
-- GPU-capable node targeting logic.
-- `hostPID: true` default.
-- `--no-host-pid` opt-out.
-- Security and RBAC explanation in the plan output.
-
-Working state:
-
-- Users can review exactly what would be installed in a Kubernetes cluster without applying it.
-
-### PR 9: Kubernetes Runtime Adapter
-
-Deliver:
-
-- Official OCI image path.
-- `gua start --mode k8s`.
-- `gua status --mode k8s`.
-- `gua stop --mode k8s`.
-- `kubectl apply/delete` integration.
-- Collector pod discovery.
-- Per-node hostPath SQLite DB.
-- Node-level last-sample status.
-
-Working state:
-
-- Collectors can run as a DaemonSet on Kubernetes GPU nodes.
-- Scheduler attribution may still be limited.
-
-### PR 10: Kubernetes Report Export
-
-Deliver:
-
-- `gua report --since ... --node NODE`.
-- `gua report --since ... --all-nodes`.
-- Collector pod fan-out.
-- Windowed export.
-- JSONL export format.
-- Parallel collection.
-- `pods/exec` RBAC diagnostic.
-
-Working state:
-
-- Users can build a cluster-level report from per-node collector databases.
-
-### PR 11: Kubernetes Scheduler Attribution
-
-Deliver:
-
-- Kubernetes API owner snapshot.
-- Pod UID-based owner identity.
-- PodResources API integration.
-- Pod resource request/limit parsing.
-- `/proc/<pid>/cgroup` PID-to-pod mapping.
-- cgroup v1/v2 parser coverage.
-- `NVIDIA_VISIBLE_DEVICES=all` anomaly detection.
-- GPU management pod exception.
-
-Working state:
-
-- Kubernetes reports can show allocated-active, allocated-unused,
-  unallocated-active, and unallocated-idle-held by pod/namespace.
-
-### PR 12: Slurm Doctor And Scheduler Adapter
-
-Deliver:
-
-- Slurm detection in doctor.
-- `scontrol`, `squeue`, optional `sacct` integration.
-- Node-level running job allocation snapshot.
-- job/user/account owner model.
-- requested GPU count.
-- cgroup PID-to-job mapping.
-- best-effort exact GPU-to-job mapping.
-
-Working state:
-
-- GPU usage reports by job, user, and account work on Slurm compute nodes.
-
-### PR 13: Rollup And Retention
-
-Deliver:
-
-- Raw sample retention policy.
-- 1-minute rollup table.
-- Combined class rollup.
-- Cleanup command.
-- Reports that read raw and rollup windows together.
-
-Working state:
-
-- Long-running collectors keep DB size under control without losing the core audit classes.
-
-### PR 14: Packaging And Release Polish
-
-Deliver:
-
-- README quickstart for the host, Kubernetes, Slurm, and demo paths.
-- Troubleshooting matrix.
-- Wheel release verification.
-- OCI image release workflow.
-- Optional Helm chart, if the manifest path has stabilized.
-
-Working state:
-
-- New users can install, start, inspect, report, and uninstall from the docs
-  alone.
-
-## Recommended Merge Order
-
-The critical foundation is PR 2 through PR 5.
-
-```text
-CLI surface -> RuntimePlan/doctor -> schema V2 -> combined report semantics
-```
-
-After that, host, Kubernetes, and Slurm become adapter work on top of stable contracts.
diff --git a/projects/auto-runtime-audit/plan.md b/projects/auto-runtime-audit/plan.md
deleted file mode 100644
index 4683402..0000000
--- a/projects/auto-runtime-audit/plan.md
+++ /dev/null
@@ -1,359 +0,0 @@
-# Auto Runtime Audit Development Plan
-
-Status: on hold
-Scope: development plan for the auto-runtime architecture proposal
-
-> 2026-05-14 scope reset: the 1.0 product is focused on diagnosing and
-> collecting from **the currently installed bare-metal machine**, not
-> auto-runtime or cluster-wide audit. The 1.0 plan of record is
-> `projects/bare-metal-1.0/plan.ko.md`. This document remains as a deferred
-> reference for a future expansion back into Kubernetes, Slurm, Docker/Podman,
-> and scheduler allocation-aware reporting.
-
-## Goal
-
-Build `gpu-usage-audit` as a retrospective audit tool that joins actual GPU
-telemetry with scheduler allocation context.
-
-The product should answer:
-
-- Who was allocated GPU capacity?
-- Did they actually use it?
-- Who used GPUs without scheduler allocation?
-- Which GPUs were memory-held but compute-idle?
-
-The implementation should be top-down. First define the user-facing report,
-runtime plan, data model, and fake end-to-end flow. Then attach real host,
-Kubernetes, and Slurm adapters.
-
-## Architecture Shape
-
-Expected module boundaries:
-
-```text
-gpu_usage_audit/
-  cli/              # gua doctor/start/status/report/stop
-  doctor/           # environment checks and RuntimePlan creation
-  runtime/          # where the collector runs
-  telemetry/        # actual GPU facts, usually NVML
-  scheduler/        # allocation and ownership context
-  attribution/      # PID -> pod/job/user mapping
-  storage/          # SQLite schema, migration, export, rollup
-  report/           # classification, aggregation, rendering
-  packaging/        # systemd units, k8s manifests, OCI image
-```
-
-The core separation:
-
-```text
-Runtime placement: where does the collector process run?
-Telemetry source: how do we observe actual GPU state?
-Scheduler context: who was allocated GPU capacity?
-Attribution: how do observed PIDs map back to owners?
-Report model: how do telemetry and allocation combine?
-```
-
-Kubernetes and Slurm are scheduler context providers. They are not telemetry
-sources. The default telemetry source remains NVML.
-
-## Supported Areas
-
-| Area | Runtime | Telemetry | Scheduler | Expected capability |
-|---|---|---|---|---|
-| Bare metal | host systemd or foreground | NVML | none | active / idle-held / truly-idle |
-| Bare metal + Slurm | host systemd | NVML | Slurm | job, user, account audit |
-| Kubernetes / GPU Operator | DaemonSet | NVML inside pod | Kubernetes | pod and namespace audit |
-| Local Docker/Podman | local container | NVML inside container | none | fallback when host execution is unavailable |
-| Demo/test | foreground | fake | fake or none | product semantics without GPU access |
-
-## Delivery Principles
-
-- Every PR must merge independently and leave the project in a working state.
-- Existing commands may remain as compatibility aliases while the new `gua`
-  command surface is introduced.
-- Detection must be read-only. It must not install packages or mutate system or
-  cluster state.
-- `start` must show a concrete plan before changing system or cluster state.
-- Runtime placement and scheduler context must be detected independently.
-- Fake telemetry and fake scheduler flows should prove the report semantics
-  before real cluster integrations are added.
-
-## PR Plan
-
-### PR 1: Proposal And Roadmap
-
-Current PR.
-
-Deliver:
-
-- Auto-runtime architecture proposal.
-- Korean translation.
-- This PR-based development plan.
-
-Working state:
-
-- Documentation-only change.
-- No runtime behavior changes.
-
-Before merge:
-
-- Clarify that runtime placement and scheduler context are independent.
-- Use Kubernetes UID as the stable owner identity, with namespace/name as
-  display fields.
-- Treat `NVIDIA_VISIBLE_DEVICES=all` without GPU request as an anomaly, with
-  explicit exceptions for GPU management agents such as this collector, DCGM,
-  and NVIDIA device/plugin components.
-- Remove unintended Markdown trailing whitespace unless a hard line break is
-  deliberately required.
-
-### PR 2: Command Surface Skeleton
-
-Deliver:
-
-- `gua` console entry point.
-- Top-level commands:
-
-```sh
-gua doctor
-gua start --dry-run
-gua status
-gua report
-gua stop
-gua uninstall
-```
-
-- Existing `gpu-usage-audit daemon/report/demo` compatibility path.
-- Clear placeholder behavior for unsupported or not-yet-installed modes.
-- CLI smoke tests.
-
-Working state:
-
-- Users can run the new command surface.
-- Existing documented commands still work.
-- `start/status/stop` do not silently mutate anything.
-
-### PR 3: RuntimePlan And Doctor V1
-
-Deliver:
-
-- `RuntimePlan` model.
-- `gua doctor` human-readable output.
-- `gua doctor --json`.
-- `gua start --dry-run` rendering the recommended plan.
-- Read-only checks for:
-  - OS/kernel/Python.
-  - `/dev/nvidia*`.
-  - NVML load/init/device count.
-  - `kubectl` presence and auth.
-  - Kubernetes runtime signals.
-  - Slurm command/config signals.
-  - Docker/Podman NVIDIA fallback signals.
-
-Working state:
-
-- Users can understand which runtime path is recommended on the current
-  machine without installing anything.
-
-### PR 4: Data Model V2 And Migration
-
-Deliver:
-
-- Schema versioning and migration.
-- `node`.
-- expanded `gpu_sample`.
-- `gpu_process_sample`.
-- `allocation_sample`.
-- `owner_sample`.
-- Legacy DB read compatibility.
-
-Working state:
-
-- Existing host daemon/report behavior continues on the new schema.
-- Scheduler allocation may be absent, but reports still produce the legacy
-  active / idle-held / truly-idle view.
-
-### PR 5: Combined Classification And Fake Scheduler
-
-Deliver:
-
-- Allocation-aware classification:
-
-```text
-allocated-active
-allocated-idle-held
-allocated-unused
-unallocated-active
-unallocated-idle-held
-truly-idle
-unknown-active
-unknown-idle-held
-unknown-unused
-```
-
-- Fake scheduler adapter.
-- Demo data covering allocated, unallocated, and unknown allocation states.
-- Report section for combined classes.
-- Tests for classification and report aggregation.
-
-Working state:
-
-- The final product meaning is testable without real GPUs, Kubernetes, or Slurm.
-
-### PR 6: Install State And Local Host Runtime
-
-Deliver:
-
-- Local install state file.
-- Default DB path.
-- Host foreground runtime adapter.
-- `gua start --mode host --foreground`.
-- `gua status`.
-- `gua report --since ...` using state when `--db` is omitted.
-- `gua stop` for foreground/state-aware flows where applicable.
-
-Working state:
-
-- Single-host users can use the new `gua` workflow without manually passing
-  `--db` through every command.
-
-### PR 7: Systemd Host Runtime
-
-Deliver:
-
-- systemd unit template.
-- `gua start --mode host`.
-- `gua stop`.
-- `gua uninstall`.
-- `gua uninstall --delete-data`.
-- `--dry-run` and `--yes`.
-- root/permission diagnostics.
-- Data preservation by default.
-
-Working state:
-
-- Bare-metal host collection can be installed, stopped, and removed through the
-  new UX.
-
-### PR 8: Kubernetes Manifest Dry Run
-
-Deliver:
-
-- Embedded Kubernetes manifest templates.
-- Namespace, ServiceAccount, RBAC, ConfigMap, and DaemonSet rendering.
-- GPU-capable node targeting logic.
-- `hostPID: true` default.
-- `--no-host-pid` opt-out.
-- Security and RBAC explanation in the plan output.
-
-Working state:
-
-- Users can inspect exactly what would be installed in a Kubernetes cluster
-  without applying it.
-
-### PR 9: Kubernetes Runtime Adapter
-
-Deliver:
-
-- Official OCI image path.
-- `gua start --mode k8s`.
-- `gua status --mode k8s`.
-- `gua stop --mode k8s`.
-- `kubectl apply/delete` integration.
-- Collector pod discovery.
-- Per-node hostPath SQLite DB.
-- Node-level last-sample status.
-
-Working state:
-
-- Kubernetes GPU nodes can run collectors through a DaemonSet.
-- Scheduler attribution may still be limited.
-
-### PR 10: Kubernetes Report Export
-
-Deliver:
-
-- `gua report --since ... --node NODE`.
-- `gua report --since ... --all-nodes`.
-- Collector pod fan-out.
-- Windowed export.
-- JSONL export format.
-- Parallel collection.
-- `pods/exec` RBAC diagnostics.
-
-Working state:
-
-- Users can generate a cluster-level report from per-node collector databases.
-
-### PR 11: Kubernetes Scheduler Attribution
-
-Deliver:
-
-- Kubernetes API owner snapshot.
-- Pod UID-based owner identity.
-- PodResources API integration.
-- Pod resource request/limit parsing.
-- `/proc/<pid>/cgroup` PID-to-pod mapping.
-- cgroup v1/v2 parser coverage.
-- `NVIDIA_VISIBLE_DEVICES=all` anomaly detection.
-- GPU management pod exceptions.
-
-Working state:
-
-- Kubernetes reports can show allocated-active, allocated-unused,
-  unallocated-active, and unallocated-idle-held by pod/namespace.
-
-### PR 12: Slurm Doctor And Scheduler Adapter
-
-Deliver:
-
-- Slurm detection in doctor.
-- `scontrol`, `squeue`, and optional `sacct` integration.
-- Node-level running job allocation snapshot.
-- job/user/account owner model.
-- Requested GPU count.
-- cgroup PID-to-job mapping.
-- Best-effort exact GPU-to-job mapping.
-
-Working state:
-
-- Slurm compute nodes can report GPU usage by job, user, and account.
-
-### PR 13: Rollup And Retention
-
-Deliver:
-
-- Raw sample retention policy.
-- 1-minute rollup tables.
-- Combined class rollup.
-- Cleanup command.
-- Report support for raw plus rollup windows.
-
-Working state:
-
-- Long-running collectors keep DB size under control without losing the core
-  audit classes.
-
-### PR 14: Packaging And Release Polish
-
-Deliver:
-
-- README quickstart for host, Kubernetes, Slurm, and demo paths.
-- Troubleshooting matrix.
-- Wheel release verification.
-- OCI image release workflow.
-- Optional Helm chart, if the manifest path has stabilized.
-
-Working state:
-
-- A new user can install, start, inspect, report, and uninstall using the docs.
-
-## Recommended Merge Order
-
-The critical foundation is PR 2 through PR 5:
-
-```text
-CLI surface -> RuntimePlan/doctor -> schema V2 -> combined report semantics
-```
-
-After that, host, Kubernetes, and Slurm become adapter work against stable
-contracts.
diff --git a/projects/bare-metal-1.0/handoff.ko.md b/projects/bare-metal-1.0/handoff.ko.md
index c1a4b29..fea4c2a 100644
--- a/projects/bare-metal-1.0/handoff.ko.md
+++ b/projects/bare-metal-1.0/handoff.ko.md
@@ -22,11 +22,15 @@
 - `daemon` fails if the DB file already exists.
 - `report` fails if the DB file does not exist.
 - The `gua` user surface keeps only `doctor`.
+- The auto-runtime proposal/project documents were deleted. To restart the
+  Kubernetes/Slurm/Docker/Podman expansion, begin with a new proposal.
 
 ## Current Status
 
 - PR A: implemented in PR #9.
 - PR B: implemented in PR #10.
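+- Post-1.0 cleanup: done. Removed the auto-runtime documents and the
+  `RuntimePlan`/env detection remnants.
 - PR C: most of the implementation appears to be reflected in the README/CLI, but the plan does not
   yet mark it complete.
 - PR D: pending. The current version is `0.4.1`, and the 1.0 release bump has not been done yet.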
@@ -34,14 +38,18 @@
 The last local verification passed in full.
 
 ```sh
-uv run pytest
 uv run ruff check
 uv run ruff format --check
 uv run mypy
-uv build --out-dir /tmp/gua-dist-check-20260515
-bash scripts/smoke-dist-wheel.sh /tmp/gua-dist-check-20260515/gpu_usage_audit-0.4.1-py3-none-any.whl
+uv run pytest
+uv build --out-dir /tmp/gua-dist-prune-20260515
+bash scripts/smoke-dist-wheel.sh /tmp/gua-dist-prune-20260515/gpu_usage_audit-0.4.1-py3-none-any.whl
 ```
 
+After the cleanup, the results are `pytest` 107 passed, `mypy` 25 source files, and
+`ruff format` 26 files. The build and wheel smoke test also passed with
+`/tmp/gua-dist-prune-20260515`.
+
 ## Caveats
 
 - The current local development machine is not an NVIDIA host. `gua doctor` reporting unsupported is
@@ -49,6 +57,8 @@ bash scripts/smoke-dist-wheel.sh /tmp/gua-dist-check-20260515/gpu_usage_audit-0.
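 - `/tmp/gua.db` already exists. The default-path daemon test failing because of this file is
   the expected behavior.
 - The real 1.0 acceptance can only be closed on an NVIDIA bare-metal host.
+- `daemon` and `demo` always record the host row's `env_kind` as `"bare"`. 1.0
+  does not detect container/k8s runtimes.
 - Before closing PR C, do not stop at reading the docs; check that the DB exists/missing error UX gives
   the same message in the README and the CLI output.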
 - PR D에서 tag를 만들기 전에는 `scripts/check-tag-version.py`가 tag와
diff --git a/projects/bare-metal-1.0/plan.ko.md b/projects/bare-metal-1.0/plan.ko.md
index 1856439..ec119a6 100644
--- a/projects/bare-metal-1.0/plan.ko.md
+++ b/projects/bare-metal-1.0/plan.ko.md
@@ -234,7 +234,8 @@ Deliver:
 - [x] Remove or shrink the auto-runtime doctor implementation.
 - [x] Rewrite `gua doctor` to cover only local machine / host NVML readiness.
 - [x] Remove k8s/slurm/docker signals.
-- [x] Shrink `RuntimePlan` to focus on host/unsupported.
+- [x] Remove the auto-runtime `RuntimePlan` remnants and shrink it to the
+  `DoctorPlan` internal to `gua doctor`.
 - [x] Realign the README product description around single-host bare metal.
 - [x] Remove the `gua start/status/report/stop/uninstall` placeholder user surface.
 - [x] Check the actual daemon/report DB path with `gua doctor --db PATH`.
@@ -313,16 +314,12 @@ gpu-usage-audit report --since 1h --interval 30s
 
 ## Deferred Work
 
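-The items below are revisited before 1.0 GA or after 1.0.
+The items below are operational-quality items that can be revisited before or after 1.0 GA.
+Post-1.0 product expansions such as Kubernetes, Slurm, Docker/Podman, scheduler allocation,
+and managed runtime have been removed from the current codebase and project documents. To
+resume them, begin with a new proposal.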
 
 - `nvidia-ml-py` upper bound policy (whether to use a known-good range such as `>=12.535,<13`).
 - Introduce a structured failure type such as `NVMLInfo.failure_kind`.
 - Decide whether to expose a separate `Blockers:` section in the unsupported text output.
 - A redact option or separate JSON fields for raw NVML detail.
-- Kubernetes current-node diagnosis.
-- GPU Operator staged NVML path.
-- Slurm allocation context.
-- Docker/Podman fallback collector.
-- scheduler allocation-aware report.
-- DB schema v2.
-- managed `gua start/status/stop/uninstall`.
diff --git a/projects/bare-metal-1.0/status.ko.md b/projects/bare-metal-1.0/status.ko.md
index f72629d..5962303 100644
--- a/projects/bare-metal-1.0/status.ko.md
+++ b/projects/bare-metal-1.0/status.ko.md
@@ -5,11 +5,11 @@
 ## Summary
 
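 Bare Metal 1.0 has been scoped to target only a single NVIDIA bare-metal
-host. The PR A/B scope is implemented; the next step is to confirm whether to close PR C runbook hardening
+host. The PR A/B scope is implemented, and this cleanup removed the auto-runtime documents and
+code remnants kept for post-1.0 expansion. The next step is to confirm whether to close PR C runbook hardening
 and then move on to PR D release prep.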
 
-The worktree was clean when the check started. The only current changes are the addition of this
-`status.ko.md` and `handoff.ko.md`.
+The worktree was clean when the cleanup started.
 
 ## Implementation Status
 
@@ -20,37 +20,48 @@ Bare Metal 1.0은 단일 NVIDIA 베어메탈 호스트만 대상으로 하는 
 | Packaging UX | Done | `nvidia-ml-py` is a default dependency and the `nvml` extra is an empty compatibility alias. |
 | `daemon`/`report` DB UX | Implemented | The default DB is `/tmp/gua.db`; daemon rejects an existing DB and report rejects a missing DB. |
 | README bare-metal docs | Mostly done | Includes the 2-shell flow, a systemd example, and operational notes. |
+| Post-1.0 cleanup | Done | Removed the auto-runtime proposal/project documents, k8s/docker env detection, and `RuntimePlan` remnants. |
 | PR C closure | Undecided | The plan does not yet mark it complete. A final check against the README and CLI UX is needed to decide whether to close it. |
 | PR D release prep | Pending | The current package version is `0.4.1`; the 1.0 release version bump and release notes remain. |
 | NVIDIA host acceptance | Unverified | The current local machine has no NVIDIA device/driver, so the real host collection loop could not be checked. |
 
 ## Verification Results
 
-2026-05-15 local verification:
+2026-05-15 local verification after cleanup:
 
 ```sh
 git status --short
-uv run pytest
 uv run ruff check
 uv run ruff format --check
 uv run mypy
-uv build --out-dir /tmp/gua-dist-check-20260515
-bash scripts/smoke-dist-wheel.sh /tmp/gua-dist-check-20260515/gpu_usage_audit-0.4.1-py3-none-any.whl
+uv run pytest
+uv build --out-dir /tmp/gua-dist-prune-20260515
+bash scripts/smoke-dist-wheel.sh /tmp/gua-dist-prune-20260515/gpu_usage_audit-0.4.1-py3-none-any.whl
 env GITHUB_REF_NAME=v0.4.1 uv run python scripts/check-tag-version.py
 ```
 
 Results:
 
-- `git status --short`: no changes when the check started. After the documents were written,
-  `status.ko.md` and `handoff.ko.md` remain as new files.
-- `pytest`: 118 passed.
+- `git status --short`: only the cleanup changes are present.
 - `ruff check`: pass.
-- `ruff format --check`: 28 files already formatted.
-- `mypy`: no issues in 27 source files.
+- `ruff format --check`: 26 files already formatted.
+- `mypy`: no issues in 25 source files.
+- `pytest`: 107 passed.
 - `uv build`: sdist/wheel build succeeded.
 - wheel smoke: succeeded.
 - tag-version check: `v0.4.1` matches the `pyproject.toml` version.
 
+## Changes in This Cleanup
+
+- Deleted `proposals/design-auto-runtime*.md`.
+- Deleted `projects/auto-runtime-audit/plan*.md`.
+- Deleted `src/gpu_usage_audit/env.py` and `tests/test_env.py`.
+- `daemon`/`demo` record the host `env_kind` directly as `"bare"`, per the 1.0 contract.
+- Removed the `RuntimePlan` model. `gua doctor` keeps only host/unsupported,
+  reasons, blockers, and warnings in an internal `DoctorPlan`.
+- Removed the post-1.0 placeholder fields `scheduler`, `telemetry`,
+  `confidence`, `required_privileges`, and `actions` from the `DoctorPlan` JSON.
+
 ## Local `doctor` Status
 
 The current development machine is not an NVIDIA host, so `uv run gua doctor --json`
diff --git a/proposals/design-auto-runtime.ko.md b/proposals/design-auto-runtime.ko.md
deleted file mode 100644
index e733b68..0000000
--- a/proposals/design-auto-runtime.ko.md
+++ /dev/null
@@ -1,1144 +0,0 @@
-# gpu-usage-audit Auto Runtime Design
-
-Status: draft
-Date: 2026-05-12
-
-## Overview
-
-`gpu-usage-audit` should be a tool that users can start without knowing whether the
-current machine is bare metal, Kubernetes, a container runtime host, or a Slurm
-compute node.
-
-Target UX:
-
-```sh
-gua doctor
-gua start
-
-# a few days later
-gua status
-gua report --since 3d
-gua stop
-```
-
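-The product should automatically detect the appropriate way to run the collector. However, it
-must not hide that decision. Users should not need to know the deployment model in advance, but
-`gua` must clearly show what it chose and why.
-
-Example: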
-
-```text
-Detected environment:
-  host NVML: initialized, GPU count=0
-  kubernetes: available
-  k8s NVIDIA runtime: available
-  slurm: not detected
-
-Recommended plan:
-  runtime: k8s-daemonset
-  telemetry: nvml
-  scheduler: k8s
-
-Reason:
-  GPUs are not visible from the host namespace, but they are visible inside
-  Kubernetes containers with NVIDIA_VISIBLE_DEVICES=all.
-```
-
-This is the main product change. `daemon` stays as the low-level collector, and
-`gua start` takes on the launcher/orchestrator role.
-
-## Motivation and Differentiation
-
-The area this project should own is not raw GPU telemetry itself.
-DCGM exporter, `nvidia-smi`, and many Grafana dashboards already show utilization,
-memory, temperature, and process-level facts well. Slurm accounting,
-Kubernetes metadata, and cluster dashboards also show scheduler-side allocation and
-ownership.
-
-The gap is a view that joins the two retrospectively.
-
-```text
-Who was allocated a GPU, and did that GPU actually do meaningful work?
-Who used GPUs without a scheduler allocation?
-Which GPUs were memory-held but compute-idle?
-Which GPUs were allocated but had no meaningful GPU process at all?
-```
-
-This combined view is the core value. So `gpu-usage-audit` must not become yet another live
-GPU monitor. It should be a lightweight retrospective audit tool that combines actual NVML
-observations with scheduler context.
-
-The most important headline classes are the following.
-
-```text
-allocated-idle-held     # scheduler allocated it and a process holds memory, but compute is cold
-allocated-unused        # scheduler allocated it, but NVML shows no meaningful use
-unallocated-active      # GPU used without a scheduler allocation
-unallocated-idle-held   # GPU memory held without a scheduler allocation
-```
-
-On Kubernetes, a pod that has `NVIDIA_VISIBLE_DEVICES=all` without a `nvidia.com/gpu`
-request is a first-class anomaly. Such a pod can have GPU access that scheduler
-accounting does not capture. This is a signal that standard GPU telemetry or
-kube-state-style metadata alone cannot produce.
-
-## Product Goals
-
-1. **First use should require no environment knowledge**
-   - Users should be able to run `gua doctor` or `gua start` without knowing
-     whether the node is bare metal, k8s, Docker, or Slurm.
-
-2. **Be transparent instead of hiding things like magic**
-   - Auto mode must print the chosen plan, the reasoning, required privileges, storage
-     location, and cleanup command.
-   - Advanced users should be able to override explicitly with `--mode host`, `--mode k8s`,
-     `--mode slurm`, or `--mode container`.
-
-3. **Retrospective audit comes first**
-   - The core value is "what happened over the last N hours/days?"
-   - Live dashboards, quotas, scheduling decisions, and remediation are not the first
-     product surface.
-
-4. **Measure both actual GPU use and scheduler allocation**
-   - NVML answers "is the GPU doing work, and is it holding memory?"
-   - k8s/Slurm answers "was this GPU allocated to a workload?"
-   - The report must combine the two.
-
-5. **Operational burden should be low**
-   - The default storage remains SQLite.
-   - The default path should not require a database service, web server, Prometheus, or
-     Grafana.
-
-6. **Failure modes should be good**
-   - If `gua` cannot run, it must say which layer failed: driver, NVML, device
-     visibility, container runtime, kubectl auth, Slurm config, or
-     permission.
-
-## Non-Goals
-
-- Does not replace Slurm, Kubernetes, DCGM, Prometheus, Grafana, Open OnDemand, or
-  cluster dashboards.
-- Does not enforce quotas or kill jobs.
-- Does not schedule workloads.
-- Does not require a central server in the minimal product.
-- Does not make every install silent. Changes to system or cluster state must
-  be explicit.
-
-## Supported Environment Classes
-
-### 1. Bare-metal host
-
-Typical shape:
-
-```text
-/dev/nvidia0..N visible on the host
-host NVML init succeeds
-host NVML device count > 0
-no scheduler, or scheduler context disabled
-```
-
-Runtime:
-
-```text
-runtime: host-systemd or host-foreground
-telemetry: nvml
-scheduler: none
-```
-
-This is the shape closest to the current project.
-
-### 2. Kubernetes / GPU Operator
-
-Typical shape:
-
-```text
-the host may show only /dev/nvidiactl
-host NVML device count may be 0
-GPU devices are injected into pods
-runtimeClassName=nvidia may be present
-NVIDIA_VISIBLE_DEVICES controls device exposure
-```
-
-Runtime:
-
-```text
-runtime: k8s-daemonset
-telemetry: nvml
-scheduler: k8s
-```
-
-Because GPUs may be visible only inside the container namespace, the collector must run
-inside Kubernetes.
-
-Users should not have to build or run Docker themselves. Using an official OCI image
-inside the product is fine.
-
-### 3. Slurm compute node
-
-Typical shape:
-
-```text
-host /dev/nvidia0..N visible
-Slurm manages GPUs as GRES
-jobs request GPUs with --gres=gpu:N or --gpus=N
-Slurm sets CUDA_VISIBLE_DEVICES inside the job step
-cgroups may restrict the visible device files
-```
-
-Runtime:
-
-```text
-runtime: host-systemd or host-foreground
-telemetry: nvml
-scheduler: slurm
-```
-
-The core of Slurm support is not getting NVML to work. It is combining NVML usage state
-with Slurm allocation state.
-
-### 4. Local container runtime
-
-Typical shape:
-
-```text
-host commands cannot or should not be run directly
-docker/podman can run NVIDIA containers
-GPUs are visible under docker run --gpus all ...
-```
-
-Runtime:
-
-```text
-runtime: local-container
-telemetry: nvml
-scheduler: none
-```
-
-Useful as a fallback, but it should not become the default UX.
-
-## Core Architecture
-
-Environment branching must not spread across the collector and report code. Split the product
-into three axes.
-
-```text
-1. Collector Runtime
-   where does the collector process run?
-
-2. Telemetry Source
-   how is actual GPU state read?
-
-3. Scheduler Context
-   who has the GPU been reserved/allocated to?
-```
-
-Concrete combinations:
-
-| Environment | Runtime | Telemetry | Scheduler |
-|---|---|---|---|
-| Bare metal | host-systemd | nvml | none |
-| Kubernetes / GPU Operator | k8s-daemonset | nvml | k8s |
-| Slurm | host-systemd | nvml | slurm |
-| Docker-only | local-container | nvml | none |
-| Demo/test | foreground | fake | none/fake |
-
-중요한 규칙:
-
-```text
-Kubernetes와 Slurm은 telemetry source가 아니다.
-telemetry source는 여전히 NVML이다.
-Kubernetes와 Slurm은 runtime placement와 allocation context를 제공한다.
-```
-
-## CLI 설계
-
-### 기본 명령
-
-```text
-gua doctor
-gua start
-gua status
-gua report
-gua stop
-gua uninstall
-```
-
-### 저수준 명령
-
-아래 명령은 유지할 수 있지만 첫 사용 UX의 중심이 되어서는 안 된다.
-
-```text
-gua daemon run
-gua daemon export
-gua db inspect
-```
-
-현재 `gpu-usage-audit daemon`과 `gpu-usage-audit report`는 migration 기간에
-compatibility alias로 남길 수 있다.
-
-### `gua doctor`
-
-읽기 전용 환경 진단 명령이다.
-
-기본 출력은 사람이 읽기 쉬운 형태다. 자동화에는 `--json`을 사용한다.
-
-예:
-
-```sh
-gua doctor
-gua doctor --json
-gua doctor --mode k8s
-```
-
-Doctor가 확인할 항목:
-
-- OS, kernel, Python, uv/pipx 가용성
-- `/dev/nvidia*`
-- host NVML load/init/device count
-- `/run/nvidia/driver` 아래 GPU Operator staged NVML
-- staged NVML path를 host mode에 써야 하는지 여부
-- `nvidia-smi` 존재 여부
-- `kubectl` 가용성과 인증 상태
-- k8s runtime class
-- k8s GPU pod/DaemonSet
-- 필요한 k8s resource를 만들 수 있는 권한
-- Slurm command와 node GRES
-- Docker/Podman NVIDIA runtime fallback
-
-Doctor는 `RuntimePlan`을 만든다.
-
-### `gua start`
-
-기본 mode는 `auto`다.
-
-```sh
-gua start
-gua start --mode auto
-gua start --mode host
-gua start --mode k8s
-gua start --mode slurm
-gua start --mode container
-gua start --dry-run
-gua start --yes
-```
-
-동작:
-
-1. doctor를 실행한다.
-2. runtime plan을 선택한다.
-3. plan을 출력한다.
-4. system이나 cluster 상태를 변경하는 작업이라면 TTY에서 확인을 받는다.
-5. install state를 local에 저장한다.
-
-예:
-
-```text
-Plan:
-  mode: k8s-daemonset
-  namespace: gpu-usage-audit
-  image: ghcr.io/ai-ocean/gpu-usage-audit:0.4.0
-  db: hostPath /var/lib/gpu-usage-audit/gua.db
-  nodes: GPU-capable nodes
-  cleanup: gua stop --mode k8s
-
-Continue? [y/N]
-```
-
-### `gua status`
-
-설치/실행 중인 collector 상태를 보여준다.
-
-```text
-mode: k8s-daemonset
-collectors:
-  gpusystem: running, last sample 12s ago, GPUs visible=10
-  ds02: running, last sample 10s ago, GPUs visible=4
-storage:
-  per-node SQLite under /var/lib/gpu-usage-audit/gua.db
-```
-
-### `gua report`
-
-기본적으로 저장된 install state를 사용한다.
-
-```sh
-gua report --since 24h
-gua report --since 3d --node gpusystem
-gua report --since 3d --all-nodes
-gua report --db /var/lib/gpu-usage-audit/gua.db --since 3d
-```
-
-k8s에서는 사용자가 DB 위치를 알 필요가 없어야 한다. CLI가 collector pod를
-발견하고 `kubectl exec` 등을 통해 export stream을 받아 local에서 집계할 수
-있다.
-
-### `gua stop`과 `gua uninstall`
-
-`stop`은 기본적으로 collector를 멈추되 data는 보존해야 한다.
-
-`uninstall`은 설치된 resource를 제거하고, 선택적으로 data도 지울 수 있다.
-
-```sh
-gua stop
-gua uninstall
-gua uninstall --delete-data
-```
-
-## RuntimePlan 인터페이스
-
-detector는 바로 실행하지 말고 구조화된 plan을 만들어야 한다.
-
-개념 모델:
-
-```python
-class RuntimePlan:
-    mode: Literal[
-        "host-systemd",
-        "host-foreground",
-        "k8s-daemonset",
-        "local-container",
-        "unsupported",
-    ]
-    telemetry: Literal["nvml", "fake"]
-    scheduler: Literal["none", "k8s", "slurm"]
-    confidence: Literal["high", "medium", "low"]
-    reasons: list[str]
-    blockers: list[str]
-    warnings: list[str]
-    required_privileges: list[str]
-    actions: list[PlannedAction]
-```
-
-Runtime adapter가 plan을 소비한다.
-
-```text
-HostRuntimeAdapter
-K8sRuntimeAdapter
-ContainerRuntimeAdapter
-```
-
-Scheduler adapter는 snapshot을 enrich한다.
-
-```text
-NoSchedulerAdapter
-K8sSchedulerAdapter
-SlurmSchedulerAdapter
-```
-
-Telemetry adapter는 hardware fact를 만든다.
-
-```text
-NVMLTelemetry
-FakeTelemetry
-```
-
-## 감지 순서
-
-Auto mode는 모든 GPU를 볼 수 있는 가장 덜 놀라운 runtime을 선호해야 한다.
-
-제안 순서:
-
-1. Host NVML
-   - host NVML이 GPU를 보면 host runtime은 viable하다.
-   - Slurm이 감지되면 scheduler context는 `slurm`이다.
-   - 아니면 scheduler context는 `none`이다.
-   - host NVML이 version mismatch로 실패했지만 `/run/nvidia/driver` 아래
-     GPU Operator staged NVML이 있으면 plan에 host runtime remediation을
-     기록한다.
-     - pynvml import 전에 `LD_LIBRARY_PATH`를 prepend하여 re-exec하거나,
-     - collector 시작 전에 library path를 설정하는 작은 launcher wrapper를
-       사용한다.
-     pynvml/libnvidia-ml이 이미 load된 뒤 `LD_LIBRARY_PATH`를 바꾸는 것은
-     충분하지 않다.
-
-2. Kubernetes
-   - host NVML이 GPU를 보지 못하지만 k8s가 있고 NVIDIA runtime이 pod 안에
-     GPU를 노출할 수 있으면 `k8s-daemonset`을 사용한다.
-   - `node.status.capacity["nvidia.com/gpu"]`만 믿지 않는다. 일부 cluster는
-     accounting이 unusual/custom이어도 pod 안에 GPU를 노출한다.
-
-3. Local container runtime
-   - Docker/Podman이 all GPU를 가진 NVIDIA container를 실행할 수 있으면
-     `local-container`를 사용한다.
-
-4. Unsupported
-   - 가장 가까운 viable path를 설명한다.
-
-중요: detection은 package를 설치하거나 cluster를 변경하면 안 된다.
-
-## Kubernetes Runtime 설계
-
-### 설치 형태
-
-최소 설치:
-
-```text
-Namespace: gpu-usage-audit
-DaemonSet: gpu-usage-audit
-ServiceAccount: gpu-usage-audit
-ConfigMap: collector config
-hostPath DB: /var/lib/gpu-usage-audit/gua.db
-```
-
-DaemonSet 요구사항:
-
-```yaml
-runtimeClassName: nvidia
-hostPID: true
-env:
-  - name: NVIDIA_VISIBLE_DEVICES
-    value: all
-  - name: NVIDIA_DRIVER_CAPABILITIES
-    value: compute,utility
-```
-
-가능한 mount:
-
-```text
-/var/lib/gpu-usage-audit         read-write DB hostPath
-/proc                            read-only host process metadata, if needed
-/var/lib/kubelet/pod-resources   read-only pod resources socket, if available
-```
-
-`hostPID: true`는 node-wide process attribution에 중요하다. NVML은 GPU
-process PID를 보고할 수 있지만, host PID visibility가 없으면 collector가
-그 PID를 `/proc/<pid>/cgroup`으로 다시 매핑하지 못할 수 있다.
-
-기본값은 `hostPID: true`가 되어야 하며 opt-out을 제공한다. 일부 cluster는
-restricted Pod Security profile을 강제하므로
-`gua start --mode k8s --no-host-pid`가 가능해야 한다. 단, plan은
-process-to-pod attribution이 약해진다고 명확히 말해야 한다.
-
-DaemonSet은 기본적으로 모든 node가 아니라 GPU-capable node만 대상으로 해야
-한다. 선호 selector:
-
-```text
-nvidia.com/gpu.present=true
-feature.node.kubernetes.io/pci-10de.present=true
-```
-
-GPU Feature Discovery / Node Feature Discovery label이 없다면 더 넓은
-DaemonSet을 설치한 뒤 collector self-check로 fallback할 수 있다.
-
-### Kubernetes Allocation Context
-
-k8s adapter는 세 데이터 source를 결합해야 한다.
-
-1. Kubernetes API
-   - Pod, namespace, node name, owner reference, resource request/limit.
-
-2. Kubelet PodResources API
-   - 어떤 pod/container가 어떤 GPU device ID를 받았는지에 대한 가장 좋은
-     source.
-
-3. Host `/proc/<pid>/cgroup`
-   - 관측된 GPU process PID를 pod/container로 매핑하는 가장 좋은 source.
-
-이 구분이 중요한 이유는, 관측한 cluster에 다음 형태의 pod가 있었기 때문이다.
-
-```text
-NVIDIA_VISIBLE_DEVICES=all
-no nvidia.com/gpu request
-all GPUs visible inside the container
-```
-
-이 pod들은 scheduler accounting이 깨끗하게 표현하지 못하는 방식으로 GPU를
-쓸 수 있다.
-
-adapter는 다음을 명시적으로 감지해야 한다.
-
-```text
-NVIDIA_VISIBLE_DEVICES=all
-NVIDIA_VISIBLE_DEVICES=<GPU UUID list>
-no nvidia.com/gpu request or limit
-```
-
-이는 raw environment variable로만 저장하지 말고 scheduler-accounting
-anomaly로 표면화해야 한다.
-
-### Cgroup 호환성
-
-Process attribution은 `/proc/<pid>/cgroup`에 의존하지만 cgroup v1과 unified
-cgroup v2는 path 표현이 다르다. Kubernetes와 Slurm 배포 모두 cgroup v2로
-이동하는 추세다.
-
-parser는 k8s adapter와 Slurm adapter가 공유하는 module이어야 한다. 지원할
-항목:
-
-```text
-cgroup v1 controller-specific lines
-cgroup v2 unified `0::/path` lines
-systemd slice escaping
-containerd / CRI-O pod and container IDs
-Slurm job_<id> and step_<id> paths
-```
-
-process-to-owner attribution 구현 전에 이 결정을 내려야 한다.
-
-### Kubernetes Report 의미론
-
-report는 scheduler allocation과 실제 GPU state를 모두 보여줘야 한다.
-
-```text
-allocated-active
-allocated-idle-held
-allocated-unused
-unallocated-active
-unallocated-idle-held
-truly-idle
-```
-
-정의:
-
-```text
-allocated-unused = scheduler가 GPU를 할당했지만 의미 있는 NVML process/memory가 없음
-unallocated-active = NVML상 사용이 있지만 scheduler allocation이 없거나 알 수 없음
-unallocated-idle-held = scheduler allocation 없이 memory가 잡힘
-truly-idle = allocation도 없고 의미 있는 NVML 사용도 없음
-```
-
-## Slurm Runtime 설계
-
-Slurm은 일반적으로 GPU를 GRES로 관리한다.
-
-중요한 Slurm 사실:
-
-- GPU는 보통 `Name=gpu`인 GRES로 설정된다.
-- job은 `--gres=gpu:N`, `--gpus=N` 또는 관련 flag로 GPU를 요청한다.
-- Slurm은 job step에 `CUDA_VISIBLE_DEVICES`를 설정한다.
-- Slurm은 cgroup으로 visible device file을 제한할 수 있다.
-- Slurm은 `gres.conf`에서 NVML을 통해 NVIDIA GPU를 autodetect할 수 있다.
-
-Slurm 지원은 다음으로 다뤄야 한다.
-
-```text
-runtime: host-systemd
-telemetry: nvml
-scheduler: slurm
-```
-
-collector는 user job 밖에서 compute node 위에 실행된다. 실제 GPU 사용은
-NVML로 읽고, allocation context는 Slurm에서 읽는다.
-
-### Slurm 감지 신호
-
-```text
-scontrol exists
-sinfo exists
-slurmd process or service exists
-/etc/slurm/slurm.conf or $SLURM_CONF exists
-scontrol show node <hostname> reports Gres or CfgTRES with gpu
-```
-
-### Slurm Allocation Context
-
-초기 adapter source:
-
-```text
-scontrol show node <node>
-squeue -h -w <node>
-scontrol show job -d <jobid>
-sacct, when available
-/proc/<pid>/cgroup for job_<id> or step_<id>
-```
-
-MVP가 지원해야 할 것:
-
-- 이 node에서 실행 중인 job.
-- 각 job의 user.
-- 각 job이 요청한 GPU 수.
-- 가능하면 할당된 GPU device ID 또는 UUID.
-- cgroup을 통한 GPU PID -> Slurm job ID 매핑.
-
-Slurm이 exact GPU ID를 노출하지 않는 경우 첫 버전에서는 per-GPU allocation을
-`allocated-unknown-gpu`로 표시해도 된다.
-
-## Data Model V2
-
-현재 schema는 hardware sample과 process sample을 담는다. 여전히 유용하지만,
-scheduler allocation은 first-class storage가 필요하다.
-
-제안 table:
-
-### `node`
-
-```text
-node_id
-hostname
-first_seen
-last_seen
-runtime_mode       # host-systemd / k8s-daemonset / local-container
-scheduler_kind     # none / k8s / slurm
-driver_version
-collector_version
-```
-
-### `gpu_sample`
-
-```text
-ts
-node_id
-gpu_uuid
-gpu_index
-parent_uuid          # nullable, set for MIG instances or virtual slices
-mig_profile          # nullable, e.g. 1g.5gb
-share_id             # nullable, for MIG/vGPU/time-slicing/MPS-style slices
-bus_id
-util_pct
-mem_used_mb
-mem_total_mb
-```
-
-### `gpu_process_sample`
-
-```text
-ts
-node_id
-gpu_uuid
-pid
-process_name
-mem_used_mb
-loginuid_user
-owner_key          # nullable, references observed owner if resolved
-```
-
-### `allocation_sample`
-
-```text
-ts
-node_id
-scheduler_kind     # k8s / slurm
-gpu_uuid           # nullable if exact GPU unknown
-parent_uuid        # nullable, physical GPU for MIG/vGPU/shared allocations
-owner_kind         # k8s_pod / slurm_job
-owner_key          # stable ID: namespace/name or job ID
-owner_name
-namespace
-user_name
-account
-requested_gpus
-share_fraction     # nullable, for fractional/shared GPU allocation
-allocation_state   # allocated / released / unknown
-raw_ref
-```
-
-### `owner_sample`
-
-정규화된 report에 유용한 optional table:
-
-```text
-ts
-owner_kind
-owner_key
-owner_name
-namespace
-user_name
-account
-labels_json
-```
-
-### Migration
-
-기존 DB는 legacy mode로 읽을 수 있다.
-
-```text
-scheduler_kind = none
-allocation state = unknown
-```
-
-report는 기존 DB에서도 계속 동작해야 한다.
-
-### Retention과 Rollup
-
-Raw process sample은 빠르게 커질 수 있다. 바쁜 node는 tick마다 많은 row를
-만들 수 있다.
-
-```text
-1 Hz * 10 GPUs * 50 processes per GPU = 500 process rows/sec
-```
-
-SQLite는 유용한 short-term window를 감당할 수 있지만, 긴 retention에는 명시적
-정책이 필요하다. 기본 저장소는 운영 모델을 단순하게 유지해야 한다.
-
-```text
-raw samples:       7-14 days by default
-1-minute rollups:  90 days by default
-5-minute rollups:  optional long-term retention
-```
-
-제안 rollup table:
-
-```text
-gpu_rollup_1m
-owner_rollup_1m
-allocation_rollup_1m
-```
-
-Rollup은 평균 utilization만 보존하면 안 되고 combined class를 보존해야 한다.
-그렇지 않으면 `allocated-unused` 같은 핵심 신호가 downsampling 중 사라진다.
-
-## Classification Model
-
-기존 hardware classification은 유지한다.
-
-```text
-util_pct >= 10                         -> active
-util_pct <  10 and mem_used_mb > 100   -> idle-held
-util_pct <  10 and mem_used_mb <= 100  -> truly-idle
-```
-
-scheduler allocation을 추가한다.
-
-```text
-allocation known and present -> allocated
-allocation absent            -> unallocated
-allocation unavailable       -> unknown
-```
-
-Combined class:
-
-| Allocation | Hardware | Combined |
-|---|---|---|
-| allocated | active | allocated-active |
-| allocated | idle-held | allocated-idle-held |
-| allocated | truly-idle | allocated-unused |
-| unallocated | active | unallocated-active |
-| unallocated | idle-held | unallocated-idle-held |
-| unallocated | truly-idle | truly-idle |
-| unknown | active | active |
-| unknown | idle-held | idle-held |
-| unknown | truly-idle | truly-idle |
-
-이 모델은 기존 report 의미를 유지하면서 k8s/Slurm 가치를 추가한다.
-
-## Storage와 Reporting 전략
-
-### Single Node
-
-기본:
-
-```text
-/var/lib/gpu-usage-audit/gua.db
-```
-
-user-mode/foreground fallback:
-
-```text
-~/.local/share/gpu-usage-audit/gua.db
-```
-
-### Kubernetes
-
-MVP:
-
-- node마다 hostPath 기반 SQLite DB 하나.
-- `gua report`가 collector pod를 발견한다.
-- `gua report`가 각 collector pod 안에서
-  `gua daemon export --format jsonl`을 실행하고 local에서 집계한다.
-
-이 방식은 central database나 service를 피할 수 있지만 한계가 있다.
-
-- `pods/exec` RBAC는 종종 제한된다.
-- 많은 node를 sequential exec하면 느리다.
-- 큰 export에는 streaming, compression, time-window filtering이 필요하다.
-
-report 구현은 병렬 fan-out을 해야 하고 필요한 time window만 요청해야 한다.
-또한 alternative export path를 지원해야 한다.
-
-나중:
-
-- 각 collector pod의 read-only HTTP export endpoint.
-- `kubectl port-forward` 기반 report collection.
-- cluster-internal aggregator Job.
-- optional central PVC.
-- optional Prometheus/exporter mode.
-- optional object storage export.
-
-### Slurm
-
-MVP:
-
-- compute node마다 SQLite DB 하나.
-- 먼저 local node report를 지원한다.
-
-나중:
-
-- Slurm controller-side aggregator.
-- `gua report --partition` 또는 `--nodes`.
-
-## Packaging과 Installation
-
-### 기본 CLI 설치
-
-권장:
-
-```sh
-uv tool install gpu-usage-audit
-```
-
-또는:
-
-```sh
-pipx install gpu-usage-audit
-```
-
-첫 사용 마찰을 줄이기 위해 `nvidia-ml-py`를 optional extra가 아니라 기본
-dependency로 둘지 검토한다. 작고, GPU audit 도구가 NVML binding 누락으로 첫
-실행에서 실패하는 것은 좋지 않다.
-
-### OCI Image
-
-k8s runtime에는 필요하다.
-
-```text
-ghcr.io/ai-ocean/gpu-usage-audit:<version>
-ghcr.io/ai-ocean/gpu-usage-audit:latest
-```
-
-사용자가 Docker를 직접 실행할 필요는 없다. image는 k8s runtime adapter가
-사용하는 내부 구현 디테일이다.
-
-### Kubernetes 설치
-
-초기 구현은 Python package 안에 manifest template을 내장할 수 있다.
-
-나중:
-
-- GitHub Releases에 standalone YAML 게시.
-- Helm chart 게시.
-
-### One-Line Installer
-
-나중에 가능한 UX:
-
-```sh
-curl -Ls https://github.com/AI-Ocean/gpu-usage-audit/releases/latest/download/install.sh | sh
-```
-
-이는 CLI만 설치해야 한다. systemd service나 k8s DaemonSet을 조용히 설치하면
-안 된다.
-
-## Security와 Permission
-
-### Host Mode
-
-필요:
-
-- NVML 접근.
-- `/proc/<pid>/loginuid`와 cgroup metadata read 권한.
-- DB directory write 권한.
-- systemd install에는 root 필요.
-
-테스트용으로 non-root foreground mode를 지원해야 한다.
-
-### Kubernetes Mode
-
-필요:
-
-- namespace, service account, configmap, daemonset, RBAC를 만들 수 있는 권한.
-- target node의 모든 GPU에 접근할 수 있는 runtime 권한.
-- pod와 node metadata read 권한.
-- process attribution을 위한 hostPID와 read-only `/proc` 접근 가능성.
-- SQLite DB를 위한 hostPath write 권한.
-- exec 기반 export를 쓸 경우 `gua report`용 optional `pods/exec`.
-
-install plan은 resource를 적용하기 전에 이 권한들을 출력해야 한다.
-
-collector의 최소 RBAC는 다음에서 시작한다.
-
-```text
-get/list/watch pods
-get/list/watch nodes
-```
-
-`pods/exec`는 report-side에만 필요하며 collector 자체에는 필요하지 않아야
-한다.
-
-### Slurm Mode
-
-필요:
-
-- Host NVML 접근.
-- Slurm command/config/accounting read 접근.
-- process cgroup read 접근.
-- systemd install에는 보통 admin 권한 필요.
-
-Slurm job user가 node-wide collector를 설치한다고 기대하면 안 된다.
-
-## 구현 마일스톤
-
-### M0: 집중 ADR
-
-넓은 구현 전에 위험도가 높은 세부사항에 대해 짧은 architecture decision
-record를 작성한다.
-
-- GPU Operator staged NVML loading과 host-mode re-exec.
-- MIG, vGPU, MPS, time-slicing 표현.
-- cgroup v1/v2 parser와 owner attribution.
-- k8s report export path: `pods/exec` vs HTTP endpoint vs aggregator.
-
-### M1: Doctor와 RuntimePlan
-
-아직 collection 동작은 바꾸지 않는다.
-
-Deliver:
-
-- `gua doctor`
-- host NVML/device check
-- k8s check
-- Slurm check
-- structured JSON output
-- recommended plan
-
-환경 가정을 설치 없이 검증하므로 가장 leverage가 높은 milestone이다.
-
-### M2: Schema V2와 Combined Report Model
-
-Deliver:
-
-- migration-safe DB schema
-- allocation table
-- combined classes
-- fake scheduler tests
-- old DB compatibility
-- retention and rollup policy
-
-이것이 차별화 기능이다. 모든 runtime adapter가 같은 model을 target할 수
-있도록 일찍 들어가야 한다.
-
-### M3: CLI Surface와 State
-
-Deliver:
-
-- `gua start --dry-run`
-- `gua status`
-- local state file
-- 기존 command compatibility alias
-
-아직 k8s install은 하지 않는다.
-
-### M4: Kubernetes Runtime Adapter
-
-Deliver:
-
-- official OCI image
-- embedded DaemonSet manifest
-- `gua start --mode k8s`
-- `gua stop --mode k8s`
-- parallel, windowed export 기반 collector pod report
-
-관측한 GPU Operator 환경을 해결한다.
-
-### M5: Kubernetes Scheduler Adapter
-
-Deliver:
-
-- pod/process attribution
-- 가능한 경우 PodResources API integration
-- namespace/pod/user별 report
-- GPU request 없는 `NVIDIA_VISIBLE_DEVICES=all` pod 탐지
-- unrequested GPU access anomaly headline
-
-### M6: Host Runtime Adapter
-
-Deliver:
-
-- systemd unit install
-- foreground mode
-- host preflight
-- GPU Operator staged NVML re-exec 또는 명확한 diagnostic
-
-### M7: Slurm Scheduler Adapter
-
-Deliver:
-
-- Slurm detection
-- job allocation snapshot
-- cgroup 기반 process-to-job mapping
-- best-effort exact GPU-to-job mapping
-- job/user/account별 report
-
-### M8: Documentation과 Release Polish
-
-Deliver:
-
-- quickstart
-- architecture docs
-- troubleshooting matrix
-- wheel + OCI image release workflow
-- optional Helm chart
-
-## 현재 서버 해석
-
-관측한 `gpusystem` 서버는 다음에 해당한다.
-
-```text
-runtime: k8s-daemonset
-telemetry: nvml
-scheduler: k8s
-```
-
-이유:
-
-- Host에는 `/dev/nvidiactl`만 보인다.
-- Host NVML은 device를 보지 못한다.
-- Kubernetes workload container 안에서는 `/dev/nvidia0..9`가 보인다.
-- 일부 pod는 `runtimeClassName=nvidia`와 `nvidia.com/gpu` request를 쓴다.
-- 일부 pod는 GPU request 없이 `NVIDIA_VISIBLE_DEVICES=all`을 노출한다.
-
-이 환경이 바로 runtime placement와 scheduler context를 분리해야 하는 이유다.
-
-## Open Questions
-
-제안 결정:
-
-1. `nvidia-ml-py`는 기본 dependency가 되어야 한다.
-2. k8s DaemonSet은 `hostPID: true`를 기본값으로 하고 `--no-host-pid` opt-out을
-   제공한다.
-3. k8s install은 기본적으로 GPU-capable node만 target해야 한다.
-4. collector RBAC는 read-only로 시작한다: pods와 nodes. `pods/exec`는
-   exec 기반 report transport에만 필요하다.
-5. `gua report`는 local state가 node-scoped일 때 current node를 기본값으로
-   하고, cluster report에는 `--all-nodes`를 제공한다.
-6. Slurm MVP는 detection, node-level job allocation, cgroup PID-to-job mapping을
-   포함해야 한다. Exact GPU-to-job mapping은 best effort다.
-7. MIG field는 schema v2에 미리 들어가야 한다. report는 초기에는 MIG를 일반
-   GPU-like device처럼 다뤄도 된다.
-8. `gua`를 primary command로 둔다. `gpu-usage-audit`는 compatibility alias로
-   유지한다.
-
-아직 열려 있는 질문:
-
-1. 첫 k8s report transport는 `pods/exec`, HTTP export, 또는 둘 다 중 무엇인가?
-2. 바쁜 node에서 acceptable한 기본 raw retention window는 얼마인가?
-3. rollup은 collector process에서 계산할 것인가, report/export 시점에 계산할
-   것인가?
-4. HAMi/vGPU/time-slicing의 fractional sharing을 scheduler 간 어떻게 정규화할
-   것인가?
-
-## 참고 자료
-
-- NVIDIA DCGM Exporter deployment patterns:
-  https://docs.nvidia.com/datacenter/dcgm/latest/gpu-telemetry/dcgm-exporter.html
-- NVIDIA Container Toolkit GPU environment variables:
-  https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.18.1/docker-specialized.html
-- NVIDIA GPU Operator overview:
-  https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html
-- NVIDIA GPU Operator CDI and GPU Management Containers:
-  https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/cdi.html
-- Kubernetes Device Plugins:
-  https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/
-- Kubernetes kubelet files and Pod Resources API path:
-  https://kubernetes.io/docs/reference/node/kubelet-files/
-- Slurm GRES GPU scheduling:
-  https://slurm.schedmd.com/gres.html
-- Slurm `gres.conf`:
-  https://slurm.schedmd.com/gres.conf.html
-- Slurm cgroups:
-  https://slurm.schedmd.com/cgroups.html
-- Jeon et al., "Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN
-  Training Workloads", USENIX ATC 2019:
-  https://www.usenix.org/conference/atc19/presentation/jeon
-- Hu et al., "Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for
-  Deep Learning Training Jobs", ASPLOS 2023:
-  https://doi.org/10.1145/3575693.3575705
diff --git a/proposals/design-auto-runtime.md b/proposals/design-auto-runtime.md
deleted file mode 100644
index b0ae944..0000000
--- a/proposals/design-auto-runtime.md
+++ /dev/null
@@ -1,1145 +0,0 @@
-# gpu-usage-audit auto-runtime design
-
-Status: draft
-Date: 2026-05-12
-
-## Summary
-
-`gpu-usage-audit` should become a tool that a user can start without knowing
-whether the machine is bare metal, Kubernetes, a container runtime host, or a
-Slurm compute node.
-
-Target UX:
-
-```sh
-gua doctor
-gua start
-
-# days later
-gua status
-gua report --since 3d
-gua stop
-```
-
-The product should auto-detect the right collector runtime, but it must not hide
-the decision. The user should not need to know the deployment model up front,
-but `gua` should clearly report what it chose and why.
-
-Example:
-
-```text
-Detected environment:
-  host NVML: initialized, GPU count=0
-  kubernetes: available
-  k8s NVIDIA runtime: available
-  slurm: not detected
-
-Recommended plan:
-  runtime: k8s-daemonset
-  telemetry: nvml
-  scheduler: k8s
-
-Reason:
-  GPUs are not visible from the host namespace, but they are visible inside
-  Kubernetes containers with NVIDIA_VISIBLE_DEVICES=all.
-```
-
-This is the main product shift: `daemon` remains a low-level collector, while
-`gua start` becomes the launcher/orchestrator.
-
-## Motivation and Differentiation
-
-The gap this project should own is not raw GPU telemetry alone. DCGM exporter,
-`nvidia-smi`, and many Grafana dashboards already expose utilization, memory,
-temperature, and process-level facts. Slurm accounting, Kubernetes metadata, and
-cluster dashboards already expose scheduler-side allocation and ownership.
-
-The missing view is the retrospective join between the two:
-
-```text
-Who was allocated a GPU, and did that GPU actually do useful work?
-Who used a GPU without a scheduler allocation?
-Which GPUs were memory-held but compute-idle?
-Which GPUs were allocated but had no meaningful GPU process at all?
-```
-
-That combined view is the unique value. `gpu-usage-audit` should therefore
-avoid becoming another live GPU monitor. It should be a lightweight retrospective
-audit tool that correlates actual NVML observations with scheduler context.
-
-The most important headline classes are:
-
-```text
-allocated-idle-held     # scheduler allocated it, process held memory, compute was cold
-allocated-unused        # scheduler allocated it, but NVML saw no meaningful use
-unallocated-active      # GPU was used without visible scheduler allocation
-unallocated-idle-held   # GPU memory was held without visible scheduler allocation
-```
-
-In Kubernetes, `NVIDIA_VISIBLE_DEVICES=all` without a corresponding
-`nvidia.com/gpu` request is a first-class anomaly. It means a pod can access GPUs
-that scheduler accounting may not represent. This is one of the signals that
-standard GPU telemetry and kube-state style metadata do not provide by
-themselves.
-
-## Product Goals
-
-1. **No environment knowledge required for first use**
-   - The user can run `gua doctor` or `gua start` without knowing whether the
-     node is bare metal, k8s, Docker, or Slurm.
-
-2. **Transparent, not magical**
-   - Auto mode must print the selected plan, reasons, required privileges,
-     storage location, and cleanup command.
-   - Advanced users can override with `--mode host`, `--mode k8s`,
-     `--mode slurm`, or `--mode container`.
-
-3. **Retrospective audit first**
-   - The core value remains "what happened over the last N hours/days?"
-   - Live dashboards, quotas, scheduling decisions, and remediation are not the
-     first product surface.
-
-4. **Measure both actual GPU use and scheduler allocation**
-   - NVML answers: "is a GPU doing work or holding memory?"
-   - k8s/Slurm answer: "was this GPU allocated to a workload?"
-   - The report should combine both.
-
-5. **Low operational footprint**
-   - SQLite remains the default local storage.
-   - No database service, web server, Prometheus, or Grafana required for the
-     default path.
-
-6. **Good failure modes**
-   - If `gua` cannot run, it should say which layer failed: driver, NVML,
-     device visibility, container runtime, kubectl auth, Slurm config, or
-     permissions.
-
-## Non-Goals
-
-- Replacing Slurm, Kubernetes, DCGM, Prometheus, Grafana, Open OnDemand, or
-  cluster dashboards.
-- Enforcing quotas or killing jobs.
-- Scheduling workloads.
-- Requiring a central server in the minimum viable product.
-- Making every install silent. Cluster or system changes should be explicit.
-
-## Supported Environment Classes
-
-### 1. Bare Metal Host
-
-Typical shape:
-
-```text
-/dev/nvidia0..N visible on host
-host NVML init succeeds
-host NVML device count > 0
-no scheduler detected, or scheduler context disabled
-```
-
-Runtime:
-
-```text
-runtime: host-systemd or host-foreground
-telemetry: nvml
-scheduler: none
-```
-
-This is closest to the current project.
-
-### 2. Kubernetes / GPU Operator
-
-Typical shape:
-
-```text
-host may only show /dev/nvidiactl
-host NVML device count may be 0
-GPU devices are injected into pods
-runtimeClassName=nvidia may exist
-NVIDIA_VISIBLE_DEVICES controls device exposure
-```
-
-Runtime:
-
-```text
-runtime: k8s-daemonset
-telemetry: nvml
-scheduler: k8s
-```
-
-The collector must run inside Kubernetes because the GPUs may only be visible in
-container namespaces.
-
-The user should not need to build or run Docker manually. The product can still
-use an official OCI image internally.
-
-### 3. Slurm Compute Node
-
-Typical shape:
-
-```text
-host /dev/nvidia0..N visible
-Slurm manages GPUs as GRES
-jobs request GPUs with --gres=gpu:N or --gpus=N
-Slurm sets CUDA_VISIBLE_DEVICES inside job steps
-cgroups may restrict visible device files
-```
-
-Runtime:
-
-```text
-runtime: host-systemd or host-foreground
-telemetry: nvml
-scheduler: slurm
-```
-
-Slurm support is not mainly about making NVML work. It is about combining NVML
-use with Slurm allocation state.
-
-### 4. Local Container Runtime
-
-Typical shape:
-
-```text
-host command cannot or should not run directly
-docker/podman can run NVIDIA containers
-docker run --gpus all ... sees GPUs
-```
-
-Runtime:
-
-```text
-runtime: local-container
-telemetry: nvml
-scheduler: none
-```
-
-This is useful as a fallback, but should not be the primary UX.
-
-## Core Architecture
-
-Do not spread environment branches throughout the collector and report code.
-Separate the product into three axes.
-
-```text
-1. Collector Runtime
-   Where does the collector process run?
-
-2. Telemetry Source
-   How does it read actual GPU state?
-
-3. Scheduler Context
-   Who has the GPU reserved or allocated?
-```
-
-Concrete combinations:
-
-| Environment | Runtime | Telemetry | Scheduler |
-|---|---|---|---|
-| Bare metal | host-systemd | nvml | none |
-| Kubernetes / GPU Operator | k8s-daemonset | nvml | k8s |
-| Slurm | host-systemd | nvml | slurm |
-| Docker-only | local-container | nvml | none |
-| Demo/test | host-foreground | fake | none/fake |
-
-The important rule:
-
-```text
-Kubernetes and Slurm are not telemetry sources.
-NVML is still the telemetry source.
-Kubernetes and Slurm provide runtime placement and allocation context.
-```
-
-## CLI Design
-
-### Primary Commands
-
-```text
-gua doctor
-gua start
-gua status
-gua report
-gua stop
-gua uninstall
-```
-
-### Low-Level Commands
-
-These can remain available, but should not be the primary first-run UX.
-
-```text
-gua daemon run
-gua daemon export
-gua db inspect
-```
-
-The current `gpu-usage-audit daemon` and `gpu-usage-audit report` can remain as
-compatibility aliases during migration.
-
-### `gua doctor`
-
-Read-only environment diagnosis.
-
-Default output is human-readable. Use `--json` for automation.
-
-Example:
-
-```sh
-gua doctor
-gua doctor --json
-gua doctor --mode k8s
-```
-
-Doctor checks:
-
-- OS, kernel, Python, uv/pipx availability
-- `/dev/nvidia*`
-- host NVML load/init/device count
-- GPU Operator staged NVML under `/run/nvidia/driver`
-- whether the staged NVML path should be used for host mode
-- whether `nvidia-smi` is present
-- `kubectl` availability and auth
-- k8s runtime classes
-- k8s GPU pods/DaemonSets
-- ability to create required k8s resources
-- Slurm commands and node GRES
-- Docker/Podman NVIDIA runtime fallback
-
-Doctor produces a `RuntimePlan`.
-
-### `gua start`
-
-Default mode is `auto`.
-
-```sh
-gua start
-gua start --mode auto
-gua start --mode host
-gua start --mode k8s
-gua start --mode slurm
-gua start --mode container
-gua start --dry-run
-gua start --yes
-```
-
-Behavior:
-
-1. Run doctor.
-2. Select a runtime plan.
-3. Print the plan.
-4. If the action mutates system or cluster state, ask for confirmation when
-   running in a TTY.
-5. Persist install state locally.
-
-Example:
-
-```text
-Plan:
-  mode: k8s-daemonset
-  namespace: gpu-usage-audit
-  image: ghcr.io/ai-ocean/gpu-usage-audit:0.4.0
-  db: hostPath /var/lib/gpu-usage-audit/gua.db
-  nodes: GPU-capable nodes
-  cleanup: gua stop --mode k8s
-
-Continue? [y/N]
-```
-
-### `gua status`
-
-Shows the installed/running collector state.
-
-```text
-mode: k8s-daemonset
-collectors:
-  gpusystem: running, last sample 12s ago, GPUs visible=10
-  ds02: running, last sample 10s ago, GPUs visible=4
-storage:
-  per-node SQLite at /var/lib/gpu-usage-audit/gua.db
-```
-
-### `gua report`
-
-By default, `gua report` should use the saved install state.
-
-```sh
-gua report --since 24h
-gua report --since 3d --node gpusystem
-gua report --since 3d --all-nodes
-gua report --db /var/lib/gpu-usage-audit/gua.db --since 3d
-```
-
-For k8s, `gua report` should not require users to know where the DB is. It can
-query collector pods through `kubectl exec` and stream the exported samples back
-to the local CLI for aggregation.
-
-### `gua stop` and `gua uninstall`
-
-`stop` should stop the collector but preserve data by default.
-
-`uninstall` removes installed resources and can optionally delete data.
-
-```sh
-gua stop
-gua uninstall
-gua uninstall --delete-data
-```
-
-## RuntimePlan Interface
-
-The detector should produce a structured plan, not directly perform actions.
-
-Conceptual model:
-
-```python
-class RuntimePlan:
-    mode: Literal[
-        "host-systemd",
-        "host-foreground",
-        "k8s-daemonset",
-        "local-container",
-        "unsupported",
-    ]
-    telemetry: Literal["nvml", "fake"]
-    scheduler: Literal["none", "k8s", "slurm"]
-    confidence: Literal["high", "medium", "low"]
-    reasons: list[str]
-    blockers: list[str]
-    warnings: list[str]
-    required_privileges: list[str]
-    actions: list[PlannedAction]
-```
-
-Runtime adapters consume a plan:
-
-```text
-HostRuntimeAdapter
-K8sRuntimeAdapter
-ContainerRuntimeAdapter
-```
-
-Scheduler adapters enrich snapshots:
-
-```text
-NoSchedulerAdapter
-K8sSchedulerAdapter
-SlurmSchedulerAdapter
-```
-
-Telemetry adapters produce hardware facts:
-
-```text
-NVMLTelemetry
-FakeTelemetry
-```
-
-## Detection Order
-
-Auto mode should prefer the least surprising runtime that can see all GPUs.
-
-Proposed order:
-
-1. Host NVML
-   - If host NVML sees GPUs, host runtime is viable.
-   - If Slurm is detected, scheduler context becomes `slurm`.
-   - Otherwise scheduler context is `none`.
-   - If host NVML fails with a likely version mismatch but staged GPU Operator
-     NVML exists under `/run/nvidia/driver`, the plan should record a host
-     runtime remediation:
-     - re-exec with the staged library directory prepended to
-       `LD_LIBRARY_PATH` before importing pynvml, or
-     - use a tiny launcher wrapper that sets the path before starting the collector.
-     Changing `LD_LIBRARY_PATH` after pynvml/libnvidia-ml has already been
-     loaded is not sufficient.
-
-2. Kubernetes
-   - If host NVML cannot see GPUs, but k8s is available and NVIDIA runtime can
-     expose GPUs in a pod, use `k8s-daemonset`.
-   - Do not rely only on `node.status.capacity["nvidia.com/gpu"]`; some
-     clusters expose GPUs to pods even when accounting is unusual or custom.
-
-3. Local container runtime
-   - If Docker/Podman can run an NVIDIA container with all GPUs, use
-     `local-container`.
-
-4. Unsupported
-   - Explain the nearest viable path.
-
-Important: detection should never install packages or mutate the cluster.
-
-## Kubernetes Runtime Design
-
-### Installation Shape
-
-Minimum viable install:
-
-```text
-Namespace: gpu-usage-audit
-DaemonSet: gpu-usage-audit
-ServiceAccount: gpu-usage-audit
-ConfigMap: collector config
-hostPath DB: /var/lib/gpu-usage-audit/gua.db
-```
-
-DaemonSet requirements:
-
-```yaml
-runtimeClassName: nvidia
-hostPID: true
-env:
-  - name: NVIDIA_VISIBLE_DEVICES
-    value: all
-  - name: NVIDIA_DRIVER_CAPABILITIES
-    value: compute,utility
-```
-
-Likely mounts:
-
-```text
-/var/lib/gpu-usage-audit         read-write DB hostPath
-/proc                            read-only host process metadata, if needed
-/var/lib/kubelet/pod-resources   read-only pod resources socket, if available
-```
-
-`hostPID: true` is important for node-wide process attribution. NVML can report
-PIDs for GPU processes, but without host PID visibility the collector may not be
-able to map those PIDs back to `/proc/<pid>/cgroup`.
-
-Default should be `hostPID: true` with an opt-out mode. Some clusters enforce
-restricted Pod Security profiles, so `gua start --mode k8s --no-host-pid` should
-be possible, but the plan must say that process-to-pod attribution will be
-weaker.
-
-The DaemonSet should target GPU-capable nodes by default, not every node.
-Preferred selectors:
-
-```text
-nvidia.com/gpu.present=true
-feature.node.kubernetes.io/pci-10de.present=true
-```
-
-If GPU Feature Discovery / Node Feature Discovery labels are absent, the
-installer can fall back to a broader DaemonSet plus collector self-checks.
-
-### Kubernetes Allocation Context
-
-The k8s adapter should combine three data sources:
-
-1. Kubernetes API
-   - Pods, namespaces, node names, owner references, resource requests/limits.
-
-2. Kubelet PodResources API
-   - Best source for which pod/container received which GPU device IDs.
-
-3. Host `/proc/<pid>/cgroup`
-   - Best source for mapping an observed GPU process PID to a pod/container.
-
-This distinction matters because the current observed cluster has pods with:
-
-```text
-NVIDIA_VISIBLE_DEVICES=all
-no nvidia.com/gpu request
-all GPUs visible inside the container
-```
-
-Those pods can use GPUs even though scheduler accounting may not represent the
-use cleanly.
-
-The adapter should explicitly detect:
-
-```text
-NVIDIA_VISIBLE_DEVICES=all
-NVIDIA_VISIBLE_DEVICES=<GPU UUID list>
-no nvidia.com/gpu request or limit
-```
-
-These should be surfaced as scheduler-accounting anomalies, not just stored as
-raw environment variables.
-
-### Cgroup Compatibility
-
-Process attribution depends on `/proc/<pid>/cgroup`, but cgroup v1 and unified
-cgroup v2 encode paths differently. Kubernetes and Slurm deployments are both
-moving toward cgroup v2.
-
-The parser should be a shared module used by the k8s and Slurm adapters. It
-should support:
-
-```text
-cgroup v1 controller-specific lines
-cgroup v2 unified `0::/path` lines
-systemd slice escaping
-containerd / CRI-O pod and container IDs
-Slurm job_<id> and step_<id> paths
-```
-
-This should be decided before implementing process-to-owner attribution.
-
-### Kubernetes Report Semantics
-
-The report should show both scheduler allocation and actual GPU state:
-
-```text
-allocated-active
-allocated-idle-held
-allocated-unused
-unallocated-active
-unallocated-idle-held
-truly-idle
-```
-
-Where:
-
-```text
-allocated-unused = scheduler allocated GPU, but no meaningful NVML process/mem
-unallocated-active = NVML shows use, but scheduler allocation is absent/unknown
-unallocated-idle-held = memory held without scheduler allocation
-truly-idle = no allocation and no meaningful NVML use
-```
-
-## Slurm Runtime Design
-
-Slurm generally manages GPUs through GRES.
-
-Important Slurm facts:
-
-- GPUs are configured as GRES, usually `Name=gpu`.
-- Jobs request GPUs with `--gres=gpu:N`, `--gpus=N`, or related flags.
-- Slurm sets `CUDA_VISIBLE_DEVICES` for job steps.
-- Slurm can use cgroups to restrict visible device files.
-- Slurm can autodetect NVIDIA GPUs with NVML in `gres.conf`.
-
-Slurm support should be treated as:
-
-```text
-runtime: host-systemd
-telemetry: nvml
-scheduler: slurm
-```
-
-The collector runs on compute nodes outside user jobs. It reads NVML for actual
-GPU use and Slurm for allocation context.
-
-### Slurm Detection Signals
-
-```text
-scontrol exists
-sinfo exists
-slurmd process or service exists
-/etc/slurm/slurm.conf or $SLURM_CONF exists
-scontrol show node <hostname> reports Gres or CfgTRES with gpu
-```
-
-### Slurm Allocation Context
-
-Initial adapter sources:
-
-```text
-scontrol show node <node>
-squeue -h -w <node>
-scontrol show job -d <jobid>
-sacct, when available
-/proc/<pid>/cgroup for job_<id> or step_<id>
-```
-
-MVP should support:
-
-- Which jobs are running on this node.
-- Which users own those jobs.
-- How many GPUs each job requested.
-- If available, which GPU device IDs or UUIDs are allocated.
-- Mapping GPU PIDs back to Slurm job IDs via cgroup.
-
-It is acceptable for the first version to mark per-GPU allocation as
-`allocated-unknown-gpu` if Slurm does not expose exact GPU IDs in the available
-commands.
-
-## Data Model V2
-
-The current schema captures hardware samples and process samples. That is still
-useful, but scheduler allocation needs first-class storage.
-
-Proposed tables:
-
-### `node`
-
-```text
-node_id
-hostname
-first_seen
-last_seen
-runtime_mode       # host-systemd / k8s-daemonset / local-container
-scheduler_kind     # none / k8s / slurm
-driver_version
-collector_version
-```
-
-### `gpu_sample`
-
-```text
-ts
-node_id
-gpu_uuid
-gpu_index
-parent_uuid          # nullable, set for MIG instances or virtual slices
-mig_profile          # nullable, e.g. 1g.5gb
-share_id             # nullable, for MIG/vGPU/time-slicing/MPS-style slices
-bus_id
-util_pct
-mem_used_mb
-mem_total_mb
-```
-
-### `gpu_process_sample`
-
-```text
-ts
-node_id
-gpu_uuid
-pid
-process_name
-mem_used_mb
-loginuid_user
-owner_key          # nullable, references observed owner if resolved
-```
-
-### `allocation_sample`
-
-```text
-ts
-node_id
-scheduler_kind     # k8s / slurm
-gpu_uuid           # nullable if exact GPU unknown
-parent_uuid        # nullable, physical GPU for MIG/vGPU/shared allocations
-owner_kind         # k8s_pod / slurm_job
-owner_key          # stable ID: namespace/name or job ID
-owner_name
-namespace
-user_name
-account
-requested_gpus
-share_fraction     # nullable, for fractional/shared GPU allocation
-allocation_state   # allocated / released / unknown
-raw_ref
-```
-
-### `owner_sample`
-
-Optional but useful for normalized reporting:
-
-```text
-ts
-owner_kind
-owner_key
-owner_name
-namespace
-user_name
-account
-labels_json
-```
-
-### Migration
-
-The existing DB can be read as legacy mode:
-
-```text
-scheduler_kind = none
-allocation state = unknown
-```
-
-Reports should continue to work on old DBs.
-
-### Retention and Rollups
-
-Raw process samples can become large quickly. A busy node can produce many rows
-per tick:
-
-```text
-1 Hz * 10 GPUs * 50 GPU processes = 500 process rows/sec
-```
-
-SQLite can handle useful short-term windows, but long retention needs an
-explicit policy. Default storage should keep the operational model simple:
-
-```text
-raw samples:       7-14 days by default
-1-minute rollups:  90 days by default
-5-minute rollups:  optional long-term retention
-```
-
-Proposed rollup tables:
-
-```text
-gpu_rollup_1m
-owner_rollup_1m
-allocation_rollup_1m
-```
-
-Rollups should preserve the combined classes, not just average utilization.
-Otherwise the core signal, such as `allocated-unused`, disappears during
-downsampling.
-
-## Classification Model
-
-Keep the existing hardware classification:
-
-```text
-util >= 10                  -> active
-util <  10 and mem > 100    -> idle-held
-util <  10 and mem <= 100   -> truly-idle
-```
-
-Add scheduler allocation:
-
-```text
-allocation known and present -> allocated
-allocation absent            -> unallocated
-allocation unavailable       -> unknown
-```
-
-Combined classes:
-
-| Allocation | Hardware | Combined |
-|---|---|---|
-| allocated | active | allocated-active |
-| allocated | idle-held | allocated-idle-held |
-| allocated | truly-idle | allocated-unused |
-| unallocated | active | unallocated-active |
-| unallocated | idle-held | unallocated-idle-held |
-| unallocated | truly-idle | truly-idle |
-| unknown | active | active |
-| unknown | idle-held | idle-held |
-| unknown | truly-idle | truly-idle |
-
-This lets the product keep the original report semantics while adding k8s/Slurm
-value.
-
-## Storage and Reporting Strategy
-
-### Single Node
-
-Default:
-
-```text
-/var/lib/gpu-usage-audit/gua.db
-```
-
-User-mode/foreground fallback:
-
-```text
-~/.local/share/gpu-usage-audit/gua.db
-```
-
-### Kubernetes
-
-MVP:
-
-- One SQLite DB per node via hostPath.
-- `gua report` discovers collector pods.
-- `gua report` runs `gua daemon export --format jsonl` inside each collector
-  pod and aggregates locally.
-
-This avoids a central database or service, but it has known limits:
-
-- `pods/exec` RBAC is often restricted.
-- Sequential exec across many nodes is slow.
-- Large exports need streaming, compression, and time-window filtering.
-
-The report implementation should fan out in parallel and request only the
-needed time window. It should also support an alternative export path.
-
-Later:
-
-- Optional read-only HTTP export endpoint in each collector pod.
-- Optional `kubectl port-forward` based report collection.
-- Optional cluster-internal aggregator Job.
-- Optional central PVC.
-- Optional Prometheus/exporter mode.
-- Optional object storage export.
-
-### Slurm
-
-MVP:
-
-- One SQLite DB per compute node.
-- Local node reports first.
-
-Later:
-
-- Slurm controller-side aggregator.
-- `gua report --partition` or `--nodes`.
-
-## Packaging and Installation
-
-### Primary CLI Install
-
-Recommended:
-
-```sh
-uv tool install gpu-usage-audit
-```
-
-or:
-
-```sh
-pipx install gpu-usage-audit
-```
-
-To reduce first-run friction, consider making `nvidia-ml-py` a default
-dependency instead of an optional extra. It is small, and missing NVML bindings
-should not be the reason a GPU audit tool fails on first use.
-
-### OCI Image
-
-Needed for k8s runtime.
-
-```text
-ghcr.io/AI-Ocean/gpu-usage-audit:<version>
-ghcr.io/AI-Ocean/gpu-usage-audit:latest
-```
-
-The user does not need to run Docker manually. The image is an implementation
-detail used by the k8s runtime adapter.
-
-### Kubernetes Install
-
-Initial implementation can embed a manifest template in the Python package.
-
-Later:
-
-- Publish standalone YAML in GitHub Releases.
-- Publish Helm chart.
-
-### One-Line Installer
-
-Optional later UX:
-
-```sh
-curl -Ls https://github.com/AI-Ocean/gpu-usage-audit/releases/latest/download/install.sh | sh
-```
-
-This should install the CLI only. It should not silently install a systemd
-service or k8s DaemonSet.
-
-## Security and Permissions
-
-### Host Mode
-
-Needs:
-
-- NVML access.
-- Read access to `/proc/<pid>/loginuid` and cgroup metadata.
-- Write access to DB directory.
-- systemd install requires root.
-
-Non-root foreground mode should be supported for testing.
-
-### Kubernetes Mode
-
-Needs:
-
-- Ability to create namespace, service account, configmap, daemonset, and RBAC.
-- Runtime access to all GPUs on the target node.
-- Read access to pod and node metadata.
-- Potential hostPID and read-only `/proc` access for process attribution.
-- hostPath write access for SQLite DB.
-- Optional `pods/exec` for `gua report` if using exec-based export.
-
-The install plan must print these privileges before applying resources.
-
-Minimum collector RBAC should start with:
-
-```text
-get/list/watch pods
-get/list/watch nodes
-```
-
-`pods/exec` should be report-side only, not required by the collector itself.
-
-### Slurm Mode
-
-Needs:
-
-- Host NVML access.
-- Read access to Slurm commands/config/accounting.
-- Read access to process cgroups.
-- systemd install usually requires admin privileges.
-
-Slurm job users should not be expected to install node-wide collectors.
-
-## Implementation Milestones
-
-### M0: Focused ADRs
-
-Before broad implementation, write short architecture decision records for the
-highest-risk details:
-
-- GPU Operator staged NVML loading and host-mode re-exec.
-- MIG, vGPU, MPS, and time-slicing representation.
-- cgroup v1/v2 parser and owner attribution.
-- k8s report export path: `pods/exec` versus HTTP endpoint versus aggregator.
-
-### M1: Doctor and RuntimePlan
-
-No behavior changes to collection yet.
-
-Deliver:
-
-- `gua doctor`
-- host NVML/device checks
-- k8s checks
-- Slurm checks
-- structured JSON output
-- recommended plan
-
-This is the highest leverage milestone because it validates environment
-assumptions without installing anything.
-
-### M2: Schema V2 and Combined Report Model
-
-Deliver:
-
-- migration-safe DB schema
-- allocation table
-- combined classes
-- fake scheduler tests
-- old DB compatibility
-- retention and rollup policy
-
-This is the differentiating feature. It should land early so every runtime
-adapter can target the same model.
-
-### M3: CLI Surface and State
-
-Deliver:
-
-- `gua start --dry-run`
-- `gua status`
-- local state file
-- compatibility aliases for old commands
-
-No k8s install yet.
-
-### M4: Kubernetes Runtime Adapter
-
-Deliver:
-
-- official OCI image
-- embedded DaemonSet manifest
-- `gua start --mode k8s`
-- `gua stop --mode k8s`
-- `gua report` from collector pods with parallel, windowed export
-
-This solves the observed GPU Operator environment.
-
-### M5: Kubernetes Scheduler Adapter
-
-Deliver:
-
-- pod/process attribution
-- PodResources API integration where available
-- report by namespace/pod/user
-- detection of `NVIDIA_VISIBLE_DEVICES=all` pods without GPU requests
-- anomaly headline for unrequested GPU access
-
-### M6: Host Runtime Adapter
-
-Deliver:
-
-- systemd unit install
-- foreground mode
-- host preflight
-- GPU Operator staged NVML re-exec or clear diagnostic
-
-### M7: Slurm Scheduler Adapter
-
-Deliver:
-
-- Slurm detection
-- job allocation snapshots
-- process-to-job mapping through cgroups
-- exact GPU-to-job mapping on a best-effort basis
-- report by job/user/account
-
-### M8: Documentation and Release Polish
-
-Deliver:
-
-- quickstart
-- architecture docs
-- troubleshooting matrix
-- release workflow for wheel + OCI image
-- optional Helm chart
-
-## Current Server Interpretation
-
-The observed `gpusystem` server fits:
-
-```text
-runtime: k8s-daemonset
-telemetry: nvml
-scheduler: k8s
-```
-
-Why:
-
-- Host only shows `/dev/nvidiactl`.
-- Host NVML cannot see devices.
-- Kubernetes workload containers can see `/dev/nvidia0..9`.
-- Some pods use `runtimeClassName=nvidia` and `nvidia.com/gpu` requests.
-- Some pods expose `NVIDIA_VISIBLE_DEVICES=all` without GPU requests.
-
-This environment is exactly why runtime placement and scheduler context must be
-separate abstractions.
-
-## Open Questions
-
-Proposed decisions:
-
-1. `nvidia-ml-py` should become a default dependency.
-2. k8s DaemonSet should default to `hostPID: true`, with `--no-host-pid` opt-out.
-3. k8s install should target GPU-capable nodes by default.
-4. Collector RBAC should be read-only: pods and nodes. `pods/exec` is only
-   needed for the exec-based report transport.
-5. `gua report` should default to the current node when local state is
-   node-scoped, and support `--all-nodes` for cluster reports.
-6. Slurm MVP should include detection, node-level job allocation, and cgroup
-   PID-to-job mapping. Exact GPU-to-job mapping is best effort.
-7. MIG fields should be in schema v2 even if reports initially treat them as
-   ordinary GPU-like devices.
-8. `gua` should become the primary command. `gpu-usage-audit` should remain as
-   a compatibility alias.
-
-Still open:
-
-1. Should the first k8s report transport be `pods/exec`, HTTP export, or both?
-2. What default raw retention window is acceptable for busy nodes?
-3. Should rollups be computed in the collector process or during report/export?
-4. How should fractional sharing from HAMi/vGPU/time-slicing be normalized
-   across schedulers?
-
-## References
-
-- NVIDIA DCGM Exporter deployment patterns:
-  https://docs.nvidia.com/datacenter/dcgm/latest/gpu-telemetry/dcgm-exporter.html
-- NVIDIA Container Toolkit GPU environment variables:
-  https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.18.1/docker-specialized.html
-- NVIDIA GPU Operator overview:
-  https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html
-- NVIDIA GPU Operator CDI and GPU Management Containers:
-  https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/cdi.html
-- Kubernetes Device Plugins:
-  https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/
-- Kubernetes kubelet files and Pod Resources API path:
-  https://kubernetes.io/docs/reference/node/kubelet-files/
-- Slurm GRES GPU scheduling:
-  https://slurm.schedmd.com/gres.html
-- Slurm `gres.conf`:
-  https://slurm.schedmd.com/gres.conf.html
-- Slurm cgroups:
-  https://slurm.schedmd.com/cgroups.html
-- Jeon et al., "Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN
-  Training Workloads", USENIX ATC 2019:
-  https://www.usenix.org/conference/atc19/presentation/jeon
-- Hu et al., "Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for
-  Deep Learning Training Jobs", ASPLOS 2023:
-  https://doi.org/10.1145/3575693.3575705
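Review note (not part of the patch): the Classification Model section of the removed plan is the piece most likely to be revisited if scheduler support returns. The following is a minimal Python sketch of its threshold rules and combined-class table. It assumes the unitless `mem > 100` threshold is in MB, matching the `mem_used_mb` column; the names `classify_hardware`, `combine`, `UTIL_ACTIVE_PCT`, and `MEM_HELD_MB` are hypothetical and do not exist in the repository.

```python
from typing import Literal

Hardware = Literal["active", "idle-held", "truly-idle"]
Allocation = Literal["allocated", "unallocated", "unknown"]

# Thresholds copied from the plan; the MB unit is an assumption.
UTIL_ACTIVE_PCT = 10
MEM_HELD_MB = 100


def classify_hardware(util_pct: float, mem_used_mb: float) -> Hardware:
    """Hardware-only class: active, idle-held, or truly-idle."""
    if util_pct >= UTIL_ACTIVE_PCT:
        return "active"
    if mem_used_mb > MEM_HELD_MB:
        return "idle-held"
    return "truly-idle"


def combine(allocation: Allocation, hardware: Hardware) -> str:
    """Combined class, following the Allocation/Hardware table in the plan."""
    if allocation == "unknown":
        # Unknown allocation keeps the original hardware-only semantics.
        return hardware
    if hardware == "truly-idle":
        # The two rows whose combined name is not a plain "<alloc>-<hw>" join.
        return "allocated-unused" if allocation == "allocated" else "truly-idle"
    return f"{allocation}-{hardware}"


sample = combine("allocated", classify_hardware(util_pct=3, mem_used_mb=8192))
print(sample)
```

Keeping `combine` as a pure function of two labels would also let rollups store the combined class directly, which is what the plan's retention section asks for.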
diff --git a/src/gpu_usage_audit/__init__.py b/src/gpu_usage_audit/__init__.py
index 0c77716..30d729c 100644
--- a/src/gpu_usage_audit/__init__.py
+++ b/src/gpu_usage_audit/__init__.py
@@ -1,8 +1,7 @@
 """gpu-usage-audit — surfaces idle-held NVIDIA GPU memory.
 
-The external API of this package is still a *work in progress*. The v0.2.0 alpha
-is porting the 5-section report from Go v0.1.0 to Python.
-Import paths may change until v0.2.0 stable.
+The 1.0 scope is to record NVML telemetry from a single local bare-metal NVIDIA
+host into SQLite and print an active / idle-held / truly-idle retrospective report.
 """
 
 # Expose the version at runtime. Keep in sync with [project.version] in pyproject.toml.
diff --git a/src/gpu_usage_audit/__main__.py b/src/gpu_usage_audit/__main__.py
index f67f0b2..8813ab4 100644
--- a/src/gpu_usage_audit/__main__.py
+++ b/src/gpu_usage_audit/__main__.py
@@ -34,7 +34,6 @@
     doctor_report_to_dict,
     render_doctor,
 )
-from .env import detect_env_kind
 from .identity import system_user_lookup
 from .model import HostMeta
 from .nvml import NVMLNotAvailableError, NVMLTier
@@ -64,6 +63,7 @@
     "d": "days",
 }
 DEFAULT_DB_PATH = DOCTOR_DEFAULT_DB_PATH
+LOCAL_ENV_KIND = "bare"
 
 
 def _duration(s: str) -> timedelta:
@@ -225,7 +225,7 @@ def _cmd_daemon(args: argparse.Namespace) -> int:
             return 1
         host = HostMeta(
             hostname=socket.gethostname() or "unknown",
-            env_kind=detect_env_kind("/proc"),
+            env_kind=LOCAL_ENV_KIND,
             driver_version=driver,
             first_seen=datetime.now(UTC),
         )
@@ -295,7 +295,7 @@ def _cmd_demo(args: argparse.Namespace) -> int:
         driver = tier.probe()
         host = HostMeta(
             hostname=socket.gethostname() or "unknown",
-            env_kind=detect_env_kind("/proc"),
+            env_kind=LOCAL_ENV_KIND,
             driver_version=driver,
             first_seen=datetime.now(UTC),
         )
diff --git a/src/gpu_usage_audit/doctor.py b/src/gpu_usage_audit/doctor.py
index 39476e6..e566bb1 100644
--- a/src/gpu_usage_audit/doctor.py
+++ b/src/gpu_usage_audit/doctor.py
@@ -16,10 +16,10 @@
 from pathlib import Path
 from typing import Literal
 
-from .model import RuntimePlan
 from .nvml import NVMLNotAvailableError, _decode, _load_pynvml, nvml_init_error_message
 
 type CheckStatus = Literal["ok", "warning", "error", "skipped"]
+type ReadinessMode = Literal["host", "unsupported"]
 type Which = Callable[[str], str | None]
 
 DEFAULT_COMMAND_TIMEOUT_SECONDS = 3.0
@@ -114,11 +114,21 @@ class DetectionFacts:
     database: DatabaseInfo
 
 
+@dataclass(slots=True)
+class DoctorPlan:
+    """Local bare-metal readiness verdict produced by `gua doctor`."""
+
+    mode: ReadinessMode
+    reasons: list[str] = field(default_factory=list)
+    blockers: list[str] = field(default_factory=list)
+    warnings: list[str] = field(default_factory=list)
+
+
 @dataclass(slots=True)
 class DoctorReport:
     generated_at: datetime
     checks: list[DoctorCheck]
-    plan: RuntimePlan
+    plan: DoctorPlan
 
 
 def run_command(cmd: Sequence[str], timeout: float) -> CommandResult:
@@ -173,7 +183,7 @@ def build_doctor_report(
         nvml=nvml_info,
         database=database_info,
     )
-    plan = select_runtime_plan(facts)
+    plan = select_doctor_plan(facts)
     return DoctorReport(
         generated_at=generated_at,
         checks=[
@@ -446,15 +456,12 @@ def probe_default_db(db_path: str | Path = DEFAULT_DB_PATH) -> tuple[DatabaseInf
     )
 
 
-def select_runtime_plan(facts: DetectionFacts) -> RuntimePlan:
+def select_doctor_plan(facts: DetectionFacts) -> DoctorPlan:
     blockers = _unsupported_blockers(facts)
     warnings = _host_warnings(facts)
     if blockers:
-        return RuntimePlan(
+        return DoctorPlan(
             mode="unsupported",
-            telemetry="nvml",
-            scheduler="none",
-            confidence="high",
             reasons=[
                 "This command only audits the local machine, and host readiness is incomplete."
             ],
@@ -462,21 +469,14 @@ def select_runtime_plan(facts: DetectionFacts) -> RuntimePlan:
             warnings=warnings,
         )
 
-    return RuntimePlan(
+    return DoctorPlan(
         mode="host",
-        telemetry="nvml",
-        scheduler="none",
-        confidence="high",
         reasons=[
             f"Local NVML initialized and sees {facts.nvml.device_count} GPU(s).",
             "`nvidia-smi -L` lists GPUs on this machine.",
             "The 1.0 workflow writes local NVML samples to a local SQLite database.",
         ],
         warnings=warnings,
-        required_privileges=[
-            "permission to read NVML GPU and process state",
-            "write access to the collector database path",
-        ],
     )
 
 
@@ -535,7 +535,7 @@ def doctor_report_to_dict(report: DoctorReport) -> dict[str, object]:
         "read_only": True,
         "no_system_changes": True,
         "checks": [doctor_check_to_dict(check) for check in report.checks],
-        "plan": runtime_plan_to_dict(report.plan),
+        "plan": doctor_plan_to_dict(report.plan),
     }
     if report.plan.mode == "host":
         data["recommended_commands"] = _recommended_commands_for(report)
@@ -552,18 +552,12 @@ def doctor_check_to_dict(check: DoctorCheck) -> dict[str, object]:
     }
 
 
-def runtime_plan_to_dict(plan: RuntimePlan) -> dict[str, object]:
+def doctor_plan_to_dict(plan: DoctorPlan) -> dict[str, object]:
     return {
         "mode": plan.mode,
-        "telemetry": plan.telemetry,
-        "scheduler": plan.scheduler,
-        "confidence": plan.confidence,
         "reasons": plan.reasons,
         "blockers": plan.blockers,
         "warnings": plan.warnings,
-        "required_privileges": plan.required_privileges,
-        # Keep an empty list, with no RuntimePlan model field behind it, for schema_version=1 compatibility.
-        "actions": [],
     }
 
 
diff --git a/src/gpu_usage_audit/env.py b/src/gpu_usage_audit/env.py
deleted file mode 100644
index 8c5015d..0000000
--- a/src/gpu_usage_audit/env.py
+++ /dev/null
@@ -1,57 +0,0 @@
-"""Host environment classification: decide bare/docker/k8s from the last field of `/proc/1/cgroup`.
-
-PID 1 is the init process the kernel starts right after boot. On a bare machine
-it shows a systemd-managed path (`/system.slice/...`, `/init.scope`, etc.); inside
-a container, a signature such as `/docker/...` or `/kubepods/...` appears.
-
-Matching priority: k8s → docker → bare → unknown.
-- Why k8s is checked first: k8s pods commonly run on top of docker/containerd,
-  so the docker signature can be a false positive.
-- unknown is a silent fallback: treating an *unknown environment* as "bare" is risky.
-"""
-
-from __future__ import annotations
-
-from pathlib import Path
-
-
-def detect_env_kind(proc_root: str | Path = "/proc") -> str:
-    """Read `proc_root/1/cgroup` and return "bare"/"docker"/"k8s"/"unknown".
-
-    Args:
-        proc_root: Normally `/proc`. Tests can point it at a *real file*
-            fixture (like Go's t.TempDir()) instead of pyfakefs. Same
-            signature as Go's DetectEnvKind.
-
-    Returns:
-        The classification string; "unknown" if the file is missing or unreadable.
-    """
-    path = Path(proc_root) / "1" / "cgroup"
-    try:
-        data = path.read_text()
-    except OSError:
-        return "unknown"
-
-    if "kubepods" in data:
-        return "k8s"
-    if "docker" in data or "containerd" in data:
-        return "docker"
-
-    # cgroup line format: "<hierarchy>:<controllers>:<path>" (v1) or
-    # "0::<path>" (v2). If the last field is a systemd-managed path, it is bare.
-    for raw_line in data.splitlines():
-        line = raw_line.strip()
-        if not line:
-            continue
-        parts = line.split(":", 2)
-        if len(parts) != 3:
-            continue
-        p = parts[2]
-        if (
-            p == "/"
-            or p == "/init.scope"
-            or p.startswith("/system.slice")
-            or p.startswith("/user.slice")
-        ):
-            return "bare"
-    return "unknown"
diff --git a/src/gpu_usage_audit/model.py b/src/gpu_usage_audit/model.py
index 198d131..aad0f23 100644
--- a/src/gpu_usage_audit/model.py
+++ b/src/gpu_usage_audit/model.py
@@ -10,15 +10,9 @@
 
 from dataclasses import dataclass, field
 from datetime import datetime
-from typing import Literal
 
 from .classify import Class
 
-RuntimeMode = Literal["host", "unsupported"]
-TelemetrySource = Literal["nvml"]
-SchedulerSource = Literal["none"]
-PlanConfidence = Literal["high", "medium", "low"]
-
 
 @dataclass(slots=True)
 class GPUSample:
@@ -58,9 +52,9 @@ class Snapshot:
 class HostMeta:
     """Host context decided once at daemon startup and *carried for its whole lifetime*.
 
-    Assumes hostname/env_kind/driver_version do not change during the daemon
-    lifetime. first_seen is an immutable field of the host row (the first
-    INSERT time is preserved across restarts); last_seen is updated every tick.
+    1.0 targets local bare-metal hosts only, so env_kind is recorded as "bare".
+    Assumes hostname/env_kind/driver_version do not change during the daemon lifetime.
+    first_seen is an immutable field of the host row; last_seen is updated every tick.
     """
 
     hostname: str
@@ -69,20 +63,6 @@ class HostMeta:
     first_seen: datetime
 
 
-@dataclass(slots=True)
-class RuntimePlan:
-    """Local host readiness verdict produced by `gua doctor`."""
-
-    mode: RuntimeMode
-    telemetry: TelemetrySource
-    scheduler: SchedulerSource
-    confidence: PlanConfidence
-    reasons: list[str] = field(default_factory=list)
-    blockers: list[str] = field(default_factory=list)
-    warnings: list[str] = field(default_factory=list)
-    required_privileges: list[str] = field(default_factory=list)
-
-
 @dataclass(slots=True)
 class HostRow:
     """Shape the report side reads from the host table and shows in the header.
diff --git a/src/gpu_usage_audit/tier.py b/src/gpu_usage_audit/tier.py
index 786f30a..b2ccac3 100644
--- a/src/gpu_usage_audit/tier.py
+++ b/src/gpu_usage_audit/tier.py
@@ -1,8 +1,7 @@
-"""Data source abstraction + FakeTier for learning/testing.
+"""GPU telemetry source abstraction + FakeTier for demo/test.
 
 Tier abstracts "where one tick of GPU telemetry comes from". The production
-NVMLTier (added in a v0.2.0 follow-up) and the learning/testing FakeTier plug
-into the same slot.
+NVMLTier and the demo/test FakeTier plug into the same slot.
 
 Python has typing.Protocol, which is *structurally compatible* with Go's interface
 (no implements declaration needed). If FakeTier and NVMLTier have the same shape,
diff --git a/tests/test_doctor.py b/tests/test_doctor.py
index 37d657c..30ccf97 100644
--- a/tests/test_doctor.py
+++ b/tests/test_doctor.py
@@ -62,8 +62,6 @@ def test_build_doctor_report_checks_only_local_bare_metal(tmp_path: Path) -> Non
     ]
     assert runner.calls == [("nvidia-smi", "-L")]
     assert report.plan.mode == "host"
-    assert report.plan.telemetry == "nvml"
-    assert report.plan.scheduler == "none"
 
     rendered = render_doctor(report)
     assert "Scope:\n  machine: local" in rendered
@@ -324,7 +322,8 @@ def test_doctor_report_json_is_local_scope(tmp_path: Path) -> None:
     assert isinstance(plan, dict)
     assert isinstance(checks, list)
     assert plan["mode"] == "host"
-    assert plan["actions"] == []
+    assert "scheduler" not in plan
+    assert "actions" not in plan
     assert [check["id"] for check in checks if isinstance(check, dict)] == [
         "os",
         "nvidia_devices",
diff --git a/tests/test_env.py b/tests/test_env.py
deleted file mode 100644
index d68dd5d..0000000
--- a/tests/test_env.py
+++ /dev/null
@@ -1,55 +0,0 @@
-"""Tests for DetectEnvKind. Same cases as TestDetectEnvKind in Go v0.1.0."""
-
-from __future__ import annotations
-
-from pathlib import Path
-
-import pytest
-
-from gpu_usage_audit.env import detect_env_kind
-
-
-@pytest.mark.parametrize(
-    ("name", "content", "want"),
-    [
-        (
-            "k8s: kubepods path",
-            "12:devices:/kubepods/besteffort/pod-abc/container-xyz\n",
-            "k8s",
-        ),
-        (
-            "k8s priority: both kubepods and docker",
-            "12:devices:/kubepods/...\n11:cpu:/docker/abc\n",
-            "k8s",
-        ),
-        ("docker: docker path", "12:devices:/docker/abcdef\n", "docker"),
-        ("docker: containerd path", "12:devices:/containerd/xyz\n", "docker"),
-        ("bare: system.slice", "0::/system.slice/gpu-audit.service\n", "bare"),
-        ("bare: init.scope", "0::/init.scope\n", "bare"),
-        ("bare: root path", "0::/\n", "bare"),
-        ("bare: user.slice", "0::/user.slice/user-1000.slice\n", "bare"),
-        ("unknown: unrecognized path", "0::/some/weird/path\n", "unknown"),
-    ],
-)
-def test_detect_env_kind_from_content(
-    tmp_path: Path,
-    name: str,
-    content: str,
-    want: str,
-) -> None:
-    proc_dir = tmp_path / "1"
-    proc_dir.mkdir()
-    (proc_dir / "cgroup").write_text(content)
-    got = detect_env_kind(tmp_path)
-    assert got == want, f"{name}: got {got!r}, want {want!r}\n  content={content!r}"
-
-
-def test_detect_env_kind_missing_file(tmp_path: Path) -> None:
-    # proc_root itself exists but there is no 1/cgroup file: falls back to unknown.
-    assert detect_env_kind(tmp_path) == "unknown"
-
-
-def test_detect_env_kind_missing_root(tmp_path: Path) -> None:
-    # A nonexistent proc_root path also has its OSError absorbed → unknown.
-    nonexistent = tmp_path / "does-not-exist"
-    assert detect_env_kind(nonexistent) == "unknown"
diff --git a/tests/test_smoke.py b/tests/test_smoke.py
index e45dafd..055c60d 100644
--- a/tests/test_smoke.py
+++ b/tests/test_smoke.py
@@ -24,8 +24,7 @@
     gua_main,
     main,
 )
-from gpu_usage_audit.doctor import DoctorCheck, DoctorReport
-from gpu_usage_audit.model import RuntimePlan
+from gpu_usage_audit.doctor import DoctorCheck, DoctorPlan, DoctorReport
 
 
 def test_version_string_is_nonempty() -> None:
@@ -207,11 +206,8 @@ def _fake_doctor_report(*, db_path: str | Path = DEFAULT_DB_PATH) -> DoctorRepor
                 details={"path": str(db_path), "is_default": Path(db_path) == DEFAULT_DB_PATH},
             ),
         ],
-        plan=RuntimePlan(
+        plan=DoctorPlan(
             mode="host",
-            telemetry="nvml",
-            scheduler="none",
-            confidence="high",
             reasons=["Local NVML initialized and sees 2 GPU(s)."],
         ),
     )