Tracks scope intentionally deferred from the MVP `/metrics` endpoint shipped in #35 / #211.
The MVP exposes flat counters and gauges with the minimum useful set of labels: build/evaluation counts by status, scheduler queue depth, connected workers, and cache totals. The work below requires either new internal data collection or non-trivial label-dimension design.
Scope
- Per-org / per-cache / per-project label dimensions. Most of the MVP metrics above would be more useful broken down by tenant. Requires:
  - A label allowlist policy so cardinality cannot explode (e.g. opt-in per org, or top-N orgs only).
  - DB query rewrites to `GROUP BY` org/cache/project where applicable.
- Build-duration and evaluation-duration histograms. The source data already exists: `build.build_time_ms` for builds and `(updated_at - created_at)` for evaluations. Histograms via `prometheus::Histogram` with sensible bucket boundaries (e.g. exponential, 1s → 1h).
- HTTP request-duration histogram. Extend the existing `TraceLayer` to record per-route latency keyed by `MatchedPath`, i.e. the route template rather than the raw request path, so the route label stays bounded.
- Worker-side metrics:
  - Peer-to-peer transfer bytes/requests
  - Concurrent build slot utilisation (`assigned_jobs.len()` / `max_concurrent_builds`)
  - Build queue wait times
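- Process / runtime metrics: RSS, fd count, tokio task count. RSS and fd count are most easily obtained via the `prometheus` crate's optional process collector (Linux only). That collector does not report tokio tasks, so the task count would need to come from tokio's own runtime metrics.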
- Per-cache traffic rate (not just totals) and storage growth rate, broken down by cache. The source data is already in `cache_metric`, so this is mainly presentation work, but it raises the same cardinality concerns as the per-tenant labels.
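For the label allowlist policy, one possible shape of the "top-N orgs only" option is sketched below, assuming per-org activity counts are already available as a map. `top_n_allowlist`, `org_label`, `capped_counts` and the `"other"` sentinel are hypothetical names, not existing code. The idea is that folding every org outside the allowlist into a single bucket caps the `org` label at N + 1 distinct values.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sentinel label value shared by every org outside the allowlist.
pub const OTHER_LABEL: &str = "other";

/// Pick the `n` most active orgs as the label allowlist (hypothetical helper).
/// Ties are broken by org name so the result is stable between scrapes.
pub fn top_n_allowlist(activity: &HashMap<String, u64>, n: usize) -> HashSet<String> {
    let mut ranked: Vec<(&String, &u64)> = activity.iter().collect();
    // Descending activity, then ascending name.
    ranked.sort_by(|a, b| b.1.cmp(a.1).then_with(|| a.0.cmp(b.0)));
    ranked.into_iter().take(n).map(|(org, _)| org.clone()).collect()
}

/// Label value actually exported for `org`: itself if allowlisted, else OTHER_LABEL.
pub fn org_label<'a>(org: &'a str, allowlist: &HashSet<String>) -> &'a str {
    if allowlist.contains(org) {
        org
    } else {
        OTHER_LABEL
    }
}

/// Re-key per-org counts through the allowlist, summing all non-allowlisted
/// orgs into the OTHER_LABEL entry.
pub fn capped_counts(
    counts: &HashMap<String, u64>,
    allowlist: &HashSet<String>,
) -> HashMap<String, u64> {
    let mut out = HashMap::new();
    for (org, v) in counts {
        *out.entry(org_label(org, allowlist).to_string()).or_insert(0) += v;
    }
    out
}
```

Recomputing the allowlist on every scrape would make series appear and disappear as orgs cross the threshold; refreshing it on a slower interval, or using the opt-in variant, avoids that churn.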
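For the build- and evaluation-duration histograms, a std-only sketch of the bucket layout and the duration inputs. The `prometheus` crate provides `prometheus::exponential_buckets(start, factor, count)` for the bucket list; the local function below only mirrors its shape so the boundaries can be checked in isolation. Doubling from 1s for 13 buckets gives upper bounds 1s, 2s, 4s, up to 4096s, the first bound past 1h. The helper names, the `i64` type for `build_time_ms`, and treating `created_at`/`updated_at` as Unix-epoch milliseconds are all assumptions for illustration.

```rust
/// Exponential bucket upper bounds: start, start*factor, ... (`count` values).
/// Local stand-in mirroring `prometheus::exponential_buckets`, which returns a Result.
pub fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    assert!(start > 0.0 && factor > 1.0 && count > 0, "invalid bucket parameters");
    let mut buckets = Vec::with_capacity(count);
    let mut bound = start;
    for _ in 0..count {
        buckets.push(bound);
        bound *= factor;
    }
    buckets
}

/// Build duration in seconds from `build.build_time_ms` (assumed to be i64 here).
pub fn build_duration_secs(build_time_ms: i64) -> f64 {
    build_time_ms as f64 / 1000.0
}

/// Evaluation duration in seconds from `updated_at - created_at`, both assumed
/// to be epoch milliseconds. Negative differences (clock skew) clamp to zero.
pub fn eval_duration_secs(created_at_ms: i64, updated_at_ms: i64) -> f64 {
    (updated_at_ms - created_at_ms).max(0) as f64 / 1000.0
}

/// Index of the first bucket whose upper bound is >= the observation, or None
/// for the implicit +Inf bucket (the `le` semantics of Prometheus histograms).
pub fn bucket_index(buckets: &[f64], value_secs: f64) -> Option<usize> {
    buckets.iter().position(|&le| value_secs <= le)
}
```

In the exporter itself, the bucket vector would presumably be passed to `HistogramOpts::buckets` and each duration recorded with `Histogram::observe`.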
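For the worker-side slot utilisation and queue wait times, the arithmetic is small but has two edge cases worth pinning down: a worker with `max_concurrent_builds == 0`, and enqueue/assignment instants that appear out of order. This sketch takes the counts and `Instant`s as plain arguments; `slot_utilisation` and `queue_wait` are hypothetical names, and where the worker would actually capture the two instants is an assumption.

```rust
use std::time::{Duration, Instant};

/// Fraction of build slots in use: assigned_jobs.len() / max_concurrent_builds.
/// Returns 0.0 for a worker with no slots instead of exporting NaN.
pub fn slot_utilisation(assigned_jobs: usize, max_concurrent_builds: usize) -> f64 {
    if max_concurrent_builds == 0 {
        0.0
    } else {
        assigned_jobs as f64 / max_concurrent_builds as f64
    }
}

/// Queue wait: time between a job being enqueued and being assigned a slot.
/// Saturates to zero if the instants are out of order.
pub fn queue_wait(enqueued_at: Instant, assigned_at: Instant) -> Duration {
    assigned_at.saturating_duration_since(enqueued_at)
}
```

Utilisation fits a gauge, while queue wait is naturally another histogram, which could plausibly reuse the exponential bucket layout from the duration histograms.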
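For the per-cache traffic and storage growth rates, a sketch of deriving a per-second rate from two samples of the cumulative data in `cache_metric`. The `Sample` struct and function names are hypothetical, and the row shape of `cache_metric` is assumed. Traffic totals are treated as monotonic counters: a drop is read as a reset, and the new value as the increase since the reset, which is how Prometheus treats counter resets. Storage size is treated as a gauge whose change may be negative.

```rust
/// One reading of a cumulative per-cache counter (e.g. bytes served).
/// Hypothetical shape; real `cache_metric` rows may differ.
#[derive(Clone, Copy, Debug)]
pub struct Sample {
    /// Seconds since some fixed epoch.
    pub at_secs: f64,
    /// Cumulative total at that time.
    pub total: u64,
}

/// Per-second rate of a monotonic counter between two samples.
/// Returns None if the samples are not in chronological order.
pub fn counter_rate(earlier: Sample, later: Sample) -> Option<f64> {
    let dt = later.at_secs - earlier.at_secs;
    if dt <= 0.0 {
        return None;
    }
    // A decrease means the counter was reset; count everything since the reset.
    let increase = if later.total >= earlier.total {
        later.total - earlier.total
    } else {
        later.total
    };
    Some(increase as f64 / dt)
}

/// Per-second change of a gauge such as stored bytes. Signed, since storage can shrink.
pub fn gauge_rate(earlier_bytes: i64, later_bytes: i64, dt_secs: f64) -> Option<f64> {
    if dt_secs <= 0.0 {
        return None;
    }
    Some((later_bytes - earlier_bytes) as f64 / dt_secs)
}
```

An alternative that avoids computing rates server-side is to export the totals as per-cache counters and let Prometheus compute `rate()` at query time; either way, the per-cache label needs the same allowlist treatment as the org label.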
Out of scope for this issue
- A bundled Grafana dashboard (separate issue if desired)
Driven by the conversation in #35.