ci(bench): make p99 latency metrics non-gating in the release gate by amavashev · Pull Request #199 · runcycles/cycles-server

amavashev · 2026-06-18T17:03:20Z

Summary

Makes the p99 latency metrics non-gating in the benchmark gate. They stay measured and reported, but no longer fail the build — p50 latency and concurrent throughput remain the hard gates.

Why

The v0.1.25.34 release gate failed on commit_p99 (+94% vs baseline) while every p50 and throughput metric was within tolerance. On a 200-iteration micro-benchmark over shared GitHub runners, p99 tail latency swings ~2× run-to-run (commit_p99 measured 6.5 → 8.2 → 12.6 ms across three release-gate runs): GC pauses and runner contention dominate the tail, pushing it far beyond the 25% threshold. A same-machine .21→.34 comparison showed only +8% on commit_p99, confirming that the failure is environmental noise, not a code regression. No single-sample baseline can stabilize a metric that varies 2×.
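To make the noise argument concrete, here is a minimal sketch of the arithmetic using the three commit_p99 readings quoted above. The helper name `relative_change` and the variable names are illustrative and are not taken from the benchmark script.

```python
# Three commit_p99 readings (ms) from consecutive release-gate runs, as quoted above.
samples_ms = [6.5, 8.2, 12.6]
THRESHOLD = 0.25  # the 25% gate tolerance


def relative_change(current: float, baseline: float) -> float:
    """Fractional change of `current` relative to `baseline` (hypothetical helper)."""
    return (current - baseline) / baseline


# Ratio between the slowest and fastest run: roughly 1.94, i.e. about 2x.
spread = max(samples_ms) / min(samples_ms)

# Change between each consecutive pair of runs, with no code change in between.
step_changes = [
    relative_change(later, earlier)
    for earlier, later in zip(samples_ms, samples_ms[1:])
]

print(f"spread: {spread:.2f}x")
for change in step_changes:
    print(f"step change: {change:+.1%} (breaches gate: {change > THRESHOLD})")
```

Both consecutive steps exceed the 25% tolerance on their own, so whichever single run is taken as the baseline, a later run of the same code can plausibly trip the gate.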

Change

- `HEADLINE_METRICS` entries gain a third element, `gating`.
- `reserve_p99_ms` / `commit_p99_ms` → `gating=False`: shown in the table (labelled "noisy (non-gating)" when they exceed the threshold) but excluded from the pass/fail decision.
- p50 latency (reserve/commit/release/event) + `concurrent_throughput_32t` stay `gating=True`.
- Applies to both the release gate and the nightly trend check; a footnote in the summary explains the policy.
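The policy can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual gate script: the second tuple element (`higher_is_better`), the p50 key names, the `evaluate` function, and the "no baseline" status are all hypothetical; only the `gating` flag, the two p99 keys, `concurrent_throughput_32t`, the 25% threshold, and the "noisy (non-gating)" label come from this PR.

```python
THRESHOLD = 0.25  # 25% tolerance, per the PR description

# (metric key, higher_is_better, gating)
# The middle element and the p50 key names are assumptions for this sketch.
HEADLINE_METRICS = [
    ("reserve_p50_ms", False, True),
    ("commit_p50_ms", False, True),
    ("release_p50_ms", False, True),
    ("event_p50_ms", False, True),
    ("reserve_p99_ms", False, False),  # measured and reported, never fails the build
    ("commit_p99_ms", False, False),   # measured and reported, never fails the build
    ("concurrent_throughput_32t", True, True),
]


def evaluate(current, baseline, threshold=THRESHOLD):
    """Return (exit_code, rows); only gating metrics can set exit_code to 1."""
    rows = []
    failed = False
    for key, higher_is_better, gating in HEADLINE_METRICS:
        if not baseline.get(key) or key not in current:
            rows.append((key, "no baseline"))  # bootstrap case: nothing to compare
            continue
        change = (current[key] - baseline[key]) / baseline[key]
        # For throughput a drop is the regression; for latency a rise is.
        worse_by = -change if higher_is_better else change
        if worse_by <= threshold:
            status = "ok"
        elif gating:
            status = "REGRESSION"
            failed = True
        else:
            status = "noisy (non-gating)"
        rows.append((key, status))
    return (1 if failed else 0), rows


baseline = {key: 10.0 for key, _, _ in HEADLINE_METRICS}
p99_breach = dict(baseline, commit_p99_ms=19.4)  # +94% on a non-gating metric
p50_regression = dict(baseline, commit_p50_ms=20.0)  # +100% on a gating metric

print(evaluate(p99_breach, baseline)[0])      # 0
print(evaluate(p50_regression, baseline)[0])  # 1
print(evaluate(baseline, {})[0])              # 0
```

Keeping the flag on the metric entry, rather than dropping the p99 metrics from the list, means they still appear in the summary table and can be made gating again by flipping one value.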

Verification

| Case | Result |
| --- | --- |
| p99-only breach (the real failing run: `commit_p99` +94%) | OK, exit 0 |
| real p50 regression (`commit_p50` +100%) | REGRESSION DETECTED, exit 1 |
| bootstrap (empty baseline) | exit 0 |
| trend mode | exit 0, no crash |

CI-tooling only — no production / spec / wire change, no version bump. AUDIT updated.

The release gate failed v0.1.25.34 on commit_p99 (+94% vs baseline) while every p50 and throughput metric was within tolerance. p99 tail latency on a 200-iteration micro-benchmark over shared GitHub runners swings ~2x run-to-run (commit_p99 was 6.5 -> 8.2 -> 12.6 across three runs) from GC pauses / runner contention, far beyond the 25% threshold, and same-machine .21->.34 showed only +8% on commit_p99, so it's noise, not a regression. No single-sample baseline can stabilize a metric that varies 2x.

HEADLINE_METRICS gains a third element, `gating`. reserve_p99_ms and commit_p99_ms are now non-gating: still measured and shown in the summary table (labelled "noisy (non-gating)" when they exceed the threshold) but no longer failing the build. p50 latency (reserve/commit/release/event) and concurrent_throughput_32t stay hard gates. Applies to the release gate and the nightly trend check.

Verified: p99-only breach now passes (exit 0); a real +100% commit_p50 regression still fails (exit 1); bootstrap and trend modes unaffected.

CI-tooling only: no production/spec/wire change, no version bump. AUDIT updated.

amavashev merged commit 4ff7e80 into main Jun 18, 2026
6 checks passed

amavashev deleted the ci/benchmark-gate-noisy-p99 branch June 18, 2026 17:06
