feat(inference): ADR-064 Phase-0 decode slope measurement harness by ohdearquant · Pull Request #184 · ohdearquant/lattice

ohdearquant · 2026-05-31T17:54:16Z

Summary

Add scripts/bench_decode_slope.sh — runs decode at multiple context lengths (64-1024), fits linear model per_tok_ms = slope*ctx + intercept, outputs JSON with slope_ms, intercept_ms, r_squared, tok_per_sec_64
Add make bench-decode Makefile target for convenient invocation
Add debug-only runtime assertion in MetalKvCache::new verifying buffer sizes match expected f32 layout (gates the f16 migration in #154, "feat(inference): wire f16 KV cache into default Metal decode path")

Implements #168 (decode slope harness) and #170 (KV layout assertion).

Bench output (validated on M2 Max)

{"slope_ms":0.003771,"intercept_ms":6.1545,"r_squared":0.918216,"tok_per_sec_64":156.4,"n_points":3,"contexts":[64,128,256],"per_tok_ms":[6.4791,6.5125,7.1616]}
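The fit described in the summary can be sketched as an ordinary least-squares line through the (context, per_tok_ms) points. This is a minimal Python sketch, not the script's actual awk implementation; the function name `fit_decode_slope` is hypothetical. Fed the `contexts` and `per_tok_ms` arrays from the JSON above, it approximately reproduces the reported slope, intercept, and R² (small differences come from the inputs being rounded to four decimals).

```python
def fit_decode_slope(contexts, per_tok_ms):
    """Least-squares fit of per_tok_ms = slope * ctx + intercept.

    Hypothetical helper for illustration; the real harness does this in awk.
    """
    n = len(contexts)
    if n < 2 or n != len(per_tok_ms):
        raise ValueError("need at least 2 (context, per_tok_ms) pairs")

    mean_x = sum(contexts) / n
    mean_y = sum(per_tok_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in contexts)
    syy = sum((y - mean_y) ** 2 for y in per_tok_ms)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(contexts, per_tok_ms))

    # Guard the zero denominator (all contexts identical), as the review asked.
    if sxx == 0:
        raise ValueError("all contexts are identical; slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return {"slope_ms": slope, "intercept_ms": intercept, "r_squared": r_squared}


# Values copied from the bench output above.
fit = fit_decode_slope([64, 128, 256], [6.4791, 6.5125, 7.1616])

# Assumption: throughput at ctx=64 taken from the fitted line, not the raw sample.
tok_per_sec_64 = 1000.0 / (fit["slope_ms"] * 64 + fit["intercept_ms"])
print(fit, tok_per_sec_64)
```

Evaluating the fitted line at ctx = 64 lands close to the reported `tok_per_sec_64`, which suggests, but does not confirm, that the field is derived from the fit and not from the raw 64-token sample.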

Test plan

RUNS=3 CONTEXTS="64 256" make bench-decode produces valid JSON on stdout
cargo build --release --features "metal-gpu,f16" -p lattice-inference succeeds
Debug assertion passes on current f32 KV layout
Verify JSON is parseable by downstream tooling (ADR-064 CI integration)

🤖 Generated with Claude Code

Add `scripts/bench_decode_slope.sh` that runs decode at multiple context lengths (64-1024), fits a linear model (per_tok_ms = slope*ctx + intercept), and outputs JSON with slope_ms, intercept_ms, r_squared, tok_per_sec_64.

Add `make bench-decode` Makefile target for convenient invocation.

Add debug-only runtime assertion in MetalKvCache::new verifying buffer sizes match expected f32 layout (will need updating when #154 migrates to f16).

Implements #168 (decode slope harness) and #170 (KV layout assertion).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The ENVIRON[] lookup requires variables to be in the environment, not positional shell args. Prefix the awk invocation with the variable assignments so they appear in ENVIRON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The model may hit EOS before generating the requested token count (Qwen3.5-0.8B caps at ~346 tokens). The script previously used the requested count as the denominator, giving incorrect per-token times. It now extracts the actual completion count from the bench_decode_ab RESULT output. Also fixed macOS awk compatibility (no match() with capture groups).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
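To see why the denominator matters, here is a small Python sketch of the arithmetic. The helper name and the elapsed time are hypothetical; only the ~346-token cap comes from the commit message, and the RESULT line format is not reproduced here.

```python
def per_token_ms(elapsed_ms, completed_tokens):
    """Per-token decode time based on tokens actually generated (hypothetical helper)."""
    if completed_tokens <= 0:
        raise ValueError("no tokens were generated; per-token time is undefined")
    return elapsed_ms / completed_tokens


requested = 1024        # tokens asked for
completed = 346         # tokens produced before EOS (cap cited in the commit)
elapsed_ms = 2249.0     # hypothetical wall time for the decode loop

wrong = elapsed_ms / requested                 # old behaviour: divides by the request
right = per_token_ms(elapsed_ms, completed)    # fixed behaviour: divides by actual count
print(f"requested-count estimate: {wrong:.3f} ms/tok, actual-count: {right:.3f} ms/tok")
```

Dividing by the requested count understates per-token time whenever the model stops early, and the error grows with the requested length, which would also distort the fitted slope.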

Fixes from PR #184 review (round 1):

1. [Major] Input validation: reject contexts <= N1, reject duplicates, validate RUNS is positive integer, guard bc output, guard awk denom=0. Script now exits 1 with JSON error for all invalid inputs.
2. [Medium] KV assertion safety: skip when num_full_layers == 0, verify ALL K and V buffers (not just k_bufs[0]).
3. Use `set -euo pipefail` (was `set -uo pipefail`).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
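The KV assertion logic from item 2 can be illustrated with a short Python sketch. The real check lives in Rust inside MetalKvCache::new; everything here except the names `num_full_layers` and `k_bufs` is an assumption, including the per-layer size formula (`max_seq_len * num_kv_heads * head_dim * 4` bytes for f32), which the PR does not spell out.

```python
F32_BYTES = 4  # size of one f32 element


def expected_kv_bytes(max_seq_len, num_kv_heads, head_dim, elem_bytes=F32_BYTES):
    """Assumed per-layer buffer size; the actual layout formula is not given in the PR."""
    return max_seq_len * num_kv_heads * head_dim * elem_bytes


def check_kv_layout(k_buf_sizes, v_buf_sizes, num_full_layers, expected_bytes):
    """Verify every K and V buffer, skipping models with no full-attention layers."""
    if num_full_layers == 0:
        return  # nothing allocated, nothing to check
    assert len(k_buf_sizes) == len(v_buf_sizes) == num_full_layers
    for layer, (k, v) in enumerate(zip(k_buf_sizes, v_buf_sizes)):
        assert k == expected_bytes, f"K buffer {layer}: {k} != {expected_bytes}"
        assert v == expected_bytes, f"V buffer {layer}: {v} != {expected_bytes}"


# Hypothetical dimensions for illustration only.
want = expected_kv_bytes(max_seq_len=1024, num_kv_heads=2, head_dim=128)
check_kv_layout([want] * 6, [want] * 6, num_full_layers=6, expected_bytes=want)
check_kv_layout([], [], num_full_layers=0, expected_bytes=want)  # skipped
```

Python's `assert` is stripped under `python -O`, loosely mirroring the debug-only nature of the Rust check. A half-sized (f16) buffer would trip the assertion, which is the point of gating the #154 migration on it.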

ohdearquant · 2026-05-31T18:04:09Z

Codex Review — Round 1

Verdict: REQUEST CHANGES → Fixed (all findings addressed in commit 5085d21)

Findings Addressed

[Major] Input validation — CONTEXTS="64 64" and CONTEXTS="8 64" now properly fail with JSON error and exit 1. Added: set -e, context validation (>N1, no dupes), bc output guard, awk denom==0 guard.
[Medium] KV assertion safety — Now skips when num_full_layers == 0 and verifies ALL K/V buffer pairs (not just k_bufs[0]).
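The context validation rules in the first finding can be sketched in Python as follows. This is not the shell implementation; the function name is hypothetical, and the lower bound of 8 is inferred from the error text shown in the verification transcript.

```python
def validate_contexts(raw, n1=8):
    """Return (valid_contexts, warnings, error); error is None on success.

    Hypothetical helper mirroring the rules described above: each context must
    be an integer greater than n1, duplicates are skipped, and at least two
    valid contexts are required for a line fit.
    """
    valid, warnings = [], []
    for tok in raw.split():
        if not (tok.isascii() and tok.isdigit()) or int(tok) <= n1:
            warnings.append(f"skipping invalid context {tok} (must be integer > {n1})")
            continue
        ctx = int(tok)
        if ctx in valid:
            warnings.append(f"skipping duplicate context {ctx}")
            continue
        valid.append(ctx)

    if len(valid) < 2:
        error = {"error": f"need at least 2 valid contexts (integers > {n1})"}
        return None, warnings, error
    return valid, warnings, None


for raw in ("64 64", "8 64", "64 256"):
    contexts, warnings, error = validate_contexts(raw)
    print(repr(raw), contexts, warnings, error)
```

A caller would print the warnings to stderr, emit the error object as JSON on stdout, and exit with status 1 when `error` is not `None`.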

Verification

$ CONTEXTS="64 64" ./scripts/bench_decode_slope.sh
WARNING: skipping duplicate context 64
{"error":"need at least 2 valid contexts (integers > 8)"}
# exit=1 ✓

$ CONTEXTS="8 64" ./scripts/bench_decode_slope.sh  
WARNING: skipping invalid context 8 (must be integer > 8)
{"error":"need at least 2 valid contexts (integers > 8)"}
# exit=1 ✓

$ RUNS=2 CONTEXTS="64 256" ./scripts/bench_decode_slope.sh
{"slope_ms":0.003179,"intercept_ms":6.0260,...,"tok_per_sec_64":160.5}
# exit=0 ✓

Ready for re-review.

ohdearquant and others added 4 commits May 31, 2026 13:46

ohdearquant merged commit b75df67 into main May 31, 2026
3 checks passed

ohdearquant merged 4 commits into main from show/perf-discover/measure-substrate
