feat: add benchmark artifact tooling by raoabinav · Pull Request #352 · StarTrail-org/LEANN

raoabinav · 2026-06-03T18:06:43Z

Evaluator Summary

Abi added reproducible benchmark artifact tooling for recall, latency, storage, provenance, query-log summaries, and backend comparisons, moving LEANN evaluation from ad hoc scripts toward auditable reports.
The design records input hashes, timing and storage statistics, and provenance, and writes both machine-readable JSON and Markdown summaries so that benchmark runs can be compared across datasets and backends.
The implementation includes shared metrics/provenance/storage helpers and hardens BM25/DiskANN baseline scripts so future benchmark work has a consistent artifact contract.
Quality bar: unit tests cover benchmark generation, backend comparison manifests, query-log summaries, metrics, provenance, and baseline scripts; lint/format and diff checks were run.
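
For readers unfamiliar with the pattern, here is a minimal sketch of what such an artifact contract could look like: hash the inputs, compute recall and latency statistics, then emit the same data as JSON and as a Markdown table. It is illustrative only and is not the code from this PR. Every name in it (`sha256_of_file`, `recall_at_k`, `build_artifact`, `write_artifact`, `report.json`, `report.md`) and the field layout are assumptions.

```python
import hashlib
import json
import statistics
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recall_at_k(retrieved: list, relevant: list, k: int) -> float:
    """Fraction of relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def build_artifact(dataset: Path, backend: str,
                   latencies_ms: list, recalls: list) -> dict:
    """Assemble a hypothetical artifact record (field names are assumptions)."""
    return {
        "backend": backend,
        "inputs": {
            dataset.name: {
                "sha256": sha256_of_file(dataset),
                "bytes": dataset.stat().st_size,
            }
        },
        "metrics": {
            "mean_recall": statistics.fmean(recalls),
            "latency_ms_median": statistics.median(latencies_ms),
            "latency_ms_max": max(latencies_ms),
        },
    }


def write_artifact(artifact: dict, out_dir: Path) -> tuple:
    """Write the record as JSON plus a Markdown table of its metrics."""
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / "report.json"
    # sort_keys keeps the output stable so two runs diff cleanly.
    json_path.write_text(json.dumps(artifact, indent=2, sort_keys=True) + "\n")

    rows = ["| metric | value |", "| --- | --- |"]
    for name, value in sorted(artifact["metrics"].items()):
        rows.append(f"| {name} | {value:.3f} |")
    md_path = out_dir / "report.md"
    md_path.write_text("\n".join(rows) + "\n")
    return json_path, md_path
```

Sorting keys and hashing the inputs are the two choices that make runs comparable: identical inputs produce identical `inputs` entries, so any difference between two reports can be attributed to the backend or configuration rather than to the data.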

raoabinav · 2026-06-04T03:06:23Z

Replaced by #357 from the neutral feature branch.

raoabinav added 4 commits June 3, 2026 11:06

feat: add benchmark artifact tooling

b9c28d4

fix: avoid model downloads in readme ci tests

e67da6c

fix: satisfy ty for benchmark tests

740f062

fix: normalize benchmark provenance paths

228e01f

raoabinav closed this Jun 4, 2026

raoabinav deleted the codex/p0-benchmarks branch June 4, 2026 03:19