feat: add benchmark artifact tooling #352

Closed
raoabinav wants to merge 4 commits into StarTrail-org:main from raoabinav:codex/p0-benchmarks

Conversation

@raoabinav
Contributor

@raoabinav raoabinav commented Jun 3, 2026

Evaluator Summary

  • Abi added reproducible benchmark artifact tooling for recall, latency, storage, provenance, query-log summaries, and backend comparisons, moving LEANN evaluation from ad hoc scripts toward auditable reports.
  • The design records input hashes, timing/storage stats, provenance, and machine-readable JSON plus Markdown summaries so benchmark runs can be compared across datasets and backends.
  • The implementation includes shared metrics/provenance/storage helpers and hardens BM25/DiskANN baseline scripts so future benchmark work has a consistent artifact contract.
  • Quality bar: unit tests cover benchmark generation, backend comparison manifests, query-log summaries, metrics, provenance, and baseline scripts; lint/format and diff checks were run.
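
To make the artifact contract described above concrete, here is a minimal sketch of what one benchmark artifact could look like. It is not the code from this PR: the function names (`sha256_hex`, `recall_at_k`, `build_artifact`, `to_markdown`), the field names, and the `schema_version` key are all hypothetical, chosen only to illustrate recording an input hash, recall/latency/storage metrics, and provenance in JSON with a Markdown summary alongside.

```python
import hashlib
import json
import platform
import statistics
import sys


def sha256_hex(data: bytes) -> str:
    """Hash the benchmark input so two runs can be checked for identical data."""
    return hashlib.sha256(data).hexdigest()


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def build_artifact(dataset_bytes, backend, per_query, latencies_ms, index_bytes, k):
    """Assemble a machine-readable record for one run (hypothetical schema)."""
    recalls = [recall_at_k(got, want, k) for got, want in per_query]
    return {
        "schema_version": 1,  # assumed field, not taken from the PR
        "backend": backend,
        "input_sha256": sha256_hex(dataset_bytes),
        "metrics": {
            f"recall_at_{k}": statistics.mean(recalls),
            "latency_ms_median": statistics.median(latencies_ms),
            "latency_ms_max": max(latencies_ms),
            "index_bytes": index_bytes,
        },
        "provenance": {
            "python": platform.python_version(),
            "platform": sys.platform,
        },
    }


def to_markdown(artifact):
    """Render the metrics as a small Markdown table for human review."""
    rows = ["| metric | value |", "| --- | --- |"]
    for name, value in sorted(artifact["metrics"].items()):
        rows.append(f"| {name} | {value} |")
    return "\n".join(rows)


# Toy inputs: two queries, each a (retrieved ids, relevant ids) pair.
artifact = build_artifact(
    dataset_bytes=b"doc-1\ndoc-2\ndoc-3\n",
    backend="example-backend",
    per_query=[(["a", "b", "c"], ["a", "c"]), (["d", "e", "f"], ["x", "d"])],
    latencies_ms=[4.0, 6.0, 5.0],
    index_bytes=1024,
    k=2,
)
print(json.dumps(artifact, indent=2, sort_keys=True))
print(to_markdown(artifact))
```

Sorting the JSON keys keeps the output stable between runs, so two artifacts can be diffed directly; comparing `input_sha256` first shows whether two runs used the same data at all.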

@raoabinav
Contributor Author

Replaced by #357 from the neutral feature branch.

@raoabinav raoabinav closed this Jun 4, 2026
@raoabinav raoabinav deleted the codex/p0-benchmarks branch June 4, 2026 03:19