docs(eval): record full-passage rerank re-measure (#236 confirmed) by jrosskopf · Pull Request #237 · DataZooDE/escurel

jrosskopf · 2026-06-30T08:00:43Z

Follow-up to #236. The eval harness caught that the rerank stage regressed quality (it scored the 200-char snippet); #236 fixed it to score the full block body. This records the empirical confirmation in docs/eval/baseline-scifact.md.
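
To illustrate why scoring the 200-char snippet can hurt ranking, here is a minimal sketch. It is not the escurel implementation: `Block`, `overlap_score`, and `rerank` are hypothetical names, and a toy term-overlap scorer stands in for the cross-encoder. The only detail taken from this PR is the 200-character snippet length.

```python
from dataclasses import dataclass
from typing import Callable, List

SNIPPET_CHARS = 200  # snippet length mentioned in this PR


@dataclass
class Block:
    """Hypothetical retrieved block; not the project's actual type."""
    doc_id: str
    body: str

    @property
    def snippet(self) -> str:
        # Truncated view of the body, as the pre-#236 rerank stage scored it.
        return self.body[:SNIPPET_CHARS]


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query terms present in text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def rerank(query: str, blocks: List[Block],
           score: Callable[[str, str], float],
           use_full_body: bool) -> List[Block]:
    """Sort blocks by score, reading either the full body or only the snippet."""
    def key(block: Block) -> float:
        return score(query, block.body if use_full_body else block.snippet)
    return sorted(blocks, key=key, reverse=True)


# The relevant evidence sits after the first 200 characters of its block.
filler = "background " * 30  # 330 characters of padding
blocks = [
    Block("relevant", filler + "zinc reduces cold duration"),
    Block("distractor", "zinc is a metal " + filler),
]
query = "zinc reduces cold duration"

by_snippet = rerank(query, blocks, overlap_score, use_full_body=False)
by_body = rerank(query, blocks, overlap_score, use_full_body=True)
print([b.doc_id for b in by_snippet])  # distractor ranks first
print([b.doc_id for b in by_body])     # relevant ranks first
```

With snippet scoring the scorer never sees the evidence in the relevant block, so the distractor wins; with full-body scoring the order flips.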

Re-measured on a 400-doc / 50-query SciFact mini split with full passages:

| metric (rerank vs single_pass) | snippet (orig 1k run) | full-passage (#236) |
|---|---|---|
| ΔnDCG@10 | −0.175 (regression) | +0.018 (improvement) |
| ΔMRR | −0.19 | +0.031 |
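
For readers unfamiliar with the two deltas, the following sketch shows one common way to compute them. It assumes binary relevance judgments and an uncut MRR; the eval harness in this repository may differ in both respects. All function names and the toy data are hypothetical, and the toy numbers are not the measurements reported above.

```python
import math
from typing import Dict, List, Set


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """nDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for i, doc in enumerate(ranked):
        if doc in relevant:
            return 1.0 / (i + 1)
    return 0.0


def mean_metrics(run: Dict[str, List[str]],
                 qrels: Dict[str, Set[str]]) -> Dict[str, float]:
    """Average both metrics over all judged queries."""
    n = len(qrels)
    return {
        "ndcg@10": sum(ndcg_at_k(run.get(q, []), rel) for q, rel in qrels.items()) / n,
        "mrr": sum(reciprocal_rank(run.get(q, []), rel) for q, rel in qrels.items()) / n,
    }


def delta(rerank: Dict[str, float], single_pass: Dict[str, float]) -> Dict[str, float]:
    """Positive values mean rerank improved on single_pass."""
    return {name: rerank[name] - single_pass[name] for name in rerank}


# Toy data only: two queries, one relevant document each.
qrels = {"q1": {"d1"}, "q2": {"d4"}}
single_pass_run = {"q1": ["d2", "d1", "d3"], "q2": ["d4", "d5"]}
rerank_run = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}

deltas = delta(mean_metrics(rerank_run, qrels), mean_metrics(single_pass_run, qrels))
print(deltas)
```

In the toy data, rerank moves the relevant document for `q1` from rank 2 to rank 1, so both deltas come out positive.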

The cross-encoder now improves ranking, as intended. Latency is the remaining blocker: full passages are longer token sequences, so rerank takes ~63 s/query on CPU even at 50 candidates; it needs a GPU and/or a smaller `rerank_candidates` value. Quality is fixed; latency (and hence default-on) is the next lever.
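
As a rough back-of-the-envelope aid for picking `rerank_candidates`, the sketch below converts the measured figure into a per-candidate cost and a candidate cap for a given latency budget. It assumes cross-encoder cost scales linearly with the number of candidates, which this PR does not verify; the function names and the 5-second budget are hypothetical.

```python
import math

# Figures reported in this PR: ~63 s/query on CPU at 50 candidates.
MEASURED_SECONDS_PER_QUERY = 63.0
MEASURED_CANDIDATES = 50


def seconds_per_candidate() -> float:
    """Per-candidate cost, assuming rerank time is linear in candidate count."""
    return MEASURED_SECONDS_PER_QUERY / MEASURED_CANDIDATES


def max_candidates(budget_seconds: float) -> int:
    """Largest candidate count that fits the budget under the linear assumption."""
    if budget_seconds <= 0:
        return 0
    return math.floor(budget_seconds * MEASURED_CANDIDATES / MEASURED_SECONDS_PER_QUERY)


per_candidate = seconds_per_candidate()
cap = max_candidates(5.0)  # hypothetical 5 s/query budget
print(f"{per_candidate:.2f} s per candidate, cap of {cap} candidates")
```

Under that assumption a CPU-only budget of a few seconds leaves room for only a handful of candidates, which is consistent with the PR's point that GPU inference is needed before rerank can be on by default.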

Docs + JSON only.

🤖 Generated with Claude Code


The #236 fix (rerank scores the full block body) flips the SciFact rerank result from a regression to an improvement. Re-measured on a 400-doc / 50-query mini split with full passages:

- rerank vs single_pass nDCG@10: snippet -0.175 -> full-passage +0.018
- rerank vs single_pass MRR: snippet -0.19 -> full-passage +0.031

Quality is fixed (the cross-encoder now improves ranking). Latency is the remaining blocker: full passages are longer sequences, so rerank is ~63 s/query on CPU even at 50 candidates; it needs GPU + a small candidate pool.

Adds the re-measure JSON + an Update section to baseline-scifact.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

jrosskopf merged commit 6f591bd into main Jun 30, 2026
1 check passed

jrosskopf deleted the docs/rerank-remeasure branch June 30, 2026 08:28

jrosskopf merged 1 commit into main from docs/rerank-remeasure
