
docs(eval): record full-passage rerank re-measure (#236 confirmed)#237

Merged
jrosskopf merged 1 commit into main from docs/rerank-remeasure
Jun 30, 2026


@jrosskopf jrosskopf commented Jun 30, 2026

Contributor

Follow-up to #236. The eval harness caught that the rerank stage regressed quality (it scored the 200-char snippet); #236 fixed it to score the full block body. This records the empirical confirmation in docs/eval/baseline-scifact.md.

Re-measured on a 400-doc / 50-query SciFact mini split with full passages:

| metric (rerank vs single_pass) | snippet (orig 1k run) | full-passage (#236) |
|---|---|---|
| ΔnDCG@10 | −0.175 (regression) | +0.018 (improvement) |
| ΔMRR | −0.19 | +0.031 |
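For readers less familiar with the two metrics, here is a minimal sketch of how per-query nDCG@10 and MRR are commonly computed under binary relevance, and how a delta between two runs could be averaged. The function names and the `runs`/`qrels` dictionary shapes are assumptions for illustration, not the eval harness's actual API.

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k for a single query."""
    relevant = set(relevant_ids)
    # Discounted gain of each relevant document found in the top k.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant
    )
    # Ideal DCG: all relevant documents packed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def mean_delta(metric, run_a, run_b, qrels):
    """Mean of metric(run_a) - metric(run_b) over all queries in qrels.

    run_a / run_b: {query_id: [doc_id, ...]} (hypothetical shape)
    qrels:         {query_id: [relevant_doc_id, ...]} (hypothetical shape)
    """
    deltas = [
        metric(run_a[q], rel) - metric(run_b[q], rel)
        for q, rel in qrels.items()
    ]
    return sum(deltas) / len(deltas)
```

A positive `mean_delta(ndcg_at_k, rerank_run, single_pass_run, qrels)` would correspond to the "improvement" column above; a negative one to the "regression" column.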

The cross-encoder now improves ranking, as intended. Latency is the remaining blocker: full passages are longer token sequences, so rerank takes ~63 s/query on CPU even at 50 candidates. It needs a GPU and/or a smaller rerank_candidates value. Quality is fixed; latency, and with it turning rerank on by default, is the next lever.
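The quality/latency trade-off described above can be sketched as follows. This is a hypothetical illustration, not the project's rerank stage: the candidate dictionaries with `id`, `snippet` and `body` keys, the `score_fn` callable standing in for the cross-encoder, and the function name are all assumptions. It shows the two ideas in play: score the full body rather than the snippet, and cap how many candidates reach the expensive scorer.

```python
from typing import Callable, Dict, List

def rerank_top_candidates(
    query: str,
    candidates: List[Dict[str, str]],
    score_fn: Callable[[str, str], float],
    rerank_candidates: int = 50,
) -> List[Dict[str, str]]:
    """Rerank only the first `rerank_candidates` items; keep the rest in order."""
    head = candidates[:rerank_candidates]
    tail = candidates[rerank_candidates:]
    # Score the full passage body, not the truncated snippet.
    # score_fn is called once per candidate, so the cost grows with both
    # the cap and the passage length.
    reranked = sorted(head, key=lambda c: score_fn(query, c["body"]), reverse=True)
    return reranked + tail

def word_overlap(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: count of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))

docs = [
    {"id": "d1", "snippet": "cats", "body": "cats sleep"},
    {"id": "d2", "snippet": "dogs", "body": "dogs bark"},
    {"id": "d3", "snippet": "dogs", "body": "dogs bark loudly"},
]
result = rerank_top_candidates("dogs bark loudly", docs, word_overlap, rerank_candidates=2)
print([d["id"] for d in result])
```

With the cap at 2, only `d1` and `d2` are rescored and `d3` keeps its original position, which is the kind of cost control a smaller candidate pool buys at the expense of recall further down the list.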

Docs + JSON only.

🤖 Generated with Claude Code



The #236 fix (rerank scores the full block body) flips the SciFact rerank
result from a regression to an improvement. Re-measured on a 400-doc /
50-query mini split with full passages:

  rerank vs single_pass nDCG@10:  snippet -0.175  ->  full-passage +0.018
  rerank vs single_pass MRR:      snippet -0.19   ->  full-passage +0.031

Quality is fixed (the cross-encoder now improves ranking). Latency is the
remaining blocker — full passages are longer sequences, so rerank is ~63
s/query on CPU even at 50 candidates; it needs GPU + a small candidate
pool. Adds the re-measure JSON + an Update section to baseline-scifact.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jrosskopf jrosskopf merged commit 6f591bd into main Jun 30, 2026
1 check passed
@jrosskopf jrosskopf deleted the docs/rerank-remeasure branch June 30, 2026 08:28