fix(retrieval): rerank scores the full block body, not the 200-char snippet by jrosskopf · Pull Request #236 · DataZooDE/escurel

jrosskopf · 2026-06-30T05:57:43Z

Summary

The escurel-eval SciFact baseline (#235) showed the cross-encoder rerank stage (#215) regressing retrieval quality — nDCG@10 0.846 → 0.671 — instead of improving it. Root cause: Indexer::rerank_hits built the reranker candidates from SearchHit.snippet, the hydrated 200-char lead, so on abstract-length passages the cross-encoder judged relevance from ~13% of the text and reordered worse than the bi-encoder that embedded the whole doc.

rerank_hits now feeds the full block body: a new rerank_passages helper bulk-fetches blocks.body for the candidate set in one block_id IN (…) query and uses that as the passage text, falling back to the snippet for hits with no resolvable block (the sql_view lane). The reranker's own tokenizer truncates to its max length, so per-pair cost stays bounded.
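The passage selection described above can be sketched as follows. This is a minimal illustration, not the actual `rerank_passages` code: the `Hit` struct, `passages_for`, and `candidate_block_ids` are hypothetical names, and the `bodies` map stands in for the result of the single bulk `block_id IN (…)` query.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a search hit; only the fields this sketch needs.
#[derive(Debug, Clone)]
struct Hit {
    /// `None` models a hit with no resolvable block (e.g. the sql_view lane).
    block_id: Option<u64>,
    /// The hydrated 200-char lead.
    snippet: String,
}

/// Collect the distinct block ids to request in one bulk query,
/// preserving the order in which they first appear.
fn candidate_block_ids(hits: &[Hit]) -> Vec<u64> {
    let mut seen = HashSet::new();
    hits.iter()
        .filter_map(|hit| hit.block_id)
        .filter(|id| seen.insert(*id))
        .collect()
}

/// Choose the passage text for each candidate: the full body when one was
/// fetched for the hit's block, otherwise the snippet as a fallback.
/// `bodies` is assumed to hold the rows returned by the bulk fetch.
fn passages_for(hits: &[Hit], bodies: &HashMap<u64, String>) -> Vec<String> {
    hits.iter()
        .map(|hit| {
            hit.block_id
                .and_then(|id| bodies.get(&id))
                .cloned()
                .unwrap_or_else(|| hit.snippet.clone())
        })
        .collect()
}
```

The output has one passage per hit in the original hit order, so a reranker can score the pairs without changing the candidate set.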

Test plan

cargo test -p escurel-index --test rerank
- rerank_scores_full_body_not_just_the_snippet (new) — a note whose distinguishing token sits past the 200-char snippet is promoted by a keyword reranker only because the full body is now fed; it also asserts the token is absent from the snippet, guarding the premise.
- existing reorder + set-preservation (INV-ACL-FUSION) and identity-when-disabled tests unchanged and green.
Full local gate: fmt, clippy --workspace --all-targets -D warnings, cargo test --workspace --all-targets, cargo build --workspace --release.
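
The premise of the new test can be illustrated with a toy keyword reranker. This is a sketch under assumptions, not the code in `tests/rerank.rs`: `snippet_of`, `keyword_score`, and `rerank` are hypothetical helpers, and the snippet is modelled as the first 200 characters of the body.

```rust
/// Snippet length used by this sketch; the PR describes a 200-char lead.
const SNIPPET_CHARS: usize = 200;

/// First `SNIPPET_CHARS` characters of a body (char-based, so UTF-8 safe).
fn snippet_of(body: &str) -> String {
    body.chars().take(SNIPPET_CHARS).collect()
}

/// Toy keyword reranker: the score is the number of occurrences of `token`.
fn keyword_score(passage: &str, token: &str) -> usize {
    passage.matches(token).count()
}

/// Return candidate indices ordered by descending score. The sort is stable,
/// so ties keep their incoming order and the candidate set is only permuted.
fn rerank(passages: &[String], token: &str) -> Vec<usize> {
    let mut order: Vec<usize> = (0..passages.len()).collect();
    order.sort_by_key(|&i| std::cmp::Reverse(keyword_score(&passages[i], token)));
    order
}
```

With a note whose distinguishing token sits after character 200, reranking the snippets leaves the order unchanged, while reranking the full bodies promotes that note. Asserting that the token is absent from the snippet keeps the test from passing for the wrong reason.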

Follow-up

Re-measuring the SciFact baseline with full-passage rerank is a separate, slow run (CPU cross-encoder throughput). The harness verifies the mechanism here; the empirical nDCG re-measure + a possible lower default rerank_candidates for latency come next.

🤖 Generated with Claude Code


…nippet

The cross-encoder rerank stage (#215) scored `SearchHit.snippet` (the hydrated 200-char lead), so on anything longer than a sentence it judged relevance from ~13% of the passage. The escurel-eval SciFact baseline caught the consequence: rerank REGRESSED nDCG@10 (0.846 → 0.671) instead of improving it (`docs/eval/baseline-scifact.md`).

`Indexer::rerank_hits` now bulk-fetches the full `blocks.body` for the candidate set (one `block_id IN (…)` query) and feeds that to the reranker, falling back to the snippet for any hit with no resolvable block (e.g. the sql_view lane). The reranker's tokenizer truncates to its own max length, so the per-pair cost stays bounded.

Test plan:
- tests/rerank.rs::rerank_scores_full_body_not_just_the_snippet: a note whose distinguishing token sits PAST the 200-char snippet is promoted by a keyword reranker only because the full body is now fed; the test also asserts the token is absent from the snippet (guards the premise).
- Existing rerank tests (reorder + set-preservation, identity-when-off) unchanged and green.
- Full local gate green.

Re-measuring the SciFact baseline with full-passage rerank is the follow-up (CPU cross-encoder throughput makes it a separate, slow run).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

#237)

The #236 fix (rerank scores the full block body) flips the SciFact rerank result from a regression to an improvement. Re-measured on a 400-doc / 50-query mini split with full passages:

  rerank vs single_pass nDCG@10: snippet -0.175 -> full-passage +0.018
  rerank vs single_pass MRR:     snippet -0.19  -> full-passage +0.031

Quality is fixed (the cross-encoder now improves ranking). Latency is the remaining blocker: full passages are longer sequences, so rerank is ~63 s/query on CPU even at 50 candidates; it needs GPU + a small candidate pool.

Adds the re-measure JSON + an Update section to baseline-scifact.md.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

jrosskopf merged commit 860f168 into main Jun 30, 2026
1 check passed

jrosskopf deleted the fix/rerank-full-passage branch June 30, 2026 06:23

jrosskopf mentioned this pull request Jun 30, 2026

docs(eval): record full-passage rerank re-measure (#236 confirmed) #237

Merged

jrosskopf merged 1 commit into main from fix/rerank-full-passage
