
fix(retrieval): rerank scores the full block body, not the 200-char snippet#236

Merged
jrosskopf merged 1 commit into main from fix/rerank-full-passage on Jun 30, 2026

jrosskopf commented Jun 30, 2026


Summary

The escurel-eval SciFact baseline (#235) showed the cross-encoder rerank stage (#215) regressing retrieval quality (nDCG@10 0.846 → 0.671) instead of improving it. Root cause: Indexer::rerank_hits built the reranker candidates from SearchHit.snippet, the hydrated 200-char lead, so on abstract-length passages the cross-encoder judged relevance from ~13% of the text and produced a worse ordering than the bi-encoder, which had embedded the whole document.

rerank_hits now feeds the full block body: a new rerank_passages helper bulk-fetches blocks.body for the candidate set in one block_id IN (…) query and uses that as the passage text, falling back to the snippet for hits with no resolvable block (the sql_view lane). The reranker's own tokenizer truncates to its max length, so per-pair cost stays bounded.
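A minimal sketch of the passage-selection step described above, under stated assumptions: `SearchHit` here is a hypothetical trimmed-down type carrying only a `block_id: Option<i64>` and the `snippet`, the helper names `bulk_body_query` and `choose_passages` are illustrative rather than the crate's API (the real `rerank_passages` also runs the query itself), and the `?` positional placeholders assume a SQLite-style driver. It shows the two ideas in the paragraph: one `IN (…)` statement with a placeholder per distinct block id, and a per-hit fallback to the snippet when no body comes back.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down hit: only the fields this sketch needs.
#[derive(Debug, Clone)]
struct SearchHit {
    /// Backing block, or `None` when the hit has none (e.g. the sql_view lane).
    block_id: Option<i64>,
    /// The hydrated 200-char lead.
    snippet: String,
}

/// Build one parameterised statement covering every distinct block id among
/// the candidates, or return `None` when no hit has a block to fetch.
fn bulk_body_query(hits: &[SearchHit]) -> Option<(String, Vec<i64>)> {
    let mut ids: Vec<i64> = hits.iter().filter_map(|h| h.block_id).collect();
    ids.sort_unstable();
    ids.dedup();
    if ids.is_empty() {
        return None;
    }
    // Assumes `?`-style positional placeholders, one per bound id.
    let placeholders = vec!["?"; ids.len()].join(", ");
    let sql = format!("SELECT block_id, body FROM blocks WHERE block_id IN ({placeholders})");
    Some((sql, ids))
}

/// Passage text per hit, in hit order: the full body when the block resolved,
/// otherwise the snippet.
fn choose_passages(hits: &[SearchHit], bodies: &HashMap<i64, String>) -> Vec<String> {
    hits.iter()
        .map(|h| {
            h.block_id
                .and_then(|id| bodies.get(&id))
                .cloned()
                .unwrap_or_else(|| h.snippet.clone())
        })
        .collect()
}

fn main() {
    let hits = vec![
        SearchHit { block_id: Some(7), snippet: "lead of block 7".into() },
        SearchHit { block_id: None, snippet: "sql_view row".into() },
        SearchHit { block_id: Some(3), snippet: "lead of block 3".into() },
    ];
    if let Some((sql, ids)) = bulk_body_query(&hits) {
        // A real implementation would execute `sql` with `ids` bound here.
        println!("{sql}  -- binds {ids:?}");
    }
    // Pretend the query returned a body for block 7 only.
    let bodies = HashMap::from([(7, "lead of block 7, followed by the rest of the body".to_string())]);
    for passage in choose_passages(&hits, &bodies) {
        println!("{passage}");
    }
}
```

Passing bodies of arbitrary length is acceptable in this design because, as noted above, the reranker's own tokenizer truncates each pair to its max length.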

Test plan

  • cargo test -p escurel-index --test rerank
    • rerank_scores_full_body_not_just_the_snippet (new) — a note whose distinguishing token sits past the 200-char snippet is promoted by a keyword reranker only because the full body is now fed; it also asserts the token is absent from the snippet, guarding the premise.
    • existing reorder + set-preservation (INV-ACL-FUSION) and identity-when-disabled tests unchanged and green.
  • Full local gate: fmt, clippy --workspace --all-targets -D warnings, cargo test --workspace --all-targets, cargo build --workspace --release.
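The idea behind the new test can be reproduced in isolation. The sketch below is a standalone analogue, not the code in tests/rerank.rs: `keyword_score` is a hypothetical toy reranker that counts query tokens present in the passage, and `snippet` takes the first 200 characters to stand in for the hydrated lead. A token placed past character 200 is invisible to the snippet, so only full-body input lets the reranker promote the target.

```rust
use std::cmp::Reverse;

/// Hypothetical toy reranker: score = number of query tokens that occur in
/// the passage (case-insensitive substring match).
fn keyword_score(query: &str, passage: &str) -> usize {
    let passage = passage.to_lowercase();
    query
        .split_whitespace()
        .filter(|tok| passage.contains(&tok.to_lowercase()))
        .count()
}

/// Candidate indices ordered by descending score. `sort_by_key` is stable,
/// so tied candidates keep their incoming (fused) order.
fn rerank(query: &str, passages: &[String]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..passages.len()).collect();
    order.sort_by_key(|&i| Reverse(keyword_score(query, &passages[i])));
    order
}

/// First `n` characters of a body, standing in for the hydrated snippet.
fn snippet(body: &str, n: usize) -> String {
    body.chars().take(n).collect()
}

fn main() {
    let filler = "lorem ipsum ".repeat(30); // 360 chars of noise
    // Candidate 0 is a decoy; candidate 1 has the token after the filler.
    let bodies = vec![format!("decoy {filler}"), format!("{filler}zanzibar")];
    let snippets: Vec<String> = bodies.iter().map(|b| snippet(b, 200)).collect();

    // Premise guard: the distinguishing token is absent from the snippet.
    assert!(!snippets[1].contains("zanzibar"));

    println!("snippet input:   {:?}", rerank("zanzibar", &snippets)); // [0, 1]
    println!("full-body input: {:?}", rerank("zanzibar", &bodies)); // [1, 0]
}
```

With snippet input every score ties, so the stable sort leaves the fused order untouched; with full bodies the target moves to the top. Both orderings contain the same indices, which mirrors the set-preservation check.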

Follow-up

Re-measuring the SciFact baseline with full-passage rerank is a separate, slow run (CPU cross-encoder throughput). The harness verifies the mechanism here; the empirical nDCG re-measure and possibly a lower default rerank_candidates (for latency) come next.

🤖 Generated with Claude Code




The cross-encoder rerank stage (#215) scored `SearchHit.snippet` (the
hydrated 200-char lead), so on anything longer than a sentence it judged
relevance from only part of the passage (~13% of an abstract-length
SciFact passage). The escurel-eval SciFact baseline caught the
consequence: rerank REGRESSED nDCG@10 (0.846 → 0.671) instead of
improving it (`docs/eval/baseline-scifact.md`).

`Indexer::rerank_hits` now bulk-fetches the full `blocks.body` for the
candidate set (one `block_id IN (…)` query) and feeds that to the
reranker, falling back to the snippet for any hit with no resolvable
block (e.g. the sql_view lane). The reranker's tokenizer truncates to its
own max length, so the per-pair cost stays bounded.

Test plan:
  - tests/rerank.rs::rerank_scores_full_body_not_just_the_snippet — a note
    whose distinguishing token sits PAST the 200-char snippet is promoted
    by a keyword reranker only because the full body is now fed; the test
    also asserts the token is absent from the snippet (guards the premise).
  - Existing rerank tests (reorder + set-preservation, identity-when-off)
    unchanged and green.
  - Full local gate green.

Re-measuring the SciFact baseline with full-passage rerank is the
follow-up (CPU cross-encoder throughput makes it a separate, slow run).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jrosskopf merged commit 860f168 into main on Jun 30, 2026
1 check passed
jrosskopf deleted the fix/rerank-full-passage branch on June 30, 2026 at 06:23
jrosskopf added a commit that referenced this pull request Jun 30, 2026
#237)

The #236 fix (rerank scores the full block body) flips the SciFact rerank
result from a regression to an improvement. Re-measured on a 400-doc /
50-query mini split with full passages:

  rerank vs single_pass nDCG@10:  snippet -0.175  ->  full-passage +0.018
  rerank vs single_pass MRR:      snippet -0.19   ->  full-passage +0.031
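For readers comparing the deltas above, here is a sketch of how nDCG@10 and MRR are conventionally computed, assuming binary relevance judgments. The escurel-eval harness may differ in details (graded relevance, cutoffs, tie handling), and the function names are hypothetical.

```rust
use std::collections::HashSet;

/// nDCG@k with binary relevance: DCG sums 1 / log2(rank + 1) over relevant
/// docs in the top k (rank is 1-based); IDCG is the DCG of a perfect ranking.
fn ndcg_at_k(ranked: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    let dcg: f64 = ranked
        .iter()
        .take(k)
        .enumerate()
        .filter(|(_, doc)| relevant.contains(*doc))
        .map(|(i, _)| 1.0 / ((i + 2) as f64).log2())
        .sum();
    let ideal = relevant.len().min(k);
    let idcg: f64 = (0..ideal).map(|i| 1.0 / ((i + 2) as f64).log2()).sum();
    if idcg == 0.0 { 0.0 } else { dcg / idcg }
}

/// Reciprocal rank: 1 / (1-based rank of the first relevant doc), or 0 if none.
fn reciprocal_rank(ranked: &[&str], relevant: &HashSet<&str>) -> f64 {
    ranked
        .iter()
        .position(|doc| relevant.contains(doc))
        .map_or(0.0, |p| 1.0 / (p + 1) as f64)
}

/// MRR: mean reciprocal rank over queries, each given as (ranking, relevant set).
fn mrr(runs: &[(Vec<&str>, HashSet<&str>)]) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    let total: f64 = runs.iter().map(|(r, rel)| reciprocal_rank(r, rel)).sum();
    total / runs.len() as f64
}

fn main() {
    let relevant: HashSet<&str> = HashSet::from(["d1"]);
    let ranked = ["d3", "d1", "d2"];
    println!("nDCG@10 = {:.4}", ndcg_at_k(&ranked, &relevant, 10)); // 0.6309
    println!("RR      = {}", reciprocal_rank(&ranked, &relevant)); // 0.5

    let runs = vec![
        (ranked.to_vec(), relevant.clone()),
        (vec!["d8"], HashSet::from(["d8"])),
    ];
    println!("MRR     = {}", mrr(&runs)); // (0.5 + 1.0) / 2 = 0.75
}
```

Both metrics reward pushing the first relevant document upward, which is why a reranker that only sees the first 200 characters can lower them.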

Quality is fixed (the cross-encoder now improves ranking). Latency is the
remaining blocker: full passages are longer sequences, so rerank takes
~63 s/query on CPU even at 50 candidates; it needs a GPU and a small
candidate pool. Adds the re-measure JSON and an Update section to
baseline-scifact.md.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>