docs: LoCoMo retrieval algorithm comparison findings #49
Open
rajkripal wants to merge 1 commit into
Adds `papers/locomo-run/retrieval-algorithm-comparison.md` with results from four retrieval configurations tested on conv-26 (n=199), plus cross-validation on conv-30 and conv-41. Uniform scoring shows a small gain on conv-26 (+0.014 F1) but collapses on conv-41 (-0.181 F1 vs qe-gte). Recommendation: keep vector+recency blend as default; do not ship uniform scoring. Also removes `tests/test_retrieval_uniform.py`, which was an untracked orphan on main. It imports `_entity_match_score` from `core.retrieval`, a symbol that does not exist on main (the uniform feature was never merged). The test fails on collection with an ImportError.
Adds `papers/locomo-run/retrieval-algorithm-comparison.md` with results from four retrieval configurations tested on conv-26 (n=199), plus cross-validation on conv-30 and conv-41.

Uniform scoring shows a small gain on conv-26 (+0.014 F1) but collapses on conv-41 (-0.181 F1 vs qe-gte). Recommendation: keep vector+recency blend as default; do not ship uniform scoring.

Also removes `tests/test_retrieval_uniform.py`. It was untracked on main (the uniform feature was never merged) and imports `_entity_match_score` from `core.retrieval`, a symbol that does not exist on main. The test fails on collection with an ImportError. Deleting the orphan keeps `pytest` clean without needing ignores.
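For reviewers who want to see why the orphaned test fails at collection rather than at run time, here is a minimal sketch. It does not touch the real `core.retrieval`; instead it registers a hypothetical stand-in module (`fake_core_retrieval`, with a made-up `retrieve` function) that lacks `_entity_match_score`, and shows that a module-level `from ... import` of a missing name raises `ImportError` as soon as the file is imported. The helper `has_symbol` is likewise an illustrative name, not something in the repository.

```python
import importlib
import sys
import types


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if the module imports cleanly and exposes the given name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, symbol)


# Hypothetical stand-in for core.retrieval as it exists on main:
# it has some retrieval function but no uniform-scoring helper.
fake = types.ModuleType("fake_core_retrieval")
fake.retrieve = lambda query: []
sys.modules["fake_core_retrieval"] = fake

print(has_symbol("fake_core_retrieval", "retrieve"))
print(has_symbol("fake_core_retrieval", "_entity_match_score"))

# A test file with this import at module level fails as soon as it is
# imported, which is what pytest does while collecting tests.
try:
    from fake_core_retrieval import _entity_match_score  # noqa: F401
except ImportError as exc:
    print(type(exc).__name__)
```

Because the failure happens at import time, no test function in the file ever runs, so removing the file (rather than skipping individual tests) is the simplest way to keep collection clean.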