TFIDF slop calculator implementation #5
Signed-off-by: Miles Song <bodasong@amazon.com>
No actionable comments were generated in the recent review. 🎉
Run configuration: Repository UI, review profile CHILL, plan Pro Plus; 5 files selected for processing.
Walkthrough: This PR introduces … Changes: Slop Calculator Scoring Feature.
Estimated code review effort: 3 (Moderate), ~20 minutes.
Pre-merge checks: 4 passed, 1 failed (warning).
Warning: the review ran into problems. Git: failed to clone repository.
// (single-char tokens are dropped, so only the words below have positions)
// apple=0 red=1 blue=2 banana=3 yellow=4 green=5 grape=6 purple=7
// orange=8 cherry=9 pink=10 violet=11 one=12 two=13 three=14 four=15
// five=16 six=17
I feel like we should list the last few positions too, if we are going to list them at all.
// rounding of std::sqrt under -ffast-math: seed from a double then correct.
// Comparisons use division rather than squaring so they cannot overflow even
// when n is near UINT64_MAX (where (x+1)*(x+1) would wrap).
uint32_t IntSqrt(uint64_t n) {
Should we add a test for IntSqrt? It seems like something we had to update. The code looks correct, but a test covering the overflow case would be a good sanity check.
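A sketch of what such a test could exercise, assuming IntSqrt(n) is meant to return floor(sqrt(n)) for every uint64_t n. Only the signature and the approach (seed from a double, then correct using division instead of squaring) come from the diff above; the body below is a hypothetical reconstruction, not the PR's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical reconstruction: returns floor(sqrt(n)) for any uint64_t n.
// The double seed may be off by a unit or so (e.g. under -ffast-math), so it
// is corrected afterwards using only division, which cannot overflow.
uint32_t IntSqrt(uint64_t n) {
  if (n < 2) return static_cast<uint32_t>(n);
  uint64_t x = static_cast<uint64_t>(std::sqrt(static_cast<double>(n)));
  if (x == 0) x = 1;  // guard against a bad seed before dividing by x
  // x*x > n  <=>  x > n / x   (integer division, no multiplication)
  while (x > n / x) --x;
  // (x+1)*(x+1) <= n  <=>  x+1 <= n / (x+1); x+1 <= 2^32, so no wrap
  while (x + 1 <= n / (x + 1)) ++x;
  return static_cast<uint32_t>(x);
}
```

The boundary cases worth asserting are perfect squares, their predecessors, and UINT64_MAX itself, whose floor square root is 4294967295 (2^32 - 1).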
Add SlopCalculator for TFIDF scoring
Adds SlopCalculator (src/indexes/scoring/slop_calculator.{h,cc}), a standalone, dependency-free component that computes slop, the proximity penalty that divides the TFIDF numerator (score = words_TFIDF * doc_score / norm / slop). This is the math-only slice; wiring it into query execution is a follow-up.
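To make the role of slop concrete, a minimal sketch of the final division, assuming all factors are doubles and slop is an integer of at least 1. The function name ApplySlop and its parameter types are hypothetical; only the formula comes from the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: score = words_TFIDF * doc_score / norm / slop.
// Because the calculator guarantees slop >= 1, the last division never
// divides by zero and can only keep or shrink the score.
double ApplySlop(double words_tfidf, double doc_score, double norm,
                 uint32_t slop) {
  return words_tfidf * doc_score / norm / static_cast<double>(slop);
}
```

With words_tfidf = 2, doc_score = 3 and norm = 1.5 the numerator is 4; a slop of 1 leaves it at 4, while a slop of 4 cuts it to 1, so documents where the query terms sit closer together score higher.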
Algorithm
slop = max(1, floor(sqrt(Σ MinGap(node[i], node[i+1])²)))
over consecutive outermost-level query nodes. A nested group collapses to the union of its leaf positions (one anchor); MinGap is a two-pointer closest-pair scan. The min-1 guard applies only to the final result.
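The algorithm above can be sketched as follows. This is an illustrative reconstruction, not the PR's SlopCalculator: the Positions alias, the Collapse/MinGap/Slop helpers, and the assumption that each node's positions arrive sorted, deduplicated and non-empty are all hypothetical. For brevity it seeds the square root from std::sqrt and corrects it by squaring, which is only safe for sums well below UINT64_MAX (the overflow-safe IntSqrt in the PR avoids that).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <vector>

using Positions = std::vector<uint32_t>;  // assumed sorted ascending, unique

// A nested group collapses to the union of its leaf positions (one anchor).
Positions Collapse(const std::vector<Positions>& leaves) {
  Positions out;
  for (const Positions& leaf : leaves) {
    Positions merged;
    std::set_union(out.begin(), out.end(), leaf.begin(), leaf.end(),
                   std::back_inserter(merged));
    out.swap(merged);
  }
  return out;
}

// Two-pointer closest-pair scan: smallest |a[i] - b[j]| over two sorted lists.
uint64_t MinGap(const Positions& a, const Positions& b) {
  assert(!a.empty() && !b.empty());  // assumed precondition
  uint64_t best = UINT64_MAX;
  size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    uint64_t gap = a[i] > b[j] ? a[i] - b[j] : b[j] - a[i];
    best = std::min(best, gap);
    // Advance the pointer at the smaller value: pairing it with later,
    // larger values on the other side can only widen the gap.
    if (a[i] < b[j]) ++i; else ++j;
  }
  return best;
}

// slop = max(1, floor(sqrt(sum of squared MinGaps of consecutive nodes))).
uint64_t Slop(const std::vector<Positions>& nodes) {
  uint64_t sum = 0;
  for (size_t k = 0; k + 1 < nodes.size(); ++k) {
    uint64_t g = MinGap(nodes[k], nodes[k + 1]);  // g < 2^32, so g*g fits
    sum += g * g;
  }
  uint64_t r = static_cast<uint64_t>(std::sqrt(static_cast<double>(sum)));
  while (r * r > sum) --r;              // correct a seed that is too high
  while ((r + 1) * (r + 1) <= sum) ++r;  // correct a seed that is too low
  return std::max<uint64_t>(1, r);      // min-1 guard on the final result only
}
```

Using the positions from the test comment, red=1 and banana=3 give a single gap of 2, so slop = floor(sqrt(4)) = 2. A repeated term such as red red yields a gap of 0, and the guard turns the resulting 0 into 1.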
Behavior
red red): no special case → gap 0 → guard → slop 1.
Tests
testing/scoring/slop_calculator_test.cc: 11 algorithm cases + 2 death tests, with hand-computed golden values. All pass.
Build
New scoring lib + scoring_test target. Run: ./build.sh --run-tests=scoring_test.
Summary by CodeRabbit
New Features
Tests