feat: Optimize similarity search with vectorized cosine similarity (#634)#648
feat: Optimize similarity search with vectorized cosine similarity (#634)#648fennhelloworld wants to merge 1 commit into
Conversation
…itesh-1918#634) Replace per-ticket loop in DuplicateService.check_duplicate() with vectorized batched cosine similarity computation. Instead of calling util.cos_sim() individually for each stored embedding (O(n) kernel launches), all stored embeddings are stacked into a single 2D tensor and compared against the query in one matrix operation. Key changes: - Add _embedding_matrix, _ticket_ids, and _embedding_matrix_dirty to DuplicateService for lazy-rebuild caching - Add _rebuild_embedding_matrix() to stack embeddings into 2D tensor - Rewrite check_duplicate() to use vectorized util.cos_sim() with the stacked matrix and torch.argmax() for best-match selection - Mark matrix dirty on add_ticket() for correctness - Add benchmark_similarity.py showing speedup results: n=10: 10x, n=100: 33x, n=500: 196x, n=1000: 394x, n=5000: 421x Closes ritesh-1918#634
|
Someone is attempting to deploy a commit to the ritesh Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughThe PR introduces vectorized cosine similarity computation to duplicate ticket detection. ChangesVectorized Duplicate Detection
Sequence Diagram(s)Not applicable. The changes are a performance optimization refactor within a single service class and a supporting benchmark utility; they do not introduce new multi-component control flows or external interactions that would benefit from visualization beyond the checkpoints already shown in the hidden artifact. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
backend/services/duplicate_service.py (2)
125-125: ⚡ Quick winMake the optional parameter explicit (
float | None).Ruff flags this as an implicit
Optional(RUF013). Line 23 already uses| Nonesyntax, so this is consistent with the file.♻️ Proposed fix
- def check_duplicate(self, text: str, threshold: float = None) -> dict: + def check_duplicate(self, text: str, threshold: float | None = None) -> dict:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/duplicate_service.py` at line 125, The function signature for check_duplicate currently uses the implicit Optional pattern (threshold: float = None); update the type annotation to be explicit by changing it to threshold: float | None = None in the check_duplicate method so it matches the file's use of `| None` and satisfies the RUF013 rule.
96-112: ⚡ Quick winFix potential state desync in
_rebuild_embedding_matrix()by snapshotting_tickets
DuplicateService._rebuild_embedding_matrix()builds_ticket_idsand the stackedembeddingsfrom two separate passes overself._tickets.add_ticket()appends toself._ticketsand sets_embedding_matrix_dirty=True, whilecheck_duplicate()may rebuild the matrix when dirty/stale, so concurrent mutation could desync_ticket_idsvs_embedding_matrix.In
backend/main.py, the call sites forduplicate_service.add_ticket(...)andduplicate_service.check_duplicate(...)are insideasync defroutes, but the service methods are synchronous andtorchops may release the GIL; if the app is running with multiple threads/workers within a process, this race is still plausible. Snapshotting avoids the mismatch without relying on deployment details.- self._ticket_ids = [tid for tid, _, _ in self._tickets] - embeddings = [emb for _, emb, _ in self._tickets] - self._embedding_matrix = torch.stack(embeddings) + tickets = list(self._tickets) # consistent snapshot + self._ticket_ids = [tid for tid, _, _ in tickets] + embeddings = [emb for _, emb, _ in tickets] + self._embedding_matrix = torch.stack(embeddings)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/duplicate_service.py` around lines 96 - 112, _rebuild_embedding_matrix currently iterates over self._tickets twice, which can lead to _ticket_ids vs the stacked _embedding_matrix getting out of sync if self._tickets is mutated concurrently (e.g., between add_ticket and check_duplicate); fix by snapshotting tickets at the start of _rebuild_embedding_matrix (e.g., local_tickets = list(self._tickets)) and then build _ticket_ids and embeddings from that snapshot before calling torch.stack, then set _embedding_matrix and _ticket_ids and clear _embedding_matrix_dirty; this ensures atomic consistency without changing add_ticket or check_duplicate signatures.backend/services/benchmark_similarity.py (1)
26-45: ⚡ Quick winAdd an untimed warm-up before measuring.
The first timed round absorbs one-time allocation/kernel-init overhead, which can skew the reported averages (most visibly at small
n). Since the PR's speedup claims rely on these numbers, a warm-up call makes them more representative.♻️ Proposed fix
def benchmark_loop(query: torch.Tensor, stored: list[torch.Tensor], rounds: int = 5) -> float: """Old approach: iterate and compute cos_sim one at a time.""" + for emb in stored: # warm-up + util.cos_sim(query, emb) times = []def benchmark_vectorized(query: torch.Tensor, matrix: torch.Tensor, rounds: int = 5) -> float: """New approach: single batched cos_sim call.""" query_2d = query.unsqueeze(0) + util.cos_sim(query_2d, matrix) # warm-up times = []🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/benchmark_similarity.py` around lines 26 - 45, Both benchmark_loop and benchmark_vectorized should perform an untimed warm-up call to amortize one-time allocation/kernel-init overhead before starting the timed rounds; update the functions (benchmark_loop and benchmark_vectorized) to run the same computation once (e.g., one pass over stored in benchmark_loop and one util.cos_sim call in benchmark_vectorized) prior to the for _ in range(rounds) timing loop so the measured rounds exclude initialization costs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@backend/services/benchmark_similarity.py`:
- Around line 26-45: Both benchmark_loop and benchmark_vectorized should perform
an untimed warm-up call to amortize one-time allocation/kernel-init overhead
before starting the timed rounds; update the functions (benchmark_loop and
benchmark_vectorized) to run the same computation once (e.g., one pass over
stored in benchmark_loop and one util.cos_sim call in benchmark_vectorized)
prior to the for _ in range(rounds) timing loop so the measured rounds exclude
initialization costs.
In `@backend/services/duplicate_service.py`:
- Line 125: The function signature for check_duplicate currently uses the
implicit Optional pattern (threshold: float = None); update the type annotation
to be explicit by changing it to threshold: float | None = None in the
check_duplicate method so it matches the file's use of `| None` and satisfies
the RUF013 rule.
- Around line 96-112: _rebuild_embedding_matrix currently iterates over
self._tickets twice, which can lead to _ticket_ids vs the stacked
_embedding_matrix getting out of sync if self._tickets is mutated concurrently
(e.g., between add_ticket and check_duplicate); fix by snapshotting tickets at
the start of _rebuild_embedding_matrix (e.g., local_tickets =
list(self._tickets)) and then build _ticket_ids and embeddings from that
snapshot before calling torch.stack, then set _embedding_matrix and _ticket_ids
and clear _embedding_matrix_dirty; this ensures atomic consistency without
changing add_ticket or check_duplicate signatures.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 42974083-3733-43bf-ad65-454075d2fccd
📒 Files selected for processing (2)
backend/services/benchmark_similarity.pybackend/services/duplicate_service.py
Summary
Closes #634 — Optimizes the duplicate detection similarity search by replacing the per-ticket loop with vectorized batched cosine similarity.
Problem
DuplicateService.check_duplicate()previously iterated over every stored ticket embedding and calledutil.cos_sim()individually, resulting in O(n) separate tensor operations and kernel launches. Under load with many cached tickets, this caused significant latency.Solution
All stored embeddings are now stacked into a single 2D tensor (
_embedding_matrix) and compared against the query embedding in one batched matrix operation, thentorch.argmax()identifies the best match.Key changes
backend/services/duplicate_service.pycheck_duplicate(), added_rebuild_embedding_matrix(), lazy matrix cachingbackend/services/benchmark_similarity.pyBenchmark results
Implementation details
_embedding_matrix_dirtyisTrue(afteradd_ticket()), avoiding redundant computation.check_duplicate(),add_ticket(),is_available(),load()) is unchanged — same inputs, same outputs.torchandsentence_transformers.utilalready in the project.How to test
# Run the benchmark python backend/services/benchmark_similarity.pyChecklist
Summary by CodeRabbit
Performance
Chores