feat: Optimize similarity search with vectorized cosine similarity (#634) by fennhelloworld · Pull Request #648 · ritesh-1918/HELPDESK.AI

fennhelloworld · 2026-05-29T19:29:26Z

Summary

Closes #634 — Optimizes the duplicate detection similarity search by replacing the per-ticket loop with vectorized batched cosine similarity.

Problem

DuplicateService.check_duplicate() previously iterated over every stored ticket embedding and called util.cos_sim() individually, resulting in O(n) separate tensor operations and kernel launches. Under load with many cached tickets, this caused significant latency.

Solution

All stored embeddings are now stacked into a single 2D tensor (_embedding_matrix) and compared against the query embedding in one batched matrix operation, then torch.argmax() identifies the best match.
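The batched comparison described above can be sketched as follows. This is only an illustration of the idea, written in NumPy as a stand-in for the PR's torch and sentence_transformers code; the helper name `best_match` and the sample vectors are hypothetical, and the sketch assumes no embedding is the zero vector.

```python
import numpy as np


def best_match(query: np.ndarray, matrix: np.ndarray) -> tuple[int, float]:
    """Return (row index, cosine similarity) of the stored row closest to query.

    query has shape (d,), matrix has shape (n, d).
    Hypothetical helper for illustration, not the PR's code.
    """
    # Normalize once so a single matrix-vector product yields all n cosine scores.
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q  # shape (n,), one score per stored embedding
    idx = int(np.argmax(scores))
    return idx, float(scores[idx])


# Three made-up 2D "embeddings"; the query points almost along the third one.
stored = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, score = best_match(np.array([2.0, 2.1]), stored)
```

The point of the design is that the per-row work moves from a Python loop into one array operation, so the cost per call is dominated by a single product rather than n separate calls.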

Key changes

File	Change
`backend/services/duplicate_service.py`	Vectorized `check_duplicate()`, added `_rebuild_embedding_matrix()`, lazy matrix caching
`backend/services/benchmark_similarity.py`	New benchmark script comparing loop vs vectorized performance

Benchmark results

Tickets	Loop (ms)	Vectorized (ms)	Speedup
10	0.70	0.07	10x
100	2.90	0.09	33x
500	14.43	0.07	196x
1,000	29.52	0.07	394x
5,000	144.16	0.34	421x

Implementation details

Lazy rebuild: The embedding matrix is only rebuilt when _embedding_matrix_dirty is True (after add_ticket()), avoiding redundant computation.
Backward compatible: The public API (check_duplicate(), add_ticket(), is_available(), load()) is unchanged — same inputs, same outputs.
No new dependencies: Uses existing torch and sentence_transformers.util already in the project.
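The lazy-rebuild behaviour can be illustrated with a small dirty-flag cache. This is a simplified sketch under stated assumptions, not the actual `DuplicateService`: the class name `LazyMatrixCache`, the `rebuild_count` counter, and the use of plain lists instead of a stacked tensor are all hypothetical, and tickets are reduced to `(id, embedding)` pairs.

```python
class LazyMatrixCache:
    """Hypothetical stand-in for the caching part of DuplicateService."""

    def __init__(self) -> None:
        self._tickets: list[tuple[str, list[float]]] = []
        self._ticket_ids: list[str] = []
        self._matrix: list[list[float]] = []
        self._dirty = False
        self.rebuild_count = 0  # only here to make the laziness observable

    def add_ticket(self, ticket_id: str, embedding: list[float]) -> None:
        self._tickets.append((ticket_id, embedding))
        self._dirty = True  # invalidate the cache; do not rebuild yet

    def matrix(self) -> list[list[float]]:
        if self._dirty:
            snapshot = list(self._tickets)  # one consistent view for both lists
            self._ticket_ids = [tid for tid, _ in snapshot]
            self._matrix = [emb for _, emb in snapshot]
            self._dirty = False
            self.rebuild_count += 1
        return self._matrix


cache = LazyMatrixCache()
cache.add_ticket("T-1", [1.0, 0.0])
cache.add_ticket("T-2", [0.0, 1.0])
cache.matrix()
cache.matrix()  # served from the cache, no second rebuild
```

Two writes followed by two reads trigger a single rebuild here; the cost of stacking is paid only on the first read after a write.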

How to test

# Run the benchmark
python backend/services/benchmark_similarity.py

Checklist

Code follows existing style and conventions
No breaking changes to public API
Benchmark demonstrates measurable improvement
Closes [BOUNTY] [level:critical] Vectorize Sentence-Transformers Cosine Similarity Computations with NumPy and ONNX Runtime #634

Summary by CodeRabbit

Performance
- Optimized duplicate detection to use vectorized similarity computations, improving throughput when processing large ticket batches.
Chores
- Added internal performance benchmarking tooling for duplicate detection analysis.

…itesh-1918#634)

Replace per-ticket loop in DuplicateService.check_duplicate() with vectorized batched cosine similarity computation. Instead of calling util.cos_sim() individually for each stored embedding (O(n) kernel launches), all stored embeddings are stacked into a single 2D tensor and compared against the query in one matrix operation.

Key changes:
- Add _embedding_matrix, _ticket_ids, and _embedding_matrix_dirty to DuplicateService for lazy-rebuild caching
- Add _rebuild_embedding_matrix() to stack embeddings into 2D tensor
- Rewrite check_duplicate() to use vectorized util.cos_sim() with the stacked matrix and torch.argmax() for best-match selection
- Mark matrix dirty on add_ticket() for correctness
- Add benchmark_similarity.py showing speedup results: n=10: 10x, n=100: 33x, n=500: 196x, n=1000: 394x, n=5000: 421x

Closes ritesh-1918#634

vercel · 2026-05-29T19:29:31Z

Someone is attempting to deploy a commit to the ritesh Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-05-29T19:29:41Z

📝 Walkthrough

Walkthrough

The PR introduces vectorized cosine similarity computation to duplicate ticket detection. DuplicateService now caches a stacked embedding matrix and replaces per-ticket loops with a single batched similarity call. A benchmark script validates the performance improvement across multiple dataset sizes.

Changes

Vectorized Duplicate Detection

Layer / File(s)	Summary
DuplicateService vectorized implementation `backend/services/duplicate_service.py`	Adds torch and numpy imports, caches an embedding matrix and ticket-ID list with a dirty flag, introduces `_rebuild_embedding_matrix()` to construct the matrix from stored tickets, marks the cache as stale in `add_ticket()`, and replaces per-ticket cosine similarity looping in `check_duplicate()` with vectorized matrix computation via `torch.argmax`.
Benchmark comparison script `backend/services/benchmark_similarity.py`	Generates synthetic unit-normalized embeddings, implements separate loop-based and vectorized similarity benchmark functions, and runs both approaches across multiple dataset sizes to measure and report speedup.

Sequence Diagram(s)

Not applicable. The changes are a performance optimization refactor within a single service class and a supporting benchmark utility; they do not introduce new multi-component control flows or external interactions that would benefit from visualization beyond the checkpoints already shown in the hidden artifact.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

#634 — Vectorize Sentence-Transformers Cosine Similarity Computations: This PR directly implements the core vectorization objective using PyTorch (not ONNX); it replaces loop-based cosine similarity with batched matrix operations and includes benchmark validation of the speedup.

Poem

🐰 A rabbit's ode to swift lookups—
No loops, just stacked embeddings bright,
One argmax finds the match at light!
The matrix dances, fast and lean,
Benchmarks sing of scaling supreme. ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	Vectorization objective met with torch batched operations; benchmarks provided showing speedups; but ONNX export script not implemented, partially addressing `#634` requirements.	Implement ONNX export script to convert the SentenceTransformer model to .onnx format as required by issue `#634`.
Docstring Coverage	⚠️ Warning	Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and concisely summarizes the main optimization: vectorizing cosine similarity for faster duplicate detection.
Out of Scope Changes check	✅ Passed	All changes are in-scope: benchmark script and DuplicateService modifications directly support the vectorization objective; no unrelated changes detected.

coderabbitai

🧹 Nitpick comments (3)

backend/services/duplicate_service.py (2)
125-125: ⚡ Quick win

Make the optional parameter explicit (float | None).

Ruff flags this as an implicit Optional (RUF013). Line 23 already uses | None syntax, so this is consistent with the file.
♻️ Proposed fix
-    def check_duplicate(self, text: str, threshold: float = None) -> dict:
+    def check_duplicate(self, text: str, threshold: float | None = None) -> dict:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/duplicate_service.py` at line 125, The function signature
for check_duplicate currently uses the implicit Optional pattern (threshold:
float = None); update the type annotation to be explicit by changing it to
threshold: float | None = None in the check_duplicate method so it matches the
file's use of `| None` and satisfies the RUF013 rule.

96-112: ⚡ Quick win

Fix potential state desync in _rebuild_embedding_matrix() by snapshotting _tickets

DuplicateService._rebuild_embedding_matrix() builds _ticket_ids and the stacked embeddings from two separate passes over self._tickets. add_ticket() appends to self._tickets and sets _embedding_matrix_dirty=True, while check_duplicate() may rebuild the matrix when dirty/stale, so concurrent mutation could desync _ticket_ids vs _embedding_matrix.

In backend/main.py, the call sites for duplicate_service.add_ticket(...) and duplicate_service.check_duplicate(...) are inside async def routes, but the service methods are synchronous and torch ops may release the GIL; if the app is running with multiple threads/workers within a process, this race is still plausible. Snapshotting avoids the mismatch without relying on deployment details.
-        self._ticket_ids = [tid for tid, _, _ in self._tickets]
-        embeddings = [emb for _, emb, _ in self._tickets]
-        self._embedding_matrix = torch.stack(embeddings)
+        tickets = list(self._tickets)  # consistent snapshot
+        self._ticket_ids = [tid for tid, _, _ in tickets]
+        embeddings = [emb for _, emb, _ in tickets]
+        self._embedding_matrix = torch.stack(embeddings)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/duplicate_service.py` around lines 96 - 112,
_rebuild_embedding_matrix currently iterates over self._tickets twice, which can
lead to _ticket_ids vs the stacked _embedding_matrix getting out of sync if
self._tickets is mutated concurrently (e.g., between add_ticket and
check_duplicate); fix by snapshotting tickets at the start of
_rebuild_embedding_matrix (e.g., local_tickets = list(self._tickets)) and then
build _ticket_ids and embeddings from that snapshot before calling torch.stack,
then set _embedding_matrix and _ticket_ids and clear _embedding_matrix_dirty;
this ensures atomic consistency without changing add_ticket or check_duplicate
signatures.

backend/services/benchmark_similarity.py (1)
26-45: ⚡ Quick win

Add an untimed warm-up before measuring.

The first timed round absorbs one-time allocation/kernel-init overhead, which can skew the reported averages (most visibly at small n). Since the PR's speedup claims rely on these numbers, a warm-up call makes them more representative.
♻️ Proposed fix
 def benchmark_loop(query: torch.Tensor, stored: list[torch.Tensor], rounds: int = 5) -> float:
     """Old approach: iterate and compute cos_sim one at a time."""
+    for emb in stored:  # warm-up
+        util.cos_sim(query, emb)
     times = []

 def benchmark_vectorized(query: torch.Tensor, matrix: torch.Tensor, rounds: int = 5) -> float:
     """New approach: single batched cos_sim call."""
     query_2d = query.unsqueeze(0)
+    util.cos_sim(query_2d, matrix)  # warm-up
     times = []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/benchmark_similarity.py` around lines 26 - 45, Both
benchmark_loop and benchmark_vectorized should perform an untimed warm-up call
to amortize one-time allocation/kernel-init overhead before starting the timed
rounds; update the functions (benchmark_loop and benchmark_vectorized) to run
the same computation once (e.g., one pass over stored in benchmark_loop and one
util.cos_sim call in benchmark_vectorized) prior to the for _ in range(rounds)
timing loop so the measured rounds exclude initialization costs.
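
The warm-up pattern suggested in this comment can be sketched with a generic timing helper. This is an illustrative sketch using only the standard library, not the benchmark script from the PR; the helper name `mean_ms` and its `warmup` parameter are hypothetical.

```python
import time
from typing import Callable


def mean_ms(fn: Callable[[], object], rounds: int = 5, warmup: int = 1) -> float:
    """Average wall-clock time of fn in milliseconds, excluding warm-up calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time setup costs
    times = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000.0)
    return sum(times) / len(times)


# Record each call so the split between warm-up and timed rounds is visible.
calls = []
elapsed = mean_ms(lambda: calls.append(sum(range(1000))), rounds=5, warmup=2)
```

Passing the workload as a callable lets the loop-based and vectorized variants share one timing path, so both are measured under the same warm-up rule.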

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 42974083-3733-43bf-ad65-454075d2fccd

📥 Commits

Reviewing files that changed from the base of the PR and between da8faf2 and 35a9990.

📒 Files selected for processing (2)

backend/services/benchmark_similarity.py
backend/services/duplicate_service.py

coderabbitai Bot reviewed May 29, 2026

fennhelloworld wants to merge 1 commit into ritesh-1918:main from fennhelloworld:feat/vectorized-cosine-similarity
