perf: eliminate N+1 metadata queries in vector search (searchL1Vector / searchL0Vector) #13
Open
yuanrengu wants to merge 1 commit into
searchL1Vector and searchL0Vector previously issued one metadata query per vec0 KNN result (N individual stmtGetMeta.get() calls). With topK=20 and a buffer of 10, this meant up to 30 round-trips per search.

Replace with a single `SELECT ... WHERE record_id IN (?, ...)` query via new batchGetL1Meta/batchGetL0Meta helpers, then look up results from a Map. This reduces SQLite round-trips from O(N) to O(1) per vector search while preserving the existing fault-tolerant behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: yuanrengu <heyonggang0811@126.com>
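The batched lookup described in the commit message could look roughly like the sketch below. It is not the PR's code: the table name `l1_meta`, the `MetaRow` shape, and the `runAll` callback (a stand-in for a prepared statement's "return all rows" call) are assumptions made for illustration.

```typescript
// Hypothetical row shape; the real metadata columns are not shown in the PR.
interface MetaRow {
  record_id: string;
  [column: string]: unknown;
}

// Hypothetical stand-in for running a parameterized query and returning all rows.
type RunAll = (sql: string, params: string[]) => MetaRow[];

// Build `SELECT ... WHERE record_id IN (?, ?, ...)` with one placeholder per ID.
// `table` is assumed to be a trusted constant, never user input.
function buildBatchMetaSql(table: string, count: number): string {
  const placeholders = Array.from({ length: count }, () => "?").join(", ");
  return `SELECT * FROM ${table} WHERE record_id IN (${placeholders})`;
}

// Fetch metadata for all IDs in one query and index the rows by record_id.
function batchGetMeta(
  table: string,
  recordIds: string[],
  runAll: RunAll,
): Map<string, MetaRow> {
  const byId = new Map<string, MetaRow>();
  if (recordIds.length === 0) return byId; // nothing to look up, skip the query
  const rows = runAll(buildBatchMetaSql(table, recordIds.length), recordIds);
  for (const row of rows) byId.set(row.record_id, row);
  return byId;
}
```

IDs with no matching row are simply absent from the returned `Map`, so the caller can treat a missing key the same way the per-row version treated an `undefined` result.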
b6eb86a to
ed126e6
Collaborator
Hi @yuanrengu, thanks for the PR! 👍 The N+1 query optimization in vector search is an interesting area to look at. We'll review the implementation details, including edge cases like empty […] Thanks for the contribution! 🙏
Author
@Maxwell-Code07 Thanks for taking a look! Let me know if any changes are needed.
Summary
Replace per-row metadata lookups with a single batch query in `searchL1Vector` and `searchL0Vector`.

Previously, after vec0 returned KNN results, each `record_id` was looked up individually via `stmtGetMeta.get(record_id)`: an N+1 pattern that issues up to `topK + buffer` (default 30) separate SQLite round-trips per vector search.

Now, all valid `record_id`s are collected and fetched in one `SELECT ... WHERE record_id IN (?, ...)` query via new `batchGetL1Meta`/`batchGetL0Meta` helpers, then resolved from a `Map` in O(1).

Changes
- `batchGetL1Meta(recordIds)`: new private method, batch-fetches L1 metadata, returns `Map<string, meta>`
- `batchGetL0Meta(recordIds)`: new private method, batch-fetches L0 metadata, returns `Map<string, meta>`
- `searchL1Vector`: refactored to use batch lookup instead of N individual `stmtGetMeta.get()` calls
- `searchL0Vector`: refactored to use batch lookup instead of N individual `stmtL0GetMeta.get()` calls

Behavior
No functional change. The zero-vector filtering, orphan detection, result ordering, and fault-tolerant behavior are all preserved:
`continue` in loop
`filter` before loop

Test plan
npx tsdown)
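
To illustrate how result ordering and orphan handling can survive the switch to a batch lookup, here is a minimal sketch. The `KnnHit`, `Meta`, and `SearchResult` shapes, the `resolveHits` name, and the trimming to `topK` are all assumptions; the PR does not show the actual implementation.

```typescript
// Hypothetical shapes; the PR does not show the actual types.
interface KnnHit {
  record_id: string;
  distance: number;
}
interface Meta {
  record_id: string;
  text: string;
}
interface SearchResult {
  record_id: string;
  distance: number;
  text: string;
}

// Join KNN hits with batch-fetched metadata, iterating in KNN order so the
// ranking from the vector index is preserved regardless of the order in
// which the batch query returned its rows.
function resolveHits(
  hits: KnnHit[],
  metaById: Map<string, Meta>,
  topK: number,
): SearchResult[] {
  const results: SearchResult[] = [];
  for (const hit of hits) {
    const meta = metaById.get(hit.record_id);
    if (!meta) continue; // orphan: vector row without metadata, skip it
    results.push({
      record_id: hit.record_id,
      distance: hit.distance,
      text: meta.text,
    });
    if (results.length >= topK) break; // assumed: extra hits are only a buffer
  }
  return results;
}
```

Because the loop walks `hits` rather than the rows returned by the query, an `IN (...)` query that returns rows in arbitrary order does not change the final ranking.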