fix(search): sanitize FTS5 query input #396
Open
drive888 wants to merge 1 commit into
Signed-off-by: drive888 <2085696241@qq.com>
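Change description
Fixes buildFtsQuery() not sanitizing FTS5 query syntax before tokenization, so that operators such as AND/OR/NOT/NEAR in user input cannot alter the semantics of the MATCH query.
Root cause
buildFtsQuery() tokenizes the user input and joins the tokens with OR to form an FTS5 query. Although every token is wrapped in double quotes, the raw input was not explicitly stripped of FTS5 operators before it reached jieba or the fallback regex tokenizer, which left a defense-in-depth gap.
Main changes
- buildFtsQuery(): strip AND, OR, NOT, NEAR, and NEAR/n before tokenization
- ANDROID, ordinary, Chinese queries, and API keywords
- CHANGELOG.md
Recall parity verification
To confirm that the sanitization logic does not affect normal recall, a recall comparison test against a real SQLite FTS index was added.
Test method:
- VectorStore
- mem-1: memory sqlite user preference
- mem-2: 旅行计划 API TypeScript
- mem-3: ANDROID scanner ordinary nearby
- mem-4: project issue recall screenshot comparison
- legacyQuery and sanitizedQuery
- searchL1Fts() queries SQLite FTS
- record_id and content
Queries covered:
- memory sqlite user preference
- 旅行计划 API TypeScript
- ANDROID scanner ordinary nearby
- project recall screenshot comparison
- memory AND sqlite NEAR/3 user "preference"*
Verification results:
- ANDROID/ordinary: not removed by mistake; recall is identical
- AND, NEAR/3, quotes, and *, but the actual recall results still match the legacy logic
The actual output shows that every query printed:
- legacyQuery
- sanitizedQuery
- legacyRecall
- sanitizedRecall
The id/content of legacyRecall and sanitizedRecall are exactly identical, confirming that sanitization did not degrade normal recall.
Partial test screenshots: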


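For readers without the diff open, the sanitization and the legacy/sanitized query comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the helper names (sanitizeFtsInput, tokenize, toOrQuery, buildLegacyQuery, buildSanitizedQuery) and both regexes are hypothetical, the tokenizer is a simple stand-in for jieba and the fallback regex, and the comparison is done on the query strings only, without touching SQLite.

```typescript
// Sketch only: helper names and regexes are assumptions, not the PR's code.

// Uppercase FTS5 keyword operators as whole words; NEAR may carry a "/n" distance.
const FTS5_OPERATORS = /\b(?:AND|OR|NOT|NEAR(?:\/\d+)?)\b/g;
// Characters that carry special meaning in FTS5 query syntax.
const FTS5_SYNTAX_CHARS = /["*:()^+\-{}]/g;

// Strip operators and syntax characters, then collapse whitespace.
function sanitizeFtsInput(input: string): string {
  return input
    .replace(FTS5_OPERATORS, " ")
    .replace(FTS5_SYNTAX_CHARS, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Stand-in tokenizer (NOT jieba): split on whitespace and common punctuation.
function tokenize(text: string): string[] {
  return text.split(/[\s"'*:()^+\-{}\/.,;!?]+/).filter((t) => t.length > 0);
}

// Wrap each token as an FTS5 string (embedded quotes doubled) and join with OR.
function toOrQuery(tokens: string[]): string {
  return tokens.map((t) => `"${t.replace(/"/g, '""')}"`).join(" OR ");
}

// Legacy behaviour as described in the root cause: tokenize the raw input.
function buildLegacyQuery(input: string): string {
  return toOrQuery(tokenize(input));
}

// New behaviour: sanitize first, then tokenize.
function buildSanitizedQuery(input: string): string {
  return toOrQuery(tokenize(sanitizeFtsInput(input)));
}

const sampleQueries = [
  "memory sqlite user preference",
  "旅行计划 API TypeScript",
  "ANDROID scanner ordinary nearby",
  'memory AND sqlite NEAR/3 user "preference"*',
];

for (const query of sampleQueries) {
  console.log({
    query,
    legacyQuery: buildLegacyQuery(query),
    sanitizedQuery: buildSanitizedQuery(query),
  });
}
```

In this sketch the word-boundary match leaves ANDROID, ordinary, and nearby untouched, while the last query loses AND, NEAR/3, the quotes, and the trailing *. The legacy form of that query only carries extra quoted OR terms, which can change recall only when a stored record contains those words.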
Tests
- npm test -- src/core/store/sqlite.test.ts --reporter=verbose
  - 1 test file passed
  - 8 tests passed
  - FTS recall parity: legacy vs sanitized (real recall comparison)
- npm test
  - 5 test files passed
  - 75 tests passed
- npm run build:plugin
Related issue
Closes #160