fix(store): sanitize FTS5 operators in buildFtsQuery (fixes #160)#380

Open
MLYoshi wants to merge 1 commit into TencentCloud:main from MLYoshi:fix/fts5-injection-160

Conversation

MLYoshi commented Jul 3, 2026

Summary

Closes #160

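Adds FTS5 operator injection protection to buildFtsQuery() in src/core/store/sqlite.ts, using a defense-in-depth approach with three layers: input-level sanitization, a token-level operator filter, and empty-token filtering at the quoting step.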

Changes

src/core/store/sqlite.ts — New sanitizeFtsRaw() + token-level filter:

| Layer | Purpose |
| --- | --- |
| Input-level (sanitizeFtsRaw) | Strip FTS5 syntax chars ("'()*:\^-) and boolean operators (AND/OR/NOT/NEAR) before tokenization. Uses \b word-boundary matching to protect words like ANDROID. |
| Token-level (FTS5_OPERATORS Set) | Secondary defense: filters operator tokens that may leak through the jieba tokenizer. |
| Quoting | Filter empty tokens after replaceAll('"', "") to avoid bare "" in output. |
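
For readers without the diff open, the input-level layer could look roughly like the sketch below. This is not the PR's code. The character set and operator list come from the description above. Two details are assumptions: each stripped match is replaced with a space (so that content:foo becomes two words rather than one), and the backslash in the listed character set is treated as a literal character to strip.

```typescript
// Hypothetical sketch: the PR's real sanitizeFtsRaw() is not shown in this thread.
// Character set and operator list are taken from the PR description.
const FTS5_SYNTAX_CHARS = /["'()*:\\^-]/g;
const FTS5_BOOLEAN_WORDS = /\b(?:AND|OR|NOT|NEAR)\b/gi;

function sanitizeFtsRaw(raw: string): string {
  return raw
    .replace(FTS5_SYNTAX_CHARS, " ")  // assumption: replace with a space, not ""
    .replace(FTS5_BOOLEAN_WORDS, " ") // \b leaves ANDROID, ORDER, NOTE untouched
    .replace(/\s+/g, " ")             // collapse the gaps left by stripping
    .trim();
}

console.log(sanitizeFtsRaw('content:foo AND "bar"*')); // content foo bar
console.log(sanitizeFtsRaw("ANDROID or iOS"));         // ANDROID iOS
console.log(sanitizeFtsRaw("NOT (OR)") === "");        // true
```

One consequence of stripping the hyphen in this sketch is that a term such as well-known is searched as two separate words.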

All 4 existing call sites (auto-recall, l1-dedup, conversation-search, memory-search) benefit automatically with no changes needed.

src/core/store/sqlite.test.ts — 11 new security test cases covering:

  • FTS5 boolean operators stripped (AND/OR/NOT/NEAR)
  • Case-insensitive operator stripping
  • FTS5 special characters ("'()*:\^-)
  • Exclude operator (-) stripped
  • Column filter syntax (content:foo) blocked
  • Word-boundary protection (ANDROID preserved)
  • Pure-operator input returns null
  • Empty token filtering
  • Jieba mock path sanitization
  • Normal Chinese/English recall unaffected
  • Empty/whitespace input
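
A few of the cases listed above (jieba mock path, empty token filtering, pure-operator input returning null) can be sketched as plain table-driven checks. Everything here is illustrative: quoteTokens and leakyTokenizer are hypothetical names, the tokenizer is a stand-in for a mocked jieba that leaks an operator token and a bare pair of quotes, and joining quoted tokens with " OR " is an assumption about the output format, not something the PR description states.

```typescript
// Hypothetical sketch of the token-level and quoting layers. FTS5_OPERATORS
// mirrors the name in the PR description; the bodies are assumptions.
const FTS5_OPERATORS = new Set(["AND", "OR", "NOT", "NEAR"]);

type Tokenizer = (text: string) => string[];

// Stand-in for a mocked jieba tokenizer that leaks "OR" and a bare "" token.
const leakyTokenizer: Tokenizer = (text) => [...text.split(/\s+/), "OR", '""'];

function quoteTokens(text: string, tokenize: Tokenizer): string | null {
  const tokens = tokenize(text)
    .map((t) => t.replace(/"/g, "").trim()) // strip quotes before re-quoting
    .filter((t) => t.length > 0 && !FTS5_OPERATORS.has(t.toUpperCase()));
  if (tokens.length === 0) return null;     // nothing searchable is left
  return tokens.map((t) => `"${t}"`).join(" OR "); // assumption: OR-joined output
}

const cases: Array<[string, string | null]> = [
  ["记忆 search", '"记忆" OR "search"'], // normal Chinese/English recall
  ["ANDROID", '"ANDROID"'],              // set lookup is exact, so ANDROID survives
  ["", null],                            // only leaked tokens remain, all filtered
];

for (const [input, expected] of cases) {
  const actual = quoteTokens(input, leakyTokenizer);
  if (actual !== expected) {
    throw new Error(`input ${JSON.stringify(input)}: got ${actual}, expected ${expected}`);
  }
}
```

Passing the tokenizer as a parameter is only a convenience for this sketch; the PR's tests presumably mock jieba through the test runner instead.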

Testing

npx vitest run --reporter=verbose
# 5 files passed · 78 tests passed (67 existing + 11 new, zero regressions)

…oud#160)

Add sanitizeFtsRaw() to strip FTS5 special operators and characters
from raw search input before tokenization. Defense-in-depth approach:

- Input-level: strip FTS5 syntax chars and boolean operators
  (AND/OR/NOT/NEAR) with \b word-boundary protection
- Token-level: FTS5_OPERATORS Set as secondary filter for tokens that
  may leak through jieba tokenizer
- Quoting: filter empty tokens after quote-stripping to avoid bare ""

Add 11 security test cases covering operator injection, special chars,
column filter syntax, word-boundary protection, jieba/fallback paths,
pure-operator input, empty tokens, and normal recall preservation.

@Maxwell-Code07 (Collaborator)

Thank you for submitting this PR and participating in Tencent Rhino-bird Open-source Training Program!
We have successfully received your submission. The program is currently in full swing, and we will complete the Code Review for you as soon as possible. Please keep an eye on the status notifications for this PR so you can follow up promptly once the review feedback is provided.
Thanks again for your contribution and open-source spirit! 🚀

Development

Successfully merging this pull request may close these issues.

fix(search): buildFtsQuery does not sanitize FTS5 operators — user input alters query semantics