NAPI-RS approximate substring matching for Node.js and Bun. Finds near-matches within edit distance k with stable UTF-16 offsets, replace-safe match ranges, and optional diacritics normalization.
Built on Myers' bit-parallel algorithm (1999), implemented in Rust and exposed to JavaScript via NAPI-RS.
npm install @stll/fuzzy-search
# or
bun add @stll/fuzzy-searchThe companion @stll/fuzzy-search-wasm package is
available for browser builds.
If you use the browser package with Vite, import the bundled plugin so the generated WASM loader is not pre-bundled into broken asset URLs:
import { defineConfig } from "vite";
import stllFuzzySearchWasm from "@stll/fuzzy-search-wasm/vite";
export default defineConfig({
plugins: [stllFuzzySearchWasm()],
});Prebuilts are available for:
| Platform | Architecture |
|---|---|
| macOS | x64, arm64 |
| Linux (glibc) | x64, arm64 |
| WASM | browser |
import { FuzzySearch } from "@stll/fuzzy-search";
const fs = new FuzzySearch(
[
{ pattern: "Gaislerová", distance: 1 },
{ pattern: "Novák", distance: 1 },
{ pattern: "Příbram", distance: 2 },
],
{
normalizeDiacritics: true,
wholeWords: true,
},
);
fs.findIter("Smlouva s Gais1erová v Pribram");
// [
// { pattern: 0, start: 10, end: 20,
// text: "Gais1erová", distance: 1 },
// { pattern: 2, start: 23, end: 30,
// text: "Pribram", distance: 0 },
// ]Patterns can be strings (default distance 1) or objects with explicit distance and optional name:
const fs = new FuzzySearch([
"simple", // distance 1
{ pattern: "named", name: "entity" }, // distance 1
{ pattern: "precise", distance: 2 }, // distance 2
]);Distance must be less than pattern length.
const fs = new FuzzySearch(patterns, {
// Strip diacritics before matching (NFD + remove
// combining marks). "Příbram" matches "Pribram"
// at distance 0.
normalizeDiacritics: true, // default: false
// Only match whole words. Uses Unicode
// is_alphanumeric() for boundary detection.
// CJK characters always pass (no inter-word
// spaces in CJK).
wholeWords: true, // default: true
// Case-insensitive matching (Unicode-aware).
caseInsensitive: true, // default: false
// Unicode word boundaries (reserved for future
// UAX#29 segmentation support).
unicodeBoundaries: true, // default: true
// Drop matches whose score is below threshold.
// Score = 1 - distance / pattern.length.
// Inclusive (score >= minScore keeps the match).
minScore: 0.7,
// Return only the top k matches by score, across
// all patterns. Tie-broken by start, then pattern.
kBest: 5,
});Every match carries a normalized score in [0, 1],
computed as 1 - distance / pattern.length and
clamped at 0. Pair it with minScore and kBest for
top-N ranking without a follow-up sort:
const fs = new FuzzySearch(
[
{ pattern: "Novák", distance: 2 },
{ pattern: "Gaislerová", distance: 2 },
],
{ wholeWords: true, minScore: 0.7, kBest: 3 },
);
fs.findIter("Nowák a Gais1erova");
// [
// { pattern: 0, text: "Nowák", distance: 1, score: 0.8, ... },
// { pattern: 1, text: "Gais1erova", distance: 2, score: 0.8, ... },
// ]replaceAll always replaces every distance-qualified
match and ignores minScore / kBest, so the
replacements-by-pattern contract stays
deterministic.
fs.replaceAll("Smlouva s Gais1erová", [
"[REDACTED]",
"[REDACTED]",
"[REDACTED]",
]);
// "Smlouva s [REDACTED]"replacements[i] replaces pattern i.
import { distance } from "@stll/fuzzy-search";
distance("kitten", "sitting"); // 3
distance("abcd", "abdc", "damerau-levenshtein"); // 1The repository includes a checked-in benchmark harness for synthetic and corpus-based searches. The inputs are public and the scripts are reproducible from the repo. Run them locally:
bun run bench:install
bun run bench:download
bun run bench:speed
bun run bench:correctnessThe speed harness compares practical JS ecosystem
alternatives, but not every comparator implements the
same exact semantics. @stll/fuzzy-search is solving
approximate substring search with offsets and
replacement-friendly match ranges; tools like
fuse.js and fuzzball are included as reference
points, not as exact drop-in equivalents. The
headline comparisons in this repo are the
substring-mode rows against sliding-window
Levenshtein baselines.
Representative baseline from the checked-in public harness on this machine:
- runtime: Bun
1.3.12 - platform: macOS
26.4.1(Darwin arm64)
| Scenario | @stll/fuzzy-search |
Sliding-window JS baseline | Relative |
|---|---|---|---|
Czech legal, 64 KB, 5 names |
2.41 ms |
80.78 ms |
33.5x |
Bible, 4.0 MB, 5 names |
239.91 ms |
3903.26 ms |
16.3x |
Czech news, 4.8 MB, 5 names |
262.39 ms |
4350.52 ms |
16.6x |
German news, 5.5 MB, 5 names |
405.72 ms |
6816.03 ms |
16.8x |
These rows are substring mode (wholeWords: false)
with edit distance 1-2, which is the core workload
this package is designed for.
Alternatives tested
- fastest-levenshtein + sliding window — fastest JS Levenshtein distance
- fuse.js — fuzzy search (scoring, not substring matching)
- fuzzball — Python rapidfuzz port
- naive JS — O(nm) Levenshtein per window position
Correctness is covered by example-based tests and property tests. The property suite verifies distance bounds, oracle agreement, whole-word boundaries, UTF-16 offset stability, normalization behavior, and mixed option combinations over randomized inputs.
| Method | Returns | Description |
|---|---|---|
new FuzzySearch(patterns, options?) |
instance | Build matcher |
.findIter(haystack) |
FuzzyMatch[] |
Non-overlapping matches |
.isMatch(haystack) |
boolean |
Any pattern matches? |
.replaceAll(haystack, replacements) |
string |
Replace matched patterns |
.patternCount |
number |
Number of patterns |
type PatternEntry =
| string
| { pattern: string; distance?: number; name?: string };
type Options = {
normalizeDiacritics?: boolean; // default: false
wholeWords?: boolean; // default: true
caseInsensitive?: boolean; // default: false
unicodeBoundaries?: boolean; // default: true
minScore?: number; // drop matches below threshold
kBest?: number; // top-k by score, ties by start
};
type FuzzyMatch = {
pattern: number; // index into patterns array
start: number; // UTF-16 code unit offset
end: number; // exclusive
text: string; // matched substring
distance: number; // actual Levenshtein distance
score: number; // 1 - distance/pattern.length
name?: string; // pattern name (if provided)
};Match offsets are UTF-16 code unit indices,
compatible with String.prototype.slice().
- Constructor throws if a pattern is empty, longer than 64 characters, or has distance >= pattern length.
replaceAllthrows ifreplacements.lengthdoes not equalpatternCount.
-
Myers' bit-parallel algorithm scans the text in O(n) per pattern for patterns up to 64 characters. No DFA construction, no state explosion at higher distances.
-
Start position recovery via small-window Levenshtein: for each match end position from Myers, a window of [m-k, m+k] characters is evaluated to find the exact start and distance.
-
Diacritics normalization: NFD decomposition + combining mark stripping (Unicode General Category M via
unicode-normalizationcrate). Covers all scripts. -
UTF-16 offset translation: character-level matching with incremental char→UTF-16 mapping for JS string compatibility.
- Pattern length capped at 64 characters. Myers uses a single u64 bit-vector per pattern. Longer patterns would need multi-word vectors (not yet implemented).
- No streaming API. The full haystack must be in
memory. For chunked processing, use
@stll/aho-corasick'sStreamMatcherfor exact prefiltering and fuzzy-search on flagged regions. - WASM requires
SharedArrayBuffer. Browser builds needCross-Origin-Opener-Policy: same-originandCross-Origin-Embedder-Policy: require-corpheaders.
bun install
bun run build # native module (requires Rust)
bun test # 36 unit tests
bun run test:props # 36 property tests × 1000 runs
bun run bench:install # benchmark dependencies
bun run bench:download # download corpora
bun run bench:speed # speed comparison
bun run bench:correctness # oracle verification
bun run lint # oxlint
bun run format # oxfmt + rustfmt