@stll/fuzzy-search

NAPI-RS approximate substring matching for Node.js and Bun. Finds near-matches within edit distance k with stable UTF-16 offsets, replace-safe match ranges, and optional diacritics normalization.

Built on Myers' bit-parallel algorithm (1999), implemented in Rust and exposed to JavaScript via NAPI-RS.

Install

npm install @stll/fuzzy-search
# or
bun add @stll/fuzzy-search

The companion @stll/fuzzy-search-wasm package is available for browser builds.

If you use the browser package with Vite, import the bundled plugin so the generated WASM loader is not pre-bundled into broken asset URLs:

import { defineConfig } from "vite";
import stllFuzzySearchWasm from "@stll/fuzzy-search-wasm/vite";

export default defineConfig({
  plugins: [stllFuzzySearchWasm()],
});

Prebuilts are available for:

Platform	Architecture
macOS	x64, arm64
Linux (glibc)	x64, arm64
WASM	browser

Usage

import { FuzzySearch } from "@stll/fuzzy-search";

const fs = new FuzzySearch(
  [
    { pattern: "Gaislerová", distance: 1 },
    { pattern: "Novák", distance: 1 },
    { pattern: "Příbram", distance: 2 },
  ],
  {
    normalizeDiacritics: true,
    wholeWords: true,
  },
);

fs.findIter("Smlouva s Gais1erová v Pribram");
// [
//   { pattern: 0, start: 10, end: 20,
//     text: "Gais1erová", distance: 1 },
//   { pattern: 2, start: 23, end: 30,
//     text: "Pribram", distance: 0 },
// ]

Patterns

Patterns can be strings (default distance 1) or objects with explicit distance and optional name:

const fs = new FuzzySearch([
  "simple", // distance 1
  { pattern: "named", name: "entity" }, // distance 1
  { pattern: "precise", distance: 2 }, // distance 2
]);

Distance must be less than pattern length.

Options

const fs = new FuzzySearch(patterns, {
  // Strip diacritics before matching (NFD + remove
  // combining marks). "Příbram" matches "Pribram"
  // at distance 0.
  normalizeDiacritics: true, // default: false

  // Only match whole words. Uses Unicode
  // is_alphanumeric() for boundary detection.
  // CJK characters always pass (no inter-word
  // spaces in CJK).
  wholeWords: true, // default: true

  // Case-insensitive matching (Unicode-aware).
  caseInsensitive: true, // default: false

  // Unicode word boundaries (reserved for future
  // UAX#29 segmentation support).
  unicodeBoundaries: true, // default: true

  // Drop matches whose score is below threshold.
  // Score = 1 - distance / pattern.length.
  // Inclusive (score >= minScore keeps the match).
  minScore: 0.7,

  // Return only the top k matches by score, across
  // all patterns. Tie-broken by start, then pattern.
  kBest: 5,
});

Scored output

Every match carries a normalized score in [0, 1], computed as 1 - distance / pattern.length and clamped at 0. Pair it with minScore and kBest for top-N ranking without a follow-up sort:

const fs = new FuzzySearch(
  [
    { pattern: "Novák", distance: 2 },
    { pattern: "Gaislerová", distance: 2 },
  ],
  { wholeWords: true, minScore: 0.7, kBest: 3 },
);

fs.findIter("Nowák a Gais1erova");
// [
//   { pattern: 0, text: "Nowák", distance: 1, score: 0.8, ... },
//   { pattern: 1, text: "Gais1erova", distance: 2, score: 0.8, ... },
// ]

replaceAll always replaces every distance-qualified match and ignores minScore / kBest, so the replacements-by-pattern contract stays deterministic.

Replace

fs.replaceAll("Smlouva s Gais1erová", [
  "[REDACTED]",
  "[REDACTED]",
  "[REDACTED]",
]);
// "Smlouva s [REDACTED]"

replacements[i] replaces pattern i.

Distance helper

import { distance } from "@stll/fuzzy-search";

distance("kitten", "sitting"); // 3
distance("abcd", "abdc", "damerau-levenshtein"); // 1

Benchmarks

The repository includes a checked-in benchmark harness for synthetic and corpus-based searches. The inputs are public and the scripts are reproducible from the repo. Run them locally:

bun run bench:install
bun run bench:download
bun run bench:speed
bun run bench:correctness

The speed harness compares practical JS ecosystem alternatives, but not every comparator implements the same exact semantics. @stll/fuzzy-search is solving approximate substring search with offsets and replacement-friendly match ranges; tools like fuse.js and fuzzball are included as reference points, not as exact drop-in equivalents. The headline comparisons in this repo are the substring-mode rows against sliding-window Levenshtein baselines.

Representative baseline from the checked-in public harness on this machine:

runtime: Bun 1.3.12
platform: macOS 26.4.1 (Darwin arm64)

Scenario	`@stll/fuzzy-search`	Sliding-window JS baseline	Relative
Czech legal, `64 KB`, `5` names	`2.41 ms`	`80.78 ms`	`33.5x`
Bible, `4.0 MB`, `5` names	`239.91 ms`	`3903.26 ms`	`16.3x`
Czech news, `4.8 MB`, `5` names	`262.39 ms`	`4350.52 ms`	`16.6x`
German news, `5.5 MB`, `5` names	`405.72 ms`	`6816.03 ms`	`16.8x`

These rows are substring mode (wholeWords: false) with edit distance 1-2, which is the core workload this package is designed for.

Alternatives tested

fastest-levenshtein + sliding window — fastest JS Levenshtein distance
fuse.js — fuzzy search (scoring, not substring matching)
fuzzball — Python rapidfuzz port
naive JS — O(nm) Levenshtein per window position

Correctness

Correctness is covered by example-based tests and property tests. The property suite verifies distance bounds, oracle agreement, whole-word boundaries, UTF-16 offset stability, normalization behavior, and mixed option combinations over randomized inputs.

API

Method	Returns	Description
`new FuzzySearch(patterns, options?)`	instance	Build matcher
`.findIter(haystack)`	`FuzzyMatch[]`	Non-overlapping matches
`.isMatch(haystack)`	`boolean`	Any pattern matches?
`.replaceAll(haystack, replacements)`	`string`	Replace matched patterns
`.patternCount`	`number`	Number of patterns

Types

type PatternEntry =
  | string
  | { pattern: string; distance?: number; name?: string };

type Options = {
  normalizeDiacritics?: boolean; // default: false
  wholeWords?: boolean; // default: true
  caseInsensitive?: boolean; // default: false
  unicodeBoundaries?: boolean; // default: true
  minScore?: number; // drop matches below threshold
  kBest?: number; // top-k by score, ties by start
};

type FuzzyMatch = {
  pattern: number; // index into patterns array
  start: number; // UTF-16 code unit offset
  end: number; // exclusive
  text: string; // matched substring
  distance: number; // actual Levenshtein distance
  score: number; // 1 - distance/pattern.length
  name?: string; // pattern name (if provided)
};

Match offsets are UTF-16 code unit indices, compatible with String.prototype.slice().

Error handling

Constructor throws if a pattern is empty, longer than 64 characters, or has distance >= pattern length.
replaceAll throws if replacements.length does not equal patternCount.

How it works

Myers' bit-parallel algorithm scans the text in O(n) per pattern for patterns up to 64 characters. No DFA construction, no state explosion at higher distances.
Start position recovery via small-window Levenshtein: for each match end position from Myers, a window of [m-k, m+k] characters is evaluated to find the exact start and distance.
Diacritics normalization: NFD decomposition + combining mark stripping (Unicode General Category M via unicode-normalization crate). Covers all scripts.
UTF-16 offset translation: character-level matching with incremental char→UTF-16 mapping for JS string compatibility.

Limitations

Pattern length capped at 64 characters. Myers uses a single u64 bit-vector per pattern. Longer patterns would need multi-word vectors (not yet implemented).
No streaming API. The full haystack must be in memory. For chunked processing, use @stll/aho-corasick's StreamMatcher for exact prefiltering and fuzzy-search on flagged regions.
WASM requires SharedArrayBuffer. Browser builds need Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers.

Development

bun install
bun run build           # native module (requires Rust)
bun test                # 36 unit tests
bun run test:props      # 36 property tests × 1000 runs

bun run bench:install   # benchmark dependencies
bun run bench:download  # download corpora
bun run bench:speed     # speed comparison
bun run bench:correctness  # oracle verification

bun run lint            # oxlint
bun run format          # oxfmt + rustfmt

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
.agents/skills		.agents/skills
.ai		.ai
.cargo		.cargo
.claude/commands		.claude/commands
.github		.github
__bench__		__bench__
__test__		__test__
crates/core		crates/core
npm		npm
provenance		provenance
scripts		scripts
src		src
wasm		wasm
.gitignore		.gitignore
.gitmodules		.gitmodules
.oxfmtrc.json		.oxfmtrc.json
.provenance.yml		.provenance.yml
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
GEMINI.md		GEMINI.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
VERSION		VERSION
build.rs		build.rs
bun.lock		bun.lock
bunfig.toml		bunfig.toml
clippy.toml		clippy.toml
deny.toml		deny.toml
fuzzy-search-wasm32-wasi.d.ts		fuzzy-search-wasm32-wasi.d.ts
fuzzy-search.wasi-browser.d.ts		fuzzy-search.wasi-browser.d.ts
fuzzy-search.wasi-browser.js		fuzzy-search.wasi-browser.js
fuzzy-search.wasi.cjs		fuzzy-search.wasi.cjs
index.cjs		index.cjs
index.d.ts		index.d.ts
llms.txt		llms.txt
oxlint.config.ts		oxlint.config.ts
package.json		package.json
rustfmt.toml		rustfmt.toml
tsconfig.json		tsconfig.json
tsdown.config.ts		tsdown.config.ts
wasi-worker-browser.mjs		wasi-worker-browser.mjs
wasi-worker.mjs		wasi-worker.mjs

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

@stll/fuzzy-search

Install

Usage

Patterns

Options

Scored output

Replace

Distance helper

Benchmarks

Correctness

API

Types

Error handling

How it works

Limitations

Development

License

About

Uh oh!

Releases 7

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

@stll/fuzzy-search

Install

Usage

Patterns

Options

Scored output

Replace

Distance helper

Benchmarks

Correctness

API

Types

Error handling

How it works

Limitations

Development

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages