fastsearch

🌏 中文版: README.zh-CN.md

A single-binary, external hybrid-search engine that treats managed Postgres (pgvector) as the source of truth. It runs full-text / vector / hybrid retrieval over traceable document chunks (parsed by docparse-rs, or plain text / markdown) and carries page+bbox citations end-to-end to the answer layer — purpose-built for retrieval and grounding in AI Agents / RAG.

👉 Building an Agent? Start with Using fastsearch in an Agent (the four faces · RAG recipe · MCP · multi-tenant ACL · comparison with alternatives).

The key edge over ParadeDB: it runs on any managed Postgres (RDS / Supabase / Neon) — it only needs pgvector + logical replication, and requires no shared_preload_libraries native extension.

Architecture

docparse chunks / text files → Postgres (source of truth: chunk + metadata + ACL + pgvector)
                       │ logical-replication CDC (pgoutput, idempotent, LSN resumable)
                       ▼
   fastsearch engine (single binary, stateless multi-replica, derived index rebuildable)
     · BM25 inverted index (Tantivy/mmap)   · vector (brute-force / HNSW+u8 quant / pgvector direct, filter-aware)
     · fusion (RRF / normalized / weighted) · per-document ACL enforced server-side (cannot be bypassed)
     · citation traceability (page+bbox+section) + resolve_citation deep links · multimodal
   Four faces: CLI · library · REST · MCP

Five-minute quickstart

The CLI is a thin REST client of the server (like the Typesense/Qdrant/Algolia CLIs). Start a server, then the CLI talks to it — feeding a folder uploads chunks and you get full hybrid search back.

cargo build -p fastsearch-server -p fastsearch-cli
# 1) start the server (truth source / indexing / embedding / persistence all live here)
FASTSEARCH_DATA=./data FASTSEARCH_KEYS="dev=:" ./target/debug/fastsearch-server &   # REST :8642
# 2) CLI as client (--server/--key, or env FASTSEARCH_SERVER/FASTSEARCH_KEY)
# Feed it a folder (recurses .md/.txt; markdown headings become breadcrumbs), then search
./target/debug/fastsearch index-dir --server http://localhost:8642 --key dev --collection kb ./my-docs
./target/debug/fastsearch search    --server http://localhost:8642 --key dev --collection kb --query "gross margin" --json

For docparse / PDF / REST / MCP / Python usage, see the Agent usage guide.

Modules (workspace crates)

crate	Responsibility
`fastsearch-core`	Document model, query/filter AST, fusion (RRF / normalized / weighted), citations, ACL
`fastsearch-text`	Tantivy BM25 + CJK (jieba) + filtering + highlighting/facets + ACL
`fastsearch-vector`	Three vector backends: brute-force (deterministic default) / HNSW+u8 quant (approximate) / pgvector direct; filter-aware
`fastsearch-embed`	Embedder trait + configurable HTTP backend (Ollama / OpenAI-compatible)
`fastsearch-pg`	Postgres source of truth: DDL, Chunk↔row mapping, doc-level replace write path, pgvector direct query
`fastsearch-sync`	CDC apply: pgoutput decode + idempotency + LSN checkpoint + replace semantics
`fastsearch-engine`	Orchestration: ingest→CDC→index→full-text / vector / hybrid search→citations + deep pagination + rebuild + media resolution
`fastsearch-eval`	Relevance evaluation: golden set + nDCG/recall/MRR + CI regression gate
`fastsearch-server`	REST (axum) + API-key auth + ACL cannot be bypassed + metrics/rate-limit/audit + media gateway + CDC lifecycle
`fastsearch-mcp`	The fourth face: MCP (stdio + JSON-RPC) exposing the `search` / `resolve_citation` tools
`fastsearch-cli`	`fastsearch` binary: thin REST client of the server (no embedded engine). index / index-dir (feed a folder) / search / similar / ingest (client-side multi-format parse: PDF/DOCX/HTML/MD/CSV/XLSX/PPTX/SRT/EML/image + OCR + table recognition) / eval — see Ingestion & parsing
`clients/{python,ts}`	Zero-dependency SDKs + LangChain / LlamaIndex adapters. TS published on npm: `npm install fastsearch-client` (agent tool defs + RAG helpers — README); Python from source (README)

End-to-end usable: ingest/CDC → index → three search modes (keyword / vector / hybrid) → hits with citations, ACL enforced and unbypassable. All four faces in place.

Build & test

cargo test --workspace                                    # all green (PG integration runs when DATABASE_URL is set)
cargo clippy --workspace --all-targets -- -D warnings     # zero warnings
cargo fmt --all --check
DATABASE_URL=postgres://... cargo test -p fastsearch-pg   # PG integration (CI uses the pgvector/pgvector image + wal_level=logical)

Documentation

Using fastsearch in an Agent (developer usage guide)
Ingestion & parsing — multi-format ingest, OCR, table recognition, build tiers, model setup
Architecture cheat-sheet / commands / invariants (CLAUDE.md)
Module breakdown & spec index
Requirements analysis · Product design
Deployment · Capacity & SLO

License

Apache-2.0. Tokenization dictionaries come from jieba-rs (MIT, with embedded dict) — no share-alike obligation (e.g. CC-BY-SA); shipping the MIT attribution is sufficient (see the license review).

Name		Name	Last commit message	Last commit date
Latest commit History 339 Commits
.github/workflows		.github/workflows
benchmarks/qdrant-adapter		benchmarks/qdrant-adapter
clients		clients
crates		crates
deploy		deploy
docs		docs
example		example
vendor/docparse		vendor/docparse
.gitignore		.gitignore
AI_AGENT_DEV_SPEC.md		AI_AGENT_DEV_SPEC.md
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
README.md		README.md
README.zh-CN.md		README.zh-CN.md
docker-compose.yml		docker-compose.yml
todo.md		todo.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

fastsearch

Architecture

Five-minute quickstart

Modules (workspace crates)

Build & test

Documentation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

fastsearch

Architecture

Five-minute quickstart

Modules (workspace crates)

Build & test

Documentation

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages