A web search microservice and library that aggregates results from 40 providers (search engines plus knowledge, paper, and code APIs), with intelligent result merging and reranking. It ships as two first-class implementations, JavaScript (@link-assistant/web-search) and Rust (the web-search crate), that stay in lock-step: the same provider catalog, categories, merge strategies, and CLI/HTTP surface in both languages.
- Many providers, four categories: 40 providers grouped into search, knowledge, papers, and code, a superset of FormalAI's web_search_core registry (see Search Providers and the issue #5 compatibility map).
- Descriptor-driven catalog: Engines are declared as data (URL, request kind, parser) and run through one shared GenericProvider, so adding an engine in one place adds it everywhere.
- web-capture component: JavaScript can lazily load @link-assistant/web-capture, and Rust delegates wc:* providers to the published web-capture crate.
- Result merging: Combine results using RRF, weighted scoring, or interleaving.
- Configurable weights: Adjust provider weights for custom reranking.
- URL deduplication: Automatic normalization and deduplication across providers.
- Typed provider registry: A single source of truth powering provider discovery (CLI --list-providers, HTTP /providers, /categories) and provider instantiation.
- Dual language parity: Identical behavior and an extensive shared test suite across JavaScript and Rust.
- Multi-runtime support: The JavaScript build works with Bun, Node.js, and Deno.
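
The URL deduplication feature listed above can be pictured with a small sketch. The normalization rules here (dropping the fragment, a leading www., utm_* parameters, and trailing slashes) are assumptions chosen for illustration, and normalizeUrl and dedupeByUrl are hypothetical names; the library's real rules may differ.

```javascript
// Hypothetical normalization rules; the library's actual rules may differ.
function normalizeUrl(rawUrl) {
  const url = new URL(rawUrl); // the URL parser lowercases the host for us
  url.hostname = url.hostname.replace(/^www\./, '');
  // Drop utm_* tracking parameters (an assumed list), then sort the rest
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith('utm_')) url.searchParams.delete(key);
  }
  url.searchParams.sort();
  const path = url.pathname.replace(/\/+$/, ''); // ignore trailing slashes
  const query = url.searchParams.toString();
  // The fragment is left out on purpose: it never reaches the server
  return `${url.protocol}//${url.host}${path}${query ? `?${query}` : ''}`;
}

// Keep the first result seen for each normalized URL.
function dedupeByUrl(results) {
  const seen = new Set();
  return results.filter((result) => {
    const key = normalizeUrl(result.url);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Keeping the first occurrence means the provider that is merged first decides which title and snippet survive for a shared URL.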
web-search ships the same three entry points in both languages. Every entry
point is available from the published packages — no Git checkout or path
dependency required.
| Entry point | JavaScript (@link-assistant/web-search) | Rust (web-search crate) |
|---|---|---|
| Library | import { createSearchEngine } from '@link-assistant/web-search' | use web_search::WebSearchEngine; |
| CLI | npx web-search "query" (bin web-search) | web-search "query" (binary web-search) |
| HTTP server | npx web-search serve --port 3000 | web-search serve --port 3000 |
- npm package: @link-assistant/web-search
- crates.io crate: web-search
- GitHub releases: js-v* (JavaScript) and rust-v* (Rust)
# Install the latest published version
npm install @link-assistant/web-search # npm
bun add @link-assistant/web-search # bun
yarn add @link-assistant/web-search # yarn
# Pin a specific published version (recommended for CI / reproducible builds)
npm install @link-assistant/web-search@0.9.0

# Add the latest published crate
cargo add web-search
# Pin a specific published version (recommended for CI / reproducible builds)
cargo add web-search@0.2.0
# Or install the CLI/server binary directly from crates.io
cargo install web-search # latest
cargo install web-search@0.2.0 # pinned

The pinned versions above match the current published baseline (npm @link-assistant/web-search@0.9.0, crates.io web-search@0.2.0). Replace them with the latest tags shown on the badges at the top of this README.
import {
WebSearchEngine,
createSearchEngine,
} from '@link-assistant/web-search';
// Create a search engine
const engine = createSearchEngine();
// Search across all providers
const results = await engine.search('artificial intelligence');
// Search with options
const tunedResults = await engine.search('machine learning', {
limit: 20,
providers: ['google', 'duckduckgo'],
strategy: 'rrf',
weights: { google: 1.5, duckduckgo: 1.0 },
});
// Search single provider
const googleResults = await engine.searchSingle('deep learning', 'google');

# Start the server
npx web-search serve --port 3000
# Or with bun
bunx web-search serve --port 3000

API Endpoints:
- GET /search?q=<query> - Search all providers
- POST /search - Search with options in body
- GET /search/:provider?q=<query> - Search single provider
- GET /providers - List available providers and the typed registry (filter with ?category=<search|knowledge|papers|code>)
- GET /categories - List provider ids grouped by category
- GET /health - Health check
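
A client can assemble these GET URLs programmatically instead of concatenating strings by hand. The sketch below is a minimal illustration: buildSearchUrl is a hypothetical helper, not part of the package. It uses only the q, limit, and strategy parameters that appear in this README's curl example, and the base URL is whatever host and port the server runs on.

```javascript
// Hypothetical helper: builds a GET /search URL with properly encoded parameters.
function buildSearchUrl(baseUrl, query, { limit, strategy } = {}) {
  const url = new URL('/search', baseUrl);
  url.searchParams.set('q', query); // URLSearchParams handles the encoding
  if (limit !== undefined) url.searchParams.set('limit', String(limit));
  if (strategy !== undefined) url.searchParams.set('strategy', strategy);
  return url.toString();
}
```

The returned string can be passed straight to fetch, for example fetch(buildSearchUrl('http://localhost:3000', 'rust programming', { limit: 10 })).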
Example:
curl "http://localhost:3000/search?q=rust+programming&limit=10&strategy=rrf"
# Only the scholarly-paper providers
curl "http://localhost:3000/providers?category=papers"
# Provider ids per category
curl "http://localhost:3000/categories"

# Search from command line
npx web-search "artificial intelligence"
# With options
npx web-search "machine learning" --limit 20 --providers google,bing --format json
# Search category-specific providers
npx web-search "transformer architecture" --providers arxiv,crossref,openalex
# Output just URLs
npx web-search "deep learning" --format urls
# Discover every available provider, grouped by category
npx web-search --list-providers

RRF (Reciprocal Rank Fusion) is the default strategy. It combines results by their rank positions across providers.
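
Reciprocal rank fusion is commonly defined as summing 1 / (k + rank) over every list in which a result appears. The sketch below illustrates that formula with k = 60, the rrfK value used in the merge example in this README. The input shape and the name reciprocalRankFusion are assumptions for illustration, not the library's internals.

```javascript
// resultsByProvider: { providerName: [{ url, ... }, ...] }, each list in rank order.
// This input shape is assumed for the sketch.
function reciprocalRankFusion(resultsByProvider, k = 60) {
  const fused = new Map(); // url -> { result, score }
  for (const results of Object.values(resultsByProvider)) {
    results.forEach((result, index) => {
      const rank = index + 1; // ranks are 1-based
      const entry = fused.get(result.url) ?? { result, score: 0 };
      entry.score += 1 / (k + rank);
      fused.set(result.url, entry);
    });
  }
  // Highest fused score first
  return [...fused.values()]
    .sort((a, b) => b.score - a.score)
    .map(({ result, score }) => ({ ...result, score }));
}
```

A URL returned by several providers accumulates one term per provider, so agreement between engines tends to outrank a single top position.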
const results = await engine.search(query, { strategy: 'rrf' });

Score results based on provider weights and rank positions.
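
The README does not state the exact weighted formula, so the sketch below assumes one plausible choice: each appearance contributes weight / rank, and providers without an explicit weight default to 1. Both the formula and the name weightedMerge are illustrative assumptions.

```javascript
// Assumed scoring rule: score += providerWeight / rank (1-based).
function weightedMerge(resultsByProvider, weights = {}) {
  const scored = new Map(); // url -> { result, score }
  for (const [provider, results] of Object.entries(resultsByProvider)) {
    const weight = weights[provider] ?? 1; // unlisted providers default to 1
    results.forEach((result, index) => {
      const entry = scored.get(result.url) ?? { result, score: 0 };
      entry.score += weight / (index + 1);
      scored.set(result.url, entry);
    });
  }
  // Highest weighted score first
  return [...scored.values()]
    .sort((a, b) => b.score - a.score)
    .map(({ result, score }) => ({ ...result, score }));
}
```

Under this rule a provider with weight 2.0 counts twice as much at every rank as one with weight 1.0.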
const results = await engine.search(query, {
strategy: 'weighted',
weights: { google: 2.0, duckduckgo: 1.0, bing: 0.5 },
});

Round-robin style interleaving of results from each provider.
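
A minimal sketch of round-robin interleaving, assuming each provider returns a ranked list and that a URL already emitted is skipped. The function name and the skip-duplicates behavior are assumptions for illustration.

```javascript
// Take the 1st result of every provider, then the 2nd of every provider, and so on.
function interleave(resultsByProvider) {
  const lists = Object.values(resultsByProvider);
  const longest = Math.max(0, ...lists.map((list) => list.length));
  const seen = new Set();
  const merged = [];
  for (let position = 0; position < longest; position += 1) {
    for (const list of lists) {
      const result = list[position]; // undefined once a list is exhausted
      if (result && !seen.has(result.url)) {
        seen.add(result.url);
        merged.push(result);
      }
    }
  }
  return merged;
}
```

Provider order in the input object decides who goes first within each round.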
const results = await engine.search(query, { strategy: 'interleave' });

Providers are organized into the four categories formal-ai consumes, and the catalog is a superset of FormalAI's web_search_core registry (issue #5; see the compatibility map).
Run npx web-search --list-providers (or cargo run -- --list-providers from
rust/) to print the live catalog; both languages report the same 40
providers.
| Category | Providers | Access |
|---|---|---|
| search | google, bing, duckduckgo, searx, brave, mojeek, ecosia, startpage, yahoo, yandex, lite (DuckDuckGo Lite), wc:* | API / hybrid / HTML / component |
| knowledge | wikipedia, wikidata, wiktionary, wikinews, internet-archive, dbpedia, openlibrary, semantic-scholar, openalex, crossref, cambridge-dictionary, merriam-webster, dictionary-com, collins-dictionary | API / HTML |
| papers | arxiv, europepmc, doaj | API (CORS-readable) |
| code | github, hackernews, gitlab, codeberg, gitee, bitbucket, gitflic | API (CORS-readable) |
Native search providers are listed above. The optional wc:* providers are
the same engines delegated through the web-capture component.
- api providers call a JSON/Atom endpoint directly.
- html providers scrape a search-results page with a per-engine regex through the shared anchor-list parser (the search engines, plus the dictionary knowledge providers, which resolve a headword to its canonical entry page).
- hybrid providers (google, bing) use an official API when credentials are configured and fall back to scraping otherwise.
- component providers (wc:*) are backed by the optional @link-assistant/web-capture library (see web-capture component).
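
The hybrid fallback described above can be sketched as a small decision function. The descriptor fields (kind, requiredKeys) and the name chooseAccessMode are hypothetical; only the credential names come from this README's environment variable list.

```javascript
// Hypothetical descriptors; the field names are illustrative only.
const googleDescriptor = {
  id: 'google',
  kind: 'hybrid',
  requiredKeys: ['GOOGLE_API_KEY', 'GOOGLE_SEARCH_ENGINE_ID'],
};
const arxivDescriptor = { id: 'arxiv', kind: 'api', requiredKeys: [] };

// Hybrid providers use the API only when every credential is present;
// otherwise they fall back to HTML scraping. Other kinds pass through.
function chooseAccessMode(descriptor, credentials = {}) {
  if (descriptor.kind !== 'hybrid') return descriptor.kind;
  const configured = descriptor.requiredKeys.every((key) => Boolean(credentials[key]));
  return configured ? 'api' : 'html';
}
```

In a Node.js process the credentials argument could simply be process.env.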
The category defaults follow FormalAI's DuckDuckGo-first plan: duckduckgo
(search), wikipedia (knowledge), arxiv (papers), and github (code). When
no providers are requested, the live default plan is duckduckgo,
internet-archive, wikipedia, wikidata, wiktionary, wikinews.
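
The fallback to the default plan can be illustrated as follows. The library exposes getDefaultProviderIds for this purpose; the sketch below only mirrors the behavior described above, and parseProviderList and resolveProviderIds are hypothetical names.

```javascript
// The live default plan as listed in this README.
const DEFAULT_PLAN = [
  'duckduckgo',
  'internet-archive',
  'wikipedia',
  'wikidata',
  'wiktionary',
  'wikinews',
];

// Parse a comma-separated list such as the CLI's --providers value.
function parseProviderList(value = '') {
  return value
    .split(',')
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
}

// Use the requested ids when there are any, otherwise the default plan.
function resolveProviderIds(requested = []) {
  return requested.length > 0 ? [...requested] : [...DEFAULT_PLAN];
}
```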
GITHUB_TOKEN is optional but raises the GitHub search rate limit when set.
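
As a rough illustration of what an optional token means in practice, the sketch below builds request headers for the GitHub REST API and adds an Authorization header only when a token is present. The helper name githubHeaders is hypothetical, and how the library itself reads the variable is not shown in this README.

```javascript
// Hypothetical helper: unauthenticated requests still work, just with a lower rate limit.
function githubHeaders(env = {}) {
  const headers = { Accept: 'application/vnd.github+json' };
  if (env.GITHUB_TOKEN) {
    headers.Authorization = `Bearer ${env.GITHUB_TOKEN}`;
  }
  return headers;
}
```

Calling githubHeaders(process.env) in Node.js picks up the token from the environment when it is set.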
import {
GoogleProvider,
BingProvider,
DuckDuckGoProvider,
} from '@link-assistant/web-search';
// Google: Custom Search API when configured, scraping fallback otherwise
const google = new GoogleProvider({
apiKey: 'your-api-key',
searchEngineId: 'your-cx-id',
});
// Bing: Web Search API when configured, scraping fallback otherwise
const bing = new BingProvider({ apiKey: 'your-bing-api-key' });
// DuckDuckGo: HTML scraping, no API key required
const duckduckgo = new DuckDuckGoProvider();

Every other engine in the table is declared as a descriptor (id, request kind, parser) and instantiated through a single GenericProvider. The registry can build the whole catalog so you can pick any provider by id:
import { buildProviders, API_ENGINES } from '@link-assistant/web-search';
// Instantiate the full catalog (Map<id, provider>) and select one
const arxiv = buildProviders().get('arxiv');
const results = await arxiv.search('graph neural networks', { limit: 5 });
// Or build directly from a descriptor
import { createGenericProvider } from '@link-assistant/web-search';
const crossref = createGenericProvider(
API_ENGINES.find((d) => d.id === 'crossref')
);

Some engines can also be backed by the optional @link-assistant/web-capture component library, exposed through the wc:* provider ids (wc:wikipedia, wc:duckduckgo, wc:google, wc:bing, wc:brave). The dependency is loaded lazily; when it is not installed, the provider warns once and returns an empty result set so the rest of the aggregation keeps working. You can also inject a custom implementation for testing:
import { createWebCaptureProvider } from '@link-assistant/web-search';
const provider = createWebCaptureProvider({
engine: 'wikipedia',
// Optional: inject a fetch/search implementation (defaults to @link-assistant/web-capture)
searchImpl: async (query, options) => [
/* { title, url, snippet } */
],
});

A typed registry is the single source of truth for discovery and instantiation:
import {
CATEGORIES, // ['search', 'knowledge', 'papers', 'code']
getRegistry, // full provider metadata
getProviderIds, // ids, optionally filtered by category
getDefaultProviderIds, // ids used when none are specified
buildProviders, // instantiate the whole catalog
} from '@link-assistant/web-search';
getProviderIds('papers'); // ['arxiv', 'europepmc', 'doaj']

const engine = new WebSearchEngine(config);
// Search methods
await engine.search(query, options);
await engine.searchSingle(query, providerName, options);
// Provider management
engine.getAvailableProviders();
engine.getProviderStatus();
engine.setProviderWeight(name, weight);
engine.setProviderEnabled(name, enabled);
engine.getProvider(name);

import {
mergeResults,
mergeWithRRF,
mergeWithWeights,
mergeWithInterleave,
} from '@link-assistant/web-search';
// Merge results from multiple providers
const merged = mergeResults(resultsByProvider, {
strategy: 'rrf',
weights: { google: 1.5 },
rrfK: 60,
removeDuplicates: true,
});

A first-class Rust implementation lives in the rust/ directory (crate web-search). It mirrors the JavaScript library: the same descriptor-driven catalog, the same typed registry, the same four categories, and the same 40 providers, verified by a shared test suite (cargo test).
cd rust
cargo build --release

# Search
./target/release/web-search "artificial intelligence" --limit 10
# Category-specific providers
./target/release/web-search "graph neural networks" --providers arxiv,crossref
# List every available provider, grouped by category (matches the JS CLI)
./target/release/web-search --list-providers
# Start server (GET /search, /providers, /categories, /health)
./target/release/web-search serve --port 3000

// MergeOptions is used below, so it is imported with the other types
use web_search::{MergeOptions, MergeStrategy, SearchOptions, WebSearchEngine};

let engine = WebSearchEngine::new();
let results = engine.search_with_options(
    "machine learning",
    SearchOptions { limit: Some(10), ..Default::default() },
    None,
    Some(MergeOptions { strategy: MergeStrategy::Rrf, ..Default::default() }),
).await?;

Language-specific project files live under js/ and rust/; repository-level documentation and workflow metadata stay at the root. CI/CD helper scripts live with their language: js/scripts/ and rust/scripts/.
cd js
# Install dependencies
bun install
# Run tests
bun test
# Run with other runtimes
npm test
deno test --allow-read --allow-env --allow-net
# Lint code
bun run lint
# Format code
bun run format
# Verify JavaScript/Rust layout and provider parity
cd ..
node js/scripts/check-js-rust-parity.mjs

cd rust
# Run tests
cargo test
# Run clippy
cargo clippy
# Format code
cargo fmt
# Run Rust CI/CD guard scripts from the repository root
cd ..
rust-script rust/scripts/check-file-size.rs --rust-root rust
rust-script rust/scripts/check-crate-size.rs --rust-root rust

- GOOGLE_API_KEY - Google Custom Search API key
- GOOGLE_SEARCH_ENGINE_ID - Google Custom Search Engine ID
- BING_API_KEY - Bing Web Search API key
Unlicense - Public Domain