Privacy-first, unlimited, zero-cost neural search for AI agents
A high-precision Model Context Protocol (MCP) server that provides deep investigation tools for LLMs. Built on the Investigator Pattern: rich metadata and deep content extraction let models (Claude, Gemini, GPT-4) perform factually grounded research without complex prompting.
🇧🇷 Versão em Português (Portuguese version)
| Feature | Exa Search | WIE - Web Investigator Engine |
|---|---|---|
| Cost | Paid tier | Free & open-source |
| Privacy | Data may be logged | Zero-logging, self-hosted |
| Customization | Limited | Full source access |
| Infrastructure | External API dependency | Run on your own hardware |
| Transparency | Proprietary black box | Fully transparent |
| Speed | Fast | Tuned for LLM efficiency |
```
┌──────────────────────────────────────────────────────────────────────┐
│                              MCP SERVER                              │
│                        (Investigator Pattern)                        │
│                                                                      │
│  ┌────────────────┐  ┌───────────────────────┐  ┌────────────────┐   │
│  │  web_search()  │  │ web_search_advanced() │  │  site_search() │   │
│  │  (Discovery)   │  │   (Full Exa parity)   │  │  (Definitive)  │   │
│  └────────────────┘  └───────────────────────┘  └────────────────┘   │
│                                                                      │
│  ┌────────────────┐  ┌───────────────────────┐                       │
│  │  fetch_page()  │  │    get_contents()     │                       │
│  │  (Single URL)  │  │ (Batch + Highlights)  │                       │
│  └────────────────┘  └───────────────────────┘                       │
│                                                                      │
│  ┌───────────────────────────┐                                       │
│  │ answer()  (Extractive QA) │                                       │
│  └───────────────────────────┘                                       │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                            BACKEND LAYER                             │
│                                                                      │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────────────────┐ │
│  │    SearxNG    │  │   FlashRank   │  │    Content Extraction     │ │
│  │  Multi-Engine │  │   Reranking   │  │ curl_cffi→nodriver→httpx  │ │
│  └───────────────┘  └───────────────┘  └───────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
```
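The Content Extraction box is a fallback chain: each backend is tried in order until one returns usable HTML. A minimal sketch of that pattern, with plain callables standing in for the real curl_cffi/nodriver/httpx fetchers (the function names here are illustrative, not the server's actual API):

```python
from typing import Callable, Optional

def fetch_with_fallback(url: str,
                        backends: list[Callable[[str], Optional[str]]]) -> Optional[str]:
    """Try each extraction backend in order; return the first non-empty result."""
    for backend in backends:
        try:
            html = backend(url)
            if html:           # treat an empty/None result as a soft failure
                return html
        except Exception:      # a hard failure also falls through to the next backend
            continue
    return None                # every backend failed

# Stand-in backends for demonstration only
def flaky(url: str) -> Optional[str]:
    raise TimeoutError("blocked")

def works(url: str) -> Optional[str]:
    return "<html>ok</html>"

print(fetch_with_fallback("https://example.com", [flaky, works]))  # <html>ok</html>
```

The ordering matters: the cheapest backend goes first, and a headless browser (nodriver) is reserved for pages that defeat plain HTTP clients.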
| Type | Description | Use Case |
|---|---|---|
| `auto` | Best overall: balanced speed/quality | General research, first-pass discovery |
| `fast` | Instant results, minimal processing | Quick facts, single-engine, no rerank |
| `instant` | Sub-second, ultra-lean results | "I'm feeling lucky": top 3 results only |
| `deep_lite` | Light deep research: 3 query variations | Background research, initial investigation |
| `deep` | Full deep research: 5 query variations | Comprehensive reports, detailed analysis |
| `deep_reasoning` | Multi-step chain-of-thought: 7+ variations | Complex investigations, multi-perspective synthesis |
Choose the right search type based on your speed/quality needs:
| Type | Speed | Queries | Rerank | Best For |
|---|---|---|---|---|
| `instant` | ⚡⚡⚡ Very fast | 1 | No | Quick facts, "I'm feeling lucky" |
| `fast` | ⚡⚡ Fast | 1 | No | General quick search |
| `auto` | ⚡ Balanced | 1 | Yes | Default, recommended |
| `deep_lite` | 🐢 Slow | 3 | Yes | Background research |
| `deep` | 🐢🐢 Slower | 5 | Yes | Comprehensive reports |
| `deep_reasoning` | 🐢🐢🐢 Slowest | 7+ | Yes | Complex investigations |
For maximum speed, edit your `.env` file:

```env
# Maximum speed (1 engine, 5s timeout, fast mode)
SEARXNG_ENGINES=google
SEARCH_TIMEOUT_SECONDS=5
SEARCH_DEFAULT_TYPE=fast
```

For maximum quality, use deep modes in your calls:

```python
web_search_advanced({
    "query": "your topic",
    "type": "deep",  # or "deep_reasoning"
    "numResults": 20
})
```

Filter results by content type for more targeted research:
| Category | Description | Best For |
|---|---|---|
| `general` | General web content | Broad topics |
| `news` | News articles and outlets | Current events, breaking news |
| `research_paper` | Scholarly articles, arXiv | Academic research, citations |
| `company` | Business sites, org pages | Company profiles, business info |
| `people` | Biographies, profiles, social media | Person lookup, biographical info |
| `financial_report` | SEC filings, earnings, PDFs | Investment research, financial analysis |
| `product` | Product pages, e-commerce | Product specifications, reviews |
| `personal_site` | Blogs, portfolios, indie sites | Expert opinions, personal insights |
| `code` | GitHub, Stack Overflow, docs | Code examples, documentation |
| `video` | Video content | Tutorials, visual demonstrations |
| `image` | Images and visual content | Visual references, diagrams |
Basic multi-engine search with authority tier scoring.
```python
{
    "query": "latest Python release 2025",
    "time_range": "day",   # hour, day, week, month, year
    "categories": "news",  # general, news, images, videos, it, science
    "safesearch": "0",     # 0=off, 1=moderate, 2=strict
    "limit": 10            # 1-20 results
}
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | Required | Search query string |
| `time_range` | string | null | Filter: hour, day, week, month, year |
| `categories` | string | null | Category filter |
| `safesearch` | string | null | Safe search: 0, 1, 2 |
| `limit` | int | 10 | Results 1-20 |
Full-featured search with all filter options, highlights, and summaries.
```python
{
    "query": "OpenAI GPT-5 release date",
    "type": "deep",
    "numResults": 20,
    "category": "news",
    "includeDomains": ["reuters.com", "bloomberg.com"],
    "startPublishedDate": "2025-01-01",
    "enableHighlights": True,
    "enableSummary": True
}
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | Required | Search query |
| `type` | SearchType | `"auto"` | Search type: auto, fast, instant, deep_lite, deep, deep_reasoning |
| `numResults` | int | 10 | Result count (1-100) |
| `category` | Category | null | Category filter |
| `includeDomains` | list[string] | null | Required domains |
| `excludeDomains` | list[string] | null | Blocked domains |
| `startPublishedDate` | string | null | ISO date: published after |
| `endPublishedDate` | string | null | ISO date: published before |
| `startCrawlDate` | string | null | ISO date: crawled after |
| `endCrawlDate` | string | null | ISO date: crawled before |
| `includeText` | list[string] | null | Required phrases in page |
| `excludeText` | list[string] | null | Excluded phrases |
| `userLocation` | object | null | `{"country": "US", "city": "NYC"}` |
| `safesearch` | int | 0 | 0=off, 1=moderate, 2=strict |
| `enableHighlights` | bool | true | Include query-matched highlights |
| `highlight_sentences` | int | 3 | Sentences per highlight (1-10) |
| `enableSummary` | bool | false | Include extractive summary |
| `additionalQueries` | bool | true | Enable query expansion for deep modes |
Fetch multiple URLs simultaneously with highlights and summaries.
```python
{
    "urls": [
        "https://arxiv.org/abs/2401.04012",
        "https://github.com/openai/gpt-5"
    ],
    "highlight_query": "GPT-5 architecture capabilities",
    "highlight_sentences": 5,
    "enableSummary": True,
    "max_tokens": 8000
}
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `urls` | list[string] | Required | URLs to fetch (1-20) |
| `highlight_query` | string | null | Query for highlight extraction |
| `highlight_sentences` | int | 3 | Sentences per highlight |
| `enableSummary` | bool | false | Include extractive summary |
| `max_tokens` | int | 8000 | Per-URL token budget (500-128000) |
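The `max_tokens` budget caps how much of each page reaches the model. A rough sketch of per-URL budgeting, using whitespace tokens as a stand-in for the server's actual tokenizer (an assumption; real token counts will differ):

```python
def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Keep at most max_tokens whitespace-delimited tokens of extracted text."""
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text
    return " ".join(tokens[:max_tokens]) + " [truncated]"

print(truncate_to_budget("one two three four five", 3))  # one two three [truncated]
```

Truncating per URL rather than per batch keeps one very long page from starving the other sources of context.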
Extract answers directly from source documents.
```python
{
    "query": "What is the main contribution of this paper?",
    "urls": ["https://arxiv.org/abs/2401.04012"]
}
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | Required | Question to answer |
| `urls` | list[string] | Required | Source URLs (1-20) |
Search within a specific domain for authoritative results.
```python
{
    "query": "async io release notes",
    "site": "docs.python.org"
}
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | Required | Search query |
| `site` | string | Required | Target domain (e.g., github.com, docs.rs) |
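Domain restriction is commonly implemented by composing a `site:` operator into the underlying query; a minimal sketch of that composition (an assumption about the mechanism, not a confirmed detail of this server):

```python
def compose_site_query(query: str, site: str) -> str:
    """Prefix the query with a site: operator understood by most search engines."""
    return f"site:{site} {query}"

print(compose_site_query("async io release notes", "docs.python.org"))
# site:docs.python.org async io release notes
```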
Extract clean markdown content from a single URL.
```python
{
    "url": "https://docs.python.org/3/whatsnew/3.12.html",
    "max_tokens": 16000
}
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | Required | Full page URL |
| `max_tokens` | int | null | Token budget override (500-128000) |
```shell
git clone https://github.com/your-user/SearchEngineLLM.git
cd SearchEngineLLM
cp .env.example .env
# Edit .env: change SEARXNG_SECRET to a secure random string
docker compose up -d
```

This server supports both STDIO (local) and HTTP (remote) transports.
STDIO (local):

```shell
python -m src.server
```

```json
{
  "mcpServers": {
    "investigator": {
      "command": "python",
      "args": ["-m", "src.server"]
    }
  }
}
```

HTTP (remote):

```shell
python -m src.server http
# Server runs at http://localhost:8000/mcp
```

```json
{
  "mcpServers": {
    "investigator": {
      "url": "http://localhost:8000/mcp"
    }
  }
}
```

File: configs/claude_desktop.json
```json
{
  "mcpServers": {
    "investigator": {
      "command": "python",
      "args": ["-m", "src.server"],
      "env": {}
    }
  }
}
```

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`.
File: configs/cursor.json
```json
{
  "mcpServers": {
    "investigator": {
      "command": "python",
      "args": ["-m", "src.server"]
    }
  }
}
```

Settings → MCP → Add new server
File: configs/zed.json
```json
{
  "mcpServers": {
    "investigator": {
      "command": "python",
      "args": ["-m", "src.server"]
    }
  }
}
```

Add to `.zed/config.json`.
File: configs/windsurf.json
```json
{
  "mcpServers": {
    "investigator": {
      "command": "python",
      "args": ["-m", "src.server"]
    }
  }
}
```

Settings → MCP → Add new server
File: configs/vscode.json
```json
{
  "mcpServers": {
    "investigator": {
      "command": "python",
      "args": ["-m", "src.server"]
    }
  }
}
```

Add to `.vscode/mcp.json`.
```shell
python -m src.server http
```

File: configs/lm-studio.json

```json
{
  "mcpServers": {
    "investigator": {
      "url": "http://localhost:8000/mcp"
    }
  }
}
```

```python
web_search({
    "query": "latest SpaceX Starship launch",
    "time_range": "day"
})
```
```python
web_search_advanced({
    "query": "impact of LLMs on software development",
    "type": "deep",
    "category": "research_paper",
    "numResults": 20,
    "enableHighlights": true,
    "enableSummary": true
})
```

```python
get_contents({
    "urls": [
        "https://arxiv.org/abs/2401.04012",
        "https://github.com/anthropic/claude-code"
    ],
    "highlight_query": "LLM code generation capabilities",
    "enableSummary": true
})
```

```python
answer({
    "query": "What is the context window size for Claude 3.5?",
    "urls": ["https://docs.anthropic.com/en/docs/about-claude/all-releases"]
})
```

```python
site_search({
    "query": "async io concurrency",
    "site": "docs.python.org"
})
```

```python
web_search_advanced({
    "query": "Tesla stock performance 2024",
    "type": "deep",
    "category": "financial_report",
    "startPublishedDate": "2024-01-01"
})
```

```python
web_search_advanced({
    "query": "machine learning transformers attention",
    "type": "deep",
    "category": "research_paper",
    "includeDomains": ["arxiv.org", "papers.nips.cc"]
})
```

```python
web_search_advanced({
    "query": "John Carmack career biography",
    "type": "deep",
    "category": "people"
})
```
Ready-to-use research prompts for autonomous investigation.
You are a professional company researcher. Investigate [COMPANY NAME] using the following approach:
1. Use web_search_advanced with category="company" to find official sources
2. Search for recent news, financial reports, and official statements
3. Use get_contents to extract detailed information from their website and press releases
4. Use answer to extract key facts about products, leadership, and recent developments
5. Compile findings into a comprehensive company profile with:
- Company overview and mission
- Recent performance and news
- Leadership and key personnel
- Products or services offered
- Financial position (if public)
You are a professional investigator specializing in people search. Find comprehensive information about [PERSON NAME] by:
1. Use web_search_advanced with category="people" for biographical sources
2. Search for professional profiles (LinkedIn, Crunchbase), Wikipedia, and personal websites
3. Use get_contents to extract detailed biographical information
4. Use answer to extract career history, achievements, and notable facts
5. Return a detailed profile including:
- Professional background and career
- Notable achievements and contributions
- Current affiliation
- Educational background
You are an academic researcher. Conduct a thorough investigation of [TOPIC/PAPER] by:
1. Use web_search_advanced with category="research_paper", targeting arxiv.org and academic databases
2. Search for the paper title, authors, and key concepts
3. Use get_contents to extract the full paper content
4. Use answer to extract:
- Main contribution and innovations
- Methodology used
- Key results and conclusions
- Limitations and future work
5. Return a structured academic summary with citations
You are a financial analyst. Analyze [COMPANY/TOPIC] financial reports by:
1. Use web_search_advanced with category="financial_report" to find SEC filings, earnings reports
2. Search for annual reports (10-K), quarterly reports (10-Q), and investor presentations
3. Use get_contents to extract detailed financial data
4. Use answer to extract key financial metrics, ratios, and narrative
5. Return a financial summary with:
- Revenue and profit trends
- Key financial ratios
- Significant events or changes
- Investment highlights and risks
You are a senior software engineer. Investigate [LIBRARY/FRAMEWORK/API] for code context by:
1. Use web_search_advanced with category="code", targeting GitHub, StackOverflow, and official docs
2. Search for usage examples, tutorials, and API documentation
3. Use get_contents to extract code examples and official documentation
4. Use answer to extract:
- API signatures and parameters
- Common usage patterns
- Best practices and gotchas
- Version compatibility information
5. Return comprehensive code documentation with examples
You are a senior investigative researcher. Conduct a deep investigation of [COMPLEX TOPIC] using chain-of-thought reasoning:
1. Use web_search_advanced with type="deep_reasoning" for comprehensive multi-perspective analysis
2. Generate multiple query variations to explore different angles:
- Historical background and origins
- Current state and recent developments
- Key stakeholders and perspectives
- Controversies and debates
- Future implications and predictions
3. Use get_contents to extract detailed information from authoritative sources
4. Use answer to synthesize findings across multiple sources
5. Cross-reference claims and identify consensus vs. disputed points
6. Return a comprehensive investigation report with:
- Executive summary
- Detailed findings from each perspective
- Evidence and citations
- Areas of consensus and dispute
- Implications and recommendations
Results are classified by source reliability:
| Tier | Description | Examples |
|---|---|---|
| 🟢 Tier 1 | Definitive / Official | docs.python.org, github.com, .gov, .edu |
| 🔵 Tier 2 | Authoritative | Wikipedia, Stack Overflow, arXiv |
| 🟡 Tier 3 | Reference | Tech blogs, news outlets, publications |
| ⚪ Tier 4 | Other | Reddit, generic blogs, SEO content |
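A sketch of how tier scoring might be keyed off the result's domain; the table above is the source of truth, and the domain lists here are abbreviated examples, not the server's real configuration:

```python
from urllib.parse import urlparse

# Abbreviated example lists; the real classifier is presumably far larger.
TIER_1 = {"docs.python.org", "github.com"}
TIER_2 = {"en.wikipedia.org", "stackoverflow.com", "arxiv.org"}

def authority_tier(url: str) -> int:
    """Map a result URL to its authority tier by hostname."""
    host = urlparse(url).netloc.lower()
    if host in TIER_1 or host.endswith((".gov", ".edu")):
        return 1
    if host in TIER_2:
        return 2
    return 4  # unknown sources default to the lowest tier in this sketch

print(authority_tier("https://docs.python.org/3/"))  # 1
print(authority_tier("https://example.com/post"))    # 4
```

Tier 3 (known tech blogs and news outlets) is omitted here for brevity; it would be one more membership list checked before the default.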
```
SearchEngineLLM/
├── src/
│   ├── server.py                  # MCP server entry point
│   ├── config.py                  # Pydantic settings & configuration
│   ├── models.py                  # Request/response schemas
│   ├── errors.py                  # Error classes
│   └── tools/
│       ├── web_search.py          # Basic discovery search
│       ├── web_search_advanced.py # Full Exa-parity advanced search
│       ├── web_fetch.py           # Single URL content extraction
│       ├── get_contents.py        # Batch content + highlights
│       ├── site_search.py         # Domain-specific search
│       └── answer.py              # Extractive Q&A
├── configs/                       # MCP client configurations
├── docs/
│   ├── superpowers/
│   └── specs/                     # Design specifications
├── docker-compose.yml             # Docker Compose (MCP + SearxNG)
├── Dockerfile                     # Container definition
├── requirements.txt               # Python dependencies
└── .env.example                   # Environment template
```
```shell
# Start all services
docker compose up -d

# View logs
docker compose logs -f

# Stop services
docker compose down
```

Services:
- `wie-mcp-server`: WIE MCP server on port 8000
- `wie-searxng`: Meta-search engine on port 8080
- SSRF protection: DNS validation, private IP blocking
- URL validation: Scheme and netloc required
- API key middleware: Optional Bearer token auth
- Zero logging: No user data stored or logged
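The SSRF protection bullet (private IP blocking) can be sketched with the standard-library `ipaddress` module. Real deployments also have to pin DNS resolution so a hostname cannot re-resolve to a private address after validation (DNS rebinding), which this sketch omits:

```python
import ipaddress

def is_blocked_ip(ip_text: str) -> bool:
    """Reject private, loopback, and link-local addresses before fetching."""
    ip = ipaddress.ip_address(ip_text)
    return ip.is_private or ip.is_loopback or ip.is_link_local

print(is_blocked_ip("127.0.0.1"))     # True  (loopback)
print(is_blocked_ip("192.168.1.5"))   # True  (RFC 1918 private range)
print(is_blocked_ip("93.184.216.34")) # False (public address)
```

In practice the check runs on every address a hostname resolves to, not just the first, since an attacker can mix public and private records.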
GNU Affero General Public License v3 (AGPLv3); see LICENSE.
Copyright (C) 2025-2026 Jonathan Lima
- Check if the container is running: `docker ps`
- Check logs: `docker logs wie-mcp-server`
- Use the `/mcp` URL (Streamable HTTP), not `/sse`
- Verify the URL is correct: it must end with `/mcp`
To access from another computer, replace `localhost` with the host IP:

```json
{
  "mcpServers": {
    "investigator": {
      "url": "http://192.168.1.100:8000/mcp"
    }
  }
}
```

In docker-compose.yml, add:

```yaml
environment:
  - SEARXNG_HOST=http://searxng:8080
  - API_KEY=your-key-here
```

To change the port (e.g., 8090):
```yaml
# docker-compose.yml
ports:
  - "8090:8000"
```

Client configuration:

```json
"url": "http://localhost:8090/mcp"
```