Omni-Search-Skill is a full-stack search and retrieval skill for agentic workflows.
Its vision is simple: no-blind-spot, high-speed search and fetching across the public web.
It combines search, fetch, search-then-fetch, and crawl into one skill with a unified output shape and provider routing layer.
- Searches the live web through multiple providers
- Fetches a specific page as clean Markdown
- Resolves a query into top search hits and fetches the best page(s)
- Crawls a site for relevant pages when a docs map or content graph is needed
- Routes automatically between local, free, and paid providers
- Detects junk content (captcha, JS-required pages) and falls back automatically
- Skips known-blocked domains to avoid wasted attempts
Search providers:

| Provider | Type | Free Tier |
|---|---|---|
| Jina Search | API (key optional) | Generous free tier |
| DuckDuckGo (ddgs) | Library | Unlimited |
| CN free (DDG + Bing CN) | HTML scraping | Unlimited |
| Brave Search | API | 2,000/month |
| Serper.dev | API | 2,500/month |
| Google CSE | API | 100/day |
| Bing Web Search | API (Azure) | 1,000/month |
| Tavily Search | API | 1,000/month |
| Baidu AI Search | API | With key |
| Exa | via mcporter | With key |

Fetch providers:

| Provider | Type | Notes |
|---|---|---|
| Local Scrapling | Local browser | Fast + stealth (Camoufox) auto-fallback |
| Jina Reader | API | Good for JS-heavy sites |
| Tavily Extract | API | Paid fallback |
| Firecrawl Scrape | API | Paid fallback |

Crawl providers:
- Tavily Crawl
The router uses a tiered, cost-optimized strategy:
Search routing:
- Tier 1 — Free: Jina (with key) → DuckDuckGo (ddgs library) → CN free HTML
- Tier 2 — Freemium: Tavily → Brave → Serper → Google CSE → Bing
- Tier 3 — Specialized: Baidu (Chinese), Exa
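The tier order above can be sketched as a simple fall-through loop. This is a minimal illustration, not the project's actual `router.py`: the `SEARCH_TIERS` table, the provider labels, and the `route_search` function are hypothetical names, and each provider is assumed to be a callable that takes a query and returns a list of results.

```python
from typing import Callable

# Hypothetical tier table mirroring the search routing order described above.
SEARCH_TIERS = [
    ("free", ["jina", "ddgs", "cn_free"]),
    ("freemium", ["tavily", "brave", "serper", "google_cse", "bing"]),
    ("specialized", ["baidu", "exa"]),
]

SearchFn = Callable[[str], list]


def route_search(query: str, providers: dict[str, SearchFn]):
    """Try providers tier by tier and return (name, results) for the first hit."""
    for _tier, names in SEARCH_TIERS:
        for name in names:
            search = providers.get(name)
            if search is None:
                continue  # provider not configured in this environment
            try:
                results = search(query)
            except Exception:
                continue  # provider failed: fall through to the next one
            if results:
                return name, results
    return None, []  # nothing succeeded
```

A provider that raises or returns an empty list is skipped, so an unconfigured or failing free provider does not block the paid tiers behind it.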
Fetch routing:
- Local Scrapling (fast mode, then stealth auto-fallback)
- Jina Reader
- Tavily Extract / Firecrawl (when paid allowed)
Domain-aware optimization: Sites known to block local fetching (x.com, zhihu, weibo, bloomberg, wsj, etc.) skip straight to API providers, which saves time and avoids flaky failures.
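One way to picture the domain-aware shortcut is a function that builds the provider chain for a URL. The sketch below is an assumption-laden illustration: the `BLOCKED_DOMAINS` entries (including the `.com` suffixes), the provider labels, and the `allow_paid` flag are hypothetical and may not match the project's real list or options.

```python
from urllib.parse import urlparse

# Illustrative subset only; the project's real blocklist may differ.
BLOCKED_DOMAINS = {"x.com", "zhihu.com", "weibo.com", "bloomberg.com", "wsj.com"}

# Fetch order described above; labels are shorthand, not real identifiers.
FETCH_CHAIN = ["scrapling", "jina_reader", "tavily_extract", "firecrawl"]
PAID_PROVIDERS = {"tavily_extract", "firecrawl"}


def fetch_chain(url: str, allow_paid: bool = False) -> list[str]:
    """Return the ordered list of fetch providers to try for this URL."""
    host = (urlparse(url).hostname or "").lower()
    # Match the domain itself and any subdomain, but not look-alikes (notx.com).
    blocked = any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)
    chain = [p for p in FETCH_CHAIN if not (blocked and p == "scrapling")]
    if not allow_paid:
        chain = [p for p in chain if p not in PAID_PROVIDERS]
    return chain
```

For a blocked host the local browser step is dropped entirely, so the first attempt already goes to an API provider.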
- Request-level retry with backoff for transient HTTP errors (429, 5xx)
- Stealth fetch retry for Camoufox browser crashes
- Junk content detection: captcha pages, JS-required shells → auto fallback
- Graceful degradation: returns best available result instead of failing
- Content quality threshold: minimum 500 chars of usable content before accepting
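The retry and junk-detection behavior listed above might look roughly like this. It is a sketch under stated assumptions: the 500-character threshold and the 429/5xx status set come from the list, while the marker phrases, the function names, and the `(status, body)` call shape are hypothetical.

```python
import time

MIN_USABLE_CHARS = 500  # threshold stated in the list above
TRANSIENT_STATUS = {429} | set(range(500, 600))

# Hypothetical marker phrases; the project's real heuristics may differ.
JUNK_MARKERS = (
    "captcha",
    "verify you are human",
    "enable javascript",
    "javascript is required",
)


def is_junk(markdown: str) -> bool:
    """Flag content that is too short or looks like a captcha / JS shell page."""
    text = markdown.strip()
    if len(text) < MIN_USABLE_CHARS:
        return True
    head = text[:2000].lower()  # block pages usually announce themselves early
    return any(marker in head for marker in JUNK_MARKERS)


def with_backoff(call, attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Run call() -> (status, body), retrying transient statuses with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = call()
        if status not in TRANSIENT_STATUS:
            return status, body
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return status, body  # still transient after the last attempt
```

A keyword check like this can misfire on a genuine article that discusses captchas, so a real implementation would likely combine it with other signals. When `is_junk` returns `True`, the caller would move on to the next provider rather than retry the same one.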
omni-search-skill/
  SKILL.md
  README.md
  README.zh-CN.md
  requirements.txt
  .env.example
  scripts/
    omni_search.py
    eval_benchmark.py
  omni_search_skill/
    cli.py
    models.py
    providers.py
    router.py
    utils.py
git clone https://github.com/d-wwei/omni-search-skill.git
cd omni-search-skill
python3 -m pip install -r requirements.txt
For stealth fetching (JS-heavy sites), also install the Camoufox browser:
python3 -m camoufox fetch
The system works with zero API keys (using ddgs + local fetch), but adding keys unlocks more providers and better coverage:
| Key | Provider | How to get |
|---|---|---|
| `JINA_API_KEY` | Jina Search + Reader | jina.ai |
| `BRAVE_API_KEY` | Brave Search | brave.com/search/api |
| `SERPER_API_KEY` | Serper.dev (Google SERP) | serper.dev |
| `TAVILY_API_KEY` | Tavily Search/Extract/Crawl | tavily.com |
| `GOOGLE_CSE_API_KEY` + `GOOGLE_CSE_CX` | Google Custom Search | developers.google.com |
| `BING_API_KEY` | Bing Web Search (Azure) | azure.microsoft.com |
| `BAIDU_API_KEY` | Baidu AI Search | cloud.baidu.com |
| `FIRECRAWL_API_KEY` | Firecrawl Scrape | firecrawl.dev |
Place them in .env based on .env.example.
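To see which keyed providers a given environment would unlock, a small check over the variables in the table is enough. This is an illustrative sketch, not how the `providers` command is implemented: the `KEY_REQUIREMENTS` mapping and the provider labels are hypothetical, and only the environment variable names come from the table above.

```python
import os

# Environment variable names are taken from the table above;
# the provider labels on the left are shorthand invented for this sketch.
KEY_REQUIREMENTS = {
    "jina": ["JINA_API_KEY"],
    "brave": ["BRAVE_API_KEY"],
    "serper": ["SERPER_API_KEY"],
    "tavily": ["TAVILY_API_KEY"],
    "google_cse": ["GOOGLE_CSE_API_KEY", "GOOGLE_CSE_CX"],
    "bing": ["BING_API_KEY"],
    "baidu": ["BAIDU_API_KEY"],
    "firecrawl": ["FIRECRAWL_API_KEY"],
}


def configured_providers(env=None) -> list[str]:
    """Return the providers whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return sorted(
        name
        for name, keys in KEY_REQUIREMENTS.items()
        if all(env.get(key) for key in keys)
    )


if __name__ == "__main__":
    print(configured_providers())
```

Google CSE only counts as configured when both of its variables are present, which matches the paired entry in the table.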
# Check what is available in the current environment
python3 scripts/omni_search.py providers
# Search the web
python3 scripts/omni_search.py search "latest AI news"
# Fetch a page
python3 scripts/omni_search.py fetch "https://openai.com/news"
# Search first, then fetch top result(s)
python3 scripts/omni_search.py resolve "Tavily extract docs" --fetch-top 2
# Crawl a docs site
python3 scripts/omni_search.py crawl "https://docs.tavily.com"
The project includes a comprehensive benchmark (scripts/eval_benchmark.py) that tests against 35 fetch targets and 22 search queries across:
- Social media (X, Reddit, Instagram, TikTok, Xiaohongshu)
- Finance (Seeking Alpha, Yahoo Finance, Bloomberg, WSJ, FT)
- Chinese web (Douban, Zhihu, 36kr, Weibo, Bilibili)
- Tech (HN, GitHub, arXiv, StackOverflow, OpenAI docs)
- News (Wikipedia, BBC, NYT, whitehouse.gov, WHO)
- Hard targets (LinkedIn, Medium, Pinterest, Amazon, Google Scholar)
- Multilingual search (English, Chinese, Japanese, French, Korean)
python3 scripts/eval_benchmark.py
- Blocks localhost and private-network fetch targets by default
- Prefers local and lower-cost routes first
- Uses paid providers only when they unlock better quality or coverage
- Falls through to the next provider on failure instead of retrying the same route
- Retries only on transient HTTP errors (429, 5xx) with backoff
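Blocking localhost and private-network targets can be approximated with the standard `ipaddress` module. The sketch below is one possible approach, not the project's actual check: `is_private_target` and its injectable `resolve` parameter are hypothetical, and a production guard would also need to handle redirects and DNS rebinding.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_private_target(url: str, resolve=socket.getaddrinfo) -> bool:
    """Return True if the URL points at localhost or a non-public address."""
    host = urlparse(url).hostname
    if not host:
        return True  # unparseable target: refuse rather than guess
    host = host.lower()
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addresses = [ipaddress.ip_address(host)]  # host is a literal IP
    except ValueError:
        try:
            infos = resolve(host, None)  # host is a name: look it up
        except OSError:
            return True  # resolution failed: treat as unsafe
        # Drop any IPv6 scope suffix (e.g. "%eth0") before parsing.
        addresses = [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]
    return any(
        a.is_private or a.is_loopback or a.is_link_local or a.is_reserved
        for a in addresses
    )
```

Passing a fake `resolve` function makes the check testable without network access, and a fetcher would call it before handing the URL to any provider.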