hkwuks/sxng-cli

🔍 SXNG CLI

A powerful command-line interface for SearXNG
Privacy-respecting web search from your terminal

Features • Installation • Quick Start • Usage • Configuration


✨ Features

  • 🔎 Multi-Engine Search - Search across Google, Bing, DuckDuckGo, GitHub, StackOverflow, and 30+ engines simultaneously
  • 🔄 Dynamic Discovery - Auto-fetches available engines and categories from your SearXNG server
  • 📄 Multiple Formats - Markdown (LLM-optimized) or JSON output
  • 🧠 Deep Search - Multi-round iterative research with session accumulation and knowledge graph
  • 🔍 Content Extraction - Extract full article content from search results, with Obscura fallback for JS-heavy pages
  • 🗂️ Session Management - Accumulate search results across multiple rounds with deduplication
  • 🕸️ Knowledge Graph - Build semantic graphs of entities and relationships
  • ⚡ Fast & Lightweight - Built with TypeScript, minimal dependencies
  • 🔧 Flexible Config - Environment variables, config file, or interactive setup
  • 🏥 Health Check - Verify server connectivity instantly
  • 🌐 Proxy Support - HTTP/HTTPS proxy configuration

📦 Installation

Self-host SearXNG

For WSL

WSL2 shuts itself down automatically once all connections to it are closed. To keep it alive, consider using https://github.com/gardengim/keepwsl.

Before starting the SearXNG container, you must create a settings.yml file in the ./searxng directory. See https://github.com/searxng/searxng for the available configuration options.

An example settings.yml (30+ search engines) is shown below.

use_default_settings: true

server:
  secret_key: "random string"
  limiter: false

outgoing:
  request_timeout: 10.0 # global default timeout
  max_request_timeout: 10.0
  pool_connections: 200
  pool_maxsize: 20
  retries: 2

search:
  safe_search: 0
  formats:
    - html
    - json
    - csv
    - rss

valkey:
  url: valkey://valkey:6379/0

engines:
  # ==================== General search ====================
  - name: google
    engine: google
    shortcut: g

  - name: bing
    engine: bing
    shortcut: bi
    disabled: false

  - name: duckduckgo
    engine: duckduckgo
    shortcut: ddg

  - name: brave
    engine: brave
    shortcut: br

  - name: startpage
    engine: startpage
    shortcut: sp

  - name: qwant
    engine: qwant
    shortcut: qw
    disabled: false

  - name: yandex
    engine: yandex
    shortcut: yx
    disabled: false

  - name: karmasearch
    engine: karmasearch
    categories: [general, web]
    search_type: web
    shortcut: ka
    disabled: false

  # ==================== Chinese search ====================
  - name: baidu
    engine: baidu
    shortcut: bd
    disabled: false

  - name: sogou
    engine: sogou
    shortcut: sg
    disabled: false

  - name: 360search
    engine: 360search
    shortcut: 360so
    disabled: false

  - name: quark
    engine: quark
    shortcut: qk
    disabled: false

  # ==================== Programming ====================
  - name: github
    engine: github
    shortcut: gh

  - name: github code
    engine: github_code
    shortcut: ghc

  - name: gitlab
    engine: gitlab
    base_url: https://gitlab.com
    shortcut: gl
    disabled: false

  - name: codeberg
    engine: gitea
    base_url: https://codeberg.org
    shortcut: cb
    disabled: false

  - name: stackexchange
    engine: stackexchange
    shortcut: se

  - name: stackoverflow
    engine: stackexchange
    shortcut: so
    categories: q&a
    stackexchange_site: stackoverflow

  - name: npm
    engine: npm
    shortcut: npm
    disabled: false

  - name: pypi
    engine: pypi
    shortcut: py

  - name: crates.io
    engine: crates
    shortcut: crate
    disabled: false

  - name: pkg.go.dev
    engine: pkg_go_dev
    shortcut: go
    disabled: false

  - name: metacpan
    engine: metacpan
    shortcut: cpan
    disabled: false

  - name: docker hub
    engine: docker_hub
    shortcut: dh

  - name: huggingface
    engine: huggingface
    shortcut: hf
    disabled: false

  - name: huggingface datasets
    engine: huggingface
    huggingface_endpoint: datasets
    shortcut: hfd
    disabled: false

  - name: hex
    engine: hex
    shortcut: hex
    disabled: false

  - name: mdn
    engine: json_engine
    shortcut: mdn
    categories: [it]
    paging: true
    search_url: https://developer.mozilla.org/api/v1/search?q={query}&page={pageno}
    results_query: documents
    url_query: mdn_url
    url_prefix: https://developer.mozilla.org
    title_query: title
    content_query: summary

  - name: arch linux wiki
    engine: archlinux
    shortcut: al

  - name: gentoo wiki
    engine: mediawiki
    shortcut: gentoo
    categories: ["it", "software wikis"]
    base_url: "https://wiki.gentoo.org/"
    api_path: "api.php"
    search_type: text

  - name: lobste.rs
    engine: xpath
    search_url: https://lobste.rs/search?q={query}&what=stories&order=relevance
    results_xpath: //li[contains(@class, "story")]
    url_xpath: .//a[@class="u-url"]/@href
    title_xpath: .//a[@class="u-url"]
    content_xpath: .//a[@class="domain"]
    categories: it
    shortcut: lo
    disabled: false

  # ==================== Knowledge / Q&A ====================
  - name: wikipedia
    engine: wikipedia
    shortcut: wp
    display_type: ["infobox"]
    categories: [general]

  - name: wikidata
    engine: wikidata
    shortcut: wd
    weight: 2
    display_type: ["infobox"]
    categories: [general]

  - name: reddit
    engine: reddit
    shortcut: re
    disabled: false

  - name: hackernews
    engine: hackernews
    shortcut: hn
    disabled: false

  # ==================== Images ====================
  - name: google images
    engine: google_images
    shortcut: goi

  - name: bing images
    engine: bing_images
    shortcut: bii

  - name: duckduckgo images
    engine: duckduckgo_extra
    categories: [images]
    ddg_category: images
    shortcut: ddi

  - name: pinterest
    engine: pinterest
    shortcut: pin

  - name: unsplash
    engine: unsplash
    shortcut: us

  - name: pixabay
    engine: pixabay
    shortcut: pxb

  - name: deviantart
    engine: deviantart
    shortcut: da
    disabled: false

  - name: flickr
    categories: images
    shortcut: fl
    engine: flickr_noapi
    disabled: false

  - name: openverse
    engine: openverse
    categories: images
    shortcut: opv
    disabled: false

  - name: artic
    engine: artic
    shortcut: arc
    disabled: false

  # ==================== Videos ====================
  - name: google videos
    engine: google_videos
    shortcut: gov

  - name: bing videos
    engine: bing_videos
    shortcut: biv

  - name: duckduckgo videos
    engine: duckduckgo_extra
    categories: [videos]
    ddg_category: videos
    shortcut: ddv

  - name: youtube
    engine: youtube_noapi
    shortcut: yt

  - name: bilibili
    engine: bilibili
    shortcut: bili
    disabled: false

  # ==================== News ====================
  - name: google news
    engine: google_news
    shortcut: gon

  - name: bing news
    engine: bing_news
    shortcut: bin

  - name: duckduckgo news
    engine: duckduckgo_extra
    categories: [news]
    ddg_category: news
    shortcut: ddn

  # ==================== Music ====================
  - name: bandcamp
    engine: bandcamp
    shortcut: bc
    categories: music
    disabled: false

  - name: deezer
    engine: deezer
    shortcut: dz
    disabled: false

  - name: mixcloud
    engine: mixcloud
    shortcut: mc
    disabled: false

  - name: genius
    engine: genius
    shortcut: gen
    disabled: false

  # ==================== Academic / Documentation ====================
  - name: arxiv
    engine: arxiv
    shortcut: arx

  - name: semantic scholar
    engine: semantic_scholar
    shortcut: sem

  - name: google scholar
    engine: google_scholar
    shortcut: gsch

  - name: pubmed
    engine: pubmed
    shortcut: pub

  - name: crossref
    engine: crossref
    shortcut: cr
    disabled: false

  # ==================== Social media ====================
  - name: lemmy posts
    engine: lemmy
    lemmy_type: Posts
    shortcut: lepo
    disabled: false

  - name: mastodon users
    engine: mastodon
    mastodon_type: accounts
    base_url: https://mastodon.social
    shortcut: mau
    disabled: false

  # ==================== Files / Torrents ====================
  - name: library genesis
    engine: xpath
    search_url: https://libgen.rs/search.php?req={query}
    url_xpath: //a[contains(@href,"book/index.php?md5")]/@href
    title_xpath: //a[contains(@href,"book/")]/text()[1]
    content_xpath: //td/a[1][contains(@href,"=author")]/text()
    categories: files
    shortcut: lg
    disabled: false

  - name: kickass
    engine: kickass
    base_url:
      - https://kickasstorrents.to
      - https://kickasstorrents.cr
    shortcut: kc
    disabled: false

  - name: annas archive
    engine: annas_archive
    base_url:
      - https://annas-archive.gl
      - https://annas-archive.vg
    shortcut: aa
    disabled: false

  # ==================== Translation ====================
  - name: lingva
    engine: lingva
    shortcut: lv
    disabled: false

  - name: currency
    engine: currency_convert
    shortcut: cc

  # ==================== Other ====================
  - name: imdb
    engine: imdb
    shortcut: imdb
    disabled: false

  - name: steam
    engine: steam
    shortcut: stm
    disabled: false

  - name: goodreads
    engine: goodreads
    shortcut: good
    disabled: false

An example docker-compose.yml is shown below.

services:
    searxng:
        image: docker.io/searxng/searxng:latest
        container_name: searxng
        restart: unless-stopped
        ports:
            - "8080:8080"
        volumes:
            - ./searxng:/etc/searxng:Z
        depends_on:
            - valkey
        ulimits:
            nofile:
                soft: 10000
                hard: 65535

    valkey:
        container_name: valkey
        image: docker.io/valkey/valkey:9-alpine
        command: valkey-server --save 30 1 --loglevel warning
        restart: always
        volumes:
            - ./valkey:/data/

From npm (Recommended)

npm install -g sxng-cli
npx skills add hkwuks/sxng-cli

⚠️ Skill sync: after updating sxng-cli, also update the sxng skill so the two stay consistent:

npx skills update hkwuks/sxng-cli

From Source

git clone https://github.com/hkwuks/sxng-cli.git
cd sxng-cli/cli
npm install
npm run build
npm link

Obscura (Optional, for JS-heavy pages)

sxng extract uses Defuddle + linkedom by default for lightweight content extraction. When a page requires JavaScript rendering (SPAs, dynamic content), enable Obscura as a fallback:

# Linux x86_64
curl -LO https://github.com/h4ckf0r0day/obscura/releases/latest/download/obscura-x86_64-linux.tar.gz
tar xzf obscura-x86_64-linux.tar.gz
cp obscura ~/.local/bin/

# macOS Apple Silicon
curl -LO https://github.com/h4ckf0r0day/obscura/releases/latest/download/obscura-aarch64-macos.tar.gz
tar xzf obscura-aarch64-macos.tar.gz
cp obscura /usr/local/bin/

# Docker
docker run -d --name obscura -p 127.0.0.1:9222:9222 h4ckf0r0day/obscura

# Verify
obscura --version

No extra npm dependencies are needed: Obscura is invoked through its CLI and is auto-detected from PATH, ~/.local/bin/obscura, or /usr/local/bin/obscura.


🚀 Quick Start

  1. Install the CLI:

    npm install -g sxng-cli
  2. Configure the CLI:

    sxng init

    Or set the environment variable:

    export SEARXNG_BASE_URL=http://your-searxng-instance:8080
  3. Perform a search:

    sxng "TypeScript tutorial"

📖 Usage

Commands

| Command | Description |
|---|---|
| sxng init | Interactive configuration setup |
| sxng <query> | Perform a web search |
| sxng --queries "q1,q2" | Multi-query search with RRF fusion |
| sxng extract --urls <urls> | Extract content from web pages |
| sxng extract --obscura | Extract with Obscura JS-rendering fallback |
| sxng --session new | Create deep search session |
| sxng session-list | List all sessions |
| sxng session-delete <session-name> | Delete a session |
| sxng graph-add <session> | Add entities to knowledge graph |
| sxng query-graph <session> | Query knowledge graph |
| sxng --health | Check SearXNG server health |
| sxng --engines-list | List available search engines from server |
| sxng --categories-list | List available categories from server |
| sxng --help | Show help message |

Search Options

| Option | Description |
|---|---|
| -e, --engines <list> | Comma-separated list of search engines (e.g., google,github) |
| -c, --categories <list> | Comma-separated list of categories (e.g., it,science) |
| -l, --limit <n> | Maximum number of results (default: 10) |
| -p, --page <n> | Page number for pagination |
| --lang <code> | Language code (e.g., en, zh, ja) |
| --time <range> | Time range: day, week, month, year, all |
| -f, --format <fmt> | Output format: md, json, csv, html (default: md) |
| --queries <list> | Multi-query with RRF fusion (e.g., q1,q2,q3) |
| --session <session-name> | Session name or path for deep search; use new to create a session |
| --owner <owner> | Session owner identifier |
| --desc <text> | Session description |
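
As a rough illustration of how these options could map onto a SearXNG request, here is a minimal sketch that builds a /search URL asking for JSON results. The query parameters (q, format, engines, categories, pageno, language, time_range) are SearXNG's own; the SearchOptions shape and the buildSearchUrl helper are hypothetical names invented for this example, and the CLI's real implementation may differ.

```typescript
// Hypothetical option shape mirroring the CLI flags in the table above.
interface SearchOptions {
  engines?: string[];    // -e, --engines
  categories?: string[]; // -c, --categories
  page?: number;         // -p, --page
  lang?: string;         // --lang
  time?: "day" | "week" | "month" | "year" | "all"; // --time
}

// Build a SearXNG /search URL that requests JSON results.
function buildSearchUrl(baseUrl: string, query: string, opts: SearchOptions = {}): string {
  const params: Array<[string, string]> = [["q", query], ["format", "json"]];
  if (opts.engines && opts.engines.length > 0) params.push(["engines", opts.engines.join(",")]);
  if (opts.categories && opts.categories.length > 0) params.push(["categories", opts.categories.join(",")]);
  if (opts.page !== undefined) params.push(["pageno", String(opts.page)]);
  if (opts.lang) params.push(["language", opts.lang]);
  // Assumption: "all" means "no time filter", so time_range is simply omitted.
  if (opts.time && opts.time !== "all") params.push(["time_range", opts.time]);
  const qs = params
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  // Strip trailing slashes so "http://host/" and "http://host" behave the same.
  return `${baseUrl.replace(/\/+$/, "")}/search?${qs}`;
}
```

The JSON format only works if json is listed under search.formats in settings.yml, as in the example configuration earlier in this README.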

Examples

# Basic search (outputs Markdown by default)
sxng "machine learning"

# Output as JSON
sxng --format json "machine learning"

# Search with specific engines
sxng --engines google,duckduckgo "privacy tools"

# Search IT and Science categories
sxng --categories it,science "kubernetes tutorial"

# Limit results and filter by time
sxng --limit 5 --time week "latest AI news"

# Output as CSV
sxng --format csv "python tutorial" > results.csv

# Multi-query search with RRF fusion
sxng --queries "tokio tutorial,rust async basics,async-std guide"

# List available engines (fetched from server)
sxng --engines-list

# List available categories (fetched from server)
sxng --categories-list
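
For readers unfamiliar with RRF (Reciprocal Rank Fusion), the sketch below shows the general technique: each query's ranked list contributes 1 / (k + rank) to a URL's score, and URLs are sorted by the summed score. This is not taken from the sxng-cli source; rrfFuse is a hypothetical helper, and k = 60 is only the value commonly used in the literature.

```typescript
// Reciprocal Rank Fusion over several ranked URL lists.
// Assumption: k = 60; the value sxng-cli actually uses is not documented here.
function rrfFuse(rankedLists: string[][], k: number = 60): Array<{ url: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((url, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(url, (scores.get(url) ?? 0) + 1 / (k + rank));
    });
  }
  const fused: Array<{ url: string; score: number }> = [];
  scores.forEach((score, url) => fused.push({ url, score }));
  // Highest combined score first.
  return fused.sort((a, b) => b.score - a.score);
}
```

A URL that ranks well in several lists beats one that ranks first in only a single list, which is why fusion tends to surface results that multiple queries agree on.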

⚙️ Configuration

Configuration priority (highest to lowest):

  1. Environment variables
  2. Local config file (./sxng.config.json)
  3. Global config file (~/sxng-cli/sxng.config.json)
  4. Default values

Environment Variables

| Variable | Description | Default |
|---|---|---|
| SEARXNG_BASE_URL | SearXNG server URL | (required) |
| SEARXNG_DEFAULT_ENGINE | Default search engine | (none) |
| SEARXNG_ALLOWED_ENGINES | Comma-separated allowed engines | (all) |
| SEARXNG_DEFAULT_LIMIT | Default result limit | 10 |
| SEARXNG_DEFAULT_FORMAT | Default output format (md, json, csv, html) | md |
| SEARXNG_USE_PROXY | Use proxy (true/false) | false |
| SEARXNG_PROXY_URL | Proxy URL | (none) |
| SEARXNG_TIMEOUT | Request timeout in ms | 10000 |

Config File

Config file search order (first found wins):

  1. Local config - ./sxng.config.json (current working directory, for project-specific settings)
  2. Global config - ~/sxng-cli/sxng.config.json (user home directory, for global defaults)

Create sxng.config.json:

{
  "baseUrl": "http://localhost:8080",
  "defaultEngine": "",
  "allowedEngines": [],
  "defaultLimit": 10,
  "defaultFormat": "md",
  "useProxy": false,
  "proxyUrl": "",
  "timeout": 10000
}
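
The priority rules above can be sketched as a small resolver: defaults at the bottom, then the first config file found (local before global), then environment variables on top. This is an illustrative sketch only; fromEnv and resolveConfig are hypothetical names, and the file contents are passed in as already-parsed objects rather than read from disk.

```typescript
// Field names mirror the sxng.config.json example above.
interface SxngConfig {
  baseUrl: string;
  defaultEngine: string;
  allowedEngines: string[];
  defaultLimit: number;
  defaultFormat: string;
  useProxy: boolean;
  proxyUrl: string;
  timeout: number;
}

type Env = { [name: string]: string | undefined };

const DEFAULTS: SxngConfig = {
  baseUrl: "", defaultEngine: "", allowedEngines: [], defaultLimit: 10,
  defaultFormat: "md", useProxy: false, proxyUrl: "", timeout: 10000,
};

// Map the documented SEARXNG_* variables onto config fields, skipping unset ones.
function fromEnv(env: Env): Partial<SxngConfig> {
  const out: Partial<SxngConfig> = {};
  const url = env["SEARXNG_BASE_URL"];
  if (url) out.baseUrl = url;
  const engine = env["SEARXNG_DEFAULT_ENGINE"];
  if (engine) out.defaultEngine = engine;
  const allowed = env["SEARXNG_ALLOWED_ENGINES"];
  if (allowed) out.allowedEngines = allowed.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  const limit = Number(env["SEARXNG_DEFAULT_LIMIT"]);
  if (env["SEARXNG_DEFAULT_LIMIT"] && isFinite(limit)) out.defaultLimit = limit;
  const format = env["SEARXNG_DEFAULT_FORMAT"];
  if (format) out.defaultFormat = format;
  const useProxy = env["SEARXNG_USE_PROXY"];
  if (useProxy) out.useProxy = useProxy === "true";
  const proxyUrl = env["SEARXNG_PROXY_URL"];
  if (proxyUrl) out.proxyUrl = proxyUrl;
  const timeout = Number(env["SEARXNG_TIMEOUT"]);
  if (env["SEARXNG_TIMEOUT"] && isFinite(timeout)) out.timeout = timeout;
  return out;
}

// Pass null for a config file that does not exist.
function resolveConfig(
  env: Env,
  localFile: Partial<SxngConfig> | null,
  globalFile: Partial<SxngConfig> | null,
): SxngConfig {
  const file = localFile ?? globalFile ?? {}; // first found wins, files are not merged
  return { ...DEFAULTS, ...file, ...fromEnv(env) };
}
```

Note that under "first found wins", a setting present only in the global file is ignored whenever a local file exists; whether the CLI merges the two files instead is not stated in this README.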

🧠 Deep Search

Deep search enables multi-round iterative research with session accumulation and knowledge graph building.

Quick Example

# Create a session and search
sxng --session new --owner "researcher" --desc "Rust async study" "rust async ecosystem"
# Session created: ~/sxng-cli/sessions/<session-name>

# Extract content from results (by name or path)
sxng extract --session <session-name>
# or: sxng extract --session ~/sxng-cli/sessions/<session-name>

# Add knowledge graph entities (by name or path)
sxng graph-add <session-name> --data '{
  "entities": [
    {"label": "tokio", "entityType": "runtime", "score": 0.95},
    {"label": "async-std", "entityType": "runtime", "score": 0.85}
  ],
  "edges": [
    {"source": "e:tokio", "target": "e:async_std", "relation": "alternative_to", "weight": 0.9}
  ]
}'

# Query the graph (by name or path)
sxng query-graph <session-name> --seeds "tokio" --depth 2

# Continue research (results accumulate)
sxng --session <session-name> --queries "tokio vs async-std,benchmark 2024"

Session Management

| Command | Description |
|---|---|
| sxng --session new | Create new auto-named session |
| sxng --session <session-name> | Use session by name (auto-resolves to ~/sxng-cli/sessions/<session-name>) |
| sxng --session <path> | Use session by full path |
| sxng session-list | List all sessions with stats |
| sxng session-delete <session-name> | Delete specific session |
| sxng session-delete --older <hours> | Delete old sessions |

Session Path Resolution:

  • Pure name (e.g., my-session) → ~/sxng-cli/sessions/my-session
  • Full path (e.g., /custom/path/session) → used as-is
  • new → auto-generate unique name under ~/sxng-cli/sessions/
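
The three resolution rules can be expressed as a small pure function. The sketch below is illustrative only: resolveSessionPath is a hypothetical name, the home directory and the unique-name generator are injected so the function stays deterministic, and treating "contains a path separator" as the test for a full path is an assumption.

```typescript
// Resolve a --session argument to a session directory.
function resolveSessionPath(arg: string, home: string, uniqueName: () => string): string {
  const sessionsRoot = `${home}/sxng-cli/sessions`;
  // "new" asks for a fresh, auto-named session.
  if (arg === "new") return `${sessionsRoot}/${uniqueName()}`;
  // Assumption: anything containing a path separator counts as a full path.
  if (arg.indexOf("/") !== -1 || arg.indexOf("\\") !== -1) return arg;
  // Otherwise it is a pure name under the default sessions directory.
  return `${sessionsRoot}/${arg}`;
}
```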

Session Data Structure

Each session stores three files in ~/sxng-cli/sessions/<session-name>/:

  • results.json - Accumulated search results (URL dedup, multi-round)
  • graph.json - Knowledge graph (structural + semantic layers)
  • meta.json - Session metadata (owner, description, timestamps)
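
To make "URL dedup, multi-round" concrete, here is a minimal sketch of accumulating results across rounds. The StoredResult shape, the rounds field, and the normalization in dedupKey (drop the fragment and trailing slashes) are all assumptions made for illustration; the actual results.json schema is not documented in this README.

```typescript
// Hypothetical stored shape: one entry per unique URL, plus the rounds it appeared in.
interface StoredResult {
  url: string;
  title: string;
  rounds: number[];
}

// Assumed normalization: ignore the #fragment and any trailing slashes.
function dedupKey(url: string): string {
  const noFragment = url.split("#")[0];
  return noFragment.replace(/\/+$/, "");
}

// Merge a new round of results into the accumulated list without duplicating URLs.
function accumulate(
  existing: StoredResult[],
  incoming: Array<{ url: string; title: string }>,
  round: number,
): StoredResult[] {
  const byKey = new Map<string, StoredResult>();
  for (const r of existing) {
    byKey.set(dedupKey(r.url), { url: r.url, title: r.title, rounds: r.rounds.slice() });
  }
  for (const r of incoming) {
    const key = dedupKey(r.url);
    const seen = byKey.get(key);
    if (seen) {
      // Already known: keep the first title, just record the new round.
      if (seen.rounds.indexOf(round) === -1) seen.rounds.push(round);
    } else {
      byKey.set(key, { url: r.url, title: r.title, rounds: [round] });
    }
  }
  const merged: StoredResult[] = [];
  byKey.forEach((value) => merged.push(value)); // Map keeps insertion order
  return merged;
}
```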

Knowledge Graph

Structural Layer (auto-built):

  • q: - Query nodes
  • r: - Result nodes
  • d: - Domain nodes

Semantic Layer (via graph-add):

  • e: - Entity nodes with type and score
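
One way to picture what a seeded, depth-limited graph query does over these prefixed nodes is a breadth-first expansion. The sketch below is a generic BFS, not the CLI's implementation: neighborhood is a hypothetical helper, edges are treated as undirected (an assumption), and only the source, target, and relation fields from the graph-add example are used.

```typescript
interface Edge {
  source: string; // node id such as "e:tokio" or "r:1"
  target: string;
  relation: string;
}

// Collect every node reachable from the seeds within `depth` hops.
function neighborhood(edges: Edge[], seeds: string[], depth: number): string[] {
  const adjacency = new Map<string, string[]>();
  const link = (from: string, to: string): void => {
    const list = adjacency.get(from);
    if (list) list.push(to);
    else adjacency.set(from, [to]);
  };
  // Assumption: relations are traversable in both directions.
  for (const e of edges) {
    link(e.source, e.target);
    link(e.target, e.source);
  }

  const visited = new Set<string>(seeds);
  let frontier = seeds.slice();
  for (let hop = 0; hop < depth && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of adjacency.get(id) ?? []) {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  const reached: string[] = [];
  visited.forEach((id) => reached.push(id));
  return reached;
}
```

With a depth of 2, a seed entity reaches both its directly related entities and the results or queries those are attached to, which is how the semantic layer can lead back into the structural layer.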

๐Ÿ—๏ธ Architecture

Content Extraction

sxng extract uses a two-tier extraction strategy:

  1. Defuddle + linkedom (default, lightweight) - Parses raw HTML with linkedom and extracts readable content with Defuddle. Fast, no browser needed.
  2. Obscura (optional fallback) - When Defuddle extracts too little content (< 50 chars), Obscura renders the page with the V8 JS engine and the content is extracted again. Use --obscura to enable.

# Default: Defuddle only (fast)
sxng extract --urls "https://example.com"

# With Obscura fallback for JS-heavy pages
sxng extract --urls "https://spa-site.com" --obscura

# Obscura direct markdown output (skip Defuddle re-parse)
sxng extract --urls "https://spa-site.com" --obscura --obscura-dump markdown

# Custom Obscura binary path
sxng extract --urls "https://spa-site.com" --obscura --obscura-path /path/to/obscura

Extraction options:

| Option | Description |
|---|---|
| --obscura | Enable Obscura fallback for JS-rendered pages |
| --obscura-path <path> | Path to Obscura binary (auto-detected if omitted) |
| --obscura-dump <format> | Obscura output format: html (default) or markdown |
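
The two-tier strategy boils down to a threshold check. The sketch below captures only that decision logic under stated assumptions: extractTwoTier and the Extractor type are hypothetical, the extractors are synchronous stubs (real ones would be asynchronous and would call Defuddle or the Obscura binary), and keeping the longer of the two outputs is a design choice of this example rather than documented behavior.

```typescript
// An extractor turns a URL into readable text. Synchronous here for brevity.
type Extractor = (url: string) => string;

interface Extraction {
  content: string;
  tier: "defuddle" | "obscura";
}

// Try the lightweight extractor first; fall back only when it yields too little text.
function extractTwoTier(
  url: string,
  primary: Extractor,
  fallback: Extractor | null, // null models running without --obscura
  minChars: number = 50,      // threshold quoted in the text above
): Extraction {
  const first = primary(url).trim();
  if (first.length >= minChars || fallback === null) {
    return { content: first, tier: "defuddle" };
  }
  const second = fallback(url).trim();
  // Assumption: keep whichever output is longer, so a failed render never loses content.
  return second.length > first.length
    ? { content: second, tier: "obscura" }
    : { content: first, tier: "defuddle" };
}
```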

Dynamic Engine/Category Discovery

Unlike other CLI tools that hardcode supported engines and categories, this tool dynamically fetches them from your SearXNG server's /config endpoint:

  • Engines and categories are retrieved at runtime from the server
  • This ensures compatibility with any SearXNG instance configuration
  • Adding new engines to your SearXNG instance automatically makes them available in the CLI

Use sxng --engines-list and sxng --categories-list to see what's available on your server.
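
As an illustration of what can be derived from the /config response, the sketch below filters a parsed payload down to enabled engine names, optionally within one category. The ServerConfig interface covers only the fields this example relies on (categories, and per-engine name, categories, enabled) and should be treated as an assumption to check against your server's actual response; listEngines is a hypothetical helper.

```typescript
// Assumed subset of the SearXNG /config payload.
interface ServerConfig {
  categories: string[];
  engines: Array<{ name: string; categories: string[]; enabled: boolean }>;
}

// Names of enabled engines, optionally restricted to one category, sorted alphabetically.
function listEngines(config: ServerConfig, category?: string): string[] {
  return config.engines
    .filter((e) => e.enabled)
    .filter((e) => category === undefined || e.categories.indexOf(category) !== -1)
    .map((e) => e.name)
    .sort();
}
```

Because the list is computed from the server's response at runtime, nothing in the client needs to change when an engine is added to or removed from settings.yml.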

Output Format

The CLI supports multiple output formats:

  • Markdown (default) - Optimized for LLM context windows, saves ~50% tokens vs JSON
  • JSON - Structured envelope format for programmatic use

Output format examples

Markdown Format (Default)

## Search: machine learning

**5** results
Total: 42

### 1. [Machine Learning Tutorial](https://example.com/ml)

Learn machine learning from scratch...

Engine: google | Category: general | Score: 1

---

### Suggestions

- deep learning tutorial
- neural networks

JSON Envelope Format

{
  "status": "ok|error",
  "data": { ... },
  "error": null,
  "hint": "..."
}
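
When consuming the JSON output from a script, the envelope can be unwrapped with a small typed helper. This is a sketch under assumptions: the Envelope type is inferred from the four fields shown above, error is assumed to be a string when present, and unwrap is a hypothetical name.

```typescript
// Typed view of the envelope shown above. T is whatever the command returns in "data".
interface Envelope<T> {
  status: "ok" | "error";
  data: T | null;
  error: string | null; // assumption: a message string on failure
  hint?: string;
}

// Parse the CLI's JSON output and return the payload, or throw with the error and hint.
function unwrap<T>(raw: string): T {
  const envelope = JSON.parse(raw) as Envelope<T>;
  if (envelope.status !== "ok" || envelope.data === null) {
    const hint = envelope.hint ? ` (hint: ${envelope.hint})` : "";
    throw new Error(`${envelope.error ?? "unknown error"}${hint}`);
  }
  return envelope.data;
}
```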

🛠️ Development

