| ZIM CRUD agent (read/write/edit/delete) + semantic search | ZIM archives (Wikipedia)

Offline CRUD + semantic search agent for ZIM archives (Kiwix format). Read, write, edit, delete articles, build vector indexes, and serve via MCP – all offline, low‑resource.

ZimAgent is a complete offline agent for working with ZIM files (the Kiwix archive format). Unlike a simple reader, it supports CRUD operations – you can read, write, edit, and delete articles, plus perform semantic search using TurboRag. It also exposes an MCP server so that other AI agents can manipulate ZIM archives programmatically.
- 📖 Read articles by path (e.g.,
A/Python_(programming_language)) - ✍️ Write new articles (HTML content + metadata)
- ✏️ Edit/update existing articles
- 🗑️ Delete articles
- 🔍 Semantic search – find articles by meaning (powered by TurboRag)
- 📝 Full‑text keyword search (SQLite FTS5 fallback)
- 🤖 MCP server – expose all operations via stdio or SSE for Claude Desktop or any MCP client
- 🧠 Offline first – no internet required after setup
git clone https://github.com/AHX47/zim-agent.git
cd zim-agent
pip install -r requirements.txtTurboRag provides the embedding and vector search engine.
pip install turborag-ahx47 # or your published turborag packagemkdir -p data
# Mini Wikipedia (~90 MB) for testing
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_top_mini_2024-12.zim -O data/test.zimYou need the Gemma 300M GGUF model (≈150 MB):
mkdir -p models
wget -O models/embeddinggemma-300m-q4_k_m.gguf \
"https://huggingface.co/sabafallah/embeddinggemma-300m-Q4_K_M-GGUF/resolve/main/embeddinggemma-300m-q4_k_m.gguf"python main.py index --zim data/test.zim --max-articles 5000The index will be stored in data/zim_index/.
python main.py read --path "A/Python_(programming_language)"python main.py write --path "A/My_New_Article" \
--title "My New Article" \
--html "<p>This is a brand new article.</p>"python main.py edit --path "A/My_New_Article" --html "<p>Updated content</p>"python main.py delete --path "A/My_New_Article"python main.py ask "What is the main programming language for machine learning?"python main.py chatStart the server:
python main.py serve-mcp --port 8002 --transport sseOr for stdio (Claude Desktop):
python main.py serve-mcp --transport stdio| Tool | Description |
|---|---|
read_article(path) |
Returns HTML and plain text of an article |
search_articles(query, k) |
Semantic search (returns top‑k articles) |
write_article(path, title, html) |
Creates a new article |
edit_article(path, html) |
Replaces content of existing article |
delete_article(path) |
Removes the article |
User: Create a new article about "Rust programming" in my ZIM.
Claude: [calls write_article with path "R/Rust_programming"]
Agent: Article created successfully.
┌─────────────────────────────────────────────────┐
│ CLI (read/write/edit/delete/ask/chat) │
│ MCP Server (stdio/SSE) │
├─────────────────────────────────────────────────┤
│ ZimAgent Core │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ ZIM │ │ TurboRag │ │ SQLite FTS5 │ │
│ │ Reader │ │ Semantic │ │ Keyword │ │
│ │ Writer │ │ Index │ │ Search │ │
│ └──────────┘ └──────────┘ └──────────────┘ │
├─────────────────────────────────────────────────┤
│ Dependencies: libzim, turborag, llama-cpp │
└─────────────────────────────────────────────────┘
Create a config.yaml (or edit the default in main.py):
embed_model: "models/embeddinggemma-300m-q4_k_m.gguf"
zim_path: "data/test.zim"
index_path: "data/zim_index"
chunk_size: 512
mcp_host: "127.0.0.1"
mcp_port: 8002- Python 3.10+
- Rust (only if you rebuild TurboVec – not required if using
pip install turborag-ahx47) - ~1 GB RAM
- ~2 GB disk space (for ZIM + index + models)
- No internet required at runtime
git clone https://github.com/AHX47/zim-agent.git
cd zim-agent
pip install -r requirements.txt
pip install -e .MIT
- GitHub: AHX47/zim-agent
- Related projects: turborag-ahx47,turborag, turbovec