Architecture
How the pieces connect and data flows through the system.
When you run wn ingest https://example.com/article or the AI calls ingest_url(), here's what happens end to end:
URL
|
v
Jina Reader (r.jina.ai) -- fetches page, returns clean markdown
|
v
SHA-256 hash -- checks if content already exists (dedup)
|
v
raw/article-title.md -- saved to disk, immutable from this point
|
v
Database indexes the raw source -- path, URL, hash, compiled=false
|
v
Content returned to the host AI -- the AI now has the full text
|
v
AI reads wiki/index.md -- understands what already exists
|
v
AI writes/updates 10-15 pages:
wiki/sources/article.md -- source summary
wiki/concepts/topic-a.md -- concept page (new or updated)
wiki/concepts/topic-b.md -- another concept touched
wiki/index.md -- new pages added to catalog
wiki/overview.md -- evolving thesis refined
wiki/tags.md -- new tags added
wiki/contradictions.md -- conflicts flagged (if any)
wiki/log.md -- dated entry appended
|
v
AI calls index_article() -- each page indexed in database (FTS5, tags, links)
|
v
AI calls mark_compiled() -- raw source marked as processed
The key point: WikiNow handles steps 1-5 (fetch, dedup, save, index, return content). The AI handles steps 6-9 (read wiki state, write pages, index them, mark the source as compiled). WikiNow is the plumbing; the AI is the brain.
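Steps 2 and 3 (hash, dedup, save) can be sketched roughly as below. This is a minimal illustration and not WikiNow's actual code: the function names, the slug rule, and the `known_hashes` set (standing in for the database lookup) are all assumptions.

```python
from __future__ import annotations

import hashlib
import re
from pathlib import Path


def content_hash(markdown: str) -> str:
    """Return the SHA-256 hex digest of the fetched markdown."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def slugify(title: str) -> str:
    """Turn a title into a filename-safe slug (hypothetical naming rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def save_raw(raw_dir: Path, title: str, markdown: str,
             known_hashes: set[str]) -> Path | None:
    """Save new content under raw/ and return its path, or None for a duplicate.

    `known_hashes` is an assumption: it stands in for the database's hash column.
    """
    digest = content_hash(markdown)
    if digest in known_hashes:
        return None  # dedup: identical content was ingested before
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"{slugify(title)}.md"
    path.write_text(markdown, encoding="utf-8")
    known_hashes.add(digest)
    return path
```

Hashing the content rather than the URL means the same article reached through two different links is still detected as a duplicate.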
For YouTube URLs, the flow is the same but step 1 uses yt-dlp for English subtitles, with Whisper as a fallback if no subtitles exist. For local files (PDF, epub, audio, text), step 1 uses the appropriate extractor instead of Jina Reader.
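The choice of extractor for step 1 might look something like the sketch below. The returned names match the modules under ingestion/ in the file map, but `pick_extractor`, the YouTube host list, and the file-suffix table are assumptions made for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse

# Suffix -> ingestion module name. The suffix list is an assumption;
# only the module names come from the project's file map.
SUFFIX_TO_EXTRACTOR = {
    ".pdf": "pdf",
    ".epub": "epub",
    ".mp3": "audio",
    ".wav": "audio",
    ".m4a": "audio",
    ".txt": "text",
    ".md": "text",
}

# Hosts treated as YouTube (assumed list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def pick_extractor(source: str) -> str:
    """Return the name of the extractor that should handle `source`."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        return "youtube" if host in YOUTUBE_HOSTS else "jina"
    # Anything that is not an http(s) URL is treated as a local file.
    suffix = Path(source).suffix.lower()
    try:
        return SUFFIX_TO_EXTRACTOR[suffix]
    except KeyError:
        raise ValueError(f"unsupported source: {source!r}") from None
```

Everything after this dispatch (hash, save, index, return content) stays the same regardless of which extractor produced the markdown.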
The database is a disposable cache. Delete wikinow.db and it rebuilds from the markdown files on next access.
The sync happens automatically every time the database connection opens:
For each .md file in wiki/sources/, wiki/concepts/, wiki/comparisons/, wiki/queries/:
|
Is it in the database?
|
No --> index it (parse frontmatter, extract wikilinks, add to FTS5)
Yes --> compare file's last-modified time vs database's indexed_at time
|
File is newer --> re-index it
File is same --> skip (no work needed)
For each database entry:
|
Does the .md file still exist on disk?
|
No --> delete from database (article, FTS5 entry, tags, links)
Yes --> keep
Same process for raw/ files.
This means:
- Edit a page in Obsidian --> database picks it up automatically
- Delete a page --> database removes it
- Add a new .md file manually --> database indexes it
- Delete wikinow.db --> everything rebuilds from files
No rebuild command. No migration. No manual intervention.
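The decision logic above can be sketched as a pure function that compares the files on disk with what the database already knows. This is only an illustration: `plan_sync` is a hypothetical name, and the `indexed_at` dict (relative path to timestamp) stands in for the real database table.

```python
from __future__ import annotations

from pathlib import Path

# Wiki subdirectories covered by the sync, as listed above.
SYNCED_DIRS = ("sources", "concepts", "comparisons", "queries")


def plan_sync(wiki_dir: Path, indexed_at: dict[str, float]) -> dict[str, list[str]]:
    """Decide which pages to index, re-index, or delete.

    `indexed_at` maps a page's path (relative to wiki_dir) to the time it was
    last indexed; it is a stand-in for the database, not its real schema.
    """
    on_disk: dict[str, float] = {}
    for sub in SYNCED_DIRS:
        folder = wiki_dir / sub
        if not folder.is_dir():
            continue
        for md in sorted(folder.glob("*.md")):
            on_disk[md.relative_to(wiki_dir).as_posix()] = md.stat().st_mtime

    plan: dict[str, list[str]] = {"index": [], "reindex": [], "delete": []}
    for rel, mtime in on_disk.items():
        if rel not in indexed_at:
            plan["index"].append(rel)    # new file: not in the database yet
        elif mtime > indexed_at[rel]:
            plan["reindex"].append(rel)  # file changed since it was indexed
        # otherwise: unchanged, skip
    # Database entries whose file no longer exists on disk.
    plan["delete"] = [rel for rel in indexed_at if rel not in on_disk]
    return plan
```

Starting from an empty `indexed_at` (the state after deleting wikinow.db), every page lands in the "index" list, which is why a full rebuild needs no separate command.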
Every project has a CLAUDE.md file that tells the AI how to maintain the wiki. It's generated on wn init with default instructions (13 ingest steps, 7 query steps, lint checks, confidence levels, frontmatter format).
But the schema is designed to evolve through conversation:
Session 1:
You: "When you write concept pages, always include a ## History section"
AI: calls update_schema("Domain-Specific Notes", "Always include ## History...")
--> CLAUDE.md updated on disk
Session 2:
AI reads CLAUDE.md on startup
--> sees the new instruction
--> all future concept pages include ## History
The update_schema() tool modifies specific sections of CLAUDE.md without touching the rest. The schema accumulates project-specific conventions over time -- the AI learns what works for your domain.
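A section-scoped edit of this kind could be sketched as follows. This is not the actual update_schema() implementation: `replace_section` is a hypothetical helper, and it assumes that sections are delimited by level-2 (`## `) headings and that a missing section is appended at the end.

```python
def replace_section(markdown: str, section: str, body: str) -> str:
    """Replace the body under `## <section>`, leaving other sections untouched.

    Assumptions: sections start at a line that is exactly `## <section>` and
    end at the next line beginning with `## `. A missing section is appended.
    """
    heading = f"## {section}"
    lines = markdown.splitlines()
    try:
        start = lines.index(heading)
    except ValueError:
        # Section not found: append it after the existing content.
        prefix = markdown.rstrip("\n")
        sep = "\n\n" if prefix else ""
        return f"{prefix}{sep}{heading}\n\n{body.strip()}\n"

    # The section runs until the next level-2 heading or the end of the file.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("## "):
            end = i
            break

    new_lines = lines[: start + 1] + ["", body.strip(), ""] + lines[end:]
    return "\n".join(new_lines).rstrip("\n") + "\n"
```

Because only the lines between two headings are rewritten, instructions in every other section survive each update unchanged.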
CLAUDE.md is symlinked to AGENTS.md (for Codex/Copilot) and .github/copilot-instructions.md (for GitHub Copilot). One file, three names: every AI tool reads the same instructions.
WikiNow has two entry points that share the same underlying code:
| Operation | CLI (wn) | MCP (host AI) |
|---|---|---|
| Ingest a URL | wn ingest https://... | ingest_url("https://...") |
| Search | wn search "query" | search("query") |
| Health check | wn lint | lint() |
| Read a page | wn read concepts/ai.md | read("concepts/ai.md") |
| Export | wn export | export() |
| Stats | wn stats | get_project_stats() |
The CLI is for quick human interactions. The MCP server is for the AI to use programmatically during conversation. Both call the same database, same ingestion modules, same search.
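The "two thin entry points over one core" shape can be illustrated with the standard library alone. Everything here is a simplified stand-in: the real project uses Typer and FastMCP (per the file map), `cli_main` and `tool_search` are hypothetical names, and the substring match replaces the FTS5 search.

```python
from __future__ import annotations

import argparse


def search(query: str, pages: dict[str, str]) -> list[str]:
    """Shared core: return paths of pages whose text contains the query.

    A plain substring match stands in for the real FTS5 search (assumption).
    """
    needle = query.lower()
    return sorted(path for path, text in pages.items() if needle in text.lower())


def cli_main(argv: list[str], pages: dict[str, str]) -> str:
    """CLI entry point: parse `search <query>` and format results as text."""
    parser = argparse.ArgumentParser(prog="wn")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("search").add_argument("query")
    args = parser.parse_args(argv)
    return "\n".join(search(args.query, pages))


def tool_search(query: str, pages: dict[str, str]) -> dict[str, list[str]]:
    """Tool entry point: same core call, structured result for the host AI."""
    return {"results": search(query, pages)}
```

Keeping the wrappers free of logic means a fix in the shared function reaches both the command line and the AI-facing tool at once.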
For contributors -- what each file does:
wikinow/
├── cli.py            Typer app -- 14 commands, Rich panels, ASCII logo
├── server.py         FastMCP server -- 21 tools, path traversal protection
├── config.py         YAML config -- frozen dataclasses, env var resolution, singleton
├── project.py        init (dirs + schema + git + obsidian), switch, list
├── templates.py      CLAUDE.md content, wiki file templates, Obsidian JSON configs
├── export.py         Concatenate wiki pages into single markdown file
├── db/
│   ├── schemas.py    5 CREATE TABLE statements + 5 indexes
│   └── storage.py    Self-healing cache -- CRUD, FTS5 search, filesystem sync
├── ingestion/
│   ├── jina.py       URL -> markdown via Jina Reader (stdlib urllib only)
│   ├── youtube.py    yt-dlp subtitles + Whisper fallback, json3 parsing
│   ├── pdf.py        pymupdf text extraction (optional dep)
│   ├── epub.py       ebooklib + BeautifulSoup chapter extraction (optional dep)
│   ├── audio.py      Whisper transcription, English-only check (optional dep)
│   └── text.py       pathlib read, title from filename stem
└── search/
    ├── wiki.py       FTS5 keyword search (delegates to db)
    └── web.py        Ollama web search API (optional dep)
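The "path traversal protection" noted for server.py is not described further here. One common way to implement such a check, shown purely as an assumption about the general technique, is to resolve the requested path and confirm it still sits under the project root. `resolve_inside` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_inside(root: Path, relative: str) -> Path:
    """Resolve `relative` against `root`, rejecting paths that escape it.

    Hypothetical helper: the real server may perform this check differently.
    """
    root = root.resolve()
    candidate = (root / relative).resolve()
    # An absolute `relative` or a run of `..` segments can land outside root.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes project root: {relative!r}")
    return candidate
```

A tool such as read("concepts/ai.md") would pass its argument through a check like this before touching the filesystem, so a request for "../../etc/passwd" fails instead of leaking a file.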