Architecture

How the pieces connect and how data flows through the system.

How a Source Becomes Wiki Pages

When you run wn ingest https://example.com/article or the AI calls ingest_url(), here's what happens end to end:

URL
 |
 v
Jina Reader (r.jina.ai)          -- fetches page, returns clean markdown
 |
 v
SHA-256 hash                     -- checks if content already exists (dedup)
 |
 v
raw/article-title.md             -- saved to disk, immutable from this point
 |
 v
Database indexes the raw source  -- path, URL, hash, compiled=false
 |
 v
Content returned to the host AI  -- the AI now has the full text
 |
 v
AI reads wiki/index.md           -- understands what already exists
 |
 v
AI writes/updates 10-15 pages:
  wiki/sources/article.md        -- source summary
  wiki/concepts/topic-a.md       -- concept page (new or updated)
  wiki/concepts/topic-b.md       -- another concept touched
  wiki/index.md                  -- new pages added to catalog
  wiki/overview.md               -- evolving thesis refined
  wiki/tags.md                   -- new tags added
  wiki/contradictions.md         -- conflicts flagged (if any)
  wiki/log.md                    -- dated entry appended
 |
 v
AI calls index_article()         -- each page indexed in database (FTS5, tags, links)
 |
 v
AI calls mark_compiled()         -- raw source marked as processed

The key: WikiNow handles steps 1-5 (fetch, dedup, save, index, return content). The AI handles steps 6-9 (read wiki state, write pages, index them, mark the source compiled). WikiNow is the plumbing; the AI is the brain.

For YouTube URLs, the flow is the same but step 1 uses yt-dlp for English subtitles, with Whisper as a fallback if no subtitles exist. For local files (PDF, epub, audio, text), step 1 uses the appropriate extractor instead of Jina Reader.
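
The WikiNow half of the flow (hash, dedup, save, index) can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the `raw_sources` table layout, the `save_raw_source` name, and the slug rule are hypothetical. Only the SHA-256 dedup, the raw/ directory, and the compiled=false flag come from the flow above.

```python
import hashlib
import re
import sqlite3
from pathlib import Path
from typing import Optional


def save_raw_source(db: sqlite3.Connection, raw_dir: Path, url: str,
                    title: str, markdown: str) -> Optional[Path]:
    """Store fetched markdown once; return None if the content is a duplicate."""
    # Hypothetical table layout, for illustration only.
    db.execute(
        "CREATE TABLE IF NOT EXISTS raw_sources ("
        "path TEXT PRIMARY KEY, url TEXT, hash TEXT UNIQUE, compiled INTEGER)"
    )
    digest = hashlib.sha256(markdown.encode("utf-8")).hexdigest()

    # Dedup: identical content yields the same digest, whatever the URL.
    if db.execute("SELECT 1 FROM raw_sources WHERE hash = ?", (digest,)).fetchone():
        return None

    # Assumed slug rule: lowercase, runs of non-alphanumerics become hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"
    path = raw_dir / f"{slug}.md"
    raw_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")

    # compiled=0 until the AI has written the wiki pages for this source.
    db.execute(
        "INSERT INTO raw_sources (path, url, hash, compiled) VALUES (?, ?, ?, 0)",
        (str(path), url, digest),
    )
    db.commit()
    return path
```

Hashing the content rather than the URL means the same article fetched from two addresses is stored only once.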

How the Database Stays in Sync

The database is a disposable cache. Delete wikinow.db and it rebuilds from the markdown files on next access.

The sync happens automatically every time the database connection opens:

For each .md file in wiki/sources/, wiki/concepts/, wiki/comparisons/, wiki/queries/:
  |
  Is it in the database?
  |
  No  --> index it (parse frontmatter, extract wikilinks, add to FTS5)
  Yes --> compare file's last-modified time vs database's indexed_at time
           |
           File is newer --> re-index it
           File is same  --> skip (no work needed)

For each database entry:
  |
  Does the .md file still exist on disk?
  |
  No  --> delete from database (article, FTS5 entry, tags, links)
  Yes --> keep

Same process for raw/ files.
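
The two passes above can be sketched as a planning function that compares the files on disk with what the database remembers. The `plan_sync` name and the `indexed_at` mapping (relative path to timestamp) are assumptions for illustration; the real sync lives in db/storage.py and may be structured differently.

```python
from pathlib import Path


def plan_sync(wiki_dir: Path, indexed_at: dict[str, float]) -> dict[str, list[str]]:
    """Decide what to do per page, given {relative_path: indexed_at timestamp}."""
    subdirs = ("sources", "concepts", "comparisons", "queries")
    on_disk = {
        p.relative_to(wiki_dir).as_posix(): p.stat().st_mtime
        for sub in subdirs
        if (wiki_dir / sub).is_dir()
        for p in (wiki_dir / sub).glob("*.md")
    }
    plan: dict[str, list[str]] = {"index": [], "reindex": [], "delete": []}
    for rel, mtime in sorted(on_disk.items()):
        if rel not in indexed_at:
            plan["index"].append(rel)      # new file, not in the database yet
        elif mtime > indexed_at[rel]:
            plan["reindex"].append(rel)    # edited since it was last indexed
        # otherwise unchanged: nothing to do
    # Database entries whose file no longer exists on disk.
    plan["delete"] = sorted(set(indexed_at) - set(on_disk))
    return plan
```

Starting from an empty `indexed_at` mapping puts every file under "index", which is the delete-and-rebuild case.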

This means:

Edit a page in Obsidian --> database picks it up automatically
Delete a page --> database removes it
Add a new .md file manually --> database indexes it
Delete wikinow.db --> everything rebuilds from files

No rebuild command. No migration. No manual intervention.

How the Schema Evolves

Every project has a CLAUDE.md file that tells the AI how to maintain the wiki. It's generated on wn init with default instructions (13 ingest steps, 7 query steps, lint checks, confidence levels, frontmatter format).

But the schema is designed to evolve through conversation:

Session 1:
  You: "When you write concept pages, always include a ## History section"
  AI:  calls update_schema("Domain-Specific Notes", "Always include ## History...")
  -->  CLAUDE.md updated on disk

Session 2:
  AI reads CLAUDE.md on startup
  -->  sees the new instruction
  -->  all future concept pages include ## History

The update_schema() tool modifies specific sections of CLAUDE.md without touching the rest. The schema accumulates project-specific conventions over time -- the AI learns what works for your domain.
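
One way to edit a single section without touching the rest is to locate the section's heading, find where the next heading starts, and splice the new text in between. The sketch below assumes sections are level-2 markdown headings and that new instructions are appended rather than replaced. `append_to_section` is a hypothetical helper, not the actual update_schema() implementation.

```python
def append_to_section(document: str, section: str, note: str) -> str:
    """Append `note` to the `## section` block, leaving other sections unchanged."""
    lines = document.splitlines()
    heading = f"## {section}"
    if heading not in lines:
        # Section missing: create it at the end of the file.
        return "\n".join(lines + ["", heading, "", note]) + "\n"

    start = lines.index(heading)
    # The section runs until the next level-2 heading, or to end of file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    # Skip trailing blank lines so the note lands directly under existing text.
    body_end = end
    while body_end > start + 1 and not lines[body_end - 1].strip():
        body_end -= 1

    separator = [""] if end < len(lines) else []
    return "\n".join(lines[:body_end] + [note] + separator + lines[end:]) + "\n"
```

Because only the lines between the two headings are rewritten, instructions in other sections survive every update byte for byte.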

CLAUDE.md is symlinked to AGENTS.md (for Codex/Copilot) and .github/copilot-instructions.md (for GitHub Copilot). One file, three names: all AI tools read the same instructions.
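
A minimal sketch of that linking step, assuming relative symlinks that point at CLAUDE.md (the project may create them differently, and `link_agent_files` is a hypothetical name). Creating symlinks on Windows can require extra privileges.

```python
import os
from pathlib import Path


def link_agent_files(project_dir: Path) -> list[Path]:
    """Create AGENTS.md and .github/copilot-instructions.md as links to CLAUDE.md."""
    target = project_dir / "CLAUDE.md"
    links = [
        project_dir / "AGENTS.md",
        project_dir / ".github" / "copilot-instructions.md",
    ]
    for link in links:
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            continue  # leave anything already present untouched
        # Relative target so the links survive moving the project directory.
        link.symlink_to(os.path.relpath(target, start=link.parent))
    return links
```

After this runs, editing CLAUDE.md is immediately visible under the other two names, since they are links rather than copies.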

Two Interfaces, Same Operations

WikiNow has two entry points that share the same underlying code:

| Operation | CLI (`wn`) | MCP (host AI) |
| --- | --- | --- |
| Ingest a URL | `wn ingest https://...` | `ingest_url("https://...")` |
| Search | `wn search "query"` | `search("query")` |
| Health check | `wn lint` | `lint()` |
| Read a page | `wn read concepts/ai.md` | `read("concepts/ai.md")` |
| Export | `wn export` | `export()` |
| Stats | `wn stats` | `get_project_stats()` |

The CLI is for quick human interactions. The MCP server is for the AI to use programmatically during conversation. Both call the same database, same ingestion modules, same search.
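
The shape of that sharing can be sketched with one core function and two thin wrappers. To keep the example self-contained it uses argparse and a plain function as stand-ins for the Typer command and the FastMCP tool, and a substring match as a stand-in for FTS5. All three function names are hypothetical.

```python
import argparse


def search_pages(pages: dict[str, str], query: str) -> list[str]:
    """Shared core: return paths of pages whose text contains the query."""
    needle = query.lower()
    return sorted(path for path, text in pages.items() if needle in text.lower())


def cli_search(pages: dict[str, str], argv: list[str]) -> str:
    """CLI-style wrapper: parse arguments, format results for a terminal."""
    parser = argparse.ArgumentParser(prog="wn search")
    parser.add_argument("query")
    args = parser.parse_args(argv)
    hits = search_pages(pages, args.query)
    return "\n".join(hits) if hits else "No results."


def tool_search(pages: dict[str, str], query: str) -> dict:
    """Tool-style wrapper: return structured data for the host AI."""
    hits = search_pages(pages, query)
    return {"query": query, "count": len(hits), "results": hits}
```

Each wrapper handles only input parsing and output formatting, so a fix to the core function reaches both interfaces at once.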

Package Map

For contributors -- what each file does:

wikinow/
├── cli.py              Typer app -- 14 commands, Rich panels, ASCII logo
├── server.py           FastMCP server -- 21 tools, path traversal protection
├── config.py           YAML config -- frozen dataclasses, env var resolution, singleton
├── project.py          init (dirs + schema + git + obsidian), switch, list
├── templates.py        CLAUDE.md content, wiki file templates, Obsidian JSON configs
├── export.py           Concatenate wiki pages into single markdown file
├── db/
│   ├── schemas.py      5 CREATE TABLE statements + 5 indexes
│   └── storage.py      Self-healing cache -- CRUD, FTS5 search, filesystem sync
├── ingestion/
│   ├── jina.py         URL -> markdown via Jina Reader (stdlib urllib only)
│   ├── youtube.py      yt-dlp subtitles + Whisper fallback, json3 parsing
│   ├── pdf.py          pymupdf text extraction (optional dep)
│   ├── epub.py         ebooklib + BeautifulSoup chapter extraction (optional dep)
│   ├── audio.py        Whisper transcription, English-only check (optional dep)
│   └── text.py         pathlib read, title from filename stem
└── search/
    ├── wiki.py         FTS5 keyword search (delegates to db)
    └── web.py          Ollama web search API (optional dep)
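
The map lists path traversal protection for server.py without describing how it works. One common approach, shown here as an assumption rather than the project's actual check, is to resolve the requested path and confirm it still sits under the wiki root. `resolve_wiki_path` is a hypothetical name.

```python
from pathlib import Path


def resolve_wiki_path(wiki_root: Path, relative: str) -> Path:
    """Resolve a caller-supplied page path, refusing anything outside wiki_root."""
    root = wiki_root.resolve()
    # resolve() collapses ".." and follows symlinks before the containment check.
    candidate = (root / relative).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes the wiki directory: {relative}")
    return candidate
```

A request such as "concepts/ai.md" resolves normally, while "../secret.txt" or an absolute path outside the root raises ValueError before any file is read.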
