xbrain diff — compare two snapshots, surface unexpected drift

After a re-enrichment, it is currently impossible to tell **how much changed**. Did 2% of items get reassigned, or 40%? Did the topic-page overviews shift drastically? Today: no diff, no answer.

Depends on: snapshot system (#C).

### What drift means

The diff has to expose two distinct phenomena:

1. **Expected drift from corpus growth** — a topic with +20 items (from 50 to 70) **should** change its overview. Healthy signal.
2. **Unexpected drift from prompt/model change** — same items reassigned to different topics, or an overview rewritten when only 2 items changed. Noise.

The diff surfaces both; the user (or eval, #8) judges which is which.
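One way to help the user tell the two cases apart is to compare how much a topic's overview moved against how much its item set moved. The sketch below is only a triage heuristic, not part of the spec: the function names, the labels, and both thresholds (`min_similarity`, `max_quiet_churn`) are assumptions chosen for illustration, and the embeddings are assumed to be plain lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def triage_overview(items_old, items_new, emb_old, emb_new,
                    min_similarity=0.8, max_quiet_churn=0.10):
    """Label one topic's overview change (hypothetical labels and thresholds).

    churn = share of the topic's items that were added or removed.
    A sharp overview change with high churn looks like corpus growth;
    a sharp overview change with low churn deserves a closer look.
    """
    old_ids, new_ids = set(items_old), set(items_new)
    union = old_ids | new_ids
    churn = len(old_ids ^ new_ids) / len(union) if union else 0.0
    similarity = cosine_similarity(emb_old, emb_new)
    if similarity >= min_similarity:
        return "stable"
    return "likely-expected" if churn > max_quiet_churn else "review"
```

The final call still belongs to the user or the eval; the label only sorts topics into those worth opening first.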

### Mini-spec

`xbrain diff <snapshot-a> <snapshot-b>`, where `snapshot-b` defaults to the current live state.
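The command line could be parsed roughly as follows. This is a sketch with `argparse`, assuming the CLI is written in Python; the `LIVE` sentinel and the `text` default format are hypothetical, while `--format json` and `--judge` are the flags named elsewhere in this issue.

```python
import argparse

# Hypothetical sentinel meaning "compare against the current live state".
LIVE = "live"


def build_parser():
    parser = argparse.ArgumentParser(
        prog="xbrain diff",
        description="Compare two snapshots and surface drift.",
    )
    parser.add_argument("snapshot_a", help="baseline snapshot")
    # nargs="?" makes the second positional optional, so it can default to live.
    parser.add_argument("snapshot_b", nargs="?", default=LIVE,
                        help="snapshot to compare against (default: live state)")
    parser.add_argument("--format", choices=["text", "json"], default="text",
                        help="output format")
    parser.add_argument("--judge", action="store_true",
                        help="use the WS3 judge for overview similarity")
    return parser
```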

Output sections:

- **Items reassigned:** count + % of items whose `primary_topic` changed. List the top N most frequent transitions (e.g. `ai-coding → software-engineering: 12 items`).
- **Topic-level changes:** for each topic, items added / removed / unchanged. Flag topics with >10% growth or >10% shrinkage.
- **Overview drift:** for each topic, similarity between old and new overview (cosine similarity of embeddings, or LLM-judged similarity if WS3 #8 is available). Flag overviews that changed sharply on small corpus changes.
- **Vocab changes:** which slugs were added, removed, renamed.
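The mechanical parts of this output (reassignments, per-topic changes, vocab set differences) can be sketched as below. It assumes each snapshot can be reduced to a dict mapping item id to its `primary_topic` slug, which is an assumption about the snapshot format rather than something this issue defines. The function name and result keys are hypothetical. Rename detection and overview drift are left out, since they need more than set arithmetic.

```python
from collections import Counter


def diff_assignments(old, new, top_n=5, flag_ratio=0.10):
    """Diff two {item_id: primary_topic} mappings (assumed snapshot shape)."""
    # Reassignment only makes sense for items present in both snapshots.
    shared = old.keys() & new.keys()
    moved = [(old[i], new[i]) for i in shared if old[i] != new[i]]
    percent = 100 * len(moved) / len(shared) if shared else 0.0

    topics = {}
    for slug in sorted(set(old.values()) | set(new.values())):
        before = {i for i, t in old.items() if t == slug}
        after = {i for i, t in new.items() if t == slug}
        # Relative size change; undefined for topics that did not exist before.
        growth = (len(after) - len(before)) / len(before) if before else None
        topics[slug] = {
            "added": len(after - before),
            "removed": len(before - after),
            "unchanged": len(before & after),
            "flagged": growth is not None and abs(growth) > flag_ratio,
        }

    return {
        "reassigned_count": len(moved),
        "reassigned_percent": percent,
        "top_transitions": Counter(moved).most_common(top_n),
        "topics": topics,
        "vocab_added": sorted(set(new.values()) - set(old.values())),
        "vocab_removed": sorted(set(old.values()) - set(new.values())),
    }
```

Brand-new topics are not flagged by the growth rule here because their relative growth is undefined; they show up under `vocab_added` instead.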

An optional `--format json` flag emits machine-readable output that CI or the WS3 eval can ingest.
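As an illustration of what the two output modes might look like, the sketch below renders the same report either as JSON or as plain text. The report layout and every key name in it are hypothetical; this issue does not fix a schema.

```python
import json


def render(report, fmt="text"):
    """Render a diff report dict as JSON or as human-readable text."""
    if fmt == "json":
        # sort_keys keeps the output stable, which helps when CI diffs it.
        return json.dumps(report, indent=2, sort_keys=True, ensure_ascii=False)

    r = report["items_reassigned"]
    lines = [f"Items reassigned: {r['count']} ({r['percent']:.1f}%)"]
    for t in r["top_transitions"]:
        lines.append(f"  {t['from']} → {t['to']}: {t['items']} items")
    return "\n".join(lines)


# Hypothetical report, shaped for illustration only.
sample_report = {
    "items_reassigned": {
        "count": 12,
        "percent": 2.0,
        "top_transitions": [
            {"from": "ai-coding", "to": "software-engineering", "items": 12},
        ],
    },
    "vocab": {"added": [], "removed": [], "renamed": []},
}
```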

### Acceptance

- Works without WS3 / LLM judge: minimum viable diff is mechanical (counts, set differences, embedding similarity via a small offline model).
- Adds an optional `--judge` flag that uses the WS3 judge if #8 is built.
- Tests cover: snapshot diff with fixture pairs; transitions correctly counted; vocab changes detected.

### Why this is not part of WS3

WS3 compares **against a fixed gold standard** ("is your enrichment good in absolute terms?"). `xbrain diff` compares **two runs of your system** ("is your enrichment stable across changes?"). Different question, different tool, shared infrastructure.
