After a re-enrichment, it is currently impossible to tell how much changed. Did 2% of items get reassigned, or 40%? Did the topic-page overviews shift drastically? Today: no diff, no answer.
Depends on: snapshot system (#C).
What drift means
Two distinct phenomena that the diff must help tell apart:
Expected drift from corpus growth — a topic with +20 items (from 50 to 70) should change its overview. Healthy signal.
Unexpected drift from prompt/model change — same items reassigned to different topics, or an overview rewritten when only 2 items changed. Noise.
The diff surfaces both; the user (or eval, #8) judges which is which.
Mini-spec
`xbrain diff` compares two snapshots; the second, `snapshot-b`, defaults to the current live state.
Output sections:
Items reassigned: count + % of items whose `primary_topic` changed. List the top N most-frequent transitions (e.g. `ai-coding → software-engineering: 12 items`).
Topic-level changes: for each topic, items added / removed / unchanged. Flag topics with >10% growth or >10% shrinkage.
Overview drift: for each topic, similarity between old and new overview (cosine similarity of embeddings, or LLM-judged similarity if WS3, the enrichment evaluation harness (#8), is available). Flag overviews that changed sharply on small corpus changes.
Vocab changes: which slugs were added, removed, renamed.
Optional `--format json` for machine consumption (CI and the WS3 eval can ingest it).
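The mechanical sections above (reassignments, topic-level changes, vocab changes) can be sketched as plain set arithmetic. This is only an illustration, not the intended implementation: it assumes a snapshot can be reduced to a dict mapping item id to `primary_topic` slug, and the names `diff_snapshots`, `top_n` and `flag_ratio` are hypothetical. Rename detection is left out because it needs a heuristic the spec does not define.

```python
import json
from collections import Counter


def diff_snapshots(old, new, top_n=5, flag_ratio=0.10):
    """Diff two snapshots given as {item_id: primary_topic} dicts (assumed shape)."""
    shared = old.keys() & new.keys()
    # Count (old_topic, new_topic) pairs for items present in both snapshots.
    transitions = Counter((old[i], new[i]) for i in shared if old[i] != new[i])
    reassigned = sum(transitions.values())

    old_slugs, new_slugs = set(old.values()), set(new.values())
    topics = {}
    for slug in sorted(old_slugs | new_slugs):
        before = {i for i, t in old.items() if t == slug}
        after = {i for i, t in new.items() if t == slug}
        # Relative size change; a brand-new topic has no baseline, so flag it.
        change = (len(after) - len(before)) / len(before) if before else None
        topics[slug] = {
            "added": len(after - before),
            "removed": len(before - after),
            "unchanged": len(before & after),
            "flagged": change is None or abs(change) > flag_ratio,
        }

    return {
        "items_reassigned": {
            "count": reassigned,
            "percent": round(100 * reassigned / len(shared), 1) if shared else 0.0,
            "top_transitions": [
                {"from": a, "to": b, "items": n}
                for (a, b), n in transitions.most_common(top_n)
            ],
        },
        "topics": topics,
        "vocab": {
            "added": sorted(new_slugs - old_slugs),
            "removed": sorted(old_slugs - new_slugs),
        },
    }


if __name__ == "__main__":
    old = {"a": "ai-coding", "b": "ai-coding", "c": "rust", "d": "rust"}
    new = {"a": "software-engineering", "b": "ai-coding",
           "c": "rust", "d": "rust", "e": "rust"}
    # Equivalent of a JSON output mode: one serializable dict.
    print(json.dumps(diff_snapshots(old, new), indent=2))
```

Returning a single serializable dict keeps the text report and the JSON output on the same data, so the human-readable sections could be rendered from it.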
Acceptance
Works without WS3 / LLM judge: minimum viable diff is mechanical (counts, set differences, embedding similarity via a small offline model).
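As a rough sketch of the mechanical overview-drift check, the snippet below compares two overview embeddings with cosine similarity and flags a topic when the overview moved a lot while few items changed. It assumes the embeddings are already computed by whatever offline model is chosen. The function names and both thresholds (`min_similarity`, `small_change_ratio`) are hypothetical placeholders, not values from the spec.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def overview_drift(old_vec, new_vec, items_before, items_changed,
                   min_similarity=0.8, small_change_ratio=0.10):
    """Flag an overview that changed sharply on a small corpus change.

    items_changed is the number of items added to or removed from the topic.
    Both thresholds are illustrative assumptions.
    """
    similarity = cosine_similarity(old_vec, new_vec)
    # Treat a topic with no prior items as fully churned.
    churn = items_changed / items_before if items_before else 1.0
    return {
        "similarity": similarity,
        "item_churn": churn,
        "flagged": similarity < min_similarity and churn <= small_change_ratio,
    }
```

With these placeholder thresholds, an overview rewritten after 2 of 50 items changed would be flagged, while the same rewrite after 20 of 50 items changed would not, matching the expected versus unexpected drift distinction.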
Why this is not part of WS3
WS3 compares against a fixed gold standard ("is your enrichment good in absolute terms?"). `xbrain diff` compares two runs of your system ("is your enrichment stable across changes?"). Different question, different tool, shared infrastructure.