feat(stats): annotation-based cluster-validity + ARI/NMI agreement #65

Open
tsenoner wants to merge 12 commits into feat/projection-statistics from feat/annotation-cluster-validity

tsenoner (Owner) commented Jul 3, 2026

Reworks projection cluster-validity to be annotation-based, aligning it with the original request in #31 (which asked for validity scores computed "for any selected feature/annotation" on the "original high-dimensional embeddings").

Why

The shipped metric computed silhouette/DBI/CH on auto-KMeans labels of the 2D projection. That is partly circular (KMeans optimises the same compactness that silhouette and CH reward) and annotation-agnostic. It answered "does this projection form clean KMeans blobs?" rather than "how well does this projection/embedding separate my major_group?"

What changed

  • Annotation-based validity — silhouette / Davies-Bouldin / Calinski-Harabasz are now scored on a user-selected annotation's own category labels, on both the source embedding (the true-separability "ceiling", computed once per embedding) and each 2D projection (as-displayed). stat_family="annotation_validity", space_kind ∈ {embedding, projection}.
  • ARI/NMI agreement — the retained auto-clusterings are compared against each annotation ("did KMeans recover major_group?"). stat_family="cluster_agreement", reusing the KMeans labels (no second sweep).
  • Removed the circular auto-cluster self-validity rows. Kept n_clusters meta + the cluster_elbow_*/cluster_silhouette_* membership columns (+ per-point silhouette confidence + auto legend).
  • New flag --stats-annotation on prepare and stats: auto (all suitable low-cardinality categoricals; default, gated on --stats) or a comma-separated list.
  • statistics.parquet gains an annotation column (additive — the bundle treats the table as opaque parquet, so no reader breaks).
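
For reviewers who want a feel for the two new stat families, here is a minimal sketch using scikit-learn on synthetic data. It is not the code in this branch: the `annotation_validity` helper, the `metric`/`value` row keys, and the blob/PCA stand-ins for the embedding and projection are assumptions for illustration. Only the `stat_family`, `space_kind`, and `annotation` names come from the description above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    normalized_mutual_info_score,
    silhouette_score,
)


def annotation_validity(points, labels):
    """Score how well the annotation's own labels separate in `points`."""
    return {
        "silhouette": float(silhouette_score(points, labels)),
        "davies_bouldin": float(davies_bouldin_score(points, labels)),
        "calinski_harabasz": float(calinski_harabasz_score(points, labels)),
    }


# Synthetic stand-ins: a 32-D "embedding", its annotation labels,
# and a 2D projection of that embedding (PCA here, purely for the demo).
embedding, annotation = make_blobs(
    n_samples=300, n_features=32, centers=3, random_state=0
)
projection = PCA(n_components=2, random_state=0).fit_transform(embedding)

# annotation_validity: same labels, scored once on the embedding (the
# separability ceiling) and once on the projection (as displayed).
rows = []
for space_kind, points in (("embedding", embedding), ("projection", projection)):
    for metric, value in annotation_validity(points, annotation).items():
        rows.append(
            {
                "stat_family": "annotation_validity",
                "space_kind": space_kind,
                "annotation": "major_group",
                "metric": metric,  # hypothetical column name
                "value": value,  # hypothetical column name
            }
        )

# cluster_agreement: compare auto-KMeans labels against the annotation.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(projection)
agreement = {
    "ari": float(adjusted_rand_score(annotation, kmeans_labels)),
    "nmi": float(normalized_mutual_info_score(annotation, kmeans_labels)),
}
```

Comparing the embedding rows with the projection rows for the same annotation shows how much separability the 2D projection loses, while ARI/NMI answer the separate question of whether KMeans recovered the annotation.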

Design

Spec + plan committed under docs/superpowers/. Key decisions were made interactively: score on both spaces; --stats-annotation auto|list; keep auto-clusters + add ARI/NMI; gap/BIC K-selection deferred to #64.

Testing

Full fast suite green (593 passed). New tests: test_annotation_select.py (suitability filter), test_annotation_validity.py (per-space scoring, missing-value exclusion, subsample determinism), plus driver/CLI coverage for the embedding+projection passes and --stats-annotation. Built task-by-task with per-task + whole-branch review.
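
The behaviours these tests cover can be sketched in plain Python as follows. This is a hypothetical illustration, not the contents of annotation_select.py or annotation_validity.py: the function names, the missing-value tokens, and the `max_categories` threshold are all assumptions.

```python
import random
from collections import Counter

# Assumed tokens treated as "missing"; the real list may differ.
MISSING_TOKENS = {"", "nan", "none", "na", "n/a"}


def is_missing(value):
    """True for None or a string form that matches an assumed missing token."""
    return value is None or str(value).strip().lower() in MISSING_TOKENS


def is_suitable(values, max_categories=20):
    """Suitability filter: a low-cardinality categorical with >= 2 categories.

    Also requires fewer categories than labelled points, since silhouette is
    undefined when every point is its own category.
    """
    counts = Counter(str(v) for v in values if not is_missing(v))
    n_labelled = sum(counts.values())
    return 2 <= len(counts) <= max_categories and len(counts) < n_labelled


def build_labels(values):
    """Return (kept_indices, integer_labels), excluding missing values."""
    kept = [i for i, v in enumerate(values) if not is_missing(v)]
    categories = sorted({str(values[i]) for i in kept})
    code = {category: k for k, category in enumerate(categories)}
    return kept, [code[str(values[i])] for i in kept]


def subsample(indices, max_points, seed=0):
    """Deterministic subsample: the same seed always yields the same indices."""
    indices = list(indices)
    if len(indices) <= max_points:
        return indices
    return sorted(random.Random(seed).sample(indices, max_points))
```

Seeding a local `random.Random` instead of the module-level generator keeps the subsample reproducible regardless of what other code does with global random state.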

Parked follow-ups (after this lands)

  • Regenerate the 3FTx sample bundle and update the frontend spec protspace_web#296 for the new annotation column / annotation_validity rows.
  • The paused feat/projection-statistics → main merge.

Refs: #31, #64, protspace_web#296

🤖 Generated with Claude Code

tsenoner and others added 12 commits July 2, 2026 22:49
Rework cluster-validity to score user-selected annotations (silhouette/DBI/CH
on both the embedding and each projection) + ARI/NMI vs the auto-clusters,
replacing the circular auto-KMeans self-validity. Keeps the group-detection
membership columns. Gap/BIC k-selection deferred to #64.

Refs: #31, #64, protspace_web#296

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8 TDD tasks: annotation dimension in the data model, suitability filter +
label builder, AnnotationValidityStatistic (embedding + projection), ARI/NMI
agreement folded into ClusterValidityStatistic, driver once-per-embedding
pass, --stats-annotation on stats + prepare, docs.

Refs: #31, #64, protspace_web#296

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ation)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Threads an `annotations` kwarg through `compute_statistics` into every
projection's StatContext, registers AnnotationValidityStatistic in the
statistics registry, and adds a once-per-embedding pass that runs any
statistic opting in via `embedding_space` (currently just
annotation-validity) against the raw embedding as a separability
ceiling. Also patches faithfulness.py's StatRow constructions with the
now-required `annotation` field, and fixes the Task-1 test debt this
exposed (_statrow helper + the 8→9 column schema assertion).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…line

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
_faith_row() built StatRow(...) without the now-required `annotation`
field, breaking 3 tests after the annotation-dimension schema change.
Faithfulness rows are not annotation-scoped, so annotation="".

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…otation

Update CLAUDE.md, docs/cli.md, README.md, and the prepare-bundle Colab
notebook to describe the shipped feature: silhouette/DBI/CH validity is
now scored per user-selected annotation on both the embedding and each
projection (space_kind embedding|projection, annotation column in
statistics.parquet), auto-clustering is no longer self-scored but
instead reports ARI/NMI agreement with each annotation, and the new
--stats-annotation (auto|comma-list) flag picks which columns to score
on prepare and stats. Refresh the stats/ package-structure tree and the
test-file table with current test counts (grep -c '^def test_'),
including the new test_annotation_select.py and test_annotation_validity.py.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…xtra-copy, cleanups)

Fixes six minor findings from the whole-branch review of the
annotation-based cluster-validity feature: stale eight-column docstrings
in stats/base.py, a strict "auto" guard in cli/stats.py that didn't match
the parser's normalised comparison, a shared/aliased `extra` dict across
StatRows in annotation_validity.py, an unused np.unique return in
validity.py, a duplicated _clean() call in annotation_select.py, and a
doc/notebook wording+formatting nit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>