feat(stats): annotation-based cluster-validity + ARI/NMI agreement #65
Open
tsenoner wants to merge 12 commits into
Rework cluster-validity to score user-selected annotations (silhouette/DBI/CH on both the embedding and each projection) + ARI/NMI vs the auto-clusters, replacing the circular auto-KMeans self-validity. Keeps the group-detection membership columns. Gap/BIC k-selection deferred to #64. Refs: #31, #64, protspace_web#296 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
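The ARI/NMI agreement between an annotation and the auto-clusters can be illustrated with a dependency-free sketch. It assumes the textbook definitions (pair-counting adjusted Rand index; NMI normalised by the arithmetic mean of the two entropies, which is also scikit-learn's default `average_method`) and assumes points with a missing annotation value (`None`) are dropped first. The function names and the row dict are hypothetical illustrations, not the PR's `ClusterValidityStatistic` code.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from pair counts of the contingency table."""
    n = len(labels_true)
    total = comb(n, 2)
    if total == 0:
        return 1.0  # fewer than two points: treat as perfect agreement
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate partitions (e.g. a single cluster on both sides)
    return (sum_cells - expected) / (maximum - expected)


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalisation of the two entropies."""
    n = len(labels_true)
    rows, cols = Counter(labels_true), Counter(labels_pred)
    # p_ij * log(p_ij / (p_i * p_j)) rewritten with raw counts
    mi = sum(
        (c / n) * log(c * n / (rows[a] * cols[b]))
        for (a, b), c in Counter(zip(labels_true, labels_pred)).items()
    )
    denom = (_entropy(labels_true) + _entropy(labels_pred)) / 2
    return 1.0 if denom == 0 else mi / denom


def cluster_agreement(annotation, values, cluster_labels):
    """Hypothetical agreement row: one annotation vs the auto-cluster labels."""
    kept = [(v, c) for v, c in zip(values, cluster_labels) if v is not None]
    y = [v for v, _ in kept]
    k = [c for _, c in kept]
    return {
        "stat_family": "cluster_agreement",
        "annotation": annotation,
        "ari": adjusted_rand_index(y, k),
        "nmi": normalized_mutual_info(y, k),
    }
```

Because ARI and NMI compare two labelings rather than scoring geometry, reusing the existing KMeans labels is enough; no second clustering sweep is needed. With scikit-learn available, `sklearn.metrics.adjusted_rand_score` and `normalized_mutual_info_score` would replace the hand-rolled versions.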
8 TDD tasks: annotation dimension in the data model, suitability filter + label builder, AnnotationValidityStatistic (embedding + projection), ARI/NMI agreement folded into ClusterValidityStatistic, driver once-per-embedding pass, --stats-annotation on stats + prepare, docs. Refs: #31, #64, protspace_web#296 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
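A minimal sketch of how a suitability filter and the `--stats-annotation` value (`auto` or a comma-separated list) might resolve to columns, assuming annotations arrive as a plain `{column: values}` mapping. The cardinality cap `MAX_CATEGORIES = 50`, the helper names, and the error message are hypothetical; the PR only states that `auto` picks all suitable low-cardinality categoricals. The `strip().lower()` check is one plausible normalisation of the flag value, not necessarily the parser's actual rule.

```python
MAX_CATEGORIES = 50  # hypothetical cutoff for "low-cardinality"


def _present(values):
    """Drop missing annotation values (None or empty string)."""
    return [v for v in values if v not in (None, "")]


def is_suitable(values, max_categories=MAX_CATEGORIES):
    """True if a column can be scored as a categorical grouping."""
    present = _present(values)
    n_groups = len(set(present))
    # Silhouette-style scores need at least two groups and fewer groups than
    # points; the category cap screens out ID-like columns.
    return 2 <= n_groups <= min(max_categories, len(present) - 1)


def resolve_stats_annotation(flag, columns):
    """Map a --stats-annotation value to an ordered list of column names."""
    if flag.strip().lower() == "auto":  # normalised comparison
        return [name for name, values in columns.items() if is_suitable(values)]
    # Explicit list: keep user order, drop blanks and duplicates.
    requested = list(dict.fromkeys(p.strip() for p in flag.split(",") if p.strip()))
    unknown = [name for name in requested if name not in columns]
    if unknown:
        raise ValueError(f"unknown annotation column(s): {', '.join(unknown)}")
    return requested
```

An explicit list bypasses the suitability filter in this sketch, so a user can still ask for a column that `auto` would skip.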
…ation) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Threads an `annotations` kwarg through `compute_statistics` into every projection's StatContext, registers AnnotationValidityStatistic in the statistics registry, and adds a once-per-embedding pass that runs any statistic opting in via `embedding_space` (currently just annotation-validity) against the raw embedding as a separability ceiling. Also patches faithfulness.py's StatRow constructions with the now-required `annotation` field, and fixes the Task-1 test debt this exposed (_statrow helper + the 8→9 column schema assertion). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
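The once-per-embedding pass described in this commit can be sketched as a driver that runs opt-in statistics on the raw embedding first, then every statistic on every projection, with the same annotations threaded into each context. `StatContextSketch`, `StatisticSketch`, and `run_stat_passes` are hypothetical stand-ins; only the `embedding_space` opt-in and the embedding/projection split come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class StatContextSketch:
    space_kind: str  # "embedding" or "projection"
    space_name: str
    coords: list
    annotations: dict = field(default_factory=dict)


class StatisticSketch:
    name = "base"
    embedding_space = False  # opt in to the once-per-embedding pass

    def compute(self, ctx):
        raise NotImplementedError


def run_stat_passes(embedding, projections, statistics, annotations=None):
    """Hypothetical driver: one embedding pass, then one pass per projection."""
    annotations = annotations or {}
    rows = []
    # Once per embedding: only statistics that opt in see the raw embedding.
    emb_ctx = StatContextSketch("embedding", "embedding", embedding, annotations)
    for stat in statistics:
        if stat.embedding_space:
            rows.extend(stat.compute(emb_ctx))
    # Every statistic runs on every projection with the same annotations.
    for name, coords in projections.items():
        ctx = StatContextSketch("projection", name, coords, annotations)
        for stat in statistics:
            rows.extend(stat.compute(ctx))
    return rows
```

Keeping the opt-in as a class attribute means projection-only statistics never touch the high-dimensional space, while annotation-validity gets its separability ceiling from a single extra pass.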
…line Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
_faith_row() built StatRow(...) without the now-required `annotation` field, breaking 3 tests after the annotation-dimension schema change. Faithfulness rows are not annotation-scoped, so annotation="". Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…otation Update CLAUDE.md, docs/cli.md, README.md, and the prepare-bundle Colab notebook to describe the shipped feature: silhouette/DBI/CH validity is now scored per user-selected annotation on both the embedding and each projection (space_kind embedding|projection, annotation column in statistics.parquet), auto-clustering is no longer self-scored but instead reports ARI/NMI agreement with each annotation, and the new --stats-annotation (auto|comma-list) flag picks which columns to score on prepare and stats. Refresh the stats/ package-structure tree and the test-file table with current test counts (grep -c '^def test_'), including the new test_annotation_select.py and test_annotation_validity.py. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
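Per-annotation scoring on one space, including the missing-value exclusion and subsample determinism the PR says are tested, can be sketched for silhouette alone (DBI and CH would slot in the same way). This is a dependency-free illustration under assumed defaults (`max_points=2000`, `seed=0`, Euclidean distance, missing means `None` or empty string); the function names and the row keys other than `stat_family`, `space_kind` and `annotation` are hypothetical, and the real `AnnotationValidityStatistic` may differ.

```python
import random
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (O(n^2))."""
    scores = []
    for i, p in enumerate(points):
        same, other = [], {}
        for j, q in enumerate(points):
            if i == j:
                continue
            d = dist(p, q)
            if labels[j] == labels[i]:
                same.append(d)
            else:
                other.setdefault(labels[j], []).append(d)
        if not same:  # singleton group: coefficient defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean intra-group distance
        b = min(sum(ds) / len(ds) for ds in other.values())  # nearest other group
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def annotation_validity_row(space_kind, coords, annotation, values,
                            max_points=2000, seed=0):
    """Score one annotation in one space; None if it cannot be scored."""
    # Missing annotation values are excluded rather than treated as a group.
    keep = [i for i, v in enumerate(values) if v not in (None, "")]
    # Deterministic subsample: a fixed seed selects the same subset every run.
    if len(keep) > max_points:
        keep = sorted(random.Random(seed).sample(keep, max_points))
    points = [coords[i] for i in keep]
    labels = [values[i] for i in keep]
    if not 2 <= len(set(labels)) <= len(labels) - 1:
        return None  # silhouette needs 2..n-1 groups
    return {
        "stat_family": "annotation_validity",
        "space_kind": space_kind,
        "annotation": annotation,
        "silhouette": silhouette(points, labels),
        "n_points": len(points),
    }
```

The same function would be called with `space_kind="embedding"` on the raw embedding and `"projection"` on each 2D layout, so the two rows for one annotation are directly comparable.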
…xtra-copy, cleanups) Fixes six minor findings from the whole-branch review of the annotation-based cluster-validity feature: stale eight-column docstrings in stats/base.py, a strict "auto" guard in cli/stats.py that didn't match the parser's normalised comparison, a shared/aliased `extra` dict across StatRows in annotation_validity.py, an unused np.unique return in validity.py, a duplicated _clean() call in annotation_select.py, and a doc/notebook wording+formatting nit. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
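The shared/aliased `extra` dict fixed here is a standard Python pitfall: passing one dict to several objects stores a reference to the same object, so mutating it through one row changes every row. A minimal sketch with a hypothetical `RowSketch` dataclass (a stand-in, not the real `StatRow`):

```python
from dataclasses import dataclass, field


@dataclass
class RowSketch:
    annotation: str
    value: float
    extra: dict = field(default_factory=dict)  # fresh dict for each default


def rows_aliased(annotations, extra):
    # Bug: every row holds a reference to the same dict object.
    return [RowSketch(a, 0.0, extra) for a in annotations]


def rows_copied(annotations, extra):
    # Fix: give each row its own shallow copy.
    return [RowSketch(a, 0.0, dict(extra)) for a in annotations]
```

A shallow copy suffices if the values are immutable scalars; nested mutable values would need `copy.deepcopy`.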
Reworks projection cluster-validity to be annotation-based, aligning it with the original request in #31 (which asked for validity scores computed "for any selected feature/annotation" on the "original high-dimensional embeddings").
Why
The shipped metric computed silhouette/DBI/CH on auto-KMeans labels of the 2D projection. That was partly circular (KMeans optimises the same compactness that silhouette and CH reward) and annotation-agnostic: it answered "does this projection form clean KMeans blobs?" rather than "how well does this projection/embedding separate my `major_group`?"
What changed
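- Silhouette/DBI/CH are now scored per user-selected annotation on both the raw embedding and each projection, as `stat_family="annotation_validity"` with `space_kind ∈ {embedding, projection}`.
- The embedding-space score acts as a separability ceiling: it shows how well the raw embedding separates an annotation such as `major_group`, which each projection can then be compared against.
- Auto-clusters are no longer self-scored; they now report ARI/NMI agreement with each annotation as `stat_family="cluster_agreement"`, reusing the KMeans labels (no second sweep).
- Kept: the `n_clusters` meta plus the `cluster_elbow_*`/`cluster_silhouette_*` membership columns (plus per-point silhouette confidence and the auto legend).
- New `--stats-annotation` on `prepare` and `stats`: `auto` (all suitable low-cardinality categoricals; the default, gated on `--stats`) or a comma-separated list.
- `statistics.parquet` gains an `annotation` column (additive: the bundle treats the table as opaque parquet, so no reader breaks).
Design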
Spec + plan committed under `docs/superpowers/`. Key decisions were made interactively: score on both spaces; `--stats-annotation auto|list`; keep auto-clusters + add ARI/NMI; gap/BIC K-selection deferred to #64.
Testing
Full fast suite green (593 passed). New tests: `test_annotation_select.py` (suitability filter), `test_annotation_validity.py` (per-space scoring, missing-value exclusion, subsample determinism), plus driver/CLI coverage for the embedding+projection passes and `--stats-annotation`. Built task-by-task with per-task + whole-branch review.
Parked follow-ups (after this lands)
- `annotation` column / `annotation_validity` rows.
- `feat/projection-statistics` → main merge.

Refs: #31, #64, protspace_web#296
🤖 Generated with Claude Code