
feat(stats): cluster-selection, silhouette-as-score, global faithfulness metrics #63

Merged
tsenoner merged 4 commits into feat/projection-statistics from feat/projection-stats-extras on Jul 2, 2026

tsenoner (Owner) commented Jul 2, 2026

Sub-branch of #61 (targets feat/projection-statistics, not main) so this new work is reviewed on top of the already-reviewed base.

What's new

1. --cluster-selection elbow | silhouette | both

Choose how the cluster count K is picked (on prepare --stats and protspace stats):

  • elbow (default) → cluster_<proj> membership column, validity rows label_kind=kmeans_elbow.
  • silhouette → cluster_silhouette_<proj> (K maximising silhouette over the sweep), label_kind=kmeans_silhouette.
  • both → both columns + both label_kinds (statistics rows are distinguished by label_kind).

kmeans_elbow gains an optional silhouette-K pass (computed only on request, so the default path is unchanged).
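A minimal sketch of how the two selection rules could pick K from one sweep, assuming the sweep has already produced one inertia value per candidate K. The elbow rule here (normalise both axes, then take the point farthest from the chord joining the curve's ends, a kneedle-style heuristic) and the names `select_k`, `elbow_k`, `KSelection` and `silhouette_fn` are hypothetical; the PR does not say which elbow heuristic `kmeans_elbow` uses. The silhouette pass runs only when requested, mirroring the "computed only on request" behaviour.

```python
from typing import Callable, List, NamedTuple, Optional, Sequence


class KSelection(NamedTuple):
    elbow_k: int
    silhouette_k: Optional[int]  # None unless the silhouette pass was requested


def elbow_k(ks: Sequence[int], inertias: Sequence[float]) -> int:
    """Kneedle-style elbow (assumed heuristic): normalise both axes to [0, 1] and
    return the K whose point lies farthest from the chord joining the curve's ends."""
    lo, hi = min(inertias), max(inertias)
    span_x = (ks[-1] - ks[0]) or 1
    span_y = (hi - lo) or 1.0
    xs = [(k - ks[0]) / span_x for k in ks]
    ys = [(y - lo) / span_y for y in inertias]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0
    # Perpendicular distance of each (x, y) from the chord.
    dists = [abs(dy * (x - xs[0]) - dx * (y - ys[0])) / norm for x, y in zip(xs, ys)]
    return ks[max(range(len(ks)), key=dists.__getitem__)]


def select_k(
    ks: Sequence[int],
    inertias: Sequence[float],
    silhouette_fn: Optional[Callable[[int], float]] = None,
    want_silhouette: bool = False,
) -> KSelection:
    """Elbow K always; silhouette-optimal K only on request (ks must start at >= 2)."""
    sil_k = None
    if want_silhouette:
        if silhouette_fn is None:
            raise ValueError("silhouette_fn is required when want_silhouette=True")
        scores: List[float] = [silhouette_fn(k) for k in ks]  # the extra, costly pass
        sil_k = ks[max(range(len(ks)), key=scores.__getitem__)]
    return KSelection(elbow_k(ks, inertias), sil_k)


ks = [2, 3, 4, 5, 6, 7, 8]
inertias = [1000.0, 600.0, 300.0, 250.0, 220.0, 200.0, 190.0]
sil = {2: 0.61, 3: 0.45, 4: 0.52, 5: 0.40, 6: 0.38, 7: 0.36, 8: 0.33}
print(select_k(ks, inertias))                         # KSelection(elbow_k=4, silhouette_k=None)
print(select_k(ks, inertias, sil.__getitem__, True))  # KSelection(elbow_k=4, silhouette_k=2)
```

With these toy numbers the two rules disagree, which is the situation `both` exists for: it keeps both clusterings instead of forcing a choice.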

2. Per-point silhouette as an attached confidence (not a separate column)

The membership value now carries the per-point silhouette as cluster N|<silhouette>, following the same value|score convention as UniProt evidence codes and InterPro bit scores, and replaces the separate silhouette_<proj> column. Passing --no-scores suppresses the score suffix. The auto legend strips the suffix so categories are keyed by the bare cluster N.
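For reference, a per-point silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other members of its cluster and b(i) is the smallest mean distance from i to the members of any other cluster; singletons get 0 (scikit-learn's convention). The sketch below computes it in pure Python with Euclidean distance and renders the `cluster N|<silhouette>` value. The names `silhouette_samples` and `membership_value`, the 4-decimal rounding (chosen to match the `cluster 3|0.6013` example) and the `include_scores` flag standing in for --no-scores are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def silhouette_samples(points: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Per-point silhouette with Euclidean distance (needs >= 2 clusters).

    s(i) = (b - a) / max(a, b); singleton clusters get 0.0.
    """
    members: Dict[int, List[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    scores: List[float] = []
    for i, lab in enumerate(labels):
        own = [j for j in members[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in idxs) / len(idxs)
            for other, idxs in members.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def membership_value(label: int, score: float, include_scores: bool = True) -> str:
    """'cluster N|<silhouette>' (4 decimals, assumed), or bare 'cluster N' without scores."""
    base = f"cluster {label}"
    return f"{base}|{score:.4f}" if include_scores else base


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
sil = silhouette_samples(points, labels)
print([membership_value(lab, s) for lab, s in zip(labels, sil)])  # every point scores 0.9002 here
print(membership_value(labels[0], sil[0], include_scores=False))  # cluster 0
```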

⚠️ Frontend coordination: the frontend must treat cluster_* / cluster_silhouette_* columns as score-bearing (split value|score, category = left, confidence = right), exactly like the existing score-bearing annotations — otherwise every distinct silhouette becomes its own category. Documented on tsenoner/protspace_web#296.
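The frontend lives in protspace_web, so the following is only a Python illustration of the split rule described above (category = left of the pipe, confidence = right) and of why it matters: without the split, every distinct silhouette value becomes its own legend category. `split_value_score` and `legend_categories` are hypothetical names, and detecting score-bearing columns by the `cluster_` prefix is an assumption based on the column names in this PR.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def split_value_score(raw: str) -> Tuple[str, Optional[float]]:
    """'cluster 3|0.6013' -> ('cluster 3', 0.6013); a bare value -> (value, None)."""
    value, sep, score = raw.rpartition("|")
    if not sep:
        return raw, None
    return value, float(score)


def legend_categories(column: str, values: Iterable[str]) -> Counter:
    """Count legend categories; score-bearing cluster_* columns are keyed by the bare value."""
    score_bearing = column.startswith("cluster_")  # assumed detection rule
    return Counter(split_value_score(v)[0] if score_bearing else v for v in values)


values = ["cluster 0|0.9002", "cluster 0|0.7511", "cluster 1|0.6013"]
print(len(Counter(values)))                                  # 3: one category per silhouette
print(legend_categories("cluster_silhouette_umap", values))  # Counter({'cluster 0': 2, 'cluster 1': 1})
```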

3. Global faithfulness metrics

Two whole-layout metrics added alongside the local kNN ones (rows tagged scope: local|global):

  • random_triplet — relative-ordering accuracy over random triplets (∈[0,1]).
  • spearman_distance — rank correlation of all pairwise distances (∈[−1,1]).
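A minimal pure-Python sketch of the spearman_distance idea: collect the distance for every unordered pair in the original space and in the projection, rank both lists, and take the Pearson correlation of the ranks. Euclidean distance and all function names are assumptions. Ties get average ranks here (standard Spearman); a later commit mentions the implementation's comment now says "ordinal ranks", and the two only differ when distances tie.

```python
import math
from itertools import combinations
from typing import List, Sequence

Points = Sequence[Sequence[float]]


def pairwise_distances(points: Points) -> List[float]:
    """Euclidean distance for every unordered pair i < j, in a fixed pair order."""
    return [math.dist(points[i], points[j]) for i, j in combinations(range(len(points)), 2)]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")  # undefined if all distances tie


def spearman_distance(emb: Points, coords: Points) -> float:
    """Spearman correlation between original-space and projected pairwise distances."""
    return pearson(average_ranks(pairwise_distances(emb)), average_ranks(pairwise_distances(coords)))


emb = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (5.0, 5.0, 1.0)]
coords = [(0.0, 0.0), (2.0, 0.0), (0.0, 6.0), (10.0, 10.0)]
print(spearman_distance(emb, coords))  # ~1.0: every pairwise distance keeps its rank
```

The number of pairs grows as n(n-1)/2, which is why the metric is computed on a bounded subsample.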

Verification

  • 574 fast tests pass; ruff clean. New tests for cluster-selection, score gating, global metrics, silhouette-K, legend stripping, row-order invariance (both regimes), and CLI validation.
  • End-to-end CLI run confirmed (--stats --cluster-selection both): two clusterings (elbow K=7 vs silhouette K=2), membership values such as cluster 3|0.6013, and 5 faithfulness metrics.
  • Ran a 5-agent adversarial verification (0 bugs); its two CONCERNs are fixed in the 2nd commit: random_triplet is now row-order invariant in all paths, prepare validates --cluster-selection fail-fast, stale docs refreshed.

Notes / decisions for review

  • Column naming: elbow keeps cluster_<proj> (backward compatible); silhouette uses cluster_silhouette_<proj>.
  • Global metrics reuse the existing faithfulness subsample (bounded n²); random_triplet needs paired_distances-supported metrics (euclidean/cosine/manhattan) and degrades best-effort otherwise.
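A sketch of the two constraints in the last note, under stated assumptions: a seeded subsample caps n before any O(n²) pairwise work, and random_triplet's row-wise ("paired") distances are only defined for a fixed set of metrics, so an unsupported metric yields no value rather than an error. The "paired_distances" in the note is presumably scikit-learn's `sklearn.metrics.pairwise.paired_distances`; this sketch reimplements its three named metrics in pure Python instead of calling it. The cap, the seed, returning `None` as the "best-effort" degradation, and the helper names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def _cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


# Row-wise metrics this sketch supports (the three named in the note).
PAIRED_METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": math.dist,
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "cosine": _cosine_distance,
}


def subsample_indices(n: int, max_n: int, seed: int = 0) -> List[int]:
    """All rows when n <= max_n, else a sorted, seeded sample of max_n rows."""
    if n <= max_n:
        return list(range(n))
    return sorted(random.Random(seed).sample(range(n), max_n))


def paired_distances(xs: Sequence[Vector], ys: Sequence[Vector], metric: str) -> Optional[List[float]]:
    """Distance between xs[i] and ys[i] for each i, or None for an unsupported metric."""
    fn = PAIRED_METRICS.get(metric)
    if fn is None:
        return None  # caller skips random_triplet instead of failing (assumed behaviour)
    return [fn(a, b) for a, b in zip(xs, ys)]


idx = subsample_indices(n=10_000, max_n=2_000)  # bounds later O(n^2) work to 2_000 rows
print(len(idx))                                               # 2000
print(paired_distances([(0, 0)], [(3, 4)], "euclidean"))      # [5.0]
print(paired_distances([(0, 0)], [(3, 4)], "chebyshev"))      # None
```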

🤖 Generated with Claude Code

tsenoner and others added 2 commits July 2, 2026 12:50
…-score, global faithfulness

Sub-branch of feat/projection-statistics for separate review.

- --cluster-selection elbow|silhouette|both (prepare + stats): emit the elbow
  clustering (`cluster_<proj>`), the max-silhouette-K clustering
  (`cluster_silhouette_<proj>`), or both; validity rows carry the matching
  label_kind (kmeans_elbow / kmeans_silhouette). kmeans_elbow optionally returns
  the silhouette-optimal K + labels (computed only on request).
- Per-point silhouette is now attached to the membership value as `cluster N|<sil>`
  (the UniProt-ECO / InterPro-bit-score convention) instead of a separate
  silhouette_<proj> column; gated by --no-scores. Legend builder strips the
  suffix to recover the bare category.
- Two global faithfulness metrics: random_triplet (relative-ordering accuracy
  over random triplets) and spearman_distance (rank correlation of all pairwise
  distances). Rows tagged scope=local|global.

Tests updated for the single-column format; added cases for cluster-selection,
score gating, global metrics, and silhouette-K selection. 572 fast tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- random_triplet was NOT row-order invariant for n<=sample_threshold (it samples
  triplets by array position). Canonicalise emb/coords/ids by id up front in
  FaithfulnessStatistic.compute so EVERY metric depends only on the id-set, in
  both the subsampled and non-subsampled paths. Invariance test now parametrised
  over both regimes and asserts all five metrics.
- prepare: validate --cluster-selection before the expensive query/embed/similarity
  stages (fail-fast), mirroring the stats command; add a CLI rejection test.
- Refresh stale docs/help/comments that still referenced the removed separate
  silhouette_<proj> column (carriage.py, cli/stats.py) and fix a "dense ranks"
  comment (ordinal ranks) + hoist a repeated fancy-index in random_triplet.
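
The row-order fix in the first bullet can be sketched as follows: sort all aligned arrays by id before any metric runs, so a statistic that samples by array position sees the same rows in the same order for the same id-set, however the caller ordered the input. `canonicalise_by_id` and `first_rows_mean` (a toy position-dependent statistic standing in for positional triplet sampling) are hypothetical names, and ids are assumed unique.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def canonicalise_by_id(
    ids: Sequence[str], emb: Sequence[T], coords: Sequence[U]
) -> Tuple[List[str], List[T], List[U]]:
    """Reorder ids/emb/coords together so rows are sorted by id (ids assumed unique)."""
    order = sorted(range(len(ids)), key=ids.__getitem__)
    return [ids[i] for i in order], [emb[i] for i in order], [coords[i] for i in order]


def first_rows_mean(coords: Sequence[float], k: int = 2) -> float:
    """Toy position-dependent statistic: the result depends on which rows come first."""
    return sum(coords[:k]) / k


ids = ["p3", "p1", "p2"]
emb = [[0.3], [0.1], [0.2]]
coords = [30.0, 10.0, 20.0]
perm = [2, 0, 1]  # same id-set, different row order
ids_p, emb_p, coords_p = ([x[i] for i in perm] for x in (ids, emb, coords))

print(first_rows_mean(coords), first_rows_mean(coords_p))  # 20.0 25.0: depends on row order
a = first_rows_mean(canonicalise_by_id(ids, emb, coords)[2])
b = first_rows_mean(canonicalise_by_id(ids_p, emb_p, coords_p)[2])
print(a, b)  # 15.0 15.0: identical once rows are sorted by id
```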

574 fast tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tsenoner and others added 2 commits July 2, 2026 13:31
…j> + sync docs

- Rename the elbow clustering's membership column cluster_<proj> -> cluster_elbow_<proj>
  so both selections are explicitly named (cluster_elbow_ / cluster_silhouette_).
  The column name is the only provenance signal that survives to the frontend
  (AnnotationColumn.extra is dropped at carriage), so name the method in it.
- Bring docs + notebook current with the whole extras feature set (they only
  reflected the base PR): --cluster-selection, silhouette-as-attached-score
  (no separate silhouette_ column), and the local/global faithfulness split.
  Updated docs/cli.md, CLAUDE.md, README.md, ProtSpace_Preparation.ipynb.
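
Because the column name is the only provenance that reaches the frontend, a consumer can recover the selection method and projection from the name alone. A hypothetical parser, assuming the cluster_elbow_<proj> / cluster_silhouette_<proj> scheme above and that <proj> is simply the rest of the name (the projection names used below are made up):

```python
import re
from typing import NamedTuple, Optional

_CLUSTER_COL = re.compile(r"^cluster_(elbow|silhouette)_(.+)$")


class ClusterColumn(NamedTuple):
    method: str      # "elbow" or "silhouette"
    projection: str  # the <proj> part of the column name


def cluster_column_name(method: str, projection: str) -> str:
    return f"cluster_{method}_{projection}"


def parse_cluster_column(name: str) -> Optional[ClusterColumn]:
    """Return (method, projection) for a clustering column, else None."""
    m = _CLUSTER_COL.match(name)
    return ClusterColumn(*m.groups()) if m else None


print(parse_cluster_column("cluster_silhouette_umap_2"))  # ClusterColumn(method='silhouette', projection='umap_2')
print(parse_cluster_column("cluster_umap_2"))             # None: the pre-rename name names no method
```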

574 fast tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Quality pass over the projection-stats "extras" (cluster-selection,
silhouette-as-score, global faithfulness).

Correctness
- random_triplet: sample two DISTINCT others per anchor (j != m != anchor)
  instead of drawing uniformly from [0, n). Self-pairs are distance-0 and
  trivially "agree" in both spaces, biasing the accuracy score upward.
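
A minimal sketch of random_triplet with the corrected sampling: for each anchor, draw two distinct other points, and count the triplet as preserved when "j is nearer than m" has the same truth value in the original space and in the projection. Euclidean distance, the seed and this exact comparison are assumptions; `n_triplets_per_point` borrows the parameter name listed in the docstring bullet below, but its semantics here are assumed.

```python
import math
import random
from typing import Sequence

Points = Sequence[Sequence[float]]


def random_triplet(emb: Points, coords: Points, n_triplets_per_point: int = 5, seed: int = 0) -> float:
    """Fraction of random (anchor, j, m) triplets whose relative order is preserved, in [0, 1]."""
    n = len(emb)
    if n < 3:
        raise ValueError("need at least 3 points to form a triplet")
    rng = random.Random(seed)
    agree = total = 0
    for anchor in range(n):
        for _ in range(n_triplets_per_point):
            # Two distinct positions among the n-1 non-anchor rows, shifted past the
            # anchor, so j != m != anchor (no distance-0 self-pairs).
            j, m = (x + (x >= anchor) for x in rng.sample(range(n - 1), 2))
            hi = math.dist(emb[anchor], emb[j]) < math.dist(emb[anchor], emb[m])
            lo = math.dist(coords[anchor], coords[j]) < math.dist(coords[anchor], coords[m])
            agree += hi == lo
            total += 1
    return agree / total


emb = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
flipped = [(-x, y) for x, y in emb]  # a mirror image preserves every distance
print(random_triplet(emb, emb), random_triplet(emb, flipped))  # 1.0 1.0
```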

Robustness / efficiency
- faithfulness: return the n > hard_ceiling skip row BEFORE the canonical
  sort/copy, so oversized inputs (metrics skipped anyway) don't pay a wasted
  O(n log n) sort + two array copies.
- cluster-validity: fall back to the 'elbow' default when the raw stats API
  receives an unrecognised cluster_selection (the CLI already validates via a
  Typer enum) instead of silently emitting no labelling at all.

Simplify
- model --cluster-selection as ClusterSelection(str, Enum) in common_options;
  Typer auto-validates, deleting two duplicated manual validation blocks in
  prepare.py + stats.py.
- validity: carry selection_name in a _Labeling NamedTuple (drops the
  reverse-derivation; shrinks _emit_labeling's signature 8 -> 5 args).
- kmeans_elbow: unify the two duplicate ElbowResult return sites.
- faithfulness: factor the 3x repeated local-scope extra dict.
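
The enum, the raw-API fallback and the labelling record can be sketched together. `ClusterSelection(str, Enum)` and the `selection_name` field are named in the commit; the member names, the `coerce_selection` and `labelings_for` helpers, and the other `_Labeling` fields are assumptions. Typer renders an Enum-typed option as a fixed set of choices and rejects anything else, which is why the manual validation blocks could go; the fallback only matters for callers that bypass the CLI.

```python
from enum import Enum
from typing import List, NamedTuple, Sequence


class ClusterSelection(str, Enum):
    ELBOW = "elbow"
    SILHOUETTE = "silhouette"
    BOTH = "both"


def coerce_selection(raw: str) -> ClusterSelection:
    """Map a raw string to a ClusterSelection, falling back to the 'elbow' default."""
    try:
        return ClusterSelection(raw)
    except ValueError:
        return ClusterSelection.ELBOW


class _Labeling(NamedTuple):
    selection_name: str    # "elbow" or "silhouette"
    column: str            # membership column (hypothetical field)
    labels: Sequence[int]  # per-point cluster ids (hypothetical field)

    @property
    def label_kind(self) -> str:
        return f"kmeans_{self.selection_name}"


def labelings_for(selection: ClusterSelection, proj: str,
                  elbow_labels: Sequence[int], sil_labels: Sequence[int]) -> List[_Labeling]:
    wanted = {
        ClusterSelection.ELBOW: ["elbow"],
        ClusterSelection.SILHOUETTE: ["silhouette"],
        ClusterSelection.BOTH: ["elbow", "silhouette"],
    }[selection]
    by_name = {"elbow": elbow_labels, "silhouette": sil_labels}
    return [_Labeling(name, f"cluster_{name}_{proj}", by_name[name]) for name in wanted]


print(coerce_selection("bogus").value)  # elbow
print([(lab.column, lab.label_kind) for lab in labelings_for(ClusterSelection.BOTH, "umap_2", [0, 1], [1, 1])])
# [('cluster_elbow_umap_2', 'kmeans_elbow'), ('cluster_silhouette_umap_2', 'kmeans_silhouette')]
```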

Docs
- sync stale test-count table in CLAUDE.md (37->43, 11->12, 9->10).
- sync driver.compute_statistics docstring params (cluster_selection,
  include_scores, max_fit_sample, n_triplets_per_point, cluster_annotations).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tsenoner merged commit 0553202 into feat/projection-statistics Jul 2, 2026
4 checks passed
tsenoner deleted the feat/projection-stats-extras branch July 2, 2026 15:06