feat(pro): shap_to_feat_imp + CPPPlot sample= (SHAP per-sample plumbing) #323
…CPPPlot sample= shortcut

Ports the explanation-similarity clustermap from the original gamma-secretase project into a library-grade pro API, adds the shap_to_feat_imp normalization helper, and lets CPPPlot.ranking/profile/feature_map resolve a sample by name.

- ShapModelPlot.clustermap: correlation-of-SHAP-vectors clustermap with row/col class-color sidebars, a class legend, a labelled horizontal colorbar, and font via plot_gco; returns the seaborn ClusterGrid.
- ShapModelPlot.get_clusters: deterministic dendrogram cut (n_clusters / color_threshold), replacing the original dendrogram-color parsing.
- shap_to_feat_imp: signed impact (reusing the ShapModel backend) or absolute importance, both normalized to sum(|.|)=100.
- CPPPlot sample=: resolves col_imp=feat_impact_<entry> (+ TMD-JMD parts from df_parts for profile/feature_map) and sets shap_plot=True; default output is unchanged when sample is None.

ShapModelPlot / shap_to_feat_imp stay unwired at the top level (TODO #305, CONFIRM-FIRST). Pro-gated; tests skip cleanly when shap is absent.

Part of #305 / prototype for #313.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
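The normalization described for shap_to_feat_imp (signed impact or absolute importance, scaled so that sum(|.|)=100) can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: the name shap_to_feat_imp_sketch and its signed flag are hypothetical and are not the aaanalysis signature, and the real helper reuses the ShapModel backend rather than this code.

```python
import numpy as np


def shap_to_feat_imp_sketch(shap_values, signed=True):
    """Hypothetical sketch: rescale one sample's SHAP vector so sum(|x|) == 100.

    signed=True keeps the sign (feature impact); signed=False returns
    magnitudes only (feature importance).
    """
    shap_values = np.asarray(shap_values, dtype=float)
    total = np.abs(shap_values).sum()
    if total == 0:
        # Without this guard the division would return nan for every feature
        raise ValueError("SHAP vector is all zeros; cannot normalize it")
    impact = 100.0 * shap_values / total
    return impact if signed else np.abs(impact)


impact = shap_to_feat_imp_sketch([1.0, -3.0, 1.0])
importance = shap_to_feat_imp_sketch([1.0, -3.0, 1.0], signed=False)
```

For the vector [1, -3, 1] the signed result is [20, -60, 20] and the absolute result is [20, 60, 20]; both keep the relative weights of the raw SHAP values while making samples comparable on a common 0 to 100 scale.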
…e-consistency test

- comp_shap_correlation: raise a clear ValueError naming samples with a constant (zero-variance) SHAP vector instead of an opaque scipy non-finite-distance error (covers clustermap + get_clusters).
- shap_to_feat_imp: raise on an all-zero vector instead of silently returning nan.
- Add a regression test proving get_clusters uses the same linkage the clustermap dendrogram draws, plus negative tests for both new guards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
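The two ideas in this commit, failing early on zero-variance SHAP rows and cutting the very linkage the dendrogram draws, can be illustrated with SciPy. This is a sketch under assumptions: the function names are hypothetical, the distance is taken as 1 - Pearson r with average linkage, and the "same linkage" guarantee is modeled by computing Z once and reusing it (seaborn's clustermap accepts a precomputed matrix via row_linkage/col_linkage).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def correlation_linkage_sketch(shap_matrix, sample_names, method="average"):
    """Hypothetical sketch: hierarchical linkage on 1 - Pearson r between SHAP rows."""
    shap_matrix = np.asarray(shap_matrix, dtype=float)
    # Zero-variance rows make Pearson r undefined (nan), which later surfaces
    # as an opaque non-finite-distance error; fail early and name the samples.
    constant = [name for name, row in zip(sample_names, shap_matrix) if row.std() == 0]
    if constant:
        raise ValueError(f"Constant (zero-variance) SHAP vector for samples: {constant}")
    dist = 1.0 - np.corrcoef(shap_matrix)
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)  # symmetric, no tiny negatives
    np.fill_diagonal(dist, 0.0)
    return linkage(squareform(dist, checks=False), method=method)


def get_clusters_sketch(Z, n_clusters):
    """Cut the given linkage matrix into at most n_clusters flat clusters."""
    return fcluster(Z, t=n_clusters, criterion="maxclust")


names = ["A", "B", "C", "D"]
shap = [[1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.5],
        [4.0, 3.0, 2.0, 1.0],
        [8.0, 6.0, 4.0, 1.5]]
Z = correlation_linkage_sketch(shap, names)  # reuse Z for both drawing and cutting
labels = get_clusters_sketch(Z, n_clusters=2)
```

Because the cut works on the linkage matrix itself rather than on dendrogram colors, the result is deterministic: here A and B (positively correlated) land in one cluster and C and D in the other.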
Under default Matplotlib rcParams the class legend overflowed the figure's right edge by ~35 px and was clipped on a plain savefig without bbox_inches='tight'. Reserve a right margin via grid.gs.update(right=0.80) so the legend fits inside the canvas; verified under both default rcParams and plot_settings(). Add a regression test asserting that the legend stays within the figure bounds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
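A bounds check of the kind this regression test describes can be sketched with plain Matplotlib instead of a seaborn ClusterGrid. Assumptions: the helper name legend_fits_figure, the figure size, the class labels, and the legend anchor (0.81, 0.88) are illustrative choices, not taken from the aaanalysis test; only the right=0.80 margin comes from the commit.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, as in a test run
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
from matplotlib.patches import Patch


def legend_fits_figure(fig, legend):
    """Return True if the legend's drawn extent lies inside the figure canvas."""
    fig.canvas.draw()  # legend position is only final after a draw
    box = legend.get_window_extent(fig.canvas.get_renderer())
    fig_box = fig.bbox
    return (box.x0 >= fig_box.x0 and box.x1 <= fig_box.x1
            and box.y0 >= fig_box.y0 and box.y1 <= fig_box.y1)


fig = plt.figure(figsize=(8, 4))
gs = GridSpec(1, 1, figure=fig)
gs.update(right=0.80)  # reserve the right 20% of the canvas for the legend
ax = fig.add_subplot(gs[0])
ax.plot([0, 1], [0, 1])
handles = [Patch(color="tab:blue", label="class 0"),
           Patch(color="tab:orange", label="class 1")]
# Anchor the legend's upper-left corner just right of the main axes
legend = fig.legend(handles=handles, loc="upper left", bbox_to_anchor=(0.81, 0.88))
fits = legend_fits_figure(fig, legend)
```

Checking the drawn extent against fig.bbox catches exactly the failure mode reported here: a legend that only looks fine when bbox_inches='tight' expands the saved canvas.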
…Counter dedup

- Remove unused n_features unpacking (3 spots; only n_samples is used).
- df_parts.index[int(sample)] instead of list(df_parts.index)[int(sample)].
- Duplicate-name detection via collections.Counter (single pass) instead of an O(n^2) list(names).count(...) scan.

Output is byte-identical; all tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
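The two refactors can be shown side by side in a small sketch. The helper names find_duplicate_names and entry_at_position, and the toy df_parts, are hypothetical; the point is the single-pass Counter tally and positional access on a pandas Index without copying it to a list.

```python
from collections import Counter

import pandas as pd


def find_duplicate_names(names):
    """Return names that occur more than once, in first-seen order.

    Counter tallies all names in one pass (O(n)), unlike calling
    list(names).count(name) for every name (O(n^2)).
    """
    return [name for name, count in Counter(names).items() if count > 1]


def entry_at_position(df_parts, sample):
    """Hypothetical helper: map an integer-like sample to its df_parts index label."""
    # A pandas Index supports positional access directly; no list(...) copy needed
    return df_parts.index[int(sample)]


df_parts = pd.DataFrame({"tmd": ["LLLLVVV", "AAAAIII"]}, index=["P1", "P2"])
dupes = find_duplicate_names(["P1", "P2", "P1", "P3", "P2"])
entry = entry_at_position(df_parts, "1")
```

Here dupes is ["P1", "P2"] and entry is "P2".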
The auto-discovering pro-contract meta-test requires every public *_pro symbol's one-line summary to carry the [pro] / aaanalysis[pro] install marker; shap_to_feat_imp lacked it and was failing test_pro_marker_in_summary. Add the marker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
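What such a meta-test checks per symbol can be approximated as follows. This is a hedged sketch: has_pro_marker and the two dummy functions are hypothetical, and the real test_pro_marker_in_summary also discovers the *_pro symbols automatically, which is not modeled here.

```python
import inspect


def has_pro_marker(obj):
    """Hypothetical check: does the one-line docstring summary carry '[pro]'?

    'aaanalysis[pro]' contains '[pro]', so one substring test covers both markers.
    """
    doc = inspect.getdoc(obj) or ""
    lines = doc.splitlines()
    summary = lines[0] if lines else ""
    return "[pro]" in summary


def with_marker():
    """Convert a SHAP vector into feature impact (requires aaanalysis[pro])."""


def without_marker():
    """Convert a SHAP vector into feature impact.

    Install aaanalysis[pro] to use this.
    """
```

Only the summary line counts: without_marker mentions the extra in its body but still fails the check, which mirrors why shap_to_feat_imp needed the marker added to its first line.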
Codecov Report ❌ Patch coverage is
@@            Coverage Diff            @@
##           master     #323   +/-   ##
=======================================
  Coverage   94.93%   94.94%
=======================================
  Files         185      186    +1
  Lines       17883    17915   +32
  Branches     3038     3040    +2
=======================================
+ Hits        16978    17010   +32
+ Misses        598      597    -1
- Partials      307      308    +1
... and 2 files with indirect coverage changes
breimanntools left a comment:
The clustermap is not only for SHAP values but also for features or other numerical representations. Perhaps we need a plot_clustermap util instead of assigning it to ShapModel. Should we make a general plotting class called AAPlot (aap), to which the prediction plots could be assigned as well? Or AAPredPlot? I do not know right now.
…d by AAPredPlot.clustermap

The explanation-similarity clustermap now lives on the core AAPredPlot (drawing from provided importance vectors). Remove the ShapModelPlot class (clustermap + get_clusters), its clustermap backend (sm_plot.py), and its tests. This #323 branch now carries only the stand-alone shap_to_feat_imp helper (pro) + the CPPPlot sample= plumbing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
breimanntools left a comment:
I want to check this in the example notebook! Make sure that this capability and shortcut are shown there clearly (i.e., first the way we do it so far, then an intro noting that a shortcut for this exists, as follows...).
…ooks (review)

Per PR review: show the sample= shortcut in the example notebooks after the existing manual per-sample SHAP path. Each of feature_map / ranking / profile now presents the explicit way first (name the impact column, resolve TMD-JMD parts via get_seq_kws, pass col_imp + parts + shap_plot=True) and then the equivalent one-argument shortcut (sample=<entry or index>). feature_map/profile key the impact column on the entry name (as add_feat_impact writes it) and pass df_seq + df_parts; ranking needs no parts. Both calls render side by side so the equivalence is visible. Notebooks were re-executed with outputs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
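The equivalence the notebooks demonstrate can be thought of as a keyword expansion: sample= is resolved into the same col_imp, parts, and shap_plot=True that the explicit path passes by hand. The sketch below is hypothetical throughout: resolve_sample_kwargs is not an aaanalysis function, the part keys tmd/jmd_n/jmd_c stand in for whatever CPPPlot actually accepts, and the notebooks obtain the parts via get_seq_kws rather than a plain row lookup.

```python
from numbers import Integral

import pandas as pd


def resolve_sample_kwargs(sample, df_feat, df_parts=None):
    """Hypothetical sketch of what a sample= shortcut could expand to.

    sample may be an entry name or an integer position in df_parts.index.
    """
    if df_parts is not None and isinstance(sample, Integral):
        sample = df_parts.index[sample]  # position -> entry name
    col_imp = f"feat_impact_{sample}"  # column name keyed on the entry
    if col_imp not in df_feat.columns:
        raise ValueError(f"'{col_imp}' not found; add per-sample feature impact first")
    kwargs = {"col_imp": col_imp, "shap_plot": True}
    if df_parts is not None:
        # TMD-JMD parts for this entry (profile/feature_map need them; ranking does not)
        kwargs.update(df_parts.loc[sample].to_dict())
    return kwargs


df_feat = pd.DataFrame({"feature": ["f1", "f2"], "feat_impact_P1": [1.0, -1.0]})
df_parts = pd.DataFrame({"tmd": ["LLLLVVV", "AAAAIII"], "jmd_n": ["KKR", "RRK"],
                         "jmd_c": ["EED", "DDE"]}, index=["P1", "P2"])
kw_name = resolve_sample_kwargs("P1", df_feat, df_parts)  # by entry name
kw_pos = resolve_sample_kwargs(0, df_feat, df_parts)  # by position
```

Both calls yield the same keyword set, which is the "explicit way equals shortcut" property the side-by-side notebook cells are meant to make visible.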
Part of #305 / #313, refreshed onto current master.
What this adds
- shap_to_feat_imp (pro): convert a per-sample SHAP vector into normalized signed feature impact / absolute importance, reusing ShapModel.add_feat_impact's backend so the two never diverge.
- CPPPlot.ranking/profile/feature_map sample=: resolve a sample by name (feature-impact column + TMD-JMD parts) so per-sample SHAP plots need no manual col_imp=f"feat_impact_{name}" string plumbing.

Consolidation note
The explanation-similarity clustermap moved to the core AAPredPlot.clustermap (drawing from provided importance vectors), so ShapModelPlot (which only held the clustermap + its dendrogram cut) was removed here along with its backend and tests. This PR now carries only the SHAP per-sample helpers above.

Part of #313 (no closing keyword).
🤖 Generated with Claude Code