
feat(pro): shap_to_feat_imp + CPPPlot sample= (SHAP per-sample plumbing)#323

Merged
breimanntools merged 9 commits into master from feat/shap-clustermap on Jul 3, 2026

Conversation

@breimanntools breimanntools commented Jul 1, 2026

Owner

Part of #305 / #313, refreshed onto current master.

What this adds

  • shap_to_feat_imp (pro) — convert a per-sample SHAP vector into normalized signed feature impact / absolute importance, reusing ShapModel.add_feat_impact's backend so the two never diverge.
  • CPPPlot.ranking / profile / feature_map sample= — resolve a sample by name (feature-impact column + TMD-JMD parts) so per-sample SHAP plots need no manual col_imp=f"feat_impact_{name}" string plumbing.
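
The string plumbing that `sample=` replaces can be sketched roughly as follows. This is a minimal illustration, not the aaanalysis implementation: the helper name `resolve_sample_column`, the plain list of entries, and the example entry names are all assumptions; only the `feat_impact_<name>` column pattern comes from the description above.

```python
from typing import Sequence, Union


def resolve_sample_column(sample: Union[str, int], entries: Sequence[str],
                          prefix: str = "feat_impact_") -> str:
    """Map a sample (entry name or positional index) to its impact column name.

    Hypothetical helper for illustration only; not the library's code.
    """
    if isinstance(sample, bool):
        # bool is a subclass of int, so reject it explicitly
        raise TypeError("sample must be an entry name or an integer index")
    if isinstance(sample, int):
        if not 0 <= sample < len(entries):
            raise IndexError(f"sample index {sample} out of range")
        name = entries[sample]
    else:
        if sample not in entries:
            raise KeyError(f"unknown sample {sample!r}")
        name = sample
    return f"{prefix}{name}"


entries = ["entry_a", "entry_b", "entry_c"]  # made-up entry names
print(resolve_sample_column("entry_b", entries))  # feat_impact_entry_b
print(resolve_sample_column(0, entries))          # feat_impact_entry_a
```

Resolving the name in one place means a caller passes either an entry name or a position and never builds the column string by hand.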

Consolidation note

The explanation-similarity clustermap moved to the core AAPredPlot.clustermap (drawing from provided importance vectors), so ShapModelPlot (which only held the clustermap + its dendrogram cut) was removed here along with its backend and tests. This PR now carries only the SHAP per-sample helpers above.

Part of #313 (no closing keyword).

🤖 Generated with Claude Code

breimanntools and others added 5 commits July 1, 2026 05:02
…CPPPlot sample= shortcut

Ports the explanation-similarity clustermap from the original gamma-secretase
project into a library-grade pro API, adds the shap_to_feat_imp normalization
helper, and lets CPPPlot.ranking/profile/feature_map resolve a sample by name.

- ShapModelPlot.clustermap: correlation-of-SHAP-vectors clustermap with row/col
  class-color sidebars, a class legend, a labelled horizontal colorbar, and font
  via plot_gco; returns the seaborn ClusterGrid.
- ShapModelPlot.get_clusters: deterministic dendrogram cut (n_clusters /
  color_threshold), replacing the original dendrogram-color parsing.
- shap_to_feat_imp: signed impact (reusing the ShapModel backend) / absolute
  importance, both normalized to sum(|.|)=100.
- CPPPlot sample=: resolves col_imp=feat_impact_<entry> (+ TMD-JMD parts from
  df_parts for profile/feature_map) and sets shap_plot=True; default output
  unchanged when sample is None.
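
The normalization to sum(|.|)=100 can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the ShapModel backend: the function name `normalize_to_100` and the `signed` flag are hypothetical, and the all-zero guard follows the behavior a later commit in this PR describes.

```python
from typing import List, Sequence


def normalize_to_100(shap_values: Sequence[float], signed: bool = True) -> List[float]:
    """Scale a per-sample SHAP vector so its absolute values sum to 100.

    signed=True keeps each value's sign (feature impact);
    signed=False returns absolute values (feature importance).
    Hypothetical helper for illustration only.
    """
    total = sum(abs(v) for v in shap_values)
    if total == 0:
        # Dividing by zero would otherwise yield nan for every feature
        raise ValueError("SHAP vector is all zeros; cannot normalize")
    scale = 100.0 / total
    if signed:
        return [v * scale for v in shap_values]
    return [abs(v) * scale for v in shap_values]


print(normalize_to_100([2.0, -1.0, 1.0]))                # [50.0, -25.0, 25.0]
print(normalize_to_100([2.0, -1.0, 1.0], signed=False))  # [50.0, 25.0, 25.0]
```

Both variants share the same scale factor, so the signed and absolute outputs differ only in sign.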

ShapModelPlot / shap_to_feat_imp stay unwired at the top level (TODO #305,
CONFIRM-FIRST). pro-gated; tests skip cleanly when shap is absent.

Part of #305 / prototype for #313.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e-consistency test

- comp_shap_correlation: raise a clear ValueError naming samples with a constant
  (zero-variance) SHAP vector instead of an opaque scipy non-finite-distance error
  (covers clustermap + get_clusters).
- shap_to_feat_imp: raise on an all-zero vector instead of silently returning nan.
- Add a regression test proving get_clusters uses the same linkage the clustermap
  dendrogram draws, plus negative tests for both new guards.
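
A constant vector has zero standard deviation, so its Pearson correlation with any other vector is undefined, which is what surfaces downstream as a non-finite distance. A guard of the kind described above might look like this sketch; the function names and the dict-of-lists input are assumptions for illustration, not the actual comp_shap_correlation code.

```python
from typing import Dict, List, Sequence


def find_constant_samples(shap_by_sample: Dict[str, Sequence[float]]) -> List[str]:
    """Return names of samples whose SHAP vector holds a single repeated value."""
    return [name for name, vec in shap_by_sample.items() if len(set(vec)) <= 1]


def check_shap_vectors(shap_by_sample: Dict[str, Sequence[float]]) -> None:
    """Raise a ValueError that names every zero-variance sample (hypothetical guard)."""
    constant = find_constant_samples(shap_by_sample)
    if constant:
        raise ValueError(
            f"Constant (zero-variance) SHAP vectors for samples: {constant}"
        )


shap_by_sample = {"s1": [0.4, -0.2, 0.1], "s2": [0.0, 0.0, 0.0]}  # made-up values
print(find_constant_samples(shap_by_sample))  # ['s2']
```

Naming the offending samples in the message lets a user drop or inspect them before clustering.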

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Under default Matplotlib rcParams the class legend overflowed the figure's right
edge (~35px) and was clipped on a plain savefig without bbox_inches='tight'.
Reserve a right margin via grid.gs.update(right=0.80) so the legend fits inside
the canvas; verified under both default rcParams and plot_settings(). Add a
regression test asserting the legend stays within the figure bounds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Counter dedup

- Remove unused n_features unpacking (3 spots; only n_samples is used).
- df_parts.index[int(sample)] instead of list(df_parts.index)[int(sample)].
- Duplicate-name detection via collections.Counter (single pass) instead of
  an O(n^2) list(names).count(...) scan.
Output byte-identical; all tests green.
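
For illustration, the single-pass duplicate check could look like the sketch below. `find_duplicates` is a hypothetical name and not necessarily how the library structures it; `collections.Counter` is the standard-library class named in the commit message.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicates(names: Iterable[str]) -> List[str]:
    """Return each name that occurs more than once, in first-seen order."""
    counts = Counter(names)  # one pass over the input
    return [name for name, n in counts.items() if n > 1]


print(find_duplicates(["a", "b", "a", "c", "b", "a"]))  # ['a', 'b']
```

Calling `list(names).count(name)` for every name rescans the whole list each time, so the work grows quadratically; counting once and then filtering the counts stays linear.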

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The auto-discovering pro-contract meta-test requires every public *_pro symbol's
one-line summary to carry the [pro] / aaanalysis[pro] install marker; shap_to_feat_imp
lacked it and was failing test_pro_marker_in_summary. Add the marker.
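
A marker check of this kind can be sketched as below. This is an assumption-laden illustration, not the project's test_pro_marker_in_summary: the helper `has_pro_marker` and the two dummy functions are made up, and the check relies only on the fact that both marker spellings contain the substring `[pro]`.

```python
import inspect


def has_pro_marker(obj) -> bool:
    """Check that the first docstring line carries a '[pro]' marker.

    'aaanalysis[pro]' also contains '[pro]', so one substring test covers both.
    Hypothetical helper for illustration only.
    """
    doc = (inspect.getdoc(obj) or "").strip()
    summary = doc.splitlines()[0] if doc else ""
    return "[pro]" in summary


def marked():
    """Convert SHAP values to feature impact (requires aaanalysis[pro])."""


def unmarked():
    """Convert SHAP values to feature impact."""


print(has_pro_marker(marked), has_pro_marker(unmarked))  # True False
```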

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov Bot commented Jul 1, 2026


Codecov Report

❌ Patch coverage is 95.74468% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.94%. Comparing base (1a152de) to head (db3ee55).
⚠️ Report is 29 commits behind head on master.

Files with missing lines | Patch % | Lines
aaanalysis/feature_engineering/_cpp_plot.py | 93.75% | 0 Missing and 2 partials ⚠️
Additional details and impacted files

@@           Coverage Diff           @@
##           master     #323   +/-   ##
=======================================
  Coverage   94.93%   94.94%           
=======================================
  Files         185      186    +1     
  Lines       17883    17915   +32     
  Branches     3038     3040    +2     
=======================================
+ Hits        16978    17010   +32     
+ Misses        598      597    -1     
- Partials      307      308    +1     

Files with missing lines | Coverage Δ
aaanalysis/explainable_ai_pro/__init__.py | 100.00% <100.00%> (ø)
aaanalysis/explainable_ai_pro/_shap_model_plot.py | 100.00% <100.00%> (ø)
aaanalysis/feature_engineering/_cpp_plot.py | 97.96% <93.75%> (-0.38%) ⬇️

... and 2 files with indirect coverage changes

Components | Coverage Δ
cpp_core | 94.95% <ø> (ø)

@breimanntools breimanntools left a comment

Owner Author

The clustermap is not only for SHAP values but also for features or other numerical representations. Perhaps we need a plot_clustermap utility instead of assigning it to ShapModel. Should we create a general plotting class called AAPlot (aap), to which the prediction plots could be assigned as well? Or AAPredPlot? I do not know right now.

breimanntools and others added 3 commits July 1, 2026 18:52
…d by AAPredPlot.clustermap

The explanation-similarity clustermap now lives on the core AAPredPlot (drawing from
provided importance vectors). Remove the ShapModelPlot class (clustermap + get_clusters),
its clustermap backend (sm_plot.py), and its tests. This #323 branch now carries only the
stand-alone shap_to_feat_imp helper (pro) + the CPPPlot sample= plumbing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@breimanntools breimanntools changed the title from "feat: ShapModelPlot.clustermap + shap_to_feat_imp + CPPPlot sample= (prototype for #313)" to "feat(pro): shap_to_feat_imp + CPPPlot sample= (SHAP per-sample plumbing)" on Jul 2, 2026
@breimanntools breimanntools marked this pull request as ready for review July 2, 2026 12:48

@breimanntools breimanntools left a comment

Owner Author

I want to check this in the example notebook! Make sure that this capability and its shortcut are shown there clearly (first the way we do it so far, then an introduction noting that a shortcut for this exists, as follows ...).

…ooks (review)

Per PR review: show the sample= shortcut in the example notebooks after the existing
manual per-sample SHAP path. Each of feature_map / ranking / profile now presents the
explicit way first (name the impact column, resolve TMD-JMD parts via get_seq_kws, pass
col_imp + parts + shap_plot=True) and then the equivalent one-argument shortcut
(sample=<entry or index>). feature_map/profile key the impact column on the entry name
(as add_feat_impact writes it) and pass df_seq + df_parts; ranking needs no parts. Both
calls render side by side so the equivalence is visible. Notebooks re-executed with outputs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@breimanntools breimanntools merged commit 0bc2266 into master Jul 3, 2026
13 checks passed
@breimanntools breimanntools deleted the feat/shap-clustermap branch July 3, 2026 18:19