feat: ShapModelPlot.clustermap (explanation-similarity) + shap_to_feat_imp + sample= SHAP plots #313

Description

@breimanntools

Part of #305.

Problem

The richest explanation figure in the original project — clustering samples by
explanation similarity (correlation of their per-sample SHAP-value vectors,
so proteins group by why the model scores them, not just by features) — is
~130 lines of sns.clustermap + scipy.cluster.hierarchy surgery
(plot_shap_cluster_map / plot_shap_dendrogram, scripts/plot_cpp_shap.py).
It has no library API. Separately, per-sample SHAP plots in the γ-secretase
notebook (cells 30/32) couple the plot call to add_feat_impact's naming by
hand:

seq_kws = sf.get_seq_kws(df_seq=_df_gsec, df_parts=df_parts_dom, sample=acc)
fig, ax = aa.CPPPlot().feature_map(df_feat=df_feat, shap_plot=True,
                                   col_imp=f"feat_impact_{name}", **seq_kws)

The col_imp=f"feat_impact_{name}" string-templating + get_seq_kws plumbing is
repeated per sample.
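A minimal sketch of the name resolution that a sample= argument could do internally. The helper name resolve_col_imp, its prefix parameter, and the error message are hypothetical and not an existing AAanalysis API; it only assumes the feat_impact_{name} column convention shown above.

```python
import pandas as pd


def resolve_col_imp(df_feat, sample, prefix="feat_impact_"):
    """Return the impact column for `sample` (hypothetical helper, not library API)."""
    col_imp = f"{prefix}{sample}"
    if col_imp not in df_feat.columns:
        # List the samples that do have an impact column to make the error actionable
        available = [str(c)[len(prefix):] for c in df_feat.columns
                     if str(c).startswith(prefix)]
        raise ValueError(f"No column '{col_imp}' in df_feat; available samples: {available}")
    return col_imp


# Toy feature table with one per-sample impact column
df_feat = pd.DataFrame({"feature": ["f1", "f2"], "feat_impact_APP": [60.0, -40.0]})
col_imp = resolve_col_imp(df_feat, sample="APP")
```

Failing early on an unknown sample keeps a typo in the sample name from surfacing later as an opaque plotting error.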

Goal

Ship a SHAP-correlation clustermap and let the existing SHAP plot paths resolve a
sample by name, removing both the 130-line clustermap boilerplate and the
per-sample string plumbing. All SHAP pieces stay in the pro extra.
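A rough sketch of the explanation-similarity computation behind such a clustermap, not the ported plot_shap_cluster_map. It assumes Pearson correlation between per-sample SHAP vectors, the distance 1 - r, and average linkage; the project script may use different settings. All function names here are hypothetical, and seaborn is imported lazily so the numeric part runs without it.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def explanation_similarity(shap_values):
    """Pearson correlation between per-sample SHAP vectors (rows = samples).

    Rows with zero variance yield NaN correlations and should be filtered beforehand.
    """
    return np.corrcoef(np.asarray(shap_values, dtype=float))


def explanation_linkage(corr, method="average"):
    """Hierarchical linkage on the correlation distance 1 - r (assumed metric)."""
    dist = np.clip(1.0 - np.asarray(corr, dtype=float), 0.0, 2.0)
    dist = (dist + dist.T) / 2.0      # remove floating-point asymmetry
    np.fill_diagonal(dist, 0.0)
    return linkage(squareform(dist, checks=False), method=method)


def plot_explanation_clustermap(corr, class_colors=None):
    """Draw the clustermap with class-color sidebars on rows and columns."""
    import seaborn as sns             # optional dependency, imported only when plotting
    Z = explanation_linkage(corr)
    return sns.clustermap(corr, row_linkage=Z, col_linkage=Z,
                          row_colors=class_colors, col_colors=class_colors,
                          cmap="vlag", vmin=-1, vmax=1)


# Toy SHAP matrix: samples 0/1 and samples 2/3 share an explanation pattern
shap_matrix = [[1.0, 2.0, 3.0, 4.0],
               [1.1, 2.0, 3.2, 4.0],
               [4.0, 3.0, 2.0, 1.0],
               [4.0, 3.1, 2.0, 0.9]]
corr = explanation_similarity(shap_matrix)
Z = explanation_linkage(corr)
labels = fcluster(Z, t=2, criterion="maxclust")
```

Passing one precomputed linkage as both row_linkage and col_linkage keeps the two dendrograms identical, which fits a symmetric sample-by-sample matrix.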

Requirements

  • ShapModelPlot.clustermap(...) — correlation clustermap of per-sample SHAP
    vectors with row/col class-color sidebars (port plot_shap_cluster_map).
  • shap_to_feat_imp helper: SHAP vector → normalized signed feature impact
    (shap/Σ|shap|*100) + absolute importance (port ml_shap.py); reuse any
    existing ShapModel logic to avoid divergence.
  • Let feature_map / ranking / profile accept sample="APP" to resolve
    the feat_impact_* column + seq_kws internally (removes the
    get_seq_kws + col_imp plumbing from cells 30/32).
  • numpydoc docstrings with an Examples include; pro-gated import.
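The normalization named in the shap_to_feat_imp requirement can be sketched as below. The signature, the two-array return, and the zero-sum handling are assumptions for illustration; "absolute importance" is taken here to be the absolute value of the normalized impact, which should be checked against ml_shap.py and the existing ShapModel logic before porting.

```python
import numpy as np


def shap_to_feat_imp(shap_values):
    """Convert one SHAP vector into (signed impact, absolute importance), both in %."""
    shap_values = np.asarray(shap_values, dtype=float)
    total = np.abs(shap_values).sum()
    if total == 0:
        # No attribution at all: return zeros instead of dividing by zero
        zeros = np.zeros_like(shap_values)
        return zeros, zeros.copy()
    feat_impact = shap_values / total * 100   # shap / Σ|shap| * 100, sign preserved
    feat_importance = np.abs(feat_impact)     # assumed definition of absolute importance
    return feat_impact, feat_importance


feat_impact, feat_importance = shap_to_feat_imp([1.0, -1.0, 2.0])
```

With this normalization the absolute impacts of a sample sum to 100, so values are comparable across samples regardless of the raw SHAP scale.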

KPIs / Acceptance criteria

  • clustermap reproduces the project figure's clustering on the canonical
    SHAP matrix (linkage + cluster assignment match).
  • feature_map(..., sample="APP") equals today's explicit
    col_imp=f"feat_impact_APP" + seq_kws call (regression).
  • shap_to_feat_imp matches the manual normalization within float tolerance.
  • ≥1 unit test per new symbol; pro import-gating verified (core import
    still succeeds without SHAP).
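One possible way to check the "linkage + cluster assignment match" criterion in a unit test. The helper names, the average linkage, the correlation metric, and the cut into n_clusters flat clusters are assumptions; the reference matrix would be the canonical SHAP matrix from the project, which is replaced by toy data here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def same_partition(labels_a, labels_b):
    """True if two flat clusterings group samples identically, ignoring label ids."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    if a.shape != b.shape:
        return False
    # Compare co-membership matrices so that relabeled clusters still match
    return bool(np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :]))


def clustering_matches(X_new, X_ref, n_clusters, method="average", metric="correlation"):
    """Compare linkage matrices and flat cluster assignments of two SHAP matrices."""
    Z_new = linkage(X_new, method=method, metric=metric)
    Z_ref = linkage(X_ref, method=method, metric=metric)
    same_tree = Z_new.shape == Z_ref.shape and np.allclose(Z_new, Z_ref)
    labels_new = fcluster(Z_new, t=n_clusters, criterion="maxclust")
    labels_ref = fcluster(Z_ref, t=n_clusters, criterion="maxclust")
    return bool(same_tree) and same_partition(labels_new, labels_ref)


# Toy stand-in for the canonical SHAP matrix (rows = samples)
X_ref = np.array([[1.0, 2.0, 3.0, 4.0],
                  [1.1, 2.0, 3.2, 4.0],
                  [4.0, 3.0, 2.0, 1.0],
                  [4.0, 3.1, 2.0, 0.9]])
matches = clustering_matches(X_ref.copy(), X_ref, n_clusters=2)
```

Comparing partitions through co-membership rather than raw label ids avoids false failures when the same clusters are numbered differently.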

Scope / non-goals

Dependencies

Standards checklist

  • pro/core gating · CONFIRM-FIRST (__init__.py/__all__) · plotting.md
    compliance · numpydoc · tests · no-print

Metadata

Assignees

No one assigned

    Labels

    prio:2 (Important) · topic:XAI (Explainability methods integrated into AAanalysis) · type:feature (Implementation of feature)
