Part of #305.
Problem
The richest explanation figure in the original project — clustering samples by
explanation similarity (correlation of their per-sample SHAP-value vectors,
so proteins group by why the model scores them, not just by features) — is
~130 lines of sns.clustermap + scipy.cluster.hierarchy surgery
(plot_shap_cluster_map / plot_shap_dendrogram, scripts/plot_cpp_shap.py).
It has no library API. Separately, per-sample SHAP plots in the γ-secretase
notebook (cells 30/32) couple the plot call to add_feat_impact's naming by
hand:
    seq_kws = sf.get_seq_kws(df_seq=_df_gsec, df_parts=df_parts_dom, sample=acc)
    fig, ax = aa.CPPPlot().feature_map(df_feat=df_feat, shap_plot=True,
                                       col_imp=f"feat_impact_{name}", **seq_kws)
The col_imp=f"feat_impact_{name}" string-templating + get_seq_kws plumbing is
repeated per sample.
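As a rough illustration of the name resolution that could replace the hand-written f-string, here is a minimal sketch. The helper name `resolve_impact_column` and its `prefix` parameter are hypothetical and not part of the library; only the `feat_impact_{name}` naming comes from the snippet above.

```python
def resolve_impact_column(columns, sample, prefix="feat_impact_"):
    """Return the impact column for `sample` (hypothetical helper).

    `columns` can be any container of column names, e.g. a list or
    df_feat.columns. Raises ValueError listing the samples that do have
    an impact column, so a typo in `sample` is easy to spot.
    """
    col = f"{prefix}{sample}"
    if col in columns:
        return col
    # Collect the sample names that are actually available.
    known = sorted(c[len(prefix):] for c in columns if c.startswith(prefix))
    raise ValueError(
        f"No column '{col}' found; samples with impact columns: {known}"
    )


# Illustrative column names only.
columns = ["feature", "feat_impact_APP", "feat_impact_OTHER"]
print(resolve_impact_column(columns, "APP"))  # feat_impact_APP
```

Failing loudly with the list of known samples is a design choice for this sketch; the plotting methods could equally fall back to an explicit col_imp.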
Goal
Ship a SHAP-correlation clustermap and let the existing SHAP plot paths resolve a
sample by name, removing both the 130-line clustermap boilerplate and the
per-sample string plumbing. All SHAP pieces stay in the pro extra.
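To make "clustering by explanation similarity" concrete, the sketch below correlates per-sample SHAP vectors and clusters the samples hierarchically. It is an assumption-laden illustration, not the ported plot_shap_cluster_map: the function name, the use of 1 - r as distance, and average linkage are all choices made here, and the original script may cluster differently.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_by_shap_correlation(shap_matrix, n_clusters=2, method="average"):
    """Cluster samples by the correlation of their SHAP vectors.

    shap_matrix: array of shape (n_samples, n_features), one SHAP vector
    per row. Rows with zero variance would yield NaN correlations and
    are not handled in this sketch.
    """
    shap_matrix = np.asarray(shap_matrix, dtype=float)
    # np.corrcoef treats each row as a variable, so this is sample x sample.
    corr = np.corrcoef(shap_matrix)
    # Assumption: correlation distance 1 - r (0 = identical explanation).
    dist = np.clip(1.0 - corr, 0.0, 2.0)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    Z = linkage(condensed, method=method)
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return corr, Z, labels


# Toy data: two pairs of samples with opposite explanation patterns.
shap_matrix = [
    [1.0, 2.0, 3.0, -1.0],
    [2.0, 4.0, 6.0, -2.0],
    [-1.0, -2.0, -3.0, 1.0],
    [-2.0, -4.0, -6.0, 2.1],
]
corr, Z, labels = cluster_by_shap_correlation(shap_matrix)
print(labels)
```

The returned linkage matrix Z could be passed to sns.clustermap via its row_linkage / col_linkage arguments so that the heatmap and the cluster assignment share one tree.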
Requirements
- ShapModelPlot.clustermap(...): correlation clustermap of per-sample SHAP
  vectors with row/col class-color sidebars (port plot_shap_cluster_map).
- shap_to_feat_imp helper: SHAP vector → normalized signed feature impact
  (shap/Σ|shap|*100) + absolute importance (port ml_shap.py); reuse any
  existing ShapModel logic to avoid divergence.
- feature_map / ranking / profile accept sample="APP" to resolve the
  feat_impact_* column + seq_kws internally (removes the get_seq_kws +
  col_imp plumbing from cells 30/32).
- Examples include; pro-gated import.
KPIs / Acceptance criteria
- clustermap reproduces the project figure's clustering on the canonical
  SHAP matrix (linkage + cluster assignment match).
- feature_map(..., sample="APP") equals today's explicit
  col_imp=f"feat_impact_APP" + seq_kws call (regression).
- shap_to_feat_imp matches the manual normalization within float tolerance.
- pro import-gating verified (core import still succeeds without SHAP).
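The normalization named in the requirements (shap/Σ|shap|*100) can be sketched as follows, together with the kind of float-tolerance check the acceptance criterion asks for. This is a minimal sketch, not the ported ml_shap.py: treating "absolute importance" as the absolute value of the normalized impact is an assumption, as is returning zeros for an all-zero SHAP vector.

```python
import numpy as np


def shap_to_feat_imp(shap_values):
    """Convert one SHAP vector to signed impact and absolute importance.

    Signed impact is shap / sum(|shap|) * 100, so absolute impacts sum
    to 100. Importance is taken as |impact| (assumption of this sketch).
    """
    shap_values = np.asarray(shap_values, dtype=float)
    total = np.abs(shap_values).sum()
    if total == 0:
        # Assumption: an all-zero SHAP vector maps to all-zero outputs.
        zeros = np.zeros_like(shap_values)
        return zeros, zeros.copy()
    feat_impact = shap_values / total * 100
    feat_importance = np.abs(feat_impact)
    return feat_impact, feat_importance


# Regression-style check against the manual normalization.
shap_vec = np.array([1.0, -3.0, 0.0])
impact, importance = shap_to_feat_imp(shap_vec)
manual = shap_vec / np.abs(shap_vec).sum() * 100
assert np.allclose(impact, manual)
print(impact, importance)
```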
Scope / non-goals
- pro only (SHAP is a pro dep).
- The custom trimmed/recolored sub-tree rendering is optional polish, not
  required for the KPI.
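One way the pro import-gating could look is sketched below: the optional dependency is only imported when a SHAP feature is actually used, so the core import stays independent of it. The helper name `require_optional` and the wording of the error message are hypothetical; the library's actual gating mechanism may differ.

```python
import importlib
import importlib.util


def require_optional(module_name, extra="pro"):
    """Import an optional dependency or raise a helpful ImportError.

    Intended to be called inside the SHAP-only code paths, never at
    package import time, so the core package imports without SHAP.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f"install the '{extra}' extra to enable it."
        )
    return importlib.import_module(module_name)


# Works for any installed module; a stdlib module stands in for shap here.
json_mod = require_optional("json")
print(json_mod.__name__)  # json
```

A test for the gating criterion could call the core import in an environment without SHAP and assert that only the SHAP entry points raise this ImportError.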
Dependencies
Standards checklist
__init__.py/__all__) · plotting.md compliance · numpydoc · tests · no-print