CPP filter blind to distributed (jointly-strong) signals #341

Description

@breimanntools

Part of #336 (usability epic).

Problem

CPP's feature filter ranks features by their individual mean_dif/abs_auc, so it is blind to
distributed signals — feature blocks that are individually weak but jointly decisive. Concrete case
from this project (iBCE-EL linear epitopes):

  • Amino-acid composition: each amino acid's abs_auc ≈ 0.03 (≈ random), yet the 20 together give
    ROC-AUC ≈ 0.75.
  • When CPP was given a combined identity + physicochemical scale set, the filter selected 0%
    identity features (physicochemical scales score higher individually) and performance collapsed
    from 0.75 to 0.57: the winning signal was filtered out.

Diagnostic that catches it: the marginal-vs-joint "lift" = full-block model AUC − best-single-feature
AUC (AAC +0.21 vs physicochemical +0.04).
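To make the lift diagnostic concrete, here is a minimal stdlib-only sketch. Everything in it is an assumption for illustration: the helper names (`roc_auc`, `centroid_weights`, `block_lift`, `make_block`) are hypothetical and not part of CPP, the joint model is a simple class-mean-difference linear scorer standing in for whatever full-block model is actually used, and the data is synthetic (20 weakly shifted Gaussian features), not the iBCE-EL set.

```python
import random
from statistics import mean


def roc_auc(scores, labels):
    """Rank-based ROC-AUC: P(score_pos > score_neg), ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_weights(X, y):
    """Per-feature difference of class means (stand-in joint linear model)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    return [mean(r[j] for r in pos) - mean(r[j] for r in neg)
            for j in range(len(X[0]))]


def block_lift(X_train, y_train, X_test, y_test):
    """Return (lift, joint_auc, best_single_auc) for one feature block."""
    # Joint model: weights fitted on the training split, scored on the test split.
    w = centroid_weights(X_train, y_train)
    joint_scores = [sum(wj * xj for wj, xj in zip(w, row)) for row in X_test]
    joint_auc = roc_auc(joint_scores, y_test)
    # Marginal view: each feature alone, direction-agnostic (like abs_auc).
    single = [roc_auc([row[j] for row in X_test], y_test) for j in range(len(w))]
    best_single = max(max(a, 1.0 - a) for a in single)
    return joint_auc - best_single, joint_auc, best_single


def make_block(n, n_features, shift, rng):
    """Synthetic block: every feature is only weakly shifted for the positive class."""
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(shift * label, 1.0) for _ in range(n_features)] for label in y]
    return X, y


rng = random.Random(0)
X_train, y_train = make_block(600, 20, 0.3, rng)
X_test, y_test = make_block(600, 20, 0.3, rng)
lift, joint_auc, best_single = block_lift(X_train, y_train, X_test, y_test)
print(f"joint AUC={joint_auc:.2f}  best single AUC={best_single:.2f}  lift={lift:+.2f}")
```

A large positive lift flags a block whose signal is distributed across features, which a per-feature ranking would discard. Taking the best single-feature AUC on the same test split slightly favours the marginal side, so the reported lift is, if anything, conservative.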

Suggestion
