Resolve three output-changing correctness defects (CPP redundancy filter, TreeModel seeding, BH p-values) under ADR-0032 + regolden #343

Description

@breimanntools

Problem

The July 2026 correctness audit found three defects that are real but change
published output. Unlike the low-risk batch, each therefore needs the same
governance before any code lands: a declared ADR-0032 equivalence tier and a
regolden of the affected regression anchors (ADR-0015 pattern). They are
independent defects, grouped here only because they share that decision gate and
must not be folded into the low-risk correctness batch.

1. CPP redundancy filter compares digit-characters, not positions

filtering_info_ builds each feature's position set with set(x) where x is the
comma-joined position string (e.g. "11,12,…,20"), so the position-overlap gate
compares digit characters {'0'–'9', ','} instead of integer positions. Any two
multi-digit-position features look ~100% overlapping, so the positional
decorrelation the algorithm documents effectively never fires — redundancy
reduction degenerates into a pure scale-correlation-within-category filter.
Reproduced: TMD-Segment(1,2) (pos 11–20) vs TMD-Segment(2,2) (pos 21–30), true
overlap 0.0, computed 1.0, second dropped. Identical defect in the CPP.simplify
path. This has been the shipped behavior since the file's inception.
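
A minimal sketch of the failure mode, for illustration only. The helper names (`char_position_set`, `int_position_set`, `overlap`) are hypothetical, and the Jaccard-style overlap is an assumption standing in for whatever ratio filtering_info_ actually uses. Both position strings contain every digit plus the comma, so the character sets are identical under any overlap measure:

```python
def char_position_set(pos_str):
    # Mirrors the defect: set() on a string yields its characters
    return set(pos_str)


def int_position_set(pos_str):
    # Proposed parsing: split on commas, then convert to integers
    return set(map(int, pos_str.split(",")))


def overlap(a, b):
    # Assumed overlap measure (Jaccard); the real gate may use a different ratio
    return len(a & b) / len(a | b)


seg_1 = ",".join(str(p) for p in range(11, 21))  # positions 11 to 20
seg_2 = ",".join(str(p) for p in range(21, 31))  # positions 21 to 30

buggy_overlap = overlap(char_position_set(seg_1), char_position_set(seg_2))
fixed_overlap = overlap(int_position_set(seg_1), int_position_set(seg_2))
print(buggy_overlap, fixed_overlap)
```

With integer parsing the two segments share no positions, which matches the true overlap of 0.0 quoted above.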

2. TreeModel does no per-round seeding → zero uncertainty under a fixed seed

fit_tree_based_models' round loop passes a constant random_state to the RFE
RandomForestClassifier and the importance-model kwargs, so under a fixed seed
every round fits identical estimators; feat_importance_std (and predict_proba's
pred_std) collapse to exactly 0 and rounds 2..N are wasted. This hits the
encouraged reproducibility path and contradicts the "average across training
rounds enhances robustness" claim. ShapModel already solves this with per-round
random_state + round_idx reseeding (_seed_model_kwargs).
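
The effect can be sketched with a toy stand-in for the estimator. Nothing below is the project's code: `fit_toy_importance`, `seed_for_round`, and `importance_std` are hypothetical names, and a seeded `random.Random` replaces the RandomForestClassifier fit so the example runs with the standard library only.

```python
import random
import statistics


def fit_toy_importance(seed, n_features=3):
    # Stand-in for one estimator fit: importances depend only on the seed
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_features)]


def seed_for_round(random_state, round_idx):
    # No-op when unseeded; otherwise offset the seed by the round index
    return None if random_state is None else random_state + round_idx


def importance_std(random_state, n_rounds, per_round):
    rounds = []
    for i in range(n_rounds):
        seed = seed_for_round(random_state, i) if per_round else random_state
        rounds.append(fit_toy_importance(seed))
    # Population standard deviation per feature across rounds
    return [statistics.pstdev(col) for col in zip(*rounds)]


std_constant = importance_std(42, n_rounds=5, per_round=False)
std_per_round = importance_std(42, n_rounds=5, per_round=True)
print(std_constant, std_per_round)
```

A constant seed gives identical rounds and a standard deviation of exactly 0, while the offset seed gives a non-zero spread that is still reproducible for a given random_state.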

3. BH-adjusted p-values omit the monotonicity step

_bh_corrected_pvalues computes sorted_pvals * n / ranks and clips to 1 but
omits the reverse cumulative-minimum, so p_val_fdr_bh deviates from canonical
Benjamini–Hochberg (e.g. statsmodels.multipletests('fdr_bh')) in non-monotone
regions (inflated/conservative). Does not affect selection (ranking uses
abs_auc/abs_mean_dif), only the reported column.
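
A pure-Python sketch of the adjustment with and without the monotonicity step, as an illustration rather than the project's implementation. The function `bh_adjust` and its `enforce_monotone` flag are hypothetical, and the input p-values are made up to hit a non-monotone region:

```python
def bh_adjust(pvals, enforce_monotone=True):
    n = len(pvals)
    # Indices of the p-values in ascending order
    order = sorted(range(n), key=lambda i: pvals[i])
    # Raw BH scaling p * n / rank, clipped to 1
    corrected = [
        min(pvals[idx] * n / rank, 1.0)
        for rank, idx in enumerate(order, start=1)
    ]
    if enforce_monotone:
        # Reverse cumulative minimum: walk from the largest p-value down
        running_min = 1.0
        for k in range(n - 1, -1, -1):
            running_min = min(running_min, corrected[k])
            corrected[k] = running_min
    # Scatter back to the original order
    adjusted = [0.0] * n
    for k, idx in enumerate(order):
        adjusted[idx] = corrected[k]
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.5]
without_step = bh_adjust(pvals, enforce_monotone=False)
with_step = bh_adjust(pvals, enforce_monotone=True)
print(without_step)
print(with_step)
```

For the p-value 0.03 (rank 2 of 4) the raw scaling gives about 0.06, which exceeds the roughly 0.0533 assigned to 0.04 at rank 3. The reverse cumulative minimum lowers it to that same value, which is the inflated/conservative deviation described above.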

Goal

Correct all three, each landing under a declared ADR-0032 tier with a regolden of
the affected anchors — after a maintainer decision per defect.

Decision needed (HITL)

Each defect changes published output and requires maintainer sign-off + a regolden
before code. Decide, per defect, the ADR-0032 tier and the quality band (for #1)
on the canonical DOM_GSEC cell.

Requirements

CPP redundancy filter (#1)

  • _backend/cpp/_filters/_redundancy_filter.py and _backend/cpp/_simplify.py
    — parse COL_POSITION via .split(",") (optionally map(int, …)).
  • Declare the ADR-0032 tier (T3) + documented quality band; regolden the CPP anchor.

TreeModel per-round seeding (#2)

  • explainable_ai/_backend/tree_model/tree_model_fit.py — per-round kwargs with
    random_state + i for the RFE RandomForestClassifier and the importance
    models (no-op when random_state is None), mirroring _seed_model_kwargs.
  • Regression anchor freezing the seeded importance mean + asserting non-zero std.

BH p-value monotonicity (#3)

  • _backend/cpp/_utils_feature_stat.py — apply
    np.minimum.accumulate(corrected[::-1])[::-1] before scattering back.
  • Anchor the p_val_fdr_bh column on the canonical cell.

KPIs / Acceptance criteria

Scope / non-goals

  • All three are output-changing; kept out of the low-risk correctness batch.
  • No performance changes (ADR-0033: program closed).

Metadata

Assignees

No one assigned

    Labels

    prio:1 (Very important), topic:core (Core features), type:bug (Something isn't working)