Problem
The July 2026 correctness audit found three defects that are real but whose fixes
change published output. Unlike the low-risk batch, each therefore needs the same
governance before any code lands: a declared ADR-0032 equivalence tier and a
regolden of the affected regression anchors (ADR-0015 pattern). The defects are
independent; they are grouped here only because they share that decision gate and
must not be folded into the low-risk correctness batch.
1. CPP redundancy filter compares digit-characters, not positions
filtering_info_ builds each feature's position set with set(x), where x is the
comma-joined position string (e.g. "11,12,…,20"). The position-overlap gate
therefore compares characters (the digits '0'–'9' plus ',') instead of integer
positions. Any two multi-digit-position features look ~100% overlapping, so the
positional decorrelation the algorithm documents effectively never fires, and
redundancy reduction degenerates into a pure scale-correlation-within-category
filter. Reproduced with TMD-Segment(1,2) (pos 11–20) vs TMD-Segment(2,2)
(pos 21–30): true overlap 0.0, computed 1.0, second feature dropped. The
CPP.simplify path has the identical defect. This has been the shipped behavior
since the file's inception.
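The character-versus-position mix-up can be reproduced in isolation. The sketch below is illustrative only: the overlap() helper and its "shared elements over the smaller set" definition are assumptions made for this example, not the filter's actual overlap measure. Only the set(x) call on the comma-joined string and the 11–20 / 21–30 position ranges come from the report above.

```python
def overlap(a: set, b: set) -> float:
    """Hypothetical overlap measure: shared elements over the smaller set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Comma-joined position strings, as in the reproduced case.
pos_a = ",".join(str(p) for p in range(11, 21))  # "11,12,...,20"
pos_b = ",".join(str(p) for p in range(21, 31))  # "21,22,...,30"

# Defective path: set() over a string yields its characters,
# so both features collapse to the ten digits plus ','.
buggy = overlap(set(pos_a), set(pos_b))

# Corrected path: split on ',' and compare integer positions.
fixed = overlap(
    {int(p) for p in pos_a.split(",")},
    {int(p) for p in pos_b.split(",")},
)

print(buggy, fixed)
```

Under this assumed measure the character sets are identical while the integer sets are disjoint, which matches the reported computed-versus-true overlap.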
2. TreeModel does no per-round seeding → zero uncertainty under a fixed seed
fit_tree_based_models' round loop passes a constant random_state to the RFE
RandomForestClassifier and the importance-model kwargs, so under a fixed seed
every round fits identical estimators; feat_importance_std (and predict_proba's
pred_std) collapse to exactly 0 and rounds 2..N are wasted. This hits the
encouraged reproducibility path and contradicts the "average across training
rounds enhances robustness" claim. ShapModel already solves this with per-round
random_state + round_idx reseeding (_seed_model_kwargs).
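A minimal sketch of per-round reseeding follows, using only the standard library so it runs anywhere. The names seed_kwargs_for_round, toy_importance and run_rounds are hypothetical stand-ins; the real _seed_model_kwargs helper and the scikit-learn estimators are not reproduced here. The sketch only illustrates the pattern of offsetting random_state by the round index while leaving an unset seed untouched.

```python
import random
import statistics


def seed_kwargs_for_round(kwargs: dict, round_idx: int) -> dict:
    """Copy kwargs with random_state offset by the round index.

    No-op when random_state is missing or None (hypothetical helper).
    """
    out = dict(kwargs)
    if out.get("random_state") is not None:
        out["random_state"] = out["random_state"] + round_idx
    return out


def toy_importance(random_state=None) -> float:
    """Stand-in for fitting one estimator and reading one importance value."""
    return random.Random(random_state).random()


def run_rounds(kwargs: dict, n_rounds: int, per_round: bool) -> list:
    """Collect one toy importance per round, with or without reseeding."""
    values = []
    for i in range(n_rounds):
        kw = seed_kwargs_for_round(kwargs, i) if per_round else kwargs
        values.append(toy_importance(**kw))
    return values


base = {"random_state": 42}
constant = run_rounds(base, 5, per_round=False)  # same seed every round
reseeded = run_rounds(base, 5, per_round=True)   # seed 42, 43, 44, ...

print(statistics.pstdev(constant))  # identical rounds, so no spread
print(statistics.pstdev(reseeded))  # rounds differ, so spread is non-zero
```

Because the offset is deterministic, two runs with the same base seed still produce the same sequence of rounds, so reproducibility is preserved while the across-round spread becomes meaningful.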
3. BH-adjusted p-values omit the monotonicity step
_bh_corrected_pvalues computes sorted_pvals * n / ranks and clips to 1 but
omits the reverse cumulative minimum, so p_val_fdr_bh deviates from canonical
Benjamini–Hochberg (e.g. statsmodels' multipletests with method='fdr_bh')
wherever the raw adjusted values are non-monotone; there the reported values are
inflated, i.e. too conservative. Selection is unaffected (ranking uses
abs_auc/abs_mean_dif); only the reported column changes.
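The effect of the missing step can be seen on a small non-monotone input. The sketch below is a pure-Python illustration written for this note, not the project's _bh_corrected_pvalues; the function name bh_adjust, the enforce_monotone flag and the four example p-values are all assumptions chosen to expose the difference.

```python
def bh_adjust(pvals, enforce_monotone=True):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    With enforce_monotone=False the reverse cumulative minimum is skipped,
    which reproduces the defect described above.
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [
        min(pvals[i] * n / rank, 1.0)  # p * n / rank, clipped to 1
        for rank, i in enumerate(order, start=1)
    ]
    if enforce_monotone:
        # Reverse cumulative minimum: walk from the largest rank down.
        running_min = 1.0
        for k in range(n - 1, -1, -1):
            running_min = min(running_min, adjusted[k])
            adjusted[k] = running_min
    out = [0.0] * n
    for k, i in enumerate(order):  # scatter back to the original order
        out[i] = adjusted[k]
    return out


pvals = [0.04, 0.001, 0.5, 0.041]
without_step = bh_adjust(pvals, enforce_monotone=False)
with_step = bh_adjust(pvals)

print(without_step)
print(with_step)
```

For the p-value 0.04 (rank 2 of 4) the raw adjustment is 0.04 * 4 / 2 = 0.08, while the rank-3 value 0.041 adjusts to roughly 0.0547. The cumulative minimum pulls the rank-2 value down to that figure, so the uncorrected column is the more conservative one.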
Goal
Correct all three, each landing under a declared ADR-0032 tier with a regolden of
the affected anchors, after a maintainer decision per defect.
Decision needed (HITL)
Each defect changes published output and requires maintainer sign-off and a
regolden before code. Decide the ADR-0032 tier for each defect and, for #1, the
quality band on the canonical DOM_GSEC cell.
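Requirements
CPP redundancy filter (#1)
- _backend/cpp/_filters/_redundancy_filter.py and _backend/cpp/_simplify.py: parse COL_POSITION via .split(",") (optionally map(int, …)).
TreeModel per-round seeding (#2)
- explainable_ai/_backend/tree_model/tree_model_fit.py: build per-round kwargs with random_state + i for the RFE RandomForestClassifier and the importance models (no-op when random_state is None), mirroring _seed_model_kwargs.
BH p-value monotonicity (#3)
- _backend/cpp/_utils_feature_stat.py: apply np.minimum.accumulate(corrected[::-1])[::-1] before scattering back.
- p_val_fdr_bh column on the canonical cell.
KPIs / Acceptance criteria
- set on the canonical cell re-frozen with the band stated numerically.
- feat_importance_std non-zero under a fixed seed while fit(seed) == fit(seed) still holds; anchor frozen.
- statsmodels fdr_bh within tol on a non-monotone fixture; selection unchanged; anchor frozen.
- versionchanged note.
Scope / non-goals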
- All three are output-changing; kept out of the low-risk correctness batch.
- No performance changes (ADR-0033: program closed).