feat(prediction): AAPredPlot.eval bars-with-hue + full AAPred notebook param coverage + CI gate by breimanntools · Pull Request #344 · breimanntools/aaanalysis

breimanntools · 2026-07-04T15:28:36Z

Three coupled changes, all branched off master; 335 prediction + api-meta tests are green.

1. `AAPredPlot.eval` → grouped bars with hue = model (method comparison)

`eval` compares methods (models × metrics), so it is a grouped bar plot: one colored bar per model (the hue), grouped by metric, with cross-validation `score_std` error bars, hatched held-out bars, and an optional chance baseline. Comparing CPP parameter combinations is a different job, already covered by the feature-optimization protocol (`aap.find_features` plus its `aap.plot_eval` heatmap), which `eval`'s docstring now points to. There is no `kind` or `cmap` parameter and no heatmap branch.
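For readers unfamiliar with the layout, here is a minimal sketch of a grouped bar plot with hue = model. It is not the `AAPredPlot.eval` implementation: the helper names `grouped_bar_positions` and `plot_method_comparison`, the `scores`/`stds` dict shapes, and the model and metric labels are all assumptions made for illustration.

```python
def grouped_bar_positions(metrics, models, group_width=0.8):
    """Return {(metric, model): (x_center, bar_width)} for grouped bars.

    Each metric owns one integer tick; the models share the group width.
    """
    bar_width = group_width / len(models)
    positions = {}
    for i, metric in enumerate(metrics):
        left = i - group_width / 2  # left edge of the group centered on tick i
        for j, model in enumerate(models):
            positions[(metric, model)] = (left + (j + 0.5) * bar_width, bar_width)
    return positions


def plot_method_comparison(scores, stds, baseline=None):
    """Draw the bars; `scores` and `stds` map model -> {metric: value} (assumed shape)."""
    import matplotlib.pyplot as plt  # lazy import keeps the layout helper stdlib-only

    models = list(scores)
    metrics = list(next(iter(scores.values())))
    positions = grouped_bar_positions(metrics, models)
    fig, ax = plt.subplots()
    for model in models:  # one color (hue) per model
        xs = [positions[(m, model)][0] for m in metrics]
        width = positions[(metrics[0], model)][1]
        ax.bar(xs, [scores[model][m] for m in metrics], width=width,
               yerr=[stds[model][m] for m in metrics], label=model)
    if baseline is not None:
        ax.axhline(baseline, linestyle="--", color="gray")  # optional chance line
    ax.set_xticks(range(len(metrics)))
    ax.set_xticklabels(metrics)
    ax.legend(title="model")
    return ax


# Hypothetical labels, only to show the layout arithmetic
positions = grouped_bar_positions(["accuracy", "f1"], ["rf", "svm"])
```

Held-out bars could be distinguished by passing matplotlib's `hatch` keyword to `ax.bar`; that is left out here to keep the sketch short.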

This also settles the intent behind #310: the "evaluation heatmap" is the parameter-range heatmap (`aap.plot_eval`), while `eval` is the method-comparison bar plot.

2. Full parameter coverage in all 14 AAPred/AAPredPlot example notebooks

They previously demonstrated 1–2 params each; every one now demonstrates every public parameter of its method (as explicit keywords, grouped), re-executed with saved outputs.

3. A CI gate so this can't regress

`tests/unit/api_tests/test_notebook_param_coverage.py`: every method with an example notebook must demonstrate every public param (name-based). Prediction notebooks are held to zero gaps; a committed `notebook_param_coverage_baseline.txt` ratchets down the pre-existing backlog of ~94 methods / 257 params (pyright-style: it may only shrink, and a staleness guard forces burn-down).
`.claude/rules/notebooks.md` is upgraded from an aspirational note to a test-enforced rule.
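A name-based coverage check of this kind can be sketched with the standard library alone. This is an illustration, not the code in `test_notebook_param_coverage.py`: the helper names and the `plot_scores` function below are hypothetical, and only keyword arguments are counted, in line with the "explicit keywords" convention above.

```python
import ast
import inspect


def public_params(func):
    """Public parameter names of a callable (no self/cls, no leading underscore)."""
    return {
        name for name in inspect.signature(func).parameters
        if name not in ("self", "cls") and not name.startswith("_")
    }


def keywords_used(source, method_name):
    """Keyword names passed to any call of `method_name` in the given source."""
    used = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # obj.method(...) is an Attribute node, method(...) is a Name node
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name == method_name:
            used.update(kw.arg for kw in node.keywords if kw.arg is not None)
    return used


def missing_params(func, notebook_code):
    """Public params of `func` that the notebook code never passes by keyword."""
    return public_params(func) - keywords_used(notebook_code, func.__name__)


# Hypothetical method standing in for a real API method
def plot_scores(df, figsize=(6, 4), baseline=None):
    return None


cell = "plot_scores(df=df, figsize=(5, 3))"
gaps = missing_params(plot_scores, cell)  # `baseline` is never demonstrated
```

In practice the code would come from the `source` fields of the code cells in the `.ipynb` JSON, and IPython magics would have to be stripped before `ast.parse` can handle them.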

Root cause of the recurring gap: the notebook checker existed as a skill script but was never wired into CI or run. It is now a real gate. Backlog burn-down tracked in a follow-up issue.
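The shrink-only baseline with a staleness guard can be expressed as two set differences. The sketch below assumes one gap per line in a hypothetical `Class.method:param` format; the actual layout of `notebook_param_coverage_baseline.txt` is not shown in this PR description.

```python
def load_baseline(text):
    """Parse baseline text: one gap per line, blank lines and # comments skipped."""
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.add(line)
    return entries


def check_ratchet(current_gaps, baseline):
    """Return (new_gaps, stale_entries); the gate passes only if both are empty."""
    new_gaps = current_gaps - baseline  # regressions: gaps not covered by the baseline
    stale = baseline - current_gaps     # fixed gaps still listed: baseline must be trimmed
    return new_gaps, stale


# Hypothetical entries, for illustration only
baseline = load_baseline("# backlog\nFoo.run:verbose\nFoo.fit:n_jobs\n")
current = {"Foo.run:verbose", "Bar.plot:ax"}
new_gaps, stale = check_ratchet(current, baseline)
```

Failing on stale entries is what forces the burn-down: once a gap is fixed, the baseline has to shrink in the same change, so it can never mask a later regression of that parameter.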

🤖 Generated with Claude Code

…ll notebook param coverage + CI gate

- AAPredPlot.eval now renders a heatmap when both models and metrics vary (2D) and a grouped bar plot otherwise (1D); new 'kind' (force layout) and 'cmap' params. Resolves the eval-heatmap gap (issue #310). +7 adaptive tests.
- All 14 AAPred/AAPredPlot example notebooks now demonstrate EVERY public parameter of their method (they previously showed 1-2), re-executed with outputs.
- New CI gate tests/unit/api_tests/test_notebook_param_coverage.py: every method's example notebook must demonstrate every public param (name-based), with a committed baseline ratchet for the ~94-method backlog; prediction notebooks held to zero gaps. Strengthens the .claude/rules/notebooks.md rule from aspirational to enforced.

340 prediction + api-meta tests green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ison), not heatmap

Per review: the heatmap/bar split by data shape was wrong. eval compares *methods* (models x metrics), so it is always a grouped bar plot with hue = model. Comparing CPP *parameter combinations* (parameter ranges) is a different job already covered by the feature-optimization protocol (aap.find_features + its aap.plot_eval heatmap), which eval's docstring now points to. Drops the kind/cmap params and the heatmap branch; updates tests + the example notebook (demonstrates figsize/dict_color/baseline/ylabel, and a multi-model method comparison).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

codecov · 2026-07-04T15:55:36Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.83%. Comparing base (7dcc8d8) to head (17b7a1a).

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #344   +/-   ##
=======================================
  Coverage   94.83%   94.83%           
=======================================
  Files         196      196           
  Lines       18767    18767           
  Branches     3175     3175           
=======================================
  Hits        17797    17797           
  Misses        633      633           
  Partials      337      337

Files with missing lines	Coverage Δ
aaanalysis/prediction/_aa_pred_plot.py	`94.70% <100.00%> (ø)`

Components	Coverage Δ
cpp_core	`94.95% <ø> (ø)`

breimanntools and others added 2 commits July 4, 2026 15:15

breimanntools wants to merge 2 commits into master from feat/aapred-eval-adaptive
