feat(prediction): AAPredPlot.eval bars-with-hue + full AAPred notebook param coverage + CI gate #344

Open
breimanntools wants to merge 2 commits into master from feat/aapred-eval-adaptive

@breimanntools

Owner

Three coupled changes, all off master, 335 prediction + api-meta tests green.

1. AAPredPlot.eval → grouped bars with hue = model (method comparison)

eval compares methods (models × metrics), so it is always a grouped bar plot: one colored bar per model (the hue), grouped by metric, with cross-validation score_std error bars, hatched held-out bars, and an optional chance baseline. Comparing CPP parameter combinations is a different job, already covered by the feature-optimization protocol (aap.find_features and its aap.plot_eval heatmap), which eval's docstring now points to. The kind and cmap parameters and the heatmap branch from the first commit are dropped.
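For readers who want to picture the layout, here is a minimal sketch of a grouped bar plot with hue = model, assuming matplotlib. `bar_offsets` and `plot_method_comparison` are hypothetical names made up for illustration; this is not the AAPredPlot.eval implementation or its signature, and the parameter names are only borrowed from the ones mentioned in this PR.

```python
def bar_offsets(n_bars, group_width=0.8):
    """Return (bar_width, offsets) that center n_bars bars on each metric tick."""
    if n_bars < 1:
        raise ValueError("n_bars must be >= 1")
    bar_width = group_width / n_bars
    offsets = [(i - (n_bars - 1) / 2) * bar_width for i in range(n_bars)]
    return bar_width, offsets


def plot_method_comparison(scores, score_std, metrics, held_out=None,
                           dict_color=None, baseline=None,
                           figsize=(6, 4), ylabel="Score"):
    """Hypothetical grouped bar plot: one colored bar per model, grouped by metric.

    scores, score_std, held_out: dict of model name -> list with one value per metric.
    """
    import matplotlib.pyplot as plt  # lazy import keeps bar_offsets dependency-free

    fig, ax = plt.subplots(figsize=figsize)
    models = list(scores)
    # With held-out scores, each model gets two bars per metric (CV and held-out).
    n_bars = len(models) * (2 if held_out else 1)
    width, offsets = bar_offsets(n_bars)
    slot = 0
    for model in models:
        x = [j + offsets[slot] for j in range(len(metrics))]
        bars = ax.bar(x, scores[model], width, yerr=score_std[model], capsize=3,
                      color=(dict_color or {}).get(model), label=model)
        slot += 1
        if held_out:
            # Reuse the CV bar's color so the hue still identifies the model.
            x = [j + offsets[slot] for j in range(len(metrics))]
            ax.bar(x, held_out[model], width, color=bars[0].get_facecolor(),
                   hatch="//", edgecolor="black", label=f"{model} (held-out)")
            slot += 1
    if baseline is not None:
        ax.axhline(baseline, linestyle="--", color="gray", label="chance")
    ax.set_xticks(range(len(metrics)))
    ax.set_xticklabels(metrics)
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig, ax
```

The offsets are computed separately from the drawing so the layout can be checked without rendering a figure.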

This also settles the intent behind #310: the "evaluation heatmap" is the parameter-range heatmap (aap.plot_eval), while eval is the method-comparison bar plot.

2. Full parameter coverage in all 14 AAPred/AAPredPlot example notebooks

They previously demonstrated 1–2 params each; every one now demonstrates every public parameter of its method (as explicit keywords, grouped), re-executed with saved outputs.

3. A CI gate so this can't regress

  • tests/unit/api_tests/test_notebook_param_coverage.py: every method with an example notebook must demonstrate every public param (name-based). Prediction notebooks are held to zero gaps; a committed notebook_param_coverage_baseline.txt ratchets down the pre-existing backlog of ~94 methods / 257 params (pyright-style: the baseline may only shrink, and a staleness guard forces burn-down).
  • .claude/rules/notebooks.md upgraded from an aspirational note to a test-enforced rule.
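The test file itself is not shown in this thread, so the following is only a sketch of how a name-based coverage check with a baseline ratchet can work, using the standard library. All helper names (`public_params`, `keywords_used`, `check_ratchet`) and the `method.param` gap format are assumptions for illustration, not the contents of test_notebook_param_coverage.py.

```python
import ast
import inspect


def public_params(func):
    """Names of a callable's public parameters (skips self/cls, *args, **kwargs)."""
    skip_kinds = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return {
        name
        for name, param in inspect.signature(func).parameters.items()
        if name not in ("self", "cls")
        and not name.startswith("_")
        and param.kind not in skip_kinds
    }


def keywords_used(source, method_name):
    """Keyword names passed to any call of `method_name` in the given source code."""
    used = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name == method_name:
                used.update(kw.arg for kw in node.keywords if kw.arg is not None)
    return used


def check_ratchet(gaps, baseline):
    """Return (new_gaps, stale_entries); the gate passes only if both are empty."""
    return sorted(gaps - baseline), sorted(baseline - gaps)


# Usage with a made-up method and a made-up notebook cell.
def eval_like(self, df_eval, figsize=(6, 4), baseline=None, **kwargs): ...


cell = "plot.eval_like(df_eval=df, figsize=(5, 3))"
missing = public_params(eval_like) - keywords_used(cell, "eval_like")
gaps = {f"eval_like.{name}" for name in missing}          # {"eval_like.baseline"}
new_gaps, stale = check_ratchet(gaps, {"eval_like.baseline", "old_method.param"})
# new_gaps == []                    -> no regression beyond the committed baseline
# stale == ["old_method.param"]     -> baseline entry already fixed, must be removed
```

Failing on stale entries is what makes the baseline shrink-only: once a gap is closed, its line has to be deleted from the baseline file, so it cannot silently reopen later.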

Root cause of the recurring gap: the notebook checker existed as a skill script but was never wired into CI or run. It is now a real gate. Backlog burn-down tracked in a follow-up issue.

🤖 Generated with Claude Code

breimanntools and others added 2 commits July 4, 2026 15:15
…ll notebook param coverage + CI gate

- AAPredPlot.eval now renders a heatmap when both models and metrics vary (2D) and a
  grouped bar plot otherwise (1D); new 'kind' (force layout) and 'cmap' params. Resolves the
  eval-heatmap gap (issue #310). +7 adaptive tests.
- All 14 AAPred/AAPredPlot example notebooks now demonstrate EVERY public parameter of their
  method (they previously showed 1-2), re-executed with outputs.
- New CI gate tests/unit/api_tests/test_notebook_param_coverage.py: every method's example
  notebook must demonstrate every public param (name-based), with a committed baseline
  ratchet for the ~94-method backlog; prediction notebooks held to zero gaps. Strengthens
  the .claude/rules/notebooks.md rule from aspirational to enforced.

340 prediction + api-meta tests green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ison), not heatmap

Per review: the heatmap/bar split by data shape was wrong. eval compares *methods* (models
x metrics), so it is a grouped bar plot with hue = model — always. Comparing CPP *parameter
combinations* (parameter ranges) is a different job already covered by the feature-optimization
protocol (aap.find_features + its aap.plot_eval heatmap), which eval's docstring now points to.
Drops the kind/cmap params and the heatmap branch; updates tests + the example notebook
(demonstrates figsize/dict_color/baseline/ylabel, and a multi-model method comparison).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@codecov

codecov Bot commented Jul 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.83%. Comparing base (7dcc8d8) to head (17b7a1a).

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #344   +/-   ##
=======================================
  Coverage   94.83%   94.83%           
=======================================
  Files         196      196           
  Lines       18767    18767           
  Branches     3175     3175           
=======================================
  Hits        17797    17797           
  Misses        633      633           
  Partials      337      337           

Files with missing lines                  Coverage Δ
aaanalysis/prediction/_aa_pred_plot.py    94.70% <100.00%> (ø)

Components    Coverage Δ
cpp_core      94.95% <ø> (ø)