feat(prediction): AAPredPlot.eval bars-with-hue + full AAPred notebook param coverage + CI gate #344
Open
breimanntools wants to merge 2 commits into
…ll notebook param coverage + CI gate

- AAPredPlot.eval now renders a heatmap when both models and metrics vary (2D) and a grouped bar plot otherwise (1D); new 'kind' (force layout) and 'cmap' params. Resolves the eval-heatmap gap (issue #310). +7 adaptive tests.
- All 14 AAPred/AAPredPlot example notebooks now demonstrate EVERY public parameter of their method (they previously showed 1-2), re-executed with outputs.
- New CI gate tests/unit/api_tests/test_notebook_param_coverage.py: every method's example notebook must demonstrate every public param (name-based), with a committed baseline ratchet for the ~94-method backlog; prediction notebooks held to zero gaps. Strengthens the .claude/rules/notebooks.md rule from aspirational to enforced.

340 prediction + api-meta tests green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ison), not heatmap

Per review: the heatmap/bar split by data shape was wrong. eval compares *methods* (models x metrics), so it is a grouped bar plot with hue = model, always. Comparing CPP *parameter combinations* (parameter ranges) is a different job already covered by the feature-optimization protocol (aap.find_features + its aap.plot_eval heatmap), which eval's docstring now points to. Drops the kind/cmap params and the heatmap branch; updates tests + the example notebook (demonstrates figsize/dict_color/baseline/ylabel, and a multi-model method comparison).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
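The grouped-bar layout this commit settles on (one colored bar per model as the hue, grouped by metric, score_std error bars, hatched held-out bars, optional chance baseline) can be sketched with plain matplotlib. This is not the AAPredPlot.eval implementation: the function name plot_eval_bars, its nested-dict input, and the convention that held-out results arrive as separate entries named in held_out are hypothetical choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_eval_bars(scores, stds=None, held_out=(), baseline=None,
                   figsize=(6, 4), ylabel="Score"):
    """Grouped bar plot: one group per metric, one colored bar (hue) per model.

    Hypothetical input layout (not the library's API):
    scores:   {model: {metric: mean score}}
    stds:     {model: {metric: cross-validation std}}, drawn as error bars
    held_out: names of entries whose bars are hatched (held-out results)
    baseline: optional chance level, drawn as a dashed horizontal line
    """
    models = list(scores)
    metrics = list(next(iter(scores.values())))
    x = np.arange(len(metrics))
    width = 0.8 / len(models)  # all bars of one metric share 0.8 x-units
    fig, ax = plt.subplots(figsize=figsize)
    for i, model in enumerate(models):
        heights = [scores[model][m] for m in metrics]
        errs = [stds[model][m] for m in metrics] if stds else None
        # Offset each model's bars so the group is centered on the metric tick
        ax.bar(x - 0.4 + width / 2 + i * width, heights, width,
               yerr=errs, capsize=3, label=model,
               hatch="//" if model in held_out else None)
    if baseline is not None:
        ax.axhline(baseline, color="gray", linestyle="--", label="chance")
    ax.set_xticks(x)
    ax.set_xticklabels(metrics)
    ax.set_ylabel(ylabel)
    ax.legend(title="model")
    return fig, ax
```

Keeping the hue fixed to the model (rather than switching layouts by data shape) means two or ten models always produce the same kind of chart, which is the point of a method comparison.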
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:

@@           Coverage Diff           @@
##           master     #344   +/-   ##
=======================================
  Coverage   94.83%   94.83%
=======================================
  Files         196      196
  Lines       18767    18767
  Branches     3175     3175
=======================================
  Hits        17797    17797
  Misses        633      633
  Partials      337      337
Three coupled changes, all branched off master; 335 prediction + api-meta tests green.
1. AAPredPlot.eval → grouped bars with hue = model (method comparison)

eval compares methods (models × metrics), so it is a grouped bar plot: one colored bar per model (the hue), grouped by metric, with cross-validation score_std error bars, hatched held-out bars, and an optional chance baseline. Comparing CPP parameter combinations is a different job, already covered by the feature-optimization protocol (aap.find_features + its aap.plot_eval heatmap), which eval's docstring now points to. (No kind/cmap; no heatmap branch.)

This also settles the intent behind #310: the "evaluation heatmap" is the parameter-range heatmap (aap.plot_eval), while eval is the method-comparison bar plot.

2. Full parameter coverage in all 14 AAPred/AAPredPlot example notebooks
They previously demonstrated 1–2 params each; each notebook now demonstrates every public parameter of its method (passed as explicit keywords, grouped), and all were re-executed with saved outputs.
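Whether a notebook passes every public parameter as an explicit keyword can be checked by name with the standard library alone: collect the method's public parameters via inspect.signature, then look for each name written as name= in the notebook's code cells. This is a minimal sketch, not the project's test_notebook_param_coverage.py; the helper names and the rule for what counts as "public" (no self/cls, no *args/**kwargs, no leading underscore) are assumptions.

```python
import inspect
import json
import re


def public_params(func):
    """Public parameter names of a callable, in signature order.

    Assumed rule: skip self/cls, *args/**kwargs, and names starting with "_".
    """
    names = []
    for name, p in inspect.signature(func).parameters.items():
        if name in ("self", "cls") or name.startswith("_"):
            continue
        if p.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            continue
        names.append(name)
    return names


def notebook_code(nb_path):
    """Concatenated source of all code cells in a .ipynb file (nbformat v4 JSON)."""
    with open(nb_path, encoding="utf-8") as f:
        nb = json.load(f)
    # "source" may be a list of lines or a single string; "".join handles both.
    return "\n".join("".join(cell["source"]) for cell in nb["cells"]
                     if cell["cell_type"] == "code")


def missing_params(func, code):
    """Public params never written as an explicit keyword (name=) in the code."""
    # Name-based only: "name=" anywhere counts, a comparison "name ==" does not.
    return [name for name in public_params(func)
            if not re.search(rf"\b{re.escape(name)}\s*=(?!=)", code)]
```

Being name-based, the check is deliberately crude: it confirms a parameter appears as a keyword somewhere, not that the call using it is meaningful.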
3. A CI gate so this can't regress
tests/unit/api_tests/test_notebook_param_coverage.py: every method with an example notebook must demonstrate every public param (name-based). Prediction notebooks are held to zero gaps; a committed notebook_param_coverage_baseline.txt ratchets down the pre-existing ~94-method / 257-param backlog (pyright-style: it may only shrink, with a staleness guard forcing burn-down). .claude/rules/notebooks.md is upgraded from an aspirational note to a test-enforced rule.

Root cause of the recurring gap: the notebook checker existed as a skill script but was never wired into CI or run. It is now a real gate. Backlog burn-down is tracked in a follow-up issue.
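The baseline ratchet with a staleness guard reduces to a few set comparisons between the gaps found now and the committed backlog. The sketch below assumes the baseline lists one hypothetical "Class.method:param" entry per line with # comment lines; the real format of notebook_param_coverage_baseline.txt is not shown in this PR, and load_baseline / check_ratchet are hypothetical names.

```python
def load_baseline(text):
    """Parse baseline text: one "Class.method:param" entry per line; '#' lines are comments."""
    return {line.strip() for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")}


def check_ratchet(current_gaps, baseline, zero_gap_prefixes=()):
    """Return failure messages; an empty list means the gate passes."""
    failures = []
    # 1. Ratchet: any gap not already in the committed baseline is a regression.
    for gap in sorted(current_gaps - baseline):
        failures.append(f"new gap: {gap}")
    # 2. Staleness guard: a fixed gap must also be deleted from the baseline,
    #    so the backlog file can only shrink and never keeps unused slack.
    for entry in sorted(baseline - current_gaps):
        failures.append(f"stale baseline entry, remove it: {entry}")
    # 3. Zero-gap modules (e.g. prediction) may not be grandfathered at all.
    for entry in sorted(baseline):
        if entry.startswith(tuple(zero_gap_prefixes)):
            failures.append(f"zero-gap module listed in baseline: {entry}")
    return failures
```

A test can then assert that check_ratchet(...) returns an empty list, so every failure message appears in the CI log. Without step 2 the baseline would quietly retain entries that were already fixed, letting new gaps hide behind them later.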
🤖 Generated with Claude Code