Add documentation of applications. #54
Merged
15 commits
6a56957 improved changelog (GernotMaier)
c474379 improved readme (GernotMaier)
fe069c5 add diagnostic docu (GernotMaier)
6ec8cb4 dependabot (GernotMaier)
367c39b diagnostics (GernotMaier)
c92a9ea importance (GernotMaier)
064b1bc docstring improvements (GernotMaier)
1c14cd5 generalization gap (GernotMaier)
53893b4 pyproject.toml (GernotMaier)
8264e44 diagnostics (GernotMaier)
8c45b4b readme (GernotMaier)
ef42113 changelog (GernotMaier)
8a8d698 warnings (GernotMaier)
14674f2 sign (GernotMaier)
eb305f7 correction cp (GernotMaier)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,11 @@` |
| | | `+---` |
| | | `+# Set update schedule for GitHub Actions` |
| | | `+` |
| | | `+version: 2` |
| | | `+updates:` |
| | | `+` |
| | | `+  - package-ecosystem: "github-actions"` |
| | | `+    directory: "/"` |
| | | `+    schedule:` |
| | | `+      # Check for updates to GitHub Actions every month` |
| | | `+      interval: "monthly"` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,14 +1,37 @@ |
| | | - Fix critical bugs in stereo regression pipeline: |
| | | + ## Stereo Regression: Training on Residuals with Standardization and Energy Weighting |
| | | |
| | | - - **Fixed double log10 application**: E_residual was being computed with log10(ErecS) that had already been log10'd. Now ErecS/Erec remain in linear space during training/apply; log10 applied explicitly when needed. |
| | | - - **Fixed energy bin weighting**: Bins with fewer than 10 events now correctly get zero weight instead of being clamped; weight sorting preserves bin order. |
| | | - - **Fixed standardization inversion**: Added proper loading and validation of target_mean/target_std scalers in stereo apply pipeline to prevent KeyError crashes. |
| | | - - **Fixed ErecS validation**: Safe log10 computation during apply avoids RuntimeWarning for invalid values; all output rows preserved even with invalid energy. |
| | | - - **Fixed evaluation metrics**: ErecS in evaluation now properly converted to log10 space for energy resolution comparison. |
| | | - - **Fixed FutureWarning**: Series positional indexing converted to numpy arrays for future pandas compatibility. |
| | | + ### Architectural Change |
| | | |
| | | - New features and improvements: |
| | | + - **Training targets changed from absolute to residual values**: Models now predict residuals (deviations from baseline reconstructions) rather than absolute directions/energies. This allows XGBoost to learn corrections to existing Eventdisplay reconstructions (DispBDT, intersection method) and leverage their baseline accuracy as a starting point. |
| | | |
| | | - - **Comprehensive test coverage**: Added `test_regression_apply.py` with full unit test suite covering standardization inversion, residual computation, ErecS handling, and final prediction reconstruction. |
| | | - - **Improved error messages**: Clear, actionable error messages when standardization parameters are missing or mismatched in apply pipeline. |
| | | - - **Data preservation guarantee**: Stereo apply pipeline now preserves all input rows even when encountering invalid energy values, ensuring output count equals input count. |
| | | + ### Critical Bug Fixes |
| | | + |
| | | + - **Fixed double log10 application**: Energy residuals computed in linear space; log10 applied explicitly during evaluation |
| | | + - **Fixed standardization inversion**: Apply pipeline now loads and validates target_mean/target_std scalers (prevents KeyError) |
| | | + - **Fixed energy-bin weighting**: Bins with <10 events get zero weight; correct inverse weighting for balanced training |
| | | + - **Fixed ErecS validation**: Safe log10 computation during apply; all input rows preserved in output |
| | | + - **Fixed evaluation metrics**: Energy resolution compared in log10 space with proper baseline alignment |
| | | + - **Fixed FutureWarning**: Series positional indexing converted to numpy arrays for pandas compatibility |
| | | + |
| | | + ### New Features |
| | | + |
| | | + - **Target standardization in training**: Residuals standardized to mean=0, std=1 during training to enable multi-target learning with balanced learning signals (direction and energy equally weighted) |
| | | + - **Energy-bin weighted training**: Events weighted inversely by energy bin density; bins with <10 events excluded to prevent overfitting on low-statistics regions |
| | | + - **Per-target SHAP importance caching**: Feature importances computed once during training for each target (Xoff_residual, Yoff_residual, E_residual), cached for diagnostic tools |
| | | + - **Diagnostic scripts**: |
| | | +   - `diagnostic_shap_summary.py`: Top-20 feature importance plots per residual target |
| | | +   - `plot_training_evaluation.py`: Energy resolution and residual distribution visualization |
| | | + - **Comprehensive test suites**: 20 new tests covering residual computation, standardization, energy weighting, apply inference |
| | | + - **Robust error handling**: Clear messages for missing scalers; guaranteed row-count preservation in apply pipeline |
| | | + |
| | | + ### Enhanced Diagnostic Pipeline |
| | | + |
| | | + - **Generalization-gap metrics cached during training**: Train/test RMSE, gap %, and generalization ratio computed and cached in the model artifact, enabling fast overfitting assessment without recomputation |
| | | + - **Residual normality statistics cached during training**: Normality tests (Kolmogorov-Smirnov, Anderson-Darling), distribution shape metrics (skewness, kurtosis, Q-Q R²), and outlier counts computed once during training and cached for fast retrieval |
| | | + - **Diagnostic reconstruction from model metadata**: All regression diagnostics (generalization-gap, partial-dependence, residual-normality) now reconstruct the held-out test split from stored model metadata + input file list, enabling reproducibility and offline analysis without CSV exports |
| | | + - **Cache-first diagnostic workflows**: Diagnostic scripts load cached metrics first (fast) with graceful fallback to reconstruction if cache unavailable (backward compatible with older models) |
| | | + - **CLI entry points for all diagnostics**: |
| | | +   - `eventdisplay-ml-diagnostic-generalization-gap`: Quantify overfitting via train/test RMSE comparison |
| | | +   - `eventdisplay-ml-diagnostic-partial-dependence`: Validate model captures physics via partial dependence curves |
| | | +   - `eventdisplay-ml-diagnostic-residual-normality`: Validate residual normality and detect outliers |
| | | + - **Fixed sklearn FutureWarning**: Partial dependence plots convert feature data to float64 to avoid integer dtype warnings in newer scikit-learn versions |
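The changelog describes training on standardized residuals and inverting that standardization at apply time. The sketch below illustrates the round trip for a single target, assuming the residual is defined as truth minus baseline and the scaler is a plain mean/standard-deviation pair. Only the names `target_mean` and `target_std` come from the changelog; the helper functions and sample values are hypothetical and not taken from the repository.

```python
from statistics import fmean, pstdev


def fit_scaler(residuals):
    """Return (mean, std) of the residuals; std falls back to 1.0 if it is zero."""
    mean = fmean(residuals)
    std = pstdev(residuals, mu=mean)
    return mean, (std if std > 0 else 1.0)


def to_training_target(truth, baseline, mean, std):
    """Residual with respect to the baseline reconstruction, standardized."""
    return [((t - b) - mean) / std for t, b in zip(truth, baseline)]


def to_prediction(standardized_residual, baseline, mean, std):
    """Undo the standardization, then add the baseline back."""
    return [b + (r * std + mean) for r, b in zip(standardized_residual, baseline)]


# Hypothetical direction offsets: true values and a baseline reconstruction.
true_xoff = [0.50, -0.20, 1.10, 0.05]
baseline_xoff = [0.45, -0.10, 1.00, 0.10]

residuals = [t - b for t, b in zip(true_xoff, baseline_xoff)]
target_mean, target_std = fit_scaler(residuals)

# What a model would be trained on (mean 0, std 1).
targets = to_training_target(true_xoff, baseline_xoff, target_mean, target_std)

# With a perfect model, inverting the scaling recovers the true values.
recovered = to_prediction(targets, baseline_xoff, target_mean, target_std)
```

If the scaler pair is missing at apply time, the inversion cannot be done, which is why the changelog mentions loading and validating these values before prediction.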
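The energy-bin weighting listed in the changelog (inverse bin density, zero weight for bins with fewer than 10 events) can be sketched as follows. This is an illustration under assumptions: the bin edges, the half-open bin convention, and the final normalization to a mean weight of 1 over the used events are choices made here, and `energy_bin_weights` is a hypothetical helper rather than the project's function.

```python
from collections import Counter

MIN_EVENTS_PER_BIN = 10  # threshold stated in the changelog


def energy_bin_weights(values, bin_edges, min_events=MIN_EVENTS_PER_BIN):
    """Weight each event by 1 / (events in its bin); sparse bins get weight 0."""

    def bin_index(value):
        # Half-open bins [edge_i, edge_{i+1}); values outside all bins get None.
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= value < bin_edges[i + 1]:
                return i
        return None

    indices = [bin_index(v) for v in values]
    counts = Counter(i for i in indices if i is not None)

    weights = [
        1.0 / counts[i] if i is not None and counts[i] >= min_events else 0.0
        for i in indices
    ]

    # Assumption: rescale so that the used events have a mean weight of 1.
    total = sum(weights)
    n_used = sum(1 for w in weights if w > 0)
    if total > 0:
        weights = [w * n_used / total for w in weights]
    return weights


# Hypothetical sample: 12 events in the first bin, 20 in the second, 3 in the third.
energies = [-0.5] * 12 + [0.5] * 20 + [1.5] * 3
weights = energy_bin_weights(energies, bin_edges=[-1.0, 0.0, 1.0, 2.0])
```

In this sample the two well-populated bins end up with the same summed weight, and the three events in the sparse bin contribute nothing to training.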
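The generalization-gap metrics and the cache-first lookup mentioned in the changelog can be illustrated with a short sketch. The formulas are assumptions, since the changelog does not define them: here the gap is the relative increase of test RMSE over train RMSE in percent, and the generalization ratio is test RMSE divided by train RMSE. The artifact key `"generalization_gap"` and both helper names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def generalization_gap(train_rmse, test_rmse):
    """Summarize overfitting from train/test RMSE (definitions assumed, see above)."""
    if train_rmse <= 0:
        raise ValueError("train RMSE must be positive to form a ratio")
    return {
        "train_rmse": train_rmse,
        "test_rmse": test_rmse,
        "gap_percent": 100.0 * (test_rmse - train_rmse) / train_rmse,
        "generalization_ratio": test_rmse / train_rmse,
    }


def get_gap_metrics(model_artifact, recompute):
    """Cache-first lookup: use stored metrics if present, else recompute."""
    cached = model_artifact.get("generalization_gap")  # hypothetical key
    if cached is not None:
        return cached
    return recompute()


# Hypothetical residual predictions against a true residual of zero.
truth = [0.0, 0.0, 0.0, 0.0]
train_pred = [0.1, -0.1, 0.1, -0.1]
test_pred = [0.2, -0.2, 0.2, -0.2]

metrics = generalization_gap(rmse(truth, train_pred), rmse(truth, test_pred))

# An older artifact without cached metrics falls back to recomputation.
from_old_model = get_gap_metrics({}, lambda: metrics)
```

Passing the recomputation as a callable keeps the expensive path (rebuilding the held-out split) from running when cached values are available.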