Merged
11 changes: 11 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,11 @@
---
# Set update schedule for GitHub Actions

version: 2
updates:

- package-ecosystem: "github-actions"
directory: "/"
schedule:
# Check for updates to GitHub Actions every month
interval: "monthly"
270 changes: 270 additions & 0 deletions README.md
@@ -19,6 +19,59 @@ Stereo analysis methods implemented in Eventdisplay provide direction / energies

Output is a single ROOT tree called `StereoAnalysis` with the same number of events as the input tree.

### Training Stereo Reconstruction Models

The stereo regression training pipeline uses multi-target XGBoost to predict residuals (deviations from baseline reconstructions):

**Targets:** `[Xoff_residual, Yoff_residual, E_residual]` (residuals on direction and energy as reconstructed by the BDT stereo reconstruction method)

**Key techniques:**

- **Target standardization:** Targets are mean-centered and scaled to unit variance during training
- **Energy-bin weighting:** Events are weighted inversely by energy bin density; bins with fewer than 10 events are excluded from training to prevent overfitting on low-statistics regions
- **Multiplicity weighting:** Higher-multiplicity events (more telescopes) receive higher sample weights to prioritize high-confidence reconstructions
- **Per-target SHAP importance:** Feature importance values computed during training for each target and cached for later analysis
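The standardization and energy-bin weighting steps above can be sketched as follows (a minimal illustration; the helper names and bin count are assumptions, not the package API):

```python
import numpy as np

def standardize_targets(y):
    """Mean-center and scale residual targets to unit variance (per column)."""
    mean = y.mean(axis=0)
    std = y.std(axis=0)
    return (y - mean) / std, mean, std

def energy_bin_weights(log_energy, n_bins=20, min_events=10):
    """Weight events inversely by energy-bin population; zero out sparse bins."""
    counts, edges = np.histogram(log_energy, bins=n_bins)
    bin_idx = np.clip(np.digitize(log_energy, edges) - 1, 0, n_bins - 1)
    weights = np.zeros(len(log_energy))
    populated = counts[bin_idx] >= min_events  # bins below threshold keep weight 0
    weights[populated] = 1.0 / counts[bin_idx][populated]
    return weights
```

The returned mean/std pair is what the apply pipeline later needs to invert the standardization, which is why it is persisted alongside the model.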

**Training command:**

```bash
eventdisplay-ml-train-xgb-stereo \
--input_file_list train_files.txt \
--model_prefix models/stereo_model \
--max_events 100000 \
--train_test_fraction 0.5 \
--max_cores 8
```

**Output:** Joblib model file containing:

- XGBoost trained model object
- Target standardization scalers (mean/std)
- Feature list and SHAP importance rankings
- Training metadata (random state, hyperparameters)

### Applying Stereo Reconstruction Models

The apply pipeline loads trained models and makes predictions:

**Key safeguards:**

- Invalid energy values (≤0 or NaN) produce NaN outputs but preserve all input event rows
- Missing standardization parameters raise ValueError (prevents silent data corruption)
- Output row count always equals input row count
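These safeguards can be sketched like this (illustrative helpers, not the package's actual functions):

```python
import numpy as np

def invert_standardization(pred_std, target_mean, target_std):
    """Map standardized residual predictions back to physical units."""
    if target_mean is None or target_std is None:
        # Fail loudly rather than silently emitting un-scaled residuals.
        raise ValueError("Missing standardization parameters in model file")
    return pred_std * target_std + target_mean

def safe_log10(energy):
    """log10 that yields NaN (not a RuntimeWarning) for non-positive or NaN input."""
    energy = np.asarray(energy, dtype=float)
    out = np.full(energy.shape, np.nan)  # invalid rows stay NaN but are kept
    valid = np.isfinite(energy) & (energy > 0)
    out[valid] = np.log10(energy[valid])
    return out
```

Because `safe_log10` preallocates an output of the same shape as the input, invalid energies become NaN entries rather than dropped rows, preserving the input/output row-count guarantee.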

**Apply command:**

```bash
eventdisplay-ml-apply-xgb-stereo \
--input_file_list apply_files.txt \
--output_file_list output_files.txt \
--model_prefix models/stereo_model
```


**Output:** ROOT files with `StereoAnalysis` tree containing reconstructed Xoff, Yoff, and log10(E).

## Gamma/hadron separation using XGBoost

Gamma/hadron separation is performed using XGBoost classification trees. Features are image parameters and stereo reconstruction parameters provided by Eventdisplay.
@@ -27,6 +80,223 @@ The zenith angle dependence is accounted for by including the zenith angle as a

Output is a single ROOT tree called `Classification` with the same number of events as the input tree. It contains the classification prediction (`Gamma_Prediction`) and boolean flags (e.g. `Is_Gamma_75` for 75% signal efficiency cut).

## Diagnostic Tools

The regression diagnostics committed in this branch are:

### SHAP feature-importance summary

- Load per-target SHAP importances cached in the trained model file
- Create one top-20 feature plot per residual target (`Xoff_residual`, `Yoff_residual`, `E_residual`)
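Conceptually, each plot just ranks the cached mean-|SHAP| values per target and keeps the top 20; assuming the cache is a simple feature-to-importance mapping:

```python
def top_features(importance, n=20):
    """Rank features by cached mean |SHAP| value; return the top n (name, value) pairs."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]
```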

Required inputs:

- `--model_file`: trained stereo model `.joblib`
- `--output_dir`: directory for generated PNGs

Run:

```bash
eventdisplay-ml-diagnostic-shap-summary \
--model_file models/stereo_model.joblib \
--output_dir diagnostics/
```

Outputs:

- `diagnostics/shap_importance_Xoff_residual.png`
- `diagnostics/shap_importance_Yoff_residual.png`
- `diagnostics/shap_importance_E_residual.png`

### Permutation importance

- Rebuild the held-out test split from the model metadata and original input files
- Shuffle one feature at a time and measure the relative RMSE increase per residual target
- Validate predictive dependence on features rather than cached model attribution
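A minimal single-target sketch of the shuffle-and-remeasure loop described above (the real tool works per residual target; `predict` stands in for the trained model):

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def permutation_importance(predict, X, y, rng=None):
    """Relative RMSE increase per feature when that feature's column is shuffled."""
    rng = rng or np.random.default_rng(0)
    baseline = rmse(y, predict(X))
    scores = {}
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])  # break the feature-target link for column j only
        scores[j] = (rmse(y, predict(Xp)) - baseline) / baseline
    return scores
```

Features the model truly relies on produce a large relative RMSE increase; features it ignores score near zero, regardless of their cached SHAP attribution.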

Required inputs:

- `--model_file`: trained stereo model `.joblib`
- `--output_dir`: directory for generated plots
- `--top_n`: number of top features to include in the plot (optional)
- `--input_file_list`: optional override if the path stored in the model metadata is no longer valid

Run:

```bash
eventdisplay-ml-diagnostic-permutation-importance \
--model_file models/stereo_model.joblib \
--output_dir diagnostics/ \
--top_n 20
```

Optional override:

```bash
eventdisplay-ml-diagnostic-permutation-importance \
--model_file models/stereo_model.joblib \
--input_file_list files.txt \
--output_dir diagnostics/
```

Output:

- `diagnostics/permutation_importance.png`

Notes:

- This diagnostic is slower than the SHAP summary because it rebuilds the processed test split.
- It is the better choice when you want to measure actual performance sensitivity to each feature.

### Generalization gap

- Read the cached train/test RMSE summary written during training
- Compare final train and test RMSE for each residual target
- Quantify the overfitting gap after training is complete
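The gap can be summarized per target roughly as follows (the exact percentage and ratio definitions cached by the package are assumptions):

```python
def generalization_gap(train_rmse, test_rmse):
    """Overfitting summary: absolute gap, gap as % of train RMSE, and test/train ratio."""
    gap = test_rmse - train_rmse
    return {
        "gap": gap,
        "gap_percent": 100.0 * gap / train_rmse,
        "ratio": test_rmse / train_rmse,  # ~1.0 means little overfitting
    }
```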

Required inputs:

- `--model_file`: trained stereo model `.joblib`
- `--output_dir`: directory for generated plots
- `--input_file_list`: optional override if the path stored in the model metadata is no longer valid

Run:

```bash
eventdisplay-ml-diagnostic-generalization-gap \
--model_file models/stereo_model.joblib \
--output_dir diagnostics/
```

Optional override:

```bash
eventdisplay-ml-diagnostic-generalization-gap \
--model_file models/stereo_model.joblib \
--input_file_list files.txt \
--output_dir diagnostics/
```

Output:

- `diagnostics/generalization_gap.png`

Notes:

- This diagnostic measures final overfitting by comparing train and test residual RMSE.
- Older model files without cached metrics fall back to rebuilding the original train/test split.
- Unlike `plot_training_evaluation.py`, it summarizes final RMSE, not the per-iteration XGBoost training history.

### Partial Dependence Plots

- Visualize how each feature influences model predictions
- Check that the model captures the expected physics, e.g. that higher multiplicity reduces corrections and that baseline features show smooth relationships
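Partial dependence itself is simple to sketch: pin one feature to each grid value, leave the other features at their observed values, and average the predictions (`predict` stands in for the trained model):

```python
import numpy as np

def partial_dependence(predict, X, feature_idx, grid_size=20):
    """Average prediction as one feature sweeps a grid over its observed range."""
    grid = np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), grid_size)
    pd_values = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature_idx] = v                 # pin the feature of interest
        pd_values.append(predict(Xv).mean())   # average over all events
    return grid, np.array(pd_values)
```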

Required inputs:

- `--model_file`: trained stereo model `.joblib`
- `--output_dir`: directory for generated plots (optional; default: `diagnostics`)
- `--features`: space-separated list of features to plot (optional; default: `DispNImages Xoff_weighted_bdt Yoff_weighted_bdt ErecS`)
- `--input_file_list`: optional override if the path stored in the model metadata is no longer valid

Run:

```bash
eventdisplay-ml-diagnostic-partial-dependence \
--model_file models/stereo_model.joblib \
--output_dir diagnostics/ \
--features DispNImages Xoff_weighted_bdt ErecS
```

Optional override:

```bash
eventdisplay-ml-diagnostic-partial-dependence \
--model_file models/stereo_model.joblib \
--input_file_list files.txt \
--features Xoff_weighted_bdt Yoff_weighted_bdt
```

Output:

- `diagnostics/partial_dependence.png` (grid of feature × target subplots)

Notes:

- PDP displays predicted residual output as a function of a single feature while holding others constant
- Multiplicity effect: high-multiplicity events should show smaller corrections (negative slope)
- Baseline stability: baseline features (e.g., `weighted_bdt`) should show smooth, linear relationships
- This diagnostic rebuilds the held-out test split and is slower than SHAP summary

### Residual Normality Diagnostics

- Validate that model residuals follow a normal distribution
- Detect outlier events and check for systematic biases in reconstruction errors
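The reported statistics can be computed with standard SciPy tests; a sketch for a single residual target (the dict keys here are illustrative, not the tool's output format):

```python
import numpy as np
from scipy import stats

def residual_normality(residuals):
    """Normality and shape statistics for one residual target."""
    r = np.asarray(residuals, dtype=float)
    z = (r - r.mean()) / r.std()               # standardize before testing
    ks_stat, ks_p = stats.kstest(z, "norm")    # Kolmogorov-Smirnov vs N(0,1)
    ad = stats.anderson(z, dist="norm")        # Anderson-Darling
    return {
        "mean": r.mean(),
        "std": r.std(),
        "ks_pvalue": ks_p,
        "ad_statistic": ad.statistic,
        "skewness": stats.skew(r),
        "kurtosis": stats.kurtosis(r),
        "n_outliers": int(np.sum(np.abs(z) > 3)),  # events beyond 3 sigma
    }
```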

Required inputs:

- `--model_file`: trained stereo model `.joblib`
- `--output_dir`: directory for generated plots (optional; default: `diagnostics`)
- `--input_file_list`: optional override if the path stored in the model metadata is no longer valid

Run:

```bash
eventdisplay-ml-diagnostic-residual-normality \
--model_file models/stereo_model.joblib \
--output_dir diagnostics/
```

Optional override:

```bash
eventdisplay-ml-diagnostic-residual-normality \
--model_file models/stereo_model.joblib \
--input_file_list files.txt
```

Output:

- Residual normality statistics printed to console:
- Mean and standard deviation per target
- Kolmogorov-Smirnov test p-value (normality test)
- Anderson-Darling test statistic and critical value
- Skewness and kurtosis
- Q-Q plot R² value
- Number of outliers (>3σ) per target
- `diagnostics/residual_diagnostics.png` (single 2×N grid; generated on cache miss when reconstruction is required)

Notes:

- Residual normality stats are cached during training and loaded from the model file for fast retrieval
- Diagnostic plots (histograms, Q-Q plots) are only generated when the split must be reconstructed
- Invalid KS test or Anderson-Darling results (NaN/inf) are reported as special values
- Outlier counts help identify events with unusually large reconstruction errors

### Training-evaluation curves

- Plot XGBoost training vs validation metric curves
- Useful for checking convergence and overfitting behavior
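The curves come from XGBoost's `evals_result` history; for example, the best validation iteration can be read off as below (the `validation_0`/`validation_1` key names follow the sklearn-API defaults and are an assumption here):

```python
def best_iteration(evals_result, metric="rmse"):
    """Return (iteration, value) minimizing the validation metric curve."""
    test_curve = evals_result["validation_1"][metric]
    it = min(range(len(test_curve)), key=test_curve.__getitem__)
    return it, test_curve[it]
```

A validation curve that turns upward after its minimum while the training curve keeps falling is the classic overfitting signature these plots are meant to expose.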

Required inputs:

- `--model_file`: trained model `.joblib` containing an XGBoost model
- `--output_file`: output image path (optional; if omitted, plot is shown interactively)

Run:

```bash
eventdisplay-ml-plot-training-evaluation \
--model_file models/stereo_model.joblib \
--output_file diagnostics/training_curves.png
```

Output:

- Figure with one panel per tracked metric (for example `rmse`), showing training and test curves.

## Generative AI disclosure

Generative AI tools (including Claude, ChatGPT, and Gemini) were used to assist with code development, debugging, and documentation drafting. All AI-assisted outputs were reviewed, validated, and, where necessary, modified by the authors to ensure accuracy and reliability.
45 changes: 34 additions & 11 deletions docs/changes/53.feature.md
@@ -1,14 +1,37 @@
## Stereo Regression: Training on Residuals with Standardization and Energy Weighting

### Architectural Change

- **Training targets changed from absolute to residual values**: Models now predict residuals (deviations from baseline reconstructions) rather than absolute directions/energies. This allows XGBoost to learn corrections to existing Eventdisplay reconstructions (DispBDT, intersection method) and leverage their baseline accuracy as a starting point.

### Critical Bug Fixes

- **Fixed double log10 application**: Energy residuals computed in linear space; log10 applied explicitly during evaluation
- **Fixed standardization inversion**: Apply pipeline now loads and validates target_mean/target_std scalers (prevents KeyError)
- **Fixed energy-bin weighting**: Bins with <10 events get zero weight; correct inverse weighting for balanced training
- **Fixed ErecS validation**: Safe log10 computation during apply; all input rows preserved in output
- **Fixed evaluation metrics**: Energy resolution compared in log10 space with proper baseline alignment
- **Fixed FutureWarning**: Series positional indexing converted to numpy arrays for pandas compatibility

### New Features

- **Target standardization in training**: Residuals standardized to mean=0, std=1 during training to enable multi-target learning with balanced learning signals (direction and energy equally weighted)
- **Energy-bin weighted training**: Events weighted inversely by energy bin density; bins with <10 events excluded to prevent overfitting on low-statistics regions
- **Per-target SHAP importance caching**: Feature importances computed once during training for each target (Xoff_residual, Yoff_residual, E_residual), cached for diagnostic tools
- **Diagnostic scripts**:
- `diagnostic_shap_summary.py`: Top-20 feature importance plots per residual target
- `plot_training_evaluation.py`: Energy resolution and residual distribution visualization
- **Comprehensive test suites**: 20 new tests covering residual computation, standardization, energy weighting, apply inference
- **Robust error handling**: Clear messages for missing scalers; guaranteed row-count preservation in apply pipeline

### Enhanced Diagnostic Pipeline

- **Generalization-gap metrics cached during training**: Train/test RMSE, gap %, and generalization ratio computed and cached in the model artifact, enabling fast overfitting assessment without recomputation
- **Residual normality statistics cached during training**: Normality tests (Kolmogorov-Smirnov, Anderson-Darling), distribution shape metrics (skewness, kurtosis, Q-Q R²), and outlier counts computed once during training and cached for fast retrieval
- **Diagnostic reconstruction from model metadata**: All regression diagnostics (generalization-gap, partial-dependence, residual-normality) now reconstruct the held-out test split from stored model metadata + input file list, enabling reproducibility and offline analysis without CSV exports
- **Cache-first diagnostic workflows**: Diagnostic scripts load cached metrics first (fast) with graceful fallback to reconstruction if cache unavailable (backward compatible with older models)
- **CLI entry points for all diagnostics**:
- `eventdisplay-ml-diagnostic-generalization-gap`: Quantify overfitting via train/test RMSE comparison
- `eventdisplay-ml-diagnostic-partial-dependence`: Validate model captures physics via partial dependence curves
- `eventdisplay-ml-diagnostic-residual-normality`: Validate residual normality and detect outliers
- **Fixed sklearn FutureWarning**: Partial dependence plots convert feature data to float64 to avoid integer dtype warnings in newer scikit-learn versions
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -62,6 +62,11 @@ urls."documentation" = "https://github.com/Eventdisplay/Eventdisplay-ML"
urls."repository" = "https://github.com/Eventdisplay/Eventdisplay-ML"
scripts.eventdisplay-ml-apply-xgb-classify = "eventdisplay_ml.scripts.apply_xgb_classify:main"
scripts.eventdisplay-ml-apply-xgb-stereo = "eventdisplay_ml.scripts.apply_xgb_stereo:main"
scripts.eventdisplay-ml-diagnostic-generalization-gap = "eventdisplay_ml.scripts.diagnostic_generalization_gap:main"
scripts.eventdisplay-ml-diagnostic-partial-dependence = "eventdisplay_ml.scripts.diagnostic_partial_dependence:main"
scripts.eventdisplay-ml-diagnostic-permutation-importance = "eventdisplay_ml.scripts.diagnostic_permutation_importance:main"
scripts.eventdisplay-ml-diagnostic-residual-normality = "eventdisplay_ml.scripts.diagnostic_residual_normality:main"
scripts.eventdisplay-ml-diagnostic-shap-summary = "eventdisplay_ml.scripts.diagnostic_shap_summary:main"
scripts.eventdisplay-ml-plot-classification-performance-metrics = "eventdisplay_ml.scripts.plot_classification_performance_metrics:main"
scripts.eventdisplay-ml-plot-classification-gamma-efficiency = "eventdisplay_ml.scripts.plot_classification_gamma_efficiency:main"
scripts.eventdisplay-ml-plot-training-evaluation = "eventdisplay_ml.scripts.plot_training_evaluation:main"