nmdl-mizo/ELNES-Cal2Exp-Corrector

Experimental XANES/ELNES Spectrum Prediction From DFT

A machine learning pipeline that uses Random Forest models to predict experimental-like X-ray absorption near-edge structure (XANES) and electron energy-loss near-edge structure (ELNES) spectra from DFT-calculated spectra.

Overview

This project provides tools to predict experimental-like spectra from DFT calculations.

Data Structure

The repository includes the following data directories:

  • aligned_data/: Cleaned and aligned dataset used for machine learning training.
  • cal_ELNES/: Calculated ELNES spectra (before cleaning & aligning).
  • exp_ELNES/: Experimental ELNES spectra (before cleaning & aligning).

Requirements

  • Python >= 3.8
  • numpy
  • pandas
  • scipy
  • scikit-learn
  • matplotlib
  • joblib

Install dependencies:

pip install numpy pandas scipy scikit-learn matplotlib joblib

Usage

1. Prepare Input Data

Input CSV format:

  • First column: Energy (eV)
  • Subsequent columns: Intensity values
  • First row: Headers (energy label + sample names)

Example:

Energy,Intensity
280.0,0.0
280.1,0.0
...
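A minimal sketch of reading a file in this layout with pandas. The function name load_spectra_sketch, the sample column names, and the check that energies are strictly increasing are assumptions made for illustration; they are not taken from cal2exp.py.

```python
import io

import pandas as pd


def load_spectra_sketch(csv_source):
    """Split a spectrum CSV into an energy axis and named intensity columns."""
    df = pd.read_csv(csv_source)  # first row is used as the header
    if df.shape[1] < 2:
        raise ValueError("expected an energy column plus at least one intensity column")

    energy = df.iloc[:, 0].to_numpy(dtype=float)
    # Assumption: resampling later needs a strictly increasing energy axis.
    if (energy[1:] <= energy[:-1]).any():
        raise ValueError("energy values must be strictly increasing")

    spectra = {name: df[name].to_numpy(dtype=float) for name in df.columns[1:]}
    return energy, spectra


# Usage with an in-memory file holding two hypothetical samples.
example_csv = io.StringIO(
    "Energy,sample_a,sample_b\n"
    "280.0,0.0,0.0\n"
    "280.1,0.0,0.1\n"
    "280.2,0.3,0.4\n"
)
energy, spectra = load_spectra_sketch(example_csv)
```

Each intensity column is returned under its header name, so a file with several samples yields one array per sample.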

2. Configure Paths

Edit the following variables in the script:

base_dir = "/path/to/your/working/directory"
model_file = os.path.join(base_dir, "rf_models.pkl")
scaler_file = os.path.join(base_dir, "rf_scalers.pkl")

input_files = [
    "your_input_file.csv",
]

3. Configure Preprocessing

preprocess = True       # Set False if data is already preprocessed
target_points = 300     # Number of output points
energy_step = 0.1       # Energy grid spacing (eV)
target_position = 119   # Edge alignment position (0-based index, i.e. the 120th point)
sigma = 3.0             # Gaussian smoothing parameter (optional)
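As a quick check of what these defaults imply, the sketch below computes the energy span of the output grid on either side of the aligned edge. It assumes target_position is a 0-based index on the output grid, as the "120th point" comment suggests; grid_spans_sketch is a hypothetical helper, not part of the script.

```python
def grid_spans_sketch(target_points, energy_step, target_position):
    """Return the energy spans (eV) covered by the output grid around the edge."""
    last_index = target_points - 1
    return {
        "total_eV": last_index * energy_step,
        "pre_edge_eV": target_position * energy_step,
        "post_edge_eV": (last_index - target_position) * energy_step,
    }


spans = grid_spans_sketch(target_points=300, energy_step=0.1, target_position=119)
for name, value in spans.items():
    print(f"{name}: {value:.1f}")
```

With the defaults, the grid covers about 29.9 eV in total, with roughly 11.9 eV before the edge and 18.0 eV after it, so an input spectrum should extend at least that far on each side to avoid extrapolation.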

4. Run

python cal2exp.py

Output

Results are saved in predictions_YYYYMMDD_HHMMSS/:

predictions_20260105_143022/
├── sample_name/
│   ├── preprocessed_sample_name.csv    # Preprocessed input spectra
│   ├── predicted_sample_name_*.csv     # Model predictions
│   └── prediction_plots/
│       └── predictions_page_001.png    # Comparison plots
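Because the timestamp in the directory name sorts lexicographically, the most recent run can be located with a short helper. This is a sketch under the assumption that the predictions_* directories sit directly inside a directory you pass in; both function names are hypothetical.

```python
from pathlib import Path


def latest_predictions_dir(parent_dir):
    """Return the newest predictions_YYYYMMDD_HHMMSS directory, or None."""
    runs = sorted(p for p in Path(parent_dir).glob("predictions_*") if p.is_dir())
    return runs[-1] if runs else None


def predicted_csvs(run_dir):
    """List the predicted_*.csv files of every sample in one run directory."""
    return sorted(Path(run_dir).glob("*/predicted_*.csv"))
```

For example, predicted_csvs(latest_predictions_dir(".")) would list the prediction files of the latest run when the script was launched from the current directory.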

Model Files

We provide two types of pre-trained models:

  • cal2exp_models: Predicts experimental spectra from DFT inputs.
  • exp2cal_models: Converts experimental spectra back to DFT-like spectra, allowing users to leverage models and analysis workflows originally designed for DFT data.

Preprocessing Pipeline

  1. First derivative analysis: Find the absorption edge onset.
  2. Interpolation: Resample to a uniform 0.1 eV grid (300 points).
  3. Alignment: Align the edge position to the 120th point.
  4. Gaussian smoothing: σ = 3.0.
  5. Normalization: Scale maximum intensity to 1.0.
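The five steps above can be sketched with NumPy and SciPy as follows. This is an illustration, not the code in cal2exp.py: the function name is hypothetical, the edge is taken simply as the steepest rise (the script applies an additional validity check), σ is assumed to be measured in grid points, and values outside the input range are held at the end values by np.interp.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def preprocess_spectrum_sketch(energy, intensity, target_points=300,
                               energy_step=0.1, target_position=119, sigma=3.0):
    energy = np.asarray(energy, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # 1. First derivative analysis (simplified): steepest rise marks the edge.
    first_derivative = np.gradient(intensity, energy)
    edge_energy = energy[np.argmax(first_derivative)]

    # 2 + 3. Build a uniform grid whose point `target_position` sits on the
    # edge, then resample the spectrum onto it by linear interpolation.
    grid = edge_energy + (np.arange(target_points) - target_position) * energy_step
    resampled = np.interp(grid, energy, intensity)

    # 4. Gaussian smoothing (sigma assumed to be in grid points).
    smoothed = gaussian_filter1d(resampled, sigma=sigma)

    # 5. Normalization: scale the maximum intensity to 1.0.
    peak = smoothed.max()
    normalized = smoothed / peak if peak > 0 else smoothed
    return grid, normalized


# Usage on a synthetic step-like spectrum with its edge at 285 eV.
demo_energy = np.arange(270.0, 310.25, 0.25)
demo_intensity = 0.5 * (1.0 + np.tanh((demo_energy - 285.0) / 0.5))
demo_grid, demo_out = preprocess_spectrum_sketch(demo_energy, demo_intensity)
```

Building the grid around the detected edge combines interpolation and alignment in one step, which avoids shifting an already resampled array.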

Troubleshooting

Spectrum not aligned to the correct position

Symptom: The input spectrum in the output comparison plot does not start near the 120th point, or the absorption edge appears shifted from the expected position.

Cause: The edge-finding algorithm (find_valid_max_derivative) searches for the maximum-derivative point that has at least 50% zero-intensity values among the preceding min_index points. If the zero-intensity pre-edge region of your spectrum is shorter than min_index points on the original energy grid, the function will not find a valid candidate and will fall back to the global maximum of the first derivative, which may not correspond to the true absorption edge onset.

Solution: Increase min_index in the find_valid_max_derivative call to a value larger than the number of pre-edge zero points in your spectrum:

def find_valid_max_derivative(intensity, first_derivative, min_index=50, zero_threshold=0.5):

Start by increasing min_index gradually (e.g., 50 → 100 → 150) until the edge is correctly aligned to the 120th point in the output plot. If that does not help, try decreasing min_index gradually instead.
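To make the rule described under Cause concrete, here is a sketch reconstructed from that description only. It is not the repository's find_valid_max_derivative: the name find_edge_index_sketch, the returned (index, found_valid) tuple, and the choice to skip candidates with fewer than min_index preceding points are all assumptions.

```python
import numpy as np


def find_edge_index_sketch(intensity, first_derivative, min_index=50, zero_threshold=0.5):
    """Return (index, found_valid) for the steepest rise with a mostly-zero pre-edge."""
    intensity = np.asarray(intensity, dtype=float)
    first_derivative = np.asarray(first_derivative, dtype=float)

    # Visit candidates from the largest derivative downwards.
    for idx in np.argsort(first_derivative)[::-1]:
        if idx < min_index:
            continue  # assumption: a full window of min_index points is required
        window = intensity[idx - min_index:idx]
        if np.mean(window == 0.0) >= zero_threshold:
            return int(idx), True

    # Fallback: global maximum of the first derivative.
    return int(np.argmax(first_derivative)), False


# Synthetic spectrum: 80 zero points, an edge at index 80, and a steeper
# post-edge feature at index 120 that should be rejected.
demo_intensity = np.concatenate(
    [np.zeros(80), [0.5, 1.0], np.ones(38), np.full(30, 3.0)]
)
demo_derivative = np.gradient(demo_intensity)
edge_index, found_valid = find_edge_index_sketch(demo_intensity, demo_derivative)
```

In this example the steeper feature near index 120 is rejected because fewer than half of its preceding 50 points are zero, so the search settles on the true edge. Under these assumptions, a min_index larger than the whole spectrum leaves no valid candidate and triggers the fallback.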

Citation

If you use this project in your research, please cite the following paper:

Yinan Wang, Yu Fujikata, Louis Wong, Yasuji Muramatsu, Teruyasu Mizoguchi, "Systematic correction of core-loss spectra via machine learning: bridging the gap between simulated and experimental spectra", Ultramicroscopy, 283 (2026) 114336-1-12. https://doi.org/10.1016/j.ultramic.2026.114336

Additional Citations for Data and Methodology

Depending on which part of the data you use, please also include the following citations:

If you use the cal_ELNES data:

T. Mizoguchi, I. Tanaka, S.-P. Gao, C.J. Pickard, First-principles calculation of spectral features, chemical shift and absolute threshold of ELNES and XANES using a plane wave pseudopotential method, J. Phys. Condens. Matter 21 (2009) 104204. https://doi.org/10.1088/0953-8984/21/10/104204

If you use the exp_ELNES data:

A.P. Hitchcock, Inner shell excitation spectroscopy of molecules using inelastic electron scattering, J. Electron Spectrosc. Relat. Phenom. 112 (2000) 9–29. https://doi.org/10.1016/S0368-2048(00)00200-0

License

MIT License

Contact

ynwang@iis.u-tokyo.ac.jp
