This repository provides a machine learning pipeline that predicts experimental-like X-ray Absorption Near Edge Structure (XANES) and Electron Energy-Loss Near-Edge Structure (ELNES) spectra from DFT calculations using Random Forest models.
The repository includes the following data directories:
- aligned_data/: Cleaned and aligned dataset used for machine learning training.
- cal_ELNES/: Calculated ELNES spectra (before cleaning & aligning).
- exp_ELNES/: Experimental ELNES spectra (before cleaning & aligning).
Requirements:
- Python >= 3.8
- numpy
- pandas
- scipy
- scikit-learn
- matplotlib
- joblib
Install dependencies:
pip install numpy pandas scipy scikit-learn matplotlib joblib

Input CSV format:
- First column: Energy (eV)
- Subsequent columns: Intensity values
- First row: Headers (energy label + sample names)
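As a minimal sketch, a file in this layout could be loaded with pandas as shown below. The helper name `load_spectra` and the in-memory sample are illustrative assumptions and are not part of the repository.

```python
import io

import pandas as pd


def load_spectra(source):
    """Read a spectrum CSV: first column is energy (eV), the rest are intensities."""
    df = pd.read_csv(source)  # the first row is used as the header
    energy = df.iloc[:, 0].to_numpy(dtype=float)
    # One intensity array per sample column, keyed by its header name.
    spectra = {name: df[name].to_numpy(dtype=float) for name in df.columns[1:]}
    return energy, spectra


# Hypothetical two-row sample standing in for a real input file.
sample = io.StringIO("Energy,Intensity\n280.0,0.0\n280.1,0.0\n")
energy, spectra = load_spectra(sample)
```

Passing a file path instead of the `StringIO` object works the same way, since `pd.read_csv` accepts both.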
Example:
Energy,Intensity
280.0,0.0
280.1,0.0
...

Edit the following variables in the script:
base_dir = "/path/to/your/working/directory"
model_file = os.path.join(base_dir, "rf_models.pkl")
scaler_file = os.path.join(base_dir, "rf_scalers.pkl")
input_files = [
"your_input_file.csv",
]

preprocess = True # Set False if data is already preprocessed
target_points = 300 # Number of output points
energy_step = 0.1 # Energy grid spacing (eV)
target_position = 119 # Edge alignment position (120th point)
sigma = 3.0 # Gaussian smoothing parameter (optional)

Run the script:
python cal2exp.py

Results are saved in predictions_YYYYMMDD_HHMMSS/:
predictions_20260105_143022/
├── sample_name/
│   ├── preprocessed_sample_name.csv   # Preprocessed input spectra
│   ├── predicted_sample_name_*.csv    # Model predictions
│   └── prediction_plots/
│       └── predictions_page_001.png   # Comparison plots
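For post-processing, it can be convenient to collect the predicted CSV files of the most recent run. The following is a rough sketch that assumes only the directory layout shown above; the helper name `latest_prediction_files` is hypothetical and does not exist in the repository.

```python
from pathlib import Path


def latest_prediction_files(base_dir):
    """Return predicted CSVs from the newest predictions_* directory, grouped by sample."""
    run_dirs = sorted(p for p in Path(base_dir).glob("predictions_*") if p.is_dir())
    if not run_dirs:
        return {}
    latest = run_dirs[-1]  # YYYYMMDD_HHMMSS names sort chronologically as strings
    return {
        sample_dir.name: sorted(sample_dir.glob("predicted_*.csv"))
        for sample_dir in sorted(latest.iterdir())
        if sample_dir.is_dir()
    }
```

Calling `latest_prediction_files(base_dir)` returns a dictionary that maps each sample directory name to its list of prediction files, or an empty dictionary if no run directory exists yet.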
We provide two types of pre-trained models:
- cal2exp_models: Predicts experimental spectra from DFT inputs.
- exp2cal_models: Converts experimental spectra back to DFT-like spectra, allowing users to leverage models and analysis workflows originally designed for DFT data.
Preprocessing applies the following steps:
- First derivative analysis: Find the absorption edge onset.
- Interpolation: Resample to a uniform 0.1 eV grid (300 points).
- Alignment: Align the edge position to the 120th point.
- Gaussian smoothing: σ = 3.0.
- Normalization: Scale the maximum intensity to 1.0.
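The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in cal2exp.py: the edge is taken as the global maximum of the first derivative (a simplification), the energy axis is assumed to be ascending, σ is assumed to be expressed in grid points, and grid points outside the measured range are padded with 0 on the low-energy side and with the last measured value on the high-energy side. The function name `preprocess_spectrum` is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def preprocess_spectrum(energy, intensity, target_points=300, energy_step=0.1,
                        target_position=119, sigma=3.0):
    """Sketch of the listed steps; the edge is the global maximum of the derivative."""
    energy = np.asarray(energy, dtype=float)
    intensity = np.asarray(intensity, dtype=float)

    # 1. First derivative analysis: steepest rise as a simple edge estimate.
    first_derivative = np.gradient(intensity, energy)
    edge_energy = energy[np.argmax(first_derivative)]

    # 2-3. Interpolation and alignment: build a uniform grid whose index
    # target_position sits on the edge energy, then resample onto it.
    grid = edge_energy + (np.arange(target_points) - target_position) * energy_step
    resampled = np.interp(grid, energy, intensity, left=0.0, right=intensity[-1])

    # 4. Gaussian smoothing (sigma assumed to be in grid points).
    smoothed = gaussian_filter1d(resampled, sigma=sigma)

    # 5. Normalization: scale the maximum intensity to 1.0.
    peak = smoothed.max()
    normalized = smoothed / peak if peak > 0 else smoothed
    return grid, normalized
```

Building the output grid around the detected edge energy handles interpolation and alignment in one resampling pass, so index 119 (the 120th point) always corresponds to the estimated edge.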
Symptom: The input spectrum in the output comparison plot does not start near the 120th point, or the absorption edge appears shifted from the expected position.
Cause: The edge-finding algorithm (find_valid_max_derivative) searches for the maximum-derivative point that has at least 50% zero-intensity values among the preceding min_index points. If the zero-intensity pre-edge region of your spectrum is shorter than min_index points on the original energy grid, the function will not find a valid candidate and will fall back to the global maximum of the first derivative, which may not correspond to the true absorption edge onset.
Solution: Increase min_index in the find_valid_max_derivative call to a value larger than the number of pre-edge zero points in your spectrum:
def find_valid_max_derivative(intensity, first_derivative, min_index=50, zero_threshold=0.5):

Start by increasing min_index gradually (e.g., 50 → 100 → 150) until the edge is correctly aligned to the 120th point in the output plot. If this does not work, try decreasing min_index gradually instead.
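To make the selection rule described under "Cause" more concrete, here is one possible reading of it in code. It is a reconstruction from the description only and may differ from the actual function in the repository. In particular, it assumes that candidates with fewer than min_index preceding points are skipped and that "zero-intensity" means exactly 0.0. The name `find_valid_max_derivative_sketch` is hypothetical.

```python
import numpy as np


def find_valid_max_derivative_sketch(intensity, first_derivative,
                                     min_index=50, zero_threshold=0.5):
    """Illustrative reading of the described rule, not the repository's code."""
    intensity = np.asarray(intensity, dtype=float)
    first_derivative = np.asarray(first_derivative, dtype=float)

    # Visit candidate indices from the largest derivative to the smallest.
    for idx in np.argsort(first_derivative)[::-1]:
        if idx < min_index:
            continue  # fewer than min_index preceding points (assumed invalid)
        window = intensity[idx - min_index:idx]
        # Accept the first candidate whose preceding window is mostly zero.
        if np.mean(window == 0.0) >= zero_threshold:
            return int(idx)

    # No valid candidate: fall back to the global maximum of the derivative.
    return int(np.argmax(first_derivative))
```

Under this reading, a spectrum with no zero-valued pre-edge region never yields a valid candidate, so the function returns the global maximum of the derivative, which matches the fallback behavior described above.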
If you use this project in your research, please cite the following paper:
Yinan Wang, Yu Fujikata, Louis Wong, Yasuji Muramatsu, Teruyasu Mizoguchi, "Systematic correction of core-loss spectra via machine learning: bridging the gap between simulated and experimental spectra", Ultramicroscopy 283 (2026) 114336-1-12. https://doi.org/10.1016/j.ultramic.2026.114336
Depending on which part of the data you use, please also include the following citations:
If you use the cal_ELNES data:
T. Mizoguchi, I. Tanaka, S.-P. Gao, C.J. Pickard, First-principles calculation of spectral features, chemical shift and absolute threshold of ELNES and XANES using a plane wave pseudopotential method, J. Phys. Condens. Matter 21 (2009) 104204. https://doi.org/10.1088/0953-8984/21/10/104204
If you use the exp_ELNES data:
A.P. Hitchcock, Inner shell excitation spectroscopy of molecules using inelastic electron scattering, J. Electron Spectrosc. Relat. Phenom. 112 (2000) 9–29. https://doi.org/10.1016/S0368-2048(00)00200-0
MIT License