This repository accompanies a forthcoming paper that evaluates neural network architectures from the 2024 LEAP ClimSim Kaggle competition in "online" coupled climate simulations with E3SM-MMF (Energy Exascale Earth System Model - Multi-scale Modeling Framework).
The ClimSim Kaggle competition challenged participants to develop machine learning emulators of cloud and convection processes for climate modeling. This repository tests whether architectures that performed well in offline metrics also produce stable, physically realistic results when coupled to a climate model.
- 6 Model Architectures: Implementations of winning Kaggle competition architectures plus a baseline
- 5 Training Configurations: Architecture-agnostic design variations inspired by competition insights
- Multi-seed Training: Multiple random seeds (7, 43, 1024) for robust evaluation
- Online Testing Framework: Uses FTorch-based E3SM-MMF for coupled simulations
- Comprehensive Evaluation: Offline metrics, online simulation analysis, and figure generation scripts
See ARCHITECTURE.md for detailed structure documentation.
```
├── baseline_models/                  # Model implementations and training scripts
│   ├── convnext/                     # ConvNeXt architecture
│   ├── encdec_lstm/                  # Encoder-Decoder LSTM
│   ├── pao_model/                    # Pao model (3rd place)
│   ├── pure_resLSTM/                 # Pure ResLSTM (2nd place)
│   ├── squeezeformer/                # Squeezeformer (1st place)
│   └── unet/                         # U-Net baseline
│       └── training_*/               # 5 training configurations per model
│
├── evaluation/                       # Evaluation scripts and notebooks
│   ├── offline/                      # Test set metrics
│   └── online/                       # Coupled simulation analysis
│
├── preprocessing/                    # Data preparation scripts
├── online_ensembling/                # Online ensemble simulation scripts
│
├── preprocess_figure_data.ipynb      # Compute metrics (run first)
└── generate_paper_figures.ipynb      # Generate visualizations (run second)
```
**U-Net (baseline)**: Adapted from Hu et al. (2025); an encoder-decoder structure with skip connections that progressively downsamples the vertical dimension while expanding the feature space, with scalar outputs averaged and concatenated to the vertically-resolved variables.
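As a rough illustration (not the repository's implementation; channel counts, the level count, and the omitted scalar-output head are all placeholder assumptions), an encoder-decoder over the vertical dimension with one downsampling stage and a skip connection might look like:

```python
import torch
import torch.nn as nn

class TinyUNet1D(nn.Module):
    """Toy 1-D U-Net over the vertical dimension with one down/up stage.
    (The scalar-output head, which averages over levels, is omitted.)"""

    def __init__(self, in_ch=25, hidden=64, out_ch=14):
        super().__init__()
        self.enc = nn.Conv1d(in_ch, hidden, kernel_size=3, padding=1)
        self.down = nn.Conv1d(hidden, 2 * hidden, kernel_size=3, stride=2, padding=1)
        self.up = nn.ConvTranspose1d(2 * hidden, hidden, kernel_size=2, stride=2)
        self.dec = nn.Conv1d(2 * hidden, out_ch, kernel_size=3, padding=1)

    def forward(self, x):                # x: (batch, in_ch, levels)
        skip = torch.relu(self.enc(x))   # features at full vertical resolution
        z = torch.relu(self.down(skip))  # halve levels, double feature width
        z = self.up(z)                   # restore vertical resolution
        z = torch.cat([z, skip], dim=1)  # skip connection
        return self.dec(z)               # per-level predictions

out = TinyUNet1D()(torch.randn(8, 25, 60))  # -> (8, 14, 60)
```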
**Squeezeformer (1st place)**: Integrates convolutional and transformer components. Originally designed for automatic speech recognition, it combines local context capture via depthwise convolutions with global dependency modeling through multi-head self-attention.
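A heavily simplified sketch of the conv-plus-attention idea (the real Squeezeformer adds feed-forward modules and temporal down/upsampling; all sizes here are assumptions):

```python
import torch.nn as nn

class ConvAttnBlock(nn.Module):
    """Toy conformer-style block: depthwise convolution for local context,
    multi-head self-attention for global dependencies."""

    def __init__(self, dim=128, heads=4, kernel=7):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.dwconv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):          # x: (batch, levels, dim)
        a, _ = self.attn(x, x, x)  # global context across all levels
        x = self.norm1(x + a)
        c = self.dwconv(x.transpose(1, 2)).transpose(1, 2)  # local vertical context
        return self.norm2(x + c)
```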
**Pure ResLSTM (2nd place)**: Multi-layer bidirectional LSTM with residual connections. Processes vertical profiles through 10 blocks of LSTM + layer normalization + GELU activation, embedding a physical prior of vertical locality.
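A minimal sketch of one such residual block (the feature width is an assumption; choosing a hidden size of `dim // 2` per direction makes the bidirectional output match the input):

```python
import torch.nn as nn

class ResLSTMBlock(nn.Module):
    """One residual block: bidirectional LSTM + layer norm + GELU."""

    def __init__(self, dim=128):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.norm = nn.LayerNorm(dim)
        self.act = nn.GELU()

    def forward(self, x):                  # x: (batch, levels, dim)
        h, _ = self.lstm(x)                # scans the column in both directions
        return x + self.act(self.norm(h))  # residual connection

stack = nn.Sequential(*[ResLSTMBlock() for _ in range(10)])  # 10 blocks
```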
**Pao Model (3rd place)**: Processes vertically-resolved and scalar variables separately before combining them. Uses residual blocks with convolutional and transformer components, followed by bidirectional LSTM layers.
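A toy sketch of the two-stream front end only (how the repository actually merges the streams is not shown here; all names, variable counts, and widths are assumptions):

```python
import torch.nn as nn

class TwoStreamEncoder(nn.Module):
    """Toy two-stream front end: vertically-resolved and scalar inputs are
    embedded separately, broadcast-combined per level, then mixed by a biLSTM."""

    def __init__(self, n_profile=9, n_scalar=17, dim=128):
        super().__init__()
        self.profile_proj = nn.Linear(n_profile, dim)  # per-level features
        self.scalar_proj = nn.Linear(n_scalar, dim)    # column-wide features
        self.mix = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, profiles, scalars):
        # profiles: (batch, levels, n_profile); scalars: (batch, n_scalar)
        p = self.profile_proj(profiles)
        s = self.scalar_proj(scalars).unsqueeze(1)  # broadcast over levels
        h, _ = self.mix(p + s)
        return h                                    # (batch, levels, dim)
```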
**ConvNeXt**: A modern convolutional architecture competitive with vision transformers. Employs depthwise convolutions with large kernels, batch normalization, and residual connections across multiple stages.
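A 1-D analogue of a ConvNeXt-style block, using batch normalization as described above (kernel size and widths are assumptions):

```python
import torch.nn as nn

class ConvNeXtBlock1D(nn.Module):
    """1-D ConvNeXt-style block: large-kernel depthwise convolution,
    batch norm, inverted-bottleneck MLP, residual connection."""

    def __init__(self, dim=128, kernel=7):
        super().__init__()
        self.dwconv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.norm = nn.BatchNorm1d(dim)
        self.pw1 = nn.Linear(dim, 4 * dim)   # expand features 4x
        self.act = nn.GELU()
        self.pw2 = nn.Linear(4 * dim, dim)   # project back down

    def forward(self, x):                    # x: (batch, dim, levels)
        h = self.norm(self.dwconv(x))        # local vertical mixing
        h = h.transpose(1, 2)                # -> (batch, levels, dim)
        h = self.pw2(self.act(self.pw1(h)))  # per-level channel mixing
        return x + h.transpose(1, 2)         # residual connection
```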
**Encoder-Decoder LSTM**: Uses an encoder-decoder MLP to learn a combined latent representation before recurrent processing: a bidirectional LSTM followed by a GRU layer, breaking traditional vertical-locality assumptions.
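A toy version of this design, assuming a flattened input column and placeholder sizes throughout; the point is that the encoder's latent sequence has no fixed vertical alignment:

```python
import torch.nn as nn

class EncDecLSTMSketch(nn.Module):
    """Toy version: an MLP encoder maps the flattened input column to a
    learned latent sequence, which is then processed by a biLSTM + GRU."""

    def __init__(self, n_in=556, seq=60, dim=128, n_out=368):
        super().__init__()
        self.seq, self.dim = seq, dim
        self.encoder = nn.Sequential(nn.Linear(n_in, seq * dim), nn.GELU())
        self.lstm = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(seq * dim, n_out)

    def forward(self, x):                                 # x: (batch, n_in)
        z = self.encoder(x).view(-1, self.seq, self.dim)  # latent sequence
        z, _ = self.lstm(z)
        z, _ = self.gru(z)
        return self.head(z.flatten(1))                    # (batch, n_out)
```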
Each model can be trained with 5 different configurations (the two loss variants are sketched after this list):

- **Standard** (`training_default`): Baseline using Kaggle-available input variables
- **Confidence Loss** (`training_conf_loss`): Adds a confidence head that predicts the loss magnitude (1st-place team innovation)
- **Difference Loss** (`training_diff_loss`): Adds a loss term comparing vertical differences (2nd-place team innovation)
- **Multirepresentation** (`training_multirep`): Uses three parallel encodings of vertical profiles: level-wise normalization, column-wise normalization, and a log-symmetric transformation (1st-place team innovation)
- **Expanded Inputs** (`training_v6`): Adds large-scale forcings, tendencies at previous timesteps (t-1, t-2), and latitude coordinates (following Hu et al. 2025)
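The two loss variants admit the following hedged sketch; function names, weights, and the exact form of each term are one plausible reading, not the teams' implementations:

```python
import torch
import torch.nn.functional as F

def difference_loss(pred, target):
    """Penalize errors in adjacent-level differences of a vertical profile.
    pred, target: (batch, levels)."""
    return F.mse_loss(torch.diff(pred, dim=-1), torch.diff(target, dim=-1))

def confidence_loss(pred, target, log_err_pred):
    """Train an auxiliary head to regress the (log-scaled) magnitude of the
    model's own squared error. log_err_pred: (batch, levels) from that head."""
    err = (pred - target).pow(2).detach()  # stop-gradient on the error target
    return F.mse_loss(log_err_pred, err.log1p())

# Hypothetical combined objective (weights a, b are tuning assumptions):
# loss = F.mse_loss(pred, target) \
#        + a * difference_loss(pred, target) \
#        + b * confidence_loss(pred, target, log_err_pred)
```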
The ClimSim dataset is available on HuggingFace. The paper uses the low-resolution dataset with real geography.
Each model's training directory contains:
- `conf/`: Hydra configuration files for different seeds
- `slurm/`: SLURM job submission scripts
- `train_{model}.py`: Training script
- `{model}.py`: Model architecture definition
- `wrap_model.py`: Wrapper for online inference (includes normalization; a hedged sketch appears after the FTorch notes below)
Note: Code is provided for transparency and reproducibility, not as out-of-the-box software. You will need to adapt paths and configurations for your environment.
Example structure:
```
baseline_models/unet/training_default/
├── conf/
│   ├── config.yaml            # Base configuration
│   ├── config_seed_7.yaml     # Seed 7 variant
│   ├── config_seed_43.yaml    # Seed 43 variant
│   └── config_seed_1024.yaml  # Seed 1024 variant
├── slurm/
│   └── unet.sbatch            # Job submission script
├── train_unet.py              # Training script
├── unet.py                    # Model architecture
└── wrap_model.py              # Inference wrapper
```

Online coupled simulations use FTorch for PyTorch-Fortran integration. See the FTorch-based E3SM-MMF repository for:
- E3SM-MMF setup with FTorch
- Model integration workflow
- Simulation configuration files
NOTE: The FTorch-enabled E3SM-MMF used for climsim-kaggle-edition depends on a YAKL version (commit 4109dc0) that compiles with cudatoolkit 11.7 but fails with cudatoolkit 12.x. As a consequence, newer versions of PyTorch (2.6.0 and up) may be incompatible.
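For orientation, FTorch loads models saved as TorchScript, so a `wrap_model.py`-style wrapper typically folds normalization into the module before export. A hypothetical sketch (class name, statistics, and output path are assumptions, not the repository's code):

```python
import torch
import torch.nn as nn

class WrappedModel(nn.Module):
    """Hypothetical inference wrapper: folds normalization into the module so
    the Fortran side can pass raw physical fields."""

    def __init__(self, model, in_mean, in_std, out_scale):
        super().__init__()
        self.model = model
        # Buffers travel with the saved model (no gradients, moved with .to()).
        self.register_buffer("in_mean", in_mean)
        self.register_buffer("in_std", in_std)
        self.register_buffer("out_scale", out_scale)

    def forward(self, x):
        x = (x - self.in_mean) / self.in_std  # normalize raw inputs
        y = self.model(x)
        return y / self.out_scale             # undo output scaling

# Export for FTorch (torch.jit.trace is a common fallback if scripting fails):
# wrapped = WrappedModel(trained_model, in_mean, in_std, out_scale).eval()
# torch.jit.script(wrapped).save("model_for_ftorch.pt")
```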
The evaluation pipeline consists of multiple phases that separate expensive computations from figure generation.
Quick Start for Paper Figures:
- Run `preprocess_figure_data.ipynb` to compute expensive metrics and save results
- Run `generate_paper_figures.ipynb` to generate all main and supplementary figures
Detailed Workflow:
The full evaluation pipeline has 4 phases (see ARCHITECTURE.md for details):
- **Offline Inference** (`evaluation/offline/offline_inference_test.py`): Runs inference on 90 model combinations (6 models × 5 configs × 3 seeds) and saves predictions as `.npz` files and R² scores as `.pkl` files
- **Offline Diagnostics** (`evaluation/offline/create_offline_*.py`): Generates diagnostic plots from the predictions (bias profiles, zonal means, etc.)
- **Online Preprocessing** (`preprocess_figure_data.ipynb`): Loads multi-year online simulation data, computes expensive statistics (RMSE, precipitation, etc.), and saves processed results as `.pkl` files
- **Figure Generation** (`generate_paper_figures.ipynb`): Loads precomputed data and generates all publication figures
This workflow design enables rapid iteration on figures without rerunning expensive inference or simulation loading steps.
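A minimal sketch of this load-or-compute caching pattern (function and path names are hypothetical):

```python
import pickle
from pathlib import Path

def load_or_compute(compute_fn, cache_path):
    """Return cached results if present; otherwise compute, cache, and return."""
    cache = Path(cache_path)
    if cache.exists():
        with cache.open("rb") as f:
            return pickle.load(f)
    result = compute_fn()  # e.g., the expensive pass over multi-year output
    cache.parent.mkdir(parents=True, exist_ok=True)
    with cache.open("wb") as f:
        pickle.dump(result, f)
    return result

# stats = load_or_compute(compute_rmse_stats, "figure_data/rmse_stats.pkl")
```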
- PyTorch (for training and inference)
- NVIDIA PhysicsNeMo (originally called Modulus; used during training)
- Hydra (configuration management)
- Standard scientific Python stack (numpy, xarray, matplotlib, etc.)
See individual model directories for specific dependencies.
Models, checkpoints, and normalization files have been uploaded to HuggingFace and are available in the ClimSim Kaggle Models collection.
If you use this code or build upon this work, please cite the accompanying paper:
```bibtex
@article{Lin2025-ko,
  title = {Crowdsourcing the Frontier: Advancing Hybrid Physics-ML Climate Simulation via a \$50,000 Kaggle Competition},
  author = {Lin, Jerry and Hu, Zeyuan and Beucler, Tom and Frields, Katherine and Christensen, Hannah and Hannah, Walter and Heuer, Helge and Ukkonen, Peter and Mansfield, Laura A and Zheng, Tian and Peng, Liran and Gupta, Ritwik and Gentine, Pierre and Al-Naher, Yusef and Duan, Mingjiang and Hattori, Kyo and Ji, Weiliang and Li, Chunhan and Matsuda, Kippei and Murakami, Naoki and Ron, Shlomo and Serlin, Marec and Song, Hongjian and Tanabe, Yuma and Yamamoto, Daisuke and Zhou, Jianyao and Pritchard, Mike},
  journal = {arXiv preprint arXiv:2511.20963},
  year = {2025},
  month = {11},
  url = {https://arxiv.org/abs/2511.20963}
}
```
- ClimSim Kaggle Competition
- Hu et al. (2025). Stable Machine-Learning Parameterization of Subgrid Processes with Real Geography and Full-physics Emulation. arXiv:2407.00124
- ClimSim Dataset on HuggingFace
- FTorch-based E3SM-MMF
See LICENSE file for details.