This repository accompanies a forthcoming paper that evaluates neural network architectures from the 2024 LEAP ClimSim Kaggle competition in "online" coupled climate simulations with E3SM-MMF (Energy Exascale Earth System Model - Multi-scale Modeling Framework).
The ClimSim Kaggle competition challenged participants to develop machine learning emulators of cloud and convection processes for climate modeling. This repository tests whether architectures that performed well in offline metrics also produce stable, physically realistic results when coupled to a climate model.
- 6 Model Architectures: Implementations of winning Kaggle competition architectures plus a baseline
- 5 Training Configurations: Architecture-agnostic design variations inspired by competition insights
- Multi-seed Training: Multiple random seeds (7, 43, 1024) for robust evaluation
- Online Testing Framework: Uses FTorch-based E3SM-MMF for coupled simulations
- Comprehensive Evaluation: Offline metrics, online simulation analysis, and figure generation scripts
See ARCHITECTURE.md for detailed structure documentation.
```
├── baseline_models/                  # Model implementations and training scripts
│   ├── convnext/                     # ConvNeXt architecture
│   ├── encdec_lstm/                  # Encoder-Decoder LSTM
│   ├── pao_model/                    # Pao model (3rd place)
│   ├── pure_resLSTM/                 # Pure ResLSTM (2nd place)
│   ├── squeezeformer/                # Squeezeformer (1st place)
│   └── unet/                         # U-Net baseline
│       └── training_*/               # 5 training configurations per model
│
├── evaluation/                       # Evaluation scripts and notebooks
│   ├── offline/                      # Test set metrics
│   └── online/                       # Coupled simulation analysis
│
├── preprocessing/                    # Data preparation scripts
├── online_ensembling/                # Online ensemble simulation scripts
│
├── preprocess_figure_data.ipynb      # Compute metrics (run first)
└── generate_paper_figures.ipynb      # Generate visualizations (run second)
```
**U-Net (baseline)**: Adapted from Hu et al. (2025); an encoder-decoder structure with skip connections that progressively downsamples the vertical dimension while expanding the feature space, with scalar outputs averaged and concatenated to the vertically-resolved variables.
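As a rough illustration (not the repository's implementation; channel counts, the level count, and the omitted scalar-output head are all placeholder assumptions), an encoder-decoder over the vertical dimension with one downsampling stage and a skip connection might look like:

```python
import torch
import torch.nn as nn

class TinyUNet1D(nn.Module):
    """Toy 1-D U-Net over the vertical dimension with one down/up stage.
    (The scalar-output head, which averages over levels, is omitted.)"""

    def __init__(self, in_ch=25, hidden=64, out_ch=14):
        super().__init__()
        self.enc = nn.Conv1d(in_ch, hidden, kernel_size=3, padding=1)
        self.down = nn.Conv1d(hidden, 2 * hidden, kernel_size=3, stride=2, padding=1)
        self.up = nn.ConvTranspose1d(2 * hidden, hidden, kernel_size=2, stride=2)
        self.dec = nn.Conv1d(2 * hidden, out_ch, kernel_size=3, padding=1)

    def forward(self, x):                # x: (batch, in_ch, levels)
        skip = torch.relu(self.enc(x))   # features at full vertical resolution
        z = torch.relu(self.down(skip))  # halve levels, double feature width
        z = self.up(z)                   # restore vertical resolution
        z = torch.cat([z, skip], dim=1)  # skip connection
        return self.dec(z)               # per-level predictions

out = TinyUNet1D()(torch.randn(8, 25, 60))  # -> (8, 14, 60)
```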
**Squeezeformer (1st place)**: Integrates convolutional and transformer components. Originally designed for automatic speech recognition, it combines local context capture via depthwise convolutions with global dependency modeling through multi-head self-attention.
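A heavily simplified sketch of the conv-plus-attention idea (the real Squeezeformer adds feed-forward modules and temporal down/upsampling; all sizes here are assumptions):

```python
import torch.nn as nn

class ConvAttnBlock(nn.Module):
    """Toy conformer-style block: depthwise convolution for local context,
    multi-head self-attention for global dependencies."""

    def __init__(self, dim=128, heads=4, kernel=7):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.dwconv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):          # x: (batch, levels, dim)
        a, _ = self.attn(x, x, x)  # global context across all levels
        x = self.norm1(x + a)
        c = self.dwconv(x.transpose(1, 2)).transpose(1, 2)  # local vertical context
        return self.norm2(x + c)
```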
**Pure ResLSTM (2nd place)**: Multi-layer bidirectional LSTM with residual connections. Processes vertical profiles through 10 blocks of LSTM + layer normalization + GELU activation, embedding a physical prior of vertical locality.
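A minimal sketch of one such residual block (the feature width is an assumption; choosing a hidden size of `dim // 2` per direction makes the bidirectional output match the input):

```python
import torch.nn as nn

class ResLSTMBlock(nn.Module):
    """One residual block: bidirectional LSTM + layer norm + GELU."""

    def __init__(self, dim=128):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.norm = nn.LayerNorm(dim)
        self.act = nn.GELU()

    def forward(self, x):                  # x: (batch, levels, dim)
        h, _ = self.lstm(x)                # scans the column in both directions
        return x + self.act(self.norm(h))  # residual connection

stack = nn.Sequential(*[ResLSTMBlock() for _ in range(10)])  # 10 blocks
```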
**Pao Model (3rd place)**: Processes vertically-resolved and scalar variables separately before combining them. Uses residual blocks with convolutional and transformer components, followed by bidirectional LSTM layers.
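A toy sketch of the two-stream front end only (how the repository actually merges the streams is not shown here; all names, variable counts, and widths are assumptions):

```python
import torch.nn as nn

class TwoStreamEncoder(nn.Module):
    """Toy two-stream front end: vertically-resolved and scalar inputs are
    embedded separately, broadcast-combined per level, then mixed by a biLSTM."""

    def __init__(self, n_profile=9, n_scalar=17, dim=128):
        super().__init__()
        self.profile_proj = nn.Linear(n_profile, dim)  # per-level features
        self.scalar_proj = nn.Linear(n_scalar, dim)    # column-wide features
        self.mix = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, profiles, scalars):
        # profiles: (batch, levels, n_profile); scalars: (batch, n_scalar)
        p = self.profile_proj(profiles)
        s = self.scalar_proj(scalars).unsqueeze(1)  # broadcast over levels
        h, _ = self.mix(p + s)
        return h                                    # (batch, levels, dim)
```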
**ConvNeXt**: A modern convolutional architecture competitive with vision transformers. Employs depthwise convolutions with large kernels, batch normalization, and residual connections across multiple stages.
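A 1-D analogue of a ConvNeXt-style block, using batch normalization as described above (kernel size and widths are assumptions):

```python
import torch.nn as nn

class ConvNeXtBlock1D(nn.Module):
    """1-D ConvNeXt-style block: large-kernel depthwise convolution,
    batch norm, inverted-bottleneck MLP, residual connection."""

    def __init__(self, dim=128, kernel=7):
        super().__init__()
        self.dwconv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.norm = nn.BatchNorm1d(dim)
        self.pw1 = nn.Linear(dim, 4 * dim)   # expand features 4x
        self.act = nn.GELU()
        self.pw2 = nn.Linear(4 * dim, dim)   # project back down

    def forward(self, x):                    # x: (batch, dim, levels)
        h = self.norm(self.dwconv(x))        # local vertical mixing
        h = h.transpose(1, 2)                # -> (batch, levels, dim)
        h = self.pw2(self.act(self.pw1(h)))  # per-level channel mixing
        return x + h.transpose(1, 2)         # residual connection
```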
**Encoder-Decoder LSTM**: Uses an encoder-decoder MLP to learn a combined latent representation before recurrent processing: a bidirectional LSTM followed by a GRU layer, breaking traditional vertical-locality assumptions.
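A toy version of this design, assuming a flattened input column and placeholder sizes throughout; the point is that the encoder's latent sequence has no fixed vertical alignment:

```python
import torch.nn as nn

class EncDecLSTMSketch(nn.Module):
    """Toy version: an MLP encoder maps the flattened input column to a
    learned latent sequence, which is then processed by a biLSTM + GRU."""

    def __init__(self, n_in=556, seq=60, dim=128, n_out=368):
        super().__init__()
        self.seq, self.dim = seq, dim
        self.encoder = nn.Sequential(nn.Linear(n_in, seq * dim), nn.GELU())
        self.lstm = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(seq * dim, n_out)

    def forward(self, x):                                 # x: (batch, n_in)
        z = self.encoder(x).view(-1, self.seq, self.dim)  # latent sequence
        z, _ = self.lstm(z)
        z, _ = self.gru(z)
        return self.head(z.flatten(1))                    # (batch, n_out)
```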
Each model can be trained with 5 different configurations (the two loss variants are sketched after this list):

- **Standard** (`training_default`): Baseline using Kaggle-available input variables
- **Confidence Loss** (`training_conf_loss`): Adds a confidence head that predicts the loss magnitude (1st-place team innovation)
- **Difference Loss** (`training_diff_loss`): Adds a loss term comparing vertical differences (2nd-place team innovation)
- **Multirepresentation** (`training_multirep`): Uses three parallel encodings of vertical profiles: level-wise normalization, column-wise normalization, and a log-symmetric transformation (1st-place team innovation)
- **Expanded Inputs** (`training_v6`): Adds large-scale forcings, tendencies at previous timesteps (t-1, t-2), and latitude coordinates (following Hu et al. 2025)
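The two loss variants admit the following hedged sketch; function names, weights, and the exact form of each term are one plausible reading, not the teams' implementations:

```python
import torch
import torch.nn.functional as F

def difference_loss(pred, target):
    """Penalize errors in adjacent-level differences of a vertical profile.
    pred, target: (batch, levels)."""
    return F.mse_loss(torch.diff(pred, dim=-1), torch.diff(target, dim=-1))

def confidence_loss(pred, target, log_err_pred):
    """Train an auxiliary head to regress the (log-scaled) magnitude of the
    model's own squared error. log_err_pred: (batch, levels) from that head."""
    err = (pred - target).pow(2).detach()  # stop-gradient on the error target
    return F.mse_loss(log_err_pred, err.log1p())

# Hypothetical combined objective (weights a, b are tuning assumptions):
# loss = F.mse_loss(pred, target) \
#        + a * difference_loss(pred, target) \
#        + b * confidence_loss(pred, target, log_err_pred)
```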
The ClimSim dataset is available on HuggingFace. The paper uses the low-resolution dataset with real geography.
Each model's training directory contains:
- `conf/`: Hydra configuration files for different seeds
- `slurm/`: SLURM job submission scripts
- `train_{model}.py`: Training script
- `{model}.py`: Model architecture definition
- `wrap_model.py`: Wrapper for online inference (includes normalization; a hedged sketch appears after the FTorch notes below)
Note: Code is provided for transparency and reproducibility, not as out-of-the-box software. You will need to adapt paths and configurations for your environment.
Example structure:
```
baseline_models/unet/training_default/
├── conf/
│   ├── config.yaml            # Base configuration
│   ├── config_seed_7.yaml     # Seed 7 variant
│   ├── config_seed_43.yaml    # Seed 43 variant
│   └── config_seed_1024.yaml  # Seed 1024 variant
├── slurm/
│   └── unet.sbatch            # Job submission script
├── train_unet.py              # Training script
├── unet.py                    # Model architecture
└── wrap_model.py              # Inference wrapper
```

Online coupled simulations use FTorch for PyTorch-Fortran integration. See the FTorch-based E3SM-MMF repository for:
- E3SM-MMF setup with FTorch
- Model integration workflow
- Simulation configuration files
NOTE: The FTorch-enabled E3SM-MMF used for climsim-kaggle-edition depends on a YAKL version (commit 4109dc0) that compiles with cudatoolkit 11.7 but fails with cudatoolkit 12.x. As a consequence, newer versions of PyTorch (2.6.0 and up) may be incompatible.
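For orientation, FTorch loads models saved as TorchScript, so a `wrap_model.py`-style wrapper typically folds normalization into the module before export. A hypothetical sketch (class name, statistics, and output path are assumptions, not the repository's code):

```python
import torch
import torch.nn as nn

class WrappedModel(nn.Module):
    """Hypothetical inference wrapper: folds normalization into the module so
    the Fortran side can pass raw physical fields."""

    def __init__(self, model, in_mean, in_std, out_scale):
        super().__init__()
        self.model = model
        # Buffers travel with the saved model (no gradients, moved with .to()).
        self.register_buffer("in_mean", in_mean)
        self.register_buffer("in_std", in_std)
        self.register_buffer("out_scale", out_scale)

    def forward(self, x):
        x = (x - self.in_mean) / self.in_std  # normalize raw inputs
        y = self.model(x)
        return y / self.out_scale             # undo output scaling

# Export for FTorch (torch.jit.trace is a common fallback if scripting fails):
# wrapped = WrappedModel(trained_model, in_mean, in_std, out_scale).eval()
# torch.jit.script(wrapped).save("model_for_ftorch.pt")
```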
The evaluation pipeline consists of multiple phases that separate expensive computations from figure generation.
Quick Start for Paper Figures:
- Run `preprocess_figure_data.ipynb` to compute expensive metrics and save results
- Run `generate_paper_figures.ipynb` to generate all main and supplementary figures
Detailed Workflow:
The full evaluation pipeline has 4 phases (see ARCHITECTURE.md for details):
- **Offline Inference** (`evaluation/offline/offline_inference_test.py`): Runs inference on 90 model combinations (6 models × 5 configs × 3 seeds) and saves predictions as `.npz` files and R² scores as `.pkl` files
- **Offline Diagnostics** (`evaluation/offline/create_offline_*.py`): Generates diagnostic plots from the predictions (bias profiles, zonal means, etc.)
- **Online Preprocessing** (`preprocess_figure_data.ipynb`): Loads multi-year online simulation data, computes expensive statistics (RMSE, precipitation, etc.), and saves processed results as `.pkl` files
- **Figure Generation** (`generate_paper_figures.ipynb`): Loads precomputed data and generates all publication figures
This workflow design enables rapid iteration on figures without rerunning expensive inference or simulation loading steps.
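A minimal sketch of this load-or-compute caching pattern (function and path names are hypothetical):

```python
import pickle
from pathlib import Path

def load_or_compute(compute_fn, cache_path):
    """Return cached results if present; otherwise compute, cache, and return."""
    cache = Path(cache_path)
    if cache.exists():
        with cache.open("rb") as f:
            return pickle.load(f)
    result = compute_fn()  # e.g., the expensive pass over multi-year output
    cache.parent.mkdir(parents=True, exist_ok=True)
    with cache.open("wb") as f:
        pickle.dump(result, f)
    return result

# stats = load_or_compute(compute_rmse_stats, "figure_data/rmse_stats.pkl")
```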
- PyTorch (for training and inference)
- NVIDIA PhysicsNeMo (originally called Modulus; used during training)
- Hydra (configuration management)
- Standard scientific Python stack (numpy, xarray, matplotlib, etc.)
See individual model directories for specific dependencies.
Models, checkpoints, and normalization files have been uploaded to HuggingFace and are available in the ClimSim Kaggle Models collection.
If you use this code or build upon this work, please cite the accompanying paper:
```bibtex
@article{Lin2025-ko,
  title = {Crowdsourcing the Frontier: Advancing Hybrid Physics-ML Climate Simulation via a \$50,000 Kaggle Competition},
  author = {Lin, Jerry and Hu, Zeyuan and Beucler, Tom and Frields, Katherine and Christensen, Hannah and Hannah, Walter and Heuer, Helge and Ukkonen, Peter and Mansfield, Laura A and Zheng, Tian and Peng, Liran and Gupta, Ritwik and Gentine, Pierre and Al-Naher, Yusef and Duan, Mingjiang and Hattori, Kyo and Ji, Weiliang and Li, Chunhan and Matsuda, Kippei and Murakami, Naoki and Ron, Shlomo and Serlin, Marec and Song, Hongjian and Tanabe, Yuma and Yamamoto, Daisuke and Zhou, Jianyao and Pritchard, Mike},
  journal = {arXiv preprint arXiv:2511.20963},
  year = {2025},
  month = {11},
  url = {https://arxiv.org/abs/2511.20963}
}
```
- ClimSim Kaggle Competition
- Hu et al. (2025). Stable Machine-Learning Parameterization of Subgrid Processes with Real Geography and Full-physics Emulation. arXiv:2407.00124
- ClimSim Dataset on HuggingFace
- FTorch-based E3SM-MMF
See LICENSE file for details.