leap-stc/climsim-kaggle-edition

ClimSim Kaggle Edition: Online Testing of Competition-Winning Architectures

This repository accompanies a forthcoming paper that evaluates neural network architectures from the 2024 LEAP ClimSim Kaggle competition in "online" coupled climate simulations with E3SM-MMF (Energy Exascale Earth System Model - Multi-scale Modeling Framework).

Overview

The ClimSim Kaggle competition challenged participants to develop machine learning emulators of cloud and convection processes for climate modeling. This repository tests whether architectures that performed well in offline metrics also produce stable, physically realistic results when coupled to a climate model.

Key Features

  • 6 Model Architectures: Implementations of winning Kaggle competition architectures plus a baseline
  • 5 Training Configurations: Architecture-agnostic design variations inspired by competition insights
  • Multi-seed Training: Three random seeds (7, 43, 1024) for robust evaluation
  • Online Testing Framework: Uses FTorch-based E3SM-MMF for coupled simulations
  • Comprehensive Evaluation: Offline metrics, online simulation analysis, and figure generation scripts

Repository Structure

See ARCHITECTURE.md for detailed structure documentation.

├── baseline_models/          # Model implementations and training scripts
│   ├── convnext/             # ConvNeXt architecture
│   ├── encdec_lstm/          # Encoder-Decoder LSTM
│   ├── pao_model/            # Pao model (3rd place)
│   ├── pure_resLSTM/         # Pure ResLSTM (2nd place)
│   ├── squeezeformer/        # Squeezeformer (1st place)
│   └── unet/                 # U-Net baseline
│       └── training_*/       # 5 training configurations per model
│
├── evaluation/               # Evaluation scripts and notebooks
│   ├── offline/              # Test set metrics
│   └── online/               # Coupled simulation analysis
│
├── preprocessing/            # Data preparation scripts
├── online_ensembling/        # Online ensemble simulation scripts
│
├── preprocess_figure_data.ipynb   # Compute metrics (run first)
└── generate_paper_figures.ipynb   # Generate visualizations (run second)

Model Architectures

U-Net

Baseline architecture adapted from Hu et al. (2025), using an encoder-decoder structure with skip connections. Progressively downsamples the vertical dimension while expanding the feature space, with scalar outputs averaged and concatenated to the vertically resolved variables.
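As an illustration of this design, a minimal 1-D U-Net over the vertical dimension might look like the sketch below. Channel counts, kernel sizes, and the number of input/output variables are placeholders, not the repository's actual configuration:

```python
import torch
import torch.nn as nn

class TinyUNet1D(nn.Module):
    """Minimal 1-D U-Net over the vertical dimension (illustrative only)."""
    def __init__(self, in_ch=25, hidden=64, out_ch=14):
        super().__init__()
        self.enc = nn.Conv1d(in_ch, hidden, 3, padding=1)
        self.down = nn.Conv1d(hidden, 2 * hidden, 3, stride=2, padding=1)
        self.up = nn.ConvTranspose1d(2 * hidden, hidden, 4, stride=2, padding=1)
        self.dec = nn.Conv1d(2 * hidden, out_ch, 3, padding=1)  # input doubled by skip concat

    def forward(self, x):             # x: (batch, in_ch, levels)
        e = torch.relu(self.enc(x))   # encoder features at full vertical resolution
        d = torch.relu(self.down(e))  # downsample vertical dim, expand channels
        u = torch.relu(self.up(d))    # upsample back to full resolution
        return self.dec(torch.cat([u, e], dim=1))  # skip connection, then project

x = torch.randn(2, 25, 60)            # 60 vertical levels
y = TinyUNet1D()(x)                   # shape (2, 14, 60)
```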

Squeezeformer (1st Place)

Integrates convolutional and transformer components. Originally designed for automatic speech recognition, it combines local-context capture via depthwise convolutions with global dependency modeling through multi-head self-attention.

Pure ResLSTM (2nd Place)

Multi-layer bidirectional LSTM with residual connections. Processes vertical profiles through 10 blocks of LSTM + layer normalization + GELU activation, embedding a physical prior of vertical locality.
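The block structure described above is concrete enough to sketch; hidden sizes here are illustrative, not the competition model's actual dimensions:

```python
import torch
import torch.nn as nn

class ResLSTMBlock(nn.Module):
    """One residual block: bidirectional LSTM -> LayerNorm -> GELU, plus a skip."""
    def __init__(self, dim=128):
        super().__init__()
        # bidirectional halves the hidden size so the output width equals dim
        self.lstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.act = nn.GELU()

    def forward(self, x):                    # x: (batch, levels, dim)
        h, _ = self.lstm(x)                  # scan the vertical profile both ways
        return x + self.act(self.norm(h))    # residual connection preserves shape

blocks = nn.Sequential(*[ResLSTMBlock() for _ in range(10)])  # 10 stacked blocks
out = blocks(torch.randn(2, 60, 128))        # shape preserved: (2, 60, 128)
```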

Pao Model (3rd Place)

Processes vertically-resolved and scalar variables separately before combining them. Uses residual blocks with convolutional and transformer components, followed by bidirectional LSTM layers.

ConvNeXt (4th Place)

Modern convolutional architecture competitive with vision transformers. Employs depthwise convolutions with large kernels, batch normalization, and residual connections across multiple stages.

Encoder-Decoder LSTM (5th Place)

Uses an encoder-decoder MLP to learn a combined latent representation before recurrent processing. A bidirectional LSTM is followed by a GRU layer, breaking the traditional assumption of vertical locality.
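The data flow just described can be sketched as follows; latent width and variable counts are placeholders, not the 5th-place model's actual sizes:

```python
import torch
import torch.nn as nn

class EncDecLSTMSketch(nn.Module):
    """Shape sketch only: MLP encoder -> BiLSTM -> GRU -> MLP decoder."""
    def __init__(self, in_dim=25, latent=128, out_dim=14):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, latent), nn.GELU())
        self.bilstm = nn.LSTM(latent, latent // 2, bidirectional=True, batch_first=True)
        self.gru = nn.GRU(latent, latent, batch_first=True)
        self.decoder = nn.Linear(latent, out_dim)

    def forward(self, x):         # x: (batch, levels, in_dim)
        z = self.encoder(x)       # combined latent representation per level
        z, _ = self.bilstm(z)     # bidirectional pass over the column
        z, _ = self.gru(z)        # additional recurrent refinement
        return self.decoder(z)    # per-level outputs

y = EncDecLSTMSketch()(torch.randn(2, 60, 25))  # shape (2, 60, 14)
```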

Training Configurations

Each model can be trained with 5 different configurations:

  1. Standard (training_default): Baseline using Kaggle-available input variables
  2. Confidence Loss (training_conf_loss): Adds confidence head to predict loss magnitude (1st place team innovation)
  3. Difference Loss (training_diff_loss): Adds loss term comparing vertical differences (2nd place team innovation)
  4. Multirepresentation (training_multirep): Uses three parallel encodings of vertical profiles - level-wise normalization, column-wise normalization, and log-symmetric transformation (1st place team innovation)
  5. Expanded Inputs (training_v6): Adds large-scale forcings, tendencies at previous timesteps (t-1, t-2), and latitude coordinates (following Hu et al. 2025)
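Some of these configuration ideas can be sketched in a few lines; the formulations below are simplified illustrations, and the exact losses and transforms used in the paper may differ:

```python
import torch

def difference_loss(pred, target):
    """Extra term penalizing errors in vertical differences (2nd-place idea, sketched)."""
    return (pred.diff(dim=-1) - target.diff(dim=-1)).abs().mean()

def confidence_loss(pred, target, conf):
    """Auxiliary head `conf` learns to predict the error magnitude (1st-place idea, sketched)."""
    err = (pred - target).abs()
    # detach so the confidence term does not alter the main prediction gradient
    return err.mean() + (conf - err.detach()).abs().mean()

def log_symmetric(x):
    """Sign-preserving log transform, one of the three multirep encodings (form may differ)."""
    return torch.sign(x) * torch.log1p(x.abs())

pred, target = torch.randn(4, 60), torch.randn(4, 60)
conf = torch.rand(4, 60)
total = (pred - target).abs().mean() + 0.1 * difference_loss(pred, target)
total_conf = confidence_loss(pred, target, conf)
```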

Getting Started

Data

The ClimSim dataset is available on HuggingFace. The paper uses the low-resolution dataset with real geography.

Training

Each model's training directory contains:

  • conf/: Hydra configuration files for different seeds
  • slurm/: SLURM job submission scripts
  • train_{model}.py: Training script
  • {model}.py: Model architecture definition
  • wrap_model.py: Wrapper for online inference (includes normalization)

Note: Code is provided for transparency and reproducibility, not as out-of-the-box software. You will need to adapt paths and configurations for your environment.

Example structure:

baseline_models/unet/training_default/
├── conf/
│   ├── config.yaml              # Base configuration
│   ├── config_seed_7.yaml       # Seed 7 variant
│   ├── config_seed_43.yaml      # Seed 43 variant
│   └── config_seed_1024.yaml    # Seed 1024 variant
├── slurm/
│   └── unet.sbatch              # Job submission script
├── train_unet.py                # Training script
├── unet.py                      # Model architecture
└── wrap_model.py                # Inference wrapper
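In spirit, a wrapper like wrap_model.py folds normalization statistics into the module so the coupled model can call it on raw physical fields; the class, argument names, and statistics below are placeholders rather than the repository's actual code:

```python
import torch
import torch.nn as nn

class WrappedModel(nn.Module):
    """Illustrative inference wrapper with baked-in normalization (hypothetical)."""
    def __init__(self, model, in_mean, in_std, out_scale):
        super().__init__()
        self.model = model
        # buffers move with .to(device) and are included when the module is saved
        self.register_buffer("in_mean", in_mean)
        self.register_buffer("in_std", in_std)
        self.register_buffer("out_scale", out_scale)

    def forward(self, x):
        x = (x - self.in_mean) / self.in_std   # normalize raw inputs
        return self.model(x) * self.out_scale  # rescale outputs to physical units

core = nn.Linear(10, 4)                        # stand-in for a trained model
wrapped = WrappedModel(core, torch.zeros(10), torch.ones(10), torch.ones(4))
y = wrapped(torch.randn(2, 10))                # shape (2, 4), physical units
```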

Online Testing

Online coupled simulations use FTorch for PyTorch-Fortran integration. See the FTorch-based E3SM-MMF repository for:

  • E3SM-MMF setup with FTorch
  • Model integration workflow
  • Simulation configuration files

NOTE: The FTorch-enabled E3SM-MMF used for climsim-kaggle-edition pins a version of YAKL (commit 4109dc0) that compiles with cudatoolkit 11.7 but fails with cudatoolkit 12.x. As a consequence, newer versions of PyTorch (2.6.0 and up) may be incompatible.
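Because FTorch consumes serialized TorchScript modules, exporting a wrapped model for the Fortran side typically looks like the sketch below; the stand-in model and file name are illustrative:

```python
import torch

# `wrapped` stands in for a wrap_model.py-style module (hypothetical here).
wrapped = torch.nn.Sequential(torch.nn.Linear(10, 4)).eval()

example = torch.randn(1, 10)
with torch.no_grad():
    # freeze the graph with an example input so Fortran can run it via FTorch
    traced = torch.jit.trace(wrapped, example)
traced.save("wrapped_model.pt")  # path that the E3SM-MMF configuration points at
```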

Evaluation

The evaluation pipeline consists of multiple phases that separate expensive computations from figure generation.

Quick Start for Paper Figures:

  1. Run preprocess_figure_data.ipynb to compute expensive metrics and save results
  2. Run generate_paper_figures.ipynb to generate all main and supplementary figures

Detailed Workflow:

The full evaluation pipeline has 4 phases (see ARCHITECTURE.md for details):

  1. Offline Inference (evaluation/offline/offline_inference_test.py): Runs inference on 90 model combinations (6 models × 5 configs × 3 seeds) and saves predictions as .npz files and R² scores as .pkl files

  2. Offline Diagnostics (evaluation/offline/create_offline_*.py): Generates diagnostic plots from the predictions (bias profiles, zonal means, etc.)

  3. Online Preprocessing (preprocess_figure_data.ipynb): Loads multi-year online simulation data, computes expensive statistics (RMSE, precipitation, etc.), and saves processed results as .pkl files

  4. Figure Generation (generate_paper_figures.ipynb): Loads precomputed data and generates all publication figures

This workflow design enables rapid iteration on figures without rerunning expensive inference or simulation loading steps.
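For reference, the per-variable R² saved in the offline phase can be computed along the sample axis as in this generic sketch; the repository's exact weighting and masking may differ:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination per output column (generic sketch)."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)               # residual sum of squares
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)  # total variance
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
y = rng.standard_normal((1000, 60))                             # samples x outputs
perfect = r2_score(y, y)                                        # 1.0 for every column
noisy = r2_score(y, y + 0.1 * rng.standard_normal((1000, 60)))  # slightly below 1.0
```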

Requirements

  • PyTorch (for training and inference)
  • NVIDIA PhysicsNeMo (originally called Modulus; used during training)
  • Hydra (configuration management)
  • Standard scientific Python stack (numpy, xarray, matplotlib, etc.)

See individual model directories for specific dependencies.

Saved Models

Models, checkpoints, and normalization files have been uploaded to HuggingFace and are available in the ClimSim Kaggle Models collection.

Citation

If you use this code or build upon this work, please cite the accompanying paper:

@article{Lin2025-ko,
  title     = {Crowdsourcing the Frontier: Advancing Hybrid Physics-ML Climate Simulation via a \$50,000 Kaggle Competition},
  author    = {Lin, Jerry and Hu, Zeyuan and Beucler, Tom and Frields, Katherine and Christensen, Hannah and Hannah, Walter and Heuer, Helge and Ukkonnen, Peter and Mansfield, Laura A and Zheng, Tian and Peng, Liran and Gupta, Ritwik and Gentine, Pierre and Al-Naher, Yusef and Duan, Mingjiang and Hattori, Kyo and Ji, Weiliang and Li, Chunhan and Matsuda, Kippei and Murakami, Naoki and Ron, Shlomo and Serlin, Marec and Song, Hongjian and Tanabe, Yuma and Yamamoto, Daisuke and Zhou, Jianyao and Pritchard, Mike},
  journal   = {arXiv preprint arXiv:2511.20963},
  year      = {2025},
  month     = {11},
  url       = {https://arxiv.org/abs/2511.20963}
}

License

See LICENSE file for details.
