
Merge ACE2.1-ERA5 (AIMIP) training and evaluation baseline configs#1027

Open
brianhenn wants to merge 40 commits into main from workflow/aimip-ace-to-merge

Conversation

Contributor

@brianhenn brianhenn commented Mar 31, 2026

This PR adds the full set of scripts and configurations for "ACE2.1-ERA5" — a modification of the deterministic ACE2-ERA5 model trained and evaluated under the AIMIP protocol. It also merges main to pick up the SecondaryDecoderConfig API used for the pressure-level decoder fine-tuning stage, while retaining all of the job names, output paths, and checkpoint IDs used on the original branch where the workflow was actually run.

Changes:

  • configs/baselines/era5-aimip/ — new directory containing all scripts and configs for the ACE2.1-ERA5 pipeline (previously configs/baselines/era5/aimip/)

    • run-ace-train.sh / ace-train-config.yaml — train 4-seed ensemble on ERA5 1979–2008
    • run-ace-evaluator-seed-selection.sh / run-ace-evaluator-seed-selection-single.sh — evaluate trained and fine-tuned checkpoints to select best seeds
    • run-ace-fine-tune-decoder-pressure-levels.sh / ace-fine-tune-pressure-level-separate-decoder-config.yaml — fine-tune a secondary MLP decoder for 65 pressure-level diagnostic variables, using secondary_decoder (main's SecondaryDecoderConfig)
    • run-ace-inference.sh / ace-aimip-inference-{,p2k-,p4k-}config.yaml — 46-year inference with 5 ICs × 3 SST scenarios; IC label expansion done via inline sed at job time (eliminates 15 near-identical committed config files)
    • README.md — documents the intended workflow
  • Tests added

  • If dependencies changed, "deps only" image rebuilt and "latest_deps_only_image.txt" file updated
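The scripts above are intended to run in sequence; a hedged sketch of that order is below. The script names come from this PR, but all arguments, environment setup, and cluster submission details are omitted here and live in the scripts and README themselves.

```shell
# Hedged sketch of the intended ACE2.1-ERA5 pipeline order; the committed
# scripts carry the real arguments and gantry/cluster setup.
steps="run-ace-train.sh
run-ace-evaluator-seed-selection.sh
run-ace-fine-tune-decoder-pressure-levels.sh
run-ace-inference.sh"

# Print each stage in order (train 4-seed ensemble, select seeds,
# fine-tune the pressure-level decoder, then run 46-year inference).
for s in $steps; do
  echo "step: $s"
done
```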

Remove intermediate fine-tuning explorations superseded by the
separate-decoder + LR warmup approach (the final model). Trim the
fine-tuning and seed-selection launch scripts to reference only the
final approach.

Deleted configs (non-final fine-tuning variants):
- ace-fine-tune-decoder-pressure-level-config.yaml
- ace-fine-tune-decoder-pressure-level-lr-warmup-config.yaml
- ace-fine-tune-decoder-pressure-level-frozen-config.yaml
- ace-fine-tune-decoder-pressure-level-frozen-lr-warmup-config.yaml
- ace-fine-tune-decoder-pressure-level-reweight-config.yaml
- ace-fine-tune-decoder-pressure-level-separate-decoder-config.yaml
- restart-ace-fine-tune-decoder-pressure-levels.sh

Resolves conflicts in fme/core/step/single_module.py and test_step.py
by accepting main's SecondaryDecoderConfig/SecondaryDecoder approach and
dropping the branch's inline MLP + additional_diagnostic_names approach.

Updates AIMIP configs to use the new secondary_decoder config format and
moves loss/parameter_init from stepper to stepper_training per the
TrainConfig restructuring in main.
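A rough sketch of the resulting config shape is below. Only secondary_decoder, loss, and parameter_init are named in this PR; the surrounding keys and nesting are assumptions for orientation, not the actual fme TrainConfig schema.

```yaml
# Illustrative only — real schema is defined by fme's TrainConfig.
stepper:
  secondary_decoder:   # main's SecondaryDecoderConfig, replacing the branch's
    ...                # inline MLP + additional_diagnostic_names approach
stepper_training:
  loss: ...            # moved here from stepper
  parameter_init: ...  # moved here from stepper
```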

Delete 15 pre-generated IC-specific config files and instead do the
_r[N]i label substitution inside the gantry container at job runtime
via sed, keeping only the 3 template configs committed.
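The runtime substitution described above can be sketched as follows. The template file name, the placeholder spelling, and the example run_name value are assumptions for illustration; the real substitution happens inside the gantry container via run-ace-inference.sh.

```shell
# Hedged sketch of the _r[N]i label expansion done at job time.
IC_INDEX=3

# Stand-in template config (the actual templates are the 3 committed
# ace-aimip-inference-*-config.yaml files).
printf 'run_name: aimip_r[N]i_amip\n' > /tmp/ace-aimip-template.yaml

# Expand the literal "_r[N]i" placeholder to the concrete IC label.
sed "s/_r\[N\]i/_r${IC_INDEX}i/g" /tmp/ace-aimip-template.yaml \
  > /tmp/ace-aimip-r${IC_INDEX}i.yaml

cat /tmp/ace-aimip-r${IC_INDEX}i.yaml   # prints: run_name: aimip_r3i_amip
```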
Contributor

@Arcomano1234 Arcomano1234 left a comment


Left a few comments / questions for my own curiosity, but this looks mostly good to go. I've been using some of these scripts, so it will be nice to have them in main. My only real comment is to remove your hard-coded wandb name from a lot of the job submission scripts.

