Draft
33 commits
61b51fe
Adds a test case for training nequip.
harveydevereux Jan 16, 2026
df60cf2
Add file_prefix option to train cli.
harveydevereux Jan 20, 2026
d1039b9
Test fine-tuning for nequip
harveydevereux Jan 20, 2026
fdd30c2
Adds extra models, nequip tested on foundation
harveydevereux Jan 23, 2026
681c9eb
Fix formatting, use curl
harveydevereux Jan 23, 2026
79387bd
win bash
harveydevereux Jan 23, 2026
05456dc
Apply suggestions from code review
harveydevereux Feb 4, 2026
d14c3cb
Typehints and nequip foundation_model
harveydevereux Feb 4, 2026
376d480
Suggestion + typos/imports
harveydevereux Feb 4, 2026
71c563e
Use Python script for extra models
harveydevereux Feb 4, 2026
4824d9a
Use python in ci extra models download
harveydevereux Feb 4, 2026
82ba4bb
Supply path arg, create if not exists
harveydevereux Feb 4, 2026
8fee4d7
Adds SevenNet training
harveydevereux Feb 3, 2026
dc8e2f9
Add finetune test
harveydevereux Feb 3, 2026
720b11e
Add finetuning test
harveydevereux Feb 4, 2026
a743ac0
Add Sevennet foundation download
harveydevereux Feb 4, 2026
68d0f9a
Sevennet in train/cli
harveydevereux Feb 4, 2026
c917ba7
Ruff
harveydevereux Feb 23, 2026
a15c6b9
Apply suggestions from code review
harveydevereux Mar 13, 2026
88c48ed
Remove duplicate line in windows.yml
harveydevereux Mar 13, 2026
57a9a40
Apply suggestions from code review
harveydevereux Mar 13, 2026
4bb17e0
Add file_prefix option to train cli.
harveydevereux Jan 20, 2026
c474690
Test fine-tuning for nequip
harveydevereux Jan 20, 2026
f06a3f1
Adds extra models, nequip tested on foundation
harveydevereux Jan 23, 2026
e087191
Fix formatting, use curl
harveydevereux Jan 23, 2026
bfa893c
Update train docs
harveydevereux Jan 23, 2026
c0a8097
Apply suggestions from code review
harveydevereux Feb 4, 2026
c720654
Suggestion + typos/imports
harveydevereux Feb 4, 2026
bac67e1
Supply path arg, create if not exists
harveydevereux Feb 4, 2026
efc40e3
Add grace to train
harveydevereux Feb 5, 2026
43bd4f0
ruff
harveydevereux Feb 23, 2026
4686607
Remove duplicate
harveydevereux Mar 13, 2026
1726b85
Add training yml file
harveydevereux Mar 16, 2026
33 changes: 32 additions & 1 deletion docs/source/user_guide/command_line.rst
@@ -659,7 +659,7 @@ Training and fine-tuning MLIPs
------------------------------

.. note::
Currently only MACE and Nequip models are supported.
Currently MACE, Nequip, and SevenNet models are supported.

Models can be trained by passing an architecture and an architecture-specific configuration file as options to the ``janus train`` command. The configuration file will be passed to the corresponding MLIP's command line interface. For example, to train a MACE MLIP:

@@ -699,6 +699,37 @@ Configuration of Nequip training is outlined in the `Nequip user guide <https://

The contents of the results directory depend on the options selected in the configuration file, but typically include model checkpoint (``.ckpt``) files and a metrics directory.

.. note::
Different architectures may have different restrictions or features. For example, Nequip requires YAML files to be written as ``.yaml`` rather than ``.yml``. See the sections below for specific architecture guidance.

Foundation models can also be fine-tuned by passing the ``--fine-tune`` option:

.. code-block:: bash

janus train mace --mlip-config /path/to/mace/fine/tuning/config.yml --fine-tune

By default, the output of training or fine-tuning will be in the ``./janus_results`` directory. The structure of this directory varies depending on the architecture being trained and whether fine-tuning is being conducted. However, as with other commands, a log file (``train-log.yml``) and summary file (``train-summary.yml``) will be generated in ``./janus_results`` by default.

Training MACE MLIPs
+++++++++++++++++++

For MACE, training will create ``logs``, ``checkpoints`` and ``results`` sub-directories, as well as saving the trained model and a compiled version of the model.

Instructions for writing a MACE ``config.yml`` file can be found in the `MACE Readme <https://github.com/ACEsuit/mace?tab=readme-ov-file#training>`_ and the `MACE run_train CLI <https://github.com/ACEsuit/mace/blob/main/mace/cli/run_train.py>`_.
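
As a sketch only, a minimal MACE ``config.yml`` might look like the following. Option names follow the MACE ``run_train`` CLI; the values are illustrative, not recommendations:

.. code-block:: yaml

    name: "my_mace_model"    # illustrative run name
    model: "MACE"
    train_file: "train.xyz"  # training structures in extended XYZ format
    valid_fraction: 0.1
    max_num_epochs: 50
    batch_size: 4
    device: cpu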


Training Nequip MLIPs
+++++++++++++++++++++

Configuration of Nequip training is outlined in the `Nequip user guide <https://nequip.readthedocs.io/en/latest/guide/guide.html>`_. In particular, note that the configuration file must have a ``.yaml`` extension.

The contents of the results directory depend on the options selected in the configuration file, but typically include model checkpoint (``.ckpt``) files and a metrics directory.
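
Mirroring the MACE example above, a Nequip training run might be invoked as follows (the config path is a placeholder; note the required ``.yaml`` extension):

.. code-block:: bash

    janus train nequip --mlip-config /path/to/nequip/config.yaml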


Training SevenNet MLIPs
+++++++++++++++++++++++

The `SevenNet documentation <https://sevennet.readthedocs.io/en/latest/>`_ contains information on training SevenNet MLIPs. The SevenNet `tutorial repository <https://github.com/MDIL-SNU/sevennet_tutorial/tree/main>`_ also provides example ``.yaml`` configuration files for training and fine-tuning.
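
For example (a sketch; the config paths are placeholders), SevenNet training and fine-tuning might be invoked as:

.. code-block:: bash

    janus train sevennet --mlip-config /path/to/sevennet/config.yml
    janus train sevennet --mlip-config /path/to/sevennet/fine_tune.yml --fine-tune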

Preprocessing training data
----------------------------
39 changes: 39 additions & 0 deletions janus_core/cli/train.py
@@ -122,6 +122,45 @@ def train(
"""Fine-tuning requested but there is no checkpoint or
package specified in your config."""
)
case "sevennet":
    continue_section = config["train"].get("continue")
    if fine_tune:
        if continue_section is None:
            raise ValueError(
                """Fine-tuning requested but there is no continue
                section in your config."""
            )
        model = continue_section.get("checkpoint")
        if model is None:
            raise ValueError(
                """No model specified as a checkpoint for
                fine-tuning."""
            )
    elif continue_section is not None:
        raise ValueError(
            """Fine-tuning not requested but a continue
            section is in your config. Please use
            --fine-tune."""
        )

case "grace":
    if "potential" not in config:
        raise ValueError("No potential is specified in your config.")

    if fine_tune:
        model = config["potential"].get("finetune_foundation_model")
        if model is None:
            raise ValueError(
                """Fine-tuning was requested but your config
                does not contain a finetune_foundation_model."""
            )
    elif "finetune_foundation_model" in config["potential"]:
        raise ValueError(
            """Fine-tuning not requested but finetune_foundation_model
            is in your config. Please use --fine-tune."""
        )

case _:
    raise ValueError(f"Unsupported architecture ({arch}).")

21 changes: 21 additions & 0 deletions janus_core/training/train.py
@@ -2,6 +2,8 @@

from __future__ import annotations

from argparse import ArgumentParser
from pathlib import Path
from typing import Any

import yaml
@@ -93,6 +95,20 @@ def train(
)
foundation_model = model["checkpoint_path"]

case "sevennet":
from sevenn.main.sevenn import cmd_parser_train, run

parser = ArgumentParser()
cmd_parser_train(parser)
mlip_args = parser.parse_args(
[str(mlip_config), "--working_dir", str(file_prefix), "-s"]
)

case "grace":
from tensorpotential.cli.gracemaker import main as run

mlip_args = [str(mlip_config)]

case _:
raise ValueError(f"{arch} is currently unsupported in train.")

@@ -120,6 +136,11 @@

run(mlip_args)

if arch == "grace" and (Path.cwd() / "seed").exists():
# Gracemaker always works in ./seed.
file_prefix.mkdir(parents=True, exist_ok=True)
(Path.cwd() / "seed").rename(file_prefix.resolve() / "seed")

if logger:
logger.info("Training complete")
if tracker:
35 changes: 35 additions & 0 deletions tests/data/grace_fine_tune.yml
@@ -0,0 +1,35 @@
seed: 42
cutoff: 6

data:
filename: "tests/data/mlip_train.pkl.gz"
reference_energy: 0

potential:
finetune_foundation_model: "GRACE-1L-OAM"

fit:
loss:
energy:
type: huber
weight: 17
delta: 0.01
forces:
type: huber
weight: 32.
delta: 0.01

maxiter: 1 # Max number of optimization epochs
optimizer: Adam
opt_params: { learning_rate: 0.008, use_ema: True, ema_momentum: 0.99, weight_decay: 1.e-20, clipnorm: 1.0}
scheduler: cosine_decay # scheduler for learning-rate reduction during training
scheduler_params: {"minimal_learning_rate": 0.0001}

batch_size: 32 # Important hyperparameter for Adam; irrelevant (but must be set) for L-BFGS-B/BFGS
test_batch_size: 200 # test batch size (optional)

jit_compile: True # for XLA compilation, must be used in almost all cases
train_max_n_buckets: 10 ## max number of buckets in train set
test_max_n_buckets: 3 ## same for test

checkpoint_freq: 10 # frequency for **REGULAR** checkpoints.
Binary file added tests/data/mlip_train.pkl.gz
Collaborator Author:

This file would not be required if we upgraded to tensorpotential 0.5.5, since that adds extxyz support.

It is small though, and it requires various modifications to the xyz files we have as well.

Member @ElliottKasoar Feb 5, 2026:

We definitely should upgrade at some point, but it introduces conflicts with basically everything else via PyTorch/CUDA conflicts (see ICAMS/grace-tensorpotential#23), if I remember correctly.

This may be more tractable once we cut out some of the unsupported MLIPs from our extras.

Binary file not shown.
92 changes: 92 additions & 0 deletions tests/data/sevennet_fine_tune.yml
@@ -0,0 +1,92 @@
model:
chemical_species: auto

cutoff: 2.0
irreps_manual:
- 128x0e
- 128x0e+64x1e+32x2e+32x3e
- 128x0e+64x1e+32x2e+32x3e
- 128x0e+64x1e+32x2e+32x3e
- 128x0e+64x1e+32x2e+32x3e
- 128x0e
channel: 128
lmax: 3
num_convolution_layer: 5
is_parity: false
radial_basis:
radial_basis_name: bessel
bessel_basis_num: 8
cutoff_function:
cutoff_function_name: poly_cut
poly_cut_p_value: 6

act_radial: silu
weight_nn_hidden_neurons:
- 64
- 64
act_scalar:
e: silu
o: tanh
act_gate:
e: silu
o: tanh

train_denominator: false
train_shift_scale: false
use_bias_in_linear: false

readout_as_fcn: false
self_connection_type: linear
interaction_type: nequip

train:
random_seed: 1
is_train_stress: True
epoch: 1



optimizer: 'adam'
optim_param:
lr: 0.005
scheduler: 'exponentiallr'
scheduler_param:
gamma: 0.99

force_loss_weight: 0.1
stress_loss_weight: 1e-06

per_epoch: 1



error_record:
- ['Energy', 'RMSE']
- ['Force', 'RMSE']
- ['Stress', 'RMSE']
- ['TotalLoss', 'None']

continue:
reset_optimizer: True
reset_scheduler: True
reset_epoch: True
checkpoint: 'tests/models/extra/SevenNet_l3i5.pth'

use_statistic_values_of_checkpoint: True

data:
batch_size: 4
data_divide_ratio: 0.1

shift: 'per_atom_energy_mean'
scale: 'force_rms'



data_format: 'ase'
data_format_args:
index: ':'



load_dataset_path: ['tests/data/mlip_train.xyz']
55 changes: 55 additions & 0 deletions tests/data/sevennet_train.yml
@@ -0,0 +1,55 @@
model:
chemical_species: 'Auto'
cutoff: 2.0
channel: 4
lmax: 1
num_convolution_layer: 1

weight_nn_hidden_neurons: [4, 4]
radial_basis:
radial_basis_name: 'bessel'
bessel_basis_num: 8
cutoff_function:
cutoff_function_name: 'poly_cut'
poly_cut_p_value: 6

act_gate: {'e': 'silu', 'o': 'tanh'}
act_scalar: {'e': 'silu', 'o': 'tanh'}

is_parity: False

self_connection_type: 'nequip'

conv_denominator: "avg_num_neigh"
train_denominator: False
train_shift_scale: False

train:
random_seed: 1
is_train_stress: True
epoch: 2
optimizer: 'adam'
optim_param:
lr: 0.005
scheduler: 'exponentiallr'
scheduler_param:
gamma: 0.99
force_loss_weight: 0.1
stress_loss_weight: 1e-06
per_epoch: 1
error_record:
- ['Energy', 'RMSE']
- ['Force', 'RMSE']
- ['Stress', 'RMSE']
- ['TotalLoss', 'None']

data:
batch_size: 4
data_divide_ratio: 0.1

shift: 'per_atom_energy_mean'
scale: 'force_rms'
data_format: 'ase'
data_format_args:
index: ':'
load_dataset_path: ['tests/data/mlip_train.xyz']
6 changes: 6 additions & 0 deletions tests/models/extra_models.py
@@ -14,7 +14,13 @@
args = parser.parse_args()

args.path.mkdir(parents=True, exist_ok=True)

urlretrieve(
"https://zenodo.org/records/16980200/files/NequIP-MP-L-0.1.nequip.zip",
filename=args.path / "NequIP-MP-L-0.1.nequip.zip",
)

urlretrieve(
"https://github.com/MDIL-SNU/SevenNet/raw/dff008ac9c53d368b5bee30a27fa4bdfd73f19b2/sevenn/pretrained_potentials/SevenNet_l3i5/checkpoint_l3i5.pth",
filename=args.path / "SevenNet_l3i5.pth",
)
6 changes: 6 additions & 0 deletions tests/models/extra_models.sh
@@ -0,0 +1,6 @@
#!/usr/bin/env bash
mkdir -p tests/models/extra

(cd tests/models/extra; curl -L --output NequIP-MP-L-0.1.nequip.zip https://zenodo.org/records/16980200/files/NequIP-MP-L-0.1.nequip.zip)