Add grace to train #663
Draft
harveydevereux wants to merge 33 commits into stfc:main from harveydevereux:add_grace_to_train
Commits:
61b51fe Adds a test case for training nequip. (harveydevereux)
df60cf2 Add file_prefix option to train cli. (harveydevereux)
d1039b9 Test fine-tuning for nequip (harveydevereux)
fdd30c2 Adds extra models, nequip tested on foundation (harveydevereux)
681c9eb Fix formatting, use curl (harveydevereux)
79387bd win bash (harveydevereux)
05456dc Apply suggestions from code review (harveydevereux)
d14c3cb Typehints and nequip foundation_model (harveydevereux)
376d480 Suggestion + typos/imports (harveydevereux)
71c563e Use Python script for extra models (harveydevereux)
4824d9a Use python in ci extra models download (harveydevereux)
82ba4bb Supply path arg, create if not exists (harveydevereux)
8fee4d7 Adds SevenNet training (harveydevereux)
dc8e2f9 Add finetune test (harveydevereux)
720b11e Add finetuning test (harveydevereux)
a743ac0 Add Sevennet foundation download (harveydevereux)
68d0f9a Sevennet in train/cli (harveydevereux)
c917ba7 Ruff (harveydevereux)
a15c6b9 Apply suggestions from code review (harveydevereux)
88c48ed Remove duplicate line in windows.yml (harveydevereux)
57a9a40 Apply suggestions from code review (harveydevereux)
4bb17e0 Add file_prefix option to train cli. (harveydevereux)
c474690 Test fine-tuning for nequip (harveydevereux)
f06a3f1 Adds extra models, nequip tested on foundation (harveydevereux)
e087191 Fix formatting, use curl (harveydevereux)
bfa893c Update train docs (harveydevereux)
c0a8097 Apply suggestions from code review (harveydevereux)
c720654 Suggestion + typos/imports (harveydevereux)
bac67e1 Supply path arg, create if not exists (harveydevereux)
efc40e3 Add grace to train (harveydevereux)
43bd4f0 ruff (harveydevereux)
4686607 Remove duplicate (harveydevereux)
1726b85 Add training yml file (harveydevereux)
    @@ -0,0 +1,35 @@
    seed: 42
    cutoff: 6

    data:
      filename: "tests/data/mlip_train.pkl.gz"
      reference_energy: 0

    potential:
      finetune_foundation_model: "GRACE-1L-OAM"

    fit:
      loss:
        energy:
          type: huber
          weight: 17
          delta: 0.01
        forces:
          type: huber
          weight: 32.
          delta: 0.01

      maxiter: 1 # Max number of optimization epochs
      optimizer: Adam
      opt_params: { learning_rate: 0.008, use_ema: True, ema_momentum: 0.99, weight_decay: 1.e-20, clipnorm: 1.0}
      scheduler: cosine_decay # scheduler for learning-rate reduction during training
      scheduler_params: {"minimal_learning_rate": 0.0001}

      batch_size: 32 # Important hyperparameter for Adam and irrelevant (but must be) for L-BFGS-B/BFGS
      test_batch_size: 200 # test batch size (optional)

      jit_compile: True # for XLA compilation, must be used in almost all cases
      train_max_n_buckets: 10 ## max number of buckets in train set
      test_max_n_buckets: 3 ## same for test

      checkpoint_freq: 10 # frequency for **REGULAR** checkpoints.
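
As a rough illustration of what the `fit` settings in this config ask for, the sketch below writes out the textbook Huber loss (`delta: 0.01`) combined with the energy and force weights (17 and 32), and a textbook cosine decay from `learning_rate: 0.008` down to `minimal_learning_rate: 0.0001`. The function names, the mean reduction over residuals, and the step counts are assumptions made for illustration; the normalisation and scheduler that tensorpotential actually uses may differ in detail.

```python
import math


def huber(residual: float, delta: float = 0.01) -> float:
    """Textbook Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def weighted_loss(energy_residuals, force_residuals,
                  w_energy: float = 17.0, w_forces: float = 32.0,
                  delta: float = 0.01) -> float:
    """Weighted sum of mean Huber losses (mean reduction is an assumption)."""
    e = sum(huber(r, delta) for r in energy_residuals) / len(energy_residuals)
    f = sum(huber(r, delta) for r in force_residuals) / len(force_residuals)
    return w_energy * e + w_forces * f


def cosine_decay(step: int, total_steps: int,
                 initial_lr: float = 0.008, minimal_lr: float = 0.0001) -> float:
    """Cosine decay from initial_lr at step 0 to minimal_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return minimal_lr + 0.5 * (initial_lr - minimal_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical residuals, only to show the two regimes of the Huber loss.
    print(huber(0.005), huber(0.1))
    print([round(cosine_decay(s, 100), 6) for s in (0, 50, 100)])
```

With `delta` this small, most early residuals fall in the linear regime, so the loss behaves more like a weighted absolute error than a squared error.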
Binary file not shown.
    @@ -0,0 +1,92 @@
    model:
        chemical_species: auto

        cutoff: 2.0
        irreps_manual:
            - 128x0e
            - 128x0e+64x1e+32x2e+32x3e
            - 128x0e+64x1e+32x2e+32x3e
            - 128x0e+64x1e+32x2e+32x3e
            - 128x0e+64x1e+32x2e+32x3e
            - 128x0e
        channel: 128
        lmax: 3
        num_convolution_layer: 5
        is_parity: false
        radial_basis:
            radial_basis_name: bessel
            bessel_basis_num: 8
        cutoff_function:
            cutoff_function_name: poly_cut
            poly_cut_p_value: 6

        act_radial: silu
        weight_nn_hidden_neurons:
            - 64
            - 64
        act_scalar:
            e: silu
            o: tanh
        act_gate:
            e: silu
            o: tanh

        train_denominator: false
        train_shift_scale: false
        use_bias_in_linear: false

        readout_as_fcn: false
        self_connection_type: linear
        interaction_type: nequip

    train:
        random_seed: 1
        is_train_stress: True
        epoch: 1



        optimizer: 'adam'
        optim_param:
            lr: 0.005
        scheduler: 'exponentiallr'
        scheduler_param:
            gamma: 0.99

        force_loss_weight: 0.1
        stress_loss_weight: 1e-06

        per_epoch: 1



        error_record:
            - ['Energy', 'RMSE']
            - ['Force', 'RMSE']
            - ['Stress', 'RMSE']
            - ['TotalLoss', 'None']

        continue:
            reset_optimizer: True
            reset_scheduler: True
            reset_epoch: True
            checkpoint: 'tests/models/extra/SevenNet_l3i5.pth'

            use_statistic_values_of_checkpoint: True

    data:
        batch_size: 4
        data_divide_ratio: 0.1

        shift: 'per_atom_energy_mean'
        scale: 'force_rms'



        data_format: 'ase'
        data_format_args:
            index: ':'



        load_dataset_path: ['tests/data/mlip_train.xyz']
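
To make the `irreps_manual` and scheduler entries in this fine-tuning config more concrete, here is a small sketch of the arithmetic behind them. It assumes the usual e3nn-style reading of an irreps string (each `NxLp` term contributes `N * (2L + 1)` components) and the usual exponential schedule in which the learning rate is multiplied by `gamma` once per epoch. The helper names `irreps_dim` and `exponential_lr` are hypothetical and are not part of SevenNet.

```python
import re


def irreps_dim(irreps: str) -> int:
    """Total number of components in an e3nn-style irreps string."""
    total = 0
    for term in irreps.split("+"):
        match = re.fullmatch(r"(\d+)x(\d+)([eo])", term.strip())
        if match is None:
            raise ValueError(f"Unrecognised irreps term: {term!r}")
        multiplicity = int(match.group(1))
        degree = int(match.group(2))
        # An irrep of degree l has 2l + 1 components.
        total += multiplicity * (2 * degree + 1)
    return total


def exponential_lr(epoch: int, initial_lr: float = 0.005, gamma: float = 0.99) -> float:
    """Learning rate after `epoch` multiplicative decay steps."""
    return initial_lr * gamma ** epoch


if __name__ == "__main__":
    print(irreps_dim("128x0e"))                    # scalar-only first and last layers
    print(irreps_dim("128x0e+64x1e+32x2e+32x3e"))  # hidden layers up to lmax = 3
    print(exponential_lr(1))
```

Under these assumptions the hidden layers carry 128 + 192 + 160 + 224 = 704 components each, while the first and last entries carry only the 128 scalars.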
    @@ -0,0 +1,55 @@
    model:
        chemical_species: 'Auto'
        cutoff: 2.0
        channel: 4
        lmax: 1
        num_convolution_layer: 1

        weight_nn_hidden_neurons: [4, 4]
        radial_basis:
            radial_basis_name: 'bessel'
            bessel_basis_num: 8
        cutoff_function:
            cutoff_function_name: 'poly_cut'
            poly_cut_p_value: 6

        act_gate: {'e': 'silu', 'o': 'tanh'}
        act_scalar: {'e': 'silu', 'o': 'tanh'}

        is_parity: False

        self_connection_type: 'nequip'

        conv_denominator: "avg_num_neigh"
        train_denominator: False
        train_shift_scale: False

    train:
        random_seed: 1
        is_train_stress: True
        epoch: 2
        optimizer: 'adam'
        optim_param:
            lr: 0.005
        scheduler: 'exponentiallr'
        scheduler_param:
            gamma: 0.99
        force_loss_weight: 0.1
        stress_loss_weight: 1e-06
        per_epoch: 1
        error_record:
            - ['Energy', 'RMSE']
            - ['Force', 'RMSE']
            - ['Stress', 'RMSE']
            - ['TotalLoss', 'None']

    data:
        batch_size: 4
        data_divide_ratio: 0.1

        shift: 'per_atom_energy_mean'
        scale: 'force_rms'
        data_format: 'ase'
        data_format_args:
            index: ':'
        load_dataset_path: ['tests/data/mlip_train.xyz']
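
The `radial_basis` and `cutoff_function` settings in this config (`bessel` with 8 functions, `poly_cut` with p = 6, `cutoff: 2.0`) can be sketched numerically as below. This assumes the commonly used forms: the Bessel radial basis sqrt(2/r_c) * sin(n*pi*r/r_c) / r and the DimeNet-style polynomial envelope. Whether SevenNet's implementation matches these expressions term for term is an assumption here, and the function names are illustrative only.

```python
import math


def poly_cut(r: float, cutoff: float = 2.0, p: int = 6) -> float:
    """Polynomial envelope that falls smoothly from 1 at r = 0 to 0 at r = cutoff."""
    if r >= cutoff:
        return 0.0
    x = r / cutoff
    return (1.0
            - (p + 1) * (p + 2) / 2.0 * x ** p
            + p * (p + 2) * x ** (p + 1)
            - p * (p + 1) / 2.0 * x ** (p + 2))


def bessel_basis(r: float, cutoff: float = 2.0, num: int = 8) -> list:
    """Values of the first `num` Bessel radial basis functions at distance r > 0."""
    prefactor = math.sqrt(2.0 / cutoff)
    return [prefactor * math.sin(n * math.pi * r / cutoff) / r
            for n in range(1, num + 1)]


if __name__ == "__main__":
    r = 1.0  # a hypothetical interatomic distance inside the cutoff
    envelope = poly_cut(r)
    features = [envelope * b for b in bessel_basis(r)]
    print(envelope, len(features))
```

A cutoff of 2.0 is very short for a production model; in a test config like this one it presumably keeps the neighbour lists, and therefore the test runtime, small.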
    @@ -0,0 +1,6 @@
    if [ ! -d tests/models/extra ]
    then
        mkdir tests/models/extra
    fi

    (cd tests/models/extra; curl --output NequIP-MP-L-0.1.nequip.zip https://zenodo.org/records/16980200/files/NequIP-MP-L-0.1.nequip.zip)
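
The commit list mentions moving the extra-model download to a Python script with a path argument that is created if it does not exist. That script is not shown in this view, so the following is only a guess at what a cross-platform equivalent of the shell snippet might look like. The function name `download_extra_model` and the injectable `fetch` parameter are hypothetical, and the sketch deliberately differs from the shell version in two ways: it creates parent directories, and it skips the download when the file is already present.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL taken from the shell script in this diff.
NEQUIP_URL = "https://zenodo.org/records/16980200/files/NequIP-MP-L-0.1.nequip.zip"


def download_extra_model(url: str = NEQUIP_URL,
                         target_dir="tests/models/extra",
                         fetch=urlretrieve) -> Path:
    """Download `url` into `target_dir`, creating the directory if needed.

    `fetch` is injectable so the function can be exercised without network access.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / url.rsplit("/", 1)[-1]
    if not destination.exists():
        fetch(url, destination)
    return destination


if __name__ == "__main__":
    print(download_extra_model())
```

Skipping existing files avoids repeated downloads on cached CI runners, at the cost of not refreshing a stale or truncated file; whether that trade-off suits this repository's CI is left open.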
This file would not be required if we upgraded to `tensorpotential 0.5.5`, since that adds `extxyz` support. It is small though, and upgrading would require various modifications to the xyz files we have as well.
We definitely should upgrade at some point but, if I remember correctly, it clashes with basically everything else through PyTorch/CUDA conflicts (see ICAMS/grace-tensorpotential#23).
This may be more tractable once we cut out some of the unsupported MLIPs from our extras.