A workflow to train or evaluate ANI-related networks.
Install torchani first. If you work with parquet datasets, use the following (currently not working on HiPerGator because of a CUDA driver issue):
conda create -n cudf -c https://roitberg.chem.ufl.edu/projects/conda-packages-uf-gainesville -c rapidsai -c nvidia -c defaults -c conda-forge sandbox_cudf python=3.8
Otherwise, install torchani with:
conda create -n ani -c https://roitberg.chem.ufl.edu/projects/conda-packages-uf-gainesville -c pytorch -c nvidia -c defaults -c conda-forge sandbox python=3.8
Then install ani_engine and its dependencies:
conda activate ani # or cudf
git clone git@github.com:roitberg-group/ani_engine.git
cd ani_engine
pip install -e .
pip install -r test_requirements.txt
After installation, an executable script (ani_engine) will be available on your path.
$ which ani_engine
/home/richard/program/anaconda3/envs/cudf/bin/ani_engine
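The same check can be done from Python, for instance at the top of a helper script. This is a minimal sketch using only the standard library; the helper name `find_executable` is illustrative and not part of ani_engine.

```python
import shutil


def find_executable(name="ani_engine"):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        print("ani_engine not found; is the conda environment activated?")
    else:
        print(f"ani_engine found at {path}")
```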
- train
- convert h5 dataset to parquet format
- eval
  - quickly evaluate a (builtin model / trained model / trained ensemble model) on (a dataset file / a folder of datasets)
  - save evaluation results as csv (example of the ani2x model on the comp6v1 dataset: misc/eval/ani2x-Overall-comp6v1.csv)
    - the csv is sorted by mean_abs_error_kcal_mol in descending order
    - only available for parquet format datasets
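Because the evaluation result is a plain CSV, it can be inspected with standard tools. The sketch below assumes the file has a header row that includes a `mean_abs_error_kcal_mol` column; the `formula` column name and the inline sample are assumptions for illustration (the three rows reuse values from the ANI-MD-Bench example later in this README), and the real file layout may differ.

```python
import csv
import io

# Stand-in for an eval CSV; a real file would be opened with open(path, newline="").
SAMPLE = """formula,atoms,mean_abs_error_kcal_mol
C8H9NO2,20,0.521167
C101H154N28O29,312,24.554672
C20H30O,51,5.240489
"""


def worst_entries(csv_file, n=2, key="mean_abs_error_kcal_mol"):
    """Return the n rows with the largest value in the `key` column."""
    rows = list(csv.DictReader(csv_file))
    # Re-sort explicitly instead of relying on the order stored in the file.
    rows.sort(key=lambda row: float(row[key]), reverse=True)
    return rows[:n]


top = worst_entries(io.StringIO(SAMPLE))
for row in top:
    print(row["formula"], row["mean_abs_error_kcal_mol"])
```

Sorting again on load is cheap and keeps the script correct even if the file was edited or concatenated from several runs.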
Initialize ani_run. The following command creates an ani_run folder:
ani_engine init
Folder structure:
ani_run/
├── ani_run                # scripts, custom models and engines
│   ├── engines
│   │   └── __init__.py
│   ├── __init__.py
│   └── models
│       └── __init__.py
├── configs                # config files
├── datasets
├── logs
└── setup.py
A simple demo is available at ani_run_demo.
It is recommended to keep this folder under source control (for example, git).
Always run ani_engine commands from within the outer (top-level) ani_run folder.
Download a dataset with torchani:
torchani download --help
ani_engine train configs/1x-energy.yaml
Useful options:
- --name: override general.name in the config file
- --use_wandb: flag to enable wandb. Before using it, register a wandb account and sign in with wandb login; see the wandb quickstart documentation. Let Richard know your wandb account email so that you can be added to our roitberg-group organization.
- --mode: choose between {run, test, debug}. This controls which folder (e.g. logs/run or logs/debug) the log and checkpoint files are saved to. The logs/debug folder is the one that can be safely deleted without any issue.
Check more options with ani_engine train --help.
Prepare ensemble configs:
ani_engine prepare_ensembles configs/1x-energy-ensemble.yaml -n 8 --mode=run
Output:
=> custom config_options:
{'general;mode': 'run'}
=> prepareing
=> using following configs for each ensemble
logs/run/20210808_213427-395a1814/0/config.yaml
logs/run/20210808_213427-395a1814/1/config.yaml
logs/run/20210808_213427-395a1814/2/config.yaml
logs/run/20210808_213427-395a1814/3/config.yaml
logs/run/20210808_213427-395a1814/4/config.yaml
logs/run/20210808_213427-395a1814/5/config.yaml
logs/run/20210808_213427-395a1814/6/config.yaml
logs/run/20210808_213427-395a1814/7/config.yaml
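The `custom config_options` line in the output above suggests that command-line overrides are keyed by a `;`-separated path into the config (here `general;mode`). How ani_engine applies them internally is not shown in this README, so the following is only a sketch of that idea; the helper `apply_overrides` and the base config are hypothetical.

```python
import copy


def apply_overrides(config, overrides, sep=";"):
    """Return a copy of `config` with each 'a;b;c' key set as config['a']['b']['c']."""
    result = copy.deepcopy(config)
    for path_key, value in overrides.items():
        *parents, leaf = path_key.split(sep)
        node = result
        for part in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


base = {"general": {"name": "1x-energy", "mode": "debug"}}  # hypothetical config
merged = apply_overrides(base, {"general;mode": "run"})
print(merged["general"]["mode"])
```

Working on a deep copy leaves the base config untouched, which matters when the same base is reused for several ensemble members.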
Train with one of the generated configs:
ani_engine train logs/run/20210808_213427-395a1814/0/config.yaml
Check more options with ani_engine prepare_ensembles --help.
Convert h5 dataset into parquet format.
# convert one h5 file
ani_engine h52pq datasets/ani1x/ANI-1x-wB97X-6-31Gd.h5
# or a folder containing multiple h5 files
ani_engine h52pq datasets/ani1x/
Evaluate a trained or builtin model on a specified dataset.
ani_engine eval config_path data_path
- config_path
  - torchani builtin model: ani1x, ani1ccx, ani2x
  - a config file: logs/debug/20210819_210426-769a2f31/config.yaml
  - a log dir containing config.yaml: logs/debug/20210819_210426-769a2f31/
  - a log dir of ensemble training: logs/debug/20210819_210426-769a2f31/
- data_path
  - a single h5 or pq dataset file
  - a directory of datasets; specify the file extension with --ext=h5 or --ext=pq
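The `data_path` rule above (a single file, or a directory filtered by `--ext`) can be sketched as follows. This illustrates the described behavior and is not ani_engine's actual code; `collect_datasets` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def collect_datasets(data_path, ext="pq"):
    """Return dataset files for a single file or a directory of `*.ext` files."""
    path = Path(data_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Sorted so the evaluation order is reproducible.
        return sorted(path.glob(f"*.{ext}"))
    raise FileNotFoundError(f"no such file or directory: {data_path}")


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with empty placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.pq", "b.pq", "c.h5"):
            (Path(tmp) / name).touch()
        print([p.name for p in collect_datasets(tmp, ext="pq")])
```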
examples:
ani_engine eval ani1x datasets/comp6v1/ANI-MD-Bench.pq
ani_engine eval ani1x datasets/comp6v1/ --ext=pq
ani_engine eval logs/debug/20210819_210426-769a2f31/config.yaml datasets/comp6v1/ANI-MD-Bench.pq
For pq datasets, eval saves the prediction results to a csv file, for example:
$ ani_engine eval ani1x datasets/comp6v1/ANI-MD-Bench.pq
Evaluating the following 1 datasets:
['datasets/comp6v1/ANI-MD-Bench.pq']
=> loading datasets
[1/1]: datasets/comp6v1/ANI-MD-Bench.pq
loading ['datasets/comp6v1/ANI-MD-Bench.pq'] time used: 0.03 s, peak memory used: 28.00 MB, memory used: 28.00MB
abs_error_kcal_mol description:
count    1791.000000
mean        4.518622
std         7.594079
min         0.000057
25%         0.598989
50%         1.416891
75%         3.310224
max        41.041344
Name: abs_error_kcal_mol, dtype: float64
prediction results:
                count  atoms  mean_energy_hartree  mean_pred_hartree  std_abs_error_kcal_mol  min_abs_error_kcal_mol  max_abs_error_kcal_mol  mean_abs_error_kcal_mol       dataset
C101H154N28O29    128    312         -7654.514618       -7654.475488                7.506302                4.599105               41.043289                24.554672  ANI-MD-Bench
C51H68N12O18      128    149         -3994.653553       -3994.623883                3.530049                9.905260               28.050765                18.618089  ANI-MD-Bench
C20H30O           128     51          -855.053566        -855.061703                3.279170                0.201232               15.995083                 5.240489  ANI-MD-Bench
C38H52N6O7        128    103         -2333.813382       -2333.811527                2.054335                0.025642               10.021924                 2.646591  ANI-MD-Bench
C24H33N3O4        128     64         -1399.129431       -1399.126231                1.275210                0.050889                6.324997                 2.266660  ANI-MD-Bench
C22H28N2O         127     53         -1039.608298       -1039.610291                1.080829                0.017344                5.322560                 1.524564  ANI-MD-Bench
C22H31NO          128     55          -986.664078        -986.665765                1.031409                0.006053                5.559594                 1.516545  ANI-MD-Bench
C16H28N2O4        128     50         -1036.672773       -1036.672597                1.063736                0.000057                5.246314                 1.283457  ANI-MD-Bench
C18H27NO3         128     49          -982.282336        -982.283091                0.977692                0.007971                3.976573                 1.246637  ANI-MD-Bench
C8H10N4O2         128     24          -680.186476        -680.185177                0.763155                0.023915                3.445625                 1.053451  ANI-MD-Bench
C17H21NO          128     40          -790.164569        -790.164118                0.765038                0.033761                3.286045                 1.045221  ANI-MD-Bench
C15H25N3O         128     44          -825.889368        -825.890083                0.636617                0.024524                3.005481                 0.871884  ANI-MD-Bench
C13H21NO3         128     38          -788.188306        -788.188702                0.730836                0.013122                4.013644                 0.848035  ANI-MD-Bench
C8H9NO2           128     20          -515.323039        -515.323440                0.380368                0.008054                1.945716                 0.521167  ANI-MD-Bench
csv saved at: /work/dev/ani_run/logs/eval/ani1x/ani1x-20210914_170039-ANI-MD-Bench.csv
test_rmse: 8.89
Generate multiple configs based on a base_config file and a matrix_config file:
- set() is applied to each parameter's values, so no value is repeated
- if general.name exists, it is set to general.name_{i}
- if general.note exists, it is filled with the matrix info for this config
See the example at tests/test_config:
cd tests/test_config
ani_engine genconfs base.yaml matrix.yaml -v
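The rules above amount to a Cartesian product over the de-duplicated values of each matrix parameter. The sketch below illustrates that idea only and is not ani_engine's implementation: the function names, the use of dotted keys such as `optimizer.lr`, the reading of `general.name_{i}` as the base name followed by `_{i}`, and the exact text written to `general.note` are all assumptions.

```python
import copy
import itertools


def set_nested(config, dotted_key, value):
    """Set config['a']['b'] = value for dotted_key 'a.b', creating sections as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def generate_configs(base, matrix):
    """Yield one config per combination of the matrix parameter values."""
    keys = list(matrix)
    # set() drops repeated values; sorted() keeps the order deterministic
    # (this assumes the values of one parameter are mutually comparable).
    value_lists = [sorted(set(matrix[key])) for key in keys]
    for i, combo in enumerate(itertools.product(*value_lists)):
        config = copy.deepcopy(base)
        for key, value in zip(keys, combo):
            set_nested(config, key, value)
        general = config.get("general", {})
        if "name" in general:
            general["name"] = f"{general['name']}_{i}"
        if "note" in general:
            general["note"] = ", ".join(f"{k}={v}" for k, v in zip(keys, combo))
        yield config


# Hypothetical base and matrix; the repeated 1e-3 is removed by set().
base = {"general": {"name": "demo", "note": ""}, "optimizer": {"lr": 1e-3}}
matrix = {"optimizer.lr": [1e-3, 1e-4, 1e-3], "general.seed": [0, 1]}
configs = list(generate_configs(base, matrix))
print(len(configs))
```

With two distinct learning rates and two seeds, this produces four configs, each with its own name suffix and a note recording the combination it came from.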