ani_engine

A workflow to train or evaluate ANI-related networks.

Install

Install torchani first. If you work with parquet datasets, create the cudf environment (this currently does not work on HiPerGator because of a CUDA driver issue):

conda create -n cudf -c https://roitberg.chem.ufl.edu/projects/conda-packages-uf-gainesville -c rapidsai -c nvidia -c defaults -c conda-forge sandbox_cudf python=3.8

Otherwise, install torchani with:

conda create -n ani -c https://roitberg.chem.ufl.edu/projects/conda-packages-uf-gainesville -c pytorch -c nvidia -c defaults -c conda-forge sandbox python=3.8

Then install ani_engine and its dependencies:

conda activate ani  # or cudf
git clone git@github.com:roitberg-group/ani_engine.git
cd ani_engine
pip install -e .
pip install -r test_requirements.txt

After installation, an executable script (ani_engine) is available on your PATH.

$ which ani_engine
/home/richard/program/anaconda3/envs/cudf/bin/ani_engine

Features

  • train
    • reads a config file; see configs for more detail
    • ensemble train
    • wandb support - Example Report 1
    • customize your own engine and model
  • convert h5 dataset to parquet format
  • eval
    • quickly evaluate a builtin model, a trained model, or a trained ensemble model on a single dataset file or a folder of datasets
    • save evaluation results as csv (example of ani2x model for comp6v1 dataset: misc/eval/ani2x-Overall-comp6v1.csv)
      • csv is sorted by mean_abs_error_kcal_mol in descending order
      • only available for parquet format datasets

Examples

0. init

Initialize ani_run. The following command creates an ani_run folder:

ani_engine init

folder structure:

ani_run/
├── ani_run   # scripts, custom models and engines
│   ├── engines
│   │   └── __init__.py
│   ├── __init__.py
│   └── models
│       └── __init__.py
├── configs   # config files
├── datasets
├── logs
└── setup.py

A simple demo is available at ani_run_demo.
We recommend putting this folder under source control (for example, git).
Always run ani_engine commands from within the top-level (outer) ani_run folder.

1. download

Download a dataset with torchani:

torchani download --help

2. train

ani_engine train configs/1x-energy.yaml

Useful options:

  • --name: override general.name in the config file
  • --use_wandb: flag to enable wandb. Before using it, register a wandb account and sign in with wandb login; see the wandb quickstart documentation. Let Richard know your wandb account email so you can be added to our roitberg-group organization.
  • --mode: one of {run, test, debug}. This controls the folder (for example, logs/run or logs/debug) where log and checkpoint files are saved. Treat logs/debug as the folder that can be safely deleted at any time.

Check more options with ani_engine train --help.

3. ensemble train

Prepare the ensemble configs:

ani_engine prepare_ensembles configs/1x-energy-ensemble.yaml -n 8 --mode=run

Output:

=> custom config_options:
{'general;mode': 'run'}

=> prepareing
=> using following configs for each ensemble
logs/run/20210808_213427-395a1814/0/config.yaml
logs/run/20210808_213427-395a1814/1/config.yaml
logs/run/20210808_213427-395a1814/2/config.yaml
logs/run/20210808_213427-395a1814/3/config.yaml
logs/run/20210808_213427-395a1814/4/config.yaml
logs/run/20210808_213427-395a1814/5/config.yaml
logs/run/20210808_213427-395a1814/6/config.yaml
logs/run/20210808_213427-395a1814/7/config.yaml

Then train each ensemble member with its generated config, for example member 0:

ani_engine train logs/run/20210808_213427-395a1814/0/config.yaml

Check more options with ani_engine prepare_ensembles --help.
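
Each ensemble member gets its own config.yaml in a numbered subfolder, so training the whole ensemble means running ani_engine train once per config. Below is a minimal Python sketch that builds those commands. It assumes the members are trained one after another on the same machine; the helper names member_configs, train_commands, and train_all are hypothetical and not part of ani_engine.

```python
import subprocess
from pathlib import Path


def member_configs(ensemble_dir):
    """Return the config.yaml of every numbered member folder, in numeric order."""
    root = Path(ensemble_dir)
    members = [d for d in root.iterdir() if d.is_dir() and d.name.isdigit()]
    members.sort(key=lambda d: int(d.name))
    return [d / "config.yaml" for d in members if (d / "config.yaml").is_file()]


def train_commands(ensemble_dir):
    """Build one `ani_engine train <config>` command per ensemble member."""
    return [["ani_engine", "train", str(cfg)] for cfg in member_configs(ensemble_dir)]


def train_all(ensemble_dir):
    """Run the commands sequentially and stop at the first failure."""
    for cmd in train_commands(ensemble_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Folder name taken from the prepare_ensembles output shown earlier.
    run_dir = Path("logs/run/20210808_213427-395a1814")
    if run_dir.is_dir():
        # Print the commands instead of running them, as a dry run.
        for cmd in train_commands(run_dir):
            print(" ".join(cmd))
```

Call train_all(run_dir) instead of printing to actually launch the runs. On a cluster you would more likely submit each command as a separate job.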

4. h52pq

Convert h5 dataset into parquet format.

# convert one h5 file
ani_engine h52pq datasets/ani1x/ANI-1x-wB97X-6-31Gd.h5
# or a folder containing multiple h5 files
ani_engine h52pq datasets/ani1x/

5. eval

Evaluate a trained or builtin model on a specified dataset.

ani_engine eval config_path data_path
  • config_path
    • torchani builtin model: ani1x, ani1ccx, ani2x
    • a config file: logs/debug/20210819_210426-769a2f31/config.yaml
    • a log dir containing config.yaml: logs/debug/20210819_210426-769a2f31/
    • a log dir of ensemble training: logs/debug/20210819_210426-769a2f31/
  • data_path
    • single h5 or pq dataset file
    • a directory of datasets, specify file extension type by --ext=h5 or --ext=pq

examples:

ani_engine eval ani1x datasets/comp6v1/ANI-MD-Bench.pq
ani_engine eval ani1x datasets/comp6v1/ --ext=pq
ani_engine eval logs/debug/20210819_210426-769a2f31/config.yaml datasets/comp6v1/ANI-MD-Bench.pq

For pq datasets, eval will save the prediction results into a csv file, for example:

$ ani_engine eval ani1x datasets/comp6v1/ANI-MD-Bench.pq
Evaluating the following 1 datasets:
['datasets/comp6v1/ANI-MD-Bench.pq']
=> loading datasets

[1/1]: datasets/comp6v1/ANI-MD-Bench.pq
loading ['datasets/comp6v1/ANI-MD-Bench.pq']  time used: 0.03 s, peak memory used: 28.00 MB, memory used: 28.00MB
abs_error_kcal_mol description:
count    1791.000000
mean        4.518622
std         7.594079
min         0.000057
25%         0.598989
50%         1.416891
75%         3.310224
max        41.041344
Name: abs_error_kcal_mol, dtype: float64

prediction results:
                count  atoms  mean_energy_hartree  mean_pred_hartree  std_abs_error_kcal_mol  min_abs_error_kcal_mol  max_abs_error_kcal_mol  mean_abs_error_kcal_mol       dataset
C101H154N28O29    128    312         -7654.514618       -7654.475488                7.506302                4.599105               41.043289                24.554672  ANI-MD-Bench
C51H68N12O18      128    149         -3994.653553       -3994.623883                3.530049                9.905260               28.050765                18.618089  ANI-MD-Bench
C20H30O           128     51          -855.053566        -855.061703                3.279170                0.201232               15.995083                 5.240489  ANI-MD-Bench
C38H52N6O7        128    103         -2333.813382       -2333.811527                2.054335                0.025642               10.021924                 2.646591  ANI-MD-Bench
C24H33N3O4        128     64         -1399.129431       -1399.126231                1.275210                0.050889                6.324997                 2.266660  ANI-MD-Bench
C22H28N2O         127     53         -1039.608298       -1039.610291                1.080829                0.017344                5.322560                 1.524564  ANI-MD-Bench
C22H31NO          128     55          -986.664078        -986.665765                1.031409                0.006053                5.559594                 1.516545  ANI-MD-Bench
C16H28N2O4        128     50         -1036.672773       -1036.672597                1.063736                0.000057                5.246314                 1.283457  ANI-MD-Bench
C18H27NO3         128     49          -982.282336        -982.283091                0.977692                0.007971                3.976573                 1.246637  ANI-MD-Bench
C8H10N4O2         128     24          -680.186476        -680.185177                0.763155                0.023915                3.445625                 1.053451  ANI-MD-Bench
C17H21NO          128     40          -790.164569        -790.164118                0.765038                0.033761                3.286045                 1.045221  ANI-MD-Bench
C15H25N3O         128     44          -825.889368        -825.890083                0.636617                0.024524                3.005481                 0.871884  ANI-MD-Bench
C13H21NO3         128     38          -788.188306        -788.188702                0.730836                0.013122                4.013644                 0.848035  ANI-MD-Bench
C8H9NO2           128     20          -515.323039        -515.323440                0.380368                0.008054                1.945716                 0.521167  ANI-MD-Bench
csv saved at: /work/dev/ani_run/logs/eval/ani1x/ani1x-20210914_170039-ANI-MD-Bench.csv


test_rmse: 8.89
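
The prediction table mixes units: the energy columns are in hartree, while the error columns are in kcal/mol. The sketch below converts the difference between mean_pred_hartree and mean_energy_hartree into a signed mean error in kcal/mol for three rows copied from the output above. The conversion factor (about 627.5095 kcal/mol per hartree) and the function name are assumptions of this sketch, not taken from ani_engine.

```python
# Approximate conversion factor; an assumption of this sketch, not read from ani_engine.
HARTREE_TO_KCAL_MOL = 627.5095

# (formula, mean_energy_hartree, mean_pred_hartree, mean_abs_error_kcal_mol),
# copied from three rows of the eval output above.
ROWS = [
    ("C101H154N28O29", -7654.514618, -7654.475488, 24.554672),
    ("C20H30O", -855.053566, -855.061703, 5.240489),
    ("C8H9NO2", -515.323039, -515.323440, 0.521167),
]


def signed_mean_error_kcal_mol(mean_energy_hartree, mean_pred_hartree):
    """Signed mean error (prediction minus reference), converted to kcal/mol."""
    return (mean_pred_hartree - mean_energy_hartree) * HARTREE_TO_KCAL_MOL


if __name__ == "__main__":
    for formula, ref, pred, mae in ROWS:
        signed = signed_mean_error_kcal_mol(ref, pred)
        print(f"{formula}: signed mean error {signed:+.2f} kcal/mol, "
              f"mean abs error {mae:.2f} kcal/mol")
```

The magnitude of the signed mean error can never exceed the mean absolute error. When the two are nearly equal, as for the first row, it suggests the model is off in the same direction for most conformers of that molecule; when the signed value is much smaller, errors of both signs partly cancel.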

6. genconfs

Generate multiple configs from a base_config file and a matrix_config file:

  • set() is applied to each parameter's values, so no value is repeated.
  • if general.name exists, it is set to general.name_{i}
  • if general.note exists, it is set to the matrix info for this config

See the example at tests/test_config:

cd tests/test_config
ani_engine genconfs base.yaml matrix.yaml -v
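
The expansion rules listed above can be sketched in plain Python as a Cartesian product over the deduplicated matrix values. This is an illustration under stated assumptions, not ani_engine's implementation: it assumes the matrix maps dotted keys (such as optimizer.lr) to lists of values, and the function names, the example keys, and the note format are all hypothetical.

```python
import copy
import itertools


def set_by_path(config, dotted_key, value):
    """Set config['a']['b'] = value for dotted_key 'a.b', creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def expand_matrix(base, matrix):
    """Return one config per combination of the matrix values."""
    keys = list(matrix)
    # Deduplicate each parameter's values with set(); sort for a stable order.
    choices = [sorted(set(matrix[k]), key=str) for k in keys]
    configs = []
    for i, combo in enumerate(itertools.product(*choices)):
        cfg = copy.deepcopy(base)
        for key, value in zip(keys, combo):
            set_by_path(cfg, key, value)
        general = cfg.get("general", {})
        if "name" in general:
            general["name"] = f"{base['general']['name']}_{i}"
        if "note" in general:
            general["note"] = ", ".join(f"{k}={v}" for k, v in zip(keys, combo))
        configs.append(cfg)
    return configs


if __name__ == "__main__":
    # Hypothetical base and matrix contents, for illustration only.
    base = {"general": {"name": "1x-energy", "note": ""}, "optimizer": {"lr": 0.001}}
    matrix = {"optimizer.lr": [0.001, 0.0001, 0.001], "general.seed": [0, 1]}
    for cfg in expand_matrix(base, matrix):
        print(cfg["general"]["name"], "|", cfg["general"]["note"])
```

In this sketch the duplicated 0.001 collapses to one value, so two learning rates times two seeds yield four configs. Run ani_engine genconfs with -v on the test files to see the tool's actual output format.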
