This repository contains the implementation of our proposed machine learning approach for MOF Free Energy Prediction.
Schematic representation of MOFseq
For more details, please read our paper, Highly Accurate and Fast Prediction of MOF Free Energy Via Machine Learning
You can install MOF-FreeEnergy by following these steps:
git clone https://github.com/vertaix/MOF-FreeEnergy.git
cd MOF-FreeEnergy
conda create -n <environment_name> requirement.txt
conda activate <environment_name>
We added the data (train, validation, and test splits) we used in our experiments in this repository for easy access and reproducibility. No preprocessing is needed.
We first pretrain on strain energy by running:
python src/llmprop_train.py \
--model_name llmprop \
--input_type mofseq-1 \ # other options: mofseq-2, mofname, mofid_v1
--property_name SE_atom \
--dr 0.2 \
--lr 1e-3 \
--max_len 2000 \
--epochs 100 \
--train_bs 64 \
--inference_bs 512Then we finetune on free energy by running:
python src/llmprop_train.py \
--model_name llmprop_finetune \
--property_name FE_atom \
--input_type mofseq-1 \
--dr 0.2 \
--lr 1e-3 \
--max_len 2000 \
--epochs 200 \
--train_bs 64 \
--inference_bs 512 \
--pretraining_ckpt_path "checkpoints/se_atom/mofseq-1/best_checkpoint.pt" \For evaluation run:
python src/llmprop_evaluate.py \
--input_type mofseq-1 \ # other options: mofseq-2
--property_name FE_atom \
--inference_bs 512 \
--max_len 2000 \
--checkpoint_path "checkpoints/fe_atom/mofseq-1/best_checkpoint.pt" \
--config_path "checkpoints/fe_atom/mofseq-1/training_config.json" \Follow the below example to predict the free energy of one or few samples. For more details, please check this inference notebook:
from src.llmprop_inference import predict
mofname = "SR_nkc_v1-4c_Cu_1_Ch_v2-4c_1anC_Ch_v3-3c_B_Ch_2B_fused_Ch"
mofid = "[Cu][Cu].[O-]C(=O)c1ccc2c(c1)ccc(c2)c1cc2-c3ccc4c(c3)ccc(c4)C3(c4ccc5c(c4)ccc(-c4cc(-c6cc7ccc(-c(c1)c2)cc7cc6)cc(c4)c1ccc2c(c1)ccc(c2)C(=O)[O-])c5)c1ccc2c(c1)ccc(-c1cc(-c4ccc5cc(-c6cc(-c7cc8ccc3cc8cc7)cc(c6)c3ccc6c(c3)ccc(c6)C(=O)[O-])ccc5c4)cc(c1)c1ccc3c(c1)ccc(c3)C(=O)[O-])c2 MOFid-v1.TIMEOUT.cat0.NO_REF"
prop_name = "FE_atom" # options: "FE_atom" for free energy, "SE_atom" for strain energy
embeddings, predictions, predicting_time = predict(
input_mofname=mofname,
input_mofid=mofid,
property_name=prop_name
)
print("-"*50)
print(f'predicted {prop_name}:', predictions)
print('inference time (s): ', predicting_time)@article{NiyongaboRubungo2025,
author={Niyongabo Rubungo, Andre and Fajardo-Rojas, Fernando and G{\'o}mez-Gualdr{\'o}n, Diego A. and Dieng, Adji Bousso},
title={Highly Accurate and Fast Prediction of MOF Free Energy via Machine Learning},
journal={Journal of the American Chemical Society},
year={2025},
month={Dec},
day={16},
publisher={American Chemical Society},
issn={0002-7863},
doi={10.1021/jacs.5c13960},
url={https://doi.org/10.1021/jacs.5c13960}
}