FABind: Fast and Accurate Protein-Ligand Binding 🔥

News

🔥Apr 01 2024: Released our new version, FABind+, with enhanced performance and sampling ability. See the FABind+ paper on arXiv. The corresponding code will be released soon.

🔥Mar 02 2024: Fixed a bug in inference on custom complexes caused by an incorrectly loaded parameter and the rdkit version. We also normalized the atom order in the mol file written during post-optimization. See this commit for details.

🔥Jan 01 2024: Uploaded the trained checkpoint to Google Drive.

🔥Nov 09 2023: Moved the trained checkpoint from GitHub to HuggingFace.

🔥Oct 10 2023: The trained FABind model and processed dataset are released!

🔥Oct 11 2023: Initial commits. More code, the pre-trained model, and data are coming soon.

Overview

This repository contains the source code for the NeurIPS 2023 paper "FABind: Fast and Accurate Protein-Ligand Binding". FABind achieves accurate docking performance at high speed compared to recent baselines. If you have questions, don't hesitate to open an issue or contact Qizhi Pei via qizhipei@ruc.edu.cn, Kaiyuan Gao via im_kai@hust.edu.cn, or Lijun Wu via lijuwu@microsoft.com. We are happy to hear from you!

Setup Environment

This is an example of how to set up a working conda environment to run the code. In this example, we use CUDA 11.3, torch==1.12.0, and rdkit==2021.03.4. To make sure the PyG packages are installed correctly, we install them directly from prebuilt wheel (.whl) files.

As the trained model checkpoint is included in the HuggingFace repository with git-lfs, you need to install git-lfs to pull the data correctly.

sudo apt-get install git-lfs # run this if you have not installed git-lfs
git lfs install
git clone https://github.com/QizhiPei/FABind.git --recursive
conda create --name fabind python=3.8
conda activate fabind
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_cluster-1.6.0%2Bpt112cu113-cp38-cp38-linux_x86_64.whl
pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_scatter-2.1.0%2Bpt112cu113-cp38-cp38-linux_x86_64.whl
pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_sparse-0.6.15%2Bpt112cu113-cp38-cp38-linux_x86_64.whl 
pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_spline_conv-1.2.1%2Bpt112cu113-cp38-cp38-linux_x86_64.whl
pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/pyg_lib-0.2.0%2Bpt112cu113-cp38-cp38-linux_x86_64.whl
pip install torch-geometric==2.4.0
pip install torchdrug==0.1.2 torchmetrics==0.10.2 tqdm mlcrate pyarrow accelerate Bio lmdb fair-esm tensorboard
pip install rdkit-pypi==2021.03.4
conda install -c conda-forge openbabel # install openbabel to save .mol2 file and .sdf file at the same time
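
After installation, a quick version check can catch a mismatched wheel early. The sketch below is not part of the repository: it compares installed package versions against the pins listed above using only the standard library. The names `check_pins`, `version_key`, and `EXPECTED` are illustrative assumptions, and packages installed through conda may not always expose metadata in this way.

```python
from importlib import metadata

# Pins copied from the install commands above; adjust if you change them.
EXPECTED = {
    "torch": "1.12.0",
    "torch-geometric": "2.4.0",
    "torchdrug": "0.1.2",
    "torchmetrics": "0.10.2",
    "rdkit-pypi": "2021.03.4",
}


def version_key(version):
    """Drop local tags such as "+cu113" and compare numeric components,
    so "2021.03.4" and "2021.3.4" are treated as equal."""
    base = version.split("+")[0]
    return tuple(int(p) if p.isdigit() else p for p in base.split("."))


def check_pins(expected):
    """Return {package: (status, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = ("missing", None)
            continue
        same = version_key(installed) == version_key(wanted)
        report[name] = ("ok" if same else "mismatch", installed)
    return report


if __name__ == "__main__":
    for pkg, (status, found) in check_pins(EXPECTED).items():
        print(f"{pkg:16s} {status:9s} {found}")
```

A "mismatch" line does not necessarily mean the environment is broken, only that it differs from the configuration described here.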

Data

The PDBbind 2020 dataset can be downloaded from http://www.pdbbind.org.cn. We then follow the same data processing as TankBind.

We also provide the processed dataset on Zenodo. If you want to train FABind from scratch or reproduce the FABind results, you can:

  1. Download the dataset from Zenodo.
  2. Unzip the archive and place it at data_path, such that data_path=pdbbind2020.
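
The unzip step can also be done from Python. This is a minimal sketch that assumes the downloaded archive is a regular zip file; the file name `pdbbind2020.zip` is a placeholder and the helper `extract_dataset` is hypothetical, not something the repository provides. It returns the top-level names in the archive so you can confirm where `data_path` should point.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive, dest_parent="."):
    """Unpack `archive` under `dest_parent` and return its top-level names."""
    dest_parent = Path(dest_parent)
    dest_parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_parent)
        # Collect the first path component of every entry in the archive.
        top_level = sorted({name.split("/")[0] for name in zf.namelist() if name})
    return top_level


if __name__ == "__main__":
    # Placeholder name; use the name of the file you actually downloaded.
    archive = Path("pdbbind2020.zip")
    if archive.exists():
        print(extract_dataset(archive))
    else:
        print(f"{archive} not found; download it first")
```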

Generate the ESM2 embeddings for the proteins

Before training or evaluation, you need to first generate the ESM2 embeddings for the proteins based on the preprocessed data above.

data_path=pdbbind2020

python fabind/tools/generate_esm2_t33.py ${data_path}

The ESM2 embeddings will then be saved at ${data_path}/dataset/processed/esm2_t33_650M_UR50D.lmdb.
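
Before starting a long training or evaluation run, it can be worth confirming that the embeddings exist at the location stated above. The helper below is a small sketch with a hypothetical name (`esm2_lmdb_path`); it only builds that path and does not open or validate the LMDB contents.

```python
from pathlib import Path


def esm2_lmdb_path(data_path):
    """Build the output location stated above for a given data_path."""
    return Path(data_path) / "dataset" / "processed" / "esm2_t33_650M_UR50D.lmdb"


if __name__ == "__main__":
    path = esm2_lmdb_path("pdbbind2020")
    state = "exists" if path.exists() else "is missing; run generate_esm2_t33.py first"
    print(path, state)
```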

Model

The pre-trained model is placed at ckpt/best_model.bin, which is downloaded automatically when you clone this repository with --recursive.

You can also manually download the pre-trained model from Hugging Face or Google Drive.
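
If git-lfs was not installed when cloning, ckpt/best_model.bin may be a small text pointer instead of the real weights. The sketch below checks for that case by looking for the header line that git-lfs pointer files begin with. The helper name `is_lfs_pointer` is an assumption for illustration and is not part of the repository.

```python
from pathlib import Path

# Every git-lfs pointer file starts with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if `path` is a git-lfs pointer stub rather than the real file."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER


if __name__ == "__main__":
    ckpt = Path("ckpt/best_model.bin")
    if not ckpt.exists():
        print("checkpoint not found; clone with --recursive or download it manually")
    elif is_lfs_pointer(ckpt):
        print("checkpoint is a git-lfs pointer; install git-lfs and pull the file")
    else:
        print(f"checkpoint looks like a real file ({ckpt.stat().st_size} bytes)")
```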

Evaluation

data_path=pdbbind2020
ckpt_path=ckpt/best_model.bin

python fabind/test_fabind.py \
    --batch_size 4 \
    --data-path $data_path \
    --resultFolder ./results \
    --exp-name test_exp \
    --ckpt $ckpt_path \
    --local-eval

Inference on Custom Complexes

The scripts below run inference from SMILES strings and their corresponding PDB files.

The following script runs these steps in order:

  • Given the SMILES in index_csv, preprocess the molecules using num_threads worker processes and save each processed molecule to {save_pt_dir}/mol.
  • Given the protein PDB files in pdb_file_dir, preprocess the protein information and save it to {save_pt_dir}/processed_protein.pt.
  • Load the model checkpoint from ckpt_path and save the predicted molecule conformations in output_dir. A CSV file in output_dir maps each SMILES string to its corresponding filename.

index_csv=../inference_examples/example.csv
pdb_file_dir=../inference_examples/pdb_files
num_threads=1
save_pt_dir=../inference_examples/temp_files
save_mols_dir=${save_pt_dir}/mol
ckpt_path=../ckpt/best_model.bin
output_dir=../inference_examples/inference_output

cd fabind

echo "======  preprocess molecules  ======"
python inference_preprocess_mol_confs.py --index_csv ${index_csv} --save_mols_dir ${save_mols_dir} --num_threads ${num_threads}

echo "======  preprocess proteins  ======"
python inference_preprocess_protein.py --pdb_file_dir ${pdb_file_dir} --save_pt_dir ${save_pt_dir}

echo "======  inference begins  ======"
python fabind_inference.py \
    --ckpt ${ckpt_path} \
    --batch_size 4 \
    --seed 128 \
    --test-gumbel-soft \
    --redocking \
    --post-optim \
    --write-mol-to-file \
    --sdf-output-path-post-optim ${output_dir} \
    --index-csv ${index_csv} \
    --preprocess-dir ${save_pt_dir} \
    --sdf-to-mol2
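
The same three steps can be driven from Python, which may be convenient when looping over several input sets. This is a sketch only: `build_commands` and `run_pipeline` are hypothetical helpers, and the script names and flags are copied unchanged from the shell commands above.

```python
import subprocess
import sys


def build_commands(index_csv, pdb_file_dir, save_pt_dir, ckpt_path, output_dir,
                   num_threads=1, batch_size=4, seed=128):
    """Return the three argv lists, in the order the shell script runs them."""
    save_mols_dir = f"{save_pt_dir}/mol"
    py = sys.executable
    return [
        [py, "inference_preprocess_mol_confs.py",
         "--index_csv", index_csv,
         "--save_mols_dir", save_mols_dir,
         "--num_threads", str(num_threads)],
        [py, "inference_preprocess_protein.py",
         "--pdb_file_dir", pdb_file_dir,
         "--save_pt_dir", save_pt_dir],
        [py, "fabind_inference.py",
         "--ckpt", ckpt_path,
         "--batch_size", str(batch_size),
         "--seed", str(seed),
         "--test-gumbel-soft", "--redocking", "--post-optim",
         "--write-mol-to-file",
         "--sdf-output-path-post-optim", output_dir,
         "--index-csv", index_csv,
         "--preprocess-dir", save_pt_dir,
         "--sdf-to-mol2"],
    ]


def run_pipeline(commands, cwd="fabind"):
    """Run each step from `cwd`, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

From the repository root, calling `run_pipeline(build_commands("../inference_examples/example.csv", "../inference_examples/pdb_files", "../inference_examples/temp_files", "../ckpt/best_model.bin", "../inference_examples/inference_output"))` would mirror the shell script, since the relative paths are resolved from the fabind directory.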

Re-training

data_path=pdbbind2020
# write the default accelerate settings
python -c "from accelerate.utils import write_basic_config; write_basic_config(mixed_precision='no')"
# "accelerate launch" will run the experiments in multi-gpu if applicable 
accelerate launch fabind/main_fabind.py \
    --batch_size 3 \
    -d 0 \
    -m 5 \
    --data-path $data_path \
    --label baseline \
    --addNoise 5 \
    --resultFolder ./results \
    --use-compound-com-cls \
    --total-epochs 500 \
    --exp-name train_tmp \
    --coord-loss-weight 1.0 \
    --pair-distance-loss-weight 1.0 \
    --pair-distance-distill-loss-weight 1.0 \
    --pocket-cls-loss-weight 1.0 \
    --pocket-distance-loss-weight 0.05 \
    --lr 5e-05 --lr-scheduler poly_decay \
    --distmap-pred mlp \
    --hidden-size 512 --pocket-pred-hidden-size 128 \
    --n-iter 8 --mean-layers 4 \
    --refine refine_coord \
    --coordinate-scale 5 \
    --geometry-reg-step-size 0.001 \
    --rm-layernorm --add-attn-pair-bias --explicit-pair-embed --add-cross-attn-layer \
    --noise-for-predicted-pocket 0 \
    --clip-grad \
    --random-n-iter \
    --pocket-idx-no-noise \
    --pocket-cls-loss-func bce \
    --use-esm2-feat
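
The five `--*-loss-weight` flags indicate that training combines several loss terms. As a rough illustration only, the sketch below assumes the total is a plain weighted sum of those terms; the actual combination is defined in the training code and may differ. The dictionary keys and the `total_loss` helper are hypothetical names derived from the flag names.

```python
# Weights copied from the command above; keys mirror the flag names.
LOSS_WEIGHTS = {
    "coord": 1.0,
    "pair_distance": 1.0,
    "pair_distance_distill": 1.0,
    "pocket_cls": 1.0,
    "pocket_distance": 0.05,
}


def total_loss(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of per-term losses (assumed form, not the repo's code)."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)
```

Under this assumption, the pocket distance term contributes one twentieth as much per unit of loss as each of the other four terms.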

Citations

@article{pei2023fabind,
  title={FABind: Fast and Accurate Protein-Ligand Binding},
  author={Pei, Qizhi and Gao, Kaiyuan and Wu, Lijun and Zhu, Jinhua and Xia, Yingce and Xie, Shufang and Qin, Tao and He, Kun and Liu, Tie-Yan and Yan, Rui},
  journal={arXiv preprint arXiv:2310.06763},
  year={2023}
}

@inproceedings{pei2023fabind,
  title={{FAB}ind: Fast and Accurate Protein-Ligand Binding},
  author={Qizhi Pei and Kaiyuan Gao and Lijun Wu and Jinhua Zhu and Yingce Xia and Shufang Xie and Tao Qin and Kun He and Tie-Yan Liu and Rui Yan},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=PnWakgg1RL}
}

@misc{gao2024fabind,
      title={FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation}, 
      author={Kaiyuan Gao and Qizhi Pei and Jinhua Zhu and Tao Qin and Kun He and Lijun Wu},
      journal={arXiv preprint arXiv:2403.20261},
      year={2024}
}

Related

Awesome-docking

Acknowledgements

We appreciate EquiBind, TankBind, E3Bind, DiffDock and other related works for their open-sourced contributions.
