📝 Paper | 🤗 Benchmark datasets | 🚩 Checkpoints | ⚙️ Application | 📚 Cite our paper!
The official implementation of the manuscript "XMolCap: Advancing Molecular Captioning through Multimodal Fusion and Explainable Graph Neural Networks"
Large language models (LLMs) have significantly advanced computational biology by enabling the integration of molecular, protein, and natural language data to accelerate drug discovery. However, existing molecular captioning approaches often underutilize diverse molecular modalities and lack interpretability. In this study, we introduce XMolCap, a novel explainable molecular captioning framework that integrates molecular images, SMILES strings, and graph-based structures through a stacked multimodal fusion mechanism. The framework is built upon a BioT5-based encoder-decoder architecture, which serves as the backbone for extracting feature representations from SELFIES. By leveraging specialized models such as SwinOCSR, SciBERT, and GIN-MoMu, XMolCap effectively captures complementary information from each modality. Our model not only achieves state-of-the-art performance on two benchmark datasets (L+M-24 and ChEBI-20), outperforming several strong baselines, but also provides detailed, functional group-aware, and property-specific explanations through graph-based interpretation. XMolCap is publicly available at https://github.com/cbbl-skku-org/XMolCap/ for reproducibility and local deployment. We believe it holds strong potential for clinical and pharmaceutical applications by generating accurate, interpretable molecular descriptions that deepen our understanding of molecular properties and interactions.
- 2025.05.16: Happy to announce that our manuscript was accepted 🎉🎉🎉 (DOI: 10.1109/JBHI.2025.3572910).
- 2025.02.14: XMolCap received a major revision decision.
- 2024.12.03: Manuscript was submitted to IEEE Journal of Biomedical and Health Informatics (IEEE JBHI).
Create an environment using Miniconda or Conda:
```
conda create -n XMolCap python=3.10
conda activate XMolCap
```
After cloning the repo, run the following commands to install the required packages:
```
# install PyTorch; version 2.1.2 or above is recommended. Change the CUDA version to match your GPU devices
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
# install additional packages
pip install -r requirements.txt
# install additional packages for PyTorch Geometric; the CUDA version must match torch's CUDA version
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.2+cu121.html
```
We use these pretrained models for fine-tuning:
- BioT5: HuggingFace
- SwinOCSR: Kaggle
- SciBERT: HuggingFace
- GIN-MoMu: GitHub
BioT5 and SciBERT are downloaded automatically when you start training or evaluation. The SwinOCSR and GIN-MoMu checkpoints, however, must be downloaded from the links above and placed in `weights/`.
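After installation, a quick sanity check can confirm that the key packages are importable before you start training. This is a stdlib-only sketch (not part of the repo; the `check_deps` helper and the package list are illustrative):

```python
import importlib.util

# Packages the install steps above should provide (adjust to your setup).
REQUIRED = ("torch", "torchvision", "torch_geometric", "transformers")

def check_deps(packages=REQUIRED):
    """Return {package: importable?} without importing anything heavy."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

if __name__ == "__main__":
    for name, ok in check_deps().items():
        print(f"{name}: {'OK' if ok else 'MISSING - re-run the install steps above'}")
```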
We use these benchmark datasets:
- LPM-24: HuggingFace
- ChEBI-20: HuggingFace
Because the datasets are downloaded automatically from HuggingFace, please send an access request and log in with the following command:
```
huggingface-cli login --token '<hf_token>'
```
Train on LPM-24:
```
python train.py --epochs 20 --batch_size 8 \
--grad_accum 32 --warmup_ratio 0.05 --lr 3e-5 --num_devices 4 \
--dataset_name lpm-24 --model_config src/configs/config_lpm24_train.yaml \
--cuda
```
Train on ChEBI-20:
```
python train.py --epochs 50 --batch_size 8 \
--grad_accum 32 --warmup_ratio 0.04 --lr 1e-4 --num_devices 4 \
--dataset_name chebi-20 --model_config src/configs/config_chebi20_train.yaml \
--cuda
```

| Checkpoints | Download link |
|---|---|
| LPM-24 (SMILES off, Center blocks) | OneDrive |
| ChEBI-20 (All modals, Center blocks) | OneDrive |
| Checkpoints | Download link |
|---|---|
| LPM-24 (Graph off, Center blocks) | OneDrive |
| LPM-24 (Vision off, Center blocks) | OneDrive |
| LPM-24 (All modals, First blocks) | OneDrive |
| LPM-24 (All modals, Center blocks) | OneDrive |
| LPM-24 (All modals, Last blocks) | OneDrive |
| LPM-24 (All modals, Full blocks) | OneDrive |
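For reference, the training commands above combine a per-device batch size with gradient accumulation and data parallelism, so the effective (global) batch size is the product of the three. A minimal sketch using the values from the `train.py` commands (the `effective_batch_size` helper is illustrative, not part of the repo):

```python
def effective_batch_size(batch_size, grad_accum, num_devices):
    """Global batch size when gradients are accumulated across devices."""
    return batch_size * grad_accum * num_devices

# Settings shared by both train.py commands above.
print(effective_batch_size(batch_size=8, grad_accum=32, num_devices=4))  # 1024
```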
Evaluate on LPM-24:
```
python eval.py --dataset_name lpm-24 \
--model_config src/configs/config_lpm24_train.yaml \
--checkpoint_path path/to/ckpt \
--cuda
```
Evaluate on ChEBI-20:
```
python eval.py --dataset_name chebi-20 \
--model_config src/configs/config_chebi20_train.yaml \
--checkpoint_path path/to/ckpt \
--cuda
```
You can interact with the model through a user interface by running the following command:
```
python app.py
```
The terminal will provide a local URL for testing and a public URL for global sharing.
If you are interested in our paper, please cite:
```
@ARTICLE{11012653,
  author={Tran, Duong Thanh and Nguyen, Nguyen Doan Hieu and Pham, Nhat Truong and Rakkiyappan, Rajan and Karki, Rajendra and Manavalan, Balachandran},
  journal={IEEE Journal of Biomedical and Health Informatics},
  title={XMolCap: Advancing Molecular Captioning through Multimodal Fusion and Explainable Graph Neural Networks},
  year={2025},
  volume={},
  number={},
  pages={1-12},
  keywords={Biological system modeling;Feature extraction;Chemicals;Bioinformatics;Accuracy;Training;Data models;Data mining;Transformers;Encoding;Explainable artificial intelligence;graph neural networks;language and molecules;large language models;molecular captioning;model interpretation;multimodal fusion},
  doi={10.1109/JBHI.2025.3572910}
}
```

