📝 Paper | 🤗 Benchmark datasets | 🚩 Checkpoints | ⚙️ Application | 📚 Cite our paper!
The official implementation of the manuscript "XMolCap: Advancing Molecular Captioning through Multimodal Fusion and Explainable Graph Neural Networks"
Large language models (LLMs) have significantly advanced computational biology by enabling the integration of molecular, protein, and natural language data to accelerate drug discovery. However, existing molecular captioning approaches often underutilize diverse molecular modalities and lack interpretability. In this study, we introduce XMolCap, a novel explainable molecular captioning framework that integrates molecular images, SMILES strings, and graph-based structures through a stacked multimodal fusion mechanism. The framework is built upon a BioT5-based encoder-decoder architecture, which serves as the backbone for extracting feature representations from SELFIES. By leveraging specialized models such as SwinOCSR, SciBERT, and GIN-MoMu, XMolCap effectively captures complementary information from each modality. Our model not only achieves state-of-the-art performance on two benchmark datasets (L+M-24 and ChEBI-20), outperforming several strong baselines, but also provides detailed, functional group-aware, and property-specific explanations through graph-based interpretation. XMolCap is publicly available at https://github.com/cbbl-skku-org/XMolCap/ for reproducibility and local deployment. We believe it holds strong potential for clinical and pharmaceutical applications by generating accurate, interpretable molecular descriptions that deepen our understanding of molecular properties and interactions.
- 2025.05.16: Happy to announce that our manuscript was accepted 🎉🎉🎉 (DOI: 10.1109/JBHI.2025.3572910).
- 2025.02.14: XMolCap received a major revision decision.
- 2024.12.03: Manuscript was submitted to IEEE Journal of Biomedical and Health Informatics (IEEE JBHI).
Create an environment using Miniconda or Conda:
```
conda create -n XMolCap python=3.10
conda activate XMolCap
```
After cloning the repo, run the following commands to install the required packages:
```
# install PyTorch; version 2.1.2 or above is recommended. Change the CUDA version to match your GPU devices
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
# install additional packages
pip install -r requirements.txt
# install additional packages for PyTorch Geometric; the CUDA version must match torch's CUDA version
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.2+cu121.html
```
We use these pretrained models for fine-tuning:
- BioT5: HuggingFace
- SwinOCSR: Kaggle
- SciBERT: HuggingFace
- GIN-MoMu: GitHub
BioT5 and SciBERT are downloaded automatically when you start training or evaluation. The SwinOCSR and GIN-MoMu checkpoints, however, must be downloaded from the links above and placed in `weights/`.
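After installation, a quick sanity check can confirm that the key packages are importable before you start training. This is a stdlib-only sketch (not part of the repo; the `check_deps` helper and the package list are illustrative):

```python
import importlib.util

# Packages the install steps above should provide (adjust to your setup).
REQUIRED = ("torch", "torchvision", "torch_geometric", "transformers")

def check_deps(packages=REQUIRED):
    """Return {package: importable?} without importing anything heavy."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

if __name__ == "__main__":
    for name, ok in check_deps().items():
        print(f"{name}: {'OK' if ok else 'MISSING - re-run the install steps above'}")
```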
We use these benchmark datasets:
- LPM-24: HuggingFace
- ChEBI-20: HuggingFace
Because the datasets are downloaded automatically from HuggingFace, please send an access request and log in with the following command:
```
huggingface-cli login --token '<hf_token>'
```
Train on LPM-24:
```
python train.py --epochs 20 --batch_size 8 \
--grad_accum 32 --warmup_ratio 0.05 --lr 3e-5 --num_devices 4 \
--dataset_name lpm-24 --model_config src/configs/config_lpm24_train.yaml \
--cuda
```
Train on ChEBI-20:
```
python train.py --epochs 50 --batch_size 8 \
--grad_accum 32 --warmup_ratio 0.04 --lr 1e-4 --num_devices 4 \
--dataset_name chebi-20 --model_config src/configs/config_chebi20_train.yaml \
--cuda
```

| Checkpoints | Download link |
|---|---|
| LPM-24 (SMILES off, Center blocks) | OneDrive |
| ChEBI-20 (All modals, Center blocks) | OneDrive |
| Checkpoints | Download link |
|---|---|
| LPM-24 (Graph off, Center blocks) | OneDrive |
| LPM-24 (Vision off, Center blocks) | OneDrive |
| LPM-24 (All modals, First blocks) | OneDrive |
| LPM-24 (All modals, Center blocks) | OneDrive |
| LPM-24 (All modals, Last blocks) | OneDrive |
| LPM-24 (All modals, Full blocks) | OneDrive |
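For reference, the training commands above combine a per-device batch size with gradient accumulation and data parallelism, so the effective (global) batch size is the product of the three. A minimal sketch using the values from the `train.py` commands (the `effective_batch_size` helper is illustrative, not part of the repo):

```python
def effective_batch_size(batch_size, grad_accum, num_devices):
    """Global batch size when gradients are accumulated across devices."""
    return batch_size * grad_accum * num_devices

# Settings shared by both train.py commands above.
print(effective_batch_size(batch_size=8, grad_accum=32, num_devices=4))  # 1024
```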
Evaluate on LPM-24:
```
python eval.py --dataset_name lpm-24 \
--model_config src/configs/config_lpm24_train.yaml \
--checkpoint_path path/to/ckpt \
--cuda
```
Evaluate on ChEBI-20:
```
python eval.py --dataset_name chebi-20 \
--model_config src/configs/config_chebi20_train.yaml \
--checkpoint_path path/to/ckpt \
--cuda
```
You can interact with the model through a user interface by running the following command:
```
python app.py
```
The terminal will provide a local URL for testing and a public URL for global sharing.
If you are interested in our paper, please cite:
```
@ARTICLE{11012653,
  author={Tran, Duong Thanh and Nguyen, Nguyen Doan Hieu and Pham, Nhat Truong and Rakkiyappan, Rajan and Karki, Rajendra and Manavalan, Balachandran},
  journal={IEEE Journal of Biomedical and Health Informatics},
  title={XMolCap: Advancing Molecular Captioning through Multimodal Fusion and Explainable Graph Neural Networks},
  year={2025},
  volume={},
  number={},
  pages={1-12},
  keywords={Biological system modeling;Feature extraction;Chemicals;Bioinformatics;Accuracy;Training;Data models;Data mining;Transformers;Encoding;Explainable artificial intelligence;graph neural networks;language and molecules;large language models;molecular captioning;model interpretation;multimodal fusion},
  doi={10.1109/JBHI.2025.3572910}
}
```

