Implementation of the paper "Structure Observation Driven Image-Text Contrastive Learning for Computed Tomography Report Generation"
- Download the dataset from CT-RATE, then preprocess the data with the scripts in ./data_preprocess (update the data path in the corresponding script first).
- Download the CT-CLIP model from https://github.com/ibrahimethemhamamci/CT-CLIP.
- Download the pretrained text encoder: CXR-BERT-general.
- Download the LLM text decoder: LLaMA-2-7B.
- Download the pre-extracted text embeddings from here.
- Install dependencies:
pip install -r requirements_final.txt
- Install CT-CLIP package following its official instructions.
- Install ctvit:
cd ctvit
pip install -e .
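The CXR-BERT-general and LLaMA-2-7B downloads listed above could be scripted roughly as follows. This is a minimal sketch, not part of the repository: it assumes the `huggingface_hub` package is installed, and the repository IDs, the `./pretrained` folder, and the helper names `target_dirs` and `download_models` are all assumptions chosen for illustration. Check them against the checkpoints the configuration actually expects.

```python
from pathlib import Path

# Hugging Face repository IDs. These are assumptions, not confirmed by this README.
HF_MODELS = {
    "cxr_bert_general": "microsoft/BiomedVLP-CXR-BERT-general",
    "llama2_7b": "meta-llama/Llama-2-7b-hf",  # gated: the license must be accepted first
}


def target_dirs(root):
    """Map each model key to the local folder it would be downloaded into."""
    return {key: Path(root) / key for key in HF_MODELS}


def download_models(root="./pretrained"):
    """Download every model in HF_MODELS below `root` and return the local paths."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = {}
    for key, folder in target_dirs(root).items():
        snapshot_download(repo_id=HF_MODELS[key], local_dir=str(folder))
        paths[key] = folder
    return paths
```

Calling `download_models()` fetches both models. The LLaMA-2 repository is gated, so an authenticated session (for example via `huggingface-cli login`) is needed before the download will succeed. The returned paths are the ones to enter in /CTRG/config.py.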
- Pretraining: update the model and data paths in /CTRG/config.py, then run:
./run_scripts/pretrain_3D.sh
- To reduce memory consumption during finetuning, extract the visual features after pretraining: set the pretrained model path in pretrained_visual_feature_extract.py, then run:
./run_scripts/visual_feature_extract.sh
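The idea behind this step can be illustrated with a generic caching pattern: the frozen visual encoder runs once per scan, its output is written to disk, and finetuning later reads the stored features instead of keeping the encoder and the raw CT volumes in memory. The sketch below is not the repository's extraction script. The function names, the `.pkl` file layout, and the toy encoder are hypothetical, and a real pipeline would more likely store tensors with `torch.save`.

```python
import pickle
import tempfile
from pathlib import Path


def cache_features(samples, encode, cache_dir):
    """Run `encode` once per (sample_id, volume) pair and store each result on disk."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    for sample_id, volume in samples:
        out_path = cache_dir / f"{sample_id}.pkl"
        if out_path.exists():
            continue  # features for this sample were already extracted
        with open(out_path, "wb") as f:
            pickle.dump(encode(volume), f)


def load_features(sample_id, cache_dir):
    """Read the cached features for one sample; no encoder is needed here."""
    with open(Path(cache_dir) / f"{sample_id}.pkl", "rb") as f:
        return pickle.load(f)


# Toy demonstration: the "encoder" is just the mean of a list of numbers.
with tempfile.TemporaryDirectory() as tmp:
    toy_samples = [("scan_a", [1.0, 2.0, 3.0]), ("scan_b", [4.0, 6.0])]
    cache_features(toy_samples, lambda v: sum(v) / len(v), tmp)
    restored = load_features("scan_b", tmp)
```

Because only `load_features` is called during finetuning, the memory cost of the visual encoder is paid once, at extraction time.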
- Finetuning: update /CTRG/config.py with the correct model and data paths (including the checkpoint produced by the pretraining step), then run:
./run_scripts/finetune_rg.sh
- Inference: run the inference script below. A .txt file containing the generated reports is saved in the checkpoint folder, and you can then compute the evaluation metrics you need (e.g., BLEU, ROUGE) on this output:
./run_scripts/finetune_rg_test.sh
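As a starting point for the evaluation, the sketch below computes two simple scores in pure Python. It assumes that the generated .txt file holds one report per line and that a reference file with the same line order exists. Both the file layout and the function names are assumptions, not something the repository guarantees. The scores are averaged sentence-level BLEU-1 and ROUGE-L F1 on whitespace tokens, so they will not match corpus-level numbers from established toolkits, which should be preferred for published results.

```python
import math
from collections import Counter


def bleu1(reference: str, hypothesis: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times the brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    ref_counts, hyp_counts = Counter(ref), Counter(hyp)
    # Clip each hypothesis token count by its count in the reference.
    overlap = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    precision = overlap / len(hyp)
    # The brevity penalty lowers the score of hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * precision


def rouge_l(reference: str, hypothesis: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence (LCS) of tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 0.0
    # Dynamic-programming LCS that keeps only the previous row.
    prev = [0] * (len(hyp) + 1)
    for r in ref:
        curr = [0]
        for j, h in enumerate(hyp, start=1):
            curr.append(prev[j - 1] + 1 if r == h else max(prev[j], curr[j - 1]))
        prev = curr
    lcs = prev[-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(hyp), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_files(reference_path, generated_path):
    """Average both scores over two line-aligned text files."""
    with open(reference_path, encoding="utf-8") as f:
        refs = [line.strip() for line in f]
    with open(generated_path, encoding="utf-8") as f:
        hyps = [line.strip() for line in f]
    if not refs or len(refs) != len(hyps):
        raise ValueError("files must be non-empty and contain the same number of lines")
    n = len(refs)
    return {
        "BLEU-1": sum(bleu1(r, h) for r, h in zip(refs, hyps)) / n,
        "ROUGE-L": sum(rouge_l(r, h) for r, h in zip(refs, hyps)) / n,
    }
```

A call such as `score_files("references.txt", "generated.txt")` returns a dictionary with both averages. The two file names are placeholders for the reference reports and the output of the inference script.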