RadGazeIntent is a deep learning framework that interprets the diagnostic intentions behind radiologists' eye movements during chest X-ray analysis. Unlike existing approaches that mimic radiologist behavior, our method decodes the why behind each fixation point, bridging visual search patterns with diagnostic reasoning.
Paper accepted at ACM MM 2025, a top-tier international conference on multimedia research 🎉 🎉 🎉
# Remove existing environment (if any)
conda env remove --name radgazeintent
# Create new environment
conda create -n radgazeintent python=3.8 -y
conda activate radgazeintent
# Install PyTorch and dependencies
conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=11.3 -c pytorch -y
conda install mkl==2024.0
conda install -c conda-forge cudatoolkit-dev=11.3.1
# Install Detectron2
pip install 'git+https://github.com/facebookresearch/detectron2.git'
# Build custom CUDA operations
cd ./radgazeintent/pixel_decoder/ops
sh make.sh
# Install additional dependencies
pip install uv
uv pip install timm scipy opencv-python wget setuptools==59.5.0 einops protobuf==4.25.0
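Before moving on, you can sanity-check the environment with the minimal sketch below; it only assumes the packages installed above.

# Minimal environment sanity check after installation.
import torch
import detectron2

print("PyTorch:", torch.__version__)            # expected: 1.10.0
print("CUDA available:", torch.cuda.is_available())
print("Detectron2:", detectron2.__version__)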
RadGazeIntent introduces three intention-labeled datasets derived from existing eye-tracking datasets (EGD and REFLACX):

Download: All three datasets are available on 🤗 Hugging Face; see the download sketch after the dataset descriptions below.
- RadSeq: Models radiologists following a structured checklist, focusing on one finding at a time.
- RadExplore: Captures opportunistic visual search in which radiologists consider all findings simultaneously.
- RadHybrid: Combines initial broad scanning with focused examination, representing real-world diagnostic behavior.
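A minimal download sketch using the huggingface_hub client follows; the repository ID below is a placeholder, so replace it with the dataset repo listed on the project's Hugging Face page.

# Minimal sketch: download the intention-labeled datasets from Hugging Face.
# "<org>/<radgazeintent-datasets>" is a placeholder repo ID, not the real one.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="<org>/<radgazeintent-datasets>",  # replace with the actual dataset repo ID
    repo_type="dataset",
)
print("Datasets downloaded to:", local_dir)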
# Train RadGazeIntent on RadSeq dataset
python train.py \
--hparams configs/train_for_real_egd_s2.json \
--dataset-root /path/to/your/dataset \
--gpu-id 0
# Train on RadExplore dataset
python train.py \
--hparams configs/train_for_real_reflacx_s2.json \
--dataset-root /path/to/your/dataset \
--gpu-id 0
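The --hparams flag points to a JSON config; the sketch below is one way to inspect a config before launching training, with the caveat that the exact keys depend on the repository's config schema.

# Minimal sketch: load and print a training config before running train.py.
import json

with open("configs/train_for_real_egd_s2.json") as f:
    hparams = json.load(f)

for key, value in hparams.items():
    print(f"{key}: {value}")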
# Run inference on all test datasets
python infer_all.py \
--hparams configs/infer_egd_s2.json \
--dataset-root /path/to/your/dataset \
--gpu-id 0
# Quick inference with shell script
bash infer.sh

RadGazeIntent enables several downstream applications:
- Intention-aware AI Assistants: Systems that understand what radiologists are looking for
- Medical Education: Training tools that analyze student gaze patterns
- Cognitive Research: Understanding expert visual reasoning processes
If you find RadGazeIntent useful in your research, please consider citing:
@article{pham2025interpreting,
title={Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis},
author={Pham, Trong-Thang and Nguyen, Anh and Deng, Zhigang and Wu, Carol C and Van Nguyen, Hien and Le, Ngan},
journal={arXiv preprint arXiv:2507.12461},
year={2025}
}

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License - see the LICENSE file for details.
This material is based upon work supported by the National Science Foundation (NSF) under Award No. OIA-1946391 and NSF 2223793 EFRI BRAID, and by the National Institutes of Health (NIH) under Award 1R01CA277739-01.
- Datasets: Built upon EGD and REFLACX eye-tracking datasets
- Backbone: Uses Detectron2 for feature extraction
- Inspiration: Motivated by cognitive science research on expert visual reasoning
Primary Contact: Trong Thang Pham (tp030@uark.edu)
For questions, feedback, or collaboration opportunities, feel free to reach out! I would love to hear any thoughts or suggestions you have about this work.
Note: While we don't actively seek contributions to the codebase, we greatly appreciate and welcome feedback, discussions, and suggestions for improvements.

