Optimizing Class-Imbalanced Chest X-Ray Disease Classification with Class-Balanced Learning

This repository contains all code and documentation for my thesis project "Optimizing Class-Imbalanced Chest X-Ray Disease Classification with Class-Balanced Learning" supervised by Prof. Rafael de Andrade Moral (Department of Mathematics & Statistics, Maynooth University, 2025). The project addresses automated multi-label thoracic disease classification in chest X-ray images, focusing on class imbalance and the critical reduction of missed diagnoses.

Table: Comprehensive Performance Comparison (All Experimental Configurations)

Experiment	Configuration	Total FP	Total FN	Mean AUC	Total Images
Baseline	No class/pos weights	17,105	21,879	0.7098	108,492
Effective Weight	β = 0.999, included No Finding	14,573	22,419	0.7032	108,492
Effective Weight	β = 0.999, excluded No Finding	1,044	22,974	0.7874	48,131
Inverse Frequency	Excluded No Finding	36,676	10,791	0.7433	48,131
Inverse Frequency	Included No Finding	40,125	16,001	0.7487	108,492

Example Multi-Label Class Probability Table

Class	Probability
Atelectasis	0.0735
Cardiomegaly	0.0003
Consolidation	0.0377
Edema	0.0086
Effusion	0.8868
Emphysema	0.0001
Infiltration	0.3509
Mass	0.9483
Nodule	0.8672
Pleural Thickening	0.7313
Pneumothorax	0.0032
No Finding	0.2788

Corresponding Grad-CAM Visualizations Example:

These interpretable visualizations highlight model attention for each class, offering transparency for real-world usage.
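
As an illustration of how the example probability table could be turned into predicted labels, the sketch below applies one fixed cut-off to every class. The 0.5 threshold and the helper name `predicted_labels` are assumptions made for this example; the README does not state which decision threshold the thesis uses.

```python
# Probabilities copied from the example table above.
probabilities = {
    "Atelectasis": 0.0735,
    "Cardiomegaly": 0.0003,
    "Consolidation": 0.0377,
    "Edema": 0.0086,
    "Effusion": 0.8868,
    "Emphysema": 0.0001,
    "Infiltration": 0.3509,
    "Mass": 0.9483,
    "Nodule": 0.8672,
    "Pleural Thickening": 0.7313,
    "Pneumothorax": 0.0032,
    "No Finding": 0.2788,
}


def predicted_labels(probs, threshold=0.5):
    """Return classes whose probability meets the threshold, highest first.

    The threshold is a hypothetical default, not a value taken from the thesis.
    """
    selected = [(name, p) for name, p in probs.items() if p >= threshold]
    selected.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in selected]


print(predicted_labels(probabilities))
# ['Mass', 'Effusion', 'Nodule', 'Pleural Thickening']
```

Lowering the threshold (for example to 0.25) would also flag Infiltration and No Finding, which shows how the cut-off trades false negatives against false positives.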

Dataset and Project Overview

Dataset: NIH ChestX-ray14 (released by the National Institutes of Health, USA)
- 108,948 frontal chest X-rays from 32,717 unique patients, labeled across 14 disease categories plus "No Finding"
- Acquired from NIH Clinical Center (1992-2015), includes AP/PA views in PNG format, originally 1024x1024px (resized to 224x224 for model input)
- Substantial class imbalance (over half are "No Finding"); many rare conditions occur in <1% of cases
Task: Multi-label disease prediction (one image can have multiple diseases)
Purpose: Focus on reducing missed diagnoses (false negatives) with clinically practical sensitivity/specificity trade-off
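
Because one image can carry several diseases, each target is naturally a multi-hot vector rather than a single class index. The sketch below is a minimal illustration, assuming labels arrive as a single string with findings separated by `|`; the class order is taken from the example probability table, and the function name `encode_labels` is hypothetical.

```python
# Class order taken from the example probability table in this README.
CLASSES = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Effusion",
    "Emphysema", "Infiltration", "Mass", "Nodule", "Pleural Thickening",
    "Pneumothorax", "No Finding",
]


def encode_labels(label_string, classes=CLASSES, separator="|"):
    """Convert a separator-joined label string into a multi-hot list of 0/1.

    The "|" separator is an assumption about the label file format.
    Labels that are not in `classes` are silently ignored.
    """
    names = {part.strip() for part in label_string.split(separator)}
    return [1 if name in names else 0 for name in classes]


print(encode_labels("Effusion|Mass"))
# [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

Each position in the vector is then treated as an independent binary problem, which is why the model uses a sigmoid per class instead of a softmax over classes.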

Model and Training Details

Backbone: DenseNet-121 (transfer learning from ImageNet)
Key methods:
- Class re-weighting (inverse frequency, effective number)
- Binary cross-entropy loss and variants (PyTorch implementation)
- SmoothGradCAM/Grad-CAM for heatmaps (explainable X-ray decision visualization)
- Sigmoid activation per class (multi-label classification)
Hyperparameters: Adam (lr=1e-4), batch size 32, 15 epochs, input 224x224
Acceleration: CUDA-enabled GPU
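
The two re-weighting schemes listed under key methods can be sketched as follows. This is an illustrative sketch rather than the thesis code: the class counts are hypothetical, the function names are invented for the example, and rescaling the effective-number weights to sum to the number of classes is an assumed convention. Inverse frequency uses total / n for a class with n positive samples; effective-number weighting uses (1 - β) / (1 - β^n).

```python
def inverse_frequency_weights(counts):
    """Weight each class by total / count, so rarer classes get larger weights."""
    total = sum(counts.values())
    return {name: total / n for name, n in counts.items()}


def effective_number_weights(counts, beta=0.999):
    """Weight each class by (1 - beta) / (1 - beta**n), the inverse effective number."""
    raw = {name: (1.0 - beta) / (1.0 - beta ** n) for name, n in counts.items()}
    # Rescale so the weights sum to the number of classes (an assumed convention).
    scale = len(raw) / sum(raw.values())
    return {name: w * scale for name, w in raw.items()}


# Hypothetical positive counts per class, chosen only to illustrate the scale.
toy_counts = {"common": 50000, "moderate": 5000, "rare": 200}

inv = inverse_frequency_weights(toy_counts)
eff = effective_number_weights(toy_counts, beta=0.999)

for name in toy_counts:
    print(f"{name:>8}: inverse={inv[name]:8.2f}  effective={eff[name]:.3f}")
```

With β = 0.999 the effective number saturates near 1 / (1 - β) = 1000 samples, so classes with several thousand positives receive almost identical weights, whereas inverse-frequency weights keep growing in proportion to rarity. In these toy counts the rare-to-common weight ratio is roughly 5.5 for effective number and 250 for inverse frequency.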

Major Findings and Limitations

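Sensitivity vs. Specificity: Inverse frequency weights with "No Finding" included delivered the best balance, reducing false negatives by about 27% versus baseline (21,879 to 16,001) while raising false positives from 17,105 to 40,125, an acceptable trade-off for screening use cases.
Imbalance Mitigation: Effective number weighting (β = 0.999) worked best to minimize false positives, but missed more actual cases. On the full 108,492-image set it lowered false positives from 17,105 to 14,573 while false negatives rose from 21,879 to 22,419; with "No Finding" excluded it produced the fewest false positives of any configuration (1,044).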
General Limits: Results reflect the source institution's demographics/conditions; external validation is necessary for deployment elsewhere. Original images are downscaled, possibly losing subtle features.
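
The trade-off described in the findings can be checked directly from the performance comparison table. The short sketch below computes the relative change in false negatives and false positives against the baseline; it only compares configurations evaluated on the same 108,492 images, since the runs that exclude "No Finding" use a different image count. The dictionary layout and the helper name `relative_change` are choices made for this example.

```python
# FP/FN totals copied from the performance comparison table (108,492 images each).
results = {
    "Baseline": {"fp": 17105, "fn": 21879},
    "Effective Weight (incl. No Finding)": {"fp": 14573, "fn": 22419},
    "Inverse Frequency (incl. No Finding)": {"fp": 40125, "fn": 16001},
}


def relative_change(value, reference):
    """Percentage change of `value` relative to `reference`."""
    return 100.0 * (value - reference) / reference


baseline = results["Baseline"]
changes = {}
for name, counts in results.items():
    if name == "Baseline":
        continue
    changes[name] = {
        "fn": relative_change(counts["fn"], baseline["fn"]),
        "fp": relative_change(counts["fp"], baseline["fp"]),
    }
    print(f"{name}: FN {changes[name]['fn']:+.1f}%, FP {changes[name]['fp']:+.1f}%")
```

For inverse frequency weighting this gives roughly 26.9% fewer false negatives and 134.6% more false positives than the baseline; for effective-number weighting, roughly 2.5% more false negatives and 14.8% fewer false positives.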

Project Structure and Guide

Open a terminal in the chest_xray folder:
- Install dependencies: pip install -r requirements.txt
- (Optional, for reproducing full metrics) Download and resize the data: python download_data.py
- Pretrained model checkpoints (all strategies): saved_models
- Example images for CAM visualization: data/sample_images
- All analysis: notebooks/
  - chesnet.ipynb – EDA, data prep
  - excluding_no_finding.ipynb – Train without 'No Finding'
  - including_no_finding.ipynb – Verify best performing model/metrics

Contact

Author: Shishir Ashoka Chandra Mouli
Supervisor: Prof. Rafael de Andrade Moral
Institution: Maynooth University, Ireland
Year: 2025

License

This repository/code is released for academic/research use.
