Shishir-Ashok/masters_thesis

Optimizing Class-Imbalanced Chest X-Ray Disease Classification with Class-Balanced Learning

This repository contains all code and documentation for my thesis project, "Optimizing Class-Imbalanced Chest X-Ray Disease Classification with Class-Balanced Learning", supervised by Prof. Rafael de Andrade Moral (Department of Mathematics & Statistics, Maynooth University, 2025). The project addresses automated multi-label thoracic disease classification in chest X-ray images, with a focus on handling class imbalance and reducing missed diagnoses (false negatives).


Table: Comprehensive Performance Comparison (All Experimental Configurations)

| Experiment        | Configuration                   | Total FP | Total FN | Mean AUC | Total Images |
|-------------------|---------------------------------|----------|----------|----------|--------------|
| Baseline          | No class/pos weights            | 17,105   | 21,879   | 0.7098   | 108,492      |
| Effective Weight  | β = 0.999, included No Finding  | 14,573   | 22,419   | 0.7032   | 108,492      |
| Effective Weight  | β = 0.999, excluded No Finding  | 1,044    | 22,974   | 0.7874   | 48,131       |
| Inverse Frequency | Excluded No Finding             | 36,676   | 10,791   | 0.7433   | 48,131       |
| Inverse Frequency | Included No Finding             | 40,125   | 16,001   | 0.7487   | 108,492      |

Example Multi-Label Class Probability Table

| Class              | Probability |
|--------------------|-------------|
| Atelectasis        | 0.0735      |
| Cardiomegaly       | 0.0003      |
| Consolidation      | 0.0377      |
| Edema              | 0.0086      |
| Effusion           | 0.8868      |
| Emphysema          | 0.0001      |
| Infiltration       | 0.3509      |
| Mass               | 0.9483      |
| Nodule             | 0.8672      |
| Pleural Thickening | 0.7313      |
| Pneumothorax       | 0.0032      |
| No Finding         | 0.2788      |
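Because each class gets its own sigmoid probability, turning this table into predicted labels means thresholding every class independently. The sketch below uses the probabilities from the table; the 0.5 cut-off and the function name `predicted_labels` are assumptions for illustration, not values taken from the thesis code.

```python
# Probabilities copied from the example table in this README.
probabilities = {
    "Atelectasis": 0.0735,
    "Cardiomegaly": 0.0003,
    "Consolidation": 0.0377,
    "Edema": 0.0086,
    "Effusion": 0.8868,
    "Emphysema": 0.0001,
    "Infiltration": 0.3509,
    "Mass": 0.9483,
    "Nodule": 0.8672,
    "Pleural Thickening": 0.7313,
    "Pneumothorax": 0.0032,
    "No Finding": 0.2788,
}


def predicted_labels(probs, threshold=0.5):
    """Return every class at or above the threshold, most probable first.

    The default threshold of 0.5 is an assumed value for illustration.
    """
    hits = [(name, p) for name, p in probs.items() if p >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hits]


print(predicted_labels(probabilities))
# ['Mass', 'Effusion', 'Nodule', 'Pleural Thickening']
```

Lowering the threshold trades false negatives for false positives: at 0.3, Infiltration (0.3509) would also be reported.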

Example Grad-CAM visualizations for the four classes with the highest predicted probabilities above:

  • Effusion Grad-CAM
  • Mass Grad-CAM
  • Nodule Grad-CAM
  • Pleural Thickening Grad-CAM

These heatmaps highlight the image regions that most influenced each class prediction, which makes the model's decisions easier to inspect in real-world use.
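For readers unfamiliar with how such heatmaps are built, here is a minimal sketch of the standard Grad-CAM combination step on toy numbers. It is not the repository's implementation: the function name `grad_cam` and the tiny 2x2 feature maps are made up for illustration, and a real pipeline would take the activations and gradients from the last convolutional layer of the network.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps into one class heatmap (plain Grad-CAM).

    activations[k][i][j]: value of feature map k at position (i, j)
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j])
    """
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weight = gradient averaged over all spatial positions.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]

    # Weighted sum of the feature maps.
    heatmap = [[0.0] * width for _ in range(height)]
    for weight, feature_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * feature_map[i][j]

    # ReLU keeps only the regions that push the class score up.
    return [[max(0.0, value) for value in row] for row in heatmap]


# Toy inputs: two 2x2 feature maps and their gradients (illustrative only).
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 argues against it
]

print(grad_cam(activations, gradients))
# [[0.5, 0.0], [0.0, 1.0]]
```

In practice the resulting low-resolution map is upsampled to the input size and overlaid on the X-ray.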


Dataset and Project Overview

  • Dataset: NIH ChestX-ray14 (released by the National Institutes of Health, USA)
    • 108,948 frontal chest X-rays from 32,717 unique patients, labeled across 14 disease categories plus "No Finding"
    • Acquired from NIH Clinical Center (1992-2015), includes AP/PA views in PNG format, originally 1024x1024px (resized to 224x224 for model input)
    • Substantial class imbalance (over half are "No Finding"); many rare conditions occur in <1% of cases
  • Task: Multi-label disease prediction (one image can have multiple diseases)
  • Purpose: Focus on reducing missed diagnoses (false negatives) with clinically practical sensitivity/specificity trade-off
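In a multi-label setup, each image's target is a multi-hot vector with one slot per class rather than a single class index. The sketch below shows one way to build such a vector; the class order follows the example probability table in this README, and the pipe-separated label string and the helper name `encode_labels` are assumptions for illustration.

```python
# Class order taken from the example probability table in this README.
CLASSES = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Effusion", "Emphysema", "Infiltration", "Mass", "Nodule",
    "Pleural Thickening", "Pneumothorax", "No Finding",
]


def encode_labels(label_string, classes=CLASSES, separator="|"):
    """Turn a label string such as 'Effusion|Mass' into a multi-hot list.

    The pipe separator is an assumed annotation format.
    """
    present = set(label_string.split(separator))
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in present else 0 for name in classes]


print(encode_labels("Effusion|Mass"))
# [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

Dropping "No Finding" from `classes` gives the 11-output variant that the "excluded No Finding" experiments would need.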

Model and Training Details

  • Backbone: DenseNet-121 (transfer learning from ImageNet)
  • Key methods:
    • Class re-weighting (inverse frequency, effective number)
    • Binary cross-entropy loss and variants (PyTorch implementation)
    • SmoothGradCAM/Grad-CAM for heatmaps (explainable X-ray decision visualization)
    • Sigmoid activation on each output (independent per-class probabilities for multi-label classification)
  • Hyperparameters: Adam (lr=1e-4), batch size 32, 15 epochs, input 224x224
  • Acceleration: CUDA-enabled GPU
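The two re-weighting schemes named above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the thesis code: the class counts are hypothetical, the normalization (weights averaging to 1) is one common convention, and `weighted_bce` only mirrors the form of a positive-class weight in binary cross-entropy, similar in spirit to the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`.

```python
import math


def inverse_frequency_weights(counts):
    """Weight each class by total / count, so rarer classes weigh more."""
    total = sum(counts.values())
    return {name: total / n for name, n in counts.items()}


def effective_number_weights(counts, beta=0.999):
    """Weight each class by (1 - beta) / (1 - beta**n).

    This is the inverse of the 'effective number' of samples,
    (1 - beta**n) / (1 - beta), with beta = 0.999 as in the results table.
    """
    return {name: (1.0 - beta) / (1.0 - beta ** n) for name, n in counts.items()}


def normalize(weights):
    """Rescale weights so they average to 1 (an assumed convention)."""
    scale = len(weights) / sum(weights.values())
    return {name: w * scale for name, w in weights.items()}


def weighted_bce(prob, target, pos_weight=1.0):
    """Binary cross-entropy for one class, with positives scaled by pos_weight."""
    return -(pos_weight * target * math.log(prob)
             + (1 - target) * math.log(1.0 - prob))


# Hypothetical positive counts per class, for illustration only.
counts = {"No Finding": 50000, "Infiltration": 10000, "Edema": 500}

inv = normalize(inverse_frequency_weights(counts))
eff = normalize(effective_number_weights(counts, beta=0.999))
for name in counts:
    print(f"{name:<14} inverse={inv[name]:.3f} effective={eff[name]:.3f}")

# A missed positive (p = 0.1 for a true case) costs more as pos_weight grows.
print(weighted_bce(0.1, 1), weighted_bce(0.1, 1, pos_weight=inv["Edema"]))
```

On these made-up counts, inverse frequency weights the rarest class 100 times more than the most common one, while effective-number weighting with β = 0.999 separates them by a factor of less than 3, because β^n is already close to zero for classes with several thousand samples. That gentler correction is one plausible reading of why the two schemes behave so differently in the results table.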

Major Findings and Limitations

  • Sensitivity vs. Specificity: Inverse frequency weights with "No Finding" included delivered the best balance, reducing false negatives by about 27% versus baseline (21,879 to 16,001) at the cost of more than twice as many false positives (17,105 to 40,125), an acceptable trade-off for screening use cases.
  • Imbalance Mitigation: Effective number weighting produced the fewest false positives, but missed more actual cases than the baseline (higher false negative count).
  • General Limits: Results reflect the source institution's demographics/conditions; external validation is necessary for deployment elsewhere. Original images are downscaled, possibly losing subtle features.
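The trade-offs above can be recomputed directly from the comparison table. The short sketch below does so for the configurations evaluated on the same 108,492 images, since counts from the 48,131-image runs are not directly comparable to the baseline; the helper name `relative_change` is illustrative.

```python
# False positive / false negative totals from the comparison table
# (only the runs evaluated on all 108,492 images).
results = {
    "Baseline": {"fp": 17105, "fn": 21879},
    "Effective Weight (incl. No Finding)": {"fp": 14573, "fn": 22419},
    "Inverse Frequency (incl. No Finding)": {"fp": 40125, "fn": 16001},
}


def relative_change(new, old):
    """Fractional change from old to new (negative means a reduction)."""
    return (new - old) / old


baseline = results["Baseline"]
for name, counts in results.items():
    if name == "Baseline":
        continue
    fn_change = relative_change(counts["fn"], baseline["fn"])
    fp_change = relative_change(counts["fp"], baseline["fp"])
    print(f"{name}: FN {fn_change:+.1%}, FP {fp_change:+.1%}")
```

Reporting both changes side by side makes it explicit that every reduction in missed cases here is paid for with additional false alarms, and vice versa.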

Project Structure and Guide

  1. Open a terminal in the chest_xray folder and install the dependencies:
    • pip install -r requirements.txt
  2. (Optional, for reproducing full metrics) Download and resize the data:
    • python download_data.py

Repository contents:

  • Pretrained model checkpoints (all strategies): saved_models
  • Example images for CAM visualization: data/sample_images
  • All analysis: notebooks/
    • chesnet.ipynb – EDA, data prep
    • excluding_no_finding.ipynb – Train without 'No Finding'
    • including_no_finding.ipynb – Verify best performing model/metrics
Contact

  • Author: Shishir Ashoka Chandra Mouli
  • Supervisor: Prof. Rafael de Andrade Moral
  • Institution: Maynooth University, Ireland
  • Year: 2025

License

This repository/code is released for academic/research use.

