Deep learning-based image denoising using a lightweight U-Net architecture trained on real smartphone camera noise. Achieved 33.95 dB average PSNR with targeted data augmentation to address class imbalance.
Best denoising results (top row) and most challenging cases (bottom row) from test set
| Metric | Value |
|---|---|
| Average PSNR | 33.95 dB |
| Average SSIM | 0.8538 |
| Worst Case PSNR | 23.08 dB |
| Best Case PSNR | 40.17 dB |
| Standard Deviation | 3.33 dB |
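The PSNR figures above can be reproduced with a short standalone sketch (the repo computes metrics in utils/metrics.py; this version assumes images scaled to [0, 1]):

```python
import numpy as np

def psnr(clean: np.ndarray, denoised: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform +0.01 offset gives MSE = 1e-4, i.e. 40 dB
clean = np.ones((8, 8, 3)) * 0.5
noisy = clean + 0.01
print(round(psnr(clean, noisy), 2))  # 40.0
```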
Worst case improvement: Bright/colourful images improved from 17-18 dB (baseline) to 23+ dB through targeted preprocessing augmentation, a +6 dB gain on previously failing cases.
# Clone repository
git clone https://github.com/kimbielby/Image-Denoising.git
cd image-denoising
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

- Download the Smartphone Image Denoising Dataset from Kaggle
- Extract the dataset
- Place the image directories in data/images/og_images/
- Should contain subdirectories with GT (ground truth) and NOISY image pairs
Your directory structure should look like:
data/
└── images/
    └── og_images/
        ├── 0001_001_S6_00100_0060_3200_L/
        │   ├── GT_SRGB_010.PNG
        │   └── NOISY_SRGB_010.PNG
        ├── 0002_001_S6_00100_00020_3200_N/
        └── ...
# Train with default configuration
jupyter notebook notebooks/01_training.ipynb
# Or use the training pipeline directly
python -c "from pipelines import run_training_pipeline; from configs import load_config; config = load_config('configs/default.yaml'); run_training_pipeline(config)"

Download the pre-trained model from Releases:
Option 1: Manual Download
- Go to Releases
- Download best_model.pth (~30 MB)
- Place in runs/best_model.pth
Option 2: Command Line
# Using wget
wget https://github.com/kimbielby/Image-Denoising/releases/download/v1.0/best_model.pth -O runs/best_model.pth
# Or using curl
curl -L https://github.com/kimbielby/Image-Denoising/releases/download/v1.0/best_model.pth -o runs/best_model.pth

Usage:
from models import UNet
from utils.checkpoint_utils import load_checkpoint_inference
# Load pre-trained model
model = UNet(in_channels=3, out_channels=3, init_features=32)
model = load_checkpoint_inference("runs/best_model.pth", model, device="cuda")

# Denoise a single image
jupyter notebook notebooks/03_inference.ipynb

- Initial training on 5,922 image pairs achieved 32.87 dB average PSNR
- Worst cases were all bright/colourful images, which scored 17-18 dB
- Root cause analysis revealed severe class imbalance: only 5.6% of training data consisted of bright images
- Created 5× geometric augmentations (flips, rotations) for each bright image
- Increased bright image representation from 5.6% to 24% of training data
- Result: Worst cases improved to 23+ dB (+6 dB improvement) and average performance was 33.95 dB
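The targeted oversampling lives in preprocessing/augment_inplace.py; a minimal numpy sketch of the five geometric variants is below (the function name is hypothetical, and it assumes the same transform is applied to each GT/NOISY pair):

```python
import numpy as np

def geometric_augmentations(img: np.ndarray) -> list[np.ndarray]:
    """Five geometric variants used to oversample bright patches:
    horizontal flip, vertical flip, and 90/180/270-degree rotations."""
    return [
        np.flip(img, axis=1),   # horizontal flip
        np.flip(img, axis=0),   # vertical flip
        np.rot90(img, k=1),     # 90 degrees
        np.rot90(img, k=2),     # 180 degrees
        np.rot90(img, k=3),     # 270 degrees
    ]

patch = np.random.rand(512, 512, 3)
augmented = geometric_augmentations(patch)
print(len(augmented))  # 5 extra copies per bright patch
```

Because the patches are square, rotations preserve the 512×512 shape, so the augmented copies drop straight into the existing dataloader.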
Runtime ColorJitter Augmentation
- Attempted runtime colour augmentation to increase data diversity
- Even with conservative settings (brightness=0.2, contrast=0.2) and clamping, this caused training divergence at epochs 4-7 across multiple runs
- Preprocessing augmentation proved more stable
CombinedLoss (MSE + SSIM)
- Looked into perceptual loss combining MSE and SSIM for better texture preservation
- Required a significantly lower learning rate (3e-5 vs 1e-4) and showed training instability
- MSELoss was chosen for production reliability
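For reference, the alpha-weighted blend explored in models/losses.py has the general shape below. This is a simplified sketch: it uses a single global SSIM window rather than the usual Gaussian-windowed SSIM, and the function names are illustrative:

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray,
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> float:
    """Simplified single-window SSIM for images in [0, 1] (no windowing)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def combined_loss(pred: np.ndarray, target: np.ndarray,
                  alpha: float = 0.8) -> float:
    """alpha * MSE + (1 - alpha) * (1 - SSIM); alpha matches configs/default.yaml."""
    mse = np.mean((pred - target) ** 2)
    return alpha * mse + (1 - alpha) * (1 - ssim_global(pred, target))

x = np.random.rand(64, 64)
print(combined_loss(x, x))  # identical images -> 0.0
```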
Lightweight U-Net (7.77M parameters, ~30 MB)
- 4-level encoder-decoder with skip connections
- DoubleConv blocks (Conv → BatchNorm → ReLU, × 2)
- MaxPool2D for downsampling
- ConvTranspose2D for upsampling
- Input/Output: RGB images (512×512 patches)
Input (3, 512, 512)
  ↓ encoder1 (32 features)
  ↓ pool → encoder2 (64 features)
  ↓ pool → encoder3 (128 features)
  ↓ pool → encoder4 (256 features)
  ↓ pool → bottleneck (512 features)
  ↓ upconv + skip → decoder4 (256 features)
  ↓ upconv + skip → decoder3 (128 features)
  ↓ upconv + skip → decoder2 (64 features)
  ↓ upconv + skip → decoder1 (32 features)
  ↓ 1×1 conv
Output (3, 512, 512)
See models/model.py for implementation details.
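The DoubleConv building block described above can be sketched in PyTorch as follows (a reconstruction for illustration, not a copy of models/model.py):

```python
import torch
from torch import nn

class DoubleConv(nn.Module):
    """Two Conv -> BatchNorm -> ReLU stages, one per U-Net level."""
    def __init__(self, in_channels: int, features: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, features, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(features),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, features, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(features),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# Spatial size is preserved (padding=1); only the channel count changes
x = torch.randn(2, 3, 64, 64)
print(DoubleConv(3, 32)(x).shape)  # torch.Size([2, 32, 64, 64])
```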
- Initial training achieved good average metrics (32.87 dB) but had severe outliers (17-18 dB)
- Root cause analysis revealed bright images were underrepresented in training data
- Targeted approach: 5× augmentation of specifically bright images improved the worst cases by 6 dB
- Random approach: ColorJitter applied to all bright images caused training divergence
- CombinedLoss (MSE + SSIM) should theoretically preserve textures better, but in practice it caused training instability even with careful tuning
- MSELoss however provided stable training with good results
- Model peaked at different epochs across runs (epoch 15-30 typically)
- Early stopping with patience=15 prevented overfitting
- Validation every epoch (not every 10 as was originally set) provided better model selection
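The early-stopping behaviour (patience=15 on validation loss) can be sketched as a small helper; the class name is illustrative, and the actual loop lives in models/train.py:

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""
    def __init__(self, patience: int = 15):
        self.patience = patience
        self.best_loss = float("inf")
        self.counter = 0

    def step(self, val_loss: float) -> bool:
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss  # new best: checkpoint the model here
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience

stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.9, 0.85, 0.81]  # no new best after the second epoch
print([stopper.step(l) for l in losses])  # [False, False, False, False, True]
```

Validating every epoch matters here: with patience counted in epochs, sparser validation (e.g. every 10 epochs) would both delay stopping and skip the checkpoints that turn out to be the best.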
For detailed analysis, see RESULTS.md.
image-denoising/
├── configs/
│   ├── config.py               # Configuration dataclasses and loader
│   └── default.yaml            # All hyperparameters and settings
├── data/
│   └── images/
│       └── og_images/          # Place downloaded dataset here
├── dataloaders/
│   ├── collate.py              # Batch collation
│   └── dataloader.py           # Dataset and DataLoader
├── inference/
│   └── inference.py            # Inference pipeline for new images
├── models/
│   ├── losses.py               # Custom loss functions (CombinedLoss)
│   ├── model.py                # U-Net architecture
│   ├── test.py                 # Testing and evaluation
│   ├── train.py                # Training loop with early stopping
│   └── validate.py             # Validation function
├── notebooks/
│   ├── 01_training.ipynb       # Interactive training
│   ├── 02_evaluation.ipynb     # Results analysis
│   └── 03_inference.ipynb      # Inference demo
├── pipelines/
│   ├── complete.py             # End-to-end workflow
│   ├── inference_pipeline.py   # Production inference
│   ├── testing_pipeline.py     # Evaluation workflow
│   └── training_pipeline.py    # Training workflow
├── preprocessing/
│   ├── augment_inplace.py      # Targeted bright image augmentation
│   ├── crop_images.py          # Image patching (512×512)
│   └── dataset_split.py        # Train/val/test split
├── utils/
│   ├── analysis.py             # Data analysis utilities
│   ├── checkpoint_utils.py     # Model checkpoint management
│   ├── evaluation.py           # Evaluation metrics and reporting
│   ├── general.py              # General utility functions
│   ├── metrics.py              # PSNR and SSIM calculation
│   ├── reading_in.py           # Image file loading
│   ├── save_results.py         # Results serialization
│   ├── save_visualisations.py  # Save all plots
│   └── visuals.py              # Plotting and visualization
├── imports.py                  # Centralized imports
├── LICENSE                     # MIT License
├── README.md                   # This file
├── requirements.txt            # Python dependencies
└── RESULTS.md                  # Detailed analysis and findings
All hyperparameters are configurable via YAML:
# configs/default.yaml
model:
  in_channels: 3
  out_channels: 3
  init_features: 32

loss:
  name: MSELoss        # or CombinedLoss
  alpha: 0.8           # for CombinedLoss

train:
  learning_rate: 1e-4
  epochs: 200
  batch_size: 16
  patience: 15         # early stopping

preprocessing:
  bright_threshold: 200.0
  bright_copies: 5     # augmentation multiplier for bright images
  random_augment: 50   # additional random augmentations

See configs/default.yaml for all options.
Source: Smartphone Image Denoising Dataset
Preprocessing:
- Images cropped into 512×512 patches with padding
- Split: 70% train, 20% validation, 10% test
- Targeted augmentation: 5× copies of bright images (flips, rotations)
- Total training patches: 6,142 (after augmentation)
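The patching step (implemented in preprocessing/crop_images.py) pads each image up to a multiple of 512 and then tiles it. A standalone numpy sketch, where the function name and the choice of reflect padding are assumptions:

```python
import numpy as np

def crop_into_patches(img: np.ndarray, patch: int = 512) -> list[np.ndarray]:
    """Pad H and W up to multiples of `patch`, then tile into square patches."""
    h, w = img.shape[:2]
    pad_h = (-h) % patch  # extra rows needed to reach the next multiple
    pad_w = (-w) % patch
    padded = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
    patches = []
    for y in range(0, padded.shape[0], patch):
        for x in range(0, padded.shape[1], patch):
            patches.append(padded[y:y + patch, x:x + patch])
    return patches

img = np.zeros((1300, 900, 3))      # padded to 1536 x 1024
print(len(crop_into_patches(img)))  # 3 x 2 = 6 patches
```

The GT and NOISY images of each pair must be patched with identical offsets so that corresponding patches stay aligned.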
Statistics:
- Training: 6,142 image pairs
- Validation: 1,692 image pairs
- Test: 846 image pairs
- Optimizer: Adam
- Learning Rate: 1e-4
- Scheduler: ReduceLROnPlateau (factor=0.5, patience=5)
- Early Stopping: Patience=15 epochs on validation loss
- Mixed Precision: Enabled (CUDA only)
- Gradient Clipping: Max norm = 1.0
- Validation: Every epoch
- Hardware: NVIDIA GPU with 4-8GB VRAM
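The optimizer, scheduler, and gradient-clipping settings above wire together as in this minimal single-step sketch (a toy conv stands in for the U-Net, and mixed precision is omitted to keep it CPU-runnable):

```python
import torch
from torch import nn, optim

model = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # placeholder for the U-Net
optimizer = optim.Adam(model.parameters(), lr=1e-4)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)

x = torch.randn(2, 3, 32, 32)
loss = nn.MSELoss()(model(x), x)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
optimizer.step()
optimizer.zero_grad()
scheduler.step(loss.item())  # in training, pass the epoch's validation loss here
print(optimizer.param_groups[0]["lr"])  # unchanged until a 5-epoch plateau
```

ReduceLROnPlateau halves the learning rate only after the monitored loss fails to improve for `patience` consecutive calls, which pairs naturally with validating every epoch.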
- U-Net Architecture: Ronneberger et al., "U-Net: Convolutional Networks for Biomedical Image Segmentation" (2015)
- Dataset: Smartphone Image Denoising Dataset
- SSIM Loss: Wang et al., "Image Quality Assessment: From Error Visibility to Structural Similarity" (2004)
MIT License - see LICENSE for details.
- Dataset provided by Rajat Gupta on Kaggle