Pipaup/ResUnet

ResUnet — Lightweight Medical Image Segmentation

Dawn Blossoms Plucked at Dusk (朝花夕拾, a postscript): Built in 2024, back when large models were still in their "village idiot" era, when every line of code had to be hand-typed and every bug hunted down the hard way. A small memorial to my undergraduate years, and to the age of artisanal, hand-crafted programming.

A pure CNN segmentation network that outperforms mainstream Transformer-based models across three public benchmarks — without any attention mechanisms, Transformer modules, or deformable convolutions. Built entirely from standard components: 3×3 Conv + BatchNorm + ReLU.

Core insight: The underperformance of traditional CNNs in medical segmentation stems not from a lack of attention mechanisms, but from three long-overlooked architectural adaptation issues when migrating ResNet from classification to dense prediction.

📖 Chinese version: README_CN.md

Why ResUnet?

Most recent work follows an "additive innovation" pattern — stacking Transformers, attention gates, and deformable convolutions. ResUnet takes the opposite approach: fix the architectural flaws in ResNet itself. Three targeted changes:

Left: baseline Res34-UNet  |  Right: ResUnet v1.0 — MaxPool removed, layer1 skip dropped, decoder channels preserved.

1. High-Resolution Encoder → Small Organs Survive

The traditional ResNet stem applies a 7×7 conv (stride 2) followed by MaxPool, downsampling the input to 1/4 resolution immediately. That is fine for ImageNet classification, which only needs to know what is in the image, but medical segmentation also needs to know where each boundary lies: small organs such as the pancreas and gallbladder can shrink to 1-2 pixels after 4× downsampling.

Fix: Remove the MaxPool layer. Stem downsampling drops from 4× to 2×, and the encoder's max downsampling from 32× to 16×.

Result: Dice 79.02% → 82.03% (+3.01%), HD95 26.92 → 17.65 (-9.27), Pancreas 61.39% → 65.49% (+4.10%)
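The resolution arithmetic behind this fix can be checked by hand. The sketch below assumes the standard torchvision ResNet34 layout (7×7 stem conv with stride 2 and padding 3, 3×3 MaxPool with stride 2 and padding 1, and a stride-2 3×3 conv opening each of layer2 to layer4) and a 224×224 input. `conv_out` and `encoder_sizes` are hypothetical helper names, not functions from this repository.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(input_size: int, keep_maxpool: bool) -> dict:
    """Feature-map side length after each encoder stage (assumed ResNet34 layout)."""
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # stem conv
    sizes = {"stem": size}
    if keep_maxpool:
        size = conv_out(size, kernel=3, stride=2, padding=1)  # stem MaxPool
    sizes["layer1"] = size  # layer1 keeps its input resolution
    for name in ("layer2", "layer3", "layer4"):
        size = conv_out(size, kernel=3, stride=2, padding=1)  # stride-2 stage
        sizes[name] = size
    return sizes


if __name__ == "__main__":
    for keep in (True, False):
        sizes = encoder_sizes(224, keep_maxpool=keep)
        factor = 224 // sizes["layer4"]
        print(f"keep_maxpool={keep}: {sizes} -> max downsampling {factor}x")
```

Under these assumptions, dropping MaxPool leaves layer1 at 112×112 instead of 56×56 and the deepest stage at 14×14 instead of 7×7, which matches the 32× to 16× change described above.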

2. Hierarchical Feature Selection → Block Noise from Polluting Semantics

Traditional U-Net blindly preserves all skip connections. But layer1's high-resolution features are contaminated with low-level imaging noise (CT beam-hardening streaks, MRI motion artifacts), which disrupt high-level semantic fusion when fed into the decoder.

Fix: Remove layer1's skip connection, keeping only layer2–layer4 features. Decompose the layer1 feature as $F_{\text{layer1}} = \mathcal{I}_{\text{spatial}} + N_{\text{noise}}$: when deep semantics are strong enough, the noise $N_{\text{noise}}$ from layer1 far outweighs its spatial localization value.

Result: Gallbladder Dice 65.53% → 70.92% (+5.39%)

3. Asymmetric Decoder → Small Organ Features Stop Being "Diluted"

Traditional symmetric decoders habitually compress channel counts after every feature fusion step ("fuse then compress"). Acceptable for large objects in natural images, but small-organ features in medical images are already extremely sparse — compressing channels actively discards weak but critical signals.

Fix (three sub-components):

  • Bottleneck Enhancement: Dual 3×3 convs at encoder tail to strengthen context (Dice +0.96%)
  • Channel-Preserving Fusion: Layer3 keeps 512 channels (768→512, not 768→256)
  • Channel-Preserving Upsampling: Layer2 keeps 384→384, not 384→192

Result: Dice 83.50% → 84.42% (+0.92%), Aorta +1.56%, Right Kidney +2.95%
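To get a feel for what channel preservation costs, the sketch below counts the weights of the channel-reducing step in both variants. It assumes each step is a single bias-free 3×3 convolution, which this README does not confirm; the real decoder blocks may contain more layers. `conv3x3_weights` is a hypothetical helper.

```python
def conv3x3_weights(in_channels: int, out_channels: int) -> int:
    """Weight count of one bias-free 3x3 convolution."""
    return 3 * 3 * in_channels * out_channels


# (label, input channels, preserved output, compressed output),
# taken from the channel counts listed in the sub-components above.
STAGES = [
    ("layer3 fusion", 768, 512, 256),
    ("layer2 upsampling", 384, 384, 192),
]

if __name__ == "__main__":
    for label, c_in, kept, compressed in STAGES:
        w_kept = conv3x3_weights(c_in, kept)
        w_compressed = conv3x3_weights(c_in, compressed)
        print(f"{label}: {c_in}->{kept} uses {w_kept:,} weights, "
              f"{c_in}->{compressed} uses {w_compressed:,}")
```

Under this assumption, keeping the channels doubles the weight count of each affected convolution, so the asymmetric decoder trades a modest amount of extra capacity for the retained small-organ signal.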

Performance at a Glance

| Dataset | Modality | Classes | ResUnet Dice | Previous Best | Params |
|---|---|---|---|---|---|
| Synapse | Abdominal CT | 9 | 84.42% | PVT-EMCAD-B2 83.63% | 21.3M |
| ACDC | Cardiac MRI | 4 | 92.17% | PVT-EMCAD-B2 92.12% | 21.3M |
| AVT | Aortic Angiography | 2 | 87.11% | DAE-Former 85.02% | 21.3M |

HD95 also leads: Synapse 13.03 (next best 15.68), AVT 4.44mm (next best 8.96mm — nearly 2× worse).
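The margins implied by these numbers are easy to recompute. The snippet below uses only the figures quoted in this section; the dictionary and function names are illustrative.

```python
# (ResUnet Dice, previous best Dice) in percent, as quoted above.
DICE = {
    "Synapse": (84.42, 83.63),
    "ACDC": (92.17, 92.12),
    "AVT": (87.11, 85.02),
}

# (ResUnet HD95, next best HD95), as quoted above.
HD95 = {
    "Synapse": (13.03, 15.68),
    "AVT": (4.44, 8.96),
}


def dice_margin(dataset: str) -> float:
    """Dice lead over the previous best, in percentage points."""
    ours, previous = DICE[dataset]
    return round(ours - previous, 2)


def hd95_ratio(dataset: str) -> float:
    """How many times larger the next-best HD95 is than ResUnet's."""
    ours, next_best = HD95[dataset]
    return round(next_best / ours, 2)


if __name__ == "__main__":
    for name in DICE:
        print(f"{name}: +{dice_margin(name)} Dice points")
    for name in HD95:
        print(f"{name}: next-best HD95 is {hd95_ratio(name)}x ResUnet's")
```

The Dice lead is largest on AVT and smallest on ACDC, where the two methods are separated by a few hundredths of a point.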

Runs on a single GPU — all experiments on one RTX 3090.

Qualitative Results

Synapse multi-organ CT — ResUnet vs. competing methods across 8 abdominal organs.

Left: AVT aortic angiography  |  Right: ACDC cardiac MRI.

Supported Models

Multiple U-Net variants are implemented for comparison:

| Model | Type | Notes |
|---|---|---|
| ResUnet (v0.1–v1.0) | Pure CNN | ResNet34 encoder + asymmetric decoder |
| ResUnet++ | Pure CNN | Dense skip connections |
| Vanilla U-Net | Pure CNN | Classic baseline |
| EMCADNet | CNN + Channel Attention | PVTv2/ResNet encoder + EMCAD decoder |
| TransUNet | CNN + ViT Hybrid | ResNet50 + ViT encoder |
| SwinUnet | Pure Transformer | Swin Transformer encoder-decoder |

ResUnet ablation ladder (v0.1 → v1.0)

Each step corresponds to one specific paper improvement; every file under Synapse_u/model/ carries a top-of-file docstring explaining its role.

| Version | Paper role | Key change |
|---|---|---|
| v0.1 | baseline | ResNet34 (no pretrain) + full stem + all 5 skips |
| v0.2 | training config | + ImageNet pretrain (not part of the ablation) |
| v0.3 | Improvement ① | Drop MaxPool from encoder stem (resolution preservation) |
| v0.4 | Improvement ② | Drop layer1 skip (hierarchical feature selection) |
| v0.5 | Improvement ③a | Bottleneck enhancement |
| v0.6 | Improvement ③b | Layer3 channel preservation |
| v0.7 | unused branch | Re-introduces layer1 skip (opposite of ②, kept for reference) |
| v1.0 ★ | Final model | Improvement ③c: up2 channel preservation, all changes integrated |

Setup

conda env create -f ResUnet3.yml
conda activate ResUnet
python -c "import torch; print(torch.cuda.is_available())"

Requirements: Python 3.9.20 · PyTorch 2.4.1 · CUDA 12.4

Data Preparation

data/ResCNN/data/
├── Synapse/
│   ├── train_npz/         # Training .npz files (image + label)
│   └── test_vol/          # Testing .h5 volumes
├── AVT/
│   ├── train_npz/
│   └── test_vol/          # .npy.h5 volumes
└── ACDC/                  # Separate structure

Split files are in Synapse_u/lists/, one case ID per line.
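A minimal sketch of how a split file could be turned into training file paths. It assumes that each case ID in the list corresponds to a file named `<case_id>.npz` inside `train_npz/`; that naming rule, the example list file name, and both helper functions are assumptions for illustration rather than the repository's actual dataset code.

```python
from pathlib import Path


def parse_case_ids(text: str) -> list:
    """Return one case ID per non-empty line, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_case_ids(list_file: str) -> list:
    """Read a split file that holds one case ID per line."""
    return parse_case_ids(Path(list_file).read_text(encoding="utf-8"))


def train_npz_paths(data_root: str, dataset: str, case_ids: list) -> list:
    """Map case IDs to <data_root>/<dataset>/train_npz/<case_id>.npz (assumed naming)."""
    folder = Path(data_root) / dataset / "train_npz"
    return [folder / f"{case_id}.npz" for case_id in case_ids]


if __name__ == "__main__":
    # "example_case" is a placeholder ID, not a real entry from the split files.
    ids = parse_case_ids("example_case\n\n")
    for path in train_npz_paths("data/ResCNN/data", "Synapse", ids):
        print(path.as_posix())
```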

Training

Unified entry point train.py: pick the model with --model, the dataset with --dataset.

# Final paper model
python train.py --model ResUnet1_0 --dataset Synapse
python train.py --model ResUnet1_0 --dataset AVT

# Ablation ladder (the three paper improvements applied step by step)
python train.py --model ResUnet0_1 --dataset Synapse        # baseline
python train.py --model ResUnet0_3 --dataset Synapse        # +Improvement (1) drop MaxPool
python train.py --model ResUnet0_4 --dataset Synapse        # +Improvement (2) drop layer1 skip
python train.py --model ResUnet0_5 --dataset Synapse        # +Improvement (3a) bottleneck
python train.py --model ResUnet0_6 --dataset Synapse        # +Improvement (3b) layer3 channels

# Comparison models
python train.py --model EMCAD     --dataset Synapse
python train.py --model TransUnet --dataset Synapse
python train.py --model SwinUnet  --dataset Synapse
python train.py --model ResUnetpp --dataset AVT
python train.py --model Unet      --dataset AVT

# List every supported model with its paper role
python train.py --list-models

# Laptop / 3060 smoke test (a single epoch, just to verify the pipeline end to end; not a real training run)
python train.py --model ResUnet1_0 --dataset Synapse --max_epochs 1 --batch_size 1 --eval_interval 1

The legacy train_Synapse_*.py / train_AVT_*.py scripts are kept for reference and remain equivalent to train.py. New work should use train.py.

Common Arguments

| Argument | Default | Description |
|---|---|---|
| `--model` | required | See `python train.py --list-models` |
| `--dataset` | Synapse | Synapse or AVT |
| `--max_epochs` | 200 | Training epochs |
| `--batch_size` | 8 | Batch size per GPU |
| `--base_lr` | 0.02 | Initial LR (SGD) |
| `--img_size` | 224 | Input spatial size |
| `--eval_interval` | 25 | Eval interval (epochs) |
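For readers who want to script around these flags, the arguments and defaults listed above can be mirrored with `argparse`. This is a sketch built only from the documented names and defaults; it is not the parser in `train.py`, which may define further options such as `--list-models` or `--n_gpu`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented common arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the train.py command line")
    parser.add_argument("--model", required=True, help="model name, e.g. ResUnet1_0")
    parser.add_argument("--dataset", default="Synapse", choices=["Synapse", "AVT"])
    parser.add_argument("--max_epochs", type=int, default=200, help="training epochs")
    parser.add_argument("--batch_size", type=int, default=8, help="batch size per GPU")
    parser.add_argument("--base_lr", type=float, default=0.02, help="initial LR (SGD)")
    parser.add_argument("--img_size", type=int, default=224, help="input spatial size")
    parser.add_argument("--eval_interval", type=int, default=25, help="epochs between evals")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the example runs without a real command line.
    args = build_parser().parse_args(["--model", "ResUnet1_0", "--dataset", "AVT"])
    print(vars(args))
```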

Training Details

  • Loss: 0.4 x CrossEntropyLoss + 0.6 x DiceLoss (0.3 + 0.7 for ACDC)
  • Optimizer: SGD, momentum=0.9, weight_decay=1e-4, polynomial LR decay
  • Evaluation: Starts at epoch >= max_epochs/1.5, runs every eval_interval, plus last 3 epochs
  • Checkpoints: Saved under checkpoints_save/<model_name>_SGD_<lr>_<epochs>_<batch>_<dataset>/
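The schedule described in the list above can be sketched in a few lines. Several details are assumptions, since the README does not state them: the polynomial decay uses power 0.9 (a common choice in segmentation code), epochs are counted from 0, and "every eval_interval" is read as `epoch % eval_interval == 0`. All function names are hypothetical.

```python
def loss_weights(dataset: str) -> tuple:
    """(cross-entropy weight, Dice weight) as stated above."""
    return (0.3, 0.7) if dataset == "ACDC" else (0.4, 0.6)


def combined_loss(ce: float, dice: float, dataset: str = "Synapse") -> float:
    """Weighted sum of already-computed cross-entropy and Dice loss values."""
    w_ce, w_dice = loss_weights(dataset)
    return w_ce * ce + w_dice * dice


def poly_lr(base_lr: float, step: int, max_steps: int, power: float = 0.9) -> float:
    """Polynomial decay from base_lr to 0 (power=0.9 is an assumption)."""
    return base_lr * (1.0 - step / max_steps) ** power


def eval_epochs(max_epochs: int = 200, eval_interval: int = 25) -> list:
    """0-indexed epochs that trigger evaluation under the assumed reading."""
    start = max_epochs / 1.5
    return [
        epoch
        for epoch in range(max_epochs)
        if (epoch >= start and epoch % eval_interval == 0) or epoch >= max_epochs - 3
    ]


if __name__ == "__main__":
    print("eval epochs:", eval_epochs())
    print("lr at start / midpoint:", poly_lr(0.02, 0, 1000), poly_lr(0.02, 500, 1000))
    print("loss for ce=0.5, dice=0.2:", combined_loss(0.5, 0.2))
```

With the default 200 epochs and an interval of 25, this reading evaluates at epochs 150 and 175 and then at the last three epochs, so most of the run proceeds without the cost of full-volume inference.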

Testing

Unified entry point test.py:

python test.py --model ResUnet1_0 --dataset Synapse \
    --model_name vResUnet1_0_v1_0_0_SGD_0.02_200_8_Synapse/vResUnet1_0_v1_0_0_epoch_199.pth

python test.py --model EMCAD --dataset AVT \
    --model_name vEMCADNet_v1_0_0_SGD_0.02_200_8_AVT/vEMCADNet_v1_0_0_epoch_199.pth

# Laptop smoke test (only first case, no .nii.gz output)
python test.py --model ResUnet1_0 --dataset Synapse \
    --model_name vResUnet1_0_v1_0_0_SGD_0.02_200_8_Synapse/vResUnet1_0_v1_0_0_epoch_199.pth \
    --max_cases 1 --is_savenii False

| Argument | Description |
|---|---|
| `--model` | Required, must match the model used during training |
| `--dataset` | Synapse or AVT |
| `--model_name` | Path relative to checkpoints_save/ |
| `--max_cases` | Run only the first N cases; 0 = all (default) |
| `--is_savenii` | True to save .nii.gz predictions (default False) |

The legacy test_Synapse.py / test_AVT.py / test_*_TransUnet.py are kept, but test.py is preferred — it shares the same MODEL_REGISTRY as train.py, so the model class never gets mismatched with the weights.
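The shared-registry idea can be illustrated with a small name-to-constructor mapping. Only the name `MODEL_REGISTRY` comes from this README; its real structure is not shown here, so the decorator, the placeholder class, and `build_model` below are assumptions meant to show the pattern, not the repository's code.

```python
from typing import Callable, Dict

# Hypothetical layout: model name -> callable that builds the model.
MODEL_REGISTRY: Dict[str, Callable[..., object]] = {}


def register(name: str):
    """Decorator that records a model constructor under a unique name."""
    def decorator(factory):
        if name in MODEL_REGISTRY:
            raise ValueError(f"duplicate model name: {name}")
        MODEL_REGISTRY[name] = factory
        return factory
    return decorator


@register("ResUnet1_0")
class PlaceholderResUnet:
    """Stand-in for a real network class."""

    def __init__(self, num_classes: int):
        self.num_classes = num_classes


def build_model(name: str, **kwargs):
    """Single lookup path that both a training and a testing script could call."""
    try:
        factory = MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"unknown model '{name}'; known models: {known}") from None
    return factory(**kwargs)


if __name__ == "__main__":
    model = build_model("ResUnet1_0", num_classes=9)
    print(type(model).__name__, model.num_classes)
```

Because both scripts resolve `--model` through the same lookup, a checkpoint trained under one name is always loaded into the class registered under that name.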

Metrics

  • Dice Score (DSC): Overlap between prediction and ground truth (higher is better)
  • 95% Hausdorff Distance (HD95): Boundary distance (lower is better)

Computed via medpy.metric.binary. Results reported per-class and averaged over foreground classes.
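To make the Dice definition and the foreground averaging concrete, here is a dependency-free sketch over flat label sequences. It is a stand-in for illustration, not the medpy implementation the repository uses, and it does not cover HD95. Returning 0.0 when a class is absent from both prediction and ground truth is a choice made for this sketch.

```python
def dice_score(pred, target, cls: int) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for one class over flat label sequences."""
    pred_count = sum(1 for p in pred if p == cls)
    target_count = sum(1 for t in target if t == cls)
    overlap = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    denom = pred_count + target_count
    return 2.0 * overlap / denom if denom else 0.0


def mean_foreground_dice(pred, target, num_classes: int) -> float:
    """Average Dice over classes 1..num_classes-1 (class 0 is background)."""
    scores = [dice_score(pred, target, cls) for cls in range(1, num_classes)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [0, 1, 1, 2, 2, 2]
    target = [0, 1, 2, 2, 2, 0]
    for cls in (1, 2):
        print(f"class {cls}: Dice = {dice_score(pred, target, cls):.4f}")
    print(f"mean foreground Dice = {mean_foreground_dice(pred, target, 3):.4f}")
```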

Project Structure

ResUnet/
├── train.py / test.py            # ★ Unified entry points (recommended)
├── train_Synapse_*.py / train_AVT_*.py / test_*.py  # Legacy scripts, kept for reference
├── Synapse_u/                    # Core library
│   ├── trainer.py                # Training loop
│   ├── utils.py                  # DiceLoss, metrics, 3D inference
│   ├── datasets/                 # Dataset classes
│   ├── model/                    # All model implementations
│   │   ├── ResUnet0_1.py ~ 0_7.py, ResUnet1_0.py    # Ablation ladder (each file documents its role)
│   │   ├── ResUnetpp.py, Unet.py                    # Baselines
│   │   ├── v3_8_19.py, v3_8_19_2.py                 # PPM exploration (not used in paper)
│   │   ├── EMCAD/                # EMCADNet
│   │   ├── TransUNet/            # ViT hybrid
│   │   └── SwinUnet/             # Swin Transformer
│   └── lists/                    # Train/test splits
├── ACDC/                         # ACDC dataset standalone subproject (kept as-is)
├── checkpoints_save/             # Training outputs (auto-generated, gitignored)
└── data/ResCNN/data/             # Dataset storage (gitignored)

Checkpoint Directory

Training outputs at checkpoints_save/<model_name>_...:

  • <name>_epoch_<N>.pth — Model weights at epoch N
  • log/ — TensorBoard event files
  • <name>_<timestamp>_loss.csv — Training loss
  • <name>_<timestamp>_dice.png / _hd95.png — Metric plots
  • <name>_<timestamp>_results.csv — Evaluation data
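The naming scheme makes it possible to derive the `--model_name` argument for `test.py` from the training settings. The sketch below follows the pattern stated under Training Details and the weight-file pattern listed above. It assumes the learning rate is rendered with Python's default float formatting (so 0.02 becomes `0.02`), and the helper names are hypothetical.

```python
from pathlib import PurePosixPath


def run_dir_name(model_name: str, base_lr: float, max_epochs: int,
                 batch_size: int, dataset: str) -> str:
    """<model_name>_SGD_<lr>_<epochs>_<batch>_<dataset>"""
    return f"{model_name}_SGD_{base_lr}_{max_epochs}_{batch_size}_{dataset}"


def weight_path(model_name: str, base_lr: float, max_epochs: int,
                batch_size: int, dataset: str, epoch: int) -> PurePosixPath:
    """Path of the weights saved at a given epoch, relative to the project root."""
    run_dir = run_dir_name(model_name, base_lr, max_epochs, batch_size, dataset)
    return PurePosixPath("checkpoints_save") / run_dir / f"{model_name}_epoch_{epoch}.pth"


if __name__ == "__main__":
    path = weight_path("vResUnet1_0_v1_0_0", 0.02, 200, 8, "Synapse", epoch=199)
    print(path)
    # The part after checkpoints_save/ is what --model_name expects.
    print(path.relative_to("checkpoints_save"))
```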

Notes

  • Input: single-channel grayscale → replicated to 3 channels for ImageNet-pretrained encoder compatibility
  • TransUNet requires additional ViT pretrained weights — paths in vit_seg_configs.py
  • Multi-GPU supported via --n_gpu and nn.DataParallel
  • ACDC dataset uses a separate codebase under ACDC/; use Synapse_u/ as the primary reference

Citation

If you find this work useful, please cite:

@mastersthesis{resunet2026,
  title={ResUnet: Fixing ResNet's Adaptation Defects for Medical Image Segmentation},
  year={2026},
  school={Shaoxing University}
}
