ResUnet — Lightweight Medical Image Segmentation

Dawn Blossoms Plucked at Dusk (Postscript): Built in 2024, back when large models were still in their "village idiot" era and every line of code had to be hand-typed, every bug hunted down the hard way. A small memorial to my undergraduate years, and to the age of artisanal, hand-crafted programming.

A pure CNN segmentation network that outperforms mainstream Transformer-based models across three public benchmarks — without any attention mechanisms, Transformer modules, or deformable convolutions. Built entirely from standard components: 3×3 Conv + BatchNorm + ReLU.

Core insight: The underperformance of traditional CNNs in medical segmentation stems not from a lack of attention mechanisms, but from three long-overlooked architectural adaptation issues when migrating ResNet from classification to dense prediction.

📖 Chinese version: README_CN.md

Why ResUnet?

Most recent work follows an "additive innovation" pattern — stacking Transformers, attention gates, and deformable convolutions. ResUnet takes the opposite approach: fix the architectural flaws in ResNet itself. Three targeted changes:

Figure: Left: baseline Res34-UNet | Right: ResUnet v1.0 (MaxPool removed, layer1 skip dropped, decoder channels preserved).

1. High-Resolution Encoder → Small Organs Survive

The traditional ResNet stem applies a 7×7 conv (stride 2) followed by MaxPool, downsampling the input to 1/4 resolution immediately. That is fine for ImageNet classification, which only needs to know what is in the image, but medical segmentation needs to know where the boundary is: the pancreas and gallbladder shrink to 1-2 pixels after 4× downsampling.

Fix: Remove the MaxPool layer. Stem downsampling drops from 4× to 2×, and the encoder's max downsampling from 32× to 16×.
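The effect of this change on feature-map size can be checked with plain arithmetic. The sketch below is not the repository's code: it assumes the standard ResNet34 stride layout (stride-2 stem conv, optional stride-2 MaxPool, layer1 at stride 1, layer2 to layer4 at stride 2 each) and the default 224×224 input, and the function name `encoder_resolutions` is hypothetical.

```python
def encoder_resolutions(input_size, use_maxpool):
    """Return the spatial size after the stem and after layer1..layer4.

    Assumes the standard ResNet34 stride layout; sizes are per side.
    """
    size = input_size // 2              # 7x7 conv, stride 2
    if use_maxpool:
        size //= 2                      # 3x3 max-pool, stride 2
    sizes = {"stem": size}
    # layer1 keeps the resolution; layer2, layer3 and layer4 each halve it.
    for name, stride in [("layer1", 1), ("layer2", 2), ("layer3", 2), ("layer4", 2)]:
        size //= stride
        sizes[name] = size
    return sizes


baseline = encoder_resolutions(224, use_maxpool=True)
modified = encoder_resolutions(224, use_maxpool=False)

for name in baseline:
    print(f"{name}: {baseline[name]} -> {modified[name]}")
```

With a 224×224 input, the deepest feature map grows from 7×7 to 14×14, which corresponds to the 32× to 16× change in maximum downsampling.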

Result: Dice 79.02% → 82.03% (+3.01%), HD95 26.92 → 17.65 (-9.27), Pancreas 61.39% → 65.49% (+4.10%)

2. Hierarchical Feature Selection → Block Noise from Polluting Semantics

Traditional U-Net blindly preserves all skip connections. But layer1's high-resolution features are contaminated with low-level imaging noise (CT beam-hardening streaks, MRI motion artifacts), which disrupt high-level semantic fusion when fed into the decoder.

Fix: Remove layer1's skip connection, keeping only layer2–layer4 features. Layer1 features can be decomposed as $F_{\text{layer1}} = \mathcal{I}_{\text{spatial}} + N_{\text{noise}}$: when deep semantics are strong enough, the noise $N_{\text{noise}}$ from layer1 far outweighs its spatial localization value $\mathcal{I}_{\text{spatial}}$.

Result: Gallbladder Dice 65.53% → 70.92% (+5.39%)

3. Asymmetric Decoder → Small Organ Features Stop Being "Diluted"

Traditional symmetric decoders habitually compress channel counts after every feature fusion step ("fuse then compress"). Acceptable for large objects in natural images, but small-organ features in medical images are already extremely sparse — compressing channels actively discards weak but critical signals.

Fix (three sub-components):

Bottleneck Enhancement: Dual 3×3 convs at encoder tail to strengthen context (Dice +0.96%)
Channel-Preserving Fusion: Layer3 keeps 512 channels (768→512, not 768→256)
Channel-Preserving Upsampling: Layer2 keeps 384→384, not 384→192

Result: Dice 83.50% → 84.42% (+0.92%), Aorta +1.56%, Right Kidney +2.95%
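To get a feel for what channel preservation costs, the weight count of the affected convolutions can be compared directly. This is a rough sketch under an assumption: each fusion or upsampling step is modeled as a single bias-free 3×3 convolution with the channel counts quoted above. The actual decoder blocks may contain more layers, and `conv3x3_weights` is a hypothetical helper.

```python
def conv3x3_weights(in_channels, out_channels):
    """Weight count of one bias-free 3x3 convolution (assumed block shape)."""
    return 3 * 3 * in_channels * out_channels


# (in_channels, out_channels) pairs taken from the text above.
variants = {
    "layer3 fuse-then-compress": (768, 256),
    "layer3 channel-preserving": (768, 512),
    "layer2 fuse-then-compress": (384, 192),
    "layer2 channel-preserving": (384, 384),
}

counts = {name: conv3x3_weights(cin, cout) for name, (cin, cout) in variants.items()}
for name, n in counts.items():
    print(f"{name}: {n:,} weights")
```

Under this assumption, keeping the channels doubles the weights of each affected convolution, which is the price paid for not discarding sparse small-organ features.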

Performance at a Glance

| Dataset | Modality | Classes | ResUnet Dice | Previous Best | Params |
|---|---|---|---|---|---|
| Synapse | Abdominal CT | 9 | 84.42% | PVT-EMCAD-B2 83.63% | 21.3M |
| ACDC | Cardiac MRI | 4 | 92.17% | PVT-EMCAD-B2 92.12% | 21.3M |
| AVT | Aortic Angiography | 2 | 87.11% | DAE-Former 85.02% | 21.3M |

HD95 also leads: Synapse 13.03 (next best 15.68), AVT 4.44 mm (next best 8.96 mm, roughly 2× higher).

Runs on a single GPU — all experiments on one RTX 3090.

Qualitative Results

Figure: Synapse multi-organ CT, ResUnet vs. competing methods across 8 abdominal organs.

Figure: Left: AVT aortic angiography | Right: ACDC cardiac MRI.

Supported Models

Multiple U-Net variants are implemented for comparison:

| Model | Type | Notes |
|---|---|---|
| ResUnet (v0.1–v1.0) | Pure CNN | ResNet34 encoder + asymmetric decoder |
| ResUnet++ | Pure CNN | Dense skip connections |
| Vanilla U-Net | Pure CNN | Classic baseline |
| EMCADNet | CNN + Channel Attention | PVTv2/ResNet encoder + EMCAD decoder |
| TransUNet | CNN + ViT Hybrid | ResNet50 + ViT encoder |
| SwinUnet | Pure Transformer | Swin Transformer encoder-decoder |

ResUnet ablation ladder (v0.1 → v1.0)

Each step corresponds to one specific paper improvement; every file under Synapse_u/model/ carries a top-of-file docstring explaining its role.

| Version | Paper role | Key change |
|---|---|---|
| v0.1 | baseline | ResNet34 (no pretrain) + full stem + all 5 skips |
| v0.2 | training config | + ImageNet pretrain (not part of the ablation) |
| v0.3 | Improvement ① | Drop MaxPool from encoder stem (resolution preservation) |
| v0.4 | Improvement ② | Drop layer1 skip (hierarchical feature selection) |
| v0.5 | Improvement ③a | Bottleneck enhancement |
| v0.6 | Improvement ③b | Layer3 channel preservation |
| v0.7 | unused branch | Re-introduces layer1 skip (opposite of ②, kept for reference) |
| v1.0 | ★ Final model | Improvement ③c: up2 channel preservation, all changes integrated |

Setup

conda env create -f ResUnet3.yml
conda activate ResUnet
python -c "import torch; print(torch.cuda.is_available())"

Requirements: Python 3.9.20 · PyTorch 2.4.1 · CUDA 12.4

Data Preparation

data/ResCNN/data/
├── Synapse/
│   ├── train_npz/         # Training .npz files (image + label)
│   └── test_vol/          # Testing .h5 volumes
├── AVT/
│   ├── train_npz/
│   └── test_vol/          # .npy.h5 volumes
└── ACDC/                  # Separate structure

Split files are in Synapse_u/lists/, one case ID per line.

Training

Unified entry point train.py: pick the model with --model, the dataset with --dataset.

# Final paper model
python train.py --model ResUnet1_0 --dataset Synapse
python train.py --model ResUnet1_0 --dataset AVT

# Ablation ladder (the three paper improvements applied step by step)
python train.py --model ResUnet0_1 --dataset Synapse        # baseline
python train.py --model ResUnet0_3 --dataset Synapse        # +Improvement (1) drop MaxPool
python train.py --model ResUnet0_4 --dataset Synapse        # +Improvement (2) drop layer1 skip
python train.py --model ResUnet0_5 --dataset Synapse        # +Improvement (3a) bottleneck
python train.py --model ResUnet0_6 --dataset Synapse        # +Improvement (3b) layer3 channels

# Comparison models
python train.py --model EMCAD     --dataset Synapse
python train.py --model TransUnet --dataset Synapse
python train.py --model SwinUnet  --dataset Synapse
python train.py --model ResUnetpp --dataset AVT
python train.py --model Unet      --dataset AVT

# List every supported model with its paper role
python train.py --list-models

# Laptop / 3060 smoke test (one epoch at batch size 1; verifies the pipeline end to end, not meant to produce a usable model)
python train.py --model ResUnet1_0 --dataset Synapse --max_epochs 1 --batch_size 1 --eval_interval 1

The legacy train_Synapse_*.py / train_AVT_*.py scripts are kept for reference and remain equivalent to train.py. New work should use train.py.

Common Arguments

| Argument | Default | Description |
|---|---|---|
| `--model` | required | See `python train.py --list-models` |
| `--dataset` | `Synapse` | `Synapse` or `AVT` |
| `--max_epochs` | 200 | Training epochs |
| `--batch_size` | 8 | Batch size per GPU |
| `--base_lr` | 0.02 | Initial LR (SGD) |
| `--img_size` | 224 | Input spatial size |
| `--eval_interval` | 25 | Eval interval (epochs) |

Training Details

Loss: 0.4 × CrossEntropyLoss + 0.6 × DiceLoss (0.3 × CrossEntropyLoss + 0.7 × DiceLoss for ACDC)
Optimizer: SGD, momentum=0.9, weight_decay=1e-4, polynomial LR decay
Evaluation: Starts at epoch >= max_epochs/1.5, runs every eval_interval, plus last 3 epochs
Checkpoints: Saved under checkpoints_save/<model_name>_SGD_<lr>_<epochs>_<batch>_<dataset>/
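The training settings above can be written out as a few small functions. This is a minimal sketch, not the code in `Synapse_u/trainer.py`. Several details are assumptions that the text does not specify: the polynomial decay exponent (`power=0.9`), 0-indexed epochs, and the exact interval test `(epoch + 1) % eval_interval == 0`. All function names are hypothetical.

```python
def combined_loss(ce_loss, dice_loss, ce_weight=0.4, dice_weight=0.6):
    """Weighted sum of cross-entropy and Dice loss values (plain floats here)."""
    return ce_weight * ce_loss + dice_weight * dice_loss


def poly_lr(base_lr, iteration, max_iterations, power=0.9):
    """Polynomial decay from base_lr down to 0; power=0.9 is an assumed value."""
    return base_lr * (1.0 - iteration / max_iterations) ** power


def should_evaluate(epoch, max_epochs, eval_interval):
    """Decide whether to evaluate after a 0-indexed epoch (assumed convention)."""
    if epoch >= max_epochs - 3:         # always evaluate the last 3 epochs
        return True
    if epoch < max_epochs / 1.5:        # evaluation has not started yet
        return False
    return (epoch + 1) % eval_interval == 0


# Defaults from the argument table: 200 epochs, evaluation every 25 epochs.
eval_epochs = [e for e in range(200) if should_evaluate(e, 200, 25)]
print(eval_epochs)
print(poly_lr(0.02, 0, 1000), poly_lr(0.02, 500, 1000), poly_lr(0.02, 1000, 1000))
```

Under these assumptions, a default 200-epoch run is evaluated only five times, all in the final third of training, which keeps the cost of 3D volume inference low.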

Testing

Unified entry point test.py:

python test.py --model ResUnet1_0 --dataset Synapse \
    --model_name vResUnet1_0_v1_0_0_SGD_0.02_200_8_Synapse/vResUnet1_0_v1_0_0_epoch_199.pth

python test.py --model EMCAD --dataset AVT \
    --model_name vEMCADNet_v1_0_0_SGD_0.02_200_8_AVT/vEMCADNet_v1_0_0_epoch_199.pth

# Laptop smoke test (only first case, no .nii.gz output)
python test.py --model ResUnet1_0 --dataset Synapse \
    --model_name vResUnet1_0_v1_0_0_SGD_0.02_200_8_Synapse/vResUnet1_0_v1_0_0_epoch_199.pth \
    --max_cases 1 --is_savenii False

| Argument | Description |
|---|---|
| `--model` | Required, must match the model used during training |
| `--dataset` | `Synapse` or `AVT` |
| `--model_name` | Path relative to `checkpoints_save/` |
| `--max_cases` | Run only the first N cases; `0` = all (default) |
| `--is_savenii` | `True` to save `.nii.gz` predictions (default `False`) |

The legacy test_Synapse.py / test_AVT.py / test_*_TransUnet.py are kept, but test.py is preferred — it shares the same MODEL_REGISTRY as train.py, so the model class never gets mismatched with the weights.
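The `--model_name` values in the examples above follow a regular pattern, so they can be assembled rather than typed by hand. The helper below is a hypothetical convenience sketch inferred from those example paths. It is not part of the repository, and how the run-name prefix (for example `vResUnet1_0_v1_0_0`) is derived from `--model` is not covered here.

```python
from pathlib import PurePosixPath


def checkpoint_relpath(run_name, base_lr, max_epochs, batch_size, dataset, epoch):
    """Build a checkpoint path relative to checkpoints_save/.

    The layout is inferred from the example commands; treat it as an assumption.
    """
    run_dir = f"{run_name}_SGD_{base_lr}_{max_epochs}_{batch_size}_{dataset}"
    return PurePosixPath(run_dir) / f"{run_name}_epoch_{epoch}.pth"


path = checkpoint_relpath("vResUnet1_0_v1_0_0", 0.02, 200, 8, "Synapse", 199)
print(path)
```

The printed path can be passed directly as `--model_name`. Checking that the file exists under `checkpoints_save/` before launching a long evaluation is a cheap safeguard.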

Metrics

Dice Score (DSC): Overlap between prediction and ground truth (higher is better)
95% Hausdorff Distance (HD95): Boundary distance (lower is better)

Computed via medpy.metric.binary. Results reported per-class and averaged over foreground classes.
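To make the reporting convention concrete, here is a framework-free sketch of per-class Dice and its foreground average on flattened label maps. It only illustrates the definition and is not a replacement for `medpy.metric.binary`. Returning 0.0 when a class is absent from both prediction and ground truth is an assumed convention, and the function names are hypothetical.

```python
def dice_score(pred, target, cls):
    """Dice overlap for one class on flattened integer label sequences."""
    pred_mask = [p == cls for p in pred]
    target_mask = [t == cls for t in target]
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    denominator = sum(pred_mask) + sum(target_mask)
    if denominator == 0:
        return 0.0                      # assumed convention for an absent class
    return 2.0 * intersection / denominator


def mean_foreground_dice(pred, target, num_classes):
    """Average Dice over classes 1..num_classes-1 (class 0 is background)."""
    scores = [dice_score(pred, target, c) for c in range(1, num_classes)]
    return sum(scores) / len(scores)


# Toy example: 6 pixels, 3 classes (0 = background).
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
per_class = [dice_score(pred, target, c) for c in (1, 2)]
print(per_class, mean_foreground_dice(pred, target, num_classes=3))
```

In the toy example, class 1 scores 2/3 and class 2 scores 0.8, so the foreground mean is about 0.733. Background is excluded from the average, matching the "averaged over foreground classes" convention.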

Project Structure

ResUnet/
├── train.py / test.py            # ★ Unified entry points (recommended)
├── train_Synapse_*.py / train_AVT_*.py / test_*.py  # Legacy scripts, kept for reference
├── Synapse_u/                    # Core library
│   ├── trainer.py                # Training loop
│   ├── utils.py                  # DiceLoss, metrics, 3D inference
│   ├── datasets/                 # Dataset classes
│   ├── model/                    # All model implementations
│   │   ├── ResUnet0_1.py ~ 0_7.py, ResUnet1_0.py    # Ablation ladder (each file documents its role)
│   │   ├── ResUnetpp.py, Unet.py                    # Baselines
│   │   ├── v3_8_19.py, v3_8_19_2.py                 # PPM exploration (not used in paper)
│   │   ├── EMCAD/                # EMCADNet
│   │   ├── TransUNet/            # ViT hybrid
│   │   └── SwinUnet/             # Swin Transformer
│   └── lists/                    # Train/test splits
├── ACDC/                         # ACDC dataset standalone subproject (kept as-is)
├── checkpoints_save/             # Training outputs (auto-generated, gitignored)
└── data/ResCNN/data/             # Dataset storage (gitignored)

Checkpoint Directory

Training outputs at checkpoints_save/<model_name>_...:

<name>_epoch_<N>.pth — Model weights at epoch N
log/ — TensorBoard event files
<name>_<timestamp>_loss.csv — Training loss
<name>_<timestamp>_dice.png / _hd95.png — Metric plots
<name>_<timestamp>_results.csv — Evaluation data

Notes

Input: single-channel grayscale → replicated to 3 channels for ImageNet-pretrained encoder compatibility
TransUNet requires additional ViT pretrained weights — paths in vit_seg_configs.py
Multi-GPU supported via --n_gpu and nn.DataParallel
ACDC dataset uses a separate codebase under ACDC/; use Synapse_u/ as the primary reference

Citation

If you find this work useful, please cite:

@mastersthesis{resunet2026,
  title={ResUnet: Fixing ResNet's Adaptation Defects for Medical Image Segmentation},
  year={2026},
  school={Shaoxing University}
}
