
🩺 To What Extent Do Token-Level Representations from Pathology Foundation Models Improve Dense Prediction?

Weiming Chen*1, Xitong Ling*1, Xidong Wang2, Zhenyang Cai2, Yijia Guo3, Mingxi Fu1, Ziyi Zeng2, Minxi Ouyang1, Jiawen Li1, Yizhi Wang1, Tian Guan1, Benyou Wang#2, Yonghong He#1

* Equal contribution # Corresponding authors

1Tsinghua University, Shenzhen • 2CUHK, Shenzhen • 3Peking University, Beijing


A comprehensive semantic segmentation framework built on Pathology Foundation Models (PFMs) for pathological image analysis. It supports multiple state-of-the-art PFMs and provides complete training, inference, and evaluation pipelines.

🌟 Features

  • 🧬 Support for SOTA Pathology Foundation Models: uni_v1, uni_v2, virchow_v1, virchow_v2, conch_v1_5, conch_v1, midnight12k, lunit_vits8, musk, PathOrchestra, gigapath, phikon, patho3dmatrix-vision, phikon_v2, hoptimus_0, hoptimus_1, kaiko-vitl14, hibou_l
  • 🔧 Flexible Fine-tuning Strategies: LoRA, DoRA, full-parameter fine-tuning, frozen backbone, CNN adapter, Transformer adapter
  • 📊 Complete Training Pipeline: Mixed-precision training, learning-rate scheduling, gradient accumulation
  • 🎯 Advanced Data Augmentation: 10+ built-in augmentations covering spatial, color, and noise transformations
  • 📈 Comprehensive Evaluation Metrics: 10+ built-in metrics, including IoU and Dice
  • ⚡ Advanced Inference Pipeline: Sliding-window inference at arbitrary input resolutions


πŸ“ Dataset Format

JSON Configuration File Format

The dataset uses JSON format for configuration, supporting train, validation, and test set splits:

{
  "num_classes": 3,
  "data": {
    "train": [
      {
        "image_path": "/path/to/train/image1.jpg",
        "mask_path": "/path/to/train/mask1.png"
      }
    ],
    "val": [
      {
        "image_path": "/path/to/val/image1.jpg",
        "mask_path": "/path/to/val/mask1.png"
      }
    ],
    "test": [
      {
        "image_path": "/path/to/test/image1.jpg",
        "mask_path": "/path/to/test/image2.png" 
      }
    ]
  }
}

During training, only the train and val fields are used; the test field is used by the inference scripts. The mask_path entries in the test field may be null or missing, in which case no metrics are computed. If mask_path is present, metrics are calculated automatically after inference.
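
For reference, here is a minimal Python sketch that assembles a JSON file in this format. The directory layout, file extensions, and the helper name are assumptions for illustration, not part of the repository:

import json
from pathlib import Path

def build_split(image_dir, mask_dir):
    entries = []
    for img in sorted(Path(image_dir).glob("*.jpg")):
        mask = Path(mask_dir) / (img.stem + ".png")
        entries.append({
            "image_path": str(img),
            # null is allowed for the test split when no ground truth exists
            "mask_path": str(mask) if mask.exists() else None,
        })
    return entries

config = {
    "num_classes": 3,
    "data": {
        "train": build_split("data/train/images", "data/train/masks"),
        "val": build_split("data/val/images", "data/val/masks"),
        "test": build_split("data/test/images", "data/test/masks"),
    },
}

with open("dataset.json", "w") as f:
    json.dump(config, f, indent=2)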

⚙️ Configuration File Details

The configuration file uses YAML format and includes the following main sections:

Dataset Configuration (dataset)

dataset:
  json_file: "/path/to/dataset.json"  # Path to dataset JSON configuration file
  num_classes: 3                      # Number of classes, must match JSON file
  ignore_index: 255                   # Pixel value to ignore for uncertain regions
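
Pixels labeled with ignore_index are excluded from the loss and metrics; in PyTorch terms this corresponds to something like the following (a sketch, assuming a cross-entropy-style loss):

import torch.nn as nn

# ignore_index here must match dataset.ignore_index in the YAML config
criterion = nn.CrossEntropyLoss(ignore_index=255)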

System Configuration (system)

system:
  num_workers: 4          # Number of processes for data loading
  pin_memory: true        # Whether to use pin_memory for faster data transfer
  seed: 42               # Random seed for reproducible experiments
  device: "cuda:0"       # Device to use
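
The seed setting typically maps to seeding every random-number source in the stack; a sketch of what that usually involves (the repository's exact seeding routine may differ):

import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Seed Python, NumPy, and PyTorch (CPU and all CUDA devices)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)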

Pathology Foundation Model Configuration (model) 🧬

This is the most important section, controlling the selection and configuration of pathology foundation models:

model:
  # === Base Model Selection ===
  pfm_name: "uni_v1"                    # Pathology foundation model name
  # Options:
  # - "uni_v1"       : UNI model version 1 (1024 dim)
  # - "uni_v2"       : UNI model version 2 (1536 dim)
  # - "conch_v1"     : Conch model version 1 (768 dim)
  # - "conch_v1_5"   : Conch model version 1.5 (1024 dim)
  # - "virchow_v1"   : Virchow model version 1 (1280 dim)
  # - "virchow_v2"   : Virchow model version 2 (1280 dim)
  # - "phikon"       : Phikon model (768 dim)
  # - "phikon_v2"    : Phikon-v2 model (1024 dim)
  # - "hoptimus_0"   : H-Optimus-0 model (1536 dim)
  # - "hoptimus_1"   : H-Optimus-1 model (1536 dim)
  # - "gigapath"     : Gigapath model (1536 dim)
  # - "midnight12k"  : Midnight-12k model (1536 dim)
  # - "kaiko-vitl14" : Kaiko-ViT-L14 model (1024 dim)
  # - "lunit_vits8"  : Lunit-S8 model (384 dim)
  # - 'musk'         : MUSK model (1024 dim)
  # - "patho3dmatrix-vision": Patho3DMatrix-Vision model (1024 dim)
  # - "PathOrchestra": PathOrchestra model (1024 dim)
  # - "hibou_l"     : Hibou-Large model (1024 dim)
  
  
  # === Model Parameter Configuration ===
  emb_dim: 1024                         # Embedding dimension, must match selected PFM model
  # Corresponding embedding dimensions for each model:
  # midnight12k/hoptimus_0/hoptimus_1/uni_v2/gigapath: 1536
  # virchow_v1/virchow_v2: 1280
  # uni_v1/hibou_l/musk/phikon_v2/kaiko-vitl14/patho3dmatrix-vision/PathOrchestra/conch_v1_5: 1024
  # conch_v1/phikon: 768
  # lunit_vits8: 384
  
  pfm_weights_path: '/path/to/pytorch_model.bin'  # Path to pre-trained weights file
  
  # === Fine-tuning Strategy Configuration ===
  finetune_mode:
    type: "lora"          # Fine-tuning mode
    # Options:
    # - "lora"   : LoRA low-rank adaptation, parameter efficient
    # - "dora"   : DoRA adaptation, parameter efficient
    # - "full"   : Full parameter fine-tuning, best performance but requires more memory
    # - "frozen" : Frozen backbone, only train segmentation head
    # - "cnn_adapter" : CNN adapter fine-tuning
    # - "transformer_adapter" : Transformer adapter fine-tuning
    
    rank: 16              # LoRA/DoRA rank, only used when type is "lora" or "dora"
    alpha: 16             # LoRA/DoRA scaling factor, only used when type is "lora" or "dora"
  
  num_classes: 3                  # Number of segmentation classes, must match dataset.num_classes
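
To illustrate what the "lora" mode amounts to, here is a minimal sketch using the Hugging Face peft library on a generic timm ViT stand-in; the repository's own wrapping code may differ, and the backbone name and target module are assumptions:

import timm
from peft import LoraConfig, get_peft_model

# Generic ViT stand-in for a PFM backbone (assumption, not an actual PFM)
backbone = timm.create_model("vit_large_patch16_224", pretrained=False)

lora_cfg = LoraConfig(
    r=16,                     # finetune_mode.rank
    lora_alpha=16,            # finetune_mode.alpha
    target_modules=["qkv"],   # fused attention projection in timm ViT blocks
)
backbone = get_peft_model(backbone, lora_cfg)
backbone.print_trainable_parameters()  # only the low-rank adapters are trainable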

Training Configuration (training)

training:
  # === Basic Training Parameters ===
  batch_size: 8           # Batch size
  epochs: 100             # Number of training epochs
  learning_rate: 0.001     # Initial learning rate
  weight_decay: 0.0001    # Weight decay
  
  # === Training Optimization Settings ===
  use_amp: true                    # Whether to use mixed precision training
  accumulate_grad_batches: 1       # Number of gradient accumulation steps
  clip_grad_norm: 5.0              # Gradient clipping threshold
  
  # === Data Augmentation Configuration ===
  augmentation:
    RandomResizedCropSize: 512     # Random crop size
    # Note: Different PFM models have input size requirements
    # virchow_v1,virchow_v2,uni_v2,midnight12k,kaiko-vitl14,hibou_l,hoptimus_0,hoptimus_1: must be a multiple of 14 (token_size) 
    # uni_v1,conch_v1_5,gigapath,conch_v1,phikon,phikon_v2,patho3dmatrix-vision,PathOrchestra: must be a multiple of 16 (token_size) 
    # lunit_vits8: must be a multiple of 8 (token_size)
    # special: musk: 384
  
  # === Optimizer Configuration ===
  optimizer:
    type: "SGD"                    # Optimizer type: SGD, Adam, AdamW
    momentum: 0.9                  # SGD momentum (SGD only)
    nesterov: true                 # Whether to use Nesterov momentum
  
  # === Learning Rate Scheduler ===
  scheduler:
    type: "cosine"                 # Scheduler type: cosine, step
    warmup_epochs: 2               # Number of warmup epochs
  
  # === Loss Function ===
  loss:
    type: "dice"          # Loss function: cross_entropy, dice, ohem, iou

Validation Configuration (validation)

validation:
  eval_interval: 1        # Validate every N epochs
  batch_size: 16          # Validation batch size
  augmentation:
    ResizedSize: 512      # Image size during validation
    # Note: Different PFM models have input size requirements
    # virchow_v1,virchow_v2,uni_v2,midnight12k,kaiko-vitl14,hibou_l,hoptimus_0,hoptimus_1: must be a multiple of 14 (token_size) 
    # uni_v1,conch_v1_5,gigapath,conch_v1,phikon,phikon_v2,patho3dmatrix-vision,PathOrchestra: must be a multiple of 16 (token_size) 
    # lunit_vits8: must be a multiple of 8 (token_size)
    # special: musk: 384
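
A quick sanity check for the token-size constraints noted above; the mapping is copied directly from these comments (a convenience sketch, not a repository utility):

TOKEN_SIZE = {
    "virchow_v1": 14, "virchow_v2": 14, "uni_v2": 14, "midnight12k": 14,
    "kaiko-vitl14": 14, "hibou_l": 14, "hoptimus_0": 14, "hoptimus_1": 14,
    "uni_v1": 16, "conch_v1_5": 16, "gigapath": 16, "conch_v1": 16,
    "phikon": 16, "phikon_v2": 16, "patho3dmatrix-vision": 16,
    "PathOrchestra": 16, "lunit_vits8": 8,
}

def check_input_size(pfm_name: str, size: int) -> None:
    if pfm_name == "musk":
        assert size == 384, "musk expects a fixed 384x384 input"
    else:
        ts = TOKEN_SIZE[pfm_name]
        assert size % ts == 0, f"{size} is not a multiple of token size {ts}"

check_input_size("uni_v1", 512)       # OK: 512 = 32 * 16
check_input_size("virchow_v2", 518)   # OK: 518 = 37 * 14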

Logging and Visualization Configuration

logging:
  log_dir: "/path/to/logs"           # Log save directory
  experiment_name: "pfm_segmentation" # Experiment name

visualization:
  save_interval: 2        # Save visualization results every N epochs
  num_vis_samples: 8      # Number of visualization samples to save

🚀 Training Script Usage

Basic Training Command

python scripts/train.py --config configs/config.yaml

Training Script Parameters

python scripts/train.py \
    --config configs/config.yaml \      # Configuration file path
    --resume checkpoints/model.pth \    # Resume training from checkpoint (optional)
    --device cuda:0                     # Specify device (optional, overrides config file)

Parameter Description

  • --config: Required. Path to the configuration file containing all training settings
  • --resume: Optional. Checkpoint file path for resuming interrupted training
  • --device: Optional. Training device; overrides the device setting in the config file

Training Output

During training, the following files will be generated:

logs/experiment_name/
├── config.yaml                 # Saved copy of configuration file
├── training.log                # Training log
├── checkpoints/                # Model checkpoints
│   ├── best_model.pth          # Best model
├── visualizations/             # Visualization results
│   ├── epoch_010_sample_00.png
│   └── ...
└── training_history.png        # Training curve plot

Training Monitoring

During training, the following will be displayed:

  • Training loss and validation loss
  • Validation metrics (mIoU, Pixel Accuracy, etc.)
  • Learning rate changes
  • Time consumption per epoch

🔍 Inference Script Usage

Basic Inference Command

python scripts/infer.py \
    --config logs/experiment_name/config.yaml \
    --checkpoint logs/experiment_name/checkpoints/best_model.pth \
    --input_json dataset/test.json \
    --output_dir results/

Inference Script Parameters

python scripts/infer.py \
    --config CONFIG_PATH \              # Configuration file used during training
    --checkpoint CHECKPOINT_PATH \      # Trained model weights
    --input_json INPUT_JSON \           # Input data JSON file
    --output_dir OUTPUT_DIR \           # Results save directory
    --device cuda:0 \                   # Inference device
    --input_size 512 \                  # Input image size
    --resize_or_windowslide windowslide \ # Inference mode
    --batch_size 4                      # Inference batch size

Detailed Parameter Description

| Parameter | Type | Required | Description |
|---|---|---|---|
| --config | str | ✅ | Configuration file path used during training |
| --checkpoint | str | ✅ | Path to a model checkpoint file or checkpoint directory. In LoRA/DoRA mode, both the base model and the LoRA/DoRA weights are loaded automatically. |
| --input_json | str | ✅ | JSON file containing the data to run inference on |
| --output_dir | str | ✅ | Directory where inference results are saved |
| --device | str | ❌ | Inference device (default: cuda:0) |
| --input_size | int | ❌ | Model input size, not the original image size |
| --resize_or_windowslide | str | ❌ | Inference mode (default: windowslide) |
| --batch_size | int | ❌ | Inference batch size (default: 2) |

Inference Mode Selection

  1. Resize Mode (--resize_or_windowslide resize)

    • Resizes input images to a fixed size (input_size) for inference
    • Resizes predictions back to the original image size afterwards
  2. Window Slide Mode (--resize_or_windowslide windowslide)

    • Processes large images with a sliding window of size input_size (see the sketch after this list)
    • Maintains the original resolution, yielding higher accuracy
    • Merges window predictions back to the original image size
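
A rough sketch of what window-slide mode does: tile the image with overlapping windows, run the model per tile, and average overlapping logits before the per-pixel argmax. The stride, blending scheme, and function name below are assumptions; scripts/infer.py may differ:

import torch

def slide_inference(model, image, window=512, stride=256, num_classes=3):
    """image: a (1, 3, H, W) tensor with H, W >= window."""
    _, _, H, W = image.shape

    def starts(extent):
        # Window offsets covering the extent; the last window is clamped to the edge
        s = list(range(0, extent - window + 1, stride))
        if s[-1] != extent - window:
            s.append(extent - window)
        return s

    logits = torch.zeros(1, num_classes, H, W, device=image.device)
    counts = torch.zeros(1, 1, H, W, device=image.device)
    with torch.no_grad():
        for top in starts(H):
            for left in starts(W):
                tile = image[:, :, top:top + window, left:left + window]
                logits[:, :, top:top + window, left:left + window] += model(tile)
                counts[:, :, top:top + window, left:left + window] += 1
    # Average overlapping predictions, then take the per-pixel argmax
    return (logits / counts).argmax(dim=1)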

Inference Output

After inference completion, the following will be generated:

output_dir/
├── predictions_masks/          # Prediction masks (grayscale images)
│   ├── image001.png
│   ├── image002.png
│   └── ...
└── predictions_overlays/       # Prediction result visualizations (colored overlay images)
    ├── image001.png
    ├── image002.png
    └── ...

Inference Result Format

  • Prediction Masks: Grayscale PNG images with pixel values corresponding to class indices
  • Visualization Overlays: Colored overlays of original images with prediction results for intuitive viewing
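
A sketch of how such an overlay can be produced from a grayscale class-index mask; the palette, alpha value, and file names are assumptions, and the repository's own visualization code may differ:

import numpy as np
from PIL import Image

# Hypothetical 3-class palette: background, class 1, class 2
PALETTE = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0]], dtype=np.uint8)

def make_overlay(image_path, mask_path, alpha=0.5):
    img = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float32)
    mask = np.asarray(Image.open(mask_path))      # pixel values = class indices
    color = PALETTE[mask].astype(np.float32)      # look up a color per pixel
    blended = (1 - alpha) * img + alpha * color
    return Image.fromarray(blended.astype(np.uint8))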

🧬 Pathology Foundation Models Details

Supported Models List

| Model Name | Parameters | Embedding Dim | Token Size | HuggingFace |
|---|---|---|---|---|
| UNI | 307M | 1024 | 16×16 | MahmoodLab/UNI |
| UNI2-h | 1.1B | 1536 | 14×14 | MahmoodLab/UNI2-h |
| CONCH | 90M | 768 | 16×16 | MahmoodLab/CONCH |
| CONCHv1.5 | 307M | 1024 | 16×16 | MahmoodLab/conchv1_5 |
| Virchow | 632M | 1280 | 14×14 | paige-ai/Virchow |
| Virchow2 | 632M | 1280 | 14×14 | paige-ai/Virchow2 |
| Phikon | 85.8M | 768 | 16×16 | owkin/phikon |
| Phikon-v2 | 300M | 1024 | 16×16 | owkin/phikon-v2 |
| Prov-Gigapath | 1.1B | 1536 | 16×16 | prov-gigapath/prov-gigapath |
| H-Optimus-0 | 1.1B | 1536 | 14×14 | bioptimus/H-optimus-0 |
| H-Optimus-1 | 1.1B | 1536 | 14×14 | bioptimus/H-optimus-1 |
| MUSK | - | 1024 | 32×32 | xiangjx/musk |
| Midnight-12k | - | 1536 | 14×14 | kaiko-ai/midnight |
| Kaiko | Various | 384/768/1024 | 8×8, 14×14, or 16×16 | 1aurent/kaikoai-models-66636c99d8e1e34bc6dcf795 |
| Lunit | 21.7M | 384 | 8×8 | 1aurent/vit_small_patch8_224.lunit_dino |
| Hibou | - | 1024 | 14×14 | histai/hibou-L |
| PathOrchestra | 307M | 1024 | 16×16 | AI4Pathology/PathOrchestra |
| patho3dmatrix-vision | 307M | 1024 | 16×16 | - |
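
For dense prediction, the token size determines the spatial resolution of the feature grid that the segmentation head upsamples back to full resolution. For example, a 16×16-token model on a 512×512 input yields a 32×32 token grid:

# Token-grid arithmetic: input side / token side = feature-grid side
input_size, token_size = 512, 16
grid_side = input_size // token_size   # 512 / 16 = 32
num_tokens = grid_side ** 2            # 32 * 32 = 1024 patch tokens per image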

📖 Citation

If you find this work useful, please consider citing:

@misc{chen2026extenttokenlevelrepresentationspathology,
  title={To What Extent Do Token-Level Representations from Pathology Foundation Models Improve Dense Prediction?},
  author={Weiming Chen and Xitong Ling and Xidong Wang and Zhenyang Cai and Yijia Guo and Mingxi Fu and Ziyi Zeng and Minxi Ouyang and Jiawen Li and Yizhi Wang and Tian Guan and Benyou Wang and Yonghong He},
  year={2026},
  eprint={2602.03887},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2602.03887},
}

🤝 Contributing

Issues and feature requests are welcome! Please see the contribution guidelines for more information.

📞 Contact

If you have questions or suggestions, please contact us.

