🩺 To What Extent Do Token-Level Representations from Pathology Foundation Models Improve Dense Prediction?
Weiming Chen*1, Xitong Ling*1, Xidong Wang2, Zhenyang Cai2, Yijia Guo3, Mingxi Fu1, Ziyi Zeng2, Minxi Ouyang1, Jiawen Li1, Yizhi Wang1, Tian Guan1, Benyou Wang#2, Yonghong He#1
* Equal contribution&nbsp;&nbsp;&nbsp;# Corresponding authors
1Tsinghua University, Shenzhen • 2CUHK, Shenzhen • 3Peking University, Beijing
A comprehensive semantic segmentation framework built on Pathology Foundation Models (PFMs) and designed for pathological image analysis. It supports multiple state-of-the-art pathology foundation models and provides complete training, inference, and evaluation pipelines.
- 🧬 Support for SOTA Pathology Foundation Models: uni_v1, uni_v2, virchow_v1, virchow_v2, conch_v1_5, conch_v1, midnight12k, lunit_vits8, musk, PathOrchestra, gigapath, phikon, patho3dmatrix-vision, phikon_v2, hoptimus_0, hoptimus_1, kaiko-vitl14, hibou_l
- 🔧 Flexible Fine-tuning Strategies: LoRA, DoRA, full parameter fine-tuning, frozen backbone, CNN adapter, Transformer adapter
- 🚀 Complete Training Pipeline: mixed-precision training, learning-rate scheduling, gradient accumulation
- 🎯 Advanced Data Augmentation: 10+ augmentations covering spatial, color, and noise transformations
- 📊 Comprehensive Evaluation Metrics: 10+ metrics including IoU, Dice, and more
- ⚡ Advanced Inference Pipeline: sliding-window inference at arbitrary resolutions
- Dataset Format
- Configuration File Details
- Training Script Usage
- Inference Script Usage
- Pathology Foundation Models Details
The dataset uses JSON format for configuration, supporting train, validation, and test set splits:
```json
{
  "num_classes": 3,
  "data": {
    "train": [
      {
        "image_path": "/path/to/train/image1.jpg",
        "mask_path": "/path/to/train/mask1.png"
      }
    ],
    "val": [
      {
        "image_path": "/path/to/val/image1.jpg",
        "mask_path": "/path/to/val/mask1.png"
      }
    ],
    "test": [
      {
        "image_path": "/path/to/test/image1.jpg",
        "mask_path": "/path/to/test/mask1.png"
      }
    ]
  }
}
```

During training, only the `train` and `val` fields are used; the `test` field is used by the inference scripts. In `test`, `mask_path` may be null or missing, in which case no metrics are computed. If `mask_path` is present, metrics are calculated automatically after inference.
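The schema above can be sanity-checked with a few lines of Python. This is a minimal sketch (the helper name `validate_dataset_json` is hypothetical; the training code may perform stricter validation):

```python
import json

def validate_dataset_json(path):
    """Lightweight sanity check of a dataset JSON file against the
    schema described above. A sketch, not the framework's own loader."""
    with open(path) as f:
        cfg = json.load(f)
    assert isinstance(cfg.get("num_classes"), int), "num_classes must be an int"
    # train/val entries need both an image and a mask
    for split in ("train", "val"):
        for item in cfg["data"][split]:
            assert "image_path" in item and item.get("mask_path")
    # test entries may omit mask_path (metrics are then skipped)
    for item in cfg["data"].get("test", []):
        assert "image_path" in item
    return cfg
```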
The configuration file uses YAML format and includes the following main sections:
```yaml
dataset:
  json_file: "/path/to/dataset.json"  # Path to the dataset JSON configuration file
  num_classes: 3                      # Number of classes; must match the JSON file
  ignore_index: 255                   # Pixel value to ignore (e.g., uncertain regions)

system:
  num_workers: 4        # Number of data-loading processes
  pin_memory: true      # Use pinned memory for faster host-to-GPU transfer
  seed: 42              # Random seed for reproducible experiments
  device: "cuda:0"      # Device to use
```

The `model` section is the most important one, controlling the selection and configuration of the pathology foundation model:

```yaml
model:
  # === Base Model Selection ===
  pfm_name: "uni_v1"    # Pathology foundation model name
  # Options:
  #   - "uni_v1"               : UNI v1 (1024 dim)
  #   - "uni_v2"               : UNI v2 (1536 dim)
  #   - "conch_v1"             : CONCH v1 (768 dim)
  #   - "conch_v1_5"           : CONCH v1.5 (1024 dim)
  #   - "virchow_v1"           : Virchow v1 (1280 dim)
  #   - "virchow_v2"           : Virchow v2 (1280 dim)
  #   - "phikon"               : Phikon (768 dim)
  #   - "phikon_v2"            : Phikon-v2 (1024 dim)
  #   - "hoptimus_0"           : H-Optimus-0 (1536 dim)
  #   - "hoptimus_1"           : H-Optimus-1 (1536 dim)
  #   - "gigapath"             : Gigapath (1536 dim)
  #   - "midnight12k"          : Midnight-12k (1536 dim)
  #   - "kaiko-vitl14"         : Kaiko ViT-L/14 (1024 dim)
  #   - "lunit_vits8"          : Lunit ViT-S/8 (384 dim)
  #   - "musk"                 : MUSK (1024 dim)
  #   - "patho3dmatrix-vision" : Patho3DMatrix-Vision (1024 dim)
  #   - "PathOrchestra"        : PathOrchestra (1024 dim)
  #   - "hibou_l"              : Hibou-Large (1024 dim)

  # === Model Parameters ===
  emb_dim: 1024         # Embedding dimension; must match the selected PFM
  # Embedding dimensions per model:
  #   midnight12k / hoptimus_0 / hoptimus_1 / uni_v2 / gigapath: 1536
  #   virchow_v1 / virchow_v2: 1280
  #   uni_v1 / hibou_l / musk / phikon_v2 / kaiko-vitl14 / patho3dmatrix-vision / PathOrchestra / conch_v1_5: 1024
  #   conch_v1 / phikon: 768
  #   lunit_vits8: 384
  pfm_weights_path: '/path/to/pytorch_model.bin'  # Path to the pre-trained weights file

  # === Fine-tuning Strategy ===
  finetune_mode:
    type: "lora"        # Fine-tuning mode
    # Options:
    #   - "lora"                : LoRA low-rank adaptation, parameter-efficient
    #   - "dora"                : DoRA adaptation, parameter-efficient
    #   - "full"                : Full fine-tuning; best performance but highest memory use
    #   - "frozen"              : Frozen backbone; only the segmentation head is trained
    #   - "cnn_adapter"         : CNN adapter fine-tuning
    #   - "transformer_adapter" : Transformer adapter fine-tuning
    rank: 16            # LoRA/DoRA rank; used only when type is "lora" or "dora"
    alpha: 16           # LoRA/DoRA scaling factor; used only when type is "lora" or "dora"
  num_classes: 3        # Number of segmentation classes; must match dataset.num_classes

training:
  # === Basic Training Parameters ===
  batch_size: 8                # Batch size
  epochs: 100                  # Number of training epochs
  learning_rate: 0.001         # Initial learning rate
  weight_decay: 0.0001         # Weight decay

  # === Training Optimization ===
  use_amp: true                # Mixed-precision training
  accumulate_grad_batches: 1   # Gradient accumulation steps
  clip_grad_norm: 5.0          # Gradient clipping threshold

  # === Data Augmentation ===
  augmentation:
    RandomResizedCropSize: 512 # Random crop size
    # Note: each PFM requires the input size to be a multiple of its token size:
    #   virchow_v1, virchow_v2, uni_v2, midnight12k, kaiko-vitl14, hibou_l, hoptimus_0, hoptimus_1: multiple of 14
    #   uni_v1, conch_v1_5, gigapath, conch_v1, phikon, phikon_v2, patho3dmatrix-vision, PathOrchestra: multiple of 16
    #   lunit_vits8: multiple of 8
    #   musk (special case): 384

  # === Optimizer ===
  optimizer:
    type: "SGD"                # Optimizer type: SGD, Adam, AdamW
    momentum: 0.9              # Momentum (SGD only)
    nesterov: true             # Use Nesterov momentum

  # === Learning Rate Scheduler ===
  scheduler:
    type: "cosine"             # Scheduler type: cosine, step
    warmup_epochs: 2           # Number of warmup epochs

  # === Loss Function ===
  loss:
    type: "dice"               # Loss function: cross_entropy, dice, ohem, iou

validation:
  eval_interval: 1             # Validate every N epochs
  batch_size: 16               # Validation batch size
  augmentation:
    ResizedSize: 512           # Image size during validation
    # Note: the same input-size requirements as in training.augmentation apply

logging:
  log_dir: "/path/to/logs"             # Log save directory
  experiment_name: "pfm_segmentation"  # Experiment name
  visualization:
    save_interval: 2           # Save visualization results every N epochs
    num_vis_samples: 8         # Number of visualization samples to save
```

Basic usage:

```bash
python scripts/train.py --config configs/config.yaml
```

Full usage:

```bash
python scripts/train.py \
    --config configs/config.yaml \
    --resume checkpoints/model.pth \
    --device cuda:0
```

- `--config` (required): configuration file path containing all training settings
- `--resume` (optional): checkpoint file path for resuming interrupted training
- `--device` (optional): training device; overrides the `device` setting in the config file
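To make the `rank` and `alpha` options in `finetune_mode` concrete: LoRA adds a trainable low-rank update to each frozen weight matrix. A minimal numpy sketch of the forward pass (an illustration, not the framework's implementation):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """LoRA forward pass: y = W x + (alpha / r) * B (A x).

    W (out x in) is the frozen pretrained weight; A (r x in) and B (out x r)
    are the trainable low-rank matrices, with rank r and scale alpha / r.
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

# With the config defaults rank=16 and alpha=16, the scale alpha/r is 1.0
```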
During training, the following files will be generated:
```
logs/experiment_name/
├── config.yaml              # Saved copy of the configuration file
├── training.log             # Training log
├── checkpoints/             # Model checkpoints
│   └── best_model.pth       # Best model
├── visualizations/          # Visualization results
│   ├── epoch_010_sample_00.png
│   └── ...
└── training_history.png     # Training-curve plot
```
During training, the following will be displayed:
- Training loss and validation loss
- Validation metrics (mIoU, Pixel Accuracy, etc.)
- Learning rate changes
- Time consumption per epoch
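The validation mIoU can be sketched as follows — a simplified reimplementation that drops `ignore_index` pixels and averages IoU over classes present in either the prediction or the target (the framework's own metric code may differ in details):

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Per-class IoU averaged into mIoU, computed from flat label arrays."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_index          # drop ignored pixels
    pred, target = pred[keep], target[keep]
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:                      # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")
```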
```bash
python scripts/infer.py \
    --config logs/experiment_name/config.yaml \
    --checkpoint logs/experiment_name/checkpoints/best_model.pth \
    --input_json dataset/test.json \
    --output_dir results/
```

Full usage:

```bash
python scripts/infer.py \
    --config CONFIG_PATH \
    --checkpoint CHECKPOINT_PATH \
    --input_json INPUT_JSON \
    --output_dir OUTPUT_DIR \
    --device cuda:0 \
    --input_size 512 \
    --resize_or_windowslide windowslide \
    --batch_size 4
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| `--config` | str | ✓ | Configuration file used during training |
| `--checkpoint` | str | ✓ | Model checkpoint file or checkpoint directory; in LoRA/DoRA mode, the base model and the LoRA/DoRA weights are both loaded automatically |
| `--input_json` | str | ✓ | JSON file listing the data to run inference on |
| `--output_dir` | str | ✓ | Directory where inference results are saved |
| `--device` | str | ✗ | Inference device; default `cuda:0` |
| `--input_size` | int | ✗ | Input size fed to the model, not the original image size |
| `--resize_or_windowslide` | str | ✗ | Inference mode; default `windowslide` |
| `--batch_size` | int | ✗ | Inference batch size; default 2 |
- **Resize mode** (`--resize_or_windowslide resize`)
  - Resizes input images to a fixed size (`input_size`) before inference
  - Resizes predictions back to the original image size afterward
- **Window-slide mode** (`--resize_or_windowslide windowslide`)
  - Processes large images with a sliding window of size `input_size`
  - Maintains the original resolution, giving higher accuracy
  - Merges window predictions back to the original image size
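Window-slide mode amounts to tiling the image with windows and stitching the per-window predictions back together. A minimal coordinate generator is sketched below; the exact tiling convention (here, the last window along each axis is shifted back to end at the image border, with a `stride` parameter that the CLI does not expose) is an assumption:

```python
def sliding_windows(h, w, window, stride):
    """Top-left (y, x) coordinates of windows covering an h x w image."""
    ys = list(range(0, max(h - window, 0) + 1, stride))
    xs = list(range(0, max(w - window, 0) + 1, stride))
    # shift the last window back so it ends exactly at the border
    if h > window and ys[-1] != h - window:
        ys.append(h - window)
    if w > window and xs[-1] != w - window:
        xs.append(w - window)
    return [(y, x) for y in ys for x in xs]
```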
After inference completion, the following will be generated:
```
output_dir/
├── predictions_masks/       # Prediction masks (grayscale images)
│   ├── image001.png
│   ├── image002.png
│   └── ...
└── predictions_overlays/    # Prediction visualizations (colored overlays)
    ├── image001.png
    ├── image002.png
    └── ...
```
- Prediction Masks: Grayscale PNG images with pixel values corresponding to class indices
- Visualization Overlays: Colored overlays of original images with prediction results for intuitive viewing
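The overlay images can be produced roughly as below — blend a color-coded mask onto the original image. The three-class palette here is hypothetical; the framework's actual colors may differ:

```python
import numpy as np

# Hypothetical palette mapping class index -> RGB (background, class 1, class 2)
PALETTE = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0]], dtype=np.uint8)

def overlay(image, mask, alpha=0.5):
    """Blend a class-index mask onto an RGB image.

    image: H x W x 3 uint8; mask: H x W integer class indices.
    """
    color = PALETTE[mask]                       # H x W x 3 color-coded mask
    blended = (1 - alpha) * image + alpha * color
    return blended.astype(np.uint8)
```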
| Model Name | Parameters | Embedding Dim | Token Size | HuggingFace |
|---|---|---|---|---|
| UNI | 307M | 1024 | 16×16 | MahmoodLab/UNI |
| UNI2-h | 1.1B | 1536 | 14×14 | MahmoodLab/UNI2-h |
| CONCH | 90M | 768 | 16×16 | MahmoodLab/CONCH |
| CONCHv1.5 | 307M | 1024 | 16×16 | MahmoodLab/conchv1_5 |
| Virchow | 632M | 1280 | 14×14 | paige-ai/Virchow |
| Virchow2 | 632M | 1280 | 14×14 | paige-ai/Virchow2 |
| Phikon | 85.8M | 768 | 16×16 | owkin/phikon |
| Phikon-v2 | 300M | 1024 | 16×16 | owkin/phikon-v2 |
| Prov-Gigapath | 1.1B | 1536 | 16×16 | prov-gigapath/prov-gigapath |
| H-Optimus-0 | 1.1B | 1536 | 14×14 | bioptimus/H-optimus-0 |
| H-Optimus-1 | 1.1B | 1536 | 14×14 | bioptimus/H-optimus-1 |
| MUSK | - | 1024 | 32×32 | xiangjx/musk |
| Midnight-12k | - | 1536 | 14×14 | kaiko-ai/midnight |
| Kaiko | Various | 384/768/1024 | Various (8×8, 16×16, or 14×14) | 1aurent/kaikoai-models-66636c99d8e1e34bc6dcf795 |
| Lunit | 21.7M | 384 | 8×8 | 1aurent/vit_small_patch8_224.lunit_dino |
| Hibou | - | 1024 | 14×14 | histai/hibou-L |
| PathOrchestra | 307M | 1024 | 16×16 | AI4Pathology/PathOrchestra |
| patho3dmatrix-vision | 307M | 1024 | 16×16 | - |
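The token sizes in the table determine valid input resolutions: the input side length must be a multiple of the patch size, as noted in the augmentation settings. A small sketch (with a model subset chosen for illustration):

```python
# Patch (token) sizes for a few models from the table above
TOKEN_SIZE = {"uni_v1": 16, "uni_v2": 14, "virchow_v2": 14, "lunit_vits8": 8}

def is_valid_input_size(pfm_name, size):
    """True if a square input of side `size` tiles evenly into patches."""
    return size % TOKEN_SIZE[pfm_name] == 0

def num_tokens(pfm_name, size):
    """Number of patch tokens produced for a valid square input."""
    return (size // TOKEN_SIZE[pfm_name]) ** 2
```

For example, 512 works for uni_v1 (32 × 32 = 1024 tokens) but not for uni_v2, whose patch size of 14 requires sizes such as 518.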
If you find this work useful, please consider citing:
```bibtex
@misc{chen2026extenttokenlevelrepresentationspathology,
  title={To What Extent Do Token-Level Representations from Pathology Foundation Models Improve Dense Prediction?},
  author={Weiming Chen and Xitong Ling and Xidong Wang and Zhenyang Cai and Yijia Guo and Mingxi Fu and Ziyi Zeng and Minxi Ouyang and Jiawen Li and Yizhi Wang and Tian Guan and Benyou Wang and Yonghong He},
  year={2026},
  eprint={2602.03887},
  archivePrefix={arXiv},
  primaryClass={eess.IV},
  url={https://arxiv.org/abs/2602.03887},
}
```

Issues and feature requests are welcome! Please check the contribution guidelines for more information.
If you have questions or suggestions, please reach us via:
- GitHub Issues
- Email: lingxt23@mails.tsinghua.edu.cn or cwm25@mails.tsinghua.edu.cn