HS2P is an open-source project largely based on CLAM tissue segmentation and patching code.
System requirements: Linux-based OS (e.g., Ubuntu 22.04) with Python 3.11+ and Docker installed.
We recommend running the script inside a container using the latest hs2p image from Docker Hub:
docker pull waticlems/hs2p:latest
docker run --rm -it \
-v /path/to/your/data:/data \
waticlems/hs2p:latest
Replace /path/to/your/data with your local data directory.
Alternatively, you can install hs2p via pip:
pip install hs2p
- Create a .csv file containing paths to the desired slides. Optionally, you can provide paths to pre-computed tissue masks under the 'mask_path' column:
  wsi_path,mask_path
  /path/to/slide1.tif,/path/to/mask1.tif
  /path/to/slide2.tif,/path/to/mask2.tif
  ...
- Create a configuration file. A good starting point is the default configuration file under hs2p/configs/default.yaml, where parameters are documented.
- Kick off slide tiling:
  python3 -m hs2p.tiling --config-file </path/to/config.yaml>
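The slide list described above can also be generated programmatically. The following is a minimal sketch using only the Python standard library; the helper name write_slide_csv and the example paths are hypothetical, and omitting the mask_path column when no masks are supplied is an assumption (the text only says the column is optional).

```python
import csv
import tempfile
from pathlib import Path


def write_slide_csv(csv_path, slides, masks=None):
    """Write a CSV with a wsi_path column and, if masks are given, a mask_path column.

    Hypothetical helper: leaving the mask_path column out entirely when
    `masks` is None is an assumption about what hs2p accepts.
    """
    if masks is not None and len(masks) != len(slides):
        raise ValueError("slides and masks must have the same length")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        header = ["wsi_path"] + (["mask_path"] if masks is not None else [])
        writer.writerow(header)
        for i, slide in enumerate(slides):
            row = [str(slide)]
            if masks is not None:
                row.append(str(masks[i]))
            writer.writerow(row)


# Usage: write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "slides.csv"
    write_slide_csv(
        out,
        ["/path/to/slide1.tif", "/path/to/slide2.tif"],
        ["/path/to/mask1.tif", "/path/to/mask2.tif"],
    )
    with open(out, newline="") as f:
        rows = list(csv.reader(f))
```

The resulting file path is what the configuration file would then point to; the exact configuration key is documented in hs2p/configs/default.yaml.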
- Create a .csv file containing paths to the desired slides and associated annotation masks:
  wsi_path,mask_path
  /path/to/slide1.tif,/path/to/mask1.tif
  /path/to/slide2.tif,/path/to/mask2.tif
  ...
- Create a configuration file. A good starting point is the default configuration file under hs2p/configs/default.yaml, where parameters are documented.
- Kick off tile sampling:
  python3 -m hs2p.sampling --config-file </path/to/config.yaml>
Both tiling.py and sampling.py produce a similar output structure in the specified output directory.
The coordinates/ folder contains a .npy file for each successfully processed slide.
This file stores a numpy array of shape (num_tiles, 8) containing the following information for each tile:
- x: x-coordinate of the tile at level 0
- y: y-coordinate of the tile at level 0
- contour_index: index of the contour containing the tile (useful for masking non-tissue content)
- target_tile_size: requested tile size (in pixels)
- target_spacing: spacing at which the user requested the tile (in microns per pixel)
- tile_level: pyramid level at which the tile was extracted
- resize_factor: ratio between tile_size_resized and the requested tile size (target_tile_size), useful for resizing when loading the tile
- tile_size_resized: size of the tile at the extraction level (tile_level), which may differ from the requested tile size (target_tile_size) if the target spacing was not available
- tile_size_lv0: tile size scaled to the slide's level 0
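As an illustration of how the per-tile fields listed above might be read back, here is a minimal sketch. It is not hs2p's own loader: the helper to_columns is hypothetical, and whether the saved array is a plain 2D array or a structured array with named fields is not stated here, so the sketch accepts both. For a plain 2D array, treating the documented field order as the column order is an assumption to check against your own files.

```python
import tempfile
from pathlib import Path

import numpy as np

# Field names as listed above. Using this as the column order is an assumption.
FIELDS = [
    "x", "y", "contour_index", "target_tile_size", "target_spacing",
    "tile_level", "resize_factor", "tile_size_resized", "tile_size_lv0",
]


def to_columns(coords, fields=FIELDS):
    """Return a dict mapping field name to a 1D array of per-tile values."""
    if coords.dtype.names is not None:
        # Structured array: fields are addressable by name.
        return {name: coords[name] for name in coords.dtype.names}
    if coords.ndim != 2 or coords.shape[1] != len(fields):
        raise ValueError(
            f"expected {len(fields)} columns, got shape {coords.shape}; "
            "pass a `fields` list matching the file"
        )
    return {name: coords[:, i] for i, name in enumerate(fields)}


# Synthetic stand-in for a coordinates file (two tiles, made-up values).
demo = np.zeros(2, dtype=[(name, np.float64) for name in FIELDS])
demo["x"] = [0, 512]
demo["y"] = [0, 0]
demo["target_tile_size"] = 256
demo["target_spacing"] = 0.5
demo["resize_factor"] = 2.0
demo["tile_size_resized"] = 512
demo["tile_size_lv0"] = 512

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "slide1.npy"
    np.save(path, demo)
    cols = to_columns(np.load(path))

# resize_factor relates the extracted size to the requested size.
ratio = cols["tile_size_resized"] / cols["target_tile_size"]
```

In this synthetic example, ratio matches resize_factor, which is the relationship described for that field: a tile read at tile_size_resized pixels would be resized by that factor to reach target_tile_size.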
If visualize is set to true, a visualization/ folder is created containing low-resolution images to verify the results:
- mask/: visualizations of the provided tissue (or annotation) mask
- tiling/ (for tiling.py) or sampling/ (for sampling.py): visualizations of the extracted or sampled tiles overlaid on the slide. For sampling.py, this includes subfolders for each category defined in the sampling parameters (e.g., tumor, stroma, etc.)
Mask contour line thickness is automatically inferred from the whole-slide dimensions and the visualization level, so contour readability stays consistent across tiny biopsies and large resections.
For sampling visualizations, overlays are drawn only for annotations that have a non-null color in sampling_params.color_mapping. Annotations with null color are left untouched (raw slide pixels, no darkening overlay).
These visualizations are useful for double-checking that the tiling or sampling process ran as expected.
process_list.csv: a summary file listing each processed slide, indicating whether processing was successful or failed. If a failure occurred, the traceback is provided to help diagnose the issue.
For quick mask generation outside the full pipeline, use the standalone script:
python -m pip install tifffile # need extra tifffile deps
# Single slide
python scripts/generate_tissue_mask.py \
--wsi /path/to/slide.tif \
--output /path/to/tissue-mask-pyramid.tif \
--spacing 4.0 \
--tolerance 0.1
# Multiple slides
python scripts/generate_tissue_mask.py \
--wsi /path/to/slide_dir/*.tif \
--output-dir /path/to/output_dir \
--spacing 4.0 \
--tolerance 0.1
This script:
- reads the WSI with wholeslidedata
- computes a binary tissue mask using HSV thresholding (0=background, 1=tissue)
- uses a coarse-to-fine ROI shortcut by default to avoid loading the full target-spacing WSI into memory
- writes a pyramidal TIFF mask at a desired spacing, where each level is downsampled from the previous one
- prints a final recap of how many slides succeeded, skipped, and failed
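To make the thresholding and pyramid steps above concrete, here is a simplified NumPy sketch. It is not the script's implementation: it thresholds only the HSV saturation channel, the threshold value 0.1 is arbitrary, and the 2x nearest-neighbour downsampling per level is an assumption. The function names saturation_mask and build_pyramid are hypothetical.

```python
import numpy as np


def saturation_mask(rgb, sat_threshold=0.1):
    """Binary mask (0=background, 1=tissue) from the HSV saturation channel."""
    rgb = rgb.astype(np.float32) / 255.0
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    # HSV saturation is (max - min) / max, taken as 0 for black pixels.
    sat = np.where(cmax > 0, (cmax - cmin) / np.maximum(cmax, 1e-8), 0.0)
    return (sat > sat_threshold).astype(np.uint8)


def build_pyramid(mask, num_levels=3):
    """Return a list of masks, each a 2x downsample of the previous level."""
    levels = [mask]
    for _ in range(num_levels - 1):
        levels.append(levels[-1][::2, ::2])
    return levels


# Synthetic 8x8 image: white background with a pink 4x4 "tissue" square.
image = np.full((8, 8, 3), 255, dtype=np.uint8)
image[2:6, 2:6] = (200, 100, 150)

mask = saturation_mask(image)
pyramid = build_pyramid(mask, num_levels=3)
shapes = [level.shape for level in pyramid]
```

White background pixels have zero saturation and fall below the threshold, while the stained region does not. Real slides usually need smoothing and morphological cleanup on top of this, which is what several of the script's options address.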
Useful options:
- --backend to switch the wholeslidedata backend (default: asap)
- --output for single-slide mode and --output-dir for multi-slide mode
- --num-workers to control parallelism
- --no-cache to disable cache-based skipping and force recomputation
- --disable-coarse-roi-shortcut to force legacy full-frame loading at target spacing
- --coarse-spacing, --coarse-roi-margin-um, and --processing-tile-size to tune coarse-to-fine ROI processing
- --tolerance to control how much a natural spacing can deviate from target spacing when selecting the best level for reading the whole slide
- --min-component-area-um2 to remove tiny tissue blobs
- --min-hole-area-um2 to fill small holes inside tissue
- --gaussian-sigma-um to apply optional pre-threshold Gaussian smoothing
- --open-radius-um / --close-radius-um for spacing-aware morphological smoothing
- --spacing-at-level-0 to override level-0 spacing when metadata is incorrect
- --compression and --tile-size to tune TIFF output
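The --tolerance and micron-based options above can be illustrated with a short sketch. Both helpers are hypothetical and rest on assumptions: tolerance is interpreted as a relative deviation from the target spacing, and micron quantities are converted to pixels using the spacing in microns per pixel. The script's actual rules may differ.

```python
def pick_level(level_spacings, target_spacing, tolerance):
    """Return (level, within_tolerance) for the level closest to the target spacing.

    Assumption: tolerance is a relative deviation, so 0.1 means within 10%.
    """
    best = min(
        range(len(level_spacings)),
        key=lambda i: abs(level_spacings[i] - target_spacing),
    )
    rel_dev = abs(level_spacings[best] - target_spacing) / target_spacing
    return best, rel_dev <= tolerance


def area_um2_to_px(area_um2, spacing):
    """Convert an area in square microns to a pixel count at `spacing` (microns per pixel)."""
    return area_um2 / (spacing ** 2)


# A pyramid whose coarsest level is close to the 4.0 target.
close_match = pick_level([0.25, 0.5, 1.0, 2.0, 4.1], target_spacing=4.0, tolerance=0.1)

# A pyramid with no level near the target: closest is 2.0, which is 50% off.
no_match = pick_level([0.25, 0.5, 1.0, 2.0], target_spacing=4.0, tolerance=0.1)

# A 1600 square-micron blob covers 100 pixels at 4.0 microns per pixel.
min_area_px = area_um2_to_px(1600, spacing=4.0)
```

Expressing thresholds in microns rather than pixels keeps their physical meaning constant across slides scanned at different resolutions, which is presumably why the options carry a -um or -um2 suffix.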
The summary file is saved as summary.csv in --output-dir (multi-slide mode) or next to --output (single-slide mode).
The cache manifest used for skip inference is saved as cache_manifest.json in the same directory.

