kukrma/glioma-diffusion-analysis

Analysis of Peritumoral and Periedematous Diffusion Properties in Brain Gliomas Using the UCSF-PDGM Dataset

This GitHub repository contains the code and derived data for the academic article Latent structure in peritumoral diffusion properties revealed by manifold learning and cluster analysis in adult-type diffuse gliomas, which is currently in preparation. Once it is published, links to the article will be added.

The associated research was submitted as a poster on 14th December 2025 to the Organization for Human Brain Mapping (OHBM) 2026 Annual Meeting in Bordeaux (FR), scheduled for 14th–18th June 2026. The poster has been accepted and will be presented at the congress. Because the code and results may still change during article revisions, the state of this repository at the time of poster submission is archived as a release. Links to the poster will be added after the congress.

The code was originally used in the diploma thesis Analysis of White Matter Diffusion Properties in the Context of Selected Brain Tumors produced at the Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia. The state of this repository during thesis submission and defense is also archived in the form of a release. The thesis was successfully defended on 16th June 2025 and can be accessed from URL:

CC BY-NC 4.0


How to Use

To use this code, you first need to install Python on your computer and optionally some IDE in which you can comfortably interact with the scripts. Specifically, I have used Python version 3.11.4 and the following libraries:

| LIBRARY | VERSION |
|---|---|
| antspyx | 0.4.2 |
| cliffs-delta | 1.0.0 |
| dcor | 0.6 |
| dipy | 1.9.0 |
| matplotlib | 3.7.2 |
| nibabel | 5.2.1 |
| numpy | 1.25.2 |
| optuna | 4.2.1 |
| os | built-in |
| pandas | 2.1.0 |
| scikit-image | 0.22.0 |
| scikit-learn | 1.3.0 |
| scipy | 1.11.2 |
| seaborn | 0.13.0 |
| time | built-in |
| tqdm | 4.66.1 |
| umap-learn | 0.5.7 |
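As a quick sanity check, the installed versions can be compared against the table above. The following is a minimal sketch that uses only the standard library. The dictionary `PINNED` restates the table (built-in modules omitted) and the helper `check_versions` is a hypothetical name, not part of the repository.

```python
from importlib import metadata

# Versions restated from the table above; `os` and `time` are built-in and omitted.
PINNED = {
    "antspyx": "0.4.2",
    "cliffs-delta": "1.0.0",
    "dcor": "0.6",
    "dipy": "1.9.0",
    "matplotlib": "3.7.2",
    "nibabel": "5.2.1",
    "numpy": "1.25.2",
    "optuna": "4.2.1",
    "pandas": "2.1.0",
    "scikit-image": "0.22.0",
    "scikit-learn": "1.3.0",
    "scipy": "1.11.2",
    "seaborn": "0.13.0",
    "tqdm": "4.66.1",
    "umap-learn": "0.5.7",
}


def check_versions(pinned):
    """Return {package: (expected, installed)}; installed is None if missing."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        status = "ok" if installed == expected else "differs"
        print(f"{name:15s} expected {expected:8s} installed {installed} [{status}]")
```

A differing version does not necessarily break the code; the listed versions are only the ones the analysis was run with.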

With everything prepared, the code should be ready to use. Given the data-specific nature of the analysis, the code is not fully generalized, but tailored specifically to the publicly available UCSF-PDGM dataset (Version 4), which can be downloaded from the following link:

For a reader who is interested only in the analytical stage of the workflow, the derived preprocessed diffusion properties are provided via peritumoral.csv and periedematous.csv files (see below for more details).
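For orientation, the two CSV files could be loaded roughly as follows. This is a sketch under assumptions: it presumes comma-separated files with a header row whose column names match the attribute names listed later in this README, and `load_properties` is a hypothetical helper rather than a function from utils.py.

```python
import pandas as pd

# Diffusion-derived columns as named in the attribute table of this README.
DIFFUSION_COLUMNS = [
    "Ratio", "NE", "GFAmed", "GFAiqr", "MAGmed", "MAGiqr",
    "FAmed", "FAiqr", "MDmed", "MDiqr", "RDmed", "RDiqr", "ADmed", "ADiqr",
]


def load_properties(source):
    """Read one CSV file and split it into clinical and diffusion parts."""
    table = pd.read_csv(source)
    # Keep only the diffusion columns that are actually present in the file.
    present = [column for column in DIFFUSION_COLUMNS if column in table.columns]
    diffusion = table[present]
    clinical = table.drop(columns=present)
    return clinical, diffusion
```

With the directory layout described in this README, a call such as `load_properties("data/preprocessed/peritumoral.csv")` would return the clinical markers and the diffusion properties as two separate dataframes.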

The source code files contain many comments with descriptions of individual steps and parameters used by the defined classes and their methods. Applying the analysis to other brain diffusion MRI datasets should be possible, but will need additional adjustments based on the analyzed data.

Description of Files

The repository contains the following source code files:

  1. analysis.py – implements the analysis of diffusion properties, i.e. EDA, manifold learning, cluster analysis, and post-clustering analysis;
  2. preprocessing.py – implements the image processing, i.e. DWI registration, ROI generation, and computation of diffusion properties via DTI and CSD;
  3. utils.py – implements the classes and methods used by other scripts;
  4. visualization.py – implements various visualizations that did not fit well elsewhere, i.e. plots of original data, visualizations of DTI and CSD slices, and visualizations of structuring elements.
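To illustrate what ROI generation around a lesion can look like, here is a minimal sketch of one common approach: dilate the lesion mask with a structuring element and keep only the newly covered voxels inside the brain. It is an assumption that the repository works this way. The function `shell_roi`, the 6-connected structuring element, and the default of three dilation steps are hypothetical choices and are not taken from preprocessing.py or utils.py.

```python
import numpy as np
from scipy import ndimage


def shell_roi(lesion_mask, brain_mask, iterations=3):
    """Return voxels within `iterations` dilation steps of the lesion, excluding the lesion itself."""
    lesion = np.asarray(lesion_mask, dtype=bool)
    brain = np.asarray(brain_mask, dtype=bool)
    # 6-connected 3D structuring element (a hypothetical choice).
    structure = ndimage.generate_binary_structure(rank=3, connectivity=1)
    dilated = ndimage.binary_dilation(lesion, structure=structure, iterations=iterations)
    # Keep the ring around the lesion and discard anything outside the brain.
    return dilated & ~lesion & brain
```

The size and shape of the structuring element control how thick the resulting shell is, so they would need to be chosen with the voxel size of the data in mind.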

The file utils.py is called by all other scripts and organizes the functionality into three classes treated as toolkits:

  1. PreprocessingToolkit – contains methods related to image processing and data preprocessing:

    | METHOD | DESCRIPTION |
    |---|---|
    | registration_4Dto3D() | Used to perform the registration of DWI data to the patient-specific space. |
    | generate_ROI() | Used to generate ROIs around tumors and edemas. |
    | model_CSD() | Used to compute CSD and selected derived characteristics. |
    | model_DTI() | Used to compute DTI and selected derived characteristics. |
    | change_labels() | Used to change labels inside columns of a dataframe (they can be too long for plots). |
  2. VisualizationToolkit – contains methods for visualizations:

    | METHOD | DESCRIPTION |
    |---|---|
    | __init__() | Contains shared color palettes. |
    | explore_slices() | Used for simple 3D or 4D visualizations to inspect the spatial data from the Python script without the need to open FSL. |
    | barplot() | Used to create a simple barplot (only counts per category). |
    | histogram() | Used to create a combined histogram with KDE. |
    | heatmap() | Used to create a heatmap of a correlation matrix. |
    | violin() | Used to create a violinplot with individual points. |
    | scatter() | Used to create a scatterplot. |
    | dti_ellipsoids() | Used to visualize DTI ellipsoids. |
    | csd_glyphs() | Used to visualize CSD glyphs. |
    | fodf_sphere() | Used to visualize a single fODF. |
    | structuring_el() | Used to visualize the structuring element. |
  3. AnalysisToolkit – contains methods for the statistical analysis and ML algorithms:

    | METHOD | DESCRIPTION |
    |---|---|
    | correlation() | Used to compute linear or non-linear correlation and show it using a heatmap. |
    | umap_manifold() | Used to compute UMAP on a range of data (UMAP parameters can be optimized). |
    | gmm_clustering() | Used to compute GMM on a range of data (number of clusters can be optimized). |
    | relation_quantitative() | Used to explore the relationship between a quantitative and a qualitative attribute. |
    | relation_qualitative() | Used to explore the relationship between two qualitative attributes. |
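As an illustration of optimizing the number of GMM clusters, the sketch below fits one Gaussian mixture per candidate count with scikit-learn and keeps the model with the lowest BIC. The use of BIC as the selection criterion, the candidate range, and the function name `select_gmm` are assumptions made for this example; gmm_clustering() in utils.py may use a different criterion or search strategy.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm(features, candidate_counts=range(1, 7), seed=0):
    """Fit one GMM per candidate cluster count and return the model with the lowest BIC."""
    best_model, best_bic = None, np.inf
    for n_components in candidate_counts:
        model = GaussianMixture(n_components=n_components, random_state=seed)
        model.fit(features)
        bic = model.bic(features)  # lower BIC means a better fit/complexity trade-off
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model, best_bic
```

Cluster labels for each subject can then be obtained with `best_model.predict(features)`, where `features` would typically be a standardized or low-dimensional representation of the diffusion properties.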

Original Directory Tree

The provided source code contains hard-coded paths to various files, so the original project directory tree is shown here to make replication or modification easier:

DPthesis/
├── data/
│   ├── preprocessed/
│   │   ├── csd/
│   │   │   ├── periedematous/
│   │   │   │   ├── UCSF-PDGM-0004_CSD_periedematous.npy
│   │   │   │   ├── UCSF-PDGM-0005_CSD_periedematous.npy
│   │   │   │   └── ...
│   │   │   └── peritumoral/
│   │   │       ├── UCSF-PDGM-0004_CSD_peritumoral.npy
│   │   │       ├── UCSF-PDGM-0005_CSD_peritumoral.npy
│   │   │       └── ...
│   │   ├── dti/
│   │   │   ├── periedematous/
│   │   │   │   ├── UCSF-PDGM-0004_DTI_periedematous.npy
│   │   │   │   ├── UCSF-PDGM-0005_DTI_periedematous.npy
│   │   │   │   └── ...
│   │   │   └── peritumoral/
│   │   │       ├── UCSF-PDGM-0004_DTI_peritumoral.npy
│   │   │       ├── UCSF-PDGM-0005_DTI_peritumoral.npy
│   │   │       └── ...
│   │   ├── dwi/
│   │   │   ├── UCSF-PDGM-0004_DWI.nii.gz
│   │   │   ├── UCSF-PDGM-0005_DWI.nii.gz
│   │   │   └── ...
│   │   ├── roi/
│   │   │   ├── periedematous/
│   │   │   │   ├── UCSF-PDGM-0004_ROI_periedematous.nii.gz
│   │   │   │   ├── UCSF-PDGM-0005_ROI_periedematous.nii.gz
│   │   │   │   └── ...
│   │   │   └── peritumoral/
│   │   │       ├── UCSF-PDGM-0004_ROI_peritumoral.nii.gz
│   │   │       ├── UCSF-PDGM-0005_ROI_peritumoral.nii.gz
│   │   │       └── ...
│   │   ├── periedematous.csv
│   │   └── peritumoral.csv
│   └── UCSF-PDGM/
│       ├── PKG-UCSF-PDGM-v3-20230111/
│       │   └── UCSF-PDGM-v3/
│       │       ├── UCSF-PDGM-0004_nifti/
│       │       │   ├── UCSF-PDGM-0004_ADC.nii.gz
│       │       │   ├── UCSF-PDGM-0004_ASL.nii.gz
│       │       │   ├── UCSF-PDGM-0004_brain_parenchyma_segmentation.nii.gz
│       │       │   ├── UCSF-PDGM-0004_brain_segmentation.nii.gz
│       │       │   ├── UCSF-PDGM-0004_DTI_eddy_FA.nii.gz
│       │       │   ├── UCSF-PDGM-0004_DTI_eddy_L1.nii.gz
│       │       │   ├── UCSF-PDGM-0004_DTI_eddy_L2.nii.gz
│       │       │   ├── UCSF-PDGM-0004_DTI_eddy_L3.nii.gz
│       │       │   ├── UCSF-PDGM-0004_DTI_eddy_MD.nii.gz
│       │       │   ├── UCSF-PDGM-0004_DTI_eddy_noreg.nii.gz
│       │       │   ├── UCSF-PDGM-0004_DWI.nii.gz
│       │       │   ├── UCSF-PDGM-0004_DWI_bias.nii.gz
│       │       │   ├── UCSF-PDGM-0004_FLAIR.nii.gz
│       │       │   ├── UCSF-PDGM-0004_FLAIR_bias.nii.gz
│       │       │   ├── UCSF-PDGM-0004_SWI.nii.gz
│       │       │   ├── UCSF-PDGM-0004_SWI_bias.nii.gz
│       │       │   ├── UCSF-PDGM-0004_T1.nii.gz
│       │       │   ├── UCSF-PDGM-0004_T1_bias.nii.gz
│       │       │   ├── UCSF-PDGM-0004_T1c.nii.gz
│       │       │   ├── UCSF-PDGM-0004_T1c_bias.nii.gz
│       │       │   ├── UCSF-PDGM-0004_T2.nii.gz
│       │       │   ├── UCSF-PDGM-0004_T2_bias.nii.gz
│       │       │   └── UCSF-PDGM-0004_tumor_segmentation.nii.gz
│       │       ├── UCSF-PDGM-0005_nifti/...
│       │       └── ...
│       ├── UCSF-PDGM_DTI.bval
│       ├── UCSF-PDGM_DTI.bvec
│       └── UCSF-PDGM_metadata_v2.csv
├── img/...
├── analysis.py
├── preprocess.py
├── utils.py
└── visualization.py
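The naming scheme in the tree above can be captured in a small helper. This is only a sketch: `subject_paths` is a hypothetical function that does not exist in the repository, and `ROOT` assumes the scripts are run from the parent directory of DPthesis/.

```python
from pathlib import Path

# Project root as named in the tree above; adjust to the local checkout.
ROOT = Path("DPthesis")
PREPROCESSED = ROOT / "data" / "preprocessed"


def subject_paths(subject_id, region):
    """Build the preprocessed file paths for one subject and one ROI type."""
    if region not in ("peritumoral", "periedematous"):
        raise ValueError(f"unknown region: {region!r}")
    return {
        "dwi": PREPROCESSED / "dwi" / f"{subject_id}_DWI.nii.gz",
        "roi": PREPROCESSED / "roi" / region / f"{subject_id}_ROI_{region}.nii.gz",
        "dti": PREPROCESSED / "dti" / region / f"{subject_id}_DTI_{region}.npy",
        "csd": PREPROCESSED / "csd" / region / f"{subject_id}_CSD_{region}.npy",
    }
```

For example, `subject_paths("UCSF-PDGM-0004", "peritumoral")["roi"]` points to the peritumoral ROI file of subject 0004 shown in the tree.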

Preprocessed Data

In addition to the source code, the derived diffusion properties in peritumoral and periedematous ROIs are provided. Although the UCSF-PDGM dataset is publicly available and the entire workflow should therefore be reproducible, the image processing stage will likely take more than ten hours to complete (depending on the available hardware). The computed diffusion properties are therefore provided directly, so that readers interested only in the analytical stage do not have to run the image processing first. They also represent a major milestone in the analysis.

Both the peritumoral.csv and periedematous.csv files contain the same clinical markers and diffusion properties; they differ only in the region (peritumoral or periedematous ROI) indicated by the file name. The attributes are:

| ATTRIBUTE | TYPE | ORIGIN | DESCRIPTION |
|---|---|---|---|
| ID | ordinal | UCSF-PDGM | unique identification of the subject |
| Sex | nominal | UCSF-PDGM | sex of the subject |
| Age | integer | UCSF-PDGM | age in years at time of imaging |
| Grade | ordinal | UCSF-PDGM | grade based on the WHO CNS5 |
| Type | nominal | UCSF-PDGM | final pathologic diagnosis based on the WHO CNS5 |
| MGMTstatus | nominal | UCSF-PDGM | status of the MGMT biomarker |
| MGMTindex | integer | UCSF-PDGM | index developed at UCSF indicating the number of promoter methylation sites |
| 1p/19q | nominal | UCSF-PDGM | status of 1p and 19q genes, assayed by fluorescent in-situ hybridization |
| IDH | nominal | UCSF-PDGM | IDH subtype characterized with a capture-based targeted next-generation DNA sequencing panel |
| AliveDead | binary | UCSF-PDGM | survival at last clinical follow-up |
| OS | integer | UCSF-PDGM | OS in days from initial diagnosis to last clinical follow-up |
| EoR | nominal | UCSF-PDGM | extent of resection determined by review of operative reports and immediate postoperative imaging |
| Biopsy | binary | UCSF-PDGM | whether a burr hole biopsy was performed |
| Ratio | float | CSD | ratio of the smallest to the largest eigenvalue of the response function (identical for both periedematous and peritumoral data) |
| NE | float | CSD | normalized entropy |
| GFAmed | float | CSD | median of GFA in ROI |
| GFAiqr | float | CSD | IQR of GFA in ROI |
| MAGmed | float | CSD | median of MAG in ROI |
| MAGiqr | float | CSD | IQR of MAG in ROI |
| FAmed | float | DTI | median of FA in ROI |
| FAiqr | float | DTI | IQR of FA in ROI |
| MDmed | float | DTI | median of MD in ROI |
| MDiqr | float | DTI | IQR of MD in ROI |
| RDmed | float | DTI | median of RD in ROI |
| RDiqr | float | DTI | IQR of RD in ROI |
| ADmed | float | DTI | median of AD in ROI |
| ADiqr | float | DTI | IQR of AD in ROI |

The detailed description including formulas and explained abbreviations is provided in the academic article or in the thesis.
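The "median in ROI" and "IQR in ROI" attributes summarize a voxel-wise map (for example FA) over a binary region. A minimal NumPy sketch of such a summary is given below. The function name `roi_summary` is hypothetical, and dropping non-finite voxels before computing the statistics is an assumption that may not match the repository's behavior.

```python
import numpy as np


def roi_summary(metric_map, roi_mask):
    """Return (median, IQR) of a voxel-wise metric inside a binary ROI."""
    values = np.asarray(metric_map)[np.asarray(roi_mask, dtype=bool)]
    values = values[np.isfinite(values)]  # ignore NaN/inf voxels (an assumption)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return float(median), float(q3 - q1)
```

Applied to an FA map and a peritumoral mask of the same shape, the two returned numbers would correspond to the FAmed and FAiqr columns.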
