A comprehensive tutorial for bioinformaticians and proteomics researchers on glycoproteomic data analysis workflows using R and Python. Glycoproteomics is the branch of proteomics that studies glycosylation, a post-translational modification.
This tutorial is designed for:
- Graduate students and researchers in proteomics/glycoproteomics
- Bioinformaticians learning mass spectrometry data analysis
- Scientists with basic R/Python knowledge wanting to analyze TMT-based quantitative proteomics data
Prerequisites:
- Basic familiarity with R and/or Python
- Understanding of proteomics concepts (proteins, peptides, mass spectrometry)
- Familiarity with statistical concepts (p-values, fold changes, normalization)
| Chapter | Topic | Learning Outcomes |
|---|---|---|
| Chapter 1 | R Basics | Install R/RStudio, understand tidyverse, perform basic statistical tests (K-S test, Wilcoxon), create publication-quality plots |
| Chapter 2 | Data Normalization | Filter PSMs, aggregate to protein level, apply sample loading and TMM normalization, visualize with UMAP |
| Chapter 3 | Differential Analysis | Set up limma design matrices, perform differential expression analysis, interpret results |
| Chapter 4 | Enrichment Analysis | Run GSEA with GO/KEGG, analyze protein complexes (CORUM) and domains (Pfam) |
| Chapter 5 | Structure Analysis | Analyze peptide physicochemical properties, integrate AlphaFold structural data |
- Install R (version 4.3+): https://cran.r-project.org/
- Install RStudio: https://posit.co/download/rstudio-desktop/
- Install required packages:
# CRAN packages
install.packages(c(
  "tidyverse", "readxl", "writexl", "readr",
  "showtext", "rstatix", "ggpubr", "reticulate"
))

# Bioconductor packages
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c(
  "limma", "edgeR", "clusterProfiler",
  "org.Hs.eg.db", "AnnotationDbi", "ComplexHeatmap"
))

# Create conda environment for UMAP (Chapter 2)
conda create -n UMAP_env python=3.9
conda activate UMAP_env
pip install umap-learn pandas numpy matplotlib seaborn
# Create conda environment for structure analysis (Chapter 5)
conda create -n structure_analysis python=3.9
conda activate structure_analysis
pip install pandas numpy matplotlib seaborn localcider structuremap

- Windows: Ensure conda is added to PATH during installation
- macOS: If using Apple Silicon, some packages may require Rosetta 2
- Linux: May require additional system libraries for some R packages
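
After creating the two conda environments, it can help to confirm that the packages are importable before starting Chapter 2 or Chapter 5. The following is a minimal sketch, not part of the tutorial itself. The environment names come from the commands above; the import names are assumptions based on how these packages are usually imported (for example, the PyPI package umap-learn is imported as umap), and the helper `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Assumed import names for the packages installed with pip above.
# The PyPI package "umap-learn" is imported as "umap".
ENV_MODULES = {
    "UMAP_env": ["umap", "pandas", "numpy", "matplotlib", "seaborn"],
    "structure_analysis": [
        "pandas", "numpy", "matplotlib", "seaborn",
        "localcider", "structuremap",
    ],
}


def missing_modules(modules):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    # Run this once inside each activated environment; only the entry for
    # the environment that is currently active is expected to report OK.
    for env, modules in ENV_MODULES.items():
        missing = missing_modules(modules)
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(f"{env}: {status}")
```

Run the script once with each environment activated and look at the line for that environment.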
- Clone this repository:
  git clone https://github.com/lfu46/Glycoproteomic-Data-Analysis-using-R-and-Python.git
  cd Glycoproteomic-Data-Analysis-using-R-and-Python
- Open the .Rproj file in RStudio
- Start with Chapter 1 to verify your R setup works:
  rmarkdown::render("Chapter_1_R_Basics/R_Basics.Rmd")
- Each chapter builds on previous ones, so work through them sequentially.
See DATA_ACCESS.md for detailed information about:
- Sample datasets included for quick testing
- Where to download full datasets
- Expected data formats and column descriptions
| Category | Packages |
|---|---|
| Data manipulation | tidyverse (dplyr, tidyr, purrr, stringr) |
| Data I/O | readxl, writexl, readr |
| Visualization | ggplot2, ggpubr, ComplexHeatmap |
| Statistics | rstatix, limma, edgeR |
| Bioinformatics | clusterProfiler, org.Hs.eg.db, AnnotationDbi |
| R-Python bridge | reticulate |
This tutorial follows tidyverse conventions:
- Pipe operator (|>) for chaining operations
- Tidy data principles (each variable a column, each observation a row)
- 2-space indentation
- UTF-8 encoding
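
To make the tidy data principle concrete, here is a small pure-Python sketch that reshapes a wide table (one column per TMT channel) into tidy form (one intensity observation per row). The protein IDs, channel names, and values are made up for illustration, and `to_tidy` is a hypothetical helper. In practice the same reshaping is usually done with tidyr::pivot_longer in R or DataFrame.melt in pandas.

```python
# Wide layout: one row per protein, one column per TMT channel.
wide = [
    {"protein": "P1", "ch_126": 100.0, "ch_127": 120.0},
    {"protein": "P2", "ch_126": 80.0, "ch_127": 60.0},
]


def to_tidy(rows, id_col, var_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue  # the identifier is repeated on every tidy row
            tidy_rows.append(
                {id_col: row[id_col], var_name: col, value_name: value}
            )
    return tidy_rows


tidy = to_tidy(wide, "protein", "channel", "intensity")
for record in tidy:
    print(record)
```

Each tidy row now holds exactly one measurement, so grouping, filtering, and plotting by channel need no column-name parsing.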
Raw MS Results (CSV/TSV from search software)
↓
PSM Filtering (XCorr, PPM thresholds)
↓
Protein-level Aggregation
↓
Normalization (Sample Loading → TMM)
↓
Statistical Analysis (limma)
↓
Visualization & Enrichment Analysis
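
The first steps of this workflow can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the tutorial's actual code: the record fields (`xcorr`, `ppm`, `intensities`), the threshold values, and all function names are hypothetical, and aggregation is done by summing PSM intensities per protein. Sample loading normalization is shown as scaling each channel so that its total matches the mean channel total. The TMM step (edgeR) and the limma analysis are not reproduced here.

```python
# Hypothetical PSM records; field names and values are made up.
psms = [
    {"protein": "P1", "xcorr": 2.5, "ppm": 1.2, "intensities": [100.0, 220.0]},
    {"protein": "P1", "xcorr": 0.8, "ppm": 0.5, "intensities": [10.0, 10.0]},  # low XCorr
    {"protein": "P2", "xcorr": 3.1, "ppm": -2.0, "intensities": [50.0, 90.0]},
    {"protein": "P2", "xcorr": 2.2, "ppm": 15.0, "intensities": [5.0, 5.0]},  # large mass error
    {"protein": "P2", "xcorr": 1.9, "ppm": 3.0, "intensities": [50.0, 90.0]},
]


def filter_psms(records, min_xcorr=1.0, max_abs_ppm=10.0):
    """Keep PSMs that pass both thresholds (illustrative values only)."""
    return [
        r for r in records
        if r["xcorr"] >= min_xcorr and abs(r["ppm"]) <= max_abs_ppm
    ]


def aggregate_to_protein(records):
    """Sum PSM intensities per protein, channel by channel."""
    totals = {}
    for r in records:
        row = totals.setdefault(r["protein"], [0.0] * len(r["intensities"]))
        for i, value in enumerate(r["intensities"]):
            row[i] += value
    return totals


def sample_loading_normalize(table):
    """Scale each channel so its column total equals the mean column total."""
    n_channels = len(next(iter(table.values())))
    col_sums = [sum(row[i] for row in table.values()) for i in range(n_channels)]
    target = sum(col_sums) / n_channels
    factors = [target / s for s in col_sums]
    return {
        protein: [v * f for v, f in zip(row, factors)]
        for protein, row in table.items()
    }


filtered = filter_psms(psms)
proteins = aggregate_to_protein(filtered)
normalized = sample_loading_normalize(proteins)
print(normalized)
```

After normalization every channel has the same total intensity, which removes differences in the amount of sample loaded per channel before any composition-aware correction such as TMM is applied.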
If you use this tutorial in your research, please cite:
Fu, L. (2025). Glycoproteomic Data Analysis using R and Python: A Practical Tutorial.
GitHub: https://github.com/lfu46/Glycoproteomic-Data-Analysis-using-R-and-Python
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: Report bugs or request features via GitHub Issues
- Questions: For questions about the tutorial content, open a GitHub Discussion
Contributions are welcome! Please feel free to submit a Pull Request.