Welcome to my bioinformatics projects repository.
This collection includes pipelines, analyses, and training resources that I’ve developed to streamline genomic data analysis. Many projects stem from the Genomic Data Science Specialization (Johns Hopkins University - Coursera), along with hands-on work inspired by my experience in a molecular diagnostics laboratory.
Explore end-to-end projects that highlight both my command-line bioinformatics expertise and my growing data science/visualization skills.
- 🐍 Python scripting for bioinformatics tasks
- 🔎 NGS data processing and quality control
- 🧬 Sequence alignment (Bowtie2, BWA)
- 🧫 Variant calling and annotation (GATK, bcftools, samtools)
- 🧪 Workflow automation with Snakemake
- 📊 Statistical analysis and visualization in R & Python
- 🧬 ORF detection, gene annotation, and sequence analysis
- Workflow & Pipelines: Snakemake, bash scripting
- Alignment & Variants: Bowtie2, BWA, samtools, bcftools, GATK
- Data Science: Python (`pandas`, `numpy`, `matplotlib`), R (`dplyr`, `ggplot2`, `Bioconductor`)
- Bioinformatics Packages: Biopython (`SeqIO`, `Bio.Seq`), VariantAnnotation (R)
- Documentation & Version Control: Markdown, Git/GitHub
- Assignments and exploratory scripts from Coursera training.
- Focus: building strong foundations in genomic data science.
Each subfolder includes its own README with explanations and runnable examples.
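As a rough illustration of the kind of exploratory script this training covers (not the actual code in the subfolders), here is a minimal Python sketch of forward-strand ORF detection. The function name `find_orfs`, the `min_codons` parameter, and the choice to scan only the three forward frames are assumptions made for this example; a fuller version would also scan the reverse complement.

```python
from typing import List, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int, int]]:
    """Find ORFs on the forward strand of a DNA sequence.

    Returns (frame, start, end) tuples, where frame is 1, 2, or 3,
    start is 0-based, and end is exclusive and includes the stop codon.
    min_codons counts every codon from ATG through the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return orfs
```

An ORF that starts but never reaches an in-frame stop codon is not reported, which keeps the sketch short but would miss ORFs truncated at the end of a read or contig.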
- Hands-on practice building pipelines following GATK best practices for variant discovery.
- Emphasis on reproducibility and modular structure.
- A self-designed pipeline created from scratch to handle end-to-end NGS data processing.
Each folder includes setup instructions, command examples, and data templates.
New Project — statistical analysis and visual storytelling with R.
- Summarize variant types (SNPs vs INDELs)
- Generate chromosome-level plots & allele frequency distributions
- Explore summary statistics with `dplyr` and `ggplot2`
Includes example scripts, figures, and results.
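The project itself is written in R, but the core idea of splitting variants into SNPs and INDELs can be sketched in a few lines of Python. This is an illustrative sketch, not the project's code: the function names `classify_variant` and `summarize_vcf_lines` are hypothetical, and it assumes plain tab-separated VCF records with REF in column 4 and ALT in column 5.

```python
from collections import Counter
from typing import Iterable


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length."""
    # Symbolic or missing alleles (e.g. <DEL>, *, .) are not sized here.
    if alt.startswith("<") or alt in {"*", "."}:
        return "OTHER"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # same-length multi-base substitutions


def summarize_vcf_lines(lines: Iterable[str]) -> Counter:
    """Count variant types per chromosome from VCF text lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information, and the header
        fields = line.rstrip("\n").split("\t")
        chrom, ref, alts = fields[0], fields[3], fields[4]
        # Multi-allelic sites list ALT alleles separated by commas.
        for alt in alts.split(","):
            counts[(chrom, classify_variant(ref, alt))] += 1
    return counts
```

Passing an open file handle, for example `summarize_vcf_lines(open("sample.vcf"))` with a hypothetical uncompressed file, yields per-chromosome counts keyed by `(chromosome, type)` that can feed a bar plot or a summary table.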
- Clone the repository: `git clone https://github.com/paul-london/bioinformatics-portfolio.git`, then `cd bioinformatics-portfolio`
- Navigate into a project folder (`course-training/`, `pipelines/`, or `variant-summary/`)
- Follow the project-specific README for setup, usage, and outputs
I’m always open to feedback, collaboration, or discussion!
✨ This repository is under active development—new projects and updates are added regularly.