paul-london/Bioinformatics

Bioinformatics & Genomic Data Science Portfolio

Welcome to my bioinformatics projects repository.

This collection includes pipelines, analyses, and training resources that I’ve developed to streamline genomic data analysis. Many projects stem from the Genomic Data Science Specialization (Johns Hopkins University, Coursera); others are hands-on work inspired by my experience in a molecular diagnostics laboratory.

Explore end-to-end projects that highlight both my command-line bioinformatics expertise and my growing data science/visualization skills.


⚡ Core Competencies

  • 🐍 Python scripting for bioinformatics tasks
  • 🔎 NGS data processing and quality control
  • 🧬 Sequence alignment (Bowtie2, BWA)
  • 🧫 Variant calling and annotation (GATK, bcftools, samtools)
  • 🧪 Workflow automation with Snakemake
  • 📊 Statistical analysis and visualization in R & Python
  • 🧬 ORF detection, gene annotation, and sequence analysis
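
As an illustration of the ORF detection item above, here is a minimal sketch in plain Python. It is not taken from the repository: the function name `find_orfs` and the `min_codons` parameter are hypothetical, and it scans only the forward strand for the simplest definition of an ORF (an ATG followed in-frame by the first stop codon).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG, end is the index just past
    the stop codon, and frame is 0, 1, or 2. min_codons counts both the
    start and the stop codon. (Hypothetical helper for illustration.)
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, frame))
                start = None  # close this ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    for start, end, frame in find_orfs("CCATGAAATAGGG"):
        print(f"frame {frame}: {start}-{end}")
```

A fuller version would also scan the reverse complement (for example with Biopython's `Seq.reverse_complement()`) and report ORFs that run off the end of the sequence; both are left out here to keep the sketch short.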

📦 Key Libraries & Tools

  • Workflow & Pipelines: Snakemake, bash scripting
  • Alignment & Variants: Bowtie2, BWA, samtools, bcftools, GATK
  • Data Science: Python (pandas, numpy, matplotlib), R (dplyr, ggplot2, Bioconductor)
  • Bioinformatics Packages: Biopython (SeqIO, Bio.Seq), VariantAnnotation (R)
  • Documentation & Version Control: Markdown, Git/GitHub

📁 Projects Overview

1. Course Training

  • Assignments and exploratory scripts from Coursera training.
  • Focus: building strong foundations in genomic data science.

Each subfolder includes its own README with explanations and runnable examples.


2. Pipelines

  • Hands-on practice building pipelines following GATK best practices for variant discovery.
  • Emphasis on reproducibility and modular structure.
  • A self-designed pipeline for end-to-end NGS data processing.

Each folder includes setup instructions, command examples, and data templates.


3. Variant Summary & Visualization in R

New Project — statistical analysis and visual storytelling with R.

  • Summarize variant types (SNPs vs INDELs)
  • Generate chromosome-level plots & allele frequency distributions
  • Explore summary statistics with dplyr and ggplot2

Includes example scripts, figures, and results.
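
The SNP vs INDEL summary can be sketched in a few lines. The project itself uses R (dplyr, ggplot2); the version below is a Python sketch of the same idea, not the repository's code. The names `classify_variant` and `summarize_vcf_lines` are hypothetical, and the classification rule is a simplification: single-base REF and ALT counts as a SNP, differing lengths count as an INDEL, and everything else (including symbolic alleles such as `<DEL>` or `*`) is lumped into OTHER.

```python
from collections import Counter


def classify_variant(ref, alt):
    """Classify one REF/ALT pair (simplified, hypothetical rule)."""
    if alt.startswith("<") or alt == "*":
        return "OTHER"  # symbolic or spanning-deletion allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # same length > 1, e.g. a multi-nucleotide change


def summarize_vcf_lines(lines):
    """Count variant types across VCF data lines, one count per ALT allele."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # VCF columns 4 and 5
        for alt in alts.split(","):
            counts[classify_variant(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t.\t.\t.",
        "chr1\t200\t.\tAT\tA\t.\t.\t.",
        "chr2\t300\t.\tC\tT,CA\t.\t.\t.",
    ]
    print(dict(summarize_vcf_lines(example)))
```

Multi-allelic records are counted once per ALT allele here; counting once per record instead would be an equally reasonable choice, so the totals depend on which convention the analysis adopts.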


🚀 Getting Started

  1. Clone the repository
    git clone https://github.com/paul-london/bioinformatics-portfolio.git
    cd bioinformatics-portfolio
  2. Navigate into a project folder (course-training/, pipelines/, or variant-summary/)
  3. Follow the project-specific README for setup, usage, and outputs

📬 Contact

I’m always open to feedback, collaboration, or discussion!


This repository is under active development—new projects and updates are added regularly.

About

Bioinformatics coursework and NGS analysis projects.
