Skip to content

Gunshipz/genomine

Repository files navigation

genomine

A self-hosted whole-genome annotation pipeline. Point it at your WGS VCF, let it run, and get a browseable HTML report with ClinVar significance, pharmacogenomic diplotypes, polygenic risk scores, structural variant interpretation, mtDNA haplogroup, and SNPedia community notes — all processed locally on your own machine.

Educational and wellness use only. This software does not provide medical advice and is not a clinical diagnostic tool. Results should not be used to make health decisions without consultation with a qualified clinician. Variant classification and risk scores are approximate and may contain errors.


What you get

A single HTML report (reports/<sample>/index.html) containing:

  • ClinVar significance — ~36,000 variant matches on a full 30x WGS, cross-referenced against expert-reviewed pathogenic/benign calls.
  • Clinical-grade pharmacogenomic findings — PyPGx star-allele calls for ~10 pharmacogenes (CYP2D6, CYP3A5, DPYD, etc.) with CPIC phenotype interpretation.
  • Polygenic risk scores with ancestry percentiles — 6 curated traits (CAD, T2D, Alzheimer's, breast cancer, prostate cancer, longevity) with percentile relative to a matched ancestry reference panel.
  • mtDNA haplogroup — classified via HaploGrep from mitochondrial variants.
  • AnnotSV structural variant interpretation — CNV/SV findings scored with ACMG criteria if you supply a CNV/SV VCF alongside your SNP/indel VCF.
  • SNPedia community notes — rsID matches against the SNPedia community-curated database (optional; ~18,000 matches on a full WGS).

Requirements

Input data:

  • A whole-genome sequencing (WGS) VCF in GRCh38 coordinates — 30x coverage is ideal. Compatible sources include Nebula Genomics, Dante Labs, Sequencing.com, or a raw Illumina pipeline output.
  • v0.1 does not support 23andMe or AncestryDNA chip TXT files. Chip arrays cover ~0.1% of the genome and lack the read-depth information the pipeline's QC filters rely on.

System:

  • Operating system: Linux or macOS (primary supported path). Windows users should use WSL2 — see INSTALL.md.
  • Docker — required. All annotation tools run in containers; no native installs of bcftools, OpenCRAVAT, etc. are needed.
  • ~60–80 GB free disk — annotation databases are large (see INSTALL.md for a full breakdown).
  • 16 GB+ RAM recommended for the OpenCRAVAT annotation stage.

Quickstart

# 1. Clone
git clone https://github.com/Gunshipz/genomine.git
cd genomine

# 2. Install Python deps
uv sync

# 3. Check what's installed / what needs downloading
uv run genome-refresh bootstrap status

# 4. Download everything (~46 GB total, one-time)
uv run genome-refresh bootstrap all

# 5. Verify end-to-end with the HG001 demo (~20 min)
uv run genome-refresh demo

# 6. Configure your sample
cp samples/example.yaml samples/me.yaml
# Edit samples/me.yaml — set inputs.snp_indel_vcf to your WGS VCF path

# 7. Run the pipeline
uv run genome-refresh refresh me

# 8. Open the report
open reports/me/index.html   # macOS
xdg-open reports/me/index.html  # Linux

The first run downloads annotation modules on demand (~35 GB for OpenCRAVAT modules alone). Subsequent runs reuse the cache; a cached run on a 30x WGS takes roughly 3–5 hours, dominated by the OpenCRAVAT annotation stage.

List configured samples:

uv run genome-refresh list-samples

Install only missing components (if you interrupted bootstrap all):

uv run genome-refresh bootstrap missing

Install a specific component:

uv run genome-refresh bootstrap <component>
# Components: docker  oc-modules  annotsv  pgs-reference  snpedia  demo-data

How it works

See ARCHITECTURE.md for a stage-by-stage description of the data flow, the annotation cache, and how findings are tiered.


Trouble?

See TROUBLESHOOTING.md for common symptoms and fixes: Docker not running, OpenCRAVAT first-run size, zero findings, disk space, Windows console encoding, and more.


Contributing

See CONTRIBUTING.md for dev setup, test workflow, and PR guidelines.


Changelog

See CHANGELOG.md.


License

MIT — see LICENSE.

About

Self-hosted, local-first whole-genome annotation pipeline (ClinVar, PGS, pharmacogenomics, SV triage, mtDNA, SNPedia). Your DNA never leaves your machine.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages