genomine

A self-hosted whole-genome annotation pipeline. Point it at your WGS VCF, let it run, and get a browsable HTML report with ClinVar significance, pharmacogenomic diplotypes, polygenic risk scores, structural variant interpretation, mtDNA haplogroup, and SNPedia community notes, all processed locally on your own machine.

Educational and wellness use only. This software does not provide medical advice and is not a clinical diagnostic tool. Results should not be used to make health decisions without consultation with a qualified clinician. Variant classification and risk scores are approximate and may contain errors.

What you get

A single HTML report (reports/<sample>/index.html) containing:

ClinVar significance: ~36,000 variant matches on a full 30x WGS, cross-referenced against expert-reviewed pathogenic/benign calls.
Pharmacogenomic findings: PyPGx star-allele calls for ~10 pharmacogenes (CYP2D6, CYP3A5, DPYD, etc.) with CPIC phenotype interpretation.
Polygenic risk scores with ancestry percentiles: 6 curated traits (CAD, T2D, Alzheimer's, breast cancer, prostate cancer, longevity), each reported as a percentile relative to an ancestry-matched reference panel.
mtDNA haplogroup: classified from mitochondrial variants with HaploGrep.
AnnotSV structural variant interpretation: CNV/SV findings scored with ACMG criteria, if you supply a CNV/SV VCF alongside your SNP/indel VCF.
SNPedia community notes: rsID matches against the community-curated SNPedia database (optional; ~18,000 matches on a full WGS).
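The additive model behind most polygenic scores is a weighted sum, PRS = Σ βᵢ·dᵢ, where βᵢ is the per-allele weight for variant i and dᵢ ∈ {0, 1, 2} is the sample's count of the effect allele; a percentile then places that sum among scores computed the same way for a reference panel. The sketch below illustrates that general technique and is not genomine's code. It assumes a simple scoring list of (variant ID, effect allele, weight), skips variants the sample lacks instead of imputing them, and uses made-up IDs, weights, and reference scores.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Weight:
    """One row of a hypothetical scoring file: variant, effect allele, per-allele weight."""
    variant_id: str
    effect_allele: str
    beta: float


def effect_dosage(genotype: tuple[str, str], effect_allele: str) -> int:
    """Count copies (0, 1 or 2) of the effect allele in a diploid genotype."""
    return sum(1 for allele in genotype if allele == effect_allele)


def polygenic_score(weights: list[Weight], genotypes: dict[str, tuple[str, str]]) -> float:
    """Additive score: sum of beta * dosage over variants present in the sample.

    Variants missing from `genotypes` are skipped; real pipelines often impute
    them instead, which this sketch does not model.
    """
    score = 0.0
    for w in weights:
        gt = genotypes.get(w.variant_id)
        if gt is not None:
            score += w.beta * effect_dosage(gt, w.effect_allele)
    return score


def percentile(score: float, reference_scores: list[float]) -> float:
    """Percentile rank of `score` among reference-panel scores (ties count as half)."""
    ref = sorted(reference_scores)
    below = bisect_left(ref, score)
    equal = bisect_right(ref, score) - below
    return 100.0 * (below + 0.5 * equal) / len(ref)


if __name__ == "__main__":
    # Hypothetical weights and genotypes, for illustration only.
    weights = [
        Weight("rs0001", "A", 0.20),
        Weight("rs0002", "T", -0.10),
        Weight("rs0003", "G", 0.35),
    ]
    sample = {"rs0001": ("A", "A"), "rs0002": ("C", "T"), "rs0003": ("C", "C")}
    s = polygenic_score(weights, sample)
    ref_panel = [-0.2, 0.0, 0.1, 0.25, 0.4, 0.55, 0.7]
    print(f"score={s:.2f} percentile={percentile(s, ref_panel):.1f}")
```

Running it prints `score=0.30 percentile=57.1`. The percentile is only meaningful when the reference-panel scores use the same weights and a comparable ancestry, which is the reason for matching the panel to the sample.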

Requirements

Input data:

A whole-genome sequencing (WGS) VCF in GRCh38 coordinates; 30x coverage is ideal. Compatible sources include Nebula Genomics, Dante Labs, Sequencing.com, and raw output from an Illumina pipeline.
v0.1 does not support 23andMe or AncestryDNA chip TXT files. Chip arrays cover ~0.1% of the genome and lack the read-depth information the pipeline's QC filters rely on.
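Both input requirements can usually be checked from the VCF header before committing to a multi-hour run. The sketch below is hypothetical and not one of genomine's own checks: it infers the build from the declared length of chr1 (248,956,422 bp in GRCh38 versus 249,250,621 bp in GRCh37) and treats a `##FORMAT=<ID=DP` declaration as a proxy for the read-depth field that chip exports lack. A file with no `##contig` lines is reported as "unknown".

```python
from __future__ import annotations

import gzip
import re
import sys
from collections.abc import Iterable, Iterator

GRCH38_CHR1_LENGTH = 248_956_422
GRCH37_CHR1_LENGTH = 249_250_621

# Matches the chr1 contig line ("chr1" or "1"), but not chr10, chr1_KI270706v1_random, etc.
CONTIG_RE = re.compile(r"^##contig=<ID=(?:chr)?1,.*?length=(\d+)")


def header_lines(path: str) -> Iterator[str]:
    """Yield header lines from a plain or gzip/bgzip-compressed VCF, stopping at the first record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                break
            yield line.rstrip("\n")


def check_header(lines: Iterable[str]) -> dict[str, object]:
    """Report the build implied by the chr1 contig length and whether FORMAT/DP is declared."""
    build = "unknown"
    has_dp = False
    for line in lines:
        m = CONTIG_RE.match(line)
        if m:
            length = int(m.group(1))
            if length == GRCH38_CHR1_LENGTH:
                build = "GRCh38"
            elif length == GRCH37_CHR1_LENGTH:
                build = "GRCh37"
        if line.startswith("##FORMAT=<ID=DP,"):
            has_dp = True
    return {"build": build, "has_format_dp": has_dp}


if __name__ == "__main__":
    print(check_header(header_lines(sys.argv[1])))
```

Run it as `python check_vcf.py your.vcf.gz` (the script name is hypothetical). A declared DP field does not guarantee that every record carries a depth value, so treat this as a quick screen, not a full QC pass.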

System:

Operating system: Linux or macOS (primary supported path). Windows users should use WSL2 — see INSTALL.md.
Docker — required. All annotation tools run in containers; no native installs of bcftools, OpenCRAVAT, etc. are needed.
~60–80 GB free disk — annotation databases are large (see INSTALL.md for a full breakdown).
16 GB+ RAM recommended for the OpenCRAVAT annotation stage.
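The system requirements above can be checked with a short standard-library script before starting a large download. This is a hypothetical preflight sketch, separate from `genome-refresh bootstrap status`. The 60 GB and 16 GB thresholds come from the list above, sizes use decimal GB (10^9 bytes), and the RAM query relies on `os.sysconf`, which is available on Linux and macOS (including WSL2) but not on native Windows.

```python
from __future__ import annotations

import os
import shutil
import subprocess

MIN_FREE_DISK_GB = 60  # lower end of the ~60-80 GB guidance
MIN_RAM_GB = 16        # recommended for the OpenCRAVAT stage
GB = 10**9             # decimal gigabytes


def free_disk_gb(path: str = ".") -> float:
    """Free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GB


def total_ram_gb() -> float:
    """Physical memory in GB via POSIX sysconf (Linux/macOS only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB


def docker_running() -> bool:
    """True if the docker CLI is on PATH and `docker info` reaches the daemon."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(["docker", "info"], capture_output=True, timeout=30)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def preflight(path: str = ".") -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if (disk := free_disk_gb(path)) < MIN_FREE_DISK_GB:
        problems.append(f"only {disk:.0f} GB free, want at least {MIN_FREE_DISK_GB} GB")
    if (ram := total_ram_gb()) < MIN_RAM_GB:
        problems.append(f"only {ram:.0f} GB RAM, {MIN_RAM_GB} GB+ recommended")
    if not docker_running():
        problems.append("docker not found or daemon not running")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "all preflight checks passed")
```

Run it from the directory where the databases will live, since the disk check measures the filesystem holding the current directory.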

Quickstart

# 1. Clone
git clone https://github.com/Gunshipz/genomine.git
cd genomine

# 2. Install Python deps
uv sync

# 3. Check what's installed / what needs downloading
uv run genome-refresh bootstrap status

# 4. Download everything (~46 GB total, one-time)
uv run genome-refresh bootstrap all

# 5. Verify end-to-end with the HG001 demo (~20 min)
uv run genome-refresh demo

# 6. Configure your sample
cp samples/example.yaml samples/me.yaml
# Edit samples/me.yaml — set inputs.snp_indel_vcf to your WGS VCF path

# 7. Run the pipeline
uv run genome-refresh refresh me

# 8. Open the report
open reports/me/index.html   # macOS
xdg-open reports/me/index.html  # Linux

The first run downloads annotation modules on demand (~35 GB for OpenCRAVAT modules alone). Subsequent runs reuse the cache; a cached run on a 30x WGS takes roughly 3–5 hours, dominated by the OpenCRAVAT annotation stage.

List configured samples:

uv run genome-refresh list-samples

Install only missing components (if you interrupted bootstrap all):

uv run genome-refresh bootstrap missing

Install a specific component:

uv run genome-refresh bootstrap <component>
# Components: docker  oc-modules  annotsv  pgs-reference  snpedia  demo-data

How it works

See ARCHITECTURE.md for a stage-by-stage description of the data flow, the annotation cache, and how findings are tiered.

Trouble?

See TROUBLESHOOTING.md for common symptoms and fixes: Docker not running, OpenCRAVAT first-run size, zero findings, disk space, Windows console encoding, and more.

Contributing

See CONTRIBUTING.md for dev setup, test workflow, and PR guidelines.

Changelog

See CHANGELOG.md.

License

MIT — see LICENSE.
