SeizeVar

Source code for SeizeVar, a two-layer pipeline for mechanism-aware variant interpretation in monogenic epilepsy. SeizeVar pairs a saturating pathogenicity head (Random Forest + ESM-2 LoRA cross-attention) with a dedicated gain-of-function versus loss-of-function mechanism classifier and a deterministic sodium-channel prescribing rule. Variants are scored under a leave-one-out dynamic-evidence framework that reuses the live ClinVar reference at scoring time.
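The two layers and the leave-one-out evidence idea can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `RefVariant` record, `loo_gene_evidence`, `prescribing_rule`, the gene set, the 0.5 threshold and the output labels are assumptions chosen for illustration, not the features or the rule implemented in this repository or the paper. It shows only the general pattern: a gene-level evidence feature computed against the reference with the query variant excluded, and a deterministic rule mapping (pathogenicity, mechanism) to a sodium-channel recommendation.

```python
from dataclasses import dataclass

# Hypothetical record for one ClinVar reference variant (field names are assumptions).
@dataclass(frozen=True)
class RefVariant:
    variant_id: str
    gene: str
    pathogenic: bool  # True = P/LP, False = B/LB


def loo_gene_evidence(reference: list[RefVariant], variant_id: str, gene: str) -> float:
    """Fraction of P/LP among the *other* reference variants in `gene`.

    The query variant is dropped before counting, so its own label never
    leaks into its own evidence feature (leave-one-out).
    """
    others = [r for r in reference if r.gene == gene and r.variant_id != variant_id]
    if not others:
        return 0.0  # no evidence; a real pipeline might prefer NaN here
    return sum(r.pathogenic for r in others) / len(others)


# Assumed gene set and labels; illustrative only, not the paper's rule.
SODIUM_CHANNEL_GENES = {"SCN1A", "SCN2A", "SCN8A"}


def prescribing_rule(gene: str, p_pathogenic: float, mechanism: str,
                     threshold: float = 0.5) -> str:
    """Deterministic mapping from (gene, pathogenicity, mechanism) to a label."""
    if gene not in SODIUM_CHANNEL_GENES or p_pathogenic < threshold:
        return "no_recommendation"
    if mechanism == "GoF":
        return "consider_sodium_channel_blocker"
    if mechanism == "LoF":
        return "avoid_sodium_channel_blocker"
    return "no_recommendation"
```

For example, with three SCN2A reference variants (two P/LP, one B/LB), the benign variant's leave-one-out evidence is 1.0, because only the two pathogenic variants remain once it is excluded.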

Companion paper: From pathogenicity to prescription: mechanism-aware variant interpretation in monogenic epilepsy — Ye and Chen, 2026.

What is and is not in this repository

This repository contains only source code and small reference tables. The full training data, trained model weights, and per-variant prediction tables are deposited on Zenodo (DOI to be assigned at acceptance), because they are too large for version control and are partly subject to upstream redistribution constraints (AlphaFold v4, UniProt, ClinVar snapshots).

| Released here | Released on Zenodo |
|---|---|
| Pipeline source code (≈ 35 `.py` + 5 notebooks) | Raw inputs (`data/`, ≈ 450 MB) |
| Figure-generation scripts | Trained model weights (`models_trained/`, ≈ 580 MB) |
| Small reference tables (`data_release/`) | Per-stage intermediate outputs |
| Build scripts | Full per-variant prediction tables |

Repository layout

seizevar/
├── 01_data/code/                Data extraction, splits, augmentation, leakage audit
├── 02_features/code/            39-feature computation (gene / residue / substitution / evidence)
├── 03_models/
│   ├── pathogenicity/code/      Random-Forest + ESM-2 LoRA training and inference
│   └── mechanism/code/          Gain-vs-loss-of-function mechanism head training
├── 05_vus_application/code/     Prospective scoring of the 29,293-variant VUS pool
├── 06_competitors/code/         AlphaMissense / REVEL / MetaRNN / LoGoFunc benchmarking
├── data_release/                Curated small-table release (full release on Zenodo)
├── notebooks/                   01_data_pipeline.ipynb, 02_features_pipeline.ipynb
├── infra/                       Colab packing + training and scoring notebooks
└── figures/scripts/             Paper figure generation (Fig 1–6 + Fig S1–S7)

Reproducing the pipeline end-to-end

The scripts are stage-numbered and intended to be run in order from the repository root. Each stage reads from the previous stage's outputs/ directory and writes its own outputs/ for the next stage.

# 1. Extract ClinVar, build splits, audit leakage
python 01_data/code/01_extract_clinvar.py
python 01_data/code/02_build_splits.py
python 01_data/code/03_augment_train.py
python 01_data/code/04_build_extra_valsets.py
python 01_data/code/05_leakage_audit.py

# 2. Compute 39-dimensional feature matrix
python 02_features/code/compute_features.py
python 02_features/code/audit_features.py

# 3. Train models
python 03_models/pathogenicity/code/train_rf_pathogenicity.py
python 03_models/pathogenicity/code/train_esm_lora.py     # GPU recommended (Colab)
python 03_models/mechanism/code/train_rf_mechanism.py

# 4. Score the prospective VUS pool
python 05_vus_application/code/predict_vus_rf.py
python 05_vus_application/code/predict_vus_esm.py
python 05_vus_application/code/merge_vus_predictions.py

# 5. Competitor comparison
python 06_competitors/code/fetch_am_for_vus.py
python 06_competitors/code/extract_logofunc.py
python 06_competitors/code/compute_auroc.py
python 06_competitors/code/mechanism_benchmark.py
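The stage ordering above can be wrapped in a small driver that stops at the first failing stage. This is a sketch assuming a hypothetical `run_pipeline` helper (not part of the repository) executed from the repository root; the script paths are copied from the commands listed above.

```python
import subprocess
import sys

# Stage scripts in README order (paths copied from the commands above).
STAGES = [
    "01_data/code/01_extract_clinvar.py",
    "01_data/code/02_build_splits.py",
    "01_data/code/03_augment_train.py",
    "01_data/code/04_build_extra_valsets.py",
    "01_data/code/05_leakage_audit.py",
    "02_features/code/compute_features.py",
    "02_features/code/audit_features.py",
    "03_models/pathogenicity/code/train_rf_pathogenicity.py",
    "03_models/pathogenicity/code/train_esm_lora.py",  # GPU recommended
    "03_models/mechanism/code/train_rf_mechanism.py",
    "05_vus_application/code/predict_vus_rf.py",
    "05_vus_application/code/predict_vus_esm.py",
    "05_vus_application/code/merge_vus_predictions.py",
    "06_competitors/code/fetch_am_for_vus.py",
    "06_competitors/code/extract_logofunc.py",
    "06_competitors/code/compute_auroc.py",
    "06_competitors/code/mechanism_benchmark.py",
]


def run_pipeline(stages=STAGES, dry_run=False):
    """Run each stage with the current interpreter, in order.

    Returns the list of commands. With dry_run=True nothing is executed.
    """
    commands = []
    for script in stages:
        cmd = [sys.executable, script]
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit, so a
            # later stage never runs on a failed stage's outputs.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline(dry_run=True)` returns the commands without executing them, which is a cheap way to confirm the order before a long run; `run_pipeline()` executes them.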

The ESM-2 LoRA fine-tune is GPU-bound; infra/seizevar_colab_pro.ipynb mirrors the local script for GPU execution on Colab Pro. The matching VUS scoring notebook is infra/seizevar_vus_colab.ipynb.

data_release/build_release.py regenerates the curated release tables.

Inputs the code expects

The scripts read from data/ and 01_data/outputs/, neither of which is in this repository. Download both from Zenodo (at the DOI listed in the companion paper once assigned) and unpack them at the repository root before running.
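Because a missing input directory otherwise surfaces only as an error deep inside a stage, a quick preflight check can help. The sketch below assumes hypothetical `missing_inputs` and `check_inputs` helpers (not in the repository); it checks only that the two directories named above exist, not what they contain.

```python
from pathlib import Path

# Input directories the scripts expect; both are downloaded from Zenodo.
REQUIRED_DIRS = ("data", "01_data/outputs")


def missing_inputs(repo_root="."):
    """Return the required input directories that are absent under repo_root."""
    root = Path(repo_root)
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]


def check_inputs(repo_root="."):
    """Raise a readable error if any expected input directory is missing."""
    missing = missing_inputs(repo_root)
    if missing:
        raise FileNotFoundError(
            "Missing input directories: " + ", ".join(missing)
            + ". Download them from Zenodo and unpack at the repository root."
        )
```

Calling `check_inputs()` at the top of a run fails fast with the list of absent directories.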

Citation

If you use this code, please cite the companion paper:

Ye S, Chen P. From pathogenicity to prescription: mechanism-aware variant interpretation in monogenic epilepsy. Submitted, 2026.

License

MIT — see LICENSE.
