Quantitative Assessment of Functional Impact — a machine-learning framework for predicting variant functional scores from protein sequence, structure, and evolutionary features.
QAFI_CODE_NEW/
├── src/qafi/ # Core library
│ ├── feature/ # Feature engineering (23 feature blocks)
│ ├── data/ # Data loading & intermediate datasets
│ ├── model/ # Prediction models
│ │ ├── psp/ # PSP models (psp2_mlr, psp2_xgb, psp2_rfr, pspsplit1/2)
│ │ └── qafi/ # QAFI models (qafi2, qafisplit1/2/3)
│ └── validation/ # Benchmarking (GOF/LOF classification)
├── scripts/ # CLI entry points
│ ├── features/ # Feature generation scripts
│ ├── models/ # Model training & prediction scripts
│ └── benchmark/ # Benchmark experiment scripts (Beltran, ClinVar, Test30)
├── data/ # Input data (CSV, protein files)
├── notebook/ # Showcase notebooks (4 notebooks)
├── reports/ # Generated figures & tables
│ ├── thesis/ # Thesis figures
│ ├── paper/ # Paper figures
│ └── misc/ # Other figures
├── outputs/ # Model outputs (generated at runtime)
└── tests/ # Tests
All commands are run from the project root (QAFI_CODE_NEW/).
Takes a raw variant CSV and computes all 23 feature blocks:
python scripts/features/build_all_features.py \
--input-csv data/proteins/Q9Y375/Q9Y375_features5.csv \
--output-csv outputs/features/Q9Y375_all_features.csv

PSP (Per-protein Score Predictor) trains per-protein regression models:
# List available methods
python scripts/models/run_psp.py --list
# Run all PSP methods
python scripts/models/run_psp.py --all
# Run a single method
python scripts/models/run_psp.py --method psp2_mlr

Available methods: psp2_mlr, psp2_xgb, psp2_rfr, pspsplit1, pspsplit2, pspsplit2_obs, pspsplit2_fusion
QAFI trains on multiple proteins and predicts on an unseen test protein:
# List available methods
python scripts/models/run_qafi.py --list
# Run all QAFI methods for a specific protein
python scripts/models/run_qafi.py --all --uniprot Q9Y375
# Run a single method
python scripts/models/run_qafi.py --method qafisplit2 --uniprot Q9Y375

Available methods: qafi2, qafisplit1, qafisplit2, qafisplit3
Options: --sim-metric {pearson,cosine}, --target score_log_normalized, --no-save
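
The `--sim-metric` option selects between Pearson and cosine similarity. How QAFI applies the chosen metric is not described here, so the following is only a minimal sketch of how the two measures differ, using made-up vectors. The function names and values are illustrative and are not part of the QAFI code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Undefined (division by zero) if either vector has zero norm.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_similarity(a, b):
    """Pearson correlation: cosine similarity after mean-centering each vector."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)

# Hypothetical vectors: v is u shifted by a constant offset.
u = [1.0, 2.0, 3.0]
v = [11.0, 12.0, 13.0]

pearson = pearson_similarity(u, v)  # 1 up to rounding: the offset is removed by centering
cosine = cosine_similarity(u, v)    # about 0.95: the offset changes the vector's direction
print(f"pearson={pearson:.3f} cosine={cosine:.3f}")
```

Pearson ignores constant offsets between the two vectors, while cosine does not, so the two settings can rank the same pair of vectors differently.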
# Step 1: Generate features
python scripts/features/build_all_features.py \
--input-csv data/proteins/Q9Y375/Q9Y375_features5.csv
# Step 2: Train PSP baselines
python scripts/models/run_psp.py --all
# Step 3: Train and predict with QAFI
python scripts/models/run_qafi.py --all --uniprot Q9Y375

Outputs are saved to outputs/runs/psp/ and outputs/runs/qafi/.
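
The three steps can also be chained from a single Python driver. The sketch below is one possible way to do that, not a script shipped with the repository: `build_workflow_commands` and `run_workflow` are hypothetical helper names, and only the script paths and flags listed above are used.

```python
import subprocess
import sys

def build_workflow_commands(uniprot, input_csv):
    """Return the three workflow steps as argument lists, in run order."""
    py = sys.executable  # use the interpreter that is running this driver
    return [
        # Step 1: generate features
        [py, "scripts/features/build_all_features.py", "--input-csv", input_csv],
        # Step 2: train PSP baselines
        [py, "scripts/models/run_psp.py", "--all"],
        # Step 3: train and predict with QAFI
        [py, "scripts/models/run_qafi.py", "--all", "--uniprot", uniprot],
    ]

def run_workflow(uniprot, input_csv):
    """Run each step in order; must be called from the project root."""
    for cmd in build_workflow_commands(uniprot, input_csv):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on the output of a failed step.
        subprocess.run(cmd, check=True)
```

Calling `run_workflow("Q9Y375", "data/proteins/Q9Y375/Q9Y375_features5.csv")` from the project root would run the same three commands as above, stopping at the first step that fails.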
Evaluate QAFI on external benchmark datasets:
# Beltran DMS fitness benchmark
python scripts/benchmark/beltran_predict.py --method all
# ClinVar clinical variant benchmark
python scripts/benchmark/clinvar_predict.py --method all
# Hold-out 30-protein validation
python scripts/benchmark/test30.py --method all

See scripts/benchmark/README.md for data setup and details.
Interactive showcase notebooks that demonstrate each module step-by-step:
| Notebook | What it shows |
|---|---|
| feature_generation_showcase.ipynb | All 23 feature blocks, one by one, with KDE plots |
| psp_model_showcase.ipynb | All 7 PSP methods, each with prediction preview & KDE |
| qafi_model_showcase.ipynb | All 4 QAFI methods, each with prediction preview & KDE |
| goflof_benchmark_showcase.ipynb | GOF/LOF benchmark: variant mechanism analysis & AUC |
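
For readers unfamiliar with the AUC reported in the GOF/LOF benchmark, the sketch below shows what the metric measures: the fraction of (positive, negative) pairs in which the positive example receives the higher score. The labels, scores, and the choice of which class counts as positive are all hypothetical. This is not the benchmark's own code, which would more likely rely on scikit-learn's `roc_auc_score`.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))

# Hypothetical data: 1 marks the class treated as positive, 0 the other class.
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1, 0.8]

auc = roc_auc(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.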
Core: pandas, numpy, scikit-learn, matplotlib, xgboost, imbalanced-learn
Install with:
pip install pandas numpy scikit-learn matplotlib xgboost imbalanced-learn
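
After installing, a quick check can confirm that every core dependency is importable. The sketch below is a hypothetical helper, not part of the repository. Note that two packages are imported under a different name than the one passed to pip: scikit-learn as `sklearn` and imbalanced-learn as `imblearn`.

```python
import importlib.util

# pip distribution name -> module name used in `import` statements
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "xgboost": "xgboost",
    "imbalanced-learn": "imblearn",
}

def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None  # None means not installed
    ]

missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All core dependencies are importable.")
```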