pipi0616/QAFI

QAFI

QAFI (Quantitative Assessment of Functional Impact) is a machine-learning framework for predicting variant functional scores from protein sequence, structure, and evolutionary features.

Project Structure

QAFI_CODE_NEW/
├── src/qafi/                 # Core library
│   ├── feature/              # Feature engineering (23 feature blocks)
│   ├── data/                 # Data loading & intermediate datasets
│   ├── model/                # Prediction models
│   │   ├── psp/              # PSP models (psp2_mlr, psp2_xgb, psp2_rfr, pspsplit1/2)
│   │   └── qafi/             # QAFI models (qafi2, qafisplit1/2/3)
│   └── validation/           # Benchmarking (GOF/LOF classification)
├── scripts/                  # CLI entry points
│   ├── features/             # Feature generation scripts
│   ├── models/               # Model training & prediction scripts
│   └── benchmark/            # Benchmark experiment scripts (Beltran, ClinVar, Test30)
├── data/                     # Input data (CSV, protein files)
├── notebook/                 # Showcase notebooks (4 notebooks)
├── reports/                  # Generated figures & tables
│   ├── thesis/               # Thesis figures
│   ├── paper/                # Paper figures
│   └── misc/                 # Other figures
├── outputs/                  # Model outputs (generated at runtime)
└── tests/                    # Tests

Quick Start

All commands are run from the project root (QAFI_CODE_NEW/).

1. Build Features

Takes a raw variant CSV and computes all 23 feature blocks:

python scripts/features/build_all_features.py \
    --input-csv data/proteins/Q9Y375/Q9Y375_features5.csv \
    --output-csv outputs/features/Q9Y375_all_features.csv
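After the feature build, a quick sanity check of the output CSV can catch empty or half-filled columns early. The following is a minimal sketch using only the standard library; the column names in the toy data are invented for illustration, and the real schema of the feature file is not documented here.

```python
import csv
import io


def summarize_feature_csv(handle):
    """Return (row_count, column_names, missing_counts) for an open CSV handle."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Treat empty cells and the literal string "nan" as missing.
            if value is None or value.strip() == "" or value.strip().lower() == "nan":
                missing[name] += 1
    return rows, columns, missing


# Toy stand-in for outputs/features/Q9Y375_all_features.csv.
# These column names are hypothetical, not the real QAFI feature names.
toy_csv = io.StringIO(
    "variant,feature_a,feature_b\n"
    "A10G,0.12,1.5\n"
    "A10V,,0.7\n"
    "K25E,0.40,nan\n"
)
rows, columns, missing = summarize_feature_csv(toy_csv)
print(rows, columns, missing)
```

To check a real run, open the generated file with `open(path, newline="")` and pass the handle to `summarize_feature_csv`.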

2. Run PSP Models

PSP (Per-protein Score Predictor) trains a separate regression model for each protein:

# List available methods
python scripts/models/run_psp.py --list

# Run all PSP methods
python scripts/models/run_psp.py --all

# Run a single method
python scripts/models/run_psp.py --method psp2_mlr

Available methods: psp2_mlr, psp2_xgb, psp2_rfr, pspsplit1, pspsplit2, pspsplit2_obs, pspsplit2_fusion
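To illustrate what "per-protein" means in practice, the sketch below groups toy records by protein and fits one independent least-squares line per group. It is not the implementation behind any of the PSP methods listed above: it uses a single invented feature and plain Python, purely to show the one-model-per-protein idea.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_protein(records):
    """Fit one independent model per protein from (protein, x, y) records."""
    grouped = defaultdict(lambda: ([], []))
    for protein, x, y in records:
        grouped[protein][0].append(x)
        grouped[protein][1].append(y)
    return {protein: fit_line(xs, ys) for protein, (xs, ys) in grouped.items()}


# Toy data: (protein ID, feature value, functional score). All values are invented.
records = [
    ("P_A", 0.0, 1.0), ("P_A", 1.0, 3.0), ("P_A", 2.0, 5.0),
    ("P_B", 0.0, 4.0), ("P_B", 2.0, 2.0), ("P_B", 4.0, 0.0),
]
models = fit_per_protein(records)
for protein, (slope, intercept) in models.items():
    print(protein, slope, intercept)
```

Each protein ends up with its own coefficients, so a model fitted on one protein never sees variants from another.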

3. Run QAFI Models

QAFI trains on multiple proteins and predicts on an unseen test protein:

# List available methods
python scripts/models/run_qafi.py --list

# Run all QAFI methods for a specific protein
python scripts/models/run_qafi.py --all --uniprot Q9Y375

# Run a single method
python scripts/models/run_qafi.py --method qafisplit2 --uniprot Q9Y375

Available methods: qafi2, qafisplit1, qafisplit2, qafisplit3

Options: --sim-metric {pearson,cosine}, --target score_log_normalized, --no-save
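Two ideas from this section can be sketched in a few lines: holding out one protein as the unseen test set, and the difference between the two `--sim-metric` choices. This README does not say which vectors the similarity metric is applied to, so the helpers below are generic illustrations with invented data, not QAFI's internals.

```python
import math


def split_by_protein(records, test_protein):
    """Hold out every record of test_protein; train on all other proteins."""
    train = [r for r in records if r[0] != test_protein]
    test = [r for r in records if r[0] == test_protein]
    return train, test


def pearson(u, v):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return num / den


def cosine(u, v):
    """Cosine similarity of the raw (uncentered) vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


# Toy records: (protein ID, value). The IDs other than Q9Y375 are placeholders.
records = [("PROT_1", 0.2), ("Q9Y375", 0.9), ("PROT_2", 0.5), ("Q9Y375", 0.1)]
train, test = split_by_protein(records, "Q9Y375")
print(len(train), len(test))

# v is u shifted by a constant offset of 10.
u = [1.0, 2.0, 3.0]
v = [11.0, 12.0, 13.0]
print(pearson(u, v), cosine(u, v))
```

Because Pearson centers each vector first, the constant offset between `u` and `v` does not affect it and the result is 1 up to rounding, while cosine similarity drops to roughly 0.95. That sensitivity to offsets is the practical difference between the two options.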

Full Pipeline Example

# Step 1: Generate features
python scripts/features/build_all_features.py \
    --input-csv data/proteins/Q9Y375/Q9Y375_features5.csv

# Step 2: Train PSP baselines
python scripts/models/run_psp.py --all

# Step 3: Train and predict with QAFI
python scripts/models/run_qafi.py --all --uniprot Q9Y375

Outputs are saved to outputs/runs/psp/ and outputs/runs/qafi/.
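The same three steps can be chained from Python when a shell script is inconvenient. This is a minimal sketch that only reuses the script paths and flags shown above; the function names are hypothetical, and the default is a dry run that prints the commands without executing them.

```python
import subprocess
import sys


def pipeline_commands(uniprot, input_csv):
    """Build the three pipeline commands as argument lists."""
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "scripts/features/build_all_features.py", "--input-csv", input_csv],
        [py, "scripts/models/run_psp.py", "--all"],
        [py, "scripts/models/run_qafi.py", "--all", "--uniprot", uniprot],
    ]


def run_pipeline(uniprot, input_csv, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for command in pipeline_commands(uniprot, input_csv):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(command, check=True)


# Dry run: prints the commands only. Call from the project root.
run_pipeline("Q9Y375", "data/proteins/Q9Y375/Q9Y375_features5.csv")
```

Passing `dry_run=False` executes the steps sequentially, so a failure in feature generation prevents the model scripts from running on stale inputs.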

4. Run Benchmarks

Evaluate QAFI on external benchmark datasets:

# Beltran DMS fitness benchmark
python scripts/benchmark/beltran_predict.py --method all

# ClinVar clinical variant benchmark
python scripts/benchmark/clinvar_predict.py --method all

# Hold-out 30-protein validation
python scripts/benchmark/test30.py --method all

See scripts/benchmark/README.md for data setup and details.

Notebooks

Interactive showcase notebooks that demonstrate each module step-by-step:

| Notebook | What it shows |
| --- | --- |
| feature_generation_showcase.ipynb | All 23 feature blocks, one by one, with KDE plots |
| psp_model_showcase.ipynb | All 7 PSP methods, each with prediction preview & KDE |
| qafi_model_showcase.ipynb | All 4 QAFI methods, each with prediction preview & KDE |
| goflof_benchmark_showcase.ipynb | GOF/LOF benchmark: variant mechanism analysis & AUC |
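The GOF/LOF benchmark notebook reports AUC. For readers unfamiliar with the metric, the sketch below computes ROC AUC from its pairwise definition: the probability that a randomly chosen positive scores higher than a randomly chosen negative. The 1/0 label coding and the toy scores are assumptions for illustration; in practice `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = positive class, 0 = negative class (coding is an assumption).
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.3, 0.6, 0.6]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes the metric independent of any particular score threshold.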

Dependencies

Core: pandas, numpy, scikit-learn, matplotlib, xgboost, imbalanced-learn

Install with:

pip install pandas numpy scikit-learn matplotlib xgboost imbalanced-learn
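A quick way to confirm the environment is complete is to check that each dependency is importable. Note that two packages install under a different import name than their pip name (`scikit-learn` imports as `sklearn`, `imbalanced-learn` as `imblearn`). The helper name below is hypothetical.

```python
import importlib.util

# pip distribution name -> import name (they differ for two packages).
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "xgboost": "xgboost",
    "imbalanced-learn": "imblearn",
}


def missing_packages(required):
    """Return the pip names whose import module cannot be found."""
    return sorted(
        dist
        for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    )


missing = missing_packages(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

The check only locates the modules without importing them, so it is fast and has no side effects.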
