QAFI

QAFI (Quantitative Assessment of Functional Impact) is a machine-learning framework for predicting variant functional scores from protein sequence, structure, and evolutionary features.

Project Structure

QAFI_CODE_NEW/
├── src/qafi/                 # Core library
│   ├── feature/              # Feature engineering (23 feature blocks)
│   ├── data/                 # Data loading & intermediate datasets
│   ├── model/                # Prediction models
│   │   ├── psp/              # PSP models (psp2_mlr, psp2_xgb, psp2_rfr, pspsplit1/2)
│   │   └── qafi/             # QAFI models (qafi2, qafisplit1/2/3)
│   └── validation/           # Benchmarking (GOF/LOF classification)
├── scripts/                  # CLI entry points
│   ├── features/             # Feature generation scripts
│   ├── models/               # Model training & prediction scripts
│   └── benchmark/            # Benchmark experiment scripts (Beltran, ClinVar, Test30)
├── data/                     # Input data (CSV, protein files)
├── notebook/                 # Showcase notebooks (4 notebooks)
├── reports/                  # Generated figures & tables
│   ├── thesis/               # Thesis figures
│   ├── paper/                # Paper figures
│   └── misc/                 # Other figures
├── outputs/                  # Model outputs (generated at runtime)
└── tests/                    # Tests

Quick Start

All commands are run from the project root (QAFI_CODE_NEW/).

1. Build Features

The script takes a raw variant CSV and computes all 23 feature blocks:

python scripts/features/build_all_features.py \
    --input-csv data/proteins/Q9Y375/Q9Y375_features5.csv \
    --output-csv outputs/features/Q9Y375_all_features.csv
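
Before training, it can help to sanity-check the generated file. The sketch below uses only the standard library and makes no assumption about the column names in the output. The helper `summarize_csv` is hypothetical and not part of QAFI; it reports the row count, the columns, and how many empty cells each column has.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_csv(path):
    """Return (row_count, column_names, empty_cells_per_column) for a CSV file."""
    empty = Counter()
    rows = 0
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        for row in reader:
            rows += 1
            for name in columns:
                value = row.get(name)
                # Short rows yield None; blank cells yield an empty string.
                if value is None or value.strip() == "":
                    empty[name] += 1
    return rows, columns, dict(empty)


# Path taken from the command above; the check is skipped if the file is absent.
features_path = Path("outputs/features/Q9Y375_all_features.csv")
if features_path.exists():
    n_rows, cols, missing = summarize_csv(features_path)
    print(f"{n_rows} rows, {len(cols)} columns")
    for name, count in sorted(missing.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {count} empty cells")
```

A column that is empty in every row usually points to a feature block that failed to compute, which is cheaper to catch here than after model training.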

2. Run PSP Models

PSP (Per-protein Score Predictor) trains per-protein regression models:

# List available methods
python scripts/models/run_psp.py --list

# Run all PSP methods
python scripts/models/run_psp.py --all

# Run a single method
python scripts/models/run_psp.py --method psp2_mlr

Available methods: psp2_mlr, psp2_xgb, psp2_rfr, pspsplit1, pspsplit2, pspsplit2_obs, pspsplit2_fusion

3. Run QAFI Models

QAFI trains on multiple proteins and predicts on an unseen test protein:

# List available methods
python scripts/models/run_qafi.py --list

# Run all QAFI methods for a specific protein
python scripts/models/run_qafi.py --all --uniprot Q9Y375

# Run a single method
python scripts/models/run_qafi.py --method qafisplit2 --uniprot Q9Y375

Available methods: qafi2, qafisplit1, qafisplit2, qafisplit3

Options: --sim-metric {pearson,cosine}, --target score_log_normalized, --no-save
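
The two `--sim-metric` choices differ in one respect: Pearson correlation is cosine similarity computed after subtracting each vector's mean. The sketch below illustrates that difference on toy vectors. This README does not state which vectors QAFI compares, so the data and function names here are illustrative assumptions, not QAFI's implementation.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (sensitive to each vector's mean).

    Raises ZeroDivisionError if either vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson r: cosine similarity after subtracting each vector's mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)


# Toy vectors, not QAFI data: y is x shifted by a constant offset.
x = [1.0, 2.0, 3.0, 4.0]
y = [11.0, 12.0, 13.0, 14.0]
print(f"pearson = {pearson_correlation(x, y):.3f}")
print(f"cosine  = {cosine_similarity(x, y):.3f}")
```

On these vectors Pearson treats the two profiles as identical because the offset is removed, while cosine similarity falls below 1 because the offset changes the direction of the vector.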

Full Pipeline Example

# Step 1: Generate features
python scripts/features/build_all_features.py \
    --input-csv data/proteins/Q9Y375/Q9Y375_features5.csv

# Step 2: Train PSP baselines
python scripts/models/run_psp.py --all

# Step 3: Train and predict with QAFI
python scripts/models/run_qafi.py --all --uniprot Q9Y375

Outputs are saved to outputs/runs/psp/ and outputs/runs/qafi/.
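
The three steps can also be chained from Python so that a failure in one step stops the run. This is a minimal sketch that uses only the script paths and flags shown above; the functions `pipeline_commands` and `run_pipeline` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys


def pipeline_commands(uniprot_id, input_csv):
    """Build the three documented commands as argument lists."""
    python = sys.executable  # run the scripts with the current interpreter
    return [
        [python, "scripts/features/build_all_features.py", "--input-csv", input_csv],
        [python, "scripts/models/run_psp.py", "--all"],
        [python, "scripts/models/run_qafi.py", "--all", "--uniprot", uniprot_id],
    ]


def run_pipeline(uniprot_id, input_csv, dry_run=True):
    """Print each command; execute it unless dry_run is set.

    check=True stops the pipeline at the first step that exits non-zero.
    """
    for command in pipeline_commands(uniprot_id, input_csv):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# Dry run by default, so nothing is executed until dry_run=False is passed.
run_pipeline("Q9Y375", "data/proteins/Q9Y375/Q9Y375_features5.csv")
```

Run it from the project root with `dry_run=False` to execute the steps. Passing argument lists rather than a shell string avoids quoting problems in file paths.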

4. Run Benchmarks

Evaluate QAFI on external benchmark datasets:

# Beltran DMS fitness benchmark
python scripts/benchmark/beltran_predict.py --method all

# ClinVar clinical variant benchmark
python scripts/benchmark/clinvar_predict.py --method all

# Hold-out 30-protein validation
python scripts/benchmark/test30.py --method all

See scripts/benchmark/README.md for data setup and details.

Notebooks

Interactive showcase notebooks that demonstrate each module step-by-step:

Notebook	What it shows
`feature_generation_showcase.ipynb`	All 23 feature blocks, one by one, with KDE plots
`psp_model_showcase.ipynb`	All 7 PSP methods, each with prediction preview & KDE
`qafi_model_showcase.ipynb`	All 4 QAFI methods, each with prediction preview & KDE
`goflof_benchmark_showcase.ipynb`	GOF/LOF benchmark: variant mechanism analysis & AUC

Dependencies

Core: pandas, numpy, scikit-learn, matplotlib, xgboost, imbalanced-learn

Install with:

pip install pandas numpy scikit-learn matplotlib xgboost imbalanced-learn
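
To confirm the environment before running the scripts, a short check along these lines may help. Two packages are imported under a different name than the one used with pip: scikit-learn is imported as `sklearn` and imbalanced-learn as `imblearn`. The helper `missing_packages` is a hypothetical name, not part of QAFI.

```python
import importlib.util

# Maps the pip package name to the module name used in `import` statements.
IMPORT_NAMES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "xgboost": "xgboost",
    "imbalanced-learn": "imblearn",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All core dependencies are importable.")
```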
