The accurate prediction of changes in binding free energy ($\Delta\Delta G$) upon mutation is a central problem in computational antibody engineering.
Given a wild-type antibody-antigen complex with a resolved 3D structure (PDB) and a specific single-point mutation (defined by chain, residue index, wild-type amino acid, and mutant amino acid), the task is to predict the resulting change in binding free energy, $\Delta\Delta G = \Delta G_{\text{mut}} - \Delta G_{\text{wt}}$.
A negative $\Delta\Delta G$ indicates improved binding affinity; a positive value indicates weakened binding.
To ensure that performance gains are attributable to structural reasoning rather than data leakage or simple residue propensities, this project employs a two-stage modeling strategy.
Before implementing geometric deep learning, we establish a "lower bound" of performance using interpretable linear models (Linear/Logistic Regression) and tree-based ensembles (Random Forest/XGBoost).
Rationale:
- Leakage Detection: Historical benchmarks on datasets like SKEMPI have frequently suffered from data leakage, where models memorize complex-specific biases rather than learning biophysical rules (1). High performance by a linear model often indicates improper train/test splits (e.g., random splitting rather than complex-level splitting).
- Signal Quantification: Recent studies suggest that sequence-only and simple statistical features can achieve significant correlations on $\Delta\Delta G$ prediction tasks (2). This baseline quantifies how much variance in the AB-Bind dataset can be explained by residue identity and physicochemical descriptors alone, without explicit 3D coordinates.
- Interpretability: Linear weights provide immediate sanity checks against physicochemical intuition (e.g., penalties for burying hydrophilic residues or introducing steric clashes via volume changes).

Baseline features:
- Wild-type and Mutant amino acid identities (One-Hot).
- Physicochemical property shifts (Volume, Hydrophobicity, Charge, Polarity).
- Interface vs. Non-interface positioning flags.
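As a concrete illustration, the baseline feature vector for one mutation can be assembled as below. This is a minimal sketch, not the project's actual code: the hydropathy table is the standard Kyte-Doolittle scale, the charge table treats only D/E/K/R as charged (histidine simplified to neutral), and `mutation_features` is a hypothetical helper name.

```python
# Sketch of baseline feature engineering for a single point mutation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

HYDROPATHY = {  # Kyte-Doolittle hydropathy scale
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}  # all other residues treated as 0

def one_hot(aa: str) -> list:
    """20-dimensional one-hot encoding of a residue identity."""
    return [int(aa == a) for a in AMINO_ACIDS]

def mutation_features(wt: str, mut: str, is_interface: bool) -> list:
    """WT/mutant one-hots + physicochemical property shifts + interface flag."""
    d_hydro = HYDROPATHY[mut] - HYDROPATHY[wt]
    d_charge = CHARGE.get(mut, 0) - CHARGE.get(wt, 0)
    return one_hot(wt) + one_hot(mut) + [d_hydro, d_charge, float(is_interface)]

feats = mutation_features("A", "D", is_interface=True)
# Layout: 20 WT one-hot, 20 mutant one-hot, then [d_hydro, d_charge, flag]
```

Volume and polarity shifts would extend the tail of the vector in the same way, using any standard residue-volume table.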
Upon validation of the dataset splits and baselines, we implement a geometric deep learning architecture.
Graph Construction: Following the philosophy of frameworks such as DeepRank-GNN (3), the protein interface is transformed into a graph $G = (\mathcal{V}, \mathcal{E})$:
- Nodes ($\mathcal{V}$): Interface residues (defined by a distance cutoff, typically 8-10 Å) from both the antibody and antigen. Features include amino acid type, chain identity (Ab/Ag), and atomic coordinates.
- Edges ($\mathcal{E}$): Spatial neighbors within the defined cutoff. Edges are annotated with Euclidean distances and categorical flags for intra-chain vs. inter-chain interactions.
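A minimal numpy sketch of this construction, under the assumptions that residues are represented by single (e.g. C-alpha) coordinates and a 10 Å cutoff is used; `build_interface_graph` is an illustrative helper, not the repo's function:

```python
import numpy as np

def build_interface_graph(coords, chains, cutoff=10.0):
    """Build residue-level edges within `cutoff` Angstrom.

    coords: (N, 3) array of representative atom positions (e.g. C-alpha).
    chains: length-N list of chain labels ("Ab" or "Ag").
    Returns (2, E) edge indices, edge distances, and inter-chain flags.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)           # (N, N) pairwise distances
    i, j = np.where((dist < cutoff) & (dist > 0))  # exclude self-loops
    inter_chain = np.array([chains[a] != chains[b] for a, b in zip(i, j)])
    return np.stack([i, j]), dist[i, j], inter_chain

# Toy example: two antibody residues, one antigen residue 8 Angstrom away.
coords = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [12.0, 0.0, 0.0]])
chains = ["Ab", "Ab", "Ag"]
edges, dists, inter = build_interface_graph(coords, chains)
```

The `inter` flags directly give the intra-chain vs. inter-chain edge annotation described above.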
Architecture:
- A message-passing GNN (e.g., Graph Convolutional Networks or Graph Attention Networks) propagates features across the interface graph.
- The architecture specifically encodes the mutation site, allowing the network to learn the localized perturbation in the structural environment.
- The readout layer pools node representations to regress the scalar $\Delta\Delta G$.
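The message-passing and readout steps can be sketched in numpy as a stand-in for the actual PyTorch implementation; the weight matrices here are random placeholders and the mean-neighbor aggregation is a simplification of the attention-weighted scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def mp_layer(h, edges, W):
    """One message-passing step: each node averages incoming neighbor
    features, then applies a residual connection, linear map, and ReLU."""
    n = h.shape[0]
    agg = np.zeros_like(h)
    deg = np.zeros(n)
    for s, t in edges.T:                     # directed edge s -> t
        agg[t] += h[s]
        deg[t] += 1
    agg /= np.maximum(deg, 1)[:, None]       # mean over neighbors
    return np.maximum((h + agg) @ W, 0.0)    # residual + linear + ReLU

def readout(h, w_out):
    """Mean-pool node embeddings and regress the scalar ddG."""
    return float(h.mean(axis=0) @ w_out)

h = rng.normal(size=(4, 8))                     # 4 interface residues, 8 features
edges = np.array([[0, 1, 1, 2], [1, 0, 2, 1]])  # undirected pairs as two directions
W = rng.normal(size=(8, 8)) * 0.1
w_out = rng.normal(size=8)
ddg_pred = readout(mp_layer(h, edges, W), w_out)
```

Stacking several such layers before the readout gives each node a receptive field spanning the mutation site's structural environment.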
AB-Bind
- Source: Sirin et al., AB-Bind: Antibody binding mutational database for computational affinity predictions
- Composition: 1,101 mutants across 32 unique antibody-antigen complexes with experimentally determined $\Delta\Delta G$ values.
- Processing: Raw data is sourced via submodule from `3D-GNN-over-antibody-antigen/data/external/AB-Bind-Database`. Processed data (`3D-GNN-over-antibody-antigen/data/processed/ab_bind_with_labels.csv`) includes PDB IDs, chain mappings, normalized mutation strings, and discrete labels (Improved/Neutral/Worsened) for classification tasks.
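For illustration, a normalized mutation string of the assumed form `<chain>:<wtAA><index><mutAA>` (e.g. `A:Y101W`; the pipeline's exact normalization may differ) can be parsed like this:

```python
import re
from typing import NamedTuple

class Mutation(NamedTuple):
    chain: str
    wt: str
    index: int
    mut: str

# Assumed normalized format "<chain>:<wtAA><residue index><mutAA>", e.g. "A:Y101W".
_MUT_RE = re.compile(r"^(?P<chain>\w+):(?P<wt>[A-Z])(?P<idx>\d+)(?P<mut>[A-Z])$")

def parse_mutation(s: str) -> Mutation:
    """Split a normalized mutation string into its structured parts."""
    m = _MUT_RE.match(s.strip())
    if m is None:
        raise ValueError(f"Unrecognized mutation string: {s!r}")
    return Mutation(m["chain"], m["wt"], int(m["idx"]), m["mut"])

mut = parse_mutation("A:Y101W")
```

The structured tuple is what downstream steps need to locate the mutated residue in the PDB and assign the wild-type/mutant feature columns.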
SKEMPI 2.0
- Source: Jankauskaite et al., SKEMPI 2.0: an updated benchmark of changes in protein-protein binding energy
- Composition: 7,085 mutants covering a diverse range of general protein-protein interactions (PPIs).
- Utility: Used to assess the transferability of features learned on antibody interfaces to general PPIs.
- Geng, C., et al. (2019). ISPRED4: interaction sites PREDiction in protein structures with a refinement strategy. Bioinformatics. (Context: Evaluation of leakage in PPI datasets).
- Dehghanpoor, R., et al. (2018). ProAffiMuSeq: sequence-based prediction of protein-protein binding affinity change upon mutation. Bioinformatics.
- Réau, M., et al. (2023). DeepRank-GNN: a graph neural network framework to learn patterns in protein-protein interfaces. Bioinformatics.
- Sirin, S., et al. (2016). AB-Bind: Antibody binding mutational database for computational affinity predictions. Protein Science.
- Jankauskaite, J., et al. (2019). SKEMPI 2.0: an updated benchmark of changes in protein-protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics.
Dependencies for the current scripts are declared in `requirements.txt` and include `biopython` and `torch` in addition to the pandas/scikit-learn stack; install them with `pip install -r requirements.txt`.
The complex-level CV pipeline additionally uses pyyaml for config parsing.
Enhanced feature engineering adds per-mutation structural context (mutation/neighbor counts, chain type, partner contact density, interface flags, and distance-to-partner) to 3D-GNN-over-antibody-antigen/data/processed/ab_bind_features.csv.
- Run `python -m src.data.prepare_ab_bind` to consume `3D-GNN-over-antibody-antigen/data/processed/ab_bind_with_labels.csv`, engineer physicochemical shift statistics plus the structural context above, and build group-consistent train/validation/test splits. The script writes `3D-GNN-over-antibody-antigen/data/processed/ab_bind_features.csv` and `3D-GNN-over-antibody-antigen/data/processed/ab_bind_splits.json`.
- Execute `python -m src.data.build_interface_graphs` to parse the PDBs under `data/external/AB-Bind-Database`, retain residues around each mutation, and serialize per-mutation interface graphs (with solvent proxies, atomic counts, and B-factor cues) to `3D-GNN-over-antibody-antigen/data/graphs/ab_bind_graphs.pkl`. The parser now preprocesses files such as `3nps.pdb` to fill missing occupancy columns so every structure yields features.
- Use `python -m src.baselines.train_baselines` to fit ridge, random forest, and gradient-boosting regressors on the engineered features; per-split metrics are saved to `3D-GNN-over-antibody-antigen/reports/metrics/baseline_metrics.csv`.
- Run `python -m src.gnn.train_gnn` (optionally pass `--patience N` for early stopping) to load the serialized graphs, train a lightweight attention-weighted message-passing GNN, and log epoch-level metrics to `3D-GNN-over-antibody-antigen/reports/metrics/gnn_metrics.csv`.
- Execute `python scripts/compare_models.py` to merge baseline and GNN predictions and write comparison metrics/plots under `results/model_comparison/`.
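The group-consistent splitting used above can be sketched in pure Python: whole complexes, never individual mutants, are assigned to train/validation/test. The fractions, seed, and `complex_level_split` name here are illustrative assumptions, not the pipeline's actual configuration.

```python
import random

def complex_level_split(complex_ids, frac_val=0.15, frac_test=0.15, seed=0):
    """Assign each mutation record to a split by its PDB complex,
    so no complex appears in more than one split (no leakage)."""
    complexes = sorted(set(complex_ids))
    random.Random(seed).shuffle(complexes)
    n = len(complexes)
    n_test = max(1, int(n * frac_test))
    n_val = max(1, int(n * frac_val))
    test_c = set(complexes[:n_test])
    val_c = set(complexes[n_test:n_test + n_val])
    return [
        "test" if cid in test_c else "val" if cid in val_c else "train"
        for cid in complex_ids
    ]

# Toy example: 6 mutants spread over 4 complexes.
cids = ["1ABC", "1ABC", "2DEF", "3GHI", "3GHI", "4JKL"]
splits = complex_level_split(cids)
```

The key invariant is that two mutants of the same complex always land in the same split; a random per-row split would violate this and inflate test metrics.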
Future steps should compare the new GNN metrics against the baselines and extend the structural graphs to SKEMPI 2.0 for transfer evaluation.
To quantify variance across complex-wise splits (and avoid leakage), we run GroupKFold at the complex level. The CV pipeline reuses the same feature table and baseline hyperparameters, and optionally stacks GBT + fixed GNN predictions (no per-fold GNN retraining).
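The optional stacking step can be sketched as a simple linear blend fit over the two base models' predictions (ordinary least squares in numpy; `p_gbt` and `p_gnn` are synthetic stand-ins for the real out-of-fold outputs):

```python
import numpy as np

def fit_stack(p_gbt, p_gnn, y):
    """Fit y ~ a * p_gbt + b * p_gnn + c by least squares."""
    X = np.column_stack([p_gbt, p_gnn, np.ones_like(p_gbt)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_stack(coef, p_gbt, p_gnn):
    """Apply the fitted blend to new base-model predictions."""
    X = np.column_stack([p_gbt, p_gnn, np.ones_like(p_gbt)])
    return X @ coef

# Synthetic sanity check: the target is an exact blend of the base models.
rng = np.random.default_rng(1)
p_gbt = rng.normal(size=50)
p_gnn = rng.normal(size=50)
y = 0.7 * p_gbt + 0.3 * p_gnn + 0.1
coef = fit_stack(p_gbt, p_gnn, y)
```

Because the GNN is fixed (no per-fold retraining), only these three blend coefficients are refit per fold, which keeps the CV cheap.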
Ranges from the 5-fold × 3-repeat run:
- random_forest: Pearson -0.01–0.51, Spearman -0.02–0.54, MAE 1.16–1.97 kcal/mol
- gbt: Pearson 0.05–0.47, Spearman 0.09–0.54, MAE 1.19–2.12 kcal/mol
- stack_gbt_gnn: Pearson 0.05–0.47, Spearman 0.09–0.54, MAE 1.13–1.87 kcal/mol
Stacking improves MAE in many folds but does not consistently improve correlation versus GBT.
R² remains close to zero on this small, noisy dataset; these are baseline-level results, not state of the art.
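The reported metrics can be reproduced with numpy alone (Spearman computed as Pearson correlation of ranks; this mirrors, but does not reuse, the project's metric code, and the simple double-argsort ranking ignores tie handling):

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a, b = a - a.mean(), b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def spearman(a, b):
    """Spearman correlation: Pearson of ranks (ties not averaged here)."""
    rank = lambda x: np.argsort(np.argsort(x)).astype(float)
    return pearson(rank(a), rank(b))

def mae(y, p):
    """Mean absolute error, in the target's units (kcal/mol here)."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(p))))

def r2(y, p):
    """Coefficient of determination; can go below 0 for poor fits."""
    y, p = np.asarray(y, float), np.asarray(p, float)
    return float(1 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2))

y_true = [0.5, -1.2, 2.0, 0.0]
y_pred = [0.6, -1.0, 1.5, 0.2]
```

Note that a model can rank mutations well (high Spearman) while its R² stays near zero, which is exactly the pattern seen in the CV ranges above.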
How to reproduce:

```shell
# Run complex-level CV (uses config/complex_cv.yaml)
python scripts/run_complex_cv.py --config config/complex_cv.yaml

# Generate CV summary plots
python scripts/plot_complex_cv_results.py
```

- Best tested GNN (v1): InterfaceGNN, hidden_dim=128, layers=3, dropout=0.0, target standardization + distribution-weighted loss.
- Comparative results:
| Model | Test MAE (kcal/mol) | Test RMSE (kcal/mol) | Test R² |
|---|---|---|---|
| Gradient boosting | 0.80 | 0.99 | 0.22 |
| Random forest | 0.89 | 1.07 | 0.01 |
| Ridge | >1.0 | >1.3 | <0 |
| InterfaceGNN (v1) | 0.74 | 1.14 | -0.02 |
Takeaway: the GNN pipeline trains and matches the MAE of strong tabular baselines but still trails in R²; next steps are richer node/edge features and light architecture tweaks on GPU (3-4 layers, 128-256 channels).
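The "target standardization + distribution-weighted loss" recipe used by the best GNN run can be sketched as follows. This is a numpy illustration under assumed details: the quantile binning and inverse-frequency weighting are plausible implementations, not the repo's exact formulation.

```python
import numpy as np

def standardize(y):
    """Z-score targets; keep (mu, sigma) to invert predictions later."""
    y = np.asarray(y, float)
    mu, sigma = float(y.mean()), float(y.std())
    return (y - mu) / sigma, mu, sigma

def distribution_weights(y, n_bins=5):
    """Weight each sample by the inverse frequency of its ddG bin, so
    rare large-|ddG| mutations are not drowned out by neutral ones."""
    y = np.asarray(y, float)
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins)
    w = 1.0 / counts[bins]
    return w * len(y) / w.sum()  # normalize to mean weight 1

def weighted_mse(y, pred, w):
    """Per-sample weighted squared-error loss."""
    return float(np.mean(w * (np.asarray(y) - np.asarray(pred)) ** 2))

y = np.array([-0.2, 0.0, 0.1, 0.05, 3.5])  # one rare, strongly worsening mutant
w = distribution_weights(y, n_bins=2)
```

Training on standardized targets and inverting (`pred * sigma + mu`) at evaluation time keeps the loss scale stable across folds while still reporting MAE/RMSE in kcal/mol.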
- `3D-GNN-over-antibody-antigen/reports/metrics/gnn_metrics.csv`: replace with a clean GPU run (InterfaceGNN, 128d/3 layers, lr=1e-3, standardized targets, dist_weighted loss), then rerun the comparison plot.
- `scripts/compare_models.py`: run `python scripts/compare_models.py` after updating metrics to regenerate `results/model_comparison/` outputs.
- `src/data/build_interface_graphs.py` (v2 feature enrichment): load `data/processed/ab_bind_features.csv` and inject physicochemical features (hydrophobicity/volume/charge/polarity deltas, structural stats) into `node_features` (and optionally `edge_attr`), keyed by `sample_id`.
- `src/gnn/train_gnn.py` / `InterfaceGNN`: keep 3-4 layers and 128-256 channels; continue using clamped `edge_attr`/weights and a small readout. If graph features are enriched, adjust input dimensions accordingly.
- Suggested GPU sweep (v2): `hidden_dim` in {128, 256}, layers in {3, 4}, lr in {5e-4, 1e-3}, with target standardization + dist_weighted loss.