**Predecessor of piKa - PyTorch, by Jessi Rose Hoernschemeyer, for my former MSc (2023-2025)**
Version pK-EGNN: an implementation of https://github.com/lucidrains/egnn-pytorch/tree/main for molecular chemical property prediction. All rights reserved. Please mail for enquiries: jessih97@zedat.fu-berlin.de
Author: Jessi Rose Hoernschemeyer, 2023-2025
Compute credit: NHR/ZIB (slurm)
Developed as part of my theoretical biophysics project "pKaSchNet: prediction of protonation shifts using continuous filter convolutional neural networks (CFCNNs)" with Prof. Dr. Cecilia Clementi and Prof. Dr. Maria Andrea Mroginski (Freie Universität Berlin (MSc) | Technische Universität Berlin | SFB 1078: Protonation Dynamics in Protein Function)
+++++++++++++++++++++++++++++++++++++++++
Here lives a data-driven, physics-informed regressor that predicts scalar pKa shifts from structure and chemical identity by pooling over learned (equivariant) multi-scale representations. The physics enters through inductive bias (e.g. the EGNN_Network featurizer) and, as in the benchmark pKAI, through training on Poisson-Boltzmann labels. The goal is for the internal representations to be energetic representations and for the output layers to produce the desired chemical scalar, which should be invariant to energetically indistinguishable inputs, e.g. a reflected protein.
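To make the invariance requirement concrete, here is a minimal sketch in plain Python. It is not the EGNN_Network used in this repo: `invariant_readout` is a hypothetical toy stand-in that builds its messages from interatomic distances only and then sum-pools, so its scalar output cannot change when the input is rotated or reflected. The Gaussian distance weighting and the `gamma` parameter are assumptions chosen for illustration.

```python
import math
import random


def pairwise_distances(coords):
    """Euclidean distance matrix for a list of (x, y, z) tuples."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def invariant_readout(feats, coords, gamma=0.5):
    """Toy scalar readout (hypothetical, for illustration only).

    Each atom collects neighbour features weighted by a Gaussian of the
    interatomic distance; the per-atom values are then mean-pooled.
    Only distances enter, so the result is invariant to rotations,
    translations and reflections of the coordinates.
    """
    d = pairwise_distances(coords)
    n = len(coords)
    total = 0.0
    for i in range(n):
        message = sum(
            feats[j] * math.exp(-gamma * d[i][j] ** 2) for j in range(n) if j != i
        )
        total += feats[i] * message
    return total / n


def reflect_x(coords):
    """Mirror the structure through the yz-plane."""
    return [(-x, y, z) for x, y, z in coords]


def rotate_z(coords, theta):
    """Rotate the structure by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


# Random toy "molecule": 6 atoms with one scalar feature each.
rng = random.Random(0)
coords = [tuple(rng.uniform(-3, 3) for _ in range(3)) for _ in range(6)]
feats = [rng.uniform(-1, 1) for _ in range(6)]

y = invariant_readout(feats, coords)
y_reflected = invariant_readout(feats, reflect_x(coords))
y_rotated = invariant_readout(feats, rotate_z(coords, 0.7))

# All three agree up to floating-point error.
print(math.isclose(y, y_reflected, abs_tol=1e-9), math.isclose(y, y_rotated, abs_tol=1e-9))
```

The same kind of check (predict, transform the coordinates, predict again, compare) can be run against a trained model as a sanity test.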
Trained on synthetic labels (pKPDB/PypKa, Reis et al., shifted with pKAI reference values), mapped to inputs generated by pKParser. Structure information: RCSB.org
Credit: JRH, my professors, and ChatGPT 4-5.1 Thinking, which helped me compile this bibliography:
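As a rough sketch of what shifting a label with a reference value could look like: the function below subtracts a per-residue-type reference pKa from a Poisson-Boltzmann pKa. The sign convention (shift = pKa minus reference), the function name `pka_shift` and the numbers in `example_reference` are all assumptions made for illustration; the reference values are round placeholders, not the pKAI table.

```python
def pka_shift(pka_pb, residue_name, reference):
    """Return the pKa shift of one site relative to its residue-type reference.

    pka_pb       -- absolute pKa from the Poisson-Boltzmann calculation
    residue_name -- three-letter residue code, e.g. "ASP"
    reference    -- dict mapping residue codes to reference pKa values
    """
    if residue_name not in reference:
        raise KeyError(f"no reference pKa for residue type {residue_name!r}")
    return pka_pb - reference[residue_name]


# Placeholder values for illustration only; NOT the actual pKAI references.
example_reference = {"ASP": 4.0, "GLU": 4.5}

# Hypothetical sites: (absolute pKa, residue type).
sites = [(5.0, "ASP"), (4.0, "GLU")]
shifts = [pka_shift(pka, res, example_reference) for pka, res in sites]
print(shifts)
```

Regressing on the shift rather than the absolute pKa removes the residue-type offset from the target, so the model only has to learn the effect of the environment.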
🤖 Main
Phil Wang & Eric Alcaide. egnn-pytorch: Implementation of E(n)-Equivariant Graph Neural Networks in PyTorch. GitHub: https://github.com/lucidrains/egnn-pytorch
Reis, P. B. P. S., Bertolini, M., Montanari, F., Rocchia, W., Machuqueiro, M., & Clevert, D.-A. (2022). A fast and interpretable deep learning approach for accurate electrostatics-driven pKa predictions in proteins (pKAI / pKAI+). Journal of Chemical Theory and Computation, 18(8), 5068–5078. https://doi.org/10.1021/acs.jctc.2c00308
pKAI / pKAI+ (software repo). pKAI+: fast and interpretable deep learning models for protein pKa prediction. GitHub: https://github.com/bayer-science-for-a-better-life/pKAI
Reis, P. B. P. S., Vila-Viçosa, D., Rocchia, W., & Machuqueiro, M. (2020). PypKa: A flexible Python module for Poisson–Boltzmann-based pKa calculations. Journal of Chemical Information and Modeling, 60(10), 4442–4448. https://doi.org/10.1021/acs.jcim.0c00718
PypKa (software repo). PypKa: A python module for flexible Poisson–Boltzmann-based pKa calculations with proton tautomerism. GitHub: https://github.com/mms-fcul/PypKa
PypKa server. Online Poisson–Boltzmann-based and AI-accelerated pKa calculations & biomolecular preparation. https://pypka.org/
pKPDB (web + repo). pKPDB: a Protein Data Bank extension database of pKa and pI theoretical values. Web: https://pypka.org/pkpdb/ | GitHub: https://github.com/mms-fcul/pKPDB
Reis, P. B. P. S., Clevert, D.-A., & Machuqueiro, M. (2022). pKPDB: a Protein Data Bank extension database of pKa and pI theoretical values. Bioinformatics, 38(1), 297–298. https://doi.org/10.1093/bioinformatics/btab518
🧠 Architectures + prep (EGNN, SchNet, scikit-learn)
Garcia Satorras, V., Hoogeboom, E., & Welling, M. (2021). E(n) Equivariant Graph Neural Networks. arXiv:2102.09844. https://arxiv.org/pdf/2102.09844
EGNN – PyTorch implementation. Phil Wang & Eric Alcaide. egnn-pytorch: Implementation of E(n)-Equivariant Graph Neural Networks in PyTorch. GitHub: https://github.com/lucidrains/egnn-pytorch
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830. (KNN/hoods)
Eastman, P., Swails, J., Chodera, J. D., et al. (2017). OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Computational Biology, 13(7), e1005659. (Describes OpenMM and PDBFixer.)
Word, J. M., Lovell, S. C., Richardson, J. S., & Richardson, D. C. (1999). Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. Journal of Molecular Biology, 285(4), 1735–1747. (Reduce program.)
Schütt, K. T., Kindermans, P.-J., Sauceda, H. E., Chmiela, S., Tkatchenko, A., & Müller, K.-R. (2017). SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. Advances in Neural Information Processing Systems 30, 992–1002.
Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A., & Müller, K.-R. (2018). SchNet – a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24), 241722. https://doi.org/10.1063/1.5019779
SchNetPack – Deep neural networks for atomistic systems. GitHub toolbox: https://github.com/atomistic-machine-learning/schnetpack
Case, D. A., Duke, R. E., Walker, R. C., et al. (2022). Amber 22 Reference Manual (Amber22 and AmberTools22). University of California, San Francisco.
🤖 Other
Chen, A. Y., Lee, J., Damjanovic, A., & Brooks, B. R. (2022). Protein pKa prediction by tree-based machine learning. Journal of Chemical Theory and Computation, 18(4), 2673–2686. https://doi.org/10.1021/acs.jctc.1c01257
Cai, Z., Wei, W., Wang, L., Yang, X., Zhang, Y., & Huang, X. (2023). Basis for accurate protein pKa prediction with machine learning. Journal of Computational Biophysics and Chemistry, 2(2), 2250023.
Cai, Z., Wei, W., Wang, L., Yang, X., Zhang, Y., & Huang, X. (2024). DeepKa Web Server: High-throughput protein pKa prediction. Journal of Chemical Information and Modeling, 64(5), 1504–1514.
Gokcan, H., & Isayev, O. (2022). Prediction of protein pKa with representation learning. Chemical Science, 13(8), 2462–2474. https://doi.org/10.1039/D1SC05610G
Pahari, S., Sun, L., Alexov, E., & König, G. (2019). PKAD: a database of experimentally measured pKa values of ionizable groups in proteins. Database (Oxford), 2019, baz024. https://doi.org/10.1093/database/baz024