EGG: Accuracy Estimation of Individual Multimeric Protein Models Using Deep Energy-Based Models and Graph Neural Networks
Abstract
1. Introduction
2. Results and Discussion
3. Materials and Methods
3.1. Datasets
3.2. Training Procedure
3.3. Model Architectures
3.4. Implementation
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
Multimer | Multiple Chain Protein
Monomer | Single Chain Protein
EMA | Estimation of Model Accuracy
CASP | The Critical Assessment of Protein Structure Prediction
GNN | Graph Neural Network
FCL | Fully Connected Layer
EBM | Energy-Based Model
CB | Beta Carbon
CA | Alpha Carbon
CDP | Contact Dependent Potential
RSAP | Relative Solvent Accessibility Potential
MSE | Mean Squared Error
L1 | Mean Absolute Error
TM-score | Overall Fold Score
QS-score | Overall Interface Score
DCG | Discounted Cumulative Gain
NDCG | Normalized Discounted Cumulative Gain
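DCG and NDCG are ranking metrics: they reward a method for placing the truly best models near the top of its predicted ordering. The following is a minimal sketch, assuming the common log2(rank + 1) discount and using the true quality scores as gains. The exact variant used in this work may differ, and the function names `dcg` and `ndcg` are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(predicted_scores, true_scores):
    """NDCG of the ranking induced by predicted_scores, with true_scores as gains."""
    if len(predicted_scores) != len(true_scores) or not true_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    # Order the models from highest to lowest predicted score.
    order = sorted(range(len(true_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_gains = [true_scores[i] for i in order]
    # The ideal ranking sorts the models by their true scores.
    ideal = dcg(sorted(true_scores, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


# Illustrative values only (assumption): three models with true scores as gains.
perfect = ndcg([0.9, 0.5, 0.1], [0.8, 0.6, 0.2])   # predicted order matches true order
reversed_ = ndcg([0.1, 0.5, 0.9], [0.8, 0.6, 0.2])  # predicted order is inverted
```

A ranking that matches the true ordering yields an NDCG of 1, and any other ordering of these distinct scores yields a value below 1.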
Type | Backbone | TM-L1 | TM-MSE | QS-L1 | QS-MSE |
---|---|---|---|---|---|
Regression | Transformer | 0.243 | 0.094 | 0.314 | 0.127 |
EBM | Transformer | 0.245 | 0.096 | 0.341 | 0.156 |
Regression | MetaLayer | 0.263 | 0.098 | 0.300 | 0.124 |
EBM | MetaLayer | 0.242 | 0.088 | 0.299 | 0.126 |
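The L1 and MSE columns are the mean absolute error and mean squared error (see Abbreviations) between predicted and true TM-scores or QS-scores. As a minimal sketch, assuming predictions and targets are plain lists of per-model scores (the function names and sample values below are hypothetical and not taken from the authors' implementation):

```python
def mean_absolute_error(predicted, target):
    """L1: average absolute difference between predicted and true scores."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


def mean_squared_error(predicted, target):
    """MSE: average squared difference between predicted and true scores."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Hypothetical TM-scores for three models (illustrative values only).
predicted_tm = [0.80, 0.55, 0.30]
true_tm = [0.70, 0.60, 0.50]

l1 = mean_absolute_error(predicted_tm, true_tm)
mse = mean_squared_error(predicted_tm, true_tm)
print(f"TM-L1 = {l1:.3f}, TM-MSE = {mse:.4f}")
```

Lower values are better for both metrics. MSE penalizes large individual errors more heavily than L1, so the two columns can rank methods differently.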
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Siciliano, A.J.; Zhao, C.; Liu, T.; Wang, Z. EGG: Accuracy Estimation of Individual Multimeric Protein Models Using Deep Energy-Based Models and Graph Neural Networks. Int. J. Mol. Sci. 2024, 25, 6250. https://doi.org/10.3390/ijms25116250