Graph Neural Network for Protein–Protein Interaction Prediction: A Comparative Study
Abstract
1. Introduction
2. Materials and Methods
2.1. Datasets
2.1.1. Data Collection
2.1.2. Data Preprocessing
2.2. Models
2.2.1. Background
2.2.2. Graph Convolutional Neural Networks (GCN)
- Step 1.
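- Consider the influence of neighbor nodes on the current node by aggregating their features: $AX$, where $X$ is the feature matrix of the nodes.
- Step 2.
- At the same time, the node itself should be considered, which is done by adding self-loops: $\tilde{A}X = (A + I)X$, where $A$ is the adjacency matrix of the graph and $I$ is the identity matrix.
- Step 3.
- Symmetric normalization. If the degrees of two adjacent nodes differ greatly, the features of the lower-degree node are distorted after each round of aggregation. To reduce this effect, the aggregation is symmetrically normalized: $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}X$, where $\tilde{D}$ is the degree matrix of the graph (with self-loops), $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.
- Step 4.
- Multiply the aggregated features by the layer parameters and apply the activation: $H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\right)$, where $W^{(l)}$ denotes the learnable parameters of layer $l$, $H^{(l)}$ is the feature matrix of layer $l$ (with $H^{(0)} = X$), and $\sigma$ is a nonlinear activation function.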
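The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `normalized_adjacency` and `gcn_layer`, the toy path graph, the random weights, and the choice of `tanh` as the activation are all hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Steps 2-3: add self-loops, then apply symmetric normalization."""
    A_tilde = A + np.eye(A.shape[0])           # A + I (self-loops)
    deg = A_tilde.sum(axis=1)                  # degrees of A + I, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # D^{-1/2} (A + I) D^{-1/2}, written as row and column scaling
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A, H, W, activation=np.tanh):
    """Steps 1 and 4: aggregate neighbor features, transform, activate."""
    return activation(normalized_adjacency(A) @ H @ W)

# Toy example (assumed data): a path graph 0 - 1 - 2 with one-hot features.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
X = np.eye(3)
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))                    # hypothetical layer weights
H1 = gcn_layer(A, X, W)                        # node embeddings after one layer
```

Writing the normalization as row and column scaling avoids building the diagonal matrix explicitly; for large sparse graphs one would normally use a sparse matrix library instead.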
2.2.3. Graph Attention Networks (GAT)
2.2.4. Hyperbolic Neural Networks (HNN)
- Step 1.
- Using the Möbius addition, the exponential map, the logarithmic map, and parallel transport along the Levi-Civita connection (an isometry between tangent spaces), a vector $v$ in one tangent space of the manifold can be mapped to another tangent space, as shown in Equation (13). There, $\oplus_c$ is the Möbius addition in $\mathbb{D}_c^n$, $\exp_x^c$ is the exponential map at $x$, and $\lambda_x^c$ is a smooth function.
- Step 2.
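- Embed the two sentences using two different hyperbolic RNNs or GRUs. A simple RNN generalizes naturally to hyperbolic space as follows: $h_{t+1} = \varphi^{\otimes_c}\left(W \otimes_c h_t \oplus_c U \otimes_c x_t \oplus_c b\right)$, with $W \in \mathcal{M}_{m,n}(\mathbb{R})$, $U \in \mathcal{M}_{m,d}(\mathbb{R})$, $b \in \mathbb{D}_c^m$, where $\oplus_c$ and $\otimes_c$ denote Möbius addition and Möbius matrix–vector multiplication, and $\varphi$ is a pointwise nonlinearity, typically tanh, sigmoid, ReLU, etc. Under the adjusted GRU architecture, the update gate equation becomes: $z_t = \sigma \log_0^c\left(W^z \otimes_c h_{t-1} \oplus_c U^z \otimes_c x_t \oplus_c b^z\right)$.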
- Step 3.
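- Feed the sentence embeddings, together with their squared distance (hyperbolic or Euclidean, according to their geometry), into an FFNN. Some of the common operations (Möbius versions) defined in the hyperbolic setting are as follows. For a map $f: \mathbb{R}^n \to \mathbb{R}^m$, the Möbius version of $f$, a map from $\mathbb{D}_c^n$ to $\mathbb{D}_c^m$, is defined by: $f^{\otimes_c}(x) = \exp_0^c\left(f\left(\log_0^c(x)\right)\right)$. Möbius matrix–vector multiplication: if $M: \mathbb{R}^n \to \mathbb{R}^m$ is a linear map, identified with its matrix representation, then $\forall x \in \mathbb{D}_c^n$, if $Mx \neq 0$: $M^{\otimes_c}(x) = \frac{1}{\sqrt{c}} \tanh\left(\frac{\|Mx\|}{\|x\|} \tanh^{-1}\left(\sqrt{c}\|x\|\right)\right) \frac{Mx}{\|Mx\|}$, and $M^{\otimes_c}(x) = 0$ if $Mx = 0$.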
- Step 4.
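- Feed the result into an MLR (Euclidean or hyperbolic), with the cross-entropy loss on top. To achieve multi-class classification, multinomial logistic regression (MLR, also known as softmax regression) needs to be generalized to the Poincaré ball. For $K$ classes, $k \in \{1, \ldots, K\}$, $p_k \in \mathbb{D}_c^n$, $a_k \in T_{p_k}\mathbb{D}_c^n \setminus \{0\}$: $p(y = k \mid x) \propto \exp\left(\frac{\lambda_{p_k}^c \|a_k\|}{\sqrt{c}} \sinh^{-1}\left(\frac{2\sqrt{c}\,\langle -p_k \oplus_c x, a_k \rangle}{\left(1 - c\|-p_k \oplus_c x\|^2\right)\|a_k\|}\right)\right)$, $\forall x \in \mathbb{D}_c^n$.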
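The Möbius operations named in the steps above can be sketched in NumPy as follows. This is a minimal illustration assuming the standard Poincaré ball formulas from the hyperbolic neural network literature; the function names (`mobius_add`, `expmap0`, `logmap0`, `mobius_matvec`), the default curvature parameter `c=1.0`, and the sample vectors are hypothetical and not taken from the study's code.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Möbius addition of two points in the Poincaré ball of curvature -c."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def expmap0(v, c=1.0):
    """Exponential map at the origin: tangent vector -> point in the ball."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    if n == 0:
        return np.zeros_like(v)
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def logmap0(y, c=1.0):
    """Logarithmic map at the origin: point in the ball -> tangent vector."""
    y = np.asarray(y, dtype=float)
    n = np.linalg.norm(y)
    if n == 0:
        return np.zeros_like(y)
    return np.arctanh(np.sqrt(c) * n) * y / (np.sqrt(c) * n)

def mobius_matvec(M, x, c=1.0):
    """Möbius matrix-vector multiplication; returns 0 when Mx = 0."""
    Mx = M @ x
    nx, nMx = np.linalg.norm(x), np.linalg.norm(Mx)
    if nx == 0 or nMx == 0:
        return np.zeros_like(Mx)
    scale = np.tanh(nMx / nx * np.arctanh(np.sqrt(c) * nx))
    return scale * Mx / (np.sqrt(c) * nMx)

# Assumed sample data: two points inside the unit ball and a 2x2 matrix.
x = np.array([0.1, -0.2])
y = np.array([0.3, 0.4])
M = np.array([[1.0, 2.0], [0.5, -1.0]])
z = mobius_add(x, y)
Mx = mobius_matvec(M, x)
```

A useful sanity check is that Möbius matrix–vector multiplication coincides with the Möbius version of the linear map, that is, `expmap0(M @ logmap0(x))`.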
2.2.5. Hyperbolic Graph Convolutions (HGCN)
- Step 1.
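- Mapping from Euclidean to hyperbolic space. The input features are generally Euclidean, so they must first be mapped onto the hyperbolic manifold with the exponential map: $x^{0,H} = \exp_o^K\left((0, x^{0,E})\right) = \left(\sqrt{K}\cosh\left(\frac{\|x^{0,E}\|_2}{\sqrt{K}}\right), \sqrt{K}\sinh\left(\frac{\|x^{0,E}\|_2}{\sqrt{K}}\right)\frac{x^{0,E}}{\|x^{0,E}\|_2}\right)$, where $x^{0,E} \in \mathbb{R}^d$ denotes the input Euclidean features, $o = (\sqrt{K}, 0, \ldots, 0) \in \mathbb{H}^{d,K}$ is the north pole (origin) in $\mathbb{H}^{d,K}$, and $(0, x^{0,E})$ is interpreted as a point in the tangent space $\mathcal{T}_o\mathbb{H}^{d,K}$.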
- Step 2.
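- Stacking multiple hyperbolic graph convolution layers. At each layer, the HGCN performs a hyperbolic linear transformation, takes the embeddings of the neighbors in the tangent space of the central node and aggregates them based on attention, and projects the result into a hyperbolic space with a different curvature. That is, given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and Euclidean features $(x_i^{0,E})_{i \in \mathcal{V}}$, each HGCN layer computes, in order: the feature transform $h_i^{l,H} = \left(W^l \otimes^{K_{l-1}} x_i^{l-1,H}\right) \oplus^{K_{l-1}} b^l$; the attention-based neighborhood aggregation $y_i^{l,H} = \mathrm{AGG}^{K_{l-1}}\left(h^{l,H}\right)_i$; and the nonlinear activation with different curvatures $x_i^{l,H} = \sigma^{\otimes^{K_{l-1},K_l}}\left(y_i^{l,H}\right)$.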
- Step 3.
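- Node attribute or link prediction. For link prediction, the probability scores of the edges are calculated using the Fermi–Dirac decoder [58,59] (a generalization of the sigmoid): $p\left((i,j) \in \mathcal{E} \mid x_i^{L,H}, x_j^{L,H}\right) = \left[e^{\left(d_{\mathcal{L}}^{K_L}(x_i^{L,H}, x_j^{L,H})^2 - r\right)/t} + 1\right]^{-1}$, where $d_{\mathcal{L}}^{K_L}(\cdot,\cdot)$ is the hyperbolic distance and $r$ and $t$ are hyperparameters. For node classification, the output of the last HGCN layer is mapped to the tangent space of the origin with the logarithmic map, and Euclidean multinomial logistic regression is then performed. Finally, a link prediction regularization objective is added to encourage the embeddings at the final layer to preserve the structure of the graph.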
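The first and last steps above (lifting Euclidean features onto the hyperboloid and scoring an edge with the Fermi–Dirac decoder) can be sketched in NumPy. This is a minimal illustration, assuming the hyperboloid model with curvature $-1/K$; the function names, the sample feature vectors, and the default values `r=2.0` and `t=1.0` are hypothetical choices for the example, not settings reported in the study.

```python
import numpy as np

def minkowski_dot(x, y):
    """Minkowski inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def expmap_origin(x_e, K=1.0):
    """Map Euclidean features onto the hyperboloid via exp at the origin o."""
    x_e = np.asarray(x_e, dtype=float)
    n = np.linalg.norm(x_e)
    sqrt_K = np.sqrt(K)
    if n == 0:
        return np.concatenate(([sqrt_K], np.zeros_like(x_e)))   # the origin o
    time = sqrt_K * np.cosh(n / sqrt_K)
    space = sqrt_K * np.sinh(n / sqrt_K) * x_e / n
    return np.concatenate(([time], space))

def hyperboloid_distance(x, y, K=1.0):
    """Geodesic distance on the hyperboloid of curvature -1/K."""
    arg = np.maximum(-minkowski_dot(x, y) / K, 1.0)   # clamp rounding error
    return np.sqrt(K) * np.arccosh(arg)

def fermi_dirac(dist, r=2.0, t=1.0):
    """Edge probability from a hyperbolic distance (r, t are hyperparameters)."""
    return 1.0 / (np.exp((dist ** 2 - r) / t) + 1.0)

# Assumed sample data: Euclidean features of two nodes.
p = expmap_origin([0.3, -0.4])
q = expmap_origin([1.0, 2.0])
o = expmap_origin([0.0, 0.0])
edge_prob = fermi_dirac(hyperboloid_distance(p, q))
```

With this decoder, node pairs whose squared distance is below `r` receive a probability above 0.5, and `t` controls how sharply the probability falls off around that threshold.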
2.3. Experiments
2.3.1. Dataset Settings
2.3.2. Implementation Details
2.3.3. Evaluation Metrics
3. Results and Discussion
3.1. Performance Comparison on SHS27k Dataset
3.1.1. ROC Comparison on SHS27k Dataset
3.1.2. AP Comparison on SHS27k Dataset
3.1.3. Runtime Comparison on SHS27k Dataset
3.2. Prediction Comparison on SHS148k Dataset
3.2.1. ROC Comparison on SHS148k Dataset
3.2.2. AP Comparison on SHS148k Dataset
3.2.3. Runtime Comparison on SHS148k Dataset
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
PPI | Protein–Protein Interactions |
NN | Neural Networks |
GCN | Graph Convolutional Neural Networks |
GAT | Graph Attention Networks |
HNN | Hyperbolic Neural Networks |
HGCN | Hyperbolic Graph Convolutions |
References
1. Berggård, T.; Linse, S.; James, P. Methods for the detection and analysis of protein–protein interactions. Proteomics 2007, 7, 2833–2842.
2. Zhang, Q.C.; Petrey, D.; Deng, L.; Qiang, L.; Shi, Y.; Thu, C.A.; Bisikirska, B.; Lefebvre, C.; Accili, D.; Hunter, T.; et al. Structure-based prediction of protein–protein interactions on a genome-wide scale. Nature 2012, 490, 556–560.
3. Wang, L.; You, Z.H.; Xia, S.X.; Liu, F.; Chen, X.; Yan, X.; Zhou, Y. Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier. J. Theor. Biol. 2017, 418, 105–110.
4. Wang, R.S.; Wang, Y.; Wu, L.Y.; Zhang, X.S.; Chen, L. Analysis on multi-domain cooperation for predicting protein-protein interactions. BMC Bioinform. 2007, 8, 1–20.
5. Skrabanek, L.; Saini, H.K.; Bader, G.D.; Enright, A.J. Computational prediction of protein–protein interactions. Mol. Biotechnol. 2008, 38, 1–17.
6. Dong, J.; Zhao, M.; Liu, Y.; Su, Y.; Zeng, X. Deep learning in retrosynthesis planning: Datasets, models and tools. Brief. Bioinform. 2022, 23, bbab391.
7. Ito, T.; Chiba, T.; Ozawa, R.; Yoshida, M.; Hattori, M.; Sakaki, Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 2001, 98, 4569–4574.
8. Gavin, A.C.; Bösche, M.; Krause, R.; Grandi, P.; Marzioch, M.; Bauer, A.; Schultz, J.; Rick, J.M.; Michon, A.M.; Cruciat, C.M.; et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415, 141–147.
9. Ho, Y.; Gruhler, A.; Heilbut, A.; Bader, G.D.; Moore, L.; Adams, S.L.; Millar, A.; Taylor, P.; Bennett, K.; Boutilier, K.; et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415, 180–183.
10. Huang, H.; Alvarez, S.; Nusinow, D.A. Data on the identification of protein interactors with the Evening Complex and PCH1 in Arabidopsis using tandem affinity purification and mass spectrometry (TAP–MS). Data Brief 2016, 8, 56–60.
11. Foltman, M.; Sanchez-Diaz, A. Studying protein–protein interactions in budding yeast using co-immunoprecipitation. In Yeast Cytokinesis; Springer: Berlin/Heidelberg, Germany, 2016; pp. 239–256.
12. Mrowka, R.; Patzak, A.; Herzel, H. Is there a bias in proteome research? Genome Res. 2001, 11, 1971–1973.
13. Melo, R.; Fieldhouse, R.; Melo, A.; Correia, J.D.; Cordeiro, M.N.D.; Gümüş, Z.H.; Costa, J.; Bonvin, A.M.; Moreira, I.S. A machine learning approach for hot-spot detection at protein-protein interfaces. Int. J. Mol. Sci. 2016, 17, 1215.
14. De Las Rivas, J.; Fontanillo, C. Protein–protein interactions essentials: Key concepts to building and analyzing interactome networks. PLoS Comput. Biol. 2010, 6, e1000807.
15. You, Z.H.; Zhou, M.; Luo, X.; Li, S. Highly efficient framework for predicting interactions between proteins. IEEE Trans. Cybern. 2016, 47, 731–743.
16. Chen, H.; Li, F.; Wang, L.; Jin, Y.; Chi, C.H.; Kurgan, L.; Song, J.; Shen, J. Systematic evaluation of machine learning methods for identifying human–pathogen protein–protein interactions. Brief. Bioinform. 2021, 22, bbaa068.
17. Zhou, C.; Yu, H.; Ding, Y.; Guo, F.; Gong, X.J. Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree. PLoS ONE 2017, 12, e0181426.
18. Lin, X.; Chen, X.W. Heterogeneous data integration by tree-augmented naïve Bayes for protein–protein interactions prediction. Proteomics 2013, 13, 261–268.
19. Li, J.Q.; You, Z.H.; Li, X.; Ming, Z.; Chen, X. PSPEL: In silico prediction of self-interacting proteins from amino acids sequences using ensemble learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 14, 1165–1172.
20. Shen, J.; Zhang, J.; Luo, X.; Zhu, W.; Yu, K.; Chen, K.; Li, Y.; Jiang, H. Predicting protein–protein interactions based only on sequences information. Proc. Natl. Acad. Sci. USA 2007, 104, 4337–4341.
21. Guo, Y.; Yu, L.; Wen, Z.; Li, M. Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences. Nucleic Acids Res. 2008, 36, 3025–3030.
22. Cao, C.; Liu, F.; Tan, H.; Song, D.; Shu, W.; Li, W.; Zhou, Y.; Bo, X.; Xie, Z. Deep learning and its applications in biomedicine. Genom. Proteom. Bioinform. 2018, 16, 17–32.
23. Sun, T.; Zhou, B.; Lai, L.; Pei, J. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinform. 2017, 18, 1–8.
24. Du, X.; Sun, S.; Hu, C.; Yao, Y.; Yan, Y.; Zhang, Y. DeepPPI: Boosting prediction of protein–protein interactions with deep neural networks. J. Chem. Inf. Model. 2017, 57, 1499–1510.
25. Zhang, L.; Yu, G.; Xia, D.; Wang, J. Protein–protein interactions prediction based on ensemble deep neural networks. Neurocomputing 2019, 324, 10–19.
26. Richoux, F.; Servantie, C.; Borès, C.; Téletchéa, S. Comparing two deep learning sequence-based models for protein-protein interaction prediction. arXiv 2019, arXiv:1901.06268.
27. Lim, H.; Cankara, F.; Tsai, C.J.; Keskin, O.; Nussinov, R.; Gursoy, A. Artificial intelligence approaches to human-microbiome protein–protein interactions. Curr. Opin. Struct. Biol. 2022, 73, 102328.
28. Lei, H.; Wen, Y.; You, Z.; Elazab, A.; Tan, E.L.; Zhao, Y.; Lei, B. Protein–protein interactions prediction via multimodal deep polynomial network and regularized extreme learning machine. IEEE J. Biomed. Health Inform. 2018, 23, 1290–1303.
29. Hashemifar, S.; Neyshabur, B.; Khan, A.A.; Xu, J. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 2018, 34, i802–i810.
30. Wu, J.; Wang, W.; Zhang, J.; Zhou, B.; Zhao, W.; Su, Z.; Gu, X.; Wu, J.; Zhou, Z.; Chen, S. DeepHLApan: A deep learning approach for neoantigen prediction considering both HLA-peptide binding and immunogenicity. Front. Immunol. 2019, 10, 2559.
31. Li, X.; Yan, X.; Gu, Q.; Zhou, H.; Wu, D.; Xu, J. Deepchemstable: Chemical stability prediction with an attention-based graph convolution network. J. Chem. Inf. Model. 2019, 59, 1044–1049.
32. Chen, J.; Zheng, S.; Zhao, H.; Yang, Y. Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map. J. Cheminform. 2021, 13, 1–10.
33. Zitnik, M.; Agrawal, M.; Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 2018, 34, i457–i466.
34. Zhang, Z.; Chen, L.; Zhong, F.; Wang, D.; Jiang, J.; Zhang, S.; Jiang, H.; Zheng, M.; Li, X. Graph neural network approaches for drug-target interactions. Curr. Opin. Struct. Biol. 2022, 73, 102327.
35. Huang, Y.A.; Hu, P.; Chan, K.C.; You, Z.H. Graph convolution for predicting associations between miRNA and drug resistance. Bioinformatics 2020, 36, 851–858.
36. Licamele, L.; Getoor, L. Predicting Protein-Protein Interactions Using Relational Features; Technical Report; University of Maryland: College Park, MD, USA, 2007.
37. Yang, F.; Fan, K.; Song, D.; Lin, H. Graph-based prediction of Protein-protein interactions with attributed signed graph embedding. BMC Bioinform. 2020, 21, 1–16.
38. Jha, K.; Saha, S.; Singh, H. Prediction of protein–protein interaction using graph neural networks. Sci. Rep. 2022, 12, 1–12.
39. Lv, G.; Hu, Z.; Bi, Y.; Zhang, S. Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction. arXiv 2021, arXiv:2105.06709.
40. Paradesi, M.S.; Caragea, D.; Hsu, W.H. Structural prediction of protein-protein interactions in Saccharomyces cerevisiae. In Proceedings of the 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering, Boston, MA, USA, 14–17 October 2007; pp. 1270–1274.
41. Song, B.; Luo, X.; Luo, X.; Liu, Y.; Niu, Z.; Zeng, X. Learning spatial structures of proteins improves protein–protein interaction prediction. Brief. Bioinform. 2022, 23, bbab558.
42. You, Z.H.; Lei, Y.K.; Gui, J.; Huang, D.S.; Zhou, X. Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data. Bioinformatics 2010, 26, 2744–2751.
43. Yegnanarayana, B. Artificial Neural Networks; PHI Learning Pvt. Ltd.: New Delhi, India, 2009.
44. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
45. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
46. Ganea, O.; Bécigneul, G.; Hofmann, T. Hyperbolic neural networks. Adv. Neural Inf. Process. Syst. 2018, 31.
47. Chami, I.; Ying, R.; Ré, C.; Leskovec, J. Hyperbolic Graph Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2019, 32, 4869–4880.
48. Szklarczyk, D.; Gable, A.L.; Lyon, D.; Junge, A.; Wyder, S.; Huerta-Cepas, J.; Simonovic, M.; Doncheva, N.T.; Morris, J.H.; Bork, P.; et al. STRING v11: Protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019, 47, D607–D613.
49. Mathivanan, S.; Periaswamy, B.; Gandhi, T.; Kandasamy, K.; Suresh, S.; Mohmood, R.; Ramachandra, Y.; Pandey, A. An evaluation of human protein-protein interaction data in the public domain. BMC Bioinform. 2006, 7, 1–14.
50. Ispolatov, I.; Yuryev, A.; Mazo, I.; Maslov, S. Binding properties and evolution of homodimers in protein–protein interaction networks. Nucleic Acids Res. 2005, 33, 3629–3635.
51. Kikuchi, A.; Kishida, S.; Yamamoto, H. Regulation of Wnt signaling by protein-protein interaction and post-translational modifications. Exp. Mol. Med. 2006, 38, 1–10.
52. Chavez, J.D.; Weisbrod, C.R.; Zheng, C.; Eng, J.K.; Bruce, J.E. Protein interactions, post-translational modifications and topologies in human cells. Mol. Cell. Proteom. 2013, 12, 1451–1467.
53. Dove, S.L.; Joung, J.K.; Hochschild, A. Activation of prokaryotic transcription through arbitrary protein–protein contacts. Nature 1997, 386, 627–630.
54. Yang-Yen, H.F.; Chambard, J.C.; Sun, Y.L.; Smeal, T.; Schmidt, T.J.; Drouin, J.; Karin, M. Transcriptional interference between c-Jun and the glucocorticoid receptor: Mutual inhibition of DNA binding due to direct protein-protein interaction. Cell 1990, 62, 1205–1215.
55. Klingenberg, M. Ligand–Protein Interaction in Biomembrane Carriers. The Induced Transition Fit of Transport Catalysis. Biochemistry 2005, 44, 8563–8570.
56. Armingol, E.; Officer, A.; Harismendy, O.; Lewis, N.E. Deciphering cell–cell interactions and communication from gene expression. Nat. Rev. Genet. 2021, 22, 71–88.
57. Chen, M.; Ju, C.J.T.; Zhou, G.; Chen, X.; Zhang, T.; Chang, K.W.; Zaniolo, C.; Wang, W. Multifaceted protein–protein interaction prediction based on Siamese residual RCNN. Bioinformatics 2019, 35, i305–i314.
58. Krioukov, D.; Papadopoulos, F.; Kitsak, M.; Vahdat, A.; Boguñá, M. Hyperbolic Geometry of Complex Networks. Phys. Rev. E 2010, 82, 036106.
59. Nickel, M.; Kiela, D. Poincaré Embeddings for Learning Hierarchical Representations. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
60. Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in PyTorch. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
61. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
62. Vyas, A.; Choudhary, N.; Khatir, M.; Reddy, C.K. GraphZoo: A Development Toolkit for Graph Neural Networks with Hyperbolic Geometries. In Proceedings of the Companion Proceedings of the Web Conference, Lyon, France, 25–29 April 2022.
Dataset | Interaction Type | # PPI | Positive Samples |
---|---|---|---|
SHS27k | reaction | 18,162 | 1.09% |
SHS27k | binding | 16,056 | 0.47% |
SHS27k | ptm | 2872 | 1.05% |
SHS27k | activation | 7400 | 0.40% |
SHS27k | inhibition | 5550 | 1.48% |
SHS27k | catalysis | 11,796 | 0.79% |
SHS27k | expression | 1572 | 0.40% |
SHS148k | reaction | 102,964 | 0.67% |
SHS148k | binding | 93,632 | 0.28% |
SHS148k | ptm | 20,154 | 0.38% |
SHS148k | activation | 42,516 | 0.23% |
SHS148k | inhibition | 34,712 | 0.76% |
SHS148k | catalysis | 67,168 | 0.50% |
SHS148k | expression | 7896 | 0.16% |
Model | # Parameters |
---|---|
NN | 1088 |
GCN | 1088 |
GAT | 1120 |
HNN | 1088 |
HGCN | 1088 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhou, H.; Wang, W.; Jin, J.; Zheng, Z.; Zhou, B. Graph Neural Network for Protein–Protein Interaction Prediction: A Comparative Study. Molecules 2022, 27, 6135. https://doi.org/10.3390/molecules27186135