Advances in Computational Methods for Protein–Protein Interaction Prediction
Abstract
1. Introduction
2. Databases and Dataset Preparation
3. Feature Extraction
3.1. Sequence-Based Features
3.1.1. Amino Acid Composition (AAC)
3.1.2. Pseudo Amino Acid Composition (PseAAC)
3.1.3. Dipeptide Composition (DPC)
3.1.4. Conjoint Triad (CT)
3.1.5. Composition, Transition, and Distribution (CTD)
- (1) Composition: This is the percentage of each class of amino acid in the protein sequence. In the example mentioned earlier, the numbers of “1”, “2”, and “3” are 3, 4, and 3, respectively, so in a sequence of 10 residues their frequencies are 3/10 = 30%, 4/10 = 40%, and 3/10 = 30%. Thus, the classification based on hydrophobicity leads to a three-dimensional vector, and if more properties of amino acids are combined, a higher-dimensional vector can be obtained.
- (2) Transition: This is the frequency of dipeptides (regardless of the order of the two residues) composed of two different classes of amino acids in the sequence. The definition is as follows: $T_{rs} = \frac{n_{rs} + n_{sr}}{N - 1}$, $rs \in \{12, 13, 23\}$, where $n_{rs}$ is the number of dipeptides encoded as “rs” and $N$ is the sequence length.
- (3) Distribution: This describes the distribution pattern of amino acids. For each class, the positions of five of its residues (the first one and those at 25%, 50%, 75%, and 100% of the class's occurrences) are expressed as percentages of the entire sequence length. In the above instance, there are four residues designated as “2”, so these five residues are the 1st, 1st, 2nd, 3rd, and 4th members of the group, respectively. Their positions in the entire sequence are 1, 1, 3, 4, and 8, so the distribution descriptors for “2” are 10%, 10%, 30%, 40%, and 80%. The descriptors for “1” and “3” are calculated similarly.
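The three CTD descriptors above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the encoded sequence `2122331213` is a hypothetical one chosen only to match the counts and positions quoted in the text (three “1”, four “2”, three “3”, with “2” at positions 1, 3, 4, and 8), and rounding the quartile ranks down with a minimum of 1 is an assumed convention.

```python
# Minimal CTD sketch for a sequence already encoded into classes "1", "2", "3".
# The example sequence is hypothetical; it only reproduces the counts in the text.

def ctd_composition(encoded, classes="123"):
    """Fraction of residues that belong to each class."""
    n = len(encoded)
    return {c: encoded.count(c) / n for c in classes}

def ctd_transition(encoded, classes="123"):
    """Fraction of adjacent pairs made of two different classes, order ignored."""
    n = len(encoded)
    pairs = list(zip(encoded, encoded[1:]))
    result = {}
    for i, r in enumerate(classes):
        for s in classes[i + 1:]:
            count = sum(1 for a, b in pairs if {a, b} == {r, s})
            result[r + s] = count / (n - 1)
    return result

def ctd_distribution(encoded, classes="123"):
    """Relative positions of the first, 25%, 50%, 75%, and last residue of each class."""
    n = len(encoded)
    result = {}
    for c in classes:
        positions = [i + 1 for i, x in enumerate(encoded) if x == c]  # 1-based
        if not positions:
            result[c] = [0.0] * 5
            continue
        m = len(positions)
        # Assumed convention: round the quartile rank down, but never below 1.
        ranks = [1] + [max(1, int(m * q)) for q in (0.25, 0.5, 0.75, 1.0)]
        result[c] = [positions[r - 1] / n for r in ranks]
    return result

encoded = "2122331213"  # hypothetical class-encoded sequence of length 10
composition = ctd_composition(encoded)
transition = ctd_transition(encoded)
distribution = ctd_distribution(encoded)
```

For class “2”, `distribution["2"]` evaluates to 0.1, 0.1, 0.3, 0.4, and 0.8, which agrees with the positions 1, 1, 3, 4, and 8 given in the text for a sequence of length 10.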
3.1.6. Autocorrelation Descriptor (AD)
- (1) Moreau-Broto AD is denoted as:
- (2) Moran AD is denoted as:
- (3) Geary AD is denoted as:
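For reference, the three autocorrelation descriptors are commonly written as below. The notation is an assumption and may differ from the authors': $P_i$ is the (usually normalized) physicochemical property value of residue $i$, $\bar{P}$ is its mean over the sequence, $N$ is the sequence length, and $d$ is the lag.

$$AC(d) = \frac{1}{N-d}\sum_{i=1}^{N-d} P_i P_{i+d}$$

$$I(d) = \frac{\frac{1}{N-d}\sum_{i=1}^{N-d}(P_i-\bar{P})(P_{i+d}-\bar{P})}{\frac{1}{N}\sum_{i=1}^{N}(P_i-\bar{P})^2}$$

$$C(d) = \frac{\frac{1}{2(N-d)}\sum_{i=1}^{N-d}(P_i-P_{i+d})^2}{\frac{1}{N-1}\sum_{i=1}^{N}(P_i-\bar{P})^2}$$

A minimal Python sketch of these forms follows. The property vector is a hypothetical toy input, not real amino acid index data.

```python
# Sketch of the normalized Moreau-Broto, Moran, and Geary autocorrelation
# descriptors for one property vector p and one lag d (assumes 0 < d < len(p)).

def moreau_broto(p, d):
    """Normalized Moreau-Broto autocorrelation at lag d."""
    n = len(p)
    return sum(p[i] * p[i + d] for i in range(n - d)) / (n - d)

def moran(p, d):
    """Moran autocorrelation at lag d (covariance at lag d over variance)."""
    n = len(p)
    mean = sum(p) / n
    num = sum((p[i] - mean) * (p[i + d] - mean) for i in range(n - d)) / (n - d)
    den = sum((x - mean) ** 2 for x in p) / n
    return num / den

def geary(p, d):
    """Geary autocorrelation at lag d (squared differences over variance)."""
    n = len(p)
    mean = sum(p) / n
    num = sum((p[i] - p[i + d]) ** 2 for i in range(n - d)) / (2 * (n - d))
    den = sum((x - mean) ** 2 for x in p) / (n - 1)
    return num / den

# Hypothetical property values for a four-residue sequence.
prop = [1.0, 2.0, 3.0, 4.0]
mb_1 = moreau_broto(prop, 1)
moran_1 = moran(prop, 1)
geary_1 = geary(prop, 1)
```

In practice the descriptors are evaluated for several lags and several properties, and the resulting values are concatenated into one feature vector.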
3.1.7. Position-Specific Scoring Matrix (PSSM)
3.2. Structure-Based Features
3.3. GO-Based Features
3.4. Network-Based Features
4. Computational Methods for PPI Prediction
4.1. Sequence-Based Methods
4.2. Structure-Based Methods
4.3. Network-Based Methods
4.4. GO-Based Methods
4.5. DL-Based Methods
5. Current Challenges and Future Prospects
6. Key Points
- This article reviews the databases commonly used in PPI prediction and the methods for constructing the datasets employed in PPI prediction tasks.
- The relevant feature extraction strategies and computational methods are classified and discussed.
- Deep learning algorithms show significant advantages in extracting diverse protein features and achieve strong predictive performance.
- Future research in PPI prediction should prioritize three aspects: more accurate and effective datasets, strategies for extracting and integrating multiple kinds of features, and more universally applicable prediction methods.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Berggard, T.; Linse, S.; James, P. Methods for the detection and analysis of protein-protein interactions. Proteomics 2007, 7, 2833–2842.
- De Las Rivas, J.; Fontanillo, C. Protein-Protein Interactions Essentials: Key Concepts to Building and Analyzing Interactome Networks. PLoS Comput. Biol. 2010, 6, e1000807.
- Zhou, H.X.; Shan, Y.B. Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins Struct. Funct. Genet. 2001, 44, 336–343.
- Braun, P.; Gingras, A.-C. History of protein-protein interactions: From egg-white to complex networks. Proteomics 2012, 12, 1478–1498.
- De Las Rivas, J.; Fontanillo, C. Protein-protein interaction networks: Unraveling the wiring of molecular machines within the cell. Brief. Funct. Genom. 2012, 11, 489–496.
- Wang, R.-S.; Wang, Y.; Wu, L.-Y.; Zhang, X.-S.; Chen, L. Analysis on multi-domain cooperation for predicting protein-protein interactions. BMC Bioinform. 2007, 8, 391.
- Yang, X.; Niu, Z.; Liu, Y.; Song, B.; Lu, W.; Zeng, L.; Zeng, X. Modality-DTA: Multimodality Fusion Strategy for Drug-Target Affinity Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20, 1200–1210.
- Bakail, M.; Ochsenbein, F. Targeting protein-protein interactions, a wide open field for drug design. Comptes Rendus Chim. 2016, 19, 19–27.
- Song, B.; Luo, X.; Luo, X.; Liu, Y.; Niu, Z.; Zeng, X. Learning spatial structures of proteins improves protein-protein interaction prediction. Brief. Bioinform. 2022, 23, bbab558.
- Petta, I.; Lievens, S.; Libert, C.; Tavernier, J.; De Bosscher, K. Modulation of Protein-Protein Interactions for the Development of Novel Therapeutics. Mol. Ther. 2016, 24, 707–718.
- Zhang, L.; Li, S.; Hao, C.X.; Hong, G.N.; Zou, J.F.; Zhang, Y.N.; Li, P.F.; Guo, Z. Extracting a few functionally reproducible biomarkers to build robust subnetwork-based classifiers for the diagnosis of cancer. Gene 2013, 526, 232–238.
- Tian, Y.; Su, X.; Su, Y.; Zhang, X. EMODMI: A Multi-Objective Optimization Based Method to Identify Disease Modules. IEEE Trans. Emerg. Top. Comput. Intell. 2021, 5, 570–582.
- Gavin, A.C.; Bosche, M.; Krause, R.; Grandi, P.; Marzioch, M.; Bauer, A.; Schultz, J.; Rick, J.M.; Michon, A.M.; Cruciat, C.M.; et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415, 141–147.
- Parrish, J.R.; Gulyas, K.D.; Finley, R.L., Jr. Yeast two-hybrid contributions to interactome mapping. Curr. Opin. Biotechnol. 2006, 17, 387–393.
- Ito, T.; Chiba, T.; Ozawa, R.; Yoshida, M.; Hattori, M.; Sakaki, Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl. Acad. Sci. USA 2001, 98, 4569–4574.
- Vinogradova, O.; Qin, J. NMR as a Unique Tool in Assessment and Complex Determination of Weak Protein-Protein Interactions. Top Curr. Chem. 2012, 326, 35–45.
- O’Connell, M.R.; Gamsjaeger, R.; Mackay, J.P. The structural analysis of protein-protein interactions by NMR spectroscopy. Proteomics 2009, 9, 5224–5232.
- Tong, A.H.Y.; Evangelista, M.; Parsons, A.B.; Xu, H.; Bader, G.D.; Page, N.; Robinson, M.; Raghibizadeh, S.; Hogue, C.W.V.; Bussey, H.; et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 2001, 294, 2364–2368.
- Ooi, S.L.; Pan, X.W.; Peyser, B.D.; Ye, P.; Meluh, P.B.; Yuan, D.S.; Irizarry, R.A.; Bader, J.S.; Spencer, F.A.; Boeke, J.D. Global synthetic-lethality analysis and yeast functional profiling. Trends Genet. 2006, 22, 56–63.
- Foltman, M.; Sanchez-Diaz, A. Studying Protein-Protein Interactions in Budding Yeast Using Co-immunoprecipitation. Methods Mol. Biol. 2016, 1369, 239–256.
- Zhu, H.; Bilgin, M.; Bangham, R.; Hall, D.; Casamayor, A.; Bertone, P.; Lan, N.; Jansen, R.; Bidlingmaier, S.; Houfek, T.; et al. Global analysis of protein activities using proteome chips. Science 2001, 293, 2101–2105.
- Piehler, J. New methodologies for measuring protein interactions in vivo and in vitro. Curr. Opin. Struct. Biol. 2005, 15, 4–14.
- Byron, O.; Vestergaard, B. Protein-protein interactions: A supra-structural phenomenon demanding trans-disciplinary biophysical approaches. Curr. Opin. Struct. Biol. 2015, 35, 76–86.
- Collins, S.R.; Kemmeren, P.; Zhao, X.-C.; Greenblatt, J.F.; Spencer, F.; Holstege, F.C.P.; Weissman, J.S.; Krogan, N.J. Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell. Proteom. 2007, 6, 439–450.
- Huang, H.; Bader, J.S. Precision and recall estimates for two-hybrid screens. Bioinformatics 2009, 25, 372–378.
- Ding, Z.; Kihara, D. Computational identification of protein-protein interactions in model plant proteomes. Sci. Rep. 2019, 9, 8740.
- Gingras, A.-C.; Gstaiger, M.; Raught, B.; Aebersold, R. Analysis of protein complexes using mass spectrometry. Nat. Rev. Mol. Cell Biol. 2007, 8, 645–654.
- Marmier, G.; Weigt, M.; Bitbol, A.-F. Phylogenetic correlations can suffice to infer protein partners from sequences. PLoS Comput. Biol. 2019, 15, e1007179.
- Liben-Nowell, D.; Kleinberg, J. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. 2007, 58, 1019–1031.
- Kovacs, I.A.; Luck, K.; Spirohn, K.; Wang, Y.; Pollis, C.; Schlabach, S.; Bian, W.; Kim, D.-K.; Kishore, N.; Hao, T.; et al. Network-based prediction of protein interactions. Nat. Commun. 2019, 10, 1240.
- Wass, M.N.; Fuentes, G.; Pons, C.; Pazos, F.; Valencia, A. Towards the prediction of protein interaction partners using physical docking. Mol. Syst. Biol. 2011, 7, 469.
- Dong, S.; Lau, V.; Song, R.; Ierullo, M.; Esteban, E.; Wu, Y.; Sivieng, T.; Nahal, H.; Gaudinier, A.; Pasha, A.; et al. Proteome-wide, Structure-Based Prediction of Protein-Protein Interactions/New Molecular Interactions Viewer. Plant Physiol. 2019, 179, 1893–1907.
- Pierce, B.G.; Wiehe, K.; Hwang, H.; Kim, B.-H.; Vreven, T.; Weng, Z. ZDOCK server: Interactive docking prediction of protein-protein complexes and symmetric multimers. Bioinformatics 2014, 30, 1771–1773.
- Ohue, M.; Matsuzaki, Y.; Uchikoga, N.; Ishida, T.; Akiyama, Y. MEGADOCK: An All-to-All Protein-Protein Interaction Prediction System Using Tertiary Structure Data. Protein Pept. Lett. 2014, 21, 766–778.
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Zidek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
- Chowdhury, R.; Bouatta, N.; Biswas, S.; Floristean, C.; Kharkare, A.; Roye, K.; Rochereau, C.; Ahdritz, G.; Zhang, J.; Church, G.M.; et al. Single-sequence protein structure prediction using a language model and deep learning. Nat. Biotechnol. 2022, 40, 1617–1623.
- Li, P.; Liu, Z.-P. PST-PRNA: Prediction of RNA-binding sites using protein surface topography and deep learning. Bioinformatics 2022, 38, 2162–2168.
- Zhang, S.; Zhou, J.; Hu, H.; Gong, H.; Chen, L.; Cheng, C.; Zeng, J. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 2016, 44, e32.
- Salwinski, L.; Miller, C.S.; Smith, A.J.; Pettit, F.K.; Bowie, J.U.; Eisenberg, D. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004, 32, D449–D451.
- Oughtred, R.; Stark, C.; Breitkreutz, B.-J.; Rust, J.; Boucher, L.; Chang, C.; Kolas, N.; O’Donnell, L.; Leung, G.; McAdam, R.; et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 2019, 47, D529–D541.
- Kerrien, S.; Aranda, B.; Breuza, L.; Bridge, A.; Broackes-Carter, F.; Chen, C.; Duesbury, M.; Dumousseau, M.; Feuermann, M.; Hinz, U.; et al. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 2012, 40, D841–D846.
- Szklarczyk, D.; Kirsch, R.; Koutrouli, M.; Nastou, K.; Mehryary, F.; Hachilif, R.; Gable, A.L.; Fang, T.; Doncheva, N.T.; Pyysalo, S.; et al. The STRING database in 2023: Protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023, 51, D638–D646.
- Prasad, T.S.K.; Goel, R.; Kandasamy, K.; Keerthikumar, S.; Kumar, S.; Mathivanan, S.; Telikicherla, D.; Raju, R.; Shafreen, B.; Venugopal, A.; et al. Human Protein Reference Database-2009 update. Nucleic Acids Res. 2009, 37, D767–D772.
- Licata, L.; Briganti, L.; Peluso, D.; Perfetto, L.; Iannuccelli, M.; Galeota, E.; Sacco, F.; Palma, A.; Nardozza, A.P.; Santonico, E.; et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 2012, 40, D857–D861.
- Alanis-Lobato, G.; Andrade-Navarro, M.A.; Schaefer, M.H. HIPPIE v2.0: Enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Res. 2017, 45, D408–D414.
- Alfarano, C.; Andrade, C.E.; Anthony, K.; Bahroos, N.; Bajec, M.; Bantoft, K.; Betel, D.; Bobechko, B.; Boutilier, K.; Burgess, E.; et al. The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res. 2005, 33, D418–D424.
- Blohm, P.; Frishman, G.; Smialowski, P.; Goebels, F.; Wachinger, B.; Ruepp, A.; Frishman, D. Negatome 2.0: A database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis. Nucleic Acids Res. 2014, 42, D396–D400.
- Bateman, A.; Martin, M.-J.; Orchard, S.; Magrane, M.; Ahmad, S.; Alpi, E.; Bowler-Barnett, E.H.; Britto, R.; Cukura, A.; Denny, P.; et al. UniProt: The Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023, 51, D523–D531.
- Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M.C.; Estreicher, A.; Gasteiger, E.; Martin, M.J.; Michoud, K.; O’Donovan, C.; Phan, I.; et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003, 31, 365–370.
- Barker, W.C.; Garavelli, J.S.; McGarvey, P.B.; Marzec, C.R.; Orcutt, B.C.; Srinivasarao, G.Y.; Yeh, L.S.L.; Ledley, R.S.; Mewes, H.W.; Pfeiffer, F.; et al. The PIR-International Protein Sequence Database. Nucleic Acids Res. 1999, 27, 39–43.
- Andreeva, A.; Kulesha, E.; Gough, J.; Murzin, A.G. The SCOP database in 2020: Expanded classification of representative family and superfamily domains of known protein structures. Nucleic Acids Res. 2020, 48, D376–D382.
- Bittrich, S.; Rose, Y.; Segura, J.; Lowe, R.; Westbrook, J.D.; Duarte, J.M.; Burley, S.K. RCSB Protein Data Bank: Improved annotation, search and visualization of membrane protein structures archived in the PDB. Bioinformatics 2022, 38, 1452–1454.
- Carbon, S.; Dietze, H.; Lewis, S.E.; Mungall, C.J.; Munoz-Torres, M.C.; Basu, S.; Chisholm, R.L.; Dodson, R.J.; Fey, P.; Thomas, P.D.; et al. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res. 2017, 45, D331–D338.
- Galperin, M.Y.; Wolf, Y.I.; Makarova, K.S.; Alvarez, R.V.; Landsman, D.; Koonin, E.V. COG database update: Focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res. 2021, 49, D274–D281.
- Kanehisa, M.; Furumichi, M.; Sato, Y.; Kawashima, M.; Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 2023, 51, D587–D592.
- Skrzypek, M.S.; Binkley, J.; Binkley, G.; Miyasato, S.R.; Simison, M.; Sherlock, G. The Candida Genome Database (CGD): Incorporation of Assembly 22, systematic identifiers and visualization of high throughput sequencing data. Nucleic Acids Res. 2017, 45, D592–D596.
- Shen, J.; Zhang, J.; Luo, X.; Zhu, W.; Yu, K.; Chen, K.; Li, Y.; Jiang, H. Predicting protein-protein interactions based only on sequences information. Proc. Natl. Acad. Sci. USA 2007, 104, 4337–4341.
- Hamp, T.; Rost, B. Evolutionary profiles improve protein-protein interaction prediction from sequence. Bioinformatics 2015, 31, 1945–1950.
- Pan, X.-Y.; Zhang, Y.-N.; Shen, H.-B. Large-Scale Prediction of Human Protein-Protein Interactions from Amino Acid Sequence Based on Latent Topic Features. J. Proteome Res. 2010, 9, 4992–5001.
- Mahapatra, S.; Gupta, V.R.; Sahu, S.S.; Panda, G. Deep Neural Network and Extreme Gradient Boosting Based Hybrid Classifier for Improved Prediction of Protein-Protein Interaction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 19, 155–165.
- Chou, K.C. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins Struct. Funct. Genet. 2001, 43, 246–255.
- Chou, K.C. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 2005, 21, 10–19.
- Saravanan, V.; Gautham, N. Harnessing Computational Biology for Exact Linear B-Cell Epitope Prediction: A Novel Amino Acid Composition-Based Feature Descriptor. Omics A J. Integr. Biol. 2015, 19, 648–658.
- Chou, K.C. Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. Biochem. Biophys. Res. Commun. 2000, 278, 477–483.
- Chen, Z.; Zhao, P.; Li, F.; Leier, A.; Marquez-Lago, T.T.; Wang, Y.; Webb, G.I.; Smith, A.I.; Daly, R.J.; Chou, K.-C.; et al. iFeature: A Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics 2018, 34, 2499–2502.
- Dubchak, I.; Muchnik, I.; Holbrook, S.R.; Kim, S.H. Prediction of protein-folding class using global description of amino-acid-sequence. Proc. Natl. Acad. Sci. USA 1995, 92, 8700–8704.
- Gribskov, M.; McLachlan, A.D.; Eisenberg, D. Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 1987, 84, 4355–4358.
- Ding, Y.; Tang, J.; Guo, F. Predicting protein-protein interactions via multivariate mutual information of protein sequences. BMC Bioinform. 2016, 17, 398.
- Tran, H.-N.; Xuan, Q.N.P.; Nguyen, T.-T. DeepCF-PPI: Improved prediction of protein-protein interactions by combining learned and handcrafted features based on attention mechanisms. Appl. Intell. 2023, 53, 17887–17902.
- Varadi, M.; Anyango, S.; Deshpande, M.; Nair, S.; Natassia, C.; Yordanova, G.; Yuan, D.; Stroe, O.; Wood, G.; Laydon, A.; et al. AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022, 50, D439–D444.
- Baranwal, M.; Magner, A.; Saldinger, J.; Turali-Emre, E.S.; Elvati, P.; Kozarekar, S.; VanEpps, J.S.; Kotov, N.A.; Violi, A.; Hero, A.O. Struct2Graph: A graph attention network for structure based predictions of protein-protein interactions. BMC Bioinform. 2022, 23, 370.
- De Domenico, M.; Sole-Ribalta, A.; Cozzo, E.; Kivelae, M.; Moreno, Y.; Porter, M.A.; Gomez, S.; Arenas, A. Mathematical Formulation of Multilayer Networks. Phys. Rev. X 2013, 3, 041022.
- Zhang, C.; Shine, M.; Pyle, A.M.; Zhang, Y. US-align: Universal structure alignments of proteins, nucleic acids, and macromolecular complexes. Nat. Methods 2022, 19, 1109–1115.
- Mirabello, C.; Wallner, B. InterPred: A pipeline to identify and model protein-protein interactions. Proteins Struct. Funct. Bioinform. 2017, 85, 1159–1170.
- Harris, M.A.; Clark, J.I.; Ireland, A.; Lomax, J.; Ashburner, M.; Collins, R.; Eilbeck, K.; Lewis, S.; Mungall, C.; Richter, J.; et al. The Gene Ontology (GO) project in 2006. Nucleic Acids Res. 2006, 34, D322–D326.
- Wu, X.; Zhu, L.; Guo, J.; Zhang, D.-Y.; Lin, K. Prediction of yeast protein-protein interaction network: Insights from the Gene Ontology and annotations. Nucleic Acids Res. 2006, 34, 2137–2150.
- Bandyopadhyay, S.; Mallick, K. A New Feature Vector Based on Gene Ontology Terms for Protein-Protein Interaction Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 14, 762–770.
- Zhang, J.; Jia, K.; Jia, J.; Qian, Y. An improved approach to infer protein-protein interaction based on a hierarchical vector space model. BMC Bioinform. 2018, 19, 161.
- Wu, H.W.; Su, Z.C.; Mao, F.L.; Olman, V.; Xu, Y. Prediction of functional modules based on comparative genome analysis and Gene Ontology application. Nucleic Acids Res. 2005, 33, 2822–2837.
- Jha, K.; Saha, S.; Dutta, P. Incorporation of gene ontology in identification of protein interactions from biomedical corpus: A multi-modal approach. Ann. Oper. Res. 2022, 39, 1–19.
- Ieremie, I.; Ewing, R.M.; Niranjan, M. TransformerGO: Predicting protein-protein interactions by modelling the attention between sets of gene ontology terms. Bioinformatics 2022, 38, 2269–2277.
- Zhou, T.; Lu, L.; Zhang, Y.-C. Predicting missing links via local information. Eur. Phys. J. B 2009, 71, 623–630.
- Samanthula, B.K.; Jiang, W. Secure Multiset Intersection Cardinality and its Application to Jaccard Coefficient. IEEE Trans. Dependable Secur. Comput. 2016, 13, 591–604.
- Adamic, L.A.; Adar, E. Friends and neighbors on the Web. Soc. Netw. 2003, 25, 211–230.
- Chen, C.; Zhang, Q.; Ma, Q.; Yu, B. LightGBM-PPI: Predicting protein-protein interactions through LightGBM with multi-information fusion. Chemom. Intell. Lab. Syst. 2019, 191, 54–64.
- Yu, B.; Chen, C.; Wang, X.; Yu, Z.; Ma, A.; Liu, B. Prediction of protein-protein interactions based on elastic net and deep forest. Expert Syst. Appl. 2021, 176, 114876.
- Wang, L.; You, Z.-H.; Xia, S.-X.; Chen, X.; Yan, X.; Zhou, Y.; Liu, F. An improved efficient rotation forest algorithm to predict the interactions among proteins. Soft Comput. 2018, 22, 3373–3381.
- Goktepe, Y.E.; Kodaz, H. Prediction of Protein-Protein Interactions Using An Effective Sequence Based Combined Method. Neurocomputing 2018, 303, 68–74.
- Hu, L.; Yang, S.; Luo, X.; Yuan, H.; Sedraoui, K.; Zhou, M. A Distributed Framework for Large-scale Protein-protein Interaction Data Analysis and Prediction Using MapReduce. IEEE-CAA J. Autom. Sin. 2022, 9, 160–172.
- Wei, Z.-S.; Han, K.; Yang, J.-Y.; Shen, H.-B.; Yu, D.-J. Protein-protein interaction sites prediction by ensembling SVM and sample-weighted random forests. Neurocomputing 2016, 193, 201–212.
- Zhang, Q.C.; Petrey, D.; Deng, L.; Qiang, L.; Shi, Y.; Thu, C.A.; Bisikirska, B.; Lefebvre, C.; Accili, D.; Hunter, T.; et al. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 2012, 490, 556–560.
- Bryant, P.; Pozzati, G.; Elofsson, A. Improved prediction of protein-protein interactions using AlphaFold2. Nat. Commun. 2022, 13, 1265.
- Comeau, S.R.; Gatchell, D.W.; Vajda, S.; Camacho, C.J. ClusPro: An automated docking and discrimination method for the prediction of protein complexes. Bioinformatics 2004, 20, 45–50.
- De Vries, S.J.; van Dijk, M.; Bonvin, A.M.J.J. The HADDOCK web server for data-driven biomolecular docking. Nat. Protoc. 2010, 5, 883–897.
- Xue, L.C.; Rodrigues, J.; Kastritis, P.L.; Bonvin, A.; Vangone, A. PRODIGY: A web server for predicting the binding affinity of protein-protein complexes. Bioinformatics 2016, 32, 3676–3678.
- Schneidman-Duhovny, D.; Inbar, Y.; Nussinov, R.; Wolfson, H.J. PatchDock and SymmDock: Servers for rigid and symmetric docking. Nucleic Acids Res. 2005, 33, W363–W367.
- Li, S.; Huang, J.; Zhang, Z.; Liu, J.; Huang, T.; Chen, H. Similarity-based future common neighbors model for link prediction in complex networks. Sci. Rep. 2018, 8, 17014.
- Chen, Y.; Wang, W.; Liu, J.; Feng, J.; Gong, X. Protein Interface Complementarity and Gene Duplication Improve Link Prediction of Protein-Protein Interaction Network. Front. Genet. 2020, 11, 291.
- Lei, C.; Ruan, J. A novel link prediction algorithm for reconstructing protein-protein interaction networks by topological similarity. Bioinformatics 2013, 29, 355–364.
- Yuen, H.Y.; Jansson, J. Normalized L3-based link prediction in protein-protein interaction networks. BMC Bioinform. 2023, 24, 59.
- Chen, K.-H.; Wang, T.-F.; Hu, Y.-J. Protein-protein interaction prediction using a hybrid feature representation and a stacked generalization scheme. BMC Bioinform. 2019, 20, 308.
- Du, X.; Sun, S.; Hu, C.; Yao, Y.; Yan, Y.; Zhang, Y. DeepPPI: Boosting Prediction of Protein-Protein Interactions with Deep Neural Networks. J. Chem. Inf. Model. 2017, 57, 1499–1510.
- Huang, Y.; Wuchty, S.; Zhou, Y.; Zhang, Z. SGPPI: Structure-aware prediction of protein-protein interactions in rigorous conditions with graph convolutional network. Brief. Bioinform. 2023, 24, bbad020.
- Sun, T.; Zhou, B.; Lai, L.; Pei, J. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinform. 2017, 18, 277.
- Sledzieski, S.; Singh, R.; Cowen, L.; Berger, B. D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions. Cell Syst. 2021, 12, 969–982.e6.
- Hashemifar, S.; Neyshabur, B.; Khan, A.A.; Xu, J. Predicting protein-protein interactions through sequence-based deep learning. Bioinformatics 2018, 34, 802–810.
- Hu, L.; Chan, K.C.C. Extracting Coevolutionary Features from Protein Sequences for Predicting Protein-Protein Interactions. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 14, 155–166.
- Sharma, A.; Singh, B. AE-LGBM: Sequence-based novel approach to detect interacting protein pairs via ensemble of autoencoder and LightGBM. Comput. Biol. Med. 2020, 125, 103964.
- Yu, B.; Chen, C.; Zhou, H.; Liu, B.; Ma, Q. GTB-PPI: Predict Protein-protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting. Genom. Proteom. Bioinform. 2020, 18, 582–592.
- Przytycka, T.M.; Singh, M.; Slonim, D.K. Toward the dynamic interactome: It’s about time. Brief. Bioinform. 2010, 11, 15–29.
- Jenghara, M.M.; Ebrahimpour-Komleh, H.; Parvin, H. Dynamic protein-protein interaction networks construction using firefly algorithm. Pattern Anal. Appl. 2018, 21, 1067–1081.
- Zhang, Y.; Lin, H.; Yang, Z.; Wang, J. Construction of dynamic probabilistic protein interaction networks for protein complex identification. BMC Bioinform. 2016, 17, 186.
- Ou-Yang, L.; Dai, D.-Q.; Li, X.-L.; Wu, M.; Zhang, X.-F.; Yang, P. Detecting temporal protein complexes from dynamic protein-protein interaction networks. BMC Bioinform. 2014, 15, 335.
- Tan, C.S.H.; Go, K.D.; Bisteau, X.; Dai, L.; Yong, C.H.; Prabhu, N.; Ozturk, M.B.; Lim, Y.T.; Sreekumar, L.; Lengqvist, J.; et al. Thermal proximity coaggregation for system-wide profiling of protein complex dynamics in cells. Science 2018, 359, 1170–1176.
Category | Database | Description | URL |
---|---|---|---|
PPI networks | DIP [39] | A database documenting experimentally determined PPIs, with the majority of data being sourced from yeast, Helicobacter pylori, and humans. | https://dip.doe-mbi.ucla.edu/dip/Main.cgi (accessed on 24 June 2023) |
PPI networks | BioGRID [40] | A repository containing data on post-translational modifications (PTMs), chemical interactions, and protein and gene interactions. | http://www.thebiogrid.org/ (accessed on 24 June 2023) |
PPI networks | IntAct [41] | A user-driven open-source database and set of analytical tools for molecular interaction data gathered from both direct user submissions and literature curation. | http://www.ebi.ac.uk/intact (accessed on 24 June 2023) |
PPI networks | STRING [42] | Covering 67,592,464 proteins from 14,094 organisms and 20,052,394,041 total interactions. | https://string-db.org/ (accessed on 24 June 2023) |
PPI networks | HPRD [43] | Composed of 41,327 PPIs, 93,710 PTMs, 22,490 subcellular localizations and 112,158 protein expressions. | http://www.hprd.org/ (accessed on 24 June 2023) |
PPI networks | MINT [44] | Housing experimentally confirmed PPIs and various other forms of functional interactions. | https://mint.bio.uniroma2.it (accessed on 24 June 2023) |
PPI networks | HIPPIE [45] | Providing confidence-scored and functionally annotated human PPIs. | http://cbdm.uni-mainz.de/hippie/ (accessed on 24 June 2023) |
PPI networks | BIND [46] | Containing over 200,000 molecular interactions, and over 3750 biological complexes involving over 1000 species. | http://download.baderlab.org/BINDTranslation/ (accessed on 24 June 2023) |
Protein sequences | UniProt [48] | Consisting of UniProtKB, Proteomes, UniRef, and UniParc. UniProtKB covers 569,793 reviewed (Swiss-Prot) and 248,272,897 unreviewed (TrEMBL) protein sequences. | http://www.uniprot.org/ (accessed on 25 June 2023) |
Protein sequences | SWISS-PROT [49] | Containing protein sequences along with comprehensive annotations. It has been integrated into UniProt. | http://www.expasy.org/sprot/ (accessed on 25 June 2023) |
Protein sequences | PIR [50] | A comprehensive collection of protein data encompassing protein sequences and annotations. | http://pir.georgetown.edu/ (accessed on 25 June 2023) |
Protein structures | SCOP [51] | Providing a detailed account of the evolutionary and structural connections within the entire set of proteins with established structures and covering 72,544 non-redundant domains. | http://scop.mrc-lmb.cam.ac.uk/scop (accessed on 25 June 2023) |
Protein structures | RCSB PDB [52] | Including 207,791 structures, 63,708 structures of human sequences, and 16,385 nucleic acid-containing structures. | https://www.rcsb.org/ (accessed on 25 June 2023) |
Other databases | GO [53] | The primary repository of knowledge regarding gene functions, which contains 42,950 GO terms, 7,453,079 annotations and 1,504,969 gene products. | http://geneontology.org/ (accessed on 25 June 2023) |
Other databases | CGD [56] | A repository housing protein and gene data for Candida albicans and its related species. | http://www.candidagenome.org/ (accessed on 25 June 2023) |
Other databases | Negatome [47] | A resource composed of protein and domain pairs that are unlikely to interact directly. | http://mips.helmholtz-muenchen.de/proj/ppi/negatome/ (accessed on 25 June 2023) |
Other databases | KEGG [55] | A repository that integrates 18 databases arranged into categories such as systems, chemicals, genomic, and health. | https://www.kegg.jp/kegg/ (accessed on 25 June 2023) |
Other databases | COG [54] | Covering proteins and genes across a broad spectrum of biological fields and providing tools for their functional annotation and evolutionary analysis. | https://www.ncbi.nlm.nih.gov/research/cog/ (accessed on 25 June 2023) |
Type | Encoding Method | Description | Vector Dimension | Reference |
---|---|---|---|---|
Handcrafted | Amino acid composition | The proportion of each type of amino acid in a protein. | 20 | [60] |
Handcrafted | Pseudo amino acid composition | Extracting the physicochemical and composition information. | 20 + | [61] |
Handcrafted | Amphiphilic pseudo amino acid composition | The first 20 numbers represent the classical amino acid composition, followed by discrete numbers representing amphiphilic sequence correlations along the protein chain. | 20 + | [62] |
Handcrafted | Dipeptide composition | The ratio of all possible pairs of adjacent amino acids in a protein. | 400 | [63] |
Handcrafted | Conjoint triad | The frequency with which each triad of three adjacent amino acids, encoded as the three-digit number formed by the residues' mapped classes, occurs in a protein sequence. | 343 | [57] |
Handcrafted | Quasi-Sequence-Order descriptors | Representing the distribution of amino acids along the protein sequence based on the sequence-order effect and physicochemical properties. | 20 + | [64] |
Handcrafted | Autocorrelation descriptors | Encoding proteins from their physicochemical properties, including the Moreau-Broto descriptor, Moran descriptor, and Geary descriptor. | | [65] |
Handcrafted | Composition | The proportion of every category of amino acids after division by attributes. | 39 | [66] |
Handcrafted | Transition | The ratio of dipeptides consisting of different classes of amino acids. | 39 | [66] |
Handcrafted | Distribution | Describing the distribution of amino acids in each attribute group throughout the sequence. | 195 | [66] |
Handcrafted | Position-specific scoring matrix | Reflecting the probability of each amino acid occurring at a particular position. | — | [67] |
Handcrafted | Multivariate mutual information | Combined information regarding amino acids obtained by retrieving group-specific characteristics and information entropy. | 119 | [68] |
Learned | Word2vec | Converting protein sequences into numerical feature representations that capture the semantic relationships between amino acids, after training on a corpus generated from protein sequences. | — | [69] |
Learned | TextCNN | Consisting of stacked CNNs and max-pooling layers and capturing local features of protein sequences. | 128 | [9] |
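
To make the vector dimensions listed above concrete, the following is a minimal Python sketch of three of the handcrafted encodings (amino acid composition, dipeptide composition, and conjoint triad). It is an illustration under stated assumptions rather than the implementation used in the cited works: the function names are hypothetical, the seven-class residue grouping for the conjoint triad is the grouping commonly used for this descriptor and is assumed here, and the min-max normalization of triad counts is one common choice among several.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: the seven-class grouping commonly used for the conjoint triad.
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CT_MAP = {aa: str(i) for i, group in enumerate(CT_CLASSES, start=1) for aa in group}


def aac(seq):
    """Amino acid composition: 20 relative frequencies."""
    n = len(seq) or 1  # avoid division by zero on an empty sequence
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: 400 relative frequencies of adjacent pairs."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        dipeptide = seq[i:i + 2]
        if dipeptide in counts:  # pairs with non-standard residues are skipped
            counts[dipeptide] += 1
    total = max(len(seq) - 1, 1)
    return [counts[p] / total for p in pairs]


def conjoint_triad(seq):
    """Conjoint triad: 7 * 7 * 7 = 343 triad counts, min-max normalized."""
    # Map each residue to its class digit; non-standard residues are dropped.
    classes = "".join(CT_MAP[aa] for aa in seq if aa in CT_MAP)
    triads = ["".join(t) for t in product("1234567", repeat=3)]
    counts = dict.fromkeys(triads, 0)
    for i in range(len(classes) - 2):
        counts[classes[i:i + 3]] += 1
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # all-equal counts would otherwise divide by zero
    return [(counts[t] - lo) / span for t in triads]
```

For a protein pair, the vectors of the two proteins are typically concatenated (or otherwise combined) before being passed to a classifier; that step is omitted here.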
Category | Name | Details | URL |
---|---|---|---|
Sequence-based | LightGBM-PPI [85] | This method extracts composition and physicochemical information from protein sequences, uses elastic net for feature selection to obtain the best subset, and ultimately employs LightGBM to complete the classification for PPI prediction. | https://github.com/QUST-AIBBDRC/LightGBM-PPI/ (accessed on 24 July 2023) |
Sequence-based | GcForest-PPI [86] | GcForest-PPI constructs deep forest models through a cascade architecture, where all levels of the cascade (except the last one) consist of 4 XGBoosts, 4 RFs, and 4 Extra-Trees. | https://github.com/QUST-AIBBDRC/GcForest-PPI (accessed on 24 July 2023) |
Sequence-based | FWRF [87] | FWRF is an improved rotation forest algorithm, which calculates feature weights statistically and removes features with low weight values based on the selection rate. | http://202.119.201.126:8888/FWRF/ (accessed on 24 July 2023) |
Sequence-based | Profppikernel [58] | Profppikernel uses evolutionary profiles as features and profile-kernel SVMs as a classifier to compute hyperplanes that optimally separate the two classes of data points. | https://rostlab.org/owiki/index.php/Profppikernel (accessed on 24 July 2023) |
Sequence-based | Goktepe et al. [88] | The proposed method uses the Kaiser criterion to select components in principal component analysis (PCA), lowering the dimensionality of the feature vectors, and an SVM to complete the classification task. | N/A |
Sequence-based | Hu et al. [89] | Hu et al. propose CoFex+, a distributed framework integrated with MapReduce, to achieve large-scale PPI prediction. | N/A |
Sequence-based | SSWRF [90] | SSWRF is an integrated approach that combines sample weighted random forest with SVM. | http://csbio.njust.edu.cn/bioinf/SSWRF (accessed on 25 July 2023) |
Structure-based | PrePPI [91] | PrePPI combines structural information with additional functional features into a Naïve Bayesian network to predict PPIs. | http://bhapp.c2b2.columbia.edu/PrePPI/ (accessed on 25 July 2023) |
Structure-based | InterPred [74] | InterPred is a fully automated computational pipeline that integrates structural modeling, large-scale structural alignment, and molecular docking techniques to predict and model PPIs. | http://wallnerlab.org/InterPred/ (accessed on 25 July 2023) |
Structure-based | Dong et al. [32] | Dong et al. use structural modeling and a docking algorithm to predict Arabidopsis PPIs. | http://bar.utoronto.ca/interactions2/ (accessed on 25 July 2023) |
Structure-based | Bryant et al. [92] | Bryant et al. employ an AlphaFold2 protocol with optimized multiple sequence alignments for modeling, and design the pDockQ score to distinguish acceptable models from incorrect ones. | https://gitlab.com/ElofssonLab/FoldDock (accessed on 25 July 2023) |
Structure-based | ClusPro [93] | ClusPro is an automated rigid-body docking and recognition algorithm that quickly filters docked conformations and ranks them according to their clustering properties. | http://structure.bu.edu (accessed on 4 March 2024) |
Structure-based | HADDOCK [94] | HADDOCK is a docking technique guided by (experimental) knowledge regarding the molecular interface and relative orientations. | http://haddock.chem.uu.nl/Haddock (accessed on 4 March 2024) |
Structure-based | PRODIGY [95] | PRODIGY can predict binding affinities based on a 3D structure. | http://milou.science.uu.nl/services/PRODIGY (accessed on 4 March 2024) |
Structure-based | PATCHDOCK [96] | PATCHDOCK is a molecular docking algorithm based on the principle of shape complementarity. | http://bioinfo3d.cs.tau.ac.il (accessed on 4 March 2024) |
Network-based | L3 [30] | The L3 principle holds that protein pairs are likely to interact if they are connected through numerous paths of length three in the network. | N/A |
Network-based | SFCN [97] | The SFCN model can accurately identify all future common neighbors and measure their contributions using only existing similarity indicators. | N/A |
Network-based | Sim [98] | Sim is a link prediction method that relies on gene duplication and the complementarity of protein–protein interfaces. | https://github.com/wingroy001/L3Sim (accessed on 26 July 2023) |
Network-based | Lei et al. [99] | The proposed approach introduces a new random walk algorithm with resistance, which predicts PPIs by measuring the higher-order topological similarity of two proteins. | www.cs.utsa.edu/jruan/RWS/ (accessed on 26 July 2023) |
Network-based | L3N [100] | L3N addresses certain missing elements in the L3 predictor from a network modeling perspective. | https://github.com/andy897221/BMC_PPI_L3N (accessed on 26 July 2023) |
GO-based | PPI-MetaGO [101] | PPI-MetaGO is an ensemble meta-learning approach that leverages GO semantic similarity and other feature representations and multiple ML algorithms to predict PPIs. | https://github.com/mlbioinfolab/ppi-metago (accessed on 26 July 2023) |
GO-based | Bandyopadhyay et al. [77] | Protein pairs are represented by weighted eigenvectors based on GO terms, and SVM is used to predict new PPIs. | N/A |
Deep learning-based | DeepPPI [102] | DeepPPI uses two separate neural networks receiving raw input from each of the two proteins to predict the PPIs. | http://ailab.ahu.edu.cn:8087/DeepPPI/index.html (accessed on 26 July 2023) |
Deep learning-based | Struct2Graph [71] | Struct2Graph, based on GCNs, is a mutual attention classifier specialized in predicting PPIs from 3D structural data. | https://github.com/baranwa2/Struct2Graph (accessed on 26 July 2023) |
Deep learning-based | TAGPPI [9] | TAGPPI extracts multidimensional features from protein sequences and contact maps created by AlphaFold. | https://github.com/xzenglab/TAGPPI (accessed on 26 July 2023) |
Deep learning-based | SGPPI [103] | SGPPI employs graph convolutional neural networks as part of its structure-based DL framework. | https://github.com/emerson106/SGPPI (accessed on 26 July 2023) |
Deep learning-based | Sun et al. [104] | Sun et al. apply stacked autoencoders to predict human PPIs. | http://repharma.pku.edu.cn/ppi (accessed on 27 July 2023) |
Deep learning-based | D-SCRIPT [105] | D-SCRIPT is a DL model that utilizes the binding compatibility of protein structures to predict PPIs. | https://cb.csail.mit.edu/cb/dscript/ (accessed on 27 July 2023) |
Deep learning-based | DPPI [106] | DPPI employs a convolutional neural network architecture which combines stochastic projection and data augmentation to accomplish the prediction task. | https://github.com/hashemifar/DPPI/ (accessed on 27 July 2023) |
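
As an illustration of the network-based entries above, the L3 idea of scoring a candidate pair by the paths that connect it can be sketched in a few lines of Python. The sketch assumes the usual degree-normalized form of the score, in which each path X–U–V–Y of length three contributes 1/sqrt(k_U · k_V), where k denotes node degree; the function name `l3_scores` and the edge-list input format are hypothetical and are not taken from any of the listed tools.

```python
import math
from itertools import combinations


def l3_scores(edges):
    """Score non-adjacent node pairs by degree-normalized length-3 paths."""
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    scores = {}
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already a known interaction
        score = 0.0
        # Enumerate paths x - u - v - y: u neighbors x, v neighbors y, u - v is an edge.
        for u in adj[x]:
            for v in adj[y]:
                if v in adj[u]:
                    score += 1.0 / math.sqrt(len(adj[u]) * len(adj[v]))
        if score > 0:
            scores[(x, y)] = score
    return scores
```

Candidate pairs are then ranked by score, with higher values suggesting a more likely interaction. The exhaustive pairwise loop is only suitable for small networks; a sparse-matrix formulation would be preferable at interactome scale.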
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xian, L.; Wang, Y. Advances in Computational Methods for Protein–Protein Interaction Prediction. Electronics 2024, 13, 1059. https://doi.org/10.3390/electronics13061059