An Ensemble Classifiers for Improved Prediction of Native–Non-Native Protein–Protein Interaction
Abstract
1. Introduction
1.1. Stacking Ensemble Classifier
1.2. Model Performance Evaluation
2. Results and Discussion
2.1. Baseline Model Performance
2.2. Ensemble Classifier Performances
2.3. Comparative Performance with Existing Methods
3. Materials and Methods
3.1. Dataset and Feature Representation
3.2. Baseline Classifier
- Generate a bootstrap sample for each tree, drawn with replacement from the training set.
- Grow each tree on its own bootstrap sample.
- At each split of a tree, choose the best predictor from a random subset of the predictors, guided by a predefined criterion such as entropy or the Gini index.
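The three random forest steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_forest`, `grow_tree`, `best_split`), the depth limit, the default of 25 trees, and the choice of the Gini index over entropy are all hypothetical choices made for this sketch.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, n_sub, rng):
    """Search a random subset of n_sub predictors for the lowest-impurity split."""
    best = None  # (weighted Gini, feature index, threshold)
    for j in rng.sample(range(len(X[0])), n_sub):
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow_tree(X, y, n_sub, rng, depth=0, max_depth=3):
    """Grow one tree; internal nodes are tuples, leaves are class labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y, n_sub, rng)
    if split is None:  # no threshold separates the rows
        return majority
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (
        j,
        t,
        grow_tree([X[i] for i in left], [y[i] for i in left], n_sub, rng, depth + 1, max_depth),
        grow_tree([X[i] for i in right], [y[i] for i in right], n_sub, rng, depth + 1, max_depth),
    )


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def fit_forest(X, y, n_trees=25, n_sub=1, seed=0):
    """Fit n_trees trees, each on its own bootstrap sample of (X, y)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]
```

Calling `fit_forest(X, y)` on a list of numeric feature rows and integer class labels returns a list of trees, and `predict_forest(forest, row)` returns the majority-vote label. Class labels are assumed not to be tuples, because tuples mark internal nodes in this sketch.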
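- Consider a dataset with input variables $x$ and corresponding labels $y$, so that the dataset can be written as $\{(x_i, y_i)\}_{i=1}^{N}$.
- The purpose is to reconstruct the unknown functional dependence of $y$ on $x$, $x \xrightarrow{f} y$, with an estimate $\hat{f}(x)$ that minimizes a specified loss function $L(y, f(x))$: $\hat{f}(x) = \arg\min_{f(x)} L(y, f(x))$.
- The estimation problem can be reformulated as minimizing the expected loss over the response data, conditioned on the observed inputs $x$: $\hat{f}(x) = \arg\min_{f(x)} E_x\left[E_y\left(L(y, f(x))\right) \mid x\right]$.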
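The loss-minimization view of gradient boosting described above can be illustrated with a small worked sketch. It assumes a squared-error loss, one-dimensional inputs, and regression stumps as weak learners. None of these choices come from the paper, and the names `fit_stump`, `gradient_boost`, and `mse` are hypothetical. With squared error, the negative gradient of the loss with respect to the current prediction is the residual, so each round fits a stump to the residuals.

```python
def fit_stump(x, r):
    """Least-squares regression stump on 1-D inputs x, fitted to targets r.

    Assumes x contains at least two distinct values.
    """
    best = None  # (sum of squared errors, threshold, left mean, right mean)
    for t in sorted(set(x))[:-1]:  # skip the largest value so the right side is non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr


def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Additive model built by repeatedly fitting stumps to the residuals."""
    f0 = sum(y) / len(y)  # constant initial estimate (minimizes squared error)
    pred = [f0] * len(y)
    learners = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 0.5 * (y - f)^2 with respect to f.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, residuals)
        learners.append(h)
        pred = [pi + lr * h(xi) for xi, pi in zip(x, pred)]
    return lambda xi: f0 + lr * sum(h(xi) for h in learners)


def mse(model, x, y):
    """Mean squared error of a fitted model on (x, y)."""
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
```

For a classification task such as native versus non-native labeling, a different loss (for example the logistic loss) would replace the squared error. The structure of the loop stays the same.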
- Consider a dataset with input variables $x$ and corresponding labels $y$, so that the dataset can be written as $\{(x_i, y_i)\}_{i=1}^{N}$.
- When an XGBoost model comprises $K$ trees, the resulting model is denoted as $\sum_{k=1}^{K} f_k$, where $f_k$ represents the prediction function of the $k$th tree.
- The next stage is to calculate the expected output $\hat{y}_i$: $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$.
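The additive structure described above, in which the output is the sum of the prediction functions of K trees, can be shown with a toy example. The three trees below are hand-written and purely hypothetical (their split features, thresholds, and leaf weights are invented for illustration), and the logistic link used to turn the summed score into a probability is an assumption suited to binary classification, not a detail taken from the paper.

```python
import math


# Hypothetical hand-written trees: each maps a feature vector to a leaf weight.
def f_1(x):
    return 0.6 if x[0] > 0.5 else -0.4


def f_2(x):
    return 0.3 if x[1] > 1.0 else -0.2


def f_3(x):
    return 0.1 if x[0] > 0.8 else -0.1


def predict_margin(trees, x):
    """Summed output of all trees: y_hat = sum over k of f_k(x)."""
    return sum(f_k(x) for f_k in trees)


def predict_proba(trees, x):
    """Map the summed score to (0, 1) with a logistic link (assumed here)."""
    return 1.0 / (1.0 + math.exp(-predict_margin(trees, x)))
```

With `trees = [f_1, f_2, f_3]`, `predict_margin(trees, x)` adds one leaf weight per tree, and a positive sum gives a probability above 0.5.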
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AUC | Area Under the Curve |
CNN | Convolutional Neural Network |
dBSA | Delta Buried Surface Area |
dComDistance | Delta Center of Mass Distance |
dFNat | Delta Fraction Native |
dHBNum | Delta Hydrogen Bond Number |
DNN | Deep Neural Network |
dNonbE | Delta Non-bonded Energy |
dNonbWater | Delta Non-bonded Water |
EFB | Exclusive Feature Bundling |
EMs | Electron Microscopy |
FNs | False negatives |
FPs | False positives |
GANs | Graph attention networks |
GCNs | Graph convolutional networks |
GE | Global Encoding |
GOSS | Gradient-based One-Side Sampling |
LGB | Light gradient boosting |
MCC | Matthews correlation coefficient |
MD | Molecular dynamics |
NMR | Nuclear Magnetic Resonance |
QSAR | Quantitative Structure–Activity Relationships |
RMSd-i | Root Mean Square Deviation of Interface |
RMSd-l | Root Mean Square Deviation of Ligand |
ROC | Receiver Operating Characteristic |
TNs | True negatives |
TPs | True positives |
XGBoost | Extreme gradient boosting |
LR | Logistic regression |
NB | Naïve Bayes |
NN | Neural Network |
PPIs | Protein–protein interactions |
RF | Random forest |
SVMs | Support Vector Machines |
References
- Mazmanian, K.; Sargsyan, K.; Lim, C. How the local environment of functional sites regulates protein function. J. Am. Chem. Soc. 2020, 142, 9861–9871. [Google Scholar] [CrossRef] [PubMed]
- Peng, X.; Wang, J.; Peng, W.; Wu, F.X.; Pan, Y. Protein–protein interactions: Detection, reliability assessment and applications. Briefings Bioinform. 2017, 18, 798–819. [Google Scholar] [CrossRef] [PubMed]
- Xiang, H.; Zhou, M.; Li, Y.; Zhou, L.; Wang, R. Drug discovery by targeting the protein–protein interactions involved in autophagy. Acta Pharm. Sin. B 2023. [Google Scholar] [CrossRef] [PubMed]
- Morris, R.; Black, K.A.; Stollar, E.J. Uncovering protein function: From classification to complexes. Essays Biochem. 2022, 66, 255–285. [Google Scholar] [CrossRef] [PubMed]
- Keskin, O.; Gursoy, A.; Ma, B.; Nussinov, R. Principles of protein–protein interactions: What are the preferred ways for proteins to interact? Chem. Rev. 2008, 108, 1225–1244. [Google Scholar] [CrossRef] [PubMed]
- Bryant, P.; Pozzati, G.; Elofsson, A. Improved prediction of protein–protein interactions using AlphaFold2. Nat. Commun. 2022, 13, 1265. [Google Scholar] [CrossRef] [PubMed]
- Ding, Z.; Kihara, D. Computational identification of protein–protein interactions in model plant proteomes. Sci. Rep. 2019, 9, 8740. [Google Scholar] [CrossRef] [PubMed]
- Liu, T.; Gao, H.; Ren, X.; Xu, G.; Liu, B.; Wu, N.; Luo, H.; Wang, Y.; Tu, T.; Yao, B.; et al. Protein–protein interaction and site prediction using transfer learning. Briefings Bioinform. 2023, 24, bbad376. [Google Scholar] [CrossRef] [PubMed]
- Lu, H.; Zhou, Q.; He, J.; Jiang, Z.; Peng, C.; Tong, R.; Shi, J. Recent advances in the development of protein–protein interactions modulators: Mechanisms and clinical trials. Signal Transduct. Target. Ther. 2020, 5, 213. [Google Scholar] [CrossRef]
- Kuzmanov, U.; Emili, A. Protein-protein interaction networks: Probing disease mechanisms using model systems. Genome Med. 2013, 5, 37. [Google Scholar] [CrossRef]
- Winegar, P.H.; Hayes, O.G.; McMillan, J.R.; Figg, C.A.; Focia, P.J.; Mirkin, C.A. DNA-directed protein packing within single crystals. Chem 2020, 6, 1007–1017. [Google Scholar] [CrossRef] [PubMed]
- Díaz-Moreno, I.; Díaz-Quintana, A.; Subías, G.; Mairs, T.; Miguel, A.; Díaz-Moreno, S. Detecting transient protein–protein interactions by X-ray absorption spectroscopy: The cytochrome c6-photosystem I complex. FEBS Lett. 2006, 580, 3023–3028. [Google Scholar] [CrossRef] [PubMed]
- Ravi Acharya, K.; Lloyd, M.D. The advantages and limitations of protein crystal structures. Trends Pharmacol. Sci. 2005, 26, 10–14. [Google Scholar] [CrossRef] [PubMed]
- Gao, G.; Williams, J.G.; Campbell, S.L. Protein-protein interaction analysis by nuclear magnetic resonance spectroscopy. In Protein-Protein Interactions: Methods and Applications; Humana Press: Totowa, NJ, USA, 2004; pp. 79–91. [Google Scholar]
- Purslow, J.A.; Khatiwada, B.; Bayro, M.J.; Venditti, V. NMR methods for structural characterization of protein–protein complexes. Front. Mol. Biosci. 2020, 7, 9. [Google Scholar] [CrossRef] [PubMed]
- Hu, Y.; Cheng, K.; He, L.; Zhang, X.; Jiang, B.; Jiang, L.; Li, C.; Wang, G.; Yang, Y.; Liu, M. NMR-based methods for protein analysis. Anal. Chem. 2021, 93, 1866–1879. [Google Scholar] [CrossRef] [PubMed]
- Malhotra, S.; Joseph, A.P.; Thiyagalingam, J.; Topf, M. Assessment of protein–protein interfaces in cryo-EM derived assemblies. Nat. Commun. 2021, 12, 3399. [Google Scholar] [CrossRef] [PubMed]
- Carter, R.; Luchini, A.; Liotta, L.; Haymond, A. Next-generation techniques for determination of protein–protein interactions: Beyond the crystal structure. Curr. Pathobiol. Rep. 2019, 7, 61–71. [Google Scholar] [CrossRef] [PubMed]
- Costa, T.R.; Ignatiou, A.; Orlova, E.V. Structural analysis of protein complexes by cryo electron microscopy. In Bacterial Protein Secretion Systems: Methods and Protocols; Humana Press: New York, NY, USA, 2017; pp. 377–413. [Google Scholar]
- Xiong, W.; Xie, L.; Zhou, S.; Guan, J. Active learning for protein function prediction in protein–protein interaction networks. Neurocomputing 2014, 145, 44–52. [Google Scholar] [CrossRef]
- Ying, K.C.; Lin, S.W. Maximizing cohesion and separation for detecting protein functional modules in protein–protein interaction networks. PLoS ONE 2020, 15, e0240628. [Google Scholar] [CrossRef]
- Jha, K.; Saha, S. Amalgamation of 3d structure and sequence information for protein–protein interaction prediction. Sci. Rep. 2020, 10, 19171. [Google Scholar] [CrossRef]
- Chen, X.W.; Liu, M. Prediction of protein–protein interactions using random decision forest framework. Bioinformatics 2005, 21, 4394–4400. [Google Scholar] [CrossRef] [PubMed]
- Qi, Y.; Klein-Seetharaman, J.; Bar-Joseph, Z. Random forest similarity for protein–protein interaction prediction from multiple sources. In Biocomputing 2005; World Scientific: Singapore, 2005; pp. 531–542. [Google Scholar]
- Li, B.Q.; Feng, K.Y.; Chen, L.; Huang, T.; Cai, Y.D. Prediction of protein–protein interaction sites by random forest algorithm with mRMR and IFS. PLoS ONE 2012, 7, e43927. [Google Scholar] [CrossRef] [PubMed]
- Zhan, X.K.; You, Z.H.; Li, L.P.; Li, Y.; Wang, Z.; Pan, J. Using random forest model combined with Gabor feature to predict protein–protein interaction from protein sequence. Evol. Bioinform. 2020, 16, 1176934320934498. [Google Scholar] [CrossRef] [PubMed]
- Barradas-Bautista, D.; Cao, Z.; Vangone, A.; Oliva, R.; Cavallo, L. A random forest classifier for protein–protein docking models. Bioinform. Adv. 2022, 2, vbab042. [Google Scholar] [CrossRef] [PubMed]
- Jha, K.; Saha, S.; Singh, H. Prediction of protein–protein interaction using graph neural networks. Sci. Rep. 2022, 12, 8360. [Google Scholar] [CrossRef] [PubMed]
- Li, X.; Han, P.; Wang, G.; Chen, W.; Wang, S.; Song, T. SDNN-PPI: Self-attention with deep neural network effect on protein–protein interaction prediction. BMC Genom. 2022, 23, 474. [Google Scholar] [CrossRef] [PubMed]
- Soleymani, F.; Paquet, E.; Viktor, H.L.; Michalowski, W.; Spinello, D. ProtInteract: A deep learning framework for predicting protein–protein interactions. Comput. Struct. Biotechnol. J. 2023, 21, 1324–1348. [Google Scholar] [CrossRef] [PubMed]
- Ni, Q.; Wang, Z.Z.; Han, Q.; Li, G.; Wang, X.; Wang, G. Using logistic regression method to predict protein function from protein–protein interaction data. In Proceedings of the 2009 3rd International Conference on Bioinformatics and Biomedical Engineering, Beijing, China, 11–13 June 2009; IEEE: Cham, Switzerland, 2009; pp. 1–4. [Google Scholar]
- Su, X.R.; You, Z.H.; Hu, L.; Huang, Y.A.; Wang, Y.; Yi, H.C. An efficient computational model for large-scale prediction of protein–protein interactions based on accurate and scalable graph embedding. Front. Genet. 2021, 12, 635451. [Google Scholar] [CrossRef] [PubMed]
- Prasasty, V.D.; Hutagalung, R.A.; Gunadi, R.; Sofia, D.Y.; Rosmalena, R.; Yazid, F.; Sinaga, E. Prediction of human-Streptococcus pneumoniae protein–protein interactions using logistic regression. Comput. Biol. Chem. 2021, 92, 107492. [Google Scholar] [CrossRef]
- Kohonen, J.; Talikota, S.; Corander, J.; Auvinen, P.; Arjas, E. A Naive Bayes classifier for protein function prediction. In Silico Biol. 2009, 9, 23–34. [Google Scholar] [CrossRef]
- Murakami, Y.; Mizuguchi, K. Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein–protein interaction sites. Bioinformatics 2010, 26, 1841–1848. [Google Scholar] [CrossRef] [PubMed]
- Maruyama, O. Heterodimeric protein complex identification by naïve Bayes classifiers. BMC Bioinform. 2013, 14, 347. [Google Scholar] [CrossRef] [PubMed]
- Geng, H.; Lu, T.; Lin, X.; Liu, Y.; Yan, F. Prediction of protein–protein interaction sites based on naive Bayes classifier. Biochem. Res. Int. 2015, 2015, 978193. [Google Scholar] [CrossRef] [PubMed]
- Uddin, M.A.; Ahmed, M.S. Modified naive Bayes classifier for classification of protein–protein interaction sites. J. Biosci. Agric. Res. 2020, 26, 2177–2184. [Google Scholar] [CrossRef]
- Bradford, J.R.; Westhead, D.R. Improved prediction of protein–protein binding sites using a support vector machines approach. Bioinformatics 2005, 21, 1487–1494. [Google Scholar] [CrossRef] [PubMed]
- Lestari, D.; Aprilia, S.; Bustamam, A. Performance analysis of support vector machine combined with global encoding on detection of protein–protein interaction network of HIV virus. AIP Conf. Proc. 2018, 2023, 020228. [Google Scholar]
- Das, S.; Chakrabarti, S. Classification and prediction of protein–protein interaction interface using machine learning algorithm. Sci. Rep. 2021, 11, 1761. [Google Scholar] [CrossRef] [PubMed]
- Quasar, S.R.; Sharma, R.; Mittal, A.; Sharma, M.; Agarwal, D.; de La Torre Díez, I. Ensemble methods for computed tomography scan images to improve lung cancer detection and classification. Multimed. Tools Appl. 2024, 83, 52867–52897. [Google Scholar] [CrossRef]
- Lasantha, D.; Vidanagamachchi, S.; Nallaperuma, S. Deep learning and ensemble deep learning for circRNA-RBP interaction prediction in the last decade: A review. Eng. Appl. Artif. Intell. 2023, 123, 106352. [Google Scholar] [CrossRef]
- Elo, G.; Ghansah, B.; Kwaa-Aidoo, E.K. Critical Review of Stack Ensemble Classifier for the Prediction of Young Adults’ Voting Patterns Based on Parents’ Political Affiliations. Informing Sci. Int. J. Emerg. Transdiscipl. 2024, 27, 002. [Google Scholar] [CrossRef]
- Peng, L.; Yuan, R.; Shen, L.; Gao, P.; Zhou, L. LPI-EnEDT: An ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification. BioData Min. 2021, 14, 50. [Google Scholar] [CrossRef] [PubMed]
- Ren, Z.H.; Yu, C.Q.; Li, L.P.; You, Z.H.; Guan, Y.J.; Li, Y.C.; Pan, J. SAWRPI: A stacking ensemble framework with adaptive weight for predicting ncRNA-protein interactions using sequence information. Front. Genet. 2022, 13, 839540. [Google Scholar]
- Albu, A.I.; Bocicor, M.I.; Czibula, G. MM-StackEns: A new deep multimodal stacked generalization approach for protein–protein interaction prediction. Comput. Biol. Med. 2023, 153, 106526. [Google Scholar] [CrossRef]
- Cong, H.; Liu, H.; Cao, Y.; Liang, C.; Chen, Y. Protein–protein interaction site prediction by model ensembling with hybrid feature and self-attention. BMC Bioinform. 2023, 24, 456. [Google Scholar] [CrossRef] [PubMed]
- Gramatica, P.; Giani, E.; Papa, E. Statistical external validation and consensus modeling: A QSPR case study for Koc prediction. J. Mol. Graph. Model. 2007, 25, 755–766. [Google Scholar] [CrossRef] [PubMed]
- Valsecchi, C.; Grisoni, F.; Consonni, V.; Ballabio, D. Consensus versus individual QSARs in classification: Comparison on a large-scale case study. J. Chem. Inf. Model. 2020, 60, 1215–1223. [Google Scholar] [CrossRef] [PubMed]
- Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
- Zhou, Z.H. Ensemble Learning; Springer: Singapore, 2002. [Google Scholar]
- Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 757–774. [Google Scholar] [CrossRef]
- Cao, H.; Gu, Y.; Fang, J.; Hu, Y.; Ding, W.; He, H.; Chen, G. Application of stacking ensemble learning model in quantitative analysis of biomaterial activity. Microchem. J. 2022, 183, 108075. [Google Scholar]
- de Zarzà i Cubero, I.; de Curtò y Díaz, J.; Hernández-Orallo, E.; Calafate, C. Cascading and Ensemble Techniques in Deep Learning. Electronics 2023, 12, 3354. [Google Scholar] [CrossRef]
- Sarmas, E.; Spiliotis, E.; Marinakis, V.; Koutselis, T.; Doukas, H. A meta-learning classification model for supporting decisions on energy efficiency investments. Energy Build. 2022, 258, 111836. [Google Scholar] [CrossRef]
- Härner, S.; Ekman, D. Comparing Ensemble Methods with Individual Classifiers in Machine Learning for Diabetes Detection; Degree Project Report in Computer Science and Engineering; KTH Royal Institute of Technology: Stockholm, Sweden, June 2022. [Google Scholar]
- Sayyad, S.; Shaikh, M.; Pandit, A.; Sonawane, D.; Anpat, S. Confusion matrix-based supervised classification using microwave SIR-C SAR satellite dataset. In Proceedings of the Recent Trends in Image Processing and Pattern Recognition: Third International Conference, RTIP2R 2020, Aurangabad, India, 3–4 January 2020; Revised Selected Papers, Part II 3. Springer: Singapore, 2021; pp. 176–187. [Google Scholar]
- Dinga, R.; Penninx, B.W.; Veltman, D.J.; Schmaal, L.; Marquand, A.F. Beyond accuracy: Measures for assessing machine learning models, pitfalls and guidelines. bioRxiv 2019, 743138. [Google Scholar] [CrossRef]
- Blagec, K.; Dorffner, G.; Moradi, M.; Samwald, M. A critical analysis of metrics used for measuring progress in artificial intelligence. arXiv 2020, arXiv:2008.02577. [Google Scholar]
- de Hond, A.A.; Van Calster, B.; Steyerberg, E.W. Commentary: Artificial Intelligence and Statistics: Just the Old Wine in New Wineskins? Front. Digit. Health 2022, 4, 923944. [Google Scholar] [CrossRef]
- Armah, G.K.; Luo, G.; Qin, K. A deep analysis of the precision formula for imbalanced class distribution. Int. J. Mach. Learn. Comput. 2014, 4, 417–422. [Google Scholar] [CrossRef]
- Monaghan, T.F.; Rahman, S.N.; Agudelo, C.W.; Wein, A.J.; Lazar, J.M.; Everaert, K.; Dmochowski, R.R. Foundational statistical principles in medical research: Sensitivity, specificity, positive predictive value, and negative predictive value. Medicina 2021, 57, 503. [Google Scholar] [CrossRef] [PubMed]
- Christen, P.; Hand, D.J.; Kirielle, N. A review of the F-measure: Its history, properties, criticism, and alternatives. ACM Comput. Surv. 2023, 56, 73. [Google Scholar] [CrossRef]
- Lavazza, L.; Morasca, S. Comparing ϕ and the F-measure as performance metrics for software-related classifications. Empir. Softw. Eng. 2022, 27, 185. [Google Scholar] [CrossRef]
- Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6. [Google Scholar] [CrossRef]
- Rashidi, H.H.; Albahra, S.; Robertson, S.; Tran, N.K.; Hu, B. Common statistical concepts in the supervised Machine Learning arena. Front. Oncol. 2023, 13, 1130229. [Google Scholar] [CrossRef]
- Lučić, B.; Batista, J.; Bojović, V.; Lovrić, M.; Sović Kržić, A.; Bešlo, D.; Nadramija, D.; Vikić-Topić, D. Estimation of random accuracy and its use in validation of predictive quality of classification models within predictive challenges. Croat. Chem. Acta 2019, 92, 379–391. [Google Scholar] [CrossRef]
- Orasch, O.; Weber, N.; Müller, M.; Amanzadi, A.; Gasbarri, C.; Trummer, C. Protein–Protein Interaction Prediction for Targeted Protein Degradation. Int. J. Mol. Sci. 2022, 23, 7033. [Google Scholar] [CrossRef] [PubMed]
- Jandova, Z.; Vargiu, A.V.; Bonvin, A.M. Native or Non-Native Protein–Protein Docking Models? Molecular Dynamics to the Rescue. J. Chem. Theory Comput. 2021, 17, 5944–5954. [Google Scholar] [CrossRef] [PubMed]
- Zhao, N.; Pang, B.; Shyu, C.R.; Korkin, D. An accurate classification of native and non-native protein–protein interactions using supervised and semi-supervised learning approaches. In Proceedings of the 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Hong Kong, China, 18–21 December 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 185–189. [Google Scholar]
- Zhao, N.; Pang, B.; Shyu, C.R.; Korkin, D. Feature-based classification of native and non-native protein–protein interactions: Comparing supervised and semi-supervised learning approaches. Proteomics 2011, 11, 4321–4330. [Google Scholar] [CrossRef] [PubMed]
- Berry, A. Protein folding and its links with human disease. In Proceedings of the Biochemical Society Symposia, Leeds, UK, 1 August 2001; Portland Press Limited: London, UK, 2001; Volume 68, pp. 1–26. [Google Scholar]
- Zhou, H.X.; Pang, X. Electrostatic interactions in protein structure, folding, binding, and condensation. Chem. Rev. 2018, 118, 1691–1741. [Google Scholar] [CrossRef] [PubMed]
- Chandel, T.I.; Zaman, M.; Khan, M.V.; Ali, M.; Rabbani, G.; Ishtikhar, M.; Khan, R.H. A mechanistic insight into protein-ligand interaction, folding, misfolding, aggregation and inhibition of protein aggregates: An overview. Int. J. Biol. Macromol. 2018, 106, 1115–1129. [Google Scholar] [CrossRef] [PubMed]
- Louros, N.; Schymkowitz, J.; Rousseau, F. Mechanisms and pathology of protein misfolding and aggregation. Nat. Rev. Mol. Cell Biol. 2023, 24, 912–933. [Google Scholar] [CrossRef] [PubMed]
- Chaudhuri, T.K.; Paul, S. Protein-misfolding diseases and chaperone-based therapeutic approaches. FEBS J. 2006, 273, 1331–1349. [Google Scholar] [CrossRef] [PubMed]
- Damm, K.L.; Carlson, H.A. Gaussian-Weighted RMSD Superposition of Proteins: A Structural Comparison for Flexible Proteins and Predicted Protein Structures. Biophys. J. 2006, 90, 4558–4573. [Google Scholar] [CrossRef]
- Pandya, V.; Rao, P.; Prajapati, J.; Rawal, R.M.; Goswami, D. Pinpointing top inhibitors for GSK3β from pool of indirubin derivatives using rigorous computational workflow and their validation using molecular dynamics (MD) simulations. Sci. Rep. 2024, 14, 14–49. [Google Scholar] [CrossRef]
- Stärk, H.; Ganea, O.; Pattanaik, L.; Barzilay, R.; Jaakkola, T. EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S., Eds.; PMLR: London, UK, 2022; Volume 162, pp. 20503–20521. [Google Scholar]
- Gaudreault, F.; Najmanovich, R.J. FlexAID: Revisiting docking on non-native-complex structures. J. Chem. Inf. Model. 2015, 55, 1323–1336. [Google Scholar] [CrossRef] [PubMed]
- Bodea, F.; Bungau, S.G.; Negru, A.P.; Radu, A.; Tarce, A.G.; Tit, D.M.; Bungau, A.F.; Bustea, C.; Behl, T.; Radu, A.F. Exploring new therapeutic avenues for ophthalmic disorders: Glaucoma-related molecular docking evaluation and bibliometric analysis for improved management of ocular diseases. Bioengineering 2023, 10, 983. [Google Scholar] [CrossRef]
- Ovchinnikov, S.; Kamisetty, H.; Baker, D. Robust and accurate prediction of residue–residue interactions across protein interfaces using evolutionary information. eLife 2014, 3, e02030. [Google Scholar] [CrossRef] [PubMed]
- Rozano, L.; Hane, J.K.; Mancera, R.L. The Molecular Docking of MAX Fungal Effectors with Plant HMA Domain-Binding Proteins. Int. J. Mol. Sci. 2023, 24, 15239. [Google Scholar] [CrossRef]
- Chakravarty, D.; Guharoy, M.; Robert, C.H.; Chakrabarti, P.; Janin, J. Reassessing buried surface areas in protein–protein complexes. Protein Sci. 2013, 22, 1453–1457. [Google Scholar] [CrossRef]
- Schiebel, J.; Gaspari, R.; Wulsdorf, T.; Ngo, K.; Sohn, C.; Schrader, T.E.; Cavalli, A.; Ostermann, A.; Heine, A.; Klebe, G. Intriguing role of water in protein-ligand binding studied by neutron crystallography on trypsin complexes. Nat. Commun. 2018, 9, 3559. [Google Scholar] [CrossRef]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; IEEE: Piscataway, NJ, USA, 1995; Volume 1, pp. 278–282. [Google Scholar]
- Kulkarni, V.Y. Effective Learning and Classification Using Random Forest Algorithm. Ph.D. Thesis, Savitribai Phule Pune University, Pune, India, June 2014. [Google Scholar]
- Lee, T.H.; Ullah, A.; Wang, R. Bootstrap aggregating and random forest. In Macroeconomic Forecasting in the Era of Big Data: Theory and Practice; Springer: Cham, Switzerland, 2020; pp. 389–429. [Google Scholar]
- Boyko, N.; Omeliukh, R.; Duliaba, N. The Random Forest Algorithm as an Element of Statistical Learning for Disease Prediction. In Proceedings of the 3rd International Workshop on Computational & Information Technologies for Risk-Informed Systems, Neubiberg, Germany, 12 January 2023; Volume 4, pp. 1–15. [Google Scholar]
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
- Biau, G.; Cadre, B.; Rouvière, L. Accelerated gradient boosting. Mach. Learn. 2019, 108, 971–992. [Google Scholar] [CrossRef]
- Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Mateo, J.; Rius-Peris, J.; Maraña-Pérez, A.; Valiente-Armero, A.; Torres, A. Extreme gradient boosting machine learning method for predicting medical treatment in patients with acute bronchiolitis. Biocybern. Biomed. Eng. 2021, 41, 792–801. [Google Scholar] [CrossRef]
- Ali, Z.A.; Abduljabbar, Z.H.; Taher, H.A.; Sallow, A.B.; Almufti, S.M. Exploring the Power of eXtreme Gradient Boosting Algorithm in Machine Learning: A Review. Acad. J. Nawroz Univ. 2023, 12, 320–334. [Google Scholar]
- Zhang, J.; Mucs, D.; Norinder, U.; Svensson, F. LightGBM: An effective and scalable algorithm for prediction of chemical toxicity–application to the Tox21 and mutagenicity data sets. J. Chem. Inf. Model. 2019, 59, 4150–4158. [Google Scholar] [CrossRef] [PubMed]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: San Jose, CA, USA, 2017; Volume 30. [Google Scholar]
- Taha, A.A.; Malebary, S.J. An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. IEEE Access 2020, 8, 25579–25587. [Google Scholar] [CrossRef]
- Zhou, Y.; Wang, W.; Wang, K.; Song, J. Application of LightGBM Algorithm in the Initial Design of a Library in the Cold Area of China Based on Comprehensive Performance. Buildings 2022, 12, 1309. [Google Scholar] [CrossRef]
Model | Evaluation | Testing Data (0–20 ns) | Testing Data (20–40 ns) | Testing Data (40–60 ns) | Testing Data (60–80 ns) | Testing Data (80–100 ns) | Independent Set |
---|---|---|---|---|---|---|---|
Previous Model [70] | Accuracy | 0.77 | 0.83 | 0.85 | 0.85 | 0.86 | 0.60 |
Previous Model [70] | Precision | 0.79 | 0.86 | 0.87 | 0.86 | 0.88 | 0.61 |
Previous Model [70] | Recall | 0.76 | 0.81 | 0.84 | 0.84 | 0.85 | 0.61 |
Previous Model [70] | F1-Score | 0.76 | 0.82 | 0.85 | 0.84 | 0.85 | 0.59 |
Previous Model [70] | ROC AUC | 0.86 | 0.92 | 0.93 | 0.93 | 0.94 | 0.60 |
Ours | Accuracy | 0.84 | 0.89 | 0.91 | 0.92 | 0.92 | 0.63 |
Ours | Precision | 0.84 | 0.90 | 0.91 | 0.93 | 0.92 | 0.61 |
Ours | Recall | 0.83 | 0.89 | 0.91 | 0.92 | 0.92 | 0.74 |
Ours | F1-Score | 0.84 | 0.89 | 0.91 | 0.92 | 0.92 | 0.63 |
Ours | ROC AUC | 0.92 | 0.96 | 0.97 | 0.98 | 0.97 | 0.63 |
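The threshold-based metrics reported in the table (accuracy, precision, recall, F1-score), along with the MCC listed in the abbreviations, are all functions of the confusion-matrix counts (TPs, FPs, TNs, FNs). The sketch below shows the standard formulas. The function name `classification_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions of this example, and ROC AUC is omitted because it requires ranked scores rather than counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }
```

The counts passed in are whatever a given evaluation produces. The function makes no assumption about which class is treated as positive.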
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Pratiwi, N.K.C.; Tayara, H.; Chong, K.T. An Ensemble Classifiers for Improved Prediction of Native–Non-Native Protein–Protein Interaction. Int. J. Mol. Sci. 2024, 25, 5957. https://doi.org/10.3390/ijms25115957