Enhancing Sumoylation Site Prediction: A Deep Neural Network with Discriminative Features
Abstract
1. Introduction
- An intelligent and robust computational model built on multiple layers.
- A deep model whose weights are optimized automatically with a standard learning procedure for accurate prediction of sumoylation sites.
- Use of the half-sphere exposure (HSE) method to transform peptide sequences into feature vectors.
- Efficient feature selection that removes noisy and irrelevant features with the unsupervised principal component analysis (PCA) algorithm.
- Evaluation of the proposed model using the Matthews correlation coefficient (MCC), accuracy, sensitivity, specificity, and area under the ROC curve (AUC).
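
The threshold-based measures in the last item can be computed directly from confusion-matrix counts. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions, not values from the study. AUC is omitted because it needs ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return ACC, SN, SP (as fractions) and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity: recall on positives
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity: recall on negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


# Hypothetical counts, chosen only to exercise the function.
example = binary_metrics(tp=90, tn=85, fp=15, fn=10)
print(example)
```

MCC uses all four counts, so it stays informative when the positive and negative classes differ in size, whereas accuracy alone can be misleading in that case.
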
2. Proposed Model Design
2.1. Benchmark Dataset
2.2. Feature Formulation Technique
2.3. Feature Selection
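
The contribution list names unsupervised PCA as the feature selection step. A minimal sketch of such a reduction follows, assuming NumPy is available. The function name `pca_reduce`, the 95% retained-variance threshold, and the synthetic demo matrix are illustrative assumptions; the number of components actually retained by the authors is not stated here.

```python
import numpy as np


def pca_reduce(features, variance_to_keep=0.95):
    """Project rows of `features` onto the leading principal components."""
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2
    ratio = np.cumsum(explained) / explained.sum()
    # Smallest number of components whose cumulative variance reaches the target.
    k = min(int(np.searchsorted(ratio, variance_to_keep)) + 1, len(ratio))
    return centered @ vt[:k].T, k


# Synthetic data: 10 observed features driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
demo = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

reduced, n_components = pca_reduce(demo, variance_to_keep=0.95)
print(reduced.shape, n_components)
```

Because PCA ignores class labels, the same projection fitted on the training folds would have to be reused on held-out samples to avoid leaking information.
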
2.4. Deep Architecture
3. Performance Evaluation
4. Results and Analysis
4.1. Hyper-Parameter Analysis
4.2. Cross-Validation Scheme Performance Analysis
4.3. Performance Comparison of Different Classifiers
4.4. Existing Model Performance Comparisons
4.5. Performance Comparison on Independent Dataset
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

Parameter | Optimal value |
---|---|
Neurons at hidden layers | 80-72-56-23-16-2 |
Seed | 12345L |
L2 regularization | 0.001 |
Activation functions | Tanh and SoftMax |
Weight initialization function | Xavier |
Momentum | 0.9 |
Dropout | 0.25 |
Number of hidden layers | 4 |
Updater | AdaGrad |
Training iterations | 600 |
Optimizer | SGD |
Learning rate (LR) | 0.1 |
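
To make the listed configuration concrete, here is a minimal forward-pass sketch in plain Python. It reads `80-72-56-23-16-2` as an 80-feature input, four hidden layers, and a 2-class output, which is an interpretation rather than something the table states. It applies Tanh to hidden layers and SoftMax to the output, and uses Glorot (Xavier) uniform initialization. Training, dropout, L2 regularization, and the AdaGrad updater are deliberately left out, and all function names are hypothetical.

```python
import math
import random

# Assumed reading of the table: input, four hidden layers, output.
LAYER_SIZES = [80, 72, 56, 23, 16, 2]


def xavier_uniform(n_in, n_out, rng):
    """Glorot/Xavier uniform init: U(-limit, limit), limit = sqrt(6 / (n_in + n_out))."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_out)] for _ in range(n_in)]


def init_network(layer_sizes, seed=12345):
    """Build (weights, biases) pairs for each consecutive pair of layers."""
    rng = random.Random(seed)
    return [
        (xavier_uniform(n_in, n_out, rng), [0.0] * n_out)
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, x):
    """Tanh on hidden layers, softmax on the final layer."""
    activation = x
    for i, (weights, biases) in enumerate(layers):
        z = [
            sum(activation[r] * weights[r][c] for r in range(len(activation))) + biases[c]
            for c in range(len(biases))
        ]
        is_output = i == len(layers) - 1
        activation = softmax(z) if is_output else [math.tanh(v) for v in z]
    return activation


network = init_network(LAYER_SIZES)
probs = forward(network, [0.1] * LAYER_SIZES[0])  # dummy 80-dimensional input
print(probs)
```

The two output values sum to one and can be read as class probabilities for sumoylation site versus non-site.
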

Learning rate | Tanh ACC (%) | Sigmoid ACC (%) | ReLU ACC (%) |
---|---|---|---|
0.08 | 96.47 | 89.23 | 92.01 |
0.09 | 96.47 | 89.23 | 92.01 |
0.1 | 96.47 | 89.23 | 92.01 |
0.2 | 93.41 | 87.78 | 90.71 |
0.3 | 91.72 | 85.80 | 89.21 |
0.4 | 90.35 | 84.17 | 87.84 |
0.5 | 87.97 | 82.46 | 86.44 |
0.6 | 85.97 | 80.74 | 85.04 |
0.7 | 83.96 | 79.03 | 83.64 |
0.8 | 81.95 | 77.31 | 82.24 |
0.9 | 79.95 | 75.60 | 80.84 |

Method | ACC (%) | SP (%) | SN (%) | MCC |
---|---|---|---|---|
Deep-Sumo (5-fold) | 91.03 | 85.21 | 95.88 | 0.821 |
Deep-Sumo (10-fold) | 91.99 | 89.44 | 94.70 | 0.841 |
After feature selection | | | | |
Deep-Sumo (5-fold) | 95.19 | 93.04 | 97.40 | 0.905 |
Deep-Sumo (10-fold) | 96.47 | 96.25 | 96.71 | 0.929 |
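
The 5-fold and 10-fold rows above come from k-fold cross-validation, in which every sample is held out exactly once. A minimal sketch of the index splitting follows; the function name `k_fold_indices`, the sample count, and the reuse of the seed value from the parameter table are illustrative assumptions, and the authors' exact splitting procedure (for example, whether folds were stratified by class) is not specified here.

```python
import random


def k_fold_indices(n_samples, k, seed=12345):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=20, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Reported metrics are then averaged over the k held-out folds, so a larger k trains each model on more data at the cost of more training runs.
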

Classifier | ACC (%) | SP (%) | SN (%) | MCC |
---|---|---|---|---|
Deep-Sumo | 96.47 | 96.25 | 96.71 | 0.929 |
SVM | 90.00 | 93.08 | 86.92 | 0.802 |
RF | 94.10 | 95.90 | 92.31 | 0.883 |
KNN | 89.10 | 90.01 | 78.33 | 0.801 |

Method | ACC (%) | SP (%) | SN (%) | MCC |
---|---|---|---|---|
pSumo-CD | 72.80 | 92.10 | 53.60 | 0.494 |
HseSUMO | 89.50 | 89.50 | 89.50 | 0.790 |
Deep-Sumo | 96.47 | 96.25 | 96.71 | 0.929 |

Classifier | ACC (%) | SP (%) | SN (%) | MCC |
---|---|---|---|---|
Deep-Sumo | 92.23 | 93.53 | 90.93 | 0.892 |
RF | 91.43 | 92.12 | 90.74 | 0.854 |
SVM | 88.32 | 91.76 | 84.88 | 0.782 |
KNN | 87.65 | 88.12 | 87.18 | 0.781 |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Khan, S.; Khan, M.; Iqbal, N.; Dilshad, N.; Almufareh, M.F.; Alsubaie, N. Enhancing Sumoylation Site Prediction: A Deep Neural Network with Discriminative Features. Life 2023, 13, 2153. https://doi.org/10.3390/life13112153