PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations
Abstract
1. Introduction
2. Results and Discussion
2.1. Sequence Preference Analysis
2.2. Performance Results on Training Dataset
2.3. Performance Optimization by Chi-Square Test
2.4. Comparison among Different ML Methods on Training Dataset
2.5. Comparison of PUP-Fuse with Existing Methods on Independent Dataset
3. Materials and Methods
3.1. Data Collection and Processing
3.2. Encoding Scheme
3.2.1. pbCKSAAP
3.2.2. CKSAAP Encoding
3.2.3. Binary Encoding
3.2.4. TPC Encoding
3.2.5. AAI Encoding
3.2.6. Feature Selection
3.2.7. Classification Method
3.2.8. Feature Integration
3.2.9. Model Evaluation
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Encoding Method | Sens | Spec | Acc | MCC | AUC | p-Value |
---|---|---|---|---|---|---|
AAI | 0.482 | 0.811 | 0.651 | 0.313 | 0.697 | <0.01 |
Binary | 0.510 | 0.810 | 0.661 | 0.331 | 0.703 | <0.01 |
pbCKSAAP | 0.782 | 0.800 | 0.800 | 0.590 | 0.908 | 0.034 |
TPC | 0.770 | 0.801 | 0.791 | 0.574 | 0.877 | 0.021 |
CKSAAP | 0.773 | 0.805 | 0.789 | 0.583 | 0.895 | 0.038 |
PUP-Fuse | 0.802 | 0.820 | 0.811 | 0.623 | 0.912 | - |
Encoding Method | Sens | Spec | Acc | MCC | AUC | p-Value |
---|---|---|---|---|---|---|
AAI | 0.410 | 0.854 | 0.626 | 0.294 | 0.724 | <0.01 |
Binary | 0.417 | 0.855 | 0.629 | 0.305 | 0.731 | <0.01 |
pbCKSAAP | 0.831 | 0.827 | 0.829 | 0.658 | 0.929 | 0.031 |
TPC | 0.754 | 0.827 | 0.789 | 0.582 | 0.878 | <0.01 |
CKSAAP | 0.822 | 0.825 | 0.824 | 0.646 | 0.911 | <0.026 |
PUP-Fuse | 0.886 | 0.881 | 0.884 | 0.768 | 0.956 | - |
Methods | Sens | Spec | Acc | MCC |
---|---|---|---|---|
iPUP | 0.40 | 0.88 | 0.73 | 0.32 |
GPS-PUP | 0.21 | 0.89 | 0.68 | 0.13 |
PUPS | 0.17 | 0.89 | 0.67 | 0.08 |
pbPUP | 0.48 | 0.82 | 0.79 | 0.45 |
PUP-Fuse | 0.59 | 0.91 | 0.82 | 0.55 |
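
The Sens, Spec, Acc, and MCC columns in the tables above are assumed here to follow the standard confusion-matrix definitions of sensitivity, specificity, accuracy, and Matthews correlation coefficient. The following minimal Python sketch shows how such values could be computed; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the article.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: pupylated sites predicted correctly / missed.
    tn/fp: non-pupylated sites predicted correctly / wrongly flagged.
    """
    sens = tp / (tp + fn)                     # sensitivity (recall)
    spec = tn / (tn + fp)                     # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)     # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sens": sens, "Spec": spec, "Acc": acc, "MCC": mcc}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=80, fn=20, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it is less affected by class imbalance than accuracy, which matters when positive and negative sites differ in number, as in the independent dataset.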
Data | Training | Independent |
---|---|---|
Pupylated protein | 162 | 71 |
Pupylated lysine | 186 | 87 |
Non-pupylated lysine | 186 | 191 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Auliah, F.N.; Nilamyani, A.N.; Shoombuatong, W.; Alam, M.A.; Hasan, M.M.; Kurata, H. PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations. Int. J. Mol. Sci. 2021, 22, 2120. https://doi.org/10.3390/ijms22042120