A Deep Model for Species-Specific Prediction of Ribonucleic-Acid-Binding Protein with Short Motifs
Abstract
Highlights:
- A novel species-specific RBP predictor based on a convolutional neural network with short motifs.
- State-of-the-art performance using only simple sequence features.
- Discriminative features for RBP prediction differ between species.
- Rapid identification of candidate RBPs for further verification through biological experiments.
- Improved performance of computational RBP prediction.
- Species-specific models are recommended for RBP prediction.
1. Introduction
2. Related Works
3. Materials and Methods
3.1. Benchmark Datasets
3.2. Feature Representation Based on Short Peptide Motifs
3.3. Feature Selection Using the LightGBM Algorithm
3.4. CnnRBP Model
3.5. Performance Evaluation
4. Experimental Results and Analysis
4.1. Performance on Training Datasets through Cross-Validation
4.2. Performance Analysis with Different Feature Dimensionalities
4.3. Comparison with Other RBP Prediction Methods on Independent Validation Datasets
4.4. Cross-Species Prediction
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Dataset | Training Positive | Training Negative | Training Pos:Neg | Independent Validation Positive | Independent Validation Negative | Independent Validation Pos:Neg |
---|---|---|---|---|---|---|
Human | 1625 | 10,834 | 1:6.67 | 181 | 1204 | 1:6.65 |
E. coli | 460 | 3404 | 1:7.4 | 52 | 379 | 1:7.29 |
Salmonella | 275 | 1273 | 1:4.63 | 31 | 142 | 1:4.58 |
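
The Pos:Neg column follows directly from the positive and negative counts. The following is a minimal Python sketch of that arithmetic, using the counts from the table above; the names `COUNTS`, `neg_per_pos`, and `ratio_table` are illustrative and do not come from the paper.

```python
# Counts copied from the dataset table above:
# (training positive, training negative, validation positive, validation negative).
COUNTS = {
    "Human": (1625, 10834, 181, 1204),
    "E. coli": (460, 3404, 52, 379),
    "Salmonella": (275, 1273, 31, 142),
}


def neg_per_pos(positive: int, negative: int) -> float:
    """Return the number of negative samples per positive sample."""
    return negative / positive


def ratio_table(counts: dict) -> dict:
    """Map each dataset to its (training, validation) negatives-per-positive ratios."""
    return {
        name: (neg_per_pos(tr_pos, tr_neg), neg_per_pos(va_pos, va_neg))
        for name, (tr_pos, tr_neg, va_pos, va_neg) in counts.items()
    }


if __name__ == "__main__":
    for name, (train, valid) in ratio_table(COUNTS).items():
        print(f"{name}: training 1:{train:.2f}, validation 1:{valid:.2f}")
```

In every dataset the negative class is several times larger than the positive class, and the validation split keeps roughly the same imbalance as the training split.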
Dataset | ACC (%) | PRE (%) | SEN (%) | SPE (%) | F1 (%) | BACC (%) | MCC (%) | AUC (%) |
---|---|---|---|---|---|---|---|---|
Human | 99.91 | 99.96 | 99.78 | 99.98 | 99.87 | 99.88 | 99.81 | 99.98 |
E. coli | 99.08 | 99.59 | 97.65 | 99.79 | 98.57 | 98.72 | 97.95 | 99.69 |
Salmonella | 95.34 | 94.09 | 92.34 | 96.86 | 92.85 | 94.60 | 89.79 | 96.72 |
The Top 10 Features | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Human | KK | SS | SL | LLL | CC | GK | GRG | FF | LL | RV |
E. coli | RV | RK | KG | RL | AL | KR | SL | PF | RR | RTK |
Salmonella | RK | KV | AL | RR | RV | TL | KR | LL | IK | KK |
The Last 10 Features | ||||||||||
Human | DAQ | DAW | DVV | DAD | DAE | DAK | ACQ | DAR | DAH | DAP |
E. coli | NHI | NHL | DLN | CSV | NRP | NRI | NRL | NRV | NRA | NRG |
Salmonella | DGR | DGK | FW | DGE | DGD | DGQ | DGN | DGM | QHL | TF |
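
The features listed above are short peptide motifs of two or three residues. One plausible way to turn a protein sequence into such features is to count overlapping k-mers. The Python sketch below assumes raw overlapping counts for k = 2 and k = 3; the exact encoding used by CnnRBP (counts versus frequencies, normalization, range of k) is not stated in this extract, and the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def motif_counts(sequence: str, ks=(2, 3)) -> Counter:
    """Count every overlapping k-mer of the given lengths in a protein sequence."""
    counts = Counter()
    for k in ks:
        for i in range(len(sequence) - k + 1):
            counts[sequence[i:i + k]] += 1
    return counts


def feature_vector(sequence: str, motifs: list) -> list:
    """Return the motif counts in the fixed order given by `motifs`."""
    counts = motif_counts(sequence)
    # Counter returns 0 for motifs that never occur in the sequence.
    return [counts[m] for m in motifs]


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real protein.
    toy = "MKKSSLLLGRGKK"
    # Motifs taken from the Human row of the top-10 feature table.
    print(feature_vector(toy, ["KK", "SS", "SL", "LLL", "GRG"]))
```

A fixed motif order gives every protein a vector of the same length, which is what a downstream feature selector or classifier would need.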
Predictor | ACC (%) | PRE (%) | SEN (%) | SPE (%) | F1 (%) | BACC (%) | MCC | AUC |
---|---|---|---|---|---|---|---|---|
Human dataset | ||||||||
SPOT-Seq-Pred | 85.70 | 38.89 | 23.20 | 94.52 | 29.07 | 58.86 | 0.22 | - |
RNAPred | 49.67 | 18.74 | 87.57 | 44.09 | 30.88 | 65.83 | 0.22 | 0.72 |
RBPPred | 65.63 | 24.08 | 75.69 | 64.12 | 36.53 | 69.91 | 0.27 | 0.70 |
Deep-RBPPred | 30.25 | 14.73 | 90.61 | 21.18 | 25.35 | 55.90 | 0.10 | 0.69 |
TriPepSVM | 88.95 | 58.75 | 51.93 | 94.45 | 55.13 | 73.23 | 0.49 | 0.83 |
iDRBP_MMC | 89.29 | 56.57 | 78.45 | 90.93 | 65.73 | 84.69 | 0.61 | 0.92 |
CnnRBP * | 92.64 | 72.07 | 71.27 | 95.85 | 70.67 | 83.56 | 0.67 | 0.91 |
E. coli dataset | ||||||||
SPOT-Seq-Pred | 87.28 | 100.00 | 29.03 | 100.00 | 45.00 | 64.52 | 0.50 | - |
RNAPred | 66.67 | 32.00 | 80.00 | 63.83 | 45.71 | 71.91 | 0.34 | 0.75 |
RBPPred | 80.92 | 47.73 | 67.74 | 83.80 | 56.00 | 75.77 | 0.45 | 0.77 |
Deep-RBPPred | 61.02 | 20.71 | 78.85 | 58.58 | 32.80 | 68.72 | 0.24 | 0.72 |
TriPepSVM | 92.34 | 69.39 | 65.38 | 96.04 | 67.32 | 80.71 | 0.63 | 0.92 |
iDRBP_MMC | 92.57 | 75.00 | 57.00 | 97.36 | 64.77 | 77.18 | 0.62 | 0.90 |
CnnRBP * | 93.04 | 69.64 | 75.00 | 95.51 | 72.22 | 85.26 | 0.68 | 0.96 |
Salmonella dataset | ||||||||
SPOT-Seq-Pred | 92.11 | 100.00 | 34.15 | 100.00 | 51.43 | 67.31 | 0.56 | - |
RNAPred | 49.18 | 17.25 | 86.27 | 44.18 | 28.76 | 65.23 | 0.20 | 0.79 |
RBPPred | 81.44 | 35.11 | 63.46 | 83.91 | 45.21 | 73.68 | 0.37 | 0.81 |
Deep-RBPPred | 60.12 | 27.38 | 74.19 | 57.04 | 40.00 | 65.62 | 0.24 | 0.70 |
TriPepSVM | 90.17 | 85.00 | 54.83 | 97.89 | 66.66 | 76.36 | 0.63 | 0.86 |
iDRBP_MMC | 87.86 | 67.85 | 61.29 | 93.66 | 64.41 | 77.48 | 0.58 | 0.90 |
CnnRBP * | 92.49 | 84.62 | 70.97 | 97.18 | 77.19 | 84.08 | 0.73 | 0.91 |
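
The metrics in these tables can all be derived from a confusion matrix. The Python sketch below uses the standard definitions, which are assumed here to match the paper's Section 3.5. The example counts (TP = 22, FN = 9, TN = 138, FP = 4) are not reported in the source; they are inferred from the Salmonella validation set size (31 positive, 142 negative) and the CnnRBP sensitivity and specificity in the table above.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute binary-classification metrics from confusion-matrix counts.

    Assumes both classes are non-empty and at least one sample is predicted
    in each class, so no denominator is zero.
    """
    sen = tp / (tp + fn)                   # sensitivity (recall)
    spe = tn / (tn + fp)                   # specificity
    pre = tp / (tp + fp)                   # precision
    acc = (tp + tn) / (tp + fn + tn + fp)  # accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)       # harmonic mean of PRE and SEN
    bacc = (sen + spe) / 2                 # balanced accuracy
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ACC": acc, "PRE": pre, "SEN": sen, "SPE": spe,
            "F1": f1, "BACC": bacc, "MCC": mcc}


if __name__ == "__main__":
    # Inferred (not reported) counts for CnnRBP on the Salmonella validation set.
    m = binary_metrics(tp=22, fn=9, tn=138, fp=4)
    for name, value in m.items():
        print(f"{name}: {value:.4f}")
```

With these inferred counts, the computed values round to the CnnRBP row of the Salmonella block (ACC 92.49, PRE 84.62, SEN 70.97, SPE 97.18, F1 77.19, BACC 84.08, MCC 0.73). Because the negative class dominates, BACC and MCC are more informative than ACC when comparing predictors.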
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wei, Z.-S.; Rao, J.; Lin, Y.-J. A Deep Model for Species-Specific Prediction of Ribonucleic-Acid-Binding Protein with Short Motifs. Appl. Sci. 2023, 13, 8231. https://doi.org/10.3390/app13148231