Interpretable Multi-Scale Deep Learning for RNA Methylation Analysis across Multiple Species
Abstract
1. Introduction
Tool Name | Modification | Method | Publication Year | Feature Selection Method | Feature Encoding Scheme | Data Scale (Sample Number) | URL Accessibility | Window Size | Species | Reference |
---|---|---|---|---|---|---|---|---|---|---|
RAMPred | m1A | SVM | 2016 | None | NCP, ANF | 6366 (H. sapiens), 1064 (Mus musculus), 483 (S. cerevisiae) | Accessible | 41 | H. sapiens, M. musculus, and S. cerevisiae | [17] |
iRNA-3typeA | m1A and m6A | SVM | 2018 | None | NCP, ANF | 6366 (H. sapiens; m1A), 1064 (Mus musculus; m1A), 1130 (H. sapiens; m6A), 725 (Mus musculus; m6A) | Accessible | 41 | H. sapiens and Mus musculus | [18] |
M6APred-EL | m6A | Ensemble SVM | 2018 | None | position-specific information, physical-chemical properties, ring-function-hydrogen-chemical properties | 1307 | Inaccessible | 51 | S. cerevisiae | [19] |
iRNA(m6A)-PseDNC | m6A | SVM | 2018 | None | PseDNC | 1307 | Inaccessible | 51 | S. cerevisiae | [25] |
BERMP | m6A | BGRU | 2018 | None | ENAC, RNA word embedding | 53,000 (mammalian full transcript mode), 44,853 (mammalian mature mRNA mode), 1100 (S. cerevisiae) and 2100 (A. thaliana) | Inaccessible | 251 (mammalian), 51 (S. cerevisiae) and 101 (A. thaliana) | H. sapiens, M. musculus, S. cerevisiae and A. thaliana | [26] |
M6AMRFS | m6A | XGBoost | 2018 | SFS | DNC, binary, Local position-specific dinucleotide frequency | 1307 (S. cerevisiae), 1130 (H. sapiens), 725 (Mus musculus) and 1000 (A. thaliana) | - | 51 (S. cerevisiae), 41 (H. sapiens), 41 (Mus musculus) and 25 (A. thaliana) | H. sapiens, M. musculus, S. cerevisiae and A. thaliana | [27] |
RFAthM6A | m6A | RF | 2018 | None | PSNSP, PSDSP, KSNPF, k-mer | 2518 | Accessible | 101 | A. thaliana | [28] |
DeepM6APred | m6A | SVM | 2019 | None | deep features and NPPS | 1307 | Inaccessible | 51 | S. cerevisiae | [29] |
Gene2Vec | m6A | CNN | 2019 | None | One-hot, Neighboring methylation state, RNA word embedding, Gene2Vec | 56,557 | Inaccessible | 1001 | H. sapiens and Mus musculus | [22] |
WHISTLE | m6A | SVM | 2019 | perturb method | NCP, GNF, Genome-derived features | 20,516, 17,383 | Accessible | - | H. sapiens | [30] |
DeepPromise | m6A | CNN | 2022 | None | ENAC, one-hot and RNA word embedding | 44,901, 11,656 and 5233 | Accessible | 1001 | H. sapiens and Mus musculus | [21] |
Adaptive-m6A | m6A | Adaptive learning network | 2023 | CHI2 | NAC, DNC, TNC, BE, CKSNAP, ENAC, NCP and RNA word embedding | 6728 (D. melanogaster), 43,025 (zebrafish), 2172 (E. coli), 44,445 (Mus musculus), 2614 (S. cerevisiae) and 5033 (A. thaliana) | Accessible | 21 | D. melanogaster, zebrafish, E. coli, Mus musculus, S. cerevisiae and A. thaliana | [31] |
PPUS | Ψ | SVM | 2015 | Dynamic window size | One-hot | 464 (yeast), 102 (H. sapiens) | Accessible | dynamically | Yeast and H. sapiens | [32] |
iRNA-PseU | Ψ | SVM | 2016 | None | NCP, ND, PseKNC | 314 (H. sapiens), 495 (S. cerevisiae) and 314 (Mus musculus) | Accessible | 5, 10, 15, 20 | S. cerevisiae, H. sapiens and Mus musculus | [33] |
PseUI | Ψ | SVM | 2018 | SFS | NAC, DNC, PseDNC, PSNP, PSDP | 314 (H. sapiens), 495 (S. cerevisiae) and 314 (Mus musculus) | Inaccessible | 21, 31 | S. cerevisiae, H. sapiens and Mus musculus | [34] |
iPseU-CNN | Ψ | CNN | 2019 | None | One-hot | 990 (H. sapiens), 628 (S. cerevisiae) and 944 (M. musculus) | - | 15 | S. cerevisiae, H. sapiens and Mus musculus | [35] |
EnsemPseU | Ψ | Ensemble | 2020 | CHI2, mRMR, F-score | Kmer, one-hot, ENAC, NCP, ND | 990 (H. sapiens), 628 (S. cerevisiae) and 944 (M. musculus) | Inaccessible | - | S. cerevisiae, H. sapiens and Mus musculus | [36] |
PIANO | Ψ | SVM | 2020 | None | SCP, PSNP, Genome-derived features | 3566 (H. sapiens) | Accessible | 41 | H. sapiens | [37] |
PSI-MOUSE | Ψ | SVM | 2020 | None | NCP, Genome-derived features | 628 (S. cerevisiae) and 944 (M. musculus) | Accessible | - | S. cerevisiae, Mus musculus | [38] |
BERT2OME | 2′-O-methylation | BERT | 2023 | None | RNA word embedding | 1089 (H. sapiens), 278 (S. cerevisiae) and 45 (M. musculus) | Inaccessible | 41 | S. cerevisiae, H. sapiens and Mus musculus | [39] |
MSCAN | m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um | multi-scale self- and cross-attention mechanisms | 2024 | None | RNA word embedding | 9115 (m1A), 47,208 (m6A), 6803 (m5C), 986 (m5U), 1339 (m6Am), 691 (m7G), 2273 (Ψ), 4547 (Am, Cm, Gm, and Um) and 5901 (I) | Accessible | 21, 31 and 41 | H. sapiens | [40] |
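
Most encoding schemes listed in the table above operate on a fixed-length window centred on the candidate site. As a rough illustration only, the sketch below extracts such a window and encodes it with one-hot and with NCP plus ANF, using the commonly cited NCP assignment (ring structure, functional group, hydrogen bonding). The function names, the padding character `N`, and the toy sequence are assumptions made for this example and are not taken from any of the listed tools.

```python
from typing import Dict, List, Tuple

# Commonly used NCP assignment: (ring structure, functional group, hydrogen bonding).
NCP: Dict[str, Tuple[int, int, int]] = {
    "A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1),
}
ONE_HOT: Dict[str, Tuple[int, int, int, int]] = {
    "A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1),
}


def extract_window(seq: str, site: int, size: int, pad: str = "N") -> str:
    """Return a window of odd length `size` centred on 0-based index `site`.

    Positions that fall outside the sequence are filled with `pad`
    (a hypothetical placeholder symbol chosen for this sketch).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    padded = pad * half + seq.upper().replace("T", "U") + pad * half
    return padded[site:site + size]


def one_hot(window: str) -> List[List[int]]:
    """One-hot encode each base; unknown symbols map to an all-zero vector."""
    return [list(ONE_HOT.get(base, (0, 0, 0, 0))) for base in window]


def ncp_anf(window: str) -> List[List[float]]:
    """Encode each position as its three NCP bits plus the accumulated
    nucleotide frequency (share of the same base in the prefix ending here)."""
    counts: Dict[str, int] = {}
    features: List[List[float]] = []
    for i, base in enumerate(window, start=1):
        counts[base] = counts.get(base, 0) + 1
        density = counts[base] / i
        features.append([*NCP.get(base, (0, 0, 0)), density])
    return features


if __name__ == "__main__":
    window = extract_window("AUGGACUAG", site=4, size=5)  # toy sequence
    print(window)
    print(ncp_anf(window))
```

A window size of 41, as used by several tools in the table, would correspond to `size=41`; the padding convention for sites near transcript ends varies between tools and is not specified here.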
2. Results
2.1. Elucidate Methylation Mechanisms Based on Multi-Scale Sequential Design
2.2. Comparative Analysis of Model Performance across Species and Methylations
2.3. Proposed Method Outperforms State-of-the-Art Methods
3. Discussion
4. Materials and Methods
4.1. Datasets from Various Species
4.2. Multi-Scale Information Processing Module
4.3. BERT Encoder Module
4.4. Fusion Feature Module
4.5. Classification Module
4.6. Evaluation Metrics
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Smith, Z.D.; Meissner, A. DNA methylation: Roles in mammalian development. Nat. Rev. Genet. 2013, 14, 204–220.
- Dunn, D.B. The occurrence of 1-methyladenine in ribonucleic acid. Biochim. Biophys. Acta 1961, 46, 198–200.
- Meyer, K.D.; Jaffrey, S.R. The dynamic epitranscriptome: N6-methyladenosine and gene expression control. Nat. Rev. Mol. Cell Biol. 2014, 15, 313–326.
- Fu, Y.; Dominissini, D.; Rechavi, G.; He, C. Gene expression regulation mediated through reversible m(6)A RNA methylation. Nat. Rev. Genet. 2014, 15, 293–306.
- Yang, H.; Liu, Y.; Bai, F.; Zhang, J.Y.; Ma, S.H.; Liu, J.; Xu, Z.D.; Zhu, H.G.; Ling, Z.Q.; Ye, D.; et al. Tumor development is associated with decrease of TET gene expression and 5-methylcytosine hydroxylation. Oncogene 2013, 32, 663–669.
- Schevitz, R.W.; Podjarny, A.D.; Krishnamachari, N.; Hughes, J.J.; Sigler, P.B.; Sussman, J.L. Crystal structure of a eukaryotic initiator tRNA. Nature 1979, 278, 188–190.
- Saikia, M.; Fu, Y.; Pavon-Eternod, M.; He, C.; Pan, T. Genome-wide analysis of N1-methyl-adenosine modification in human tRNAs. RNA 2010, 16, 1317–1327.
- Wu, H.; Zhang, Y. Mechanisms and functions of TET protein-mediated 5-methylcytosine oxidation. Genes Dev. 2011, 25, 2436–2452.
- Yang, C.; Hu, Y.; Zhou, B.; Bao, Y.; Li, Z.; Gong, C.; Yang, H.; Wang, S.; Xiao, Y. The role of m6A modification in physiology and disease. Cell Death Dis. 2020, 11, 960.
- Charette, M.; Gray, M.W. Pseudouridine in RNA: What, where, how, and why. IUBMB Life 2000, 49, 341–351.
- Davis, D.R.; Veltri, C.A.; Nielsen, L. An RNA model system for investigation of pseudouridine stabilization of the codon-anticodon interaction in tRNALys, tRNAHis and tRNATyr. J. Biomol. Struct. Dyn. 1998, 15, 1121–1132.
- Basak, A.; Query, C.C. A pseudouridine residue in the spliceosome core is part of the filamentous growth program in yeast. Cell Rep. 2014, 8, 966–973.
- Jack, K.; Bellodi, C.; Landry, D.M.; Niederer, R.O.; Meskauskas, A.; Musalgaonkar, S.; Kopmar, N.; Krasnykh, O.; Dean, A.M.; Thompson, S.R.; et al. rRNA pseudouridylation defects affect ribosomal ligand binding and translational fidelity from yeast to human cells. Mol. Cell 2011, 44, 660–666.
- Ma, X.; Zhao, X.; Yu, Y.T. Pseudouridylation (Psi) of U2 snRNA in S. cerevisiae is catalyzed by an RNA-independent mechanism. EMBO J. 2003, 22, 1889–1897.
- Carlile, T.M.; Rojas-Duran, M.F.; Zinshteyn, B.; Shin, H.; Bartoli, K.M.; Gilbert, W.V. Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells. Nature 2014, 515, 143–146.
- Boccaletto, P.; Machnicka, M.A.; Purta, E.; Piatkowski, P.; Baginski, B.; Wirecki, T.K.; de Crecy-Lagard, V.; Ross, R.; Limbach, P.A.; Kotter, A.; et al. MODOMICS: A database of RNA modification pathways. 2017 update. Nucleic Acids Res. 2018, 46, D303–D307.
- Chen, W.; Feng, P.; Tang, H.; Ding, H.; Lin, H. RAMPred: Identifying the N1-methyladenosine sites in eukaryotic transcriptomes. Sci. Rep. 2016, 6, 31080.
- Chen, W.; Feng, P.; Yang, H.; Ding, H.; Lin, H.; Chou, K.C. iRNA-3typeA: Identifying three types of modification at RNA’s adenosine sites. Mol. Ther. Nucleic Acids 2018, 11, 468–474.
- Wei, L.; Chen, H.; Su, R. M6APred-EL: A sequence-based predictor for identifying N6-methyladenosine sites using ensemble learning. Mol. Ther. Nucleic Acids 2018, 12, 635–644.
- Ma, R.; Li, S.; Li, W.; Yao, L.; Huang, H.D.; Lee, T.Y. KinasePhos 3.0: Redesign and expansion of the prediction on kinase-specific phosphorylation sites. Genom. Proteom. Bioinform. 2023, 21, 228–241.
- Chen, Z.; Zhao, P.; Li, F.; Wang, Y.; Smith, A.I.; Webb, G.I.; Akutsu, T.; Baggag, A.; Bensmail, H.; Song, J. Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences. Brief. Bioinform. 2019, 21, 1676–1696.
- Zou, Q.; Xing, P.; Wei, L.; Liu, B. Gene2Vec: Gene subsequence embedding for prediction of mammalian N6-methyladenosine sites from mRNA. RNA 2019, 25, 205–218.
- Ma, R.; Li, S.; Parisi, L.; Li, W.; Huang, H.D.; Lee, T.Y. Holistic similarity-based prediction of phosphorylation sites for understudied kinases. Brief. Bioinform. 2023, 24, bbac624.
- Song, Z.; Huang, D.; Song, B.; Chen, K.; Song, Y.; Liu, G.; Su, J.; de Magalhães, J.P.; Rigden, D.J.; Meng, J. Attention-based multi-label neural networks for integrated prediction and interpretation of twelve widely occurring RNA modifications. Nat. Commun. 2021, 12, 4011.
- Chen, W.; Ding, H.; Zhou, X.; Lin, H.; Chou, K.C. iRNA(m6A)-PseDNC: Identifying N(6)-methyladenosine sites using pseudo dinucleotide composition. Anal. Biochem. 2018, 561–562, 59–65.
- Huang, Y.; He, N.; Chen, Y.; Chen, Z.; Li, L. BERMP: A cross-species classifier for predicting m(6)A sites by integrating a deep learning algorithm and a random forest approach. Int. J. Biol. Sci. 2018, 14, 1669–1677.
- Qiang, X.; Chen, H.; Ye, X.; Su, R.; Wei, L. M6AMRFS: Robust prediction of N6-methyladenosine sites with sequence-based features in multiple species. Front. Genet. 2018, 9, 495.
- Wang, X.; Yan, R. RFAthM6A: A new tool for predicting m6A sites in Arabidopsis thaliana. Plant Mol. Biol. 2018, 96, 327–337.
- Wei, L.; Su, R.; Wang, B.; Li, X.; Zou, Q.; Gao, X. Integration of deep feature representations and handcrafted features to improve the prediction of N6-methyladenosine sites. Neurocomputing 2019, 324, 3–9.
- Chen, K.; Wei, Z.; Zhang, Q.; Wu, X.; Rong, R.; Lu, Z.; Su, J.; de Magalhaes, J.P.; Rigden, D.J.; Meng, J. WHISTLE: A high-accuracy map of the human N6-methyladenosine (m6A) epitranscriptome predicted using a machine learning approach. Nucleic Acids Res. 2019, 47, e41.
- Wang, R.; Chung, C.R.; Huang, H.D.; Lee, T.Y. Identification of species-specific RNA N6-methyladinosine modification sites from RNA sequences. Brief. Bioinform. 2023, 24, bbac573.
- Li, Y.H.; Zhang, G.; Cui, Q. PPUS: A web server to predict PUS-specific pseudouridine sites. Bioinformatics 2015, 31, 3362–3364.
- Chen, W.; Tang, H.; Ye, J.; Lin, H.; Chou, K.C. iRNA-PseU: Identifying RNA pseudouridine sites. Mol. Ther. Nucleic Acids 2016, 5, e332.
- He, J.; Fang, T.; Zhang, Z.; Huang, B.; Zhu, X.; Xiong, Y. PseUI: Pseudouridine sites identification based on RNA sequence information. BMC Bioinform. 2018, 19, 306.
- Tahir, M.; Tayara, H.; Chong, K.T. iPseU-CNN: Identifying RNA pseudouridine sites using convolutional neural networks. Mol. Ther. Nucleic Acids 2019, 16, 463–470.
- Bi, Y.; Jin, D.; Jia, C.Z. EnsemPseU: Identifying pseudouridine sites with an ensemble approach. IEEE Access 2020, 8, 79376–79382.
- Song, B.; Tang, Y.; Wei, Z.; Liu, G.; Su, J.; Meng, J.; Chen, K. PIANO: A web server for pseudouridine-site (psi) identification and functional annotation. Front. Genet. 2020, 11, 88.
- Song, B.; Chen, K.; Tang, Y.; Ma, J.; Meng, J.; Wei, Z. PSI-MOUSE: Predicting mouse pseudouridine sites from sequence and genome-derived features. Evol. Bioinform. Online 2020, 16, 1176934320925752.
- Soylu, N.N.; Sefer, E. BERT2OME: Prediction of 2′-O-methylation modifications from RNA sequence by transformer architecture based on BERT. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20, 2177–2189.
- Wang, H.; Huang, T.; Wang, D.; Zeng, W.; Sun, Y.; Zhang, L. MSCAN: Multi-scale self- and cross-attention network for RNA methylation site prediction. BMC Bioinform. 2024, 25, 32.
- Bailey, T.L.; Johnson, J.; Grant, C.E.; Noble, W.S. The MEME Suite. Nucleic Acids Res. 2015, 43, W39–W49.
- Gupta, S.; Stamatoyannopoulos, J.A.; Bailey, T.L.; Noble, W.S. Quantifying similarity between motifs. Genome Biol. 2007, 8, R24.
- Wagih, O.; Sugiyama, N.; Ishihama, Y.; Beltrao, P. Uncovering phosphorylation-based specificities through functional interaction networks. Mol. Cell. Proteom. 2016, 15, 236–245.
- Crooks, G.E.; Hon, G.; Chandonia, J.M.; Brenner, S.E. WebLogo: A sequence logo generator. Genome Res. 2004, 14, 1188–1190.
- Zhou, Y.; Zeng, P.; Li, Y.-H.; Zhang, Z.; Cui, Q. SRAMP: Prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features. Nucleic Acids Res. 2016, 44, e91.
- Li, H.; Chen, L.; Huang, Z.; Luo, X.; Li, H.; Ren, J.; Xie, Y. DeepOME: A web server for the prediction of 2′-O-Me sites based on the hybrid CNN and BLSTM architecture. Front. Cell Dev. Biol. 2021, 9, 686894.
- Li, F.; Guo, X.; Jin, P.; Chen, J.; Xiang, D.; Song, J.; Coin, L.J.M. Porpoise: A new approach for accurate prediction of RNA pseudouridine sites. Brief. Bioinform. 2021, 22, bbab245.
- Ji, Y.; Zhou, Z.; Liu, H.; Davuluri, R.V. DNABERT: Pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 2021, 37, 2112–2120.
- Xuan, J.J.; Sun, W.J.; Lin, P.H.; Zhou, K.R.; Liu, S.; Zheng, L.L.; Qu, L.H.; Yang, J.H. RMBase v2.0: Deciphering the map of RNA modifications from epitranscriptome sequencing data. Nucleic Acids Res. 2018, 46, D327–D334.
- He, H.B.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
- Wang, R.; Wang, Z.; Wang, H.; Pang, Y.; Lee, T.Y. Characterization and identification of lysine crotonylation sites based on machine learning method on both plant and mammalian. Sci. Rep. 2020, 10, 20447.
- Jin, J.; Yu, Y.; Wang, R.; Zeng, X.; Pang, C.; Jiang, Y.; Li, Z.; Dai, Y.; Su, R.; Zou, Q.; et al. iDNA-ABF: Multi-scale deep biological language learning model for the interpretable prediction of DNA methylations. Genome Biol. 2022, 23, 219.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Yamada, K.; Hamada, M. Prediction of RNA-protein interactions using a nucleotide language model. Bioinform. Adv. 2022, 2, vbac023.
- Zhao, W.; Alwidian, S.; Mahmoud, Q.H. Adversarial training methods for deep learning: A systematic review. Algorithms 2022, 15, 283.
- Jia, X.; Zhang, Y.; Wei, X.; Wu, B.; Ma, K.; Wang, J.; Cao, X. Prior-Guided Adversarial Initialization for Fast Adversarial Training; Springer Nature Switzerland: Cham, Switzerland, 2022; pp. 567–584.
- Liu, J.; Zhang, Q.; Mo, K.; Xiang, X.; Li, J.; Cheng, D.; Gao, R.; Liu, B.; Chen, K.; Wei, G. An efficient adversarial example generation algorithm based on an accelerated gradient iterative fast gradient. Comput. Stand. Interfaces 2022, 82, 103612.
- Jia, X.; Zhang, Y.; Wu, B.; Wang, J.; Cao, X. Boosting fast adversarial training with learnable adversarial initialization. IEEE Trans. Image Process 2022, 31, 4417–4430.
- Kao, H.J.; Huang, C.H.; Bretana, N.A.; Lu, C.T.; Huang, K.Y.; Weng, S.L.; Lee, T.Y. A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs. BMC Bioinform. 2015, 16 (Suppl. S18), S10.
- Wang, R.; Wang, Z.; Li, Z.; Lee, T.Y. Residue-residue contact can be a potential feature for the prediction of lysine crotonylation sites. Front. Genet. 2021, 12, 788467.
- Li, F.; Li, C.; Marquez-Lago, T.T.; Leier, A.; Akutsu, T.; Purcell, A.W.; Ian Smith, A.; Lithgow, T.; Daly, R.J.; Song, J.; et al. Quokka: A comprehensive tool for rapid and accurate prediction of kinase family-specific phosphorylation sites in the human proteome. Bioinformatics 2018, 34, 4223–4231.
Modification Type | Tool Name | Algorithm | Dataset Scale | Encoding Scheme | SE | SP | AUC | MCC |
---|---|---|---|---|---|---|---|---|
m1A | RAMPred [17] | SVM | 6366 (H. sapiens), 1064 (M. musculus) and 483 (S. cerevisiae) | NCP, ANF | 0.92 | 0.01 | 0.51 | 0.02 |
m1A | iRNA-3typeA [18] | SVM | 6366 (H. sapiens) and 1064 (M. musculus) | NCP, ANF | 0.98 | 0.03 | 0.5 | 0.02 |
m1A | DeepPromise [21] | CNN | 5233 | ENAC, one-hot and RNA word embedding | 0.87 | 0.9 | 0.94 | 0.59 |
m1A | MultiRM [24] | RNA embedding + LSTM + attention | 16,380 | RNA word embedding | 0.64 | 0.8 | 0.78 | 0.45 |
m1A | BERT-RNA (Ours) | Pretrained BERT model | 4846 | RNA word embedding | 0.95 | 0.98 | 0.99 | 0.79 |
m6A | SRAMP [45] | RF | 55,706 (full transcript mode) and 46,992 (mature mRNA mode) | one-hot, KNN score spectrum | 0.44 | 0.9 | 0.29 | 0.79 |
m6A | DeepPromise [21] | CNN | 44,901, 11,656 | ENAC, one-hot and RNA word embedding | 0.39 | 0.9 | 0.25 | 0.76 |
m6A | DeepOME [46] | CNN + BiLSTM | 3052 | One-hot | 0.97 | 1 | 0.93 | 0.99 |
m6A | CapNetwork [31] | CNN + BiLSTM + CapsuleNetwork | 207,010 | RNA word embedding | 0.12 | 0.95 | 0.13 | 0.61 |
m6A | Adaptive-m6A [31] | CNN + BiLSTM + attention | 207,010 | RNA word embedding | 0.87 | 0.73 | 0.6 | 0.88 |
m6A | MultiRM [24] | RNA embedding + LSTM + attention | 65,178 | RNA word embedding | 0.82 | 0.78 | 0.86 | 0.6 |
m6A | BERT-RNA (Ours) | Pretrained BERT model | 422,994 | RNA word embedding | 0.97 | 0.98 | 0.99 | 0.94 |
Ψ | Porpoise [47] | Ensemble learning framework | 2472 | BE, pseKNC, NCP, PSTNPss | 0.82 | 0.75 | 0.74 | 0.59 |
Ψ | PSI-MOUSE [38] | SVM, RF, GLM, NB, DT | 944 | NCP, ND and Genome-derived features | 0.86 | 0.97 | 0.95 | 0.91 |
Ψ | MultiRM [24] | RNA embedding + LSTM + attention | 3137 | RNA word embedding | 0.92 | 0.76 | 0.85 | 0.69 |
Ψ | BERT-RNA (Ours) | Pretrained BERT model | 8871 | RNA word embedding | 0.7 | 0.65 | 0.74 | 0.23 |
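
The SE, SP, AUC and MCC columns above are standard binary-classification metrics. As a minimal sketch, assuming the usual textbook definitions (sensitivity, specificity, Matthews correlation coefficient, and a rank-based AUC), they can be computed as follows. The function names and the toy counts are hypothetical, and the code is not claimed to reproduce the evaluation pipeline behind the table.

```python
import math
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity (SE), specificity (SP) and MCC from confusion-matrix counts."""
    se = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "MCC": mcc}


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a random positive outscores a
    random negative, counting ties as one half. Quadratic in sample count,
    which is fine for a sketch but slow for large test sets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy counts chosen for illustration only.
    print(confusion_metrics(tp=90, fp=5, tn=95, fn=10))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Because MCC uses all four confusion-matrix cells, it penalizes a predictor that reaches high SE only by labelling nearly everything positive, which is why a row can combine a high SE with a near-zero MCC.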
Group | Species | Assembly | m1A Training Positive | m1A Training Negative | m1A Testing Positive | m1A Testing Negative | m6A Training Positive | m6A Training Negative | m6A Testing Positive | m6A Testing Negative | Ψ Training Positive | Ψ Training Negative | Ψ Testing Positive | Ψ Testing Negative |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bacteria | P. aeruginosa | ASM676v1 | - | - | - | - | 4082 | 34,559 | 1732 | 14,829 | - | - | - | - |
Mammal | Pan troglodytes (Chimpanzee) | panTro4 | - | - | - | - | 26,925 | 34,500 | 11,438 | 14,888 | - | - | - | - |
Mammal | Macaca mulatta (Rhesus) | rheMac8 | - | - | - | - | 27,264 | 34,493 | 11,573 | 14,895 | - | - | - | - |
Mammal | Rattus norvegicus (Rat) | rn5 | - | - | - | - | 35,102 | 34,466 | 14,894 | 14,922 | - | - | - | - |
Mammal | Sus scrofa domesticus (Pig) | susScr3 | - | - | - | - | 35,034 | 34,520 | 14,942 | 14,868 | - | - | - | - |
Plant | Arabidopsis thaliana (Thale cress) | TAIR10 | - | - | - | - | 14,328 | 34,475 | 6003 | 14,913 | - | - | - | - |
Mammal | Mus musculus (House mouse) | mm10 | 740 | 34,568 | 312 | 14,820 | 35,095 | 34,475 | 14,904 | 14,913 | 2322 | 2322 | 998 | 7261 |
Fungi | Saccharomyces cerevisiae S288C | sacCer3 | 858 | 34,567 | 362 | 14,281 | 47,367 | 34,574 | 20,304 | 14,814 | 1466 | 1466 | 650 | 7248 |
Mammal | Homo sapiens (Human) | hg19 | 1815 | 34,558 | 759 | 14,803 | 35,090 | 34,474 | 14,903 | 14,914 | 2405 | 2405 | 1030 | 7263 |
Insect | Drosophila melanogaster (Fruit fly) | BDGP6 | - | - | - | - | 4791 | 34,553 | 2028 | 14,835 | - | - | - | - |
Vertebrate | Danio rerio (Zebrafish) | danRer10 | - | - | - | - | 30,158 | 34,528 | 12,864 | 14,859 | - | - | - | - |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, R.; Chung, C.-R.; Lee, T.-Y. Interpretable Multi-Scale Deep Learning for RNA Methylation Analysis across Multiple Species. Int. J. Mol. Sci. 2024, 25, 2869. https://doi.org/10.3390/ijms25052869