Identifying Cancer-Specific circRNA–RBP Binding Sites Based on Deep Learning
Abstract
1. Introduction
2. Results
2.1. Implementation of the Parameterized CSCRSites
2.2. Performance of Different Combinations of CSCRSites Settings
2.3. Comparing CSCRSites with Conventional Machine Learning Methods
2.4. Comparing CSCRSites with Existing Deep Learning Methods
2.5. Performance of CSCRSites in Motif Discovery
3. Discussion
4. Materials and Methods
4.1. Datasets
4.2. Sequence Encoding
4.3. Model Construction
4.4. Motifs Discovery
5. Conclusions
Author Contributions
Acknowledgments
Conflicts of Interest
References
Sample Availability: Samples are not available from the authors.
Methods | Acc. | Prec. | AUC |
---|---|---|---|
CSCRSites | 0.74 | 0.76 | 0.83 |
DeepBind | 0.68 | 0.68 | 0.75 |
iDeepS | 0.61 | 0.64 | 0.65 |
Zeng’s method | 0.59 | 0.59 | 0.62 |
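As a reading aid for the Acc., Prec., and AUC columns above, the following is a minimal sketch of how these three metrics are conventionally computed for a binary binding-site classifier. The labels, scores, and the 0.5 decision threshold are toy assumptions chosen for illustration; they are not the data or the evaluation code behind the reported numbers.

```python
def accuracy(labels, preds):
    """Fraction of examples whose predicted class equals the true class."""
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return correct / len(labels)


def precision(labels, preds):
    """True positives divided by all predicted positives."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def auc(labels, scores):
    """Area under the ROC curve via its rank interpretation: the probability
    that a random positive scores higher than a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = binding site, 0 = non-binding site.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(labels, preds), precision(labels, preds), auc(labels, scores))
```

Accuracy and precision depend on the chosen threshold, whereas AUC is computed from the raw scores and is threshold-free, which is one reason the three columns can rank methods slightly differently.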
Associated Genes | Known Motifs ID | Known Sequence | Learnt Motifs ID | Learnt Sequence | Overlap | E-Value |
---|---|---|---|---|---|---|
DAZAP1 | RNCMPT00013 | UAGGUAG | KER_29 | UAGGUAGG | 7 | 0.0031 |
FMR1 | RNCMPT00016 | GGACAAG | KER_632 | GGCACAGG | 7 | 0.0290 |
HNRNPK | RNCMPT00026 | CCAACCC | KER_959 | CAACCAGU | 6 | 0.0429 |
HNRNPL | RNCMPT00027 | ACACACA | KER_793 | ACACACAG | 7 | 0.0019 |
HNRPLL | RNCMPT00178 | ACACACA | KER_793 | ACACACAG | 7 | 0.0030 |
HuR | RNCMPT00032 | UUAUUUU | KER_78 | UUUAUUUU | 7 | 0.0054 |
HuR | RNCMPT00112 | UUUGUUU | KER_900 | UUUCUUUC | 7 | 0.0098 |
HuR | RNCMPT00117 | UUUGUUU | KER_900 | UUUCUUUC | 7 | 0.0070 |
HuR | RNCMPT00136 | UUGGUUU | KER_395 | AUUGAUUU | 7 | 0.0202 |
IGF2BP2 | RNCMPT00033 | ACAAACA | KER_512 | AAACACAG | 7 | 0.0401 |
IGF2BP3 | RNCMPT00172 | ACAAACA | KER_793 | ACACACAG | 7 | 0.0110 |
KHDRBS1 | RNCMPT00169 | AUAAAAG | KER_837 | UAUUAAAG | 7 | 0.0254 |
MATR3 | RNCMPT00037 | AAUCUUG | KER_801 | GAAUCUUG | 7 | 0.0021 |
PABPC5 | RNCMPT00171 | AGAAAAU | KER_113 | AGAAAGUG | 7 | 0.0060 |
PABPN1 | RNCMPT00157 | AGAAGAC | KER_183 | AGAAAACA | 7 | 0.0109 |
PCBP1 | RNCMPT00186 | CCUUUCC | KER_577 | CCUUCCCU | 7 | 0.0055 |
PCBP2 | RNCMPT00044 | CCUUCCC | KER_577 | CCUUCCCU | 7 | 0.0021 |
PTBP1 | RNCMPT00268 | CUUUUCU | KER_366 | UUUUCUUU | 6 | 0.0208 |
PTBP1 | RNCMPT00269 | ACUUUCU | KER_269 | UACUUCCC | 7 | 0.0051 |
RBM46 | RNCMPT00054 | AAUCAAU | KER_153 | GAAUCAAU | 7 | 0.0208 |
SAMD4A | RNCMPT00063 | GCUGGAC | KER_608 | UGCUGGCC | 7 | 0.0347 |
SNRNP70 | RNCMPT00070 | GAUCAAG | KER_197 | GAAUCAAG | 7 | 0.0065 |
SRSF1 | RNCMPT00107 | GGAGGAA | KER_37 | GGGAGGAA | 7 | 0.0391 |
SRSF10 | RNCMPT00019 | AGAGAAA | KER_824 | AGAGAAAA | 7 | 0.0373 |
SRSF10 | RNCMPT00089 | AGAGAAA | KER_824 | AGAGAAAA | 7 | 0.0299 |
TIA1 | RNCMPT00165 | UUUUUUC | KER_842 | UUCCUUCU | 7 | 0.0122 |
U2AF2 | RNCMPT00079 | UUUUUUC | KER_842 | UUCCUUCU | 7 | 0.0036 |
ZC3H14 | RNCMPT00086 | UUUGUUU | KER_900 | UUUCUUUC | 7 | 0.0111 |
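The Overlap column in the table above appears to count the number of aligned positions between a known motif and a learnt motif. The sketch below illustrates that idea on consensus strings only: it slides the known sequence along the learnt one without gaps and reports the offset with the most matching bases. This is an assumption-laden simplification; motif comparison tools normally score position weight matrices and derive E-values statistically, which this example does not attempt. The function name `best_alignment` is hypothetical.

```python
def best_alignment(known, learnt):
    """Return (matches, overlap, offset) for the best ungapped alignment.

    offset is the index in `learnt` at which known[0] is placed; negative
    values mean the known motif overhangs the start of the learnt motif.
    Ties are broken by larger overlap, then by the first offset encountered.
    """
    best = (-1, 0, 0)
    for offset in range(-(len(known) - 1), len(learnt)):
        start = max(0, offset)
        end = min(len(learnt), offset + len(known))
        overlap = end - start
        matches = sum(
            1 for i in range(start, end) if learnt[i] == known[i - offset]
        )
        if (matches, overlap) > best[:2]:
            best = (matches, overlap, offset)
    return best


# Consensus sequences taken from the HNRNPK and DAZAP1 rows of the table.
print(best_alignment("CCAACCC", "CAACCAGU"))
print(best_alignment("UAGGUAG", "UAGGUAGG"))
```

For the HNRNPK row, the best ungapped alignment shifts the known motif by one position, leaving six aligned columns, which is consistent with the overlap of 6 listed for that row.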
Methods | Application | Motifs | Merit | Demerit |
---|---|---|---|---|
CSCRSites | circRNA binding sites | YES | Discovery of variable-length motifs; high prediction accuracy | Relatively slow convergence |
DeepBind | DNA/RNA binding sites | YES | Scales well to ChIP-seq and HT-SELEX data sets | Low prediction accuracy on circRNA data sets |
iDeepS | RNA binding sites | YES | Integrates RNA secondary structure | Predicts binding targets only for a specific RBP |
Zeng’s method | DNA binding sites | YES | Motif occupancy task | Motif length is fixed |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Wang, Z.; Lei, X.; Wu, F.-X. Identifying Cancer-Specific circRNA–RBP Binding Sites Based on Deep Learning. Molecules 2019, 24, 4035. https://doi.org/10.3390/molecules24224035