DeepPred-SubMito: A Novel Submitochondrial Localization Predictor Based on Multi-Channel Convolutional Neural Network and Dataset Balancing Treatment
Abstract
1. Introduction
2. Results and Discussion
2.1. Parameter Optimization
2.2. Comparison with Other State-of-the-Art Methods
3. Materials and Methods
3.1. Datasets
3.2. Sequence Encoding
3.3. Resolving the Data Imbalance Problem
3.4. Convolutional Neural Networks
3.5. Illustration of the DeepPred-SubMito
3.6. Evaluation Criteria
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
CNN | Convolutional neural network |
MCC | Matthews correlation coefficient |
ACC | Accuracy |
PSSM | Position-specific scoring matrix |
ROC curve | Receiver operating characteristic curve |
SVM | Support vector machine |
References
- Surguchov, A.P. Common genes for mitochondrial and cytoplasmic proteins. Trends Biochem. Sci. 1987, 12, 335–338.
- De Brito, O.M.; Scorrano, L. An intimate liaison: Spatial organization of the endoplasmic reticulum–mitochondria relationship. EMBO J. 2010, 29, 2715–2723.
- Fulda, S.; Galluzzi, L.; Kroemer, G. Targeting mitochondria for cancer therapy. Nat. Rev. Drug Discov. 2010, 9, 447–464.
- Kroemer, G.; Reed, J.C. Mitochondrial control of cell death. Nat. Med. 2000, 6, 513–519.
- Shi, S.P.; Qiu, J.D.; Sun, X.Y.; Huang, J.H.; Huang, S.Y.; Suo, S.B.; Liang, R.P.; Zhang, L. Identify submitochondria and subchloroplast locations with pseudo amino acid composition: Approach from the strategy of discrete wavelet transform feature extraction. Biochim. Biophys. Acta (BBA)-Mol. Cell Res. 2011, 1813, 424–430.
- Mei, S. Predicting plant protein subcellular multi-localization by Chou’s PseAAC formulation based multi-label homolog knowledge transfer learning. J. Theor. Biol. 2012, 310, 80–87.
- Lin, H.; Chen, W.; Yuan, L.F.; Li, Z.Q.; Ding, H. Using over-represented tetrapeptides to predict protein submitochondria locations. Acta Biotheor. 2013, 61, 259–268.
- Kumar, R.; Kumari, B.; Kumar, M. Proteome-wide prediction and annotation of mitochondrial and sub-mitochondrial proteins by incorporating domain information. Mitochondrion 2018, 42, 11–22.
- Qiu, W.; Li, S.; Cui, X.; Yu, Z.; Wang, M.; Du, J.; Peng, Y.; Yu, B. Predicting protein submitochondrial locations by incorporating the pseudo-position specific scoring matrix into the general Chou’s pseudo-amino acid composition. J. Theor. Biol. 2018, 450, 86–103.
- Yu, B.; Qiu, W.; Chen, C.; Ma, A.; Jiang, J.; Zhou, H.; Ma, Q. SubMito-XGBoost: Predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting. Bioinformatics 2020, 36, 1074–1081.
- Savojardo, C.; Bruciaferri, N.; Tartari, G.; Martelli, P.L.; Casadio, R. DeepMito: Accurate prediction of protein sub-mitochondrial localization using convolutional neural networks. Bioinformatics 2020, 36, 56–64.
- Du, P.F. Predicting protein submitochondrial locations: The 10th Anniversary. Curr. Genom. 2017, 18, 316–321.
- Cedano, J.; Aloy, P.; Perez-Pons, J.A.; Querol, E. Relation between amino acid composition and cellular location of proteins. J. Mol. Biol. 1997, 266, 594–600.
- Zhang, S.; Duan, X. Prediction of protein subcellular localization with oversampling approach and Chou’s general PseAAC. J. Theor. Biol. 2018, 437, 239–250.
- Angermueller, C.; Pärnamaa, T.; Parts, L.; Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 2016, 12, 878.
- Min, S.; Lee, B.; Yoon, S. Deep learning in bioinformatics. Brief. Bioinform. 2017, 18, 851–869.
- Jurtz, V.I.; Johansen, A.R.; Nielsen, M.; Almagro Armenteros, J.J.; Nielsen, H.; Sønderby, C.K.; Winther, O.; Sønderby, S.K. An introduction to deep learning on biological sequence data: Examples and solutions. Bioinformatics 2017, 33, 3685–3690.
- Almagro Armenteros, J.J.; Sønderby, C.K.; Sønderby, S.K.; Nielsen, H.; Winther, O. DeepLoc: Prediction of protein subcellular localization using deep learning. Bioinformatics 2017, 33, 3387–3395.
- Pang, L.; Wang, J.; Zhao, L.; Wang, C.; Zhan, H. A novel protein subcellular localization method with CNN-XGBoost model for Alzheimer’s disease. Front. Genet. 2019, 9, 751.
- Kaleel, M.; Zheng, Y.; Chen, J.; Feng, X.; Simpson, J.C.; Pollastri, G.; Mooney, C. SCLpred-EMS: Subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks. Bioinformatics 2020, 36, 3343–3349.
- Pan, X.; Rijnbeek, P.; Yan, J.; Shen, H.B. Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks. BMC Genom. 2018, 19, 511.
- Simpson, A.J.R. Over-sampling in a deep neural network. arXiv 2015, arXiv:1502.03648.
- Kim, M.J.; Kang, D.K.; Kim, H.B. Geometric mean based boosting algorithm with over-sampling to resolve data imbalance problem for bankruptcy prediction. Expert Syst. Appl. 2015, 42, 1074–1082.
- Manaswi, N.K. Understanding and Working with Keras; Apress: Berkeley, CA, USA, 2018; pp. 31–43.
- Zhang, C.J.; Tang, H.; Li, W.C.; Lin, H.; Chen, W.; Chou, K.C. iOri-Human: Identify human origin of replication by incorporating dinucleotide physicochemical properties into pseudo nucleotide composition. Oncotarget 2016, 7, 69783.
- Li, W.; Godzik, A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22, 1658–1659.
- Du, P.; Yu, Y. SubMito-PSPCP: Predicting protein submitochondrial locations by hybridizing positional specific physicochemical properties with pseudoamino acid compositions. Biomed Res. Int. 2013, 2013, 263829.
- Alipanahi, B.; Delong, A.; Weirauch, M.T.; Frey, B.J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 2015, 33, 831–838.
- Pan, X.; Shen, H.B. Predicting RNA–protein binding sites and motifs through combining local and global deep convolutional neural networks. Bioinformatics 2018, 34, 3427–3436.
- Krawczyk, B. Learning from imbalanced data: Open challenges and future directions. Prog. Artif. Intell. 2016, 5, 221–232.
- Cao, Z.; Pan, X.; Yang, Y.; Huang, Y.; Shen, H.B. The lncLocator: A subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier. Bioinformatics 2018, 34, 2185–2194.
- Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259.
- Haixiang, G.; Yijing, L.; Shang, J.; Mingyun, G.; Yuanyue, H.; Bing, G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239.
- Ling, C.X.; Li, C. Data mining for direct marketing: Problems and solutions. KDD 1998, 98, 73–79.
- Bouvrie, J. Notes on convolutional neural networks. CogPrints 2006. Available online: http://cogprints.org/5869/ (accessed on 2 July 2020).
- Gorodkin, J. Comparing two K-category assignments by a K-category correlation coefficient. Comput. Biol. Chem. 2004, 28, 367–374.
- Chen, W.; Feng, P.; Yang, H.; Ding, H.; Lin, H.; Chou, K.C. iRNA-AI: Identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget 2017, 8, 4208.
- Zeng, H.; Edwards, M.D.; Liu, G.; Gifford, D.K. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 2016, 32, i121–i127.
Parameter | List of Values Evaluated |
---|---|
Sliding window size (W) | 80, 130, 180, 230, 280 |
Max-pooling size | 2 |
Number of convolution kernels (F) | 32, 64, 128 |
Kernel size (k) | 3, 5, 7, 9 |
Dropout rate (D) | 0.25 |
Optimizer | Adam |

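The table lists three tuned hyperparameters (W, F, k) and three fixed settings (pooling size 2, dropout 0.25, Adam). The sketch below enumerates the candidate configurations. It assumes that every combination of the tuned values was tried. That is an assumption: the study may instead have tuned one parameter at a time. Under a full grid there are 5 × 3 × 4 = 60 configurations. The names `GRID`, `FIXED` and `grid_configs` are hypothetical.

```python
from itertools import product

# Tuned values copied from the table; the full Cartesian product is an assumption.
GRID = {
    "window_size": [80, 130, 180, 230, 280],  # W
    "num_filters": [32, 64, 128],             # F
    "kernel_size": [3, 5, 7, 9],              # k
}
# Settings the table lists with a single value.
FIXED = {"pool_size": 2, "dropout": 0.25, "optimizer": "adam"}


def grid_configs(grid, fixed):
    """Yield one dict per combination of tuned values, merged with the fixed settings."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        config.update(fixed)
        yield config


configs = list(grid_configs(GRID, FIXED))
print(len(configs))  # 60
print(configs[0])
```

Each configuration would then be passed to a training and validation routine, which is not shown. The best one would be kept according to a validation metric such as MCC.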
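Name | Architecture |
---|---|
1 layer32 | 1 convolutional layer, 32 convolution kernels |
1 layer64 | 1 convolutional layer, 64 convolution kernels |
1 layer128 | 1 convolutional layer, 128 convolution kernels |
2 layer | 2 convolutional layers, 64/128 convolution kernels |
3 layer | 3 convolutional layers, 64/64/128 convolution kernels |
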
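To make the shapes in these architectures concrete, the pure-Python sketch below pushes one toy sequence through a single convolutional layer followed by max-pooling. It uses the "1 layer32" setting (F = 32) with W = 80 and k = 3 from the evaluated grid. The following are all assumptions, not the authors' documented pipeline:

- a 20-letter one-hot encoding;
- truncation to the first W residues, with zero padding for short sequences;
- "valid" convolution followed by ReLU;
- random, untrained weights.

The helper names and the toy sequence are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_window(seq, window):
    """One-hot encode the first `window` residues; pad short sequences with
    all-zero rows. Letters outside the alphabet also stay all-zero."""
    rows = []
    for pos in range(window):
        row = [0.0] * len(AMINO_ACIDS)
        if pos < len(seq) and seq[pos] in INDEX:
            row[INDEX[seq[pos]]] = 1.0
        rows.append(row)
    return rows  # window x 20


def conv1d_relu(x, filters):
    """'Valid' 1D convolution along the sequence, then ReLU.
    x is L x C; each filter is k x C; the output is (L - k + 1) x F."""
    k = len(filters[0])
    channels = len(x[0])
    out = []
    for start in range(len(x) - k + 1):
        out.append([
            max(0.0, sum(w[j][c] * x[start + j][c]
                         for j in range(k) for c in range(channels)))
            for w in filters
        ])
    return out


def max_pool(x, size=2):
    """Non-overlapping max-pooling along positions; a leftover tail is dropped."""
    return [
        [max(x[i + j][f] for j in range(size)) for f in range(len(x[0]))]
        for i in range(0, len(x) - size + 1, size)
    ]


def random_filters(num_filters, k, channels, seed=0):
    """Untrained, randomly initialised kernels (illustration only)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.1, 0.1) for _ in range(channels)] for _ in range(k)]
            for _ in range(num_filters)]


W, F, K = 80, 32, 3
x = one_hot_window("MLSRAVCGTSRQLAPVLGYLGSRQKHSLPD", W)  # toy sequence
h = max_pool(conv1d_relu(x, random_filters(F, K, len(AMINO_ACIDS))), 2)
print(len(h), len(h[0]))  # 39 32
```

With valid convolution, the layer turns an 80 × 20 input into 78 × 32 feature maps. Pooling with size 2 halves this to 39 × 32. In practice such a model would be built with a deep-learning framework such as Keras rather than loops like these.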
Dataset | Model | MCC (O) | MCC (I) | MCC (S) | MCC (M) | ACC |
---|---|---|---|---|---|---|
SM424-18 | DeepMito | 0.46 | 0.47 | 0.53 | 0.65 | NA |
SM424-18 | DeepPred-SubMito | 0.85 | 0.49 | 0.99 | 0.56 | 0.79 |
SubMitoPred | SubMitoPred | 0.42 | 0.34 | 0.19 | 0.51 | NA |
SubMitoPred | DeepMito | 0.45 | 0.68 | 0.54 | 0.79 | NA |
SubMitoPred | DeepPred-SubMito | 0.92 | 0.69 | 0.97 | 0.73 | 0.88 |

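The per-compartment MCC columns (O, I, S, M) are one-vs-rest scores. Each compartment in turn is treated as the positive class and the other three as negatives. The score is then MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). ACC is the overall fraction of correctly assigned proteins.

The sketch below computes both from lists of true and predicted labels. It returns 0.0 when the denominator is zero; that is a common convention, assumed here. The function names and toy labels are hypothetical.

```python
from math import sqrt


def one_vs_rest_counts(y_true, y_pred, label):
    """Confusion counts with `label` as positive and every other class as negative."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == label and p == label:
            tp += 1
        elif t != label and p == label:
            fp += 1
        elif t == label and p != label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def per_class_mcc(y_true, y_pred, label):
    """Matthews correlation coefficient for one compartment (one-vs-rest)."""
    tp, fp, fn, tn = one_vs_rest_counts(y_true, y_pred, label)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 convention when undefined


def accuracy(y_true, y_pred):
    """Fraction of proteins whose predicted compartment equals the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels: O = outer membrane, I = inner membrane, S = intermembrane space, M = matrix
y_true = ["O", "I", "I", "S", "M", "M"]
y_pred = ["O", "I", "M", "S", "M", "I"]
for c in "OISM":
    print(c, round(per_class_mcc(y_true, y_pred, c), 2))
print("ACC", round(accuracy(y_true, y_pred), 2))
```

On these toy labels the script prints MCC values of 1.0 (O), 0.25 (I), 1.0 (S) and 0.25 (M), and ACC 0.67. One inner-membrane protein is predicted as matrix and one matrix protein as inner membrane, so only the I and M scores drop.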
Dataset | Model | MCC (I) | MCC (M) | MCC (O) | ACC (%) |
---|---|---|---|---|---|
M983 | SubMito-PSPCP | 0.77 | 0.73 | 0.83 | 89.01 |
M983 | Ahmad et al. | 0.871 | 0.986 | 0.996 | 95.1 |
M983 | SubMito-XGBoost | 0.9559 | 0.9595 | 0.9604 | 98.94 |
M983 | DeepPred-SubMito | 0.9503 | 0.9649 | 0.9807 | 97.68 |

Compartment | SM424-18 | SubMitoPred | M983 |
---|---|---|---|
Outer membrane | 74 | 82 | 145 |
Inner membrane | 190 | 282 | 661 |
Intermembrane space | 25 | 32 | NA |
Matrix | 135 | 174 | 177 |
Total | 424 | 570 | 983 |
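The counts above show strong class imbalance. In SM424-18, for example, there are 190 inner-membrane proteins but only 25 intermembrane-space proteins, a ratio of about 7.6. The balancing treatment actually used in the paper is not reproduced here. Random oversampling is one common option, and the reference list cites several oversampling studies. It duplicates randomly chosen minority-class examples until every class matches the largest class.

The sketch below applies it to the SM424-18 class sizes, using placeholder sample IDs. The function name and IDs are hypothetical. Oversampling should be applied only to the training portion of each split, so that duplicated proteins never appear in both training and test data.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen members of each minority class until every
    class has as many examples as the largest class (random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(members) for members in by_class.values())
    out_samples, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for sample in members + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Class sizes of SM424-18 from the table above; sample IDs are placeholders.
sm424_counts = {"Outer membrane": 74, "Inner membrane": 190,
                "Intermembrane space": 25, "Matrix": 135}
samples, labels = [], []
for compartment, n in sm424_counts.items():
    samples += [f"{compartment}#{i}" for i in range(n)]
    labels += [compartment] * n

balanced_x, balanced_y = random_oversample(samples, labels)
print(Counter(balanced_y))  # every compartment now has 190 examples
print(len(balanced_x))      # 760
```

Oversampling keeps every original protein and only adds copies, so no information is discarded. The trade-off is a higher risk of overfitting to the repeated minority examples.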
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Wang, X.; Jin, Y.; Zhang, Q. DeepPred-SubMito: A Novel Submitochondrial Localization Predictor Based on Multi-Channel Convolutional Neural Network and Dataset Balancing Treatment. Int. J. Mol. Sci. 2020, 21, 5710. https://doi.org/10.3390/ijms21165710