MLW-gcForest: A Multi-Weighted gcForest Model for Cancer Subtype Classification by Methylation Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Feature Selection
2.2. gcForest Model
2.3. Multi-Weighted gcForest (MLW-gcForest)
2.3.1. Calculation of Weight α
2.3.2. Sorting Optimization Algorithm (Calculation of Weight β)
Sorting Optimization Algorithm
Algorithm 1: Sorting optimization algorithm
Input: number of samples; number of sliding windows
For each sliding window (current window):
  For each sample (current sample):
    Compute the number of feature vectors obtained after scanning, which is determined by the number of original features, the size S of the sliding window, and the scanning step size (which has a default value)
    For each scanned feature vector j:
      Input the S-dim feature vector into the random forest and output a class vector whose dimension is the number of sample classes
      Input the S-dim feature vector into the completely random forest and output a class vector of the same dimension
    End For
    Multiply the class vectors from the random forest and from the completely random forest by their respective weights
    Concatenate these weighted vectors into a single vector of length L
    Sort the L-dim vector in descending order and use the top portion of the sorted vector to obtain the prediction ability of the current sample in the current window
  End For
  Compute the prediction ability of the current sliding window
End For
Output:
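The loop structure of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the exact scoring formula is not given in the text above, so the sketch assumes the per-sample prediction ability is the mean of the top `top_k` entries of the sorted weighted vector, and that the window's prediction ability is the mean over samples. The scanning count `(n_features - window) // step + 1` is the usual sliding-window count, and the default `step=1` is an assumption. The names `n_scanned_vectors`, `sample_score`, `window_scores`, and `toy_forest` are hypothetical; `toy_forest` is a stand-in for a trained random forest or completely random forest.

```python
from typing import Callable, Dict, List, Sequence

# A "forest" here is any callable that maps an S-dim slice to a class vector.
Forest = Callable[[Sequence[float]], List[float]]


def n_scanned_vectors(n_features: int, window: int, step: int = 1) -> int:
    """Number of S-dim slices produced by sliding a window over the features."""
    return (n_features - window) // step + 1


def sample_score(x: Sequence[float], window: int, forests: Sequence[Forest],
                 weights: Sequence[float], top_k: int, step: int = 1) -> float:
    """Prediction ability of one sample in one window.

    Assumption: the score is the mean of the top_k largest weighted entries.
    """
    weighted: List[float] = []
    for j in range(n_scanned_vectors(len(x), window, step)):
        piece = x[j * step: j * step + window]
        # Each forest emits a class vector; scale it by that forest's weight.
        for forest, w in zip(forests, weights):
            weighted.extend(w * p for p in forest(piece))
    top = sorted(weighted, reverse=True)[:top_k]  # descending sort, keep the top
    return sum(top) / len(top)


def window_scores(samples: Sequence[Sequence[float]], windows: Sequence[int],
                  forests: Sequence[Forest], weights: Sequence[float],
                  top_k: int, step: int = 1) -> Dict[int, float]:
    """Prediction ability of each window (assumed: mean over all samples)."""
    return {
        s: sum(sample_score(x, s, forests, weights, top_k, step)
               for x in samples) / len(samples)
        for s in windows
    }


def toy_forest(piece: Sequence[float]) -> List[float]:
    """Hypothetical two-class stand-in for a trained forest."""
    p = min(max(sum(piece) / len(piece), 0.0), 1.0)
    return [p, 1.0 - p]


samples = [[0.1, 0.2, 0.9, 0.8], [0.5, 0.5, 0.5, 0.5]]
scores = window_scores(samples, windows=[2, 3],
                       forests=[toy_forest, toy_forest],
                       weights=[0.6, 0.4], top_k=3)
print(scores)
```

In a real pipeline the two callables would wrap fitted forest models, and the resulting per-window scores could then be normalized into window weights; how that normalization is done is not specified here.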
3. Results
3.1. Dataset Preparation
3.2. Experiments
3.3. Results
3.3.1. Classification Performance of Different Machine Learning Methods for Five Cancer Subtypes
3.3.2. Comparison with the State of the Art
4. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Noone, A.M.; Cronin, K.A.; Altekruse, S.F.; Howlader, N.; Lewis, D.R.; Petkov, V.I.; Penberthy, L. Cancer incidence and survival trends by subtype using data from the Surveillance Epidemiology and End Results Program, 1992–2013. Cancer Epidemiol. Biomark. Prev. 2016, 26, 632.
2. Choi, W.; Ochoa, A.; McConkey, D.J.; Aine, M.; Höglund, M.; Kim, W.Y.; Real, F.X.; Kiltie, A.E.; Milsom, I.; Dyrskjøt, L.; et al. Genetic alterations in the molecular subtypes of bladder cancer: Illustration in the cancer genome atlas dataset. Eur. Urol. 2017, 72, 354–365.
3. Dai, X.; Li, T.; Bai, Z.; Yang, Y.; Liu, X.; Zhan, J.; Shi, B. Breast cancer intrinsic subtype classification, clinical use and future trends. Am. J. Cancer Res. 2015, 5, 2929.
4. Feng, P.H.; Chen, T.T.; Lin, Y.T.; Chiang, S.Y.; Lo, C.M. Classification of lung cancer subtypes based on autofluorescence bronchoscopic pattern recognition: A preliminary study. Comput. Methods Programs Biomed. 2018, 163, 33–38.
5. Lee, J.S.; Heo, J.; Libbrecht, L.; Chu, I.S.; Kaposi-Novak, P.; Calvisi, D.F.; Mikaelyan, A.; Roberts, L.R.; Demetris, A.J.; Sun, Z.; et al. A novel prognostic subtype of human hepatocellular carcinoma derived from hepatic progenitor cells. Nat. Med. 2006, 12, 410.
6. Lee, E.; Yong, R.L.; Paddison, P.; Zhu, J. Comparison of glioblastoma (GBM) molecular classification methods. In Seminars in Cancer Biology; Academic Press: Cambridge, MA, USA, 2018; Volume 53, pp. 201–211.
7. Cristescu, R.; Lee, J.; Nebozhyn, M.; Kim, K.M.; Ting, J.C.; Wong, S.S.; Liu, J.; Yue, Y.G.; Wang, J.; Yu, K.; et al. Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes. Nat. Med. 2015, 21, 449.
8. Way, G.P.; Sanchez-Vega, F.; La, K.; Armenia, J.; Chatila, W.K.; Luna, A.; Sander, C.; Cherniack, A.D.; Mina, M.; Ciriello, G.; et al. Machine learning detects pan-cancer ras pathway activation in the cancer genome atlas. Cell Rep. 2018, 23, 172–180.e3.
9. Wong, K.C.; Chen, J.; Zhang, J.; Lin, J.; Yan, S.; Zhang, S.; Li, X.; Liang, C.; Peng, C.; Lin, Q.; et al. Early Cancer Detection from Multianalyte Blood Test Results. iScience 2019, 15, 332–341.
10. Sachnev, V.; Suresh, S.; Choi, Y.S. Cancer subtype’s classifier based on Hybrid Samples Balanced Genetic Algorithm and Extreme Learning Machine. J. Digit. Contents Soc. 2016, 17, 565–579.
11. Muhamed Ali, A.; Zhuang, H.; Ibrahim, A.; Rehman, O.; Huang, M.; Wu, A. A Machine Learning Approach for the Classification of Kidney Cancer Subtypes Using miRNA Genome Data. Appl. Sci. 2018, 8, 2422.
12. Flynn, W.F.; Namburi, S.; Paisie, C.A.; Reddi, H.V.; Li, S.; Karuturi, R.K.M.; George, J. Pan-cancer machine learning predictors of primary site of origin and molecular subtype. bioRxiv 2018, 333914.
13. Villa, C.; Cagle, P.T.; Johnson, M.; Patel, J.D.; Yeldandi, A.V.; Raj, R.; DeCamp, M.M.; Raparia, K. Correlation of EGFR mutation status with predominant histologic subtype of adenocarcinoma according to the new lung adenocarcinoma classification of the International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society. Arch. Pathol. Lab. Med. 2014, 138, 1353–1357.
14. Hung, F.H.; Chiu, H.W. Cancer subtype prediction from a pathway-level perspective by using a support vector machine based on integrated gene expression and protein network. Comput. Methods Programs Biomed. 2017, 141, 27–34.
15. Tomczak, K.; Czerwińska, P.; Wiznerowicz, M. The Cancer Genome Atlas (TCGA): An immeasurable source of knowledge. Contemp. Oncol. 2015, 19, A68.
16. Yu, K.H.; Zhang, C.; Berry, G.J.; Altman, R.B.; Ré, C.; Rubin, D.L.; Snyder, M. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 2016, 7, 12474.
17. Sun, D.; Wang, M.; Li, A. A multimodal deep neural network for human breast cancer prognosis prediction by integrating multi-dimensional data. IEEE/ACM Trans. Comput. Biol. Bioinform. 2018, 16, 841–850.
18. Becker, A.S.; Marcon, M.; Ghafoor, S.; Wurnig, M.C.; Frauenfelder, T.; Boss, A. Deep learning in mammography: Diagnostic accuracy of a multipurpose image analysis software in the detection of breast cancer. Investig. Radiol. 2017, 52, 434–440.
19. Cai, Z.; Xu, D.; Zhang, Q.; Zhang, J.; Ngai, S.M.; Shao, J. Classification of lung cancer using ensemble-based feature selection and machine learning methods. Mol. BioSyst. 2015, 11, 791–800.
20. Guo, Y.; Shang, X.; Li, Z. Identification of cancer subtypes by integrating multiple types of transcriptomics data with deep learning in breast cancer. Neurocomputing 2019, 324, 20–30.
21. Lu, C.F.; Hsu, F.T.; Hsieh, K.L.C.; Kao, Y.C.J.; Cheng, S.J.; Hsu, J.B.K.; Tsai, P.H.; Chen, R.J.; Huang, C.C.; Yen, Y.; et al. Machine learning–based radiomics for molecular subtyping of gliomas. Clin. Cancer Res. 2018, 24, 4429–4436.
22. Liao, Z.; Li, D.; Wang, X.; Li, L.; Zou, Q. Cancer diagnosis through IsomiR expression with machine learning method. Curr. Bioinform. 2018, 13, 57–63.
23. Xiao, Y.; Wu, J.; Lin, Z.; Zhao, X. A deep learning-based multi-model ensemble method for cancer prediction. Comput. Methods Programs Biomed. 2018, 153, 1–9.
24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
25. Cireşan, D.; Meier, U.; Schmidhuber, J. Multi-column deep neural networks for image classification. arXiv 2012, arXiv:1202.2745.
26. Ha, R.; Mutasa, S.; Karcich, J.; Gupta, N.; Van Sant, E.P.; Nemer, J.; Sun, M.; Chang, P.; Liu, M.Z.; Jambawalikar, S. Predicting Breast Cancer Molecular Subtype with MRI Dataset Utilizing Convolutional Neural Network Algorithm. J. Digit. Imaging 2019, 32, 276–282.
27. Coudray, N.; Ocampo, P.S.; Sakellaropoulos, T.; Narula, N.; Snuderl, M.; Fenyö, D.; Moreira, A.L.; Razavian, N.; Tsirigos, A. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat. Med. 2018, 24, 1559.
28. Zhou, Z.H.; Feng, J. Deep Forest: Towards an Alternative to Deep Neural Networks. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017.
29. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
30. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Neural Information Processing Systems, Harrahs and Harveys, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
31. Ray, S. Disease Classification within Dermascopic Images Using features extracted by ResNet50 and classification through Deep Forest. arXiv 2018, arXiv:1807.05711.
32. Meinshausen, N.; Bühlmann, P. Stability selection. J. R. Stat. Soc. 2010, 72, 417–473.
33. Huang, X.; Zhang, L.; Wang, B.; Li, F.; Zhang, Z. Feature clustering based support vector machine recursive feature elimination for gene selection. Appl. Intell. 2017, 48, 1–14.
34. Vinh, L.T.; Lee, S.; Park, Y.T.; d’Auriol, B.J. A novel feature selection method based on normalized mutual information. Appl. Intell. 2012, 37, 100–120.
35. Tibshirani, R. The lasso method for variable selection in the cox model. Stat. Med. 1997, 16, 385–395.
36. Lin, Y.; Liu, X.; Hao, M. Model-free feature screening for high-dimensional survival data. Sci. China Math. 2018, 61, 1617–1636.
37. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
38. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
39. Fan, W.; Wang, H.; Yu, P.S.; Ma, S. Is random model better? On its accuracy and efficiency. In Proceedings of the Third IEEE International Conference on Data Mining, Melbourne, FL, USA, 22 November 2003; p. 51.
40. Cortes, C.; Mohri, M. AUC optimization vs. error rate minimization. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation: Vancouver, BC, Canada, 2004; pp. 313–320.
41. Telonis, A.; Magee, R.; Loher, P.; Chervoneva, I.; Londin, E.; Rigoutsos, I. The presence or absence alone of miRNA isoforms (isomiRs) successfully discriminate amongst the 32 TCGA cancer types. bioRxiv 2016, 082685.
42. Li, H.; Zhu, Y.; Burnside, E.S.; Huang, E.; Drukker, K.; Hoadley, K.A.; Fan, C.; Conzen, S.D.; Zuley, M.; Net, J.M.; et al. Quantitative MRI radiomics in the prediction of molecular classifications of breast cancer subtypes in the TCGA/TCIA data set. NPJ Breast Cancer 2016, 2, 16012.
43. Sherafatian, M. Tree-based machine learning algorithms identified minimal set of miRNA biomarkers for breast cancer diagnosis and molecular subtyping. Gene 2018, 677, 111–118.
44. Podolsky, M.D.; Barchuk, A.A.; Kuznetcov, V.I.; Gusarova, N.F.; Gaidukov, V.S.; Tarakanov, S.A. Evaluation of machine learning algorithm utilization for lung cancer classification based on gene expression levels. Asian Pac. J. Cancer Prev. 2016, 17, 835–838.
45. Tan, P.S.; Nakagawa, S.; Goossens, N.; Venkatesh, A.; Huang, T.; Ward, S.C.; Sun, X.; Song, W.M.; Koh, A.; Canasto-Chibuque, C.; et al. Clinicopathological indices to predict hepatocellular carcinoma molecular classification. Liver Int. 2016, 36, 108–118.
46. Friemel, J.; Rechsteiner, M.; Frick, L.; Böhm, F.; Struckmann, K.; Egger, M.; Moch, H.; Heikenwalder, M.; Weber, A. Intratumor heterogeneity in hepatocellular carcinoma. Clin. Cancer Res. 2015, 21, 1951–1961.
47. Ryu, Y.J.; Choi, S.H.; Park, S.J.; Yun, T.J.; Kim, J.H.; Sohn, C.H. Glioma: Application of whole-tumor texture analysis of diffusion-weighted imaging for the evaluation of tumor heterogeneity. PLoS ONE 2014, 9, e108335.
Cancer | Total Samples | Available Samples | Feature Dimensions | Subtype Classes |
---|---|---|---|---|
Breast invasive carcinoma (BRCA) | 1247 | 514 | 350 | 4 |
Glioblastoma (GBM) | 629 | 546 | 240 | 4 |
Lung adenocarcinoma (LUAD) | 706 | 317 | 380 | 3 |
Stomach adenocarcinoma (STAD) | 580 | 508 | 250 | 4 |
Liver hepatocellular carcinoma (LIHC) | 167 | 167 | 390 | 4 |
Method | Metric | BRCA | LUAD | LIHC | GBM | STAD |
---|---|---|---|---|---|---|
SVM | ACC | 0.752 | 0.762 | 0.714 | 0.694 | 0.674 |
SVM | Pre | 0.755 | 0.754 | 0.726 | 0.723 | 0.619 |
SVM | Recall | 0.709 | 0.742 | 0.693 | 0.732 | 0.574 |
SVM | F1 | 0.731 | 0.748 | 0.709 | 0.727 | 0.596 |
KNN | ACC | 0.745 | 0.750 | 0.688 | 0.631 | 0.706 |
KNN | Pre | 0.774 | 0.746 | 0.708 | 0.683 | 0.697 |
KNN | Recall | 0.743 | 0.739 | 0.686 | 0.736 | 0.736 |
KNN | F1 | 0.758 | 0.742 | 0.697 | 0.709 | 0.716 |
LR | ACC | 0.730 | 0.746 | 0.718 | 0.669 | 0.658 |
LR | Pre | 0.728 | 0.756 | 0.708 | 0.683 | 0.609 |
LR | Recall | 0.663 | 0.726 | 0.701 | 0.726 | 0.559 |
LR | F1 | 0.694 | 0.741 | 0.704 | 0.704 | 0.583 |
RF | ACC | 0.691 | 0.676 | 0.716 | 0.730 | 0.674 |
RF | Pre | 0.527 | 0.532 | 0.693 | 0.751 | 0.546 |
RF | Recall | 0.475 | 0.508 | 0.699 | 0.753 | 0.563 |
RF | F1 | 0.500 | 0.520 | 0.696 | 0.752 | 0.554 |
gcForest | ACC | 0.852 | 0.820 | 0.804 | 0.836 | 0.757 |
gcForest | Pre | 0.859 | 0.821 | 0.798 | 0.857 | 0.733 |
gcForest | Recall | 0.826 | 0.819 | 0.778 | 0.850 | 0.788 |
gcForest | F1 | 0.842 | 0.820 | 0.788 | 0.853 | 0.760 |
MLW-gcForest | ACC | 0.915 | 0.866 | 0.873 | 0.885 | 0.876 |
MLW-gcForest | Pre | 0.923 | 0.863 | 0.845 | 0.863 | 0.872 |
MLW-gcForest | Recall | 0.916 | 0.852 | 0.829 | 0.878 | 0.821 |
MLW-gcForest | F1 | 0.919 | 0.857 | 0.837 | 0.870 | 0.846 |
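The F1 values reported above appear consistent with the harmonic mean of the Pre and Recall values in the same column. The short sketch below recomputes F1 for two BRCA entries taken from the table (SVM and MLW-gcForest). The helper name `f1_from` is hypothetical, and the assumption that F1 was derived directly from the reported Pre and Recall is an inference from the numbers, not a statement made in the text.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Pre, Recall) pairs for BRCA, copied from the table above.
brca_rows = {"SVM": (0.755, 0.709), "MLW-gcForest": (0.923, 0.916)}

for method, (pre, rec) in brca_rows.items():
    print(method, round(f1_from(pre, rec), 3))
```

Rounded to three decimals, both recomputed values agree with the F1 entries listed for BRCA (0.731 and 0.919).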
Cancer | Method | AUC | ACC | Pre |
---|---|---|---|---|
BRCA | Liao [22] | N/A | 0.87 | N/A |
BRCA | Guo [20] | N/A | N/A | 0.88 |
BRCA | Telonis [41] | N/A | 0.91 | N/A |
BRCA | Li [42] | 0.89 | N/A | N/A |
BRCA | Sherafatian [43] | N/A | 0.89 | 0.90 |
BRCA | MLW-gcForest | 0.98 | 0.91 | 0.92 |
LUAD | Liao [22] | N/A | 0.91 | N/A |
LUAD | Guo [20] | N/A | N/A | 0.88 |
LUAD | Telonis [41] | N/A | 0.86 | N/A |
LUAD | Podolsky [44] | 0.92 | N/A | N/A |
LUAD | Cai [19] | N/A | 0.85 | 0.86 |
LUAD | MLW-gcForest | 0.92 | 0.87 | 0.86 |
LIHC | Guo [20] | N/A | N/A | 0.82 |
LIHC | Telonis [41] | N/A | 0.90 | N/A |
LIHC | Tan [45] | 0.77 | 0.83 | N/A |
LIHC | Friemel [46] | N/A | 0.87 | N/A |
LIHC | MLW-gcForest | 0.91 | 0.87 | 0.85 |
GBM | Guo [20] | N/A | N/A | 0.78 |
GBM | Lu [21] | 0.92 | 0.88 | N/A |
GBM | Ryu [47] | 0.83 | 0.80 | N/A |
GBM | MLW-gcForest | 0.87 | 0.89 | 0.86 |
STAD | Liao [22] | N/A | 0.84 | N/A |
STAD | Telonis [41] | N/A | 0.85 | N/A |
STAD | MLW-gcForest | 0.88 | 0.87 | 0.87 |
Cancer | Methylation ACC | Methylation Pre | Methylation Recall | Methylation F1 | RNA ACC | RNA Pre | RNA Recall | RNA F1 | CNV ACC | CNV Pre | CNV Recall | CNV F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
BRCA | 0.915 | 0.923 | 0.916 | 0.919 | 0.844 | 0.851 | 0.846 | 0.848 | 0.757 | 0.722 | 0.745 | 0.733 |
LUAD | 0.866 | 0.863 | 0.852 | 0.857 | 0.807 | 0.826 | 0.824 | 0.825 | 0.739 | 0.746 | 0.726 | 0.736 |
LIHC | 0.873 | 0.845 | 0.829 | 0.837 | 0.796 | 0.816 | 0.810 | 0.813 | 0.726 | 0.731 | 0.744 | 0.737 |
GBM | 0.885 | 0.863 | 0.878 | 0.870 | 0.843 | 0.832 | 0.846 | 0.839 | 0.750 | 0.733 | 0.742 | 0.737 |
STAD | 0.876 | 0.872 | 0.821 | 0.846 | 0.739 | 0.725 | 0.714 | 0.719 | 0.668 | 0.673 | 0.676 | 0.674 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Dong, Y.; Yang, W.; Wang, J.; Zhao, J.; Qiang, Y. MLW-gcForest: A Multi-Weighted gcForest Model for Cancer Subtype Classification by Methylation Data. Appl. Sci. 2019, 9, 3589. https://doi.org/10.3390/app9173589