Lung Cancer Diagnosis System Based on Volatile Organic Compounds (VOCs) Profile Measured in Exhaled Breath
Abstract
1. Introduction
2. Materials and Methods
2.1. Patients
2.2. Methodology
2.3. Feature Selection
2.3.1. Correlation Coefficient
2.3.2. Maximum Relevance–Minimum Redundancy (mRMR)
2.4. Classification
- Support vector machine (SVM). SVM [38] is a margin-based classifier that finds the hyperplane in the feature space that maximizes the margin between samples of distinct classes, which gives it strong generalization capability. In this study, a linear SVM is used; its decision boundary is determined by the support vectors, i.e., the training samples closest to the boundary. The class of a new sample is estimated from the N support vectors and their weights.
- Logistic regression (LR). In LR [39], a linear predictor is formed as a linear combination of the input variables, and a logistic function converts the value of this predictor into a probability. Equivalently, the logit (log-odds) of the event occurring is modeled as a linear function of the inputs.
- k-nearest neighbors (kNN). The kNN algorithm [39] is a non-parametric method for classification and regression. For a query sample, it looks up the k closest training examples in the feature space and, for classification, outputs a class membership: the sample is assigned to the most frequent class among its k nearest neighbors by majority vote. If k = 1, the sample is simply assigned to the class of its single closest neighbor. In the proposed system, k = 7 is used, so the class receiving the majority vote among the seven closest training samples is selected as the classification result.
- Naïve Bayes (NB). NB applies the Bayes rule under the assumption that the features (variables) are independent of one another given the class. For a training sample s described by the levels of m VOCs (the m features), the posterior probability that s belongs to a class is therefore proportional to the prior probability of that class multiplied by the product of the m per-feature likelihoods.
- Decision tree (DT). Decision-tree induction is an efficient approach for constructing classifiers from data, and the tree is one of the most commonly used logic-based representations. A typical decision-tree learner works top-down: it recursively splits the feature space into subregions and uses the Gini index, gain ratio or information gain to measure the class purity of each split. These metrics are used by the classification and regression tree (CART), C4.5 and ID3 algorithms, respectively. In this study, CART is used [39].
- Random forest (RF). Random forest [39] is an ensemble learning method for classification, regression and other tasks that builds a large number of decision trees during training and outputs the mode of the classes predicted by the individual trees (classification) or the mean of their predictions (regression). It averages many deep decision trees trained on different parts of the same training set, with the goal of reducing variance. This comes at the cost of a small increase in bias and some loss of interpretability, but it usually yields a significant gain in the performance of the final model. In this study, the RF is built from 1000 CART decision trees.
- Bagging classifier. Bagging classifiers [38] are ensemble meta-estimators that fit base classifiers on random subsets of the original dataset and then aggregate their individual predictions (by voting or averaging) to form a final prediction. Introducing randomness into the construction of a black-box estimator (e.g., a decision tree) and then building an ensemble from it in this way reduces the variance of that estimator. In this study, the base estimator is a linear SVM, and the number of base estimators is 10.
- AdaBoost classifier. An AdaBoost classifier [39] is a meta-estimator that first fits a classifier on the original dataset and then fits successive copies of the classifier on the same dataset, with the weights of incorrectly classified samples adjusted so that subsequent classifiers focus more on the difficult cases. In this study, the base estimator is the CART decision tree, and the number of base estimators is 100.
- Neural networks (NN). The multilayer perceptron (MLP) [39] is a type of artificial neural network used to model complicated functions. It has three or more layers of nodes: an input layer, one or more hidden layers and an output layer, as shown in Figure 3. In contrast to conventional techniques, an MLP makes no prior assumptions about the distribution of the training data, which removes the influence of the data distribution on performance. An activation function maps each node's input to its output, and a cost function is used to determine the best parameter values. The network is trained over many iterations, with the backpropagation method used to learn the NN parameters. In this study, the Adam optimizer is used to find the best parameters. The rectified linear unit (ReLU) activation function is employed in the hidden layers, and the sigmoid activation function is used in the output layer to produce a prediction between 0 and 1, with a value larger than 0.5 indicating malignancy and a value less than 0.5 indicating benignity.
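The classifier settings stated in the list above can be sketched with scikit-learn. This is an illustrative setup under several assumptions, not the authors' implementation: the source does not name a library, the Gaussian likelihood for Naïve Bayes, the single hidden layer of 32 units, the depth-1 trees inside AdaBoost and all `max_iter` values are choices made here, and the data come from `make_classification` as a hypothetical stand-in for the VOC profiles. Only k = 7, the linear SVM, the Gini-based CART trees, the 1000/10/100 ensemble sizes, and the Adam/ReLU/sigmoid network settings come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(seed=0):
    """Return the nine classifiers, configured as far as the text specifies."""
    return {
        "SVM": SVC(kernel="linear"),
        "LR": LogisticRegression(max_iter=1000),  # max_iter is an assumption
        "kNN": KNeighborsClassifier(n_neighbors=7),
        "NB": GaussianNB(),  # Gaussian likelihood is an assumption
        # scikit-learn trees are CART-style; Gini is the splitting criterion.
        "DT": DecisionTreeClassifier(criterion="gini", random_state=seed),
        "RF": RandomForestClassifier(n_estimators=1000, criterion="gini",
                                     random_state=seed),
        # The base estimator is passed positionally, which works across
        # scikit-learn versions regardless of the keyword name.
        "Bagging": BaggingClassifier(SVC(kernel="linear"), n_estimators=10,
                                     random_state=seed),
        # Tree depth is not given in the text; depth 1 is an assumption.
        "AdaBoost": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                       n_estimators=100, random_state=seed),
        # Hidden-layer size is an assumption. For binary targets the output
        # unit is logistic (sigmoid) and predict() thresholds at 0.5.
        "NN": MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                            solver="adam", max_iter=1000, random_state=seed),
    }


# Hypothetical data: 300 samples with 10 features standing in for VOC levels.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

scores = {}
for name, clf in build_classifiers().items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # held-out accuracy

for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

The printed accuracies describe the synthetic data only and say nothing about the study's results. Keeping all models in one dictionary makes it easy to run the same split and the same evaluation over every classifier.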
3. Results
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021, 71, 209–249.
- Midthun, D.E. Early diagnosis of lung cancer. F1000prime Rep. 2013, 5, 12.
- Siegel, R.L.; Miller, K.D.; Fuchs, H.E.; Jemal, A. Cancer statistics, 2021. CA Cancer J. Clin. 2021, 71, 7–33.
- Gordienko, Y.; Gang, P.; Hui, J.; Zeng, W.; Kochura, Y.; Alienin, O.; Rokovyi, O.; Stirenko, S. Deep learning with lung segmentation and bone shadow exclusion techniques for chest X-ray analysis of lung cancer. In Proceedings of the International Conference on Computer Science, Engineering and Education Applications, Kiev, Ukraine, 18–20 January 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 638–647.
- Feng, P.H.; Chen, T.T.; Lin, Y.T.; Chiang, S.Y.; Lo, C.M. Classification of lung cancer subtypes based on autofluorescence bronchoscopic pattern recognition: A preliminary study. Comput. Methods Programs Biomed. 2018, 163, 33–38.
- Hyun, S.H.; Ahn, M.S.; Koh, Y.W.; Lee, S.J. A machine-learning approach using PET-based radiomics to predict the histological subtypes of lung cancer. Clin. Nucl. Med. 2019, 44, 956–960.
- Bębas, E.; Borowska, M.; Derlatka, M.; Oczeretko, E.; Hładuński, M.; Szumowski, P.; Mojsak, M. Machine-learning-based classification of the histological subtype of non-small-cell lung cancer using MRI texture analysis. Biomed. Signal Process. Control 2021, 66, 102446.
- De Mesquita, V.A.; Cortez, P.C.; Ribeiro, A.B.; de Albuquerque, V.H.C. A novel method for lung nodule detection in computed tomography scans based on Boolean equations and vector of filters techniques. Comput. Electr. Eng. 2022, 100, 107911.
- Da Nóbrega, R.V.M.; Rebouças Filho, P.P.; Rodrigues, M.B.; da Silva, S.P.; Dourado Júnior, C.M.; de Albuquerque, V.H.C. Lung nodule malignancy classification in chest computed tomography images using transfer learning and convolutional neural networks. Neural Comput. Appl. 2020, 32, 11065–11082.
- Barros, A.C.; Ramalho, G.L.; Pereira, C.R.; Papa, J.P.; de Albuquerque, V.H.C.; Tavares, J.M.R. Automated recognition of lung diseases in CT images based on the optimum-path forest classifier. Neural Comput. Appl. 2019, 31, 901–914.
- Rodrigues, M.B.; Da Nobrega, R.V.M.; Alves, S.S.A.; Reboucas Filho, P.P.; Duarte, J.B.F.; Sangaiah, A.K.; De Albuquerque, V.H.C. Health of things algorithms for malignancy level classification of lung nodules. IEEE Access 2018, 6, 18592–18601.
- Jett, J. Screening for lung cancer: Who should be screened? Arch. Pathol. Lab. Med. 2012, 136, 1511–1514.
- Liu, B.; Ricarte Filho, J.; Mallisetty, A.; Villani, C.; Kottorou, A.; Rodgers, K.; Chen, C.; Ito, T.; Holmes, K.; Gastala, N.; et al. Detection of Promoter DNA Methylation in Urine and Plasma Aids the Detection of Non–Small Cell Lung Cancer. Clin. Cancer Res. 2020, 26, 4339–4348.
- Li, R.; Todd, N.W.; Qiu, Q.; Fan, T.; Zhao, R.Y.; Rodgers, W.H.; Fang, H.B.; Katz, R.L.; Stass, S.A.; Jiang, F. Genetic Deletions in Sputum as Diagnostic Markers for Early Detection of Stage I Non–Small Cell Lung Cancer. Clin. Cancer Res. 2007, 13, 482–487.
- Hanai, Y.; Shimono, K.; Matsumura, K.; Vachani, A.; Albelda, S.; Yamazaki, K.; Beauchamp, G.K.; Oka, H. Urinary volatile compounds as biomarkers for lung cancer. Biosci. Biotechnol. Biochem. 2012, 76, 679–684.
- Zhang, C.; Leng, W.; Sun, C.; Lu, T.; Chen, Z.; Men, X.; Wang, Y.; Wang, G.; Zhen, B.; Qin, J. Urine proteome profiling predicts lung cancer from control cases and other tumors. EBioMedicine 2018, 30, 120–128.
- Li, C.; Hong, W. Research status and funding trends of lung cancer biomarkers. J. Thorac. Dis. 2013, 5, 698.
- Bel’skaya, L.V.; Sarf, E.A.; Kosenok, V.K.; Gundyrev, I.A. Biochemical markers of saliva in lung cancer: Diagnostic and prognostic perspectives. Diagnostics 2020, 10, 186.
- Taghizadeh-Hesary, F.; Akbari, H.; Bahadori, M. Anti-mitochondrial therapy: A potential therapeutic approach in oncology. Preprints 2022.
- Janfaza, S.; Khorsand, B.; Nikkhah, M.; Zahiri, J. Digging deeper into volatile organic compounds associated with cancer. Biol. Methods Protoc. 2019, 4, bpz014.
- Kort, S.; Tiggeloven, M.; Brusse-Keizer, M.; Gerritsen, J.; Schouwink, J.; Citgez, E.; de Jongh, F.; Samii, S.; van der Maten, J.; van den Bogart, M.; et al. Multi-centre prospective study on diagnosing subtypes of lung cancer by exhaled-breath analysis. Lung Cancer 2018, 125, 223–229.
- Phillips, M.; Altorki, N.; Austin, J.H.; Cameron, R.B.; Cataneo, R.N.; Greenberg, J.; Kloss, R.; Maxfield, R.A.; Munawar, M.I.; Pass, H.I.; et al. Prediction of lung cancer using volatile biomarkers in breath 1. Cancer Biomark. 2007, 3, 95–109.
- Bajtarevic, A.; Ager, C.; Pienz, M.; Klieber, M.; Schwarz, K.; Ligor, M.; Ligor, T.; Filipiak, W.; Denz, H.; Fiegl, M.; et al. Noninvasive detection of lung cancer by analysis of exhaled breath. BMC Cancer 2009, 9, 1–16.
- Mazzone, P.J.; Wang, X.F.; Lim, S.; Jett, J.; Choi, H.; Zhang, Q.; Beukemann, M.; Seeley, M.; Martino, R.; Rhodes, P. Progress in the development of volatile exhaled breath signatures of lung cancer. Ann. Am. Thorac. Soc. 2015, 12, 752–757.
- Gasparri, R.; Santonico, M.; Valentini, C.; Sedda, G.; Borri, A.; Petrella, F.; Maisonneuve, P.; Pennazza, G.; D’Amico, A.; Di Natale, C.; et al. Volatile signature for the early diagnosis of lung cancer. J. Breath Res. 2016, 10, 016007.
- Bousamra, M., II; Schumer, E.; Li, M.; Knipp, R.J.; Nantz, M.H.; Van Berkel, V.; Fu, X.A. Quantitative analysis of exhaled carbonyl compounds distinguishes benign from malignant pulmonary disease. J. Thorac. Cardiovasc. Surg. 2014, 148, 1074–1081.
- Fu, X.A.; Li, M.; Knipp, R.J.; Nantz, M.H.; Bousamra, M. Noninvasive detection of lung cancer using exhaled breath. Cancer Med. 2014, 3, 174–181.
- Li, M.; Yang, D.; Brock, G.; Knipp, R.J.; Bousamra, M.; Nantz, M.H.; Fu, X.A. Breath carbonyl compounds as biomarkers of lung cancer. Lung Cancer 2015, 90, 92–97.
- Schumer, E.M.; Trivedi, J.R.; van Berkel, V.; Black, M.C.; Li, M.; Fu, X.A.; Bousamra, M., II. High sensitivity for lung cancer detection using analysis of exhaled carbonyl compounds. J. Thorac. Cardiovasc. Surg. 2015, 150, 1517–1524.
- Schumer, E.M.; Black, M.C.; Bousamra, M., II; Trivedi, J.R.; Li, M.; Fu, X.A.; van Berkel, V. Normalization of exhaled carbonyl compounds after lung cancer resection. Ann. Thorac. Surg. 2016, 102, 1095–1100.
- Pesesse, R.; Stefanuto, P.H.; Schleich, F.; Louis, R.; Focant, J.F. Multimodal chemometric approach for the analysis of human exhaled breath in lung cancer patients by TD-GC×GC-TOFMS. J. Chromatogr. B 2019, 1114, 146–153.
- Koureas, M.; Kirgou, P.; Amoutzias, G.; Hadjichristodoulou, C.; Gourgoulianis, K.; Tsakalof, A. Target analysis of volatile organic compounds in exhaled breath for lung cancer discrimination from other pulmonary diseases and healthy persons. Metabolites 2020, 10, 317.
- Tsou, P.H.; Lin, Z.L.; Pan, Y.C.; Yang, H.C.; Chang, C.J.; Liang, S.K.; Wen, Y.F.; Chang, C.H.; Chang, L.Y.; Yu, K.L.; et al. Exploring volatile organic compounds in breath for high-accuracy prediction of lung cancer. Cancers 2021, 13, 1431.
- Hakim, M.; Broza, Y.Y.; Barash, O.; Peled, N.; Phillips, M.; Amann, A.; Haick, H. Volatile organic compounds of lung cancer and possible biochemical pathways. Chem. Rev. 2012, 112, 5949–5966.
- Kim, K.H.; Jahan, S.A.; Kabir, E. A review of breath analysis for diagnosis of human health. TrAC Trends Anal. Chem. 2012, 33, 1–8.
- Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson correlation coefficient. In Noise Reduction in Speech Processing; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4.
- Brown, G.; Pocock, A.; Zhao, M.J.; Luján, M. Conditional likelihood maximisation: A unifying framework for information theoretic feature selection. J. Mach. Learn. Res. 2012, 13, 27–66.
- Asuntha, A.; Srinivasan, A. Deep learning for lung Cancer detection and classification. Multimed. Tools Appl. 2020, 79, 7731–7762.
- Gupta, S.; Sedamkar, R. Machine learning for healthcare: Introduction. In Machine Learning with Health Care Perspective; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 1–25.
- Murray, R.K.; Granner, D.K.; Rodwell, V.W. Harper’s Illustrated Biochemistry, 27th ed.; Lange Medical Books: New York, NY, USA, 2006.
- Kischkel, S.; Miekisch, W.; Sawacki, A.; Straker, E.M.; Trefz, P.; Amann, A.; Schubert, J.K. Breath biomarkers for lung cancer detection and assessment of smoking related effects—Confounding variables, influence of normalization and statistical algorithms. Clin. Chim. Acta 2010, 411, 1637–1644.
- Sponring, A.; Filipiak, W.; Mikoviny, T.; Ager, C.; Schubert, J.; Miekisch, W.; Amann, A.; Troppmair, J. Release of volatile organic compounds from the lung cancer cell line NCI-H2087 in vitro. Anticancer Res. 2009, 29, 419–426.
Characteristic | Value |
---|---|
Age (years) | 30–96 |
Malignant | 252 |
Height (cm) | 126–193 |
Weight (kg) | 33–183 |
Active smoker | 169 |
Previous smoker | 232 |
Lifelong non-smoker | 100 |
Personal history of lung cancer | 61 |
Family history of lung cancer | 125 |
Algorithm | Accuracy | Sensitivity | Specificity | F-Score |
---|---|---|---|---|
SVM | 0.84 | 0.78 | 0.90 | 0.83 |
LR | 0.82 | 0.74 | 0.90 | 0.80 |
kNN | 0.80 | 0.70 | 0.90 | 0.78 |
NB | 0.59 | 0.32 | 0.86 | 0.44 |
DT | 0.68 | 0.58 | 0.78 | 0.64 |
RF | 0.87 | 0.88 | 0.86 | 0.87 |
Bagging | 0.84 | 0.82 | 0.86 | 0.84 |
AdaBoost | 0.75 | 0.74 | 0.76 | 0.75 |
NN | 0.69 | 0.66 | 0.73 | 0.68 |
Algorithm | Accuracy | Sensitivity | Specificity | F-Score |
---|---|---|---|---|
SVM | 0.83 | 0.79 | 0.87 | 0.83 |
LR | 0.81 | 0.78 | 0.84 | 0.80 |
kNN | 0.79 | 0.68 | 0.89 | 0.76 |
NB | 0.63 | 0.38 | 0.87 | 0.51 |
DT | 0.68 | 0.67 | 0.70 | 0.68 |
RF | 0.84 | 0.86 | 0.83 | 0.84 |
Bagging | 0.83 | 0.81 | 0.86 | 0.83 |
AdaBoost | 0.74 | 0.76 | 0.71 | 0.74 |
NN | 0.68 | 0.68 | 0.68 | 0.68 |
Algorithm | Accuracy | Sensitivity | Specificity | F-Score |
---|---|---|---|---|
SVM | 0.82 | 0.78 | 0.87 | 0.81 |
LR | 0.81 | 0.78 | 0.84 | 0.80 |
kNN | 0.77 | 0.66 | 0.88 | 0.74 |
NB | 0.63 | 0.39 | 0.86 | 0.51 |
DT | 0.69 | 0.66 | 0.72 | 0.68 |
RF | 0.82 | 0.84 | 0.80 | 0.83 |
Bagging | 0.82 | 0.80 | 0.84 | 0.82 |
AdaBoost | 0.78 | 0.80 | 0.75 | 0.78 |
NN | 0.69 | 0.67 | 0.71 | 0.68 |
Algorithm | Accuracy | Sensitivity | Specificity | F-Score |
---|---|---|---|---|
SVM | 0.79 | 0.68 | 0.90 | 0.76 |
LR | 0.78 | 0.66 | 0.90 | 0.75 |
kNN | 0.87 | 0.78 | 0.96 | 0.86 |
NB | 0.62 | 0.36 | 0.88 | 0.49 |
DT | 0.74 | 0.68 | 0.80 | 0.72 |
RF | 0.87 | 0.86 | 0.88 | 0.87 |
Bagging | 0.79 | 0.66 | 0.92 | 0.76 |
AdaBoost | 0.75 | 0.74 | 0.76 | 0.75 |
NN | 0.69 | 0.58 | 0.80 | 0.65 |
Algorithm | Accuracy | Sensitivity | Specificity | F-Score |
---|---|---|---|---|
SVM | 0.79 | 0.68 | 0.90 | 0.76 |
LR | 0.78 | 0.66 | 0.90 | 0.75 |
kNN | 0.81 | 0.76 | 0.86 | 0.80 |
NB | 0.75 | 0.54 | 0.96 | 0.68 |
DT | 0.65 | 0.60 | 0.71 | 0.63 |
RF | 0.84 | 0.80 | 0.88 | 0.83 |
Bagging | 0.82 | 0.72 | 0.92 | 0.80 |
AdaBoost | 0.78 | 0.72 | 0.84 | 0.77 |
NN | 0.73 | 0.68 | 0.78 | 0.72 |
Algorithm | Accuracy | Sensitivity | Specificity | F-Score |
---|---|---|---|---|
SVM | 0.86 | 0.80 | 0.92 | 0.85 |
LR | 0.86 | 0.80 | 0.92 | 0.85 |
kNN | 0.77 | 0.68 | 0.86 | 0.75 |
NB | 0.81 | 0.66 | 0.96 | 0.78 |
DT | 0.65 | 0.58 | 0.73 | 0.62 |
RF | 0.66 | 0.60 | 0.73 | 0.64 |
Bagging | 0.86 | 0.80 | 0.92 | 0.85 |
AdaBoost | 0.66 | 0.60 | 0.73 | 0.64 |
NN | 0.86 | 0.80 | 0.92 | 0.85 |
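The four columns reported in the tables above can be computed from the counts of a binary confusion matrix. The sketch below assumes the standard definitions with malignant as the positive class, which the extracted text does not spell out; the function name `diagnostic_metrics` and the counts passed to it are made up for illustration and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and F-score.

    tp/fn: malignant cases predicted malignant/benign.
    tn/fp: benign cases predicted benign/malignant.
    Assumes every denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)   # recall on the malignant class
    specificity = tn / (tn + fp)   # recall on the benign class
    precision = tp / (tp + fp)     # share of malignant calls that are correct
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # F-score: harmonic mean of precision and sensitivity.
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }


# Made-up counts for illustration only (50 malignant, 50 benign samples).
metrics = diagnostic_metrics(tp=39, fn=11, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because the F-score combines precision and sensitivity, it can differ from accuracy even when the classes are balanced, which is why the tables list it in a separate column.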
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shaffie, A.; Soliman, A.; Eledkawy, A.; Fu, X.-A.; Nantz, M.H.; Giridharan, G.; van Berkel, V.; El-Baz, A. Lung Cancer Diagnosis System Based on Volatile Organic Compounds (VOCs) Profile Measured in Exhaled Breath. Appl. Sci. 2022, 12, 7165. https://doi.org/10.3390/app12147165