Machine Learning Approaches to Identify Discriminative Signatures of Volatile Organic Compounds (VOCs) from Bacteria and Fungi Using SPME-DART-MS
Abstract
1. Introduction
2. Results and Discussions
2.1. Classification of Pathogens as Bacteria or Fungi Based on VOC Signatures
2.2. Classifying Individual Bacterial Strains
2.3. Discussions
3. Materials and Methods
3.1. Sample Preparation
3.2. Ambient Plasma Ionization Mass Spectrometry
3.3. Data Preprocessing
3.4. Machine Learning Classification Algorithms
- Logistic Regression with ‘L2’ regularization (Ridge Regression): This is a simple linear classification model that performs well when the classes are linearly separable. A binary classifier was implemented with the stochastic average gradient (SAG) solver and regularized with an ‘L2’ prior.
- Logistic Regression with ‘L1’ regularization (Lasso Regression): This is also a linear model, but it promotes sparsity in the learnable parameters, which can be seen as weights for each variable. The classifier was implemented with the stochastic average gradient (SAG) solver and regularized with an ‘L1’ prior.
- Decision Trees and Random Forests: The decision tree algorithm learns to predict the class of a given input through a series of simple decision rules inferred from the training data. Random forests are ensemble classifiers that train many decision trees on different subsets of features, with each tree trained on a bootstrapped subset of the training data. A random forest classifier was trained on the PCA-transformed data, as well as on the binary feature matrix. A major advantage of these methods is that they help identify the subsets of input variables that are most or least relevant to the problem. In our case, we can see the exact peak locations that were of discriminatory importance to the classifier (Figure 2b).
- Support Vector Machines (SVM): A support vector machine classifier works by finding the classification boundary that best separates the data points in the training set. It is not limited to linear models and can find an optimal separation in higher-dimensional spaces. An SVM classifier was trained on both the PCA-transformed data and the binary feature matrix. A 5-fold cross-validation-based grid search was used to choose between linear, radial basis function (rbf), and sigmoid kernels, as well as to choose the optimal hyperparameters.
- K-Nearest Neighbors (KNN): The K-nearest neighbors algorithm classifies a new data point by considering the classes of the k training points that lie closest to it in the feature space and choosing the most frequently occurring class label. A KNN classifier was trained using 8 nearest neighbors, a value determined by a 5-fold cross-validation-based grid search.
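As an illustration of the K-nearest-neighbors vote and the 5-fold cross-validation-based grid search described in the list above, the following is a minimal standard-library sketch. It is not the implementation used in the study: the toy binary matrix, the helper names (`knn_predict`, `cv_accuracy`, `grid_search_k`, `toy_row`), the interleaved (unstratified) fold assignment, and the candidate values of k are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training points."""
    # Rank training rows by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # On a tied vote, the label encountered first (the nearer one) wins.
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, n_folds=5):
    """Mean k-NN accuracy over n_folds interleaved folds (row i goes to fold i % n_folds)."""
    scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / n_folds


def grid_search_k(X, y, candidates, n_folds=5):
    """Return the candidate k with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: cv_accuracy(X, y, k, n_folds))


def toy_row(i):
    """Hypothetical peak-presence vector: class pattern with one bit flipped as noise."""
    base = [1, 1, 1, 0, 0, 0] if i % 2 == 0 else [0, 0, 0, 1, 1, 1]
    flip = (i // 2) % 6
    base[flip] = 1 - base[flip]
    return base


# Toy stand-in for a binary feature matrix (NOT data from the study).
X = [toy_row(i) for i in range(20)]
y = ["bacteria" if i % 2 == 0 else "fungi" for i in range(20)]

best_k = grid_search_k(X, y, candidates=[1, 3, 5, 7])
print("best k:", best_k, "CV accuracy:", cv_accuracy(X, y, best_k))
```

The same grid-search loop would apply to the SVM kernel choice: replace the candidate list of k values with candidate kernel and hyperparameter settings and score each by its cross-validated accuracy.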
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Classifier | Dataset | Accuracy | F-Score | Area under the ROC Curve (AUC) | Bacteria Precision | Bacteria Sensitivity | Fungi Precision | Fungi Sensitivity |
---|---|---|---|---|---|---|---|---|
Logistic Regression | Binary Features | 0.846 | 0.748 | 0.865 | 0.903 | 0.899 | 0.639 | 0.667 |
Logistic Regression | PCA Features | 0.846 | 0.843 | 0.775 | 0.853 | 0.970 | 0.833 | 0.444 |
Logistic Regression with Lasso | Binary Features | 0.795 | 0.753 | 0.921 | 0.928 | 0.870 | 0.633 | 0.722 |
Logistic Regression with Lasso | PCA Features | 0.795 | 0.742 | 0.827 | 0.895 | 0.870 | 0.633 | 0.639 |
K-Nearest Neighbors | Binary Features | 0.821 | 0.782 | 0.743 | 0.886 | 0.903 | 0.700 | 0.583 |
K-Nearest Neighbors | PCA Features | 0.820 | 0.812 | 0.752 | 0.886 | 0.936 | 0.750 | 0.583 |
Support Vector Machines | Binary Features | 0.795 | 0.657 | 0.805 | 0.870 | 0.862 | 0.528 | 0.583 |
Support Vector Machines | PCA Features | 0.821 | 0.670 | 0.734 | 0.842 | 0.899 | 0.556 | 0.444 |
Random Forest Classifier | Binary Features | 0.872 | 0.937 | 0.951 | 0.881 | 0.982 | 0.980 | 0.555 |
Random Forest Classifier | PCA Features | 0.744 | 0.570 | 0.779 | 0.794 | 0.903 | 0.444 | 0.194 |
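For readers less familiar with the per-class columns in the table above, the sketch below computes accuracy and per-class precision, sensitivity (recall), and F1 from predicted labels using the standard definitions. The label lists are invented for illustration and the function names are hypothetical; how the F-score column was averaged across classes is not stated in this section, so the sketch reports F1 per class only.

```python
def class_metrics(y_true, y_pred, positive):
    """Precision, sensitivity (recall), and F1, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels for illustration only (NOT results from the study).
y_true = ["bacteria"] * 6 + ["fungi"] * 4
y_pred = ["bacteria"] * 5 + ["fungi"] + ["fungi"] * 3 + ["bacteria"]

print("accuracy:", accuracy(y_true, y_pred))
for label in ("bacteria", "fungi"):
    print(label, class_metrics(y_true, y_pred, label))
```

Reporting precision and sensitivity separately for each class matters when one class has far fewer samples than the other, because overall accuracy can stay high even when the smaller class is often missed.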
Pathogen | Strain |
---|---|
Bacteria | Staphylococcus aureus (LAC) |
Bacteria | Staphylococcus aureus (UAMS-1) |
Bacteria | Staphylococcus epidermidis (NRS-101) |
Bacteria | Acinetobacter baumannii (CDC-0033) |
Bacteria | Klebsiella pneumoniae (CDC-0004) |
Bacteria | Pseudomonas aeruginosa (PA01) |
Bacteria | Klebsiella aerogenes (NR-48555) |
Bacteria | Enterococcus faecium (HM-959) |
Bacteria | Escherichia coli (CDC-0346) |
Bacteria | Proteus mirabilis (CDC-0029) |
Fungi | Candida albicans (NR-29340) |
Fungi | Candida glabrata (CDC-0314) |
Fungi | Malassezia furfur (ATCC-12078) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Arora, M.; Zambrzycki, S.C.; Levy, J.M.; Esper, A.; Frediani, J.K.; Quave, C.L.; Fernández, F.M.; Kamaleswaran, R. Machine Learning Approaches to Identify Discriminative Signatures of Volatile Organic Compounds (VOCs) from Bacteria and Fungi Using SPME-DART-MS. Metabolites 2022, 12, 232. https://doi.org/10.3390/metabo12030232