Integrating Metabolomics and Machine Learning for Advanced Chemical Detection
Abstract
1. Introduction
2. Metabolomics Data Characteristics and Analytical Platforms
2.1. Chemical Diversity and Structural Complexity
2.2. Dynamic Range and Quantitative Variability
2.3. Missing Data and Sparsity
2.4. Technical Variability and Batch Effects
2.5. Noise, Signal Overlap, and Data Preprocessing
2.6. Analytical Platforms for Metabolomics
2.6.1. Mass Spectrometry (MS)-Based Platforms
2.6.2. Nuclear Magnetic Resonance (NMR) Spectroscopy
2.6.3. Ion Mobility Spectroscopy (IMS) and Emerging Technologies
3. Machine Learning Strategies in Metabolomics
3.1. Unsupervised Learning
3.2. Supervised Learning
3.3. Deep Learning Approaches
3.4. Feature Selection and Model Interpretation
3.5. Critical Comparison of Machine Learning Approaches in Metabolomics
4. Applications in Advanced Chemical Detection
4.1. Biomedical Diagnostics
4.2. Environmental and Toxicological Analysis
4.3. Food Authenticity and Safety
4.4. Drug Discovery and Precision Medicine
4.5. Sensor-Based and Portable Chemical Detection
4.6. Critical Appraisal of Evidence Quality and Translational Readiness
5. Challenges and Limitations
6. Future Perspectives and Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| ANN | Artificial Neural Networks |
| CE–MS | Capillary Electrophoresis–Mass Spectrometry |
| CNN | Convolutional Neural Networks |
| DL | Deep Learning |
| GC–MS | Gas Chromatography–Mass Spectrometry |
| IM | Ion Mobility |
| LASSO | Least Absolute Shrinkage and Selection Operator |
| LC–MS | Liquid Chromatography–Mass Spectrometry |
| LIME | Local Interpretable Model-Agnostic Explanations |
| MAR | Missing at Random |
| MCAR | Missing Completely at Random |
| ML | Machine Learning |
| MNAR | Missing Not at Random |
| NMR | Nuclear Magnetic Resonance Spectroscopy |
| PCA | Principal Component Analysis |
| PLS-DA | Partial Least Squares Discriminant Analysis |
| QC | Quality Control |
| RF | Random Forest |
| SHAP | Shapley Additive Explanations |
| SMLM | Supervised Machine Learning Methods |
| SVM | Support Vector Machines |
| UMLM | Unsupervised Machine Learning Methods |
| XAI | Explainable Artificial Intelligence |

| Method | Learning Type | Main Strengths | Main Limitations | Validation Requirements | Typical Use in Metabolomics |
|---|---|---|---|---|---|
| PCA | Unsupervised | Simple, interpretable, useful for visualization and outlier detection | Not predictive; captures variance, not necessarily class relevance | Assessment of score plots, loading plots, and technical confounders | Exploratory analysis, batch-effect inspection, quality control |
| Hierarchical clustering/k-means | Unsupervised | Identifies natural sample or metabolite groupings | Sensitive to scaling, distance metrics, and cluster-number selection | Stability analysis and biological plausibility assessment | Sample grouping, metabolite-pattern exploration |
| PLS-DA | Supervised | Interpretable, handles collinearity, widely used in metabolomics | High risk of overfitting; may generate optimistic classification results | Cross-validation, permutation testing, external validation | Classification, biomarker prioritization |
| Random Forest | Supervised | Robust to noise, captures nonlinear relationships, provides variable importance | Can overfit small datasets; variable importance may be biased | Nested cross-validation and external validation | Classification, feature ranking, biomarker discovery |
| SVM | Supervised | Effective with high-dimensional data; suitable for nonlinear classification via kernel functions | Requires parameter tuning; limited interpretability | Hyperparameter optimization and independent validation | Classification of complex metabolomics profiles |
| Artificial neural networks | Deep learning | Captures nonlinear interactions; flexible model structure | Requires larger datasets; black-box behavior | Large training sets, regularization, external validation | Prediction and classification in large datasets |
| CNNs | Deep learning | Effective for spectral or image-like data; automatic feature extraction | Computationally demanding; limited interpretability | Independent validation and explainability analysis | Spectral analysis, imaging metabolomics |
| Autoencoders | Deep learning/representation learning | Useful for dimensionality reduction and latent-feature extraction | Latent features may be difficult to interpret biologically | Reconstruction error assessment and downstream validation | Feature extraction, denoising, data compression |
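
The validation requirements listed above for PLS-DA and the other supervised methods (cross-validation plus permutation testing) can be illustrated with a minimal, dependency-free Python sketch. It is not the author's workflow: as stated assumptions, a simple nearest-centroid classifier stands in for PLS-DA, leave-one-out cross-validation is used, the data are synthetic, and all function names (`nearest_centroid_predict`, `loo_accuracy`, `permutation_test`) are hypothetical. The idea is that the class labels are repeatedly shuffled, the whole cross-validation is rerun on each shuffled copy, and the observed accuracy is compared with this null distribution.

```python
import random
from statistics import fmean


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the class whose mean profile (centroid) is closest to x."""
    centroids = {}
    # Sorted labels make tie-breaking deterministic across runs.
    for label in sorted(set(train_y)):
        rows = [row for row, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def loo_accuracy(X, y):
    """Leave-one-out cross-validated accuracy of the nearest-centroid classifier."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


def permutation_test(X, y, n_permutations=200, seed=0):
    """Return the observed CV accuracy and its permutation p-value."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any real link between profiles and labels
        null_scores.append(loo_accuracy(X, shuffled))
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    n_extreme = sum(score >= observed for score in null_scores)
    return observed, (n_extreme + 1) / (n_permutations + 1)


# Hypothetical toy data: 10 controls and 10 cases, 5 features each;
# cases are shifted by +3 units on the first two features only.
rng = random.Random(42)
controls = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(10)]
cases = [[rng.gauss(3.0 if j < 2 else 0.0, 1.0) for j in range(5)] for _ in range(10)]
X = controls + cases
y = ["control"] * 10 + ["case"] * 10

acc, p = permutation_test(X, y)
print(f"LOO accuracy = {acc:.2f}, permutation p-value = {p:.3f}")
```

With 200 permutations the smallest attainable p-value is 1/201, so the number of permutations bounds the strength of evidence that can be claimed. The same logic is available in general-purpose libraries (for example, scikit-learn's `permutation_test_score`), where any classifier, including a PLS-based one, can be plugged in.
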

| Application Area | Detection Target | Typical Analytical Platform | Common ML Methods | Main Advantages | Main Practical Limitations |
|---|---|---|---|---|---|
| Biomedical diagnostics | Disease-associated metabolite signatures | LC-MS, GC-MS, NMR | RF, SVM, PLS-DA, neural networks | Early detection, non-invasive biomarker discovery, patient stratification | Limited external validation, small cohorts, biological heterogeneity |
| Environmental monitoring | Pollutants, xenobiotics, exposure signatures | LC-MS, GC-MS, NMR, sensor arrays | PCA, RF, SVM, clustering | Detection of exposure-related metabolic perturbations | Matrix effects, environmental variability, lack of standardized datasets |
| Food authenticity and safety | Adulteration, geographical origin, contaminants, spoilage | NMR, LC-MS, GC-MS, electronic nose/tongue | PLS-DA, SVM, RF, hybrid PCA-ML | Rapid classification, traceability, quality control | Product variability, batch effects, calibration transfer |
| Drug discovery and precision medicine | Drug-response metabolites, toxicity markers | LC-MS, NMR, multi-omics platforms | RF, SVM, DL, feature-selection models | Mechanistic insight, toxicity prediction, patient stratification | High cost, limited cohort size, regulatory requirements |
| Sensor-based chemical detection | Single analytes, multiplexed analytes, sensor fingerprints | Biosensors, electrochemical sensors, micro/nanoelectrode arrays | PCA-ML, Bayesian models, SVM, RF, neural networks | Portable and real-time detection, high-throughput analysis | Signal drift, calibration instability, limited real-world validation |
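
The application table above lists signal drift, batch effects, and calibration instability among the main practical limitations. A common mitigation in MS-based metabolomics is to inject pooled quality-control (QC) samples at regular intervals and model the drift as a function of run order. The sketch below is a simplified, hypothetical illustration: it assumes a multiplicative drift that is linear in injection order and fits it to the QC runs only, whereas published workflows typically use LOESS or spline fits (e.g., QC-based robust LOESS signal correction). The function names and the toy run sequence are illustrative assumptions.

```python
from statistics import median


def linear_fit(xs, ys):
    """Ordinary least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def qc_drift_correct(order, intensity, is_qc):
    """Remove a linear, multiplicative run-order drift from one feature.

    order     -- injection/acquisition index of every run
    intensity -- raw signal of the feature in every run
    is_qc     -- True for pooled QC runs, False for study samples
    """
    qc_order = [o for o, q in zip(order, is_qc) if q]
    qc_signal = [v for v, q in zip(intensity, is_qc) if q]
    intercept, slope = linear_fit(qc_order, qc_signal)  # drift is learned from QCs only
    target = median(qc_signal)  # rescale every run to the typical QC level
    # Assumes the fitted trend stays positive over the whole run sequence.
    return [v * target / (intercept + slope * o) for o, v in zip(order, intensity)]


# Hypothetical 16-run sequence with a QC injection every 4th run and a linear
# loss of 2 % of the initial signal per injection, applied to a single feature.
order = list(range(1, 17))
is_qc = [o % 4 == 1 for o in order]
true_level = [100.0 if q else (150.0 if o % 2 else 80.0) for o, q in zip(order, is_qc)]
raw = [t * (1.0 - 0.02 * o) for o, t in zip(order, true_level)]
corrected = qc_drift_correct(order, raw, is_qc)
print([round(c, 1) for c in corrected])
```

In this toy sequence every QC run is mapped to 86 (the median of the raw QC values), while each study sample keeps its ratio to the QC level (150 becomes 129 and 80 becomes 68.8), so between-sample differences survive and the run-order trend disappears. Fitting the trend on QCs rather than on study samples avoids removing genuine biological differences that happen to correlate with run order.
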

| Application Area | Typical Evidence Strength | Common Validation Approach | Main Risk of Bias | Translational Readiness | Key Requirement for Improvement |
|---|---|---|---|---|---|
| Biomedical diagnostics | Moderate but heterogeneous | Internal cross-validation; limited external validation | Small cohorts, clinical heterogeneity, confounding factors | Medium | Larger multicenter cohorts and external validation |
| Environmental monitoring | Moderate | Laboratory-controlled validation | Matrix variability and limited field validation | Medium | Real-world environmental sampling and standardization |
| Food authenticity and safety | Moderate to high for selected products | Cross-validation and occasional external test sets | Product variability, geographical bias, batch effects | Medium-high | Interlaboratory validation and calibration transfer |
| Drug discovery and precision medicine | Exploratory to moderate | Preclinical or cohort-specific validation | Limited sample size and biological complexity | Medium | Integration with clinical endpoints and multi-omics validation |
| Sensor-based chemical detection | Exploratory to moderate | Laboratory calibration and classification testing | Sensor drift, device variability, overfitting | Low-medium | Long-term stability testing, real-sample validation, multi-device studies |
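
The evidence table above flags internal cross-validation and overfitting as recurring weaknesses. One well-known source of optimistic estimates is selecting features on the full data set before cross-validating. The following hypothetical sketch makes this concrete on pure noise: the labels carry no information, yet selecting the top-k features outside the cross-validation loop inflates the leave-one-out accuracy, whereas repeating the selection inside every training fold does not. The scoring rule (absolute mean difference divided by the overall SD), the nearest-centroid classifier, and the data dimensions are illustrative assumptions, not the author's protocol.

```python
import random
from statistics import fmean


def separation_scores(X, y):
    """|difference of the two class means| / overall SD, for every feature."""
    labels = sorted(set(y))  # assumes exactly two classes
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == labels[0]]
        b = [row[j] for row, lab in zip(X, y) if lab == labels[1]]
        both = a + b
        m = fmean(both)
        sd = (sum((v - m) ** 2 for v in both) / len(both)) ** 0.5 or 1.0
        scores.append(abs(fmean(a) - fmean(b)) / sd)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]


def centroid_predict(train_X, train_y, x, features):
    """Nearest-centroid prediction restricted to the selected features."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [row for row, lab in zip(train_X, train_y) if lab == label]
        dist = sum((x[j] - fmean([row[j] for row in rows])) ** 2 for j in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy with top-k selection done outside or inside each fold."""
    # Leaky variant: features ranked once on ALL samples, including each test sample.
    leaked = None if select_inside_fold else top_k(separation_scores(X, y), k)
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if select_inside_fold:
            features = top_k(separation_scores(train_X, train_y), k)
        else:
            features = leaked  # the held-out sample already influenced this choice
        hits += centroid_predict(train_X, train_y, X[i], features) == y[i]
    return hits / len(X)


# Hypothetical pure-noise data set: 40 samples x 500 features, labels unrelated to X.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(40)]
y = ["A"] * 20 + ["B"] * 20

leaky = loo_accuracy(X, y, k=10, select_inside_fold=False)
honest = loo_accuracy(X, y, k=10, select_inside_fold=True)
print(f"selection outside CV: {leaky:.2f}, selection inside CV: {honest:.2f}")
```

Because the labels are unrelated to the data, any accuracy well above 0.5 is an artifact; on data like these the "outside CV" estimate is typically far above chance while the "inside CV" estimate stays near it. The same principle extends to scaling, imputation, and hyperparameter tuning, which should be fitted within training folds (nested cross-validation) and then confirmed on an external test set.
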
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Picone, G. Integrating Metabolomics and Machine Learning for Advanced Chemical Detection. Sensors 2026, 26, 3001. https://doi.org/10.3390/s26103001
