Machine Learning in Metabolomics: Unlocking the Future of Data Analysis

A special issue of Metabolites (ISSN 2218-1989). This special issue belongs to the section "Bioinformatics and Data Analysis".

Deadline for manuscript submissions: 31 May 2026 | Viewed by 7151

Special Issue Editors


Guest Editor
Core Facilities, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
Interests: metabolomics; mass spectrometry; single-cell; maternal immune activation; CSF

Guest Editor
Institute for Experiential AI, Northeastern University, Boston, MA 02115, USA
Interests: mass spectrometry; proteomics; metabolomics; multi-omics integration; bioinformatics

Special Issue Information

Dear Colleagues,

Machine learning (ML) is transforming the landscape of metabolomics, building on its revolutionary impact in precision medicine and other omics fields. With its ability to identify patterns and relationships in complex datasets without explicit programming, ML is driving advancements in biomarker discovery, disease and patient cohort classification, chemical composition analysis, and more.

A key breakthrough lies in the application of ML to compound annotation, where it has dramatically improved the identification of unknown small molecules. By refining fingerprinting techniques and introducing innovative algorithms, researchers are expanding the capabilities of untargeted metabolomics, uncovering new insights into the vast, unexplored universe of small molecules.

Another pivotal area is biomarker discovery. ML algorithms excel at processing large datasets and identifying metabolic signatures indicative of disease. These discoveries hold enormous promise for precision medicine, enabling treatments tailored to an individual’s unique metabolic profile.

Moreover, ML is a cornerstone in multiomics integration—combining data from genomics, transcriptomics, proteomics, and metabolomics. This holistic approach provides a comprehensive view of biological systems, with ML enhancing both data interpretation and analytical precision.

However, despite significant advances in analytical platforms and software, challenges remain in data processing and integration. These hurdles underscore the importance of developing novel ML methods tailored to the unique demands of metabolomics. As the field evolves, such methods will be critical for addressing data complexity, advancing our understanding of biological systems, and enabling new approaches to personalized medicine.

Dr. Boryana Petrova
Dr. Arzu Tugce Guler
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Metabolites is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • metabolomics
  • MS imaging
  • machine learning
  • AI
  • multi-omics
  • integration
  • biomarker discovery
  • compound annotation
  • spectra interpretation

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (4 papers)


Research

21 pages, 4648 KB  
Article
M-GNN: A Topology-Enhanced Multi-Modal Graph Neural Network for Cancer Driver Gene Prediction
by Lu Qin, Wen Zhu, Xinyi Liao and Yujing Zhang
Metabolites 2026, 16(4), 268; https://doi.org/10.3390/metabo16040268 - 16 Apr 2026
Viewed by 532
Abstract
Background: Accurate identification of cancer driver genes is essential for understanding tumorigenesis and developing targeted therapies. Although graph neural networks (GNNs) have advanced multi-omics integration, existing methods often simply concatenate omics features and underutilize the topological information of biological networks. Methods: We propose M-GNN, a multi-modal GNN framework for cancer driver gene prediction. It employs separate Graph Convolutional Network (GCN) encoders to process four types of omics data (mutation, expression, methylation, copy number variation (CNV)), each represented as a 16-dimensional vector. We incorporate knowledge distillation by using soft labels from a pre-trained teacher model to enhance feature representation. An attention mechanism adaptively fuses the encoded omics features, and a dual-path classifier combining a GCN and a Multilayer Perceptron (MLP) preserves both intrinsic gene properties and network topology. Results: Experiments on three public protein–protein interaction (PPI) networks show that M-GNN consistently achieves the highest or second-highest AUPRC compared to five state-of-the-art methods. Ablation studies confirm the contribution of each module, and biological interpretability analysis—including analysis of GO enrichment and drug sensitivity—validates the reliability of the predicted genes. Conclusions: M-GNN provides a robust and interpretable computational tool for systematic cancer driver gene identification, effectively integrating multi-omics and network data.
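As a rough illustration of two ingredients named in the M-GNN abstract, the sketch below shows a single GCN propagation step (one encoder per omics type) followed by a softmax attention fusion of the per-omics gene embeddings. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names, the tanh-plus-query-vector attention scoring, the random graph, and the random (untrained) weights are all hypothetical. Only the four omics types and the 16-dimensional input features come from the abstract.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN propagation step with ReLU: relu(A_norm @ H @ W)."""
    return np.maximum(a_norm @ features @ weight, 0.0)


def attention_fuse(embeddings: np.ndarray, query: np.ndarray):
    """Fuse per-omics embeddings with per-gene softmax attention weights.

    embeddings: (n_omics, n_genes, d); query: (d,) vector (learned in practice,
    random here). Returns fused (n_genes, d) and weights (n_genes, n_omics).
    """
    scores = np.tanh(embeddings) @ query                 # (n_omics, n_genes)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    fused = (weights[:, :, None] * embeddings).sum(axis=0)
    return fused, weights.T


rng = np.random.default_rng(0)
n_genes, d_in, d_hidden = 6, 16, 8
upper = np.triu((rng.random((n_genes, n_genes)) < 0.4).astype(float), k=1)
adj = upper + upper.T  # hypothetical undirected PPI graph without self-loops
a_norm = normalized_adjacency(adj)

omics = ["mutation", "expression", "methylation", "cnv"]
embeddings = np.stack([
    # a separate (hypothetical, untrained) GCN encoder for each omics type
    gcn_layer(a_norm, rng.normal(size=(n_genes, d_in)), rng.normal(size=(d_in, d_hidden)))
    for _ in omics
])
fused, weights = attention_fuse(embeddings, rng.normal(size=d_hidden))
print(fused.shape, weights.shape)  # (6, 8) (6, 4)
```

Because the attention weights are computed per gene, each gene can lean on a different omics layer; in a trained model the query vector and encoder weights would be learned jointly with the classifier.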

26 pages, 2150 KB  
Article
A Stability-Oriented Biomarker Selection Framework Synergistically Driven by Robust Rank Aggregation and L1-Sparse Modeling
by Jigen Luo, Jianqiang Du, Jia He, Qiang Huang, Zixuan Liu and Gaoxiang Huang
Metabolites 2025, 15(12), 806; https://doi.org/10.3390/metabo15120806 - 18 Dec 2025
Viewed by 760
Abstract
Background: In high-dimensional, small-sample omics studies such as metabolomics, feature selection not only determines the discriminative performance of classification models but also directly affects the reproducibility and translational value of candidate biomarkers. However, most existing methods primarily optimize classification accuracy and treat stability as a post hoc diagnostic, leading to considerable fluctuations in selected feature sets under different data splits or mild perturbations. Methods: To address this issue, this study proposes FRL-TSFS, a feature selection framework synergistically driven by filter-based Robust Rank Aggregation and L1-sparse modeling. Five complementary filter methods—variance thresholding, chi-square test, mutual information, ANOVA F test, and ReliefF—are first applied in parallel to score features, and Robust Rank Aggregation (RRA) is then used to obtain a consensus feature ranking that is less sensitive to the bias of any single scoring criterion. An L1-regularized logistic regression model is subsequently constructed on the candidate feature subset defined by the RRA ranking to achieve task-coupled sparse selection, thereby linking feature selection stability, feature compression, and classification performance. Results: FRL-TSFS was evaluated on six representative metabolomics and gene expression datasets under a mildly perturbed scenario induced by 10-fold cross-validation, and its performance was compared with multiple baselines using the Extended Kuncheva Index (EKI), Accuracy, and F1-score. The results show that RRA substantially improves ranking stability compared with conventional aggregation strategies without degrading classification performance, while the full FRL-TSFS framework consistently attains higher EKI values than the other feature selection schemes, markedly reduces the number of selected features to several tens of metabolites or genes, and maintains competitive classification performance. 
Conclusions: These findings indicate that FRL-TSFS can generate compact, reproducible, and interpretable biomarker panels, providing a practical analysis framework for stability-oriented feature selection and biomarker discovery in untargeted metabolomics.
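The consensus-ranking step of FRL-TSFS relies on Robust Rank Aggregation. Below is a minimal, dependency-free sketch of the RRA rho score, assuming every filter ranks the same full set of features. Each feature's normalized ranks are sorted, and the score is the smallest probability that the k-th smallest of m uniformly random normalized ranks would be at least as good as the observed one. The function names and toy rankings are hypothetical, and any multiple-testing correction of the score is omitted.

```python
from math import comb


def order_stat_cdf(x: float, k: int, m: int) -> float:
    """P(U_(k) <= x) for the k-th smallest of m iid Uniform(0, 1) values.

    Equals the regularized incomplete beta I_x(k, m - k + 1), computed as
    P(at least k of the m uniforms fall at or below x).
    """
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(k, m + 1))


def rra_scores(rankings: list[list[str]]) -> dict[str, float]:
    """rankings: one best-first ordering of the same features per filter method.

    Returns feature -> rho score; smaller means more consistently top-ranked.
    """
    n, m = len(rankings[0]), len(rankings)
    normalized = {f: [] for f in rankings[0]}
    for ranking in rankings:
        for pos, feature in enumerate(ranking, start=1):
            normalized[feature].append(pos / n)
    return {
        f: min(order_stat_cdf(x, k, m) for k, x in enumerate(sorted(r), start=1))
        for f, r in normalized.items()
    }


def aggregate(rankings: list[list[str]]) -> list[str]:
    """Consensus ranking: features sorted by ascending rho score."""
    scores = rra_scores(rankings)
    return sorted(scores, key=scores.get)


# Hypothetical rankings of five features from three filter methods.
rankings = [
    ["a", "b", "c", "d", "e"],
    ["a", "c", "b", "d", "e"],
    ["b", "a", "c", "e", "d"],
]
print(aggregate(rankings))
```

Scoring against order statistics, rather than averaging ranks, keeps a single outlying filter from dragging down a feature that the other filters consistently place near the top, which is the robustness property the abstract relies on.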

18 pages, 1886 KB  
Article
Multi-Omics Feature Selection to Identify Biomarkers for Hepatocellular Carcinoma
by Rency S. Varghese, Xinran Zhang, Sarada Giridharan, Muhammad Salman Sajid, Md Mamunur Rashid, Alexander Kroemer and Habtom W. Ressom
Metabolites 2025, 15(9), 575; https://doi.org/10.3390/metabo15090575 - 28 Aug 2025
Cited by 1 | Viewed by 1946
Abstract
Introduction: Hepatocellular carcinoma (HCC), the most prevalent form of liver cancer, ranks as the third leading cause of cancer-related mortality globally. Patients diagnosed with HCC have a dismal prognosis, mostly because symptoms emerge only in the advanced stages of the disease. Moreover, conventional biomarkers demonstrate insufficient efficacy in the early detection of HCC, highlighting the need for novel and more effective biomarkers. Methods: In this paper, we investigate methods for integrating multi-omics data that we generated by untargeted and targeted mass spectrometric analysis of serum samples from HCC cases and patients with liver cirrhosis. Specifically, we evaluate several feature selection methods on their ability to identify a panel of multi-omics features that distinguishes HCC cases from cirrhotic controls. Results: The integrative analysis identified key liver-associated molecules, such as leucine and isoleucine, as well as SERPINA1, which is involved in LXR/RXR activation and acute phase response signaling. A new method that uses recursive feature selection with a transformer-based deep learning model as the estimator led to more promising results than other deep learning methods that perform disease classification and feature selection sequentially. Conclusions: The findings reinforce the importance of adapting or extending deep learning models to support robust feature selection, especially when integrating multi-omics data with limited sample sizes, to avoid the risk of overfitting. They also underscore the need to evaluate the multi-omics features discovered in this study in blood samples from a larger, independent cohort to identify robust biomarkers for HCC.
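The HCC study pairs recursive feature selection with a transformer-based estimator. The sketch below illustrates only the recursive elimination loop, assuming a far simpler estimator: absolute ordinary least-squares coefficients on standardized features stand in for the transformer's feature importances. The function names and the synthetic data are hypothetical. With real multi-omics data and small cohorts, the elimination would typically be nested inside cross-validation to limit the overfitting risk the abstract mentions.

```python
import numpy as np


def linear_importance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Absolute least-squares coefficients (intercept excluded) as importances.

    Stand-in estimator; assumes X is standardized so coefficients are comparable.
    """
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.abs(coef[1:])


def recursive_feature_elimination(X, y, n_select, importance_fn=linear_importance):
    """Refit on the surviving features and drop the least important one until
    n_select remain. Returns the selected column indices in ascending order."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_select:
        importances = importance_fn(X[:, remaining], y)
        del remaining[int(np.argmin(importances))]
    return sorted(remaining)


# Synthetic two-class data (e.g., case = 1, control = 0) where only
# columns 0 and 3 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(float)

selected = recursive_feature_elimination(X, y, n_select=2)
print(selected)
```

Swapping `importance_fn` for a function that scores features with a trained deep model is what turns this loop into the kind of estimator-driven selection the abstract describes, at a much higher computational cost per elimination step.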

12 pages, 740 KB  
Article
Deep Learning-Based Molecular Fingerprint Prediction for Metabolite Annotation
by Hoi Yan Katharine Chau, Xinran Zhang and Habtom W. Ressom
Metabolites 2025, 15(2), 132; https://doi.org/10.3390/metabo15020132 - 14 Feb 2025
Cited by 1 | Viewed by 2862
Abstract
Background/Objectives: Liquid chromatography coupled with mass spectrometry (LC-MS) is a commonly used platform in metabolomics studies. However, metabolite annotation has been a major bottleneck in these studies, in part because publicly available spectral libraries are limited: they contain tandem mass spectrometry (MS/MS) data acquired from just a fraction of known compounds. Deep learning methods are increasingly reported as an alternative to spectral matching because they can map complex relationships between molecular fingerprints and mass spectrometric measurements. The objectives of this study are to investigate deep learning methods for predicting molecular fingerprints from MS/MS spectra and to rank putative metabolite IDs according to the similarity between their known and predicted molecular fingerprints. Methods: We trained three types of deep learning models to capture the relationships between molecular fingerprints and MS/MS spectra. Prior to training, several data processing steps, including scaling, binning, and filtering, were applied to MS/MS spectra obtained from the National Institute of Standards and Technology (NIST), MassBank of North America (MoNA), and the Human Metabolome Database (HMDB). In addition, the most relevant m/z bins and molecular fingerprint features were selected. The trained models were evaluated on ranking putative metabolite IDs retrieved from a compound database for the Critical Assessment of Small Molecule Identification (CASMI) 2016, CASMI 2017, and CASMI 2022 benchmark datasets. Results: Feature selection methods effectively reduced redundant molecular and spectral features prior to model training. Deep learning models trained on the truncated feature sets showed performance comparable to CSI:FingerID in ranking putative metabolite IDs. Conclusion: The results demonstrate the promising potential of deep learning methods for metabolite annotation.
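To make the annotation workflow concrete, the sketch below bins an MS/MS peak list into a fixed-length, max-scaled vector (the kind of input a fingerprint-prediction network might consume) and ranks candidate structures by Tanimoto similarity between a thresholded predicted fingerprint and each candidate's known fingerprint. The bin width, m/z range, 0.5 threshold, function names, and toy fingerprints are all assumptions; the study's own preprocessing and similarity score may differ, and the trained network itself is not shown.

```python
import numpy as np


def bin_spectrum(mz, intensity, mz_min=0.0, mz_max=1000.0, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins, then scale to a max of 1."""
    n_bins = int(round((mz_max - mz_min) / bin_width))
    vec, _ = np.histogram(mz, bins=n_bins, range=(mz_min, mz_max), weights=intensity)
    peak = vec.max()
    return vec / peak if peak > 0 else vec


def tanimoto(a, b) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)


def rank_candidates(predicted_probs, candidates, threshold=0.5):
    """Rank candidate IDs by similarity to the thresholded predicted fingerprint."""
    pred_bits = np.asarray(predicted_probs) >= threshold
    scores = {cid: tanimoto(pred_bits, fp) for cid, fp in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical peak list: two peaks share the m/z 100 bin.
spectrum_vec = bin_spectrum([100.2, 100.7, 250.0], [50, 50, 25], mz_max=500.0)

# Hypothetical per-bit probabilities from a fingerprint-prediction model.
predicted = np.array([0.9, 0.1, 0.8, 0.7, 0.2, 0.05])
candidates = {
    "cand_A": [1, 0, 1, 1, 0, 0],
    "cand_B": [1, 1, 0, 1, 0, 0],
    "cand_C": [0, 1, 0, 0, 1, 1],
}
for cid, score in rank_candidates(predicted, candidates):
    print(cid, round(score, 2))  # cand_A 1.0, then cand_B 0.5, then cand_C 0.0
```

Thresholding discards the model's confidence in each bit; a probability-weighted similarity is a common alternative when the predicted fingerprints are well calibrated.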
