Data Science in Metabolomics

A special issue of Metabolites (ISSN 2218-1989). This special issue belongs to the section "Bioinformatics and Data Analysis".

Deadline for manuscript submissions: closed (8 December 2021) | Viewed by 23468

Special Issue Editor


Guest Editor
Karmanos Cancer Institute, School of Medicine, Wayne State University, Detroit, MI, USA
Interests: clinical trial design; survival analysis; PK/PD; metabolomics

Special Issue Information

Dear Colleagues,

Metabolomics produces large amounts of data and relies heavily on data science to infer biological meaning. Data science is an interdisciplinary, applied field that draws on techniques and theories from statistics, mathematics, computer science, and information science, and it enables the extraction of meaningful, practical insights from large-scale metabolomics data. This Special Issue is devoted to methodologies, software, tools, reviews, and case studies of data science applied to metabolomics data. Topics include, but are not limited to, preprocessing of metabolomics data, data visualization, metabolite structure identification, compound identification, data integration, and pathway-based metabolic data analysis.

Dr. Seongho Kim
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Metabolites is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (10 papers)

Editorial

2 pages, 161 KiB  
Editorial
The Intersection of Metabolomics and Data Science
by Seongho Kim
Metabolites 2023, 13(8), 915; https://doi.org/10.3390/metabo13080915 - 4 Aug 2023
Viewed by 692
Abstract
Metabolomics generates a vast amount of data and heavily relies on data science for biological interpretation [...]
(This article belongs to the Special Issue Data Science in Metabolomics)

Research

14 pages, 2889 KiB  
Article
Comparative Analysis of Binary Similarity Measures for Compound Identification in Mass Spectrometry-Based Metabolomics
by Seongho Kim, Ikuko Kato and Xiang Zhang
Metabolites 2022, 12(8), 694; https://doi.org/10.3390/metabo12080694 - 26 Jul 2022
Cited by 3 | Viewed by 1767
Abstract
Compound identification is a critical step in untargeted metabolomics. Its most important procedure is calculating the similarity between experimental mass spectra and either predicted mass spectra or mass spectra in a mass spectral library. Unlike continuous similarity measures, no study has assessed the performance of binary similarity measures in compound identification, even though the well-known Jaccard similarity measure has been widely used without proper evaluation. The objective of this study is thus to evaluate the performance of binary similarity measures for compound identification in untargeted metabolomics. Fifteen binary similarity measures, including the well-known Jaccard, Dice, Sokal–Sneath, Cosine, and Simpson measures, were selected to assess their performance in compound identification using both electron ionization (EI) and electrospray ionization (ESI) mass spectra. Our theoretical evaluations show that compound identification accuracy is exactly the same for the Jaccard, Dice, 3W-Jaccard, Sokal–Sneath, and Kulczynski measures; for the Cosine and Hellinger measures; and for the McConnaughey and Driver–Kroeber measures, which was confirmed in practice using mass spectral libraries. In the mass spectrum-based evaluation, the best-performing similarity measures were the McConnaughey and Driver–Kroeber measures for EI mass spectra and the Cosine and Hellinger measures for ESI mass spectra. The most robust similarity measure was the Fager–McGowan measure, which was the second-best performing measure for both EI and ESI mass spectra.
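Every binary measure compared above is a function of three counts obtained after binarizing two spectra: a, the peaks present in both; b, the peaks present only in the query; and c, the peaks present only in the reference. The Python sketch below uses textbook definitions of five of the named measures (assumed here, not taken from the paper) on hypothetical spectra. It also illustrates one reason for the reported equivalences: Jaccard a/(a+b+c), Dice 2a/(2a+b+c), and Sokal–Sneath a/(a+2(b+c)) all depend only on the ratio (b+c)/a and decrease as it grows, so they rank library candidates identically and therefore give the same identification accuracy.

```python
from typing import Callable, Dict, List, Set, Tuple


def match_counts(query: Set[int], ref: Set[int]) -> Tuple[int, int, int]:
    """Return (a, b, c): peaks in both spectra, only in the query, only in the reference."""
    a = len(query & ref)
    return a, len(query - ref), len(ref - query)


def jaccard(a: int, b: int, c: int) -> float:
    return a / (a + b + c) if a + b + c else 0.0


def dice(a: int, b: int, c: int) -> float:
    return 2 * a / (2 * a + b + c) if a + b + c else 0.0


def sokal_sneath(a: int, b: int, c: int) -> float:
    return a / (a + 2 * (b + c)) if a + b + c else 0.0


def cosine(a: int, b: int, c: int) -> float:
    # Binary cosine (Ochiai form): a / sqrt((a + b) * (a + c))
    denom = ((a + b) * (a + c)) ** 0.5
    return a / denom if denom else 0.0


def simpson(a: int, b: int, c: int) -> float:
    # Shared peaks relative to the smaller of the two spectra
    smaller = min(a + b, a + c)
    return a / smaller if smaller else 0.0


MEASURES: Dict[str, Callable[[int, int, int], float]] = {
    "jaccard": jaccard,
    "dice": dice,
    "sokal_sneath": sokal_sneath,
    "cosine": cosine,
    "simpson": simpson,
}


def rank_library(query: Set[int], library: Dict[str, Set[int]], measure: str) -> List[str]:
    """Return library entry names sorted by decreasing similarity to the query."""
    score = MEASURES[measure]
    scores = {name: score(*match_counts(query, peaks)) for name, peaks in library.items()}
    # sorted() is stable even with reverse=True, so ties keep library order
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical binarized spectra: sets of nominal m/z values with a detected peak
query = {41, 43, 55, 57, 71, 85}
library = {
    "cand_A": {41, 43, 55, 57, 71, 85, 99},
    "cand_B": {41, 43, 57, 71},
    "cand_C": {39, 41, 67, 81, 95},
}
for name in MEASURES:
    print(name, rank_library(query, library, name))
```

Cosine and Simpson are not monotone transformations of Jaccard, so on other spectra their rankings can differ. In this toy example Simpson scores cand_A and cand_B equally, because the query is contained in cand_A and cand_B is contained in the query.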

15 pages, 1564 KiB  
Article
Quantitative Comparison of Statistical Methods for Analyzing Human Metabolomics Data
by Mir Henglin, Brian L. Claggett, Joseph Antonelli, Mona Alotaibi, Gino Alberto Magalang, Jeramie D. Watrous, Kim A. Lagerborg, Gavin Ovsak, Gabriel Musso, Olga V. Demler, Ramachandran S. Vasan, Martin G. Larson, Mohit Jain and Susan Cheng
Metabolites 2022, 12(6), 519; https://doi.org/10.3390/metabo12060519 - 4 Jun 2022
Cited by 8 | Viewed by 2105
Abstract
Emerging technologies now allow mass spectrometry-based profiling of thousands of small-molecule metabolites (‘metabolomics’) in an increasing number of biosamples. While this offers great promise for insight into the pathogenesis of human disease, standard approaches have not yet been established for statistically analyzing increasingly complex, high-dimensional human metabolomics data in relation to clinical phenotypes, including disease outcomes. To determine optimal approaches for analysis, we formally compare traditional and newer statistical learning methods across a range of metabolomics dataset types. In simulated and experimental metabolomics data derived from large population-based human cohorts, we observe that, as the number of study subjects increases, univariate methods yield an apparently higher false discovery rate than multivariate methods, driven by substantial correlation between metabolites directly associated with the outcome and metabolites not associated with the outcome. Although these additional associations would not be considered false in the strict statistical sense, they may be considered biologically less informative. In scenarios where the number of assayed metabolites increases, as in nontargeted versus targeted metabolomics, multivariate methods performed especially favorably across a range of statistical operating characteristics. In nontargeted metabolomics datasets that included thousands of metabolite measures, sparse multivariate models demonstrated greater selectivity and lower potential for spurious relationships. When the number of metabolites was similar to or exceeded the number of study subjects, as is common in nontargeted metabolomics analysis of relatively small cohorts, sparse multivariate models exhibited the most robust statistical power with more consistent results. These findings have important implications for metabolomics analysis in human disease.
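The mechanism behind the apparent excess of univariate "false discoveries" can be reproduced with a toy simulation: metabolite x1 drives the outcome, while metabolite x2 is merely correlated with x1. Tested on its own, x2 shows a strong association with the outcome; fitted jointly with x1, its coefficient shrinks to about zero. This is an illustrative sketch, not the authors' simulation design: the effect sizes, the sample size, and the use of ordinary least squares in place of the sparse multivariate models evaluated in the paper are all assumptions.

```python
import random
from typing import List, Tuple


def simulate(n: int, seed: int = 0) -> Tuple[List[float], List[float], List[float]]:
    """x1 drives the outcome y; x2 is correlated with x1 but has no direct effect on y."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [v + rng.gauss(0.0, 1.0) for v in x1]  # correlated with x1 only
    y = [v + rng.gauss(0.0, 1.0) for v in x1]   # outcome depends on x1 only
    return x1, x2, y


def center(v: List[float]) -> List[float]:
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def pearson(u: List[float], v: List[float]) -> float:
    cu, cv = center(u), center(v)
    return dot(cu, cv) / (dot(cu, cu) * dot(cv, cv)) ** 0.5


def ols_two_predictors(x1: List[float], x2: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares slopes of y on x1 and x2 jointly (normal equations on centered data)."""
    c1, c2, cy = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


x1, x2, y = simulate(5000)
r_univariate = pearson(x2, y)           # theoretical value 0.5 under this design
b1, b2 = ols_two_predictors(x1, x2, y)  # theoretical values (1, 0)
print(f"univariate r(x2, y) = {r_univariate:.2f}; joint slopes b1 = {b1:.2f}, b2 = {b2:.2f}")
```

A univariate screen would flag x2 alongside x1, which matches the abstract's point that such hits are not false in the strict statistical sense but are biologically less informative than the directly associated metabolite.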

23 pages, 2734 KiB  
Article
Binary Simplification as an Effective Tool in Metabolomics Data Analysis
by Francisco Traquete, João Luz, Carlos Cordeiro, Marta Sousa Silva and António E. N. Ferreira
Metabolites 2021, 11(11), 788; https://doi.org/10.3390/metabo11110788 - 18 Nov 2021
Cited by 7 | Viewed by 2212
Abstract
Metabolomics aims to perform a comprehensive identification and quantification of the small molecules present in a biological system. Due to metabolite diversity in concentration, structure, and chemical characteristics, the use of high-resolution methodologies, such as mass spectrometry (MS) or nuclear magnetic resonance (NMR), is required. In metabolomics data analysis, suitable data pre-processing and pre-treatment procedures are fundamental, with subsequent steps aiming to highlight significant biological variation between samples over background noise. Traditional data analysis focuses primarily on comparing the features’ intensity values. However, intensity data are highly variable between experimental batches, instruments, and pre-processing methods or parameters. The aim of this work was to develop a new pre-treatment method for MS-based metabolomics data, in the context of sample profiling and discrimination, considering only the occurrence of spectral features, encoding feature presence as 1 and absence as 0. This “Binary Simplification” encoding (BinSim) was used to transform several benchmark datasets before the application of clustering and classification methods. The performance of these methods after BinSim pre-treatment was consistently as good as, and often better than, that obtained after different combinations of traditional, intensity-based pre-treatments. Binary Simplification is, therefore, a viable pre-treatment procedure that effectively simplifies metabolomics data-analysis pipelines.
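The BinSim encoding itself is a one-line transformation of a samples × features intensity table. The sketch below assumes that an undetected feature is stored as None, NaN, or a non-positive value (the paper's exact missing-value convention is not stated here), and shows that a global change in intensity scale, as might occur between batches or instruments, leaves the binary profile unchanged. The function name and data are hypothetical.

```python
from typing import List, Optional

Intensity = Optional[float]


def binary_simplification(matrix: List[List[Intensity]]) -> List[List[int]]:
    """Encode each feature as 1 if it was detected in a sample and 0 otherwise.

    Assumption: an undetected feature is stored as None, NaN or a non-positive value.
    NaN needs no special case because NaN > 0 evaluates to False.
    """
    return [[1 if v is not None and v > 0 else 0 for v in row] for row in matrix]


# Hypothetical samples x features tables; batch_b is batch_a at twice the intensity scale
batch_a = [[1200.0, None, 35.5, 0.0], [980.0, 15.2, float("nan"), 410.0]]
batch_b = [[2400.0, None, 71.0, 0.0], [1960.0, 30.4, float("nan"), 820.0]]

print(binary_simplification(batch_a))  # [[1, 0, 1, 0], [1, 1, 0, 1]]
print(binary_simplification(batch_a) == binary_simplification(batch_b))  # True
```

The resulting 0/1 matrix can then be passed to any clustering or classification method that accepts binary features, for example with a Jaccard distance between samples.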

12 pages, 3395 KiB  
Article
MStractor: R Workflow Package for Enhancing Metabolomics Data Pre-Processing and Visualization
by Luca Nicolotti, Jeremy Hack, Markus Herderich and Natoiya Lloyd
Metabolites 2021, 11(8), 492; https://doi.org/10.3390/metabo11080492 - 29 Jul 2021
Cited by 2 | Viewed by 2926
Abstract
Untargeted metabolomics experiments that characterize complex biological samples using chromatography/mass spectrometry generate large datasets containing very complex and highly variable information. Many data-processing options are available; however, both commercial and open-source solutions have limitations, such as vendor platform exclusivity and/or the need for familiarity with diverse programming languages. Processing untargeted metabolite data is a particular problem for laboratories that specialize in non-routine mass spectrometry analysis of diverse sample types from humans, animals, plants, fungi, and microorganisms. Here, we present MStractor, an R workflow package developed to streamline and enhance pre-processing and visualization of metabolomics mass spectrometry data. MStractor combines functions for molecular feature extraction with user-friendly dedicated GUIs for chromatographic and mass spectrometry (MS) parameter input, graphical quality-control outputs, and descriptive statistics. MStractor performance was evaluated through a detailed comparison with XCMS Online. The MStractor package is freely available on GitHub at the MetabolomicsSA repository.

16 pages, 3407 KiB  
Article
The mwtab Python Library for RESTful Access and Enhanced Quality Control, Deposition, and Curation of the Metabolomics Workbench Data Repository
by Christian D. Powell and Hunter N.B. Moseley
Metabolites 2021, 11(3), 163; https://doi.org/10.3390/metabo11030163 - 12 Mar 2021
Cited by 8 | Viewed by 2781
Abstract
The Metabolomics Workbench (MW) is a public scientific data repository consisting of experimental data and metadata from metabolomics studies collected with mass spectrometry (MS) and nuclear magnetic resonance (NMR) analyses. MW has been constantly evolving: updating its ‘mwTab’ text file format, adding a JavaScript Object Notation (JSON) file format, implementing a REpresentational State Transfer (REST) interface, and nearly quadrupling the number of datasets hosted on the repository within the last three years. To keep up with the rapidly evolving MW repository, the ‘mwtab’ Python library and package have been continuously updated to mirror changes in the ‘mwTab’ and JSONized formats, and now contain many enhancements, including methods for interacting with the MW REST interface, enhanced format validation features, and advanced features for parsing and searching for specific metabolite data and metadata. We used the enhanced format validation features to evaluate all available datasets in MW to facilitate improved curation and FAIRness of the repository. The ‘mwtab’ Python package is now officially released as version 1.0.1 and is freely available on GitHub and the Python Package Index (PyPI) under a Clear Berkeley Software Distribution (BSD) license, with documentation available on ReadTheDocs.
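The MW REST interface mentioned above is addressed with URLs whose path segments name a context, an input item, an input value, and an output item. The sketch below builds such a URL with the Python standard library only; it deliberately does not use the ‘mwtab’ package, whose own REST methods should be preferred in practice. The base URL, the segment layout, and the example values (`study`, `study_id`, `ST000001`, `summary`) are assumptions based on the public MW REST documentation and should be checked against it before use.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the Metabolomics Workbench REST service; verify against its documentation.
MW_REST_BASE = "https://www.metabolomicsworkbench.org/rest"


def mw_rest_url(context: str, input_item: str, input_value: str, output_item: str) -> str:
    """Build base/context/input_item/input_value/output_item, percent-encoding each segment."""
    segments = [context, input_item, input_value, output_item]
    return "/".join([MW_REST_BASE] + [quote(s, safe="") for s in segments])


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a REST response as text (needs network access; not called here)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(mw_rest_url("study", "study_id", "ST000001", "summary"))
```

Passing the printed URL to `fetch_text` would request the study summary, assuming the endpoint behaves as documented; the response could then be checked with the format validation features the abstract describes.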

18 pages, 4808 KiB  
Article
Development of a Microfluidic Platform for Trace Lipid Analysis
by Andrew Davic and Michael Cascio
Metabolites 2021, 11(3), 130; https://doi.org/10.3390/metabo11030130 - 24 Feb 2021
Cited by 3 | Viewed by 1518
Abstract
The inherently trace quantities of primary fatty acid amides found in biological systems present challenges for analysis and quantitation, requiring a highly sensitive detection system. Microfluidics provides a green sample preparation and analysis technique through small-volume fluidic flow through micron-sized channels embedded in a polydimethylsiloxane (PDMS) device. Microfluidics also offers the potential for a micro total analysis system in which chromatographic separation, fluorescent tagging reactions, and detection are accomplished with no added sample handling. This study describes the development and optimization of a microfluidic laser-induced fluorescence (LIF) analysis and detection system that can be used to detect ultra-trace levels of fluorescently tagged primary fatty acid amines. A PDMS microfluidic device was designed and fabricated to incorporate droplet-based flow. Droplet microfluidics enabled on-chip fluorescent tagging reactions to be performed quickly and efficiently, with no additional sample handling. An optimized LIF optical detection system provided detection of fluorescently tagged primary fatty acid amines at sub-fmol levels (436 amol). This LIF detection provides unparalleled sensitivity, with detection limits several orders of magnitude lower than currently employed LC-MS techniques, and might be easily adapted as a complementary quantification platform for parallel MS-based omics studies.
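To put the reported detection limit in perspective: 436 amol is 0.436 fmol, consistent with the "sub-fmol" claim, and corresponds to roughly 2.6 × 10^8 molecules. The arithmetic, using the exact SI value of the Avogadro constant, is shown below; the helper names are illustrative only.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def amol_to_fmol(amol: float) -> float:
    """1 fmol = 1000 amol."""
    return amol / 1000.0


def amol_to_molecules(amol: float) -> float:
    """1 amol = 1e-18 mol."""
    return amol * 1e-18 * AVOGADRO


lod_amol = 436.0  # detection limit reported in the abstract
print(amol_to_fmol(lod_amol))                # 0.436
print(f"{amol_to_molecules(lod_amol):.3g}")  # 2.63e+08
```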

19 pages, 1281 KiB  
Article
Comprehensive Comparative Analysis of Local False Discovery Rate Control Methods
by Shin June Kim, Youngjae Oh and Jaesik Jeong
Metabolites 2021, 11(1), 53; https://doi.org/10.3390/metabo11010053 - 14 Jan 2021
Cited by 1 | Viewed by 2011
Abstract
Due to advances in technology, data are becoming more complex and larger in scale, and analyzing such data requires more advanced techniques. For omics data from two different groups, a key goal is to find significant biomarkers that distinguish the groups while controlling an error rate such as the false discovery rate (FDR). Over the last few decades, many methods that control the local false discovery rate have been developed, ranging from one-dimensional to k-dimensional FDR procedures. For this comparison study, we selected three of them with unique and significant properties, in chronological order: Efron’s approach, Ploner’s approach, and Kim’s approach. The first is a one-dimensional approach, while the other two are two-dimensional. We also consider two additional variants of Ploner’s approach. We compare the performance of these methods on both simulated and real data.
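The one-dimensional local FDR rests on the two-group model: each test statistic z comes from a null density f0 with probability π0 or from a non-null density f1 otherwise, and the local FDR at z is fdr(z) = π0·f0(z)/f(z), where f = π0·f0 + (1 − π0)·f1. The sketch below assumes an "oracle" setting in which π0 and both densities are known (standard normal null, normal alternative with a hypothetical mean of 3). In practice Efron’s approach estimates f from the data, and the two-dimensional approaches extend the idea to a second statistic. The z-scores and the 0.2 cutoff are illustrative.

```python
import math
from typing import List


def normal_pdf(z: float, mean: float = 0.0, sd: float = 1.0) -> float:
    u = (z - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z: float, pi0: float, alt_mean: float) -> float:
    """fdr(z) = pi0 * f0(z) / f(z) under a two-group model with known components.

    Assumptions: null z-scores ~ N(0, 1), non-null z-scores ~ N(alt_mean, 1),
    and pi0 is the known proportion of null features.
    """
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, mean=alt_mean)
    f = pi0 * f0 + (1.0 - pi0) * f1  # marginal (mixture) density of z
    return pi0 * f0 / f


# Hypothetical z-scores for five metabolites; flag those with local fdr <= 0.2
z_scores: List[float] = [0.2, 1.5, 2.8, 3.6, 4.5]
flagged = [z for z in z_scores if local_fdr(z, pi0=0.9, alt_mean=3.0) <= 0.2]
for z in z_scores:
    print(f"z = {z:.1f}  local fdr = {local_fdr(z, pi0=0.9, alt_mean=3.0):.3f}")
print("flagged:", flagged)
```

Unlike a tail-area FDR, which averages over all statistics at least as extreme as z, the local FDR is a posterior probability that the specific feature at z is null.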

Review

32 pages, 902 KiB  
Review
Mathematical Models for FDG Kinetics in Cancer: A Review
by Sara Sommariva, Giacomo Caviglia, Gianmario Sambuceti and Michele Piana
Metabolites 2021, 11(8), 519; https://doi.org/10.3390/metabo11080519 - 6 Aug 2021
Cited by 2 | Viewed by 2151
Abstract
Compartmental analysis is the mathematical framework for modelling tracer kinetics in dynamic positron emission tomography (PET). This paper reviews how compartmental models are constructed and numerically optimized. Specific focus is given to identifiability and sensitivity issues and to the impact of complex physiological conditions on the mathematical properties of the models.
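As a concrete instance of the compartmental framework, the classical two-tissue model for FDG tracks free tracer C_f and phosphorylated (trapped) tracer C_m in tissue, driven by the arterial input C_a: dC_f/dt = K1·C_a − (k2 + k3)·C_f + k4·C_m and dC_m/dt = k3·C_f − k4·C_m. The sketch below integrates these equations with forward Euler under a constant input. The rate constants and input are hypothetical, and whether the reviewed models use exactly this parameterization is not stated here. Real analyses use a measured input function, a proper ODE solver, and parameter fitting, which is where identifiability and sensitivity become issues.

```python
from typing import Callable, List, Tuple


def simulate_two_tissue(
    K1: float, k2: float, k3: float, k4: float,
    input_fn: Callable[[float], float],
    t_end: float, dt: float = 0.01,
) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of the two-tissue FDG compartment model.

    C_f: free tracer in tissue, C_m: phosphorylated (trapped) tracer,
    input_fn(t): arterial plasma concentration C_a(t). Tissue starts empty.
    Returns a list of (t, C_f, C_m) samples every dt.
    """
    cf, cm = 0.0, 0.0
    samples = [(0.0, cf, cm)]
    for i in range(int(round(t_end / dt))):
        ca = input_fn(i * dt)
        dcf = K1 * ca - (k2 + k3) * cf + k4 * cm
        dcm = k3 * cf - k4 * cm
        cf, cm = cf + dt * dcf, cm + dt * dcm  # both updates use the old state
        samples.append(((i + 1) * dt, cf, cm))
    return samples


def net_influx_rate(K1: float, k2: float, k3: float) -> float:
    """Ki = K1 * k3 / (k2 + k3): uptake rate into the trapped pool when k4 = 0."""
    return K1 * k3 / (k2 + k3)


# Hypothetical rate constants (per minute), no dephosphorylation, constant input C_a = 1
K1, k2, k3, k4 = 0.1, 0.15, 0.05, 0.0
traj = simulate_two_tissue(K1, k2, k3, k4, lambda t: 1.0, t_end=100.0)
t_last, cf_last, cm_last = traj[-1]
slope = cm_last - traj[-101][2]  # growth of C_m over the last minute (100 steps)
print(f"C_f({t_last:.0f}) = {cf_last:.4f} (expected K1/(k2+k3) = {K1 / (k2 + k3):.4f})")
print(f"dC_m/dt = {slope:.4f} (expected Ki = {net_influx_rate(K1, k2, k3):.4f})")
```

With k4 = 0 and a constant input, C_f settles at K1·C_a/(k2 + k3) and the trapped pool grows linearly at Ki·C_a, which is the behaviour graphical methods such as Patlak analysis exploit.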

17 pages, 1166 KiB  
Review
Amino Acid Metabolism in Apicomplexan Parasites
by Aarti Krishnan and Dominique Soldati-Favre
Metabolites 2021, 11(2), 61; https://doi.org/10.3390/metabo11020061 - 20 Jan 2021
Cited by 26 | Viewed by 3696
Abstract
Obligate intracellular pathogens have coevolved with their hosts, leading to clever strategies to access nutrients, combat the host’s immune response, and establish a safe niche for intracellular replication. The host, in turn, has developed ways to restrict the replication of invaders by limiting access to nutrients required for pathogen survival. In this review, we describe recent advances in both computational methods and high-throughput -omics techniques that have been used to study and interrogate metabolic functions in the context of intracellular parasitism. Specifically, we cover current knowledge of amino acid biosynthesis and uptake within the Apicomplexa phylum, focusing on the human-infecting pathogens Toxoplasma gondii and Plasmodium falciparum. Given the complex multi-host life cycle of these pathogens, we hypothesize that whether amino acids are made or acquired depends on the host niche. We summarize the stage specificities of enzymes revealed through transcriptomics data, the relevance of amino acids for parasite pathogenesis in vivo, and the role of their transporters. Targeting one or more of these pathways may lead to a deeper understanding of the specific contributions of biosynthesis versus acquisition of amino acids and to the design of better intervention strategies against apicomplexan parasites.
