Bioinformatics and Data Analysis

A special issue of Metabolites (ISSN 2218-1989). This special issue belongs to the section "Bioinformatics and Data Analysis".

Deadline for manuscript submissions: closed (31 August 2016) | Viewed by 53938

Special Issue Editor

Dr. Peter D. Karp
SRI International, Menlo Park, CA 94025, USA
Interests: pathway bioinformatics; metabolic modeling; genome informatics

Special Issue Information

Dear Colleagues,

Bioinformatics analysis methods for metabolomics data have undergone considerable improvements in the last decade, and these methods have a strong effect on both the speed and the accuracy of metabolomics studies. Still, it seems unlikely that metabolomics investigations are extracting all potential knowledge from their collected data, and the development of improved bioinformatics methods for analyzing metabolomics data is needed. This Special Issue is devoted to computational techniques for analyzing metabolomics data. Topics covered by this Special Issue include, but are not limited to: statistical methods for analyzing metabolomics samples, metabolite structure identification, visualization of metabolomics data, pathway-based data analysis, metabolomics and metabolic modeling, and metabolomics-related databases.

Dr. Peter D. Karp
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Metabolites is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • metabolomics data analysis
  • metabolite identification
  • computational metabolomics

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Research

Article
A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps
by Fidele Tugizimana, Paul A. Steenkamp, Lizelle A. Piater and Ian A. Dubery
Metabolites 2016, 6(4), 40; https://doi.org/10.3390/metabo6040040 - 3 Nov 2016
Cited by 59 | Viewed by 11741
Abstract
Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the “exhaustive” extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for a maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomic dataset, this study explored the influence of collection parameters in the data pre-processing step, scaling and data transformation on the statistical models generated, and feature selection, thereafter. Data obtained in positive mode generated from a LC-MS-based untargeted metabolomic study (sorghum plants responding dynamically to infection by a fungal pathogen) were used. Raw data were pre-processed with MarkerLynx™ software (Waters Corporation, Manchester, UK). Here, two parameters were varied: the intensity threshold (50–100 counts) and the mass tolerance (0.005–0.01 Da). After the pre-processing, the datasets were imported into SIMCA (Umetrics, Umeå, Sweden) for more data cleaning and statistical modeling. In addition, different scaling (unit variance, Pareto, etc.) and data transformation (log and power) methods were explored. The results showed that the pre-processing parameters (or algorithms) influence the output dataset with regard to the number of defined features. Furthermore, the study demonstrates that the pre-treatment of data prior to statistical modeling affects the subspace approximation outcome: e.g., the amount of variation in X-data that the model can explain and predict. The pre-processing and pre-treatment steps subsequently influence the number of statistically significant extracted/selected features (variables). Thus, as informed by the results, to maximize the value of untargeted metabolomic data, understanding of the data structures and exploration of different algorithms and methods (at different steps of the data analysis pipeline) might be the best trade-off, currently, and possibly an epistemological imperative. Full article
(This article belongs to the Special Issue Bioinformatics and Data Analysis)
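
The scaling and transformation options named in this abstract can be illustrated with a minimal sketch. It is not the SIMCA implementation: the function names, the offset used before the logarithm, and the toy intensities are all hypothetical. It follows the common textbook definitions, in which unit-variance scaling divides each mean-centered feature by its standard deviation and Pareto scaling divides by the square root of the standard deviation.

```python
import math
import statistics


def unit_variance_scale(values):
    """Mean-center a feature and divide by its sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pareto_scale(values):
    """Mean-center a feature and divide by the square root of its standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / math.sqrt(sd) for v in values]


def log_transform(values, offset=1.0):
    """Base-10 log transform; the offset (an assumption) avoids log(0)."""
    return [math.log10(v + offset) for v in values]


# Hypothetical intensities of one feature across four samples.
feature = [120.0, 450.0, 80.0, 1500.0]
uv = unit_variance_scale(feature)
par = pareto_scale(feature)
logged = log_transform(feature)
```

Pareto scaling shrinks large features less aggressively than unit-variance scaling, which is one reason the choice of pre-treatment can change which features a downstream model selects.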

Article
MetMatch: A Semi-Automated Software Tool for the Comparison and Alignment of LC-HRMS Data from Different Metabolomics Experiments
by Stefan Koch, Christoph Bueschl, Maria Doppler, Alexandra Simader, Jacqueline Meng-Reiterer, Marc Lemmens and Rainer Schuhmacher
Metabolites 2016, 6(4), 39; https://doi.org/10.3390/metabo6040039 - 2 Nov 2016
Cited by 6 | Viewed by 5501
Abstract
Due to its unsurpassed sensitivity and selectivity, LC-HRMS is one of the major analytical techniques in metabolomics research. However, limited stability of experimental and instrument parameters may cause shifts and drifts of retention time and mass accuracy or the formation of different ion species, thus complicating conclusive interpretation of the raw data, especially when generated in different analytical batches. Here, a novel software tool for the semi-automated alignment of different measurement sequences is presented. The tool is implemented in the Java programming language, it features an intuitive user interface and its main goal is to facilitate the comparison of data obtained from different metabolomics experiments. Based on a feature list (i.e., processed LC-HRMS chromatograms with mass-to-charge ratio (m/z) values and retention times) that serves as a reference, the tool recognizes both m/z and retention time shifts of single or multiple analytical datafiles/batches of interest. MetMatch is also designed to account for differently formed ion species of detected metabolites. Corresponding ions and metabolites are matched and chromatographic peak areas, m/z values and retention times are combined into a single data matrix. The convenient user interface allows for easy manipulation of processing results and graphical illustration of the raw data as well as the automatically matched ions and metabolites. The software tool is exemplified with LC-HRMS data from untargeted metabolomics experiments investigating phenylalanine-derived metabolites in wheat and T-2 toxin/HT-2 toxin detoxification products in barley. Full article
(This article belongs to the Special Issue Bioinformatics and Data Analysis)
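
The idea of matching features against a reference list by m/z and retention time can be sketched as follows. This is not MetMatch's algorithm: it uses fixed tolerances and does not model drift or differently formed ion species. The `Feature` class, the function names, the tolerance defaults, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float  # mass-to-charge ratio
    rt: float  # retention time in minutes


def ppm_error(observed_mz, reference_mz):
    """Relative m/z deviation in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_to_reference(reference, observed, max_ppm=5.0, max_rt_shift=0.2):
    """Map each reference feature to the closest observed feature within tolerance."""
    matches = {}
    for ref in reference:
        candidates = [
            obs for obs in observed
            if abs(ppm_error(obs.mz, ref.mz)) <= max_ppm
            and abs(obs.rt - ref.rt) <= max_rt_shift
        ]
        if candidates:
            # Prefer the candidate with the smallest retention time difference.
            matches[ref] = min(candidates, key=lambda obs: abs(obs.rt - ref.rt))
    return matches


reference = [Feature(181.0707, 3.10), Feature(166.0863, 5.40)]
observed = [Feature(181.0712, 3.18), Feature(166.0950, 5.42), Feature(300.1000, 7.00)]
matches = match_to_reference(reference, observed)
```

In this toy data only the first reference feature finds a partner; the second observed feature is close in retention time but too far off in m/z.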

Article
Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding
by Yun Xu, Howbeer Muhamadali, Ali Sayqal, Neil Dixon and Royston Goodacre
Metabolites 2016, 6(4), 38; https://doi.org/10.3390/metabo6040038 - 28 Oct 2016
Cited by 10 | Viewed by 6373
Abstract
Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a “pure” regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding. Full article
(This article belongs to the Special Issue Bioinformatics and Data Analysis)

Article
Prediction, Detection, and Validation of Isotope Clusters in Mass Spectrometry Data
by Hendrik Treutler and Steffen Neumann
Metabolites 2016, 6(4), 37; https://doi.org/10.3390/metabo6040037 - 20 Oct 2016
Cited by 17 | Viewed by 7171
Abstract
Mass spectrometry is a key analytical platform for metabolomics. The precise quantification and identification of small molecules is a prerequisite for elucidating the metabolism and the detection, validation, and evaluation of isotope clusters in LC-MS data is important for this task. Here, we present an approach for the improved detection of isotope clusters using chemical prior knowledge and the validation of detected isotope clusters depending on the substance mass using database statistics. We find remarkable improvements regarding the number of detected isotope clusters and are able to predict the correct molecular formula in the top three ranks in 92 % of the cases. We make our methodology freely available as part of the Bioconductor packages xcms version 1.50.0 and CAMERA version 1.30.0. Full article
(This article belongs to the Special Issue Bioinformatics and Data Analysis)
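
As a rough illustration of what an isotope cluster looks like in a peak list, the sketch below groups peaks whose m/z values are spaced by the 13C/12C mass difference (about 1.003355 Da) divided by the charge. It is not the method implemented in xcms or CAMERA: it ignores intensity ratios, chemical prior knowledge, and database statistics, all of which the abstract describes. The function name, tolerance, and peak values are hypothetical.

```python
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C in Da


def find_isotope_clusters(mz_values, charge=1, tol=0.005):
    """Group peaks separated by the expected isotope spacing (within tol Da)."""
    spacing = C13_C12_DELTA / charge
    peaks = sorted(mz_values)
    used = set()
    clusters = []
    for i, start in enumerate(peaks):
        if i in used:
            continue
        cluster = [start]
        used.add(i)
        expected = start + spacing
        for j in range(i + 1, len(peaks)):
            if j in used:
                continue
            if abs(peaks[j] - expected) <= tol:
                cluster.append(peaks[j])
                used.add(j)
                expected = peaks[j] + spacing
            elif peaks[j] > expected + tol:
                break  # peaks are sorted, so no later peak can fit
        if len(cluster) > 1:
            clusters.append(cluster)
    return clusters


# Hypothetical centroided peaks from a single spectrum.
peaks = [181.0707, 182.0741, 183.0772, 250.1200]
clusters = find_isotope_clusters(peaks)
```

A spacing-only rule like this one accepts any peaks that happen to lie about 1 Da apart, which is why a validation step of the kind the abstract describes is needed.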

Article
Development of Database Assisted Structure Identification (DASI) Methods for Nontargeted Metabolomics
by Lochana C. Menikarachchi, Ritvik Dubey, Dennis W. Hill, Daniel N. Brush and David F. Grant
Metabolites 2016, 6(2), 17; https://doi.org/10.3390/metabo6020017 - 31 May 2016
Cited by 5 | Viewed by 5910
Abstract
Metabolite structure identification remains a significant challenge in nontargeted metabolomics research. One commonly used strategy relies on searching biochemical databases using exact mass. However, this approach fails when the database does not contain the unknown metabolite (i.e., for unknown-unknowns). For these cases, constrained structure generation with combinatorial structure generators provides a potential option. Here we evaluated structure generation constraints based on the specification of: (1) substructures required (i.e., seed structures); (2) substructures not allowed; and (3) filters to remove incorrect structures. Our approach (database assisted structure identification, DASI) used predictive models in MolFind to find candidate structures with chemical and physical properties similar to the unknown. These candidates were then used for seed structure generation using eight different structure generation algorithms. One algorithm was able to generate correct seed structures for 21/39 test compounds. Eleven of these seed structures were large enough to constrain the combinatorial structure generator to fewer than 100,000 structures. In 35/39 cases, at least one algorithm was able to generate a correct seed structure. The DASI method has several limitations and will require further experimental validation and optimization. At present, it seems most useful for identifying the structure of unknown-unknowns with molecular weights <200 Da. Full article
(This article belongs to the Special Issue Bioinformatics and Data Analysis)

Article
Analysis of Metabolomics Datasets with High-Performance Computing and Metabolite Atlases
by Yushu Yao, Terence Sun, Tony Wang, Oliver Ruebel, Trent Northen and Benjamin P. Bowen
Metabolites 2015, 5(3), 431-442; https://doi.org/10.3390/metabo5030431 - 20 Jul 2015
Cited by 41 | Viewed by 8482
Abstract
Even with the widespread use of liquid chromatography mass spectrometry (LC/MS) based metabolomics, there are still a number of challenges facing this promising technique. Many, diverse experimental workflows exist; yet there is a lack of infrastructure and systems for tracking and sharing of information. Here, we describe the Metabolite Atlas framework and interface that provides highly-efficient, web-based access to raw mass spectrometry data in concert with assertions about chemicals detected to help address some of these challenges. This integration, by design, enables experimentalists to explore their raw data, specify and refine features annotations such that they can be leveraged for future experiments. Fast queries of the data through the web using SciDB, a parallelized database for high performance computing, make this process operate quickly. By using scripting containers, such as IPython or Jupyter, to analyze the data, scientists can utilize a wide variety of freely available graphing, statistics, and information management resources. In addition, the interfaces facilitate integration with systems biology tools to ultimately link metabolomics data with biological models. Full article
(This article belongs to the Special Issue Bioinformatics and Data Analysis)

Article
Computational Metabolomics Operations at BioCyc.org
by Peter D. Karp, Richard Billington, Timothy A. Holland, Anamika Kothari, Markus Krummenacker, Daniel Weaver, Mario Latendresse and Suzanne Paley
Metabolites 2015, 5(2), 291-310; https://doi.org/10.3390/metabo5020291 - 22 May 2015
Cited by 22 | Viewed by 7692
Abstract
BioCyc.org is a genome and metabolic pathway web portal covering 5500 organisms, including Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Escherichia coli. These organism-specific databases have undergone variable degrees of curation. The EcoCyc (Escherichia coli Encyclopedia) database is the most highly curated; its contents have been derived from 27,000 publications. The MetaCyc (Metabolic Encyclopedia) database within BioCyc is a “universal” metabolic database that describes pathways, reactions, enzymes and metabolites from all domains of life. Metabolic pathways provide an organizing framework for analyzing metabolomics data, and the BioCyc website provides computational operations for metabolomics data that include metabolite search and translation of metabolite identifiers across multiple metabolite databases. The site allows researchers to store and manipulate metabolite lists using a facility called SmartTables, which supports metabolite enrichment analysis. That analysis operation identifies metabolite sets that are statistically over-represented for the substrates of specific metabolic pathways. BioCyc also enables visualization of metabolomics data on individual pathway diagrams and on the organism-specific metabolic map diagrams that are available for every BioCyc organism. Most of these operations are available both interactively and as programmatic web services. Full article
(This article belongs to the Special Issue Bioinformatics and Data Analysis)
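
Metabolite enrichment analysis of the kind mentioned in this abstract is often computed as a one-sided hypergeometric test. The abstract does not say which statistic BioCyc uses, so the following is only a sketch under that assumption; the function name, parameter names, and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, sample_size, overlap):
    """Probability of seeing at least `overlap` pathway metabolites in a random
    sample of `sample_size` drawn from `population` metabolites, of which
    `pathway_size` belong to the pathway."""
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1000 metabolites in the organism, 20 in the pathway,
# 50 in the user's list, 5 of which are pathway substrates.
p = hypergeom_enrichment_p(population=1000, pathway_size=20, sample_size=50, overlap=5)
```

With only one pathway metabolite expected by chance in this example, an overlap of five yields a small p-value. In practice, a correction for testing many pathways at once would also be applied.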