Novel Algorithms for Computational Analysis of Bioinformatics Data

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Technologies and Resources for Genetics".

Deadline for manuscript submissions: closed (25 November 2020)

Special Issue Editors


Dr. Subha Madhavan
Guest Editor
Georgetown University Innovation Center for Biomedical Informatics, Washington, DC 20007, USA
Interests: precision medicine; data science; machine learning; genotype to phenotype

Dr. Marylyn D. Ritchie
Guest Editor
Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Interests: translational bioinformatics; precision medicine; machine learning

Dr. Bradford Powell
Guest Editor
Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Interests: medical genetics; data science; clinical implementation of genomic sequencing

Special Issue Information

Dear Colleagues,

There is no question that genomics is a Big Data science, and it is projected to outgrow other major generators of data such as astronomy, YouTube, and Twitter. Bioinformatics is crucial for bringing context to these data and explaining how life forms work. In the past decade, biological data, which encompass genomics, proteomics, metabolomics, other -omics, and related phenotypic data, have grown in magnitude from sets of hundreds to sets of millions and even billions of entities. This exponential increase has attracted many talented scientists to develop bioinformatics tools that help us understand the data at every scale, from a gene-centric view to the systems level. These novel computer programs focus on DNA sequence analysis, RNA structure prediction, protein structure and function, and much more.

To extract useful information from these datasets rapidly, the field of bioinformatics increasingly relies on machine learning (ML) algorithms to conduct predictive analytics and gain greater insight into complex biological processes. Machine learning involves programming computers to classify or predict events using example data or past experience. Machine learning includes deep learning, natural language processing, and biocuration tools, which are becoming increasingly important for transforming huge volumes of genomic data from both research and clinical contexts into actionable knowledge.

Now is the time for coordinated community efforts that address the challenges and opportunities in bioinformatics for the next decade. In this Special Issue, we invite you to present your leading work on novel algorithms and tools for the computational analysis of bioinformatics data, contributing to a collection that gathers some of the most recent advances in our field in one place.

Dr. Subha Madhavan
Dr. Marylyn D. Ritchie
Dr. Bradford Powell
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • bioinformatics
  • genomics
  • multi-omics
  • machine learning
  • deep learning
  • actionable genome
  • natural language processing
  • algorithms

Published Papers (3 papers)


Research

15 pages, 3092 KiB  
Article
4SpecID: Reference DNA Libraries Auditing and Annotation System for Forensic Applications
by Luís Neto, Nádia Pinto, Alberto Proença, António Amorim and Eduardo Conde-Sousa
Genes 2021, 12(1), 61; https://doi.org/10.3390/genes12010061 - 2 Jan 2021
Cited by 5 | Viewed by 2513
Abstract
Forensic genetics is a fast-growing field that frequently requires DNA-based taxonomy, namely, when evidence are parts of specimens, often highly processed in food, potions, or ointments. Reference DNA-sequences libraries, such as BOLD or GenBank, are imperative tools for taxonomic assignment, particularly when morphology is inadequate for classification. The auditing and curation of these datasets require reliable mechanisms, preferably with automated data preprocessing. Software tools were developed to grade these datasets considering as primary criterion the number of records, which is not compliant with forensic standards, where the priority is validation from independent sources. Moreover, 4SpecID is an efficient and freely available software tool developed to audit and annotate reference libraries, specifically designed for forensic applications. Its intuitive user-friendly interface virtually accesses any database and includes specific data mining functions tuned for the widespread BOLD repositories. The built tool was evaluated in laptop MacBook and a dual-Xeon server with a large BOLD dataset (Culicidae, 36,115 records), and the best execution time to grade the dataset on the laptop was 0.28 s. Datasets of Bovidae and Felidae families were used to evaluate the quality of the tool and the relevance of independent sources validation.
(This article belongs to the Special Issue Novel Algorithms for Computational Analysis of Bioinformatics Data)

20 pages, 2361 KiB  
Article
Parallelized Latent Dirichlet Allocation Provides a Novel Interpretability of Mutation Signatures in Cancer Genomes
by Taro Matsutani and Michiaki Hamada
Genes 2020, 11(10), 1127; https://doi.org/10.3390/genes11101127 - 25 Sep 2020
Cited by 3 | Viewed by 3311
Abstract
Mutation signatures are defined as the distribution of specific mutations such as activity of AID/APOBEC family proteins. Previous studies have reported numerous signatures, using matrix factorization methods for mutation catalogs. Different mutation signatures are active in different tumor types; hence, signature activity varies greatly among tumor types and becomes sparse. Because of this, many previous methods require dividing mutation catalogs for each tumor type. Here, we propose parallelized latent Dirichlet allocation (PLDA), a novel Bayesian model to simultaneously predict mutation signatures with all mutation catalogs. PLDA is an extended model of latent Dirichlet allocation (LDA), which is one of the methods used for signature prediction. It has parallelized hyperparameters of Dirichlet distributions for LDA, and they represent the sparsity of signature activities for each tumor type, thus facilitating simultaneous analyses. First, we conducted a simulation experiment to compare PLDA with previous methods (including SigProfiler and SignatureAnalyzer) using artificial data and confirmed that PLDA could predict signature structures as accurately as previous methods without searching for the optimal hyperparameters. Next, we applied PLDA to PCAWG (Pan-Cancer Analysis of Whole Genomes) mutation catalogs and obtained a signature set different from the one predicted by SigProfiler. Further, we have shown that the mutation spectrum represented by the predicted signature with PLDA provides a novel interpretability through post-analyses.
(This article belongs to the Special Issue Novel Algorithms for Computational Analysis of Bioinformatics Data)

20 pages, 4386 KiB  
Article
Integrating Deep Supervised, Self-Supervised and Unsupervised Learning for Single-Cell RNA-seq Clustering and Annotation
by Liang Chen, Yuyao Zhai, Qiuyan He, Weinan Wang and Minghua Deng
Genes 2020, 11(7), 792; https://doi.org/10.3390/genes11070792 - 14 Jul 2020
Cited by 26 | Viewed by 6056
Abstract
As single-cell RNA sequencing technologies mature, massive gene expression profiles can be obtained. Consequently, cell clustering and annotation become two crucial and fundamental procedures affecting other specific downstream analyses. Most existing single-cell RNA-seq (scRNA-seq) data clustering algorithms do not take into account the available cell annotation results on the same tissues or organisms from other laboratories. Nonetheless, such data could assist and guide the clustering process on the target dataset. Identifying marker genes through differential expression analysis to manually annotate large amounts of cells also costs labor and resources. Therefore, in this paper, we propose a novel end-to-end cell supervised clustering and annotation framework called scAnCluster, which fully utilizes the cell type labels available from reference data to facilitate the cell clustering and annotation on the unlabeled target data. Our algorithm integrates deep supervised learning, self-supervised learning and unsupervised learning techniques together, and it outperforms other customized scRNA-seq supervised clustering methods in both simulation and real data. It is particularly worth noting that our method performs well on the challenging task of discovering novel cell types that are absent in the reference data.
(This article belongs to the Special Issue Novel Algorithms for Computational Analysis of Bioinformatics Data)