Machine Learning Algorithms for Bioinformatics Problems

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Algorithms for Multidisciplinary Applications".

Deadline for manuscript submissions: closed (30 April 2023) | Viewed by 12949

Special Issue Editor


Guest Editor
Department of Engineering and Natural Sciences, Technical University of Applied Sciences Wildau, 15745 Wildau, Germany
Interests: machine learning; natural language processing; genetics; bioinformatics

Special Issue Information

Dear Colleagues,

We invite you to submit to this Special Issue, entitled “Machine Learning Algorithms for Bioinformatics Problems”, articles presenting your knowledge and latest research on the development of machine learning algorithms applied to bioinformatics problems.

The Special Issue will focus on algorithms in the following areas:

  • Statistical (genetic, phylogenetic, epigenomic, transcriptomic, proteomic and epidemiologic) sequence analysis;
  • The prediction of complex biochemical structures, molecular functions and disease outcomes;
  • The modeling and simulation of organic matter, compartments of organisms or of complete living systems and their populations;
  • The recognition, understanding and prediction of the behavior of living systems using machines (including human–machine interaction);
  • Physical and biological intelligence (intelligent solutions to the above problems, found in nature and not invented by humans).

We are seeking new and innovative contributions that exactly or approximately solve bioinformatics problems.

We are also looking for contributions with high educational value that help:

  • Scientists working on machine learning understand the specific settings of biology;
  • Scientists working on bioinformatics understand the wealth of existing expertise in the machine learning community.

High-quality papers that address both theoretical and practical topics are solicited. Submissions regarding traditional bioinformatics domains and new applications are welcome.

Prof. Dr. Peter Beyerlein
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (5 papers)


Research

20 pages, 5913 KiB  
Article
A Bioinformatics Analysis of Ovarian Cancer Data Using Machine Learning
by Vincent Schilling, Peter Beyerlein and Jeremy Chien
Algorithms 2023, 16(7), 330; https://doi.org/10.3390/a16070330 - 11 Jul 2023
Cited by 1 | Viewed by 2566
Abstract
The identification of biomarkers is crucial for cancer diagnosis, understanding the underlying biological mechanisms, and developing targeted therapies. In this study, we propose a machine learning approach to predict ovarian cancer patients’ outcomes and platinum resistance status using publicly available gene expression data. Six classical machine-learning algorithms are compared on their predictive performance. Those with the highest score are analyzed by their feature importance using the SHAP algorithm. We were able to select multiple genes that correlated with the outcome and platinum resistance status of the patients and validated those using Kaplan–Meier plots. In comparison to similar approaches, the performance of the models was higher, and different genes using feature importance analysis were identified. The most promising identified genes that could be used as biomarkers are TMEFF2, ACSM3, SLC4A1, and ALDH4A1.
(This article belongs to the Special Issue Machine Learning Algorithms for Bioinformatics Problems)

16 pages, 795 KiB  
Article
Entropy-Based Anomaly Detection for Gaussian Mixture Modeling
by Luca Scrucca
Algorithms 2023, 16(4), 195; https://doi.org/10.3390/a16040195 - 3 Apr 2023
Cited by 3 | Viewed by 3008
Abstract
Gaussian mixture modeling is a generative probabilistic model that assumes that the observed data are generated from a mixture of multiple Gaussian distributions. This mixture model provides a flexible approach to model complex distributions that may not be easily represented by a single Gaussian distribution. The Gaussian mixture model with a noise component refers to a finite mixture that includes an additional noise component to model the background noise or outliers in the data. This additional noise component helps to take into account the presence of anomalies or outliers in the data. This latter aspect is crucial for anomaly detection in situations where a clear, early warning of an abnormal condition is required. This paper proposes a novel entropy-based procedure for initializing the noise component in Gaussian mixture models. Our approach is shown to be easy to implement and effective for anomaly detection. We successfully identify anomalies in both simulated and real-world datasets, even in the presence of significant levels of noise and outliers. We provide a step-by-step description of the proposed data analysis process, along with the corresponding R code, which is publicly available in a GitHub repository.
(This article belongs to the Special Issue Machine Learning Algorithms for Bioinformatics Problems)

12 pages, 469 KiB  
Article
Stochastic Safety Radius on UPGMA
by Ruriko Yoshida, Lillian Paul and Peter Nesbitt
Algorithms 2022, 15(12), 483; https://doi.org/10.3390/a15120483 - 18 Dec 2022
Viewed by 1570
Abstract
Unweighted Pair Group Method with Arithmetic Mean (UPGMA) is one of the most popular distance-based methods to reconstruct an equidistant phylogenetic tree from a distance matrix computed from an alignment of sequences. Since we use equidistant trees as gene trees for phylogenomic analyses under the multi-species coalescent model and since an input distance matrix computed from an alignment of each gene in a genome is estimated via the maximum likelihood estimators, it is important to conduct a robust analysis on UPGMA. Stochastic safety radius, introduced by Steel and Gascuel, provides a lower bound for the probability that a phylogenetic tree reconstruction method returns the true tree topology from a given distance matrix. In this article, we compute the stochastic safety radius of UPGMA for a phylogenetic tree with n leaves. Computational experiments show an improved gap between empirical probabilities estimated from random samples and the true tree topology from UPGMA, increasing confidence in phylogenetic results.
(This article belongs to the Special Issue Machine Learning Algorithms for Bioinformatics Problems)
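
For readers unfamiliar with UPGMA, the following is a minimal Python sketch of the clustering step that the abstract above refers to: repeatedly merge the two closest clusters and update distances by size-weighted averaging. It is an illustration only, not the authors' code, and it does not compute the stochastic safety radius. The function name `upgma`, the nested-tuple tree representation, and the example matrix are assumptions made for this sketch.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build an equidistant tree from a symmetric distance matrix.

    Hypothetical helper for illustration: returns (tree, height), where
    tree is a nested tuple of labels and height is the root height.
    """
    # Active clusters: id -> (subtree, number of leaves, height).
    clusters = {i: (label, 1, 0.0) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by an unordered pair of ids.
    d = {frozenset((i, j)): float(dist[i][j])
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)

    while len(clusters) > 1:
        # Select the closest pair of active clusters.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        height = d.pop(pair) / 2.0  # the new internal node sits at half the distance
        tree_a, size_a, _ = clusters.pop(a)
        tree_b, size_b, _ = clusters.pop(b)

        # Distance from the merged cluster to every remaining cluster is the
        # average over all leaf pairs, i.e. a size-weighted mean.
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (
                (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
            )

        clusters[next_id] = ((tree_a, tree_b), size_a + size_b, height)
        next_id += 1

    (tree, _, height), = clusters.values()
    return tree, height


# Example with an assumed ultrametric distance matrix on four taxa.
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]
tree, height = upgma(labels, dist)
print(tree, height)
```

When the input matrix is perturbed by estimation noise, the selected closest pair can change, which is the sensitivity that a safety-radius analysis is meant to quantify.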

10 pages, 274 KiB  
Article
Periodic and Non-Periodic Brainwaves Emerging via Stochastic Syncronization of Closed Loops of Firing Neurons
by Piero Mazzetti and Anna Carbone
Algorithms 2022, 15(11), 396; https://doi.org/10.3390/a15110396 - 26 Oct 2022
Cited by 1 | Viewed by 1474
Abstract
Periodic and non-periodic components of electrophysiological signals are modelled in terms of synchronized sequences of closed loops of firing neurons correlated according to a Markov chain. Single closed loops of firing neurons reproduce fundamental and harmonic components, appearing as lines in the power spectra at frequencies ranging from 0.5 Hz to 100 Hz. Further interesting features of the brainwave signals emerge by considering multiple synchronized sequences of closed loops. In particular, we show that fluctuations in the number of synchronized loops lead to the onset of a broadband power spectral component. By the effects of these fluctuations and the emergence of a broadband component, a highly distorted waveform and nonstationarity of the signal are observed, consistent with empirical EEG and MEG signals. The amplitudes of the periodic and aperiodic components are evaluated by using typical firing neuron pulse amplitudes and durations.
(This article belongs to the Special Issue Machine Learning Algorithms for Bioinformatics Problems)

16 pages, 1054 KiB  
Article
A Neural Network Approach for the Analysis of Reproducible Ribo–Seq Profiles
by Giorgia Giacomini, Caterina Graziani, Veronica Lachi, Pietro Bongini, Niccolò Pancino, Monica Bianchini, Davide Chiarugi, Angelo Valleriani and Paolo Andreini
Algorithms 2022, 15(8), 274; https://doi.org/10.3390/a15080274 - 4 Aug 2022
Cited by 3 | Viewed by 3226
Abstract
In recent years, the Ribosome profiling technique (Ribo–seq) has emerged as a powerful method for globally monitoring the translation process in vivo at single nucleotide resolution. Based on deep sequencing of mRNA fragments, Ribo–seq makes it possible to obtain profiles that reflect the time spent by ribosomes in translating each part of an open reading frame. Unfortunately, the profiles produced by this method can vary significantly in different experimental setups, being characterized by poor reproducibility. To address this problem, we have employed a statistical method for the identification of highly reproducible Ribo–seq profiles, which was tested on a set of E. coli genes. State-of-the-art artificial neural network models have been used to validate the quality of the produced sequences. Moreover, new insights into the dynamics of ribosome translation have been provided through a statistical analysis on the obtained sequences.
(This article belongs to the Special Issue Machine Learning Algorithms for Bioinformatics Problems)