Algorithms in Computational Biology

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Analysis of Algorithms and Complexity Theory".

Deadline for manuscript submissions: closed (31 March 2021) | Viewed by 18236

Special Issue Editors


Dr. Helene Touzet
Guest Editor
CNRS, Lille, France
Interests: computational molecular biology; algorithms; genomics; metagenomics; high throughput sequencing

Dr. Aïda Ouangraoua
Guest Editor
Department of Computer Science, University of Sherbrooke, Sherbrooke, QC, Canada
Interests: computational molecular biology; RNA structure and function; evolution; algorithms; big data

Special Issue Information

Dear Colleagues,

The last decade has witnessed the generation of increasingly massive and complex -omics data by high-throughput technologies: genomes, transcriptomes, proteomes, metagenomes, and epigenomes, with important applications in biological, environmental, and biomedical sciences. The huge amount and heterogeneity of these data have taken computational biology into a big data era, with a shift from single-level analysis to large-scale multi-omics data integration. This has given rise to diverse problems in storing, processing, and annotating these data, which require powerful algorithmic techniques to be solved efficiently in practice. The aim of this Special Issue is to present state-of-the-art algorithmic innovations that address the computational bottlenecks of -omics data analysis. This includes a variety of methods, such as discrete algorithms on sequences, trees and graphs, data structures and compressed data structures, parallel computing, combinatorial and sampling approaches, heuristics and parameterized algorithms, data mining, and machine learning techniques.

We invite you to submit high-quality papers to this Special Issue on “Algorithms in Computational Biology”, with subjects covering the whole range from theory to applications. Surveys are also welcome. The following is a (non-exhaustive) list of topics of interest:

  • Genomics and pangenomics
  • Transcriptomics
  • Metagenomics
  • Epigenomics
  • Proteomics and proteogenomics
  • Sequence comparison
  • Sequence assembly
  • Structural variants
  • RNA and protein structures
  • Structural and functional annotation
  • Evolution and comparative genomics
  • Biological networks

Dr. Helene Touzet
Dr. Aïda Ouangraoua
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Genomics and pangenomics
  • Transcriptomics
  • Metagenomics
  • Epigenomics  
  • Proteomics and proteogenomics
  • Sequence comparison
  • Sequence assembly
  • Structural variants
  • RNA and protein structures
  • Structural and functional annotation
  • Evolution and comparative genomics
  • Biological networks

Published Papers (6 papers)


Research

23 pages, 1463 KiB  
Article
Improved Duplication-Transfer-Loss Reconciliation with Extinct and Unsampled Lineages
by Samson Weiner and Mukul S. Bansal
Algorithms 2021, 14(8), 231; https://doi.org/10.3390/a14080231 - 05 Aug 2021
Cited by 3 | Viewed by 2905
Abstract
Duplication-Transfer-Loss (DTL) reconciliation is a widely used computational technique for understanding gene family evolution and inferring horizontal gene transfer (transfer for short) in microbes. However, most existing models and implementations of DTL reconciliation cannot account for the effect of unsampled or extinct species lineages on the evolution of gene families, likely affecting their accuracy. Accounting for the presence and possible impact of any unsampled species lineages, including those that are extinct, is especially important for inferring and studying horizontal transfer since many genes in the species lineages represented in the reconciliation analysis are likely to have been acquired through horizontal transfer from unsampled lineages. While models of DTL reconciliation that account for transfer from unsampled lineages have already been proposed, they use a relatively simple framework for transfer from unsampled lineages and cannot explicitly infer the location on the species tree of each unsampled or extinct lineage associated with an identified transfer event. Furthermore, there do not yet exist any systematic studies to assess the impact of accounting for unsampled lineages on the accuracy of DTL reconciliation.
In this work, we address these deficiencies by (i) introducing an extended DTL reconciliation model, called the DTLx reconciliation model, that accounts for unsampled and extinct species lineages in a new, more functional manner compared to existing models, (ii) showing that optimal reconciliations under the new DTLx reconciliation model can be computed just as efficiently as under the fastest DTL reconciliation model, (iii) providing an efficient algorithm for sampling optimal DTLx reconciliations uniformly at random, (iv) performing the first systematic simulation study to assess the impact of accounting for unsampled lineages on the accuracy of DTL reconciliation, and (v) comparing the accuracies of inferring transfers from unsampled lineages under our new model and the only other previously proposed parsimony-based model for this problem.
(This article belongs to the Special Issue Algorithms in Computational Biology)

27 pages, 1377 KiB  
Article
Guaranteed Diversity and Optimality in Cost Function Network Based Computational Protein Design Methods
by Manon Ruffini, Jelena Vucinic, Simon de Givry, George Katsirelos, Sophie Barbe and Thomas Schiex
Algorithms 2021, 14(6), 168; https://doi.org/10.3390/a14060168 - 28 May 2021
Cited by 10 | Viewed by 3375
Abstract
Proteins are the main active molecules of life. Although natural proteins play many roles, as enzymes or antibodies for example, there is a need to go beyond the repertoire of natural proteins to produce engineered proteins that precisely meet application requirements, in terms of function, stability, activity or other protein capacities. Computational Protein Design aims at designing new proteins from first principles, using full-atom molecular models. However, the size and complexity of proteins require approximations to make them amenable to energetic optimization queries. These approximations make the design process less reliable, and a provably optimal solution may fail. In practice, expensive libraries of solutions are therefore generated and tested. In this paper, we explore the idea of generating libraries of provably diverse low-energy solutions by extending cost function network algorithms with dedicated automaton-based diversity constraints on a large set of realistic full protein redesign problems. We observe that it is possible to generate provably diverse libraries in reasonable time and that the produced libraries do enhance the Native Sequence Recovery, a traditional measure of design method reliability.

13 pages, 1300 KiB  
Article
Validation of Automated Chromosome Recovery in the Reconstruction of Ancestral Gene Order
by Qiaoji Xu, Lingling Jin, James H. Leebens-Mack and David Sankoff
Algorithms 2021, 14(6), 160; https://doi.org/10.3390/a14060160 - 21 May 2021
Cited by 6 | Viewed by 2146
Abstract
The RACCROCHE pipeline reconstructs ancestral gene orders and chromosomal contents of the ancestral genomes at all internal vertices of a phylogenetic tree. The strategy is to accumulate a very large number of generalized adjacencies, phylogenetically justified for each ancestor, to produce long ancestral contigs through maximum weight matching. It constructs chromosomes by counting the frequencies of ancestral contig co-occurrences on the extant genomes, clustering these for each ancestor and ordering them. The main objective of this paper is to closely simulate the evolutionary process giving rise to the gene content and order of a set of extant genomes (six distantly related monocots), and to assess to what extent an updated version of RACCROCHE can recover the artificial ancestral genome at the root of the phylogenetic tree relating to the simulated genomes.

25 pages, 562 KiB  
Article
Disjoint Tree Mergers for Large-Scale Maximum Likelihood Tree Estimation
by Minhyuk Park, Paul Zaharias and Tandy Warnow
Algorithms 2021, 14(5), 148; https://doi.org/10.3390/a14050148 - 07 May 2021
Cited by 5 | Viewed by 3212
Abstract
The estimation of phylogenetic trees for individual genes or multi-locus datasets is a basic part of considerable biological research. In order to enable large trees to be computed, Disjoint Tree Mergers (DTMs) have been developed; these methods operate by dividing the input sequence dataset into disjoint sets, constructing trees on each subset, and then combining the subset trees (using auxiliary information) into a tree on the full dataset. DTMs have been used to advantage for multi-locus species tree estimation, enabling highly accurate species trees at reduced computational effort, compared to leading species tree estimation methods. Here, we evaluate the feasibility of using DTMs to improve the scalability of maximum likelihood (ML) gene tree estimation to large numbers of input sequences. Our study shows distinct differences between the three selected ML codes—RAxML-NG, IQ-TREE 2, and FastTree 2—and shows that good DTM pipeline design can provide advantages over these ML codes on large datasets.

17 pages, 2551 KiB  
Article
Multiple Loci Selection with Multi-Way Epistasis in Coalescence with Recombination
by Aritra Bose, Filippo Utro, Daniel E. Platt and Laxmi Parida
Algorithms 2021, 14(5), 136; https://doi.org/10.3390/a14050136 - 25 Apr 2021
Viewed by 2411
Abstract
As studies move into deeper characterization of the impact of selection through non-neutral mutations in whole genome population genetics, modeling for selection becomes crucial. Moreover, epistasis has long been recognized as a significant component in understanding the evolution of complex genetic systems. We present a backward coalescent model, EpiSimRA, that accommodates multiple loci selection, with multi-way (k-way) epistasis for any arbitrary k. Starting from arbitrary extant populations with epistatic sites, we trace the Ancestral Recombination Graph (ARG), sampling relevant recombination and coalescent events. Our framework allows for studying different complex evolutionary scenarios in the presence of selective sweeps, positive and negative selection with multiway epistasis. We also present a forward counterpart of the coalescent model based on a Wright-Fisher (WF) process, which we use as a validation framework, comparing the hallmarks of the ARG between the two. We provide the first framework that allows a nose-to-nose comparison of multiway epistasis in a coalescent simulator with its forward counterpart with respect to the hallmarks of the ARG. We demonstrate, through extensive experiments, that EpiSimRA is consistently superior in terms of performance (seconds vs. hours) in comparison to the forward model without compromising on its accuracy.

Review

23 pages, 677 KiB  
Review
Predicting the Evolution of Syntenies—An Algorithmic Review
by Nadia El-Mabrouk
Algorithms 2021, 14(5), 152; https://doi.org/10.3390/a14050152 - 11 May 2021
Cited by 3 | Viewed by 2562
Abstract
Syntenies are genomic segments of consecutive genes identified by a certain conservation in gene content and order. The notion of conservation may vary from one definition to another, the more constrained requiring identical gene contents and gene orders, while more relaxed definitions just require a certain similarity in gene content, and not necessarily in the same order. Regardless of the way they are identified, the goal is to characterize homologous genomic regions, i.e., regions deriving from a common ancestral region, reflecting a certain gene co-evolution that can enlighten important functional properties. In addition to being able to identify them, it is also necessary to infer the evolutionary history that has led from the ancestral segment to the extant ones. In this field, most algorithmic studies address the problem of inferring rearrangement scenarios explaining the disruption in gene order between segments with the same gene content, some of them extending the evolutionary model to gene insertion and deletion. However, syntenies also evolve through other events modifying their content in genes, such as duplications, losses or horizontal gene transfers, i.e., the movement of genes from one species to another. Although the reconciliation approach between a gene tree and a species tree addresses the problem of inferring such events for single-gene families, little effort has been dedicated to the generalization to segmental events and to syntenies. This paper reviews some of the main algorithmic methods for inferring ancestral syntenies and focuses on those integrating both gene orders and gene trees.
