Research

Jump to: Review

249 KiB

Open AccessArticle

Computational Recognition of RNA Splice Sites by Exact Algorithms for the Quadratic Traveling Salesman Problem

by Anja Fischer, Frank Fischer, Gerold Jäger, Jens Keilwagen, Paul Molitor and Ivo Grosse

Computation 2015, 3(2), 285-298; https://doi.org/10.3390/computation3020285 - 03 Jun 2015

Cited by 6 | Viewed by 6120

One fundamental problem of bioinformatics is the computational recognition of DNA and RNA binding sites. Given a set of short DNA or RNA sequences of equal length such as transcription factor binding sites or RNA splice sites, the task is to learn a [...] Read more.

One fundamental problem of bioinformatics is the computational recognition of DNA and RNA binding sites. Given a set of short DNA or RNA sequences of equal length such as transcription factor binding sites or RNA splice sites, the task is to learn a pattern from this set that allows the recognition of similar sites in another set of DNA or RNA sequences. Permuted Markov (PM) models and permuted variable length Markov (PVLM) models are two powerful models for this task, but the problem of finding an optimal PM model or PVLM model is NP-hard. While the problem of finding an optimal PM model or PVLM model of order one is equivalent to the traveling salesman problem (TSP), the problem of finding an optimal PM model or PVLM model of order two is equivalent to the quadratic TSP (QTSP). Several exact algorithms exist for solving the QTSP, but it is unclear if these algorithms are capable of solving QTSP instances resulting from RNA splice sites of at least 150 base pairs in a reasonable time frame. Here, we investigate the performance of three exact algorithms for solving the QTSP for ten datasets of splice acceptor sites and splice donor sites of five different species and find that one of these algorithms is capable of solving QTSP instances of up to 200 base pairs with a running time of less than two days. Full article

(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)

► Show Figures

1360 KiB

Open AccessArticle

A Guide to Phylogenetic Reconstruction Using Heterogeneous Models—A Case Study from the Root of the Placental Mammal Tree

by Raymond J. Moran, Claire C. Morgan and Mary J. O'Connell

Computation 2015, 3(2), 177-196; https://doi.org/10.3390/computation3020177 - 15 Apr 2015

Cited by 14 | Viewed by 21572

Abstract

There are numerous phylogenetic reconstruction methods and models available—but which should you use and why? Important considerations in phylogenetic analyses include data quality, structure, signal, alignment length and sampling. If poorly modelled, variation in rates of change across proteins and across lineages can [...] Read more.

There are numerous phylogenetic reconstruction methods and models available—but which should you use and why? Important considerations in phylogenetic analyses include data quality, structure, signal, alignment length and sampling. If poorly modelled, variation in rates of change across proteins and across lineages can lead to incorrect phylogeny reconstruction which can then lead to downstream misinterpretation of the underlying data. The risk of choosing and applying an inappropriate model can be reduced with some critical yet straightforward steps outlined in this paper. We use the question of the position of the root of placental mammals as our working example to illustrate the topological impact of model misspecification. Using this case study we focus on using models in a Bayesian framework and we outline the steps involved in identifying and assessing better fitting models for specific datasets. Full article

(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)

► Show Figures

Figure 1

4002 KiB

Open AccessArticle

Evolution by Pervasive Gene Fusion in Antibiotic Resistance and Antibiotic Synthesizing Genes

by Orla Coleman, Ruth Hogan, Nicole McGoldrick, Niamh Rudden and James O. McInerney

Computation 2015, 3(2), 114-127; https://doi.org/10.3390/computation3020114 - 26 Mar 2015

Cited by 4 | Viewed by 6053

Abstract

Phylogenetic (tree-based) approaches to understanding evolutionary history are unable to incorporate convergent evolutionary events where two genes merge into one. In this study, as exemplars of what can be achieved when a tree is not assumed a priori, we have analysed the evolutionary [...] Read more.

Phylogenetic (tree-based) approaches to understanding evolutionary history are unable to incorporate convergent evolutionary events where two genes merge into one. In this study, as exemplars of what can be achieved when a tree is not assumed a priori, we have analysed the evolutionary histories of polyketide synthase genes and antibiotic resistance genes and have shown that their history is replete with convergent events as well as divergent events. We demonstrate that the overall histories of these genes more closely resembles the remodelling that might be seen with the children’s toy Lego, than the standard model of the phylogenetic tree. This work demonstrates further that genes can act as public goods, available for re-use and incorporation into other genetic goods. Full article

(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)

► Show Figures

Figure 1

2430 KiB

Open AccessArticle

Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism

by Le Bao, Daniel Elleder, Raunaq Malhotra, Michael DeGiorgio, Theodora Maravegias, Lindsay Horvath, Laura Carrel, Colin Gillin, Tomáš Hron, Helena Fábryová, David R. Hunter and Mary Poss

Computation 2014, 2(4), 221-245; https://doi.org/10.3390/computation2040221 - 28 Nov 2014

Cited by 5 | Viewed by 9620

Abstract

Endogenous retroviruses (ERVs) are a class of transposable elements found in all vertebrate genomes that contribute substantially to genomic functional and structural diversity. A host species acquires an ERV when an exogenous retrovirus infects a germ cell of an individual and becomes part [...] Read more.

Endogenous retroviruses (ERVs) are a class of transposable elements found in all vertebrate genomes that contribute substantially to genomic functional and structural diversity. A host species acquires an ERV when an exogenous retrovirus infects a germ cell of an individual and becomes part of the genome inherited by viable progeny. ERVs that colonized ancestral lineages are fixed in contemporary species. However, in some extant species, ERV colonization is ongoing, which results in variation in ERV frequency in the population. To study the consequences of ERV colonization of a host genome, methods are needed to assign each ERV to a location in a species’ genome and determine which individuals have acquired each ERV by descent. Because well annotated reference genomes are not widely available for all species, de novo clustering approaches provide an alternative to reference mapping that are insensitive to differences between query and reference and that are amenable to mobile element studies in both model and non-model organisms. However, there is substantial uncertainty in both identifying ERV genomic position and assigning each unique ERV integration site to individuals in a population. We present an analysis suitable for detecting ERV integration sites in species without the need for a reference genome. Our approach is based on improved de novo clustering methods and statistical models that take the uncertainty of assignment into account and yield a probability matrix of shared ERV integration sites among individuals. We demonstrate that polymorphic integrations of a recently identified endogenous retrovirus in deer reflect contemporary relationships among individuals and populations. Full article

(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)

► Show Figures

Figure 1

835 KiB

Open AccessArticle

Incongruencies in Vaccinia Virus Phylogenetic Trees

by Chad Smithson, Samantha Kampman, Benjamin M. Hetman and Chris Upton

Computation 2014, 2(4), 182-198; https://doi.org/10.3390/computation2040182 - 14 Oct 2014

Cited by 11 | Viewed by 9239

Abstract

Over the years, as more complete poxvirus genomes have been sequenced, phylogenetic studies of these viruses have become more prevalent. In general, the results show similar relationships between the poxvirus species; however, some inconsistencies are notable. Previous analyses of the viral genomes contained [...] Read more.

Over the years, as more complete poxvirus genomes have been sequenced, phylogenetic studies of these viruses have become more prevalent. In general, the results show similar relationships between the poxvirus species; however, some inconsistencies are notable. Previous analyses of the viral genomes contained within the vaccinia virus (VACV)-Dryvax vaccine revealed that their phylogenetic relationships were sometimes clouded by low bootstrapping confidence. To analyze the VACV-Dryvax genomes in detail, a new tool-set was developed and integrated into the Base-By-Base bioinformatics software package. Analyses showed that fewer unique positions were present in each VACV-Dryvax genome than expected. A series of patterns, each containing several single nucleotide polymorphisms (SNPs) were identified that were counter to the results of the phylogenetic analysis. The VACV genomes were found to contain short DNA sequence blocks that matched more distantly related clades. Additionally, similar non-conforming SNP patterns were observed in (1) the variola virus clade; (2) some cowpox clades; and (3) VACV-CVA, the direct ancestor of VACV-MVA. Thus, traces of past recombination events are common in the various orthopoxvirus clades, including those associated with smallpox and cowpox viruses. Full article

(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)

► Show Figures

Figure 1

959 KiB

Open AccessArticle

On Mechanistic Modeling of Gene Content Evolution: Birth-Death Models and Mechanisms of Gene Birth and Gene Retention

by Ashley I. Teufel, Jing Zhao, Malgorzata O'Reilly, Liang Liu and David A. Liberles

Computation 2014, 2(3), 112-130; https://doi.org/10.3390/computation2030112 - 28 Aug 2014

Cited by 9 | Viewed by 8865

Abstract

Characterizing the mechanisms of duplicate gene retention using phylogenetic methods requires models that are consistent with different biological processes. The interplay between complex biological processes and necessarily simpler statistical models leads to a complex modeling problem. A discussion of the relationship between biological [...] Read more.

Characterizing the mechanisms of duplicate gene retention using phylogenetic methods requires models that are consistent with different biological processes. The interplay between complex biological processes and necessarily simpler statistical models leads to a complex modeling problem. A discussion of the relationship between biological processes, existing models for duplicate gene retention and data is presented. Existing models are then extended in deriving two new birth/death models for phylogenetic application in a gene tree/species tree reconciliation framework to enable probabilistic inference of the mechanisms from model parameterization. The goal of this work is to synthesize a detailed discussion of modeling duplicate genes to address biological questions, moving from previous work to future trajectories with the aim of generating better models and better inference. Full article

(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)

► Show Figures

Figure 1

Review

Jump to: Research

841 KiB

Open AccessReview

Evolutionary Dynamics in Gene Networks and Inference Algorithms

by Daniel Aguilar-Hidalgo, María C. Lemos and Antonio Córdoba

Computation 2015, 3(1), 99-113; https://doi.org/10.3390/computation3010099 - 13 Mar 2015

Cited by 6 | Viewed by 5580

Abstract

Dynamical interactions among sets of genes (and their products) regulate developmental processes and some dynamical diseases, like cancer. Gene regulatory networks (GRNs) are directed networks that define interactions (links) among different genes/proteins involved in such processes. Genetic regulation can be modified during the [...] Read more.

Dynamical interactions among sets of genes (and their products) regulate developmental processes and some dynamical diseases, like cancer. Gene regulatory networks (GRNs) are directed networks that define interactions (links) among different genes/proteins involved in such processes. Genetic regulation can be modified during the time course of the process, which may imply changes in the nodes activity that leads the system from a specific state to a different one at a later time (dynamics). How the GRN modifies its topology, to properly drive a developmental process, and how this regulation was acquired across evolution are questions that the evolutionary dynamics of gene networks tackles. In the present work we review important methodology in the field and highlight the combination of these methods with evolutionary algorithms. In recent years, this combination has become a powerful tool to fit models with the increasingly available experimental data. Full article

(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)

► Show Figures

Graphical abstract

267 KiB

Open AccessReview

Computation of the Likelihood in Biallelic Diffusion Models Using Orthogonal Polynomials

by Claus Vogl

Computation 2014, 2(4), 199-220; https://doi.org/10.3390/computation2040199 - 14 Nov 2014

Cited by 3 | Viewed by 4359

Abstract

In population genetics, parameters describing forces such as mutation, migration and drift are generally inferred from molecular data. Lately, approximate methods based on simulations and summary statistics have been widely applied for such inference, even though these methods waste information. In contrast, probabilistic [...] Read more.

In population genetics, parameters describing forces such as mutation, migration and drift are generally inferred from molecular data. Lately, approximate methods based on simulations and summary statistics have been widely applied for such inference, even though these methods waste information. In contrast, probabilistic methods of inference can be shown to be optimal, if their assumptions are met. In genomic regions where recombination rates are high relative to mutation rates, polymorphic nucleotide sites can be assumed to evolve independently from each other. The distribution of allele frequencies at a large number of such sites has been called “allele-frequency spectrum” or “site-frequency spectrum” (SFS). Conditional on the allelic proportions, the likelihoods of such data can be modeled as binomial. A simple model representing the evolution of allelic proportions is the biallelic mutation-drift or mutation-directional selection-drift diffusion model. With series of orthogonal polynomials, specifically Jacobi and Gegenbauer polynomials, or the related spheroidal wave function, the diffusion equations can be solved efficiently. In the neutral case, the product of the binomial likelihoods with the sum of such polynomials leads to finite series of polynomials, i.e., relatively simple equations, from which the exact likelihoods can be calculated. In this article, the use of orthogonal polynomials for inferring population genetic parameters is investigated. Full article

(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)

► Show Figures