Special Issue "Evolution and Structure of Proteins and Proteomes"

A special issue of Genes (ISSN 2073-4425).

Deadline for manuscript submissions: closed (15 August 2011)

Special Issue Editors

Guest Editor
Dr. Kyung Mo Kim

Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, 111 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea
Guest Editor
Dr. Gustavo Caetano-Anollés

Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA
Phone: 217.333-8172

Special Issue Information

Dear Colleagues,

Since Darwin established the general principles of natural selection in 1859 and Kimura proposed the neutral theory in the late 1960s, most studies of the evolution of molecules have focused on individual gene sequences. However, the recent revolution in nucleic acid sequencing driven by shotgun and high-throughput technologies has led to the rapid generation of myriad genomic sequences across the three cellular domains of life and viruses. Similarly, advances in structural genomics have produced an ever-expanding repertoire of three-dimensional models of structure, providing a crucial link to our understanding of the molecular workings of the cell. These unique resources enable the exploration of diversity and change in molecules and molecular repertoires within different time frames and at a global and synthetic level. It is now possible to study how genomes, proteomes, biological processes, molecular functions, and biological networks (e.g., protein interactions and metabolic pathways) are organized and evolve. Some recent genome-wide studies have connected these aspects to each other in integrative ways. Others have fleshed out processes linked to mechanistic and evolutionary patterns, using experimental, bioinformatic and theoretical approaches. In this special issue, we encourage the contribution of review articles and original research papers that address the evolution, structure and function of proteins and proteomes using molecular evolution, genomic, structural, network, and systems biology frameworks.

Dr. Kyung Mo Kim
Dr. Gustavo Caetano-Anollés
Guest Editors

Keywords

  • biological networks
  • evolution
  • genomes
  • molecular function
  • molecular structure
  • protein
  • proteome

Published Papers (11 papers)

Research

Open Access Article: The Chlamydiales Pangenome Revisited: Structural Stability and Functional Coherence
Genes 2012, 3(2), 291-319; doi:10.3390/genes3020291
Received: 27 March 2012 / Revised: 2 May 2012 / Accepted: 8 May 2012 / Published: 16 May 2012
Cited by 3
Abstract
The entire publicly available set of 37 genome sequences from the bacterial order Chlamydiales has been subjected to comparative analysis in order to reveal the salient features of this pangenome and its evolutionary history. Over 2,000 protein families are detected across multiple species, with a distribution consistent with other studied pangenomes. Of these, there are 180 protein families with multiple members, 312 families with exactly 37 members corresponding to core genes, 428 families with peripheral genes with varying taxonomic distribution and finally 1,125 smaller families. The fact that, even for smaller genomes of Chlamydiales, core genes represent over a quarter of the average protein complement, signifies a certain degree of structural stability, given the wide range of phylogenetic relationships within the group. In addition, the propagation of a corpus of manually curated annotations within the discovered core families reveals key functional properties, reflecting a coherent repertoire of cellular capabilities for Chlamydiales. We further investigate over 2,000 genes without homologs in the pangenome and discover two new protein sequence domains. Our results, supported by the genome-based phylogeny for this group, are fully consistent with previous analyses and current knowledge, and point to future research directions towards a better understanding of the structural and functional properties of Chlamydiales.
(This article belongs to the Special Issue Evolution and Structure of Proteins and Proteomes)

Open Access Article: Evolution and Quantitative Comparison of Genome-Wide Protein Domain Distributions
Genes 2011, 2(4), 912-924; doi:10.3390/genes2040912
Received: 29 August 2011 / Revised: 7 October 2011 / Accepted: 25 October 2011 / Published: 9 November 2011
Cited by 3
Abstract
The metabolic and regulatory capabilities of an organism are implicit in its protein content. This is often hard to estimate, however, due to ascertainment biases inherent in the available genome annotations. Its complement of recognizable functional protein domains and their combinations convey essentially the same information and at the same time are much more readily accessible, although protein domain models trained for one phylogenetic group frequently fail on distantly related sequences. Pooling related domain models based on their GO-annotation in combination with de novo gene prediction methods provides estimates that seem to be less affected by phylogenetic biases. We show here for 18 diverse representatives from all eukaryotic kingdoms that a pooled analysis of the tendencies for co-occurrence or avoidance of protein domains is indeed feasible. This type of analysis can reveal general large-scale patterns in the domain co-occurrence and helps to identify lineage-specific variations in the evolution of protein domains. Somewhat surprisingly, we do not find strong ubiquitous patterns governing the evolutionary behavior of specific functional classes. Instead, there are strong variations between the major groups of Eukaryotes, pointing at systematic differences in their evolutionary constraints.
Open Access Article: Annotation of Protein Domains Reveals Remarkable Conservation in the Functional Make up of Proteomes Across Superkingdoms
Genes 2011, 2(4), 869-911; doi:10.3390/genes2040869
Received: 16 September 2011 / Revised: 28 October 2011 / Accepted: 28 October 2011 / Published: 8 November 2011
Cited by 12
Abstract
The functional repertoire of a cell is largely embodied in its proteome, the collection of proteins encoded in the genome of an organism. The molecular functions of proteins are the direct consequence of their structure and structure can be inferred from sequence using hidden Markov models of structural recognition. Here we analyze the functional annotation of protein domain structures in almost a thousand sequenced genomes, exploring the functional and structural diversity of proteomes. We find there is a remarkable conservation in the distribution of domains with respect to the molecular functions they perform in the three superkingdoms of life. In general, most of the protein repertoire is spent in functions related to metabolic processes but there are significant differences in the usage of domains for regulatory and extra-cellular processes both within and between superkingdoms. Our results support the hypotheses that the proteomes of superkingdom Eukarya evolved via genome expansion mechanisms that were directed towards innovating new domain architectures for regulatory and extra/intracellular process functions needed for example to maintain the integrity of multicellular structure or to interact with environmental biotic and abiotic factors (e.g., cell signaling and adhesion, immune responses, and toxin production). Proteomes of microbial superkingdoms Archaea and Bacteria retained fewer numbers of domains and maintained simple and smaller protein repertoires. Viruses appear to play an important role in the evolution of superkingdoms. We finally identify few genomic outliers that deviate significantly from the conserved functional design. These include Nanoarchaeum equitans, proteobacterial symbionts of insects with extremely reduced genomes, Tenericutes and Guillardia theta. 
These organisms spend most of their domains on information functions, including translation and transcription, rather than on metabolism and harbor a domain repertoire characteristic of parasitic organisms. In contrast, the functional repertoire of the proteomes of the Planctomycetes-Verrucomicrobia-Chlamydiae superphylum was no different than the rest of bacteria, failing to support claims of them representing a separate superkingdom. In turn, Protista and Bacteria shared similar functional distribution patterns suggesting an ancestral evolutionary link between these groups.

Open Access Article: Functional Capabilities of the Earliest Peptides and the Emergence of Life
Genes 2011, 2(4), 671-688; doi:10.3390/genes2040671
Received: 11 August 2011 / Revised: 14 September 2011 / Accepted: 14 September 2011 / Published: 26 September 2011
Cited by 12
Abstract
Considering how biological macromolecules first evolved, probably within a marine environment, it seems likely the very earliest peptides were not encoded by nucleic acids, or at least not via the genetic code as we know it. An objective of the present work is to demonstrate that sequence-independent peptides, or peptides with variable and unreliable lengths and sequences, have the potential to perform a variety of chemically useful functions such as anion and cation binding and membrane and channel formation as well as simple types of catalysis. These functions tend to be performed with the assistance of the main chain CONH atoms rather than the more variable or limited side chain atoms of the peptides presumed to exist then.
Open Access Article: Protein Folding Absent Selection
Genes 2011, 2(3), 608-626; doi:10.3390/genes2030608
Received: 10 July 2011 / Revised: 5 August 2011 / Accepted: 11 August 2011 / Published: 16 August 2011
Cited by 8
Abstract
Biological proteins are known to fold into specific 3D conformations. However, the fundamental question has remained: Do they fold because they are biological, and evolution has selected sequences which fold? Or is folding a common trait, widespread throughout sequence space? To address this question arbitrary, unevolved, random-sequence proteins were examined for structural features found in folded, biological proteins. Libraries of long (71 residue), random-sequence polypeptides, with ensemble amino acid composition near the mean for natural globular proteins, were expressed as cleavable fusions with ubiquitin. The structural properties of both the purified pools and individual isolates were then probed using circular dichroism, fluorescence emission, and fluorescence quenching techniques. Despite this necessarily sparse “sampling” of sequence space, structural properties that define globular biological proteins, namely collapsed conformations, secondary structure, and cooperative unfolding, were found to be prevalent among unevolved sequences. Thus, for polypeptides the size of small proteins, natural selection is not necessary to account for the compact and cooperative folded states observed in nature.
Open Access Article: Reassessing Domain Architecture Evolution of Metazoan Proteins: The Contribution of Different Evolutionary Mechanisms
Genes 2011, 2(3), 578-598; doi:10.3390/genes2030578
Received: 30 June 2011 / Revised: 13 July 2011 / Accepted: 2 August 2011 / Published: 5 August 2011
Cited by 8
Abstract
In the accompanying papers we have shown that sequence errors of public databases and confusion of paralogs and epaktologs (proteins that are related only through the independent acquisition of the same domain types) significantly distort the picture that emerges from comparison of the domain architecture (DA) of multidomain Metazoan proteins since they introduce a strong bias in favor of terminal over internal DA change. The issue of whether terminal or internal DA changes occur with greater probability has very important implications for the DA evolution of multidomain proteins since gene fusion can add domains only at terminal positions, whereas domain-shuffling is capable of inserting domains both at internal and terminal positions. As a corollary, overestimation of terminal DA changes may be misinterpreted as evidence for a dominant role of gene fusion in DA evolution. In this manuscript we show that in several recent studies of DA evolution of Metazoa the authors used databases that are significantly contaminated with incomplete, abnormal and mispredicted sequences (e.g., UniProtKB/TrEMBL, EnsEMBL) and/or the authors failed to separate paralogs and epaktologs, explaining why these studies concluded that the major mechanism for gains of new domains in metazoan proteins is gene fusion. In contrast with the latter conclusion, our studies on high quality orthologous and paralogous Swiss-Prot sequences confirm that shuffling of mobile domains had a major role in the evolution of multidomain proteins of Metazoa and especially those formed in early vertebrates.
Open Access Article: Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Errors Caused by Confusing Paralogs and Epaktologs
Genes 2011, 2(3), 516-561; doi:10.3390/genes2030516
Received: 7 June 2011 / Revised: 8 July 2011 / Accepted: 19 July 2011 / Published: 2 August 2011
Cited by 5
Abstract
In the accompanying paper (Nagy, Szláma, Szarka, Trexler, Bányai, Patthy, Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors) we showed that in the case of UniProtKB/TrEMBL, RefSeq, EnsEMBL and NCBI’s GNOMON predicted protein sequences of Metazoan species the contribution of erroneous (incomplete, abnormal, mispredicted) sequences to domain architecture (DA) differences of orthologous proteins might be greater than those of true gene rearrangements. Based on these findings, we suggest that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. In this manuscript we examine the impact of confusing paralogous and epaktologous multidomain proteins (i.e., those that are related only through the independent acquisition of the same domain types) on conclusions drawn about DA evolution of multidomain proteins in Metazoa. To estimate the contribution of this type of error we have used as reference UniProtKB/Swiss-Prot sequences from protein families with well-characterized evolutionary histories. We have used two types of paralogy-group construction procedures and monitored the impact of various parameters on the separation of true paralogs from epaktologs on correctly annotated Swiss-Prot entries of multidomain proteins. Our studies have shown that, although public protein family databases are contaminated with epaktologs, analysis of the structure of sequence similarity networks of multidomain proteins provides an efficient means for the separation of epaktologs and paralogs. 
We have also demonstrated that contamination of protein families with epaktologs increases the apparent rate of DA change and introduces a bias in DA differences in as much as it increases the proportion of terminal over internal DA differences. We have shown that confusing paralogous and epaktologous multidomain proteins significantly increases the apparent rate of DA change in Metazoa and introduces a positional bias in favor of terminal over internal DA changes. Our findings caution that earlier studies based on analysis of datasets of protein families that were contaminated with epaktologs may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of multidomain proteins is presented in an accompanying paper [1].
Open Access Article: Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors
Genes 2011, 2(3), 449-501; doi:10.3390/genes2030449
Received: 24 May 2011 / Revised: 14 June 2011 / Accepted: 20 June 2011 / Published: 13 July 2011
Cited by 12
Abstract
In view of the fact that appearance of novel protein domain architectures (DA) is closely associated with biological innovations, there is a growing interest in the genome-scale reconstruction of the evolutionary history of the domain architectures of multidomain proteins. In such analyses, however, it is usually ignored that a significant proportion of Metazoan sequences analyzed is mispredicted and that this may seriously affect the validity of the conclusions. To estimate the contribution of errors in gene prediction to differences in DA of predicted proteins, we have used the high quality manually curated UniProtKB/Swiss-Prot database as a reference. For genome-scale analysis of domain architectures of predicted proteins we focused on RefSeq, EnsEMBL and NCBI’s GNOMON predicted sequences of Metazoan species with completely sequenced genomes. Comparison of the DA of UniProtKB/Swiss-Prot sequences of worm, fly, zebrafish, frog, chick, mouse, rat and orangutan with those of human Swiss-Prot entries has identified relatively few cases where orthologs had different DA, although the percentage with different DA increased with evolutionary distance. In contrast with this, comparison of the DA of human, orangutan, rat, mouse, chicken, frog, zebrafish, worm and fly RefSeq, EnsEMBL and NCBI’s GNOMON predicted protein sequences with those of the corresponding/orthologous human Swiss-Prot entries identified a significantly higher proportion of domain architecture differences than in the case of the comparison of Swiss-Prot entries. Analysis of RefSeq, EnsEMBL and NCBI’s GNOMON predicted protein sequences with DAs different from those of their Swiss-Prot orthologs confirmed that the higher rate of domain architecture differences is due to errors in gene prediction, the majority of which could be corrected with our FixPred protocol.
We have also demonstrated that contamination of databases with incomplete, abnormal or mispredicted sequences introduces a bias in DA differences in as much as it increases the proportion of terminal over internal DA differences. Here we have shown that in the case of RefSeq, EnsEMBL and NCBI’s GNOMON predicted protein sequences of Metazoan species, the contribution of gene prediction errors to domain architecture differences of orthologs is comparable to or greater than those due to true gene rearrangements. We have also demonstrated that domain architecture comparison may serve as a useful tool for the quality control of gene predictions and may thus guide the correction of sequence errors. Our findings caution that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of orthologous and paralogous proteins is presented in an accompanying paper [1].

Review

Open Access Review: Antifragility and Tinkering in Biology (and in Business) Flexibility Provides an Efficient Epigenetic Way to Manage Risk
Genes 2011, 2(4), 998-1016; doi:10.3390/genes2040998
Received: 28 September 2011 / Revised: 25 October 2011 / Accepted: 16 November 2011 / Published: 29 November 2011
Cited by 8
Abstract
The notion of antifragility, an attribute of systems that makes them thrive under variable conditions, has recently been proposed by Nassim Taleb in a business context. This idea requires the ability of such systems to ‘tinker’, i.e., to creatively respond to changes in their environment. A fairly obvious example of this is natural selection-driven evolution. In this ubiquitous process, an original entity, challenged by an ever-changing environment, creates variants that evolve into novel entities. Analyzing functions that are essential during stationary-state life yields examples of entities that may be antifragile. One such example is proteins with flexible regions that can undergo functional alteration of their side residues or backbone and thus implement the tinkering that leads to antifragility. This in-built property of the cell chassis must be taken into account when considering construction of cell factories driven by engineering principles.
Open Access Review: The Evolution of Protein Structures and Structural Ensembles Under Functional Constraint
Genes 2011, 2(4), 748-762; doi:10.3390/genes2040748
Received: 24 September 2011 / Revised: 15 October 2011 / Accepted: 19 October 2011 / Published: 28 October 2011
Cited by 9
Abstract
Protein sequence, structure, and function are inherently linked through evolution and population genetics. Our knowledge of protein structure comes from solved structures in the Protein Data Bank (PDB), our knowledge of sequence through sequences found in the NCBI sequence databases (http://www.ncbi.nlm.nih.gov/), and our knowledge of function through a limited set of in-vitro biochemical studies. How these intersect through evolution is described in the first part of the review. In the second part, our understanding of a series of questions is addressed. This includes how sequences evolve within structures, how evolutionary processes enable structural transitions, how the folding process can change through evolution and what the fitness impacts of this might be. Moving beyond static structures, the evolution of protein kinetics (including normal modes) is discussed, as is the evolution of conformational ensembles and structurally disordered proteins. This ties back to a question of the role of neostructuralization and how it relates to selection on sequences for functions. The relationship between metastability, the fitness landscape, sequence divergence, and organismal effective population size is explored. Lastly, a brief discussion of modeling the evolution of sequences of ordered and disordered proteins is entertained.

Other

Open Access Correction: Nagy, A., et al. Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors. Genes 2011, 2, 449-501.
Genes 2011, 2(3), 599-607; doi:10.3390/genes2030599
Received: 9 August 2011 / Accepted: 16 August 2011 / Published: 16 August 2011
Abstract
We found some errors in the published versions of Figure S2, Figure S3 and Figure S8 of our paper [1]. The correct Figures are presented below. [...]

Journal Contact

MDPI AG
Genes Editorial Office
St. Alban-Anlage 66, 4052 Basel, Switzerland
genes@mdpi.com
Tel. +41 61 683 77 34
Fax: +41 61 302 89 18