1. Introduction
All living species cope with their ever-changing environments by activating the cellular stress response (CSR) [
1]. The major components of the CSR are a set of proteins named heat shock proteins (HSPs). Several HSPs are molecular chaperones, playing a pivotal role in protein homeostasis under pathological and stressful conditions by supporting protein folding, refolding, and transport across various membranes [
2]. These proteins also inhibit apoptotic pathways and promote various immune responses by recruiting cytokines and anti-inflammatory effectors [
3], and have been associated with several diseases, including cancer, cardiovascular, and neurodegenerative diseases [
4,
5,
6]. In addition to their indispensable intracellular functions, several HSPs are also found at the plasma membrane [
7,
8,
9] and the extracellular environment, where they function in cell signaling and immunity [
10,
11,
12]. HSPs are diverse in function, structure, and domain organization and are usually separated into families according to their molecular size [
2].
Evolutionarily, the different HSP families are present in all kingdoms of life, i.e., bacteria, archaea, and eukaryotes [
13,
14]. Furthermore, when the members of each family, e.g., Hsp70s, are compared to each other, they exhibit remarkable conservation in function, structure, and primary sequence. Therefore, the HSP system’s main components are conserved and, in most cases, stem from the same ancestral molecule [
13,
14]. Such a high degree of conservation has long been attributed to the process of natural selection in the form of purifying selection [
15,
16].
Although the main HSP components are present and highly conserved in all species, eukaryotes have significantly diversified several of them, including Hsp70s, with eukaryotic genomes encoding 10 to 15 different Hsp70s compared to the one to three Hsp70s encoded within bacterial genomes. This gene family diversification has long been attributed to gene or genome duplication events that resulted in homologous proteins with specialized functions [
15]. For example, there are specialized Hsp70 systems in the mitochondria, endoplasmic reticulum, cytosol, and plant plastids. Similar levels of conservation and gene multiplication hold true for almost all other HSP families, including Hsp40s.
Therefore, common ancestry (homology) explains the presence of the Hsp70 system in all species, and gene duplication explains its amplification in different eukaryotic phyla. However, in several cases, the system “reinvented” itself even though this was not the most parsimonious solution to the cell stress response problem. For example, the bacterial Hsp70 system comprises three major proteins, i.e., Hsp70 (DnaK), Hsp40 (DnaJ), and GrpE [
17,
18]. In this system, DnaK is the major chaperone, responsible for protein folding and unfolding; DnaJ is a co-chaperone, responsible for stimulating DnaK’s ATP hydrolysis; and GrpE is the nucleotide exchange factor (NEF), responsible for removing ADP from DnaK. In all eukaryotes, the same homologous system is present only in the mitochondria and the plant plastids [
19,
20,
21].
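The division of labor described above can be sketched as a toy state machine. This is a minimal illustration, not a kinetic model: the state labels, the `step` and `run` helpers, and the explicit ATP-rebinding transition are assumptions added to close the cycle and are not taken from the text.

```python
# Toy model of the bacterial Hsp70 nucleotide cycle (illustrative only).
# Keys are (nucleotide state of the Hsp70, factor acting on it).
CYCLE = {
    ("ATP", "DnaJ"): "ADP",    # co-chaperone stimulates ATP hydrolysis
    ("ADP", "GrpE"): "empty",  # NEF removes ADP from the Hsp70
    ("empty", "ATP"): "ATP",   # ATP rebinding (assumed step, not in the text)
}


def step(state: str, factor: str) -> str:
    """Return the next state; a factor acting on the wrong state does nothing."""
    return CYCLE.get((state, factor), state)


def run(state: str, factors: list[str]) -> str:
    """Apply a sequence of factors to an initial nucleotide state."""
    for factor in factors:
        state = step(state, factor)
    return state


# One full turn of the cycle returns the chaperone to its ATP-bound state.
final_state = run("ATP", ["DnaJ", "GrpE", "ATP"])
```

In this simplified picture, replacing GrpE with an analogous NEF such as a BAG domain protein or Hsp110 would change only the label of the second transition, not the shape of the cycle.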
In contrast, in all the other eukaryotic Hsp70 systems, the Hsp70 and Hsp40 are homologous to their bacterial counterparts, but the NEFs are not [
3,
22]. Indeed, different eukaryotic species and different Hsp70s use analogous proteins to remove ADP from the Hsp70. Some use a modified version of an Hsp70, named Hsp110 [
23], while others use proteins that contain the BAG (Bcl2-associated athanogene) domain [
24]. All these proteins bind to the same region on the Hsp70 molecule, causing similar molecular shifts to release the ADP from the Hsp70s’ nucleotide-binding groove. Remarkably, GrpE and the BAG domain have similar structures and cause identical structural changes upon binding to the Hsp70 polypeptide [
25].
These observations reveal that during the evolution of the Hsp70 systems, there were many cases of divergent evolution, in which gene duplication resulted in homologous proteins and structures, and several cases of convergent evolution, in which similar proteins, structures, and functions evolved independently in different eukaryotic lineages. The presence or absence of particular NEFs in diverse eukaryotic lineages raises fundamental questions regarding their origin, evolution, and function. For example, based on the continuous presence of GrpE in all bacteria and mitochondria, we can safely assume that GrpE arose very early, predating the origin of the eukaryotic cell, while Hsp110s probably represent eukaryotic innovations.
The case of the BAG domain-containing proteins, however, is more complicated because they are known to exist in vertebrates and plants [
26], while their presence in non-vertebrate animals and fungi remains elusive. Additionally, the animal and plant sequences have highly diverged, making the identification and retrieval of plant sequences using, for example, human BAGs, uncertain and error-prone [
26]. The sporadic presence of BAG-containing proteins in eukaryotes makes their origin obscure, and the question as to whether vertebrate and plant BAGs are homologous or analogous warrants a definite answer. Furthermore, considering that the function of NEFs is relatively straightforward and conserved, it is currently unclear which NEFs evolved to provide redundancy or functional specialization to Hsp70s and potentially provide organisms with selective advantages in particular environments.
Although the interspecies evolution of BAG domain-containing proteins has received some attention, their evolution and natural variation within a species, including humans, remain unknown. In humans, the most common form of genetic variation is a single nucleotide polymorphism (SNP), a single nucleotide difference between individuals at particular positions within their DNA [
27]. The majority of complex traits and diseases in humans are hypothesized to arise from rare SNPs in specific combinations with common SNPs. However, how SNPs within BAG-encoding genes manifest their functional consequences and whether they are subject to purifying selection or other evolutionary mechanisms within humans remain unknown.
In humans, there are five proteins that contain the BAG domain (BAG-1 to -5) [
22]. Among them, BAG1, one of the most studied BAG genes, encodes at least three protein isoforms, i.e., BAG-1L, BAG-1M, and BAG-1S [
28,
29]. All BAG-1 isoforms share two major domains, an N-terminal ubiquitin-like domain (UBQ-like) and a C-terminal BAG domain [
30,
31]. Although BAG-1’s exact mechanism of action is not entirely clear, human BAG-1 participates in a wide variety of cellular processes, including apoptosis, cell growth and survival, transcriptional regulation, protein refolding and degradation, and tumorigenesis [
24,
31,
32,
33].
To answer some of these questions and shed light on the complex evolution of NEFs and, in particular, BAGs, we investigated the origin and evolution of BAG-1 between species, examined the patterns of extant natural variation in humans, and assessed the functional consequences of specific natural polymorphisms in humans.
3. Discussion
Based on the current databases and sequence similarity tools used, the BAG1 gene is not present in bacteria or archaea. This finding strongly suggests that this protein is a eukaryotic innovation and is supported by the fact that all studied bacteria use GrpE as a nucleotide exchange factor [
22].
Our results support homology (in contrast to analogy) between extant BAG-1 proteins and suggest that BAG-1 originated once early in eukaryotic evolution and was subsequently lost from a vast number of species. Our findings support an evolutionary scenario according to which the fusion between the UBQ-like and BAG domains occurred only once, before the separation of the major eukaryotic kingdoms of animals, fungi, and plants. After the domain fusion event, the molecule followed separate and independent evolutionary trajectories in the different lineages, resulting in the very sporadic distribution in the extant animals, fungi, and plants for which genomic information is available.
This evolutionary pattern is consistent with the birth-and-death model of evolution [
15,
42], in which a particular gene, as it is passed vertically during speciation, either duplicates (gene birth) or disappears (gene death). According to this model of evolution, the duplicated gene (paralog) is “free” from functional constraints and thus accumulates more mutations than the original (first) gene copy [
15,
42]. The latter idea could explain the high divergence of BAG-1 proteins between different phyla, e.g., animals and plants, because the genes found in the extant species might not be direct orthologs. Instead, these sequences might represent paralogs that have been lost differentially and independently in the different eukaryotic lineages. Alternatively (though the explanations are not mutually exclusive), the low conservation can be explained by relaxation of purifying selection (which allows the accumulation of many amino acid-altering mutations as long as the major function and structure are preserved) or even by the action of positive (diversifying) selection, which favors amino acid-altering mutations [
43,
44]. On the other end of the spectrum, however, we observed a high level of conservation of particular amino acids that seem to preserve the structure (three antiparallel α-helices) and NEF function (binding to the NBD region of Hsp70s) [
24,
39]. Although this conservation could be the result of genetic drift, in which particular mutations are randomly fixed in a population, we favor the action of purifying selection. Therefore, it seems probable that BAG-1 has evolved under a mixture of different evolutionary forces. Whether these forces resulted in different or specialized functions remains unknown, although the presence of several conserved residues of yet unknown functionality among all BAG-1 proteins studied, together with multiple phylum-specific conserved residues, supports this notion. Furthermore, it is not known which cellular functions attributed to BAG-1, other than nucleotide exchange, are conserved between the different species. Based on the observed conservation patterns in both domains, we speculate that some of these functions might be conserved. The association of BAG-1 with the proteasome observed in mammals [
45] and plants [
46] provides some support to this speculation.
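The selective regimes weighed above are commonly summarized by the ratio ω = dN/dS of nonsynonymous to synonymous substitution rates, with ω below 1 read as purifying selection, ω near 1 as neutral evolution, and ω above 1 as positive selection. The following is a minimal sketch of that interpretation only. The function name, the tolerance band around 1, and the example values are hypothetical and are not estimates from this study.

```python
def classify_selection(dn: float, ds: float, tol: float = 0.1) -> str:
    """Label the selective regime implied by omega = dN/dS.

    `tol` is an arbitrary (assumed) band around 1 that is treated as neutral;
    real analyses rely on likelihood-based tests rather than a fixed cutoff.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    omega = dn / ds
    if omega < 1 - tol:
        return "purifying"
    if omega > 1 + tol:
        return "positive"
    return "neutral"


# Made-up rates for illustration only.
conserved_site_class = classify_selection(dn=0.02, ds=0.20)
variable_site_class = classify_selection(dn=0.30, ds=0.10)
```

A gene evolving under "a mixture of different evolutionary forces" would, in this framing, show different ω values for different sites or domains rather than a single gene-wide value.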
If the ideas presented above hold true, would such mixed evolutionary patterns also be visible within a single species’ populations? Indeed, analysis of the type, frequency, and distribution of SNPs present within BAG1 revealed evolutionary patterns similar to those observed between species. First, the finding that most SNPs within BAG1 are rare suggests that BAG1 follows a similar model of evolution within humans as between species. Second, the observation that SNPs found within BAG1 were unequally distributed across different regions of the gene agrees with the finding that conserved amino acids within BAG-1 are scattered across the primary amino acid sequence and suggests that at least certain regions within BAG1 are resilient to change. Furthermore, the bimodal distribution of amino acid-altering SNPs (nsSNPs) across BAG1’s coding region reveals differential accumulation of mutations within the two major functional domains (UBQ-like and BAG). These findings could be the result of diverse selective pressures acting on these two gene regions. Although the presence of different selection pressures acting upon the two regions (purifying on the UBQ-like domain and positive on the BAG domain) could explain the current findings, the random accumulation of mutations that have not been eliminated from the population cannot be ruled out. Together, these data suggest that SNPs occurring within BAG1’s coding region are subject to relaxed purifying selection, potentially due to less stringent functional constraints, which could be explained partially by the redundancy within the Hsp70 networks. Whether or not this is the case, these suggestions are further corroborated by the interspecies evolution of BAG-1, which also revealed the presence of multiple radical amino acid changes.
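The two observations above (most SNPs are rare, and nsSNPs accumulate unevenly across the two domains) correspond to a simple tabulation, sketched below under loud assumptions. The domain boundaries, the minor allele frequency (MAF) cutoff for "rare", the helper names, and the example records are all hypothetical and do not come from the data analyzed here.

```python
from collections import Counter

# Hypothetical residue ranges for the two domains; NOT taken from the study.
DOMAINS = {"UBQ-like": (1, 100), "BAG": (150, 230)}
RARE_MAF = 0.01  # assumed cutoff; definitions of "rare" vary between studies


def domain_of(position: int, domains: dict = DOMAINS) -> str:
    """Return the domain containing a residue position, or 'other'."""
    for name, (start, end) in domains.items():
        if start <= position <= end:
            return name
    return "other"


def summarize_snps(snps, rare_maf: float = RARE_MAF):
    """Count nsSNPs per domain and compute the fraction that are rare.

    `snps` is an iterable of (residue_position, minor_allele_frequency).
    """
    snps = list(snps)
    per_domain = Counter(domain_of(pos) for pos, _ in snps)
    n_rare = sum(1 for _, maf in snps if maf < rare_maf)
    frac_rare = n_rare / len(snps) if snps else 0.0
    return per_domain, frac_rare


# Made-up records for illustration only.
example = [(12, 0.002), (45, 0.0005), (120, 0.2), (160, 0.004), (201, 0.03)]
counts, frac_rare = summarize_snps(example)
```

Counts alone do not distinguish selection from chance: a domain-level difference would still need to be tested against the expectation under random accumulation, which is the alternative the text explicitly leaves open.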
To test some of the different evolutionary hypotheses, we characterized the functional consequences of a few selected SNPs. Based on the computational analysis, these mutations were not considered particularly radical in their chemical nature, although they were predicted to be “deleterious” by several algorithms. This assumption was partially verified by the almost identical melting temperature of all the recombinant proteins generated. On the other hand, some mutations altered the ATPase activity of HSPA1A differently from the wild-type BAG-1. However, it is difficult to pinpoint the mechanistic details of the observed changes because of the lack of any binding data. Moreover, if we assume that BAG-1 locks the HSPA1A in an “open” conformation [
47,
48], then the refolding assay suggests that the M215V mutant (which also affected Hsp70’s ATP hydrolysis) has lost this ability. This result could be related to changes in the packing interactions that resulted from the presence of valine in place of methionine. This finding could be interpreted as a loss of BAG-1 function in homozygous individuals. Alternatively, these findings might suggest functional redundancy because other NEFs could compensate for a non-functional BAG-1 protein.
Collectively, the results of these assays suggest small but consistent changes in the function of BAG-1. This interpretation further suggests that the interaction of BAG-1 with Hsp70s does not depend on a single amino acid, but instead, it is a combination of smaller interactions that position the molecule in the correct orientation, thus allowing it to perform its NEF activity irrespective of the apparent affinity for the Hsp70 molecule [
39]. Furthermore, we might not have observed major functional alterations because, except for the intracellular luciferase assay, the other in vitro assays did not contain other accessory molecules (e.g., other co-chaperones) [
24,
33,
39,
47,
48,
49]. Lastly, BAG-1 has several functions inside the cell, which include association with the proteasome, binding to Bcl2, and regulation of apoptosis [
31,
45], which were not tested in this report.
Evolutionarily, these observations suggest that BAG-1 is functionally relatively plastic and can accommodate a high mutational load without losing much of its original functionality. This speculation implies that most mutations during BAG-1’s evolutionary history within or between species are functionally and thus evolutionarily neutral. Furthermore, the lack of major functional outcomes is less supportive of positive or diversifying selection in humans.
Sketching BAG1’s origin and evolution between several eukaryotes and within humans has revealed common evolutionary patterns that delineate the interspecies evolution and intraspecies variation of BAG-1. This knowledge provides insights into how BAG-1 has acquired its diverse functions and potentially predicts its ability to accommodate a plethora of amino acid changes within humans that could have allowed them to adapt to their dynamic environments or predispose specific individuals to particular diseases.