1. Introduction
Transcription factors (TFs) are DNA-binding proteins that modulate transcription initiation in response to intracellular and extracellular changes. Decades of research have produced many advances in understanding the TF-mediated regulatory mechanisms that cells use to control gene expression. More recently, technological innovations such as massively parallel sequencing and data science have expanded interest in new model organisms and their adaptations. TFs are trans-acting factors that bind to cis-regulatory elements, promoter or enhancer sequences known as TF binding sites (TFBSs). Most bacterial TFBSs have been reported to lie in the proximal region (about −100 to +20 bp relative to the transcription start site [TSS]) and in distal regions (up to −200 bp from the TSS) [1,2,3]. Functionally, TFs are categorized as activators or suppressors, with a few acting as dual regulators [4]. Based on the number of genes they regulate, TFs are classified as local or global regulators [5]. Together, these characteristics describe a TF's mode of transcriptional regulation and help identify novel TFs.
Proteomic studies allow TFs to be grouped into families based on structural comparisons. However, new findings have shown that structurally similar TFs from distantly related bacteria are usually not evolutionary orthologs [6]. A more comprehensive characterization of a TF regulatory network requires identifying the TFBSs, the regulated genes, and the mode of regulation. Advances in computational biology and data processing have produced comprehensive databases that can predict the structure and function of TFs in new model organisms [7]. However, most of these databases are built from experimental studies, so their predictions for less-studied organisms still depend on experimental characterization.
To gain insight into transcriptional regulatory networks in extremophiles, our laboratory has employed a novel biochemistry-based method, Restriction Endonuclease Protection, Selection, and Amplification (REPSA), to characterize several TFs in the extreme thermophilic model organism Thermus thermophilus HB8. To date, we have studied four tetracycline repressor protein (TetR) family transcriptional suppressors and have successfully identified their TFBSs [8,9,10,11]. Suppressors commonly bind DNA with high affinity in the absence of small-molecule modulators or cofactors. In contrast, many transcriptional activators require small-molecule modulators to bind DNA, which complicates their analysis in vitro.
In this study, we explore the utility of REPSA for identifying and characterizing a potential thermophilic transcriptional activator, TTHB099. Protein sequence homology analysis indicates that TTHB099 is one of four cAMP receptor protein (CRP) family members (TTHA1437, TTHA1359, TTHB099, and TTHA1567) in T. thermophilus HB8 and is therefore expected to bind palindromic DNA sequences as a homodimer [12]. However, despite having a cAMP-binding domain, TTHB099 does not require this cofactor to bind DNA. Here, we identified the preferred DNA-binding sequence of TTHB099 as the 16-mer motif 5′–TGT(A/g)n(t/c)c(t/c)(a/g)g(a/g)n(T/c)ACA–3′. Furthermore, we used binding kinetics studies and mRNA expression data to investigate potential biological roles of TTHB099.
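The degenerate motif notation can be made concrete with a short Python sketch that turns the 16-mer into a regular expression and checks that its two half-sites are reverse complements of each other. It assumes that uppercase letters mark the predominant base and lowercase letters a less frequent alternative, so both are treated simply as allowed bases; the names `MOTIF`, `to_regex`, and `is_palindromic` are hypothetical and are not part of the REPSA or MEME workflow.

```python
import re

# Assumed reading of 5'-TGT(A/g)n(t/c)c(t/c)(a/g)g(a/g)n(T/c)ACA-3':
# every base listed at a position is allowed (case ignored), n = any base.
MOTIF = ["T", "G", "T", "AG", "ACGT", "TC", "C", "TC",
         "AG", "G", "AG", "ACGT", "TC", "A", "C", "A"]

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def to_regex(motif):
    """Build a regex with one character class per motif position."""
    return re.compile("".join(f"[{bases}]" for bases in motif))

def is_palindromic(motif):
    """True if position i is the complement of position len-1-i for every i."""
    n = len(motif)
    return all(
        set(motif[i]) == set(motif[n - 1 - i].translate(COMPLEMENT))
        for i in range(n)
    )

pattern = to_regex(MOTIF)
print(len(MOTIF), is_palindromic(MOTIF))
print(bool(pattern.fullmatch("TGTAACCTAGGTTACA")))  # one fully palindromic instance
print(bool(pattern.fullmatch("TCTAACCTAGGTTACA")))  # C at position 2 breaks the TGT trimer
```

Because the check compares the sets of allowed bases position by position, it confirms that the right half-site, (a/g)g(a/g)n(T/c)ACA, is exactly the reverse complement of the left half-site TGT(A/g)n(t/c)c(t/c).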
3. Discussion
In this study, an in vitro iterative selection method, REPSA, was used to annotate the TTHB099 transcription regulator of T. thermophilus HB8. Combined with next-generation sequencing and MEME motif discovery, REPSA identified the TTHB099 DNA-binding motif as a 16 bp palindromic sequence, 5′–TGT(A/g)n(t/c)c(t/c)(a/g)g(a/g)n(T/c)ACA–3′, with the consensus half-site 5′–T₁G₂T₃(A/G)₄N₅(T/C)₆C₇(T/C)₈–3′ (subscripts denote positions). Binding kinetics between TTHB099 and its consensus sequence, as well as sequences carrying single point mutations within the half-site, were investigated using biolayer interferometry (BLI). TTHB099 bound the 16-mer consensus sequence with high affinity (KD = 2.21 nM) and the point-mutated sequences with KD values ranging from 4.86 to 33.6 nM, with mutations at the second and third positions having the greatest effect. The relative affinities of the mutated sequences mirrored the positional conservation shown in the MEME-derived TTHB099 sequence logo. This is the first report of a consensus binding sequence for TTHB099.
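To put the reported KD values on a common scale, the sketch below expresses each point mutant's KD as a fold change relative to the consensus sequence and converts it into a binding free-energy penalty with the standard relation ΔΔG = RT ln(KD,mut / KD,cons). The temperature of 25 °C (298.15 K) is an assumption of the sketch, not a stated BLI condition, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def fold_change(kd_mut_nm, kd_ref_nm):
    """How many times weaker the mutant site binds than the reference site."""
    return kd_mut_nm / kd_ref_nm

def ddg_kcal(kd_mut_nm, kd_ref_nm, temp_k=298.15):
    """Free-energy penalty of a mutation: ddG = RT ln(KD_mut / KD_ref).

    temp_k defaults to an assumed 25 degrees C.
    """
    return R_KCAL * temp_k * math.log(kd_mut_nm / kd_ref_nm)

KD_CONSENSUS = 2.21          # nM, consensus 16-mer (value from the text)
for kd in (4.86, 33.6):      # nM, ends of the reported point-mutant range
    print(f"{kd:5.2f} nM: {fold_change(kd, KD_CONSENSUS):4.1f}-fold weaker, "
          f"ddG = {ddg_kcal(kd, KD_CONSENSUS):.2f} kcal/mol")
```

With these numbers, the point mutants bind roughly 2.2- to 15-fold more weakly than the consensus site, which corresponds to penalties of about 0.5 to 1.6 kcal/mol at the assumed temperature.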
Interestingly, our sequence strongly resembles the E. coli CRP (CRPEc) consensus sequence, 5′-AAATGTGATCTAGATCACATTT-3′ [26]. In both cases, the trimers “TGT” and “ACA” are highly conserved and are considered the most important for TF binding. This resemblance may reflect the homology between the two proteins previously reported by Agari et al. [12]. However, E. coli and T. thermophilus HB8 are not only phylogenetically distant but also live in entirely different environments, mesophilic and extremophilic, respectively [27]. Hence, the biological roles of TTHB099 need not be the same as those of CRPEc. The clearest difference is that TTHB099, unlike CRPEc, does not require the second messenger 3′,5′-cAMP to bind DNA.
Having found and validated a consensus TTHB099-binding sequence, we mapped it onto the genome of T. thermophilus HB8 to identify potential TTHB099-regulated genes. Using FIMO, the MEME-derived position weight matrix of our consensus sequence matched 78 genomic sequences. The 25 sequences with the best p-values were selected for further validation. Note that these p-values were not as small as those found in our previous studies, because the ten poorly conserved positions in the middle of the TTHB099 consensus palindrome affect the p-values computed by FIMO's dynamic programming algorithm. Our analysis of TTHB099 binding-site locations relative to the TSS of the proximal downstream genes showed that almost half of the identified sites lay inside open reading frames, which is atypical for canonical transcription factors. Notably, no potential TTHB099 binding site was found near its own gene. This could imply that TTHB099 by itself has no direct regulatory role over its own litR operon (TTHB100, TTHB099, TTHB098) or over the divergent crtB operon (TTHB101, TTHB102), with which it shares an intergenic region. Autoregulation is a common feature of many prokaryotic TFs, including members of the CRP family, but may not be a characteristic of TTHB099, except perhaps in an auxiliary role [28].
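The effect of the poorly conserved central positions on motif p-values can be illustrated with a simplified, FIMO-like scanner. The sketch below is not FIMO's implementation: it builds a hypothetical toy position weight matrix from the consensus (0.9 of the probability mass shared by the allowed bases, uniform 0.25 at n positions), rounds log2-odds scores to integers, computes the exact null score distribution by dynamic programming under a uniform background, and scans both strands. Because 2 positions are uninformative and 6 admit two bases equally, 1,024 different 16-mers tie for the top score, so the best attainable p-value is 1,024/4^16 = 2^−22 ≈ 2.4 × 10^−7, compared with 4^−16 ≈ 2.3 × 10^−10 for a fully specified 16-mer. All names and frequencies here are hypothetical.

```python
import math
from collections import defaultdict

BASES = "ACGT"
COMP = str.maketrans("ACGT", "TGCA")

# Hypothetical toy motif: allowed bases per position ("ACGT" = n).
# Real MEME-derived frequencies would replace these.
SPEC = ["T", "G", "T", "AG", "ACGT", "TC", "C", "TC",
        "AG", "G", "AG", "ACGT", "TC", "A", "C", "A"]

def column(allowed, mass=0.9):
    """Toy frequencies: allowed bases share `mass`, the others share the rest."""
    k = len(allowed)
    if k == 4:
        return {b: 0.25 for b in BASES}
    return {b: mass / k if b in allowed else (1 - mass) / (4 - k) for b in BASES}

def integer_pwm(pfm, scale=100):
    """log2-odds versus a uniform background, rounded to integers for the DP."""
    return [{b: round(scale * math.log2(col[b] / 0.25)) for b in BASES} for col in pfm]

def null_distribution(pwm):
    """Exact score distribution of uniform random sites, built column by column."""
    dist = {0: 1.0}
    for col in pwm:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for b in BASES:
                nxt[s + col[b]] += p * 0.25
        dist = dict(nxt)
    return dist

def pvalue(dist, score):
    """Probability that a random site scores at least `score`."""
    return sum(p for s, p in dist.items() if s >= score)

def scan(seq, pwm, dist, threshold=1e-6):
    """Score every window on both strands and keep hits with p <= threshold."""
    w, hits = len(pwm), []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        if not set(window) <= set(BASES):
            continue  # skip windows containing N or other symbols
        for strand, site in (("+", window), ("-", window.translate(COMP)[::-1])):
            s = sum(col[b] for col, b in zip(pwm, site))
            p = pvalue(dist, s)
            if p <= threshold:
                hits.append((i, strand, s, p))
    return hits

pwm = integer_pwm([column(a) for a in SPEC])
dist = null_distribution(pwm)
best = sum(max(col.values()) for col in pwm)
print(best, pvalue(dist, best))
hits = scan("GGGG" + "TGTAACCTAGGTTACA" + "CCCC", pwm, dist)
print(hits)
```

The embedded palindrome is reported on both strands at the same offset, and no site, however well it matches, can score below the 2^−22 floor set by the tied central positions.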
The promoter analysis revealed that nine TTHB099-binding sites overlapped with potential core promoter elements, an arrangement characteristic both of Class II transcription activators and of transcription inhibition via steric hindrance. Additionally, three sites were located upstream of the −35 box, fitting the Class I activator model, while two were downstream of the −10 box, an arrangement used by both transcription activators and repressors. This variety of binding arrangements suggests that TTHB099 could act as either an activator or a suppressor. Indeed, dual regulatory roles are common among global regulators such as CRPEc [29]. Moreover, eight pairs of TTHB099-binding sequences were found in the intergenic regions of divergently transcribed genes, another characteristic of dual regulators [30].
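The positional categories used in this promoter analysis can be written as a small classification rule. In the sketch below, coordinates are relative to the TSS (+1), and the default −35 (−35 to −30) and −10 (−12 to −7) box positions are typical placements assumed for illustration; actual boxes must come from each promoter's annotation. The function name `classify_site` is hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive coordinates relative to the TSS (+1)

# Assumed default hexamer positions; replace with each promoter's annotation.
MINUS35: Interval = (-35, -30)
MINUS10: Interval = (-12, -7)

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]

def classify_site(site: Interval, m35: Interval = MINUS35,
                  m10: Interval = MINUS10) -> str:
    """Map a binding-site interval onto the arrangements discussed in the text."""
    if overlaps(site, m35) or overlaps(site, m10):
        return "core overlap: Class II activation or steric-hindrance repression"
    if site[1] < m35[0]:
        return "upstream of -35 box: Class I activation"
    if site[0] > m10[1]:
        return "downstream of -10 box: activation or repression"
    return "spacer between -35 and -10 boxes"

for site in [(-50, -35), (-70, -55), (-6, 10), (-28, -13)]:
    print(site, classify_site(site))
```

The rule only encodes geometry; as the text notes, the same overlap with core elements is compatible with either activation or repression, so the category narrows but does not decide the regulatory mode.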
Biophysical studies performed with BLI were used to further our understanding of how TTHB099 interacts with the identified sites. All equilibrium dissociation constants were in the sub-micromolar range, indicating that TTHB099 has appreciable affinity for every tested site. However, affinities varied by as much as 200-fold among sites. These KD differences did not follow any evident trend; for instance, they did not track the p-value ranking established by FIMO. Nor did the highest-affinity sites share a common promoter location or presumed mode of transcription regulation. For example, the TTHB099 binding sequence with the highest affinity (KD = 3.05 nM) was intergenic and overlapped the −35 box upstream of TTHA1833. The binding sequences with the next-lowest KD values were also intergenic, but one lay upstream and the other downstream of the TTHB088/89 promoters. Such biophysical results emphasize the importance of experimentally validating computationally predicted sites.
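Whether binding affinity tracks the FIMO ranking can be checked with a rank correlation. The sketch below computes Spearman's ρ (tied values receive their mean rank) between motif p-values and KD values, together with the max/min KD spread; the names `PVALS` and `KDS` hold hypothetical placeholder numbers, not the measured values.

```python
def average_ranks(values):
    """Rank values (1 = smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical placeholders: FIMO p-values and KD values (nM) for four sites.
PVALS = [1e-6, 2e-6, 5e-6, 9e-6]
KDS = [40.0, 3.0, 300.0, 12.0]
print("rho =", spearman(PVALS, KDS))
print("KD spread =", max(KDS) / min(KDS), "fold")
```

A ρ near +1 would mean that better-scoring motifs bind more tightly; the placeholder data give ρ = 0, that is, no monotonic trend, which is the kind of outcome the text describes.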
Our BLI binding studies are limited to simple interactions of purified protein with synthetic DNA in the absence of any environmental or biological factors. Because the transcription regulatory apparatus can be complex, we complemented our in vitro study with in vivo expression profiles. Using publicly available expression data from matched wild-type and TTHB099-deficient T. thermophilus HB8 strains, we examined the operons of the 16 potentially regulated genes. The mRNA levels of these genes were not significantly affected by TTHB099 deficiency. These results suggest that TTHB099 does not, on its own, appreciably regulate these genes in exponentially growing wild-type cells.
Nonetheless, TTHB099 deficiency does appreciably affect the expression of several genes in exponentially growing T. thermophilus HB8. We identified 19 affected operons, 12 of which were overexpressed (positively affected) in the deficient strains. The upregulated genes are involved in the electron transport chain (ETC) of oxidative phosphorylation, sugar metabolism, and type IV pilin-related functions, and include one osmotically inducible protein; this pattern is consistent with TTHB099 acting as a transcriptional repressor of these genes. Conversely, seven operons (17 genes in total) were underexpressed in the TTHB099-deficient strains, suggesting that TTHB099 may act as an activator of these genes. The downregulated genes encode ribosomal proteins, iron ABC transporters, and ATPases. Notably, the most affected operons in the TTHB099-deficient strain function in metabolic pathways that have been reported to be regulated by the archetypal CRPEc [31]. For example, ribosome-related genes were downregulated in the absence of TTHB099, similar to what Pal et al. reported for their evolved CRPEc-deficient strains [32]. Likewise, iron transport genes were downregulated in the absence of TTHB099, as Zhang et al. reported for the absence of CRPEc [33]. These results suggest that TTHB099 shares some biological functions with CRPEc. However, these regulatory roles do not appear to be affected by changes in cAMP concentration. Moreover, a MEME search for a consensus sequence among the 19 most-affected operons identified from the GEO data did not reveal any significant motif. Thus, once again, the data do not support a simple regulatory mechanism.
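Calling genes up- or downregulated in a TTHB099-deficient strain from GEO2R-style output typically combines a multiple-testing-adjusted p-value with a fold-change cutoff. The sketch below implements the Benjamini-Hochberg adjustment (one of the adjustment options GEO2R offers) and a simple direction call. The cutoffs (adjusted p < 0.05, |log2 fold change| ≥ 1) and the gene values are hypothetical and are not the criteria used in this study.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure, kept monotone)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def call_direction(log2fc, padj, fc_cut=1.0, alpha=0.05):
    """Classify one gene; log2fc is deficient strain relative to wild type."""
    if padj >= alpha or abs(log2fc) < fc_cut:
        return "unchanged"
    # "up" in the deficient strain hints at repression by the TF, "down" at activation.
    return "up" if log2fc > 0 else "down"

# Hypothetical genes: (name, log2 fold change, raw p-value)
genes = [("geneA", 2.1, 0.0004), ("geneB", -1.6, 0.002),
         ("geneC", 0.3, 0.01), ("geneD", 1.8, 0.2)]
padj = benjamini_hochberg([p for _, _, p in genes])
calls = [call_direction(lfc, q) for (_, lfc, _), q in zip(genes, padj)]
for (name, _, _), q, call in zip(genes, padj, calls):
    print(name, call, round(q, 4))
```

Grouping the per-gene calls into operons would additionally require an operon annotation, which this sketch leaves out.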
TT_P0055 from T. thermophilus HB27, an ortholog of TTHB099 differing by a single amino acid substitution (E77D), has been reported to be a positive regulator of the crtB operon, which is involved in light-dependent carotenoid biosynthesis [33]. However, that report describes the effects of TT_P0055 on carotenoid production without establishing the mechanism of regulation, so TT_P0055 could control crtB activation indirectly. The homology between the HB27 and HB8 strains in this regulatory module (the TT_P0055 and TTHB099 proteins, their intergenic regions, and their crtB operons) would suggest similar biological functions for the two TFs. Yet the GEO expression data show no detectable change in crtB gene expression in the absence of TTHB099. This could be attributed to the absence of light under the experimental conditions: light is required to deplete LitR, the transcriptional repressor of TT_P0055, which in turn positively regulates carotenoid production [34].
Because TTHB099 does not appear to bind the crtB promoter, the study by Ebright et al., which centered on TTHB099 binding upstream of TTHB101, rests on a prediction that has not been firmly established [35]. Hence, Ebright’s claim that TTHB099 is a model Class II transcription activator may need to be reconsidered in light of our new findings.
Looking for a connection between the genes identified via the REPSA-derived consensus sequence and the genes affected by TTHB099 deficiency, as determined by GEO2R, we found that five of the affected operons (30 genes) had an upstream binding sequence identified by FIMO. Interestingly, these binding sites were located about 0.9 to 4 kbp upstream of the most affected operons. This arrangement could be explained by TTHB099 acting through distal enhancer or silencer elements. Such elements exist in prokaryotes but are rare; to date, the identified prokaryotic enhancers regulate only a few promoters used by σ54-directed RNA polymerases [36]. Because T. thermophilus HB8 lacks a σ54 homolog, it is even more difficult to propose that TTHB099 functions through enhancers or silencers. Future studies could analyze potential interactions of TTHB099 with other TFs to test the hypothesis of a complex regulatory mechanism involving distal enhancer/silencer elements. As to whether TTHB099 is an activator or a suppressor, all our data point toward a dual regulatory role.
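Locating FIMO hits 0.9 to 4 kbp upstream of affected operons requires a strand-aware distance, because "upstream" points toward lower genome coordinates for forward-strand operons and toward higher coordinates for reverse-strand operons. The sketch below assumes 1-based inclusive forward-strand coordinates and measures from the operon's 5′ end (taken here as the first gene's start, an assumption); the function names and example coordinates are hypothetical.

```python
def upstream_distance(site_start, site_end, op_start, op_end, strand):
    """Signed distance from a binding site to an operon's 5' end.

    Coordinates are 1-based, inclusive, on the forward strand. Positive
    values mean the site lies entirely upstream of the operon; zero or
    negative values mean it overlaps the operon or lies downstream of its start.
    """
    if strand == "+":
        return op_start - site_end   # 5' end is the lowest coordinate
    if strand == "-":
        return site_start - op_end   # 5' end is the highest coordinate
    raise ValueError("strand must be '+' or '-'")

def in_window(distance, lo=900, hi=4000):
    """True if the site falls in the 0.9-4 kbp upstream band noted in the text."""
    return lo <= distance <= hi

# Hypothetical hits: (name, site_start, site_end, operon_start, operon_end, strand)
hits = [
    ("site1", 100, 115, 1100, 2000, "+"),
    ("site2", 3000, 3015, 500, 2000, "-"),
    ("site3", 1500, 1515, 1100, 2000, "+"),
]
results = {}
for name, s0, s1, o0, o1, strand in hits:
    d = upstream_distance(s0, s1, o0, o1, strand)
    results[name] = (d, in_window(d))
    print(name, d, in_window(d))
```

Here site1 and site2 fall inside the distal band on opposite strands, while site3 lies within the operon and is excluded.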