Article

G-Quadruplexes in the Archaea Domain

1 Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
2 Institut Curie, CNRS UMR9187, INSERM U1196, Universite Paris Saclay, 91400 Orsay, France
3 Department of Biology and Ecology/Institute of Environmental Technologies, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic
4 Faculty of Mechanical Engineering, Brno University of Technology, Technicka 2896/2, 616 69 Brno, Czech Republic
5 Faculty of Chemistry, Brno University of Technology, Purkyňova 464/118, 612 00 Brno, Czech Republic
6 Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic
7 Institut de Biologie Intégrative de la Cellule (I2BC), CNRS, Université Paris-Saclay, CEDEX, 91198 Gif-sur-Yvette, France
8 Laboratoire d’Optique et Biosciences, Ecole Polytechnique, CNRS, INSERM, Institut Polytechnique de Paris, 91128 Palaiseau, France
* Authors to whom correspondence should be addressed.
Biomolecules 2020, 10(9), 1349; https://doi.org/10.3390/biom10091349
Submission received: 11 August 2020 / Revised: 16 September 2020 / Accepted: 18 September 2020 / Published: 21 September 2020
(This article belongs to the Collection Archaea: Diversity, Metabolism and Molecular Biology)

Abstract

The importance of unusual DNA structures in the regulation of basic cellular processes is an emerging field of research. Amongst local non-B DNA structures, G-quadruplexes (G4s) have gained in popularity during the last decade, and their presence and functional relevance at the DNA and RNA level have been demonstrated in a number of viral, bacterial, and eukaryotic genomes, including humans. Here, we performed the first systematic search of G4-forming sequences in all archaeal genomes available in the NCBI database, investigating their presence and locations using the G4Hunter algorithm. G-quadruplex-prone sequences were identified in all archaeal species, with highly significant differences in frequency, from 0.037 to 15.31 potential quadruplex sequences per kb. While G4-forming sequences were extremely abundant in Hadesarchaea archaeon (strikingly, more than 50% of the Hadesarchaea archaeon isolate WYZ-LMO6 genome is a potential part of a G4-motif), they were very rare in the Parvarchaeota phylum. G-quadruplex-forming sequences are not randomly distributed: they are over-represented in non-coding RNA, suggesting possible roles in ncRNA regulation. These data illustrate the unique and non-random localization of G-quadruplexes in Archaea.

1. Introduction

The Archaea domain was classified separately from Bacteria by Carl Woese and George Fox in 1977 [1]. Later on, it was found that all major molecular machineries of archaea, such as those for DNA replication, transcription, and translation, are much more similar to those of eukaryotes than to those of bacteria [2,3]. This is also true for some important membrane proteins, such as ATP synthases and proteins of the Sec transport system [4,5], and for some proteins involved in cell division and vesicle trafficking [6]. Thus, the archaeal domain occupies a key position in the Tree of Life, and there is currently intense debate about its exact relationship with eukaryotes [7,8]. A schematic phylogenetic tree for the Archaea domain is proposed in Figure 1; this phylogeny is rapidly evolving, with many new phyla recently identified via the accumulation of metagenome-assembled genomes (MAGs) and various new proposals for phylum definition and nomenclature [9,10]. The first detected archaea were isolated from harsh environments, but archaea were later found in almost every environment, including the human microbiota, where they play important roles in the gut, in the mouth, and on the skin [11,12]. It has been hypothesized that archaea found in oceans are one of the most abundant groups of organisms on the planet, with important roles in both the carbon and the nitrogen cycle [13]. The Archaea domain has several unique features, such as ether-linked lipids, while eukaryotes and most bacteria have ester-linked lipids [14]. Moreover, the stereochemistry of archaeal lipids has the opposite configuration compared to that of lipids of eukaryotic and bacterial origin. Interestingly, methanogenesis, the production of the greenhouse gas methane as a metabolic by-product, occurs only in the archaeal domain [15,16].
G-quadruplex (G4) structures formed by guanine-rich sequences are among the most intensively studied local DNA/RNA structures [20]. G4s are formed by G:G Hoogsteen base pairing in a guanine quartet, and their formation requires the presence of stabilizing cations, such as potassium [21] (Figure 2). In both bacteria and eukaryotes, G4 formation regulates various processes, including gene expression [22], protein translation [23], and proteolysis [24]. G4s have been identified in a number of pathogens, including viruses, eukaryotes (e.g., Plasmodium falciparum) [25,26], and prokaryotes (e.g., Neisseria gonorrhoeae [27] and Mycobacterium tuberculosis [28,29]). Moreover, many G4-binding proteins are conserved in all organisms, highlighting the importance of G4 structure regulation [30], and novel G4-binding proteins sharing the NIQI amino acid motif (RGRGRGRGGGSGGSGGRGRG) have been identified [31]. Specific helicases (e.g., the Pif1 or RecQ family helicases) have been identified in both eukaryotes and bacteria that unfold these structures, which can be extremely stable and would be problematic for the transcription or replication of G-rich motifs [32]. Recently, G4Hunter was successfully used for the prediction of G-quadruplex-forming sequences in all complete bacterial genomes [33]. These results showed that G-quadruplex-forming sequences are present in all species, with the highest frequencies in some extremophiles. In contrast to RNA, there is no correlation between genomic DNA GC% in Archaea (and in Bacteria) and the optimal growth temperature. This is likely because DNA in vivo is topologically closed, and topologically closed DNA is stable at least up to 107 °C [34]. We therefore cannot anticipate a higher density of G4-prone motifs in thermophiles as a result of a GC bias. A comparison with bacterial extremophiles is interesting [35]: Ding et al. hypothesized that stress-resistant bacteria found in the Deinococcales may utilize putative quadruplex sequences (PQS) for gene regulatory purposes. An enrichment in prokaryote PQS has been found in thermophilic organisms [33] but also in organisms with resistance to other stress factors, such as radiation [36,37]; thus, a direct correlation between temperature and G4 presence is not supported by these findings. In addition, while bacteria in the Deinococcus-Thermus group have the highest PQS abundance, it is striking that the mostly thermophilic and hyperthermophilic bacteria in the Thermotogae phylum have one of the lowest PQS frequencies. The correlation between thermophily and G4s, therefore, depends on the phylum (Gram-negative vs. Gram-positive bacteria).
Due to the roles of G4s in the regulation of basic cellular processes, it is important to identify their location in genomes. Several algorithms are available to predict G-quadruplex-forming sequences [38,39,40,41]. Among them, the G4Hunter application was developed to provide quantitative analyses giving a propensity score as an output [41], and the G4Hunter web tool allows effective and fast analyses of PQS in large datasets [42].
The prokaryotic genetic material is generally stored in circular chromosomes and plasmids [43]. The presence of quadruplex-prone motifs in over a hundred bacterial genomes was determined over a decade ago [44]. In bacterial genomes, PQS are located non-randomly, with a higher relative abundance in non-coding RNA (ncRNA), mRNA, and regions around tRNA and regulatory sequences. PQS also play roles in nitrate assimilation in Paracoccus denitrificans [45]. PQS in the hsdS, recD, and pmrA genes of Streptococcus pneumoniae contribute to host–pathogen interactions [46]. Such observations show the significant role of G4s in bacteria. Another local DNA structure, the cruciform formed by inverted repeats, has been shown to be an important regulatory feature of eukaryotic cell organelles with circular DNA genomes, such as chloroplasts and mitochondria [47,48]. Overall, the role of G4s in bacteria [27,49] and eukaryotes [50] is increasingly recognized.
In contrast, little is currently known regarding the abundance and location of PQS in the archaeal domain. Ding et al. performed an initial search on bacterial and archaeal genomes using a modified Quadparser algorithm with relaxed parameters allowing long loops (up to 12 nucleotides) [35]. They found that thermophilic microorganisms (both archaea and bacteria) appear to favor PQS in their genomes. Dhapola et al. created the Quadbase2 web server, in which G4 motifs found in a variety of organisms, including archaea, may be searched, but did not analyze G4 propensity in archaea [51]. Because G4s play many important biological roles in bacterial and eukaryotic cells, we assume that G4s are also likely to have important functions in archaea. Therefore, we comprehensively analyzed the presence and locations of PQS in all sequenced archaeal genomes using G4Hunter [41,42]. These data provide the first study analyzing the presence of G4-prone sequences in this important domain of life.

2. Materials and Methods

2.1. Selection of the DNA Sequences

The set of all archaeal genomic DNA sequences was downloaded from the Genome database of the National Center for Biotechnology Information [52]. We used all accessible archaeal genomes for our analyses, including contig and scaffold sequences (3387 genomes), and selected one representative genome for each species (Supplementary Table S1). For PQS analyses of features, we restricted our analysis to the subset of 140 completely assembled genomes. In total, we analyzed the presence of G4-forming sequences in 3387 genomes from the archaeal domain, representing a total of 6423 Mbp.

2.2. Process of Analysis

We used the computational core of our DNA analyzer software, written in the Java programming language [53]. For our analyses, we used a new G4Hunter algorithm implementation [42]. Default parameters for G4Hunter were set to a window size of 25 and a G4Hunter score (G4HS) of 1.2 or above. PQS scores were grouped into five intervals: 1.2–1.4, 1.4–1.6, 1.6–1.8, 1.8–2.0, and 2.0 and above. Overall results for each species group contained a list of species with the size of each genomic DNA sequence and the number of putative G4 sequences found (Supplementary Table S2A); for clarity, the results for Groups and Subgroups are in separate files (Supplementary Table S2B,C). These data were processed in Python (Jupyter) using pandas with statistical tools [54]. Graphs were generated from the pandas tables using the “seaborn” graphical library. Note that the distinction between overlapping and discrete (non-overlapping) G4 motifs may create issues in the way potential motifs are counted. For this reason, we also provide a % PQS factor, which corresponds to the probability that any given nucleotide in the group or subgroup belongs to a G4-prone region (G4H > 1.2).
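As an illustration of the search described above, the following minimal Python sketch scores a sequence in the G4Hunter style and reports the PQS count, the frequency per kbp, and the % PQS factor. It is not the Java implementation used in this study. The scoring rule (each G in a run scores the run length capped at 4, each C the negative equivalent, other bases 0) follows the published description of G4Hunter [41]; the function names and the simple merging of overlapping windows into regions are assumptions made here for clarity, so counts may differ from those of the actual tool.

```python
from itertools import groupby


def base_scores(seq):
    """Per-nucleotide G4Hunter-style scores: G runs positive, C runs negative, capped at 4."""
    scores = []
    for base, run in groupby(seq.upper()):
        n = len(list(run))
        if base == "G":
            value = min(n, 4)
        elif base == "C":
            value = -min(n, 4)
        else:
            value = 0
        scores.extend([value] * n)
    return scores


def scan(seq, window=25, threshold=1.2):
    """Return merged half-open (start, end) regions whose window mean |score| >= threshold."""
    scores = base_scores(seq)
    regions = []
    if len(scores) < window:
        return regions
    total = sum(scores[:window])
    for start in range(len(scores) - window + 1):
        if start > 0:
            # Slide the window by one nucleotide.
            total += scores[start + window - 1] - scores[start - 1]
        if abs(total) / window >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # merge overlapping windows (simplification)
            else:
                regions.append((start, end))
    return regions


def summarize(seq, window=25, threshold=1.2):
    """PQS count, PQS per kbp, and % of nucleotides lying in a G4-prone region."""
    regions = scan(seq, window, threshold)
    covered = sum(end - start for start, end in regions)
    return {
        "pqs_count": len(regions),
        "pqs_per_kbp": 1000 * len(regions) / len(seq),
        "percent_pqs": 100 * covered / len(seq),
    }
```

Because overlapping windows are merged before counting, the sketch reports discrete regions; the % PQS value is insensitive to that choice, which is the reason it is reported alongside the counts.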
The default window value for G4Hunter has been discussed and tested in previous publications [41]. The value chosen here (25 nt) corresponds approximately to the size of a typical intramolecular quadruplex. We considered shorter windows (20 nt) in previous studies. However, we noticed that for low thresholds (<1.2), a single GGGGGG run would give a hit; while intermolecular G4 formation is indeed possible with this motif, we hypothesized that intramolecular structures would be more relevant.
A slightly longer window (e.g., 30 nucleotides) further contributes to eliminating such motifs, but at the cost of significantly decreasing the number of hits (by a factor of 2 to 3; see Table 1). This larger window would, therefore, increase the number of false negatives, i.e., miss “real” intramolecular G4s. On the other hand, a much larger window (50–100 nt) would be interesting for identifying “G4 clusters” in which multiple tandem quadruplexes may be formed. Table 1 presents the number of sequences found in three different complete archaeal genomes using four different window sizes and a threshold of 1.2.
As shown in Table 1, long G-rich regions, potentially supporting the formation of multiple quadruplexes, are present but far less frequent (by a factor of 19 to 186 for a window of 50 vs. 25) than the classically defined G4Hunter motifs. In these three genomes, a large majority (95–99%) of the G4-prone regions would only support the formation of a single individual quadruplex.
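The effect of window size on an isolated G-run can be checked with a short calculation. The sketch below assumes the G4Hunter scoring rule from [41] (each G in a run of four or more scores 4, flanking A/T score 0) and a hypothetical helper name; it shows that a lone GGGGGG run reaches 24/20 = 1.2 in a 20-nt window but only 24/25 = 0.96 in a 25-nt window.

```python
def isolated_run_score(run_length, window):
    """Mean window score for a single G-run flanked only by A/T (per-base score capped at 4)."""
    per_base = min(run_length, 4)
    return per_base * min(run_length, window) / window


for window in (20, 25, 30):
    print(window, round(isolated_run_score(6, window), 2))
```

With the 25-nt window and the 1.2 threshold used here, such a run alone no longer qualifies as a hit.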

2.3. Analysis of Putative G4 Sequences Around Annotated NCBI Features

We downloaded feature tables from the NCBI database along with genomic DNA sequences. Feature tables contain annotations of known features found in DNA sequences. We analyzed the occurrence of G4-prone sequences inside recorded features. Features were grouped by the name stated in the feature table file (gene, rRNA, tRNA, ncRNA, and repeat region). From this analysis, we obtained a file with feature names and numbers of putative G4-forming sequences found inside and around features for each group of species analyzed. The search for putative G4-forming sequences took place inside feature boundaries; note that inverted repeats in mitochondrial DNA (mtDNA) [48], as well as G4-prone sequences in bacteria [33], occur with different frequencies in close proximity to specific features. Further processing was performed in Microsoft Excel, and the data are available as Supplementary Table S3.
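The feature-based counting can be sketched as follows. This is an illustrative Python example, not the pipeline used in this study: the function name, the tuple layouts, the half-open coordinates, and the rule that a PQS is counted only when it lies entirely within a feature are all assumptions made for the example.

```python
from collections import defaultdict


def pqs_per_feature(features, pqs_regions):
    """Count PQS lying fully inside each feature type.

    features: iterable of (feature_type, start, end), half-open coordinates (assumed layout).
    pqs_regions: iterable of (start, end), half-open coordinates (assumed layout).
    """
    counts = defaultdict(int)
    lengths = defaultdict(int)
    pqs_regions = list(pqs_regions)
    for ftype, f_start, f_end in features:
        lengths[ftype] += f_end - f_start
        for p_start, p_end in pqs_regions:
            if p_start >= f_start and p_end <= f_end:
                counts[ftype] += 1
    return {
        ftype: {"pqs": counts[ftype], "per_kbp": 1000 * counts[ftype] / lengths[ftype]}
        for ftype in lengths
        if lengths[ftype] > 0
    }


# Hypothetical toy annotation: one gene and one tRNA.
example = pqs_per_feature(
    [("gene", 0, 1000), ("tRNA", 1500, 1575)],
    [(100, 125), (990, 1015), (1510, 1535)],
)
print(example)
```

Normalizing by the summed feature length gives a frequency per kbp that can be compared across feature types of very different sizes.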

2.4. Statistical Analysis

A cluster dendrogram of PQS characteristics was constructed in R, version 3.6.3, using the pvclust library [55], to further reveal and graphically depict similarities between particular archaeal subgroups. Mean, Min, Max, and % PQS values were used as input data (Supplementary Table S4). The following parameters were used for the analysis: cluster method ‘ward.D2’, distance ‘Euclidean’, and 10,000 bootstrap resamplings. Statistically significant clusters (AU values (blue) above 95, equivalent to p-values less than 0.05) are highlighted by rectangles marked with broken red lines. R code is provided in Supplementary Table S4. Statistical evaluations of differences in the presence of G4-forming sequences among phylogenetic groups were made by a Kruskal–Wallis test with a Bonferroni adjustment in STATISTICA, with a p-value cut-off of 0.05; data are available in Supplementary Table S5.
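For readers unfamiliar with the test named above, the following Python sketch computes the Kruskal–Wallis H statistic and a Bonferroni adjustment. It is only an illustration, not the STATISTICA procedure used in this study: the function names are hypothetical, ties receive average ranks without the usual tie-correction factor, and converting H to a p-value (via the chi-square distribution) is left out.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks for ties (no tie correction)."""
    pooled = sorted(value for group in groups for value in group)
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold 1-based ranks i+1..j; store their average.
        rank[pooled[i]] = (i + 1 + j) / 2
        i = j
    n = len(pooled)
    total = sum(sum(rank[v] for v in group) ** 2 / len(group) for group in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With three fully separated groups of three values each, for example `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, the rank sums are 6, 15, and 24, and H evaluates to 7.2.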

2.5. Quadruplex Formation In Vitro

Representative examples of the candidate sequences identified by G4Hunter were experimentally tested for G4 formation using two techniques, isothermal difference spectra (IDS) and circular dichroism (CD), as described previously [41].

2.5.1. Samples

Oligonucleotides were purchased from Eurogentec, Belgium, as dried samples purified by RP cartridge purification. Stock solutions were prepared at 250 μM strand concentration in ddH2O.

2.5.2. Experimental Conditions

Most experiments were performed in a 10 mM lithium cacodylate buffer (pH 7.1) supplemented with 100 mM KCl. Since Hadesarchaea has not been cultivated, its intracellular potassium concentration cannot be known; however, 100 mM is in the range of intracellular potassium concentrations for other archaea, such as Thermococcales.

2.5.3. Isothermal Spectra

2.5 µM oligonucleotide solutions were prepared in 10 mM lithium cacodylate buffer at pH 7.1. The solutions were kept at 95 °C for 5 min, slowly cooled to room temperature, and kept at 4 °C overnight. Absorbance spectra were recorded on a Cary 300 (Agilent Technologies, France) spectrophotometer at 37 °C (scan range: 500–200 nm; scan rate: 600 nm/min; automatic baseline correction). After recording this first series of spectra (unfolded, as no potassium was present), 1 M KCl (100 μL) was added to the samples, and UV-absorbance spectra were recorded after 15 min of equilibration and corrected for dilution. Each IDS corresponds to the arithmetic difference between the initial (unfolded) and final (folded, corrected for dilution) spectra.
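The dilution correction and difference calculation can be written out as a short Python sketch. The function name is hypothetical, and the initial sample volume is not stated in the text: the 900 μL used in the example is an assumption, chosen because adding 100 μL of 1 M KCl to it would give a final concentration of 100 mM.

```python
def isothermal_difference(unfolded, folded, initial_volume_ul, added_volume_ul):
    """IDS per wavelength: unfolded absorbance minus dilution-corrected folded absorbance."""
    # Adding KCl dilutes the sample, so the folded spectrum is scaled back up.
    factor = (initial_volume_ul + added_volume_ul) / initial_volume_ul
    return [a_u - a_f * factor for a_u, a_f in zip(unfolded, folded)]


# Hypothetical absorbance values at three wavelengths; 900 uL initial volume is assumed.
ids = isothermal_difference([0.50, 0.62, 0.40], [0.44, 0.57, 0.38], 900, 100)
print([round(value, 4) for value in ids])
```

Both spectra must be sampled at the same wavelengths for the element-wise difference to be meaningful.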

2.5.4. Circular Dichroism

2.5 µM oligonucleotide solutions were prepared in 10 mM lithium cacodylate buffer at pH 7.1 supplemented with 100 mM KCl. The solutions were kept at 95 °C for 5 min and slowly cooled to room temperature and kept at 4 °C overnight. CD spectra were recorded on a JASCO J-1500 (France) spectropolarimeter at room temperature or at 80 °C, using a scan range of 400–210 nm, a scan rate of 200 nm/min, and averaging four accumulations (Supplementary Figure S1).

2.6. G-Quadruplex Binding Proteins Prediction

To predict G-quadruplex binding proteins, the BLASTp algorithm [56] was used with the previously published G-quadruplex binding motif (RGRGRGRGGGSGGSGGRGRG) [31] as a query. The target organisms were limited to the Archaea domain (NCBI taxid ID: 2157), and the E-value cut-off was set to 0.05. For the similarity search with RecQ helicase from Escherichia coli (UNIPROT ID: P15043), the BLASTp algorithm [56] was used with an E-value cut-off of 0.0001 and the same restriction to the Archaea domain. BLASTp analyses are provided in Supplementary Table S6. A FIMO search [57,58] for the G-quadruplex binding motif (RGRGRGRGGGSGGSGGRGRG) [31] in the complete Methanosarcina mazei proteome was carried out on a set of 15,722 known protein sequences downloaded from NCBI, with a q-value (p-value corrected for multiple testing by the Benjamini and Hochberg method) cut-off of 0.05 (Supplementary Table S7). The protein most similar to RecQ helicase from Escherichia coli (UNIPROT ID: P15043) in Hadesarchaea archaeon isolate WYZ-LMO6 was searched for using tBLASTn [59]; the resulting best hit was translated using the Expasy Translate Tool [60,61], and functional domains were visualized using NCBI CDD [62] (Supplementary Table S8).
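The q-values mentioned above are p-values adjusted by the Benjamini and Hochberg procedure. A minimal Python sketch of that adjustment is given below; the function name is an assumption for illustration, and FIMO's own implementation may differ in detail.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Keep only hits whose q-value passes the 0.05 cut-off (toy p-values, for illustration).
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(q_values) if q <= 0.05]
```

For the four toy p-values shown, the adjusted values are 0.02, 0.04, 0.04, and 0.02 (up to floating-point rounding), so all four would pass a 0.05 cut-off.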

3. Results

3.1. Prediction of G4 Forming Sequences in Archaea

We analyzed the occurrence of putative G4 sequences (PQS) with G4Hunter in 3387 archaeal genomes. The length of sequenced archaeal genomes in our dataset varied from 100 kbp to 13.4 Mbp (list provided in Supplementary Table S1). The average GC content was 46.51%, with a minimum of 24.30% for Nanobsidianus stetteri isolate SCGC AB-777 (Nanoarchaeota) and a maximum of 70.95% for Halobacteriales archaeon SW_7_71_33 (phylum Euryarchaeota). Using standard parameters for the G4Hunter search algorithm (window size of 25 and G4HS ≥ 1.2), we found 4,470,813 PQS in these 3387 archaeal genomes. The higher the G4HS, the higher the stability of the structure. Over 90% and 98% of sequences with a score above 1.2 or 1.5, respectively, were experimentally demonstrated to form a stable quadruplex in vitro [41]. Figure 3A provides examples of G-rich motifs found in archaea with G4HS between 1.32 and 3.0. As expected from previous analyses on eukaryotes and bacteria, most (97%) PQS have a relatively low (1.2 to 1.4) G4Hunter score. More stable motifs are rarer, with a sharp decrease in the number of retrieved sequences with scores above 1.4; only 132 PQS with a G4Hunter score of 2 or more were found. A summary of all PQS found in each G4Hunter score interval, with precomputed PQS frequencies per 1000 bp, is provided in Table 2.
The comparison of G4-prone sequences found in archaeal and bacterial genomes revealed that in both domains, frequencies decreased sharply with G4HS as compared to the human genome, in which highly stable G4s are relatively more frequent (see Figure 3B). This result indicates an overall stronger relative selection pressure against stable G4 motifs in both archaea and bacteria as compared to humans, and likely most eukaryotes, as the relative number of G4Hunter high-scoring motifs is even higher in yeast [63]. Guo and Bartel suggested that eukaryotes have robust machinery that globally unfolds RNA G-quadruplexes, whereas some bacteria have instead undergone evolutionary depletion of G-quadruplex-forming sequences [64]. Our analysis suggests that archaea behave like bacteria, except for a slight difference found for the most stable motifs (G4HS > 2), which were less selected against in archaea than in bacteria.

3.2. Variation in Frequency for G4 Forming Sequences in Archaea

The total number of analyzed sequences in particular phylogenetic categories, together with the median genome length, shortest genome, longest genome, mean, minimal, and maximal observed PQS frequency per kbp, and total PQS counts, are shown in Table 3. For this analysis, Archaea have been divided into five superphyla that form monophyletic assemblages (clades) in the most recent phylogenetic analysis and 41 subgroups that correspond to different taxonomic ranks (suffix aeota for phylum and candidate phylum, suffix ales for orders). Seven subgroups have an average GC content above 50%, the highest being observed in Halobacteriales (63.95%), which is also the archaeal group with the highest number of available genome sequences (440); all other groups have average GC contents below 50%.
The mean frequency of PQS per kbp for all archaeal genomes was 1.207. The lowest mean frequency was found for the Heimdallarchaeota (0.273), followed by Methanococcales and Methanobacteriales (0.39). The highest density of PQS was found in the Hadesarchaea subgroup (4.607), followed by Korarchaeota (2.626). The highest absolute frequency of PQS was found in Hadesarchaea archaeon isolate WYZ-LMO6, with 15.3 PQS per 1000 bp (i.e., one quadruplex every 65 bp), and the lowest in Methanobrevibacter sp. 87.7, where only 71 PQS were found in its 1.92 Mb long genome (Supplementary Table S2A). Detailed statistical characteristics for PQS frequencies per kbp (including mean, variance, and outliers) are depicted in boxplots for all inspected subgroups (Figure 4). The Hadesarchaea subgroup has a higher PQS frequency than the other subgroups. The comparison of the five main superphyla BAT, Cren, Asgard, Eury, and DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota) (Figure 1) revealed the highest mean PQS frequency in the Cren superphylum (1.15) and the lowest in the Asgard superphylum (0.48). However, the Hadesarchaea subgroup, which exhibits the highest frequency among subgroups, is found in the Eury superphylum. The detailed data for superphyla are in Supplementary Table S2B, and those for subgroups in Supplementary Table S2C.
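The two extreme values quoted above can be checked with simple arithmetic. The short Python sketch below uses hypothetical helper names and the rounded figures given in the text (71 PQS in a 1.92 Mb genome; 15.3 PQS per kbp).

```python
def pqs_per_kbp(pqs_count, genome_length_bp):
    """PQS frequency expressed per 1000 bp."""
    return 1000 * pqs_count / genome_length_bp


def mean_spacing_bp(frequency_per_kbp):
    """Average distance between consecutive PQS for a given frequency per kbp."""
    return 1000 / frequency_per_kbp


print(round(pqs_per_kbp(71, 1_920_000), 3))  # 0.037
print(round(mean_spacing_bp(15.3)))          # 65
```

The first value matches the lower bound of 0.037 PQS per kb given in the abstract, and the second matches the stated spacing of one quadruplex every 65 bp.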
A cluster dendrogram shows the similarities among subgroups based on the PQS data (Figure 5). This dendrogram shows that the Hadesarchaeota subgroup is the most distant one (the shortest branch length) compared to other subgroups. The cluster dendrogram based on PQS characteristics is similar to the phylogenetic relationships (see Figure 1). For example, all of the Asgard subgroups (Odinarchaeota, Heimdallarchaeota, Thorarchaeota, and Lokiarchaeota) lie close together in one bigger cluster (Figure 5, left part). Other examples are the Woesearchaeota, Aenigmarchaeota, and Nanoarchaeota subgroups, which are members of the DPANN superphylum and lie adjacent to each other in the PQS-based cluster tree. On the other hand, all of the subgroups with “thermo” in their names, indicative of high-temperature environments, are clustered together (Thermoplasmatales, Thermococcales, Thermoproteales, and Geothermarchaeota). These subgroups are relatively PQS-rich but lack phylogenetic proximity, suggesting that PQS richness does not rely on evolutionary proximity.
We then analyzed the relationship between overall % GC content and PQS frequency (Figure 6). PQS frequencies tend to correlate with GC content, as G4-prone motifs need to be relatively G-rich; however, there are interesting exceptions to this rule, and this correlation is poorer than anticipated. Ding et al. already noticed that Methanomicrobia and Thermococci have greater densities of PQS than the theoretical values based on the GC % of their genomes [35]. Organisms with higher than expected PQS frequencies based on their GC content (over 50% of the maximal observed PQS frequency, Figure 6) are highlighted in color; the whole figure is separated into smaller segments according to the inspected G4Hunter score intervals. The most extreme outlier is Hadesarchaea archaeon, for which 51% of the genome has a G4Hunter score above 1.2, despite a GC content of 54%, i.e., only modestly above the 46.5% average for all sequences tested here, and far below the most GC-rich archaeal genomes. Selected examples of G-rich motifs with high G4Hunter scores (G4HS) in Hadesarchaea archaeon are provided in Table 4. We also carried out an additional statistical evaluation of PQS differences between all groups and subgroups; detailed results are found in Supplementary Table S5. Nearly all comparisons were significant, i.e., there are significant differences between the PQS frequencies of particular groups and subgroups.
Figure 7 shows the relationship between GC percentage and mean PQS frequencies (or mean percentage of PQS length of the genome) in particular archaeal subgroups. Overall, we found some correlation (although far from perfect, as shown by R² = 0.7) between mean PQS frequencies (expressed as the mean fraction of nucleotides of the genomes involved in a PQS motif) and increasing GC % content. The highest mean percentage of PQS length of the genomes was found in the Hadesarchaea subgroup, in which more than 10% of the genome is involved in a potential PQS.
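To make the meaning of the R² value concrete, the sketch below computes the coefficient of determination for a simple least-squares line. It assumes that the reported R² comes from a linear fit of mean % PQS against GC %, which the text does not state explicitly; the function name and the toy numbers are purely illustrative and are not data from this study.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares straight-line fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Toy values only: three hypothetical subgroups (GC %, mean % PQS).
print(r_squared([1, 2, 3], [1, 3, 2]))
```

An R² of 0.7 means that about 70% of the variance in the PQS measure is accounted for by the fitted relation with GC content, leaving a substantial share unexplained.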

3.3. Localization of PQS in Genomes

To evaluate the position of PQS in archaeal genomes, we downloaded the described “features” of all archaeal genomes and analyzed the presence of all PQS in annotated sequences (Figure 8). Overall, we found a higher density of G4-prone motifs in non-protein-coding RNAs (tRNA, rRNA, and other ncRNA) than in protein-coding genes. G4 density in ncRNA is clearly above the average genomic G4 density, while mRNA G4 density is close to the genomic average. This may derive in part from the observation that rRNA and tRNA genes are especially GC-rich in hyperthermophilic archaea, in order to stabilize folding under harsh conditions [65]. On the other hand, we can probably expect a stronger selection pressure against the formation of intramolecular quadruplexes within the relatively small tRNA core, as this would disrupt its three-dimensional shape and alter its biological function. In line with this hypothesis, the PQS frequencies are actually lower in tRNA than in ncRNA and rRNA [66]. Interestingly, the 5′ end of some human tRNA genes is often G-rich and has been reported to allow G4 formation: Ivanov and colleagues have shown that mature cytoplasmic tRNAs are cleaved during the stress response to produce tRNA fragments that function to repress translation in vivo and that these bioactive tRNA fragments assemble into intermolecular RNA G4s [67]. The 5′ fragment of tRNA-Ala involves a predominant hairpin structure that starts with the 5′-GGGGGU motif, allowing the formation of tetramolecular quadruplex structures with five tetrad layers. tRNA-derived fragments have also been described in archaea. For example, a 26-residue-long fragment (5′ GGGUUGGUGGUCUAGUCUGGUUAUGA) originating from the 5′ part of valine tRNA is the most abundant tRNA fragment in Haloferax volcanii [68]. This fragment, which has a relatively G-rich 5′ end (starting with GGGUUGG), may, in principle, allow intermolecular quadruplex formation as well.
Unfortunately, other features in archaeal genomes are so poorly annotated that we cannot use these data for evaluation. Comparison of PQS frequencies in annotated sequences with analyses of Bacteria shows the same trend for ncRNA, rRNA, protein-coding gene, and tRNA features. The magnitudes differ, however: the PQS frequency for ncRNA is 1.7 per kbp in bacteria but 5.3 per kbp in archaea. On the other hand, the PQS frequency in repeat regions is lower in archaeal than in bacterial genomes. We have to take into account that the data could be influenced by poor annotation of archaeal genomes, and also by the low number of annotated sequences in Archaea; only 141 representative archaeal genomes are annotated, compared to 1627 representative annotated bacterial genomes. The strong abundance of PQS in ncRNA compared to other locations points to its functional relevance. In contrast to DNA, ncRNAs are present in cells as single-stranded molecules and can therefore easily adopt G4 structures as part of their 3D arrangement, similarly to mRNAs [69,70]. It has been shown that ncRNAs play important roles in many cellular processes, including the regulation of gene transcription and post-transcriptional and epigenetic regulation [71,72].
Other specific regions, such as replication origins or promoter regions, were not included in this graph. The DoriC 10.0 database (http://tubic.org/doric/public/index.php) contains 226 archaeal origins of replication obtained by both in vitro studies and in silico predictions [73]; prediction and experimental data are available for the Thermococcales [74,75], the Haloarchaea, and the Sulfolobales [76]. Archaeal replicators, as in bacteria, are composed of three main elements: a cluster of binding sites for the initiator Cdc6, the DNA unwinding element (DUE), and binding sites for regulatory proteins [75]. Interestingly, it was found in several Haloarchaea species that a specific (TGGGGGGG) motif occurs in one of the two origins of replication (oriC1) [77]. This long G-rich motif was shown to be necessary for efficient replication initiation in Haloarcula hispanica [78,79] and predicted to be prone to intermolecular quadruplex formation.

3.4. Experimental Demonstration of Quadruplex Formation In Vitro

Next, we selected a few DNA G4-prone motifs found in Hadesarchaea and experimentally tested whether they formed a G4 structure under classical conditions. As inferred from isothermal difference spectra (IDS) (Figure 9a) and circular dichroism (CD) spectra (Figure 9b), all motifs clearly formed G-quadruplexes at room temperature. However, as these motifs are found in an archaeon expected to live at a high temperature, we also recorded the spectra at 80 °C. As shown in Figure 9c, these quadruplexes were thermally stable and still formed at high temperatures. Of note, most spectra are indicative of a parallel fold. This bias is the result of a high threshold for G4Hunter (all motifs have scores > 1.7). As a consequence, these motifs are very G-rich, with runs of G separated by short spacers, often 1–2 nt. As short loops tend to be propeller-type, this sequence bias will favor a parallel conformation.

3.5. G4-Binding Proteins from Archaea

Given that G4-prone motifs are found in Archaea, and are actually extremely abundant in some subgroups, it was interesting to check whether potential helicases are present to resolve these structures. A number of DNA and RNA G4-helicases have been identified in eukaryotes, e.g., Pif1, DOG, Rhau/DHX36, WRN, BLM; for a review, see [80]. Little or no experimental data are currently available on archaeal enzymes able to unfold G-quadruplexes. As RecQ has been reported to unfold G4 structures in bacteria, we searched for RecQ homologs in Archaea. A BLASTp search using RecQ (UNIPROT ID: P15043) from E. coli as a query revealed 1206 homologous protein sequences in the archaeal domain with an E-value cut-off of 0.0001. A listing of all candidates identified is presented in the supplementary information (Supplementary Table S6). Five proteins have an identity with the G-quadruplex RecQ resolvase higher than 50%, and 312 proteins have more than 50% aa positives in the sequence, suggesting that they share G4-unfolding functionality in archaea. Besides proteins that actively unfold G4 structures, other peptides may bind to single-stranded G-rich sequences and passively contribute to G4 unfolding by conformational selection. This is the case for a single-strand binding protein isolated from Methanococcus jannaschii, which was used to design an assay to detect G4 formation [79]. Apart from proteins that actively or passively unfold quadruplexes, others may bind to and sometimes promote G4 formation. The amino acid composition of 77 G-quadruplex binding proteins from Homo sapiens revealed unique features of quadruplex binding proteins, with prominent enrichment for glycine (G) and arginine (R) [31]. Human G4-binding proteins share a 20 amino acid long motif/domain (RGRGR GRGGG SGGSG GRGRG), which is similar to the previously described RG-rich domain of the FMR1 G-quadruplex binding protein.
The search for this 20 amino acid-long motif in the archaeal proteome found 23 hits (potential G-quadruplex binding proteins) with an E-value threshold of 0.05; matches were found, e.g., for an RNA DEAD box helicase and for two 30S ribosomal proteins S4 (Supplementary Table S6, list 2). We then searched the proteome of the mesophilic archaeon Methanosarcina mazei (for which the largest number of proteins is known) for the presence of this motif. At a highly significant threshold (p < 10⁻⁶), we found four proteins with a potential quadruplex-binding motif (Supplementary Table S7), while many more hits (193) were found for p < 1 × 10⁻⁵. Three of them are without any known function (DUF134 domain-containing protein, PGF-pre-PGF domain-containing protein, and DUF5320 domain-containing protein). Even if the full proteome of Hadesarchaea archaeon is not known, it is interesting to note that this RG-domain is present in a number of putative proteins. In addition, while a true RecQ homolog was not found, one 600-aa Hadesarchaea archaeon polypeptide shows good similarity with RecQ in its N-terminal half (Supplementary Table S8). The presence of the NIQI motif in the “DNA-directed RNA polymerase subunit” is also interesting and possibly logical, given the necessity of unraveling G-quadruplexes during transcription. The presence in archaeal genomes of potential G4-binding and G4-unfolding proteins is consistent with the formation of quadruplex structures in archaeal cells.
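The glycine and arginine enrichment of the 20-residue motif can be checked directly from its sequence. The sketch below computes its amino acid composition and flags RG-rich windows in a protein sequence. It is only a crude illustration: it is not the BLASTp or FIMO procedure used here, and the function names, the 0.8 cut-off, and the toy protein are assumptions.

```python
from collections import Counter

# The 20-residue motif quoted in the text, written without spaces.
MOTIF = "RGRGRGRGGGSGGSGGRGRG"


def composition(seq):
    """Fraction of each amino acid present in a sequence."""
    counts = Counter(seq.upper())
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


def rg_rich_windows(protein, window=20, min_fraction=0.8):
    """Start positions of windows whose G + R fraction is >= min_fraction.
    The cut-off is an arbitrary illustrative value, not a published one."""
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        fraction = sum(segment.count(aa) for aa in "GR") / window
        if fraction >= min_fraction:
            hits.append(start)
    return hits


print(composition(MOTIF))

# Hypothetical toy protein with the motif embedded at position 10.
toy = "M" * 10 + MOTIF + "A" * 10
print(rg_rich_windows(toy))
```

The motif is 60% glycine, 30% arginine, and 10% serine. A composition filter of this kind ignores residue order, so it would return far more candidates than a position-specific motif search and is useful only as a quick sanity check.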

4. Discussion

We provide here the first comprehensive study of PQS occurrences, frequencies, and distributions in archaeal genomes. The overall analysis made on global frequency hides extreme differences between species and subgroups, which can be explained by differences in GC content and possibly codon usage.
At one end of the G4 spectrum, some subgroups of archaea, such as Parvarchaeota or Heimdallarchaeota, have very low PQS frequencies, and PQS cover 1% or less of their genomes. In sharp contrast, we found an unprecedented enrichment of PQS in some subgroups, often living under extreme conditions. For example, over 50% of the genome of Hadesarchaea archaeon may potentially adopt a quadruplex fold. This Hadesarchaea lives under extreme conditions, as it was found in South African gold mines 3 km underground, without light and oxygen (Hades is the Greek god of the underworld). Following this analysis, we used the BioSample NCBI database [83] to compare the living environments of the archaeal organisms with the highest PQS frequencies. Data for all genomes with a PQS frequency above 6 per kbp are shown in Table 5. A majority of organisms with extremely high PQS frequencies are found in hot spring sediments or in deep-sea hydrothermal vent sediments, and this high PQS frequency may be associated with their extremophilic life, although more work will be necessary to compare G4 density in acidophilic, thermophilic, halophilic, and psychrophilic organisms. For example, in bacteria, in the Gram-positive subgroup Deinococcus-Thermus, a high PQS frequency was associated with their extremophilic origin [35,81], while the Gram-negative extremophilic bacterial subgroup Thermotogae is among the organisms with a low PQS frequency [33]. We suggest that the high stability of G4 structures compared to dsDNA could play important roles in archaea and in Gram-positive extremophilic organisms. We then tested G4 formation experimentally with a few archaeal sequences to verify our in silico predictions: all predicted sequences that were tested formed stable G-quadruplexes in vitro. This absence of false positives is hardly surprising given that we chose high-scoring motifs.
From our published [41] and unpublished data on now over 500 sequences, false positives for sequences with scores above 1.5 are extremely rare (<1.5%), and we have yet to find a false positive with a score > 1.75. Some of the sequences considered were long and may even allow the formation of two juxtaposed G4 structures. In a few cases, we can even propose a topology, as for example, TGGTGGGGGCGGGGGGAGGGGCGGGGGT (642K), in which the predicted guanine tracts (underlined) may either be: TGGTGGGGGCGGGGGGAGGGGCGGGGGT or TGGTGGGGGCGGGGGGAGGGGCGGGGGT, and different folds may result from these possibilities (the latter would likely be parallel, as experimentally observed at 80 °C, while the former may adopt a non-parallel fold, as observed at room temperature). Note, however, that G4Hunter does not make any hypothesis on the G tracts involved in G4 formation, in contrast with Quadparser, for example, where one actively seeks the four runs of G involved in G-quartet formation. G4 formation is (still) full of surprises, and correctly predicting which runs (or individual guanines) participate in G-quartet formation is far from trivial and requires extensive experimental validation.
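The contrast with a tract-based search can be sketched with a Quadparser-style regular expression, which explicitly looks for four runs of at least three guanines separated by loops of 1–7 nt. The pattern follows the commonly used G3+N1–7 definition; the helper names are assumptions, and the code is an illustration rather than the original Quadparser implementation.

```python
import re

# Quadparser-style pattern: four G-runs (>= 3 G) separated by 1-7 nt loops.
QUADPARSER = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def g_runs(seq, min_len=3):
    """Start position and content of every G-run of at least min_len."""
    pattern = re.compile("G{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


def has_quadparser_motif(seq):
    """True if the sequence contains at least one Quadparser-style match."""
    return QUADPARSER.search(seq.upper()) is not None


# Sequence 642K from the text.
seq_642k = "TGGTGGGGGCGGGGGGAGGGGCGGGGGT"
print(g_runs(seq_642k))
print(has_quadparser_motif(seq_642k))
```

A regular expression reports a single tract assignment per match, chosen by the matching engine rather than by any structural criterion. Runs longer than three guanines, such as those in 642K, can contribute their guanines in more than one way, so the reported tracts should not be read as the ones that actually form the G-quartets.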
The extreme enrichment found in some archaea challenges our existing views on “noncanonical” DNA structures to which G-quadruplexes belong, as it is plausible that a substantial part of the Hadesarchaea genome may be packed into G-quadruplex structures. The complementary C-rich strand may also fold into a different quadruplex structure called the i-motif [82] that is favored by acidic pH. Further studies will be dedicated to i-DNA formation in Archaea.
Hadesarchaea archaeon isolates WYZ-LMO4, WYZ-LMO5, and WYZ-LMO6 are archaeal species isolated from hydrothermal spring sediments. Besides high temperatures, often above 50 °C, these ecological niches usually have high salinity. Interestingly, most G-quadruplexes withstand high temperatures (their melting point is often above 70 °C) and are further stabilized by positively charged ions such as K+ and Na+ [84,85]. Such conditions may have naturally favored G-quadruplexes over duplexes. This also highlights one of the consequences of a high GC content: G4-prone motifs become more frequent (Figure 5). In addition, all hyperthermophilic organism genomes encode a reverse gyrase, which positively supercoils DNA, possibly to protect the genome [86]. In future studies, it would be very interesting to carry out genome-wide wet-lab experiments, for example, direct DNA sequencing of G-quadruplex loci as described in [87,88] or direct visualization of G-quadruplexes in living cells using specific antibodies, such as BG4 [89].

5. Conclusions

Overall, our results indicate that archaea are, like eukaryotes and bacteria, prone to G-quadruplex formation: G-quadruplexes are here, there, and everywhere! Important differences in G4 densities were found among species, and experimental validation was obtained in vitro for a few candidate sequences. Follow-up studies may check whether specific archaeal PQS loci (for example, in important genes) show some phylogenetic conservation. If confirmed, this could serve as a new (additional) phylogenetic marker and give us further clues about the evolution and function of G-quadruplex forming sequences in Archaea. This study should stimulate further work on G4 presence in Archaea, and help to establish whether some regulatory mechanisms apply only to a given domain or are truly universal.

Supplementary Materials

The following are available online at www.mdpi.com/xxx/s1/, Figure S1: Experimental evidence for G4 formation with Hadesarchaea sequences at high temperature; Table S1: The accession codes and phylogenetic classification of all archaeal genomic DNA sequences; Table S2: Overall results of PQS frequencies found in each analyzed genomic sequence (all (A), superphylum (B) or phylum (C)) together with GC content, sequence length and other parameters; Table S3: Feature counts; Table S4: PQS characteristics used for the dendrogram shown in Figure 6; Table S5: Statistical evaluation; Table S6: BLASTp search for RecQ and NIQI in Archaea; Table S7: FIMO search for putative quadruplex binding motif; Table S8: The protein most similar to RecQ (E. coli) in Hadesarchaea archaeon.

Author Contributions

Conceptualization, V.B. and J.-L.M.; methodology, P.K.; software, O.P., J.Š., and P.P.; validation, V.B., P.K. and M.B.; formal analysis, M.B.; resources, M.B., P.F., V.D.C.; data curation, V.B., M.B., P.F., V.D.C., J.-L.M.; experimental validation, Y.L., D.V.; writing (original draft preparation), V.B., M.F. and J.-L.M.; writing (review and editing), P.P., H.M., T.S.T., P.F., V.D.C., J.-L.M.; visualization, V.B., J.-L.M.; supervision, V.B., J.-L.M.; project administration, V.B., M.F.; funding acquisition, V.B., M.F., J.-L.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Czech Science Foundation (18-15548S) and by the SYMBIT project Reg. no. CZ.02.1.01/0.0/0.0/15_003/0000477 financed from the ERDF.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Woese, C.R.; Fox, G.E. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc. Natl. Acad. Sci. USA 1977, 74, 5088–5090.
  2. Olsen, G.J.; Woese, C.R. Archaeal genomics: An overview. Cell 1997, 89, 991–994.
  3. Forterre, P. Archaea: What can we learn from their sequences? Curr. Opin. Genet. Dev. 1997, 7, 764–770.
  4. Grüber, G.; Manimekalai, M.S.S.; Mayer, F.; Müller, V. ATP synthases from archaea: The beauty of a molecular motor. Biochim. Biophys. Acta 2014, 1837, 940–952.
  5. Bolhuis, A. The archaeal Sec-dependent protein translocation pathway. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2004, 359, 919–927.
  6. Samson, R.Y.; Dobro, M.J.; Jensen, G.J.; Bell, S.D. The Structure, Function and Roles of the Archaeal ESCRT Apparatus. Subcell. Biochem. 2017, 84, 357–377.
  7. Spang, A.; Eme, L.; Saw, J.H.; Caceres, E.F.; Zaremba-Niedzwiedzka, K.; Lombard, J.; Guy, L.; Ettema, T.J.G. Asgard archaea are the closest prokaryotic relatives of eukaryotes. PLoS Genet. 2018, 14, e1007080.
  8. Da Cunha, V.; Gaia, M.; Nasir, A.; Forterre, P. Asgard archaea do not close the debate about the universal tree of life topology. PLoS Genet. 2018, 14, e1007215.
  9. Adam, P.S.; Borrel, G.; Brochier-Armanet, C.; Gribaldo, S. The growing tree of Archaea: New perspectives on their diversity, evolution and ecology. ISME J. 2017, 11, 2407.
  10. Spang, A.; Caceres, E.F.; Ettema, T.J.G. Genomic exploration of the diversity, ecology, and evolution of the archaeal domain of life. Science 2017, 357.
  11. Pennisi, E. Survey of archaea in the body reveals other microbial guests. Science 2017, 358, 983.
  12. Chaudhary, P.P.; Conway, P.L.; Schlundt, J. Methanogens in humans: Potentially beneficial or harmful for health. Appl. Microbiol. Biotechnol. 2018, 102, 3095–3104.
  13. Vuillemin, A.; Wankel, S.D.; Coskun, Ö.K.; Magritsch, T.; Vargas, S.; Estes, E.R.; Spivack, A.J.; Smith, D.C.; Pockalny, R.; Murray, R.W. Archaea dominate oxic subseafloor communities over multimillion-year time scales. Sci. Adv. 2019, 5, eaaw4108.
  14. Jain, S.; Caforio, A.; Driessen, A.J.M. Biosynthesis of archaeal membrane ether lipids. Front. Microbiol. 2014, 5, 641.
  15. Nobu, M.K.; Narihiro, T.; Kuroda, K.; Mei, R.; Liu, W.-T. Chasing the elusive Euryarchaeota class WSA2: Genomes reveal a uniquely fastidious methyl-reducing methanogen. ISME J. 2016, 10, 2478–2487.
  16. Aouad, M.; Borrel, G.; Brochier-Armanet, C.; Gribaldo, S. Evolutionary placement of Methanonatronarchaeia. Nat. Microbiol. 2019, 4, 558–559.
  17. Forterre, P. The universal tree of life: An update. Front. Microbiol. 2015, 6.
  18. Dombrowski, N.; Lee, J.-H.; Williams, T.A.; Offre, P.; Spang, A. Genomic diversity, lifestyles and evolutionary origins of DPANN archaea. FEMS Microbiol. Lett. 2019, 366, fnz008.
  19. Gaia, M.; Forterre, P. The Tree of Life. In Molecular Mechanisms of Microbial Evolution (Grand Challenges in Biology and Biotechnology); Rampelotto, P.H., Ed.; Springer: New York, NY, USA, 2018.
  20. Sun, Z.-Y.; Wang, X.-N.; Cheng, S.-Q.; Su, X.-X.; Ou, T.-M. Developing Novel G-Quadruplex Ligands: From Interaction with Nucleic Acids to Interfering with Nucleic Acid–Protein Interaction. Molecules 2019, 24, 396.
  21. Harkness, R.W.; Mittermaier, A.K. G-quadruplex dynamics. BBA Proteins Proteomics 2017, 1865, 1544–1554.
  22. Siddiqui-Jain, A.; Grand, C.L.; Bearss, D.J.; Hurley, L.H. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc. Natl. Acad. Sci. USA 2002, 99, 11593–11598.
  23. Lee, S.C.; Zhang, J.; Strom, J.; Yang, D.; Dinh, T.N.; Kappeler, K.; Chen, Q.M. G-Quadruplex in the NRF2 mRNA 5′ Untranslated Region Regulates De Novo NRF2 Protein Translation under Oxidative Stress. Mol. Cell. Biol. 2016, 37.
  24. Crenshaw, E.; Leung, B.P.; Kwok, C.K.; Sharoni, M.; Olson, K.; Sebastian, N.P.; Ansaloni, S.; Schweitzer-Stenner, R.; Akins, M.R.; Bevilacqua, P.C.; et al. Amyloid Precursor Protein Translation is Regulated by a 3′UTR Guanine Quadruplex. PLoS ONE 2015, 10.
  25. Gage, H.L.; Merrick, C.J. Conserved associations between G-quadruplex-forming DNA motifs and virulence gene families in malaria parasites. BMC Genomics 2020, 21, 236.
  26. Gazanion, E.; Lacroix, L.; Alberti, P.; Gurung, P.; Wein, S.; Cheng, M.; Mergny, J.; Gomes, A.; Lopez-Rubio, J. Genome wide distribution of G-quadruplexes and their impact on gene expression in malaria parasites. PLoS Genet. 2020.
  27. Cahoon, L.A.; Seifert, H.S. An alternative DNA structure is necessary for pilin antigenic variation in Neisseria gonorrhoeae. Science 2009, 325, 764–767.
  28. Thakur, R.S.; Desingu, A.; Basavaraju, S.; Subramanya, S.; Rao, D.N.; Nagaraju, G. Mycobacterium tuberculosis DinG is a structure-specific helicase that unwinds G4 DNA: Implications for targeting G4 DNA as a novel therapeutic approach. J. Biol. Chem. 2014, 289, 25112–25136.
  29. Mishra, S.K.; Shankar, U.; Jain, N.; Sikri, K.; Tyagi, J.S.; Sharma, T.K.; Mergny, J.-L.; Kumar, A. Characterization of G-Quadruplex Motifs in espB, espK, and cyp51 Genes of Mycobacterium tuberculosis as Potential Drug Targets. Mol. Ther. Nucleic Acids 2019, 16, 698–706.
  30. Brazda, V.; Haronikova, L.; Liao, J.C.; Fojta, M. DNA and RNA Quadruplex-Binding Proteins. Int. J. Mol. Sci. 2014, 15, 17493–17517.
  31. Brázda, V.; Červeň, J.; Bartas, M.; Mikysková, N.; Coufal, J.; Pečinka, P. The Amino Acid Composition of Quadruplex Binding Proteins Reveals a Shared Motif and Predicts New Potential Quadruplex Interactors. Molecules 2018, 23, 2341.
  32. Ribeyre, C.; Lopes, J.; Boulé, J.-B.; Piazza, A.; Guédin, A.; Zakian, V.A.; Mergny, J.-L.; Nicolas, A. The yeast Pif1 helicase prevents genomic instability caused by G-quadruplex-forming CEB1 sequences in vivo. PLoS Genet. 2009, 5, e1000475.
  33. Bartas, M.; Čutová, M.; Brázda, V.; Kaura, P.; Šťastný, J.; Kolomazník, J.; Coufal, J.; Goswami, P.; Červeň, J.; Pečinka, P. The Presence and Localization of G-Quadruplex Forming Sequences in the Domain of Bacteria. Molecules 2019, 24, 1711.
  34. Marguet, E.; Forterre, P. DNA stability at temperatures typical for hyperthermophiles. Nucleic Acids Res. 1994, 22, 1681–1686.
  35. Ding, Y.; Fleming, A.M.; Burrows, C.J. Case studies on potential G-quadruplex-forming sequences from the bacterial orders Deinococcales and Thermales derived from a survey of published genomes. Sci. Rep. 2018.
  36. Kota, S.; Dhamodharan, V.; Pradeepkumar, P.I.; Misra, H.S. G-quadruplex forming structural motifs in the genome of Deinococcus radiodurans and their regulatory roles in promoter functions. Appl. Microbiol. Biotechnol. 2015, 99, 9761–9769.
  37. Mishra, S.; Chaudhary, R.; Singh, S.; Kota, S.; Misra, H.S. Guanine Quadruplex DNA Regulates Gamma Radiation Response of Genome Functions in the Radioresistant Bacterium Deinococcus radiodurans. J. Bacteriol. 2019, 201.
  38. Todd, A.K.; Johnston, M.; Neidle, S. Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res. 2005, 33, 2901–2907.
  39. Huppert, J.L.; Balasubramanian, S. Prevalence of quadruplexes in the human genome. Nucleic Acids Res. 2005, 33, 2908–2916.
  40. Eddy, J.; Maizels, N. Gene function correlates with potential for G4 DNA formation in the human genome. Nucleic Acids Res. 2006, 34, 3887–3896.
  41. Bedrat, A.; Lacroix, L.; Mergny, J.L. Re-evaluation of G-quadruplex propensity with G4Hunter. Nucleic Acids Res. 2016.
  42. Brázda, V.; Kolomazník, J.; Lýsek, J.; Bartas, M.; Fojta, M.; Šťastný, J.; Mergny, J.-L. G4Hunter web application: A web server for G-quadruplex prediction. Bioinformatics 2019, 35, 3493–3495.
  43. Finan, T.M. The divided bacterial genome: Structure, function, and evolution. Microbiol. Mol. Biol. Rev. 2017, 81, e00019-17.
  44. Yadav, V.K.; Abraham, J.K.; Mani, P.; Kulshrestha, R.; Chowdhury, S. QuadBase: Genome-wide database of G4 DNA-occurrence and conservation in human, chimpanzee, mouse and rat promoters and 146 microbes. Nucleic Acids Res. 2008, 36, D381–D385.
  45. Waller, Z.A.; Pinchbeck, B.J.; Buguth, B.S.; Meadows, T.G.; Richardson, D.J.; Gates, A.J. Control of bacterial nitrate assimilation by stabilization of G-quadruplex DNA. Chem. Commun. 2016, 52, 13511–13514.
  46. Rawal, P.; Kummarasetti, V.B.R.; Ravindran, J.; Kumar, N.; Halder, K.; Sharma, R.; Mukerji, M.; Das, S.K.; Chowdhury, S. Genome-wide prediction of G4 DNA as regulatory motifs: Role in Escherichia coli global regulation. Genome Res. 2006, 16, 644–655.
  47. Brázda, V.; Lýsek, J.; Bartas, M.; Fojta, M. Complex Analyses of Short Inverted Repeats in All Sequenced Chloroplast DNAs. BioMed Res. Int. 2018, 2018, 1097018.
  48. Čechová, J.; Lýsek, J.; Bartas, M.; Brázda, V. Complex analyses of inverted repeats in mitochondrial genomes revealed their importance and variability. Bioinformatics 2018, 34, 1081–1085.
  49. Cahoon, L.A.; Seifert, H.S. Transcription of a cis-acting, noncoding, small RNA is required for pilin antigenic variation in Neisseria gonorrhoeae. PLoS Pathog. 2013, 9, e1003074.
  50. Neidle, S. The structures of quadruplex nucleic acids and their drug complexes. Curr. Opin. Struct. Biol. 2009, 19, 239–250.
  51. Dhapola, P.; Chowdhury, S. QuadBase2: Web server for multiplexed guanine quadruplex mining and visualization. Nucleic Acids Res. 2016, 44, W277–W283.
  52. Sayers, E.W.; Agarwala, R.; Bolton, E.E.; Brister, J.R.; Canese, K.; Clark, K.; Connor, R.; Fiorini, N.; Funk, K.; Hefferon, T.; et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2019, 47, D23–D28.
  53. Brázda, V.; Kolomazník, J.; Lýsek, J.; Hároníková, L.; Coufal, J.; Šťastný, J. Palindrome analyser - A new web-based server for predicting and evaluating inverted repeats in nucleotide sequences. Biochem. Biophys. Res. Commun. 2016, 478, 1739–1745.
  54. Computational Tools - Pandas 0.25.1 Documentation. Available online: https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html (accessed on 16 October 2019).
  55. Suzuki, R.; Shimodaira, H. Pvclust: An R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 2006, 22, 1540–1542.
  56. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
  57. Grant, C.E.; Bailey, T.L.; Noble, W.S. FIMO: Scanning for occurrences of a given motif. Bioinformatics 2011, 27, 1017–1018.
  58. Bailey, T.L.; Boden, M.; Buske, F.A.; Frith, M.; Grant, C.E.; Clementi, L.; Ren, J.; Li, W.W.; Noble, W.S. MEME SUITE: Tools for motif discovery and searching. Nucleic Acids Res. 2009, 37, W202–W208.
  59. Gertz, E.M.; Yu, Y.-K.; Agarwala, R.; Schäffer, A.A.; Altschul, S.F. Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST. BMC Biol. 2006, 4, 41.
  60. Wernersson, R. Virtual Ribosome - A comprehensive DNA translation tool with support for integration of sequence feature annotation. Nucleic Acids Res. 2006, 34, W385–W388.
  61. Artimo, P.; Jonnalagedda, M.; Arnold, K.; Baratin, D.; Csardi, G.; De Castro, E.; Duvaud, S.; Flegel, V.; Fortier, A.; Gasteiger, E. ExPASy: SIB bioinformatics resource portal. Nucleic Acids Res. 2012, 40, W597–W603.
  62. Marchler-Bauer, A.; Derbyshire, M.K.; Gonzales, N.R.; Lu, S.; Chitsaz, F.; Geer, L.Y.; Geer, R.C.; He, J.; Gwadz, M.; Hurwitz, D.I.; et al. CDD: NCBI’s conserved domain database. Nucleic Acids Res. 2015, 43, D222–D226.
  63. Čutová, M.; Manta, J.; Porubiaková, O.; Kaura, P.; Šťastný, J.; Jagelská, E.B.; Goswami, P.; Bartas, M.; Brázda, V. Divergent distributions of inverted repeats and G-quadruplex forming sequences in Saccharomyces cerevisiae. Genomics 2020, 112, 1897–1901.
  64. Guo, J.U.; Bartel, D.P. RNA G-quadruplexes are globally unfolded in eukaryotic cells and depleted in bacteria. Science 2016, 353.
  65. Galtier, N.; Tourasse, N.; Gouy, M. A nonhyperthermophilic common ancestor to extant life forms. Science 1999, 283, 220–221.
  66. Klein, R.J.; Misulovin, Z.; Eddy, S.R. Noncoding RNA genes identified in AT-rich hyperthermophiles. Proc. Natl. Acad. Sci. USA 2002, 99, 7542–7547.
  67. Lyons, S.M.; Gudanis, D.; Coyne, S.M.; Gdaniec, Z.; Ivanov, P. Identification of functional tetramolecular RNA G-quadruplexes derived from transfer RNAs. Nat. Commun. 2017, 8, 1127.
  68. Gebetsberger, J.; Zywicki, M.; Künzi, A.; Polacek, N. tRNA-derived fragments target the ribosome and function as regulatory non-coding RNA in Haloferax volcanii. Archaea 2012, 2012, 260909.
  69. Magnus, M.; Kappel, K.; Das, R.; Bujnicki, J.M. RNA 3D structure prediction guided by independent folding of homologous sequences. BMC Bioinf. 2019, 20, 512.
  70. Kamura, T.; Katsuda, Y.; Kitamura, Y.; Ihara, T. G-quadruplexes in mRNA: A key structure for biological function. Biochem. Biophys. Res. Commun. 2020.
  71. Qu, Z.; Adelson, D.L. Evolutionary conservation and functional roles of ncRNA. Front. Genet. 2012, 3.
  72. Buddeweg, A.; Daume, M.; Randau, L.; Schmitz, R.A. Noncoding RNAs in Archaea: Genome-Wide Identification and Functional Classification. Meth. Enzymol. 2018, 612, 413–442.
  73. Luo, H.; Gao, F. DoriC 10.0: An updated database of replication origins in prokaryotic genomes including chromosomes and plasmids. Nucleic Acids Res. 2019, 47, D74–D77.
  74. Cossu, M.; Da Cunha, V.; Toffano-Nioche, C.; Forterre, P.; Oberto, J. Comparative genomics reveals conserved positioning of essential genomic clusters in highly rearranged Thermococcales chromosomes. Biochimie 2015, 118, 313–321.
  75. Matsunaga, F.; Forterre, P.; Ishino, Y.; Myllykallio, H. In vivo interactions of archaeal Cdc6/Orc1 and minichromosome maintenance proteins with the replication origin. Proc. Natl. Acad. Sci. USA 2001, 98, 11152–11157.
  76. Dueber, E.C.; Costa, A.; Corn, J.E.; Bell, S.D.; Berger, J.M. Molecular determinants of origin discrimination by Orc1 initiators in archaea. Nucleic Acids Res. 2011, 39, 3621–3631.
  77. Norais, C.; Hawkins, M.; Hartman, A.L.; Eisen, J.A.; Myllykallio, H.; Allers, T. Genetic and physical mapping of DNA replication origins in Haloferax volcanii. PLoS Genet. 2007, 3, e77.
  78. Wu, Z.; Liu, J.; Yang, H.; Liu, H.; Xiang, H. Multiple replication origins with diverse control mechanisms in Haloarcula hispanica. Nucleic Acids Res. 2013, 42, 2282–2294.
  79. Zhuang, X.; Tang, J.; Hao, Y.; Tan, Z. Fast detection of quadruplex structure in DNA by the intrinsic fluorescence of a single-stranded DNA binding protein. J. Mol. Recognit. 2007, 20, 386–391.
  80. Mendoza, O.; Bourdoncle, A.; Boulé, J.-B.; Brosh, R.M.; Mergny, J.-L. G-quadruplexes and helicases. Nucleic Acids Res. 2016, 44, 1989–2006.
  81. Beaume, N.; Pathak, R.; Yadav, V.K.; Kota, S.; Misra, H.S.; Gautam, H.K.; Chowdhury, S. Genome-wide study predicts promoter-G4 DNA motifs regulate selective functions in bacteria: Radioresistance of D. radiodurans involves G4 DNA-mediated regulation. Nucleic Acids Res. 2013, 41, 76–89.
  82. Gehring, K.; Leroy, J.-L.; Guéron, M. A tetrameric DNA structure with protonated cytosine-cytosine base pairs. Nature 1993, 363, 561–565.
  83. Barrett, T.; Clark, K.; Gevorgyan, R.; Gorelenkov, V.; Gribov, E.; Karsch-Mizrachi, I.; Kimelman, M.; Pruitt, K.D.; Resenchuk, S.; Tatusova, T.; et al. BioProject and BioSample databases at NCBI: Facilitating capture and organization of metadata. Nucleic Acids Res. 2012, 40, D57–D63.
  84. Bartas, M.; Brázda, V.; Karlický, V.; Červeň, J.; Pečinka, P. Bioinformatics analyses and in vitro evidence for five and six stacked G-quadruplex forming sequences. Biochimie 2018, 150, 70–75.
  85. Risitano, A.; Fox, K.R. Stability of Intramolecular DNA Quadruplexes: Comparison with DNA Duplexes. Biochemistry 2003, 42, 6507–6513.
  86. Couturier, M.; Gadelle, D.; Forterre, P.; Nadal, M.; Garnier, F. The reverse gyrase TopR1 is responsible for the homeostatic control of DNA supercoiling in the hyperthermophilic archaeon Sulfolobus solfataricus. Mol. Microbiol. 2020, 113, 356–368.
  87. Chambers, V.S.; Marsico, G.; Boutell, J.M.; Di Antonio, M.; Smith, G.P.; Balasubramanian, S. High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat. Biotechnol. 2015, 33, 877.
  88. Hänsel-Hertsch, R.; Spiegel, J.; Marsico, G.; Tannahill, D.; Balasubramanian, S. Genome-wide mapping of endogenous G-quadruplex DNA structures by chromatin immunoprecipitation and high-throughput sequencing. Nat. Protoc. 2018, 13, 551.
  89. Hänsel-Hertsch, R.; Di Antonio, M.; Balasubramanian, S. DNA G-quadruplexes in the human genome: Detection, functions and therapeutic potential. Nat. Rev. Mol. Cell. Biol. 2017, 18, 279.
Figure 1. A schematic phylogenetic tree for Archaea. This unrooted evolutionary tree of Archaea is based on the schematic tree of Forterre (2015) [17], updated according to recent phylogenetic analyses [9,18]. BAT stands for Bathyarchaeota, Aigarchaeota, and Thaumarchaeota. DPANN is an acronym based on the first five groups discovered: Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota. The term BAT superphylum was proposed by Gaia et al. in 2018 [19], and the terms Eury and Cren superphyla are suggested here. The term Cren superphylum is suggested because the phyla Crenarchaeota, Verstratearchaeota, Marsarchaeota, Nezaarchaeota, and Geothermarchaeota form a consensus monophyletic clade in all archaeal phylogenies. We included Korarchaeota in this superphylum because they often branch as a sister group of the above phyla in archaeal phylogenies, although their fast evolutionary rate sometimes makes their positioning difficult. In parallel, we suggest the term Eury superphylum because Euryarchaeota includes very diverse groups of cultivated and uncultivated Archaea that are difficult to group in a single phylum, especially considering that phyla such as Verstratearchaeota, Marsarchaeota, or Nezaarchaeota contain only a few uncultivated species defined by a few metagenome-assembled genomes (MAGs). Names in bold letters correspond to subgroups that include cultivated species; names in thin letters correspond to subgroups that include only MAGs.
Figure 2. A G-quartet involves four coplanar guanines establishing a cyclic array of H-bonds (left). Stacking of two or more (three in this example) quartets leads to the formation of a G-quadruplex structure (right), stabilized by cations, such as potassium (not shown).
Figure 3. Examples of sequences with different G4Hunter scores (G4HS) and distribution of PQS according to threshold category. (A) Examples of 25-nt long archaeal sequences (25 nt being the window size chosen for the analysis), with G4Hunter scores given in parentheses. Isolated guanines are shown in red, all other guanines in bold red characters. Longer archaeal motifs with high G4Hunter scores are provided in Table 4. (B) Distribution of G4-prone motifs according to the G4Hunter score. 1.2 means any sequence with a score between 1.2 and 1.399; 1.4 between 1.4 and 1.599, etc. Counts are normalized by the total number of PQS found in bacteria and in archaea, and compared with Homo sapiens. The first category represents 97.9% and 97.2% of all PQS in bacteria and archaea, respectively. Note the log scale on the Y-axis.
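The score categories described for panel B can be written as a small lookup. The sketch below is illustrative only: the helper name `g4hs_category` is hypothetical, and placing a score of exactly 2.0 in the top category is an assumption, since the caption does not say where that boundary value belongs.

```python
from bisect import bisect_right

# Lower edges of the G4Hunter score categories described in the caption.
EDGES = [1.2, 1.4, 1.6, 1.8, 2.0]
LABELS = ["1.2", "1.4", "1.6", "1.8", ">2.0"]


def g4hs_category(score):
    """Return the category label for a G4Hunter score, or None below 1.2.

    Hypothetical helper: a score sitting on an edge is assigned to the
    category that starts at that edge (so 2.0 lands in ">2.0", an assumption).
    """
    index = bisect_right(EDGES, score)
    if index == 0:
        return None  # below the 1.2 threshold, so not counted as a PQS
    return LABELS[index - 1]


if __name__ == "__main__":
    for s in (1.15, 1.2, 1.399, 1.4, 1.93, 2.57):
        print(s, g4hs_category(s))
```

Using explicit edges with `bisect_right` avoids the floating-point rounding problems that can arise when a category index is computed by dividing by the 0.2 step.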
Figure 4. Frequencies of PQS in subgroups of analyzed archaeal genomes. Boxes span the interquartile range (IQR), and whiskers extend to the lowest and highest values within 1.5 × IQR of the box. Black points denote outliers. Horizontal black lines inside the boxes are median values.
Figure 5. Cluster dendrogram of PQS characteristics of archaeal subgroups. The dendrogram of PQS characteristics (Supplementary Table S4) was made in R v. 3.6.3 (code provided in Supplementary Table S4) using the pvclust package with the following parameters: cluster method ‘ward.D2’, distance ‘euclidean’, and 10,000 bootstrap resamplings. AU values are shown in blue and indicate the statistical significance of a particular branching (values above 95 are equivalent to p-values less than 0.05). Statistically significant clusters are highlighted by red dashed rectangles.
Figure 6. Relationship between the observed frequency of PQS per 1000 bp and GC content. Different G4Hunter score intervals are considered. In each G4Hunter score interval miniplot, frequencies were normalized to the highest observed PQS frequency. Organisms with a frequency greater than 50% of that maximum are labeled and highlighted in color.
Figure 7. Relationship between GC percentage and % of PQS in genomes of particular archaeal subgroups. The fitted equation and its R² coefficient are shown at the top of the plot.
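As an illustration of how such a fit and its R² can be obtained, the sketch below runs an ordinary least-squares line through the five superphylum rows of Table 3 (GC % against % PQS). It assumes a straight-line model, which the caption does not specify, and it uses only five points, so its coefficients are not those shown in the figure. The function name `linear_fit` is hypothetical.

```python
# Superphylum rows of Table 3 as (GC %, % PQS). Five points only, so the
# resulting line is NOT the fit displayed in Figure 7.
POINTS = {
    "BAT": (43.07, 3.49),
    "Cren": (43.05, 4.75),
    "Asgard": (38.75, 1.39),
    "DPANN": (39.22, 2.18),
    "Eury": (48.77, 3.68),
}


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


gc_values = [gc for gc, _ in POINTS.values()]
pqs_values = [pqs for _, pqs in POINTS.values()]
slope, intercept, r_squared = linear_fit(gc_values, pqs_values)

if __name__ == "__main__":
    print(f"%PQS = {slope:.3f} * GC% + {intercept:.3f} (R^2 = {r_squared:.3f})")
```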
Figure 8. Differences in PQS frequency by DNA locus. The chart shows PQS frequencies, normalized per 1000 bp, for locations annotated in the NCBI database and compares Archaea with Bacteria. In Archaea, G4-prone motifs are strongly over-represented in ncRNA and rRNA relative to both the average archaeal G4 density (mean f = 1.207) and the corresponding bacterial loci. PQS counts are provided in the Supplementary Table S3 Excel file.
Figure 9. Experimental evidence for quadruplex formation with archaeal sequences. Isothermal differential absorbance (IDS; panel A) and circular dichroism (CD; panels B and C) spectra of Hadesarchaea archaeon DNA sequences were recorded at 20 °C (panels A and B) or, for CD, at high temperature (80 °C; panel C).
Table 1. Number of putative quadruplex sequences (PQS) found using four different window sizes in three complete archaeal genomes.
Number of G4 sequences found for each window size:

| Archaea (GC %) | 25 nt | 30 nt | 50 nt | 100 nt |
|---|---|---|---|---|
| Methanococcus maripaludis C7 (33.3%) | 558 | 171 | 3 | 0 |
| Cenarchaeum symbiosum A (57.3%) | 6019 | 3197 | 324 | 5 |
| Halobacterium salinarum NRC (65.9%) | 4738 | 2313 | 262 | 4 |
Table 2. Number of PQS found and their frequencies per 1000 bp in all 3387 archaeal genomes, grouped by G4Hunter score (1.2-1.4 means any sequence with a score between 1.2 and 1.399; 1.4 between 1.4 and 1.599, etc.).
| G4HS | Number of PQS in Dataset | Fraction of All PQS | PQS Frequency per kbp |
|---|---|---|---|
| 1.2–1.4 | 4,344,917 | 0.9718 | 1.19 |
| 1.4–1.6 | 119,233 | 0.0267 | 1.8 × 10⁻² |
| 1.6–1.8 | 6357 | 0.00142 | 9.9 × 10⁻⁴ |
| 1.8–2.0 | 174 | 0.0000389 | 2.5 × 10⁻⁵ |
| >2.0 | 132 | 0.0000295 | 2.2 × 10⁻⁵ |
| Total | 4,470,813 | 1 | |
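The "Fraction of All PQS" column follows directly from the counts: each fraction is the interval's count divided by the total. A minimal check, using only the counts listed in Table 2 (the variable names are illustrative):

```python
# PQS counts per G4Hunter score interval, copied from Table 2.
COUNTS = {
    "1.2-1.4": 4_344_917,
    "1.4-1.6": 119_233,
    "1.6-1.8": 6_357,
    "1.8-2.0": 174,
    ">2.0": 132,
}

# The total is the sum over all intervals; each fraction is count / total.
total = sum(COUNTS.values())
fractions = {interval: count / total for interval, count in COUNTS.items()}

if __name__ == "__main__":
    print("total", total)
    for interval, fraction in fractions.items():
        print(f"{interval}: {fraction:.3g}")
```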
Table 3. Genomic sequence sizes, GC %, total PQS counts, and mean frequencies of quadruplex motifs. Seq. (total number of sequences), Median (median sequence length), Short (shortest sequence), Long (longest sequence), GC % (average GC content), PQS (total number of predicted PQS), Mean f (mean frequency of predicted PQS per 1000 bp), Min f (lowest frequency of predicted PQS per 1000 bp), Max f (highest frequency of predicted PQS per 1000 bp). % PQS corresponds to the probability that any given nucleotide in the group or subgroup belongs to a G4-prone region (G4H > 1.2). Colors correspond to those used in the phylogenetic tree.
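| Kingdom | Seq. | Median | Short | Long | GC % | PQS | Mean f | Min f | Max f | % PQS |
|---|---|---|---|---|---|---|---|---|---|---|
| Archaea | 3387 | 1,686,930 | 100,212 | 13,399,915 | 46.51 | 7,927,775 | 1.21 | 0.04 | 15.31 | 3.58 |

| Superphylum | Seq. | Median | Short | Long | GC % | PQS | Mean f | Min f | Max f | % PQS |
|---|---|---|---|---|---|---|---|---|---|---|
| BAT | 320 | 1,180,629 | 164,795 | 3,506,105 | 43.07 | 421,678 | 1.16 | 0.05 | 8.42 | 3.49 |
| Cren | 379 | 1,808,184 | 210,860 | 6,451,204 | 43.05 | 1,009,660 | 1.56 | 0.09 | 9.44 | 4.75 |
| Asgard | 71 | 2,322,715 | 291,515 | 5,684,038 | 38.75 | 74,647 | 0.47 | 0.12 | 1.50 | 1.39 |
| DPANN | 309 | 832,169 | 100,212 | 6,604,953 | 39.22 | 219,058 | 0.70 | 0.08 | 4.20 | 2.18 |
| Eury | 2308 | 1,826,841 | 137,797 | 13,399,915 | 48.77 | 6,202,732 | 1.25 | 0.04 | 15.31 | 3.68 |

| Phylum | Seq. | Median | Short | Long | GC % | PQS | Mean f | Min f | Max f | % PQS |
|---|---|---|---|---|---|---|---|---|---|---|
| Bathyarchaeota | 128 | 1,208,976.5 | 200,493 | 3,506,105 | 46.29 | 245,162 | 1.54 | 0.23 | 8.42 | 3.00 |
| Thaumarchaeota | 192 | 1,173,909.5 | 164,795 | 3,441,569 | 40.93 | 176,516 | 0.91 | 0.05 | 5.32 | 2.73 |
| Thermoproteales | 147 | 1,581,744 | 242,587 | 3,969,448 | 45.86 | 513,053 | 2.07 | 0.11 | 7.38 | 6.31 |
| Sulfolobales | 118 | 2,223,757.5 | 210,860 | 3,034,024 | 38.20 | 200,842 | 0.79 | 0.34 | 4.58 | 2.38 |
| Desulfurococcales | 29 | 1,580,347 | 807,477 | 2,148,448 | 46.99 | 99,211 | 2.29 | 0.40 | 6.37 | 6.95 |
| Verstraetearchaeota | 18 | 1,171,913.5 | 419,172 | 1,937,662 | 46.76 | 40,586 | 1.83 | 0.10 | 3.43 | 5.50 |
| Marsarchaeota | 15 | 1,915,630 | 351,358 | 3,731,392 | 46.72 | 52,853 | 1.64 | 0.47 | 2.94 | 5.01 |
| Geothermarchaeota | 6 | 1,183,145.5 | 803,797 | 1,671,866 | 42.72 | 16,582 | 2.15 | 0.96 | 7.03 | 6.65 |
| Nezhaarchaeota | 2 | 1,332,140.5 | 1,315,707 | 1,348,574 | 43.53 | 2016 | 0.76 | 0.75 | 0.77 | 2.27 |
| Korarchaeota | 18 | 1,542,873 | 834,209 | 2,942,065 | 48.39 | 68,434 | 2.63 | 1.05 | 9.44 | 7.95 |
| Unclassified Crenarchaeota | 27 | 1,203,892 | 301,027 | 6,451,204 | 37.01 | 19,361 | 0.44 | 0.09 | 1.49 | 1.29 |
| Lokiarchaeota | 29 | 1,892,624 | 320,847 | 5,143,417 | 32.77 | 25,479 | 0.41 | 0.21 | 1.50 | 1.24 |
| Odinarchaeota | 1 | 1,460,710 | 1,460,710 | 1,460,710 | 38.05 | 1038 | 0.71 | 0.71 | 0.71 | 2.16 |
| Thorarchaeota | 29 | 2,770,204 | 291,515 | 4,389,059 | 46.55 | 40,006 | 0.60 | 0.24 | 1.18 | 1.76 |
| Heimdallarchaeota | 12 | 2,167,091 | 432,340 | 5,684,038 | 34.42 | 8124 | 0.27 | 0.12 | 0.50 | 0.82 |
| Aenigmarchaeota | 35 | 751,672 | 248,182 | 1,410,470 | 39.33 | 17,990 | 0.71 | 0.11 | 3.78 | 2.12 |
| Nanohaloarchaeota | 17 | 815,638 | 565,289 | 1,480,846 | 44.53 | 8672 | 0.48 | 0.09 | 1.82 | 1.50 |
| Woesearchaeota | 72 | 966,794.5 | 518,295 | 2,944,567 | 40.77 | 57,833 | 0.66 | 0.08 | 3.92 | 1.96 |
| Pacearchaeota | 60 | 719,507 | 279,432 | 6,604,953 | 33.74 | 37,675 | 0.56 | 0.08 | 2.99 | 1.73 |
| Nanoarchaeota | 25 | 577,110 | 204,081 | 1,162,239 | 32.83 | 9940 | 0.59 | 0.13 | 4.20 | 1.70 |
| Micrarchaeota | 39 | 887,931 | 658,716 | 1,333,875 | 50.41 | 42,298 | 1.17 | 0.15 | 2.86 | 3.47 |
| Diapherotrites | 19 | 568,419 | 302,064 | 1,130,899 | 37.42 | 6077 | 0.49 | 0.11 | 2.33 | 1.46 |
| Unclassified DPANN | 40 | 858,043.5 | 100,212 | 3,188,023 | 35.57 | 33,846 | 0.67 | 0.15 | 2.39 | 2.04 |
| Hadesarchaeota | 12 | 857,575 | 451,393 | 1,241,441 | 53.77 | 56,369 | 4.61 | 1.26 | 15.31 | 14.55 |
| Persephonarchaeota | 33 | 637,942 | 137,797 | 1,412,535 | 44.06 | 34,905 | 1.49 | 0.59 | 2.36 | 4.49 |
| Thermococcales | 60 | 1,867,904.5 | 207,909 | 2,388,527 | 46.77 | 191,492 | 1.72 | 0.47 | 7.53 | 5.15 |
| Theinoarchaeota | 2 | 4,165,806 | 3,559,548 | 4,772,064 | 41.57 | 5480 | 0.66 | 0.65 | 0.67 | 1.94 |
| Methanofastidiosa | 96 | 992,372 | 156,656 | 13,399,915 | 40.71 | 141,192 | 0.83 | 0.08 | 3.64 | 2.54 |
| Methanococcales | 24 | 1,717,483 | 1,207,361 | 1,936,387 | 32.01 | 15,065 | 0.39 | 0.20 | 0.86 | 1.19 |
| Methanobacteriales | 224 | 2,001,036 | 1,157,521 | 3,466,370 | 33.62 | 175,191 | 0.39 | 0.04 | 2.32 | 1.14 |
| Methanopyrales | 3 | 1,430,309 | 1,421,621 | 1,694,969 | 58.94 | 10,798 | 2.34 | 1.97 | 3.00 | 6.84 |
| Methanomassilicoccales | 91 | 1,404,109 | 640,223 | 2,641,216 | 56.22 | 257,340 | 1.85 | 0.22 | 4.41 | 5.38 |
| Thermoplasmatales | 135 | 1,621,237 | 593,453 | 2,816,557 | 42.71 | 246,832 | 1.13 | 0.11 | 7.03 | 3.42 |
| Acidoprofondum/DHV2-2 | 11 | 1,731,076 | 519,420 | 2,981,805 | 40.55 | 16,609 | 1.21 | 0.29 | 4.12 | 3.59 |
| Archaeoglobales | 53 | 1,901,943 | 478,535 | 3,408,041 | 42.98 | 117,470 | 1.22 | 0.57 | 3.29 | 3.66 |
| Methanosarcinales | 279 | 2,913,215 | 208,261 | 5,751,492 | 44.99 | 845,394 | 1.19 | 0.15 | 7.52 | 3.54 |
| Methanomicrobiales | 146 | 2,228,967.5 | 622,799 | 3,978,804 | 54.97 | 783,172 | 2.38 | 0.23 | 7.20 | 7.07 |
| Methanocellales | 5 | 2,957,635 | 1,465,272 | 3,243,770 | 50.96 | 16,825 | 1.21 | 0.41 | 1.88 | 3.51 |
| Halobacteriales | 440 | 3,585,981 | 397,623 | 5,605,381 | 63.95 | 2,271,600 | 1.56 | 0.08 | 4.25 | 4.50 |
| Unclassified Diaforarchaea | 97 | 1,460,542 | 233,168 | 2,294,894 | 47.38 | 136,115 | 1.03 | 0.18 | 2.55 | 3.02 |
| Unclassified other | 597 | 1,400,198 | 258,312 | 7,416,915 | 46.88 | 862,962 | 1.02 | 0.07 | 5.16 | 3.00 |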
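The frequency columns are PQS counts normalized per 1000 bp. For a group with a single sequence the arithmetic can be checked directly from the table; the sketch below does this for the Odinarchaeota row (one sequence of 1,460,710 bp with 1038 PQS). The function name `pqs_per_kbp` is illustrative, not taken from the paper.

```python
def pqs_per_kbp(pqs_count, sequence_length_bp):
    """PQS frequency normalized per 1000 bp."""
    return pqs_count / sequence_length_bp * 1000


# Odinarchaeota row of Table 3: a single sequence, so the median, shortest
# and longest lengths coincide and Mean f = Min f = Max f.
odin_length_bp = 1_460_710
odin_pqs = 1_038
odin_frequency = pqs_per_kbp(odin_pqs, odin_length_bp)

if __name__ == "__main__":
    print(f"Odinarchaeota: {odin_frequency:.2f} PQS per 1000 bp")
```

For groups with several sequences, Mean f is described as a mean of frequencies, so it need not equal the pooled count divided by the pooled length.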
Table 4. Long G4-prone motifs with high G4HS found in Hadesarchaea archaeon.
| Name | Sequence (5′ to 3′) | G4Hunter Score | IDS | CD |
|---|---|---|---|---|
| 038_K | AGGCTGGGGGTGAGGGCGGTGGTGGGGAAGGGAGGGGTGGGGGAGAAAACGAAGGGGGT | 2.07 | G4 | Parallel |
| 086_K | TGGGGAGGAGGGGAGGGGAGGTGGGCTGGGGGGGGCT | 2.57 | G4 | Parallel |
| 174_K | AGGGTGAGGGAGGAGGTGCTGGGGGGAAGGGAGGTGGGGGAGGGGGAGGTGGAGGGGCTGGTGAGGGA | 2.07 | G4 | Parallel |
| 175_K | AGGGGAGGAGGGTGGCCGTGGTGGGGGCGGGGGGAGGGGCGGGGGTGGGGGGGCCTGGGGGGA | 2.54 | G4 | Parallel |
| 176_K | AGGAGGAGGGTGAGGGACCAGGGGAGGAGGGAGGGGAGGGGGGGAAGGAGGAGGGAGAGGAGGAGGGA | 1.93 | G4 | Parallel |
| 178_K | TGGTGGGGGCGGGGGGAGGGGCGGGGGTGGGGGGGCCTGGGGGGA | 2.89 | G4 | Parallel |
| 195_K | AGGGGAGGAGGGTGGCCGTGGTGGGGGCGGGGGGAGGGGCGGGGGTGGCCTCCACGGA | 1.91 | G4 | Parallel |
| 196_K | AGGGGAGGAGGGAGGGGAGGGGGGGAAGGAGGAGGGAGAGGAGGAGGGA | 2.22 | G4 | Parallel |
| 245_K | GGGGTCGTCGGGGGGGAGAGCTGGGGAGGAGGGGAGGGGAGGTGGGCTGGGGGGGGCTGGGGAGGGAGGAGGTGAGGGG | 2.33 | G4 | Parallel |
| 640_K | AGGGAGGTGGGGGAGGGGGAGGTGGAGGGGCT | 2.38 | G4 | Parallel |
| 642_K | TGGTGGGGGCGGGGGGAGGGGCGGGGGT | 2.93 | G4 | Hybrid * |
| 643_K | AGGCTGGGGGTGAGGGCGGTGGTGGGGAAGGGAGGGGTGGGGGAGAAAACGAAGGGGGT | 2.07 | G4 | Parallel |
| 644_K | AGGGCGGTGGTGGGGAAGGGAGGGGTGGGGGA | 2.41 | G4 | Parallel |
| 645_K | GGCGGGGGGGGAGTCCTTCATCCTGGGGTAGGGG | 1.74 | G4 | Parallel |
* Sequence 642_K adopts a hybrid structure at room temperature, which is converted to a parallel conformation at high temperatures.
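The scores in Table 4 can be reproduced with the published G4Hunter scoring rule applied to the whole sequence: each G in a run of n consecutive Gs scores min(n, 4), each C in a run of n consecutive Cs scores the negative of that, A and T score 0, and the sequence score is the mean. The sketch below is a simplified illustration, not the tool used in the study; the function name `g4hunter_score` is hypothetical, and no sliding window or 1.2 threshold is applied here.

```python
from itertools import groupby


def g4hunter_score(sequence):
    """Mean per-nucleotide G4Hunter score of a whole sequence.

    Each G in a run of n consecutive Gs scores min(n, 4), each C in a run
    of n consecutive Cs scores -min(n, 4), and every other base scores 0.
    """
    total = 0
    length = 0
    for base, run in groupby(sequence.upper()):
        n = len(list(run))
        length += n
        if base == "G":
            total += n * min(n, 4)
        elif base == "C":
            total -= n * min(n, 4)
    return total / length if length else 0.0


# Two sequences copied from Table 4.
SEQ_642_K = "TGGTGGGGGCGGGGGGAGGGGCGGGGGT"
SEQ_178_K = "TGGTGGGGGCGGGGGGAGGGGCGGGGGTGGGGGGGCCTGGGGGGA"

if __name__ == "__main__":
    print(round(g4hunter_score(SEQ_642_K), 2))
    print(round(g4hunter_score(SEQ_178_K), 2))
```

For 642_K the per-base scores sum to 82 over 28 nucleotides, which rounds to the tabulated 2.93; for 178_K the sum is 130 over 45 nucleotides, which rounds to 2.89.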
Table 5. Detailed characteristics of archaeal species with PQS frequency per 1000 bp greater than 6.00. Living environments data were obtained from the BioSample NCBI database [83].
| Organism Name | GC Content (%) | PQS f | % PQS | Living Environment (Isolated from) |
|---|---|---|---|---|
| Hadesarchaea archaeon isolate WYZ-LMO6 | 65.01 | 15.310 | 51.15 | Hot springs sediment, Yellowstone NP, USA |
| Hadesarchaea archaeon isolate WYZ-LMO4 | 56.17 | 9.685 | 31.10 | Hot springs sediment, Jinze hot spring, China |
| Hadesarchaea archaeon isolate WYZ-LMO5 | 56.04 | 9.581 | 30.69 | Hot springs sediment, Jinze hot spring, China |
| Korarchaeota archaeon isolate B35_G17 | 65.01 | 9.445 | 28.80 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| Bathyarchaeota archaeon B23 | 61.78 | 8.418 | 26.12 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| Bathyarchaeota archaeon isolate M10_bin139 | 58.42 | 7.858 | 24.55 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| Thermococcus celer JCM 8558 | 57.21 | 7.534 | 24.52 | Solfataric marine water hole on a beach of Vulcano, Italy |
| Methanosaeta harundinacea isolate UBA152 | 62.01 | 7.518 | 23.12 | Wastewater, Suncor tailings pond 6, Canada |
| Bathyarchaeota archaeon isolate B23_G15 | 57.67 | 7.397 | 22.90 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| Thermocladium modestius JCM 10088 | 53.14 | 7.381 | 25.59 | Mud from a spring pool, Noji-onsen, Fukushima, Japan |
| Methanoculleus chikugoensis JCM 10825 | 62.36 | 7.198 | 22.90 | Paddy field soil, Chikugo, Fukuoka, Japan |
| Methanosaeta harundinacea isolate UBA281 | 61.14 | 7.089 | 21.80 | Wastewater, North Alberta, Canada |
| Geothermarchaeota archaeon ex4572_27 | 60.54 | 7.032 | 22.01 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| Thermoplasmata archaeon isolate CSSed11_322R1 | 61.82 | 7.028 | 22.57 | Hypersaline soda lake sediment, Kulunda Steppe, Russia |
| Methanosarcinales archaeon Methan_02 | 60.8 | 6.738 | 20.67 | Anaerobic digester metagenome, Australia |
| Methanosaeta harundinacea 6Ac | 60.6 | 6.721 | 20.66 | Upflow anaerobic sludge blanket reactor treating beer-manufacture wastewater, Beijing, China (ref PMID:16403877) |
| Thermoplasmatales archaeon ex4484_36 | 54.25 | 6.673 | 21.15 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| Aeropyrum camini SY1 = JCM 12091 | 56.73 | 6.370 | 19.72 | Deep-sea hydrothermal vent chimney, the Suiyo Seamount in the Izu-Bonin Arc, Japan |
| Bathyarchaeota archaeon isolate B46_G17 | 61.92 | 6.332 | 19.03 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| Thermoplasmata archaeon isolate B14_G15 | 53.83 | 6.327 | 20.11 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| Thermoplasmata archaeon isolate B23_G1 | 53.66 | 6.240 | 19.72 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |
| Pyrobaculum neutrophilum V24Sta | 59.91 | 6.233 | 19.52 | Hot spring, Iceland |
| Thermoplasmata archaeon isolate B23_G9 | 52.98 | 6.164 | 19.65 | Deep-sea hydrothermal vent sediments, Guaymas Basin, Gulf of California, Mexico |

Brázda, V.; Luo, Y.; Bartas, M.; Kaura, P.; Porubiaková, O.; Šťastný, J.; Pečinka, P.; Verga, D.; Da Cunha, V.; Takahashi, T.S.; et al. G-Quadruplexes in the Archaea Domain. Biomolecules 2020, 10, 1349. https://doi.org/10.3390/biom10091349
