1. Introduction
Halobacterium salinarum is an aerobic archaeon found in hypersaline environments, such as salted fish, salt flats, and salterns. These archaea play a crucial role in the ecology and biogeochemical cycles of hypersaline environments and have garnered significant attention for the adaptations that allow them to thrive in high-salt conditions. These adaptations include the production of compatible solutes to maintain osmotic balance and modified membrane proteins [1]. Furthermore, H. salinarum can perform phototrophy using bacteriorhodopsin, a purple light-driven proton pump. Because this archaeon lives under extreme conditions, it possesses robust DNA repair mechanisms that counteract the damaging effects of high salt and UV radiation, ensuring genome stability in harsh environments [2]. We are also particularly interested in H. salinarum as a source of archaeol, one of the main components of our sulfated lactosyl archaeol (SLA) archaeosome adjuvant. SLA archaeosomes are liposomal vesicles composed of a sulfated disaccharide group covalently linked to the free sn-1 hydroxyl group of the glycerol backbone of an archaeal core lipid derived from H. salinarum [3]. This adjuvant has shown great promise in pre-clinical studies, and a better understanding of potential prophage elements within H. salinarum is necessary to support the continued progression of SLA archaeosomes toward clinical applications [4,5,6]. The prophage status of this strain matters because culturing the organism in a good manufacturing practice (GMP) facility requires the absence of active prophages, which eliminates the risk of phage contamination of the facility and the resulting products.
Extremophiles such as H. salinarum thrive in harsh environments and maintain intricate interactions with viruses that target archaea. Haloarchaeophages are viruses that specifically infect halophilic archaea [7]. These viruses exhibit a wide range of morphologies, including tailed and non-tailed phages, with double-stranded DNA genomes being the most common [7]. They possess specialized mechanisms to attach to and inject their genetic material into host cells, leading to the replication and assembly of new virus particles. Haloarchaeophages have demonstrated high host specificity, selectively infecting particular species or strains of halophilic archaea.
The study of haloarchaeophages has provided valuable insights into the biology of halophiles and their viral interactions [7,8]. These viruses influence the population dynamics and diversity of halophiles, serving as agents of selection and contributing to the evolution of their hosts. Moreover, the infection and lysis of halophilic archaea by haloarchaeophages release organic matter and nutrients back into the environment, influencing the biogeochemical cycles of hypersaline habitats.
Prophages are viral genomes integrated into the host genome that remain dormant until triggered to enter the lytic cycle. Studies have identified prophages in halophilic archaea such as Haloferax volcanii, Haloquadratum walsbyi, and Halobacterium halobium, among others [9,10,11]. Prophage elements in halophile genomes indicate that viruses can integrate their genetic material into the host genome and coexist with their hosts for extended periods. Under certain conditions, such as stress or changes in the host cell environment, a prophage may be induced to enter the lytic cycle, producing new virus particles and lysing the host cell. The study of prophages in halophiles provides insights into the genetic diversity and evolution of halophilic archaea. It also contributes to understanding viral–host interactions in extreme environments and the mechanisms underlying viral latency and reactivation.
Haloarchaeophages also hold promise in various biotechnological applications. Their stability in high salt concentrations and extreme conditions makes them valuable tools for genetic engineering, DNA delivery systems, and bioremediation. The unique properties of haloarchaeophages offer opportunities to develop innovative molecular biology and biotechnology approaches.
This study aims to perform a comprehensive genetic analysis of Halobacterium salinarum strain ATCC 33170 to identify any prophage sequences encoded within its genome. Bioinformatic analyses were employed to identify putative prophage sequences within the ATCC 33170 genome, and functional annotation and prediction of prophage-associated genes provide insights into their potential roles in host–virus interactions, such as lysogeny, host manipulation, or environmental adaptation. The findings of this study contribute to our understanding of the genetic landscape of Halobacterium salinarum strains and their viral interactions. Unraveling the prophage sequences and their potential functions will provide valuable insights into the coevolutionary dynamics between halophilic archaea and viruses in highly saline environments and provide data to support the further development of halophile-derived components such as archaeol.
2. Materials and Methods
Genomic data for H. salinarum ATCC 33170 were downloaded from ATCC and NCBI (NZ_JACHGX010000001–NZ_JACHGX010000037) for prophage analysis. The ATCC genomic data consist of 28 contiguous overlapping DNA segments (contigs), the largest measuring 718.6 kilobases (kb). The NCBI genomic data consist of 37 contigs, the largest measuring 642.1 kb. The genomic data were imported into Geneious Prime and used for de novo assembly with Flye version 2.7 [12]. The input data type was set to high-quality contigs, with an estimated genome size of 2.4 megabases (Mb). No sequences were trimmed before assembly.
The Flye contigs were split into 50 kb and 90 kb sections and submitted to BLASTX and BLASTN (discontiguous megablast) using the non-redundant protein database (accessed January 2022). The sections were first searched against the nr database restricted to viruses (taxid: 10239) and then searched again without database restrictions to identify regions with many viral hits spanning more than 5 kb.
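The windowing and region-calling steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used here: windows are taken to be consecutive and non-overlapping (the last one shorter), BLAST hit coordinates are assumed to be 1-based positions within each window, and `max_gap` (1 kb) is a hypothetical tolerance for joining neighboring hits into one region; only the 5 kb minimum span comes from the text above.

```python
from collections.abc import Iterator


def split_into_windows(seq: str, window: int) -> Iterator[tuple[int, str]]:
    """Yield (0-based offset, subsequence) for consecutive, non-overlapping windows.

    The final window is shorter than `window` when len(seq) is not a multiple of it.
    """
    for offset in range(0, len(seq), window):
        yield offset, seq[offset:offset + window]


def to_contig_coords(offset: int, qstart: int, qend: int) -> tuple[int, int]:
    """Map 1-based query coordinates within a window back to 1-based contig coordinates."""
    return offset + min(qstart, qend), offset + max(qstart, qend)


def viral_regions(hits: list[tuple[int, int]], min_span: int = 5000,
                  max_gap: int = 1000) -> list[tuple[int, int]]:
    """Merge hits that overlap or lie within `max_gap` bp of each other (max_gap is a
    hypothetical choice) and keep merged regions spanning more than `min_span` bp."""
    merged: list[list[int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1] + max_gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current region
        else:
            merged.append([start, end])  # start a new region
    return [(s, e) for s, e in merged if e - s + 1 > min_span]


# Toy example with made-up hit positions on a 120 kb contig split into 50 kb windows.
contig = "A" * 120_000
windows = list(split_into_windows(contig, 50_000))
hits = [
    to_contig_coords(50_000, 1_000, 3_000),
    to_contig_coords(50_000, 7_000, 3_500),  # minus-frame hit: qstart > qend
    to_contig_coords(100_000, 200, 900),
]
regions = viral_regions(hits)
print(regions)  # [(51000, 57000)]
```

Repeating the split with `window=90_000` gives the second, coarser tiling; one plausible reason for using two window sizes is that a region cut at a boundary in one tiling is likely to fall inside a single window in the other.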
The contig file from the Flye assembly was also submitted to PHASTER (www.phaster.ca, accessed on 13 January 2022) with the option for a file consisting of multiple separate contigs selected [13]. The downloaded results indicated an incomplete prophage region on contig 14, which was then analyzed in depth in Geneious Prime. This analysis revealed six genes within the incomplete prophage region; these were translated into protein sequences for further analysis with BLASTP, PHYRE2, and InterProScan [14,15,16].
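Geneious Prime performed the translation in this study; the snippet below is only a minimal stand-alone sketch of the same operation. It assumes NCBI translation table 11 (the bacterial, archaeal, and plant plastid code), whose amino acid assignments match the standard code and whose alternative start codons are read as methionine at the first position. The example sequences are made up.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Start codons of NCBI translation table 11; read as Met when they open a CDS.
TABLE11_STARTS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_cds(seq: str, strand: int = 1, to_stop: bool = True) -> str:
    """Translate a CDS on the given strand (+1 or -1); unknown codons become 'X'."""
    cds = seq.upper() if strand == 1 else reverse_complement(seq)
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = "M" if i == 0 and codon in TABLE11_STARTS else CODON_TABLE.get(codon, "X")
        if aa == "*" and to_stop:
            break  # stop translation at the first in-frame stop codon
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCTTAA"))             # MA (plus-strand CDS)
print(translate_cds("TTAAGCCAT", strand=-1))  # MA (same CDS on the minus strand)
print(translate_cds("GTGAAATGA"))             # MK (alternative GTG start)
```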
The six proteins were submitted to BLASTP against the non-redundant protein database and a virus-restricted database, and the top hit for each was recorded. The proteins were also submitted to PHYRE2 to identify the closest protein templates based on predicted structure [14]. The results were downloaded and analyzed for protein function and origin.
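If the BLASTP searches are run locally with BLAST+ and saved in the default tabular format (`-outfmt 6`), recording the top hit per protein reduces to keeping the highest-bitscore row for each query. The sketch below assumes that format's default 12 columns; the protein and subject identifiers in the example are placeholders, not results from this study.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(tsv_text: str) -> dict[str, dict[str, str]]:
    """Return the highest-bitscore hit for each query sequence."""
    best: dict[str, dict[str, str]] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        query = hit["qseqid"]
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit
    return best


example = (
    "gp1\tsubject_A\t41.2\t280\t150\t6\t5\t284\t10\t289\t2e-40\t150\n"
    "gp1\tsubject_B\t55.0\t290\t120\t4\t1\t290\t3\t292\t1e-70\t240\n"
    "gp2\tsubject_C\t33.1\t400\t250\t9\t12\t411\t20\t419\t5e-35\t135\n"
)
best = top_hits(example)
for query, hit in best.items():
    print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])
```

Ranking by bitscore rather than by row order keeps the choice independent of how the file was sorted.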
The InterProScan [15,16] plugin for Geneious Prime was used to analyze the protein sequences against the following member databases: Conserved Domains Database (CDD [17]), Gene3D [18], High-quality Automated and Manual Annotation of Proteins (HAMAP [19]), PANTHER [20], Pfam-A [21], Protein Information Resource SuperFamily (PIRSF [22]), Simple Modular Architecture Research Tool (SMART [23]), SUPERFAMILY [24], and The Institute for Genomic Research collection of manually curated protein families (TIGRFAM [25]). The results were annotated onto the nucleotide sequence of the incomplete prophage region, and the protein families were analyzed for taxonomic associations.
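When InterProScan is run outside Geneious, its tab-separated output lists one match per line, with the protein accession, member database ("analysis"), signature accession, and signature description in columns 1, 4, 5, and 6. A minimal sketch, assuming that column layout, that groups matches by protein and member database; the rows and accessions below are placeholders, not the matches found for gp1 to gp6.

```python
import csv
import io
from collections import defaultdict


def group_matches(tsv_text: str) -> dict[str, dict[str, list[str]]]:
    """Group InterProScan TSV matches as {protein: {member database: [signatures]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 6:
            continue  # skip blank or truncated lines
        protein, analysis, accession, description = row[0], row[3], row[4], row[5]
        label = accession if description in ("", "-") else f"{accession} ({description})"
        grouped[protein][analysis].append(label)
    return {protein: dict(by_db) for protein, by_db in grouped.items()}


example = (
    "gpX\tmd5\t300\tPfam\tSIG0001\texample domain\t10\t120\t1.0E-20\tT\t13-01-2022\n"
    "gpX\tmd5\t300\tGene3D\tSIG0002\t-\t5\t150\t2.0E-30\tT\t13-01-2022\n"
    "gpY\tmd5\t210\tPANTHER\tSIG0003\texample family\t1\t200\t3.0E-50\tT\t13-01-2022\n"
)
result = group_matches(example)
for protein, by_db in result.items():
    print(protein, by_db)
```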
The contig nucleotide sequences were submitted to Prokaryotic Antiviral Defence LOCator (PADLOC) v2.0.0 to identify any antiviral defense systems they encode [26].
4. Discussion
Halophilic euryarchaeotes, a group of extremophilic archaea adapted to high-salt environments, have attracted considerable scientific interest due to their ecological significance and unique adaptations. They play a crucial role in the biogeochemical cycling of hypersaline environments and offer valuable insights into the limits of life on Earth [32,33,34]. Hypersaline waters and salt crystals contain high numbers of haloarchaeal cells and their viruses, representing a globally distributed reservoir of orphan genes and possibly novel virion morphotypes [7,8]. The study of haloarchaeal viruses, known as halophages, provides a deeper understanding of viral–host interactions and reveals potential biotechnological applications arising from their unique features [8].
Over 110 viruses have been described for halophilic archaeal hosts, with lifestyles ranging from lytic to chronic infection [7,9,35]. Prophages or prophage-like elements have been documented in halophilic archaeal genomes, such as those of Haloferax volcanii, Haloquadratum walsbyi, and Halobacterium halobium [9,10,11]. Most of these prophage sequences correspond to pleomorphic viruses, which comprise the second-largest group of halophages, with over 18 documented [9,35]. These viruses establish non-lytic infections, presumably exiting the cell by budding, and have been isolated on hosts from archaeal genera such as Halorubrum, Haloarcula, Halogeometricum, and Natrinema [9,35,36,37,38,39].
One seemingly pervasive type of pleomorphic halophage is Haloarcula virus His2, which shares protein similarity with several prophages in halophile genomes [40,41,42]. For example, Haloquadratum walsbyi was found to encode an incomplete prophage related to His2 and to two pleomorphic haloviruses, HRPV-1 and HHPV-1, which themselves carry a similar block of His2-related homologs [10,35,42]. Furthermore, Halorubrum pleomorphic virus 1 (HRPV-1), isolated from a solar saltern, was found to encode three structural proteins: VP3, VP4, and VP8 [36,43]. The HRPV-1-encoded proteins show significant similarity to the proteins of the minimal replicon of plasmid pHK2 of Haloferax sp. and to those of His2 [36,42,44]. Because halophage sequences are available from a broad array of halophiles, the H. salinarum genome can be screened for prophage sequences by comparison with them.
PHASTER analysis of the 21 Flye contigs identified a 7 kb portion of contig 14 encoding putative prophage elements. The assigned completeness score was low (40), indicating an incomplete prophage region. Further analysis of all contigs using BLASTX and discontiguous megablast did not reveal any other regions encoding potential prophage elements. Although some of the genes appear to be viral in origin, BLASTP against the virus-restricted database returned top hits from several different viral families, which would be highly unusual for an intact prophage.
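PHASTER reports prophage completeness as a score and, as we understand its documentation, bins regions as intact (score above 90), questionable (70 to 90), or incomplete (below 70). A minimal sketch of that binning, under those assumed cut-offs, shows why the contig 14 region with a score of 40 falls in the incomplete class:

```python
def phaster_completeness(score: float) -> str:
    """Bin a PHASTER completeness score (assumed cut-offs: >90, 70-90, <70)."""
    if score > 90:
        return "intact"
    if score >= 70:
        return "questionable"
    return "incomplete"


print(phaster_completeness(40))  # incomplete
```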
PHYRE2 analysis of the six proteins of interest identified structural templates from bacteria, archaea, and a transposon (IS200) with 100% confidence, as described in the Results section. The gp1 protein was modeled on a glucose-1-phosphate uridylyltransferase (UGPase), an enzyme that catalyzes the production of UDP-glucose from glucose-1-phosphate and UTP [45]. This enzyme is widespread owing to its role in glycogen synthesis and in the formation of glycolipids, glycoproteins, and proteoglycans [46,47,48]. Although glycoproteins are found in some halophage capsids, gp1 was not modeled on any phage protein [43]. The top PHYRE2 template for gp2 was UDP-glucose dehydrogenase, which catalyzes the two-step NAD-dependent oxidation of UDP-glucose (UDP-Glc) to UDP-glucuronic acid (UDP-GlcA) [49]. Studies of this enzyme have demonstrated its importance in polysaccharide biosynthesis and detoxification [49]. PHYRE2 likewise did not model gp2 on any phage protein.
The top template for gp3 was the crystal structure of a heterodimer of the Cdc6/Orc1 initiators bound to origin DNA from Sulfolobus solfataricus [50]. Cellular initiators form higher-order assemblies on replication origins, using ATP to remodel duplex DNA and facilitate the loading of replisome components [51]. This protein appears to be a core component of the basal initiation machinery that recognizes the origin of replication in S. solfataricus, suggesting that gp3 is not phage-derived.
Further investigation of gp4 and gp6 showed that their predicted structures resemble the cryo-EM structure of the Cas12f1–sgRNA–target DNA complex [52]. The type V-F Cas12f proteins are compact and associate with a guide RNA to cleave single- and double-stranded DNA targets [53]. A cryo-electron microscopy structure revealed that two Cas12f1 molecules assemble with the single guide RNA to recognize the double-stranded DNA target [52,53]. Each Cas12f1 protomer plays a distinct role in nucleic acid recognition and DNA cleavage, explaining how this miniature enzyme achieves RNA-guided DNA cleavage [54,55]. There was a single hit to the T7 bacteriophage primase-2 helicase, but the modeled region was small in comparison, and the hit ranked only 93rd and 80th for gp4 and gp6, respectively.
The gp5 protein was modeled on the crystal structure of the IS200 transposase of Sulfolobus solfataricus [56]. IS200 transposases, present in many bacteria and archaea, are distinct from other groups of transposases: two monomers form a tight dimer, with the catalytic site formed at the interface between them [56]. A phage hit, corresponding to a DNA replication organizer membrane protein of phage Phi29, was also identified [57]. However, this hit ranked 30th, with 9% confidence and 25% identity over 12 amino acids; thus, gp5 is unlikely to be phage-derived.
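The reasoning applied to the phage hits for gp4, gp5, and gp6 (low rank, low confidence, and a short aligned region) can be made explicit as a simple filter. The thresholds below are hypothetical illustrations, not PHYRE2 criteria or the cut-offs used in this study; only the gp5 values are taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    query: str
    template: str
    rank: int
    confidence: float      # PHYRE2 confidence, in percent
    identity: float        # percent identity over the aligned region
    aligned_residues: int


def supports_origin(hit: TemplateHit, min_confidence: float = 90.0,
                    min_aligned: int = 50, max_rank: int = 10) -> bool:
    """Hypothetical rule: treat a template hit as evidence of shared origin only if
    it is confident, covers a substantial region, and ranks near the top."""
    return (hit.confidence >= min_confidence
            and hit.aligned_residues >= min_aligned
            and hit.rank <= max_rank)


gp5_phage_hit = TemplateHit("gp5", "Phi29 DNA replication organizer", rank=30,
                            confidence=9.0, identity=25.0, aligned_residues=12)
print(supports_origin(gp5_phage_hit))  # False
```

Percent identity is kept in the record but not used by the rule, since identity over a 12-residue stretch says little on its own.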
Examination of the contigs for phage-defense systems with PADLOC revealed six putative genes on five contigs. To defend against viruses, archaea are thought to primarily use Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated genes (CRISPR-Cas) [58,59], toxin–antitoxin (TA) systems, restriction–modification (RM) systems, and alteration of cell surface proteins [60,61]. No CRISPR-Cas or TA systems were identified; only RM, AMP-ligating, and unidentified systems were detected. Overall, H. salinarum carries few phage-defense systems compared with bacteria, which can carry upwards of 15 defense systems [62].
As this strain will be used to produce adjuvants, the mutational behavior of closely related strains should be assessed. To investigate the evolutionary dynamics of H. salinarum, a mutation-accumulation experiment with whole-genome sequencing was conducted, and the results were compared with those for the moderately halophilic archaeon Haloferax volcanii [63]. Mutation accumulation in H. salinarum over 1250 generations revealed a base-substitution rate of 3.99 × 10^−10 per site per generation, comparable to that of H. volcanii [63]. However, differences in genome-wide insertion–deletion rates and mutation spectra suggest distinct evolutionary pathways. Notably, H. salinarum is characterized by a high rate of spontaneous mutations attributed to mobile genetic elements (MGEs), including ISH elements and transposons [64,65]. Studied since the 1980s, these elements are associated with the insertional inactivation of genes as well as genome inversions and rearrangements. For example, differences between the laboratory strains NRC-1 and R1 stem primarily from the dynamic mobilome, exemplifying the impact of MGEs on the evolutionary landscape of H. salinarum [65].
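To put the reported base-substitution rate in perspective, multiplying it by genome size gives the expected number of substitutions per genome per generation. The sketch below assumes a genome size of about 2.4 Mb (the estimate used for the Flye assembly in Methods, not necessarily that of the strain used in [63]) and counts base substitutions only, ignoring insertions, deletions, and MGE activity.

```python
def expected_substitutions(rate_per_site: float, genome_bp: float,
                           generations: int) -> tuple[float, float]:
    """Return (per genome per generation, cumulative per lineage) expected substitutions."""
    per_generation = rate_per_site * genome_bp
    return per_generation, per_generation * generations


per_gen, per_lineage = expected_substitutions(3.99e-10, 2.4e6, 1250)
print(f"{per_gen:.2e} per genome per generation; "
      f"{per_lineage:.2f} per lineage over 1250 generations")
```

Under these assumptions, that is roughly one base substitution per thousand generations, or about one per lineage across the whole experiment.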