Article

Genomic Analysis and Taxonomic Characterization of Seven Bacteriophage Genomes Metagenomic-Assembled from the Dishui Lake

1
College of Food Science and Technology, Shanghai Ocean University, Shanghai 201306, China
2
Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266000, China
3
Laboratory of Quality and Safety Risk Assessment for Aquatic Products on Storage and Preservation, Ministry of Agriculture and Rural Affairs, Shanghai 201306, China
*
Author to whom correspondence should be addressed.
Viruses 2023, 15(10), 2038; https://doi.org/10.3390/v15102038
Submission received: 13 September 2023 / Revised: 27 September 2023 / Accepted: 29 September 2023 / Published: 30 September 2023
(This article belongs to the Section Bacterial Viruses)

Abstract

Viruses in aquatic ecosystems exhibit remarkable abundance and diversity. However, only scattered studies have mined uncultured viruses in lake water and identified them taxonomically. Here, whole genomes (29–173 kbp) of seven uncultured dsDNA bacteriophages were discovered in Dishui Lake, the largest artificial lake in Shanghai. We analyzed their genomic signatures and found a series of viral auxiliary metabolic genes closely associated with protein synthesis and host metabolism. Dishui Lake phages shared more genes with uncultivated environmental viruses than with reference viruses based on the gene-sharing network classification. Phylogeny of proteomes and comparative genomics delineated three new genera within two known viral families of Kyanoviridae and Autographiviridae, and four new families in Caudoviricetes for these seven novel phages. Their potential hosts appeared to be from the dominant bacterial phyla in Dishui Lake. Altogether, our study provides initial insights into the composition and diversity of bacteriophage communities in Dishui Lake, contributing valuable knowledge to the ongoing research on the roles played by viruses in freshwater ecosystems.

1. Introduction

Viruses are by far the most abundant entities in aquatic ecosystems and harbor the majority of their genetic diversity [1]. It is estimated that there are 10^31 virus-like particles (VLPs) on Earth [2], which play a crucial role in natural processes by regulating global biogeochemical cycles and facilitating microbial evolution through horizontal gene transfer [3]. Typically, phages are responsible for the lysis of about 10–50% of marine bacteria [4].
Freshwater systems house an estimated 1.76 × 10^27 VLPs [5]. The scientific community has shown increasing interest in exploring viruses in freshwater habitats, even though the majority of viruses remain uncultured. High-throughput sequencing (HTS) has played an irreplaceable role in enabling us to examine and understand viral community characteristics across various freshwater systems without the need for cultivation or reliance on specialized isolation methods [6,7,8,9]. This has led to the discovery of the potential key interactions between viruses and hosts in deep freshwater lakes with metagenomes [10], the known diversity of eukaryotic and prokaryotic viruses in the Yangtze River [11], seasonal alternations in the viral community structure in an artificial freshwater reservoir [12], and the various lake viromes worldwide [13,14,15]. Countless genomic fragments have been assembled and characterized from viral metagenomic data [16], and the constant development and improvement of computational tools and standards for uncultivated viruses [17,18,19,20,21] now allow for the accurate identification of novel viral groups.
Dishui Lake, situated in Shanghai’s Pudong New Area, is the city’s largest artificial lake. Sourced from the Huangpu River, the lake accepts surface runoff and discharges into the East China Sea, undertaking important functions such as flood control and drainage, and displacing water bodies. The water quality as well as the diversity of bacteria and phytoplankton in Dishui Lake have been investigated and are closely related to human activities [22,23,24,25]. During our metagenomic and metaviromic investigation of eukaryotic microbial viruses in Dishui Lake, diverse large/giant dsDNA viruses and the virophages that parasitize them were discovered, revealing a new tripartite cell–virus–virophage (C-V-v) infection system that consists of green algae, large green algal virus, and virophage [26,27,28,29]. To date, the bacteriophage community in the lake has remained largely unexplored, and its characteristics await further research.
In this study, we obtained complete genomes of seven novel phages from Dishui Lake based on metagenomics. A variety of approaches were used to analyze their distinctive genomic features, affiliations, and potential hosts. Three of them were identified as new members of Autographiviridae and Kyanoviridae, and each of the other four was classified as a new family in Caudoviricetes. Collectively, unique bacteriophages were discovered in Dishui Lake for the first time, which provides fresh perspectives on the variety and distribution of phages in offshore freshwater ecosystems and contributes to the isolation and cultivation of specific phages from Dishui Lake.

2. Materials and Methods

2.1. Metagenomic Data Source

The Dishui Lake metagenomic datasets have been previously reported in our work [26,27]. Briefly, surface water samples at a depth of 1.5 m were collected from Dishui Lake, Shanghai (121°55′27″ E, 30°53′55″ N) and filtered through 0.22 µm (pore size) membrane (GSWP, Merck Millipore). The total DNA of microbial biomasses was extracted using QIAamp Fast DNA Stool Mini Kit (QIAgen).
High-throughput sequencing was performed on the Illumina MiSeq sequencing platform at Shanghai Personalbio Technology Co., Ltd. (Shanghai, China). The MiSeq sequencing library was constructed with insert sizes of approximately 400 bp (the Nextera DNA Flex Library Prep Kit, Illumina), and paired-end sequencing of 2 × 251 bp was carried out. A mate pair (MP) library of approximately 3 kb in length was also prepared and sequenced to validate the correct assembly of sequences. Sequencing adapters were removed and low-quality (<Q20) reads were trimmed using FastQC and NGS QC Toolkit [30]. The high-quality reads ranging in length from 50 to 250 bp were retained and then assembled into contigs varying from 250 bp to 50 kbp [27].

2.2. Viral Genome Assembly

Contigs containing conserved phage genes were used as references for assembly with the clean sequence dataset (minimum overlap ≥ 25 bp, minimum overlap identity ≥ 95%) and the contigs dataset (minimum overlap ≥ 25 bp, minimum overlap identity ≥ 97%). The completeness and precision of the assembled viral genomes were double-checked by MP library mapping and PCR validation. PCR primers were designed based on the assembled genomic sequences, and DNA templates for PCR were the same DNA samples as used for metagenomic sequencing. PCR amplicons were purified and sequenced. The sequences were aligned with the assembled sequence scaffolds. Geneious R10.0.9 (https://www.geneious.com, accessed on 12 May 2018) and the MP library were used to assemble the sequences and check their topology, respectively. Viral contigs (>10 kb) were screened by Cenote-Taker2 [31] to assess Dishui Lake viral diversity.

2.3. Genome Characterization

Open reading frames (ORFs) of seven Dishui Lake phage genomes were predicted with Prodigal V2.6.3 in meta mode [32], and their potential functions were annotated through BLASTP [33] search against the NCBI non-redundant (nr) database (http://blast.ncbi.nlm.nih.gov/ accessed on 16 June 2022) and the Batch CD-Search tool of the NCBI Conserved Domains Database (CDD) [34]. HHpred search [35] (https://toolkit.tuebingen.mpg.de/hhpred accessed on 16 June 2022) was conducted against Pfam-A_v35, PDB_mmCIF70_31_Jul, UniProt-SwissProt-viral70_3_Nov_2021, and COG_KOG_v1.0 databases. An HHpred hit was accepted only if its probability was high (>95%). Auxiliary metabolic genes (AMGs) were predicted using VIBRANT v1.2.1 (Virus Identification By iteRative ANnoTation) with default settings [36]. The tRNA genes were identified using the tRNAscan-SE search program [37]. BACPHLIP was employed to determine the phage lifestyle (lysogenic or lytic) based on its genome [38]. Phage genome maps were visualized using CGView [39].

2.4. Retrieval of Viruses Related to Dishui Lake Phages in Environmental Datasets

To determine the affiliations of the Dishui Lake phages with other uncultivated viral genomes (UViGs), the ORFs of seven Dishui Lake phages were queried against the IMG/VR4 database (released on 20 September 2022) [40]. A virus was considered a relative [41,42] if (i) it shared at least 50% of proteins (E < 1 × 10^−5, identity ≥ 30%, coverage ≥ 50%, and bitscore ≥ 50) with the query genome, and (ii) its genome size was neither shorter than 50% nor longer than 150% of that of the respective Dishui Lake phage.
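The two criteria above can be sketched as a small filter. This is only an illustrative sketch, not the authors' pipeline: the `ProteinHit` record, the function names, and the choice of the query phage's protein count as the denominator for the 50% rule are assumptions made here, since the text does not specify them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProteinHit:
    """Best hit of one query protein against a candidate genome (hypothetical record)."""
    evalue: float
    identity: float   # percent identity
    coverage: float   # percent alignment coverage
    bitscore: float


def hit_passes(hit: ProteinHit) -> bool:
    # Per-protein thresholds stated in Section 2.4.
    return (hit.evalue < 1e-5 and hit.identity >= 30.0
            and hit.coverage >= 50.0 and hit.bitscore >= 50.0)


def is_relative(n_query_proteins: int, query_len: int, candidate_len: int,
                best_hits: Sequence[Optional[ProteinHit]]) -> bool:
    """Apply criteria (i) and (ii); best_hits holds one entry (or None) per query protein."""
    shared = sum(1 for h in best_hits if h is not None and hit_passes(h))
    # Assumption: the 50% is measured against the query phage's proteins.
    shares_enough = shared >= 0.5 * n_query_proteins
    size_ok = 0.5 * query_len <= candidate_len <= 1.5 * query_len
    return shares_enough and size_ok
```

For example, a candidate matching 5 of 10 query proteins with passing hits and having 60% of the query genome length would be retained, whereas the same candidate at 40% of the query length would not.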

2.5. Protein-Cluster-Based Gene-Sharing Network

Seven Dishui Lake phage genomes and the selected environmental UViGs were combined with the vConTACT v2.0 database [43] which was set to ProkaryoticViralRefSeq211-Merged, containing reference sequences of prokaryotic viruses (released on 25 April 2022), with standard parameters (options: --rel-mode Diamond --db ProkaryoticViralRefSeq211-Merged --pcs-mode MCL --vcs-mode ClusterONE --threads 32). Relevant genomes that were clustered with the Dishui Lake phages (similarity score ≥ 1) were chosen for follow-up analysis. The visualization of gene-sharing networks was performed by Cytoscape v3.9.1 (http://cytoscape.org/ accessed on 10 October 2022) [44].

2.6. Proteomic Trees

Proteomic trees were generated using the Viral Proteomic Tree server (ViPTree) [45], with the whole genomes of phages obtained in this study, known ICTV-ratified families, and the reference sequences of Virus–Host DB [46] to describe their family-level taxonomic status and relationship. The trees were edited with the Interactive Tree of Life (iTOL) online tool [47] (https://itol.embl.de/upload.cgi accessed on 20 April 2023).

2.7. Comparative Genomic Analysis

The proportion of orthologous components (ratio of orthologous gene numbers) between viral genomes was calculated using CompareM (https://github.com/dparks1134/CompareM; --evalue 1e-5 --per_identity 30) (accessed on 18 March 2023). The outcome was transformed into a matrix with the tidyr R package, and heatmaps were generated using the pheatmap R package. Dishui Lake phages and the related viruses shown in the proteomic trees were clustered by using VIRIDIC [48] according to the nucleotide-based intergenomic similarity thresholds of 95% for species and 70% for genus. BLASTN parameters were set as: ‘-word_size 7 -reward 2 -penalty -3 -gapopen 5 -gapextend 2’. Global alignment and comparison between genomes in each homologous gene cluster and the gene order were interactively visualized with Clinker [49] (https://github.com/gamcil/clinker accessed on 25 March 2023) with default settings.
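As a minimal sketch of how the species and genus thresholds translate into a decision for a pair of genomes, the helper below maps an intergenomic similarity value to a shared rank. The function name is hypothetical, and treating the cutoffs as inclusive (≥) is an assumption; the text gives only the values 95% and 70%.

```python
def shared_rank(similarity_pct: float,
                species_cutoff: float = 95.0,
                genus_cutoff: float = 70.0) -> str:
    """Return the lowest rank two genomes share, given their intergenomic similarity (%)."""
    if similarity_pct >= species_cutoff:
        return "same species"
    if similarity_pct >= genus_cutoff:
        return "same genus"
    return "different genera"
```

Applied to the 30.4% similarity reported between DSL-LC02 and DSL-LC03 in Section 3.5, `shared_rank(30.4)` falls below the genus cutoff, which is in line with placing the two phages in separate genera.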

2.8. Phylogenetic Analysis

Phylogenetic analyses were conducted by using the protein sequences of the phage signature genes, terminase large subunit (TerL) and DNA-dependent RNA polymerase of Autographiviridae, identified in Dishui Lake phages and their homologous counterparts detected in other viruses in the NCBI GenBank database (BLASTP; E-value < 1 × 10^−5; identity > 30%; and alignment coverage > 50%). Protein sequences were aligned with MUSCLE in the MEGA X software [50] under the default settings, and alignments containing more than 50% gaps were excluded. Neighbor-joining trees were reconstructed with 1000 bootstrap replicates and visualized using iTOL. Fourteen conserved core genes in the family Kyanoviridae as well as DSL-LC02 and DSL-LC03 were concatenated, aligned by using MAFFT v1.5 [51], and trimmed with TrimAL v1.4 [52]. Phylogenetic analysis was carried out with FastTree v2.1 [53] under the Whelan and Goldman (WAG) substitution model.

2.9. Phage Host Prediction

Host prediction was performed using iPHoP [54], HostG [55], and the CrisprOpenDB tool [56]. Meanwhile, the Dishui Lake phage tRNA genes were used as queries to search the NCBI nr database via BLASTn for identifying potential hosts, and top hits (at least 95% coverage and 95% identity) were deemed authentic matches [57].
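The tRNA matching rule can be illustrated with a short filter over BLASTn results. This is a sketch under stated assumptions: the dictionary keys (`subject`, `coverage`, `identity`) are hypothetical names for parsed hit fields, not the output format of any particular tool, and the thresholds are taken as inclusive because the text says "at least".

```python
from typing import Dict, List


def authentic_trna_hits(hits: List[Dict], min_coverage: float = 95.0,
                        min_identity: float = 95.0) -> List[Dict]:
    """Keep only hits with at least 95% coverage and 95% identity."""
    return [h for h in hits
            if h["coverage"] >= min_coverage and h["identity"] >= min_identity]


# Illustrative records only; these are not results from the study.
example_hits = [
    {"subject": "hit_A", "coverage": 100.0, "identity": 97.3},
    {"subject": "hit_B", "coverage": 88.0, "identity": 99.0},
    {"subject": "hit_C", "coverage": 96.0, "identity": 91.5},
]
kept = authentic_trna_hits(example_hits)
```

Here only `hit_A` survives, because the other two records fail one of the two thresholds.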

3. Results

3.1. Viral Community Composition in Dishui Lake

In total, 415 viral contigs (>10 kb) were primarily identified in the Dishui Lake metagenomic dataset by Cenote-Taker2 (Table S1), and they were assigned to Kyanoviridae (14.0%), Autographiviridae (2.2%), Inoviridae (5.8%), and other virus categories (3.1%). The remaining contigs were grouped into unclassified phages (69.9%) or viruses (5.0%). Overall, Kyanoviridae was the most abundant identifiable virus in Dishui Lake.

3.2. Genomic Features of Dishui Lake Phages

According to the metagenomic sequence assembly, reference assembly (contig extending), and PCR-based genomic sequence checking, complete genomes of seven phages were discovered in the Dishui Lake metagenome, and their genome size ranged between 29 kbp and 173 kbp (Table 1). For ease of reference, they were named DSL-LC01 (OQ999401), DSL-LC02 (OR032571), DSL-LC03 (OR003938), DSL-LC04 (OR003939), DSL-LC05 (OR003940), DSL-LC06 (OR003941), and DSL-LC07 (OQ999402), respectively.
The combination of the nr database comparison, Batch CD-Search for conserved structural domains, and the HHpred search for remote relatives was applied for accurate annotation of viral ORFs (Table S2). A total of 44.8% (377/841) of the ORFs were functionally annotated, including genes encoding replication proteins present in all genomes, such as DNA primase, helicase, and polymerase (Figure 1 and Figure 2). The predicted structural proteins in Dishui Lake phages were the major capsid protein (MCP), minor capsid protein (mCP), portal proteins, baseplate hub proteins, the tail tube, and tape measure proteins (Figure 1 and Figure 2). The HK97-fold MCPs are the vital proteins in Duplodnaviria [58]. The remaining 464 ORFs (55.2%) were predicted to be hypothetical proteins or unknown ORFs (Figure 1 and Figure 2). Additionally, a total of 19 AMGs were detected in the Dishui Lake phage genomes by VIBRANT and are associated with seven distinct metabolic categories (Figure 1, Figure 2 and Figure S1). The top three are glycan biosynthesis and metabolism, followed by energy metabolism and carbohydrate metabolism. DSL-LC01 appeared to be a lysogenic phage, while the others were lytic ones (Table 1).
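The annotation percentages quoted above follow directly from the ORF counts, as the short check below shows. The helper name is hypothetical; the counts 841 and 377 are those given in the text.

```python
def percent(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


total_orfs = 841                               # all predicted ORFs in the seven genomes
annotated_orfs = 377                           # ORFs with a functional annotation
unannotated_orfs = total_orfs - annotated_orfs # 464 remaining ORFs

annotated_pct = percent(annotated_orfs, total_orfs)      # 44.8
unannotated_pct = percent(unannotated_orfs, total_orfs)  # 55.2
```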

3.3. Protein-Cluster-Based Gene-Sharing Network

A total of 238 relevant viral sequences of Dishui Lake phages were retrieved from the IMG/VR4 dataset (Table S3). These sequences, along with the viral sequences in the NCBI Bacterial and Archaeal Viral RefSeq V211 with ICTV and NCBI taxonomy (released on 25 April 2022), were subjected to gene-sharing network analysis with the seven Dishui Lake phages by using vConTACT2 (a total of 4723 input sequences). Notably, ICTV has updated the classification of viruses, eliminating the former order Caudovirales and the families Myoviridae, Siphoviridae, and Podoviridae [21]. Accordingly, the following results were interpreted based on the updated classification information in NCBI and ICTV taxonomy.
Six Dishui Lake phages formed five VCs of DSL-LC01 (VC_339_2), DSL-LC02 (VC_49_31), DSL-LC03 (VC_49_32), DSL-LC05 (VC_341_0), DSL-LC06 (VC_342_0), and DSL-LC07 (VC_334_0) (Figure 3 and Table S4). All Dishui Lake phages shared more genes with IMG/VR uncultivated environmental viruses compared with the viruses in the NCBI RefSeq. DSL-LC04 was classified as an outlier to VC_340_0 comprising the UViGs solely. DSL-LC06, together with 50 UViGs, formed VC_342_0, which represents an orphaned group. The grouping of DSL-LC05 resembled that of DSL-LC04 and DSL-LC06 but additionally connected to unclassified Caudoviricetes viruses of Vibrio phage VvAW1 (NC_020488.1), Rhodoferax phage P26218 (NC_029061.1), and Thalassomonas phage BA3 (NC_009990.1). DSL-LC01 was in VC_339_2 and connected to the outlier Gordonia phage GodonK (NC_048176.1), a member of the floating genus Godonkavirus in Caudoviricetes. The VC_334_0 where DSL-LC07 was clustered contained 84 viral sequences, 75 of which were UViGs, with 70 and eight classified to Autographiviridae in IMG/VR and RefSeq, respectively. This indicates that DSL-LC07 is most likely a member of this viral family. DSL-LC02 and DSL-LC03 were grouped to the same VC. Most of the viruses that were related to them affiliated with the newly established family Kyanoviridae, and others belonged to floating genera and unclassified viruses in the class Caudoviricetes. In addition to environmental viruses, Synechococcus phage S-CAM9 (NC_031922.1) shared the most genes with DSL-LC02 and DSL-LC03 (Figure 1).

3.4. Proteomic Trees

Based on the above classification results, the exemplar viruses from Kyanoviridae (57 sequences) and Autographiviridae (373 sequences) in the ICTV official Master Species List (MSL) were used as reference sequences to determine whether DSL-LC02, DSL-LC03, and DSL-LC07 were members in these families.
The proteomic trees were reconstructed by using ViPTree based on the whole genomic sequences of DSL-LC02, DSL-LC03, and the Kyanoviridae phages. As shown in Figure 4a, DSL-LC02 and DSL-LC03 were grouped together, closely related to two known cyanophages (Synechococcus virus S-PRM1 of Makelovirus and S-CAM9 of Kanaloavirus), and were embedded in Kyanoviridae. The branching scale between the members of Neptunevirus was minimal (around 0.27), while DSL-LC02 and DSL-LC03 had a branching value of about 0.1. Accordingly, the ViPTree grouping indirectly indicates that each of these two DSL phages likely represents a new genus of the family.
DSL-LC07 was clustered with two Synechococcus phages (S-CBP3 of Lirvirus and S-CBP4 of Poseidonvirus) (Figure 4b). The branching scales between members of the same genus in the Autographiviridae showed that Uliginvirus had the smallest branching scale of about 0.3. By contrast, the branching scale between DSL-LC07 and known viruses was about 0.2, which suggests that DSL-LC07 could be classified at the taxonomic rank of a new genus in this family.
Similar to the above results of the gene-sharing network (Figure 3), Dishui Lake phages of DSL-LC01, DSL-LC04, DSL-LC05, and DSL-LC06 were not closely related to known virus families but to UViGs (Figure 5 and Figure S2). Collectively, the results suggest that these four Dishui Lake phages could be assigned to new families along with closely related UViGs (if complete genome available) of Caudoviricetes.

3.5. Orthologous Fraction and Nucleotide Identity of Viral Genomes

To further evaluate their classification on either the family or genus level, the orthologous fraction (OF) was calculated for DSL-LC02, DSL-LC03, and DSL-LC07 with CompareM. Notably, Dishui Lake phages of DSL-LC01, DSL-LC04, DSL-LC05, and DSL-LC06 were excluded from such analysis because of the absence of known families associated with them. As a result, DSL-LC02 and DSL-LC03 shared 20–50% of OFs with the representative members in Kyanoviridae (Figure S3), and DSL-LC07 shared 18–60% of OFs with Autographiviridae viruses. The OFs shared between these Dishui Lake phages and reference viruses are within the range of OFs shared among members of each of these two families (Figure S3). Moreover, the OFs shared between viruses of the same genus were 70–90%. The taxonomical placements of DSL-LC02, DSL-LC03, and DSL-LC07 based on the OFs are in agreement with the above proteomic tree.
Meanwhile, the assignment of DSL-LC02, DSL-LC03, and DSL-LC07 was also evaluated based on nucleotide identity across the genome (genus > 70%; species > 95%) [21]. As shown in Figure 6, Dishui Lake phages of DSL-LC02 (<21.6%), DSL-LC03 (<23.4%), and DSL-LC07 (<37%) had less nucleotide similarity to viruses in the other genera. The similarity between DSL-LC02 and DSL-LC03 was 30.4%, which is higher in comparison to that of the others in Kyanoviridae but less than 70%. Clearly, these results confirm that each of these three Dishui Lake phages could be classified to a new genus.
DSL-LC02, DSL-LC03, and DSL-LC07 also shared similarity of the colinear distribution of genes involving the capsid structure, genome package, and DNA replication and transcription with their closest relatives of the same family (Figure 1).

3.6. Phylogenetic Analyses

Viral hallmark genes refer to relatively conserved genes within a specific group of viruses [59]. The family Kyanoviridae contains at least 14 conserved genes of T4-like phages (https://ictv.global/files/proposals/approved) (accessed on 18 April 2023), which were identified in both DSL-LC02 and DSL-LC03 as well (Table S5). These two Dishui Lake phages were clustered together on the tree reconstructed by using the core-gene set and were closely related to the genera of Makelovirus and Kanaloavirus (Figure 7), as in the proteomic tree (Figure 4a). For the Autographiviridae, all the members encode a characteristic DNA-dependent RNA polymerase (RNAP) [60], which was also present in DSL-LC07. The RNAP tree showed that DSL-LC07 was grouped together with Synechococcus phages S-CBP1, S-CBP3, and S-CBP4 (Lirvirus and Poseidonvirus) (Figure 8), which is similar to its placement in the proteomic tree (Figure 4b).
The Caudoviricetes class consists of dsDNA bacteriophages also featuring a conserved gene known as TerL [61]. To further assess the potential relatives of four Dishui Lake phages (DSL-LC01, DSL-LC04, DSL-LC05, and DSL-LC06) without association with known viral families, phylogenetic analyses were carried out using the amino acid sequences of annotated TerL. As shown in Figure 9a, DSL-LC01 TerL was most closely related to that of the Microbacterium phages, forming a branch distinct from other ones. The neighboring branch is composed of Gordonia phages, with their hosts all belonging to the order Mycobacteriales of Actinomycetes. TerL of DSL-LC04 and DSL-LC06 were grouped with that of Pelagibacter phages that are ubiquitous in marine environments and infect Pseudomonadota [62] (Figure 9b,d). DSL-LC05 was clustered with Ralstonia and Liberibacter phages, with robust bootstrap value support, whose hosts are affiliated with Pseudomonadota from freshwater environments (Figure 9c).

3.7. Predicted Viral Hosts

The potential hosts of the seven Dishui Lake phages were predicted based on the consistent results of HostG (confidence score > 90) and iPHoP (Table 2). DSL-LC01 seemed to infect the members of Actinomycetia. DSL-LC02, DSL-LC03, and DSL-LC07 are likely the viruses of Cyanobiaceae, and two of them could be annotated to the level of the host genus. DSL-LC04 and DSL-LC05 may infect members of Pseudomonas and Enterobacterales, respectively. DSL-LC06 appeared to infect members of Pseudomonadota. The tRNA genes were identified in DSL-LC01, 02, and 03 (Table 1). The Ala-tRNA (TGC) gene of DSL-LC03 matched that of Cyanobacteriota (Table S6), which is consistent with the HostG and iPHoP results. The other tRNA genes could not be linked to potential hosts. No hits were found based on CRISPR spacer-protospacer searches.

4. Discussion

Studies of freshwater viruses to date have shown that most of the assigned contigs belong to head-tail bacteriophages of the Caudoviricetes [63], but our understanding of freshwater viral diversity and function is far from complete. Assembling complete viral genomes can significantly improve the quality of subsequent systems-wide functional profiling studies, reveal all of the genetic components of a virus, and clarify how it operates, thereby enabling the discovery of new biological insights [64]. In this study, the majority of viral contigs in the Dishui Lake dataset were assigned to unknown virus categories which may play important roles in the lake, and some of them were assigned to the Kyanoviridae, containing a large number of cyanoviruses infecting cyanobacteria that frequently dominate in freshwater systems [65]. Importantly, complete genomes of seven Dishui Lake bacteriophages were assembled, analyzed, and taxonomically classified.
Genes that are not involved in viral replication but are associated with non-essential viral functions are referred to as AMGs [66]. Phages acquire them from their hosts, and their expression strongly influences microbial metabolism and diversity and enhances virus fitness [67]. DSL-LC01 contains four AMGs (Figure 2). ORF7 encodes D-alanyl-D-alanine carboxypeptidase, which catalyzes distinct carboxypeptidation and transpeptidation reactions during the last stages of cell wall peptidoglycan synthesis, thereby regulating host external structure [68]. ORF125 encodes UDP-galactose mutase (UGM). UGM is an enzyme involved in galactofuranose metabolism and is a critical component in catalyzing the formation of the bacterial cell walls [69]. Thus, DSL-LC01 may inhibit the morphological growth of a host. ORF132 appears to function as a DNA methyltransferase that serves to methylate the DSL-LC01 genome, allowing it to evade the host’s restriction enzymes as part of the immune response, and it plays an essential role in gene regulation, potentially relating to virulence [70]. The ORF137 protein is homologous to tryptophan 7-halogenase, a halogenating enzyme, which has been recognized as a key factor in incorporating chloride and bromide into activated organic molecules during the biosynthesis of various secondary metabolites [71]. In addition to the AMGs, DSL-LC01 also possesses the transcriptional regulator WhiB proteins (ORF69, ORF135), which are exclusively found in actinobacteria and actinobacteriophages and help phages to repress the expression of the host whiB2 gene, thus causing morphological changes and growth inhibition of the host [72]. Notably, over 70% of genes in actinobacteriophages have unknown functions [73], and this is true for DSL-LC01 as well. Collectively, these results suggest that DSL-LC01 is highly likely an actinobacteriophage, encompassing enormous genetic diversity.
Photosynthesis-associated proteins of PsbA and PsbD are only present in the genomes of DSL-LC02, DSL-LC03, and DSL-LC07 (Figure 1). The psbA gene is essential: it is expressed upon phage infection and helps to maintain host photosynthesis during infection, which increases phage fitness [74]. Some glycosyltransferases encoded by lytic phages (GTFs, PF03414.13) (DSL-LC02 ORF132; DSL-LC03 ORF2) glycosylate the phage DNA to protect it from the host endonuclease restriction system [75]. GDP-L-fucose (DSL-LC02 ORF140; DSL-LC03 ORF209) and GDP-D-mannose 4,6-dehydratase (GMD) (DSL-LC02 ORF142; DSL-LC03 ORF208) participate in the initial step of de novo biosynthesis of GDP-L-fucose which plays a vital role in the bacterial metabolic pathway [76]. They are also present in the cyanophage P-SSM2 infecting Prochlorococcus, contributing to its life cycles [77]. Taken together, DSL-LC02, DSL-LC03, and DSL-LC07 are highly likely to be cyanophages, and the presence of these enzymes/proteins constitutes one of the main characteristics of these phages.
DSL-LC04, DSL-LC05, and DSL-LC06 contain part of the specific functional proteins that have an impact on the host (Figure 2). The 2OG-Fe(II) oxygenase (DSL-LC04 ORF78) plays a pivotal role in a multitude of biological processes, catalyzing the hydroxylation of molecules in microorganisms to regulate lipid metabolism, DNA repair, protein modification, and secondary metabolite production [78]. During phage infection, it regulates host nitrogen metabolism and energy production by modulating intracellular levels of 2-oxoglutarate [79], suggesting the importance and diverse functions of its ability to reprogram host metabolism during infection [80]. Holin (DSL-LC04 ORF72; DSL-LC05 ORF16) generates non-specific “holes” in the host membrane as well as modulates the length of the infective cycle and the release of viable phage particles in the lysis pathway [81]. Glucosamine 6-phosphate N-acetyltransferase (GNA1) (DSL-LC06 ORF40) may be part of a complex glycosylation system, including enzymes involved in nucleotide sugar formation, glycosyltransferases, and glycosidases, independent of their host [82]. Accordingly, the DSL phages may incorporate essential cellular genes from the host to optimize the expression of their own genes while infecting the same hosts [83].
The tRNA genes were detected in three of these seven phage genomes (Table 1). It is conjectured that the main difference between bacteriophages with and without tRNA genes lies in the length of their genomes, as bacteriophages containing tRNA genes are significantly longer than those without these genes [84], and virulent phages have more tRNA genes than temperate phages, which may be used in the translation process to synthesize essential proteins for them [83]. Some tRNA genes of DSL-LC02 and DSL-LC03 were matched to those of the Synechococcus viruses but not to Synechococcus, even though their hosts are very likely to be the Synechococcus members. Actually, an average similarity of only 70.7% was shared between the tRNA genes of phages and their hosts in the genus rank [83], and only 7.6% of tRNA genes matched to those of the host on the genus level [85]. This likely results from the mutation rate of tRNA genes, which becomes faster in phages than in hosts due to the high growth rate of phages [86]. Clearly, specific hosts can hardly be predicted with certainty through tRNA gene matching alone.
The dominant bacterial phyla in Dishui Lake correspond to typical freshwater bacterial groups, most of which belong to Pseudomonadota, Actinomycetota, Cyanobacteriota, and Bacteroidota [87], consistent with the predicted taxonomic rank of hosts for the Dishui Lake phages (Table 2). Unfortunately, we failed to find the specific cyanobacterium hosts for DSL-LC02, 03, and 07 based on the spacer-protospacer search, even though CRISPR-Cas systems are found in the majority of cyanobacteria except for the marine subclade (Synechococcus and Prochlorococcus) [88]. The occurrence of CRISPR-Cas systems was previously estimated to be in 40% of bacteria and 81% of archaea [89], while recently it has been found that only 10% of bacteria possess such innate immune systems and that many lineages of uncultivated bacteria appear to be almost devoid of CRISPR-Cas systems [90]. Collectively, these findings highlight the challenges that remain in confidently predicting the potential hosts of uncultivated viruses through spacer-protospacer matching analysis.
In light of the results of the taxonomic classification (Table S7), four families and three genera were proposed to accommodate these novel uncultivated head-tailed viruses. DSL-LC01, distantly related to other actinobacteriophages, was assigned to the family ‘Nanparkviridae’ (a truncation of Nanhuizui Park located in the Dishui Lake sightseeing region). DSL-LC04, DSL-LC05, and DSL-LC06, which infect Pseudomonadota members, were classified into the families of ‘Dishuiviridae’ (a truncation of Dishui Lake), ‘Luchaoviridae’ (Dishui Lake, also known as Luchao Lake), and ‘Nanhuiviridae’ (Dishui Lake is located in Nanhui New Town), respectively. Three novel Synechococcus phages of DSL-LC02, DSL-LC03, and DSL-LC07 were classified to the genera of ‘Norislandvirus’, ‘Souislandvirus’, and ‘Wesilandvirus’ (North Island, South Island, and West Island; three scenic spots in Dishui Lake), respectively.

5. Conclusions

Our study represents the first investigation of bacteriophages in Dishui Lake. The metagenomic datasets have led to the discovery of the complete genomes of seven novel phages. These phages display unique genomic characteristics and could be classified into three new genera within two known families (Kyanoviridae and Autographiviridae) as well as four new families within Caudoviricetes. The potential hosts of these phages belong to the three predominant bacterial phyla (Cyanobacteriota, Pseudomonadota, and Actinomycetota) found in Dishui Lake, with three out of the seven Dishui Lake phages infecting Synechococcus of Cyanobacteriota. Our findings suggest that there is a vast diversity of viruses yet to be discovered in the lake environment as these phages are only distantly related to known viruses. Additionally, six out of the seven Dishui Lake phages are lytic, which highlights their crucial roles in regulating the predator–prey relationship between hosts and phages. Our study emphasizes the importance of metagenomics in uncovering uncultivated viruses in freshwater environments.

Supplementary Materials

The following supporting information can be downloaded at: www.mdpi.com/article/10.3390/v15102038/s1, Figure S1: Auxiliary metabolic genes identified in Dishui Lake phages; Figure S2: Proteomic subtree of DSL-LC01, 04, 05, and 06; Figure S3: Orthologous gene proportions of Dishui Lake phages of DSL-LC02, 03, and 07; Table S1: 415 viral contigs (>10 kb) primarily identified in the Dishui Lake metagenomic dataset by Cenote-Taker2; Table S2: ORF annotation of Dishui Lake phages; Table S3: 238 relevant viral sequences of Dishui Lake phages retrieved from the IMG/VR4 dataset; Table S4: Classification of Dishui Lake phages based on vConTACT2; Table S5: Fourteen conserved core genes of Kyanoviridae detected in DSL-LC02 and 03; Table S6: The tRNA genes of DSL-LC01 and 03 matching those of bacteria; Table S7: Genomic characteristics and criteria used for the classification of Dishui Lake phages.

Author Contributions

Conceptualization, H.C., Y.Z. and Y.W.; methodology, H.C., Y.Z., X.L., T.X. and Y.W.; validation, H.C., Y.Z., X.L., Y.N., S.W., Y.Y. and Y.W.; formal analysis, H.C., Y.Z. and Y.W.; investigation, H.C., Y.Z., X.L., Y.N., S.W., Y.Y. and Y.W.; resources, Y.Z., T.X., S.W., Y.Y. and Y.W.; data curation, H.C., Y.Z., Y.Y. and Y.W.; writing—original draft preparation, H.C., Y.Z. and Y.W.; writing—review and editing, H.C., Y.Z., Y.N. and Y.W.; visualization, H.C. and Y.Z.; supervision, Y.W.; project administration, Y.W.; funding acquisition, Y.Y. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China (41376135 and 31570112). The funder had no additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The genomic sequences generated in this study have been deposited in GenBank with the accession numbers of OQ999401 (DSL-LC01), OR032571 (DSL-LC02), OR003938 (DSL-LC03), OR003939 (DSL-LC04), OR003940 (DSL-LC05), OR003941 (DSL-LC06), and OQ999402 (DSL-LC07). Other data generated or analyzed during this study are available from the authors upon request.

Acknowledgments

We thank Cong Li and Yijian Sheng for constructive discussions.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Suttle, C.A. Marine viruses—Major players in the global ecosystem. Nat. Rev. Microbiol. 2007, 5, 801–812.
  2. Kang, I.; Oh, H.M.; Kang, D.; Cho, J.C. Genome of a SAR116 bacteriophage shows the prevalence of this phage type in the oceans. Proc. Natl. Acad. Sci. USA 2013, 110, 12343–12348.
  3. Bruder, K.; Malki, K.; Cooper, A.; Sible, E.; Shapiro, J.W.; Watkins, S.C.; Putonti, C. Freshwater Metaviromics and Bacteriophages: A Current Assessment of the State of the Art in Relation to Bioinformatic Challenges. Evol. Bioinform. Online 2016, 12, 25–33.
  4. Fuhrman, J.A. Marine viruses and their biogeochemical and ecological effects. Nature 1999, 399, 541–548.
  5. Elbehery, A.H.A.; Deng, L. Insights into the global freshwater virome. Front. Microbiol. 2022, 13, 953500.
  6. Aguirre de Carcer, D.; Lopez-Bueno, A.; Pearce, D.A.; Alcami, A. Biodiversity and distribution of polar freshwater DNA viruses. Sci. Adv. 2015, 1, e1400127.
  7. Palermo, C.N.; Fulthorpe, R.R.; Saati, R.; Short, S.M. Metagenomic Analysis of Virus Diversity and Relative Abundance in a Eutrophic Freshwater Harbour. Viruses 2019, 11, 792.
  8. Skvortsov, T.; de Leeuwe, C.; Quinn, J.P.; McGrath, J.W.; Allen, C.C.; McElarney, Y.; Watson, C.; Arkhipova, K.; Lavigne, R.; Kulakov, L.A. Metagenomic Characterisation of the Viral Community of Lough Neagh, the Largest Freshwater Lake in Ireland. PLoS ONE 2016, 11, e0150361.
  9. Garza, D.R.; Dutilh, B.E. From cultured to uncultured genome sequences: Metagenomics and modeling microbial ecosystems. Cell Mol. Life Sci. 2015, 72, 4287–4308.
  10. Okazaki, Y.; Nishimura, Y.; Yoshida, T.; Ogata, H.; Nakano, S.I. Genome-resolved viral and cellular metagenomes revealed potential key virus-host interactions in a deep freshwater lake. Environ. Microbiol. 2019, 21, 4740–4754.
  11. Lu, J.; Yang, S.; Zhang, X.; Tang, X.; Zhang, J.; Wang, X.; Wang, H.; Shen, Q.; Zhang, W. Metagenomic analysis of viral community in the Yangtze River expands known eukaryotic and prokaryotic virus diversity in freshwater. Virol. Sin. 2022, 37, 60–69.
  12. Moon, K.; Kim, S.; Kang, I.; Cho, J.C. Viral metagenomes of Lake Soyang, the largest freshwater lake in South Korea. Sci. Data 2020, 7, 349.
  13. Prado, T.; Brandao, M.L.; Fumian, T.M.; Freitas, L.; Chame, M.; Leomil, L.; Magalhaes, M.G.P.; Degrave, W.M.S.; Leite, J.P.G.; Miagostovich, M.P. Virome analysis in lakes of the South Shetland Islands, Antarctica—2020. Sci. Total Environ. 2022, 852, 158537.
  14. Du, K.; Yang, F.; Zhang, J.T.; Yu, R.C.; Deng, Z.; Li, W.F.; Chen, Y.; Li, Q.; Zhou, C.Z. Comparative genomic analysis of five freshwater cyanophages and reference-guided metagenomic data mining. Microbiome 2022, 10, 128.
  15. Che, R.; Bai, M.; Xiao, W.; Zhang, S.; Wang, Y.; Cui, X. Nutrient levels and prokaryotes affect viral communities in plateau lakes. Sci. Total Environ. 2022, 839, 156033.
  16. Simmonds, P.; Adams, M.J.; Benko, M.; Breitbart, M.; Brister, J.R.; Carstens, E.B.; Davison, A.J.; Delwart, E.; Gorbalenya, A.E.; Harrach, B.; et al. Consensus statement: Virus taxonomy in the age of metagenomics. Nat. Rev. Microbiol. 2017, 15, 161–168.
  17. Nooij, S.; Schmitz, D.; Vennema, H.; Kroneman, A.; Koopmans, M.P.G. Overview of Virus Metagenomic Classification Methods and Their Biological Applications. Front. Microbiol. 2018, 9, 749.
  18. Cantalupo, P.G.; Pipas, J.M. Detecting viral sequences in NGS data. Curr. Opin. Virol. 2019, 39, 41–48.
  19. Andrade-Martinez, J.S.; Camelo Valera, L.C.; Chica Cardenas, L.A.; Forero-Junco, L.; Lopez-Leal, G.; Moreno-Gallego, J.L.; Rangel-Pineros, G.; Reyes, A. Computational Tools for the Analysis of Uncultivated Phage Genomes. Microbiol. Mol. Biol. Rev. 2022, 86, e0000421.
  20. Chibani, C.M.; Farr, A.; Klama, S.; Dietrich, S.; Liesegang, H. Classifying the Unclassified: A Phage Classification Method. Viruses 2019, 11, 195.
  21. Turner, D.; Kropinski, A.M.; Adriaenssens, E.M. A Roadmap for Genome-Based Phage Taxonomy. Viruses 2021, 13, 506.
  22. Zhang, Q.; Ding, C.; Achal, V.; Shan, D.; Zhou, Y.; Xu, Y.; Xiang, W.N. Potential for nutrient removal by integrated remediation methods in a eutrophicated artificial lake—A case study in Dishui Lake, Lingang New City, China. Water Sci. Technol. 2014, 70, 2031–2039.
  23. Liu, Y.; Tan, W.; Wu, X.; Wu, Z.; Yu, G.; Li, R. First report of microcystin production in Microcystis smithii Komarek and Anagnostidis (Cyanobacteria) from a water bloom in Eastern China. J. Environ. Sci. 2011, 23, 102–107.
  24. Zhao, K.; Cao, Y.; Pang, W.T.; Wang, L.Z.; Song, K.; You, Q.M.; Wang, Q.X. Long-term plankton community dynamics and influencing factors in a man-made shallow lake, Lake Dishui, China. Aquat. Sci. 2020, 83, 210.
  25. Zhu, W.J.; Pan, Y.D.; Tao, J.J.; Li, X.B.; Xu, X.L.; Wang, Y.F.; Wang, Q.X. Phytoplankton community and succession in a newly man-made shallow lake, Shanghai, China. Aquat. Ecol. 2013, 47, 137–147.
  26. Chen, H.; Zhang, W.; Li, X.; Pan, Y.; Yan, S.; Wang, Y. The genome of a prasinoviruses-related freshwater virus reveals unusual diversity of phycodnaviruses. BMC Genom. 2018, 19, 49.
  27. Gong, C.; Zhang, W.; Zhou, X.; Wang, H.; Sun, G.; Xiao, J.; Pan, Y.; Yan, S.; Wang, Y. Novel Virophages Discovered in a Freshwater Lake in China. Front. Microbiol. 2016, 7, 5.
  28. Sheng, Y.; Wu, Z.; Xu, S.; Wang, Y. Isolation and Identification of a Large Green Alga Virus (Chlorella Virus XW01) of Mimiviridae and Its Virophage (Chlorella Virus Virophage SW01) by Using Unicellular Green Algal Cultures. J. Virol. 2022, 96, e0211421.
  29. Xu, S.; Zhou, L.; Liang, X.; Zhou, Y.; Chen, H.; Yan, S.; Wang, Y. Novel Cell-Virus-Virophage Tripartite Infection Systems Discovered in the Freshwater Lake Dishui Lake in Shanghai, China. J. Virol. 2020, 94, 10–1128.
  30. Patel, R.K.; Jain, M. NGS QC Toolkit: A toolkit for quality control of next generation sequencing data. PLoS ONE 2012, 7, e30619.
  31. Tisza, M.J.; Belford, A.K.; Dominguez-Huerta, G.; Bolduc, B.; Buck, C.B. Cenote-Taker 2 democratizes virus discovery and sequence annotation. Virus Evol. 2021, 7, veaa100.
  32. Hyatt, D.; Chen, G.L.; Locascio, P.F.; Land, M.L.; Larimer, F.W.; Hauser, L.J. Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 2010, 11, 119.
  33. Hu, G.; Kurgan, L. Sequence Similarity Searching. Curr. Protoc. Protein Sci. 2019, 95, e71.
  34. Wang, J.; Chitsaz, F.; Derbyshire, M.K.; Gonzales, N.R.; Gwadz, M.; Lu, S.; Marchler, G.H.; Song, J.S.; Thanki, N.; Yamashita, R.A.; et al. The conserved domain database in 2023. Nucleic Acids Res. 2023, 51, D384–D388.
  35. Gabler, F.; Nam, S.Z.; Till, S.; Mirdita, M.; Steinegger, M.; Soding, J.; Lupas, A.N.; Alva, V. Protein Sequence Analysis Using the MPI Bioinformatics Toolkit. Curr. Protoc. Bioinform. 2020, 72, e108.
  36. Kieft, K.; Zhou, Z.C.; Anantharaman, K. VIBRANT: Automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 2020, 8, 90.
  37. Chan, P.P.; Lin, B.Y.; Mak, A.J.; Lowe, T.M. tRNAscan-SE 2.0: Improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021, 49, 9077–9096.
  38. Hockenberry, A.J.; Wilke, C.O. BACPHLIP: Predicting bacteriophage lifestyle from conserved protein domains. PeerJ 2021, 9, e11396.
  39. Grant, J.R.; Stothard, P. The CGView Server: A comparative genomics tool for circular genomes. Nucleic Acids Res. 2008, 36, W181–W184.
  40. Camargo, A.P.; Nayfach, S.; Chen, I.M.A.; Palaniappan, K.; Ratner, A.; Chu, K.; Ritter, S.J.; Reddy, T.B.K.; Mukherjee, S.; Schulz, F.; et al. IMG/VR v4: An expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata. Nucleic Acids Res. 2022, 51, D733–D743.
  41. Bartlau, N.; Wichels, A.; Krohne, G.; Adriaenssens, E.M.; Heins, A.; Fuchs, B.M.; Amann, R.; Moraru, C. Highly diverse flavobacterial phages isolated from North Sea spring blooms. ISME J. 2022, 16, 555–568.
  42. Buchfink, B.; Xie, C.; Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 2015, 12, 59–60.
  43. Bolduc, B.; Jang, H.B.; Doulcier, G.; You, Z.Q.; Roux, S.; Sullivan, M.B. vConTACT: An iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ 2017, 5, e3243.
  44. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13, 2498–2504.
  45. Nishimura, Y.; Yoshida, T.; Kuronishi, M.; Uehara, H.; Ogata, H.; Goto, S. ViPTree: The viral proteomic tree server. Bioinformatics 2017, 33, 2379–2380.
  46. Mihara, T.; Nishimura, Y.; Shimizu, Y.; Nishiyama, H.; Yoshikawa, G.; Uehara, H.; Hingamp, P.; Goto, S.; Ogata, H. Linking Virus Genomes with Host Taxonomy. Viruses 2016, 8, 66.
  47. Letunic, I.; Bork, P. Interactive Tree Of Life (iTOL) v5: An online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021, 49, W293–W296.
  48. Moraru, C.; Varsani, A.; Kropinski, A.M. VIRIDIC-A Novel Tool to Calculate the Intergenomic Similarities of Prokaryote-Infecting Viruses. Viruses 2020, 12, 1268.
  49. Gilchrist, C.L.M.; Chooi, Y.H. Clinker & clustermap.js: Automatic generation of gene cluster comparison figures. Bioinformatics 2021, 37, 2473–2475.
  50. Kumar, S.; Stecher, G.; Li, M.; Knyaz, C.; Tamura, K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol. Biol. Evol. 2018, 35, 1547–1549.
  51. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
  52. Capella-Gutierrez, S.; Silla-Martinez, J.M.; Gabaldon, T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 2009, 25, 1972–1973.
  53. Price, M.N.; Dehal, P.S.; Arkin, A.P. FastTree 2—Approximately maximum-likelihood trees for large alignments. PLoS ONE 2010, 5, e9490.
  54. Roux, S.; Camargo, A.P.; Coutinho, F.H.; Dabdoub, S.M.; Dutilh, B.E.; Nayfach, S.; Tritt, A. iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria. PLoS Biol. 2023, 21, e3002083.
  55. Shang, J.; Sun, Y. Predicting the hosts of prokaryotic viruses using GCN-based semi-supervised learning. BMC Biol. 2021, 19, 250.
  56. Dion, M.B.; Plante, P.L.; Zufferey, E.; Shah, S.A.; Corbeil, J.; Moineau, S. Streamlining CRISPR spacer-based bacterial host predictions to decipher the viral dark matter. Nucleic Acids Res. 2021, 49, 3127–3138.
  57. Zhou, Y.; Zhou, L.; Yan, S.; Chen, L.; Krupovic, M.; Wang, Y. Diverse viruses of marine archaea discovered using metagenomics. Environ. Microbiol. 2023, 25, 367–382.
  58. Krupovic, M.; Dolja, V.V.; Koonin, E.V. The LUCA and its complex virome. Nat. Rev. Microbiol. 2020, 18, 661–670.
  59. Koonin, E.V.; Senkevich, T.G.; Dolja, V.V. The ancient Virus World and evolution of cells. Biol. Direct 2006, 1, 29.
  60. Adriaenssens, E.M.; Sullivan, M.B.; Knezevic, P.; van Zyl, L.J.; Sarkar, B.L.; Dutilh, B.E.; Alfenas-Zerbini, P.; Lobocka, M.; Tong, Y.; Brister, J.R.; et al. Taxonomy of prokaryotic viruses: 2018–2019 update from the ICTV Bacterial and Archaeal Viruses Subcommittee. Arch. Virol. 2020, 165, 1253–1260.
  61. Bi, L.; Yu, D.T.; Du, S.; Zhang, L.M.; Zhang, L.Y.; Wu, C.F.; Xiong, C.; Han, L.L.; He, J.Z. Diversity and potential biogeochemical impacts of viruses in bulk and rhizosphere soils. Environ. Microbiol. 2021, 23, 588–599.
  62. Zhang, Z.; Qin, F.; Chen, F.; Chu, X.; Luo, H.; Zhang, R.; Du, S.; Tian, Z.; Zhao, Y. Culturing novel and abundant pelagiphages in the ocean. Environ. Microbiol. 2021, 23, 1145–1161.
  63. Rusinol, M.; Martinez-Puchol, S.; Timoneda, N.; Fernandez-Cassi, X.; Perez-Cataluna, A.; Fernandez-Bravo, A.; Moreno-Mesonero, L.; Moreno, Y.; Alonso, J.L.; Figueras, M.J.; et al. Metagenomic analysis of viruses, bacteria and protozoa in irrigation water. Int. J. Hyg. Environ. Health 2020, 224, 113440.
  64. Somerville, V.; Lutz, S.; Schmid, M.; Frei, D.; Moser, A.; Irmler, S.; Frey, J.E.; Ahrens, C.H. Long-read based de novo assembly of low-complexity metagenome samples results in finished genomes and reveals insights into strain diversity and an active phage system. BMC Microbiol. 2019, 19, 143.
  65. Sulcius, S.; Simoliunas, E.; Alzbutas, G.; Gasiunas, G.; Jauniskis, V.; Kuznecova, J.; Miettinen, S.; Nilsson, E.; Meskys, R.; Roine, E.; et al. Genomic Characterization of Cyanophage vB_AphaS-CL131 Infecting Filamentous Diazotrophic Cyanobacterium Aphanizomenon flos-aquae Reveals Novel Insights into Virus-Bacterium Interactions. Appl. Environ. Microbiol. 2019, 85, e01311-18.
  66. Wu, R.; Smith, C.A.; Buchko, G.W.; Blaby, I.K.; Paez-Espino, D.; Kyrpides, N.C.; Yoshikuni, Y.; McDermott, J.E.; Hofmockel, K.S.; Cort, J.R.; et al. Structural characterization of a soil viral auxiliary metabolic gene product—A functional chitosanase. Nat. Commun. 2022, 13, 5485.
  67. Luo, X.Q.; Wang, P.; Li, J.L.; Ahmad, M.; Duan, L.; Yin, L.Z.; Deng, Q.Q.; Fang, B.Z.; Li, S.H.; Li, W.J. Viral community-wide auxiliary metabolic genes differ by lifestyles, habitats, and hosts. Microbiome 2022, 10, 190.
  68. Reynolds, P.E. Control of peptidoglycan synthesis in vancomycin-resistant enterococci: D,D-peptidases and D,D-carboxypeptidases. Cell Mol. Life Sci. 1998, 54, 325–331.
  69. Tanner, J.J.; Boechi, L.; Andrew McCammon, J.; Sobrado, P. Structure, mechanism, and dynamics of UDP-galactopyranose mutase. Arch. Biochem. Biophys. 2014, 544, 128–141.
  70. Bochow, S.; Elliman, J.; Owens, L. Bacteriophage adenine methyltransferase: A life cycle regulator? Modelled using Vibrio harveyi myovirus like. J. Appl. Microbiol. 2012, 113, 1001–1013.
  71. van Pee, K.H.; Patallo, E.P. Flavin-dependent halogenases involved in secondary metabolism in bacteria. Appl. Microbiol. Biotechnol. 2006, 70, 631–641.
  72. Rybniker, J.; Nowag, A.; van Gumpel, E.; Nissen, N.; Robinson, N.; Plum, G.; Hartmann, P. Insights into the function of the WhiB-like protein of mycobacteriophage TM4-a transcriptional inhibitor of WhiB2. Mol. Microbiol. 2010, 77, 642–657.
  73. Sharma, V.; Hardy, A.; Luthe, T.; Frunzke, J. Phylogenetic Distribution of WhiB- and Lsr2-Type Regulators in Actinobacteriophage Genomes. Microbiol. Spectr. 2021, 9, e0072721.
  74. Lindell, D.; Jaffe, J.D.; Johnson, Z.I.; Church, G.M.; Chisholm, S.W. Photosynthesis genes in marine viruses yield proteins during host infection. Nature 2005, 438, 86–89.
  75. Markine-Goriaynoff, N.; Gillet, L.; Van Etten, J.L.; Korres, H.; Verma, N.; Vanderplasschen, A. Glycosyltransferases encoded by viruses. J. Gen. Virol. 2004, 85, 2741–2754.
  76. Wang, W.; Zhang, F.; Wen, Y.; Hu, Y.; Yuan, Y.; Wei, M.; Zhou, Y. Cell-free enzymatic synthesis of GDP-L-fucose from mannose. AMB Express 2019, 9, 74.
  77. Sullivan, M.B.; Coleman, M.L.; Weigele, P.; Rohwer, F.; Chisholm, S.W. Three Prochlorococcus cyanophage genomes: Signature features and ecological interpretations. PLoS Biol. 2005, 3, e144.
  78. Van Staalduinen, L.M.; Jia, Z.C. Post-translational hydroxylation by 2OG/Fe(II)-dependent oxygenases as a novel regulatory mechanism in bacteria. Front. Microbiol. 2015, 5, 798.
  79. Sullivan, M.B.; Huang, K.H.; Ignacio-Espinoza, J.C.; Berlin, A.M.; Kelly, L.; Weigele, P.R.; DeFrancesco, A.S.; Kern, S.E.; Thompson, L.R.; Young, S.; et al. Genomic analysis of oceanic cyanobacterial myoviruses compared with T4-like myoviruses from diverse hosts and environments. Environ. Microbiol. 2010, 12, 3035–3056.
  80. Wang, Q.; Cai, L.; Zhang, R.; Wei, S.; Li, F.; Liu, Y.; Xu, Y. A Unique Set of Auxiliary Metabolic Genes Found in an Isolated Cyanophage Sheds New Light on Marine Phage-Host Interactions. Microbiol. Spectr. 2022, 10, e0236722.
  81. Wang, I.N.; Smith, D.L.; Young, R. Holins: The protein clocks of bacteriophage infections. Annu. Rev. Microbiol. 2000, 54, 799–825.
  82. Piacente, F.; Bernardi, C.; Marin, M.; Blanc, G.; Abergel, C.; Tonetti, M.G. Characterization of a UDP-N-acetylglucosamine biosynthetic pathway encoded by the giant DNA virus Mimivirus. Glycobiology 2014, 24, 51–61.
  83. Bailly-Bechet, M.; Vergassola, M.; Rocha, E. Causes for the intriguing presence of tRNAs in phages. Genome Res. 2007, 17, 1486–1495.
  84. Morgado, S.; Vicente, A.C. Global In-Silico Scenario of tRNA Genes and Their Organization in Virus Genomes. Viruses 2019, 11, 180.
  85. Paez-Espino, D.; Eloe-Fadrosh, E.A.; Pavlopoulos, G.A.; Thomas, A.D.; Huntemann, M.; Mikhailova, N.; Rubin, E.; Ivanova, N.N.; Kyrpides, N.C. Uncovering Earth’s virome. Nature 2016, 536, 425–430.
  86. Drake, J.W. A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 1991, 88, 7160–7164.
  87. Zwart, G.; Crump, B.C.; Agterveld, M.P.K.V.; Hagen, F.; Han, S.K. Typical freshwater bacteria: An analysis of available 16S rRNA gene sequences from plankton of lakes and rivers. Aquat. Microb. Ecol. 2002, 28, 141–155.
  88. Cai, F.; Axen, S.D.; Kerfeld, C.A. Evidence for the widespread distribution of CRISPR-Cas system in the Phylum Cyanobacteria. RNA Biol. 2013, 10, 687–693.
  89. Makarova, K.S.; Haft, D.H.; Barrangou, R.; Brouns, S.J.; Charpentier, E.; Horvath, P.; Moineau, S.; Mojica, F.J.; Wolf, Y.I.; Yakunin, A.F.; et al. Evolution and classification of the CRISPR-Cas systems. Nat. Rev. Microbiol. 2011, 9, 467–477.
  90. Burstein, D.; Sun, C.L.; Brown, C.T.; Sharon, I.; Anantharaman, K.; Probst, A.J.; Thomas, B.C.; Banfield, J.F. Major bacterial lineages are essentially devoid of CRISPR-Cas viral defence systems. Nat. Commun. 2016, 7, 10613.
Figure 1. Whole-genome alignments of DSL-LC02, DSL-LC03, and DSL-LC07 with their most closely related viruses, Synechococcus phage S-CAM9, Synechococcus phage S-CBP3, and Synechococcus phage S-CBP4. Colored coding sequences represent different functional groups. Green linkages between genomes indicate sequence similarity (>30%) of shared homologous genes.
Figure 2. The genome maps of DSL-LC01, DSL-LC04, DSL-LC05, and DSL-LC06. The outside circle represents ORFs. The black peaks denote the (G+C) mol% (outward indicates higher than the whole genome average (G+C) mol%, and inward indicates the opposite). ORFs are colored according to their functional categories.
Figure 3. The gene-sharing network of Dishui Lake phages, reference prokaryotic DNA viruses, and related UViGs from the IMG/VR dataset. Each node represents a genome, and edges signify the similarity between genomes based on shared protein clusters (PCs). Dishui Lake phages and the nodes that first neighbor them are shown in different colors, and other viruses are in light blue. The VCs associated with Dishui Lake phages are enlarged.
Figure 4. Proteomic trees of (a) DSL-LC02, DSL-LC03, and reference viruses in the family Kyanoviridae, and (b) DSL-LC07 and reference viruses in the family Autographiviridae. The Dishui Lake phages and their branches are labeled in red. Genera containing more than one virus were collapsed into blue triangles.
Figure 5. Proteomic tree of DSL-LC01, DSL-LC04, DSL-LC05, DSL-LC06, and reference sequences from the Virus–Host DB. The branches containing Dishui Lake phages and related viruses are indicated in sky blue, red, yellow, and light blue, respectively, and their corresponding enlarged subtrees are shown in Figure S2.
Figure 6. VIRIDIC-generated heatmaps of DSL-LC02, DSL-LC03, DSL-LC07, and their closely related viruses in the proteomic trees (Figure 4). The heatmaps incorporate intergenomic similarity values and alignment metrics of aligned fraction of genomes and genome length ratio. Dishui Lake phages are labelled in red.
Figure 7. Phylogenetic tree (midpoint rooting) of DSL-LC02, DSL-LC03, and members of Kyanoviridae based on 14 core genes concatenated. Viruses in the same genus are collapsed to triangles. Dishui Lake phages are labelled in red.
Figure 8. Phylogenetic tree (midpoint rooting) reconstructed by using DNA-dependent RNA polymerases from DSL-LC07 and 244 complete Autographiviridae RefSeq genomes in the NCBI virus database. The size of the purple dots represents the bootstrap value. Dishui Lake phages are labelled in red.
Figure 9. Phylogenetic tree reconstructed by using TerL homologs from DSL-LC01 (a), DSL-LC04 (b), DSL-LC05 (c), DSL-LC06 (d), and other viruses. Numbers in the parentheses indicate the number of leaves collapsed. The color diamonds represent the classification information of the viruses and their hosts. Dishui Lake phages are labelled in red.
Table 1. General features of seven Dishui Lake phages.

| Phage | Genome (bp) | %GC | No. of ORF | Annotated ORF | AMGs | tRNA | Lifestyle ᵃ |
|---|---|---|---|---|---|---|---|
| DSL-LC01 | 129,469 | 53.6 | 152 | 57 | 4 | 3 | Temperate |
| DSL-LC02 | 167,928 | 36.9 | 226 | 102 | 6 | 11 | Virulent |
| DSL-LC03 | 172,805 | 35.8 | 210 | 95 | 7 | 11 | Virulent |
| DSL-LC04 | 61,547 | 51.3 | 86 | 35 | 0 | 0 | Virulent |
| DSL-LC05 | 37,569 | 56.6 | 59 | 31 | 0 | 0 | Virulent |
| DSL-LC06 | 29,811 | 38.2 | 44 | 25 | 1 | 0 | Virulent |
| DSL-LC07 | 45,797 | 46.1 | 64 | 32 | 1 | 0 | Virulent |

ᵃ Predicted by BACPHLIP.
Table 2. Potential hosts of seven Dishui Lake phages.

| Phage | iPHoP | hostG |
|---|---|---|
| DSL-LC01 | Actinomycetota; Actinomycetia | Actinomycetota; Actinomycetia |
| DSL-LC02 | Cyanobacteriota; Cyanobacteriia; PCC-6307; Cyanobiaceae | Cyanobacteriota |
| DSL-LC03 | Cyanobacteriota; Cyanobacteriia; PCC-6307; Cyanobiaceae; Synechococcus | Cyanobacteriota |
| DSL-LC04 | Pseudomonadota; Gammaproteobacteria | Pseudomonadota; Gammaproteobacteria; Pseudomonadales; Pseudomonadaceae; Pseudomonas |
| DSL-LC05 | Pseudomonadota; Gammaproteobacteria; | Pseudomonadota; Gammaproteobacteria; Enterobacterales |
| DSL-LC06 | Pseudomonadota | - |
| DSL-LC07 | Cyanobacteriota; Cyanobacteriia; PCC-6307; Cyanobiaceae; Synechococcus | Cyanobacteriota |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cai, H.; Zhou, Y.; Li, X.; Xu, T.; Ni, Y.; Wu, S.; Yu, Y.; Wang, Y. Genomic Analysis and Taxonomic Characterization of Seven Bacteriophage Genomes Metagenomic-Assembled from the Dishui Lake. Viruses 2023, 15, 2038. https://doi.org/10.3390/v15102038
