Article

Use of an Integrated Approach Involving AlphaFold Predictions for the Evolutionary Taxonomy of Duplodnaviria Viruses

1
Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Str., 117997 Moscow, Russia
2
Limnological Institute, Siberian Branch of the Russian Academy of Sciences, 664033 Irkutsk, Russia
*
Authors to whom correspondence should be addressed.
Biomolecules 2023, 13(1), 110; https://doi.org/10.3390/biom13010110
Submission received: 6 December 2022 / Revised: 31 December 2022 / Accepted: 1 January 2023 / Published: 5 January 2023
(This article belongs to the Special Issue Protein Structure Prediction with AlphaFold)

Abstract

The evaluation of evolutionary relationships is exceptionally important for the taxonomy of viruses, which is a rapidly expanding area of research. The classification of viral groups belonging to the realm Duplodnaviria, which includes tailed bacteriophages, head-tailed archaeal viruses and herpesviruses, has undergone many changes in recent years and continues to improve. One of the challenging tasks of Duplodnaviria taxonomy is the classification of high-ranked taxa, including families and orders. At the moment, only 17 of 50 families have been assigned to orders. The evaluation of the evolutionary relationships between viruses is complicated by the high level of divergence of viral proteins. However, the development of structure prediction algorithms, including the award-winning AlphaFold, encourages the use of the results of structural predictions to clarify the evolutionary history of viral proteins. In this study, the evolutionary relationships of two conserved viral proteins, the major capsid protein and terminase, representing different viruses, including all classified Duplodnaviria families, have been analysed using AlphaFold modelling. This analysis has been undertaken using structural comparisons and different phylogenetic methods. The results of the analyses mainly indicated the high quality of AlphaFold modelling and the possibility of using the AlphaFold predictions, together with other methods, for the reconstruction of the evolutionary relationships between distant viral groups. Based on the results of this integrated approach, suggestions have been made for refining the taxonomic classification of bacterial and archaeal Duplodnaviria groups, and problems relating to the taxonomic classification of Duplodnaviria have been discussed.

1. Introduction

In recent years, the taxonomy of viruses has undergone significant changes. Many of these changes have been related to the reclassification of viruses infecting bacteria (bacteriophages, phages), of which tailed bacteriophages with double-stranded DNA genomes constitute the most numerous group [1]. The series of taxonomic reforms began with a shift from a classification based on bacteriophage morphology to a classification based on genomic data [2]. Since the early 2000s, the growing body of genomic data has revealed a much higher genomic diversity than was previously anticipated, primarily among tailed bacteriophages [2]. Until 2019, tailed bacteriophages were grouped within the order Caudovirales, which included three families, namely, Myoviridae, Podoviridae and Siphoviridae. Those families were created based on differences in phage morphology. Phages having a myoviral morphology (myoviruses) possess long contractile tails, while siphoviruses have flexible non-contractile tails and podoviruses possess short non-contractile (expandable) tails (Figure 1).
The traditional classification based on morphology has changed drastically in the last several years, with the former Caudovirales order and the Myoviridae, Podoviridae and Siphoviridae families being abolished. In 2019, a decision of the Executive Committee of the International Committee on Taxonomy of Viruses (ICTV) increased the number of ranks of the taxonomic classification of viruses to fifteen [3]. Currently, the classification approved by the ICTV places tailed phages, head-tailed archaeal viruses and evolutionarily related herpesviruses [4] in the realm Duplodnaviria and the kingdom Heunggongvirae. Herpesviruses belong to the phylum Peploviricota and class Herviviricetes, while bacterial phages and head-tailed archaeal viruses are attributed to the phylum Uroviricota and class Caudoviricetes [5]. However, classification within the class Caudoviricetes at the level of lower taxonomic ranks has not yet been fully formalised and is still being refined. Currently, the class Caudoviricetes comprises four orders, 47 families, 98 subfamilies, 1907 genera and 3301 species. Most families and other lower-ranked taxa are not assigned to orders, which contain only 14 families.
The new ranking hierarchy of virus taxonomy is based on the evolutionary relationships between viruses. This hierarchy is founded upon modern evolutionary synthesis, a development of the Darwinian approach to classification, using the achievements of genomics to identify evolutionary relations [3,6,7,8,9]. The creation of a credible picture of evolutionary relationships between viral groups is often a complex task. It is assumed that viruses are of ancient origin, and they may be the most ancient creatures on Earth [10]. The long path of evolution, following the divergence of viral groups, led to many mutations in genes, which makes it difficult to identify reliable phylogenetic relationships. The problem can be partially solved by using conserved genes [11], concatenated sequences of conserved genes [12] or an analysis of protein folding and structural similarity [13]. Another problem with building the classification scheme of bacteriophages is the phenomenon of genetic mosaicism accompanying the evolution of bacteriophages, especially temperate bacteriophages, due to the modular nature of phage evolution [14,15,16,17,18]. It is difficult to solve the latter problem by merely tracing the evolutionary history of individual proteins and genes; nevertheless, this history is very important for understanding the mechanisms of viral evolution.
The structural resemblance between proteins is widely used to estimate evolutionary relationships between proteins that show little or no homology in their amino acid sequences [13,19,20]. The structural similarity between two proteins can be assessed by using the root-mean-square deviation (RMSD) of their best-superimposed atomic coordinates, or by using other, more advanced, metrics, such as the template modelling score (TM-score) or the DALI Z-score [21,22,23]. Previously, clustering of experimentally determined structures of major capsid proteins by the DALI Z-score was used to illustrate the common origin of several main viral groups and to cluster prokaryotic viruses [13,24]. However, for most Caudoviricetes families, such clustering is not possible, since the structures of conserved proteins have not been determined for most viruses. Recent advances in protein structure prediction methods, which in some cases can give results close to those derived experimentally, encourage the use of new deep learning techniques to study the evolutionary history of protein structures. Previously, a similar approach was used to deduce patterns of the evolution of phage tail sheath proteins, but it was not tested for other conserved viral proteins, including the major capsid protein and terminase, which serve as markers of evolutionary relations between phages and are often used for taxonomic purposes [25].
In this study, the structures of the major capsid proteins (MCPs) and the ATPase subunits of terminase, encoded in the genomes of representatives of the ICTV-approved families of the kingdom Heunggongvirae, were predicted using AlphaFold [26], the winner of the CASP14 Award, and these predictions were then used for comparative analysis. This analysis was combined with different phylogenetic examinations, and suggestions were made about possible taxonomic updates. In addition, major capsid protein structures were modelled with another deep learning algorithm, RoseTTAFold, and the quality of predictions was assessed.

2. Materials and Methods

2.1. Data Collection and Annotation of Sequences

Viral genomes and protein sequences were downloaded from the NCBI GenBank [27] and UniProt [28] databases. Protein structures were downloaded from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) [29]. Viral genomes were re-annotated with the assistance of Glimmer 3.0.2 [30], which was used for the detection of open reading frames (ORFs). Protein functions were predicted using a BLAST homology search [27] and HMM-HMM comparison using the HHpred server [31].

2.2. Protein Modelling and Quality Assessment

All protein structures were modelled with AlphaFold 2.2.4 (AF) [26], using the full databases and the command line parameter --model_preset=monomer_casp14, which matches the CASP14 configuration. Spatial structures of 53 Caudoviricetes major capsid proteins (MCPs) and encapsulin were also modelled with RoseTTAFold [32], with default settings using the Robetta server [33]. The best-ranked structures were selected for further study. Quality assessment of protein structures was performed using the deep learning framework DeepAccNet [34], applying the default settings. Protein structures were superimposed and visualised using PyMOL 2.5.4 (Schrödinger Inc., NY, USA) [35].

2.3. Structural Alignment and Scoring the Structural Similarity

Structure comparison was performed using the DALI server [36] and the mTM-align package [37], with default settings. Structural similarity was evaluated with the DALI Z-score [21] and the TM-score [22]. A structural similarity matrix was obtained using the DALI server. Phylogenetic trees based on structural similarity were obtained with the built-in DALI tools and the PHYLIP Phylogeny Inference Package 3.6 [38], using the neighbour-joining clustering method. Multiple sequence alignments based on structural similarity were obtained with mTM-align.
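To make the two coordinate-based scores concrete, the following minimal Python sketch computes the RMSD and the TM-score from residue pairs that are assumed to be already aligned and superimposed. It is an illustration only and not the code used by DALI or mTM-align; the function names are hypothetical, and the fallback value of d0 for very short chains is an assumed convention.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in Å) between two equally long lists of (x, y, z) points.

    The two coordinate sets are assumed to be already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) of the TM-score."""
    if target_length <= 21:
        # Assumed convention for short chains, where the formula breaks down.
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score from the distances (in Å) between aligned residue pairs.

    The sum runs over aligned pairs only, but is normalised by the full
    length of the target protein, so unaligned residues lower the score.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(model, reference))
    print(tm_score([0.0] * 50, 100))
```

Unlike the RMSD, the TM-score down-weights large local deviations and is normalised by protein length, which is why it is less sensitive to a few badly placed regions.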

2.4. Phylogenetic Analysis Employing Primary Sequences of Proteins

Multiple alignments of primary amino acid sequences were obtained with Clustal Omega 1.2.3 [39] (ten refinement iterations, evaluating the full distance matrix for the initial and guide trees), MAFFT 7.48 [40] using the L-INS-i algorithm with otherwise default settings, and MUSCLE 3.8.425 [41] with default settings. The phylogenetic trees based on the alignments of the proteins’ primary sequences were constructed using RAxML-NG 1.1.0 [42] and the raxmlGUI 2.0.10 graphical interface [43] with the settings --tree rand{10} --bs-trees 1000, applying the best protein model found with ModelTest-NG 0.1.7 [44]. The robustness of the RAxML-NG trees was assessed using bootstrapping and the calculation of transfer bootstrap expectation (TBE) support [45].

2.5. Comparative Analysis of Phylogenetic Trees

Pairwise comparison of best-scoring phylogenetic trees constructed with RAxML-NG 1.1.0, based on Clustal Omega 1.2.3, mTM-align, MAFFT 7.48 and MUSCLE 3.8.425 alignments, and dendrograms obtained with DALI and mTM-align structural comparisons, was performed using the ETE 3.1.2 toolkit [46] with an “unrooted tree” setting. Robinson–Foulds normalised distances (nRFs) [47] were used to compute distances between the trees and construct matrices for heatmaps. The heatmaps were visualised with a Python Plotly Express module 5.11.0.
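The normalised Robinson–Foulds distance can be illustrated with a small, dependency-free Python sketch. This is not the ETE implementation: trees are written here as nested tuples of leaf names rather than Newick strings, the function names are hypothetical, and the normalisation by the total number of non-trivial bipartitions in both trees is an assumption about the convention applied.

```python
def leaf_set(tree):
    """Return the frozenset of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of a tree that is treated as unrooted.

    Each bipartition is stored as the side that does not contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = all_leaves - clade if reference in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    walk(tree)
    return found


def normalised_rf(tree_a, tree_b):
    """Normalised Robinson-Foulds distance: 0 = identical, 1 = no shared split."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("both trees must contain the same leaves")
    splits_a, splits_b = splits(tree_a), splits(tree_b)
    maximum = len(splits_a) + len(splits_b)
    return len(splits_a ^ splits_b) / maximum if maximum else 0.0


if __name__ == "__main__":
    t1 = (("A", "B"), "C", ("D", "E"))
    t2 = (("A", "C"), "B", ("D", "E"))
    print(normalised_rf(t1, t2))
```

A matrix of such pairwise values, one row and one column per tree, is the kind of input from which a heatmap of topological congruence can be drawn.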

2.6. VIRIDIC Intergenomic Comparison and GRAViTy Dendrogram

Comparison of intergenomic similarities was conducted with the VIRIDIC tool [48], using the default settings. The proteome-based GRAViTy dendrogram was obtained with the GRAViTy server (https://gravity.cvr.gla.ac.uk accessed on 1 November 2022) [49,50], using the database DB-B: Baltimore Group Ib-Prokaryotic and archaeal dsDNA viruses (VMRv34) and genomic sequences of representative viruses. The dendrogram was visualised with iTOL [51].

3. Results

3.1. Modelling Structures of Major Capsid Protein and ATPase Subunit of Terminase

3.1.1. Selected Viral Groups

In early September 2022, the list of Duplodnaviria taxa approved by the ICTV encompassed 50 families. Of these, 47 families contained bacterial and archaeal head-tailed viruses and three families included eukaryotic herpesviruses. One representative of each family was picked from the list of viruses published on the ICTV website. In addition, several bacteriophages of general or particular interest, which were not assigned to the approved families, were included in the study. They comprised three jumbo phages, which presumably represent ancient, early-diverged groups, two phages (phage λ and phage HK97) that have played an important role in viral research, and two phages (Curtobacterium phage Ayka and Pseudomonas phage MD8) that were not assigned to particular families and had been analysed in the authors’ previous research [18,52]. The jumbo phages mentioned above were the first isolated jumbo phage, Phikzvirus phiKZ (Pseudomonas phage phiKZ) [53,54], Donellivirus gee (Bacillus phage G), which is the isolated phage with the largest known genome [55], and the phage with the largest known genome predicted by metagenome analysis, LacPavin_0818_WC45 [56] (Table 1).
Use of the ICTV-recommended intergenomic similarity comparison VIRIDIC tool [48] did not show any meaningful genomic likeness between the representative viruses (Supplementary Figure S1), indicating that the genus threshold of 70% nucleotide identity [2] cannot be applied to these viruses.

3.1.2. Major Capsid Protein Modelling

The 57 genes encoding the major capsid proteins (MCPs), either experimentally determined or bioinformatically predicted, were translated and the products were modelled with AlphaFold (AF). The best-ranking structures were taken for further analysis. For use as an outgroup for the phylogenetic analysis, the structure of Thermotoga maritima encapsulin (PDB code 7K5W), which has a high structural resemblance to Heunggongvirae MCPs [57], was also modelled with AlphaFold. In addition, 53 MCPs belonging to viruses assigned to the phylum Uroviricota and class Caudoviricetes (phages and head-tailed archaeal viruses) were modelled with RoseTTAFold to compare the prediction quality of the two programs.
Interestingly, an examination of GenBank annotations indicated the absence of MCP annotations in several cases. Furthermore, examination of annotations using HHpred indicated errors in annotations for the selected representatives of the Duneviridae and Helgolandviridae families. The HHpred examination of the MCP sequence obtained by translation of the corresponding predicted gene of Hacavirus HCTV1 (Haloarcula californiae tailed virus 1, order Thumleimavirales, family Soleiviridae) found no meaningful similarities with viral capsid proteins, which was also the case with Phikzvirus phiKZ. However, subsequent modelling of the suggested MCP of Hacavirus HCTV1 and experimental data for Phikzvirus phiKZ confirmed the functions of these proteins.
All the models (Figure 2 and Supplementary Figure S2) contain the characteristic HK97 fold (Figure 3a), named after Byrnievirus HK97 (Escherichia phage HK97), including its conserved elements: the A-domain (axial domain), the P-domain (peripheral domain), the E-loop (extended loop) and the N-arm (N-terminal arm). Most models contain additional elements found in different HK97-like capsid proteins, such as the G-loop. The modelled proteins often contain other domains, or subdomains, which can be explained by the presence of protease and scaffolding protein domains in the translated sequences. The protease and scaffolding protein are essential for capsid assembly and can be encoded in a single gene, but are absent in mature capsids [58,59,60,61]. The superimposition of the modelled AF structures of phages HK97, λ, T4 and T7 onto the corresponding experimentally determined structures 1OHG (HK97, the mature capsid) [62], 7SJ5 (λ, major capsid protein mutant in the pre-assembly conformation) [63], 5VF3 (T4, mutant MCP in the isometric capsid) [64], 7VS5 (T4, MCP in the expanded head structure) [65] and 3J7V (T7, MCP in the DNA-free procapsid state) [66] showed RMSD values of 0.968 Å, 0.874 Å, 3.437 Å, 0.763 Å and 2.708 Å, respectively (Figure 3b). These values are lower than the corresponding experimental resolution values (3.45 Å, 2.69 Å, 3.45 Å, 3.40 Å and 4.60 Å, respectively).
The most complex structural architecture was found in the MCP models of herpesviruses and the jumbo phage phiKZ. Interestingly, Pseudomonas phage MD8 also featured a comparatively complicated architecture. As was suggested earlier for this phage, a single gene encodes the major capsid protein, protease and scaffolding protein as a single propeptide [18].

3.1.3. ATPase Subunit of Terminase Modelling

The ATPase subunits of terminase (terminase, large subunit of terminase, TerL) were modelled in a similar way. The terminase genes were extracted from the annotations of representative genomes and translated. The terminase (gene IVa2) of Human adenovirus C, exploiting a mechanism of genome packaging similar to herpesviruses and tailed phages [68], was also modelled with AF in order to use it as an outgroup in phylogenetic analyses.
The structural architecture of Heunggongvirae TerL reflects the function of this protein. Typical ATPase subunits of terminase include the N-terminal adenosine triphosphatase (ATPase) domain that drives DNA translocation and the C-terminal endonuclease domain that cleaves the concatemeric genome at both initiation and completion of genome packaging [69]. The ATPase domain (ATPD) contains a five-stranded, parallel β-sheet in the canonical ASCE fold sandwiched between several α-helices, which is easily recognisable in the models (Figure 4, Figure 5 and Figure S3), and additional β-strands that are unique to viral terminases [70].
The superimposition of the TerL model of phage HK97 onto the experimentally determined structure 6Z6D produced an RMSD value of 8.054 Å (the experimental resolution was 2.20 Å), and the superimposition of the phage T4 TerL model onto the experimentally determined structure 3CPE produced an RMSD value of 0.474 Å (the experimental resolution was 2.80 Å). An inspection of the HK97 model and the X-ray structure indicated that the comparatively high RMSD was due to the predicted orientation of the ATPase and nuclease domains relative to each other. A superimposition using the separated domains without the linker part yielded RMSD values of 0.455 Å for the ATPase domain and 0.469 Å for the nuclease domain.

3.1.4. Evaluation of Models’ Accuracy

The overall accuracy of the predictions was assessed with the Local Distance Difference Test (lDDT) score, using the DeepAccNet accuracy predictor. Comparison of the average lDDT scores of the 53 AF models of MCP and TerL indicated a mostly high level of prediction accuracy, and that the structure prediction of TerL was more accurate than that of MCP (lDDT TerL mean: 0.988, median: 0.996, q1: 0.991, q3: 0.999; lDDT MCP mean: 0.907, median: 0.929, q1: 0.822, q3: 0.970). The average lDDT of the ATPase domains extracted from the TerL models was even higher (mean: 0.998, median: 0.999, q1: 0.998, q3: 0.9997). The evaluation of RoseTTAFold models of the same 53 MCPs showed a lower prediction accuracy (lDDT mean: 0.634, median: 0.649, q1: 0.582, q3: 0.685) than that of the AlphaFold models (Figure 6).

3.2. Comparisons of Models’ Structural Similarity

3.2.1. Structural Comparisons Using DALI

The evaluation of the structural similarity of proteins, including viral major capsid proteins, using DALI has been used to investigate evolutionary relationships and to classify protein folds [13,36]. A DALI analysis using the AF models of major capsid proteins (Figure 7) demonstrated clustering of all four bacteriophages of the Crassvirales order (families Crevaviridae, Intestiviridae, Steigviridae and Suoliviridae) and both head-tailed archaeal viruses of the Methanobavirales order (Anaerodiviridae and Leisingerviridae families). At the same time, the MCPs of the head-tailed archaeal viruses of the Kirjokansivirales and Thumleimavirales orders did not form distinct clusters. The MCP models of representatives of the Herpesvirales order (Alloherpesviridae, Herpesviridae and Malacoherpesviridae families) were grouped together, but did not show such high similarities as the MCPs of the Crassvirales and Methanobavirales orders. The results of DALI clustering using structures from which the parts approximately corresponding to the protease and scaffolding domains had been removed were similar to those obtained with the full-sized models (Supplementary Figure S4).
In addition, DALI indicated noticeable structural resemblances of representative MCPs for other families, including:
  • The bacteriophages of Guelinviridae, Rountreeviridae and Salasmaviridae families, and a novel Curtobacterium phage Ayka (referred to below as group 1);
  • The bacteriophages of Ackermannviridae, Kyanoviridae and Straboviridae families (group 2);
  • The bacteriophages of Pachyviridae and Pervagoviridae families (group 3), making a subcluster of a larger cluster that includes the Crassvirales phages;
  • The bacteriophages of the Casjensviridae family and Lambdavirus lambda (group 4);
  • The bacteriophages of Duneviridae and Helgolandviridae families (group 5).
Some of these observations can be biologically meaningful, reflecting the common origin and lifestyle of viruses. Group 1 comprises the so-called “ϕ29-like” lytic phages with a podoviral morphology and a similar genome size of about 16–20 kb, infecting Gram-positive bacteria [52,71,72,73]. The Guelinviridae, Rountreeviridae and Salasmaviridae families were proposed in 2020, clarifying the taxonomic classification of ϕ29-like viruses [5]. Group 2 includes phages with large genomes of about 150–200 kb, which were described earlier as “T4-like” phages [74,75,76]. MCP models of the unclassified jumbo phage LacPavin (genome size 735 kb) and Maribacter phage Colly_1 (Molycolviridae family, genome size 735 kb) also showed some resemblance to group 2 models.
Viruses of group 3 [77,78] infect flavobacteria (phylum Bacteroidota) and have genomes of a similar size (about 73 kb) and GC-content (Table 1). A BLAST search using the GenBank PHG database indicated the relatedness of the MCPs of the Pachyviridae and Pervagoviridae representative phages (bit score of more than 130), but did not reveal homologies with the Crassvirales phages, which infect the human gut symbiont Bacteroides [79,80]. Both group 3 and Crassvirales phages have a podoviral morphology. Phages of group 4 have genomes of 49–67 kb and infect Enterobacterales (Salmonella phage χ of the Casjensviridae family [81] and phage λ [82]). Group 5 includes Flavobacterium phage 1H (Duneviridae family) [83] and Polaribacter phage Leef_1 (Helgolandviridae family) [78], which infect flavobacteria. These phages have genomes of a similar size (about 38–39 kb) and a siphoviral morphology.
It is noteworthy that, according to the DALI comparisons of MCP structural similarities, the archaeal head-tailed viral families do not form one or two distinct clusters. They are often grouped with bacteriophages or do not show similarities with any other families.
The DALI analyses performed using the modelled structures of the large subunit of terminase (TerL) and of its ATPase domain (ATPD) gave results similar to each other (Figure 8 and Figure 9), but these differed from the results of the DALI MCP comparison. Generally, the structural similarities of the representative TerL and ATPD models were greater than those of the MCPs, indicating the greater conservation of terminase. TerL and ATPD structural comparisons with DALI also gathered the viruses of groups 1, 2 and 3 mentioned above into distinct clusters and indicated the likeness of the group 3 Pachyviridae and Pervagoviridae terminase models to Crassvirales terminases. Interestingly, the TerL and ATPD models of phage λ (group 4), the unclassified Pseudomonas phage MD8, which is distantly related to lambdoid phages, the Casjensviridae family phage Salmonella phage χ (group 4) and the archaeal head-tailed Haloarcula hispanica tailed virus 1 (Madisaviridae family) showed a distinct similarity. These viruses have a siphoviral morphology [24] and a genome size of 48–59 kb. Like the DALI MCP comparisons, the terminase analysis indicated a complex pattern of relationships between archaeal viruses that does not match the ICTV classification.

3.2.2. Structural Comparisons with mTM-Align

The results of the mTM-align structural comparisons (Figure A1) did not fully match the conclusions of the DALI examinations. However, there were many similar observations concerning the similarities of the modelled structures. For example, a relatedness was shown between the MCPs and terminases of the group 1, group 2 and group 3 viruses mentioned in Section 3.2.1, and the analysis also showed the complex grouping pattern of archaeal viruses. Like the MCP DALI tree, the mTM-align tree placed the families of the Methanobavirales order in a monophyletic branch and showed the similarity of the Lambdavirus and Casjensviridae MCP models and of the Hafunaviridae and MD8 models.
The mTM-align tree constructed with the whole models of the ATPase subunit of terminase did not place all three families of the Herpesvirales order in a single clade, whereas the MCP and ATPD trees did. The latter trees placed the Pachyviridae and Pervagoviridae structures in the clade containing the Crassvirales representatives, as the DALI trees do.

3.3. Phylogenetic Analysis Using Amino Acid Sequences of MCP and TerL

3.3.1. Phylogenetic Analysis of MCP

In addition to the trees based on structural similarity, a comparative analysis was conducted with phylogenetic trees based on alignments of MCP amino acid sequences obtained with different algorithms (Clustal Omega, MAFFT, MUSCLE and mTM-align) (Figure 10 and Figures S5–S7). These trees showed non-matching topologies, although some branches had a common composition. Only the tree constructed using the alignment based on the structural similarity obtained with mTM-align placed all the Herpesvirales representatives in a distinct monophyletic branch. Except for this tree, none of the trees arranged the Crassvirales families and the representatives of group 3 (Pachyviridae and Pervagoviridae families) in monophyletic or adjacent branches. Except for the MUSCLE tree, none of the trees placed the representatives of group 1 (ϕ29-like Guelinviridae, Rountreeviridae and Salasmaviridae families, and Curtobacterium phage Ayka) in a monophyletic branch. However, for the sequences belonging to group 2 (T4-like Ackermannviridae, Kyanoviridae and Straboviridae families), and those belonging to the families of the Methanobavirales order, the phylogenetic analyses based on amino acid sequence alignments showed results resembling those produced from structural comparisons. Apparently, the low level of MCP sequence conservation (MAFFT pairwise identity 6.0%) limits the power of phylogenetic analysis.
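For orientation, a pairwise identity value of this kind can be obtained from a multiple sequence alignment as sketched below in Python. How the reported percentage was actually calculated is not stated here, so the definition used in the sketch (identical residues divided by the number of columns in which both sequences have a residue, averaged over all sequence pairs) and the function names are assumptions; other tools count gapped columns differently.

```python
from itertools import combinations

GAP = "-"


def pairwise_identity(seq_a, seq_b):
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            continue  # assumed convention: skip columns with a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(alignment):
    """Average percent identity over all pairs of aligned sequences."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy_alignment = ["MK-LV", "MKALI", "MR-LV"]  # toy data, not real MCPs
    print(round(mean_pairwise_identity(toy_alignment), 1))
```

Values of only a few percent, as reported for the MCP and TerL alignments, are far below the range in which sequence alignments are usually considered reliable, which is consistent with the unstable tree topologies.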

3.3.2. Phylogenetic Analysis of Terminase

A phylogenetic analysis based on the amino acid sequences of the ATPase subunit of terminase TerL and the ATPase domain, identified with the assistance of AlphaFold, was conducted using alignments obtained with Clustal Omega, MAFFT, MUSCLE and mTM-align (Figure 11 and Figures S8–S10). Apparently, the level of conservation of TerL was somewhat higher than that of MCP (pairwise identity of the TerL alignment by MAFFT 7.5%). Except for the MUSCLE trees, the trees grouped the Herpesvirales and group 1 (ϕ29-like) representatives in distinct clades. Most of the trees arranged the representatives of both group 3 and the order Crassvirales in a monophyletic branch, and all trees placed the proteins of group 2 (T4-like viruses) in a clade. However, as in the cases described above, the trees demonstrated different topologies. Like all other structural and phylogenetic analyses, the terminase trees showed complex relationships between the terminases of archaeal head-tailed viruses, which did not match the current ICTV classification.

3.4. Analysis of Topological Congruence of Dendrograms

Comparisons of the topologies of dendrograms based on both structural similarities and amino acid sequence alignments, made using the ETE toolkit, yielded high Robinson–Foulds normalised distances (nRFs), indicating the topological incongruence of the trees (Figure 12). The comparisons also indicated greater topological similarities between ATPD and TerL trees constructed using the same algorithms; the topological similarities of ATPD trees were generally more pronounced. A visual comparison showed better topological similarities between the branches that had diverged comparatively recently.

3.5. GRAViTy Dendrogram

Evolutionary relationships between the viruses were also estimated with the GRAViTy pipeline, which classifies viral groups according to the homology between viral genes and similarities in genome organisation. The GRAViTy tool is recommended by the ICTV for the demarcation of high-ranked taxa [2]. The GRAViTy tree (Supplementary Figure S11) differs from the DALI structural similarity dendrogram and other trees, clustering two Herpesvirales families together with four Thumleimavirales families of archaeal viruses and placing the representatives of both group 3 (Pachyviridae and Pervagoviridae) and group 4 (Casjensviridae and Zierdtviridae) in distant clusters. The differences in topologies could also be due to the different composition of viruses involved in the analysis. The GRAViTy dendrogram clustered all representatives of the Crassvirales order, of group 1 (representatives of Guelinviridae, Rountreeviridae and Salasmaviridae families, and Curtobacterium phage Ayka), group 2 (Ackermannviridae, Kyanoviridae and Straboviridae) and group 5 (Duneviridae and Helgolandviridae) in corresponding groups (also containing other viruses not present in the list of 57 representative viruses) (Table 1). Nevertheless, not all representatives of the Kirjokansivirales and Methanobavirales orders were grouped according to their taxonomic classification. Interestingly, the GRAViTy dendrogram placed the Plasmaviridae and Helgolandviridae representatives in one branch. Plasmaviridae is a family of pleomorphic enveloped viruses, not belonging to the class Caudoviricetes, that infect Acholeplasma species [84].

4. Discussion

AlphaFold modelling of the structures of two conserved viral proteins, the major capsid protein and the ATPase subunit of terminase, has demonstrated high predicted accuracy. This accuracy exceeded that of RoseTTAFold, another deep-learning algorithm, which makes AlphaFold preferable to RoseTTAFold for such purposes. Using predicted accuracy alone, however, it is difficult to judge the extent to which the models correspond to real structures. It should also be pointed out that the native state of viral proteins can change in different states of the viral particle (e.g., empty, full, expanded capsids) and at different stages of viral particle assembly [64,65,85,86]. The correlation between structural similarity and sequence identity is not absolute, due to conformational plasticity, mutations, solvent effects and ligand binding [87]. Most of these limitations also apply to analyses that employ experimentally determined structures, but they can be exacerbated by structural prediction errors. Therefore, it seems difficult to forecast the effectiveness of using AlphaFold for the analysis of structural similarity and evolutionary history based on the resemblance of predicted structures alone. In addition, as shown in this study, different algorithms of structural comparison can lead to different results. Unfortunately, some limitations are inherent not only in structural analysis, but also in analyses based on primary amino acid sequences. Recovering evolutionary history using amino acid alignments has its own problems, which are related to high mutation rates and the details of molecular evolution, and the outcome depends on the algorithms and methods used [88,89,90,91], as was seen in the phylogenetic studies performed in this work. Thus, an integrated approach involving the evaluation and comparison of various methods, including structure- and sequence-based phylogenies, genome organisation and biological data, may provide more confident conclusions.
Phylogenetic analysis based on the primary sequence of proteins using the same ML algorithm RAxML-ng and alignment using different algorithms resulted in different tree topologies. Bootstrap values were also low, and these can be explained by a low level of sequence similarity, due to ancient divergence of the main Duplodnaviria groups and a high mutation rate. Sequence conservation level was higher for terminase than for MCP. However, due to the modular evolution of viruses, which is inherent to various viral groups [14,92], using just one protein cannot uncover the evolutionary history. The example of Pseudomonas phage MD8, examined in this study, as well as earlier [18], shows a different evolutionary history for MCP and terminase, as indicated by both the structural comparisons of AF models and sequence-based phylogenetic analysis.
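The disagreement between tree topologies obtained from different alignments can be quantified with the Robinson–Foulds distance [47], which counts the bipartitions present in one tree but not in the other. The following is a minimal, dependency-free sketch of that idea and not the pipeline used in this study; the function names and the toy leaf labels (A to E) are illustrative assumptions, and the parser handles only simple Newick strings.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested tuples; leaves are strings."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        # Drop an optional ":branch_length" suffix.
        return text[start:pos].split(":")[0].strip()

    def read_node():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children = [read_node()]
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            pos += 1  # consume ")"
            read_label()  # skip internal label, support value or length
            return tuple(children)
        return read_label()

    return read_node()


def leaf_set(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def bipartitions(tree):
    """Collect the non-trivial leaf bipartitions induced by internal branches."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        other = all_leaves - clade
        if len(clade) > 1 and len(other) > 1:
            # Store both sides so that rooting does not affect the result.
            splits.add(frozenset([clade, other]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


def robinson_foulds(newick_a, newick_b):
    """Number of bipartitions found in exactly one of the two trees."""
    tree_a, tree_b = parse_newick(newick_a), parse_newick(newick_b)
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Toy example with hypothetical leaves: the two trees share no internal split.
print(robinson_foulds("((A,B),(C,D),E);", "((A,C),(B,D),E);"))
```

A distance of zero means identical topologies; larger values indicate that more internal branches differ between the two alignment-specific trees.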
The results of most structural comparisons and phylogenies based on amino acid sequences seem to correlate for viruses that are evolutionarily closely related, such as T4-like phages. Both all-against-all structural comparison and sequence-based phylogenies grouped T4- and ϕ29-like viruses in monophyletic groups, and these results appear to be biologically reasonable. Larger proteins of evolutionarily related representatives of three Herpesvirales families [93,94] were better clustered using algorithms employing structural alignments of the predicted models. This is an argument in favour of using structural analysis, which can be explained by good modelling accuracy and the superior sensitivity of structural comparisons compared with sequence-based phylogeny. DALI can be more appropriate than mTM-align for such purposes. DALI, and mTM-align to a lesser degree, demonstrated consistency with sequence-based phylogenetic analysis for relatively small distances estimated from the tree’s branch lengths (e.g., T4-like viral families), and better clustered the distant representatives of the Herpesvirales families. Therefore, it seems reasonable to suggest that structural comparisons based on AF predictions can also work in the context of intermediate distances. It is probable that the detailed examination of protein evolution (e.g., the emergence of new domains and subdomains inherent to the whole group) may assist evolutionary analysis. This approach was used in the analysis of the evolution of phage tail sheath protein [95], featuring the successive addition of new domains while maintaining the common conserved domain. Interestingly, the predicted structures of the modelled MCPs of Crassvirales phages and group 3 phages (Pachyviridae and Pervagoviridae representatives) contain similar additional domains, composed mainly of β-strands, which supports the suggestion of their relatedness.
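An all-against-all structural comparison produces a matrix of pairwise similarity scores, and one common way to turn such a matrix into a dendrogram is average-linkage clustering. The sketch below illustrates that step only, under the assumption that higher scores mean greater similarity (as with DALI Z-scores); it is not the implementation used by DALI or mTM-align, and the protein names and scores are invented for illustration.

```python
def average_linkage(names, sim):
    """Agglomerative average-linkage clustering on a symmetric similarity matrix.

    Returns the tree as nested tuples; the most similar clusters merge first.
    """
    # Each cluster is stored as (subtree, list of member indices).
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def avg_sim(a, b):
        pairs = [(i, j) for i in a[1] for j in b[1]]
        return sum(sim[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = avg_sim(clusters[x], clusters[y])
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        merged = (
            (clusters[x][0], clusters[y][0]),
            clusters[x][1] + clusters[y][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


def to_newick(tree):
    """Serialise the nested-tuple tree as a topology-only Newick string."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Hypothetical scores: p1/p2 and p3/p4 form two structurally similar pairs.
names = ["p1", "p2", "p3", "p4"]
scores = [
    [0, 30, 5, 4],
    [30, 0, 6, 5],
    [5, 6, 0, 25],
    [4, 5, 25, 0],
]
print(to_newick(average_linkage(names, scores)) + ";")
```

Because the merge order depends only on the averaged scores, a different scoring scheme (for example, TM-scores instead of Z-scores) can yield a different topology from the same set of models, which is consistent with the differences between methods noted above.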
It is extremely difficult to understand the evolutionary history of viral groups, and the corresponding evolutionary-based taxonomic updates, using only one protein, but a set of conserved marker genes with the same origin could indicate evolutionary relationships and taxonomic classification. Such an approach is widely used, and might include various groups of DNA viruses [96,97,98], but difficulties might be encountered with Duplodnaviria viruses, because of the limited number of shared genes and recombination events, as a result of modular evolution. For instance, many phages lack some replication protein genes [99], making it impossible to use them in phylogenetic analysis. The approach of using the main clustering tools, such as predicted proteome-based clustering tools [2], to reconstruct evolutionary history at the level of families could hypothetically lead to a situation where newly acquired genes can mask highly mutated conserved genes, leading to erroneous conclusions about the origin of these groups.
Viral evolution has a network-like character [100,101], and genomics-based evaluations of evolutionary history, using gene network clustering, do show the connections between related groups; they do not, however, readily show the history of the appearance of these connections. It is probable that some reference points are needed for the assessment of Duplodnaviria evolution and, in genome-based approaches, they should include proteins of common origin for all viruses represented.
Recent meticulous work carried out on the classification of archaeal viruses [24] has laid the foundations for classification of this important viral group. The analysis of all available complete genomes of archaeal head-tailed viruses has made it possible to make assumptions about the ancient divergence of archaeal and bacterial tailed viruses and the intensive exchange of genes involved in DNA metabolism and counter-defence mechanisms. The results of the present study, including structural comparisons and sequence-based phylogenies, indicate that such assumptions may need some modification, for example to explain the non-monophyletic character of the relationships of archaeal MCPs and terminases. Two observations may indicate a more complex pattern of early evolution of Caudoviricetes viruses: a BLAST search found the closest related proteins not among representatives of the same order of head-tailed archaeal viruses, but among representatives of other taxa of archaeal and bacterial viruses, and the results of the phylogenetic studies were not in accordance with the current classification. Otherwise, the requirements of monophyly for orders [2] should be explained or clarified.
Apparently, the theory and practice of the evolutionary taxonomy of Duplodnaviria viruses need further clarification and refinement. These should be based on decisions about the priorities assigned to various genomic data, decisions that have several aspects, including philosophical questions about the relationship between holistic and reductionist approaches in evolutionary biology. The evolution of genome organisation, the emergence and exchange of genomic modules and the evolution of conserved proteins can, together, be used for the evolutionary taxonomy of viruses.
Given the lack of experimental data on viral proteins, accurate predictions of AlphaFold can be useful for reconstructing the evolution of proteins, making AlphaFold an important tool in evolutionary taxonomy. AlphaFold modelling can also be used to functionally assign proteins when commonly used BLAST and HMM searches fail. The collection of data obtained during this study, involving AlphaFold predictions and sequence-based analysis, has made it possible for several suggestions to be made concerning refinements to present taxonomic classification:
  • Bacteriophages of Guelinviridae, Rountreeviridae and Salasmaviridae families, Curtobacterium phage Ayka (group 1) and related phages can be considered as candidates for the delineation of a new order.
  • The families Ackermannviridae, Kyanoviridae and Straboviridae (group 2), and related phages, can be assigned to a new taxon of a higher rank.
  • The bacteriophages of Pachyviridae and Pervagoviridae families (group 3) are related to Crassvirales phages. These, and related phages, can be considered as candidates for the delineation of a new order.
  • The bacteriophages of the Casjensviridae family and Lambdavirus lambda (group 4) are evolutionarily related. The taxonomy of these, and related, groups requires additional research and refinements, taking into account the specifics of the evolution of temperate phages, which are highly susceptible to genetic exchanges.
  • The bacteriophages of the Duneviridae and Helgolandviridae families (group 5) are evolutionarily related and, together with related phages, can be considered as candidates for the delineation of a new order.
  • The evolutionary history and taxonomic classification of head-tailed archaeal viruses requires additional research and further clarification.

5. Conclusions

AlphaFold’s highly accurate predictions create new possibilities for studying the evolutionary history of viral proteins. In this study, use of the results of AlphaFold modelling, combined with the results of sequence-based analysis and other data, enabled the discovery of deep evolutionary relationships and suggestions for possible upgrades to taxonomic classifications of Duplodnaviria viruses.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom13010110/s1. Figure S1: VIRIDIC-generated heatmap of 57 viruses representing different Duplodnaviria groups. The colour coding indicates the clustering of phage genomes based on intergenomic similarity. The numbers represent the similarity values for each genome pair, rounded to the first decimal. Figure S2: Structural models of the product of whole genes encoding for the major capsid proteins in genomes of representative viruses (Table 1), obtained with AlphaFold. The N-terminus is labelled with a blue circle and the C-terminus is labelled with a yellow circle. Figure S3: Structural models of the product of whole genes encoding for the ATPase subunit of terminase in genomes of representative viruses (Table 1), obtained with AlphaFold. The N-terminus is labelled with a blue circle and the C-terminus is labelled with a yellow circle. Figure S4: The DALI matrix based on pairwise Z-score comparisons of 57 AF models of major capsid protein, with removed parts supposedly belonging to the protease and scaffolding domains and encapsulin. Figure S5: Best-scoring ML phylogenetic tree constructed with 57 amino acid sequences of major capsid protein and an encapsulin aligned with MAFFT. The scale bar shows 0.5 estimated substitutions per site and the trees were rooted to encapsulin. Figure S6: Best-scoring ML phylogenetic tree constructed with 57 amino acid sequences of major capsid protein and an encapsulin aligned with MUSCLE. The scale bar shows 0.5 estimated substitutions per site and the trees were rooted to encapsulin. Figure S7: Best-scoring ML phylogenetic tree constructed with 57 amino acid sequences of major capsid protein and an encapsulin aligned with mTM-align. The scale bar shows 0.5 estimated substitutions per site and the trees were rooted to encapsulin. 
Figure S8: Best-scoring ML phylogenetic tree constructed with 57 amino acid sequences of ATPase subunits of Heunggongvirae terminases and an Adenoviridae terminase aligned with MAFFT. The numbers near the tree branches indicate the TBE support. The total number of bootstrap trees was 1000. The scale bar shows 0.5 estimated substitutions per site and the trees were rooted to Adenoviridae. Figure S9: Best-scoring ML phylogenetic tree constructed with 57 amino acid sequences of ATPase subunits of Heunggongvirae terminases and an Adenoviridae terminase aligned with MUSCLE. The numbers near the tree branches indicate the TBE support. The total number of bootstrap trees was 1000. The scale bar shows 0.5 estimated substitutions per site and the trees were rooted to Adenoviridae. Figure S10: Best-scoring ML phylogenetic tree constructed with 57 amino acid sequences of ATPase subunits of Heunggongvirae terminases and an Adenoviridae terminase aligned with mTM-align. The numbers near the tree branches indicate the TBE support. The total number of bootstrap trees was 1000. The scale bar shows 0.5 estimated substitutions per site and the trees were rooted to Adenoviridae. Figure S11: GRAViTy dendrogram constructed with database “DB-B: Baltimore Group Ib-Prokaryotic and archaeal dsDNA viruses (VMRv34)” and genomic sequences of 57 representative viruses.

Author Contributions

Conceptualisation, P.E. and K.M.; methodology, P.E., D.G. and M.S.; software, P.E. and D.G.; validation, P.E. and D.G.; formal analysis, P.E., D.G., M.S. and K.M.; investigation, P.E. and D.G.; resources, K.M.; data curation, P.E. and D.G.; writing—original draft preparation, P.E. and K.M.; writing—review and editing, K.M.; visualisation, P.E. and D.G.; supervision, K.M.; project administration, K.M.; funding acquisition, K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Russian Science Foundation, grant #21-16-00047.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1. Dendrograms based on protein structure comparisons using mTM-align: (a) 57 major capsid protein and an encapsulin AF models. The tree was rooted to encapsulin; (b) 58 ATPase subunits of Heunggongvirae terminases and an Adenoviridae terminase. The tree was rooted to Adenoviridae; (c) 58 ATPase domains, extracted from AF models of ATPase subunits of Heunggongvirae terminases and an Adenoviridae terminase, and the tree was rooted to Adenoviridae. The branch lengths are measured with TM-scores.

References

  1. Tolstoy, I.; Kropinski, A.M.; Brister, J.R. Bacteriophage Taxonomy: An Evolving Discipline. Methods Mol. Biol. 2018, 1693, 57–71.
  2. Turner, D.; Kropinski, A.M.; Adriaenssens, E.M. A Roadmap for Genome-Based Phage Taxonomy. Viruses 2021, 13, 506.
  3. Gorbalenya, A.E.; Krupovic, M.; Mushegian, A.; Kropinski, A.M.; Siddell, S.G.; Varsani, A.; Adams, M.J.; Davison, A.J.; Dutilh, B.E.; Harrach, B.; et al. The New Scope of Virus Taxonomy: Partitioning the Virosphere into 15 Hierarchical Ranks. Nat. Microbiol. 2020, 5, 668–674.
  4. Baker, M.L.; Jiang, W.; Rixon, F.J.; Chiu, W. Common Ancestry of Herpesviruses and Tailed DNA Bacteriophages. J. Virol. 2005, 79, 14967–14970.
  5. Current ICTV. Taxonomy Release|ICTV. Available online: https://ictv.global/taxonomy (accessed on 9 November 2022).
  6. Hubbs, C.L. Evolution the New Synthesis. Am. Nat. 1943, 77, 365–368.
  7. Koonin, E.V. The Origin at 150: Is a New Evolutionary Synthesis in Sight? Trends Genet. TIG 2009, 25, 473–475.
  8. Koonin, E.V. Darwinian Evolution in the Light of Genomics. Nucleic Acids Res. 2009, 37, 1011–1034.
  9. Koonin, E.V.; Dolja, V.V.; Krupovic, M.; Varsani, A.; Wolf, Y.I.; Yutin, N.; Zerbini, F.M.; Kuhn, J.H. Global Organization and Proposed Megataxonomy of the Virus World. Microbiol. Mol. Biol. Rev. MMBR 2020, 84, e00061-19.
  10. Prangishvili, D.; Bamford, D.H.; Forterre, P.; Iranzo, J.; Koonin, E.V.; Krupovic, M. The Enigmatic Archaeal Virosphere. Nat. Rev. Microbiol. 2017, 15, 724–739.
  11. Yutin, N.; Wolf, Y.I.; Koonin, E.V. Origin of Giant Viruses from Smaller DNA Viruses Not from a Fourth Domain of Cellular Life. Virology 2014, 466–467, 38–52.
  12. Low, S.J.; Džunková, M.; Chaumeil, P.-A.; Parks, D.H.; Hugenholtz, P. Evaluation of a Concatenated Protein Phylogeny for Classification of Tailed Double-Stranded DNA Viruses Belonging to the Order Caudovirales. Nat. Microbiol. 2019, 4, 1306–1315.
  13. Krupovic, M.; Koonin, E.V. Multiple Origins of Viral Capsid Proteins from Cellular Ancestors. Proc. Natl. Acad. Sci. USA 2017, 114, E2401–E2410.
  14. Botstein, D. A Theory of Modular Evolution for Bacteriophages. Ann. NY Acad. Sci. 1980, 354, 484–490.
  15. Campbell, A. Phage Evolution and Speciation. In The Bacteriophages; The Viruses; Calendar, R., Ed.; Springer US: Boston, MA, USA, 1988; pp. 1–14. ISBN 978-1-4684-5424-6.
  16. Hendrix, R.W.; Smith, M.C.M.; Burns, R.N.; Ford, M.E.; Hatfull, G.F. Evolutionary Relationships among Diverse Bacteriophages and Prophages: All the World’s a Phage. Proc. Natl. Acad. Sci. USA 1999, 96, 2192–2197.
  17. Brüssow, H.; Canchaya, C.; Hardt, W.-D. Phages and the Evolution of Bacterial Pathogens: From Genomic Rearrangements to Lysogenic Conversion. Microbiol. Mol. Biol. Rev. MMBR 2004, 68, 560–602.
  18. Evseev, P.; Lukianova, A.; Sykilinda, N.; Gorshkova, A.; Bondar, A.; Shneider, M.; Kabilov, M.; Drucker, V.; Miroshnikov, K. Pseudomonas Phage MD8: Genetic Mosaicism and Challenges of Taxonomic Classification of Lambdoid Bacteriophages. Int. J. Mol. Sci. 2021, 22, 10350.
  19. Salemme, F.R.; Miller, M.D.; Jordan, S.R. Structural Convergence during Protein Evolution. Proc. Natl. Acad. Sci. USA 1977, 74, 2820–2824.
  20. Wood, T.C.; Pearson, W.R. Evolution of Protein Sequences and Structures. J. Mol. Biol. 1999, 291, 977–995.
  21. Holm, L.; Kääriäinen, S.; Rosenström, P.; Schenkel, A. Searching Protein Structure Databases with DaliLite v.3. Bioinformatics 2008, 24, 2780–2781.
  22. Zhang, Y.; Skolnick, J. Scoring Function for Automated Assessment of Protein Structure Template Quality. Proteins 2004, 57, 702–710.
  23. Zhou, X.; Chou, J.; Wong, S.T. Protein Structure Similarity from Principle Component Correlation Analysis. BMC Bioinformatics 2006, 7, 40.
  24. Liu, Y.; Demina, T.A.; Roux, S.; Aiewsakun, P.; Kazlauskas, D.; Simmonds, P.; Prangishvili, D.; Oksanen, H.M.; Krupovic, M. Diversity, Taxonomy, and Evolution of Archaeal Viruses of the Class Caudoviricetes. PLOS Biol. 2021, 19, e3001442.
  25. Benler, S.; Yutin, N.; Antipov, D.; Rayko, M.; Shmakov, S.; Gussow, A.B.; Pevzner, P.; Koonin, E.V. Thousands of Previously Unknown Phages Discovered in Whole-Community Human Gut Metagenomes. Microbiome 2021, 9, 78.
  26. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596, 583–589.
  27. Download—NCBI. Available online: https://www.ncbi.nlm.nih.gov/home/download/ (accessed on 3 November 2022).
  28. UniProt. Available online: https://www.uniprot.org/ (accessed on 25 May 2022).
  29. Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242.
  30. Delcher, A.L.; Bratke, K.A.; Powers, E.C.; Salzberg, S.L. Identifying Bacterial Genes and Endosymbiont DNA with Glimmer. Bioinformatics 2007, 23, 673–679.
  31. Gabler, F.; Nam, S.-Z.; Till, S.; Mirdita, M.; Steinegger, M.; Söding, J.; Lupas, A.N.; Alva, V. Protein Sequence Analysis Using the MPI Bioinformatics Toolkit. Curr. Protoc. Bioinforma. 2020, 72, e108.
  32. Baek, M.; DiMaio, F.; Anishchenko, I.; Dauparas, J.; Ovchinnikov, S.; Lee, G.R.; Wang, J.; Cong, Q.; Kinch, L.N.; Schaeffer, R.D.; et al. Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network. Science 2021, 373, 871–876.
  33. Robetta Structure Prediction Service. Available online: https://robetta.bakerlab.org/ (accessed on 3 November 2022).
  34. Hiranuma, N.; Park, H.; Baek, M.; Anishchenko, I.; Dauparas, J.; Baker, D. Improved Protein Structure Refinement Guided by Deep Learning Based Accuracy Estimation. Nat. Commun. 2021, 12, 1340.
  35. PyMOL|Pymol.Org. Available online: https://pymol.org/2/ (accessed on 11 November 2021).
  36. Holm, L. Dali Server: Structural Unification of Protein Families. Nucleic Acids Res. 2022, 50, W210–W215.
  37. Dong, R.; Peng, Z.; Zhang, Y.; Yang, J. MTM-Align: An Algorithm for Fast and Accurate Multiple Protein Structure Alignment. Bioinformatics 2018, 34, 1719–1725.
  38. PHYLIP. Home Page. Available online: https://evolution.genetics.washington.edu/phylip/ (accessed on 13 March 2022).
  39. Sievers, F.; Wilm, A.; Dineen, D.; Gibson, T.J.; Karplus, K.; Li, W.; Lopez, R.; McWilliam, H.; Remmert, M.; Söding, J.; et al. Fast, Scalable Generation of High-Quality Protein Multiple Sequence Alignments Using Clustal Omega. Mol. Syst. Biol. 2011, 7, 539.
  40. Katoh, K.; Standley, D.M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 2013, 30, 772–780.
  41. Edgar, R.C. MUSCLE: Multiple Sequence Alignment with High Accuracy and High Throughput. Nucleic Acids Res. 2004, 32, 1792–1797.
  42. Kozlov, A.M.; Darriba, D.; Flouri, T.; Morel, B.; Stamatakis, A. RAxML-NG: A Fast, Scalable and User-Friendly Tool for Maximum Likelihood Phylogenetic Inference. Bioinformatics 2019, 35, 4453–4455.
  43. Edler, D.; Klein, J.; Antonelli, A.; Silvestro, D. RaxmlGUI 2.0: A Graphical Interface and Toolkit for Phylogenetic Analyses Using RAxML. Methods Ecol. Evol. 2021, 12, 373–377.
  44. Darriba, D.; Posada, D.; Kozlov, A.M.; Stamatakis, A.; Morel, B.; Flouri, T. ModelTest-NG: A New and Scalable Tool for the Selection of DNA and Protein Evolutionary Models. Mol. Biol. Evol. 2020, 37, 291–294.
  45. Lemoine, F.; Domelevo Entfellner, J.-B.; Wilkinson, E.; Correia, D.; Dávila Felipe, M.; De Oliveira, T.; Gascuel, O. Renewing Felsenstein’s Phylogenetic Bootstrap in the Era of Big Data. Nature 2018, 556, 452–456.
  46. Huerta-Cepas, J.; Serra, F.; Bork, P. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol. Biol. Evol. 2016, 33, 1635–1638.
  47. Robinson, D.F.; Foulds, L.R. Comparison of Phylogenetic Trees. Math. Biosci. 1981, 53, 131–147.
  48. Moraru, C.; Varsani, A.; Kropinski, A.M. VIRIDIC—A Novel Tool to Calculate the Intergenomic Similarities of Prokaryote-Infecting Viruses. Viruses 2020, 12, 1268.
  49. Simmonds, P.; Aiewsakun, P. Virus Classification—Where Do You Draw the Line? Arch. Virol. 2018, 163, 2037–2046.
  50. Aiewsakun, P.; Adriaenssens, E.M.; Lavigne, R.; Kropinski, A.M.; Simmonds, P. Evaluation of the Genomic Diversity of Viruses Infecting Bacteria, Archaea and Eukaryotes Using a Common Bioinformatic Platform: Steps towards a Unified Taxonomy. J. Gen. Virol. 2018, 99, 1331–1343.
  51. Letunic, I.; Bork, P. Interactive Tree Of Life (ITOL) v5: An Online Tool for Phylogenetic Tree Display and Annotation. Nucleic Acids Res. 2021, 49, W293–W296.
  52. Tarakanov, R.I.; Lukianova, A.A.; Evseev, P.V.; Pilik, R.I.; Tokmakova, A.D.; Kulikov, E.E.; Toshchakov, S.V.; Ignatov, A.N.; Dzhalilov, F.S.-U.; Miroshnikov, K.A. Ayka, a Novel Curtobacterium Bacteriophage, Provides Protection against Soybean Bacterial Wilt and Tan Spot. Int. J. Mol. Sci. 2022, 23, 10913.
  53. Krylov, V.N.; Zhazykov, I.Z. Pseudomonas bacteriophage phiKZ--possible model for studying the genetic control of morphogenesis. Genetika 1978, 14, 678–685.
  54. Mesyanzhinov, V.V.; Robben, J.; Grymonprez, B.; Kostyuchenko, V.A.; Bourkaltseva, M.V.; Sykilinda, N.N.; Krylov, V.N.; Volckaert, G. The Genome of Bacteriophage ΦKZ of Pseudomonas Aeruginosa. J. Mol. Biol. 2002, 317, 1–19.
  55. Kristensen, D.M.; Cai, X.; Mushegian, A. Evolutionarily Conserved Orthologous Families in Phages Are Relatively Rare in Their Prokaryotic Hosts. J. Bacteriol. 2011, 193, 1806–1814.
  56. Al-Shayeb, B.; Sachdeva, R.; Chen, L.-X.; Ward, F.; Munk, P.; Devoto, A.; Castelle, C.J.; Olm, M.R.; Bouma-Gregson, K.; Amano, Y.; et al. Clades of Huge Phages from across Earth’s Ecosystems. Nature 2020, 578, 425–431.
  57. Juhala, R.J.; Ford, M.E.; Duda, R.L.; Youlton, A.; Hatfull, G.F.; Hendrix, R.W. Genomic Sequences of Bacteriophages HK97 and HK022: Pervasive Genetic Mosaicism in the Lambdoid Bacteriophages. J. Mol. Biol. 2000, 299, 27–51.
  58. Fokine, A.; Rossmann, M.G. Common Evolutionary Origin of Procapsid Proteases, Phage Tail Tubes, and Tubes of Bacterial Type VI Secretion Systems. Structure 2016, 24, 1928–1935.
  59. Chang, J.R.; Spilman, M.S.; Rodenburg, C.M.; Dokland, T. Functional Domains of the Bacteriophage P2 Scaffolding Protein: Identification of Residues Involved in Assembly and Protease Activity. Virology 2009, 384, 144–150.
  60. Duda, R.L.; Oh, B.; Hendrix, R.W. Functional Domains of the HK97 Capsid Maturation Protease and the Mechanisms of Protein Encapsidation. J. Mol. Biol. 2013, 425, 2765–2781.
  61. Duda, R.L.; Hempel, J.; Michel, H.; Shabanowitz, J.; Hunt, D.; Hendrix, R.W. Structural Transitions during Bacteriophage HK97 Head Assembly. J. Mol. Biol. 1995, 247, 618–635. [Google Scholar] [CrossRef] [PubMed]
  62. Helgstrand, C.; Wikoff, W.R.; Duda, R.L.; Hendrix, R.W.; Johnson, J.E.; Liljas, L. The Refined Structure of a Protein Catenane: The HK97 Bacteriophage Capsid at 3.44 A Resolution. J. Mol. Biol. 2003, 334, 885–899. [Google Scholar] [CrossRef] [PubMed]
  63. Davis, C.R.; Backos, D.; Morais, M.C.; Churchill, M.E.A.; Catalano, C.E. Characterization of a Primordial Major Capsid-Scaffolding Protein Complex in Icosahedral Virus Shell Assembly. J. Mol. Biol. 2022, 434, 167719. [Google Scholar] [CrossRef]
  64. Fokine, A.; Leiman, P.G.; Shneider, M.M.; Ahvazi, B.; Boeshans, K.M.; Steven, A.C.; Black, L.W.; Mesyanzhinov, V.V.; Rossmann, M.G. Structural and Functional Similarities between the Capsid Proteins of Bacteriophages T4 and HK97 Point to a Common Ancestry. Proc. Natl. Acad. Sci. USA 2005, 102, 7163–7168. [Google Scholar] [CrossRef] [Green Version]
  65. Fang, Q.; Tang, W.-C.; Fokine, A.; Mahalingam, M.; Shao, Q.; Rossmann, M.G.; Rao, V.B. Structures of a Large Prolate Virus Capsid in Unexpanded and Expanded States Generate Insights into the Icosahedral Virus Assembly. Proc. Natl. Acad. Sci. USA 2022, 119, e2203272119. [Google Scholar] [CrossRef]
  66. Guo, F.; Liu, Z.; Fang, P.-A.; Zhang, Q.; Wright, E.T.; Wu, W.; Zhang, C.; Vago, F.; Ren, Y.; Jakana, J.; et al. Capsid Expansion Mechanism of Bacteriophage T7 Revealed by Multistate Atomic Models Derived from Cryo-EM Reconstructions. Proc. Natl. Acad. Sci. USA 2014, 111, E4606–E4614. [Google Scholar] [CrossRef] [Green Version]
  67. Duda, R.L.; Teschke, C.M. The Amazing HK97 Fold: Versatile Results of Modest Differences. Curr. Opin. Virol. 2019, 36, 9–16. [Google Scholar] [CrossRef]
  68. Ahi, Y.S.; Vemula, S.V.; Hassan, A.O.; Costakes, G.; Stauffacher, C.; Mittal, S.K. Adenoviral L4 33K Forms Ring-like Oligomers and Stimulates ATPase Activity of IVa2: Implications in Viral Genome Packaging. Front. Microbiol. 2015, 6, 318. [Google Scholar] [CrossRef] [PubMed]
  69. Hilbert, B.J.; Hayes, J.A.; Stone, N.P.; Xu, R.-G.; Kelch, B.A. The Large Terminase DNA Packaging Motor Grips DNA with Its ATPase Domain for Cleavage by the Flexible Nuclease Domain. Nucleic Acids Res. 2017, 45, 3591–3605. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  70. Zhao, H.; Christensen, T.E.; Kamau, Y.N.; Tang, L. Structures of the Phage Sf6 Large Terminase Provide New Insights into DNA Translocation and Cleavage. Proc. Natl. Acad. Sci. USA 2013, 110, 8075–8080. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  71. Meijer, W.J.J.; Horcajadas, J.A.; Salas, M. Φ29 Family of Phages. Microbiol. Mol. Biol. Rev. 2001, 65, 261–287. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  72. Kwan, T.; Liu, J.; DuBow, M.; Gros, P.; Pelletier, J. The Complete Genomes and Proteomes of 27 Staphylococcus Aureus Bacteriophages. Proc. Natl. Acad. Sci. USA 2005, 102, 5174–5179. [Google Scholar] [CrossRef] [Green Version]
  73. Ha, E.; Son, B.; Ryu, S. Clostridium Perfringens Virulent Bacteriophage CPS2 and Its Thermostable Endolysin LysCPS2. Viruses 2018, 10, 251. [Google Scholar] [CrossRef] [Green Version]
  74. Tétart, F.; Desplats, C.; Kutateladze, M.; Monod, C.; Ackermann, H.-W.; Krisch, H.M. Phylogeny of the Major Head and Tail Genes of the Wide-Ranging T4-Type Bacteriophages. J. Bacteriol. 2001, 183, 358–366. [Google Scholar] [CrossRef] [Green Version]
  75. Adriaenssens, E.M.; Vaerenbergh, J.V.; Vandenheuvel, D.; Dunon, V.; Ceyssens, P.-J.; Proft, M.D.; Kropinski, A.M.; Noben, J.-P.; Maes, M.; Lavigne, R. T4-Related Bacteriophage LIMEstone Isolates for the Control of Soft Rot on Potato Caused by ‘Dickeya Solani’. PLoS ONE 2012, 7, e33227. [Google Scholar] [CrossRef] [Green Version]
  76. Sullivan, M.B.; Huang, K.H.; Ignacio-Espinoza, J.C.; Berlin, A.M.; Kelly, L.; Weigele, P.R.; DeFrancesco, A.S.; Kern, S.E.; Thompson, L.R.; Young, S.; et al. Genomic Analysis of Oceanic Cyanobacterial Myoviruses Compared with T4-like Myoviruses from Diverse Hosts and Environments. Environ. Microbiol. 2010, 12, 3035–3056. [Google Scholar] [CrossRef] [Green Version]
  77. Green, J.; Rahman, F.; Saxton, M.; Williamson, K. Metagenomic Assessment of Viral Diversity in Lake Matoaka, a Temperate, Eutrophic Freshwater Lake in Southeastern Virginia, USA. Aquat. Microb. Ecol. 2015, 75, 117–128. [Google Scholar] [CrossRef]
  78. Bartlau, N.; Wichels, A.; Krohne, G.; Adriaenssens, E.M.; Heins, A.; Fuchs, B.M.; Amann, R.; Moraru, C. Highly Diverse Flavobacterial Phages Isolated from North Sea Spring Blooms. ISME J. 2022, 16, 555–568. [Google Scholar] [CrossRef] [PubMed]
  79. Dutilh, B.E.; Cassman, N.; McNair, K.; Sanchez, S.E.; Silva, G.G.Z.; Boling, L.; Barr, J.J.; Speth, D.R.; Seguritan, V.; Aziz, R.K.; et al. A Highly Abundant Bacteriophage Discovered in the Unknown Sequences of Human Faecal Metagenomes. Nat. Commun. 2014, 5, 4498. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  80. Yutin, N.; Makarova, K.S.; Gussow, A.B.; Krupovic, M.; Segall, A.; Edwards, R.A.; Koonin, E.V. Discovery of an Expansive Bacteriophage Family That Includes the Most Abundant Viruses from the Human Gut. Nat. Microbiol. 2018, 3, 38–46. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  81. Phothaworn, P.; Dunne, M.; Supokaivanich, R.; Ong, C.; Lim, J.; Taharnklaew, R.; Vesaratchavest, M.; Khumthong, R.; Pringsulaka, O.; Ajawatanawong, P.; et al. Characterization of Flagellotropic, Chi-Like Salmonella Phages Isolated from Thai Poultry Farms. Viruses 2019, 11, 520. [Google Scholar] [CrossRef] [Green Version]
  82. Lederberg, E.M.; Lederberg, J. Genetic Studies of Lysogenicity in Escherichia Coli. Genetics 1953, 38, 51–64. [Google Scholar] [CrossRef]
  83. Castillo, D.; Middelboe, M. Genomic Diversity of Bacteriophages Infecting the Fish Pathogen Flavobacterium Psychrophilum. FEMS Microbiol. Lett. 2016, 363, fnw272. [Google Scholar] [CrossRef] [Green Version]
  84. Krupovic, M.; ICTV Report Consortium. ICTV Virus Taxonomy Profile: Plasmaviridae. J. Gen. Virol. 2018, 99, 617–618.
  85. Steven, A.C.; Greenstone, H.L.; Booy, F.P.; Black, L.W.; Ross, P.D. Conformational Changes of a Viral Capsid Protein. Thermodynamic Rationale for Proteolytic Regulation of Bacteriophage T4 Capsid Expansion, Co-Operativity, and Super-Stabilization by Soc Binding. J. Mol. Biol. 1992, 228, 870–884.
  86. Bowman, B.R.; Baker, M.L.; Rixon, F.J.; Chiu, W.; Quiocho, F.A. Structure of the Herpesvirus Major Capsid Protein. EMBO J. 2003, 22, 757–765.
  87. Hark Gan, H.; Perlow, R.A.; Roy, S.; Ko, J.; Wu, M.; Huang, J.; Yan, S.; Nicoletta, A.; Vafai, J.; Sun, D.; et al. Analysis of Protein Sequence/Structure Similarity Relationships. Biophys. J. 2002, 83, 2781–2791.
  88. Felsenstein, J. Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution 1985, 39, 783–791.
  89. Som, A. Causes, Consequences and Solutions of Phylogenetic Incongruence. Brief. Bioinform. 2015, 16, 536–548.
  90. Kubatko, L.S.; Degnan, J.H. Inconsistency of Phylogenetic Estimates from Concatenated Data under Coalescence. Syst. Biol. 2007, 56, 17–24.
  91. Thiergart, T.; Landan, G.; Martin, W.F. Concatenated Alignments and the Case of the Disappearing Tree. BMC Evol. Biol. 2014, 14, 266.
  92. Vakulenko, Y.; Deviatkin, A.; Drexler, J.F.; Lukashev, A. Modular Evolution of Coronavirus Genomes. Viruses 2021, 13, 1270.
  93. McGeoch, D.J.; Rixon, F.J.; Davison, A.J. Topics in Herpesvirus Genomics and Evolution. Virus Res. 2006, 117, 90–104.
  94. Davison, A.J. Evolution of the Herpesviruses. Vet. Microbiol. 2002, 86, 69–88.
  95. Evseev, P.; Shneider, M.; Miroshnikov, K. Evolution of Phage Tail Sheath Protein. Viruses 2022, 14, 1148.
  96. McGeoch, D.J.; Dolan, A.; Ralph, A.C. Toward a Comprehensive Phylogeny for Mammalian and Avian Herpesviruses. J. Virol. 2000, 74, 10401–10406.
  97. Koonin, E.V.; Yutin, N. Evolution of the Large Nucleocytoplasmic DNA Viruses of Eukaryotes and Convergent Origins of Viral Gigantism. Adv. Virus Res. 2019, 103, 167–202.
  98. Subramaniam, K.; Behringer, D.C.; Bojko, J.; Yutin, N.; Clark, A.S.; Bateman, K.S.; van Aerle, R.; Bass, D.; Kerr, R.C.; Koonin, E.V.; et al. A New Family of DNA Viruses Causing Disease in Crustaceans from Diverse Aquatic Biomes. mBio 2020, 11, e02938-19.
  99. Weigel, C.; Seitz, H. Bacteriophage Replication Modules. FEMS Microbiol. Rev. 2006, 30, 321–381.
  100. Koonin, E.V.; Dolja, V.V. Virus World as an Evolutionary Network of Viruses and Capsidless Selfish Elements. Microbiol. Mol. Biol. Rev. MMBR 2014, 78, 278–303.
  101. Yutin, N.; Raoult, D.; Koonin, E.V. Virophages, Polintons, and Transpovirons: A Complex Evolutionary Network of Diverse Selfish Genetic Elements with Different Reproduction Strategies. Virol. J. 2013, 10, 158.
Figure 1. Schematic representation of the morphology of phages belonging to the former Caudovirales order, namely, the Myoviridae, Podoviridae, and Siphoviridae families, along with the evolutionarily related herpesviruses wrapped in the tegument, which is depicted as a circle. Viral capsids are coloured in slate, while tails and portal–vertex complexes are purple.
Figure 2. Structural models of the full-length products of the genes encoding the major capsid proteins in the genomes of representative viruses (Table 1), obtained with AlphaFold. The N-terminus is labelled with a blue circle and the C-terminus with a yellow circle. Other predicted MCP structures are shown in Supplementary Figure S2.
Figure 3. (a) The HK97 fold and its common features, coloured as indicated in the figure (according to Duda et al. [67]). The diagram is of the mature HK97 capsid (PDB code 1OHG_A). (b) Superimposition of AF models (depicted in teal) and RCSB PDB structures (depicted in red) belonging to the same viruses, with the corresponding RMSD values: 1OHG (phage HK97, the mature capsid), 7SJ5 (λ, major capsid protein mutant in the pre-assembly conformation), 5VF3 (T4, mutant MCP in the isometric capsid) and 3J7V (T7).
Figure 4. Structural models of the full-length products of the genes encoding the ATPase subunit of terminase in the genomes of representative viruses (Table 1), obtained with AlphaFold. The N-terminus is labelled with a blue circle and the C-terminus with a yellow circle. Other predicted terminase structures are shown in Supplementary Figure S3.
Figure 5. (a) Ribbon diagram of the phage HK97 large subunit of terminase (PDB code 6Z6D), coloured based on a rainbow gradient scheme, where the N-terminus of the polypeptide chain is coloured blue and the C-terminus is coloured red. (b) Superimposition of AF models (depicted in teal) and RCSB PDB structures (depicted in red), belonging to the same viruses: 6Z6D (phage HK97, using the whole model and separated ATPase and nuclease domains), 3CPE (phage T4, using the whole model) and corresponding RMSD values.
Figure 6. Comparison of the overall accuracy of the predictions, expressed as the Local Distance Difference Test (lDDT) score estimated with the DeepAccNet accuracy predictor. MCP_RoseTTAFold: RoseTTAFold models of the MCP; MCP_AF2: AlphaFold models of the MCP; Ter_AF2: models of the terminase ATPase subunits predicted with AlphaFold; ATPase_AF2: models of the ATPase domain of the terminase ATPase subunits predicted with AlphaFold.
Figure 7. Matrix (a) and dendrogram (b) based on the pairwise Z-score comparisons of 57 major capsid proteins and encapsulin AF models, using DALI. The branch lengths are measured using the DALI Z-score and the tree was rooted to encapsulin. “A”—archaeal viruses, “E”—eukaryotic viruses, “+”—phages infecting Gram-positive bacteria, and “-”—phages infecting Gram-negative bacteria.
Figure 8. Matrix (a) and dendrogram (b) based on the pairwise Z-score comparisons of 58 AF models of the ATPase subunit of terminase, including an Adenoviridae terminase, using DALI. The branch lengths are measured using the DALI Z-score and the tree was rooted to Adenoviridae.
Figure 9. Matrix (a) and dendrogram (b) based on the pairwise Z-score comparisons with DALI, using 57 ATPase domain structures extracted from TerL AF models including an Adenoviridae terminase. The branch lengths are measured using the DALI Z-score and the tree was rooted to Adenoviridae.
Figure 10. Best-scoring ML phylogenetic tree constructed with 57 amino acid sequences of the major capsid protein and an encapsulin, aligned with Clustal Omega. The scale bar shows 0.5 estimated substitutions per site and the tree was rooted to encapsulin.
Figure 11. Best-scoring ML phylogenetic tree constructed with 57 amino acid sequences of ATPase subunits of Heunggongvirae terminases and an Adenoviridae terminase, aligned with Clustal Omega. The numbers near the tree branches indicate the TBE support. The total number of bootstrap trees was 1000. The scale bar shows 0.5 estimated substitutions per site and the tree was rooted to Adenoviridae.
Figure 12. Comparison of the topological congruence of trees obtained using structural alignments and different amino acid sequence alignments, with the normalised Robinson–Foulds (RF) scores shown as matrices. The designation “mTM primary” means that the tree was constructed using the alignment of amino acid sequences produced by mTM-align; the designation “mTM structure” means that the tree was constructed using structural similarity as measured with TM-scores.
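As an illustration of the normalised RF score mentioned in the caption of Figure 12, the following minimal sketch computes the Robinson–Foulds distance between two unrooted trees over the same leaves and divides it by 2(n − 3), the maximum for fully resolved trees with n leaves. It is not the authors' pipeline: the nested-tuple tree representation, the function names and the choice of normalisation are assumptions made for this example, and the software used in the study may normalise differently.

```python
from itertools import chain


def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits (bipartitions) of a tree read as unrooted."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed leaf used to give each split one canonical side
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset()
        for child in node:
            leaves |= visit(child)
        # A split is non-trivial when both sides contain at least two leaves.
        if 1 < len(leaves) < len(all_leaves) - 1:
            side = leaves if anchor not in leaves else all_leaves - leaves
            splits.add(side)
        return leaves

    visit(tree)
    return splits


def normalised_rf(tree_a, tree_b):
    """RF distance divided by 2 * (n - 3); assumed normalisation for binary trees."""
    leaves = leaf_set(tree_a)
    if leaves != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    n = len(leaves)
    if n < 4:
        raise ValueError("at least four leaves are needed")
    rf = len(bipartitions(tree_a) ^ bipartitions(tree_b))
    return rf / (2 * (n - 3))


# Hypothetical five-leaf trees, used only to exercise the function.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
score = normalised_rf(tree_1, tree_2)
```

A score of 0 means that the two topologies share all non-trivial splits, and a score of 1 means that they share none, which is how a matrix of such scores can be read as a congruence summary.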
Table 1. List of Duplodnaviria Heunggongvirae representative viruses taken for the analyses and general genomic features. Eukaryotic viruses are coloured blue, bacterial viruses are coloured green, and archaeal viruses are coloured yellow.
Species | Original Name | ICTV Taxonomy | Genome Size, b.p. | GC Content, % | NCBI Accession
Ranid herpesvirus 1 | Lucke tumor herpesvirus-ranid herpesvirus 1 | Herpesvirales; Alloherpesviridae | 220,859 | 54.6 | DQ665917.1
Human alphaherpesvirus 1 | Human herpesvirus 1 strain 17 | Herpesvirales; Herpesviridae | 152,222 | 68.3 | JN555585.1
Haliotid herpesvirus 1 | Abalone herpesvirus Victoria/AUS/2009 | Herpesvirales; Malacoherpesviridae | 211,518 | 46.8 | JX453331.1
Curtobacterium phage Ayka | Curtobacterium phage Ayka | not classified | 18,400 | 52.6 | ON381767.1
LacPavin | LacPavin_0818_WC45 | not classified | 735,411 | 32.2 | LR756501.1
Pseudomonas phage MD8 | Pseudomonas phage MD8 | not classified | 43,277 | 61.1 | KX198612.1
Limestonevirus limestone | Dickeya phage vB-DsoM-LIMEstone1 | Ackermannviridae | 152,427 | 49.3 | HE600015.1
Harrekavirus harreka | Olleya phage Harreka_1 | Aggregaviridae | 43,175 | 32.0 | MT732457.1
Cebadecemvirus phi10una | Cellulophaga phage phi10:1 | Assiduviridae | 53,664 | 31.5 | KC821618.1
Teseptimavirus T7 | Escherichia phage T7 | Autographiviridae | 39,937 | 48.4 | V01146.1
Chivirus chi | Salmonella phage χ | Casjensviridae | 59,407 | 56.5 | JX094499.1
Lambdavirus lambda | Escherichia phage λ | Lambdavirus | 48,502 | 49.9 | J02459.1
Suwonvirus PP101 | Pectobacterium phage PP101 | Chaseviridae | 53,333 | 44.9 | KY087898.2
Junduvirus communis | uncultured phage cr2_1 | Crassvirales; Crevaviridae | 95,815 | 32.7 | MZ130489.1
Jahgtovirus gastrointestinalis | uncultured phage cr36_1 | Crassvirales; Intestiviridae | 96,466 | 32.0 | MZ130479.1
Kahnovirus copri | uncultured phage cr44_1 | Crassvirales; Steigviridae | 93,564 | 35.8 | MZ130483.1
Afonbuvirus coli | uncultured phage cr35_1 | Crassvirales; Suoliviridae | 97,706 | 31.4 | MZ130499.1
Cetovirus ceto | Vibrio phage Ceto | Demerecviridae | 128,241 | 39.9 | MG649966.1
Donellivirus gee | Bacillus phage G | Donellivirus | 497,513 | 29.9 | JN638751.1
Tunavirus T1 | Escherichia phage T1 | Drexlerviridae | 48,836 | 45.6 | AY216660.1
Unahavirus uv1H | Flavobacterium phage 1H | Duneviridae | 39,290 | 31.4 | KU599889.1
Freyavirus freya | Polaribacter phage Freya_1 | Forsetiviridae | 43,978 | 28.9 | MT732463.1
Gregsiragusavirus CPS1 | Clostridium phage CPS1 | Guelinviridae | 19,089 | 28.3 | KY996523.1
Leefvirus Leef | Polaribacter phage Leef_1 | Helgolandviridae | 37,547 | 29.7 | MT732473.1
Byrnievirus HK97 | Escherichia phage HK97 | Hendrixvirinae | 39,732 | 49.8 | AF069529.1
Pecentumvirus P100 | Listeria phage P100 | Herelleviridae | 131,384 | 36.0 | DQ004855.1
Beejeyvirus BJ1 | Halorubrum virus BJ1 | Kirjokansivirales; Graaviviridae | 42,271 | 64.9 | AM419438.1
Retbasiphovirus HFTV1 | Haloferax tailed virus 1 | Kirjokansivirales; Haloferuviridae | 38,059 | 54.1 | MG550112.1
Hatrivirus HATV3 | Haloarcula tailed virus 3 | Kirjokansivirales; Pyrstoviridae | 42,293 | 51.1 | MZ334527.1
Lonfivirus HSTV1 | Haloarcula sinaiiensis tailed virus 1 | Kirjokansivirales; Shortaselviridae | 32,189 | 60.3 | KC117378.1
Bellamyvirus bellamy | Synechococcus phage Bellamy | Kyanoviridae | 204,930 | 41.1 | MF351863.1
Clampvirus HHTV1 | Haloarcula hispanica tailed virus 1 | Madisaviridae | 49,107 | 56.5 | KC292025.1
Pseudomonas virus Yua | Pseudomonas phage YuA | Mesyanzhinovviridae | 58,663 | 64.3 | AM749441.1
Metforvirus Drs3 | Methanobacterium virus Drs3 | Methanobavirales; Anaerodiviridae | 37,129 | 41.2 | MH674343.1
Psimunavirus psiM2 | Methanobacterium phage psiM2 | Methanobavirales; Leisingerviridae | 26,111 | 46.3 | AF065411.1
Mollyvirus colly | Maribacter phage Colly_1 | Molycolviridae | 124,169 | 36.3 | MT732450.1
Noahvirus arc | Bacteriophage DSS3_VP1 | Naomviridae | 75,087 | 47.5 | MN602266.1
Bonaevitae bonaevitae | Microbacterium phage BonaeVitae | Orlajensenviridae | 17,451 | 68.2 | MH045556.1
Bacelvirus phi46tres | Cellulophaga phage phi46:3 | Pachyviridae | 72,961 | 32.7 | KC821622.1
Peduovirus P2 | Bacteriophage P2 | Peduoviridae; Peduovirus | 33,593 | 50.2 | AF063097.1
Callevirus Calle | Cellulophaga phage Calle_1 | Pervagoviridae | 72,979 | 38.1 | MT732432.1
Phikzvirus phiKZ | Pseudomonas phage phiKZ | Phikzvirus | 280,334 | 36.8 | AF399011.1
Rosenblumvirus rv66 | Bacteriophage 66 | Rountreeviridae | 18,199 | 29.3 | AY954949.1
Salasvirus phi29 | Bacillus phage phi29 | Salasmaviridae | 19,282 | 40.0 | EU771092.1
Halohivirus HHTV2 | Haloarcula hispanica tailed virus 2 | Saparoviridae | 52,643 | 66.6 | KC292024.1
Enquatrovirus N4 | Escherichia phage N4 | Schitoviridae | 70,153 | 41.3 | EF056009.1
Tequatrovirus T4 | Escherichia phage T4 | Straboviridae | 168,903 | 35.3 | AF158101.6
Pormufvirus HRTV28 | Halorubrum tailed virus 28 isolate HRTV-28/28 | Suolaviridae | 35,270 | 64.3 | MZ334528.1
Hacavirus HCTV1 | Haloarcula californiae tailed virus 1 | Thumleimavirales; Druskaviridae | 103,257 | 57.0 | KC292029.1
Haloferacalesvirus HF1 | Halophage HF1 | Thumleimavirales; Hafunaviridae | 75,898 | 55.8 | AY190604.2
Hagravirus HGTV1 | Halogranum tailed virus 1 | Thumleimavirales; Halomagnusviridae | 143,855 | 50.4 | KC292026.1
Eilatmyovirus HATV2 | Haloarcula tailed virus 2 | Thumleimavirales; Soleiviridae | 63,301 | 49.7 | MZ334525.1
Myohalovirus phiH | Halobacterium phage phiH | Vertoviridae | 58,072 | 63.7 | MK002701.1
Bromdenvirus bromden | Mycobacterium phage Bromden | Vilmaviridae | 70,183 | 58.2 | MH576973.1
Peternellavirus peternella | Winogradskyella phage Peternella_1 | Winoviridae | 39,649 | 35.4 | MT732475.1
Foxborovirus foxboro | Gordonia phage Foxboro | Zierdtviridae | 67,773 | 65.8 | MH727547.1
Siovirus americense | Roseobacter phage SIO1 | Zobellviridae | 39,898 | 46.2 | AF189021.1
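The GC content column of Table 1 is the percentage of G and C bases in a genome. The sketch below shows one way such a value could be computed from a nucleotide string. The function name is hypothetical, and skipping ambiguous bases such as N is an assumption, since the source does not state how the reported values were calculated.

```python
def gc_content(sequence: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted;
    ambiguity codes such as N are left out of both numerator and denominator.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequence for illustration only; it is not taken from any genome in Table 1.
example_gc = round(gc_content("ATGCGCNNAT"), 1)
```

For a real genome, the sequence would first be read from the GenBank or FASTA record listed under the NCBI accession.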
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Evseev, P.; Gutnik, D.; Shneider, M.; Miroshnikov, K. Use of an Integrated Approach Involving AlphaFold Predictions for the Evolutionary Taxonomy of Duplodnaviria Viruses. Biomolecules 2023, 13, 110. https://doi.org/10.3390/biom13010110