*Article* **Exploring Codon Adjustment Strategies towards *Escherichia coli*-Based Production of Viral Proteins Encoded by HTH1, a Novel Prophage of the Marine Bacterium *Hypnocyclicus thermotrophus***

**Hasan Arsın <sup>1,2,\*</sup>, Andrius Jasilionis <sup>3</sup>, Håkon Dahle <sup>2,4</sup>, Ruth-Anne Sandaa <sup>1</sup>, Runar Stokke <sup>1,2</sup>, Eva Nordberg Karlsson <sup>3</sup> and Ida Helene Steen <sup>1,2,\*</sup>**


**Citation:** Arsın, H.; Jasilionis, A.; Dahle, H.; Sandaa, R.-A.; Stokke, R.; Nordberg Karlsson, E.; Steen, I.H. Exploring Codon Adjustment Strategies towards *Escherichia coli*-Based Production of Viral Proteins Encoded by HTH1, a Novel Prophage of the Marine Bacterium *Hypnocyclicus thermotrophus*. *Viruses* **2021**, *13*, 1215. https://doi.org/10.3390/v13071215

Academic Editors: Carla Varanda and Patrick Materatski

Received: 21 May 2021; Accepted: 18 June 2021; Published: 23 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Abstract:** Marine viral sequence space is immense and presents a promising resource for the discovery of new enzymes of interest for research and biotechnology. However, bottlenecks in the functional annotation of viral genes and in the soluble heterologous production of proteins hinder downstream characterization, impeding the discovery process. Although codon adjustment approaches are commonly used for the heterologous expression of prokaryotic genes, they have not been fully explored for viral genes. Herein, we report the sequence-based identification of a putative prophage within the genome of *Hypnocyclicus thermotrophus*, a Gram-negative, moderately thermophilic bacterium isolated from the Seven Sisters hydrothermal vent field. A prophage-associated gene cluster consisting of 46 protein-coding genes was identified and given the proposed name *Hypnocyclicus thermotrophus* phage H1 (HTH1). HTH1 was taxonomically assigned to the viral family *Siphoviridae* by lowest common ancestor analysis of its genome and by phylogenetic analyses of the proteins predicted as holin and DNA polymerase. The gene neighbourhood around the HTH1 lytic cassette was most similar to those of viruses infecting Gram-positive bacteria. In the HTH1 lytic cassette, an N-acetylmuramoyl-L-alanine amidase (Amidase\_2) with a peptidoglycan binding motif (LysM) was identified. A total of nine genes, encoding enzymes putatively related to lysis and nucleic acid modification as well as proteins of unknown function, were subjected to heterologous expression in *Escherichia coli*. Codon optimization and codon harmonization approaches were applied in parallel to compare their effects on the produced proteins. Comparison of protein yields and thermostability showed that codon optimization yielded higher levels of soluble protein, whereas codon harmonization led to proteins with higher thermostability, implying higher folding quality.
Altogether, our study suggests that both codon optimization and codon harmonization are valuable approaches for successful heterologous expression of viral genes in *E. coli*, but codon harmonization may be preferable for obtaining recombinant viral proteins of higher folding quality.

**Keywords:** prophage; hydrothermal vent; *Hypnocyclicus thermotrophus*; lytic cassette; *Escherichia coli*; heterologous expression; codon optimization; codon harmonization

#### **1. Introduction**

Hydrothermal vents host some of the most diverse microbial communities in marine environments. Diverse (hyper)thermophilic bacteria and archaea grow within the steep chemical and temperature gradients formed by rapid mixing of high-temperature (reaching above 300 °C) reduced vent fluids and cold seawater [1,2]. The discovery of the hydrothermal vent ecosystem, marked by the first vent observation on the Galápagos Rift in the eastern Pacific [3] and the discovery of the first black smoker vents [4], remains one of the biggest breakthroughs in our understanding of how life can be sustained in extreme conditions. Today, hydrothermal vents are well known as attractive sites for bioprospecting of biotechnologically interesting enzymes [5–8] and other valuable biomolecules with potential industrial applications [6,9,10]. As in other marine biomes [11–13], hydrothermal vent environments are rich in viruses, especially tailed dsDNA bacteriophages of the order *Caudovirales* [14,15]. These viruses remain a largely unexplored space of genetic diversity and, therefore, an under-utilized source for enzyme bioprospecting efforts [16,17].

The unique biology of host-reliant viral replication makes viruses remarkably interesting entities for biotechnology, as lytic enzymes associated with their host infection strategy can be found among them [18,19]. While lytic phages reproduce by lysing the host cell, lysogenic or temperate phages can remain dormant until induction, either as so-called "prophages" integrated into the host genome or as extrachromosomal elements [20,21]. Temperate phages have been reported to be particularly prevalent in the microbial communities associated with vent fields [15,22], likely because of challenging environmental factors such as low host abundance, limited nutrient availability and the extreme physical and chemical conditions at these sites. In addition, the viral genes made available to the host via lysogeny may produce fitness-enhancing phenotypes, increasing host resilience in these environments [23–25].

The small minority of bacteriophages studied to date has yielded numerous biotechnologically important enzymes. Notable examples include enzymes acting on nucleic acids, such as the DNA polymerases and DNA ligases from bacteriophages T4 [26,27] and T7 [28–30], and the exonuclease from bacteriophage T5 [31]. Furthermore, lytic enzymes such as endolysins, which phages naturally use to degrade bacterial cell walls, are of increasing interest as bactericidal agents [32–35] and have been tested in phage therapy trials [36,37]. Many of these viruses were studied as isolates, and they provide a glimpse of the similar discoveries possible within the vast viral sequence space of marine environments [13,16].

To study newly discovered viral enzymes of potential biotechnological interest, molecular cloning and heterologous expression are required to produce the enzymes in the amounts needed for characterization experiments. However, studies of the heterologous expression of viral genes from marine metagenomes have been extremely limited [38]. Extending knowledge in this field has therefore been a major task of the project Virus-X (Viral Metagenomics for Innovation Value), which aims to identify and characterize novel enzymes and other proteins from bacteriophages and archaeal viruses. To date, only a few studies describing the expression of viral genes from environmental marine resources have been reported [39,40]. For the heterologous production of most proteins, *Escherichia coli* remains a desirable host owing to its ease of use, short generation time and wide genetic toolkit of cloning and expression vectors [41]. However, *E. coli* presents certain well-documented challenges for soluble protein production when expressing genes from distantly related sources [41,42]. Furthermore, differences in codon usage bias between the native organism and *E. coli* often result in mismatches in tRNA availability, adversely affecting protein expression efficiency [43,44].

Numerous approaches exist to increase the soluble yields of recombinant proteins in *E. coli*. The use of various fusion protein tags has been a popular and effective way to improve soluble yields for many years [45–48]. Adjustments at the level of codon usage have also been described for improving soluble protein expression, initially as "codon optimization" [49] and later as "codon harmonization" [50]. Both approaches modify the codons in the DNA sequence of the target gene prior to expression, so that it encodes the same polypeptide while using codons suited to the tRNA pool of the expression host. The difference between the approaches can be summarized as follows: codon optimization substitutes rare codons in the native gene sequence with those most frequently used in the heterologous host, potentially allowing high-speed protein production, whereas codon harmonization aims to reproduce the native translational rhythm in the host, potentially favouring correct protein folding during expression. While codon optimization has been widely demonstrated to have some degree of success in expressing genes from a diverse range of native hosts [48,51], including viruses [52–54], codon harmonization is a more recent approach and, to our knowledge, has not yet been explored for the expression of viral genes in *E. coli*.
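To make the contrast concrete, the sketch below implements both strategies in their simplest textbook form on a toy two-amino-acid codon table. The usage values and the functions `optimize` and `harmonize` are hypothetical: optimization is reduced to "always pick the host's most frequent synonymous codon", and harmonization is reduced to matching each codon's usage rank among its synonyms. Real tools such as GenSmart or the Codon Harmonizer use full codon usage tables and additional rules, so this is only an illustration of the principle.

```python
from collections import defaultdict

# Hypothetical codon usage (per 1000 codons) for two amino acids only.
# Illustrative numbers, NOT measured values for any real organism.
NATIVE_USAGE = {"GCT": 30.0, "GCC": 5.0, "GCA": 20.0, "GCG": 2.0,   # Ala
                "AAA": 60.0, "AAG": 10.0}                          # Lys
HOST_USAGE = {"GCT": 15.0, "GCC": 26.0, "GCA": 20.0, "GCG": 34.0,  # Ala
              "AAA": 34.0, "AAG": 10.0}                            # Lys
CODON_TO_AA = {"GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
               "AAA": "K", "AAG": "K"}


def synonymous_groups():
    """Group codons by the amino acid they encode."""
    groups = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        groups[aa].append(codon)
    return groups


def optimize(codons):
    """Simplest optimization: always use the host's most frequent synonym."""
    groups = synonymous_groups()
    return [max(groups[CODON_TO_AA[c]], key=HOST_USAGE.get) for c in codons]


def harmonize(codons):
    """Simplified harmonization: keep each codon's usage rank among its synonyms.

    A codon that is the 3rd most used in the native host becomes the 3rd
    most used synonym in the expression host, so rare native codons stay
    rare and the translational rhythm is roughly preserved.
    """
    groups = synonymous_groups()
    result = []
    for c in codons:
        synonyms = groups[CODON_TO_AA[c]]
        native_rank = sorted(synonyms, key=NATIVE_USAGE.get, reverse=True)
        host_rank = sorted(synonyms, key=HOST_USAGE.get, reverse=True)
        result.append(host_rank[native_rank.index(c)])
    return result


gene = ["GCT", "AAG", "GCC", "AAA"]  # Ala-Lys-Ala-Lys
print(optimize(gene))   # ['GCG', 'AAA', 'GCG', 'AAA']
print(harmonize(gene))  # ['GCG', 'AAG', 'GCA', 'AAA']
```

In this toy case the optimized variant maps every alanine to GCG, whereas the harmonized variant keeps the rare native alanine codon (GCC) rare in the host by replacing it with GCA, the host's third most used alanine codon.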

In this work, we report the first study of a temperate phage infecting *H. thermotrophus*, a free-living, Gram-negative, moderately thermophilic bacterium isolated from a microbial mat collected from the Seven Sisters hydrothermal vent field on the Arctic Mid-Ocean Ridge [55,56]. Within the phylum *Fusobacteria*, *Hypnocyclicus thermotrophus* IR-2<sup>T</sup> (=DSM 100055 =JCM 30901) is listed as the current type strain of the genus *Hypnocyclicus*. In addition to describing the identification, gene organization and taxonomic analysis of the prophage via in silico methods, we report our efforts to identify and recombinantly express genes potentially linked to lytic and nucleic acid-modifying enzymatic activities. To facilitate the soluble heterologous production of proteins in *E. coli*, we applied the codon optimization and harmonization approaches in parallel to a set of nine diverse enzyme candidates. Comparison of the proteins produced via these approaches revealed notable differences in their soluble yields and thermostability. Altogether, the combined strategy used herein presents a cohesive application of bioinformatics and molecular biology to improve access to the viral genetic diversity present in marine environments.

#### **2. Materials and Methods**

#### *2.1. Identification and Annotation of Prophage Genes*

The annotated genome assembly of the bacterium *H. thermotrophus* was downloaded from NCBI GenBank (RefSeq GCF\_004365575.1). Manual analysis of the genome indicated the presence of prophage genes. To further assess these putative prophage genes, the GenBank file of the assembly was uploaded to the PHASTER online tool (https://phaster.ca/, accessed on 10 December 2019) [57,58] and compared against the PHASTER prophage/virus database (last updated in August 2019). The analysis output described the genome region(s) containing the prophage genes, along with putative functional annotations. In addition to the annotations provided by NCBI and PHASTER, the HHpred server (https://toolkit.tuebingen.mpg.de/tools/hhpred, accessed on 1 December 2020) [59–61] and the eggNOG-Mapper online service (http://eggnog-mapper.embl.de, accessed on 12 December 2019) [62,63] were used for functional annotation of the prophage genes based on their amino acid sequences.

When using the HHpred server for the pairwise comparison of profile hidden Markov models (HMMs), the databases queried were PDB\_mmCIF70\_29\_Nov, Pfam-A\_v33.1, COG\_KOG\_v1.0 and NCBI\_Conserved\_Domains(CDs)\_v3.18.

#### *2.2. Taxonomic Analysis of HTH1*

To taxonomically characterize HTH1, the genes identified as phage-related by PHASTER were subjected to a translated nucleotide-to-protein BLAST (blastx, accessed on 2 April 2020) search using the following parameters: organism = viruses (txid:10239), number of alignments = 100, word size = 6. The resulting hits were parsed and taxonomically assigned by lowest common ancestor (LCA) analysis [64] in MEGAN software (version 6.18.6) (Tübingen, Germany) [65] with the following parameters: minimum support = 2, minimum score = 70, top percent = 10. The MEGAN mapping database file (version October 2019) was used.
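The naive LCA principle behind this assignment can be illustrated as follows. This is a simplified, hypothetical re-implementation for a single query gene, not MEGAN's code: hits below the minimum bit score are discarded, only hits within the top percentage of the best score are kept, and the deepest taxonomic rank shared by all remaining hits is returned. MEGAN's minimum support filter, which acts on assignment counts across many queries, is omitted, and the lineages are toy examples.

```python
def naive_lca(hits, min_score=70.0, top_percent=10.0):
    """Return the lowest common ancestor lineage for one query.

    hits: list of (bit_score, lineage) pairs, where lineage is a tuple of
    taxon names ordered from root to leaf. The parameter names mirror the
    MEGAN settings quoted in the text, but this function is illustrative only.
    """
    kept = [(score, lineage) for score, lineage in hits if score >= min_score]
    if not kept:
        return None  # query stays unassigned
    best = max(score for score, _ in kept)
    cutoff = best * (1.0 - top_percent / 100.0)
    lineages = [lineage for score, lineage in kept if score >= cutoff]

    lca = []
    # zip(*) walks all lineages rank by rank and stops at the shortest one.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break  # the hits disagree at this rank
        lca.append(taxa_at_rank[0])
    return tuple(lca)


hits = [
    (250.0, ("Viruses", "Caudovirales", "Siphoviridae", "Phage A")),
    (240.0, ("Viruses", "Caudovirales", "Siphoviridae", "Phage B")),
    (100.0, ("Viruses", "Caudovirales", "Myoviridae", "Phage C")),
    (60.0, ("Viruses", "Unrelated family", "Phage D")),
]
print(naive_lca(hits))                    # ('Viruses', 'Caudovirales', 'Siphoviridae')
print(naive_lca(hits, top_percent=70.0))  # ('Viruses', 'Caudovirales')
```

The second call shows why the top-percent setting matters: widening it admits a weaker hit from another family and pushes the assignment up to the order level.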

With a reported success rate of 93% when assigning tailed and unclassified phages to their defined head–neck–tail-based categories, the "Remote Homology Detection of Viral Protein Families—Virfam" [66] (http://biodev.cea.fr/virfam, accessed on 3 April 2020) server was also used to further analyse the taxonomy of HTH1.

#### *2.3. Analysis of Prophage Host Range*

To analyse the currently documented host range of similar phages, the DNA sequence of HTH1 was used in a translated nucleotide-to-protein BLAST (blastx) search as described above, except against the NCBI non-redundant (nr) protein database. The species names of the top 5000 hits were parsed and uploaded to the phyloT online tool (version 2) (https://phylot.biobyte.de/, accessed on 5 April 2020) [67], which visualizes the taxonomic distribution as a tree built from the cumulative NCBI taxonomy lineages of the listed species (Supplementary Materials Figure S1).

#### *2.4. Phylogeny Analyses*

The amino acid sequence of the holin (GenBank WP\_134112787.1) identified in HTH1 was used as the basis for phylogenetic analyses of the relationship between the prophage and viruses in the NCBI (nr) database. A protein–protein BLAST (blastp) search was performed via NCBI BLAST [68] with the following parameters: organism = viruses (txid:10239), word size = 6. A list of 94 proteins exported from the BLAST search (including the HTH1 holin) was aligned using MAFFT (version 7.453) [69]. Gap regions were trimmed with trimAl (version 1.2 rev59) [70] using the '*gappyout*' option, which automatically trims alignment columns based on their gap distribution. The resulting trimmed alignment of 106 amino acid positions was inspected manually and used to infer a maximum likelihood (ML) phylogeny with IQ-TREE (version 1.6.12) [71]. The best-fitting model was determined automatically by ModelFinder [72], and ultrafast bootstrapping was performed with 1000 replicates [73]. The best-fitting model was LG+I+G4 (general matrix with invariable sites plus discrete gamma model [74,75]). The resulting tree was annotated using the online Interactive Tree of Life (iTOL) software (version 5.5.1) (https://itol.embl.de/, accessed on 5 April 2020) [76]. Branches with less than 50% bootstrap support were collapsed (Figure 1).

The amino acid sequence of the HTH1 holin was further analysed by protein–protein BLAST (blastp) against the Integrated Microbial Genomes/Virus (IMG/VR) (https://img.jgi.doe.gov/vr/, accessed on 10 April 2020) [77] and Ocean Gene Atlas (http://taraoceans.mio.osupytheas.fr/ocean-gene-atlas/, accessed on 10 April 2020) [78] databases to compare the prophage with viral genes from environmental samples, metagenomic datasets and other non-isolated viruses. The top 100 hits with the highest percent identity from the IMG/VR search and all 18 hits from the Ocean Gene Atlas were extracted, in addition to the 94 sequences from NCBI described above. After automatic and manual curation to remove duplicates and non-holin hits, a total of 211 holin-related sequences were aligned, trimmed and visualized as described above; the best-fitting ML model for this set of sequences was LG+F+I+G4 (general matrix with empirical amino acid frequencies counted from the data, invariable sites and a discrete gamma model [74,75]) (Supplementary Materials Figure S2).

The putative DNA polymerase (HTP4385) (GenBank WP\_134112782.1) was also subjected to phylogenetic analysis, using the same parameters as described above for the holin-based tree. In this analysis, 101 protein entries were used to create a 625 amino acid long alignment for the construction of the tree shown in Supplementary Materials Figure S3.

#### *2.5. Gene Neighbourhood Analysis*

The gene neighbourhood of HTH1 was compared with those of three highly similar viral gene clusters, selected based on the closest alignments to the HTH1 holin in the extended tree shown in Supplementary Materials Figure S2. Alongside HTH1, the gene clusters of marine anoxygenic phototrophic community R3 (MAPCR3) (IMG scaffold ID: Ga0071011\_100294), *Streptococcus* phage Javan630 (SPJ630) (NCBI:txid2548289) and *Erysipelothrix* phage phi1605 (EP1605) (NCBI:txid2006938) were inspected using Gene Graphics (https://katlabs.cc/genegraphics/app, accessed on 20 April 2020) [79] by uploading the relevant genome regions, with their annotations, in NCBI GenBank format to the online tool (Figure 2).

#### *2.6. Selection of Genes for Expression Trials*

In addition to the genes constituting the lytic cassette, genes with various putative functions on either side of the HTH1 lytic cassette were analysed. After inspection, nine genes were selected for expression trials based on their putative activities related to lysis and DNA replication, including three genes of hypothetical function or with conserved domains of unknown function (DUF). The selected genes were labelled with the prefix HTP (*H. thermotrophus* phage) followed by the last four digits of their corresponding locus tag in the NCBI GenBank annotation (e.g., HTP4435). The selected genes and their domain structures, as predicted by the HMMER web service (https://www.ebi.ac.uk/Tools/hmmer/search/phmmer, accessed on 20 April 2020) [80–82], are shown in Figure 3.

#### *2.7. Preparation of Sequences for Protein Expression of Selected Genes*

Codon optimization [49] and codon harmonization [83,84] were applied in parallel to evaluate their effectiveness in obtaining properly folded, soluble protein from each of the selected genes expressed heterologously in *E. coli*. Codon-optimized gene sequences were generated with the GenSmart Codon Optimization online tool (GenScript, Piscataway, NJ, USA) using default parameters. The Codon Harmonizer online tool developed by Claassens et al. [51] was used to harmonize codon usage frequencies between the prophage host *H. thermotrophus* (NCBI RefSeq GCF\_004365575.1) and the heterologous expression host *E. coli* BL21(DE3) (NCBI GenBank GCA\_000022665.2) (accessed in April 2019). Codon Adaptation Index (CAI) and Codon Harmonization Index (CHI) values were calculated for each sequence. Both the codon-optimized and the codon-harmonized gene sequences (Supplementary Materials File S1) were synthesized and delivered pre-cloned in the pET-21b(+) vector (Merck, Darmstadt, Germany) [85] by GenScript (Leiden, the Netherlands), featuring a C-terminal hexa-histidine tag [86] to facilitate purification by affinity chromatography.
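For reference, the CAI as originally defined by Sharp and Li is the geometric mean, over the codons of a gene, of each codon's relative adaptiveness w: its usage frequency divided by that of the most used synonymous codon in the reference organism. The sketch below assumes a hypothetical toy reference table; it skips the exclusion of single-codon amino acids and stop codons and the handling of zero-usage codons that real calculators apply. The CHI is not reproduced because its exact definition is not given here.

```python
import math

# Hypothetical reference codon usage for the expression host (toy values).
REF_USAGE = {"GCT": 15.0, "GCC": 26.0, "GCA": 20.0, "GCG": 34.0,
             "AAA": 34.0, "AAG": 10.0}
CODON_TO_AA = {"GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
               "AAA": "K", "AAG": "K"}


def relative_adaptiveness(usage, codon_to_aa):
    """w = frequency of a codon / frequency of its most used synonym."""
    best = {}
    for codon, freq in usage.items():
        aa = codon_to_aa[codon]
        best[aa] = max(best.get(aa, 0.0), freq)
    return {codon: freq / best[codon_to_aa[codon]] for codon, freq in usage.items()}


def cai(seq, usage, codon_to_aa):
    """Geometric mean of w over the codons of seq that appear in the table."""
    w = relative_adaptiveness(usage, codon_to_aa)
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    logs = [math.log(w[c]) for c in codons if c in w]  # assumes no zero-usage codons
    return math.exp(sum(logs) / len(logs))


print(cai("GCGAAA", REF_USAGE, CODON_TO_AA))            # 1.0: host-preferred codons only
print(round(cai("GCTAAG", REF_USAGE, CODON_TO_AA), 3))  # 0.36
```

A CAI of 1.0 means every codon is the host's preferred synonym, which is why codon-optimized sequences approach this upper bound.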

#### *2.8. Protein Production in E. coli*

All expression constructs were transformed into *E. coli* BL21(DE3) cells (Merck, Darmstadt, Germany) using the heat-shock protocol provided by the manufacturer, with 30 ng of plasmid per 15 µL of cell suspension. After plating and overnight growth at 37 °C, single colonies were picked from Lysogeny Broth (LB)-agar plates containing 100 µg/mL ampicillin and used to inoculate 10 mL LB pre-cultures, which were incubated overnight at 37 °C with shaking at 220 rpm. Expression cultures in Tryptic Soy Broth (Merck, Darmstadt, Germany) (adjusted to pH 7.4/RT) at 100 mL scale were inoculated with 5% (*v/v*) of each pre-culture and grown at 37 °C and 220 rpm until an optical density at 600 nm of 0.5–0.6 was reached. The incubation temperature was then reduced to 28 °C and allowed to equilibrate for 30 min. Expression was induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside at 28 °C for 5 h. Following expression, cells were harvested by centrifugation at 5000× *g* and 4 °C for 10 min. Collected cells were resuspended in 10 mL of lysis buffer containing 50 mM Tris-HCl pH 7.4/RT, 60 mM imidazole, 500 mM NaCl and 5% (*v/v*) glycerol, and lysed by ultrasonication at 4 °C using 5 × 30 s bursts at 15 s intervals with 25% amplitude. An aliquot of each crude lysate, representing the total protein fraction, was taken and stored at 4 °C before the lysates were clarified by centrifugation at 12,000× *g* and 4 °C for 3 min. After clarification, aliquots representing the soluble protein fraction were taken from all samples and stored at 4 °C.

#### *2.9. Protein Solubility Assessment and Yield Estimation*

Aliquots taken from lysed cell pellets, representing the total protein (crude lysate) and soluble protein (clear lysate) fractions, were run on gradient (8–16%) SDS-PAGE gels (GenScript, Piscataway, NJ, USA) to assess expression levels. Precision Plus Dual Color protein ladder (Bio-Rad, Hercules, CA, USA) was used for molecular mass determination. Equal volumes of protein samples were loaded onto the gels with the aim of fractionating equal protein amounts. The gels were run at 200 V and subsequently stained with InstantBlue (Expedeon, Cambridge, UK) following the manufacturer's protocol. After staining, unbound dye was washed off the gels with distilled water on a benchtop shaker to reveal protein bands. The gels were photographed using a MiniBIS Pro system, with images processed in the GelCapture suite (version 7.0.15) (DNR Bio-Imaging Systems, Neve Yamin, Israel).

Densitometry to determine the relative abundance of target proteins in the soluble lysate fractions was performed using the GelQuantum Pro suite (version 12.2) (DNR Bio-Imaging Systems, Neve Yamin, Israel). Total protein concentration was measured with a NanoDrop 1000 spectrophotometer (operating software version 3.7; Thermo Fisher Scientific, Waltham, MA, USA), assuming that an A<sub>280</sub> of 1 corresponds to 1 mg/mL. Soluble target protein yields were estimated by combining the densitometry results with the total soluble protein quantification.
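The yield estimate described above reduces to simple arithmetic: the total soluble protein concentration from A<sub>280</sub> (using the 1 A<sub>280</sub> = 1 mg/mL approximation) is multiplied by the fraction of lane intensity attributed to the target band and scaled by the lysate volume. The sketch below assumes the 10 mL lysis volume and 100 mL culture volume from Section 2.8 and reports mg per litre of culture; the function name and the example readings are hypothetical.

```python
def soluble_target_yield(a280, band_fraction, lysate_ml=10.0, culture_ml=100.0):
    """Estimated soluble target protein in mg per litre of culture.

    a280: absorbance of the clear lysate (1 A280 unit taken as 1 mg/mL).
    band_fraction: target band intensity / total lane intensity (densitometry).
    """
    total_mg_per_ml = a280 * 1.0  # 1 A280 == 1 mg/mL approximation
    target_mg = total_mg_per_ml * band_fraction * lysate_ml
    return target_mg / (culture_ml / 1000.0)


# Hypothetical readings: A280 = 5.0 and a target band holding 12% of the lane.
print(round(soluble_target_yield(5.0, 0.12), 1))  # 60.0 (mg/L)
```

Because the A<sub>280</sub> approximation ignores each protein's actual absorption coefficient, such values are best treated as relative estimates for comparing codon variants rather than absolute yields.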

#### *2.10. Protein Purification*

HTH1 proteins obtained in soluble form were purified to near homogeneity from the clear lysate fractions by nickel affinity chromatography. The soluble protein fraction in lysis buffer was loaded at 1 mL/min onto a HisTrap HP 1 mL (7 mm × 25 mm) column (Cytiva, Uppsala, Sweden) equilibrated with lysis buffer. After extensive washing of unbound proteins with lysis buffer (5–8 column volumes (CV)), target proteins were eluted (2 CV) at 1 mL/min with elution buffer containing 50 mM Tris-HCl pH 7.4/RT, 500 mM imidazole, 500 mM NaCl and 5% (*v/v*) glycerol. The purified proteins were filtered twice through regenerated cellulose syringe filters with 0.2 µm pore size (GE Healthcare, Uppsala, Sweden) and stored in elution buffer at 4 °C.

Protein integrity and purity were assessed by SDS-PAGE. Protein concentrations were measured spectrophotometrically using the calculated absorption coefficients of the pure proteins. Purification yields were calculated by comparing the amount of target protein in the soluble protein fractions with the amount obtained after the purification and filtration steps.
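A minimal sketch of these two calculations follows, under stated assumptions: the molar absorption coefficient at 280 nm is estimated from Trp and Tyr counts using the widely used values of Pace et al. (5500 and 1490 M⁻¹ cm⁻¹, cystines ignored), concentration follows the Beer–Lambert law, and purification yield is the purified amount as a percentage of the soluble target amount. The sequence, readings and function names are hypothetical, and the text does not state which tool was used to compute the coefficients.

```python
def extinction_coefficient(protein_seq):
    """Molar absorption coefficient at 280 nm (M^-1 cm^-1), Trp/Tyr only."""
    seq = protein_seq.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y")


def concentration_mg_per_ml(a280, epsilon, mw_da, path_cm=1.0):
    """Beer-Lambert: c [M] = A / (epsilon * l); mg/mL equals g/L = M * g/mol."""
    return a280 / (epsilon * path_cm) * mw_da


def purification_yield_percent(purified_mg, soluble_target_mg):
    """Share of the soluble target protein recovered after purification."""
    return 100.0 * purified_mg / soluble_target_mg


eps = extinction_coefficient("MKWWYLA")            # 2 * 5500 + 1490 = 12490
conc = concentration_mg_per_ml(0.5, 25000, 30000)  # 0.6 mg/mL
purified_mg = conc * 2.0                           # e.g., a 2 mL (2 CV) eluate
print(eps, round(conc, 2), round(purification_yield_percent(purified_mg, 6.0), 1))
# 12490 0.6 20.0
```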

#### *2.11. Protein Thermal Unfolding Assay*

Nanoscale differential scanning fluorometry based on intrinsic tryptophan and tyrosine fluorescence was performed to determine the melting temperatures (Tm, °C) of the purified HTH1 proteins. Measurements were carried out on a Prometheus NT.48 system using standard grade capillaries (NanoTemper Technologies, Munich, Germany). The purified protein samples were diafiltrated into assay buffer containing 50 mM Tris-HCl pH 7.4/RT and 2% (*v/v*) glycerol using Amicon Ultra-0.5 mL (3 kDa) centrifugal filters (Merck, Darmstadt, Germany). Protein concentrations were adjusted to 0.2 mg/mL with assay buffer after diafiltration. Thermal unfolding assays were performed at 40% excitation power, with a temperature gradient from 20 to 95 °C at a ramp rate of 1 °C/min. The recorded emission intensities, the emission ratio (350 nm/330 nm) and its first derivative were analysed using the PR.ThermControl software (version 2.0.4) (NanoTemper Technologies, Munich, Germany).
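The Tm obtained this way corresponds to the inflection point of the unfolding transition, i.e., the temperature at which the first derivative of the 350/330 nm ratio peaks. The sketch below locates that maximum with a central finite difference on a synthetic, noise-free sigmoidal curve with a hypothetical Tm of 55 °C. It assumes the ratio increases on unfolding and is not the PR.ThermControl algorithm, which must also cope with noisy real data.

```python
import math


def tm_from_ratio(temps, ratios):
    """Temperature at which d(ratio)/dT is largest (central differences)."""
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (ratios[i + 1] - ratios[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Synthetic 20-95 C scan sampled every 0.1 C: two-state sigmoid centred at 55 C.
temps = [20 + i / 10 for i in range(751)]
ratios = [0.80 + 0.30 / (1 + math.exp(-(t - 55.0) / 2.0)) for t in temps]
print(round(tm_from_ratio(temps, ratios), 1))  # 55.0
```

Comparing Tm values obtained this way between the codon-optimized and codon-harmonized variants of the same protein is what allows thermostability to serve as a proxy for folding quality.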

#### **3. Results**

#### *3.1. Functional Annotation and Taxonomy Analysis of HTH1*

Three regions of putative viral origin were identified within the *H. thermotrophus* genome using the PHASTER tool [58]. Region 1 (Supplementary Materials Table S1) was reported as an incomplete prophage region (PHASTER score: 10), consisting of eight coding sequences (CDSs) from locus tags EV215\_RS03310 to EV215\_RS03345 on the sense (+) strand. Region 2 was also predicted as incomplete (PHASTER score: 50), consisting of 33 CDSs from locus tags EV215\_RS04355 to EV215\_RS04515. However, attachment sites attL and attR (nucleotide sequence TTACCATCTTA) were found between locus tags EV215\_RS04470–EV215\_RS04475 and EV215\_RS04435–EV215\_RS04440, respectively, within region 2, indicating that this region was likely associated with viral integration into the host genome. Region 3 was predicted to be an intact prophage region (PHASTER score: 100) and contained 29 CDSs from locus tags EV215\_RS04440 to EV215\_RS04580. Regions 2 and 3 overlapped by 11,971 bp, representing 16 CDSs, and both were located on the complementary (−) strand of the genome. Furthermore, regions 2 and 3 showed highly similar average G + C contents of 37.7% and 38.5%, respectively, compared with 27.9% for region 1 and 24.8% for the host genome. Owing to their overlap and coherent composition, regions 2 and 3 were considered together as the "complete" prophage genome, totalling 46 CDSs and 41,571 bp. This region was designated with the proposed name *Hypnocyclicus thermotrophus* phage H1 (HTH1). Through the combined use of various pipelines, functional annotations could be suggested for 34 HTH1 genes; the remaining 12 were noted as hypothetical or as containing elements of unknown function (Supplementary Materials Tables S1 and S2).
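The G + C percentages used to justify merging regions 2 and 3 are straightforward base-composition counts. A minimal sketch, assuming the region sequences are available as plain strings (the example strings below are hypothetical, and ambiguous bases such as N are excluded from the denominator):

```python
def gc_percent(seq):
    """Percentage of G + C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


print(round(gc_percent("ATGCGCAT"), 1))  # 50.0
print(round(gc_percent("atgcnnnn"), 1))  # 50.0 (N ignored)
```

A prophage region whose G + C content departs clearly from the host average, as regions 2 and 3 do here, is consistent with horizontally acquired DNA.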

Taxonomic analysis based on the LCA algorithm in MEGAN suggested affiliation of HTH1 with the family *Siphoviridae* and the order *Caudovirales*. Consistently, the Virfam analysis (resulting identities provided in Supplementary Materials Table S3) identified the prophage head–neck–tail modules as being part of "Neck Type 1—Cluster 2" type of phages, noted to be associated with siphoviruses. Holin genes have previously been suggested as a phage-specific signature gene for siphoviruses [87]. Phylogeny analyses based on the HTH1 holin (Figure 1) as well as DNA polymerase (Supplementary Materials Figure S3) amino acid sequences revealed the closest affiliations to known phages from the Javan group of *Streptococci* phages [88] and to the *Erysipelothrix* phage phi1605 (NCBI:txid2006938). The sequence identity between the HTH1 holin and the holins from *Streptococcus* phage Javan630 (SPJ630) and *Erysipelothrix* phage phi1605 (EP1605) was found to be 75.7% and 75.0%, respectively.
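Percent identity values such as these depend on how the denominator is defined. The sketch below uses the convention of BLAST-style reports (identical columns divided by alignment length, gap columns included); the aligned toy sequences are hypothetical, and other tools may instead divide by the length of the shorter sequence or by ungapped columns, giving different numbers for the same alignment.

```python
def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns divided by alignment length (gaps counted)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Hypothetical pairwise alignment of two short holin fragments.
print(round(percent_identity("MKV-LAGF", "MKIELAGF"), 1))  # 75.0
```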

The closest identified holin homologue from another phage infecting Gram-negative bacteria was that of the phage Funu2 (NCBI:txid1640978) (Figure 1), which is reported to infect *Fusobacterium nucleatum* [89] (sequence identity of 38.6%). This is an interesting hit, as to date, studies of viruses and viral genes associated with *Fusobacteria* remain limited, with only a small number of phages characterized thus far [37,90–92].

When the HTH1 holin was compared against environmental sequences from IMG/VR, an even closer hit, at 99% sequence identity, was observed to a metagenome-derived holin from a marine anoxygenic phototrophic community R3 (MAPCR3) sample (IMG genome ID 3300004816) originating from a shallow salt marsh pool in Falmouth, MA, USA (Supplementary Materials Figure S2 and Table S4). When the gene neighbourhood surrounding the lytic cassette of HTH1 was compared with those of MAPCR3, SPJ630 and EP1605 (Figure 2), a remarkably close similarity was identified between the HTH1 and MAPCR3 lytic cassettes, particularly over the four genes corresponding to HTP4425 to HTP4410 in HTH1 (Supplementary Materials Table S4). The similarity was less pronounced for the cassettes of SPJ630 and EP1605. Furthermore, when the gene annotations and protein domain structures were reviewed using an HMMER search [82] (Figure 3), the lytic cassette amidase (HTP4410) was found to be replaced by a second glycosyl hydrolase (CAZy GH25) in SPJ630 and EP1605.

**Figure 1.** Phylogeny analysis of the prophage based on the alignment of 106 amino acid long region of holin proteins from 94 phages, using maximum likelihood, with 1000 bootstrap replicates. The tree is centre-rooted, and the scale bar represents the average number of amino acid substitutions per site. Numbers next to collapsed clades represent the number of leaves covered by each illustration. The HTH1 holin is highlighted in red.

**Figure 2.** Gene neighbourhood map of HTH1 and the comparable regions of three closely related phage gene clusters, aligned around the holin in their respective lytic cassettes. Displayed genes are drawn to scale, as shown on the top right. Organism or sample names, related accession numbers (in parentheses) and displayed genome regions (in bp ranges) are provided above each graphic. Genes chosen for protein expression from HTH1 are labelled with their identifier numbers. Double dashes (//) indicate the presence of further genes upstream or downstream of the displayed gene regions.

**Figure 3.** Sequence features of the chosen candidate proteins as predicted by HMMER [82]. Black lines show non-annotated amino acid sequences, grey boxes show predicted Pfam domains, purple lines mark transmembrane domains and the numbers flanking each feature show their respective amino acid residue ranges. The blue box shows the HTP4410 analogue found in *Streptococcus* phage Javan630 (SPJ630) and *Erysipelothrix* phage phi1605 (EP1605).

#### *3.2. Selection of Genes for Expression Trials*

HTH1 genes with annotations related to lysis and DNA replication were examined further by comparing their protein domain structures against multiple sequence databases (Supplementary Materials Tables S1 and S2). A set of nine genes was chosen for protein expression trials; their designations and associated domain structures are shown in Figure 3. Gene targets associated with the prophage lytic cassette (defined in Section 3.1), including the holin (HTP4415), the glycosyl hydrolase (HTP4420) and the amidase with a LysM domain (HTP4410), were selected for their putative role in cell lysis, in addition to the phage tail protein (HTP4435) with associations to endopeptidase activity. The hypothetical gene HTP4425, neighbouring the glycosyl hydrolase (HTP4420), was also picked for its potential connection to the lysis-related cluster. Two genes annotated with nucleic acid cleavage and synthesis activities were also selected: the rRNA biogenesis protein RRP5 (HTP4400), with putative endonucleolytic activity towards rRNA, and the DNA polymerase I (HTP4385). Furthermore, the two genes flanking the HNH endonuclease, HTP4360 and HTP4350, were picked for their potential association with nucleolytic activity. The gene HTP4350 (GenBank: WP\_134112775.1), located upstream of the prophage gene region in the *H*. *thermotrophus* genome, was annotated as "DUF262 domain containing protein" by the NCBI pipeline; however, a putative DNase activity was suggested when it was analysed with HHpred (Supplementary Materials Table S2).

Searches against PDB for structural insight into the nine HTH1 proteins revealed only low-similarity hits for three proteins, HTP4420, HTP4410 and HTP4350, to PDB entries 4S3J, 3HMB and 1D9D, respectively (Supplementary Materials Table S5). Nevertheless, all three structures were associated with the functions expected for the HTH1 proteins, such as peptidoglycan lysis for HTP4420 and HTP4410, and DNA polymerase for HTP4350 (Supplementary Materials Table S4).

#### *3.3. Expression of Target Codon-Adjusted Gene Variants*

The codon frequencies of the HTH1 gene sequences were analysed by estimating the CAI relative to the native host *H*. *thermotrophus*. All target genes showed CAI values of approximately 0.4–0.5 (Table 1), indicating that the HTH1 gene sequences were moderately adapted for expression in the native host and predicting comparatively moderate native expression levels of the target proteins. Target genes were subsequently processed to generate codon-optimized and codon-harmonized gene sequence variants, adjusted from the *H*. *thermotrophus* codon usage bias towards compatibility with the expression host *E*. *coli* BL21(DE3). Quantitative analysis of the codon-adjusted sequence variants confirmed the expected levels of codon adaptation (Table 1). The CAI of codon-optimized gene sequences varied between 0.84 and 0.89, indicating high adaptation for heterologous expression in *E*. *coli*. Codon-harmonized sequences, as expected, were less adapted to the expression strain, with CAI varying between 0.58 and 0.74; their CAI also varied more than that of the codon-optimized sequences. The CHI values of codon-optimized variants were 0.12–0.13 above those of the corresponding codon-harmonized sequences (Table 1), confirming the expected trend of more substantial changes imposed on codon-optimized variants. The CHI values of the individual codon-harmonized gene variants were similar, ranging between 0.43 and 0.48. Even though the CHI comparison indicated that codon-harmonized variants were closer to the native codon sequences of the target genes, the observed "harmonization" effect could be interpreted as moderate [51].
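The CAI values above follow the codon adaptation index of Sharp and Li: each codon is scored by its relative adaptiveness w (its count in a reference gene set divided by the count of the most frequently used synonymous codon), and a gene's CAI is the geometric mean of w over its codons. The Python sketch below assumes this textbook definition, a user-supplied reference set (for example, highly expressed genes of the host) and a pseudocount of 0.5 for codons absent from the reference. It is not the tool used to produce Table 1, and the function names are hypothetical.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into in-frame codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w = count(codon) / count(most used synonymous codon) in the reference set."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s) if c in CODON_TABLE)
    best = Counter()
    for codon, aa in CODON_TABLE.items():
        best[aa] = max(best[aa], counts[codon])
    w = {}
    for codon, aa in CODON_TABLE.items():
        # Met, Trp and stop codons offer no synonymous choice and are excluded.
        if aa in "MW*" or best[aa] == 0:
            continue
        w[codon] = max(counts[codon], pseudocount) / best[aa]
    return w


def cai(gene, w):
    """Geometric mean of w over the gene's scorable codons."""
    logs = [math.log(w[c]) for c in split_codons(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


w = relative_adaptiveness(["CTGCTGCTGCTA"])  # toy reference set, not real data
print(round(cai("CTGCTA", w), 3))  # 0.577
```

The geometric mean is what makes a single rarely used codon pull the index down noticeably, which is why CAI is sensitive to codon optimization.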

All nine codon-optimized gene variants were successfully expressed in *E*. *coli*, at different levels (data not shown). However, the hypothetical protein (HTP4425), glycosyl hydrolase (HTP4420), holin (HTP4415) and DNA polymerase I (HTP4385) were not detected in the soluble protein fraction, as estimated by SDS-PAGE. Insolubility was particularly expected for the holin because of the multiple transmembrane helices in its structure (Figure 3), and no difference was observed between the two codon adjustment approaches in this respect. Among the codon-harmonized genes, no expression in *E. coli* could be observed for the genes encoding the holin (HTP4415) and the hypothetical protein (HTP4360). Of the other seven genes, only four yielded soluble proteins: the endopeptidase tail protein (HTP4435), amidase (HTP4410), rRNA biogenesis protein RRP5 (HTP4400) and DUF262/DNase (HTP4350) (Table 1).

In total, the codon adjustment approaches applied to the selected HTH1 genes resulted in soluble protein production from five codon-optimized and four codon-harmonized gene variants (Figure 4). The two sets of soluble proteins differed only by the hypothetical protein (HTP4360), which was not expressed in soluble form from its codon-harmonized variant. As typically expected [50,93], expression levels estimated by densitometry for the four soluble protein targets common to both sets revealed higher yields from codon-optimized variants (Figure 4). Exemplifying this trend, the relative soluble abundance of the rRNA biogenesis protein RRP5 (HTP4400) was nearly three times higher when expressed from its codon-optimized variant than from its harmonized equivalent (Figure 4), corresponding to a yield difference of ~110 mg/L (Table 1). The codon-optimized gene variant of the hypothetical protein (HTP4360) was also expressed at a high level, with an estimated yield of ~150 mg/L soluble protein. The endopeptidase tail protein (HTP4435), amidase (HTP4410) and DUF262/DNase (HTP4350) expressed from codon-optimized gene sequences showed only slightly higher relative abundance (by 2–5%) than the respective codon-harmonized variants (Figure 4), and their soluble yields from codon-optimized variants were ~8–16 mg/L higher than those of the corresponding codon-harmonized variants (Table 1). Target proteins found to be soluble from either variant were then scaled up for production in 1 L expression cultures.
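Densitometry gives a target's share of the total soluble protein, so turning it into a volumetric yield also requires the total soluble protein concentration. The Python sketch below assumes yield (mg/L) = relative abundance (%) / 100 × total soluble protein (mg/L of culture), which is one reading of the procedure summarized in the Table 2 caption, not a restatement of the authors' protocol. It also computes the mean ± standard error reported for triplicates. All numbers are hypothetical placeholders, not values from Table 1.

```python
import math
import statistics


def target_yield_mg_per_l(relative_abundance_pct, total_soluble_mg_per_l):
    """Soluble target yield from its densitometric share of total soluble protein."""
    return relative_abundance_pct / 100.0 * total_soluble_mg_per_l


def mean_and_se(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical triplicate expressions: (% of total soluble protein, total soluble protein in mg/L)
replicates = [(18.0, 500.0), (20.0, 500.0), (22.0, 500.0)]
yields = [target_yield_mg_per_l(pct, total) for pct, total in replicates]
mean, se = mean_and_se(yields)
print(f"{mean:.1f} ± {se:.1f} mg/L")
```

Because the conversion scales with total soluble protein, a small difference in relative abundance (a few percent) can still correspond to a difference of several mg/L.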



**Table 1.** Codon usage parameters and soluble production yield estimation of target HTH1 proteins. CAI—codon adaptation index, CHI—codon harmonization index, CO—codon-optimized, CH—codon-harmonized, ND—target protein not detected in total soluble protein fraction.

\* Values represent mean ± standard error of three independent expressions.

**Figure 4.** Relative abundance of target HTH1 proteins in the total soluble protein fraction after expression from codon-optimized (CO) and codon-harmonized (CH) gene variants. ND—target protein not detected in total soluble protein fraction. Values represent the mean relative abundance, in percent of total protein in the soluble fraction, ± standard error of three independent expressions.

#### *3.4. Protein Purification*

Soluble proteins produced from 1 L cultures were purified to near homogeneity by nickel affinity chromatography. An optimized affinity chromatography protocol ensured high purity of the target proteins, as visualized by SDS-PAGE (Figure 5), where target proteins appeared as bands corresponding to their expected sizes. The purified endopeptidase tail protein (HTP4435), expressed from both types of codon-adjusted gene variants, was aggregation-prone, whereas the other target HTH1 proteins remained stably soluble after purification. The single-step purification strategy led to generally high purification yields (Table 2). The yields of amidase (HTP4410) and DUF262/DNase (HTP4350) did not differ between codon-optimized and codon-harmonized gene sequences, whereas the purification yield of the codon-harmonized rRNA biogenesis protein RRP5 (HTP4400) was approximately 20% higher than that of its codon-optimized counterpart. In general, the purification yields indicated a comparatively high affinity of the heterologous proteins for the chromatography resin and were in the expected range for the method [94,95].
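Purification yield compares the purified target with the amount of target estimated in the clarified lysate (Table 2). Because "20% higher" can be read either as a relative change or as a difference in percentage points, the Python sketch below computes both for a pair of variants. The function names and the amounts are hypothetical and only illustrate the arithmetic.

```python
def purification_yield_pct(purified_mg, target_in_lysate_mg):
    """Recovery (%) of target protein after the single affinity step."""
    if target_in_lysate_mg <= 0:
        raise ValueError("target amount in lysate must be positive")
    return 100.0 * purified_mg / target_in_lysate_mg


def compare_yields(yield_a_pct, yield_b_pct):
    """Return (difference in percentage points, relative change in %) of b versus a."""
    points = yield_b_pct - yield_a_pct
    relative = 100.0 * points / yield_a_pct
    return points, relative


# Hypothetical amounts (mg) for a codon-optimized (CO) and a codon-harmonized (CH) variant
co = purification_yield_pct(purified_mg=45.0, target_in_lysate_mg=60.0)  # 75 %
ch = purification_yield_pct(purified_mg=27.0, target_in_lysate_mg=30.0)  # 90 %
print(compare_yields(co, ch))  # (15.0, 20.0)
```

Note that a higher recovery does not imply more purified protein: in this toy case the CH variant recovers a larger fraction of a smaller starting amount.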

**Table 2.** Purification yield of target HTH1 proteins. Protein concentrations were measured spectrophotometrically; the total amount of target recombinant protein in the clarified lysate was estimated by combining densitometry results with total soluble protein quantification. CO—codon-optimized, CH—codon-harmonized, ND—target protein not detected in total soluble protein fraction.


\* Values represent mean ± standard error of three independent purifications.

**Figure 5.** SDS-PAGE image of purified proteins produced from codon-harmonized (CH) and codon-optimized (CO) genes. The HTP prefix and the numbers above the lanes correspond to the identifiers of the genes tested. M indicates the protein marker (Bio-Rad Precision Plus Dual Color). Numbers next to each protein marker lane show the respective molecular weight labels in kDa.

#### *3.5. Crystallization and Thermostability of Target Proteins*

Purified, stably soluble target HTH1 proteins expressed from both codon-optimized and codon-harmonized gene variants were subjected to crystallization trials and thermostability analysis. A higher thermal unfolding temperature has been indirectly connected to improved folding, which may in turn affect the likelihood of crystallizing the target protein. In crystallization trials, amidase (HTP4410) and DUF262/DNase (HTP4350) expressed from codon-harmonized gene variants (Supplementary Materials Figure S5), as well as rRNA biogenesis protein RRP5 (HTP4400) from both codon-adjusted variants, were observed to form protein crystals (M. Håkansson and S. Al-Karadaghi, SARomics Biostructures, personal communication).

In the thermostability assessment by differential scanning fluorimetry, performed to compare the melting temperatures (T<sub>m</sub>) of target recombinant proteins expressed from both types of codon-adjusted gene variants, an increase in unfolding temperature was observed for the codon-harmonized variants of the three target proteins that formed crystals. The in vitro thermostability (T<sub>m</sub>) of the target HTH1 proteins amidase (HTP4410), rRNA biogenesis protein RRP5 (HTP4400) and DUF262/DNase (HTP4350) ranged from approximately 51 to 73 °C. Remarkably, the recombinant proteins expressed from codon-harmonized gene variants all unfolded at higher T<sub>m</sub> values (by 3–7 °C) than the corresponding proteins from codon-optimized gene variants (Table 3). A T<sub>m</sub> of approximately 61 °C was determined for DUF262/DNase (HTP4350) expressed from a codon-optimized gene variant, almost 3 °C lower than the T<sub>m</sub> of the same protein expressed from the codon-harmonized version. Amidase (HTP4410) and rRNA biogenesis protein RRP5 (HTP4400) expressed from codon-harmonized gene sequences showed T<sub>m</sub> values of 73 °C and 56 °C, respectively; these are increases of almost 7 and 5 °C compared with the T<sub>m</sub> of the proteins expressed from codon-optimized genes.
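In differential scanning fluorimetry, T<sub>m</sub> is commonly read off as the temperature at which the first derivative of the fluorescence melt curve, dF/dT, reaches its maximum. The Python sketch below applies that readout to synthetic two-state (Boltzmann) melt curves. The T<sub>m</sub> values of 61 and 64 °C were chosen only to echo the HTP4350 comparison above, and the curve shape, slope and temperature step are assumptions, not the measured data. Real curves usually need smoothing and trimming of the post-transition signal decay first.

```python
import math


def boltzmann(t, tm, slope):
    """Idealized two-state unfolding curve, normalized from 0 (folded) to 1 (unfolded)."""
    return 1.0 / (1.0 + math.exp((tm - t) / slope))


def estimate_tm(temps, fluorescence):
    """Temperature at the maximum of dF/dT, using central differences."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired temperature/fluorescence points")
    best_t, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


temps = [25.0 + 0.5 * i for i in range(141)]  # 25-95 degC in 0.5 degC steps
co_curve = [boltzmann(t, tm=61.0, slope=1.5) for t in temps]  # synthetic "CO" curve
ch_curve = [boltzmann(t, tm=64.0, slope=1.5) for t in temps]  # synthetic "CH" curve
delta_tm = estimate_tm(temps, ch_curve) - estimate_tm(temps, co_curve)
print(f"Delta Tm = {delta_tm:.1f} degC")
```

The resolution of this readout is limited by the temperature step, so shifts of a few degrees, as reported here, are well above the 0.5 °C grid assumed in the sketch.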


**Table 3.** Thermal unfolding estimation with differential scanning fluorimetry of stably soluble target HTH1 proteins. CO—codon-optimized, CH—codon-harmonized.

\* Values represent mean ± standard error of three independent differential scanning fluorimetry assays.

#### **4. Discussion**

Marine bacteriophages remain a largely unexplored resource for enzyme bioprospecting. Within the Virus-X consortium (http://virus-x.eu/, accessed on 1 May 2021), successful expression of genes from bacteriophage genomes was identified as a key step towards discovering enzymes from various marine niches. Crystallization of novel viral proteins to gain structural knowledge was another aim of the consortium, as recently exemplified for the proteins XepA and YomS from a *Bacillus subtilis* prophage [96]. Hence, there is considerable current interest in analysing new phage genes that may be relevant both to basic and structural research and to biotechnological applications.

In this context, a novel prophage, designated HTH1, was identified in the genome of the Gram-negative hydrothermal vent bacterium *H. thermotrophus*, which is classified in the phylum *Fusobacteria*. The association between *H. thermotrophus* and HTH1 is fitting, as lysogeny is suggested to be prevalent in physicochemically demanding environments. These include deep-sea biomes [97] and diffuse-flow hydrothermal vent communities [22], where temperate phages may benefit host fitness via various mechanisms [98–100].

Taxonomic analyses placed HTH1 within the family *Siphoviridae*, which contains dsDNA viruses defined by their long, non-contractile tails, as opposed to the contractile tails of the *Myoviridae* and the short, non-contractile tails of the *Podoviridae* [101]. The genome size of HTH1 was 41,571 bp, smaller than the average *Siphoviridae* genome size of ~53 kb [102]. Interestingly, phylogeny (Figure 1), Virfam [66] and sequence homology analyses of HTH1 genes (Supplementary Materials Figure S1) all suggested the closest similarity of HTH1 to siphoviruses that infect Gram-positive bacteria, mainly of the phylum *Firmicutes*.

HTH1 was annotated to contain a suite of expected viral backbone genes, such as structural elements for the viral head, neck, capsid and tail; core viral enzymes such as integrases, terminases and the viral lytic enzymes; and DNA-modifying enzymes such as DNA polymerase, endonuclease and recombinases (Supplementary Materials Tables S1 and S2, Figure 3). However, further studies, including lytic induction and isolation of viral particles, would be required to confidently determine whether the presented HTH1 genome corresponds to the complete and functional genome of a phage infecting *H. thermotrophus*.

Closer inspection of the HTH1 lytic cassette revealed three main genes related to cell lysis: a glycosyl hydrolase putatively capable of chitin- and peptidoglycan-degrading activities specific to endo-β-N-acetylglucosamine residues [103,104]; a holin crucial for the perforation of the cell membrane [105,106]; and an N-acetylmuramoyl-L-alanine amidase featuring a peptidoglycan-binding lysin motif (LysM), with an expected activity of cleaving the bonds between N-acetylmuramoyl residues and L-amino acids in the bacterial cell wall (Figures 2 and 3). However, no genes related to spanins, the rod-like viral lysis proteins considered essential for disrupting the outer membrane of Gram-negative hosts, were detected [106,107].

The enzymes of the HTH1 lytic cassette, encoded by the genes annotated as glycosyl hydrolase, holin and amidase, were of obvious interest, as their peptidoglycan-degrading capabilities could be exploited as bactericidal agents against pathogenic bacteria [108].

In addition, the hypothetical protein HTP4435 was selected for testing due to the presence of a tail-associated endopeptidase domain (Pfam PF06605, MEROPS M23) (Figure 3). Such peptidases may find a broad range of potential uses in industrial, medical or scientific applications [109–111]. The DNA polymerase I (HTP4385) was also of direct interest for its potential as an enzymatic tool in many modern molecular biological methods, such as PCR and genome sequencing [112]. As *H. thermotrophus* was reported to grow optimally at 48 °C [55], the proteins encoded by HTH1 may possess elevated thermostability and thermal activity, which are desirable traits in many industrial or scientific applications [113,114]. Furthermore, only limited structural similarity was observed between the chosen HTH1 proteins and structures present in PDB (Supplementary Materials Table S4), suggesting that future structural analyses could reveal novel features.

The heterologous expression of native phage proteins has been reported to be challenging [115]. To aid in this process, codon optimization [49] and codon harmonization [50] approaches were considered for the heterologous production of proteins encoded by HTH1. Here, the two approaches were compared for their effects on obtaining and increasing soluble protein yields and on the thermostability of the proteins produced. While codon optimization is commercially offered as an option in gene-synthesis services [116], codon harmonization must be carried out manually and therefore requires a deeper understanding of the native viral host. As bacteriophages naturally use their host's machinery to express their genes, they are understood to adapt to the codon usage frequency (CUF) of the host [117]. Therefore, when preparing sequences for codon harmonization, the genome of *H. thermotrophus* was used to calculate CUFs and compare them with those of the expression host *E. coli*.
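The contrast between the two strategies can be illustrated with a deliberately simplified Python sketch. It assumes that "optimization" means replacing every codon with the most frequent synonymous codon in the expression host (a "one amino acid, one codon" rule; commercial optimizers use more elaborate and often proprietary criteria), and that "harmonization" means replacing each native codon with the host synonym whose usage frequency is closest to that codon's frequency in the native host. These rules are not necessarily those implemented by the tools cited as [83,118]. In practice the frequency tables would be computed from the *H. thermotrophus* and *E. coli* genomes; the toy inputs and all function names below are hypothetical.

```python
from collections import Counter

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def codon_frequencies(reference_seqs):
    """Fraction of each codon among the synonymous codons of its amino acid."""
    counts = Counter()
    for seq in reference_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    freqs = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            freqs[c] = counts[c] / total if total else 0.0
    return freqs


def in_frame(gene):
    gene = gene.upper()
    return [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]


def optimize(gene, host_freq):
    """'One amino acid, one codon': always pick the host's most used synonym."""
    return "".join(max(SYNONYMS[CODON_TABLE[c]], key=host_freq.get) for c in in_frame(gene))


def harmonize(gene, native_freq, host_freq):
    """Pick the host synonym whose usage best matches the native codon's usage."""
    out = []
    for c in in_frame(gene):
        target = native_freq[c]
        out.append(min(SYNONYMS[CODON_TABLE[c]], key=lambda s: abs(host_freq[s] - target)))
    return "".join(out)


native = codon_frequencies(["TTATTATTACTG"])  # toy "native host": TTA 0.75, CTG 0.25
host = codon_frequencies(["CTGCTGCTGTTA"])    # toy "expression host": CTG 0.75, TTA 0.25
print(optimize("TTACTG", host))               # CTGCTG
print(harmonize("TTACTG", native, host))      # CTGTTA
```

In the toy case, optimization uses the host's preferred codon everywhere, whereas harmonization deliberately keeps a less frequent host codon where the native gene used a less frequent codon, which is the behaviour thought to preserve local translation rhythm.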

Codon analysis of the selected native HTH1 genes suggested that the target proteins are naturally produced in moderate amounts in *H*. *thermotrophus*. As expected, heterologous target proteins were produced more readily from codon-optimized gene variants than from the corresponding codon-harmonized genes (Table 2, Figure 4 and Supplementary Materials Figure S4); the latter were adjusted to mimic the native codon landscape of each gene, sacrificing overall codon adaptation to the expression host in the process [50]. The codon optimization approach for the selected HTH1 genes was successful, as quantitatively confirmed by the estimated CAI values and by the observed soluble expression yields. The CAI values of the codon-harmonized variants were comparatively high and varied substantially between genes, indicating that the codon harmonization algorithms used [83,118] were suitable and made gene-specific adjustments to each HTH1 gene.

Higher protein folding quality is typically reflected by a higher thermal unfolding temperature and greater thermostability [119]. While codon harmonization did not result in soluble expression of a greater variety of HTH1 proteins than codon optimization, it yielded proteins with comparatively higher melting temperatures (Tm), as determined by differential scanning fluorimetry, suggesting a higher folding quality. Assayed under identical conditions, all HTH1 target proteins expressed from codon-harmonized gene variants showed higher unfolding temperatures than the corresponding proteins from codon-optimized variants. The melting temperatures determined were in the expected range for HTH1 proteins natively produced within the host cells, consistent with the optimal growth temperature of *H. thermotrophus* [55]. Furthermore, ongoing crystallization trials also pointed to better crystal-forming properties of target HTH1 proteins expressed from codon-harmonized genes, which is taken as an indicator of improved folding quality (M. Håkansson and S. Al-Karadaghi, SARomics Biostructures, personal communication).

The CHI values estimated for codon-harmonized variants of the selected gene set were comparatively high and did not differ substantially between genes, indicating moderate, if not limited, harmonization of codons (Table 1). These results could partially explain why target proteins produced from codon-harmonized variants were not consistently more soluble than those from codon-optimized variants after production in *E*. *coli*. In theory, codon harmonization should promote the production of soluble proteins [84], although further optimization of physicochemical heterologous expression parameters is recommended to enhance the expression level of soluble protein from codon-harmonized gene variants [120]. Preliminary experiments to express the selected HTH1 genes in *E*. *coli* were carried out under the recommended conditions for the expression vector and strain used [85]. Further optimization of the process could be implemented to achieve soluble production of the target proteins that remained insoluble despite codon harmonization. As the current codon adjustment algorithm was mainly developed using non-viral genome sequences, its efficacy could be limited for the adjustment of viral genes. Given the limited data available on codon adjustment of viral genes [121,122], the results presented herein may aid the further development of codon adjustment algorithms.
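To make the CHI discussion concrete, the Python sketch below assumes that CHI is the mean, over codon positions, of the absolute difference between the relative synonymous codon usage (RSCU) of the adjusted codon in *E. coli* and the RSCU of the original codon in the native host, so that lower values indicate closer harmonization. This reading of the index referred to in [51] is an assumption rather than the authors' exact implementation. Single-codon amino acids are skipped, and the toy reference sets and function names are hypothetical.

```python
from collections import Counter

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def in_frame(gene):
    gene = gene.upper()
    return [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]


def rscu(reference_seqs):
    """RSCU = observed count / mean count over the amino acid's synonymous codons."""
    counts = Counter(c for s in reference_seqs for c in in_frame(s))
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


def chi(native_gene, adjusted_gene, rscu_native, rscu_host):
    """Mean |RSCU_host(adjusted codon) - RSCU_native(original codon)| over positions."""
    diffs = []
    for orig, new in zip(in_frame(native_gene), in_frame(adjusted_gene)):
        if CODON_TABLE[orig] != CODON_TABLE[new]:
            raise ValueError(f"non-synonymous change {orig}->{new}")
        if len(SYNONYMS[CODON_TABLE[orig]]) == 1:
            continue  # Met/Trp: no synonymous choice to harmonize
        diffs.append(abs(rscu_host[new] - rscu_native[orig]))
    return sum(diffs) / len(diffs) if diffs else float("nan")


native_rscu = rscu(["TTATTA"])  # toy native host: only TTA used for Leu
host_rscu = rscu(["CTGCTG"])    # toy expression host: only CTG used for Leu
print(chi("TTATTA", "CTGTTA", native_rscu, host_rscu))  # 3.0
```

Under this definition, a variant that keeps native codons the host rarely uses, or that swaps in codons whose host usage differs from the native usage, scores higher, matching the reading that comparatively high CHI values signal limited harmonization.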

#### **5. Conclusions**

In this work, complementary application of bioinformatics and molecular methods allowed the identification, description and protein-level study of a novel marine prophage. Here, we describe the first genome sequence of a prophage discovered in *H. thermotrophus*, a Gram-negative, moderately thermophilic bacterium isolated from the Seven Sisters hydrothermal vent field. The *H. thermotrophus* phage H1 (HTH1) showed similarity to phages infecting Gram-positive bacteria of the phylum *Firmicutes*, yet in our study it was found within the genome of a Gram-negative host. A set of nine genes was identified with putative functions, including cell lysis, nucleic acid cleavage and DNA replication, which are interesting for both ecological studies and potential biotechnological applications. To facilitate the soluble heterologous production of HTH1 proteins in *E. coli*, codon optimization and codon harmonization approaches were tested in parallel. Valuable data on the production yield, solubility and folding quality of heterologous HTH1 proteins were gathered following expression of codon-adjusted gene variants, which may be useful for improving the application of codon adjustment strategies to viral genes. For the proteins tested, codon optimization led to higher protein yields, whereas codon harmonization proved more beneficial for producing proteins with higher stability and folding quality.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/v13071215/s1: Table S1: The list of prophage-associated CDs identified in the *Hypnocyclicus thermotrophus* genome and their putative functions, predicted by NCBI and PHASTER annotation pipelines; Table S2: HHpred-suggested annotations of prophage-associated genes; Table S3: Highest identities of HTH1 proteins with at least 3 protein hits in Aclame, via Virfam analysis; Table S4: Amino acid sequence comparisons of target HTH1 proteins to their homologues in the three chosen viral gene clusters; Table S5: Detailed information on the HTH1 proteins chosen for expression trials; Figure S1: Taxonomical distribution of phage hosts; Figure S2: Extended holin phylogeny analysis; Figure S3: DNA polymerase phylogeny analysis; Figure S4: SDS-PAGE gel images; Figure S5: Protein crystal of HTP4350 produced from a codon-harmonized gene variant; File S1: Nucleic acid sequences of all genes chosen for protein expression, in Fasta format.

**Author Contributions:** Conceptualization, H.A., A.J. and I.H.S.; formal analysis, H.A. and A.J.; funding acquisition, R.-A.S., E.N.K. and I.H.S.; methodology, A.J., R.S. and I.H.S.; project administration, I.H.S.; resources, E.N.K. and I.H.S.; software, R.S.; supervision, I.H.S.; validation, A.J.; writing—original draft, H.A.; writing—review and editing, A.J., H.D., R.-A.S., R.S., E.N.K. and I.H.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** Generous funding was received from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme Virus-X project: Viral Metagenomics for Innovation Value (grant no. 685778), from the Research Council of Norway within the MARINFORSK programme, project: VirVar (project number 294363) and the Kristian Gerard Jebsen Foundation.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All relevant data for the study are provided within the article and its supplementary materials.

**Acknowledgments:** The sequence data of Marine anoxygenic phototropic community R3 (MAPCR3) (IMG scaffold ID: Ga0071011\_100294) were produced by the US Department of Energy Joint Genome Institute (https://www.jgi.doe.gov/, accessed on 1 May 2021) in collaboration with the user community and were used with permission of the P.I. (Jean J. Huang).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


### *Review* **Plant Viruses: From Targets to Tools for CRISPR**

**Carla M. R. Varanda 1,\* , Maria do Rosário Félix <sup>2</sup> , Maria Doroteia Campos <sup>1</sup> , Mariana Patanita <sup>1</sup> and Patrick Materatski 1,\***


**Abstract:** Plant viruses cause devastating diseases in many agricultural systems and are a serious threat to the provision of adequate nourishment for a continuously growing population. At present, there are no chemical products that directly target the viruses, and their control relies mainly on preventive sanitary measures to reduce viral infections which, although important, have proved to be far from sufficient. The most effective and sustainable current solution is the use of virus-resistant varieties, but these require considerable work and time to obtain. In recent years, the versatile gene editing technology known as CRISPR/Cas has simplified the engineering of crops and has successfully been used to develop virus-resistant plants. CRISPR stands for 'clustered regularly interspaced short palindromic repeats', which, together with CRISPR-associated (Cas) proteins, form a natural adaptive immune system that most archaeal and some bacterial species use to defend themselves against invading bacteriophages. Plant viral resistance using CRISPR/Cas technology can be achieved either through manipulation of the plant genome (plant-mediated resistance), by mutating host factors required for viral infection, or through manipulation of the virus genome (virus-mediated resistance), for which CRISPR/Cas systems must specifically target and cleave viral DNA or RNA. Viruses have efficient machinery and a comprehensive genome structure and, from a different, beneficial perspective, they have been used as biotechnological tools for several purposes in areas such as medicine, the materials industry and agriculture. Given all this potential, it is not surprising that viruses have also been used as vectors for CRISPR technology, namely to deliver CRISPR components into plants, a crucial step for the success of CRISPR technology.
Here we discuss the basic principles of CRISPR/Cas technology, with a special focus on advances in using CRISPR/Cas to engineer plant resistance against DNA and RNA viruses. We also describe several strategies for delivering these systems into plant cells, focusing on the advantages and disadvantages of using plant viruses as vectors. We conclude by discussing some of the constraints faced by the application of CRISPR/Cas technology in agriculture and its future prospects.

**Keywords:** CRISPR/Cas systems; viral vectors; gene editing; plant genome engineering; viral resistance

#### **1. Introduction**

Plant viruses are known to infect and cause devastating diseases in many agricultural systems, leading to significant losses in crop quality and yield with extreme economic impacts worldwide, and are a serious threat to the provision of adequate nourishment for a continuously growing population [1,2]. Climate change has been rapidly aggravating the impacts of viral diseases, with existing viruses showing pandemic behavior and new viruses emerging, which makes the development of efficient long-term disease management approaches difficult [3].

**Citation:** Varanda, C.M.R.; Félix, M.d.R.; Campos, M.D.; Patanita, M.; Materatski, P. Plant Viruses: From Targets to Tools for CRISPR. *Viruses* **2021**, *13*, 141. https://doi.org/10.3390/v13010141

Academic Editor: Henryk Czosnek

Received: 21 December 2020 Accepted: 17 January 2021 Published: 19 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Plant viruses are obligate intracellular pathogens, and at present there are no chemical products that directly target the virus and can be used in an agronomic context, making preventive sanitary measures the only way to hamper infections. Preventive sanitary measures consist mostly of good sanitation techniques during cultural practices, which include the immediate removal and destruction of infected plants, the limitation of virus vector populations and the development of legislative measures concerning the commercialization and trade of virus-free plant material [4]. Many of these conventional strategies are unsafe for the environment and have proved to be far from sufficient. The use of virus-resistant plants is currently the most efficient and sustainable solution to reduce viral infections. Thus, it is essential to develop effective and durable virus-resistant varieties to face increasingly severe viral diseases and viral variants [5–8]. For many years, classical breeding for crop improvement involved the selection of plants with certain agronomic characteristics and no viral symptoms, a very laborious and time-consuming strategy [9].

Advances in biotechnology have provided new knowledge on the molecular mechanisms of plant–virus interactions, which has accelerated breeding through approaches such as molecular marker-assisted breeding, genomic selection, gene silencing and pathogen-derived resistance (PDR), and has provided many resistant varieties to agriculture [10–12]. However, the rapid evolution and emergence of new viruses make the durability of resistance a major drawback and create the need for rapid and efficient techniques to obtain resistant plants.

In recent years, the versatile gene editing technology known as CRISPR/Cas has simplified the engineering of crops and has already been used for the development of resistance to viral pathogens, overcoming many difficulties of the techniques used to date [13–15].

Moreover, viruses can be manipulated to be beneficial and useful for several purposes, as they have efficient machinery and a comprehensive genome structure. They have been used in biotechnology as molecular tools in several areas, such as medicine, the materials industry and agriculture, for purposes that include protein production and serving as targets and vectors for many materials [16,17]. Given all this potential, it is not surprising that viruses have also been used in this revolutionary genome editing technique.

In this review, we start by describing the basic principles of CRISPR/Cas technology, with a special focus on advances in using CRISPR/Cas to engineer plant resistance against RNA and DNA viruses. We show that, for this technology to be used successfully, the CRISPR/Cas system must be efficiently delivered to and expressed in the targeted cells, and we describe several strategies for delivering these systems into plant cells. From a different perspective, we show how viruses can be manipulated into tools for delivering CRISPR/Cas systems into plant cells, focusing on the advantages and disadvantages of viral vectors. We conclude by discussing the constraints faced by CRISPR/Cas technology and its future prospects.

#### **2. CRISPR: From a Natural Bacterial Immune System to a Gene Editing Tool**

Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) proteins form a natural adaptive immune system that some bacterial and most archaeal species use to defend themselves against invading bacteriophages; it works through cleavage guided by sequence complementarity [18,19].

CRISPR systems may be divided into two main classes (I and II) and six types (I to VI), defined by the nature of the nuclease complex and the targeting mechanism, each with its own signature Cas nuclease. Class I systems are multicomponent systems composed of multiple effectors and are subdivided into types I, III and IV. Class II systems include types II, V and VI and are single-component systems consisting of a single effector guided by the CRISPR RNA (crRNA) [20].

The CRISPR/Cas9 system, which belongs to class II, is based on the immune system of *Streptococcus pyogenes*. It relies on the capacity of the bacterium to acquire pieces of DNA from an invading phage or plasmid and incorporate them into its own DNA; these sequences then guide Cas9 to cleave homologous DNA, leading to immediate DNA disruption and to further specific DNA disruption in subsequent invasions, thus providing immunity to the bacterial cell [21]. The mechanism involved in this natural immune system is very simple and has been the basis for the most developed CRISPR/Cas genome-editing platform.

CRISPR/Cas9 became a practical editing tool when the two noncoding RNAs essential for CRISPR, the crRNA and the trans-activating crRNA (tracrRNA), were engineered into a single chimeric RNA, the single-guide RNA (sgRNA) [22]. The crRNA contains the region complementary to the genomic target (the programmable portion defined by the user), and the tracrRNA provides the stem-loop structure that binds Cas9. This simplified gene editing with CRISPR/Cas9, which can now be accomplished by introducing just two components into the same cell, the sgRNA and the Cas protein [22], and led to efficient genetic manipulation in a wide array of plants, making CRISPR/Cas9 the most promising, versatile, and powerful tool for plant improvement [23].

In the CRISPR/Cas9 system (Figure 1), Cas9 first binds the sgRNA to form a catalytically active Cas9-sgRNA complex, which directs the RNA-guided DNA endonuclease Cas9 to its target. Target recognition and cleavage also require a protospacer adjacent motif (PAM) located immediately downstream of the 3′ end of the target sequence, 3–4 nucleotides downstream of the cut site. The PAM differs depending on the Cas9 species (NGG for *S. pyogenes*) [22,24]. Once the Cas9-sgRNA complex recognizes the PAM and the crRNA portion of the sgRNA (the 5′-most 20 nt) anneals to the genomic DNA through Watson–Crick base pairing, Cas9 cleaves both DNA strands three bases upstream of the PAM, creating a sequence-specific, blunt-ended double-stranded break (DSB) at the target site. The host cell then repairs the DSB through evolutionarily conserved DNA repair pathways such as error-prone non-homologous end-joining (NHEJ) and homology-directed repair (HDR).

**Figure 1.** The mechanism of CRISPR-Cas9-mediated genome engineering in plants. A single guide RNA recognizes a genomic region followed by a PAM sequence and recruits a Cas9 protein, which cleaves the DNA, creating a double-stranded break that is repaired either by error-prone non-homologous end-joining (NHEJ) or by homology-directed repair (HDR).
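The targeting rule described above (a 20-nt protospacer immediately followed by an NGG PAM, with a blunt cut three bases upstream of the PAM) can be illustrated computationally. The following Python sketch is a simplified illustration rather than a guide-design tool. It assumes *S. pyogenes* Cas9, a fixed 20-nt protospacer, and a scan of only the strand it is given, and the names `find_spcas9_sites` and `TargetSite` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Assumptions for this sketch: SpCas9, NGG PAM, fixed 20-nt protospacer,
# and only the strand passed in is scanned. The blunt cut is placed
# 3 bp upstream (5') of the PAM.
CUT_OFFSET_FROM_PAM = 3

# Lookahead so that overlapping candidate sites are all reported;
# group 1 = 20-nt protospacer, group 2 = PAM (any base followed by GG).
SPCAS9_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


class TargetSite(NamedTuple):
    protospacer: str  # sequence matched by the spacer of the sgRNA
    pam: str          # NGG motif immediately 3' of the protospacer
    cut_index: int    # 0-based index of the first base 3' of the cut


def find_spcas9_sites(dna: str) -> List[TargetSite]:
    """Return candidate SpCas9 target sites on the given strand."""
    dna = dna.upper()
    sites = []
    for match in SPCAS9_SITE.finditer(dna):
        protospacer, pam = match.group(1), match.group(2)
        pam_start = match.start() + len(protospacer)
        sites.append(TargetSite(protospacer, pam, pam_start - CUT_OFFSET_FROM_PAM))
    return sites


if __name__ == "__main__":
    # Hypothetical toy sequence: one protospacer followed by a TGG PAM.
    seq = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "CCCCC"
    for site in find_spcas9_sites(seq):
        print(site.protospacer, site.pam, seq[:site.cut_index], "|", seq[site.cut_index:])
```

Real guide-design workflows also scan the reverse strand and score potential off-target matches. Both steps are omitted here for brevity.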

NHEJ creates insertions or deletions (indels) at the target site. Within a protein-coding region, an indel can cause a frameshift mutation that abolishes production of a functional protein, leading to a gene knockout [25]. HDR is a more precise DSB repair pathway. Besides the sgRNA and Cas, it requires a donor repair template with ends homologous to each border of the target site. When such a template is provided, HDR introduces new sequences at the break site, producing a knock-in [25]. Specific desired mutations and genomic replacements therefore require DSBs to be repaired through the HDR pathway. More recently, a new generation of CRISPR tools has been developed by fusing DNA-targeting Cas proteins with deactivated nuclease domains to enzymes that directly convert one DNA nucleotide into another [26], without the need for DSB formation.
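Whether an NHEJ indel in a coding region disrupts the reading frame comes down to simple arithmetic: a net insertion or deletion whose length is not a multiple of three shifts every downstream codon. Below is a minimal Python sketch of that rule. The helper names are hypothetical, and the sketch deliberately ignores in-frame stop codons, splice sites, and indels that span start or stop codons.

```python
from typing import List


def classify_indel(net_change_bp: int) -> str:
    """Classify an NHEJ indel inside a protein-coding sequence.

    net_change_bp: bases inserted (positive) or deleted (negative).
    Simplification: an in-frame indel can still be disruptive (e.g. by
    creating a stop codon), which this rule does not check.
    """
    if net_change_bp == 0:
        return "no net change"
    if net_change_bp % 3 != 0:
        return "frameshift"
    return "in-frame"


def codons(seq: str) -> List[str]:
    """Split a sequence into complete codons (trailing bases dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


if __name__ == "__main__":
    reference = "ATGGCTGCAAAACTG"                 # toy coding sequence
    edited = reference[:6] + "T" + reference[6:]  # hypothetical +1 insertion
    print(classify_indel(len(edited) - len(reference)))
    print(codons(reference))
    print(codons(edited))
```

Comparing the two codon lists shows that every codon after the insertion point changes, which is why a 1-bp indel early in a gene so often produces a knockout.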

Genetic engineering with CRISPR/Cas systems enables precise genomic modifications. Moreover, CRISPR can target several sequences simultaneously with high efficiency [27], which broadens its reach, for example by conferring immunity against several pathogens at once.

Its ease and speed of execution, low cost, reproducibility, and efficiency explain why CRISPR is the system of choice for many genome engineering applications across fields and organisms. Cas proteins with deactivated nuclease domains further broaden the applications of CRISPR, for example to regulate gene transcription and to induce targeted epigenetic modifications [28]. CRISPR has also shown potential beyond genome engineering, such as in gene function studies and diagnostics. The CRISPR/LwaCas13a system detected RNA with high specificity down to a single copy [29], which may be a promising starting point for RNA virus detection methods far more sensitive than those currently available, including qPCR [30].

In plants, this technology has been used in breeding for nutritional enhancement and for resistance against fungi, bacteria, and viruses in many crops, including rice [31], tomato [32], citrus [33,34], wheat [35], and maize [36,37], demonstrating its potential to transform agriculture and improve global food security.

#### **3. CRISPR to Engineer Plant Virus Resistance**

Due to the devastating losses that plant viruses cause, it is not surprising that CRISPR/Cas technologies have been applied to develop plant resistance against viral pathogens.

Plant viral resistance using CRISPR/Cas systems can be achieved either by manipulating the plant genome (plant-mediated resistance) or the virus genome (virus-mediated resistance).

CRISPR/Cas technology was initially thought to apply exclusively to DNA, which, for plant viral resistance based on manipulating the viral genome, would have restricted it to DNA viruses. However, the discovery of RNA-targeting CRISPR/Cas effectors that efficiently cleave single-stranded RNA has opened an exciting opportunity to achieve plant resistance against RNA viruses as well, which make up most known plant viruses [38,39].

Below, we present studies that used the CRISPR/Cas system to engineer plant resistance against several viruses, acting either on the plant genome (plant-mediated resistance) or on viral genomes (virus-mediated resistance). These studies have shown that CRISPR can confer efficient and durable molecular immunity to plants against viruses that rely on the integrity of their genome at some point of their replication cycle [15,40–43].

#### *3.1. CRISPR for Plant Mediated Resistance*

Plant viruses depend on the host machinery for replication: they interact with many host factors required for replication and movement within the plant, which are essential to complete the infection cycle [44]. CRISPR/Cas allows the mutation or deletion of host genes encoding factors critical for viral infection, conferring recessive resistance, which, as an inherited trait, is very durable [45].

Considerable knowledge has been generated on the genetics of plant disease resistance, and many plant genes have been identified as essential for viral infection. These genes have been the focus of efforts to develop plant resistance using transgenic approaches [12,46,47]. These studies have provided many valuable potential targets for genome editing, such as the eukaryotic translation initiation factors eIF4E and eIF4G and their isoforms, which are directly involved in the infection process of viruses. These genes are now being subjected to targeted mutation by CRISPR to engineer plant resistance [48]. In fact, any host gene encoding a factor required by the virus is a potential CRISPR target.

This approach is attractive because it allows Cas9 and other DNA-targeting endonucleases to provide plant resistance to RNA viruses by mutating host factors/genes associated with viral pathogenesis [49]. In addition, plant-mediated resistance does not require the Cas9 and sgRNA transgenes to be maintained in the plant genome, so transgene-free virus-resistant plants can be obtained [14,42,49].

Several studies have achieved plant-mediated resistance against viruses using CRISPR/Cas9 (Table 1). For example, specific mutations knocking out the eIF(iso)4E gene in *Arabidopsis thaliana* resulted in stable resistance against *Turnip mosaic virus* (TuMV) [42]. Macovei et al. [50] developed rice plants resistant to *Rice tungro spherical virus* (RTSV) by mutating the eIF4G gene. Similarly, disruption of the eIF4E gene of cucumber (*Cucumis sativus*) conferred resistance to multiple members of the *Potyviridae*, namely the ipomovirus *Cucumber vein yellowing virus* (CVYV) and the potyviruses *Zucchini yellow mosaic virus* (ZYMV) and *Papaya ringspot mosaic virus* (PRSV) [49]. Resistance against *Clover yellow vein virus* (CYVV) was achieved in *A. thaliana* by targeting the eIF4E1 gene with CRISPR/Cas9 [51]. More recently, CRISPR/Cas9 was used in cassava to simultaneously mutate the novel cap-binding proteins 1 and 2 (nCBP-1 and nCBP-2), members of the eIF4E family, which increased resistance to *Cassava brown streak virus* (CBSV) [52].

Modifying plant genes always carries a risk of interfering with the plant functions associated with those genes, at a fitness cost to the host. Nevertheless, these examples demonstrate that CRISPR/Cas9 can produce genetically resistant plants through plant-mediated resistance without compromising plant functions.

#### *3.2. CRISPR for Virus Mediated Resistance*

Another approach to achieving plant viral resistance with CRISPR systems is to target viral genomes directly. This approach avoids the problems that may arise from interfering with host genes that may also be involved in other plant functions, such as growth or reproduction. However, for this type of resistance, CRISPR/Cas systems must specifically target and cleave the DNA of DNA viruses or the RNA of RNA viruses [43].

Virus-mediated resistance was first exploited against DNA viruses, because CRISPR/Cas systems that can cleave RNA were discovered more recently [27,39]. The discovery of such systems (class II type VI Cas effectors and Cas9 variants), namely Cas13a (C2c2), Cas13b (C2c6), Cas13c (C2c7), Cas13d, FnCas9, and RCas9 (RNA-targeting SpCas9) [20,27,53–56], was a major advance, enabling direct targeting of RNA viruses, which represent most plant pathogenic viruses.

Several studies have demonstrated that CRISPR can impart plant resistance by targeting either DNA or RNA viral genomes, delaying or reducing virus accumulation and significantly attenuating symptoms of infection [57]. Some of the studies that directly mutate DNA and RNA viruses in plants expressing the CRISPR/Cas machinery are described below (Table 1).

There are two major groups of plant DNA viruses: the double-stranded caulimoviruses and the geminiviruses, the latter of which, although single-stranded, replicate within the plant cell as double-stranded DNA [58]. According to the latest report of the International Committee on Taxonomy of Viruses (ICTV), the *Geminiviridae* is the largest family of plant viruses, with 485 species [59]. Geminiviruses infect many economically important crops, such as cassava, watermelon, squash, petunia, tobacco, pepper, potato, tomato, bean, soybean, cowpea, cotton, and others, reducing crop yields worldwide [60,61]. It is therefore not surprising that most studies of DNA virus-mediated resistance have focused on geminiviruses (Table 1). Ali et al. [62] delivered sgRNAs targeting coding sequences (Rep and coat protein genes) and non-coding sequences (the conserved intergenic region) of the *Tomato yellow leaf curl virus* (TYLCV) genome into Cas9-expressing *Nicotiana benthamiana* plants via the *Tobacco rattle virus* (TRV) system, which reduced viral DNA accumulation and symptoms. A subsequent study used a CRISPR/Cas9 system with an sgRNA targeting a region conserved among multiple geminiviruses (CLCuKoV, TYLCV, TYLCSV, MeMV, BCTV-Worland, and BCTV-Logan) to interfere with all of them simultaneously. It showed that targeting viral non-coding intergenic sequences was more efficient, limiting the emergence of viable viral variants that evade CRISPR-mediated immunity through NHEJ-induced mutations at the target site [40]. Other studies achieved resistance by expressing sgRNAs complementary to sequences within the *Bean yellow dwarf virus* (BeYDV), *Wheat dwarf virus* (WDV), or *Beet severe curly top virus* (BSCTV) genomes, which reduced virus accumulation and symptoms in Cas9-overexpressing *N. benthamiana*, barley, and *A. thaliana* plants [41,63,64]. Similarly, CRISPR/Cas9 was used to obtain resistance against banana streak disease by targeting endogenous *Banana streak virus* (eBSV) sequences [65].
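Targeting a region conserved among several viruses, as in the study above that used a single sgRNA against multiple viruses, amounts to finding a protospacer that occurs in every genome. The Python sketch below illustrates only that idea, under simplifying assumptions: exact matches, forward strand only, and an SpCas9-style NGG PAM. The toy sequences and the function names are hypothetical and do not correspond to the viral genomes cited.

```python
import re
from typing import Dict, Set

# Simplified SpCas9 rule (assumption): 20-nt protospacer + NGG PAM,
# forward strand only; the lookahead keeps overlapping matches.
SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def protospacers(dna: str) -> Set[str]:
    """All 20-nt forward-strand protospacers followed by an NGG PAM."""
    return {m.group(1) for m in SITE.finditer(dna.upper())}


def shared_protospacers(genomes: Dict[str, str]) -> Set[str]:
    """Protospacers present in every genome: candidates for one sgRNA
    that could target all of them (exact matches only)."""
    if not genomes:
        return set()
    return set.intersection(*(protospacers(seq) for seq in genomes.values()))


if __name__ == "__main__":
    conserved = "GATTACAGATTACAGATTAC" + "AGG"  # hypothetical conserved site
    toy_genomes = {
        "virus_A": "CCCC" + conserved + "TTTT",
        "virus_B": "TTTTTT" + conserved + "CCCC",
    }
    print(shared_protospacers(toy_genomes))
```

A practical design would also tolerate a few mismatches, scan both strands, and prefer non-coding regions, which the study above found to limit escape.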

**Table 1.** CRISPR/Cas for viral resistance in plants by targeting viral genome (virus mediated resistance) and host factors (plant mediated resistance).


Liu et al. [38] achieved resistance to a caulimovirus by expressing multiple sgRNAs targeting the coat protein gene of *Cauliflower mosaic virus* (CaMV) in Arabidopsis plants. Twenty days after mechanical inoculation of the virus, 85–90% of the plants remained symptomless and showed no detectable CaMV.

Immunity against the RNA viruses *Cucumber mosaic virus* (CMV) and *Tobacco mosaic virus* (TMV) was achieved in transgenic *N. benthamiana* and *A. thaliana* plants expressing FnCas9 and an sgRNA complementary to the viral genome, delivered through a pCambia-based vector [13]. Another study showed that, in *N. benthamiana* expressing Cas13a either transiently (from the binary vector pK2WG7) or constitutively, crRNAs complementary to different genomic regions of *Turnip mosaic virus* (TuMV), delivered through the TRV system, interfered with viral replication and spread [39]. The CRISPR/Cas13a (LshCas13a) system was shown to target and degrade TMV genomic RNA in *N. benthamiana* and to confer resistance to *Southern rice black-streaked dwarf virus* (SRBSDV) and *Rice stripe mosaic virus* (RSMV) in rice [15]. Zhan et al. [66] showed that transgenic potato lines expressing Cas13a/sgRNA constructs targeting conserved coding regions of different *Potato virus Y* (PVY) strains had broad-spectrum resistance against multiple PVY strains.
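Unlike Cas9, the Cas13 effectors described above are guided to single-stranded RNA, so the crRNA spacer must base-pair with the viral RNA itself. It is therefore the reverse complement of the targeted region. A minimal Python sketch of that relationship follows. The function name is hypothetical, spacer length is left to the caller, and any flanking-sequence requirements of particular Cas13 orthologs are ignored.

```python
# Complement table for RNA bases (A-U and G-C pairing).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def cas13_spacer_for(target_rna: str) -> str:
    """Return a crRNA spacer (5'->3') that base-pairs with target_rna.

    target_rna is the viral RNA region to target, written 5'->3'. DNA-style
    input (T instead of U) is accepted and converted to RNA first. The
    spacer is the reverse complement of the target.
    """
    target = target_rna.upper().replace("T", "U")
    if set(target) - set("ACGU"):
        raise ValueError("target must contain only A, C, G and U (or T)")
    return target.translate(RNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Hypothetical 6-nt target region, kept short for readability.
    print(cas13_spacer_for("AUGGCC"))
```

For the toy target 5′-AUGGCC-3′, the sketch returns 5′-GGCCAU-3′, which pairs antiparallel with every base of the target.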

As the studies above illustrate, CRISPR technology is highly versatile for engineering plant virus resistance and has successfully produced virus-resistant plants. CRISPR has the potential to accelerate resistance breeding, since it is more effective and faster than conventional breeding. In addition, because CRISPR can target viruses directly, it can be applied to crops with limited genome sequence information.

The use of CRISPR for plant virus resistance also has limitations that should not be overlooked. Knocking out essential host factors can cause plant lethality or impaired growth [67,68]. Although many studies of host-factor mutations reported no negative effects, introducing point mutations into host factor genes, instead of knocking them out, should be considered, so that viral infection is prevented without interfering with plant growth [69]. Another important limitation of CRISPR is undesired modification of the plant genome at off-target sites. Although off-target mutations appear to be much less common in plants than in other systems, they may be avoided by using catalytically inactive Cas nucleases [70] or systems that target only RNA, which is subsequently destroyed by the plant silencing system.

CRISPR/Cas-based resistance requires careful selection of sgRNA target sites so that targeted viruses do not evolve mutations that escape CRISPR/Cas cleavage, and so that novel, more severe strains that can no longer be cleaved do not arise [40,71]. Multiplex targeting and targeting of non-coding regions of viral genomes have been shown to reduce viral mutation rates and to minimize the emergence of new infectious viral strains [40]. CRISPR/Cas systems that target or bind RNA can also be combined with Cas9 to reduce the RNA intermediates of DNA viruses, eliminating viruses that escape the CRISPR/Cas9 machinery [40]. FnCas9 has been shown to bind viral transcripts, which may provide even more durable resistance than nucleases that cleave their targets directly [43].
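The rationale for multiplex targeting can be made concrete with a deliberately simplified model that is not taken from the cited studies. Suppose a virus escapes cleavage at any one targeted site with probability p, and escape at different sites is independent; it then escapes all k sites with probability p^k. The Python sketch below encodes only this assumption. Real escape also depends on viral mutation rates, the fitness cost of mutations, and whether the targeted sites are coding or non-coding.

```python
def escape_probability(per_site_escape: float, n_sites: int) -> float:
    """Probability of escaping cleavage at all n_sites (toy model).

    Assumes every targeted site has the same escape probability and that
    escape events at different sites are independent.
    """
    if not 0.0 <= per_site_escape <= 1.0:
        raise ValueError("per_site_escape must lie in [0, 1]")
    if n_sites < 1:
        raise ValueError("at least one site must be targeted")
    return per_site_escape ** n_sites


if __name__ == "__main__":
    p = 0.1  # hypothetical per-site escape probability, for illustration only
    for k in (1, 2, 3):
        print(k, "site(s):", f"{escape_probability(p, k):.3g}")
```

With the hypothetical p = 0.1, the printed probabilities fall tenfold per additional site (0.1, 0.01, 0.001). The independence assumption is the main simplification: a single deletion spanning two nearby sites would violate it.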

Much work remains before the full potential of CRISPR/Cas systems for engineering plant virus resistance is realized, and further studies are needed to improve their efficiency. Nevertheless, CRISPR is clearly a milestone in plant virus resistance, and its use in agriculture is expected to result in higher plant yields and quality.

#### **4. Delivery and Expression of CRISPR Systems in Plants**

A crucial step for highly efficient CRISPR genome engineering is the delivery and expression of CRISPR/Cas components within plant cells [72], which strongly influences editing efficiency.

If alien DNA is introduced into the host so that it is incorporated into the host genome (transgenic plants), expression is stable and higher editing efficiencies may be obtained, but undesirable off-target mutations are more likely [73]. Conversely, if the introduced DNA is not incorporated into the host genome and is only expressed transiently, the host is considered free of alien DNA, or simply DNA-free.

Transient expression can be achieved by delivering the CRISPR/Cas components as ribonucleoproteins (RNPs), or in plasmids or other vectors delivered by agroinfiltration. Several studies have expressed both Cas and sgRNA constitutively, both transiently, or one transiently and the other constitutively [72].

Transiently expressing the Cas endonuclease while expressing the sgRNAs constitutively reduces off-target modifications and maintains high sgRNA expression. However, this strategy requires two different plasmids (three if a donor DNA is used for knock-in). Transient expression of all CRISPR/Cas components (when no donor DNA for repair is used) can yield DNA-free plants, avoiding the hurdles associated with transgenic plants.

In either case, the CRISPR/Cas components should ideally be expressed in germline cells. This happens readily with stable integration, since every cell of a transgenic plant expresses the CRISPR system, but may not happen with transient expression. With transient expression, the CRISPR/Cas components must be introduced directly into germline cells or be able to migrate to them, so that mutations can be transmitted to the next generation without tissue culture and the labor and time it requires.

Several methods have been used to introduce CRISPR/Cas components into plants, including Agrobacterium-mediated T-DNA transformation and physical methods such as protoplast transfection and microprojectile bombardment. These methods rely on carriers such as plasmids, ribonucleoproteins, or viruses to deliver the sequences to be introduced.

Plant protoplasts are obtained by digesting cell walls with enzymes, and editing reagents can then be delivered by electroporation or polyethylene glycol (PEG) treatment. Transfection of CRISPR/Cas components into protoplasts, followed by plant regeneration, has introduced mutations with editing efficiencies ranging from 3% to 46%, resulting in either stable or transient expression in several plants, including rice, soybean, *A. thaliana*, potato, grapevine, wheat, and lettuce [74–81]. This method has been used to create DNA-free edited plants by delivering preassembled Cas9-sgRNA ribonucleoproteins (RNPs) [79,80,82], which cannot be delivered by Agrobacterium [83]. Delivering Cas9-sgRNA RNPs instead of plasmids encoding Cas9 and sgRNA also avoids a drawback of plasmids: when degraded by cellular nucleases, they yield small DNA fragments that may be undesirably inserted into the host genome [84]. Protoplast transfection can deliver multiple components to a large number of transfectable cells and can yield vector-free or DNA-free plants, since each regenerant derives from a single genetically modified protoplast. This is an important advantage, as plants edited by protoplast transfection may not be subject to the regulatory issues and ethical barriers associated with transgenic plants. However, knock-in requires an exogenous DNA template, in which case regulation may no longer be avoided. In addition, protoplast transfection is often associated with problems in plant regeneration and with undesired somaclonal mutations.

Another method used to deliver CRISPR/Cas components into plants is biolistic bombardment, in which microprojectiles (generally gold, silver, or tungsten particles) are coated with DNA constructs and fired into plant cells at high pressure to penetrate the cell wall. Biolistic delivery of CRISPR/Cas9 plasmids on gold particles has introduced targeted mutations with stable integration in rice, wheat, and soybean genomes, with editing efficiencies ranging from 14.5% to 76% [31,85,86]. Another study achieved editing in wheat through transient expression of CRISPR/Cas9 DNA (TECCDNA), with editing efficiencies of 1–9.5% [35]. Edited plants without integrated alien DNA were obtained by biolistic delivery of RNPs in maize [87] and wheat [88], with editing efficiencies ranging from 21.8% to 47%. A vector based on the geminivirus *Wheat dwarf virus*, pWDV2, carrying both Cas9 and sgRNA, was used for biolistic transformation of wheat and provided a 12-fold increase in editing efficiency compared with delivery by traditional vectors [81]. The use of viruses to deliver CRISPR/Cas components is discussed further below. Biolistic bombardment is usually efficient, can deliver multiple constructs simultaneously, and can be used in many plant species. Its major disadvantage is that it introduces multiple copies of the genes, integrated at random sites in the genome, which can lead to phenomena such as gene silencing in the recovered transgenic plants. It is also more costly than other methods.
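The editing efficiencies and fold increases quoted in this section are simple ratios. As a hedged illustration, assume efficiency is reported as the percentage of analysed plants or events carrying the intended edit; definitions vary between studies (for example, per plant, per allele, or per sequencing read). Under that assumption, both figures can be computed as follows. The function names and the counts below are hypothetical and are not taken from the cited studies.

```python
def editing_efficiency(edited: int, analysed: int) -> float:
    """Percentage of analysed plants/events that carry the intended edit."""
    if analysed <= 0:
        raise ValueError("analysed must be a positive count")
    if not 0 <= edited <= analysed:
        raise ValueError("edited must lie between 0 and analysed")
    return 100.0 * edited / analysed


def fold_change(new: float, reference: float) -> float:
    """How many times higher the new efficiency is than the reference."""
    if reference <= 0:
        raise ValueError("reference efficiency must be positive")
    return new / reference


if __name__ == "__main__":
    # Hypothetical counts for two delivery methods, for illustration only.
    reference_eff = editing_efficiency(edited=2, analysed=100)   # 2.0 %
    improved_eff = editing_efficiency(edited=24, analysed=100)   # 24.0 %
    print(f"{reference_eff:.1f}% vs {improved_eff:.1f}%: "
          f"{fold_change(improved_eff, reference_eff):.0f}-fold increase")
```

Because a fold change compares two ratios, it is only meaningful when both efficiencies are measured the same way, which is worth checking before comparing figures across studies.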

To date, the most common system for obtaining transgenic plants is based on *Agrobacterium tumefaciens*, which has been widely used to deliver CRISPR/Cas components into cells of a variety of plant species. Agrobacterium transfers a piece of its DNA (the T-DNA) to the plant cell nucleus, where it integrates randomly into the plant genome [89]. Cas9 and sgRNA expression cassettes can easily be cloned into a Ti plasmid, transformed into Agrobacterium, and then introduced into plants. Many studies have used *A. tumefaciens* to deliver CRISPR/Cas components into plant cells, achieving T-DNA insertion and stable integration of transgenes in the genomes of many plant species (such as sorghum, *A. thaliana*, rice, tomato, maize, grapevine, aspen, rapeseed, and watermelon), with editing efficiencies ranging from 23% to 100% [36,75,90–94].

Agrobacterium can also be used for transient expression of Cas9/sgRNA (agroinfiltration) [95], which has been achieved in citrus with an editing efficiency of 20% [33]. In *N. benthamiana*, rice, and *A. thaliana*, viral transient expression resulted in editing efficiencies of up to 85% [23,62,96]. The use of viruses to deliver CRISPR/Cas components is discussed further in the following section.

*Agrobacterium rhizogenes* has also been used for genome editing, resulting in stable integration of foreign DNA in soybean and a few other plant species, with editing efficiencies ranging from 14.7% to 95% [97–99]. With *A. rhizogenes*, the appearance of hairy roots indicates a successful transformation event; however, whole plants must then be regenerated from these roots, which can be problematic for some species.

Agrobacterium-mediated delivery has several advantages: it requires technology available in most laboratories, it is cheap, and it allows multiplex editing, as multiple binary vectors can be introduced into Agrobacterium and co-transformed into plant cells. In addition, it can be used in transient assays, which may yield non-transgenic plants with fewer edited off-target sites.

#### *The Use of Viruses to Carry CRISPR Components*

Many viruses, including retroviruses, adenoviruses, and adeno-associated viruses, have already been shown to deliver genome-engineering reagents effectively in mammalian systems [100,101].

In plants, *Tobacco mosaic virus* (TMV) was the first virus to be engineered as a vector, producing virus-induced gene silencing (VIGS) of an endogenous gene in *N. benthamiana* [102]. Since then, many other viruses have been widely used as vectors for gene silencing and for expression of foreign proteins in plants. Their specific use to deliver genetic material such as CRISPR/Cas components, however, is much more recent. The first reports of viruses assisting CRISPR/Cas gene editing appeared in 2014 and were based on geminiviruses [103]. Since then, studies have focused not only on DNA geminiviruses [23,81,96,104,105] but also on RNA viruses [40,62,106–111] as sgRNA delivery systems.

The many studies using geminiviruses as vectors result mostly from how easily these viruses can be manipulated. Geminiviruses (family *Geminiviridae*) are widespread, insect-transmitted, and infect a wide range of plants [60,112]. They have circular single-stranded DNA genomes, either monopartite or bipartite, of 2.5 to 3 kb with four to six open reading frames (ORFs). Once inside a plant cell, the single-stranded genome is converted into a double-stranded intermediate, which serves as the template for transcription and rolling-circle replication. Geminiviruses require only one replication initiator protein, Rep (C1), to initiate rolling-circle replication inside the host. After replication, single-stranded genomes are either converted to double-stranded intermediates to start another replication cycle or encapsidated by the coat protein to form virions, which then move to adjacent cells through plasmodesmata. Their small size makes geminiviruses easy to manipulate but also physically limits their cargo capacity; consequently, they cannot carry long DNA fragments such as genes encoding Cas nucleases (~4.2 kb) [113].

In some bipartite begomoviruses, the CP can be replaced with a heterologous sequence of up to 800 bp, or up to 1000 bp with further modifications, while retaining most of the features required for movement and replication [96,103,114]. Even with this change, geminiviruses still cannot carry long DNA fragments such as genes encoding Cas nucleases, but the capacity is sufficient to express and produce high amounts of sgRNA. In fact, the number of double-stranded intermediates during viral replication is higher in the absence of the CP, possibly because the CP sequesters ssDNA and packages it into viral particles.
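The cargo arithmetic behind these statements is straightforward. The Python sketch below compares the approximate limits quoted above (about 800 bp for CP replacement in some bipartite begomoviruses, about 1000 bp with further modifications) with the roughly 4.2 kb Cas nuclease gene. The 200 bp used for an sgRNA expression unit is a hypothetical assumption (an sgRNA of about 100 nt plus some flanking sequence), the dictionary and function names are illustrative, and real capacity also depends on the vector construct and host.

```python
from typing import Dict, List, Tuple

# Approximate limits quoted in the text (bp); rough figures only.
CARGO_LIMITS_BP: Dict[str, int] = {
    "begomovirus, CP replaced": 800,
    "begomovirus, CP replaced + further modifications": 1000,
}

# Cas nuclease size from the text; the sgRNA unit size is a hypothetical
# assumption (an ~100-nt sgRNA plus some flanking sequence).
INSERTS_BP: Dict[str, int] = {
    "Cas nuclease gene": 4200,
    "sgRNA unit (assumed)": 200,
}


def cargo_table(limits: Dict[str, int],
                inserts: Dict[str, int]) -> List[Tuple[str, str, bool]]:
    """Return (vector, insert, fits) for every vector/insert combination."""
    return [
        (vector, insert, size <= capacity)
        for vector, capacity in limits.items()
        for insert, size in inserts.items()
    ]


if __name__ == "__main__":
    for vector, insert, fits in cargo_table(CARGO_LIMITS_BP, INSERTS_BP):
        print(f"{insert:<22} in {vector:<50} {'fits' if fits else 'too large'}")
```

Under these figures, the Cas gene exceeds both limits by more than fourfold, while a short sgRNA unit fits comfortably. This is the same reasoning that restricts CP-replacement vectors to sgRNA delivery.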

To increase cargo capacity, geminiviruses have been converted into non-infectious replicons (GVRs) by removing the movement protein (MP) and coat protein (CP) coding sequences, thereby eliminating cell-to-cell movement and insect transmission. Such viral vectors are not infectious on their own and must be delivered into plant cells by Agrobacterium-mediated transformation, whereas vectors for virus-induced gene editing (VIGE) can also be delivered by agroinfiltration or mechanical inoculation. These deconstructed DNA replicons have been used to introduce the large amounts of repair template that HDR requires to outcompete NHEJ, and they have achieved highly efficient HDR in plants.

Several studies have used geminiviruses to assist CRISPR/Cas (Table 2). Baltes et al. [103] used *Bean yellow dwarf virus* (BeYDV) replicons to efficiently deliver a sequence-specific nuclease (Cas9) and a repair template to tobacco plants for gene targeting. This demonstrated considerable cargo capacity and gene targeting frequencies two orders of magnitude higher than with conventional Agrobacterium T-DNA transformation. BeYDV replicons also enabled genome editing in potato, producing mutations that supported a reduced herbicide susceptibility phenotype, whereas Agrobacterium T-DNA transformation yielded no detectable mutations for the same phenotype [104]. Čermák et al. [105] used BeYDV replicons to insert a strong promoter upstream of ANT1, a tomato (*Solanum lycopersicum*) gene that regulates anthocyanin synthesis, with efficiencies 12-fold higher than traditional Agrobacterium T-DNA delivery. Similar efficiencies were obtained by Yin et al. [96], who used *Cabbage leaf curl virus* (CaLCuV) for VIGE, replacing the viral CP with an sgRNA to edit different genes (NbPDS3 and NbIspH) in *N. benthamiana* plants. VIGE relies on Cas9 overexpression in plants and transient delivery of geminivirus vectors carrying sgRNAs, and can be used as an alternative to VIGS.

In 2017, *Wheat dwarf virus* (WDV) replicons were used for gene targeting in wheat and rice [23,81]. WDV replicons showed high gene targeting efficiency and enabled targeting of multiple genes within the same cell [81]. Using this WDV-based system, Wang et al. [23] achieved efficient HDR in rice.

In addition to geminiviruses, many RNA viruses have been used as vectors in plants (Table 2).

RNA virus-based vectors have the advantage that they cannot accidentally integrate into the plant genome, resulting in DNA-free plants and avoiding additional regulatory and ethical issues.

One such vector, also widely used for VIGS, is *Tobacco rattle virus* (TRV) [115]. TRV belongs to the genus *Tobravirus*, family *Virgaviridae*. It infects over 400 plant species and is transmitted by nematodes of the family *Trichodoridae*. It has a bipartite genome of two positive-sense single-stranded RNAs, RNA1 (TRV1) and RNA2 (TRV2). TRV1 is essential for virus replication and movement, whereas TRV2 encodes the CP and nonstructural proteins involved in nematode transmission. For use as a vector, these nonstructural protein genes in TRV2 can be replaced with the fragments of interest [116].

TRV was first applied as a vector for genome engineering in a non-transgenic approach to deliver zinc-finger nucleases (ZFNs) into plants, with RNA2 engineered to carry the Zif268:FokI ZFN. With this system, targeted genome modifications were recovered at an integrated reporter gene in somatic tobacco and petunia cells, and transmission of the mutations to the next generation confirmed the stability of the ZFN-induced changes [117].

TRV was first used as a CRISPR vector in 2015, when it was developed as a vehicle for delivering sgRNAs to modify the genomes of *N. benthamiana* and *A. thaliana* [115]. A TRV vector carrying an sgRNA for the phytoene desaturase (PDS) gene was introduced by agroinfection into leaves of transgenic *N. benthamiana* lines overexpressing Cas9, resulting in modification of the PDS gene [115]. TRV was also able to infect germline cells: TRV-mediated sgRNA delivery was not limited to the infiltrated plants, and the desired modification was successfully recovered in the next generation [115]. TRV can carry inserts of up to 3000 bp, which is still not enough for the Cas gene. TRV has therefore been used only to deliver sgRNAs into transgenic plants stably expressing a Cas nuclease, so all genome-edited plants obtained this way are transgenic.

TMV, as mentioned above, was the first virus engineered as a vector in plants. It accumulates to high levels and drives high gene expression in several hosts, and its derived gene vectors remain intact for long periods [107,118]. Building on this potential, TMV was also developed as a vehicle for sgRNA delivery by partially replacing the CP with an sgRNA [107]. TMV mediated target gene editing, delivering high concentrations of sgRNA and efficiently editing the target host gene in *N. benthamiana* plants that had previously been infiltrated with a Cas9-expressing plasmid [107].

Ali et al. [106] demonstrated that *Pea early browning virus* (PEBV) can deliver sgRNAs, resulting in mutagenesis of the targeted genomic loci in *N. benthamiana* plants constitutively overexpressing Cas9, more efficiently than TRV. In addition, like TRV, PEBV can infect meristematic tissues [119], which may allow the recovery of seeds carrying the desired mutations and obviate the need for tissue culture to generate heritable targeted mutations. *Barley stripe mosaic virus* (BSMV) has also been engineered as an sgRNA delivery system for CRISPR/Cas9-mediated targeted mutagenesis in wheat and maize, both transformed to constitutively express Cas9 [108]. Recently, *Beet necrotic yellow vein virus* (BNYVV)-based vectors were designed to allow simultaneous expression of multiple foreign proteins and were used for efficient sgRNA delivery for genome editing in transgenic *N. benthamiana* plants expressing Cas9 [109].

*Foxtail mosaic virus* (FoMV) has also been shown to express sgRNAs in *N. benthamiana*, *Setaria viridis*, and maize plants constitutively expressing Cas9, demonstrating that FoMV can enable gene editing [110].

All of these plant RNA virus vectors successfully expressed sgRNAs and introduced mutations into the genomes of plants overexpressing Cas9.

Until recently, there were no reports of delivering the entire CRISPR/Cas system into plants through viral vectors, because of their limited capacity for carrying DNA/RNA fragments [120]. This limitation was overcome when technical breakthroughs in delivering all CRISPR/Cas components into plant cells using negative-strand viruses were reported [121,122]. The negative-strand viruses *Barley yellow striate mosaic virus* (BYSMV) and *Sonchus yellow net rhabdovirus* (SYNV) were used to successfully deliver CRISPR/Cas reagents and sgRNAs into plant cells. Ma et al. [122] showed that SYNV was able to knock out different genes in plants, achieving highly efficient DNA-free genome editing. This study also demonstrated the multiplex editing ability of the virus-delivered CRISPR/Cas9 system, using sgRNAs designed for different genes without loss of efficiency, and confirmed that genome-edited plants pass the genome alteration on to subsequent generations. However, rhabdoviruses rarely infect germline cells, and SYNV-mediated genome editing works efficiently only in somatic cells, so plant tissue culture is required to obtain individual genome-edited plants.


**Table 2.** Viruses used to carry CRISPR sequences into plants and type of delivery.

More recently, *Potato virus X* (PVX) has also been used to efficiently deliver both Cas9 and sgRNA into *N. benthamiana* plants [111]. PVX has flexible filamentous particles containing a 6345 nt (+) ssRNA, and each particle contains ~1350 coat protein subunits [123]. Unlike in small viruses, insert size is unlikely to be physically limited in PVX. Cas9 and sgRNA were placed between the triple gene block (movement proteins MP1, MP2, and MP3) and the CP of PVX, and the virus vector was both agroinfiltrated and mechanically inoculated into *N. benthamiana* plants. PVX-Cas9 RNA infected most cells and expressed large amounts of Cas9 protein, whereas T-DNA integration into the *N. benthamiana* genome occurred at low frequency. In addition, the introduced mutation was inherited by the next generation, but no PVX RNA was detected in these progeny, showing that PVX was not transmitted through seed. This suggests that transgenerational transmission of PVX is unlikely, resulting in DNA-free genome-edited plants [111]. A delivery method as simple and efficient as mechanical inoculation of a virus carrying the entire CRISPR/Cas system greatly facilitates transgene-free gene editing in plants.

#### **5. Challenges in the Use of Viruses for CRISPR**

Virus-mediated delivery of CRISPR/Cas is an easy way to introduce Cas nucleases and sgRNAs into plants. It overcomes many challenges of transgene delivery without additional requirements, allowing a desired trait to be edited into a plant, in the laboratory or in the field, to obtain an improved DNA-free plant. Viral vectors present several advantages: they are easy to manipulate; the viral genome can be used as a repair template; they replicate to high copy numbers and accumulate at high levels (together with the sgRNAs and repair template) and spread systemically in plants, leading to high levels of expression and genome editing efficiency; multiple sgRNAs can be expressed from a single viral genome, allowing multi-targeted genome editing; and VIGE phenotypic alterations appear in plants in a relatively short time. VIGE is therefore a promising tool for transgene integration-free genome editing, as it may avoid or simplify the production of transgenic lines, a process that is often laborious, time-consuming, and expensive, and that raises public concerns and additional regulatory requirements [124,125].

In addition, some viruses used as CRISPR/Cas vectors have shown the capacity to invade meristems, systemically delivering sgRNAs and thereby enabling the recovery of progeny carrying the targeted genomic modification. This overcomes the need for tissue culture (i.e., starting from leaf tissue, regenerating the whole plant, and then genotyping it for the presence of the modification) [62,106] and opens new possibilities for producing plants with desired characteristics without laborious and time-consuming steps. Therefore, a viral vector for genome engineering should ideally infect germline cells, so that mutant seeds can be harvested from infected plants.

VIGE, especially when RNA-based, may also help reduce off-target activity, a major issue in CRISPR in which unintended sites in the genome are edited as a result of sgRNA mismatches and continuous expression of Cas nucleases [126]. When viruses are used to express CRISPR/Cas systems, these components are expressed only when the virus invades plant cells, which limits the concentration of Cas and thus makes off-target effects less likely [127].

Despite these advantages, the limited cargo capacity of many viruses (typically <1 kb) is a major drawback for delivering all gene editing reagents, including Cas9 (approx. 4.2 kb), because excess cargo results in loss of systemic movement or loss of the cargo DNA [128].

For this reason, viruses have either been developed to deliver sgRNAs to transgenic plants expressing Cas9 or been deconstructed into non-infectious replicons; more recently, negative-sense RNA viruses and PVX have been shown to carry the entire CRISPR/Cas system. Together, these studies show the great potential of plant viruses as vectors to efficiently target and deliver CRISPR/Cas reagents.

Further research may allow positive-strand RNA or DNA viruses to be engineered to carry large DNA/RNA sequences without affecting their infectivity and with even greater editing efficiency.

#### **6. Concluding Remarks and Future Prospects for CRISPR in Agriculture**

CRISPR/Cas technology has considerably simplified genetic engineering and shows great potential for improving several plant traits, including resistance not only to viral pathogens but also to fungi, bacteria, and insects, as well as tolerance to abiotic stresses and increased yield [129–131], overcoming many difficulties of the techniques used until now [13–15]. Placed at the disposal of plant breeding, this innovative technology holds promise for protecting crops against abiotic and biotic stresses, so that farmers can meet consumers' expectations for healthy and affordable products obtained using fewer natural resources.

Technological improvements are still needed, such as more precise editing and strategies that bypass the need for tissue culture. When using genome editing strategies, the possibility of editing unintended sites in the genome (off-target effects) can never be ignored. As mentioned above, using viruses as vectors for CRISPR systems may reduce these collateral effects. In addition, CRISPR/Cas approaches in which a single nucleotide is chemically modified instead of producing a double-strand break (DSB), such as base editing, may also be widely used to reduce off-target effects [26].

Another constraint on the implementation of CRISPR as a plant breeding technique is the difficulty of obtaining new edited plants without tissue culture. Regenerating plants through tissue culture is time-consuming and may produce random somatic mutations. In addition, some crops are recalcitrant to regeneration through tissue culture. Delivering CRISPR components to plant apical meristems, so that the harvested seeds carry the mutations, is therefore desirable and has already been shown to be possible. However, many crop plants lose valuable traits when propagated by seed.

Beyond the technical and scientific hurdles, CRISPR must also contend with social and political issues, such as public concerns and government regulations, mostly associated with transgenic plants. It is essential to provide clear information on CRISPR to the public and to governments to gain their acceptance and to influence regulatory policies on the use of CRISPR technologies in agriculture. The first point to clarify is that CRISPR can be applied to rapidly produce plants with traits that could just as easily result from conventional plant breeding, since deletions and small insertions may occur naturally or be induced during conventional breeding. Alternatively, CRISPR can be used to introduce exogenous genes into plants, and only in that case would the resulting plants be equated with genetically modified organisms (GMOs).

Plants edited with CRISPR/Cas have attracted considerable regulatory attention. The United States Department of Agriculture (USDA) has recently classified genome-edited plants as safe for human consumption and the environment, provided that the resulting mutations are indistinguishable from those that occur naturally or through traditional breeding techniques. USDA considers genome editing an extension of traditional plant breeding that can introduce new traits into plants more quickly and precisely, saving years or decades in bringing needed new varieties to farmers; this is a great advance for the application of CRISPR in agriculture (Code of Federal Regulations, Title 7, Part 340). This view has been adopted by most of the world, with the exception of the European Union, where, in 2018, the European Court of Justice (ECJ) ruled that genome-edited organisms are GMOs pending clarification of their legal status. As such, they are at present subject to the same obligations as transgenic organisms (Judgement in case C-528/16) and therefore fall under the European GMO Directive (2001/18/EC). The European Commission is currently carrying out a study on the potential of new genomic techniques to contribute to sustainability, provided that the resulting products are safe for consumers and the environment, as stated in the communication 'A Farm to Fork Strategy for a fair, healthy and environmentally-friendly food system' (COM/2020/381). This study is expected to be concluded in April 2021 and may lead to a different perception. However, the current regulation is a clear obstacle to European agricultural innovation, as it makes it very difficult for genome-edited products to reach the market and strongly affects competitiveness relative to countries with fewer restrictions.

CRISPR is a powerful plant breeding tool that can contribute to food security for the ever-growing world population and to sustainable agriculture, and discussions concerning the risks associated with genome editing should be driven more by scientific principles than by socio-political factors.

**Author Contributions:** Conceptualization, C.M.R.V. and P.M.; Resources, C.M.R.V., M.d.R.F., and P.M.; Writing—original draft preparation, C.M.R.V. and P.M.; Writing—review and editing, C.M.R.V., P.M., M.d.R.F., M.D.C., and M.P.; Funding acquisition, C.M.R.V. and P.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the projects "Control of olive anthracnose through gene silencing and gene expression using a plant virus vector" with the references ALT20-03-0145-FEDER-028263 and PTDC/ASP-PLA/28263/2017 and "Development of a new virus-based vector to control TSWV in tomato plants" with the references ALT20-03-0145-FEDER-028266 and PTDC/ASP-PLA/28266/2017, both co-financed by the European Union through the European Regional Development Fund, under the ALENTEJO 2020 (Regional Operational Program of the Alentejo), ALGARVE 2020 (Regional Operational Program of the Algarve) and through the Foundation for Science and Technology, in its national component. M.P. was supported by the FCT research grant SFRH/BD/145321/2019. This work was also funded by the National Funds through FCT—Foundation for Science and Technology under project no. UIDB/05183/2020.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


*Article*
