**Bovine Foamy Virus: Shared and Unique Molecular Features In Vitro and In Vivo**

#### **Magdalena Materniak-Kornas <sup>1</sup>, Juan Tan <sup>2</sup>, Anke Heit-Mondrzyk <sup>3</sup>, Agnes Hotz-Wagenblatt <sup>3</sup> and Martin Löchelt <sup>4,</sup>\***


Received: 27 September 2019; Accepted: 19 November 2019; Published: 21 November 2019

**Abstract:** The retroviral subfamily of *Spumaretrovirinae* consists of five genera of foamy (spuma) viruses (FVs) that are endemic in some mammalian hosts. Closely related species may be susceptible to the same or highly related FVs. FVs are not known to induce overt disease and thus do not pose medical problems to humans and livestock or companion animals. A robust lab animal model is not available, nor is any lab animal a natural host of a FV. Due to this, research is limited and often focused on the simian FVs with their well-established zoonotic potential. The authors of this review and their groups have conducted several studies on bovine FV (BFV) in the past with the intention of (i) exploring the risk of zoonotic infection via beef and raw cattle products, (ii) studying a co-factorial role of BFV in different cattle diseases with unclear etiology, (iii) exploring unique features of FV molecular biology and replication strategies in non-simian FVs, and (iv) conducting animal studies and functional virology in BFV-infected calves as a model for corresponding studies in primates or small lab animals. These studies yielded new insights into FV-host interactions, mechanisms of gene expression, and transcriptional regulation, including miRNA biology, host-directed restriction of FV replication, and spread and distribution in the infected animal and at the population level. The current review attempts to summarize these findings in BFV and tries to connect them to findings from other FVs.

**Keywords:** bovine foamy virus; BFV; foamy virus; spuma virus; model system; animal model; animal experiment; miRNA function; gene expression; antiviral host restriction

#### **1. Introduction**

The family of *Retroviridae* is divided into two subfamilies: the *Spumaretrovirinae*, which consist of five genera of different spuma or foamy viruses (FVs) with shared and unique features that separate them from the canonical retroviruses, and the *Orthoretrovirinae*, which comprise all other known exogenous retroviruses (Figure 1) [1]. Owing to this sheer difference in numbers and the apparent lack of pathogenicity of FVs, the number of research groups working on FVs is correspondingly small, and it is further split by individual research focus and by the FV isolate or host species used in their studies.

**Figure 1.** Phylogenetic tree of known exogenous and endogenous foamy viruses (FVs) (blue branches) and members of the *Orthoretrovirinae*. A FASTA file with the conserved regions of the Pol proteins (supplement from ref. [2]) and prototype FV (PFV, U21247.1) was used for alignment with ClustalW (http://www.clustal.org/). From the alignment, a maximum likelihood (ML) tree was created using FastML (https://fastml.tau.ac.il, default parameters). The resulting Newick tree was displayed with iTOL (https://itol.embl.de/).

Most molecular analyses have been conducted on the so-called prototype/primate FV isolate, initially designated human FV but subsequently shown to be the end-product of the zoonotic transmission of a chimpanzee FV to an East African nasopharyngeal carcinoma patient [3]. Upon subsequent propagation and passaging in diverse human and non-human cell lines and concomitant severe genetic changes [4], this virus became the best-studied FV isolate and gained the name prototype FV (PFV). However, its prototypic character might be questioned, since research on highly related simian FVs or more distantly related FVs of feline, bovine, and equine origin is lagging behind and has revealed, at least in selected cases, more or less different data (Figure 1 and Table 1) [5]. Simian FVs share substantial relatedness and, despite having a long co-evolutionary history with their cognate hosts, inter-species transmission is frequent and well documented among closely related hosts, like Old World monkeys and apes, including humans, but also between New World monkeys and humans [5–8]. In general, with only very few known exceptions, FVs co-speciate with their cognate hosts, and more or less closely related species may be susceptible to the same or a highly related FV [5,9–11]. This host range is likely due to the co-evolution of the virus with its host, with FVs being the most ancient retroviruses according to the presence of endogenous viruses in all of the vertebrate groups [10,12].

**Table 1.** Special features and novel insights that were gained from past and current work on Bovine Foamy Virus.


Although a so-called prototype (and/or primate) FV exists in the literature, conserved, prototypic features, besides those basic characteristics that led to the establishment of an independent subfamily of FVs, are currently only partially known. Here, an unbiased comparison of distant FVs and their replication strategies might be worth trying to discriminate basic from deduced, secondary features. In addition, unique data not available for the other FVs have been generated for bovine FV (BFV, see Table 1), and there is the question whether they represent shared or unique features [5]. In this review, we try to use the current data on diverse aspects of the molecular biology of BFV to broaden and complete the overall knowledge of FV biology and indicate avenues of further investigation on BFV biology in vivo and in vitro. In Table 1, the biggest achievements and strengths in the BFV system are summarized and this review will cover some of them in more depth.

#### **2. Specific Topics and Highlights in BFV Biology and Virus-Host Interaction**

#### *2.1. Historic View*

While the first FV was already described in 1954 [29], the first FV from cattle was isolated 15 years later and designated Bovine Syncytial Virus [30]. The subsequent isolates were also designated Bovine Spuma Virus and Bovine Spumaretrovirus before the name bovine foamy virus (BFV) was coined and finally acknowledged by the ICTV in 1999 (https://talk.ictvonline.org//taxonomy/p/taxonomy-history?taxnode\_id=20074661) [1]. Holzschu et al. [19] published the first full-length nucleotide sequence of a BFV isolate from the United States (US) in 1998. These data confirmed the overall genetic structure and coding capacity of BFV as a typical member of the FV genus (Figure 2A).

**Figure 2.** Genetic structure and schematic illustration of bovine foamy virus (BFV) gene expression and the BFV primary miRNA. (**A**) The BFV provirus DNA genome is shown schematically and not to scale at the top, with the long terminal repeats (LTRs) consisting of the U3, R, and U5 regions. The position of the miRNA cassette in the U3 regions is indicated in color. BFV genes are shown as overlapping open boxes sub-divided into the mature protein domains. Proteolytic processing is marked by dotted lines. The spliced *bet* gene is separately shown below the genome. Broken arrows indicate the transcriptional start sites and direction of LTR- and internal promoter- (IP) directed gene expression, and the Tas-mediated transactivation of the 5'LTR and the IP is indicated in red. Below, a selection of the major early and late BFV transcripts starting at the IP and LTR are shown with spliced-out areas indicated by broken lines. Only the major BFV IP-directed Tas mRNA is shown (\*). The shift between early and late transcription is marked by a boxed arrow at the right-hand margin. (**B**) The predicted folding and secondary structure of the BFV dumbbell-shaped miRNA precursor (BFV pri-miRNA) is given; for additional information and the sequence of the mature and stable miRNAs, see below and Whisnant et al., 2014 [22].

This opened the way for functional and genetic studies on the molecular biology and replication of BFV in cell cultures and experimentally BFV-infected animals. In addition, it allowed for the establishment of tools for high sensitivity and specificity detection and diagnosis, as described in the subsequent chapters and undertaken in the labs of Jacek Kuzmak and Magdalena Materniak-Kornas, Wentao Qiao, Yunqi Geng and Juan Tan, and Martin Löchelt and co-workers (Figure 3).

Almost unrecognized because the work was published exclusively in German, the BFV Riems isolate was established and characterized by Dr. Roland Riebe and co-workers in East Germany (Friedrich Löffler-Institute, Riems, Germany) in the early 1980s [17,18]. The original BFV Riems isolate is, to our knowledge, the only FV that has been exclusively propagated in primary cells of its authentic host species. It thus might have "suffered" fewer genetic changes and co-adaptive imprints due to (repeated) host cell changes and prolonged growth in tumor cells displaying highly selected and aberrant features.

**Figure 3.** BFV100-infected canine fetal thymus Cf2Th cells: (**A**) Giemsa stained syncytia; (**B**) detection of BFV Gag proteins (red) by indirect immunofluorescence, nuclei were stained in blue; BFV particles budding from the (**C**) plasma membrane (magnification is 60,000-fold) and (**D**) accumulating intracellularly in the endoplasmic reticulum (magnification is 32,000-fold) as visualized by transmission electron microscopy. Scale bars in (**A**,**B**) are 250 μm and in (**C**,**D**) 500 nm.

#### *2.2. Excellent, Well Established Non-Primate FV Model of Transactivation, Gene Expression and Gene Function*

Gene expression and transactivation studies were mainly conducted in the earlier years of PFV and SFV research, in particular between 1990 and 2000. Research on the underlying molecular mechanisms of BFV gene expression only started in 2008 and is still ongoing in the lab of Wentao Qiao and Juan Tan, using current, state-of-the-art methods and technologies. From this perspective, it also extends our understanding of FV gene expression and replication, as reported here by J.T. (Figure 2A). Similarly, BFV Bet and Gag have also been studied by this group in recent years and are thus included in this review, allowing for a more comprehensive view on structural and non-structural FV proteins (Figure 2A).

#### 2.2.1. Function of Tas

Unlike PFV Tas, BFV Tas has no classical nuclear localization signal (NLS), but it is mainly present in the nucleus beside some cytoplasmic localization [31–33]. Like most typical DNA-binding transcriptional activators, nuclear localization and multimerization are both required for the transactivation activity of Tas [31,32]. It was reported that PFV Tas has three domains that mediate multimer formation in the nuclei of mammalian cells, but the biological function of PFV Tas multimerization has not been defined [32]. In contrast to PFV Tas, BFV Tas has only one domain that mediates dimer formation. The comparison of the multimerization domains of both proteins does not reveal obvious homologies. Deleting the dimerization region abolishes the Tas-induced transactivation of BFV LTR and internal promoter (IP), which suggests that the active form of BFV Tas is a dimer [31].

At least four BFV *tas* mRNAs are present during BFV infection [34]. These four forms of BFV *tas* mRNA transcripts initiate either at the BFV LTR (one) or the IP (three), are spliced or unspliced, and have a differential ability to activate the BFV promoters (for clarity, only one representative IP-derived *tas* mRNA is shown in Figure 2A) [34]. According to these findings, we propose the following model of Tas-mediated BFV gene expression. Firstly, activator protein 1 (AP-1) and some other unknown cellular transcription factors activate the BFV IP as the early promoter for BFV gene expression, leading to the transcription of BFV IP *tas* mRNAs [35]. In consequence, BFV Tas quickly accumulates to further enhance BFV IP activity. When a defined threshold level of BFV Tas is reached, the early phase of BFV IP-directed *tas*/*bet* expression is switched to the late phase of structural gene expression directed by the LTR (Figure 2A). The transcription of LTR-spliced BFV *tas* transcripts with low biological activity can ensure modest levels of Tas, which makes it possible to establish persistent viral gene expression to complete the viral life cycle and maintain a balance between the virus and host cells.
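The qualitative logic of this model (positive feedback of Tas on the IP, a threshold-dependent switch to LTR-directed expression, and low-level Tas maintenance afterwards) can be illustrated with a deliberately simple toy simulation. This is only a sketch: the function name, the discrete-time update rule, and all parameter values are hypothetical and chosen purely for illustration; none of them are measured quantities or part of the cited studies.

```python
def simulate_tas_switch(steps=50, basal_ip=1.0, feedback=0.5,
                        threshold=20.0, ltr_tas_rate=0.5, decay=0.1):
    """Toy discrete-time model of the proposed early/late switch.

    All parameters are arbitrary illustration values (assumptions):
      basal_ip     - Tas production from the IP without Tas (e.g. AP-1 driven)
      feedback     - extra IP-driven production per unit of Tas
      threshold    - Tas level at which expression switches to the LTR
      ltr_tas_rate - low Tas production from LTR-derived tas transcripts
      decay        - fraction of Tas lost per step
    """
    tas = 0.0
    late = False          # the switch is modeled as irreversible (assumption)
    phases, levels = [], []
    for _ in range(steps):
        if not late and tas >= threshold:
            late = True
        if late:
            # Late phase: only modest Tas production from the LTR.
            production = ltr_tas_rate
        else:
            # Early phase: IP activity is enhanced by Tas itself.
            production = basal_ip + feedback * tas
        phases.append("late" if late else "early")
        tas = tas + production - decay * tas
        levels.append(tas)
    return phases, levels


if __name__ == "__main__":
    phases, levels = simulate_tas_switch()
    print("first late-phase step:", phases.index("late"))
    print("peak Tas level: %.2f" % max(levels))
    print("final Tas level: %.2f" % levels[-1])
```

In this sketch, Tas rises quickly during the early phase and then relaxes to a lower steady level once production is taken over by the weakly active LTR-derived transcripts, which mirrors the proposed balance between virus and host cell only in a schematic way.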

Until now, the molecular mechanisms of transcriptional activation by Tas have remained unclear. Past investigations indicate that the co-activators p300 and PCAF physically and functionally interact in vivo with PFV Tas, resulting in the enhancement of Tas-dependent transcriptional activation [36]. Subsequently, PCAF acetylation of feline FV (FFV) Tas was shown to augment promoter-binding affinity and virus transcription [37]. Similar to Tas of PFV and FFV, p300 can specifically interact in vivo with BFV Tas, which results in the enhancement of Tas-dependent transcriptional activation [38]. In addition, the p300-mediated acetylation of BFV Tas can increase its DNA binding affinity, and the K66, K109, and K110 residues are critical for the DNA binding ability of BFV Tas; however, they are not conserved among different FVs [38]. The K→R mutations in full-length BFV infectious clones reduce the expression of viral proteins, and the triple mutant completely abrogates viral replication [21]. These findings suggest that acetylation might be a ubiquitous mechanism adopted by FVs as an effective means to regulate gene expression, and that animal FVs potentially share similarities with PFV in their need for essential host cell factors, e.g., p300 and PCAF. In addition to p300, BFV also engages the cellular RelB protein as a co-activator of BFV Tas to enhance its transactivation function [39]. Furthermore, it was found that BFV infection upregulates cellular RelB expression through BFV Tas-induced NF-κB activation [39]. Thus, there is a positive virus-host feedback circuit, in which BFV utilizes the host's NF-κB pathway through the RelB protein for its efficient transcription [39,40]. Many other, still unknown factors are likely involved in Tas-mediated transactivation, and advanced techniques, such as tandem affinity purification and proximity labelling, can be used to discover new co-activators of Tas.

#### 2.2.2. Function of Bet

Although the mechanisms of FV Bet expression by splicing-mediated fusion of the N-terminal domain of Tas to the entire *bel2* coding sequence were described almost 25 years ago, its functions are only partly clarified (Figure 2A) [5,41]. FV Bet is highly expressed after infection by different FVs [13,42]. Previously, FFV and PFV Bet were shown to serve as antagonists of apolipoprotein B mRNA-editing, enzyme-catalytic, polypeptide-like 3 (APOBEC3) family antiretroviral proteins for facilitating PFV and FFV replication [43–45]. In addition, Bet might play an important role in the establishment and maintenance of viral persistence in vitro and in vivo [46,47]. Furthermore, Bet has been described as having a negative regulatory effect upon the basal IP activity of PFV and it might limit the expression of the transcriptional transactivator Tas by inhibiting the activation of the IP [48].

BFV Bet consists of 419 aa and derives from a multiply spliced mRNA fusing the first N-terminal 35 aa of BFV Tas to the entire Bel2 open reading frame. Although the sequence homology between the Bet proteins of different FVs is very low, some motifs in Bel2 are similar among the different Bet proteins [49]. The Bet proteins of the known BFV isolates [17,19,50–52] are highly conserved. In PFV-infected cells, Bet was shown to fuse with Env and form a glycoprotein of ~170 kDa [53], but a corresponding BFV Env–Bet fusion protein could not be detected using a BFV Bet antiserum.

BFV3026 Bet is present in both the nucleus and cytoplasm (predominantly in the nucleus) of infected or transfected cells [54]. Analysis of the BFV3026 Bet amino acid sequence did not reveal apparent structural or functional protein motifs, but a nuclear localization signal (NLS) containing the amino acid sequence RRRRR was predicted at the C-terminal end of BFV3026 Bet (392–396 aa) (PSORT II software [55] and NLStradamus model [56]). PFV Bet was reported to have an effective NLS at the C terminus (between 406 and 459 aa), but it does not contribute to nuclear localization of the protein, and PFV Bet is located in both the nucleus and cytoplasm [57]. BFV3026 Bet has a similar subcellular localization as PFV Bet, so it will be interesting to determine whether the predicted NLS of BFV Bet is functional. Nuclear pore complexes are known to allow two modes of transport: the passive diffusion of small molecules (<20–40 kDa) and active transport of larger molecules (50 kDa and more) [58]. As BFV Bet (55 kDa) is slightly too large to freely shuttle between the cytoplasm and nucleus, it might enter the nucleus using the NLS or by a currently unknown mechanism.
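The size argument and the position of the predicted basic motif can be made concrete with a small sketch. The code below is not PSORT II or NLStradamus; it is a naive scan for runs of basic residues plus a lookup against the size limits quoted above. The function names are hypothetical, and the toy sequence is a placeholder of 419 residues with an RRRRR run placed at positions 392–396; it is not the real BFV3026 Bet sequence.

```python
import re

# Size limits as quoted in the text (kDa); the exact boundary is approximate.
PASSIVE_LIMIT_KDA = 40.0
ACTIVE_FROM_KDA = 50.0


def npc_transport_mode(mass_kda):
    """Classify the expected mode of nuclear pore passage by molecular mass."""
    if mass_kda < PASSIVE_LIMIT_KDA:
        return "passive diffusion possible"
    if mass_kda >= ACTIVE_FROM_KDA:
        return "active transport expected"
    return "intermediate size"


def find_basic_runs(protein_seq, min_len=5):
    """Return (start, end, motif) for runs of K/R residues, 1-based inclusive.

    This is a naive motif scan, used only for illustration (assumption);
    real NLS predictors use more elaborate models.
    """
    pattern = re.compile(r"[KR]{%d,}" % min_len)
    return [(m.start() + 1, m.end(), m.group())
            for m in pattern.finditer(protein_seq.upper())]


# Placeholder sequence (NOT the real Bet protein): 419 aa, RRRRR at 392-396.
toy_bet = "A" * 391 + "RRRRR" + "A" * 23

if __name__ == "__main__":
    print(npc_transport_mode(55.0))
    print(find_basic_runs(toy_bet))
```

A hit from such a scan only flags a candidate motif; whether the predicted NLS is functional still has to be tested experimentally, as noted above.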

The functions of Bet during FV infection and replication are seemingly contradictory. It is required for FFV productive replication, as Bet mutants showed approximately 1,000-fold reduced viral titer in feline kidney cells when compared to the wild-type FFV [59]. This is in contrast to PFV, where different Bet mutations or deletions did not show a defined phenotype or only an approximately 10-fold decreased cell-free viral transmission, which suggests that Bet might play a role in efficient cell-free viral transmission [60]. However, these studies were often conducted in heterologous or genetically altered cells. Similarly, the BFV Bet mutant BFV3026 genome showed a four-fold higher level of replication than the wild-type genome in engineered human 293T cells [50]. In addition, similar to PFV [61], the overexpression of BFV Bet in heterologous canine fetal thymus cells (Cf2Th) reduced BFV3026 replication approximately threefold [50]. Taken together, these data suggest that BFV Bet may serve as a negative regulator for BFV replication; however, analyses in authentic host cells appear to be absolutely mandatory.

In summary, these observations indicate that a biologically relevant FV Bet phenotype might only be detectable in cells expressing the cellular partner or target molecules of the authentic FV Bet protein. This is exemplified by the controversial findings on the Bet-induced inactivation of APOBEC3-mediated virus restriction: if APOBEC3 is missing in the host cell used, or if the FV in question is propagated in host cells that lack APOBEC3 expression or express heterologous APOBEC3 proteins, the intricate interaction between these partner molecules is lost, resulting in irrelevant phenotypes. The repeated shifts of FVs from one host cell to another may have had similar consequences. These different scenarios make a strong case for using homologous host cells in vitro, without the genetic changes and adaptations that often occur in tumor cells or after extended passages in vitro, and/or for conducting animal experiments in the authentic host species.

#### 2.2.3. Function and Localization of Gag

The interaction and subsequent self-multimerization of retro- and foamy virus Gag proteins cause capsid formation [62,63]. Unlike Gag proteins from Orthoretroviruses, FV Gag is not processed into separate matrix (MA), capsid (CA), and nucleocapsid (NC) subunits (see Figure 2). In fact, four processing sites have been identified in the PFV Gag protein, which are divided into the optimal C-terminal cleavage site yielding p68/p3 and three suboptimal cleavage sites yielding p33/p39 or p39/p29 [64,65]. In BFV, four Gag cleavage forms (p71, p68, p33, and p29) were also observed, indicating that both the optimal and suboptimal cleavages of Gag protein also occur in BFV; the Gag p68/p3 cleavage is the most efficiently used cleavage site [66]. In contrast to Orthoretroviruses, the C-terminal domain of PFV Gag (the NC domain equivalent) contains three glycine (G) and arginine (R)-rich motifs (GR boxes) or less-defined RG-rich regions instead of the canonical cysteine-histidine repeat motif [67]. Similar to PFV Gag, BFV Gag also has a nuclear localization signal (NLS) in GR box II, which causes the nuclear accumulation of overproduced Gag protein [66,68].

Unlike Orthoretroviruses, but similar to the other FVs, BFV Gag is not myristoylated and it cannot produce cell-free Gag-only virus-like particles [24,25]. Similar to hepatitis B virus (HBV), BFV particle budding and release are instead dependent on the co-expression of the cognate viral envelope (Env) protein [24,25], which suggests that Env provides a critical membrane-targeting function inherently lacking in BFV Gag. In the case of BFV, this occurs at the plasma membrane rather than the endoplasmic reticulum (ER), due to a lack of a functional ER retrieval signal (ERRS) [68]. The addition of a membrane targeting signal to the N-terminus of Gag restores Gag-only budding from the plasma membrane, implying that Myr-membrane targeting substitutes for Env in particle release [24,68].

Unlike PFV, FFV, and SFVs, BFV is highly cell-associated and can only be transmitted from cell to cell but not via cell-free pathways [17,24]. Interestingly, the Gag protein of BFV-Z1 (an in vitro selected cell-free infectious BFV3026 clone) lost a 14-amino acid sequence as compared to BFV-B (an infectious cell-associated BFV clone). This 14-residue deletion is located in the central and non-conserved region of FV Gag, which strongly contributes to the size differences of simian versus non-simian FVs [69]. This deletion led to a smaller Gag-Z1 that enhanced cell-free infectivity by four- to five-fold [25]. In some high titer (HT) cell-free BFV-Riems variants, insertions and duplications occurred at the same site in Gag. However, their impact on BFV titers has not been studied [26]. The Gag-Env interaction is very important for the budding and release of FV virions. Yet, the interaction of Gag and Env in BFV-B and BFV-Z1 was almost the same, which suggests that the contribution of Gag-Z1 to enhanced cell-free transmission is not through promoting interaction with Env [25].

Viruses must engage bidirectional cellular transport mechanisms to complete their life cycle, and many viruses require microtubules (MTs) during cell entry for efficient nuclear targeting or the cytosolic transport of naked viral particles [70–72]. In BFV-infected cells, co-localization of MTs and assembling viral particles was clearly observed, which implies that BFV particles or assembly intermediates may be transported along the cellular MTs to the cellular membrane to ultimately egress from the host cell. In fact, an MT-depolymerizing assay indicated that MTs are required for the efficient replication of BFV [66]. In conclusion, BFV has evolved this mechanism to hijack the cellular cytoskeleton for its replication. It is not yet clear which components of the MTs are involved in uni- and/or bidirectional cellular transport of BFV particles. Thus, investigations on the direct interaction between Gag and MT components should be a future research topic.

#### *2.3. BFV-Host Interactions: Restriction Factors, Innate Immunity, miRNAs and Tight Cell Association*

#### 2.3.1. Restriction Factors

The innate immune system constitutes a first line of defense against invading viruses. Cellular restriction factors are key players of innate and/or intrinsic immunity, which interferes with defined steps in the viral life cycle, leading to the attenuation or complete suppression of virus replication mainly acting immediately after virus infection [73]. On the other side, viruses have evolved strategies to circumvent this inhibitory activity by co-evolution with host-encoded restriction factors.

Restriction factors are constitutively expressed and their expression can usually be increased by interferons (IFNs) [74–77]. Until now, several restriction factors acting on retroviruses have been characterized in detail: APOBEC3, tripartite motif protein 5α (TRIM5α), bone marrow stromal cell antigen 2 (BST2, also called tetherin), SAMHD1, IFITM, MxB, and SERINC [78–84]. Recently, some restriction factors were found to inhibit the replication of FVs. For instance, TRIM5α is implicated in restricting PFV, SFV, and FFV during viral entry [85,86]; APOBEC3 proteins are known to act during PFV, SFV, and FFV reverse transcription (RT) and introduce lethal mutations in the viral genome [43–45,87]; whereas human BST2 (hBST2) and bBST2A1 (one isoform of bovine BST2) suppress the release of PFV and BFV [88–90]. Moreover, unlike hBST2, bBST2A1 displays no inhibitory effect on cell-to-cell transmission of PFV and BFV [90]. Other antiviral proteins include promyelocytic leukemia protein (PML), IFN-induced 35-kDa protein (IFP35), N-Myc interactor (Nmi), and p53-induced RING-H2 protein (Pirh2), which have been recently shown to inhibit FV replication through interacting with Tas [27,91–93]. PML directly interacts with PFV Tas and interferes with its ability to bind the TREs in the PFV LTR and IP [91]. IFP35 might inhibit BFV Tas-induced transactivation by interfering with the interaction of a cellular transcriptional activation factor(s) and BFV Tas [27]. Nmi inhibits the Tas transactivation of the PFV LTR and IP by interacting with Tas and sequestering it in the cytoplasm [92]. Pirh2 negatively influences the Tas-dependent transcriptional activation of the PFV LTR and IP by interacting with the transactivator Tas and down-regulating its expression [93]. These antiviral proteins likely limit or modulate the viral spread in vivo, but other antiviral proteins detected, for instance, in a recent high-throughput screen using PFV might, in addition, lead to FV latency, but are currently mostly unexplored [94].

#### 2.3.2. Innate Immunity

An interesting feature of FVs is their ability to infect a diverse range of cell types and cause a characteristic foam-like cytopathic effect in cell culture systems. However, they appear to be non-pathogenic in either naturally or accidentally infected hosts, with a currently "emerging" but still ill-defined capacity to affect blood or kidney parameters without overt clinical consequences [11,95,96]. This suggests that the host immune system controls viral infection and/or FV replication in vivo. Some evidence showed that the innate immune system probably plays an important role in limiting FV replication to superficial epithelial cells of the oral mucosa [97]. Early studies using human or primate cell lines suggested that FVs do not activate an innate response and cannot induce type I IFNs (IFN-I) [98–100]. However, it was reported only in recent years that PFV is efficiently sensed by primary human hematopoietic cells via Toll-like receptor (TLR) 7, which leads to the production of high levels of IFN-I [101]. The PFV-induced IFN-I induces the expression of IFN-stimulated genes, in line with the finding that factors restricting FV replication are IFN-I-induced (e.g., TRIM5α and APOBEC3, see above). This activation of the innate immune responses might be a prerequisite for controlling viral replication in zoonotically infected humans or natural animal hosts. In line with this finding, previous studies reported that FV replication is sensitive to IFN-I [98,100,102] due to the induction of several IFN-induced cellular proteins with antiviral activity in culture systems [27,43,89–93]. Besides IFN-I, gamma IFN (IFN-γ) that is produced by activated human peripheral blood lymphocytes has also been found to be a major suppressive factor of PFV [103].

Unfortunately, knowledge regarding the host-cell responses (especially innate immune responses) to BFV infection on the level of gene expression is still limited. In a recent study, changes in the transcriptome of the bovine macrophage cell line BoMac after in vitro BFV infection were examined using bovine long oligo plus microarray (BLOPlus, Michigan State University, US) technology [104]. In total, 124 genes involved in distinct cellular processes were up- or down-regulated. Among the differentially expressed genes, only five are involved in immune response. Three genes (Hsp90b1, hla-drb1, and Cxorf15) were up-regulated while two genes (CXCL2 and SELENBP1) were down-regulated. However, only the results for the three up-regulated genes (Hsp90b1, hla-drb1, and Cxorf15) were confirmed by subsequent RT-qPCR analyses [104].

The Hsp90b1 protein is essential for the broad tropism of vesicular stomatitis virus (VSV) and for the establishment of infection with VSV and activation of innate immunity via TLRs [105]. Therefore, the Hsp90b1 protein might have an effect on the capacity of FVs to infect a variety of tissues from different organisms. In addition, HLA-DRb1 (major histocompatibility complex class II, DR beta 1), an HLA class II antigen, plays central roles in immunity by presenting peptides derived from foreign, non-self proteins. It was found that specific HLA haplotypes, including HLA-DR, may protect against human immunodeficiency virus (HIV) [106], and MHC class II molecules are up-regulated in several lymphoid cell lines following infection with feline immunodeficiency virus (FIV) [107], as well as in T-lymphocytes of FIV-infected cats [108]. Taken together, the increased level of HLA-DRb1 in BFV-infected BoMac cells might be responsible for the sustained elevation of MHC class II antigen levels. Furthermore, Cxorf15 (γ-taxilin) is, together with α- and β-taxilins, a member of the taxilin family. β- and γ-taxilin may play a role in intracellular vesicle trafficking [109], and the α-taxilin levels are elevated in hepatitis B virus (HBV)-expressing cells and are essential for the release of HBV particles [110]. Taking the similarities in the budding strategies of FVs and HBV into consideration [111], the upregulation of the Cxorf15 gene following BFV infection may point to a possible role of this protein in virion egress. These data offer a basis for further investigation of the immune response of host cells to FV infection, but the above speculation still needs to be confirmed experimentally.

The effects of BFV infection on immune gene networks were confirmed in a recent study using experimentally infected calves; however, the differentially expressed genes identified one and three days after infection of the animals were different from those reported for the in vitro study while using BoMac cells [14,104].

#### 2.3.3. miRNA Expression as an Additional Layer to Control Host Gene Expression and Innate Immunity

Recently, BFV and simian FV of African green monkey (SFVagm) have been shown to encode miRNAs via RNA polymerase III (RNA Pol III)-directed expression of a complex double-hairpin and, thus, dumbbell-shaped primary miRNA (pri-miRNA) precursor (Figure 2A,B) [22,112]. The identification of such FV miRNA cassettes of about 120 nt length was stimulated and directed by prior findings in bovine leukemia virus (BLV), which is a close relative of human T cell leukemia/lymphotropic viruses (HTLVs) and used as an animal model for its human counterpart [113,114]. The BLV RNA Pol III-driven miRNA cassettes consist of single hairpin structures and were identified by an algorithm combining the search for RNA Pol III promoters and terminators with the presence of stable RNA hairpin structures flanked by these RNA Pol III-specific features [113].

In BFV-Riems, only a single two-hairpin, dumbbell-shaped pri-miRNA with its RNA Pol III promoter and terminator is present in the non-coding part of the LTR U3 region downstream of the *bet*/*bel2* open reading frame (Figure 2A) [22]. In contrast, in SFVagm, several different miRNAs are encoded by dumbbell-shaped precursor RNA Pol III cassettes and possibly other pri-miRNAs that have been mapped to corresponding sites in the 3' end of the SFVagm U3 region [112] (see below and Table 2). In both studies, the miRNAs were identified and characterized by miRNA sequencing. In BFV, two highly expressed miRNAs comprising about 70% of the total miRNA pool and a third one at modest levels were detected; a potential fourth miRNA from the remaining strand of the second hairpin was undetectable [22]. In contrast, and reflecting the complexity of the situation in SFVagm, sequencing identified three high-abundance, two intermediate-abundance, and at least six low-abundance mature miRNAs [112].

The different miRNA expression capacity and underlying mechanisms of BFV versus SFVagm [22,112] encouraged us to conduct bioinformatics analyses using publicly available and further optimized algorithms to study the situation in BFV-Riems, SFVagm, and other FVs. By modifying the original algorithm of Kincaid et al. [112], we especially focused on dumbbell structures of about 130 nucleotides in size in the LTR sequences of 38 FVs (Table 2). Kincaid et al. also analyzed most of them for miRNA structures (Table S1 in [112]). For 37 of the 38 FV sequences, we predicted one or more dumbbell miRNA structures using a fold energy cutoff of −30 kcal/mol and requiring a terminator together with a TATA- and/or A/B-Box (for an overview, see Table 2).
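The filtering criteria named above can be illustrated with a minimal Python sketch. It is not the algorithm used for Table 2, nor that of Kincaid et al. [112]; it only shows how a fold energy cutoff of −30 kcal/mol might be combined with a check for flanking promoter-like and terminator-like motifs. The regular expressions (a run of at least four T residues as terminator, `TATA[AT]A` as a TATA-like box), the 60-nt flank size, the `Candidate` type, and the function name are illustrative assumptions. The A/B-Box alternative is omitted, and the folding energy is assumed to be supplied by an external RNA folding tool.

```python
import re
from dataclasses import dataclass

# Assumed, simplified motif definitions (not taken from the original study).
TERMINATOR = re.compile(r"T{4,}")     # run of >= 4 T residues
TATA_LIKE = re.compile(r"TATA[AT]A")  # TATA-like box


@dataclass
class Candidate:
    start: int          # 0-based start of the folded window in the LTR
    end: int            # end of the window (exclusive)
    fold_energy: float  # kcal/mol, computed elsewhere by a folding tool


def passes_filters(ltr: str, cand: Candidate,
                   energy_cutoff: float = -30.0, flank: int = 60) -> bool:
    """Keep a candidate only if it folds stably enough and is flanked by
    an upstream TATA-like motif and a downstream terminator-like motif."""
    if cand.fold_energy > energy_cutoff:
        return False  # not stable enough
    upstream = ltr[max(0, cand.start - flank):cand.start]
    downstream = ltr[cand.end:cand.end + flank]
    return bool(TATA_LIKE.search(upstream)) and bool(TERMINATOR.search(downstream))


# Toy sequence (hypothetical, not a real LTR): promoter motif, 120-nt window, terminator.
toy_ltr = "GG" + "TATAAA" + "C" * 20 + "G" * 120 + "AC" + "TTTTT" + "GG"
print(passes_filters(toy_ltr, Candidate(28, 148, -35.0)))
```

In such a scheme the energy test is applied first because it is the cheapest way to discard windows; the motif checks then restrict the remaining candidates to those that could plausibly form an RNA Pol III transcription unit.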

We confirmed the presence of a single miRNA cassette encoding a dumbbell-shaped pri-miRNA [22] in the genome of all known BFV isolates by using the updated algorithm (Table 2). Single dumbbell-shaped pri-miRNAs were also predicted for the closely related EFV and several SFVs from different simian hosts, as well as for PFV, which was derived upon zoonotic transmission into humans. Surprisingly, while a single dumbbell-shaped miRNA cassette was detected in SFVgor, it was absent in another SFVgor sequence that was derived from a zoonotically infected person [115]. Our algorithm found two independent RNA Pol III dumbbell-type pri-miRNA cassettes in SFVagm (representing S1/S2 and S6/S7 miRNAs in [112]). For other SFVs, two, three, and even five dumbbell-shaped miRNA cassettes were predicted. While the different FFV isolates from domestic cats contained four miRNA cassettes, the highly related FFV variant that was derived from *Puma concolor* was predicted to only encode three miRNAs.

In general, the predicted dumbbell miRNA cassettes are located in the non-coding region of the U3 LTR sequence, except the first miRNA cassette of all FFV isolates, which partially overlaps the 3' end of the *bel2*/*bet* gene. In addition, the fourth miRNA cassette of the domestic cat FFVs is very close to the transcriptional start site of the LTR promoter and it might interfere with RNA Pol II-directed mRNA expression similar to the situation in SFVCni, where the third miRNA cassette even extends into the R region (Table 2). The size of the predicted dumbbell-shaped pri-miRNA of the different FVs varies between 111 and 128 nt.


**Table 2.** Results of bioinformatics on dumbbell-type RNA Pol III cassettes in the LTRs of selected FVs flanked by consensus TATA boxes and termination signals.

\* References are given for those FVs where experimental miRNA data are available.

As experimental miRNA sequencing data are currently only available for SFVagm and BFV-Riems, it is currently an open question as to whether these bioinformatics-based predictions presented here properly reflect the expression capacity and strategy of the different FVs and whether there is a huge variability of miRNAs between different, and even closely related, FVs. Additionally, the experimentally detected central SFVagm miRNA and the corresponding stem-loops 3, 4, and 5 [112] were not detected by our dumbbell-specific miRNA detection tool, so that, in certain FVs, there may be a co-existence of single-hairpin and dumbbell-shaped pri-miRNAs. Alternatively, the central SFVagm stem-loops 3, 4, and 5 may be derived from larger, more complex pri-miRNAs, for instance, with terminal stem-loops but separated by unfolded, single-stranded intervening sequences.

The two independent experimental studies [22,112] and our in silico analyses show that probably all FVs of different origin contain at least one RNA Pol III-directed miRNA cassette of the dumbbell-shaped type. The miRNA repertoire of SFVagm is clearly more complex than that of BFV, and it is currently unknown whether other FVs also encode SFVagm/BLV-like single-hairpin pri-miRNAs. Thus, further wet biology analyses, high-throughput sequencing, and bioinformatics are needed to allow for a full understanding of this highly important regulatory system of FVs.

The importance of the cellular miRNA processing factors dicer and drosha was shown for SFVagm [112], while, for BFV, the impact of the overall shape of the dumbbell-shaped pri-miRNA was demonstrated [22]. In BFV, minor modifications of the pri-miRNA sequence are well tolerated, while the replacement of an authentic stem-loop by heterologous shRNA sequences reduced but did not eliminate reporter gene suppression in dual luciferase assays [23]. This finding, together with high-throughput optimization procedures, as done, for instance, for the BLV RNA Pol III miRNA cassettes [116], may open the way to engineer efficient and specific chimeric FV-based pri-miRNA constructs for therapeutic application, as discussed below (Section 2.5).

Similarities to host miRNAs were detected for experimentally validated SFVagm and BFV miRNAs [112,117] (see Table 3). SFVagm miR-S4-3p shares seed identity and functionality with host miR-155, a noted host oncogenic miRNA (oncomiR). SFVagm miR-S6-3p shares seed identity with the host miRNA family miR-132, which suppresses innate immunity. In contrast, the similarities detected for BFV miRNAs involve bta-miR-125a; the human counterpart of this miRNA has been described as stabilizing the suppressive phenotype of R848-stimulated antigen-presenting cells at different levels as part of the hsa-miR-99b/let-7e/miR-125a cluster [118]. Furthermore, miR-125a and miR-125b are both involved in the progression of cervical cancer [119]. MiR-125b inhibits the PI3K/AKT pathway through the down-regulation of PIK3CD mRNA and protein, while miR-125a is anti-oncogenic by the downregulation of TRIB2 and HOXA1 by the family miR-99 clustered in miR-let-7c~99a, miR-125a~let-7e~99b, and miR-100~let-7a-2. The members of these clusters are diminished in cervical cancer [120]. Taken together, the miRNAs encoded by FVs seem to interfere with immune and proliferation processes in the cells, but in different ways. Transcription by RNA Pol III makes the expression of miRNAs independent of protein expression, while the location in the LTR avoids the restrictions that are imposed by overlapping coding regions, thus enhancing variability and adaptability.
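What "sharing seed identity" means operationally can be shown with a short Python sketch. It assumes the common convention that the seed comprises nucleotides 2 to 8 counted from the 5' end of the mature miRNA; other conventions (for example positions 2 to 7) exist. The function names are hypothetical, and the sequences in the usage lines are made-up placeholders, not the real SFVagm, BFV, or host miRNA sequences.

```python
def seed(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the seed region (1-based positions start..end from the 5' end).

    DNA-style input is converted to RNA so that T and U compare equal.
    """
    rna = mirna.upper().replace("T", "U")
    if len(rna) < end:
        raise ValueError("sequence is shorter than the seed window")
    return rna[start - 1:end]


def shares_seed(mirna_a: str, mirna_b: str) -> bool:
    """True if both mature miRNAs carry an identical seed region."""
    return seed(mirna_a) == seed(mirna_b)


# Placeholder sequences for illustration only (not real miRNAs).
viral_like = "AUCGGAUCUAGGCUAAGCUAGC"
host_like = "CUCGGAUCAAGCCUUUGGAUCA"
print(seed(viral_like), shares_seed(viral_like, host_like))
```

Two miRNAs can thus differ at position 1 and throughout their 3' portion and still be grouped together, which is why seed identity is usually read as a hint at overlapping target repertoires rather than as proof of identical function.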

**Table 3.** Homology of seed sequences of experimentally identified BFV-Riems and simian FV of African green monkey (SFVagm) miRNAs to known miRNAs of other species.


In BFV, three different and closely spaced miRNAs are detectable in chronically or lytically infected cultured cells and, importantly, also in peripheral blood mononuclear cells (PBMCs) of experimentally BFV-infected calves, with the latter confirming the relevance of these findings beyond cell culture systems [22]. The three stable miRNAs are generated from both stem-loops of the dumbbell-shaped pri-miRNA [22]. In chronically BFV-infected cells in vitro, two BFV miRNAs make up more than 2/3 of the total cellular miRNA pool, pointing to an important role, especially in chronically infected cells [22]. The two high-abundance miRNAs are each localized to the 5' part of one of the two stem-loops, while in SFVagm, miRNAs from both strands of the S3 stem-loop are easily detectable [22].

In the BFV system, bovine cells and bovine genomics were used: bioinformatics-guided target gene prediction and wet biology-based experimental validation were mostly conducted in bovine cell cultures and, finally, in BFV-infected cells and experimentally infected cattle, as described by Cao et al. [14]. In brief, target site predictions for the high-abundance BF2-5p miRNA yielded several potential targets with very high scores, and two of them with relevance for innate immunity and virus replication, ANKRD17 [121] and Bif-1 (SH3GLB1) [122], were experimentally confirmed in independent assays [14]. In addition, even downstream targets of ANKRD17 showed altered expression in response to BFV miRNA cassette deletion and miRNA co-transfection [14]. A small number of calves were infected with MDBK cells expressing the highly cell-associated BFV Riems isolate and high-titer in vitro-selected BFV variants lacking or carrying the miRNA cassette in order to establish conditions for in-depth analyses of the importance of the miRNAs in the authentic host [14]. The data show that all BFV variants are replication-competent in calves; however, the deletion of the miRNA cassette caused a drop in viral infectivity. The deletion of miR-BF2-5p probably reduced the replication competence of the virus, as seen by the lower induction of genes involved in the recognition of viruses by the innate immune system when compared to wt BFV-infected calves. It probably also resulted in the lower level of the humoral response to BFV Gag observed, especially in one of the animals infected with the miR-BF2-5p-deficient BFV variant (for details, see [14]).

#### 2.3.4. Highly Cell-Associated Spread, at Least in Cell Cultures—What Is Behind This Phenotype?

Viruses have two major transmission strategies: cell-free transmission, involving the release of virus particles into the extracellular space, and cell-to-cell transmission [123]. Retroviruses exhibit different degrees of cell-free and cell-to-cell transmission. Unlike most other retroviruses, such as HIV-1, murine leukemia virus (MLV), PFV, FFV, and SFV, which transmit through both cell-to-cell and cell-free pathways, the transmission of BFV is highly cell-associated, with very low to undetectable cell-free transmission [17,24]. This lack, or only low level, of cell-free transmission appears to be independent of whether the BFV isolates have been exclusively propagated in primary bovine cells (the BFV Riems isolate) or whether immortalized bovine (MDBK cells) and hamster and canine cell lines, like BHK-21 and Cf2Th, have been used for virus propagation. BFV is an excellent model for studying virus adaptation to cell-free transmission and identifying the principles of viral transmission by in vitro selection and evolution analyses, since the BFV particle budding machinery is similar to that of other FVs [24,25].

In two independent selection screens using established immortal MDBK and BHK-21 cells, BFV Riems was shown to adapt to cell-free transmission within 80 and 130 cell-free passages, reaching titers of more than 10<sup>5</sup> and 10<sup>6</sup> FFU/mL, respectively. The resultant HT variants had independently gained the capacity to spread via cell-free particles, but still also use cell-to-cell transmission [24]. Genetic studies indicate that consistent cell-type-specific as well as cell-type-independent adaptive changes occurred in Gag and Env as well as in the LTR regions, where larger changes had also been observed [26] (Bao, Stricker, Hotz-Wagenblatt, and Löchelt, to be published). Importantly, cell-free HT BFV-Riems is still neutralized by serum from naturally infected cows [24]. The different selected HT BFV variants will shed light on virus transmission and the potential routes of intervention in the spread of viral infections.

Zhang and colleagues successfully isolated HT cell-free BFV strains from the original cell-to-cell transmissible BFV3026 strain (Chinese isolate) using in vitro virus evolution and further constructed an infectious cell-free BFV clone, called pBS-BFV-Z1, to independently explore the molecular mechanisms of BFV cell-free transmission [25]. Following sequence comparisons with a cell-associated clone, pBS-BFV-B [50], a number of changes in the genome of pBS-BFV-Z1 were identified. Extensive mutagenesis analyses revealed that the C-terminus of Env, especially the K898 residue, controls BFV cell-free transmission by enhancing cell-free virus entry [25]. The authors also claim that virus release of this variant is increased, although this was not experimentally analyzed [25]. It is well known that lysine (K) can undergo methylation, acetylation, succinylation, ubiquitination, and other modifications, which play an important role in regulating protein activity and structure [124]. Interestingly, the equivalent position of the 898 residue in all known BFV isolates (from the United States, NC001831.1 [19]; China, AY134750.1 [50]; Poland, JX307861 [51,52]; and Germany, JX307862.1 [17,52]), which only spread through cell-to-cell transmission, is not a lysine (K), while the equivalent position is occupied by a lysine in other high-titer cell-free FVs, such as SFV, FFV, and PFV. This suggests that the K898 in Env has an important role in FV cell-free transmission. The underlying mechanisms warrant further studies. Interestingly, the Gag protein of BFV-Z1 lost 14 amino acids in the highly divergent sequence between the matrix and capsid regions, which enhanced cell-free infectivity by four- to five-fold [25]. Other changes of the BFV-Z1 genome contributed little to BFV cell-free transmission.
Taken together, these data reveal the genetic determinants that regulate cell-to-cell and cell-free transmission of BFV, and suggest the possibility of generating high-titer BFV vectors through engineering viral Env and, in particular, its C-terminal sequence.

#### *2.4. BFV-Host Interactions at the Organismal and Populational Level*

#### 2.4.1. BFV Epidemiology and Naturally Occurring Co-Infections

Infections with BFV have been reported worldwide since the first isolation of BFV by Malmquist and co-workers in 1969 [30]; however, sero-epidemiological data are only available from some countries and they show variable rates of BFV-infected animals. The highest sero-prevalence was reported in Canada, where it varied between 40 and 50% [125,126]. A slightly lower rate of 39% was observed in Great Britain [127] and Australia [128]. The most recent data come from Germany, where only 7% of tested animals were identified as being BFV positive [15], and from Poland, with a BFV sero-prevalence of over 30% in dairy cattle [129]. BFV prevalence based on these studies seems to be very diverse. However, these data span a long time-frame; therefore, one of the reasons for these disparities might be the different sensitivity of the methods used for serological testing, from agarose gel immuno-diffusion (AGID) and syncytia inhibition assay to ELISA and indirect immuno-fluorescence assays. Additionally, the age of the animals tested might contribute to the diverse BFV prevalence; although the age of the animals was not specified in these studies, the importance of this factor was suggested by Jacobs and co-workers, who observed a higher rate of BFV positive status in older animals [126]. This might be due to the persistence of BFV infections and prolonged sero-conversion in animals, but one cannot exclude other factors, like breed and the type of animal rearing. Interestingly, no disease or clear clinical symptoms were ever associated with BFV infection in cows. However, a role of BFV as a co-factor in other retroviral infections has been suggested, especially in the context of mixed infections, which are one of the characteristic features of retroviral infections [130]. This has also been suggested for people infected with HIV and HTLV [131], cats infected with FIV, FeLV, and FFV [132], FIV/FeLV [133], and FIV/FFV [134], and monkeys infected with SIV and STLV [135].
Similar studies have been carried out with respect to co-infections with BFV, bovine leukemia virus (BLV), and bovine immunodeficiency virus (BIV) in cattle. Amborski and co-workers published the first report on BFV co-infection with other lymphotropic retroviruses in dairy cows [136]. In the already quoted study by Jacobs and others carried out in Canada, including numerous dairy cattle herds, it was shown that the percentage of animals with antibodies to BIV, BLV, and BFV is 5.5%, 25.7%, and 39.6%, respectively, however with no statistically significant correlation between the individual values [126].

It is assumed that mixed infections might result from a shared route of retroviral transmission, which results in a statistically significant correlation in the occurrence of antibodies, e.g., for BLV and BIV [137], FIV and FeLV [138,139], and HTLV II and HIV [140]. Mixed infections are particularly important in herds with BLV-infected animals, due to the fact that BLV is the etiological agent of enzootic bovine leukosis and since BFV has been suggested to act as a cofactor in BLV infections [130]. In a recent study, a statistically significant correlation between the occurrence of serologically positive reactions for BLV and BFV at the herd level was shown [141]. Although the results of these studies cannot be related to individual animals, they indicate a certain pattern in the distribution of herds where BLV and BFV are present. In the study by Jacobs and others, mixed infections of BLV and BFV were recorded in 9.9% of cows [126]. It has been suggested that BFV and BLV co-infections may impair the immune defense capacity of the host [136,142], as similarly proposed for cats co-infected with FIV and FFV [143]. However, it might also be considered that both viruses interact at the molecular level, especially since both of them use the phenomenon of transactivation in the process of viral replication and, even more interesting, they encode miRNAs that interact with genes directly involved in immune defense processes of the host. There is evidence that BLV- and BFV-encoded miRNAs target genes involved in innate and adaptive immunity and, thus, dysregulating their expression levels might facilitate BFV spread, transmission, or persistence [14,144,145].
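The figures quoted from the study by Jacobs and others [126] allow a simple plausibility check, sketched here in Python. If BLV and BFV infections occurred independently of each other, the expected share of doubly positive animals would be the product of the two individual sero-prevalences. The function name is hypothetical, and the calculation is a back-of-the-envelope comparison only, not a statistical test, since sample sizes are not given here.

```python
def expected_joint_prevalence(p_a: float, p_b: float) -> float:
    """Expected fraction of doubly positive animals if the two
    infections are statistically independent."""
    return p_a * p_b


p_blv = 0.257          # BLV sero-prevalence quoted from [126]
p_bfv = 0.396          # BFV sero-prevalence quoted from [126]
observed_both = 0.099  # mixed BLV/BFV infections quoted from [126]

expected_both = expected_joint_prevalence(p_blv, p_bfv)
print(f"expected under independence: {expected_both:.1%}")
print(f"observed:                    {observed_both:.1%}")
```

The expected value of about 10.2% is close to the observed 9.9%, which fits the reported absence of a statistically significant correlation at the level of individual animals in that study.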

BFV is endemic at high prevalence in livestock cattle in different parts of the world, which can be quite easily confirmed by virus isolation in different types of cells (Cf2Th, BoMac, MDBK, KTR, BHK21); however, it is also possible to establish productive infection under experimental in vivo conditions. Only a few reports have confirmed the possibility of the experimental infection of cattle with BFV [128]. Materniak and co-workers used experimental BFV inoculation to determine its replication and immunogenicity, not only in its homologous, but also in a heterologous host [13]. Calves and sheep were selected to analyze the infection kinetics in different, but related, species. Although neither the experimental BFV infection of calves nor that of sheep resulted in pathology, BFV spread and replicated to similar degrees in both the homologous and the heterologous host. Productive BFV infections were established in calves and sheep, as confirmed by virus isolation from leukocytes of all infected animals. BFV was rescued from both infected animal hosts, even in the presence of BFV-specific antibodies, confirming that BFV infection is not cleared by the host immune system [13]. Additional parameters of BFV infection, like the humoral immune response to BFV proteins and the presence of BFV DNA in blood cells and organs, also confirmed the persistence of BFV in both hosts [13]. Interestingly, upon long-term replication in sheep, approximately 70% and 40% of the single nucleotide mutations in the sheep-derived BFV *bet* and *env* sequences, respectively, led to changes in the amino acid sequence. As no consistent pattern of adaptive changes was detectable, this proves the utility of sheep as an animal model to study the biology of persistent spumavirus infections [13].
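How the share of amino acid-changing mutations can be tallied is sketched below in Python. This is an illustration under stated assumptions, not the procedure used in [13]: both sequences are assumed to be in-frame coding DNA of equal length without gaps, the standard genetic code is used, and only codons that differ at exactly one position are counted so that each change can be classified unambiguously. The function name and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nonsynonymous_fraction(reference: str, variant: str) -> float:
    """Fraction of single-nucleotide codon changes that alter the amino acid.

    Codons with zero or more than one difference are skipped.
    Returns 0.0 if no single-nucleotide change is found.
    """
    if len(reference) != len(variant) or len(reference) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    single_changes = 0
    amino_acid_changes = 0
    for i in range(0, len(reference), 3):
        ref_codon = reference[i:i + 3].upper()
        var_codon = variant[i:i + 3].upper()
        differences = sum(a != b for a, b in zip(ref_codon, var_codon))
        if differences != 1:
            continue
        single_changes += 1
        if CODON_TABLE[ref_codon] != CODON_TABLE[var_codon]:
            amino_acid_changes += 1
    return amino_acid_changes / single_changes if single_changes else 0.0


# Toy example (made-up sequences): GCT->GCC is silent, AAA->AGA changes K to R.
print(nonsynonymous_fraction("ATGGCTAAA", "ATGGCCAGA"))
```

A high fraction on its own does not demonstrate adaptation; as the text notes, the decisive observation is whether the same changes recur consistently across animals.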

#### 2.4.2. BFV Transmission Route

The transmission of BFV is suggested to occur through close contact. Considering cattle behavior, it is assumed that BFV shedding occurs via saliva through non-aggressive contact, like sneezing or licking, and via infected milk [143]. The successful recovery of BFV from saliva and milk of naturally infected cattle tends to confirm this mode of virus shedding and transmission [15,16]. Interestingly, older studies that were reported by Johnson and others, as well as Kertayadnya et al., showed that calves that were BFV negative at birth or at the beginning of the experiment became infected when placed together with infected adults [128,146]. Additionally, Johnson and others studied different routes of transmission using BFV-infected culture fluids containing cell debris. This experiment showed that only throat spray and intravenous application resulted in successful BFV infection in calves, while the swabbing of cell culture-derived BFV into the vagina or onto the prepuce did not lead to infection [128]. Kertayadnya and others also excluded insect or airborne transfer of BFV infection [146]. The authors of both reports state that the most likely source of infection under natural conditions is close contact with a single immunologically tolerant individual, which is productively BFV-infected, but has no or only very low levels of BFV-specific antibodies. However, the source of such BFV tolerance is disputable. One hypothesis is that such animals are in a very early viremic stage of infection, before the development of neutralizing antibodies occurs. However, as Kertayadnya and others reported, there are animals that are productively infected with BFV, but do not produce antibodies, even after several months of infection [146]. Another explanation of immunological tolerance towards BFV could be in utero infection. Such a scenario might be supported by the studies of Bouillant and Ruckerbauer, who recovered BFV from the uterus of BFV-infected cows [147].
Finally, perinatal transmission of BFV via colostrum or milk has been proposed. This route of BFV spread is supported by our studies showing that BFV can be reproducibly isolated from the cellular fraction of raw milk [15,16].

#### 2.4.3. Interspecies and Zoonotic Transmission of BFV as Part of the Human Food Chain

It has been clearly demonstrated that SFVs can be zoonotically transmitted to humans in Africa, South America, and Asia upon exposure or contact with SFV-infected monkeys [6,8,148]. The nature of human exposure to BFV is slightly different, but BFV seems to be constantly present in the human food chain when considering the routes of virus transmission and replication sites of BFV. Additionally, there are many products of cattle origin used in the pharmaceutical and cosmetic industries. However, direct contact, which seems to be the most likely mode of zoonotic transmission, is restricted to a limited part of the human population. So far, two serological studies have been performed to screen for BFV in humans. One of them focused on dairy cow caretakers, cattle owners, and veterinarians, who were tested for BFV antibodies and showed an overall sero-prevalence of about 7%; however, none of them was PCR positive for BFV DNA in PBMCs [149]. Another study included three groups of humans, who were serologically tested for BFV antibodies [150]. A BFV-specific reaction was found in 7% of immunosuppressed patients, 38% of people claiming contact with cattle, and 2% of the general population with no interaction with cows. In each group, a single BFV PCR-positive individual was identified, and the sequence of the short PCR product showed high homology to BFV isolates that are available in GenBank. The data obtained suggest that BFV zoonotic infection may be possible; however, it is not common and, in most cases, is cleared by the host immune system.

#### 2.4.4. BFV Replication in Naturally and Experimentally Infected Animals

In vivo studies play a vital role in virology, since they allow for investigation of the events taking place during the viral infection in the host. Many aspects of infection can be, in fact, only explored by examining naturally infected animals; however, this needs to be done under carefully controlled conditions in the homologous or a heterologous host. In studies on the replication of BFV in vivo, both directions were used; therefore, these data are quite comprehensive. Similar to other FVs, BFV shows a wide tissue tropism. In naturally infected animals, BFV was recovered from peripheral blood leukocytes/lymphocytes, tumors, fetal tissues, placenta, testis, and from fluids used to flush the uterus and oviducts of super-ovulated cows [17,30,51,151–154]. Studies on naturally and experimentally infected animals using PCR-based virus detection showed that BFV DNA is present in most tissues, like lung, salivary glands, liver, spleen, and bone marrow [13]. Interestingly, some reports from SFVs in monkeys previously showed that, although DNA is present in most animal tissues, SFV RNA, indicative of viral gene expression and replication, is mostly if not exclusively detected in oropharyngeal sites [155–157]. In the most recent studies on tissue distribution of BFV DNA and RNA, different organs, as well as blood, bronchoalveolar lavage cells (BALs), and trachea and pharynx epithelium of experimentally infected calves, were analyzed [28]. The highest load of BFV RNA was detected in the lungs, spleen, liver, PBMC, BALs, and trachea epithelium, while in contrast to the previous studies showing BFV isolation from saliva and milk cells of naturally infected cows [15,16], BFV RNA was detected in the saliva of only a single calf. The presence of BFV RNA in such diverse organs seems to be strong proof that active replication of BFV might not be limited to the oral cavity, in contrast to the findings for SFV gene expression in monkeys.

In a recent study, HT BFV Riems, the cell-free variant of BFV, and the wild-type BFV Riems isolate were used for the experimental inoculation of calves [14,24]. The infection pattern was very similar in both groups of calves. The humoral response was comparable in both groups, but the BFV viral load measured in PBMCs of infected animals during 16 weeks p.i. was slightly lower in calves that were infected with HT-BFV Riems. However, at the end of the experiment, BFV was rescued from PBLs of all animals, whether infected with parental or HT BFV. Interestingly, when changes in the expression of selected genes involved in innate immunity were analyzed at day 1 and 3 p.i., the level of induction was clearly lower in the HT-BFV Riems infected calves as compared to wt BFV Riems inoculated animals, which suggests a slightly impaired detection of HT BFV.

Importantly, and confirming the concept that interspecies transmission of FVs is possible to genetically related hosts, sheep have been shown to be permissive to experimental BFV infection while using intravenous inoculation of BFV100-infected Cf2Th cells [13]. Successful inoculation of sheep with BFV as well as its transmission via saliva may support the risk of cross-species infections in mixed farms, where cattle and sheep are kept in close contact. Previous work, in fact, reported the presence of an FV-like virus in sheep [158], but two scenarios are possible since no further characterization of the isolate was performed. One is that the infection was a result of a cross-species transmission of BFV from cattle; alternatively, the isolate was, in fact, a sheep specific foamy virus. Over 500 German sheep serum samples were tested to try to answer this question and 35 sheep sera showed reactivity to BFV Gag antigen in GST ELISA [149]. Unfortunately, further diagnostics of these BFV-cross-reactive animals by virus isolation or PCR amplification were unsuccessful [159]. Recent studies regarding wild ruminants revealed a similar scenario: some sera also reacted with BFV antigens in ELISA test, but PCR mostly failed to identify genetic material of BFV [160]. In fact, the lack of amplification with BFV-specific primers might suggest the infection with novel, species-specific FVs, which generate antibodies that cross-react with BFV-specific antigen. Therefore, the existence of BFV-related FVs in other ruminants still remains open.

#### *2.5. Utilization of BFV as Viral Vector for Translational Applications*

As with other retroviruses, different viral vectors for the expression of therapeutic genes or the delivery of vaccine antigens have been constructed for PFV, a few SFVs, and FFV (for a review, see [161]). These vectors mostly include replication-deficient gene transfer vectors generated in producer cells, but they also include replication-competent engineered viruses, which are mostly intended for live vaccine applications. Some of these vectors have been tested beyond cell cultures in small lab animals (mostly mice), but also in outbred hosts like dogs (ex vivo PFV-based gene transfer vectors to treat canine leukocyte adhesion deficiency [162]) and cats (FFV-based replication-competent vaccine vectors [163,164]). The recent cloning of full-length BFV genomes that allow for cell-free transmission [14,25,26] is an important prerequisite for conducting and extending corresponding studies for BFV as well.

Cattle are important livestock animals and vector-based approaches are likely to meet the costs imposed by bovine infectious disease or the need to engineer defined traits in the future. Such vector-directed gene transfer and vaccination might be an interesting option with a corresponding market to explore and use BFV as a suitable viral vector for the treatment of cattle. The availability of CMV-IE-driven BFV genomes and recent data that HT cell-free BFV variants replicate in cattle are important prerequisites for such studies [14]. Furthermore, even an engineered HT cell-free BFV variant lacking the entire miRNA cassette replicated in experimentally infected animals and induced immunity against BFV Gag and Bet [14]. Together with the data on the function and core features of the BFV miRNA cassette and the chimerization of the BFV pri-miRNA [23], the insertion of therapeutic or prophylactic miRNAs into replication-competent BFV vectors or the construction of BFV-based miRNA expression tools appear to be new and interesting future developments [161].

#### **3. Conclusions and Outlook**

As shown here, research regarding diverse aspects of BFV replication and biology in vitro and in vivo has significantly expanded our understanding of the complexity and diversity of FVs. These new findings are in line with the concept that each of the known exogenous FVs has been shaped by a long history of co-adaptation and co-evolution [10]. It remains to be seen whether, for instance, the host dictates the major pathway of FV transmission: Here, strong differences between herbivorous cattle that do not display aggressive intra-species behavior and carni- and omnivorous simians or felines with a substantial amount of aggressive behavior within and between groups and individual animals can be anticipated. Whether such differences in the host's biology not only affect transmission, but also the repertoire of antiviral restriction, remains to be seen, but appears to be possible.

Additional avenues of high-impact research in BFV may be related to the development of vaccine vectors based on the BFV genome or genetic elements thereof. The recently achieved possibility of cell-free BFV infection removes the limitation that tightly cell-associated transmission imposed on BFV as a therapeutic and prophylactic vector candidate [14,24,25]. Vaccines based on bovine virus-derived vectors could be a great alternative in veterinary science and practice, especially in the context of economically important infections that, due to the high prevalence in cattle populations, cannot be eradicated by culling. Furthermore, extending the studies on the BFV miRNAs as modulators of the virus-host interface and evolutionary struggle, as well as their translational application within a BFV-based vector or as an independent cassette, appears to be a promising extension of ongoing work. Finally, it is of high priority to explore the "requirements" of BFV, which is part of the human food chain and present in several raw cattle products, to enter the human population. Such studies may be quite challenging and, to a certain degree, unpredictable, but they are of high medical and epidemiological importance. Here, in vitro selection and evolution screens employing either fresh animal-derived BFV or "native" BFV isolates and primary human cells of different tissue/organ origin will be of special value.

**Author Contributions:** M.M.-K., J.T., A.H.-W. and M.L. wrote, corrected and approved the article, A.H.-M. provided valuable information and conducted bioinformatics analyses together with A.H.-W., the initial layout and final editing were done by M.L.

**Funding:** Work by M.M.-K. was supported by the Polish Ministry of Science and Higher Education and DAAD, grants no. 5 PO6K 043 27, 484/N-DAAD/2009/2010 and KNOW (Leading National Research Centre) scientific Consortium, "Healthy Animal—Safe Food", decision of Ministry of Science and Higher Education 05-/KNOW2/2015. J.T. was funded by the National Natural Science Foundation of China, grant number 31670151, and A.H.-M., A.H.-W. and M.L. were supported by the Baden-Württemberg Stiftung, research grant SID 49 and PPP Polish-German DAAD travel grants.

**Acknowledgments:** M.M.-K. would like to thank Jacek Kuźmak for introduction into the foamy virus field and his continuous support. M.L. thanks his partners and supporters on BFV research, in particular Roland Riebe, Frank Rösl, Jacek Kuźmak, Timo Kehl, Torsten Hechler, Wentao Qiao, Bryan Cullen, Thomas Vahlenkamp, Lutz Gissmann and M.M.-K. The authors thank Martha Krumbach (DKFZ) for critically reading the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Structural and Functional Aspects of Foamy Virus Protease-Reverse Transcriptase**

#### **Birgitta M. Wöhrl**

Lehrstuhl Biopolymere, Universität Bayreuth, D-95440 Bayreuth, Germany; birgitta.woehrl@uni-bayreuth.de

Received: 12 June 2019; Accepted: 29 June 2019; Published: 2 July 2019

**Abstract:** Reverse transcription describes the process of the transformation of single-stranded RNA into double-stranded DNA via an RNA/DNA duplex intermediate, and is catalyzed by the viral enzyme reverse transcriptase (RT). This event is a pivotal step in the life cycle of all retroviruses. In contrast to orthoretroviruses, the domain structure of the mature RT of foamy viruses is different, i.e., it harbors the protease (PR) domain at its N-terminus, thus being a PR-RT. This structural feature has consequences on PR activation, since the enzyme is monomeric in solution and retroviral PRs are only active as dimers. This review focuses on the structural and functional aspects of simian and prototype foamy virus reverse transcription and reverse transcriptase, as well as special features of reverse transcription that deviate from orthoretroviral processes, e.g., PR activation.

**Keywords:** foamy virus; protease; reverse transcriptase; RNase H; reverse transcription; antiviral drugs; resistance

#### **1. General Features of Foamy Virus Replication**

Foamy viruses (FVs) are retroviruses that—based on several differences in their molecular properties—are gathered in the subfamily of *Spumaretrovirinae*, whereas all other retroviruses are members of the subfamily *Orthoretrovirinae* [1]. The latter include well-characterized retroviruses such as human immunodeficiency virus (HIV), murine leukaemia virus (MLV), or Rous sarcoma virus (RSV) [2]. FVs are endemic in various mammalian hosts, including cats, horses and non-human primates, but not humans. The so-called prototype foamy virus (PFV) was first isolated from a human nasopharyngeal cell line [3]. Sequence comparisons with a simian FV revealed that it originally was derived from a chimpanzee [4].

FVs are complex retroviruses, i.e., they contain accessory genes. Similar to orthoretroviruses, their genomes contain the genes *gag*, *pol,* and *env* (Figure 1). However, in contrast to orthoretroviruses such as human immunodeficiency virus (HIV), the Pol protein is expressed from a separate mRNA and translated from its own AUG start codon; thus, no Gag–Pol fusion protein is produced [5–7].

In the proviral genome, the viral genes are flanked by long terminal repeats (LTRs). The 5′ LTR harbors the viral promoter, which controls transcription of the *gag*, *pol*, and *env* mRNAs. In addition, FVs possess an internal promoter (IP) near the 3′ end of the *env* gene, which is responsible for transcription of the mRNAs encoding the accessory proteins Bet and Tas [8–11] (Figure 1). Tas activates transcription from the 5′ LTR and enhances transcription from the IP [12]. The Bet protein appears to be important for efficient virus replication [13], and interacts with the cellular proteins of the APOBEC family, which function as antiretroviral restriction factors [14–18].

Another interesting feature of FVs is the processing of the Gag protein. Whereas in orthoretroviruses, Gag is cleaved into matrix (MA), nucleocapsid (NC), and capsid (CA) proteins, the only cleavage in the 71-kDa Gag of FV occurs near the C-terminus, resulting in a 68-kDa Gag and a ca. 3-kDa peptide (Figure 1). The cleavage of Gag by the viral protease (PR) was shown to be essential for infectivity [19–21]. The wild-type virus contains a mixture of Gag p71/p68 proteins at a ratio of ca. 1:4 [22]. Inactivation of the Gag p68/p3 cleavage site inhibits reverse transcription at the first template switch. However, p3 itself is not required for infectivity [20,23–26].

**Figure 1.** Overview of the foamy virus (FV) genome organisation. The proviral DNA genome is shown. The *gag, pol,* and *env* genes are depicted as boxes. The flanking long terminal repeats (LTRs) comprise the U3, R, and U5 regions, as indicated underneath the 5′ LTR. Transcription starts at the promoter upstream of the R region in the 5′ LTR and at the internal promoter (P and IP, respectively), which are depicted as rectangular arrows. The transactivator protein Tas activates both promoters, as indicated by the arrows. *bel2* encodes the Bet protein. The locations of the Cas sequences, PARM, the purine rich elements A–D, as well as the cPPT and 3′ PPT are illustrated. Only the gene products Gag and Pol, which are processed by the viral PR, are shown.

#### **2. The Pol Protein**


Conventional retroviruses express *pol* as a Gag–Pol fusion protein by a rare frameshift event or nonsense codon suppression mechanism. In FVs, Pol is generated from a spliced mRNA independently from Gag [5–7,27–29]. It contains the genes for the PR, polymerase, and RNase H domains, forming the reverse transcriptase (RT) as well as the integrase domain (IN) (Figure 1). The FV Pol protein undergoes only limited proteolysis. A single cleavage between the RNase H and IN domains is carried out, resulting in two mature viral enzymes IN and a PR–RT fusion protein [30,31]. This is in contrast to HIV and other orthoretroviruses, in which Pol is cleaved by the viral PR into three separate proteins, PR, RT and IN (reviewed in [32]).

The existence of a separate Pol poses interesting questions regarding its encapsidation into the FV capsid. It has been suggested that only very few Pol molecules are encapsidated [33,34]. Various studies identified two *cis*-acting sequences (Cas) in the FV pregenomic RNA: CasI and CasII (Figure 1), which are essential and sufficient for the transfer of FV vectors, indicating an important role in virus assembly [35–38]. CasI spans from the 5′ leader sequence into the 5′ *gag* region of the pregenomic RNA of PFV (nucleotides 1–645). CasII is situated in the 3′ region of *pol* (nucleotides 3869–5884) [39]. Within the Cas regions, so-called Pol encapsidation sequences (PES) have been detected that are required to incorporate the full-length Pol protein into the FV capsid. These PES regions range from nucleotides 314 to 354 in CasI and nucleotides 4881 to 5884 in CasII. The deletion of either PES resulted in a significant reduction of Pol uptake into the virus particles [39].
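The nucleotide coordinates quoted above can be organized in a small data structure to make the nesting of the PES regions within the Cas regions explicit. The following is only an illustrative sketch: the `Region` class and all variable names are hypothetical, and the coordinates are simply those given in the text for the PFV pregenomic RNA, treated here as 1-based and inclusive (an assumption).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A stretch of the pregenomic RNA; coordinates assumed 1-based, inclusive."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    def contains(self, other: "Region") -> bool:
        # True if 'other' lies completely inside this region
        return self.start <= other.start and other.end <= self.end


# Coordinates as quoted in the text (hypothetical variable names)
CAS_I = Region("CasI", 1, 645)
CAS_II = Region("CasII", 3869, 5884)
PES_I = Region("PES in CasI", 314, 354)
PES_II = Region("PES in CasII", 4881, 5884)

for cas, pes in ((CAS_I, PES_I), (CAS_II, PES_II)):
    print(f"{pes.name}: {pes.length} nt, inside {cas.name}: {cas.contains(pes)}")
```

Under these assumptions, the PES in CasI is a short element of about 40 nucleotides, whereas the PES in CasII covers roughly the 3′ half of CasII.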

Furthermore, FV Gag binds to pregenomic RNA, and its C-terminus contains determinants that are also important for Pol encapsidation [24,37,39,40]. These results indicate that the pregenomic RNA functions as a bridging molecule between Gag and Pol precursors, and that an interplay of protein–protein as well as protein–RNA interactions is important for correct virus assembly.

#### **3. Reverse Transcription**

Reverse transcription—the reverse flow of genetic information from RNA to DNA—is pivotal in the replication cycle of all retroviruses. In the year 1970, the enzyme reverse transcriptase (RT), which catalyses this process, was identified [41,42]. Retroviral RTs exhibit two enzymatic activities that are required to synthesize double-stranded DNA from a single-stranded RNA template: (1) a DNA polymerase activity that can use both DNA and RNA as a template, and (2) an RNase H endonuclease activity that hydrolyzes the RNA strand in an RNA/DNA intermediate. Without the RNase H activity, reverse transcription cannot take place, since RNA degradation is absolutely required for synthesis of the second DNA strand. Misleadingly, the polymerase domain alone is often called the RT domain.

Although the principal order of events is similar in orthoretroviruses and spumaretroviruses, FV reverse transcription takes place late in the replication cycle, i.e., shortly before the virus leaves the cell, whereas conventional retroviruses reverse transcribe their genomic RNA immediately after entering the cell. The pregenomic single-stranded RNA is packaged during virus assembly and reverse transcribed into double-stranded DNA before budding. Thus, the virions of FVs contain mainly double-stranded DNA, which is the functional genome when the virus infects the cell. The packaged pregenomic RNA is diploid. A dimerization signal has been identified at the 5′ end of the RNA [43,44].

Experiments with the RT inhibitor 3′-azido-3′-deoxythymidine (AZT) revealed that reverse transcription is largely complete before the infection of new cells [5,45,46]. However, the results of Delelis and Zamborlini suggested a biphasic DNA synthesis with an additional early reverse transcription event, which might optimize genome replication [47,48]. In contrast, in conventional retroviruses such as HIV-1 and MLV, only a very small amount of DNA, consisting only of early reverse transcription products but no full-length DNA, could be detected in virions [49].

In some aspects, FVs resemble hepatitis B virus (HBV). Similar to the FV Gag, the HBV viral structural core protein is not cleaved in virions, and contains Arg-rich regions that interact with RNA in the early stages of reverse transcription and with DNA during encapsidation and in the mature particle [50,51]. In HBV, long reverse transcription products are synthesized by the reverse transcriptase, which is called the P protein. Interactions between the viral pregenomic RNA, the P protein, and the core protein are necessary for particle assembly. In extracellular HBV particles, a partially double-stranded (gapped) circular DNA molecule is present instead of RNA, indicating that reverse transcription takes place during and after particle formation, but before the virus enters a new cell [52,53].

Several groups have investigated the effect of PFV mutants expressing *gag* and *pol* as a Gag–Pol fusion protein [54–57]. The co-expression of Gag with Gag–Pol resulted in a molar ratio of 20:1 in virus particles, which is similar to orthoretroviruses. However, larger variations in the Gag:Pol ratio than in orthoretroviruses are tolerated. Furthermore, virus titers similar to that of the wild type could be achieved as long as a proteolytic cleavage took place between Gag and Pol [54,56]. If the constructs did not allow for removal of the p3 Gag peptide from Pol, particle release resembled that of the wild type, but infectivity was reduced [56]. Reverse transcription with the Gag–Pol mutant virus was also found to be a late event in the replication cycle. However, under AZT treatment, a ca. fivefold drop in virus titer was determined for both wild-type and mutant viruses, implying that early DNA synthesis might also be required [54,57].

Similar to all retroviruses, FV reverse transcription starts at the so-called primer binding site (PBS) close to the 5′ RU5 region of the pregenomic RNA (Figure 1). PFV uses a tRNA<sup>Lys1,2</sup> primer annealed to the PBS for minus-strand DNA synthesis [30]. Synthesis of the plus-strand DNA is initiated at the 3′ polypurine tract (PPT), which is located upstream of the 3′ U3R region. Additionally, FVs harbor a second so-called central PPT (cPPT), which is located in the CasII region of the *pol* open reading frame. Within CasII, four purine rich sequences (elements A–D) are present. However, only the D element is 100% identical to the 3′ PPT, and thus is likely to constitute the actual cPPT (Figure 1). It is highly conserved in all FV species [58,59]. The C element is required for the regulation of gene expression, and appears to be relevant in *cis* to achieve a sufficient amount of Gag protein. It has been shown recently that it regulates splicing by suppressing the branch point recognition of the strongest *env* splice acceptor. Thus, it plays an essential role in the formation of unspliced *gag* and singly spliced *pol* transcripts [39,58–61]. A and B elements play a role in Pol encapsidation and moreover in PR activation (see below) [59]. Similar to lentiviruses such as HIV, the cPPT of FVs is used as a second initiation site for plus-strand DNA synthesis. In HIV, a so-called central flap region with overlapping single-stranded DNAs is created during reverse transcription. The flap ensures efficient replication in non-dividing cells [62]. However, FVs are not able to establish productive infection in resting cells [63]. Instead of creating a flap, the cPPT is degraded to produce a single-stranded gap region in the double-stranded unintegrated linear PFV DNA [58–60]. The length of the PFV gap varies from 144 to 731 nucleotides with the start and terminal nucleotides being located on either side of the cPPT D element. Mutations in the FV cPPT, which retain the IN amino acid sequence, result in the reduction of the virus titer, indicating the important role of the cPPT in virus replication [59,64].
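To illustrate what "purine-rich" means at the sequence level, the following minimal sketch scans a nucleotide string for uninterrupted runs of A and G. It is not the method used in the cited studies: the function name `purine_runs`, the length threshold of 8 nucleotides, and the toy sequence are all assumptions made for illustration, and the toy sequence is not taken from any FV genome.

```python
import re


def purine_runs(seq: str, min_len: int = 8):
    """Return (start, end, run) for each uninterrupted A/G stretch of at
    least min_len nucleotides. Coordinates are 1-based and inclusive."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA notation
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(seq)]


# Invented toy sequence with two purine-rich stretches (NOT a viral sequence)
toy = "CTTC" + "AGGAGAGGG" + "TCCT" + "AAGGGAGGAAG" + "CT"

for start, end, run in purine_runs(toy):
    print(f"{start}-{end}: {run} ({len(run)} nt)")
```

A scan of this kind only flags candidate elements. As described above, the D element is considered the actual cPPT because of its sequence identity with the 3′ PPT, not merely because of its purine content.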

#### **4. Foamy Virus PR-RT**

#### *4.1. Domain organization.*

Although the RTs of retroviruses all fulfill the same essential function, i.e., the formation of double-stranded DNA from a single-stranded RNA template, their domain organization is different (Figure 2). HIV RT is a heterodimeric enzyme in which only the larger p66 subunit harbors the polymerase active site and carries the RNase H domain located at the C-terminus. The p51 subunit is homologous to the N-terminus of p66, but lacks the RNase H domain, which is cleaved off by the viral PR (Figure 2). Due to the different conformation of p51, no polymerase active site can be formed [65,66]. The RT of RSV is also heterodimeric, consisting of a 63-kDa α subunit and a 95-kDa β subunit, although the respective homodimers can also be isolated from virus particles [67,68]. In addition to the polymerase domain, the connection subdomain and the RNase H domain, the β subunit harbors the IN domain. The active sites of both the polymerase and RNase H are located in the α subunit [69].

**Figure 2.** Domain organisation of retroviral RTs. Human immunodeficiency virus (HIV-1) reverse transcriptase (RT) is a heterodimer with a 66-kDa and a 51-kDa subunit. The sequence of the N-terminal region of the two subunits is identical, and comprises the polymerase domain and the connection subdomain, which are highlighted in light and dark green, respectively. The RNase H domain (blue) is located at the C-terminus of the larger subunit. The Rous sarcoma virus (RSV) RT is also heterodimeric. The larger β subunit (95 kDa) carries, in addition to the polymerase, connection, and the RNase H (sub-)domains of the small α subunit (63 kDa), the IN domain (grey). Xenotropic murine leukaemia virus-related virus (XMRV) and murine leukemia virus (MLV) RTs are monomeric enzymes (75 kDa). In addition, the mature monomeric FV enzyme (86 kDa) harbors the PR domain. The stretch ranging from amino acids (aas) 102–143 between the C-terminal end of the PR domain, and the start of the RT domain is highlighted in pink.

The RTs of Moloney MLV (MoMLV) and the closely related xenotropic murine leukaemia virus-related virus (XMRV) are monomeric enzymes. The RNase H domain is connected to the polymerase domain via a flexible linker and thus is quite mobile, but becomes ordered in the presence of substrate [70,71]. The mature RT from FVs is actually a PR-RT fusion protein harboring the PR domain at its N-terminus [72,73]. Nevertheless, FV PR-RTs resemble MLV RT in their structural organization and in some biochemical and biophysical properties, but differ from HIV RT. However, since the overall amino acid similarity of FV RTs to MoMLV or XMRV RT is less than 25%, and the PR domain is an integral part of the mature FV enzyme, subdomain assignments cannot be easily obtained from sequence comparisons. Size exclusion chromatography with purified PR-RTs of SFV from macaques (SFVmac) and PFV showed that they are monomers in solution [74,75]. They exhibit polymerase as well as RNase H activities [33,74,76]. Furthermore, similarly to MLV, the isolated RNase H domain is active, but loses specificity (see below) [77–79].

Purified recombinant FV PR-RT monomers do not exhibit PR activity, nor does the separate PR domain. The PR activity of the full-length PR-RT and the separate PR domain can be induced by unphysiologically high NaCl concentrations of 3–4 M (see below) [74,80,81]. Unfortunately, no crystal structure of a full-length PR-RT enzyme is available so far. However, the NMR solution structures of isolated FV PR and RNase H domains exist, which give insight into the functions of the FV PR-RT enzyme and its domains [75,77,78,80]. Amino acid sequence comparisons of FV PR-RT with RTs from other retroviruses indicate that the polymerase domain is composed of fingers, palm, and thumb subdomains followed by a connection subdomain and the RNase H. Comparable to HIV RT, the connection subdomain appears to play a role in primer/template binding, protein stability, and polymerization efficiency. The stretch ranging from amino acid (aa) 102 to 143, which is located between the C-terminal end of the PR domain and the start of the RT domain, does not exhibit homology to retroviral PRs or any other RT, but appears to be an intrinsic part of the RT domain that is necessary for solubility and the integrity of the protein [82].

#### *4.2. Polymerization Activities.*

The YXDD motif of the polymerase catalytic site is localized in the palm subdomain and is highly conserved among retroviruses. The Asp residues are involved in metal binding. A general model for the catalysis of the polymerase suggests the coordination of two Mg<sup>2+</sup> ions in which one of them supports the nucleophilic attack of the 3′-OH group of the DNA primer onto the α-phosphate of the incoming dNTP, while the second metal ion is important for pyrophosphate release [83,84].

In most RTs, including the HIV-1 RT, the second site of the motif is a Met (YMDD). However, MLV and FVs contain a Val as the second residue. In HIV-1, mutation of the polymerase active site from YMDD to YVDD causes high level resistance to the inhibitor 3′-thiacytidine (3TC) [85]. Changing YVDD to YMDD in PFV severely impairs virus replication, since reverse transcription cannot be completed. In vitro polymerization assays further indicate that the wild-type YVDD PFV PR-RT is a highly processive DNA polymerase, whereas the YMDD mutant exhibits significantly reduced processivity [34], which is defined as the length of polymerization products synthesized during one round of binding and polymerization before dissociation and reassociation occur. These results indicate that FVs require a highly processive RT for efficient replication. This is probably because in contrast to HIV, only a few Pol molecules are taken up into the virus particle via the direct interaction of Pol with Gag and the viral RNA [33,34]. Interestingly, although the mutant YMDD PFV enzyme resembles the wild-type HIV-1 RT, it still is resistant to 3TC, indicating that probably additional determinants other than the Val in the YXDD motif are involved in the 3TC resistance of FVs [34].

Investigation of the fidelity of PFV PR-RT revealed that it is similar to that of HIV-1 RT for base substitutions; however, it generates more insertions and deletions [33]. Nevertheless, the genetic variation of FV genomes is limited. This might be because although FV genomes can be found in many tissues, high levels of viral RNA were only detected in oral tissues [86]. Compared to HIV, the replication activities of FVs are restricted to certain tissues, which probably supports the conservation of the genome.

Comparison of the K<sub>M</sub> and k<sub>cat</sub> values for polymerization on homopolymeric and heteropolymeric substrates indicated similar results for purified SFVmac and PFV PR-RT. The K<sub>M</sub> values for both substrate types are also comparable [74,76]. However, the K<sub>M</sub> values for FV PR-RTs are about 5- to 30-fold higher than published values for HIV-1 RT [87–89]. In addition, K<sub>D</sub> values for DNA/DNA (PFV 44.4 nM; SFV 36.4 nM) or DNA/RNA (PFV 9.9 nM; SFV 32.4 nM) substrates are much higher than those determined for HIV-1 RT, for which K<sub>D</sub> values of ca. 2 nM have been determined for both substrates [90,91]. Comparison of the pre-steady state kinetics of dNTP incorporation of PFV PR-RT with HIV-1 and MLV showed a severely reduced primer extension capacity of PFV PR-RT at low dNTP concentrations. This behavior is similar to MLV RT, but in strong contrast to HIV-1 RT [92]. For example, k<sub>pol</sub>/K<sub>D</sub> values for dATP incorporation for PFV PR-RT and MLV RT reach values of 2.9 and 2.1, respectively, whereas a value of 55.3 was achieved for HIV-1 RT [92,93]. The authors suggest that the different polymerization properties might have evolved, because HIV and FVs as well as MLV replicate in different cell types. Whereas HIV is able to efficiently propagate in non-dividing cells that have low dNTP concentrations, MLV and FVs replicate in dividing cells. Since these cells contain high dNTP concentrations, FVs did not need to evolve an RT enzyme with high dNTP binding affinities.
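The size of the differences quoted above becomes clearer when the values are expressed as fold changes relative to HIV-1 RT. The following sketch merely performs this arithmetic on the numbers given in the text. The dictionary and function names are hypothetical, the HIV-1 dissociation constant is the approximate value of ca. 2 nM, and the incorporation efficiencies are compared as ratios because their units are not stated here.

```python
# Dissociation constants in nM, as quoted in the text.
# The HIV-1 value is approximate ("ca. 2 nM").
KD_NM = {
    "DNA/DNA": {"PFV": 44.4, "SFV": 36.4, "HIV-1": 2.0},
    "DNA/RNA": {"PFV": 9.9, "SFV": 32.4, "HIV-1": 2.0},
}

# Efficiency of dATP incorporation (kpol/KD), as quoted in the text.
KPOL_OVER_KD = {"PFV PR-RT": 2.9, "MLV RT": 2.1, "HIV-1 RT": 55.3}


def fold(value: float, reference: float) -> float:
    """Ratio of a value to a reference value."""
    return value / reference


for substrate, kd in KD_NM.items():
    for enzyme in ("PFV", "SFV"):
        ratio = fold(kd[enzyme], kd["HIV-1"])
        print(f"{substrate}, {enzyme}: KD about {ratio:.1f}-fold higher than HIV-1 RT")

for enzyme in ("PFV PR-RT", "MLV RT"):
    ratio = fold(KPOL_OVER_KD["HIV-1 RT"], KPOL_OVER_KD[enzyme])
    print(f"HIV-1 RT incorporates dATP about {ratio:.0f}-fold more efficiently than {enzyme}")
```

With these numbers, the FV enzymes bind their substrates roughly 5- to 22-fold more weakly than HIV-1 RT, and HIV-1 RT is about 19-fold and 26-fold more efficient in dATP incorporation than PFV PR-RT and MLV RT, respectively.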

#### **5. RNase H Activity and Structure**

The catalytic activity of the RNase H domain of retroviral RTs is essential during reverse transcription. Mutations that inactivate the RNase H prevent virus propagation [94,95]. Retroviral RNases H are partially processive endonucleases, which in general do not cleave sequence-specifically. Cleavage of the RNA strand of an RNA/DNA hybrid takes place in the presence of Mg<sup>2+</sup> ions and results in 5′-phosphate and 3′-OH termini. RNase H also exhibits a 3′ to 5′ exonuclease activity during DNA polymerization [96–98]. In addition, during reverse transcription, two specific cleavages are required to remove the extended tRNA and PPT primers, which are used to start minus-strand and plus-strand DNA synthesis. RNase H cleaves specifically at the RNA–DNA junctions [99–103].

To investigate cleavage at the FV PPT-U3 region, DNA/RNA substrates were designed using 5′ end-labeled RNA that contained the entire PPT and part of the U3 region of FV. During reverse transcription, FV PR-RT progressively degrades the RNA until it encounters the PPT. It was shown that FV PR-RT recognizes its own PPT, and cleaves specifically at the U3/PPT boundary. However, it did not properly cleave a similar substrate containing the HIV-1 U3–PPT junction, and vice versa HIV-1 RT did not cleave the FV substrate correctly, suggesting that the two enzymes bind the substrate differently [33]. Gel filtration and NMR data showed that the separate PFV RNase H is a monomer [77,78]. Although the presence of Mg<sup>2+</sup> ions in the RNase H catalytic center is not required for substrate binding by RTs, NMR spectroscopy indicated that the metal ions are important for stabilization of the overall structure of PFV RNase H [77]. In addition, it has been shown for other RNases that RNA cleavage is achieved by a mechanism that involves two Mg<sup>2+</sup> ions bound in the catalytic center [104,105].

The K<sub>M</sub> values for the RNase H activity of the full-length SFVmac (18.1 nM) and PFV (17.1 nM) enzymes are similar to that of HIV-1 RT (25 nM). This is somewhat surprising, since the number of RT molecules is much higher in HIV-1 than in FV virions [33,34,74].

The isolated RNase H domain of HIV-1 RT is inactive, but activity can be restored by N-terminal extensions, which stabilize the protein [106]. Independent MoMLV and PFV RNase H retain cleavage activity; however, it is remarkably lower than that of the full-length RT enzymes [77,79,107,108]. The RNase H cleavage patterns of the independent PFV RNase H domain and the full-length PR-RT differ, and the K<sub>D</sub> value of 23 μM for DNA/RNA substrate binding for the free RNase H is about 4000-fold higher, indicating a substantial role of the polymerase domain for nucleic acid affinity and specificity [74,78].

Moreover, analysis of the RNase H cleavages performed by full-length FV PR-RT and HIV-1 RT on non-specific RNA/DNA substrates revealed different cleavage sites, which also suggests differences in nucleic acid binding [33]. Time-course experiments with PFV and SFVmac PR-RTs indicate that both enzymes cleave endonucleolytically at around −17 to −19 in the RNA. This is followed by a 3′→5′-directed processing of the RNA [33,74]. Amino acid sequence comparisons of the RNases H from various retroviruses as well as the human and the *Escherichia coli* RNases H showed that in contrast to the inactive free HIV-1 RNase H, they all contain an additional helix-loop structure, the basic protrusion, which consists of the so-called C-helix and a downstream basic loop element [78]. The basic protrusion of RNases H has been suggested to be important for substrate binding and activity. In HIV-1 RT, this function is probably fulfilled by positively charged residues located in the connection subdomain [109–111].

The structure of the PFV RNase H exhibits the typical fold of an RNase H, which consists of a five-stranded mixed β-sheet flanked by five α-helices (Figure 3). The PFV RNase H structure most closely resembles those of XMRV and HIV-1, even though HIV-1 lacks the basic protrusion [78,112,113]. The catalytic core consists of the highly conserved residues D599, E646, D669, and D740. Helix C precedes the basic loop, which contains four Lys (KKKPLK). In contrast, in XMRV RNase H, the consecutive basic residues are three Arg, which are part of helix C [78,112]. The structural similarity of the HIV-1 and PFV RNase H was used to examine whether PFV RNase H can serve as a model enzyme for HIV-1 RNase H inhibitors. Indeed, several HIV-1 RNase H inhibitors were identified that also bind and inhibit PFV RNase H at low micromolar concentrations, which are similar to those required for the HIV-1 RNase H. Based on NMR binding experiments with PFV RNase H and the HIV-1 RNase H inhibitor RDS1643, structural overlays of both enzymes and in silico docking experiments were performed to propose the inhibitor binding site in HIV-1 RNase H [114].

**Figure 3.** Ribbon diagram of the prototype foamy virus (PFV) RNase H structure. The C-helix is highlighted in green; the basic loop in blue. The active site residues D599, E646, D669, and D740 are depicted in red as sticks (pdb: 2LSN).

NMR titration experiments were performed to identify the residues involved in RNA/DNA substrate binding. <sup>1</sup>H-<sup>15</sup>N heteronuclear single quantum coherence (HSQC) spectra of purified <sup>15</sup>N-labeled PFV RNase H were recorded after the addition of increasing amounts of substrate. Chemical shift changes indicated that apart from the active site residues, residues in helix B, helix C, and the basic loop participate in binding of the substrate (Figure 3). The orientation of helix C is established by several hydrophobic contacts with helix D. This interaction enables helix C to correctly position the basic loop toward the nucleic acid substrate. Only then can proper RNA cleavage (and, if necessary, specific cleavage) be guaranteed [78].

#### **6. Protease Activity and Structure**

The PR activity of FVs is essential for virus production. When processed Gag in combination with a PR-deficient Pol was provided during virus production, infectious virus particles containing viral DNA were obtained, indicating that PR activity is not absolutely required at cell entry. However, infectivity was reduced to 0.5% to 2% of the wild-type infectivity [23]. Consistent with this strong reduction, other groups suggest that Gag cleavage is essential for viral infectivity [19,20]. However, PR-mediated Gag processing is absolutely necessary to initiate intraparticle reverse transcription as well as the template switch of reverse transcriptase [23,26]. Furthermore, Pol processing is essential for genome integration, but not for the RT activity itself [23,72].

Since retroviral PRs have been shown to be only active as dimers and FV PR-RTs are monomeric proteins, the question arises how the activation of PR can be achieved [74,80,115]. The NMR solution structure of the independent SFVmac PR domain (residues 1 to 102) showed that it is a stable monomer and adopts a conformation similar to one subunit of the HIV-1 PR dimer [75,116] (Figure 4). The monomer consists of seven β-strands and a helical turn. The β-strands form a closed barrel-like β-sheet. A β-hairpin is formed by the amino-terminal halves of β4 and β5, which is typical for the so-called flap region of aspartate PRs [75,117]. Similarly to other retroviral PRs, the FV PR domain harbors four characteristic structural features: (a) a hairpin containing the A1 loop, (b) the B loop or the so-called fireman's grip, which includes the conserved amino acid motif DSG (in some PRs DTG) forming the active site in the dimer, (c) an α-helix, and (d) the flap region [75]. Structural analyses of other retroviral PRs revealed that the fireman's grip, the flap, the N-terminal region, and the C-terminal region, which form a four-stranded β-sheet, are involved in dimerization [117], corroborating that the FV PR domain is also able to form dimers.

**Figure 4.** Three-dimensional structure of the SFV from macaques (SFVmac) protease (PR) monomer. The flap region (blue), the α-helix (orange) and the location of the DSG motif (red) forming the active site in the dimer are highlighted (pdb: 2JYS).

Nevertheless, activity of the independent PR domain (1 to 102) as well as of the full-length FV PR-RT could only be achieved using high NaCl concentrations of 2 to 3 M [74,80,118]. The expression of PFV PR as a maltose-binding protein (MBP) or thioredoxin fusion at the N-terminus as well as a C-terminal extension of the PR (residues 1 to 143) appeared to improve the stability of the PR and allowed substrate cleavage, but activity was lost after elimination of the fusion protein [73,118,119]. Based on sequence alignments with HIV-1 PR, single (Q8R, H22L, S25T, T28D) and double (Q8R-T28D, H22L-T28D) mutants of PFV PR were created that harbored amino acid exchanges, making the PR variants more similar to HIV-1 PR. Urea denaturation revealed an increased stability for most mutants, suggesting that the substitutions promote dimer stability [120].

The putative PR dimerization inhibitor cholic acid inhibited the activity of HIV-1 and FV PR, whereas darunavir and tipranavir, which are known to prevent HIV-1 PR dimerization, had no effect on FV PR. Determination of the binding site for cholic acid by <sup>1</sup>H-<sup>15</sup>N HSQC experiments using <sup>15</sup>N-labeled PR indicated that the inhibitor binds in the putative dimerization interface. Paramagnetic relaxation enhancement (PRE), an NMR method that allows the detection of minor conformational species, finally showed that the FV PR domain is able to form transient homodimers. However, in solution, these dimers constitute only a small fraction of less than 5% [80,81].

Obviously, high NaCl concentrations do not represent the situation in a living cell in which the virus replicates. PR activation of HIV-1 is achieved by the formation of transient PR dimers in the Gag–Pol precursor, which leads to N-terminal autoprocessing [121]. Since FVs express Gag and Pol separately, and Pol can only be taken up into the virus particle by binding to the pregenomic viral RNA, it is obvious that FVs developed a different mechanism for PR activation. A PR-activating RNA motif (PARM) was identified in the cis-acting CasII sequence of the RNA, which includes the A and B elements of the purine-rich sequences located at the 3′ end of *pol* (Figure 1) [122].

The addition of PARM RNA to PFV PR-RT initiates substrate cleavage, whereas the corresponding DNA does not lead to PR activation. Truncated PARM RNA or the addition of only the A or B element RNA to the assay also resulted in a loss of PR activation. Gel shift experiments with the PARM RNA and PFV PR-RT showed that the enzyme oligomerizes upon RNA binding [122]. Determination of the PARM RNA secondary structure using selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) revealed that both the A and B elements are located in a stem-loop structure of ca. 15 nucleotides in length. PARM enables the formation of proteolytically active PR-RT dimers (Figure 5). It might also be possible that only the PR domains of two full-length enzyme molecules dimerize upon PARM binding [122]. The data suggest that in the host cell, the PR domain in the Pol precursor is inactive until enough viral RNA is produced. PR activation can only be achieved during packaging upon binding of the RT domain to the PARM of the pregenomic RNA. The IN domain of Pol is not required for PR activation [55]. This order of events creates a regulatory mechanism by which premature Pol or Gag processing can be avoided.

**Figure 5.** Model of protease (PR) activation upon binding to the PR-activating RNA motif (PARM). Both the A and B elements are required. They are involved in stem structures to which two PR-RT molecules can bind. Upon interaction of the RT domain with the RNA, the PR domains of the two PR-RTs can dimerize. Colors as in Figure 2.

#### **7. Resistance of FV PR-RT against RT Inhibitors**

The only known RT inhibitors that impair PFV replication are tenofovir and 3′-azido-3′-deoxythymidine (AZT, zidovudine). The addition of 5 μM AZT to cell cultures is sufficient to prevent virus propagation [45,123,124]. Attempts to generate AZT-resistant FV were only successful with SFVmac, but not with SFV from chimpanzee (SFVcpz) or PFV. This is quite astonishing, since the amino acid sequences of the polymerase domains of PFV and SFVmac are 84.5% identical.

Four amino acid substitutions in the RT domain of SFVmac have been identified that together confer high-level resistance to AZT: K211I, I224T, S345T, and E350K (Figure 6). The I224T substitution is probably a polymorphism that does not contribute directly to AZT resistance, but is important for regaining polymerization activity and viral fitness [76,125]. Two different AZT resistance mechanisms have been shown for HIV: HIV-2 is able to discriminate between the natural triphosphate TTP and the phosphorylated inhibitor AZTTP, whereas the major mechanism in HIV-1 is based on the removal of the already incorporated chain-terminating AZTMP in the presence of ATP [126]. In SFVmac, the AZT-resistant RTs can also remove the incorporated AZTMP more readily than the wild-type enzyme in the presence of ATP. The PR-RT harboring the single amino acid exchange S345T is the only single substitution variant exhibiting significant AZTMP excision activity. Excision efficiency doubles when K211I is present together with S345T or E350K [127].
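The substitution labels used above (for example S345T: wild-type Ser at position 345 replaced by Thr) follow a simple pattern that can be handled programmatically. The following is a minimal sketch of how such labels could be parsed and applied to a sequence; the function names, the `first_residue` parameter, and the four-residue toy fragment are illustrative assumptions and are not taken from the cited studies or from any real SFVmac sequence.

```python
import re
from typing import Iterable, NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # residue number in the protein
    mutant: str     # one-letter code of the replacement residue


# One letter, a residue number, one letter (e.g. "K211I").
_LABEL = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'S345T' into wild type, position, and mutant."""
    match = _LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def apply_substitutions(sequence: str, labels: Iterable[str],
                        first_residue: int = 1) -> str:
    """Return a copy of `sequence` carrying the given substitutions.

    `first_residue` is the residue number of sequence[0] (hypothetical
    helper parameter, useful when only a fragment is available).
    """
    residues = list(sequence)
    for label in labels:
        sub = parse_substitution(label)
        index = sub.position - first_residue
        if not 0 <= index < len(residues):
            raise IndexError(f"{label}: position lies outside the sequence")
        if residues[index] != sub.wild_type:
            raise ValueError(
                f"{label}: found {residues[index]} instead of {sub.wild_type}")
        residues[index] = sub.mutant
    return "".join(residues)


# Substitutions reported in the text for AZT-resistant SFVmac RT.
SFVMAC_AZT_RESISTANCE = ("K211I", "I224T", "S345T", "E350K")

if __name__ == "__main__":
    for label in SFVMAC_AZT_RESISTANCE:
        print(parse_substitution(label))
    # Toy fragment (made up), numbered so that its second residue is 211.
    print(apply_substitutions("AKSA", ["K211I"], first_residue=210))
```

Checking the wild-type residue before replacing it guards against numbering mismatches between sequence variants. Labels that list alternatives, such as T215Y/F, are deliberately rejected by this sketch and would need to be expanded first.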


**Figure 6.** Sequence alignment of the regions of human immunodeficiency virus (HIV)-1 and SFVmac RT, showing the 3′-azido-3′-deoxythymidine (AZT) resistance amino acid exchanges. The amino acids conferring AZT resistance are shown in red for HIV-1 (M41L, D67N, K70R, T215Y/F, K219E/Q) and blue for SFVmac RT (K211I, I224T, S345T, E350K), respectively. The amino acids of the polymerase active site are highlighted by an orange box. The amino acid identity is 26.2%.

In AZT-resistant HIV-1 RT, the aromatic amino acid exchange T215F/Y allows π–π stacking interactions with the adenine ring of ATP and thus more efficient AZTMP excision [128,129]. In AZT-resistant SFVmac RT, instead of acquiring an aromatic residue, the most important substitution is S345T. NMR <sup>1</sup>H-<sup>15</sup>N HSQC experiments with truncated wild-type and resistant SFVmac RTs harboring the fingers and palm subdomains of the polymerase were recorded in the absence and presence of ATP. Comparison of the spectra revealed that a Trp residue is involved in ATP binding in the S345T variant, which is obscured in the wild-type enzyme, suggesting a direct contact of ATP via π–π stacking interactions similar to HIV-1 RT.

#### **8. Outlook and Perspectives**

The life cycle of FVs differs from that of conventional retroviruses in various aspects. Several of the molecular details have been elucidated that make us aware of the differences that have developed during the evolution of FVs. The structure of some FV proteins is already known: PR, RNase H, and IN, as well as parts of the Gag protein [75,78,130–134]. The structure of the full-length FV PR-RT is still missing, but would contribute greatly to our understanding of RTs in general. So far, the only monomeric RT 3D structures known are those of XMRV RT and the closely related MLV RT [70,71]. In addition, the crystal structure of the yeast retrotransposon Ty3 RT has been solved, which is a monomer in solution, but dimerizes upon substrate binding [135]. In order to fully understand the function and mechanistic details of FV proteins and enzymes, more structural and functional information is urgently needed.

**Funding:** This work was supported by the University of Bayreuth.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Structural Insights on Retroviral DNA Integration: Learning from Foamy Viruses**

#### **Ga-Eun Lee 1,**†**, Eric Mauro 2,3, Vincent Parissi 2,3, Cha-Gyun Shin <sup>1</sup> and Paul Lesbats 2,3,\***


Received: 6 August 2019; Accepted: 20 August 2019; Published: 22 August 2019

**Abstract:** Foamy viruses (FV) are retroviruses belonging to the *Spumaretrovirinae* subfamily. They are non-pathogenic viruses endemic in several mammalian hosts like non-human primates, felines, bovines, and equines. Retroviral DNA integration is a mandatory step and constitutes a prime target for antiretroviral therapy. This activity, conserved among retroviruses and long terminal repeat (LTR) retrotransposons, involves a viral nucleoprotein complex called intasome. In the last decade, a plethora of structural insights on retroviral DNA integration arose from the study of FV. Here, we review the biochemistry and the structural features of the FV integration apparatus and will also discuss the mechanism of action of strand transfer inhibitors.

**Keywords:** retrovirus; foamy virus; integrase; integration

#### **1. Introduction**

The *Retroviridae* family is a large group of viruses containing seven genera (alpha, beta, gamma, delta, epsilon, lenti, and spuma-virus). The deltaretrovirus and lentivirus genera contain the two major human pathogens, Human T-Lymphotropic Virus-1 (HTLV-1) and Human Immunodeficiency Virus-1 (HIV-1), respectively. One feature that distinguishes retroviruses from other viruses is the ability to integrate their linear double-stranded DNA into host cellular chromatin. This essential activity is catalyzed by the virally encoded integrase (IN) protein and leads to the covalent insertion of the provirus into the host genome [1]. The mechanism of retroviral integration is also shared by numerous prokaryotic and eukaryotic mobile DNA elements to mobilize genetic information between and within genomes. Moreover, retroviral integrases are closely related to the DD(E/D) polynucleotidyl transferase family of DNA transposases [2]. Although the DNA cutting and strand transfer reactions occur through a similar mechanism in these genetic elements, the structure of the DNA to be mobilized differs, i.e., IN cannot act on an already integrated DNA molecule and requires linear DNA to carry out the two essential sequential events, 3′ processing and strand transfer [3–5]. These processes take place in the context of a nucleoprotein complex called intasome, consisting of the two viral DNA (vDNA) ends and a multimer of IN [6,7]. While the function of retroviral integrases is well described, understanding of the molecular mechanisms involved was, for a long time, hampered by the lack of structural information. The propensity of many retroviral integrases to self-associate into high-order aggregates in vitro has been a factor limiting structural endeavors.
Conversely, FV integrases such as that of prototype foamy virus (PFV) were shown to be very amenable to structural biochemistry and were the source of many breakthroughs in the understanding of the molecular basis of retroviral integration and of resistance to strand transfer inhibitors [8–11].

#### **2. Biochemistry of Foamy Virus Integration**

Biochemical studies of retroviral integration started with the purification of preintegration complexes (PIC) from infected cells [12,13]. Such complexes can perform vDNA integration into target DNA in vitro. Analysis of the intermediates produced during these integration reactions uncovered the two activities catalyzed by retroviral integrase: 3′ processing and strand transfer (Figure 1) [3,4]. The resulting integration products generate a single strand gap and a two-nucleotide overhang that will be repaired by cellular proteins to complete the integration reaction.

**Figure 1.** DNA cutting and joining steps catalyzed by retroviral integrases. During 3′ processing (**left**) the integrase removes two (or three) nucleotides from the 3′ ends to expose a conserved terminal CA dinucleotide. The 3′ hydroxyl groups (red OH) will be used in the second step (**right**) to attack the phosphodiester bonds on each target DNA strand.

During 3′ processing, retroviral integrase cleaves two (or, depending on the in vitro conditions, three [14,15]) nucleotides from the 3′ ends of the U3 and U5 vDNA long terminal repeats (LTR). This sequence-specific reaction, a nucleophilic attack by a water molecule, liberates a recessed 3′ hydroxyl group adjacent to an invariant CA dinucleotide [5]. Foamy virus 3′ processing occurs asymmetrically, modifying only the U5 end, as the U3 extremity generated after reverse transcription constitutes a bona fide substrate for integration [16,17]. In contrast, the terminal dinucleotide of U5 is necessary during first-strand synthesis of reverse transcription but has to be cleaved off for integration. During the strand transfer step, the intasome binds host chromosomal DNA, forming the target capture complex (TCC), and utilizes the 3′ hydroxyls as nucleophiles to simultaneously cut and join both 3′ vDNA ends to opposing target DNA strands with a 4–6 bp stagger (4 bp in the case of FV).
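The 3′ processing step described above can be pictured as a simple string operation: the two (or three) nucleotides that follow the invariant CA are removed, leaving a strand that ends in CA with a free 3′ hydroxyl. The sketch below is purely illustrative; the function names and the toy sequences are assumptions and do not correspond to real LTR ends.

```python
from typing import Tuple


def three_prime_process(strand: str) -> Tuple[str, str]:
    """Trim a vDNA strand (written 5'->3') back to its terminal CA.

    Returns (processed strand ending in CA, released nucleotides).
    Only the two cases named in the text are accepted: two or three
    nucleotides following the CA dinucleotide.
    """
    strand = strand.upper()
    for released in (2, 3):
        if strand[-released - 2:-released] == "CA":
            return strand[:-released], strand[-released:]
    raise ValueError(
        "no CA dinucleotide two or three nucleotides from the 3' end")


def foamy_virus_processing(u5_strand: str, u3_strand: str) -> Tuple[str, str]:
    """Asymmetric processing as described for FVs: only the U5 end is cut."""
    processed_u5, _ = three_prime_process(u5_strand)
    return processed_u5, u3_strand.upper()


if __name__ == "__main__":
    # Toy sequences (made up for illustration).
    print(three_prime_process("GGTTACAAT"))
    print(foamy_virus_processing("GGTTACAAT", "TTGGACA"))
```

In this picture the U3 strand is passed through unchanged because, as stated above, it already constitutes a substrate for integration after reverse transcription.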

Recombinant retroviral integrases are very efficient at catalyzing 3′ processing and strand transfer reactions in vitro [18–20]. However, the bulk of strand transfer products obtained are generally unpaired products, also called half-site integration. Recombinant PFV integrase became a standard model to investigate retroviral integration, as it appeared far more proficient at paired full-site integration. PFV integrase is more soluble in vitro than HIV-1 IN, but the exact biochemical reasons underlying these differences are unclear. Interestingly, comparison of in vitro IN enzymatic reaction conditions among FVs, such as substrate specificity, cofactor usage, and target commitment, showed that the feline foamy virus (FFV) IN has a broader range of substrates and cofactors than other FV INs [21]. FFV IN cleaved PFV U5 LTR substrate, as well as FFV U5 LTR substrate, during the in vitro 3′ processing reaction, but not vice versa. The internal six nucleotides in front of the terminal CA dinucleotide are identical between the two substrates, indicating that the FFV IN has low substrate specificity compared with PFV IN. Mn<sup>2+</sup> or Mg<sup>2+</sup> ions are known as essential cofactors of IN enzyme activities, and in vitro IN activities are most efficient in the presence of Mn<sup>2+</sup>. Previous studies have reported that multimerization of HIV-1 IN was promoted by Ca<sup>2+</sup> as well as Mn<sup>2+</sup>, although Ca<sup>2+</sup> could not substitute for it in the strand transfer reaction [22]. Interestingly, Zn<sup>2+</sup> and Ca<sup>2+</sup> divalent cations were found to act in FFV 3′ processing in the absence of Mn<sup>2+</sup> ions, and their stimulation of the enzymatic reaction was concentration-dependent. Moreover, like FFV IN, PFV integrase was shown to be fairly lax for divalent cations and target DNA commitment. Indeed, while HIV-1 integrase was shown to commit to substrate DNA within 1 min, PFV integrase took more than an hour [23]. The same group performed single molecule experiments using PFV intasomes to investigate the mechanics of target DNA capture and catalysis. Using single molecule total internal reflection fluorescence (smTIRF) microscopy, individual PFV intasomes were visualized on naked DNA [24]. Theoretical dynamic modelling showed a 1D rotation-coupled translational diffusion of the PFV intasome along DNA. 1D diffusion is a phenomenon exploited by many proteins to scan for sequences, lesions, or structures on nucleic acids. Remarkably, this target DNA searching process is very often non-productive, as few integration events were recorded, even in the presence of favored PFV integrase sequences. Instead, since the PFV intasome prefers supercoiled DNA as the target substrate [8,24], the authors suggested an additional search for DNA conformation rather than sequence alone. However, the question of the search process on the nucleosomal chromatin template remains to be investigated.

#### **3. Domain Organization of Retroviral Integrase**

All retroviral INs contain three conserved folded domains that were initially identified using limited proteolysis on HIV-1 IN [25]: the N-terminal domain (NTD), the catalytic core domain (CCD), and the C-terminal domain (CTD). In addition, *Spumaretrovirinae* (as well as epsilon- and gammaretroviral) integrases harbor a ~40-residue NTD extension domain (NED) (Figure 2A).

The first structural features of individual domains were obtained using nuclear magnetic resonance (NMR) and X-ray crystallography. The structure of HIV-1 and HIV-2 NTD was determined using NMR and shows 3-helical bundles coordinating a single zinc atom via the side chains of a HisHisCysCys (HHCC) motif [26,27] (Figure 2B). The structure confirms the importance of the zinc as an IN cofactor, and also the location of the conserved His and Cys residues involved in the chelation of metal. The CTD structure was also solved in solution by NMR and revealed a high similarity with Src homology 3 (SH3)-like beta barrel and Tudor domains [28,29] (Figure 2D). The NTD and CTD domains play important roles in substrate recognition and assembly of intasome. They are connected to the CCD via flexible linkers whose size varies among retroviral genera. The CCD contains the active site of the enzyme with the invariant D,D-35-E motif. The crystal structure of HIV-1 IN CCD showed a nucleotidyltransferase fold, which is shared with several prokaryotic and eukaryotic transposases, recombinases, and resolvases [30,31]. The structure revealed a dimer of CCD with an extensive interface. The two active sites are facing outward, opposite to each other, and separated by approximately 35 Å. This distance is incompatible with a functional concerted integration of the two viral ends across a major groove of the target DNA that is around 17 Å in a canonical B-form (Figure 2C). Following this observation and the similarity with the mechanistically related transposases [32–34], it appeared clear that an IN multimer must be involved in vDNA concerted integration. Biochemical analysis of IN from various genera failed to establish a relationship between their oligomeric states in solution and the formation of active complexes once bound to their cognate DNA substrates. The breakthrough came from PFV integrase. 
Monomeric in solution, highly soluble, and exceptionally efficient in catalysis in vitro, this model provided the first functional retroviral IN–DNA complex amenable to structural characterization.

**Figure 2.** Domain organization of retroviral integrases. (**A**) Schematic of the retroviral IN domain sequences shown as boxes. Isolated domain structures of HIV-1 NTD (**B**), (PDB 1WJC), CCD (**C**), (PDB 1ITG), CTD (**D**), (PDB 1IHV). Chains are shown in cartoon, except active site residues Asp64 and Asp116, which are shown as red sticks (Glu152 residues are disordered and not visible in the structure).

#### **4. Architecture of the PFV Intasome**

Determined by X-ray crystallography, the structure of the PFV intasome fundamentally changed the landscape in the field of retroviral integration, as it could both unravel the functional architecture of the integration apparatus and elucidate the mechanism of action of HIV strand transfer inhibitors [9].

The PFV intasome revealed a tetramer of integrases synapsing a pair of vDNA ends. The tetramer consists of a dimer of dimers with two structurally distinct subunits (Figure 3A). The inner subunits mediate all the protein–protein and protein–DNA contacts in an extended conformation and host the active sites to catalyze the 3′ processing and strand transfer reactions. The inner integrases interact via intermolecular NTD–CCD contacts, and by the insertion of a pair of CTDs that rigidly bridge the two halves of the intasome between the CCDs. The outer subunits connect the inner protomers via the canonical CCD–CCD interface. Although the respective positions of the outer NTDs and CTDs are not resolved in the intasome structures published to date, some hints were obtained using SAXS/SANS analysis of the PFV intasome [35]. These domains are dispensable for PFV intasome assembly and in vitro activity [36] but they are suspected to provide additional stabilizing interactions with vDNA and/or cellular cofactors. However, the outer CTDs appear to promote aggregation in vitro, as further experiments using intasomes lacking the outer domains have shown an increased stability and activity on naked DNA. Solving the structure of the PFV intasome reinforced the hypothesis that the tetrameric architecture was the functional multimer of the HIV-1 intasome. Yet, more recently, four additional structures of orthoretroviral intasomes, from α-retroviral Rous sarcoma virus (RSV) [37], β-retroviral mouse mammary tumor virus (MMTV) [38], lentiviral maedi-visna virus (MVV) [39], and lentiviral HIV-1 [40], were reported, revealing a variety of architectures (see [41] for a more detailed review) (Figure 3B). First, RSV and MMTV intasome structures solved by X-ray crystallography and Cryo-EM, respectively, revealed an octameric assembly. A core tetramer (called conserved intasome core, CIC [41]) is positioned similarly as in the PFV intasome, with the conserved inner catalytic protomers flanked by outer monomer subunits. The position of the synaptic CTDs bridging both halves of the intasome is conserved in the octameric structures, but due to the small size of the CCD–CTD linker, they cannot be supplied by the inner protomers and come from the flanking dimers. Indeed, while in PFV IN the CCD–CTD linker is fifty residues long, in α and β retroviral INs it is only eight amino acids long. Interestingly, the size of this linker varies among retroviral genera and may predict the requirement for additional oligomers to support CIC assembly [38].

**Figure 3.** Architecture of PFV and related retroviral intasomes. (**A**) PFV intasome shown in two orthogonal views (PDB 4E7H) with individual domains indicated. Inner IN subunits are colored pale green and light blue, and the outer subunits are in orange. (**B**) Comparison of retroviral intasome structures (RSV PDB: 5EJK, MMTV PDB: 3JCA, MVV PDB: 5M0Q). Complexes are viewed from below the active site. The conserved intasome core, CIC, is colored as in (A), synaptic CTDs are in red, and flanking subunits are in light grey.

In the case of lentiviral (and δ-retroviral) INs, the size of the CCD–CTD linker is around twenty residues. However, it adopts a compact alpha-helical structure, which is predicted to be incompatible with the formation of a minimalist CIC [42].

Fusing HIV-1 IN with the DNA binding domain Sso7d [43] promoted its solubility as well as its in vitro activity [44], allowing the assembly of a complex that could be structurally characterized by Cryo-EM. The structure of the HIV-1-Sso7d intasome revealed a tetramer competent for integration [40]. However, the CCD–CTD linker could not be seen on the electron density map, and assembly of an intasome using the integrase binding domain (IBD) of the HIV-1 IN cofactor lens epithelium-derived growth factor (LEDGF/p75) to stabilize higher-order species revealed a dodecameric structure. In this complex, the core intasome is assembled between two tetramers with a flanking dimer inserting the synaptic CTDs.

The MVV intasome was assembled using wild-type integrase proteins and shows a hexadecameric structure (a tetramer of tetramers). Here again, the catalytic core is formed by the CIC. Overall, both intasome architectures are similar and recapitulate the CIC formation. It has been suggested that the extra fusion domain Sso7d in the HIV-1 intasome, which cannot be seen in the EM density, may disrupt the dimer–dimer interaction in the flanking HIV-1 IN tetramer, and therefore result in a dodecameric structure, while the MVV intasome displays a hexadecamer.

#### **5. Structural Basis for Target DNA Capture**

Co-crystallization of the PFV intasome with its target DNA (tDNA) allowed the visualization of both target capture complex (TCC) and strand transfer complex (STC) before and after the reaction, respectively [10,45]. The tDNA binds along the groove created by the two inner subunits, right below the active site (Figure 4A). The intasome does not undergo significant structural rearrangements to accommodate the tDNA, which is severely bent. This deformation is maximal at the center of the integration site, with the widening of the major groove to 26.3 Å. This separation allows the scissile phosphodiester to fit into the active site for in line nucleophilic attack. Because DNA bendability is in large part dictated by the nature of the dinucleotide step, with pyrimidine–purine (YR) being the most flexible and purine-pyrimidine (RY) being the least, it is then not surprising that PFV integration sites are naturally biased towards more flexible pyrimidine–purine dinucleotide at the central position. As expected, due to the low selectivity of tDNA sequence, the majority of contacts between the intasome and tDNA are mediated through the phosphodiester backbone [10], except CCD residue Ala188 and CTD Arg329, that make base-specific contacts. Ala188 makes van der Waals interaction with cytosine at position 6, whereas Arg329 interacts with guanosine 3, guanosine −1, and thymine −2 through hydrogen bonds (Figure 4A, right). Interestingly, these two residues interact with all the consensus bases flanking the flexible central YR dinucleotide. Consequently, PFV IN Ala188 and Arg329 mutants showed in vitro strand transfer defects, as well as new sequence selectivity. The importance of these contacts has been validated for HIV-1 integrase, as mutating Ser119 (the structural equivalent of PFV IN Ala188) showed altered strand transfer and modified sequence selectivity [46–48].
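The purine/pyrimidine classification of dinucleotide steps used in this argument is easy to make explicit. The following minimal sketch labels a step as YR, RY, RR, or YY and extracts the central step of a target site; the function names and the four-base example site are illustrative assumptions, and the code encodes only the qualitative statement above (YR most flexible, RY least), not any quantitative bendability scale.

```python
PURINES = frozenset("AG")       # R
PYRIMIDINES = frozenset("CT")   # Y


def base_class(base: str) -> str:
    """Return 'R' for a purine and 'Y' for a pyrimidine."""
    base = base.upper()
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    raise ValueError(f"not a DNA base: {base!r}")


def step_class(dinucleotide: str) -> str:
    """Classify a dinucleotide step (read 5'->3') as YR, RY, RR, or YY."""
    if len(dinucleotide) != 2:
        raise ValueError("a dinucleotide step has exactly two bases")
    return base_class(dinucleotide[0]) + base_class(dinucleotide[1])


def central_step(site: str) -> str:
    """Return the class of the central step of an even-length target site."""
    if len(site) < 2 or len(site) % 2:
        raise ValueError("expected an even-length site of at least two bases")
    mid = len(site) // 2
    return step_class(site[mid - 1:mid + 1])


if __name__ == "__main__":
    for step in ("TA", "CG", "AT", "GG"):
        print(step, step_class(step))
    # Hypothetical 4-bp site, matching the 4-bp stagger of FV integration.
    print(central_step("GTAC"))
```

Under this classification, a site whose central step is YR would be expected to be favored over one whose central step is RY, in line with the integration site bias described above.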

In eukaryotes, host target DNA is compacted within chromatin that strongly distorts DNA around nucleosomes. The PFV intasome showed strong integration activity when supplied with purified or recombinant human mononucleosomes [11,49]. Isolation of a stable complex of the PFV intasome and a recombinant mononucleosome permitted the characterization by cryo-electron microscopy (Cryo-EM) of the TCC and a nucleosome core particle at 8 Å resolution [11] (Figure 4B). The crystal structures of the intasome and the nucleosome can be unambiguously docked into the electron density map. The intasome harbors the classical tetramer with the two types of subunits. No additional density is seen compared to the previous intasome crystal structures. The intasome sits on nucleosomal DNA above one of the H2A–H2B dimers and makes an extensive nucleosome–intasome interface involving three IN subunits, both turns of the nucleosomal DNA, and one H2A–H2B dimer. The carboxy-terminal helix of H2B is directly poking toward the intasome and is surrounded by a triad of loops from the inner subunits. Integrase residues Pro135, Pro239, and Thr240 wrap the C-terminal helix of H2B (Figure 4B, left) and the double substitution P135E/T240E strongly affected nucleosome binding and nucleosome strand transfer activity. The histone H2A shows density from its N-terminus reaching out to the inner IN CTD, and deletion of the first twelve H2A residues abolished intasome binding and decreased strand transfer activity into nucleosomes. Further mutagenesis uncovered a role for the intasome outer domains, specifically the outer CTDs, as their deletion reduced the ability to bind nucleosomes. Additional important contacts between the intasome and the nucleosome involve the canonical CCD–CCD interface and the second gyre of nucleosomal DNA (Figure 4B, right). Residues Q137, K159, and K168 are located in the vicinity of the contacts with the second gyre of DNA, and their substitution affected nucleosome binding and integration activity in vitro.

**Figure 4.** Target DNA capture. (**A**) Crystal structure of the target capture complex TCC (PDB: 3OS1) with sequence specific target DNA interactions shown as a blow up. Arg329 making contacts with guanosine 3, −1, and thymine −2, as well as Ala188 making contact with cytosine 6, are shown as red sticks. (**B**) Structure of the PFV intasome–nucleosome complex displayed as pseudoatomic model by docking PFV intasome (PDB 3L2Q) and nucleosome (PDB 1KX5) structures into the Cryo-EM map (EMDB ID 2992). Histones H2A are colored in yellow, H2B in red, H3 in blue, and H4 in green. IN contacts with H2B (**left**) and with the second gyre of nucleosomal DNA (**right**) are shown as zoomed boxes.

Most striking is the path of DNA captured within the tDNA-binding groove of the intasome. When compared to its structure on a native nucleosome, the captured DNA is kinked and lifted from the surface of the histones, perfectly matching the strong bending seen in the PFV intasome capture complex [11]. The multivalent intasome–nucleosome interactions may help to reach the energy state required to deform nucleosomal DNA beyond its ground state, and seem to be the only determinant required as, more recently, Yoder and colleagues demonstrated that histone modifications that unwrap DNA in the vicinity of the intasome integration sites do not impact nucleosome capture [50].

#### **6. Mechanics of PFV Intasome Active Site**

Because IN catalysis requires divalent metal ion cofactors, it has been possible to freeze the PFV enzyme in different ground states before 3′ processing and strand transfer [45] (Figure 5). Both reactive and non-reactive strands of the vDNA are separated via the intrusion of the residues Pro214-Gly218, stacking against the adenine base, leaving three bases unpaired. The scissile dinucleotide phosphodiester backbone makes hydrogen bonds with Tyr212 and Gln186, while the contacts of the adenine and thymidine bases with the IN are limited to van der Waals interactions. The binding of the two Mn<sup>2+</sup> ions in the active site induces a shift of the scissile phosphodiester toward the catalytic triad DDE. Metal ion A is in a near perfect octahedral coordination. It comprises oxygen atoms from Asp128 and Asp185, the pro-Sp oxygen atom of the scissile phosphodiester and three water molecules, one of them positioned for in-line nucleophilic attack on the scissile CA\AT phosphodiester bond. Both oxygen atoms of Glu221 and one from Asp128 coordinate metal B, as well as one water molecule, a bridging oxygen atom of the scissile phosphodiester and a non-bridging pro-Sp oxygen shared with metal A. This non-ideal environment for metal B may aid scissile phosphodiester bond destabilization during catalysis. Before 3′ processing, the distance between the two metal ions is 3.9 Å, and changes to 3.1 Å after dissociation of the dinucleotide. This metal ion movement has also been described in the RNase H active site and was suggested to allow the nucleophilic water to approach the scissile phosphodiester [51]. In the active site, the metal cofactors move further apart from each other (from 3.1 Å to 3.8 Å) upon target DNA capture. The roles of both metal ions change between 3′ processing and strand transfer. Metal A and metal B coordination with active site residues stays unchanged, as well as the sharing of the pro-Sp oxygen atom from the target phosphodiester. Accordingly, metal A destabilizes the target phosphodiester scissile bond by interacting with the 3′-bridging oxygen atom while metal B activates and positions the 3′ OH of the vDNA for nucleophilic attack. After strand transfer catalysis, both metal ions move closer to approximately 3.2 Å.

**Figure 5.** Mechanics of PFV 3′ processing and strand transfer. Top panel, a close up of PFV intasome active site during 3′ processing. Superimposition of intasome structures before 3′ processing with and without bound manganese Mn<sup>2+</sup> (grey spheres A and B) (**left**) and after cleavage (**right**). Relocation of the scissile phosphodiester upon metal binding is shown with a red arrow. The red spheres illustrate the water molecules, and the nucleophile water molecule is shown as a big red sphere. Bottom panel, strand transfer activity upon target DNA binding. The nucleophilic attack is shown with red dashes.

Overlaying the TCC and the STC structures shows that the overall DNA conformations do not change, except for the position of the phosphodiester linking the tDNA to vDNA, which is shifted away from the active site. The integrase applies a significant torsional stress to the tDNA, likely providing the displacement force, which is relieved upon cutting of the target phosphodiester bond. This ejection prevents any reverse reaction that would lead to unfruitful viral infection. A soaking experiment with metal cofactor showed an apparent loss of metal B binding affinity after strand transfer, probably due to the ejection of the DNA from the active site. Interestingly, such a tDNA kink within the active site is important for the activity of other transpososomes like Hermes [34,52], MuA [53], Tn10 [54], and IS231A [55]. This could be an evolutionarily conserved feature of the DNA transposition apparatus in order to prevent any reversal reaction, while being competent to access the tDNA scissile phosphodiester.

#### **7. PFV Intasome and HIV-1 Strand Transfer Inhibitors**

Human immunodeficiency virus type 1 (HIV-1) IN has been widely considered as an important target protein for novel anti-acquired immune deficiency syndrome (AIDS) drugs [56]. Based on biochemical assays and biophysical analysis, several classes of retroviral IN inhibitors have been discovered over the last 25 years [57–60]. Hydroxylated natural products and their derivatives were developed, and the most important IN inhibitor family, diketo acids (DKA), emerged [59]. Integrase strand transfer inhibitors (INSTIs) are active site inhibitors of HIV-1 integration that act by preventing the strand transfer reaction, and numerous significant developments and rational designs of INSTIs have been reported in recent years. Raltegravir (RAL) was the first INSTI approved by the United States Food and Drug Administration (FDA) in 2007 [61], providing a new option for highly active antiretroviral therapy (HAART). After that, elvitegravir (EVG) and dolutegravir (DTG) were approved [62,63] (Figure 6A). RAL, EVG, and DTG are bioisosteres of DKA. DKA derivatives, which contain a 1,3-dicarbonyl aromatic ring, are a class of highly effective HIV-1 INSTIs where the 1,3-dicarbonyl group chelates two Mg<sup>2+</sup> ions, preventing the metal ion-mediated retroviral integration [64–66]. More recently, two new molecules, bictegravir (BIC) and cabotegravir (CAB), have been developed [67,68]. Bictegravir was approved by the FDA in early 2018 and is being used as a combination drug. Cabotegravir is currently in phase III development. BIC and CAB are structurally similar to DTG with their tri-cyclic central pharmacophores (Figure 6A), but the latter offers an improved half-life [69].

Despite an increasing drug arsenal, experimental data on full-length, wild type HIV-1 intasome structures are rare. As an alternative, the PFV intasome has been adopted for anti-AIDS drug development. A comparison of the CCD structures of HIV-1 and PFV IN showed that key structural features are conserved in both, such as the host cellular factor binding faces and the organization of the active site [8,9,30]. A recent NMR study using the CCD of HIV-1 IN showed that the HIV-1 and PFV IN flexible loops (residues 140–149 in HIV and 209–218 in PFV) are highly similar, and structure prediction of the HIV integrase intasome provided further evidence for the similarities between the active site amino acid residues of the PFV and HIV INs [70,71]. Johnson et al. generated a corresponding HIV-1 IN model from the PFV IN crystal structure and predicted in vitro INSTI activities using molecular docking and molecular dynamics simulation [72]. Despite the limited sequence similarity and the different intasome architectures (lentiviral: tetramer-of-tetramer, PFV: dimer-of-dimer), PFV IN was highly sensitive to HIV INSTIs [8,73], suggesting that INSTIs target the most conserved regions of IN-DNA complexes. Recently, many studies using PFV IN as a surrogate model to investigate HIV-1 INSTIs have been published. Some groups investigated the consistency between in vitro and in vivo resistance profiles for RAL using PFV IN structures mutated at the positions corresponding to the HIV-1 IN active site residues Q148 and N155 [74,75]. Hare et al. confirmed that the INSTI pharmacophores interact with the two metal ions at the IN active site of the PFV intasome, and they also investigated an interaction between the bound vDNA end and the benzyl group of the inhibitors through cocrystal structures with RAL and EVG [9]. They also obtained a crystal structure of PFV IN complexed with vDNA and DTG [76] (Figure 6B) (see [1] for more details on the structural basis for INSTIs). Johnson et al. designed a series of INSTIs based on the previous target model through homology modeling and structural superposition. In their modeling system, the junction between the CCD and CTD adopts a helix-loop-helix motif, which is similar to the corresponding segment of PFV IN [77].
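The loop boundaries quoted above (residues 140–149 in HIV-1 IN and 209–218 in PFV IN) can be turned into a small numbering helper. The following is only a minimal sketch: it assumes a gap-free, position-by-position correspondence between the two ten-residue loops, which is a simplification and not a substitute for a structure-based alignment. The function name `hiv_to_pfv_loop_position` is hypothetical and introduced here for illustration only.

```python
from typing import Optional

# Inclusive loop boundaries as quoted in the text.
HIV_LOOP = (140, 149)  # HIV-1 IN flexible loop
PFV_LOOP = (209, 218)  # PFV IN flexible loop


def hiv_to_pfv_loop_position(hiv_residue: int) -> Optional[int]:
    """Map an HIV-1 IN flexible-loop residue number to PFV IN numbering.

    ASSUMPTION: the two loops correspond position by position with no gaps.
    Residues outside the quoted loop range are not mapped (returns None),
    because the text gives no numbering relationship for them.
    """
    start, end = HIV_LOOP
    if not start <= hiv_residue <= end:
        return None
    return PFV_LOOP[0] + (hiv_residue - start)


if __name__ == "__main__":
    for residue in (140, 148, 149, 155):
        print(residue, "->", hiv_to_pfv_loop_position(residue))
```

The helper deliberately refuses to extrapolate beyond the loop, since a constant offset is not guaranteed elsewhere in the two sequences.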

**Figure 6.** Integrase strand transfer inhibitors. (**A**) Chemical structures of INSTIs. (**B**) Structure of the PFV active site with or without DTG. By positioning itself in the active site, the INSTI engages the metal cofactors and shifts the reactive 3′ hydroxyl (red circle) into a position incompatible with strand transfer.

Hu et al. investigated the inhibitory mechanism of RAL and the recognition of DKA inhibitors by PFV IN via molecular dynamics and molecular docking methods, and they validated the HIV-1 inhibitor screening platform [73,78]. Du et al. proposed the crystal structure of the PFV IN-DNA complex as a potential HIV-1 INSTI screening platform through a structural biology information survey [79]. They also investigated the molecular recognition system of PFV IN using six naphthyridine derivative inhibitors through molecular docking, molecular dynamics simulations, and water-mediated interaction analyses. In addition, many studies have used the PFV intasome to explore the binding modes of candidate HIV IN inhibitors. These results have implications for the rational design of INSTIs that specifically target HIV-1 IN with improved affinity and selectivity [80,81].

Some studies have raised doubts about HIV-1 IN inhibitor screening platforms based on PFV IN, indicating that the HIV-1 IN system behaves differently from PFV in terms of folding, recognition, hydrophobicity of the tDNA binding site, and stability [82]. Although the conformational changes and the energy landscape are still unclear, the molecular docking and molecular dynamics studies validate the reliability of the platform and reestablish PFV IN as one of the most credible surrogate models for HIV-1 INSTI studies and for anti-AIDS drug development based on IN structure. Nevertheless, thanks to advances in cryo-EM, future high-resolution structures of primate lentiviral integrases will be of great interest to further improve the structural basis of INSTI mechanisms and development.

#### **8. Conclusions and Perspectives**

As an important therapeutic target and molecular tool, retroviral integrase is receiving a lot of attention from the scientific community. Intensive biochemical studies gave important insights into the functional architecture of the viral enzyme and, little by little, the structural counterpart emerged: from individual domains to active intasomes bound to a nucleosome. The publication in 2010 of the first retroviral intasome structure, from PFV, was the starting point of a decade-long period of exciting and insightful research on the integration process. The recent revolution in single particle cryo-electron microscopy significantly increased the repertoire of available retroviral intasome structures, which highlight both conservation and diversity in their architectures. Conservation, because the presence in all retroviral intasomes of a PFV-like CIC hosting the catalytic subunits is quite striking; diversity, because of the variety of oligomers needed for the whole assembly. It will be of great interest to expand the catalogue of known intasome structures to the remaining retroviral genera, but also to further investigate new structures derived from wild type primate lentiviral integrases to better understand HIV-1 strand transfer inhibitors.

Many open questions will surely keep the fire of retroviral integration research alive, notably the precise chronology of intasome assembly during infection. Indeed, the HIV-1 virion packages around 250 molecules of integrase, which is far more than the recent structures of lentiviral intasomes would require. Also, although the structure of the PFV intasome bound to a nucleosome afforded important information on chromatin capture by retroviral intasomes, the requirement for histones might differ from genus to genus [83,84], highlighting the need for additional structures of intasomes bound to nucleosomes. Additionally, early chromatinisation of retroviral pre-integration complexes has emerged as a feature of two retroviral genera [85,86]. Future studies will be required to determine its functional importance and its conservation among integrative mobile elements and, notably, foamy viruses.
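The excess of packaged integrase mentioned above can be made concrete with simple arithmetic. The sketch below assumes that the architecture labels used in this review translate directly into protomer counts (tetramer-of-tetramer = 16 IN protomers, dimer-of-dimer = 4); actual intasome stoichiometries may vary, and the names used in the code are hypothetical.

```python
# Approximate number of IN molecules packaged per HIV-1 virion (from the text).
IN_PER_VIRION = 250

# ASSUMPTION: protomer counts read directly off the architecture labels.
PROTOMERS_PER_INTASOME = {
    "lentiviral (tetramer-of-tetramer)": 4 * 4,
    "PFV-like (dimer-of-dimer)": 2 * 2,
}


def intasome_equivalents(in_copies: int, protomers: int) -> tuple:
    """Return (complete intasomes that could be built, leftover IN protomers)."""
    return divmod(in_copies, protomers)


if __name__ == "__main__":
    for label, protomers in PROTOMERS_PER_INTASOME.items():
        full, rest = intasome_equivalents(IN_PER_VIRION, protomers)
        print(f"{label}: {full} intasome equivalents, {rest} protomers left over")
```

Under these assumptions, a single virion carries enough integrase for many more than the one intasome needed for integration, which is the discrepancy raised in the text.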

**Funding:** This work was supported by a grant from the National Research Foundation of Korea (NRF) funded by the Korean government (NRF-2018R1D1A1A09081872) to Cha-Gyun Shin.

**Acknowledgments:** We thank Dmitry Lyumkis for providing coordinates of the higher-order HIV-1-IN-Sso7d intasome structure.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

