**Binding of CCCTC-Binding Factor (CTCF) to the Minute Virus of Mice Genome Is Important for Proper Processing of Viral P4-Generated Pre-mRNAs**

## **Maria Boftsi <sup>1</sup>, Kinjal Majumder <sup>2</sup>, Lisa R. Burger <sup>2</sup> and David J. Pintel <sup>2,\*</sup>**


Academic Editor: Giorgio Gallinella

Received: 23 October 2020; Accepted: 25 November 2020; Published: 30 November 2020

**Abstract:** Specific chromatin immunoprecipitation of salt-fractionated infected cell extracts has demonstrated that the CCCTC-binding factor (CTCF), a highly conserved, 11-zinc-finger DNA-binding protein with known roles in cellular and viral genome organization and gene expression, specifically binds the genome of Minute Virus of Mice (MVM). Mutations that diminish binding of CTCF to MVM affect processing of the P4-generated pre-mRNAs. These RNAs are spliced less efficiently to generate the R1 mRNA, and definition of the NS2-specific exon upstream of the small intron is reduced, leading to relatively less R2 and the generation of a novel exon-skipped product. These results suggest a model in which CTCF is required for proper engagement of the spliceosome at the MVM small intron and for the first steps of processing of the P4-generated pre-mRNA.

**Keywords:** parvovirus; minute virus of mice; RNA processing; gene expression

## **1. Introduction**

Parvoviruses are small (20 nm) non-enveloped icosahedral viruses that infect and cause disease in many vertebrate hosts. They are unique among all known animal viruses in that they contain ~5 kb single-stranded linear DNA genomes, with inverted terminal repeats at both ends, which form hairpin structures and serve as origins of replication [1].

The viral genome is organized into two overlapping transcription units, producing three major transcript classes, R1, R2, and R3 [2]. Transcripts R1 (4.8 kb) and R2 (3.3 kb), generated from a promoter (P4) near the left-hand end of the genome, encode the two non-structural proteins NS1 (83 kDa) and NS2 (24 kDa), respectively [3]. Transcripts R3 (3.0 kb), generated from the P38 promoter, encode the two viral capsid proteins, VP1 (83 kDa) and VP2 (64 kDa), using the open reading frame (ORF) in the right half of the genome [4,5]. A small, overlapping intron, common to both P4- and P38-generated transcripts, uses two donors, D1 (nt 2280) and D2 (nt 2317), and two acceptors, A1 (nt 2377) and A2 (nt 2399), which are used alternatively to produce nine different spliced mRNA species [2,6–8]. A large upstream intron, present in the pre-mRNAs generated from the P4 promoter, is either retained (R1 transcript class) or excised (R2 transcript class) in mature mRNAs [9].

Upon initiation of replication, MVM forms distinct nuclear foci termed autonomous parvovirus-associated replication (APAR) bodies, where active transcription of viral genes and viral replication take place [10]. The viral replicator protein NS1 co-localizes with the replicating viral genome in APAR bodies, where DNA damage response (DDR) sensor and response proteins, host replication factors, and cell cycle regulators also reside [10–12].

Using V3C (Viral Chromosome Conformation), a novel adaptation of the high-throughput chromosome conformation capture assay, the sites on the cellular genome where MVM localizes for replication have been mapped. These cellular regions preferentially accrue DNA damage in uninfected as well as MVM-infected cells and are also constituent parts of chromosomal substructures called Topologically Associating Domains (TADs) [13,14]. TADs are large, megabase-sized genomic regions that are defined by preferential interactions within them and are thus relatively insulated from neighboring regions [15,16]. The boundaries of TADs are enriched for binding sites of the CCCTC-binding factor (CTCF), a highly conserved, 11-zinc-finger DNA-binding protein, which, along with cohesin, plays a key role in the formation and maintenance of topological domains [15,17,18]. In addition to its role in genome organization, CTCF regulates key aspects of gene expression, including transcriptional activation/repression and enhancer/promoter insulation, by facilitating long-range chromatin interactions via looping [19,20]. Beyond its DNA-binding activity, CTCF has been reported to bind RNA, and CTCF-RNA interactions can participate in CTCF-mediated chromatin loop formation and subsequent regulation of gene expression [21,22]. Emerging evidence suggests that CTCF also regulates gene expression at the level of mRNA splicing. More specifically, CTCF has been shown to promote inclusion of weak upstream exons in the mRNA of the CD45 gene by mediating local RNA polymerase II pausing [23]. Moreover, a more recent study showed that CTCF-mediated intragenic chromatin looping facilitates inclusion of exons in spliced mRNA by bringing exons into physical proximity, providing a functional link between chromatin organization and regulation of splicing [24].

It has become clear that a number of viruses, including Kaposi's sarcoma-associated herpesvirus (KSHV), Epstein-Barr virus (EBV), and human cytomegalovirus (HCMV), use CTCF to control viral gene expression [25–27]. CTCF was shown to associate with several regions within the KSHV genome, and the CTCF-cohesin protein complex regulates cell cycle control of viral gene expression during latency [25]. A later study also showed that CTCF and cohesin play important roles in regulating KSHV reactivation from latency by modulating viral gene transcription [28]. CTCF binding to the EBV genome was shown to negatively affect transcription [26], and in the case of HCMV, binding of CTCF to the first intron of the Major Immediate Early (MIE) gene repressed MIE gene expression [27]. In addition, CTCF recruitment to the small DNA genome of human papillomavirus (HPV) was shown to regulate viral gene expression and transcript processing [29].

In this report, we show that CTCF can play an important role in parvovirus gene expression. Mutations that diminish binding of CTCF to the MVM genome affect processing of P4-generated pre-mRNAs: R1 is spliced less efficiently, and definition of the NS2-specific exon upstream of the small intron is reduced, leading to relatively less R2 and the generation of a novel exon-skipped product. These results point to a requirement for CTCF in engagement of the spliceosome at the MVM small intron and in the first steps of processing of the P4-generated pre-mRNA.

## **2. Materials and Methods**

## *2.1. Cell Lines and Viruses*

Murine A9 and human NB324K cells were maintained in Dulbecco's modified Eagle's medium (DMEM) supplemented with 5% fetal bovine serum (FBS) and incubated at 37 °C with 5% CO<sub>2</sub>. Wild-type MVMp for infections was produced in A9 cells as previously described [11].

## *2.2. Transfections and Viral Infections*

For transfections, cells were grown on 60-mm tissue culture dishes until they reached ~80% confluency. Cells were transfected with plasmids using LipoD293 transfection reagent (SignaGen Laboratories, Baltimore, MD, USA) according to the manufacturer's instructions. For RNA isolation, cells were co-transfected with wild-type or mutant plasmids (2 μg) and the 3xFLAG-eGFP (p3xFeGFP) expression vector (0.5 μg) and harvested at 48 h post transfection (hpt). For chromatin immunoprecipitation (ChIP) assays, cells were transfected with 2 μg of wild-type or mutant plasmids and harvested at 20 hpt. Viral infections were carried out at a multiplicity of infection (MOI) of 10 unless otherwise stated, and infected cells were harvested at the indicated timepoints.

## *2.3. Cell Synchronization*

For infection experiments, A9 cells were parasynchronized in G0 phase by isoleucine deprivation for 36–42 h prior to infection as previously described [11]. Following synchronization, cells were released into complete media containing 5% FBS and infected with MVMp.

## *2.4. Plasmids*

The infectious plasmid clone of MVM (pWT), which expresses the full-length viral genome, was previously described [30]. gBlocks Gene Fragments of MVM, gbNSc and gbVPc, containing the mutated CTCF binding sites in the NS and VP regions, respectively, were synthesized by Integrated DNA Technologies (IDT, Coralville, IA, USA). The pNSc plasmid was constructed by replacing the XcmI-BsrGI fragment of pWT (nt 644–1253) with the gbNSc gene block, so that it expresses MVMp with a mutated CTCF binding site in the NS region. Similarly, the pVPc plasmid was constructed by replacing the PmlI-XbaI fragment of pWT (nt 3636–4347) with the gbVPc gene block, so that it expresses MVMp with a mutated CTCF binding site in the VP region. To make the double CTCF-binding site mutant plasmid, pDc, in which both CTCF binding sites were mutated, the PmlI-XbaI fragment in pNSc was replaced with the gbVPc gene block. To generate the marker rescue of the pNSc plasmid, pNScMR, the XcmI-BsrGI fragment in pNSc was replaced with the XcmI-BsrGI fragment from pWT. The marker rescue of the pDc plasmid, pDcMR, was constructed by replacing both the XcmI-BsrGI and PmlI-XbaI fragments in pDc with the corresponding fragments from pWT. The p4Tppt plasmid, with an improved polypyrimidine tract at the 3′ splice site of the large intron, was previously described [31]. To generate the double CTCF-binding site mutant construct with the improved polypyrimidine tract (pDc4Tppt), the BsrGI-XhoI fragment (nt 1248–2075) from p4Tppt was cloned into pDc between the BsrGI and XhoI sites. The pD3 plasmid was constructed by replacing both the XcmI-BsrGI and PmlI-XbaI fragments in pWT with the gene blocks gbNS3 and gbVP3, respectively.
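The cloning scheme above moves two or three restriction fragments between constructs. As a bookkeeping aid, the sketch below records, for each construct, which fragment carries non-wild-type sequence and where that sequence came from. The dictionary layout and the helper `mutant_fragments` are hypothetical conveniences; the fragment coordinates are the ones quoted in this section.

```python
# Hypothetical bookkeeping for the construct scheme in Section 2.4.
# Coordinates (inclusive MVM nt) are those quoted in the text.
FRAGMENTS = {
    "XcmI-BsrGI": (644, 1253),   # carries the NS-region CTCF site
    "BsrGI-XhoI": (1248, 2075),  # carries the improved large-intron polypyrimidine tract
    "PmlI-XbaI": (3636, 4347),   # carries the VP-region CTCF site
}

# For each construct: fragment -> source of its non-wild-type sequence.
# Fragments not listed are assumed to be pWT sequence.
CONSTRUCTS = {
    "pWT": {},
    "pNSc": {"XcmI-BsrGI": "gbNSc"},
    "pVPc": {"PmlI-XbaI": "gbVPc"},
    "pDc": {"XcmI-BsrGI": "gbNSc", "PmlI-XbaI": "gbVPc"},
    "pNScMR": {},  # XcmI-BsrGI of pNSc restored from pWT
    "pDcMR": {},   # both swapped fragments of pDc restored from pWT
    "pDc4Tppt": {"XcmI-BsrGI": "gbNSc", "BsrGI-XhoI": "p4Tppt", "PmlI-XbaI": "gbVPc"},
    "pD3": {"XcmI-BsrGI": "gbNS3", "PmlI-XbaI": "gbVP3"},
}


def mutant_fragments(construct):
    """Return the sorted names of fragments that differ from pWT in `construct`."""
    return sorted(CONSTRUCTS[construct])


for name, swaps in CONSTRUCTS.items():
    parts = [
        f"{frag} (nt {FRAGMENTS[frag][0]}-{FRAGMENTS[frag][1]}) from {src}"
        for frag, src in swaps.items()
    ]
    print(f"{name:9s} {'; '.join(parts) if parts else 'all pWT sequence'}")
```

In this representation the marker-rescued constructs (pNScMR, pDcMR) are indistinguishable from pWT, which is what the rescue is meant to demonstrate.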

## *2.5. Extraction of MVMp Nucleoprotein Complexes*

MVMp nucleoprotein complexes were isolated from infected cells as previously described, with modifications [32]. At the indicated timepoints, cells were washed with phosphate-buffered saline (PBS), harvested into HBE buffer (10 mM HEPES, 5 mM KCl, 1 mM EDTA), and collected by centrifugation at 1000× *g* for 3 min. Cell pellets were resuspended in 500 μL HBE buffer and lysed on ice for 10 min by addition of 1% NP-40 (to a final concentration of 0.1%). To pellet the nuclei, the lysate was centrifuged for 5 min at 1000× *g*. The supernatant (cytoplasmic extract) was transferred to a clean tube, and the nuclei were resuspended in 500 μL HBE buffer. Sodium chloride (NaCl) was added to the suspension to a final concentration of 100, 200, or 400 mM, followed by incubation on ice for 2 h. The remaining chromatin (chromatin pellet) was pelleted at 10,000× *g* for 10 min, and the supernatant, containing the MVMp nucleoprotein complexes, was retained (salt-wash extract).
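The lysis and salt-extraction steps are specified as final concentrations. Reading the "1% NP-40" as the stock concentration (an interpretation) and assuming a hypothetical 5 M NaCl stock, the volume v of stock to add to a starting volume V0 follows from c_stock × v = c_final × (V0 + v). This is a minimal sketch; it treats volumes as additive and ignores the volume of the cell or nuclear pellet.

```python
def stock_volume_to_add(v_start, c_stock, c_final):
    """Volume of stock to add to v_start so the mixture reaches c_final.

    Solves c_stock * v = c_final * (v_start + v); assumes additive volumes and
    a starting solution that contains none of the solute. The returned volume
    has the units of v_start; the two concentrations just need matching units.
    """
    if not 0 <= c_final < c_stock:
        raise ValueError("c_final must be non-negative and below c_stock")
    return c_final * v_start / (c_stock - c_final)


# NP-40: 1% stock (assumed) into ~500 uL of resuspended cells, target 0.1%
print(round(stock_volume_to_add(500, 1.0, 0.1), 1))  # 55.6

# NaCl from a hypothetical 5 M (5000 mM) stock into 500 uL of resuspended nuclei
for target_mm in (100, 200, 400):
    print(target_mm, "mM:", round(stock_volume_to_add(500, 5000, target_mm), 1), "uL")
```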

## *2.6. Total RNA Isolation*

Total RNA was extracted from transfected or infected cells as previously described with minor modifications [33]. Briefly, for total RNA isolation, cells were lysed in TRIzol reagent (Invitrogen, Carlsbad, CA, USA), and RNA was prepared according to the manufacturer's protocol.

## *2.7. RNase Protection Assay (RPA)*

Total RNA was extracted from transfected or infected cells using TRIzol reagent (Invitrogen) according to the manufacturer's protocol, and RNase protection assays were performed on 25 μg of RNA as previously described [34]. The probes used for the RPAs were α-<sup>32</sup>P-UTP-labeled, SP6-generated antisense RNAs. The MVM HaeIII probe, which extends from upstream of the large-intron acceptor site (nt 1852) into the small intron (nt 2378), was used to analyze all MVM pre-mRNAs generated during wild-type MVMp infection. The HaeIII fragment (nt 1852–2378), cloned into the pGEM-3Z vector between the XbaI and SphI restriction sites, was used as the template for preparation of the HaeIII probe. An appropriate homologous probe (HaeIII 4Tppt) was used to analyze the RNA species generated from the Dc4Tppt mutant. The MVM P4 probe (spanning nt 201–652) was used to analyze the P4-generated RNA products. The MVM 201–652 fragment was cloned into the pGEM-3Z vector between the BamHI and HindIII restriction sites and used as the template for synthesis of the P4 probe.
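Because the HaeIII probe spans the small intron, each RNA isoform protects a characteristic set of probe fragments: only the parts of the probe that pair with retained exonic sequence survive RNase digestion. The sketch below computes those fragment lengths by intersecting the probe interval with an isoform's exon blocks. It assumes that a donor coordinate is the last exonic nucleotide and an acceptor coordinate the first exonic nucleotide, uses 1 and 5000 as placeholder transcript ends, and omits R2, whose large-intron acceptor coordinate is not given in this section. Actual band sizes should be read from the gels.

```python
HAEIII_PROBE = (1852, 2378)  # inclusive MVM coordinates from Section 2.7


def protected_fragments(probe, exons):
    """Lengths of probe segments protected by an RNA made of (start, end) exon blocks.

    Each exon that overlaps the probe protects one separate fragment; probe
    regions facing removed intron sequence are single-stranded and digested.
    """
    lo, hi = probe
    fragments = []
    for start, end in exons:
        overlap = min(hi, end) - max(lo, start) + 1
        if overlap > 0:
            fragments.append(overlap)
    return fragments


# Hypothetical exon layouts (donor = last exonic nt, acceptor = first exonic nt, assumed);
# 1 and 5000 are placeholder transcript ends, only the overlap with the probe matters.
print(protected_fragments(HAEIII_PROBE, [(1, 5000)]))                # [527]  unspliced
print(protected_fragments(HAEIII_PROBE, [(1, 2280), (2377, 5000)]))  # [429, 2]  D1-A1
print(protected_fragments(HAEIII_PROBE, [(1, 2317), (2399, 5000)]))  # [466]  D2-A2
```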

To make the 3xFeGFP antisense RNA probe, the 3xFeGFP fragment with an SP6 promoter sequence at the 3′ end was amplified from the 3xFeGFP expression vector (p3xFeGFP) by PCR with primers 5′-ATC ATG CGG CCG CCG TCA GAA TTA ACC ATG GAC TAC AAA GAC-3′ and 5′-CTA TAT TTA GGT GAC ACT ATA GTT AAT TTT ATT AGG ACA AGG CTG GTG-3′.
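The SP6 promoter that makes this template antisense is carried in the 5′ tail of the second (reverse) primer, so it ends up downstream of the eGFP sense sequence in the PCR product and SP6 transcription runs back through eGFP on the antisense strand. A quick string check locates it, assuming the commonly cited SP6 promoter sequence 5′-ATTTAGGTGACACTATAG-3′ (not stated in the paper):

```python
# Commonly cited SP6 promoter sequence (an assumption, not taken from the paper)
SP6_PROMOTER = "ATTTAGGTGACACTATAG"

# Primers as printed in Section 2.7 (5'->3'; spaces are display grouping only)
FWD = "ATC ATG CGG CCG CCG TCA GAA TTA ACC ATG GAC TAC AAA GAC"
REV = "CTA TAT TTA GGT GAC ACT ATA GTT AAT TTT ATT AGG ACA AGG CTG GTG"


def clean(primer):
    """Strip the display spacing and normalize case."""
    return primer.replace(" ", "").upper()


rev = clean(REV)
pos = rev.find(SP6_PROMOTER)
print(pos)                            # 4: promoter sits in the reverse primer's 5' tail
print(clean(FWD).find(SP6_PROMOTER))  # -1: not present in the forward primer
print(rev[pos + len(SP6_PROMOTER):])  # remaining 3' portion of the reverse primer
```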

## *2.8. Northern Blotting*

For Northern blot analysis, 10 μg of total RNA, prepared as described above, was resolved on a formaldehyde-containing 1.4% agarose gel at 35 mA for 24 h. After staining with ethidium bromide for 30 min, the gel was washed in DEPC-treated water for 4 h and transferred to a nitrocellulose membrane overnight. Blots were baked for 2 h at 80 °C and hybridized with randomly primed radiolabeled MVM probes. A HaeIII probe (nt 1852–2378) was used to detect all full-length viral mRNAs, and a whole-genome probe (Bam) was used to specifically detect the exon-skipped product generated from the double CTCF-binding site mutant construct.

## *2.9. Chromatin Immunoprecipitation (ChIP) Assay in Whole Cell Lysates*

ChIP assays were conducted on parasynchronized murine A9 cells infected with MVMp at an MOI of 10, or on human NB324K cells transfected with the wild-type or the CTCF-binding site mutant constructs, as described previously [13]. Briefly, cells were cross-linked by addition of 1% formaldehyde directly to the culture medium and incubated with shaking at room temperature for 10 min. The reaction was quenched with 0.125 M glycine for 5 min, and cells were collected and lysed for 20 min on ice in ChIP lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl pH 8.0, protease inhibitors). Cell lysates were sonicated with a Diagenode Bioruptor for 75 cycles (30 s on and 30 s off), and debris was pelleted by centrifugation (8000× *g*, 15 min, 4 °C). The supernatant was then added to the indicated antibody-bound Protein A Dynabeads (Invitrogen), and samples were incubated overnight with rotation at 4 °C in ChIP dilution buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl pH 8.0, 167 mM NaCl). The next day, the following washes were performed (3 min each at 4 °C with rotation): once in low-salt wash (0.01% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8.0, 150 mM NaCl), once in high-salt wash (0.01% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8.0, 500 mM NaCl), once in lithium chloride (LiCl) wash (0.25 M LiCl, 1% NP-40, 1% DOC, 1 mM EDTA, 10 mM Tris-HCl pH 8.0), and twice in TE buffer, followed by elution in SDS elution buffer (1% SDS, 0.1 M sodium bicarbonate). The DNA-antibody complexes and input DNA were reverse cross-linked overnight at 65 °C in the presence of NaCl and proteinase K. The DNA was purified using a PCR purification kit (Qiagen, Baltimore, MD, USA) and analyzed by quantitative PCR (qPCR) with iTaq universal SYBR green master mix (Bio-Rad) and primers 5′-CGC CTT CGG ACG TCA CAC GTC-3′ (MVM nt 60–80) and 5′-CCA GCC ATG GTT AGT TGG TTA C-3′ (MVM nt 268–247). Data are presented either as percent input, calculated as described previously [35], or relative to IgG.
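ChIP signal in this section is reported as percent input (by the method of ref. [35]) or relative to IgG. Since that formula is not restated here, the sketch below uses one widely used ΔCt formulation as an assumption: it takes roughly 100% amplification efficiency (one doubling per cycle) and an input sample representing a known fraction of the chromatin used per IP. The function names and Ct values are hypothetical and illustrative.

```python
def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input DNA recovered by an IP (assumed ΔCt method).

    input_fraction: fraction of the IP chromatin set aside as input,
    e.g. 0.01 for a 1% input. Assumes one doubling per PCR cycle.
    """
    return 100 * input_fraction * 2 ** (ct_input - ct_ip)


def fold_over_igg(ct_specific, ct_igg):
    """Enrichment of a specific antibody over the IgG control on the same chromatin."""
    return 2 ** (ct_igg - ct_specific)


# Illustrative Ct values only, not data from this study
print(percent_input(ct_ip=28.0, ct_input=25.0, input_fraction=0.01))  # 0.125
print(fold_over_igg(ct_specific=25.0, ct_igg=30.0))                   # 32.0
```

If the primer pair amplifies with lower efficiency, the base 2 would be replaced by the measured amplification factor per cycle.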

## *2.10. Chromatin Immunoprecipitation (ChIP) Assay on Viral Nucleoprotein Complexes*

Following extraction of MVMp nucleoprotein complexes as described above, the salt-wash extract was cross-linked with 0.1% formaldehyde for 5 min at room temperature, and the reaction was quenched with 0.125 M glycine. The sample was then loaded onto an Amicon Ultra-0.5 Centrifugal Filter Device placed in a filtrate collection tube and centrifuged for 30 s at 10,000× *g* to remove salt. PBS was added to the remaining sample, which was centrifuged again for 30 s at 10,000× *g* to exchange the buffer and concentrate the sample. The purified sample was recovered from the Amicon filter by reverse spin (1000× *g*, 2 min). The viral genome-protein complexes were incubated with the indicated antibodies bound to Dynabeads Protein A (Invitrogen), and the ChIP assay was performed as described above.

## *2.11. Immunoblot Analysis*

Infected cells were harvested at the indicated timepoints, lysed in 1× dye (25 mM Tris pH 7.5, 2% SDS, 2 mM EDTA, 6% glycerol, 20 mM DTT, bromophenol blue), and sheared using a 25 G × 5/8-inch, 1-mL needle-syringe (BD Biosciences, San Jose, CA, USA). The whole-cell lysates were boiled for 10 min in a 100 °C heat block, and equal volumes of samples were loaded per well for Western blot analysis. For Western blot analysis of the salt-wash extracts, 1× dye was added directly to the samples, which were then processed as described above. The chromatin pellet, prepared during the salt-wash extraction procedure, was resuspended in 1× dye, sheared, and processed as described above.

## *2.12. Southern Blot Analysis*

Infected cells were harvested at the indicated timepoints, pelleted, and resuspended in Southern lysis buffer (2% SDS, 150 mM NaCl, 10 mM Tris pH 8.0, 1 mM EDTA). Cells were treated with proteinase K for 2 h at 37 °C and sheared using a 25 G × 5/8-inch, 1-mL needle-syringe (BD Biosciences). Total DNA content in the samples was quantified using a NanoDrop spectrophotometer, and equal amounts of DNA were loaded per well and electrophoresed on a 1% agarose gel for 16 h at 35 V. Samples were transferred to a nitrocellulose membrane and hybridized with a randomly primed radiolabeled MVM probe (Bam) or a genomic DNA probe (SINE). For Southern blot analysis of the chromatin pellet and salt-wash extracts, samples were resuspended in Southern lysis buffer and processed as described above.

## *2.13. Reverse Transcription-Polymerase Chain Reaction (RT-PCR) and TA Cloning*

Total RNA was extracted using TRIzol reagent (Invitrogen) from cells transfected with the wild-type (pWT) or the double CTCF mutant (pDc) vector and treated with DNase I (Thermo Fisher Scientific, Waltham, MA, USA) for 1 h at 37 °C to remove contaminating DNA. First-strand cDNA synthesis was performed on 1 μg of DNase I-treated RNA using SMART® MMLV Reverse Transcriptase (Clontech, Mountain View, CA, USA) according to the manufacturer's instructions with primer 5′-GTT TTT TTT TAG CTC TGG CTT GG-3′ (MVM nt 2758–2736). The cDNA product was used for downstream PCR amplification using Platinum™ Taq DNA Polymerase High Fidelity (Invitrogen) with primers 5′-GTA TTG ATC ATA GGC CTC GTC G-3′ (MVM nt 2514–2493) and 5′-GTA ACC AGG AAG TGT TCT CAT TTG-3′ (MVM nt 322–345). The PCR products were analyzed by agarose gel electrophoresis, and individual bands were extracted from the gel for downstream analysis using the QIAquick Gel Extraction kit (Qiagen). The small product, generated from the double CTCF-binding site mutant construct, was cloned into the pCR®2.1 vector using the TA Cloning® Kit (Invitrogen) according to the manufacturer's protocol. The construct was transformed into competent *Escherichia coli* DH5α cells, and multiple individual clones were analyzed by Sanger sequencing. The large product, generated from both the wild-type and the mutant constructs, was submitted directly for sequencing analysis.
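The PCR primers bracket both the large-intron donor (nt 514) and the small-intron acceptors, so different splice patterns should yield amplicons of clearly different lengths. The sketch below predicts amplicon sizes from the primer coordinates given above and a hypothetical exon layout per isoform. It assumes donor coordinates are the last exonic nucleotide and acceptor coordinates the first exonic nucleotide, uses placeholder transcript ends, and omits R2 because its large-intron acceptor coordinate is not stated here. These are arithmetic predictions, not the sequenced products.

```python
# Primer 5' ends from Section 2.13 (inclusive MVM coordinates)
FWD_5PRIME = 322   # sense primer, nt 322-345
REV_5PRIME = 2514  # antisense primer, nt 2514-2493


def amplicon_length(exons, left=FWD_5PRIME, right=REV_5PRIME):
    """Length of the cDNA amplicon between the two primer 5' ends, counting
    only nucleotides retained in the (start, end) exon blocks of an isoform."""
    total = 0
    for start, end in exons:
        lo, hi = max(start, left), min(end, right)
        if hi >= lo:
            total += hi - lo + 1
    return total


# Hypothetical exon layouts; 1 and 5000 are placeholder transcript ends.
isoforms = {
    "no splicing in this region": [(1, 5000)],
    "D1-A1 small-intron splice (R1)": [(1, 2280), (2377, 5000)],
    "exon-skipped, 514 -> A1": [(1, 514), (2377, 5000)],
    "exon-skipped, 514 -> A2": [(1, 514), (2399, 5000)],
}
for label, exons in isoforms.items():
    print(f"{label}: {amplicon_length(exons)} bp")
```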

## *2.14. Immunofluorescence Assay*

Immunofluorescence assays were performed in human NB324K cells infected with MVMp at an MOI of 10. At 24 h post infection (hpi), cells were harvested and processed as previously described [13]. Samples were incubated with the indicated antibodies for 1 h, followed by Alexa Fluor® 488- and 568-conjugated secondary antibodies for 1 h. Samples were mounted on slides with ProLong Diamond Antifade Mountant with DAPI (Invitrogen), and images were acquired using a Leica TCS SP8 confocal microscope and a 10 × 1.4 NA objective lens.

## **3. Results**

## *3.1. CTCF Specifically Binds the Viral Genome and Localizes to MVM Replication Compartments*

We previously showed that MVM replicates in close association with sites on the cellular genome, taking advantage of the fact that these sites are replete with factors involved in gene expression and DNA damage signaling [13]. Therefore, to identify and characterize factors that specifically bind the MVM genome during replication, we developed a nuclear salt-wash extraction protocol that could effectively separate the replicating viral genome from cellular DNA prior to cross-linking. This protocol, based on previous strategies designed to purify soluble nuclear protein complexes of MVM [32], was further optimized here across both time and salt gradients. Two-hour incubations proved best, and, as can be seen in Figure 1A, MVM replicative forms were efficiently extracted beginning at approximately 200 mM NaCl. Both histone H3 and γ-H2AX, which are typically associated with cellular DNA [36], were used to monitor the purity of fractionation following extraction. As can be seen in Figure 1B, γ-H2AX appeared in the 400 mM salt wash (the γ-H2AX band in the 100 mM salt wash was not reproducible and likely reflected overflow from the adjacent lane); therefore, subsequent experiments were performed using extraction conditions of 200 mM NaCl for 2 h.

In silico inspection of the MVM genome suggested a potential interaction with the multifunctional cellular DNA-binding zinc finger protein CTCF, and this potential interaction was therefore investigated by ChIP assays of salt-wash extracts of MVM-infected cells [37]. First, the purity of the salt-wash extraction was confirmed by ChIP. While γ-H2AX associated strongly with MVM by ChIP following cross-linking within total cell extracts (Figure 1C, left panel), γ-H2AX did not bind the MVM genome significantly over background when cross-linking and ChIP assays were performed in optimized salt-wash extracts (Figure 1C, center panel). These results were consistent with the Western blot analysis shown in Figure 1B and further highlighted the importance of separating viral genomes from cellular DNA prior to attempts to identify specific viral binding factors. Following separation, ChIP assays demonstrated strong and specific CTCF binding to the MVM genome (Figure 1C; Rad21, another cellular chromosome-binding factor [38], was used as a negative control). Consistent with the ChIP results, while CTCF displayed a punctate pattern throughout the nucleus at both time points shown, NS1 co-localized with CTCF in both early- and late-stage APAR bodies (Figure 1D, middle and bottom panels, respectively). MVM has two potential CTCF binding sites in its genome, one within the NS1 gene and one in the capsid gene (Figure 1E). Inspection of the autonomous parvoviruses H1 and Minute Virus of Canines (MVC) revealed potential CTCF binding sites in the same relative positions as in MVM, and the dependovirus AAV has a CTCF binding site within its Rep gene (Figure 1E).

**Figure 1.** CTCF is associated with the viral genome and localizes to MVM replication compartments. (**A**) Murine A9 cells were parasynchronized by isoleucine deprivation and infected with MVMp at an MOI of 10 at the time of release into complete medium. At 24 hpi, viral nucleoprotein complexes were extracted from infected cells with various NaCl concentrations, and both the chromatin pellet (P) and the salt-wash extracts (W) were analyzed by Southern blotting as described in Materials and Methods. DNA extracted directly from infected cells served as a positive control (lane 7). The blot was hybridized with a radiolabeled MVM whole-genome probe (top panel); the viral single-stranded DNA (ssDNA) and the monomer (mRF) and dimer (dRF) replicative forms are indicated to the right. The blot was also hybridized with a genomic DNA probe against the SINE element (bottom panel). (**B**) Salt-wash extracts and chromatin pellets described in (**A**) were assayed by Western blotting using antibodies directed against the indicated proteins. Whole-cell lysates of MVM-infected parasynchronized murine A9 cells were also analyzed by Western blotting with the indicated antibodies and served as a positive control (lane 7). (**C**) Murine A9 cells were parasynchronized by isoleucine deprivation and infected with MVMp at an MOI of 10 at the time of release into complete medium. At 16 hpi, cells were processed as described in Materials and Methods for whole-cell lysate ChIP (left panel) or ChIP on salt-wash extracts (middle and right panels) with the indicated antibodies. Samples were analyzed by qPCR as described in Materials and Methods. Data are presented as mean ± standard error of the mean (SEM) of two independent experiments. Background binding levels were determined using mouse IgG pulldowns. (**D**) Representative confocal images of mock- versus MVM-infected, non-synchronized human NB324K cells at 24 hpi, probing MVM NS1 (red) and the host cellular factor CTCF (green). CTCF co-localized with NS1 in both early- and late-stage APAR bodies, designated as previously described [12] (middle and bottom panels, respectively). Blue corresponds to DAPI staining. The nuclear border is indicated by a dashed white line. (**E**) Schematic representation of the genomes of the protoparvoviruses MVM, MVC, and H1 and the dependovirus AAV2, showing the positions of transcriptional promoters (solid black arrows), the major open reading frames that encode the viral non-structural and capsid proteins (arrowed boxes), and the relative positions of CTCF binding sites (blue ovals).

The sequences of the consensus CTCF binding sites in the MVM genome (RefSeq: NC\_001510.1) are shown in Figure 2A. Interestingly, the consensus signals lie on opposite strands of the double-stranded transcription template: the NS consensus motif lies in 5′→3′ polarity on the viral minus strand, while the VP motif lies 5′→3′ on the plus strand. To confirm that CTCF bound to the genome at these sites on the double-stranded replicative form, a series of mutations were made and used as targets for CTCF ChIP experiments. Because we could not reproducibly shear the replicating MVM genome during the ChIP procedure, this step was omitted, and so ChIP pull-downs revealed binding to the complete MVM genome. We originally attempted to mutate the CTCF binding sites by third-position (synonymous) nucleotide substitutions, which left the amino acid sequences unchanged; however, these mutations only partially prevented CTCF binding, so more extensive mutations were introduced. These more severe mutations of both sites together led to a substantial loss of CTCF binding above background (Dc; Figure 2A,B). Mutation of the NS site alone reduced CTCF binding to nearly Dc levels (NSc; Figure 2A,B), while mutation of the VP site alone retained intermediate binding (VPc; Figure 2A,B). Binding in the single mutants was likely due to binding at the remaining unaltered site, which suggested that CTCF could bind independently to either site and that binding to the NS site appeared stronger. Unfortunately, the mutations needed to prevent CTCF binding disrupted the NS1 open reading frame, precluding assessment of their effect on viral replication. Prior to the further analyses described below, all mutants were marker rescued with wild-type MVM sequences as described in Materials and Methods to ensure that no additional mutations were present.
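Because the two MVM consensus motifs lie on opposite strands, an in silico search has to test each window of the sequence and also its reverse complement. A minimal sketch of such a two-strand scan with IUPAC-coded motifs follows. The motif and sequence are toy examples (the pentamer CCCTC from the factor's name and an invented 16-nt string), not the CTCF binding model or the MVM sequence; a real search would typically score a position weight matrix rather than match an exact consensus.

```python
# IUPAC nucleotide codes -> bases each code allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def matches(window, motif):
    """True if every base in `window` is allowed by the IUPAC code at that position."""
    return all(base in IUPAC[code] for base, code in zip(window, motif))


def scan_both_strands(seq, motif):
    """Return (1-based plus-strand start, strand) for every window matching `motif`.

    '+' means the motif reads 5'->3' on the given strand; '-' means it reads
    5'->3' on the complementary strand at the same position.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if matches(window, motif):
            hits.append((i + 1, "+"))
        if matches(reverse_complement(window), motif):
            hits.append((i + 1, "-"))
    return hits


# Toy example only: one motif copy on each strand
print(scan_both_strands("AACCCTCAAGAGGGTT", "CCCTC"))  # [(3, '+'), (10, '-')]
```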

**Figure 2.** CTCF binding to the single DNA-binding site mutants is reduced, while it is almost completely abolished on the double mutant. (**A**) Schematic representation of the MVM genome showing the positions of transcriptional promoters (solid black arrows), the major open reading frames (arrowed boxes), the relative positions of CTCF binding sites (blue ovals), and the nucleotide sequences of the WT and mutant CTCF binding sites in the NS and VP regions. The nucleotides that were mutated within the CTCF binding motifs are shown in red. The yellow arrows show the orientation of the CTCF motifs on the viral genome. (**B**) Human NB324K cells were transfected with the WT or the indicated CTCF-binding site mutants and harvested at 20 hpt, as described in Materials and Methods, for whole-cell lysate ChIP using antibodies directed against the cellular factor CTCF or mouse IgG. Samples were analyzed by qPCR as described in Materials and Methods and presented relative to the IgG isotype control. Data are presented as mean ± SEM of two independent experiments. Ct values (average of the two experiments) for each plasmid transfection are shown and indicate similar levels of target DNA in the samples.

## *3.2. CTCF-Binding Site Mutants Exhibited a Decrease in Levels of Spliced to Unspliced R1, as Well as Reduced Levels of R2 Relative to R1*

Following transfection of human NB324K cells, both the Dc double mutant and the NSc single mutant generated significantly reduced levels of spliced R1 relative to unspliced R1 RNAs, and reduced levels of R2 relative to R1, as assayed by RNase protection using the HaeIII probe, which spans the small intron (Figure 3A; Figure 3B, lanes 4 and 6, respectively; ratios represent an average of two independent experiments). Because the mutations in the NS region changed the amino acid sequence of NS1, neither of these mutants generated R3. Control transfections of an eGFP-expressing plasmid confirmed similar transfection efficiencies in these experiments (Figure 3B, bottom panel). The mutant that reduced binding at the VP-region site alone (VPc), in which CTCF binding to the NS region remained, showed decreased splicing of R1 but essentially wild-type patterns of R2 expression (Figure 3B, compare lane 5 with lanes 2 and 3). A similar phenotype for the three mutants was also observed following transfection of murine A9 cells. Together, these results indicated that the phenotype of the Dc mutant was primarily due to the mutation in the NS region, and that, even though CTCF bound at both the NS and VP sites (Figure 2), the individual mutations exhibited different effects. While both single mutants exhibited decreased splicing of R1 from the P4-generated pre-mRNA, only the NSc mutation affected the subsequent appearance of R2. It is important to note that the NSc mutations fell outside of the affected R2 RNA itself and did not lie close to any known RNA regulatory element. Additionally, mutation of multiple nucleotides within the NS and VP motifs that did not efficiently disrupt CTCF binding (D3, diagrammed in Figure 2A) had no deleterious effect on RNA processing (Figure 3B, lanes 7 and 8). RNase protection assays with the P4 probe, which detects the P4 promoter-generated R1 and R2 transcripts individually, also showed a reduced ratio of R2 to R1 (Figure 3C), while the total P4 products were similar for the two. Splicing of R1 pre-mRNA depends upon engagement of the spliceosome at the small intron [30], which is also necessary for exon definition of the upstream NS2-specific exon required for splicing of the large intron and generation of R2 [39]. Thus, our results suggested that CTCF engagement of its MVM binding sites may play a role in processes functioning at the small intron.
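The ratios above (spliced to unspliced R1, R2 to R1) are averages of two independent experiments; the quantification procedure itself is not described here. A minimal sketch of one way such ratios could be computed, assuming phosphorimager band intensities and an assumed correction that divides each band by the number of labeled uridines in its protected fragment (so that bands of different length are compared on a molar basis). All numbers are illustrative placeholders, not data from this study.

```python
from statistics import mean


def molar_signal(bands):
    """Sum intensity / labeled-nucleotide count over a list of (intensity, n_labeled) bands.

    Dividing by the number of labeled uridines in each protected fragment is an
    assumed correction so that bands of different length are compared molarly.
    """
    return sum(intensity / n_labeled for intensity, n_labeled in bands)


def band_ratio(numerator_bands, denominator_bands):
    """Molar ratio of one RNA class to another within a single experiment."""
    return molar_signal(numerator_bands) / molar_signal(denominator_bands)


# Illustrative (intensity, labeled-U count) pairs for two hypothetical experiments;
# spliced R1 is a list because it can comprise several small-intron splice forms.
experiments = [
    {"spliced_R1": [(60, 50), (40, 50)], "unspliced_R1": [(300, 100)]},
    {"spliced_R1": [(80, 50), (40, 50)], "unspliced_R1": [(200, 100)]},
]
ratios = [band_ratio(e["spliced_R1"], e["unspliced_R1"]) for e in experiments]
print(round(mean(ratios), 3))  # 0.933
```

The same `band_ratio` call applies to total R2 versus total R1 by passing the corresponding band lists.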

**Figure 3.** CTCF-binding site mutants exhibit decreased levels of spliced relative to unspliced R1, and of R2 relative to R1. (**A**) Genetic map of MVM showing the three major transcript classes (R1, R2, and R3), the open reading frames that encode the two non-structural (ORF2 and ORF3) and the capsid (ORF1) proteins, and the relative positions of the small and large introns. Approximate locations of the RNase protection viral probes (Bam, P4, and HaeIII) are also indicated. (**B**) Human NB324K cells were infected with MVMp at an MOI of 10 or co-transfected with the indicated plasmids and eGFP. Infected cells were harvested at 24 hpi, while transfected cells were harvested at 48 hpt, and total RNA was isolated using TRIzol reagent. Samples were processed for RNase protection assay (RPA) using the HaeIII probe (which detects all viral transcripts) or an eGFP probe, as described in Materials and Methods. The protected bands representing all nine viral mRNA species are indicated to the left. The ratios of total R2 to total R1, and of spliced to unspliced R1, are indicated at the bottom of the panel. Ratios represent an average of two independent experiments. (**C**) RPA of total RNA extracted from NB324K cells 24 h post infection or 48 h post transfection with the indicated plasmids, using the P4 viral probe, which specifically detects the P4-generated transcripts, or the eGFP probe. The identities of the protected bands are shown on the left. The ratio of R2 to R1 is indicated at the bottom of the panel. Ratios represent an average of two independent experiments.

## *3.3. CTCF-Binding Site Mutants Resulted in Skipping of the NS2-Specific Exon and Joining of the Large Intron Donor to the Small Intron Acceptors*

Northern blot analysis of RNA generated in NB324K cells by the double CTCF-binding site mutant Dc revealed a transcript, approximately the size of R3, that hybridized with a whole-genome probe (Figure 4A, lane 2). This was surprising, since this mutant, which does not produce wild-type NS1, did not generate R3 mRNA, as demonstrated in Figure 3B. A mutant containing a translation termination signal immediately downstream of the NS1 AUG is shown for comparison (Figure 4A, lane 3). Interestingly, the R3-size RNA generated by Dc was not detected in Northern blots using the HaeIII probe, which covers the NS2-specific exon (Figure 4A, lane 5). These results confirmed that this band was not R3 and suggested that the approximately 3 kb RNA generated by Dc might be a product in which the large intron donor (nt 514) was spliced to a small intron acceptor. (Although the large amount of transfected plasmid DNA in these samples makes R1 poorly visible on these gels, R1 was clearly apparent on the RNase protection gels of these RNAs shown in Figure 3.)
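The inference that an R2-like RNA lacking the NS2-specific exon would run near R3 can be checked with back-of-the-envelope arithmetic. This sketch uses only coordinates and sizes stated in this article: it subtracts the approximate NS2-specific exon length (large intron acceptor at nt 1989 to small intron donor D1 or D2) from the ~3.3 kb size of R2. The function names are hypothetical, and ignoring off-by-one coordinate conventions and poly(A) length variation is an assumption that is acceptable at this resolution.

```python
# Coordinates and sizes as stated in this article (nt positions on the MVM genome).
LARGE_INTRON_ACCEPTOR_NT = 1989      # start of the NS2-specific exon
SMALL_INTRON_DONORS_NT = {"D1": 2280, "D2": 2317}
R2_SIZE_KB = 3.3                     # approximate size of the R2 mRNA class

def ns2_exon_length(donor_nt):
    """Approximate NS2-specific exon length (acceptor to donor); the exact
    off-by-one coordinate convention is ignored at this resolution."""
    return donor_nt - LARGE_INTRON_ACCEPTOR_NT

def exon_skipped_size_kb(donor_nt):
    """R2 size minus the NS2-specific exon that the exon-skipped RNA lacks."""
    return R2_SIZE_KB - ns2_exon_length(donor_nt) / 1000

for name, donor in SMALL_INTRON_DONORS_NT.items():
    print(f"{name}: exon ~{ns2_exon_length(donor)} nt, "
          f"exon-skipped RNA ~{exon_skipped_size_kb(donor):.1f} kb")
```

Both donor choices give an exon of roughly 0.3 kb and a predicted exon-skipped product of about 3.0 kb, the size reported for R3.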

**Figure 4.** CTCF-binding site mutants resulted in skipping of the NS2-specific exon and joining of the large intron donor to the small intron acceptors, which can be overcome by mutations that improve the polypyrimidine tract of the upstream large intron. (**A**) Human NB324K cells, transfected with the indicated plasmids, were harvested at 48 hpt and total RNA was isolated using TRIzol reagent. Samples were processed for Northern blot analysis using the probes indicated at the bottom of the panel. The identity of the viral RNA species is shown on the left. (**B**) Total RNA extracted from NB324K cells, transfected with the WT or the double CTCF-binding site mutant, was subjected to DNase I treatment for 1 h at 37 °C. First-strand cDNA synthesis was performed on the DNase I-treated RNA samples and the cDNA product was used as a template for downstream PCR analysis as described in Materials and Methods. The identities of the amplified bands were determined by sequencing analysis and are shown to the right. (**C**) Schematic representation of R2 (top) and the exon-skipped product (bottom), generated from the double CTCF-binding site mutant construct. The exon-skipped product skips the NS2-specific exon (ORF2) and joins the large intron donor at nt 514 to the small intron acceptor A1 (nt 2377) or A2 (nt 2399). The PCR primers used to detect the exon-skipped product are indicated at the top of the panel (black arrows). The numbers on top of the arrows represent the locations on the viral genome where the primers anneal. Approximate locations of the RNase protection viral probes (Bam and HaeIII) are also indicated. Total RNA extracted from NB324K cells, transfected with the indicated plasmids as well as eGFP, was subjected to Northern blot analysis (**D**) or RNase protection assay (**E**) as described in Materials and Methods using the probes shown at the bottom (**D**) or to the right (**E**) of the panel. The identities of the RNA species are also indicated.

To reveal whether such an RNA was in fact made by Dc, we performed non-quantitative RT-PCR analysis of Dc-generated RNA using primers shown in Figure 4C. As shown in Figure 4B, the Dc mutant did generate such a novel spliced product, which is diagrammed in Figure 4C. These cDNAs were cloned and sequence analysis revealed that these spliced products joined the large intron donor at nt 514 to either the small intron acceptor, A1, at nt 2377, or the small intron acceptor, A2, at nt 2399. Inspection of the more quantitative Northern results in Figure 4A suggests that this NS2-specific exon-skipped product was present at approximately half the concentration of R2.

## *3.4. Improvement of the Large Intron Splice Acceptor in the Dc Mutant Led to Increased NS2-Specific Exon Definition and Increased Levels of R2 RNA*

If lack of CTCF binding to the MVM genome led to weakening of the large intron acceptor through loss of definition, for splicing purposes, of the NS2-specific exon, we would expect that improving the large intron acceptor would overcome this deficiency. As can be seen in a Northern blot analysis using the whole genomic probe, strengthening the polypyrimidine tract of the large intron acceptor by adding four thymidine residues, previously shown to overcome mutations that reduced NS2-specific exon definition [39], led both to a decrease in the exon-skipped product and to an increase in authentic R2 generated by Dc (Figure 4D, compare lanes 2 and 3). Northern analysis of this RNA using the HaeIII probe confirmed the identity of the exon-skipped product lost in the left panel of Figure 4D (compare lane 2 to 5) and revealed enhanced levels of R2. An increase in R2 RNA generated by pDc4Tppt was confirmed by quantitative RNase protection analysis in which expression of an eGFP gene was included as a transfection control (Figure 4E, compare lane 2 to 3).

## **4. Discussion**

In surveying the MVM genome for the binding sites of known cellular factors, we noticed consensus CTCF binding sites in the NS and the VP regions of MVM that were conserved in a number of other parvoviruses. Because we previously showed that the replicating MVM genome associates with particular sites of DNA damage on the cellular genome [13], determining whether CTCF specifically bound to MVM required that we separate the viral genome from the cellular genome prior to the cross-linking step during chromatin immunoprecipitation assays. Upon doing so, we could demonstrate specific binding of CTCF to these sites on MVM.

Full disruption of CTCF binding to MVM required destruction of both sites together; destruction of the NS site individually (which retained the VP binding site) reduced binding similarly to the double mutant, while destruction of the VP site (leaving the NS site) produced an intermediate binding phenotype. Because the mutations required to disrupt CTCF binding could not be made without disrupting the NS1 ORF, the mutants could not be assessed directly for replication. However, both the double CTCF-binding site mutant Dc and the NS-alone mutant NSc showed a dramatic defect in gene expression. Both mutants were deficient in splicing of the R1 RNA, and they generated relatively less R2 at the expense of a new product, an RNA that joined the large intron donor at nt 514 to one or the other of the small intron acceptors. As previously mentioned, the NSc mutations fall outside of the affected R2 RNA itself and do not lie close to any known RNA regulatory element. Additionally, mutation of multiple nucleotides within the NS motif that did not efficiently disrupt CTCF binding had no deleterious effect on RNA processing (Figure 3), further implying that this region does not contain a previously unrecognized *cis*-acting RNA processing element. It is interesting that the VPc single mutant, although apparently not deficient in NS2-specific exon definition, still generated relatively less spliced R1 RNA. This perhaps suggests that the roles of CTCF binding at the individual sites, and their potential interaction, are complex, and may be related to their different orientations on the viral chromosome. Our preliminary results have shown that the exon-skipped RNA product does transit to the cytoplasm, but we could detect no protein product generated from it.

Interestingly, we have observed this exon-skipped RNA product before [39,40]. In previous studies characterizing splicing of the P4-generated pre-mRNA, we found that when the NS2-specific exon was poorly defined, whether because of its weak large intron acceptor [30], certain mutations within the NS2-specific exon itself [40], or, importantly, mutation of the downstream small intron [30], an RNA was generated in which the NS2-specific exon was skipped. Because definition of the NS2-specific exon functions to strengthen the adjacent upstream large intron acceptor at nt 1989 [39], improving the large intron acceptor in those mutants by adding four thymidine residues to its polypyrimidine tract overcame the defect in NS2-specific exon definition [39]. These observations, as well as the absence, in infected cells, of P4-generated RNAs from which the large intron, but not the small intron, has been excised [31], led us to propose a model (diagrammed in Figure 5) in which the spliceosome first engages the R1 pre-mRNA at the small intron, allowing its splicing as well as facilitating its interaction with the upstream large intron acceptor to define the NS2-specific exon, which in turn allows splicing of the large intron [39].

In light of these previous results, the results presented here suggest a model in which CTCF binding likely plays a role in proper engagement of the spliceosome at the small intron: the Dc and NSc mutants exhibited reduced splicing of R1 and generated an exon-skipped product at the expense of R2, and this product could be suppressed by improving the large intron polypyrimidine tract. In the absence of CTCF binding, R1 would be poorly spliced and the NS2-specific exon poorly defined, leading to the generation of the new exon-skipped product we observe. How interruption of spliceosome engagement at the small intron may fit into our general model of P4-generated pre-mRNA processing is shown in Figure 5. Binding of CTCF to the site in the NS1 gene appears to play a more significant role in this effect than binding to the site in the capsid gene.

**Figure 5.** Model depicting a potential mechanism by which CTCF regulates splicing of MVM P4-generated transcripts. Top: A general model of processing of P4-generated pre-mRNAs; CTCF binding to the viral genome likely plays a role in proper engagement of the spliceosome at the small intron, allowing its splicing as well as facilitating its interaction with the upstream large intron acceptor, to define the NS2-specific exon allowing splicing of the large intron. Bottom: In the absence of CTCF binding, R1 would be poorly spliced, and the NS2-specific exon poorly defined, leading to the generation of a novel exon-skipped product.

How CTCF binding to MVM functions in MVM RNA processing is not yet known. CTCF has been shown to have roles in chromosomal architecture, specifically DNA looping, and in transcriptional activation, and has been shown to have RNA-binding activity [19–22]. RNA immunoprecipitation experiments, performed as we have previously described [41], did not demonstrate CTCF binding to MVM RNA. It was first reported that DNA-bound CTCF regulates alternative pre-mRNA splicing by mediating RNA polymerase II pausing, allowing the inclusion of weak upstream exons [23]. A more recent report, however, suggested that CTCF regulation of alternative splicing of human papillomavirus early genes was more complicated. Specifically, it was found, similar to the results reported here, that loss of CTCF binding to the viral genome resulted in both increased levels of unspliced transcripts and an alteration of splice site usage upstream of the CTCF binding site, with a significant reduction of a specific alternatively spliced product [29]. Thus, it is possible that CTCF binding affects RNA processing through modulation of the elongating transcription complex. ChIP assays of RNA pol II on the MVM P4 promoter showed no reproducible difference between the Dc mutant and wild-type MVM. However, it is well known that RNA processing factors bind to the elongating RNA polymerase at its CTD [42], and our assays would not have distinguished whether the composition of the complexes engaging the Dc and wild-type P4 promoters differed. Perhaps relevantly, we have previously shown that when the AAV2 P40 promoter and the AAV5 P5 promoter were replaced with either the HIV LTR or the CMV promoter, the RNA generated by these constructs was processed differently [43,44].

Importantly, it has been shown that the ratio of R2:R1 is exquisitely critical for successful MVM infection. Even small changes in this ratio can have large effects on replication [45]. Thus, the role of CTCF in controlling the ratio of R2 to R1 would be predicted to have significant effects on replication.

**Author Contributions:** Conceptualization, M.B., K.M. and D.J.P.; methodology, M.B., K.M. and L.R.B.; validation, M.B., K.M., L.R.B. and D.J.P.; resources, K.M. and D.J.P.; writing—original draft preparation, M.B.; writing—review and editing, M.B., K.M. and D.J.P.; funding acquisition, K.M. and D.J.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by NIH grants R01AI 046458 and R01AI116595 to D.J.P. K.M. was supported by Ruth L. Kirschstein Postdoctoral Individual National Research Service Award AI 131468, and NIH K99/R00 AI 148511.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

## **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **No G-Quadruplex Structures in the DNA of Parvovirus B19: Experimental Evidence versus Bioinformatic Predictions**

#### **Gloria Bua †, Daniele Tedesco †,‡, Ilaria Conti §, Alessandro Reggiani, Manuela Bartolini and Giorgio Gallinella \***

Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy; gloria.bua2@unibo.it (G.B.); daniele.tedesco@isof.cnr.it (D.T.); ilaria.conti@unife.it (I.C.); alessandro.reggiani5@unibo.it (A.R.); manuela.bartolini3@unibo.it (M.B.)


Received: 27 July 2020; Accepted: 21 August 2020; Published: 25 August 2020

**Abstract:** Parvovirus B19 (B19V), an ssDNA virus in the family Parvoviridae, is a human pathogenic virus responsible for a wide range of clinical manifestations, for which effective and specific antivirals are still needed. DNA structures, including G-quadruplexes (G4), have been recognised as relevant functional features in viral genomes, and small-molecule ligands binding to these structures are promising antiviral compounds. Bioinformatic tools predict the presence of potential G4-forming sequences (PQSs) in the genome of B19V, raising interest in them as targets for antiviral strategies. Predictions locate PQSs in the genomic terminal regions, in proximity to replicative origins. The actual propensity of these PQSs to form G4 structures was investigated by circular dichroism spectroscopic analysis on synthetic oligonucleotides of corresponding sequences. No signature of G4 structures was detected, and the interaction with the G4 ligand BRACO-19 (*N*,*N*′-(9-{[4-(dimethylamino)phenyl]amino}acridine-3,6-diyl)bis(3-pyrrolidin-1-ylpropanamide)) did not appear consistent with the stabilisation of G4 structures. Any potential role of PQSs in the viral lifecycle was then assessed in an in vitro infection model system, by evaluating any variation in replication or expression of B19V in the presence of the G4 ligands BRACO-19 and pyridostatin. Neither ligand showed a significant inhibitory activity on B19V replication or expression. Experimental challenge did not support bioinformatic predictions. The terminal regions of B19V are characterised by relevant sequence and symmetry constraints, which are functional to viral replication. Our experiments suggest that these impose a stringent requirement prevailing over the propensity of forming actual G4 structures.

**Keywords:** parvovirus B19; G-quadruplex; bioinformatics; antivirals; BRACO-19; pyridostatin

## **1. Introduction**

Parvovirus B19 (B19V), an ssDNA virus in the family Parvoviridae [1], is a human pathogenic virus, widely circulating in the population, responsible for an ample spectrum of clinical manifestations [2]. The genome is a 5.6 kb ssDNA molecule of either polarity, with a coding repertoire comprising a non-structural (NS) protein, functional to virus replication, and two structural proteins, VP1 and VP2, constituting a T = 1, 22 nm icosahedral capsid [3,4]. The virus is characterised by a selective but not exclusive tropism for erythroid progenitor cells (EPCs) in the bone marrow and by a strict dependence on the cellular machinery and environment for its replication [5,6].

The selective tropism of B19V for EPCs in the bone marrow and the ability to induce cell cycle arrest and apoptosis in productively infected cells can cause a partial block in erythropoiesis. This may manifest as a transient or persistent erythroid aplasia, clinically acute and severe in patients with underlying haematological disorders, or chronic in patients with immune system deficits [7]. The virus is capable of infecting and maintaining long-term persistence in disparate tissues, mostly within endothelial or stromal cells, and can establish a complex relationship with the immune system, whose efficacy in innate and adaptive responses is crucial to the course of infection and the development of pathological processes [8,9]. In addition to haematological consequences, B19V infection can commonly manifest as erythema infectiosum and cause post-infection arthropathies. Further, a wide range of other pathologies have been reported, most notably myocarditis [10] and autoimmune processes [11]. Infection in pregnancy may be transmitted to the foetus, posing a risk of foetal death and/or foetal hydrops [12–14].

B19V infection requires diagnostic awareness to guide and support clinical care in severe cases [15]. The development of antiviral strategies directed against B19V is still lagging compared with other viruses, although recent work has identified a few compounds that show a selective inhibition of B19V replication in vitro [16]. Such compounds include hydroxyurea (HU), a ribonucleotide reductase inhibitor, also used for the treatment of sickle-cell disease and known to have "virostatic" properties [17]; the nucleotide analogue cidofovir (CDV) and its lipid derivative brincidofovir (BCV), broad-spectrum antiviral agents mostly active against dsDNA viruses [18–20]; and a few coumarin derivatives [21]. Some flavonoid compounds can inhibit the endonuclease activity of the viral NS protein, a function critical to the replicative process of B19V [22].

The genome of B19V has a limited coding potential, and its replication depends largely on the cellular environment. Consequently, a deeper understanding of the viral lifecycle and of virus–cell interactions is required to identify further targets and agents for an effective antiviral strategy. Unconventional DNA structures have been recognised as relevant features for the regulation of several biological processes, including replication, recombination, and transcription [23]. Particular emphasis has been given to the potential of G-rich sequences to adopt G-quadruplex (G4) structures that disrupt the regular double-helix structure of DNA [24]. These structures are characterised by stacks of planar guanine tetrads, in which the guanines are held together by Hoogsteen-type hydrogen bonds, and can typically form when runs of 2–4 guanine bases are regularly spaced on the DNA sequence. Small-molecule ligands that recognise and bind to these structures, with either interfering or stabilising effects, may act as modulators of the biological processes involved, making them of pharmacological interest [25].

Methodological developments have allowed the in silico prediction of specific G4 structures directly from primary sequences, and the number of studies reporting genome-wide G4 exploration across species has rapidly increased [26], including in viruses [27]. A recent survey of viral genomes by a regular-expression pattern search has led to the assembly of a comprehensive database (G4-virus) reporting the presence, distribution, and statistical significance of potential quadruplex sequences (PQSs) in reference genomes and genome sets for all viral families [28]. In some cases, the presence and biological role of PQSs in viral genomes have been validated in experimental models, and the antiviral activity of specific G4 ligands such as pyridostatin (PDS) and BRACO-19 (*N*,*N*′-(9-{[4-(dimethylamino)phenyl]amino}acridine-3,6-diyl)bis(3-pyrrolidin-1-ylpropanamide)) has been demonstrated in relevant instances [29]. The G4-virus database reports the presence of PQSs in the B19V genome, raising the need for an experimental challenge of the bioinformatic prediction and, consequently, for the investigation of any possible relevance of these structures as targets for antivirals against B19V.
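The regular-expression approach mentioned above can be illustrated with a short Python sketch. It uses the widely used "quadparser"-style pattern: four runs of at least n guanines separated by loops of 1–7 nt. The exact pattern and parameters used to build G4-virus are not given here, so these, like the function names, are assumptions. Minus-strand candidates are found by scanning the reverse complement and mapping hits back to plus-strand coordinates.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_pqs(seq, min_run=3, max_loop=7):
    """Return (start, end, strand, match) tuples for non-overlapping PQS hits.

    Pattern (assumed): four G-runs of >= min_run separated by 1..max_loop nt.
    Coordinates are 0-based, half-open, on the plus strand for both strands.
    """
    g = f"G{{{min_run},}}"
    n = f"[ACGTN]{{1,{max_loop}}}"
    pattern = re.compile(f"{g}{n}{g}{n}{g}{n}{g}")
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+", m.group()) for m in pattern.finditer(seq)]
    length = len(seq)
    # G-runs on the minus strand appear as C-runs on the plus strand.
    for m in pattern.finditer(reverse_complement(seq)):
        hits.append((length - m.end(), length - m.start(), "-", m.group()))
    return sorted(hits)

print(find_pqs("TTGGGAGGGAGGGAGGGTT"))  # [(2, 17, '+', 'GGGAGGGAGGGAGGG')]
```

Setting `min_run=2` gives the dinucleotide (GG) class of PQSs, which is the only class the database reports as statistically significant for B19V.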

On these grounds, we carried out a closer bioinformatic inspection of the B19V genome for the presence of PQSs, comparing the results reported in the G4-virus database with targeted predictions obtained by a different computational method for G-quadruplex prediction, the QGRS (Quadruplex forming G-Rich Sequences) Mapper [30], which is based on a scoring algorithm. By this analysis, we identified two sequence stretches, located in the genomic terminal regions close to the origins of replication of viral DNA, as potentially relevant PQSs. Experiments were then carried out to test the prediction. Synthetic oligonucleotides corresponding to the PQSs were investigated by circular dichroism (CD) spectroscopy, which can provide information on the propensity to form G4 structures. Then, any potential role of PQSs in the viral lifecycle was assessed by using G4 ligands in a model virus–cell system and evaluating the occurrence of a dose-dependent variation in replication or expression levels of B19V.

## **2. Materials and Methods**

## *2.1. Bioinformatic Analysis*

The B19V sequence used in bioinformatic analysis is a derived consensus sequence, referred to as B19V EC [GenBank KY940273] [31]. The G4-virus PQS database [28] was accessed at http://www.medcomp.medicina.unipd.it/main_site/doku.php?id=g4virus. The QGRS Mapper web server [30] was accessed at http://bioinformatics.ramapo.edu/QGRS/index.php.

## *2.2. Chemicals*

Oligonucleotides used in CD analysis (Table 1) were obtained from Eurofins Genomics (Ebersberg, Germany) (https://www.eurofinsgenomics.eu/). BRACO-19 and pyridostatin were obtained from Merck-Sigma (Milan, Italy). Stock solutions were prepared in H2O at 1 mM and further diluted for subsequent experiments.


**Table 1.** Oligonucleotides used for the CD analysis on PQSs in the DNA of B19V.

## *2.3. CD Analysis*

Circular dichroism (CD) studies on oligonucleotides were carried out on a Jasco (Tokyo, Japan) J-810 spectropolarimeter equipped with a PTC-423S Peltier-type temperature control system. Measurements were performed using a micro-volume QS quartz cell with black walls (1 cm path length, 500 μL volume; Hellma Italia, Milan, Italy). Oligonucleotides and BRACO-19 were diluted from stock solution into an analysis buffer (KCl 70 mM, potassium acetate 20 mM, pH 6.8) at 2 and 10 μM, respectively. PDS was not used in CD studies because its addition to oligonucleotides caused precipitation in the samples, making it unsuitable for spectroscopic analysis. CD spectra (330–230 nm) were recorded at 17 different temperatures (every 5 °C between 15 and 95 °C), applying a 0.25 °C/min gradient for both heating and cooling ramps. A 4 nm spectral bandwidth, a 0.2 nm data interval, a 100 nm/min scanning speed and a 2 s data integration time were employed for measurements; solvent-corrected spectra were then converted to molar units per residue (Δε<sub>res</sub>, in M<sup>−1</sup> cm<sup>−1</sup>). CD melting curves were determined by plotting the Δε<sub>res</sub> values as a function of temperature (*T*) for each oligonucleotide, using the wavelength at which the difference between its CD signals at 15 and 95 °C was maximum (λ<sub>Δmax</sub>). Mid-transition temperatures (*T*<sub>m</sub>) for both heating and cooling ramps were then derived by non-linear regression on the CD melting curves using a 6-parameter logistic function [32,33].
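A minimal sketch of the data handling described above covers three steps: building the 17-point temperature grid, converting raw ellipticity to Δε<sub>res</sub>, and selecting λ<sub>Δmax</sub>. The function names and spectra are hypothetical. The conversion uses the standard CD relation θ (mdeg) = 32,982 · Δε · c · l; that the instrument software applied exactly this form is an assumption.

```python
def cd_temperatures(start=15, stop=95, step=5):
    """Temperatures (°C) at which spectra are recorded: every 5 °C from 15 to 95 °C."""
    return list(range(start, stop + 1, step))

def delta_eps_per_residue(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert ellipticity (mdeg) to Δε per residue (M^-1 cm^-1).

    Assumes θ(mdeg) = 32982 · Δε · c · l, then divides by the number of
    nucleotides in the oligonucleotide.
    """
    return theta_mdeg / (32982 * conc_molar * path_cm * n_residues)

def lambda_delta_max(spectra, t_low=15, t_high=95):
    """Wavelength with the largest |Δε(t_low) − Δε(t_high)|.

    spectra maps temperature -> {wavelength_nm: Δε_res}.
    """
    low, high = spectra[t_low], spectra[t_high]
    return max(low, key=lambda wl: abs(low[wl] - high[wl]))

print(len(cd_temperatures()))  # 17

# Hypothetical three-wavelength spectra at the two extreme temperatures.
spectra = {
    15: {245: -1.0, 265: 4.0, 280: 2.0},
    95: {245: -0.5, 265: 1.5, 280: 1.8},
}
print(lambda_delta_max(spectra))  # 265
```

The melting curve for each oligonucleotide is then the series of Δε<sub>res</sub> values at the selected wavelength across the temperature grid.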

## *2.4. Cells*

Erythroid progenitor cells (EPCs) were generated in vitro from peripheral blood mononuclear cells (PBMC), as described [5]. Blood donations were made available for institutional research purposes from the Immunohaematology and Transfusion Service, S. Orsola-Malpighi University Hospital, Bologna (authorisation 0070755/1980/2014). Availability was granted under conditions complying with Italian privacy law. Neither specific ethics committee approval nor written consent from donors was required for this research project.

## *2.5. Cytotoxicity*

The effects of tested compounds on cell viability were monitored by the Cell Counting Kit 8 (WST-8/CCK8) assay (Dojindo Molecular Technologies, Microtech, Italy), as described [20]. DMSO at 10% was used as a positive control for cytotoxicity. The assay is based on the production of a formazan dye in response to cellular metabolic activity, measured as absorbance (OD) values. Replicate net OD values were normalised with respect to the control samples and expressed as mean percentage values of cell viability.
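The normalisation described above can be sketched as follows. That "net OD" means OD after subtracting blank (medium-only) wells is an assumption, as are the example readings and the function name.

```python
from statistics import mean

def percent_viability(sample_od, control_od, blank_od):
    """Mean % viability of treated replicates.

    Net OD = OD minus the mean blank (assumed: medium-only wells); each treated
    replicate's net OD is normalised to the mean net OD of untreated controls.
    """
    blank = mean(blank_od)
    control = mean(od - blank for od in control_od)
    return mean((od - blank) / control * 100 for od in sample_od)

# Hypothetical OD readings: two treated replicates, two controls, two blanks.
print(round(percent_viability([0.6, 0.8], [1.1, 1.1], [0.1, 0.1]), 1))  # 60.0
```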

## *2.6. Infection*

B19V was obtained from a cloned synthetic genome, first transfected into UT7/EpoS1 cells, then propagated by serial passage in EPCs, as described [31]. For infection, EPCs were incubated at a density of 10<sup>7</sup> cells/mL in the presence of B19V at a multiplicity of infection (moi) of 10<sup>3</sup> genome equivalents (geq)/cell, for 2 h at 37 °C. After removal of the inoculum, EPCs were incubated at 37 °C in 5% CO<sub>2</sub> in complete growth medium containing the different concentrations of tested compounds, at an initial density of 10<sup>6</sup> cells/mL.
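The infection conditions imply some simple bookkeeping. At 10<sup>7</sup> cells/mL and 10<sup>3</sup> geq/cell, the adsorption mix holds 10<sup>10</sup> geq/mL, and moving to 10<sup>6</sup> cells/mL is a 10-fold dilution. The sketch below scales these numbers to a given culture; the cell number and function name are hypothetical. It does not compute an inoculum volume, because the titre of the virus stock is not given.

```python
def infection_plan(n_cells, moi_geq_per_cell=1e3, infection_density=1e7, culture_density=1e6):
    """Back-of-the-envelope numbers for one infection under the stated conditions.

    n_cells is a hypothetical input; densities are cells/mL and moi is geq/cell.
    """
    return {
        "geq_required": n_cells * moi_geq_per_cell,           # genome equivalents needed
        "adsorption_volume_ml": n_cells / infection_density,  # volume during the 2 h incubation
        "culture_volume_ml": n_cells / culture_density,       # volume after inoculum removal
    }

plan = infection_plan(5e6)  # e.g. 5 × 10^6 EPCs
print(plan)
```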

## *2.7. Molecular Analysis*

Equal amounts of cell cultures, corresponding to 1.5 × 10<sup>5</sup> cells, were collected as appropriate at 2 or 48 h post-infection (hpi) and processed by using the Maxwell Viral Total Nucleic Acid kit on a Maxwell MDx platform (Promega), to obtain a total nucleic acid fraction in elution volumes of 150 μL. The quantitative evaluation of target nucleic acids was carried out by qPCR assays in a Rotor-Gene Q system (Qiagen, Hilden, Germany). For the analysis of B19V DNA, aliquots of the eluted nucleic acids (corresponding to ~500 cells) were directly amplified in a qPCR assay (Maxima SYBR Green qPCR Master Mix, Thermo Scientific, Life Technologies, Monza, Italy). For the analysis of B19V RNA, parallel aliquots were first treated with the TURBO DNA-free reagent (Ambion, Life Technologies) before amplification in a qRT-PCR assay (Express One-Step SYBR GreenER Kit, Invitrogen, Life Technologies). Standard cycling programs were used, followed by a melting curve analysis to define the *T*<sub>m</sub> of amplified products. The primer pair R2210–R2355, located in the central exon of the B19V genome, was used to amplify both viral DNA and total RNA, and a target sequence in the region of genomic DNA coding for 5.8S rRNA (rDNA) was amplified in parallel reactions for normalisation [5,31].
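The sample-processing numbers can be checked with simple arithmetic. 1.5 × 10<sup>5</sup> cells in 150 μL gives 1000 cell-equivalents per μL of eluate. In the sketch, normalisation is shown as a ratio of viral to rDNA copies, and the 48 vs 2 hpi comparison as a fold change. Both are assumptions about how the normalised data were expressed; the function names and copy numbers are hypothetical.

```python
CELLS_PER_SAMPLE = 1.5e5   # cells collected per time point
ELUTION_UL = 150.0         # elution volume of the total nucleic acid fraction

cells_per_ul = CELLS_PER_SAMPLE / ELUTION_UL   # cell-equivalents per µL of eluate
aliquot_ul = 500 / cells_per_ul                # eluate volume equivalent to ~500 cells

def normalised(viral_copies, rdna_copies):
    """Viral target per unit of the cellular rDNA reference (assumed ratio normalisation)."""
    return viral_copies / rdna_copies

def fold_change(norm_48hpi, norm_2hpi):
    """Change in the normalised viral signal between 2 and 48 hpi (hypothetical readout)."""
    return norm_48hpi / norm_2hpi

print(cells_per_ul, aliquot_ul)  # 1000.0 0.5
# Hypothetical copy numbers:
print(fold_change(normalised(2e6, 1e4), normalised(2e4, 1e4)))  # 100.0
```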

## **3. Results**

## *3.1. Sequence, Symmetry, and Higher-Order Structures in B19V Genome*

The B19V reference sequence used for bioinformatic analysis is a derived consensus sequence, resulting from the alignment of a selected, non-redundant set of complete genomic sequences, referred to as B19V EC [GenBank KY940273]. This sequence provides the basis for a synthetic genetic system for B19V, able to yield virus with full replicative competence, which was used for subsequent experiments [31]. The whole genome is 5596 nt long, and its arrangement presents two levels of symmetry. On a genomic scale, a unique internal region of 4830 nt, containing all the coding sequences, is flanked by inverted terminal regions of 383 nt each, serving as replicative origins. Within the terminal regions, the distal 365 nt are arranged as a palindromic sequence around a central site of dyad symmetry. The palindrome is imperfect, presenting a few base mismatches that lead to two different sequences, one the inverse complement of the other, usually referred to as "flip" and "flop"; these can combine independently at each end, thus producing four different sequence isomers.
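The arithmetic of the genome layout and the origin of the four isomers can be sketched in Python. The toy sequences and function names are hypothetical. The sketch also simplifies the real arrangement: it writes the right-hand terminal region as the inverse complement of the form chosen for that end.

```python
from itertools import product

GENOME_NT, INTERNAL_NT, TERMINAL_NT = 5596, 4830, 383
# The unique internal region plus two inverted terminal regions make up the genome.
assert INTERNAL_NT + 2 * TERMINAL_NT == GENOME_NT

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def terminal_isomers(flip, internal):
    """Enumerate sequence isomers from independent flip/flop choices at each end.

    'flop' is the inverse complement of 'flip'. The right-hand terminal region
    is written as the inverse complement of the chosen form, reflecting the
    inverted terminal repeat arrangement (a simplifying assumption).
    """
    forms = {"flip": flip, "flop": reverse_complement(flip)}
    return {
        f"{left}-{right}": forms[left] + internal + reverse_complement(forms[right])
        for left, right in product(forms, repeat=2)
    }

# Toy sequences, far shorter than the real 365-nt palindrome.
isomers = terminal_isomers("AACG", "TTTT")
print(len(isomers), len(set(isomers.values())))  # 4 4
```

Two independent binary choices, one per end, give 2 × 2 = 4 distinct isomers. This holds as long as the flip sequence is not its own inverse complement, which is guaranteed by the mismatches in the imperfect palindrome.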

Superimposed on these symmetries, the B19V genome presents signatures of higher-order structures such as PQSs (Figure 1). The presence and distribution of PQSs in the B19V genome reported in the G4-virus database were compared with predictions obtained by the QGRS Mapper program. Predictions were only partially concordant. The G4-virus database reports a list of all PQSs identified in the genome, their position on positive and negative strands, their degree of conservation among isolates included in the dataset expressed as frequency, and the statistical significance of their abundance [28]. For B19V, on a dataset of 13 sequences, only the presence of dinucleotide PQSs, and not of tri- or tetranucleotide PQSs, was considered statistically significant over a random distribution. A dispersed dinucleotide (GG) PQS distribution was reported, including 22 GG-PQSs in the plus strand and 18 GG-PQSs in the minus strand, at frequencies in the range 0.08–1.00. On the other hand, the QGRS Mapper [30] uniquely identified, at a relevant score (G-score > 60), two sequence stretches with features of a PQS (G3N13G3N8G3N11G3) located within the terminal regions, which are characteristically GC-rich. In particular, these PQSs are located on either the plus or the minus strand, close to and 5′ of the axis of dyad symmetry, partially overlapping with the sequence asymmetries and, thus, in different relative positions with respect to "flip" and "flop" isomers (Figure 2).
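The QGRS motif notation above can be unpacked programmatically. This is not QGRS Mapper's own code, and the helper names are hypothetical. The sketch converts the G3N13G3N8G3N11G3 shorthand into a regular expression and a total length: four 3-nt G-runs plus loops of 13, 8 and 11 nt. It also checks that the lowest reported G4-virus frequency, 0.08, matches a PQS present in one of the 13 sequences. That check assumes frequency is the fraction of isolates carrying the PQS.

```python
import re

TOKEN = re.compile(r"([GN])(\d+)")

def parse_notation(notation):
    """Split compact PQS notation (e.g. 'G3N13G3N8G3N11G3') into (symbol, length) pairs."""
    tokens = [(sym, int(n)) for sym, n in TOKEN.findall(notation)]
    if "".join(f"{s}{n}" for s, n in tokens) != notation:
        raise ValueError(f"unrecognised notation: {notation!r}")
    return tokens

def notation_to_regex(notation):
    """G-runs become G{n}; loops (N) become any nucleotide repeated n times."""
    return "".join(("G" if s == "G" else "[ACGT]") + f"{{{n}}}" for s, n in parse_notation(notation))

def notation_length(notation):
    return sum(n for _, n in parse_notation(notation))

motif = "G3N13G3N8G3N11G3"
print(notation_to_regex(motif))  # G{3}[ACGT]{13}G{3}[ACGT]{8}G{3}[ACGT]{11}G{3}
print(notation_length(motif))    # 44
# Lowest possible frequency on a 13-sequence dataset: a PQS found in one isolate.
print(round(1 / 13, 2))          # 0.08
```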

The palindromic sequences in the terminal regions allow intra-strand base pairing, leading to a hairpin configuration, as well as inter-strand base pairing leading to an extended configuration. Hairpins can provide priming for second-strand synthesis, whereas strands in the extended configuration need to separate and fold back into hairpin structures for reinitiating replication. Predictions locate PQSs within a functional replicative origin, so that the strand unwinding and folding mechanisms occurring during genome replication can offer the opportunity for DNA strands to assume a G4 structure. This, in turn, may play a role in the regulation of viral genome replication or expression. The following experiments analysed the actual propensity of predicted PQSs to assume a G4 configuration, and any potential relevant role of these in the viral lifecycle.

**Figure 2.** B19V genome symmetry and PQSs. ITRs are shown in the hairpin configuration for the positive and negative strands in the different "flip-flop" isomers. The potential G4 structures predicted by QGRS Mapper are located within the terminal sequences (shown as boxes), 5′ to the dyad symmetry, either on the plus strand (blue) or minus strand (red). PQSs partially overlap with the asymmetries leading to the flip/flop isomers (bubbles). Oligonucleotides used in circular dichroism (CD) experiments (PQSs, Table 1) are shown in context (blue/red stripes). See also Supplementary Figure S1.

## *3.2. PQSs in B19V DNA: CD Analysis*

The propensity of the PQSs identified by the bioinformatic analysis to form G4 structures, which can occur in one of 26 folding arrangements [34], was first investigated by CD spectroscopic analysis on synthetic oligonucleotides of the corresponding sequence (Table 1). CD spectroscopy is routinely employed to investigate the secondary structure of nucleic acids, thanks to its sensitivity to chirality [35], and in this framework it can be used to evaluate the presence and geometry of G4 structures. Each geometry is characterised by different angles for the glycosidic bonds of guanosines and a different topology for the loops linking the stacked tetrads of the G4 stem; these define the coupling among the guanine chromophores and give rise to peculiar CD signatures that can be used as an indicator of the presence of G4 structures [36]. Further, CD melting curve analysis [32,33] can yield information on thermal stability and on the binding of small molecules, in this case G4 ligands such as BRACO-19, a 3,6,9-trisubstituted acridine derivative designed to bind and stabilise quadruplex DNA structures.

The oligonucleotide HIV LTR-II (Table 1) was chosen as a positive control for the formation of G4 structures [37]. The CD spectrum of this oligonucleotide at low temperature (Figure 3A) can be interpreted as the overlap between the contribution of a parallel G4 structure, which gives a strong positive band centred at around 265 nm [36], and the profile of a GC-rich (76%) ssDNA in B-form, which gives a positive band at around 280 nm and a negative band at around 245 nm [35]. The CD melting curves of the oligonucleotide at 265 nm (Figure 4A), both in the absence and in the presence of BRACO-19, show a clear decrease in intensity for the positive band at 265 nm during the heating ramp, indicative of the disruption of the G4 structure upon thermal denaturation, and a fully reversible profile upon renaturation due to the reorganisation of the G4 structure during the cooling ramp. As expected for a G4 ligand, BRACO-19 stabilises the G4 structure of HIV LTR-II, as the *T*m of the melting curves is shifted towards higher values (~+7 °C; Table 2), although the degree of stabilisation was found to be smaller than previously reported in the literature [37]. All these observations confirm the presence of a G4 structure in HIV LTR-II.
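A *T*m shift such as the one above is read from the midpoint of the melting transition. The study fitted the melting curves by non-linear regression. As a simpler, swapped-in technique, the sketch below estimates *T*m as the temperature at which the min–max normalised CD signal crosses 0.5. It assumes a single two-state transition with flat baselines. The function names and the synthetic curves (*T*m of 60 and 67 °C) are hypothetical and do not reproduce Table 2.

```python
import math


def melting_midpoint(temps, signal):
    """Estimate Tm as the temperature at which the min-max normalised CD signal
    crosses 0.5, by linear interpolation between adjacent readings.

    Assumes one sigmoidal transition with flat pre- and post-transition
    baselines; a simplification, not the study's regression model.
    """
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("no transition: signal is constant")
    frac = [(s - lo) / (hi - lo) for s in signal]
    for k in range(len(temps) - 1):
        f1, f2 = frac[k], frac[k + 1]
        if f1 != f2 and (f1 - 0.5) * (f2 - 0.5) <= 0:
            t1, t2 = temps[k], temps[k + 1]
            return t1 + (0.5 - f1) * (t2 - t1) / (f2 - f1)
    raise ValueError("normalised signal never crosses 0.5")


def synthetic_melt(tm, width=3.0, start=15, stop=95, step=5):
    """Hypothetical two-state melting curve: the CD signal (mdeg) falls on heating."""
    temps = list(range(start, stop + 1, step))
    signal = [2.0 + 10.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]
    return temps, signal


if __name__ == "__main__":
    # Synthetic curves only; these are not the values in Table 2.
    tm_free = melting_midpoint(*synthetic_melt(60.0))
    tm_ligand = melting_midpoint(*synthetic_melt(67.0))
    print(f"Tm without ligand: {tm_free:.1f} °C")
    print(f"Tm with ligand:    {tm_ligand:.1f} °C")
    print(f"Delta Tm:          {tm_ligand - tm_free:+.1f} °C")
```

The same function applied separately to heating and cooling ramps gives the two *T*m values whose difference signals hysteresis.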

**Figure 3.** CD spectra of the oligonucleotides under investigation (2 μM) during the heating ramps of the melting assays. (**A**) HIV LTR-II; (**B**) PQS 113; (**C**) PQS 140; (**D**) PQS 068. The arrows indicate the evolution along the heating ramp from 15 to 95 °C.

**Table 2.** Mid-transition temperatures (*T*m, in °C) for the oligonucleotides under investigation, both in the absence and in the presence of BRACO-19 (10 μM), as determined by CD melting assays.


BRACO-19: *N*,*N*′-(9-{[4-(dimethylamino)phenyl]amino}acridine-3,6-diyl)bis(3-pyrrolidin-1-ylpropanamide).

For B19V, three different oligonucleotides were investigated. Oligo PQS 113 has a sequence matching the most probable PQS in the B19V genome, with the highest G-score. Oligo PQS 140 is a sequence of corresponding length located 5′-proximal to the dyad symmetry, upstream of and partially overlapping with PQS 113, with a low G-score. Oligo PQS 068 is also a sequence of corresponding length, located downstream of PQS 113, with a null G-score. The oligonucleotides PQS 113, PQS 140, and PQS 068 all display the CD profiles of ssDNA in B-form with no clear contribution from G4 structures (Figure 3B–D). CD signatures characteristic of G4 structures were not observed, and the differences among the CD profiles are most probably due to different primary structures [35].

**Figure 4.** CD melting curves for the oligonucleotides under investigation (2 μM), both in the absence and in the presence of BRACO-19 (10 μM). (**A**) HIV LTR-II; (**B**) PQS 113; (**C**) PQS 140; (**D**) PQS 068. Filled squares: heating ramps (15 to 95 °C) in the absence of BRACO-19. Empty squares: cooling ramps (95 to 15 °C) in the absence of BRACO-19. Filled circles: heating ramps (15 to 95 °C) in the presence of BRACO-19. Empty circles: cooling ramps (95 to 15 °C) in the presence of BRACO-19. BRACO-19: *N*,*N*′-(9-{[4-(dimethylamino)phenyl]amino}acridine-3,6-diyl)bis(3-pyrrolidin-1-ylpropanamide).

The CD melting profiles of PQS 113 at 290 nm in the absence of BRACO-19 (Figure 4B) show a broad conformational transition at low temperature after both denaturation and renaturation, revealing a high degree of instability in solution. The large uncertainty of the *T*m value determined on the cooling ramp (Table 2) reflects this instability. The CD melting profiles of PQS 140 and PQS 068 in the absence of BRACO-19 (Figure 4C,D) both display narrower, reversible thermal transitions. In both cases, the temperature-dependent variation in CD response at 285 nm is smaller in magnitude than that of HIV LTR-II. The CD melting curves of the oligonucleotides in the presence of BRACO-19 provide an indication of binding, although the underlying binding mechanisms appear to be quite different from the one observed with HIV LTR-II. Once again, the behaviour of PQS 113 (Figure 4B) is more complex. The shape of its melting curves suggests a possible two-state conformational transition, which is not accurately described by the non-linear regression model used to analyse the melting profiles of G4 structures. In contrast, the melting curves of PQS 140 and PQS 068 (Figure 4C,D) are not reversible, since the *T*m value of the heating ramp is higher than that of the cooling ramp (Table 2). This hysteresis may be explained by slower denaturation and renaturation kinetics in the presence of BRACO-19. Overall, BRACO-19 appears to interact with all the oligonucleotides under investigation, although the binding mechanism is not consistent with the stabilisation of any G4 structures.

## *3.3. PQSs in B19V DNA: Biological Analysis*

To extend the results of the CD studies and investigate a possible role of putative G4 structures in B19V DNA, we tested the biological effects of two reported G4 ligands, BRACO-19 and pyridostatin (PDS) [29], on the virus–cell system. BRACO-19 has been shown to inhibit telomerase activity and to possess antitumour activity, as well as antiviral activity against different viruses in vitro, including HIV-1. PDS is a highly selective G4 DNA-binding small molecule designed to form a complex with G4 structures and stabilise them. It has been shown to strongly stabilise telomeric G4, triggering a DNA-damage response at telomeres. As an antiviral agent, PDS has been used to study the role of G4 in Epstein–Barr virus (EBV). As a model cell system, we used primary EPCs, a heterogeneous cell population that mimics the natural target cells of in vivo infection and is fully permissive to viral replication at the appropriate differentiation stage [5,31]. Effects on cell viability and any possible activity on B19V were assessed in a time course of infection by evaluating any dose-dependent effects of BRACO-19 and PDS.

Effects on cell viability. EPCs were cultured for 48 h at 37 °C in medium containing different concentrations of each compound (0.1–100 μM range), and cell viability was then assessed by a WST-8-based colorimetric assay. Results are reported in Figure 5 and are expressed as percentage viability with respect to control cells incubated without compounds. A reduction in cell viability below 50% of control was observed starting from 10 and 5 μM for BRACO-19 and PDS, respectively. At the higher concentrations of 50 and 100 μM, the metabolic activity of cells was totally inhibited.

**Figure 5.** Percentage viability of erythroid progenitor cells (EPCs) cultured for 48 h in the presence of different concentrations of BRACO-19 (**A**) and pyridostatin (PDS) (**B**). DMSO at 10% was used as a cytotoxicity positive control. Values are expressed as mean percentage compared to the control with medium only. Data were collected from triplicate wells in two different experiments. Statistical analysis was performed by one-way ANOVA (analysis of variance) followed by Dunnett's multiple comparison test. \*\*\* *p* value < 0.001.

Non-linear regression of percentage cell viability against compound concentration yielded 50% cytotoxic concentration (CC50) values of 9.99 μM for BRACO-19 (95% confidence interval: 8.9–11.53 μM; *R*<sup>2</sup> = 0.95) and 4.01 μM for PDS (95% confidence interval: 0.47–15.52 μM; *R*<sup>2</sup> = 0.84).
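A CC50 of this kind comes from fitting a sigmoidal dose–response model to the viability data. The text does not state the exact regression model, so the sketch below makes three assumptions. It uses a two-parameter log-logistic model and fixes the upper and lower plateaus at 100% and 0% viability. It fits that model by exhaustive least-squares grid search; in practice an optimiser such as `scipy.optimize.curve_fit` would replace the grid. The dose series and responses are synthetic, the helper names are hypothetical, and confidence intervals are not computed.

```python
def viability(conc, cc50, hill):
    """Two-parameter log-logistic model: % viability falls from 100 to 0,
    passing through 50% at conc == cc50."""
    return 100.0 / (1.0 + (conc / cc50) ** hill)


def fit_cc50(concs, observed, log10_bounds=(-2.0, 3.0), hill_bounds=(0.5, 4.0), steps=200):
    """Least-squares fit of (cc50, hill) by exhaustive grid search.

    cc50 is searched on a log10 grid (here 0.01 to 1000 uM) and the Hill slope
    on a linear grid; returns the pair with the smallest sum of squared residuals.
    """
    lo, hi = log10_bounds
    h_lo, h_hi = hill_bounds
    best = None  # (sse, cc50, hill)
    for i in range(steps + 1):
        cc50 = 10 ** (lo + (hi - lo) * i / steps)
        for j in range(steps + 1):
            hill = h_lo + (h_hi - h_lo) * j / steps
            sse = sum((y - viability(c, cc50, hill)) ** 2 for c, y in zip(concs, observed))
            if best is None or sse < best[0]:
                best = (sse, cc50, hill)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical dose series spanning 0.1-100 uM with noise-free synthetic
    # responses; these are not the measurements shown in Figure 5.
    doses = [0.1, 0.5, 1.0, 5.0, 10.0, 50.0, 100.0]
    responses = [viability(d, cc50=10.0, hill=2.25) for d in doses]
    cc50, hill = fit_cc50(doses, responses)
    print(f"CC50 = {cc50:.2f} uM, Hill slope = {hill:.2f}")
```

Searching log10(CC50) rather than CC50 keeps the grid evenly spaced across a concentration range spanning several orders of magnitude, matching how dose–response data are usually plotted.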

Antiviral activity against B19V. The effects of BRACO-19 and PDS on B19V were evaluated by quantifying viral replication and expression in a time course of infection. EPCs were infected with B19V at a multiplicity of infection of 10<sup>3</sup> geq/cell and cultured in the absence or presence of each compound at 0.5, 5, and 50 μM. Cells were collected at 2 and 48 h post-infection (hpi). The extent of viral replication and expression was assessed by qPCR and RT-qPCR quantification of viral genomes and total transcripts, respectively, at 48 hpi relative to the 2 hpi baseline (Figure 6).

**Figure 6.** Amount of B19V DNA (**A**) and RNA (**B**) at 2 hpi, before addition of tested compounds, and at 48 hpi in infected cells cultured in the presence of BRACO-19 and PDS at the indicated concentrations (Log copies/10<sup>5</sup> cells). Data were collected from triplicate qPCR and RT-qPCR reactions in duplicate experiments. Statistical analysis was performed by one-way ANOVA (analysis of variance) followed by Dunnett's multiple comparison test among 48 hpi samples. \*\*\* *p* value < 0.001.

Viral DNA increased from 2 to 48 hpi by 2.5 Log on average in the control samples, indicating productive viral replication. Inhibition of viral replication, seen as a significant net reduction in viral DNA from 2 to 48 hpi, was not observed in cells treated with BRACO-19 or PDS. Compared to control samples, partial inhibition of viral replication was evident only at the highest concentrations, with percentage values of 65% and 85% for BRACO-19 and PDS, respectively. Transcription of the viral genome, as determined by the increase in total viral mRNAs from 2 to 48 hpi, was likewise unaffected except at the highest concentrations of BRACO-19 and PDS, with percentage values of 88% and 98%, respectively. The absence of dose-dependent effects on the virus, together with the concomitant and prevalent effects on cell viability, suggests that any marginal antiviral activity of BRACO-19 and PDS is likely due to inhibition of cellular metabolism rather than to a specific inhibitory activity on the virus, whose replication and expression appear unaffected by these G4 ligands.
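The replication readouts above reduce to two simple quantities: the net log increase from 2 to 48 hpi, and the inhibition relative to untreated controls. The text does not state whether the percentage values were computed on linear copy numbers or on log values. This sketch therefore assumes that inhibition is the reduction of the net 2→48 hpi increase in linear copies, relative to untreated controls. The function names and copy numbers are hypothetical.

```python
import math


def net_log_increase(copies_2hpi, copies_48hpi):
    """Net change in log10 copies between the 2 hpi baseline and 48 hpi."""
    return math.log10(copies_48hpi) - math.log10(copies_2hpi)


def percent_inhibition(baseline_2hpi, control_48hpi, treated_48hpi):
    """Reduction of the net 2->48 hpi increase in treated cells, as a percentage
    of the increase in untreated controls (computed on linear copy numbers)."""
    control_gain = control_48hpi - baseline_2hpi
    treated_gain = treated_48hpi - baseline_2hpi
    if control_gain <= 0:
        raise ValueError("controls show no net increase; inhibition is undefined")
    return 100.0 * (1.0 - treated_gain / control_gain)


if __name__ == "__main__":
    # Hypothetical copy numbers per 10^5 cells, chosen for illustration only.
    baseline = 1.0e4
    control = baseline * 10 ** 2.5  # a 2.5 log10 increase, as in the controls
    treated = baseline + 0.35 * (control - baseline)
    print(f"control: +{net_log_increase(baseline, control):.2f} log10")
    print(f"treated: {percent_inhibition(baseline, control, treated):.0f}% inhibition")
```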

## **4. Discussion**

B19V is a virus with distinctive features that attracts interest, not least for the characterisation of its lifecycle and of virus–host interactions [6]. B19V is a widely circulating human pathogenic virus, although its clinical impact is often underestimated, and the development of specific antiviral tools still shows a striking gap. In addition to enhanced diagnostic awareness [15], the development of effective antiviral strategies against B19V should be considered a relevant goal in the field [16]. A better understanding of the viral lifecycle and of virus–cell interactions is required to identify relevant targets for more efficient and specific antiviral strategies. Unconventional DNA structures are increasingly recognised as relevant features in the regulation of critical biological processes. In particular, G-quadruplexes, built from stacked planar guanine tetrads, disrupt the regular double helix structure of DNA. Viruses can include G4-forming sequences in their genomes as part of their interaction network within the cellular environment. In many instances, these structures can provide targets for small-molecule ligands that exert an antiviral effect by interfering with normal viral regulatory pathways [27–29].

The comprehensive survey in the G4-virus database provides an overview of PQS elements in viral genomes. Its stated aim is to expedite research on G-quadruplexes in viruses and to find novel therapeutic opportunities. Among these PQSs, the presence and relevance of G4s as functional elements have been validated in some cases [29]. In other cases they are only predicted and still require experimental evidence, as in the present case for B19V. For the B19V genome, the G4-virus database reports a dispersed distribution of dinucleotide "GG" PQSs that can be considered statistically significant over a random distribution. An independent prediction of G4 structures can also be obtained with the QGRS Mapper, which identifies two high-scoring PQSs within the genomic terminal regions. In our work, we sought to validate these predictions, but neither CD nor biological evidence provided experimental support for them.
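QGRS Mapper ranks candidates with its own G-score, which is not reproduced here. As a swapped-in, simpler technique, the sketch below scans a sequence with a Quadparser-style rule: four runs of at least three guanines, separated by loops of 1 to 7 nucleotides. Runs of G on the given strand are reported as plus-strand hits, and runs of C as minus-strand hits. The patterns, the function name, and the toy sequences are assumptions for illustration. They will not reproduce the G-scores or the exact PQS boundaries reported for B19V.

```python
import re

# Four runs of >=3 G separated by loops of 1-7 nt (Quadparser-style rule).
# A simpler substitute for QGRS Mapper's G-score, not a re-implementation.
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
# A G4 on the minus strand appears as C-runs on the plus strand.
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def find_pqs(seq):
    """Return non-overlapping PQS hits as (start, end, strand) tuples,
    with 0-based, end-exclusive coordinates on the given (plus) strand."""
    s = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(s)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(s)]
    return sorted(hits)


if __name__ == "__main__":
    # Toy inputs: a telomere-like G-rich repeat and a C-rich counterpart.
    print(find_pqs("GGGTTAGGGTTAGGGTTAGGG"))
    print(find_pqs("ATCCCTAACCCTAACCCTAACCCAT"))
```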

In B19V, predicted PQSs are mainly located within the terminal regions (ITRs), which are critically involved in the viral lifecycle in several respects [6]. First, ITRs serve as origins of replication of the viral genome. A palindromic sequence is required to allow strand fold-back to form hairpin structures, which are in turn necessary for priming second-strand synthesis. The sequence asymmetries in the palindrome (flip/flop heterogeneity) are also strictly required [31]. They may induce distortions in the hairpin secondary structure or determine the exact placement of sequence motifs recognised by binding factors. Moreover, the ITRs contain binding motifs for the viral NS protein and several cellular proteins, relevant for both replication and transcription of the viral genome [38–40]. Finally, ITRs have the characteristics of CpG islands and are a possible target for epigenetic modifications such as CpG methylation, which can in turn regulate expression of the viral genome [41]. It should also be mentioned that, within the family *Parvoviridae*, the only evidence of PQSs forming G4 structures has been presented for adeno-associated virus (AAV) ITRs [42], and it is indirect.

Our experimental challenge of the bioinformatic predictions analysed the actual propensity of PQSs in B19V DNA to assume G4 structures and any possible inhibitory activity of G4 ligands on the viral lifecycle. The results did not support the bioinformatic predictions of G4 structures in the B19V genome and did not show any antiviral activity for G4 ligands such as BRACO-19 and PDS. This discrepancy may reflect sequence and symmetry constraints on the B19V genome that prevail over the propensity to form G4 structures. Preservation of the hairpin secondary structures within the ITRs is therefore likely a more stringent functional requirement than the possibility of forming actual G4 structures. The QGRS Mapper is reported to predict G4 structures with high accuracy (>0.95) [26], but for B19V DNA the evidence classifies the predicted PQSs as false positives. Within the ITRs, a high overall GC content may introduce a sequence bias and simply increase the probability of detecting PQS-like signatures with bioinformatic tools. Indeed, predictions reported in the G4-virus database indicate only moderate statistical significance for the presence of dinucleotide PQSs. The patterns identified by the QGRS Mapper are not stringent, although the G-score obtained for the PQS regions in B19V ITRs matches that of a validated G4 structure such as HIV LTR-II. Overall, as determined by both CD and in vitro biological studies, the formation of unconventional DNA structures such as G4 in B19V ITRs, and any relevant biological role for them, are unlikely. Rather, it can be hypothesised that the formation of unconventional structures would interfere with many processes crucial to viral replication without conferring any discernible selective advantage.
Based on the present data, the development of antiviral strategies aimed at perturbing the replicative origins in B19V ITRs cannot include G4 structures as specific targets or G4 ligands as antiviral agents. In this respect, the characteristic combination of hairpin structures and sequence asymmetries appears to be a more relevant feature.
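The GC-bias argument above can be checked informally by simulation. The sketch assumes a Quadparser-style rule: four runs of at least three G (or C, for the minus strand) separated by 1 to 7 nt loops. This rule is not QGRS Mapper scoring. The sketch also assumes random sequences in which G and C are equally frequent, and it counts PQS-like matches per kb at increasing GC content. All names, lengths, and GC values are hypothetical. Under this rule, matches become more frequent as GC content rises, even though the random sequences carry no biological signal.

```python
import random
import re

# Quadparser-style rule (an assumption, not QGRS Mapper's scoring);
# G-runs are counted as plus-strand hits, C-runs as minus-strand hits.
G4_PLUS = re.compile(r"(?:G{3,}[ACGT]{1,7}){3,}G{3,}")
G4_MINUS = re.compile(r"(?:C{3,}[ACGT]{1,7}){3,}C{3,}")


def random_dna(length, gc, rng):
    """Random sequence with G+C fraction `gc`, with G and C equally likely."""
    at = (1.0 - gc) / 2.0
    weights = [at, gc / 2.0, gc / 2.0, at]
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def pqs_per_kb(seq):
    """Non-overlapping PQS-like matches on both strands, per 1000 nt."""
    hits = len(G4_PLUS.findall(seq)) + len(G4_MINUS.findall(seq))
    return 1000.0 * hits / len(seq)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    for gc in (0.4, 0.55, 0.7):
        seq = random_dna(50_000, gc, rng)
        print(f"GC = {gc:.2f}: {pqs_per_kb(seq):.2f} PQS-like matches per kb")
```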

#### **5. Conclusions**

As a concluding remark, our work highlights how the enormous potential for structural or functional predictions provided by bioinformatic tools must be used with caution, and the results must be subjected to critical scrutiny. Computational methods tend to be assertive and additive, especially when aimed at constructing comprehensive databases based on pattern-search algorithms, as in the present case. Experimental validation or falsification of such predictions is required both for a correct understanding of biological systems and for assessing the reliability of computational algorithms. Here, negative experimental evidence is constructive, both to avoid misconceptions and to provide a benchmark for evaluating the performance of computational methods.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1999-4915/12/9/935/s1. Figure S1: Inverted Terminal Regions (ITR) in B19V genome.

**Author Contributions:** Conceptualisation, M.B. and G.G.; data curation, G.B. and D.T.; investigation, G.B., D.T., I.C. and A.R.; methodology, G.B. and D.T.; resources, M.B. and G.G.; supervision, M.B. and G.G.; writing—original draft, G.B. and D.T.; writing—review and editing, M.B. and G.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors. G.B. is a recipient of a post-doctoral fellowship from the Fondazione Luisa Fanti Melloni, University of Bologna.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

## **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article*
