**Long Non-Coding RNA-Ribonucleoprotein Networks in the Post-Transcriptional Control of Gene Expression**

#### **Paola Briata \* and Roberto Gherzi \***

Gene Expression Regulation Laboratory, Ospedale Policlinico San Martino IRCCS, 16132 Genova, Italy

**\*** Correspondence: paola.briata@hsanmartino.it (P.B.); rgherzi@ucsd.edu (R.G.)

Received: 31 August 2020; Accepted: 16 September 2020; Published: 17 September 2020

**Abstract:** Although mammals possess roughly the same number of protein-coding genes as worms, the non-coding transcriptome has clearly become far broader and more sophisticated during evolution. Indeed, the vital regulatory importance of both short and long non-coding RNAs (lncRNAs) has been demonstrated over the last two decades. RNA binding proteins (RBPs) represent approximately 7.5% of all proteins and regulate the fate and function of a huge number of transcripts, thereby helping to ensure cellular homeostasis. Transcriptomic and proteomic studies have revealed that RBP-based complexes often include lncRNAs. This review describes examples of how lncRNA-RBP networks can control virtually all post-transcriptional events in the cell.

**Keywords:** long non-coding RNA; RNA binding protein; post-transcriptional regulation

#### **1. Introduction**

Mammalian genomes are pervasively transcribed. In humans, only about 19,000 proteins are encoded by less than 2% of the genome, yet over the last two decades it has become clear that the vast majority of the genome is transcribed into non-coding RNAs (ncRNAs) [1]. Long non-coding RNAs (lncRNAs), a largely underexplored class of ncRNAs arbitrarily defined as transcripts longer than 200 nucleotides, account for most of this pervasive transcription, and a growing number of lncRNAs have been shown to be functional molecules rather than transcriptional noise [1,2]. They are expressed at different levels in many cell types and tissues, display strong cell- and tissue-specific expression, and are often poorly conserved among species, at least at the level of primary sequence [1,2]. Besides lncRNAs that share genomic features with protein-coding genes, others can be assigned to the following categories: (*i*) intergenic lncRNAs located between protein-coding genes (lincRNAs); (*ii*) natural antisense transcripts; and (*iii*) intronic lncRNAs [1,2]. In general, lncRNAs exhibit a surprisingly wide range of sizes, structural arrangements, and functions, and can be detected in the nucleus and/or the cytoplasm of expressing cells. These features endow them with enormous and diverse functional potential, although they also pose experimental challenges for their analysis [1,2].

Like proteins, lncRNAs participate in virtually all cellular functions through a variety of mechanisms. Their versatility depends on several factors, but mainly on their subcellular localization and on their adoption of specific structural modules that engage interacting partners, a process that may change dynamically in response to the local cellular environment [3]. lncRNAs have been shown to be involved in diverse fundamental cellular processes such as proliferation and apoptosis, development and differentiation, X chromosome inactivation, and genomic imprinting [3]. They have also been implicated in human diseases such as coronary artery disease, amyotrophic lateral sclerosis, and Alzheimer's disease [4–6], as well as in cancer, where they can have either oncogenic or tumor-suppressive functions [7]. LncRNAs can act in *cis* or in *trans* by directly binding to DNA, RNA, or proteins, and can (*i*) influence the function of transcriptional complexes; (*ii*) modulate chromatin structure; (*iii*) regulate genome organization through interaction with nuclear matrix proteins; (*iv*) function as scaffolds for ribonucleoprotein (RNP) complexes; and (*v*) act as decoys for proteins and microRNAs (miRNAs) [2,3]. Thus, lncRNA-mediated control of gene expression may take place at the transcriptional and/or post-transcriptional level [2,3].

In general, lncRNAs interact with RNA-binding proteins (RBPs), conventionally defined as proteins that bind RNA through one or more RNA-binding domains and thereby alter the fate or function of the bound RNAs [8]. A wide range of RBPs has been discovered and investigated over the years; although they regulate gene expression at many levels, they are generally viewed as key players in post-transcriptional events [9,10]. The versatility of their RNA-binding domains, combined with their structural flexibility, enables RBPs to participate in virtually all post-transcriptional regulatory layers in the cell and to control the metabolism of a large array of transcripts [9,10]. RBPs establish highly dynamic interactions with other proteins as well as with coding and non-coding RNAs, creating functional RNPs that regulate pre-mRNA splicing and polyadenylation, as well as mRNA export, stability, localization, and translation [9,10].

Excellent reviews are available on the roles of lncRNAs in transcriptional regulation and genome organization. This review focuses on the different levels of post-transcriptional control exerted by lncRNA-RBP interactions: (*i*) polyadenylation and pre-mRNA splicing, (*ii*) mRNA nuclear export, (*iii*) mRNA decay, (*iv*) translation, (*v*) protein stability, and (*vi*) miRNA maturation from precursors. We will not consider post-transcriptional effects that depend on base pairing between lncRNAs and other RNA species without the involvement of RBPs.

#### **2. LncRNAs, RBPs, and Regulation of pre-mRNA Processing**

In order to produce a mature mRNA that can be efficiently translated into protein, pre-mRNAs require extensive processing, which can be summarized as (*i*) addition of a cap structure at the 5′ end (capping), (*ii*) addition of a stretch of adenosine nucleotides at the 3′ end (polyadenylation), and (*iii*) removal of introns and joining of exons (splicing). In certain circumstances, splicing and polyadenylation can be modulated to generate two or more mRNA isoforms from a single pre-mRNA, through processes defined as alternative polyadenylation (AP) and alternative splicing (AS), which concern more than 90% of intron-containing genes in humans [11,12]. These initial modifications of pre-mRNA molecules (5′-end capping, splicing, and 3′-end formation by cleavage/polyadenylation) occur co-transcriptionally in the nucleus [13]. Indeed, seminal experiments performed in the early 2000s revealed that coupling these early pre-mRNA modifications with RNA polymerase II-dependent transcription accelerates mRNA maturation by several orders of magnitude [13]. One could therefore more accurately refer to these events as co- and post-transcriptional modifications of nascent mRNAs. In recent years, a number of reports have indicated that lncRNAs regulate AS events through three distinct modes: (*i*) interaction with specific splicing factors (SFs) as well as with other SF-associated RBPs; (*ii*) formation of RNA-RNA duplexes with pre-mRNA molecules [2,3]; and (*iii*) induction of chromatin remodeling that indirectly favors the AS of specific genes [2,3]. Here we discuss only the first mode of regulation.

Studies performed by the Chess laboratory in 2007 revealed that two abundant, predominantly nuclear lncRNAs, *MALAT1* (Metastasis Associated Lung Adenocarcinoma Transcript 1) and *NEAT1* (Nuclear Enriched Abundant Transcript 1), are associated with nuclear domains enriched in pre-mRNA splicing factors that are located in the interchromatin regions of the nucleoplasm of mammalian cells (speckles and paraspeckles) [14].

*MALAT1* co-localizes with several transcription factors as well as pre-mRNA processing factors and plays a critical role in coordinating transcriptional and post-transcriptional gene regulation [15]. Numerous RBPs (hnRNPH1, hnRNPK, hnRNPA1, hnRNPL, and PCBP1, to mention a few) are required for the proper localization of *MALAT1* to nuclear speckles [15]. Further, *MALAT1* has been reported to interact with components of the pre-mRNA splicing machinery (RNPS1, SRRM1, and AQR) as well as with a number of RBPs involved in specific AS events (SRSF1, SRSF2, SRSF3, SON, hnRNPC, hnRNPH1, and hnRNPL, among others) [16,17]. Overall, *MALAT1* localizes to hundreds of genomic sites within active genes, modulates the recruitment of splicing factors to a large number of actively transcribed loci, and its silencing severely affects pre-mRNA splicing in cultured cells [17–21]. Moreover, Prasanth and coworkers reported that *MALAT1* modulates the phosphorylation status of the SF SRSF1, reinforcing the notion that this lncRNA acts as a coordinator of pre-mRNA splicing [17] (see also Section 6).

Kingston and coworkers demonstrated that *MALAT1* colocalizes with another abundant lncRNA, *NEAT1*, at many of its chromatin binding sites, although the two lncRNAs display distinct overall binding patterns, suggesting that they exert partly overlapping functions [20]. Interestingly, proteomic experiments revealed that both *MALAT1* and *NEAT1* interact with a common set of proteins that includes the splicing factor ESRP2 and the scaffold protein SAFB2, which is involved in the regulated phosphorylation of SRSF1 by the kinase SAPK1 [20]. *NEAT1* is a strictly nuclear lncRNA and an essential structural component of paraspeckles, nuclear bodies that contain the splicing factors SFPQ and NONO and control different aspects of gene expression [22]. Like *MALAT1*, *NEAT1* has recently been shown to play an important role in modulating AS events. Using a *Neat1* knockout mouse model, the Shelkovnikova laboratory demonstrated that the lncRNA controls the AS of a group of genes important for neuronal proliferation and differentiation, cell–cell interactions in the central nervous system (CNS), synaptogenesis, and axon guidance [23]. Interestingly, *Neat1* also controls the AS of a group of RBPs, including hnRNPA2B1, hnRNPH1, hnRNPD, hnRNPK, SRSF5, and SRSF7 [23]. *Neat1* knockout mice display a phenotype characterized by deficits in social interaction and in the rhythmic patterns of CNS activity [23]. Further evidence for the role of *Neat1* in regulating AS comes from a recent study that demonstrated the interaction of the lncRNA with the multifunctional RBP KHSRP. The *Neat1*-KHSRP complex controls the metastatic process in soft-tissue sarcomas by regulating AS events [24].

Another lncRNA localized to a nuclear compartment enriched in pre-mRNA splicing factors is *Miat* (Myocardial Infarction Associated Transcript, a.k.a. *Gomafu*), which Mattick and coworkers have implicated in the pathogenesis of schizophrenia, a debilitating mental disorder affecting about 1% of the world population [25]. The authors demonstrated that *Miat* can regulate neuronal activity-dependent AS, likely by acting as a scaffold for splicing factors (including SF1, SRSF1, and QKI) [25]. The transient downregulation of *Miat* that occurs upon neuronal depolarization releases these splicing factors, thereby affecting AS events in neuronal cells [25].

A mass spectrometry-based analysis of the molecular partners of *PANDAR* (Promoter Of *CDKN1A* Antisense DNA Damage Activated RNA), a lncRNA involved in the regulation of proliferation and senescence whose overexpression has been observed in several human cancers and correlates with poor survival, revealed an unanticipated function of this lncRNA in modulating AS. Hennig and coworkers demonstrated that the interaction of *PANDAR* with PTBP1, a factor implicated in the regulation of AS, modulates the AS of *BCL2L1* pre-mRNA, which encodes a potent inhibitor of cell death [26]. The authors hypothesize that *PANDAR* acts as a decoy [26]. PTBP1 also interacts with *Pnky*, a neural-specific nuclear lncRNA, and the two regulate the expression and AS of an overlapping set of transcripts [27]. Double knockdown experiments performed in neural stem cells indicate that the RBP and the lncRNA function in the same pathway [27].

The interaction of *LINC01133* with the SF SRSF6 contributes to the ability of this lncRNA to modulate the epithelial-to-mesenchymal transition (EMT) in colorectal cancer [28]. *LINC01133* is an abundant lncRNA that is downregulated when colon cancer cells are treated with TGFβ, a potent inducer of EMT [28]. *LINC01133*-mediated inhibition of SRSF6 function appears to be required for the lncRNA to inhibit EMT [28]. This observation supports the notion, previously reported by our laboratory, that TGFβ induces EMT by modulating the activity of RBPs involved in AS regulation [29].

By investigating the functions of *DSCAM-AS1* (Down Syndrome Cell Adhesion Molecule antisense 1), a lncRNA overexpressed in invasive breast cancers, De Bortoli and coworkers reported that, besides affecting global gene expression and changing the AS of its targets, this lncRNA influences polyadenylation by regulating alternative 3′ UTR usage in 360 genes [30]. These changes in the early steps of post-transcriptional gene regulation appear to depend on the interaction between *DSCAM-AS1* and the nucleoplasm-enriched RBP hnRNPL [30].

#### **3. LncRNAs, RBPs, and Regulation of mRNA Nuclear Export**

Mature (capped, spliced, and polyadenylated) mRNAs rapidly associate with RBPs and, together with various other RNA species (rRNA, tRNA, miRNA precursors, and lncRNAs), are transported as RNPs from the nucleus to the cytoplasm through the nuclear pore complex (NPC) [31]. Although mammalian cells synthesize a multitude of distinct mRNAs, and the composition of each individual RNP is unique and extremely dynamic throughout its life, export of the vast majority of mRNAs relies on a single export receptor, the NXF1-NXT1 heterodimer, which mediates translocation through the NPC [31]. The export receptor is displaced at the cytoplasmic side of the NPC to release the RNPs into the cytoplasm. The directionality of transport is controlled by distinct sets of DEAD-box ATPases that regulate RNP association with and dissociation from the NXF1-NXT1 complex [31,32]. Importantly, mRNA nuclear export can be extensively regulated by a variety of stimuli [32], and this regulation can also contribute to the drug-induced eradication of cancer cells [33].

Recently, Prasanth and coworkers demonstrated that overexpression of a predominantly nuclear lncRNA (*ROCR*, a.k.a. *LINC02095*) promotes breast cancer cell proliferation by facilitating the expression of the oncogenic transcription factor SOX9 [34]. *ROCR* favors both the transcription and the nuclear export of *SOX9* mRNA, and its silencing in breast cancer cells reduces the cytoplasmic levels of *SOX9* mRNA [34]. Interestingly, SOX9 displays strong nuclear localization in highly invasive triple-negative breast cancer cells, as opposed to other breast cancer subtypes [34]. Although the authors demonstrate nuclear retention of *SOX9* mRNA in *ROCR*-depleted cells, they do not explain how the lncRNA affects mRNA export, nor do they identify the RBP(s) that interact with *ROCR* and contribute to its function.

Chromosome translocations may result in the exchange of DNA sequences between genes. Many of the resulting gene fusions are strong driver mutations in neoplasia and have provided fundamental insights into the pathogenetic mechanisms of certain tumors [35]. Chimeric mRNAs arising from such genomic rearrangements must be exported to the cytoplasm in order to be translated into oncogenic fusion proteins [35]. Wang and coworkers recently reported that *MALAT1* is involved in regulating the nuclear export of chimeric mRNAs encoding the oncogenic fusion proteins PML-RARA, MLL-AF9, MLL-ENL, and AML1-ETO [36]. They showed that nuclear export of these chimeric mRNAs depends on *MALAT1* expression levels and proposed a complex regulatory mechanism involving the methylation of mRNAs to form N6-methyladenosine (m6A) [36]. m6A is the most abundant internal modification of mRNA and has emerged as a widespread regulatory mechanism controlling gene expression in diverse physiological processes [37]. RBPs able to catalyze the m6A modification (writers), to recognize it (readers), and to remove it (erasers) have been identified and characterized in recent years [37]. m6A has been reported to enhance mRNA export from the nucleus through the interaction of m6A-modified mRNAs with the reader RBPs YTHDC1 and SRSF3, which function as adaptors for the NXF1-dependent mRNA export pathway [37]. Wang and coworkers provide evidence that *MALAT1*, upon interacting with oncogenic fusion proteins in nuclear speckles, promotes the interaction between the fusion proteins and the m6A methyltransferase cofactor METTL14, thus controlling chimeric mRNA export through the m6A reader YTHDC1 [36]. The results of this study suggest that other lncRNAs, besides *MALAT1*, could provide a platform for the association of m6A readers with specific m6A-modified mRNAs to influence their nuclear export.

#### **4. LncRNAs, RBPs, and Regulation of mRNA Decay**

It is well known that the abundance of an mRNA depends not only on its synthesis, processing, and nuclear export, but also on its rate of degradation in the cytoplasm [38]. mRNA decay is an essential step in gene expression, as it can rapidly set the levels of transcripts available for translation. A multitude of RBPs and/or non-coding RNAs can bind to specific elements of a given mRNA and dictate its degradation rate through their ability to recruit (or exclude) the mRNA degradation machinery, which carries out the complex events of deadenylation, decapping, and degradation of the RNA body [38]. Several cues activate signal transduction pathways that modify the general mRNA decay machinery through specific RBPs, thereby affecting mRNA decay rates and abundance [38]. Below, we describe and discuss examples of lncRNAs that contribute to the regulation of mRNA decay through their interaction with RBPs and, in turn, modulate important cellular functions and crucial pathological events.
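To make explicit why degradation rates matter, consider a deliberately simplified, generic model (an illustrative assumption, not one taken from the studies discussed here) in which an mRNA is synthesized at a constant rate $k_s$ and degraded by first-order kinetics with rate constant $k_d$, starting from an initial level $m_0$:

$$
\frac{dm}{dt} = k_s - k_d\,m
\quad\Longrightarrow\quad
m(t) = m^{*} + \left(m_0 - m^{*}\right)e^{-k_d t},
\qquad
m^{*} = \frac{k_s}{k_d},
\qquad
t_{1/2} = \frac{\ln 2}{k_d}
$$

In this picture, an RBP-lncRNA interaction that halves $k_d$ doubles the steady-state level $m^{*}$ without any change in synthesis, and the time needed to approach a new steady state is itself governed by $k_d$ (each half-life closes half of the remaining gap), so labile transcripts (large $k_d$) respond faster to regulatory changes.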

An important example of a lncRNA-RBP network that operates in the cytoplasm to maintain genomic stability in human cells is based on the lncRNA *NORAD* [39,40]. *NORAD* (non-coding RNA activated by DNA damage) is highly conserved, broadly and abundantly expressed in mammalian cells and tissues, and induced after DNA damage [39,40]. Importantly, inactivation of *NORAD* triggers dramatic aneuploidy in previously karyotypically stable cell lines. In a search for *NORAD*-interacting proteins, Mendell and coworkers found that this lncRNA functions as a multivalent binding platform for the PUMILIO (PUM) family of RBPs, with the capacity to sequester a significant fraction of the cellular pool of PUM1 and PUM2 and, in turn, to limit their ability to repress target mRNAs [39]. RBPs of the PUM family bind with high specificity to sequences in the 3′ UTRs of target mRNAs and stimulate deadenylation and decapping, resulting in accelerated turnover and decreased translation [41]. PUM targets include a large set of factors critical for mitosis, DNA repair, and DNA replication, and their excessive repression in the absence of *NORAD* perturbs accurate chromosome segregation and can induce tetraploidization [39–41]. These findings revealed a lncRNA-dependent mechanism that regulates a highly dosage-sensitive family of RBPs, uncovering a post-transcriptional regulatory axis that maintains genomic stability in mammalian cells and supporting the emerging concept that a major class of lncRNAs functions as molecular decoys. More recently, *NORAD*, whose sequence contains several repetitive units, has been studied to identify additional interacting partners [42]. Ulitsky and coworkers found that the RBP KHDRBS1 (a.k.a. SAM68) binds to *NORAD* and is required for *NORAD* to antagonize PUM [42].
This provides a paradigm for how repeated elements in lncRNAs synergistically contribute to complex tasks and for how a lncRNA can interact with multiple RBPs in order to operate a specific function.
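The decoy behavior described for *NORAD* can be made more intuitive with a toy equilibrium calculation. The sketch below is our illustration only, not an analysis from the cited studies: it assumes that PUM proteins bind independent, identical sites on the lncRNA with a single dissociation constant, ignores competition from target mRNAs, and uses hypothetical, unitless concentrations. The function name `free_fraction` and all parameter values are illustrative assumptions.

```python
import math


def free_fraction(p_total: float, sites_total: float, kd: float) -> float:
    """Fraction of a protein left unbound after equilibrating with a pool of
    independent, identical binding sites (simple 1:1 site-binding model).

    Hypothetical toy model: one dissociation constant `kd` for every site,
    no competing RNA targets, all quantities in the same arbitrary units.
    """
    if p_total <= 0:
        raise ValueError("p_total must be positive")
    # Mass balance plus Kd = P_free * S_free / C gives
    #   C^2 - (P_T + S_T + Kd) * C + P_T * S_T = 0,
    # where C is the amount of bound complex. The physical solution is the
    # smaller root, written here in a numerically stable form.
    b = p_total + sites_total + kd
    disc = b * b - 4.0 * p_total * sites_total  # always >= 0 for these inputs
    bound = 2.0 * p_total * sites_total / (b + math.sqrt(disc))
    return (p_total - bound) / p_total


if __name__ == "__main__":
    P_TOTAL = 100.0        # hypothetical total PUM concentration (arbitrary units)
    KD = 1.0               # hypothetical per-site dissociation constant (same units)
    SITES_PER_COPY = 10    # hypothetical number of PUM sites per lncRNA molecule
    for copies in (0, 2, 5, 10, 20):
        frac = free_fraction(P_TOTAL, copies * SITES_PER_COPY, KD)
        print(f"{copies:>3} decoy copies -> {frac:.1%} of PUM free")
```

In this toy setting, with the dissociation constant well below the protein concentration, the free fraction of PUM falls steeply once the total number of lncRNA binding sites approaches the total amount of protein, which is one simple way to picture how a multivalent lncRNA can titrate a dosage-sensitive RBP.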

Another lncRNA endowed with several distinct functions is *H19* [43]. In a systematic search for regulatory RNA species interacting with the RBP KHSRP in multipotent mesenchymal C2C12 cells, we identified, among others, *H19* [44]. We demonstrated that KHSRP directly interacts with *H19* in the cytoplasm of proliferating, undifferentiated C2C12 cells and that this interaction favors the decay-promoting function of KHSRP on labile transcripts, such as *Myog*, through recruitment of the exosome complex [44]. During C2C12 differentiation, AKT activation induces KHSRP dissociation from *H19*; as a consequence, *Myog* mRNA is stabilized, while KHSRP shuttles to the nucleus, where it promotes the maturation of myogenic miRNAs from their precursors, thus favoring myogenic differentiation (see also Section 7) [44]. In a sense, *H19* can be viewed as a modulator of two important and distinct post-transcriptional regulatory steps that lead to myogenic differentiation.

Recently, we identified a lncRNA expressed in epithelial tissues, which we termed *Epr* (Epithelial cell Program Regulator, a.k.a. *BC030874*). *Epr* is rapidly downregulated by TGF-β, and its sustained expression largely reshapes the transcriptome, favors the acquisition of epithelial traits, and reduces cell proliferation in cultured mammary gland cells as well as in an animal model of orthotopic transplantation [45]. Mechanistically, *Epr* interacts with chromatin and regulates the transcription of several genes [46], including the cyclin-dependent kinase inhibitor *Cdkn1a*. Interestingly, *Epr* changes *Cdkn1a* expression by affecting both its transcription and its mRNA decay, through its association with the transcription factor SMAD3 and with the RBP KHSRP, respectively [45]. In this cellular context, KHSRP acts predominantly as an mRNA decay-promoting factor, and its interaction with *Epr* blocks its ability to induce decay of *Cdkn1a* mRNA.

The lncRNA *LERFS* (Lowly Expressed in Rheumatoid Fibroblast-like Synoviocytes) is expressed at low levels in fibroblast-like synoviocytes (FLSs) derived from patients with rheumatoid arthritis (RA) and regulates the migration, invasion, and proliferation of FLSs through interaction with the RBP SYNCRIP (a.k.a. hnRNPQ) [47]. Under healthy conditions, the *LERFS*-SYNCRIP complex binds to the mRNAs of *RHOA*, *RAC1*, and *CDC42* (small GTPases that control the motility and proliferation of FLSs), decreases the stability and/or translation of these target mRNAs, and downregulates their protein levels [47]. In RA FLSs, decreased *LERFS* levels reduce the amount of *LERFS*-SYNCRIP complex, which in turn reduces the binding of SYNCRIP to the target mRNAs, thus increasing their stability or translation [47]. More specifically, *LERFS* and SYNCRIP regulate both the stability and the translation of *RAC1* mRNA, but only the translation of *RHOA* and *CDC42* mRNAs (see also Section 5) [47]. Overall, these findings suggest that a decrease in synovial *LERFS* may contribute to the synovial aggression and joint destruction that characterize RA, and that targeting *LERFS* may have therapeutic potential in patients with RA.

The lncRNA *UCA1* (Urothelial Carcinoma-Associated 1) has been identified as a target of the CAPERα/TBX3 transcriptional repressor complex, which is required to prevent premature senescence of primary cells, to regulate the activity of core senescence pathways in mouse embryos, and to control cell proliferation by repressing transcription of the *CDKN2A* gene (a.k.a. p16INK4A) and the RB pathway [48]. *UCA1* is a direct target of CAPERα/TBX3-mediated transcriptional repression, and its overexpression is sufficient to induce senescence [48]. In proliferating cells, hnRNPA1 binds and destabilizes *CDKN2A* mRNA, whereas during senescence *UCA1* sequesters hnRNPA1, which in turn stabilizes *CDKN2A* mRNA [48]. Dissociation of the CAPERα/TBX3 co-repressor during oncogenic stress activates *UCA1*, which can therefore be considered a tumor suppressor. See Section 5 for *UCA1*-dependent translational regulation and its opposite outcome in tumorigenesis.

Akiyama and colleagues demonstrated that *MYU* (MYC-Upregulated, a.k.a. *VPS9D1-AS1*) is a lncRNA transcriptionally induced by MYC upon MYC activation by WNT signaling [49]. *MYU* is upregulated in most colon cancers and is required for the tumorigenicity of colon cancer cells. Mechanistically, *MYU* associates with the RBP hnRNPK to stabilize *CDK6* mRNA, thereby promoting the G1-S transition of the cell cycle [49]. The authors also propose that hnRNPK and *MYU* hinder the inhibitory effect of miR-16 on *CDK6* mRNA [49]. Importantly, WNT/MYC/*MYU*-mediated upregulation of CDK6 is essential for cell cycle progression and clonogenicity of colon cancer cells [49].

Another lncRNA playing a role in tumorigenesis is *LINC-ROR* (Regulator of Reprogramming), whose knockout in colon cancer cells suppresses cell proliferation and tumor growth. *LINC-ROR* plays an oncogenic role in part through regulation of *MYC* mRNA expression [50]. The lncRNA interacts with the RBPs PTBP1 (a.k.a. hnRNPI) and hnRNPD (a.k.a. AUF1): it is required for PTBP1 binding to *MYC* mRNA, whereas its interaction with hnRNPD inhibits hnRNPD binding to *MYC* mRNA. As a result, *MYC* mRNA stability increases, leading to enhanced cell proliferation and tumorigenesis [50]. See also Section 5 for *LINC-ROR* functions in translation.

Cao and coworkers demonstrated that miR-1 promotes the IFNG (a.k.a. IFN-γ)-activated innate response of macrophages during *Listeria monocytogenes* infection by increasing the expression of *Stat1* mRNA [51]. Mechanistically, miR-1 targets the lncRNA *Sros1* (Suppressive non-coding RNA of STAT1) for degradation [51]. In uninfected macrophages, *Sros1* blocks the interaction of *Stat1* mRNA with the RBP CAPRIN1, while the *Listeria monocytogenes*-induced degradation of *Sros1* frees CAPRIN1 to bind and stabilize *Stat1* mRNA, thus increasing STAT1 protein levels [51]. This ultimately strengthens IFNG signaling in macrophages and promotes an innate immune response to intracellular bacterial infection.

#### **5. LncRNAs, RBPs, and Translation Regulation**

Translation is a multistep process comprising initiation, elongation, termination, and ribosome recycling [52]. During initiation, the ribosome is recruited to the mRNA and scans the 5′ untranslated region of the transcript for the translation start codon. Under most conditions, initiation is the rate-limiting step of translation and is therefore tightly regulated. Several key signaling pathways, including the mammalian/mechanistic target of rapamycin (mTOR), mitogen-activated protein kinase (MAPK), and integrated stress response (ISR) pathways, converge on the initiation step to control the rate of protein synthesis in response to a variety of stimuli [52]. Control of mRNA translation plays a pivotal role in the regulation of gene expression in embryonic and adult tissues, and defects in translation are deleterious for development and physiology [52]. In recent years, several lncRNAs have been identified as regulators of distinct steps in the translation of their target mRNAs.

The lncRNA *TRERNA1* (Translational Regulatory, a.k.a. *treRNA*) was identified through a genome-wide computational analysis [53]. *TRERNA1* is upregulated in primary breast cancers and lymph node metastasis samples, and its expression stimulates tumor invasion in vitro and metastasis in vivo [53]. The authors found that *TRERNA1* downregulates the expression of the epithelial marker CDH1 (a.k.a. E-cadherin) by suppressing the translation of its mRNA, and identified a novel RNP complex, consisting of the RBPs hnRNPK, FXR1, and FXR2 as well as the splicing factors PUF60 and SF3B3, that is required for *TRERNA1* function [53]. In more detail, the PUF60-SF3B3 dimer interacts with hnRNPK, FXR1, and FXR2 to form a *TRERNA1*-containing RNP complex that, in turn, binds to eIF4G1, thereby affecting translation [53].

Mo and coworkers found that *LINC-ROR* is transcriptionally induced by TP53 (a.k.a. p53) and, at the same time, is a strong negative regulator of TP53-mediated cell cycle arrest and apoptosis [54]. Unlike MDM2, which causes TP53 degradation through the ubiquitin–proteasome pathway, *LINC-ROR* suppresses TP53 translation through direct interaction with the phosphorylated form of the RBP PTBP1 (a.k.a. hnRNPI) in the cytoplasm [54]. This suggests that the *LINC-ROR*-PTBP1-TP53 axis may constitute an additional surveillance network that helps the cell respond to various stresses (see also Section 4 for the role of *LINC-ROR* in mRNA decay control). The same group demonstrated that PTBP1 can also form a functional RNP with the lncRNA *UCA1* and increase *UCA1* RNA stability [55]. In this case too, the phosphorylated form of PTBP1, which is predominantly cytoplasmic, is responsible for the interaction with *UCA1* [55]. The interaction of *UCA1* with PTBP1 suppresses the protein level of CDKN1B (a.k.a. p27KIP1) by competitive inhibition, although the precise mechanism is still unclear. The authors demonstrate that the *UCA1*-PTBP1 complex has an oncogenic role in breast cancer both in vitro and in vivo [55]. See Section 4 for *UCA1*-dependent regulation of mRNA stability and its opposite outcome in tumorigenesis.

*LncMyoD* (a.k.a. *1700025L06Rik*) is a lncRNA whose primary sequence is poorly conserved between human and mouse, whereas its locus, gene structure, and function are preserved [56]. *LncMyoD* is transcribed next to the *Myod* gene and is directly activated by MYOD during myoblast differentiation. Knockdown of *LncMyoD* strongly inhibits terminal muscle differentiation, mainly because cells fail to exit the cell cycle [56]. The authors demonstrate that *LncMyoD* directly binds to the RBP IGF2BP2 (a.k.a. IMP2) and negatively regulates IGF2BP2-mediated translation of proliferation-promoting genes such as NRAS and MYC; loss of this repression upon *LncMyoD* knockdown contributes to the failure of terminal myoblast differentiation [56].

Bozzoni and coworkers described another regulatory circuit controlled by a muscle-specific cytoplasmic lncRNA, *Lnc-Smart* (Skeletal Muscle Regulator of Translation, a.k.a. *Gm14635*), which is essential for proper differentiation of murine myogenic precursors [57]. By direct base pairing with a G-quadruplex region in the *Mlx-*γ mRNA, *Lnc-Smart* prevents translation of this mRNA by counteracting the activity of DHX36, an RBP with RNA helicase function [57]. The time-restricted, specific effect of *Lnc-Smart* on translation of the *Mlx-*γ isoform also modulates the overall subcellular localization of total MLX proteins (isoforms α and β), affecting their transcriptional output and promoting proper myogenesis and mature myotube formation [57]. In more detail, *Lnc-Smart* depletion alters the differentiation program, causing defects in myoblast fusion, while its overexpression produces an apoptotic phenotype. The authors propose that *Lnc-Smart* needs to be precisely controlled in time and quantity to fine-tune the balance between differentiation and apoptosis and ensure proper myogenesis [57].

The lncRNA *BCYRN1* (Brain Cytoplasmic RNA, a.k.a. *BC200*) regulates RNA metabolism in neural cells by modulating local translation in postsynaptic dendritic microdomains, through interaction with components of the translational machinery such as eIF4A, eIF4B, and PABPC1 [58]. Using a yeast three-hybrid screen, Lee and coworkers identified the RBPs hnRNPE1 and hnRNPE2 as *BCYRN1*-interacting proteins. In an in vitro system, hnRNPE1 and hnRNPE2 bind to *BCYRN1* and can rescue *BCYRN1*-dependent inhibition of translation by competing with eIF4A for binding to the lncRNA [58].

#### **6. LncRNAs, RBPs, and Post-Translational Modifications**

Post-translational modifications occur in almost every protein during or after its translation and are a powerful means by which the cell regulates protein activity, stability, localization, interactions, and folding. They involve the covalent attachment of functional groups such as phosphate, acetyl, methyl, and carbohydrate moieties, or of small proteins such as ubiquitin [59]. Different post-translational modifications have distinct effects on their target proteins and result in disparate biological consequences: survival or apoptosis, proliferation or differentiation, activation or quiescence [59].

FUS (Fused in Sarcoma) is a multifunctional RBP that plays essential roles in post-transcriptional gene expression and contributes to RNP granule formation through RNA-dependent self-association [60]. Its ability to interact with multiple RNA species accounts for its multiple functions. FUS (*i*) binds to nascent pre-mRNAs and acts as a molecular mediator between RNA polymerase II and the U1 small nuclear RNA (snRNA)-containing RNP, thereby coupling transcription and splicing; (*ii*) binds to its own pre-mRNA and autoregulates its expression; and (*iii*) promotes homologous recombination during DNA double-strand break repair [60]. Numerous mutations in the *FUS* gene have been identified in patients suffering from two severe neurodegenerative disorders, amyotrophic lateral sclerosis and frontotemporal lobar degeneration [60]. Although the molecular mechanisms of FUS-dependent neurotoxicity are poorly understood, high concentrations of the RBP within RNA granules have been proposed to promote the formation of irreversible pathological aggregates [60]. Two recent papers point to lncRNA-dependent post-translational modifications of FUS as critical mechanisms that affect the cellular concentration and activity of the RBP and, in turn, its cellular functions. Nagai and coworkers reported that silencing of the *Drosophila* lncRNA *hsr*ω shifts FUS from a mono- to a dimethylated arginine status via upregulation of protein arginine methyltransferase 5 (PRMT5) [61]. PRMT5-dependent modification of FUS promotes its proteasomal degradation, strongly reducing its cellular levels. Although FUS regulation by the lncRNA is indirect in this case, it is interesting to note that *hsr*ω interacts with and organizes a number of RBPs, including TARDBP, hnRNPAB, hnRNPA2B1, and FUS itself [61]. Further, the authors showed that an increase in FUS downregulates PRMT5 expression, leading to autoregulatory accumulation of FUS and adding a further layer of complexity to this regulatory mechanism [61].

Wu and coworkers investigated the functions of the lncRNA *RMST* (RhabdoMyosarcoma-associated Transcript), which has been characterized as a tumor suppressor in triple-negative breast cancer as well as a regulator of neuronal differentiation and brain development [62]. The authors reported that FUS and *RMST* interact directly and that *RMST* enhances FUS SUMOylation [62], although they did not provide a mechanistic explanation for this effect. *RMST*-induced SUMOylation is required for the interaction between FUS and hnRNPD, which affects the stability of ATG4D protein, a factor involved in the biogenesis of autophagosomes, the vesicles that deliver cellular material for degradation by autophagy [62]. Altogether, these data suggest that *RMST*-dependent SUMOylation of FUS promotes hnRNPD-mediated stabilization of ATG4D and may thereby affect the autophagic process [62].

The lncRNA *OCC1* (Overexpressed in Colon Carcinoma-1) plays a tumor-suppressive role in colorectal cancer [63]. *OCC1* knockdown promotes cell growth both in vitro and in vivo, largely because *OCC1* inhibits the G0-to-G1 and G1-to-S cell cycle transitions [63]. *OCC1* exerts its function by destabilizing ELAVL1 (a.k.a. HuR), an RBP that stabilizes thousands of transcripts by interacting with their 3′ untranslated regions (UTRs) [64]. *OCC1* enhances the binding of a ubiquitin E3 ligase to ELAVL1, rendering the RBP susceptible to ubiquitination and degradation and thereby reducing the levels of ELAVL1 and, in turn, of its target mRNAs, including mRNAs associated with cancer cell growth [63]. This report is consistent with the earlier observation that ELAVL1 undergoes regulated ubiquitination and proteasomal degradation [64], and it provides an example of a lncRNA that indirectly regulates the stability of a group of mRNAs by modulating the post-translational modification of an RBP [63].

As anticipated in Section 1, *MALAT1* levels affect the ratio between the dephosphorylated and phosphorylated forms of the splicing factor SRSF1 through a mechanism that is not yet fully defined [17].

#### **7. LncRNAs, RBPs, and Maturation of microRNAs from Precursors**

A flood of studies published in the last 20 years has shown that microRNAs (miRNAs) regulate the entire spectrum of cellular functions, and a number of reports have established that miRNA biogenesis is an important regulatory step that controls the cellular levels of miRNAs and, consequently, their functions [65]. The biogenesis of miRNAs involves two distinct enzymatic reactions carried out by distinct multiprotein complexes located in different cellular compartments [65]. First, primary miRNAs (pri-miRNAs) are processed into precursor miRNAs (pre-miRNAs) by the DROSHA-containing complex in the nucleus. Next, the pre-miRNA is exported to the cytoplasm through its interaction with XPO5 (a.k.a. exportin-5) and RAN, and it undergoes a second round of processing catalyzed by the DICER1-containing protein complex. Finally, one strand of the resulting short (21–25 nt) RNA duplex, which corresponds to the mature miRNA, is loaded into the RNA-induced silencing complex (RISC) to exert its mRNA targeting functions [65]. Numerous studies have demonstrated that specific RBPs associate with the enzymatic complexes responsible for miRNA maturation to provide specificity and/or to regulate their activity [65].
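The canonical pathway just outlined can be summarized as an ordered series of steps, each with its substrate, product, machinery, and compartment. The sketch below is a toy bookkeeping model in Python, not a kinetic or mechanistic simulation; the `Step` class and the `trace` function are hypothetical names introduced for illustration, and the export step is represented as a change of compartment with no change of RNA species.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    substrate: str    # RNA species entering the step
    product: str      # RNA species leaving the step
    machinery: str    # complex or factors named in the text
    compartment: str  # where the step takes place


# Canonical miRNA biogenesis, in the order summarized in the text.
BIOGENESIS: List[Step] = [
    Step("pri-miRNA", "pre-miRNA", "DROSHA-containing complex", "nucleus"),
    Step("pre-miRNA", "pre-miRNA", "XPO5 (exportin-5) and RAN", "nucleus -> cytoplasm"),
    Step("pre-miRNA", "miRNA duplex (21-25 nt)", "DICER1-containing complex", "cytoplasm"),
    Step("miRNA duplex (21-25 nt)", "mature miRNA in RISC", "RISC loading of one strand", "cytoplasm"),
]


def trace(start: str, steps: List[Step]) -> List[str]:
    """Walk the pathway from `start`, checking that each step consumes the previous product."""
    current = start
    path = [current]
    for step in steps:
        if step.substrate != current:
            raise ValueError(f"step expects {step.substrate!r}, pathway holds {current!r}")
        current = step.product
        path.append(f"{current} [{step.machinery}; {step.compartment}]")
    return path


for line in trace("pri-miRNA", BIOGENESIS):
    print(line)
```

In this framing, the *Neat1*/NONO-SFPQ circuit described below acts on the DROSHA step, whereas the PTBP1 effect reported for *H19* concerns processing from pre-miRNA precursors.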

Groundbreaking investigations conducted in 2015 by the Filipowicz laboratory demonstrated that, during postnatal development of retinal photoreceptors, the accumulation of mature miR-183/96/182 is delayed compared with that of pri-miR-183/96/182 [66]. The authors identified the lncRNA *Rncr4* (Retinal Non-Coding RNA 4), which is expressed in maturing photoreceptors, as a factor that activates pri-miR-183/96/182 maturation [66]. *Rncr4* modulates the activity of the DEAD-box RNA helicase/ATPase DDX3X, an RBP that potently inhibits pri-miR-183/96/182 maturation in the early phases of postnatal photoreceptor development [66]. The authors observed that photoreceptor-specific DDX3X silencing results in a significant decrease in pri-miRNAs and a strong increase in mature miR-183/96/182 levels in photoreceptors compared with controls [66]. MiR-183/96/182 control the expression of CRB1, a component of the molecular scaffold involved in the formation and integrity of tight junctions between retinal glia and photoreceptors, which controls the proper development of polarity in the eye [66]. Altogether, the study reveals that *Rncr4*-regulated timing of miR-183/96/182 maturation from precursors is essential for the even distribution of cells across retinal layers.

More recently, Portman and coworkers used a different model of organ development, sexual maturation in *Caenorhabditis elegans* (*C. elegans*), to demonstrate the involvement of lncRNA-regulated miRNA maturation during development [67]. Like mammalian LIN28, the *C. elegans* RBP LIN-28 negatively regulates the maturation of let-7 family miRNAs from their pri-miRNAs. Portman and coworkers demonstrated that the lncRNA *lep-5* inhibits LIN-28 function, thus promoting the maturation of let-7, which in turn controls the onset of sexual maturation in the nervous system of the worm [67]. Mechanistically, *lep-5* functions as an RNA scaffold, forming a tripartite complex with LEP-2 (whose mammalian homolog is MKRN1, an E3 ubiquitin ligase that promotes the ubiquitination and proteasomal degradation of target proteins) and LIN-28 to promote LIN-28 degradation [67]. Given the known conservation of regulatory mechanisms across species, Portman and coworkers hypothesized that a yet-unidentified *lep-5*-like lncRNA may exist in mammals and play a key role in sexual maturation [67].

The heterodimeric complex formed by the two RBPs NONO and SFPQ (a.k.a. PSF) has been defined as a prototypical multipurpose molecular scaffold that dynamically mediates a wide range of protein–protein and protein–nucleic acid interactions [68]. Indeed, the NONO-SFPQ complex (*i*) controls pre-mRNA splicing and polyadenylation [68]; (*ii*) plays a role in the nuclear retention of defective RNAs when associated with the nuclear matrix protein MATR3; and (*iii*) promotes DNA double-strand break repair via the canonical non-homologous end joining pathway [68]. Fu and coworkers reported an additional function for the NONO-SFPQ complex by demonstrating its ability to bind to a large number of pri-miRNAs and to globally enhance their processing into pre-miRNAs by the DROSHA complex [69]. Because the NONO-SFPQ heterodimer is involved in paraspeckle formation and integrity, it is not surprising that it interacts with the paraspeckle-enriched lncRNA *Neat1*. The authors also showed that *Neat1* specifically links the NONO-SFPQ heterodimer to the DROSHA complex, thus modulating its enzymatic activity [69].

As discussed in Section 3, the lncRNA *H19* is endowed with remarkably diverse regulatory properties. Wu and coworkers recently reported that *H19* suppresses the expression of PTBP1 in cholestatic mouse livers [70]. The authors observed that PTBP1 and *H19* interact under normal conditions but did not establish the mechanism by which *H19* controls PTBP1 expression [70]. It would be interesting to investigate whether *H19* acts as a scaffold that bridges a putative ubiquitin ligase and PTBP1 to promote PTBP1 degradation, similarly to what *lep-5* does with LIN-28 in *C. elegans* (see above, [67]). The authors report that PTBP1 suppresses the maturation of let-7 family members from their pre-miRNA precursors and suggest that *H19*-dependent PTBP1 downregulation ultimately enhances let-7 levels in cholestatic mouse livers [70].

Our laboratory has reported that *H19* is indirectly implicated in the processing of a specific subset of miRNAs, the so-called myogenic miRNAs, whose enhanced expression contributes to myogenesis and muscle regeneration [44]. Indeed, during myogenic differentiation of multipotent mesenchymal C2C12 cells, AKT-dependent phosphorylation of the RBP KHSRP causes its release from the cytoplasm (where it associates with *H19* to promote the decay of labile mRNAs, including *Myog*; see Section 3) and its translocation to the nucleus, where KHSRP is repurposed to promote the maturation of myogenic pri-miRNAs [44].

#### **8. Take-Home Message**

It is evident from the preceding sections that networks based on lncRNA-RBP interactions are highly versatile tools for the post-transcriptional regulation of gene expression. We have discussed examples of specific lncRNAs that, through interactions with distinct sets of RBPs, regulate complex layers of post-transcriptional control (summarized in Figure 1 and Table 1).

LncRNAs usually display a cell- or tissue-restricted expression while RBPs are more broadly expressed. Thus, a lncRNA can provide a cell- and/or tissue-specific function to an RBP. Further, since the expression levels of lncRNAs can be modulated by extracellular signals and RBP functions can be post-translationally modulated by the same and/or different pathways, the functional outcome of lncRNA-RBP complexes can be tightly controlled in a time- and space-specific manner. This results in a huge regulatory potential.

**Figure 1.** Nuclear and cytoplasmic functions of long non-coding RNA (lncRNA)-RNA binding protein (RBP) networks.

**Table 1.** Summary of the lncRNA-RBP networks described in this review. The Ensembl accession number is provided in parentheses. For *Drosophila* and *C. elegans* lncRNAs, the FlyBase and WormBase accession numbers, respectively, are provided in parentheses.




Many lncRNAs are known to function as molecular decoys, and we have reviewed examples of abundant lncRNAs that exert part of their biological functions through this mechanism (e.g., *MALAT1, NEAT1, H19, NORAD*). However, the generally low abundance of many lncRNAs raises questions about the stoichiometry of their interactions with RBPs, which are usually abundant. Growing evidence points to the functional relevance of specialized membrane-free subcellular compartments, where high overall lncRNA abundance may not be required because local concentration, rather than total copy number, might be the limiting factor. Indeed, ncRNAs have been proposed as mediators of liquid–liquid phase separation through their ability to act as molecular scaffolds for RBP binding, thereby regulating the size and dynamics of membrane-free organelles that carry out biological processes [71]. Phase separation is an emerging paradigm for understanding the spatial and temporal regulation of a variety of cellular processes, and additional studies will be needed to clarify its role in the post-transcriptional regulation of gene expression [72].
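The local-concentration argument can be made concrete with simple arithmetic: the molar concentration of N molecules in a volume V is N/(N_A·V), so confining the same copies to a smaller volume raises their concentration by the ratio of the two volumes. The copy number and volumes below are hypothetical round numbers chosen for illustration, not values from the cited studies, and the sketch assumes that all copies partition into the membrane-free compartments.

```python
# Avogadro constant (exact value since the 2019 SI redefinition).
AVOGADRO = 6.02214076e23  # molecules per mole


def molar_concentration(copies: float, volume_liters: float) -> float:
    """Molar concentration (mol/L) of `copies` molecules in `volume_liters`."""
    return copies / (AVOGADRO * volume_liters)


# Hypothetical round numbers, for illustration only:
copies = 100                 # copies of a low-abundance lncRNA per cell
cell_volume = 2e-12          # an assumed cell volume of ~2 pL, in liters
condensate_volume = 2e-14    # assumed total condensate volume (1% of the cell)

c_cell = molar_concentration(copies, cell_volume)         # copies spread through the cell
c_local = molar_concentration(copies, condensate_volume)  # same copies, all in condensates

print(f"dispersed:     {c_cell * 1e9:.3f} nM")
print(f"concentrated:  {c_local * 1e9:.3f} nM")
print(f"enrichment:    {c_local / c_cell:.0f}-fold")
```

The enrichment equals the volume ratio (here 100-fold) whatever the copy number, which is why partitioning into small compartments, rather than overall abundance, can dominate the effective stoichiometry.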

In conclusion, the complexity of lncRNA-RBP functional networks is often increased by experimental evidence that some post-transcriptional regulatory events occur co-transcriptionally, and by the ability of some lncRNAs to exert both transcriptional and post-transcriptional functions in a coordinated way. Recently developed technologies for analyzing macromolecular complexes that include lncRNAs, chromatin, and RBPs in an "almost-native" state, within distinct cell compartments, will allow researchers to portray at higher resolution the elaborate scenario of interactions described here.

**Author Contributions:** P.B. and R.G. equally contributed to all the steps of this review preparation. Both authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Fondazione AIRC per la Ricerca sul Cancro (IG 21541).

**Acknowledgments:** L. Brondolo (Gene Expression Regulation Laboratory, Ospedale Policlinico San Martino) contributed to the early phase of the bibliographic search.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Endogenous Double-Stranded RNA**

**Shaymaa Sadeq, Surar Al-Hashimi, Carmen M. Cusack and Andreas Werner \***

Biosciences Institute, Medical School, Newcastle University, Newcastle upon Tyne NE2 4HH, UK; s.k.sadeq2@newcastle.ac.uk (S.S.); s.o.t.al-hashimi2@newcastle.ac.uk (S.A.-H.); Carmenmariacusack@gmail.com (C.M.C.)

**\*** Correspondence: andreas.werner@ncl.ac.uk

**Abstract:** The birth of long non-coding RNAs (lncRNAs) is closely associated with the presence and activation of repetitive elements in the genome. The transcription of endogenous retroviruses as well as long and short interspersed elements is not only essential for evolving lncRNAs but is also a significant source of double-stranded RNA (dsRNA). From a lncRNA-centric point of view, the latter is a minor nuisance; in the context of the entire cell, however, dsRNA is a serious threat. A viral infection is associated with cytoplasmic dsRNA, and endogenous RNA hybrids differ from viral dsRNA only by the 5′ cap structure. Hence, a multi-layered defense network is in place that protects cells from viral infections but tolerates endogenous dsRNA structures. A first line of defense is established by compartmentalization: whereas endogenous dsRNA is predominantly confined to the nucleus and the mitochondria, exogenous dsRNA reaches the cytoplasm. There, various sensor proteins recognize features of dsRNA, including the 5′ phosphate group of viral RNAs or hybrids of a particular length, but not specific nucleotide sequences. The sensors trigger cellular stress pathways and innate immunity via interferon signaling but also induce apoptosis via caspase activation. Because of its central role in viral recognition and immune activation, dsRNA sensing is implicated in autoimmune diseases and is exploited to treat cancer.

**Keywords:** double-stranded RNA (dsRNA); innate immunity; repetitive DNA elements (RE); antisense transcript

#### **1. Introduction**

If an endeavor has "Buckley's chance", no one in Melbourne would bet any money on it, as the odds to succeed are close to zero. The phrase "Buckley's chance" refers to William Buckley, an English convict who was deported to Australia. He escaped and lived with an Aboriginal tribe for more than 30 years. The chances of survival were, indeed, very slim for Buckley from the start; he was pursued and shot at when he escaped, and then he had to survive in the scorching Australian summer with little water and no food. Finally, he had to learn to communicate with the Aboriginal people and win their respect. In many ways, the unlikely survival story of William Buckley could stand as a metaphor for the development of spurious transcripts into "functional" long non-coding RNAs (lncRNAs) in a treacherous cellular environment.

The genome of complex organisms is riddled with repetitive sequences related to endogenous retroviruses (ERVs) and DNA transposons. They constitute a large part of the genome (in humans, 50–70% of the genome is repetitive or repeat-derived [1,2]) and are largely responsible for the variation in genome size among complex organisms [3,4]. Although the two classes of transposable elements (ERVs and DNA transposons) can be grouped into superfamilies that are present in all taxa and further into families and subfamilies, particular variants of transposable elements are species-specific.

The vast majority of transposons and retroviruses are inactivated through truncations and point mutations. In humans, only about 100 of the roughly 500,000 L1 retrotransposons are full-length, and fewer than 10 have retained retrotransposition potential [5,6]. Hence, the repetitive, low-complexity part of the genome is often referred to as "junk DNA" [7].

**Citation:** Sadeq, S.; Al-Hashimi, S.; Cusack, C.M.; Werner, A. Endogenous Double-Stranded RNA. *Non-coding RNA* **2021**, *7*, 15. https://doi.org/10.3390/ncrna7010015

Received: 31 January 2021; Accepted: 17 February 2021; Published: 19 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Whether the vast graveyard of transposable elements actually represents "junk", functional elements, or recyclable material is an ongoing scientific debate [8,9]. Two important observations, however, are uncontested and particularly relevant in the context of long non-coding RNAs. First, the insertion of an ERV into the host genome affects transcriptional activity around the insertion site, thus creating pressure to mitigate the overwhelmingly deleterious consequences of the interference [10]. Second, the remnants of transposable elements contain regulatory sequences such as weak promoters, enhancers, or polyadenylation sites; thus, a large proportion of the repetitive genome is transcribed at a very low level [11,12]. In a sense, pervasive transcription may create opportunities to salvage genetic material in the form of long non-coding RNAs [13]. Accordingly, 75% of mature human lncRNA sequences contain an exon originating from transposable elements (TEs) [14,15]. By comparison, the percentage of transcripts with TE material in their 5′ and 3′ untranslated regions (UTRs) is substantially lower: 8.44% in the 5′ UTR and 26.74% in the 3′ UTR [15]. The vast majority of these transcripts are quickly degraded because they lack protective modifications, such as splicing, polyadenylation, and capping, that would also license them for export from the nucleus. Because of their repetitive sequence content and bi-directional transcription, these spurious transcripts are prone to form both intra- and intermolecular double-stranded RNA (dsRNA) structures. Alternatively, association with local cellular components such as chromatin remodeling complexes [16,17] may increase their stability and their chances of escaping degradation [18]. This brief review discusses the former outcome of pervasive transcription, the formation of endogenous dsRNA, which may trigger a cellular antiviral response; the focus is on observations in humans and mice. It aims to draw a bigger picture rather than drill into details.

#### **2. Sources of Endogenous dsRNA**

The detection and quantification of dsRNA require dedicated tools, such as dsRNA-specific antibodies or dsRNA-binding proteins [19,20]. After immunopurification, RNA can be analyzed by high-throughput sequencing or by conventional methods such as cloning or RT-PCR. An alternative strategy to investigate nuclear dsRNA uses adenosine-to-inosine (A-to-I) editing to identify double-strand formation [21,22]. Single- or double-strand-specific RNases in combination with RT-qPCR provide an additional tool to demonstrate RNA hybrids. Unfortunately, RNA purification prior to nuclease treatment introduces a positive or negative bias for dsRNA (depending on the specific methodology), making quantitative assays difficult to interpret [23].

There are three main sources of endogenous dsRNA: mitochondrial transcripts; repetitive nuclear sequences, including short and long interspersed elements (SINEs, LINEs) and endogenous retroviruses (ERVs); and natural sense–antisense transcript pairs.

#### *2.1. Mitochondrial Transcripts*

Human mitochondria have a circular genome of 16,569 bp with a guanine-rich heavy strand and a guanine-poor light strand, named according to their buoyant density. Both strands are transcribed equally, yielding complementary transcripts that may bind to each other. Complementarity encompasses the length of the entire mitochondrial genome, as shown by electron microscopic analysis [24,25]. The mitochondrial DNA encodes 13 protein-coding genes, 12 of which are encoded by the heavy strand and one by the light strand [26]. Under physiological circumstances, light-strand transcripts are rapidly degraded by two enzymes, polynucleotide phosphorylase (PNPase) and the helicase hSUV3 [27]. PNPase is located in the mitochondrial intermembrane space and is thus well placed to play an important role in preventing the escape of dsRNA into the cytoplasm. Mitochondrial RNA is a potent stimulator of the innate immune system, especially in dendritic cells and Toll-like receptor (TLR)-expressing cells [28], via a protein kinase R (PKR)-modulated interferon response. Conversely, inhibition of hSUV3 resulted in an increase in dsRNA without triggering an interferon response, which suggests that the excess dsRNA remained sequestered within the mitochondria [19]. These findings are supported by the observation that knockout of PNPase or SUV3 leads to an accumulation of dsRNA in the cytoplasm and an altered immune response [29]. Moreover, patients with bi-allelic PNPase variants showed increased levels of unprocessed mitochondrial transcripts and enhanced expression of interferon-stimulated genes [30].

Mitochondrial dsRNA formation was also demonstrated using fCLIP-seq, an approach that entails formaldehyde cross-linking of PKR-bound dsRNA followed by high-throughput sequencing. Most of the dsRNA bound to PKR mapped to the mitochondrial genome. The mitochondrial origin of the RNA was corroborated by the lack of A-to-I edited nucleotides, as mitochondrial dsRNA is not subjected to adenosine deaminase acting on RNA (ADAR)-dependent editing [31]. Collectively, these findings established that mitochondria are an important source of dsRNA, which may be released into the cytoplasm upon stress-mediated mitochondrial permeabilization [32].

#### *2.2. Repetitive DNA Sequences*

For dsRNA originating from nuclear DNA, A-to-I editing provides an accurate readout of genome-wide dsRNA formation [33]. In humans, 62.9% of all edited sites map to repeat regions, including SINEs, LINEs, endogenous retroviruses, and DNA transposons, whereas protein-coding transcripts are hardly edited at all (Figure 1). Overall, editing varies markedly between species and depends on the nature of the repetitive elements rather than on the complexity of the organism [33,34].
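Editing works as a readout because reverse transcriptase reads inosine as guanosine, so A-to-I sites appear as A-to-G mismatches between cDNA reads and the reference sequence of the transcribed strand. The minimal Python sketch below flags such positions in a simplified pileup and reports what fraction falls within annotated intervals such as repeats. The function names, thresholds, and toy data are hypothetical; real pipelines must also remove genomic SNPs, sequencing errors, and strand or mapping artifacts.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def candidate_editing_sites(ref: str, pileup: Sequence[str],
                            min_reads: int = 5, min_fraction: float = 0.1) -> List[int]:
    """Return 0-based positions where a reference A is read as G in >= min_fraction of reads.

    `ref` is the reference on the transcribed strand; `pileup[i]` holds the bases
    observed at position i across aligned reads (a simplified pileup).
    """
    sites = []
    for pos, (ref_base, bases) in enumerate(zip(ref.upper(), pileup)):
        if ref_base != "A" or len(bases) < min_reads:
            continue  # only A positions with enough coverage are informative
        g_fraction = Counter(bases.upper())["G"] / len(bases)
        if g_fraction >= min_fraction:
            sites.append(pos)
    return sites


def fraction_in_intervals(sites: Sequence[int], intervals: Sequence[Tuple[int, int]]) -> float:
    """Fraction of sites inside any half-open [start, end) interval (e.g., annotated repeats)."""
    if not sites:
        return 0.0
    inside = sum(1 for s in sites if any(start <= s < end for start, end in intervals))
    return inside / len(sites)


# Toy data: five positions, five reads each.
ref = "GAATA"
pileup = ["GGGGG", "AAAGG", "AAAAA", "TTTTT", "GGGGA"]
sites = candidate_editing_sites(ref, pileup)
print(sites)                                   # [1, 4]
print(fraction_in_intervals(sites, [(3, 6)]))  # 0.5
```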

**Figure 1.** Schematic representation of repetitive elements in the human genome associated with double-stranded RNA (dsRNA) formation. LINE 1 and endogenous retroviruses (ERVs) can give rise to long dsRNA structures formed from convergent transcripts, or to hairpin structures formed by read-through transcription of head-to-head/tail-to-tail arranged elements. Alu elements are much shorter and form hairpin structures as well as "open" dsRNA hybrids, though intermolecular duplexes are rare. Alu elements are the predominant target for adenosine deaminase acting on RNA (ADAR)-mediated adenosine-to-inosine (A-to-I) editing. LTRs function as bi-directional promoters. ORF, open reading frame; GAG (group specific antigen), POL (reverse transcriptase), ENV (envelope protein), retroviral proteins; UTR, untranslated region; LTR, long terminal repeat. Figure created with Biorender.com.

SINEs: The most common sources of dsRNA in human cells are Alu repeats, the most abundant class of short interspersed nuclear elements [35] (Figure 1). Alu elements are approximately 300 nucleotides long and consist of two monomers derived from the 7SL RNA gene, separated by a short A-rich linker and followed by an A-rich tail [36,37].

Alu repeats are commonly found in intergenic regions (autonomous elements) as well as in introns and UTRs of genes (mRNA-embedded elements) [38]. Autonomous Alu elements constitute a small portion of the repetitive genome and are strongly induced by viral infection, heat shock, and cycloheximide treatment [39]. Stress enhances the activity of RNA polymerase III (viral infection) or increases the chromatin accessibility of Alu elements (heat shock), and these effects are reversed upon recovery from stress [40]. Compared with autonomous Alus, embedded Alu elements represent a higher proportion of repeated sequences. Because of their enrichment in UTRs, embedded Alus play important roles in gene expression by contributing to mRNA stabilization, localization, and translation [38,41].

The repetitive nature of Alu insertions allows the formation of predominantly intramolecular dsRNA, which is recognized by the nuclear isoform of ADAR [42,43]. In addition, PKR-fCLIP sequencing showed that more than 20% of PKR-associated dsRNAs derive from Alu repeats [31]. Alu-derived dsRNAs are not long enough to trigger efficient oligomerization and activation of melanoma differentiation-associated gene 5 (MDA5). However, a mutant form of MDA5 that is more tolerant of mismatches in the RNA hybrid has been linked to immune hypersensitivity and autoimmune disease (Aicardi–Goutières syndrome) [44].
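Why inverted repeat copies favor intramolecular dsRNA can be illustrated by looking for stretches of a transcript whose reverse complement occurs further downstream: such pairs can fold back into a hairpin stem. The sketch below is a simplified detector that only finds perfectly complementary k-mers (no mismatches, no G-U wobble pairs, no thermodynamics); the function names and the short toy arms are hypothetical stand-ins for ~300-nt Alu copies. Realistic folding predictions require dedicated tools such as RNAfold from the ViennaRNA Package.

```python
from typing import Dict, List, Tuple

# Watson-Crick complements for RNA (A-U, C-G).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written with A/C/G/U."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_stems(rna: str, k: int = 8, min_loop: int = 3) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where the k-mer at j is the reverse complement of the k-mer at i.

    Each pair is a candidate stem of an intramolecular fold-back (hairpin); `min_loop`
    is the minimum number of unpaired nucleotides required between the two arms.
    """
    rna = rna.upper()
    # Index every k-mer by its start positions.
    positions: Dict[str, List[int]] = {}
    for j in range(len(rna) - k + 1):
        positions.setdefault(rna[j:j + k], []).append(j)
    stems = []
    for i in range(len(rna) - k + 1):
        for j in positions.get(reverse_complement(rna[i:i + k]), []):
            if j >= i + k + min_loop:  # downstream arm, leaving room for the loop
                stems.append((i, j))
    return stems


# Toy transcript: an 8-nt arm, a 4-nt loop, and the inverted (reverse-complement) arm.
arm = "GGCAUCGA"
hairpin = arm + "AAAA" + reverse_complement(arm)
print(inverted_repeat_stems(hairpin))  # [(0, 12)]
```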

ERVs: Human endogenous retroviruses share a structure comparable to that of exogenous retroviruses: the protein-coding genes gag, pro (protease), pol, and env flanked by two long terminal repeats (5′ and 3′ LTRs) (Figure 1). ERVs constitute up to 8% of the human genome; however, most of their open reading frames (ORFs) are mutated [45]. Nevertheless, ERV-related transcripts can be detected in most human tissues [46], particularly when repressive DNA methylation is inhibited. In contrast to the mutated protein-coding genes, ERV-related LTRs have retained their promoter activity and provide alternative transcriptional control elements for cellular genes or drive the production of non-coding cellular RNA [45,47].

LTR promoters are bi-directional and can lead to widespread dsRNA formation [48,49]; alternatively, two adjacent ERVs in opposite orientations can fold back and form a hairpin structure [31]. Although ERVs are not a very common source of dsRNA, the activation of LTR promoters and subsequent dsRNA formation still have significant clinical consequences. For example, transcription of ERVs can be triggered by DNA methyltransferase inhibitors such as azacitidine and decitabine, which demethylate and thereby activate ERV promoters [50]. Induction of ERV expression activates the mitochondrial antiviral signaling protein/interferon regulatory factor (MAVS-IRF) pathway via MDA5 and, to a lesser extent, retinoic acid-inducible gene I (RIG-I). This "viral mimicry" is exploited in the treatment of many cancers, such as melanoma and colorectal carcinoma, by activating an innate immune response against cancer cells [51].

LINEs: Long interspersed nuclear elements (LINEs) are 6–7 kb in size and constitute up to 20% of the human genome. Full-length copies contain two open reading frames (ORF1 and ORF2) that encode proteins essential for retrotransposition [52] (Figure 1). ORF1 encodes a 40-kDa RNA-binding protein (RBP 40) that plays an important role in activating the host innate immune system, while ORF2 encodes a protein with endonuclease and reverse transcriptase activities [53]. Transcription is driven by a promoter that harbors several transcription factor binding sites as well as a CpG island. Most LINEs are inactive because of truncations, mutations, and rearrangements; however, a small number of elements are functional [54].

The exact mechanisms by which LINEs adopt a double-stranded configuration are unknown; some studies hypothesize that they form hairpin structures when two complementary LINEs are present in the same transcript. Alternatively, two LINEs on two different transcripts that lie close to each other can hybridize [55]. This idea is supported by fCLIP sequencing data showing that the distance between two LINEs interacting with PKR is much shorter than the distance between random copies [31]. Furthermore, LINE elements can fold back on their 5′ region, forming stable hairpin structures that are recognized by PKR [20].
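The fCLIP argument compares the distance between PKR-bound LINE pairs with a background of distances between randomly chosen copies. A minimal sketch of that comparison is shown below, under simplifying assumptions: a single chromosome, evenly spaced hypothetical copy coordinates, uniform random pairing, and no correction for mappability or LINE density. All names and coordinates are illustrative and do not come from the cited data set.

```python
import random
from statistics import median
from typing import List, Sequence, Tuple


def pair_distances(pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """Distance in bp between the two members of each pair."""
    return [abs(a - b) for a, b in pairs]


def random_pair_distances(copies: Sequence[int], n: int, seed: int = 0) -> List[int]:
    """Background: distances between n randomly drawn pairs of distinct copies."""
    rng = random.Random(seed)  # seeded for reproducibility
    pool = list(copies)
    distances = []
    for _ in range(n):
        a, b = rng.sample(pool, 2)  # two distinct copies
        distances.append(abs(a - b))
    return distances


def empirical_fraction_as_close(observed: float, background: Sequence[int]) -> float:
    """Fraction of background distances that are <= the observed distance."""
    return sum(d <= observed for d in background) / len(background)


# Hypothetical data: 1,000 LINE copies every 10 kb on one chromosome,
# and three PKR-bound LINE pairs.
copies = list(range(0, 10_000_000, 10_000))
pkr_bound_pairs = [(1_000_000, 1_002_000), (5_000_000, 5_001_500), (7_500_000, 7_503_000)]

observed = median(pair_distances(pkr_bound_pairs))
background = random_pair_distances(copies, n=10_000)
frac = empirical_fraction_as_close(observed, background)
print(observed, frac)
```

With these toy coordinates no random pair is as close as the observed median (2,000 bp), so the returned fraction is 0.0; with real data, the same fraction would serve as an empirical estimate of how unusual the observed proximity is.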

LINEs associate with various dsRNA-binding proteins, mostly PKR and MDA5, and their expression has been linked to the activation of a type I interferon response [31]. Moreover, extensive editing of LINEs by ADAR has been shown using ADAR-CLIP sequencing [56,57]. Although LINEs give rise to only 3% of cellular dsRNA, compared with 67% from SINEs, they are linked to many human diseases [44].

Natural antisense transcripts: According to the GENCODE biotype definition, antisense transcripts are "transcripts that overlap the genomic span (i.e., exon or introns) of a protein-coding locus on the opposite strand". This definition excludes protein-coding antisense transcripts and read-through transcripts from tail-to-tail arranged gene pairs; if those are included, 40–70% of loci show bi-directional transcription [58,59]. Hence, if a sense/antisense transcript pair is co-expressed in the same cell, dsRNA structures can potentially form (Figure 2). To what extent hybridization actually occurs is controversial and challenging to demonstrate experimentally.
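The GENCODE definition quoted above is essentially an interval test: a candidate transcript overlaps a protein-coding locus anywhere in its span (exon or intron) and lies on the opposite strand. The sketch below encodes that test for 1-based, inclusive coordinates, as used in GTF files; the `Transcript` class, gene names, and coordinates are hypothetical. It says nothing about co-expression or actual hybridization, which, as noted, is the hard part to demonstrate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int   # 1-based, inclusive (GTF convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if the two genomic spans share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def antisense_partners(candidate: Transcript, coding_loci: List[Transcript]) -> List[str]:
    """Protein-coding loci whose span (exons or introns) the candidate overlaps on the opposite strand."""
    return [locus.name for locus in coding_loci
            if overlaps(candidate, locus) and candidate.strand != locus.strand]


# Hypothetical loci and candidate transcript.
loci = [
    Transcript("geneA", "chr1", 1_000, 5_000, "+"),
    Transcript("geneB", "chr1", 8_000, 9_000, "-"),
]
as1 = Transcript("as1", "chr1", 4_500, 8_200, "-")
print(antisense_partners(as1, loci))  # ['geneA']
```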

**Figure 2.** Double-stranded RNA (dsRNA) formation from sense–antisense transcripts. Natural antisense transcripts are processed and potentially reach the cytoplasm, where they interact with the sense transcript. In somatic cells, the level of sense–antisense hybrids is low, and there is no evidence, for example, of ADAR editing or of dsRNA-triggered immune signaling. Depending on the cellular context, the dsRNA can potentially trigger various mechanisms (RNA interference, RNA masking, RNA editing and dsRNA signaling). In male germ cells and during early embryogenesis, sense–antisense dsRNA formation may play a general, system-relevant role. Figure created with Biorender.com.

Before the genomics era, natural antisense transcripts were studied in the context of parental imprinting. Early ground-breaking work demonstrated that expression of the antisense transcript was associated with silencing of the related sense transcript on the same allele. Experimental silencing of the antisense transcript (Airn or Kcnq1ot1, for example) abolished parental imprinting and led to bi-allelic expression of the entire cluster, not only of the complementary gene [60,61]. Similar observations were made with non-imprinted genes: in a patient with α-thalassemia, a genomic deletion placed the constitutively active LUC7L (Putative RNA-Binding Protein Luc7-Like) gene directly downstream of the HBA2 (Hemoglobin Subunit Alpha 2) gene. The ectopic expression of LUC7L produced an antisense transcript complementary to HBA2, causing hypermethylation of its CpG-rich promoter and transcriptional silencing of the gene [62]. Likewise, the promoter of the tumor suppressor gene p15 (CDKN2B, Cyclin Dependent Kinase Inhibitor 2B) is hypermethylated and silenced in various tumors, in association with expression of the antisense transcript p15-AS (CDKN2B-AS1) [63]. Silencing was found to be independent of Dicer, and the fact that the entire CDKN2B gene is embedded in an intron of CDKN2B-AS1 argues against a role for dsRNA formation in this antisense transcript-mediated regulatory mechanism [63,64].

On the other hand, evidence that antisense transcription results in dsRNA formation is increasing, from both genomics studies and examples of specific sense–antisense transcript pairs. Early genome-wide studies of natural antisense transcript expression identified complementary full-length transcripts and expressed sequence tags by searching entire sequence repositories [65,66]. The formation of dsRNA was inferred from the observation that natural antisense transcripts are significantly under-represented on the X chromosome of both humans and mice, whereas no such bias was found for sense–antisense pairs that lacked exonic complementarity [65,66]. This suggests that dsRNA formation between processed transcripts is subject to evolutionary selection, positive on autosomes (where such pairs accumulate) and negative on the X chromosome (where they are depleted). The proposed consequences of dsRNA formation in the context of antisense transcription include RNA masking, RNA editing, RNA interference and stimulation of an innate immune response [67]. RNA masking is generally associated with concordant expression of sense and antisense transcripts, often because the antisense transcript interferes with the inhibitory action of miRNAs [68,69]. The latter three mechanisms (RNA editing, RNA interference and immune response) induce discordant expression of the sense–antisense transcript pair ("antisense inhibits sense") (Figure 2).

There is a steadily increasing number of reports on specific sense–antisense pairs in which dsRNA formation is implicated in a regulatory interaction between the two transcripts. In line with the proposed mechanisms, both concordant and discordant expression of the complementary transcripts has been observed [59]. A stimulatory interaction described in detail is the interplay between the transcript for β-secretase 1 (BACE1) and its natural antisense transcript (BACE1-AS) in the pathophysiology of Alzheimer's disease. The antisense transcript protects BACE1 mRNA from miR-485-5p-induced degradation, and the resulting increase in β-secretase raises the production of β-amyloid 1–42. Consistent with this mechanism, BACE1-AS levels were elevated in patients with Alzheimer's disease [70,71]. Other selected examples of antisense transcripts masking miRNA binding sites are listed in Piatek et al. [72]. Natural antisense transcripts can also stabilize the sense transcript by blocking the binding of RNA decay-promoting factors [73]. This mode of action is exemplified by the interaction between the tumor suppressor gene PDCD4 (Programmed Cell Death 4) and its antisense transcript PDCD4-AS1 in mammary epithelial cells. The antisense transcript blocks the binding of human antigen R (HuR), thereby stabilizing the sense mRNA and increasing PDCD4 expression [73]. Accordingly, PDCD4-AS1 expression is decreased in breast cancer patients and is low in mammary epithelial cells.

The mechanisms that lead to degradation of the sense transcript generate specific products that can be assessed experimentally at a large scale, namely A-to-I conversions (editing), short RNAs (RNA interference) and RNAs bound to protein kinase R (detectable by sequencing). However, only limited evidence supports the involvement of these mechanisms in processing RNA hybrids between genic sense and antisense transcripts, at least in specific experimental contexts [74,75]. In a few cases, the involvement of Dicer or ADAR has been tested experimentally for specific bi-directionally transcribed loci, including the gene pairs glutaminase (GLS)/GLS-AS and the sodium/phosphate co-transporter with a read-through transcript from profilin 3 (Slc34a1/Pfn3) [74,76]. In patients with pancreatic cancer, low levels of GLS-AS and enhanced expression of GLS predict a poor clinical outcome. The underlying mechanism was investigated in PANC-1 cells (human pancreatic cancer cell line 1): dsRNA formation occurs in the nucleus, and both ADAR and Dicer can process the hybrid, resulting in a decrease in GLS sense mRNA and encoded glutaminase. Enhanced levels of glutaminase are observed under nutrient stress and are related to tumorigenesis [74]. For the Slc34a1/Pfn3 locus, there is little evidence that the antisense transcript is involved in the physiological regulation of the Na-phosphate cotransporter. Depending on the model system, both RNA interference and transcriptional interference can be observed. The fact that both transcripts are lowly expressed in testis may indicate that the sense–antisense interaction is biologically relevant in male germ cells, where the vast majority of natural antisense transcripts are expressed [76].

Despite the ever-increasing number of mechanistically established sense–antisense interactions, there is still a huge gap between the number of characterized examples and the thousands of sense–antisense gene pairs. A set of recent articles has revived the idea that natural antisense transcripts and potential dsRNA formation feed into one or more common mechanisms that merit selection, as suggested by the X-chromosome bias or, more generally, by the weak evolutionary conservation of sense–antisense arrangements [77].

A preprint by S. Pillay investigated the role of natural antisense transcript expression during early zebrafish embryogenesis and divided the RNAs into two groups with negative and positive correlation, respectively, with sense transcript abundance [78]. Positively correlated transcripts are predominantly associated with housekeeping genes, whereas transcripts with discordant expression are maternally expressed and complementary to developmental genes. Based on the finding that the discordantly regulated transcripts were enriched in the cytosol, the authors speculate that these natural antisense transcripts act similarly to miRNAs to silence ectopic expression of developmental genes [78]. Another study in our own laboratory focused on dsRNA formation in mouse testis, using enrichment of dsRNA with the J2 antibody followed by deep sequencing. We found that dsRNA was predominantly present in pachytene spermatocytes and that the dsRNA transcriptome in testis was fundamentally different from that in somatic liver cells. In both tissues, dsRNA was derived from mitochondrial transcription, although mRNA-related signals were clearly more abundant in testis than in liver. Moreover, we could establish an association between dsRNA, antisense genes and endogenous siRNAs (small interfering RNAs); again, the link was weaker or insignificant in liver cells (Werner et al., under revision). Importantly, both investigations focused on native tissues and cells: early zebrafish embryos and developing male germ cells, respectively. Both systems display low levels of DNA methylation [79,80] and transcriptional activity that is distinct from that of "normal" somatic cells. Moreover, male germ cells in the testis are immune-privileged and tolerate dsRNA without activating innate immunity [81]. It is intriguing to speculate that natural antisense transcripts and dsRNA formation help mitigate the consequences of these genome-wide transcriptional changes. The findings in zebrafish and mouse testis also suggest that dsRNA may have a fundamentally different impact in somatic cells.

The different handling of dsRNA in germ cells versus somatic cells has been corroborated experimentally using transgenic mice expressing a construct with a long hairpin in the 3′ UTR. In mouse oocytes, the dsRNA was processed into siRNAs, whereas in somatic cells a small fraction was A-to-I edited. An interferon (IFN) response was observed only after high-level expression of the hairpin construct in a transfected human cell line (HEK293) [82]. A germ cell-specific biological role of dsRNA and endo-siRNAs is also supported by siRNA sequencing of both female and male germ cells [83,84].

#### **3. Proteins Binding dsRNA**

dsRNA adopts an A-form duplex with a narrow major groove (4 Å wide) and a wide minor groove (10–11 Å wide). As a consequence, dsRNA-binding proteins are generally unable to form base pair-specific interactions and recognize the backbone rather than sequence motifs [85]. However, a few proteins, such as ADAR2 or STAUFEN, recognize specific base pairs in the minor groove of the duplex [86,87]. Moreover, additional features such as the 5′ cap or RNA base modifications affect the binding of dsRNA-binding proteins and help distinguish viral RNA from endogenous RNA hybrids. The dsRNA-binding protein families include RIG-I-like receptors (RLRs), PKR, ADAR, oligoadenylate synthetase (OAS), Dicer, Drosha and other helicases [20]. We focus here on the dsRNA-binding proteins that link pathogenic dsRNA formation to the immune system (Figure 3). As part of the host defense against invading pathogens, these dsRNA sensors are also linked to a number of inflammatory and autoimmune diseases [88,89].

**Figure 3.** Double-stranded RNA (dsRNA) sensor proteins and activation of innate immunity. Viral dsRNA (including RNA carrying a 5′ triphosphate) or dsRNA from mitochondria and repetitive elements in the cytoplasm is recognized by the dsRNA sensors retinoic acid-inducible gene I (RIG1), melanoma differentiation-associated gene 5 (MDA5), protein kinase R (PKR) and ADAR. RIG1 requires the 5′ triphosphate group to initiate oligomerization, and MDA5 forms long dsRNA-dependent polymers. Both structures induce mitochondrial antiviral signaling (MAVS) polymerization and, eventually, caspase and interferon signaling. PKR binds short dsRNA molecules, dimerizes and becomes activated through autophosphorylation. Activated PKR dissociates from dsRNA, phosphorylates eukaryotic initiation factor 2α (eIF2α), which in turn inhibits translation globally, and triggers an interferon response. ADAR is present in both the nucleus and cytoplasm and antagonizes dsRNA signaling by melting the RNA hybrid. Figure created with Biorender.com.

RIG-I-like receptors (RLRs): The protein family of retinoic acid-inducible gene I-like receptors (RLRs), also called cytosolic RNA sensors, includes RIG1, MDA5 and laboratory of genetics and physiology 2 (LGP2). LGP2 lacks the two caspase recruitment domains that are essential for downstream signaling; consequently, it plays a regulatory rather than an effector role in the dsRNA response [90]. The two main sensors that trigger a dsRNA inflammatory response are RIG1 and MDA5, which are briefly introduced here [91].

RIG1 and MDA5 are members of the DExD/H box helicase family and contain five protein domains: from the N terminus, two caspase recruitment domains (CARDs), which participate in antiviral signaling, a DEAD-like helicase superfamily ATP-binding domain (DExDc), a helicase domain (HELICc) and a zinc-binding C-terminal domain [92]. In the non-signaling state, the two N-terminal CARDs are auto-repressed and unable to bind to the mitochondrial antiviral signaling (MAVS) protein, which is involved in the cellular innate antiviral defense. Binding of dsRNA via the helicase and C-terminal domains releases the CARDs and relieves this auto-repression [93].

RIG1 and MDA5 share the same signaling pathway but recognize distinct classes of dsRNA. RIG1 dimerization requires only a 300-bp duplex but depends on a 5′ triphosphate group at the RNA end [94]. Nascent transcripts carry a 5′ triphosphate, but in most cytosolic eukaryotic mRNAs this end is capped with 7-methylguanosine; viral RNA usually lacks this cap. Recognition of dsRNA by MDA5 does not depend on the triphosphate group but requires a longer stretch of dsRNA (500–1000 bp) to activate nucleation and filament assembly [95].

RIG1 and MDA5 activation leads to oligomerization of the CARD domains, which, in turn, provides a platform for the generation of MAVS filaments at the mitochondrial membrane [96]. This triggers two main cascades: one activates nuclear factor κB and the transcription of proinflammatory genes, and the other leads to phosphorylation of interferon regulatory factors 3/7 (IRF3/7) and stimulation of interferon gene expression [97].

Protein kinase regulated by RNA: PKR, also referred to as eukaryotic translation initiation factor 2-alpha kinase 2 (EIF2AK2), is activated by binding to dsRNA, and its expression is induced by interferon [98]. PKR includes two N-terminal RNA-binding motifs (RI and RII) and a catalytic kinase domain at the C-terminus [99]. The dsRNA-binding domains can interact with adjacent minor grooves of dsRNA by binding to the phosphate and ribose backbone independently of the base sequence [100].

Activation of the enzymatic activity of PKR requires an RNA duplex of at least 33 bp. Activation efficiency increases with duplex length up to 85 bp and decreases with longer duplexes or at high dsRNA concentrations, because of a dilution effect that reduces the chances of PKR dimerization [101]. PKR recognizes all types of dsRNA, but the majority of PKR was found bound to dsRNA of mitochondrial origin, followed by IRAlus (inverted-repeat Alu elements, 20%) [31].
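The dilution effect described above can be made concrete with a deliberately crude occupancy model. The sketch below is a hypothetical toy, not a model taken from the cited work: it assumes that every PKR molecule binds exactly one duplex chosen uniformly at random, that each duplex can hold any number of PKR molecules, and that dimerization is possible only between PKR molecules on the same duplex. It ignores duplex length, binding affinity and kinetics, so it illustrates only the concentration side of the argument. The function name `fraction_paired` is illustrative.

```python
def fraction_paired(n_pkr: int, n_dsrna: int) -> float:
    """Expected fraction of PKR molecules sharing a duplex with at least one other PKR.

    Hypothetical toy model: each PKR binds exactly one duplex, chosen uniformly at
    random and independently; duplexes have unlimited capacity; length, affinity
    and kinetics are ignored.
    """
    if n_pkr < 1 or n_dsrna < 1:
        raise ValueError("need at least one PKR molecule and one duplex")
    # Probability that none of the other n_pkr - 1 molecules lands on the
    # same duplex as a given PKR molecule (i.e., it has no dimerization partner).
    p_alone = (1.0 - 1.0 / n_dsrna) ** (n_pkr - 1)
    return 1.0 - p_alone


if __name__ == "__main__":
    n_pkr = 100  # hypothetical, fixed amount of PKR
    # More duplexes (higher dsRNA concentration) spread PKR more thinly.
    for n_dsrna in (1, 10, 50, 100, 1000):
        print(f"{n_dsrna:>5} duplexes: {fraction_paired(n_pkr, n_dsrna):.3f} of PKR paired")
```

Holding the amount of PKR fixed and increasing the number of duplexes lowers the expected fraction of PKR that shares a duplex with a potential dimerization partner, which is the qualitative trend the dilution argument predicts. The closed form follows because each of the other n_pkr - 1 molecules independently misses a given duplex with probability 1 - 1/n_dsrna.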

Binding of PKR to dsRNA induces a conformational change that displaces the inhibitory dsRNA-binding domain from the catalytic kinase domain [102]. Moreover, homodimerization results in auto-phosphorylation and activation of PKR. The activated kinase dissociates from dsRNA, phosphorylates eukaryotic translation initiation factor 2α (eIF2α) at serine 51 and triggers a global translational shut-down [103]. Alternatively, PKR phosphorylation may lead to FADD (Fas-associated protein with death domain)/caspase 8-mediated activation of caspases 3/7 and, ultimately, apoptosis [104,105].

Adenosine deaminase acting on RNA (ADAR): Members of the ADAR protein family catalyze the conversion of adenosine (A) to inosine (I) in dsRNA. In humans, there are three ADAR genes, ADAR1, ADAR2 and ADAR3, of which ADAR1 is interferon-inducible [106,107]. All three ADAR proteins contain two or three dsRNA-binding domains and a C-terminal deaminase domain. Moreover, ADAR1 has one or two N-terminal Z-DNA-binding domains, and ADAR3 contains an arginine-rich region [108].

Transcription of ADAR1 is driven by interferon-inducible and constitutively active promoters [109]. ADAR1 is ubiquitously expressed in human tissues and predominantly targets dsRNA formed by IRAlus in the 3′ UTRs of mRNAs; around 97.7% of editing occurs in non-protein-coding regions [110,111]. ADAR2 expression is highest in the brain, where it is directly linked to site-specific base changes in neurotransmitter receptor transcripts with functional and phenotypic consequences [112,113]. Additional targets have been identified in the brain and other tissues, but the consequences of their editing are less well established [114]. ADAR2 accounts for 25% of the editing at non-repetitive sites in protein-coding transcripts [111]. ADAR3 is expressed exclusively in the brain; the enzyme lacks catalytic activity, and its main role appears to be the inhibition of ADAR2 by competition for dsRNA binding [115].

ADAR antagonizes apoptosis by counter-balancing the activation of dsRNA sensors and the stimulation of inflammatory and apoptotic signaling [116]. In a negative feedback mechanism, interferon induces ADAR, which binds to and melts dsRNA, thus competing with other dsRNA sensors [117]. Despite the compartmentalization of dsRNA and various other strategies to distinguish intrinsic dsRNA from viral intruders, several pathologies with an underlying inflammatory phenotype are potentially linked to endogenous dsRNA. Two examples in which dsRNA plays a role in disease development but also offers treatment avenues are cancer and autoimmune diseases.

#### **4. Physiological and Pathophysiological Roles of dsRNA**

Apart from stimulating an antiviral response, there is growing evidence to suggest that dsRNA contributes to physiological cell growth and function, depending on the length, abundance and location of dsRNA within the cell [118,119]. In this context, the activation of PKR and downstream interferon signaling as well as TLR3 activation by cytoplasmic long dsRNA are particularly relevant [118].

PKR is ubiquitously expressed and is present in mitochondria as well as in the cytoplasm in its unphosphorylated, inactive form; its physiological role extends beyond the antiviral response [31,120]. PKR activation is strictly regulated during mitosis, and its activity is essential for proper cell division. Because the nuclear structure is disrupted during mitosis, IRAlus escape compartmentalization and activate PKR. As a consequence, eukaryotic initiation factor 2α (eIF2α) becomes phosphorylated, with subsequent suppression of global translation [121]. Inhibiting PKR by RNA interference or by expressing a trans-dominant-negative mutant alleviates translational suppression during M phase and leads to dysregulation of several mitotic factors (cyclins A and B and polo-like kinase 1). The reduced phosphorylation of histone H3 and the stabilization of G2-specific cell cycle regulators delay progression from G2 to M phase [121]. Activated PKR also induces phosphorylation of p53, a tumor-suppressor protein with a pivotal role in controlling the cell cycle and apoptosis, which leads to a 25–35% increase in cells arrested in G1. On the other hand, a reduction in PKR expression by doxorubicin decreases p53 stability [122,123].

Wound-induced hair neogenesis (WIHN) is a rare example of adult organogenesis in which dsRNA plays a central role [124]. Activation of TLR3 by endogenous dsRNA is essential for wound healing and hair regeneration. Full-thickness wounds in mice release dsRNA from damaged skin, which activates TLR3 and triggers downstream signaling via interleukin 6 and STAT3 (signal transducer and activator of transcription 3), promoting hair neogenesis. Moreover, activated TLR3 induces intrinsic synthesis of retinoic acid (RA), which orchestrates the growth and regeneration of skin appendages [125,126]. Injection of poly(I:C), a dsRNA analogue, into wounded mouse skin results in a significant increase in new hair formation, whereas TLR3-deficient mice fail to generate new hair upon skin wounding [124,126]. Furthermore, human skin biopsies taken after rejuvenation laser treatment display increased endogenous RA synthesis and enhanced gene expression signatures for dsRNA and RA [125].

Endogenous dsRNA and autoimmune diseases: Autoimmune diseases are pathologies in which the immune system mistakenly attacks healthy cells. Around 50% of autoimmune diseases are of unknown etiology, while others are attributed to genetic predisposition or to hormonal and environmental factors [127]. A contribution of dsRNA to autoimmune diseases was inferred by Schur and colleagues, who detected antibodies against dsRNA in the sera of 51% of patients with systemic lupus erythematosus (SLE) and 9% of patients with rheumatoid arthritis, compared with 6% of healthy individuals [128]. Elevated interferon levels and enhanced expression of IFN-stimulated genes in the blood of SLE patients have been shown more recently [129–131]. Furthermore, the presence of anti-MDA5 antibodies in patients with dermatomyositis is considered a prognostic marker associated with a high death rate due to interstitial lung disease [132].

Myasthenia gravis is an autoimmune disease characterized by auto-antibodies against the acetylcholine receptor (AChR). Injection of poly(I:C) in mice stimulates expression of αAChR via TLR3 and PKR activation. Accordingly, the expression of TLR3, PKR, IRF7, IRF5 and IFN-β is upregulated in the thymus of patients with myasthenia gravis, indicating an important role of dsRNA signaling in the disease etiology [133]. PKR, MDA5 and RIG1 expression is increased in psoriatic lesional skin, paralleled by high IFNα levels [134]. IFNα treatment for hepatitis C virus infection is well known to trigger autoimmune diseases such as psoriasis, antiphospholipid syndrome or sarcoidosis, highlighting the contribution of innate immunity to the pathogenesis of these diseases [135,136].

A-to-I RNA editing enhances transcriptome and protein diversity; however, editing in protein-coding regions generates auto-antigens and potentially causes or aggravates autoimmune diseases. Accordingly, increased editing was observed in SLE and rheumatoid arthritis [137,138]. In contrast, psoriatic lesional skin shows a global reduction in A-to-I editing and an accumulation of dsRNA that feeds into an antiviral response, highlighting the fine balance between the protective and detrimental consequences of dsRNA signaling.

dsRNA in cancer: Somatic mutations and escape from immune surveillance are essential steps in tumor initiation and progression. Recent studies have highlighted RNA mutations as an additional driver of malignant transformation, with RNA editing being a major cause of the underlying sequence changes. Adenosine-to-inosine changes introduced into dsRNA by ADAR can give rise to transcriptomic alterations via point mutations, alternative splicing, altered RNA targeting and defects in microRNA synthesis [139]. Accordingly, many cancer types, such as liver and breast cancer as well as some gastrointestinal malignancies, express high levels of ADAR, which also promotes cancer growth and metastasis [140].

Although both ADAR1 and ADAR2 are linked to tumorigenesis, ADAR1 appears to play the major role owing to its ubiquitous expression [139]. ADAR1 expression is stimulated by interferon as a negative feedback to control inflammation and cell survival, potentially also promoting tumor growth and invasiveness [141,142]. ADAR1 has been found to edit disease-relevant transcripts in a number of cancers [143]. For example, in prostate cancer, A-to-I editing of the androgen receptor transcript affects the interaction of the receptor with androgens and androgen antagonists, resulting in reactivation of androgen signaling, tumor development and growth [144]. In hepatocellular carcinoma, increased ADAR levels lead to editing of Antizyme Inhibitor 1 (AZIN1) and, consequently, enhanced nuclear import of the edited protein and a stabilized interaction with its binding partner (Antizyme). The reduced inhibitory potential of the complex promotes tumor formation and is associated with aggressive behavior [145] (for a comprehensive review, see [143]).

dsRNA cancer therapies: There is a relation between autoimmune diseases and cancer; for example, long-standing autoimmune diseases may result in malignant transformation. Interestingly, upregulation of ERV transcription is a common feature of these two pathologies [146,147]. The majority of ERVs are transcriptionally inactive, although 7% of their sequences can be reactivated by exogenous viruses or hypoxia [148]. Unlike in the autoimmune diseases discussed above, cancer cells mitigate the impact of moderate levels of ERV-related dsRNA formation and escape immune surveillance. However, drug-induced stimulation of ERV transcription can trigger a dsRNA-mediated immune response and make the cancer cells visible to a variety of immune cells [149]. Hence, host dsRNA-binding proteins and the associated signaling cascades are widely used drug targets [150].

Transcription of ERVs is efficiently silenced through DNA hypermethylation in normal somatic cells [48]. Hypomethylating drugs such as azacytidine or decitabine induce transcription of ERVs and the formation of dsRNA, which, in turn, activates innate immune signaling. Both drugs are widely used to treat hematological cancers and have been investigated for the treatment of solid tumors [151]. The consequences of bi-directional transcription of ERVs have been established in various cancer cell lines, including epithelial ovarian cancer, colon cancer and melanoma cell lines [48,152]. Accordingly, azacytidine causes an interferon response and increased expression of programmed death-ligand 1 (PD-L1), an important target in cancer immunotherapy [152].

A novel approach to treating various cancers combines ERV re-activation using histone deacetylase inhibitors (HDACi) with immune checkpoint inhibitors targeting PD-1 or cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) [153]. In this approach, ERV activation triggers a dsRNA-mediated interferon response that leads to increased expression of major histocompatibility complex class I (MHC-I) on cancer cells; hence, the cells become "visible" to a T cell-mediated response [48]. Immune checkpoint inhibitors such as atezolizumab and avelumab or ipilimumab (monoclonal antibodies against PD-L1 or CTLA-4, respectively), used in combination, dampen the inhibitory immune response and enhance anti-tumor activity [154].

The viral dsRNA analogues poly(I:C) and poly(A:U) are being used as adjuvants in anti-tumor therapy because of their potential to stimulate an interferon response. They act through two main mechanisms: first, by inducing cancer cell apoptosis through an IFN-β autocrine loop, and second, through IFN-β-mediated signaling that stimulates the major players in anti-cancer immunity, including maturation and differentiation of dendritic cells, promotion of a T cell response and activation of natural killer cells [155]. Hence, immune-stimulatory adjuvants are key components of cancer vaccines, together with tumor-specific antigens [156].

#### **5. Conclusions**

The pathways by which viral dsRNA activates innate immunity have been established for quite some time. In this context, the discovery of widespread dsRNA formation from endogenous sources such as repetitive elements or natural antisense transcripts raised the question of how the different stimulators of innate immunity are controlled. Compartmentalization and specialized dsRNA sensor proteins, which integrate structural information and dsRNA abundance to elicit a physiologically appropriate response, have evolved as a protective strategy. Nonetheless, cellular dsRNA homeostasis is often challenged in disease, and these observations have revealed an interplay between repetitive genomic elements, long non-coding RNA and innate immune signaling that can jeopardize the well-being of cells, organs and the entire organism. A detailed understanding of dsRNA expression and processing can inform strategies to avoid ectopic dsRNA formation and inflammation caused, for example, by stress, drugs or malnutrition. Alternatively, therapeutic stimulation of dsRNA expression shows great promise in directing an immune response against cancer cells.

**Author Contributions:** All authors have written parts of the manuscript and generated figures. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partly funded by The Northern Counties Kidney Research Fund, Grant 18.011 to A.W., and the Iraqi Ministry of Higher Education (S.S. and S.A-H.).

**Acknowledgments:** We would like to thank Aikaterini Gatsiou for the critical discussion of the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

