**Citation:** Wong, N.A.; Saier, M.H., Jr. The SARS-Coronavirus Infection Cycle: A Survey of Viral Membrane Proteins, Their Functional Interactions and Pathogenesis. *Int. J. Mol. Sci.* **2021**, *22*, 1308. https://doi.org/10.3390/ijms22031308

Academic Editor: Masoud Jelokhani-Niaraki

Received: 1 December 2020 Accepted: 22 January 2021 Published: 28 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

The rapid spread of the recently emerging coronavirus disease, COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has given rise to the current pneumonia pandemic. COVID-19 has a moderate fatality rate (5%) compared to those of SARS-CoV-1 (10%) and Middle East respiratory syndrome coronavirus (MERS-CoV) (34%), with deaths resulting from sepsis/acute respiratory distress syndrome (ARDS) [1,2]. Rapid transmission can be hindered by hand washing, physical distancing and the use of masks by infected individuals to trap virus-carrying breath droplets [3]. As of 20 November 2020, there have been over 55 million cases worldwide, over 12 million (~22%) of them in the USA. The number of cases has been doubling every 5 days, with an estimated incubation period of 5 days, both in the USA and worldwide [3]. Well over 1.3 million deaths have resulted, over 250,000 of them in the USA. The rapid human-to-human transmission of the disease has caught global health professionals by surprise, and as of 20 November 2020, there is no clinically approved antiviral drug for treatment, and vaccines, recently developed for prevention, are not yet available for distribution [4,5].
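The case figures quoted above can be checked with simple arithmetic. The worked equation below is only an illustration: it assumes idealized exponential growth with a constant doubling time $T_d$ (the symbols $N_0$, $N(t)$ and $T_d$ are introduced here and do not come from the cited sources), whereas real epidemic curves change as transmission conditions change.

$$
\frac{12 \times 10^{6}}{55 \times 10^{6}} \approx 0.22, \qquad
N(t) = N_0 \, 2^{t/T_d}, \quad T_d = 5\ \text{days} \;\Rightarrow\; N(30\ \text{days}) = 2^{30/5} N_0 = 64\,N_0
$$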

The disease arose in the city of Wuhan (Hubei Province, China) and appears to have originated from the sale of live wild animals as food in a Wuhan wholesale market [6]. These animals were thought to be the intermediate carriers of the virus, which originated in *Rhinolophus* bats [1]. Although the primary epidemiological factors contributing to the rapid spread of the disease [7] and the immune responses to the virus [8] are reasonably well understood, the intermediate animal host has not yet been identified. A greater incidence of the disease in older people, particularly those with comorbidities such as hypertension and cardiovascular disease, and disproportionate distributions of cases and fatalities among ethnic groups (higher among African, Hispanic and Native Americans than among European Americans) are now well established [9,10].

#### **2. Viral Proteins and Their Roles in the Infection Cycle**

While widespread availability of a vaccine is foreseeable, the development of antivirals targeting specific proteins involved in SARS-CoV-2's pathogenesis and infection cycle may take longer. The disease conditions produced by SARS-CoV-1 (2003) and SARS-CoV-2 (2020) include a deadly viral pneumonia and severe acute respiratory syndrome (SARS), caused in part by overstimulation of the host's innate immune system. Patients infected with SARS-CoV-1 or SARS-CoV-2 exhibit similar lung pathology, reflecting the similarities between these two viruses. However, additional symptoms affecting other organs have been documented [11,12].

A number of antiviral agents known to inhibit or otherwise reduce the infectivity or transmission of related viruses, the closest relative being SARS-CoV-1, have been considered, and some of these have been tested in vitro or in vivo. For example, micronutrients such as vitamins C, D and E have been suggested to be of value for prophylaxis or treatment [7]. Controversial therapies include treatment with inhibitors of angiotensin-converting enzyme (ACE), a homolog of ACE2, the receptor for SARS-CoV-2 [13], non-steroidal and steroidal anti-inflammatory drugs such as ibuprofen and corticosteroids, and various Chinese herbal medicines; all of these have been considered and recommended in published articles [7,14,15]. In addition, several antiviral agents effective against other viruses are being tested, some of them in clinical trials. One of the most promising is remdesivir, a broad-spectrum antiviral drug, together with its derivatives; limited current evidence suggests that they are effective as a therapeutic option against COVID-19 [16,17]. The antimalarial drugs chloroquine (CQ) and hydroxychloroquine (HCQ) have been used, but caution is required because of their toxicity and because higher doses seem to be required to combat the virus than are used for malaria. At present, there is insufficient evidence that CQ/HCQ are safe and effective treatments for COVID-19, and most of the evidence suggests that they are not effective at safe doses [18,19].

Potential drug targets of SARS-CoV-2 include several viral structural and non-structural proteins. The effective development of novel drugs specifically for this virus depends on an understanding of the structures and functions of its constituent proteins [20]. The genome sequences of several *Betacoronaviruses* (β-CoVs) have been available for several years, and that of SARS-CoV-2 has been available almost since the initial outbreak in Wuhan [21]. A typical betacoronavirus genome includes at least six genes (open reading frames; ORFs), some or all of whose products can be proteolytically cleaved, post-translationally, into smaller proteins or peptides important for the viral infection cycle; these may prove to be suitable drug targets [22]. These proteins include the structural Spike (S), Envelope (E), Membrane (M), Nucleocapsid (N) and Hemagglutinin esterase (HE) proteins, and nonstructural proteins (nsps), which include the Helicase (H), the proteases (papain-like protease, PLP or PLpro, and 3C-like protease, 3CLpro) and replicase proteins [22]. The Spike (S) protein, found as protrusions on the surface of the viral particle, includes the fusion peptide, while the envelope (E) protein has viroporin activity important for normal completion of the viral infection cycle. The nucleocapsid (N) protein forms a ribonucleoprotein complex with the single-stranded (+) RNA genome, and this complex is enclosed within the viral envelope, which is composed in part of the membrane (M) protein. The proteases, encoded in ORF1a/ORF1ab, have been targeted with lopinavir/ritonavir, with and without arbidol, an indole-derivative antiviral that acts against the viral envelope [16,23]. Four main structural proteins are encoded by ORFs 10 and 11 near the 3′ terminus of the viral genome (Figure 1) [21]. Among the potential targets of drug action are the E-protein viroporins [24] and the Spike membrane fusion proteins [25]. In this review, we discuss several SARS-CoV-2 proteins but focus on these two as potential targets of novel drugs to combat COVID-19, SARS and MERS.

**Figure 1.** Genome graphic of SARS-CoV, showing where the genes encoding the recognized viral proteins reside.

#### **3. The CoV Replication Cycle (from Entry to Exit)**

The coronaviruses comprise a family of enveloped RNA viruses with positive-sense genomes, meaning that most genes, with the exception of the late structural genes, are translated directly from the genome. The first obstacle for any virus is entry into its host cell, which it achieves using common, often highly expressed receptors on the host cell membrane. These receptors not only mediate adherence to the host surface but also facilitate membrane fusion [26]. Attachment to host surface proteins often triggers a chain reaction that activates entry mechanisms in the host, the virus, or both [27]. During or after entry, the virus releases its genome and begins hijacking the cellular machinery, allowing replication of its genome and the eventual production of viral products (Figure 2). Receptors vary widely among viruses: they may be carbohydrates, as in the case of influenza virus, which uses sialic acid on respiratory epithelial cells, or they may be specific Cluster of Differentiation (CD) molecules on immune cells. For example, HIV targets CD4 and CCR5/CXCR4 on human T-cells. Viruses may also use coreceptors that do not activate entry mechanisms but increase their affinity for the host cell, improving the probability of eventually engaging the correct receptor for entry.

**Figure 2.** The infection cycle of coronaviruses. 1.a. Endocytic entry of virus particles after attachment to receptor proteins. 1.b. Immediate fusion at the cell surface upon attachment to receptor proteins. 2.a. Endocytic fusion of viral and host membranes after endosomal maturation. 3. After the N protein is shed from the genome, viral RNA is translated, expressing nsps from pp1a and pp1ab. Nsps migrate to the ER and perinuclear spaces, and RdRp complexes form to synthesize additional viral transcripts and genomes. 4. Nsp3, nsp4 and nsp6 initiate membrane rearrangements and form CMs and DMVs. 5. Viral structural proteins are produced and post-translationally modified in the ER/Golgi. 6. Viral genomes are encapsidated by N proteins. 7. Encapsidated genomes are wrapped in lipid envelopes assembled by the structural proteins S, M and E. 8. Late-stage membrane rearrangements: many interconnected DMVs containing dsRNA spread through the perinuclear space, and many virions bud from the Golgi, with LVCVs attached to its membranes. 9. Mature CoV virions bud from the ERGIC. 10. Progeny CoV virions exit via lysosomal exocytosis.

For enveloped viruses, entry involves fusion of the viral envelope with a host membrane, releasing the viral genome into the cell (Figure 2). All human-infecting coronaviruses are capable of infecting the respiratory tract, and the three epidemic β-coronaviruses, SARS-CoV-1, SARS-CoV-2 and MERS, take advantage of prevalent ectopeptidases, surface glycoproteins found not only in respiratory organs but also in epithelial cells of the gastrointestinal (GI) tract and excretory system, as well as on immune cells [28,29]. Dipeptidyl peptidase-4 (DPP4) is used by MERS, and both SARS-CoV-1 and SARS-CoV-2 use human angiotensin-converting enzyme 2 (ACE2) [26,30–32]. ACE2 is found throughout human tissues, including the oral and nasal mucosa, nasopharynx, lungs, stomach, small intestine, colon, lymph nodes, thymus, bone marrow, spleen, liver, kidney and even the brain, which can lead to systemic infection in late stages of coronavirus disease [29]. For both SARS-CoV-1 and SARS-CoV-2, the S1 subunit of the Spike (S) protein engages ACE2 and primes activation of the S fusion core complex by host proteases. The S protein fusion complex of β-CoVs is promiscuous and can be cleaved by a variety of proteases, such as furin, trypsins, trypsin-like proteases, cathepsins, PC1, transmembrane serine protease-2 (TMPRSS-2), TMPRSS-4, type II transmembrane serine protease (TTSP) matriptase, and human airway trypsin-like (HAT) protease [32,33]. Membrane fusion and release of the viral genome into the host cell may occur through a non-endosomal pathway at the cell surface, although this is a minor entry pathway for SARS-CoV-1 and MERS [34].
Studies have confirmed that SARS-CoV-1 and MERS prefer to enter the host cell through endocytic pathways, activating the fusion complex and releasing their genomes after maturation of the endosome, similar to the pH-dependent endosomal fusion of influenza virus [35]. However, it is the availability of these host proteases at extracellular binding sites that dictates whether a CoV fuses its membrane at the cell surface or after endocytosis, which may explain the varying infection dynamics among different cell types [32]. For instance, colocalization of ACE2 and TMPRSS-2, possibly in lipid rafts on the cell surface, may optimize cell-to-cell entry of SARS-CoVs and syncytium formation in respiratory epithelial cells [36].

Induction of endocytic pathways is a common theme among enveloped viruses, which often exploit well-characterized routes such as clathrin- or caveolae-coated pits, pinocytosis or macropinocytosis [37]. Drugs inhibiting endocytic pathways have proven to be effective antivirals against various viruses, such as influenza virus, simian virus, herpes simplex virus and coronaviruses [35,38–40]. Human-infecting coronaviruses have been shown to use clathrin- and caveolae-mediated endocytosis; however, alternative SARS-CoV-1 entry mechanisms independent of clathrin and caveolae have also been confirmed, distinguishing SARS-CoV-1 from other coronaviruses and from enveloped viruses in general [34,41–43].

Chlorpromazine (CPZ), a clinically available clathrin-inhibiting drug, significantly reduced MERS entry and infection rates, but not those of SARS-CoV-1 [34,44]. Additionally, inhibiting lipid raft-dependent, caveolae-mediated endocytosis by sequestering cholesterol with filipin or nystatin did not disrupt SARS-CoV-1 entry [34]. This lack of reliance on clathrin and caveolae for SARS-CoV-1 cell entry resembles influenza virus, which can also infect cells via clathrin- and caveolin-independent endocytic pathways [35]. Interestingly, Ou et al. reported that SARS-CoV-2 endosomal entry can be inhibited by apilimod, a potent inhibitor of phosphatidylinositol 3-phosphate 5-kinase (PIKfyve). PIKfyve is the only mammalian producer of phosphatidylinositol-3,5-bisphosphate (PI(3,5)P2) in early endosomes, and it regulates endosomal sorting [32,45]. Ou et al. reported no in vitro cytotoxicity of apilimod at any of the tested concentrations in HEK293 and HeLa cells; nevertheless, despite the low abundance of PI(3,5)P2 in endosomes, the PIKfyve-PI(3,5)P2 pathway is crucially important for cellular homeostasis and signaling [45,46]. Disruption of this pathway has been linked to a variety of neurological diseases and to neurodegeneration in humans, including Alzheimer's disease and amyotrophic lateral sclerosis (ALS) [45,46].

Downstream of PI(3,5)P2 are the NAADP- and PI(3,5)P2-activated, cation-sensitive Two Pore Calcium Channels (TPCs) [47]. TPC1 (TCID: 1.A.1.11.25) and TPC2 (TCID: 1.A.1.11.19) have been shown to play roles in calcium signaling in endosomes and to facilitate the fusion of acidic, protease-rich lysosomes with endosomes to form late endolysosomes, providing a versatile array of cell type-specific responses [48–50]. Generally, both TPC1 and TPC2, but primarily TPC2, release calcium from endosomes into the cytosol to signal endolysosomal fusion, with TPC2 localizing to early and late endosomes as well as to lysosomes [51,52]. Mature endolysosomes may continue to use TPC2-mediated calcium signaling to regulate their interactions with the ribosome-rich ER membrane, a region of importance for enveloped RNA viruses [51].

Interest in TPC2 has grown since its role in NAADP-induced calcium signaling for endosomal sorting and maturation, which is relevant to virus entry, was confirmed. Inhibition of TPC2 has shown positive results in preventing endosomal viral entry into cells for both enveloped and nonenveloped viruses. The virulent enveloped viruses Ebola [53], MERS [54], SARS-CoV-2 [32] and henipaviruses [55] all show weakened or abolished abilities to release viral contents into the host cell cytosol after TPC2 is deleted or inhibited, indicating that TPC2 is a necessary and common entry factor for various viruses. Disruption of TPC2 activity also blocks translocation of virus-containing endosomes through the endocytic pathway in MERS-infected cells [54]. Unfortunately, none of these studies established why TPC2 is crucial for viral entry, but an understanding of the requirements of viral fusion proteins may provide insight.

Another feature shared by these viruses is dependence on the host endolysosomal protease cathepsin L, which cleaves and activates their fusion proteins [56]. While other proteases may be used by various viruses, SARS-CoV-2 showed a specific dependency on cathepsin L; deletion of this lysosomal protease drastically decreased viral release [32]. TPC2's calcium signaling could be a crucial step in (1) endolysosome formation, (2) incorporation of lysosomal proteases, (3) acidification of the virus-containing endosome as a prelude to membrane fusion and viral release, and (4) navigation to the ER-Golgi membrane [51], where membrane-associated protein synthesis occurs. Indeed, TPC2 inhibition with fangchinoline inhibited endosomal furin protease activity and the mobility of endosomal structures [54]. Whether TPC2-mediated signaling is required for cathepsin L activity remains unknown. To the best of our knowledge, no study has yet connected endosomal maturation pathways, protease activity and viral membrane fusion in full.

#### **4. Coronavirus Genome Structure, Replication and Expression**

Coronavirus genomes are the largest of any positive-strand RNA viruses, ranging from 26 to 32 kb [57]. Each genome is a single continuous strand of RNA that mimics host mRNAs, having a 5′ cap and a 3′ poly(A) tail [58]. Coronaviruses share a common genome structure, but given our focus on the epidemic β-CoVs, we primarily discuss the genomes of MERS, SARS-CoV-1, SARS-CoV-2 and other analogous β-CoVs. The 5′ end of the genome contains ORF1a and ORF1ab, which encode the early nonstructural proteins that compose the CoV replication-transcription complex (RTC) (Figure 1).

While the genes in the RTC region are translated directly, replication and transcription of the genome are more complicated. Like other RNA viruses, coronaviruses encode their own RNA-dependent RNA polymerase (RdRp) for genome and gene amplification. Both genome replication and expression of subgenomic RNAs toward the 3′ end of the genome require the synthesis of negative-strand RNA intermediates. Further complicating amplification and expression of late genes is the presence of transcription-regulating sequences (TRSs) at the beginning of each structural or accessory gene [59]. These TRSs are composed of clusters of weak-affinity oligonucleotides; for instance, SARS-CoV-1 TRSs contain the core sequence 5′-AAACGAAC-3′ [60]. These TRSs act as RdRp pause sites, giving the viral replication complex an opportunity to dissociate and move randomly along the genomic template until it finds another TRS at which to resume transcription. Transcription, or random recombination, of these subgenomic RNA strands does not stop until the RdRp reaches a final TRS at the leader sequence at the 5′ end of the genome [59]. Alternatively, the RdRp can transcribe continuously through the genome, creating complete negative-strand templates for full genome replication. This ability to transcribe negative-strand templates discontinuously and to recombine transcripts may help explain how CoVs so readily mutate and recombine into novel strains. When a single cell is coinfected by two different CoVs, the RdRp can jump from the TRS of one genome to that of the other, potentially creating a recombinant genomic transcript [61]. Altogether, negative-strand RNA templates make up only 1–2% of the total viral RNA in an infected cell, implying that only a few negative-strand templates are required to amplify full genomes and subgenomic late-gene RNAs [59,62].
All negative-strand templates must then be transcribed again to create positive-strand RNAs, which serve as mRNAs for protein synthesis or as genomes for packaging.
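The leader-to-body joining at TRSs described above can be illustrated with a small string model. The sketch below is a simplified assumption, not the viral mechanism itself: it ignores the negative-strand intermediates and polymerase kinetics, treats the first TRS as the leader TRS, and simply joins the 5′ leader to the sequence downstream of each later TRS. The toy genome and the helper names `find_trs` and `subgenomic_rnas` are hypothetical; only the core sequence AAACGAAC comes from the text.

```python
# Simplified model of discontinuous transcription at TRSs (toy sequence, hypothetical helpers).
TRS_CORE = "AAACGAAC"  # SARS-CoV-1 TRS core sequence quoted in the text


def find_trs(genome: str, core: str = TRS_CORE) -> list[int]:
    """Return 0-based start positions of every match of `core` in `genome`."""
    hits = []
    pos = genome.find(core)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(core, pos + 1)  # searching from pos + 1 also catches overlaps
    return hits


def subgenomic_rnas(genome: str, core: str = TRS_CORE) -> list[str]:
    """Join the 5' leader (through the first TRS) to the body after each later TRS.

    The result is a nested set of RNAs that share the same 5' leader and 3' end.
    """
    hits = find_trs(genome, core)
    if len(hits) < 2:
        return []
    leader = genome[: hits[0] + len(core)]
    return [leader + genome[h + len(core):] for h in hits[1:]]


# Hypothetical toy genome: leader, TRS, "ORF1" filler, TRS, gene A, TRS, gene B.
toy = "GG" + TRS_CORE + "UUUUUU" + TRS_CORE + "AUGCCC" + TRS_CORE + "AUGGGG"
print(find_trs(toy))  # [2, 16, 30]
for sg in subgenomic_rnas(toy):
    print(sg)
# GGAAACGAACAUGCCCAAACGAACAUGGGG
# GGAAACGAACAUGGGG
```

Every toy subgenomic RNA begins with the same leader and ends with the same 3′ sequence, which is the nested, co-terminal pattern the TRS-pausing model predicts; real sgRNAs are of course far longer and are produced via negative-strand templates.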

Like other positive-sense single-stranded RNA viruses, coronaviruses translate their early genes directly from the genome. These early genes lie in two overlapping open reading frames, ORF1a and ORF1ab, and encode the nsps (Figure 1). They are translated by cellular ribosomes into polyproteins, the ORF1a polyprotein (pp1a) and the ORF1ab polyprotein (pp1ab) [63]. pp1a is cleaved into 10–11 nsps, and pp1ab into as many as 16 mature viral nsps that participate directly in the catalysis of viral RNA synthesis (Table 1) [64]. Cleavage of these polyproteins is accomplished by the virally encoded papain-like proteases PL1pro and PL2pro in nsp3 (only one, designated PLpro, in SARS-CoV) [65] and by 3CLpro in nsp5 [66]. The papain-like protease of nsp3 releases nsp1, nsp2 and nsp3 itself from pp1a or pp1ab, while 3CLpro releases itself from nsp4 and nsp6 and additionally processes nsp7–16 [67]. Expression of both pp1a and pp1ab is made possible by a heptameric 'slippery sequence' (5′-UUUAAAC-3′) and a 'pseudoknot' located at the end of ORF1a, where the ORF1a and ORF1b reading frames overlap. In SARS-CoV-1, and likely in other betacoronaviruses, the pseudoknot is composed of three highly conserved, thermodynamically stable stem loops (free energy ~−21.12 kcal/mol). These stem loops are digestible by dsRNases, and disruption of any of them reduces frameshifting and expression of pp1ab by up to 94% in Vero E6 cells [68]. The pseudoknot occasionally stalls the ribosome on the slippery sequence before it reaches the ORF1a stop codon, allowing the ribosome to 'slide' back by one nucleotide; this shifts the reading frame (−1) from ORF1a into ORF1ab and extends translation to produce pp1ab [59]. However, most of the time the ribosome melts the pseudoknot and progresses through the slippery sequence in the original frame until it reaches the ORF1a stop codon, so more pp1a than pp1ab is produced [68,69].
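The effect of the −1 slip at the slippery sequence can be made concrete by reading codons on a short toy mRNA. The sketch below is illustrative only: the sequence is hypothetical (not SARS-CoV genomic sequence), the pseudoknot is not modeled, and whether a given ribosome slips is passed in as a boolean rather than simulated. It shows that, without slipping, the ribosome meets a stop codon just past the heptamer (ending an ORF1a-like product), whereas after re-reading one nucleotide it bypasses that stop and continues in the −1 frame (an ORF1ab-like product).

```python
# Toy model of -1 programmed ribosomal frameshifting at a U UUA AAC slippery site.
STOP_CODONS = {"UAA", "UAG", "UGA"}
SLIPPERY = "UUUAAAC"  # heptamer quoted in the text


def read_until_stop(mrna: str, start: int) -> list[str]:
    """Read codons from `start` in steps of 3, stopping before the first stop codon."""
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


def ribosome_pass(mrna: str, frameshift: bool) -> list[str]:
    """Codons decoded by one ribosome, with or without a -1 slip at the slippery site."""
    if not frameshift:
        return read_until_stop(mrna, 0)  # stays in frame 0 (ORF1a-like product)
    site = mrna.find(SLIPPERY)
    if site == -1:
        raise ValueError("no slippery sequence found")
    # In frame 0 the heptamer reads X XXU UUA AAC, so it starts on the last base of a codon.
    if site % 3 != 2:
        raise ValueError("slippery site is not aligned as X XXY YYZ in frame 0")
    upstream = [mrna[i:i + 3] for i in range(0, site + 1, 3)]  # codons before UUA
    # After slipping back one base, UUA/AAC are re-read as UUU/AAA; decoding continues in frame -1.
    return upstream + read_until_stop(mrna, site)


toy_mrna = "AUGGCUUUAAACUAAGCGGUGCAA"  # hypothetical: stop codon UAA right after the heptamer
print(ribosome_pass(toy_mrna, frameshift=False))  # ['AUG', 'GCU', 'UUA', 'AAC']
print(ribosome_pass(toy_mrna, frameshift=True))   # ['AUG', 'GCU', 'UUU', 'AAA', 'CUA', 'AGC', 'GGU', 'GCA']
```

Because most ribosomes do not slip, calling the first branch far more often than the second would mirror the excess of pp1a over pp1ab described above; the relative frequency is not modeled here.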


**Table 1.** Functions and properties of coronavirus structural (S, E, M, N) and non-structural proteins (Nsps1–16) as well as ORFs 3a & b, 6a, 7a & b, 8a & b, and 9b.


ORF1a encodes nsp1–11, which include several membrane-spanning proteins associated with viral replication. ORF1ab additionally contains the genes for nsp12–16, all of which participate directly in viral RNA replication [63]. Collectively, the nsps derived from ORF1a and ORF1ab are called the RTC [62]. nsp12 functions as the viral RdRp and is associated with nascent viral RNA synthesis [62]. To emulate host mRNAs, provide stability, ensure proper translation and avoid degradation by host nucleases, the viral genomic and nascent RNAs are all capped and methylated at the 5′ end, in part via the nsp14 N7 cap methyltransferase [84]. With expression of the RTC, the virus is equipped with the proper arsenal to hijack the cell, requisitioning and modifying important host organelles and machinery for its own replication.
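The polyprotein processing described in this section can be summarized as a list of cleavage junctions. The sketch below rests on stated assumptions: it follows the text in placing nsp1–11 in pp1a and nsp12–16 only in pp1ab, treats nsp11 as pp1a-specific (so pp1ab is modeled as nsp1–10 followed by nsp12–16), and assigns the nsp1/2, nsp2/3 and nsp3/4 junctions to the papain-like protease of nsp3 and all remaining junctions to 3CLpro (nsp5). The function names are hypothetical.

```python
# Sketch: which viral protease processes each nsp junction in pp1a and pp1ab (assumed assignment).
def nsp_order(polyprotein: str) -> list[int]:
    """nsp numbers in N- to C-terminal order for each polyprotein."""
    if polyprotein == "pp1a":
        return list(range(1, 12))  # nsp1-11 (ORF1a)
    if polyprotein == "pp1ab":
        return list(range(1, 11)) + list(range(12, 17))  # nsp1-10, then nsp12-16
    raise ValueError(f"unknown polyprotein: {polyprotein}")


def cleavage_junctions(polyprotein: str) -> list[tuple[int, int, str]]:
    """(upstream nsp, downstream nsp, protease) for every junction in the polyprotein."""
    order = nsp_order(polyprotein)
    junctions = []
    for up, down in zip(order, order[1:]):
        # Assumption: PLpro (in nsp3) cuts nsp1/2, nsp2/3 and nsp3/4; 3CLpro (nsp5) cuts the rest.
        protease = "PLpro" if up <= 3 else "3CLpro"
        junctions.append((up, down, protease))
    return junctions


for pp in ("pp1a", "pp1ab"):
    sites = cleavage_junctions(pp)
    n_pl = sum(1 for _, _, p in sites if p == "PLpro")
    print(pp, len(nsp_order(pp)), "nsps;", n_pl, "PLpro sites;", len(sites) - n_pl, "3CLpro sites")
# pp1a 11 nsps; 3 PLpro sites; 7 3CLpro sites
# pp1ab 15 nsps; 3 PLpro sites; 11 3CLpro sites
```

Taken together, the two polyproteins in this model yield nsp1–16, with 3CLpro responsible for most of the processing steps.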

#### **5. CoV Replication Organelle Formation: A Dance of Membrane Rearrangements**

In reviewing coronavirus membrane reorganization, it is important to note that all members of the Nidovirales rearrange host membranes in similar ways, owing to nsps derived from pp1a and pp1ab. In this section, we present information collected from various studies of early membrane rearrangements induced by members of the Nidovirales (coronaviruses and arteriviruses), with an emphasis on β-CoVs (SARS-CoV-1/2, MERS-CoV and MHV). Throughout the replication cycle of these viruses, dramatic rearrangements of host membranes are used for replication, protein expression, assembly and exit. Virally induced zippered ER, double membrane vesicles (DMVs), convoluted membranes (CMs), and later vesicle packets (VPs), double membrane spherules (DMSs), occasional membrane whorls (MWs) and giant vesiculations (GVs) are hallmarks of coronavirus and other nidovirus infections. More elusive structures include cubic membrane structures (CMSs) and tubular bodies (TBs) [84]. All of these membrane rearrangements are visible by electron microscopy (EM) and electron tomography (ET) and occupy large portions of the cell. Although details of the early infection dynamics differ, the nsps responsible for membrane rearrangements are strikingly similar among the Nidovirales, allowing comparative analysis within the order. See Table 2 for comparisons of important membrane rearrangements among different coronaviruses.


**Table 2.** Membrane structures found in various coronaviruses. Bold checkmarks indicate dominating membrane structures.

In all (+) strand RNA viruses, virus-induced reorganization of cellular membranes has been documented and associated with viral RNA synthesis [85]. Associating viral RNA synthesis with modified membranes offers benefits such as more efficient production of viral products through association with macromolecule-rich membranes such as the ER and Golgi. Viruses may reorganize these organelle membranes into 'viral organelles' to separate specific stages of viral replication and to confine RNA synthesis and dsRNA intermediates to microenvironments shielded from host innate antiviral defenses [86]. Viral organelles specific for replication and nucleotide synthesis are appropriately called replication organelles (ROs). The Nidovirales, and more specifically the coronaviruses and the related arteriviruses, appear to have a rearrangement of host membranes not seen in any other (+) sense RNA viruses [85,87]. In β-CoVs, early expression and cleavage of the nsp2–6 transmembrane (TM) proteins participate in rearranging the ER into DMVs, which are completely sealed from the cytosol (MERS, SARS, MHV, HCoV-229E, PEDV, IBV), with cleavage of nsp3 and nsp4 being crucial for SARS-CoV-1 and MERS. Plasmid expression of only MHV or MERS-CoV nsp3 and nsp4 was sufficient to reproduce clustered DMV membrane rearrangements reminiscent of MERS-infected cells [88,89]. Similarly, SARS-CoV-1 nsp3 and nsp4 were sufficient to induce phenotypic membrane rearrangements reminiscent of SARS-CoV-1-infected cells [89], but nsp6 may additionally promote these rearrangements [57]. Together, these findings imply that late membrane-bound structural proteins are not required to modify host membranes in early infection, and that this process is primarily associated with the RTC.

In the following text, we attempt to provide an approximate timeline of CoV RO formation, with an emphasis on β-CoVs. Almost immediately after the early synthesis and expression of RTC RNAs, viral replication complexes begin to form within cytoplasmic membranes. Abundant transmembrane nsps accumulate in ER membranes, localize to the perinuclear region of the cell and induce unusual modifications. In β-CoVs, as early as 1–2 hours post-infection (hpi), isolated DMVs can be seen forming in the cytoplasm close to the ER, which may appear slightly zippered [90,91]. Low levels of viral RNA synthesis are also detectable in MERS, SARS, MHV and IBV [62]. Shortly afterwards, in β-CoVs, large CMs (0.2–2 μm) and reticular inclusions form with connections to DMVs and the ER [90,91]. By 4 hpi, DMVs have increased drastically in number and cluster throughout the cell, increasingly localizing to the perinuclear space [90,91]. At some point, the Golgi membrane is also incorporated into the DMV-CM-ER network [91,92]. By 5–7 hpi, newly assembled and budding virions appear in the Golgi cisternae, as do virion-containing, Golgi-derived LVCVs [90,91,93]. α-, β-, δ- and γ-CoV ER morphologies adopt a zippered form [92–94]. γ- and δ-CoVs also contain DMSs sprouting from the ER with small 'neck-like' openings to the cytosol [93,94]. At 7 hpi and beyond, as many as 200–300 DMVs can be clustered within SARS-infected cells [90]. After this point, the outer membranes of several DMVs fuse to create an interconnected system of single-membrane vesicles contained within one communal outer membrane, forming large VPs. At least 95% of DMVs have one or more thin 'neck-like' connections (~8 nm) to other DMVs, CMs or the ER, implying that late-stage modified membranes form one large, uninterrupted network with the ERGIC membranes [90,91].
Late-infection membrane rearrangements in α- and β-CoVs may also include DMSs embedded in CMs, with interconnections to CMs, CM-ER and ER membranes, although the exact time at which these first form in β-CoVs is not clear [92,93,95]. This network is appropriately called the reticulovesicular network (RVN) [64]. Finally, at ≥10 hpi, new virions can be seen budding into and from the heavily modified ERGIC lumen, and the virus has completely repurposed the cell for viral replication [90,92].
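The approximate β-CoV timeline above can be condensed into a small lookup table. This sketch is a summary aid under stated assumptions: each onset is the lower bound of the range given in the text (for example, 1 hpi for "1–2 hpi" and 5 hpi for "5–7 hpi"), CM formation is omitted because the text gives no specific onset ("shortly afterwards"), and real timings vary by virus and cell type. The names `RO_TIMELINE` and `stages_reached` are hypothetical.

```python
# Approximate beta-CoV replication organelle milestones (hours post-infection), summarized from the text.
RO_TIMELINE = [
    (1.0, "isolated DMVs form near the ER; low-level viral RNA synthesis"),
    (4.0, "DMVs increase sharply in number and cluster toward the perinuclear space"),
    (5.0, "virions bud into Golgi cisternae; Golgi-derived LVCVs appear"),
    (7.0, "up to 200-300 clustered DMVs (SARS); outer membranes fuse into VPs"),
    (10.0, "new virions bud into and from the modified ERGIC"),
]


def stages_reached(hpi: float) -> list[str]:
    """Return every milestone whose approximate onset is at or before `hpi`."""
    return [desc for onset, desc in RO_TIMELINE if hpi >= onset]


for t in (0.5, 6, 12):
    print(t, "hpi:", len(stages_reached(t)), "milestone(s) reached")
```

A list of (onset, description) pairs keeps the milestones in chronological order, so the latest stage reached at a given time is simply the last element returned.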

There can be up to 1000 vesiculations in a single SARS-infected cell [90], and separate VPs fuse together to form GVPs which may be interconnected with LVCVs and Golgi membranes containing both vesicles and significant numbers of budding complete virions [90,95]. In all CoVs, at every recorded stage, no DMVs exhibit openings to the cytosol [64,90–96]. DMVs close to the ER exhibit tightly apposed inner and outer membranes, but perinuclear residing DMVs have looser organization between the two membranes [90].

While the patterns of membrane rearrangement are generally common to all coronaviruses studied thus far, slight differences do exist among the genera. The γ-CoV IBV and the δ-CoV PDCoV did not show visible membrane rearrangements until after 6 hpi, but the phenotypes were remarkably similar to those of the observed β-CoVs and remained consistent through 24 hpi [92–94]. Another stark difference between β-CoVs and γ-CoVs is the organization of CMs and DMVs. In β-CoVs, DMVs likely form first and cluster in a net of CMs [90,91], while in γ-CoVs, zippered ER and DMSs are the primary structures, and DMVs tend to appear as free-floating vesicles away from the ER [94]. The α-CoVs HCoV-229E and PEDV have membrane rearrangement patterns similar to those of β-CoVs, but these take 24–60 hpi to develop [92,95]. PEDV also exhibits unique late-infection membrane rearrangements called endoplasmic reticular bodies (ERBs), which occur in a minority of cells, as well as virion-positive endolysosomal compartments [95].

As mentioned above, β-CoVs require nsp3 and nsp4 for modification of cell membranes. Exactly how these nsps modify the ER membrane to produce zippering and vesiculation remains uncertain. Disruption of nsp3 and nsp4 expression greatly inhibits RVN formation and viral genome replication [97,98]. Co-expression of nsp3 and nsp4 [88,89], together with nsp6 for SARS [57], induces clustered DMVs and disorganized, double-membraned CM-like structures sprouting from the ER. It has thus been proposed that accessory structures other than DMVs are induced by other viral proteins or by viral replication itself.

Initially, it was thought that host vesicular or secretory pathways might associate with these nsps to induce membrane changes. Autophagosomes and endolysosomes are known to be induced by coronavirus infections and appear during mid-to-late infection, but they are not required for viral replication [95,99]. Additionally, during mid-infection, virion-positive, Golgi-derived LVCVs form in conjunction with ERGIC and DMV structures [91]. Notably, regardless of the CoV, there is a high statistical correlation between the abundances of different membrane rearrangements. In MHV-infected cells, the relationship between DMV and CM abundance in a single cell is comparable to that between DMV and DMS abundance in IBV-infected cells [100]. These findings suggest that, for a given CoV, the membrane rearrangements that form an RVN are closely related to one another. Perhaps the dominant structure of a given CoV compensates functionally for another membrane structure that it lacks.

Picornaviruses utilize parts of the host secretory pathway for membrane reorganization (DMVs) and replication [101]. Treatment of infected cells with brefeldin A, a compound that inhibits secretory protein transport from the ER to the Golgi, completely suppresses host secretory pathways and prevents picornavirus replication, but it only partially inhibits coronavirus replication [64]. In fact, upon suppression of native host secretory pathways, early DMV induction was hindered, reaching only 20% of normal activity, and late-stage infection exhibited a similar phenotype [64]. Interestingly, this experiment revealed that the host ER protein Sec61α (TCID: 3.A.5.9.1), a subunit of the protein-translocation channel that anchors ribosomes to ER membranes and shuttles unfolded polypeptides into the ER lumen, was redistributed to RVNs upon SARS infection. Notably, brefeldin A treatment of SARS-infected cells (1–7 hpi) accelerated RVN formation while slightly inhibiting the expression of nsp3 and N protein, causing small differences in RVN morphology. DMV formation appeared to be accelerated, and aggregation into LVCVs occurred during mid infection (7 hpi) rather than late in infection. Additionally, the luminal space between the inner and outer membranes of DMVs appeared more open, except on the sides facing CMs. Intracellular virions could still form, but brefeldin A treatment prevented secretion of these particles because it inhibits the downstream secretory pathways necessary for virion release [64]. Hence, suppression of secretory pathways slows early CoV infection and lowers virion productivity, but it does not stop RNA synthesis and virion production altogether.

Protein disulfide-isomerase (PDI), another luminal ER-associated protein, which participates in the formation of disulfide bridges in unfolded polypeptides, migrated to MHV- and SARS-induced RVNs and partially colocalized with nsp3 [64,102]. Interestingly, when nsp3 or nsp4 is expressed alone within a cell, it colocalizes with PDI, but not when the two are co-expressed [88]. Unfortunately, the relationship between CoV RVNs and host PDI remains elusive, and antiviral strategies based on suppressing interactions between host ER proteins and nsps require further investigation [102]. Autophagosomes have also been proposed to be involved in membrane rearrangements, but several studies have argued against this hypothesis, as deletion of autophagy-related genes (encoding ATGs) did not prevent DMV formation, despite an association of SARS nsp6 with the ATG pathways [103–105]. Additionally, microtubules are not required, even though LC3 (microtubule-associated protein 1 light chain 3) decorates the outer membranes of DMVs [85].

This departure from common viral replicase themes in early membrane rearrangements has led Nidovirales research to the currently supported hypothesis that the nsps themselves are sufficient to induce these dramatic membrane rearrangements. The host pathways discussed above merely support membrane rearrangement without being required for it. Bioinformatic analyses of the TM domains, as well as NMR and X-ray crystallographic analyses of these nsps, support the theory that the shapes of, and protein-protein interactions among, ER-spanning nsps are sufficient to scaffold membrane rearrangements. Both nsp3 and nsp4 span the membrane multiple times and induce the early ER membrane pairing necessary for RVN formation. Intriguingly, nsp3 and nsp4 of β-CoVs and their equivalents in other CoVs retain only modest primary sequence homology, in spite of strong structural similarity, specifically in the relative positions of the TMDs and large luminal loops [106]. This could explain why early-infection membranes are characterized by a zippered ER with minimal vesiculation in some CoVs. Over time, as RNA synthesis increases, expression of pp1a and pp1ab allows sufficient accumulation of membrane-spanning nsps in the ER, causing dramatic vesiculation and reconstruction of host membranes. Differences in the dominant structures may be attributed to small differences in integral membrane nsp topologies, or to other infection dynamics. Expression of nsp3 or nsp4 alone localizes the protein only to the ER, but co-expression of nsp3 and nsp4, or of the nsp3–6 polyprotein, gives rise to perinuclear localization of these nsps and formation of clustered DMVs [57,88]. Structural inspection of these nsps revealed that a truncated nsp3, where the deletion spans from the first TMS to the C-terminus, can still colocalize with nsp4 in perinuclear regions [88]. Even more interesting is the possibility that nsp3 and nsp4 are directly involved in viral replication in addition to RO formation.

#### **6. Structures of nsp3 and nsp4**

Initially, pp1a and pp1ab are cleaved by the PL1pro or PL2pro papain-like proteases, which are encoded within nsp3. In SARS-CoV and MERS-CoV, only the PL2pro protease is present, but it performs the same function, cleaving pp1a or pp1ab at the nsp1/2, nsp2/3 and nsp3/4 junctions to release nsp1, nsp2 and nsp3. Further proteolysis of the remaining polyprotein is carried out by nsp5 to generate the remaining individual nsps. At the N-terminal end of nsp3 is a ubiquitin-like domain 1 (Ubl1), which, despite poor primary sequence conservation amongst CoVs, has a well-conserved secondary structure and is present in all CoVs [98]. Ubl1 binds ssRNA, interacts with the N protein, and is essential for viral RNA synthesis [106]. Ubl1 binds N only weakly, and the exact interacting regions may vary among CoVs [98]. Deletion of Ubl1 in MHV prevents viral replication [98]. It is possible that Ubl1 docks viral genomes at early RTCs, bringing N and the nascent viral genomes close together for packaging. This would suggest that nsp3 not only facilitates the remodeling of host membranes, but also plays an active role in genome packaging. Additionally, the Ubl1 of SARS-CoV-1 might disrupt Ras-regulated cell-cycle progression. Ubl1 is similar to the Ras-interacting domain of the Ral guanine nucleotide dissociation stimulator (RalGDS), and since SARS and MHV infections are known to induce cell-cycle arrest in the G0/G1 phase, nsp3/Ubl1 could disrupt the interaction between Ras and its downstream effectors [98].

Following the Ubl1 domain are the macrodomains (Mac1, Mac2 and Mac3), with Mac2–3 forming part of what was once thought of as the SARS-unique domain (SUD). The SUD was thought to be unique to SARS because of the high variability of CoV genomes, but secondary structure analysis has revealed strong structural similarity in these regions amongst other CoVs, so Mac2–3 may not be unique to SARS [106]. The Mac domains are all similar in structure and are hypothesized to have arisen by duplication of Mac1. However, only Mac3 was shown to be necessary for SARS replication in a cDNA study [107]. C-terminal to Mac3 is the Domain Preceding Ubl2 and PL2pro (DPUP), which forms antiparallel β-sheets. The Mac2-Mac3-DPUP complex has an affinity for nucleic acids and binds RNA. Specifically, Mac3 was shown to bind oligo(G) [107] and oligo(A) [98]. Mac3 may therefore bind the poly(A) tails of viral genomic, subgenomic and host RNAs. MHV Mac3 also contains a unique antiparallel β-sheet [108]. Following the DPUP and Ubl2 domains is PL2pro, which, as mentioned above, cleaves the nascent viral polyprotein at the nsp1/2, nsp2/3 and nsp3/4 junctions [109]. In addition to this autocleavage activity, PL2pro has deubiquitinating and deISGylating activities, which may participate in suppression and evasion of intracellular innate immune pathways [98,109].

Note: ISGylation is the conjugation of the interferon-induced, ubiquitin-like protein ISG15 to target proteins [110]. The SARS-CoV-1 Mac2–3 and PL2pro domains together may mediate innate immune suppression by competitively binding the host E3 ubiquitin ligase RCHY1, leading to down-regulation of the antiviral, pro-apoptotic transcription factor p53 [111,112]. In contrast to the enzymatic N-terminal region of nsp3, the C-terminal third of the protein (nsp3C) interacts with nsp4 and other nsps (e.g., nsp8) [57,88,97,98]. The specific interaction of nsp3 with nsp4 may induce the hallmark membrane curvature in the ER.

The nsp4 protein has four TMSs, one large luminal loop between TMSs 1 and 2, and a small luminal loop between TMSs 3 and 4. TMSs 3 and 4 are dispensable for SARS and MHV nsp4, but deletion of TMSs 2–4 affects colocalization with nsp3. Moreover, deletion of either TMS 1 plus the large loop, or TMSs 2–4 plus the large loop, completely prevents colocalization with nsp3 and nsp3C [88]. Swapping the luminal loop of one CoV nsp4 for that of another also prevents colocalization with nsp3, suggesting that the exact structure of the large N-terminal luminal loop is matched to the nsp3C binding site of each CoV [88]. Within the large luminal loop, 4–10 cysteine residues as well as glycosylation sites are conserved amongst CoVs [57,88]. Deletion of glycosylated regions produces aberrant DMVs with large luminal spaces and increased levels of CMs [57]. Replacement of the cysteine residues reduces colocalization with nsp3, suggesting possible disulfide bridge formation among nsps during membrane pairing [88], as well as a possible reliance on PDI and Sec61α. In SARS, the regions of nsp4 responsible for colocalizing with nsp3C lie within residues 112–164 and 220–234. Deleting those two regions, or only residues H120 and F121, prevents colocalization with nsp3C, replicon formation and thus viral replication, but the phenotype can be rescued when wild-type nsp4 is reintroduced via an encoding cDNA [97].

In SARS-infected cells, nsp6 is required to induce an RVN phenotype resembling that of wild type. Without nsp6, DMVs migrate farther away from the RVN. Nsp6 may break up elongated stretches of paired membranes, causing DMVs to vesiculate and cluster near CMs and thereby promoting RVN formation in SARS. The existence of an nsp4–6 polyprotein following nsp3 cleavage may be critical for efficient SARS DMV formation [57]. Overall, nsp3, nsp4 and nsp6 are essential for normal SARS replication. In the pursuit of antivirals, the papain-like proteases that cleave the early polyproteins could therefore be an excellent upstream target for blocking early viral replication.

#### **7. The Replicon Conundrum and a Putative Nucleopore**

Based on the themes of other (+)-sense RNA viral replicases, one might expect coronaviruses to modify membranes for the same reasons as many other RNA viruses, using DMVs as RNA factories. In other RNA viruses, dsRNA is often a marker of nascent viral RNA synthesis, since dsRNA molecules are a necessary replication intermediate. Labelling for dsRNA in CoVs revealed its presence within DMVs throughout infection, with the signal growing stronger over time. Following the logic of other RNA viruses, DMVs might provide an encapsulated environment, shielded from host innate immune mechanisms, that protects viral RNA synthesis. Using membrane rearrangements sourced from the ER would also benefit CoVs, as their proteomes contain many essential structural and nsp TMD-containing proteins. However, in all previously examined CoVs and arteriviruses, the DMVs are sealed from the cytosol, and nsps delocalize from DMVs during mid-to-late infection [90,92]. Spatially, DMVs migrate away from the ER-CM constructs and cluster in the perinuclear region, forming the VPs in conjunction with modified Golgi and LVCVs during late infection. Electron micrographs of some VPs and virion-budding LVCVs showed that the connected membranes appear to be compartmentalized, keeping former DMVs on one side while mature and budding virions form on the other [90,92].

Studies using either BrU or click chemistry to detect nascent RNA, together with labelled RdRp-associated nsps (nsp8 or nsp12), revealed that CoV RNA synthesis first colocalizes with dsRNAs and TM nsps, but migrates to perinuclear regions later in infection, eventually spreading throughout the cell [62]. In contrast, nsps primarily accumulate in the ER and CMs [62,64]. Additionally, dsRNAs did not incorporate BrU or click-labelled uridine, implying that these intermediates were catalytically inactive and did not participate in RNA synthesis at the time of labelling, even though RdRp nsps also occur in the lumen of DMVs [90]. Similar studies with γ-CoVs showed that less than 1.5% of RdRp nsps colocalized with dsRNAs [94]. These results suggest that dsRNAs and minor localizations of RdRp nsps are not bona fide markers of viral RNA synthesis. Rather, encapsulation of dsRNAs within DMVs may serve to keep these replication intermediates from activating innate immune pathways.

Another proposed site of RNA synthesis is the DMS-ER network. DMSs have also been described in the alphavirus Semliki Forest virus (SFV) of the Togaviridae, where they are reported to be sites of RNA synthesis [113]. Because DMSs have small openings connecting them to the cytoplasm, they were hypothesized to be probable sites of RNA synthesis in CoVs, since they also occur in abundance in α-CoVs [94], δ-CoVs (PDCoV) and γ-CoVs (IBV), and, as recently reported, in low quantities in β-CoVs [92]. However, actual labelling of nascent RNAs with radioactive ³H-uridine and EM imaging revealed little to no localization near zippered ER/CMs or DMSs during mid-to-late infection in IBV, MERS and SARS [92]. Intriguingly, both ³H-uridine and indirect immunogold-BrU labelling still pointed to membrane structures in proximity to DMVs as the primary sites of RNA synthesis [114], rather than DMSs or CMs [92]. However, these conclusions were drawn only from images taken in mid-to-late infection and may not capture the dynamics of CoV RNA synthesis described in earlier studies. Although DMSs tend to occur abundantly in IBV, RNA synthesis and virion production were not hindered in the pathogenic M41 strain of IBV, which produced significantly fewer DMSs [100]. This could mean that very few DMSs are necessary for subsequent RNA synthesis and virion production. Since (1) DMSs occur in low abundance in non-γ-CoVs, (2) they are difficult to discern among other membrane rearrangements, and (3) their time of production remains in question, it is possible that only a few DMSs are required in the early stages of infection. As the infection progresses, RNA synthesis could migrate from DMSs or CMs to DMV-associated structures. Since IBV DMVs have repeatedly been reported to occur as lone vesicles, with only a few connected to the ER, a connection to the ER may not be required, contrary to earlier suggestions.
Still, RNA synthesis has also been reported at low background levels in the cytosol of MHV-infected cells [114], and ³H-uridine or immunogold-BrU labelling with EM may not be sensitive enough to reveal minor sites of RNA synthesis. By similar reasoning, CMs, which do not occur in some CoVs, may perform functions similar to those of DMSs. Alternatively, CMs may simply be by-products of nsp overaccumulation, despite their close relationship to DMVs. To elucidate the nature of RO-RVN formation and RNA synthesis, future studies should ideally combine labels for TM nsps, RdRp complexes (nsp7 + 8 and nsp12), nascent RNA and dsRNA with EM, at several time points spanning early to late infection and with high-resolution imaging. Of course, all results depend on sample preparation and handling that carefully preserve the delicate nature of these RVNs.

This uncoupling of dsRNAs from RdRp complexes seems to be a common theme among all of the CoVs examined in this review. Even more curious is the observation that MHV nascent RNAs progressively separate from dsRNAs between 4.15–5 hpi and 8–9 hpi, with the Pearson correlation coefficient dropping from ~0.6 to ~0.35 [62]. With respect to virion production, the statistical correlations between membrane rearrangements and virion abundance (CM-DMV-virion in MHV and DMV-DMS-virion in IBV) were low [100]. However, these findings still do not exclude DMVs from involvement in RNA synthesis, as EM imaging revealed that nascent RNAs still localize near DMVs and the ER in early and late infection [90,92]. Determining how dsRNAs collect inside the lumen of DMVs has also been difficult, since no imaging so far has captured DMV intermediates.
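The Pearson correlation coefficient cited above is a standard colocalization metric: it measures how linearly the pixel intensities of two fluorescence channels co-vary, so a fall from ~0.6 to ~0.35 means the nascent-RNA and dsRNA signals increasingly vary independently of each other. The minimal sketch below shows the calculation, assuming two already registered, background-corrected channels flattened into equal-length intensity lists; the function name `pearson_r` and the toy values are hypothetical and are not the analysis code or data of [62].

```python
import math
from typing import Sequence


def pearson_r(channel_a: Sequence[float], channel_b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of pixel intensities.

    Hypothetical helper: channel_a could hold nascent-RNA label intensities and
    channel_b dsRNA label intensities for the same pixels.
    """
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("need two equal-length intensity lists with >= 2 pixels")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Covariance and variances (the common 1/n factors cancel in the ratio).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r is undefined when a channel has constant intensity")
    return cov / math.sqrt(var_a * var_b)


# Toy intensities only: perfectly proportional signals give r = 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # prints 1.0
```

In practice the calculation is usually restricted to a cell or region-of-interest mask, because large areas of shared background can inflate r; that masking step is left out of this sketch.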

Returning to the pore hypothesis, recent cryo-EM tomograms of DMVs revealed a pore complex formed by nsps, sparsely scattered on DMV surfaces and opening the lumen to the cytosol in MHV- and SARS-CoV-2-infected cells [115]. The pore has six-fold symmetry, and the channel begins with a 6 nm wide opening facing the cytosol, surrounded by a protruding crown of six prongs that extend 13 nm outward and reach 14 nm from the central axis of the opening. The pore is reported to be analogous to the genome-packaging pore of the Reoviridae [115]. An RNA export function has yet to be confirmed, but the six-fold symmetric complex is composed primarily of nsp3, which has RNA-binding capacity [106]. The luminal side of the pore complex appeared denser, and Wolff et al. speculated that other luminal DMV-associated proteins, such as N and/or the nsp12 RdRp, associate with the pore. Since nsp7, nsp8 and nsp12 form the core RNA polymerase complex, it is possible that the viral transcription machinery associates with the luminal nsp3s of the pore. This suggestion is further supported by the known function of nsp3 as a scaffold for other nsps in the RTC [116]. Such a complex could then transcribe RNAs and export them efficiently upon synthesis. Although the pore is composed of nsps with catalytic activity, the pore itself has no confirmed catalytic activity, as is characteristic of other viral portals, such as those of bacteriophages, Reoviridae and herpesviruses [117–119]. We thus suggest that the RdRp could act as a motor to feed transcripts into the pore. Aborted or defective transcripts would then not be exported, but would instead remain within the DMVs because of the size limits of the pore. Confirmation of this structure's existence is a major step toward completing the coronavirus RO puzzle. It remains to be confirmed whether equivalent pores exist in CoVs outside the β-CoVs.

It is possible that, after significant suppression of host innate antiviral pathways due to the accumulation of nsps or other ORF proteins, RNA replication no longer needs to reside in perinuclear membranes and may spread throughout the cytosol during mid-to-late infection. The pore has been confirmed only in the β-CoVs MHV and SARS-CoV-2, but given the phenotypic similarity between many β- and γ-CoVs, pores likely exist beyond those described by Wolff et al. [115].

#### **8. Viral Proteins–Structures, Expression and Assembly**

#### *8.1. The Nucleocapsid (N) Protein: Genome Packaging*

The nucleocapsid (N) proteins of coronaviruses are reasonably well conserved, although the SARS-CoV-1 and SARS-CoV-2 N-proteins, nearly 90% identical to each other, show only about 25% sequence identity with those of other members of the *Coronaviridae* family [120]. Nevertheless, most of them exhibit only moderate variation in size, usually ranging from just below 400 amino acyl residues (aas) (e.g., porcine transmissible gastroenteritis virus, TGEV, 382 aas) to just over 450 aas (e.g., murine CoV-3, MHV3, 454 aas) [121]. They have three domains: an N-terminal domain (NTD) of nearly 200 residues, which is the dominant RNA-binding domain; a central, flexible Ser/Arg-rich linker domain with a striated box of about 50 residues; and a C-terminal domain (CTD), which, like the NTD, is roughly 200 residues long but functions in dimer/oligomer formation [122]. The primary functions of the N protein are self-dimerization/oligomerization and RNA binding; although the NTD is the main RNA-binding domain, all three domains have affinity for nucleic acids [123]. In addition, there are intrinsically disordered regions of about 50 residues each near the N- and C-termini of these N-proteins [124].

N proteins have multiple functions, including but not limited to: (1) forming stable but dynamic complexes with the genomic RNA to compact the nucleic acid in the viral particle, (2) interacting with the structural membrane (M) protein to promote envelope folding and virion assembly, (3) interacting via two distinct regions with the nonstructural protein nsp3 to allow proper recruitment of N to the replication/transcription complex (RTC), (4) playing an essential role in enhancing the transcription of genomic RNA and viral mRNA, (5) increasing RNA replication efficiency, in part by facilitating separation of the two RNA strands, (6) interfering with host cellular defense processes such as interferon production, and (7) promoting host cell death (apoptosis) [125–128]. The N-protein can be phosphorylated, which facilitates its condensation with RNA and the M-protein and modulates liquid-liquid phase separation [129,130]. As noted above, its recruitment to the RTC plays a crucial role in the coronavirus life cycle [131].

The details of many of these functions have been elucidated to a considerable degree, and several of them are clearly interrelated [123]. Self-association of the N-protein, which also depends on its RNA-binding capacity [132], is required for formation of the viral capsid, which occurs at intracellular membranes of the ER-Golgi intermediate compartment. Unfortunately, many details of the molecular packaging inside the virion have not been fully elucidated. Early electron microscopy revealed that the ribonucleoproteins (RNPs) are helical, consisting of coils 9–16 nm in diameter with a hollow interior of about 3–4 nm [121]. In the mature virus particle, the capsid protects the viral genome from caustic chemicals and extreme physical conditions [133]. In this regard, it is important to note that N has protective RNA folding/chaperone activity, due in part to its central disordered linker (LKR) domain; this activity reduces the free energy barrier for dissociation of the nascent minus-strand RNA from the genomic RNA template during discontinuous RNA transcription. N also promotes template switching, which may be a primary cause of its acceleration of transcription [134].

The 3D structure of N, together with NMR analyses, revealed that the basic region between aas 248 and 280 of the SARS-CoV-1 N protein binds RNA, while the region just C-terminal to this sequence promotes octamerization of the CTD [135–138]. The former region forms a positively charged groove that can accommodate either single-stranded or double-stranded negatively charged nucleic acids [139]. Such interactions allow formation of a compact ribonucleoprotein complex, the nucleocapsid, that ensures timely replication, reliable transmission and proper regulation of translation while in the cell, before formation of the filamentous nucleocapsid, about 12 nm in diameter and up to several hundred nm long, that will be incorporated into the viral particle during assembly [123]. The assembly process also depends on the M protein, which, together with the E protein, is a primary core constituent of the final virion. Using 3D cryo-electron tomography of MHV particles, it was possible to see that the viral membrane was nearly twice as thick as a typical cell membrane, possibly due to the C-terminal domain of the M protein [140]. Thus, the ribonucleoprotein complex, together with the closely associated M-protein, plays a major role in envelope formation and viral budding within intracellular ER-Golgi complexes (see the section on the M-protein). For this reason, the assembly of the N-protein oligomer with its associated RNA has been considered an appropriate target of drug action [141] (see below).

The N-protein has proven to be a promising target for antiviral drugs and may be useful for the potential development of vaccines. This topic has been extensively reviewed recently [142], and only a couple of examples will be provided here. Cyclosporin A and its non-immunosuppressive derivatives are effective antiviral agents against coronaviruses and many other viruses. They normally bind to cellular cyclophilins, thus inactivating the cis-trans peptidyl-prolyl isomerase activities of the latter. Cyclosporin A (but not cyclosporin B) blocks the interaction between the N-proteins of various CoVs and cyclophilin, preventing viral RNA replication. Thus, cyclophilin inhibitors such as cyclosporin A block this protein-protein interaction, inhibit replication, and thereby prevent infectivity [143]. Examination of several cyclophilin inhibitors revealed that even non-immunosuppressive cyclosporin derivatives can block replication, showing that they could be effective antiviral agents with minimal side effects [143]. These compounds might therefore prove effective against diseases such as COVID-19.

In another recent study, Lin et al. [138] examined the structure-based stabilization of N-protein-protein interactions for the purpose of designing antiviral drugs. This unique approach to drug discovery was based on the high-resolution 3D structure of the N-terminal domain of the MERS-CoV nucleocapsid protein (N-NTD). Non-native interacting interfaces on the surface of the dimeric N-protein proved to form a conserved hydrophobic cavity that could be used for targeted drug screening. The authors evaluated this complementary surface as a potential drug-binding pocket and identified 5-benzyloxygramine as an orthosteric stabilizer that exhibits both antiviral and N-NTD protein-stabilizing activities. X-ray analyses revealed that 5-benzyloxygramine stabilizes the N-NTD dimer through hydrophobic interactions between the protein and the compound, causing abnormal oligomerization of the protein. Such structure-based approaches can thus be used to identify potential drugs to fight viral infections [138].

If the antiviral approaches discussed above do not prove successful in combating the current or any future pandemic, novel antiviral approaches that can target emerging viruses may be needed, particularly when no effective vaccine or pharmaceutical is available, as is currently the case for COVID-19. Abbott et al. [144] showed that a CRISPR-Cas13-based strategy, which they called PAC-MAN (prophylactic antiviral CRISPR in human cells), can inhibit viruses by effectively degrading viral RNA in intact cells. The approach was tested against SARS-CoV-1 and live influenza A virus in human lung epithelial cells. CRISPR RNAs targeting conserved regions of the viral genomes reduced viral load. The authors concluded that a set of only six CRISPR RNAs could target more than 90% of all coronaviruses, making the approach potentially applicable to diseases caused by both human and animal coronaviruses. The technique could potentially be developed for safe and effective delivery into the respiratory tracts of intact animals [144].
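Choosing a small set of guides that together cover most genomes, as in the six-guide estimate above, can be framed as a set-cover problem. The sketch below is only an illustration under stated assumptions, not the selection procedure used by Abbott et al. [144]: it assumes each candidate CRISPR RNA (crRNA) has already been matched against a panel of genomes, and the function name, guide names and genome IDs are hypothetical. It applies the standard greedy heuristic, which repeatedly picks the guide that adds the most not-yet-covered genomes.

```python
from typing import Dict, List, Set


def greedy_guide_selection(
    matches: Dict[str, Set[str]],
    genomes: Set[str],
    target_fraction: float = 0.9,
    max_guides: int = 6,
) -> List[str]:
    """Greedy set-cover heuristic (hypothetical helper, for illustration only).

    matches maps each candidate crRNA name to the set of genome IDs it is
    assumed to target; genomes is the full panel to be covered.
    Returns the chosen crRNA names in the order they were picked.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    needed = target_fraction * len(genomes)
    while matches and len(covered) < needed and len(chosen) < max_guides:
        # Pick the guide adding the most uncovered genomes
        # (ties go to the guide listed first in `matches`).
        best = max(matches, key=lambda g: len((matches[g] & genomes) - covered))
        gain = (matches[best] & genomes) - covered
        if not gain:
            break  # no remaining guide improves coverage
        chosen.append(best)
        covered |= gain
    return chosen


# Hypothetical toy panel: five genomes, four candidate crRNAs.
panel = {"g1", "g2", "g3", "g4", "g5"}
candidates = {
    "crA": {"g1", "g2", "g3"},
    "crB": {"g3", "g4"},
    "crC": {"g5"},
    "crD": {"g4", "g5"},
}
print(greedy_guide_selection(candidates, panel))  # prints ['crA', 'crD']
```

Greedy selection does not guarantee the smallest possible guide set, but it is a simple and widely used approximation for set cover; a real design pipeline would also have to account for factors such as guide-target mismatches, which this sketch ignores.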

#### *8.2. The Envelope (E) Protein: Viral Assembly*

Among the essential, conserved transmembrane proteins of the Coronaviridae family, the Envelope (E) proteins are multifunctional viroporins. The genomes of CoVs may encode up to two additional viroporins, 3a and 8a, making CoVs among the most viroporin-rich RNA viruses. E proteins are 74–109 aas long [145], with multiple domains and cellular associations. The N-terminal end is a short hydrophilic region, followed by a hydrophobic region containing the α-helical transmembrane segment (TMS) and then a C-terminal hydrophilic region. E proteins contain an unusually short, palindromic transmembrane helical hairpin around a pseudo-center of symmetry, a structural feature that seems to be unique to CoVs [146]. The hairpin deforms lipid bilayers by increasing their curvature, providing a molecular explanation for the pivotal role of E in viral budding [147]. Depending on the CoV, the E protein may be glycosylated [148], palmitoylated [149] and ubiquitinated [150]. These conclusions have been extensively confirmed [151–153], although, unfortunately, a high-resolution X-ray or cryo-EM structure is not yet available. Deletions in various parts of the E protein throughout its length produce an attenuated virus.

Expression of the E and M proteins together in transfected cells is sufficient for VLP formation in MHV, TGEV, BCoV, IBV and SARS-CoV-1 [149]. In some CoVs, E expression is not essential for producing intracellular particles; however, its loss may severely reduce the number of virions released from the Golgi, as for SARS, MERS and MHV. In many CoVs, deletion of or mutations within the E gene attenuate the virus both in vivo and in vitro and reduce disease progression and mortality in animal models. For SARS, ΔE mutants are able to produce viable particles, albeit at lower efficiency [154]. HCoV-OC43 ΔE mutants are severely deficient in replication but still replicate, at much lower efficiency than wild type, in CNS tissue culture and in mice, with decreased pathogenicity [155]. TGEV ΔE and MERS ΔE mutants completely lose their ability to bud from host cells, producing intracellular virions that are unable to infect new cells [156,157]. Deletion of E is lethal for IBV [24].

The ion channel activity of the E protein is a major contributor to the hallmark inflammatory response [158], leading to the cytokine storm and acute respiratory distress syndrome (ARDS) associated with respiratory CoV infections [159]. The two other recognized ion channels, 3a and 8a, have also been shown to elicit inflammatory stress in a similar manner; however, targeted changes to E produce the strongest attenuation of viral infections in SARS [160] and MERS [157]. Some CoVs, such as the γ-CoV IBV, do not have other viroporins [24], and studies on IBV E have shed some light on the various roles of E proteins during infection without the confounding effects of other accessory viroporins.

E proteins are largely interchangeable between β- and γ-CoVs, but not between these CoVs and the α-CoVs [161]. Structurally, β- and γ-CoV E proteins are more similar to each other than to α-CoV E proteins, containing predicted β-hairpin structural motifs in their C-terminal, cytoplasm-facing tails that are responsible for localizing the protein to the membrane. In all CoVs, E localizes to the ER/ERGIC/Golgi perinuclear membranes, consistent with the CoV-induced reticulovesicular network. Specifically, the cytoplasmic tail of IBV E targets and binds to the Golgi marker GM130 and the trans-Golgi marker p115, while SARS E colocalizes with GM130, the ERGIC marker ERGIC53 and the trans-Golgi marker p230 [162,163]. Surprisingly, neither the TMD nor the β-hairpin motif is necessary for Golgi localization, as shown with truncated SARS E, suggesting the presence of a second Golgi localization signal within the N-terminal region of E. A truncated SARS E, containing its N-terminus attached to the C-terminus of the VSV G protein, still colocalized with GM130, ERGIC53 and p230 [163].

All CoVs have a well-conserved proline residue in the β-strands of the E tail. Mutating Pro54 to alanine (P54A) in IBV E disrupts its localization with the Golgi [163]. While the E protein is produced in abundance during infection, very few copies are actually incorporated into mature virus particles [148,158]. Despite this, E deletion mutants of SARS, MERS, MHV, TGEV and IBV produced weakened viruses that either could not escape cells (MERS, IBV, TGEV) [24,156,157] or had difficulty budding (SARS, MHV) [154,164]. For SARS, deleting the E protein leads to 100–1000-fold lower viral titers in the lungs and nasal turbinates of infected hamsters [154] and lower activation of NF-κB, a major immune transcription factor [158]. SARS mutants lacking both 3a and E were non-viable but could be rescued if either 3a or E was reintroduced [160], suggesting partial redundancy and exchangeability of viroporin roles.
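Virological titer differences are often reported on a log₁₀ scale. As a simple worked conversion (not additional data from [154]), the 100–1000-fold reduction above corresponds to a drop of 2–3 log₁₀ units, where $T_{\mathrm{WT}}$ and $T_{\Delta E}$ are symbols introduced here for the wild-type and ΔE titers:

$$
\Delta \log_{10} \text{titer} = \log_{10}\!\left(\frac{T_{\mathrm{WT}}}{T_{\Delta E}}\right), \qquad \log_{10}(100) = 2, \qquad \log_{10}(1000) = 3 .
$$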

Like many of the other CoV proteins, E may perform multiple roles during infection. Since only small numbers of E molecules are incorporated into virions, it has been proposed that E helps scaffold newly forming virions, adding to their structural integrity. Electron micrographs of intracellular virions revealed no change in SARS morphology upon deletion of the E gene, but after purification, many of the ΔE mutant viruses had aberrant or misshapen morphologies, suggesting that E deficiency makes CoV particles susceptible to shearing forces [154]. All E-deleted CoVs produce smaller plaques in vitro and reduced viral titers [148].

When IBV E protein was purified, two distinct molecular weight pools were obtained, suggesting that E oligomerizes [165]. The lower molecular weight pool was predicted to consist of monomers and/or homodimers, while the higher molecular weight pool was predicted to consist of homopentamers or, possibly, hetero-oligomers associated with host proteins [165]. Wild-type IBV E favored the higher molecular weight pool, suggesting that most of the resulting conformations were homopentameric. Disrupting the hydrophobic domain composing the TMD of IBV E abolished the high molecular weight pool, implying that the pentamers form through α-helical interactions [165]. Homopentamers of E had already been suggested by earlier studies [166]. Indeed, E was predicted to require homopentamerization for ion channel (IC) activity, with the amphipathic/hydrophobic α-helical TMSs forming a continuous channel just large enough (4–5 Å for SARS) for a dehydrated cation (H+, Na+, K+ or Ca2+) to pass through [167,168].

β-CoV E proteins tend to be selective for Na+ over K+ ions, but conflicting evidence suggests that SARS E is slightly selective for K+ and Ca2+ ions and that this selectivity depends on the slightly negative charge of ER membranes [168,169]. However, because the ER and Golgi are large Ca2+ stores in cells, E more probably functions as a Ca2+ efflux channel, while minimal amounts of Na+ and K+ are imported into the ER/Golgi lumen [168]. For SARS, the residues predicted to confer IC activity on E are N15 and V25 [167]. Solution NMR analyses in dodecyl-phosphatidylcholine micelles revealed dynamic conformational changes in the homopentamer that could accompany cation translocation [164].

The SARS E N15 residue and its polar equivalents in other CoVs may provide a cation selectivity filter, while V25 and V28 form a 2.0–2.3 Å constriction predicted to correspond to the closed state of the IC [167]. Indeed, a recombinant SARS-CoV carrying an N15A mutation in E consistently lacked IC activity and showed reduced pathogenicity in mice, whereas a mutation at position 25 did not do so consistently [170]. In mice, V25F mutants had reverted to the IC+ phenotype at 2 days post-infection (dpi), either by mutating F25 directly to C or by acquiring mutations at neighboring residues (L19A, F20L, F26L, L27S, T30I and L37R). Modifying the equivalent IC residues in IBV E (T16A or A26F) in vitro gave similar results, and E-A26F was unable to form VLPs in vitro; however, no reversion mutations were recorded [24]. The rapid reversion of V25F to various other residues in mice reveals an obvious danger in using attenuated point-mutant viruses as vaccines. By contrast, the N15A mutant did not revert within the time frame of the study, suggesting that mutations in the predicted cation selectivity filter are more deleterious to channel function, and less easily compensated, than mutations at the structural V25/V28 residues. Perhaps, with the filter left intact, the pore retains its selectivity, so that reopening a channel obstructed by V25F is easier than restoring the N15A filter residue.
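Labels such as N15A, V25F or F25C follow the usual point-mutation convention: wild-type residue, residue number, replacement. The minimal Python sketch below, assuming one-letter amino-acid codes and 1-based numbering, checks such a label against a sequence before applying it and lists the differences between two sequences in the same notation. The six-residue peptide is hypothetical and is not the SARS E sequence; the helper names are illustrative only.

```python
import re

# One-letter amino-acid codes accepted in mutation labels.
_AA = "ACDEFGHIKLMNPQRSTVWY"
# Label format: wild-type residue, 1-based position, replacement (e.g. "N15A").
_LABEL = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'V25F' into ('V', 25, 'F')."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutation(sequence: str, label: str) -> str:
    """Apply a substitution after checking that the stated wild-type residue is present."""
    wild_type, position, replacement = parse_mutation(label)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1  # 1-based residue number -> 0-based string index
    if sequence[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {sequence[index]}")
    return sequence[:index] + replacement + sequence[index + 1:]


def substitutions(reference: str, variant: str) -> list[str]:
    """List the differences between two equal-length sequences as mutation labels."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [f"{a}{i}{b}"
            for i, (a, b) in enumerate(zip(reference, variant), start=1)
            if a != b]


peptide = "GAVNLA"                          # hypothetical peptide
mutant = apply_mutation(peptide, "V3F")     # introduce a V-to-F change
revertant = apply_mutation(mutant, "F3C")   # a second change at the same site
print(substitutions(peptide, revertant))    # ['V3C']
```

Relative to the original sequence, two successive changes at one position (V to F, then F to C) appear as a single V-to-C difference, which is why a "reversion" can restore function without restoring the wild-type residue.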

The drug hexamethylene amiloride (HMA), shown to abolish the IC activity of MHV E and HCoV-229E E, also inhibits the IC activity of SARS E, likely by associating with the N15 residue and its equivalents in other CoVs [167]. Further investigations of SARS N15A mutants revealed 80–100% survival rates in mice, despite similar disease progression during the first 2 dpi [170]. Lung autopsies of these mice revealed less swollen alveolar walls and airways free of pulmonary edema, in contrast to the typical acute respiratory distress syndrome (ARDS) phenotype induced by SARS [170]. Furthermore, neutrophil recruitment was lower in N15A-infected mice owing to reduced secretion of the proinflammatory cytokines IL-1β, TNF and IL-6. IC activity also promotes the fitness and release of IBV viral particles [171,172]. HMA-treated cell lines infected with MHV or HCoV-229E exhibited much smaller plaques than HMA-free infected cells, whose plaques were roughly 3–4 mm in diameter [173].

In addition to inducing pro-inflammatory responses, the IC activity of E may substantially modify secretory or apoptotic pathways. IBV infections are associated with p53-independent, caspase-dependent apoptosis regulated by the CHOP transcription factor and by the IRE1 sensor-mediated unfolded protein response (UPR) pathway. This pathway is stimulated by ER stress and is marked by cleavage of the downstream substrate poly(ADP-ribose) polymerase (PARP) [24]. Similarly, SARS induces apoptosis in cell cultures via protein kinase R (PKR) [174], caspase-3-mediated ER stress and JNK-dependent pathways [175], and PERK- and eIF2α-mediated UPR activation [176]. CoVs, however, favor non-apoptotic budding of virions and appear to regulate apoptotic pathways, likely through E protein IC activity, to optimize virus release. Specifically, CoV-induced apoptosis is related to ER stress, which is induced by viral replication in the ER-derived RVN and by the expression of unfolded, unprocessed accessory and structural proteins [24,177]. In SARS, S induced the greatest ER stress [176], although E IC activity may also induce stress to a lesser degree late in infection [24,178].

E IC deficiency in CoVs, whether induced by mutations or drugs, leads to smaller plaques in cell culture and reduced pathogenicity in vivo. When E IC activity was eliminated in recombinant IBV E-T16A or E-A26F mutants, levels of cleaved PARP and of pro-inflammatory IL-6 and IL-8 mRNAs were reduced [24]. SARS E protein alone reduced ER stress in Vero E6 and MA-104 cells when the stress was induced externally by respiratory syncytial virus or by the ER stress-inducing drugs tunicamycin or thapsigargin. Compared with cells infected with wild-type virus, SARS-CoV-ΔE-infected cells underwent higher rates of apoptosis and expressed more of the dual-specificity phosphatases DUSP-1 and DUSP-10, despite expressing lower levels of the proinflammatory chemokines CXCL2 and CCL2 [179]. DUSP-1 and DUSP-10 negatively regulate mitogen-activated protein kinase (MAPK) signaling, reducing the virus-induced inflammatory response and the synthesis and secretion of TNF, IL-6, CCL2/MCP-1, CCL3, CCL4 and CXCL2/MIP-2 [179]. Thus, deleting E in SARS or IBV weakens the proinflammatory response while also attenuating virus production and infectivity. E may therefore promote host tolerance of the virus during early to mid infection, delaying apoptosis long enough to allow more viral particles to be produced by budding.

Meanwhile, cells infected with wild-type SARS showed lower surface expression of the ER stress-induced proteins GRP78 and GRP94 and of HSPs that facilitate MHC-I antigen presentation than cells infected with the mutant [179]; as noted above, SARS-CoV-ΔE mutants also induce apoptosis in infected cells at a higher rate than wild-type virus [179]. By contrast, late in infection, IBV IC activity may induce apoptosis by destabilizing the ion gradients between the Golgi lumen and the cytosol [24]. PEDV E protein was also reported to induce ER stress through the UPR, although this result was obtained with a transfected plasmid encoding only PEDV E [178]. The IC activity of E protein [168], as well as that of the accessory viroporins 3a and 8a [160], activates the NLRP3 inflammasome by effluxing Ca2+ from the lumen of the ER/ERGIC/Golgi, altering homeostatic cytosolic Ca2+ levels [168,180] and resulting in upregulation and secretion of the pro-inflammatory cytokines TNF-α, IL-1β, IL-6 and IL-18 [160,181]. ER stress [182] and ROS production [183] also activate the NLRP3 inflammasome, so E, through its regulation of ER stress, may also activate NLRP3 by this alternative mechanism.

Release of assembled CoV virions requires secretory pathways during early to mid infection and lysosomal pathways for egress in late infection [62,182,184]. While secretory pathways are necessary for the production and processing of CoV structural proteins, the Arl8b-dependent lysosomal exocytosis pathway has recently been shown to be the exit pathway of mature MHV and SARS-CoV-2 virions [184]. Hence, the role of E in viral release may be most important during assembly, before mature virions are transported to the plasma membrane for egress. E may modify secretory pathways to facilitate the release of intracellular particles, since E mutants of several CoVs have difficulty leaving the cell. IBV infection, or expression of E alone, neutralizes the pH of the Golgi, suggesting a role for E in altering secretion [185]. Replacing the hydrophobic domain (HD) of IBV E with that of the VSV G glycoprotein led to decreased viral shedding, more damaged particles and accumulation of prematurely cleaved S protein [185], all suggesting a protective role for E in the maturation of other structural proteins. On the other hand, merely replacing the HD residues responsible for IC activity in IBV did not affect glycosylation or proteolytic processing of S [24]. Hence, the HD in its entirety, rather than the IC activity of E alone, may underlie some of the purported functions of E. This suggestion is further supported by the observation that monomers were more strongly correlated than pentamers with IBV-induced secretory modifications, although IC activity supports virion assembly [165]. Thus, additional conformations of E may equip CoVs with multifunctional molecular tools.

E proteins also contain a C-terminal class II hydrophobic PDZ-binding motif (PBM) [158] that anchors them to lipid membranes and participates in the relocalization of syntenin-1, a multifunctional adaptor protein involved in trafficking membrane proteins to perinuclear regions [186], thereby activating p38 MAP kinase-mediated inflammation [160]. This PBM has also been found to interact with the host epithelial cell polarity protein PALS1 [187], disrupting the tight junctions between epithelial cells, and it may contribute to non-apoptotic virus release through a cell-to-cell exit mechanism [188]. Deleting the PBM of SARS E, either by truncating the C-terminus or by replacing the PBM residues to produce a mutant E of the same length, did not affect viral replication efficiency in Vero E6 and DBT-mACE2 cells [189]. However, mice infected with virus possessing E but lacking its PBM showed decreased expression of inflammatory cytokines and of active p38 MAPK in their lungs, with a reduced pathogenic response and lower mortality [160,189]. Additionally, in Vero E6 cells transfected with SARS genomes in which the full-length E PBM was disrupted and 3a was deleted (Δ3a, E-PBM-), or with the converse construct (3a-PBM-, ΔE), infectious virus was produced [160]. Introducing a stop codon to truncate E before its PBM (3a, E-ΔPBM) led to reversion to wild type [160], showing that the PBMs of different CoV proteins can substitute for each other. These studies revealed that PBMs are essential for SARS-CoV-1 infection. The HCoV-OC43 E PBM greatly improves propagation in human and mouse neuronal cells and infectivity in the brains and spinal cords of mice, and its removal attenuates the virus [155].
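The class II PDZ-binding motif is usually written X-Φ-X-Φ at the extreme C-terminus, where Φ is a hydrophobic residue. The minimal Python sketch below checks only that pattern and models a stop codon placed just before it; the set of residues counted as hydrophobic is an assumption, and a pattern match does not by itself demonstrate PDZ binding. The fragment ending in DLLV, generally reported as the C-terminus of SARS-CoV E, is used for illustration and should be checked against the sequence of interest.

```python
# Residues treated as hydrophobic ("Φ") for this check. The exact set is an
# assumption; published definitions of PDZ-binding motif classes vary.
HYDROPHOBIC = frozenset("AVLIMFWY")


def has_class_ii_pbm(sequence: str) -> bool:
    """Return True if the C-terminus fits the class II pattern X-Φ-X-Φ.

    In PDZ nomenclature the final residue is position 0 and the residue two
    places earlier is position -2; class II motifs are hydrophobic at both.
    """
    if len(sequence) < 4:
        return False
    tail = sequence[-4:].upper()  # positions -3, -2, -1, 0
    return tail[1] in HYDROPHOBIC and tail[3] in HYDROPHOBIC


def truncate_before_pbm(sequence: str, motif_length: int = 4) -> str:
    """Model a stop codon placed just before the last `motif_length` residues."""
    return sequence[:-motif_length] if len(sequence) > motif_length else ""


fragment = "SEGVPDLLV"  # illustrative C-terminal fragment (see lead-in)
print(has_class_ii_pbm(fragment))                       # True
print(has_class_ii_pbm(truncate_before_pbm(fragment)))  # False
```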

Other mutations introduced into the E protein also attenuated the virus, implying that other regions of E contribute to viral pathogenicity. Deleting regions along the hydrophilic C-terminus of SARS E reduced pathogenicity in a mouse-adapted model, although deletions at the very end of the C-terminal tail had no such effect [190]. Clearance of viral infections, typical of CoV survivors, is associated with elevated T cell production [191]. SARS pathogenesis is attributed in part to reduced numbers of T cells, primarily CD4+ T cells, leading to the host's inability to clear the infection [190,191]. Attenuating SARS E by deleting these regions leads to less lung tissue damage and higher T cell counts, likely because the deletions disrupt the β-hairpin Golgi localization motif or the PBM [163]. In MHV, replacing clustered charged residues near the C-terminal end of E with alanine (E-K63A/K67A or E-D60A/R61A) resulted in thermally unstable virus particles with much smaller plaques [192].

#### *8.3. The Spike (S) Protein: The Primary Receptor and Membrane Fusion Mediator*

Spike (S) proteins are the receptor-binding glycoproteins and class I fusion proteins of CoVs. S proteins are large (1162–1376 aa), are synthesized in the ER/ERGIC, and are post-translationally modified in the Golgi, undergoing proteolysis and extensive O- and N-linked glycosylation as well as palmitoylation. SARS S proteins are relatively unusual among CoVs, sharing little sequence similarity with their relatives despite strongly conserved structures and functions [193]. SARS-CoV-2 S is similar to that of its 'predecessor', SARS-CoV-1, sharing 76% amino acid identity with SARS-Urbani S, 80% identity with the S proteins of bat SARS-CoVs ZXC21 and ZC45 [194] and 98% identity with that of bat RaTG13 [32,195], and conserving several N-linked glycosylation sites [194]. During synthesis, the protein may be cleaved by host or viral proteases into the S1 (head and receptor binding) and S2 (membrane-embedded stalk and fusion) subunits [196], or it can be left as a full-length S protein, requiring cleavage at S1/S2 upon receptor binding [197]. If cleaved, these subunits remain noncovalently bound to each other [193]. The S1 subunit can be further divided into the N-terminal domain (NTD) and the C-terminal domain (CTD), both of which participate in receptor binding [198]. Each S1/S2 protomer then associates with two others to form the complete, trimeric S protein [196,198].
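Percent identity values such as the 76%, 80% and 98% quoted above depend on how they are computed. The Python sketch below assumes the two sequences are already aligned (gaps written as '-') and divides identical positions by the number of ungapped columns; other tools use other denominators (alignment length, shorter sequence length), so this sketch is not expected to reproduce the published values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Only columns where neither sequence has a gap are compared; this
    denominator is an assumption, since alignment tools differ.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment: 3 of the 4 ungapped columns are identical.
print(percent_identity("AC-DE", "ACGDF"))  # 75.0
```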

Multiple important domains within the S1 and S2 subunits contribute to the binding and fusion functions of the protein. The S1 subunit (N-terminal residues 14–685 for SARS-CoV-2) [199] contains the receptor-binding domain (RBD) that associates with the host receptor (DPP4 for MERS or hACE2 for SARS). The RBD comprises either the S<sup>A</sup> domain of CoV-HKU1, HCoV-OC43 and MERS or the S<sup>B</sup> domain of SARS-CoV-1 and SARS-CoV-2, which interacts with the host receptor [194]. S<sup>B</sup> binds directly to ACE2, allowing viral entry into target cells [194]. The S<sup>B</sup> domain operates like a lock and key, existing in an open or closed conformation that may induce differential folding at the S1/S2 junction [194,200]. The open and closed conformations of S<sup>B</sup> are transient states that stochastically expose and shelter the RBD [195]. For SARS-CoV-2, the closed conformation is characterized by the RBD bound in trans in a pocket formed by the NTD and RBD of the neighboring S1 monomer [201]. SARS-CoV-2 S has been reported to bind the ACE2 receptor with even higher affinity [202], about 20-fold higher than that of SARS-CoV-1 S [195]. SARS-CoV-1 S also binds to the C-type lectin DC-SIGN (dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin), as well as to DC-SIGNR, on dendritic cells without engaging the fusion complex [203]. Since dendritic cells migrate to lymphatic tissues, SARS may use dendritic cells as 'ferries', traversing blood and lymphatic vessels to new ACE2+ tissues and causing systemic infections [203].
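The reported ~20-fold difference in ACE2 affinity can be placed on an energy scale. Assuming (this is not stated in the text) that "affinity" refers to the equilibrium dissociation constant $K_d$, that SARS-CoV-2 S has the 20-fold lower $K_d$, and a temperature of 298 K, the standard relation $\Delta G^\circ = RT \ln K_d$ gives:

```latex
\Delta\Delta G^\circ
  = RT \ln\frac{K_{d,\text{SARS-CoV-2}}}{K_{d,\text{SARS-CoV-1}}}
  = RT \ln\frac{1}{20}
  \approx -\left(1.987\times10^{-3}\ \text{kcal mol}^{-1}\,\text{K}^{-1}\right)(298\ \text{K})(3.00)
  \approx -1.8\ \text{kcal mol}^{-1}
```

The negative sign corresponds to tighter binding. This is an order-of-magnitude illustration only, since measured $K_d$ values depend on the assay and protein constructs used.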

Upon binding to ACE2, the S<sup>B</sup> domain adopts the open configuration, releasing constraints at the S1/S2 site [194]. SARS-CoV-2 is much more susceptible to fusion activation than SARS-CoV-1, consistent with the presence of an additional furin cleavage site, which has been confirmed between S1 and S2 (residues 677–687) [194]. Depending on the mode of entry and the CoV strain, this site can be cleaved by host furin, transmembrane protease serine 2 (TMPRSS-2), TMPRSS-4, trypsin, lysosomal cathepsins or human airway trypsin-like protease (HAT), priming the class I fusion complex during synthesis, at the cell surface or within an endosome [32]. The additional furin cleavage site in SARS-CoV-2 S may expand its tropism or its propensity to fuse with host cells. Many SARS-CoV-1 and SARS-CoV-2 pseudovirions/virions contain pre-cleaved S1/S2, indicating that cleavage can occur during S synthesis [197,199].
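A furin site of this kind can be located with a simple motif scan. This Python sketch assumes the minimal furin consensus R-X-X-R, with cleavage after the final arginine, and uses the 11-residue fragment QTNSPRRARSV, taken here to be SARS-CoV-2 S residues 677–687 (verify against a reference sequence). A motif match is only a candidate; structural accessibility and flanking residues also determine whether cleavage occurs.

```python
import re

# Minimal furin motif R-X-X-R (cleavage after the last R). The lookahead lets
# overlapping matches be reported. Using the minimal rather than the stricter
# R-X-[K/R]-R consensus is an assumption of this sketch.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def furin_sites(sequence: str, first_residue: int = 1) -> list[int]:
    """Residue numbers of the P1 arginine (cleavage follows it) for each match.

    `first_residue` is the residue number of the first character, so that a
    fragment can be reported in full-length protein numbering.
    """
    sites = []
    for match in FURIN_MINIMAL.finditer(sequence.upper()):
        # match.start() is where "R..R" begins; the P1 arginine is 3 residues later.
        sites.append(first_residue + match.start() + 3)
    return sites


print(furin_sites("QTNSPRRARSV", first_residue=677))  # [685]
```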

Fusion is likely pH-independent [204] but may be regulated by endosomal maturation and Ca2+ [51,52,205]. After receptor binding and S1/S2 cleavage, a final cleavage must occur at the S2' site (residue R797 in SARS-CoV-1) [206]. This cleavage relieves steric constraints on two amphipathic α-helical heptad repeats with a 4-3 coiled-coil pattern, HR1 (residues 910–988 in SARS-CoV-2) and HR2 (residues 1162–1206 in SARS-CoV-2), in the S2 subunit, loosening this joint. X-ray crystallography revealed that the post-fusion conformation of these HRs is characterized by a hip-knee-ankle style of folding, in which three HR2 helices pack in an antiparallel manner onto the hydrophobic grooves of the central HR1 coiled coil [194,207]. This folding reduces the distance between the viral envelope and the host surface/endosomal membrane, allowing insertion of fusion peptides (FPs) into the host membrane before membrane fusion [206].
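The "4-3" description of HR1 and HR2 refers to the heptad repeat (positions a to g), in which residues at positions a and d, spaced alternately three and four residues apart, form the hydrophobic seam of the coiled coil. The sketch below assigns heptad positions given an assumed register for the first residue and extracts the a/d residues; the peptide is an idealized, hypothetical repeat, not an S2 sequence.

```python
HEPTAD = "abcdefg"


def heptad_positions(sequence: str, first_position: str = "a") -> list[tuple[str, str]]:
    """Pair each residue with its heptad position, given the register of residue 1."""
    if first_position not in HEPTAD:
        raise ValueError("first_position must be one of a-g")
    offset = HEPTAD.index(first_position)
    return [(residue, HEPTAD[(i + offset) % 7]) for i, residue in enumerate(sequence)]


def core_residues(sequence: str, first_position: str = "a") -> str:
    """Residues at positions a and d, the hydrophobic core of a 4-3 coiled coil."""
    return "".join(residue
                   for residue, position in heptad_positions(sequence, first_position)
                   if position in "ad")


peptide = "LKELEAKLKELEAK"  # hypothetical idealized heptad repeat
print(core_residues(peptide))       # 'LLLL' with residue 1 at position a
print(core_residues(peptide, "d"))  # 'LELE' if the register is shifted
```

Choosing the wrong register puts polar residues in the predicted core, which is one reason heptad assignments are usually checked against structure.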

Several efforts have been made to identify the FPs and the regions in S2 that contribute to membrane fusion. Three regions in SARS S2, termed R1, R2 and R3, were found to have membrane-associating properties. R1 (858–886) is upstream of HR1, R2 (1077–1092) lies between HR1 and HR2, and R3 (1190–1202) is proximal to the TM portion of S2. Of these regions, mutations in R1 decreased syncytium formation [206]. Upstream of and overlapping with R1 are regions identified by Wimley-White interfacial hydrophobicity analysis, termed WW-I (770–778) and WW-II (864–886), which associate strongly with membranes [206]. An exposed FP (FP1, 798–818) is likely located on the WW-I, N-terminal side of HR1 and contains the Ca2+ salt bridge-forming residues D830 and L831. Immediately following FP1 is FP2 (816–835), which contains two disulfide bridge-forming cysteine residues, C822 and C833. FP1 is highly conserved among CoVs and contains a crucial invariant LLF motif; mutating it to alanines causes defective fusion [208]. Additional putative FPs are the Alt-FP (770–788), which lies downstream of the S1/S2 cleavage site and overlaps with WW-I, and the internal FP (IFP, 873–888), which coincides with R1 [209]. In addition, the region upstream of the TMS (1185–1202) appears to associate with membranes [209]. In ESR analyses, exposing multilamellar vesicles (MLVs) to each of these putative FP segments individually increased membrane ordering, an indicator of viral fusion peptide activity. All of the FPs required Ca2+, and activity occurred from pH 5 to pH 7, consistent with the ability of S to fuse membranes at neutral pH. FP1 and FP2 likely act in concert, embedding themselves in membranes and forming disulfide and Ca2+ salt bridges that stabilize the fusion complex [206,209]. Fusion activity of FP2 was undetectable in the absence of Ca2+ or after treatment with the disulfide-reducing agent dithiothreitol [209]. Understanding the conformations of these FPs may reveal additional drug targets for inhibiting SARS entry.
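Because several of these S2 segments overlap, a small interval helper makes their relationships explicit. This sketch encodes the SARS S residue ranges exactly as quoted in this section (endpoints assumed inclusive) and lists every pair that shares residues; it is bookkeeping only and says nothing about fusion activity.

```python
from typing import Optional

Range = tuple[int, int]  # inclusive residue range (start, end)

# SARS S2 segments with the residue ranges quoted in the text.
REGIONS: dict[str, Range] = {
    "WW-I": (770, 778),
    "Alt-FP": (770, 788),
    "FP1": (798, 818),
    "FP2": (816, 835),
    "R1": (858, 886),
    "WW-II": (864, 886),
    "IFP": (873, 888),
    "R2": (1077, 1092),
    "pre-TM": (1185, 1202),
    "R3": (1190, 1202),
}


def overlap(a: Range, b: Range) -> Optional[Range]:
    """Shared residues of two inclusive ranges, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def overlapping_pairs(regions: dict[str, Range]) -> list[tuple[str, str, Range]]:
    """All pairs of named regions that share at least one residue."""
    names = sorted(regions, key=lambda name: regions[name])  # by start, then end
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = overlap(regions[first], regions[second])
            if shared is not None:
                pairs.append((first, second, shared))
    return pairs


for first, second, (start, end) in overlapping_pairs(REGIONS):
    print(f"{first} / {second}: residues {start}-{end}")
```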

S protein also induces cell-to-cell fusion, producing syncytia and tissue damage [210]. New intracellular CoV particles can exit a cell and enter adjacent cells of epithelial tissues directly, disrupting cellular barriers and producing multinucleated cells. SARS-CoV-1, MHV, IBV, MERS and SARS-CoV-2 have all been reported to induce syncytium formation in cell lines in vitro, likely owing to the ability of S to engage its fusion complex at neutral pH [32,211–214]. However, SARS-CoV-2 has an unprecedented capacity to form syncytia, producing multinucleated 293T or Huh-7 cells with hundreds of nuclei each [199]. Host proteases on membrane surfaces may be required for immediate fusion and entry of the virus into neighboring cells. TMPRSS-2, present on the membrane opposite the one bearing S, seems to be required for syncytium formation in SARS-CoV-1- and SARS-CoV-2-infected Vero E6 cells [214,215]. When SARS S alone was expressed from a cDNA in one of several cell lines, it appeared on the surface of the transfected cells, and for some CoVs, S alone has been reported to form syncytia [212]. These results suggest that SARS S binds to ACE2 and is readily cleaved by membrane-bound proteases [215]. An accumulation of S protein delivered to the cell surface may induce fusion between neighboring cell membranes.

Work with MHV-2, an MHV strain whose S can be cleaved only by cathepsins and which cannot generate syncytia, further supports the requirement for a membrane protease in syncytium formation [211]. TMPRSS-2 was identified as a potent entry factor for SARS-CoV-2 in nasal epithelial cells [216], and it is abundant in respiratory epithelia [217]. Hence, therapeutics under development that suppress the interaction of SARS with TMPRSS-2 may reduce viral replication in tissues. Additionally, mutating all nine palmitoylated residues in S, which does not disrupt folding, trafficking or core functions, does disrupt syncytium formation [218]. The increased capacity of SARS-CoV-2 to form syncytia could be due to the additional furin cleavage site and the more frequent occurrence of pre-primed S protein, so that only S2 cleavage is a prerequisite for fusion. Recently, the pan-coronavirus fusion inhibitor EK1C4, a variant of the EK1 peptide, was found to inhibit SARS-CoV-2 fusion in a dose-dependent manner by binding to HR1, but the exact mechanism was not revealed [199].

Tropism for ACE2 may be mediated by cholesterol. SARS and other enveloped viruses, such as HIV, have been reported to depend on lipid rafts for entry [219]. ACE2 associates with detergent-resistant, cholesterol-rich microdomains in membranes, but treating cells with methyl-β-cyclodextrin (MβCD) did not affect ACE2 expression [220]. Cholesterol depletion with MβCD in several cell lines inhibited binding of S to ACE2 and SARS entry [220,221]. ACE2 receptor binding may be promoted by S palmitoylation, since palmitoylation was found to promote S association with lipid rafts and detergent-resistant membranes [218]. However, because the virus can infect new cells through cell-to-cell fusion, depletion of cholesterol and lipid rafts could not completely suppress viral replication [219]. Thus, lipid rafts promote viral entry, but not necessarily viral replication within tissues.

As noted earlier, S can bind to DC-SIGN, a C-type lectin on the surfaces of dendritic cells. However, SARS is also known to infect monocytes in the lungs of SARS patients through both ACE2-independent and ACE2-dependent mechanisms [222,223]. Infection of T cells by SARS-CoV-2 has also been reported [224], which may contribute to severe inflammation and T cell depletion. A preprint study reported tropism for white blood cells, possibly mediated through CD147, although further evidence is required for confirmation [225]. While abundant IFN-γ present in macrophages may suppress viral replication [222], SARS has multiple IFN-suppressing strategies that allow it to evade and silence innate antiviral activity in monocytes [223]. For instance, the virus can avoid detection by intracellular pattern recognition receptors (PRRs) such as MDA5 and RIG-I. These PRRs, which elicit a specific antiviral response upon detecting a virus, are either never activated or are silenced through suppression of IRF-3 [223]. Instead of an antiviral response, cytokine secretion may be dominated by nonspecific inflammatory mediators that contribute to SARS-CoV-1- and SARS-CoV-2-associated diseases [223].

S protein may activate complement early in infection, which could explain the early onslaught of circulating cytokines in SARS patients. IgM, IgG, mannose-binding lectin (MBL) or the alternative pathway may allow recognition of and binding to S, thereby activating complement. Subsequent activation of downstream complement pathways results in a flush of proinflammatory cytokines, possibly leading to a cytokine storm. Since complement can be activated directly by the presence of antigen, S protruding from viral particles may be a major contributor to the development of disease. Because S is abundantly glycosylated with N-linked mannosyl oligosaccharides, SARS could be sequestered early through binding of MBL to S. MBL has been shown to bind to S on pseudoviruses bearing SARS-CoV-1 S, specifically at an N-linked oligomannosyl glycosylation site in the RBD. Binding at this site within the RBD prevents S from binding to DC-SIGN, but not to ACE2 [226]. MBL was also found to bind to SARS-CoV-1-infected FRhK-4 cells and to immobilize actual SARS-CoV-1 particles, inhibiting their infectivity [227]. Thus, low serum levels of MBL may be a susceptibility factor for SARS [227]. Despite these findings, the roles of MBL and C3b in activating complement early in infection remain unclear, even though complement activation has been confirmed in SARS-CoV-1, MERS and SARS-CoV-2 infections [228,229].

#### *8.4. The Membrane Matrix (M) Protein, the Virion Scaffold*

The homologous M proteins of CoVs and many other enveloped viruses have been called the membrane proteins, the matrix proteins, the M proteins, or simply "M". M is the most abundant protein in any coronavirus virion, and it is among the most conserved and constrained of all the viral structural proteins [230]. This may be attributed to its many functions in the viral infection cycle as well as in interferon antagonism (see Table 3) [230]. All of the major structural proteins of these viruses are derivatized and/or hydrolyzed at specific positions by post-translational modification (PTM) reactions (see Fung & Liu, 2018 [231] for a review). These derivatization reactions involve (1) protease-mediated hydrolysis by both virus- and host cell-encoded proteases, (2) O- (serine or threonine) and/or N- (asparagine) glycosylation, (3) palmitoylation of the spike (S) and envelope (E) proteins, (4) protein phosphorylation by ATP-dependent protein kinases, and (5) ADP-ribosylation of the nucleocapsid (N) protein. PTMs of nonstructural "accessory" proteins have also been documented [231].

**Table 3.** Potential Functions of Coronavirus Matrix (M) proteins.


The most complicated of these PTM reactions is glycosylation. M proteins of SARS-CoVs are O-glycosylated (on seryl and/or threonyl residues), and not N-glycosylated (on asparaginyl residues) [232,233]. The structures of the O-linkages are known and include O-linked N-acetylgalactosamine to which galactosyl and sialyl residues are glycosidically linked [234]. O-Glycosylation occurs in the Golgi and has been used as a marker for proper M protein intracellular trafficking, membrane insertion and maturation [235]. It seems that glycosylation is non-essential for assembly of some CoV virions, but it greatly facilitates the formation of active virus particles, and it also regulates interferon production (see below). The established or probable functions of M proteins are presented in Table 3.

The primary function of M is assembly of newly formed viral particles. As noted above, it is the most prevalent protein component of the virion. It provides a homodimeric scaffold for virion assembly and has affinity not only for itself, but also for all of the constituent structural proteins found in the virion. Thus, to provide the "master assembly function", it has both homotypic and heterotypic associative properties. In one study, based on cryoEM, tomography and statistical analyses, Neuman et al., 2011 [236] suggested that M can assume two distinct conformations. One, they suggested, is elongated, being associated with rigidity, spike clustering and a narrow range of membrane curvature, while the other is more compact and is associated with greater flexibility and a lower spike density. Presumably, the proper ratio of these two forms determines the final virion construction. As noted above, M associates with itself to form dimers, but also with the nucleocapsid (N)-protein, the spike (S)-protein and the envelope (E)-protein as well as the genomic RNA. Thus, with M as the 'glue', holding the complex together, these primary constituents of the viral particle determine virion size and shape with M playing the dominant role.

A subsequent study led to the suggestion that initial self-assembly and ultimate release of the membrane-enveloped vesicle/particle (the virion) depend most importantly on the association of M with N and the viral RNA [237]. Assembly seems to be a multi-step process, as illustrated in Figure 2. First, M self-associates, creating an M-protein homodimer; this self-association involves several distinct regions of M, which may explain why it occurs with high affinity. The heterotypic interactions of M may then be largely responsible for the order of the protein associations. Because of the TMSs in M, this early intermediate is likely to be membrane-associated already. Second, although M is made in the ER, it acts in either the trans-Golgi network or the ER-Golgi intermediate compartment (ERGIC) during assembly, depending on the coronavirus, clearly requiring specific host-catalyzed trafficking of M through the endomembrane network [238]. In this regard, two motifs in the C-terminal domain (DxEER and KxGxYR in the MERS CoV M protein) are ER export and trans-Golgi network retention sequences, respectively [238]. Third, M associates with N, the nucleocapsid protein, again probably via multiple sites in M, although a particularly important sequence for this association is the di-leucine motif in the C-terminal tail of M [237]. Based on mutational analyses, the N-terminal ectodomain and the central TMSs appear to be less important for the association of N with M. However, a central cysteinyl residue (C158) also plays a role [237]. The C-terminal domain of N is largely responsible for the association with M [239]. Fourth, the M-N association allows the genomic RNA to become part of the developing particle complex because of the high affinity of N for this nucleic acid.
However, incorporation of the genomic RNA into the complex may occur simultaneously with the M-N association because of the high affinity of N for the genomic RNA. Fifth, the endomembrane-M association allows recruitment of the spike (S) protein to the particle; in fact, M has affinity for all of the other structural proteins that end up in the virion. Sixth, several M-protein residues seem to be involved in the final secretion and budding processes; these residues are scattered throughout the protein and probably play specific roles [237]. Finally, the E protein, together with M, with which it interacts, plays a significant but less well-defined role in the assembly process [239]. The stage(s) of E's involvement in the temporal scheme outlined here are not as well defined as the general scheme itself. As will become apparent, the viroporin functions of E and 3a are thought to play a role (see below).
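The C-terminal sorting signals named in step two (DxEER, KxGxYR) and the di-leucine motif in step three are short linear motifs that can be located with regular expressions, writing the "x" (any residue) as ".". A minimal sketch; the 15-residue peptide is hypothetical, and a match alone does not show that a motif is functional or accessible.

```python
import re

# Motifs named in the text; "x" (any residue) is written as ".". The lookaheads
# allow overlapping matches (e.g. "LLL" contains two di-leucines).
MOTIFS = {
    "ER export (DxEER)": re.compile(r"(?=(D.EER))"),
    "trans-Golgi retention (KxGxYR)": re.compile(r"(?=(K.G.YR))"),
    "di-leucine (LL)": re.compile(r"(?=(LL))"),
}


def find_motifs(sequence: str) -> dict[str, list[int]]:
    """Map each motif name to the 1-based start positions of its matches."""
    seq = sequence.upper()
    return {name: [match.start() + 1 for match in pattern.finditer(seq)]
            for name, pattern in MOTIFS.items()}


print(find_motifs("MADLEERKAGSYRLL"))  # hypothetical peptide
```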

Beyond these virus-virus interactions, Gordon et al. (2020) [240] cloned, tagged and expressed 26 of the 29 SARS-CoV-2 proteins in human cells and used affinity-purification/mass spectrometry (AP-MS) to identify the human proteins physically associated with each of the 26 viral proteins. They identified 332 high-confidence SARS-CoV-2-human protein-protein interactions (PPIs). Among the interacting human proteins were 66 potential drug targets (host factors), which are targeted by 69 compounds. This work therefore provides a guide for developing anti-viral drugs that may block different stages of the SARS-CoV-2 infection cycle.

#### **9. Viroporin Activities: The E, 3a and 4a Proteins, and the Ability of Mutated M to Substitute for E**

Three distantly related proteins of SARS-CoV-1 and other related coronaviruses display very similar hydropathy plots: M (TC# 1.A.117), protein 4a (TC# 1.A.89), and protein 3a (TC# 1.A.57), which may carry other designations depending on the virus. Figure 3 compares their topologies as average hydropathy plots; although variation occurs within each family, the overall profiles are similar across all three families. Moreover, surprisingly similar hydropathy plots, accompanied by sequence similarity of borderline significance, can be observed between members of different families. As noted above, viroporin activities have been demonstrated for the E, 4a and 3a proteins, as well as for a fourth family of apparent viroporins classified under TC family # 1.C.99; however, porin activity has not been demonstrated for M [80,241,242]. Nevertheless, an intriguing study by Kuo and Masters [243] suggested a functional relationship between the M and E proteins. They eliminated E by deleting its structural gene from mouse hepatitis virus. The resulting virus was still infectious, but it assembled poorly, produced virions with altered morphology, and formed tiny plaques. The authors then selected for "suppressor" mutants with at least partially restored viral growth and virion production, which formed much larger plaques. The secondary mutations mapped to M and arose in a sequential process involving M-gene duplication: one copy retained the native M gene, while the second encoded an altered M protein (M\*) with a truncated C-terminus. Both M and M\* were incorporated into the virion. It seems that M\* served as a surrogate for E, a new gene function generated through recombination. Since E is known to have viroporin activity [244,245], it seems plausible that M\* had acquired, at least in part, the viroporin activity of the deleted E protein.
Although these authors interpreted their observations differently, we suggest that amino acid substitutions may enable the N-terminal transmembrane domain of M to form transmembrane pores, a suggestion that remains to be confirmed or refuted. In Figure 4, we provide comparative hydropathy plots for the SARS 3a viroporin, the HCoV-229E 4a viroporin and the M protein.

**Figure 3.** Average hydropathy and similarity within three families. (**A**) Family of SARS-3a cation-selective viroporins. (**B**) Family of HCoV-229E-4a cation-selective viroporins. (**C**) Family of M (matrix) proteins. Red curves indicate average hydropathy, gray curves indicate average similarity per position, and thin vertical black bars on the *x*-axis indicate regions predicted to be part of TMSs. Conserved hydrophobic peaks (inferred TMSs) are highlighted with moccasin-colored bars. Proteins within each family were aligned with MAFFT [246] using the L-INS-i algorithm, and the alignments were then trimmed with trimAl [247] to keep positions with fewer than 30% gaps. Plots were generated with the program AveHAS [248]. Note the high topological similarity among the three families despite their poor sequence similarity.
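The average hydropathy curves in Figure 3 rest on averaging residue hydropathy over the positions of a trimmed multiple alignment. The Python sketch below illustrates that idea under stated assumptions and is not the AveHAS algorithm: it uses the Kyte-Doolittle scale, drops alignment columns in which 30% or more of the sequences have gaps (mirroring the cutoff in the legend, not trimAl's actual implementation), averages the hydropathy of each remaining column, smooths the profile with a centred sliding window, and flags sustained peaks above a threshold as candidate TMSs. The function names are hypothetical, and the 19-residue window and 1.6 threshold are the commonly cited Kyte-Doolittle defaults, chosen here for illustration only.

```python
"""Illustrative sketch of an alignment-averaged hydropathy profile.

Assumptions (not from the article): Kyte-Doolittle scale, a centred sliding
window, and a simple threshold rule for TMS-like peaks. Not the AveHAS code.
"""
from typing import List, Tuple

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def filter_gappy_columns(alignment: List[str], max_gap_fraction: float = 0.3) -> List[str]:
    """Keep only columns in which fewer than max_gap_fraction of sequences have a gap."""
    n_seqs = len(alignment)
    keep = [
        j for j in range(len(alignment[0]))
        if sum(seq[j] == "-" for seq in alignment) / n_seqs < max_gap_fraction
    ]
    return ["".join(seq[j] for j in keep) for seq in alignment]


def column_hydropathy(alignment: List[str]) -> List[float]:
    """Mean Kyte-Doolittle value of the non-gap residues in each column."""
    profile = []
    for j in range(len(alignment[0])):
        values = [KD[seq[j]] for seq in alignment if seq[j] in KD]
        profile.append(sum(values) / len(values) if values else 0.0)
    return profile


def smooth(profile: List[float], window: int = 19) -> List[float]:
    """Centred sliding-window mean; the window is truncated at both ends."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out


def hydrophobic_peaks(profile: List[float], threshold: float = 1.6,
                      min_len: int = 15) -> List[Tuple[int, int]]:
    """Half-open (start, end) ranges where the profile stays above threshold for >= min_len positions."""
    peaks, start = [], None
    for i, value in enumerate(profile + [float("-inf")]):  # sentinel closes a trailing run
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                peaks.append((start, i))
            start = None
    return peaks


if __name__ == "__main__":
    # Toy alignment: polar flank, 20-residue hydrophobic stretch, polar flank,
    # plus a leading column gapped in 2 of 3 sequences (removed by the filter).
    toy = [
        "-" + "D" * 10 + "I" * 20 + "N" * 10,
        "-" + "E" * 10 + "I" * 20 + "D" * 10,
        "A" + "N" * 10 + "I" * 20 + "E" * 10,
    ]
    trimmed = filter_gappy_columns(toy)
    profile = smooth(column_hydropathy(trimmed), window=5)
    print(hydrophobic_peaks(profile))  # [(11, 29)]
```

With real data, a trimmed MAFFT alignment would be read in and the default 19-residue window used; the 5-residue window here only keeps the toy example short enough to check by hand.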

**Figure 4.** Topological relationships between SARS-VP (3a) viroporins, 229E (4a) viroporins, and M-proteins. Families were compared using our methodological pipeline based on the transitivity property of homology [249,250]. Hydrophobic peaks (inferred TMSs) are highlighted as green bars. Pfam domains were projected with the program GetDomainTopology [250] and drawn as solid black bars above the *x*-axis. (**A**) Hydropathy plots of representative alignments (E-value: 1.3 × 10<sup>−5</sup>) between a SARS-VP (3a) viroporin homolog AWV67041 (red) and a 229E (4a) viroporin homolog ADX59489 (blue). The characteristic Pfam domain of family 229E viroporins (PF03053) was projected to the SARS-VP homolog ADX59489 (E-value: 8.7 × 10<sup>−4</sup>). (**B**) Hydropathy plots of the representative alignments (E-value: 6.1 × 10<sup>−7</sup>) between a SARS-VP (3a) homolog ADX59475 (red) and an M-protein homolog ARI44791 (blue). The characteristic Pfam domain of the M-protein family (PF01635) was projected to the SARS-VP homolog ADX59475 (E-value: 4.8 × 10<sup>−3</sup>). (**C**) Hydropathy plots of the representative alignments (E-value: 1.4 × 10<sup>−6</sup>) between a 229E (4a) viroporin homolog ABQ57217 (red) and an M-protein homolog YP\_003858587 (blue). The characteristic Pfam domain of family 229E viroporins (PF03053) was projected to the M-protein homolog YP\_003858587 (E-value: 1.4 × 10<sup>−3</sup>). Note how the projected domains cover the entire length of the alignments in panels A-C. Altogether, the compatibility of TMS topologies (Figure 3) and the similarity of sequence characteristics among these three families suggest that they form a superfamily.

#### **10. Post Translational Modifications (PTMs) to Coronaviral Structural Proteins**
