**A Systematic Review of Genotype–Phenotype Correlation across Cohorts Having Causal Mutations of Different Genes in ALS**

**Owen Connolly <sup>1,†</sup>, Laura Le Gall <sup>1,†</sup>, Gavin McCluskey <sup>1,2</sup>, Colette G Donaghy <sup>2,3</sup>, William J Duddy <sup>1</sup> and Stephanie Duguez <sup>1,\*</sup>**


Received: 18 April 2020; Accepted: 15 June 2020; Published: 29 June 2020

**Abstract:** Amyotrophic lateral sclerosis is a rare and fatal neurodegenerative disease characterised by progressive deterioration of upper and lower motor neurons that eventually culminates in severe muscle atrophy, respiratory failure and death. There is a concerning lack of understanding regarding the mechanisms that lead to the onset of ALS and as a result there are no reliable biomarkers that aid in the early detection of the disease nor is there an effective treatment. This review first considers the clinical phenotypes associated with ALS, and discusses the broad categorisation of ALS and ALS-mimic diseases into upper and lower motor neuron diseases, before focusing on the genetic aetiology of ALS and considering the potential relationship of mutations of different genes to variations in phenotype. For this purpose, a systematic review is conducted collating data from 107 original published clinical studies on monogenic forms of the disease, surveying the age and site of onset, disease duration and motor neuron involvement. The collected data highlight the complexity of the disease's genotype–phenotype relationship, and thus the need for a nuanced approach to the development of clinical assays and therapeutics.

**Keywords:** ALS; MND; ALS variants; genotype–phenotype; ALS genes

#### **1. Introduction**

Amyotrophic lateral sclerosis, or ALS, is characterised by a progressive and fatal degeneration of upper and/or lower motor neurons (UMN and LMN, respectively) resulting in muscle weakness and wasting. Classical ALS is the most common form of motor neuron disease (MND) [1] and is defined by the selective deterioration of both UMN and LMN [2]. The global incidence of ALS varies between 1 and 2.6 cases per 100,000 people per year [3], with the average age of onset ranging from 54 to 67 years old [4]. The prevalence of ALS increases with age, reaching 1/5000 among people aged 70–79 years old [5]. Consequently, as the population ages, it is expected that the world's total number of cases will reach more than 375,000 by 2040 [6]. Owing to the lack of a reliable diagnostic test, absence of validated biomarkers, and phenotypes that are easily confounded with other MNDs, including primary lateral sclerosis (PLS) and progressive muscular atrophy (PMA), there is a delay of approximately 11–12 months in reaching a definite diagnosis [7]. Currently, diagnosis is based on a set of clinical criteria (El Escorial [8] and revisions [9], and Awaji-Shima criteria [10]) that can be used to stratify patients according to the area of initial onset and the progression of symptoms.

ALS phenotypes vary between patients who can present with different sites of onset and symptom severity (Figure 1). Concomitant impairments in cognitive ability are sometimes associated with the ALS phenotype. A recent finding from Chiò et al. suggested that 20.5% of ALS patients had frontotemporal dementia (FTD), and a further 31.3% had a behavioural, cognitive or non-executive impairment [11].

**Figure 1.** Clinical features of amyotrophic lateral sclerosis (ALS) and their role in prognosis. Diagram summarising the heterogeneity of clinical features in ALS. Multiple features have been associated with a poor prognosis, with an elderly onset being associated with a rapid progression of symptoms and a poor prognosis, especially among elderly females presenting with bulbar-onset phenotype [12]. Disease progression can be assessed either by diagnostic delay or by the ALS functional rating score (ALSFRS: amyotrophic lateral sclerosis functional rating scale). Poor prognosis is associated with patients whose ALS diagnosis has been given less than 8 months after symptom onset, or among those patients losing more than 1.4 points/month on the ALSFRS scale [13].

In the past 30 years, there have been a large number of studies investigating the genetic underpinnings of ALS. To date, over 30 genes have been related to the disease; yet it is important to note that mutations in these genes explain only ~20% of total ALS cases [14] whilst the majority of cases remain unexplained and present no family history. ALS is therefore considered to be a mainly sporadic disease (sALS), with ~80% of cases having no known genetic basis [3], although twin studies have estimated heritability at 40–45% [15] or 61% [16]. Known gene mutations explain some 70% of familial cases (fALS) [17,18], and they have also been identified in 10% of sporadic cases [18]. In European cohorts, the hexanucleotide repeat expansion in the *C9orf72* gene is the most common genetic cause of fALS (33.7%) and sALS (5.1%), followed by *SOD1* (14.8% in fALS and 1.2% in sALS cases), *TARDBP*/*TDP-43* (4.2% in fALS and 0.8% in sALS), and *FUS* (2.8% in fALS and 0.3% in sALS) [19].

To understand the molecular mechanisms underlying ALS, it is useful to study genotype–phenotype relationships, to determine whether certain gene mutations are associated with specific clinical features or outcomes. Genotype–phenotype relationships have previously been examined for certain gene mutations, and several informatics resources exist to collect genotype–phenotype data [20–24], but a systematic understanding across different gene mutations has not been established. As a step towards this, the present review gathers together the clinical summary statistics from previously studied cohorts across 22 of the more commonly associated genes. The genes are considered in the order in which they were discovered and, where available, a summary of the reported phenotypes associated with each gene is later provided.

#### **2. Pathological Definition of ALS: Clinical Features and Phenotype Variability**

#### *2.1. Age of Onset Variation*

ALS occurs primarily in patients in their sixth decade, though peak onset is later in sporadic cases (58–63 years) than in familial cases (47–52 years) [25] (Table 1 and Figure 1). Four periods of onset can be defined: juvenile (<25 years old); young (25–45 years); mid–late adulthood (45–70 years); and elderly (>70 years). Juvenile ALS is extremely rare (<1/1,000,000 cases) [26], and is usually associated with slower symptom progression, hence a longer survival time and better prognosis [27]. Some mutations have now been associated with juvenile ALS, such as specific mutations of the *FUS*, *ALS2* and *SETX* genes [26]. UMN rather than LMN dysfunction is predominant among juvenile ALS cases. Young-onset ALS also shows mainly UMN dysfunction, which is predominant in 60% of those patients [26]. Bulbar-onset ALS is rare in young patients and represents ~15% of cases [27]. In addition, young-onset ALS affects a relatively high proportion of males, with a male:female ratio of 3:1 [26]. These young-onset cases are also associated with a better prognosis than older ALS patients. Elderly-onset patients are more likely to present with bulbar symptoms and include a greater proportion of female patients (M:F ratio 1:1.6) [12,26]. Symptom onset after 80 years is associated with a more aggressive phenotype and poor prognosis, with mean survival times of less than 20 months [12].
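The four onset periods defined above can be expressed as a small lookup, which may be convenient when stratifying cohort data. This is a minimal sketch, not a clinical tool: the function name `onset_period` is hypothetical, and because the stated ranges share the boundary ages 45 and 70, the sketch assumes that 45 and 70 both fall in the mid–late adulthood group.

```python
def onset_period(age_years: float) -> str:
    """Map an age of onset (in years) to one of the four periods in the text.

    Assumption: the shared boundary ages 45 and 70 are assigned to
    mid-late adulthood; the source does not say which side they belong to.
    """
    if age_years < 0:
        raise ValueError("age of onset cannot be negative")
    if age_years < 25:
        return "juvenile"
    if age_years < 45:
        return "young"
    if age_years <= 70:
        return "mid-late adulthood"
    return "elderly"


# Example: stratify a few hypothetical ages of onset.
for age in (19, 38, 61, 82):
    print(age, onset_period(age))
```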

**Table 1.** Variability in ALS age of onset and associated clinical features. Summary of clinical features for ALS in different age periods from Chiò et al. [28], Forbes et al. [12], Swinnen et al. [27], Turner et al. [26], Sabatelli et al. [29], and Kiernan et al. [25]. In addition to the classical ALS phenotype with age of onset ranging from 45 to 70 years old (mean age ~61 years old), three additional age of onset periods (columns) have been observed. Male to female ratios, genetic characteristics, site of onset, estimated survival time, and clinical features are shown where applicable. sALS: sporadic ALS. fALS: familial ALS. MN: motor neurons. UMN: upper motor neurons. LMN: lower motor neurons. -: no data.


#### *2.2. Site of Onset Variability*

The majority of ALS cases (~70%) have spinal onset, usually presenting with focal limb weakness [30] such as foot drop or a weak hand [7]. The disease then tends to spread in a contiguous manner, initiating at distinct focal regions of the body and then propagating from the primarily affected area to adjacent secondary sites of the body [31].

In 25% of ALS cases, symptoms develop initially in the bulbar-innervated muscles [30,32]. Bulbar-onset ALS is more common in women [7], especially after 70 years (M:F ratio 1:1.6 [12]). Dysarthria almost always predates dysphagia and cognitive impairment is often present [32].

Approximately 3% to 5% of patients [33] present with respiratory or cognitive onset [25]. Thoracic spinal-onset ALS can present as truncal weakness or respiratory impairment and is associated with poor prognosis, with a mean survival time of just 1.4 years [27,34].

Cognitive-onset ALS patients usually present symptoms characteristic of frontotemporal dementia (FTD), such as changes in behaviour, personality and cognition which are all suggestive of frontal impairments [35].

In summary, initial site of symptom onset varies among ALS patients from classic limb-onset to rare cognitive-onset phenotypes, and a poor prognosis is often associated with bulbar and respiratory onset [25].

#### *2.3. Motor Neuron Involvement in ALS Variants*

ALS patients can present with either an LMN- or a UMN-predominant phenotype (Figure 2). Signs of pure LMN dysfunction are considered as progressive muscular atrophy (PMA), whereas predominant UMN signs are associated with primary lateral sclerosis (PLS) [30]. PMA and PLS are both rare diseases and together represent 5% of MND patients [27].

#### 2.3.1. UMN-Dominant ALS Variants

Patients can present predominant UMN dysfunction as in primary lateral sclerosis (PLS) or pseudobulbar palsy. The UMN predominant phenotype can then progress to ALS, which is observed in 40% of PLS cases [36]. Patients diagnosed with PLS for not meeting the diagnostic criteria for ALS can still slowly develop signs of LMN dysfunction and therefore present both UMN and LMN signs [27]. However, LMN involvement and limb atrophy in PLS is exceptionally rare [37] and the prognosis for PLS patients is better than that for patients diagnosed with ALS as symptom progression is relatively slow.

#### 2.3.2. LMN-Dominant ALS Variants

In contrast, some patients can develop an LMN-dominant phenotype, which can be defined as progressive muscular atrophy (PMA) or as the flail-arm or flail-leg syndrome variants. PMA patients are similar to classic ALS patients without obvious signs of UMN dysfunction. However, 50% to 60% of PMA patients develop degeneration of upper motor neurons during the progression of the disease [38], and post-mortem histopathology has demonstrated that some PMA patients show UMN involvement which could not be detected upon clinical examination [39,40]. In patients with flail-arm or flail-leg syndromes, an LMN pattern of weakness and atrophy is observed in the upper limbs or lower limbs, respectively. Similar to PMA, flail-arm and flail-leg syndrome have been described as LMN variants but can show UMN involvement in the later stages of disease [41]. Involvement of secondary sites should not occur within 12 months of initial onset [42] and prognosis for flail-arm and flail-leg syndrome is better than that seen in ALS, with median survival times of 5 to 6 years [41,43].

#### *2.4. Non-Motor Involvement in ALS and Overlap with FTD*

For many years, ALS was described as a neurodegenerative disorder with no extra-motor involvement. However, non-motor involvement is now accepted in the ALS phenotype [44], with neuroimaging demonstrating reduced grey matter in motor and non-motor brain regions of ALS patients [45], and histopathology suggesting widespread neuronal and glial TDP-43 pathology in the CNS [46]. With regard to symptomatology, a low proportion of ALS patients experience non-motor impairment as a first indication of pathology (3% of sporadic cases and 15% of familial cases) [47]. It has been estimated that approximately 35% of ALS patients present behavioural and/or cognitive changes, with 15% meeting the Neary criteria [48] for FTD diagnosis (ALS-FTD) [47]. The reported percentage seems to be much lower in most gene-specific studies and varies considerably between them, but it should be noted that the number of patients and studies for which these clinical parameters are reported is relatively small (Table S2). ALS and FTD are sometimes described as part of one continuum, with pure ALS patients (without any non-motor involvement) and pure FTD cases (for whom no motor dysfunction has been described) representing opposite ends of the spectrum.

**Figure 2.** The role of upper and lower motor neurons in different ALS variants. ALS is a disease with high variability in clinical phenotype. "Classic ALS" patients will present with signs of both UMN and LMN degeneration. However, patients with progressive muscular atrophy (PMA) and primary lateral sclerosis (PLS) present with LMN-predominant or UMN-predominant signs, respectively. LMN-predominant patients also include flail-arm syndrome and flail-leg syndrome ALS variants where LMN signs are present in upper or lower limbs, respectively. ALS patients might present symptoms in bulbar-innervated muscles; if UMN signs are predominant, patients are diagnosed with pseudobulbar palsy. Blue circles indicate motor neurons of the corticospinal tract. Green circles indicate motor neurons of the corticobulbar tract. Solid circles indicate UMNs and open circles indicate LMNs. Colour of ticks corresponds to colour of variant label and tick location indicates the motor neuron populations affected. ALS: amyotrophic lateral sclerosis. PLS: primary lateral sclerosis. PMA: progressive muscular atrophy. CS: corticospinal. CB: corticobulbar.

ALS patients having FTD usually meet the criteria for behavioural variant FTD characterised by defects in cognitive functions, personality traits and behavioural collapse. Among ALS cases experiencing non-motor dysfunction, language (particularly deficits in verbal fluency) and cognition are the most affected categories [49], and apathy is the most frequently encountered personality impairment [47].

#### 2.4.1. Dementia in ALS Patients—ALS-FTD Variants

ALS-FTD diagnosis is made upon the presence of an ALS phenotype associated with behavioural or cognitive defects that fulfil FTD diagnostic criteria: (1) progressive impairment of behavioural/cognitive functions and observation of at least three behavioural symptoms defined by Rascovsky et al. [50]; or (2) loss of insight and/or presence of psychotic features associated with at least two Rascovsky et al. [50] symptoms; or (3) language impairment combined with semantic dementia (defined in [48]).

#### 2.4.2. Cognitive Changes in Non-Demented ALS Patients—ALSci and ALSbi Variants

Non-demented ALS patients presenting with behavioural impairment are classified as ALSbi-variant, while ALS patients experiencing cognitive impairment including language defects are considered to be ALSci variant [47]. Based on the revised diagnostic criteria from Strong et al. [51], ALS patients can be diagnosed as ALSci variant if either executive impairment (social cognition), or language dysfunction, or a combination of the two features are evident during diagnosis. Diagnostic criteria for ALSbi variant require apathy with or without other behavioural symptoms, or two or more behavioural changes, such as disinhibition, loss of sympathy/empathy, perseverative/stereotypic/compulsive behaviour, hyper orality/dietary change, loss of insight and psychotic symptoms.

#### **3. Genetics of ALS**

*Superoxide dismutase 1 (SOD1)* was the first gene demonstrated to be associated with ALS in 1993 [52]. *SOD1* is ubiquitously expressed in human cells and serves to protect them from harmful reactive oxygen species (ROS). Mutated forms of *SOD1* are believed to result in a toxic gain of function, provoking the presence of misfolded protein aggregates, increased endoplasmic reticulum (ER) stress, and oxidative stress and ultimately accelerating motor neuron degeneration [17].

In 2001, mutations in *ALSIN2 (ALS2)* were shown to be implicated in juvenile forms of ALS [53–55] and PLS [56]. The ALS2 protein has been found to act as a guanine nucleotide exchange factor for the GTPase Rab5, which is involved in endosome trafficking [57]. Mutations in *ALS2* have been shown to inhibit activation of Rab5 and its translocation to mitochondria, leaving *ALS2*-mutated motor neurons more susceptible to oxidative stress [58]. However, in murine studies, genetic ablation of *ALS2* has failed to recapitulate the pathological features seen in ALS [59,60], although primary motor neurons from these mice did show greater sensitivity to oxidative stress and aberrant morphology, suggesting that *ALS2* mutations may indeed play a role in motor neuron susceptibility in ALS.

Genetic mutations were next reported in 2004 for the *senataxin (SETX)*, *angiogenin (ANG)*, and *vesicle-associated membrane protein-associated protein B (VAPB)* genes. *SETX* plays a role in numerous cellular functions including RNA metabolism and has been shown to regulate RNA polymerase II transcription termination [61] and its yeast homolog, SEN1, has been linked with processing of non-coding RNA [62]. *SETX* mutations are strongly associated with juvenile-onset ALS [63] and associations have been confirmed in American, Italian and Dutch cohorts [63–65]. ANG is highly expressed in the human central nervous system [66] and has been reported to show neuroprotective properties [67]. Indeed, expression of ALS-associated *ANG* variants has been shown to cause motor neuron death in cell culture models [67]. *ANG* has also been reported to play a role in the transcription of ribosomal RNA [68] and many ALS-associated variants are believed to elicit a loss of function in ANG, thus eliminating any neuroprotective functionality [69]. VAPB is a protein closely associated with the endoplasmic reticulum and is thought to be involved in the induction of the unfolded protein response (UPR) [70], as well as cellular processes including lipid transport [71], protein secretion [72], and calcium homeostasis [73]. The P56S mutation in *VAPB* has been implicated in an early-onset and slow-progressing form of fALS [74] and follow-up studies have highlighted how this mutation can result in nuclear envelope defects [75], and provoke VAPB ER aggregates [72]. However, murine models expressing the P56S mutation show widespread VAPB aggregates but demonstrate no motor neuron pathology or ALS phenotypes [76].

The next genetic mutation associated with ALS did not arrive until 2008, when mutations in *TAR DNA-binding protein (TARDBP)*, encoding TDP-43, were reported in patients [77]. TDP-43 is an RNA/DNA-binding protein that plays important roles in several RNA metabolism processes [78]. Ubiquitinated TDP-43 was first shown to be present in CNS inclusions of ALS patients in 2006 [79] and subsequent studies have confirmed TDP-43 as the major protein component of pathological inclusions present in approximately 90% of ALS patients [80]. However, TDP-43 pathology is not unique to ALS and has been reported in numerous neurodegenerative conditions including FTD [79], Parkinson's disease [81], Huntington's disease [82], Alzheimer's disease [83], and dementia with Lewy bodies [84].

Then, in 2009, multiple mutations in the nuclear RNA-binding protein, *Fused in Sarcoma (FUS)* and *FIG4 phosphoinositide 5-phosphatase (FIG4)*, were associated with ALS [85,86]. FUS is another RNA/DNA-binding protein involved in mechanisms of RNA splicing and DNA repair [87] and is implicated in both ALS and FTD [88]. Mutations in *FUS*, particularly those near the nuclear localisation signal (NLS) domain, cause cytoplasmic protein mislocalisation and are associated with a severe phenotype in murine models [89]. *FIG4* is involved in vesicle trafficking due to its role in the regulation of the membrane bound phosphoinositide, PI(3,5)P2 [90]. Mutations in *FIG4* were initially shown to cause neurodegeneration in Charcot–Marie–Tooth (CMT) neuropathy [91]. However, others have questioned the role of *FIG4* in ALS pathology after failing to find pathogenic mutations in their Taiwanese [92] and Italian [93] cohorts.

In 2010, mutations in *Optineurin (OPTN)*, *Spatacsin paraplegia 11 (SPG11)*, *Valosin-containing protein (VCP)*, and *Ataxin-2 (ATXN2)* were all implicated in ALS. Three different *OPTN* mutations were identified in ALS patients [94] and researchers were able to demonstrate the increased immunoreactivity of OPTN in both TDP-43 and SOD1 inclusions found in the spinal cord of sALS patients, suggesting a role for *OPTN* in general ALS pathogenesis.

The link between *SPG11* and ALS was established when mutations were found to be associated with autosomal recessive juvenile ALS [95]. Mutations to *SPG11* are the most common cause of autosomal recessive hereditary spastic paraplegia [96] and loss of function mutations have been shown to elicit lysosomal dysfunction and UMN + LMN degeneration in mice [97]. *ATXN2* encoding the ataxin-2 polyglutamine (polyQ) protein was associated with ALS when researchers identified the presence of intermediate length polyQ expansions (27-33 Qs) in 4.7% of their North-American ALS cohort [98]. Ataxin-2 protein has been shown to regulate mRNA stability and translation [99,100] and upregulation of the fly homolog of Ataxin-2 was found to enhance neurodegeneration in *Drosophila* via its interaction with wild-type and mutated forms of TDP-43 [98]. Involvement of *Ataxin-2* in ALS pathogenesis has since been confirmed in European and Chinese patient cohorts [101,102]. VCP is an ATP-driven chaperone protein that plays a role in ubiquitin-regulated protein degradation [103], autophagy [104], and mRNA processing [105,106]. *VCP* mutations were shown to be present in 1–2% of familial ALS patients in an Italian cohort [107] and mice expressing ALS-associated *VCP* mutations have been shown to develop a slow-progressing ALS phenotype [108].

In 2011, mutations in *ubiquilin-2 (UBQLN2)*, *sequestosome-1 (SQSTM1)*, and *chromosome 9 open reading frame 72 (C9orf72)* were discovered. Ubiquilin-positive inclusions have been implicated in both sALS and fALS [109], whilst mutations in *SQSTM1* have been observed in rare ALS and FTD cases [110] and can be shown to lead to p62 protein inclusions in motor neurons of both patient groups [111]. The G4C2 hexanucleotide repeat expansion mutation (HREM) within *C9orf72* [112,113] is perhaps the most significant genetic mutation associated with ALS thus far, and is estimated to be present in 34% of familial cases, and 5% of sporadic cases in Europe [19,114]. In healthy subjects, the G4C2 repeat length ranges from 2 to 23 units [112], whilst intermediate expansions ranging from 24 to 30 [115] and large expansions ranging from 30 to many hundreds of units have been observed in ALS patients [112,116]. Although rare, *C9orf72* expansions have been implicated in other neurodegenerative and psychiatric diseases including PD [117] and Schizophrenia [118], suggesting a wider role for *C9orf72* in neuropathology and perhaps offering some insight towards the heterogeneous phenotype seen in *C9orf72* ALS.

In 2012, *Profilin 1 (PFN1)* was implicated in familial and sporadic cases of ALS [119]. Mutant *PFN1* has been shown to cause motor neuron degeneration through the formation of insoluble aggregates and disrupted cytoskeleton dynamics in mice [120] and co-aggregation of PFN1 and TDP-43 has been reported in cell lines expressing mutant *PFN1* [119].

Then, in 2013, *heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1)* was reported to be involved in ALS after researchers identified three *hnRNPA1* variants—two of which were associated with familial ALS and the other of which was associated with a sporadic case [121]. hnRNPA1 is known to colocalise with TDP-43 [121] and post-mortem studies have shown that motor neurons of ALS patients display marked reductions in hnRNPA1 alongside concomitant TDP-43 inclusions [122].

In 2014, mutations in *Tubulin alpha-4A (TUBA4A)* and *Matrin-3 (MATR3)* were implicated in ALS. Mutations in *TUBA4A* were first identified in a European and American cohort [123] and then validated in Belgian and Chinese cohorts in 2017 and 2018 [124,125]. *TUBA4A* mutations have been shown to cause cytoskeletal defects in primary motor neurons [123] and are recognised as a rare cause of ALS and FTD [125].

*MATR3* was first associated with ALS after exome sequencing identified mutations in Italian, UK and US kindreds, alongside increased levels of MATR3 protein in spinal cord sections of ALS patients relative to controls [126]. MATR3 has been found to interact with TDP-43 and both proteins were shown to co-aggregate in skeletal muscle tissue of ALS patients [126]. *MATR3* is known to play various roles in RNA metabolism and alternative splicing [127,128] and recent evidence suggests ALS-associated *MATR3* mutations play a role in defective nuclear export of FUS and TDP-43 mRNA [129].

In 2015, *NIMA-related kinase 1 (NEK1)* was recognised as an ALS-risk gene [130] and was shown to interact with two other ALS genes, *ALS2* and *VAPB*, both of which are involved in endosomal trafficking. Subsequent studies provided further evidence for the pathogenic role of *NEK1* in ALS [131,132] and pathway analyses have shown NEK1 to interact with C21orf2; both proteins are involved in DNA repair mechanisms [133]. Mutations in *Tank-binding kinase 1 (TBK1)* were also associated with ALS in 2015 after exome sequencing identified eight loss of function mutations in 13 fALS pedigrees [134].

*Cyclin F (CCNF)* was implicated in ALS in 2016 with variants identified in both familial and sporadic cases [135]. In the same study, researchers were able to demonstrate how mutant *CCNF* led to aberrant ubiquitination and aggregation of proteins including TDP-43. More recently, CCNF was shown to be a binding partner of another ALS protein, VCP. Binding of mutated CCNF to VCP increased VCP ATPase activity, which in turn led to increased TDP-43 aggregation in U2OS cells [136].

Then, in 2018, the most recent genetic mutations implicated in ALS were discovered when research demonstrated the pathological involvement of *Kinesin family member 5A (KIF5A)* [137]. KIF5A is a protein expressed specifically in neurons and is involved in regulating neuronal microtubule dynamics [138,139]. *KIF5A* is also associated with spastic paraplegia and Charcot–Marie–Tooth neuropathy [140] and mutations have been reported in ALS patients in Chinese [141], European [142,143], and US cohorts [137].

#### **4. Correlation of Genotype/Phenotype: Methods, Results and Discussion**

To evaluate whether there is a correlation between associated genes and phenotype in ALS, a systematic search of original papers was performed using key words summarised in Table S1, while adhering to PRISMA guidelines (see checklist in Supplementary Materials).

#### *4.1. Protocol*

A systematic search was performed in PubMed using the key words: ALS, genotype phenotype, patient, and onset. To make sure that clinical data would also be obtained for rare genes involved in ALS and listed in Vijayakumar et al. [14], the following search terms were added: ALS, phenotype, patient and the gene name such as *TBK1*, *VCP*, *SQSTM1*, *CCNF*, *NEK1*, *OPTN*, *FIG4*, *PFN1*, *ATXN2*, *VAPB*, *ANG*, *ALS2*, *SPG11*, *UBQLN2*, *KIF5A*, and *MATR3*. There were no language, type of study, or publication date restrictions.
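The search strategy described above can be sketched as a list of query strings. This is only an illustration under stated assumptions: the source gives the key words but not how they were combined, so joining them with `AND` is an assumption, and the names `BASE_TERMS`, `GENE_TERMS`, `RARE_GENES` and `build_queries` are hypothetical. The gene list is copied from the text, which introduces it with "such as" and so may not be exhaustive. No request is sent to PubMed.

```python
# Key words for the general search, as listed in the protocol.
BASE_TERMS = ["ALS", "genotype phenotype", "patient", "onset"]

# Key words combined with each rare gene name.
GENE_TERMS = ["ALS", "phenotype", "patient"]

# Gene names given in the protocol text.
RARE_GENES = [
    "TBK1", "VCP", "SQSTM1", "CCNF", "NEK1", "OPTN", "FIG4", "PFN1",
    "ATXN2", "VAPB", "ANG", "ALS2", "SPG11", "UBQLN2", "KIF5A", "MATR3",
]


def build_queries() -> list[str]:
    """Return one general query plus one query per rare gene.

    Assumption: terms are combined with a Boolean AND.
    """
    queries = [" AND ".join(BASE_TERMS)]
    for gene in RARE_GENES:
        queries.append(" AND ".join(GENE_TERMS + [gene]))
    return queries


for query in build_queries():
    print(query)
```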

#### *4.2. Eligibility Criteria*

The search combining the different key words resulted in 355 articles. Reviews and duplicated papers were excluded. To avoid redundancy, papers re-using previously published clinical data were excluded. All studies used in the systematic review were peer-reviewed, written in English, and published original clinical data related to patients affected by monogenic forms of ALS. At least one of the following parameters had to be described in the paper: age of onset, site of onset, motor neuron population being affected (UMN, LMN, UMN+LMN), disease duration, number of patients with FTD, and number of patients with cognitive impairment. A total of 107 papers were then eligible for the analysis (see PRISMA flow chart in Figure 3).

**Figure 3.** PRISMA flow chart showing how studies have been selected.

#### *4.3. Data Extractions and Synthesis*

The papers were thoroughly reviewed by OC, LLG, VM and SD. Key information was extracted from each study, and grouped into cohort characteristics (ethnicity/country of the study, number of patients), age of onset (distribution and mean and standard deviation (SD)), site of onset (spinal, bulbar, respiratory, other/unknown), motor neurons being affected (UMN, LMN, UMN+LMN), disease duration (mean and SD), percentage of patients with FTD, and percentage of patients with cognitive impairment. All data are collated per gene in Table S2.

For the summary Table (Figure 4), the age of onset and disease duration are presented as the weighted mean ± SD, and the site of onset, motor neuron impairments, and FTD comorbidity are presented as weighted percentages, in all cases taking into account the number of patients studied as described below:

Mean = mean of the parameter of interest given in the referenced study;

n = number of patients studied for the corresponding parameter in the given study;

$$\begin{array}{l} \text{Sx} = \text{Mean} \times n \\ \text{Sx}^2 = SD^2(n-1) + \dfrac{(\text{Sx})^2}{n} \\ \text{Weighted mean} = \dfrac{\sum \text{Sx}}{\sum n} \\ \text{Weighted SD} = \sqrt{\dfrac{\sum (\text{Sx}^2) - \left(\sum \text{Sx}\right)^2 / \sum n}{\left(\sum n\right) - 1}} \end{array}$$
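The weighted statistics described above can be sketched in Python. The sketch assumes that Sx is the per-study sum of values (Mean × n) and that Sx² is the per-study sum of squared values, reconstructed from the reported SD; under those assumptions the result equals the mean and sample SD that would be obtained by pooling the individual patient values. The function name `pooled_mean_sd` and the example numbers are hypothetical and are not taken from Table S2.

```python
import math


def pooled_mean_sd(studies):
    """Combine per-study (mean, sd, n) summaries into a weighted mean and SD.

    Assumes Sx = mean * n and Sx^2 = sd^2 * (n - 1) + Sx^2 / n per study.
    """
    total_n = 0
    total_sx = 0.0   # running sum of Sx across studies
    total_sx2 = 0.0  # running sum of Sx^2 across studies
    for mean, sd, n in studies:
        sx = mean * n
        sx2 = sd ** 2 * (n - 1) + sx ** 2 / n
        total_n += n
        total_sx += sx
        total_sx2 += sx2
    if total_n < 2:
        raise ValueError("at least two patients are needed to compute an SD")
    weighted_mean = total_sx / total_n
    variance = (total_sx2 - total_sx ** 2 / total_n) / (total_n - 1)
    # Guard against tiny negative values caused by floating-point rounding.
    return weighted_mean, math.sqrt(max(variance, 0.0))


# Toy check: two "studies" summarising the values 1,2,3 and 4,5,6,7.
toy = [(2.0, 1.0, 3), (5.5, math.sqrt(5 / 3), 4)]
mean, sd = pooled_mean_sd(toy)
print(mean, sd)  # matches the mean and sample SD of the pooled values 1..7
```

Working from sums rather than averaging the per-study SDs keeps the between-study spread in the pooled SD, which matters when cohorts differ markedly in mean age of onset.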

#### *4.4. Characteristics of Studies*

A total of 1630 ALS patients were included in the systematic review. The total number of reported patients for each gene is shown in Figure 4 column 4. As not all studies reported all clinical parameters, the total number of patients studied for each parameter is reported in the first subcolumn for each parameter. On average, 59% of the population was male, with considerable variation between genes (See Table S2). Most of the studies were conducted in Europe, North America and Asia.

#### *4.5. Overall Findings and Discussion*

For most genetic forms of ALS reported in Table S2 and in Figure 4, the age of onset ranges between 50 and 70 years old. Exceptions to this include cases of juvenile ALS, which are observed with mutations in *SPG11* [95,144], *FUS* [145,146] and *ALS2* [53,55] (Table S2). *FUS* patients are known to show considerable variation in phenotype: some show early onset and fast progression, whereas others show a later age of onset and a slower-progressing phenotype [147]. This variation in the *FUS* phenotype has been hypothesised to arise due to the different effects exerted by missense and truncating mutations [148]. Interestingly, the studies reviewed here suggest that *FUS* mutations are indeed associated with a relatively early age of onset (41.8 ± 14.5 years) and a fast-progressing phenotype, with an average disease duration of 30.6 months (Figure 4). Another gene sometimes associated with early-onset ALS is *SETX*. Patients with *SETX* mutations have been reported to display a slow-progressing phenotype in which bulbar and respiratory muscles seem largely unaffected [149]. However, in one reported case, a patient did go on to experience bulbar symptoms 3 years after onset [150]. Moreover, in the studies retrieved in this review, *SETX* patients show neither an early age of onset nor a particularly slow phenotype: the average age of onset for *SETX* patients was 59.5 ± 24.7 years, with an average disease duration of 43.8 ± 37.5 months.

Many ALS-associated genes show variation in site of onset. Among the 22 genes included in Figure 4, cases of spinal onset are predominant in 19. This is in line with previous findings that suggest spinal onset accounts for approximately two-thirds of ALS cases [32]. For example, *SOD1*, *hnRNPA1*, *TUBA4A*, and *ALS2* show a high percentage of patients with spinal onset (>80%), while spinal onset in *VCP*, *NEK1*, and *TBK1* cases accounted for 50%, 50% and 55% of cases, respectively. Some other ALS-associated gene mutations were associated with a lower proportion of spinal onset, e.g., 33% of *C9orf72* cases, and 40% of *UBQLN2* cases. However, previous research suggests that *C9orf72* ALS demonstrates frequent occurrence of both spinal [151] and bulbar onset [152]. Moreover, it has been reported that site of onset in *C9orf72* ALS can be used to predict disease duration. For instance, the average age of onset in patients with spinal onset was 59.3 years, increasing to 62.3 years in patients with bulbar onset, and male patients with spinal onset seem to display a faster-progressing phenotype [153].

A striking 95% of *SOD1* cases were classified as spinal onset. Indeed, animal studies have provided support for the notion that *SOD1* pathology begins at the periphery and proceeds in a retrograde manner [154,155]. Recently, a homozygous mutation that eliminates the enzymatic activity of *SOD1* was found to result in a severe LMN phenotype and mild cerebellar atrophy in a young child [156] and the presence of a *SOD1* p.D12Y variant was shown to result in an LMN-predominant phenotype [157]. Similarly, seven studies reported a non-negligible percentage of patients with pure LMN signs (Table S2, Figure 4, 47.6% pure LMN vs. 45.2% UMN+LMN, [158–161]). Overall, these studies seem to suggest that *SOD1* mutations exert profound effects at the distal nerve. In addition, the observation that both overexpression and absence of *SOD1* activity lead to pathology should be an important consideration in the development of therapeutics that aim to alter *SOD1* levels as a novel treatment in ALS [162].

Figure 4 was sorted in descending order for the percentage of patients showing LMN signs. Not all studies reported UMN and/or LMN signs, and thus the percentage given in this table only represents a small proportion of the studies (see Table S2 for more details). However, it is interesting to see that the majority of the gene mutations do indeed elicit a phenotype that is characterised by both UMN+LMN signs, consistent with the classical clinical definition of ALS. *FUS*, *C9orf72* and *TARDBP* all demonstrated increased presence of both UMN and LMN signs, with both neuronal populations affected in 66.7%, 72.7% and 44.4% of cases, respectively. Surprisingly, only 33% of *FIG4*, *PFN1*, *MATR3* and *NEK1* cases showed both UMN and LMN signs, although it should be noted that 4 of the 14 studies reviewed in relation to these genes did not provide details regarding the pattern of motor neuron involvement. Some ALS-associated genes demonstrated >20% of patients with pure LMN signs (*SOD1*, *FUS*, *PFN1*, *ATXN2*, *TARDBP*, *TBK1*, and *hnRNPA1*), while pure UMN signs had >20% preponderance in several genes (*ANG*, *TBK1*, *FIG4*, *MATR3*, *NEK1*, *hnRNPA1*).

Finally, the current review also aimed to collect information regarding the prevalence of cognitive impairments and FTD in ALS. FTD was most frequent in cases with mutations in either *C9orf72*, *SQSTM1*, or *TBK1* (36%, 67%, and 43%, respectively, Figure 4). Indeed, *C9orf72* [163], *SQSTM1* [164], and *TBK1* [165] have all previously been linked with FTD onset. However, in a large screen of 121 patients with FTD, genetic mutations were successfully identified only in *C9orf72* and *SQSTM1*, whilst no *TBK1* variants were identified [166]. It is also worth noting that despite the frequent association between *TARDBP* and FTD, only 12% of cases reviewed here were found to have concomitant FTD symptoms. In relation to general cognitive functioning, reports of impairment were observed across 10 ALS-associated genes, although the number of patients studied for this parameter was often quite low (Table S2), rendering it difficult to form conclusions.


**Figure 4.** Table summarising the phenotypes observed in ALS patients with different mutations. A detailed version of this table is accessible as supplemental data (Table S2). PubMed was searched to identify published studies reporting genotype–phenotype data for 23 genes. Column 2 indicates the frequency of genes observed in Caucasian populations described in previous reviews (Volk et al. [18] and Chia et al. [167]). The ethnicity/origin of cohorts reported across studies is specified in Table S2 and summarised in column 3. For each parameter reported in this table, the number of patients is given in the first subcolumn for each category. For age of onset, motor neuron impairments, disease duration, and FTD, the weighted mean ± SD, and weighted percentages are given, taking into account the numbers of patients studied. Data for each gene were collected from the following reference studies, then summarised: hnRNPA1: [168]; SETX: [169–173]; SOD1: [24,92,160,161,171,172,174–188]; TBK1: [189–192]; FUS: [24,92,145,146,148,170–172,179,180,187,188,193–196]; OPTN: [171,180,187,197–203]; TUBA4A: [123,188,204,205]; TARDBP: [171,172,179,180,187,188,192,206–211]; ATXN2: [101,102,171,180,188,212–214]; C9orf72: [114,170,171,175,180,188,206,215–226]; MATR3: [227–231]; FIG4: [171,192,232]; ANG: [171,180,187,233–235]; NEK1: [131,236]; KIF5A: [137,141,143,237]; CCNF: [135]; PFN1: [238–241]; VAPB: [171,242,243]; UBQLN2: [244,245]; SQSTM1: [171,175,192,246]; VCP: [170,171,175,192,247]; SPG11: [179,248]; ALS2: [55,170]. Results from each separate study are shown in Table S2. Gradient colour for age of onset from dark blue to dark red: dark blue, 16 years; dark red, 60 years. Gradient colour for site of onset and for motor neuron impairment distributions: white, 0%; green, 100%.
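To make the weighting described in the Figure 4 legend concrete, the following Python sketch pools per-study summaries into a patient-weighted mean ± SD and a patient-weighted percentage. It is only an illustration: the function names, the pooling formula (within-study plus between-study sums of squares) and the example numbers are assumptions, and the exact computation used to build Figure 4 may differ.

```python
from math import sqrt


def pooled_mean_sd(studies):
    """Combine per-study (n_patients, mean, sd) tuples into one mean and SD.

    Each study is weighted by its number of patients. The SD combines
    within-study and between-study variability (an assumed pooling choice).
    """
    total_n = sum(n for n, _, _ in studies)
    mean = sum(n * m for n, m, _ in studies) / total_n
    sum_sq = sum((n - 1) * sd ** 2 + n * (m - mean) ** 2 for n, m, sd in studies)
    sd = sqrt(sum_sq / (total_n - 1)) if total_n > 1 else 0.0
    return mean, sd


def weighted_percentage(studies):
    """Combine per-study (n_patients, n_with_feature) tuples into one percentage."""
    total_n = sum(n for n, _ in studies)
    return 100.0 * sum(k for _, k in studies) / total_n


# Hypothetical numbers, not taken from Table S2.
onset_by_study = [(10, 40.0, 5.0), (30, 60.0, 5.0)]   # (patients, mean age, SD)
spinal_by_study = [(10, 5), (30, 27)]                 # (patients, spinal-onset cases)

mean_age, sd_age = pooled_mean_sd(onset_by_study)
print(f"age of onset: {mean_age:.1f} ± {sd_age:.1f} years")
print(f"spinal onset: {weighted_percentage(spinal_by_study):.0f}%")
```

Weighting by patient numbers prevents a small cohort from carrying the same influence as a large one, which matters here because several genes are represented by only a handful of patients.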

#### **5. Conclusions**

Over 150 years have passed since ALS was first reported by Charcot and still the aetiology of the disease remains elusive. Although research is progressing and genetic studies continue to identify novel gene associations [14,249–251], many questions remain surrounding the pathological mechanisms associated with already established mutations, their role in the ALS phenotype, and the as yet undiscovered mechanisms that underlie sporadic onset of disease. Here, we have performed a systematic review in an attempt to highlight genotype–phenotype correlations for 23 of the more commonly reported mutated genes in ALS. This has proven to be challenging as many genetic studies do not capture or report a complete summary of clinical data. Whilst it is understandable that such data are difficult to acquire, we hope to illustrate that there is a need for improved and more widely available clinical and informatics resources that would enable genotype–phenotype associations to be easily visualised in ALS.

Whilst we have illustrated the relationship between commonly reported mutated genes and various clinical measures including age and site of onset, disease duration and motor neuron involvement, a limitation of the current review is that we do not consider variation among phenotypes of patients having different mutations of the same gene. For many genes involved in ALS, including *FUS*, *SOD1*, and *TARDBP*, the phenotype may be different depending upon the specific genetic mutation in question. In *SOD1* patients, for instance, the A4V mutation results in a much more aggressive phenotype (death occurring ~1.2 years after onset [252]) than the H46R mutation, for which patients show a relatively mild phenotype (duration of ~17 years [253]). It could be of value in future work to comprehensively review variations in genotype–phenotype correlations among the different mutations reported by single-gene studies, which in turn could contribute towards a comprehensive database of ALS genotype–phenotype correlation. Such a resource could ultimately improve our mechanistic understanding of ALS by enabling a more robust assessment of how the ALS phenotype responds to different variants across multiple genes.

Additional limitations are that many of the studies surveyed are relatively small, involving low numbers of patients; that only a subset of studies report a clinical breakdown of phenotype; and that ethnic breakdown is not always reported, with some ethnicities having minimal representation.

Despite these limitations, the collected data reveal a landscape of highly variable phenotypic associations, underlining the complexity of the disease, and the need for nuanced approaches to the development of clinical assays and therapeutics.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2075-4426/10/3/58/s1, **Table S1**: Key words used in PubMed literature search. **Table S2:** Collated data from clinical studies on monogenic forms of ALS. **Table S3:** PRISMA checklist for systematic review.

**Author Contributions:** O.C., L.L.G., and S.D. collated the data from the literature. O.C., L.L.G. and S.D. organised the data and wrote the paper. O.C., L.L.G., G.M., C.G.D., W.J.D. and S.D. wrote, discussed, and edited the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was financed by the European Union Regional Development Fund (ERDF) EU Sustainable Competitiveness Programme for Northern Ireland, Northern Ireland Public Health Agency (HSC R&D) and Ulster University (PI: A Bjourson). L.L.G. was a recipient of an ArSLA PhD fellowship, O.C. was a recipient of a PhD DELL fellowship and G.M. was a recipient of an IICN fellowship.

**Acknowledgments:** We would like to thank Vanessa Milla for her input in the systematic search.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Review*

### *DUX4* **Expression in FSHD Muscles: Focus on Its mRNA Regulation**

#### **Eva Sidlauskaite 1, Laura Le Gall 1, Virginie Mariot <sup>1</sup> and Julie Dumonceaux 1,2,\***

	- e.sidlauskaite@ucl.ac.uk (E.S.); l.gall@ucl.ac.uk (L.L.G.); virginie.mariot@ucl.ac.uk (V.M.)

Received: 16 June 2020; Accepted: 24 July 2020; Published: 28 July 2020

**Abstract:** Facioscapulohumeral dystrophy (FSHD) is the most frequent muscular disease in adults. FSHD is characterized by a weakness and atrophy of a specific set of muscles located in the face, the shoulder, and the upper arms. FSHD patients may present different genetic defects, but they all present epigenetic alterations of the D4Z4 array located on the subtelomeric part of chromosome 4, leading to chromatin relaxation and, ultimately, to the aberrant expression of one gene called *DUX4*. Once expressed, DUX4 triggers a cascade of deleterious events, eventually leading to muscle dysfunction and cell death. Here, we review studies on *DUX4* expression in skeletal muscle to determine the genetic/epigenetic factors and regulatory proteins governing *DUX4* expression, with particular attention to the different transcripts and their very low expression in muscle.

**Keywords:** FSHD; DUX4; transcription; muscle; regulation

#### **1. Introduction**

Double homeobox 4 (*DUX4*) is a transcription factor that is normally expressed during embryonic development and in the human testes but suppressed in somatic tissue (for review see [1]). The recent finding of DUX4 in an early cleavage-stage embryo raised the hypothesis that DUX4 might act as a functional transcriptional programmer to activate the cleavage-stage transcriptional platform and might be a key regulator of zygotic genome activation [2–4]. Moreover, the presence of DUX4 in the testis suggests that *DUX4* may be activated in the primary spermatocytes during spermatogenesis [5]. More recently, *DUX4* activation gained a particular interest across cancer research, as *DUX4* expression in tumours results in immune evasion [6].

Despite the awareness of *DUX4* expression in normal germline biology, DUX4 is principally described as a toxic factor involved in facioscapulohumeral dystrophy (FSHD) pathophysiology. Indeed, in FSHD patients, DUX4 is aberrantly expressed in the muscle tissue [5,7]. The role of DUX4 in FSHD pathogenesis is intensively investigated, and several reviews have been published on this topic [8,9], explaining the potential role of DUX4 in cell death and discussing the role of DNA methylation in FSHD1 and 2 patients. The current review focuses on the recent understanding of the regulation of *DUX4* expression at the mRNA level in skeletal muscle and myogenic cells.

#### **2. FSHD**

FSHD is the third most common genetic muscular dystrophy with a frequency between 1/8000 and 1/20,000 (www.orpha.net, April 2020). The primary manifestation of FSHD is an asymmetric atrophy of the muscles located in the face, the shoulder, and the upper arm. The pathology often begins during late adolescence; however, the presence of symptoms at an early age is often associated with more severe muscle weakness (reviewed in [10]). The mutation that causes FSHD was identified nearly 30 years ago [11]. FSHD is associated with genetic and epigenetic molecular changes of the D4Z4 microsatellite repeats in the subtelomeric region of chromosome 4 [12,13]. There are two different genetic mechanisms leading to FSHD, and both are associated with the loss of epigenetic marks within the D4Z4 and the aberrant expression of *DUX4* [14]. The first one concerns 95% of FSHD patients (known as FSHD1, OMIM#158900) who show a contraction of a tandemly repeated 3.3 kb microsatellite D4Z4 repeat at the distal end of chromosomal region 4q35. The number of D4Z4 repeats usually varies from 11 to 150, while fewer repeats are observed in less than 3% of the population [15]. In FSHD1 patients, this number is reduced to 10 and below [16]. This reduction of D4Z4 unit number is associated with chromatin relaxation and loss of repression of the *DUX4* gene (OMIM#606009), allowing DUX4 transcription in muscle cells [17]. The second one concerns the remaining 5% of FSHD patients (known as FSHD2, OMIM#158901), who do not present a shortened D4Z4 array but carry a mutation in epigenetic modifier genes. The vast majority of FSHD2 cases have been linked to mutations in the *SMCHD1* (structural maintenance of chromosomes flexible hinge domain containing 1) gene [18], encoding a remodelling protein essential for DNA methylation. A few FSHD2 cases present a heterozygous mutation in the *DNMT3B* (DNA methyltransferase 3 beta) gene [19], which is normally responsible for the establishment of the cytosine methylation profile during development. The exact mechanism of how particular mutations cause the FSHD pathology is still under investigation, but the notion of a permissive chromosome 4 is now acknowledged for FSHD patients.
This "pathological" chromosome 4 is characterized by the following: the presence of a specific simple sequence length polymorphism (SSLP) located 3.5 kb proximal to the D4Z4 repeat [20]; the presence of at least one D4Z4 repeat [21]; a chromatin relaxation within the D4Z4 repeat [17]; and the presence of the 4qA haplotype [22,23] containing the polyadenylation signal for DUX4 [14]. Indeed, each D4Z4 contains the open reading frame (ORF) of the *DUX4* retrogene [7,24]. DUX4 protein and mRNA are detected in both FSHD1 and FSHD2 muscle biopsies at very low levels [5] but sufficient to induce a cascade of mis-regulated genes [25] eventually leading to muscle atrophy and muscle fibre death by the disruption of multiple cellular processes (for review see [8]).
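The repeat-number ranges and the features of a permissive chromosome 4 listed above can be summarised as a small data structure. The Python sketch below is purely illustrative and is not a diagnostic rule: the class, field and function names are hypothetical, and the thresholds are simply those quoted in the text (11 to 150 repeats as the usual range, 10 and below in FSHD1).

```python
from dataclasses import dataclass


@dataclass
class Chromosome4Allele:
    """Hypothetical record of the allele features named in the text."""
    d4z4_repeats: int          # number of D4Z4 units on this allele
    has_4qa_haplotype: bool    # 4qA carries the DUX4 polyadenylation signal
    has_permissive_sslp: bool  # specific SSLP proximal to the D4Z4 repeat
    chromatin_relaxed: bool    # chromatin relaxation within the D4Z4 repeat


def repeat_category(repeats: int) -> str:
    """Label a D4Z4 repeat count using the ranges quoted in the text."""
    if repeats < 1:
        return "no D4Z4 repeat"
    if repeats <= 10:
        return "contracted (FSHD1 range)"
    if repeats <= 150:
        return "usual range"
    return "above usual range"


def matches_permissive_features(allele: Chromosome4Allele) -> bool:
    """True only if all four listed features are present together."""
    return (
        allele.d4z4_repeats >= 1
        and allele.has_4qa_haplotype
        and allele.has_permissive_sslp
        and allele.chromatin_relaxed
    )


# Hypothetical example allele: 6 repeats on a relaxed 4qA background.
example = Chromosome4Allele(6, True, True, True)
print(repeat_category(example.d4z4_repeats), matches_permissive_features(example))
```

Keeping the repeat count separate from the permissive features reflects the point made in the text that FSHD2 patients show the same permissive context without a shortened D4Z4 array.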

#### **3. Regulation of** *DUX4* **Expression**

There is a consensus in the scientific community on *DUX4* expression in FSHD biopsies, but its regulation still needs to be deciphered. Indeed, *DUX4* expression is regulated by several factors including D4Z4 epigenetic modification, chromatin conformation and the presence of myogenic enhancers (Figure 1).

**Figure 1.** Regulation of *DUX4* expression. *DUX4* expression is regulated by several factors including D4Z4 epigenetic modification, chromatin structure, regulatory proteins, and myogenic enhancers. *DUX4* is composed of 3 exons: exons 1 and 2 are present in each D4Z4 repeat, but exon 3 is located outside of the repeats. Three types of exon 3 have been described: exons 3a and 3b are transcribed from the 4A161L allele (dashed line) and exon 3 from the 4A161S allele (plain line). Exon 3 carries the polyadenylation signal. Five *DUX4* isoforms have been characterized. The four leading to the full-length protein (DUX4-fl) are pathogenic, whereas the one leading to a truncated protein (DUX4-s) is non-pathogenic.

#### *3.1. D4Z4 Epigenetic Modification*

Because it is well known that epigenetic modifications play a significant role in gene regulation in normal and pathological environments, several studies have evaluated whether or not the epigenetic disruption observed at the 4q35 locus could lead to the expression of *DUX4*. In 2012, Lemmers and colleagues reported that antisense nucleotide-mediated exon skipping of *SMCHD1* in normal human myoblasts led to *DUX4* expression [18]. Combined with the observation that families with FSHD2 present a haploinsufficiency of SMCHD1 and a hypomethylation of the D4Z4 array [18], a link between epigenetic modifications and *DUX4* expression was established. Since then, several articles have reinforced the idea of an epigenetic regulation of DUX4 expression. The consequences of *SMCHD1* expression level on *DUX4* expression were particularly studied, and it was shown that SMCHD1 levels participate in *DUX4* expression in muscle cells. Indeed, depletion of *SMCHD1* in FSHD1 myoblasts increased *DUX4* expression [26] whereas its ectopic overexpression resulted in DUX4 silencing in FSHD1 and FSHD2 myotubes [27]. This is consistent with the fact that *DUX4* expression is increased during muscle differentiation, which correlates with decreased SMCHD1 protein levels at D4Z4 [27]. Moreover, the interaction of SMCHD1 with the chromatin is facilitated by the ligand-dependent nuclear receptor-interacting factor 1 (LRIF1), which binds to the D4Z4 repeat [28]. Interestingly, mutations in *LRIF1* lead to chromatin relaxation and *DUX4* derepression [28], and knockdown of the *LRIF1* long isoform in control myoblasts using siRNA results in the expression of DUX4 [28]. *DUX4* expression in myoblasts was also observed after decreased binding of SMCHD1 to D4Z4 caused by the inhibition of H3K9me3 (repressive mark associated with heterochromatin formation) using drugs [29]. 
Finally, a recent study has also shown that *DUX4* is expressed in myocytes obtained from patients presenting an 18p hemizygosity with decreased *SMCHD1* mRNA levels [30]. Altogether, these studies suggest a link between *SMCHD1*-mediated epigenetic modifications and DUX4 expression.

Multiple other lines of evidence show a role of epigenetics in DUX4 expression: (i) MyoD-converted fibroblasts isolated from FSHD2 patients carrying a mutation in the *DNA methyltransferase 3B* (*DNMT3B*) gene express DUX4, suggesting a D4Z4 derepression associated with DUX4 expression [19]. (ii) Several epigenetic pathways such as *ASH1L*, *BRD2*, *KDM4C*, and *SMARCA5* were found to regulate DUX4 expression in primary FSHD cells after independent knockdown of multiple chromatin regulators [31]. (iii) Human chromosome 4/CHO hybrid cells treated with 5-aza-2′-deoxycytidine (AZA, a cytosine analogue that is incorporated into DNA during DNA replication) and/or trichostatin A (TSA, which inhibits class I and II histone deacetylases) led to *DUX4* expression [32,33]. (iv) Two D4Z4 factors, nucleosome remodelling deacetylase (NuRD) and chromatin assembly factor 1 (CAF-1), were identified as DUX4 repressors in human skeletal muscle cells using RNA-guided Cas9 nuclease from the microbial clustered regularly interspaced short palindromic repeats (CRISPR/Cas9) engineered chromatin immunoprecipitation (enChIP) locus-specific proteomics to characterize D4Z4-associated proteins [34]. (v) Hemizygous transgenic mice carrying either a 2.5 or 12.5 D4Z4 repeat showed a chromatin relaxation of the D4Z4 repeats in D4Z4-2.5 mice compared to D4Z4-12.5 mice, associated with *DUX4* expression in the D4Z4-2.5 mouse [35].

Altogether, these studies strongly suggest that chromatin relaxation results in inappropriate DUX4 expression in skeletal muscle. However, regulation of *DUX4* expression may be different in other tissues or during development. Indeed, *DUX4* is expressed in early cleavage-stage embryos whereas a high methylation level is found at D4Z4 in pluripotent cells in both FSHD1 and controls [4,36], which goes against a link between D4Z4 hypomethylation and *DUX4* expression.

#### *3.2. Chromatin Conformation*

D4Z4 chromatin structure was also associated with *DUX4* expression/repression in muscle. Indeed, the 3D organization of chromatin modulates major biological processes including transcription. With regard to the link between DUX4 expression and chromatin conformation, it was proposed that, as a single repeat, D4Z4 behaves as a CCCTC-binding factor (CTCF) insulator interfering with enhancer–promoter communication [37]. However, both its CTCF binding and insulation properties are suppressed upon multimerization of D4Z4 units, suggesting that FSHD could result from an inappropriate insulation mechanism and a CTCF gain of function [37]. Because CTCF can mediate transcriptional regulation by creating accessible or inaccessible loops of chromatin at specific sites, the involvement of CTCF in *DUX4* expression was proposed [38]. In this study, the authors found CTCF to be more readily associated with transcriptionally silent arrays, suggesting a role of CTCF in repressing *DUX4* transcription.

D4Z4 was also described as an insulator shielding from telomeric position effect (TPE). Indeed, telomeres can regulate gene expression by trapping adjacent heterochromatin. Using isogenic clones with different telomere lengths, it was demonstrated that telomere shortening led to *DUX4* expression [39]. The likely mechanism is that the epigenetic landscape is altered during telomere shortening resulting in decreased heterochromatin at 4q35 [40,41].

Interestingly, whereas the epigenetic modifications observed in FSHD patients at the D4Z4 array are not restricted to the muscle tissue [42–44], *DUX4* mRNA was found mainly in the skeletal muscle, testis, and thymus [5,45]. Two enhancers upstream of the D4Z4 that upregulate DUX4 expression in skeletal myocytes but not in fibroblasts were described [46]. Importantly, these enhancers participate in *DUX4* expression only when the *DUX4* promoter is hypomethylated. However, the exact role of these enhancers in FSHD onset may be questioned as two FSHD1 patients have been identified with large deletions encompassing this chromosomal region [47]. Moreover, meiotic rearrangements between chromosomes 4 and 10 [14,48] go against a central role of other regions of chromosome 4 in *DUX4* expression.

#### *3.3. Regulatory Proteins of DUX4 Expression*

Transcriptional regulation of DUX4 expression may also be controlled by gene regulatory proteins that interact with the DUX4 promoter, and one study identified Poly(ADP-Ribose) Polymerase 1 (*PARP1*) using a DNA pull-down assay coupled with mass spectrometry and chromatin immunoprecipitation [49].

Several inhibitors of *DUX4* have been published, suggesting that the targets of these inhibitors may play a role in *DUX4* expression. It was shown that activation of the Wnt/β-catenin signalling reduced *DUX4* expression whereas knockdown of Wnt/β-catenin signalling pathway components activated DUX4 [50]. The mechanism of *DUX4* regulation by Wnt/β-catenin is likely independent of direct binding of β-catenin at D4Z4. Bromodomain and extra-terminal (BET)- and β2 adrenergic receptor-mediated pathways were also associated with DUX4 expression regulation [51]. Using BET inhibitors (BETi) targeting all proteins of the BET family, *DUX4* and DUX4 target candidates were silenced in primary FSHD muscle cells [51]. The research team suggested that BETi efficiently repressed *DUX4* transcription by lysine deacetylation but not DNA methylation. Similarly, β2 adrenergic receptor agonists activate signalling pathways known to induce chromatin remodelling. *DUX4* and DUX4 target candidates' expression were both repressed following treatment with β2 adrenergic receptor agonists, suggesting the role of BET and β2 adrenergic receptor signalling pathways in DUX4 expression in FSHD patients [51]. Since then, the importance of the β2 adrenergic receptor has been confirmed in additional studies [52], and downstream pathways have been the centre of attention in order to identify therapeutic targets. P38 mitogen-activated protein kinase is activated by the β2 adrenergic receptor signalling pathway [53]. In FSHD muscle cells or in a xenograft model of FSHD, pharmaceutical or siRNA-mediated inhibition of p38 induced a reduction of *DUX4* mRNA levels [54]. This suggests that β2 adrenergic receptor agonist-mediated *DUX4* expression is a consequence of p38 kinase activation.
Phosphodiesterases, or PDEs, which are responsible for regulation of available cAMP in the cell, were identified as *DUX4* expression regulators [52] by reducing expression levels of both DUX4 and its target genes *ZSCAN4* and *TRIM43*. β2 adrenergic receptor and PDEs are both implicated in cAMP-mediated signalling that further regulates protein kinase A (PKA) signalling pathways. Both cell-permeable cAMP and catalytically active PKA were sufficient to reduce *DUX4* expression and *ZSCAN4* and *TRIM43* mRNA levels [52] in primary FSHD patients' muscle cells. The authors suggested that β2 adrenergic agonists and PDE inhibitors mediated a cAMP- and PKA-mediated repression of DUX4 gene expression in FSHD muscle cells. However, downstream effectors of cAMP also include PKA-independent pathways, and the results from Campbell et al. suggest a PKA-independent mediated repression of DUX4 [51]. Later, p38α and p38β MAPK inhibitors were identified as suppressors of *DUX4* mRNA transcription in myotubes and in a xenograft model of FSHD [54], suggesting a positive regulation of *DUX4* transcription by both p38α and p38β.

#### **4.** *DUX4* **mRNA**

#### *4.1. DUX4 Transcription*

The presence of a large ORF encompassing 2 homeoboxes in each D4Z4 repeat was first described in 1995 [55], but the DUX4 gene was identified only in 1999 by the Belayew group [7]. This group also identified the DUX4 promoter with a variant of TATAA box (TACAA) [7]. The final demonstration that D4Z4 contains a functional DUX4 transcriptional unit leading to *DUX4* transcription was made a few years later after cloning of the D4Z4 region into a promoter-less vector and transfection into myoblasts [56]. 5′ rapid amplification of cDNA ends (RACE) PCR led to the identification of the 5′ untranslated region (UTR) composed of 97–187 nt [56]. The polyadenylation site was described after 3′ RACE PCR on total RNA extracted from C2C12 mouse myoblasts transfected with a 13.5 kb genomic fragment of a patient with two D4Z4 repeats [57]: It is the ATTAAA hexanucleotide sequence (12852–12858 in GenBank accession no. AF117653).

The *DUX4* mRNA found in the muscle tissue is composed of 3 exons, with the *DUX4* ORF being entirely within exon 1. Importantly, exons 1 and 2 are present in the D4Z4 repeats but not exon 3, which is located in a region called pLAM. Notably, the pLAM region is not present on the 4qB haplotype that is classified as non-pathogenic [22,58]. This leads to the hypothesis that DUX4 would only be transcribed from the most telomeric repeat because only this one would give rise to a polyadenylated DUX4. The role of this region in DUX4 expression and stability was highlighted by the report of individuals with a genomic rearrangement between chromosome 4q and 10q. Indeed, the subtelomeric part of these 2 chromosomes is highly homologous and, importantly, chromosome 10 does not carry the ATTAAA poly(A) signal found in chromosome 4, but an ATCAAA sequence that is not known to be a poly(A) signal [14]. Meiotic rearrangements between chromosomes 4 and 10 generated a short hybrid structure on 4qA where the pLAM sequence was conserved but immediately proximal to a 1.5 D4Z4 repeat coming from chromosome 10, resulting in disease presentation. Transfection experiments with genomic D4Z4 constructs derived from permissive or non-permissive chromosomes or in which the poly(A) signals from non-permissive chromosomes are replaced by those from permissive chromosomes established the importance of this poly(A) signal in the stabilization of *DUX4* [14].
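The difference between the ATTAAA hexamer on chromosome 4 and the ATCAAA sequence on chromosome 10 can be illustrated with a simple sequence scan. The Python sketch below is an assumption-laden toy: the function name and the two short test sequences are invented for illustration, and the set of recognised signals is limited to the canonical AATAAA hexamer and its common ATTAAA variant.

```python
# Hexamers treated as functional poly(A) signals in this sketch:
# the canonical AATAAA and the ATTAAA variant mentioned in the text.
KNOWN_POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(sequence: str):
    """Return (position, hexamer) for every known poly(A) signal in a DNA sequence."""
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - 5):
        hexamer = seq[start:start + 6]
        if hexamer in KNOWN_POLYA_SIGNALS:
            hits.append((start, hexamer))
    return hits


# Made-up flanking bases around the two hexamers discussed in the text.
chr4_like = "GGCATTAAAGC"    # contains ATTAAA
chr10_like = "GGCATCAAAGC"   # contains ATCAAA instead

print(find_polya_signals(chr4_like))
print(find_polya_signals(chr10_like))
```

A single base change in the hexamer is enough for the scan to return no hit, which mirrors the text's point that transcripts from non-permissive chromosomes lack the stabilising poly(A) signal.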

Two different *DUX4* mRNAs, resulting from the inclusion or exclusion of an alternatively spliced intron of 136 bp located in the 3′ UTR part of mRNA, have been described [57]. The two *DUX4* mRNAs have also spliced out a 345 bp intron also located in the 3′ UTR region [57]. These two *DUX4* mRNAs were later renamed DUX4-full length (DUX4-fl) [5]. Recently, other *DUX4* mRNAs have been characterized from a common variant of the most prevalent FSHD-permissive haplotype 4A161 (containing an SSLP of 161 nt and the distal 4qA variant [59]). These two variants present a 1.6 kb size difference of the most distal D4Z4 units [60]. Two *DUX4* mRNAs are transcribed from this long allele using 2 alternative 3′ splice sites, leading to either the DUX4-fl 161La or Lb transcripts (Figure 1) (GenBank accession numbers MF693913 and KQ983258.1). The three pathogenic DUX4-fls share the pLAM sequence containing the DUX4 poly(A) and lead to the same DUX4 protein. There is no link between disease severity and transcript variants [60].

#### *4.2. DUX4 Isoforms*

DUX4 transcription from the last D4Z4 repeat results in at least 5 different mRNAs. The 4 *DUX4-fls* described above code for the same protein but differ by an altered splicing of intron 1 in the 3′ UTR and by the use of 2 alternative 3′ splice sites leading to different types of exon 3. The fifth *DUX4* transcript corresponds to a short version of DUX4 (DUX4-s), in which an alternative donor splice site located in the first exon is used [24], leading to a truncated form of DUX4, lacking the C-terminal part of the protein containing the transactivation domain [61] and acting as a dominant negative [25]. *DUX4-fl* isoforms are mainly found in myotubes and muscle biopsies isolated from FSHD patients, whereas *DUX4-s* can be found in both control individuals and FSHD patients [5,62]. DUX4-fl expression increases in myotubes [5,62,63]. An isoform switch may be possible, since it was shown in iPS cells derived from control fibroblasts that *DUX4*-fl is expressed in undifferentiated cells but can switch to *DUX4*-s in embryoid bodies [5]. *DUX4-fl* mRNA is expressed in muscles during development, as both isoforms are found in foetal muscle biopsies and cells derived from foetal muscle [64,65].

Interestingly, *DUX4* mRNA is also found in human testes at a level 100-fold higher compared to FSHD muscle biopsies [5] but does not seem to be toxic. 3′ RACE PCR analysis revealed that both chromosomes 4 and 10 were used for *DUX4* transcription, despite the absence of a permissive poly(A) signal on chromosome 10. Chromosome 10 and some 4qA transcripts use an alternative poly(A) located in exon 7. Surprisingly, *DUX4* transcripts were also found from the 4qB allele, but the poly(A) still needs to be identified. Exons 3 and 7 are excluded since they are not present in the 4qB allele. Non-canonical poly(A) signals may also be used in some circumstances, as observed in the presence of antisense oligonucleotides targeting the poly(A) signal [66]. The use of alternative poly(A) signals could also explain the normal embryogenesis observed in individuals carrying non-permissive 4q alleles. Consistent with this hypothesis, studies have also shown that alternative polyadenylation patterns vary among cell types [67] and during embryonic development [68].

#### **5. DUX4 Low Abundance and Stochastic Expression**

*DUX4* mRNA is found at a very low level in both biopsies and muscle cells from both FSHD1 and FSHD2 patients. This low abundance could reflect a uniform low level in all nuclei or a high expression in a limited number of nuclei. By pooling different numbers of nuclei and assessing the presence of *DUX4-fl* by PCR, it has been estimated that about 1 in 1000 FSHD nuclei are positive for *DUX4* mRNA [5]. The question is how a gene expressed at such low levels could be so toxic. The presence of the endogenous DUX4 protein in consecutive myotube nuclei, forming an intensity gradient, suggested a spreading of the protein within the myotubes [69]. This hypothesis was confirmed by co-culture experiments between FSHD myoblasts and murine C2C12 myoblasts. Whereas *DUX4* is transcribed in human nuclei only, the protein was found in both human and murine nuclei, showing the spreading of the DUX4 protein [70]. The sporadic and asynchronous burst of expression of DUX4 was confirmed using a *DUX4*-activated reporter [71].
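The logic of estimating a rare positive fraction from pooled nuclei can be illustrated with a short numerical sketch. It assumes, as a simplification not stated in the cited study, that each nucleus is independently positive with probability `f` and that PCR detects any pool containing at least one positive nucleus. The function names and pool sizes are hypothetical and serve only as an illustration.

```python
def pool_positive_probability(f: float, n: int) -> float:
    """Probability that a pool of n nuclei holds at least one positive nucleus,
    assuming each nucleus is independently positive with probability f."""
    return 1.0 - (1.0 - f) ** n


def estimate_fraction(positive_pools: int, total_pools: int, n: int) -> float:
    """Invert the relation above: estimate f from the observed share of
    PCR-positive pools, each pool containing n nuclei."""
    p = positive_pools / total_pools
    if not 0.0 <= p < 1.0:
        # If every pool is positive, f cannot be bounded from this pool size.
        raise ValueError("at least one negative pool is needed to estimate f")
    return 1.0 - (1.0 - p) ** (1.0 / n)


if __name__ == "__main__":
    f = 1 / 1000  # order of magnitude quoted in the text
    for n in (100, 1000, 10000):  # hypothetical pool sizes
        print(f"pool of {n} nuclei: P(positive) = {pool_positive_probability(f, n):.3f}")
```

Under these assumptions, small pools are mostly negative and very large pools are almost always positive, so the pool size at which positivity becomes common is what indicates the order of magnitude of the positive fraction.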

#### **6. Conclusions**

During the past decade, our knowledge about FSHD onset has considerably improved. Several genetic and epigenetic defects have been clearly identified that cause FSHD, all leading to the aberrant expression of the DUX4 transcription factor. Once expressed, DUX4 triggers a cascade of events that ultimately converge on cell death and impair muscle development and repair (for review see [8]). After years of controversy, DUX4 is now seen as one of the most important players in FSHD onset and progression. Some areas remain unelucidated, such as the non-toxic expression of DUX4 during embryogenesis [2,4] or the different splicings observed in the testis [5]: Are they due to a difference between pathogenic and healthy environment or are they tissue-specific?

Multiple studies have deciphered the expression of DUX4 in skeletal muscle and demonstrated that chromatin conformation, DNA methylation and histone modification, myogenic enhancer, and regulatory proteins are involved in the regulation of its expression. Moreover, some other repressor proteins or lncRNA that are associated with the D4Z4 repeat may also play a role [32,72].

Several laboratories are developing therapeutic approaches targeting DUX4 by either blocking *DUX4* mRNA synthesis [31,51,52,54], targeting *DUX4* mRNA using antisense oligonucleotides [66,73–76], or targeting the DUX4 protein or its downstream consequences [77–79]. One phase 2 clinical trial (NCT04003974) aiming at inhibiting or reducing DUX4 expression in skeletal muscle is already ongoing and may enable a better understanding of the role of DUX4 in the pathophysiology of FSHD.

**Author Contributions:** The idea of writing this review was conceived by E.S., L.L.G., V.M. and J.D. All the authors wrote and edited the review. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** This work was supported by the FSHD society (grant number FSHS-22018-02 for E.S. salary) and L.L.G. is funded by the Association Française contre les Myopathies AFM-Telethon (grant number #22582). V.M. and J.D. are supported by the National Institute for Health Research Biomedical Research Centre at Great Ormond Street Hospital for Children NHS Foundation Trust and University College London. All research at Great Ormond Street Hospital NHS Foundation Trust and UCL Great Ormond Street Institute of Child Health is made possible by the NIHR Great Ormond Street Hospital Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Molecular and Cellular Mechanisms Affected in ALS**

**Laura Le Gall <sup>1,2,†</sup>, Ekene Anakor <sup>1,†</sup>, Owen Connolly <sup>1,†</sup>, Udaya Geetha Vijayakumar <sup>1</sup>, William J. Duddy <sup>1</sup> and Stephanie Duguez <sup>1,</sup>\***


Received: 20 July 2020; Accepted: 22 August 2020; Published: 25 August 2020

**Abstract:** Amyotrophic lateral sclerosis (ALS) is a terminal late-onset condition characterized by the loss of upper and lower motor neurons. Mutations in more than 30 genes are associated with the disease, but these explain only ~20% of cases. The molecular functions of these genes implicate a wide range of cellular processes in ALS pathology, a cohesive understanding of which may provide clues to common molecular mechanisms across both familial (inherited) and sporadic cases and could be key to the development of effective therapeutic approaches. Here, the different pathways that have been investigated in ALS are summarized, discussing in detail: mitochondrial dysfunction, oxidative stress, axonal transport dysregulation, glutamate excitotoxicity, endosomal and vesicular transport impairment, impaired protein homeostasis, and aberrant RNA metabolism. This review considers the mechanistic roles of ALS-associated genes in pathology, viewed through the prism of shared molecular pathways.

**Keywords:** oxidative stress; mitochondria dysfunction; axonal transport; autophagy; endocytosis; secretion; excitotoxicity; RNA metabolism; MND

#### **1. Introduction**

Amyotrophic lateral sclerosis (ALS) is the most frequent motor neuron disease (MND), with an estimated ~223,000 patients being affected globally in 2015 [1]. The pathology affects both upper motor neurons (UMN) in the cortex and lower motor neurons (LMN) in the brainstem and spinal cord [2]. Paralysis and death usually occur three to four years after symptom onset [3], and there are currently no effective treatments to slow disease progression [4]. Approximately 90% of ALS cases are sporadic, while 10% are familial, defined by the occurrence of ALS in more than one family member [5]. Around 30 different genes are linked with ALS [5,6]. Together they explain ~20% of all ALS cases and are associated with different molecular functions and disease phenotypes [7], so that the task of understanding the relationships between affected pathways is complex.

To investigate the different molecular pathways affected in ALS, various in vivo models, including *Drosophila* [8–11], *C. elegans* [12], zebrafish [13–16], and rodents [17], as well as in vitro cell models such as patient lymphoblastoid cell lines [18] and hybrid [19,20] or primary murine cell lines [21], have been developed. Most of these models investigate the pathological effects of mutations to ALS genes, including *Fused in Sarcoma* (*FUS*), *Superoxide dismutase 1* (*SOD1*), *TAR DNA-binding protein 43* (*TDP-43*), and *Chromosome 9 open reading frame 72* (*C9orf72*) [22,23]. Their study has resulted in numerous cellular and molecular mechanisms being proposed to explain motor neuron death. Mechanisms frequently implicated include: reactive oxygen species (ROS)-associated oxidative stress [24–27], mitochondrial dysfunction [24], axonal and vesicular trafficking dysregulation [28,29], glutamate-mediated excitotoxicity [30–33], proteostatic impairments [34–38], and altered RNA metabolism and/or processing [39–42].

Alteration to one or more of these cellular processes may be present, not only in the motor neurons themselves but, also, in neighboring cell populations, such as glial cells, peripheral inflammatory cells, and muscles, as ALS is increasingly considered a multisystemic disease that culminates in motor neuron death [6,24]. For example, astrocytes and microglia have been implicated in the release of proinflammatory mediators that lead to chronic neuroinflammation and motor neuron toxicity [43]. In addition, the selective overexpression of mutant SOD1 in skeletal muscle was shown to cause mitochondrial abnormalities, induce microglial activation in the central nervous system (CNS), and result in severe muscle atrophy in mice [44].

Consensus is yet to be reached regarding the causal mechanisms involved in the onset and propagation of ALS. The aim of this review is to identify and summarize the different molecular mechanisms implicated in various forms of the disease, including sporadic and familial cases. In doing so, it is hoped that new insights may be gained regarding the role of different pathways across different forms of the disease.

#### **2. Oxidative Stress**

Oxidative stress results from an imbalance between the production and elimination of reactive oxygen species (ROS) [45], as well as an impaired ability to repair ROS-mediated toxicity [46], and has been of particular interest in ALS pathogenesis ([47] and Figure 1) since the discovery of *SOD1* mutations in familial forms of ALS [48]. Increased levels of oxidized proteins, RNA, DNA, and lipids have been observed in post-mortem tissue from both sporadic and *SOD1* ALS cases [27,49,50], as well as in the cerebrospinal fluid (CSF), serum, and urine of sporadic ALS patients [26].

SOD1 is a major antioxidant enzyme that is ubiquitously expressed and catalyzes the conversion of superoxide radical anions into molecular oxygen and hydrogen peroxide [51]. Approximately 80 of the 160 *SOD1* mutations reported in ALS are missense mutations that fail to cause a loss of SOD1 activity [52], and many SOD1 mouse models show a progressive, late-onset motor phenotype with concomitant astrogliosis and motor neuron pathology when mutated forms of human SOD1 are overexpressed [17]. Evidence from human samples has shown that there is a 42% reduction in overall SOD1 activity in familial SOD1 patients [53], potentially leading to an imbalance between ROS production and degradation (Figure 1). This imbalance might be exacerbated by the disruption of the NRF2-ARE (Nuclear erythroid 2-Related Factor–antioxidant response element) signaling pathway that is observed in *SOD1* ALS [54], thus affecting the expression of antioxidant proteins [55] (Figure 1). Supporting these hypotheses, oxidative damage such as protein glycoxidation and lipid peroxidation was observed in the motor neurons of the anterior horn from *SOD1* familial ALS (fALS) patients [56] and SOD1G93A mice [57,58].

The generation of ROS could result from the activity of NADPH oxidase in the lipid raft membrane compartment. Interestingly, the *ATXN2* gene encodes the ataxin-2 polyglutamine (PolyQ) protein, and intermediate-length PolyQ expansions (27–33 Qs), which are known to be a significant risk factor for ALS [59–61], can interact with NADPH oxidase and may lead to an increase in ROS production, DNA damage, and mitochondrial distress [62] (Figure 1).

**Figure 1.** Oxidative stress, mitochondrial dysfunction, axonal transport, and glutamate excitotoxicity in amyotrophic lateral sclerosis (ALS). An increase in oxidative stress can result from defects in detoxifying pathways. Such defects include the loss of SOD1 function, aberrant DNA damage repair machinery, or a decrease in expression of antioxidant genes affecting the NRF2-ARE pathway. Oxidative stress can also be increased by the stimulation of ROS production via increased NADPH oxidase activity or from disrupted mitochondrial respiratory chain activity. Mitochondrial activity can be affected by several ALS mutations, such as those leading to the accumulation of protein aggregates, or to decreased mitochondrial biogenesis and transport, or to increased cytosolic Ca<sup>2+</sup> (as observed when glutamate receptor activity is stimulated or when the Ca<sup>2+</sup>-buffering capacity is decreased). Consequently, a disruption of the mitochondrial respiratory chain will lead to an increase in ROS production and, thus, to an accumulation of oxidized proteins, lipids, DNA, and RNA. Oxidative damage occurring over time may then stimulate apoptosis and, thus, cell death. Defective axonal transport affects not only the mitochondria but, also, the transport of other proteins and RNA, with consequences on the axon structure and function being accompanied by neurofilament accumulation. Defective glutamate uptake by astrocytes, and/or a defect in glutamate receptor clearance or in AMPA or GABA receptors, can lead to increased Ca<sup>2+</sup> permeability and can impact the post-synaptic hyperexcitability and mitochondrial function. ARE: antioxidant response element, AMPA2: α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid receptor 2, ATXN2: ataxin 2, Bcl2: B-cell lymphoma 2, C9orf72: Chromosome 9 open reading frame 72, C21orf2: Chromosome 21 open reading frame 2, CHCHD10: coiled-coil helix coiled-coil helix domain-containing 10, DCTN1: Dynactin 1, EAAT2: Excitatory amino acid transporter 2, ER: endoplasmic reticulum, FUS: Fused in Sarcoma, GABA: gamma-aminobutyric acid, GlyR: glycine receptor, GlyT: glycine transporter, KIF5A: Kinesin heavy-chain isoform 5A, MAM: Mitochondria-associated ER membranes, NEFH: heavy-weighted neurofilaments, NEK1: (NIMA)-related kinase 1, NMDA: N-methyl-D-aspartate receptor, NRF2: Nuclear erythroid 2-Related Factor, PFN1: profilin-I, PTPIP51: Protein tyrosine phosphatase-interacting protein 51, SETX: senataxin, SOD1: Superoxide dismutase 1, SPG11: Spatacsin, TDP-43: TAR DNA-binding protein 43, VAPB: vesicle-associated membrane protein-associated protein B, VCP: valosin-containing protein, and ROS: reactive oxygen species.

Recurrent oxidative stress and/or mitochondrial dysfunction occurring throughout the life of the cell can lead to DNA damage, which can be repaired by activating the DNA damage repair machinery. Several genes known to encode proteins involved in DNA damage repair [63–65] are also associated with ALS: *NEK1* [66], *C21orf2* [67], and *SETX* [68]. These encode the proteins never in mitosis-A (NIMA)-related kinase 1 (NEK1), cilia and flagella-associated protein 410, and the DNA/RNA helicase senataxin, respectively. Mutations in these genes may therefore increase the susceptibility to ALS as a result of dysregulated DNA damage repair machinery [67,69,70], leading to an impaired ability of motor neurons to cope with oxidative stress and, consequently, to cell death [64,70] (Figure 1). For example, induced pluripotent stem cell (iPSC) motor neurons derived from *NEK1*<sup>c.2434A>T</sup>-mutated ALS patients exhibit an increased level of DNA damage, as well as a failure to repair DNA double-strand breaks [70]. Primary motor neurons from *SETX*R2136H and *SETX*L389S murine models were unable to cope with induced oxidative stress and showed an increased stress granule formation [71].

Altogether, these studies suggest that oxidative stress might be increased in sporadic and familial ALS patients. Increased oxidative stress may affect mitochondrial function [72], exacerbate endoplasmic reticulum stress [73], and impact protein homeostasis mechanisms [74], ultimately leading to cell damage and neuronal loss.

#### **3. Mitochondrial Dysfunction**

Mitochondria are key organelles for ATP generation, calcium buffering, and apoptosis regulation [75], and their dysfunction in the dorsal root ganglion cells of sporadic ALS patients has been described previously [76]. Several mechanisms can trigger mitochondrial dysfunction in ALS (Figure 1).

The maintenance of mitochondrial cristae organization is crucial to ensure respiratory chain function [77] and requires cardiolipin, the ATP synthase dimer, and large protein complexes such as the mitochondrial contact site complex (MICOS) and dynamin-like Opa1/Mgm1 [78,79]. The coiled-coil helix coiled-coil helix domain-containing protein 10 (CHCHD10), known to be associated with ALS [23], is suspected to be either part of [80,81] or interact with MICOS [82]. Consequently, mutations in *CHCHD10* result in the loss of mitochondria cristae [80], mitochondria fragmentation [81], and defective mitochondrial repair [80,83] (Figure 1).

Mitochondrial biogenesis can also be directly affected, as observed in *FUS*-mutated conditions [84,85]. While *FUS* encodes a DNA/RNA-binding protein [86] predominantly localized to the nucleus, mutated forms of FUS can accumulate in the cytosol, possibly become toxic [87,88], and affect mitochondrial function. For example, the mutated FUSP525L can interact with mitochondrial chaperone proteins and induce mitochondria fragmentation and elevated ROS production [84,85].

Aberrant swollen mitochondria morphology has also been observed in neuronal, non-neuronal cells, and muscle tissue of other fALS cases, such as *SOD1* [24,89] and *C9orf72* [25,89]-mutated ALS patients but, also, in both SOD1G93A and TDP-43A315T murine models [44,90,91]. The aberrant morphology may result from a cascade of events involving the mutated protein aggregates. For example, insoluble mutant SOD1 can aggregate in mitochondria in the spinal cord of SOD1G93A mice [92], causing the formation of vacuoles in the outer- and inter-mitochondrial membrane [93], affecting mitochondrial respiration, energy production, and ultimately, increasing the level of oxidative stress [94] (Figure 1). ALS patients with the SOD1A4V mutation show significant increases in complex I and III activity of mitochondria in the motor cortex [95,96]. The overactivation of complexes I and III increased the production of mitochondrial ROS [97] and may explain the high level of oxidative stress observed in SOD1 mice and patients.

The G4C2 hexanucleotide repeat expansion mutation (HREM) in the *C9orf72* gene explains 40–50% of familial ALS cases and 5–10% of sporadic cases [98–101]. There are several hypotheses regarding the mechanisms by which this leads to toxicity, and evidence exists for both loss and gain-of-function-mediated toxicity. One hypothesis suggests that the repeat-associated non-AUG (RAN) translation of G4C2 repeats is causal in the expression of toxic dipeptide repeat (DPR) proteins. RAN translation can occur in both sense and antisense reading frames [41], resulting in the production of five different DPRs: glycine-alanine (GA), glycine-arginine (GR), proline-arginine (PR), proline-alanine (PA), and glycine-proline (GP) [38]. Interestingly, the expression of poly-GR results in early abnormalities in the mitochondrial respiratory chain by interacting with ATP5A1, a complex V protein, and leads to its ubiquitination and degradation in *C9orf72* ALS-FTD patients [102]. Mitochondrial dysfunction [103] and an increased oxidative stress [104] are reported in fibroblasts and iPSC-derived astrocytes obtained from *C9orf72* ALS patients (Figure 1).
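As a rough illustration of what these percentages imply, the familial and sporadic figures can be combined into an overall share of ALS cases by a weighted average. The sketch below assumes that the 10% familial and 90% sporadic split given in the Introduction applies to the same populations as the *C9orf72* percentages, which the source does not state. The function name is hypothetical, and the result is back-of-envelope arithmetic and not a reported estimate.

```python
def overall_fraction(familial_share: float,
                     frac_in_familial: float,
                     frac_in_sporadic: float) -> float:
    """Weighted average of a mutation's frequency across familial and sporadic cases."""
    sporadic_share = 1.0 - familial_share
    return familial_share * frac_in_familial + sporadic_share * frac_in_sporadic


# Lower and upper bounds taken from the ranges quoted in the text
# (40-50% of familial cases, 5-10% of sporadic cases; 10% of cases familial).
low = overall_fraction(0.10, 0.40, 0.05)
high = overall_fraction(0.10, 0.50, 0.10)

if __name__ == "__main__":
    print(f"Implied overall share of ALS cases: {low:.1%} to {high:.1%}")
```

Because sporadic cases make up most of the total, the sporadic percentage dominates the combined figure even though the expansion is far more frequent among familial cases.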

Nonfunctional and damaged mitochondria can be targeted by NIP3-like protein X (NIX) or PTEN-induced putative kinase protein 1 (PINK1)-E3 ubiquitin ligase parkin, then sequestered into isolation membranes and degraded after fusion with the autophagosome or lysosome [105]. Optineurin (OPTN) and TANK-binding kinase 1 (TBK1) are key actors for mitochondria engulfment [106]. Consequently, ALS mutations in *OPTN* [107] and *TBK1* [23] will affect the mitophagic flux and may lead to an accumulation of nonfunctional mitochondria over time and result in motor neuron death (see [108] for review). Taken together, except for *CHCHD10*, these studies point toward mitochondria dysfunction and damage being a downstream effect of ALS gene mutations that lead to protein aggregations and/or proteostasis dysfunction (see Section 7: Impaired Protein Homeostasis).

In addition, damage to mitochondria and alterations in their functions can disrupt calcium homeostasis, increasing the sensitivity of neurons to glutamate excitotoxicity and the risk of motor neuron damage ([109], Figure 1). Mitochondrial dysfunction can also activate proapoptotic signals [93], such as the caspase-dependent [110] or Bcl-2-dependent pathways [93], and might lead to motor neuron degeneration.

#### **4. Axonal Transport**

Motor neurons have exceptionally long axons, up to 1 m in length, placing extreme demands on cellular physiological functions that rely on the axonal transport of organelles such as mitochondria or of molecules including proteins, lipids, and RNA to and from the synapse [111]. Axonal transport, as well as the conduction of electrical impulses and the maintenance of the axon structure, are heavily regulated processes linked with control of the neurofilament structure [112,113]. In both sporadic ALS (sALS) and fALS patients, the disorganization of neurofilament networks has been reported [38].

Neurofilaments are neuron-specific intermediate filaments that are stretch-resistant and are major cytoskeleton proteins [114]. They form parallel coiled-coil heterotetramers composed of light, medium, and heavy-weighted neurofilaments (NF-L, NF-M, and NF-H, respectively) and α-internexin or peripherin [112,114]. Eight heterotetramers form cylindrical structures known as unit-length filaments (ULFs) with the tail domains sticking out [112,114]. A series of ULFs form a filament that matures into a neurofilament after a radial compaction of the cylindrical structure [112]. Consequently, variants in the *NEFH* gene affecting the crosslinking properties of the NF-H protein may result in abnormal neurofilament accumulations and in axonal transport defects [115].

Neurofilaments form cross-bridges not only with each other but, also, with actin filaments, actin rings, and microtubules [114], constituting a protein network that might participate in the maintenance of the axon structure [114,116]. Actin polymerization requires the small actin-binding proteins profilin I and II and phosphoinositide islands localized at the membrane [117]. Mutations in the *PFN1* gene encoding profilin-I are associated with ALS [118], and the expression of mutant hPFN1G118V in a murine model resulted in dysregulated actin polymerization [119]. Consequently, the attachment of actin to the microtubules might be affected, probably impacting anterograde and retrograde transport and, thus, leading to an accumulation of fragmented mitochondria and, ultimately, to upper and lower motor neuron death ([119], Section 3 and Figure 1).

Microtubules and motor proteins such as the dynein-dynactin complex [28,120,121] and the kinesins [120,122,123] are involved in the long-distance transport of cellular cargo. Microtubules are composed of dimers of α- and β-tubulin. The alpha tubulin subtype TUBA4A is an ALS-associated protein [124], and ALS-associated mutations of *TUBA4A* lead to microtubule polymerization defects and network destabilization [124].

The dynein-dynactin complex [28,120,121], along with the kinesins [120,122,123], are key drivers of the anterograde and retrograde movements of diverse cargoes along the microtubule cytoskeleton, including organelles, vesicles, neurofilaments, AMPA and GABA receptors, and RNAs. Interestingly, mutations in *dynactin subunit 1 (DCTN1)* affecting the tertiary structure of the dynactin protein and its capacity to bind to microtubules can cause ALS [125]. When the interaction between dynein and dynactin is interrupted by the overexpression of dynamitin, axonal transport is impaired, and mice develop a late-onset motor pathology that recapitulates late-onset progressive ALS [126]. Kinesins form a superfamily of molecular motors that can be divided into three groups [120]. KIF5, a member of the kinesin-1 group, is a tetramer with two kinesin heavy chains (KHCs) that contain a motor domain and two kinesin light chains (KLCs) that facilitate connections with cargo. There are three KIF5 isoforms (KIF5A, KIF5B, and KIF5C), all of which are associated with the neuronal function and anterograde transport of proteins and organelles [127]. Mutations in the C-terminal region of KIF5A, leading to a loss of function, are associated with ALS [128] and are suspected to disrupt axonal transport (Figure 1). This hypothesis is supported by the defective axonal transport of mitochondria, the local accumulation of neurofilament, and the reduced axonal growth and survival observed in the primary culture of motor neurons from *KIF5A*<sup>−/−</sup> mice [129].

Distal axonal transport is also affected in SOD1G93A mice at an early stage, with an early decrease in kinesin expression in asymptomatic mice, followed by a decrease in dynein expression in older presymptomatic mice [130]. Defective axonal transport may contribute to the accumulation of impaired mitochondria at distal sites ([93], Figure 1), resulting in decreased ATP production and disrupted calcium homeostasis at the neuromuscular junction, consequently leading to a distal axonopathy in SOD1G93A mice [109,131,132] and *SOD1* patients [24,28]. Kinesin-dynein machineries have also been reported to be affected in sporadic ALS, where KIF1Bβ and KIF3Aβ, two kinesin-related proteins, were found to be downregulated in motor cortex samples of sporadic patients [133]. However, the expression level of another kinesin-related protein, KIFAP3, is inversely correlated with sporadic ALS patient survival [134].

In conclusion, different mutations associated with ALS can directly alter the architecture and dynamics of the cytoskeleton, affecting the axonal transport machinery. Interestingly, aberrant axonal transport has also been observed in sALS patients and in fALS patients harboring mutations in non-cytoskeletal-related genes. Disrupted transport mechanisms can then affect the mitochondrial metabolism and degeneration (Section 3), as well as protein degradation (Section 7) and RNA transport (Section 8), ultimately leading to motor neuron death.

#### **5. Glutamate Excitotoxicity**

Glutamate is the most abundant neurotransmitter in the CNS and is released from presynaptic neurons into the synaptic cleft, resulting in the activation of NMDA and AMPA receptors that mediate calcium and sodium influxes in postsynaptic neurons. Excess glutamate may result in the abnormal activation of glutamate receptors, causing an excessive influx of Ca<sup>2+</sup> in the postsynaptic neuron (Figure 1) and leading to extreme neuronal firing [135]. The resulting excitotoxicity is potentially implicated in a number of pathological conditions, including multiple sclerosis [136], Parkinson's disease [137], and ALS [138,139]. Glutamate excitotoxicity is thought to occur as a result of defective glutamate uptake and transport mechanisms that lead to excessive neuronal Ca<sup>2+</sup> intake, aberrant Ca<sup>2+</sup> homeostasis, downstream mitochondrial dysfunction, and increased ROS production [140,141] (Figure 1). Glutamate-gated AMPA receptors are abundant in human and animal motor neurons [142,143] and are made up of four subunits, GluA1 to GluA4 (also GluR1–4) [144]. The overactivation of AMPA receptors has been shown to result in hindlimb paralysis and motor neuron degeneration in wild-type rats, highlighting the susceptibility of motor neurons to Ca<sup>2+</sup> dysregulation [145]. The Ca<sup>2+</sup> permeability of AMPA receptors is mediated by the presence of the GluA2 subunit, the absence of which, in addition to impaired transcriptional editing at the Q/R site, confers increased AMPA Ca<sup>2+</sup> permeability [146]. Interestingly, spinal motor neurons have been reported to display a reduced expression of GluA2 relative to dorsal horn neurons from the same region, providing some explanation for the selective susceptibility of motor neurons in ALS [147], and GluA2 transcriptional editing has been found to be impaired in motor neurons of sporadic ALS patients relative to controls [148].
Furthermore, evidence suggests that GluA2 editing is also impaired in ALS oculomotor neurons, despite their spared function in disease. However, spared functionality has been hypothesized to be the result of increased Ca<sup>2+</sup>-binding proteins and, in particular, parvalbumin, which is highly abundant in oculomotor neurons and present at low levels in spinal motor neurons [149]. GluA2 transcriptional editing into the Ca<sup>2+</sup>-impermeable subunit is mediated by adenosine deaminase acting on RNA 2 (ADAR2) activity [150,151]. A reduced ADAR2 expression has been reported in sporadic ALS patients and has been shown to result in an increased aggregation of TDP-43 in spinal motor neurons [152]. Taken together, the evidence suggests that a decreased GluA2 expression and impaired transcriptional editing in spinal motor neuron AMPA receptors are contributing factors to the increased uptake of Ca<sup>2+</sup> and the downstream susceptibility to excitotoxicity in ALS (Figure 1). Moreover, the finding that AMPA receptor dysfunction can result in the aggregation of misfolded TDP-43 is important for linking ALS pathology with the glutamate excitotoxicity hypothesis.

In addition to dysregulated GluA2 subunit function, research has reported dysfunctional glutamate transport mechanisms in ALS. Under normal physiological conditions, glutamate at the synaptic cleft is cleared by the excitatory amino acid transporter 2 (EAAT2), which functions to maintain low levels of extracellular glutamate and, thus, prevent excessive increases in intracellular Na<sup>+</sup> and Ca<sup>2+</sup> levels [89,153]. EAAT2 is found primarily on the synaptic processes of astrocytes, and the loss of EAAT2 has been reported to induce increased extracellular levels of glutamate and cause motor neuron toxicity and muscle paralysis in animal models [154], whilst the pharmacological stimulation of EAAT2 was found to rescue motor neuron degeneration and delay paralysis in SOD1G93A mice [155,156]. Abnormalities in EAAT2 have been suggested to occur post-translationally after post-mortem research highlighted no differences in EAAT2 mRNA expression between sporadic ALS patients and controls, despite a 95% decrease in protein levels in sALS subjects [157]. Further support for EAAT2 loss and its implications in ALS was reported by a separate group who demonstrated a reduced EAAT2 immunoreactivity in anterior horn cells of sporadic ALS and lower motor neuron disease patients relative to healthy controls [158]. Together, these studies highlight the role of dysfunctional glutamate uptake and transport mechanisms in sporadic cases of ALS (Figure 1).

Excitotoxicity has also been associated with genetic forms of the disease. *SOD1* mutations have been implicated in the glutamate excitotoxicity hypothesis, and research has demonstrated an increased glutamate release [159], as well as motor neuron and inter-neuron hyperexcitability two to three months prior to motor neuron degeneration and phenotype onset in SOD1G93A mice [160]. *SOD1* mutations have also been shown to reduce the expression of astrocytic GluA2 in vitro and in vivo, thereby diminishing their ability to protect against motor neuron excitotoxicity [21]. In patients, the deterioration of neuronal dendrites was observed in sporadic and familial ALS cases, but not in healthy or Alzheimer's disease controls, leading to the suggestion that ALS is a synaptopathy [161], which is perhaps attributable to the excessive levels of glutamate observed in the CSF of patients [138,139]. Indeed, metabolomic analyses suggest that ALS patients show elevated serum levels of glutamate [32], and there is evidence that sporadic and familial ALS cases show heightened levels of cortical excitability, which can be detected even in the presymptomatic stages in familial *SOD1* mutation carriers [162]. However, other studies have failed to find evidence for elevated glutamate levels in ALS patients [163–165]. *C9orf72* has also been implicated in the glutamate excitotoxicity hypothesis after iPSC motor neurons from ALS patients were found to have impaired autophagosome formation and aberrant accumulations of glutamate receptors [166–168]. This has been supported in vivo with *C9orf72* knockout mice showing GluR1 upregulation in the hippocampus and a greater susceptibility to excitotoxicity compared to controls [166]. In addition, *C9orf72* knockout mice demonstrated a complete loss of SMCR8 [169], a protein that functions in a complex with C9orf72 and WD40 repeat domain 41 (WDR41) to regulate membrane trafficking and autophagy [170].
The concomitant abnormalities in autophagy and aberrant accumulations of GluR1 have led to the hypothesis that *C9orf72* loss-of-function leads to an impaired clearance of excess glutamate receptors, which, in turn, results in a greater glutamate uptake and increased susceptibility to excitotoxicity (Figure 1). *C9orf72* patients have also been reported to demonstrate elevated glutamate levels in their cerebrospinal fluid (CSF), which has been hypothesized to occur as a result of DPR-mediated splicing defects to EAAT2 and subsequent impairments in glutamate clearance [168]. Research has also implicated *ALS2* in the glutamate excitotoxicity hypothesis by virtue of its interaction with Rab5 and the endosomal pathway. *ALS2*<sup>−/−</sup> knockout mice have been reported to show significant increases in glutamate receptor degradation, including GluA2 [171], the loss of which is believed to contribute to excitotoxicity and motor neuron degeneration. Similarly, *ALS2*<sup>−/−</sup> spinal motor neurons were found to be more susceptible to glutamate excitotoxicity as a result of reduced GluA2 at the synapses of neurons, which was attributed to an altered glutamate receptor interacting protein 1 (GRIP1) function, caused by the genetic loss of *ALS2* [171].

Although there is evidence to suggest the presence of glutamate transport and uptake defects in both sporadic and familial cases of ALS, it is unclear how these defects lead to the specific deterioration of motor neurons in disease. Furthermore, it has been 25 years since the approval of the antiglutamatergic drug riluzole for the treatment of ALS, yet it remains unclear why it, as well as other antiglutamatergics, including gabapentin, memantine, and ceftriaxone, fails to delay symptom progression in ALS by more than approximately three months [172,173].

#### **6. Endosomal Pathway and Vesicle Secretion**

Extracellular vesicles encompass different types such as apoptotic vesicles, microvesicles, and exosomes, all of which can affect the functionality of the recipient cells [174–177]. The last decade has seen several investigations into the relevance of exosomes to ALS, either in propagating the disease or as biomarkers [6,178–181]. In the ALS context, exosomes secreted by astrocytes, neurons, or microglia are suspected to carry neurotoxic elements such as mutated SOD1 or C9orf72-derived DPR and to be responsible for motor neuron death [178,179,182]. Interestingly, muscle cells from sporadic ALS patients present an accumulation of vacuoles and multivesicular bodies in their cytosol, suggesting that vesicle trafficking is disrupted in these cells ([183] and personal data) and that extracellular vesicle secretion might have an important role in ALS.

Exosome biogenesis requires the formation of inward buds in the multivesicular body (MVB), followed by their fission and release as vesicles into the MVB lumen. The generation of intraluminal vesicles can be either Endosomal Sorting Complex Required for Transport (ESCRT)-dependent or ESCRT-independent. The ESCRT machinery is composed of four complexes (ESCRT-0, ESCRT-I, ESCRT-II, and ESCRT-III) that act one after the other to form intraluminal vesicles. Interestingly, charged multivesicular body protein 2B (CHMP2B) is a component of ESCRT-III involved in the processing of cargo into intraluminal vesicles and is associated with ALS [184]. The dysfunction of ESCRT-III may lead to abnormal and dysmorphic endosomes [185] (Figure 2).

**Figure 2.** Protein homeostasis dysregulation. Dysregulated protein homeostasis is mediated by multiple pathways encompassing defects in autophagy, the dysregulated ubiquitin-proteasome system (UPS), endo-lysosomal pathway disruptions, or endoplasmic reticulum (ER) stress. The presence of misfolded proteins activates endoplasmic reticulum-associated protein degradation (ERAD), leading to proteasome-mediated degradation to avoid misfolded protein accumulations in the ER lumen and subsequent ER stress. Several ALS-associated gene mutations induce proteasome-mediated toxicity via the sequestration of ubiquilin and chaperone proteins involved in the UPS pathway. The proteolytic activity of the proteasome has also been demonstrated to be targeted by gene mutations in ALS. The autophagic pathway involves the formation and maturation of phagophores that engulf selected transported cargo and form autophagosomes. Fusion with the lysosome enables the degradation of autophagosome contents. Defects in autophagy initiation and expansion, dysregulated phagophore formation, and/or impaired cargo transport are observed in ALS patients. Mutations in ALS-associated genes also cause defects in mitophagy, a specific form of autophagy. Defects in the endolysosomal pathway have been associated with ALS gene mutations, including defective endolysosomal trafficking and altered lysosomal homeostasis and degradation. Defects in the autophagy/lysosomal pathway may affect vesicle secretion. Genes implicated in dysregulated protein homeostasis are indicated in red.
ER: endoplasmic reticulum, ERAD: endoplasmic reticulum-associated protein degradation, ALS2: Alsin, C9orf72: Chromosome 9 open reading frame 72, CCNF: Cyclin F, CHCHD10: coiled-coil helix coiled-coil helix domain-containing 10, CHMP2B: chromatin-modifying protein 2B, DCTN1: Dynactin 1, FIG4: Phosphoinositide 5-phosphatase, FUS: Fused in Sarcoma, MATR3: matrin 3, MVB: Multivesicular bodies, OPTN: Optineurin, SOD1: Superoxide dismutase 1, SPG11: Spatacsin, TDP-43: TAR DNA-binding protein 43, TBK1: TANK-binding kinase-1, UBQLN2: Ubiquilin-2, VAPB: vesicle-associated membrane protein-associated protein B, and VCP: valosin-containing protein.

Endocytosis and vesicle trafficking from one cellular compartment to another are regulated by small Rab GTPases [186]. For example, Rab5 is associated with the formation of early and late endosomes, while Rabs 11, 35, and 27 have direct roles in exosome biogenesis and secretion. Alsin is an ALS-associated protein [187] and is a guanine nucleotide-exchange factor involved in endosome motility and fusion with the lysosome [171]. Consequently, an absence of alsin expression in hippocampal neurons leads to an accumulation of Rab5-positive vesicles and an enhanced lysosome-mediated degradation, suggesting an enhanced degradation of endosomal vesicles [171], thus probably affecting the production and secretion of exosomal vesicles.

The C9orf72 protein structure presents some similarities with the Differentially Expressed Normal versus Neoplastic (DENN) guanine nucleotide exchange factor and, thus, may activate Rab proteins [101] such as RAB8A and RAB39B [188]; Rab1a [34]; or Rabs 1, 7, and 11 [189], which are associated with autophagy and vesicle-trafficking processes [190–192], as well as exosome biogenesis and secretion [186]. In *C9orf72* knockdown cell lines [189], a *C9orf72* knockout murine model [193], and in ALS patient fibroblasts and iPSC-derived motor neurons [194], trans-Golgi and endosomal trafficking were reduced, a defective autophagy pathway was observed [34,194], and exosomal secretion was affected [194].

Vesicle-associated membrane protein-associated protein B (VAPB) is involved in vesicle trafficking between the endoplasmic reticulum and the Golgi apparatus [195,196]. Interestingly, VAPB has been described to interact with RAB7 and colocalize with CD63 [197], two proteins involved in late-endosome formation and exosome biogenesis [186]. However, the impact of ALS-associated *VAPB* mutations on exosome biogenesis and secretion still needs to be investigated.

Multivesicular body formation is at a crossroads between the autophagy (Figure 2) and secretion pathways, and an autophagic failure may lead to cell secretion [198]. The *VCP* gene is associated with ALS [199] and encodes valosin-containing protein (VCP), a ubiquitous AAA+ ATPase that interacts with clathrin to form early endosomes and also participates in the autophagy pathways [200]. In this context, a mutation in *VCP* may affect the endosomal pathway, and one can hypothesize that it has an impact on the formation and secretion of exosomes. Mutations in other genes associated with ALS, such as those encoding polyphosphoinositide 5-phosphatase (FIG4) [201,202] or spatacsin (SPG11) [203,204], are associated with the blockade of lysosomal clearance (see Section 7), which could potentially lead to vesicle secretion [198]. However, further investigations related to exosome pathways in ALS in vivo are needed.

#### **7. Impaired Protein Homeostasis**

Protein aggregates positive for TDP-43 [36,205], neurofilament [41], FUS [87], or SOD1 [206] are observed in the vast majority of ALS patients, with TDP-43 being present in as many as 98% of sporadic and familial cases [207], meaning that the presence of such aggregates is widely regarded as a hallmark feature of ALS pathology. These deposits can occur in the cytoplasm of neurons [208] and skeletal muscle [99,209], and their presence is highly suggestive of an imbalance between protein synthesis and degradation pathways (Figure 2).

#### *7.1. Proteasome and Autophagic Degradation Pathways*

In the late 1960s and early 1970s, the presence of protein inclusions in the anterior horn cells of sporadic and familial ALS patients was described [210–212]. Later, these inclusions were found to be ubiquitin positive [35], and SOD1 was the first ALS-associated protein found to be immunoreactive within the inclusions of familial patients [213]. Subsequently, ubiquitinated inclusions have often been found to be immunoreactive for the ubiquitin-binding protein p62 [214], and up to 98% of sALS and fALS cases show inclusions that are TDP-43-positive [215], with the exception being *SOD1* [216] and *FUS* [216] patients, who do not demonstrate TDP-43 inclusions but do demonstrate SOD1 and FUS immunoreactive inclusions. Other ALS proteins that have been implicated in the formation of cytoplasmic inclusions include optineurin (OPTN) [107], ubiquilin 2 (UBQLN2) [217], dynactin 1 (DCTN1) [218], valosin-containing protein (VCP) [219], and matrin 3 (MATR3) [220]. Studying the structure of the main proteins SOD1, FUS, and TDP-43 helped to unravel potential mechanisms involved in protein misfolding and self-propagation within the cells and in surrounding cells. SOD1 is a stable homodimer, thanks to the intrasubunit disulfide bond and its ability to bind zinc and copper. However, a reducing and metal-poor intracellular environment or mutations [221–227] can abolish these features and destabilize SOD1, leading to the formation of aggregates and amyloid fibril structures [228–231] that can self-propagate in vitro [229,230]. FUS and TDP-43 proteins possess a low complexity domain that presents similarities with yeast prions [209] and can form large aggregates and amyloid fibril structures [209,229,232,233]. Interestingly, mutated forms of FUS and TDP-43 can induce the misfolding of wild type forms of FUS and TDP-43, respectively [229], and have also been shown to induce the misfolding of wild type forms of SOD1 in vitro [234]. Altogether, these studies suggest a potential mechanism for the self-propagation of misfolded proteins in vitro. Such misfolded proteins can potentially be transferred from cell to cell via secreted vesicles, thus propagating the misfolding mechanism to neighboring cells (see Section 6, [178,229,235]). The presence of these protein aggregates has been suggested to impair the proteasome and autophagic degradation pathways and could be a key mediator in ALS pathogenesis [38,236,237] (Figure 2).

Dysregulation of the Ubiquitin–Proteasome System (UPS) in ALS patients was first suspected following the identification of mutations in genes encoding ubiquilin 2 [238] and VCP [199], two proteins involved in protein clearance via the ubiquitin-proteasome pathway [239]. Mutations in *OPTN* [240] and *SQSTM1*/*P62* [241] were then identified, and following this, *SOD1* [236], *VAPB* [242], *C9orf72* [25,208], and *CCNF* (cyclin F) [243] mutations were all reported to reduce UPS activation. Ubiquitin-positive inclusions were observed in post-mortem neuronal and muscular tissues of fALS and sALS patients [35] and, more specifically, in *C9orf72* patients [244]. Similarly, SOD1 [245], FUS [87], ubiquilin 2 [246], and C9orf72-derived DPR proteins [247] can generate toxic aggregates positive for some proteasome components [248]. These ubiquitin-positive inclusions can also contain and "trap" nonmutated forms of SOD1 [48], TDP-43 [205], optineurin [107], and ubiquilin 2 [249], thus exacerbating the already disrupted cellular homeostasis in ALS.

The degradation of ubiquitinated proteins through the autophagy/lysosomal pathway occurs in four steps: (1) the initiation and extension of the bilayer vacuole into phagophores; (2) the transport of selective cargoes (including ubiquitinated proteins, dysfunctional mitochondria, and protein aggregates); (3) the maturation into autophagosomes; and (4) fusion with low pH lysosomes to form autolysosomes where degradation of the cargoes can proceed [250].

The fusion of the endosome with the lysosome for degradation is a tightly regulated event [251] involving the protein polyphosphoinositide 5-phosphatase, FIG4 [252]. Deleterious mutations in *FIG4* in ALS lead to abnormal lysosomal storage [201,202]. Spatacsin is also involved in lysosomal clearance, and the absence of *SPG11* expression impairs the lysosomal-autophagy pathway and is accompanied by an accumulation of lipid within the lysosomes [203,204]. Nonfunctional TBK1 [253] and p62 [254] inhibit the transport of targeted cargoes toward the autophagosome. Interestingly, impaired autophagosome maturation was also observed in ALS cells mutated for *FUS* [255], *VCP* [89], *CHMP2B* [184], and *OPTN* [256]. The importance of autophagy can be observed in studies that stimulate autophagic activation in the presence of ALS mutations. For instance, the stimulation of autophagy in murine and human iPSC-derived neurons expressing *TARDBP* mutations demonstrated a greater clearance of TDP-43 aggregates relative to nonstimulated cells and resulted in improved motor neuron survival [257].

*C9orf72* mutations can interfere with the autophagy pathway at several levels. When *C9orf72* expression is abolished or decreased as suggested by the haploinsufficiency hypothesis [258,259], autophagy is inhibited [34,189,193,194], leading to simultaneous increases in the number of cytoplasmic inclusions immunoreactive for ubiquitin, p62, and TDP-43 [34,101,260]. The impairment of autophagy in *C9orf72* cells may also result in the accumulation of cytotoxic DPR proteins encoded by the G4C2 HREM and, ultimately, lead to neuronal loss [261]. Similarly to cells expressing *TARDBP* mutations [257], the stimulation of autophagy abolished the accumulation of poly-DPR proteins and neuronal toxicity [261].

Altogether, these studies illustrate the importance of autophagy for the efficient clearance of misfolded and aggregated proteins and are indicative of the underlying impairments in proteostatic mechanisms that mediate ALS physiopathology (Figure 2).

#### *7.2. Endoplasmic Reticulum Stress*

During the formation of misfolded proteins, the unfolded-protein response (UPR) may be initiated to transport defective proteins to the endoplasmic reticulum (ER), where ER-resident chaperones will properly fold the protein [262]. The accumulation of misfolded proteins in the ER activates the ER stress response pathway, also known as the endoplasmic reticulum-associated protein degradation (ERAD) pathway. The ERAD pathway involves the translocation of misfolded proteins from the ER lumen to the cytosol, where they undergo ubiquitination and degradation through the ubiquitin-proteasome pathway [262]. In ALS patient cells, mutated SOD1 aggregates were observed in the ER and colocalized with UPR markers, leading to an increase in ER stress [263] by interacting with ER stress response proteins and inhibiting their function in the ERAD response [236]. The presence of poly (GA) aggregates, observed in post-mortem neuronal tissue of *C9orf72* ALS patients, can inhibit proteasome activity and induce ER stress, which can be abolished when using ER stress inhibitors such as salubrinal and tauroursodeoxycholic acid (TUDCA) [208]. Concordantly, the cerebrospinal fluid (CSF) of sporadic ALS patients displays an accumulation of ER stress markers [263], and when healthy neurons were exposed to patient CSF, the ER became fragmented and caspase-dependent apoptosis was activated, suggesting an increase in ER stress [264].

Vesicle-associated membrane protein-associated protein B (VAPB) is localized in the endoplasmic reticulum membrane and has a key role in vesicle trafficking between the endoplasmic reticulum, Golgi apparatus, and the nuclear envelope [195–197]. The VAPB<sup>P56S</sup> mutation associated with ALS leads to a misfolded protein that accumulates in the ER [265] and can cause a defect in nuclear envelope protein transport, leading to an aberrant nuclear envelope structure [266]. Interestingly, the accumulation of VAPB has also been observed in the endoplasmic reticulum of peripheral blood mononuclear cells of sporadic ALS patients [267].

Optineurin is a TBK1 partner and is involved in mitophagy (Section 3). When the association of optineurin with myosin VI is disrupted, as observed in fALS cases associated with *OPTN* mutations, optineurin is diffused in the cytosol of neuronal cells and results in ER stress and Golgi apparatus fragmentation, as well as an inhibition of the autophagy pathway ([256], Figures 1 and 2).

Altogether, these studies suggest that protein degradation could be directly and indirectly affected in ALS, causing protein aggregation that leads, in turn, to the disruption of the function of organelles such as nuclei (Section 7.2) and mitochondria (Section 3) or to the blockage of lysosomal activity that can potentially affect cell-cell communication (Section 6).

#### **8. Aberrant RNA Metabolism**

FUS [86,268] and TDP-43 [269] are RNA-binding proteins involved in multiple steps of RNA metabolism. In ALS patients, mutations in both genes give rise to the translation of proteins frequently mislocalized to the cytoplasm [87,88,270] and, subsequently, result in downstream complications that affect RNA-processing mechanisms. Dysregulated RNA metabolism is another key feature of ALS pathogenesis and includes transcription defects, alternate splicing changes, altered miRNA biogenesis, stress granule formation, and impaired RNA nucleocytoplasmic transport (Figure 3).

**Figure 3.** RNA and miRNA biogenesis defects in ALS. Many processes in RNA and miRNA pathways are disrupted in ALS patients, including transcription defects, alternate splicing events, miRNA biogenesis, and nucleus-cytosol transport impairment. RNA metabolism defects are particularly relevant in ALS pathogenesis, since TDP-43 and FUS are both well-known ALS-associated genes involved in RNA processing. Both FUS and TDP-43-mutated proteins mislocalize to the cytoplasm of ALS motor neurons, leading to a probable loss and/or toxic gain-of-function of these proteins. ANG: Angiogenin, ATXN2: Ataxin-2, C9orf72: Chromosome 9 open reading frame 72, DCTN1: Dynactin 1, eIF2α: Eukaryotic translation initiation factor 2A, ELP3: Elongator protein 3, FUS: Fused in Sarcoma, G3BP1: Ras GTPase-activating protein-binding protein 1, hnRNPA1: Heterogeneous nuclear ribonucleoprotein A1, hnRNPA2B1: Heterogeneous nuclear ribonucleoprotein A2B1, MATR3: matrin 3, NEFH: Neurofilament heavy subunit, PABP1: Polyadenylate-binding protein 1, PFN1: Profilin, SETX: Senataxin, SOD1: Superoxide dismutase 1, TDP-43: TAR DNA-binding protein 43, and TIA-1: TIA1 Cytotoxic Granule-Associated RNA-Binding Protein.

#### *8.1. RNA Splicing and Translation*

Given the large number of possible protein-protein interactions between *FUS* or *TDP-43* and their partners, it is easy to expect alterations to important RNA-processing mechanisms in ALS patients [89]. *FUS* and *TDP-43* also regulate the expression of multiple proteins involved in neuronal physiology, including components of the synaptic plasticity pathways [39,271,272] and dendritic branching processes [272–274]. In addition, the HREM in *C9orf72* generates repeat RNA and RNA foci, which repress the gene expression of RNA metabolism regulators (such as *hnRNPA3*) [275] or sequester TDP-43 [275,276] and FUS [275] proteins and, thus, indirectly inhibits the transcription of RNA metabolism-associated genes. Similarly to *C9orf72*-mediated RNA-processing defects, *FUS* mutations have been associated with major transcriptional defects [268].

Ataxin-2 is a polyglutamine (polyQ) protein that is involved in mRNA translation, and it interacts with RNA-binding proteins such as TDP-43 and FUS [277]. In ALS spinal cords, ataxin-2 exhibited significant cytoplasmic accumulation, and it enhanced the toxicity of TDP-43 in *Drosophila* via RNA binding [59].

*TDP-43*, *FUS*, *hnRNPA1*, *hnRNPA2B1*, and *MATR3* are associated with ALS and involved in pre-mRNA processing [220,278–281]. Consequently, *TDP-43* knockdown in murine tissues results in the alternate splicing dysregulation of numerous mRNA transcripts [42,282], and the loss-of-function of *FUS* also induces many splicing defects [39], suggesting important alternate splicing events in ALS patients (Figure 3). These downstream complications are not surprising given the ability of the protein FUS to sequester numerous components of the splicing process, such as key splicing factors [283], U1 snRNP, and U11/U12 snRNPs [274,284], which are involved in minor intronic splicing. Alternative splicing changes have been identified in neuronal genes involved in cytoskeleton organization, axonal growth, and guidance in *FUS*-mutated ALS patients [285,286], and, interestingly, axonopathy and axon retraction occur in the early stages of ALS [287].

ALS-linked MATR3<sup>S85C</sup> and MATR3<sup>P154S</sup> mutations were observed to affect Matrin 3 interactions with the TRanscription and EXport (TREX) protein complex, altering the global nuclear export of mRNA [288]. As a result, mRNA is sequestered within the nucleus, causing export defects of TDP-43 and FUS mRNA [288], which may affect mRNA splicing directly [289] and indirectly [278] (Figure 3). Consequently, as observed in the *MATR3*<sup>S85C</sup> murine model, dysfunctional MATR3 may lead to astrocyte and microglia activation and result in spinal motor neuron degeneration [290].

The *ELP3* gene encodes for elongator protein 3, a histone acetyltransferase subunit of the RNA polymerase II elongator complex responsible for RNA translation (Figure 3). Mutations in the *ELP3* gene are associated with ALS [291,292] and result in the shortening and abnormal branching of motor neurons, as observed in *ELP3* knockdown in zebrafish embryos [291], and altered tRNA modification, triggering proteome impairment and the subsequent aggregation of susceptible proteins [292].

Angiogenin, encoded by the hypoxia-inducible gene *ANG*, is a member of the pancreatic ribonuclease superfamily [293] and, in addition to its role in angiogenesis, is involved in ribosomal biogenesis [294,295]. Defects in this protein are associated with the impairment of its nuclear localization and diminished ribonucleolytic activity [295] (Figure 3), both of which are essential for normal ANG functioning and motor neuron viability.

Together, these findings suggest that RNA processing is a key pathway affected in ALS, either due to mutations directly affecting proteins involved in RNA processing or as a consequence of protein aggregations.

#### *8.2. RNA Foci*

Sense and antisense RNA generated from the bidirectional transcription of G4C2 repeats have been proposed to induce a toxic gain-of-function in ALS *C9orf72* patient cells by forming RNA foci, found widely throughout the central nervous system [296,297], that may sequester RNA-binding proteins, thus disrupting RNA metabolism and processing in cells [101] (Figure 3). Both sense and antisense RNA foci are frequently observed in nucleoli, with antisense RNA foci being denser [296]. In addition, the antisense RNA foci correlate with TDP-43 aggregation in the cytosol of *C9orf72* motor neurons [296,297]. An in situ hybridization of post-mortem *C9orf72* ALS tissue revealed that 78.7% of the neurons and 24.9% of the glial cells in the motor brain and spinal cord regions were positive for antisense RNA foci [297]. Interestingly, extra-motor brain regions also show a high percentage of cells positive for antisense RNA foci, with 89.4% of neurons and 46.1% of glia being positive [297].

#### *8.3. Epigenetic Modulation*

Epigenetic mechanisms such as microRNA regulation maintain cell type and tissue identity and may be involved in the onset and progression of neurodegenerative diseases, including ALS. The decreased expression of miRNAs, including let-7e, miR-148b-5p, miR-577, miR-133b, and miR-140-3p, was observed in post-mortem spinal cords of sporadic ALS patients [298], suggestive of impairment in the genes and pathways associated with miRNA biogenesis, neuroinflammation, and apoptosis.

Interestingly, the class II ribonuclease, Drosha, interacts with TDP-43, FUS, and *C9orf72*-mediated DPRs [299–301], while the Dicer enzyme interacts with TDP-43 protein [301] and FUS can interact with pri-miRNA [302] (Figure 3). Consequently, mutated *TDP-43* may impair the post-transcriptional regulation of miRNAs and lead to an altered expression of miR-132-3p and miR-132-5p (involved in the regulation of neuronal outgrowth [301]), miR-143-3p and miR-143-5p (involved in myoblast cell differentiation [303]), miR-558-3p (involved in neurofilament stability [304]), and miR-574-3p (associated with stroke [305]) [301]. Similarly, the downregulation of FUS in a neuroblastoma cell line had a considerable impact on the biogenesis of miRNAs, with an altered expression of miR-9, miR-125b, and miR-132 implicated in neuronal differentiation, activity, and function [302], while mutated *FUS* affected the expression levels of miR-125 and miR-192 [302], which are involved in early neural conversion [306] or senescence [307].

Altogether, these findings are consistent with defective miRNA processing in ALS patients, which may affect downstream pathways with an impact on motor neuron survival.

#### *8.4. Stress Granules and Nucleocytoplasmic Transport*

In response to stressful conditions, RNA granules, also known as stress granules, are generated and can recruit FUS and TDP-43 [41,308]. Mutations in *FUS* and *TDP-43* can increase the persistence of stress granules in the cytoplasm, resulting in a possible toxic gain-of-function [237] by inhibiting mRNA translation and, thus, contributing to the progression of ALS pathology (Figure 3). The heterogeneous nuclear ribonucleoprotein particle proteins hnRNPA1 and hnRNPA2B1 are RNA-binding proteins and binding partners of TDP-43 and are involved in RNA processing, including miRNA maturation, the nucleocytoplasmic transport of mRNA, and RNA metabolism [300,309]. Mutations in the prion-like domains of hnRNPA2B1 and hnRNPA1 increase fibril formation and aggregation potential, as well as their hyperassembly into stress granules [310,311]. Stress granules are then targeted to the lysosome by the autophagic machinery involving VCP. Indeed, the pharmacological inhibition or RNAi knockdown of VCP is accompanied by reduced stress granule clearance, while HeLa cells expressing VCP<sup>A232E</sup> and VCP<sup>R155H</sup> mutations showed a constitutive appearance and accumulation of stress granules containing TDP-43 [312]. Concordantly, the ALS-VCP mutation is accompanied by an increase in stress granules [313].

In *C9orf72* patients, stress granules are also involved in the sequestration of proteins required for effective nucleocytoplasmic transport, such as RanGAP [40], or importing and exporting proteins [40,41]. The impairment of the nucleocytoplasmic transport of molecules in *C9orf72* ALS cells is controversial. Indeed, while some studies observed that newly formed DPRs such as poly-PR can bind to nuclear pore transporters, thereby impairing the subsequent translocation of molecules [314], other studies did not observe any disruption in the nucleocytoplasmic transport with poly-GR or poly-PR [314]. However, with the expression of poly-GA, defects were observed both in import and in export in an SH-SY5Y cell line and in iPSC-derived motor neurons, respectively [101].

#### **9. Concluding Remarks**

Over 150 years have passed since ALS was first reported by Charcot, and still, the etiology of the disease remains elusive. Although research is progressing and genetic studies continue to identify novel gene mutations in familial cases of ALS [315], many questions remain surrounding the pathological mechanisms associated with already established mutations, their roles in the disease phenotype, and the as-yet-undiscovered mechanisms that underlie sporadic onset. The most investigated mechanisms revolve around neurocentric deficits in dysfunctional mitochondria and oxidative stress, axonal transport, glutamate excitotoxicity, protein homeostasis, and RNA processing (Figure 4).

**Figure 4.** Summary of the different molecular and cellular mechanisms involved in ALS pathogenesis.

Among the most studied and well-established pathways are oxidative stress, mitochondrial dysfunction, axonal transport, glutamate excitotoxicity, endosomal and vesicle secretions, protein homeostasis, and RNA metabolism. One pathway may lead to another, exacerbating the disruption of cellular homeostasis. The disruption of these pathways can lead to microglia activation, neuroinflammation, astrocytosis, and, ultimately, to motor neuron death and muscle denervation.

By detailing, as has been done in this review, the molecular events of the various pathways that are implicated in ALS, it becomes clear that these pathways can be linked to each other, in some cases with one leading to another. For example, disrupted axonal transport can lead to an accumulation of nonfunctional mitochondria, while ATP deficiency and increased oxidative stress may damage proteins and DNA, which, in turn, could exacerbate the disruption of cellular homeostasis, leading to motor neuron death (Figure 4). These pathways are disrupted not only in motor neurons [24] but, also, in astrocytes [316,317], microglia [318,319], peripheral blood cells [43,320], and muscle [37,321–323], suggesting multisystemic [6] involvement in motor neuron death. Thus, by considering ALS from the perspective of shared molecular pathways [6,324], a cohesive understanding may yet emerge of the cellular mechanisms driving this pathology. It may be that different molecular pathways correspond to sub-strata of patients, such as among those with known genetic forms of ALS, as suggested in [6]. However, the identification of these strata may prove to be extremely challenging in non-monogenic forms of the disease [324].

**Author Contributions:** Writing—original draft preparation, L.L.G., E.A., O.C., U.G.V., and S.D.; writing—review and editing, W.J.D. and S.D.; and funding acquisition: S.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was financed by the European Union Regional Development Fund (ERDF) EU Sustainable Competitiveness Programme for N. Ireland, Northern Ireland Public Health Agency (HSC R&D) & Ulster University (PI: A Bjourson). LLG was a recipient of the ArSLA PhD fellowship, OC was a recipient of the PhD DELL fellowship, EA was a recipient of the Vice-Chancellor's Research Scholarships, and UGV was a recipient of the TargetALS fellowship (PI: S Duguez).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Review*
