*Article* **Assessing Blood-Based Biomarkers to Define a Therapeutic Window for Natalizumab**

**Júlia Granell-Geli 1,2, Cristina Izquierdo-Gracia 3, Ares Sellés-Rius 1, Aina Teniente-Serra 1,2, Silvia Presas-Rodríguez 3, María José Mansilla 1,2, Luis Brieva 4, Javier Sotoca 5, María Alba Mañé-Martínez 6, Ester Moral 7, Irene Bragado 3, Susan Goelz 8, Eva Martínez-Cáceres 1,2,\*,† and Cristina Ramo-Tello 3,\*,†**


**Abstract:** Natalizumab is a monoclonal antibody that binds CD49d. Although it is one of the most effective treatments for Relapsing-Remitting Multiple Sclerosis (RRMS), its dosing regimen has not been optimized for safety and efficacy in individual patients. We aimed to identify biomarkers to monitor Natalizumab treatment and to establish a personalized dose, using an ongoing longitudinal study of 29 RRMS patients receiving Natalizumab at a standard interval dose (SD) of 300 mg/4 wks or an extended interval dose (EID) of 300 mg/6 wks. Blood samples were analyzed by flow cytometry to determine CD49d saturation and expression in several T and B lymphocyte subpopulations. Each patient was analyzed at two timepoints separated by 3 Natalizumab administrations. Serum Natalizumab and sVCAM-1 levels were also measured by ELISA. To determine the reproducibility of the markers, the two timepoints were compared, and no significant differences were observed in either CD49d expression or saturation; SD patients had higher saturation levels (~80%) than EID patients (~60%). CD49d saturation correlated positively with Natalizumab serum levels. CD49d expression and saturation are stable parameters that could be used as biomarkers in the immunomonitoring of Natalizumab treatment. Moreover, Natalizumab and sVCAM-1 serum levels could be used to optimize an individual's dosing schedule.

**Keywords:** multiple sclerosis; natalizumab; extended interval dose; biomarker; CD49d; sVCAM-1; immunomonitoring; personalized dose

**Citation:** Granell-Geli, J.; Izquierdo-Gracia, C.; Sellés-Rius, A.; Teniente-Serra, A.; Presas-Rodríguez, S.; Mansilla, M.J.; Brieva, L.; Sotoca, J.; Mañé-Martínez, M.A.; Moral, E.; et al. Assessing Blood-Based Biomarkers to Define a Therapeutic Window for Natalizumab. *J. Pers. Med.* **2021**, *11*, 1347. https://doi.org/10.3390/jpm11121347

Academic Editor: Michael Uhlin

Received: 2 November 2021; Accepted: 3 December 2021; Published: 10 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Natalizumab (NTZ) is a humanized IgG4κ monoclonal antibody that selectively binds α4-integrin (CD49d) and acts as an allosteric antagonist, preventing leukocyte migration into the central nervous system (CNS) of patients with multiple sclerosis (MS) [1]. α4-integrin pairs with the β1 (CD29) or β7 subunit to form functional heterodimers [2]. α4β1 (VLA-4) and α4β7 are expressed on the leukocyte surface, where they bind VCAM-1 and MAdCAM-1, respectively, mediating the firm adhesion of leukocytes to endothelial cells, a necessary step for leukocyte extravasation into inflamed tissue.

The interaction of VLA-4 with VCAM-1 not only mediates leukocyte adhesion to the endothelium, enabling circulating leukocytes to transmigrate across the blood-brain barrier (BBB) [3,4], but can also increase lymphocyte activation and proliferation [5,6]. This triggers a local cascade of chemokines and cytokines that activates more lymphocytes and further promotes the adhesion and transmigration of immune cells into the inflamed tissue [7,8]. In addition, pro-inflammatory factors released in autoimmune conditions such as MS can increase the expression of VCAM-1 on the endothelial cell surface, allowing leukocytes to bind to the BBB, which in turn promotes the release of VCAM-1 in its soluble form (sVCAM-1) [9]. Serum sVCAM-1 levels could therefore be a marker both of immune cell binding to the endothelial barrier and of endothelial barrier activity.

NTZ is generally administered intravenously at 300 mg every 4 weeks to patients with relapsing-remitting MS (RRMS). Although it is one of the most effective treatments [10], its use is associated with a very severe adverse effect: the risk of developing Progressive Multifocal Leukoencephalopathy (PML) [11]. PML is an uncommon and severe opportunistic brain infection caused by reactivation of the neurotropic John Cunningham virus (JCV) as a consequence of impaired immunosurveillance [10,12]. JCV is present in ~50–70% of the population [13], as evidenced by the presence of anti-JCV antibodies in serum. The infection usually remains asymptomatic throughout life and is generally considered non-pathogenic [14,15]. However, JCV can become neurotropic and cause PML, with demyelination resulting from lytic infection of the myelin-producing oligodendrocytes. The prognosis of PML is often poor, with a high fatality rate [13]. Although PML is extremely rare in the general population (0.2 per 100,000) [13], the risk becomes significant when a patient is immunocompromised or is treated with a therapy that can inhibit CNS immunosurveillance, such as NTZ.

One approach to reducing the risk of PML is to define the lowest efficacious dose for each patient, on the premise that this would also be the safest dose. The standard dose of 300 mg every 4 weeks maintains maximal VLA-4 saturation, defined as >80% saturation of these receptors on PBMCs [16]. Extended interval dosing of NTZ attempts to define the level of VLA-4 saturation that maintains the clinical effectiveness of the drug while allowing a slight increase in CNS immunosurveillance, in order to reduce PML risk [17]. Patients positive for anti-JCV antibodies who receive NTZ at an extended interval dose (EID) have been reported to appear to have a lower risk of PML than those receiving the standard dose (SD) [18].
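As a rough illustration of the difference in cumulative exposure between the two schedules, the short Python sketch below computes idealized yearly NTZ exposure. It assumes a 52-week year and perfectly regular 300 mg infusions every 4 weeks (SD) or every 6 weeks (EID); the function names are illustrative only, and real-world EID intervals vary between patients.

```python
# Idealized annual natalizumab exposure under SD and EID.
# Assumptions (not from the study): 52-week year, no missed or delayed infusions.

DOSE_MG = 300.0  # fixed intravenous dose per infusion


def infusions_per_year(interval_weeks: float, weeks_per_year: float = 52.0) -> float:
    """Number of infusions per year for a given dosing interval in weeks."""
    return weeks_per_year / interval_weeks


def annual_dose_mg(interval_weeks: float, dose_mg: float = DOSE_MG) -> float:
    """Total milligrams of NTZ administered per year."""
    return infusions_per_year(interval_weeks) * dose_mg


sd = annual_dose_mg(4)   # standard dose: 300 mg every 4 weeks
eid = annual_dose_mg(6)  # extended interval dose: 300 mg every 6 weeks
print(f"SD: {sd:.0f} mg/year, EID: {eid:.0f} mg/year, EID/SD = {eid / sd:.2f}")
```

Under these assumptions EID delivers two-thirds of the SD exposure (2600 vs. 3900 mg/year, or about 8.7 vs. 13 infusions).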

In this study, we aimed to identify biomarkers to monitor NTZ treatment and to establish a personalized dose in NTZ-treated patients. Unlike previous studies, which used total PBMCs, we validated CD49d saturation and expression specifically on T cells (CD4+ and CD8+) and B cells, using two different protocols, as biomarkers to monitor and personalize treatment. We also explored serum sVCAM-1 as a biomarker to monitor MS disease activity, and we studied the correlation between CD49d saturation and serum NTZ levels.

#### **2. Materials and Methods**

#### *2.1. Study Design*

This is a multicenter, prospective, open-label pilot study in RRMS patients treated with NTZ, performed in the Multiple Sclerosis Units of the Hospital Germans Trias i Pujol (Badalona), the Hospital Mútua de Terrassa (Terrassa), the Hospital Arnau de Vilanova (Lleida), the Hospital de Sant Joan Despí Moisès Broggi, the Hospital Joan XXIII (Tarragona), and the Hospital de Mataró.

Expanded Disability Status Scale (EDSS) scores and annualized relapse rates (ARR) were obtained during clinical visits. Patients aged 18 years or older receiving NTZ were included in the study and classified into 2 groups. The first group comprised clinically or radiologically active patients receiving intravenous NTZ at SD (300 mg/4 wks) who had received at least 13 uninterrupted doses. The second group comprised patients who had received NTZ at EID (300 mg/6 wks) for at least 6 months and who were either clinically or radiologically active (Active) or clinically and radiologically stable (Inactive). Clinically active patients were defined as those who had a relapse at any point during Natalizumab treatment, and radiologically active patients as those with at least two new lesions on T2-weighted brain MRI sequences or one new gadolinium-enhancing lesion at any point during treatment, not necessarily during the study period.

Each patient's therapeutic strategy was assigned before, and independently of, their participation in the study. In our routine clinical practice, all patients started Natalizumab at SD and, after 13 infusions, were offered a switch to EID if they had no clinical or radiological activity; patients with clinical or radiological activity remained on the SD schedule. Blood samples were obtained before NTZ infusion at a first timepoint (V1) and at a second timepoint after 3 NTZ administrations (V2). A total of 30 mL of peripheral blood was drawn by venipuncture (10 mL in a serum-separator tube and 20 mL of whole blood in EDTA tubes). All patients gave written informed consent to participate in the study, and approval was obtained from the corresponding local Ethics Committees.

Patients were excluded from the study if they planned to discontinue NTZ treatment during the study period, were unable to comply with the study procedures, or had suffered a relapse, or an infection requiring more than symptomatic treatment, during the 30 days prior to the baseline visit.

#### *2.2. Flow Cytometry*

Whole blood samples were collected in EDTA tubes, kept at room temperature, and processed within 24 h. Several parameters were analyzed in peripheral blood by multiparametric flow cytometry, using two different protocols performed in parallel.

#### 2.2.1. Whole Blood Analysis

Quantification of CD49d and bound NTZ molecules was performed by Quantitative Flow Cytometry on T (CD4+ and CD8+) and B (CD19+) lymphocytes following a protocol established in our laboratory [19]. Before sample acquisition, the flow cytometer was tracked and calibrated using Rainbow 6 Peak calibration particles and QuantiBRITE phycoerythrin (PE) beads (BD Biosciences, Franklin Lakes, NJ, USA). Briefly, 5 mL of peripheral blood was lysed with a non-fixing ammonium chloride-based lysing reagent (FACSLysing Solution®, BD) for 10 min. A total of 100,000 cells per tube were incubated once or twice (depending on the labelling) for 20 min at room temperature with pre-titrated amounts of the monoclonal antibodies anti-CD3 V450 (clone UCHT1, BD), anti-CD4 FITC (clone SK3, BD), anti-CD8 APC-H7 (clone SK1, BD), anti-CD19 PerCP-Cy5.5 (clone SJ25C1, BD), and either anti-CD49d PE (clone 9F10, BD), to measure total CD49d, or huIgG4 Fc PE (clone HP6025, Southern Biotech), to measure bound NTZ. After two washes with PBS, lymphocytes were acquired on a FACSCanto II flow cytometer (BD Biosciences), and all samples were analyzed with FACSDiva software (BD Biosciences) (see analysis in Figure A1). In parallel, a Fluorescence Minus One (FMO) control was included for each sample to establish the cut-off between negative and positive populations for the huIgG4 and CD49d fluorescence signals. huIgG4 and CD49d surface molecules were quantified according to the QuantiBRITE manufacturer's instructions. The number of molecules per cell was obtained by linear regression, and the CD49d receptor occupancy (RO) was calculated as the percentage of CD49d bound by NTZ with the following formula: RO (%) = [(bound NTZ molecules)/(total CD49d molecules)] × 100.

#### 2.2.2. PBMCs Analysis

A total of 15 mL of peripheral blood was diluted with PBS, and PBMCs were isolated by Ficoll-Paque Plus (density 1.077 g/mL; GE Healthcare). After 2 washes, PBMCs were distributed into the wells of a 96-well plate (100,000 cells/well). Cells were first incubated with an Fc block for 20 min at 4 °C and washed. They were then incubated for 30 min at 4 °C with the corresponding amounts of the monoclonal antibodies anti-IgD FITC (clone IA6-2, BD), anti-CD45RA PerCP-Cy5.5 (clone HI100, Biolegend), anti-CD197 (CCR7) BV421 (clone G043H7, Biolegend), anti-CD19 BV510 (clone HIB19, Biolegend), anti-CD49d BV711 (clone 9F10, Biolegend), anti-CD3 BV605 (clone SK7, BD), NTZ-AF647 APC (Biogen), anti-Integrin β1 (CD29) Alexa700 (clone TS2/16, Biolegend), anti-CD8 APC-H7 (clone SK1, BD), anti-Integrin β7 PE (clone FIB504, Biolegend), anti-CD27 PE-CF594 (clone M-T271, BD), and anti-CD4 PE-Cy7 (clone OKT4, Biolegend). After two washes with PBSA, lymphocytes were acquired on an LSRFortessa flow cytometer (BD Biosciences), and all samples were analyzed with FlowJo software (BD Biosciences) (see analysis in Figure A2). UltraComp eBeads™ Compensation Beads (ThermoFisher, Waltham, MA, USA) were used to compensate each individual fluorochrome. An FMO control was included for each sample to establish the cut-off between negative and positive populations for the CD49d, NTZ-AF647, CD29, and β7-integrin fluorescence signals.
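The FMO-based definition of positive cells described above can be sketched in Python as follows. The article does not state how the FMO cut-off was placed (in practice it is usually drawn as a gate in the analysis software), so taking a high percentile (here the 99th) of the FMO control's signal is an assumption for illustration. The function names and fluorescence values are hypothetical.

```python
from statistics import quantiles


def fmo_cutoff(fmo_values, percentile=99):
    """Fluorescence threshold taken as a high percentile of the FMO control.

    quantiles(..., n=100) returns the 1st to 99th percentiles,
    so `percentile` must be between 1 and 99.
    """
    return quantiles(fmo_values, n=100)[percentile - 1]


def percent_positive(sample_values, cutoff):
    """Percentage of events in the parent gate whose signal exceeds the cut-off."""
    positives = sum(1 for v in sample_values if v > cutoff)
    return 100.0 * positives / len(sample_values)


# Synthetic fluorescence values for one marker within one parent gate.
fmo = [float(i) for i in range(100)]     # FMO control: marker antibody omitted
stained = [200.0] * 30 + [10.0] * 70     # fully stained sample
cut = fmo_cutoff(fmo)
print(f"cut-off = {cut:.2f}, positive = {percent_positive(stained, cut):.1f}%")
```

Because the threshold comes from a control that lacks only the marker of interest, spillover from the other fluorochromes in the panel is already reflected in the cut-off.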

The acquired and analyzed subpopulations were CD4+CD27+, CD4+CCR7+CD45RA+ (Naive), CD4+CCR7+CD45RA− (Central Memory, CM), CD4+CCR7−CD45RA− (Effector Memory, EM), CD4+CCR7−CD45RA+ (Effector), CD8+CD27+, CD8+CCR7+CD45RA+ (Naive), CD8+CCR7+CD45RA− (CM), CD8+CCR7−CD45RA− (EM), CD8+CCR7−CD45RA+ (Effector), CD19+CD27−IgD+ (Naive), CD19+CD27+IgD− (Switched), CD19+CD27+IgD+ (Non-switched), and CD19+CD27−IgD− (Double Negative, DN).
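The subset definitions listed above amount to two lookup tables, one for T cells (CCR7 and CD45RA, the CD45 isoform stained in this panel) and one for B cells (CD27 and IgD). The Python sketch below encodes them; it assumes each cell has already been called positive or negative for each marker, and the function and table names are hypothetical. The CD4+CD27+ and CD8+CD27+ populations are single-marker gates and are not included.

```python
# Subset definitions from the PBMC panel, keyed by marker positivity.

T_SUBSETS = {
    # (CCR7+, CD45RA+) -> subset, applied within CD4+ or CD8+ T cells
    (True, True): "Naive",
    (True, False): "Central Memory (CM)",
    (False, False): "Effector Memory (EM)",
    (False, True): "Effector",
}

B_SUBSETS = {
    # (CD27+, IgD+) -> subset, applied within CD19+ B cells
    (False, True): "Naive",
    (True, False): "Switched",
    (True, True): "Non-switched",
    (False, False): "Double Negative (DN)",
}


def classify(lineage, **markers):
    """Return the subset of one cell from its lineage and boolean marker calls."""
    if lineage in ("CD4", "CD8"):
        return f"{lineage}+ " + T_SUBSETS[(markers["ccr7"], markers["cd45ra"])]
    if lineage == "CD19":
        return "CD19+ " + B_SUBSETS[(markers["cd27"], markers["igd"])]
    raise ValueError(f"unknown lineage: {lineage}")


print(classify("CD8", ccr7=False, cd45ra=False))
print(classify("CD19", cd27=False, igd=False))
```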

#### *2.3. Serum Analysis*

From each 10 mL serum-separator tube, 4 mL of serum was frozen at −80 °C for the determination of NTZ and sVCAM-1 levels at both the first and second extractions.

NTZ was quantified by ELISA using a mouse anti-human IgG4 detection antibody (Fc-HRP, Southern Biotech). Briefly, the coating material (12C4, Tysabri anti-idiotype) was diluted from 2 mg/mL to 1.0 μg/mL in PBS; 100 μL of this coating solution was added to each well and incubated overnight at 2 to 8 °C with shaking at 400 rpm. After washing, 300 μL of Blocking Buffer (Thermo Scientific) was added to each well, and the plate was incubated for 2 h at ambient room temperature (ART) with shaking at 400 rpm. Controls and samples were thawed and diluted at least 1/50. After washing, 100 μL of diluted controls and samples were added and incubated for 1 h at ART on a plate shaker set to 400 rpm. The plate was washed 3 times and dried, and 100 μL of detection solution (1/20,000 in casein) was added to each well and incubated for 30 min at ART on a plate shaker set to 400 rpm. Finally, the plate was washed 3 times and dried, and 100 μL of TMB Substrate (Thermo Scientific) was added to each well and incubated at ART for approximately 5 min. The substrate reaction was stopped by adding 100 μL of Stop Solution (1N H2SO4, Fisher) to each well, and absorbance was read at 450 nm on a microplate reader within approximately 15 min of stopping the reaction. The standard curve was prepared in assay buffer (2% human serum in casein).
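Converting absorbance readings into serum NTZ concentrations requires fitting the standard curve and back-calculating each diluted sample. The article does not state which curve model was used; the Python sketch below assumes a four-parameter logistic (4PL) model, a common choice for ELISA standard curves, and shows only the back-calculation with already-fitted parameters (the fitting step is not shown). The parameter values and function names are hypothetical, and the dilution factor must be the one actually applied to each sample (at least 50 here).

```python
def four_pl(conc, a, b, c, d):
    """Four-parameter logistic curve: absorbance as a function of concentration.

    a: response at zero concentration, d: response at saturating concentration,
    c: inflection point (half-maximal concentration), b: slope factor.
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def back_calculate(od, a, b, c, d):
    """Invert the 4PL curve to get the in-well concentration for one absorbance."""
    if not min(a, d) < od < max(a, d):
        raise ValueError("absorbance outside the range covered by the standard curve")
    return c * ((a - d) / (od - d) - 1.0) ** (1.0 / b)


def serum_concentration(od, params, dilution_factor):
    """Serum NTZ = in-well concentration x sample dilution factor (50 for 1/50)."""
    return back_calculate(od, *params) * dilution_factor


# Hypothetical fitted parameters (a, b, c, d), with c in ug/mL; not from the study.
params = (0.05, 1.2, 0.5, 3.0)
od_450 = four_pl(0.2, *params)  # absorbance that a 0.2 ug/mL well would give
print(round(serum_concentration(od_450, params, dilution_factor=50), 3))
```

Readings outside the range of the standards raise an error rather than being extrapolated; such samples would normally be re-run at a different dilution.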

Soluble human VCAM-1 was measured with the DuoSet® ELISA Development System, following the manufacturer's instructions.

#### *2.4. Statistical Analysis*

To test the stability of these parameters over time, the first and second extractions were compared separately for each group (SD and EID) using two-tailed paired *t*-tests. Values were compared between treatment groups at both the first and second extractions using two-tailed unpaired *t*-tests. Two-tailed *p* values < 0.05 were considered statistically significant.
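The two comparisons described above can be reproduced outside GraphPad Prism. The Python sketch below uses only the standard library: the paired test compares V1 with V2 within one group, and the unpaired test assumes equal variances (Student's *t*-test, Prism's default); the article does not state whether Welch's correction was applied, so this is an assumption. The two-tailed *p* value is computed exactly for integer degrees of freedom using the closed-form series for Student's *t* distribution given in Abramowitz and Stegun (chapter 26). Function names and the toy numbers are illustrative only.

```python
import math
from statistics import mean, stdev, variance


def t_two_tailed_p(t, df):
    """Two-tailed p-value of Student's t for an integer number of degrees of freedom.

    Evaluates A(t|df) = P(|T| <= |t|) with the finite series in cos(theta),
    where theta = atan(|t| / sqrt(df)), and returns 1 - A.
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    c2 = c * c
    if df == 1:
        a = 2.0 * theta / math.pi
    elif df % 2 == 0:
        term = total = 1.0
        for k in range(1, (df - 2) // 2 + 1):
            term *= (2 * k - 1) / (2 * k) * c2
            total += term
        a = s * total
    else:
        term = total = c
        for k in range(1, (df - 3) // 2 + 1):
            term *= (2 * k) / (2 * k + 1) * c2
            total += term
        a = 2.0 / math.pi * (theta + s * total)
    return 1.0 - a


def paired_t_test(v1, v2):
    """Paired t-test, e.g. V1 vs. V2 values of one marker within one dosing group."""
    diffs = [x - y for x, y in zip(v1, v2)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1, t_two_tailed_p(t, n - 1)


def unpaired_t_test(group1, group2):
    """Unpaired Student's t-test with pooled variance, e.g. SD vs. EID at one visit."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    t = (mean(group1) - mean(group2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, t_two_tailed_p(t, df)


# Toy example (not study data): one marker at V1 and V2 in four patients.
t, df, p = paired_t_test([1, 2, 3, 4], [0, 1, 1, 2])
print(f"t = {t:.3f}, df = {df}, two-tailed p = {p:.4f}")
```

For the toy values the paired test gives *t* ≈ 5.196 with 3 degrees of freedom and *p* ≈ 0.014, which would be reported as significant at the 0.05 level.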

Correlation between CD49d saturation levels and NTZ serum levels was tested for the first extraction (*n* = 20) and the second extraction (*n* = 14). Two-tailed *p* values < 0.05 were considered statistically significant.

All statistical analyses were performed using GraphPad Prism software (version 8.4.0; GraphPad Software, La Jolla, CA, USA).

#### **3. Results**

A total of 29 RRMS patients (72.4% female) receiving NTZ, with a mean age of 44.4 ± 10.5 years and a body mass index of 23.3 ± 3.6 kg/m², participated in the study and were classified into 2 groups. The first group included 8 active patients (27.6%) on SD. The second group included 21 patients on EID, of whom 19 (65.5%) were inactive and 2 (6.9%) were active. The demographic and clinical features of the patients are shown in Table 1.


**Table 1.** Demographic and clinical features of the 29 RRMS patients of the study.

BMI: body mass index; EDSS: Expanded Disability Status Scale; EID: extended interval dose; F: female; JCV: John Cunningham virus; MRI: magnetic resonance imaging; NTZ: natalizumab; SD: standard deviation; y: years.

#### *3.1. CD49d Is a Good Biomarker to Monitor Natalizumab Treatment*

The aim of this analysis was to assess if CD49d expression could be used as a putative biomarker to monitor the efficacy of NTZ treatment in RRMS patients.

First, we checked whether the expression levels were stable over time by comparing the two timepoints (V1 and V2) for both SD and EID patients in all T and B cell subpopulations. CD49d surface molecules and bound NTZ (Figure 1a–c) as well as CD49d saturation (Figure 1d) were determined by Quantitative Flow Cytometry in whole blood in 29 patients receiving NTZ therapy (SD, *n* = 8; EID, *n* = 21). None of the lymphocyte subpopulations showed significant differences between V1 and V2 for any of these biomarkers except for the bound NTZ levels in both CD4+ and CD8+ T cell subpopulations (Figure 1b) and the CD49d saturation in CD4<sup>+</sup> cells in patients receiving NTZ in SD (Figure 1d).

In parallel with the measurement of CD49d saturation, an alternative flow cytometry panel was applied to PBMCs to assess the percentage of CD49d-positive cells. CD4+ Effector cells were excluded from this analysis because there were too few cells to define the positive population for each marker and to draw reliable conclusions about their expression. No significant differences in CD49d expression were observed between V1 and V2 for any of the studied lymphocyte subsets (Figure 2).

**Figure 1.** Comparison between the first extraction (V1) and the second extraction (V2) within CD4<sup>+</sup> and CD8+ T lymphocytes, and CD19<sup>+</sup> B lymphocytes, for CD49d surface molecule levels and saturation percentage in RRMS patients under Natalizumab treatment in SD or EID. Mean of (**a**) CD49d molecules/cell surface, (**b**) bound NTZ/cell surface, (**c**) free CD49d molecules/cell surface, and (**d**) CD49d saturation percentage in CD4<sup>+</sup> and CD8+ T lymphocytes, and CD19+ B lymphocytes in the SD (*n* = 8) and EID (*n* = 21) groups. Each dot represents the number of PE molecules per cell for each patient at either V1 or V2, translated into the levels of the corresponding antibodies (anti-CD49d PE and huIgG4 Fc PE). EID, extended interval dosing; NTZ, natalizumab; SD, standard dosing. ns: *p* > 0.05, \* *p* < 0.05, \*\* *p* < 0.01.

**Figure 2.** Comparison of first (V1) and second (V2) extractions within CD4<sup>+</sup> and CD8+ T lymphocyte, and CD19+ B lymphocyte subpopulations in RRMS patients under Natalizumab treatment in SD or EID. Percentage of CD49d expression in several CD4<sup>+</sup> and CD8+ T lymphocyte, and CD19+ B lymphocyte subsets. Each dot represents the percentage of marker-positive cells relative to the parent population (CD4<sup>+</sup> or CD8+ T lymphocytes, or CD19<sup>+</sup> B lymphocytes) in the SD (*n* = 8) and EID (*n* = 21) groups. CM, Central Memory; DN, Double Negative; EID, extended interval dosing; EM, Effector Memory; NS, Non-Switched; NTZ, natalizumab; SD, standard dosing. ns: *p* > 0.05.

#### *3.2. Extended Interval Dosing Reduces CD49d Saturation*

Once the stability of CD49d saturation and expression had been assessed, the SD and EID groups were compared to test whether differential CD49d values could be defined. The mean (V1 and V2) number of CD49d molecules per cell surface in lymphocyte subpopulations measured in whole blood was lower in patients treated with NTZ on the SD schedule than in those on EID (CD4+ CD49d molecules/cell surface: 1334 vs. 1535; CD8<sup>+</sup> CD49d molecules/cell surface: 1191 vs. 1558; CD19+ CD49d molecules/cell surface: 1158 vs. 1475) (Figure 3a). Conversely, no significant differences in the mean number of bound NTZ molecules per cell surface in lymphocyte subpopulations were observed between the SD and EID groups (CD4+ NTZ molecules/cell surface: 956.9 vs. 970.9; CD8<sup>+</sup> NTZ molecules/cell surface: 819.6 vs. 895.7; CD19<sup>+</sup> NTZ molecules/cell surface: 848.4 vs. 841.3) (Figure 3b). As a result of the lower number of CD49d molecules in SD patients, together with similar levels of bound NTZ in both groups, the percentage of CD49d saturation was higher in SD patients than in EID patients (CD4+ CD49d% saturation: 72.31 vs. 63.82; CD8+ CD49d% saturation: 68.97 vs. 55.91; CD19<sup>+</sup> CD49d% saturation: 73.74 vs. 58.30) (Figure 3d). As a check, we also calculated the amount of free CD49d with the following formula: (*total CD49d molecules*) − (*bound NTZ molecules*). As expected, the mean number of free CD49d molecules per cell surface in lymphocyte subpopulations was lower in SD patients than in EID patients (CD4+ free CD49d molecules/cell surface: 376.6 vs. 564.0; CD8<sup>+</sup> free CD49d molecules/cell surface: 371.8 vs. 697.9; CD19<sup>+</sup> free CD49d molecules/cell surface: 309.3 vs. 634.0) (Figure 3c). The data shown here correspond to the first extraction (V1), but similar results were obtained for the second extraction (V2) (Figure A3).
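The per-patient quantities above can be sketched in Python as follows. Free CD49d is a difference, so its group mean equals the difference between the mean total CD49d and the mean bound NTZ when both are computed over the same patients; saturation is a ratio, so the mean of per-patient saturations generally differs from the mean bound NTZ divided by the mean total CD49d. The sketch assumes group values are averaged per patient; the function names and toy values are hypothetical and are not the study's data.

```python
from statistics import mean


def per_patient_metrics(total_cd49d, bound_ntz):
    """Free CD49d (molecules/cell) and CD49d saturation (%) for one patient."""
    free = total_cd49d - bound_ntz
    saturation = bound_ntz / total_cd49d * 100.0
    return free, saturation


def group_summary(patients):
    """patients: list of (total_cd49d, bound_ntz) tuples for one group and cell subset."""
    metrics = [per_patient_metrics(total, bound) for total, bound in patients]
    mean_total = mean(total for total, _ in patients)
    mean_bound = mean(bound for _, bound in patients)
    return {
        "mean_total": mean_total,
        "mean_bound": mean_bound,
        "mean_free": mean(free for free, _ in metrics),
        "mean_saturation": mean(sat for _, sat in metrics),  # mean of ratios
        "ratio_of_means": mean_bound / mean_total * 100.0,   # for comparison
    }


# Toy example with two patients (not study data).
print(group_summary([(1000, 800), (2000, 1000)]))
```

In the toy example the mean free CD49d (600) equals 1500 − 900, whereas the mean per-patient saturation (65%) differs from the ratio of the means (60%).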

**Figure 3.** CD49d surface molecule levels and saturation percentage in CD4+ and CD8+ T lymphocytes, and CD19<sup>+</sup> B lymphocytes in RRMS patients under Natalizumab treatment in SD or EID. Mean of (**a**) CD49d molecules/cell surface, (**b**) bound NTZ/cell surface, (**c**) free CD49d molecules/cell surface, and (**d**) CD49d saturation percentage in CD4+ and CD8+ T lymphocytes, and CD19+ B lymphocytes in the SD (*n* = 8) and EID (*n* = 21) groups. Each dot represents the number of PE molecules per cell for each patient in the first extraction (V1), translated into the levels of the corresponding antibodies (anti-CD49d PE and huIgG4 Fc PE). EID, extended interval dosing; NTZ, natalizumab; SD, standard dosing. ns: *p* > 0.05, \* *p* < 0.05, \*\* *p* < 0.01.

We then checked whether CD49d expression in PBMCs differed significantly between SD and EID patients in any of the subsets, in order to identify the subpopulations in which marker expression differed according to the dosing schedule. This would allow us to establish reference values, or ranges of values, for each group, both to monitor patients and to provide a criterion for deciding whether their dosing schedule should be altered. The SD and EID groups were compared separately for the first (V1) and second (V2) extractions. Significant differences in CD49d expression between SD and EID were observed in CD8+CD27<sup>+</sup> (Figure 4a) and CD19<sup>+</sup> DN (Figure 4b) cells. Several additional subsets also showed significant differences at V1 or V2 (Figures A4 and A5), while the remaining subsets did not (Table 2).

**Figure 4.** Representation of SD and EID groups in some CD4<sup>+</sup> and CD8+ T lymphocyte, and CD19+ B lymphocyte subpopulations in RRMS patients under Natalizumab treatment in both first (V1) and second (V2) extractions. Percentage of expression of CD49d in (**a**) CD8+CD27+ T lymphocytes and (**b**) CD19<sup>+</sup> DN B lymphocytes. Each dot represents the percentage of CD49d-positive cells relative to the parent population (CD4<sup>+</sup> or CD8+ T lymphocytes, or CD19+ B lymphocytes) in the SD (*n* = 8) or EID (*n* = 21) groups. DN, Double Negative; EID, extended interval dosing; NTZ, natalizumab; SD, standard dosing. ns: *p* > 0.05, \* *p* < 0.05, \*\* *p* < 0.01.

**Table 2.** Summary table of the most relevant cell-surface and serum markers analysed.


DN: double negative; EID: extended interval dose; NTZ: natalizumab; SD: standard dose; sVCAM-1: soluble vascular cell adhesion molecule-1; V1/V2: visit 1/visit 2.

#### *3.3. CD29 and β7-Integrin Are Not Good Biomarkers to Monitor Natalizumab Treatment*

In parallel with the study of CD49d saturation and expression, other surface molecules were evaluated in PBMCs of RRMS patients as putative biomarkers to monitor NTZ treatment. The percentage of CD29- and β7-integrin-positive cells was determined in all CD4<sup>+</sup> and CD8<sup>+</sup> T and CD19+ B lymphocyte subpopulations. CD4+ Effector cells were excluded from this analysis because there were too few cells to define the positive population for each marker and to draw reliable conclusions about their expression. First, the stability of these markers over time was checked by comparing the two timepoints (V1 and V2) for both SD and EID patients in all T and B cell subsets.

No significant differences were observed between V1 and V2 in any of the studied lymphocyte subsets for β7-integrin (Figure A6). Conversely, CD29 showed significant differences between extractions (V1 vs. V2) in some lymphocyte subsets for both SD and EID groups (Figure 5).

β7-integrin was further studied by comparing the SD and EID groups for all CD4<sup>+</sup> and CD8<sup>+</sup> T and CD19+ B lymphocyte subpopulations to assess whether it could serve as a biomarker. None of the studied subsets showed significant differences between the SD and EID groups.

**Figure 5.** Comparison of first (V1) and second (V2) extractions within CD4<sup>+</sup> and CD8+ T lymphocyte, and CD19+ B lymphocyte subpopulations in RRMS patients under Natalizumab treatment in SD or EID. Percentage of expression of CD29 in all CD4<sup>+</sup> and CD8+ T lymphocyte, and CD19+ B lymphocyte subsets. Each dot represents the percentage of CD29-positive cells relative to the parent population (CD4+ or CD8+ T lymphocytes, or CD19<sup>+</sup> B lymphocytes) in the SD (*n* = 8) and EID (*n* = 21) groups. CM, Central Memory; DN, Double Negative; EID, extended interval dosing; EM, Effector Memory; NS, Non-Switched; SD, standard dosing. ns: *p* > 0.05, \* *p* < 0.05.

#### *3.4. Natalizumab and sVCAM-1 Are Putative Serum Biomarkers to Monitor the Treatment*

Several serum markers were also analyzed to determine whether they could be useful for monitoring NTZ treatment. Serum samples were available from 21 patients (SD *n* = 7 and EID *n* = 14), 15 of whom (EID *n* = 9, SD *n* = 6) were analyzed at both the first (V1) and second (V2) extractions.

First, V1 and V2 were compared to check the stability of NTZ and sVCAM-1 over time in both the SD and EID groups. Both markers appeared stable over time except for NTZ levels in the SD group, which may be due to the low number of subjects in this group (Figure 6).

After checking the stability of these markers, the SD and EID groups were compared to determine whether the levels of these serum parameters differed significantly. Results are shown for the first extraction, as its sample size is larger than that of the second extraction. NTZ levels were significantly higher in SD patients than in EID patients (Figure 7a), whereas sVCAM-1 levels were significantly lower in SD patients than in EID patients (Figure 7b).

**Figure 6.** Comparison of V1 and V2 for NTZ and sVCAM in serum in RRMS patients under Natalizumab treatment in SD or EID. Levels of (**a**) NTZ (μg/mL) and (**b**) sVCAM (pg/mL) in serum in the SD (*n* = 6) and EID (*n* = 9) groups. Each dot represents the serum concentration of NTZ or sVCAM for each patient at either the first (V1) or the second (V2) extraction. EID, extended interval dosing; NTZ, natalizumab; SD, standard dosing; sVCAM, soluble VCAM. ns: *p* > 0.05, \* *p* < 0.05.

**Figure 7.** Representation of SD and EID groups in RRMS patients under Natalizumab treatment. Levels of (**a**) NTZ (μg/mL) and (**b**) sVCAM (pg/mL) in serum in the SD (*n* = 7) and EID (*n* = 14) groups. Each dot represents the concentration of NTZ or sVCAM in serum for each patient in the first extraction (V1). EID, extended interval dosing; NTZ, natalizumab; SD, standard dosing; sVCAM, soluble VCAM. ns: *p* > 0.05, \* *p* < 0.05, \*\*\*\* *p* < 0.0001.

#### *3.5. Natalizumab Levels in Serum Correlate with CD49d Saturation*

The correlation between CD49d saturation in CD4+ and CD8+ T lymphocytes and CD19+ B lymphocytes and serum levels of NTZ and sVCAM was explored. CD49d saturation correlated positively with serum NTZ in all lymphocyte subpopulations at both timepoints (Figure 8), whereas no consistent correlation was observed between CD49d saturation and serum sVCAM.

**Figure 8.** Correlation between Natalizumab serum levels (μg/mL) and CD49d saturation (%) in CD4<sup>+</sup> and CD8+ T lymphocytes and CD19<sup>+</sup> B lymphocytes in RRMS patients under Natalizumab treatment. Each dot represents a patient of the study (*n* = 29). ns: *p* > 0.05. NTZ: Natalizumab.

#### **4. Discussion and Conclusions**

NTZ is one of the most effective treatments for RRMS [20–22], but it is associated with a very severe adverse effect: the risk of developing PML. In this study, we aimed to identify biomarkers to facilitate the development of a personalized dosing regimen for NTZ-treated patients. First, we examined the robustness of several candidate cellular and serum biomarkers that could be useful to monitor and personalize NTZ treatment. Here we present data that validate the stability of CD49d saturation and expression as cellular biomarkers in both whole blood and PBMCs, as well as that of the serum protein sVCAM-1. Second, we explored the impact of the SD and EID schedules on these markers.

The CNS is an immune-privileged site that generally has sufficient immunosurveillance to protect it against opportunistic infections and neoplastic proliferation. T lymphocytes expressing CD49d play an important role in CNS immunosurveillance. It has therefore been proposed that reduced CNS immunosurveillance is the factor that leads to JCV infection of the CNS and PML. One strategy proposed to reduce PML risk is NTZ extended interval dosing. Previous studies have reported that PML risk was substantially reduced with EID compared with SD, suggesting that reducing overall exposure to NTZ can alter PML risk [17]. Nevertheless, little is known about the impact of EID on NTZ pharmacodynamics and pharmacokinetics [19,23]. In this project, we aimed to test new biomarkers to monitor NTZ treatment by evaluating the impact of different dosing schedules on NTZ blood levels and on the surface expression and saturation of CD49d (as well as its partners CD29 and β7-integrin).

NTZ binds the CD49d receptor on the surface of leukocytes, leading to saturation of this receptor and changes in its expression. Measuring CD49d saturation and expression can therefore provide information about NTZ binding, which ultimately should affect the effectiveness of treatment. First, we studied how best to measure these parameters and tested whether they are stable over time as biomarkers to monitor treatment. To assess CD49d saturation, we used a protocol previously developed in our laboratory [19]: a Quantitative Flow Cytometry assay in which CD49d and bound NTZ molecules per cell surface are measured to define receptor saturation. CD49d expression and bound NTZ levels in whole blood were successfully measured and were generally stable over time. A few patients showed a large difference in these parameters between the two timepoints, which could be due to factors such as intrinsic patient variability or technical variability. Overall, we consider CD49d expression measured by this method to be very consistent, as most patients were very stable between V1 and V2. The percentage of CD49d saturation was also generally stable over time and easy to calculate. The instability observed in CD49d saturation for CD4<sup>+</sup> cells in SD could be explained by the instability in bound NTZ levels, one of the parameters used to calculate saturation. Moreover, the percentage of CD49d-positive cells assessed in PBMCs was also stable over time. Thus, we demonstrated that CD49d expression and saturation can be assessed by flow cytometry either in whole blood or in PBMCs. Both CD49d saturation and expression were stable within each patient over time, making them potential biomarkers for clinical practice.

Determining CD49d saturation levels in RRMS patients may allow the definition of a safe saturation range to establish a personalized NTZ dose for each individual, indicating whether the dosing schedule should be changed or NTZ treatment stopped. To this end, we measured and compared CD49d expression and saturation between the SD and EID groups. Our results help to describe the pharmacokinetic and pharmacodynamic differences between SD- and EID-treated patients, contributing to a better understanding of how EID affects NTZ efficacy and safety. Consistent with previous results from our laboratory [19], patients receiving SD showed a higher percentage of CD49d saturation (approximately 80%) than those receiving EID (approximately 60%). Interestingly, these higher saturation levels are not explained by higher levels of bound NTZ. In fact, both groups showed similar levels of bound NTZ at the studied timepoints, and the difference appears to be due to CD49d expression. Other studies have shown that CD49d expression on total PBMCs is decreased by approximately 50% in SD patients, with a small (~10%) increase in expression in EID patients [24–26]. Thus, in EID, cell-surface CD49d expression should rise as CD49d saturation decreases, indicating a dose-dependent relationship between CD49d surface expression and NTZ serum concentration [19]. In our study, we examined specific cell types and observed significant increases in CD49d expression in CD8<sup>+</sup> and CD19<sup>+</sup> cells. Although not significant, the number of CD49d molecules on CD4<sup>+</sup> cells also appeared higher in EID; we attribute the lack of significance to sample variability and sample size. When subpopulations of CD8<sup>+</sup> and CD19<sup>+</sup> cells were assessed, CD49d expression in CD8<sup>+</sup>CD27<sup>+</sup> and CD19<sup>+</sup> DN cells was significantly higher in the EID group (Figure 4).
Taking this into account, we consider CD49d expression a putative biomarker only in those subpopulations that showed significant differences at both V1 and V2, as these results were more consistent. Thus, CD49d expression in the CD8<sup>+</sup> and CD19<sup>+</sup> subpopulations mentioned above could be a consistent putative biomarker.
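The SD versus EID comparisons above are reported as p-values, but the test is not named in this section. As a hedged illustration only, the sketch below assumes a two-sided Mann–Whitney U test with a tie-corrected normal approximation. With groups as small as n = 8, an exact test (for example `scipy.stats.mannwhitneyu(..., method="exact")`) may be more appropriate. The saturation values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mu = n1 * n2 / 2
    # Tie correction: subtract sum(t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical CD49d saturation (%) values for illustration only.
sd_saturation = [82.1, 79.4, 85.0, 77.8, 80.6, 83.2, 78.9, 81.5]
eid_saturation = [61.2, 58.4, 64.9, 55.0, 66.3, 59.7, 62.8, 57.1, 60.5, 63.4]
print(mann_whitney_u(sd_saturation, eid_saturation))
```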

Monitoring CD49d levels allows the identification of patients with different CD49d saturation levels despite being on the same dosing schedule. As in this study, NTZ patients can be immunomonitored by Quantitative Flow Cytometry assays to identify patients with suboptimal treatment as well as patients with high saturation levels who would benefit from EID [19]. The development of new immunomonitoring tools, such as the one used in this study, contributes to identifying the optimal NTZ dosing schedule to improve clinical management and quality of life for each RRMS patient. Monitoring and personalizing the treatment could reduce patients' hospital visits and would allow patients to achieve adequate immunosuppression while maintaining a degree of immunosurveillance, which could reduce PML risk and other side effects of the treatment.

In addition, we examined other parameters in search of putative biomarkers that could serve as complementary indicators in CD49d monitoring. To do so, we studied the expression of the two beta subunits that form heterodimers with CD49d: CD29 (β1-Integrin) and β7-Integrin. The comparison of V1 with V2 suggests that β7-Integrin is a generally stable biomarker over time, whereas CD29 is not, as significant differences were observed between timepoints in several cellular subsets. As we could not establish a range of values in which they differed significantly between the two dosing groups, the results suggest that neither CD29 nor β7-Integrin is likely to be a useful biomarker for monitoring NTZ treatment. This is probably because these beta subunits also associate with other alpha subunits in T cells to form heterodimers, so we detect not only CD49d/CD29 or CD49d/β7-Integrin but also other heterodimers on the cell surface. Thus, further studies would be needed to determine whether they could be used to monitor NTZ.

Finally, we studied serum to determine whether any soluble factor could be used to monitor the treatment, since a serum-based biomarker would be much easier to implement in clinical practice. The levels of two serum proteins were assessed: NTZ and sVCAM-1. VCAM-1 is an adhesion molecule expressed mainly by inflamed endothelial cells. It participates in the firm adhesion of leukocytes to the endothelium, enabling the transmigration of cells into the inflamed tissue [27]. When VLA-4 binds VCAM-1, endothelial VCAM-1 is shed, increasing the serum concentration of the soluble molecule (sVCAM-1). VCAM-1 shedding and the resulting increase in serum sVCAM-1 are not specific to cerebrovascular endothelial cells. However, the increase in sVCAM-1 in patients with active MS strongly suggests that it most probably derives from VCAM-1 shed from the activated endothelium of the blood-brain barrier (BBB). Because NTZ inhibits leukocyte binding to the endothelium, sVCAM-1 serum concentration decreases in NTZ-treated patients [9,28]. This effect is reversed in patients with NTZ-neutralizing antibodies, especially at high titers [28]. Previous studies have suggested a putative role for sVCAM-1 as a sensitive biomarker of NTZ treatment efficacy in MS patients, as its increased serum concentration is associated with the presence of inflammatory lesions in the CNS [29,30]. Since sVCAM-1 serum concentration has been shown to correlate positively with MS clinical activity [29] and MRI activity [30,31], sVCAM-1 could be a good marker of inflammatory cells binding to the BBB and might serve as a tool to monitor treatment efficacy. Hence, we compared sVCAM-1 serum concentrations between SD and EID to test for differences that could provide indirect information about cell adhesion at the BBB.
We observed higher blood levels of sVCAM-1 in EID patients, suggesting increased binding of VLA-4 to its ligand VCAM-1, which may imply increased lymphocyte trafficking into the CNS. This suggests that sVCAM-1 could serve as a biomarker to monitor NTZ treatment.

As expected, patients on SD showed higher serum NTZ levels than patients on EID, in accordance with the administration schedule, as patients receiving the treatment more often (every 4 weeks) have less time to clear NTZ. We observed marked differences in serum NTZ levels between patients, which could be explained by individual differences in drug metabolism. This variability may also underlie the instability observed in bound NTZ levels in the SD group in the first section (Figure 1b). In brief, our results suggest that both NTZ and sVCAM-1 levels could be used as putative biomarkers to monitor NTZ treatment. When exploring the utility of combining serum NTZ levels with other possible biomarkers, we observed a clear correlation between CD49d saturation and serum NTZ levels (Figure 8), in agreement with the results obtained by J. Serra López-Matencio et al. [32]. Importantly, since cytometry facilities are not available in all hospitals, monitoring a serum biomarker would be much more feasible in routine clinical practice. The positive correlation between CD49d saturation and serum NTZ levels suggests that serum NTZ measurement could be used instead of CD49d saturation to monitor RRMS patients under NTZ, as described by Kempen et al. [33]. Hence, a kit to measure serum NTZ levels would be very convenient for implementation in clinical practice.
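The correlation between CD49d saturation and serum NTZ could be quantified as shown below. The coefficient used in the study is not stated in this section. This sketch assumes Spearman's rank correlation, computed as Pearson's r on tie-averaged ranks, and the paired values are hypothetical.

```python
import math


def ranks_with_ties(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    sorted_vals = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(sorted_vals, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(ranks_with_ties(x), ranks_with_ties(y))


# Hypothetical paired measurements: serum NTZ (ug/mL) and CD49d saturation (%).
serum_ntz = [35.2, 12.8, 48.9, 5.4, 22.1, 18.7, 40.3]
saturation = [84.0, 61.5, 88.2, 47.9, 70.4, 66.1, 82.7]
print(round(spearman_rho(serum_ntz, saturation), 3))
```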

Regarding the high variability of NTZ levels within each group, the body weight of the patient has been described to influence the pharmacodynamic and pharmacokinetic responses to NTZ treatment [19,24]. Thus, the variability observed in the studied parameters can be partially explained by factors such as body weight, although other factors could also influence NTZ metabolism [34–36]. However, further research is needed to establish pharmacological thresholds of NTZ safety and efficacy, which could help to define the NTZ dose for each individual patient.

In summary, our study shows that CD49d saturation is a stable biomarker that can be used to monitor NTZ-treated RRMS patients and could help to establish a safety range to personalize treatment. Moreover, serum NTZ levels could also be used for this purpose in clinical practice. Finally, further research may also establish sVCAM-1 as a biomarker for the same goal.

Further studies will explore the cell- and serum-based biomarkers identified here with respect to NTZ efficacy, to assess their potential for developing a personalized NTZ dosing schedule that maintains efficacy while lowering the risk of PML.

**Author Contributions:** Conceptualization, S.G., C.R.-T. and E.M.-C.; methodology, S.G., C.R.-T. and E.M.-C.; software, C.I.-G., A.S.-R. and I.B.; validation, C.I.-G., A.S.-R., A.T.-S. and C.R.-T.; formal analysis, J.G.-G., A.S.-R., S.G., E.M.-C. and C.R.-T.; investigation, J.G.-G., C.I.-G., A.S.-R., S.P.-R., M.J.M., L.B., J.S., M.A.M.-M., E.M., I.B., S.G. and C.R.-T.; resources, S.G., E.M.-C. and C.R.-T.; data curation, C.I.-G., A.S.-R. and A.T.-S.; writing—original draft preparation, J.G.-G.; writing—review and editing, C.I.-G., A.S.-R., A.T.-S., S.P.-R., M.J.M., L.B., J.S., M.A.M.-M., E.M., I.B., S.G., E.M.-C. and C.R.-T.; visualization, J.G.-G.; supervision, S.G., E.M.-C. and C.R.-T.; project administration, E.M.-C. and C.R.-T.; funding acquisition, S.G., E.M.-C. and C.R.-T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially sponsored by Biogen Inc. M.J.M. is the beneficiary of a Sara Borrell contract from the ISCIII and the FEDER (CD19/00209).

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of Institut d'Investigació Germans Trias i Pujol (FII-NAT-2015-01, date of approval June 14 2019).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Acknowledgments:** This work has been supported by positive discussion through Consolidated Research Group #2017 SGR 103 (Advanced Immunotherapies for Autoimmunity), AGAUR, Generalitat de Catalunya. The authors are grateful to Katie Whartenby for the critical reading of the manuscript and helpful suggestions. The authors thank Susi Soler for her technical assistance. The authors thank all patients for participating in the study. The authors thank Marco A. Fernández of the Cytometry Facility of IGTP for his continuous help and suggestions.

**Conflicts of Interest:** S.G. receives compensation from Biogen as a consultant.

**Appendix A**

**Figure A1.** Analysis strategy of the Whole Blood protocol.

**Figure A2.** Analysis strategy of the PBMCs protocol.

**Figure A3.** CD49d surface molecule levels and saturation percentage in CD4<sup>+</sup> and CD8<sup>+</sup> T lymphocytes and CD19<sup>+</sup> B lymphocytes in RRMS patients under Natalizumab treatment in SD or EID. Mean of (**a**) CD49d molecules/cell surface, (**b**) bound NTZ/cell surface, (**c**) free CD49d molecules/cell surface, and (**d**) CD49d saturation percentage in CD4<sup>+</sup> and CD8<sup>+</sup> T lymphocytes and CD19<sup>+</sup> B lymphocytes in the SD (*n* = 8) and EID (*n* = 21) groups. Each dot represents the number of PE molecules per cell for each patient at the second extraction (V2), translated into the levels of the corresponding antibodies (anti-CD49d PE and huIgG4 Fc PE). EID, extended interval dosing; NTZ, natalizumab; SD, standard dosing. ns: *p* > 0.05, \* *p* < 0.05, \*\* *p* < 0.01, \*\*\* *p* < 0.001.

**Figure A4.** Representation of SD and EID groups in some CD4<sup>+</sup> and CD8<sup>+</sup> T lymphocyte subpopulations in RRMS patients under Natalizumab treatment at both the first (V1) and second (V2) extractions. Percentage of CD49d expression in (**a**) CD4<sup>+</sup>CD27<sup>+</sup>, (**b**) CD4<sup>+</sup> Effector Memory (EM), (**c**) CD8<sup>+</sup> Effector, and (**d**) CD8<sup>+</sup> Effector Memory (EM) T lymphocytes. Each dot represents the percentage of expression of each marker relative to its parent population (CD4<sup>+</sup> or CD8<sup>+</sup> T lymphocytes) in the SD (*n* = 8) or EID (*n* = 21) groups. EID, extended interval dosing; EM, effector memory; SD, standard dosing. ns: *p* > 0.05, \* *p* < 0.05.

**Figure A5.** Representation of SD and EID groups in some CD19<sup>+</sup> B lymphocyte subpopulations in RRMS patients under Natalizumab treatment at both the first (V1) and second (V2) extractions. Percentage of CD49d expression in (**a**) CD19<sup>+</sup> Switched and (**b**) CD19<sup>+</sup> Naïve B lymphocytes. Each dot represents the percentage of expression of each marker relative to its parent population (CD19<sup>+</sup> B lymphocytes) in the SD (*n* = 8) or EID (*n* = 21) groups. EID, extended interval dosing; SD, standard dosing. ns: *p* > 0.05, \* *p* < 0.05.

**Figure A6.** Comparison of the first (V1) and second (V2) extractions within CD4<sup>+</sup> and CD8<sup>+</sup> T lymphocyte and CD19<sup>+</sup> B lymphocyte subpopulations in RRMS patients under Natalizumab treatment in SD or EID. Percentage of β7-Integrin expression in all CD4<sup>+</sup> and CD8<sup>+</sup> T lymphocyte and CD19<sup>+</sup> B lymphocyte subsets. Each dot represents the percentage of β7-Integrin expression relative to its parent population (CD4<sup>+</sup> or CD8<sup>+</sup> T lymphocytes, or CD19<sup>+</sup> B lymphocytes) in the SD (*n* = 8) and EID (*n* = 21) groups. CM, Central Memory; DN, Double Negative; EID, extended interval dosing; EM, Effector Memory; NS, Non-Switched; SD, standard dosing. ns: *p* > 0.05.

#### **References**


## *Article* **Dynamics and Predictors of Cognitive Impairment along the Disease Course in Multiple Sclerosis**

**Elisabet Lopez-Soley 1, Eloy Martinez-Heras 1, Magi Andorra 1, Aleix Solanes 2, Joaquim Radua 2,3,4, Carmen Montejo 1, Salut Alba-Arbalat 1, Nuria Sola-Valls 1, Irene Pulido-Valdeolivas 1, Maria Sepulveda 1, Lucia Romero-Pinel 5, Elvira Munteis 6, Jose E. Martínez-Rodríguez 6, Yolanda Blanco 1, Elena H. Martinez-Lapiscina 1, Pablo Villoslada 1, Albert Saiz 1, Elisabeth Solana 1,\*,† and Sara Llufriu 1,\*,†**


**Abstract:** (1) Background: The evolution and predictors of cognitive impairment (CI) in multiple sclerosis (MS) are poorly understood. We aimed to define the temporal dynamics of cognition throughout the disease course and to identify clinical and neuroimaging measures that predict CI. (2) Methods: This longitudinal study included 212 patients who underwent several cognitive examinations at different time points. Dynamics of cognition were assessed using mixed-effects linear spline models. Machine learning techniques were used to identify which baseline demographic, clinical, and neuroimaging measures best predicted CI. (3) Results: In the first 5 years of MS, we detected an increase in the z-scores of global cognition, verbal memory, and information processing speed, which was followed by a decline in global cognition and memory (*p* < 0.05) between years 5 and 15. From 15 to 30 years after disease onset, cognitive decline continued, affecting global cognition and verbal memory. The baseline measures that best predicted CI were education, disease severity, lesion burden, and hippocampus and anterior cingulate cortex volume. (4) Conclusions: In MS, cognition begins to deteriorate 5 years after disease onset and declines steadily over the next 25 years, most markedly affecting verbal memory. Education, disease severity, lesion burden, and volume of limbic structures predict future CI and may be helpful when identifying at-risk patients.

**Keywords:** cognition; cognitive impairment; neuroimaging; longitudinal; predictors; multiple sclerosis

**Citation:** Lopez-Soley, E.; Martinez-Heras, E.; Andorra, M.; Solanes, A.; Radua, J.; Montejo, C.; Alba-Arbalat, S.; Sola-Valls, N.; Pulido-Valdeolivas, I.; Sepulveda, M.; et al. Dynamics and Predictors of Cognitive Impairment along the Disease Course in Multiple Sclerosis. *J. Pers. Med.* **2021**, *11*, 1107. https://doi.org/10.3390/jpm11111107

Academic Editor: Cristina M. Ramo-Tello

Received: 30 September 2021; Accepted: 26 October 2021; Published: 28 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Multiple sclerosis (MS) is a chronic inflammatory demyelinating disease of the central nervous system that entails physical and cognitive impairment (CI). The latter has been reported in 40–70% of people with MS and has a severe impact on the individual's quality of life [1,2]. CI can be detected in the early phases of MS, but it becomes more frequent as overall disability accrues [3]. The pattern of cognitive decline predominantly affects information processing speed (IPS) and episodic memory, although executive functions, semantic fluency, and visuospatial analysis may also be altered [4,5]. However, how this deterioration evolves and affects different cognitive domains as the disease progresses is still to be determined.

A few longitudinal studies have investigated the association of clinical and imaging features of MS with cognitive decline, suggesting a predictive value of baseline cognitive status [5], baseline IPS [6], education, and aging [7]. Using different magnetic resonance imaging (MRI) techniques, a relationship has been demonstrated between CI and the combined effect of white matter (WM) and gray matter (GM) damage [8]. In addition, identifying neurodegeneration in specific, cognitively relevant GM regions may help to predict CI more accurately.

Characterizing the natural course of cognitive performance in MS and identifying predictors of CI are still distant milestones in clinical MS research. Therefore, in this study, we first describe the temporal dynamics of global cognition and cognitive domains using mixed-effects models, which allowed us to obtain model estimates of specific parameters and to control for between- and within-subject variability. Subsequently, we investigated the baseline demographic, clinical, and MRI measures that best predicted CI using machine learning (ML) techniques. These issues were addressed in an appropriately large cohort of MS patients with a wide range of disease durations.

#### **2. Materials and Methods**

#### *2.1. Participants, Clinical, and Cognitive Assessment*

For this longitudinal study, we collected data from a prospective cohort recruited at the MS Unit of the Hospital Clinic of Barcelona from January 2011 to February 2020 [9,10]. The inclusion criteria were age between 18 and 65 years and at least two clinical and cognitive assessments, with MRI scans available at the first evaluation. Patients had no relapse and received no corticosteroid treatment in the 30 days before the study visits. In total, 212 MS patients fulfilled the inclusion criteria and were analyzed. We collected data on sex, age, educational level, disease duration, disease type, number of relapses before study inclusion, use of disease-modifying therapies (DMTs), and global disability evaluated with the Expanded Disability Status Scale (EDSS) [11]. The Ethics Committee at the Hospital Clinic of Barcelona approved the study, and all participants provided signed informed consent prior to enrolment.

At each visit, the participants underwent a neuropsychological evaluation using Rao's battery [12], with alternate versions when available. Raw scores were transformed into z-scores adjusted for age and education according to Spanish normative data, and were grouped into global cognition and each cognitive domain (verbal and visual memory, attention-IPS, and semantic fluency) [13]. A test was considered failed when the z-score was below −1.5 standard deviations (SD) of the norm. CI in a given cognitive domain was defined as failure in at least one test assessing that domain, and global CI was defined as impairment in at least two cognitive tests evaluating the same or different cognitive domains.
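The impairment rules above translate directly into code. In this sketch the normative mean and SD are placeholders for the age- and education-adjusted Spanish norms. It also assumes that z-scores are oriented so that lower means worse performance and that each test belongs to a single domain. The test values are hypothetical.

```python
from typing import Dict, List, Tuple

FAILURE_CUTOFF = -1.5  # a test is failed when its z-score is below -1.5 SD


def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize a raw score against (placeholder) normative values."""
    return (raw - norm_mean) / norm_sd


def classify_ci(domain_z: Dict[str, List[float]]) -> Tuple[Dict[str, bool], bool]:
    """Return per-domain CI flags and global CI for one assessment.

    Domain CI: at least one failed test in that domain.
    Global CI: at least two failed tests across any domains.
    """
    failures = {d: sum(1 for z in zs if z < FAILURE_CUTOFF) for d, zs in domain_z.items()}
    domain_ci = {d: n >= 1 for d, n in failures.items()}
    global_ci = sum(failures.values()) >= 2
    return domain_ci, global_ci


# Hypothetical z-scores for one patient visit.
visit = {
    "verbal memory": [-1.8, -0.4],
    "visual memory": [-0.2],
    "attention-IPS": [-1.6, -1.1],
    "semantic fluency": [0.3],
}
print(classify_ci(visit))
```

Note that a z-score of exactly −1.5 does not count as a failure, because the text requires a value *below* the cut-off.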

#### *2.2. Magnetic Resonance Imaging (MRI)*

#### 2.2.1. MRI Acquisition and Processing

Baseline MRI scans were acquired on a 3 Tesla Magnetom Trio scanner (SIEMENS, Erlangen, Germany) using a 32-channel phased-array head coil, as described previously [10]. Two different acquisition protocols were used, involving 3D-Magnetization Prepared Rapid Acquisition Gradient Echo (MPRAGE) and 3D-T2 fluid-attenuated inversion recovery (FLAIR) sequences (see Supplementary Material).

#### 2.2.2. Structural MRI Processing for Volumetric Analysis

WM lesions were delineated semi-automatically in the 3D-MPRAGE space, with the registered 3D-FLAIR image as a reference to improve lesion identification, using the Jim7 Software (http://www.xinapse.com/j-im-7-software/). Lesion in-painting was applied to the 3D-MPRAGE image to improve segmentation and registration. GM regions were parcellated using the Mindboggle software (https://mindboggle.info) with the Desikan–Killiany–Tourville cortical labeling atlas, and automated subcortical segmentation was performed with the FSL-FIRST package (fsl.fmrib.ox.ac.uk/fsl/fslwiki/FIRST), resulting in 31 cortical and 7 subcortical labels per hemisphere [14,15]. Volumetric measurements were normalized using the SIENAX [16] scaling factor to reduce head-size variability.
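SIENAX estimates a volumetric scaling factor from a skull-constrained registration to standard space. Multiplying native-space volumes by that factor gives head-size-normalized volumes. The sketch below shows only that step. Region names and values are hypothetical, and the factor would in practice be read from the SIENAX report.

```python
from typing import Dict


def normalize_volumes(volumes_mm3: Dict[str, float], vscaling: float) -> Dict[str, float]:
    """Head-size normalization: native volume multiplied by the SIENAX scaling factor."""
    if vscaling <= 0:
        raise ValueError("scaling factor must be positive")
    return {region: vol * vscaling for region, vol in volumes_mm3.items()}


# Hypothetical native volumes (mm^3) and scaling factor for one patient.
native = {"left hippocampus": 4000.0, "right rostral anterior cingulate": 2400.0}
print(normalize_volumes(native, vscaling=1.25))
```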

We removed interscan variability between the different acquisition protocols using the ComBat function in the R software [17,18].

#### *2.3. Statistical Analysis*

All baseline demographic, clinical, and neuroimaging data were described using the median and interquartile range (IQR) or the mean (±SD) for quantitative variables, as appropriate, and absolute numbers and proportions for qualitative variables. Normality was checked by inspecting histograms and using the Shapiro–Wilk test.

We used mixed-effects linear regression to model the dynamics of cognition throughout the course of MS. Models adjusted for age at MS onset, educational level, and sex were used to fit the rates of change in global cognitive performance and in each cognitive domain, with disease duration as the main fixed-effect predictor. In addition, we used linear spline models with the same variables as in the mixed-effects regressions to divide disease duration into three periods. Based on visual inspection of the raw data together with prior evidence [19] and the Akaike Information Criterion [20], we placed knots at 5 and 15 years of disease duration. These models provided three parameters (beta coefficients) for the change in cognition relative to disease duration, one for each period.
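The three spline parameters can be read as per-period slopes if disease duration is coded as the years spent in each period. The sketch below assumes that segment-wise coding, since the text does not show the design matrix. It also ignores the covariates and random effects of the lme4 models and only rebuilds the population-level trajectory from the global-cognition slopes reported in Section 3.1.

```python
from typing import Sequence, Tuple

KNOTS = (5.0, 15.0)  # knots at 5 and 15 years of disease duration


def spline_basis(duration_years: float, knots=KNOTS) -> Tuple[float, float, float]:
    """Years spent in each period: [0, 5), [5, 15) and 15 onwards.

    With this coding, each fitted coefficient is the slope within its period.
    """
    k1, k2 = knots
    t = duration_years
    return (
        min(t, k1),
        min(max(t - k1, 0.0), k2 - k1),
        max(t - k2, 0.0),
    )


def predicted_change(duration_years: float, slopes: Sequence[float], knots=KNOTS) -> float:
    """Fixed-effect change in z-score since onset implied by per-period slopes."""
    return sum(b * x for b, x in zip(slopes, spline_basis(duration_years, knots)))


# Global-cognition slopes (z-score/year) reported for the three periods.
GLOBAL_SLOPES = (0.080, -0.029, -0.031)
for years in (5, 15, 30):
    print(years, round(predicted_change(years, GLOBAL_SLOPES), 3))
```

Under this coding, the reported slopes imply a net change of about +0.400 z at year 5, +0.110 at year 15 and −0.355 at year 30 relative to onset, before covariate adjustment.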

We used ML techniques to identify which baseline demographic, clinical, and MRI measures best predicted CI. A priori, the potential predictor variables were sex, educational level (basic, primary, secondary, or higher), disease duration, disease type, EDSS score, use of DMTs (yes or no), number of relapses before study inclusion, lesion volume, and 76 cortical and subcortical regional volumes [15]. Multiple imputation was used to handle missing data: we used multiple regression to estimate the distribution of each variable and replaced each missing value with a random value drawn from that distribution. Logistic Lasso regressions were performed to predict global cognitive status (preserved or impaired, see above). The effect of age was removed from the anatomical brain features, although age was included as a predictor variable in the Lasso model. Lasso regression automatically selects a small number of baseline measures, reducing the risk of overfitting. To validate the performance of the ML models, we used 10-fold cross-validation, splitting the overall sample into training and test datasets. The imputation algorithms and Lasso regressions were built using the training datasets alone, whereas the performance of the predictions was assessed in the independent test datasets. Because of the multiple imputations and folds, several ML models were created. We selected as the most representative model the one with the highest overlap (Dice coefficient) with the other models in the selection of baseline measures. The same procedure was used for each specific cognitive domain.
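The model-selection step can be sketched in two parts: a 10-fold split of patient indices and a Dice-based choice of the most representative predictor set. The Lasso fit itself is omitted, because the text does not name the software used for it. The sketch assumes that "most representative" means the highest mean pairwise Dice coefficient. The predictor sets in the example are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Set, Tuple


def kfold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample is in exactly one test fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) between two predictor sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def most_representative(selections: Sequence[Set[str]]) -> Tuple[int, float]:
    """Index (and score) of the model with the highest mean Dice overlap with the others."""
    best_idx, best_score = -1, -1.0
    for i, sel in enumerate(selections):
        scores = [dice(sel, other) for j, other in enumerate(selections) if j != i]
        score = sum(scores) / len(scores)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score


# Hypothetical predictor sets selected by three fitted models.
models = [
    {"education", "EDSS", "lesion volume"},
    {"education", "EDSS"},
    {"education", "lesion volume", "LH hippocampus"},
]
print(most_representative(models))
```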

All the analyses were performed using the R statistical software (version 3.6.0, www.R-project.org), and the statistical significance was set at *p* < 0.05.

#### **3. Results**

A cohort of 212 MS patients who attended a median of three clinical visits per patient (range, 2–5; total assessments = 605), with a median follow-up of 2.1 years (range, 0.9–7.9 years), was included in this study. In terms of baseline characteristics (Table 1), the patients were mostly female (68%) and middle-aged adults (41 ± 9.47 years), had relapsing-remitting MS (83%), and had a median disease duration of 8.2 years (range, 0.1–29.0).


**Table 1.** Demographic, clinical, and MRI characteristics of MS patients at baseline.

The data represent the absolute numbers and the proportions of the qualitative data, or the median and the interquartile range (IQR) for the quantitative data, unless otherwise specified. SD: standard deviation; MS: multiple sclerosis; EDSS: Expanded Disability Status Scale; DMTs: Disease-Modifying Therapies.

One hundred and eleven patients (52%) were receiving DMTs at baseline, of whom 94 (85%) were using moderate-efficacy DMTs (Table S1).

At the latest follow-up, 77 patients (36%) had global CI, 58 patients (27%) had verbal memory impairment, 51 patients (24%) had visual memory impairment, 38 patients (18%) had attention-IPS impairment, and 41 patients (20%) had semantic fluency impairment.

#### *3.1. Cognitive Trajectory throughout Disease Course*

According to the linear mixed-effects models, there was an annual cognitive decline that affected verbal memory, visual memory, and semantic fluency (Figure 1A,B, and Table S2). A trend was found in global cognition (*p* = 0.058), and no significant model was found for attention-IPS (*p* = 0.345).

When we divided the duration of the disease into three periods, we detected distinct cognitive slopes for each stage (Figure 1C,D and Table S3). The initial period extended over the first 5 years of the disease, during which cognition improved. In the second (5–15 years) and third (15–30 years) periods, cognitive decline became increasingly accentuated. In the first 5 years of MS, we detected an improvement in global cognition (β = 0.080 (95% CI, 0.04 to 0.12) z-score/year; *p* < 0.001), verbal memory (β = 0.083 (95% CI, 0.01 to 0.16) z-score/year; *p* = 0.037), and attention-IPS (β = 0.107 (95% CI, 0.05 to 0.16) z-score/year; *p* < 0.001). However, this trajectory was followed by a decline in global cognition (β = −0.029 (95% CI, −0.05 to −0.01) z-score/year; *p* = 0.013), verbal memory (β = −0.041 (95% CI, −0.08 to 0.00) z-score/year; *p* = 0.047), and visual memory (β = −0.041 (95% CI, −0.08 to −0.01) z-score/year; *p* = 0.024) between 5 and 15 years of the disease. Similar dynamics were observed during years 15–30 of the MS course, when cognitive decline continued in global cognition (β = −0.031 (95% CI, −0.06 to −0.01) z-score/year; *p* = 0.021) and verbal memory (β = −0.055 (95% CI, −0.10 to −0.01) z-score/year; *p* = 0.018), and a trend toward a decline in attention-IPS was observed (β = −0.035 (95% CI, −0.07 to 0.00) z-score/year; *p* = 0.055). No significant effect was detected on semantic fluency performance.

**Figure 1.** Dynamics of cognitive performance in MS as the disease progresses. The global cognition z-score (**A**) and cognitive domain z-scores (**B**) were modeled by mixed-effects regressions. The duration of the disease was divided into three periods by spline models with two knots (at 5 and 15 years of disease duration), represented by dotted black vertical lines, for the global cognition z-score (**C**) and each domain z-score (**D**). Black points joined by a broken line represent the individual trajectories of the changes in global cognition z-scores, the continuous lines represent the individual fit of the model, and the thicker brown line represents the population model (**A**,**C**). Population model lines of cognitive domains are differentiated by color (**B**,**D**): blue for verbal memory, purple for visual memory, red for attention-IPS, and green for semantic fluency. The x-axis represents the time in years from clinical onset. All models were fitted using the lme4 package in R version 3.5.2 (R Foundation for Statistical Computing). \* *p* < 0.05.

#### *3.2. Demographic, Clinical, and MRI Baseline Predictors of Future CI*

A Lasso regression was employed to predict CI at the latest follow-up. The models that showed the strongest performance were verbal memory (positive predictive value (PPV) = 62%; negative predictive value (NPV) = 90%) and attention-IPS (PPV = 38%; NPV = 92%), which were more accurate (79% and 73%, respectively) in predicting CI than the other models (Table 2).

The resulting predictive model of global CI included educational level, disease duration, EDSS score, and the number of previous relapses as clinical parameters. The model also included lesion volume and six cortical regional volumes, covering the bilateral parahippocampus, the left hippocampus, and the right caudate, entorhinal, and rostral anterior cingulate (Table 3).


**Table 2.** Performance evaluation of each Lasso regression model.

Balanced Accuracy is defined as the arithmetic mean of sensitivity and specificity. Sensitivity is defined as the proportion of subjects who developed cognitive impairment that are correctly classified. Specificity is defined as the proportion of subjects who did not develop cognitive impairment that are correctly classified. The predictive model of cognitive impairment in semantic fluency was generated with 210 patients. CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value; IPS: information processing speed.
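The Table 2 metrics follow from a 2 × 2 confusion matrix, using the definitions given above. The counts in the example are hypothetical. When impairment is the minority outcome, as in this cohort, a low PPV can coexist with a high NPV.

```python
from typing import Dict


def safe_div(num: float, den: float) -> float:
    """Division that returns NaN instead of raising when a margin is empty."""
    return num / den if den else float("nan")


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, balanced accuracy, PPV and NPV from confusion counts."""
    sensitivity = safe_div(tp, tp + fn)  # impaired patients correctly classified
    specificity = safe_div(tn, tn + fp)  # preserved patients correctly classified
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
    }


# Hypothetical counts pooled over test folds.
print(performance(tp=8, fp=10, tn=30, fn=2))
```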

**Table 3.** Predictors of each Lasso regression model.


The demographic, clinical, and MRI variables that remained in the age-adjusted predictive model of cognitive impairment in each domain are shown. EDSS: Expanded Disability Status Scale; RH: right hemisphere; LH: left hemisphere; IPS: information processing speed. \* Frequency up to 2000 models.

In terms of verbal memory, CI was predicted by educational level, disease type, EDSS score, and the number of previous relapses. The MRI predictors identified involved lesion volume and six cortical regional volumes, including the right parahippocampus and rostral anterior cingulate, and the pars opercularis, pericalcarine, thalamus, and accumbens of the left hemisphere. Lesion volume was the only predictor selected for CI in terms of visual memory. CI in attention-IPS was predicted by the EDSS, lesion volume, and the volume of the right hippocampus, caudal anterior cingulate, and entorhinal, and the left pericalcarine. The prediction of semantic fluency CI was explained by a model that involved lesion volume and the volume of the left hippocampus and the right rostral anterior cingulate.

#### **4. Discussion**

In this longitudinal study, we set out to better understand the deterioration of cognition in MS by describing its temporal dynamics and by identifying predictors of CI. The results reveal different patterns of worsening over the disease course, both in terms of global cognition and the distinct cognitive domains, suggesting a progressive decline after the first 5 years of disease onset that most markedly affects verbal memory. When focusing on the five models that best predicted CI, we found that verbal memory and attention-IPS models had the strongest predictive performance. The results reinforce the importance of the educational level, disease severity, lesion load, and certain GM regional volumes, mainly involving the medial temporal lobe areas and the cingulate, as predictors of cognitive deficits.

There have been some attempts to describe the evolution of cognitive performance in patients with MS, mainly focusing on short follow-up periods [3,5]. However, the diversity of cohort characteristics and the use of a disparate range of cognitive tests and criteria for diagnosing impairment have produced heterogeneous data that are difficult to compare across studies. Here, we characterized temporal modifications to different cognitive domains in a cohort of patients with a wide range of disease durations. Our data showed a progressive decline as opposed to an abrupt development of CI, supporting a combined role of age, neurodegeneration, the exhaustion of cognitive reserve, and a loss of plasticity in this clinical manifestation of MS [21]. Moreover, we modeled the trajectory in three different periods, providing differential slopes of cognitive change during the course of the disease. The results showed an increase in global cognition, verbal memory, and attention-IPS in the first five years after MS onset, followed by a decline in cognitive performance. This is a surprising finding, although it is consistent with previous data indicating that cognitive deterioration occurs mainly after the fifth year following disease onset [22]. Several explanations may account for this early improvement. First, it may reflect the capacity of the brain to compensate for the pathological effects of MS lesions through its cognitive reserve, a response that may be particularly protective in early stages before structural damage accumulates. Second, mood disorders such as anxiety or depressive symptoms associated with the diagnosis of MS may negatively affect the results of a first cognitive assessment [23]. Finally, there may be a learning effect when cognition is retested, which could be present at any stage of the disease, even though we used alternate forms of the tests at each evaluation whenever possible.

The cognitive trajectory from the fifth year after MS onset onwards was driven by a decline in verbal and visual memory, although only verbal memory continued to deteriorate until the 30th year of the disease, along with a trend for attention-IPS to decline. Our data are consistent with smaller longitudinal studies in which CI was driven by evolving dysfunction in verbal memory and IPS [5]. By contrast, a recent study showed that IPS was the first domain to be affected [24]. This discrepancy may reflect methodological differences, as we grouped the results from the attention-IPS tests into a single cognitive domain, and our cohort also had a lower educational level. Moreover, we cannot rule out a contribution of the distinct cognitive phenotypes in MS [25], as they may differ between cohorts of patients.

Little is known about what may predict the development of CI, which hampers research into early prevention and treatment. In the present analysis, the verbal memory and attention-IPS prediction models produced the highest balanced accuracy and a very high NPV. Educational level was a predictor in the global cognition and verbal memory models, which might reflect the protective role of cognitive reserve against CI [26,27]. In addition, disease severity, as indicated by the EDSS score and the number of previous relapses, seemed to be related to future impairment in the global cognition, verbal memory, and attention-IPS models.

Regarding MRI features, global lesion volume was selected as a predictor in all models. In fact, lesion accumulation has been associated with more severe cognitive dysfunction [28] by promoting brain network disruption [10]. Even so, the present results enabled the identification of the specific cortical regional areas related to future cognition. The hippocampus influences global cognition, verbal memory, and semantic fluency, which is consistent with the theory that it is an integral component of the brain network that supports verbal memory and word generation [25,29–31]. Similarly, the volume of the anterior cingulate cortex was present in all predictive models, except for the visual memory model. This region is involved in the fronto-parietal network, and it plays a key role in executive functions, as well as participating in the working memory network [32,33]. Moreover, the thalamus, a highly connected nucleus, has been associated with learning and memory function, and it seems to be a good predictor for CI in MS [5,34], although here, it was more specifically associated with verbal memory impairment. All these areas are part of the limbic system, which plays a crucial role in various cognitive functions [35].

This study has several strengths, including the fact that participants were prospectively and consecutively recruited, thereby minimizing selection bias and enhancing the generalizability of the results. Drawing up a global pattern of cognition in MS was only possible because our cohort included patients with a clinical disease duration of up to 30 years. In addition, all the analyses were performed for global and domain-specific cognition. Our study also has some limitations. Working with a real-world MS cohort implies that it is predominantly composed of relapsing-remitting MS patients, the most common phenotype encountered clinically in the current treatment era. Moreover, we were unable to assess the influence of mood disorders and fatigue on cognition because the protocol did not include any specific mood or fatigue test. Furthermore, we do not have a matched control group, although we used z-scores based on normative data to account for the changes in cognition that can be expected according to age and educational level. In addition, it was not possible to analyze the effect of DMTs on cognition, as the predictive models could be influenced by the low proportion of treated patients (52%) at study initiation, who predominantly used moderate-efficacy DMTs. Finally, including GM lesion volume, WM lesion location, or advanced quantitative MRI measures, such as functional and diffusion MRI, in future studies might improve our understanding of cognition and its MRI-related factors in MS.

#### **5. Conclusions**

In conclusion, cognition in MS patients progressively deteriorates after the first 5 years of the disease, with a steady decline over the next 25 years that most markedly affects verbal memory. Moreover, CI is predicted by educational level, disease severity, lesion load, and the volume of high-order regions, including the hippocampus and anterior cingulate cortex, with a particularly strong NPV for the verbal memory and attention-IPS models. Consequently, cognitive maintenance strategies should be adopted that use these predictors to identify patients at risk of CI and that promote activities, such as intellectual enrichment, that attenuate the impact of brain burden in the initial years of the disease, which may represent an adequate treatment window.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/jpm11111107/s1: MRI acquisition parameters; Table S1: Use of disease-modifying therapies at baseline; Table S2: Cognitive changes throughout the disease course; Table S3: Cognitive changes at the different phases of MS.

**Author Contributions:** Conceptualization, E.L.-S., E.S. and S.L.; Methodology, E.L.-S., E.M.-H., M.A., E.S. and S.L.; Formal Analysis, E.L.-S., M.A., A.S. (Aleix Solanes), J.R. and E.S.; Data Curation, E.L.-S., C.M., S.A.-A., N.S.-V., I.P.-V., M.S., L.R.-P., E.M., J.E.M.-R., Y.B., E.H.M.-L., A.S. (Albert Saiz), E.S. and S.L.; Writing—Original Draft Preparation, E.L.-S., E.M.-H., M.A., A.S. (Aleix Solanes), J.R., E.S. and S.L.; Writing—Review and Editing, E.L.-S., E.M.-H., M.A., A.S. (Aleix Solanes), J.R., E.S. and S.L.; Supervision, E.H.M.-L., P.V., A.S. (Albert Saiz), E.S. and S.L.; Project Administration, A.S. (Albert Saiz) and S.L.; Funding Acquisition, A.S. (Albert Saiz) and S.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** The author(s) disclose receipt of the following financial support for the research, authorship, and/or publication of this article. This work was funded by: a Proyecto de Investigación en Salud (PI15/00587 to S.L. and A.S.; PI15/00061 to P.V.; PI18/01030 to S.L. and A.S.; and JR16/00006, MV17/00021, PI17/01228 and RD16/0015/0003 to E.H.M.-L.), integrated into the Plan Estatal de Investigación Científica y Técnica de Innovación I+D+I, and co-funded by the Instituto de Salud Carlos III-Subdirección General de Evaluación and the Fondo Europeo de Desarrollo Regional (FEDER, "Otra manera de hacer Europa"); by the Red Española de Esclerosis Múltiple (REEM: RD16/0015/0002, RD16/0015/0003, RD12/0032/0002, RD12/0060/01-02); by TEVA Spain, the Ayudas Merck de Investigación 2017 from the Fundación Merck Salud and the Proyecto Societat Catalana Neurologia 2017; and by the MS Innovation GMSI, 2016 to E.H.M.-L. E.L.-S. holds a predoctoral grant from the University of Barcelona (APIF). M.A. holds equities in Bionure and Goodgut. C.M. received an Emili Letang award from the Hospital Clinic, and she holds a P-FIS contract (FI19/00111). J.R. holds a Miguel Servet Research Contract (CPII19/00009) and Research Project PI19/00394 from the Plan Nacional de I+D+I 2013–2016, the Instituto de Salud Carlos III-Subdirección General de Evaluación y Fomento de la Investigación and the European Regional Development Fund (FEDER, 'Investing in your future'). None of the funding bodies had any role in the design and performance of the study; the collection, management, analysis, and interpretation of the data; the preparation, revision, or approval of the manuscript; and the decision to submit the manuscript for publication.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of the Hospital Clinic of Barcelona (protocol code HCB/2009/4905 on 7 April 2009; protocol code HCB/2015/0236 on 23 November 2015; protocol code HCB/2912/7965 on 2 December 2015; protocol code HCB/2016/0827 on 12 December 2016).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The datasets generated during and/or analyzed in the current study are available from the corresponding authors upon reasonable request.

**Acknowledgments:** The authors are grateful to Núria Bargalló, Cesar Garrido and the IDIBAPS Magnetic resonance imaging facilities, and to the Fundació Cellex for their support of this study. This work was carried out at the Centro Esther Koplowitz of the IDIBAPS (Barcelona, Spain), which is supported by the CERCA Programme/Generalitat de Catalunya.

**Conflicts of Interest:** The authors declare the following potential conflicts of interest with respect to their research, authorship, and/or the publication of this article: E.M.-H., A.S., J.R., C.M. and S.A.-A. have nothing to disclose; E.L.-S. and E.S. received travel reimbursement from Sanofi and ECTRIMS; M.A. holds equity shares of Bionure S.L. and Goodgut S.L. and stock options for Attune Neurosciences Inc., and he is currently an employee of Roche, although his contribution to this work is associated with his previous work at IDIBAPS; N.S.-V. received compensation for consulting services and speaker honoraria from Genzyme-Sanofi, Almirall, Novartis and Roche; I.P.-V. received travel reimbursement from Roche and Genzyme, holds stock options in Aura Innovative Robotics and is currently an employee at UCB Pharma, although her contribution to this study is associated with her previous work at IDIBAPS; M.S. received speaker honoraria from Genzyme, Novartis, and Biogen; L.R.-P. received honoraria compensation to participate in advisory boards, collaborations as a consultant and scientific communications and received research support, funding for travel, and congress expenses from Biogen Idec, Novartis, TEVA, MerckSerono, Genzyme, Almirall, Bayer, Celgene and Roche; E.M. received speaking honoraria from Merck and Novartis; J.E.M.-R. has participated as principal investigator in pharmaceutical company-sponsored clinical trials by Novartis, Roche, Merck-Serono, Actelion, Celgene, Oryzon Genomics, and Medday, carried out at the Hospital del Mar, IMIM, Barcelona. J.E.M.-R. also received fees for consulting services and lectures from Novartis, Sanofi and Biogen Idec, and travel funding from Biogen Idec and Sanofi; Y.B. received speaking honoraria from Biogen, Novartis, and Genzyme; E.H.M.-L. received travel support for international and national meetings from Roche and Sanofi-Genzyme, and honoraria for consultancies from Novartis, Roche, and Sanofi before joining the European Medicines Agency, where she is currently employed (Human Medicines, since 16 April 2019), although her contribution to this article is related to her activity at the Hospital Clinic of Barcelona/IDIBAPS and, consequently, does not represent the views of the Agency or its Committees. She is a member of the International Multiple Sclerosis Visual System (IMSVISUAL) Consortium; P.V. is a shareholder and has received consultancy fees from Accure Therapeutics SL, Attune Neurosciences Inc., QMenta Inc., Spiral Therapeutix Inc, CLight Inc. and NeuroPrex Inc., as well as having held grants from the Instituto de Salud Carlos III and the European Commission; A.S. received compensation for consulting services and speaker honoraria from Bayer-Schering, Merck-Serono, Biogen-Idec, Sanofi-Aventis, TEVA, Novartis and Roche; S.L. received compensation for consulting services and speaker honoraria from Biogen Idec, Novartis, TEVA, Genzyme, Sanofi and Merck.

#### **References**


## *Article* **Cognitive Performance and Health-Related Quality of Life in Patients with Neuromyelitis Optica Spectrum Disorder**

**Elisabet Lopez-Soley 1,†, Jose E. Meca-Lallana 2,†, Sara Llufriu 1, Yolanda Blanco 1, Rocío Gómez-Ballesteros 3, Jorge Maurino 3, Francisco Pérez-Miralles 4, Lucía Forero 5, Carmen Calles 6, María L. Martinez-Gines 7, Inés Gonzalez-Suarez 8, Sabas Boyero 9, Lucía Romero-Pinel 10, Ángel P. Sempere 11, Virginia Meca-Lallana 12, Luis Querol 13, Lucienne Costa-Frossard 14, Maria Sepulveda 1,‡ and Elisabeth Solana 1,\*,‡**


**Abstract:** Background: The frequency of cognitive impairment (CI) reported in neuromyelitis optica spectrum disorder (NMOSD) is highly variable, and its relationship with demographic and clinical characteristics is poorly understood. We aimed to describe the cognitive profile of NMOSD patients, and to analyse the cognitive differences according to their serostatus; furthermore, we aimed to assess the relationship between cognition, demographic and clinical characteristics, and other aspects linked to health-related quality of life (HRQoL). Methods: This cross-sectional study included 41 patients (median age, 44 years; 85% women) from 13 Spanish centres. Demographic and clinical characteristics were collected along with a cognitive z-score (Rao's Battery) and HRQoL patient-centred measures, and their relationship was explored using linear regression. We used the Akaike information criterion to model which characteristics were associated with cognition. Results: Fourteen patients (34%) had CI, and the most affected cognitive domain was visual memory. Cognition was similar in AQP4-IgG-positive and -negative patients. Gender, mood, fatigue, satisfaction with life, and perception of stigma were associated with cognitive performance (adjusted R<sup>2</sup> = 0.396, *p* < 0.001). Conclusions: The results highlight the presence of CI and its impact on HRQoL in NMOSD patients. Cognitive and psychological assessments may be crucial to achieve a holistic approach in patient care.

**Citation:** Lopez-Soley, E.; Meca-Lallana, J.E.; Llufriu, S.; Blanco, Y.; Gómez-Ballesteros, R.; Maurino, J.; Pérez-Miralles, F.; Forero, L.; Calles, C.; Martinez-Gines, M.L.; et al. Cognitive Performance and Health-Related Quality of Life in Patients with Neuromyelitis Optica Spectrum Disorder. *J. Pers. Med.* **2022**, *12*, 743. https://doi.org/10.3390/jpm12050743

Academic Editor: Takahiro Nemoto

Received: 24 February 2022; Accepted: 29 April 2022; Published: 2 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Keywords:** neuromyelitis optica spectrum disorder; cognition; health-related quality of life; mood

#### **1. Introduction**

Neuromyelitis optica spectrum disorder (NMOSD) is an inflammatory autoimmune disorder of the central nervous system (CNS) predominantly targeting the spinal cord and optic nerve [1,2]. The discovery of an immunoglobulin G directed against the astrocyte water channel protein aquaporin-4 (AQP4-IgG) not only allowed a reliable distinction of the disease from multiple sclerosis (MS), the most common differential diagnosis [3], but also led to expansion of the clinical syndromes associated with the disorder and the definition of a new set of diagnostic criteria with prognostic implications (2015 criteria) [4].

Most NMOSD patients follow a course of early disability accrual due to frequent and potentially severe relapses. In recent years, increasing attention has been paid to the prevalence and pattern of cognitive impairment (CI) in NMOSD patients, as it is an underestimated and imprecisely characterised but disabling symptom [5]. The frequency of CI varies substantially across studies, ranging from 3% to 75% [6,7], reflecting methodological heterogeneity in the samples enrolled, the diagnostic criteria applied, the definition of CI and the neuropsychological assessment tools employed [1,7,8]. Previous studies not only vary widely in the reported frequency of CI but also report inconsistent results regarding the most affected cognitive domains in NMOSD patients. Moreover, it is not entirely clear whether the presence or absence of AQP4-IgG influences cognitive performance.

Other aspects related to the disease, such as mood, fatigue, self-perception of symptoms and pain, have an impact on the patient's quality of life, interfering with physical and emotional aspects of wellbeing [9,10]. However, the relationship of these factors with cognitive performance in NMOSD patients has been poorly investigated. Further analysis of the full spectrum of cognitive performance and of the impact of psychological comorbidities is needed to better understand the symptoms of the disease and to identify potential targets for intervention. Therefore, the main objective of this study was to describe the cognitive profile of a well-characterised group of patients with NMOSD, and to analyse cognitive differences according to their serostatus. The secondary objective was to assess the relationship between cognition, demographic and clinical characteristics, and the contribution of emotional status and other aspects related to the health-related quality of life (HRQoL).

#### **2. Materials and Methods**

#### *2.1. Participants*

For this non-interventional cross-sectional study, we collected data from patients consecutively recruited at thirteen hospital-based neuroimmunology clinics in Spain (PERSPECTIVES-NMO study) [11] between November 2019 and July 2020. The inclusion criteria were (a) age between 18 and 65 years; (b) diagnosis of NMOSD according to the Wingerchuk 2015 criteria [4]; (c) relapse-free or not having received corticosteroids in the last 30 days; (d) stable treatment in the last three months; and (e) available cognitive and mood disorder assessments. Patients with difficulties in understanding and/or responding to the study questionnaires and with other concomitant chronic disorders that could significantly affect cognition or mood were excluded from the study.

Thus, a total of 41 NMOSD patients fulfilled the inclusion criteria and were analysed. Epidemiological and clinical data (age, gender, educational level, disease duration, presence of AQP4-IgG antibodies, number of relapses, and current treatment) were recorded in an electronic case report form specially designed for this study. Neurological disability was assessed by the Expanded Disability Status Scale (EDSS) score [12]. We evaluated mood disorders using the Beck Depression Inventory-Fast Screen (BDI-FS) [13], with a total score ranging from 0 to 21. Higher scores indicate more severe depressive symptoms, with cut-off scores of ≥4, ≥9, and >12 indicating mild, moderate, and severe depression, respectively. Daily fatigue was assessed by the Fatigue Impact Scale for Daily Use (D-FIS) [14], an 8-item self-report instrument in which higher scores indicate a greater impact of fatigue. The neuropsychological battery and the patient-centred measures employed are described in subsequent sections.
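The BDI-FS cut-offs define non-overlapping severity bands (4 to 8 mild, 9 to 12 moderate, 13 to 21 severe). A minimal sketch, assuming integer totals; the function name and the label "none" for totals below the mild cut-off are our own choices, not terms used by the study.

```python
def bdi_fs_category(score: int) -> str:
    """Map a BDI-FS total (0-21) to the severity bands described in the text.

    The label "none" for totals below 4 is an assumption of this sketch.
    """
    if not 0 <= score <= 21:
        raise ValueError("BDI-FS total must lie between 0 and 21")
    if score > 12:   # >12: severe
        return "severe"
    if score >= 9:   # >=9: moderate
        return "moderate"
    if score >= 4:   # >=4: mild
        return "mild"
    return "none"    # below the mild cut-off
```

Checking the strictest band first keeps each condition a single comparison that mirrors the published cut-offs.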

The study was approved by the investigational review board of Galicia (CEIm-G, Santiago de Compostela, Spain) and signed informed consent was obtained from all patients prior to their enrolment.

#### *2.2. Cognitive Functions*

We assessed cognitive performance using the Brief Repeatable Battery of Neuropsychological tests (BRB-N) [15]. This battery includes tests assessing the following cognitive domains: (1) verbal memory: Selective Reminding Test (SRT, with two subtests: consistent long-term retrieval as an indicator of consolidation, and delayed recall); (2) visual memory: 10/36 Spatial Recall Test (SPART, with two subtests: immediate retrieval and delayed recall); (3) attention and information processing speed (IPS): Symbol Digit Modalities Test (SDMT) and the 3-second version of the Paced Auditory Serial Addition Test (PASAT); and (4) semantic fluency and cognitive flexibility: Word List Generation (WLG).

Raw values were transformed into z-scores adjusted for age and educational level according to the available Spanish normative data [16], and then grouped in terms of global cognition (zBRB-N) and each cognitive domain. A test was considered failed when its z-score was below −1.5 standard deviations (SDs) of the norm. CI in a given cognitive domain was defined as a failure in at least one test assessing that domain, and global CI was defined as impairment in at least two cognitive tests evaluating the same or different cognitive domains. Patients without global CI were categorised as cognitively preserved (CP).
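The failure rule and the two CI definitions can be expressed as a short function. This is a sketch under stated assumptions: the test keys are hypothetical labels for the BRB-N subtests listed in Section 2.2, the z-scores are assumed to be already adjusted against the normative data, and a z-score of exactly −1.5 is not counted as a failure because the text specifies "below".

```python
from typing import Dict, List

FAIL_THRESHOLD = -1.5  # a test counts as failed when its z-score is strictly below this

# Hypothetical keys for the BRB-N subtests, grouped into the four domains in the text
DOMAIN_OF_TEST: Dict[str, str] = {
    "SRT_CLTR": "verbal memory",
    "SRT_delay": "verbal memory",
    "SPART_immediate": "visual memory",
    "SPART_delay": "visual memory",
    "SDMT": "attention-IPS",
    "PASAT": "attention-IPS",
    "WLG": "semantic fluency",
}


def classify_cognition(z_scores: Dict[str, float]) -> dict:
    """Apply the per-test failure rule, the domain CI rule and the global CI rule."""
    failed: List[str] = [t for t in DOMAIN_OF_TEST if z_scores[t] < FAIL_THRESHOLD]
    domains = sorted(set(DOMAIN_OF_TEST.values()))
    # domain CI: at least one failed test within that domain
    domain_ci = {d: any(DOMAIN_OF_TEST[t] == d for t in failed) for d in domains}
    global_ci = len(failed) >= 2  # at least two failed tests, same or different domains
    return {
        "failed_tests": failed,
        "domain_ci": domain_ci,
        "global_ci": global_ci,
        "status": "CI" if global_ci else "CP",
    }


# Illustrative (made-up) z-scores for one patient
patient = {"SRT_CLTR": -0.4, "SRT_delay": 0.2, "SPART_immediate": -1.0,
           "SPART_delay": -1.8, "SDMT": -1.6, "PASAT": -1.5, "WLG": 0.3}
result = classify_cognition(patient)
```

In this made-up example, the patient fails only the delayed spatial recall and the SDMT, so they are classified as having global CI, with CI in the visual memory and attention-IPS domains; the PASAT score of exactly −1.5 does not count as a failure.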

#### *2.3. Patient-Centred Measures*

Measures of HRQoL were evaluated using the physical and psychological components of the Multiple Sclerosis Impact Scale (MSIS-29v2) [17], a self-reported questionnaire ranging from 0 to 100 with higher scores indicating worse health, and by the Satisfaction with Life Scale (SWLS) [18], a five-item measure of self-rated assessment of subjective wellbeing scored from 5 (worst) to 35 (best). Symptom severity from the patient perspective was assessed by the SymptoMScreen questionnaire (SyMS), consisting of 12 items with higher scores indicating more severe symptom endorsement [19]. The Stigma Scale for Chronic Illness 8-item version (SSCI-8) [20] was used to evaluate internalised and experienced stigma across neurological conditions. It is composed of eight items and scores range from 0 to 40 with higher scores indicating higher levels of perceived stigma. Finally, the MOS Pain Effects Scale (PES) [21] is a 6-item self-report questionnaire assessing how pain and unpleasant sensations affect mood, capacity to walk or move, sleep, work, recreation, and pleasure of life. Total score ranges from 6 to 30, with higher results suggesting greater impact of pain.

#### *2.4. Statistical Analysis*

We described demographic, clinical, cognitive and patient-centred measures by the median and interquartile range (IQR) for continuous variables and by absolute numbers and relative frequencies for categorical data. The normality assumption was checked using histograms and the Shapiro–Wilk test. We explored differences in demographic, clinical and cognitive characteristics between AQP4-IgG-positive and -negative NMOSD patients, and differences in demographic and clinical characteristics between CP and CI patients, using the Chi-squared test and the Wilcoxon–Mann–Whitney *U*-test or Student's *t*-test, as appropriate. Differences in patient-centred measures between these groups were explored with analysis of variance.

We used linear regression to analyse the association between the z-score of global cognition (zBRB-N) and demographic (age and gender), clinical (disease duration, presence of AQP4-IgG antibodies, EDSS score, number of relapses before study inclusion, current treatment, BDI-FS and D-FIS scores), and patient-centred measures (MSIS-29v2, SWLS, SyMS, SSCI-8 and PES scores). We then fitted a multiple regression model including all these variables and used the Akaike Information Criterion (AIC) to select the subset of variables yielding the best-fitting model in the whole cohort. For easier interpretation, all variables were standardised using their mean and SD.
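The text does not state which AIC search strategy was used. One common option, sketched here as an assumption, is greedy backward elimination on standardised variables, similar in spirit to R's `step()` with `direction = "backward"`. The helper names are hypothetical, the AIC is the Gaussian linear-model form n·log(RSS/n) + 2k (the form R's `extractAIC()` uses for `lm` fits, up to an additive constant), and the sketch omits the age and gender covariates that the study included in all analyses.

```python
import numpy as np


def ols_aic(y: np.ndarray, X: np.ndarray) -> float:
    """AIC of an OLS fit with intercept, up to an additive constant: n*log(RSS/n) + 2k."""
    n = len(y)
    design = np.column_stack([np.ones(n), X]) if X.size else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]  # number of estimated coefficients, including the intercept
    return n * np.log(rss / n) + 2 * k


def standardise(a: np.ndarray) -> np.ndarray:
    """Centre on the mean and divide by the sample SD, column by column."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def backward_aic(y: np.ndarray, X: np.ndarray, names: list):
    """Repeatedly drop the predictor whose removal lowers AIC most; stop when none does."""
    y, X = standardise(y), standardise(X)
    keep = list(range(X.shape[1]))
    best = ols_aic(y, X[:, keep])
    while keep:
        trials = [(ols_aic(y, X[:, [j for j in keep if j != drop]]), drop) for drop in keep]
        aic, drop = min(trials)
        if aic >= best:  # no single removal improves the fit/complexity trade-off
            break
        best = aic
        keep.remove(drop)
    return [names[j] for j in keep], best
```

Standardising first does not change which variables are selected, but it makes the retained coefficients directly comparable, which matches the stated reason for standardisation in the text.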

In all analyses, we included age and gender as covariates to control for their potential influence on results. We used the false discovery rate (FDR) to correct for multiple comparisons, and we set the significance level to *p* < 0.05. All the statistical analyses were performed with R statistical software (version 3.6.0, www.R-project.org; accessed on 1 September 2021).
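The text does not name the FDR procedure. A common choice, assumed here, is the Benjamini-Hochberg step-up adjustment, which R provides as `p.adjust(p, method = "BH")` (with `"fdr"` as an alias). The sketch below reproduces that adjustment in Python for illustration; the function name is hypothetical.

```python
import numpy as np


def bh_adjust(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values, matching R's p.adjust(method = "BH")."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                               # indices of p-values in ascending order
    ranked = p[order] * m / np.arange(1, m + 1)         # p_(i) * m / i
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity from the top rank down
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)           # cap at 1 and restore the input order
    return adjusted
```

For example, `bh_adjust([0.01, 0.04, 0.03, 0.20])` returns approximately `[0.04, 0.0533, 0.0533, 0.20]`: the two middle p-values share the same adjusted value because of the monotonicity step.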

#### **3. Results**

#### *3.1. Demographic, Clinical and Patient-Centred Measures of the Cohort*

The demographic, clinical and patient-centred measures data of the 41 patients are summarised in Table 1. Patients were more frequently female (85%) and middle-aged (median of 44 years, IQR: 39–49), with a median disease duration of 8.1 years (IQR: 3.9–15.5) and a median EDSS score of 2.0 (range 0–7.5). Depressive symptoms were present in 18 (44%) patients: 12 (29%) had mild depression and 6 (15%) moderate depression. Four patients had concomitant disorders: one was also diagnosed with Sjögren's syndrome and three with lupus.


**Table 1.** Demographic, clinical and patient-centred measures data of the study population.

Qualitative data are presented by absolute numbers and proportions, and quantitative data by the median and IQR, unless otherwise specified. NMOSD: neuromyelitis optica spectrum disorder; AQP4-IgG: aquaporin-4 immunoglobulin G; EDSS: Expanded Disability Status Scale; MSIS-29v2: Multiple Sclerosis Impact Scale.

Twenty-seven patients (66%) were AQP4-IgG positive. The demographic, clinical and patient-centred measures data were not significantly different between AQP4-IgG-positive and -negative patients (see Supplementary Material Table S1).

#### *3.2. Cognitive Characteristics in NMOSD Patients*

Fourteen patients (34%) were classified as having global CI. Demographic and clinical characteristics were similar (*p* > 0.05) between patients regardless of their cognitive status. However, patients with global CI had lower satisfaction with life, more severe symptom endorsement, higher levels of perceived stigma, and greater impact of pain interfering with their lives than CP patients (Supplementary Material Table S2).

Figure 1A summarises the cognitive z-score distribution of each test from the BRB-N. Based on the definition of CI described above, the following frequencies of impairment in each cognitive domain were recorded: 10 patients (24%) in verbal memory, 14 patients (34%) in visual memory, 13 patients (32%) in attention-IPS and 3 patients (7%) in semantic fluency (Figure 1B).

**Figure 1.** Cognitive performance in NMOSD patients. (**A**) Box plots represent the cognitive z-score distribution for each test from the BRB-N in the entire cohort; the *x*-axis depicts the name of each cognitive test and the *y*-axis the z-score for each test. The dotted black horizontal line represents −1.5 SDs of the norm. (**B**) The histograms show the proportions of patients with CP and CI in each cognitive domain. The *x*-axis shows the names of cognitive domains and the *y*-axis the number of patients for each domain. The total number of patients in each cognitive domain was 41. Both panels were generated using R version 3.5.2 (R Foundation for Statistical Computing, Vienna, Austria). SRTS: Selective Reminding Test Long-Term Storage; SRTR: Selective Reminding Test Consistent Long-Term Retrieval; SRTD: Selective Reminding Test Total Delay; SPART: Spatial Recall Test; SPARTD: Spatial Recall Test Delay; SDMT: Symbol Digit Modalities Test; PASAT: Paced Auditory Serial Addition Task; WLG: Word List Generation.

When we analysed whether cognition was similar in AQP4-IgG-positive and -negative patients, we found no statistically significant differences in either the individual test z-scores of the BRB-N or the cognitive domains (see Table 2).


**Table 2.** Cognitive performance differences between AQP4-IgG-positive and -negative patients.

The data represent the median and IQR. AQP4-IgG: aquaporin-4 immunoglobulin G; SRTS: Selective Reminding Test Long-Term Storage; SRTR: Selective Reminding Test Consistent Long-Term Retrieval; SRTD: Selective Reminding Test Total Delay; SPART: Spatial Recall Test; SPARTD: Spatial Recall Test Delay; SDMT: Symbol Digit Modalities Test; PASAT: Paced Auditory Serial Addition Task; IPS: information processing speed; WLG: Word List Generation; BRB-N: Brief Repeatable Battery of Neuropsychological tests. The *p*-values were corrected by FDR adjustment. One patient was excluded due to unknown serostatus. <sup>a</sup> Student's *t*-test; <sup>b</sup> Kruskal–Wallis test.

#### *3.3. Association between Cognition, Demographic, Clinical and Patient-Centred Measures*

The global BRB-N z-score was associated with fatigue (D-FIS score: β = −0.322, 95% confidence interval (CI): −0.53, −0.12; corrected *p* = 0.013), physical impact of the disease on quality of life (MSIS-29v2: β = −0.31, 95% CI: −0.53, −0.09; corrected *p* = 0.028), satisfaction with life (SWLS: β = 0.302, 95% CI: 0.09, 0.51; corrected *p* = 0.024), self-perception of symptoms (SyMS: β = −0.327, 95% CI: −0.55, −0.11; corrected *p* = 0.019) and perception of stigma (SSCI-8: β = −0.322, 95% CI: −0.53, −0.12; corrected *p* = 0.012). Depression score was not related to cognitive performance (BDI-FS: β = −0.188, 95% CI: −0.41, 0.04; corrected *p* = 0.306).

Based on the AIC, the final multiple linear regression model included gender as well as depression (BDI-FS), fatigue (D-FIS), satisfaction with life (SWLS) and perception of stigma (SSCI-8) scores. In our sample, this model explained 40% of the variability in the BRB-N z-score (adjusted R<sup>2</sup> = 0.396, *p* < 0.001). A change of 1 point in the BDI-FS questionnaire, which is sensitive to depression, was associated with a change of 0.6 points in global cognitive scores. Fatigue (D-FIS score), satisfaction with life (SWLS) and perception of stigma for neurological diseases (SSCI-8) were also related to cognition (Table 3).


**Table 3.** Associations between the z-score of the global cognitive score (zBRB-N) and demographic, clinical and patient-centred measures.

Beta coefficients and 95% confidence intervals (CI) and *p*-values corrected by FDR adjustment.
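The FDR adjustment mentioned in the table note can be illustrated with a minimal sketch of the Benjamini–Hochberg procedure, the most widely used FDR method. That the study used this particular procedure is an assumption, the function name is hypothetical, and any p-values passed to it here are made up.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` provides the same adjustment.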

#### **4. Discussion**

This study of a well-characterised cohort of patients with NMOSD diagnosed by the 2015 criteria shows that up to 34% of the patients suffer from CI. Visual memory was the main cognitive domain affected, followed by attention-IPS and verbal memory. AQP4-IgG-positive and -negative NMOSD patients did not differ in their cognitive performance and had similar demographic and clinical characteristics. The study also identifies depression, fatigue, satisfaction with life and perception of stigma as the main factors related to global cognitive performance.

Although some attempts have been made to describe the cognitive profile in NMOSD patients, both the reported CI prevalence and the affected cognitive domains vary widely across studies. Our results are in agreement with other studies reporting that around 34% of patients can be classified as having CI [22,23]. However, the proportion of patients with impairment in our study differs from that in other studies with smaller cohorts [24–26], which applied different criteria for CI [6] or used other neuropsychological tools for cognitive assessment [27]. The most affected domain in our cohort was visual memory, followed by attention-IPS and verbal memory. These findings are in line with two recent reviews in which memory, attention, and IPS were the most affected cognitive functions [5,8]. Similarly, Zhang et al. found that both memory and IPS were more severely impaired in the visual than in the verbal spectrum [28]. Conversely, our results show relatively preserved performance in semantic verbal fluency, which is one of the most pronounced dysfunctions in other studies [6,24].

We did not find an influence of clinical worsening, as measured by the number of relapses, disease duration and EDSS score, on cognitive performance. Moreover, no association was found between a positive AQP4-IgG status and cognitive performance, supporting the results of other studies exploring differences in cognitive test scores according to AQP4-IgG status [28–30]. AQP4-IgG appears to inhibit neuronal plasticity, affecting the proper functioning of the glutamatergic system and water homeostasis by increasing excitotoxicity in cerebral grey matter [25]. However, this would not explain the CI observed in NMOSD patients who are AQP4-IgG negative. It is also unknown what triggers the humoral immune response that produces AQP4-IgG antibodies. Some infectious agents, even silent infections (*Mycobacterium avium* subspecies), have been implicated in NMOSD aetiology [31,32]. Molecular mimicry between microbial and host peptides has been proposed as a mechanism that would exacerbate autoimmunity and generate autoantibodies. Interestingly, a recent study has shown a different pattern of humoral immune responses against viral agents (HERV-W retrovirus family) in patients with NMOSD compared with patients with MS or with MOG-IgG [33]. Whether such infectious agents influence cognitive performance, and their implication in autoimmunity, deserve further study. Additionally, techniques such as non-conventional neuroimaging could shed light on the underlying mechanisms of cognitive decline in patients with NMOSD. In this regard, the presence of brain lesions at sites of high AQP4 expression, atrophy of deep grey matter structures, and impaired white and grey matter integrity have been proposed to be related to cognitive deficits in NMOSD [22,34]. It should be noted that the pathophysiological substrate of CI in patients with NMOSD is still not completely understood and should be further explored.

Mood disorders and fatigue are other major symptoms described in patients with NMOSD. We found a moderate association between fatigue and lower cognitive performance, while depression was not related to cognition. However, when fatigue and depression were included in the same model (selected using the AIC), together with other variables related to HRQoL and gender, we found a strong association between depression and cognitive performance, suggesting a relationship between patients' psychological wellbeing and their performance on cognitive tasks. The relationship between depression, fatigue and cognition is not straightforward [1,29], but the current results suggest that the combination of both factors exerts a more deleterious effect on cognitive function. Overall, these findings highlight the importance of considering depression and fatigue symptoms in patients with NMOSD in the clinical setting.

Importantly, we found differences in the patient-centred measures between patients with impaired cognition and those with preserved performance. Indeed, in our cohort, we observed that patients with global CI had lower life satisfaction, showed more severe symptom endorsement, and perceived more stigma and pain. Moreover, when we analysed the association between patient-centred measures and cognitive performance in the whole cohort, we found that the global cognitive score was associated with the physical impact of the disease on quality of life, satisfaction with life, self-perception of symptoms and perception of stigma. These findings highlight the impact of cognitive and psychological impairment on the wellbeing of NMOSD patients.

This study has some limitations. First, the cross-sectional design did not allow us to assess the dynamics of the cognitive profile in NMOSD patients. Similarly, causal relationships between cognition and patient-centred measures could not be identified, and we were not able to include any pathological aspects related to brain damage in the linear regression analysis. In addition, although our cohort of patients with NMOSD is not very large, it is similar in size to those of other studies in this field, which reflects the low prevalence of the disease [29,35]. Further studies including more patients will be needed to explore the cognitive profile and the influence of clinical and pathological aspects on cognition. Nevertheless, our study also has several strengths. We described cognitive performance and its relationship with demographic and clinical characteristics and patient-centred measures in a sample of patients treated across 13 different hospitals throughout Spain, which supports the generalisability of the results to clinical practice.

To conclude, about 34% of patients with NMOSD included in our study had cognitive dysfunction, with visual learning and memory and attention-IPS being the most affected cognitive domains. Cognition was mainly associated with mood, fatigue, and the patient's positive attitude toward life and their perception of the disease. Cognitive and psychological assessments may be crucial to achieve a holistic approach in NMOSD patient care.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jpm12050743/s1, Table S1: Demographic, clinical and patient-centred measures differences between AQP4-IgG-positive and -negative patients; Table S2: Demographic, clinical and patient-centred measures differences between CI and CP patients.

**Author Contributions:** Conceptualization, E.L.-S., S.L., R.G.-B., J.M., M.S. and E.S.; Methodology, E.L.-S., S.L., R.G.-B., J.M., M.S. and E.S.; Formal Analysis, E.L.-S. and E.S.; Resources, R.G.-B. and J.M.; Data Curation, J.E.M.-L., S.L., Y.B., F.P.-M., L.F., C.C., M.L.M.-G., I.G.-S., S.B., L.R.-P., Á.P.S., V.M.-L., L.Q., L.C.-F. and M.S.; Writing—Original Draft Preparation, E.L.-S., S.L., R.G.-B., J.M., M.S. and E.S.; Writing—Review & Editing, E.L.-S., S.L., M.S. and E.S.; Supervision, S.L., R.G.-B., J.M., M.S. and E.S.; Funding Acquisition, R.G.-B. and J.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Medical Department of Roche Farma Spain (ML41397). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Institutional Review Board Statement:** The study was conducted in accordance with the Declaration of Helsinki, and approved by the investigational review board of Galicia (CEIm-G, Santiago de Compostela, Spain) (protocol code ML41397 on 27 September 2019).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Qualified researchers may request access to individual patient-level data through the corresponding author. The datasets generated during the analysis of the study are available from the corresponding author on reasonable request.

**Acknowledgments:** The authors would like to acknowledge all patients and their families for making the PERSPECTIVES-NMO study possible.

**Conflicts of Interest:** The authors declare the following potential conflict of interest with respect to their research, authorship and/or the publication of this article: E.L.-S. received travel reimbursement from Sanofi and ECTRIMS and reports personal fees from Roche, during the conduct of the study. J.E.M.-L. has received grants and consulting or speaking fees from Almirall, Biogen, Bristol-Myers-Squibb, Genzyme, Merck, Novartis, Roche and Teva. S.L. received compensation for consulting services and speaker honoraria from Biogen Idec, Novartis, TEVA, Genzyme, Sanofi and Merck. Y.B. received speaking honoraria from Biogen, Novartis, and Genzyme. R.G.-B. and J.M. are employees of Roche Farma Spain. F.P.-M. received compensation for serving on scientific advisory boards or speaking honoraria from Almirall, Biogen Idec, Genzyme, Merck-Serono, Mylan, Novartis, Roche, Sanofi-Aventis and Teva, outside the submitted work. C.C. reports personal fees from Biogen, Sanofi, Merck, Novartis, Teva and Roche, outside the submitted work. I.G.-S. has received funding for research projects or in the form of conference fees, mentoring and assistance for conference attendance from Biogen-Idec, Roche, Merck, Novartis and Sanofi-Genzyme. S.B. has received conference fees, mentoring, and assistance for conference attendance from Bayer, Biogen-Idec, Bristol-Myers Squibb, Roche, Merck, Novartis, Almirall and Sanofi-Genzyme. L.R.-P. received honoraria compensation to participate in advisory boards, collaborations as a consultant and scientific communications and received research support, funding for travel and congress expenses from Roche, Biogen Idec, Novartis, TEVA, Merck, Genzyme, Sanofi, Bayer, Almirall and Celgene. A.P.S. has received personal compensation for consulting, serving on a scientific advisory board or speaking from Almirall, Biogen, Bayer Schering Pharma, Merck Serono, Novartis, Roche, Sanofi-Aventis and Teva. L.Q. reports research grants from Instituto de Salud Carlos III, Ministry of Economy and Innovation (Spain), CIBERER, GBS-CIDP Foundation International, Roche, UCB and Grifols. He provided expert testimony to CSL Behring, Novartis, Sanofi-Genzyme, Merck, Annexon, Johnson and Johnson, Alexion, UCB, Takeda and Roche. He serves on the Clinical Trial Steering Committee for Sanofi Genzyme and is Principal Investigator for UCB's CIDP01 trial. L.C.-F. has received funding for research projects or in the form of conference fees, mentoring and assistance for conference attendance from Bayer, Biogen-Idec, Bristol-Myers Squibb, Biopas, Roche, Merck, Novartis, Almirall, Celgene, Ipsen and Sanofi-Genzyme. M.S. reports speaking honoraria from Roche and UCB Pharma, and travel reimbursement from Sanofi and Zambon. E.S. received travel reimbursement from Sanofi and ECTRIMS and reports personal fees from Roche, during the conduct of the study. The authors report no other conflicts of interest in this work.

#### **References**


## *Review* **Towards Multimodal Machine Learning Prediction of Individual Cognitive Evolution in Multiple Sclerosis**

**Stijn Denissen 1,2,\*, Oliver Y. Chén 3,4, Johan De Mey 1,5, Maarten De Vos 6,7, Jeroen Van Schependom 1,8, Diana Maria Sima 1,2,† and Guy Nagels 1,2,9,†**


**Abstract:** Multiple sclerosis (MS) manifests heterogeneously among persons suffering from it, making its disease course highly challenging to predict. At present, prognosis mostly relies on biomarkers that are unable to predict disease course on an individual level. Machine learning is a promising technique, both in terms of its ability to combine multimodal data and through the capability of making personalized predictions. However, most investigations on machine learning for prognosis in MS were geared towards predicting physical deterioration, while cognitive deterioration, although prevalent and burdensome, remained largely overlooked. This review aims to boost the field of machine learning for cognitive prognosis in MS by means of an introduction to machine learning and its pitfalls, an overview of important elements for study design, and an overview of the current literature on cognitive prognosis in MS using machine learning. Furthermore, the review discusses new trends in the field of machine learning that might be adopted for future studies in the field.

**Keywords:** multiple sclerosis; prognosis; cognition; machine learning; artificial intelligence

#### **1. Introduction**

As one of the most puzzling neurodegenerative disorders, multiple sclerosis (MS) is characterized by a complex biological etiology [1] and highly heterogeneous disability progression. This gives rise to an important unmet need that has received considerable attention in MS research in recent decades: the prediction of its future course [2–5]. In light of an ongoing paradigm shift in medicine from a disease-centered to a patient-centered approach [6], the ability to foresee disability build-up in a specific patient would be a true game changer: neurologists could intervene at an early stage, whereas patients and their caregivers could anticipate future challenges in daily life.

Currently, however, predicting the natural course of MS at the individual level remains challenging. First and foremost, the problem is intrinsically difficult since the disease manifests differently among patients. From a biological point of view, tissue damage in the central nervous system (CNS), caused by autoimmune processes, is not restricted to a single location or to a particular timepoint during the disease course [7]. Typical observations are the presence of lesions, resulting from processes such as demyelination and inflammation, in conjunction with the loss of CNS tissue [7]. Clinically, MS patients also present a wide range of symptoms, ranging from motor and sensory impairments to fatigue, cognitive problems, and mental health issues [8]. Since every person with MS presents a unique biological and clinical profile, health-related predictions should be individualized.

**Citation:** Denissen, S.; Chén, O.Y.; De Mey, J.; De Vos, M.; Van Schependom, J.; Sima, D.M.; Nagels, G. Towards Multimodal Machine Learning Prediction of Individual Cognitive Evolution in Multiple Sclerosis. *J. Pers. Med.* **2021**, *11*, 1349. https://doi.org/10.3390/jpm11121349

Academic Editor: Cristina M. Ramo-Tello

Received: 10 November 2021 Accepted: 9 December 2021 Published: 11 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

At present, the best tools to estimate individual disease progression are the so-called prognostic biomarkers. Ziemssen et al., 2019, define a prognostic biomarker as one that "helps to indicate how a disease may develop in an individual when a disorder is already diagnosed" [9]. Although these variables can be regarded as the cobblestones of the road towards an accurate prognostic model, it is important to note that this term is assigned regardless of the biomarker's prognostic accuracy. Moreover, prognostic biomarkers are typically established at group level, which might be a suboptimal fit in light of the aforementioned heterogeneity across subjects with MS.

In a recent systematic review by Brown et al., 2020, the authors identified several studies that used various statistical techniques to combine prognostic biomarkers [2]. Although the techniques used vary widely, some studies report on the use of machine learning (ML), which allows personalized predictions of how a clinically relevant variable behaves over time. The literature on this topic was synthesized by Seccia et al., 2021, although the authors limited their search to models using clinical data [4]. As can be expected from a young field of research, the methodology underlying papers that use ML for prognostic modelling in MS is highly diverse; heterogeneity in terms of input features, learning algorithms, labels to predict, and assessment metrics hampers comparability among models. The narrative nature of both aforementioned reviews underscores the fact that quantitative synthesis by means of, e.g., meta-analysis or meta-regression, is not yet possible. Furthermore, various models aim to predict disease progression in terms of changes in the Expanded Disability Status Scale (EDSS), while a recent review by Weinstock-Guttmann et al., 2021, questions the use of the EDSS for prognostic purposes due to a lack of accuracy and stability [3]. This review also highlights the importance of looking at other domains, such as cognitive impairment [3]. Problems in various cognitive domains are prevalent in persons with MS, especially in memory and information processing speed [10]. Since cognitive functioning has been shown to be related to socio-economic aspects such as employment status [11] and income [12], prognostication in this domain could allow patients and their caregivers to anticipate future problems at an early stage.

Although the use of machine learning for cognitive prognosis is still in its infancy, this paper aims to offer directions in this field by (1) introducing the concept of machine learning, (2) outlining the pitfalls of machine learning in medical sciences, (3) offering guidance for the design of studies that use ML for cognitive prognosis using lessons learned from ML-powered physical prognosis, (4) summarizing literature on ML-powered cognitive prognostication, and (5) highlighting trends in ML that could boost the field of MS prognosis. Since the main goal of this review is to provide directions for a young field of research rather than to synthesize the scarce literature available, this review adopts a narrative, non-systematic design.

#### **2. An Introduction to Machine Learning**

Machine learning is defined by Oxford University Press (OUP) as: "The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyse and draw inferences from patterns in data" [13]. Although learning and adaptation can happen in multiple ways, typically categorized as "supervised", "unsupervised" and "reinforcement" learning, the most common machine learning technique adopted in the medical sciences is supervised machine learning. The notion of "supervision" here refers to the presence of a ground-truth label to be predicted, which can either be a continuous variable (regression) or a categorical variable (classification). In general, the goal is to learn the relationship, in the form of a function, between a given input and the output, i.e., the ground-truth label. The function that subsequently best predicts the ground-truth label on input data that were not used to learn the function is the model of choice.

The concept can be clarified by means of an analogy: a student studying for an upcoming exam. In the first phase, the student gathers knowledge on the domain using available resources such as books and lecture notes (training). The student subsequently verifies whether additional study is necessary by completing an exam from previous years for which the answers are available (validation). Together, these steps are called the training phase. If necessary, training and validation are repeated until the student is ready to take the final exam, which constitutes the testing phase.
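The student analogy maps onto a three-way split of the available data. The following is a minimal sketch, assuming a simple random split with illustrative fractions (15% validation, 15% test) and a hypothetical function name. With clinical data, splitting at the patient level rather than the visit level is usually needed, so that the same person does not appear in more than one set.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle items reproducibly and split them into training, validation and test sets."""
    if val_frac + test_frac >= 1:
        raise ValueError("validation and test fractions must leave data for training")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives the same split every run
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]  # the final exam: untouched until the end
    val = shuffled[n_test:n_test + n_val]  # past exams with known answers
    train = shuffled[n_test + n_val:]  # books and lecture notes
    return train, val, test
```

Keeping the test set out of every training and validation decision is what makes its error an honest estimate of performance on new patients.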

Let us assume that we want to use supervised machine learning to predict a person's age from a brain magnetic resonance (MR) image. We start from a dataset with T1-weighted brain MR images (input) and the age at image acquisition (ground-truth label). Since age is a continuous variable, we are facing a regression problem. How we learn the relationship between MRI and age depends on how we use the MRI:



**Table 1.** Supervised machine learning techniques exemplified for binary classification and univariate regression. For ease of interpretation, all examples use a low-dimensional feature space. However, the same principle holds when adding features towards higher-dimensional feature spaces.
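The two problem types covered by Table 1 can be illustrated with a minimal sketch: a univariate least-squares regression (e.g., predicting age from a single hypothetical MRI-derived feature) and a nearest-centroid binary classifier on a low-dimensional feature space. Both algorithms are chosen for brevity and may differ from the techniques listed in Table 1; the function names are hypothetical.

```python
def fit_univariate_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_nearest_centroid(points, labels):
    """Learn one mean feature vector (centroid) per class."""
    centroids = {}
    for cls in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cls]
        centroids[cls] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids


def predict_nearest_centroid(centroids, point):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda cls: sum((a - b) ** 2 for a, b in zip(point, centroids[cls])),
    )
```

The same functions work unchanged when more features are added, mirroring the note in the Table 1 caption that the principle carries over to higher-dimensional feature spaces.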



#### **3. Caveats for Machine Learning and Potential Solutions**

Numerous pitfalls can be encountered when performing machine learning. Most of them are generally applicable: they could arise in any machine learning query in any domain. Other hazards, however, are specific to the medical sciences. Both are discussed in this section, together with solutions used in the field of prognostic modelling.

#### *3.1. General Pitfalls in Machine Learning*

The most common pitfall in any machine learning query is overfitting. As already mentioned, a function is learned on training data and evaluated on validation and test data. Overfitting means that the learned function has become too specific to the training data, for example, because it has also learned the measurement errors in that dataset. Since measurement errors differ in another dataset, the function will be less accurate on that dataset. It is also possible, however, to underestimate the complexity of the problem, which is the exact opposite case and is termed underfitting. For example, linear regression assumes a linear relationship between input features and the endpoint, which limits the model to learning only linear relationships, while the problem might be non-linear in reality. Figure 1 serves as a visual aid towards the understanding of under- and overfitting.

**Figure 1.** Bias–variance trade-off curve. Bias and variance vary according to model complexity [16]. The blue curve is E<sub>in</sub>, the in-sample error, i.e., the error on the training dataset. The more complex a function is allowed to be, the more specific it becomes to the training dataset, i.e., overfitting. Overfitting is signalled by the onset of an increase in E<sub>out</sub> (orange curve; its minimum is indicated with the vertical dotted line), the out-of-sample error, i.e., the error on the validation dataset. A simple function suffers from high bias, i.e., it is likely to assume a wrong underlying function, since only limited complexity between input and output can be learned (underfitting). Allowing more complexity decreases the bias, but the function becomes highly variable depending on the dataset used for training (overfitting). An illustration is provided above, where the learned function is the line or curve separating two classes. From visual inspection, the optimal situation would be a smooth curve between the two classes (middle example). In the example on the left, underfitting occurs since only a straight line is allowed; many misclassifications occur in both training and validation data. In the example on the right, the curve winds around all datapoints to fit the training dataset (overfitting), which happens, for example, when the model is allowed to learn a function complex enough to capture measurement errors in a dataset. Hence, the function becomes specific to the training dataset: no misclassifications occur in the training data, but the same curve yields many misclassifications on the validation dataset.
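The behaviour summarized in Figure 1 can be reproduced on toy data. This minimal sketch assumes polynomial regression with the degree as the complexity knob and a noisy sine wave as a hypothetical ground truth (Figure 1 itself uses a classification example). The in-sample error never increases as the degree grows, whereas the out-of-sample error typically falls and then rises again; the exact values depend on the random seed, so none are shown.

```python
import numpy as np


def polynomial_errors(degree, x_train, y_train, x_val, y_val):
    """Fit a polynomial of the given degree; return (E_in, E_out) as mean squared errors."""
    coefs = np.polyfit(x_train, y_train, degree)
    e_in = float(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
    e_out = float(np.mean((np.polyval(coefs, x_val) - y_val) ** 2))
    return e_in, e_out


def true_function(x):
    """Hypothetical underlying relationship between input and output."""
    return np.sin(2 * np.pi * x)


rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20))
x_val = np.sort(rng.uniform(0, 1, 20))
y_train = true_function(x_train) + rng.normal(scale=0.2, size=20)  # noisy training labels
y_val = true_function(x_val) + rng.normal(scale=0.2, size=20)  # independent noise

for degree in (1, 3, 9):  # underfit, reasonable, prone to overfitting
    e_in, e_out = polynomial_errors(degree, x_train, y_train, x_val, y_val)
    print(f"degree {degree}: E_in={e_in:.3f}, E_out={e_out:.3f}")
```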

Overfitting often results from an imbalance between the number of variables and the number of observations in the dataset. As a rule of thumb in the field, the number of observations should be at least 10 times the number of variables [17]. To reach that ratio, an imbalance can be addressed in two ways: increasing the number of observations or reducing the number of variables. When reducing the number of variables, one should always remain vigilant not to underfit, since informative features might be rejected as well.
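The 10:1 rule of thumb translates into simple arithmetic. A minimal sketch with a hypothetical function name, reporting how far a dataset is from the ratio in both directions: how many observations would have to be added, or how many variables removed.

```python
def rebalance_options(n_obs: int, n_vars: int, ratio: int = 10) -> tuple[int, int]:
    """Return (extra observations needed, variables to drop) to reach ratio:1.

    Either option on its own is enough to satisfy the rule of thumb.
    """
    extra_obs = max(0, ratio * n_vars - n_obs)  # upscale the observations
    drop_vars = max(0, n_vars - n_obs // ratio)  # or downscale the variables
    return extra_obs, drop_vars
```

For example, a hypothetical cohort of 150 patients with 40 candidate features would need 250 more patients, or would have to be reduced to 15 features (25 dropped).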


Besides addressing observations and features, we discuss one additional technique to mitigate overfitting: interrupting training. In their efforts to predict disease progression, Bejarano et al., 2011 [22] and Yoo et al., 2016 [23] stopped the training phase early by monitoring the error in the validation set. As can be seen in Figure 1, the error in the training set keeps decreasing, since this is the goal of training. Initially, the same is observed for the validation set, but after reaching a minimum, the validation error gradually increases, indicating the onset of overfitting. Stopping training at this point might mitigate overfitting.
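Early stopping can be sketched as a function over a sequence of validation errors, one per training epoch. This minimal sketch assumes the common "patience" variant: training stops after a fixed number of epochs without improvement, and the model from the best epoch is kept. The exact stopping criteria used by Bejarano et al. and Yoo et al. are not described here, and the function name is hypothetical.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return (best_epoch, stop_epoch) for patience-based early stopping.

    Training stops once the validation error has not improved for `patience`
    consecutive epochs; the model saved at `best_epoch` is the one retained.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error  # new minimum of the validation error
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # no improvement for `patience` epochs: stop
    return best_epoch, len(val_errors) - 1  # ran out of epochs without triggering
```

A patience larger than one avoids stopping on a single noisy uptick in the validation error, at the cost of a few extra epochs of training.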

Class imbalance is a pitfall specific to classification problems and is present when a certain class is overrepresented in the data, i.e., it contains more observations than the other class(es). For prognosis, subjects who do not worsen over time are often in the majority compared with worsening subjects [24,25]. Like overfitting, class imbalance can lead to poor generalization of an algorithm [26]. Methods to correct class imbalance in a deep learning context are summarized in a systematic study by Buda et al., 2018 [26]. Two types of corrections are discussed, addressing either the data or the classifier itself. When addressing the data, the balance can be restored in two ways: by oversampling the minority class or by undersampling the majority class. Alternatively, adjustments can be made when training or testing the classifier. For example, one could penalize a misclassification of one class more severely than a misclassification of another class, i.e., cost-sensitive learning [26]. These three methods have already been explored in the context of prognostic modelling in MS to address the imbalance between stabilizing and worsening subjects [24,25].
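The three corrections can be sketched in a few lines. This is a minimal illustration with hypothetical function names: random oversampling and undersampling by simple duplication or subsampling, and inverse-frequency class weights (n / (k · n_c)) as one common choice of misclassification costs for cost-sensitive learning; the cited studies may have used other schemes.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, count in counts.items():
        members = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_x.append(rng.choice(members))
            out_y.append(cls)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, as large as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        members = [x for x, y in zip(samples, labels) if y == cls]
        for x in rng.sample(members, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y


def inverse_frequency_weights(labels):
    """Per-class misclassification weights for cost-sensitive learning: n / (k * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}
```

Resampling should be applied to the training set only; resampling before the split would let duplicated subjects leak into the validation or test data.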

#### *3.2. Specific Pitfalls for Medical Data*

Besides these general pitfalls, additional pitfalls arise when working with medical data:


#### **4. Designing an ML Study for Cognitive Prognosis**

Supervised machine learning is popular for its ability to provide personalized predictions of health parameters that clinicians are used to working with in routine practice. One such use case is predicting how a patient with a certain condition progresses over time (prognosis) [30]. In the following section, we address relevant questions when designing a machine learning study for cognitive prognosis in MS, using a question and answer (Q&A) approach. Answers are mostly constructed using lessons learned from the literature on ML-powered physical prognosis in MS and the literature on cognitive prognostic biomarkers.

#### *4.1. Which Outcome to Predict?*

As mentioned before, the outcome (categorical versus continuous) defines the type of problem we are facing: classification versus regression. When looking at cognitive outcomes, the most commonly affected domains are information processing speed and memory [10]. According to Sumowski et al., 2018, information processing speed is best assessed with the Symbol Digit Modalities Test (SDMT), whereas for memory, the Brief Visuospatial Memory Test—Revised (BVMT-R), California Verbal Learning Test—Second Edition (CVLT-II), and Selective Reminding Test (SRT) are the most sensitive tests [31]. However, composite scores also exist that provide a more holistic view of the cognitive status of persons with MS; these are summarized in Oreja-Guevara et al., 2019 [32]. To predict a change in these variables, a regression approach could predict a future z-normalized test score [33], which is often the raw test score corrected for age, sex, and education level [33,34]. For classification, a popular categorization distinguishes "stable" from "declining" subjects [35], although wording can differ. In Filippi et al., 2013, for example, the authors defined cognitive worsening as an increase over time in the number of impaired tests in a cognitive test battery, where impairment was defined as a z-normalized test score more than two standard deviations below the normative mean (z < −2) [36]. Colato et al., 2021 defined worsening as a 10% decline in the SDMT score over time [37]. We furthermore note that practice effects can occur when cognitive tests are repeated over time [38]. To correct for this, a "reliable change index" was used in Eijlers et al., 2018 [35] and Cacciaguerra et al., 2019 [39]. Lastly, the outcomes discussed so far are all objective measures of cognition, while subjective, or self-reported, measures are also receiving attention as outcomes for MS prognosis [40].
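The outcome definitions above can be made concrete with a few helper functions. This is a minimal sketch with hypothetical names: the z-score uses a normative mean and standard deviation (in practice often regression-based norms adjusted for age, sex and education), the percentage decline mirrors a 10% SDMT criterion, and the reliable change index follows the classic Jacobson–Truax formula with an optional expected practice effect subtracted, as in practice-adjusted variants. None of this reproduces the exact definitions of the cited studies.

```python
import math


def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize a raw score against normative data (mean and SD of a reference group)."""
    return (raw - norm_mean) / norm_sd


def percent_decline(baseline: float, follow_up: float) -> float:
    """Relative decline, in percent, for a test where higher scores are better (e.g. SDMT)."""
    return 100 * (baseline - follow_up) / baseline


def reliable_change_index(baseline: float, follow_up: float, sd: float,
                          retest_r: float, practice_effect: float = 0.0) -> float:
    """Jacobson-Truax reliable change index, optionally corrected for practice effects.

    sd: standard deviation of the test in a reference sample
    retest_r: test-retest reliability coefficient of the test
    practice_effect: expected mean gain on retesting, subtracted from the change
    """
    sem = sd * math.sqrt(1 - retest_r)  # standard error of measurement
    se_diff = math.sqrt(2) * sem  # standard error of the difference score
    return (follow_up - baseline - practice_effect) / se_diff
```

Under the usual Jacobson–Truax convention, |RCI| > 1.96 is taken as reliable change at the 5% level.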

#### *4.2. Which Features to Take into Account?*

To be able to predict a future change in the variable of interest, the input of the machine learning model should receive careful consideration. Except when modelling on raw input data, learning should occur on features that are deemed informative for the outcome to be predicted. To this end, we can use prognostic biomarkers, which have been studied intensively in recent decades. However, although evidence on cognitive prognostic biomarkers exists, comprehensive reviews on the topic have mainly addressed physical deterioration. We refer to reviews that summarize prognostic biomarkers for different modalities: demographics [41], clinical information [41], CNS imaging [42–45], molecular information [9], and neurophysiology [45]. Physical and cognitive prognostic biomarkers do, however, appear to overlap. Although it is beyond the scope of this review to provide a summary of cognitive biomarkers, we refer to studies that identified cognitive prognostic biomarkers for different modalities such as demographics [35,46,47], clinical information [35,46,47], MRI [35,46,48], optical coherence tomography (OCT) [49], molecular information [50], and neurophysiology [51].

In analogy with the previous question on outcomes, subjective measures might also be informative for the prediction of disease course, such as patient-reported outcomes (PRO) [52]. Specifically for cognitive prognosis, features such as subjective cognitive impairment [47] and perceived ability to concentrate [53] were found to be informative.

#### *4.3. On Which Time-Frame Should Predictions Be Made?*

The literature usually distinguishes between short-term and long-term prognosis. No clear cut-off between them has been reported, and it most probably depends on the clinical query being addressed. Short-term prognosis is by far the most intensively studied [23,25,28], while Zhao et al., 2017 presented a longer-term predictive model with a 5-year horizon [24]. Yperman et al., 2020 stated that their rationale for a 2-year timeframe was to maximize the number of observations in the dataset [28]. Limited data availability is likely to hinder the field in performing longer-term predictions using machine learning, but studies investigating prognostic biomarkers for long-term disability already show promising results [36,46].

#### *4.4. Which Machine Learning Algorithm to Use?*

Given the heterogeneity in methodology throughout the literature, it is premature to make firm statements about the superiority of one algorithm over another in terms of performance. A second consideration, however, is model complexity: linear models may underfit the data, but they are easy to interpret and familiar to clinicians. As illustrated by Sidey-Gibbons et al., 2019 [54], algorithms capable of handling increased complexity are generally harder to understand. This is the case, for example, for (deep) neural networks, which are often regarded as black-box models [54].

#### *4.5. How to Assess a Machine Learning Model?*

Classification models typically yield a so-called confusion matrix. In the case of a dichotomous endpoint, the confusion matrix is a 2 × 2 matrix with one axis indicating the true group labels and the other axis the predicted group labels. An example using the labels "worsening" versus "stabilizing" is illustrated in Figure 2, along with the metrics that can be calculated from this matrix. The different metrics allow us to study model performance from different perspectives. In the confusion matrix of Figure 2, low sensitivity leaves worsening patients undetected, which leads neurologists to falsely assume that their patient is stabilizing. Withholding treatment when it is in fact warranted may endanger the patient's well-being. The opposite holds for low specificity: patients who do not worsen over time might receive treatment that could potentially induce adverse events.

**Figure 2.** The confusion matrix and its derived metrics.
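To make the derived metrics concrete, here is a minimal Python sketch that assumes "worsening" is the positive class. It computes the metrics most commonly derived from a 2 × 2 confusion matrix; the exact set shown in Figure 2 may differ, and the function names and counts below are illustrative rather than taken from any study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """2 x 2 confusion matrix with "worsening" as the positive class."""
    tp: int  # worsening, predicted worsening
    fn: int  # worsening, predicted stabilizing (missed deterioration)
    fp: int  # stabilizing, predicted worsening (possible overtreatment)
    tn: int  # stabilizing, predicted stabilizing

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Positive predictive value (precision)."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Negative predictive value."""
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)


def confusion_matrix(y_true, y_pred, positive="worsening") -> ConfusionMatrix:
    """Count the four cells from paired true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return ConfusionMatrix(tp, fn, fp, tn)


# Illustrative labels only: 20 worsening and 80 stabilizing subjects.
y_true = ["worsening"] * 20 + ["stabilizing"] * 80
y_pred = (["worsening"] * 15 + ["stabilizing"] * 5
          + ["worsening"] * 10 + ["stabilizing"] * 70)
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(f"sensitivity={cm.sensitivity():.3f} specificity={cm.specificity():.3f}")
```

Keeping the raw cell counts in one object makes it easy to report the full matrix alongside any derived score.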

Regarding regression performance, the most intuitive metric is the mean absolute error (MAE): it represents how much, on average, the predicted value deviates from the true value, regardless of whether the prediction is an under- or overestimation. Unlike related metrics such as the mean squared error (MSE), which is expressed in squared units, and the normalized root-mean-square error (NRMSE), which is unitless, MAE retains the unit of the outcome variable; the root-mean-square error (RMSE [55]) also retains this unit but penalizes large errors more heavily. Other performance metrics include the correlation between the true and predicted outcome [56], the variance explained by the input features (R²) [55], and the Akaike Information Criterion [55].
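The following sketch computes these regression metrics from paired true and predicted values. Two assumptions should be noted: NRMSE is normalized here by the range of the true values (other conventions divide by the mean or the standard deviation), and R² is computed as 1 − SS_res/SS_tot, which can become negative for out-of-sample predictions. The Akaike Information Criterion is omitted because it requires the model's likelihood rather than just its predictions. The data are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, range-normalized RMSE, Pearson r and R^2."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # same unit as the outcome
    mse = sum(e * e for e in errors) / n           # squared unit
    rmse = math.sqrt(mse)                          # same unit, penalizes large errors
    nrmse = rmse / (max(y_true) - min(y_true))     # unitless; range normalization assumed

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot                       # can be negative out of sample

    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    sd_t = math.sqrt(ss_tot)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    r = cov / (sd_t * sd_p)                        # Pearson correlation
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "NRMSE": nrmse, "r": r, "R2": r2}


# Hypothetical true and predicted scores for four subjects.
y_true = [2.0, 3.5, 4.0, 6.0]
y_pred = [2.5, 3.0, 4.5, 5.0]
for name, value in regression_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
```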

#### *4.6. How Should Authors Report the Performance of Their Machine Learning Model?*

Solid interpretation and comparability of models stand or fall with how papers describe their methodology and performance. As discussed in the previous subsection, different performance metrics give different insights into model performance. Although the importance of a given metric mostly depends on the domain context, it is essential not only to report scores such as accuracy, sensitivity, and specificity, but also to present the raw confusion matrix in classification problems. For regression, a two-column data frame with the predicted and true (ground-truth) values allows the calculation of measures such as the MAE, NRMSE, RMSE, MSE, and correlation coefficient. Providing such results in publications (e.g., in supplementary materials [27]) would be a leap forward in terms of reproducible research, while the anonymity of subjects remains assured.

The benefit is twofold. Firstly, readers of machine learning papers can extract other metrics of interest to them. Secondly, it would allow future reviews on machine learning models to move beyond a narrative design. In systematic reviews, for example, meta-analysis and meta-regression allow for the quantitative synthesis of data, which is possible because randomized controlled trials (RCTs) are strongly recommended to adhere to the CONSORT statement [57], guiding RCT authors towards correct, transparent, and complete reports. Although the CONSORT statement is not applicable to machine learning research, another statement in the "Enhancing the QUAlity and Transparency Of health Research" (EQUATOR, https://www.equator-network.org/, accessed on 8 December 2021) network is in fact applicable: the "Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis" (TRIPOD) statement [58].
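As a sketch of the two-column format recommended for regression results, the snippet below exports (predicted, true) pairs as CSV text, as one might for a supplementary file, and shows that a reader can recompute a metric such as the MAE from that table alone. The column names and values are hypothetical, and no subject identifiers are included.

```python
import csv
import io


def export_predictions(rows):
    """Write (predicted, true) pairs to CSV text, e.g. for a supplementary file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["predicted", "true"])
    writer.writerows(rows)
    return buffer.getvalue()


def mae_from_csv(text):
    """Recompute the mean absolute error from the exported two-column table."""
    reader = csv.DictReader(io.StringIO(text))
    pairs = [(float(r["predicted"]), float(r["true"])) for r in reader]
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


# Hypothetical predictions; only the two outcome columns are exported.
table = export_predictions([(2.5, 2.0), (3.0, 3.5), (4.5, 4.0), (5.0, 6.0)])
print(table)
print(mae_from_csv(table))  # 0.625
```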

#### *4.7. When Is a Model Ready for Clinical Practice?*

In order to introduce a predictive model in clinical practice, extensive technical validation and clinical performance evaluation are required, complemented by ethical considerations and risk analysis. There needs to be maximal transparency about the model's performance, so that regulators and clinicians can establish whether its error is acceptable in view of the potential risks to patients. However, when do we judge a machine learning model to perform well enough to be translated into a clinical decision support system (CDSS) [59]? A first milestone is whether it performs better than chance; in a second phase, it should compare favorably against potentially simpler models, such as decision rules based on single prognostic biomarkers. Among other factors, model complexity might influence clinicians' trust in artificial intelligence (AI) [60]. Furthermore, it would be informative to know how the model's prognostic accuracy relates to that of human prediction, in this case by the neurologist. Although the literature on the latter is scarce, we identified one paper on neurologists' accuracy in detecting cognitive impairment in MS, albeit in a cross-sectional setting [61]. The authors found this accuracy to be comparable to chance and highlighted the need for improved cognitive screening [61]. To benchmark how a model would perform under conditions similar to actual clinical practice, study designs should directly compare the prognostic accuracy of medical professionals with and without the assistance of the CDSS under consideration. A typical scenario is to test whether the CDSS helps bridge the gap between medical professionals with different levels of experience. For instance, an ongoing trial investigates the prognostic accuracy of junior and senior doctors in the domain of traumatic brain injury [62].

We note that although some models might be complex, several methods exist to enhance clinicians' trust. In Tousignant et al., 2019 [25], deep learning, currently one of the most complex classes of machine learning algorithms, was used to predict EDSS worsening from MR images. The authors used a two-step process to gain clinicians' trust: quantifying the model's confidence in its own predictions, and verifying whether the model is correct when it is confident [25]. A whole field of research, explainable AI (XAI), is dedicated to, among other things, augmenting user trust [63].
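The second step, checking whether a model is correct when it is confident, can be sketched as follows. This is not the method of Tousignant et al.: as an assumption, confidence is taken as max(p, 1 − p) of a binary predicted probability, a simple stand-in for a proper uncertainty estimate, and the probabilities and labels are hypothetical. Restricting evaluation to confident predictions trades coverage (the share of patients receiving a prediction) for accuracy.

```python
def selective_performance(probs, labels, threshold):
    """Coverage and accuracy restricted to predictions whose confidence >= threshold.

    Confidence is taken as max(p, 1 - p) for a binary probability p; this is a
    simple stand-in for a model's own uncertainty estimate.
    """
    kept = [(p, y) for p, y in zip(probs, labels) if max(p, 1 - p) >= threshold]
    coverage = len(kept) / len(probs)
    if not kept:
        return coverage, float("nan")
    correct = sum(int(p >= 0.5) == y for p, y in kept)
    return coverage, correct / len(kept)


# Hypothetical predicted probabilities of worsening (1) and true outcomes.
probs = [0.95, 0.90, 0.85, 0.60, 0.55, 0.45, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 1, 0, 0]
for threshold in (0.5, 0.75):
    coverage, acc = selective_performance(probs, labels, threshold)
    print(f"threshold={threshold}: coverage={coverage:.3f}, accuracy={acc:.3f}")
```

In this toy example, all errors occur among low-confidence predictions, so accuracy on the confident subset rises while coverage drops.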

#### *4.8. Which Data to Use?*

To address this question, we refer back to the section on "Specific Pitfalls for Medical Data", where we discussed study versus real-world data, single- versus multi-center data, and dealing with multiple visits of the same subject.

#### **5. State-of-the-Art ML-Powered Cognitive Prognostic Models**

Literature in the field is scarce. This was confirmed by a PubMed search using the following search strategy: "(((multiple sclerosis[MeSH Terms]) OR (multiple sclerosis)) AND ((cognit\*) OR (cognition[MeSH Terms]))) AND ((((machine learning[MeSH Terms]) OR (machine learning)) OR (artificial intelligence[MeSH Terms])) OR (artificial intelligence))", which was run on 3 December 2021 and yielded 39 records. Among those, we identified two studies that used machine learning for cognitive prognosis: Kiiski et al., 2018 [56] and Lopez-Soley et al., 2021 [64]. Kiiski et al., 2018 used supervised machine learning on different combinations of multimodal data, including demographic, clinical, and electroencephalography (EEG) data, to predict in the short term (1) overall cognitive performance and (2) information processing speed performance, in a combined sample of persons with MS and healthy controls [56]. Lopez-Soley et al., 2021 also used multimodal data, including demographic, clinical, and MRI data, to predict short-term future cognitive impairment. This section is dedicated to the lessons that can be learned from their efforts.

#### Kiiski et al., 2018

First of all, the use of multimodal data is a good choice in light of the complex nature of MS and the identification of prognostic biomarkers in multiple domains. Moreover, previous literature in the field of epilepsy established the superiority of multimodal data over a single modality for machine learning predictions [65]. Secondly, the authors chose to z-normalize the results of each neuropsychological test based on the mean and standard deviation (SD) of their sample, and to use composite z-scores (the average z-score of multiple tests) as the ground-truth label. One composite score was created for general cognitive functioning and one for information processing speed. Although transforming raw test results allows comparison between, and aggregation of, different tests, the downside lies in clinical interpretation: clinicians have a reference frame for the original test results, but not for z-scores. Thirdly, the authors extracted over 1000 spatiotemporal features, whereas only 78 observations were used. This constitutes a large imbalance with a risk of overfitting, especially considering the aforementioned rule of thumb of at least 10 times as many observations as features. The risk of overfitting might, however, have been reduced for several reasons:


Fourthly, we previously mentioned the importance of benchmarking to obtain a reference frame for the quality of the prediction. For this, the authors created a "null model" by shuffling the ground-truth values across subjects before starting the learning phase. According to the authors, this provides an intuition into the "level of optimism inherent in the model" [56]. Lastly, as highlighted earlier, using a combined sample of persons with MS and healthy controls increases the sample size but obscures a clear interpretation of the model's value for prognosis in MS. Their best-performing model for general cognitive functioning included all available data modalities and yielded a mean cross-validated correlation of 0.44.
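The null-model idea can be illustrated with a minimal sketch. Assumptions: the toy data generated in the code, a one-feature least-squares model, and leave-one-out cross-validation stand in for the authors' actual features and learning pipeline; only the label-shuffling logic mirrors the description above. Repeating the shuffle many times yields a distribution of "chance" cross-validated correlations against which the real score can be compared.

```python
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least-squares intercept and slope for a single feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def cv_correlation(x, y):
    """Correlation between leave-one-out predictions and the true values."""
    preds = []
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(intercept + slope * x[i])
    return pearson(preds, y)


def permutation_null(x, y, n_perm=200, seed=0):
    """Shuffle the ground truth across subjects before learning, n_perm times."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        null.append(cv_correlation(x, shuffled))
    return null


# Toy data with a genuine linear relationship (hypothetical).
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(40)]
y = [0.6 * a + rng.gauss(0, 0.8) for a in x]

observed = cv_correlation(x, y)
null = permutation_null(x, y)
p_value = (1 + sum(s >= observed for s in null)) / (1 + len(null))
print(f"CV r = {observed:.2f}, permutation p = {p_value:.3f}")
```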

#### Lopez-Soley et al., 2021

In contrast to the regression approach of Kiiski et al., 2018 [56], Lopez-Soley et al., 2021 used a classification approach to predict future global and domain-specific cognitive impairment [64]. The risk of overfitting was reduced by using Lasso regularization in logistic regression, 10-fold cross-validation, and retaining as much data as possible by imputing missing values.
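One practical detail when combining imputation with 10-fold cross-validation is that the imputation should be learned on the training folds only, so that the held-out fold does not leak into the model. The sketch below shows this with column-mean imputation, a simple stand-in chosen as an assumption; the imputation method of Lopez-Soley et al. is not described here. The data are hypothetical, and the Lasso-penalized logistic regression itself is left as a placeholder (for instance, scikit-learn's `LogisticRegression` with `penalty="l1"` could be fit on each imputed training fold).

```python
import random
from statistics import mean


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices once and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_column_means(rows):
    """Per-column mean of the observed (non-missing) values."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(mean(observed) if observed else 0.0)
    return means


def impute(rows, means):
    """Replace missing values (None) with the supplied column means."""
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def cv_splits(X, k=10, seed=0):
    """Yield imputed (train, test) splits; imputation is fit on the train fold only."""
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [X[i] for i in range(len(X)) if i not in held_out]
        test = [X[i] for i in test_idx]
        means = fit_column_means(train)  # learned without seeing the test fold
        yield impute(train, means), impute(test, means)


# Hypothetical feature matrix (20 subjects x 3 features) with missing values.
rng = random.Random(1)
X = [[None if rng.random() < 0.15 else rng.gauss(0, 1) for _ in range(3)]
     for _ in range(20)]
for train, test in cv_splits(X):
    # A penalized classifier would be fit on `train` and scored on `test` here.
    assert all(v is not None for row in train + test for v in row)
```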

Since cognitively impaired subjects were underrepresented for global cognition and all cognitive domains, it was good practice that the authors used the "balanced accuracy" ((sensitivity + specificity)/2) to assess model performance across cognitive domains. The difference with accuracy (cf. Figure 2) can be clarified with an example. Suppose that in a dataset, 20 persons with MS experience cognitive decline, and 80 do not. If the model correctly classifies 70 of the 80 stabilizing subjects, but only 5 of the 20 worsening subjects, it achieves an accuracy of (70 + 5)/100 = 75%. The balanced accuracy, however, is ((5/20) + (70/80))/2 = 56.25%. Hence, the balanced accuracy might be preferred for imbalanced datasets. For perfectly balanced datasets, accuracy and balanced accuracy yield the same value. Based on this metric, the authors reported the best performance for verbal memory (79%) and for attention/information processing speed (73%).
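The worked example above can be reproduced in a few lines, with "worsening" as the positive class. The last two lines add a trivial model that labels everyone as stabilizing: on the same 20/80 split it reaches 80% accuracy, yet only 50% balanced accuracy, which is why reporting the class distribution matters.

```python
def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def balanced_accuracy(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)  # share of worsening subjects detected
    specificity = tn / (tn + fp)  # share of stabilizing subjects recognized
    return (sensitivity + specificity) / 2


# Worked example from the text: 20 worsening, 80 stabilizing subjects;
# 5 of 20 worsening and 70 of 80 stabilizing subjects are classified correctly.
print(accuracy(tp=5, fn=15, fp=10, tn=70))           # 0.75
print(balanced_accuracy(tp=5, fn=15, fp=10, tn=70))  # 0.5625

# A trivial model that predicts "stabilizing" for everyone.
print(accuracy(tp=0, fn=20, fp=0, tn=80))            # 0.8
print(balanced_accuracy(tp=0, fn=20, fp=0, tn=80))   # 0.5
```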

We note that by reporting the true class distribution, the authors greatly contributed to the interpretation of their result, as any evaluation metric can now be assessed with respect to that reference frame.

Overall, both studies provide valuable insights for the future design of machine learning studies for cognitive prognosis in MS. Although predictions in Kiiski et al., 2018 [56] were obtained on a sample of both persons with MS and healthy controls, the predictive performance of both studies might serve as a benchmark for evaluating future studies in the field.

#### **6. ML Trends and Opportunities for Prognostic Modelling in MS**

Although studies dealing with prognostic modelling of cognitive evolution in MS are scarce, we see several interesting avenues for ML-driven prognostication in MS. We will discuss alternative approaches for prognostication, the simulation of treatment response and solutions to scarcity of longitudinal data.

#### *6.1. Alternative Approaches for Prognostication*

Hybrid predictions. Tacchella et al., 2018 introduced "hybrid predictions" [29] as a proof of concept in the field of MS prognosis. The authors hypothesized that human and machine, which "reason" differently, could in fact complement each other. Their results showed that aggregating human (medical students) and machine predictions consistently outperformed either source alone in predicting conversion from relapsing–remitting to secondary progressive MS [29]. Beyond performance, the continued involvement of human intelligence could reassure clinicians that predictions do not rely solely on artificial intelligence, but also on expert knowledge that algorithms might not be able to learn.
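A minimal sketch of aggregating human and machine predictions, assuming simple (weighted) averaging of predicted probabilities; the aggregation rule used by Tacchella et al. may differ, and the probabilities below are toy values. Because squared error is convex, the Brier score of the averaged prediction never exceeds the average of the two individual Brier scores; beating both sources, as in this toy example, additionally requires their errors to be complementary.

```python
def brier(probs, labels):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def accuracy_at(probs, labels, cutoff=0.5):
    """Share of subjects whose thresholded prediction matches the outcome."""
    return sum(int(p >= cutoff) == y for p, y in zip(probs, labels)) / len(labels)


def hybrid(human, machine, weight=0.5):
    """Weighted average of human and machine probabilities (weight on the human)."""
    return [weight * h + (1 - weight) * m for h, m in zip(human, machine)]


# Toy example: 1 = converts to secondary progressive MS, 0 = does not.
labels = [1, 1, 0, 0]
human = [0.9, 0.4, 0.6, 0.1]
machine = [0.5, 0.8, 0.2, 0.5]
combined = hybrid(human, machine)
for name, probs in (("human", human), ("machine", machine), ("hybrid", combined)):
    print(f"{name}: Brier={brier(probs, labels):.3f}, "
          f"accuracy={accuracy_at(probs, labels):.2f}")
```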

Digital twin. The field of machine learning for MS prognostication is broadly geared towards augmenting personalized care with personalized predictions. Since the prediction relies on a subject's profile in terms of multimodal data, a subject can also be represented digitally, i.e., as a digital twin. The concept of a digital twin was discussed at length in a recent review by Voigt et al., 2021, highlighting its potential to predict future disease course and simulate treatment effects [67].

#### *6.2. Simulation of Treatment Response*

To date, studies on prognostication have mostly focused on predicting the natural course of multiple sclerosis. In our view, this is a necessary step towards predicting, in a personalized way, how this natural course changes when a given treatment, such as a disease-modifying therapy (DMT), is administered. Although such estimates might be even more challenging, Pruenza et al., 2019 aimed to predict individual responses to 14 different DMTs [68]. The authors assigned each DMT a score representing the likelihood of no disability progression if that DMT were administered [68]. Beyond this research effort, the authors created a tool that allows users to predict treatment response in new patients [68].

#### *6.3. Solutions to Scarcity of Longitudinal Data*

Transfer learning. A potential solution to the scarcity of longitudinal data is to avoid building a model from scratch by reusing a robustly trained model from another domain, typically one related to the domain of interest. Neural networks are typically used to this end. Since the network's weights are meaningful for solving a related task, they can serve as the initialization for the task of interest and then be updated using a smaller dataset. For example, Nanni et al., 2020 used networks pretrained on the ImageNet database [69], which contains pictures of everyday objects numbering in the millions, for prognostic purposes in Alzheimer's disease, where the number of MR images was in the order of hundreds [70].

Federated learning. For various reasons, data sharing in medical sciences remains delicate [71], which might explain why efforts in ML-powered prognostication remain largely single-center, extracting data from a single central database (centralized approach). However, an increasing number of studies [72,73] demonstrate that machine learning can also occur in a decentralized way, i.e., by federated learning: the data remain at their original location, models are trained locally, and only model updates are shared and aggregated at a central location.
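To illustrate the principle, here is a minimal sketch of federated averaging (FedAvg), one common federated learning scheme; it is not necessarily the scheme used in the cited studies. Three hypothetical centers each fit a one-parameter linear model y ≈ w·x on their own data with a few gradient steps, and a coordinating server only receives and averages the fitted weights, weighted by each center's sample size. The data are synthetic and noise-free, so the shared weight converges to the true slope of 2.

```python
def local_update(w, data, lr=0.05, epochs=5):
    """Gradient descent on mean squared error for y ~ w * x, run at one site."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(sites, rounds=50, w=0.0):
    """Each round: sites train locally from the shared w; the server averages the
    returned weights, weighted by site sample size. Raw data never leave a site."""
    total = sum(len(data) for data in sites)
    for _ in range(rounds):
        local_weights = [local_update(w, data) for data in sites]
        w = sum(len(data) * wk for data, wk in zip(sites, local_weights)) / total
    return w


# Three hypothetical centers whose data follow y = 2x over different ranges.
sites = [
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0)],
    [(2.0, 4.0), (4.0, 8.0)],
]
print(federated_averaging(sites))  # close to 2.0
```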

Continual learning. In continual learning, an AI model is not trained once, but evolves over time, improving its performance as new data continue to arrive. The implications of this technique in medical sciences are discussed in Lee et al., 2020 [74].

#### **7. Conclusions**

Machine learning is a rising concept in light of clinical decision support systems and personalized medicine and could boost the quest to find a suitable predictive algorithm for prognosis in MS. Investigations should however also address cognitive deterioration, and authors should be maximally transparent in reporting their results to allow comparison in the field. In doing so, clinical decision support systems using machine learning to predict future cognitive deterioration in MS could become a reality in clinical practice, providing the best possible personalized care for persons with MS.

#### **8. Key Messages**


**Author Contributions:** Conceptualization, S.D., J.V.S., D.M.S. and G.N.; methodology, S.D., J.V.S., D.M.S. and G.N.; formal analysis, S.D.; investigation, S.D.; resources, S.D.; writing—original draft preparation, S.D.; writing—review and editing, S.D., O.Y.C., J.D.M., M.D.V., J.V.S., D.M.S. and G.N.; visualization, S.D.; supervision, D.M.S. and G.N.; project administration, S.D.; funding acquisition, S.D., J.V.S., D.M.S. and G.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** Stijn Denissen is funded by a Baekeland grant appointed by Flanders Innovation and Entrepreneurship (HBC.2019.2579, www.vlaio.be, accessed on 8 December 2021); Guy Nagels received research grants from Biogen and Genzyme, and is a senior clinical research fellow of the FWO Flanders (1805620N, www.fwo.be, accessed on 8 December 2021); and Jeroen Van Schependom is a senior research fellow of VUB.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** Stijn Denissen is preparing an industrial PhD in collaboration with icometrix. Diana M. Sima is employed by icometrix. Guy Nagels is medical director of neurology at, and minority shareholder of, icometrix. The other authors declare no conflict of interest.

#### **References**


## *Article* **Associations between Lifestyle Behaviors and Quality of Life Differ Based on Multiple Sclerosis Phenotype**

**Nupur Nag 1,\*, Maggie Yu 1, George A. Jelinek 1, Steve Simpson-Yap 1,2, Sandra L. Neate 1 and Hollie K. Schmidt 3**


**Abstract:** Multiple sclerosis (MS), a neuroinflammatory disorder, occurs as non-progressive or progressive phenotypes; both forms present with diverse symptoms that may reduce quality of life (QoL). Adherence to healthy lifestyle behaviors has been associated with higher QoL in people with MS; whether these associations differ based on MS phenotype is unknown. Cross-sectional self-reported observational data from 1108 iConquerMS participants were analysed. Associations between lifestyle behaviors and QoL were assessed by linear regression, and phenotype differences via moderation analyses. Diet, wellness, and physical activity, but not vitamin D or omega-3 supplement use, were associated with QoL. Specifically, certain diet types were negatively associated with QoL in relapsing-remitting MS (RRMS), and positively associated in progressive MS (ProgMS). Participation in wellness activities had mixed associations with QoL in RRMS but was not associated in ProgMS. Physical activity was positively associated with QoL in RRMS and ProgMS. Phenotype differences were observed in diet and wellness with physical QoL, and physical activity with most QoL subdomains. Our findings show lifestyle behaviors are associated with QoL and appear to differ based on MS phenotype. Future studies assessing timing, duration, and adherence of adopting lifestyle behaviors may better inform their role in MS management.

**Keywords:** multiple sclerosis; lifestyle behavior; MS management; MS phenotype; quality of life

#### **1. Introduction**

Multiple sclerosis (MS), a chronic neuroinflammatory disorder, is commonly diagnosed in adults, predominantly women, aged 20 to 30 years [1]. On initial diagnosis, 85% of people with MS (pwMS) are diagnosed with relapsing-remitting MS (RRMS) presenting with acute attacks of new or increasing neurologic symptoms, and 10–15% with primary progressive MS (PPMS) defined by deterioration of symptoms from onset without obvious relapses or remission [2]. Within 15–20 years of diagnosis, approximately 50–75% of RRMS cases convert to secondary progressive MS (SPMS) defined by gradual worsening of neurologic function alongside a general cessation of relapses [3].

Both RRMS and progressive MS (ProgMS) may manifest an array of physiological, psychological, and motor symptoms; the number and severity of these symptoms and associated impairment play a critical role in quality of life (QoL). Symptoms of fatigue, pain, cognitive impairment, depression, and disability are key predictors of worse QoL up to 10 years later [4]. Improvement of symptoms through adoption of healthy lifestyle behaviors has potential to improve QoL.

Healthy lifestyle behaviors, including diet, vitamin D and omega-3 supplementation, and participation in wellness and physical activities, have previously been found to be associated with higher QoL. PwMS who adhered to either high-quality, MS-specific, or anti-inflammatory diets have reported improved mental and physical QoL [5–7]. Vitamin D supplementation improved QoL in pwMS with initial levels lower than 30 ng/mL [8] and was associated with improved physical QoL in pwMS reporting an average daily intake of over 5000 IU [9]. Less is known about the effects of omega-3 supplement use, though in an international cohort of over 2500 pwMS, those self-reporting frequent fish consumption and taking omega-3 supplements had better QoL [10]. Research on wellness activity participation, ranging from Tai Chi and exercise therapy to mindfulness, relaxation, and imagery, has yielded mixed evidence of associations with QoL, though participation is generally reported to be beneficial for physical and mental QoL [11–14]. The benefits of physical activity for wellbeing are well established, with primarily aerobic forms benefiting social, physical, and mental QoL in pwMS [15].

**Citation:** Nag, N.; Yu, M.; Jelinek, G.A.; Simpson-Yap, S.; Neate, S.L.; Schmidt, H.K. Associations between Lifestyle Behaviors and Quality of Life Differ Based on Multiple Sclerosis Phenotype. *J. Pers. Med.* **2021**, *11*, 1218. https://doi.org/10.3390/jpm11111218

Academic Editor: Cristina M. Ramo-Tello

Received: 4 October 2021; Accepted: 15 November 2021; Published: 17 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Though the benefits of healthy lifestyle behaviors on QoL are evident, whether the effects are similar across MS phenotypes is unclear as most studies report on populations of mixed phenotype. As people with ProgMS are generally less responsive to therapies, have greater disability and more severe symptoms than those with RRMS [16–18], it is probable that the effects of lifestyle behaviors on QoL also differ. Therefore, we aim to differentiate associations of lifestyle behaviors with QoL between phenotypes, which may provide insight into personalised management strategies specific to disease course.

#### **2. Materials and Methods**

#### *2.1. Study Design and Participants*

Since 2014, recruitment to the iConquerMS observational study has been ongoing and open to pwMS and the general population aged ≥21 years. The study is promoted by the sponsoring organization, Accelerated Cure Project for MS, and by partner organizations and individuals via online, print, and in-person communication. Consenting participants are asked to voluntarily complete a series of self-reported online surveys capturing demographics, health and clinical outcomes, and lifestyle behaviors at 6-month intervals. Responding to questions at any timepoint is optional.

De-identified baseline data from participants who registered in the study from November 2014 to July 2020 (*n* = 3374) were extracted. Inclusion criteria were a clinician-confirmed MS diagnosis, confidence in the MS diagnosis, and completion of the diet and wellness, physical activity, QoL, and disability surveys. Participants with RRMS, SPMS, or PPMS phenotypes were included, with SPMS and PPMS consolidated into one progressive MS group. Participants reporting clinically isolated syndrome, radiologically isolated syndrome, or a not sure/don't know MS phenotype were excluded. Ethics approval ID #1956113.1.

#### *2.2. Demographics and Clinical Outcomes*

Age (from date of birth), sex (male, female), highest level of education (no formal education, elementary-middle school, high school, high school graduate, some college, associate degree, technical degree, bachelor's degree, master's degree, doctoral degree), partner status (never married, married, divorced, separated, widowed, cohabitation/domestic partner, prefer not to answer), employment status (employed outside home, employed at home, homemaker, student, worker's compensation, unemployed looking for work, disabled), country of birth (global country list), ethnicity (American Indian/Alaska Native, Middle Eastern, South Asian, other Asian, Black/African American, Native Hawaiian/Pacific Islander, White, don't know), and annual household income (<USD15,000 to >USD200,001 in increments of USD15,000) were queried and re-categorized.

MS duration was calculated from the year of diagnosis and the year of survey completion. Body mass index (BMI) was calculated as weight (kg)/height (m)<sup>2</sup> and then categorized as underweight, normal, overweight, or obese according to World Health Organisation classifications [19]; underweight and normal were combined due to the small sample size of the former group. Disability was measured via Patient Determined Disease Steps (PDDS), and scores were collapsed into low (0–2), moderate (3–5), and high (6–8) disability as per guidelines [20].
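The derived clinical variables can be sketched in Python as follows. The BMI cut-offs are the standard WHO adult thresholds (underweight <18.5, normal 18.5 to <25, overweight 25 to <30, obese ≥30), with underweight merged into normal as described; the PDDS bands follow the text. Computing MS duration as a simple difference in years is an assumption, since the exact calculation is not specified, and the function names are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """WHO adult cut-offs, with underweight (<18.5) merged into normal as in the study."""
    if value < 25.0:
        return "underweight/normal"
    if value < 30.0:
        return "overweight"
    return "obese"


def pdds_category(score):
    """Collapse Patient Determined Disease Steps (0-8) into three disability levels."""
    if not 0 <= score <= 8:
        raise ValueError("PDDS scores range from 0 to 8")
    if score <= 2:
        return "low"
    if score <= 5:
        return "moderate"
    return "high"


def ms_duration_years(diagnosis_year, survey_year):
    """Assumed computation: whole years between diagnosis and survey completion."""
    return survey_year - diagnosis_year


print(round(bmi(70, 1.75), 1), bmi_category(bmi(70, 1.75)))  # 22.9 underweight/normal
print(pdds_category(4), ms_duration_years(2010, 2020))       # moderate 10
```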

#### *2.3. Lifestyle Behaviors*

Variables within the diet (*n* = 23), wellness (*n* = 25), vitamin (*n* = 15), and supplement (*n* = 29) categories were each queried via tick-box options of "used" and/or "used and helpful" in the past 6 months to improve health and wellbeing; participants selecting neither were considered non-users. The two response options were combined for analysis (Yes = used/used and helpful vs. No = none selected). Diet and wellness variables were recategorized (Table 1) and then analysed as binary variables (Yes = used/used and helpful for ≥1 option within the category). Of the vitamins and supplements, only vitamin D and omega-3 were analysed.
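The recoding of tick-box responses into a category-level binary variable can be sketched as follows. The item keys are hypothetical placeholders, since the item-to-category mapping lives in Table 1; `None` stands for "neither option ticked".

```python
USED = {"used", "used and helpful"}


def item_used(response):
    """True if the tick-box response is "used" or "used and helpful"; None = not ticked."""
    return response in USED


def category_used(responses):
    """Category-level binary: Yes if at least one item in the category was used."""
    return any(item_used(r) for r in responses.values())


# Hypothetical responses for one participant's items within a single diet category.
diet = {
    "diet_item_1": None,
    "diet_item_2": "used and helpful",
    "diet_item_3": None,
}
print(category_used(diet))  # True
```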


**Table 1.** Lifestyle Behavior Categories.

Physical activity was assessed via the Godin-Shephard Leisure-Time Physical Activity Questionnaire (GLTPAQ), which queries frequency (0–7 days) of strenuous, moderate, and mild exercise for ≥ 15 min in the preceding seven days [21]. Total leisure activity score was calculated as per guidelines and categorized into sedentary (<14), moderately active (14–23), and active (≥24).
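A sketch of the leisure score computation follows. The weights 9 (strenuous), 5 (moderate), and 3 (mild) are the commonly published Godin leisure score index weights; because some versions of the scoring guidance drop the mild term before categorizing, and the text does not say which was applied, this is exposed as a parameter (a hypothetical design choice). The example shows that the choice can move a participant between categories.

```python
def leisure_score(strenuous_days, moderate_days, mild_days=0, include_mild=True):
    """Godin leisure score index from days per week of >= 15 min exercise.

    Uses the commonly published weights 9 (strenuous), 5 (moderate), 3 (mild).
    Some versions of the scoring guidance drop the mild term before categorizing.
    """
    for days in (strenuous_days, moderate_days, mild_days):
        if not 0 <= days <= 7:
            raise ValueError("frequencies must be between 0 and 7 days")
    score = 9 * strenuous_days + 5 * moderate_days
    if include_mild:
        score += 3 * mild_days
    return score


def activity_level(score):
    """Categories used in the study: sedentary (<14), moderately active (14-23), active (>=24)."""
    if score < 14:
        return "sedentary"
    if score < 24:
        return "moderately active"
    return "active"


print(activity_level(leisure_score(2, 1, 3)))                      # active (score 32)
print(activity_level(leisure_score(2, 1, 3, include_mild=False)))  # moderately active (score 23)
```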

#### *2.4. Outcome Measure*

QoL was queried via the NeuroQoL Adult Short Form, comprising 13 subdomains classified under physical, mental, and social QoL [22]. Each of the 13 subdomains comprises five to nine questions scored on a Likert scale. Scores were summed and converted to T-scores (Mean = 50, SD = 10) as per guidelines. For the mobility, fine motor, anxiety, depression, positive affect, cognitive function, social participation, and social satisfaction subdomains, T-scores were referenced to an average U.S. general population; for the fatigue, sleep disturbance, emotional dyscontrol, and stigma subdomains, T-scores were referenced to an average population with a diagnosed neurological disorder (MS, epilepsy, stroke, amyotrophic lateral sclerosis, or Parkinson's disease). Higher T-scores indicate more of the measured concept. T-scores for the communication subdomain were unavailable; therefore, the raw total score for this subdomain was used for analysis and reporting.

#### *2.5. Statistical Analysis*

All analyses were conducted in Stata version 15.0 (StataCorp. 2017. Stata Statistical Software: Release 15. College Station, TX, USA: StataCorp LLC.). Associations between lifestyle behavior categories and QoL subdomains were assessed by multiple linear regression models adjusted for age, sex, education, BMI, disability, and duration since MS diagnosis, estimating adjusted regression coefficients and 95% confidence intervals (CIs). An interaction term between MS phenotype and each lifestyle behavior was added to the regression model to assess differences between RRMS and ProgMS.
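How an interaction term separates phenotype-specific associations can be sketched with ordinary least squares in plain Python. This is only an illustration, not the study's Stata model: it adjusts for age alone rather than the full covariate set, and it fits hypothetical, noise-free data constructed so that the behavior coefficient is +2 in RRMS and −1 in ProgMS. The behavior coefficient estimates the association in the reference phenotype (RRMS), and the interaction coefficient estimates how much that association differs in ProgMS. In Stata, factor-variable notation such as `i.behavior##i.phenotype` specifies the same main-effects-plus-interaction structure.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical, noise-free data: behavior (0/1), progressive phenotype (0/1), age.
rows, qol = [], []
for behavior in (0, 1):
    for prog in (0, 1):
        for age in (30, 40, 50, 60):
            # Columns: intercept, behavior, phenotype, behavior x phenotype, age
            rows.append([1.0, behavior, prog, behavior * prog, age])
            qol.append(50 + 2 * behavior - 4 * prog - 3 * behavior * prog + 0.1 * age)

b0, b_beh, b_prog, b_inter, b_age = ols(rows, qol)
print(f"behavior effect in RRMS:   {b_beh:.2f}")
print(f"behavior effect in ProgMS: {b_beh + b_inter:.2f}")
print(f"interaction (difference):  {b_inter:.2f}")
```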

#### **3. Results**

#### *3.1. Participant Characteristics Based on Phenotype*

Of 3374 participants enrolled in iConquerMS, *n* = 1108 (33%) met the inclusion criteria. In the included population, compared to participants with RRMS, people with ProgMS were older and more likely to be male, to not be in paid employment, to have moderate or high disability, and to have a longer MS duration (Table 2A). For lifestyle behaviors, compared to RRMS, people with ProgMS were less likely to have used an anti-inflammatory diet and less likely to be at an active level of physical activity.

**Table 2.** (**A**). Characteristics of participants with RRMS and ProgMS. (**B**). Mean QoL T-scores of participants with RRMS and ProgMS.



BMI = body mass index; MS = multiple sclerosis; PDDS = Patient Determined Disease Steps; ProgMS = progressive MS; Ref. = reference; RRMS = relapsing-remitting MS; SD = standard deviation; USD = United States Dollar. *p* values indicate statistical differences between RRMS and ProgMS, where bolded values indicate significance (*p* < 0.05). <sup>a</sup> Total raw score. Bolded mean scores indicate differences of > 5 points on the T-score metric (0.5 SD) from the clinical or U.S. general population.

For QoL, compared to the clinical or U.S. general population, pwMS reported similar T-scores (difference < 0.5 SD) in 9 of 13 QoL subdomains, the exceptions being fine motor, mobility, social participation, and social satisfaction (Table 2B), in which people with ProgMS reported marginally lower T-scores. Compared to RRMS, people with ProgMS reported significantly worse QoL in 7 of 13 subdomains: lower mobility, fine motor, positive affect, social participation, and social satisfaction, and higher fatigue and stigma. Anxiety was higher in RRMS (Table 2B).

#### *3.2. Associations between Lifestyle and Quality of Life Subdomains*

Diet was associated with physical and mental, but not social, QoL (Table 3A). In people with RRMS, anti-inflammatory, low-carbohydrate, and other diets were positively associated with stigma, and other diets were additionally associated with lower fine motor and cognitive function scores. In ProgMS, anti-inflammatory diets were associated with higher mobility and positive affect; low-carbohydrate diets with higher positive affect; low-saturated-fat diets with higher ease of communication; and other diets with higher mobility. Phenotype differences were observed in the mobility and communication subdomains.


**Table 3.** (**A**). Associations between diet and QoL subdomains in RRMS and ProgMS. (**B**). Associations between wellness activities and QoL subdomains in RRMS … and QoL subdomains.

\* Significant (\* *p* < 0.05) difference between RRMS and ProgMS.

Wellness activities were associated with physical, mental, and social QoL (Table 3B). In RRMS, mind activities were associated with lower fine motor, cognitive function, communication, social participation, and social satisfaction scores, and with higher fatigue, anxiety, emotional dyscontrol, and stigma. Mind-body activities were associated with higher positive affect and social participation, and with lower emotional dyscontrol. Other wellness activities were associated with lower physical, mental, and social QoL in 10 of 13 subdomains, all except mobility, depression, and positive affect. No significant associations were observed between wellness activities and QoL in ProgMS. A phenotype difference was observed only between other wellness activities and the fine motor subdomain.

Physical activity was associated with physical, mental, and social QoL (Table 3C). In RRMS, physical activity was dose-dependently associated with higher mobility, positive affect, and social satisfaction, and with lower anxiety, depression, and stigma. The active level of physical activity was additionally associated with higher fine motor, cognitive function, communication, and social participation, and with lower fatigue, sleep disturbance, and emotional dyscontrol. In ProgMS, moderate physical activity was associated with higher positive affect and cognitive function, and with lower communication; the active level of physical activity was associated with higher mobility and lower fatigue. Phenotype differences were observed in 8 of 13 QoL subdomains.

Neither vitamin D nor omega-3 supplement use was associated with QoL (Table 3C).

#### **4. Discussion**

Lifestyle behaviors are known to be associated with QoL in pwMS. To inform potential lifestyle management strategies based on disease course, we assessed associations of diet, vitamin D and omega-3 supplementation, and participation in wellness and physical activities with QoL in pwMS, and whether these associations differed in nature and magnitude between MS phenotypes.

Compared with people with RRMS, people with ProgMS were older, less likely to be in paid employment, had longer disease duration and greater disability, and had a lower female/male ratio, consistent with previous reports [16,17]. Of the lifestyle behaviors assessed, physical activity differed by phenotype, as did QoL. People with ProgMS were less physically active and had lower QoL in specific physical, mental, and social QoL subdomains. This is also consistent with prior studies [18,23] and is expected given the more advanced disease stage and greater symptom severity in ProgMS, which adversely affect QoL and are likely barriers to performing daily activities and living independently.

High-quality, anti-inflammatory, and MS-specific diets have been associated with better mental and physical QoL [5,6]. Our results were mixed and not always aligned with previously reported findings. We identified associations of four diet categories with the mental and physical, but not the social, QoL domains. In RRMS, three of four diet categories were associated with higher stigma, a measure of perceived prejudice and discrimination because of disease. This may reflect a greater inclination among people who feel more stigmatised to change their diet in an attempt to improve or moderate their condition, or the stigma of adhering to dietary restrictions. Unexpectedly, no positive associations of diet with QoL were found in RRMS; the other diet category, comprising organic, low-sodium/sugar, and semi-vegetarian diets, was associated with lower scores in both the cognitive function and fine motor subdomains. The timing of adoption, as well as the duration of and adherence to dietary modification, may account for these observations.

In ProgMS, diet was associated with positive affect and ease of communication, perhaps indicative of greater mastery and self-control over MS management. Both anti-inflammatory and other diets were associated with improved mobility, consistent with proposed neuroinflammatory and microbiota-gut-brain-axis disease mechanisms. Although previous studies have reported associations of diet quality with lower depression and of MS-specific diets with lower fatigue [24,25], we did not observe associations in these symptom subdomains. Discrepancies may be attributable to differences in outcome measures, as well as to potential additive benefits of adhering to multiple lifestyle behaviors. Phenotype differences were evident only in the mobility and communication subdomains. The positive association with mobility in ProgMS, an indicator of disease progression and a key contributor to reduced QoL, suggests that the duration of dietary modification may be a factor, although our data do not allow us to draw this conclusion.

No associations between vitamin D or omega-3 supplementation and QoL were observed. Prior studies report mixed evidence for a role of vitamin D supplementation in QoL, with positive associations apparent in pwMS with deficiencies or with an intake of more than 5000 IU/day in addition to sufficient sun exposure [8,9,26]. Similarly, discrepancies between our observations and those reported for omega-3 and QoL [10] may reflect the dose and source of omega-3, or the dietary balance of omega-3 and omega-6 fatty acids. Baseline vitamin and mineral levels, and the daily dose, frequency, and duration of supplement use, were not captured in the current study.

Participation in wellness activities was associated with physical, mental, and social QoL only for people with RRMS. Mind-body activities, encompassing yoga, Tai Chi, Qigong, and exercise therapy, were associated with positive affect, emotional dyscontrol, and social participation, consistent with past reports of favourable effects of exercise therapy and Tai Chi on mental QoL [12,14]. The non-significant and negative associations of mind and other wellness activities with QoL subdomains, some contrary to previous reports [27], may be attributable to category inclusions, adherence to the behavior, and/or non-specific symptom assessment. Alternatively, interactive group wellness activities that involve positive social interaction may be better interventions for improving mental and social QoL. A phenotype difference was observed only for other wellness activities and the fine motor subdomain. Further investigation capturing information on adherence to lifestyle behaviors may provide better insight and is necessary to inform practice recommendations.

The benefits of physical activity for overall health are established [15,28] and supported by our data. We found dose-dependent associations in mobility, social satisfaction, and four mental subdomains in RRMS. Active levels of physical activity were favorably associated with all 13 QoL subdomains. That the common symptom subdomains of fatigue, mobility, anxiety, depression, and cognitive function also showed significant favorable associations highlights the potential value of incorporating regular physical activity into MS management. In people with ProgMS, moderate activity was positively associated with positive affect and cognitive function, and active levels with mobility and fatigue, which is also encouraging for symptom management through adoption of physical activity. The magnitude of associations was generally stronger in RRMS, especially at active levels. Significant phenotype differences were noted in the fine motor subdomain, five of seven mental subdomains, and both social subdomains, suggesting that physical interventions may be best implemented early in the disease course and adapted as the disease progresses.

The strengths of our study are the inclusion of a large and diverse population of pwMS, with minimal participant bias due to the open nature of recruitment, enabling generalizability of findings. Moreover, the large number of participants with RRMS and ProgMS phenotypes made stratification by disease stage possible, whereas most prior studies report on mixed-phenotype populations. The dataset captures a breadth of clinical and lifestyle variables, enabling robust analysis of associations among a spectrum of behaviors and QoL.

Limitations include reliance on self-reported, optional survey responses, which affect data quality and missingness, and potential selection bias, as only 35% of participants were included; we controlled for this by assessing differences between included and excluded participants and adjusting for variables that differed significantly (data not shown). Some participant biases, such as the possibly greater motivation of pwMS who completed all surveys, cannot be adjusted for. The cross-sectional design limits causal inference but provides insight to guide future longitudinal studies. Other factors, including socioeconomic status, access to health services, and support networks, may also contribute to QoL and should be considered when interpreting the findings. The use of non-validated tools to capture lifestyle and health outcome variables limits interpretation and comparison with previously reported studies. However, the survey was developed by the multi-stakeholder iConquerMS Research Committee, comprising MS specialist health professionals, scientists, and pwMS; the results should therefore be considered alongside other research when translating them into practice for pwMS. The non-exclusive selection of lifestyle options, the lack of data on the duration of and adherence to behaviors, and the broad researcher-defined re-categorizations may have masked associations; these and other recommendations are being considered for ongoing data capture.

#### **5. Conclusions**

Our study demonstrated that lifestyle behaviors concerning diet, wellness, and particularly physical activity, but not vitamin D or omega-3 intake, have positive associations with specific QoL subdomains in pwMS. Some differences in associations between RRMS and ProgMS phenotypes were observed, suggesting a need for phenotype-specific recommendations for MS management. Our findings suggest a role for modifiable lifestyle behaviors as a potential intervention for improving QoL in pwMS. Replication and validation through prospective studies are required to make specific recommendations; however, the presence and absence of associations by phenotype found in our study suggest areas that may be most rewarding for study among certain subgroups.

**Author Contributions:** Conceptualization, N.N.; methodology, N.N., M.Y.; formal analysis, M.Y.; investigation, H.K.S.; resources, H.K.S., S.L.N.; data curation, M.Y.; writing—original draft preparation, N.N., M.Y.; writing—review and editing, N.N., M.Y., G.A.J., S.S.-Y., S.L.N., H.K.S.; visualization, N.N.; supervision, N.N.; project administration, N.N.; funding acquisition, H.K.S., G.A.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** Data collection and curation were funded by the Patient-Centered Outcomes Research Institute. Data access and research activity were funded by philanthropic gifts to the Neuroepidemiology Unit from Mr Wal Pisciotta and anonymous donors. The open access fee was funded by Accelerated Cure Project for Multiple Sclerosis.

**Institutional Review Board Statement:** This study was approved by The University of Melbourne, Melbourne School of Population and Global Health Human Ethics Advisory Group, project #1956113.1.

**Informed Consent Statement:** Written informed consent has been obtained from all participants.

**Data Availability Statement:** Restrictions apply to the availability of these data. Data were obtained from Accelerated Cure Project for Multiple Sclerosis and may be requested from HS with the approval of the iConquerMS Research Committee.

**Acknowledgments:** The authors gratefully acknowledge survey participants, and data collectors and curators of iConquerMS.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**

