**Gait Asymmetry Post-Stroke: Determining Valid and Reliable Methods Using a Single Accelerometer Located on the Trunk**

#### **Christopher Buckley 1, M. Encarna Micó-Amigo 1, Michael Dunne-Willows 2, Alan Godfrey 3, Aodhán Hickey 4, Sue Lord 1,5, Lynn Rochester 1,6, Silvia Del Din <sup>1</sup> and Sarah A. Moore 1,7,8,\***


Received: 2 November 2019; Accepted: 17 December 2019; Published: 19 December 2019

**Abstract:** Asymmetry is a cardinal symptom of gait post-stroke that is targeted during rehabilitation. Technological developments have allowed accelerometers to be a feasible tool to provide digital gait variables. Many acceleration-derived variables are proposed to measure gait asymmetry. Despite a need for accurate calculation, no consensus exists for what is the most valid and reliable variable. Using an instrumented walkway (GaitRite) as the reference standard, this study compared the validity and reliability of multiple acceleration-derived asymmetry variables. Twenty-five post-stroke participants performed repeated walks over GaitRite whilst wearing a tri-axial accelerometer (Axivity AX3) on their lower back, on two occasions, one week apart. Harmonic ratio, autocorrelation, gait symmetry index, phase plots, acceleration, and jerk root mean square were calculated from the acceleration signals. Test–retest reliability was calculated, and concurrent validity was estimated by comparison with GaitRite. The strongest concurrent validity was obtained from step regularity from the vertical signal, which also recorded excellent test–retest reliability (Spearman's rank correlation coefficients (rho) = 0.87 and Intraclass correlation coefficient (ICC21) = 0.98, respectively). Future research should test the responsiveness of this and other step asymmetry variables to quantify change during recovery and the effect of rehabilitative interventions for consideration as digital biomarkers to quantify gait asymmetry.

**Keywords:** stroke; asymmetry; accelerometer; gait; trunk; reliability; validity

#### **1. Introduction**

Hemiparesis after stroke typically results in reduced walking speed, an asymmetrical gait pattern, and a reduced ability to make gait adjustments that consequently limit community ambulation
and physical activity [1–4]. Reduction in both predisposes an already at risk population to further cardiometabolic disease [5,6]. Therefore, the improvement of gait is a worthwhile and common target for interventions after stroke. Gait asymmetry, if not addressed early in the recovery process, can prolong and increase gait impairment due to compensatory mechanisms, leading to an increasingly asymmetric gait pattern [7]. The latter is inefficient and requires increased energy expenditure. Consequently, falls risk increases, further reducing levels of physical activity [8]. In order to quantify asymmetry and its improvement from targeted rehabilitative interventions, it is essential to have both valid and reliable tools that are able to quantify movement quality/compensatory strategies of the whole body during gait.

Tests such as the 10 m walk [9] and scales such as the Dynamic Gait Index [10] are used to measure gait after stroke. Although useful and practical for application to clinical settings, these tests are susceptible to subjectivity and not specifically designed to capture the cardinal symptoms of gait after stroke, such as asymmetry. Instrumented walkways can objectively measure asymmetry and have shown excellent intra and inter-rater reliability in subacute stroke [11]. Practically, they are costly and need a controlled dedicated environment with a trained specialist to operate; therefore, they are mainly limited to research settings [12]. From a biomechanical perspective, they limit the number of steps collected per trial and solely obtain information of the participant's footfall. They are not designed to measure the movement of the whole body, where synergistic compensatory movement strategy information may be quantified such as compensatory movements of the pelvis [8,13]. Traditionally, gaining this information would rely on three-dimensional motion analysis systems. However, due to the even higher cost, required experience, and time to use relative to instrumented mats, their application is also limited to research settings [12]. Therefore, a need exists for a valid tool that is capable of quantifying whole body asymmetry, while also being feasible for routine clinical adoption.

Wearable accelerometers are a relatively low-cost alternative that are capable of measuring human movement from a variety of contexts while capturing parameters that are difficult to quantify from clinical inspection by the human eye [1,14]. Previous attempts to quantify measures of asymmetry indicative of spatiotemporal information of the feet with accelerometers have shown their feasibility, but also poor concurrent validity with the GaitRite reference standard [1]. Therefore, the development of algorithms to capture the complex nature of asymmetry post-stroke has been encouraged [1]. Numerous asymmetry variables exist that have been obtained from cyclical acceleration signals during gait such as variables derived from the frequency domain [15,16]. These variables vary according to the complexity of the sensor, the number of sensors used, their location, and the population on which they were tested [17–19]. Relative to their discrete spatiotemporal equivalents describing movement of the feet, variables quantifying asymmetry from the cyclical signals of the lower back better classified post-stroke gait from controls [16,18,20]. Their advantage stems from considering the acceleration as a complete waveform, not neglecting temporal information outside of the time domain, which may enable a more complete description of the signal and a better characterisation of gait post-stroke [17].

Previously, studies quantifying asymmetry from acceleration signals of the trunk during post-stroke gait typically focus on differences from a control group, adopt a minimal data set of variables, and to our knowledge do not report the concurrent validity or reliability to reference standards. Knowledge of the most robust asymmetry variables that are capable of quantifying similar information to reference standards using clinically feasible tools is important to further the field. This study compares the validity and test–retest reliability of a wide range of novel acceleration-derived variables to quantify asymmetry post-stroke from a single sensor located on the trunk.

#### **2. Materials and Methods**

#### *2.1. Study Design and Setting*

This cross-sectional study was undertaken in the gait laboratory at the Clinical Ageing Research Unit, Campus for Ageing and Vitality, Newcastle upon Tyne, UK.

#### *2.2. Participants*

The study was approved by the Greater Manchester West Research and Ethics Committee (NRES Committee Northwest-Greater Manchester West 15/NW/0731). All subjects gave informed written consent for the study according to the Declaration of Helsinki.

Inclusion criteria: Community-dwelling stroke survivor; at least one month post-stroke onset; mild to moderate gait deficit defined by clinical observation of gait asymmetry including reduced stance time, increased swing time in the affected limb and/or reduced gait speed/balance problems; no changes in gait-related ability over the past month based on self-report and able to walk 10 m with/without a stick.

Exclusion criteria: Medical problems other than stroke impacting on gait e.g., osteoarthritis. Participants were recruited via advertisement or therapist referral. All eligible participants were consecutively invited to participate in the study.

#### *2.3. Demographic and Clinical Measures*

The following data were collected at baseline: age, gender, height and weight, date of stroke, stroke type (Oxford Community Stroke Project Classification [21]), stroke impairment (National Institutes of Health Stroke Scale [22]), presence of hemiplegia (clinical observation by two independent experienced clinicians), walking stick use, and ankle foot orthosis (AFO) use.

#### *2.4. Test Protocol*

Participants were asked to walk at their preferred pace in a straight line for 4 × 10 m intermittent trials (see Figure 1). The trials were repeated on two occasions (Time 1 and Time 2) one week apart (±2 days). A GaitRite instrumented walkway was positioned in the walk path (dimensions were 7.0 m × 0.6 m, spatial accuracy of 1.27 cm and temporal accuracy of one sample (240 Hz, ~4.17 ms) (GaitRite: Platinum model GaitRite, software version 4.5, CIR systems, NJ, USA)). The participants wore an AX3 wearable sensor located at their fifth lumbar vertebra (L5). The AX3 is a single tri-axial accelerometer-based wearable (AX3, Axivity, York, UK https://axivity.com/, cost ≈ £100, dimensions 23.0 mm × 32.5 mm × 7.6 mm). The AX3 weighs 11 g and has a memory of 512 MB. AX3 data capture occurs with a sampling frequency of 100 Hz (16-bit resolution) at a range of ±8 g. Recorded AX3 accelerations were stored locally on the device's internal memory and downloaded upon the completion of each session.

#### *2.5. Asymmetry Variables*

Acceleration-derived asymmetry variables were selected based upon their ability to represent levels of asymmetry from signals measured from a single accelerometer located at the trunk. The variables that were selected as representative of asymmetry were the harmonic ratio [16], autocorrelation [20], gait symmetry index [18], and phase plot analysis [23–25] (described in more detail below). Four spatiotemporal variables extracted from GaitRite were selected as measures of asymmetry as defined by Lord et al. [26]. The spatiotemporal asymmetry variables included step time asymmetry, stance time asymmetry, swing time asymmetry, and step length asymmetry, and these were calculated as the absolute difference between consecutive left and right steps.
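As a worked illustration of the spatiotemporal asymmetry definition above, the following sketch takes a sequence of alternating left and right step values and averages the absolute difference between consecutive steps. The function name `step_asymmetry`, the alternating input layout, the averaging across pairs, and the sample values are assumptions made for illustration; they are not taken from the study's processing pipeline.

```python
from statistics import mean

def step_asymmetry(values):
    """Mean absolute difference between consecutive left/right step values.

    `values` is assumed to alternate between sides (L, R, L, R, ...).
    Averaging the pairwise differences is an illustrative choice.
    """
    if len(values) < 2:
        raise ValueError("need at least two steps")
    diffs = [abs(a - b) for a, b in zip(values[:-1], values[1:])]
    return mean(diffs)

# Made-up step times in seconds (not study data)
step_times = [0.62, 0.50, 0.63, 0.49, 0.61, 0.51]
asym = step_asymmetry(step_times)
symmetric = step_asymmetry([0.55, 0.55, 0.55, 0.55])
```

The same function could be applied to stance time, swing time, or step length, since each is defined in the text as a left versus right difference.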

#### *2.6. Description of Acceleration-Derived Variables*

All data analysis relating to the raw acceleration signals was performed using MATLAB (version 9.4.0, R2018a). For a full description of the algorithm and data segmentation techniques applied to the accelerometer data, please see references [27,28]. In brief, the vertical acceleration underwent continuous wavelet transformation to estimate the initial contact and final contact in the gait cycle [28]. To ensure that the steady-state gait was analyzed, the initial and final three steps were removed from the signal. Prior to the calculation of additional variables, the acceleration signals were realigned to the earth's gravitational constant [29,30] and filtered with a low-pass Butterworth filter with a cut-off frequency of 20 Hz. A full description of the following variables and the required algorithms is supplied by the provided references. Additionally, they have been summarised in Appendix A.

**Figure 1.** Indication of the instrumentation and the protocol used to collect the acceleration signal and the asymmetry parameters from the GaitRite mat. Also pictured are the acceleration-derived asymmetry variables and the means for the calculation of asymmetry following the processing of the raw acceleration signal.

#### 2.6.1. Harmonic Ratio

The harmonic ratio (HR) describes the step-to-step symmetry within a stride from calculating a ratio of the odd and even harmonics of a signal following fast Fourier transformation [16,31]. This method has been shown previously to reflect increased asymmetry for those post-stroke relative to age and speed-matched controls [16].

#### 2.6.2. Autocorrelation

The unbiased autocorrelation was also calculated due to its ability to reflect the step and stride regularity and the symmetry between the two (autocorrelation symmetry) [20,32,33]. Previously, it has been shown to characterise hemiplegic gait better than footfall variables [20,32].
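To make the autocorrelation variables more concrete, the sketch below computes an unbiased autocorrelation normalised to the zero-lag value and reads off the coefficients at the step lag and the stride lag. It is a minimal illustration under several assumptions: the lags are supplied by the caller rather than detected from the signal, the signal is mean-removed first, and the symmetry value is taken as the absolute difference between stride and step regularity. The function names and the synthetic test signal are hypothetical and do not reproduce the study's MATLAB code.

```python
import math

def unbiased_autocorr(x, max_lag):
    """Unbiased autocorrelation of the mean-removed signal, normalised to lag 0."""
    n = len(x)
    mu = sum(x) / n
    xc = [v - mu for v in x]

    def raw(m):
        # Divide by the number of overlapping samples (N - m): the unbiased estimate
        return sum(xc[i] * xc[i + m] for i in range(n - m)) / (n - m)

    a0 = raw(0)
    return [raw(m) / a0 for m in range(max_lag + 1)]

def regularity(x, step_lag, stride_lag):
    """Return (step regularity, stride regularity, their absolute difference)."""
    ac = unbiased_autocorr(x, stride_lag)
    step_reg = ac[step_lag]
    stride_reg = ac[stride_lag]
    return step_reg, stride_reg, abs(stride_reg - step_reg)

# Synthetic 10 s signal at 100 Hz: a 2 Hz step-frequency component plus a
# 1 Hz stride-frequency component that differs between left and right steps.
fs, n = 100, 1000
signal = [math.sin(2 * math.pi * 2 * i / fs) + 0.5 * math.sin(2 * math.pi * i / fs)
          for i in range(n)]
step_reg, stride_reg, symmetry = regularity(signal, step_lag=50, stride_lag=100)
```

In this synthetic case the stride regularity stays close to 1 while the step regularity drops, which is the pattern the text associates with asymmetric gait: strides repeat well, but neighbouring steps do not resemble each other.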

#### 2.6.3. Gait Symmetry Index

The gait symmetry index (GSI) is a more recently proposed variable, which was calculated based upon the concept of the summation of the biased autocorrelation from all three components of movement and a subsequent calculation of step and stride timing asymmetry [18]. It has been shown to be more sensitive than and highly correlated with levels of asymmetry measured with two sensors located at the feet of participants post-stroke [18].

#### 2.6.4. Phase Plot Analysis

Phase plot analysis (aka Poincaré analysis) was performed on vertical components of the acceleration signal [23–25]. This method has had previous applications within electrocardiogram studies. It works by plotting periodic signals as a function of their past values. The resulting ellipses or orbits and the properties thereof can then assess asymmetries in the associated gait. Phase plot analysis also offers the ability to assess intra step correlation i.e., the correlation of signals from immediately successive step cycles, which necessarily corresponds to left-versus-right asymmetry.

#### 2.6.5. Measures Indicative of Stability

Although not indicative of asymmetry, the root mean square of the acceleration signal (Acc RMS) and also its first time derivative (Jerk RMS) were calculated for their potential to highlight synergistic compensatory strategies during gait post-stroke [13,16]. Their test–retest reliability needs to be established in the literature.
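The two stability measures can be sketched in a few lines. The example below assumes a single acceleration axis sampled at a known frequency, approximates the first time derivative with forward finite differences, and applies no detrending before the RMS; those choices, the function names, and the sample values are illustrative assumptions rather than the study's implementation.

```python
import math

def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def jerk(acc, fs):
    """Approximate first time derivative of acceleration (forward differences)."""
    return [(b - a) * fs for a, b in zip(acc[:-1], acc[1:])]

# Made-up acceleration samples at 100 Hz (not study data)
acc = [0.0, 1.0, 0.0, -1.0]
acc_rms = rms(acc)
jerk_rms = rms(jerk(acc, fs=100))
```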

#### *2.7. Statistical Analysis*

Analysis was completed using SPSS v25 (IBM). The normality of data was tested with a Shapiro–Wilk test. Descriptive statistics (median and interquartile range) were calculated for gait characteristics measured by AX3 and GaitRite. Concurrent validity between the AX3 acceleration-derived variables and those of the GaitRite at Time 1 was tested using Spearman's rank correlation coefficients (RHO). For the AX3 acceleration-derived variables, the test–retest reliability between Time 1 and 2 was established using Spearman's rank correlation coefficients (RHO), intraclass correlation coefficient (ICC21), and limits of agreement (LoA) expressed as a percentage of the mean of the two variables and the 95% LoA. For all analyses, statistical significance was set at *p* < 0.05. Predefined acceptance ratings for ICC21 and LoA were set at excellent (≥0.900, 0.0%–4.9%), good (0.750–0.899, 5.0%–9.9%), moderate (0.500–0.749, 10.0%–49.9%), and poor (<0.500, ≥50.0%) [1,34]. The most robust variable was selected as the one with the highest Spearman rank correlation coefficient with the asymmetry variable obtained from the GaitRite while also recording an ICC21 greater than 0.8 for test–retest reliability.
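For readers who want to see how the reliability statistics are formed, the sketch below computes an ICC(2,1) (two-way random effects, absolute agreement, single measures) from two sessions and a limits-of-agreement half-width expressed as a percentage of the overall mean. The study used SPSS; this is only an illustrative reimplementation of the standard formulas, and the function names, the 1.96 multiplier for the 95% LoA, and the example values are assumptions.

```python
from statistics import mean, stdev

def icc_2_1(session1, session2):
    """ICC(2,1) from a two-way ANOVA decomposition of an n x 2 table."""
    rows = list(zip(session1, session2))
    n, k = len(rows), 2
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(session1), mean(session2)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

def loa_percent(session1, session2):
    """95% LoA half-width (1.96 x SD of differences) as % of the overall mean."""
    diffs = [a - b for a, b in zip(session1, session2)]
    overall = mean(list(session1) + list(session2))
    return 100 * 1.96 * stdev(diffs) / overall

# Made-up values (not study data): session 2 is session 1 shifted by one unit
icc_shift = icc_2_1([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
icc_same = icc_2_1([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
loa = loa_percent([10, 12, 14], [11, 11, 15])
```

Because ICC(2,1) measures absolute agreement, the constant shift between sessions lowers the coefficient even though the ranking of participants is unchanged, which is why it complements Spearman's rho in a test–retest design.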

#### **3. Results**

Twenty-five participants were recruited to the study. Data for two participants who wore a fixed plastic AFO were removed from the analysis, because individual data analysis (including video observations) revealed that the step detection algorithm applied was not appropriate for these two participants due to a lack of possible plantar flexion. This was not the case for the remaining participants, as the video analysis confirmed the step detection algorithm was effective in detecting both heel strike and toe off [1]. Demographic information for the remaining 23 participants is displayed in Table 1.


#### **Table 1.** Participant characteristics.

Where appropriate, mean and standard deviation are displayed. OCSP = Oxford Community Stroke Project; NIHSS = National Institutes of Health Stroke Scale.

#### *3.1. Concurrent Validity of the Asymmetry Variables*

Figure 2 shows the correlation between the asymmetry variables quantified using a GaitRite mat (step time asymmetry, stance time asymmetry, swing time asymmetry, and step length asymmetry) and the acceleration-derived variables proposed to measure asymmetry. Overall, step time asymmetry correlated most with the acceleration-derived variables. Step regularity (vertical acceleration) had the highest concurrent validity with step time asymmetry (−0.87). Six other variables had high levels of agreement (+0.80) (HR V, step regularity (V), step regularity (AP), orbit eccentricity, orbit width deviation, and intra step correlation). Five correlated with step time asymmetry and orbit width deviation correlated with stance time asymmetry. The smallest correlations were achieved by the outputs of the autocorrelation from the medial lateral component of the signal and also a variety of the outputs from the phase plot analysis.


**Figure 2.** Indication of the correlation between the asymmetry variables quantified using a GaitRite mat and the variables proposed to measure asymmetry from the acceleration signals from the trunk. Black indicates a strong positive or negative correlation. \* and \*\* denote significance at the 0.05 and 0.01 level, respectively. V = Vertical acceleration, ML = Medial lateral acceleration, and AP = Anterior posterior acceleration.

#### *3.2. Test–Retest Reliability of the Variables*

Table 2 demonstrates the test–retest reliability between the wearable variables measured one week apart (Time 1 versus Time 2). The most reliable variables were step regularity (V) and HR (V), both recording an ICC21 of 0.98. Taken from the ICC21 values, excellent reliability was achieved for 12 out of the 27 variables tested. These came from the majority of autocorrelation outputs (all except step regularity (ML), stride regularity (AP), and autocorrelation symmetry in the vertical (V) and medial lateral (ML) directions), the GSI, the HR in the V and AP directions, Jerk RMS, and the short half-orbit segment angle from the phase plot analysis. Good reliability was achieved for a further five variables (stride regularity (AP), autocorrelation symmetry (V), relative orbit inclination, short half orbit eccentricity, and long half orbit eccentricity).


**Table 2.** Test–retest reliability (one week apart) for acceleration-derived variables.

\* and \*\* denote significance at the 0.05 and 0.01 level, respectively. V = Vertical acceleration, ML = Medial lateral acceleration, and AP = Anterior posterior acceleration, RMS = root mean square.

#### *3.3. Selection of the Most Robust Variable*

Table 3 highlights the variables that best correlated with spatiotemporal gait variables calculated from GaitRite while also achieving an ICC21 greater than 0.8 for test–retest reliability. For the GaitRite variables of asymmetry, step regularity (V) achieved the highest concurrent validity due to its correlation with step time asymmetry (RHO = 0.87 and ICC21 = 0.98 \*\*). The second highest concurrent validity was the HR in the vertical direction, which correlated with swing time asymmetry (RHO = 0.73 and ICC21 = 0.98 \*\*).


**Table 3.** Indication of what wearable sensor variable recorded the highest Spearman's rank correlation coefficient with each variable obtained by the GaitRite mat. The Spearman's rank correlation coefficient between the two devices and the intraclass correlation coefficient is displayed for each variable.

\*\* denotes significance at the 0.01 level. V = Vertical acceleration.

#### **4. Discussion**

This study examined the concurrent validity and reliability of a comprehensive range of asymmetry variables derived from a single accelerometer located on the trunk and identified step regularity as the most robust outcome. Step regularity showed strong concurrent validity and excellent test–retest reliability when compared with GaitRite outcomes reflecting asymmetry. This contrasts with previous work based on the AX3 sensor, which achieved poor to moderate criterion validity (Spearman's rank correlation coefficient of RHO = 0.01 to 0.601) for variables engineered to replicate spatiotemporal asymmetry variables calculated from GaitRite [1]. Although clinically more challenging to interpret than traditional spatiotemporal variables, our results support the adoption of novel variables to quantify asymmetry as robust digital variables for measuring asymmetrical gait post-stroke.

With one exception (HR correlation with swing time asymmetry), variables calculated from performing an autocorrelation procedure on the original acceleration signal were more strongly correlated with GaitRite asymmetry. Hodt-Billington and colleagues [20] found that autocorrelation variables taken from the trunk were better at discriminating gait post-stroke from controls relative to GaitRite variables of asymmetry. The strength of the autocorrelation procedure may stem from analysing continuous successive steps. Complex measures such as gait asymmetry are not simply portrayed within a single discrete gait cycle; this concept has been highlighted before, whereby continuous measures have been described to highlight different asymmetry causes, symptoms, and gait strategies such as particular compensatory techniques [17]. Data from our study indicate that participants with high asymmetry produced poor forward propulsion from the affected limb, instead relying on the more dominant limb to achieve progression at the end of each stride. This can be observed by the lack of step regularity and its diminution relative to stride regularity in the AP, ML, and V directions, replicating the gait strategy described by Balasubramanian et al. [35]. The autocorrelation method is well designed to reflect this synergistic gait strategy, which might explain the high correlation found from this sample of participants. However, this strategy will likely vary among a broader range of participants and throughout recovery. Other methods may better reflect true levels of asymmetry at different stages of recovery from acute, early subacute, late subacute, and chronic stroke, meaning that they should still be considered as potential variables [17,20].

Previously, Iosa et al. [16] assessed symmetry together with upright gait stability post-stroke and showed that relative to speed-matched controls, higher instabilities (Acceleration RMS) and reduced symmetry of trunk movements (as measured using the HR) were recorded. In this study, HR in the vertical direction was the only HR variable that compared favourably with autocorrelation variables due to its correlation with swing time asymmetry (RHO = −0.73) while also recording excellent reliability (ICC21 = 0.98). Since we did not assess control subjects, we could not determine the best measure to characterise gait post-stroke and highlight the compensatory mechanisms adopted relative to healthy controls. This is a broader aim for ongoing work. However, it has been previously highlighted that compensation strategies may be beneficial to increase gait ability, but this occurs at the compromise of stability. Thus, variables such as Acceleration and Jerk RMS should always be considered in
addition to variables directly linked to asymmetry, aiming to provide a more holistic description of gait patterns [13,16]. Future research should explore this relationship so that a holistic, multivariate wearable approach can better assess gait strategies during recovery post-stroke. This potentially would quantify what movements are beneficial to gait, while also highlighting the impact of compensation strategies, consequently quantifying separate movements that can be targeted for rehabilitation.

Although previously suggested as a variable representative of asymmetry in stroke [18], the GSI performed poorer than the previously discussed variables, despite also being based on the autocorrelation (biased) of accelerometry. This was unexpected, as GSI theoretically is designed to detect the asymmetry within temporal footfall parameters. Equally, the autocorrelation symmetry variables did not perform better than step regularity alone, despite being designed to capture the difference between step and stride regularity and therefore the symmetry between them. Potentially, the GSI and the autocorrelation symmetry did not quantify the synergistic movement strategy that the step regularity variable was suited to highlight, which may be the reason for its favourable concurrent validity. The GSI and the autocorrelation symmetry variables may be better suited to highlight different compensatory synergies at different stages of recovery such as during acute, early subacute, late subacute, and chronic stages, and therefore should not be neglected in future research.

Select phase plot variables achieved RHO values greater than 0.8 when compared to GaitRite asymmetry values and also demonstrated good to excellent reliability, therefore highlighting their ability to quantify symmetry post-stroke. Adaptation of the algorithms to directional components other than vertical and comparison with controls would better test their application as a biomarker. Similar to the other variables capable of quantifying movements in the AP and ML direction, there is the possibility that they can highlight a new domain of asymmetry separate from the footfall asymmetry variables captured by GaitRite. Future research should explore this upper and lower body relationship post-stroke to examine the similarities and differences during gait and determine if added value is obtained [36,37].

All data were collected in a controlled environment; however, wearable technology is not limited by the testing environment, and for improved ecological validity, obtaining data from the participant's community is desired [38]. To this end, future research should utilise the variables tested in the laboratory in the participant's free-living environment. For free-living gait, the majority of walking bouts for people with Parkinson's disease and older adults have been found to be below 10 s, and it has been inferred that these bouts are when the participants are indoors [39]. One limitation with autocorrelation is that it relies on successive steps in a straight line. For free-living data, variables such as the HR may be more useful during these short walking bouts due to their ability to be calculated from a single stride in addition to successive steps [31,40]. Future research should assess the ability of these variables to accurately and reliably quantify asymmetry during short walking bouts or when tested in confined spaces, as for this population, the median (and interquartile range) bout length was 16.3 (6.2) seconds for data collected over seven days [1].

#### *4.1. Limitations*

The relatively small sample size and limited heterogeneity with respect to time post-stroke did not allow us to determine what variables are the best at quantifying asymmetry for a more general sample or recovery stage-specific populations [41]. Future work is required on a larger sample size that ranges in time since stroke to discover what variables are the most capable to perform as objective biomarkers over all stages of recovery, as one variable may not be appropriate for all, and compensatory strategies may change between the different stages of stroke recovery. Equally, future research should confirm that these results are replicable with different accelerometers with differing sampling frequencies, ranges, and resolutions. Further limitations stem from the reliance on the step detection algorithm. Data from two participants were not analysed due to their use of a fixed AFO that impacted on heel strike and the performance of the algorithm, which was based on the detection of initial and final contact within the gait cycle. Future research should integrate/develop step detection algorithms
for participants requiring fixed AFOs to broaden application. Alternatively, the variables should be developed so that the cyclical nature of a signal may divide gait cycles (similar to the method used for phase plots) as opposed to methods that rely on detecting the initial and final contact of the foot.

#### *4.2. Applications*

These results provide evidence that asymmetry can accurately and reliably be calculated using a single accelerometer. Although much work is needed for accelerometers to be routinely adopted [42,43], these results give evidence that asymmetry can be objectively quantified using a tool applicable for many purposes. Consequently, the variables tested here may then act as a digital biomarker to quantify the impact of targeted interventions proposed to improve gait timing mechanisms and gait asymmetry (e.g., auditory rhythmical cueing) [44]. Accelerometers provide a potentially low burden method for clinicians to collect data from a variety of environments, increasing the ability to objectively quantify asymmetry during stroke rehabilitation. Alongside application within the clinic, accelerometer data can be collected on gait asymmetry in naturalistic environments, thus removing the Hawthorne effect/observer bias associated with clinical testing. With increased development, these variables may provide continuous asymmetry focussed feedback for self-progress specific to each participant during rehabilitation.

#### **5. Conclusions**

Gait asymmetry after stroke can be measured robustly using a single wearable sensor on the trunk. Step regularity is the most valid and reliable asymmetry outcome, which is quantified by performing autocorrelation on the vertical component of the signal. The variables tested compared favourably with those of previous studies that also used GaitRite as the reference. Consequently, their adoption, in addition to other wearable-derived spatiotemporal variables of gait, is encouraged as they provide a more holistic description of gait that appears to indicate compensatory movement post-stroke. Future research is encouraged on larger populations where asymmetry is expected, during recovery/interventions to identify which wearable variables are biomarkers for gait asymmetry and compensatory mechanisms during gait. This will allow for increased accuracy in determining effective interventions.

**Author Contributions:** Conceptualisation, C.B., S.D.D., L.R. and S.A.M.; methodology, C.B., M.E.M.-A., M.D.-W., A.G., A.H., S.L., L.R., S.D.D., and S.A.M.; software, C.B., S.D.D., M.E.M.-A., M.D.-W., A.G., and A.H.; validation, C.B., A.H., A.G., and S.D.D.; formal analysis, C.B., S.D.D., and S.A.M.; investigation, C.B., A.G., A.H., L.R., S.D.D., and S.A.M.; resources, S.A.M., and L.R.; data curation, C.B., M.E.M.-A., M.D.-W., A.G., A.H., S.D.D.; writing—original draft preparation, C.B.; writing—review and editing, C.B., M.E.M.-A., M.D., A.G., A.H., S.L., L.R., S.D.D., and S.A.M.; visualisation, C.B., S.D.D.; supervision, S.A.M., S.D.D., L.R., and S.L.; project administration, S.A.M.; funding acquisition, S.A.M. and L.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** A Medical Research Council Centenary award supported the delivery of this research. SM is supported by Health Education England and the National Institute for Health Research (HEE/NIHR ICA Programme Clinical Lectureship, Dr. Sarah Anne Moore, ICA-CL-2015-01-012). LR and SDD are supported by the NIHR Newcastle Biomedical Research Centre (BRC) based at Newcastle upon Tyne and Newcastle University. The work was also supported by the NIHR/Wellcome Trust Clinical Research Facility (CRF) infrastructure at Newcastle upon Tyne Hospitals NHS Foundation Trust. The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

**Acknowledgments:** We would like to thank the following for their contribution: patients who took part in the study; staff from local NHS trusts who assisted with recruitment to the study; and lastly, Lisa Alcock for her assistance during data collection.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Data Sharing:** Data cannot be shared publicly but can be made available upon reasonable request to the corresponding author, as per local data sharing policies.

#### **Appendix A**

#### *Appendix A.1. Acceleration-Derived Variable Definitions*

**Table A1.** Indication for the variables used from the signal-derived variables and their respective definitions.


#### *Appendix A.2. Explanation and Equation for Each Acceleration-Derived Variable for Asymmetry*

#### Appendix A.2.1. Harmonic Ratio

The harmonic ratio is based on the premise that a stride contains two steps and therefore, during continuous walking, accelerations should repeat in multiples of two. The variable quantifies how well these accelerations are repeated in each stride compared to when accelerations do not repeat and are therefore out of phase. The ratio of in-phase to out-of-phase accelerations is therefore a measure of how symmetrically the participant is walking. To calculate the harmonic ratio, the harmonic content of the acceleration signal is evaluated using the stride frequency obtained from the analysis of frequency components. Following a fast Fourier transform (using the FFT function in MATLAB), a ratio can be created from the first 20 harmonics extracted from the Fourier series. Because the AP and V components of the signal are biphasic, the ratio for these components is the sum of the even harmonics (in-phase movement) divided by the sum of the odd harmonics (out-of-phase movement).

$$\text{HR}\_{\text{AP, V}} = \frac{\Sigma \text{ Amplitudes of even harmonics}}{\Sigma \text{ Amplitudes of odd harmonics}}$$

The ML component of the signal shows only one dominant acceleration peak within a stride cycle, so the odd harmonics are in phase and the even harmonics are out of phase; the ratio is therefore inverted.

$$\text{HR}\_{\text{ML}} = \frac{\Sigma \text{ Amplitudes of odd harmonics}}{\Sigma \text{ Amplitudes of even harmonics}}$$

As a gait measure, a higher harmonic ratio indicates better symmetry between steps within a single stride for the AP and V components.
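The harmonic ratio definition can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: it assumes the signal has already been segmented into exactly one stride, uses a direct discrete Fourier transform instead of an FFT routine, and the function names and synthetic stride are hypothetical.

```python
import cmath
import math


def harmonic_amplitudes(stride, n_harmonics=20):
    """Amplitudes of harmonics 1..n_harmonics of one stride (direct DFT)."""
    n = len(stride)
    mean = sum(stride) / n
    amplitudes = []
    for k in range(1, n_harmonics + 1):
        coeff = sum((x - mean) * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(stride))
        amplitudes.append(2.0 * abs(coeff) / n)
    return amplitudes


def harmonic_ratio(stride, axis, n_harmonics=20):
    """Even/odd for the AP and V axes; odd/even for the ML axis."""
    amps = harmonic_amplitudes(stride, n_harmonics)
    odd = sum(amps[0::2])    # harmonics 1, 3, 5, ...
    even = sum(amps[1::2])   # harmonics 2, 4, 6, ...
    return odd / even if axis.upper() == "ML" else even / odd


# Synthetic stride (assumption): strong step-frequency (2nd harmonic) content
# plus a small stride-frequency (1st harmonic) component.
n = 100
stride = [math.sin(2 * math.pi * 2 * i / n) + 0.1 * math.sin(2 * math.pi * i / n)
          for i in range(n)]
hr_v = harmonic_ratio(stride, "V")
hr_ml = harmonic_ratio(stride, "ML")
```

For this synthetic stride the even harmonics carry ten times the amplitude of the odd harmonics, so `hr_v` is close to 10 and `hr_ml` close to 0.1.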

#### Appendix A.2.2. Autocorrelation

Autocorrelation is calculated taking the complete signal of the time when the participant was in contact with the GaitRite mat. Plots of an autocorrelation estimate are used to inspect the structure of a cyclic component within a time series. To do this, the generic unbiased autocorrelation function of the sample sequence x(i) was computed using the below equation:

$$\text{AD}(m) = \frac{1}{N - |m|} \sum\_{i=1}^{N-|m|} x(i) \cdot x(i+m),$$

where N is the number of samples and m is the time lag expressed as number of samples.

Since phase shifts can be performed with identical results in both positive and negative directions relative to the original time series, an autocorrelation plot is conventionally organized symmetrically with the zeroth shift located centrally. This central value was used to normalize the signal so that its maximum was one. For a time series of trunk accelerations during walking, autocorrelation coefficients can be produced to quantify the peak values at the first and second dominant periods, representing phase shifts equal to one step and one stride, respectively (see Figure 1 as an example). A tailored MATLAB code was used to detect these peaks, using the signal's power density to determine the windows in which the peaks would occur. For the symmetry between step and stride regularity, the absolute difference was calculated as a measure of asymmetry instead of the more conventional ratio. This was because the between-step and between-stride autocorrelations may approach zero if the regularity between neighboring steps or neighboring strides is low, which would make a ratio unstable.
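As a rough illustration of the procedure, the Python sketch below computes the unbiased autocorrelation, normalises it by its zero-lag value, and reads step and stride regularity from the first two peaks. The naive local-maximum search is an assumption that stands in for the authors' power-density-based windows; the function names and synthetic signal are hypothetical.

```python
import math


def unbiased_autocorr(x, max_lag):
    """Unbiased autocorrelation for lags 0..max_lag, normalised to 1 at lag 0."""
    n = len(x)
    ad = [sum(x[i] * x[i + m] for i in range(n - m)) / (n - m)
          for m in range(max_lag + 1)]
    return [v / ad[0] for v in ad]


def first_two_peaks(ad):
    """Lags of the first two local maxima after lag 0 (naive peak picking)."""
    peaks = [m for m in range(1, len(ad) - 1) if ad[m - 1] < ad[m] >= ad[m + 1]]
    return peaks[0], peaks[1]


# Synthetic vertical signal (assumption): step period 50 samples, stride
# period 100 samples, with a small left/right difference.
signal = [math.sin(2 * math.pi * i / 50) + 0.3 * math.sin(2 * math.pi * i / 100)
          for i in range(1000)]
ad = unbiased_autocorr(signal, max_lag=150)
step_lag, stride_lag = first_two_peaks(ad)
step_regularity, stride_regularity = ad[step_lag], ad[stride_lag]
asymmetry = abs(stride_regularity - step_regularity)
```

Real trunk signals are noisier than this example, so a windowed search such as the one described above would likely be needed in place of the naive peak picker.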

#### Appendix A.2.3. Gait Symmetry Index (GSI)

Unlike the aforementioned autocorrelation measures, the gait symmetry index (GSI) uses a second-order Butterworth low-pass filter with a cut-off frequency of 10 Hz to filter the complete time series and then uses the biased version of the autocorrelation function, as displayed below:

$$\text{AD}(m) = \frac{1}{N} \sum\_{i=1}^{N-|m|} x(i) \cdot x(i+m).$$

The maximum time lag was 4 s (400 samples), which approximates 2.5 times a single stride duration in post hemiplegic stroke patients. This window length was chosen to capture the repetition of stride cycles in very slow walking. A coefficient of stride cycle repetition (Cstride) was the sum of the positive autocorrelation coefficients of the three axes as a function of the equation displayed below:

$$\text{Cstride}(t) = \text{ADv}(t) + \text{ADml}(t) + \text{ADap}(t); \quad \text{if } \text{AD}(t) < 0, \text{AD}(t) = 0.$$

The coefficient of step repetition (Cstep) was the norm of autocorrelation coefficients as a function of the equation displayed below:

$$\text{Cstep}(t) = \sqrt{\text{ADv}(t) + \text{ADml}(t) + \text{ADap}(t)}; \quad \text{if } \text{AD}(t) < 0, \text{AD}(t) = 0.$$

One stride time (Tstride) equals t when Cstride had its maximum value. The hypothesis was that in a perfectly symmetric gait pattern, two consecutive steps have the same step duration of 0.5 × Tstride. The maximum value of Cstep was √3, reached when the autocorrelation coefficient of each acceleration axis was 1 at zero-lag (t = 0). The gait symmetry index (GSI) was Cstep(0.5 × Tstride) normalized to its value at zero-lag, as indicated in the below equation:

$$\text{GSI} = \text{Cstep}(0.5 \times \text{Tstride}) / \sqrt{3}.$$
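The GSI calculation can be sketched in Python as follows. This is a simplified illustration, not the published implementation: the 10 Hz Butterworth filtering step is omitted, the `min_lag` parameter used to skip the zero-lag maximum when locating Tstride is an assumption, and the function names and synthetic signals are hypothetical.

```python
import math


def biased_autocorr(x, max_lag):
    """Biased autocorrelation, normalised to 1 at lag 0, negatives set to 0."""
    n = len(x)
    ad = [sum(x[i] * x[i + m] for i in range(n - m)) / n
          for m in range(max_lag + 1)]
    return [max(v / ad[0], 0.0) for v in ad]


def gait_symmetry_index(v, ml, ap, max_lag, min_lag):
    """Return (GSI, Tstride in samples) from three acceleration axes."""
    ad = [biased_autocorr(axis, max_lag) for axis in (v, ml, ap)]
    c_stride = [sum(a[t] for a in ad) for t in range(max_lag + 1)]
    c_step = [math.sqrt(c) for c in c_stride]
    # min_lag (assumption) keeps the search away from the zero-lag maximum.
    t_stride = max(range(min_lag, max_lag + 1), key=lambda t: c_stride[t])
    return c_step[t_stride // 2] / math.sqrt(3), t_stride


# Synthetic 100 Hz signals (assumption): V and AP repeat every step
# (50 samples) with a small stride-periodic component; ML repeats every stride.
idx = range(1000)
v = [math.sin(2 * math.pi * i / 50) + 0.3 * math.sin(2 * math.pi * i / 100)
     for i in idx]
ap = list(v)
ml = [math.sin(2 * math.pi * i / 100) for i in idx]
gsi, t_stride = gait_symmetry_index(v, ml, ap, max_lag=400, min_lag=30)
```

With a maximum lag of 400 samples, matching the 4 s window at 100 Hz, the sketch locates the stride at 100 samples and evaluates Cstep at half that lag.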

#### Appendix A.2.4. Phase Plot Analysis

To create an ellipse to apply the following models, the vertical acceleration signal was first transformed to a horizontal–vertical coordinate system and filtered with a low-pass fourth-order Butterworth filter at 20 Hz. Following piecewise integration, the full vertical excursion signal must be restored via concatenation of the resultant integrals. Here, the phase shift is introduced. We restore two such vertical excursion signals, one of which is lagged exactly one step cycle behind the other, i.e.,

$$PP\_1(t) = PP\_0(t - n)$$

where *n* is the number of data points comprising a step interval in the vertical excursion signal and *PP*<sub>1</sub> and *PP*<sub>0</sub> are the lagged and original vertical excursion signals, respectively.

The following conic model is fitted to the two-dimensional phase plot data. This fitting is performed on each orbit in turn.

$$ax^2 + by^2 + cxy + dx + ey + f = 0$$

In the case of ellipse fitting to phase plot data, *x* and *y* are taken to be *PP*1 and *PP*0. The above model defines an ellipse subject to the following constraint.

$$c^2 - 4ab < 0$$

This constraint is used to ensure that an elliptical conic is fitted to the data as opposed to a hyperbola or parabola. The model defined by the conic equation can be fitted using ordinary least squares to find an estimate *Â* = (*â*, *b̂*, *ĉ*, *d̂*, *ê*). *f* is set equal to 1 to avoid a trivial solution.
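The least-squares step can be sketched in Python as below. The sketch assumes an unconstrained ordinary least-squares fit with *f* fixed at 1, followed by a check of the ellipse condition, which may differ from how the constraint was enforced in the original analysis. The helper names and the synthetic orbit are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_conic(xs, ys):
    """Least-squares (a, b, c, d, e) for ax^2 + by^2 + cxy + dx + ey + 1 = 0."""
    rows = [[x * x, y * y, x * y, x, y] for x, y in zip(xs, ys)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(-r[i] for r in rows) for i in range(5)]  # right-hand side is -f = -1
    return solve_linear(ata, atb)


# Synthetic orbit (assumption): points on (x - 2)^2 / 9 + (y - 1)^2 / 4 = 1.
angles = [2 * math.pi * k / 12 for k in range(12)]
xs = [2 + 3 * math.cos(t) for t in angles]
ys = [1 + 2 * math.sin(t) for t in angles]
a, b, c, d, e = fit_conic(xs, ys)
is_ellipse = c * c - 4 * a * b < 0
```

Fixing *f* = 1 means the fit cannot represent a conic that passes through the origin, which is one reason a production implementation might prefer a different normalisation.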

The above form of an ellipse does not lend itself well to geometric interpretation, so the following parameterisation is implemented:

$$\frac{\left(x-g\right)^{2}}{r\_{1}^{2}} + \frac{\left(y-k\right)^{2}}{r\_{2}^{2}} = 1.$$

However, this form does not account for inclined ellipses. To account for the significant inclination of ellipses, the following rotated coordinate system is introduced:

$$x' = (x - g)\cos(\theta) + (y - k)\sin(\theta)$$

$$y' = (y - k)\cos(\theta) - (x - g)\sin(\theta).$$

This form of ellipse and rotated coordinate system ensure more straightforward interpretation of the ellipses and more intuitive feature extraction.

**Figure A1.** A single orbit with a fitted conic (ellipse).

Figure A1 shows one such ellipse fitted to a single orbit of a phase plot. From this ellipse, we can extract features relating to the eccentricity and inclination. In general, phase plots consist of many orbits and their respective fitted ellipses (Figure A2). Further features can be extracted by assessing the relative inclination of ellipses from alternating orbits. In general, these inclinations oscillate about the value θ = π/4.

**Figure A2.** Complete phase plot comprising 7 continuous gait cycles.

Features extracted from ellipses fitted to entire orbits are considered primary features. Ellipses can be fitted to partial orbits; for example, two separate ellipses can be fitted to both halves of an orbit where the orbit in question is halved according to its major/minor axes. This leads to four additional ellipses fitted to each orbit of a phase plot (Figure A3). As an example, take the two ellipses fitted to either half of the shown orbit following halving via the minor axis (Figure A3, lower two figures). Features are extracted from these ellipses by extracting their relative characteristics e.g., their inclination relative to the other, the ratio of their areas, etc. Features extracted from ellipses fitted to partial orbits in this way are considered secondary phase plot features.

**Figure A3.** Indication of the different conics (ellipses) fitted to the major/minor axis and the first/second halves.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*

### **Is a Wearable Sensor-Based Characterisation of Gait Robust Enough to Overcome Differences Between Measurement Protocols? A Multi-Centric Pragmatic Study in Patients with Multiple Sclerosis**

**Lorenza Angelini 1,2,\*, Ilaria Carpinella 3, Davide Cattaneo 3, Maurizio Ferrarin 3, Elisa Gervasoni 3, Basil Sharrack 4, David Paling 5, Krishnan Padmakumari Sivaraman Nair 2,4 and Claudia Mazzà 1,2**


Received: 6 November 2019; Accepted: 18 December 2019; Published: 21 December 2019

**Abstract:** Inertial measurement units (IMUs) allow accurate quantification of gait impairment of people with multiple sclerosis (pwMS). Nonetheless, it is not clear how IMU-based metrics might be influenced by pragmatic aspects associated with clinical translation of this approach, such as data collection settings and gait protocols. In this study, we hypothesised that these aspects do not significantly alter those characteristics of gait that are more related to quality and energetic efficiency and are quantifiable via acceleration related metrics, such as intensity, smoothness, stability, symmetry, and regularity. To test this hypothesis, we compared 33 IMU-based metrics extracted from data, retrospectively collected by two independent centres on two matched cohorts of pwMS. As a worst-case scenario, a walking test was performed in the two centres at a different speed along corridors of different lengths, using different IMU systems, which were also positioned differently. The results showed that the majority of the temporal metrics (9 out of 12) exhibited significant between-centre differences. Conversely, the between-centre differences in the gait quality metrics were small and comparable to those associated with a test-retest analysis under equivalent conditions. Therefore, the gait quality metrics are promising candidates for reliable multi-centric studies aiming at assessing rehabilitation interventions within a routine clinical context.

**Keywords:** multiple sclerosis; gait metrics; wearable sensors; test-retest reliability; sampling frequency; accelerometry; autocorrelation; harmonic ratio; six-minute walk

#### **1. Introduction**

Multiple sclerosis (MS) is a chronic demyelinating disease of the central nervous system affecting 2.3 million people worldwide [1]. MS is the major non-traumatic cause of disability in young and middle-aged adults [2], with a significant negative impact on independence and social participation [3]. Walking impairment is one of the most common functional deficits due to MS, even in the early stages of the disease [4]. Importantly, nearly 70% of people with MS (pwMS) reported that walking difficulty is the most challenging aspect of their condition [5].

Given the high impact of gait impairment on pwMS, different rehabilitation interventions focused on improving locomotion are currently applied to improve the quality of life in this population [6]. The effects of these interventions, together with the progression of the disease, are usually assessed in clinical practice using clinical scales, such as the expanded disability status scale (EDSS) [7] or timed tests, such as the timed up and go test (TUG) [8], the timed 25-foot walk test (T25FW) [9], and the 6-minute walk test (6MWT) [10]. Although widely used, these tests suffer from some limitations. Firstly, they assess only the time taken to execute the test (e.g., TUG and T25FW) or the distance travelled in a given time (6 min for the 6MWT), without providing objective measures of the different components and characteristics of the task that could be useful to describe *how* the performance is possibly impaired [11]. Secondly, these clinical tests have a relatively limited sensitivity to change [9,12,13] and a flooring effect [9,14] that makes it difficult to detect possible alterations in minimally impaired pwMS [15–17].

Instrumental methods may partly overcome these limitations by providing additional quantitative information for a more complete characterisation of walking, which can be useful to tailor the rehabilitative intervention and objectively assess its effects [11,18]. In particular, wearable inertial measurement units (IMUs), including accelerometers, gyroscopes, and magnetometers, represent cost-effective tools to perform objective assessments of walking in pwMS outside movement analysis labs [19,20], and even during free-living and community contexts [21,22]. IMUs have been widely used to analyse different locomotor tasks in pwMS, such as straight-line over ground [17,23–27] and treadmill walking [28], standing up, walking, turning, and sitting down (e.g., the TUG) [15,29], walking with head turns and over/around obstacles [30,31], walking while texting [32], and stairway walking [33]. During these tests, several parameters have been extracted from IMUs, including spatio-temporal parameters [15,24,27,28,31,32,34], indexes of gait variability and stability [17,23,24,26,31,33], trunk sway metrics [15,23,30,34], and angular variables [15,25,27,34]. Nonetheless, what does not yet clearly emerge from current literature on pwMS is which of these could be more reliably adopted within the clinical context.

Two issues must be addressed before specific gait metrics can be adopted clinically. The first is identifying which of the above metrics best characterise disease progression (hence providing similar results for patients with similar clinical conditions) and are sensitive enough to detect changes associated with clinical interventions. The second is accounting for a number of pragmatic limitations associated with testing conditions. These include an understanding of which output is more robust to testing site characteristics (e.g., corridor length, lighting, noise, etc.), adopted measuring instruments and their configuration (e.g., brand, location on the body, sampling frequency) [35–37], type of gait test (e.g., a single pass, a 1-minute or a 6MWT), or instructions given to patients (e.g., self-selected or fast walking speed, use or non-use of an assistive device) [28,38–45]. All these aspects are particularly difficult to standardise in a busy clinical environment and most likely occur in combination with each other.

The aim of this study was to identify those gait metrics that provide equivalent assessment of pwMS with similar characteristics in terms of age, gender, and gait disability, despite these being tested in different centres and in non-standardised conditions. Our hypothesis was that while pwMS might be able to adjust their gait in terms of spatio-temporal parameters in response to different testing conditions (e.g., if asked to increase their speed), they would not be able to control those aspects of gait more related to its overall quality and energetic efficiency [46,47]. As a result, metrics extracted directly from the acceleration signals and representative of intensity, smoothness, stability, symmetry, and regularity were expected to be more robust to differences in the test settings. To verify this hypothesis, we compared retrospective data from two matched cohorts of pwMS, which were collected by two independent hospitals using protocols that differed for: (i) brand, size, and sampling frequency of the IMUs; (ii) IMU positioning; (iii) subject instructing; (iv) length of the path. As a term of reference, we also compared differences in IMU-based metrics between the two centres (between-centre differences) to those observable between two sessions performed by the same centre (between-day test-retest reliability).

#### **2. Materials and Methods**

#### *2.1. Participants*

Two research centres, one located in Italy (centre A) and one in the United Kingdom (centre B), provided retrospective IMU data collected while pwMS walked back and forth for 6 min along a hospital corridor. The patients' level of disability was assessed with the EDSS scale, scored by an experienced neurologist. Patients were excluded if they had any orthopaedic, musculoskeletal, or neurological disorder other than MS that may have affected their gait and balance. Since there were no restrictions for MS subtypes, both patients with relapsing remitting MS who were relapse-free for 30 days prior to assessment (centre A) and patients with secondary progressive MS (centre B) were included in the study. Thirteen pwMS were selected from each data set to form two cohorts, with individual patients matched if they had the same age, gender, EDSS score, and type of assistive device (Table 1). As a result of this matching, the sample size, percentage of females, EDSS score distribution, number of pwMS who required an assistive device, and type of assistive device used during the walking test were the same in the two centres. The average walking speed was calculated as the total distance walked during the test divided by the duration of the walking trial.

**Table 1.** Clinical characteristics of people with multiple sclerosis for centre A and centre B. Abbreviations: expanded disability status scale (EDSS); people with multiple sclerosis (pwMS); Mann-Whitney U (MWU) statistic; *p*-value (*p*); chi-square (χ<sup>2</sup>).


Values are median (range) or numbers. \* *p* < 0.05.

pwMS from centre B repeated the instrumented walking test on a second visit, which was held 7–14 days after the first test at the same time of the day. The testing procedures were also kept constant between the two sessions. These data were used to assess between-day test-retest reliability.

Institutional review boards or ethics committees at the institutions in each country approved the separate protocols (NRES Committee Yorkshire & The Humber-Bradford Leeds (reference 15/YH/0300) and Ethical Committee of Don Carlo Gnocchi Foundation, Milan, Italy, references 29-03-2017 and 13-02-2019). Written informed consent was provided by all subjects. Data were collected in accordance with the International Declaration of Helsinki.

#### *2.2. Experimental Protocol*

Acceleration and angular velocity data from three IMUs, located at the fifth lumbar vertebra and around the right and left ankles, were recorded in both centres while pwMS walked back and forth for 6 min along a straight corridor free of obstacles and other people. If needed, they could use an assistive device and take short resting breaks while standing. Each IMU was manually aligned along the anatomical antero-posterior (AP), medio-lateral (ML), and vertical (V) axes.

The differences between the experimental protocols followed by centre A and centre B were: (i) device manufacturers and sampling frequency used to record acceleration and angular velocity signals; (ii) ankle IMU position; (iii) length of the walkway; (iv) instructions given to participants (Figure 1). Specifically, Xsens IMUs (unit weight 16 g, unit size 47 mm × 30 mm × 13 mm; MTw, Xsens, NL) with a sampling frequency of 75 Hz were used in centre A and OPAL IMUs (unit weight 22 g, unit size 48.5 mm × 36.5 mm × 13.5 mm; OPAL, APDM Inc., Portland, OR, USA) with a sampling frequency of 128 Hz were used in centre B. The IMUs around both ankles were placed laterally in centre A and frontally in centre B. PwMS were requested to walk at their maximum speed along a 30-metre straight corridor in centre A and at their preferred comfortable speed along a 10-metre straight corridor in centre B.

**Figure 1.** Experimental protocols followed by centre A (red) and centre B (blue).

#### *2.3. Data Processing*

Data processing routines were developed in Matlab® (MATLAB R2019b, MathWorks, Inc., Natick, MA, USA). A total of 33 IMU-based metrics were included in this analysis. IMU signals collected in centre B were down sampled from 128 Hz to 75 Hz to match data from centre A, and the influence of down sampling was investigated by comparing the outcome metrics from centre B as obtained before and after the down sampling. Data from the lumbar IMU were reoriented to a horizontal-vertical coordinate system [48] and filtered with a 10 Hz cut-off, zero phase, low-pass Butterworth filter.

The turning motion and resting breaks were detected and removed from IMU signals to isolate steady-state walking bouts, which were used to compute the metrics of interest. The approach proposed by Salarian, et al. [49] was adapted to determine 180° turns, which appear in the V component of the lumbar angular velocity, ω<sub>*z*</sub>(*t*), as peaks of a given duration. The turning onset and offset were identified from the trunk rotation angle around the V axis, θ<sub>*z*</sub>(*t*), obtained by integrating the ω<sub>*z*</sub>(*t*) signal. Turning appeared in θ<sub>*z*</sub>(*t*) as steep positive or negative gradients, whereas walking appeared as small oscillations around a flat line. Specifically, θ<sub>*z*</sub>(*t*) was first smoothed using a weighted least-squares linear regression. Abrupt change points and their locations were then searched in θ<sub>*z*</sub>(*t*) using a predefined Matlab® function based on the minimisation of a linear computational cost function [50]. Resting breaks were automatically detected by checking in 2-s window increments if: (i) the norm of the lumbar IMU angular velocity was less than 0.5 rad/s; (ii) the norm of the lumbar IMU acceleration was within ±10% of 9.81 m/s<sup>2</sup> [51]. A 2-s window was considered motionless if more than 50% of its samples fulfilled both criteria.
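The resting-break rule can be illustrated with a short Python sketch. It assumes non-overlapping 2-s windows, which is one possible reading of "2-s window increments", and the function name and synthetic samples are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def motionless_windows(gyro, acc, fs, win_s=2.0):
    """gyro, acc: lists of (x, y, z) samples; returns one flag per window."""
    win = int(round(win_s * fs))
    flags = []
    for start in range(0, len(gyro) - win + 1, win):
        still = 0
        for w, a in zip(gyro[start:start + win], acc[start:start + win]):
            w_norm = math.sqrt(sum(c * c for c in w))
            a_norm = math.sqrt(sum(c * c for c in a))
            # Both criteria must hold for a sample to count as still.
            if w_norm < 0.5 and abs(a_norm - G) <= 0.1 * G:
                still += 1
        flags.append(still > 0.5 * win)
    return flags


# Synthetic 75 Hz data (assumption): one still window, then one moving window.
gyro = [(0.0, 0.0, 0.1)] * 150 + [(0.0, 0.0, 2.0)] * 150
acc = [(0.0, 0.0, 9.81)] * 150 + [(0.0, 0.0, 12.0)] * 150
flags = motionless_windows(gyro, acc, fs=75)
```

Samples left over after the last complete window are ignored in this sketch.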

Twelve gait metrics were extracted from the angular velocities recorded from the ankle IMUs and 21 were extracted from the lumbar IMU accelerations. Following the suggestions of Lord, et al. [52] and Buckley, et al. [53], these metrics were organised in independent gait domains (e.g., rhythm, variability, asymmetry, intensity, stability, smoothness, symmetry, and regularity).

Initial and final foot contact instances, referred to as gait events (GE), were identified for each steady-state walking bout as local minimum values of the ML angular velocity recorded from ankle IMUs of both legs [54]. These minima occur just before and after the instant of maximum ML angular velocity. Once the GE were determined, stride, step, swing and stance durations (representing rhythm domain) were separately estimated for left and right sides. Variability (i.e., within-subject combined standard deviation of left and right; variability domain) and asymmetry (i.e., absolute difference between the mean of left and right time series; asymmetry domain) of these metrics were also computed, applying the established formula in Galna, et al. [55] and Godfrey, et al. [56].
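As an illustration of the variability and asymmetry definitions, the Python sketch below assumes that the combined standard deviation is the square root of the mean of the left and right variances. That reading of the cited formula is an assumption and is not spelled out in the text above; the function names and step durations are hypothetical.

```python
import math
import statistics


def combined_sd(left, right):
    """Variability: combined SD of left and right (assumed pooled form)."""
    pooled = (statistics.variance(left) + statistics.variance(right)) / 2
    return math.sqrt(pooled)


def asymmetry(left, right):
    """Asymmetry: absolute difference between the left and right means."""
    return abs(statistics.mean(left) - statistics.mean(right))


# Hypothetical step durations in seconds, for illustration only.
left_step = [0.50, 0.52, 0.54]
right_step = [0.60, 0.62, 0.64]
step_variability = combined_sd(left_step, right_step)
step_asymmetry = asymmetry(left_step, right_step)
```

Pooling the two variances keeps a systematic left/right offset out of the variability estimate; that offset is captured separately by the asymmetry measure.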

From processing the filtered acceleration signals in time and frequency domain, 21 additional metrics, referred to as gait quality metrics [57], were separately extracted for each acceleration component (AP, ML, and V): (i) intensity as the root mean square (RMS) of each acceleration component around its mean value [44]; (ii) stability as the ratio of the RMS in a given direction to the RMS vector magnitude [58]; (iii) smoothness as the RMS of the jerk [59]; (iv) symmetry represented by the harmonic ratio (HR), defined as the ratio of the sum of the amplitudes of the in-phase harmonics to the sum of the amplitudes of the out-of-phase harmonics [60,61]; (v) regularity as the ensemble of the following three metrics obtained from the unbiased normalised autocorrelation [62]:

$$\text{Step regularity} = \text{1st peak of } \left( \frac{1}{N - |m|} \sum\_{i=1}^{N-|m|} x(i) \cdot x(i+m) \right) \tag{1}$$

$$\text{Stride regularity} = \text{2nd peak of } \left( \frac{1}{N - |m|} \sum\_{i=1}^{N-|m|} x(i) \cdot x(i+m) \right) \tag{2}$$

$$\text{Regularity index} = \frac{|\text{Stride regularity} - \text{Step regularity}|}{\text{mean}(\text{Stride regularity}, \text{Step regularity})} \tag{3}$$
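The time-domain gait quality metrics and the regularity index of Equation (3) can be sketched in Python as follows. Two details are assumptions rather than statements from the text: the jerk is approximated by a first difference scaled by the sampling frequency, and the RMS vector magnitude is taken as the Euclidean norm of the three per-axis RMS values. All names and numbers are hypothetical.

```python
import math


def rms_about_mean(x):
    """Intensity: RMS of an acceleration component around its mean."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))


def jerk_rms(x, fs):
    """Smoothness: RMS of the jerk (first difference times fs, an assumption)."""
    jerk = [(x[i + 1] - x[i]) * fs for i in range(len(x) - 1)]
    return math.sqrt(sum(j * j for j in jerk) / len(jerk))


def stability_ratios(ap, ml, v):
    """Stability: per-axis RMS divided by the RMS vector magnitude."""
    rms = {"AP": rms_about_mean(ap), "ML": rms_about_mean(ml),
           "V": rms_about_mean(v)}
    total = math.sqrt(sum(r * r for r in rms.values()))
    return {axis: r / total for axis, r in rms.items()}


def regularity_index(step_regularity, stride_regularity):
    """Equation (3): absolute difference divided by the mean of the two."""
    mean = (step_regularity + stride_regularity) / 2
    return abs(stride_regularity - step_regularity) / mean


ratios = stability_ratios([3, -3, 3, -3], [4, -4, 4, -4], [0, 0, 0, 0])
smoothness = jerk_rms([0.0, 1.0, 2.0, 3.0], fs=10)
index = regularity_index(0.8, 1.0)
```

In this toy input the AP and ML axes contribute 0.6 and 0.8 of the RMS vector magnitude, and the regularity index is 0.2 / 0.9.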

All metrics were calculated for the part of the signals corresponding to the middle eight steps of each pass along the corridor and then averaged over the whole trial. Eight steps were chosen because this was the maximum number of steps that subjects in centre B could take while walking in a completely straight line. Since centre A adopted a path three times longer, only one walking bout in every three was included for centre B in order to process the same number of steps.

#### *2.4. Statistical Analysis*

Statistical analyses were performed in R version 3.4.3 [63]. Participant characteristics from centre A and centre B were compared using the Mann-Whitney U test for age and EDSS scores and Pearson's chi-square test for gender. Given the limited sample size and the non-normal distribution of most of the investigated metrics (as indicated by the Shapiro-Wilk test), non-parametric tests were performed. The level of significance was set at 5%. A Wilcoxon signed-rank test was performed to compare the centre B metrics obtained from IMU data sampled at 128 Hz with those down-sampled to 75 Hz.

*Sensors* **2020**, *20*, 79

Between-day test-retest reliability of the metrics was evaluated for centre B through the intra-class correlation coefficients (ICCs) with a 95% confidence interval (CI). ICCs were calculated using a two-way random-effect model and absolute agreement (ICC2,k) [64]. An ICC lower than 0.40 was classified as poor, an ICC between 0.40 and 0.59 as fair, an ICC between 0.60 and 0.74 as moderate, and an ICC of 0.75 or greater as excellent [65]. The minimum detectable change (MDC), representing the smallest amount of change that can be considered above the bounds of the measurement error and/or within-subject variability, was also computed for each metric at the 95% confidence level, according to Equation (4):

$$\text{MDC} = 1.96 \cdot \sqrt{2} \cdot \text{SEM} = 1.96 \cdot \sqrt{2} \cdot \text{SD} \cdot \sqrt{1 - \text{ICC}},\tag{4}$$

where SEM is the standard error of the measurement and SD corresponds to the average of the standard deviations from test and re-test sessions [66].
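Equation (4) can be worked through with a short Python sketch. The SD and ICC values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this study.

```python
import math


def sem(sd, icc):
    """Standard error of the measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimum detectable change at the 95% level, Equation (4)."""
    return 1.96 * math.sqrt(2.0) * sem(sd, icc)


# Hypothetical inputs: SD averaged over test and retest, and the ICC.
example_sem = sem(sd=0.05, icc=0.91)    # 0.05 * 0.3 = 0.015
example_mdc = mdc95(sd=0.05, icc=0.91)  # about 0.0416
```

A change between two sessions smaller than `example_mdc` would therefore not be distinguishable from measurement error for such a metric.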

A Wilcoxon signed-rank test was used to determine if there was a median difference in centre B metrics between the two sessions, whereas an independent Mann-Whitney U test was carried out to compare IMU-based metrics from centre A and centre B.

In all the above tests, if the *p*-value was lower than 0.05, the null hypothesis (e.g., the two population medians were identical) was rejected and the alternative hypothesis accepted. To avoid misinterpretation of the *p*-values and to account for a type II error, the effect size (*r*) for non-parametric tests was also calculated as follows:

$$
r = z / \sqrt{N} \tag{5}
$$

where z is the z-score and N is the size of the study (i.e., the number of total observations) on which z is based. Cohen [67] suggested thresholds of 0.1, 0.3, and 0.5 for small, medium, and large effect sizes, respectively.
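A minimal Python sketch of the effect size and its interpretation follows. The use of the absolute value of z and the label for values below 0.1 are assumptions added for the illustration; the z-score in the example is hypothetical.

```python
import math


def effect_size(z, n_observations):
    """Effect size for non-parametric tests: |z| / sqrt(N)."""
    return abs(z) / math.sqrt(n_observations)


def cohen_label(r):
    """Cohen's thresholds of 0.1, 0.3, and 0.5 for small, medium, and large."""
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "below small"  # label is an assumption, not from the source


# Hypothetical z-score for two groups of 13 participants (N = 26).
r_example = effect_size(z=2.2, n_observations=26)
label_example = cohen_label(r_example)
```

Applied to an effect size of 0.37, `cohen_label` returns "medium".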

Median, inter-quartile range, minimum, and maximum values were finally calculated for IMU-based metrics from centre A and centre B (both sessions).

#### **3. Results**

#### *3.1. Effect of Sampling Frequency*

The results of the comparison between the metrics calculated using the 128 Hz and 75 Hz sampling frequencies are reported in Table 2. The HR, representative of the symmetry domain, was the only metric that significantly differed between the two analyses.

**Table 2.** Effect of down-sampling of the acceleration and angular velocity signals on the investigated gait metrics. Abbreviations: sampling frequency (FS), z-score (z), *p*-value (*p*), and effect size (*r*).



Values are median (range). \* *p* < 0.05.

#### *3.2. Between-Day Test-Retest Reliability*

ICC, SEM, and MDC values for between-day assessment are shown in Table 3 for each metric estimated for pwMS from centre B who completed two testing visits. Overall, 17 out of 33 metrics revealed excellent test-retest reliability (ICC: 0.93–0.98; 95% CI: 0.76–0.93), 11 metrics showed moderate test-retest reliability (ICC: 0.88–0.92; 95% CI: 0.62–0.74), and only 5 metrics exhibited poor to fair test-retest reliability with ICC values between 0.72 and 0.86 and 95% CI between 0.13 and 0.52. The Wilcoxon signed-rank test showed no significant differences in any of the metrics between the two sessions (Figure 2 and Table 4).

**Table 3.** Intra-class correlation coefficients (ICC) with a 95% confidence interval (CI), standard error of the measurement (SEM), and minimum detectable change (MDC) for the investigated gait metrics.


Inertial measurement unit (IMU)-based gait metrics with poor to fair test-retest reliability are presented in bold.

**Figure 2.** Minimum, first quartile (q1), median, mean, third quartile (q3), and maximum values of each IMU-based metric for centre A (red) and centre B for between-day test-retest assessment (blue empty boxplots and blue filled boxplots). Values larger than q3 + 1.5(q3 − q1) or smaller than q1 − 1.5(q3 − q1) are considered outliers and are represented with crosses (+). \* *p* < 0.05. Note that, for graphical convenience, the absolute values have been depicted for the step regularity and regularity index in the ML direction.


**Table 4.** Descriptive statistics for the investigated gait metrics from centre B (session 1 and session 2), including the z-score (z), *p*-value (*p*), and effect size (*r*).

Values are median (range). \* *p* < 0.05.

#### *3.3. Between-Centre Differences*

As expected, the comparison between centre A and centre B via the Mann-Whitney U test highlighted significant differences for all the temporal metrics (Figure 2 and Table 5; rhythm domain), except for swing duration. Apart from asymmetry of step duration and asymmetry of swing duration, variability and asymmetry of the temporal metrics were significantly lower in centre A than in centre B (Figure 2 and Table 5; variability and asymmetry domains). However, even though the difference in asymmetry of swing duration between the two centres was non-significant (U = 48.0; *p* = 0.06), a medium effect size was found for this metric (*r* = 0.37). Conversely, the two centres were consistent for 18 out of 21 metrics extracted from acceleration signals (Figure 2 and Table 5; intensity, stability, smoothness, symmetry, and regularity domains). Only the differences in the regularity index in the ML direction and in the HR in the AP and ML directions were statistically significant between centre A and centre B (Figure 2 and Table 5).


**Table 5.** Descriptive statistics for the investigated gait metrics from centre A and centre B (session 1), including the Mann-Whitney U (MWU) statistic, *p*-value (*p*), and effect size (*r*).


Values are median (range). \* *p* < 0.05.

#### **4. Discussion**

This study aimed to identify comparable gait metrics as quantified from IMU data measured from two different hospital settings on two matched cohorts of pwMS (13 pwMS for each centre, Table 1), under the hypothesis that those metrics associated with the overall balance control and coordination of gait (i.e., gait quality metrics) would be robust, even when obtained from different experimental protocols. Reported results overall corroborated this assumption and showed that between-centre differences for most of these metrics were comparable to those obtained by the same centre in two different sessions.

The small sample size, which resulted from the attempt to maximise the match between cohorts, is certainly a limitation of this study. It is worth noting that, while some of the investigated gait metrics in centre A (e.g., asymmetry of swing duration from the asymmetry domain and regularity index from the regularity domain) did not differ significantly from those in centre B, the observed medium effect size suggested that a difference might nonetheless exist (Table 5). This is likely due to the small sample size and possibly to the higher inter-subject variability observed in centre B.

Since MS is well known for heterogeneity of symptoms, high day-to-day fluctuations, and a large variability in its course [68], care must be taken before generalising our findings to all pwMS with different levels of gait impairment. Another limitation of this study might lie in the fact that patients recruited by the two centres differed in the subtypes of MS. Nonetheless, Dujmovic, et al. [69] showed that the altered gait pattern in pwMS did not depend on the disease phenotype. Additional studies are of course needed to further investigate this aspect.

The comparison between centre A and centre B implied down-sampling the data from the latter. As expected, this affected only the calculation of HR, which is the only metric based on frequency analysis. In particular, changing sampling frequency from 128 Hz to 75 Hz led to decreased values in the AP and V directions and increased values in the ML direction (Table 2). This is in line with what was previously reported by Riva, et al. [35].
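To make concrete why HR is the one metric here that depends on the spectral content of the signal, the following is a minimal Python sketch of a harmonic ratio computed on a single stride. It assumes the commonly used definition (sum of even-harmonic amplitudes over sum of odd-harmonic amplitudes in the AP and V directions, inverted in the ML direction) over the first 20 harmonics of the stride frequency; the number of harmonics, the per-stride discrete Fourier transform, and the function names are assumptions rather than the implementation used in this study.

```python
import cmath
import math


def harmonic_amplitudes(stride, n_harmonics=20):
    """Amplitudes of the first n_harmonics DFT components of one stride."""
    n = len(stride)
    amplitudes = []
    for k in range(1, n_harmonics + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(stride)
        )
        amplitudes.append(abs(coeff))
    return amplitudes


def harmonic_ratio(stride, direction, n_harmonics=20):
    """Even/odd amplitude ratio for AP and V, odd/even for ML (assumed definition)."""
    amplitudes = harmonic_amplitudes(stride, n_harmonics)
    odd = sum(amplitudes[0::2])   # harmonics 1, 3, 5, ...
    even = sum(amplitudes[1::2])  # harmonics 2, 4, 6, ...
    if direction in ("AP", "V"):
        return even / odd
    if direction == "ML":
        return odd / even
    raise ValueError("direction must be 'AP', 'V', or 'ML'")


# Synthetic stride of 128 samples: a dominant second harmonic (two steps per
# stride) plus a first harmonic of one tenth the amplitude.
n_samples = 128
stride = [
    math.cos(2 * math.pi * 2 * i / n_samples)
    + 0.1 * math.cos(2 * math.pi * i / n_samples)
    for i in range(n_samples)
]
print(round(harmonic_ratio(stride, "V"), 3))   # 10.0
print(round(harmonic_ratio(stride, "ML"), 3))  # 0.1
```

Because every term in the ratio is a Fourier amplitude, any processing step that alters the spectrum of the stride, such as resampling, can shift the result, whereas metrics computed directly in the time domain are less exposed to it.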

Moderate to excellent between-day test-retest reliability was observed for 28 out of 33 IMU-based metrics; the few exceptions exhibited poor to fair reliability (Table 3). Additionally, none of the investigated metrics differed significantly between the two sessions (Figure 2 and Table 4), although some of these results (swing duration in particular) should be interpreted with care because of the medium effect size. These findings confirm that sensor-based gait analysis is a reliable tool in pwMS, as also reported in previous test-retest studies on pwMS [34].
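This section does not state which coefficient underlies the reliability grades in Table 3. As an illustration only, and assuming a two-way random-effects, absolute-agreement, single-measurement intraclass correlation, ICC(2,1), which is common in test-retest work, a between-day reliability estimate could be sketched in Python as follows. The function name and the example values are hypothetical and are not data from this study.

```python
def icc_2_1(scores):
    """ICC(2,1) from a table with one row per participant, one column per session."""
    n = len(scores)        # participants
    k = len(scores[0])     # sessions
    grand_mean = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((v - grand_mean) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical stride durations (s) for five participants in two sessions.
example = [
    [1.10, 1.12],
    [1.25, 1.22],
    [0.98, 1.01],
    [1.40, 1.37],
    [1.18, 1.20],
]
print(round(icc_2_1(example), 2))
```

The absolute-agreement form penalises a systematic shift between sessions as well as random disagreement, which is why it is often preferred when the same protocol is repeated on different days.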

Walking speed clearly affected the gait outcomes. In particular, the gait metrics representative of the rhythm, variability, and asymmetry domains were evidently lower in centre A than in centre B (Figure 2 and Table 5) because of the different instructions given to the participants in terms of walking speed (i.e., walk at maximum speed versus walk at self-selected speed). This finding is in agreement with previous studies on pwMS [28] and on people with other neurological conditions, such as Parkinson's disease [70], which observed a reduction of the above metrics with increasing walking speed. The shorter length of the walkway used in centre B could also have contributed to these differences. In fact, Storm, et al. [22] demonstrated that rhythm and variability metrics decreased when walking longer distances (e.g., lower stride duration and lower variability of stride duration). However, the data available for our study did not allow us to separate the effects of walking speed and path length, and further studies should hence be performed for this purpose.

Unlike the temporal metrics, the gait quality metrics appeared to be robust with respect to the notable differences in the experimental gait protocols adopted by the two centres. Among these metrics, in fact, only differences in the regularity index in the ML direction and the HR (representative of the symmetry domain) in the AP and ML directions were found to be statistically significant between centre A and centre B (Figure 2 and Table 5). Again, this specific result could be explained both by the different walking speeds and by the different lengths of the walkway in the two centres. Indeed, an association between walking speed and HR has been previously shown, both in healthy young [43,44] and older subjects [39]. These authors observed that the HR increased at the self-selected comfortable walking speed and decreased at slower and faster speeds. A similar trend emerged from our analysis, except for the HR in the ML direction, but this specific metric should be handled with care because of its observed low test-retest reliability (Table 3). The low number of steps (i.e., eight steps) used for calculating the HR for each walking bout might also have contributed to reducing the robustness and reliability of this metric [57,71]. However, this choice was imposed by the reduced length of the corridor in centre B. Testing the participants along a shorter path also implied a higher number of turns over the 6 min, resulting in lower validity of the HR, as shown in the research by Riva, et al. [35] and by Brach, et al. [40].

While further studies are of course needed to fully validate this hypothesis, our results suggest that, in agreement with what is already reported for other neurological diseases, such as Parkinson's disease [53], the gait quality metrics extracted from the upper body accelerations should not be considered as a simple reflection of gait spatio-temporal features and might bring complementary informative content in quantifying patients' gait ability. Additionally, these metrics have been recently shown to be sensitive to fatigue and pathology progression in pwMS [72] and, as such, they are promising candidates for quantification of disease progression and rehabilitation interventions in these patients.

#### **5. Conclusions**

In conclusion, this pragmatic study showed consistency in the gait metrics from two matched groups of pwMS, even when they were assessed in two different hospitals and under notably different gait testing conditions. The identification of such robust gait metrics opens the possibility of comparing retrospective data and paves the way for reliable multi-centre studies to be conducted in routine hospital settings rather than in specialised gait research laboratories. This is essential to allow an increase of sample size and statistical power of clinical trials in which rehabilitation interventions need to be quantitatively assessed.

**Author Contributions:** Conceptualisation, M.F. and C.M.; methodology, L.A., I.C., D.C., M.F. and C.M.; software, L.A.; formal analysis, L.A.; resources, D.C., E.G., B.S., D.P. and C.M.; writing-original draft preparation, L.A., I.C., M.F. and C.M.; writing-review and editing, L.A., I.C., D.C., M.F., E.G., B.S., D.P., K.P.S.N. and C.M.; visualisation, L.A.; funding acquisition, M.F., B.S., K.P.S.N. and C.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was co-funded by the NIHR through the Sheffield Biomedical Research Centre (BRC, grant number IS-BRC-1215-20017), by the European Union's Horizon 2020 research and innovation programme and EFPIA via the Innovative Medicine Initiative 2 (Mobilise-D project, grant number IMI22017-13-7-820820), the UK Engineering and Physical Sciences Research Council (Multisim and MultiSim2 project, grant number EP/K03877X/1 and EP/S032940/1), and by the Italian Ministry of Health, Ricerca Corrente (grant number GR-2009-1604984). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, the Department of Health and Social Care, the IMI, the European Union, the EFPIA, or any associated partners.

**Acknowledgments:** We would like to thank all participants for giving their time to support this research. This study was carried out at the NIHR Sheffield Clinical Research Facility (Sheffield, United Kingdom) and at the IRCCS Don Carlo Gnocchi Foundation (Milan, Italy). The authors would like to acknowledge William Hodgkinson, Craig Smith, and Jessy Moorman Dodd for the support in Sheffield's data collection.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Data Availability:** The data used in this paper will be made publicly available (DOI: 10.15131/shef.data.11395641).

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
