**Wearable Sensors in the Evaluation of Gait and Balance in Neurological Disorders**

Editors

**Antonio Suppa, Fernanda Irrera, Joan Cabestany**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Antonio Suppa Department of Human Neurosciences, Sapienza University of Rome Italy

Fernanda Irrera Department of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome Italy

Joan Cabestany Technical Research Centre for Dependency Care and Autonomous Living (CETpD), Universitat Politècnica de Catalunya Spain

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Sensors* (ISSN 1424-8220) (available at: https://www.mdpi.com/journal/sensors/special_issues/wearable_sensors_neurological_disorders).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Article Number*, Page Range.

**ISBN 978-3-03943-144-1 (Hbk) ISBN 978-3-03943-145-8 (PDF)**

© 2020 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.




### **About the Editors**

**Antonio Suppa**, MD, PhD, is working at the Department of Human Neurosciences, Sapienza University of Rome, Italy. He is currently Delegate of the Italian Society of Clinical Neurophysiology (SINC) and Scientific Advisor of the Sapienza information-based Technology InnovaTion Center for Health (STITCH). His research activity mainly focuses on the pathophysiology of motor symptoms in Parkinson's disease and other movement disorders. His current clinical activity also concerns the objective diagnosis of motor symptoms in patients with movement disorders, by means of advanced wireless and wearable technologies.

**Fernanda Irrera** joined the Sapienza University of Rome in 1989, where she is now a Full Professor of Electronics and a Scientific Advisor of the Sapienza information-based Technology InnovaTion Center for Health (STITCH). Since 2012, she has coordinated the Italy Chapter of the IEEE Electron Devices Society. Her main research activities are stand-alone and embedded integrated sensors for health and image applications, reliability in CMOS nanoelectronics, and non-volatile memories.

**Joan Cabestany**, PhD, Telecommunication Engineer, is currently working at the Electronic Engineering Department, Universitat Politècnica de Catalunya (UPC), Spain. His main activity is related to the research and development of systems for human gait and movement disorder monitoring, mainly related to Parkinson's disease. He is also a co-founder of the Sense4Care Company distributing related medical solutions in the market.

### **Preface to "Wearable Sensors in the Evaluation of Gait and Balance in Neurological Disorders"**

The aging population and the increased prevalence of neurological diseases have raised the issue of gait and balance disorders as a major public concern worldwide. Indeed, gait and balance disorders are responsible for harmful consequences, such as falls, frequently leading to hospitalization and even death. The high healthcare and economic burden of gait and balance disorders on society, therefore, requires new diagnostic and therapeutic strategies to promptly address this issue.

Advances in wearable technologies have offered innovative solutions to objectively assess different biological parameters, including motor behaviors, thus providing new opportunities in the management of health-related issues. Accordingly, over recent years, researchers have increasingly devoted greater efforts to assessing gait and balance through wearable sensors in healthy subjects and patients affected by neurological disorders. The use of wearable sensors has multiple appealing prospects, including applications in telemedicine and telerehabilitation in neurological patients with gait and balance disorders.

This book is a printed edition of the Special Issue "Wearable Sensors in the Evaluation of Gait and Balance in Neurological Disorders". It collects sixteen original research articles that provide the most up-to-date information about the objective evaluation of gait and balance disorders by means of wearable sensors in patients with various neurological diseases, such as Parkinson's disease, multiple sclerosis, stroke, traumatic brain injury, and cerebellar ataxia. Overall, this book offers a detailed overview of the most recent achievements in the field and encourages the development of new wearable solutions to address gait and balance disorders in patients with neurological diseases.

> **Antonio Suppa, Fernanda Irrera, Joan Cabestany** *Editors*

#### *Article*

### **Indoor Trajectory Reconstruction of Walking, Jogging, and Running Activities Based on a Foot-Mounted Inertial Pedestrian Dead-Reckoning System**

**Jesus D. Ceron <sup>1</sup>, Christine F. Martindale <sup>2</sup>, Diego M. López <sup>1,\*</sup>, Felix Kluge <sup>2</sup> and Bjoern M. Eskofier <sup>2,\*</sup>**


Received: 27 September 2019; Accepted: 1 November 2019; Published: 24 January 2020

**Abstract:** The evaluation of trajectory reconstruction of the human body obtained by foot-mounted Inertial Pedestrian Dead-Reckoning (IPDR) methods has usually been carried out in controlled environments, with very few participants and limited to walking. In this study, a pipeline for trajectory reconstruction using a foot-mounted IPDR system is proposed and evaluated in two large datasets containing activities that involve walking, jogging, and running, as well as movements such as side and backward strides, sitting, and standing. First, stride segmentation is addressed using a multi-subsequence Dynamic Time Warping method. Then, detection of Toe-Off and Mid-Stance is performed by using two new algorithms. Finally, stride length and orientation estimation are performed using a Zero Velocity Update algorithm empowered by a complementary Kalman filter. As a result, the Toe-Off detection algorithm reached an F-score between 90% and 100% for activities that do not involve stopping, and between 71% and 78% otherwise. Resulting return position errors were in the range of 0.5% to 8.8% for non-stopping activities and 8.8% to 27.4% otherwise. The proposed pipeline is able to reconstruct indoor trajectories of people performing activities that involve walking, jogging, running, side and backward walking, sitting, and standing.

**Keywords:** trajectory reconstruction; stride segmentation; dynamic time warping; pedestrian dead-reckoning

#### **1. Introduction**

Indoor positioning systems (IPS) enable the provision of several location-based services such as home monitoring, rehabilitation, navigation for blind and visually impaired people, and finding and rescuing people/firefighters in emergencies. IPSs can be divided into two approaches: infrastructure-based and infrastructure-free [1,2]. Infrastructure-based IPS require the deployment of devices in the indoor environment to calculate the position of the person. Among the technologies used by this type of IPS are Wi-Fi [3], radio frequency identification (RFID) [4], Bluetooth [5], ultra-wide band (UWB) [6], infrared [7], and video cameras [4]. Infrastructure-free IPS do not need the deployment of devices and mainly use dead-reckoning algorithms. Those systems are called inertial pedestrian dead-reckoning (IPDR) because they use body movement information measured by inertial measurement units (IMU) to estimate a person's position changes based on a previously estimated or known position [2]. The sum of these changes of position allows the reconstruction of the person's trajectory [2]. An IMU usually consists of a triaxial accelerometer and gyroscope. Although some IMUs also incorporate a triaxial magnetometer, alterations of the magnetic field indoors make it unreliable for indoor positioning [8].

The advantages of IPDR systems over infrastructure-based systems are generally lower cost, data privacy, and ease of deployment. However, IPDR systems without correction suffer from severe drift, as person displacement is often calculated by integrating acceleration data from the accelerometer twice and integrating the rotational angle from the gyroscope. In consequence, intrinsic errors and IMU noise are raised to the third power, making a person's trajectory reconstruction by direct integration without correction impractical [9–11].
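To give a sense of the scale of this drift, the following minimal sketch double-integrates the output of an accelerometer that is actually at rest but reports a small constant bias. The bias value (0.05 m/s²) and the durations are illustrative assumptions, not values from this study, and the function name is hypothetical. A constant accelerometer bias alone produces a position error that grows with the square of time; orientation errors from the gyroscope add to it.

```python
def integrate_twice(acc, dt):
    """Euler double integration of acceleration samples.

    Returns the final velocity (m/s) and position (m), starting from rest.
    """
    v = 0.0
    p = 0.0
    for a in acc:
        v += a * dt  # first integration: acceleration -> velocity
        p += v * dt  # second integration: velocity -> position
    return v, p


FS = 200      # sampling frequency in Hz, as used for both datasets
BIAS = 0.05   # assumed constant accelerometer bias in m/s^2 (illustrative)

for seconds in (1, 10, 60):
    n = seconds * FS
    v, p = integrate_twice([BIAS] * n, 1.0 / FS)
    print(f"{seconds:3d} s at rest -> apparent displacement {p:.2f} m")
```

Because the foot is known to be stationary during part of every stride, a zero velocity update can reset the velocity estimate at each stance phase and keep this error from accumulating across strides.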

The literature review done in this study is aimed at foot-mounted IMU IPDR systems that only use the accelerometer and/or gyroscope. Foot-mounted IPDRs, together with a zero velocity update (ZUPT) algorithm, have been the most widely used and successful method to mitigate the drift in trajectory reconstruction [9]. We use only the accelerometer and gyroscope because, in indoor environments, different sources might produce alterations in the magnetic field that make the magnetometer readings unreliable for trajectory reconstruction [8]. Most of the foot-mounted IPDR systems that only use accelerometer and gyroscope data are based on trajectory reconstruction during normal walking. Natural movements like avoiding obstacles, sitting, swinging legs, stopping, or performing activities like jumping, jogging, or running have rarely been considered [9,10]. In consequence, the literature review is focused on the foot-mounted IPDR systems that have reconstructed the trajectory of walking, jogging, and/or running activities. Thus, only six studies met the inclusion criteria and are part of the literature review. The foot-mounted IPDR systems are usually evaluated in closed-loop trajectories by measuring the return position error (RPE). The RPE indicates the distance between the final position of the person obtained by the system and the actual physical final position of the person at the end of the trial [8].

Threshold-based and machine learning-based foot-mounted IPDR approaches have been proposed to deal with walking and running activities [12–16]. Li et al. [12,13] proposed a threshold-based stance-phase detector that consists of one footstep detector and two zero velocity detectors, one for walking and another for running. The evaluation of the system was done with one pedestrian who followed two closed-loop trajectories while walking and running. For the square-shape path (195.7 m), the RPE was 0.24% for walking and 0.42% for running. For the eight-shape path (292.1 m), the RPE was 0.2% for walking and 1.01% for running. An adaptive zero-velocity detector that selects an optimal threshold for zero-velocity detection depending on the movement (walking or running) of the person was proposed by Wagstaff et al. [15]. This system was evaluated by five people who walked and ran a distance of 130 m in an "L" shaped path. The reported RPEs were 1% for walking and 3.24% for running.

Considering that zero-velocity detection using machine learning-based IPDR systems is free of threshold-tuning, Wagstaff et al. proposed a method for zero-velocity detection by using a long short-term memory neural network (LSTM) [16]. Five people walked and ran a 220-m "L" shaped path. The RPE was 0.49% for walking and 0.93% for running. Similarly, Ren et al. proposed a zero-velocity detection algorithm based on a hidden Markov model (HMM) [14]. The system was evaluated by one person in an oval-shaped sports field of 422 m. The RPE when walking and running was 0.6% and 1.61%, respectively.

The described works have obtained very high precision in the trajectory reconstruction of walking and running. However, the systems were evaluated with very few participants, and the evaluated trajectories involved continuous walking and running activities. Currently, trajectory reconstruction methods evaluated in realistic scenarios (with several people, and considering walking, jogging, and running strides) are still missing.

Physical activity classification and gait event detection are key components of the trajectory reconstruction process using IPDR. Machine learning has played an important role in both topics. In [17] it is shown how different machine learning-based algorithms are able to classify different physical activities, including standing, sitting, walking, and running. Gait event detection has been performed by using several machine learning algorithms such as deep learning [18], hidden Markov models (HMM) [19,20] and neural networks [21,22].

The aim of the present work was to propose a pipeline for trajectory reconstruction using a foot-mounted IPDR system able to reconstruct the trajectories of activities that involve walking, jogging, and running strides as well as natural movements like stopping, standing, sitting, and side-walking.

This paper contributes to foot-mounted IPDR systems by (1) comprehensively evaluating the trajectory reconstruction of activities that involve walking, jogging, and running strides including the discrimination of natural activities such as stopping, sitting, and side-walking; and (2) evaluating two algorithms for Toe-off and Mid-Stance detection during walking, jogging, and running strides adapted from the ones proposed by Barth et al. [23].

The proposed pipeline is able to recognize walking, jogging and running strides and detect the Toe-off and Mid-Stance events in each of them. With this information, a foot-mounted IPDR system is able to reconstruct the person's trajectory regardless of their gait speed. This allows the development of new ambient assisted living applications in which indoor tracking is an underlying technology as well as the development of new applications for indoor sports.

#### **2. Datasets**

#### *2.1. Unicauca Dataset*

The objective of the Unicauca dataset was to evaluate the trajectory reconstruction of walking, jogging, and running in settings similar to those of the state-of-the-art methods, which are usually evaluated in closed-loop trajectories with participants performing continuous walking, jogging, or running. This dataset was collected at the University of Cauca, Popayán, Colombia. Ten participants (mean age: 30 ± 3 years) walked, jogged, and ran a closed-loop P-shaped path of approximately 150 m (Figure 1) with an IMU attached to the lateral side of the left shoe with a Velcro strap (Figure 2).

**Figure 1.** Illustration of the path used for walking, jogging and running in the Unicauca dataset. It is a "P" shaped path. The dotted red line represents the trajectory followed by one person, dotted black lines show outer edges (walls) of the path, and the blue square shows the start and end point of the trajectory.

The IMU was a Shimmer3 GSR+ (Shimmer Sensing, Dublin, Ireland). Acceleration (range: ±16 g) and angular velocity (range: ±2000 dps) data were collected at a frequency of 200 Hz. Accelerometer calibration consisted of leaving the sensor still for a few seconds lying on each of its 6 sides on a flat surface. For gyroscope calibration, the sensor was rotated around the three axes. At the beginning of each trial, the participant was asked to remain standing without moving the IMU for at least 10 s for gyroscope bias calculation.

**Figure 2.** IMU sensor placement and axis alignment. (**a**) Accelerometer. (**b**) Gyroscope.

#### *2.2. FAU Dataset*

The FAU dataset is based on a previous study evaluating a method for smart labeling of cyclic activities [24] and is publicly available at www.activitynet.org. The dataset provides gait data in a relatively natural setting, and its protocol consisted of the execution of 12 different task-driven activities performed in random order for each participant. It includes data from 80 healthy participants with a mean age of 27 ± 6 years. Data were collected from 56 participants at the Friedrich-Alexander University Erlangen-Nürnberg (Germany) and from 24 participants at the University of Ljubljana (Slovenia). In this study, data collected in Slovenia from 20 of the 24 participants (mean age of 28 years) were used as the training dataset [25] and data collected in Germany from the 56 participants were used as the evaluation dataset. Only the data collected from the IMU worn on the left foot were used for trajectory reconstruction of ten activities (Table 1). Sensor placement and axis alignment are the same as those used in the Unicauca dataset (Figure 2). The acceleration (range: ±8 g) and angular velocity (range: ±2000 dps) were collected at a frequency of 200 Hz. The on-ground and off-ground phases of each stride are labeled. The accelerometer was calibrated using six static positions and the gyroscope was calibrated using a complete rotation about each of the three axes. Data were acquired in an indoor environment which included chairs and tables (Figure 3). Jogging was described to the participants as "if one would jog for exercise in the evening" and running as "if one is late for a bus". These instructions were the same as those used in the Unicauca dataset.

**Figure 3.** Map of the indoor environment used for collecting the FAU dataset. Blue squares represent chairs that denote start/end positions of activities. Black rectangles represent tables, and dotted red lines represent the possible trajectories followed by participants in each activity.


**Table 1.** Activity descriptions and abbreviations, shown with their relevant start and end points as labeled in Figure 3 as well as approximated distances.

#### **3. Methods**

The trajectory reconstruction pipeline was carried out separately for each activity of both datasets (Figure 4). This pipeline is based on previous work by Hannink et al. [26], with three changes: an activity-type classification step was included, the Toe-Off and Mid-Stance algorithms were modified in order to deal with non-walking strides, and a complementary filter was added for stride length and orientation estimation.

**Figure 4.** Pipeline for trajectory reconstruction for each activity.

#### *3.1. Stride Segmentation*

As shown by Zrenner et al., a threshold-based stride segmentation and a double integration with the ZUPT algorithm performed better than other approaches based on stride time, foot acceleration, and deep learning for calculating stride length in running using a foot-mounted IMU [27]. Thus, multi-dimensional subsequence dynamic time warping (msDTW) and a double integration with ZUPT were used as the stride segmentation and stride length and orientation estimation methods, respectively, in this study [23].

msDTW is used to find a subsequence of continuous signal sequences similar to a given reference pattern. In the context of stride segmentation, that pattern consists of a template of one stride. The stride start was set to the negative peak before the swing phase and stride end to the negative peak at the end of the stance phase (Figure 5a), according to the definition of stride given in [20]. Using that template, msDTW looks for similarities in a movement sequence. msDTW has been shown to be a robust method to segment strides from healthy, geriatric, and Parkinson's patients using foot-mounted IMUs [28].
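The idea of matching a stride template against a longer recording can be sketched with a plain subsequence DTW. This is a simplified, hypothetical illustration rather than the msDTW implementation used in the study: it works on a single axis, uses the absolute difference as local cost, and only reports where candidate strides end. The function names and the toy signal are assumptions.

```python
def subsequence_dtw_cost(template, signal):
    """Accumulated cost of aligning the whole template to a subsequence
    of `signal` ending at each index j (subsequence DTW, 1-D)."""
    n, m = len(template), len(signal)
    # First row: a match may start anywhere, so no cost is accumulated
    # along the signal axis.
    prev = [abs(template[0] - s) for s in signal]
    for i in range(1, n):
        cur = [0.0] * m
        cur[0] = prev[0] + abs(template[i] - signal[0])
        for j in range(1, m):
            d = abs(template[i] - signal[j])
            cur[j] = d + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev


def stride_ends(cost, threshold):
    """Indices that are local minima of the cost and do not exceed the threshold."""
    ends = []
    for j, c in enumerate(cost):
        left = cost[j - 1] if j > 0 else float("inf")
        right = cost[j + 1] if j < len(cost) - 1 else float("inf")
        if c <= threshold and c <= left and c <= right:
            ends.append(j)
    return ends


template = [0, 1, 0]                 # toy "stride" pattern
signal = [5, 5, 0, 1, 0, 5, 5]       # toy recording containing the pattern once
cost = subsequence_dtw_cost(template, signal)
print(stride_ends(cost, threshold=0.5))
```

A smaller threshold accepts only subsequences that closely resemble the template, while a larger one tolerates more deviation; recovering the start of each match would additionally require backtracking through the cost matrix.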

#### 3.1.1. Template Generation

A MATLAB script was developed for template generation. It included two steps: interpolation and averaging. Interpolation consisted of taking each stride and interpolating it to a fixed duration of 200 samples. After interpolation, the template was obtained by averaging, sample by sample, all the strides. The templates for walking, jogging, and running were built using the 8724, 1688, and 1360 walking, jogging, and running strides, respectively, of the training dataset. Unlike other studies, which used only straight strides for building templates [23,28,29], the three templates were built with all the strides of the activities. Thus, both straight and non-straight strides were included in the templates.
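The two template-generation steps can be sketched as follows. This is a minimal illustration, not the authors' script: linear interpolation is an assumption (the interpolation method is not stated), and the function names `resample` and `build_template` are hypothetical.

```python
def resample(stride, n_out=200):
    """Linearly interpolate one stride (a list of samples) to n_out samples."""
    n_in = len(stride)
    if n_in == 1:
        return [float(stride[0])] * n_out
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)  # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(stride[lo] * (1 - frac) + stride[hi] * frac)
    return out


def build_template(strides, n_out=200):
    """Resample every stride to n_out samples and average sample by sample."""
    resampled = [resample(s, n_out) for s in strides]
    return [sum(column) / len(column) for column in zip(*resampled)]


# Two toy strides of different duration, reduced to a 3-sample template.
print(build_template([[0, 2], [2, 4, 6]], n_out=3))
```

Resampling to a common length before averaging is what allows strides of different durations to contribute to the same template.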

The swing-phase starts when the foot leaves the ground (Toe-Off) and ends when the heel strikes the ground (Heel Strike). The portion of the gyroscope z-signal after Heel Strike (HS) describes the stance-phase. A Mid-Stance (MS) event is defined as the part of the stance-phase when the signal energy is zero [30].

**Figure 5.** (**a**) Walking, jogging, and running templates (gyroscope z-axis). (**b**) Running stride example (gyroscope z-axis).

#### 3.1.2. Classification of Walking, Jogging, and Running Activities

In order to automatically select the walking, jogging, or running template that will be used in the stride segmentation process, the machine learning algorithms included in the MATLAB Classification Learner app were trained using the activities of the training dataset. A window size of 200 samples (1 s of data) and an overlap of 100 samples were used for feature extraction. The features extracted were velocity (by integrating accelerometer readings), angular velocity (by integrating gyroscope readings) and energy of accelerometer and gyroscope axes. The most frequent window-level prediction within an activity was chosen as the final classification of that activity. The evaluation was performed using ten-fold cross-validation. As a result, the highest accuracy (98.1%) was achieved by the SVM classifier with a third-order polynomial kernel function.
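The windowing and the final vote can be sketched independently of the classifier. In this hypothetical illustration, energy is taken to be the sum of squared samples (an assumption), the classifier itself is omitted, and all function names are placeholders.

```python
from collections import Counter


def sliding_windows(n_samples, size=200, overlap=100):
    """(start, end) index pairs of full windows; end is exclusive."""
    step = size - overlap
    return [(s, s + size) for s in range(0, n_samples - size + 1, step)]


def signal_energy(samples):
    """Energy of one axis within a window, taken here as the sum of squares."""
    return sum(v * v for v in samples)


def majority_vote(window_labels):
    """Most frequent per-window prediction, used as the activity label."""
    return Counter(window_labels).most_common(1)[0][0]


print(sliding_windows(500))
print(majority_vote(["jogging", "running", "jogging"]))
```

At 200 Hz, a recording of 500 samples yields four overlapping one-second windows, and a single misclassified window does not change the label assigned to the whole activity.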

#### 3.1.3. Multi-Subsequence Dynamic Time Warping Implementation

The output of the stride segmentation based on msDTW is a set of segments [31]. Each segment describes a possible stride. One issue using these resulting segments for trajectory reconstruction is that often the end of a segment does not coincide with the start of the next segment even for consecutive strides (Figure 6a). The solution to this issue is based on the Toe-Off (TO) detection, which is described in the next section. Using the templates (Figure 5a), the first event detected in each stride is TO. For this reason, TO was defined as the beginning of a stride. For consecutive strides, the end of the stride corresponds with the beginning of the next stride (next TO), resulting in a stride segmentation without "holes" (Figure 6b).

The precision and sensitivity of the stride segmentation using msDTW can be tuned using a threshold. The threshold needed to detect a stride indicates the similarity between that stride and the template used, that is, a large threshold indicates a large difference between the template and the segmented stride [23]. Therefore, with a very small threshold, the number of false-negative strides would increase, and a very large threshold would generate false-positive strides. Thresholds from 0 to 100 in steps of 5 were tested on the training dataset. As a result, it was found that a fixed threshold of 65 maximizes the F-score of the stride segmentation in walking, jogging, and running activities (Figure 7).
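The threshold selection amounts to computing the F-score for each candidate and keeping the best one. The sketch below shows the mechanics only; the counts are made up for illustration and are not results from the study, and the function names are hypothetical.

```python
def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(counts):
    """counts maps threshold -> (tp, fp, fn); returns the threshold with the highest F-score."""
    return max(counts, key=lambda t: f_score(*counts[t]))


# Hypothetical stride counts: a low threshold misses strides (many FN),
# a high threshold accepts spurious ones (many FP).
counts = {30: (60, 0, 40), 65: (95, 5, 5), 100: (98, 40, 2)}
print(best_threshold(counts))
```

In practice the dictionary would hold one entry per tested threshold, each obtained by running the segmentation on the training dataset and comparing against the labeled strides.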

**Figure 6.** (**a**) Result of stride segmentation with msDTW. (**b**) Final stride segmentation with TO detection. Blue vertical lines depict TOs. Light blue rectangles are segmented strides.

**Figure 7.** Threshold choice for stride segmentation of walking, jogging, and running strides using msDTW.

#### *3.2. Toe-Off and Mid-Stance Detection*

The previous algorithms for TO and MS detection [31] were modified in order to improve detection accuracy in jogging and running. These modifications are described in this section. Both previous and proposed algorithms use the signal of the gyroscope *z*-axis for TO and MS detection.

#### 3.2.1. TO Detection

At TO, the gyroscope *z*-axis signal shows a zero crossing because the ankle joint changes from plantar flexion to a dorsal extension position in the sagittal plane [23]. The algorithm included in [31] for TO detection consists of detecting the first zero crossing in the gyroscope *z*-axis. Due to the abrupt movements in jogging and running strides, in a few cases, a peak located at the beginning of the stride causes a zero crossing. This would lead to a wrong TO detection (red circle in Figure 8). Consequently, the adapted algorithm for TO detection (Algorithm 1) finds the maximum peak of the signal and then finds the nearest zero crossing before it (blue circle in Figure 8). After the detection of all the TOs that belong to the activity, all the portions from one TO to the next are considered as strides (Figure 6b). Considering that the stride time of walking strides is around one second [24], if a TO-to-TO portion was longer than 2 s (400 samples), only the first 1.5 s of the signal was taken into account. This often happens because the participant is standing still or sitting.

#### **Algorithm 1: Toe-off (TO) detection algorithm.**

1: *xMP* ← *getMaximumPeak(stride)*

2: *xZC* ← *getZeroCrossings(stride(1 : xMP))*

3: *TO* ← *getNearestZCtoMP(xZC, xMP)*
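A minimal Python sketch of Algorithm 1 is given below. It is one possible reading of the pseudocode, not the authors' code: a zero crossing is taken to be either an exact zero or a sign change between consecutive samples, the index of the sample just before the sign change is reported, and indices are 0-based. The function name and the toy stride are assumptions.

```python
def detect_toe_off(stride):
    """Index of the zero crossing nearest to, and before, the maximum peak
    of the gyroscope z-axis signal of one stride; None if there is none."""
    # Step 1: position of the maximum peak.
    x_mp = max(range(len(stride)), key=lambda i: stride[i])
    # Steps 2-3: walk backwards from the peak; the first crossing found
    # is the nearest one before the peak.
    for i in range(x_mp - 1, -1, -1):
        if stride[i] == 0 or stride[i] * stride[i + 1] < 0:
            return i
    return None


# Toy stride: an early spike causes a crossing between samples 0 and 1,
# which a "first zero crossing" rule would pick. The peak-anchored rule
# returns the crossing between samples 3 and 4 instead.
stride = [-1, 2, -3, -1, 1, 5, 2]
print(detect_toe_off(stride))
```

Anchoring the search at the maximum peak is what makes the detection insensitive to spurious crossings at the beginning of jogging and running strides.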

**Figure 8.** Example of TO and MS detection. The red circle and square show a wrong TO and MS detection, respectively, using the previous TO detection algorithm. The blue circle and square show an adequate TO and MS detection, respectively, using the proposed algorithms.

#### 3.2.2. Mid-Stance Detection

Mid-Stance (MS) is defined as the moment at which the foot is entirely stationary on the ground [23,28] and its velocity is zero. The gyroscope *z*-signal is minimal at that moment. As the speed of movement increases from walking to running, the stance-phase time decreases (Figure 5a), making MS detection more difficult [10]. The previous algorithm for MS detection in walking strides consists of calculating the middle of the window with the lowest energy in the full stride's gyroscope *z*-signal [23,28,31]. For jogging and running strides, the MS is often confused with other parts of the signal like the valley just before the HS or the peak before the next TO (red square in Figure 8).

The adaptation of the MS detection algorithm (Algorithm 2) consisted of (1) taking only the stride portion from HS to 80% of the stride, a portion chosen taking into account that the stance-phase is approximately the last 60% of the stride for walking strides and approximately the last 40% of the stride for jogging and running strides [25]; and (2) calculating the middle of the window with the lowest energy within that portion, using a window size of 20 samples (100 ms) and a window overlap of 10 samples (blue square in Figure 8).

#### **Algorithm 2: Mid-Stance (MS) detection algorithm.**

1: *windowSize* ← *20*

2: *overlap* ← *10*

3: *stride* ← *interpolateStrideTo200Samples(stride)*

4: *xMP* ← *getMaximumPeak(stride)*

5: *stride* ← *stride(xMP : 160)*

6: *xHS* ← *getMinimumPeak(stride)*

7: *stride* ← *stride(xHS : end)*

8: *MS* ← *getMinimumEnergy(stride, windowSize, overlap)*
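Algorithm 2 can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the stride is assumed to be already resampled to a fixed length (200 samples in the study), energy is taken as the sum of squared samples, ties between windows are resolved in favor of the earliest one, and indices are 0-based. The function name and the synthetic stride are hypothetical.

```python
def detect_mid_stance(stride, window=20, overlap=10, end_frac=0.8):
    """Index of the Mid-Stance event within a resampled gyroscope z-axis stride.

    Returns None when no full window fits between heel strike and the cut-off.
    """
    n = len(stride)
    end = int(end_frac * n)  # 160 for a 200-sample stride
    x_mp = max(range(n), key=lambda i: stride[i])  # maximum (swing) peak
    if x_mp >= end:
        return None
    # Heel strike: minimum between the peak and the 80% cut-off.
    x_hs = min(range(x_mp, end), key=lambda i: stride[i])
    step = window - overlap
    best_start, best_energy = None, float("inf")
    for s in range(x_hs, end - window + 1, step):
        energy = sum(v * v for v in stride[s:s + window])
        if energy < best_energy:
            best_start, best_energy = s, energy
    if best_start is None:
        return None
    return best_start + window // 2  # middle of the lowest-energy window


# Synthetic stride: swing peak at 50, heel-strike valley at 80,
# a flat stance segment from 100 to 139, and activity afterwards.
stride = [0.0] * 200
stride[50] = 10.0
stride[80] = -8.0
for i in range(81, 100):
    stride[i] = 3.0
for i in range(140, 160):
    stride[i] = 2.0
for i in range(160, 200):
    stride[i] = 4.0
print(detect_mid_stance(stride))
```

Restricting the search to the portion between heel strike and 80% of the stride is what prevents the valley before HS or the rise toward the next TO from being selected.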

#### 3.2.3. Stride Length and Orientation Estimation

The biggest challenge to adequately estimate stride length using IMU data is the significant bias derived from the use of IMUs, which leads to large drifts after the double-integration process. For that reason, the ZUPT method was used. Zero-velocity detection was done by evaluating a threshold on the magnitude of the gyroscope rate of turn of each measurement. If the measurement is less than a threshold of 0.6 dps, that measurement is considered as a zero-velocity measurement. It has been proved that this simple approach works properly in walking strides [11,30]. However, this approach does not work correctly in jogging and running strides due to the abrupt signal variations. The solution to this problem is the use of the MS detected previously. Taking into account that the average stance-phase time in running strides is around 100 ms (20 samples), it was empirically found that taking 5 samples to each side of the MS (which corresponds to 50 ms with the sampling frequency used) leads to better zero-velocity detection in jogging and running strides.
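The two zero-velocity rules described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the input to the threshold rule is assumed to be the per-sample gyroscope magnitude in dps, the MS sample itself is counted as stationary in addition to the 5 samples on each side, and the function names are hypothetical.

```python
def zero_velocity_by_threshold(gyro_magnitude_dps, threshold=0.6):
    """Walking rule: a sample is stationary if the gyroscope magnitude
    is below the threshold (0.6 dps in the text)."""
    return [g < threshold for g in gyro_magnitude_dps]


def zero_velocity_around_ms(n_samples, ms_indices, half_width=5):
    """Jogging/running rule: mark half_width samples on each side of every
    detected Mid-Stance index (and the MS sample itself) as stationary."""
    flags = [False] * n_samples
    for ms in ms_indices:
        start = max(0, ms - half_width)
        stop = min(n_samples, ms + half_width + 1)
        for i in range(start, stop):
            flags[i] = True
    return flags


print(zero_velocity_by_threshold([0.1, 0.6, 2.0]))
print(sum(zero_velocity_around_ms(20, [10])))
```

The flagged samples are the ones at which a ZUPT measurement would be passed to the filter.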

After zero-velocity detection, a complementary Kalman filter (CF) was used in order to model the error in velocity and position estimates using the ZUPTs as measurements (see Appendix A for details). When zero velocity is detected, but the estimated velocity is different from zero, the CF adjusts the velocity and the corresponding displacement. The CF used in this work is based on the work of Fischer et al. [11]. Three main parameters have to be set up for CF initialization: accelerometer and gyroscope noise ($\sigma_a$ and $\sigma_w$) and the ZUPT detection noise ($\sigma_v$). Accelerometer and gyroscope noise were set to the same values in both datasets ($\sigma_a = 0.01\ \mathrm{m/s^2}$ and $\sigma_w = 0.01\ \mathrm{rad/s}$). ZUPT detection noise depends on the velocity of the participant. That parameter was established by evaluating from $\sigma_v = 0.001\ \mathrm{m/s}$ to $\sigma_v = 0.05\ \mathrm{m/s}$ in steps of $0.001\ \mathrm{m/s}$ for each trajectory performed. The $\sigma_v$ chosen was the one that produced the least error in the final distance evaluated. The stride length and orientation estimation are obtained using the position increments in each MS event. Stride length, where $\nabla P_k$ is the position increment from stride $k-1$ to stride $k$, is calculated as follows:

$$SL_k = \sqrt{\nabla P_k(x)^2 + \nabla P_k(y)^2} \tag{1}$$
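As a worked instance of Equation (1), the sketch below computes stride lengths from horizontal positions sampled at consecutive MS events. The coordinates are made up for illustration and the function name is hypothetical.

```python
import math


def stride_lengths(ms_positions):
    """Stride lengths from (x, y) positions at consecutive Mid-Stance events.

    Each length is the Euclidean norm of the horizontal position increment
    between one MS event and the next, as in Equation (1).
    """
    return [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(ms_positions, ms_positions[1:])
    ]


# Three MS positions in metres (illustrative values) give two strides.
print(stride_lengths([(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]))
```

Only the horizontal components enter the calculation, so vertical drift in the position estimate does not affect the stride length.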

#### **4. Results**

#### *4.1. Unicauca Dataset*

#### 4.1.1. Classification of the Type of Activity

The accuracy in the activity classification was 90%. There were only three misclassifications: two running activities were classified as jogging activities and one jogging activity was classified as a running activity (Figure 9).

**Figure 9.** Confusion matrix of the classification of the type of activity in the Unicauca dataset.

#### 4.1.2. Toe-Off and Mid-Stance Detection

In this dataset, TO and MS were manually labeled. A TO/MS is considered as a true positive (TP) if it is located within 15% of the total number of samples of the stride on either side of the TO/MS ground truth. A false positive (FP) occurs when a TO/MS is detected outside this range. A false negative (FN) indicates that a TO/MS for a stride was not detected. Bearing in mind that approximately 60% and 40% of the stride correspond to the stance-phase of walking and running strides, respectively [25], the TO detection performance was evaluated in the training dataset using error ranges from 5% to 21% of the total stride in steps of 3% (Figure 10). As a result, 15% was chosen as an acceptable error range for TP calculation.
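The TP/FP/FN rule can be written as a small per-stride check. This is a sketch under stated assumptions: a missing detection is represented by `None`, the tolerance is inclusive at its boundary, and the function name and example indices are hypothetical.

```python
from collections import Counter


def classify_detection(detected, truth, stride_len, tol=0.15):
    """Label one stride's TO/MS detection as 'TP', 'FP' or 'FN'.

    detected: detected sample index, or None if no event was found.
    truth: ground-truth sample index.
    stride_len: number of samples in the stride.
    """
    if detected is None:
        return "FN"
    if abs(detected - truth) <= tol * stride_len:
        return "TP"
    return "FP"


# Three strides of 200 samples each: the tolerance is 30 samples per side.
cases = [(100, 120, 200), (100, 140, 200), (None, 120, 200)]
print(Counter(classify_detection(d, t, n) for d, t, n in cases))
```

Expressing the tolerance as a fraction of the stride length keeps the criterion comparable across walking, jogging, and running strides, whose durations differ.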

**Figure 10.** TO performance evaluation using error ranges from 5% to 21% in steps of 3%.

Results of the evaluation of the TO and MS detection using the previous and proposed algorithms are shown in Tables 2 and 3, respectively.


**Table 2.** Averaged results of TO and MS detection for the 10 participants in the Unicauca dataset using the previous TO and MS detection algorithms.

TO GT: ground truth TO rate. MS GT: ground truth MS rate. TP: true-positive rate. FP: false-positive rate. FN: false-negative rate.

**Table 3.** Averaged results of TO and MS detection for the 10 participants in the Unicauca dataset using the proposed TO and MS detection algorithms.


TO GT: ground truth TO rate. MS GT: ground truth MS rate. TP: true-positive rate. FP: false-positive rate. FN: false-negative rate.

A perfect F-score was obtained for TO and MS detection in walking strides. A few detection errors occurred for jogging and running, but the F-score remained high.

#### 4.1.3. Trajectory Reconstruction

Two evaluation measures were used. (1) Return position error (RPE): the distance between the coordinates of the actual final point of the activity and the coordinates of the participant's final stride of the corresponding activity. (2) Strides out of trajectory (SOT): All strides of the reconstructed trajectory should be within the boundaries of the corridors represented by black dotted lines (Figure 11). Otherwise, those strides will be counted as out of trajectory.
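Both measures can be sketched in a few lines. Two assumptions are made here that the text does not state explicitly: the RPE percentage is obtained by dividing the distance error by the length of the path, and the corridors are approximated by axis-aligned rectangles. The function names and all coordinates are hypothetical.

```python
import math


def return_position_error(final_xy, true_final_xy, path_length):
    """Distance between the estimated and actual final positions, in metres
    and as a percentage of the path length (assumed normalization)."""
    err = math.hypot(final_xy[0] - true_final_xy[0],
                     final_xy[1] - true_final_xy[1])
    return err, 100.0 * err / path_length


def strides_out_of_trajectory(stride_points, corridors):
    """Number of stride positions lying outside every corridor.

    corridors: list of rectangles (xmin, ymin, xmax, ymax) in metres.
    """
    def inside(p):
        return any(x0 <= p[0] <= x1 and y0 <= p[1] <= y1
                   for x0, y0, x1, y1 in corridors)
    return sum(1 for p in stride_points if not inside(p))


err, pct = return_position_error((0.3, 0.4), (0.0, 0.0), path_length=150.0)
print(f"RPE: {err:.2f} m ({pct:.2f}% of the path)")
print(strides_out_of_trajectory([(1, 1), (5, 3), (11, 1)], [(0, 0, 10, 2)]))
```

For a closed-loop path the actual final position coincides with the start point, so the RPE only requires the reconstructed end point and the path length.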

Higher velocity corresponds to more SOT and a larger RPE. Although, on average, 5.7% of the strides are out of trajectory in the running trial, the RPE remains less than 1.0% (Table 4). Trajectories of the three trials are mostly within the boundaries (Figure 11).


**Table 4.** Average results of trajectory reconstruction for each type of activity performed by the 10 participants using the previous and the proposed TO and MS detection algorithms.

SOT: strides out of trajectory, RPE: return position error, [31]: previous TO and MS detection algorithms, New A: proposed TO and MS detection algorithms.

**Figure 11.** Trajectory reconstruction for the ten participants of the Unicauca dataset in a P-shaped path. Black dotted lines show the outer edges (walls) of the possible path. Gray lines are the trajectories of the ten participants reconstructed using the proposed pipeline.

#### *4.2. FAU Dataset*

#### 4.2.1. Classification of the Type of Activity

The accuracy obtained by the SVM classifier was 93%. Most of the misclassifications occurred when classifying between running and jogging (Figure 12).

**Figure 12.** Confusion matrix of the classification of the type of activity in the FAU dataset.

#### 4.2.2. Toe-Off Detection

The last sample of the on-ground phase of each stride was used as ground truth for the evaluation of the TO detection algorithm (Table 5). The same criteria as in the Unicauca dataset were used for the TP, FP, and FN calculations. The evaluation was carried out on the FAU dataset, that is, the data collected from the 56 participants at the Friedrich-Alexander University Erlangen-Nürnberg (Germany).

**Table 5.** Average results of TO detection for each type of activity performed by the 56 participants in the FAU dataset using the previous and the proposed TO detection algorithms.


TO: toe-off rate, TP: true positives rate, FP: false positives rate, FN: false negatives rate, [31]: previous TO and MS detection algorithms, New A: proposed TO and MS detection algorithms.

#### 4.2.3. Body Trajectory Reconstruction

For RPE estimation in the FAU dataset (Table 6), it is important to note that the start/end activity positions were defined by chairs in the indoor environment. For that reason, the actual positions where the participants started and finished the activities were not precisely the same as the chairs' positions, since participants began each activity near the corresponding chair and did not necessarily return to the exact point where they started the activity. Based on the videos of the data collection, participants started and finished the activities within a radius of 1.5 m around the chairs. Light blue and gray rectangles in Figures 13 and 14, respectively, indicate the path where all the strides related to a certain activity should take place. If a stride is out of this path, it is considered a stride out of trajectory (SOT). A SOT can be caused by the accumulated error of the stride length and angle calculations of previous strides. These zones were defined taking into account the coordinates of the chairs and tables and the boundaries of the indoor environment.


**Table 6.** Averaged results of trajectory reconstruction of activities performed by the 56 participants in the FAU dataset using the previous and the proposed TO and MS detection algorithms.

SOT: strides out of trajectory, RPE: return position error, [31]: previous TO and MS detection algorithms, New A: proposed TO and MS detection algorithms.

**Figure 13.** Trajectory reconstruction of non-circuit activities for all 56 participants of the FAU dataset. Black, blue and orange lines denote R-20, J-20, and W-20, respectively. Red, green, violet and light green lines represent W-Cards, W-Slalom, W-Posters, and W-Tables, respectively. Gray rectangles represent zones where all the strides related to certain activity should take place.

**Figure 14.** Trajectory reconstruction of circuit activities for all 56 participants of the FAU dataset. Black lines denote the trajectories followed by the participants. Gray zones represent the zone where all the strides should take place.

Most of the trajectories were inside the zones (Figures 13 and 14). The trajectory reconstruction of activities W-20, J-20, and R-20 describes two straight trajectories joined by a 180-degree turn. The trajectory reconstruction of W-Slalom reveals the area where the tables are located. The W-Posters activity includes non-straight strides, which are well described in the trajectory obtained. Regarding the circuit activities, although most of the strides are inside the activity zones, some trajectories lead towards the outer part of the activity zone. Others lead towards the internal part of the circuit (Figure 14).

#### **5. Discussion**

We have proposed a pipeline for indoor trajectory reconstruction of walking, jogging, and running activities. The proposed pipeline was evaluated with two datasets. The results showed that it is able to reconstruct a person's trajectory regardless of their gait speed.

#### *5.1. Classification of the Type of Activity*

It was found that the classification model obtained with the SVM algorithm is able to classify the three types of activities performed: walking, jogging, and running. The classification between jogging and running is the one in which the classifier made more mistakes. This is possibly due to the jogging and running speeds of some participants being similar. The use of personal models to avoid this problem could be promising.

#### *5.2. TO and MS Detection*

Previous studies focused on the reconstruction of the trajectory during walking and running and do not show results of segmentation or detection of strides [18–22]. The two datasets used in this study allow TO evaluation. In the case of MS detection, ground truth information was not available in the FAU dataset. Therefore, it was not possible to evaluate MS detection in that dataset. However, a high F-score was obtained in the detection of MS in the Unicauca dataset.

While the F-score obtained for the proposed TO and MS detection algorithms is similar to that obtained for the previous algorithms for walking activities, the F-score achieved for the proposed TO and MS detection algorithms outperformed that achieved for the previous algorithms for all jogging and running activities. That suggests that the proposed algorithms can detect those gait events in walking, jogging, and running strides. The number of false positives (FP) was always higher than the number of false negatives (FN). This could indicate that the threshold used for stride segmentation with msDTW might have been overestimated, since stride segmentation using a large threshold implies that there is a large difference between the template used and the segmented strides, leading to the detection of FP strides. However, it was checked that by reducing that threshold, the number of FN increased, causing a decrease in the F-score. Threshold-free methods based on machine learning techniques such as those used by Ren [20] and Wagstaff [22] would make the stride segmentation process straightforward by avoiding setting any threshold.

The lowest F-scores are obtained for three walking activities: W-Posters, W-Tables, and W-Cards, which might be due to the fact that those activities involve non-stride movements such as stopping, sitting, lateral and backward steps. This could be because the signal generated for those foot movements is different from the walking/running templates. This could be accounted for by using templates generated by those specific movements, as previously demonstrated in [29], where specific templates were generated for each specific activity such as ascending and descending stairs. Unfortunately, the wide range of possible natural foot movements makes this alternative hard to implement. A hierarchical hidden Markov model (hHMM) approach has proved to be a robust method for stride segmentation of walking activities that include non-stride movements in Parkinson's patients [14] and for stride segmentation of jogging activities [15]. Furthermore, hHMM is a threshold-free approach, therefore it should be explored in order to improve the results obtained for the walking activities that include non-stride movements such as W-Posters, W-Tables, and W-Cards, as well as for stride segmentation of jogging and running activities.

#### *5.3. Trajectory Reconstruction*

Usually, the foot-mounted IPDR systems have been evaluated in closed-loop trajectories and by measuring the Return Position Error (RPE) [18–22]. The purpose of the Unicauca dataset was, therefore, to provide a starting point to allow a fair comparison with the state-of-the-art papers.

Sometimes the RPE is small, although the reconstructed trajectory does not fit the actual trajectory performed by the person. That is why we proposed the number of strides out of the trajectory as an additional evaluation metric. The RPEs obtained with the pipeline proposed in this paper for the three trials collected in the Unicauca dataset are less than 1%. The results obtained by the works described in the literature review section are also lower than 1%.

As a result of the better detection of TO and MS obtained by using the algorithms proposed in this study, there is also a better trajectory reconstruction since there were fewer strides out of trajectory (SOT) and shorter RPE for jogging and running activities. This demonstrates two things. The first is the importance of performing a correct detection of TO and MS for trajectory reconstruction. The second is that if the complementary filter does not have precise data to perform the ZVUs, it is not capable of modeling errors in speed on its own, even if its parameters were tuned. It has also been demonstrated that by properly detecting TO and MS, the complementary filter is capable of modeling errors in walking, jogging, and running strides.

The RPEs obtained for trajectories in the FAU dataset are higher than those for the Unicauca dataset. It is important to highlight two limitations that the FAU dataset has for trajectory reconstruction. Firstly, the position of the participants at the beginning and end of the activities is not exactly the same. When analyzing the videos of the FAU dataset collection, it was concluded that these positions vary within a radius of approximately one and a half meters, taking as reference the chairs that indicated the start and end of the activities. Therefore, the RPEs calculated have an error of ±1.5 m. This fact should be taken into account when preparing the protocol for the collection of a future dataset. Secondly, it was not possible to subtract the gyroscope bias in all activities performed in the FAU dataset, because the activities were performed continuously. A prerequisite for bias computation is that the person stands still for a few seconds so that the mean of the gyroscope readings can be calculated and then subtracted from the entire movement sequence.

The number of strides out of trajectory is directly related to the RPE obtained; the more strides out of the acceptable path range, the higher the RPE. When observing the trajectory reconstruction of the activities W-20, J-20, R-20, and W-Circuit, J-Circuit, R-Circuit, it appears that the difficulty in trajectory reconstruction increases with stride velocity (from walking to jogging and running). This also occurred in the five papers described in the literature review section [18–22]. In those papers, the evaluation was performed with very few people. From our study, we can confirm that, for foot-mounted IPDR systems, there is still a gap between the trajectory reconstruction of jogging/running activities and that of walking activities.

The RPEs of the trajectory reconstruction of the W-Cards, W-Tables, and W-Posters activities are particularly high because of poor TO detection. These activities should be treated with special care in future work, since they describe movements of daily living that happen frequently.

The trajectories obtained have a very well-defined shape and could be used for mapping an indoor environment.

One important recommendation for future work in the field of trajectory reconstruction using IPDR systems is that the datasets collected for evaluation be labeled at activity and stride/step levels, as in the FAU dataset used in this paper. Additionally, the participants of the data collection process must start and end precisely at the indicated coordinates.

#### **6. Conclusions**

In this paper, we have proposed and evaluated a pipeline for trajectory reconstruction of walking, jogging, and running activities using a foot-mounted inertial pedestrian dead-reckoning system. The dynamic time warping method was adapted within this paper to segment walking, jogging, and running strides. Stride length and orientation estimation were performed using a zero velocity update algorithm adapted for walking, jogging, and running strides and empowered by a complementary Kalman filter.

The presented results showed that the proposed pipeline provides good trajectory estimations during walking, jogging, and running. The TO detection algorithm reached an F-score between 92% and 100% for activities that do not involve stopping, and between 67% and 70% otherwise. The resulting return position errors were in the range of 0.51% to 8.67% for non-stopping activities and 8.79% to 27.36% otherwise.

To the best of the authors' knowledge, this is the most comprehensive evaluation of a foot-mounted IPDR system regarding the type and number of activities and quantity of people included in the datasets and can serve as a baseline for the comparison of future systems. Future work will be focused on using hidden Markov models in order to improve stride segmentation and fusing symbolic location from an RSSI signal to update the indoor localization when possible.

**Author Contributions:** Conceptualization, J.D.C., C.F.M., D.M.L., F.K. and B.M.E.; Formal analysis, J.D.C. and D.M.L.; Resources, C.F.M., F.K. and B.M.E.; Software, J.D.C.; Writing—original draft, J.D.C. and D.M.L.; Writing—review & editing, C.F.M., F.K. and B.M.E. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Departamento Administrativo de Ciencia, Tecnología e Innovación (COLCIENCIAS), (Call 727-2015).

**Acknowledgments:** Jesus Ceron gratefully acknowledges the support of the Departamento Administrativo de Ciencia, Tecnología e Innovación (COLCIENCIAS) within the national doctoral grants, call 727-2015. Bjoern Eskofier gratefully acknowledges the support of the German Research Foundation (DFG) within the framework of the Heisenberg professorship programme (grant number ES 434/8-1).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Appendix A. Complementary Filter**

The initialization of the Complementary Filter (CF) requires establishing a series of matrices. First, the state of the CF includes the errors in orientation, position, and velocity. Equation (A1) shows the state in an array representation. Each array element is a 1 × 3 array containing the errors in the three axes.

$$E = \begin{bmatrix} E_o & E_p & E_v \end{bmatrix} \tag{A1}$$

The error covariance matrix accumulates the error in orientation, position, and velocity produced in each sample k:

$$P_k = \begin{bmatrix} \mathbf{0}_{9\times 9} \end{bmatrix} \tag{A2}$$

The state transition function is the matrix given in (A3); it is multiplied with the previous state to get the next state, and it propagates the error covariance in (A7). *S* is the skew-symmetric cross-product operator matrix formed from the n-frame accelerations, and Δt is the time step, equal to 0.005 s, which results from dividing 1 s by the IMU data collection frequency (200 Hz).

$$F_k = \begin{bmatrix} I_{3\times 3} & \mathbf{0}_{3\times 3} & \mathbf{0}_{3\times 3} \\ \mathbf{0}_{3\times 3} & I_{3\times 3} & I_{3\times 3}\Delta t \\ -S\Delta t & \mathbf{0}_{3\times 3} & I_{3\times 3} \end{bmatrix} \tag{A3}$$

The process noise covariance matrix is calculated for each sample by multiplying the gyroscope and accelerometer noise by the time step Δt:

$$Q_k = \begin{bmatrix} \begin{pmatrix} \sigma_{w_x} & \sigma_{w_y} & \sigma_{w_z} & 0 & 0 & 0 & \sigma_{a_x} & \sigma_{a_y} & \sigma_{a_z} \end{pmatrix} \Delta t \end{bmatrix} \tag{A4}$$

The uncertainty in velocity during each ZUPT is represented using the measurement noise covariance matrix (A5). It is a diagonal matrix because no correlation in velocity is supposed to exist between axes.

$$R = \begin{bmatrix} \sigma_{v_x}^2 & 0 & 0 \\ 0 & \sigma_{v_y}^2 & 0 \\ 0 & 0 & \sigma_{v_z}^2 \end{bmatrix} \tag{A5}$$

The measurement function matrix is used to move from the state variable space to the measurement variable space. In this implementation, the measurements are the ZUPTs, that is, the samples at which the velocity is assumed to be zero. Therefore, the measurement function has to contain an identity matrix in the position of the velocity error state, as follows:

$$H = \begin{bmatrix} \mathbf{0}_{3\times 3} & \mathbf{0}_{3\times 3} & I_{3\times 3} \end{bmatrix} \tag{A6}$$

Before running the CF, the gyroscope bias has to be removed. The gyroscope bias is obtained by calculating the mean of the gyroscope readings while the IMU is not moving, just before the beginning of the activity. The resulting value is subtracted from all gyroscope signals.
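A minimal sketch of this bias removal is given below. It assumes that the first `n_static` samples of the recording were collected while the IMU was at rest; the function name and the parameter are illustrative.

```python
def remove_gyro_bias(gyro, n_static):
    """Subtract the per-axis mean of the first n_static samples from all samples.

    gyro: list of (x, y, z) angular-rate samples; the first n_static samples
    are assumed to be recorded while the IMU is not moving.
    """
    static = gyro[:n_static]
    bias = [sum(axis) / len(static) for axis in zip(*static)]
    return [tuple(v - b for v, b in zip(sample, bias)) for sample in gyro]
```

At 200 Hz, a standstill of a few seconds provides several hundred samples for the mean.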

After gyroscope bias subtraction, the CF is executed. It has two phases: Prediction and update. In the prediction phase, the error covariance matrix (*Pk*) is propagated using (A7):

$$P_k = F_k P_{k-1} F_k^T + Q_k \tag{A7}$$

Only when a sample k is a ZUPT, the Update phase comes into play. In this case, the Kalman gain is calculated with (A8), and with that gain, the error is obtained using (A9).

$$K_k = P_k H^T \left( H P_k H^T + R \right)^{-1} \tag{A8}$$

$$E = \begin{bmatrix} E_o & E_p & E_v \end{bmatrix} = K_k V_k \tag{A9}$$

Finally, the velocity and position estimates are corrected as well as *Pk*:

$$V_k = V_k - E_v \tag{A10}$$

$$\text{Pos}_k = \text{Pos}_k - E_p \tag{A11}$$

$$P_k = (I_{9\times 9} - K_k H) P_k \tag{A12}$$
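The prediction and update phases can be sketched with NumPy as follows. This is an illustrative sketch of (A3) and (A6)–(A12) only: the function names are assumptions, the matrices `Q` and `R` must be supplied by the caller, and the orientation error is returned without being applied because the correction of the orientation is not detailed above.

```python
import numpy as np

DT = 1.0 / 200.0          # time step for the 200 Hz IMU
I3, Z3 = np.eye(3), np.zeros((3, 3))

def skew(a):
    """Skew-symmetric cross-product matrix S of an n-frame acceleration."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def predict(P, acc_n, Q):
    """Prediction phase (A7) using the transition matrix of (A3)."""
    F = np.block([[I3, Z3, Z3],
                  [Z3, I3, I3 * DT],
                  [-skew(acc_n) * DT, Z3, I3]])
    return F @ P @ F.T + Q

def zupt_update(P, vel, pos, R):
    """Update phase (A8)-(A12), run only on zero-velocity samples."""
    H = np.hstack([Z3, Z3, I3])                      # (A6)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # (A8)
    E = K @ vel                                      # (A9): [E_o, E_p, E_v]
    vel = vel - E[6:9]                               # (A10)
    pos = pos - E[3:6]                               # (A11)
    P = (np.eye(9) - K @ H) @ P                      # (A12)
    return P, vel, pos, E[0:3]                       # E_o is not applied here
```

Starting from the zero matrix of (A2), `predict` is called for every sample and `zupt_update` only for samples flagged as ZUPTs, which pulls the velocity estimate towards zero.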

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Technical Note*

### **The Coefficient of Variation of Step Time Can Overestimate Gait Abnormality: Test-Retest Reliability of Gait-Related Parameters Obtained with a Tri-Axial Accelerometer in Healthy Subjects**

**Shunrou Fujiwara 1,2,\*, Shinpei Sato 1, Atsushi Sugawara 1, Yasumasa Nishikawa 1, Takahiro Koji 1, Yukihide Nishimura <sup>3</sup> and Kuniaki Ogasawara <sup>1</sup>**


Received: 25 October 2019; Accepted: 16 January 2020; Published: 21 January 2020

**Abstract:** The aim of this study was to investigate whether variation in gait-related parameters among healthy participants could help detect gait abnormalities. In total, 36 participants (21 men, 15 women; mean age, 35.7 ± 9.9 years) performed a 10-m walk six times while wearing a tri-axial accelerometer fixed at the L3 level. A second walk was performed ≥1 month after the first (mean interval, 49.6 ± 7.6 days). From each 10-m data set, the following nine gait-related parameters were automatically calculated: assessment time, number of steps, stride time, cadence, ground reaction force, step time, coefficient of variation (CV) of step time, velocity, and step length. Six repeated measurement values were averaged for each gait parameter. In addition, for each gait parameter, the difference between the first and second assessments was statistically examined, and the intraclass correlation coefficient (ICC) was calculated with the level of significance set at *p* < 0.05. Only the CV of step time showed a significant difference between the first and second assessments (*p* = 0.0188). The CV of step time also showed the lowest ICC, at <0.50 (0.425), among all parameters. Test–retest results of gait assessment using a tri-axial accelerometer showed sufficient reproducibility in terms of the clinical evaluation of all parameters except the CV of step time.

**Keywords:** gait assessment; tri-axial accelerometer; CV; healthy subjects; test-retest

#### **1. Introduction**

Walking is naturally performed by humans without any deficits, and it is one of the indexes that show normality of motor and/or cognitive function in healthy subjects and abnormality in patients [1–6]. In patients with neurodegenerative diseases such as Parkinson's or Huntington's disease, freezing of gait or reduction of gait performance is often observed [7–9]. For assessing the gait in post-stroke patients during the follow-up period, a special portable stride analyzer was used in a previous work [2]. This device consisted of special insoles with compression sensors in shoes, and the insole was connected to a mobile data collection box worn on the belt. Another study, which assessed the gait in patients with subcortical capsular encephalopathy, also used a similar device [10]. On the other hand, an electronic walkway that detects the spatial and temporal characteristics of footfalls during gait was used in the assessment of walking behavior in multiple sclerosis [11,12]. A camera-based device was also used for gait assessments in patients with spinal cord injury in addition to multiple sclerosis [1,13]. Various techniques for quantitative gait assessment have been proposed in previous works; however, gait performance is difficult to assess, requiring the use of special or large machines and facilities in clinical settings and/or multicenter trials [1,10–15]. In practice, qualitative assessments have often been performed without any devices, but with the use of clinical scores [16,17], and no standard device for quantitative gait assessment has been established.

The portable devices for gait assessment, particularly a tri-axial accelerometer [18–21], that have been developed recently exhibit an accuracy equal to that of large devices such as treadmills in the assessment of gait performance and are used in practical and clinical studies [18–24]. Tri-axial accelerometers have the advantage of easy attachment to the body or foot of subjects with a belt and the estimation of various gait parameters such as movements of the body trunk, step time, and ground reaction force from the acceleration wave dataset in the three axial directions during walking. On the other hand, the reproducibility of the accelerometer in healthy subjects remains unclear. In the present study, we investigated whether gait-related parameters obtained by a tri-axial accelerometer are reliable in terms of reproducibility by performing test-retest gait measurement in healthy subjects.

#### **2. Materials and Methods**

#### *2.1. Subjects*

All subjects participated in this study between July 2017 and October 2017 after providing written informed consent and undergoing a primary medical check, in which two authors (S.S. and Y.N.) recorded history of disorders, age, and height. The inclusion criteria were as follows: >20 but <60 years of age; no history of brain-related disorders, including surgical operation, irradiation, stroke, infection, remarkable atrophy, or demyelinating disease; no history of hypertension, diabetes mellitus, atrial fibrillation, pulmonary disease, or leukoaraiosis; and no musculoskeletal deficits or other diseases showing gait abnormalities without neurological deficits [24–27]. In the first stage, each subject performed a 10-m walk six times with a tri-axial accelerometer (MG-M1110-HW, LSI Medience, Tokyo, Japan) fixed at the L3 level of the subject by a nylon belt (Figure 1).

**Figure 1.** Tri-axial accelerometer (MG-M1110) with a switch cable for marking points of a 10-m walking interval for the dataset (**left**) and the accelerometer fixed at the L3 level by a nylon belt in a subject (**right**).

The device can measure tri-axial (vertical, anteroposterior, and mediolateral) acceleration by detecting limb and trunk movements at a sampling rate of 100 Hz during step-in and kick-off motions. The tests were performed on a 30-m straight walkway in our hospital. All subjects were instructed by an author (S.F.) to walk at their usual pace, and they walked 16 m, including 3 m before the starting point and 3 m after the end point, to obtain the 10-m walking dataset in Table 1. To mark the 10-m segment of the dataset, an operator (S.F.) pushed a button connected to the accelerometer with a cable at both the start and end points while following the subject. The second test was also performed with the same subjects under the same conditions (the tri-axial accelerometer, an operator, and the walkway) within a 3-month period, at least 1 month after the first evaluation. The statistical comparison of two datasets, each showing a 10% standard deviation relative to its mean value and a 10% mean difference relative to the average of the two mean values, requires the examination of more than 28 subjects with alpha and beta levels (Type I and Type II error, respectively) of less than 0.01 [28]. Thus, we determined that the present study required more than 28 subjects in order to compare the first and second tests.

#### *2.2. Data Analysis*

From each tri-axial acceleration wave dataset of six repetitions of the 10-m walk measurement, the following nine gait parameters were calculated using commercial software (LSI Medience): assessment time (s); number of steps (step); stride time (s; time from initial contact of one foot to subsequent contact of the same foot); cadence (step/m); ground reaction force (×9.8 m/s²); step time (s; time from initial contact of one foot to initial contact of the other foot); coefficient of variation (standard deviation/mean × 100) of step time (%; CV); velocity (m/min); and step length (cm). To calculate these gait parameters, a pre-processing stage called "step extraction", which marks each wave indicating a step in the tri-axial acceleration wave dataset (Figure 2), was needed.
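The definitions of step time, stride time, CV, and velocity can be made concrete with a short sketch. The algorithms of the commercial software are not described here, so the code below is only an assumption-laden illustration: it takes hypothetical initial-contact timestamps as input and uses the sample standard deviation for the CV.

```python
from statistics import mean, stdev

def gait_parameters(contact_times, assessment_time_s, distance_m=10.0):
    """Derive gait parameters from initial-contact timestamps (seconds).

    contact_times:     successive initial contacts, alternating feet
    assessment_time_s: time needed to cover distance_m
    """
    # Step time: initial contact of one foot to initial contact of the other
    step_times = [b - a for a, b in zip(contact_times, contact_times[1:])]
    # Stride time: initial contact of one foot to the next contact of the same foot
    stride_times = [b - a for a, b in zip(contact_times, contact_times[2:])]
    return {
        "step_time_s": mean(step_times),
        "stride_time_s": mean(stride_times),
        # CV = standard deviation / mean x 100 (sample SD is an assumption)
        "cv_step_time_pct": stdev(step_times) / mean(step_times) * 100,
        "velocity_m_per_min": distance_m / assessment_time_s * 60,
    }
```

Because the CV divides a small standard deviation by the mean, a few irregular steps in a short 10-m walk can change it markedly.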

**Figure 2.** Wave dataset obtained using a tri-axial accelerometer during walking.

During the "step extraction" pre-processing, markers were automatically placed on top of each wave, each indicating one step.

During pre-processing with the software, the number of steps is automatically estimated from each 10-m walking wave dataset. If the number of steps was clearly low (e.g., less than half of the number calculated from the other 10-m walking wave datasets of the same subject) because wave peaks were missed owing to the limitation of the peak detection algorithm, the subject was excluded from the analysis. No manual error correction was applied, so that the same conditions and protocols were used for all wave datasets. Finally, the six values of each gait parameter obtained from the six measurements were averaged, and the averaged value was defined as the representative value of the parameter in a subject. The protocol of this study was reviewed and approved by the institutional ethics committee.
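The exclusion rule and the averaging step can be sketched as follows. The comparison against the mean of the other walks is an assumption, since the text gives the "less than half" criterion only as an example; the function names are illustrative.

```python
from statistics import mean

def has_extraction_error(step_counts):
    """Flag a subject if any walk has fewer than half the steps of the others.

    step_counts: number of steps extracted from each of the six 10-m walks.
    """
    for i, count in enumerate(step_counts):
        others = step_counts[:i] + step_counts[i + 1:]
        # Assumption: "the other datasets" are summarized by their mean
        if count < 0.5 * mean(others):
            return True
    return False

def representative_value(values):
    """Average the six repeated measurements of one gait parameter."""
    return mean(values)
```

A subject flagged by `has_extraction_error` would be excluded as a whole rather than corrected manually.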

#### *2.3. Statistical Analysis*

The differences in each gait parameter between the first and second assessments were examined using the Wilcoxon signed-rank test. To validate the reproducibility of the two gait assessments with a tri-axial accelerometer, the intraclass correlation coefficient (ICC) was also calculated for each gait parameter. Grading of the ICC was defined as follows: excellent, ICC ≥ 0.9; good, 0.7 ≤ ICC < 0.9; moderate, 0.5 ≤ ICC < 0.7; and poor, ICC < 0.5. Subsequently, a Bland-Altman plot was constructed to confirm the tendency of the relationship between the two measurements for each parameter. Furthermore, the correlation between height and each gait parameter at the first test was statistically examined with Spearman's correlation coefficient to check for the effect of this individual factor. All statistical analyses were performed with MedCalc ver. 17.9.7 (MedCalc Software bvba, Ostend, Belgium) with a significance level of *p* < 0.05.
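The ICC grading and the quantities behind a Bland-Altman plot can be sketched as follows. The analyses in this study were run in MedCalc; the code below is only an illustration, and the 1.96 × SD limits of agreement are the conventional choice, assumed here rather than taken from the text.

```python
from statistics import mean, stdev

def grade_icc(icc):
    """Map an ICC value to the grading used in the text."""
    if icc >= 0.9:
        return "excellent"
    if icc >= 0.7:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"

def bland_altman(first, second):
    """Return the bias and 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # conventional limits of agreement (assumption)
    return bias, bias - spread, bias + spread
```

In the plot itself, each difference is drawn against the mean of the two paired measurements, which is how a trend across the measurement range becomes visible.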

#### **3. Results**

Forty-four subjects were included in this study. Two of the 44 subjects could not perform the second test within 3 months. The other 42 subjects performed the 10-m walk at both stages; however, some step waves of the six measurements in six subjects could not be appropriately extracted for each 10-m walk wave, indicating that step extraction errors during pre-processing for gait analysis occurred because of the deterioration of the waveform and the limitation of the wave extraction algorithm. Finally, 36 subjects (Figure 3) (21 men and 15 women; mean age, 35.7 ± 9.9 years; range, 22–58 years) were able to complete both the first and second analyses (mean interval, 49.6 ± 7.6 days; range, 40–65 days).

**Figure 3.** Flowchart for including the subjects.

Only the CV showed a significant difference between the first and second measurements (median CV: first, 2.16; second, 2.50; *p* = 0.0188), while the other parameters showed no significant differences (Table 1). Among all nine gait parameters, stride time (ICC: 0.803), step time (0.788), and cadence (0.784) showed good correlation with a high ICC of ≥0.70. The number of steps (0.685), step length (0.663), ground reaction force (0.615), velocity (0.598), and assessment time (0.565) showed a moderate ICC between 0.50 and 0.70. The ICC of the CV indicated poor correlation and was the lowest value among all the parameters, being <0.50 (0.425) (Table 1). The Bland-Altman plot for the CV showed a negative trend as the mean of the first and second assessments became larger, while the plots for the other parameters showed no trends. The plots for the parameters with good ICCs are shown in Figure 4a–c, and the plot for the CV, with a poor ICC, is shown in Figure 4d.


**Table 1.** Median and intraclass correlation coefficient for each parameter at and between first and second 10-m walks in healthy subjects (*n* = 36).

ICC: intraclass correlation coefficient; CI: confidence interval. \* examined using Wilcoxon signed-rank test.

**Figure 4.** Bland-Altman plots of the gait parameters, showing good intraclass correlation coefficient (ICC; (**a**) stride time, (**b**) step time, (**c**) cadence) and poor ICC ((**d**) coefficient of variation).

Height significantly correlated with number of steps (ρ, *p*-value, and 95% confidence interval: −0.358, 0.0323, −0.614 to −0.0328), stride time (0.468, 0.0040, 0.164 to 0.690), cadence (−0.467, 0.0041, −0.690 to −0.163), step time (0.483, 0.0028, 0.184 to 0.701), and step length (0.356, 0.0331, 0.0311 to 0.613). On the other hand, no significant correlation was observed between height and assessment time (ρ, *p*-value, and 95% confidence interval: 0.0312, 0.8567, −0.300 to 0.356), ground reaction force (−0.139, 0.4174, −0.447 to 0.198), CV (0.0464, 0.7883, −0.287 to 0.369), or velocity (−0.0340, 0.8441, −0.359 to 0.298).

#### **4. Discussion**

Portable devices for quantitative gait analysis have been developed, and these devices have the advantage in clinical scenarios of needing only a walkway, rather than any large space or facility [18,19,21–23,29–31]. In the present study, we validated the reproducibility of a tri-axial accelerometer used in gait assessment by performing test-retest measurements within a 3-month period 1 month after the first evaluation. All gait parameters except the CV showed adequate reproducibility for practical clinical use, with no significant differences and with practical ICCs between test-retest measurements. The present findings suggest that the tri-axial accelerometer can sufficiently evaluate gait by using just the device and an operator, and without large machines and experts to analyze the dataset.

Three gait parameters (stride time, step time, and cadence) showed good ICC in the present study. In a previous work, the parameters showed no significant differences between controls (patients showing transient ischemic stroke/asymptomatic carotid artery stenosis) and patients without symptoms 1 month after stroke but significantly changed in patients with symptoms after stroke as compared with controls [2]. These three parameters may be more robust than the other gait parameters because they only change in cases that show severe deficits. Furthermore, the device we used may have more sensitivity in such parameters than previous devices; thus, it may also indicate a remarkable robustness in these three parameters.

By contrast, the parameters that show moderate ICC (number of steps, step length, ground reaction force, velocity, and assessment time) may change depending on the condition and/or intention of the subjects participating in the gait measurement. We presume that velocity and assessment time can change more easily than the other parameters because the same pace is difficult for subjects to retain between the first and second measurements, even if an operator carefully performs the measurements with healthy subjects under the same conditions. Thus, when these parameters are used in clinical research, the standard cutoff values at the 95% confidence intervals (*p* = 0.05) from healthy groups may be insufficient for identifying gait abnormalities in pathological groups, because the sensitivity to detect abnormalities with these cutoff values may be low, especially when assessing improvement of ambulation by velocity [32] or assessment time [12,17], as described in a previous study. Such gait parameters should therefore be used with caution.

A previous report indicated that the CV was significantly larger in patients with Parkinson's or Huntington's disease (HD) than in controls, and that the CV was the best predictor of HD [9]. On the other hand, the averaged value of each gait parameter during gait assessment showed no significant difference between the pathological and control groups in the previous study. The CV showed a significant difference in the present study and a poor ICC compared to that of the healthy subject group. This result indicates that the CV in the previous work may potentially underestimate the gait performance in the pathological group. With the high accuracy of the accelerometer used in the present study, we could identify the significant variation in the CV observed even in the healthy subjects. Therefore, the averaged value of each gait parameter during gait assessment may show the significance in the pathological group if the tri-axial accelerometer that shows high sensitivity is used.

The present study has some limitations. First, the sample size is relatively small. Second, the gait dataset may include errors due to slight slipping of the nylon belt from the L3 level of the subject. Third, the 10-m distance was marked by manually pushing a button connected to the accelerometer with a cable. Fourth, only one operator performed the measurements for gait assessment; therefore, interrater reliability remains unclear. In future studies, a comparison of the tri-axial accelerometer used in the present study with an accelerometer based on an infrared system (as proposed in a previous study [31]) may clarify the difference in accuracy between manual and automatic procedures.

#### **5. Conclusions**

In the test-retest gait assessment using a tri-axial accelerometer, reproducibility sufficient for clinical research was observed in all parameters except the CV. The present results suggest that the CV should be evaluated carefully because it may overestimate gait disturbance in the pathological group owing to its comparatively low reproducibility. Portable accelerometers can assess gait performance noninvasively and with practical accuracy without the need for large equipment. In future works, the devices may be used for long-term gait assessment over a few days in the elderly and in patients with neurodegenerative and/or spinal disease because subjects only need to attach the device to their back with a belt.

**Author Contributions:** Conceptualization, S.F. and K.O.; Methodology, S.F., Y.N. (Yasumasa Nishikawa), K.O.; Software, S.F.; Validation, S.F., S.S. and K.O.; Formal Analysis, S.F. and S.S.; Investigation, S.F., S.S. and K.O.; Resources, Y.N. (Yasumasa Nishikawa) and K.O.; Data Curation, S.F., S.S., and K.O.; Writing—Original Draft Preparation, S.F.; Writing—Review & Editing, A.S., T.K., Y.N. (Yukihide Nishimura) and K.O.; Visualization, S.F.; Supervision, Y.N. (Yukihide Nishimura) and K.O.; Project Administration, S.F. and K.O.; Funding Acquisition, A.S., S.F. and K.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** This study was partially supported by a Grant-in-Aid for Scientific Research (C) (No. 18K08948, 2018–2021) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Comparing Gait Trials with Greedy Template Matching**

**Aliénor Vienne-Jumeau 1, Laurent Oudre 1,2,3,\*, Albane Moreau 1, Flavien Quijoux 1,4, Pierre-Paul Vidal 1,5 and Damien Ricard 1,6,7**


Received: 9 May 2019; Accepted: 11 July 2019; Published: 12 July 2019

**Abstract:** Gait assessment and quantification have received increasing interest in recent years. Embedded technologies and low-cost sensors can be used for the longitudinal follow-up of various populations (neurological diseases, elderly, etc.). However, the comparison of two gait trials remains a tricky question as standard gait features may prove to be insufficient in some cases. This article describes a new algorithm for comparing two gait trials recorded with inertial measurement units (IMUs). This algorithm uses a library of step templates extracted from one trial and attempts to detect similar steps in the second trial through a greedy template matching approach. The output of our method is a similarity index (SId) between 0 and 1 that reflects the similarity between the patterns observed in both trials. Results on healthy and multiple sclerosis subjects show that this new comparison tool can be used for both inter-individual comparison and longitudinal follow-up.

**Keywords:** inertial measurement units; gait analysis; biomedical signal processing; pattern recognition; step detection; physiological signals

#### **1. Introduction**

Gait semiology is of major importance in neurological practice, as abnormalities are associated with high comorbidities. The quantification of gait using inertial measurement units (IMUs) has become a widely accessible method for the follow-up of subjects with locomotion alterations in healthcare. The use of such embedded technologies has already shown its usefulness in the detection of postural strategies during walking [1], partitioning gait during the stance phase [2] or motor supplementation for switch-activated simulators [3]. However, these clinical applications require the detection of steps within the IMU signals. Spatio-temporal gait parameters can also be extracted for the healthy or disabled and stored in databases that enable a longitudinal follow-up of patients with gait disorders due to ageing [4], orthopedic or rheumatic diseases [4–6] or neurological alterations [4,7–9]. This has proven useful to help clinicians refine the description of individual gait disorders and strengthen their insights into the patients' movements and compensation patterns. This quantification of characteristics related to altered gait using signals from IMUs that are collected inside databases allows inter-individual comparisons to assess the distance of the patients' gait from a control group [10] or intra-individual comparisons for longitudinal follow-up [11,12]. Common gait features most often rely on basic statistics such as averages or standard deviations over a whole exercise [13,14]. On the one hand, they provide useful and interpretable information for the clinician. On the other hand, they have not proven sensitive enough for detecting subtle changes in several pathologies [4,15,16]. Besides, they display high inter-session variability for diseases that present with day-to-day changes [17]. To ensure the robustness of these parameters, it is usually necessary to increase the number of steps within a trial [18–20]. However, repeated measures or treadmill exercises are incompatible with common clinical practice in patients with limited walking perimeter, which is frequent in neurological practice. In order to obtain a more integrative perspective, some authors resort to global indexes, which are composed of several parameters [21,22]. These scores are promising but careful consideration should be given to their evolution inasmuch as the absence of evolution of a multicomponent score does not necessarily reflect the reproducibility of the gait pattern between two measurements [20]. Indeed, maintaining a steady value of an overall score over time may mask gait adaptations. For instance, gait velocity may be maintained despite a decreased step length if the cadence increases concomitantly. Moreover, these parameters rely heavily on accurate step detection, which is problematic for severely altered steps: some patients may require painstaking and time-consuming manual detection [23]. It is therefore key either to evaluate this detection or to dispense with it.
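The compensation mentioned in this paragraph, where velocity is maintained despite a shorter step length, follows from average velocity being the product of step length and cadence. The short sketch below works through one such case; the function name and the numerical values are hypothetical and chosen only for illustration.

```python
def gait_velocity(step_length_m, cadence_steps_per_min):
    """Average velocity in m/s from step length (m) and cadence (steps/min)."""
    return step_length_m * cadence_steps_per_min / 60.0


# Hypothetical first visit: longer steps, lower cadence
before = gait_velocity(0.60, 100)
# Hypothetical second visit: shorter steps, higher cadence
after = gait_velocity(0.50, 120)
# Both visits give about 1.0 m/s, although the gait pattern has changed
```

A score built on velocity alone would therefore report no change between these two visits.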

Progressive multiple sclerosis (pMS) is one of the disorders that benefits heavily from the use of IMUs in routine clinical practice to assess indices of the disease evolution [24–26]. IMUs have been applied to reliably monitor patients' health status with regard to their risk of falling [27], their physical abilities [28,29] or the neuromotor strategies used to adapt to their disability [22,30–32]. However, patients suffering from severe pMS may impose high constraints both on measurements, which should be short and controlled to limit the effects of fatigue and day-to-day variations, and on processing, which should adapt to very abnormal patterns and confounders such as false gait events triggered by the loading and unloading of walking aids.

In this study, a new metric to compare two gait trials recorded with IMUs, which we called "similarity index" (SId), is introduced. It aims to overcome the previously mentioned drawbacks of statistical methods in pathologies such as pMS. The SId is an asymmetrical metric that takes as input two gait trials and computes an index, between 0 and 1, that assesses the similarity between them. It is hypothesized that such a metric provides a valid characterization of a change in gait patterns between two measurements, and can therefore be used either for inter-individual comparison or longitudinal follow-up.

First, the SId is compared between pairs of trials of increasing distance. Second, it is evaluated against more conventional features to estimate its capacity to assess changes in gait. Finally, its ability to indicate the level of confidence of the underlying step detection is appraised.

#### **2. Data, Protocol and Subjects**

#### *2.1. Protocol*

Two XSens® sensors (Xsens® Technologies, Enschede, the Netherlands), hereafter XS, were placed on the participant's body (one on the dorsal part of each foot) using Velcro bands. The XS showed high reliability at heel impact for the ankle joint in the sagittal plane (intraclass correlation coefficient (ICC) > 0.8) [33]. With a standard error of the mean (SEM) below 3° for the between- and within-rater reliability of kinematic variables obtained from XS across joints and planes, its consistency is comparable to or better than that obtained from optoelectronic motion capture systems [34]. The GaitRite® mat, hereafter GR, exhibits strong concurrent validity [35,36] and excellent reliability (with ICC > 0.8) for most temporo-spatial gait parameters in both young and older subjects [37–39]. The GR can be used to assess people with altered gait with good reliability even though walking ability does influence it. The ICCs for the older subjects [37] or patients with neurological diseases [40,41] can be somewhat lower but are still adequate for measuring step parameters of gait in these populations. Based on these data, GR is used in this paper as the gold standard. The data were sampled at 100 Hz for the XS and at 120 Hz for the GR. Both systems were synchronized in time by using the PC clock connected to the XS. Participants performed four walks of 12 m with a U-turn (6 m on the way in and 6 m on the way out): two at the first visit (M0) and two more at the second visit six months later (M6). The choice of a six-month period between measurements was driven by the fact that patients from the pMS group undergo routine evaluation of their gait every six months. The protocol is schematized in Figure 1.

**Figure 1.** Measurement protocol. The XSens® sensors (XS) inertial measurement units and the GaitRite® mat (GR) are synchronized by using the PC clock connected to the inertial measurement units. The active surface (green) is covered with pressure sensors. The rest of the mat (grey) is inactive and does not detect any pressure from the subject.

#### *2.2. Subjects*

Twenty-two patients with progressive multiple sclerosis (pMS) and ten young healthy subjects (HS) were enrolled in this longitudinal study. The characteristics of the subjects are displayed in Table 1. pMS patients were consecutively recruited from the outpatient clinic of Percy Hospital (Clamart, France) between June 2018 and September 2018. The inclusion criteria for participation in this cohort required patients to be at least 18 years old, be diagnosed with primary progressive or secondary progressive multiple sclerosis according to the 2010 International Panel criteria [42], be capable of walking 20 m with U-turn and be free of any other conditions that affect gait. HS participants were recruited from the hospital and research unit staff between June 2018 and September 2018. The inclusion criteria were: no report of falls in the five years prior to inclusion and no disease that could affect their walk. The sex ratio was comparable between the two groups and no major differences were seen between other anthropometric characteristics. pMS patients were aged 58 (±11) years old and the HS group mean age was 26 (±2) years old. The two groups were not matched for age as one aim of this analysis was to analyze the performance of the algorithm on two opposite groups, one with highly altered steps (pMS) and one with the most normal steps. Severity of the disease was evaluated using the Expanded Disability Status Scale (EDSS) [43], which is a score of 0 to 10, ranging from normal neurological examination (0) to total helplessness (9.5) or even death (10). Included participants in the pMS group had an EDSS between 3.0 and 6.5, as disabilities greater than 7.0 impede walking even a few steps. Seven out of the 22 participants had an advanced disease requiring a permanent walking aid (cane(s), walker and/or human help). Two patients needed human help to perform the walking test. All subjects provided written informed consent prior to their inclusion.
The study protocol followed the principles of the Declaration of Helsinki and was approved by the Ethics Committee "Protection des Personnes Nord Ouest III" under the ID RCB: 2017-A01538-45.

**Table 1.** Baseline characteristics of patients with progressive multiple sclerosis (pMS) and healthy subjects (HS). For the age, height, weight, BMI, Multiple Sclerosis Walking Scale (MSWS) and Fatigue Impact Scale (FIS), the mean and the standard deviation (SD) are displayed. For the Expanded Disability Status Scale (EDSS) and the functional scores (subscores of EDSS), the statistics are reported as median and interquartile range (IQR).


#### **3. Method**

We now define the similarity index (SId) between two gait trials. Let us consider two gait trials:


The aim of the algorithm presented in this section is to compute a similarity index $\text{SId}_{i_{\text{test}}|i_{\text{train}}}$, between 0 and 1, that assesses the proximity between trials $i_{\text{train}}$ and $i_{\text{test}}$. This metric is based upon the following question: how well can the group of steps present in trial $i_{\text{train}}$ predict those observed in trial $i_{\text{test}}$?

The computation of this index is based on three main stages, detailed below and illustrated in Figure 2.


**Figure 2.** Main stages for the computation of the similarity index (SId). First, the GR and XS data from the trial $i_{\text{train}}$ are used to build a library of templates $\mathcal{P}_{\text{train}}$. In the second stage, the library is used to detect the steps in the trial $i_{\text{test}}$, according to a greedy template-based approach inspired by [44]. Each detected step $s$ is associated with one template $p_s$. The correlation coefficients $c_s$ between the steps $s$ and their associated templates $p_s$ are then averaged to obtain the similarity index $\text{SId}_{i_{\text{test}}|i_{\text{train}}}$.

#### *3.1. Construction of the Library of Templates*

Let us consider a train gait trial $i_{\text{train}}$ composed of GR and XS data. We first use the GR recordings to extract the exact timings for initial contacts (ICs) and final contacts (FCs). This process is automatically performed by the GR software. Only the steps occurring while the subject is on the active surface of the instrumented mat are used; steps occurring during the U-turn are not considered. Then, we use the XS synchronized data to build the library of templates. We consider for each right/left foot XS sensor the *Y*-axis angular velocities (swing in the direction of the walk) and construct a library of templates by extracting the steps in the XS signals. More precisely, given a step identified with the GR with the initial contact time $t_{IC}$ and final contact time $t_{FC}$, we consider the XS signal $x_{\text{train}}$ of the corresponding foot and define the pattern $p = x_{\text{train}}[t_{IC} : t_{FC}]$. This pattern $p$ can be seen as a signal of length $N_p = t_{FC} - t_{IC} + 1$ that represents the typical angular velocity of a foot during a step. The process is iterated for all the steps and for both feet: each step identified with the GR forms a different pattern $p$. All patterns corresponding to the trial $i_{\text{train}}$ are stored in a library $\mathcal{P}_{\text{train}}$.
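The template extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, the signal is a plain list for a single foot, and the step bounds are assumed to be already converted to sample indices of the XS signal.

```python
def build_template_library(x_train, steps):
    """Extract one template per annotated step.

    x_train: list of Y-axis angular-velocity samples for one foot.
    steps:   list of (t_ic, t_fc) sample indices, both inclusive
             (assumed to come from the reference mat annotations).
    """
    library = []
    for t_ic, t_fc in steps:
        if not 0 <= t_ic < t_fc < len(x_train):
            raise ValueError(f"invalid step bounds: {(t_ic, t_fc)}")
        # Inclusive slice, so the template length is t_fc - t_ic + 1
        library.append(x_train[t_ic:t_fc + 1])
    return library


# Hypothetical signal and two annotated steps
signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
library = build_template_library(signal, [(1, 4), (5, 8)])
```

To cover both feet, the same function would be called once per foot signal and the two resulting lists concatenated.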

#### *3.2. Use of the Library to Detect the Steps*

The library $\mathcal{P}_{\text{train}}$ is used to detect the steps for the trial $i_{\text{test}}$ (which does not necessarily belong to the same subject and/or the same session). To that end, we consider the XS *Y*-axis angular velocity $x_{\text{test}}$ for trial $i_{\text{test}}$. Each pattern $p \in \mathcal{P}_{\text{train}}$ is slid along signal $x_{\text{test}}$ and for each possible shift we compute the Pearson correlation coefficient. The final result is a matrix **C** of size $N_{\mathcal{P}} \times N_{\text{test}}$ ($N_{\mathcal{P}}$ being the number of templates in $\mathcal{P}_{\text{train}}$ and $N_{\text{test}}$ the number of samples of signal $x_{\text{test}}$), where

$$\forall i_p \in \{1, \ldots, N_{\mathcal{P}}\}, \ \forall i_l \in \{1, \ldots, N_{\text{test}}\} \qquad \mathbf{C}(i_p, i_l) = \text{corr}\left(p, x_{\text{test}}[i_l : i_l + N_p - 1]\right), \tag{1}$$

and corr(., .) is the Pearson correlation coefficient.
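A plain-Python sketch of Equation (1) is given below. It is illustrative only: indices are 0-based rather than 1-based, and two choices are assumptions not stated in the text, namely that shifts where the window would run past the end of the signal are marked `None`, and that a constant window yields a correlation of 0.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    # Assumption: a constant sequence gives 0 rather than an error
    return 0.0 if denom == 0 else sum(p * q for p, q in zip(da, db)) / denom


def correlation_matrix(library, x_test):
    """C[i][t] = corr(template i, x_test[t : t + N_p]), None if out of range."""
    C = []
    for p in library:
        n_p = len(p)
        row = []
        for t in range(len(x_test)):
            window = x_test[t:t + n_p]
            row.append(pearson(p, window) if len(window) == n_p else None)
        C.append(row)
    return C


# Hypothetical template and test signal
C = correlation_matrix([[0, 1, 0]], [0, 0, 1, 0, 0])
```

Here the template matches the signal exactly at shift 1, so the corresponding entry is 1.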

The matrix **C** is then processed with an iterative and greedy detection strategy, described in [44], which detects steps by iteratively selecting the largest Pearson correlation coefficients in the matrix until all of them are lower than a threshold $\lambda = 0.6$. The influence of threshold $\lambda$ is discussed in [44] and the value 0.6 ensures that the algorithm does not consider irrelevant matches. The main idea behind this procedure is that we select the best possible templates in train trial $i_{\text{train}}$ to detect the steps in test trial $i_{\text{test}}$.
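The greedy selection can be sketched as below. The exact procedure is the one described in [44]; this sketch only illustrates the general idea and rests on one explicit assumption, namely that a detected step blocks every later candidate that overlaps it in time. The function name and the input values are hypothetical.

```python
def greedy_detect(C, template_lengths, threshold=0.6):
    """Return detected steps as (start, end, template_index, correlation).

    C: correlation matrix as a list of rows, one per template, with None
       marking shifts that are not valid.
    """
    # Visiting candidates in decreasing order of correlation is equivalent
    # to repeatedly picking the largest remaining coefficient.
    candidates = sorted(
        ((c, i, t) for i, row in enumerate(C) for t, c in enumerate(row)
         if c is not None and c >= threshold),
        reverse=True,
    )
    taken = set()  # sample indices already covered by a detected step
    steps = []
    for c, i, t in candidates:
        span = range(t, t + template_lengths[i])
        if any(k in taken for k in span):
            continue  # assumed rule: detected steps may not overlap
        taken.update(span)
        steps.append((t, t + template_lengths[i] - 1, i, c))
    return sorted(steps)


# Hypothetical one-template matrix, template length of 3 samples
overlapping = greedy_detect([[0.9, 0.95, 0.2, 0.7, None, None]], [3])
separate = greedy_detect([[0.9, 0.1, 0.1, 0.8, None, None]], [3])
```

In the first call only the best match (0.95 at shift 1) is kept because the other candidates overlap it; in the second call both matches are kept.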

The output of the algorithm is a list of steps $\mathcal{S}_{i_{\text{test}}|i_{\text{train}}}$ (steps of trial $i_{\text{test}}$ detected with the library of trial $i_{\text{train}}$). For each detected step $s \in \mathcal{S}_{i_{\text{test}}|i_{\text{train}}}$, we also have access to the template $p_s \in \mathcal{P}_{\text{train}}$ that was used for the detection, and to the Pearson correlation coefficient $c_s$ between $s$ and $p_s$. Those additional outputs, which were not investigated in [44], are actually of interest since:


#### *3.3. Similarity Index*

In order to use this additional information for the gait characterization, we propose to introduce a new parameter, called SId (similarity index). Given a library of templates $\mathcal{P}_{\text{train}}$ and a test trial $i_{\text{test}}$, this quantity is defined as

$$\text{SId}_{i_{\text{test}}|i_{\text{train}}} = \frac{1}{\left|\mathcal{S}_{i_{\text{test}}|i_{\text{train}}}\right|} \sum_{s \in \mathcal{S}_{i_{\text{test}}|i_{\text{train}}}} c_s. \tag{2}$$

The SId is the mean of the Pearson correlation coefficients computed between detected steps and their respective closest templates. This quantity measures the ability of trial $i_{\text{train}}$ to detect the steps in trial $i_{\text{test}}$. It can be interpreted as a similarity index between trials $i_{\text{train}}$ and $i_{\text{test}}$ (assuming that if both trials were identical the step detection would be easy to perform and would produce large Pearson coefficients). It can also be seen as a confidence index on the detection (if this index is close to 1, it means that all detected steps were very similar to the annotated steps in the library and thus are likely to be well detected).

Note that the SId is not symmetrical, as using steps in trial $i_{\text{train}}$ to detect the steps in trial $i_{\text{test}}$ might not be the same as using steps in trial $i_{\text{test}}$ to detect the steps in trial $i_{\text{train}}$.
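Following the description of the SId as the mean of the correlation coefficients of the detected steps, its final computation can be sketched in a few lines. The function name and input values are hypothetical, and returning `None` when no step is detected is an assumption, since the text does not say how that case is handled.

```python
from statistics import mean


def similarity_index(step_correlations):
    """Mean of the Pearson coefficients c_s of the detected steps."""
    if not step_correlations:
        return None  # assumption: SId is left undefined without any step
    return mean(step_correlations)


# Hypothetical coefficients for three detected steps
sid = similarity_index([0.95, 0.90, 0.85])
```

Because detection stops once all remaining coefficients fall below the threshold of 0.6, every retained coefficient is at least 0.6, so an SId computed this way would be expected to lie between 0.6 and 1 whenever at least one step is detected.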

#### *3.4. Use of the SId Index in Various Configurations*

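According to the chosen train and test sets, the SId index can be used either for longitudinal follow-up or inter-individual comparison. In this article, we consider four different configurations referred to as A1–A4: intra-individual intra-session comparison (A1), intra-individual inter-session comparison (A2), intra-group inter-individual comparison (A3) and inter-group inter-individual comparison (A4).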


In order to investigate the properties of the SId index, we computed all SId between all trials of all subjects and merged the SId values according to these four different configurations, as illustrated in Figure 3.

**Figure 3.** Definitions of the different pairs of extraction/detection trials that are analyzed in the article.

#### *3.5. Conventional Features*

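In addition to the SId, the following conventional gait parameters were computed: average walking velocity, step length, step time, double stance time, coefficient of variation of step time and coefficient of variation of double stance time.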


Given two gait trials, we computed the differences between the parameter values and merged these differences with the four different configurations illustrated in Figure 3.

#### *3.6. Link to the Performance of the Step Detection*

The similarity index (SId) can be interpreted as a confidence index for the step-detection algorithm. Indeed, a large SId suggests that the patterns present in the train library fit those observed in the test signal and are therefore likely to provide efficient detection. To investigate this question, we computed the correlation between the SId values and some evaluation metrics commonly used for assessing the performances of step-detection algorithms [44]. These metrics are based on the ground truth step annotations provided by the GR.


#### *3.7. Statistics*

All parameters were tested for normality using Shapiro-Wilk tests. Parametric tests were applied for normal distributions and non-parametric tests were used when this hypothesis was rejected. Means and standard deviations (SD) were reported, except for ordinal distributions (EDSS), for which the median and interquartile range were reported.

#### 3.7.1. Comparisons between Configurations

SId and change in conventional gait features were compared between configurations of pairs of extraction/detection trials using the absolute difference between the mean value in the two groups. For all these non-parametric variables, the Kruskal-Wallis test, a rank-based non-parametric test used to compare more than two independent groups, was used. Rejection of the null hypothesis was followed by subsequent Wilcoxon tests to test differences in medians. All tests were corrected for multiple comparisons using Bonferroni adjustment. For each group (HS and pMS, respectively), the percentile score of SId from A2 was computed from the distribution of SId from A3. The percentile score of SId from A2 was also computed from the distribution of SId from A4.
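Two of the computations mentioned above, the percentile score of one SId value within a reference distribution and the Bonferroni-adjusted significance threshold, can be sketched as follows. The function names and numerical values are hypothetical, and defining the percentile score as the share of reference values less than or equal to the value of interest is an assumption, as the text does not specify the convention used.

```python
def percentile_score(value, reference):
    """Percentage of reference values that are <= value (assumed convention)."""
    if not reference:
        raise ValueError("reference distribution is empty")
    return 100.0 * sum(r <= value for r in reference) / len(reference)


def bonferroni_threshold(alpha, n_comparisons):
    """Per-test significance threshold after Bonferroni adjustment."""
    return alpha / n_comparisons


# Hypothetical A2 value placed within a hypothetical A3 distribution
score = percentile_score(0.95, [0.80, 0.85, 0.90, 0.96])

# With alpha = 0.05 and three comparisons the threshold is about 0.017
threshold = bonferroni_threshold(0.05, 3)
```

The second result is consistent with the threshold of 0.017 quoted in the Results, although the number of comparisons behind that value is inferred here rather than stated in the text.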

#### 3.7.2. Correlations

SId was correlated to performance, accuracy and conventional gait features using Pearson product-moment correlation coefficients, which remain a valid method even in the case of non-normal datasets [45]. The Pearson correlation coefficient is interpreted as very high for absolute values between 0.9 and 1.0, high for absolute values between 0.7 and 0.9, moderate for absolute values from 0.5 to 0.7, low for absolute values from 0.3 to 0.5 and negligible for absolute values below 0.3 [46].
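The interpretation scale above can be written as a small lookup. The function name is hypothetical, and assigning a boundary value (for example exactly 0.7) to the higher category is an assumption, since the stated ranges share their end points.

```python
def interpret_pearson(r):
    """Map a Pearson coefficient to the qualitative scale of [46]."""
    a = abs(r)
    if a > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if a >= 0.9:
        return "very high"
    if a >= 0.7:
        return "high"
    if a >= 0.5:
        return "moderate"
    if a >= 0.3:
        return "low"
    return "negligible"


labels = [interpret_pearson(r) for r in (-0.74, -0.56, -0.38, 0.95)]
```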

Primary data analysis (extraction and detection process) was done using MATLAB® R2019a. Statistical analysis was performed using R v3.5.1. All tests were corrected for multiple comparisons using Bonferroni adjustment.

#### **4. Results**

In this section, we investigate the ability of this index to effectively compare two trials. To investigate the potential of SId as a gait biomarker, three different and complementary questions are investigated:


#### *4.1. Comparison of SId Based on Configurations*

In this experiment, the values of SId are compared between the four configurations: intra-individual intra-session comparison (A1), intra-individual inter-session comparison (A2), intra-group inter-individual comparison (A3) and inter-group inter-individual comparison (A4). Boxplots are displayed in Figure 4. SId shows its highest values for the A1 comparison (HS: 0.99 (0.00), pMS: 0.97 (0.01)) and decreases from A1 to A4, both in the HS and the pMS group (*p*-value of the Kruskal-Wallis test: <0.0001, *p*-value of the subsequent Wilcoxon tests: <0.0001 for all paired comparisons). In particular, it shows that trials from a given subject are closer to each other than to trials from another subject both in the HS group (mean difference: 0.046; *p*-value: <0.0001) and in the pMS group (mean difference: 0.055; *p*-value: <0.0001). Comparisons of A3 (inter-individual intra-group) and A4 (inter-individual inter-group) show that SIds obtained for intra-group comparison are larger than inter-group ones in the HS group (mean difference: 0.190; *p*-value: <0.0001) but not in the pMS group (mean difference: 0.070; *p*-value = 0.52).

For a given individual *k* inside the HS or the pMS groups, SIds for comparison of one trial to another trial are reproduced in Table 2. This table shows that, on average, trials from a given subject are closer to other trials from the same subject than to trials from other subjects. For HS subjects, SId from A2 prediction belongs to the 90th (SD: 14) percentile of the distribution of SId from A3 prediction and is always higher than SId from A4. For pMS subjects, SId from A2 prediction belongs to the 96th (SD: 8) percentile of the distribution of SId from A3 prediction and the 99th (SD: 2) percentile of the distribution of SId from A4.

**Figure 4.** Comparison of SId predictions across configurations: Intra-individual intra-session prediction (A1) vs. intra-individual inter-session prediction (A2) vs. intra-group inter-individual prediction (A3) vs. inter-group inter-individual prediction (A4).

**Table 2.** Similarity index scores for comparing one gait trial depending on the training trial (intra-individual inter-session, intra-group inter-individual, inter-group inter-individual). Means and standard deviations are displayed for both pMS and HS groups.


#### *4.2. Correlation of SId with Conventional Features*

Comparisons were also carried out for the average walking velocity (Figure 5a), step length (Figure 5b), step time (Figure 5c), double stance time (Figure 5d), coefficient of variation of step time (Figure 5e) and coefficient of variation of double stance time (Figure 5f), which are classical gait features [4]. After controlling for multiple comparisons, differences in average velocity (Figure 5a) and in double stance time (Figure 5d) proved significantly higher in the A2 (intra-individual inter-session) comparison as compared to the A1 (intra-individual intra-session) comparison in the HS group (*p*-values of 0.002 and 0.003, respectively, with a threshold of 0.017) and the pMS group (*p*-values < 0.001 and < 0.0001, respectively, with a threshold of 0.017). The difference in step length (Figure 5b) was also higher in the A2 comparison as compared to the A1 comparison in the HS group (*p*-value of 0.007, with a threshold of 0.017). All other comparisons of configurations were highly significant (*p*-value < 0.0001).

**Figure 5.** Intra-individual intra-session prediction (A1) vs. intra-individual inter-session prediction (A2) vs. intra-group inter-individual prediction (A3) vs. inter-group inter-individual prediction (A4) for both cohorts: (**a**) Average walking velocity; (**b**) step time; (**c**) step length; (**d**) double stance time; (**e**) coefficient of variation of step time; (**f**) coefficient of variation of double stance time.

To investigate how SId would correlate to change in these conventional features, SId, as measured for each intra-group comparison (A1, A2, A3), was correlated to variation between the respective train trial and test trial for each of the following conventional gait features: The average walking velocity (Figure 6a), step time (Figure 6b), step length (Figure 6c), double stance time (Figure 6d), coefficient of variation of step time (Figure 6e) and coefficient of variation of double stance time (Figure 6f).

**Figure 6.** Correlation of SId to conventional features: (**a**) Average walking velocity; (**b**) step length; (**c**) step time; (**d**) double stance; (**e**) coefficient of variation of step time; (**f**) coefficient of variation of double stance time.

For both groups, low correlations were observed for difference in the average walking velocity (Figure 6a) (HS: *r* = −0.38, *p*-value: < 0.0001; pMS: *r* = −0.31, *p*-value: < 0.0001), double stance time (Figure 6d) (HS: *r* = −0.35, *p*-value: < 0.0001; pMS: *r* = −0.13, *p*-value: < 0.0001) and the variation coefficient of step time (Figure 6e) (HS: *r* = −0.14, *p*-value: 0.004; pMS: *r* = −0.13, *p*-value: < 0.0001). Moderate to high correlations were observed for difference in step time (Figure 6c) (HS: *r* = −0.74, *p*-value: < 0.0001; pMS: *r* = −0.56, *p*-value: < 0.0001). Additional low correlation was seen for pMS participants for the difference in step length (Figure 6b) (*r* = −0.13, *p*-value: < 0.0001) and the variation coefficient of double stance time (Figure 6f) (*r* = −0.13, *p*-value: < 0.0001).

#### *4.3. Correlation to Performance of the Step Detection*

Performance and accuracy scores, along with their correlations to SId, are reported in Table 3. In the HS group, SId correlates moderately to the F-measure, Δ*Start* and Δ*Duration*, and weakly to Δ*End*. In the pMS group, SId correlates moderately to the F-measure and strongly to Δ*Start* and Δ*Duration*, while a very low correlation is found with Δ*End*.

**Table 3.** Correlations between SId and the F-measure and accuracy scores for the step detected. All configurations are pooled together and reported as mean (SD).


#### **5. Discussion**

This study shows that SId is a valid metric to compare two gait trials, either between different subjects or between two visits of the same subject, to track changes in gait. In addition, in our small sample of patients, SId seems to give an insight into the performance of the underlying template-based step-detection method.

First, SId showed decreasing values from intra-individual intra-session (A1) to intra-individual inter-session (A2) to intra-group inter-individual (A3) to inter-group inter-individual (A4) trial comparisons for both the HS group and the pMS group. The difference in SId between A1 and A2 was expected for pMS individuals, for whom symptoms vary from day to day depending, for instance, on the level of exercise and physical therapy or the weather (Uhthoff effect [47–49]). This larger change in HS participants between trials of different sessions than between trials of the same session was also true for conventional features. Average velocity and double stance time, as well as step length in the HS group, all displayed a larger difference when comparing inter-session with intra-session trials. Still, for all features, the difference in A2 remains within the standard error mean for inter-session comparison as found in the literature [50–52]. Furthermore, the hierarchy of variability in gait parameters is also found in the literature in intra-class correlation coefficients for both healthy subjects [19,53] and mixed groups of patients and subjects [10]. Moreover, SId shows high variability in between-cohort comparisons as compared to intra-cohort comparisons for the HS group but not for the pMS group. Two participants from the pMS cohort can therefore be as distant as one participant from the pMS cohort and one participant from the HS cohort. One explanation is that pMS patients present with a wide range of gait alterations, both in terms of the types of symptoms (which can relate to balance deficit, spasticity, decreased muscular strength, etc.) and the severity of symptoms.
In that regard, it can be observed in Figure 4 and Table 2 that the SId for the detection of steps from HS individuals using steps from pMS individuals seems lower than that for the detection of steps from pMS individuals using steps from HS individuals, which illustrates the non-symmetrical characteristic of the SId. This difference may be due to the durations of the steps, which differ between HS and pMS subjects [23,54,55]. Due to the greedy aspect of the matching procedure, it is easier for the algorithm to detect one large step with several small steps than the opposite. Therefore, higher SId values can be achieved by using HS templates to detect pMS steps than the opposite. Another explanation might be that the noise level is larger for pMS subjects, thus creating noisy templates that are more difficult to match than the smoother HS templates.

Second, as mentioned above, lower SId was associated with increased difference in step time between the train and test trial, a parameter which also showed strong correlation with disease severity as measured using the Expanded Disability Status Scale [16,23,54–56]. The SId therefore has the potential to give insight into the evolution of the disease, without needing any pre-processing and step detection. However, even though most of them were significant, only low correlations were found for the differences in other conventional features that are usually used to characterize gait. As a matter of fact, very high variability in the difference of conventional features is seen, and one ought to be careful in drawing conclusions before larger and longer studies are carried out.

Third, SId has been shown to provide key information on the underlying step-detection algorithm. One major drawback of automatic step-detection algorithms is that it is difficult to assess their performance in real-life conditions. In particular, when confronted with different types of gait or cohorts, their accuracy may drop, which can have consequences if they are used in a clinical context. As a matter of fact, most of the algorithms designed for a particular type of subject may suffer from degraded performance in other cohorts [57]. By construction, a large SId between two trials means that the templates used to detect the steps were close to the detected steps. Very low SId values can therefore be interpreted as a discrepancy between the train and test trial, which is likely to cause poor step detection. Indeed, we showed in Table 3 a correlation between SId and performance as well as accuracy scores. The SId values are therefore linked to the confidence in the underlying detection algorithm, and could be used to report that the model used in the detection process does not suit the tested data. If several libraries of templates were available (e.g., one for each cohort or one for each gait disability), the SIds could be used to select the most appropriate library and thus improve the step-detection performance. These perspectives shall be investigated in future studies.
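
The library-selection idea could be sketched as follows. This is a hypothetical illustration only: the function `select_library`, the threshold `min_sid` and the SId values are assumptions made for the example, and the SId computation itself is taken as given.

```python
def select_library(sid_by_library, min_sid=0.5):
    """Pick the template library with the highest SId for a test trial.

    sid_by_library: dict mapping a library name to the SId obtained when
    that library is used to detect steps in the test trial.
    min_sid: hypothetical threshold below which detection is flagged as
    unreliable (no such threshold is defined in the study).
    Returns (library_name, sid, is_reliable).
    """
    best = max(sid_by_library, key=sid_by_library.get)
    sid = sid_by_library[best]
    return best, sid, sid >= min_sid

# Hypothetical SId values for one test trial and two candidate libraries.
choice = select_library({"HS": 0.42, "pMS": 0.71})
print(choice)
```

A trial for which even the best library falls below the threshold would be reported as poorly suited to the available templates.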

Finally, these results can be applied to a wide range of pMS individuals, with mild as well as severe disease. Indeed, as patients using walking aids were also included, the conclusions also apply to patients with EDSS 6 and 6.5, which fills a gap in the literature [23]. Comparisons of SId between other populations should be informative for comparing distances between the gaits of patients disabled by different neurological illnesses, and could contribute to the development of a new taxonomy. New matching procedures may also be implemented, for instance, by using dynamic time warping (DTW), which allows time series of different lengths to be matched. In particular, the use of this technique dedicated to template matching may be useful in the context of step detection and recognition [58].
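
To illustrate how DTW matches time series of different lengths, here is a minimal sketch of the classic dynamic-programming formulation with an absolute-difference local cost. It is a generic textbook version, not the matching procedure used in this study; `dtw_distance` and the toy signals are illustrative names and values.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Toy signals: a template step and the same shape executed twice as slowly.
template = [0.0, 1.0, 2.0, 1.0, 0.0]
slow_step = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
other = [2.0, 0.0, 2.0, 0.0, 2.0]

print(dtw_distance(template, slow_step))  # same shape, different duration
print(dtw_distance(template, other))      # different shape
```

Because the warping path can repeat samples, a slower step with the same shape incurs no cost, whereas a sample-by-sample comparison would require both signals to have equal length.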

Our study has limitations. First, sampling fluctuations may have occurred due to the small sample size, particularly of HS. Recruiting young healthy subjects was difficult because of the required six-month interval between the two measurements. In particular, strong variability was found when correlating conventional features with the SId. Even though results were significant, the clouds of points are sparse.

#### **6. Conclusions**

In this article, we introduced a novel algorithm for comparing inertial signals of two gait trials. The output parameter, a metric referred to as the similarity index (SId), lies between 0 and 1 and reflects how similar two gait trials are. This parameter shows promising results for the longitudinal follow-up of participants, as it is sensitive to changes in gait features. Larger studies are needed to confirm the potential of SId as a predictor of changes, and a longer follow-up time could also allow assessment of its prognostic value. In addition, as the SId correlates with the performance and accuracy of the underlying step-detection algorithm, it provides immediate feedback on the detection, which is a key aid for decision making.

**Author Contributions:** Conceptualization, A.V.-J., L.O., A.M., F.Q., P.-P.V. and D.R.; Data curation, A.M.; Formal analysis, A.V.-J., L.O., A.M. and F.Q.; Methodology, A.V.-J., L.O., A.M., F.Q., P.-P.V. and D.R.; Software, A.V.-J.; Supervision, P.-P.V. and D.R.; Writing—original draft, A.V.-J.; Writing – review and editing, L.O., A.M., F.Q., P.-P.V. and D.R.

**Funding:** This work was supported by SATT-IDF INNOV.

**Acknowledgments:** The authors would like to thank Juan Mantilla, Danping Wang, Nicolas Vayatis and Christophe Labourdette for the useful discussions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*

### **Comparison of Walking Protocols and Gait Assessment Systems for Machine Learning-Based Classification of Parkinson's Disease**

**Rana Zia Ur Rehman <sup>1</sup>, Silvia Del Din <sup>1</sup>, Jian Qing Shi <sup>2</sup>, Brook Galna <sup>1,3</sup>, Sue Lord <sup>1,4</sup>, Alison J. Yarnall <sup>1,5</sup>, Yu Guan <sup>6</sup> and Lynn Rochester <sup>1,5,</sup>\***


Received: 1 November 2019; Accepted: 2 December 2019; Published: 5 December 2019

**Abstract:** Early diagnosis of Parkinson's disease (PD) is challenging; applying machine learning (ML) models to gait characteristics may support the classification process. Comparing performance of ML models used in various studies can be problematic due to different walking protocols and gait assessment systems. The objective of this study was to compare the impact of walking protocols and gait assessment systems on the performance of a support vector machine (SVM) and random forest (RF) for classification of PD. A total of 93 PD and 103 controls performed two walking protocols at their normal pace: (i) four times along a 10 m walkway (intermittent walk, IW), (ii) walking for 2 minutes on a 25 m oval circuit (continuous walk, CW). A total of 14 gait characteristics were extracted from two different systems (an instrumented walkway, GAITRite; and an accelerometer attached at the lower back, Axivity). SVM and RF were trained on normalized data (accounting for step velocity, gender, age and BMI) and evaluated using 10-fold cross validation with area under the curve (AUC). Overall performance was higher for both systems during CW compared to IW. SVM performed better than RF. With SVM, during CW Axivity significantly outperformed GAITRite (AUC: 87.83 ± 7.81% vs. 80.49 ± 9.85%); during IW systems performed similarly. These findings suggest that choice of testing protocol and sensing system may have a direct impact on ML PD classification results and highlight the need for standardization for wide scale implementation.

**Keywords:** Parkinson's disease; machine learning; classification; wearables; accelerometer; GAITRite; multi-regression normalization; SVM; random forest classifier

#### **1. Introduction**

Parkinson's disease (PD) is a complex neurodegenerative disorder which progresses over time [1] and comprises both motor and non-motor symptoms [2], leading to poor disease management, poorer quality of life [3], and increased health care costs [4]. Early diagnosis of PD is critical for optimal management but remains challenging. Current diagnosis of PD is commonly based on subjective clinical examination (clinical scales) [5] often in conjunction with expensive and time consuming brain imaging techniques. Gait has been shown to act as a marker of global health and has been used to predict morbidity, mortality, falls risk and neurological disorders [6]. Recent work has shown that objective gait quantification of motor impairments can support PD diagnosis, also at an early stage [7,8].

Gait can be objectively quantified via a number of spatial-temporal characteristics and features [9–12]. In order to maximize information for disease classification, analysis of multiple characteristics can be enhanced using machine learning (ML) [6]. The most widely used ML models for PD classification are the support vector machine (SVM) and random forest (RF) [13–19]. Classification accuracy, however, is inconsistent across studies, which may be largely due to methodological differences (e.g., testing protocols, gait assessment systems and normalization of participants' data) [13,14,17]. This makes it difficult to compare across studies and, in turn, to select the optimal gait protocol and outcomes for classification purposes. For example, protocols used to measure gait have different durations, distance and speed [10,20,21]. Moreover, gait assessment systems range from gold standards in the field of gait analysis using camera based motion capture and instrumented walkways [11,12] to wearable devices [22]. Practically, wearable sensors such as accelerometers, gyroscopes and magnetometers [23] have advantages as they are not context specific and can be used in the clinic and home [10,24]. This is relevant if gait proves useful for disease classification and clinical use.

For accurate disease classification the differences between participants should also be as low as possible and this requires normalization of the selected features as an important and critical step [16]. Between-participants gait differences are related to demographic characteristics such as the age [25], gender, and BMI [26,27]. Gait characteristics are also speed dependent [28–30] and normalization of gait features with respect to speed is usually performed [16]. Robust normalization processes thus optimize ML models and classification of PD [31].

The effects of walking protocol and gait assessment systems on the performance of ML models and the impacts on disease classification remain unanswered questions. The objective of this study was therefore to investigate the impact of different walking protocols and gait assessment systems on the performance of SVM and RF models for PD classification. We also highlight the strengths and limitations of protocols and devices to guide decision making in future studies. We compared the effect from two different walking protocols at normal pace (four times along a 10 m walkway (intermittent walk-IW); walking for 2 min on a 25 m oval circuit (continuous walk-CW)); and two different gait assessment systems: GAITRite vs. Axivity.

#### **2. Methods**

#### *2.1. Participants*

Data from the "Incidence of Cognitive Impairment in Cohorts with Longitudinal Evaluation - GAIT" (ICICLE-GAIT) study [11,32] encompassing 93 people with early PD and without dementia at study entry, and 103 healthy controls (HC) were included in this cross-sectional analysis. PD was diagnosed according to the UK Parkinson's Disease Brain Bank criteria [21] by a movement disorder specialist [32]. The study was approved by the "Newcastle and North Tyneside Research Ethics Committee" (REC No. 09/H0906/82). All the participants gave their written informed consent before participating in the study. Experiments were conducted according to the declaration of Helsinki.

#### *2.2. Demographic and Clinical Measures*

Demographic characteristics such as age, height, weight, and BMI were recorded for all the participants. Cognition was assessed with the Mini-Mental State Examination (MMSE) [33] and balance confidence was evaluated with the balance self-confidence scale (Activities specific balance confidence scale; ABC) [34]. To assess PD motor severity, Hoehn & Yahr scale score [5] and the modified version of the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS)—section III [35] were used. Phenotypes in PD, namely postural instability and gait difficulty (PIGD), indeterminate (ID) and tremor dominant (TD) subtypes, were also calculated from MDS-UPDRS [36]. Levodopa equivalent daily dose (LEDD mg/day) was calculated according to defined criteria [37,38].

#### *2.3. Walking Protocols and Data Collection*

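Two different protocols were used to assess gait. All PD participants were assessed one hour after medication intake:

(i) Intermittent walk (IW): walking four times along a 10 m walkway at normal pace (Figure 1a).

(ii) Continuous walk (CW): walking for 2 min on a 25 m oval circuit at normal pace (Figure 1b).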

**Figure 1.** Layout of experimental setup and testing protocols, (**a**) 10 m intermittent walking test (IW); (**b**) 2 min continuous walking test (CW).

#### *2.4. Gait Assessment Systems*

Each participant was asked to wear a tri-axial accelerometer (Axivity AX3, dimensions: 23.0 × 32.5 × 7.6 mm) on the lower back (L5), held in place with double sided tape (BSN Medical Limited, Hull, UK) [10]. The monitor measures the vertical, mediolateral and anteroposterior accelerations during walking at 100 Hz sampling frequency (±8 g range, resolution up to 13-bit). Data collected using Axivity were synchronized with real-time clock, and start and stop times of the trials were noted by the experimenter to automatically segment and analyze the accelerometer data via MATLAB®. Gait assessment was also conducted using an instrumented mat (Platinum model GAITRite: 7.0 × 0.6 m) [12]. GAITRite has a spatial accuracy of 1.27 cm and temporal accuracy of one sample (240 Hz, ~4.17 ms).
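
The figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the ±8 g range is mapped linearly onto 2<sup>13</sup> codes, which is an assumption about the sensor and not a statement from the study; all variable names are illustrative.

```python
axivity_hz = 100      # accelerometer sampling frequency (Hz)
gaitrite_hz = 240     # instrumented walkway sampling frequency (Hz)
accel_range_g = 8     # accelerometer range is +/- 8 g
adc_bits = 13         # "resolution up to 13-bit"

axivity_period_ms = 1000 / axivity_hz    # time between accelerometer samples
gaitrite_period_ms = 1000 / gaitrite_hz  # temporal accuracy of one walkway sample

# Assumed linear mapping of the full 16 g span onto 2**13 codes.
lsb_mg = 2 * accel_range_g / 2 ** adc_bits * 1000

# Samples per axis recorded during a 2 min continuous walk.
samples_2min = axivity_hz * 120

print(axivity_period_ms, round(gaitrite_period_ms, 2), round(lsb_mg, 2), samples_2min)
```

Under these assumptions the accelerometer resolves roughly 2 mg per count and delivers one sample every 10 ms, against about 4.17 ms for the walkway.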

#### *2.5. Data Processing and Gait Characteristics Extraction*

From each testing protocol and gait assessment system, 14 gait characteristics were extracted [10,11]. Methods described in our previous work were used for extracting gait characteristics from the 10 m test and the 2 min test with GAITRite and Axivity [10]. For easy interpretation, these 14 gait characteristics were grouped into five domains (pace, rhythm, variability, asymmetry, and postural control) as described in our previous work [11].

#### *2.6. Statistical Analysis, Gait Normalization and Classification Modeling*

Multivariate analysis of variance (MANOVA) was performed on normalized gait characteristics to examine the main effect and interactions of group (PD vs. HC), walking protocols (IW vs. CW) and gait assessment systems (GAITRite vs. Axivity) on the gait characteristics. Independent *t*-tests were performed to understand the between-group (PD vs. HC) differences of demographic and gait characteristics to include as input to the ML model. Receiver operating characteristics (ROC) analysis was used to measure the discriminative power of each gait characteristic. Pearson's correlation coefficients (r) were used to check the dependency among gait characteristics within each group. Distribution of each gait characteristic was plotted using rain cloud plots [39] for each group, walking protocol and gait assessment system. Gait characteristics were normalized for ML using multiple regression normalization [16] performed with respect to preferred gait speed (step velocity in each walking protocol from each gait assessment system), age, BMI and gender. This gave the ratio of the original and predicted gait characteristics based on the following equations:

$$y_i = \beta_0 + \beta_1 \cdot \mathrm{Gender}_i + \beta_2 \cdot \mathrm{Age}_i + \beta_3 \cdot \mathrm{BMI}_i + \beta_4 \cdot \mathrm{StepVelocity\,(speed)}_i + \epsilon_i \tag{1}$$

where *y<sub>i</sub>* is a gait characteristic (from the 5 domains of the conceptual gait model) for the *i*th participant, β<sub>0</sub> is the intercept and β<sub>1</sub> to β<sub>4</sub> are the coefficients of the linear regression. *ε<sub>i</sub>* ∼ *N*(0, σ<sup>2</sup>) is the residual for each participant *i*. For each testing task and sensing system, the model coefficients were estimated using the healthy control participants' data based on Equation (1):

$$y_i = \hat{y}_i + \hat{\epsilon}_i \tag{2}$$

where *ŷ<sub>i</sub>* and *ε̂<sub>i</sub>* are the predicted value and residual error for the *i*th participant. Finally, the normalized gait features were obtained by dividing the original gait feature by the predicted gait feature, using the following equation:

$$y_i^n = \frac{y_i}{\hat{y}_i} \tag{3}$$

where *y<sub>i</sub><sup>n</sup>* is the final normalized gait characteristic for the *i*th participant and the superscript *n* stands for normalized. Based on the Taylor series, the expected value of the normalized gait characteristics in the control group should be 1. The resulting gait characteristics are unitless, since *y<sub>i</sub>* and *ŷ<sub>i</sub>* have the same measuring units.
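
Equations (1) to (3) can be sketched in a few lines. The example below is a minimal illustration on synthetic data, assuming ordinary least squares via NumPy and a 0/1 coding for gender; the function names, coefficients and data are hypothetical and do not reproduce the study's implementation.

```python
import numpy as np

def fit_control_model(X_controls, y_controls):
    """Estimate the Eq. (1) coefficients (intercept first) by least squares."""
    design = np.column_stack([np.ones(len(X_controls)), X_controls])
    beta, *_ = np.linalg.lstsq(design, y_controls, rcond=None)
    return beta

def normalize(X, y, beta):
    """Eq. (3): observed value divided by the value predicted from Eq. (1)."""
    design = np.column_stack([np.ones(len(X)), X])
    return y / (design @ beta)

def make_covariates(rng, n):
    """Synthetic covariates: gender (0/1, assumed coding), age, BMI, step velocity."""
    return np.column_stack([
        rng.integers(0, 2, n),
        rng.normal(70, 7, n),
        rng.normal(27, 4, n),
        rng.normal(1.2, 0.2, n),
    ])

rng = np.random.default_rng(0)
true_beta = np.array([0.50, 0.02, 0.001, 0.002, 0.30])  # arbitrary illustrative values

# Synthetic controls follow the linear model; synthetic patients score 10% lower.
X_hc = make_covariates(rng, 200)
y_hc = true_beta[0] + X_hc @ true_beta[1:] + rng.normal(0, 0.01, 200)
X_pd = make_covariates(rng, 200)
y_pd = 0.9 * (true_beta[0] + X_pd @ true_beta[1:]) + rng.normal(0, 0.01, 200)

beta_hat = fit_control_model(X_hc, y_hc)      # coefficients from controls only
y_norm_hc = normalize(X_hc, y_hc, beta_hat)   # centred near 1 by construction
y_norm_pd = normalize(X_pd, y_pd, beta_hat)   # deviation from 1 reflects the group effect

print(round(float(y_norm_hc.mean()), 2), round(float(y_norm_pd.mean()), 2))
```

Because the model is fitted on controls only, a normalized value near 1 means "as expected for a healthy person of that gender, age, BMI and speed", and the result is unitless.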

The support vector machine with radial basis function (SVM-RBF) and random forest were used because these are the most widely used ML models for PD classification [13–19,40]. The models were trained on the same conceptual features from both sensing systems to compare the impact of walking protocols and gait assessment systems. Models were evaluated using 10-fold cross validation repeated 100 times. A single measure, the area under the curve (AUC), was used for model evaluation [41]. The importance of the gait characteristics was identified by extracting the square of the weight of each gait characteristic in the SVM-linear classifier [42,43]. Gait characteristic importance is a unitless number which was used to rank the variables based on their contribution to the classification by the SVM model. This was calculated as the square of the weight calculated in the SVM model for each variable with the following Equation (4):

$$\mathrm{Importance} = w^2 = \left(\sum_{k=1}^{N} \alpha_k \mathbf{x}_k l_k\right)^2 \tag{4}$$

where *w*<sup>2</sup> gives the importance score and is the entry-wise square of the weight for each gait variable in the model. α<sub>*k*</sub> represents the model parameter trained on data {*x<sub>k</sub>*, *l<sub>k</sub>*}, where *k* runs from 1 to *N*. *N* represents the sample size and *x<sub>k</sub>* is the data of each subject with corresponding label *l<sub>k</sub>*. For ML, standard commands for SVM with different kernels (RBF and linear) and default parameters (slack variable C: 1) from the scikit-learn library in Python [44] were used for comparison among walking protocols and gait assessment systems. Similarly, for RF, 100 trees were used for final performance estimation.
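
The modelling steps described above could look roughly like the following scikit-learn sketch. It runs on synthetic data rather than study data, uses stratified folds (an assumption, since the text only specifies 10-fold cross validation) and only 3 repeats instead of 100 to keep the run short; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for 14 normalized gait characteristics of 196 participants.
X, y = make_classification(n_samples=196, n_features=14, n_informative=6,
                           random_state=0)

# 10-fold cross validation; 3 repeats here, whereas the study used 100.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)

models = {
    "SVM-RBF": SVC(kernel="rbf", C=1.0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
auc = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    auc[name] = (scores.mean(), scores.std())
    print(f"{name}: AUC {scores.mean():.3f} +/- {scores.std():.3f}")

# Eq. (4): squared weights of a linear SVM used as importance scores.
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
# dual_coef_ stores alpha_k * l_k for the support vectors, so this is sum_k alpha_k l_k x_k.
w = linear_svm.dual_coef_ @ linear_svm.support_vectors_
importance = (w ** 2).ravel()
ranking = np.argsort(importance)[::-1]  # feature indices, most important first
```

For a linear kernel, the weight vector rebuilt from the support vectors matches the `coef_` attribute exposed by scikit-learn, so either can be squared to obtain the ranking.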

#### **3. Results**

Demographic characteristics are shown in Table 1. Compared to HCs, people with PD had comparable height, weight, and BMI; included proportionally more males; were significantly younger; and presented with significantly lower balance confidence (ABC) and poorer cognition (MMSE). Most people with PD were at a mild to moderate stage of the disease based on the Hoehn & Yahr scale. PD gait was assessed within 23.8 months of clinical diagnosis, while participants were taking on average 398 mg/day LEDD.


**Table 1.** Demographic and clinical characteristics of the participants.

<sup>1</sup> BMI: Body Mass Index; <sup>2</sup> MMSE: Mini–Mental State Examination; <sup>3</sup> ABC: Activities specific balance confidence scale; <sup>4</sup> LEDD: Levodopa equivalent daily dose; <sup>5</sup> MDS-UPDRS III: Movement Disorders Unified Parkinson's Disease Rating Scale part III; <sup>6</sup> PIGD: Postural instability and gait disorder phenotype; <sup>7</sup> ID: Indeterminate phenotype; <sup>8</sup> TD: Tremor dominant phenotype. In bold significant *p*-values (*p* < 0.05).

Table 2 shows the main effects and interaction effect for the group (PD vs. HC), walking protocols and gait assessment systems on gait characteristics. Table 3 shows the mean and standard deviation of raw gait characteristics and the statistical difference for each normalized gait characteristic between PD and HC for the two walking tasks (IW and CW) and two gait assessment systems (GAITRite vs. Axivity). The results for the multi regression normalization are given in the supplementary Tables S1 and S2. Plots of the whole data set to check the distribution, outliers, confidence intervals, and AUC are shown in Figure S1 in the Supplementary Material. Correlations among the gait characteristics are given in Supplementary Figure S2. Gait characteristics were categorized into five domains (pace, rhythm, variability, asymmetry, and postural control) [11] based on a model of gait to help summarize findings.

**Table 2.** MANOVA to check the effect of walking protocols and gait assessment systems on gait (\* indicates interaction).


**Table 3.** Mean comparison among PD and HC for gait characteristics obtained from walking protocols and assessment systems (significant normalized gait characteristics are highlighted in grey color, in bold significant *p*-values (*p* < 0.05) for normalized gait characteristics except step velocity).


We first established the effect of protocol and sensor system on gait characteristics, as a preliminary step to evaluating ML performance. There were significant main and interaction effects of pathology, walking protocols, and gait assessment systems on gait as shown in Table 2, and the individual gait characteristics are displayed in Table 3.

For group (PD vs. HC), people with PD had worse gait performance compared to controls irrespective of the protocol or gait assessment system. Grouping variables by domain [11], in general PD pace and rhythm were significantly slower while variability and asymmetry were higher, in both IW and CW protocols for both gait assessment systems.

There was a main effect of walking protocol (IW & CW) on gait characteristics. Performance was typically greater (higher pace, rhythm, variability, and asymmetry) in the IW protocol compared to the CW protocol for both PD and HC. Similarly, there were significant main and interaction effects of assessment systems on gait characteristics. In general, the values from Axivity tended to be higher compared to GAITRite although only asymmetry was significantly different between the systems.

Table 4 shows the contribution of gait characteristics in the classification modelling. A higher importance score indicates a greater contribution of each gait characteristic in the overall classification model. The top 5 Axivity characteristics were from variability, rhythm, and pace domains for both CW and IW. For GAITRite, CW contained gait characteristics from pace, rhythm and asymmetry domains, while for IW pace, rhythm and variability were important. Results without gait normalization are presented in the supplementary Table S3 and Figure S3.


**Table 4.** Importance of normalized gait characteristics in the classification of PD.

Both models (SVM-RBF & RF) behaved in a similar manner for both walking protocols and gait assessment systems, with better performance of Axivity compared to GAITRite in both walking tasks with RF. Overall, SVM-RBF performed better than RF. Therefore, for comparison of walking protocols and gait assessment systems, we only report the results from SVM-RBF. The results of RF are given in the supplementary Table S4.

Overall, the classification of PD was significantly more accurate with Axivity (*p* < 0.001) during the CW test (AUC 87.83 ± 7.81% for Axivity and 80.49 ± 9.85% for GAITRite), while there was no difference (*p* = 0.073) between the systems during the IW test (AUC 79.09 ± 10.11% for Axivity and 79.90 ± 10.06% for GAITRite). Figure 2 shows the distribution of the model classification performance, where the x-axis represents the classification performance (AUC), the top x-axis represents walking protocols, and the y-axis represents the gait assessment system. For reference, SVM performance results without gait normalization are presented in the Supplementary Figure S4.

**Figure 2.** Distribution of SVM classification performance after normalization of gait characteristics.

#### **4. Discussion**

To the best of our knowledge this is the first study to investigate the impact of different walking protocols and gait assessment systems on the performance of ML models for classification of PD. Robust normalization techniques were carried out to reduce the effect of demographics and speed on between participant differences within each group (PD and HC). A comprehensive group of 14 gait characteristics were selected based on a validated gait model. Finally, widely used SVM-RBF and RF models were trained for classification of PD and HC. The results show that different walking protocols and gait assessment systems significantly affect gait characteristics and in turn the performance of ML models. Harmonizing methods across multiple levels for comparative purposes is strongly advised to optimize and implement ML in disease classification.

#### *4.1. ML Performance: An Overview*

In this study, we found that the combination of CW protocol and Axivity gave the highest PD classification performance. In terms of protocols, ML performance was higher during CW with respect to IW. In terms of systems, Axivity showed a significantly higher AUC (87.83 ± 7.81%) compared to GAITRite (80.49 ± 9.85%) during CW. A similar pattern of results was obtained with RF, where Axivity showed better results compared to GAITRite during both CW & IW. Therefore, walking protocols and gait assessment systems materially impact ML performance, which makes the comparison of previous ML studies inconclusive. In fact, previous literature has shown that, when using wearables to quantify gait, studies using 2 min CW protocols [18,19] achieved better results compared to those using 10 m IW protocols [13,45]. In addition, studies showed that ML models derived from wearable inertial and force feet sensors [14,19,45,46] performed relatively better when compared to studies based on GAITRite data [17].

It's important to underline that many factors can influence ML results: not only walking protocols and gait assessment systems, but also cohort size, disease severity stage of PD, and validation method. However, in the context of this study, we showed that walking protocol and gait assessment have a significant impact on ML performance.

#### *4.2. Effect of Walking Protocols on ML Model and Performance*

In general, ML performance was higher during CW with respect to IW. There are a number of possible explanations for this. The gait characteristics included in each ML model were different for CW and IW, which may explain the differences in classification performance (i.e., CW higher AUC than IW). Indeed, the performance of ML models is influenced by the characteristics included in the model, and those characteristics are in turn influenced by the protocol used to assess gait. During IW we observed higher gait performance (e.g., higher pace, etc.) for all characteristics and for both groups (HC and PD). Acceleration and deceleration phases at the beginning and end of each IW increase the dispersion of gait characteristics, especially variability and asymmetry, compared to CW, where gait was sampled under more steady state conditions (lower variability and asymmetry values for both walking systems).

Another aspect is that even though participants were instructed to walk at their normal preferred pace for both protocols, it is clear that gait was faster during the IW compared to CW, and this has been reported previously [21]. This is most likely because attention to performance is higher during short intermittent walking tests than during continuous steady state walking, which is performed with less attention and conscious effort. However, this will also influence the dispersion of gait characteristics for PD and HC, as seen in the standard deviations in Table 3. For accurate classification between groups, this dispersion within each group for each characteristic should be minimized to increase the distance between groups. This explains the need to overcome between subject variability within each group to enhance ML performance.

Collectively, this suggests that the walking protocol should be selected carefully and protocols that capture more steady state gait (in our case CW) may be optimal for classification and therefore early identification of PD.

#### *4.3. Effect of Gait Assessment Systems on ML Model and Performance*

In general, Axivity showed significantly higher classification performance compared to GAITRite during CW and comparable performance during IW. Gait characteristics quantified by the two systems showed significant differences. Even if these two systems (GAITRite and Axivity) measured the same spatial-temporal characteristics from the same walking tasks, the mechanism by which gait characteristics are derived is different. GAITRite determines footfalls based on pressure sensors that identify each step from which additional gait characteristics are derived [47]. Axivity, instead, uses accelerometers which detect movement continuously: individual characteristics are then derived from the raw signal. In previous work comparing gait characteristics from Axivity and GAITRite, mean spatiotemporal gait characteristics (such as walking speed, step length and step time) showed high agreement, while variability and asymmetry showed low agreement between the systems [10]. Gait characteristics extracted from GAITRite are more variable (wider dispersion) at slow speed [48]. Conversely, an accelerometer positioned at lower back may mis-detect gait events like initial and final contacts which may impact on gait characteristic quantification [49]. An accelerometer close to the center of mass of the body can capture small variations in body movement (variability) during walking [50] with higher sensitivity compared to GAITRite. Analysis of the current study indicated that Axivity is more sensitive to detect variability and asymmetry, particularly in PD. Collectively all these factors most likely influence: (i) the observed differences in the gait characteristics quantified by each system and (ii) ultimately the performance of the ML models.

The highest classification performance obtained with Axivity during the CW could also simply be due to the higher amount of data available for Axivity vs. GAITRite during the walking protocol: Axivity continuously sampled the entire 2 min while GAITRite sampled only each pass over the mat (e.g., four passes) during the same time frame. This is in part corroborated by the fact that we found comparable performances during the IW, when the amount of data was similar for both systems. A further explanation for the difference in performance could be related to the inclusion of gradual turns with Axivity during the CW protocol, which could have influenced the ML model. Axivity is not able to quantify turning due to the lack of a gyroscope; to address this, we measured turning from a sensor with an embedded gyroscope (Opal Mobility Lab system, APDM Inc., Portland, OR, USA) collected concurrently in a subsample of the same cohort (PD: 31, HC: 49) during CW and IW (Figure 3 shows the probability distribution of turning gait characteristics, where the x-axis represents the corresponding units and the y-axis represents the walking protocols). Turning time and angular velocity were significantly different during IW, and so the turning segment of the signal was removed from Axivity analysis to delineate the gait characteristics. There were no significant between-group differences in turning characteristics during CW, and so step data from the turning component was retained for the analysis. However, we cannot rule out the possibility that steps from the turning component during CW may have contributed to better classification between groups when using Axivity.

**Figure 3.** Distribution of turning characteristics.

#### *4.4. Effect of Gait Normalization on ML Performance*

Our results show that participants walked faster during IW than during CW with both gait assessment systems (Table 3). Step velocity is closely tied to other gait characteristics [28–30], which can therefore be influenced by its high variability. To find the appropriate walking protocol, multiple regression (MR) normalization using demographics and gait speed was needed to reveal the influencing variables and to overcome between-participant differences among groups. Our findings support this approach: after controlling for the effect of speed and demographics, SVM differentiated between PD and HC more accurately. SVM performance increased by 5–7% in CW and 5–9% in IW in both gait assessment systems (Supplementary Figure S4). The results are in line with previous work where similar gait normalization approaches have been used for better classification [16,31]. Thus, in short walks (IW), normalization may act as a standardized technique to overcome the effect of gait assessment systems. During CW, the effect of gait assessment systems was still significant. Normalization was also important for improving the performance of ML, irrespective of protocol or system. This means that normalization may be important to ensure standardization of walking protocols and gait assessment systems for optimal ML performance.
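The MR normalization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the regression is fitted on healthy controls only (as the description of Supplementary Table S2 suggests) and that each characteristic is then expressed as a residual scaled by the residual standard deviation of the controls. The function names, the scaling step and the synthetic data are all assumptions for this example.

```python
import numpy as np

def fit_hc_model(covariates_hc, gait_hc):
    """Least-squares fit of one gait characteristic on covariates (HC only)."""
    X = np.column_stack([np.ones(len(gait_hc)), covariates_hc])
    beta, *_ = np.linalg.lstsq(X, gait_hc, rcond=None)
    # Residual SD with degrees of freedom reduced by the number of coefficients.
    residual_sd = np.std(gait_hc - X @ beta, ddof=X.shape[1])
    return beta, residual_sd

def normalize(covariates, gait, beta, residual_sd):
    """Remove the covariate effect estimated in HC and scale the residual."""
    X = np.column_stack([np.ones(len(gait)), covariates])
    return (gait - X @ beta) / residual_sd

# Synthetic healthy-control data, for illustration only.
rng = np.random.default_rng(0)
n = 60
age = rng.uniform(50, 85, n)
sex = rng.integers(0, 2, n)
bmi = rng.uniform(20, 35, n)
speed = rng.uniform(0.8, 1.5, n)
covariates_hc = np.column_stack([age, sex, bmi, speed])
step_length_hc = 0.35 + 0.25 * speed - 0.001 * age + rng.normal(0, 0.02, n)

beta, residual_sd = fit_hc_model(covariates_hc, step_length_hc)
z_hc = normalize(covariates_hc, step_length_hc, beta, residual_sd)
```

In this sketch the same `beta` and `residual_sd` would be applied to the PD group through `normalize`, so that any remaining group difference is not explained by age, sex, BMI or walking speed.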

#### *4.5. Limitations*

This study had some limitations. Only two widely used ML models (SVM & RF) [13–19] were used to compare the effect of walking protocols and gait assessment systems; future work should explore other classification models such as logistic regression and neural networks. Turning features were not included in this work because an accelerometer alone cannot quantify turning. To harmonize gait characteristics, step width, step width variability and a range of time series and frequency-based characteristics were not included in the analysis because they could not be calculated from both systems. The inclusion of these additional variables may improve classification for the respective systems, and their impact on ML models should be explored in future studies. Participants with PD were assessed within 23.8 ± 4.2 months from clinical diagnosis, which is considered relatively early disease. The cohort assessed in this study was relatively young, and the results may not be applicable or generalizable to older, frailer people with PD and multi-morbidity.

#### **5. Clinical Implications**

Based on this study, walking protocols (IW & CW) and gait assessment systems had a significant impact on ML model performance. The characteristics extracted in CW with Axivity gave the highest classification performance. Our work emphasizes the importance of using standardized walking protocols and wearable devices for ML-based PD classification to support clinical decision making. With the recent advancements in this field, this study will help clinicians to understand and select the appropriate walking protocols and gait assessment systems for optimal PD diagnosis. In future studies, such as those looking at prodromal disease, CW assessed with Axivity may give a more accurate reflection of gait changes. For better results, we recommend controlling for demographics and walking speed when normalizing gait characteristics for ML-based PD classification. Intervention studies seeking to determine changes in particular gait characteristic(s) may be advised to use this methodology.

#### **6. Conclusions**

In this study, the impact of different walking protocols (CW & IW) and gait assessment systems (GAITRite & Axivity) on the performance of the widely used ML models SVM and RF was investigated. Gait characteristics were normalized with respect to demographic properties and walking speed to overcome between-participant differences within each group (HC and PD) for each walking protocol (CW vs. IW). Both ML models behaved in a similar fashion for both walking protocols and gait assessment systems. Higher performances were achieved with CW compared to IW. Axivity gave higher classification performance compared to GAITRite. The highest PD classification performance was obtained during CW with Axivity (87.83 ± 7.81%). This work supports the idea that direct comparison of ML studies using different walking protocols and gait assessment systems may not be appropriate. The findings from this study suggest that the choice of testing protocol and gait assessment system is important to achieve the best classification results, which may have a direct impact on future end points in intervention studies. In conclusion, there is a need for standardization of walking protocols and gait assessment systems for wide-scale implementation in clinical gait assessment.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1424-8220/19/24/5363/s1, Table S1: Correlation of gender, age, BMI, and step velocity (speed) with gait characteristics before and after normalization, Table S2: Coefficients from the regression model by using the healthy control participants, Table S3: Importance of gait characteristics in the classification of PD before and after gait characteristics normalization, Table S4: Random forest (RF) classification performance after gait normalization, Figure S1: Distribution of the gait characteristics from 5 domains of conceptual gait model with statistical analysis, Figure S2: Correlations among gait characteristics before and after normalization, Figure S3: Contribution of the gait characteristics in the classification modelling in Support Vector Machine, Figure S4: SVM performance before gait characteristics normalization.

**Author Contributions:** R.Z.U.R. performed data analysis, statistical analysis, drafting and critical revision of the manuscript. S.D.D. helped in data analysis, interpretation of data and critical revision of manuscript for important intellectual content. Y.G. and J.Q.S. provided support for statistical analysis, interpretation, and critical revision of manuscript for important intellectual content. A.J.Y., B.G., and S.L. were involved in interpretation of data and critical revision of the manuscript. L.R. conceptualized and designed the study, helped in interpretation of data, and critically revised the manuscript for important intellectual content.

**Funding:** This work was supported by the "Keep Control" project, which is part of the European Union's Horizon 2020 research and innovation ITN program under the Marie Sklodowska-Curie grant agreement No. 721577. The ICICLE-Gait study was supported by Parkinson's UK (J-0802, G-1301) and by the National Institute for Health Research (NIHR) Newcastle Biomedical Research Center (BRC) based at Newcastle upon Tyne Hospitals NHS Foundation Trust and Newcastle University (REC number: 09/H0906/82). The work was also supported by the NIHR/Wellcome Trust Clinical Research Facility (CRF) infrastructure at Newcastle upon Tyne Hospitals NHS Foundation Trust. All opinions are those of the authors and not the funders.

**Acknowledgments:** The authors would like to thank all the participants and assessors of the ICICLE study, Lisa Alcock and Rachael Lawson for their support.

**Conflicts of Interest:** The authors declare no conflict of interest. The sponsors had no role in the design, execution, interpretation, or writing of the study.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*
