**Deep Learning to Predict Falls in Older Adults Based on Daily-Life Trunk Accelerometry**

### **Ahmed Nait Aicha <sup>1,\*</sup>, Gwenn Englebienne <sup>2</sup>, Kimberley S. van Schooten <sup>3</sup>, Mirjam Pijnappels <sup>4</sup> and Ben Kröse <sup>1,5</sup>**


Received: 31 March 2018; Accepted: 18 May 2018; Published: 22 May 2018

**Abstract:** Early detection of high fall risk is an essential component of fall prevention in older adults. Body-worn sensors such as accelerometers can provide valuable insight into daily-life activities, and biomechanical features derived from such inertial data are currently used for the assessment of fall risk. Here, we studied whether deep learning methods are suited to automatically derive features from raw accelerometer data that assess fall risk. We used an existing dataset of 296 older adults. We compared the performance of three deep learning model architectures (convolutional neural network (CNN), long short-term memory (LSTM), and a combination of the two (ConvLSTM)) to each other and to a baseline model with biomechanical features on the same dataset. The results show that the deep learning models in a single-task learning mode are strong in recognizing the identity of the subject, but only slightly outperform the baseline method on fall risk assessment. When using multi-task learning, with gender and age as auxiliary tasks, the deep learning models perform better. We also found that preprocessing of the data resulted in the best performance (AUC = 0.75). We conclude that deep learning models, and in particular multi-task learning, effectively assess fall risk on the basis of wearable sensor data.

**Keywords:** accidental falls; older adults; machine learning; neural networks; convolutional neural network; long short-term memory; accelerometry

#### **1. Introduction**

Falls among older adults are one of the major health problems that lead to a decreased quality of life and increased morbidity and mortality. In addition, falls pose high costs to the public health service. Risk factors for falls include weak muscles, unsteady gait, cognitive decline, and psychoactive medications. Early detection and monitoring of fall risk factors can significantly reduce the risk of future falls [1,2]. Among these factors, history of falls and of gait and balance disorders have been identified as strong predictors [3].

Fall risk assessment is a process in which the probability of a future fall is estimated, usually within a time frame of 6–12 months. In many intervention programs proposed for fall prevention, fall risk assessment is performed as the initial step to identify persons at highest risk. The assessment of fall risk is commonly conducted in a clinical setting and based on questionnaires and functional tests of mobility such as the Timed Up and Go (TUG) [4], the Performance Oriented Mobility Assessment (POMA) [5], or the Berg Balance Scale test [6]. Although these tests provide a good indication of one's optimal mobility and performance, their predictive ability for prospective falls is limited (e.g., [7]), possibly because this optimal ability is not representative of one's behavior in daily life.

In previous research, we studied the use of ambient sensors for the continuous monitoring of human activities in their natural environment [8,9]. In this paper, we focus on body-worn inertial sensors that are used in many research studies on the ambulatory monitoring of humans in daily life providing reliable insight into an individual's daily activities and gait quality characteristics [10].

Much research is done in the characterization of the quantity of movement of subjects, including the duration of low-, moderate-, and high-intensity activities, the total number of daily steps, and the daily percentage of time spent lying, sitting, standing, and walking [10]. Recent research showed the added value of the characterization of one's quality of movement in the determination of fall risk in older adults [11]. These studies revealed that biomechanical features such as gait stability, variability, and smoothness [12,13], but also mean turn duration [14] and the number of abnormal sit-to-stand transitions [15], are associated with fall risk. However, estimation of these features often requires event detection, which is in need of improvement, and may not currently exploit the wealth of information that has been collected. On the other hand, deep learning allows for the data-driven generation of features and does not suffer from these shortcomings.

In machine learning, deep convolutional and long short-term memory (LSTM) recurrent neural networks have been shown to be successful for the recognition of activities [16] and gait patterns [17] from inertial sensor data. However, the assessment of fall risk with such models has not been done before. The contributions of this paper are (a) a comparison of the performance of deep learning models for the assessment of fall risk with a baseline model based on biomechanical features using a large data set of 296 subjects and (b) the extension and testing of these models with multi-task learning to improve their performance.

#### **2. Sensor Data**

The data used in this paper were collected between March 2011 and January 2014 as part of the fall risk assessment in older adults (FARAO) cohort study performed at the Vrije Universiteit Amsterdam. The FARAO study collected data on fall risk factors in older adults with questionnaires, physical tests, and wearable sensors. Participants in the cohort were between 65 and 99 years of age, had a Mini-Mental State Examination (MMSE) score [18] between 19 and 30, and were able to walk at least 20 m, with the aid of an assistive device if needed. We re-analyzed the data described in [21], which consisted of a population of 296 older adults. These participants wore a triaxial accelerometer (DynaPort MoveMonitor, McRoberts) on their lower back for 1 week, which registered 3D trunk accelerations at 100 Hz and ±6 g. During a 6-month follow-up period in which fall incidences and descriptions were obtained monthly, 101 subjects (34.1%) experienced at least one fall and were identified as fallers. Table 1 provides an overview of the descriptive characteristics of the population. A detailed description of the population and the methods for data collection can be found in [11,19,21].


**Table 1.** Descriptive statistics of the population.

Participants were instructed to wear the accelerometer with an elastic belt around their lower back at all times, except during aquatic activities such as showering. The distribution of the total time that the sensor was worn for fallers and non-fallers was similar. Bouts of non-wearing, locomotion, sitting, lying, and standing were identified using the manufacturer's activity classification algorithm [20]. Only the locomotion bouts were analyzed in the current study. For each locomotion bout, the acceleration in three directions (i.e., anteroposterior (AP), mediolateral (ML), and vertical (VT)) was recorded. Figure 1 shows two examples of locomotion bouts lasting 10 s each.

**Figure 1.** An example of two locomotion samples. (**a**) A typical walking sample; (**b**) A walking sample interwoven with a turning activity at 7 s. accAP: acceleration in the anteroposterior direction; accML: acceleration in the mediolateral direction; accVT: acceleration in the vertical direction.

#### **3. Approach**

On the basis of this data set containing bouts of accelerometer data from 296 participants and the identification of the participants into fall or non-fall categories, a model was made that predicts falls from accelerometer data. In van Schooten et al. [21], a linear model was used, based on biomechanical features from the accelerometer data. In this paper, we used deep neural networks. Deep learning allows for the creation of computational models that are composed of multiple processing layers and learn representations of data with multiple levels of abstraction [22]. This can result in more powerful models, because the complexity of the feature computations are dictated directly by the data and by the quality of the model predictions, rather than by the preconceptions of the operator. On the other hand, no prior knowledge is leveraged in the creation of the model, so it is useful to compare deep learning approaches to traditional machine learning methods.

We evaluated two types of deep neural network (DNN) for the analysis of fall risk. First, we considered the convolutional neural network (CNN), which constrains the number of parameters by sharing parameter values in different parts of the network. It has been used with great success in speech recognition [23] and in the detection, segmentation, and recognition of objects and regions in images [24,25]. We then looked at the long short-term memory (LSTM) model, a specific type of recurrent neural network (RNN). RNNs specifically model sequential inputs such as speech and language [26,27]. In this work, we used a model that combines convolutional and recurrent models, which we refer to as the "ConvLSTM".

We trained the model parameters by minimizing the loss, a function that expresses how many prediction errors the model makes, and evaluated the resulting models for different values of their so-called "hyperparameters", which include the number of layers and the number of nodes in each layer, based on their receiver operating characteristic (ROC) curves. The models can make different types of errors, namely false positive and false negative predictions, and a single model can be tweaked to minimize one type of error at the expense of the other. The ROC curve shows the model's performance for multiple choices of this trade-off. The area under the ROC curve (AUC) is a robust metric of a model's performance. The training, validation, and testing of the DNNs were performed on a Distributed ASCI Supercomputer 5 (DAS-5) server [28].
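As an illustration, the AUC can also be computed directly as the probability that a randomly chosen faller receives a higher score than a randomly chosen non-faller. The following pure-Python sketch (the labels and scores are made up) shows this rank-based formulation:

```python
def auc_score(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive (faller) is scored above a randomly chosen
    negative (non-faller); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to chance performance (the dashed diagonal in an ROC plot), and 1.0 to a perfect ranking of fallers above non-fallers.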

#### **4. Deep Learning Neural Network Models**

#### *4.1. Feed-Forward Neural Networks*

Deep neural networks (DNNs) consist of large numbers of simple processing modules, the "neurons", which compute a fixed function (the "activation function") of the weighted sum of their inputs and are organized in separate layers. The simplicity of the neurons makes network training possible, while the large number of nodes and their organization in a large number of layers allows them to perform complex tasks. DNNs have the ability to learn representations of the training data and relate them to the output variable(s) that we train them to predict. An example of a DNN consisting of two hidden layers is given in Figure 2. The number of nodes in the input layer is determined by the dimensionality of the data, while the number of nodes in the output layer is determined by the chosen representation of the intended prediction. The structure of the network is determined by the complexity of the task to be learned. In addition to the number of nodes and layers, the connections between layers affect the complexity of the network. In a dense layer, each neuron is connected to all neurons of the previous layer and has its own set of weights. In a convolutional layer, a neuron is connected to a subset of the neurons in the previous layer, and shares its weights with the other neurons of that layer.
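The forward pass through such dense layers can be sketched in a few lines of plain Python. The weights below are arbitrary placeholders rather than trained values, and the layer sizes mirror the network of Figure 2:

```python
import math

def dense(inputs, weights, biases, activation=math.tanh):
    """One fully connected layer: each neuron applies the activation
    function to the weighted sum of all inputs plus a bias."""
    return [activation(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# A 3-input network with two hidden layers of four neurons each and one
# output neuron; the weight values are illustrative placeholders.
x = [0.5, -1.0, 0.25]
h1 = dense(x, [[0.1] * 3] * 4, [0.0] * 4)
h2 = dense(h1, [[0.2] * 4] * 4, [0.0] * 4)
out = dense(h2, [[0.3] * 4], [0.0])
print(len(h1), len(h2), len(out))  # 4 4 1
```

In a convolutional layer, by contrast, each neuron would see only a window of the inputs and all neurons in the layer would share one weight vector.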

**Figure 2.** A three-layer neural network with three input neurons, two hidden layers of four neurons each, and one output layer.

#### *4.2. Long Short-Term Memory (LSTM) Network*

Recurrent neural networks are a type of neural network where inputs are organized sequentially, and the output at time *t* is connected to all inputs from time 0 to *t* (Figure 3a). Such a network is still a feed-forward network, but the number of layers between an output and previous inputs increases as the time difference increases. In practice, the training of recurrent neural networks (RNNs) with long-term temporal dependencies can be problematic because the gradient of the loss function decays exponentially with the number of layers and, therefore, with time [29]. LSTM networks, introduced by Hochreiter and Schmidhuber [30], are a type of RNN that uses special units to solve this so-called vanishing gradient problem by "gating" the propagation of information over time. They extend RNNs with memory blocks (Figure 3b) to store information, easing the learning of temporal relationships over long time periods.

**Figure 3.** (**a**) A cyclic connection of a recurrent neural network (RNN), folded and unfolded. (**b**) A long short-term memory (LSTM) memory block consisting of one cell at time *t* and the three gates (*i<sub>t</sub>*, *o<sub>t</sub>*, and *f<sub>t</sub>*) which control the activation of the cell *c<sub>t</sub>* and its output *h<sub>t</sub>*.
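A minimal sketch of one LSTM step, using the gate names of Figure 3b. The scalar weights here are illustrative, not trained, and a real cell would operate on vectors:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One scalar LSTM step: the forget (f), input (i), and output (o)
    gates control the memory cell c and the output h."""
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])
    c = f * c_prev + i * g      # gated update of the memory cell
    h = o * math.tanh(c)        # gated output
    return h, c

# Placeholder weights (input weight, recurrent weight, bias) per gate.
W = {k: (0.5, 0.5, 0.0) for k in ("f", "i", "o", "g")}
h, c = 0.0, 0.0
for x in (1.0, -1.0, 0.5):      # a tiny input sequence
    h, c = lstm_step(x, h, c, W)
print(round(h, 4))
```

Because the cell state `c` is carried forward additively rather than through repeated multiplications, gradients can propagate over many time steps without vanishing.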

#### *4.3. Multi-Task Learning*

Multi-task learning (MTL) has been proposed by Caruana [31] to learn several related tasks by a single model. Having a network learn multiple tasks increases the complexity of the function it computes, but when the tasks are related, the models can share parameters. The complexity of a network performing multiple tasks is then lower than the complexity of multiple networks learning the tasks separately. In addition, the fact that tasks essentially compete for the resources of the network tends to force the network to avoid modeling non-essential aspects of the problem, thereby also improving the performance on the individual tasks.
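In practice, the overall MTL objective is simply a weighted sum of the per-task losses; the network's shared parameters receive gradients from both terms. A minimal sketch, with illustrative weights and loss values:

```python
def mtl_loss(main_loss, aux_loss, w_main=1.0, w_aux=1.0):
    """Overall multi-task loss: a weighted sum of the main-task loss
    (e.g., fall risk) and the auxiliary-task loss (e.g., identity)."""
    return w_main * main_loss + w_aux * aux_loss

# Equal weighting versus emphasizing the main task.
print(mtl_loss(0.5, 0.25))            # 0.75
print(mtl_loss(0.5, 0.25, w_main=4))  # 2.25
```

Increasing `w_main` shifts the network's limited capacity toward the main task, which is the trade-off explored later in this paper.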

#### **5. Experiments and Results**

We conducted a set of five experiments to evaluate the presented approach. In the first experiment, we compared deep neural networks (DNNs) with the current state-of-the-art model described in Section 5.1, which relies on manually engineered feature extraction. In the second experiment, we investigated the performance of DNNs in the prediction of fall status at the sample level (i.e., when allowing the model to train and test on different data from the same person), and showed drastically improved results. In the third experiment, we explored whether these improvements were due to the model learning to identify people from their gait, rather than from better modeling of fall risk. We observed that the model is capable of identifying people from their gait, but that this does not by itself explain all of the performance increase. In the fourth experiment, we therefore explored how person-specific but not fall-related information can improve the model. We showed that multi-task learning improved fall prediction. Finally, in the fifth experiment, we showed how focusing the model on cleaner data further improved the overall prediction performance. To train a model and calculate its performance, the complete dataset was split into a training and a validation set (90%) and a test set (10%).
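A subject-level split of this kind can be sketched as follows; the subject counts, fractions, and seed below are made up for illustration:

```python
import random

def split_by_subject(samples, train_frac=0.9, seed=0):
    """Split 10-s samples so that all samples of a subject fall in the
    same subset (subject-level split). `samples` is a list of
    (subject_id, sample) pairs."""
    subjects = sorted({sid for sid, _ in samples})
    random.Random(seed).shuffle(subjects)
    n_train = int(len(subjects) * train_frac)
    train_ids = set(subjects[:n_train])
    train = [s for s in samples if s[0] in train_ids]
    test = [s for s in samples if s[0] not in train_ids]
    return train, test

# 10 hypothetical subjects with 5 samples each.
samples = [(sid, i) for sid in range(10) for i in range(5)]
train, test = split_by_subject(samples)
print(len(train), len(test))  # 45 5
```

A sample-level split, as used in Experiment 2, would instead shuffle the samples directly, so the same subject can appear in both subsets.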

#### *5.1. Experiment 1*

We compared the performance of three types of DNN using raw inputs to the performance of the state-of-the-art model. This base model was previously described by van Schooten et al. [21], and is based on a dataset of ten-second gait samples from which several features such as walking speed, variability, smoothness, and complexity were extracted. Principal components analysis (PCA) was applied to these features (as well as other parameters obtained from questionnaires and tests), keeping 18 principal components, and a multivariate model was developed to predict time to prospective falls. The median of a person's ten-second segments' predictions provided that person's risk assessment. This base model resulted in a performance of AUC = 0.67 (95% confidence interval [0.59, 0.73]) at 6 months [21].

The same complete dataset was randomly split into three subsets (training, validation, and testing) at the subject level, such that all the 10-s samples of a given subject occur in a single subset. The ratio of fallers to non-fallers was approximately the same in these three sets. The DNNs were given 10-s samples *x* and the corresponding faller/non-faller label *y* ∈ {0, 1} for training and testing. For each sample *x*, the value predicted by a DNN architecture is denoted by *y*ˆ. The median of the predicted values for all of a subject's samples was used as the predicted value for that subject. The subjects' predicted values and their actual labels were used to plot the ROC and to calculate the corresponding AUC. Figure 4 shows an illustration of predicted values (*y*ˆ) for multiple 10-s sequences grouped by subject.

**Figure 4.** Example boxplots of the normalized predicted values (*y*ˆ) for multiple 10-s sequences, grouped by subject. Subjects 1 and 4 were non-fallers and the other two were fallers. The final prediction per subject was given by the median of the predictions, as per van Schooten et al. [21]. The green line inside the box represents the median, the box spans the first to the third quartile, and circles represent outliers.
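The per-subject aggregation described above can be sketched as follows (the sample predictions are made up):

```python
from collections import defaultdict
from statistics import median

def subject_predictions(sample_preds):
    """Aggregate per-sample predictions y_hat into one prediction per
    subject by taking the median, as in van Schooten et al. [21]."""
    by_subject = defaultdict(list)
    for subject_id, y_hat in sample_preds:
        by_subject[subject_id].append(y_hat)
    return {sid: median(vals) for sid, vals in by_subject.items()}

preds = [("s1", 0.2), ("s1", 0.3), ("s1", 0.9), ("s2", 0.7), ("s2", 0.8)]
print(subject_predictions(preds))  # {'s1': 0.3, 's2': 0.75}
```

The median makes the per-subject prediction robust to a few atypical 10-s samples, such as the outliers visible in Figure 4.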

As described in Section 3, three types of DNN architectures (CNN, LSTM, and ConvLSTM) were applied to a small set of the data to determine the best-fitting model. The models were trained by minimizing the binary cross-entropy loss function (Figure 5a), and evaluated in terms of the area under the ROC curve for each subject (Figure 6a). The corresponding AUC was used to measure the performance of the models (Figure 7a).

From these, we can conclude that the LSTM and ConvLSTM architectures resulted in a slightly better performance than the CNN architecture (*p*-values were, respectively, 0.056 and 0.022) and that there was no significant difference in the performance between LSTM and ConvLSTM (*p* = 0.480). The time needed for the training of the LSTMs was very long compared to the ConvLSTM architecture (Table 2), because two or more LSTM layers were used in the LSTM architecture, while the ConvLSTM architecture was set to have exactly one LSTM layer. For this reason, we selected a ConvLSTM architecture and its corresponding hyperparameters to be trained on larger datasets. Table 3 illustrates the architecture of the ConvLSTM type used. The AUC and the corresponding training time of this architecture are given in Table 4.
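The exact layer configuration is given in Table 3; the sketch below only illustrates the shape arithmetic of a convolutional stack feeding a single LSTM layer. The kernel and stride values are hypothetical, not those used in the paper:

```python
def conv1d_out_len(n, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution layer."""
    return (n + 2 * padding - kernel) // stride + 1

# A 10-s sample at 100 Hz gives sequences of 1000 time steps (3 axes).
# Hypothetical stack: two conv layers (kernel 5) and one pooling-like
# strided layer, whose output sequence is fed to the single LSTM layer.
n = 1000
for kernel, stride in [(5, 1), (5, 1), (2, 2)]:
    n = conv1d_out_len(n, kernel, stride)
print(n)  # 496
```

The convolutional front end thus shortens the sequence the LSTM must process, which is one reason the ConvLSTM trains much faster than a stack of LSTM layers.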

We compared the performance, in terms of average AUC, of the best-fitting model to the base model using a *z*-test, and found no statistically significant difference (*p* = 0.209). In addition, the results also showed a poor generalization ability of the DNN model when trained at the subject level, as indicated by the gap between the two loss functions in Figure 5a. The model may have learned concepts from the training data that did not apply to the test data, which negatively impacted its performance. To investigate the cause of this generalization problem, we conducted a second experiment, in which we applied the same types of DNNs to different training and testing subsets.

**Figure 5.** A typical loss versus epoch graph during the training of a deep neural network (DNN). The data has been split at (**a**) subject level or (**b**) sample level. Loss is the training loss and val\_loss is the validation loss. The gap between the training and validation loss indicates the amount of over-fitting.

**Figure 6.** Examples of the receiver operating characteristic (ROC) curves (solid red lines) and their corresponding area under the curve (AUC) values obtained using a ConvLSTM model. The dashed blue line represents the ROC for chance. The dataset was split at (**a**) the subject level and (**b**) the sample level.

**Table 2.** Average AUC (standard deviation) and corresponding average training time per neural network (NN) architecture type for a subset of the data. The difference in training time between the ways of splitting the data is due to the slower convergence when splitting at the sample level.


**Figure 7.** A boxplot of the AUCs of different DNN architectures. For the LSTM architecture, at least two LSTM layers were involved, while for the ConvLSTM architecture, only one LSTM layer was involved. The dataset was split at (**a**) the subject level and (**b**) the sample level.

**Table 3.** The ConvLSTM architecture (ConvLSTM is our proposed model that combines convolutional and recurrent models). To keep the architecture clear, we omitted the input layer (layer 00) and the dropout layers (the even layer indices) applied after each convolutional neural network (CNN) layer. *N* was set to 128.


**Table 4.** Average AUC, training duration, and number of folds obtained when applying the ConvLSTM model to different dataset sizes. The dataset was cut into three subsets at the subject level.


#### *5.2. Experiment 2*

For this experiment, the complete dataset was randomly split into three subsets (training, validation, and testing) again, but now at the sample level. As a consequence, there was only a small chance that all of the 10-s samples of a single subject were allocated to only one subset. As in the first experiment, we tested three DNN architectures on a small set of the data to identify the best-performing architecture. The ConvLSTM architecture again resulted in the best trade-off between performance and training time (Figures 6b and 7b). A *t*-test showed that both LSTM and ConvLSTM had a significantly better performance than CNN (*p* < 0.002), and there was no significant difference between LSTM and ConvLSTM (*p* = 0.580). Furthermore, this experiment resulted in a better performance than the previous experiment, as shown in Table 5.

**Table 5.** Average AUC and the corresponding standard deviation when splitting at sample or subject levels.


The high AUC when splitting the data at the sample level compared to the subject level can be explained by the smaller within-subject, compared to between-subject, variability of gait. However, another explanation may be that the model learns to identify subjects better than it recognizes characteristics indicating fall risk (since the same subjects were present in training and testing sets, the model could map their identity to fall risk). In the third experiment, we checked the model's ability to identify subjects' gait signatures.

#### *5.3. Experiment 3*

We again split the dataset into three subsets at the sample level. To learn subject signatures together with their fall risk, we used multi-task learning (MTL): fall risk was the main task, while the identity of the subject was the auxiliary task. We used the same ConvLSTM architecture as in Table 3, because of its good trade-off between performance and learning time in the previous experiments, with an additional dense output layer (connected to Layer 11) for the auxiliary task. The overall loss of the network is a weighted sum of the losses on the main and auxiliary tasks. Ten-fold cross-validation was used to calculate the performance of both the main and auxiliary tasks. The performance of the auxiliary task, which identified the person out of the 296 in the dataset, was evaluated with a plot of the ROC for each subject in a one-versus-all approach. The ROC of the main task and, for clarity, a random sample of the ROCs for the auxiliary task are shown in Figure 8.
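The one-versus-all evaluation reduces the 296-class identity task to one binary problem per subject, so each subject gets their own ROC. A minimal sketch with made-up identifiers:

```python
def one_versus_all(labels, target):
    """Binarize multi-class identity labels for a one-versus-all ROC:
    the target subject becomes the positive class, everyone else the
    negative class."""
    return [1 if lab == target else 0 for lab in labels]

ids = ["s1", "s2", "s1", "s3"]
print(one_versus_all(ids, "s1"))  # [1, 0, 1, 0]
```

Each binarized label vector, paired with the model's per-sample scores for that subject, yields one of the ROC curves sampled in Figure 8.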

**Figure 8.** A sample of obtained ROCs for multi-task deep learning (MTDL) with fall status as the main task and subject identity as the auxiliary task. For the auxiliary task, the ROCs were computed using one-versus-all. The corresponding average AUC is reported. For (**a**,**b**), the main and auxiliary losses were given the same weight (1:1); for (**c**,**d**), the main loss function was given higher weight (104:1) than the auxiliary loss function. The dashed blue lines in (**a**,**c**) represent the chance ROC.

As we can see, when both tasks were given the same weight (Figure 8a,b), the network was exceedingly good at recognizing identities, but not as good at predicting fall risk. The network thus had the information to learn the mapping from identity to risk, but not the capacity to learn this mapping. When we increased the weight of the main task, the network became better at predicting fall risk at the expense of the identification task (Figure 8c,d). From this, we could conclude that there were important differences between subjects, but that there were other informative patterns in the data as well, which led us to our next experiment.

#### *5.4. Experiment 4*

In this experiment, we investigated the effect of multi-task deep learning (MTDL) on the model performance. When the data are split at the subject level, it makes no sense to use the subject ID as the auxiliary task (since the IDs in the test set are never seen during training), but other subject characteristics can form an informative auxiliary task. The experimental setup was similar to that of the previous experiment, except that the data were split at the subject level and the auxiliary task was one of the following subject characteristics: *age*, *gender*, *weight*, and *height*. Table 6 shows the average AUC and the corresponding standard deviation of the main task (fall status). We can conclude that MTDL consistently resulted in improved performance compared to the single-task learning used in the first experiment. However, the improvement was not significant when compared to the base model.

**Table 6.** Average AUC and the corresponding standard deviation of the main task (fall status), obtained when the ConvLSTM is applied to the test set. The *p*-value was obtained using the *z*-test to test the difference in the performance to the base model.


#### *5.5. Experiment 5*

In the previous experiments, we used the exact same 10-s data segments as in van Schooten et al. [21], which consist of samples of locomotion as identified by the accelerometer manufacturer's algorithm [20]. As the data were collected in a daily living environment, the locomotion bouts may contain some "non-gait" data samples, which may have negatively affected the performance of the DNNs. Visual inspection of the data indeed suggested the presence of such samples, which correspond to cyclic accelerations of the trunk without clear steps (e.g., when riding a bike) or involve only a few steps (e.g., when moving around the kitchen while preparing a meal). The objective of this experiment was to investigate the effect of conservatively selecting gait data samples on the performance of the models. To do so, 10-s data samples having a very low dominant frequency in the vertical direction (VT-axis) (≤0.2 Hz) were removed from the data, resulting in approximately 20% discarded data. Examples of an included and an excluded sample are shown in Figure 9. A procedure similar to Experiment 4 was followed to train, test, and calculate the performance of the ConvLSTM model. Table 6 shows the obtained average AUC and the corresponding standard deviation of the main task. It should be noted that, although different data were used for training and testing, the results are per *subject* and are therefore comparable for the same subjects. Comparing these results with those of Experiment 4, we may conclude that the excluded samples did have a negative effect on the performance of the DNNs. The results of the *z*-test showed that this model yielded a significant improvement over the performance of the base model. These results suggest that improvements in fall prediction based on accelerometry can be sought not only on the modeling side, but also on the input (or activity classification) side.
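The spectral filter of Experiment 5 can be sketched as follows. The naive DFT, the sampling rate, and the synthetic signals here are illustrative (the study recorded at 100 Hz, and a real pipeline would use an FFT); the 0.2 Hz threshold is from the text:

```python
import cmath
import math

def dominant_frequency(signal, fs):
    """Dominant non-DC frequency of a signal via a naive DFT."""
    n = len(signal)
    mags = []
    for k in range(1, n // 2):
        s = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        mags.append((abs(s), k))
    _, best_k = max(mags)
    return best_k * fs / n

def keep_sample(vt_signal, fs, threshold=0.2):
    """Keep a 10-s sample only if the dominant VT frequency exceeds
    the 0.2 Hz threshold, as in Experiment 5."""
    return dominant_frequency(vt_signal, fs) > threshold

fs = 50  # reduced rate to keep this demo fast
walk = [math.sin(2 * math.pi * 2.0 * t / fs) for t in range(10 * fs)]
drift = [math.sin(2 * math.pi * 0.1 * t / fs) for t in range(10 * fs)]
print(keep_sample(walk, fs), keep_sample(drift, fs))  # True False
```

A walking sample has a clear step frequency well above 0.2 Hz, whereas a slow cyclic motion such as the 0.1 Hz "drift" signal is discarded.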

**Figure 9.** (**a**) An example of a 10-s data sample included in the training and testing set and (**b**) an example of a data sample excluded in Experiment 5 due to the low dominant frequencies in the VT-axis. In the bottom-right corners, histograms of the VT-frequencies up to 3 Hz are depicted. Both examples were included in the first 4 experiments.

#### **6. Discussion and Conclusions**

In this paper, we studied the use of deep learning neural networks to model fall risk on the basis of accelerometer data. Our aim was to compare the performance of deep learning on raw acceleration data with the performance of a base model that uses biomechanical features extracted from the data. For this comparison, we used the same dataset. We did not compare our approach with other work done on different tasks such as activity recognition [16] or age-related differences in walking [17].

In our first experiment, we selected the ConvLSTM neural model based on its trade-off between performance and training time. However, although we found that this architecture was best in modeling the training data, it generalized poorly over subjects. This was confirmed in Experiment 2, where we achieved a very good performance (AUC = 0.94) when the training and validation sets contained data, split at the sample level, from all 296 subjects. The very good performance in this case may have been caused by the network learning the identities of subjects from gait data and using these implicitly to model individual fall risk. In Experiment 3, we studied an MTL network that simultaneously modeled fall risk and identity. We inferred the need to control for subject-specific factors, since training for both fall risk and identity improved the model's performance considerably. In Experiment 4, we studied an MTL approach in which we chose more general characteristics, such as age or weight, as auxiliary tasks that were still related to the subject but were not the subject's identity itself. We found an improvement of the performance of the ConvLSTM model on the validation set of new subjects when we used gender and age as auxiliary outputs. When we compared the performance of our MTL ConvLSTM with the base model of van Schooten et al. [21], we saw a slightly higher performance; however, this difference was not significant. Nevertheless, our results indicate that deep learning methods provide an accuracy of fall risk prediction similar to that of biomechanical models, with the advantage that they do not require painstakingly crafted features.

The performance of a model relies on the model architecture used and on the input data. In Experiment 5, we therefore studied an approach where we selectively ignored some of the data samples based on a spectral analysis. We found a significantly better performance. These results suggest that a stricter gait classification algorithm may result in more accurate identification of an individual's gait signature and therefore improve model performance. Another option is to use the dynamics in the data over periods longer than 10 s. This can be done by using the entire locomotion bout as input for the ConvLSTM network. Another method is adopting hierarchical methods [32].

In conclusion, this work shows that machine learning on accelerometer data acquired in the home environment provides comparable accuracy to conventional models in the assessment of fall risk of older adults, with the advantage that they do not rely on handcrafted extracted features. We believe that this approach will contribute to the societal challenge of healthy and active aging in the home environment.

**Author Contributions:** Conceptualization, A.N.A., G.E., K.S.v.S., M.P. and B.K.; Methodology, A.N.A., G.E., K.S.v.S., M.P. and B.K.; Software, A.N.A. and K.S.v.S.; Validation, A.N.A.; Formal Analysis, A.N.A., G.E. and B.K.; Investigation, A.N.A., G.E., K.S.v.S. and M.P.; Resources, K.S.v.S., M.P. and B.K.; Data Curation, K.S.v.S. and M.P.; Writing-Original Draft Preparation, A.N.A., G.E., K.S.v.S., M.P. and B.K.; Writing-Review & Editing, A.N.A., G.E., K.S.v.S., M.P. and B.K.; Visualization, A.N.A. and G.E.; Supervision, G.E., K.S.v.S., M.P. and B.K.; Project Administration, A.N.A., K.S.v.S., M.P. and B.K.; Funding Acquisition, K.S.v.S., M.P. and B.K.

**Acknowledgments:** This work is part of the research project BRAVO, which is (partly) financed by the Dutch research funding organization SIA. The authors would like to thank the participants of the project.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Detecting Steps Walking at very Low Speeds Combining Outlier Detection, Transition Matrices and Autoencoders from Acceleration Patterns**

#### **Mario Muñoz-Organero 1,2,\* and Ramona Ruiz-Blázquez <sup>1</sup>**


Received: 30 August 2017; Accepted: 4 October 2017; Published: 5 October 2017

**Abstract:** In this paper, we develop and validate a new algorithm to detect steps while walking at speeds between 30 and 40 steps per minute, based on the data sensed from a single tri-axial accelerometer. The algorithm concatenates three consecutive phases. First, outlier detection based on the Mahalanobis distance is performed on the sensed data to pre-detect candidate points in the acceleration time series that may contain a ground contact segment while walking. Second, the acceleration segment around each pre-detected point is used to calculate a transition matrix in order to capture the time dependencies. Finally, autoencoders, trained with transition matrices computed from the ground-contact acceleration segments of labeled steps, are used to reconstruct the computed transition matrix at each pre-detected point. A similarity index is used to assess whether the pre-selected point contains a true step in the 30–40 steps per minute speed range. Our experimental results, based on a database of three different participants performing activities similar to the target one, achieve a recall of 0.88 with a precision of 0.50, improving on the results obtained when directly applying the autoencoders to acceleration patterns (recall = 0.77 with precision = 0.50).

**Keywords:** step detection; machine learning; outlier detection; transition matrices; autoencoders

#### **1. Introduction**

The automatic recognition of human activities and movements using wearable sensor data is able to provide contextual information to many areas of application. Some examples are ambient-assisted living [1], the self-management and monitoring of health parameters for patients with chronic conditions [2], training in sports [3], or entertainment and security [4]. By using the data extracted from sensors such as accelerometers, physiological sensors, the Global Positioning System (GPS), or environmental sensors, several movement-related features can be computed and machine learning algorithms can be trained to classify different activities and movements [4].

One particular application of using sensor data to detect human movements is counting steps while walking at very slow speeds. People with physical disabilities or long-term medical conditions may find it difficult to walk at normal speeds [5]. However, physical activity is normally recommended to this particular set of users in order to promote or regain a healthy lifestyle. Counting steps is a supporting mechanism for accurately measuring physical activity for these users. Among the available wearable sensors, accelerometers and pedometers are commonly used for tracking ambulatory physical activity in clinical populations. By using the sensor display, information such as the number of steps walked and/or the number of calories burnt can be shown to the user, which is useful in motivating patients to increase their activity levels [5]. However, the accuracy of accelerometers for gait analysis tends to decrease (underestimating the real number of steps) in slow walking conditions [6]. Some examples of off-the-shelf sensors and devices that underestimate strides at slower walking speeds can be found in [7]. The authors in [7] found that the number of steps counted by two off-the-shelf devices while walking at a cadence of around 67 steps per minute was around 90% of the real value. This degradation increases for slower cadences.

In this paper, we propose a novel mechanism to detect and count steps at slow speeds by processing acceleration data from a wearable device. We validate the proposed algorithm using the data from three different participants executing slow-walking segments interleaved with several related activities, such as sliding (walking without lifting the feet from the ground) at slow speeds, walking in circles at slow speeds, or walking at moderate speeds. The algorithm combines outlier detection for pre-selecting and aligning candidate steps, transition matrices to capture time dependencies in the sensed time series, and autoencoders to assess the similarity of pre-selected segments with the real steps used for training. Outlier detection on the acceleration time series identifies segments of interest that exhibit particular stochastic properties, such as the ground contact instant while walking.

#### **2. Related Work**

Physical and mental health benefit from physical activity; a healthy lifestyle improves both physical and mental health aspects [8]. In order to monitor the physical activity of a particular user, the counting of steps is one of the most used measures, and it is therefore important that activity-monitoring devices are both specific and sensitive in estimating the actual number of steps walked (discriminating real steps from non-stepping body movements) [8]. Many off-the-shelf devices register a significant number of false positive steps per minute when executing non-walking activities [8] and fail to count steps while walking at slow speeds [7]. Previous studies have reviewed the accuracy of accelerometers for gait analysis in slow walking, such as [6,7,9,10], finding that off-the-shelf sensors tend to underestimate strides (significantly in some cases) at slower walking speeds, particularly for patients with severe medical conditions that affect gait [11].

Different approaches and algorithms have previously been used for counting steps in different scenarios. The authors in [12] used a threshold-based algorithm, characterized by linear computational time, aiming to improve the real-time monitoring and analysis of the walking behaviour of animals such as dairy cows. The research in [13] proposed an algorithm based on an Android smartphone accelerometer, the Fast Fourier Transform (FFT), and thresholding for detecting steps; the algorithm was not validated for slow walking and achieved poor results for running segments of data. The study in [14] used a wrist-worn accelerometer to estimate walking cadence and speed in daily-life walking in several environments. The study in [15] proposed the use of particular points in the acceleration time series in order to detect steps at normal speeds.

There are some preliminary results for detecting steps while walking at slow speeds based on raw acceleration data [6,16]. The slow execution of movements increases the difficulty of detecting steps from acceleration data while keeping the false positives from other activities to a minimum.

Outlier detection has attracted significant interest in several different areas. The authors in [17] applied outlier detection to estimate peak ground accelerations in seismic data. An algorithm for collision and hazard detection for motorcycles via inertial measurements based on outlier detection was presented in [18]. The authors in [19] made use of univariate outlier detection techniques in order to detect unusual sleep patterns. The authors in [20] used a density-based outlier detection method, measuring the LOF (Local Outlier Factor) in a projected PCA (Principal Component Analysis) domain of real-world spatiotemporal traffic signals, to detect traffic data outliers, i.e., errors in the data as well as traffic anomalies in real situations such as accidents and congestion. The research in [21] also used an outlier detection algorithm in a traffic system as a basis for alerting the transport department and drivers about abnormal traffic conditions, such as traffic accidents or traffic congestion.

Outlier detection has also been used previously for the recognition of human movements. With different sensor technologies, some human movements can be studied in terms of sequential data analysis. The authors in [22] used models of sequential data to determine whether the executions of a particular activity are close enough to a pre-defined specification of the activity or whether they should be considered as executed incorrectly; they used Hidden Markov Models (HMM) for time series characterization. The research study in [23] tackled the problem of fall detection by using a combination of outlier detectors. The use of HMMs for outlier detection and its application to fall detection has been studied in [24].

The research in this paper combines outlier detection on acceleration time series with the characterization of time dependencies via a transition matrix, feeding a detection algorithm based on the similarity between the input and output of an autoencoder. Data from real steps walked at slow speeds are used to train the autoencoder. The trained autoencoder is used to reconstruct the transition matrices at points pre-selected by a Mahalanobis-distance outlier detection mechanism. The Pearson correlation index is used to assess the similarity between the input and output of the autoencoder in order to determine whether the pre-selected point is a step at slow walking speed.

The rest of the paper is organized as follows. Section 3 presents the mechanism used to obtain the input data from the acceleration time series. Section 4 describes the outlier detection algorithm used in this paper. Section 5 presents the mechanism used to estimate the transition matrices. The way we use autoencoders is shown in Section 6. Section 7 details the experimental results, and Section 8 captures the main conclusions of this research.

#### **3. Sensor and Data Series**

Smart mobile devices such as smartphones or tablets contain several sensors that are able to automatically monitor and track user-related information. The algorithm proposed in this paper is based on the use of the accelerometer and gravity sensors. Combining the outputs of these sensors, we obtain acceleration time series that are both gravity free and geo-referenced, in order to characterize the user's movements.

Figure 1 shows the device used to record the raw acceleration data and the three axes as defined by the Android operating system to provide accelerometer and gravity information. The device used was a Nexus 6 Android mobile device, which contains an accelerometer sensor and is able to estimate the gravity force vector using the combination of the accelerometer, gyroscope, and magnetic field sensors. The device is able to obtain 50 samples per second from these two sensors.

The data gathering process performs a series of steps. First, we remove the gravity component from the acceleration time series provided by the accelerometer sensor by subtracting the output of the gravity sensor from the accelerometer output, as captured in Equation (1), where $\overrightarrow{a}$ represents the gravity-free acceleration, $\overrightarrow{a_c}$ contains the output of the accelerometer sensor, and $\overrightarrow{g}$ represents the output of the gravity sensor.

$$
\overrightarrow{a} = \overrightarrow{a_c} - \overrightarrow{g} = (a_x, a_y, a_z).
\tag{1}
$$

After calculating $\overrightarrow{a}$ (which is referenced to the axes in Figure 1), we can project the acceleration onto the gravity direction (perpendicular to the Earth's surface), which is independent of the relative position of the mobile device and of the relative movements of the device while walking (when the device is not carried tight to the human body). Equation (2) captures the computation of the acceleration component in the gravity direction.

$$
\overrightarrow{a_g} = \frac{\overrightarrow{a}\cdot\overrightarrow{g}}{\|\overrightarrow{g}\|^2}\,\overrightarrow{g}. \tag{2}
$$

The value of $a_g = \overrightarrow{a_g}\cdot\overrightarrow{g}/\|\overrightarrow{g}\|$ provides the vertical acceleration while walking. In order to improve the detection accuracy, the acceleration information in the horizontal plane has also been used. In particular, we have used the $a_{Hx}$ component from Equation (3):

$$
\overrightarrow{a_H} = \overrightarrow{a} - \overrightarrow{a_g} = (a_{Hx}, a_{Hy}, a_{Hz}).
\tag{3}
$$

The raw data used as the input to the step detection algorithm will be the time sequence of $(a_g, a_{Hx})$, sampled at 50 Hz while the mobile device is carried in the user's trouser pocket. Placing the sensor device close to the hip is normal practice in step detection [5].
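Equations (1)–(3) translate directly to a few lines of NumPy. The paper's implementation is in Matlab; this Python sketch and its sample sensor values are illustrative only:

```python
import numpy as np

def decompose_acceleration(ac, g):
    """Split a raw accelerometer reading into vertical and horizontal parts.

    ac : raw accelerometer output (3-vector), a_c in Eq. (1)
    g  : gravity sensor output (3-vector)
    Returns (a_g, a_H): the gravity-direction and horizontal components.
    """
    a = ac - g                               # Eq. (1): gravity-free acceleration
    a_g = (np.dot(a, g) / np.dot(g, g)) * g  # Eq. (2): projection onto gravity
    a_H = a - a_g                            # Eq. (3): horizontal component
    return a_g, a_H

# Illustrative reading in m/s^2; not data from the paper.
ac = np.array([0.3, 0.2, 9.9])
g = np.array([0.0, 0.0, 9.81])
a_g, a_H = decompose_acceleration(ac, g)
```

By construction, `a_g + a_H` recovers the gravity-free acceleration and `a_H` is orthogonal to `g`, which is what makes the vertical component independent of the device's orientation in the pocket.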

**Figure 1.** Nexus 6 mobile device and accelerometer axes.

#### **4. Outlier Pre-Detection**

In order to save computational energy, optimize the search for candidate time segments that may contain a step, and at the same time provide an alignment for the time sequence, an outlier detection technique based on the Mahalanobis distance is proposed in this paper. Gravity-compensated, step-related acceleration patterns show prominent activity in the milliseconds after the ground contact of each foot. In order to detect steps at up to 60 steps per minute, a time window of up to 1 s should be used to detect outliers in the $(a_g, a_{Hx})$ sequence. Outliers are potential candidates that may contain the ground contact of a step.

In order to train the algorithm, a labeled sequence of steps at low speeds (at 30 and 40 steps per minute) from three different users, sampled at 50 Hz, is used. The outlier detection provides a set of candidate points that are post-filtered using the labeled information. The result is a set of outliers associated with the ground contact instants of real steps executed at slow speeds. These steps are used to compute the transition matrices and then train the final autoencoders (one for $a_g$ and another for $a_{Hx}$) for the detection of steps at slow speeds.
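The pre-detection stage can be sketched as follows. The threshold of 3 Mahalanobis units and the synthetic data are illustrative assumptions, not the values tuned on the paper's dataset:

```python
import numpy as np

def mahalanobis_outliers(x, threshold=3.0):
    """x: (n_samples, 2) array of [a_g, a_Hx] samples.
    Returns the indices whose Mahalanobis distance to the bulk of the
    data exceeds the threshold (candidate ground-contact points)."""
    mu = x.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(x, rowvar=False))
    diff = x - mu
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)  # squared distances
    return np.where(np.sqrt(d2) > threshold)[0]

# Mostly quiet signal with one large spike standing in for a ground contact.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.1, (200, 2)), [[5.0, 5.0]]])
candidates = mahalanobis_outliers(x)  # should include index 200, the spike
```

In the paper's setting, the returned candidates would then be post-filtered against the labeled step instants before computing transition matrices.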

#### **5. Transition Matrices**

The transition matrix computed from a time series captures the probability of moving from each state at instant $t$ to each state in the next sample at $t+1$. In our case, we have two acceleration time series, $a_g(t)$ and $a_{Hx}(t)$. A transition matrix is computed for the acceleration samples around each pre-detected outlier. For each outlier at $t_i$, the transition matrices are computed from $a_g(t_{i-N},\ldots,t_i,\ldots,t_{i+N})$ and $a_{Hx}(t_{i-N},\ldots,t_i,\ldots,t_{i+N})$. Each sample is assigned to a state according to its distance from the mean value of the series, in terms of the standard deviation of the series. We have used eight states, as follows (for the particular case of $a_g(t)$):

$$\begin{array}{l}
m = \text{mean value of } a_g(t_{i-N},\ldots,t_i,\ldots,t_{i+N}) \\
\text{std} = \text{standard deviation of } a_g(t_{i-N},\ldots,t_i,\ldots,t_{i+N}) \\
S_1 \text{ if } a_g(t) - m \le -1.5\,\text{std} \\
S_2 \text{ if } -1.5\,\text{std} < a_g(t) - m \le -\text{std} \\
S_3 \text{ if } -\text{std} < a_g(t) - m \le -0.5\,\text{std} \\
S_4 \text{ if } -0.5\,\text{std} < a_g(t) - m \le 0 \\
S_5 \text{ if } 0 < a_g(t) - m \le 0.5\,\text{std} \\
S_6 \text{ if } 0.5\,\text{std} < a_g(t) - m \le \text{std} \\
S_7 \text{ if } \text{std} < a_g(t) - m \le 1.5\,\text{std} \\
S_8 \text{ if } a_g(t) - m > 1.5\,\text{std}
\end{array}$$

The state vectors for outlier $i$, $s_g(t_{i-N},\ldots,t_i,\ldots,t_{i+N})$ and $s_{Hx}(t_{i-N},\ldots,t_i,\ldots,t_{i+N})$, are used to calculate the transition matrix at $t_i$. The transition matrix counts the number of times that state $j$ is followed by state $k$ at the next instant of time. A final regularization converts counts to probabilities by dividing each row by its cumulative count, as captured in Equation (4), which gives the component in row $j$ and column $k$ of the transition matrix.

$$T(j,k) = \frac{\text{Count}\,(j \to k)}{\sum_{l}\text{Count}\,(j \to l)}\tag{4}$$

In order to use the transition matrices at each outlier as the input to the autoencoder, described in the next section, the matrix is converted into a vector by concatenating each row in the matrix one after the other.
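The state quantization and the row-normalized transition counts of Equation (4) can be sketched in NumPy; the eight half-standard-deviation bins follow the definition above, and the sine-wave input is an illustrative stand-in for a windowed acceleration segment:

```python
import numpy as np

def to_states(x):
    """Assign each sample to one of the eight states S_1..S_8 (here 0..7)
    according to its deviation from the mean in units of the std."""
    z = (x - x.mean()) / x.std()
    edges = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
    return np.digitize(z, edges)

def transition_matrix(x, n_states=8):
    """Count state-to-state transitions and row-normalize, Eq. (4)."""
    s = to_states(x)
    T = np.zeros((n_states, n_states))
    for j, k in zip(s[:-1], s[1:]):
        T[j, k] += 1.0
    rows = T.sum(axis=1, keepdims=True)
    return np.divide(T, rows, out=np.zeros_like(T), where=rows > 0)

# Flatten row by row to obtain the autoencoder input vector (length 64).
vec = transition_matrix(np.sin(np.linspace(0.0, 6.0, 50))).ravel()
```

Each nonzero row of the resulting matrix sums to one, so the flattened vector is a set of per-state conditional probabilities.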

#### **6. Autoencoders**

A method based on training two different autoencoders, one for the $s_g$ transition matrices at each step and a similar one for the $s_{Hx}$ transition matrices, has been used in order to detect steps from pre-selected outliers. The Pearson correlation index is used to calculate the similarity between the transition matrix computed at each pre-selected outlier in the detection time series and its reconstruction. If the Pearson correlation index for both autoencoders is above a certain threshold, then the pre-selected point is detected as a step. A description of how autoencoders work can be found in [25].

The design of an autoencoder tries to minimize the error between the reconstructed output and the input following Equation (5).

$$\varepsilon(x, x') = \left\| x - f_2\left( W'\, f_1(Wx + b) + b' \right) \right\|^2, \tag{5}$$

where $x'$ is the reconstructed signal after the encoder and decoder functions ($f_1$ and $f_2$ are activation functions such as the sigmoid function). In our case, $x$ corresponds to the vector calculated by serializing the transition matrix $T$, as described in the previous section.
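The paper trains its autoencoders with Matlab's `trainAutoencoder`. As a hedged illustration of Equation (5) and the Pearson-based decision rule, a minimal single-hidden-layer autoencoder can be written in NumPy; the architecture, learning rate, and toy data below are assumptions for the sketch, not the paper's settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, hidden=16, lr=0.1, epochs=2000, seed=0):
    """Gradient descent on ||x - f2(W' f1(Wx + b) + b')||^2 (Eq. (5)),
    with a sigmoid encoder f1 and a linear decoder f2."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    W = rng.normal(0.0, 0.1, (dim, hidden)); b = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, dim)); b2 = np.zeros(dim)
    n = len(X)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)       # encoder activations
        E = (H @ W2 + b2) - X        # reconstruction error
        dH = (E @ W2.T) * H * (1.0 - H)
        W2 -= lr * H.T @ E / n; b2 -= lr * E.sum(0) / n
        W -= lr * X.T @ dH / n; b -= lr * dH.sum(0) / n
    return W, b, W2, b2

def reconstruct(x, params):
    W, b, W2, b2 = params
    return sigmoid(x @ W + b) @ W2 + b2

def similarity(x, params):
    """Pearson correlation between the input and its reconstruction."""
    return np.corrcoef(x, reconstruct(x, params))[0, 1]

# Toy training set: 8-dimensional vectors drawn from a 2-D subspace,
# standing in for the flattened transition matrices of labeled steps.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 8)) * 0.2
params = train_autoencoder(X)
```

A pre-selected point would then be accepted as a step when `similarity` exceeds the chosen threshold for both the $a_g$ and the $a_{Hx}$ autoencoders.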

#### **7. Results**

This section presents the results of the conducted experiment. The experiment set-up is presented first, in terms of both the gathered database and the implementation details of the algorithms. The obtained results are then captured in two sub-sections: the first shows the results achieved when directly applying the autoencoders to the acceleration time series around the pre-selected outliers, and the second uses the transition matrices instead of the acceleration time series as the input to the final autoencoders in order to capture the time dependencies. The aim is not only to present the results of the proposed algorithm in terms of precision and recall, but also to assess the gain in these figures when using transition matrices instead of the acceleration time series used in previous research.

As a mechanism to compare the detection results provided in each case, we have used the obtained precision, recall, and *F* score. If we define *tp* as the number of "steps walking at slow speeds" that are correctly detected, *Tp* as the total number of "steps walking at slow speeds" present in the validation sequence, and *fp* as the total number of "non-steps walking at slow speeds" that are detected as positive samples, the precision, the recall, and the *F* score can be defined, as shown in Equation (6).

$$\begin{array}{l}
\text{precision} = \frac{tp}{tp + fp} \\[4pt]
\text{recall} = \frac{tp}{Tp} \\[4pt]
F = 2\cdot\frac{\text{precision}\,\cdot\,\text{recall}}{\text{precision} + \text{recall}}
\end{array}\tag{6}$$
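These definitions translate directly to code. The counts in the example below are hypothetical values chosen to match the recall/precision ratios reported later in Section 7.4; the actual counts are not given in the paper:

```python
def precision_recall_f(tp, Tp, fp):
    """Eq. (6): tp = correctly detected slow-walking steps,
    Tp = total true slow-walking steps, fp = false positives."""
    precision = tp / (tp + fp)
    recall = tp / Tp
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Hypothetical counts matching recall = 0.88 and precision = 0.50.
p, r, f = precision_recall_f(tp=88, Tp=100, fp=88)
```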

#### *7.1. Experiment Set-up and Database*

A group of three volunteers recorded the output of both the acceleration and gravity sensors while wearing a Nexus 6 Android smartphone in the pocket of their trousers (close to the participant's hip, a placement widely used in the previous literature for step detection and counting). The demographic details of the participants are captured in Table 1.


**Table 1.** Participant demographics.

The implemented algorithm, combining the techniques presented in Sections 4–6, should be trained with data that characterize the target class to be detected. In this paper, the objective of the algorithm has been set to detecting steps executed at cadences between 30 and 40 steps per minute. Our objective is to maximize the number of true positives (the number of true steps detected) while minimizing the false positives (segments of other activities that are detected as members of the target class). Previous studies, such as [7], have shown that the number of steps reported by off-the-shelf devices tends to underestimate the real value at speeds around 67 steps per minute. In our research, we have designed an algorithm that can be trained to detect steps at even lower cadences. Moreover, we have added a study of how well the algorithm discriminates data from similar activities (an aspect often overlooked in previous studies such as [7]). We have generated a dataset that includes segments of data in the target class, as well as three other activities that may present similar acceleration patterns (sitting down and standing up, walking in circles at 30 steps per minute, and walking at 60 steps per minute). In order to generate the required data, each participant was asked to execute the following sequence of movements:


The movements contained in the database are all related to the target class of steps walked at speeds of 30 to 40 steps per minute. Walking without lifting the feet from the ground (sliding) at slow speeds is expected to generate $a_{Hx}(t)$ sequences similar to those of walking at the same speeds, but the algorithm should detect the differences in the $a_g(t)$ component. Conversely, sitting down and standing up could generate similar acceleration patterns along the gravity axis $a_g(t)$ but different ones in $a_{Hx}(t)$. Walking in circles around a chair changes the direction of the movement and thus the acceleration components. Finally, walking at moderate speeds (60 steps per minute) generates similar patterns for some of the steps. In a real step-counting application for monitoring physical activity, all of these steps would count towards a positive output; in this research, however, we want to test the discrimination rate of the proposed algorithm for the particular case of walking at slow speeds against similar classes.

#### *7.2. Implemented Algorithm Details*

The techniques presented in Sections 4–6 have been combined to implement two "steps at slow speeds" detection algorithms. This sub-section captures the design and implementation details for these algorithms. The experimental results for these algorithms using the data gathered as described in the previous sub-section are captured in Sections 7.3 and 7.4.

The first algorithm will combine outlier detection techniques and autoencoders in order to detect "steps at slow speeds" from acceleration data. The algorithm details are:


The second algorithm will try to improve the results of the previous algorithm by characterizing the time sequences around each pre-detected outlier in terms of its transition matrix. The algorithm details are:


Both of the algorithms have been implemented in Matlab. The mahal function has been used to calculate the Mahalanobis distance, and the trainAutoencoder [26], encode, and decode functions have been used for the autoencoders.

#### *7.3. Autoencoders Based on Acceleration Data around Outlier Pre-Detected Points*

The data from the segments of slow walking (30 and 40 steps per minute) from two of the three participants were isolated in order to train the autoencoders. Each step was detected using the outlier pre-detection phase described in Section 4. The time location of each detected outlier was compared with a manually introduced vector containing the ground truth (manual labels) for each step. The outlier pre-detection phase was needed in order to automatically align the segments of acceleration data associated with each step; the manual labeling process therefore did not require the exact location of the ground contact instant but only a visual approximation, eliminating human error when labeling the data. Only two participants were selected for training in order to implement a leave-one-out validation approach.

In order to generate the acceleration samples, windows of 240 ms of raw data (both accelerometer and gravity data) have been used around each pre-detected outlier corresponding to a labeled step. The 240 ms window size was empirically selected so as to include all of the acceleration patterns corresponding to the ground contact instant. Figure 2 captures some of the acceleration samples $a(t)$ corresponding to outliers associated with labeled steps.

**Figure 2.** Acceleration samples around outliers corresponding to slow walking steps. Each color represents a different sample.

For each window of pre-selected data, the *ag*(*t*), *aHx*(*t*) components were computed. The samples of each component were used to train a different autoencoder.

After the training of the autoencoders, the validation phase used all of the recorded data from the third participant in order to detect steps (the leave-one-out process was repeated for all three participants). Windows of acceleration data were computed for each pre-detected outlier in a similar way, but now for the entire time series containing all of the movements of each participant. The Pearson correlation index was used to compare the output and the input of the autoencoders, and different detection thresholds were used to assess the obtained results. The idea is that acceleration sequences similar to the steps used in training will show high values of the Pearson correlation index, whereas acceleration sequences computed for other movements will show low values.

The results obtained for the recall, precision, and *F*-score at different similarity thresholds are captured in Table 2. The optimal *F*-score is achieved when using a similarity threshold of 0.6; the recall in this case is 0.67 and the precision is 0.59. A recall of 0.67 means that we are able to detect 67% of all steps executed at slow speeds (30 and 40 steps per minute). A precision of 0.59 means that 41% of the detections are false positives (other activities detected as slow walking steps). The recall of the algorithm could be improved by detecting slow walking segments in which missing intermediate steps could be incorporated into the output of the algorithm. The idea is to estimate the walking cadence as the inverse of the time difference between two consecutive detected steps. When the cadence shows a relatively small variation over the last consecutive steps and the time between two consecutive detected steps suddenly approximates double that of previous steps, we can estimate that one extra step has been executed. In our case, we have not used this detection feature, since we only had a limited number of walking segments and the results would be close to the optimal value of recall = 1, as it is known that the test person did not perform intermediate stops (except for the steps missing at the beginning of a walking segment).
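The cadence-based recovery of missing steps described above (which, as noted, was not applied to the reported results) could be sketched as follows; the helper name and the tolerance value are assumptions for the illustration:

```python
def fill_missing_steps(step_times, tol=0.25):
    """step_times: sorted detection times in seconds.
    When the gap between two consecutive detections is roughly double the
    current inter-step interval, insert one inferred intermediate step."""
    out = [step_times[0]]
    for prev, cur in zip(step_times, step_times[1:]):
        gap = cur - prev
        if len(out) >= 2:
            typical = out[-1] - out[-2]  # current cadence interval estimate
            if abs(gap - 2.0 * typical) < tol * 2.0 * typical:
                out.append(prev + gap / 2.0)  # one inferred missed step
        out.append(cur)
    return out

# Steps at 40 steps/min (1.5 s apart) with the detection at t = 4.5 s missed.
recovered = fill_missing_steps([0.0, 1.5, 3.0, 6.0])
# -> [0.0, 1.5, 3.0, 4.5, 6.0]
```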


**Table 2.** Results for different similarity thresholds.

The average distribution of false positives among the different movements is captured in Table 3. No outlier in the acceleration time series of the sit-down and stand-up segments is classified as walking at slow speeds: the horizontal acceleration component is different and there is no confusion for this movement. Only a moderate 6.83% of walking-in-circles steps are detected as walking at slow speeds; the horizontal component is again different. The majority of false positives are related to sliding sections. In this case, the horizontal component is similar to the target class, and some of the misclassified segments show a similar vertical acceleration pattern despite the fact that the foot is not lifted from the ground (the sensor, located in the participant's pocket, still captures some vertical acceleration activity). Finally, some of the steps walked at moderate speed also generate false positives, showing patterns similar to those of steps executed at slower speeds.

**Table 3.** Percentage of false positives.


#### *7.4. Autoencoders Based on Transition Matrices around Outlier Pre-Detected Points*

In this sub-section, the same computations are performed but using the transition matrices instead of the acceleration time series in order to train the autoencoders and perform step detection. Transition matrices capture the temporal dependencies from adjacent samples in acceleration segments. We use the method proposed in Section 5 to compute the transition matrices and to convert them into vectors in order to feed the final autoencoders.

The results obtained in this case for the recall, precision, and *F*-score at different similarity thresholds are captured in Table 4. The recall, precision, and *F*-score improve with respect to the previous section. For a similarity threshold of 0.4, a recall of 0.88 is achieved, meaning that 88% of the steps walked at slow speeds are counted (with 50% of false positives among the related movements). The optimal *F*-score is achieved for a similarity threshold of 0.8 and is in this case 0.67, improving on the 0.63 obtained when using acceleration time series instead of transition matrices. A balanced result for both recall and precision is achieved at a similarity threshold of 0.7, at which approximately 2/3 of the steps walked at slow speeds are detected (without using the post-detection estimation of missing steps previously described) and approximately 2/3 of the detections are true positives (1/3 are false positives).


Figure 3 captures a visual comparison of the recall achieved by both methods at different similarity thresholds. Using the transition matrices to characterize the time dependencies improves the results for all of the similarity threshold values. The results also do not decay as fast when the threshold moves close to 1.

The results for the *F*-score at the different similarity thresholds for both methods are captured in Figure 4. Using the transition matrices instead of acceleration time segments generates a more constant *F*-score across the different similarity thresholds (the loss in recall is compensated by a similar gain in precision).

**Figure 3.** Recall for different similarity threshold for both methods.

**Figure 4.** *F*-score for different similarity threshold for both methods.

The average distribution of false positives among the different movements when using transition matrices to capture the time dependencies is shown in Table 5. The results are similar to those presented in Table 3. In this case, there is a small decrease in the percentage of steps at 60 steps per minute detected as walking at slower speeds, compensated by an increase in the number of slide segments detected as walking at slow speeds.

**Table 5.** Percentage of false positives.


#### **8. Conclusions**

This study has proposed and validated a new method to detect steps while walking at slow speeds. The algorithm combines outlier detection in acceleration time series, compensation for random sensor movements, modelling of time dependencies in the acceleration series, and a final detection phase. We have used the Mahalanobis distance to detect outliers in the time series; a threshold was chosen so that the ground contact instant of slow-speed steps is detected as an outlier in at least 95% of the recorded data. Random movements of the sensor when not tightly worn on the participant's body, as well as its initial placement, have been compensated by using the gravity sensor to estimate the direction of the gravity force. Temporal dependencies have been modelled using transition matrices, and a comparison with using the acceleration time series directly is also presented in the paper. Finally, we have used autoencoders and a similarity function based on the Pearson correlation coefficient to perform the final step detection.
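The outlier-detection stage summarized above can be sketched as Mahalanobis-distance thresholding on a tri-axial acceleration series. This is a minimal illustration, not the authors' implementation: the synthetic data and the threshold value are invented for the example.

```python
import numpy as np

def mahalanobis_outliers(samples, threshold):
    """Flag acceleration samples whose Mahalanobis distance from the
    mean of the series exceeds a threshold (candidate ground contacts)."""
    samples = np.asarray(samples, dtype=float)   # shape (n, 3): ax, ay, az
    mean = samples.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(samples, rowvar=False))
    diff = samples - mean
    # Squared Mahalanobis distance per sample: diff_i^T * C^-1 * diff_i
    d = np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))
    return d > threshold

rng = np.random.default_rng(0)
accel = rng.normal(0.0, 0.05, size=(200, 3))   # quiet tri-axial signal
accel[50] = [1.5, -1.2, 2.0]                   # impact-like spike
print(np.flatnonzero(mahalanobis_outliers(accel, threshold=5.0)))  # [50]
```

In the paper's setting, the threshold would instead be tuned so that at least 95% of recorded ground contact instants exceed it.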

The results have been validated using a new database that we recorded with three participants executing a series of movements that are similar, in terms of acceleration patterns, to the target class we want to detect (walking at slow speeds). Using acceleration time series as the input to the final autoencoders achieves an optimal *F*-score of 0.63 for the data in the recorded database. Using transition matrices to model the time dependencies improves the result to an *F*-score of 0.67.

Similar movements were included in the recorded database to validate the proposed algorithm in a pessimistic scenario. The results show that around half of the false positives correspond to steps executed at higher speeds, which show acceleration patterns around the ground contact instant similar to those of slower steps. In applications that aim to motivate physical activity, where the goal is the more generic one of detecting steps at any walking speed, these false positives would in fact count as correct detections.

As future work, the study will consider other body locations for the accelerometer and devices from different manufacturers in order to generalize the results. Moreover, participants with long-term conditions, such as COPD, will be added in order to validate the applicability of the proposed algorithms for different types of users.

**Acknowledgments:** The research leading to these results has received funding from the "HERMES-SMART DRIVER" project TIN2013-46801-C4-2-R (MINECO), funded by the Spanish Agencia Estatal de Investigación (AEI), and the "ANALYTICS USING SENSOR DATA FOR FLATCITY" project TIN2016-77158-C4-1-R (MINECO/ERDF, EU), funded by the AEI and the European Regional Development Fund (ERDF).

**Author Contributions:** The main author, Mario Muñoz-Organero, conceived and developed the algorithm. The authors Mario Muñoz-Organero and Ramona Ruiz-Blazquez participated in the experiment and validated the experimental results.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Three-Axis Ground Reaction Force Distribution during Straight Walking**

#### **Masataka Hori, Akihito Nakai and Isao Shimoyama \***

Department of Mechano-Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan; hori@leopard.t.u-tokyo.ac.jp (M.H.); nakai@leopard.t.u-tokyo.ac.jp (A.N.)

**\*** Correspondence: isao@leopard.t.u-tokyo.ac.jp; Tel.: +81-3-5841-6318; Fax: +81-3-3818-0835

Received: 30 August 2017; Accepted: 19 October 2017; Published: 24 October 2017

**Abstract:** We measured the three-axis ground reaction force (GRF) distribution during straight walking. Small three-axis force sensors composed of rubber and sensor chips were fabricated and calibrated. After sensor calibration, 16 force sensors were attached to the left shoe. The three-axis force distribution during straight walking was measured, and the local features of the three-axis force under the sole of the shoe were analyzed. The heel area played a role in receiving the braking force, the base area of the fourth and fifth toes applied little vertical or shear force, the base area of the second and third toes generated a portion of the propulsive force and received a large vertical force, and the base area of the big toe helped move the body's center of mass to the other foot. The results demonstrate that measuring the three-axis GRF distribution is useful for a detailed analysis of bipedal locomotion.

**Keywords:** ground reaction force (GRF); micro electro mechanical systems (MEMS); gait; walk; bipedal locomotion; 3-axis force sensor; shoe; force distribution

#### **1. Introduction**

Bipedal locomotion is one of the most remarkable features of humans. Many researchers have focused on the relationship between bipedal locomotion and the ground reaction force (GRF) [1–4]. Force plates enable the three-axis resultant forces to be measured while walking on such a plate, but only in limited laboratory settings. For walking outdoors, only the normal force distribution can be measured, using piezoelectric film sensors inserted into the insole of a shoe [5–9]. However, these methods cannot provide shear force distribution data, which are necessary for evaluating forward and lateral motion.

A large amount of shear force distribution data, obtained during walking together with normal force distribution data, is useful for analyzing bipedal locomotion in more detail. The force distribution data can be used to determine the locations at which large shear and vertical forces are applied during walking. This result will be helpful for optimizing the positions of stud pins in athletic shoes and for determining the best shoe cushion design. Thus, this knowledge is beneficial for shoe design, sports science, and medical treatment. Unfortunately, it is impossible to obtain shear force distribution data over multiple steps using currently available commercial methods. Therefore, we fabricated a new force sensor and obtained three-axis force distribution data.

In this paper, we measured the three-axis GRF distribution during walking using a small three-axis force sensor attached to a shoe. The small three-axis force sensor was composed of rubber and a sensor chip fabricated using a micro-electro-mechanical system (MEMS). These force sensors were attached to the shoe after being fixed in a rubber frame without the use of a metal plate. Using this shoe–sensor combination, the three-axis GRF distribution was measured during straight walking on a treadmill. After measurement, the local features of the applied three-axis force under the shoe were analyzed.

#### **2. Materials and Methods**

#### *2.1. Three-Axis Force Sensor and Rubber Frame*

The three-axis force sensor used for the measurements consisted of a sensor chip, a flexible cable, and thermoplastic rubber. The fabrication method, force detection principle, and detailed features of the sensor chip were described in [10]. Briefly, the dimensions of the sensor chip were 2 × 2 × 0.3 mm, and three pairs of Si-doped beams were fabricated on the sensor using MEMS technology. These beams were used as the sensing elements for the applied three-axis force and cancel out the effects of temperature changes. The sensor chip was attached to the flexible cable substrate and electrically connected using conductive paste (XA-874, FUJIKURA KASEI Co., Ltd., Tokyo, Japan). After the electrical connection was finished, the sensor chip was covered with thermoplastic rubber (TSE3431-H(A) and TSE3431-H(E), Momentive Performance Materials Japan LLC, Tokyo, Japan). The covering rubber was 8 mm in diameter. The total dimensions of the three-axis force sensor were 11 × 11 × 3.5 mm, and its weight was 0.6 g (Figure 1a).

**Figure 1.** (**a**) Sensor chip and three-axis force sensor; (**b**) Top and bottom of the three-axis force sensor embedded in rubber material used as a sole.

Before the force sensor was used for measurement, it was fixed in a rubber frame. The frame was made of rubber (TangoBlack, Stratasys Ltd., Rehovot, Israel) and was constructed using a 3D printer (EDEN260V, Stratasys Ltd., Rehovot, Israel). The Shore A hardness of the rubber material was 61, which is within the hardness range of the soles of commercially available shoes. The dimensions of the frame were 25 × 25 × 7 mm. To fix the three-axis force sensor, an 11.5 × 13 × 3.5 mm depression was made in the center of the sole, and a hole was created near the depression to allow the force sensor cable to pass through. The three-axis force sensor was fixed in the depression using clinchers. To attach the sensor to the shoe, a cloth was bonded to the backside of the frame using glue. The cloth could be detached via a hook-and-loop fastener (Figure 1b). The total weight of the three-axis force sensor and the frame was 15 g.

Vertical and shear forces were applied to the three-axis force sensor for calibration. A commercially available six-axis force sensor (SI-130-10, ATI Industrial Automation, Inc., Apex, NC, USA) was used as the reference device in the calibration experiment. The six-axis force sensor was fixed on a stage, and our three-axis force sensor, fixed in its frame, was attached to the top of the six-axis force sensor using a hook-and-loop fastener. An acrylic plate was fixed over the three-axis force sensor. In the calibration procedure, the stage was moved upward until our sensor touched the acrylic plate, applying a vertical force. The stage was then moved horizontally so that shear forces were applied to our sensor. The relationship between the applied force and the sensor outputs was determined from this calibration, and the result was used to measure the three-axis GRF distribution during walking (Figure 2).
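The per-axis calibration can be sketched as a least-squares line relating applied force to sensor output, which is then inverted at measurement time to recover force from a reading. The calibration points below are hypothetical, standing in for one axis of the data shown in Figure 2.

```python
import numpy as np

# Hypothetical calibration data for one axis: known applied forces (N)
# from the reference six-axis sensor and the corresponding output
# (relative resistance change) of the three-axis sensor under test.
applied_force = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
sensor_output = np.array([0.001, 0.052, 0.099, 0.153, 0.198, 0.251])

# Least-squares fit of output = a * force + b (the solid lines in Figure 2).
a, b = np.polyfit(applied_force, sensor_output, deg=1)

def output_to_force(output):
    """Invert the calibration fit: convert a sensor reading to force (N)."""
    return (output - b) / a

print(round(output_to_force(0.125), 1))  # 5.0 N for a mid-range reading
```

In practice the three axes exhibit cross-sensitivity, so the conversion in Figure 2d uses coupled equations rather than three independent lines; the single-axis fit here is a simplification.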

**Figure 2.** Outputs of the three-axis force sensor embedded in rubber material used as a sole. (**a**–**c**) are the results when x-, y-, and z-axis forces were applied to the sensor, respectively. The markers in the graphs are the measured values, and the solid lines are fits obtained using the least-squares technique. (**d**) Equations used to convert the sensor's resistivity changes to forces.

#### *2.2. Measurement System*

To measure the three-axis GRF distribution during walking, three-axis force sensors were combined with a shoe and circuits.

Commercially available shoes were selected for the measurement experiment (SST8, Dexter, MO, USA). The soles of these shoes could be attached and detached via the hook-and-loop fastener on the bottom surface; this feature was used to attach the three-axis force sensors, fixed in their rubber frames, to the left shoe. Rubber of the same thickness was attached to the sole of the right shoe. The shoes were size 26.0 cm. Eleven force sensors were used in the forefoot area and five in the heel area. We set local coordinates on each sensor: when the subject stands with toes and heels together, the coordinates of the sensors in the heel area and the forefoot area are rotated 10 degrees counterclockwise and 7 degrees clockwise, respectively, with respect to the forward direction. The cables from the force sensors were fixed with a hook-and-loop fastener on the side surface of the shoes and connected to the circuits through connectors and wires. The weight of the shoe with its sixteen force sensors was 470 g. The circuits were carried on the subject's shoulders in a backpack while the three-axis GRF distribution was measured (Figure 3).
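Mapping each sensor's local shear reading into the shoe's common forward/lateral frame is a standard 2-D rotation by the sensor's mounting angle. A minimal sketch, with illustrative force values (not measured data):

```python
import numpy as np

def to_shoe_frame(fx, fy, angle_deg):
    """Rotate a shear force (fx, fy) from a sensor's local frame into the
    shoe's forward/lateral frame. angle_deg is the counterclockwise
    rotation of the sensor's local axes relative to the forward direction."""
    t = np.radians(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    return rot @ np.array([fx, fy])

# Heel-area sensors are rotated 10 deg counterclockwise; forefoot
# sensors 7 deg clockwise (i.e., angle_deg = -7).
heel_force = to_shoe_frame(0.0, 10.0, angle_deg=10.0)
# A purely "forward" 10 N force in the sensor frame acquires a small
# lateral component once expressed in the shoe frame.
print(np.round(heel_force, 2))
```

Without this correction, the 10- and 7-degree mounting offsets would bias the braking/propulsive and medial/lateral decomposition discussed in Section 3.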

To collect the force data during walking, a circuit was designed around an IC board (mbed NXP LPC1768, NXP Semiconductors N.V., Eindhoven, The Netherlands). On the circuit board, bridge and amplification circuits, low-pass filters, and A/D converters were mounted for each output channel of each three-axis force sensor. The voltage changes from the three-axis force sensors were measured using the bridge circuits and amplified 247 times, and frequencies above 25 Hz were removed by a low-pass filter. A Bessel filter was used as the low-pass filter, and its group delay was within 3% error from 0 to 20 Hz. The signals were then converted to digital data by 12-bit analog-to-digital (A/D) converters, gathered in the memory on the IC board, and sent to a PC over a serial link. The baud rate of the serial communication was 921.6 kbps, and the sampling frequency was 333 Hz. Each circuit managed four force sensors, and we prepared four sets of circuits for the measurement. Lithium batteries were used as the power supply. The dimensions and weight of the four sets of circuits were 85 × 106 × 108 mm and 660 g, respectively. The total weight of the backpack carried by the subject, with four sets of circuits, two batteries, and cables, was 1100 g.
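The filtering and serial-link figures can be checked with a short sketch. The paper specifies only the filter type, the 25 Hz cutoff, and the group-delay tolerance; the 4th order used below is our assumption, as is the raw-payload arithmetic (which ignores framing overhead).

```python
import numpy as np
from scipy import signal

fs = 333.0     # per-channel sampling frequency (Hz)
cutoff = 25.0  # anti-aliasing low-pass cutoff (Hz)

# Digital Bessel low-pass; Bessel filters are chosen for their nearly
# constant group delay in the passband, matching the paper's <3%
# group-delay error from 0 to 20 Hz. norm='mag' puts -3 dB at the cutoff.
b, a = signal.bessel(4, cutoff, btype='low', norm='mag', fs=fs)
w, h = signal.freqz(b, a, worN=2048, fs=fs)
gain_at_cutoff = np.interp(cutoff, w, np.abs(h))
print(round(gain_at_cutoff, 2))  # 0.71, i.e., -3 dB at 25 Hz

# Sanity check on the serial link: 16 sensors x 3 axes x 12 bits per
# sample at 333 Hz fits comfortably within the 921.6 kbps baud rate.
payload_bps = 16 * 3 * 12 * 333
print(payload_bps, payload_bps < 921_600)  # 191808 True
```

The margin between the raw payload and the baud rate leaves room for start/stop bits and packet framing on the serial link.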

**Figure 3.** Shoe with three-axis force sensors and appearance of the person who wore the measurement system.

We compared the sum of the sensor readings with the subject's body weight during a static single-leg stand. The former was 528 N and the latter 600 N, indicating an error in the sensor readings of approximately 12%.

#### **3. Experimental Results and Discussion**

Using the proposed system, the three-axis GRF distribution was measured while a subject walked on a treadmill. The subject was a healthy man whose height and weight were 168 cm and 61.2 kg, respectively. The walking velocity was a comfortable speed for the subject, and the analyzed data were averaged over ten steps. The angle of the subject's toe-off was 5 degrees counterclockwise.

Figure 4 shows the change in vertical forces during the foot contact phase. The durations of the single and double support phases were calculated from the foot contact time and the walking cycle. As shown in the figure, the contact area under the foot shifted from the heel to the toes. At first contact between the left foot and the ground, the entire heel area and the bottom of the forefoot area made contact nearly simultaneously. At the beginning of the single support phase, the base areas of the toes gradually contacted the ground. Subsequently, the heel area lifted partway off the ground during the single support phase. At the beginning of right foot contact, the base areas of all the toes were lifting off, and the toe area was in contact with the ground.

**Figure 4.** Change of vertical forces during the contact phase of the left foot.

Figure 5 shows the changes in the vertical force distribution over time during the foot contact phase and the center of pressure (COP) trajectory, overlaid on a schematic of the relation between the force sensor locations and the foot bones. The vertical forces are represented using a color scale: 0 N to 5 N is white, 5 N to 150 N changes gradually from purple to red, and over 150 N is red. The contact phase of the left foot was divided into seven parts, and the time of each data point is shown as a percentage of the foot contact time. The time interval is 0.12 s, and the COP trajectory is plotted at 0.06 s intervals.
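The COP plotted in Figure 5 is, in essence, the vertical-force-weighted mean of the sensor positions. A minimal sketch with invented positions and forces (the real system uses all 16 sensors):

```python
import numpy as np

def center_of_pressure(positions, vertical_forces):
    """Force-weighted average of sensor positions, i.e., the COP.
    positions: (n, 2) array of sensor (x, y) coordinates under the sole (m);
    vertical_forces: (n,) vertical force at each sensor (N)."""
    positions = np.asarray(positions, dtype=float)
    f = np.asarray(vertical_forces, dtype=float)
    return (positions * f[:, None]).sum(axis=0) / f.sum()

# Two-sensor toy example: most force on the heel keeps the COP near it.
pos = [[0.00, 0.00],   # heel sensor
       [0.00, 0.20]]   # forefoot sensor, 20 cm ahead
cop = center_of_pressure(pos, [150.0, 50.0])
print(cop)  # COP at y = 0.05 m, a quarter of the way toward the forefoot
```

As the stance progresses and the force distribution shifts from heel to forefoot, the weighted mean traces exactly the heel-to-toes trajectory shown in the figure.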

At the moment of left foot contact, the vertical force was distributed first at the outside of the heel and then received by the heel and forefoot (Figure 5i,ii). In this case, the largest vertical force was applied to the back area under the base of the third toe. After the beginning of foot contact, the vertical forces at the forefoot grew increasingly larger as the subject's total mass moved forward (Figure 5iii,iv). Large vertical forces were applied to a small area under the third metatarsal. During the heel-off phase, vertical forces were primarily applied to the forefoot area of the first, second, and third toes (Figure 5v). During the toe-off phase, the area in which vertical forces were applied shifted from the forefoot to the toes, and the forces in these areas became small (Figure 5vi,vii). This trend illustrates that the area under the third metatarsal is important during straight walking because of the magnitude of the vertical forces applied there. Additionally, this area is likely at high risk of injury when this subject walks an excessively long distance because of the concentrated force. During foot contact, the COP moved from the outside of the heel to between the second and third toes. This trajectory is similar to the results of other research [11–13].

Figure 6 shows the change in the shear force distribution over time during the foot contact phase and a schematic of the relation between the force sensor location and foot bone. The shear forces applied from the ground to the shoe are indicated by arrows from the center of each embedded force sensor. The strength of the shear force is represented by the length of the arrow, and the direction of the shear force is represented by the direction of the arrow. The time interval of the presented shear force data is 0.12 s. The contact phase of the left foot is divided into seven parts, and the time of each data point is shown as a percentage of the foot contact time.

**Figure 5.** Result of the vertical force distribution measured by the proposed measurement system. COP: center of pressure.

At the moment of left foot contact, a braking force was applied to the heel area and the center back area of the forefoot (Figure 6i,ii). A braking force of approximately 10 N was applied evenly to those areas. After the beginning of foot contact, the main area where the shear forces were applied moved from the heel to the forefoot. Although the shear forces applied to the heel area decreased in amplitude, the forces applied to the area under the bases of the first, second, and third toes increased (Figure 6iii,iv). The shear forces in different areas had different features. The shear forces from the center back area of the first, second, and third toes were in the medial direction, indicating that this area plays a role in translating the center of mass from the left foot to the right foot. The shear force under the base of the third toe was in the anterior direction; thus, the propulsive force is generated from this area. Additionally, these shear forces generated a counterclockwise torque about the vertical axis. During the heel-off phase, the area under the base of the big toe received a shear force directed toward the right foot (Figure 6v). The subject's center of mass also moved toward the right foot, due to this shear force and to the medial forces from the center back area of the first, second, and third toes. During the toe-off phase, propulsive forces were applied under the first, second, and third toes (Figure 6vi,vii). Under the big toe, the propulsive force was applied in the medial direction; conversely, the propulsive force under the third toe was applied along the long axis of the shoe.
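The counterclockwise torque noted above can be checked by summing the moments of the measured shear forces about a vertical axis. The positions, force values, and pivot below are illustrative, not the paper's data:

```python
import numpy as np

def vertical_axis_torque(positions, shear_forces, pivot):
    """Net torque about the vertical axis through `pivot` generated by
    shear forces at the given sensor positions.
    positions, shear_forces: (n, 2) arrays of (x, y) in m and N.
    A positive result is counterclockwise, viewed from above."""
    r = np.asarray(positions, dtype=float) - np.asarray(pivot, dtype=float)
    f = np.asarray(shear_forces, dtype=float)
    # z-component of the 2-D cross product r x f, summed over sensors
    return float(np.sum(r[:, 0] * f[:, 1] - r[:, 1] * f[:, 0]))

# Toy example: a medial force at the toes' base and an anterior force
# under the third toe, both ahead of the pivot.
pos = [[-0.02, 0.15], [0.01, 0.16]]
shear = [[-8.0, 0.0], [0.0, 6.0]]
tau = vertical_axis_torque(pos, shear, pivot=[0.0, 0.10])
print(tau > 0)  # positive, i.e., counterclockwise, as observed
```

During walking this torque is balanced by the opposite leg and by friction under the stance foot, so it does not spin the body.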

**Figure 6.** Result of the shear force distribution measured by the proposed measurement system.

These results imply that the features of the shear forces during straight walking differ depending on the location under the sole for this subject. The heel area received braking forces early during straight walking, and the area under the first, second, and third toes generated propulsive forces at the end. Although these features correspond well with previous knowledge of bipedal walking, three features of the forefoot area were observed midway through straight walking [14]. The first was observed under the fourth and fifth toes: small shear forces were applied in this area during straight walking, indicating that it is of low importance when the subject walks straight. The second was observed under the base of the big toe. This area helped move the center of mass toward the other foot, and the generated shear force was directed toward the right foot while the area was in contact with the ground. This occurred because the momentum of the center of mass toward the left decreased and changed into momentum toward the right, indicating that the forefoot is separated into two areas: one moving the center of mass forward, and one moving it toward the other foot. The final feature was observed at the base of the third toe. Whereas the front area of the base of this toe generated a propulsive force, the back area generated a braking force. This feature is assumed to be caused by foot motion inside the shoe while walking: when a human walks, the foot makes a gripping motion in the shoe, during which the toes move as though scratching the ground while the remaining area of the foot holds it. The joints between the metatarsal bones and the phalanges support this motion, and the area under these bones becomes the border where the direction of the shear force changes. This is presumably why the directions of the shear forces differ between the front and back areas under the base of the third toe.

There are several advantages to our method. First, it is easy to increase the number of force sensors because of their small size. In walking research, several studies have focused on shoes with wearable force sensors [15–18]. In these studies, two to six three-axis force sensors were attached to the sole of the shoe, and the researchers attempted to measure the three-axis GRF distribution, even outside the laboratory. Although the results demonstrated the potential usefulness of these measurement systems, the number of force sensors still requires improvement. With only two to six force sensors, the distance between sensors is rather large; a large area under the foot is covered by a single sensor, and the spatial resolution of the measured force data is low. This drawback could lead to an incomplete analysis of the three-axis force distribution during walking. To increase the number of force sensors under a foot, where space is limited, the sensors must be smaller. Our force sensors were miniaturized through MEMS fabrication; therefore, we could attach 16 force sensors to a shoe, more than twice the number used in previous studies.

Second, in our method, even when many force sensors are attached to the shoe, the weight of the shoe is approximately maintained because only a small amount of metal material is used. In previous studies attempting to measure three-axis GRF distribution while walking, either the force sensors were made of metal or metal plates were used to fix the force sensors to the soles of commercially available shoes [15–18]. The use of metal could cause subjects discomfort, because such a hard and heavy material is not normally found in shoes. Additionally, if researchers want to attach many force sensors to a shoe, the weight of the shoe will become heavy and may affect the subject's locomotion. To mitigate the drawbacks of using metal as much as possible, our fabricated force sensor was made primarily of rubber, which is in the hardness range of commercially available shoes. By using rubber, our force sensor is lighter than those used in previous studies. Thus, subjects who wear our shoes will experience under-foot sensations that are nearly the same as those before the force sensors were attached, and their locomotion will be largely unaffected.

By design, our method can measure the three-axis GRF distribution at a higher resolution, even outside the laboratory. Our shoes enable data to be collected in locations where it is difficult to place force plates, and the obtained three-axis distribution data cannot be collected using a pressure sheet. Compared with other proposed shoe-based measurement methods, the shape and weight of the shoe are better preserved, and our proposed method obtains the highest data resolution.

An important observation from this experiment is that the individual features of the areas under a subject's sole can be discussed on the basis of the measured three-axis force distribution. The resulting shear force distribution suggests several roles for the forefoot area: a low-contribution area, a propulsive-force-generating area, and a medial-force-generating area. Additionally, the results imply that the area under the third metatarsal experiences concentrated vertical and shear forces. The features observed during our walking experiment cannot yet be accepted as common features of human gait. However, the measurement system can capture the three-axis forces a subject applies at specific areas of the sole, and measuring the three-axis force is meaningful for walking analysis. In the future, common three-axis force features during walking will be collected using more subjects.

#### **4. Conclusions**

A three-axis GRF distribution during straight walking was measured using a shoe with three-axis force sensors. Sixteen small three-axis force sensors were fabricated with rubber and sensor chips. The weight of each three-axis force sensor with its rubber frame was 15 g. The force sensors were attached to the removable sole of the left shoe using a hook-and-loop fastener. The data from these 16 force sensors were collected at a sampling frequency of 333 Hz. Our shoe measurement system was used to measure the three-axis GRF distribution during straight walking on a treadmill. From the experiment, extensive three-axis-GRF-distribution data were obtained for the first time. The local roles under the foot were revealed by analyzing the distribution data: the heel received a braking force during the foot contact phase, the base of the fourth and fifth toes applied little three-axis force during straight walking, the base of the second and third toes applied a strong vertical force and propulsive force from the mid-stance phase to the take-off phase, and the base of the big toe applied a medial force from the mid-stance phase to the take-off phase. These results indicate that the applied three-axis GRF differed at different locations under the foot. Therefore, measuring and analyzing the three-axis GRF distribution during walking has enormous significance and may yield new insights for walking researchers. In the future, we believe that this measurement system and the results of this study will be crucial for further detailed analyses of bipedal locomotion.

**Acknowledgments:** This work was partially supported by JSPS KAKENHI Grant Number 25000010.

**Author Contributions:** All authors contributed to the design, data analysis and preparation of the manuscript. Masataka Hori and Akihito Nakai fabricated the sensor, established the shoe measurement system and performed the experiments. Isao Shimoyama planned and supervised the project.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
