1. Introduction
Dynamic balance refers to the ability to maintain equilibrium while performing actions that move the center of mass outside the base of support. The Y Balance Test (YBT) is a test for assessing dynamic balance control that has been widely used in clinical practice and research [1]. For example, the YBT has been used to determine a person’s risk of injury [2] or readiness to return to sport [3]. This test assesses performance during single-leg balance while reaching in three directions (anterior, posteromedial, and posterolateral). The traditional YBT score is the normalized reach distance (NRD), obtained by measuring the distance an individual can reach in each of the three directions and normalizing it by leg length. Inertial measurement units (IMUs) are now being used to capture movement quality during the reaching tasks, providing a more sensitive measure of dynamic balance performance. IMUs also offer a new opportunity to estimate the NRD directly from sensor data and to score the YBT with a fully automated system. Although the NRD formula and the test setup are simple, obtaining the NRD with the standard equipment requires subjects to visit a physiotherapist or a sports center, and the equipment is cumbersome to move. Evaluating the YBT automatically with a single lumbar (lower back) inertial sensor simplifies the evaluation and allows subjects to monitor their progress without attending a sports center or requiring the help of a physiotherapist. The inertial signals can be fed into a deep learning architecture whose output is an estimate of the NRD, which is then used to score the YBT.
Wearable inertial sensors have been widely used for postural control applications in sports science and medicine. Postural control assessments have frequently been used for performance testing [4], injury risk screening, injury rehabilitation, and assessment of readiness to return to play [5]. Previous studies have focused on evaluating the validity and reliability of assessment protocols in laboratory environments [6]. For example, a previous work [7] demonstrated that an inertial sensor-derived 95% ellipsoid volume (95 EV) measure could capture alterations in dynamic balance control that were not detected by traditional reach distances alone, and could distinguish pre-fatigue from post-fatigue dynamic balance control in all three reach directions. Another study [8] analyzed the inter-session test-retest reliability of quantified YBT variables obtained with a single lumbar inertial sensor and found them to be reliable measures of balance performance in all three reach directions for tests performed in two different weeks. The authors analyzed the following YBT variables: NRD; 95 EV; ranges of pitch, roll, and yaw; and the root mean square, sample entropy, area under the curve of the fast Fourier transform, and variance of the tri-axial gyroscope and accelerometer signals and their magnitudes. Regarding sports-related injuries, another work [2] demonstrated that poor dynamic balance performance, measured with a lumbar inertial sensor during the YBT, was significantly associated with a subsequent concussion injury. Using data from elite rugby union players, the authors concluded that individuals with poorer balance performance were three times more likely to sustain a sports-related concussion.
Regarding classification tasks related to dynamic balance, a previous work [1] discriminated between the three YBT reach directions and between pre-fatigue and post-fatigue balance during the YBT. The authors recorded data from fifteen subjects performing the YBT on the dominant leg at 0, 10, and 20 min. Using features extracted from a lumbar sensor and a random forest classifier, they obtained an accuracy of 97.80%, a sensitivity of 97.86 ± 0.89%, and a specificity of 98.90 ± 0.56% for reach direction classification. For fatigue, “normal” and “abnormal” balance performances were classified with an accuracy of 61.90–71.43%, a sensitivity of 61.90–69.04%, and a specificity of 61.90–78.57%, depending on the reach direction.
Regarding regression tasks related to dynamic balance, a previous study [3] proposed and evaluated a machine learning approach based on the k-nearest neighbour (k-NN) algorithm and dynamic time warping (DTW) to estimate the NRD on a dataset of 29 young healthy adults [9]. This study used data from 21 subjects for training and validating the model and data from the remaining eight subjects for testing it. The authors observed that the Z-axis of the lumbar accelerometer was the most informative signal. They used 10-fold cross-validation for training and validation and evaluated the final model on the eight unseen subjects, reporting a mean absolute percentage error (MAPE) of 6.24% and 8.02% on the training and testing subsets, respectively. Another recent work [10] studied kinematic and kinetic predictors of YBT performance in each direction using data from 31 healthy subjects. The authors built a stepwise regression model with specific variables such as flexion or rotation of the knee, hip, ankle, or torso. Knee flexion and contralateral torso rotation explained 45.8% of the variance in the anterior reach direction; hip flexion, ankle dorsiflexion, and external rotation together explained 76.9% of the variance in the posteromedial direction; and hip flexion and contralateral pelvis rotation explained 69.6% of the variance in the posterolateral direction. The authors concluded that hip and knee joint moments in the sagittal and frontal planes were critical for YBT performance.
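To make the k-NN baseline of [3] more concrete, the following is a minimal sketch of a k-NN regressor that compares Z-axis acceleration sequences with a DTW distance, together with the MAPE metric used to report its errors. It is an illustration under assumptions, not the implementation from [3]: the value of k, the absolute-difference local cost, and the absence of a warping window are choices made here, and the function names are hypothetical.

```python
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with an absolute-difference cost."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def knn_dtw_predict(train_x, train_y, query, k=3):
    """Estimate the NRD as the mean label of the k training excursions closest under DTW."""
    dists = np.array([dtw_distance(query, x) for x in train_x])
    nearest = np.argsort(dists)[:k]
    return float(np.mean(np.asarray(train_y, dtype=float)[nearest]))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```

Because DTW aligns sequences of different lengths, excursions do not need to be padded or resampled for this baseline; the price is a quadratic cost per pair of sequences.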
Deep learning algorithms have been widely integrated into human activity recognition (HAR) systems [11], where they have outperformed traditional machine learning techniques. Some deep learning architectures are composed of convolutional layers, which capture spatial and temporal dependencies of the inputs through convolutions with filters. Others are composed of recurrent layers, which learn the evolution of a sequence through internal memory cells. Within HAR, gesture recognition has been achieved using either type of architecture or a combination of both [12,13]. Since the motion performed during a YBT excursion can be considered a gesture, deep learning approaches could generate a robust model of the movement performed during the YBT excursions. To the best of the authors’ knowledge, no previous study has used deep learning algorithms (recurrent neural networks) to estimate YBT distances; this is the most important contribution of this paper.
This paper addresses the challenge of automatically estimating the NRD of the YBT. The main contributions of the paper are the following:
Analysis of the YBT NRD estimation task using data from a wide variety of subjects.
Proposal and evaluation of a deep learning approach to estimate the NRD of the YBT by modeling the temporal pattern of the movement, analyzing several normalizations and input formats. This approach evaluates the YBT automatically using a lumbar inertial sensor, allowing subjects to monitor their progress without having to attend a sports center or requiring the help of a physiotherapist.
Comparison of two approaches for NRD estimation: creating a unique robust model for all directions or building specific systems to estimate the NRD for each direction.
Detailed analysis of the correlation between real and estimated NRD.
This study was performed on a dataset containing YBT recordings from 407 different subjects, using a subject-wise cross-validation strategy. To the best of the authors’ knowledge, this is the largest YBT dataset in the literature. Moreover, a subset of this dataset was used for comparison with previous results.
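A subject-wise cross-validation keeps every excursion of a given subject in the same fold, so the model is always tested on people it has never seen. The sketch below shows one way to build such folds with NumPy; the number of folds (five here), the random shuffling of subjects, and the function name are assumptions, since this section does not state them. scikit-learn's `GroupKFold` offers comparable grouping behaviour.

```python
import numpy as np


def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Split example indices into folds so each subject's excursions fall in exactly one test fold.

    n_folds and the shuffling seed are illustrative assumptions.
    """
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)          # one entry per subject
    rng = np.random.default_rng(seed)
    rng.shuffle(subjects)                      # randomize subject-to-fold assignment
    folds = []
    for fold_subjects in np.array_split(subjects, n_folds):
        test_mask = np.isin(subject_ids, fold_subjects)
        # train on all other subjects, test on this fold's subjects
        folds.append((np.flatnonzero(~test_mask), np.flatnonzero(test_mask)))
    return folds
```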
2. Materials and Methods
This section describes the YBT, the dataset used for the experiments, the signal processing techniques, and the deep learning approach based on LSTMs.
2.1. Y Balance Test and Collection Protocol
The YBT is an instrumented version of the Star Excursion Balance Test (SEBT) and an efficient measure of dynamic postural control. It is a clinical assessment traditionally scored by measuring the normalized reach distance (NRD) [1], which provides an objective measurement.
The YBT consists of switching from an initial bilateral stance to a unilateral stance and maintaining controlled balance on one leg while performing a maximal reach excursion with the non-stance limb in each of three standardized directions [14].
Figure 1 shows an individual performing a YBT excursion, a diagram of the anterior, posteromedial, and posterolateral directions, and the location and orientation of the lumbar sensor. The subject must return to the starting bilateral stance in a controlled way. A trial is considered a failure if the subject removes their hands from the hips, contacts the ground, uses the block for support, raises the heel of the stance leg, or kicks the slider forward to gain extra distance.
2.2. Dataset
The dataset contains YBT recordings from 407 subjects (age 23.1 ± 6.6 years; height 179.8 ± 42.1 cm; weight 89.3 ± 21.1 kg; left leg length 96.6 ± 7.6 cm; right leg length 96.9 ± 6.4 cm) drawn from different cohorts: 107 professional Rugby Union athletes, 32 Intercounty Gaelic Football athletes, 104 young healthy adults (18–40 years), 18 healthy middle-aged adults (40–64 years), 97 NCAA Division 1 American football players, and 49 NCAA Division 1 ice hockey players. All participants were healthy and self-reported no musculoskeletal or neurological impairments at the time of testing. The dataset therefore covers a wide variety of populations, including athletes from different sports. Data were collected in a standardized manner that has been detailed previously [2,7,8,15]. The university research ethics board granted ethical approval, and all subjects gave informed consent before completing the testing protocol.
The participants were informed about the YBT and performed several practice trials before data collection. Each session consisted of three YBT excursions in each of the three directions (anterior, posteromedial, and posterolateral) with each leg, in a randomized order. Therefore, 18 recordings were obtained per subject and session: 3 YBT excursions × 3 directions × 2 stance legs. However, some YBT samples are missing from the dataset. The reach distance and sensor data were collected for each excursion. Data were labelled by measuring the reach distance on the experimental platform. First, the researchers measured leg length as the distance from the anterior superior iliac spine to the most distal aspect of the medial malleolus [15]. Second, once the subject had pushed the slider as far as possible along the platform, the researchers recorded the reach distance and normalized it by the subject’s leg length using Equation (1).
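Equation (1) is not reproduced in this section. The sketch below assumes the standard YBT form, in which the reach distance is expressed as a percentage of leg length, NRD = (reach distance / leg length) × 100; this assumption is consistent with the direction means reported for this dataset (about 60 to 105). The function name is hypothetical.

```python
def normalized_reach_distance(reach_cm: float, leg_length_cm: float) -> float:
    """Reach distance as a percentage of leg length (assumed form of Equation (1))."""
    if leg_length_cm <= 0:
        raise ValueError("leg length must be positive")
    return reach_cm / leg_length_cm * 100.0
```

For instance, under this assumption a 58 cm anterior reach by a subject with a 96.6 cm leg gives an NRD of about 60, close to the anterior mean of this dataset.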
The total number of samples included in this dataset is 7262 (2427 from the anterior direction, 2411 from the posteromedial direction and 2424 from the posterolateral direction), corresponding to 407 subjects.
Subjects wore a single inertial sensor (Shimmer3, Dublin, Ireland) containing a tri-axial accelerometer, gyroscope, and magnetometer. The sensor was mounted at the level of the fourth lumbar vertebra, in line with the top of the iliac crests, and secured with a custom-made elastic belt so that it closely matched the acceleration of the body’s center of mass during the YBT excursions.
Figure 1 shows a subject wearing the belt to which the lumbar sensor was attached, with the Z-axis pointing backwards. The inertial sensor was connected via Bluetooth to an Android tablet (Galaxy Tab 2, Samsung, Seoul, Korea) running a custom-made application and configured to collect tri-axial accelerometer (±2 g), tri-axial gyroscope (±500 °/s), and tri-axial magnetometer (±1 gauss) data at a sampling frequency of 51.2 Hz during each YBT reach excursion. The Shimmer3 sensor was calibrated prior to data collection following the standardized procedure outlined by the manufacturer [16]. These data acquisition parameters were defined based on pilot testing and previous work investigating the utility of inertial sensors in the evaluation of exercise technique and balance [1,7,17]. Although the Shimmer sensor is a general-purpose device, it can be used for clinical postural balance purposes since it provides accurate measurements of inertial signals. In this work, we used the Z-axis of the lumbar accelerometer, as suggested by [3], which concluded that this signal was the most informative one and that including additional signals brought no improvement. We observed that the Z-axis acceleration signal has a greater standard deviation during the YBT (mean 5.07, std 2.85 m/s²) than the X (mean −0.09, std 1.64 m/s²) and Y (mean 7.57, std 2.17 m/s²) axes. Higher variability can provide more information about the movement and therefore about the NRD; we verified this by evaluating the system with the other signals.
Figure 2 shows the Z-axis acceleration of YBT excursions from subject 100403 in the three directions (anterior in red, posteromedial in green, and posterolateral in blue). The figure reveals a common pattern in YBT excursions: a gradual increase in acceleration followed by a final braking phase. It also shows that the acceleration signal reaches higher values in the posteromedial and posterolateral directions than in the anterior direction.
These excursions have different lengths: from 113 to 648 data-points (2.2 s to 12.7 s), with a mean of 324.83 data-points (6.3 s) and a standard deviation of 115.84 data-points (2.3 s).
Figure 3 shows histograms of YBT excursion duration. The direction-specific histograms show that anterior excursions are usually shorter than posteromedial and posterolateral excursions, and that excursions in all directions can last up to approximately 12 s.
Figure 4 shows histograms of the NRD of the YBT excursions. The histogram for all directions combined spans NRD values between 40 and 157 and suggests a mixture of two Gaussian profiles.
The direction-specific histograms explain this: each direction has a different mean and standard deviation of the reach distance, namely 59.65 ± 6.73, 104.63 ± 8.66, and 100.27 ± 9.10 for anterior, posteromedial, and posterolateral, respectively, so the anterior excursions form one mode and the two overlapping posterior directions form the other.
2.3. Signal Processing and Deep Learning Approach
HAR systems typically use a general-purpose framework [17] with several modules for activity recognition, which can be extended to regression tasks. This framework contains two main modules: a signal processing module that extracts features from or transforms the signals, and a machine learning or deep learning module that models the data and estimates a specific measure for each sample. We adapted this framework to our regression system.
Figure 5 presents the sequence of modules for the HAR framework used in this work, mentioning the outputs of the intermediate modules.
The raw signals were normalized in several ways. The Z-axis lumbar acceleration signal was normalized using the mean computed over one of three scopes: all samples in each example (example), all samples of the excursions in a specific direction (direction), or all samples of all excursions from the same subject (subject).
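The three normalization scopes can be sketched as follows. This assumes that "normalized by the mean" means subtracting the mean computed over the chosen scope, and that the direction scope pools a direction across whatever excursions are passed in; both readings are assumptions, and the function name is hypothetical.

```python
import numpy as np


def mean_normalize(signals, subjects, directions, scope="subject"):
    """Subtract a mean computed over the chosen scope from each Z-acceleration excursion.

    scope: "example"   -> mean of each excursion on its own
           "direction" -> mean over all samples of all excursions sharing that direction
           "subject"   -> mean over all samples of all excursions from that subject
    """
    if scope == "example":
        return [s - s.mean() for s in signals]
    keys = {"direction": directions, "subject": subjects}[scope]  # KeyError for unknown scope
    groups = {}
    for key, s in zip(keys, signals):
        groups.setdefault(key, []).append(s)
    # pool every sample of the group before averaging (excursions differ in length)
    means = {key: np.concatenate(parts).mean() for key, parts in groups.items()}
    return [s - means[key] for key, s in zip(keys, signals)]
```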
As mentioned, the length of the time series differs between YBT excursions, while the inputs to the deep learning architecture must have a fixed size. Therefore, after normalization, zeros were added at the beginning of each signal so that all examples had the same duration: 650 points, corresponding to 12.7 s, which is slightly longer than the longest excursion (648 points). These leading zeros did not degrade system performance because recurrent neural networks can learn this type of pattern, and the amount of padding even conveys information about the YBT duration.
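The pre-padding step and the sample-to-second conversion at 51.2 Hz can be sketched as follows; the function and constant names are illustrative, not taken from the paper.

```python
import numpy as np

FS_HZ = 51.2       # sampling frequency reported for the Shimmer3 sensor
TARGET_LEN = 650   # fixed input length in samples (about 12.7 s)


def pre_pad(signal, target_len=TARGET_LEN):
    """Prepend zeros so every excursion has target_len samples (the longest excursion has 648)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) > target_len:
        raise ValueError("excursion longer than target length")
    return np.concatenate([np.zeros(target_len - len(signal)), signal])


def samples_to_seconds(n_samples, fs=FS_HZ):
    """Convert a number of samples to seconds at the given sampling frequency."""
    return n_samples / fs
```

Padding at the start rather than the end keeps the informative end of every excursion (the final braking phase) aligned at the last time steps, which are the ones a recurrent layer sees just before producing its output.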
After zero padding, we evaluated two input formats: the raw recordings, leaving the deep learning architecture to learn features automatically, and handcrafted features computed from YBT sub-windows.
For the raw data approach, we reduced the number of inputs to the deep learning architecture by keeping the last 500 points of each example and downsampling them (filtering followed by sample selection) to 100 representative points. This way, each YBT example was represented by 100 data points. We applied this reduction because a sequence of 650 points is too long to be modeled effectively by a recurrent layer.
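The downsampling of the last 500 points to 100 can be sketched as a low-pass filter followed by keeping every fifth sample. The filter type is not specified in this section; the 5-sample moving average used here is an assumption, and a zero-phase anti-aliasing filter such as `scipy.signal.decimate` would be a reasonable alternative.

```python
import numpy as np


def downsample_tail(padded, tail=500, n_out=100):
    """Keep the last `tail` samples, smooth them, then select every k-th sample."""
    x = np.asarray(padded, dtype=float)[-tail:]
    k = tail // n_out                          # decimation factor: 5 for 500 -> 100
    kernel = np.ones(k) / k                    # moving-average low-pass filter (assumed)
    smoothed = np.convolve(x, kernel, mode="same")
    return smoothed[::k][:n_out]               # sample selection
```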
Figure 6 shows the complete signal processing for the YBT excursions, showing the process of an example of posterolateral direction from subject 100403.
For the feature-based approach, each YBT example was divided, after normalization and zero padding, into 2 s sub-windows with a step of 0.5 s (1.5 s overlap), giving 22 sub-windows per 12.7 s example. We then extracted 186 handcrafted features from each sub-window, so that the network could learn a temporal model from the evolution of these features throughout the YBT excursion.
Figure 7 shows the sub-windowing process for a YBT example and the extraction of features from the sub-windows.
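At 51.2 Hz, a 2 s window is about 102 samples and a 0.5 s step about 26 samples, which yields exactly 22 sub-windows for a 650-sample example. The sketch below reproduces that count; the three per-window features are illustrative stand-ins for the 186 library features, and the rounding of window and step lengths to whole samples is an assumption.

```python
import numpy as np

FS_HZ = 51.2
WIN = int(round(2.0 * FS_HZ))    # 2 s window  -> 102 samples
STEP = int(round(0.5 * FS_HZ))   # 0.5 s step  -> 26 samples (about 1.5 s overlap)


def sub_windows(signal, win=WIN, step=STEP):
    """Slice a padded excursion into overlapping sub-windows, shape (n_windows, win)."""
    x = np.asarray(signal, dtype=float)
    starts = range(0, len(x) - win + 1, step)
    return np.stack([x[s:s + win] for s in starts])


def simple_features(windows):
    """Illustrative per-window features (mean, std, range); not the paper's feature set."""
    return np.column_stack([windows.mean(axis=1),
                            windows.std(axis=1),
                            windows.max(axis=1) - windows.min(axis=1)])
```

Stacking the feature rows of all sub-windows gives the W × M matrix (sub-windows × features) that the recurrent network consumes as a sequence.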
To compute the features, we used the Time Series Feature Extraction Library (TSFEL) [18], which computes over 60 different features across the temporal, statistical, and spectral domains for each sub-window; Barandas et al. [18] describe the computation of each feature. The features are grouped into three sets, as shown in Table 1: temporal domain, statistical, and spectral domain.
Fourier analysis makes it possible to detect fast signal variations related to vibrations produced by exercise stress; these oscillations increase the energy at high frequencies. In contrast, slow movements, which may correspond to the beginning and end of the exercise, concentrate energy at low frequencies. For these reasons, we included features derived from the fast Fourier transform in the spectral domain subset (Table 1).
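The low-frequency versus high-frequency reasoning above can be illustrated with a simple band-energy computation on one sub-window. The 5 Hz split frequency and the function name are hypothetical choices for illustration; the actual spectral features are those listed in Table 1.

```python
import numpy as np


def band_energies(window, fs=51.2, split_hz=5.0):
    """Spectral energy below and above split_hz for one mean-removed sub-window."""
    x = np.asarray(window, dtype=float)
    x = x - x.mean()                              # remove the DC component
    spectrum = np.abs(np.fft.rfft(x)) ** 2        # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)   # bin centre frequencies in Hz
    low = spectrum[freqs < split_hz].sum()
    high = spectrum[freqs >= split_hz].sum()
    return low, high
```

A slow 1 Hz movement concentrates its energy in the low band, whereas a 15 Hz vibration shows up in the high band.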
The deep learning architecture was composed of a time-modeling subnet followed by a fully connected layer that estimated the normalized reach distance (regression). The first subnet modeled the temporal patterns using recurrent layers, and the final layer estimated the NRD, so the output of the architecture was the estimated NRD for each YBT excursion. We used the mean squared error (MSE) as the loss function and Adagrad as the optimizer; Adagrad uses parameter-specific learning rates that are adapted according to how frequently each parameter is updated during training. The architecture had two long short-term memory (LSTM) layers with 32 and 16 units, respectively, and a final dense (fully connected) layer with one unit and a linear activation function. Dropout layers (20%) were placed after the recurrent layers to reduce overfitting during training. These recurrent layers can learn long-term dependencies in sequence prediction problems, extracting a temporal model of each example based on the pattern of the YBT excursion.
Figure 8 shows the LSTM-based RNN architecture, which was optimized by evaluating system performance on validation subsets, together with the internal structure of an LSTM unit. The architecture takes a 2D input of dimensions W × M: W = 22 sub-windows × M = 186 features for the feature-based input, and W = 100 time steps (the downsampled excursion) × M = 1 (Z acceleration values) for the raw input. The figure also shows that an LSTM unit is composed of sigmoid (σ) and hyperbolic tangent (tanh) layers and how it manages its internal memory: $x_t$, $h_t$, and $C_t$ denote the input, the output, and the cell state of the unit at time step $t$, respectively, and $h_{t-1}$ and $C_{t-1}$ denote the output and the cell state at the previous time step. The architecture was implemented in Python using Keras with TensorFlow as the backend; other tools such as scikit-learn were also used in the experiments.
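The following NumPy sketch shows how an LSTM unit combines $x_t$, $h_{t-1}$, and $C_{t-1}$ into $C_t$ and $h_t$, and how two such layers (32 and 16 units) followed by a one-unit linear dense layer map a W × M input to a single NRD estimate. It assumes the standard LSTM formulation without peephole connections (the form used by Keras's default LSTM layer); the weights are random placeholders, dropout is omitted because it is inactive at inference, and all names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(n_in, n_units, rng):
    """Random placeholder weights for the four gates (forget, input, candidate, output)."""
    return {"W": rng.normal(0, 0.1, (4 * n_units, n_in)),     # input weights
            "U": rng.normal(0, 0.1, (4 * n_units, n_units)),  # recurrent weights
            "b": np.zeros(4 * n_units)}


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step: gates from x_t and h_{t-1}, then new cell state C_t and output h_t."""
    z = p["W"] @ x_t + p["U"] @ h_prev + p["b"]
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # sigmoid gates
    g = np.tanh(g)                                # tanh candidate memory
    c_t = f * c_prev + i * g                      # forget part of old memory, add new content
    h_t = o * np.tanh(c_t)                        # gated output
    return h_t, c_t


def lstm_layer(seq, p, n_units, return_sequences):
    h, c = np.zeros(n_units), np.zeros(n_units)
    outputs = []
    for x_t in seq:                               # iterate over the W time steps
        h, c = lstm_step(x_t, h, c, p)
        outputs.append(h)
    return np.stack(outputs) if return_sequences else h


def build_params(n_features, seed=0):
    rng = np.random.default_rng(seed)
    return {"lstm1": init_lstm(n_features, 32, rng),
            "lstm2": init_lstm(32, 16, rng),
            "dense_w": rng.normal(0, 0.1, 16),
            "dense_b": 0.0}


def forward(seq, params):
    """LSTM(32) -> LSTM(16) -> Dense(1, linear); input shape (W, M)."""
    h1 = lstm_layer(seq, params["lstm1"], 32, return_sequences=True)
    h2 = lstm_layer(h1, params["lstm2"], 16, return_sequences=False)
    return float(params["dense_w"] @ h2 + params["dense_b"])
```

The same code handles both input formats, for example `forward(features, build_params(186))` for a (22, 186) feature matrix or `forward(raw, build_params(1))` for a (100, 1) raw sequence.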
Table 2 details the layers of the architecture. The architecture was optimized separately for each input format (raw data and features) on a validation subset; in both cases the same final configuration performed best: a learning rate of 0.02, a batch size of 100, and 50 epochs.
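The number of trainable parameters follows directly from the layer sizes. The sketch below assumes standard LSTM and dense layers with biases (the Keras defaults) and no parameters in the dropout layers; under these assumptions, the feature-based model (M = 186) has 31,185 parameters and the raw-input model (M = 1) has 7505. Table 2 remains the authoritative description of the layers.

```python
def lstm_params(n_in, units):
    """Four gates, each with input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (n_in + units) + units)


def dense_params(n_in, units):
    return n_in * units + units


def model_params(n_features):
    """LSTM(32) -> Dropout -> LSTM(16) -> Dropout -> Dense(1); dropout adds no parameters."""
    return lstm_params(n_features, 32) + lstm_params(32, 16) + dense_params(16, 1)
```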
4. Discussion
The results obtained in the previous section validate the use of deep learning techniques to score the YBT by estimating the NRD. This approach was evaluated on a dataset containing recordings from a wide range of subjects (407 subjects aged 18 to 64 years). Such a diverse set of people can be used to create a robust model that generalizes to other subjects. However, building this general model requires normalizing the recordings per subject, since each person performs the YBT excursions with a different energy.
Regarding the input format, this manuscript evaluated raw recordings and handcrafted features as inputs to the deep learning architecture. The results suggest that the recurrent architecture estimates the NRD better when it is fed with features extracted from 2 s sub-windows of the YBT excursions than when it learns directly from the raw recordings. This may happen in complex tasks where extracting features that represent the recordings is worthwhile. We also observed that the temporal-domain features were the most informative, reaching a performance similar to that obtained with the entire feature set.
Training direction-specific systems with data from a single direction allows better modeling of the regression problem because the training data have lower variability. These specific systems achieve significantly better performance than a single model trained for all directions. Supervising subjects during the YBT collection protocol is crucial to avoid anomalous recordings that could disturb the modeling process, since such recordings would hinder training and increase the NRD estimation error.