1. Background
Spasticity is a symptom of neurological impairment and is prevalent in patients with stroke [
1,
2], multiple sclerosis [
3], cerebral palsy [
4], or spinal cord injury (SCI) [
5,
6]. It is characterized by a velocity-dependent increase in muscle tone during passive stretch [
7]. Spastic movement of the upper or lower limbs is measured at periodic intervals to monitor patient progress. However, it remains difficult to quantify spasticity correctly, despite the application of diverse approaches from different academic fields.
Clinical professionals have developed outcome measures to evaluate spasticity. One of the most common approaches is to apply a clinical scale, such as the modified Ashworth scale (MAS) [
8] and the modified Tardieu scale [
9]. Although such clinical constructs have been challenged in terms of their reliability, particularly inter-rater reliability [
10,
11,
12], these measures are frequently used in practice as they are simple to carry out. In fact, the most commonly used clinical measures for spasticity are the Ashworth scale and the MAS [
13].
Researchers have attempted to evaluate spasticity employing a variety of sensors to capture physiological and biomechanical signals. The recorded signals are analyzed to derive clinically meaningful indexes and to compare the results with clinical scales. McGibbon et al. [
14] proposed a wearable system that consisted of a fiberoptic goniometer and an electromyography (EMG) sensor with two channels to record kinematic responses and muscle activity during passive stretch-reflex tests of spasticity under elbow flexion and extension. Associations between MAS scores and metrics extracted from kinematic and EMG data that represent the intensity of involuntary reflex have been evaluated. Pandyan et al. [
15] developed a biomechanical measure of resistance to passive movement that incorporates a force transducer and a flexible electrogoniometer for use in a clinical setting for patients with different health conditions, such as traumatic brain injury, stroke, and multiple sclerosis. In addition, applied force, passive range of movement, and speed of the device have been used as indicators of elbow spasticity and compared to the MAS score. Spasticity assessment based on multimodal signals has also been applied to children with cerebral palsy [
16].
Wearable sensors, such as inertial sensors, are increasingly being used in rehabilitation studies that explore the potential of inertial data for assessing spasticity of both upper and lower limbs in patients with neurological disorders. Van den Noort et al. [
17] introduced a method to assess the spasticity of lower limbs including the medial hamstrings, soleus, and gastrocnemius in children with cerebral palsy. They determined the angle of catch, which refers to a sudden stop or increased resistance during dynamic joint movement at a certain angle before being fully extended or flexed [
18], from the inertial signals by transforming the three-dimensional (3D) orientations of inertial sensors into the 3D joint angle of the lower limb. This method has also been tested on the upper limbs of stroke patients and has demonstrated excellent test-retest and inter-rater reliability [
19]. A similar method utilizing inertial sensors for accurate and reliable assessment of spasticity was proposed by Choi et al. [
20]; they added visual biofeedback to their inertial sensor-based spasticity assessment to provide assessors with additional information on the joint movement of lower limbs in regular passive stretch velocity to improve reliability.
Wearable sensors have been successfully demonstrated in healthcare research. Applications commonly used include continuous monitoring of activities of daily living [
21,
22], gait, and mobility [
23,
24]. Although the increasing use of wearable sensors poses great challenges to data analyses as the sensors record significant amounts of time-series data, the rapid development of data analytic methods has enabled vast amounts of data to be processed, revealing hidden information. Machine learning is a widely used data analytics technique, which uses statistical techniques to learn from the observed data and predict outcomes or categorize observations in unseen data. Many attempts have been made to investigate the efficacy of machine learning for the delivery of rehabilitation services. Yang et al. [
25] developed a hand function recovery system consisting of a smart wearable armband that incorporates surface EMG to measure bio-potential signals and machine-learning algorithms to detect different hand movement patterns, and a dexterous robot hand to mimic the user’s hand gestures. They applied machine-learning algorithms to sensor data to provide interventions, as a promising technology to improve the degree of automation and the quality of intelligent decision making in healthcare service delivery.
Apart from the studies presenting the applications of machine learning as a way of providing interventions, recent evidence has demonstrated the successful use of machine learning in outcome assessments. The performance of a client during a rehabilitation exercise can be classified according to whether he or she performs the given exercise correctly [
26]. For spasticity assessment, artificial neural networks are often applied to learn patterns of biomechanical data recorded by multiple sensors, including force sensors and angle sensors embedded in wearable devices [
27,
28]. Zhang et al. [
29] used regression-based supervised learning algorithms to predict MAS scores based on EMG signals and inertial data from a triaxial accelerometer, a triaxial gyroscope, and a triaxial magnetometer.
A large number of studies have been conducted to develop spasticity assessment methods using advanced technologies, with the aim of providing clinical professionals with reliable information related to characteristics of spasticity. However, there is growing demand for outcome measures to be implemented in home, community, and nonhealthcare institutions, as patients with neurological impairments require continuous rehabilitation to maintain or improve their condition. Even after they are discharged from hospital, their condition must be monitored continuously to guide appropriate rehabilitation treatment. Despite some success, the need to use extra devices to collect the data and the need for a clinician/professional to be present to assess spasticity have complicated remote monitoring of this condition in nonhealthcare facilities. Such instruments also tend to be costly. This has limited the tools available to rehabilitation clients for health monitoring. Therefore, developing low-cost and simple methods for assessing spasticity in remote nonhospital environments without the help of healthcare professionals is of great importance. To address this issue, we propose a machine-learning method to provide information regarding the degree of spasticity of an elbow using a wearable device with inertial measurement units (IMUs).
3. Results
Spastic movements rated by a rehabilitation therapist using the MAS and the number of data samples according to the segmentation techniques are reported in
Table 4. The majority of participants had either no signs of spasticity in their elbow (35.4%) or minimal symptoms (27.1%). There was only one participant who showed the most severe degree of spasticity.
Classification performance was tested using a combination of datasets (DS1 and DS2) with segmentation and feature sets (FS1 and FS2) as shown in
Figure 4.
When DS1 was used for classification, the median accuracy was 75.7%. As shown in
Figure 4a, the median classification accuracies were 73.6% for FS1 and 81.9% for FS2 with DS1. The median accuracy increased by 8.3 percentage points when an extra 16 features were added to the common statistical features. However, the difference was not statistically significant according to a Wilcoxon signed-rank test (Z = −0.944, p = 0.345); the extra features did not have a significant influence on classification performance.
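As an illustration of how a paired comparison of this kind can be computed, the following is a minimal sketch of a Wilcoxon signed-rank test using the large-sample normal approximation. The function name, the omission of tie and continuity corrections, and the accuracy values in the example are assumptions made for illustration; they are not the study's data or its statistical software.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (normal approximation).

    Returns (z, two_sided_p). Zero differences are dropped and tied
    absolute differences share their average rank. No tie or continuity
    correction is applied to the variance (a simplifying assumption).
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Hypothetical per-classifier accuracies (%), for illustration only.
acc_fs1 = [70.0, 74.0, 72.0, 80.0]
acc_fs2 = [75.0, 73.0, 80.0, 86.0]
z, p = wilcoxon_signed_rank(acc_fs1, acc_fs2)
print(f"Z = {z:.3f}, p = {p:.3f}")
```

With only a handful of paired values, the normal approximation is rough; an exact test would be preferable for very small samples.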
Figure 4b summarizes the classification results, based on the features derived from the data with different segmentation techniques (DS2). The medians of classification accuracy for FS1 and FS2 were 80.8% and 87.9% respectively.
Table 5 summarizes the classification performance obtained from FS1 and FS2, regardless of the segmentation technique. The extra features improved classification by 5%. Although the extra 16 features (root mean square, mean, standard deviation, energy, spectral energy, absolute difference, variance extracted from pitch and roll, and two additional features: SMA and SV) added to the common statistical features (derived from both datasets DS1 and DS2) performed better, there was no significant increase in classification accuracy (Z = −1.784, p = 0.074).
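To make the extra features more concrete, the sketch below computes a subset of them for one segment of triaxial accelerometer data. Several points are assumptions rather than details taken from the study: the pitch and roll convention, the definition of energy as the mean of squared values, and the reading of SMA and SV as signal magnitude area and signal vector magnitude. Spectral energy and absolute difference are omitted for brevity.

```python
import math
import statistics


def pitch_roll(ax, ay, az):
    """Tilt angles in radians from one accelerometer sample.

    Uses a common convention (an assumption, not taken from the study).
    """
    pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    roll = math.atan2(ay, az)
    return pitch, roll


def window_features(samples):
    """Compute illustrative features for one segment.

    samples: non-empty list of (ax, ay, az) tuples.
    """
    pitches, rolls = zip(*(pitch_roll(*s) for s in samples))
    n = len(samples)
    feats = {}
    for name, series in (("pitch", pitches), ("roll", rolls)):
        mean_sq = sum(v * v for v in series) / n
        feats[f"{name}_mean"] = statistics.fmean(series)
        feats[f"{name}_std"] = statistics.pstdev(series)
        feats[f"{name}_var"] = statistics.pvariance(series)
        feats[f"{name}_rms"] = math.sqrt(mean_sq)
        feats[f"{name}_energy"] = mean_sq  # assumed definition
    # Assumed readings of the abbreviations SMA and SV.
    feats["sma"] = sum(abs(ax) + abs(ay) + abs(az) for ax, ay, az in samples) / n
    feats["sv"] = sum(math.sqrt(ax * ax + ay * ay + az * az)
                      for ax, ay, az in samples) / n
    return feats


# Hypothetical segment: sensor at rest, then slightly tilted.
segment = [(0.0, 0.0, 1.0), (0.1, 0.0, 0.99), (0.2, 0.05, 0.97)]
for key, value in window_features(segment).items():
    print(f"{key}: {value:.4f}")
```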
Median accuracies were compared with regard to the segmentation technique. The performance increased by 7.4% when testing features were computed from DS2 (
Table 6). Furthermore, the performance difference between segmentation techniques was statistically significant (Z = −2.701, p = 0.007). This indicates that having a data set segmented with 50% overlap had a significant positive impact on the classification accuracy.
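The effect of overlap on the number of available segments can be seen in a short sliding-window sketch. The function name, window length, and toy signal are hypothetical; the study's actual window size is not stated in this section.

```python
def segment_signal(signal, window, overlap=0.5):
    """Split a sequence into fixed-length windows.

    overlap is the fraction of each window shared with the next one
    (0.0 gives disjoint windows, 0.5 gives 50% overlap). Trailing samples
    that do not fill a whole window are discarded.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, round(window * (1 - overlap)))
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, step)]


# Toy signal of 10 samples with a hypothetical window of 4 samples.
signal = list(range(10))
print(len(segment_signal(signal, 4, overlap=0.0)))  # disjoint windows
print(len(segment_signal(signal, 4, overlap=0.5)))  # 50% overlap
```

For the same recording, 50% overlap roughly doubles the number of segments, so any gain in accuracy may reflect both the overlapping representation and the larger training set.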
The performance was compared in terms of classifier types. As summarized in
Table 7, the most accurate classifier in this context was RFs, at nearly 95.4% accuracy (91.8% median accuracy), regardless of the segmentation technique applied or the number of features used. This was followed by MLPs and LDA, which classified the severity of spastic movement with approximately 80% accuracy. Our results indicate that SVMs were the least powerful classifier in this study.
Finally, the precision and recall for the best accuracy obtained with RFs using FS2 from DS2 are reported in
Table 8. The classifier worked well overall, with the accuracy ranging from 92% to 100%. However, RFs showed relatively poor performance in discriminating MAS grades 1 and 1+. MAS score 4 was perfectly classified, although there was only one participant with a MAS score of 4. Perfect classification was also observed for a MAS score of 3.
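For reference, per-class precision and recall can be derived from true and predicted labels as in the following sketch. The function name and the label sequences are hypothetical and chosen only to show how confusion between grades 1 and 1+ lowers the metrics for those classes; they are not the study's results.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label that occurs.

    A class with no predicted (or no actual) samples gets 0.0 for the
    undefined metric, which is a convention chosen for this sketch.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed a sample of the true class
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        result[label] = (precision, recall)
    return result


# Hypothetical MAS labels, for illustration only.
y_true = ["1", "1", "1+", "1+", "3"]
y_pred = ["1", "1+", "1+", "1+", "3"]
for label, (precision, recall) in per_class_precision_recall(y_true, y_pred).items():
    print(f"MAS {label}: precision={precision:.2f}, recall={recall:.2f}")
```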
4. Discussion
This study investigated whether grading of the degree of spasticity could be achieved by utilizing machine-learning algorithms and inertial signals collected during passive stretching. In previous studies, spasticity has been evaluated based on data collected from various types of sensors, including EMG, rotary angle sensors, load cells, force sensors, and IMUs [
14,
15,
16,
17,
28]. However, such studies have aimed to provide this information to clinicians or therapists to improve the reliability of spasticity assessment. In addition, using multiple sensors requires sophisticated protocols and skills for sensor placement, e.g., preparing the skin by removing hair and cleansing with alcohol before attaching EMG electrodes to the surface of target muscles. This kind of approach is generally employed in scientific studies. In contrast, the ultimate purpose of the proposed method was to provide clients with a means of monitoring their own health-related status, especially in remote areas where healthcare professionals are unavailable. To accommodate this, a wearable device equipped with a minimum number of sensors was used to collect the signals reflecting characteristics of spastic movements. Specifically, spastic characteristics of the elbow were captured using a wearable device with IMU sensors; inertial sensors have been widely used in similar applications, such as outcome assessment in the healthcare domain [
23,
26,
29,
49,
50].
Our proposed method considers spasticity assessment as a classification problem, in contrast to a previous study [
29]. This is because there is some controversy over whether to treat the MAS as an ordinal or a categorical scale, due to the addition of the 1+ score. It has been argued that the relationship between 1 and 1+ is hierarchical [
51]. In addition, there is no unique definition of zero on this scale, in contrast to the original Ashworth scale; thus, there is little information on whether the distances between 1 and 1+ and between 1+ and 2 are equal [
52]. Due to insufficient evidence for using an ordinal scale, the proposed method grades the degree of spasticity using the MAS as a classification problem.
Within the proposed method, several approaches were applied to identify the optimal combination of features, segmentation methods, and supervised learning classifiers for classifying MAS scores with confidence. In terms of segmentation methods, a comparison of two segmentation techniques indicated that allowing a 50% overlap between consecutive segments had a significant impact on classification performance. This is consistent with previous findings [
34,
53,
54] that have found that the accuracy increases systematically with segmentation methods that allow overlap. Segmentation without overlap loses some information at the boundary between consecutive segments, which results in a dataset with poorer representation. On the other hand, the increased performance could be due not only to the better representation captured by segmentation with overlap but also to the larger number of data samples available for training the machine-learning algorithms.
Feature-wise comparison revealed that FS1 (the commonly used statistical features) was statistically as effective as FS2 (those combined with extra features computed from the pitch and roll representation of inertial signals): the increases in classification performance obtained with FS2 were not statistically significant. Although FS2 produced rather systematic increases in accuracy for most of the classifiers tested, it imposed a heavier computational burden than FS1. Therefore, the advantage of using the extra features is not clear; the tradeoff among accuracy, response time, and battery run-time should be considered carefully and will likely depend on the priorities of the application.
In the proposed method, the key priority was to provide both healthcare professionals and nonprofessionals with clinical indicators, so that they could rate the degree of elbow spasticity in patients by discriminating the different levels indicated by inertial sensors. While other studies have derived biomarkers from EMG, force, angle, and inertial data (e.g., resistance, angle, angular acceleration, and velocity [
27,
29]), the proposed method utilizes different types of features that are widely used in human activity recognition [
53]; the performance results were high, indicating that our approach is feasible.
In terms of classifier performance, statistical analyses confirmed that the choice of machine-learning classifier had no significant influence. Nevertheless, nonlinear classifiers tended to work better than linear ones. In fact, RFs outperformed LDA, SVMs, and MLPs. In addition, MLPs performed relatively better than the linear classifiers, except that LDA outperformed MLPs when DS2 was tested. Our findings are in accordance with those of previous studies [
27,
53].
There were few data samples that represented MAS 3 and 4. However, they were perfectly separated, indicating that the characteristics of MAS 3 and 4 are clearer than those of other MAS scores according to precision and recall metrics. On the other hand, relatively lower subset accuracy was observed with MAS 1 and 1+. This is a known issue associated with the MAS. The initial version of the Ashworth scale was modified by adding a score of 1+, to be discriminated from MAS score 1, as it represents a state that indicates resistance through less than half of the movement. This was introduced to increase the sensitivity of the scale. However, the additional level of measurement (1+) may have caused poorer agreement between raters, leading to lower reliability than the original Ashworth scale [
51]. The ambiguity in scoring the degree of spasticity may also have led to poorer quality of the data labels used for classification. Further investigation with regard to the 1+ point of the MAS is beyond the scope of this study.
The highest classification accuracy of MAS score was achieved by RFs in combination with 50% overlapping for segmentation and a total of 58 features. Our findings confirm that the proposed approach is acceptable. Moreover, it requires only a wearable device equipped with IMUs, which are the most frequently used sensors for fitness trackers or smart watches. The simplicity of the proposed method makes it possible to incorporate it into tele-rehabilitation applications that deliver rehabilitation services, including rehabilitation interventions [
55], clinical assessments [
56], and consultations [
57], over tele-communication networks as well as the Internet [
58].
The empirical results reported herein should be considered in the light of some limitations. Increased segmentation overlap leads to higher accuracy in most cases [
53]. However, different percentages of segment overlap were not examined. It would be worth investigating the impact of the amount of segmentation overlap on classification performance. In addition, the proposed method requires features to be computed from data, as the method employs classic supervised machine-learning classifiers. Recent advances in machine learning, including deep learning, are rapid; a considerable number of studies have reported that deep-learning methods outperform classic machine-learning algorithms. Therefore, it is a reasonable next step to examine deep-learning models and investigate factors potentially beneficial to performance improvement. Although high classification performance was achieved across the various conditions tested, the limited sample size, e.g., for MAS scores 3 and 4, means that there is no guarantee that our method would perform well on a larger dataset. Having more data would help the model be more accurate and generalizable. We expect to address such issues by examining larger sample sizes and applying deep learning to enhance the current work.