1. Introduction
Detecting and predicting human intentions by collecting and analyzing body signals are among the main goals in human–robot interaction [1]. These challenging tasks and their relevance in daily-living applications are gaining importance, for instance due to the spread of collaborative robots (cobots) for human–robot cooperation. An accurate and real-time interpretation of the motion intention could ease the achievement of effective human–machine coordination strategies [2] for interactive robotic interfaces or diagnostic systems [3].
For a non-invasive detection of body signals, several kinds of sensors can be adopted, such as accelerometers [4], electroencephalography (EEG) [3], or surface electromyography (sEMG) [2,5,6]. In recent years, investigations on the suitability of wearable sensors for the pattern recognition of human movements have been widely conducted [2,3,7], also evaluating the effect of sensor positioning on the acquired data [8,9]. Wearable sensors assure non-invasive analyses and can be fully integrated with pre-existing systems or commercially available devices [1]. In addition, they allow assessing body signals or motion properties (such as acceleration and velocity) to reconstruct an observed movement [10], overcoming potential inter- and intra-subject anatomical variability that could affect the measurement quality [11].
Since movements are subject-dependent and body signals are sensitive to lack of repeatability [2], the complexity of the human intention prediction task increases further in specific scenarios, such as the clinical environment, where pathological subjects can behave according to peculiar or unpredictable motion patterns. Within this context, laboratory-based optical systems for movement analysis are widely adopted for the periodical monitoring and assessment of the stroke condition during rehabilitation [12], since they enable the measurement of multiple bio-signals, recognized as useful both in detecting pathological symptoms and in improving the rehabilitation healing rate [13]. A thorough knowledge of the natural behavior and motion patterns expected in healthy subjects therefore becomes fundamental to correctly assess the subject's condition.
Among all the possible movements affecting the activities of daily living, the reaching task undoubtedly plays a crucial role [14], given the importance of its functional aim [15].
Since a defined movement can be performed according to many different strategies [7,16], predictive models and machine learning algorithms are particularly suitable for analyzing the signals with the purpose of predicting the human movement intention [2]. Developing effective working methodologies for the processing of body signals therefore becomes necessary, and machine learning techniques (MLT) can cope with the small amounts of data typical of this kind of application.
The literature provides several examples of MLT applied to human movement analysis: in 2014, Romaszewski et al. applied Linear Discriminant Analysis (LDA), Support Vector Machine, and k-Nearest Neighbor algorithms to identify natural hand gestures [17]. In 2015, Li et al. discriminated eight different movements of the upper limb exploiting the Random Forest (RF) algorithm for the analysis of optoelectronic data [18], whereas in 2020, Robertson et al. applied quadratic discriminant analysis to data acquired with a Kinect camera (© Microsoft Corporation, Redmond, WA, USA) to discriminate between healthy subjects and patients affected by cerebellar ataxia [19].
Furthermore, deep learning (DL) techniques have been applied to the study of gait data through recurrent neural networks (RNN), deep neural networks (DNN), or dedicated DL approaches [20]. In particular, after detecting the motion through electromyography signals [21] or wearable sensors [22,23], DL techniques have been applied to analyze peculiar movement features. In 2016, Illias et al. [24] applied a NN model to discriminate the gait patterns of healthy children and children affected by autism. In the human–robot interaction (HRI) field, Liu et al. [25] applied DL techniques combined with data acquired through 3D body skeletons, 2D RGB images, and optical flows to identify human intentions and build a framework for human–robot interaction; a similar application was also developed in 2019 by Li et al. [26]. In the clinical field, DL techniques have also been applied to detect and study hand movement patterns and their changes, and to identify possible diseases. Several researchers have measured this phenomenon using computer vision techniques, such as gesture recognition, analyzing the signals of optical markers placed on the hand joints [27]; others apply DL algorithms to measure hand parameters detected with a vision system [28,29].
Focusing on the analysis of the reaching movement in both healthy subjects and post-stroke patients, this study aimed to compare the performance of the LDA and RF MLT in: (i) predicting the subject's intention of moving towards a target or a specific direction (intention prediction), and (ii) detecting whether the subject is behaving according to a healthy or a pathological pattern and, in the latter case, whether the damage affected the right or left hemisphere (health condition detection). The analyzed data were captured with wearable electromagnetic sensors, and only a first section of the acquired signals was exploited for the prediction and detection processes. Further analyses investigated the possibility of detecting with which arm (left or right) the motion is performed by the subject, and the sensitivity of the evaluated MLT to variations in the length of the evaluated signal section.
Compared to previous works in the scientific literature, this paper presents novel aspects in the methodological approach. In particular:
- (i) The same dataset is exploited to perform different sets of analyses, assessing the suitability and performance of two MLT with respect to different purposes;
- (ii) The performance of LDA and RF are compared for the specific analysis of the reaching movement;
- (iii) With particular reference to the previous work of Archetti et al. [30], the analysis of a dataset which also includes data collected from post-stroke patients allowed, on one side, a more robust evaluation of the performance of LDA and RF as intention predictors and, on the other side, enabled the implementation of a new level of analysis, evaluating the performance of LDA and RF as health condition detectors.
2. Materials and Methods
An experimental campaign was designed, and the study was approved by the CPP Ile de France 8 ethical committee of Hôpital Ambroise Paré (ID RCB 2009–A00028-49, 19 June 2009). The study was conducted in accordance with the guidelines of the Declaration of Helsinki.
2.1. Participants
For the experimental campaign, a convenience sample of 31 subjects was recruited: ten healthy subjects (6 females; mean age: 51 years, range [29;71] years; 1 left-handed) as control group, and 21 patients who had experienced a first ischemic or hemorrhagic stroke with cortical and/or subcortical lesions (9 females; mean age: 48 years, range [20;71] years). Among the patients, three subjects were left-handed and 18 right-handed.
For the pathological subjects, tests were performed at least 3 months after the botulinum toxin injection, to ensure the absence of lingering effects of the toxin. Exclusion criteria were: (i) shoulder pain, (ii) previous shoulder pathologies, (iii) multiple or bilateral cerebral lesions, (iv) acute algoneurodystrophy, (v) cerebellar involvement or comprehension deficit, and (vi) a range of motion of the upper limbs that does not allow the reaching movement. Within the subset of pathological subjects, ten patients presented a right hemisphere damage (RHD), and the remaining 11 a left hemisphere damage (LHD).
Inclusion criteria for the control group were: (i) age over 18 years, and (ii) no previous or current orthopedic or neurological pathology of the upper arm.
2.2. Acquisition Protocol
All testing sessions were performed in the same environmental conditions, i.e., during the morning and in the controlled environment of the Laboratoire de Neurophysique et Physiologie at the Hôpital R. Poincaré, Garches (France). During each session, a preliminary trial was first carried out to familiarize the subject with the procedure; then, the subject was asked to perform six repetitions of a unilateral sitting reaching movement: three cycles with the left arm and three with the right arm. As described in Robertson et al. [19] and Archetti et al. [30], the initial condition consisted of the subject with the hand resting on a red cross marked on the table plane in line with the shoulder, the forearm in mid-prone position, the elbow flexed to 90°, and the humerus positioned along the vertical direction. In each repetition, the subject was asked to touch the target identified by the operator among a set of pre-defined positions, which depict combinations of three directions (left, center, and right), two heights (high and low), and two distances (proximal and distal). Although the subject was unaware of it, the target sequence submitted by the operator was standardized: close-middle (CM), far-internal (FI), high-external (HE), far-middle (FM), close-external (CE), high-internal (HI), close-internal (CI), far-external (FE), and high-middle (HM). The subjects were instructed to touch the target with a provided pointer and return to the starting position, performing the movement with open eyes. No instructions were given on accuracy and speed, other than touching the target at a comfortable speed.
2.3. Experimental Setup
The subject was seated on a chair adjusted in height so that the table surface was at navel level. A wide strap fixed the subject's trunk to the chair back.
A trained operator instrumented the subject with a wrist splint, provided with a pointer to simulate an extended index finger, and with four electromagnetic sensors. The sensors were located on the (i) acromion, (ii) upper third of the humerus, (iii) wrist dorsum, and (iv) manubrium. During the acquisitions, an electromagnetic tracking system (Polhemus Space Fastrak, Colchester, VT, USA) was used: the system provides the position and orientation of each sensor as timestamped vector triplets (X, Y, Z) and (α, β, γ), respectively, at a frequency of 30 Hz. The declared root mean square (RMS) static accuracy and resolution of the system are 0.8 mm and 0.0005 cm/cm of range, respectively, for the position measurements, and 0.15° and 0.025° for the orientation measurements.
As schematically depicted in Figure 1, nine targets were located within three planes, each of them orthogonal to the table surface and passing through one of the following directions: the parasagittal straight line emanating from the subject's shoulder for the middle direction, and the straight lines positively and negatively inclined at 45° with respect to the middle plane, for the internal and external directions, respectively.
The distance between target and subject was defined with respect to the length of the equivalent anatomical upper limb, meant as the distance between the position of the sensor located at the acromion and the tip of the pointer. Two distances were evaluated: (a) close, corresponding to 65% of the total upper limb length, and (b) far, equal to 90% of the upper limb length. Considering the height parameter, the six targets in the low configuration were placed at a height of 70 mm from the table level, whereas the three targets in the high configuration were located above the corresponding distal targets, at the same height as the acromion from the table surface.
2.4. Data Treatment
Linear position and orientation provided by each sensor were imported and processed in the MATLAB (© The MathWorks, Inc., Natick, MA, USA) environment. Data processing was performed using an Intel® (© Intel Corporation, Santa Clara, CA, USA) Core™ i7-8565U processor (1.80 GHz) on a machine running the Windows 10 Home (© Microsoft Corporation, Redmond, WA, USA) operating system.
The acquired signals were initially trimmed to the actual motion section to define a dataset of coherent data, comparable among subjects and among trials. To detect the starting and ending points of the reaching movement, the absolute value of the hand velocity was analyzed. The absolute value of the hand position was computed as the vectorial composition of the signal components along the three directions X, Y, and Z, and the hand velocity was numerically evaluated according to a custom two-point derivative approximation [30]. This signal was then filtered to remove noise with a fourth-order zero-phase low-pass Butterworth filter, according to literature indications [31,32]; a cut-off frequency of 3 Hz was adopted [33]. The subject resting condition was defined as the mean value of the first and last ten acquired data samples, each corresponding to a time interval of 0.33 s. The starting and ending points of the reaching movement were automatically detected by a custom-developed code as the first and last time instants, respectively, in which the absolute value of the first derivative of the position exceeds an imposed threshold. This threshold, initially set to 9 × 10⁻³ mm/s, was iteratively reduced by 1 × 10⁻³ mm/s, and the estimation of the starting and ending points was updated, until the variance of the velocity signal from the beginning of the recording to the identified starting point, and from the identified ending point onwards, fell below the threshold itself.
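As an illustration, the segmentation logic described above can be summarized by the following sketch. It is a minimal Python/NumPy/SciPy reconstruction (the original processing was implemented in MATLAB); the function name, the array layout, and the fallback behavior when no valid threshold is found are assumptions made for illustration, while the filter settings and threshold values follow the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 30.0  # sampling frequency of the electromagnetic tracker, Hz

def segment_reaching(hand_xyz, fs=FS, thr_init=9e-3, thr_step=1e-3):
    """Estimate start/end samples of the reaching movement from hand positions.

    hand_xyz: (N, 3) array of hand-sensor positions (mm) for one trial.
    Returns the (start, end) sample indices of the detected motion section.
    """
    # Absolute hand position as the vector composition of the X, Y, Z components
    pos = np.linalg.norm(hand_xyz, axis=1)

    # Two-point derivative approximation of the velocity, in mm/s
    vel = np.abs(np.diff(pos) * fs)

    # Fourth-order zero-phase low-pass Butterworth filter, 3 Hz cut-off
    b, a = butter(4, 3.0 / (fs / 2.0), btype="low")
    vel = filtfilt(b, a, vel)

    thr = thr_init
    while thr > 0:
        above = np.flatnonzero(vel > thr)
        if above.size:
            start, end = above[0], above[-1]
            # Accept the estimate once the velocity outside the motion section
            # is quiet, i.e., its variance stays below the current threshold
            if np.var(vel[: start + 1]) < thr and np.var(vel[end:]) < thr:
                return start, end
        thr -= thr_step
    return 0, len(vel) - 1  # fallback (assumption): keep the whole trial
```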
To normalize the data in amplitude, anthropometric quantities were computed for each subject from the positions of the sensors on hand, arm, shoulder, and trunk. In each trial, the relative distances hand-to-arm, arm-to-shoulder, and shoulder-to-trunk of the sensors were calculated during the subject resting phase. The average values of these nine quantities for each subject were then computed and adopted as reference values for the normalization process.
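A minimal sketch of the reference quantities for the amplitude normalization is given below, assuming that the nine quantities are the three components of each of the three relative sensor positions; the function name and the array layout are hypothetical.

```python
import numpy as np

def resting_reference_offsets(hand, arm, shoulder, trunk):
    """Mean relative sensor positions during the resting phase of one trial.

    Each argument is an (N, 3) array of positions over the resting samples.
    Returns a (3, 3) array: hand-arm, arm-shoulder, and shoulder-trunk offsets,
    i.e., nine reference quantities for the amplitude normalization.
    """
    return np.vstack([
        (np.asarray(hand) - np.asarray(arm)).mean(axis=0),
        (np.asarray(arm) - np.asarray(shoulder)).mean(axis=0),
        (np.asarray(shoulder) - np.asarray(trunk)).mean(axis=0),
    ])

# Per subject, these per-trial references are averaged over all trials and then
# used as reference values to normalize the amplitude of the position signals.
```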
Finally, the second derivative of the sensor positions was also numerically computed applying the custom two-point derivative approximation, to simulate data coming from fictitious accelerometers placed on the subjects. The resulting signal was then filtered applying the same low-pass Butterworth filter previously described (fourth-order zero-phase, cut-off frequency of 3 Hz).
To identify a feature set for the implementation of the machine learning algorithms, four signals were considered: (i) the linear position, computed from the sensor position (SP) components; (ii) the modulus of the sensor velocity, i.e., the first derivative of SP; (iii) the modulus of the sensor acceleration, i.e., the second derivative of SP; and (iv) the angular position, computed from the measured Euler angles. For the purpose of feature extraction, only a section of the overall signal was analyzed as the observation window (OW). Two different approaches were adopted for the evaluation of the OW size: (i) a subject- and trial-dependent strategy, based on a custom window which computes the observation time from the information on the motion length of the specific trial, and (ii) a generalized approach, based on an average window which exploits the dataset of all the available data, from all subjects and trials, to compute a fixed OW, as sketched below.
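The two OW strategies can be contrasted with the short sketch below; the function names and the use of the mean motion duration for the average window are illustrative assumptions, since the exact statistic used to derive the fixed OW is not restated here.

```python
import numpy as np

def custom_window(motion_len, fraction=1/7):
    """Subject- and trial-dependent OW: a fraction of this trial's motion length (in samples)."""
    return int(round(motion_len * fraction))

def average_window(all_motion_lens, fraction=1/7):
    """Generalized OW: a fixed size derived from the whole dataset of motion
    lengths (here, illustratively, from their mean)."""
    return int(round(np.mean(all_motion_lens) * fraction))
```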
For the implementation of the machine learning algorithms, the minimum, maximum, and root mean square values of linear and angular position, velocity, and acceleration were evaluated as features from the source signals. For each subject and trial, all the computed features were rescaled to the range [−0.80, +0.80]. The LDA and RF algorithms were applied for the data treatment, and only data from the sensors placed on the hand and arm were used in both analyses, since a first set of preliminary evaluations suggested that the data gathered from the other sensors provided negligible contributions. For both the intention prediction and the health condition detection, the algorithms were trained using a subset of randomly selected data. According to the results of a preliminary analysis, the size of those subsets was set to 85% and 90% of the analyzed dataset, respectively; these values proved to provide a reasonable compromise between computation time and prediction performance for the respective purposes. The remaining data were then used for the testing phase. For both algorithms, training time and prediction time were also computed.
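A minimal sketch of the feature computation over one OW is reported below; the rescaling is shown as a per-vector min–max mapping to [−0.80, +0.80], which is an assumption, since the exact scope of the rescaling (per feature or per trial) is not detailed here.

```python
import numpy as np

def window_features(signal_ow):
    """Minimum, maximum, and RMS of one source signal over the observation window."""
    s = np.asarray(signal_ow, dtype=float)
    return np.array([s.min(), s.max(), np.sqrt(np.mean(s ** 2))])

def rescale(values, lo=-0.80, hi=0.80):
    """Linearly map a vector of feature values to the range [lo, hi]."""
    v = np.asarray(values, dtype=float)
    v_min, v_max = v.min(), v.max()
    if v_max == v_min:
        return np.full_like(v, (lo + hi) / 2.0)
    return lo + (v - v_min) * (hi - lo) / (v_max - v_min)
```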
2.4.1. Intention Prediction
To evaluate the intention prediction, twenty tests were designed, combining different configurations of data setup parameters, used features, and outputs. As setup parameters, the OW evaluation strategies and lengths were considered. For the features, four conditions were evaluated, corresponding to features extracted from different source signals: (i) sensor position (P) and velocity (V), (ii) sensor position, velocity, and Euler angles (E), (iii) sensor position only, and (iv) sensor acceleration (A).
Table 1 synthesizes the conditions of each test. Considering the OW evaluation strategy, the first ten tests applied the average window method, and the remaining ten the custom window approach. Focusing on the OW length, two different cases were evaluated: a size equal to 1/7 or 1/10 of the total time extent of the actual motion section. For all the tests, the main output was the expected position of the target that the subject wants to reach; for 16 tests, an additional output was the left (L) vs. right (R) distinction, i.e., the limb with which the motion was expected to be performed.
Tests were performed on the complete dataset of all the subjects, and the results were compared with those provided by the analysis of the subsets of healthy and pathological subjects only. For the comparison, the prediction accuracy of the LDA algorithm was calculated as described by Nuzzi et al. [34,35]; for the assessment of the RF accuracy, the out-of-bag (OOB) method was adopted. The results were averaged over 200 consecutive tests.
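The evaluation loop can be reproduced, for instance, with scikit-learn counterparts of the two classifiers (the original analysis was carried out in MATLAB). The 85% training fraction, the 40-tree forest, and the 200 repetitions follow the values reported in the text; everything else, including the function name, is an illustrative assumption.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def evaluate_intention_prediction(X, y, n_repeats=200, train_frac=0.85, n_trees=40):
    """Average LDA test accuracy and RF out-of-bag (OOB) accuracy over repeated random splits."""
    lda_acc, rf_oob = [], []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_frac, random_state=seed)

        lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
        lda_acc.append(lda.score(X_te, y_te))  # accuracy on the held-out data

        rf = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed).fit(X_tr, y_tr)
        rf_oob.append(rf.oob_score_)           # out-of-bag accuracy estimate

    return np.mean(lda_acc), np.mean(rf_oob)
```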
2.4.2. Health Condition Detection
To detect the health condition of the subject, eight tests were designed, investigating different combinations of data setup parameters, included features, and outputs. The average window approach was adopted as OW evaluation strategy for all the tests. Two combinations of source signals were considered for the feature evaluation: (i) sensor position and velocity, and (ii) sensor position, velocity, and Euler angles.
Table 2 collects the conditions applied in each test. The main output of all the tests was the detection of a healthy or pathological pattern and, for the latter, the further identification of the damage location, i.e., left or right hemisphere (LHD or RHD, respectively), for a total of three prediction classes. The additional output of the left (L) vs. right (R) distinction was also evaluated in four tests, increasing the number of prediction classes to six for these tests.
Tests were performed on the complete dataset of all the subjects, and the prediction accuracy of both the LDA and RF algorithms was evaluated according to Nuzzi et al. [34,35]. For the assessment of the RF algorithm accuracy, the OOB approach was also used. Results were averaged over 200 consecutive tests.
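For the health condition detection the same evaluation loop can be reused; only the class labels change. A hypothetical encoding of the three- and six-class outputs could look as follows.

```python
def health_label(condition, arm=None):
    """Class label for the health condition detection.

    condition: one of "healthy", "LHD", "RHD" (three-class output).
    arm: optionally "L" or "R"; when given, the label also encodes the limb
    performing the movement (six-class output).
    """
    return condition if arm is None else f"{condition}_{arm}"

# Three classes: "healthy", "LHD", "RHD".
# Six classes:   "healthy_L", "healthy_R", "LHD_L", "LHD_R", "RHD_L", "RHD_R".
```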
4. Discussion
From a methodological perspective, the assessment of the observation window (OW) according to the custom window strategy should be preferred to the average window approach, because it grants a personalized solution, flexible to subject- and trial-dependent peculiarities. On the other hand, the average window strategy offers a more robust approach, being grounded on a dataset that, if properly populated, can provide probabilistically significant values and statistically sound indications for the OW definition. Moreover, in practical applications the time length of a naturally performed movement is typically an unknown quantity, since it cannot be foreseen before the actual execution of the movement itself. According to these considerations, the average window approach is particularly suitable for integration into systems requiring real-time dynamics, such as human–robot collaboration in working environments, whereas the custom window strategy better suits applications demanding high accuracy and result customization rather than speed, such as diagnostic evaluations in the clinical environment.
This dualism is reflected in the dual approach adopted for the analysis of the reaching task and of the current dataset. In fact, the prediction of the subject's intention of moving towards a target among a set of possible choices can be easily contextualized in daily industrial practice: for instance, in assembly operations a cobot could ease the task execution by foreseeing the worker's intention of performing an action and, consequently, approaching or moving the necessary components. The health condition detection, on the contrary, could support the physician in discriminating the potential pathologies that afflict the subject or in quantitatively assessing the state of the rehabilitation process.
Comparing the overall performances of the linear discriminant analysis (LDA) and random forest (RF) algorithms (see Figure 2 and Figure 6), LDA reaches slightly better results for the intention prediction. Considering the whole dataset, the accuracy of RF was 79.22% at best (test IP_7), compared to 82.42% for LDA, whereas the average accuracies were 59.91% for RF and 62.19% for LDA. On the contrary, RF proved to be particularly suitable for the health condition detection, reaching accuracies of over 90% in all the tests.
Focusing on the contribution of the OW evaluation strategy to the performance of LDA and RF as intention predictors, the results collected in Table 3 emphasize that the tests performed adopting the custom window method, an OW of 1/7, and the whole dataset of healthy and pathological subjects obtained better results, as Figure 3, Figure 4 and Figure 5 depict at a glance. This behavior can be expected considering that a fixed OW does not allow compensating for possible intra-subject or intra-trial velocity variations, and cannot guarantee a minimum amount of travelled space within the analyzed portion of the signal. This aspect can be particularly relevant for the analysis of pathological subjects, since their affected movements often result in a slower motion. As a consequence, a feature based on time hinders the potential of the method, whereas features based on spatial criteria could provide more information in this sense. Coherently, tests performed with longer portions of the signals, i.e., 1/7 of the actual motion length, provide better results. Although the wider the window, the better the obtained result, a proper maximum limit should be imposed on the OW length, since the primary aim of the analysis is the prediction of the motion evolution, whereas too wide OWs would translate into a classification of the movements instead.
In the health condition detection, LDA proved less sensitive to variations in the OW length: as Table 4 describes, the differences between the mean accuracies (around 1%) are comparable with the SD value.
Relevant differences in the performance of both LDA and RF can instead be detected when comparing tests evaluating the complete dataset with tests performed on the subsets of pathological and healthy subjects only. For example, the RF algorithm presents differences close to 4% and 11% with the two subsets, respectively. The difference with the subset of pathological subjects decreases for tests that use only acceleration as features. The LDA algorithm, as reported in Table 3, presented similar results.
Considering the performance of the applied MLT as intention predictors with respect to the evaluated features, the OW length had a decisive influence on the final accuracy of both the LDA and RF algorithms in tests using features extracted from the Euler angles, sensor position, and velocity: the accuracy improved by at least 10% in the tests with the OW length set to 1/7. The improvement decreased to about 5% (SD close to 1%) when acceleration-related features were included. This behavior can be partially justified considering that the acceleration signals are not measured but computed by double numerical derivation, a process which introduces noise into the signal and affects the performance. As Figure 2 depicts, the tests including acceleration-related features (IP4, IP5, IP9, IP10, IP14, IP15, IP19, IP20) presented lower accuracies than those without their contribution; better results were obtained when the number of classes was lower, i.e., when the distinction between the right and left limb performing the reaching was neglected.
For both LDA and RF, the addition of features computed from the data of the shoulder and trunk sensors does not significantly affect the obtained accuracy. In fact, the primary role of the trunk sensor was to validate the hypothesis that the elastic band used to fix the trunk to the chair worked as an effective constraint, preventing unintentional movements of the subject. For the shoulder sensor, instead, the variation of performance introduced by the additional features (around 1%) was comparable with the SD amplitude and cannot be distinguished from the variability of the results due to the random extraction of samples for the creation of the training and testing datasets. Despite this, the computational burden increased.
The set of included features also affected the results of the health condition detection. As Table 4 describes, better results were achieved including the Euler angle-related features, i.e., in tests HD3, HD4, HD7, and HD8. The best results were obtained with RF in test HD3 (see Figure 6), in which features extracted from the Euler angles were included and a wider OW was considered (OOB accuracy of 97.00%, SD 0.0039, and AVG accuracy of 98.21%, SD 3.38 × 10⁻¹⁶). For the LDA algorithm, the accuracy ranged from 65% in HD1, with 3 classes (healthy, LHD, RHD), to 83% in HD8, which involved 6 classes by discriminating also between the left and right limb. As the confusion matrices in Figure 7 depict, the algorithm preserved its discrimination performance regardless of the inclusion of the additional classes.
Finally, the time factor should be analyzed. Focusing on the training phase, the average time required to train the RF algorithm was considerably higher than that of LDA in both the intention prediction (3.94 and 0.09 s, respectively) and the health condition detection (1.5 and 0.05 s, respectively). Nevertheless, the training time of RF is related to the number of trees in the forest, and higher accuracies could be achieved with wider forests. Besides, the improvement in accuracy tended to reduce as the number of trees increased, i.e., the accuracy profile converged toward an asymptotic condition; a set of preliminary analyses identified 40 and 20 trees as acceptable limits for the intention prediction and the health condition detection, respectively (see Figure 8). For the testing phase, the LDA algorithm revealed shorter times than RF, for both the intention prediction and the health condition detection; the mean estimation times were 3.42 × 10⁻⁴ and 4.65 × 10⁻³ s for LDA and RF, respectively, in prediction, and decreased to 3.62 × 10⁻⁵ s for LDA and 6.67 × 10⁻⁴ s for RF, in detection.
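The choice of the forest size can be verified with a simple sweep of the OOB accuracy against the number of trees, stopping when the marginal gain flattens out; the sketch below is an illustrative scikit-learn version, with the candidate sizes and the stopping tolerance chosen arbitrarily.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def smallest_adequate_forest(X, y, sizes=(5, 10, 20, 40, 80, 160), tol=0.005):
    """Return the forest size beyond which the OOB accuracy gain drops below tol."""
    oob = []
    for n in sizes:
        rf = RandomForestClassifier(n_estimators=n, oob_score=True,
                                    random_state=0).fit(X, y)
        oob.append(rf.oob_score_)
        # Stop as soon as the accuracy profile approaches its asymptote
        if len(oob) > 1 and oob[-1] - oob[-2] < tol:
            return sizes[len(oob) - 2], np.array(oob)
    return sizes[-1], np.array(oob)
```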
Although the analyzed dataset included a remarkable amount of data, further acquisition campaigns could improve the quality of the results. For instance, the data sample could be enlarged in terms of subjects' age and pathological conditions (such as a similar elapsed time from the stroke event), allowing for stratified analyses, or improved in quality, e.g., by better balancing the presence of right- and left-handed subjects, allowing for functional comparisons. Besides, further data acquisition campaigns could focus on the experimental setup, for instance including new sensors. Adding accelerometers and/or inertial measurement units (IMUs) would make it possible to gather actual acceleration data, allowing an experimental comparison with the results of the tests carried out with the acceleration features. Finally, to better assess the goodness of the models, further data analyses could also estimate indexes useful for the identification of type II errors, such as the F1-score or the G-index.