1. Introduction
Electromyography (EMG) is a biological signal whose amplitude fluctuates when exercising or contracting muscles. Many researchers have used this property to research and develop devices that are aimed at expanding and recovering human motor function [
1,
2,
3,
4,
5]. Due to its easy design, which does not need a dynamics model and any physical parameters and only uses data, machine learning models have been used in many studies including motion control for artificial hands and gesture recognition using classifiers, and control of walking aids and assistance suits by predicting joint angles, joint angular velocities, or joint torque using regressors [
2,
3,
4,
5]. Linear models such as logistic regression and support-vector machines were first used around 2000, with an emphasis on improving classification performance by the feature extraction method such as mean absolute value, waveform length, and short-time Fourier transform [
5,
6,
7,
8]. However, as classification performance significantly improved with the development of deep learning [
9] that occurred in 2012, research was also conducted to improve classification performance by changing the configuration of the deep neural network [
10,
11,
12]. However, improving classification performance it is limited by the study of the feature and machine learning model alone. Therefore, methods other than feature-extraction and machine learning models are required to improve classification performance.
Data normalization is one of the methods for improving the classification performance of machine learning models and is used in fields such as imaging and biometrics [
13,
14,
15]. Methods that are often used include min–max normalization [
15,
16], which normalizes the value range of the dataset from 0 to 1; and z-score, which normalizes the dataset mean to 0 and standard deviation to 1 [
15,
17]. Even in the field of EMG, the classification performance of models is improved by normalizing signals and features with z-score and min–max normalization [
16,
17,
18,
19,
20]. Although the EMG fields use normalization such as maximum voluntary contraction (MVC) [
21] or maximum voluntary isometric contraction (MVIC) [
22] to enable motor analysis and motor performance evaluation between muscles and subjects, normalization is hardly used in EMG-based motion prediction research. Normalization is thought to be rarely used in motion prediction research for two possible reasons. The first reason is the need to measure the reference value (e.g., min, max, mean, or standard deviation of each EMG channel) to carry out the calibration. It can take 30 s–3 min to use the application, depending on the measurement method. The second reason is that reference values such as the max and mean of each EMG channel fluctuate that are due to various factors such as muscle fatigue, electrode position, and fluctuations in skin impedance [
23,
24,
25,
26,
27]. Therefore, a reference value, once measured, cannot be re-used. This reduces the practical applications of normalization and makes it unsuitable for real-time processing (i.e., online processing). Therefore, we aimed to devise a normalization method that does not require calibration (i.e., measurement of reference values) and that is suited for real-time processing to enable normalization to be used as a means of improving the prediction performance of machine learning models.
We propose a normalization method that uses the sliding-window [
28] and z-score normalization [
15,
17] shown in
Section 2.1. The z-score is a simple normalization method that sets the dataset mean to 0 and standard deviation to 1. Compared to min-max normalization, which uses the minimum and maximum values in the entire dataset, the z-score uses the mean and standard deviation of the entire data, making it less susceptible to outliers. However, the z-score is usually not suitable for real-time processing because the entire dataset needs to be used for normalization, which incurs a time delay. Therefore, we considered combining the sliding-window analysis (SWA) that is used for signal analysis with time-varying parameter analysis. SWA involves analyses that use the signal within a specified window length. It is thought that using a signal of a sufficient length can achieve the same effect as the z-score that uses the entire dataset.
In recent years, research has focused toward enabling other people’s machine learning models to exhibit the same classification performance as machine learning models trained from their own data (i.e., improving generalizability) [
29]. Studies that solve the problem of requiring individually specialized machine learning models by measuring a large amount of data for each user because of individual differences in myoelectric amplitudes have been reported. Methods have been proposed to reduce the required amount of own data by using other people’s data with domain adaptation, which technology enables the use of models that were trained in different datasets, even in datasets with different data attributes [
29,
30,
31]. Such methods include geodesic flow kernel (GFK) [
32], correlation alignment (CORAL) [
33], and transfer component analysis (TCA) [
34], which conducts motion prediction using a machine learning model trained from a different dataset after projecting one’s own data onto the data space, and domain adversarial neural networks (DANN) [
35], which is a kind of deep learning method that trains the model to extract common features across different datasets.
The proposed method normalizes the standard deviation of myoelectric amplitude with individual differences, so it is thought that the influence of individual differences in myoelectric amplitude can be reduced, and the classification performance in the model learned from a different subject’s dataset can be improved. Compared to previous research such as DANN, the proposed method trains machine learning models using data other than one’s own data, so it is superior in that the models do not need to be trained for each user.
2. Methods
2.1. Proposed Sliding-Window Normalization
We propose a normalization method using sliding-window analysis (SWA) and z-score to improve the classification performance of the machine learning model and generalizability (i.e., exhibiting the same classification performance as the own machine learning model in the other’s machine learning model). SWA is used for signal analysis and time-varying parameter analysis using the signal within a specified window length [
28]. SWA enables time series analysis by sliding the window so that when a new sample is obtained, the sliding window replaces the oldest sample with the new sample. The z-score is a kind of normalization method that is used to improve the classification performance of models in machine learning. The features are normalized by setting the feature mean to 0 and the standard deviation to 1 [
15,
17].
The proposed method is a combination of these two concepts and is called sliding-window normalization (SWN). As shown in Equation (1), the mean and standard deviation of the samples in the sliding window are set to 0 and 1, respectively.
where
t is the current discrete time,
Lnorm is the sliding window length,
n is the discrete time number in the sliding window,
EMGi is the
ith processed EMG,
SWN EMGt, n-t+Lnorm is the myoelectric signal to which the
n-t+Lnormth proposed method (SWN) is applied at the
tth, and
mt and
st are the myoelectric mean and standard deviation on the
tth sliding window, respectively. We used the “mean” and “std” functions in numpy in Python.
2.2. Comparison Methods
As comparison methods to SWN, applying z-score and none (without normalization).
2.2.1. Z-Score
Z-score sets mean to 0 and standard deviation to 1 on a dataset [
15,
17]. Here, normalizing train and test dataset are based on train data like Equation (2).
where
t is the current discrete time,
d means the train data or test data,
s is the subject number,
EMGt,d,s is the
tth processed EMG on
sth subject,
Z-Scored EMGt,d,s is the myoelectric signal to which
tth z-score is applied at the
tth processed EMG on
sth subject, and
and
are the myoelectric mean and standard deviation on
sth subject’s training data.
2.2.2. None (Without Normalization)
None apply nothing in the normalization process (
Section 2.4).
2.3. Evaluation Method
This paper evaluates three types of items. The first is the improvement of the classification performance of machine learning models when the proposed method (SWN) is applied (
Section 2.3.1), the second is the improvement of generalizability of machine learning models when the proposed method (SWN) is applied (
Section 2.3.2), and the third is the improvement of the classification performance of the machine learning model when the number of subjects of the model that was trained with different data is increased by applying the proposed method (SWN) (
Section 2.3.3).
Two types of machine learning models need to be trained. The first is the model trained with one’s own data (model type of OWN). The second is the model trained with another person’s data (model type of OTHER).
Section 2.3.1 and
Section 2.3.2 used models OWN and OTHER.
Section 2.3.3 used only OTHER. Performance (OWN) involved dividing the data into training and testing datasets and calculating the performance using the model trained with one’s own training data and own test data. Performance (OTHER) involved calculating the performance using the model trained with another person’s training data and one’s own test data. The training data and test data was created by randomly dividing them into a 1:1 ratio every 10 consecutive trials.
2.3.1. Normalization Evaluation
The evaluation of model classification performance improvement by the proposed method (SWN) was conducted by comparing the “classification performance in the model with SWN (OWN or OTHER)” and “classification performance of the model with z-score or None (OWN or OTHER)”.
Improvements in model classification performance that were due to the proposed model will be indicated by higher performance and lower standard deviation in performance. We consider the model classification performance improved and the research objective achieved when the performance (OWN or OTHER) with SWN applied is equal to or greater than the performance (OWN or OTHER) with z-score or None (no normalization).
2.3.2. Generalizability Evaluation
The evaluation of generalizability was conducted by comparing the “classification performance in the model trained with one’s own data (OWN)” with the “classification performance in the model trained with another person’s data (OTHER)”.
Better generalizability is indicated by higher performance and lower standard deviation. Generalizability is considered improved and the research objective achieved when the performance (OTHER) with SWN applied is equal to or greater than the performance (OWN) without normalization (None) applied, and the performance (OTHER) with SWN applied is equal to or greater than the performance (OWN) with SWN applied.
2.3.3. Evaluation on SWN Increased Number of Subject to Train Model
We investigated whether the classification performance of the model could be improved by increasing the number of subjects used for learning the model. We compared a model trained with nine subjects (OTHER) with a model trained with one subject (OTHER). The model trained with nine subjects (OTHER) was considered better if its performance was higher and its performance had a lower standard deviation.
2.3.4. Evaluation Index
The accuracy shown in Equation (3) was used as the evaluation index for the classification performance of the machine learning model. Accuracy is an evaluation index that can simply compare results with multiple targets.
The Wilcoxon rank-sum test was used for significance tests. The significance level was set for the p-value less than 0.05. The “ranksums” function in scipy.states in Python was used for implementation. The “multipletests” function in statemodels.sandbox.stats.multicomp in Python was used for multiple comparisons. We used the Bonferroni correction as the correction method for the p-value.2.3.5. Machine Learning Model
We chose multi-class logistic regression for the machine learning model, which allows multi-class classification and short training time, to easily confirm the improvement by the proposed SWN. The “LogisticRegression” function in scikit-learn in Python was used for implementation. The parameters were as follows: penalty = “none”, class_weight = “balanced”, and max_iter = 6000. This model transforms the feature that is extracted from EMG (Session 2.4) to elbow-joint movement: rest, flexion, or extension (Session 2.6). The number of models trained was calculated by number of subjectCnumber of subject to train.
2.4. EMG Processing
Before training the machine learning model, the measured EMG underwent preprocessing, normalization, feature extraction, and decimation.
Preprocessing involved the application of a low-pass Butterworth filter (3rd order, 500 Hz), decimation (2000 → 500 Hz), and a high-pass Butterworth filter (3rd order, 30 Hz). We used the scipy.signal “butter” function and “sosfilt” in Python for implementation. Normalization involved the application of either SWN, z-score, or no normalization (i.e., None). The window length for SWN was set at between 100 and 500 ms, with 100 ms intervals, because too long a window length decreases the amount of data. To adjust the amount of data, the data near the beginning of the trial are reduced based on the longest window length. In the case of “z-score and None”, the obtained features did not change even when the normalization window length was changed.
Feature extraction involved the calculation of the following six features to investigate window length for normalization and feature-extraction, for which high classification performance was obtained in previous studies: mean absolute value: MAV (Equation (4)) [
6], mean waveform length: MWL (Equation (5)) [
7], and difference root mean square: DRMS (Equation (6)) [
7] as time-dimension features, short-time Fourier transform: STFT [
5], and stationary wavelet transform: SWT [
8] as frequency-dimension features, and combination of all five features: ALL. STFT involved averaging in the 1–70 Hz (low component), 60–100 Hz (middle component), and 100–250 Hz (high component) ranges and concatenating them (Equation (7)). SWT involved time-frequency conversion using Daubechies wavelet 2 (db2) as the mother wavelet and taking the absolute mean of the wavelet coefficient of level 3 frequency (cD3) as the feature.
where
t is the current discrete time, L
feature is the window length of feature extraction, cat(·) is the concatenation function,
freq is the frequency, MeanSpec is the function that outputs the spectrogram averaged in the time direction, and
bin is the number of discrete frequencies in each of the low/middle/high frequencies. A Hanning window with a window length of 64 samples was used for the STFT window function. The functions in scipy.signal in Python were used for implementation. SWT is a method that improves the position invariance, which was a problem of wavelet transforms (WT), and the same mother wavelet as in WT can be used. The “swt” function in the pywt module in Python was used for implementation. The window length of feature extraction was set between 100 and 500 ms, with 100 ms intervals, because too long a window length decreases the amount of data.
Finally, decimation involved reducing the sampling rate of the features from 500 Hz to 20 Hz to reduce the amount of data and shorten the training time of the machine learning model.
2.5. Data Acquisition
2.5.1. Subjects
The ethics board of the Nagaoka University of Technology approved this study according to the Declaration of Helsinki. The subjects were 10 right-handed 22- to 23-year-old men. The subjects were informed about the experiment in advance and consented to participate in the experiment.
2.5.2. Experiment
The positions of the hands, elbows, and shoulders, and the EMG of the forearm and upper arm muscles, were measured as in the experimental environment shown in
Figure 1A. Subjects performed 12 types of elbow single-joint movements with four different start points and end points as tasks (
Figure 1B). A task involves moving from one of the four points (start point) to one of the other three points (end point). Each trial consisted of pre-rest (2 s), task (2.5 s), and post-rest (0.1 s); 36 trials (12 movements × 3) were conducted in one session, for a total of 10 sessions (i.e., 360 trials). The tasks were randomly selected for each session. The following four rules were also set as the success conditions for the tasks.
- (1)
No exercise during the rest period. Elbow joint angular velocity does not exceed 2 deg./s during the rest period.
- (2)
End the task during the task period. End the task between 0–2.5 s.
- (3)
Place the elbow joint angle at the start point (±2°.) during the rest period and at the end point (±6°.) at the end of the task.
- (4)
Place the shoulder and elbow joints within 3 cm of the initial position between the pre-rest and post-rest.
The position data were measured at three locations, namely, the hand, elbow, and shoulder, using Optotrack Certus, (NDI Inc., Waterloo, Canada, sampling rate: 500 Hz). The EMG was measured at the biceps brachii (×4), brachialis (×1), brachioradialis (×1), anconeus (×1), triceps brachii (outside) (×2), triceps brachii (long head) (×2), and extensor carpi radialis longus (×1), totaling 12 locations, by using Trigno Lab Avanti (Delsys, Natick, MA, USA, sampling rate: 2000 Hz)
2.6. Position Processing
Position processing consisted of noise reduction, work space → joint angle space conversion, elbow joint angular velocity conversion, coding, and decimation to obtain the target (rest, flexion, and extension of elbow joint movement) from the positions of the hands, elbows, and shoulders obtained in the subject experiment.
For noise reduction, we applied a zero-phase low-pass Butterworth filter (2nd order 20 Hz). The Python scipy.signal “butter” and “sosfiltfilt” functions were used.
Work space → joint angle space conversion involved the conversion of the positions of the hand, elbow, and shoulder to the elbow and shoulder joint angles using Equation (8).
where atan2d(
y,
x) is the function that calculates the angle [deg.] from the two-dimensional coordinate position,
and
are the hand position [m],
and
are the shoulder position [m], and
and
are the upper arm and forearm length, respectively [m].
Elbow joint angular velocity conversion involved the conversion of the joint angle to the joint angular velocity using Equation (9).
where
is the elbow joint angular velocity at the discrete time
t, and
is the sampling frequency.
Coding involved the conversion of the elbow joint angular velocity to the target using Equation (10). This target was used as the teacher data for model training.
3. Results
Prior to evaluating the classification performance (
Section 3.2) and generalizability (
Section 3.3) of the machine learning model by the proposed SWN method, we investigated the effects of window length for feature extraction and normalization (
Section 3.1). Thereafter, we investigated the effect of the number of subjects used in model OTHER (
Section 3.4). The chance level of accuracy in all results was 33.3% (3 classes: rest, flexion, and extension).
3.1. Effect of Window Length
In model OWN, we investigated the effect of changing window length for feature extraction and normalization on accuracy.
First, the effect of window length for feature extraction on accuracy was investigated.
Figure 2 shows the results of changing the window length for feature extraction between 100 and 500 ms in 100 ms intervals and comparing the proposed SWN (the window length fixed at 500 ms), z-score, and None (no normalization) for the six types of features.
Figure 2A shows that applying SWN improved accuracy as the window length for feature extraction increased. In contrast, with z-score and no normalization, the accuracy decreased as the window length for feature extraction increased (
Figure 2B,C). We surmise that applying SWN improves the classification performance of the model by lengthening the feature extraction window.
Next, the effect of window length for normalization on accuracy was investigated. We changed the window length for normalization between 100 and 500 ms at 100 ms intervals, and the window length for feature extraction was fixed at 500 ms.
Figure 3 shows the results of calculating with all six feature types. The accuracy fundamentally increases with the window length for normalization. We recommend that the window length for normalization should be selected within the range of 200–500 ms, with the window length that maximizes accuracy being selected.
We also investigated whether there was any synergy between normalization and feature extraction window length, but no synergistic effects were observed. This result is shown in
Appendix A.
3.2. Comparison of Normalization Methods
We investigated whether the proposed method (SWN) would improve the classification performance of the models using all the features (ALL). Research in recent years has been conducted to reduce the pre-data measurement of each user by enabling others’ machine learning models to exhibit the same classification performance as one’s own model (i.e., improving generalizability). Therefore, in this study, we compared the accuracy of SWN, z-score, and non-normalization (None) between a model learned from one’s own data (OWN) and a model learned from other subjects’ data (OTHER). The window lengths for normalization and feature extraction were changed between 100 and 500 ms in 100 ms intervals, and the maximum accuracy was compared. The number of subjects used when training model OTHER was set to nine people.
Figure 4 shows a result comparing the accuracy of the model (OWN or OTHER) with SWN, z-score, and no normalization (None). A comparison between SWN_OWN (accuracy: 77.7 ± 2.9%, blue bar) and None_OWN (accuracy: 56.2 ± 7.1%, orange bar) shows that the mean accuracy of SWN_OWN significantly increased by 21.5% (Wilcoxon rank-sum test,
p < 0.001) and its standard deviation of accuracy decreased by 4.9%. A comparison between SWN_OWN (accuracy: 77.7 ± 2.9%, blue bar) and z-score_OWN (accuracy: 77.2 ± 2.4%, green bar) shows that SWN demonstrated the same performance as z-score on the model type OWN (
p > 0.05). These results show that the proposed SWN can improve the accuracy of machine learning model, much like the z-score when using the machine learning model that was trained from one’s own data. Furthermore, a comparison between SWN_OTHER (accuracy: 63.1 ± 5.1%, blue shaded bar) and None_OTHER (accuracy: 41.4 ± 11.3%, green shaded bar) shows that the accuracy of SWN_OTHER significantly increased by 21.6% (
p < 0.01) and its standard deviation of accuracy decreased by 6.2%. A comparison between SWN_OTHER (accuracy: 63.1 ± 5.1%, blue shaded bar) and z-score_OTHER (accuracy: 54.4 ± 8.5%, orange shaded bar) shows that the accuracy of SWN_OTHER significantly increased by 8.8% (
p < 0.01) and its standard deviation of accuracy decreased by 3.4%. These results show that the proposed SWN can improve the accuracy compared to the z-score when using other’s machine learning models. These two results show the effectiveness of the proposed method.
3.3. Generalizability Comparison
We investigated whether the proposed SWN method would improve generalizability (i.e., other’s machine model would exhibit the same classification performance as one’s own model). A comparison was made between the accuracy of model OTHER with SWN applied that was used in
Section 3.2 in
Figure 4 and model OTHER where normalization was not applied.
From
Figure 4, a comparison between SWN_OTHER (accuracy: 63.1 ± 5.1%, blue shaded bar) and None_OWN (accuracy: 56.2 ± 7.1%, green bar) shows that SWN_OTHER had an accuracy that was 6.9% higher (
p < 0.05) and standard deviation of accuracy that was 2% lower. However, a comparison between SWN_OWN (accuracy: 77.7 ± 2.2%, blue bar) and SWN_OTHER (accuracy: 63.1 ± 5.1%, blue shaded bar) shows that SWN_OWN had an accuracy that was 14.7% higher (
p < 0.001) and standard deviation of accuracy that was 3% lower. These results show that the classification performance of the machine learning model was improved by the proposed SWN, but even a model that used a large amount of other’s data did not improve generalizability to the extent that it was similar to the classification performance using one’s own data.
3.4. Number of Subjects to Train Model (OTHER)
It was shown in
Section 3.2 that applying the proposed SWN method could improve the classification performance of not only the model trained from one’s own data (OWN) but also the model trained from other user’s data (OTHER). Therefore, investigating the extent to which the classification performance of the model (OTHER) could be improved by training the model by mixing the data of multiple other subjects. The number of subjects used for training the model was changed from 1 to 9. The window lengths for normalization and feature extraction were changed in the range of 100 to 500 ms in 100 ms intervals, and the maximum accuracy was compared. All the features (ALL) were used for the classification. From
Figure 5, accuracy for feature ALL for cases of proposed SWN and z-score increased with subjects used in model training. In contrast, the accuracy did not either monotonically increase or decrease with respect to the number of subjects for cases without normalization (None). This implies that high classification performance can be achieved with an increase in the number of subjects by applying proposed SWN in cases that use others’ data.
Next, to investigate whether the increase in the number of subjects had a significant effect, we compared cases with either nine subjects (highest accuracy in
Figure 5) and one subject (lowest accuracy in
Figure 5) used in the training of the machine learning model. The results show
p < 0.01 on proposed SWN,
p < 0.001 on z-score, and
p ≥ 0.05 on None (no normalization). It implies increasing the accuracy by normalization (proposed SWN and z-score).
5. Conclusions
In this paper, we proposed a normalization method (SWN) that used the sliding window and z-score to improve the classification performance of devices using EMG. Applying SWN improved the accuracy by 21.5% compared to the case without normalization. Even when a machine learning model that was trained with other’s data was used, the accuracy improved by 21.6% compared to the case without normalization and 8.8% compared to the case with z-score. These results show that the classification performance of the machine learning model could be improved by the proposed method (SWN). Results of investigating the relationship between the standard deviation and features also show that applying the SWN changed the correlation between the standard deviation and features from that of a strongly positive one to a weakly negative one. This was assumed to be one of the factors that improved the classification performance of machine learning models.
The focus of future studies will be on the following two points. First, we found that the proposed SWN has a higher accuracy than the z-score on the model using other people’s data. However, the proposed SWN has almost the same accuracy as the z-score in the case of using own data. To determine whether the proposed SWN is superior to the z-score, we need to analyze in detail whether it is robust to various data attributes such as measurement day, sensor location, and muscle fatigue. Second, we applied the proposed SWN to the classification model. We need to investigate whether the proposed SWN can improve the performance of the regression model that predicts kinematic parameters such as the joint angle and angular velocity.