Review Reports
- Jiyuan Song,
- Aibin Zhu * and
- Yao Tu
- et al.
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Round 1
Reviewer 1 Report
- It feels like the LSTM NN is used blindly here. Is there a reason to suppose that time sequences of features are appropriate for such a sequence-based classifier model? As shown in Fig. 4, the feature signals have an impulse nature, and the characteristic width of these impulses is hundreds of samples. Did you configure your LSTM NN taking this temporal specificity of the signals into account? The LSTM model is therefore of big interest, but it remains unrevealed here.
- "Bayesian Information Standard (BIC)" - maybe "standart" should be replaced with "criterion"?
- Figure 5(a) - the local minima are suspicious here. It looks like there are some "cursed" numbers of hidden neurons. Is this really the result of averaging over a sufficient dataset?
- Starting with Fig. 5(b) - why do the time intervals of some motion patterns have significantly higher priority than others?
- I don't like how the illustration of the human body looks. Please try to improve it a bit.
- This research is devoted to exoskeleton control through the intentions of a pilot. But none of the subjects wore an exoskeleton, and moreover they did not generate patterns of intended motions; the patterns were those of the real movements themselves. Therefore, the EMG data were not collected under the conditions of a wearable exoskeleton. The question is how such results may be transferred to the more applied situation of a wearable exoskeleton.
Author Response
First of all, thank you very much for your comments and suggestions on the article, which have greatly helped to improve it. We have addressed each of your suggestions in the manuscript, one by one.
- I added the reason for choosing LSTM in Section 3.2 of the article: because human movement patterns occur over time series, not at isolated time points, an LSTM neural network is in theory well suited to EMG signal processing (a minimal sketch of such a sequence classifier follows this list).
- I replaced "Bayesian Information standard" with "Bayesian Information Criterion".
- I added the following to the Discussion section of the article: in this study, the reason for the lower accuracy of the training results might be that the dataset is not very large. For example, the local minima in Figure 5(a) make some numbers of hidden neurons appear "cursed". In our next work, we therefore intend to add more data to the training set to mitigate this problem.
- I added the following to Section 3, "Experiments and results", of the paper: because the EMG acquisition system we designed is wearable, it is not restricted by the movement environment. Data collection is not limited to a laboratory setting, so the subjects wore the EMG acquisition equipment and moved freely outdoors. The duration of each movement mode was random and appropriate.
- I have improved the illustration of the human body.
- I added the following to the Discussion section of the article: the wearable EMG acquisition device designed in this paper is compact and can be integrated with a lower-limb exoskeleton. Recognizing human movement patterns from the collected EMG information provides the exoskeleton with the basis for switching between different movement modes while the human body wears it. The research in this article is the first step in judging human movement patterns and intentions for exoskeletons. Next, we intend to combine this wearable EMG acquisition device and its data-processing methods with a lower-extremity exoskeleton, aiming at intelligent switching of the exoskeleton's movement modes.
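As a minimal sketch of the kind of sequence classifier referred to in the first response above (not the paper's exact network; the feature dimension, window length, and hidden size are placeholders):

```python
import torch
import torch.nn as nn

class EMGSequenceClassifier(nn.Module):
    """Classify a window of sEMG feature vectors into one of 7 motion patterns."""
    def __init__(self, n_features=5, hidden=20, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, time, n_features)
        _, (h_n, _) = self.lstm(x)         # h_n: (1, batch, hidden), final step
        return self.head(h_n[-1])          # logits over the 7 motion patterns

model = EMGSequenceClassifier()
window = torch.randn(8, 200, 5)            # 8 windows of 200 time steps each
logits = model(window)
print(logits.shape)                        # torch.Size([8, 7])
```

Each input window is a short time series of feature vectors, so the LSTM's recurrence can exploit the impulse-like temporal structure the reviewer points out.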
For more detailed changes, please see the attachment. Thanks again for your help!
Author Response File:
Author Response.doc
Reviewer 2 Report
In this work, the authors focus on the selection of 6 human movement features estimated from the sEMG signal. Neural networks are employed in the experiments in order to assess the effect of these features on the task of recognizing 7 common human motion patterns in daily life.
Although the method seems promising, the authors are encouraged to clarify the following concerns.
Major:
- Back propagation neural network
+ This term is used repeatedly in the manuscript, but it is not accurate. Back propagation is an algorithm for training (i.e. optimizing) neural networks, including LSTM. As I understand, the authors mean vanilla networks (or multilayer perceptrons) in this scenario. Therefore, this term needs to be revised.
- Removing Mean Absolute Value (MAV) instead of Root Mean Square (RMS) value
+ Since they are similar in shape and phase, why do the authors choose to remove MAV instead of RMS? (MAV's computational cost is lower than RMS's.)
- Dataset in experiments
+ The statistical description of the experimental dataset is not clear (how many total samples, how many split subsets, samples per subset, etc.). The separation of subsets is also ambiguous: the authors mention splitting the data for training and verification, and the reported accuracies are on the verification set, but there are "test data" and a "test curve" in Fig. 6.
- Network in Section 3.1.1
+ Why is the purelin function used for the output instead of softmax, as in a typical classification network? Since the output is not a probability, the loss function should be added to the manuscript to clarify the objective of the network (a minimal illustration is sketched after this list).
- Figs. 5(b), 7, 8(b), 10: the corresponding confusion matrices are encouraged to be added in order to provide more details.
- A description of the trend term removal in Section 3.1.2 is necessary.
- LSTM in Section 3.2: Why do the authors modify/select the activation for the input layer?
+ [If only LSTM is used] Typically, an LSTM has multiple activations for the input, each with its own meaning (related to the gates); their modification is thus not encouraged.
+ [If LSTM is only a part of the network] Please describe the whole architecture clearly.
- The accuracies of the vanilla network and the LSTM are incomparable, since they have different subset splitting: 70%-30% for the vanilla network (Section 3.1) and 90%-10% for the LSTM (Section 3.2). Therefore, the authors cannot conclude that LSTM is better (lines 267-268, lines 323-325, lines 417-418).
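For illustration only (this is not the manuscript's network; the logits below are made-up numbers for 7 classes), a classification output typically pairs softmax with a cross-entropy loss:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class.
    return -np.log(probs[label])

logits = np.array([1.2, -0.3, 0.8, 2.1, -1.0, 0.4, 0.0])  # hypothetical logits
probs = softmax(logits)            # now a valid probability distribution
loss = cross_entropy(probs, 3)     # loss w.r.t. true class index 3
print(probs.sum(), loss)           # probs sum to 1.0
```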
Minor:
- The authors' names need to be normalized in citations: A. J. Young et al. => Young et al.; C.D. Joshi et al. => Joshi et al.; etc.
- "trained an algorithm" (lines 36, 46): the thing which is trained is model(s), not algorithm
- Redundancy/duplication: "Multiple... minutes." (lines 97-101)
- "The selection... classifier." (line 109) is not clear, it should be revised.
- Eqs. (1), (2) and (3) should be synchronized. For example: i - N (instead of n) + 1 in eq. (1); eq. (1) uses i but eq. (2) does not; xj in line 128 is not in eq. (3). (A consistent form is sketched after this list.)
- Is there any special reason for choosing power-of-2 numbers (256 ms, 1024 ms,...) as the length of sliding window?
- The description of the vanilla network is duplicated many times (lines 198-202, lines 217-221, lines 239-243). The duplicates should be removed and replaced by a short sentence referring to the original paragraph (Section 3.1.1).
- The contents repeated from the Introduction should be removed from the Discussion (lines 357-379).
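For example, assuming eqs. (1) and (2) are the standard windowed MAV and RMS (N the window length, x_j the j-th sample; the manuscript's actual definitions may differ), a synchronized notation could read:

```latex
\mathrm{MAV}_i = \frac{1}{N}\sum_{j=i-N+1}^{i} \lvert x_j \rvert ,
\qquad
\mathrm{RMS}_i = \sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} x_j^{2}}
```

Here every feature is indexed by the window end i and summed over the same index j, so all three equations can share one convention.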
Author Response
First of all, thank you very much for your comments and suggestions on the article; they are very helpful for improving its quality. We have revised the manuscript in response to each of your comments, one by one.
1. Your suggestion is correct. I changed the "BP neural network" in the article to "multilayer perceptron".
2. I added the following to the Discussion section of the article: during the establishment of the feature-parameter dataset, we found that the mean absolute value (MAV) and the root mean square (RMS) of the data are remarkably similar, so one of them can be deleted for dimensionality reduction. Although the computational cost of the RMS is generally considered higher, we deleted the MAV instead of the RMS. The reason is, on the one hand, that we wanted to change the conventional thinking and use the RMS; on the other hand, because the amount of data processed this time is not very large, the impact on data processing is small. Admittedly, the modest amount of data also contributes to the lower final motion-pattern recognition rate and consumes a certain amount of resources. In future processing of large datasets, we will improve this and reconsider the feature selection when establishing the feature-parameter dataset. (The feature computation and dataset split are sketched after this numbered list.)
3. Regarding the dataset in the experiment, a supplementary description has been added to the text:
Multiple subjects wore the system, and data were collected for each movement mode. Before the experiment, the skin that needs to contact the electrode pads was carefully treated and cleaned with medical alcohol, and the surface EMG data were checked to verify that the electrode pads were positioned correctly. The acquisition time for each movement mode was kept at 3 min and guaranteed to be continuous; after completing one movement mode, the subject rested for 15 minutes. 70% of the dataset is used for training and 30% for testing; within the 70% used for training, 90% is used for training and 10% for verification.
4. The activation function of the output layer in Section 3.1.1 is softmax; it was written as a purelin function through carelessness when finishing the article. Thank you for your comment; indeed, softmax is what a typical classification network uses. I have corrected this in the text.
5. I added the confusion matrices for Figure 5(b), Figure 7(a), Figure 7(b), Figure 8(b), Figure 10(a) and Figure 10(b) in the appendix of the article.
6. The description of the trend term in Section 3.1.2 has been deleted.
7. I added the following to the Discussion section of the article: usually, the activation function of the LSTM neural network is Tanh. In Section 3.2.2, we modified the activation function of the input layer of the LSTM in order to look for a way to improve the recognition accuracy. The accuracies obtained with Sigmoid and ReLU as the input-layer activation function were therefore compared with the accuracy obtained with Tanh. The results show that Tanh as the input-layer activation function gives better accuracy, and we have verified this fact in practical applications.
8. I have modified the descriptions of the multilayer perceptron and the LSTM neural network: 70% of the dataset is used for training and 30% for testing; within the 70% used for training, 90% is used for training and 10% for verification.
9. For the other details, I made revisions one by one according to your suggestions.
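As a minimal sketch of the windowed MAV/RMS features and the nested 70/30-then-90/10 split described in responses 2, 3 and 8 (the window length, step, and signal here are placeholders, not the paper's actual settings):

```python
import numpy as np

def windowed_features(x, win=256, step=64):
    """Slide a window over one sEMG channel and compute MAV and RMS per window."""
    feats = []
    for start in range(0, len(x) - win + 1, step):
        w = x[start:start + win]
        mav = np.mean(np.abs(w))          # mean absolute value
        rms = np.sqrt(np.mean(w ** 2))    # root mean square
        feats.append((mav, rms))
    return np.array(feats)

rng = np.random.default_rng(0)
emg = rng.standard_normal(10_000)         # stand-in for one channel of sEMG
features = windowed_features(emg)

# Nested split: 70% train / 30% test, then 90/10 train/verification inside the 70%.
n = len(features)
idx = rng.permutation(n)
train_all, test = idx[: int(0.7 * n)], idx[int(0.7 * n):]
cut = int(0.9 * len(train_all))
train, verify = train_all[:cut], train_all[cut:]
print(len(train), len(verify), len(test))
```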
For more detailed changes, please see the attachment. Thanks again for your help!
Author Response File:
Author Response.doc
Round 2
Reviewer 2 Report
The authors have revised their manuscript according to the suggestions. However, there are still a few details that the authors should clarify or modify. Please note that the use of languages other than English is discouraged in the responses.
1) The authors misunderstood my suggestion "6. A description of the trend term removal in Section 3.1.2 is necessary." => I meant that more details should be added about how the trend term is removed (i.e. how it is performed), not that the description should be deleted.
2) Selection of RMS: the reason that the dataset is not large is acceptable. The authors should remove the other reason, "to change the conventional thinking", because it is inappropriate in a research paper.
3) Regarding the LSTM, the authors wrote that "Usually, the activation function of the LSTM neural network is Tanh." => This is unclear because there are 5 activations in a typical LSTM (3 Sigmoids, 2 Tanhs). Please clarify which Tanh is meant (I guess the Tanh applied to the cell state and then multiplied by the sigmoid of the input to produce the output?).
4) The term "BP network" still appears in Section 3.1; please revise it.
5) Section 3.2: gradient disappearance => gradient vanishing
*** Checking by a native English speaker would help to significantly improve the writing of the manuscript.
Author Response
First of all, thank you very much for your comments on the article; they are a great help in revising it!
- Regarding the description of the trend-removal operation, I added the following in Section 3.1.2: in order to remove the trend term from the data, the least squares method was used to fit the trend of the original data within each time window; the trend was then subtracted from the original data, and finally the data were normalized by their maximum and minimum values. (A sketch of this procedure follows this list.)
- I removed the other reason, namely "to change the conventional thinking".
- Indeed, there are 5 activation functions in a typical LSTM neural network, including 3 Sigmoids and 2 Tanhs. I added the following in Section 3.2.2 and in the Discussion section: in order to compare the accuracies of Sigmoid and ReLU as LSTM activation functions with that of Tanh, we modified the Tanh that is applied to the cell state and then multiplied by the sigmoid of the input to produce the output. Different activation functions were used in the LSTM, including Sigmoid, ReLU and Tanh.
- Based on your suggestions, I changed "BP network" to "multilayer perceptrons" in section 3.1.
- Based on your suggestions, I changed "gradient disappearance" to "gradient vanishing" in Section 3.2.
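A minimal sketch of the detrending step described in the first response above, assuming a first-order (linear) trend fitted by least squares within each window (the signal is synthetic; the trend order used in the paper may differ):

```python
import numpy as np

def detrend_and_normalize(window):
    """Remove a least-squares linear trend, then min-max normalize to [0, 1]."""
    t = np.arange(len(window))
    slope, intercept = np.polyfit(t, window, deg=1)  # least-squares line fit
    detrended = window - (slope * t + intercept)     # subtract the trend term
    lo, hi = detrended.min(), detrended.max()
    return (detrended - lo) / (hi - lo)              # min-max normalization

# Synthetic window: a drifting baseline plus noise, standing in for raw sEMG.
rng = np.random.default_rng(1)
t = np.arange(1024)
raw = 0.01 * t + rng.standard_normal(1024)
clean = detrend_and_normalize(raw)
print(clean.min(), clean.max())  # 0.0 and 1.0 after normalization
```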
We have comprehensively revised the English expression. For more details, please see the attachment. (In the revised article, the yellow-highlighted parts are the changes made in this round, and the green-highlighted parts are the changes from the previous round.)
Thanks again for your help!
Author Response File:
Author Response.doc