Article
Peer-Review Record

sEMG Signals Characterization and Identification of Hand Movements by Machine Learning Considering Sex Differences

Appl. Sci. 2022, 12(6), 2962; https://doi.org/10.3390/app12062962
by Ruixuan Zhang, Xushu Zhang *, Dongdong He, Ruixue Wang and Yuan Guo
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 28 November 2021 / Revised: 24 February 2022 / Accepted: 10 March 2022 / Published: 14 March 2022

Round 1

Reviewer 1 Report

This work examines the effect of using a human subject’s sex as an input feature to machine learning models when performing EMG-based gesture classification. The authors first perform a statistical analysis to establish sex-based differences in EMG signals, then examine differences in gesture classification accuracy, showing greater accuracy for sex-differentiated or sex-labeled datasets across three different machine learning algorithms. There is also some brief examination of motion capture data of subjects’ hands while performing the gestures.

The idea that sex-specific or sex-labeled data allow for higher EMG-based classification accuracy is hardly surprising, due to the significant differences in male/female hand and arm musculature, as well as grip strength (that the authors allude to in [35] and [36]). That said, using sex as a label for EMG classification is not something I have seen in the literature, so the authors’ direct examination of this problem is appreciated, particularly since it was done both directly via the EMG statistics and for classifier performance.

The authors’ initial experimental methods (sensor placement, subject selection, choice of algorithms, signal processing, etc.) seem in line with the existing literature. However, the procedures that follow, and the way some of the results are presented, are troubling.

The statistical analysis is unclear and/or potentially inappropriate. Section 2.3 says “The iEMG was examined by six paired sample t-test of six hand motions with the pairs of the same muscle of the right hand (females and male).” If I am correct in assuming that the pairings here are between male and female subjects, I fail to see how the pairings were made (i.e. how specific subjects or trials were paired). The paired t-test is an intra-subject measure, so unless my interpretation is incorrect, this analysis is inappropriate. A two-sample independent t-test could be used instead, or, given the presumably large sample size in this section, an ANOVA may be more appropriate, incorporating the muscles, and potentially a variety of signal features, as independent variables.
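As a minimal illustration of the independent-samples alternative suggested here, a two-sample test treats the female and male measurements as separate groups rather than pairs (all values below are synthetic, not the study’s data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical iEMG values for one muscle/motion combination:
# ten female and ten male subjects form two independent groups, not pairs.
iemg_female = rng.normal(loc=0.8, scale=0.20, size=10)
iemg_male = rng.normal(loc=1.1, scale=0.25, size=10)

# Welch's independent two-sample t-test (no equal-variance assumption)
t_stat, p_val = stats.ttest_ind(iemg_female, iemg_male, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_val:.4f}")
```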

The description of the classifier training and testing is also somewhat unclear. The text mentions k-fold cross-validation for hyperparameter selection (page 5), which seems to be done by first pooling all subjects’ data together without grouping by subject, but then control and experimental data are selected with a subject-based grouping. It is unclear to me whether k-fold cross-validation was performed on a subject or pooled level, which has an important impact on whether or not the testing phase was done with data from subjects who are already represented in the training data (which would inflate the accuracy).

Furthermore, while optimizing hyperparameters can be appropriate at the algorithm level (using an optimal set of hyperparameters for all training runs of the same algorithm type), Tables 3 and 4 show that different hyperparameters were used for each dataset. If different hyperparameters are used for the different datasets, then it becomes inappropriate to compare the results against each other directly as the authors do in Figure 6. The conclusion that sex-based features are driving the differences is muddled by the differences in hyperparameters used in each dataset. More broadly, I fail to see why this level of hyperparameter optimization was performed at all, since the idea was to look for sex-based differences. Finding a roughly-appropriate set of hyperparameters for each class of algorithm (kNN, SVM, ANN) and holding those constant would remove hyperparameters as a factor. Alternatively, this could be done without any hyperparameter tuning at all, since that is not a factor of concern.

A related point on Figure 6 and the machine learning section is that unlike the statistical comparison section, this section has no error bars. These may be generated by taking the mean and standard deviation of the accuracy of the testing folds if indeed k-fold cross-validation was used for each classifier and dataset. Statistical testing should also be used to check the differences in the accuracy. With neither error bars nor statistical testing on the results in Figure 6, the results cannot be interpreted, particularly since the accuracies are so close to each other.

Separately, the presence of the motion capture portion of the paper is odd. While I appreciate the potential use of motion capture as a way to verify and characterize gestures, the authors do not incorporate it at all into the analysis of the main question of sex-based EMG differences. Indeed, sex is not considered at all in the motion capture sections, and the discussion of the motions is entirely separate from anything else in the paper. I can only conclude that the motion capture portion of this paper is extraneous and should be removed.

In general, this paper presents a premise that is likely valid based on the literature, and that I hoped would be more seriously considered by work on EMG-based gesture classification. However, apart from the initial subject section, data collection, and signal processing, many procedural errors are made in the latter parts of the experiment that render the conclusions invalid at worst, or questionable at best.

Author Response

We thank you for your careful reading and thoughtful comments on the previous manuscript. We have carefully considered your comments in preparing our revision, which has resulted in a clearer, more persuasive, and more extensive paper. Our responses to the comments are summarized in the attachment.


Please see the attachment.


Thank you for all your help.

Author Response File: Author Response.pdf

Reviewer 2 Report

I have the following comments for the authors of "sEMG signals characterization and identification of hand movements by machine learning considering sex differences":

1) It is not clear how the data described in Section 2.5, "Motion data acquisition", was used in the study. Was it used as an additional set of features? Can you provide more details on how the data was used in the identification of hand movements?

2) What was the training and test data configuration? What percentage of the data was used in model training, and what percentage was used for assessing the precision of hand gesture recognition? Was the proportion the same for all models?

3) The description of your second approach is a little confusing. What do you mean by adding a sex label? Was it just added to the dataset and used as an additional feature? Or was it really used as a label in supervised training?

4) I would also ask for clarification of this sentence in the first paragraph of Section 2.2: "On the basis of previous studies, 6 surface electrodes (Ag/AgCl) were applied to 6 muscles of forearm and hand: abductor pollicis brevis (APB), flexor digitorum superficials (FDS), brachioradialis (BRA), flexor carpi ulnaris (FCU), extensor carpiradialis(EC) and extensor digitorum communis (EDC)." What previous studies? Can you point to literature references here?

I have also found the following misspellings/examples of bad formatting:

1) Last sentence of the Introduction: "... have better classification results and higher average prediction accuracy, prediction precision" - what is the difference here between prediction accuracy and prediction precision?

2) The sentences "Figure 1 detailed the experimental procedure for sEMG signal acquisition under six different hand-gestures." (Section 2.2) and "In addition, the Grid Search method were used in order to find and realize the best configuration of k-NN and SVM classifiers, so as to achieve the best performance." (Section 2.4) are not fully grammatically correct.

3) There is a problem with the commas inside the round brackets in the caption of Figure 1: (RM、TB、TI、FF、HC、FK)

4) The caption of Figure 4 is on a different page than the figure itself.

Author Response

We thank you for your careful reading and thoughtful comments on the previous manuscript. We have carefully considered your comments in preparing our revision, which has resulted in a clearer, more persuasive, and more extensive paper. Our responses to the comments are summarized in the attachment.


Please see the attachment.


Thank you for all your help.

Author Response File: Author Response.pdf

Reviewer 3 Report

In this study, the authors used machine-learning methods that considered sex differences to identify hand movement patterns in healthy subjects. They acquired surface electromyographic (sEMG) signals from forearm and hand muscles in twenty young individuals. They found significant differences between males and females in the right upper-limb muscles when performing the six gestures. Thus, they used time-domain features, which had been shown in previous studies to perform better than frequency or time-frequency domain features, as input to machine-learning algorithms for the recognition of the six gestures. In detail, two hand-gesture recognition methods that considered sex differences (training on sex-differentiated datasets and adding a sex label) were applied to k-nearest neighbor (k-NN), support vector machine (SVM), and artificial neural network (ANN) algorithms for comparison. Notably, the authors found that considering sex differences improved classification performance. The ANN algorithm with the addition of a sex label performed best in movement classification. Finally, the thumb and index finger motion trajectories were captured with a motion capture system to build a multi-rigid-body model.

When discussing their results, the authors emphasized that the use of movement recognition algorithms that consider sex differences can improve the functionality of prosthetic hands and exoskeletons.

The study is well-conducted. The methods are accurate. The results and the figures are sufficiently clear. There are some minor issues, however, that the authors should better address before the paper could be considered for publication.

-         The last sentence of the Introduction (‘It can be concluded that machine learning methods… prediction precision’) would be more appropriate if inserted into the Discussion. Please modify or delete it.

-         Please better detail in the Introduction the significance of building a hand motion model.

-         Discussion: a brief paragraph summarizing the main study aims and results at the beginning of the Discussion, before starting with the subheadings, would improve the paper's readability.

-         I noticed spelling errors several times across the text. In particular, there is no space between two words, or there is a double space between consecutive words. Please correct.

Author Response

We thank you for your careful reading and thoughtful comments on the previous manuscript. We have carefully considered your comments in preparing our revision, which has resulted in a clearer, more persuasive, and more extensive paper. Our responses to the comments are summarized in the attachment.


Please see the attachment.


Thank you for all your help.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I appreciate the authors’ revisions made from the previous version, and this version is clearly better in some ways. However, there are still serious issues with methodology and results presentation that I believe preclude this work from being publishable. Furthermore, with the sheer number of EMG-based gesture recognition papers already in the literature, I am uncertain whether the originality of this work is significant enough to warrant publication even with the errors fixed.


Regarding the changes to the statistical analysis:

I am glad for the change from t-test to ANOVA, since the former was not an appropriate design. However, the authors’ use of 36 replicates of a one-way ANOVA (presumably one ANOVA per muscle per hand movement) is still inappropriate and unnecessarily inflates the likelihood of Type I error beyond the intended 0.05. The analysis of the iEMG values is a three-factor experiment, with factors of 1) sex, 2) muscle, and 3) movement. Therefore, a single three-way ANOVA is the appropriate omnibus test, with 2, 6, and 6 levels for the three factors. This would be what allows the authors to claim significance on sex within the experimental design. Since sex is a binary variable, a post-hoc test would not be necessary for that factor, but for the other factors, post-hoc tests (e.g. Tukey’s test) could be used, which would properly control the intended alpha level.

Regarding data pooling:

I am realizing now that the authors do not specify the window over which the signal features were calculated, so I assume it was done for varying window sizes depending on the length of the gesture “event”. If this is correct, it is contrary to the typical overlapping “sliding window” approaches used for EMG feature generation, where features are generated for constant-size windows (the variable T, or analogously, N, in Table 1). See: Micera, Silvestro, Jacopo Carpaneto, and Stanisa Raspopovic. "Control of hand prostheses using peripheral information." IEEE reviews in biomedical engineering 3 (2010): 48-68.

My concern about pooling data between subjects would apply after the sliding window is properly used. Because windows are typically so close to each other in time, it would be easy to achieve high gesture recognition accuracy in a k-fold cross-validation because much of the data that surrounds a given point in the test set is already in the training set. For example, if the sliding window was 0.5 seconds long, with 0.4 seconds of overlap between windows, and 10% holdout was used, it would be extremely likely that much of the signal from the test set is actually already in the training set, due to the window overlap. That would be true even if the overlap was less – say, 0.25 s – because each data point in the test set is “surrounded by” (and very similar to) data points in the training set. This problem would be alleviated by grouping data by subject and/or trial before further considering cross-validation groupings. For example, in a 10-trial experiment, a cross-validation with 10% holdout would hold data out one trial at a time, rather than pooling and then randomly selecting. However, all this assumes that the authors were using the standard sliding window feature generation approach, which it seems they are likely not – pointing to an even more fundamental problem in the current work. I suppose there could be an argument made for why features should be calculated with varying window sizes, but then the authors would have to defend (or at least present) the choice of start and end points for the gestures, and further explain why such a method is superior to the traditional approach.
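The grouped splitting described here can be enforced with scikit-learn’s GroupKFold, which guarantees that no subject’s windows appear in both the training and test folds (a minimal sketch with synthetic data; the subject, window, and feature counts are hypothetical):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(2)
# Hypothetical windowed features: 20 subjects x 60 windows, 24 features each
n_subjects, n_windows = 20, 60
X = rng.normal(size=(n_subjects * n_windows, 24))
y = rng.integers(0, 6, size=n_subjects * n_windows)   # 6 gesture labels
groups = np.repeat(np.arange(n_subjects), n_windows)  # subject ID per window

gkf = GroupKFold(n_splits=10)
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    # No subject (and hence no overlapping window) appears in both sets
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```

Replacing `groups` with trial identifiers gives the leave-one-trial-out variant mentioned above.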

Regarding classification algorithms and hyperparameter tuning:

I agree with the authors’ response that some tuning is almost certainly necessary for any practical results, so I’d amend my previous comment about that. Furthermore, the explanation for when cross-validation vs single-split testing was used was informative, and the new Figure 5 is helpful, and does explain the difficulty with cross-validation on the later results. However, given that the crux of the paper is in Figure 5 rather than the hyperparameter tuning, it seems to me that a much better approach would be to reverse when cross-validation and single-split testing was used. A small hold-out set could be used to tune hyperparameters, and then the rest of the data could be used as a cross-validation set so that each bar on Figure 5 could contain multiple data points, allowing for error bars and statistical testing. The use of single-split testing and the lack of error bars on results is a common issue in machine learning papers, but nonetheless must be addressed for the authors’ key results in both the text and figures of page 11 to be meaningfully interpreted.
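The suggested arrangement (tune hyperparameters on a small hold-out set, then cross-validate the remaining data with those hyperparameters fixed) might look like the following sketch, where the data are synthetic and an SVM stands in for any of the three classifier types:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Hypothetical windowed features: 300 samples, 24 features, 6 balanced classes
X = rng.normal(size=(300, 24))
y = np.tile(np.arange(6), 50)

# A small portion is held out and used only for hyperparameter tuning
X_tune, X_eval, y_tune, y_eval = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3).fit(X_tune, y_tune)

# The remaining data is then cross-validated with the hyperparameters fixed,
# yielding one accuracy per fold, and hence a mean and error bar per classifier
scores = cross_val_score(SVC(**grid.best_params_), X_eval, y_eval, cv=10)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```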


Regarding the level of impact of this work:

As I mentioned at the beginning of this review, I remain concerned about the “significance of content” of this paper, in the sense of whether or not it is sufficiently impactful for the field. There are many EMG-based gesture recognition papers in the literature. This paper is potentially impactful because it tries to confirm the importance of using sex as an indicator for EMG-based gesture recognition, something that I imagine many people have thought might be important, but none have specifically confirmed for this application (that I have seen). It seems to me that in addition to fixing the previously-mentioned methodological errors, a more impactful version of this paper would delve deeper into the nuances of the paper’s primary hypothesis. Rather than simply saying “sex differences matter for EMG gesture recognition,” further exploration could consider what the physiological drivers of such a difference are. Between the introduction and the discussion, there seems to be just a single sentence that alludes to what these “sex differences” are (towards the end of page 2), and it considers kinematics and EMG rather than underlying physiology. If the authors wish to have a more impactful paper, it would be helpful to consider some of these physiological differences (e.g. fat, musculature, weight, skin properties) and how they cause the observed differences, and at least discuss them, if they cannot be retroactively measured for this subject pool.

Author Response

We are grateful for your careful reading and thoughtful comments on the previous draft. We have considered your comments carefully in preparing our revision, which has resulted in a clearer, more persuasive and extensive paper.

Our responses to the comments are included in the attachment. Please see the attachment.

Thanks again for all your help.

Author Response File: Author Response.pdf

Reviewer 2 Report

I accept the authors’ response and have no further comments.

Author Response

Thank you for reading and for all your help.

Reviewer 3 Report

The authors have addressed all the questions raised in my previous review. 

Round 3

Reviewer 1 Report

I appreciate the authors’ additional work on this second edit. Over the previous two rounds of editing, this paper has, on the whole, improved significantly in quality, in terms of scientific methodology as well as the presentation of the work.

It rather pains me at this point, then, to say that one of the key checks I asked for has seemingly invalidated the key result of the paper, leading me to believe that this paper should either be withdrawn, or change one of its main conclusions for resubmission as a more sound, but less impactful, paper.

Regarding changes to statistical analysis:

The changes made here were appropriate, and I appreciate the effort made to correct the pairwise t-test performed in the initial work. The way the authors present the work (three-way ANOVA in the main text, and full comparisons in the appendix) is fine. In the text of Section 3, the authors describe the 3-way and 2-way interaction results, but do not mention the main (single-factor) effects – all of which are also significant. This should be added to the text.

Regarding data windowing and pooling:

The addition of the information on the 250 ms window was necessary and appreciated, and the authors’ explanation of how they processed the data removes my concern about pooling.

Regarding impact and further literature on physiology:

The authors’ inclusion of the additional background is good and has strengthened the introduction of the work.

Regarding classification accuracy:

Now this is the main point of difficulty. Once again, I applaud the authors’ perseverance in improving their methodology in this latest edit – at this point frankly showing more rigor than many machine learning results in the literature. I agree with the authors about the difficulties of balancing hyperparameter tuning with cross-validation, and can see the conundrum presented there. However, the newly-added error bars for the prediction accuracy plot confirmed what I had feared from the beginning: the authors’ claim of sex differences in classification accuracy does not appear to be substantiated.

If we examine the figure as presented in the review response, the first thing I will note is that the error bars only extend upwards, and should be edited to extend symmetrically downwards as well. If that is done, then we would see that the error bars between male and female results would overlap in the kNN and ANN cases, and perhaps overlap in the SVM case. Significance may also be checked directly, likely using a t-test. However, statistical tests aside, we can already see that for kNN and ANN (and perhaps for SVM), there is no effect of sex – nullifying claims made about the utility of including a sex factor in EMG-based gesture recognition accuracy (the last sentence of Section 4.4, a sentence in the middle of Section 5, and a sentence towards the end of the abstract). This weakens the paper’s position to just confirming the significant effect of subject sex on the EMG signal itself (rather than on classification). Ironically, this was something shown to already be present in the literature based on the references added to the introduction in this latest draft.
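The direct significance check suggested here amounts to comparing per-fold accuracies between the two groups with a two-sample t-test (the accuracy values below are made up purely for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold test accuracies for male vs. female models
# (one classifier type; values invented to illustrate overlapping error bars)
acc_male = np.array([0.91, 0.89, 0.93, 0.90, 0.92])
acc_female = np.array([0.90, 0.92, 0.88, 0.91, 0.89])

t_stat, p_val = stats.ttest_ind(acc_male, acc_female, equal_var=False)
print(f"p = {p_val:.3f}")  # distributions like these overlap: p well above 0.05
```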

So after two revisions, this paper is certainly more technically sound and better presented, but in this review process, we seem to have discovered an issue with a primary conclusion. In my opinion, if the journal is willing to accept a publication with a null result (no difference in classification accuracy due to the inclusion of subject sex) as a key finding, then the authors should change the wording in the aforementioned locations in the text to clearly state this null result and proceed with resubmission. I am personally not opposed to the publication of null results, and really believe they should be published more often. Alternatively, if we go by the typical conventions of publication (primarily publishing positive results or refutations of previous positive results) we may simply accept that this paper is, unfortunately, not worth publishing. This is an issue that I will leave to the authors and editors to decide.

In any case, I applaud the authors’ willingness to work with the data and the manuscript, and with my continual poking at details. Even with – or perhaps especially because of – the discovery of the likely null result, this process is a credit to the authors’ willingness to follow the science, and I respect them for the effort.

Author Response

We thank you for your careful reading and thoughtful comments on the previous draft. We have completed the additions to the statistical analysis section and revisions to the cross-validation section.

In addition, we thank you for your expert comments on the algorithms and statistical analyses. These suggestions have made the manuscript more accurate in its methods, more comprehensive in its experiments, and more convincing in its results.

The responses to all your comments are in the attachment. Please see the attachment.

Thank you for all your help.

Author Response File: Author Response.pdf

Round 4

Reviewer 1 Report

Thank you for the edits. The differences in percentages for adding a sex label do not appear to be large, and I somewhat question the operational relevance of a 1% to 2% difference in accuracy for applications like this. Nonetheless, I suppose it may matter in some settings, and at this point, the rest of the paper is fairly thorough and appropriately done.
