Article
Peer-Review Record

Recognizing Teachers’ Hand Gestures for Effective Non-Verbal Interaction

Appl. Sci. 2022, 12(22), 11717; https://doi.org/10.3390/app122211717
by Zhenlong Peng 1,2,*, Zhidan Yang 1,2, Jianbing Xiahou 1 and Tao Xie 3
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 5 August 2022 / Revised: 31 October 2022 / Accepted: 14 November 2022 / Published: 18 November 2022
(This article belongs to the Special Issue Technologies and Environments of Intelligent Education)

Round 1

Reviewer 1 Report

This is a promising result, but the paper raises many questions, which it should also answer if it is to be publishable.

1. It is not clear why the paper chooses the classroom setting. The paper is simply about hand posture recognition and could be used in any setting. Restricting the scope to the classroom reduces the potential of interesting readers with other applications in mind and adds unnecessary content to the introduction, where a justification of hand gesture recognition in the classroom is described (this section doesn't really explain the need to automate this activity either). What is more, the main datasets used in the study are not from a classroom! The paper would be stronger if the focus on classrooms was removed.

2. The experimental methodology is not clearly described. What were the hyper-parameter settings for the various algorithms? For example, what level of dropout was used? What was the train/validation/test split, and which split are the results presented for? What method was used to optimise the hyper-parameters? Did all methods have the same optimisation regime? Is figure 7 correct? It appears that YOLO3 has the best performance. Also, figure 9 does not match the description in the text - it doesn't have rows, for example. It would be better to see accuracy results here rather than some example images (I'm also not sure why the images have wire-frame lines marked on them).

3. Which part of the proposed method is responsible for the improved performance? Is it the CNN architecture, or the pre-processing?

4. What is the state-of-the-art for the datasets used? Are there any publications that use the same datasets? What was their performance? The comparison summarised in table 4 claims to test robustness against different backlight conditions, but admits that the Marcel dataset does not have much variation in background. Also, in section 4.4.2 I'd like to know how the normalisation affected the performance of YOLO. CNNs are known to be robust to changes in light and contrast, so I would expect YOLO to perform well too.

Across the paper, the references are not formatted correctly - appearing as a number without square brackets.

 

Author Response

Dear Reviewer,

Thanks for your comments. Please refer to the attachment for a detailed reply.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors have presented a method that improves the gesture recognition algorithm by proposing an improved preprocessing approach and CNN architecture. The manuscript reads well and is interesting. The abstract is structured and clear. A clear summary of the problem background is provided, along with a comparison of different techniques and approaches. However, there are some major changes that need to be made to improve the quality of the manuscript. These mostly pertain to the experimentation and benchmarking of the results.

 

1) Referencing issues:

The referencing format in the following lines needs to be corrected: 48, 49, 56, 59, 71, 72, 75, 76, 80, 84, 106, 110, and so on...

Lines 42-43 need a reference.

Reference not found in line 39.

References are needed for equations 5-15.

 

2) Highlight the improvements made to the DCNN model and explain the CNN architecture in Figure 5 clearly. Additionally, the original steps in the preprocessing should also be highlighted better.

3) Explain lines 298-300. It is not clear what the authors wish to convey.

4) Figure 2 shows the preprocessing only. It would be good to add how this links to the rest of the approach (e.g., the CNN architecture).

5) The evaluation that is provided in sections 4.3 and 4.4 is weak. Consider the following:

i) Explain how the ROC was used for YOLO and the other algorithms; please explain clearly how this was achieved. The ROC curves in figures 7 and 8 need to start at 0 on the y-axis. From my understanding of the ROC, YOLO v3 performs the best and your method is actually not working well.

ii) The evaluation metrics of the proposed method should still include classification performance metrics such as accuracy, etc. Note that this evaluation of the proposed method should be done before you start comparing it with other methods. It is also worth adding a confusion matrix that shows the number of true positives, the number of true negatives, etc.

iii) Provide details about the number of classes and what they are.

iv) The comparison of proposed methods and other regression algorithms can be shown by the number of accurate predictions done. How will you determine that the YOLO that was used in your evaluation was able to predict the hand gesture correctly or accurately? 

v) Lines 516 - 526 need to be substantiated. Where are the results of the experiment? I cannot see any difference between the three algorithms.

6) English and Grammar - For example, lines 66, 71, 307.
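To make the suggestion in points 5(ii) above concrete, a confusion matrix and overall accuracy can be computed directly from true and predicted labels. The gesture class names and label sequences below are purely hypothetical illustrations, not data from the manuscript:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, classes):
    """Count (true, predicted) label pairs into a nested dict:
    cm[true_class][predicted_class] -> count."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in classes} for t in classes}

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical gesture labels, for illustration only.
classes = ["fist", "point", "palm"]
y_true = ["fist", "fist", "point", "palm", "palm", "point"]
y_pred = ["fist", "point", "point", "palm", "fist", "point"]

cm = confusion_matrix(y_true, y_pred, classes)
acc = accuracy(y_true, y_pred)
```

The diagonal entries of `cm` are the per-class true positives; off-diagonal entries show which gestures are confused with which, which is exactly the information the table in the manuscript currently omits.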

 

Author Response

Dear Reviewer,

Thanks for your comments. Please refer to the attachment for a detailed reply.

Author Response File: Author Response.docx

Reviewer 3 Report

1. Introduction

I don't really understand what the hand gesture recognition is used for. Previously I was thinking about effective gestures for teaching, but as I go through the paper, this research is more about finding an effective algorithm to recognize hand gestures more accurately. Is it a preliminary study for developing an application that is operated using hand gestures? It is suggested to explain in more detail the "purpose" of this research. What is the "practical" contribution of this research result?

2. Experiments

Who are the participants of the experiments conducted in this research? Please explain the population, the sampling methods, the reason for choosing the participants, and the profile of the participants.

3. I could not find the discussion section. Please add a discussion section comparing this research's results with other research results in detail.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I'm afraid I don't think that all of my comments have been satisfactorily addressed.

I'm happy with the explanation of why the classroom setting is chosen.

There are minor issues with the hyper-parameter setting description:
Keras and TensorFlow are not architectures and should not be part of the hyper-parameter description. Validation data is mentioned, but the paper does not describe how it was used - what were the hyper-parameter search space and method?
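What is being asked for here is a stated search space and search method. As a minimal illustration of what such a description implies, the sketch below runs a grid search over a hypothetical space of dropout and learning-rate values; `validation_score` is a dummy stand-in for training the network and scoring it on the validation split, not the paper's actual procedure:

```python
import itertools

# Hypothetical search space; a real description would report the actual one.
search_space = {
    "dropout": [0.2, 0.3, 0.5],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def validation_score(dropout, learning_rate):
    """Stand-in for training the model with these hyper-parameters and
    evaluating it on the validation split. Peaks at (0.3, 1e-3) by design."""
    return 1.0 - abs(dropout - 0.3) - abs(learning_rate - 1e-3)

# Exhaustive grid search: evaluate every combination, keep the best.
best_config, best_score = None, float("-inf")
for values in itertools.product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    score = validation_score(**config)
    if score > best_score:
        best_config, best_score = config, score
```

Reporting the grid (or random/Bayesian search budget) and the validation metric that was maximised would fully answer the question.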

The response to my review mentions other papers that use the same dataset, but they are not mentioned in the paper. For example, Ewe et al., "Hand Gesture Recognition via Lightweight VGG16 and Ensemble Classifier", Applied Sciences, achieve better performance on NUS than this paper, but that result is not mentioned.

The description for figure 9 is still not sufficient. It is not enough to just say "the detection performance of the proposed framework was desirable". I would expect to see something like "We collected XX images in total and the algorithm correctly classified Y% of them".

The references have been corrected.

Author Response

Dear reviewer,

        Thanks for your comments. Please refer to the attachment for detailed response.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors have made some improvements to the manuscript. However, there are still concerns about whether the evaluation was done properly. Please evaluate the proposed method first and provide evidence of implementation. For example, how many epochs did it take for convergence? Any graphs regarding the convergence? How many replications were done to calculate the training time, or the running time? There is a lot of missing information that needs to be addressed.

Author Response

Dear reviewer,

Thanks for your comments. Please refer to the attachment for detailed response.

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors have not revised, or at least responded to, my second and third requests/questions.

-Experiments

Who are the participants of the experiments conducted in this research? Please explain the population, the sampling methods, the reason for choosing the participants, and the profile of the participants.

-I could not find the discussion section. Please add a discussion section comparing this research's results with other research results in detail.

Author Response

Dear reviewer,

Thanks for your comments. Please refer to the attachment for detailed response.

Author Response File: Author Response.doc

Round 3

Reviewer 1 Report

The authors have not addressed my previous comments satisfactorily. It is not acceptable to exclude other work that has achieved better performance on the same dataset due to 'different parameter setting'. A full justification for this new proposed method is needed in light of the fact that there is already a near perfect solution to the problem.

I'm also not satisfied with the new results reported in section 4.4.3. I at least want to know how many images this was tested on and how many of those images resulted in a correct response.

Author Response

Dear reviewer,

Thanks for your comments. Please refer to the attachment for detailed response.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors have not addressed the major flaws and concerns in the paper. 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 4

Reviewer 2 Report

There are still a lot of issues with the manuscript:

1) The authors have not improved their implementation section adequately. Since TP, FP, etc. are discussed, where is the confusion matrix? I cannot understand how many images were used for the testing. Which datasets were used for which accuracy metrics? The table does not show any information regarding this.

2) The ROC curve does not look good; I am not able to understand why the full ROC curve is not shown. It looks like a cropped ROC curve, where the orange line in Fig 8 even has a value less than zero. Was this hand drawn?

3) The accuracy value mentioned in the abstract (approximately 94%) does not seem to be discussed in the results and evaluation. Where does this value come from?

4) Too many metrics are used in the evaluation, and I am not able to understand how these metrics are calculated. Where are the equations? There is no proper definition of what metrics such as recognition accuracy and localisation accuracy represent in this particular context.

5) Since this is a comparison paper, the benchmark algorithms that are discussed always seem to have very poor performance compared to the proposed algorithm. Is this possible? How many images were used in the benchmark datasets? 

6) How balanced was the dataset used? How balanced was the test dataset used? With an imbalanced dataset, it is possible to get very high classification accuracy.

7) Are the authors able to replicate these results? Any cross-validation methods? Any changes to the train/test split to see if the accuracy changes? If the authors claim that the algorithm has exceptional performance, it needs to be thoroughly validated before such a claim can be made. I do not see this in the manuscript.
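The cross-validation asked for in point 7 amounts to partitioning the data into k disjoint folds and repeating the train/test cycle with each fold held out in turn. A minimal sketch of the index bookkeeping (dataset size and k are hypothetical; a real run would train the model on each split):

```python
def k_fold_indices(n_samples, k):
    """Partition sample indices 0..n_samples-1 into k disjoint,
    near-equal folds (earlier folds absorb any remainder)."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validation_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = k_fold_indices(n_samples, k)
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds)
                     if f_i != i for j in fold]
        yield train_idx, test_idx

# Hypothetical example: 10 samples, 5-fold cross-validation.
splits = list(cross_validation_splits(10, 5))
```

Reporting the mean and standard deviation of accuracy across the k held-out folds (rather than a single split) would substantiate the performance claim.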

Author Response

Dear reviewer,

Thanks for your comments. Please refer to the attachment for detailed response.

Author Response File: Author Response.doc
