Peer-Review Record

One Model is Not Enough: Ensembles for Isolated Sign Language Recognition

Sensors 2022, 22(13), 5043; https://doi.org/10.3390/s22135043
by Marek Hrúz 1,*,†, Ivan Gruber 1,*,†, Jakub Kanis 1, Matyáš Boháček 1,2, Miroslav Hlaváč 1 and Zdeněk Krňoul 1
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 31 May 2022 / Revised: 27 June 2022 / Accepted: 28 June 2022 / Published: 4 July 2022
(This article belongs to the Special Issue Sensing Systems for Sign Language Recognition)

Round 1

Reviewer 1 Report

Well-written and interesting paper. Please consider the following points:

* On page 6 you reference yourself, 'as shown promising results on SLR task in the past [47]', whereas [47] contains your own work. An additional reference would benefit this section.

* Page 8: please put the links to GitHub into the references or fix the surrounding formatting issue.

* Page 8: 6 runs of CMA-ES -> do you mean you run the algorithm six times or do you let the algorithm run for six generations? In general, the paper would benefit from you providing the parameters you used to run the CMA-ES (lambda, mu, sigma, etc.)

In general, the paper is well written and a valuable addition to the journal and community. However, I feel it is necessary to highlight why CMA-ES is used and compared in the current settings and why this might benefit the problem domain of SLR. Also, it is at some points unclear to me how and why you describe certain approaches in the state of the art that are then not used in the data analysis part of your paper. Clarifying certain phrases would help the reader connect the dots.

 

Author Response

Dear reviewer,  

Thank you very much for your comments and valuable insight into our manuscript. Each comment raised by you is answered in the following response: 

 

C: On page 6 you reference yourself, 'as shown promising results on SLR task in the past [47]', whereas [47] contains your own work. An additional reference would benefit this section.

R: We added an additional reference. 

 

C: Page 8: please put the links to GitHub into the references or fix the surrounding formatting issue.

R: We fixed the format. 

 

C: Page 8: 6 runs of CMA-ES -> do you mean you run the algorithm six times or do you let the algorithm run for six generations? In general, the paper would benefit from you providing the parameters you used to run the CMA-ES (lambda, mu, sigma, etc.) 

R: We added a clarification. We also added the specific values of the CMA-ES parameters.

 

C: I feel it is necessary to highlight why CMA-ES is used and compared in the current settings and why this might benefit the problem domain of SLR. 

R: We justified the choice of CMA-ES in the text. 

 

 

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors investigate the performance of the I3D, TimeSformer, and SPOTER methods pretrained on the AUTSL dataset and fine-tuned on the WLASL300 dataset, as well as the corresponding models trained the other way around. They also test the impact of applying a neural ensemble, which combines several learning algorithms to obtain better results than the individual procedures, on the isolated sign recognition problem. The improvement (73.87% vs. 73.43%) is quite minor, which makes the title "...all we need" an overclaim and quite misleading. I suggest the authors revise it.
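The ensembling the reviewer summarizes can be illustrated with a minimal sketch: averaging per-class probabilities from several models, optionally with weights. The scores and weights below are made up for illustration only; they are not the values or the implementation from the paper.

```python
# Minimal sketch of weighted ensembling of classifier outputs.
# All numbers below are hypothetical, not taken from the paper.

def weighted_ensemble(prob_lists, weights):
    """Combine per-class probability vectors from several models
    into a single weighted-average prediction."""
    n_classes = len(prob_lists[0])
    combined = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        for i, p in enumerate(probs):
            combined[i] += w * p
    total = sum(weights)
    return [c / total for c in combined]

# Three hypothetical models scoring the same clip over 4 glosses:
i3d_scores         = [0.10, 0.60, 0.20, 0.10]
timesformer_scores = [0.05, 0.55, 0.30, 0.10]
spoter_scores      = [0.20, 0.40, 0.30, 0.10]

probs = weighted_ensemble(
    [i3d_scores, timesformer_scores, spoter_scores],
    weights=[1.0, 1.0, 1.0],  # uniform weights; could instead be optimized
)
prediction = max(range(len(probs)), key=probs.__getitem__)
print(prediction)  # -> 1, the class with the highest averaged probability
```

The same averaging can be applied to raw logits instead of probabilities, and the weights can be tuned on a validation set, which is where an optimizer such as CMA-ES enters the picture.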

Author Response

Dear reviewer,  

Thank you very much for your comment and valuable insight into our manuscript. Your comment is answered in the following response: 

 

C: which makes the title "...all we need" an overclaim and quite misleading. I suggest the authors revise it.

R: We changed the title to better correspond with the obtained results. 

 

Reviewer 3 Report

This paper analyzes the performance of three different methods, I3D, TimeSformer, and SPOTER, on two benchmark datasets, AUTSL and WLASL300. The task is defined as a classification problem, where a sequence of frames is recognized as one of the given sign language glosses.

 

The authors may need to take into consideration the following issues:

 

1. I suggest the authors add a paragraph mentioning the motivation in the abstract section.

2. I would like to see the Wilcoxon test for the three main proposed models in Tables 3 and 4.

3. In Tables 3 and 4, what are the measures "VAL, TEST, WEns., and WEns.Logits"? I suggest the authors give a brief explanation.

4. In Tables 3 and 4, there are too many abbreviations. I suggest the authors list them in one table.

5. The hyperlink of line 290 is out of bounds.

6. The Ens and Neural Ens outperform I3D, TimeSformer, and SPOTER. What about the training time of the different models?

Author Response

Dear reviewer,  

Thank you very much for your comments and valuable insight into our manuscript. Each comment raised by you is answered in the following response: 

 

C: I suggest the authors add a paragraph mentioning the motivation in the abstract section.

R: Based on your comment and the comment from Reviewer 4 we added the motivation into the Introduction. 

 

C: I would like to see the Wilcoxon test for the three main proposed models in Tables 3 and 4.

R: Unfortunately, due to time restrictions, we were not able to perform the Wilcoxon test; however, we will focus on it in our future work.

 

C: In Tables 3 and 4, what are the measures "VAL, TEST, WEns., and WEns.Logits"? I suggest the authors give a brief explanation.

R: We added a brief explanation. 

 

C: In Tables 3 and 4, there are too many abbreviations. I suggest the authors list them in one table.

R: We added a more detailed description to the tables. 

 

C: The hyperlink of line 290 is out of bounds. 

R: We fixed the problem. 

 

C: The Ens and Neural Ens outperform I3D, TimeSformer, and SPOTER. What about the training time of the different models?

R: Unfortunately, each model was trained on different hardware; therefore, the training times are not comparable.

Reviewer 4 Report

Dear authors.

Information technologies accompany our life everywhere. Neural networks and machine learning are increasingly being used in the surrounding processes. Image analysis approaches are not simple. The work aimed at developing algorithms that analyze data more efficiently is undoubtedly relevant. The topic touched upon in the article is relevant. The scientific content of the manuscript justifies its publication, but some additions and modifications will significantly improve the quality of the article.

 

Major comments:

1) The topic is broader than the proposed material.

2) In the Introduction, the purpose of the work is not given.

3) Tables 3 and 4: the designations used need explanation;

4) What is the main application of the neural network being developed?

5) Conclusions: the authors' reasoning is not sufficient ("Ensemble is All We Need" leaves more questions than answers).

Author Response

Dear reviewer,  

Thank you very much for your comments and valuable insight into our manuscript. Each comment raised by you is answered in the following response: 

 

C: In the Introduction, the purpose of the work is not given.

R: We added the motivation to the introduction. 

 

C: Tables 3 and 4: the designations used need explanation.

R: We added an explanation. 

 

C: What is the main application of the neural network being developed? 

R: We added the motivation to the introduction. 

 

C: Conclusions: the authors' reasoning is not sufficient ("Ensemble is All We Need" leaves more questions than answers).

R: We changed the title of the paper appropriately. 

 

Round 2

Reviewer 1 Report

Thank you for the updated version of your paper.

Some changes are still required, though.

For the CMA-ES: you describe sigma as the standard deviation; in the CMA-ES context, sigma is usually the step size. Also, it would be beneficial to explain why you have set the constants to the values listed in your paper. Finally, you state that the algorithm performs well. It would be beneficial to know why it performs well for your use case, and you could probably support this with a reference.

Author Response

Dear reviewer,
Thank you very much for the additional comment. Based on it, we have corrected the description of the sigma parameter. The parameters of CMA-ES were chosen based on heuristics; we added this information to the paper. Unfortunately, we did not find a sufficient reference to support our claim about the performance; we draw this conclusion only from our past experience with the different methods.
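The roles of the parameters under discussion (lambda as offspring population size, mu as the number of selected parents, sigma as the step size) can be illustrated with a toy evolution strategy. This is a simplified (mu, lambda)-ES sketch, not full CMA-ES (there is no covariance matrix adaptation), and the objective function and all parameter values here are stand-ins, not those from the paper.

```python
import random

# Toy (mu, lambda) evolution strategy. Hypothetical parameters for
# illustration only; CMA-ES additionally adapts sigma and a full
# covariance matrix, which this sketch omits.

def toy_es(objective, x0, sigma=0.3, lam=12, mu=3, generations=50, seed=0):
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(generations):
        # Sample lam offspring around the current mean with step size sigma.
        offspring = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(lam)
        ]
        # Select the mu best offspring and recombine them into the new mean.
        offspring.sort(key=objective)
        parents = offspring[:mu]
        mean = [sum(p[i] for p in parents) / mu for i in range(len(mean))]
        sigma *= 0.95  # simple fixed decay; CMA-ES adapts this instead
    return mean

# Stand-in objective: squared distance to a known optimum at (0.5, 0.3).
target = (0.5, 0.3)
best = toy_es(lambda x: sum((a - b) ** 2 for a, b in zip(x, target)),
              x0=[0.0, 0.0])
print(best)  # converges close to [0.5, 0.3]
```

In an ensemble-weighting setting, the objective would instead be the validation error of the weighted ensemble as a function of the model weights; a "run" of the optimizer then means one full execution over many such generations.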
