Peer-Review Record

Sign Language Gesture Recognition with Convolutional-Type Features on Ensemble Classifiers and Hybrid Artificial Neural Network

Appl. Sci. 2022, 12(14), 7303; https://doi.org/10.3390/app12147303
by Ayanabha Jana 1 and Shridevi S. Krishnakumar 2,*
Reviewer 2: Anonymous
Reviewer 4: Anonymous
Submission received: 19 May 2022 / Revised: 29 June 2022 / Accepted: 4 July 2022 / Published: 20 July 2022
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

 

This is an interesting work, but the paper needs to be improved. The objective should be better defined: the paper proposes a method for static hand-shape recognition in the context of sign language recognition (for finger-spelling, for example), which covers only a small subset of existing signs. This should be stated explicitly.

In the related work section (Section 2), each selected paper is described one after the other, without any thematic structure. One would expect a structure such as: 1) datasets, 2) possible inputs used (RGB, depth, motion, etc.), 3) features (hand-crafted or learned), 4) machine learning techniques, 5) evaluation metrics, and a final subsection positioning the proposed method with respect to the literature.

Section 3 should be improved by better justifying the choices of techniques and by improving the transitions between subsections.
Six datasets are studied in the paper. However, the proposed ANN architectures have a different parametrization for each dataset, which is a shortcoming for the generalization of the method.

Five different features are tested, plus PCA, two ensemble methods, and three ANN architectures; this is interesting, but it makes the paper difficult to read. The paper would gain in quality by focusing on the most promising techniques and by analysing the results more thoroughly.
With so many combinations, the results tables are difficult to read (Table 12, for example), because the reader cannot see which results are the most remarkable.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper uses three methods, trained and tested on multiple common sign language datasets. Great results are presented.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The work presents a method for recognizing hand gestures in images through the use of neural networks; it has been applied to sign language recognition. The work is interesting and presents performance results for the methodology, obtained using a dataset of images. The weak point of this work is that Sections 2 and 3 are very poorly organized, which makes them very difficult to read.

Section 1.

The introduction is well constructed and makes the work much easier to read. It adequately presents the objective and contributions.

Section 2.

This section should be organized as a classification of methods, so that all of them can then be compared, conclusions drawn, and the objective of this work described in Section 1 thereby justified.

Why don't all the works referenced in Table 1 appear?

What conclusions are drawn to justify the proposed objective?

Section 3.

The section should be divided appropriately, since “materials” covers both the datasets and the software packages used. Moreover, it is necessary to include a general algorithm that details the methodology used.

Is the edge detection procedure performed before feature extraction, with classification performed afterwards? There are too many subsections, and some of them do not make sense. Other details to take into account are:

• Subsections 3.1 and 3.2 can be merged.

• A series of neural networks is presented. Is any conclusion drawn about which of them will be used?

• In Section 3.3.1.3 the angle of the fingers is calculated, but this is done in two dimensions; why, then, are the characteristics in Figure 3 presented in three dimensions?

• In Figure 5, the labels a1, b1... could be included to make it more understandable. As it stands, Figure 5 is very similar to Figure 3; therefore, Figure 3 should be eliminated.

• How is the schematic of the hand in Figure 5 obtained?

• In Section 3.3.1.4 the edges of the hand are obtained using existing software. Is this process performed before calculating the angle of the fingers?

• The methodology described is disorganized and divided into too many subsections. An algorithm could be given at the beginning of the section, showing the methodology to be followed, and the steps then detailed. The scheme in Figure 2 is not useful for clarifying the methodology; it does not have the necessary detail.

• Subsections 3.3.3.1 and 3.3.3.2 do not make sense, since it is not clear which parameters must be configured, nor is it detailed what values are assigned to them.

• Section 3.3.4.1 is obvious; it makes no sense to explain how a neural network works.

• Table 4 uses confusing nomenclature.

• Section 3.3.4.3 cannot be understood.

• How does everything presented fit with the structure shown in Figure 8?

Section 4.

• Figure 11 is not well explained. What do its axes mean? Does it take 50 training iterations for the neural network to converge?

• What is the cost function and how is it used?

• Table 7 is poorly explained.

• There is no experimental design indicating how the experiments are carried out and what each of them is intended to demonstrate.

• Table 8 and Figures 12 and 13 are not well explained.

• Have tests been carried out with different neural networks or only with one?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

The aim of the present article is to propose a system that tackles the problem of sign image classification by extracting the features most representative of the input image, together with a hybrid artificial neural network that takes convolutional features, and convolutional features computed on hand edges, in order to precisely locate the hand region of the sign gesture under consideration for classification.

Overall, the paper is nicely written and informative. 

There are only some minor issues, which are reported below, that need to be addressed or fixed.

In the title, I would avoid the use of acronyms such as ANN. In line with this, at the end of the Introduction, pg. 1, in “In addition to the ensemble methods, ANNs[5]”, the authors should spell out the acronym ANN. This is also true for “CNN”, “BRISK”, “ALS” (Introduction, pg. 2), “ILS”, “NUS I”, “NUS II”, “HSV” (pg. 3), and so on.

Paragraph 2, “Related works”. When citing references, it would be helpful for the reader to indicate the name of the author(s) of the work and not merely the reference number; for example, “Sagayam and coworkers [6] use a…”, “Sagayam and colleagues [6] use…”, or “Sagayam et al. [6] use…” instead of “[6] uses a CNN to devise”. This should be done for all references 6-18.

Although paragraph 2 is well written and informative, it might fit better in a review paper; it is very long and dense with information. The authors should consider rewriting this paragraph more concisely, focusing only on key aspects such as the accuracy on the testing datasets (and keeping Table 1, which is very helpful).

 

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Dear Authors,

I am glad you took all my comments into consideration.

For me, the paper is now far easier to read, and its quality has improved.

Reviewer 3 Report

The authors have fulfilled the recommendations of the revision.
