Article
Peer-Review Record

Data Augmentation for Human Keypoint Estimation Deep Learning based Sign Language Translation

Electronics 2020, 9(8), 1257; https://doi.org/10.3390/electronics9081257
by Chan-Il Park * and Chae-Bong Sohn *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 8 July 2020 / Revised: 3 August 2020 / Accepted: 4 August 2020 / Published: 5 August 2020
(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report

The main idea of the article is to propose a method to increase the sign language data set.

 

In the literature review (Section 2), only one work related to this research is given (i.e., S.K. Ko et al., 2019). To show the need for the proposed method, it is worth adding a wider review of the literature on augmentation techniques. It should be discussed why the techniques used in the literature are not effective and why new ones are proposed.

Examples of techniques used:

A rotation operation with small angles was performed for augmentation by M. Al-Hammadi (2019). In the work of L. Pigou et al. (2014), the data augmentation consists of zooming, rotations, spatial translations in the x and y directions, and temporal translations.

  • Pigou, L., Dieleman, S., Kindermans, P. J., & Schrauwen, B. (2014, September). Sign language recognition using convolutional neural networks. In European Conference on Computer Vision (pp. 572-578). Springer, Cham.
  • Al-Hammadi, M., Muhammad, G., Abdul, W., Alsulaiman, M., & Hossain, M. S. (2019). Hand gesture recognition using 3D-CNN model. IEEE Consumer Electronics Magazine, 9(1), 95-101.

The above-mentioned works are related to hand gesture recognition, which is part of sign language translation.

These are only some examples. It is recommended to rewrite Section 2.
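For concreteness, the geometric augmentations cited above (small-angle rotation, zooming, and spatial translation) could be sketched for 2D keypoint data as follows. This is an illustrative example only, not the authors' implementation; the function name, parameter values, and the choice of rotating about the keypoint centroid are all assumptions.

```python
import numpy as np

def augment_keypoints(keypoints, angle_deg=5.0, zoom=1.1, shift=(2.0, -3.0)):
    """Apply a small rotation, a zoom, and an x/y translation to 2D keypoints.

    keypoints: (N, 2) array of (x, y) coordinates.
    Returns a new (N, 2) array; the input is not modified.
    """
    theta = np.deg2rad(angle_deg)
    # 2x2 rotation matrix for the small-angle rotation augmentation.
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # Rotate and zoom about the centroid of the keypoints (an assumed choice;
    # rotating about the image center would also be reasonable).
    center = keypoints.mean(axis=0)
    out = (keypoints - center) @ rot.T * zoom + center
    # Spatial translation in the x and y directions.
    return out + np.asarray(shift)

# Hypothetical coordinates for three hand keypoints.
pts = np.array([[100.0, 200.0], [110.0, 210.0], [120.0, 190.0]])
aug = augment_keypoints(pts)
```

Temporal translation, also mentioned by Pigou et al., would instead shift the frame index of a keypoint sequence and is not shown here.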

 

It should be clearly stated which learning process is used in this research. In Section 3 it is stated that the OpenPose library is used for keypoint detection (Figure 3), while in Section 4 it is stated that the features are extracted by passing data through the CNN.

Why are two different learning processes used?

Although the OpenPose system processes images through a two-branch multi-stage CNN, we cannot claim that these processes are the same.

 

The citation of references is not recommended in the abstract.

 

It is strange to see the title of a publication in the text (lines 42-43). It is enough to give only a reference to this publication. The same applies to lines 217-218.

 

There are errors in the use of abbreviations in the publication. The abbreviation should be explained the first time it appears in the article (even if an abbreviation is well known). This applies, for example, to the abbreviations AI (line 30), LSTM (line 71), CNN (line 144).

 

Figure 3. It is unclear why text is extracted from the features and given to the LSTM network as input. Perhaps the text should instead be extracted from the LSTM as an output?

More information should be included in Table 2: for each method, indicate which database was used and the amount of data.

 

What do the variables in Equation 1 refer to? It is recommended to define them.

 

The capital letters are used in a chaotic way:

Line 50. All first letters of this figure caption are capitalized, while the rest of the captions are written in lowercase;

Line 66. “Translation”

Line 84. “Conversion”

Page 6. “Camera angle conversion” and “camera angle conversion”, “Finger length conversion” and “finger length conversion”, “Random keypoint removal” and “random keypoint removal”.

 

Line 139. “in Figure 8.3.1. Subsection” ?

 

Line 229. Instead of combination numbers (i.e., 2 and 3), it is better to give the full names of the methods.

 

Lines 231-232. A non-breaking space should be used in “Table 2”.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Please see the attached file.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The paper was revised in accordance with the comments of the review.

Some suggestions:

In the title, instead of "Deep Learning based" it should be "deep learning-based" or "deep learning based".

The title of subsection 3.1 should be "Finger length conversion" (i.e., first letter in uppercase, because it is a section title). The same applies to subsections 3.2 and 3.3 (i.e., they should be "Random keypoint removal" and "Camera angle conversion").

 

Author Response

Response to Reviewer 1 Comments

 

We would like to thank you for reading this paper carefully and giving us helpful comments. We have answered each of your comments below.

 

Point 1: In the title, instead of "Deep Learning based" it should be "deep learning-based" or "deep learning based".

Response 1: We changed "Deep Learning based" to "deep learning based".

 

Point 2: The title of subsection 3.1 should be "Finger length conversion" (i.e., first letter in uppercase, because it is a section title). The same applies to subsections 3.2 and 3.3 (i.e., they should be "Random keypoint removal" and "Camera angle conversion").

Response 2: We changed all the subsection titles as suggested.

Reviewer 2 Report

All my questions have been answered.

Author Response

Response to Reviewer 2 Comments

 

We would like to thank you for reading this paper carefully and giving us helpful comments. We have answered each of your comments below.

 

Point 1: Minor spell check required.

Response 1: We fixed several spelling errors.

Author Response File: Author Response.docx
