Open Access Article

Peer-Review Record

An End-to-End Classifier Based on CNN for In-Air Handwritten-Chinese-Character Recognition

Appl. Sci. 2022, 12(14), 6862; https://doi.org/10.3390/app12146862

by Mianjun Hu, Xiwen Qu *, Jun Huang and Xuangou Wu

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 21 June 2022 / Revised: 3 July 2022 / Accepted: 5 July 2022 / Published: 7 July 2022

(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

The results section should explore the values obtained and provide a narrative, as is standard in any research paper. As it stands, the section is superficial and the results are not analysed.

Similarly, the conclusion should include some key values that demonstrate the insights gained.

Suggestion: K-fold cross-validation is standard for splitting test/training data. A random 20/80 test/train split for each class will not average out if it is done only once (if going this route, repeating the random permutation over x iterations will solve it).
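
To make the suggestion concrete, the following is a minimal sketch of stratified k-fold splitting, assuming only a list of class labels. The function names `stratified_k_fold` and `cross_validate` and the user-supplied `evaluate` callback are hypothetical and are not taken from the manuscript. With k = 5, each fold reproduces the 20/80 test/train ratio per class, but every sample is tested exactly once and the score is averaged over all folds.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is spread evenly over k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled samples of this class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def cross_validate(labels, evaluate, k=5, seed=0):
    """Average a user-supplied evaluate(train_idx, test_idx) score over k folds."""
    scores = [evaluate(tr, te) for tr, te in stratified_k_fold(labels, k, seed)]
    return sum(scores) / len(scores), scores
```

Here `evaluate` would train a model on `train_idx` and return its accuracy on `test_idx`; reporting the mean together with the per-fold scores shows how much a single random split could mislead.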

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors propose a deep learning framework for in-air Chinese character recognition that does not transform character coordinates into images but directly processes coordinate sequences; the comparison with the state-of-the-art highlights the goodness of the proposed approach.

The manuscript is written exhaustively, describing the state-of-the-art, the methodology used, the experimentation and the results obtained compared with the state-of-the-art. However, some aspects highlighted below deserve to be addressed before considering the manuscript for publication.

- The framework proposed by the authors receives the coordinates of the in-air characters as input. But how are the coordinates of the characters written in the air obtained? Presumably, a camera and image processing algorithms are used to recognize hand movement. Authors should describe this process and report the acquisition process adopted in the datasets they used during the experimentation.

- The authors state that the fully connected layer requires input vectors with a fixed size (lines 148-150). Therefore, the authors claim to obtain the fixed size by averaging the sequence coordinates. But how can an average operation transform a variable into a fixed dimension? The authors should detail this point better and report the mathematical formulas they used.

- Many grammatical errors in the text need to be corrected. Some of these have been highlighted below, but the manuscript requires proofreading by a native speaker for error correction.
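
On the second point above, one common way an average produces a fixed-size vector is global average pooling over the time axis: a feature map with T time steps and C channels is reduced to v_c = (1/T) * sum_t f_{t,c}, so the output length is C whatever T is. The sketch below assumes this reading; the record does not reproduce the manuscript's formula, and the function name and sample values are illustrative only.

```python
def global_average_pool(feature_map):
    """Average a (T, C) feature map over time T, returning a length-C vector."""
    if not feature_map:
        raise ValueError("feature map must contain at least one time step")
    length = len(feature_map)
    channels = len(feature_map[0])
    return [sum(step[c] for step in feature_map) / length for c in range(channels)]


# Two sequences of different length T but the same channel count C = 3.
short = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
long = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [4.0, 4.0, 4.0], [6.0, 6.0, 6.0]]

print(global_average_pool(short))  # [2.0, 3.0, 4.0]
print(global_average_pool(long))   # [3.0, 3.0, 3.0]
```

Both outputs have length 3, so a fully connected layer can follow regardless of how many points the in-air trajectory contains.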

SPECIFIC COMMENTS

In Figure 1, to aid the understanding of non-Chinese readers, it would be useful to compare in-air handwritten characters with the equivalent Chinese font.

Line 105: The acronym MQDF is not defined on first use. All acronyms should be defined when they first occur in the text.

Lines 105-106: "a loss function is designed which can minimum the reconstruction error" should read "a loss function is designed to minimize the reconstruction error"

Line 120: "the recognition accuracy, In [8] proposed" should read "the recognition accuracy, in [8] the authors proposed"

Figure 4 was never referenced in the manuscript text. All figures should be referenced at least once in the manuscript text.

The caption of Figure 4: "convolutional neural Network architecture" should read "convolutional neural network architecture"

Line 162: "network need to stack" should read "the network needs to stack"

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The structure of the paper is good. I would like a few clarifications and some changes in the paper:

(1) During review it is noticed that your manuscript contains sections of bulk citation. For example: [1]-[8], [7]-[13].

(2) Kindly clarify why different colors appear in Figure 3, which shows the character before normalization (left) and after normalization (right), whereas the images look the same. What exactly do the authors want to explain?

(3) The abstract can present more results and conclusions of the manuscript. These parts need to be corrected.

(4) A nice summary table of the compared related works can be added to point out the gap in literature more clearly.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

All of the changes suggested by the reviewer have been made; thus, the manuscript is now ready for publication.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

This paper presents a CNN model that classifies in-air handwritten Chinese characters. The manuscript is clear, concise, and well-organized. The content of the paper is interesting for the ML community. The proposal is a combination of methods, but it is scientifically sound. Here are my comments for further improvement of the work:

The introduction section needs a clear statement of the motivation of the work. Why do people need to recognize in-air characters? Any potential applications? What is the exact contribution of the work?
A related work or a state-of-the-art section is required. It is important to really understand the advantages and limitations of the current technologies, so that the work might be compared fairly.
The proposal is fine. In my opinion, the CNN architecture needs to be optimized (apply e.g., grid search, Bayesian optimization).
The manuscript requires an experimentation section. Explain the details of the benchmarks, which evaluation metrics are implemented and why (including the equations), etc. Moreover, do the authors run a cross-validation method, or how many times do the authors train the models? It is mandatory that the authors include this cross-validation method in their experiments. In addition, are the datasets balanced? If not, how do the authors deal with the imbalanced datasets? Last but not least, in the experimentation, it is important to include metrics other than accuracy; e.g., precision, recall, F1-score, AUC, etc. are more insightful. I suggest including them. The storage metric is not explained. How do the authors calculate this metric?
The results section requires improvement. The experimental results are clearly reported in the tables, but an analysis within the text is important, so that readers can understand the reasoning behind making an extensive benchmark. Why did the authors select these methods for comparison? Moreover, a discussion is mandatory. The authors need to summarize the main advantages and weaknesses of the proposed method. What do the authors think is the critical aspect of the proposal that improves the accuracy metric?
My main concern is about the in-air condition for tracing handwritten characters. The authors only consider two space coordinates (x and y). But, to be fair, if a person writes a character "in the air", there are three space coordinates (x, y, and z). How and why do the authors simplify these 3D coordinates into a 2D projection? I think something is missing in the explanation.
The conclusions need to give the final thoughts about the proposal supported by the results. Future work is also required.
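
As an illustration of the additional metrics requested above, the following sketch computes macro-averaged precision, recall and F1-score from true and predicted labels. It is a generic, assumption-level example; the function name and the toy labels are not taken from the manuscript. Macro averaging weights every character class equally, which matters when the datasets are imbalanced.

```python
def macro_precision_recall_f1(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        # Undefined ratios (no predictions or no samples for a class) count as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
precision, recall, f1 = macro_precision_recall_f1(y_true, y_pred)
```

In this toy case the accuracy is 0.75, while the per-class view shows that class "a" is recalled only half of the time, which is the kind of insight accuracy alone hides.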

Reviewer 2 Report

An end to end classifier based on CNN for in-air handwritten Chinese character recognition

The authors propose an end-to-end convolutional neural network method for recognizing in-air handwritten Chinese characters. In-air handwriting is evolving as a new form of human-computer interaction that allows a user to perform gesture-based writing in midair. The topic is of great interest to many researchers, and many studies have recently been done on the recognition of in-air handwritten Chinese characters. However, I have many concerns with the quality of this manuscript that prevent me from recommending it for publication in Applied Sciences.

The manuscript presents too many language deficiencies and typographical errors.
The succinct review of related literature in the introductory section is not comprehensive, with many relevant studies missing from the manuscript. For instance, a temporal convolutional recurrent network (Gan et al., 2020) was recently proposed for recognizing 3D in-air handwritten Chinese text, which is neither mentioned nor compared in the manuscript.
The entire method section (Section 3) is not well presented, which makes it difficult to comprehend and replicate. An algorithmic description of the method would have improved the methodology.
There is no justification for separating preprocessing into its own section; it should have been a step of the methodology.
The datasets are not well described with references.
The experimentation is not comprehensive: only one evaluation metric is used for the comparison, and there is no efficiency analysis.

Reviewer 3 Report

This manuscript proposes an end to end classifier based on CNN for in-air handwritten Chinese character recognition. Experiments show that the scheme has better character recognition than other schemes. However, this manuscript has room for improvement, as follows:

- This manuscript should discuss the problems to be solved and the disadvantages of the traditional schemes in current research.

- The Figure of the proposed algorithm should be good enough to show your main work.

- I felt that the manuscript lacks analysis and a discussion of how the results relate to the proposed work. Authors may include a new Section “Discussion and Significance of the Proposed Work” after the Experiments Section (Section 4).

- Each section needs to flow smoothly into the next. Each section should end with a summarizing, transitional sentence. Authors should revise accordingly.

- Incorporate the following paper in your references:

Gadekallu, T. R., Srivastava, G., Liyanage, M., Iyapparaja, M., Chowdhary, C. L., Koppu, S., & Maddikunta, P. K. R. (2022). Hand gesture recognition based on a Harris Hawks optimized Convolution Neural Network. Computers & Electrical Engineering, 100, 107836.

- The grammar and spelling of this manuscript can be improved. Please check and revise them accordingly.

Reviewer 4 Report

This paper presents a CNN-based classifier for recognizing handwritten Chinese characters in the air. The proposed approach differs from those present in the literature in that it directly classifies the coordinates of the trajectory without converting the coordinates into images.

The paper is well written and technically focused. Some minor comments are listed below.

GENERAL COMMENTS

In Eq. 6, shouldn't the difference (y_t - mu_t) be divided by a delta_y to maintain the proportions along the y-axis?
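
The record does not reproduce Eq. 6, so the following is only a hedged illustration of the trade-off behind this question, not the manuscript's formula. It assumes a mean-centering step followed by a division, where `delta_x` and `delta_y` are hypothetical names for the per-axis population standard deviations. Dividing each axis by its own spread rescales the axes independently, whereas dividing both axes by one shared value keeps the character's width-to-height ratio.

```python
import statistics


def normalize(points, per_axis=True):
    """Center (x, y) points on their mean and scale them.

    per_axis=True divides x by delta_x and y by delta_y (assumed standard
    deviations); per_axis=False divides both axes by the larger of the two.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    delta_x, delta_y = statistics.pstdev(xs), statistics.pstdev(ys)
    if not per_axis:
        delta_x = delta_y = max(delta_x, delta_y)
    # Guard against a degenerate stroke with zero spread along an axis.
    delta_x = delta_x or 1.0
    delta_y = delta_y or 1.0
    return [((x - mu_x) / delta_x, (y - mu_y) / delta_y) for x, y in points]


def aspect_ratio(points):
    """Width divided by height of the bounding box of the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) / (max(ys) - min(ys))


# A 4-by-2 rectangle stands in for a wide character trajectory.
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
per_axis_ratio = aspect_ratio(normalize(rectangle, per_axis=True))
shared_ratio = aspect_ratio(normalize(rectangle, per_axis=False))
```

For this rectangle, the per-axis variant maps the shape to a square, while the shared-scale variant keeps the original 2:1 proportions.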

The residual convolutional block shown in Figure 4(b) corresponds to "Block N" in Figure 4(a); however, the parameter N is not considered in Figure 4(b). Why?

SPECIFIC COMMENTS

Line 82

Quote: "… classification. our method can…"

Comment: should read "… classification. Our method can…"

Line 84

Quote: "… models, Our method does not"

Comment: should read "… models, our method does not…"

Line 165

Quote: "(including Cov8d"

Comment: The round bracket is never closed.

Reviewer 5 Report

The article presents an interesting topic and application. The paper is structured appropriately with relevant content, with room for improvement. The paper should be more detailed as currently it is quite short.

Abbreviations should not be redefined repeatedly throughout the text, for example in the conclusion.

The writing style can be improved by removing first person writing. Instances such as "we", "our", ... should be removed. The article should be in 3rd person.

The literature should be explored further and should contribute to the paper's work, motivation and comparison of results.

References 1-12 are not explored in any meaningful context, as they are simply grouped into reference instances 1-8 and 7-12. References should provide context and support, and should be discussed as appropriate. The authors should expand the literature review and discuss current work, while avoiding this block referencing of articles that do not contribute to the work presented.

Be careful with referencing. When using the <author> et al. format, a reference number should appear beside it to make clear which reference is being discussed. There are several cases of authors being mentioned with no way to determine the reference number from the text (line 84).

The mathematics discussion and text should be improved with more narrative details and explanations presented.

The training and testing procedure must be described better in the paper. An 80% training and 20% testing split is appropriate. However, the split appears to be done only once, so the particular data drawn can sway the trained network and make the reported performance questionable. I recommend k-fold cross-validation, which is standard within the field, to improve the validity of the experiment.

The results tables should include the references for the other methods presented. E.g. Tables 4 and 5 should include the references beside the methods.

The results presented should be explored in text rather than presented 'as is'. What performance improvement was obtained? Accuracy is simply one metric, how did false positives factor into the results? ... (There are many discussion points required to be explored).

The conclusion should summarise the work done and detail the impact/findings further than currently presented.
