Submit to this Journal Review for this Journal Propose a Special Issue

Article Menu

Share Help Cite Discuss in SciProfiles

Open AccessArticle

Peer-Review Record

An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients’ Speech

Electronics 2021, 10(7), 807; https://doi.org/10.3390/electronics10070807

by Jiho Jeong¹, S. I. M. M. Raton Mondol¹

, Yeon Wook Kim² and Sangmin Lee^1,2,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Electronics 2021, 10(7), 807; https://doi.org/10.3390/electronics10070807

Submission received: 3 March 2021 / Revised: 24 March 2021 / Accepted: 24 March 2021 / Published: 29 March 2021

(This article belongs to the Special Issue Deep Learning, Reconfigurable Computing and Machine Learning in Healthcare)

Round 1

Reviewer 1 Report

An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients’ Speech

In this paper the authors present a method for improving learning in AST for CI patients.
An interesting topic in an important area. The paper is well organized and written with good english.
The introduction could provide a better description of CI speech and what are the implications in ASR. A very brief explanation of Korean ASR models would be also important (explaining symbols, phonetics and word forming. I believe that the authors can provide 2 or 3 sentences covering theses aspects). The explanation of previous work involving data augmentation could also be better explored.
The methodology is interesting and the evaluation follow a valid scientific approach.
Testing scenarios could be presented.
Comparison with other authors using similar databases or systems would improve discussion.
In conclusion, the authors could better explain what are the main contribution of the paper and what is novel.

Some additional comments/suggestions:

L1. include intro about CI speech

L27. include meaning of CI (non-mentioned in the main content)

L30. "spectral mean" -> mean spectral content or the mean spectral energy
L38. Please explain the overfitting problem
L43. "as presented in [9]". please include the name of the technique or something more specific
L44. "an ASR model was pre-trained with 1000h of standard....to learn ALS". How?
L64. 80 dimensional log-Mel ? Why 80?
L65. "harming window" ?
L72 "the input sequence ? calculates" -> "the input sequence is used to calculate"
L84. "Paer [9]" -> "In the work of Shor et al. [9]..."
L95. "which does not train decoder" -> so what does it do?
L104. between -1 and 1, please explain the use of these factors.
L108 Fig 3. Legend could better explain what is being shown. The y range should be the same. Maybe spectrograms with pitch tracking (like those generated by Praat) could help to visualize the results that are being presented. x-label doesn't shown units (is it ms?)
L110. Speed perturbation while maintaining F0 ?
Fig 4. Legend (b) Speech signal after speed perturbation
L125. F and T must be better explained.
Fig 5. What value is chosen to fill the spectrum (what is the value of the green areas in Fig 5b?)
L139. In Korean, is there a correspondence between phonemes and characters? Maybe something could be added about this. In many languages there are complex grapheme/phoneme correspondence modules.
L141. Create an introduction first where Methods 1, 2 and 3 are briefly explained
Tables 1, 2 and 3 could be combined for easier comparison

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Figure 1 needs to be redrawn. The same blocks are plotted twice.
Figures 2 to 5 are not mentioned.

The caption of Table 5 is comparison for three methods, but in the table other two methods are compared.

For the references, "et al" should be replaced by a full list of authors.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have answered my questions in general.

Author Response

Dear Reviewer1,

Thank you for your suggestion.

We modified the reference as the guide of the journal and did spell check.

Figure 1 and description were modified.

Several typos have been fixed.

Thank you very much.

Sangmin Lee

Reviewer 2 Report

The revised Figure 1 has some problem. The two part can be combined into one, but two flows (one for training and the other for testing). They share the same fine-tuned model. For the testing flow, non-standard test speech is applied on the fine-tine model, not directly on the result.

The format of the author list in the references is inconsistent.

Author Response

Dear Reviewer2,

Thank you very much for your suggestion.

We modified our paper as you mentioned.

The detail contents are attached as a MS file.

Thank you very much.

Sangmin Lee

Author Response File: Author Response.pdf

Article Menu

An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients’ Speech

Further Information

Guidelines

MDPI Initiatives

Follow MDPI