Peer-Review Record

Non-Autoregressive End-to-End Neural Modeling for Automatic Pronunciation Error Detection

Appl. Sci. 2023, 13(1), 109; https://doi.org/10.3390/app13010109
by Md. Anwar Hussen Wadud 1, Mohammed Alatiyyah 2,* and M. F. Mridha 3
Submission received: 27 November 2022 / Revised: 17 December 2022 / Accepted: 18 December 2022 / Published: 22 December 2022
(This article belongs to the Special Issue Deep Learning for Speech Processing)

Round 1

Reviewer 1 Report

Contributions:

This paper presents a self-directing non-autoregressive pronunciation mistake identification model with the dictation and pronunciation models. My comments are given below:

1. (Page 14) As shown in Table 4, the F1 measure only reaches 52.07. The performance of mispronunciation detection is not good.

2. (Page 6) Please give an example of K, V, and Q, and of the output of the PDS in Fig. 4.

3. There are many abbreviations in this paper. Please add an abbreviation table to help readers follow the paper.

4. (Line 78 on page 2) This point can be regarded as an experiment rather than a contribution. It can be removed.

5. (Line 80 on page 2) "… baseline & existing …" can be revised as "… baseline and existing …".

6. The caption of Fig. 1 is too verbose; some statements can be moved to the main text. Figures 3, 4, 5, and 6 have the same problem.

7. (Page 9) The text pair x, y should be defined explicitly in Eq. (2).

8. The sub-grid lines should be removed for Tables 1-4.

9. The captions are not well presented in Figs. 7-12.

10. (Page 14) The target numbers for each vowel and consonant are missing from Table 3.

11. (Line 332 on Page 14) The first sentence is not well written.

Author Response

Please find the attached file

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors propose a non-autoregressive end-to-end neural network model that combines dictation and pronunciation models for pronunciation error detection and diagnosis.

In this paper, they train the proposed and existing models on the public datasets L2-ARCTIC and SpeechOcean762. The authors compare FBANK and Wav2Vec feature extraction for both the proposed model and the existing model.

The model proposed in this paper is relatively simple, but it is described in enough detail for readers to follow and reproduce it.

The experimental results show that it performs well compared with existing models.

Please explain why you did not compare with the current SoTA GOPT model (https://arxiv.org/pdf/2205.03432v1.pdf).

Refs. 30 and 49 are the same.

Author Response

Please find the attached file

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have improved the quality of this paper. I think this paper can be accepted for publication.

Reviewer 2 Report

The authors answered my comments well and applied them to the paper.
