Peer-Review Record

Designing a Leveled Conversational Teachable Agent for English Language Learners

Appl. Sci. 2023, 13(11), 6541; https://doi.org/10.3390/app13116541
by Kyung-A Lee 1,† and Soon-Bum Lim 2,*,†
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 11 April 2023 / Revised: 22 May 2023 / Accepted: 25 May 2023 / Published: 27 May 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

This is an interesting paper; however, I have some concerns. Please add explanations for the issues given below:

The literature review is well organized; however, the originality, aim, and objectives of this research are not entirely clear.


*Please explain how the values shown in Figures 11, 12, and 13 are calculated.

*Include all descriptive statistics (mean, median, min, max, SD, etc.) for the evaluation results. If possible, please use APA style to report the results.
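For illustration only, the descriptive statistics requested above can be computed and reported in an APA-like inline format with Python's standard `statistics` module. This is a minimal sketch, not the authors' analysis: the ratings are hypothetical 1-5 Likert scores, and the helper names `describe` and `apa` are invented for this example.

```python
import statistics


def describe(scores):
    """Return descriptive statistics for a list of Likert ratings (hypothetical 1-5 scale)."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
        "sd": statistics.stdev(scores),  # sample standard deviation (n - 1 denominator)
    }


def apa(stats):
    """Format the summary in an APA-like inline style (M, SD, Mdn, range)."""
    return (
        f"(n = {stats['n']}, M = {stats['mean']:.2f}, SD = {stats['sd']:.2f}, "
        f"Mdn = {stats['median']:.2f}, range = {stats['min']}-{stats['max']})"
    )


ratings = [4, 5, 3, 4, 5, 4, 2, 5]  # hypothetical ratings, not data from the paper
print(apa(describe(ratings)))
```

For these hypothetical ratings, the printed summary is `(n = 8, M = 4.00, SD = 1.07, Mdn = 4.00, range = 2-5)`. Reporting n alongside M and SD lets readers judge how many evaluators stand behind each averaged bar.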

Author Response

See the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The topic is interesting and relevant. However, the contribution of the work is not clear, and the description is not scientifically sound. I have many remarks:

1.      It is not clear what exactly was done by the authors. The research relies heavily on the NLP components (speech recognition, speech synthesis, intent detection, entity recognition, comprehension of the grammatical structure, word segmentation, etc.) (see the descriptions, Figure 7, etc.), but nothing is said about either training any of these components or reusing existing models or tools (no explanations or references are provided).

2.      The authors describe the collection of the speech data in Section 2.2 but emphasize only toddlers. Since nothing is said about the other age groups, the text initially gives the wrong impression that only toddlers' speech was used in this research. As a result, the explanations of the investigation of different English language levels, and even of the removal of profanity, seem even stranger. Everything has to be described explicitly; therefore, I recommend inserting a table (with a description) containing the important statistics of the dataset you collected (line 136), focusing on age groups, gender, English language levels, and any other characteristics that you consider relevant to your research.

3.      Lines 121-123 describe the importance of the waveform. How the agent takes the learner's waveform into account is still a mystery to me. How does it adapt to speech differences (of males/females, children, and even toddlers)? Please explain your proposed method or provide a reference to the method you are reusing.

4.      Nothing is said about the variety of the speech used in training (if there was training) and testing, i.e., about the diversity of language. The scope is not clear. It is not clear whether the method is robust enough to cope with, e.g., slang or accents. Can the system be used by English language learners of any background, or only by Koreans?

5.      Some questions and answers (at least in Figure 5) seem unrelated and confusing. Are learners satisfied with such answers?

6.      Tables are presented as figures (Figure 5, Figure 6, Figure 8).

7.      In Figure 7, it is not clear how the user and agent parts are connected, e.g., how the extraction of keywords helps the overall process. How can the agent adapt to the input coming from the learner?

8.      Figure 9 is too abstract and requires an explanation. You must provide details of how all the blocks connect, how the different blocks (in particular, their errors) affect the succeeding blocks, and the overall accuracy. The picture of what was done is not clear at all.

9.      The evaluation process (Section 3.2) is not clear. How many people participated in the evaluation? You need to provide statistics, including the number of users at each proficiency level. Since this is the core of your research, you must demonstrate that users are satisfied with the system and that their English improves.

10.   Tables 2-5 do not reveal the scope of your research: they contain just a few examples. Such a descriptive evaluation is not enough to draw any conclusions.

11.   I still do not understand Figure 11. It seems that the agent somehow adjusted to the learner's level by using language of a lower proficiency level. Shouldn't the agent be more advanced so that the learner can learn something and make progress (i.e., improve their English language skills)?

12.   Figures 12 and 13 (especially the confidence intervals) indicate that several evaluators were involved in this process (because you present averaged results and confidence intervals). Nevertheless, the confidence intervals in both figures seem incorrect (they cannot be above the average). Moreover, one interval even exceeds 5 on the Likert scale (first column in Figure 12), which is not possible and leads me to conclude that the evaluation process is fake. Therefore, no conclusions can be drawn.

13.   The fact that you focus on Korean learners (their accent, slang, etc.) is mentioned only in the conclusions.


14.   The inserted program code of the API (labeled Theorem 1; I have no idea why it is called a theorem) degrades the description. The code is too simple and lacks important details. If you still want to include the code, please move it to an appendix or provide a reference to an external source.

Author Response

See the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have addressed all my concerns.

Author Response

There are no more comments to reply to.

Thank you.

Reviewer 2 Report

The authors addressed the majority of my previous comments. They also added new content, which is not accurate. Please find my remarks below:

1.      In Line 130, you state that “The structure of the BERT language model is shown in Figure 2”, but Figure 2 shows the traditional transformer architecture. BERT consists only of encoders, which are the blocks on the left side of the transformer architecture.

2.      The caption of Figure 2 is incorrect: the figure shows the transformer architecture taken from Figure 1 of https://arxiv.org/pdf/1706.03762.pdf. In addition, you must cite this paper next to your Figure 2.

3.      I asked for statistics on your collected dataset (e.g., the number of dialogs, topics, question-answer pairs, words, vocabulary size, and other relevant information typically used to describe such datasets). The purpose of this information is to show the learning potential of your model and to guide other researchers on what data they should gather to achieve similar performance on similar problems. Unfortunately, the current Table 1 is not informative at all.

4.      About Lines 176-178: you already confused me in Line 130 (by calling the transformer BERT). Are you sure you are using BERT (i.e., not the whole transformer with both encoder and decoder parts) to generate answers? BERT's text-generation capabilities are very limited.

5.      In your response to me, you write “This paper aims to develop a speech system for elementary school students…”, but this is not stated in the paper. It is crucial to state that you are developing the system for elementary school students: this defines the problem you are addressing, including its unique aspects such as the pronunciation challenges faced by children, their limited vocabulary, and other relevant specifics.


6.      The formula in Line 167 is missing a reference.
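Remark 3 above lists the kind of corpus statistics expected for a dialog dataset. As an illustration only, the following minimal sketch computes such counts with the Python standard library. The dialog structure (a topic plus a list of question-answer turns), the sample dialogs, and the helper names `tokenize` and `corpus_stats` are all hypothetical, and the whitespace tokenizer is a simplifying assumption rather than the tokenizer used in the paper.

```python
from collections import Counter

# Hypothetical dataset format: each dialog has a topic and a list of (question, answer) turns.
dialogs = [
    {"topic": "food", "turns": [("What do you like to eat?", "I like pizza."),
                                ("Do you like apples?", "Yes, I like apples.")]},
    {"topic": "school", "turns": [("Do you like school?", "Yes, I like my school.")]},
]


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation (a simplifying assumption)."""
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]


def corpus_stats(dialogs):
    """Count dialogs, distinct topics, question-answer pairs, word tokens, and vocabulary size."""
    words = Counter()
    topics = set()
    n_pairs = 0
    for d in dialogs:
        topics.add(d["topic"])
        for question, answer in d["turns"]:
            n_pairs += 1
            words.update(tokenize(question))
            words.update(tokenize(answer))
    return {
        "dialogs": len(dialogs),
        "topics": len(topics),
        "qa_pairs": n_pairs,
        "tokens": sum(words.values()),  # total word occurrences
        "vocabulary": len(words),       # distinct word types
    }


print(corpus_stats(dialogs))
```

For the two toy dialogs, this reports 2 dialogs, 2 topics, 3 question-answer pairs, 26 tokens, and a vocabulary of 12 words. A real dataset table would typically also break these counts down by proficiency level or age group.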

Author Response

Please see the attachment.

Author Response File: Author Response.docx
