Phonetic Variation Modeling and a Language Model Adaptation for Korean English Code-Switching Speech Recognition
Round 1
Reviewer 1 Report
-The English should be corrected.
-Please add a color picture of the measurements (optional), with arrows indicating what is what.
-Please add a step-by-step block diagram of the proposed research. What is the result of the paper?
-Please add a block diagram of the proposed method.
-Please add a photo or photos of an application of the proposed research.
-Please add sentences about future analysis.
-The figures should have better quality; for now it is low.
-The fonts in the figures should be bigger.
-Please add labels to the axes of the figures.
-Please add arrows to the photos indicating what is what.
-The formulas and fonts should be formatted.
-About 50% or more of the references should be from 2018-2021 and indexed in Web of Science; 30 at least.
-Please compare with other methods and justify: what are the advantages and disadvantages of the different methods?
-Conclusion: point out what you have done.
-Is it possible to use the proposed method for other problems?
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
The authors address the 'Code Switching' problem in ASR, that is, the appearance of words or phrases from a second language within a sentence, in this case English words appearing in Korean sentences. Two techniques are reported and analysed separately and together:
English words from a Korean speaker may not be pronounced normally (Konglish) because of phonetic differences between the two languages. This effect is analysed carefully and captured in a learnt pronunciation model which is used in end-to-end ASR.
A language model for this kind of speech is obtained by adapting with code-switching material close to the target domain. There are several variations on this process.
Results (reported as percentage reductions in the Error Reduction Rate, which is undefined) are positive without being dramatic. Results are not compared with those from other studies, but perhaps there is no very similar work; if that is the case, please say so.
The paper is well organised and most descriptions are clear, but there are many small errors of English and the manuscript must be proofread by a native speaker.
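For reference, a relative error reduction is commonly computed as the drop in error rate divided by the baseline error rate. The sketch below assumes that definition; the manuscript does not state which one it uses, and the function name and the numbers are hypothetical, for illustration only.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Relative reduction in error rate, as a percentage of the baseline.

    Assumed definition: 100 * (baseline - new) / baseline.
    Both arguments are error rates in the same unit (e.g. percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates, not taken from the manuscript.
print(relative_error_reduction(20.0, 18.0))  # prints 10.0
```

Under this definition a drop from 20% to 18% is a 10% relative reduction but only a 2-point absolute reduction, which is why the manuscript should state which of the two it reports.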
We have the following concerns:
The argument for using 'Sentences containing very rare English words' in LM adaptation is unconvincing, and Figure 3, which is meant to illustrate the effect, is unclear.
The paper would benefit from more discussion of the advantages and disadvantages of the work reported in comparison to the studies cited in the literature review, in particular those which construct context switching databases and those based on knowledge of phoneme systems. This discussion could be added to the conclusions.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
This paper is focused on an interesting topic, namely speech recognition of Korean English. The authors propose their own method for CS ASR, tested using a wide variety of speech samples. Next, they discuss and evaluate their approach, presenting the results of their experiments.
Indeed, English words are mixed in and merged with traditional sentences by people all around the world. Furthermore, individuals differing in background, country of origin, etc., have their own pronunciation of certain words and expressions, related to movements of the mouth, teeth, tongue, or vocal tract. In this case, the mixture of traditional native Korean and “modern” English is referred to as Konglish. In my opinion, such analyses should be carried out by scientists and researchers worldwide, including both monolingual and bilingual individuals.
The first part of the manuscript is very informative and provides a good introduction to the discussed subject and a thorough review of related work. The authors provide a good description of consonants in Korean and the American English dialect.
The presented figures (including block diagrams) are clear and understandable, as are the presented mathematical formulas and equations. They seem plausible and free of error. It is good that the symbols used are highlighted, in order to better distinguish them from plain text.
May I ask whether the Authors used (or are planning to use) a group of listeners in their studies, including native Korean speakers, both monolingual (average English skills) and bilingual (fluent English skills)? Such an evaluation, aside from machine learning and related computer algorithms, would be most interesting.
Personally, I would like to see more results, including those presented in tables and/or graphical form. Surely the Authors could share and describe more data from their studies. I do encourage them to consider extending the second (research) part of the paper. Furthermore, in my opinion, the Conclusions section should be extended as well.
The Authors present Accuracy (Correction Rate) results for their solution, compared to several others. I do think that a comparison of computational complexity, including CPU load, required RAM, processing time, etc., would be most interesting.
If this is not confidential information, I wonder whether the Authors have implemented and used their solution in any consumer product. If possible, please provide some information.
Suggestions:
- Double-check the whole manuscript for minor editorial and formatting issues, e.g. bullet points, alignment of figures and tables.
- Table 2 should not be centered, but aligned to the right-hand side.
- Consider inserting all figures at a larger size (with larger fonts) and at a higher resolution, in order to make them more legible and easier to interpret for the potential reader.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
-What is the application of the proposed method? Please add a photo.
-The algorithms should be changed into block diagrams.
-More references from 2018-2021 indexed in Web of Science should be added.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 3
Reviewer 1 Report
-Figure 8: please add labels to the axes, because at present frequency appears to be on both axes.
Author Response
Please see the attachment.
Author Response File: Author Response.docx