Peer-Review Record

The Development of a Kazakh Speech Recognition Model Using a Convolutional Neural Network with Fixed Character Level Filters

Big Data Cogn. Comput. 2023, 7(3), 132; https://doi.org/10.3390/bdcc7030132
by Nurgali Kadyrbek 1, Madina Mansurova 1,*, Adai Shomanov 2 and Gaukhar Makharova 3
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 10 April 2023 / Revised: 22 June 2023 / Accepted: 5 July 2023 / Published: 20 July 2023
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)

Round 1

Reviewer 1 Report

Correct the document according to the comments in the attached document.

Comments for author File: Comments.pdf


Fine, but the text should be depersonalized. Avoid using "we", "you", etc.

Author Response

Thank you for your detailed feedback. The article has been corrected in accordance with the comments.

Author Response File: Author Response.pdf

Reviewer 2 Report

The work describes a Kazakh speech recognition model using a CNN. Although the authors have covered and explained all aspects, the following points need to be addressed before the manuscript can be considered for publication.

1. Did the authors collect the data themselves, or was an existing dataset used for model development?

2. If the data were collected by the authors, as explained in Section 3.2 (Audio collection), the audio needs extensive signal processing and filtering. These steps are completely missing.

3. Figure 6: Make it clear and readable. 

4. Check the figure numbers and captions. For example, Figure 6 appears twice.

5. The proposed model could be validated on at least one publicly available dataset. This would help the authors demonstrate the effectiveness of the proposed model.

6. The conclusion should be rewritten. It should be informative, reporting the results obtained at each step along with the future scope.

7. Highlight the novelty of the proposed work.

Moderate English grammar and spelling checks are required.

Author Response

1. Did the authors collect the data themselves, or was an existing dataset used for model development?
The dataset has been published on an open hosting service and is available via a link; the updated article contains this information.

2. If the data were collected by the authors, as explained in Section 3.2 (Audio collection), the audio needs extensive signal processing and filtering. These steps are completely missing.
The short-time Fourier transform (STFT) function from the Librosa Python package was used during data loading to preprocess the audio signals by converting them into spectrograms. This approach has proven effective for extracting relevant features from raw audio signals, improving performance on various machine learning tasks [21]. Convolutional layers in an encoder can typically extract spectrogram-like representations from raw audio signals: their filters are trained to recognize patterns and features at various time and frequency scales that represent the underlying structure of the audio signal. For this reason, no additional signal-processing tools were used in this research.

The updated article contains this information.
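To illustrate the preprocessing step described in the response to point 2, here is a minimal sketch of a magnitude spectrogram computed with NumPy only, so that it runs without Librosa installed. It roughly corresponds to `np.abs(librosa.stft(y, n_fft=512, hop_length=128))`: a periodic Hann window, with frames centred by zero padding. The frame size, hop length, 16 kHz sample rate and test tone are illustrative assumptions, not values taken from the paper, and the edge frames may differ slightly from Librosa's output depending on its padding mode.

```python
import numpy as np


def stft_magnitude(y, n_fft=512, hop_length=128):
    """Magnitude spectrogram of a 1-D signal via a framed, Hann-windowed FFT.

    Returns an array of shape (1 + n_fft // 2, n_frames): frequency bins x frames,
    the same layout librosa.stft uses.
    """
    y = np.asarray(y, dtype=np.float64)
    # Centre each frame on its sample by zero-padding n_fft // 2 on both sides.
    y = np.pad(y, n_fft // 2, mode="constant")
    # Periodic Hann window (the variant usually used for spectral analysis).
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Stack windowed frames as columns: shape (n_fft, n_frames).
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    # Real FFT of each column keeps only the non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=0))


# Usage: one second of a 440 Hz tone sampled at an assumed 16 kHz.
sr = 16000
t = np.arange(sr) / sr
S = stft_magnitude(np.sin(2 * np.pi * 440 * t))
print(S.shape)  # (257, 126)
```

With 512-sample frames at 16 kHz each bin spans 31.25 Hz, so the tone's energy concentrates around bin 14 (440 / 31.25 ≈ 14.1). A CNN encoder, as described above, would then take such time-frequency matrices as input.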

3. Figure 6: Make it clear and readable. 
Done.

4. Check the figure numbers and captions. For example, Figure 6 appears twice.
Fixed.

5. The proposed model could be validated on at least one publicly available dataset. This would help the authors demonstrate the effectiveness of the proposed model.
Additional explanations of the results and of the validity of the improvement were added.

6. The conclusion should be rewritten. It should be informative, reporting the results obtained at each step along with the future scope.
The conclusion has been rewritten and now provides more informative context.

7. Highlight the novelty of the proposed work.
The discussion of the results has been extended, with emphasis on the significance of the architectural solution.

Author Response File: Author Response.pdf

Reviewer 3 Report

1. The abstract is not detailed enough. Readers expect to see more detail on the methodology, results, and conclusion in the abstract, which needs to be greatly improved. The abstract does not show that the authors achieved much, as there is no numerical justification to back the authors’ claims and no comparative analysis to show superior performance.

2. In the introduction, the authors should explain why they did this work (motivation) and discuss the possible outcome. Readers are primarily interested in the motivation and outcome of your research. Therefore, a good introduction should contain:

a. What is the problem to be solved?

b. Are there any existing solutions?

c. Which is the best?

d. What is the main limitation of the best and existing approaches?

e. What do you hope to change or propose to make it better?

f. How is the paper structured?

3. Please clearly highlight how your work advances the field beyond the present state of knowledge, and provide a clear justification for your work at the end of the literature review/related works section. The impact or advancement of the work can also appear in the conclusion.

4. The authors mention feature extraction but do not present this stage in their work. A block diagram or flowchart of the steps would be helpful. The authors should cite recent works on feature extraction. An example of such recent literature which the authors can consult, amongst others, is: Feature Extraction: A Survey of the Types, Techniques, Applications, 5th IEEE International Conference on Signal Processing and Communication (ICSC), Noida, India, pp. 158-164 (2019). DOI: 10.1109/ICSC45622.2019.8938371

5. The related works section is insufficient. The authors should improve this section, as they have left out many papers. Normally, authors are expected to fill gaps in the work of others. Therefore, at the end of the review section, state the open problems in this field with appropriate references and tell readers which one your work addresses. How does your paper differ from the following papers?

- https://www.researchgate.net/publication/344339328_Development_of_Automatic_Speech_Recognition_for_Kazakh_Language_using_Transfer_Learning

- A Study of Speech Recognition for Kazakh Based on Unsupervised Pre-Training, https://www.mdpi.com/1424-8220/23/2/870

- Hybrid end-to-end model for Kazakh speech recognition, https://link.springer.com/article/10.1007/s10772-022-09983-8

- Accent Classification of the Three Major Nigerian Indigenous Languages Using 1D CNN LSTM Network Model. Algorithms for Intelligent Systems, Springer Singapore, pp. 1–16, 2020. DOI: 10.1007/978-981-15-2620-6_1

6. Most of the figures in this paper are not clear enough. The authors should endeavour to improve them.

7. The authors need to discuss the results in Tables 3, 4, and 5 in more depth. The reason why the proposed technique performs better has not been explained.

8. This paper contains no comparison of results with existing works. Such a comparison should be added so that readers can see how your proposed method performs relative to other works.

9. The authors should structure the paper into abstract, introduction, literature review/related works, methodology, results and discussion, and conclusion.

10. I was hoping to see more results and discussion; presenting additional results would make the work considerably more valuable. The authors are encouraged to reduce plagiarism in the paper.

11. The limitations of the proposed study need to be discussed before the conclusion.

12. Some of the challenges encountered during the study can be highlighted, and future recommendations can be added at the end of the conclusion. Retitle the conclusion section "Conclusion and Recommendations".

13. The results and discussion section needs to be improved. In the section on the selection of local minima, what criteria did the authors use? What priors did the authors consider? What are the minimum and maximum values? If these are suitable, do they work for different types of speech data (e.g., accented speech), or only for the speech under consideration?

Moderate editing of English language is required.

Author Response

Thank you for your valuable feedback on our article.

We agree that the abstract needs more details to give readers a better understanding of our methodology, results, and conclusions. We have made significant improvements to the abstract to address your concerns. We have also added numerical justifications to back up our claims and compared our results with existing models to demonstrate the superior performance of our approach. Regarding the introduction, we appreciate your suggestions on providing more information on the motivation and outcome of our research. We have added a detailed explanation of the problem we aimed to solve, existing solutions, their limitations, and our proposed solution to improve on the current state of the art. We have also explained the structure of the paper to help readers navigate through the article.
We have highlighted how our work advances the field from the present state of knowledge and have provided a clear justification for our work. This is now stated at the end of the literature review/related works, and we have also included the impact and advancement of our work in the conclusion. We hope these revisions address your concerns and improve the overall quality of our article. Thank you again for your valuable feedback. 

We will work on including additional results and discussing them in more detail to improve the quality of the article.

Regarding plagiarism, we apologize if there were any cases of unintentional similarity in language or ideas. We have taken steps to ensure that the article complies with ethical and academic standards, including checking the manuscript with plagiarism detection tools and, where necessary, providing appropriate citations and references to sources. If you could kindly point out specific cases of similarity, we would be happy to respond accordingly.

As you noted, authors are usually expected to fill gaps in the work of other authors. However, our main goal was not merely to create a model, but to create one that can be deployed for speech recognition on platforms with limited computing resources. Our model is relatively lightweight and contains fewer parameters, which is a key indicator of this optimization. In addition, out of respect for other researchers, we did not compare our results in detail with theirs: our study was conducted on completely different data, which reduces the objectivity of any comparison of model accuracy.

Round 2

Reviewer 1 Report

Everything corrected according to suggestions.

Author Response

Dear Reviewer,

Thank you for your feedback on our research manuscript. We have carefully addressed your concerns regarding the method description, specifically regarding corpus collection and model selection. We appreciate your input and have made the necessary revisions to ensure that the research design aligns with the study objectives.

In response to your concerns, we have provided additional details in the method description to ensure clarity and address any ambiguities. We have also made necessary edits to improve the English language throughout the manuscript.

Additionally, we have included future research directions. Firstly, we highlighted the importance of optimizing the model architecture for real-time processing and embedding it on resource-constrained platforms, expanding its deployment possibilities. Secondly, we emphasized the need for extending the applicability of the developed model to low-resource scenarios through techniques such as unsupervised or semi-supervised learning, active learning, and few-shot learning.

We are grateful for your guidance, and we have carefully incorporated your suggestions into the revised manuscript. We believe that the updated version now better addresses your concerns and enhances the overall quality of our research.



Reviewer 3 Report

The authors have not shown the new contribution of the paper as there are numerous related works.

The English language is ok but the paper doesn't contribute to the body of knowledge.

Author Response

Dear Reviewer,

We sincerely appreciate your valuable feedback and constructive comments on our research manuscript. Your concerns regarding the adequacy of the method description have been carefully considered, particularly with regard to corpus collection and model selection. We highly value your input and would like to address these concerns in the revised version of the manuscript. Our aim is to ensure that the research design aligns with the objectives of the study.

  1. Corpus collection: In our study, we meticulously collected a transcribed audio corpus in the Kazakh language. The corpus comprised approximately 554 hours of high-quality data, providing a comprehensive dataset for both training and evaluation purposes. We believe that the dataset's quality and size contribute to the robustness of our research findings.

  2. Model selection: To establish a solid foundation for our study, we selected the DeepSpeech2 model as the base module. DeepSpeech2 is a widely recognized and extensively used model in the field of speech recognition. By leveraging this established model, we could build upon its strengths and explore architectural modifications to enhance its performance.

  3. Architectural changes: Our study incorporated significant architectural changes, encompassing both spectrogram characteristics and symbol-level information. We adopted a CNN hybrid architecture that effectively captures both acoustic and linguistic features. Furthermore, we introduced an additional filter based on the embedding of Kazakh language character vectors. These modifications were designed to improve feature representation and reduce model size, contributing to overall efficiency.

  4. Training methodology: In our research, we employed a combination of supervised and unsupervised training methods. The speech recognition model was trained on a labeled set of speech data, while character-level filters were trained using an unlabeled dataset. This hybrid approach allowed us to leverage both labeled and unlabeled data, thereby enhancing the model's performance.

  5. Evaluation of effectiveness: To evaluate the effectiveness of our proposed approach, we utilized the character error rate (CER) as a performance metric. We conducted comparisons with existing models to demonstrate the efficacy of our architectural modifications, showcasing improved accuracy and efficiency.

  6. Conclusions: The conclusions drawn from our study are aligned with the objectives and findings of the research. We aimed to highlight the effectiveness of our approach by showcasing accuracy improvements and model size reduction. Additionally, we discussed the potential application of our findings to speech-related tasks in other languages, underscoring the broader significance of our research. We also added information about future research steps.
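To make the metric in item 5 concrete, here is a minimal sketch of the standard character error rate: the Levenshtein edit distance (substitutions, insertions, deletions) between the reference and hypothesis character sequences, divided by the number of reference characters. The function names and the Kazakh example strings are illustrative assumptions; this record does not state whether the paper counts spaces or normalizes text before scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    # At the start of iteration i, prev[j] is the distance between
    # the first i-1 characters of ref and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            ))
        prev = cur
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted letter (ә -> а) in a 10-character reference gives CER = 0.1.
print(character_error_rate("сәлем әлем", "салем әлем"))
```

Because CER normalizes by the reference length, it can exceed 1.0 when the hypothesis contains many insertions; corpus-level CER is usually computed by summing edit distances and reference lengths over all utterances rather than averaging per-utterance rates.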

Regarding the method description, we have taken your feedback into careful consideration and made the necessary additions to enhance the clarity and comprehensiveness of the manuscript.

Once again, we express our gratitude for your valuable feedback, which has undoubtedly contributed to the refinement of our research. We are confident that the revised version of the manuscript will address your concerns adequately.

Author Response File: Author Response.pdf
