Article
Peer-Review Record

LWMD: A Comprehensive Compression Platform for End-to-End Automatic Speech Recognition Models

Appl. Sci. 2023, 13(3), 1587; https://doi.org/10.3390/app13031587
by Yukun Liu 1,2, Ta Li 1,2, Pengyuan Zhang 1,2,* and Yonghong Yan 1,2,3
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 12 December 2022 / Revised: 29 December 2022 / Accepted: 3 January 2023 / Published: 26 January 2023
(This article belongs to the Special Issue Audio, Speech and Language Processing)

Round 1

Reviewer 1 Report

Very well-written paper. Complex ideas are broken down using figures and flow charts, which keeps the explanations concise.

Author Response

Response: Thank you for taking the time to review our paper; we appreciate the positive comments.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper presents a compression platform named LWMD (light-weight model designing) for automatic speech recognition, along with the framework and algorithm for achieving the compression. An extensive literature survey has been provided for the modelling; however, on the subject of compression with various neural network algorithms, relatively little literature is cited. This can be improved. Moreover, there are too many self-citations in different combinations, which should be avoided.

The general goal of the paper is relevant, and the results are relevant to its objectives. However, looking at the references, one gets the impression that the contribution is only a small addition to previous works. To avoid that, a comprehensive literature survey comparing other algorithms and models could be added.

Author Response

Response: We give a more comprehensive comparison of existing model compression methods in Section 1 (Introduction) and add further relevant references. Organized by compression strategy, we introduce existing work on light-weight architecture design, parameter pruning, low-bit quantization, and tensor decomposition.
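For readers unfamiliar with these compression families, the minimal PyTorch sketch below illustrates two of them, parameter pruning and low-bit quantization, on a toy stand-in model. It is not the LWMD platform's own code; the layer sizes and the 50% sparsity level are arbitrary assumptions made for illustration only.

```python
# Illustrative sketch only (not the LWMD platform): two of the four
# compression families named above, applied to a toy stand-in model.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for one feed-forward block of an E2E ASR encoder;
# the models in the paper are Conformer-based and far larger.
model = nn.Sequential(
    nn.Linear(256, 1024),
    nn.ReLU(),
    nn.Linear(1024, 256),
)

# Parameter pruning: zero out the 50% smallest-magnitude weights
# of each linear layer (L1 unstructured pruning).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the sparsity into the tensor

# Low-bit quantization: store linear-layer weights as 8-bit integers,
# quantizing activations dynamically at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)    # one 256-dimensional feature vector
print(quantized(x).shape)  # torch.Size([1, 256])
```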

Author Response File: Author Response.pdf

Reviewer 3 Report

In this article, the authors propose a compression model (LWMD) for the E2E speech recognition task. The methodology is clear and original, but I have some suggestions, namely minor revisions:

1) The motivation of the current work should be highlighted.

2) The authors should compare their results with some state-of-the-art methods. I could not see any table related to comparison.

3) Adding some pros and cons of the proposed model would be helpful for readers.

4) Some papers related to speech processing could be discussed, especially regarding features:

** Unsupervised and supervised VAD systems using combination of time and frequency domain features

** milVAD: a bag-level MNIST modelling of voice activity detection using deep multiple instance learning

** Hybrid voice activity detection system based on LSTM and auditory speech features

5) Lastly, the discussion of future work should be extended by one or two sentences at the end of the Conclusion.

Author Response

In this article, the authors propose a compression model (LWMD) for the E2E speech recognition task. The methodology is clear and original, but I have some suggestions, namely minor revisions:

  • The motivation of the current work should be highlighted.

Response: We have highlighted the motivation in Section 1.

 

  • The authors should compare their results with some state-of-the-art methods. I could not see any table related to comparison.

Response: In Tables 1 and 2, the Conformer baseline results on Aishell-1 and HKUST are already state-of-the-art under their respective frameworks. Since the main purpose of our paper is to design a light-weight compression method for E2E ASR, we do not adopt additional data-augmentation tricks in our experiments. On the other hand, existing model compression work in E2E ASR is varied, and an objective benchmark for selecting the best compression method is lacking. We therefore select several representative compression methods as baselines in our experiments.

  • Adding some pros and cons of the proposed model would be helpful for readers.

Response: Besides the contributions described in Section 1, we further summarize the advantages and disadvantages of our method in the Conclusion.

 

  • Some papers related to speech processing could be discussed, especially regarding features:

** Unsupervised and supervised VAD systems using combination of time and frequency domain features

** milVAD: a bag-level MNIST modelling of voice activity detection using deep multiple instance learning

** Hybrid voice activity detection system based on LSTM and auditory speech features

Response: Additional discussion of speech processing has been added to the Introduction, and these relevant papers are cited.

 

  • Lastly, the discussion of future work should be extended by one or two sentences at the end of the Conclusion.

Response: We have extended the Conclusion with additional discussion of future work.

Author Response File: Author Response.pdf
