Performance Analysis of Deep Learning Model-Compression Techniques for Audio Classification on Edge Devices
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript presents results on the performance of deep learning models for audio classification running on edge devices. However, it seems that it is simply an adapted version of a thesis. It doesn't meet the standards of publication. For example, it contains many unnecessary paragraphs introducing basic knowledge well known to researchers (e.g., lines 157 to 200). Lines 146-150 use "chapter" when referring to different sections.
The Introduction section provides a simple summary of some relevant literature. It does not make a convincing argument why using Mel spectrum can potentially offer better performance.
The results achieved are not quite convincing either. It is not clear how previous studies were replicated to make a comparison. The Methods section does not provide enough technical details about the models and their implementations, making it impossible to judge the validity of the research results.
Comments on the Quality of English Language
It is ok.
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions in the re-submitted files.
- We have omitted lines 157 to 200. We have also changed "chapter" to "section" in lines 146-150.
- Mel spectrograms, created by mimicking the non-linear response of the human ear to different frequencies, offer advantages over raw audio data. They compress frequency information, capturing essential acoustic features while reducing redundancy. The high-dimensional nature of raw audio data is condensed into a more manageable set of features, aligning with machine learning preferences for lower-dimensional inputs. Mel spectrograms are designed to be perceptually relevant, enhancing performance in tasks influenced by human perception, such as speech-related applications. They also exhibit better noise robustness due to frequency compression and feature extraction, reducing the impact of irrelevant noise. Moreover, the computational efficiency of mel spectrograms makes them a practical choice, providing a more efficient representation in terms of both memory and computation.
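To make the dimensionality-reduction point concrete, below is a minimal sketch of how a log-mel spectrogram can be computed with torchaudio; the file name and parameter values are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch: computing a log-mel spectrogram from a waveform with torchaudio.
# "example.wav" and the parameter values are illustrative, not the paper's settings.
import torchaudio

waveform, sample_rate = torchaudio.load("example.wav")

mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=1024,      # STFT window size
    hop_length=512,  # stride between successive frames
    n_mels=64,       # number of mel frequency bands
)
mel_spec = mel_transform(waveform)                         # (channels, n_mels, frames)
log_mel = torchaudio.transforms.AmplitudeToDB()(mel_spec)  # log scaling, common CNN input
print(log_mel.shape)
```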
- In Table 2, we compare our results with the results reported in other research papers.
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
This paper focuses on Audio classification using deep learning models. The idea is clear, I recommend this paper for publication if the authors consider the following corrections in order to improve the paper's quality.
1- In the introduction, the third paragraph is missing a lot of citations.
2- The problem statement and research objectives are not clear in the introduction section.
3- In the last paragraph in the introduction section, replace Chapter with section.
4- Redraw Figure 1 and give more details on the research methodology.
5- Check the citation format in the whole paper such as in section 3.1
6- The following statement is a contradiction:
"All the experiments will be conducted using Python and for hardware, Raspberry Pi 303 and NVIDIA Jetson Nano were used. The Convolutional Neural Network is implemented in PyTorch version 2.1.0 and the Wavio audio library is used to process the audio files."
7- Strengthen the Discussion section by conducting a more thorough comparison of the performance of the proposed model with existing state-of-the-art methods.
8- I recommend providing a more detailed description of the Conclusion, emphasizing the significance of this research for the worldwide scientific community.
Regards
Comments on the Quality of English Language
Minor
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions in the re-submitted files.
- We have added citations in the third paragraph of the introduction section.
- We have stated the problem statement and research objectives as bullet points on page 3.
- In the last paragraph of the introduction section, we have replaced "Chapter" with "Section".
- We have added more details on the research methodology (lines 300-310).
- We have fixed the citation format in Section 3.1.
- We have fixed the statement (lines 252-253).
Reviewer 3 Report
Comments and Suggestions for Authors
This is an interesting and well-written paper. The authors evaluated the performance of audio classification deep neural networks deployed on a TPU coprocessor, namely the Edge TPU. The subject is especially topical because it covers the application of low-powered embedded devices to machine learning tasks. A low-powered TPU places strict limitations on deep neural network size and numerical precision, so model-compression techniques have to be applied. The authors notably apply magnitude pruning, Taylor pruning, and 8-bit quantization.
The proposed research demonstrates that a hybrid pruned model achieves a commendable accuracy rate of 89 percent, which, although marginally lower than the 92 percent accuracy of the uncompressed CNN, strikingly illustrates an equilibrium between efficiency and performance.
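For illustration, the sketch below shows what post-training 8-bit dynamic quantization looks like in PyTorch; the toy model and settings are assumptions, not the quantization pipeline evaluated in the paper.

```python
# Minimal sketch: post-training dynamic 8-bit quantization of a small model in PyTorch.
# The architecture here is a hypothetical placeholder, not the paper's CNN.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# Dynamic quantization stores Linear weights as int8 and quantizes
# activations on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
print(quantized(x).shape)
```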
I have the following remarks:
1) Section 2.2 please present the neural network architecture in tabulated form.
2) Please publish the source code of your research, for example in an online repository such as GitHub. Please add information on how and where to download the datasets. Without source code and datasets, the experiments are virtually impossible to reproduce.
3) Please explain, how did you calculate power consumption per iteration (Table 5)?
4) Why is the accuracy on the Raspberry Pi 4 lower than on the NVIDIA Jetson Nano? It should be a matter of the neural network, not the hosting hardware.
5) Please define in formal (mathematical) way all pruning techniques you apply.
Author Response
Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions in the re-submitted files.
- In Section 2.2, we have added the neural network architecture in tabulated form.
- Thank you again for pointing this out. We will publish the source code on GitHub after the research is published.
- In Table 5, we report only the time it takes to train per iteration. The general power consumption of the edge devices is quoted from their manuals.
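For clarity, a minimal sketch of how the per-iteration training time can be measured is shown below; the model, data loader, optimizer, and loss are hypothetical placeholders, not the paper's actual training loop.

```python
# Minimal sketch: measuring the average training time per iteration.
# `model`, `dataloader`, `optimizer`, and `criterion` are hypothetical placeholders.
import time
import torch

def time_per_iteration(model, dataloader, optimizer, criterion, device="cpu"):
    model.train()
    total_time, iterations = 0.0, 0
    for inputs, targets in dataloader:
        inputs, targets = inputs.to(device), targets.to(device)
        start = time.perf_counter()
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        total_time += time.perf_counter() - start
        iterations += 1
    return total_time / max(iterations, 1)  # seconds per training iteration
```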
- The NVIDIA Jetson Nano and the Raspberry Pi 4 perform almost identically. The small difference in their performance may have the following causes. The NVIDIA Jetson Nano typically has more powerful hardware, including a dedicated GPU (Graphics Processing Unit), which can significantly improve performance for certain tasks compared to the Raspberry Pi 4. The speed and capacity of the RAM and storage on both devices can also affect performance; the Jetson Nano may have faster memory or storage, contributing to better overall performance.
- We have added the mathematical formulas for the pruning methods in Section 2.2.1.
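As a rough illustration of the two criteria (not the exact formulas added in Section 2.2.1): magnitude pruning removes the weights with the smallest absolute value |w|, while Taylor pruning scores parameters by a first-order Taylor estimate of the change in loss caused by removing them. The sketch below uses PyTorch's built-in pruning utilities for the magnitude case; the layer and pruning ratio are illustrative assumptions.

```python
# Minimal sketch: L1 (magnitude) unstructured pruning with PyTorch's pruning API.
# The layer and the 30% pruning ratio are illustrative, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)

# Zero out the 30% of weights with the smallest absolute value.
prune.l1_unstructured(conv, name="weight", amount=0.3)

# The pruning mask is applied on the fly; make it permanent before export.
prune.remove(conv, "weight")

sparsity = float((conv.weight == 0).sum()) / conv.weight.numel()
print(f"sparsity: {sparsity:.2%}")
```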
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Well done
Comments on the Quality of English Language
Minor
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have addressed my remarks. In my opinion, the paper can be accepted in its present form.