Article
Peer-Review Record

Multi-Scale Features for Transformer Model to Improve the Performance of Sound Event Detection

Appl. Sci. 2022, 12(5), 2626; https://doi.org/10.3390/app12052626
by Soo-Jong Kim and Yong-Joo Chung *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 3 January 2022 / Revised: 24 February 2022 / Accepted: 2 March 2022 / Published: 3 March 2022
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

1. The introduction should be expanded with sufficient background and a literature review. Please add recent work on noise/sound detection, such as the papers listed below (there are others as well). Also discuss the techniques used by those researchers and compare the proposed work with theirs.

Suggested Papers:

a. Automatic Detection of Noise Events at Shooting Range Using Machine Learning

b. Detection and Identification of Background Sounds to Improvise Voice Interface in Critical Environments

c. Rare Sound Event Detection Using Deep Learning and Data Augmentation

2. I suggest that the authors clearly highlight the novelty of their work along with its contributions.

3. A diagram showing the overall method of the proposed work should be added, giving the reader a clear picture of the work at a glance. In addition, highlight the important parameters used for the different methods.

4. In the Results and Discussion section, add a table comparing the results of the proposed work with previous work by other researchers.

5. Were the parameters for the proposed architecture also decided based on reference 19? If so, please mention that as well, to give the reader a clear picture of how the architecture parameters, filters, and layers were decided. If not, please explain how the parameters, activation layers, filters, and layers were decided.

6. Were the training, validation, and evaluation data drawn from the same block of data? If so, is there a risk of creating an overfitted model? Please clarify the test, validation, and evaluation sets; the current description is confusing.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Approved for publication.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

This manuscript, titled "Multi-scale Features for Transformer Model to Improve the Performance of Sound Event Detection", constructs a multi-scale feature extraction model using the Transformer encoder to classify and predict sound events. The paper also applies this model to the mean-teacher model, thereby demonstrating its effectiveness in semi-supervised learning. Finally, the paper verifies the model on the DCASE 2019 dataset, where it shows a performance improvement over the baseline Transformer and the baseline mean-teacher model. Publication is recommended after some modification.

Here are some comments and suggestions for this article:

  1. This article introduces the SED task well and mentions CRNN, a model architecture used for SED, but lacks an explanation of the internal architecture of the CRNN model. It is recommended to add an explanation and comparison of the internal architecture of CRNN to help illustrate the advantages of the Transformer architecture in the SED task.
  2. Verifying the effectiveness of the multi-scale model in semi-supervised learning through the mean-teacher model is very valuable, as it shows that the model can extend to a variety of learning mechanisms.
  3. The experimental part of this paper has sufficient data and detailed hyperparameters. If some ablation experiments could be designed for the Transformer encoders inside the multi-scale model, it would better reflect the value of the multi-scale extraction model.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

This paper can be accepted once the changes below are made.

1) In the ABSTRACT section, please state the problem first.
2) In the INTRODUCTION section, "deep neural networks (DNNs)" should be capitalized.
3) The paper needs to be checked throughout by a native English speaker.
4) In the "Proposed Network Architecture" section, please state the action performed at every step and the logic behind it.
5) Please also describe the FUTURE WORK that should be done.
6) In the RESULTS section, clearly define the three parts of the training set and the reason for this division.
7) Please add a RELATED INFORMATION section with details about related papers.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I have no further concerns. I accept the paper.
