Article
Peer-Review Record

Effects of Data Augmentations on Speech Emotion Recognition

Sensors 2022, 22(16), 5941; https://doi.org/10.3390/s22165941
by Bagus Tris Atmaja 1,2,*,† and Akira Sasou 1
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 13 July 2022 / Revised: 3 August 2022 / Accepted: 5 August 2022 / Published: 9 August 2022

Round 1

Reviewer 1 Report

Some points should be addressed to make the paper eligible for publication:

 

The main problem statement should be highlighted.

The author needs to improve the related work section by adding state-of-the-art related works.

The dataset visualization figure is required in the dataset section (3.1.). What are the sampling rate and other metrics of used audio files?

In section 3.2.4., what is the noise type that was added to the original dataset?
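For context on this question, a common form of noise augmentation mixes white Gaussian noise into the clean signal at a target signal-to-noise ratio. The sketch below is illustrative only; the function name, noise type, and SNR value are assumptions, not details taken from the paper.

```python
import numpy as np

def add_white_noise(signal, snr_db, seed=None):
    """Mix white Gaussian noise into `signal` at a target SNR in dB.

    Illustrative sketch: the paper under review may use a different
    noise type or mixing scheme.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(signal_power / scaled_noise_power) == snr_db
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise

# Example: augment a 1-second 440 Hz tone at 20 dB SNR
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
augmented = add_white_noise(tone, snr_db=20.0, seed=0)
```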

It is necessary to add the main methodology flowchart.

What are the main differences between experiments one and two?

The results in Tables 2 to 5 require more description; why are the results the same for some instances in both experiments?

The ROC (Receiver Operating Characteristic) curve is required for the SVM; the author needs to calculate it.

 

Besides the accuracy metric, the comparison (benchmarking) table should contain recall, precision, and specificity. The author needs to calculate them to highlight the system's quality.
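For multi-class SER, the requested metrics are usually computed per class (one-vs-rest) and averaged. A minimal scikit-learn sketch on toy binary labels (illustrative only, not the paper's data) showing how recall, precision, specificity, and the ROC AUC can be obtained:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, roc_auc_score)

# Toy binary labels standing in for one emotion class vs. the rest
# (illustrative only, not taken from the paper)
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 0, 1, 1, 0])

precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN), a.k.a. sensitivity

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)                  # TN / (TN + FP)

# The ROC curve/AUC needs the SVM's continuous decision_function scores;
# these scores are made up for illustration.
scores = np.array([-1.2, 0.3, -0.5, 1.1, -0.2, 0.8, 1.5, -0.9])
auc = roc_auc_score(y_true, scores)
```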

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper discusses the influence of data augmentation techniques on speech emotion recognition. Different types and numbers of data augmentation methods were tested on the Japanese Twitter-based emotional speech corpus.

 

The detailed comments are as follows:

1. In line 84, could the authors explain why they did not apply data augmentation to the test data?

2. In section 3.1, what is the purpose of introducing the IEMOCAP dataset? The abstract and introduction state that the paper focuses on the JTES dataset.

3. Figure 3 lacks appropriate unit labels.

4. In sections 3.3 and 3.4, did the JTES and IEMOCAP datasets use the same feature extraction method and classifier?

5. In section 4, I think the experiments did not control variables well: if the hardware for Exp. #1 and Exp. #2 differed, then their other conditions should have been kept the same, for example the impulse-response addition.

 

6. The effect of data augmentation shown by the experimental results does not seem very significant. In particular, in the comparison with previous work, the improvement in accuracy seems to come from the feature extraction method (wav2vec 2.0) rather than from data augmentation.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The author has completed the required modifications; the paper is better now. My recommendation is to accept it.

Reviewer 2 Report

The authors have addressed all of my comments. I would like to recommend it for publication in Sensors.
