Submit to this Journal Review for this Journal Propose a Special Issue

Article Menu

Share Help Cite Discuss in SciProfiles

Open AccessArticle

Peer-Review Record

Effects of Data Augmentations on Speech Emotion Recognition

Sensors 2022, 22(16), 5941; https://doi.org/10.3390/s22165941

by Bagus Tris Atmaja^1,2,*,†

and Akira Sasou¹

Reviewer 1:

Ahmed Faeq Hussein

Reviewer 2: Anonymous

Sensors 2022, 22(16), 5941; https://doi.org/10.3390/s22165941

Submission received: 13 July 2022 / Revised: 3 August 2022 / Accepted: 5 August 2022 / Published: 9 August 2022

(This article belongs to the Special Issue Sensor Data Processing for Social Reasoning and Artificial Empathy in Human-Centred Systems)

Round 1

Reviewer 1 Report

Some points should be addressed to make the paper eligible for publication:

The main problem statement should be highlighted.

The author needs to improve the related work section by adding state-of-the-art related works.

The dataset visualization figure is required in the dataset section (3.1.). What are the sampling rate and other metrics of used audio files?

In section 3.2.4., what is the noise type that was added to the original dataset?

It is necessary to ad the main methodology flowchart.

What are the main differences between experiments one and two?

The results tables from 2 to 5 require more descriptions; why the results are same for some instances in both two experiments?

The ROC (Receiver Operating Characteristic) curve is required for the SVM. the author needs to calculate it.

Besides the accuracy metrics, the comparison table (benchmarking table) should contain recall, precision, and specificity. The author needs to calculate them to highlight the system quality.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper discussed the influence of data augmentations techniques on the speech emotion recognition. Different types and numbers of data augmentation methods were tested on the Japanese Twitter-based emotional speech corpus.

The detailed comments are as follows:

1. In line 84, could the authors explain why they didn’t add the data augmentations technique in the test data?

2. In section 3.1, what’s the purpose of introducing the IEMOCAP dataset? The abstract and introduction said the paper focused on the JTES dataset.

3. Figure 3 lacks appropriate unit labels

4. In sections 3.3 and 3.4, did the JTES and IEMOCAP datasets use the same features extraction method and classifier?

5. In section 4, I think the experiment did not control the variables well, if the hardware equipment of Exp.#1 and Exp.#2 were different, then their other conditions should be controlled the same, for example the impulse response addition.

6. The effect of data augmentation presented by the experimental results does not seem to be very significant. Especially in the comparison session with previous work, the improvement in accuracy seems to come from the feature extraction method (wav2vec 2.0) rather than the effect of data augmentation.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The author completes the required modifications; the paper becomes better now. My recommendation is to accept it.

Reviewer 2 Report

The authors have addressed all of my comments. I would like to recommend it for publication in Sensors.

Article Menu

Effects of Data Augmentations on Speech Emotion Recognition

Further Information

Guidelines

MDPI Initiatives

Follow MDPI