Open Access Article

Peer-Review Record

Contribution of Common Modulation Spectral Features to Vocal-Emotion Recognition of Noise-Vocoded Speech in Noisy Reverberant Environments

Appl. Sci. 2022, 12(19), 9979; https://doi.org/10.3390/app12199979

by Taiyang Guo¹, Zhi Zhu², Shunsuke Kidani¹ and Masashi Unoki¹,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 1 September 2022 / Revised: 23 September 2022 / Accepted: 24 September 2022 / Published: 4 October 2022

Round 1

Reviewer 1 Report

The work is interesting and, in my opinion, deserves to be published. However, I would suggest adding references for certain formulas to justify their choice from a theoretical point of view, even if these formulas are relatively well known in a different form. I am referring especially to the parameters that are used later in the experiments. In line 348, the authors state: "we estimated the probability density functions (PDFs) of ..."; I think that there is still a long way to go before this can be called an estimate, and a reformulation is probably desirable.

Author Response

Thank you for your comment. We used probability density functions (PDFs) in the discussion part of the article but did not explain them in detail. In the revised version, we therefore added an explanation in lines 348 to 354 covering two points. (1) Why we assumed that the distribution of each MSF in each emotion conforms to a normal distribution: each MSF in each emotion is a real-valued random variable with a finite mean and variance, so this study assumed that its distribution is normal. (2) How it is calculated: we estimate the distribution using the PDF of the normal distribution, with the mean and variance of each MSF taken across the 10 utterances of each emotion.
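The calculation described in point (2) can be sketched as follows. This is a minimal illustration, not the authors' code: the feature values are made up, the emotion pair is chosen arbitrarily, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the response does not say which variance estimator was used.

```python
from statistics import NormalDist

# Hypothetical values of one MSF for 10 utterances per emotion.
# These numbers are made up for illustration; they are not from the article.
msf_by_emotion = {
    "neutral": [0.42, 0.45, 0.39, 0.47, 0.44, 0.41, 0.46, 0.43, 0.40, 0.48],
    "sadness": [0.30, 0.33, 0.28, 0.35, 0.31, 0.29, 0.34, 0.32, 0.27, 0.36],
}

# Fit one normal distribution per emotion from the sample mean and the
# sample standard deviation (n - 1 denominator; an assumption here).
fitted = {
    emotion: NormalDist.from_samples(values)
    for emotion, values in msf_by_emotion.items()
}

for emotion, dist in fitted.items():
    # dist.pdf(x) evaluates the fitted normal density at x.
    print(f"{emotion}: mean={dist.mean:.3f}, stdev={dist.stdev:.3f}, "
          f"pdf at mean={dist.pdf(dist.mean):.2f}")

# Overlapping area of the two fitted PDFs: 0 means disjoint, 1 means identical.
overlap = fitted["neutral"].overlap(fitted["sadness"])
print(f"overlap={overlap:.3f}")
```

With only 10 utterances per emotion, the fitted mean and variance are rough estimates, which is consistent with the reviewer's caution about calling the result an estimated PDF.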

Reviewer 2 Report

In this manuscript, the authors investigated the contribution of modulation spectral features (MSFs) to vocal-emotion recognition in noisy reverberant environments. Emotion recognition using modulation spectral features has been widely explored in the past, whereas its performance in a noisy reverberant environment is not clear. Two common MSFs are found to contribute to vocal-emotion recognition. Therefore, the proposed idea is interesting and worth considering for publication in Applied Sciences. The paper is also well organized. Five vocal emotions (neutral, joy, cold anger, sadness, and hot anger) were considered in the current analysis. Will the conclusions apply to other vocal emotions?

Typos:

In line 87, Page 2: MSFs. and the vocal-emotion-recognition results.

Author Response

Response 1: Thank you for your comment. Although the Fujitsu Japanese Emotional Speech Database we used in this study contains only five emotions, they are all expressed by a professional and can therefore be considered typical representations of the diverse emotions in our daily life. We believe that the conclusions of this study are also applicable to other emotions, but this speculation still needs further experiments to verify. This will be our future research mission: to use more natural emotion databases containing more emotions to examine the contribution of MSFs to vocal-emotion recognition in daily communication. This explanation is added in lines 429 to 432 of the manuscript.

Response 2: Thank you for pointing out the mistake. We corrected "MSFs." to "MSFs" in line 87.
