Article
Peer-Review Record

Listeners’ Spectral Reallocation Preferences for Speech in Noise

Appl. Sci. 2023, 13(15), 8734; https://doi.org/10.3390/app13158734
by Olympia Simantiraki 1,* and Martin Cooke 2
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 24 June 2023 / Revised: 24 July 2023 / Accepted: 26 July 2023 / Published: 28 July 2023
(This article belongs to the Special Issue Audio, Speech and Language Processing)

Round 1

Reviewer 1 Report

I appreciate the chance to review this manuscript, which presents interesting and novel perspectives that could contribute to our understanding of auditory processing and speech recognition. I congratulate the authors for their innovative study, which merits publication. Nevertheless, there are certain aspects that require attention in order to enhance the manuscript’s focus, coherence, and clarity. Additionally, I have a few suggestions to make to the authors.

The study investigated the impact of spectral modifications through an experimental framework in which participants were granted the ability to manipulate speech parameters in real-time while receiving immediate audio feedback, facilitating the simultaneous assessment of both subjective preferences and word recognition performance. The authors concluded that listener choices were not arbitrary, even in cases where speech intelligibility reached its maximum or remained consistent across various adjustment values, suggesting that listener preferences encompass factors beyond the sole objective of maintaining comprehensibility.

 

Comments on the Introduction

In general, the background/rationale for this study is sufficiently developed as written. Yet, while I acknowledge the length limitation for the manuscript, I believe it would benefit from the inclusion of some information pertaining to auditory processing and speech recognition in general (not necessarily linked to non-live forms of speech output). Addressing this topic in the literature review could be relevant and could also serve as a guiding framework for the research design.

Furthermore, in the following sections of the manuscript, a few different speech properties are mentioned. However, it is unclear from the introduction why these particular speech properties were chosen for these experiments and how and why the specific manipulations were selected. It would enhance the manuscript if the authors provided a clearer justification for their choice of speech properties and offered more details regarding the selection of the specific manipulations used.

One general comment regarding formatting: the in-text citations are not numbered consecutively in the order they are cited (e.g. the very first citation that appears in the manuscript is numbered 9, followed by 47, 48, 53…).

 

Comments on the Methods and Results

Both experiments are very well designed and thoroughly described, as are the results. My only comment regarding these sections is that the subject inclusion/exclusion criteria and the participant selection process (e.g., was it an open call, by invitation, or convenience sampling?) are not sufficiently explained. I believe these aspects warrant further elaboration.

 

Comments on the Discussion (both interim and general)

It is essential to provide a stronger contextual placement of the present study within the existing body of research to effectively explain and support the findings. It is my opinion that the findings are not sufficiently supported by previous literature. The authors should consider elaborating on the interpretation of the results in light of the published literature, drawing meaningful connections and highlighting the implications of the study’s findings. Furthermore, it could be beneficial if the authors further elaborate on the larger significance of these results beyond the scope of the current study, emphasizing their contributions to the field and potential practical applications. This additional elaboration will enhance the overall quality and impact of the discussion section.

I recommend that the authors also consider including the small sample size as a limitation of the study. The limited number of participants makes it more challenging to isolate individual variability and could potentially impact the generalizability of the findings.

Author Response

Please see the attachment. 

Author Response File: Author Response.pdf

Reviewer 2 Report

In this article, the authors designed two experiments in which listeners adjusted spectral properties of speech degraded by different levels of noise to enhance its intelligibility. The article was very well written, and I enjoyed reading it. It could have important implications for hearing assistance technologies. I have a few mostly minor comments:

1- The authors used low-context sentences for the experiments. Please provide a discussion of the case of high-context speech material, as it is more relevant to daily communication. I'm assuming that with high-context sentences the scores will be closer to the 100% correct level and one may not see the effects of reallocation as clearly as in low-context sentences.

2- p5, Section 3.4: Did the authors account for homophones? (e.g., sea vs. see)

3- Figures 2 and 5: please provide the baseline intelligibility score for each SNR, otherwise it is unclear how much benefit each spectral reallocation condition provided. 

4- p7, line 253: "Since participants were encouraged to spend as long as necessary during the adjustment phase, trials with long response times were retained." Are the intelligibility scores only for the trials with long response times as well? If not, then plotting the intelligibility scores for all trials and then plotting the response time for the longer trials is confusing. Please elaborate.

5- p10, lines 351-353: using the same sentences for both experiments may be a confound. Listeners in experiment 2 were already familiar with the sentences during the adjustment phase. This familiarity is equivalent to an increased SNR, which then may change the listeners' preferred adjustment. Please justify. 

6- p10, lines 354-360: I highly recommend sharing the MATLAB code for the spectral filtering in both experiments to facilitate replicating the results. But feel free to ignore this comment. 

7- p13, line 455: missing apostrophe in listeners.

8- p15, line 525: given the important implications of the results in this paper for hearing aids, please provide a discussion on how these results may change for individuals with hearing loss given their reduced frequency selectivity. 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I appreciate the authors’ efforts to address the concerns raised in my initial review, and I acknowledge that the manuscript is indeed improved. Nevertheless, I remain concerned about two issues that may still require clarification, as they seem to have been potentially misunderstood.

Firstly, my suggestion regarding the inclusion of information on auditory processing and speech recognition in a general context appears to have been only partially addressed. The authors introduced two non-specific sentences in the manuscript, which read: “Understanding the mechanisms underlying auditory processing and speech recognition is fundamental for the advancement of such algorithms. Studies over several decades support the idea that listeners are remarkably flexible in terms of the information that is deemed to be sufficient to support accurate speech perception.” [page 1, lines 24-27]. However, the statement “studies over several decades” is not supported by references, and the authors then transition straight back to non-live forms of speech output. I believe it is crucial to delve deeper into auditory processing and speech perception in regular listening situations, encompassing aspects such as acoustic processing (e.g., frequency, amplitude, and temporal patterns), phonemic analysis, and word recognition. Addressing these aspects by drawing from relevant linguistics and phonetics literature would greatly enhance not only the introduction but also the discussion and practical applications of the findings.

Secondly (and I think perhaps these two issues are connected), my initial review suggested the need for a more comprehensive justification for the speech features the authors chose to work with, along with additional details regarding the selection of these specific spectral manipulations. The authors did present a preview of this discussion in the following paragraph: “An alternative paradigm is used in the present study to investigate the wider impact of spectral energy reallocation. The approach is based on providing listeners with the ability to modify some facet of the speech signal, with real-time auditory feedback, allowing listeners to express their preferences directly. This listener-centric technique has been used in the past to explore listeners’ preferred choice of formant frequency/fundamental frequency relationship, speech rate, speech level and local SNR for speech enhancement. Listener preferences in the context of spectral modifications have previously been explored for individuals with hearing loss by providing them with direct control over broadband, low-, and high-frequency gain or loudness and degree of spectral tilt.” [page 2, lines 70-79]. However, I feel this could be further elaborated. Why did the authors choose to work with frequency manipulations instead of, for example, loudness or rate?

There is ample literature discussing the role of speech spectral properties in listener comprehension and understanding even in the absence of speech manipulation. I suggest the authors review and incorporate relevant literature to support and strengthen the rationale for their study.

In conclusion, while the manuscript has been improved, addressing these two lingering concerns in a more comprehensive manner would significantly enhance the overall quality and scientific value of the manuscript.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors have addressed all my comments.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 1 Report

I have had the opportunity to once more thoroughly review the manuscript titled “Listeners’ spectral reallocation preferences for speech in noise” submitted to Applied Sciences. The manuscript presents a well-researched contribution to the field, and after careful evaluation, I am pleased to recommend its acceptance for publication in its present form. The authors have addressed all of the concerns I raised during the review process.
