Perspective
Peer-Review Record

Voice Maps as a Tool for Understanding and Dealing with Variability in the Voice

Appl. Sci. 2022, 12(22), 11353; https://doi.org/10.3390/app122211353
by Sten Ternström and Peter Pabon
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 30 September 2022 / Revised: 28 October 2022 / Accepted: 2 November 2022 / Published: 9 November 2022
(This article belongs to the Special Issue Current Trends and Future Directions in Voice Acoustics Measurement)

Round 1

Reviewer 1 Report

The article explores the prospects for solving the problem of determining the variability of the voice.

According to the MDPI classification, an article of the “perspective” type is: “Perspectives are usually an invited type of article that showcase current developments in a specific field. Emphasis is placed on future directions of the field and on the personal assessment of the author. Comments should be situated in the context of existing literature from the previous 3 years. The structure is similar to a review, with a suggested minimum word count of 3500 words.” The authors clearly did not meet the last two requirements. The article cites 46 sources, including such "fresh" ones as those from 1976, 1988, and 1991. I understand that the authors prefer "warm, tube sound" and that "classic never gets old", but "the law is harsh, but such is the law." In addition, the article clearly exceeds the 3500 words mentioned.

Articles of this kind should begin with a rationale for the process under study and a rationale for the relevance of researching it. The authors say a great deal about the first and practically nothing about the second. I recommend that the authors read the article "Entropy-Argumentative Concept of Computational Phonetic Analysis of Speech Taking into Account Dialect and Individuality of Phonation", which contains laconic examples of both.

I understand that voice maps are where speech technology began, but even if the object of study remains unchanged, the subject of study evolves, for example in the ways of obtaining voice maps. Wavelets are much better suited than the Fourier basis to visualizing short-term features of the voice. Finally, waveletograms are simply beautiful )

The authors say nothing about qualitative metrics for evaluating voice visualization options. Moreover, the article offers no classification of the candidate methods to be evaluated.

When it comes to the visual analysis of anything, it is simply impossible to ignore the achievements of computer vision. Convolutional neural networks, autoencoders, and transformers of all kinds are not just buzzwords but powerful applied technologies. How exactly could they be applied to the analysis of voice maps? The authors will tell us.

Finally, the article should end optimistically, with clear statements of specific tasks related to the object of study, preferably arranged in order of priority. Without this, we have journalism, not science. I urge the authors not to give up their work. There are not many of us "sound-know-mens" left )

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The present overview draws examples from several studies of normal voices using voice mapping and takes stock of many sources of variation in speakers and singers. The overall quality of the paper is in line with expectations, and the perspective is also pleasing. The following comments should be addressed in the revised version of this paper:

Abstract

1. In the abstract, the objective and scope are not very clear. For instance, the objective seems to be the “voice map”, yet the abstract does not present the “voice map” as the objective.

2. In addition, the abstract covers too many themes. I suggest briefly explaining the critical issue of the paper to make the abstract clear, concise, accurate, and informative.

Introduction

1. As this is a review article, you should describe the rationale in the context of what is already known. In addition, the article should be placed in the context of the scientific literature to guide the reader.

2. I think you should also state the paper’s contribution more clearly.

Aspects of variability in voice

I suggest adding a short paragraph describing the relationship between this section and the paper's central topic, the “voice map”.

Discussion and Conclusion

As far as I am concerned, this section lacks critical discussion. It reads more like a descriptive summary of the topic. If there is contradictory research on voice maps, it would be better to include an element of debate and present both sides of the argument.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I formulated the following recommendations for the original version of the article:

1: Articles of this kind should begin with a rationale for the process under study and a rationale for the relevance of researching it. The authors say a great deal about the first and practically nothing about the second. I recommend that the authors read the article "Entropy-Argumentative Concept of Computational Phonetic Analysis of Speech Taking into Account Dialect and Individuality of Phonation", which contains laconic examples of both.

2: I understand that voice maps are where speech technology began, but even if the object of study remains unchanged, the subject of study evolves, for example in the ways of obtaining voice maps.

3: Wavelets are much better suited than the Fourier basis to visualizing short-term features of the voice. Finally, waveletograms are simply beautiful.

Reviewer 1: The authors say nothing about qualitative metrics for evaluating voice visualization options. Moreover, the article offers no classification of the candidate methods to be evaluated.

4: When it comes to the visual analysis of anything, it is simply impossible to ignore the achievements of computer vision. Convolutional neural networks, autoencoders, and transformers of all kinds are not just buzzwords but powerful applied technologies. How exactly could they be applied to the analysis of voice maps? The authors will tell us.

The authors tried to respond to my recommendations with humor. I appreciated the efforts of the authors. I support the publication of this work, because the classics do not get old. The authors appeal to the unshakable pillars of speech technologies. We all stand on the pillars of the titans. Sometimes it is necessary to rediscover the open, because the perception of the picture depends on the lighting. I wish the authors not to stop and publish a research article and more than one.

Author Response

In this second round, Reviewer 1 simply reiterates the comments from the first round, without offering any assessment of whether or not those comments have been adequately addressed in our revision R1. 

Reviewer 1 then writes: "The authors tried to respond to my recommendations with humor. I appreciated the efforts of the authors. I support the publication of this work, because the classics do not get old. The authors appeal to the unshakable pillars of speech technologies. We all stand on the pillars of the titans. Sometimes it is necessary to rediscover the open, because the perception of the picture depends on the lighting. I wish the authors not to stop and publish a research article and more than one."

Thank you for your encouragement. We remain puzzled by several of your comments in round 1, whose relevance is not clear to us, but we nevertheless tried to formulate an appropriate response. We had hoped that you would comment on the substance of our article. That substance is (1) to deliver a well-founded criticism of some established measurement paradigms in voice research as being highly susceptible to several types of error, and (2) to propose solutions based on the greater awareness of variability in the voice that results from using the proposed voice mapping paradigm. We do not share your view that we "appeal to the unshakable pillars of speech technologies". Yes, we are grounded in voice science and speech technology from the early days, and we are also fully aware not only of current developments but also of current practice in clinical research on the voice, which we find to be lacking in some respects. As is evident from the list of references, we have published a number of research articles related to voice mapping over the years, and we intend to continue doing so.

 
