Article
Peer-Review Record

Multi-Scale Recurrence Quantification Measurements for Voice Disorder Detection

Appl. Sci. 2022, 12(18), 9196; https://doi.org/10.3390/app12189196
by Xin-Cheng Zhu, Deng-Huang Zhao, Yi-Hua Zhang, Xiao-Jun Zhang * and Zhi Tao *
Reviewer 1: Anonymous
Reviewer 3:
Submission received: 7 August 2022 / Revised: 11 September 2022 / Accepted: 12 September 2022 / Published: 14 September 2022
(This article belongs to the Section Acoustics and Vibrations)

Round 1

Reviewer 1 Report

In this paper, the authors evaluated a method for voice disorder detection using multi-scale recurrence quantification measurements on the vowel /a:/. In general, the topic is innovative and interesting, but I have too many concerns, fundamental questions, and remarks, which lead me to recommend rejection of the current version of the manuscript.

 

Major comments:

-        The abstract does not meet the criteria of Applied Sciences (https://www.mdpi.com/journal/applsci/instructions#preparation). Furthermore, the content should be written in a more reader-friendly way for interested readers who are not in the field.

-        The introduction is too complicated and not reader-friendly, and its English is rudimentary. Much information is missing about the procedure, working principle, and value of, for example, non-linear measures, MFCC, GFCC, recurrence maps, recurrence quantification analysis, and MRQMs.

-        The selection of the voice samples from the MEEI database is unclear: “From this database, 53 normal voice samples and 173 pathological voice samples with four symptoms were selected as a data subset. Among them, 53 normal voice samples, 20 cases of vocal folds polyps samples, and 67 cases of vocal folds paralysis samples were selected as multi-class samples.”

Did the authors only include three groups (normal voices, vocal fold polyps, and VF paralysis), or did they include a heterogeneous sample set of dysphonia? If the former is correct, the study and its (high) validity results are of marginal value and consequently add no novelty to the present literature. VF paralysis is mostly characterized by a breathy voice (high-frequency noise) and polyps mostly by a rough voice (spectral variation due to, e.g., multiphonia, irregularity, and glottal fry), in contrast to the harmonic sound of a normal voice. If the latter is the case, then please list all voice disorders in a detailed table. Furthermore, it is unclear what the authors mean by “four symptoms”.

-        There are also methodological weaknesses, such as the absence of a statistics paragraph describing the tests and software used, and the combination of the results and discussion sections under one heading.

-        The results are not clearly described and need to be improved.

-        The clinical value of this fundamental research is insufficiently described, although high validity results were found. A prospective view and future work are missing.

-        There are many errors and incomprehensible passages in the text (some examples):

-   “According to the characteristics of pathological voice energy distribution, Zhang used the frequency division method to improve the recognition rate in the bark scale [28].” Who or what is “Zhang”?

-   Error in writing: “In the past, the characteristics of voice signal were generally studied from the perspective of linear signal processing technology.(spectral and cepstral features)”

-   “Zhou extracted multi-scale nonlinear features GTSLs in the gamma scale [29].” Who or what are “Zhou”, “GTSLs”, and the “gamma scale”?

-   “Therefore, from the perspective of RP, […]”. What is RP?

-   Error in citing: “Leonardo Wanderley Lopes et al introduced […]”.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

This paper uses Recurrence Quantification Analysis (RQA) to analyse the global signals of patients vs. controls and shows high classification accuracy when the nonlinear parameters are used as input for the different machine learning approaches. In general, I think the paper is a good idea, as these nonlinear analyses are being used more often in the literature and have been shown to achieve high accuracy in classifying various physiological signals. Below are some comments that I hope help strengthen the paper:

- In the introduction, I would try to order/separate a bit more the argumentation on the three categories into which speech signal features can be classified (perturbation, spectral, ...). That part is a bit tough to follow. For instance, this part seems to come from nowhere:

It should be pointed out that the spectral or cepstral features do not depend on the accurate estimation of F0, such as mel frequency cepstral coefficients (MFCC) and gamma frequency cepstral coefficients (GFCC) [7]. 

Please organise this a bit more and maybe add some additional description to improve readability.

- Are the reference numbers on page 2 right? They jump abruptly from 8 to 27.

- Some full stops in the second paragraph on page 2 are probably not needed. Please revise.

- When RQA is first referenced in the introduction, the creators of the method should probably be acknowledged. This applies to the whole manuscript: in general, I think there is a lack of citations throughout, and some of the statements need support from the literature. Please address this in a further revision of the manuscript.

- Please revise the second paragraph of the gammatone filter bank description. The sentences are full of full stops, and it is tough to follow and understand the description of the variables in the formula.

- In the last paragraph of page 4, shouldn't it be Fig 2 and not Fig 4?

- When the authors describe the recurrence plot in Figure 3, they say:

“The recurrence characteristics of the dynamic system are represented by black dots, and a white dot is scored when the dynamic system is non-recurrent.”

 

In that image the plot is in blue, so please change the image or the text accordingly.
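As a point of reference for this figure: a recurrence plot renders the binary recurrence matrix R_ij = Θ(ε − ‖x_i − x_j‖), where Θ is the Heaviside step function, so a 1 (recurrent pair) is conventionally drawn as a black dot and a 0 as a white dot. The minimal sketch below assumes a Euclidean norm, a fixed threshold ε, and a hypothetical toy trajectory; it illustrates the general technique and is not the authors' implementation.

```python
import math

def recurrence_matrix(points, eps):
    """Binary recurrence matrix R[i][j] = Theta(eps - ||x_i - x_j||).

    1 marks a recurrent pair (black dot in a classic recurrence plot),
    0 a non-recurrent pair (white dot).
    """
    n = len(points)
    return [[1 if math.dist(points[i], points[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]

def render(R):
    # '#' stands for a black (recurrent) dot, '.' for a white (non-recurrent) dot
    return "\n".join("".join("#" if v else "." for v in row) for row in R)

# Hypothetical 1-D toy trajectory (not MEEI data) alternating between two states
trajectory = [(0.0,), (1.0,), (0.1,), (1.1,), (0.05,)]
R = recurrence_matrix(trajectory, eps=0.2)
print(render(R))
# #.#.#
# .#.#.
# #.#.#
# .#.#.
# #.#.#
```

The alternation between the two states shows up as a checkerboard of recurrent points, and the main diagonal is always black because every state recurs with itself.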

 

- In the first paragraph of page 7, further explanation is needed to justify the method used for threshold selection. Why is it not appropriate to use 5sigma? A further or clearer description is needed.
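For context on this point (the review does not say which alternatives the authors compared, and "5sigma" is kept as written), two heuristics commonly used in the RQA literature for choosing the threshold ε are (a) a fixed fraction of the signal's standard deviation and (b) the value that yields a prescribed recurrence rate. The sketch below implements both under those assumptions; the function names and toy numbers are hypothetical.

```python
import math
import statistics

def eps_from_std(signal, fraction):
    # Heuristic (a): eps is a fixed fraction of the signal's standard deviation
    return fraction * statistics.pstdev(signal)

def eps_from_recurrence_rate(points, target_rr):
    # Heuristic (b): smallest eps such that at least target_rr of the
    # off-diagonal pairs (i < j) satisfy ||x_i - x_j|| <= eps
    n = len(points)
    d = sorted(math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n))
    k = min(len(d), max(1, math.ceil(target_rr * len(d))))
    return d[k - 1]

print(eps_from_std([1.0, -1.0, 1.0, -1.0], fraction=0.1))      # 0.1
print(eps_from_recurrence_rate([(0,), (1,), (3,), (6,)], 0.2))  # 2.0
```

Heuristic (b) keeps the density of recurrent points comparable across recordings with different amplitudes, which is one argument sometimes given for it; whichever rule is used, stating and justifying it makes the RQA measures reproducible.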

- All the RQA parameters are great, but it would be good to add some predictions/examples for those parameters. For instance, is higher or lower transitivity expected in pathological patients? You could roughly state what to expect for each of those parameters and why. This would help readers understand the rationale of the analysis.
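To make such predictions concrete, it may help to spell out how the measures are computed. As general properties of RQA (not findings of this paper), periodic signals produce long diagonal lines and thus high determinism (DET), while uncorrelated noise yields mostly isolated recurrent points and low DET. The sketch below computes recurrence rate (RR), DET, and recurrence-network transitivity from a binary recurrence matrix using common textbook definitions (minimum line length 2, line of identity excluded from DET); the manuscript's exact definitions may differ, and the toy matrix is hypothetical.

```python
# Hypothetical toy recurrence matrix: states {0, 2, 4} recur with each other, {1, 3} likewise
R = [[1, 0, 1, 0, 1],
     [0, 1, 0, 1, 0],
     [1, 0, 1, 0, 1],
     [0, 1, 0, 1, 0],
     [1, 0, 1, 0, 1]]

def recurrence_rate(R):
    # RR: fraction of recurrent points (main diagonal included in this variant)
    n = len(R)
    return sum(map(sum, R)) / (n * n)

def determinism(R, lmin=2):
    # DET: share of off-diagonal recurrent points lying on diagonal lines of length >= lmin
    n = len(R)
    on_lines = total = 0
    for offset in range(1 - n, n):
        if offset == 0:
            continue  # skip the line of identity
        run = 0
        for i in range(max(0, -offset), min(n, n - offset)):
            if R[i][i + offset]:
                run += 1
                total += 1
            else:
                if run >= lmin:
                    on_lines += run
                run = 0
        if run >= lmin:
            on_lines += run
    return on_lines / total if total else 0.0

def transitivity(R):
    # Recurrence-network transitivity: share of connected triples (j-i-k)
    # that close into triangles, with adjacency A = R minus its main diagonal
    n = len(R)
    A = [[R[i][j] if i != j else 0 for j in range(n)] for i in range(n)]
    triangles = triples = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if j != k and A[i][j] and A[i][k]:
                    triples += 1
                    triangles += A[j][k]
    return triangles / triples if triples else 0.0

print(recurrence_rate(R), determinism(R), transitivity(R))  # 0.52 0.75 1.0
```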

- Please state the values of the delay and embedding dimension used. Also specify the range of values obtained for each parameter.
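Because the delay τ and embedding dimension m define the reconstructed phase space from which every recurrence plot and RQA measure is computed, reporting them is essential for reproducibility. For readers unfamiliar with the procedure, a minimal sketch of standard time-delay embedding follows; the parameter values are arbitrary illustrations, not those used in the manuscript (in practice τ is often chosen from the first minimum of the average mutual information and m with the false-nearest-neighbours method).

```python
def delay_embed(signal, m, tau):
    """Time-delay embedding: x_i = (s_i, s_{i+tau}, ..., s_{i+(m-1)tau})."""
    n_vectors = len(signal) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for the chosen m and tau")
    return [tuple(signal[i + k * tau] for k in range(m)) for i in range(n_vectors)]

# Illustrative values only: m = 3, tau = 2 samples
print(delay_embed([0, 1, 2, 3, 4, 5], m=3, tau=2))  # [(0, 2, 4), (1, 3, 5)]
```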

- The samples are quite unbalanced, with a much higher number of pathological patients than controls. Do you think this can affect the analysis and the machine learning algorithms?
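The effect of this imbalance on headline accuracy can be illustrated with simple arithmetic using the binary subset sizes quoted in Reviewer 1's report (53 normal and 173 pathological samples): a trivial classifier that labels every recording as pathological already reaches about 76.5% accuracy with no discriminative ability at all, which balanced accuracy (the mean of sensitivity and specificity) exposes as 50%. The function names below are illustrative.

```python
normal, pathological = 53, 173  # binary subset sizes quoted in the review

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def balanced_accuracy(tp, tn, fp, fn):
    # Mean of sensitivity (recall on pathological) and specificity (recall on normal)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

# Trivial classifier: every sample is labelled "pathological" (the positive class)
tp, fn = pathological, 0
tn, fp = 0, normal
print(round(accuracy(tp, tn, fp, fn), 3))  # 0.765
print(balanced_accuracy(tp, tn, fp, fn))   # 0.5
```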

- All the acronyms in the first paragraphs of page 10 need to be explained for inexperienced readers. In general, I would suggest using only 3 or 4 acronyms. Readers' capacity to remember all the acronyms is very limited, and they will most probably forget them unless they are experts in the topic.

“Due to the problem of unbalanced pathological samples, the FC-SMOTE algorithm was adopted for sample balancing for multi-classification experiments [44].” What does this mean? What does this algorithm do? Does it solve the problem of unbalanced samples? Please explain in more detail.
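The review does not describe FC-SMOTE itself (cited as [44]); judging by its name it presumably builds on SMOTE (Synthetic Minority Over-sampling Technique), whose basic idea is sketched below only as a point of reference. This is plain SMOTE, not FC-SMOTE: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name, k, the seed, and the toy data are illustrative assumptions.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Plain SMOTE: create n_new synthetic minority samples by interpolating
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nb, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical 2-D feature vectors of a minority class
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote(minority, n_new=6, k=2)
```

Oversampling of this kind is normally applied only to the training folds of a cross-validation; generating synthetic samples before splitting lets near-copies of test samples leak into training and can inflate the reported accuracy.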

- The first paragraph of the results and discussion section reads more like a methods description. I would suggest rearranging this a bit.

- Recurrence measures are generally highly correlated, and some machine learning algorithms can perform slightly worse in such cases. Could that affect your results? Please discuss accordingly.

- At the end of page 11, the authors claim: “This proved that the method proposed in this article can effectively separate the normal samples from the pathological samples, and the dichotomous classification performance was superior.” My problem is that this is a discussion section and nothing is being discussed. Why is the 40% threshold so good? Any guess?

 

- What makes RQA so good compared with other methods? Please discuss your results in comparison with existing analyses and demonstrate the improvement that RQA adds to the literature.

- At the beginning of page 12, shouldn't it be Fig 5? And on page 13, shouldn't it be Fig 6 instead of 7?

- Generally, RQA measures are highly correlated. Some machine learning algorithms tend to overfit the data in this case, or when more variables than needed are used. Do you think similar results can be obtained with fewer variables? Have you tried some feature reduction to check this?
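A quick way to check this, sketched below under stated assumptions, is a greedy correlation filter: features are visited in order, and a feature is kept only if its absolute Pearson correlation with every already-kept feature is below a chosen threshold. The feature names (RR, DET, LAM), the values, and the 0.9 threshold are hypothetical. Such a filter should be fitted on training data only, and it depends on feature order, so PCA or wrapper-based selection are reasonable alternatives.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0  # treat constant features as uncorrelated

def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if |r| with every kept feature < threshold.
    `features` maps feature name -> list of per-recording values."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical per-recording RQA measures (illustrative numbers only)
features = {
    "RR":  [0.10, 0.20, 0.30, 0.40],
    "DET": [0.21, 0.39, 0.61, 0.80],  # nearly a linear function of RR
    "LAM": [0.90, 0.20, 0.70, 0.10],
}
print(drop_correlated(features))  # ['RR', 'LAM']
```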

- No limitations are listed. Nothing to mention?

- I ask this of every paper I review: is the code used for this analysis freely available? If not, why not?

Overall, I think the paper can be considered for publication after major revision. I am happy to review a further revision of the present document.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 3 Report

Multiscale recurrence quantification measurements (MRQMs) are proposed for pathological voice detection. The predictive performance is assessed on the MEEI dataset and compared with existing methods.

Comments:

1- Explain the reason for the improvement in accuracy of the recurrence quantification measurements over state-of-the-art techniques in pathological voice analysis.

2- The method should be tested with other speech tasks, such as isolated words and running sentences, to assess its reliability.

3- Define the term R_ij in equations 5-7.

4- It is not clear what the multiscale RQMs represent in the study.

5- Expand the abbreviations for the machine learning algorithms (BN, LWL, etc.) in Section 3.

6- In Section 4, 4th paragraph, Fig. 8 should be replaced with Fig. 5.

7- Present the results of the pathological multi-class experiments in tabular form for better visibility to the reader.

8- Explain how the 312-dimensional feature vector is formed.

9- In Section 2.1, Fig. 4 should be replaced with Fig. 2. Also define the term ‘IAIF’.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I thank the authors for their major revision and agree with the acceptance of the manuscript in its current version.

Author Response

Thank you very much for your valuable comment. All your suggestions are very important; they have important guiding significance for my thesis writing and scientific work.

Reviewer 2 Report

I would like to thank the authors for carefully answering the comments. 

 

Author Response

Thank you very much for your valuable comment. All your suggestions are very important; they have important guiding significance for my thesis writing and scientific work.

Reviewer 3 Report

The authors have addressed most of my comments. However, I have a few remaining concerns:

1- In connection with comment 2: isolated words and sentences have been widely used to study speech disorders in many disease conditions, such as Parkinson's and Alzheimer's disease. Both the vocal cords and the vocal tract strongly influence speech pathology, so such tasks may provide better information. Recurrence quantification analysis should be tried with these speech tasks for a better understanding of the pathology.

2- Add the results of the other machine learning classifiers to Table 7 for better comparison.

 

 

Author Response

Please see the attachment

Author Response File: Author Response.docx
