Article
Peer-Review Record

The Impact of Pause and Filler Word Encoding on Dementia Detection with Contrastive Learning

Appl. Sci. 2024, 14(19), 8879; https://doi.org/10.3390/app14198879
by Reza Soleimani 1, Shengjie Guo 1, Katarina L. Haley 2, Adam Jacks 2 and Edgar Lobaton 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 7 June 2024 / Revised: 10 August 2024 / Accepted: 25 September 2024 / Published: 2 October 2024
(This article belongs to the Special Issue Artificial Intelligence Technology in Neurodegenerative Diseases)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

I personally like this work very much. It describes a well-designed study about encoding specific sounds and pauses made in speech in order to obtain text documents that might then be classified with much less computing power than raw audio recordings.
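
For concreteness, a minimal sketch of what such in-text encoding might look like follows; the duration thresholds and pause tokens are hypothetical placeholders, not the scheme the authors actually use.

# A minimal sketch (assumptions, not the paper's scheme): encode pauses and
# filler words as inline tokens so a text classifier can use them without audio.

PAUSE_TOKENS = [(0.5, ""), (1.0, "<p>"), (2.0, "<pp>"), (float("inf"), "<ppp>")]  # hypothetical buckets

def pause_token(duration):
    """Map a silence duration (in seconds) to a coarse in-text pause token."""
    for limit, token in PAUSE_TOKENS:
        if duration < limit:
            return token
    return ""

def encode(words):
    """`words` is a list of (word, pause_after_seconds) pairs, e.g. from a
    forced aligner. Fillers like 'uh'/'um' stay in as ordinary tokens."""
    out = []
    for word, pause in words:
        out.append(word.lower())
        tok = pause_token(pause)
        if tok:
            out.append(tok)
    return " ".join(out)

print(encode([("the", 1.2), ("uh", 0.6), ("cookie", 0.1), ("jar", 0.0)]))
# -> "the <pp> uh <p> cookie jar"

Once hesitations are rendered as ordinary vocabulary items like this, the downstream classifier needs no access to the audio, which is what makes the approach cheap at inference time.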

The authors put much effort into testing various kinds of encoding, finally obtaining results that are comparable with BERT-S, which is impressive. Also, the application of the results is in a very important area of study, namely cognitive disorder detection (though it could possibly also be applied elsewhere, such as detecting specific kinds of emotional states).

What I don't exactly like in the paper is the layout of its chapters. For instance, the comparison with BERT-S could well be placed in the section about experiments rather than in the discussion. Also, I would love to see a separate section about related work, not necessarily as part of the introduction. However, I admit that this is not a major obstacle to reading the text, and thus I leave the decision about changing the layout to the authors.

Minor remarks: line 139, at the end, "constrative" should probably be replaced with "contrastive"; line 324, at the end, "which this combination" looks like it is missing something.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper addresses a crucial topic in dementia detection using an interesting methodology. However, the authors need to articulate the contributions of their work more clearly.

To improve the manuscript, I suggest the following:  

1) The authors should explicitly state the novelty of their approach and how it advances the field of dementia detection. Simply applying an existing method to this domain is not a significant contribution in itself. I recommend that the authors consider modifying their method, comparing their dual contrastive approach with other existing methods, or exploring why their approach is particularly well suited for this task. Also, it is unclear whether the use of in-text pauses for dementia detection is a novel approach. The authors should clearly explain what has been done previously; referring to unpublished work in paper [18] is confusing and should be clarified.

2) The authors refer to paper [18], which is their preprint and not publicly available. To ensure reproducibility and transparency, all necessary details should be included in the current manuscript.  

3) The manuscript lacks details about the dataset used. It is unclear whether the authors worked with text transcripts or audio recordings. The authors mention existing AI speech-to-text methods, such as Whisper and Wav2vec (corrected typo, line 102), but it is unclear whether these methods were applied or whether the text versions were obtained from the source (a minimal transcription sketch follows this list).

4) Are the differences presented in Table 2 statistically significant? If not, what does this tell us?
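
If the transcripts were produced automatically rather than taken from the source (which, as noted in point 3, the manuscript does not make clear), the transcription step could be as simple as the following sketch using OpenAI's Whisper; the model size and audio file name are hypothetical.

import whisper  # pip install openai-whisper

model = whisper.load_model("base")          # hypothetical choice of model size
result = model.transcribe("interview.wav")  # hypothetical audio file
print(result["text"])                       # plain transcript; result["segments"] carries timestamps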

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Please review the introduction chapter. Some paragraphs belong in the methods chapter, others in the discussion and in the evaluation of the consistency of the outcomes. For instance:

"However, we take a different approach in our modeling that 45 does not involve temporal word alignment, which follows the directions in [18]. Instead, 46 we explore alternative methods to encode pauses and language information directly within 47 the textual data. This approach allows us to investigate the impact of pause information 48 on model performance, potentially offering new insights into dementia detection through 49 textual analysis. 50 Another approach of interest in this paper is contrastive learning (CL). Researchers 51 have adopted contrastive methods to perform dementia detection, relying on the data 52 itself to improve the representation space to enhance model performance. In this paper, 53 we employ a technique called dual contrastive learning (DualCL) [21], which has demon- 54 strated strong performance with general textual data. In Section 3.3, we provide a detailed 55 explanation of this methodology".

The paragraphs below also need to be re-allocated: some sentences belong in the methods chapter, others in the analysis of the results.

I suggest you use the traditional structure of scientific papers to make your contribution easier to understand: introduction, aim, method, theoretical background, outcomes/results, analysis, comments.

To achieve this, we will utilize LLMs (transfer learning). Our findings suggest that these approaches are effective in improving model performance. Below, we summarized our contributions:
• Proposing in-text pause and other language features (uh/um) encoding.
• Thoroughly evaluating different pauses to provide insight on how they affect model performance.
• Proposing the use of contrastive learning to improve performance.
• Combining in-text pause encoding and DualCL in a multitask manner.
To the best of the authors' knowledge, the proposed approaches as stated above and in subsequent sections have not been explored by other authors in the literature. Our work extends our previous research presented in [18], where we initially introduced in-text encoding. The differences between our new work and the previous one are summarized in our contributions.
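
As an illustration of the last bullet in the quoted list, a multitask objective of this kind is commonly implemented as a weighted sum of a classification loss and a contrastive loss. The sketch below uses a generic supervised contrastive term as a stand-in for DualCL [21], whose actual label-aware formulation differs, and the weight `lam` is a hypothetical hyper-parameter, not a value from the paper.

import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss; a stand-in for DualCL [21]."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                        # pairwise similarities
    mask = labels.unsqueeze(0) == labels.unsqueeze(1)    # same-class (positive) pairs
    mask.fill_diagonal_(False)                           # exclude self-pairs
    logits = sim - 1e9 * torch.eye(len(z), device=z.device)  # mask out the diagonal
    log_prob = F.log_softmax(logits, dim=1)
    pos_counts = mask.sum(dim=1).clamp(min=1)            # avoid division by zero
    return (-(log_prob * mask).sum(dim=1) / pos_counts).mean()

def multitask_loss(logits, embeddings, labels, lam=0.5):
    """Cross-entropy plus a contrastive term; lam trades off the two tasks."""
    return F.cross_entropy(logits, labels) + lam * supervised_contrastive_loss(embeddings, labels)

In this form the trade-off weight (lam above, often written as lambda) is the kind of hyper-parameter that is typically tuned on a validation set.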

Try to avoid making your own comments about the quality of your manuscript.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

(1) What is the encoding principle of the in-text pause?

(2) How is the hyper-parameter lambda set?

(3) The equation label is missing.

(4) How is the hyper-parameter a set?

(5) If possible, the complexity of the model should be analyzed.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors refer to paper [22] (previously [18]), which is still not publicly available. This makes it challenging to evaluate the novelty and contributions of the current work. I recommend that the authors consider publishing their previous work on a preprint server such as arXiv. This would allow us to access the previous paper and compare its content with the current submission.

Author Response

We thank the reviewer for the constructive feedback to improve the paper. We have incorporated the reviewers' comments into our manuscript as best we could.

Comment 1: The authors refer to paper [22] (previously [18]), which is still not publicly available. This makes it challenging to evaluate the novelty and contributions of the current work. I recommend that the authors consider publishing their previous work on a preprint server such as arXiv. This would allow us to access the previous paper and compare its content with the current submission.

Response: Per the reviewer's suggestion, we published the previous paper on a preprint server and updated the reference with a link to the paper. We thank the reviewer for their constructive feedback.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have revised the manuscript and submitted it again.

It would be nice if the authors structured the manuscript more clearly and moved part of the introduction to the methods chapter. Please clarify the main aim of the manuscript.

Aim 1:

This paper studies the effect of both modalities on the model performance. Also, using Transformers and attention-based approaches combined with convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been explored in [17,18].

Aim 2:

The main objective of this paper is to enhance model accuracy through the exploration of in-text encoding and contrastive learning, which can be considered a multi-task learning scheme. As shown in [25–27], multi-task learning can be beneficial in improving model performance.

Comment: Which one is the main aim of the manuscript?

This seems to belong in the methods chapter:

To achieve this, we will utilize LLMs (transfer learning). Below, we summarized our contributions.

This paragraph seems to belong in the final comments and reflections, not in the introduction:

To the best of the authors' knowledge, the proposed approaches as stated above and in subsequent sections have not been explored by other authors in the literature. Our work extends our previous research presented in [22], where we initially introduced in-text encoding. The differences between our new work and the previous one are summarized in our contributions. For more detail, please see Section 2.2.


Author Response

We thank the reviewer for the constructive feedback to improve the paper. We have incorporated the reviewers' comments into our manuscript as best we could.

Comment 1: Aim 1:

This paper studies the effect of both modalities on the model performance. Also, using Transformers and attention-based approaches combined with convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been explored in [17,18].

Aim 2:

The main objective of this paper is to enhance model accuracy through the exploration of in-text encoding and contrastive learning, which can be considered a multi-task learning scheme. As shown in [25–27], multi-task learning can be beneficial in improving model performance.

Comment: Which one is the main aim of the manuscript?

Response: Aim 2 is the main objective of the paper. We changed the wording to resolve this confusion. Thank you very much for pointing that out.

Comment 2:

This seems to belong in the methods chapter:

To achieve this, we will utilize LLMs (transfer learning). Below, we summarized our contributions.

This paragraph seems to belong in the final comments and reflections, not in the introduction:

To the best of the authors' knowledge, the proposed approaches as stated above and in subsequent sections have not been explored by other authors in the literature. Our work extends our previous research presented in [22], where we initially introduced in-text encoding. The differences between our new work and the previous one are summarized in our contributions. For more detail, please see Section 2.2.

Response: Per the reviewer's suggestion, we moved those portions of the text to their respective sections for greater clarity. We thank the reviewer for thoroughly examining the paper and providing constructive feedback.

Reviewer 4 Report

Comments and Suggestions for Authors

The authors have completed the paper revisions according to the reviewers' comments. 

Author Response

Comment 1: The authors have completed the paper revisions according to the reviewers' comments.

Response: We thank the reviewer for their constructive feedback. 
