Open Access Article

Peer-Review Record

AI-Based Detection of Aspiration for Video-Endoscopy with Visual Aids in Meaningful Frames to Interpret the Model Outcome

Sensors 2022, 22(23), 9468; https://doi.org/10.3390/s22239468

by Jürgen Konradi¹,*, Milla Zajber², Ulrich Betz¹, Philipp Drees³, Annika Gerken⁴ and Hans Meine⁴

Reviewer 1: Jerzy Balicki

Reviewer 2: Anonymous

Submission received: 19 October 2022 / Revised: 29 November 2022 / Accepted: 30 November 2022 / Published: 4 December 2022

(This article belongs to the Special Issue Explainable/Interpretable Machine Learning for Biomedical Sensing, Sensor Data Fusion and Diagnostics)

Round 1

Reviewer 1 Report

Quite a large team of authors set out to solve an extremely difficult problem: AI-based detection of aspiration in video-endoscopy, with visual aids in meaningful frames to interpret the model outcome. In my opinion, the task was accomplished perfectly. The manuscript reads fluently, the content is consistent, and the rich bibliography allows readers to supplement their knowledge if necessary. The described XAI model has been trained to detect aspiration in endoscopic swallowing videos. It also explains its assessment by locating specific video frames with relevant aspiration events. Moreover, the model distinguishes the suspected bolus in situ as a meaningful sequence. These difficult diagnostic decisions are therefore verifiable, interpretable, and thus acceptable to clinical users. Interaction with dysphagia experts can then further improve the outcome. The undoubted advantages of the recommended software are that it helps endoscopists improve accuracy, shorten the duration of the administration, and save costs. It is worth emphasizing that swallowing disorders are a relevant problem across various etiologies and all sectors of healthcare provision. Generally speaking, oropharyngeal dysphagia is extremely dangerous to human life, even in seemingly trivial cases such as the aspiration of boluses and saliva, when material passes the vocal cords and enters the airways.

Admittedly, a CNN for aspiration detection in VFSS videos with an AUC of 1.00 shows high potential, but its clinical use is limited. In the proposed model, the AI learns the segmentation of relevant anatomical structures such as the vocal cords and the glottis. Simultaneously, the AI is trained to detect a bolus that passes the glottis and is aspirated into the airways. This interpretable architecture results in a final model that explains its assessment by locating specific video frames with relevant aspiration events and by highlighting the glottis, vocal cords, and suspected bolus in situ as visual aids in meaningful frames.

While reviewing the manuscript, I encountered some weaker passages worth discussing:

1. Automated video analysis by the CNN was applied to generate a new video in which all AI-based segmentations and aspiration detections are drawn into every frame of the video sequence. Why was the CNN not compared with an LSTM network, which is considered an effective tool for video clip analysis?

2. Mathematical formulas are missing from the manuscript. Even when writing about medical issues, it is worth defining basic relationships, such as the F1 score. We cannot assume that the reader remembers all the concepts, especially since we are writing about XAI.

3. The very nice writing style of the work is slightly disturbed by the aesthetics of Figure 3, in which the size of the font describing the axes is approx. 50% larger than that used in the manuscript.

Of course, the above comments do not negatively affect the very high rating of the article, but taking them into account will undoubtedly increase its quality.
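
For reference, the F1 score mentioned in comment 2 is conventionally defined as the harmonic mean of precision and recall, F1 = 2 · precision · recall / (precision + recall). The following is a minimal sketch of that standard definition only; it is not taken from the manuscript. The function name `f1_score` and the example counts are hypothetical and chosen purely for illustration.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute F1 from true positives, false positives, and false negatives.

    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    F1        = 2 * precision * recall / (precision + recall)
    """
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (not results from the paper): 8 correctly flagged frames,
# 2 false alarms, 2 missed aspiration frames.
example_f1 = f1_score(tp=8, fp=2, fn=2)
```

With these hypothetical counts, precision and recall are both 0.8, so the F1 score is also approximately 0.8.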

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Dear authors,

I found your paper interesting and well written. Despite thorough checking, I could not find any major problems. The topic is fresh and interesting, and the scope of the work is clearly presented and well organized. In my opinion, the language is fine and I could not find any big issues. The only thing that lowered my reception of the paper was the quality of the figures. Please make them more readable (e.g., Fig. 1 is blurred and too small, the font in Fig. 2 is too small, and the figure itself would benefit from being bigger).

Author Response

Please see the attachment

Author Response File: Author Response.pdf
