Article

Deep Learning and Automatic Detection of Pleomorphic Esophageal Lesions—A Necessary Step for Minimally Invasive Panendoscopy

by Miguel Martins 1,2,†, Miguel Mascarenhas 1,2,3,*,†, Maria João Almeida 1,2, João Afonso 1,2, Tiago Ribeiro 1,2, Pedro Cardoso 1,2, Francisco Mendes 1,2, Joana Mota 1,2, Patrícia Andrade 1,2,3, Hélder Cardoso 1,2,3, Miguel Mascarenhas-Saraiva 4, João Ferreira 5 and Guilherme Macedo 1,2,3
1 Department of Gastroenterology, São João University Hospital, 4200-437 Porto, Portugal
2 WGO Gastroenterology and Hepatology Training Center, 4200-047 Porto, Portugal
3 Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
4 Gastroenterology Department, ManopH, Instituto CUF, 4460-188 Porto, Portugal
5 Department of Mechanical Engineering, Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2025, 15(2), 709; https://doi.org/10.3390/app15020709
Submission received: 1 October 2024 / Revised: 14 December 2024 / Accepted: 23 December 2024 / Published: 13 January 2025

Featured Application

Capsule endoscopy (CE) offers a minimally invasive approach for assessing the entire gastrointestinal tract, from the esophagus to the colon. However, the prolonged procedure duration and potential for error, particularly in capsule panendoscopy (CPE) cases, limit its widespread adoption. Although artificial intelligence (AI) has made significant progress in CE, particularly in automating enteric lesion detection, developing specific AI models for esophageal lesions remains crucial for the advancement of AI-enhanced CPE. This study aimed to develop and validate an AI model for detecting various types of esophageal lesions at once. It serves as a proof of concept for automating the identification of different types of lesions in this anatomical region, bringing us closer to achieving cost-effective CPE.

Abstract

Background: Capsule endoscopy (CE) has improved digestive tract assessment; yet, its reading burden is substantial. Deep-learning (DL) algorithms have been developed for the detection of enteric and gastric lesions. Nonetheless, their application in the esophagus lacks evidence. The aim of this study was to develop a DL model for esophageal pleomorphic lesion (PL) detection. Methods: A bicentric retrospective study was conducted using 598 CE exams. Three different CE devices provided 7982 esophageal frames, 2942 of which contained PLs. The data were divided into training/validation and test groups, in a patient-split design. Three runs were conducted, each with a unique patient set. The sensitivity, specificity, accuracy, positive and negative predictive value (PPV and NPV), area under the conventional receiver operating characteristic curve (AUC-ROC), and area under the precision–recall curve (AUC-PR) were calculated per run. The model’s diagnostic performance was assessed using the median and range values. Results: The median sensitivity, specificity, PPV, and NPV were 75.8% (63.6–82.1%), 95.8% (93.7–97.9%), 71.9% (50.0–90.1%), and 96.4% (94.2–97.6%), respectively. The median accuracy was 93.5% (91.8–93.8%). The median AUC-ROC and AUC-PR were 0.82 and 0.93, respectively. Conclusions: This study focused on the automatic detection of pleomorphic esophageal lesions, potentially enhancing the diagnostic yield of this type of lesion compared to conventional methods. Specific esophageal DL algorithms may provide a significant contribution and bridge the gap for the implementation of minimally invasive CE-enhanced panendoscopy.

1. Introduction

Capsule endoscopy (CE) has revolutionized the management of small bowel diseases [1]. Since its introduction into clinical practice, it has evolved to include the possibility of a colon assessment as part of a panenteric approach [2]. In fact, due to the opportunistic capture of esophageal and gastric mucosa while performing CE, there has been a growing interest in developing a more comprehensive panendoscopic solution that enables the additional evaluation of these locations [3]. Advances in imaging technologies, such as those enhancing mucosal visualization, further highlight the potential of this method for assessing the gastrointestinal (GI) tract [4].
Although minimally invasive capsule panendoscopy (CPE) may be tempting and likely well-tolerated by most patients, it presents some technical challenges, including a higher burden for exam interpretation and a greater risk of missing crucial frames [3]. Consequently, the introduction of automated reading assistance methodologies may be crucial for decreasing its reading time while attaining a highly satisfactory diagnostic performance [5]. By doing so, it may be possible to reduce its financial costs, further reinforcing CPE as a cost-effective option [5].
There are particular challenges with the esophageal assessment by CE. Insufflation and controlled movement are not possible in this procedure [6]. Moreover, the rapid esophageal transit, particularly in the upright position, accounts for the reduced number of esophageal mucosa frames provided by each exam, especially of the Z-line, where frequent pathologies are discovered [7]. While esophagogastroduodenoscopy (EGD) remains the gold standard for esophageal assessment, its invasiveness can cause patient discomfort and poses a non-negligible risk of complications [8]. Additionally, certain protocols, such as swallowing the capsule in a reclined position, have been shown to enhance the diagnostic performance of CE in detecting esophageal lesions, suggesting it may be feasible to detect common esophageal lesions at a level fairly comparable to EGD [9,10].
Artificial intelligence (AI) has been a hot topic in the medical community, specifically in areas with a strong imaging component, such as gastroenterology [11]. In the CE technological scenario, machine-learning models are being supplanted by deep-learning (DL) algorithms due to their unsupervised learning capacity [12]. The majority of DL models developed so far are based on convolutional neural networks (CNNs), although there is a growing interest in using vision transformer (ViT) methods to leverage computer vision tasks [13]. While DL models have already demonstrated potential in the automatic detection of lesions, first in the small bowel and colon and, more recently, in the stomach, there is no evidence regarding their use in the esophagus [6,14,15].
The aim of this study was to develop and validate the first deep-learning model capable of the automatic detection of pleomorphic esophageal lesions.

2. Materials and Methods

2.1. Study Design

In this retrospective study across two centers (Centro Hospitalar Universitário de São João and the ManopH Gastroenterology Clinic, both in Porto, Portugal), we reviewed frames from capsule procedures, either small bowel CE or colon CE (CCE), performed from June 2021 to May 2023.
Since the project was designed without direct patient intervention, clinical management remained unaffected. In order to address privacy concerns related to data protection, each patient’s identifying information was omitted and replaced with an arbitrary number. A legal team certified as a Data Protection Officer (Maastricht University) also evaluated the privacy rules to ensure non-traceability and compliance with the General Data Protection Regulation.

2.2. Capsule Endoscopy Protocol

Three distinct devices were used to develop this DL model: PillCamTM SB3 (Medtronic Corp., Minneapolis, MN, USA), PillCamTM Crohn’s Capsule (Medtronic Corp., Minneapolis, MN, USA), and OMOM® HD Capsule (JINSHAN Co., Yubei, Chongqing, China). PillCamTM SB3 and PillCamTM Crohn’s frames were reviewed with PillCam™ Software version 9 (Medtronic, Minneapolis, MN, USA), whereas OMOM® HD images were examined using Vue Smart Software (Jinshan Science & Technology Co, Chongqing, Yubei, China, https://www.jinshangroup.com/product/omom-hd-capsule-endoscopy-camera/, accessed on 22 December 2024).
The European Society of Gastrointestinal Endoscopy’s recommendations were used to direct the bowel preparation process. Patients were encouraged to follow a clear liquid diet and fast overnight the day before capsule ingestion. A 2 L polyethylene glycol (PEG) solution was utilized for small bowel preparation. A 4 L PEG solution was given in a split dose for the PillCam Crohn’s capsule, similar to the preparation process for colonoscopy (patients were instructed to drink 2 L of PEG on the night prior to the procedure and 2 L on the morning of the procedure). The use of an anti-foaming agent, specifically simethicone, was also integrated into the capsule’s administration protocol.

2.3. Categorization of Lesions

Esophageal frames of CE exams were independently reviewed by three gastroenterologists with expertise in CE for the identification of lesions at this location. Each frame was classified as either normal or as containing a pleomorphic lesion, which included at least one of the following: protruding lesions, ulcers and erosions, vascular lesions, hematic residues, and esophageal diverticula. Images were included in the final dataset only if their classification received unanimous agreement from the three physicians. The algorithm was built using a total of 7982 frames from three different types of CE devices, 2942 of which had pleomorphic esophageal lesions.
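As a toy illustration of this unanimity rule, the filtering step might look like the following sketch in Python (the function name and data layout are hypothetical, not from the paper):

```python
# Hypothetical sketch of the unanimity filter: a frame enters the dataset
# only if all three reviewers assigned it the same label.
def unanimous_frames(reviews):
    """reviews: dict mapping frame_id -> list of three labels ('N' or 'PL')."""
    kept = {}
    for frame_id, labels in reviews.items():
        if len(set(labels)) == 1:  # all three reviewers agree
            kept[frame_id] = labels[0]
    return kept

# Frame f2 is discarded because the reviewers disagreed
labels = unanimous_frames({"f1": ["PL", "PL", "PL"], "f2": ["N", "PL", "N"]})
```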

2.4. Development of the DL Model and Performance Analysis

We developed a vision transformer (ViT) model to automatically identify and categorize two types of frames in the esophagus: normal mucosa and pleomorphic lesions. The complete dataset was divided into two main groups for our study: one for training and validation, and the other for testing. To ensure consistency, frames from the same patient were grouped together during this division, following a patient-split design. The training set, comprising 70% of the data, was employed to train the model, while the validation set (20%) helped fine-tune its parameters. The remaining 10% of the data constituted the testing set, used to independently evaluate the diagnostic performance of our ViT model. The graphical representation of our study design can be found in Figure 1.
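The patient-split design described above can be sketched with scikit-learn's `GroupShuffleSplit`, grouping frames by patient so that no patient crosses the split boundary (the study used a 70/20/10 train/validation/test division; this toy example shows a single binary split on made-up data):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Illustrative patient-split: frames from the same patient never appear
# on both sides of the split. The data here are toy values, not the
# study's dataset.
frames = np.arange(20)                 # 20 frame indices
patients = np.repeat(np.arange(5), 4)  # 5 patients, 4 frames each

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(frames, groups=patients))

# No patient contributes frames to both partitions
overlap = set(patients[train_idx]) & set(patients[test_idx])
```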
Our ViT model was initialized using pre-trained weights from ImageNet, a comprehensive image dataset specializing in object recognition [16]. We retained the feature extractor weights to leverage knowledge from ImageNet and defined our own fully connected layers to adapt the pre-trained model to our specific task. To prevent overfitting, dropout layers with a dropout rate of 0.2 were placed between these fully connected layers. Subsequently, a dense layer was added to determine the binary classification result (normal or pleomorphic esophageal lesion). The hyperparameters, including the initial learning rate, batch size (32), and the number of epochs, were determined through trial and error. Common data augmentation techniques, such as image rotation and mirroring, were applied during the training stage. Our computational setup consisted of an NVIDIA RTX A6000 graphics processing unit (NVIDIA Corp., Santa Clara, CA, USA) and a dual AMD EPYC 7282 16-core processor (AMD, Santa Clara, CA, USA).
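The rotation and mirroring augmentations mentioned above can be sketched with NumPy on a single H×W×C frame (the study does not name its augmentation library; this is an assumed minimal implementation):

```python
import numpy as np

def augment(frame):
    """Return the original frame plus rotated and mirrored variants."""
    variants = [frame]
    for k in (1, 2, 3):                # 90, 180, and 270 degree rotations
        variants.append(np.rot90(frame, k))
    variants.append(np.fliplr(frame))  # horizontal mirror
    variants.append(np.flipud(frame))  # vertical mirror
    return variants

frame = np.zeros((224, 224, 3), dtype=np.uint8)  # dummy square RGB frame
augmented = augment(frame)               # 6 variants, including the original
```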
The model outputs the probability of each frame being labeled as normal or having a pleomorphic esophageal lesion. Based on the highest calculated probability, each frame was assigned one of these labels (Figure 2). We also generated heatmaps to visualize the frame features that contributed most to the model predictions (Figure 3). To establish a reference point, we compared the final classification made by our algorithm with expert assessments provided by three gastroenterologists, recognized as the gold standard.
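The decision rule described above reduces to an argmax over the two predicted probabilities; a minimal sketch (label names and probability values are illustrative, not the model's actual output):

```python
# Each frame receives the label with the highest predicted probability.
LABELS = ("normal", "pleomorphic_lesion")

def classify(probs):
    """probs: (p_normal, p_lesion) for one frame, summing to 1."""
    return LABELS[probs.index(max(probs))]

label = classify((0.12, 0.88))  # lesion probability dominates
```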

2.5. Statistics and Reproducibility

We conducted three runs of training, each with an equal distribution of training, validation, and testing data. The patients included in each run were randomly selected, resulting in a unique set of patients for every iteration. Sensitivity (proportion of true positives correctly identified among the total number of individuals with lesions), specificity (proportion of true negatives correctly identified among the total number of individuals without lesions), accuracy (proportion of correctly identified cases [both true positives and true negatives] out of the total number of individuals [all predictions]), negative predictive value (NPV—proportion of true negatives correctly identified among all individuals who were predicted to not have lesions), and positive predictive value (PPV—proportion of true positives among all cases with a positive test result) were calculated for each test group. The final metrics for the model diagnostic performance were determined based on the median and range values of these variables. Additionally, we computed the area under the ROC curve (AUC-ROC) and the area under the precision–recall curve (AUC-PR) for each test set and calculated their average values. We chose to compute both precision–recall and conventional ROC curves to address the imbalance between normal mucosa frames (true negatives) and frames containing pleomorphic lesions (true positives), as this imbalance could potentially lead to misinterpretation when relying solely on the ROC curve.
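The per-run metrics defined above follow directly from the confusion-matrix counts; a minimal sketch (the counts below are made-up numbers, not the study's data):

```python
# Compute the five reported diagnostic metrics from confusion-matrix counts.
def metrics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),        # true negative rate
        "ppv": tp / (tp + fp),                # positive predictive value
        "npv": tn / (tn + fn),                # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

m = metrics(tp=75, fp=10, tn=190, fn=25)  # illustrative counts only
```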
Furthermore, we assessed the computational efficiency of our ViT model on our machine by measuring the processing time for all frames within the test set.
Our statistical analysis was carried out using scikit-learn v0.22.2 [17].

3. Results

We constructed a ViT model using a total of 598 exams, 512 of which were small bowel CE exams (457 PillCamTM SB3 and 55 OMOM® HD Capsule), while the remaining 86 were colon CE exams (PillCamTM Crohn’s Capsule).
After triple validation, a total of 7982 frames were used, with 2942 including pleomorphic esophageal lesions (protruding lesions, ulcers and erosions, vascular lesions, hematic residues, and esophageal diverticula). The number of frames and patients, as well as types of CE device are displayed in Table 1.
The metrics calculated for each run are displayed in Table 2. The median sensitivity, specificity, and accuracy were, respectively, 75.8% (range 63.6–82.1%), 95.8% (range 93.7–97.9%), and 93.5% (range 91.8–93.8%). The median positive and negative predictive values were, respectively, 71.9% (range 50.0–90.1%) and 96.4% (range 94.2–97.6%). The median AUC-ROC and AUC-PR for the detection of pleomorphic esophageal lesions were, respectively, 0.82 and 0.93 (Figure 4).
In the test set, each frame took 26 ± 3 milliseconds to process.

4. Discussion

To the best of our knowledge, this is the first AI deep-learning model for the automatic detection of pleomorphic esophageal lesions during CE. This proof-of-concept specific model showed good overall diagnostic performance metrics, across different types of CE devices, which could potentially serve as a noteworthy contribution and the missing step for the implementation of a minimally invasive CE-based panendoscopic evaluation.
There are some highlights that must be acknowledged. Firstly, the patient-split design ensures that frames from a specific patient are attributable to only one of the training/validation and testing groups, reducing the risk of similar frames appearing in both, which could lead to the overfitting of the AI model. Secondly, several CE devices were used to train this algorithm, which increases the interoperability of the model, an important aspect for enhancing its technology readiness level and, consequently, its application in real-life clinical practice. Thirdly, it is worth noting that this algorithm was trained using frames from two high-volume CE centers, which may increase the external validity of these results. Fourthly, its capacity to accurately identify different types of lesions during a single procedure enhances its clinical utility as well. Moreover, through the generation of heatmaps that highlight the region with the highest probability of containing a lesion, we can infer that the model appears to detect patterns as lesions in the way we ideally expect it to. This explainability feature addresses an important current topic, since it not only reduces the cognitive load of exam interpretation by directing attention to specific areas, but also empowers the physician to make the necessary corrections in case of erroneous predictions.
The fact that this model is based on a ViT architecture can also stand as a strength of this paper and should be emphasized. ViT models are a type of DL algorithm built on the transformer architecture, initially developed in the field of natural language processing, where self-attention mechanisms provide an enhanced capability to recognize complex relations [18]. This distinct feature enables such models to achieve greater precision in tasks such as language paraphrasing and translation. More recently, there has been increasing interest in employing ViT models for complex visual tasks, with some evidence indicating they perform as well as or better than CNN models [13,18]. In terms of CE technology, a great number of published DL models are based on CNN algorithms, but none of them use ViT models. To our knowledge, this model stands as a pioneer. Not only is it the first of its kind in the esophagus, but it also marks the inaugural application of a ViT method for automatic lesion detection at this specific location, potentially marking a significant double advancement in CE technology. The outcome metrics demonstrate a robust diagnostic performance, indicating a valuable balance between minimizing missed lesions and keeping the proportion of false positives low, which will be an essential consideration for an effective AI-enhanced capsule endoscopy assessment.
Nonetheless, some limitations have to be recognized. On the one hand, this study has a retrospective and bicentric design involving a relatively small number of patients, which may introduce a demographic (selection) bias, implying that these findings may not be broadly applicable to other population settings. Additionally, the lack of access to clinically relevant information about this sample further hinders our understanding of how this model might impact different patient groups. Subsequent prospective studies with a larger patient population and better control of clinical variables are needed in order to corroborate our results for future application. On the other hand, this algorithm was constructed using a relatively low number of CE still-frames, which does not necessarily guarantee that the model will exhibit the same diagnostic performance when applied to full-length CE videos. We were also unable to conduct a subanalysis by device or lesion type due to the limited sample size, which lacked sufficient statistical power. This also represents a limitation of the study, as it prevents us from evaluating how device-specific image characteristics may influence the model performance, as well as determining whether the accuracy is consistent across different lesion types. The paucity of esophageal mucosa frames is a topic worth mentioning, as it turns out to be both a limitation and a strength of this work. Until now, there were no published esophageal-specific trained DL models for lesion detection, which may be explained by the limited number of esophageal frames provided by each exam, impeding the development of dedicated databases. Despite being at a lower technological readiness level due to the limited data and methodological challenges, this work marks a significant achievement. It stands as the first published DL algorithm for this esophageal location, also reflecting the accumulated experience of our group in the field of AI-enhanced CE, although we recognize that future studies are needed to explore its broader applicability.
The use of AI algorithms that further improve the diagnostic performance of CE can potentially shift the imbalance that currently favors EGD and may render CE a cost-effective and patient-friendly option for the evaluation of esophageal pathology. In the case of esophageal varices, systematic reviews with meta-analyses have demonstrated that, with the correct protocol, the pooled accuracy for the detection of such lesions can reach up to 90% of cases [10]. Similarly, when it comes to Barrett’s metaplasia in patients with gastro-esophageal reflux disease, systematic reviews with meta-analyses indicate that CE’s diagnostic performance metrics can be comparable to those of EGD (with a pooled sensitivity of 78% and a specificity of 86% for CE, versus 78% and 90%, respectively, for EGD) [9]. Considering that it is possible for CE to detect esophageal lesions at a similar level as the gold standard, EGD, one may hypothesize that the addition of AI reading-assisted systems could potentially result in a further improvement in its diagnostic yield.

5. Conclusions

The implementation of a minimally invasive CPE is the ultimate goal of CE use. Nevertheless, the current cost-effectiveness of such a procedure is limited by the prolonged examination reading time and suboptimal diagnostic yield. This is where AI can play a pivotal role, addressing most of these limitations and paving the way for a potentially economically viable procedure, provided that bowel preparation is adequate. The success of AI-based CPE relies on the existence of trustworthy algorithms capable of detecting lesions in each of the GI tract regions—not only in the stomach, small bowel, and colon, but also in the esophagus. The development of DL models for the detection of pleomorphic esophageal lesions is an important milestone and serves as a proof of the methodological concept, taking us one step closer to achieving cost-effective CPE. However, real-world adoption will require standardized protocols, adjustments to clinical workflows and training, and further multicentric and prospective studies to confirm its feasibility.

Author Contributions

Conceptualization, M.M. (Miguel Mascarenhas), M.M. (Miguel Martins), and G.M.; methodology, M.M. (Miguel Mascarenhas) and M.M. (Miguel Martins); software, J.F.; validation, M.M. (Miguel Mascarenhas) and M.M. (Miguel Martins); formal analysis, J.F.; investigation, M.M. (Miguel Martins), M.J.A., J.A., T.R., P.C., F.M., and J.M.; resources, P.A., H.C., M.M.-S., and G.M.; data curation, M.M. (Miguel Martins), M.J.A., J.A., T.R., P.C., F.M., and J.M.; writing—original draft preparation, M.M. (Miguel Mascarenhas), M.M. (Miguel Martins), and M.J.A.; writing—review and editing, M.M. (Miguel Mascarenhas), M.M. (Miguel Martins), M.J.A., J.A., T.R., P.C., F.M., and J.M.; visualization, M.M. (Miguel Martins), M.J.A., J.A., T.R., P.C., F.M., and J.M.; supervision, P.A., H.C., M.M.-S., and G.M.; project administration, M.M. (Miguel Mascarenhas) and G.M.; funding acquisition, M.M. (Miguel Mascarenhas) and G.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of São João University Hospital (No. CE 407/2020).

Informed Consent Statement

Not applicable. The need for informed consent was waived by the Ethics Committee due to the retrospective nature of the study.

Data Availability Statement

Raw data were generated at the Faculty of Medicine of the University of Porto, PT. Derived data supporting the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pennazio, M.; Rondonotti, E.; Despott, E.J.; Dray, X.; Keuchel, M.; Moreels, T.; Sanders, D.S.; Spada, C.; Carretero, C.; Cortegoso Valdivia, P.; et al. Small-bowel capsule endoscopy and device-assisted enteroscopy for diagnosis and treatment of small-bowel disorders: European Society of Gastrointestinal Endoscopy (ESGE) Guideline—Update 2022. Endoscopy 2023, 55, 58–95. [Google Scholar] [CrossRef]
  2. Spada, C.; McNamara, D.; Despott, E.J.; Adler, S.; Cash, B.D.; Fernández-Urién, I.; Ivekovic, H.; Keuchel, M.; McAlindon, M.; Saurin, J.C.; et al. Performance measures for small-bowel endoscopy: A European Society of Gastrointestinal Endoscopy (ESGE) Quality Improvement Initiative. United Eur. Gastroenterol. J. 2019, 7, 614–641. [Google Scholar] [CrossRef]
  3. Rondonotti, E.; Pennazio, M. Colon capsule for panendoscopy: A narrow window of opportunity. Endosc. Int. Open 2021, 9, E1860–E1862. [Google Scholar] [CrossRef]
  4. Wang, Y.P.; Karmakar, R.; Mukundan, A.; Tsao, Y.M.; Sung, T.C.; Lu, C.L.; Wang, H.C. Spectrum aided vision enhancer enhances mucosal visualization by hyperspectral imaging in capsule endoscopy. Sci. Rep. 2024, 14, 22243. [Google Scholar] [CrossRef] [PubMed]
  5. Ribeiro, T.; Fernández-Urien, I.; Cardoso, H. Chapter 15—Colon capsule endoscopy and artificial intelligence: A perfect match for panendoscopy. In Artificial Intelligence in Capsule Endoscopy; Mascarenhas, M., Cardoso, H., Macedo, G., Eds.; Academic Press: Cambridge, MA, USA, 2023; pp. 255–269. [Google Scholar] [CrossRef]
  6. Mascarenhas, M.; Mendes, F.; Ribeiro, T.; Afonso, J.; Cardoso, P.; Martins, M.; Cardoso, H.; Andrade, P.; Ferreira, J.; Saraiva, M.M.; et al. Deep Learning and Minimally Invasive Endoscopy: Automatic Classification of Pleomorphic Gastric Lesions in Capsule Endoscopy. Clin. Transl. Gastroenterol. 2023, 14, e00609. [Google Scholar] [CrossRef] [PubMed]
  7. Park, J.; Cho, Y.K.; Kim, J.H. Current and Future Use of Esophageal Capsule Endoscopy. Clin. Endosc. 2018, 51, 317–322. [Google Scholar] [CrossRef] [PubMed]
  8. Levy, I.; Gralnek, I.M. Complications of diagnostic colonoscopy, upper endoscopy, and enteroscopy. Best. Pract. Res. Clin. Gastroenterol. 2016, 30, 705–718. [Google Scholar] [CrossRef] [PubMed]
  9. Bhardwaj, A.; Hollenbeak, C.S.; Pooran, N.; Mathew, A. A meta-analysis of the diagnostic accuracy of esophageal capsule endoscopy for Barrett’s esophagus in patients with gastroesophageal reflux disease. Am. J. Gastroenterol. 2009, 104, 1533–1539. [Google Scholar] [CrossRef]
  10. McCarty, T.R.; Afinogenova, Y.; Njei, B. Use of Wireless Capsule Endoscopy for the Diagnosis and Grading of Esophageal Varices in Patients With Portal Hypertension: A Systematic Review and Meta-Analysis. J. Clin. Gastroenterol. 2017, 51, 174–182. [Google Scholar] [CrossRef] [PubMed]
  11. Messmann, H.; Bisschops, R.; Antonelli, G.; Libânio, D.; Sinonquel, P.; Abdelrahim, M.; Ahmad, O.F.; Areia, M.; Bergman, J.; Bhandari, P.; et al. Expected value of artificial intelligence in gastrointestinal endoscopy: European Society of Gastrointestinal Endoscopy (ESGE) Position Statement. Endoscopy 2022, 54, 1211–1231. [Google Scholar] [CrossRef] [PubMed]
  12. Afonso, J.; Martins, M.; Ferreira, J.; Mascarenhas, M. Chapter 1—Artificial intelligence: Machine learning, deep learning, and applications in gastrointestinal endoscopy. In Artificial Intelligence in Capsule Endoscopy; Mascarenhas, M., Cardoso, H., Macedo, G., Eds.; Academic Press: Cambridge, MA, USA, 2023; 10p. [Google Scholar] [CrossRef]
  13. Moutik, O.; Sekkat, H.; Tigani, S.; Chehri, A.; Saadane, R.; Tchakoucht, T.A.; Paul, A. Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data? Sensors 2023, 23, 734. [Google Scholar] [CrossRef] [PubMed]
  14. Mascarenhas, M.; Ribeiro, T.; Afonso, J.; Ferreira, J.P.S.; Cardoso, H.; Andrade, P.; Parente, M.P.L.; Jorge, R.N.; Mascarenhas Saraiva, M.; Macedo, G. Deep learning and colon capsule endoscopy: Automatic detection of blood and colonic mucosal lesions using a convolutional neural network. Endosc. Int. Open 2022, 10, E171–E177. [Google Scholar] [CrossRef] [PubMed]
  15. Mascarenhas Saraiva, M.J.; Afonso, J.; Ribeiro, T.; Ferreira, J.; Cardoso, H.; Andrade, A.P.; Parente, M.; Natal, R.; Mascarenhas Saraiva, M.; Macedo, G. Deep learning and capsule endoscopy: Automatic identification and differentiation of small bowel lesions with distinct haemorrhagic potential using a convolutional neural network. BMJ Open Gastroenterol. 2021, 8, e000753. [Google Scholar] [CrossRef]
  16. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jegou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021; pp. 10347–10357. [Google Scholar]
  17. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  18. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
Figure 1. Diagram that shows how the study was designed. AUC-PR: area under the precision–recall curve; AUC-ROC: area under the conventional receiver operating characteristic curve; CE: capsule endoscopy; CCE: colon capsule endoscopy; NPV: negative predictive value; and PPV: positive predictive value.
Figure 2. Examples of how the algorithm estimated the probability of a frame being normal (N) vs. having a pleomorphic esophageal lesion (PL). Each frame was classified into the category with the highest probability, as stated earlier. The matching evaluation provided by the three physicians, considered the gold standard, is written in the upper-right rectangle.
Figure 3. Examples of generated heatmaps demonstrating how the DL model distinguishes a pleomorphic esophageal lesion: (a) erosion, (b) protruding lesion, (c) varices, and (d) diverticula.
Figure 4. (1) Area under the conventional receiver operating characteristic curve and (2) area under the precision–recall curve (AUC-PR) of the ViT’s performance in the detection of pleomorphic esophageal lesions. Precision, the proportion of positive predictions in which the DL model was correct, is on the y-axis. Recall (equivalent to sensitivity), the proportion of frames with true pleomorphic esophageal lesions that were detected, is on the x-axis. A higher precision indicates a lower false positive rate, whereas a higher recall means a lower false negative rate. The higher the precision and recall, the larger the AUC-PR.
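The precision–recall trade-off summarized in the Figure 4 caption can be illustrated with a short sketch. This is not the study's evaluation code: the scores and labels below are hypothetical, and AUC-PR is approximated here as average precision (the step-wise summary used by many toolkits).

```python
# Average precision as a proxy for AUC-PR (hypothetical data): rank frames by
# lesion score and accumulate precision at each recall step, i.e. at each
# true-positive encountered while walking down the ranking.
def average_precision(scores, labels):
    """Labels are 1 (lesion) or 0 (normal); returns AP in [0, 1]."""
    ranked = sorted(zip(scores, labels), reverse=True)
    total_pos = sum(labels)
    tp, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / total_pos

# Illustrative: lesion frames (1) mostly scored above normal frames (0).
print(round(average_precision([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]), 3))  # 0.833
```

A perfect ranking (every lesion frame scored above every normal frame) gives an AP of 1.0, matching the intuition in the caption that higher precision and recall enlarge the AUC-PR.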
Table 1. Number of frames, patients, and types of CE device per group, divided in a patient-split design: training/validation (90% of patients) vs. test (remaining 10% of patients).
| Run   | Group            | Frames (n) | Patients (n) | Types of Devices (n) |
|-------|------------------|------------|--------------|----------------------|
| Run 1 | Train/Validation | 7320       | 538          | 3                    |
| Run 1 | Test             | 662        | 60           | 3                    |
| Run 2 | Train/Validation | 7508       | 538          | 3                    |
| Run 2 | Test             | 474        | 60           | 3                    |
| Run 3 | Train/Validation | 6912       | 538          | 3                    |
| Run 3 | Test             | 1070       | 60           | 3                    |
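The patient-split design in Table 1 can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: `patient_split`, the patient IDs, and the seed are invented for the example. The key property is that frames from one patient never appear on both sides of the split.

```python
# Patient-level train/test split (hypothetical helper): frames are grouped by
# patient before splitting, so no patient contributes frames to both the
# training/validation set (~90% of patients) and the test set (~10%).
import random

def patient_split(frame_patient_ids, test_frac=0.10, seed=0):
    """Split frame indices so train and test share no patients."""
    patients = sorted(set(frame_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_frac))
    test_patients = set(patients[:n_test])
    train = [i for i, p in enumerate(frame_patient_ids) if p not in test_patients]
    test = [i for i, p in enumerate(frame_patient_ids) if p in test_patients]
    return train, test

ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
train_idx, test_idx = patient_split(ids)
# No patient appears on both sides of the split:
assert not {ids[i] for i in train_idx} & {ids[i] for i in test_idx}
```

This grouping is what prevents near-duplicate frames from the same patient leaking between training and test, which would inflate the metrics reported in Table 2.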
Table 2. Data were divided into training/validation and test groups in a patient-split design. Three runs were conducted, each with a unique set of test patients. Metrics were calculated on the test set of each run.
| Run               | Sensitivity (%) | Specificity (%) | PPV (%)     | NPV (%)     | Accuracy (%) |
|-------------------|-----------------|-----------------|-------------|-------------|--------------|
| Run 1             | 75.8            | 97.9            | 90.1        | 94.2        | 93.5         |
| Run 2             | 82.1            | 93.7            | 71.9        | 96.4        | 91.8         |
| Run 3             | 63.6            | 95.8            | 50.0        | 97.6        | 93.8         |
| Median            | 75.8            | 95.8            | 71.9        | 96.4        | 93.5         |
| (Minimum–Maximum) | (63.6–82.1)     | (93.7–97.9)     | (50.0–90.1) | (94.2–97.6) | (91.8–93.8)  |
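The Table 2 metrics all follow from the four confusion-matrix counts of a test run. The counts below are hypothetical, since the paper reports only the resulting percentages; the formulas are the standard definitions.

```python
# Standard diagnostic metrics from confusion-matrix counts (hypothetical
# counts; tp/fp/tn/fn = true/false positives and negatives at frame level).
def metrics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),          # recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),                  # precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

m = metrics(tp=80, fp=10, tn=400, fn=20)
print({k: round(v * 100, 1) for k, v in m.items()})
```

The pattern in Table 2 is typical of an imbalanced test set: with many more normal frames than lesion frames, specificity and NPV stay high across runs while PPV swings widely with a handful of false positives (50.0% to 90.1%).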