Search Results (337)

Search Parameters:
Keywords = interobserver variability

15 pages, 1866 KB  
Article
Robust Multiclass Pneumonia Classification via Multi-Head Attention and Transfer Learning Ensemble
by Shenghua Rao, Zhuo Zeng and Jiemeng Zhang
Appl. Sci. 2025, 15(21), 11426; https://doi.org/10.3390/app152111426 (registering DOI) - 25 Oct 2025
Abstract
Pneumonia is an acute respiratory infection caused by pathogens such as bacteria or viruses, and accurate early diagnosis is critical for reducing mortality. Chest X-ray (CXR) imaging serves as a conventional diagnostic tool. However, radiographic features of pneumonia often overlap with those of other pulmonary diseases and are subject to inter-observer variability. Traditional Convolutional Neural Network (CNN) models tend to capture redundant information during feature extraction, and single pre-trained models often exhibit limited generalization in multiclass classification tasks. This study proposes a multi-model ensemble learning framework based on multi-head attention mechanism. Firstly, the three pre-trained backbones—DenseNet-121, ResNet-50, and VGG-19—were fine-tuned through transfer learning by replacing their classification heads, adapting pooling layers, and optimizing the fully connected layers. Secondly, feature maps extracted from these tuned backbones were concatenated and fused using a multi-head attention mechanism; the fused representation was then refined by two consecutive multi-head attention layers and finally passed to a fully connected classifier to produce the ensemble prediction. Three task sets were constructed from a public Kaggle dataset: binary classification (normal vs. pneumonia), three-class classification (normal, COVID-19, viral pneumonia), and four-class classification (normal, lung opacity, viral pneumonia, COVID-19), achieving accuracies of 91.67%, 93.79%, and 90.60%, respectively. The results demonstrate that the proposed multi-head attention-based ensemble framework offers significant advantages for pneumonia multiclass classification, particularly by maintaining high recall and robustness in more complex scenarios such as four-class differentiation, indicating its potential as a clinical decision-support tool. 
Future work will involve expanding the dataset and evaluating the model’s generalizability across additional disease categories.

13 pages, 3076 KB  
Article
Estimation of Kidney Volumes in Autosomal Dominant Polycystic Kidney Disease: A Comparison Between Manual Segmentation and Ellipsoid Formula
by Nicola Maggialetti, Claudia Dipalma, Eva Colucci, Ilaria Villanova, Giovanni Lorusso, Maria Grazia Arcidiacono, Giovanni Piscopo and Amato Antonio Stabile Ianora
Clin. Pract. 2025, 15(11), 191; https://doi.org/10.3390/clinpract15110191 - 23 Oct 2025
Viewed by 66
Abstract
Objectives: Evaluate the agreement and interobserver variability between manual segmentation and the ellipsoid formula in estimating single kidney volume (SKV) in patients with autosomal dominant polycystic kidney disease (ADPKD). Methods: In this retrospective study, 130 unenhanced CT scans of ADPKD kidneys were analyzed. Three radiologists (one senior, two juniors) measured SKV using manual segmentation and the ellipsoid formula. Statistical analyses included intraclass correlation coefficient (ICC), Wilcoxon signed-rank test, Bland–Altman analysis, and paired t-tests to compare measurement values and computation times. Results: Both methods showed excellent interobserver agreement (ICC ≥ 0.977). No significant difference was observed in volume estimates between the two techniques (Wilcoxon p = 0.295). Bland–Altman analysis confirmed strong agreement between methods for the senior radiologist. The ellipsoid method was significantly faster for all readers (p < 0.05). Conclusions: The ellipsoid formula is a reliable, time-efficient alternative to manual segmentation for SKV estimation in ADPKD, offering comparable accuracy with reduced resource demands in clinical settings. Full article
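
For readers unfamiliar with the two quantities compared in this abstract, a minimal sketch follows. It assumes the commonly used ellipsoid approximation V = π/6 × length × width × depth and the standard Bland–Altman bias with 95% limits of agreement; the abstract does not state which variant the authors used. The function names and all sample measurements are hypothetical.

```python
import math
from statistics import mean, stdev


def ellipsoid_volume_ml(length_cm, width_cm, depth_cm):
    """Ellipsoid approximation: V = pi/6 * L * W * D (1 cm^3 equals 1 mL)."""
    return math.pi / 6.0 * length_cm * width_cm * depth_cm


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # uses the sample standard deviation
    return bias, bias - spread, bias + spread


# Hypothetical illustrative values, not data from the study.
segmented_ml = [410.0, 655.0, 980.0, 1220.0]
axes_cm = [(14.0, 7.5, 7.4), (16.5, 8.8, 8.6), (19.0, 10.0, 9.9), (20.5, 10.8, 10.5)]
ellipsoid_ml = [ellipsoid_volume_ml(*axes) for axes in axes_cm]

bias, lower, upper = bland_altman(segmented_ml, ellipsoid_ml)
print(f"bias={bias:.1f} mL, limits of agreement=({lower:.1f}, {upper:.1f}) mL")
```

A bias near zero with narrow limits of agreement would suggest that the two methods can be used interchangeably; how narrow is narrow enough is a clinical judgment rather than something the calculation decides.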

17 pages, 1926 KB  
Systematic Review
Quantitative Ultrasound for Hepatic Steatosis: A Systematic Review Highlighting the Diagnostic Performance of Ultrasound-Derived Fat Fraction
by Dimitrios Kavvadas, Vasileios Rafailidis, Aris Liakos, Emmanouil Sinakos, Sasan Partovi, Theodora Papamitsou and Panos Prassopoulos
Diagnostics 2025, 15(20), 2640; https://doi.org/10.3390/diagnostics15202640 - 20 Oct 2025
Viewed by 645
Abstract
Background/Objectives: Metabolic-dysfunction-associated steatotic liver disease (MASLD) is a leading cause of chronic liver disease worldwide, requiring accurate and accessible diagnostic tools. Methods: A systematic review evaluated the diagnostic performance of Ultrasound-Derived Fat Fraction (UDFF), with a primary focus on prospective studies comparing UDFF to MRI Proton Density Fat Fraction (MRI-PDFF) as the reference standard and a secondary appraisal of its performance against other modalities. Additional parameters, such as technical feasibility, inter-observer agreement, and proposed thresholds, were summarized to support clinical applicability. Results: Seven prospective MRI-based studies (n = 862) demonstrated excellent correlation (average r = 0.848) and reproducibility (inter-observer intraclass correlation coefficient ICC = 0.978, intra-observer ICC = 0.980) of UDFF, with high diagnostic accuracy across steatosis grades (AUCs ≥ 0.89). Additional studies comparing UDFF with Controlled Attenuation Parameter (CAP), histology, and other quantitative ultrasound techniques (attenuation- or backscatter-based methods) confirmed high sensitivity and specificity, particularly for advanced steatosis, and emphasized the potential of UDFF as a comprehensive quantitative biomarker. Proposed UDFF cut-offs for mild, moderate, and severe steatosis ranged from 5% to 23%, demonstrating high sensitivity and specificity. Factors like body position, probe pressure, and visceral fat influenced measurements, underscoring the need for standardized protocols. Conclusions: UDFF seems to offer a reliable and cost-effective quantitative ultrasound modality. So far, it correlates strongly with MRI-PDFF and accurately grades steatosis, especially for S2–S3. Given cut-off variability and protocol sensitivity, broad routine adoption may be premature. Therefore, we recommend further studies focusing on standardized acquisition and cut-off calibration to MRI-PDFF. Full article
(This article belongs to the Special Issue Diagnostic Imaging in Gastrointestinal and Liver Diseases)

15 pages, 457 KB  
Review
Use of AI Histopathology in Breast Cancer Diagnosis
by Valentin Ivanov, Usman Khalid, Jasmin Gurung, Rosen Dimov, Veselin Chonov, Petar Uchikov, Gancho Kostov and Stefan Ivanov
Medicina 2025, 61(10), 1878; https://doi.org/10.3390/medicina61101878 - 20 Oct 2025
Viewed by 373
Abstract
Background and Objectives: Breast cancer (BC) is a global health concern for women; the disease contributes to significant morbidity and mortality. A key element in the diagnosis of BC involves the histopathological diagnosis, which determines patient management and therapy. However, BC is a multifaceted disease, limiting access to early diagnosis and, therefore, treatment. Artificial intelligence (AI) is transforming diagnostics in the medical field, especially in the detection of BC. The increased availability of digital slides has facilitated the effective integration of AI in breast cancer diagnosis. Diagnosis poses a great challenge, even for experienced pathologists, due to the heterogeneity of this malignancy. Analysis of microscopic slides by pathologists requires a considerable amount of time. Implementation of AI into routine workflows holds potential to improve diagnostic sensitivity and inter-observer concordance, and to increase efficiency by reducing the review time, thereby helping to alleviate the burden of diagnosing BC. Previous studies mainly address imaging modalities or oncology broadly, while few specifically concentrate on the histopathological aspect of breast cancer. This review aims to explore the novel synthesis of AI advancements in digital pathology, including tumour classification, grading, lymph node staging, and biomarker evaluation, and discuss their potential incorporation into clinical workflows. We will also discuss the current barriers and prospects for future advancements. Materials and Methods: A literature search was conducted in PubMed and Google Scholar using the mentioned keywords. Articles published in English until July 2025 were reviewed and synthesised narratively. Results: Recent studies demonstrate that AI models such as convolutional neural networks (CNNs), YOLO, and RetinaNet achieve high accuracy in tumour detection, histological grading, lymph node metastasis localisation, and biomarker analysis. The reported performance values range from 75% to over 95% accuracy across various tasks, with gains in diagnostic sensitivity and inter-observer concordance, and reduced review time in assisted workflows. However, certain limitations, such as data variability, external validation in clinical practice, and ethical concerns, restrict the growth and optimal performance of AI and its clinical applicability. Conclusions: The future for AI looks promising, as it is rapidly evolving. By analysing evidence across multiple domains, this review evaluates both opportunities and persisting barriers, offering practical overviews for future clinical transition. AI cannot replace pathologists; however, it has the capabilities to enhance diagnostic precision, efficiency, and ultimately patient outcomes. It is only a matter of time before AI is adopted into healthcare.
(This article belongs to the Section Oncology)

12 pages, 646 KB  
Article
Intra- and Inter-Observer Reliability of ChatGPT-4o in Thyroid Nodule Ultrasound Feature Analysis Based on ACR TI-RADS: An Image-Based Study
by Ziman Chen, Nonhlanhla Chambara, Shirley Yuk Wah Liu, Tom Chi Man Chow, Carol Man Sze Lai and Michael Tin Cheung Ying
Diagnostics 2025, 15(20), 2617; https://doi.org/10.3390/diagnostics15202617 - 17 Oct 2025
Viewed by 236
Abstract
Background/Objectives: Advances in large language models like ChatGPT-4o have extended their use to medical image analysis. Accurate assessment of thyroid nodule ultrasound features using ACR TI-RADS is crucial for clinical practice. This study aims to evaluate ChatGPT-4o’s intra-observer consistency and its agreement with an expert in analyzing these features from ultrasound image assessments based on ACR TI-RADS. Methods: This cross-sectional study used ultrasound images from 100 thyroid nodules collected prospectively between May 2019 and August 2021. Ultrasound images were analyzed by ChatGPT-4o, following ACR TI-RADS guidelines, to assess features of thyroid nodule including composition, echogenicity, shape, margin, and echogenic foci. The analysis was repeated after one week to evaluate intra-observer reliability. The ultrasound images were also analyzed by another ultrasound expert for the evaluation of inter-observer reliability. Agreement was measured using Cohen’s Kappa coefficient, and concordance rates were calculated based on alignment with the expert’s reference classifications. Results: Intra-observer agreement for ChatGPT-4o was moderate for composition (Kappa = 0.449) and echogenic foci (Kappa = 0.404), with substantial agreement for echogenicity (Kappa = 0.795). Agreement was notably low for shape (Kappa = −0.051) and margin (Kappa = 0.154). Inter-observer agreement between ChatGPT-4o and the expert was generally low, with Kappa values ranging from −0.006 to 0.238, the highest being for echogenic foci. Overall concordance rates between ChatGPT-4o and expert evaluations ranged from 46.6% to 48.2%, with the highest for shape (65%) and the lowest for echogenicity (29%). Conclusions: ChatGPT-4o showed favorable consistency in assessing some thyroid nodule features in intra-observer analysis, but notable variability in others. 
Inter-observer comparisons with expert evaluations revealed generally low agreement across all features, despite acceptable concordance for certain imaging characteristics. While promising for specific ultrasound features, ChatGPT-4o’s consistency and accuracy still vary significantly compared to expert assessments.
(This article belongs to the Section Medical Imaging and Theranostics)
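
The abstract above reports both concordance rates and Cohen's kappa, which can diverge sharply. As a minimal sketch, assuming the standard unweighted definition κ = (p_o − p_e) / (1 − p_e), the example below shows how raw agreement can be high while kappa is zero. The labels are hypothetical and are not data from the study; `cohen_kappa` is an illustrative helper, not the authors' code.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels: the expert sees two shapes, the model always answers "wider".
expert = ["wider"] * 8 + ["taller"] * 2
model = ["wider"] * 10

concordance = sum(e == m for e, m in zip(expert, model)) / len(expert)
kappa = cohen_kappa(expert, model)
print(f"concordance={concordance:.2f}, kappa={kappa:.2f}")
```

When one category dominates, a rater who always picks it agrees often but no more than chance would predict, so kappa stays near zero. This is one possible reading of how a feature can show the highest concordance yet a kappa close to zero; the abstract itself does not give the label distributions.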

23 pages, 1303 KB  
Review
Advancing the Diagnosis and Treatment of Early Chronic Pancreatitis Through Innovation in Imaging and Biomarker Profiling—A Narrative Review
by Alexandru-Ionut Coseru, Diana Elena Floria, Constantin Simiras, Radu Alexandru Vulpoi, Vadim Rosca, Roxana Nemteanu, Oana Petrea, Irina Ciortescu, Oana-Bogdana Barboi, Gheorghe G. Balan, Catalin Sfarti, Georgiana-Emanuela Gîlca-Blanariu, Catalina Mihai, Liliana Gheorghe, Alina Plesa and Vasile-Liviu Drug
Life 2025, 15(10), 1574; https://doi.org/10.3390/life15101574 - 9 Oct 2025
Viewed by 492
Abstract
Early chronic pancreatitis (ECP) represents a potentially reversible stage in the natural history of chronic pancreatic disease. Timely diagnosis of ECP offers a possibility for intervention, yet its diagnosis remains challenging due to nonspecific symptoms, lack of standardized criteria, and the limited diagnostic sensitivity of conventional tools. This review aims to synthesize recent advancements in the understanding, detection, and management of ECP, with a focus on innovation in imaging techniques and biomarker profiling. The goal is to facilitate earlier diagnosis and more effective patient stratification. We reviewed the literature from the past five years, including original studies, meta-analyses, and expert consensus statements, to address the current evidence across genetic, inflammatory, imaging, and biochemical domains relevant to ECP. Endoscopic ultrasound and advanced magnetic resonance techniques offer high sensitivity in detecting early parenchymal changes, although inter-observer variability and lack of standardization persist. Biomarker discovery has focused on inflammatory (IL-6, sCD163), fibrotic (TGF-β1, TIMP-1), and oxidative markers, as well as novel candidates like microRNAs. Genetic predisposition (PRSS1, SPINK1, CTRC, CPA1, CLDN2) significantly influences disease onset and progression and could enable selection of high-risk individuals. Therefore, diagnosing ECP should involve a multidisciplinary precision-based approach integrating clinical, radiologic, molecular, serologic, and genetic data for individualized risk stratification. Full article
(This article belongs to the Section Medical Research)

38 pages, 6947 KB  
Article
EfficientNet-B3-Based Automated Deep Learning Framework for Multiclass Endoscopic Bladder Tissue Classification
by A. A. Abd El-Aziz, Mahmood A. Mahmood and Sameh Abd El-Ghany
Diagnostics 2025, 15(19), 2515; https://doi.org/10.3390/diagnostics15192515 - 3 Oct 2025
Viewed by 391
Abstract
Background: Bladder cancer (BLCA) is a malignant growth that originates from the urothelial lining of the urinary bladder. Diagnosing BLCA is complex due to the variety of tumor features and its heterogeneous nature, which leads to significant morbidity and mortality. Understanding tumor histopathology is crucial for developing tailored therapies and improving patient outcomes. Objectives: Early diagnosis and treatment are essential to lower the mortality rate associated with bladder cancer. Manual classification of muscular tissues by pathologists is labor-intensive and relies heavily on experience, which can result in interobserver variability due to the similarities in cancerous cell morphology. Traditional methods for analyzing endoscopic images are often time-consuming and resource-intensive, making it difficult to efficiently identify tissue types. Therefore, there is a strong demand for a fully automated and reliable system for classifying smooth muscle images. Methods: This paper proposes a deep learning (DL) technique utilizing the EfficientNet-B3 model and a five-fold cross-validation method to assist in the early detection of BLCA. This model enables timely intervention and improved patient outcomes while streamlining the diagnostic process, ultimately reducing both time and costs for patients. We conducted experiments using the Endoscopic Bladder Tissue Classification (EBTC) dataset for multiclass classification tasks. The dataset was preprocessed using resizing and normalization methods to ensure consistent input. In-depth experiments were carried out utilizing the EBTC dataset, along with ablation studies to evaluate the best hyperparameters. A thorough statistical analysis and comparisons with five leading DL models—ConvNeXtBase, DenseNet-169, MobileNet, ResNet-101, and VGG-16—showed that the proposed model outperformed the others. 
Conclusions: The EfficientNet-B3 model achieved impressive results: accuracy of 99.03%, specificity of 99.30%, precision of 97.95%, recall of 96.85%, and an F1-score of 97.36%. These findings indicate that the EfficientNet-B3 model demonstrates significant potential in accurately and efficiently diagnosing BLCA. Its high performance and ability to reduce diagnostic time and cost make it a valuable tool for clinicians in the field of oncology and urology.
(This article belongs to the Special Issue AI and Big Data in Medical Diagnostics)
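
The five metrics quoted in this abstract all derive from confusion-matrix counts. The sketch below uses the standard one-vs-rest definitions. The counts are hypothetical, the helper name `binary_metrics` is illustrative, and the code assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard one-vs-rest metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for a single tissue class, not taken from the paper.
metrics = binary_metrics(tp=90, fp=5, tn=400, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a multiclass setting these values are usually computed per class and then averaged. The abstract does not say which averaging scheme was used, so the reported figures cannot be reproduced from this sketch alone.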

17 pages, 504 KB  
Review
CIN2 in the Era of Risk-Based Management and HPV Vaccination: Epidemiology, Natural History and Guidelines
by Maria Teresa Bruno, Alessia Pagana, Carla Lo Giudice, Marco Marzio Panella, Giuseppe Mascellino and Antonio Simone Laganà
Diagnostics 2025, 15(19), 2512; https://doi.org/10.3390/diagnostics15192512 - 2 Oct 2025
Viewed by 793
Abstract
Background: Cervical intraepithelial neoplasia grade 2 (CIN2) represents a controversial lesion in cervical cancer prevention. Traditionally included in the aggregate CIN2+ endpoint for reasons of diagnostic stability and statistical power, isolated CIN2 has unique biological characteristics: greater interobserver variability, a high probability of spontaneous regression and a lower risk of progression compared to CIN3. Objectives: To critically describe the epidemiology, natural history and management strategies of CIN2, integrating data from clinical and population-based studies and comparing the recommendations of the main international guidelines. Methods: A narrative review was conducted using a search of PubMed and Scopus (1990–January 2025). Prospective and retrospective studies on isolated CIN2, screening and vaccination trials with CIN2+ endpoints, biomarker research, and consensus documents (ASCCP, ESGO, GISCi, Ministry of Health, WHO) were included. Results: Clinical studies have shown a high probability of CIN2 regression (50–70% within two years, >70% in those <25 years), compared to a 10–15% risk of progression, especially in the presence of persistent HPV16. Screening trials and vaccine evaluations with CIN2+ endpoints have documented the efficacy of the HPV test and a dramatic reduction in lesions in vaccinated cohorts, which was also confirmed for isolated CIN2. The most recent guidelines have progressively adopted a risk-based approach, which allows for active surveillance in young women or those seeking to conceive, while the WHO maintains a screen-and-treat model in resource-limited countries. Conclusions: CIN2 is not a lesion to be treated automatically, but rather a paradigmatic model for personalized management. Integrating epidemiological and clinical data, supported by biomarkers, allows for reducing overtreatment without compromising oncological safety. Full article
(This article belongs to the Section Pathology and Molecular Diagnostics)

20 pages, 162180 KB  
Article
Annotation-Efficient and Domain-General Segmentation from Weak Labels: A Bounding Box-Guided Approach
by Ammar M. Okran, Hatem A. Rashwan, Sylvie Chambon and Domenec Puig
Electronics 2025, 14(19), 3917; https://doi.org/10.3390/electronics14193917 - 1 Oct 2025
Viewed by 461
Abstract
Manual pixel-level annotation remains a major bottleneck in deploying deep learning models for dense prediction and semantic segmentation tasks across domains. This challenge is especially pronounced in applications involving fine-scale structures, such as cracks in infrastructure or lesions in medical imaging, where annotations are time-consuming, expensive, and subject to inter-observer variability. To address these challenges, this work proposes a weakly supervised and annotation-efficient segmentation framework that integrates sparse bounding-box annotations with a limited subset of strong (pixel-level) labels to train robust segmentation models. The fundamental element of the framework is a lightweight Bounding Box Encoder that converts weak annotations into multi-scale attention maps. These maps guide a ConvNeXt-Base encoder, and a lightweight U-Net–style convolutional neural network (CNN) decoder—using nearest-neighbor upsampling and skip connections—reconstructs the final segmentation mask. This design enables the model to focus on semantically relevant regions without relying on full supervision, drastically reducing annotation cost while maintaining high accuracy. We validate our framework on two distinct domains, road crack detection and skin cancer segmentation, demonstrating that it achieves performance comparable to fully supervised segmentation models using only 10–20% of strong annotations. Given the ability of the proposed framework to generalize across varied visual contexts, it has strong potential as a general annotation-efficient segmentation tool for domains where strong labeling is costly or infeasible. Full article

15 pages, 812 KB  
Article
Large Language Model (LLM)-Predicted and LLM-Assisted Calculation of the Spinal Instability Neoplastic Score (SINS) Improves Clinician Accuracy and Efficiency
by Matthew Ding Zhou Chan, Calvin Kai En Tjio, Tammy Li Yi Chan, Yi Liang Tan, Alynna Xu Ying Chua, Sammy Khin Yee Loh, Gabriel Zi Hui Leow, Ming Ying Gan, Xinyi Lim, Amanda Kexin Choo, Yu Liu, Jonathan Wen Po Tan, Ee Chin Teo, Qai Ven Yap, Ting Yonghan, Andrew Makmur, Naresh Kumar, Jiong Hao Tan and James Thomas Patrick Decourcy Hallinan
Cancers 2025, 17(19), 3198; https://doi.org/10.3390/cancers17193198 - 30 Sep 2025
Viewed by 351
Abstract
Background: The Spinal Instability Neoplastic Score (SINS) guides treatment for patients with spinal tumors, but issues arise with complexity, interobserver variability, and time demands. Large language models (LLMs) may help overcome these limitations. Objectives: This study evaluates the accuracy and efficiency of a privacy-preserving LLM (PP-LLM) for SINS calculation, with and without clinician involvement, to assess its feasibility as a clinical decision-support tool. Methods: This retrospective observational study was granted a Domain-Specific Review Board waiver owing to minimal risk. Patients from 2020 to 2022 were included. A PP-LLM was employed to maintain secure handling of patient data. A consensus SINS reference standard was established by musculoskeletal radiologists and an orthopedic surgeon. Eight orthopedic and oncology trainees were divided into two groups to calculate SINS, with and without PP-LLM assistance. LLM-predicted scores were also generated independently of any human input. Results: The main outcomes were agreement with the reference standard (measured by intraclass correlation coefficients [ICCs]) and time required for SINS calculation. The LLM-assisted method achieved excellent agreement (ICC = 0.993, 95%CI = 0.991–0.994), closely followed by the LLM-predicted approach (ICC = 0.990, 95%CI = 0.984–0.993). Clinicians working without LLM support showed a significantly lower ICC compared to both LLM methods (0.968, 95%CI = 0.960–0.975) (both p < 0.001). The LLM alone produced scores in approximately 5 s, while the median scoring time for LLM-assisted clinicians was 60.0 s (IQR = 46.0–80.0), notably shorter than the 83.0 s (IQR = 58.0–124.0) required without LLM assistance. Conclusions: An LLM-based approach, whether used autonomously or in conjunction with clinical expertise, enhances both accuracy and efficiency in SINS calculation. 
Adopting this technology may streamline oncologic workflows and facilitate more timely interventions for patients with spinal metastases.

17 pages, 995 KB  
Article
Assessment of Tumor-Infiltrating Lymphocytes in Triple-Negative Breast Cancer: Interobserver Variability and Contributing Factors
by Nurkhairul Bariyah Baharun, Mohamed Afiq Hidayat Zailani, Afzan Adam, Qiaoyi Xu, Muaatamarulain Mustangin and Reena Rahayu Md Zin
Diagnostics 2025, 15(19), 2492; https://doi.org/10.3390/diagnostics15192492 - 30 Sep 2025
Viewed by 467
Abstract
Background/Objectives: Tumor-infiltrating lymphocytes (TILs) are emerging as a crucial prognostic biomarker in triple-negative breast cancer (TNBC). However, their clinical utility remains constrained by the subjectivity and interobserver variability of manual scoring, despite standardization efforts by the International TILs Working Group (TIL-WG). This study aimed to evaluate the interobserver agreement among pathologists in scoring stromal and intratumoral TILs from H&E-stained TNBC slides and to identify contributing histological factors. Methods: Two consultant pathologists at Hospital Canselor Tuanku Muhriz, Kuala Lumpur, independently assessed 64 TNBC cases using TIL-WG guidelines. Interobserver agreement was quantified using the intraclass correlation coefficient (ICC) and Cohen’s kappa coefficient. Cases with over 10% scoring discrepancies underwent review by a third pathologist, and a consensus discussion was held to explore the underlying confounders. Results: Our results showed moderate interobserver agreement for stromal TILs (ICC = 0.58) and strong agreement for intratumoral TILs (ICC = 0.71). Significant variability was attributed to three main confounding variables: heterogeneous TIL distribution, poorly defined tumor-stroma interface, and focal dense lymphoid infiltrates. Conclusions: These findings highlight the need for standardized TIL scoring protocols and suggest that validated AI-based tools may help mitigate observer variability in future TIL assessments. Full article
(This article belongs to the Section Pathology and Molecular Diagnostics)

16 pages, 6893 KB  
Article
The Relationship Between Non-Invasive Tests and Digital Pathology for Quantifying Liver Fibrosis in MASLD
by Xiaodie Wei, Lixia Qiu, Xinxin Wang, Chen Shao, Jing Zhao, Qiang Yang, Jun Chen, Meng Yin, Richard L. Ehman and Jing Zhang
Diagnostics 2025, 15(19), 2475; https://doi.org/10.3390/diagnostics15192475 - 27 Sep 2025
Viewed by 479
Abstract
Background: Evaluating liver fibrosis is crucial in metabolic dysfunction-associated steatotic liver disease (MASLD). Digital pathology, an automated method for quantitative fibrosis measurement, supports pathologists by providing refined continuous metrics and addressing inter-observer variability. Although non-invasive tests (NITs) have been validated as consistent with manual pathology, the relationship between digital pathology and NITs remains unexplored. Methods: This study included 99 biopsy-proven MASLD patients. Quantitative-fibrosis (Q-Fibrosis) used second-harmonic generation/two-photon excitation fluorescence microscopy (SHG/TPEF) to quantify fibrosis parameters (q-FPs). Correlations between eight NITs and q-FPs were analyzed. Results: Using manual pathology as the reference standard, Q-Fibrosis exhibited excellent diagnostic performance in fibrosis staging, with areas under the receiver operating characteristic curve (AUCs) ranging from 0.924 to 0.967. In addition, magnetic resonance elastography (MRE) achieved the highest diagnostic accuracy (AUC: 0.781–0.977) among the eight NITs. Furthermore, MRE-assessed liver stiffness measurement (MRE-LSM) showed the strongest correlation with q-FPs, particularly adjusted by string length, string width, and the number of short and thick strings within the portal region. Conclusions: Both MRE and digital pathology demonstrated excellent diagnostic accuracy. MRE-LSM was primarily determined by collagen extent, location, and pattern, which provides a new perspective for understanding the relationship between changes in MRE and histological fibrosis reversal.
(This article belongs to the Section Pathology and Molecular Diagnostics)

28 pages, 4443 KB  
Article
UCINet: A Multi-Task Network for Umbilical Coiling Index Measurement in Obstetric Ultrasound
by Zhuofu Liu, Lichen Niu, Zhixin Di and Meimei Liu
Algorithms 2025, 18(9), 592; https://doi.org/10.3390/a18090592 - 22 Sep 2025
Viewed by 405
Abstract
The umbilical coiling index (UCI), which quantifies the degree of vascular coiling in the umbilical cord, is a crucial indicator for assessing fetal intrauterine development and predicting perinatal outcomes. However, existing methods for measuring the UCI rely primarily on manual assessment, which is inefficient and susceptible to inter-observer variability. To address the challenges of measuring the UCI during obstetric ultrasound, we propose UCINet, a multi-task neural network designed for this purpose. UCINet demonstrates enhanced operational efficiency and significantly improved detection accuracy, catering to the nuanced requirements of obstetric imaging. First, we propose a Frequency–Spatial Domain Downsampling Module (FSDM) that extracts features in both the frequency and spatial domains, thereby reducing the loss of umbilical cord features and enhancing their representational capacity. Second, the proposed Multi-Receptive Field Feature Perception Module (MRPM) employs receptive fields of varying sizes across different stages of the feature maps, allowing the model to capture a more diverse set of spatial information and enriching the feature representation. Third, a Multi-Scale Feature Aggregation Module (MSAM) leverages multi-scale features via a dynamic fusion mechanism, optimizing the integration of disparate feature scales. In addition, we constructed a UCI dataset of 2018 annotated ultrasound images, each labeled with the number of vascular coils and keypoints at both ends of the umbilical cord. Compared with state-of-the-art methods, UCINet achieves consistent improvements across two tasks. In object detection, UCINet outperforms Deformable DETR-R50 by 1.2 percentage points in mAP@50. In keypoint localization, it further exceeds YOLOv11 with a 3.0% gain in mAP@50, highlighting its effectiveness in both detection accuracy and fine-grained keypoint prediction.
(This article belongs to the Special Issue Machine Learning for Pattern Recognition (3rd Edition))

15 pages, 4098 KB  
Article
Comparative Diagnostic Value of Computed Tomography Lung and Bone Window Settings for the Detection of Nasal Foreign Bodies in 47 Dogs Presented to Two UK Referral Hospitals (2015–2023)
by Nicoletta Fantaconi, Andrew T. Parry, Jose Labrador, Luis Alejandro Pérez López and Petra Agthe
Animals 2025, 15(18), 2684; https://doi.org/10.3390/ani15182684 - 13 Sep 2025
Viewed by 646
Abstract
Confident diagnosis of nasal foreign bodies (FBs) with computed tomography (CT) is challenging. Plant material FBs may be inconspicuous depending on size and attenuation and may be obscured by secondary nasal changes such as accumulation of mucus. The authors anecdotally observed that the lung window (LW) might improve visualization of some nasal FBs. The aim of this retrospective, multicentre study was to assess the diagnostic utility and interobserver variability of the LW in the diagnosis of nasal FBs. We hypothesized that use of the LW improves the detection rate of nasal FBs compared to the bone window (BW), and that interobserver agreement is strong. Computed tomography examinations of 47 dogs with an endoscopically confirmed nasal FB were included, and each study was reviewed independently by two board-certified radiologists, resulting in a total of 94 assessments. Pre-contrast CT series were reviewed in the BW and LW. The reviewers were blinded to the final diagnosis and were asked to evaluate the CT studies for presence or absence of a convincing nasal FB. Reviewers confidently detected a nasal FB on 19/94 (20%) assessments in the BW and 20/94 (21%) in the LW; the majority of the FBs were elongated in shape (30%) and were visible in the rostral and mid-portion of the nasal cavity. The interobserver agreement was moderate in the BW (k < 0.53) and in the LW (k < 0.49). Our findings do not support our main hypothesis that the use of the LW significantly increases diagnostic accuracy for the identification of nasal FBs in dogs. However, as the LW enabled correct diagnosis in one assessment, it may occasionally be helpful if no FB is visualized on initial examination.
(This article belongs to the Section Veterinary Clinical Studies)

15 pages, 2479 KB  
Article
Inter- and Intraobserver Variability in Bowel Preparation Scoring for Colon Capsule Endoscopy: Impact of AI-Assisted Assessment Feasibility Study
by Ian Io Lei, Daniel R. Gaya, Alexander Robertson, Benedicte Schelde-Olesen, Alice Mapiye, Anirudh Bhandare, Bei Bei Lui, Chander Shekhar, Ursula Valentiner, Pere Gilabert, Pablo Laiz, Santi Segui, Nicholas Parsons, Cristiana Huhulea, Hagen Wenzek, Elizabeth White, Anastasios Koulaouzidis and Ramesh P. Arasaradnam
Cancers 2025, 17(17), 2840; https://doi.org/10.3390/cancers17172840 - 29 Aug 2025
Viewed by 689
Abstract
Background: Colon capsule endoscopy (CCE) has seen increased adoption since the COVID-19 pandemic, offering a non-invasive alternative for lower gastrointestinal investigations. However, inadequate bowel preparation remains a key limitation, often leading to higher conversion rates to colonoscopy. Manual assessment of bowel cleanliness is inherently subjective and marked by high interobserver variability. Recent advances in artificial intelligence (AI) have enabled automated cleansing scores that not only standardise assessment and reduce variability but also align with the emerging semi-automated AI reading workflow, which highlights only clinically significant frames. As full video review becomes less routine, reliable and consistent cleansing evaluation is essential, positioning bowel preparation AI as a critical enabler of diagnostic accuracy and scalable CCE deployment. Objective: This CESCAIL sub-study aimed to (1) evaluate interobserver agreement in CCE bowel cleansing assessment using two established scoring systems, and (2) determine the impact of AI-assisted scoring, specifically a TransUNet-based segmentation model with a custom Patch Loss function, on both interobserver and intraobserver agreement compared to manual assessment. Methods: As part of the CESCAIL study, twenty-five CCE videos were randomly selected from 673 participants. Nine readers with varying CCE experience scored bowel cleanliness using the Leighton–Rex and CC-CLEAR scales. After a minimum 8-week washout, the same readers reassessed the videos using AI-assisted CC-CLEAR scores. Interobserver variability was evaluated using bootstrapped intraclass correlation coefficients (ICC) and Fleiss’ Kappa; intraobserver variability was assessed with weighted Cohen’s Kappa, paired t-tests, and Two One-Sided Tests (TOSTs). Results: Leighton–Rex showed poor to fair agreement (Fleiss’ Kappa = 0.14; ICC = 0.55), while CC-CLEAR demonstrated fair to excellent agreement (Fleiss’ Kappa = 0.27; ICC = 0.90). AI-assisted CC-CLEAR achieved only moderate agreement overall (Fleiss’ Kappa = 0.27; ICC = 0.69), with weaker performance among less experienced readers (Fleiss’ Kappa = 0.15; ICC = 0.56). Intraobserver agreement was excellent (ICC > 0.75) for experienced readers but variable in others (ICC 0.03–0.80). AI-assisted scores were significantly lower than manual reads by 1.46 points (p < 0.001), potentially increasing conversion to colonoscopy. Conclusions: AI-assisted scoring did not improve interobserver agreement and may even reduce consistency amongst less experienced readers. The maintained agreement observed in experienced readers highlights its current value in experienced hands only. Further refinement, including spatial analysis integration, is needed for robust overall AI implementation in CCE.
(This article belongs to the Section Methods and Technologies Development)
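Because the abstract above reports Fleiss' Kappa for agreement among more than two readers, a minimal sketch of that statistic may help. It assumes the ratings are already tabulated as a subjects-by-categories count matrix with the same number of raters per subject. The function name `fleiss_kappa` and the example counts are hypothetical and unrelated to the CESCAIL data, and the bootstrapping mentioned in the abstract is not shown.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a count matrix.

    counts[i][j] is the number of raters who assigned subject i to category j.
    Every subject must be rated by the same number of raters (at least two).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("Each subject needs the same number (>= 2) of ratings.")
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Proportion of all ratings that fall into each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-subject agreement: fraction of rater pairs that agree on that subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_subj) / n_subjects   # mean observed agreement
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance

    if p_exp == 1.0:
        # All ratings fall in one category; kappa is undefined, and this
        # sketch reports perfect agreement.
        return 1.0
    return (p_bar - p_exp) / (1.0 - p_exp)


# Hypothetical example: 2 subjects, 3 raters, 2 categories.
example = [[2, 1], [1, 2]]
kappa = fleiss_kappa(example)
```

Unlike the ICC, this statistic treats categories as unordered, which is one reason the two measures can diverge on an ordinal cleansing scale.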