J. Imaging, Volume 11, Issue 8 (August 2025) – 40 articles

Cover Story: From decoding plant root networks to simulating the inner structure of rocks, our research explores how generative AI is transforming scientific imaging. We put the most advanced architectures, ranging from VAEs to GANs and diffusion models, to the test on microCT scans, composite fibers, and high-resolution biological images. GANs, led by StyleGAN, produced strikingly detailed and coherent images, while diffusion models like DALL-E 2 delivered remarkable realism but sometimes sacrificed scientific precision. Our findings reveal why common image quality scores fall short in science, and why expert review is essential. By tackling challenges in interpretability, cost, and verification, we outline how generative AI could soon power breakthroughs in data augmentation, simulation, and even scientific discovery itself.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open them.
18 pages, 1212 KB  
Article
Part-Wise Graph Fourier Learning for Skeleton-Based Continuous Sign Language Recognition
by Dong Wei, Hongxiang Hu and Gang-Feng Ma
J. Imaging 2025, 11(8), 286; https://doi.org/10.3390/jimaging11080286 - 21 Aug 2025
Viewed by 638
Abstract
Sign language is a visual language articulated through body movements. Existing approaches predominantly leverage RGB inputs, incurring substantial computational overhead and remaining susceptible to interference from foreground and background noise. A second fundamental challenge lies in accurately modeling the nonlinear temporal dynamics and inherent asynchrony across body parts that characterize sign language sequences. To address these challenges, we propose a novel part-wise graph Fourier learning method for skeleton-based continuous sign language recognition (PGF-SLR), which uniformly models the spatiotemporal relations of multiple body parts in a globally ordered yet locally unordered manner. Specifically, different parts within different time steps are treated as nodes, while the frequency domain attention between parts is treated as edges to construct a part-level Fourier fully connected graph. This enables the graph Fourier learning module to jointly capture spatiotemporal dependencies in the frequency domain, while our adaptive frequency enhancement method further amplifies discriminative action features in a lightweight and robust fashion. Finally, a dual-branch action learning module featuring an auxiliary action prediction branch to assist the recognition branch is designed to enhance the understanding of sign language. Our experimental results show that the proposed PGF-SLR achieved relative improvements of 3.31%/3.70% and 2.81%/7.33% compared to SOTA methods on the dev/test sets of the PHOENIX14 and PHOENIX14-T datasets. It also demonstrated highly competitive recognition performance on the CSL-Daily dataset, showcasing strong generalization while reducing computational costs in both offline and online settings. Full article
(This article belongs to the Special Issue Advances in Machine Learning for Computer Vision Applications)
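As a rough illustration of the graph Fourier idea underlying PGF-SLR (not the authors' code), the sketch below projects part-level node features onto the eigenbasis of a graph Laplacian; the ring adjacency, feature sizes, and NumPy-only implementation are assumptions for demonstration.

```python
# Minimal graph Fourier transform sketch (NumPy only). This is NOT the
# authors' PGF-SLR implementation; it only shows how node features can be
# moved into a graph frequency domain via the Laplacian eigenbasis.
import numpy as np

def graph_fourier(features: np.ndarray, adjacency: np.ndarray):
    """features: (N, C) node features; adjacency: (N, N) symmetric weights."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency                # combinatorial Laplacian
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # Fourier basis U
    spectral = eigvecs.T @ features               # GFT: U^T x
    reconstructed = eigvecs @ spectral            # inverse GFT: U x_hat
    return eigvals, spectral, reconstructed

# Toy example: 5 "part" nodes with 3-dimensional features on a ring graph.
A = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
X = np.random.randn(5, 3)
freqs, X_hat, X_rec = graph_fourier(X, A)
print(np.allclose(X, X_rec))  # True: the transform is invertible
```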

23 pages, 2751 KB  
Article
MSConv-YOLO: An Improved Small Target Detection Algorithm Based on YOLOv8
by Linli Yang and Barmak Honarvar Shakibaei Asli
J. Imaging 2025, 11(8), 285; https://doi.org/10.3390/jimaging11080285 - 21 Aug 2025
Viewed by 574
Abstract
Small object detection in UAV aerial imagery presents significant challenges due to scale variations, sparse feature representation, and complex backgrounds. To address these issues, this paper focuses on practical engineering improvements to the existing YOLOv8s framework, rather than proposing a fundamentally new algorithm. We introduce MultiScaleConv-YOLO (MSConv-YOLO), an enhanced model that integrates well-established techniques to improve detection performance for small targets. Specifically, the proposed approach introduces three key improvements: (1) a MultiScaleConv (MSConv) module that combines depthwise separable and dilated convolutions with varying dilation rates, enhancing multi-scale feature extraction while maintaining efficiency; (2) the replacement of CIoU with WIoU v3 as the bounding box regression loss, which incorporates a dynamic non-monotonic focusing mechanism to improve localization for small targets; and (3) the addition of a high-resolution detection head in the neck–head structure, leveraging FPN and PAN to preserve fine-grained features and ensure full-scale coverage. Experimental results on the VisDrone2019 dataset show that MSConv-YOLO outperforms the baseline YOLOv8s by achieving a 6.9% improvement in mAP@0.5 and a 6.3% gain in recall. Ablation studies further validate the complementary impact of each enhancement. This paper presents practical and effective engineering enhancements to small object detection in UAV scenarios, offering an improved solution without introducing entirely new theoretical constructs. Future work will focus on lightweight deployment and adaptation to more complex environments. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
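A minimal PyTorch sketch of a multi-scale convolution block in the spirit of the MSConv module described above; the branch count, dilation rates, activation, and fusion layer are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch: depthwise separable branches with different dilation rates,
# concatenated and fused. Not the published MSConv definition.
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # depthwise conv with a branch-specific dilation rate
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d,
                          groups=channels, bias=False),
                # pointwise conv completes the depthwise-separable pattern
                nn.Conv2d(channels, channels, 1, bias=False),
                nn.BatchNorm2d(channels),
                nn.SiLU(),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 80, 80)
print(MultiScaleConv(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```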

22 pages, 8764 KB  
Article
Multi-Class Classification of Breast Cancer Subtypes Using ResNet Architectures on Histopathological Images
by Akshat Desai and Rakeshkumar Mahto
J. Imaging 2025, 11(8), 284; https://doi.org/10.3390/jimaging11080284 - 21 Aug 2025
Viewed by 705
Abstract
Breast cancer is a significant cause of cancer-related mortality among women around the globe, underscoring the need for early and accurate diagnosis. Typically, histopathological analysis of biopsy slides is utilized for tumor classification. However, it is labor-intensive, subjective, and often affected by inter-observer variability. Therefore, this study explores a deep learning-based, multi-class classification framework for distinguishing breast cancer subtypes using convolutional neural networks (CNNs). Unlike previous work using the popular BreaKHis dataset, where binary classification models were applied, in this work, we differentiate eight histopathological subtypes: four benign (adenosis, fibroadenoma, phyllodes tumor, and tubular adenoma) and four malignant (ductal carcinoma, lobular carcinoma, mucinous carcinoma, and papillary carcinoma). This work leverages transfer learning with ImageNet-pretrained ResNet architectures (ResNet-18, ResNet-34, and ResNet-50) and extensive data augmentation to enhance classification accuracy and robustness across magnifications. Among the ResNet models, ResNet-50 achieved the best performance, attaining a maximum accuracy of 92.42%, an AUC-ROC of 99.86%, and an average specificity of 98.61%. These findings validate the combined effectiveness of CNNs and transfer learning in capturing fine-grained histopathological features required for accurate breast cancer subtype classification. Full article
(This article belongs to the Special Issue AI-Driven Advances in Computational Pathology)
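For readers unfamiliar with the transfer-learning setup described above, a minimal torchvision sketch follows: an ImageNet-pretrained ResNet-50 re-headed for eight subtypes. The partial freezing strategy shown is an assumption, not necessarily what the authors used.

```python
# Minimal transfer-learning sketch; dataset handling, augmentation, and
# hyperparameters are omitted and would need to follow the paper.
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 8)  # 4 benign + 4 malignant classes

# Optionally freeze early layers and fine-tune only deeper blocks + head.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("layer3", "layer4", "fc"))
```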

31 pages, 5221 KB  
Article
Dynamic–Attentive Pooling Networks: A Hybrid Lightweight Deep Model for Lung Cancer Classification
by Williams Ayivi, Xiaoling Zhang, Wisdom Xornam Ativi, Francis Sam and Franck A. P. Kouassi
J. Imaging 2025, 11(8), 283; https://doi.org/10.3390/jimaging11080283 - 21 Aug 2025
Viewed by 532
Abstract
Lung cancer is one of the leading causes of cancer-related mortality worldwide. The diagnosis of this disease remains a challenge due to the subtle and ambiguous nature of early-stage symptoms and imaging findings. Deep learning approaches, specifically Convolutional Neural Networks (CNNs), have significantly advanced medical image analysis. However, conventional architectures such as ResNet50 that rely on first-order pooling often fall short. This study aims to overcome the limitations of CNNs in lung cancer classification by proposing a novel and dynamic model named LungSE-SOP. The model is based on Second-Order Pooling (SOP) and Squeeze-and-Excitation Networks (SENet) within a ResNet50 backbone to improve feature representation and class separation. A novel Dynamic Feature Enhancement (DFE) module is also introduced, which dynamically adjusts the flow of information through SOP and SENet blocks based on learned importance scores. The model was trained using a publicly available IQ-OTH/NCCD lung cancer dataset. The performance of the model was assessed using various metrics, including the accuracy, precision, recall, F1-score, ROC curves, and confidence intervals. For multiclass tumor classification, our model achieved 98.6% accuracy for benign, 98.7% for malignant, and 99.9% for normal cases. Corresponding F1-scores were 99.2%, 99.8%, and 99.9%, respectively, reflecting the model’s high precision and recall across all tumor types and its strong potential for clinical deployment. Full article
(This article belongs to the Section Medical Imaging)
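The squeeze-and-excitation recalibration used inside LungSE-SOP follows the standard SENet pattern; a generic PyTorch sketch is shown below (the reduction ratio of 16 is the usual default and an assumption here), without the paper's second-order pooling or dynamic feature enhancement.

```python
# Standard Squeeze-and-Excitation block (Hu et al.), sketched in PyTorch.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # squeeze: global context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                          # per-channel gates in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # excitation: reweight channels

print(SEBlock(256)(torch.randn(2, 256, 14, 14)).shape)
```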

24 pages, 2959 KB  
Article
From Detection to Diagnosis: An Advanced Transfer Learning Pipeline Using YOLO11 with Morphological Post-Processing for Brain Tumor Analysis for MRI Images
by Ikram Chourib
J. Imaging 2025, 11(8), 282; https://doi.org/10.3390/jimaging11080282 - 21 Aug 2025
Viewed by 895
Abstract
Accurate and timely detection of brain tumors from magnetic resonance imaging (MRI) scans is critical for improving patient outcomes and informing therapeutic decision-making. However, the complex heterogeneity of tumor morphology, scarcity of annotated medical data, and computational demands of deep learning models present substantial challenges for developing reliable automated diagnostic systems. In this study, we propose a robust and scalable deep learning framework for brain tumor detection and classification, built upon an enhanced YOLO-v11 architecture combined with a two-stage transfer learning strategy. The first stage involves training a base model on a large, diverse MRI dataset. Upon achieving a mean Average Precision (mAP) exceeding 90%, this model is designated as the Brain Tumor Detection Model (BTDM). In the second stage, the BTDM is fine-tuned on a structurally similar but smaller dataset to form Brain Tumor Detection and Segmentation (BTDS), effectively leveraging domain transfer to maintain performance despite limited data. The model is further optimized through domain-specific data augmentation—including geometric transformations—to improve generalization and robustness. Experimental evaluations on publicly available datasets show that the framework achieves high mAP@0.5 scores (up to 93.5% for the BTDM and 91% for BTDS) and consistently outperforms existing state-of-the-art methods across multiple tumor types, including glioma, meningioma, and pituitary tumors. In addition, a post-processing module enhances interpretability by generating segmentation masks and extracting clinically relevant metrics such as tumor size and severity level. These results underscore the potential of our approach as a high-performance, interpretable, and deployable clinical decision-support tool, contributing to the advancement of intelligent real-time neuro-oncological diagnostics. Full article
(This article belongs to the Topic Machine Learning and Deep Learning in Medical Imaging)
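As a hedged sketch of the kind of morphological post-processing mentioned above, the snippet below cleans a binary tumor mask with opening and closing and reports its pixel area using OpenCV; the threshold and kernel size are assumptions, and the paper's exact pipeline may differ.

```python
# Illustrative post-processing only: refine a probability map into a mask
# and extract a simple size metric.
import cv2
import numpy as np

def refine_mask(prob_map: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    mask = (prob_map > thresh).astype(np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove speckle
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return mask

prob = np.random.rand(256, 256).astype(np.float32)  # stand-in network output
mask = refine_mask(prob)
print("tumor area (px):", int(mask.sum()))
```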

26 pages, 6425 KB  
Article
Deep Spectrogram Learning for Gunshot Classification: A Comparative Study of CNN Architectures and Time-Frequency Representations
by Pafan Doungpaisan and Peerapol Khunarsa
J. Imaging 2025, 11(8), 281; https://doi.org/10.3390/jimaging11080281 - 21 Aug 2025
Viewed by 590
Abstract
Gunshot sound classification plays a crucial role in public safety, forensic investigations, and intelligent surveillance systems. This study evaluates the performance of deep learning models in classifying firearm sounds by analyzing twelve time–frequency spectrogram representations, including Mel, Bark, MFCC, CQT, Cochleagram, STFT, FFT, Reassigned, Chroma, Spectral Contrast, and Wavelet. The dataset consists of 2148 gunshot recordings from four firearm types, collected in a semi-controlled outdoor environment under multi-orientation conditions. To leverage advanced computer vision techniques, all spectrograms were converted into RGB images using perceptually informed colormaps. This enabled the application of image processing approaches and fine-tuning of pre-trained Convolutional Neural Networks (CNNs) originally developed for natural image classification. Six CNN architectures—ResNet18, ResNet50, ResNet101, GoogLeNet, Inception-v3, and InceptionResNetV2—were trained on these spectrogram images. Experimental results indicate that CQT, Cochleagram, and Mel spectrograms consistently achieved high classification accuracy, exceeding 94% when paired with deep CNNs such as ResNet101 and InceptionResNetV2. These findings demonstrate that transforming time–frequency features into RGB images not only facilitates the use of image-based processing but also allows deep models to capture rich spectral–temporal patterns, providing a robust framework for accurate firearm sound classification. Full article
(This article belongs to the Section Image and Video Processing)
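A minimal sketch of the spectrogram-to-RGB step described above, assuming librosa and matplotlib: a Mel spectrogram is normalized and rendered with a perceptual colormap so it can feed an ImageNet-style CNN. The file name, n_mels, and colormap choice are placeholders.

```python
# Sketch only: one of the twelve representations (Mel) rendered as RGB.
import librosa
import numpy as np
from matplotlib import cm

y, sr = librosa.load("gunshot.wav", sr=22050)       # placeholder file name
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Normalize to [0, 1] and map to RGB with the 'magma' colormap.
norm = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
rgb = (cm.magma(norm)[..., :3] * 255).astype(np.uint8)  # drop alpha channel
print(rgb.shape)  # (128, time_frames, 3)
```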

17 pages, 3805 KB  
Systematic Review
The Genetics of Amyloid Deposition: A Systematic Review of Genome-Wide Association Studies Using Amyloid PET Imaging in Alzheimer’s Disease
by Amir A. Amanullah, Melika Mirbod, Aarti Pandey, Shashi B. Singh, Om H. Gandhi and Cyrus Ayubcha
J. Imaging 2025, 11(8), 280; https://doi.org/10.3390/jimaging11080280 - 19 Aug 2025
Viewed by 713
Abstract
Positron emission tomography (PET) has become a powerful tool in Alzheimer’s disease (AD) research by enabling in vivo visualization of pathological biomarkers. Recent efforts have aimed to integrate PET-derived imaging phenotypes with genome-wide association studies (GWASs) to better elucidate the genetic architecture underlying AD. This systematic review examines studies that leverage PET imaging in the context of GWASs (PET-GWASs) to identify genetic variants associated with disease risk, progression, and brain region-specific pathology. A comprehensive search of PubMed and Embase databases was performed on 18 February 2025, yielding 210 articles, of which 10 met pre-defined inclusion criteria and were included in the final synthesis. Studies were eligible if they included AD populations, employed PET imaging alongside GWASs, and reported original full-text findings in English. No formal protocol was registered, and the risk of bias was not independently assessed. The included studies consistently identified APOE as the strongest genetic determinant of amyloid burden, while revealing additional significant loci including ABCA7 (involved in lipid metabolism and amyloid clearance), FERMT2 (cell adhesion), CR1 (immune response), TOMM40 (mitochondrial function), and FGL2 (protective against amyloid deposition in Korean populations). The included studies suggest that PET-GWAS approaches can uncover genetic loci involved in processes such as lipid metabolism, immune response, and synaptic regulation. Despite limitations including modest cohort sizes and methodological variability, this integrated approach offers valuable insight into the biological pathways driving AD pathology. Expanding PET-genomic datasets, improving study power, and applying advanced computational tools may further clarify genetic mechanisms and contribute to precision medicine efforts in AD. Full article
(This article belongs to the Section Medical Imaging)

18 pages, 2639 KB  
Article
Fundus Image-Based Eye Disease Detection Using EfficientNetB3 Architecture
by Rahaf Alsohemi and Samia Dardouri
J. Imaging 2025, 11(8), 279; https://doi.org/10.3390/jimaging11080279 - 19 Aug 2025
Viewed by 835
Abstract
Accurate and early classification of retinal diseases such as diabetic retinopathy, cataract, and glaucoma is essential for preventing vision loss and improving clinical outcomes. Manual diagnosis from fundus images is often time-consuming and error-prone, motivating the development of automated solutions. This study proposes a deep learning-based classification model using a pretrained EfficientNetB3 architecture, fine-tuned on a publicly available Kaggle retinal image dataset. The model categorizes images into four classes: cataract, diabetic retinopathy, glaucoma, and healthy. Key enhancements include transfer learning, data augmentation, and optimization via the Adam optimizer with a cosine annealing scheduler. The proposed model achieved a classification accuracy of 95.12%, with a precision of 95.21%, recall of 94.88%, F1-score of 95.00%, Dice Score of 94.91%, Jaccard Index of 91.2%, and an MCC of 0.925. These results demonstrate the model’s robustness and potential to support automated retinal disease diagnosis in clinical settings. Full article
(This article belongs to the Section Medical Imaging)
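A hedged sketch of the optimizer and scheduler setup named above (EfficientNetB3 fine-tuned with Adam and cosine annealing), using torchvision; the learning rate, T_max, and four-class head are illustrative assumptions.

```python
# Sketch of the fine-tuning setup; training loop and data pipeline omitted.
import torch
import torch.nn as nn
from torchvision import models

model = models.efficientnet_b3(weights=models.EfficientNet_B3_Weights.IMAGENET1K_V1)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 4)  # 4 fundus classes

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
# inside the training loop: optimizer.step(); scheduler.step() once per epoch
```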

35 pages, 11854 KB  
Article
ODDM: Integration of SMOTE Tomek with Deep Learning on Imbalanced Color Fundus Images for Classification of Several Ocular Diseases
by Afraz Danish Ali Qureshi, Hassaan Malik, Ahmad Naeem, Syeda Nida Hassan, Daesik Jeong and Rizwan Ali Naqvi
J. Imaging 2025, 11(8), 278; https://doi.org/10.3390/jimaging11080278 - 18 Aug 2025
Viewed by 783
Abstract
Ocular disease (OD) represents a complex medical condition affecting humans. OD diagnosis is a challenging process in the current medical system, and blindness may occur if the disease is not detected at its initial phase. Recent studies showed significant outcomes in the identification of OD using deep learning (DL) models. Thus, this work aims to develop a multi-classification DL-based model for the classification of seven ODs, including normal (NOR), age-related macular degeneration (AMD), diabetic retinopathy (DR), glaucoma (GLU), maculopathy (MAC), non-proliferative diabetic retinopathy (NPDR), and proliferative diabetic retinopathy (PDR), using color fundus images (CFIs). This work proposes a custom model named the ocular disease detection model (ODDM) based on a CNN. The proposed ODDM is trained and tested on a publicly available ocular disease dataset (ODD). Additionally, the SMOTE Tomek (SM-TOM) approach is also used to handle the imbalanced distribution of the OD images in the ODD. The performance of the ODDM is compared with seven baseline models, including DenseNet-201 (R1), EfficientNet-B0 (R2), Inception-V3 (R3), MobileNet (R4), Vgg-16 (R5), Vgg-19 (R6), and ResNet-50 (R7). The proposed ODDM obtained a 98.94% AUC, along with 97.19% accuracy, a recall of 88.74%, a precision of 95.23%, and an F1-score of 88.31% in classifying the seven different types of OD. Furthermore, ANOVA and Tukey HSD (Honestly Significant Difference) post hoc tests are also applied to represent the statistical significance of the proposed ODDM. Thus, this study concludes that the results of the proposed ODDM are superior to those of baseline models and state-of-the-art models. Full article
(This article belongs to the Special Issue Advances in Machine Learning for Medical Imaging Applications)
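The SMOTE Tomek re-balancing step can be reproduced with imbalanced-learn as sketched below on stand-in feature vectors; how the paper couples resampling with the fundus images themselves is not reproduced here.

```python
# Minimal SMOTE Tomek sketch on flattened arrays / feature vectors.
import numpy as np
from imblearn.combine import SMOTETomek

X = np.random.rand(300, 64 * 64)                              # stand-in features
y = np.random.choice([0, 1, 2], size=300, p=[0.7, 0.2, 0.1])  # skewed classes

X_res, y_res = SMOTETomek(random_state=42).fit_resample(X, y)
print(np.bincount(y), "->", np.bincount(y_res))  # classes roughly equalized
```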

18 pages, 7265 KB  
Article
Automated Task-Transfer Function Measurement for CT Image Quality Assessment Based on AAPM TG 233
by Choirul Anam, Riska Amilia, Ariij Naufal, Eko Hidayanto, Heri Sutanto, Lukmanda E. Lubis, Toshioh Fujibuchi and Geoff Dougherty
J. Imaging 2025, 11(8), 277; https://doi.org/10.3390/jimaging11080277 - 18 Aug 2025
Viewed by 601
Abstract
This study aims to develop and validate software for the automatic measurement of the task-transfer function (TTF) based on the American Association of Physicists in Medicine (AAPM) Task Group (TG) 233. The software consists of two main stages: automatic placement of the region of interest (ROI) within circular objects of the phantoms and calculation of the TTF. The software was developed on four CT phantom types: computational phantom, ACR 464 CT phantom, AAPM CT phantom, and Catphan® 604 phantom. Each phantom was tested with varying parameters, including spatial resolution level, slice thickness, and image reconstruction technique. The results of TTF were compared with manual measurements performed using ImQuest version 7.3.01 and iQmetrix-CT version 1.2. The software successfully located ROIs at all circular objects within each phantom and measured accurate TTF with various contrast-to-noise ratios (CNRs) of all phantoms. The TTF results were comparable to those obtained with ImQuest and iQmetrix-CT. It was found that the TTF curves produced by the software are smoother than those produced by ImQuest. An algorithm for the automated measurement of TTF was successfully developed and validated. TTF measurement with our software is highly user-friendly, requiring only a single click from the user. Full article

29 pages, 693 KB  
Article
The Contribution of AIDA (Artificial Intelligence Dystocia Algorithm) to Cesarean Section Within Robson Classification Group
by Antonio Malvasi, Lorenzo E. Malgieri, Michael Stark, Edoardo Di Naro, Dan Farine, Giorgio Maria Baldini, Miriam Dellino, Murat Yassa, Andrea Tinelli, Antonella Vimercati and Tommaso Difonzo
J. Imaging 2025, 11(8), 276; https://doi.org/10.3390/jimaging11080276 - 16 Aug 2025
Viewed by 547
Abstract
Global cesarean section (CS) rates continue to rise, with the Robson classification widely used for analysis. However, Robson Group 2A patients (nulliparous women with induced labor) show disproportionately high CS rates that cannot be fully explained by demographic factors alone. This study explored how the Artificial Intelligence Dystocia Algorithm (AIDA) could enhance the Robson system by providing detailed information on geometric dystocia, thereby facilitating better understanding of factors contributing to CS and developing more targeted reduction strategies. The authors conducted a comprehensive literature review analyzing both classification systems across multiple databases and developed a theoretical framework for integration. AIDA categorized labor cases into five classes (0–4) by analyzing four key geometric parameters measured through intrapartum ultrasound: angle of progression (AoP), asynclitism degree (AD), head–symphysis distance (HSD), and midline angle (MLA). Significant asynclitism (AD ≥ 7.0 mm) was strongly associated with CS regardless of other parameters, potentially explaining many “failure to progress” cases in Robson Group 2A patients. The proposed integration created a combined classification providing both population-level and individual geometric risk assessment. The integration of AIDA with the Robson classification represented a potentially valuable advancement in CS risk assessment, combining population-level stratification with individual-level geometric assessment to enable more personalized obstetric care. Future validation studies across diverse settings are needed to establish clinical utility. Full article

22 pages, 3234 KB  
Article
A Lightweight CNN for Multiclass Retinal Disease Screening with Explainable AI
by Arjun Kumar Bose Arnob, Muhammad Hasibur Rashid Chayon, Fahmid Al Farid, Mohd Nizam Husen and Firoz Ahmed
J. Imaging 2025, 11(8), 275; https://doi.org/10.3390/jimaging11080275 - 15 Aug 2025
Viewed by 944
Abstract
Timely, balanced, and transparent detection of retinal diseases is essential to avert irreversible vision loss; however, current deep learning screeners are hampered by class imbalance, large models, and opaque reasoning. This paper presents a lightweight attention-augmented convolutional neural network (CNN) that addresses all three barriers. The network combines depthwise separable convolutions, squeeze-and-excitation, and global-context attention, and it incorporates gradient-based class activation mapping (Grad-CAM) and Grad-CAM++ to ensure that every decision is accompanied by pixel-level evidence. A 5335-image ten-class color-fundus dataset from Bangladeshi clinics, which was severely skewed (17–1509 images per class), was equalized using a synthetic minority oversampling technique (SMOTE) and task-specific augmentations. Images were resized to 150×150 px and split 70:15:15. The training used the adaptive moment estimation (Adam) optimizer (initial learning rate of 1×10−4, reduce-on-plateau, early stopping), ℓ2 regularization, and dual dropout. The 16.6 M parameter network converged in fewer than 50 epochs on a mid-range graphics processing unit (GPU) and reached 87.9% test accuracy, a macro-precision of 0.882, a macro-recall of 0.879, and a macro-F1-score of 0.880, reducing the error by 58% relative to the best ImageNet backbone (Inception-V3, 40.4% accuracy). Eight disorders recorded true-positive rates above 95%; macular scar and central serous chorioretinopathy attained F1-scores of 0.77 and 0.89, respectively. Saliency maps consistently highlighted optic disc margins, subretinal fluid, and other hallmarks. Targeted class re-balancing, lightweight attention, and integrated explainability, therefore, deliver accurate, transparent, and deployable retinal screening suitable for point-of-care ophthalmic triage on resource-limited hardware. Full article
(This article belongs to the Section Medical Imaging)

14 pages, 3502 KB  
Article
Deep Learning-Based Nuclei Segmentation and Melanoma Detection in Skin Histopathological Image Using Test Image Augmentation and Ensemble Model
by Mohammadesmaeil Akbarpour, Hamed Fazlollahiaghamalek, Mahdi Barati, Mehrdad Hashemi Kamangar and Mrinal Mandal
J. Imaging 2025, 11(8), 274; https://doi.org/10.3390/jimaging11080274 - 15 Aug 2025
Viewed by 530
Abstract
Histopathological images play a crucial role in diagnosing skin cancer. However, due to the very large size of digital histopathological images (typically in the order of billion pixels), manual image analysis is tedious and time-consuming. Therefore, there has been significant interest in developing Artificial Intelligence (AI)-enabled computer-aided diagnosis (CAD) techniques for skin cancer detection. Due to the diversity of uncertain cell boundaries, automated nuclei segmentation of histopathological images remains challenging. Automating the identification of abnormal cell nuclei and analyzing their distribution across multiple tissue sections can significantly expedite comprehensive diagnostic assessments. In this paper, a deep neural network (DNN)-based technique is proposed to segment nuclei and detect melanoma in histopathological images. To achieve a robust performance, a test image is first augmented by various geometric operations. The augmented images are then passed through the DNN and the individual outputs are combined to obtain the final nuclei-segmented image. A morphological technique is then applied on the nuclei-segmented image to detect the melanoma region in the image. Experimental results show that the proposed technique can achieve a Dice score of 91.61% and 87.9% for nuclei segmentation and melanoma detection, respectively. Full article
(This article belongs to the Section Medical Imaging)
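A minimal sketch of the test-image augmentation ensemble idea described above: predict on flipped and rotated copies, invert each transform on the prediction, and average. The `model` callable is a placeholder for any nuclei segmentation network, and the eight-fold flip/rotation set is an assumption.

```python
# Test-time augmentation sketch for a binary segmentation model (PyTorch).
import torch

def tta_segment(model, image: torch.Tensor) -> torch.Tensor:
    """image: (1, C, H, W) -> averaged probability map (1, 1, H, W)."""
    outputs = []
    for k in range(4):                         # 0/90/180/270 degree rotations
        for flip in (False, True):
            aug = torch.rot90(image, k, dims=(2, 3))
            if flip:
                aug = torch.flip(aug, dims=(3,))
            pred = torch.sigmoid(model(aug))
            if flip:                           # invert the transforms
                pred = torch.flip(pred, dims=(3,))
            pred = torch.rot90(pred, -k, dims=(2, 3))
            outputs.append(pred)
    return torch.stack(outputs).mean(dim=0)
```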

24 pages, 5649 KB  
Article
Bangla Speech Emotion Recognition Using Deep Learning-Based Ensemble Learning and Feature Fusion
by Md. Shahid Ahammed Shakil, Fahmid Al Farid, Nitun Kumar Podder, S. M. Hasan Sazzad Iqbal, Abu Saleh Musa Miah, Md Abdur Rahim and Hezerul Abdul Karim
J. Imaging 2025, 11(8), 273; https://doi.org/10.3390/jimaging11080273 - 14 Aug 2025
Viewed by 589
Abstract
Emotion recognition in speech is essential for enhancing human–computer interaction (HCI) systems. Despite progress in Bangla speech emotion recognition, challenges remain, including low accuracy, speaker dependency, and poor generalization across emotional expressions. Previous approaches often rely on traditional machine learning or basic deep learning models, struggling with robustness and accuracy in noisy or varied data. In this study, we propose a novel multi-stream deep learning feature fusion approach for Bangla speech emotion recognition, addressing the limitations of existing methods. Our approach begins with various data augmentation techniques applied to the training dataset, enhancing the model’s robustness and generalization. We then extract a comprehensive set of handcrafted features, including Zero-Crossing Rate (ZCR), chromagram, spectral centroid, spectral roll-off, spectral contrast, spectral flatness, Mel-Frequency Cepstral Coefficients (MFCCs), Root Mean Square (RMS) energy, and Mel-spectrogram. Although these features are used as 1D numerical vectors, some of them are computed from time–frequency representations (e.g., chromagram, Mel-spectrogram) that can themselves be depicted as images, which is conceptually close to imaging-based analysis. These features capture key characteristics of the speech signal, providing valuable insights into the emotional content. Sequentially, we utilize a multi-stream deep learning architecture to automatically learn complex, hierarchical representations of the speech signal. This architecture consists of three distinct streams: the first stream uses 1D convolutional neural networks (1D CNNs), the second integrates 1D CNN with Long Short-Term Memory (LSTM), and the third combines 1D CNNs with bidirectional LSTM (Bi-LSTM). These models capture intricate emotional nuances that handcrafted features alone may not fully represent. For each of these models, we generate predicted scores and then employ ensemble learning with a soft voting technique to produce the final prediction. This fusion of handcrafted features, deep learning-derived features, and ensemble voting enhances the accuracy and robustness of emotion identification across multiple datasets. Our method demonstrates the effectiveness of combining various learning models to improve emotion recognition in Bangla speech, providing a more comprehensive solution compared with existing methods. We utilize three primary datasets—SUBESCO, BanglaSER, and a merged version of both—as well as two external datasets, RAVDESS and EMODB, to assess the performance of our models. Our method achieves impressive results with accuracies of 92.90%, 85.20%, 90.63%, 67.71%, and 69.25% for the SUBESCO, BanglaSER, merged SUBESCO and BanglaSER, RAVDESS, and EMODB datasets, respectively. These results demonstrate the effectiveness of combining handcrafted features with deep learning-based features through ensemble learning for robust emotion recognition in Bangla speech. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
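A compact sketch of the handcrafted feature extraction listed above, assuming librosa; each feature map is mean-pooled over time into a 1D vector, which is a simplification of the paper's feature handling.

```python
# Sketch: extract the listed handcrafted features and pool them over time.
import numpy as np
import librosa

def extract_features(path: str, sr: int = 22050) -> np.ndarray:
    y, sr = librosa.load(path, sr=sr)
    feats = [
        librosa.feature.zero_crossing_rate(y),
        librosa.feature.chroma_stft(y=y, sr=sr),
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_rolloff(y=y, sr=sr),
        librosa.feature.spectral_contrast(y=y, sr=sr),
        librosa.feature.spectral_flatness(y=y),
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20),
        librosa.feature.rms(y=y),
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64),
    ]
    return np.concatenate([f.mean(axis=1) for f in feats])  # 1D feature vector
```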

19 pages, 6304 KB  
Article
Digital Image Processing and Convolutional Neural Network Applied to Detect Mitral Stenosis in Echocardiograms: Clinical Decision Support
by Genilton de França Barros Filho, José Fernando de Morais Firmino, Israel Solha, Ewerton Freitas de Medeiros, Alex dos Santos Felix, José Carlos de Lima Júnior, Marcelo Dantas Tavares de Melo and Marcelo Cavalcanti Rodrigues
J. Imaging 2025, 11(8), 272; https://doi.org/10.3390/jimaging11080272 - 14 Aug 2025
Viewed by 382
Abstract
The mitral valve is the most susceptible to pathological alterations, such as mitral stenosis, characterized by failure of the valve to open completely. In this context, the objective of this study was to apply digital image processing (DIP) and develop a convolutional neural network (CNN) to provide decision support for specialists in the diagnosis of mitral stenosis based on transesophageal echocardiography examinations. The following procedures were implemented: acquisition of echocardiogram exams; application of DIP; use of augmentation techniques; and development of a CNN. The DIP classified 26.7% of cases without stenosis, 26.7% with mild stenosis, 13.3% with moderate stenosis, and 33.3% with severe stenosis. A CNN was initially developed to classify videos into those four categories. However, the number of acquired exams was insufficient to effectively train the model for this purpose. Therefore, the final model was trained to differentiate between videos with or without stenosis, achieving an accuracy of 92% with a loss of 0.26. The results demonstrate that both DIP and CNN are effective in distinguishing between cases with and without stenosis. Moreover, DIP was capable of classifying varying degrees of stenosis severity—mild, moderate, and severe—highlighting its potential as a valuable tool in clinical decision support. Full article
(This article belongs to the Section Medical Imaging)

19 pages, 6354 KB  
Article
Extract Nutritional Information from Bilingual Food Labels Using Large Language Models
by Fatmah Y. Assiri, Mohammad D. Alahmadi, Mohammed A. Almuashi and Ayidh M. Almansour
J. Imaging 2025, 11(8), 271; https://doi.org/10.3390/jimaging11080271 - 13 Aug 2025
Viewed by 735
Abstract
Food product labels serve as a critical source of information, providing details about nutritional content, ingredients, and health implications. These labels enable Food and Drug Authorities (FDA) to ensure compliance and take necessary health-related and logistics actions. Additionally, product labels are essential for online grocery stores to offer reliable nutrition facts and empower customers to make informed dietary decisions. Unfortunately, product labels are typically available in image formats, requiring organizations and online stores to manually transcribe them—a process that is not only time-consuming but also highly prone to human error, especially with multilingual labels that add complexity to the task. Our study investigates the challenges and effectiveness of leveraging large language models (LLMs) to extract nutritional elements and values from multilingual food product labels, with a specific focus on Arabic and English. A comprehensive empirical analysis was conducted using a manually curated dataset of 294 food product labels, comprising 588 transcribed nutritional elements and values in both languages, which served as the ground truth for evaluation. The findings reveal that while LLMs performed better in extracting English elements and values compared to Arabic, our post-processing techniques significantly enhanced their accuracy, with GPT-4o outperforming GPT-4V and Gemini. Full article

17 pages, 8033 KB  
Article
PU-DZMS: Point Cloud Upsampling via Dense Zoom Encoder and Multi-Scale Complementary Regression
by Shucong Li, Zhenyu Liu, Tianlei Wang and Zhiheng Zhou
J. Imaging 2025, 11(8), 270; https://doi.org/10.3390/jimaging11080270 - 12 Aug 2025
Viewed by 535
Abstract
Point cloud imaging technology usually faces the problem of point cloud sparsity, which leads to a lack of important geometric detail. There are many point cloud upsampling networks that have been designed to solve this problem. However, the existing methods have limitations in local–global relation understanding, leading to contour distortion and many local sparse regions. To this end, PU-DZMS is proposed with two components. (1) the Dense Zoom Encoder (DENZE) is designed to capture local–global features by using ZOOM Blocks with a dense connection. The main module in the ZOOM Block is the Zoom Encoder, which embeds a Transformer mechanism into the down–upsampling process to enhance local–global geometric features. The geometric edge of the point cloud would be clear under the DENZE. (2) The Multi-Scale Complementary Regression (MSCR) module is designed to expand the features and regress a dense point cloud. MSCR obtains the features’ geometric distribution differences across scales to ensure geometric continuity, and it regresses new points by adopting cross-scale residual learning. The local sparse regions of the point cloud would be reduced by the MSCR module. The experimental results on the PU-GAN dataset and the PU-Net dataset show that the proposed method performs well on point cloud upsampling tasks. Full article

24 pages, 948 KB  
Review
A Review on Deep Learning Methods for Glioma Segmentation, Limitations, and Future Perspectives
by Cecilia Diana-Albelda, Álvaro García-Martín and Jesus Bescos
J. Imaging 2025, 11(8), 269; https://doi.org/10.3390/jimaging11080269 - 11 Aug 2025
Viewed by 973
Abstract
Accurate and automated segmentation of gliomas from Magnetic Resonance Imaging (MRI) is crucial for effective diagnosis, treatment planning, and patient monitoring. However, the aggressive nature and morphological complexity of these tumors pose significant challenges that call for advanced segmentation techniques. This review provides a comprehensive analysis of Deep Learning (DL) methods for glioma segmentation, with a specific focus on bridging the gap between research performance and practical clinical deployment. We evaluate over 80 state-of-the-art models published up to 2025, categorizing them into CNN-based, Pure Transformer, and Hybrid CNN-Transformer architectures. The primary objective of this paper is to critically assess these models not only on their segmentation accuracy but also on their computational efficiency and suitability for real-world medical environments by incorporating hardware resource considerations. We present a comparison of model performance on the BraTS datasets benchmark and introduce a suitability analysis for top-performing models based on their robustness, efficiency, and completeness of tumor region delineation. By identifying current trends, limitations, and key trade-offs, this review offers future research directions aimed at optimizing the balance between technical performance and clinical usability to improve diagnostic outcomes for glioma patients. Full article
(This article belongs to the Section Medical Imaging)

28 pages, 13462 KB  
Article
Research on the Accessibility of Different Colour Schemes for Web Resources for People with Colour Blindness
by Daiva Sajek, Olena Korotenko and Tetiana Kyrychok
J. Imaging 2025, 11(8), 268; https://doi.org/10.3390/jimaging11080268 - 11 Aug 2025
Viewed by 404
Abstract
This study is devoted to the analysis of the perception of colour schemes of web resources by users with different types of colour blindness (colour vision deficiency). The purpose of this study is to develop recommendations for choosing the optimal colour scheme for web resource design that will ensure the comfortable perception of content for the broadest possible audience, including users with colour vision deficiency of various types (deuteranopia and deuteranomaly, protanopia and protanomaly, tritanopia, and tritanomaly). This article presents the results of a survey of people with different colour vision deficiencies regarding the accessibility of web resources created using different colour schemes. The colour deviation value ∆E was calculated to objectively assess changes in the perception of different colour groups by people with colour vision impairments. The conclusions of this study emphasise the importance of taking into account the needs of users with colour vision impairments when developing web resources. Specific recommendations for choosing the best colour schemes for websites are also offered, which will help increase the accessibility and effectiveness of web content for users with different types of colour blindness. Full article
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
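A small sketch of the colour-deviation computation mentioned above, assuming scikit-image and the CIE76 ΔE formula (the study may have used a different ΔE variant): two sRGB colours are converted to CIELAB and their Euclidean distance is returned.

```python
# Sketch of a ΔE (CIE76) computation between two sRGB colours.
import numpy as np
from skimage.color import rgb2lab, deltaE_cie76

def delta_e(rgb_a, rgb_b) -> float:
    lab_a = rgb2lab(np.array([[rgb_a]], dtype=float) / 255.0)
    lab_b = rgb2lab(np.array([[rgb_b]], dtype=float) / 255.0)
    return float(deltaE_cie76(lab_a, lab_b)[0, 0])

print(delta_e((0, 102, 204), (0, 128, 0)))  # e.g. a blue vs. a green UI colour
```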

17 pages, 7225 KB  
Article
Placido Sub-Pixel Edge Detection Algorithm Based on Enhanced Mexican Hat Wavelet Transform and Improved Zernike Moments
by Yujie Wang, Jinyu Liang, Yating Xiao, Xinfeng Liu, Jiale Li, Guangyu Cui and Quan Zhang
J. Imaging 2025, 11(8), 267; https://doi.org/10.3390/jimaging11080267 - 11 Aug 2025
Viewed by 342
Abstract
In order to meet the high-precision location requirements of the corneal Placido ring edge in corneal topographic reconstruction, this paper proposes a sub-pixel edge detection algorithm based on multi-scale and multi-position enhanced Mexican Hat Wavelet Transform and improved Zernike moment. Firstly, the image undergoes preliminary processing using a multi-scale and multi-position enhanced Mexican Hat Wavelet Transform function. Subsequently, the preliminary edge information extracted is relocated based on the Zernike moments of a 9 × 9 template. Finally, two improved adaptive edge threshold algorithms are employed to determine the actual sub-pixel edge points of the image, thereby realizing sub-pixel edge detection for corneal Placido ring images. Through comparison and analysis of edge extraction results from real human eye images obtained using the algorithm proposed in this paper and those from other existing algorithms, it is observed that the average sub-pixel edge error of other algorithms is 0.286 pixels, whereas the proposed algorithm achieves an average error of only 0.094 pixels. Furthermore, the proposed algorithm demonstrates strong robustness against noise. Full article
(This article belongs to the Section Medical Imaging)

14 pages, 2224 KB  
Article
Evaluation of Transfer Learning Efficacy for Surgical Suture Quality Classification on Limited Datasets
by Roman Ishchenko, Maksim Solopov, Andrey Popandopulo, Elizaveta Chechekhina, Viktor Turchin, Fedor Popivnenko, Aleksandr Ermak, Konstantyn Ladyk, Anton Konyashin, Kirill Golubitskiy, Aleksei Burtsev and Dmitry Filimonov
J. Imaging 2025, 11(8), 266; https://doi.org/10.3390/jimaging11080266 - 8 Aug 2025
Viewed by 467
Abstract
This study evaluates the effectiveness of transfer learning with pre-trained convolutional neural networks (CNNs) for the automated binary classification of surgical suture quality (high-quality/low-quality) using photographs of three suture types: interrupted open vascular sutures (IOVS), continuous over-and-over open sutures (COOS), and interrupted laparoscopic sutures (ILS). To address the challenge of limited medical data, eight state-of-the-art CNN architectures—EfficientNetB0, ResNet50V2, MobileNetV3Large, VGG16, VGG19, InceptionV3, Xception, and DenseNet121—were trained and validated on small datasets (100–190 images per type) using 5-fold cross-validation. Performance was assessed using the F1-score, AUC-ROC, and a custom weighted stability-aware score (Scoreadj). The results demonstrate that transfer learning achieves robust classification (F1 > 0.90 for IOVS/ILS, 0.79 for COOS) despite data scarcity. ResNet50V2, DenseNet121, and Xception were more stable by Scoreadj, with ResNet50V2 achieving the highest AUC-ROC (0.959 ± 0.008) for IOVS internal view classification. GradCAM visualizations confirmed model focus on clinically relevant features (e.g., stitch uniformity, tissue apposition). These findings validate transfer learning as a powerful approach for developing objective, automated surgical skill assessment tools, reducing reliance on subjective expert evaluations while maintaining accuracy in resource-constrained settings. Full article
(This article belongs to the Special Issue Advances in Machine Learning for Medical Imaging Applications)

19 pages, 12806 KB  
Article
A Vision Method for Detecting Citrus Separation Lines Using Line-Structured Light
by Qingcang Yu, Song Xue and Yang Zheng
J. Imaging 2025, 11(8), 265; https://doi.org/10.3390/jimaging11080265 - 8 Aug 2025
Viewed by 366
Abstract
The detection of citrus separation lines is a crucial step in the citrus processing industry. Inspired by the achievements of line-structured light technology in surface defect detection, this paper proposes a method for detecting citrus separation lines based on line-structured light. Firstly, a gamma-corrected Otsu method is employed to extract the laser stripe region from the image. Secondly, an improved skeleton extraction algorithm is employed to mitigate the bifurcation errors inherent in original skeleton extraction algorithms while simultaneously acquiring 3D point cloud data of the citrus surface. Finally, the least squares progressive iterative approximation algorithm is applied to approximate the ideal surface curve; subsequently, principal component analysis is used to derive the normals of this ideally fitted curve. The deviation between each point (along its corresponding normal direction) and the actual geometric characteristic curve is then adopted as a quantitative index for separation lines positioning. The average similarity between the extracted separation lines and the manually defined standard separation lines reaches 92.5%. In total, 95% of the points on the separation lines obtained by this method have an error of less than 4 pixels. Experimental results demonstrate that through quantitative deviation analysis of geometric features, automatic detection and positioning of the separation lines are achieved, satisfying the requirements of high precision and non-destructiveness for automatic citrus splitting. Full article

20 pages, 7305 KB  
Article
Systematic and Individualized Preparation of External Ear Canal Implants: Development and Validation of an Efficient and Accurate Automated Segmentation System
by Yanjing Luo, Mohammadtaha Kouchakinezhad, Felix Repp, Verena Scheper, Thomas Lenarz and Farnaz Matin-Mann
J. Imaging 2025, 11(8), 264; https://doi.org/10.3390/jimaging11080264 - 8 Aug 2025
Viewed by 381
Abstract
External ear canal (EEC) stenosis, often associated with cholesteatoma, carries a high risk of postoperative restenosis despite surgical intervention. While individualized implants offer promise in preventing restenosis, the high morphological variability of EECs and the lack of standardized definitions hinder systematic implant design. This study aimed to characterize individual EEC morphology and to develop a validated automated segmentation system for efficient implant preparation. Reference datasets were first generated by manual segmentation using 3D Slicer™ software version 5.2.2. Based on these, we developed a customized plugin capable of automatically identifying the maximal implantable region within the EEC and measuring its key dimensions. The accuracy of the plugin was assessed by comparing it with manual segmentation results in terms of shape, volume, length, and width. Validation was further performed using three temporal bone implantation experiments with 3D-Bioplotter©-fabricated EEC implants. The automated system demonstrated strong consistency with manual methods and significantly improved segmentation efficiency. The plugin-generated models enabled successful implant fabrication and placement in all validation tests. These results confirm the system’s clinical feasibility and support its use for individualized and systematic EEC implant design. The developed tool holds potential to improve surgical planning and reduce postoperative restenosis in EEC stenosis treatment. Full article
(This article belongs to the Special Issue Current Progress in Medical Image Segmentation)

23 pages, 5644 KB  
Article
Enhancing YOLOv5 for Autonomous Driving: Efficient Attention-Based Object Detection on Edge Devices
by Mortda A. A. Adam and Jules R. Tapamo
J. Imaging 2025, 11(8), 263; https://doi.org/10.3390/jimaging11080263 - 8 Aug 2025
Viewed by 909
Abstract
On-road vision-based systems rely on object detection to ensure vehicle safety and efficiency, making it an essential component of autonomous driving. Deep learning methods show high performance; however, they often require special hardware due to their large sizes and computational complexity, which makes real-time deployment on edge devices expensive. This study proposes lightweight object detection models based on the YOLOv5s architecture, known for its speed and accuracy. The models integrate advanced channel attention strategies, specifically the ECA module and SE attention blocks, to enhance feature selection while minimizing computational overhead. Four models were developed and trained on the KITTI dataset. The models were analyzed using key evaluation metrics to assess their effectiveness in real-time autonomous driving scenarios, including precision, recall, and mean average precision (mAP). BaseECAx2 emerged as the most efficient model for edge devices, achieving the lowest GFLOPs (13) and smallest model size (9.1 MB) without sacrificing performance. The BaseSE-ECA model demonstrated outstanding accuracy in vehicle detection, reaching a precision of 96.69% and an mAP of 98.4%, making it ideal for high-precision autonomous driving scenarios. We also assessed the models’ robustness in more challenging environments by training and testing them on the BDD-100K dataset. While the models exhibited reduced performance in complex scenarios involving low-light conditions and motion blur, this evaluation highlights potential areas for improvement in challenging real-world driving conditions. This study bridges the gap between affordability and performance, presenting lightweight, cost-effective solutions for integration into real-time autonomous vehicle systems. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
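The two channel-attention blocks named in the abstract (SE and ECA) are standard components; the PyTorch sketch below shows minimal versions of each so the difference is visible: SE uses a bottleneck MLP over pooled channel statistics, while ECA replaces it with a cheap 1D convolution across channels. Layer sizes and the example feature-map shape are illustrative assumptions, not the paper's exact YOLOv5s configuration.

```python
# Minimal PyTorch sketches of SE and ECA channel attention; sizes are illustrative.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global pooling + bottleneck MLP gives per-channel gates."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)   # squeeze, then excite
        return x * w

class ECABlock(nn.Module):
    """Efficient Channel Attention: a 1D conv across channels replaces the SE bottleneck."""
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        b, c, _, _ = x.shape
        y = x.mean(dim=(2, 3)).view(b, 1, c)               # (B, 1, C) channel descriptor
        w = self.gate(self.conv(y)).view(b, c, 1, 1)
        return x * w

feat = torch.randn(2, 64, 80, 80)                          # a YOLO-style feature map
print(SEBlock(64)(feat).shape, ECABlock()(feat).shape)
```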
29 pages, 3842 KB  
Article
SABE-YOLO: Structure-Aware and Boundary-Enhanced YOLO for Weld Seam Instance Segmentation
by Rui Wen, Wu Xie, Yong Fan and Lanlan Shen
J. Imaging 2025, 11(8), 262; https://doi.org/10.3390/jimaging11080262 - 6 Aug 2025
Viewed by 494
Abstract
Accurate weld seam recognition is essential in automated welding systems, as it directly affects path planning and welding quality. With the rapid advancement of industrial vision, weld seam instance segmentation has emerged as a prominent research focus in both academia and industry. However, existing approaches still face significant challenges in boundary perception and structural representation. Due to the inherently elongated shapes, complex geometries, and blurred edges of weld seams, current segmentation models often struggle to maintain high accuracy in practical applications. To address this issue, a novel structure-aware and boundary-enhanced YOLO (SABE-YOLO) is proposed for weld seam instance segmentation. First, a Structure-Aware Fusion Module (SAFM) is designed to enhance structural feature representation through strip pooling attention and element-wise multiplicative fusion, targeting the difficulty in extracting elongated and complex features. Second, a C2f-based Boundary-Enhanced Aggregation Module (C2f-BEAM) is constructed to improve edge feature sensitivity by integrating multi-scale boundary detail extraction, feature aggregation, and attention mechanisms. Finally, the inner minimum point distance-based intersection over union (Inner-MPDIoU) is introduced to improve localization accuracy for weld seam regions. Experimental results on the self-built weld seam image dataset show that SABE-YOLO outperforms YOLOv8n-Seg by 3 percentage points in the AP(50–95) metric, reaching 46.3%. Meanwhile, it maintains a low computational cost (18.3 GFLOPs) and a small number of parameters (6.6M), while achieving an inference speed of 127 FPS, demonstrating a favorable trade-off between segmentation accuracy and computational efficiency. The proposed method provides an effective solution for high-precision visual perception of complex weld seam structures and demonstrates strong potential for industrial application. Full article
(This article belongs to the Section Image and Video Processing)
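As a rough illustration of the corner-distance idea behind MPDIoU-style losses, the sketch below computes plain IoU and subtracts the image-normalized squared distances between the two boxes' top-left and bottom-right corners. The paper's Inner-MPDIoU additionally applies an inner scaling of the boxes, which is omitted here; treat this as an assumption-laden simplification rather than the authors' loss.

```python
# Rough NumPy-free sketch of the MPDIoU idea: IoU minus normalized corner-distance penalties.
# The "inner" box scaling used in the paper is intentionally left out.

def mpdiou(box_a, box_b, img_w, img_h):
    """Boxes are (x1, y1, x2, y2). Returns IoU minus normalized corner-distance terms."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    diag2 = img_w ** 2 + img_h ** 2                      # normalizer: image diagonal squared
    d_tl = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2           # top-left corner distance
    d_br = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2           # bottom-right corner distance
    return iou - d_tl / diag2 - d_br / diag2

print(mpdiou((10, 10, 50, 90), (12, 8, 55, 95), img_w=640, img_h=640))
```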
11 pages, 1947 KB  
Article
Quantitative Magnetic Resonance Imaging and Patient-Reported Outcomes in Patients Undergoing Hip Labral Repair or Reconstruction
by Kyle S. J. Jamar, Adam Peszek, Catherine C. Alder, Trevor J. Wait, Caleb J. Wipf, Carson L. Keeter, Stephanie W. Mayer, Charles P. Ho and James W. Genuario
J. Imaging 2025, 11(8), 261; https://doi.org/10.3390/jimaging11080261 - 5 Aug 2025
Viewed by 543
Abstract
This study evaluates the relationship between preoperative cartilage quality, measured by T2 mapping, and patient-reported outcomes following labral tear treatment. We retrospectively reviewed patients aged 14–50 who underwent primary hip arthroscopy with either labral repair or reconstruction. Preoperative T2 values of femoral, acetabular, and labral tissue were assessed from MRI by blinded reviewers. International Hip Outcome Tool (iHOT-12) scores were collected preoperatively and up to two years postoperatively. Associations between T2 values and iHOT-12 scores were analyzed using univariate mixed linear models. Twenty-nine patients were included (mean age of 32.5 years, BMI 24 kg/m², 48.3% female, and 22 repairs). Across all patients, higher T2 values were associated with higher iHOT-12 scores at baseline and early postoperative timepoints (three months for cartilage and six months for labrum; p < 0.05). Lower T2 values were associated with higher 12- and 24-month iHOT-12 scores across all structures (p < 0.001). Similar trends were observed within the repair and reconstruction subgroups, with delayed negative associations correlating with worse tissue quality. T2 mapping showed time-dependent correlations with iHOT-12 scores, indicating that worse cartilage or labral quality predicts poorer long-term outcomes. These findings support the utility of T2 mapping as a preoperative tool for prognosis in hip preservation surgery. Full article
(This article belongs to the Special Issue New Developments in Musculoskeletal Imaging)
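A univariate mixed linear model of the kind described (outcome scores regressed on a T2 value, with repeated measures grouped by patient) can be expressed in a few lines with statsmodels; the sketch below uses simulated data and assumed column names, so it illustrates the model form rather than the study's actual analysis.

```python
# Hedged sketch of a univariate mixed linear model (random intercept per patient).
# Data, column names, and effect sizes are simulated assumptions, not study data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_patients, visits = 20, 2
patient = np.repeat(np.arange(n_patients), visits)
t2 = np.repeat(rng.normal(40.0, 5.0, n_patients), visits)          # one preop T2 per patient
ihot12 = 90.0 - 0.8 * t2 + rng.normal(0.0, 5.0, n_patients * visits)  # toy outcome scores
df = pd.DataFrame({"patient": patient, "t2_value": t2, "ihot12": ihot12})

model = smf.mixedlm("ihot12 ~ t2_value", data=df, groups=df["patient"])
print(model.fit().summary())
```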
19 pages, 7531 KB  
Article
Evaluating the Impact of 2D MRI Slice Orientation and Location on Alzheimer’s Disease Diagnosis Using a Lightweight Convolutional Neural Network
by Nadia A. Mohsin and Mohammed H. Abdulameer
J. Imaging 2025, 11(8), 260; https://doi.org/10.3390/jimaging11080260 - 5 Aug 2025
Viewed by 692
Abstract
Accurate detection of Alzheimer’s disease (AD) is critical yet challenging for early medical intervention. Deep learning methods, especially convolutional neural networks (CNNs), have shown promising potential for improving diagnostic accuracy using magnetic resonance imaging (MRI). This study aims to identify the most informative combination of MRI slice orientation and anatomical location for AD classification. We propose an automated framework that first selects the most relevant slices using a feature entropy-based method applied to activation maps from a pretrained CNN model. For classification, we employ a lightweight CNN architecture based on depthwise separable convolutions to efficiently analyze the selected 2D MRI slices extracted from preprocessed 3D brain scans. To further interpret model behavior, an attention mechanism is integrated to analyze which feature level contributes the most to the classification process. The model is evaluated on three binary tasks: AD vs. mild cognitive impairment (MCI), AD vs. cognitively normal (CN), and MCI vs. CN. The experimental results show the highest accuracy (97.4%) in distinguishing AD from CN when utilizing the selected slices from the ninth axial segment, followed by the tenth segment of coronal and sagittal orientations. These findings demonstrate the significance of slice location and orientation in MRI-based AD diagnosis and highlight the potential of lightweight CNNs for clinical use. Full article
(This article belongs to the Section AI in Imaging)
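The slice-selection step can be pictured as scoring each candidate 2D slice by the Shannon entropy of its activation map from a pretrained CNN and keeping the top-k. The sketch below shows that idea on random stand-in activations; the histogram-based entropy, the value of k, and the input shapes are assumptions, not the paper's exact procedure.

```python
# Illustrative entropy-based slice selection over per-slice CNN activation maps.
import numpy as np

def activation_entropy(act: np.ndarray, bins: int = 32) -> float:
    """Shannon entropy of an activation map's intensity histogram."""
    hist, _ = np.histogram(act, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_slices(activations: np.ndarray, k: int = 5) -> np.ndarray:
    """activations: (num_slices, H, W) maps from a pretrained CNN; returns top-k slice indices."""
    scores = np.array([activation_entropy(a) for a in activations])
    return np.argsort(scores)[::-1][:k]

maps = np.random.rand(60, 28, 28)        # stand-in for per-slice activation maps
print(select_slices(maps, k=5))
```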
23 pages, 22135 KB  
Article
Road Marking Damage Degree Detection Based on Boundary Features Enhanced and Asymmetric Large Field-of-View Contextual Features
by Zheng Wang, Ryojun Ikeura, Soichiro Hayakawa and Zhiliang Zhang
J. Imaging 2025, 11(8), 259; https://doi.org/10.3390/jimaging11080259 - 4 Aug 2025
Viewed by 502
Abstract
Road markings, as critical components of transportation infrastructure, are essential for ensuring traffic safety. Accurate quantification of their damage severity is vital for effective maintenance prioritization. However, existing methods are limited to detecting the presence of damage without assessing its extent. To address this limitation, we propose a novel segmentation-based framework for estimating the degree of road marking damage. The method comprises two stages: segmentation of residual pixels from the damaged markings and segmentation of the intact marking region. This dual-segmentation strategy enables precise reconstruction and comparison for severity estimation. To enhance segmentation performance, we propose two key modules: the Asymmetric Large Field-of-View Contextual (ALFVC) module, which captures rich multi-scale contextual features, and the supervised Boundary Feature Enhancement (BFE) module, which strengthens shape representation and boundary accuracy. The experimental results demonstrate that our method achieved an average segmentation accuracy of 89.44%, outperforming the baseline by 5.86 percentage points. Moreover, the damage quantification achieved a minimum error rate of just 0.22% on the proprietary dataset. The proposed approach is both effective and lightweight, providing valuable support for automated maintenance planning and significantly improving the efficiency and precision of road marking management. Full article
(This article belongs to the Section Image and Video Processing)
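The damage-degree estimate rests on comparing the two segmentations: the residual paint that survives and the reconstructed intact marking region. The short sketch below shows one plausible way to turn the two binary masks into a severity ratio; it is an assumption-level illustration, and the paper's reconstruction and comparison pipeline is more involved.

```python
# Illustrative damage ratio from two binary masks (residual paint vs. reconstructed intact region).
import numpy as np

def damage_degree(residual_mask: np.ndarray, intact_mask: np.ndarray) -> float:
    """Fraction of the intact marking area that is no longer present in the residual paint."""
    intact_area = intact_mask.sum()
    if intact_area == 0:
        return 0.0
    remaining = np.logical_and(residual_mask, intact_mask).sum()
    return 1.0 - remaining / intact_area

intact = np.zeros((100, 100), dtype=bool)
intact[40:60, 10:90] = True                 # reconstructed full marking region
residual = intact.copy()
residual[45:55, 30:70] = False              # worn-away paint
print(f"estimated damage: {damage_degree(residual, intact):.2%}")
```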
28 pages, 21813 KB  
Article
Adaptive RGB-D Semantic Segmentation with Skip-Connection Fusion for Indoor Staircase and Elevator Localization
by Zihan Zhu, Henghong Lin, Anastasia Ioannou and Tao Wang
J. Imaging 2025, 11(8), 258; https://doi.org/10.3390/jimaging11080258 - 4 Aug 2025
Viewed by 592
Abstract
Accurate semantic segmentation of indoor architectural elements, such as staircases and elevators, is critical for safe and efficient robotic navigation, particularly in complex multi-floor environments. Traditional fusion methods struggle with occlusions, reflections, and low-contrast regions. In this paper, we propose a novel feature fusion module, Skip-Connection Fusion (SCF), that dynamically integrates RGB (Red, Green, Blue) and depth features through an adaptive weighting mechanism and skip-connection integration. This approach enables the model to selectively emphasize informative regions while suppressing noise, effectively addressing challenging conditions such as partially blocked staircases, glossy elevator doors, and dimly lit stair edges, which improves obstacle detection and supports reliable human–robot interaction in complex environments. Extensive experiments on a newly collected dataset demonstrate that SCF consistently outperforms state-of-the-art methods, including PSPNet and DeepLabv3, in both overall mIoU (mean Intersection over Union) and challenging-case performance. Specifically, our SCF module improves segmentation accuracy by 5.23% in the top 10% of challenging samples, highlighting its robustness in real-world conditions. Furthermore, we conduct a sensitivity analysis on the learnable weights, demonstrating their impact on segmentation quality across varying scene complexities. Our work provides a strong foundation for real-world applications in autonomous navigation, assistive robotics, and smart surveillance. Full article
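In the spirit of the SCF module described above, the PyTorch sketch below fuses RGB and depth feature maps with a learned per-pixel gate and adds the RGB stream back as a skip path. The block design, channel counts, and the single-gate formulation are assumptions for illustration, not the authors' exact module.

```python
# Hedged sketch of adaptive RGB-D fusion with a skip connection; details are assumptions.
import torch
import torch.nn as nn

class SkipConnectionFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Gate predicts a per-pixel weight in [0, 1] from the concatenated modalities
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid())
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))   # adaptive weight map
        fused = w * rgb_feat + (1.0 - w) * depth_feat             # weighted modality mix
        return self.proj(fused) + rgb_feat                        # skip-connection integration

rgb = torch.randn(1, 64, 120, 160)
depth = torch.randn(1, 64, 120, 160)
print(SkipConnectionFusion(64)(rgb, depth).shape)
```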
20 pages, 4292 KB  
Article
A Novel Method for Analysing the Curvature of the Anterior Lens: Multi-Radial Scheimpflug Imaging and Custom Conic Fitting Algorithm
by María Arcas-Carbonell, Elvira Orduna-Hospital, María Mechó-García, Guisela Fernández-Espinosa and Ana Sanchez-Cano
J. Imaging 2025, 11(8), 257; https://doi.org/10.3390/jimaging11080257 - 1 Aug 2025
Viewed by 392
Abstract
This study describes and validates a novel method for assessing anterior crystalline lens curvature along vertical and horizontal meridians using radial measurements derived from Scheimpflug imaging. The aim was to evaluate whether pupil diameter (PD), anterior lens curvature, and anterior chamber depth (ACD) change during accommodation and whether these changes are age-dependent. A cross-sectional study was conducted on 104 right eyes from healthy participants aged 21–62 years. Sixteen radial images per eye were acquired using the Galilei Dual Scheimpflug Placido Disk Topographer under four accommodative demands (0, 1, 3, and 5 dioptres (D)). Custom software analysed lens curvature by calculating eccentricity in both meridians. Participants were analysed as a total group and by age subgroups. Accommodative amplitude and monocular accommodative facility were inversely correlated with age. Both PD and ACD significantly decreased with higher accommodative demands and age. Relative eccentricity decreased under accommodation, indicating increased lens curvature, especially in younger participants. Significant curvature changes were detected in the horizontal meridian only, although no statistically significant differences between meridians were found overall. The vertical meridian showed slightly higher eccentricity values, suggesting that it remained less curved. By enabling detailed, meridionally stratified in vivo assessment of anterior lens curvature, this novel method provides a valuable non-invasive approach for characterizing age-related biomechanical changes during accommodation. The resulting insights enhance our understanding of presbyopia progression, particularly regarding the spatial remodelling of the anterior lens surface. Full article
(This article belongs to the Special Issue Current Progress in Medical Image Segmentation)
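For one meridian, fitting a conic to the anterior lens profile and reading off an eccentricity can be done with a linear least-squares fit of the apical sagitta relation x² = 2Rz − (1 + Q)z². The sketch below uses that textbook formulation on synthetic data; it is not necessarily the algorithm implemented in the authors' custom software.

```python
# Illustrative conic fit along one meridian: recover apical radius R, asphericity Q,
# and an eccentricity estimate from profile points (x, z). Textbook formulation, assumed.
import numpy as np

def fit_meridian(x_mm: np.ndarray, z_mm: np.ndarray):
    """x: semi-chord heights, z: sagittal depths of the anterior lens profile (one meridian)."""
    A = np.column_stack([2.0 * z_mm, -z_mm ** 2])        # columns for R and p = 1 + Q
    coeffs, *_ = np.linalg.lstsq(A, x_mm ** 2, rcond=None)
    R, p = coeffs
    Q = p - 1.0
    ecc = np.sqrt(-Q) if Q < 0 else np.nan               # prolate surfaces: e = sqrt(-Q)
    return R, Q, ecc

# Synthetic profile for a prolate conic with R = 10 mm, Q = -0.4
R_true, Q_true = 10.0, -0.4
x = np.linspace(0.2, 3.0, 30)
z = x ** 2 / (R_true * (1 + np.sqrt(1 - (1 + Q_true) * x ** 2 / R_true ** 2)))
print(fit_meridian(x, z))   # should recover approximately (10.0, -0.4, 0.63)
```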