J. Imaging, Volume 11, Issue 9 (September 2025) – 30 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view the papers in PDF format, click on the "PDF Full-text" link, and use the free Adobe Reader to open them.
19 pages, 11534 KB  
Article
Segment and Recover: Defending Object Detectors Against Adversarial Patch Attacks
by Haotian Gu and Hamidreza Jafarnejadsani
J. Imaging 2025, 11(9), 316; https://doi.org/10.3390/jimaging11090316 - 15 Sep 2025
Abstract
Object detection is used to automatically identify and locate specific objects within images or videos for applications like autonomous driving, security surveillance, and medical imaging. Protecting object detection models against adversarial attacks, particularly malicious patches, is crucial to ensure reliable and safe performance in safety-critical applications, where misdetections can lead to severe consequences. Existing defenses against patch attacks are primarily designed for stationary scenes and struggle against adversarial image patches that vary in scale, position, and orientation in dynamic environments. In this paper, we introduce SAR, a patch-agnostic defense scheme based on image preprocessing that does not require additional model training. By integrating a patch-agnostic detection frontend with a broken-pixel restoration backend, Segment and Recover (SAR) is developed for large-mask-covered object-hiding attacks. Our approach is not limited by patch scale, shape, or location: it accurately localizes the adversarial patch in the frontend and restores the broken pixels in the backend. Our evaluations of clean performance demonstrate that SAR is compatible with a variety of pretrained object detectors. Moreover, SAR exhibits notable resilience improvements over the state-of-the-art methods evaluated in this paper. Our comprehensive evaluation covers diverse patch types, including localized-noise, printable, visible, and adaptive adversarial patches. Full article
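The segment-then-restore idea lends itself to a compact illustration. Below is a minimal, hypothetical sketch — not the authors' SAR frontend/backend — that flags anomalously high-frequency pixels as a candidate patch mask and restores them by inpainting; the Laplacian cue, quantile threshold, and OpenCV inpainting are illustrative stand-ins.

```python
import cv2
import numpy as np

def segment_and_restore(image_bgr, saliency_quantile=0.99):
    """Toy segment-then-inpaint pipeline: flag high-frequency pixels as a
    candidate patch mask, then restore them from surrounding context.
    Illustrates the general idea only, not the SAR method itself."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # High-pass response as a crude "adversarial texture" cue.
    lap = np.abs(cv2.Laplacian(gray, cv2.CV_32F))
    mask = (lap > np.quantile(lap, saliency_quantile)).astype(np.uint8) * 255
    # Dilate so the mask covers the whole contiguous patch region.
    mask = cv2.dilate(mask, np.ones((7, 7), np.uint8), iterations=2)
    # Restore the masked pixels by inpainting.
    return cv2.inpaint(image_bgr, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```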
(This article belongs to the Special Issue Object Detection in Video Surveillance Systems)

25 pages, 10818 KB  
Article
From Detection to Motion-Based Classification: A Two-Stage Approach for T. cruzi Identification in Video Sequences
by Kenza Chenni, Carlos Brito-Loeza, Cefa Karabağ and Lavdie Rada
J. Imaging 2025, 11(9), 315; https://doi.org/10.3390/jimaging11090315 - 14 Sep 2025
Abstract
Chagas disease, caused by Trypanosoma cruzi (T. cruzi), remains a significant public health challenge in Latin America. Traditional diagnostic methods relying on manual microscopy suffer from low sensitivity, subjective interpretation, and poor performance in suboptimal conditions. This study presents a novel computer vision framework integrating motion analysis with deep learning for automated T. cruzi detection in microscopic videos. Our motion-based detection pipeline leverages parasite motility as a key discriminative feature, employing frame differencing, morphological processing, and DBSCAN clustering across 23 microscopic videos. This approach effectively addresses limitations of static image analysis in challenging conditions including noisy backgrounds, uneven illumination, and low contrast. From motion-identified regions, 64×64 patches were extracted for classification. MobileNetV2 achieved superior performance with 99.63% accuracy, 100% precision, 99.12% recall, and an AUC-ROC of 1.0. Additionally, YOLOv5 and YOLOv8 models (Nano, Small, Medium variants) were trained on 43 annotated videos, with YOLOv5-Nano and YOLOv8-Nano demonstrating excellent detection capability on unseen test data. This dual-stage framework offers a practical, computationally efficient solution for automated Chagas diagnosis, particularly valuable for resource-constrained laboratories with poor imaging quality. Full article
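As a rough illustration of the motion-based first stage, the sketch below chains frame differencing, morphological cleanup, and DBSCAN clustering over moving-pixel coordinates to obtain candidate parasite regions; the thresholds and clustering parameters are placeholders, not the paper's values.

```python
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

def motion_candidates(prev_frame, curr_frame, diff_thresh=25, eps=20, min_samples=10):
    """Motility-based detection sketch: difference consecutive frames,
    clean the motion mask morphologically, then cluster moving pixels."""
    diff = cv2.absdiff(cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY))
    _, motion = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    motion = cv2.morphologyEx(motion, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    ys, xs = np.nonzero(motion)
    if len(xs) == 0:
        return []
    pts = np.stack([xs, ys], axis=1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    # One (x, y) centroid per moving cluster; -1 is DBSCAN's noise label.
    return [pts[labels == k].mean(axis=0) for k in set(labels) if k != -1]
```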
(This article belongs to the Section Computer Vision and Pattern Recognition)

24 pages, 4112 KB  
Article
Enhancing Breast Lesion Detection in Mammograms via Transfer Learning
by Beibit Abdikenov, Dimash Rakishev, Yerzhan Orazayev and Tomiris Zhaksylyk
J. Imaging 2025, 11(9), 314; https://doi.org/10.3390/jimaging11090314 - 13 Sep 2025
Viewed by 49
Abstract
Early detection of breast cancer via mammography enhances patient survival rates. This study assesses object detection models—Cascade R-CNN, YOLOv12 (S, L, and X variants), RTMDet-X, and RT-DETR-X—for detecting masses and calcifications across four public datasets (INbreast, CBIS-DDSM, VinDr-Mammo, and EMBED). The evaluation employs standardized preprocessing (CLAHE, cropping) and augmentation (rotations, scaling), with transfer learning tested by training on combined datasets (e.g., INbreast + CBIS-DDSM) and validating on held-out sets (e.g., VinDr-Mammo). Performance is measured using precision, recall, mean Average Precision at IoU 0.5 (mAP50), and F1-score. YOLOv12-L excels in mass detection with an mAP50 of 0.963 and an F1-score up to 0.917 on INbreast, while RTMDet-X achieves an mAP50 of 0.697 on combined datasets with transfer learning. Preprocessing improves mAP50 by up to 0.209, and transfer learning elevates INbreast performance to an mAP50 of 0.995, though it incurs 5–11% drops on CBIS-DDSM (0.566 to 0.447) and VinDr-Mammo (0.59 to 0.5) due to domain shifts. EMBED yields a low mAP50 of 0.306 due to label inconsistencies, and calcification detection remains weak (mAP50 < 0.116). These results highlight the value of high-capacity models, preprocessing, and augmentation for mass detection, while identifying calcification detection and domain adaptation as key areas for future investigation. Full article
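The standardized preprocessing step is straightforward to reproduce in outline. A minimal sketch using OpenCV's CLAHE follows; the clip limit and tile size are illustrative defaults, not necessarily the study's settings.

```python
import cv2

def preprocess_mammogram(path, clip_limit=2.0, tile=(8, 8)):
    """CLAHE contrast normalization of a grayscale mammogram, the kind of
    standardized preprocessing the study applies before detection."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    return clahe.apply(img)
```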
(This article belongs to the Section Medical Imaging)

26 pages, 5362 KB  
Article
Maternal Factors, Breast Anatomy, and Milk Production During Established Lactation—An Ultrasound Investigation
by Zoya Gridneva, Alethea Rea, David Weight, Jacki L. McEachran, Ching Tat Lai, Sharon L. Perrella and Donna T. Geddes
J. Imaging 2025, 11(9), 313; https://doi.org/10.3390/jimaging11090313 - 12 Sep 2025
Viewed by 152
Abstract
Obesity is linked to suboptimal breastfeeding outcomes, yet the relationships between maternal adiposity, breast anatomy, and milk production (MP) have not been investigated. We conducted ultrasound imaging to assess the breast anatomy of 34 lactating women. The amount of glandular tissue (glandular tissue representation (GTR)) was classified as low, moderate, or high. The number and diameters of the main milk ducts and mammary blood flow (resistive index) were measured. Women completed a 24 h MP measurement and an obstetric/lactation history questionnaire. Body composition was measured with bioimpedance spectroscopy. Statistical analysis employed correlation networks. Multiple relationships were revealed, with later menarche correlating with minimal pubertal and pregnancy breast growth. Minimal breast growth was further correlated with lower mammary blood flow during lactation and with fewer and smaller-diameter main milk ducts, which in turn correlated with a lower MP. Importantly, higher adiposity also correlated with minimal breast growth during pregnancy and with low GTR and MP. Several modifiable and non-modifiable maternal factors may be associated with breast development and MP. Antenatal lactation assessment and intervention in high-risk women may ensure they reach their full lactation potential and inform future interventions, such as maintaining healthy adiposity. Full article
(This article belongs to the Special Issue Imaging in Healthcare: Progress and Challenges)

15 pages, 5090 KB  
Article
EFIMD-Net: Enhanced Feature Interaction and Multi-Domain Fusion Deep Forgery Detection Network
by Hao Cheng, Weiye Pang, Kun Li, Yongzhuang Wei, Yuhang Song and Ji Chen
J. Imaging 2025, 11(9), 312; https://doi.org/10.3390/jimaging11090312 - 12 Sep 2025
Viewed by 137
Abstract
Currently, deepfake detection has garnered widespread attention as a key defense mechanism against the misuse of deepfake technology. However, existing deepfake detection networks still face challenges such as insufficient robustness, limited generalization capabilities, and a single feature extraction domain (e.g., using only spatial domain features) when confronted with evolving algorithms or diverse datasets, which severely limits their application capabilities. To address these issues, this study proposes a deepfake detection network named EFIMD-Net, which enhances performance by strengthening feature interaction and integrating spatial and frequency domain features. The proposed network integrates a Cross-feature Interaction Enhancement module (CFIE) based on cosine similarity, which achieves adaptive interaction between spatial domain features (RGB stream) and frequency domain features (SRM, Spatial Rich Model stream) through a channel attention mechanism, effectively fusing macro-semantic information with high-frequency artifact information. Additionally, an Enhanced Multi-scale Feature Fusion (EMFF) module is proposed, which effectively integrates multi-scale feature information from various layers of the network through adaptive feature enhancement and reorganization techniques. Experimental results show that compared to the baseline network Xception, EFIMD-Net achieves comparable or even better Area Under the Curve (AUC) on multiple datasets. Ablation experiments also validate the effectiveness of the proposed modules. Furthermore, compared to the baseline traditional two-stream network Locate and Verify, EFIMD-Net significantly improves forgery detection performance, with a 9-percentage-point increase in Area Under the Curve on the CelebDF-v1 dataset and a 7-percentage-point increase on the CelebDF-v2 dataset. These results fully demonstrate the effectiveness and generalization of EFIMD-Net in forgery detection. Potential limitations regarding real-time processing efficiency are acknowledged. Full article
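To make the cosine-similarity interaction concrete, here is a hypothetical PyTorch sketch of a CFIE-style block in which the channel-wise cosine similarity between the RGB and SRM streams drives a channel-attention gate. The module name, reduction ratio, and fusion rule are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossFeatureInteraction(nn.Module):
    """Sketch of similarity-gated two-stream interaction: each stream is
    enhanced by the other, weighted by their channel-wise cosine similarity
    passed through a small channel-attention MLP."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, rgb, srm):  # both: (B, C, H, W)
        b, c, _, _ = rgb.shape
        # Cosine similarity per channel between the two streams' spatial maps.
        sim = F.cosine_similarity(rgb.flatten(2), srm.flatten(2), dim=2)  # (B, C)
        gate = self.mlp(sim).view(b, c, 1, 1)  # channel-attention gate
        return rgb + gate * srm, srm + gate * rgb
```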
(This article belongs to the Section Biometrics, Forensics, and Security)

19 pages, 2838 KB  
Article
Cascaded Spatial and Depth Attention UNet for Hippocampus Segmentation
by Zi-Zheng Wei, Bich-Thuy Vu, Maisam Abbas and Ran-Zan Wang
J. Imaging 2025, 11(9), 311; https://doi.org/10.3390/jimaging11090311 - 11 Sep 2025
Viewed by 168
Abstract
This study introduces a novel enhancement to the UNet architecture, termed Cascaded Spatial and Depth Attention U-Net (CSDA-UNet), tailored specifically for precise hippocampus segmentation in T1-weighted brain MRI scans. The proposed architecture integrates two key attention mechanisms: a Spatial Attention (SA) module, which refines spatial feature representations by producing attention maps from the deepest convolutional layer and modulating the matching object features; and an Inter-Slice Attention (ISA) module, which enhances volumetric uniformity by integrating related information from adjacent slices, thereby reinforcing the model's capacity to capture inter-slice dependencies. The CSDA-UNet is assessed using hippocampal segmentation data derived from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and Decathlon, two benchmarks widely employed in neuroimaging research. The proposed model outperforms state-of-the-art methods across multiple quantitative metrics, achieving a Dice coefficient of 0.9512 and an IoU score of 0.9345 on ADNI, and Dice scores of 0.9907/0.8963 (train/validation) and IoU scores of 0.9816/0.8132 (train/validation) on the Decathlon dataset. These improvements underscore the efficacy of the proposed dual-attention framework in accurately delineating small, asymmetrical structures such as the hippocampus, while maintaining computational efficiency suitable for clinical deployment. Full article
(This article belongs to the Section Medical Imaging)

12 pages, 1471 KB  
Article
Evaluation of AI Performance in Spinal Radiographic Measurements Compared to Radiologists: A Study of Accuracy and Efficiency
by Francesco Pucciarelli, Guido Gentiloni Silveri, Marta Zerunian, Domenico De Santis, Michela Polici, Antonella Del Gaudio, Benedetta Masci, Tiziano Polidori, Giuseppe Tremamunno, Raffaello Persechino, Giuseppe Argento, Marco Francone, Andrea Laghi and Damiano Caruso
J. Imaging 2025, 11(9), 310; https://doi.org/10.3390/jimaging11090310 - 10 Sep 2025
Viewed by 119
Abstract
This study aimed to evaluate the reliability of an AI-based software tool in measuring spinal parameters—Cobb angle, thoracic kyphosis, lumbar lordosis, and pelvic obliquity—compared to manual measurements by radiologists and to assess potential time savings. In this retrospective monocentric study, 56 patients who underwent full-spine weight-bearing X-rays were analyzed. Measurements were independently performed by an experienced radiologist, a radiology resident, and the AI software. A consensus between two senior experts established the ground truth. Lin’s Concordance Correlation Coefficient (CCC), mean absolute error (MAE), ICC, and paired t-tests were used for statistical analysis. The AI software showed excellent agreement with human readers (CCC > 0.9) and demonstrated lower MAE than the resident in Cobb angle and lumbar lordosis measurements but slightly underperformed in thoracic kyphosis and pelvic obliquity. Importantly, the AI significantly reduced analysis time compared to both the experienced radiologist and the resident (p < 0.001). These findings suggest that the AI tool offers a reliable and time-efficient alternative to manual spinal measurements and may enhance accuracy for less experienced radiologists. Full article
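Lin's CCC, the study's main agreement statistic, reduces to a few lines. A sketch follows, assuming paired arrays of AI and expert-consensus measurements.

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's Concordance Correlation Coefficient between two paired series,
    e.g. AI measurements vs. the expert-consensus ground truth."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()              # population variances
    cov = ((x - mx) * (y - my)).mean()     # population covariance
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```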

31 pages, 13782 KB  
Article
A Hybrid Framework for Red Blood Cell Labeling Using Elliptical Fitting, Autoencoding, and Data Augmentation
by Bundasak Angmanee, Surasak Wanram and Amorn Thedsakhulwong
J. Imaging 2025, 11(9), 309; https://doi.org/10.3390/jimaging11090309 - 9 Sep 2025
Viewed by 446
Abstract
This study aimed to develop a local dataset of abnormal RBC morphology from confirmed cases of anemia and thalassemia in Thailand, providing a foundation for medical image analysis and future AI-assisted diagnostics. Blood smear samples from six hematological disorders were collected between April and May 2025, with twelve regions of interest segmented into approximately 34,000 single-cell images. To characterize cell variability, a convolutional autoencoder was applied to extract latent features, while ellipse fitting was used to quantify cell geometry. Expert hematologists validated representative clusters to ensure clinical accuracy, and data augmentation was employed to address class imbalance and expand rare morphological types. From the dataset, 14,089 high-quality single-cell images were used to classify RBC morphology into 36 clinically meaningful categories. Unlike existing datasets that rely on limited or curated samples, this dataset reflects population-specific characteristics and morphological diversity relevant to Southeast Asia. The results demonstrate the feasibility of establishing scalable and interpretable datasets that integrate computational methods with expert knowledge. The proposed dataset serves as a robust resource for advancing hematology research and contributes to bridging traditional diagnostics with AI-driven clinical support systems. Full article
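The ellipse-fitting step used to quantify cell geometry can be sketched with OpenCV as below; the returned feature set is an illustrative choice, not the paper's exact descriptors.

```python
import cv2

def cell_ellipse_features(binary_mask):
    """Fit an ellipse to a single segmented cell (8-bit binary mask) and
    return simple geometric features derived from it."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    cnt = max(contours, key=cv2.contourArea)
    if len(cnt) < 5:                       # fitEllipse needs >= 5 points
        return None
    (cx, cy), (w, h), angle = cv2.fitEllipse(cnt)
    major, minor = max(w, h), min(w, h)
    return {"center": (cx, cy), "major_axis": major, "minor_axis": minor,
            "angle": angle, "elongation": major / max(minor, 1e-6)}
```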
(This article belongs to the Section Medical Imaging)

32 pages, 29650 KB  
Article
Unsupervised Optical Mark Recognition on Answer Sheets for Massive Printed Multiple-Choice Tests
by Yahir Hernández-Mier, Marco Aurelio Nuño-Maganda, Said Polanco-Martagón, Guadalupe Acosta-Villarreal and Rubén Posada-Gómez
J. Imaging 2025, 11(9), 308; https://doi.org/10.3390/jimaging11090308 - 8 Sep 2025
Viewed by 968
Abstract
The large-scale evaluation of multiple-choice tests is a challenging task from the perspective of image processing. A typical instrument is a multiple-choice question test that employs an answer sheet with circles or squares. Once students have finished the test, the answer sheets are digitized and sent to a processing center for scoring. Operators compute each exam score manually, but this task requires considerable time. Although mature algorithms exist for detecting circles under controlled conditions, they may fail in real-life applications, even when image acquisition of the answer sheets is controlled. This paper proposes a desktop application for optical mark recognition (OMR) on scanned multiple-choice question (MCQ) test answer sheets. First, we compiled a set of answer sheet images corresponding to 6029 exams (totaling 564,040 four-option answers) applied in 2024 in Tamaulipas, Mexico. Subsequently, we developed an image-processing module that extracts answers from the answer sheets, and an interface that lets operators run the analysis by selecting the folder containing the exams and generating results in tabulated format. We evaluated the image-processing module, achieving 96.15% of exams graded without error and 99.95% of four-option answers classified correctly. We obtained these percentages by comparing the answers generated by our system with those generated by human operators, who took an average of 2 min to produce the answers for a single answer sheet, while the automated version took an average of 1.04 s. Full article
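At its core, optical mark recognition reduces to deciding which option bubble is darkest. The toy scorer below illustrates that decision, assuming the option box coordinates are known from the sheet layout; the rejection threshold is a placeholder, and a real system like the one described also needs page alignment and validation.

```python
import numpy as np

def read_answer(sheet_gray, option_boxes, min_margin=20):
    """Pick the most heavily filled option bubble on a grayscale sheet.
    option_boxes: list of (x, y, w, h) for each printed option."""
    fills = []
    for (x, y, w, h) in option_boxes:
        roi = sheet_gray[y:y + h, x:x + w]
        fills.append(255 - roi.mean())     # lower intensity = more pencil fill
    fills = np.array(fills)
    if fills.max() - np.median(fills) < min_margin:
        return None                        # no clearly marked option
    return int(fills.argmax())             # index of the selected option
```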
(This article belongs to the Special Issue Self-Supervised Learning for Image Processing and Analysis)

41 pages, 3893 KB  
Review
Research Progress on Color Image Quality Assessment
by Minjuan Gao, Chenye Song, Qiaorong Zhang, Xuande Zhang, Yankang Li and Fujiang Yuan
J. Imaging 2025, 11(9), 307; https://doi.org/10.3390/jimaging11090307 - 8 Sep 2025
Viewed by 328
Abstract
Image quality assessment (IQA) aims to measure the consistency between an objective algorithm's output and subjective perceptual measurements. This article focuses on this complex relationship in the context of color images—color image quality assessment (CIQA). This review systematically investigates CIQA applications in image compression, processing optimization, and domain-specific scenarios, analyzes benchmark datasets and assessment metrics, and categorizes CIQA algorithms into full-reference (FR), reduced-reference (RR), and no-reference (NR) methods. In this study, color images are evaluated using a newly developed CIQA framework. Focusing on the FR and NR families: FR methods leverage reference images with machine learning, visual perception models, and mathematical frameworks, while NR methods utilize distortion-only features through feature fusion and extraction techniques. Specialized CIQA algorithms have been developed for robotics, low-light, and underwater imaging. Despite progress, challenges remain in cross-domain adaptability, generalization, and contextualized assessment. Future directions may include prototype-based cross-domain adaptation, fidelity–structure balancing, spatiotemporal consistency integration, and CIQA–restoration synergy to meet emerging demands. Full article
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)

17 pages, 3666 KB  
Article
Efficient Retinal Vessel Segmentation with 78K Parameters
by Zhigao Zeng, Jiakai Liu, Xianming Huang, Kaixi Luo, Xinpan Yuan and Yanhui Zhu
J. Imaging 2025, 11(9), 306; https://doi.org/10.3390/jimaging11090306 - 8 Sep 2025
Viewed by 334
Abstract
Retinal vessel segmentation is critical for early diagnosis of diabetic retinopathy, yet existing deep models often compromise accuracy for complexity. We propose DSAE-Net, a lightweight dual-stage network that addresses this challenge by (1) introducing a Parameterized Cascaded W-shaped Architecture enabling progressive feature refinement with only 1% of the parameters of a standard U-Net; (2) designing a novel Skeleton Distance Loss (SDL) that overcomes boundary loss limitations by leveraging vessel skeletons to handle severe class imbalance; (3) developing a Cross-modal Fusion Attention (CMFA) module combining group convolutions and dynamic weighting to effectively expand receptive fields; and (4) proposing Coordinate Attention Gates (CAGs) to optimize skip connections via directional feature reweighting. Evaluated extensively on DRIVE, CHASE_DB1, HRF, and STARE datasets, DSAE-Net significantly reduces computational complexity while outperforming state-of-the-art lightweight models in segmentation accuracy. Its efficiency and robustness make DSAE-Net particularly suitable for real-time diagnostics in resource-constrained clinical settings. Full article
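The Skeleton Distance Loss is described only at a high level, but its ingredients are standard. The sketch below shows one plausible construction, assuming a map of distances to the ground-truth vessel skeleton is used to weight a per-pixel loss; it illustrates the concept rather than reproducing SDL.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def skeleton_distance_weights(gt_vessel_mask):
    """Build per-pixel weights from the distance to the ground-truth vessel
    skeleton, so thin-vessel errors are not drowned out by the background
    class when weighting a segmentation loss."""
    skel = skeletonize(gt_vessel_mask.astype(bool))
    # Distance of every pixel to the nearest skeleton (centerline) pixel.
    dist = distance_transform_edt(~skel)
    # Emphasize pixels close to vessel centerlines.
    return 1.0 / (1.0 + dist)
```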
(This article belongs to the Section Image and Video Processing)

20 pages, 3675 KB  
Article
FDMNet: A Multi-Task Network for Joint Detection and Segmentation of Three Fish Diseases
by Zhuofu Liu, Zigan Yan and Gaohan Li
J. Imaging 2025, 11(9), 305; https://doi.org/10.3390/jimaging11090305 - 6 Sep 2025
Viewed by 223
Abstract
Fish diseases are one of the primary causes of economic losses in aquaculture. Existing deep learning models have progressed in fish disease detection and lesion segmentation. However, many models still have limitations, such as detecting only a single type of fish disease or completing only a single task within fish disease detection. To address these limitations, we propose FDMNet, a multi-task learning network. Built upon the YOLOv8 framework, the network incorporates a semantic segmentation branch with a multi-scale perception mechanism. FDMNet performs detection and segmentation simultaneously. The detection and segmentation branches use the C2DF dynamic feature fusion module to address information loss during local feature fusion across scales. Additionally, we use uncertainty-based loss weighting together with PCGrad to mitigate conflicting gradients between tasks, improving the stability and overall performance of FDMNet. On a self-built image dataset containing three common fish diseases, FDMNet achieved 97.0% mAP50 for the detection task and 85.7% mIoU for the segmentation task. Relative to the multi-task YOLO-FD baseline, FDMNet’s detection mAP50 improved by 2.5% and its segmentation mIoU by 5.4%. On the dataset constructed in this study, FDMNet achieved competitive accuracy in both detection and segmentation. These results suggest potential practical utility. Full article
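Uncertainty-based loss weighting typically follows the homoscedastic-uncertainty formulation of Kendall et al.; a minimal PyTorch sketch of that formulation follows. That FDMNet uses this exact form is an assumption.

```python
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    """Learn one log-variance per task and combine task losses as
    sum_i exp(-s_i) * L_i + s_i, balancing detection and segmentation."""
    def __init__(self, n_tasks=2):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))  # learned log sigma^2

    def forward(self, losses):
        total = 0.0
        for loss, log_var in zip(losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + log_var
        return total
```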

26 pages, 6612 KB  
Article
A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis
by Leonardo Scabini, Andre Sacilotti, Kallil M. Zielinski, Lucas C. Ribas, Bernard De Baets and Odemir M. Bruno
J. Imaging 2025, 11(9), 304; https://doi.org/10.3390/jimaging11090304 - 5 Sep 2025
Viewed by 434
Abstract
Texture, a significant visual attribute in images, plays an important role in many pattern recognition tasks. While Convolutional Neural Networks (CNNs) have been among the most effective methods for texture analysis, alternative architectures such as Vision Transformers (ViTs) have recently demonstrated superior performance on a range of visual recognition problems. However, the suitability of ViTs for texture recognition remains underexplored. In this work, we investigate the capabilities and limitations of ViTs for texture recognition by analyzing 25 different ViT variants as feature extractors and comparing them to CNN-based and hand-engineered approaches. Our evaluation encompasses both accuracy and efficiency, aiming to assess the trade-offs involved in applying ViTs to texture analysis. Our results indicate that ViTs generally outperform CNN-based and hand-engineered models, particularly when using strong pre-training and in-the-wild texture datasets. Notably, BeiTv2-B/16 achieves the highest average accuracy (85.7%), followed by ViT-B/16-DINO (84.1%) and Swin-B (80.8%), outperforming the ResNet50 baseline (75.5%) and the hand-engineered baseline (73.4%). As a lightweight alternative, EfficientFormer-L3 attains a competitive average accuracy of 78.9%. In terms of efficiency, although ViT-B and BeiT(v2) have a higher number of GFLOPs and parameters, they achieve significantly faster feature extraction on GPUs compared to ResNet50. These findings highlight the potential of ViTs as a powerful tool for texture analysis while also pointing to areas for future exploration, such as efficiency improvements and domain-specific adaptations. Full article
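Feature extraction of the kind benchmarked here is easy to reproduce with timm. The snippet below pulls pooled features from a pre-trained ViT-B/16 as a texture descriptor for a downstream linear classifier; the checkpoint name is one public example and may differ from the exact variants the survey evaluated.

```python
import timm
import torch

# Backbone as a feature extractor: num_classes=0 removes the head and
# returns pooled features. Weights are downloaded on first use.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
model.eval()

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)   # stand-in for texture images
    feats = model(images)                   # (4, 768) pooled features
print(feats.shape)
```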
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)

2 pages, 368 KB  
Correction
Correction: Dou et al. Performance Calibration of the Wavefront Sensor’s EMCCD Detector for the Cool Planets Imaging Coronagraph Aboard CSST. J. Imaging 2025, 11, 203
by Jiangpei Dou, Bingli Niu, Gang Zhao, Xi Zhang, Gang Wang, Baoning Yuan, Di Wang and Xingguang Qian
J. Imaging 2025, 11(9), 303; https://doi.org/10.3390/jimaging11090303 - 5 Sep 2025
Viewed by 180
Abstract
The authors would like to make the following corrections to the published paper [...] Full article

8 pages, 609 KB  
Brief Report
AI-Generated Patient-Friendly MRI Fistula Summaries: A Pilot Randomised Study
by Easan Anand, Itai Ghersin, Gita Lingam, Theo Pelly, Daniel Singer, Chris Tomlinson, Robin E. J. Munro, Rachel Capstick, Anna Antoniou, Ailsa L. Hart, Phil Tozer, Kapil Sahnan and Phillip Lung
J. Imaging 2025, 11(9), 302; https://doi.org/10.3390/jimaging11090302 - 4 Sep 2025
Viewed by 354
Abstract
Perianal fistulising Crohn’s disease (pfCD) affects 1 in 5 Crohn’s patients and requires frequent MRI monitoring. Standard radiology reports are written for clinicians using technical language often inaccessible to patients, which can cause anxiety and hinder engagement. This study evaluates the feasibility and safety of AI-generated patient-friendly MRI fistula summaries to improve patient understanding and shared decision-making. MRI fistula reports spanning healed to complex disease were identified and used to generate AI patient-friendly summaries via ChatGPT-4. Six de-identified MRI reports and corresponding AI summaries were assessed by clinicians for hallucinations and readability (Flesch-Kincaid score). Sixteen patients with perianal fistulas were randomized to review either AI summaries or original reports and rated them on readability, comprehensibility, utility, quality, follow-up questions, and trustworthiness using Likert scales. Patients rated AI summaries significantly higher in readability (median 5 vs. 2, p = 0.011), comprehensibility (5 vs. 2, p = 0.007), utility (5 vs. 3, p = 0.014), and overall quality (4.5 vs. 4, p = 0.013), with fewer follow-up questions (3 vs. 4, p = 0.018). Clinicians found AI summaries more readable (mean Flesch-Kincaid 54.6 vs. 32.2, p = 0.005) and free of hallucinations. No clinically significant inaccuracies were identified. AI-generated patient-friendly MRI summaries have potential to enhance patient communication and clinical workflow in pfCD. Larger studies are needed to validate clinical utility, hallucination rates, and acceptability. Full article
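The readability comparison rests on a standard metric. The snippet below computes a Flesch reading-ease score with the textstat package (the 0–100 scale on which the reported 54.6 vs. 32.2 values lie); the example strings are placeholders, and whether the study used this exact variant is an assumption.

```python
import textstat

# Compare readability of an original report against its AI summary
# (higher score = easier to read).
report = "T2-weighted sequences demonstrate a trans-sphincteric fistulous tract."
summary = "The scan shows the tunnel near your back passage has healed well."

print(textstat.flesch_reading_ease(report))
print(textstat.flesch_reading_ease(summary))
```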
(This article belongs to the Section Medical Imaging)

17 pages, 17890 KB  
Article
AnomNet: A Dual-Stage Centroid Optimization Framework for Unsupervised Anomaly Detection
by Yuan Gao, Yu Wang, Xiaoguang Tu and Jiaqing Shen
J. Imaging 2025, 11(9), 301; https://doi.org/10.3390/jimaging11090301 - 3 Sep 2025
Viewed by 368
Abstract
Anomaly detection plays a vital role in ensuring product quality and operational safety across various industrial applications, from manufacturing to infrastructure monitoring. However, current methods often struggle with challenges such as limited generalization to complex multimodal anomalies, poor adaptation to domain-specific patterns, and reduced feature discriminability due to domain gaps between pre-trained models and industrial data. To address these issues, we propose AnomNet, a novel deep anomaly detection framework that integrates a lightweight feature adapter module to bridge domain discrepancies and enhance multi-scale feature discriminability from pre-trained backbones. AnomNet is trained using a dual-stage centroid learning strategy: the first stage employs separation and entropy regularization losses to stabilize and optimize the centroid representation of normal samples; the second stage introduces a centroid-based contrastive learning mechanism to refine decision boundaries by adaptively managing inter- and intra-class feature relationships. The experimental results on the MVTec AD dataset demonstrate the superior performance of AnomNet, achieving a 99.5% image-level AUROC and 98.3% pixel-level AUROC, underscoring its effectiveness and robustness for anomaly detection and localization in industrial environments. Full article
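At inference time, centroid-based methods typically score a sample by its distance to the learned centroid of normal features. A toy version follows; AnomNet's adapter and dual-stage training are not reproduced here.

```python
import torch

def centroid_anomaly_score(features, centroid):
    """Distance-to-centroid anomaly score: after training pulls normal
    features toward a learned centroid, larger distance = more anomalous.
    features: (N, D) embeddings; centroid: (D,)."""
    return torch.norm(features - centroid, dim=1)

# Usage sketch: scores = centroid_anomaly_score(test_feats, normal_centroid)
```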

26 pages, 4958 KB  
Article
Compton Camera X-Ray Fluorescence Imaging Design and Image Reconstruction Algorithm Optimization
by Shunmei Lu, Kexin Peng, Peng Feng, Cheng Lin, Qingqing Geng and Junrui Zhang
J. Imaging 2025, 11(9), 300; https://doi.org/10.3390/jimaging11090300 - 3 Sep 2025
Viewed by 400
Abstract
Traditional X-ray fluorescence computed tomography (XFCT) suffers from low photon collection efficiency, slow data acquisition, severe noise interference, and poor imaging quality due to the limitations of mechanical collimation. This study designs an X-ray fluorescence imaging system based on bilateral Compton cameras and develops an optimized reconstruction algorithm to achieve high-quality 2D/3D imaging of low-concentration samples (0.2% gold nanoparticles). A system equipped with bilateral Compton cameras was designed, replacing mechanical collimation with "electronic collimation". The traditional LM-MLEM algorithm was optimized through improvements in data preprocessing, system matrix construction, iterative processing, and post-processing, integrating methods such as Total Variation (TV) regularization (including anisotropic TV), filtering, wavelet-domain constraints, and isosurface rendering. Successful 2D and 3D reconstruction of 0.2% gold nanoparticles was achieved. Compared with traditional algorithms, improvements were observed in convergence, stability, speed, quality, and accuracy. The system exhibits high detection efficiency, angular resolution, and energy resolution. The Compton camera-based XFCT overcomes the limitations of traditional methods; the optimized algorithm enables low-noise imaging at ultra-low concentrations and has potential applications in early cancer diagnosis and material analysis. Full article
(This article belongs to the Section Image and Video Processing)

18 pages, 4265 KB  
Article
Hybrid-Recursive-Refinement Network for Camouflaged Object Detection
by Hailong Chen, Xinyi Wang and Haipeng Jin
J. Imaging 2025, 11(9), 299; https://doi.org/10.3390/jimaging11090299 - 2 Sep 2025
Viewed by 370
Abstract
Camouflaged object detection (COD) seeks to precisely detect and delineate objects that are concealed within complex and ambiguous backgrounds. However, due to subtle texture variations and semantic ambiguity, it remains a highly challenging task. Existing methods that rely solely on either convolutional neural network (CNN) or Transformer architectures often suffer from incomplete feature representations and the loss of boundary details. To address the aforementioned challenges, we propose an innovative hybrid architecture that synergistically leverages the strengths of CNNs and Transformers. In particular, we devise a Hybrid Feature Fusion Module (HFFM) that harmonizes hierarchical features extracted from CNN and Transformer pathways, ultimately boosting the representational quality of the combined features. Furthermore, we design a Combined Recursive Decoder (CRD) that adaptively aggregates hierarchical features through recursive pooling/upsampling operators and stage-wise mask-guided refinement, enabling precise structural detail capture across multiple scales. In addition, we propose a Foreground–Background Selection (FBS) module, which alternates attention between foreground objects and background boundary regions, progressively refining object contours while suppressing background interference. Evaluations on four widely used public COD datasets, CHAMELEON, CAMO, COD10K, and NC4K, demonstrate that our method achieves state-of-the-art performance. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)

19 pages, 6571 KB  
Article
From Brain Lobes to Neurons: Navigating the Brain Using Advanced 3D Modeling and Visualization Tools
by Mohamed Rowaizak, Ahmad Farhat and Reem Khalil
J. Imaging 2025, 11(9), 298; https://doi.org/10.3390/jimaging11090298 - 1 Sep 2025
Viewed by 594
Abstract
Neuroscience education must convey 3D structure with clarity and accuracy. Traditional 2D renderings are limited: they lose depth information and hinder spatial understanding. High-resolution resources now exist, yet many are difficult to use in the classroom. We therefore developed an educational brain video that moves from gross anatomy to microanatomy using MRI-based models and the published literature. The pipeline used Fiji for preprocessing, MeshLab for mesh cleanup, Rhino 6 for targeted fixes, Houdini FX for materials, lighting, and renders, and Cinema 4D for final refinement of the video. Two neuroscientists validated our brain models for educational fidelity. We tested the video in a class of 96 undergraduates randomized to video plus lecture or lecture only; students completed the same pretest and posttest questions. Student feedback revealed that comprehension and motivation to learn increased significantly in the group that watched the video, suggesting its potential as a useful supplement to traditional lectures. A short, well-produced 3D video can supplement lectures and improve learning in this setting. We share software versions and key parameters to support reuse. Full article
(This article belongs to the Special Issue 3D Image Processing: Progress and Challenges)

11 pages, 800 KB  
Article
Longitudinal Ultrasound Monitoring of Peripheral Muscle Loss in Neurocritical Patients
by Talita Santos de Arruda, Rayssa Bruna Holanda Lima, Karla Luciana Magnani Seki, Vanderlei Porto Pinto, Rodrigo Koch, Ana Carolina dos Santos Demarchi and Gustavo Christofoletti
J. Imaging 2025, 11(9), 297; https://doi.org/10.3390/jimaging11090297 - 1 Sep 2025
Viewed by 435
Abstract
Ultrasound has become an important tool that offers clinical and practical benefits in the intensive care unit (ICU). Its real-time imaging provides immediate information to support prognostic evaluation and clinical decision-making. This study used ultrasound assessment to investigate the impact of hospitalization on muscle properties in neurocritical patients and analyze the relationship between peripheral muscle changes and motor sequelae. A total of 43 neurocritical patients admitted to the ICU were included. The inclusion criteria were patients with acute brain injuries with or without motor sequelae. Muscle ultrasonography assessments were performed during ICU admission and hospital discharge. Measurements included muscle thickness, cross-sectional area, and echogenicity of the biceps brachii, quadriceps femoris, and rectus femoris. Statistical analyses were used to compare muscle properties between time points (hospital admission vs. discharge) and between groups (patients with vs. without motor sequelae). Significance was set at 5%. Hospitalization had a significant effect on muscle thickness, cross-sectional area, and echogenicity in patients with and without motor sequelae (p < 0.05, effect sizes between 0.104 and 0.475). Patients with motor sequelae exhibited greater alterations in muscle echogenicity than those without (p < 0.05, effect sizes between 0.182 and 0.211). Changes in muscle thickness and cross-sectional area were similar between the groups (p > 0.05). Neurocritical patients experience significant muscle deterioration during hospitalization. Future studies should explore why echogenicity is more markedly affected than muscle thickness and cross-sectional area in patients with motor sequelae compared to those without. Full article
(This article belongs to the Section Medical Imaging)

18 pages, 4889 KB  
Article
Roughness Estimation and Image Rendering for Glossy Object Surface
by Shoji Tominaga, Motonori Doi and Hideaki Sakai
J. Imaging 2025, 11(9), 296; https://doi.org/10.3390/jimaging11090296 - 28 Aug 2025
Viewed by 453
Abstract
We study the relationship between the physical surface roughness of the glossy surfaces of dielectric objects and the roughness parameter in image rendering. The former refers to a measure of the microscopic surface structure of a real object’s surface. The latter is a model parameter used to produce the realistic appearance of objects. The target dielectric objects to analyze the surface roughness are handcrafted lacquer plates with controlled surface glossiness, as well as several plastics and lacquer products from everyday life. We first define the physical surface roughness as the standard deviation of the surface normal, and provide the computational procedure. We use a laser scanning system to obtain the precise surface height information at tiny flat areas of a surface. Next, a method is developed for estimating the surface roughness parameter based on images taken of the surface with a camera. With a simple setup for observing a glossy flat surface, we estimate the roughness parameter by fitting the Beckmann function to the image intensity distribution in the observed HDR image using the least squares method. A linear relationship is then found between the measurement-based surface roughness and image-based surface roughness. We present applications to glossy objects with curved surfaces. Full article
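The roughness estimation step fits a Beckmann lobe to the observed intensity falloff. A sketch with scipy's least-squares fitter follows, using synthetic data in place of the observed HDR intensity profile; the exact functional form and measurement geometry in the paper may differ.

```python
import numpy as np
from scipy.optimize import curve_fit

def beckmann(theta, m, k):
    """Beckmann-style specular lobe: theta is the angle from the surface
    normal (radians), m the roughness parameter, k an intensity scale."""
    return k * np.exp(-(np.tan(theta) / m) ** 2) / (m ** 2 * np.cos(theta) ** 4)

# Synthetic stand-in for the intensity distribution around the specular peak.
theta = np.linspace(0.01, 0.4, 50)
intensity = beckmann(theta, 0.15, 1.0) + np.random.normal(0, 0.05, theta.size)

# Least-squares fit recovers the image-based roughness parameter.
(m_hat, k_hat), _ = curve_fit(beckmann, theta, intensity, p0=(0.1, 1.0))
print(f"estimated roughness m = {m_hat:.3f}")
```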
(This article belongs to the Section Image and Video Processing)

20 pages, 2131 KB  
Article
Test-Time Augmentation for Cross-Domain Leukocyte Classification via OOD Filtering and Self-Ensembling
by Lorenzo Putzu, Andrea Loddo and Cecilia Di Ruberto
J. Imaging 2025, 11(9), 295; https://doi.org/10.3390/jimaging11090295 - 28 Aug 2025
Viewed by 518
Abstract
Domain shift poses a major challenge in many Machine Learning applications due to variations in data acquisition protocols, particularly in the medical field. Test-time augmentation (TTA) can solve the domain shift issue and improve robustness by aggregating predictions from multiple augmented versions of the same input. However, TTA may inadvertently generate unrealistic or Out-of-Distribution (OOD) samples that negatively affect prediction quality. In this work, we introduce a filtering procedure that removes from the TTA images all the OOD samples whose representations lie far from the training data distribution. Moreover, all the retained TTA images are weighted inversely to their distance from the training data. The final prediction is provided by a Self-Ensemble with Confidence, which is a lightweight ensemble strategy that fuses predictions from the original and retained TTA samples using a weighted soft voting scheme, without requiring multiple models or retraining. This method is model-agnostic and can be integrated with any deep learning architecture, making it broadly applicable across various domains. Experiments on cross-domain leukocyte classification benchmarks demonstrate that our method consistently improves over standard TTA and Baseline inference, particularly when strong domain shifts are present. Ablation studies and statistical tests confirm the effectiveness and significance of each component. Full article
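The filtering-plus-weighting rule can be written compactly. The sketch below assumes a softmax output per TTA view and a precomputed feature distance to the training distribution, drops views beyond a threshold, and fuses the rest by distance-weighted soft voting; treating index 0 as the original image is an assumption.

```python
import numpy as np

def self_ensemble_predict(probs, distances, max_dist):
    """OOD-filtered, distance-weighted soft voting over TTA views.
    probs: (T, C) softmax outputs; distances: (T,) feature distances
    to the training data; row 0 is assumed to be the original image."""
    probs = np.asarray(probs)
    distances = np.asarray(distances)
    keep = distances <= max_dist           # discard OOD augmentations
    if not keep.any():                     # fall back to the original image
        keep = np.zeros_like(distances, bool)
        keep[0] = True
    w = 1.0 / (1e-6 + distances[keep])     # closer to training data = heavier
    w /= w.sum()
    return (w[:, None] * probs[keep]).sum(axis=0).argmax()
```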
(This article belongs to the Section AI in Imaging)

23 pages, 1804 KB  
Article
Automatic Algorithm-Aided Segmentation of Retinal Nerve Fibers Using Fundus Photographs
by Diego Luján Villarreal
J. Imaging 2025, 11(9), 294; https://doi.org/10.3390/jimaging11090294 - 28 Aug 2025
Viewed by 586
Abstract
This work presents an image processing algorithm for segmenting personalized maps of retinal nerve fiber layer (RNFL) bundle trajectories in the human retina. To segment RNFL bundles, preprocessing steps were used for noise reduction and illumination correction, and blood vessels were removed. The image was fed to a maximum–minimum modulation algorithm to isolate retinal nerve fiber (RNF) segments. A modified Garway-Heath map categorizes RNF orientation, assuming designated sets of orientation angles for aligning RNF direction. Bezier curves fit RNFs from the center of the optic disk (OD) to their corresponding ends. Fundus images from five different databases (n = 300) were tested, with 277 healthy normal subjects and 33 classified as diabetic without any sign of diabetic retinopathy. The algorithm successfully traced fiber trajectories per fundus across all regions identified by the Garway-Heath map. The resulting trace images were compared to the Jansonius map, reaching an average efficiency of 97.44% and working well even with low-resolution images. The average mean difference in orientation angles across the included images was 11.01° ± 1.25°, and the average RMSE was 13.82° ± 1.55°. A 24-2 visual field (VF) grid pattern was overlaid onto the fundus to relate the VF test points to the intersection of RNFL bundles and their entry angles into the OD. The mean standard deviation (95% limit) was 13.5° (median 14.01°), ranging from less than 1° to 28.4° for 50 of the 52 VF locations. The influence of optic disk parameters was explored using multiple linear regression. Average angle trajectories in the papillomacular region were significantly influenced (p < 0.00001) by the latitudinal optic disk position and the disk–fovea angle. Given that the required biometric ground truth (only the fovea and OD centers) is publicly accessible, the algorithm can be customized to individual eyes and distinguish fibers accurately by considering unique anatomical features. Full article
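Bezier fitting is the geometric core of the tracing step. For illustration, a quadratic Bezier evaluator is sketched below; the paper does not state the curve order used, so the quadratic form is an assumption.

```python
import numpy as np

def quadratic_bezier(p0, p1, p2, n=100):
    """Evaluate a quadratic Bezier curve, the primitive fitted to each
    nerve-fiber trajectory from the optic-disk center to the fiber end.
    p0, p1, p2: (x, y) control points; returns an (n, 2) polyline."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2 = map(np.asarray, (p0, p1, p2))
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2
```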
(This article belongs to the Special Issue Progress and Challenges in Biomedical Image Analysis—2nd Edition)

58 pages, 1551 KB  
Systematic Review
Colorectal Polyp Segmentation Based on Deep Learning Methods: A Systematic Review
by Xin Liu, Nor Ashidi Mat Isa, Chao Chen and Fajin Lv
J. Imaging 2025, 11(9), 293; https://doi.org/10.3390/jimaging11090293 - 27 Aug 2025
Viewed by 1043
Abstract
Colorectal cancer is one of the three most common cancers worldwide. Early detection and assessment of polyps can significantly reduce the risk of developing colorectal cancer. Physicians can obtain information about polyp regions through polyp segmentation techniques, enabling the provision of targeted treatment plans. This study systematically reviews polyp segmentation methods. We investigated 146 papers published between 2018 and 2024 and conducted an in-depth analysis of the methodologies employed. Based on the selected literature, we systematically organized this review. First, we analyzed the development and evolution of the polyp segmentation field. Second, we provided a comprehensive overview of deep learning-based polyp image segmentation methods and the Mamba method, as well as video polyp segmentation methods categorized by network architecture, addressing the challenges faced in polyp segmentation. Subsequently, we evaluated the performance of 44 models, including segmentation performance metrics and real-time analysis capabilities. Additionally, we introduced commonly used datasets for polyp images and videos, along with metrics for assessing segmentation models. Finally, we discussed existing issues and potential future trends in this area. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)

18 pages, 16540 KB  
Article
E-CMCA and LSTM-Enhanced Framework for Cross-Modal MRI-TRUS Registration in Prostate Cancer
by Ciliang Shao, Ruijin Xue and Lixu Gu
J. Imaging 2025, 11(9), 292; https://doi.org/10.3390/jimaging11090292 - 27 Aug 2025
Viewed by 385
Abstract
Accurate registration of MRI and TRUS images is crucial for effective prostate cancer diagnosis and biopsy guidance, yet modality differences and non-rigid deformations pose significant challenges, especially in dynamic imaging. This study presents a novel cross-modal MRI-TRUS registration framework, leveraging a dual-encoder architecture with an Enhanced Cross-Modal Channel Attention (E-CMCA) module and an LSTM-based Spatial Deformation Modeling Module. The E-CMCA module efficiently extracts and integrates multi-scale cross-modal features, while the LSTM-based Spatial Deformation Modeling Module models temporal dynamics by processing depth-sliced 3D deformation fields as sequential data. A VecInt operation ensures smooth, diffeomorphic transformations, and a FuseConv layer enhances feature integration for precise alignment. Experiments on the μ-RegPro dataset from the MICCAI 2023 Challenge demonstrate that our model achieves a DSC of 0.865, an RDSC of 0.898, a TRE of 2.278 mm, and an RTRE of 1.293, surpassing state-of-the-art methods and performing robustly in both static 3D and dynamic 4D registration tasks. Full article
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)

25 pages, 4513 KB  
Article
Dual-Filter X-Ray Image Enhancement Using Cream and Bosso Algorithms: Contrast and Entropy Optimization Across Anatomical Regions
by Antonio Rienzo, Miguel Bustamante, Ricardo Staub and Gastón Lefranc
J. Imaging 2025, 11(9), 291; https://doi.org/10.3390/jimaging11090291 - 26 Aug 2025
Viewed by 517
Abstract
This study introduces a dual-filter X-ray image enhancement technique designed to elevate the quality of radiographic images of the knee, breast, and wrist, employing the Cream and Bosso algorithms. Our quantitative analysis reveals significant improvements in bone visibility, edge definition, and contrast (p < 0.001). The processing parameters are derived from the relationship between entropy metrics and the filtering parameter d. The results demonstrate contrast enhancements for both knee and wrist radiographs, while maintaining acceptable noise levels. Comparisons are made with CLAHE, unsharp masking, and deep-learning-based models. The method is a reliable and computationally efficient approach to enhancing clinical diagnosis in resource-limited settings, improving robustness and interpretability. Full article
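Shannon entropy of the image histogram is the usual choice for an entropy metric of this kind. A sketch follows; whether the study uses exactly this definition is an assumption.

```python
import numpy as np

def image_entropy(gray):
    """Shannon entropy (bits) of an 8-bit grayscale image histogram, the kind
    of metric related here to the filtering parameter d."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256), density=True)
    p = hist[hist > 0]
    return -(p * np.log2(p)).sum()
```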
(This article belongs to the Section Medical Imaging)

17 pages, 45337 KB  
Article
Contrastive Learning-Driven Image Dehazing with Multi-Scale Feature Fusion and Hybrid Attention Mechanism
by Huazhong Zhang, Jiaozhuo Wang, Xiaoguang Tu, Zhiyi Niu and Yu Wang
J. Imaging 2025, 11(9), 290; https://doi.org/10.3390/jimaging11090290 - 26 Aug 2025
Viewed by 527
Abstract
Image dehazing is critical for visual enhancement and a wide range of computer vision applications. Despite significant advancements, challenges remain in preserving fine details and adapting to diverse, non-uniformly degraded scenes. To address these issues, we propose a novel image dehazing method that introduces a contrastive learning framework, enhanced by the InfoNCE loss, to improve model robustness. In this framework, hazy images are treated as negative samples and their clear counterparts as positive samples. By optimizing the InfoNCE loss, the model is trained to maximize the similarity between positive pairs and minimize that between negative pairs, thereby improving its ability to distinguish haze artifacts from intrinsic scene features and better preserving the structural integrity of images. In addition to contrastive learning, our method integrates a multi-scale dynamic feature fusion with a hybrid attention mechanism. Specifically, we introduce dynamically adjustable frequency band filters and refine the hybrid attention module to more effectively capture fine-grained, cross-scale image details. Extensive experiments on the RESIDE-6K and RS-Haze datasets demonstrate that our approach outperforms most existing methods, offering a promising solution for practical image dehazing applications. Full article
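The InfoNCE objective over hazy negatives and clear positives can be written in a few lines of PyTorch. The sketch below assumes feature vectors have already been extracted; the temperature and feature space are illustrative.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau=0.07):
    """Minimal InfoNCE loss: pull the anchor (e.g., dehazed output features)
    toward the positive (clear image) and away from negatives (hazy images).
    anchor, positive: (D,); negatives: (N, D)."""
    anchor = F.normalize(anchor, dim=-1)
    pos = F.normalize(positive, dim=-1)
    negs = F.normalize(negatives, dim=-1)
    logits = torch.cat([(anchor * pos).sum(-1, keepdim=True),
                        negs @ anchor]) / tau
    # The positive sits at index 0; InfoNCE is cross-entropy against it.
    return F.cross_entropy(logits[None, :], torch.zeros(1, dtype=torch.long))
```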
(This article belongs to the Special Issue Advances in Machine Learning for Computer Vision Applications)
Show Figures

Figure 1
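
For readers unfamiliar with the InfoNCE objective mentioned above, the following is a minimal PyTorch sketch under the paper's framing: features of the restored output act as the anchor, the clear image as the positive, and hazy images as negatives. The shapes, temperature, and feature source (e.g., a frozen encoder) are assumptions rather than the authors' exact setup.

```python
# Minimal InfoNCE sketch (assumed shapes): anchor/positive are (B, D)
# feature vectors; negatives are (B, K, D). Not the authors' code.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss: pull anchor toward positive, push away from negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    # Cosine similarities, scaled by the temperature.
    pos_logit = (anchor * positive).sum(dim=-1, keepdim=True) / temperature  # (B, 1)
    neg_logits = torch.einsum("bd,bkd->bk", anchor, negatives) / temperature  # (B, K)

    # Treat the positive as class 0 in a (1 + K)-way classification.
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

# Toy usage: in practice the vectors would come from a feature extractor
# applied to the dehazed output, the clear target, and the hazy inputs.
B, K, D = 4, 8, 256
loss = info_nce(torch.randn(B, D), torch.randn(B, D), torch.randn(B, K, D))
```

Minimizing this loss maximizes the anchor-positive similarity relative to all anchor-negative similarities, which is how the framework separates haze artifacts from intrinsic scene features.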

37 pages, 756 KB  
Review
From Fragment to One Piece: A Review on AI-Driven Graphic Design
by Xingxing Zou, Wen Zhang and Nanxuan Zhao
J. Imaging 2025, 11(9), 289; https://doi.org/10.3390/jimaging11090289 - 25 Aug 2025
Viewed by 1286
Abstract
This survey offers a comprehensive overview of advancements in Artificial Intelligence in Graphic Design (AIGD), with a focus on the integration of AI techniques to enhance design interpretation and creative processes. The field is categorized into two primary directions: perception tasks, which involve [...] Read more.
This survey offers a comprehensive overview of advancements in Artificial Intelligence in Graphic Design (AIGD), with a focus on the integration of AI techniques to enhance design interpretation and creative processes. The field is categorized into two primary directions: perception tasks, which involve understanding and analyzing design elements, and generation tasks, which focus on creating new design elements and layouts. The methodology emphasizes the exploration of various subtasks including the perception and generation of visual elements, aesthetic and semantic understanding, and layout analysis and generation. The survey also highlights the role of large language models and multimodal approaches in bridging the gap between localized visual features and global design intent. Despite significant progress, challenges persist in understanding human intent, ensuring interpretability, and maintaining control over multilayered compositions. This survey aims to serve as a guide for researchers, detailing the current state of AIGD and outlining potential future directions. Full article
(This article belongs to the Section AI in Imaging)
Show Figures

Figure 1

40 pages, 48075 KB  
Article
Directional Lighting-Based Deep Learning Models for Crack and Spalling Classification
by Sanjeetha Pennada, Jack McAlorum, Marcus Perry, Hamish Dow and Gordon Dobie
J. Imaging 2025, 11(9), 288; https://doi.org/10.3390/jimaging11090288 - 25 Aug 2025
Viewed by 553
Abstract
External lighting is essential for autonomous inspections of concrete structures in low-light environments. However, previous studies have primarily relied on uniformly diffused lighting to illuminate images and faced challenges in detecting complex crack patterns. This paper proposes two novel algorithms that use directional [...] Read more.
External lighting is essential for autonomous inspections of concrete structures in low-light environments. However, previous studies have primarily relied on uniformly diffused lighting to illuminate images and faced challenges in detecting complex crack patterns. This paper proposes two novel algorithms that use directional lighting to classify concrete defects. The first method, the fused neural network (FusedNet), applies maximum-intensity pixel-level image fusion, selecting the highest value across all directional images at each pixel to generate a fused image. The second method, the multi-channel neural network, constructs a five-channel image in which each channel is the grayscale capture from the Right (R), Down (D), Left (L), Up (U), or Diffused (A) lighting direction. The multi-channel neural network achieved the best performance, with accuracy, precision, recall, and F1 score of 96.6%, 96.3%, 97%, and 96.6%, respectively. It also outperformed FusedNet and other models in the literature, with no significant change in evaluation time. The results from this work have the potential to improve concrete crack classification in environments where external illumination is required. Future research will extend the multi-channel and image-fusion concepts to white-box techniques. Full article
(This article belongs to the Section AI in Imaging)
Show Figures

Figure 1
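
The two input representations described in the abstract above reduce to simple array operations. Below is a hedged NumPy sketch in which the directional image names, sizes, and random stand-in data are placeholders, not the paper's pipeline.

```python
# Sketch of the two input constructions: per-pixel maximum fusion and
# five-channel stacking of directional grayscale captures (assumed shapes).
import numpy as np

def max_intensity_fusion(directional: list) -> np.ndarray:
    """Fused image: per-pixel maximum across all directional captures."""
    return np.max(np.stack(directional, axis=0), axis=0)

def multi_channel_stack(right, down, left, up, diffused) -> np.ndarray:
    """Five-channel tensor (H, W, 5) in R, D, L, U, A order."""
    return np.stack([right, down, left, up, diffused], axis=-1)

# Example with random stand-ins for the five grayscale captures.
imgs = [np.random.randint(0, 256, (224, 224), dtype=np.uint8) for _ in range(5)]
fused = max_intensity_fusion(imgs)     # input to the FusedNet-style model
five_ch = multi_channel_stack(*imgs)   # input to the multi-channel model
```

The design trade-off is visible here: fusion collapses the directional cues into one channel before the network sees them, while the five-channel stack preserves each lighting direction and lets the network learn how to combine them.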

16 pages, 3972 KB  
Article
Solar Panel Surface Defect and Dust Detection: Deep Learning Approach
by Atta Rahman
J. Imaging 2025, 11(9), 287; https://doi.org/10.3390/jimaging11090287 - 25 Aug 2025
Viewed by 838
Abstract
In recent years, solar energy has emerged as a pillar of sustainable development. However, maintaining panel efficiency under extreme environmental conditions remains a persistent hurdle. This study introduces an automated defect detection pipeline that leverages deep learning and computer vision to identify five [...] Read more.
In recent years, solar energy has emerged as a pillar of sustainable development. However, maintaining panel efficiency under extreme environmental conditions remains a persistent hurdle. This study introduces an automated defect detection pipeline that leverages deep learning and computer vision to classify photovoltaic surfaces into five standard classes: Non-Defective, Dust, Defective, Physical Damage, and Snow. To build a robust foundation, a heterogeneous dataset of 8973 images was sourced from public repositories and standardized into a uniform labeling scheme. This dataset was then expanded through an aggressive augmentation strategy, including flips, rotations, zooms, and noise injections. A YOLOv11-based model was trained and fine-tuned using both fixed and adaptive learning rate schedules, achieving an mAP@0.5 of 85% and accuracy, recall, and F1-score above 95% when evaluated across diverse lighting and dust scenarios. The optimized model is integrated into an interactive dashboard that processes live camera streams, issues real-time alerts upon defect detection, and supports proactive maintenance scheduling. Comparative evaluations highlight the superiority of this approach over manual inspections and earlier YOLO versions in both precision and inference speed, making it well suited for deployment on edge devices. Automating visual inspection not only reduces labor costs and operational downtime but also enhances the longevity of solar installations. By offering a scalable solution for continuous monitoring, this work contributes to improving the reliability and cost-effectiveness of large-scale solar energy systems. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
Show Figures

Figure 1
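
As a rough illustration of the training setup described above, the following sketch uses the Ultralytics API with a YOLO11 nano checkpoint. The dataset YAML, hyperparameter values, and augmentation settings are assumptions, not the paper's reported configuration.

```python
# Hedged sketch of YOLO11 training with augmentation and an adaptive
# learning-rate schedule; dataset and hyperparameters are hypothetical.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # pretrained YOLO11 nano checkpoint

model.train(
    data="solar_defects.yaml",  # hypothetical YAML listing the five classes
    epochs=100,
    imgsz=640,
    lr0=0.01,       # initial learning rate
    cos_lr=True,    # cosine (adaptive) learning-rate schedule
    fliplr=0.5,     # horizontal flips
    flipud=0.2,     # vertical flips
    degrees=15,     # rotations
    scale=0.3,      # zooms
)

metrics = model.val()         # validation, including mAP@0.5
results = model("panel.jpg")  # hypothetical single-frame inference
```

Here the cos_lr flag stands in for the "adaptive learning rate schedule" the abstract mentions; a fixed schedule corresponds to leaving it disabled.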
