Journal of Imaging doi: 10.3390/jimaging10030071
Authors: Sneha Paul Zachary Patterson Nizar Bouguila
The application of large field-of-view (FoV) cameras equipped with fish-eye lenses brings notable advantages to various real-world computer vision applications, including autonomous driving. While deep learning has proven successful in conventional computer vision applications using regular perspective images, its potential in fish-eye camera contexts remains largely unexplored due to limited datasets for fully supervised learning. Semi-supervised learning offers a potential solution to this challenge. In this study, we explore and benchmark two popular semi-supervised methods from the perspective image domain for fish-eye image segmentation. We further introduce FishSegSSL, a novel fish-eye image segmentation framework featuring three semi-supervised components: pseudo-label filtering, dynamic confidence thresholding, and robust strong augmentation. Evaluation on the WoodScape dataset, collected from vehicle-mounted fish-eye cameras, demonstrates that our proposed method enhances the model’s performance by up to 10.49% over fully supervised methods using the same amount of labeled data. Our method also improves on existing image segmentation methods by 2.34%. To the best of our knowledge, this is the first work on semi-supervised semantic segmentation of fish-eye images. Additionally, we conduct a comprehensive ablation study and sensitivity analysis to demonstrate the efficacy of each proposed component.
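The pseudo-label filtering and dynamic confidence thresholding components described above can be sketched as follows. This is a hedged illustration only: the function and parameter names (`base_tau`, `ramp`) are invented for this sketch and do not claim to match the paper's exact schedule.

```python
import numpy as np

def filter_pseudo_labels(probs, base_tau=0.9, ramp=1.0):
    """Sketch of confidence-based pseudo-label filtering with a dynamic
    threshold: pixels are kept only when the teacher's maximum class
    probability exceeds base_tau * ramp, where 'ramp' can be raised as
    training progresses. probs: (H, W, C) per-pixel softmax outputs."""
    conf = probs.max(axis=-1)        # per-pixel confidence
    labels = probs.argmax(axis=-1)   # hard pseudo-labels
    mask = conf >= base_tau * ramp   # pixels whose pseudo-labels are trusted
    return labels, mask
```

A student model would then be trained only on the pixels selected by `mask`, typically under strong augmentation of the input.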
Journal of Imaging doi: 10.3390/jimaging10030070
Authors: Mingyang Zhang Kristof Van Beeck Toon Goedemé
While Siamese object tracking has witnessed significant advancements, its hard real-time behaviour on embedded devices remains inadequately addressed. In many application cases, an embedded implementation should not only have a minimal execution latency, but this latency should ideally also have zero variance, i.e., be predictable. This study aims to address this issue by meticulously analysing real-time predictability across different components of a deep-learning-based video object tracking system. Our detailed experiments not only indicate the superiority of Field-Programmable Gate Array (FPGA) implementations in terms of hard real-time behaviour but also unveil important time-predictability bottlenecks. We introduce dedicated hardware accelerators for key processes, focusing on depth-wise cross-correlation and padding operations, utilizing high-level synthesis (HLS). Implemented on a KV260 board, our enhanced tracker achieves not only a 6.6-fold speed-up in mean execution time but also significant improvements in hard real-time predictability, yielding 11 times less latency variation than our baseline. A subsequent analysis of power consumption reveals our approach’s contribution to enhanced power efficiency. These advancements underscore the crucial role of hardware acceleration in realizing time-predictable object tracking on embedded systems, setting new standards for future hardware–software co-design endeavours in this domain.
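The depth-wise cross-correlation targeted for hardware acceleration above can be written as a plain reference computation. This is a numpy sketch of the operation itself, not the authors' HLS implementation:

```python
import numpy as np

def depthwise_xcorr(search, kernel):
    """Depth-wise cross-correlation as used in Siamese trackers: each
    channel of the template 'kernel' (C, kh, kw) is correlated only with
    the matching channel of the search feature map (C, H, W), producing
    a per-channel response map (C, H-kh+1, W-kw+1)."""
    C, H, W = search.shape
    _, kh, kw = kernel.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.empty((C, oh, ow))
    for c in range(C):
        for i in range(oh):
            for j in range(ow):
                out[c, i, j] = np.sum(search[c, i:i+kh, j:j+kw] * kernel[c])
    return out
```

The triple loop makes the data access pattern explicit, which is exactly the structure a hardware accelerator can exploit with fixed, predictable latency.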
Journal of Imaging doi: 10.3390/jimaging10030069
Authors: Mikolaj Czerkawski Priti Upadhyay Christopher Davison Robert Atkinson Craig Michie Ivan Andonovic Malcolm Macdonald Javier Cardona Christos Tachtatzis
There are several image inverse tasks, such as inpainting or super-resolution, which can be solved using deep internal learning, a paradigm that involves employing deep neural networks to find a solution by learning from the sample itself rather than a dataset. For example, Deep Image Prior is a technique based on fitting a convolutional neural network to output the known parts of the image (such as non-inpainted regions or a low-resolution version of the image). However, this approach is not well suited to samples composed of multiple modalities. In some domains, such as satellite image processing, accommodating multi-modal representations could be beneficial or even essential. In this work, the Multi-Modal Convolutional Parameterisation Network (MCPN) is proposed, where a convolutional neural network approximates shared information between multiple modes by combining a core shared network with modality-specific head networks. The results demonstrate that this approach can significantly outperform the single-mode adoption of a convolutional parameterisation network on guided image inverse problems of inpainting and super-resolution.
Journal of Imaging doi: 10.3390/jimaging10030068
Authors: Shintaro Ito Kanta Miura Koichi Ito Takafumi Aoki
In this paper, we propose a method to refine the depth maps obtained by Multi-View Stereo (MVS) through iterative optimization of the Neural Radiance Field (NeRF). MVS accurately estimates the depths on object surfaces, and NeRF accurately estimates the depths at object boundaries. The key ideas of the proposed method are to combine MVS and NeRF to utilize the advantages of both in depth map estimation and to use NeRF for depth map refinement. We also introduce a Huber loss into the NeRF optimization to improve the accuracy of the depth map refinement, where the Huber loss reduces the estimation error in the radiance fields by placing constraints on errors larger than a threshold. Through a set of experiments using the Redwood-3dscan dataset and the DTU dataset, which are public datasets consisting of multi-view images, we demonstrate the effectiveness of the proposed method compared to conventional methods: COLMAP, NeRF, and DS-NeRF.
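The Huber loss introduced above has a standard closed form: quadratic for small residuals, linear beyond the threshold. A minimal sketch (the threshold name `delta` is ours; the paper's weighting within the NeRF objective is not reproduced here):

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    """Huber loss: quadratic for residuals below the threshold 'delta',
    linear above it, so large errors in the radiance-field estimate are
    constrained rather than dominating the objective as an L2 loss would."""
    r = np.abs(pred - target)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))
```

The linear branch is what "places constraints on errors larger than a threshold": the gradient magnitude saturates at `delta` instead of growing with the residual.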
Journal of Imaging doi: 10.3390/jimaging10030067
Authors: San Chain Tun Tsubasa Onizuka Pyke Tin Masaru Aikawa Ikuo Kobayashi Thi Thi Zin
This study innovates livestock health management, utilizing a top-view depth camera for accurate cow lameness detection, classification, and precise segmentation through integration with a 3D depth camera and deep learning, distinguishing it from 2D systems. It underscores the importance of early lameness detection in cattle and focuses on extracting depth data from the cow’s body, with a specific emphasis on the back region’s maximum value. Precise cow detection and tracking are achieved through the Detectron2 framework and Intersection Over Union (IOU) techniques. Across a three-day testing period, with observations conducted twice daily with varying cow populations (ranging from 56 to 64 cows per day), the study consistently achieves an impressive average detection accuracy of 99.94%. Tracking accuracy remains at 99.92% over the same observation period. Subsequently, the research extracts the cow’s depth region using binary mask images derived from detection results and original depth images. Feature extraction generates a feature vector based on maximum height measurements from the cow’s backbone area. This feature vector is utilized for classification, evaluating three classifiers: Random Forest (RF), K-Nearest Neighbor (KNN), and Decision Tree (DT). The study highlights the potential of top-view depth video cameras for accurate cow lameness detection and classification, with significant implications for livestock health management.
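The Intersection Over Union score used to track cows across frames is a standard box-overlap measure. A minimal sketch of the IOU step only, not the full Detectron2 pipeline:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2),
    the overlap score used to associate detections of the same cow
    between consecutive frames."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

In an IOU tracker, each new detection is matched to the existing track with the highest IOU above a chosen threshold.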
Journal of Imaging doi: 10.3390/jimaging10030066
Authors: Lisbeth Lyhne Kim Christian Houlind Johnny Christensen Radu L. Vijdea Meinhard R. Hansen Malene Roland V. Pedersen Helle Precht
This study aimed to test the accuracy of a magnetic resonance imaging (MRI)-based method to detect and characterise deep venous thrombosis (DVT) in the ilio-femoro-caval veins. Patients with verified DVT in the lower extremities with extension of the thrombi to the iliac veins, who were suitable for catheter-based venous thrombolysis, were included in this study. Before the intervention, magnetic resonance venography (MRV) was performed, and the ilio-femoro-caval veins were independently evaluated for normal appearance, stenosis, and occlusion by two single-blinded observers. The same procedure was used to evaluate digital subtraction phlebography (DSP), considered to be the gold standard, which made it possible to compare the results. A total of 123 patients were included for MRV and DSP, resulting in 246 image sets to be analysed. In total, 496 segments were analysed for occlusion, stenosis, or normal appearance. In MRV, the highest sensitivity was found when distinguishing occlusion from either normal appearance or stenosis (0.98), while the lowest was found between stenosis and normal appearance (0.84). Specificity varied from 0.59 (stenosis vs. occlusion) to 0.94 (occlusion vs. normal appearance). The Kappa statistic was calculated as a measure of inter-observer agreement. The kappa value for MRV was 0.91 and for DSP, 0.80. In conclusion, MRV represents a sensitive method to analyse DVT in the pelvic veins, with advantages such as the absence of radiation and contrast agent and the possibility to investigate the anatomical relationships in the area.
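The inter-observer agreement figures above (kappa 0.91 for MRV, 0.80 for DSP) use Cohen's kappa, which corrects observed agreement for agreement expected by chance. A minimal sketch of the statistic:

```python
def cohens_kappa(a, b, labels):
    """Cohen's kappa for two observers rating the same segments
    (e.g. 'normal', 'stenosis', 'occlusion'): the observed agreement
    p_o corrected for the chance agreement p_e implied by each
    observer's marginal label frequencies."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_chance = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_observed - p_chance) / (1 - p_chance)
```

Values near 1 indicate agreement well beyond chance; 0 means agreement no better than chance.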
Journal of Imaging doi: 10.3390/jimaging10030065
Authors: Florian Côme Fizaine Patrick Bard Michel Paindavoine Cécile Robin Edouard Bouyé Raphaël Lefèvre Annie Vinter
Text line segmentation is a necessary preliminary step before most text transcription algorithms are applied. The leading deep learning networks used in this context (ARU-Net, dhSegment, and Doc-UFCN) are based on the U-Net architecture. They are efficient, but fall under the same concept, requiring a post-processing step to perform instance (e.g., text line) segmentation. In the present work, we test the advantages of Mask-RCNN, which is designed to perform instance segmentation directly. This work is the first to directly compare Mask-RCNN- and U-Net-based networks on text segmentation of historical documents, showing the superiority of the former over the latter. Three studies were conducted, one comparing these networks on different historical databases, another comparing Mask-RCNN with Doc-UFCN on a private historical database, and a third comparing the handwritten text recognition (HTR) performance of the tested networks. The results showed that Mask-RCNN outperformed ARU-Net, dhSegment, and Doc-UFCN using relevant line segmentation metrics, that performance evaluation should not focus on the raw masks generated by the networks, that a light mask processing is an efficient and simple solution to improve evaluation, and that Mask-RCNN leads to better HTR performance.
Journal of Imaging doi: 10.3390/jimaging10030064
Authors: Anudari Khishigdelger Ahmed Salem Hyun-Soo Kang
Chest X-ray (CXR) imaging plays a pivotal role in diagnosing various pulmonary diseases, which account for a significant portion of the global mortality rate, as recognized by the World Health Organization (WHO). Medical practitioners routinely depend on CXR images to identify anomalies and make critical clinical decisions. Dramatic improvements in super-resolution (SR) have been achieved by applying deep learning techniques. However, SR is difficult to apply when low-resolution inputs and features contain abundant low-frequency information, as is the case in X-ray image super-resolution. In this paper, we introduce an advanced deep learning-based SR approach that incorporates the innovative residual-in-residual (RIR) structure to augment the diagnostic potential of CXR imaging. Specifically, we propose a lightweight network consisting of residual groups built from residual blocks, in which multiple skip connections facilitate the efficient bypassing of abundant low-frequency information. This approach allows the main network to concentrate on learning high-frequency information. In addition, we adopt dense feature fusion within residual groups and design highly parallel residual blocks for better feature extraction. Our proposed methods exhibit superior performance compared to existing state-of-the-art (SOTA) SR methods, delivering enhanced accuracy and notable visual improvements, as evidenced by our results.
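The residual-in-residual idea above — short skips inside blocks and a long skip over each group, so low-frequency content bypasses the learned layers — can be sketched with toy matmul blocks. This is an illustrative structure only, not the paper's network:

```python
import numpy as np

def residual_block(x, w):
    """One residual block: the identity skip lets low-frequency content
    bypass the learned transform (here a toy matmul + ReLU), freeing the
    weights to model high-frequency detail."""
    return x + np.maximum(x @ w, 0.0)

def residual_group(x, weights):
    """Residual group with a long skip over several blocks, mirroring the
    residual-in-residual (RIR) structure: short and long skip connections
    both carry low-frequency information around the learned layers."""
    y = x
    for w in weights:
        y = residual_block(y, w)
    return x + y   # long skip connection over the whole group
```

With all-zero weights the learned branch contributes nothing and the input passes through the skips unchanged, which is exactly the "bypass" behaviour the design relies on.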
Journal of Imaging doi: 10.3390/jimaging10030063
Authors: Reagan E. Mandiya Hervé M. Kongo Selain K. Kasereka Kyamakya Kyandoghere Petro Mushidi Tshakwanda Nathanaël M. Kasoro
Rapid and precise identification of Coronavirus Disease 2019 (COVID-19) is pivotal for effective patient care, comprehending the pandemic’s trajectory, and enhancing long-term patient survival rates. Despite numerous recent endeavors in medical imaging, many convolutional neural network-based models grapple with the expressiveness problem and overfitting, and the training process of these models is always resource-intensive. This paper presents an innovative approach employing Xception, augmented with cutting-edge transfer learning techniques to forecast COVID-19 from X-ray thorax images. Our experimental findings demonstrate that the proposed model surpasses the predictive accuracy of established models in the domain, including Xception, VGG-16, and ResNet. This research marks a significant stride toward enhancing COVID-19 detection through a sophisticated and high-performing imaging model.
Journal of Imaging doi: 10.3390/jimaging10030062
Authors: Gang Hu Conner Saeli
Deep edge detection is challenging, even with existing methods like HED (holistically-nested edge detection). These methods combine multiple feature side outputs (SOs) to create the final edge map, but they neglect diverse edge importance within one output. This creates a problem: to include desired edges, unwanted noise must also be accepted. As a result, the output often has increased noise or thick edges, ignoring important boundaries. To address this, we propose a new approach called the normalized Hadamard-product (NHP) operation-based deep network for edge detection. By multiplying the side outputs from the backbone network, the Hadamard-product operation encourages agreement among features across different scales while suppressing weak signals on which they disagree. This method produces additional Mutually Agreed Salient Edge (MASE) maps to enrich the hierarchical level of side outputs without adding complexity. Our experiments demonstrate that the NHP operation significantly improves performance, e.g., an ODS score reaching 0.818 on BSDS500, outperforming human performance (0.803) and achieving state-of-the-art results in deep edge detection.
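The Hadamard-product idea above can be sketched in a few lines. This is an illustrative version: element-wise multiplication keeps edges the scales agree on, and the C-th root shown here is one simple way to renormalize the product back into [0, 1] — the paper's exact normalization may differ:

```python
import numpy as np

def nhp(side_outputs):
    """Normalized Hadamard product of side-output edge maps (values in
    [0, 1]): multiplying the maps element-wise keeps only responses that
    all scales agree on and drives disagreed weak signals toward zero;
    the len-th root (a geometric mean) restores the [0, 1] range."""
    prod = np.ones_like(side_outputs[0])
    for s in side_outputs:
        prod = prod * s
    return prod ** (1.0 / len(side_outputs))
```

Note how a single zero response at any scale suppresses the pixel entirely, which is the agreement-enforcing behaviour the abstract describes.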
Journal of Imaging doi: 10.3390/jimaging10030061
Authors: Shubham Rana Salvatore Gerbino Mariano Crimaldi Valerio Cirillo Petronia Carillo Fabrizio Sarghini Albino Maggio
This article is focused on the comprehensive evaluation of approaches to scale-invariant feature transform (SIFT)- and random sample consensus (RANSAC)-based multispectral (MS) image registration. In this paper, the idea is to extensively evaluate three such SIFT- and RANSAC-based registration approaches over a heterogenous mix containing Triticum aestivum crop and Raphanus raphanistrum weed. The first method is based on the application of a homography matrix, derived during the registration of MS images, on spatial coordinates of individual annotations to achieve spatial realignment. The second method is based on the registration of binary masks derived from the ground truth of individual spectral channels. The third method is based on the registration of only the masked pixels of interest across the respective spectral channels. It was found that the MS image registration technique based on the registration of binary masks derived from the manually segmented images exhibited the highest accuracy, followed by the technique involving registration of masked pixels, and lastly, registration based on the spatial realignment of annotations. Among automatically segmented images, the technique based on the registration of automatically predicted mask instances exhibited higher accuracy than the technique based on the registration of masked pixels. In the ground truth images, the annotations performed through the near-infrared channel were found to have a higher accuracy, followed by the green, blue, and red spectral channels. Among the automatically segmented images, the blue channel was observed to exhibit a higher accuracy, followed by the green, near-infrared, and red channels.
At the individual instance level, the registration based on binary masks depicted the highest accuracy in the green channel, followed by the method based on the registration of masked pixels in the red channel, and lastly, the method based on the spatial realignment of annotations in the green channel. The instance detection of wild radish with YOLOv8l-seg was observed at a mAP@0.5 of 92.11% and a segmentation accuracy of 98% towards segmenting its binary mask instances.
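The first registration approach described above maps annotation coordinates through the homography estimated between spectral channels. A minimal sketch of that spatial-realignment step (the SIFT/RANSAC estimation of H itself is not reproduced here):

```python
import numpy as np

def apply_homography(H, pts):
    """Map annotation coordinates through a 3x3 homography H, the spatial
    realignment step applied to individual annotations once H has been
    estimated between two spectral channels. pts: (N, 2) (x, y) array."""
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # homogeneous coords
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                 # de-homogenize
```

In practice H would come from matched SIFT keypoints filtered with RANSAC (e.g. OpenCV's `cv2.findHomography`); the function above only applies a given H to points.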
Journal of Imaging doi: 10.3390/jimaging10030060
Authors: Robert Zboray Wolf Schweitzer Lars Ebert Martin Wolf Sabino Guglielmini Stefan Haemmerle Stephan Weiss Bruno Koller
The rate of parental consent for fetal and perinatal autopsy is decreasing, whereas parents are more likely to agree to virtual autopsy by non-invasive imaging methods. Fetal and perinatal virtual autopsy needs high resolution and good soft-tissue contrast for investigation of the cause of death and underlying trauma or pathology in fetuses and stillborn infants. This is offered by micro-computed tomography (CT), as opposed to the limited resolution provided by clinical CT scanners, making micro-CT one of the most promising tools for non-invasive perinatal post-mortem imaging. We developed and optimized a micro-CT scanner with a dual-energy imaging option. It is dedicated to post-mortem CT angiography and virtual autopsy of fetuses and stillborn infants in that the chamber can be cooled down to around 5 °C; this increases tissue rigidity and slows decomposition of the native specimen. This, together with the dedicated gantry-based architecture, attempts to reduce potential motion artifacts. The developed methodology is based on prior endovascular injection of a BaSO4-based contrast agent. We explain the design choices and considerations for this scanner prototype. We detail the optimization of the dual-energy and virtual mono-energetic imaging option, which is based on minimizing noise propagation and maximizing the contrast-to-noise ratio for vascular features. We demonstrate the scanner capabilities with proof-of-concept experiments on phantoms and stillborn piglets.
Journal of Imaging doi: 10.3390/jimaging10030059
Authors: Yanhua Huang Zhendong Wu Juan Chen Hui Xiang
Personal privacy protection has been extensively investigated. The privacy protection of face recognition applications combines face privacy protection with face recognition. Traditional face privacy-protection methods encrypt or perturb facial images for protection. However, the original facial images or parameters need to be restored during recognition. In this paper, it is found that faces can still be recognized correctly when only some of the high-order and local feature information from faces is retained, while the rest of the information is fuzzed. Based on this, a privacy-preserving face recognition method combining random convolution and self-learning batch normalization is proposed. This method generates a privacy-preserving scrambled facial image whose degree of fuzzing is close to that of encryption. The server directly recognizes the scrambled facial image, and the recognition accuracy is equivalent to that of the normal facial image. The method simultaneously ensures the revocability and irreversibility of the face privacy protection. In this experiment, the proposed method is tested on the LFW, CelebA, and self-collected face datasets. On the three datasets, the proposed method outperforms the existing face privacy-preserving recognition methods in terms of face visual information elimination and recognition accuracy. The recognition accuracy is >99%, and the visual information elimination is close to an encryption effect.
Journal of Imaging doi: 10.3390/jimaging10030058
Authors: Se-On Kim Yoon-Chul Kim
Centerline tracking is useful in performing segmental analysis of vessel tortuosity in angiography data. However, a highly tortuous artery can produce multiple centerlines due to over-segmentation of the artery, resulting in inaccurate path-finding results when using the shortest path-finding algorithm. In this study, the internal carotid arteries (ICAs) from three-dimensional (3D) time-of-flight magnetic resonance angiography (TOF MRA) data were used to demonstrate the effectiveness of a new path-finding method. The method is based on a series of depth-first searches (DFSs) with randomly different orders of neighborhood searches and produces an appropriate path connecting the two endpoints in the ICAs. It was compared with three existing methods: (a) DFS with a sequential order of neighborhood search, (b) the Dijkstra algorithm, and (c) the A* algorithm. The path-finding accuracy was evaluated by counting the number of successful paths. The proposed method resulted in an accuracy of 95.8%, outperforming the three existing methods. In conclusion, the proposed method has been shown to be more suitable as a path-finding procedure than the existing methods, particularly in cases where there is more than one centerline resulting from over-segmentation of a highly tortuous artery.
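The core idea above — a depth-first search whose neighbourhood order is randomized, so repeated runs explore alternative branches of an over-segmented centerline — can be sketched on a generic graph. The adjacency-dict representation is an assumption for this sketch; the paper works on voxel neighbourhoods:

```python
import random

def random_dfs_path(adj, start, goal, seed=None):
    """Depth-first search with a randomly shuffled neighbour order: each
    run with a different seed may traverse a different branch, which is
    how a series of such searches can find a path that the fixed
    sequential order misses. adj: dict mapping a node to its neighbours."""
    rng = random.Random(seed)
    stack = [(start, [start])]
    visited = {start}
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        neighbours = list(adj.get(node, []))
        rng.shuffle(neighbours)           # randomized neighbourhood order
        for nxt in neighbours:
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None                           # endpoints are not connected
```

Running this several times with different seeds and keeping any path that connects the two endpoints mirrors the "series of DFSs" strategy described in the abstract.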
Journal of Imaging doi: 10.3390/jimaging10030057
Authors: Zhongliang Guo Ognjen Arandjelović David Reid Yaxiong Lei Jochen Büttner
Jochen Büttner was not included as an author in the original publication [...]
Journal of Imaging doi: 10.3390/jimaging10030056
Authors: Qiwen Lu Shengbo Chen Xiaoke Zhu
Language bias stands as a noteworthy concern in visual question answering (VQA), wherein models tend to rely on spurious correlations between questions and answers for prediction. This prevents the models from effectively generalizing, leading to a decrease in performance. In order to address this bias, we propose a novel modality fusion collaborative de-biasing algorithm (CoD). In our approach, bias is considered as the model’s neglect of information from a particular modality during prediction. We employ a collaborative training approach to facilitate mutual modeling between different modalities, achieving efficient feature fusion and enabling the model to fully leverage multimodal knowledge for prediction. Our experiments on various datasets, including VQA-CP v2, VQA v2, and VQA-VS, using different validation strategies, demonstrate the effectiveness of our approach. Notably, employing a basic baseline model resulted in an accuracy of 60.14% on VQA-CP v2.
Journal of Imaging doi: 10.3390/jimaging10030055
Authors: Majid Ansari-Asl Markus Barbieri Gaël Obein Jon Yngve Hardeberg
The application of materials with changing visual properties with lighting and observation directions has found broad utility across diverse industries, from architecture and fashion to automotive and film production. The expanding array of applications and appearance reproduction requirements emphasizes the critical role of material appearance measurement and surface characterization. Such measurements offer twofold benefits in soft proofing and product quality control, reducing errors and material waste while providing objective quality assessment. Some image-based setups have been proposed to capture the appearance of material surfaces with spatial variations in visual properties in terms of Spatially Varying Bidirectional Reflectance Distribution Functions (SVBRDF) and Bidirectional Texture Functions (BTF). However, comprehensive exploration of optical design concerning spectral channels and per-pixel incident-reflection direction calculations, along with measurement validation, remains an unexplored domain within these systems. Therefore, we developed a novel advanced multispectral image-based device designed to measure SVBRDF and BTF, addressing these gaps in the existing literature. Central to this device is a novel rotation table as sample holder and passive multispectral imaging. In this paper, we present our compact multispectral image-based appearance measurement device, detailing its design, assembly, and optical considerations. Preliminary measurements showcase the device’s potential in capturing angular and spectral data, promising valuable insights into material appearance properties.
Journal of Imaging doi: 10.3390/jimaging10030054
Authors: Yuko Harada Kyosuke Shimada Satoshi John Harada Tomomi Sato Yukino Kubota Miyoko Yamashita
The mortality rate of cancer patients has been decreasing; however, patients often suffer from cardiac disorders due to chemotherapy or other cancer therapies (e.g., cancer-therapy-related cardiovascular toxicity (CVR-CVT)). Therefore, the field of cardio-oncology has drawn more attention in recent years. The first European Society of Cardiology (ESC) guidelines on cardio-oncology were established last year. Echocardiography is the gold standard for the diagnosis of CVR-CVT, but many breast cancer patients are unable to undergo echocardiography due to their surgery wounds or anatomical reasons. We performed a study to evaluate the usefulness of myocardial scintigraphy using Iodine-123 β-methyl-p-iodophenyl-pentadecanoic acid (123I-BMIPP) in comparison with echocardiography and published the results in the Journal of Imaging last year. This is the secondary analysis following our previous study. A total of 114 breast cancer patients who received chemotherapy within 3 years underwent echocardiography, as well as Thallium (201Tl) and 123I-BMIPP myocardial perfusion and metabolism scintigraphy. The ratio of isotope uptake reduction was scored by Heart Risk View-S software (Nihon Medi-Physics). The scores were then compared with the echocardiography parameters. All the patients’ charts and data from January 2022 to November 2023 were reviewed for the secondary analysis. Echocardiogram parameters were obtained from 99 patients (87% of total patients). No correlations were found between the echocardiography parameters and Heart Risk View-S scores of 201Tl myocardial perfusion scintigraphy, nor those of the BMIPP myocardial metabolism scintigraphy. In total, 8 patients out of 114 (7.0%) died within 22 months, while 3 patients out of 26 CVR-CVT patients (11.5%) died within 22 months. Evaluation by echocardiography was sometimes difficult to perform on breast cancer patients.
However, other imaging modalities, including myocardial scintigraphy, cannot serve as alternatives to echocardiography. Cardiac scintigraphy detects circulation disorder or metabolism disorder in the myocardium; therefore, it should be able to reveal myocardial damage to some extent. The mortality rate of breast cancer patients was higher with CVR-CVT. A new modality to detect CVR-CVT besides echocardiography can possibly be anticipated for patients who cannot undergo echocardiography.
Journal of Imaging doi: 10.3390/jimaging10030053
Authors: Giuseppe Bonifazi Paolo Barontini Riccardo Gasbarrone Davide Gattabria Silvia Serranti
In this manuscript, a method that utilizes classical image analysis techniques to assess particle aggregation and segregation, with the primary goal of validating particle size distribution determined by conventional methods, is presented. This approach can represent a supplementary tool in quality control systems for powder production processes in industries such as manufacturing and pharmaceuticals. The methodology involves the acquisition of high-resolution images, followed by their fractal and textural analysis. Fractal analysis plays a crucial role by quantitatively measuring the complexity and self-similarity of particle structures. This approach allows for the numerical evaluation of aggregation and segregation phenomena, providing valuable insights into the underlying mechanisms at play. Textural analysis contributes to the characterization of patterns and spatial correlations observed in particle images. The examination of textural features offers an additional understanding of particle arrangement and organization. Consequently, it aids in validating the accuracy of particle size distribution measurements. To this end, by incorporating fractal and textural analysis, a methodology that enhances the reliability and accuracy of particle size distribution validation is obtained. It enables the identification of irregularities, anomalies, and subtle variations in particle arrangements that might not be detected by traditional measurement techniques alone.
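A standard way to quantify the complexity and self-similarity of particle structures, as the abstract describes, is the box-counting fractal dimension. This sketch shows the generic measure only, not the authors' exact pipeline:

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8)):
    """Box-counting estimate of the fractal dimension of a binary particle
    mask: count the boxes N(s) of side s that contain any foreground
    pixel, then fit the slope of log N(s) against log(1/s)."""
    counts = []
    n = mask.shape[0]
    for s in sizes:
        occupied = 0
        for i in range(0, n, s):
            for j in range(0, n, s):
                if mask[i:i + s, j:j + s].any():
                    occupied += 1
        counts.append(occupied)
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope
```

A fully filled 2D region gives a dimension of 2, while ragged, aggregated particle boundaries yield non-integer values between 1 and 2, which is what makes the measure sensitive to aggregation.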
Journal of Imaging doi: 10.3390/jimaging10030052
Authors: Jiahao Xia Gavin Gong Jiawei Liu Zhigang Zhu Hao Tang
In this paper, a Segment Anything Model (SAM)-based pedestrian infrastructure segmentation workflow is designed and optimized, which is capable of efficiently processing multi-sourced geospatial data, including LiDAR data and satellite imagery data. We used an expanded definition of pedestrian infrastructure inventory, which goes beyond the traditional transportation elements to include street furniture objects that are important for accessibility but are often omitted from the traditional definition. Our contributions lie in producing the necessary knowledge to answer the following three questions. First, how can mobile LiDAR technology be leveraged to produce a comprehensive pedestrian-accessible infrastructure inventory? Second, which data representation can facilitate zero-shot segmentation of infrastructure objects with SAM? Third, how well does the SAM-based method perform on segmenting pedestrian infrastructure objects? Our proposed method is designed to efficiently create pedestrian-accessible infrastructure inventory through the zero-shot segmentation of multi-sourced geospatial datasets. By addressing these three research questions, we show how the multi-mode data should be prepared, what data representation works best for what asset features, and how SAM performs on these data representations. Our findings indicate that street-view images generated from mobile LiDAR point-cloud data, when paired with satellite imagery data, can work efficiently with SAM to create a scalable pedestrian infrastructure inventory approach with immediate benefits to GIS professionals, city managers, transportation owners, and walkers, especially those with travel-limiting disabilities, such as individuals who are blind, have low vision, or experience mobility disabilities.
Journal of Imaging doi: 10.3390/jimaging10030051
Authors: Tirui Wu Ciaran Eising Martin Glavin Edward Jones
Image decolorization is an image pre-processing step which is widely used in image analysis, computer vision, and printing applications. The most commonly used methods give each color channel (e.g., the R component in RGB format, or the Y component of an image in CIE-XYZ format) a constant weight without considering image content. This approach is simple and fast, but it may cause significant information loss when images contain many isoluminant colors. In this paper, we propose a new method which is not only efficient, but also can preserve a higher level of image contrast and detail than the traditional methods. It uses the cumulative distribution function (CDF) of each color channel to compute a weight for each pixel in each color channel. Then, these weights are used to combine the three color channels (red, green, and blue) to obtain the final grayscale value. The algorithm works in RGB color space directly without any color conversion. In order to evaluate the proposed algorithm objectively, two new metrics are also developed. Experimental results show that the proposed algorithm can run as efficiently as the traditional methods and obtain the best overall performance across four different metrics.
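The CDF-based weighting described above can be sketched as follows. This is an illustration of the idea only — per-pixel weights derived from each channel's cumulative distribution, combined directly in RGB space — and does not claim to match the paper's exact weighting:

```python
import numpy as np

def cdf_weights(channel, bins=256):
    """Per-pixel weight for one 8-bit colour channel derived from its
    cumulative distribution function, so the weight depends on image
    content rather than being a single constant per channel."""
    hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
    cdf = np.cumsum(hist) / channel.size
    return cdf[np.clip(channel.astype(int), 0, bins - 1)]

def decolorize(r, g, b):
    """Combine the R, G, B channels with content-dependent CDF weights,
    working directly in RGB space with no colour conversion."""
    wr, wg, wb = cdf_weights(r), cdf_weights(g), cdf_weights(b)
    total = wr + wg + wb + 1e-12
    return (wr * r + wg * g + wb * b) / total
```

Because the weights vary per pixel, two isoluminant colours that a constant-weight formula would map to the same gray value can receive different weights and remain distinguishable.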
Journal of Imaging doi: 10.3390/jimaging10020050
Authors: Alessandro Benfenati Pasquale Cascarano
The Plug-and-Play framework has demonstrated that a denoiser can implicitly serve as the image prior for model-based methods for solving various inverse problems such as image restoration tasks. This characteristic enables the integration of the flexibility of model-based methods with the effectiveness of learning-based denoisers. However, the regularization strength induced by denoisers in the traditional Plug-and-Play framework lacks a physical interpretation, necessitating demanding parameter tuning. This paper addresses this issue by introducing the Constrained Plug-and-Play (CPnP) method, which reformulates the traditional PnP as a constrained optimization problem. In this formulation, the regularization parameter directly corresponds to the amount of noise in the measurements. The solution to the constrained problem is obtained through the design of an efficient method based on the Alternating Direction Method of Multipliers (ADMM). Our experiments demonstrate that CPnP outperforms competing methods in terms of stability and robustness while also achieving competitive performance for image quality.
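The constrained formulation above — regularize via a plug-in denoiser subject to a data-fidelity ball whose radius matches the measurement noise — can be sketched with a toy ADMM loop. This is illustrative only (forward operator taken as the identity); the paper's operator splitting and parameters may differ:

```python
import numpy as np

def project_ball(x, y, eps):
    """Euclidean projection onto {x : ||x - y||_2 <= eps}; eps plays the
    role of the measurement noise level in the constrained formulation."""
    d = x - y
    norm = np.linalg.norm(d)
    return x if norm <= eps else y + d * (eps / norm)

def cpnp_admm(y, denoiser, eps, n_iter=20):
    """Toy ADMM loop in the spirit of Constrained Plug-and-Play for a
    denoising problem: alternate the data-constraint projection, the
    plug-in denoiser (the implicit prior), and the dual update."""
    x = y.copy(); z = y.copy(); u = np.zeros_like(y)
    for _ in range(n_iter):
        x = project_ball(z - u, y, eps)   # enforce ||x - y|| <= eps
        z = denoiser(x + u)               # implicit prior via the denoiser
        u = u + x - z                     # dual ascent
    return x
```

The appeal of the constrained form is visible in the signature: `eps` is the noise level itself, rather than an abstract regularization weight that must be tuned.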
]]>Journal of Imaging doi: 10.3390/jimaging10020049
Authors: Midori Tanaka Tsubasa Ando Takahiko Horiuchi
Depending on various design conditions, including optics and circuit design, the image-forming characteristics of the modulation transfer function (MTF), which affect the spatial resolution of a digital image, may vary among image channels within or between imaging devices. In this study, we propose a method for automatically converting the MTF to a target MTF, focusing on adjusting the MTF characteristics that affect the signals of different image channels within and between different imaging devices. The experimental results of MTF conversion using the proposed method for multiple image channels with different MTF characteristics indicated that the proposed method could produce sharper images by moving the source MTF of each channel closer to a target MTF with a higher MTF value. This study is expected to contribute to technological advancements in various imaging devices as follows: (1) Even if the imaging characteristics of the hardware are unknown, the MTF can be converted to the target MTF using the image after it is captured. (2) As any MTF can be converted into a target, image simulation for conversion to a different MTF is possible. (3) It is possible to generate high-definition images, thereby meeting the requirements of various industrial and research fields in which high-definition images are required.
]]>Journal of Imaging doi: 10.3390/jimaging10020048
Authors: Maria-Eugenia Sánchez-Morales José-Trinidad Guillen-Bonilla Héctor Guillen-Bonilla Alex Guillen-Bonilla Jorge Aguilar-Santiago Maricela Jiménez-Rodríguez
This paper proposes the transformation S→C→, where S is a digital gray-level image and C→ is a vector expressed through the textural space. The proposed transformation is termed Vectorial Image Representation on the Texture Space (VIR-TS), given that the digital image S is represented by the textural vector C→. This vector C→ contains all of the local texture characteristics in the image of interest, and the texture unit T→ has a vectorial character, since it is defined through the resolution of a homogeneous equation system. For the application of this transformation, a new classifier for multiple classes is proposed in the texture space, where the vector C→ is employed as a characteristic vector. To verify its efficiency, it was experimentally deployed for the recognition of digital images of tree barks, achieving effective performance. In these experiments, the parametric value λ employed to solve the homogeneous equation system does not affect the results of the image classification. The VIR-TS transform possesses potential applications in specific tasks, such as locating missing persons, and the analysis and classification of diagnostic and medical images.
]]>Journal of Imaging doi: 10.3390/jimaging10020047
Authors: Dunia Pineda Medina Ileana Miranda Cabrera Rolisbel Alfonso de la Cruz Lizandra Guerra Arzuaga Sandra Cuello Portal Monica Bianchini
Artificial intelligence techniques are now widely used in various agricultural applications, including the detection of devastating diseases such as late blight (Phytophthora infestans) and early blight (Alternaria solani) affecting potato (Solanum tuberosum L.) crops. In this paper, we present a mobile application for detecting potato crop diseases based on deep neural networks. The images were taken from the PlantVillage dataset, with a batch of 1000 images for each of the three identified classes (healthy, early blight-diseased, late blight-diseased). An exploratory analysis of the architectures used for early and late blight diagnosis in potatoes was performed, achieving an accuracy of 98.7% with MobileNetv2. Based on the results obtained, an offline mobile application was developed, supported on devices with Android 4.1 or later, also featuring an information section on the 27 diseases affecting potato crops and a gallery of symptoms. For future work, segmentation techniques will be used to highlight the damaged region in the potato leaf by evaluating its extent and possibly identifying different types of diseases affecting the same plant.
]]>Journal of Imaging doi: 10.3390/jimaging10020046
Authors: Dennis Siegel Christian Kraetzer Stefan Seidlitz Jana Dittmann
In recent discussions in the European Parliament, the need for regulations for so-called high-risk artificial intelligence (AI) systems was identified; these regulations are currently codified in the upcoming EU Artificial Intelligence Act (AIA), which has been approved by the European Parliament. The AIA is the first document of its kind to be turned into European law. This initiative focuses on turning AI systems into decision support systems (human-in-the-loop and human-in-command), where the human operator remains in control of the system. While this supposedly solves accountability issues, it introduces, on the one hand, the necessary human–computer interaction as a potential new source of errors; on the other hand, it is potentially a very effective approach for decision interpretation and verification. This paper discusses the necessary requirements for high-risk AI systems once the AIA comes into force. Particular attention is paid to the opportunities and limitations that result from the decision support system and increasing the explainability of the system. This is illustrated using the example of the media forensic task of DeepFake detection.
]]>Journal of Imaging doi: 10.3390/jimaging10020045
Authors: Soumick Chatterjee Fatima Saad Chompunuch Sarasaen Suhita Ghosh Valerie Krug Rupali Khatun Rahul Mishra Nirja Desai Petia Radeva Georg Rose Sebastian Stober Oliver Speck Andreas Nürnberger
The outbreak of COVID-19 has shocked the entire world with its fairly rapid spread, and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging, such as X-ray and computed tomography (CT), combined with the potential of artificial intelligence (AI), plays an essential role in supporting medical personnel in the diagnosis process. Thus, in this article, five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2, and DenseNet161) and their ensemble, using majority voting, have been used to classify COVID-19, pneumonia, and healthy subjects using chest X-ray images. Multilabel classification was performed to predict multiple pathologies for each patient, if present. The interpretability of each of the networks was thoroughly studied using local interpretability methods—occlusion, saliency, input X gradient, guided backpropagation, integrated gradients, and DeepLIFT—and using a global technique—neuron activation profiles. The mean micro F1 score of the models for COVID-19 classification ranged from 0.66 to 0.875, and was 0.89 for the ensemble of the network models. The qualitative results showed that the ResNets were the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before making a decision regarding the best performing model.
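The ensemble step, majority voting over the models' multilabel predictions, can be illustrated with a generic sketch. The function name and the strict-majority rule are our assumptions, not the paper's code; with the paper's five models, a strict majority means at least three votes:

```python
import numpy as np

def majority_vote(predictions):
    """Combine multilabel predictions from several models by majority vote.

    `predictions` is a list of binary arrays, one per model, each of shape
    (n_samples, n_labels). A label is assigned when a strict majority of
    the models predict it.
    """
    stack = np.stack(predictions)       # (n_models, n_samples, n_labels)
    votes = stack.sum(axis=0)           # votes per sample and label
    return (votes > stack.shape[0] / 2).astype(int)
```

With an odd number of models (five in the paper), ties cannot occur, which is one practical reason to ensemble an odd count.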
]]>Journal of Imaging doi: 10.3390/jimaging10020044
Authors: Chin-Chen Chang Ping-Hao Peng
Neural style transfer is an algorithm that transfers the style of one image to another image, converting the style of the second image while preserving its content. In this paper, we propose a style transfer approach for sand painting generation based on convolutional neural networks. The proposed approach aims to improve sand painting generation via neural style transfer, which can address the problem of blurred objects. Furthermore, it can reduce background noise caused by neural style transfer. First, we segment the main objects from the content image. Next, we perform close–open filtering operations on the content image to obtain smooth images. Then, we apply Sobel edge detection to process the images and obtain edge maps. Based on these edge maps and the input style image, we perform neural style transfer to generate sand painting images. Finally, we integrate the generated images to obtain the final stylized sand painting image. The results show that the proposed approach yields good visual effects for sand paintings. Moreover, the proposed approach achieves better visual effects for sand painting than the previous method.
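The Sobel edge-detection step in the pipeline above can be sketched in dependency-free NumPy. In practice a library routine (e.g., OpenCV's Sobel operator) would be used; the function below is a hypothetical minimal version showing what the edge maps contain:

```python
import numpy as np

# Standard 3x3 Sobel kernels: horizontal- and vertical-gradient responses.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edge_map(gray):
    """Gradient-magnitude edge map of a 2-D grayscale image
    (borders handled by edge replication)."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):                       # correlate with both kernels
        for j in range(3):
            patch = padded[i:i + h, j:j + w]
            gx += SOBEL_X[i, j] * patch
            gy += SOBEL_Y[i, j] * patch
    return np.hypot(gx, gy)                  # gradient magnitude
```

A vertical intensity step produces a strong response along that column and zero response in flat regions, which is exactly the structure the style-transfer stage uses to keep object boundaries sharp.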
]]>Journal of Imaging doi: 10.3390/jimaging10020043
Authors: Francisco J. Ávila Juan M. Bueno
The optical quality of an image depends on both the optical properties of the imaging system and the physical properties of the medium the light passes through while travelling from the object to the image plane. The computation of the point spread function (PSF) associated with the optical system is often used to assess the image quality. In a non-ideal optical system, the PSF is affected by aberrations that distort the final image. Moreover, in the presence of turbid media, scattering phenomena spread the light at wide angular distributions that contribute to reduced contrast and sharpness. If the mathematical degradation operator affecting the recorded image is known, the image can be restored through deconvolution methods. In some scenarios, no (or only partial) information on the PSF is available. In those cases, blind deconvolution approaches arise as useful solutions for image restoration. In this work, a new blind deconvolution method is proposed to restore images using spherical aberration (SA) and scatter-based kernel filters. The procedure was evaluated on different microscopy images. The results show the capability of the algorithm to detect both degradation coefficients (i.e., SA and scattering) and to restore images without information on the real PSF.
]]>Journal of Imaging doi: 10.3390/jimaging10020042
Authors: Adrian-Alin Barglazan Remus Brad Constantin Constantinescu
In recent years, significant advancements in the field of machine learning have influenced the domain of image restoration. While these technological advancements present prospects for improving the quality of images, they also present difficulties, particularly the proliferation of manipulated or counterfeit multimedia information on the internet. The objective of this paper is to provide a comprehensive review of existing inpainting algorithms and forgery detection methods, with a specific emphasis on techniques that are designed for the purpose of removing objects from digital images. In this study, we examine various techniques encompassing conventional texture synthesis methods as well as those based on neural networks. Furthermore, we present the artifacts frequently introduced by the inpainting procedure and assess the state of the art in detecting such modifications. Lastly, we look at the available datasets and how the methods compare with each other. Having covered all of the above, this study provides a comprehensive perspective on the abilities and constraints of detecting object removal via the inpainting procedure in images.
]]>Journal of Imaging doi: 10.3390/jimaging10020041
Authors: Kani Djoulde Boukar Ousman Abboubakar Hamadjam Laurent Bitjoka Clergé Tchiegang
The purpose of this work is to classify pepper seeds using color filter array (CFA) images. This study focused specifically on Penja pepper, which is found in the Littoral Region of Cameroon and is a type of Piper nigrum. India and Brazil are the largest producers of this variety of pepper, although the production of Penja pepper is not as significant in terms of quantity compared to other major producers. However, it is still highly sought after and one of the most expensive types of pepper on the market. It can be difficult for humans to distinguish between different types of peppers based solely on the appearance of their seeds. To address this challenge, we collected 5618 samples of white and black Penja pepper and other varieties for classification using image processing and a supervised machine learning method. We extracted 18 attributes from the images and used them to train four different models. The most successful model was the support vector machine (SVM), which achieved an accuracy of 0.87, a precision of 0.874, a recall of 0.873, and an F1-score of 0.874.
]]>Journal of Imaging doi: 10.3390/jimaging10020040
Authors: Niklas Dormagen Max Klein Andreas S. Schmitz Markus H. Thoma Mike Schwarz
Detecting micron-sized particles is an essential task for the analysis of complex plasmas because a large part of the analysis is based on the initially detected positions of the particles. Accordingly, high accuracy in particle detection is desirable. Previous studies have shown that machine learning algorithms have made great progress and outperformed classical approaches. This work presents an approach for tracking micron-sized particles in a dense cloud of particles in a dusty plasma at Plasmakristall-Experiment 4 using a U-Net. The U-Net is a convolutional network architecture for the fast and precise segmentation of images that was developed at the Computer Science Department of the University of Freiburg. The U-Net architecture, with its intricate design and skip connections, has been a powerhouse in achieving precise object delineation. However, as experiments are to be conducted in resource-constrained environments, such as parabolic flights, preferably with real-time applications, there is growing interest in exploring less complex U-Net architectures that balance efficiency and effectiveness. We compare the full-size neural network, three optimized neural networks, the well-known StarDist, and trackpy in terms of accuracy in artificial data analysis. Finally, we determine which of the compact U-Net architectures provides the best balance between efficiency and effectiveness. We also apply the full-size neural network and the most effective compact network to the data of the PK-4 experiment. The experimental data were generated under laboratory conditions.
]]>Journal of Imaging doi: 10.3390/jimaging10020039
Authors: Dimitri B. A. Mantovani Milena S. Pitombeira Phelipi N. Schuck Adriel S. de Araújo Carlos Alberto Buchpiguel Daniele de Paula Faria Ana Maria M. da Silva
This study aims to evaluate non-invasive PET quantification methods for (R)-[11C]PK11195 uptake measurement in multiple sclerosis (MS) patients and healthy controls (HC) in comparison with arterial input function (AIF) using dynamic (R)-[11C]PK11195 PET and magnetic resonance images. The total volume of distribution (VT) and distribution volume ratio (DVR) were measured in the gray matter, white matter, caudate nucleus, putamen, pallidum, thalamus, cerebellum, and brainstem using AIF, the image-derived input function (IDIF) from the carotid arteries, and pseudo-reference regions from supervised clustering analysis (SVCA). Uptake differences between MS and HC groups were tested using statistical tests adjusted for age and sex, and correlations between the results from the different quantification methods were also analyzed. Significant DVR differences were observed in the gray matter, white matter, putamen, pallidum, thalamus, and brainstem of MS patients when compared to the HC group. Also, strong correlations were found in DVR values between non-invasive methods and AIF (0.928 for IDIF and 0.975 for SVCA, p < 0.0001). On the other hand, (R)-[11C]PK11195 uptake could not be differentiated between MS patients and HC using VT values, and a weak correlation (0.356, p < 0.0001) was found between the VT values obtained with AIF and IDIF. Our study shows that the best alternative for AIF is using SVCA for reference region modeling, in addition to a cautious and appropriate methodology.
]]>Journal of Imaging doi: 10.3390/jimaging10020038
Authors: Jan Stepanek Juan M. Farina Ahmed K. Mahmoud Chieh-Ju Chao Said Alsidawi Chadi Ayoub Timothy Barry Milagros Pereyra Isabel G. Scalia Mohammed Tiseer Abbas Rachel E. Wraith Lisa S. Brown Michael S. Radavich Pamela J. Curtisi Patricia C. Hartzendorf Elizabeth M. Lasota Kyley N. Umetsu Jill M. Peterson Kristin E. Karlson Karen Breznak David F. Fortuin Steven J. Lester Reza Arsanjani
Exposure to high altitude results in hypobaric hypoxia, leading to physiological changes in the cardiovascular system that may result in limiting symptoms, including dyspnea, fatigue, and exercise intolerance. However, it is still unclear why some patients are more susceptible to high-altitude symptoms than others. Hypoxic simulation testing (HST) simulates changes in physiology that occur at a specific altitude by asking the patients to breathe a mixture of gases with decreased oxygen content. This study aimed to determine whether the use of transthoracic echocardiography (TTE) during HST can detect the rise in right-sided pressures and the impact of hypoxia on right ventricle (RV) hemodynamics and right-to-left shunts, thus revealing the underlying causes of high-altitude signs and symptoms. A retrospective study was performed including consecutive patients with unexplained dyspnea at high altitude. HSTs were performed by administering reduced FiO2 to simulate altitude levels specific to patients’ history. Echocardiography images were obtained at baseline and during hypoxia. The study included 27 patients with a mean age of 65 years; 14 patients (51.9%) were female. RV systolic pressure increased at peak hypoxia, while RV systolic function declined, as shown by a significant decrease in the tricuspid annular plane systolic excursion (TAPSE), the maximum velocity achieved by the lateral tricuspid annulus during systole (S’ wave), and the RV free wall longitudinal strain. Additionally, a right-to-left shunt was present in 19 (70.4%) patients, as identified by bubble contrast injections. Among these, the severity of the shunt increased at peak hypoxia in eight cases (42.1%), and the shunt was only evident during hypoxia in seven patients (36.8%).
In conclusion, the use of TTE during HST provides valuable information by revealing the presence of symptomatic, sustained shunts and confirming the decline in RV hemodynamics, thus potentially explaining dyspnea at high altitude. Further studies are needed to establish the optimal clinical role of this physiologic method.
]]>Journal of Imaging doi: 10.3390/jimaging10020037
Authors: Lianne Feenstra Stefan D. van der Stel Marcos Da Silva Guimaraes Behdad Dashtbozorg Theo J. M. Ruers
The validation of newly developed optical tissue-sensing techniques for tumor detection during cancer surgery requires an accurate correlation with the histological results. Additionally, such an accurate correlation facilitates precise data labeling for developing high-performance machine learning tissue-classification models. In this paper, a newly developed Point Projection Mapping system will be introduced, which allows non-destructive tracking of the measurement locations on tissue specimens. Additionally, a framework for accurate registration, validation, and labeling with the histopathology results is proposed and validated on a case study. The proposed framework provides a more-robust and accurate method for the tracking and validation of optical tissue-sensing techniques, which saves time and resources compared to the available conventional techniques.
]]>Journal of Imaging doi: 10.3390/jimaging10020036
Authors: Ichiro Kuriki Kazuki Sato Satoshi Shioiri
Head-mounted displays (HMDs) are becoming more and more popular as devices for displaying a virtual reality space, but how real are they? The present study attempted to quantitatively evaluate the degree of reality achieved with HMDs by using a perceptual phenomenon as a measure. Lightness constancy is an ability that is present in human visual perception, in which the perceived reflectance (i.e., the lightness) of objects appears to stay constant across illuminant changes. Studies on color/lightness constancy in humans have shown that the degree of constancy is high, in general, when real objects are used as stimuli. We asked participants to make lightness matches between two virtual environments with different illuminant intensities, as presented in an HMD. The participants’ matches showed a high degree of lightness constancy in the HMD; our results marked no less than 74.2% (84.8% at the maximum) in terms of the constancy index, whereas the average score on the computer screen was around 65%. The effect of head tracking was confirmed by disabling that function: the constancy index dropped significantly, whereas constancy remained equally effective when the virtual environment was generated by replayed motions. HMDs yield a realistic environment, with the extension of the visual scene being accompanied by head motions.
]]>Journal of Imaging doi: 10.3390/jimaging10020035
Authors: Vlad-Octavian Bolocan Mihaela Secareanu Elena Sava Cosmin Medar Loredana Sabina Cornelia Manolescu Alexandru-Ștefan Cătălin Rașcu Maria Glencora Costache George Daniel Radavoi Robert-Andrei Dobran Viorel Jinga
In the original publication [...]
]]>Journal of Imaging doi: 10.3390/jimaging10020034
Authors: Dimitris Kaimaris
In the context of producing a digital surface model (DSM) and an orthophotomosaic of a study area, a modern Unmanned Aerial System (UAS) allows us to reduce the time required both for primary data collection in the field and for data processing in the office. It features sophisticated sensors and systems, is easy to use and its products come with excellent horizontal and vertical accuracy. In this study, the UAS WingtraOne GEN II with RGB sensor (42 Mpixel), multispectral (MS) sensor (1.2 Mpixel) and built-in multi-frequency PPK GNSS antenna (for the high accuracy calculation of the coordinates of the centers of the received images) is used. The first objective is to test and compare the accuracy of the DSMs and orthophotomosaics generated from the UAS RGB sensor images when image processing is performed using only the PPK system measurements (without Ground Control Points (GCPs)), or when processing is performed using only GCPs. For this purpose, 20 GCPs and 20 Check Points (CPs) were measured in the field. The results show that the horizontal accuracy of orthophotomosaics is similar in both processing cases. The vertical accuracy is better in the case of image processing using only the GCPs, although this finding may not generalize, as the survey was conducted at only one location. The second objective is to perform image fusion using the images of the above two UAS sensors and to control the spectral information transferred from the MS to the fused images. The study was carried out at three archaeological sites (Northern Greece). The combined study of the correlation matrix and the ERGAS index value at each location reveals that the process of improving the spatial resolution of MS orthophotomosaics leads to suitable fused images for classification, and therefore image fusion can be performed by utilizing the images from the two sensors.
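The ERGAS index used above to assess the fused images has a standard closed form: 100 times the resolution ratio times the root mean square of the per-band relative RMSE. A small sketch follows; the function name is ours, and the exact ratio convention should be checked against the paper:

```python
import numpy as np

def ergas(reference, fused, ratio):
    """ERGAS (Erreur Relative Globale Adimensionnelle de Synthèse).

    `reference` and `fused` are (bands, H, W) arrays; `ratio` is the
    high-to-low spatial-resolution ratio (e.g., GSD_high / GSD_low).
    Lower values indicate better spectral fidelity; 0 is a perfect match.
    """
    reference = np.asarray(reference, dtype=float)
    fused = np.asarray(fused, dtype=float)
    rmse = np.sqrt(((reference - fused) ** 2).mean(axis=(1, 2)))  # per band
    means = reference.mean(axis=(1, 2))                           # per band
    return 100.0 * ratio * np.sqrt(np.mean((rmse / means) ** 2))
```

Dividing each band's RMSE by its mean makes the index dimensionless, so bands with very different radiometric ranges contribute comparably.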
]]>Journal of Imaging doi: 10.3390/jimaging10020033
Authors: Philipp Schippers Gundula Rösch Rebecca Sohn Matthias Holzapfel Marius Junker Anna E. Rapp Zsuzsa Jenei-Lanzl Philipp Drees Frank Zaucke Andrea Meurer
Collaborative manual image analysis by multiple experts in different locations is an essential workflow in biomedical science. However, sharing the images and writing down results by hand or merging results from separate spreadsheets can be error-prone. Moreover, blinding and anonymization are essential to address subjectivity and bias. Here, we propose a new workflow for collaborative image analysis using a lightweight online tool named Tyche. The new workflow allows experts to access images via temporarily valid URLs and analyze them blind in a random order inside a web browser with the means to store the results in the same window. The results are then immediately computed and visible to the project master. The new workflow could be used for multi-center studies, inter- and intraobserver studies, and score validations.
]]>Journal of Imaging doi: 10.3390/jimaging10020032
Authors: Tibor Sloboda Lukáš Hudec Matej Halinkovič Wanda Benesova
Histological staining is the primary method for confirming cancer diagnoses, but certain types, such as p63 staining, can be expensive and potentially damaging to tissues. In our research, we innovate by generating p63-stained images from H&E-stained slides for metaplastic breast cancer. This is a crucial development, considering the high costs and tissue risks associated with direct p63 staining. Our approach employs an advanced CycleGAN architecture, xAI-CycleGAN, enhanced with context-based loss to maintain structural integrity. The inclusion of convolutional attention in our model distinguishes between structural and color details more effectively, thus significantly enhancing the visual quality of the results. This approach shows a marked improvement over the base xAI-CycleGAN and standard CycleGAN models, offering the benefits of a more compact network and faster training even with the inclusion of attention.
]]>Journal of Imaging doi: 10.3390/jimaging10020031
Authors: Chijioke Emeka Nwokeji Akbar Sheikh-Akbari Anatoliy Gorbenko Iosif Mporas
The successful investigation and prosecution of significant crimes, including child pornography, insurance fraud, movie piracy, traffic monitoring, and scientific fraud, hinge largely on the availability of solid evidence to establish the case beyond any reasonable doubt. When dealing with digital images/videos as evidence in such investigations, there is a critical need to conclusively prove the source camera/device of the questioned image. Extensive research has been conducted in the past decade to address this requirement, resulting in various methods categorized into brand, model, or individual image source camera identification techniques. This paper presents a survey of all those existing methods found in the literature. It thoroughly examines the efficacy of these existing techniques for identifying the source camera of images, utilizing both intrinsic hardware artifacts such as sensor pattern noise and lens optical distortion, and software artifacts like color filter array and auto white balancing. The investigation aims to discern the strengths and weaknesses of these techniques. The paper provides publicly available benchmark image datasets and assessment criteria used to measure the performance of those different methods, facilitating a comprehensive comparison of existing approaches. In conclusion, the paper outlines directions for future research in the field of source camera identification.
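One of the surveyed hardware-artifact approaches, matching a sensor-pattern-noise residual against a camera fingerprint by normalized correlation, can be sketched as below. The box-filter denoiser is a deliberate simplification (real PRNU pipelines use wavelet-based denoising), and all names are hypothetical:

```python
import numpy as np

def noise_residual(img, k=3):
    """Crude noise residual: image minus a k x k box-filtered version.
    Stands in for the wavelet denoiser of real PRNU extraction."""
    img = np.asarray(img, dtype=float)
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    smooth = np.zeros_like(img)
    for i in range(k):
        for j in range(k):
            smooth += p[i:i + img.shape[0], j:j + img.shape[1]]
    return img - smooth / (k * k)

def ncc(a, b):
    """Normalized cross-correlation between two residuals (in [-1, 1])."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

In a full pipeline, a camera fingerprint is estimated by averaging residuals over many images from the same device, and a questioned image is attributed to the camera whose fingerprint yields the highest correlation.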
]]>Journal of Imaging doi: 10.3390/jimaging10020030
Authors: Khadija Aguerchi Younes Jabrane Maryam Habba Amir Hajjam El Hassani
Breast cancer is considered one of the most common types of cancer among females in the world, with a high mortality rate. Medical imaging is still one of the most reliable tools to detect breast cancer. Unfortunately, manual image detection takes much time. This paper proposes a new deep learning method based on Convolutional Neural Networks (CNNs), which are widely used for image classification. However, determining accurate hyperparameters and architectures is still a challenging task. In this work, a highly accurate CNN model to detect breast cancer by mammography was developed. The proposed method is based on the Particle Swarm Optimization (PSO) algorithm, used to search for suitable hyperparameters and the architecture for the CNN model. The CNN model using PSO achieved success rates of 98.23% and 97.98% on the DDSM and MIAS datasets, respectively. The experimental results proved that the proposed CNN model gave the best accuracy values in comparison with other studies in the field. As a result, CNN models for mammography classification can now be created automatically. The proposed method can be considered a powerful technique for breast cancer prediction.
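The PSO search loop itself is standard and can be sketched as below. In the paper, the objective would train and validate a CNN for each particle position; here it is abstracted as an arbitrary function to minimize, and all names and parameter values are illustrative:

```python
import numpy as np

def pso(objective, bounds, n_particles=20, iters=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over a box-constrained space.

    `bounds` is a list of (low, high) pairs, one per dimension (in
    hyperparameter search, e.g., learning rate, layer widths, etc.).
    Returns the best position found and its objective value.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(bounds)
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # personal bests
    pbest_val = np.array([objective(p) for p in pos])
    g_val = pbest_val.min()                              # global best
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        if pbest_val.min() < g_val:
            g_val = pbest_val.min()
            g = pbest[pbest_val.argmin()].copy()
    return g, g_val
```

Because each objective evaluation is a full CNN training run in this setting, the swarm size and iteration count dominate the overall search cost.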
]]>Journal of Imaging doi: 10.3390/jimaging10020029
Authors: Daniel Meneveaux Gianmarco Cherchi
This special issue on geometry reconstruction from images has received much attention from the community, with 10 published papers [...]
]]>Journal of Imaging doi: 10.3390/jimaging10010028
Authors: Shiva Moghtaderi Omid Yaghoobian Khan A. Wahid Kiven Erique Lukong
Endoscopies are helpful for examining internal organs, including the gastrointestinal tract. The endoscope device consists of a flexible tube to which a camera and light source are attached. The diagnostic process heavily depends on the quality of the endoscopic images. That is why the visual quality of endoscopic images has a significant effect on patient care, medical decision-making, and the efficiency of endoscopic treatments. In this study, we propose an endoscopic image enhancement technique based on image fusion. Our method aims to improve the visual quality of endoscopic images by first generating multiple sub-images from the single input image, which are complementary to one another in terms of local and global contrast. Then, each sub-image is subjected to a novel wavelet transform and guided filter-based decomposition technique. To generate the final improved image, appropriate fusion rules are utilized at the end. A set of upper gastrointestinal tract endoscopic images was used in our experiments to confirm the efficacy of our strategy. Both qualitative and quantitative analyses show that the proposed framework performs better than some of the state-of-the-art algorithms.
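A generic fusion rule of the kind mentioned above, averaging the low-frequency base layers and keeping the larger-magnitude detail coefficient per pixel, can be sketched as follows. This shows only the principle, not the paper's exact wavelet/guided-filter rules:

```python
import numpy as np

def fuse_layers(base_a, base_b, detail_a, detail_b):
    """Fuse two decomposed images: average the base (low-frequency)
    layers and keep, per pixel, the detail coefficient with the larger
    absolute value, then recombine."""
    base = 0.5 * (base_a + base_b)
    detail = np.where(np.abs(detail_a) >= np.abs(detail_b),
                      detail_a, detail_b)
    return base + detail
```

The max-absolute rule on detail layers preserves the strongest local contrast from either sub-image, which is why complementary-contrast decompositions fuse into a sharper result than either input alone.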
]]>Journal of Imaging doi: 10.3390/jimaging10010027
Authors: Joanna Gawel Zbigniew Rogulski
The aim of this article is to review the single photon emission computed tomography (SPECT) segmentation methods used in patient-specific dosimetry of 177Lu molecular therapy. Notably, 177Lu-labelled radiopharmaceuticals are currently used in molecular therapy of metastatic neuroendocrine tumours (ligands for somatostatin receptors) and metastatic prostate adenocarcinomas (PSMA ligands). The proper segmentation of the organs at risk and tumours in targeted radionuclide therapy is an important part of the optimisation process of internal patient dosimetry in this kind of therapy. Because this is the first step in dosimetry assessments, on which further dose calculations are based, it is important to know the level of uncertainty that is associated with this part of the analysis. However, the robust quantification of SPECT images, which would ensure accurate dosimetry assessments, is very hard to achieve due to the intrinsic features of this device. In this article, papers on this topic were collected and reviewed to weigh up the advantages and disadvantages of the segmentation methods used in clinical practice. Degrading factors of SPECT images were also studied to assess their impact on the quantification of 177Lu therapy images. Our review of the recent literature gives an insight into this important topic. However, based on the PubMed and IEEE databases, only a few papers investigating segmentation methods in 177Lu molecular therapy were found. Although segmentation is an important step in internal dose calculations, this subject has been relatively lightly investigated for SPECT systems. This is mostly due to the inner features of SPECT. What is more, even when studies are conducted, they usually utilise the diagnostic radionuclide 99mTc and not a therapeutic one like 177Lu, which could be of concern regarding SPECT camera performance and its overall outcome on dosimetry.
]]>Journal of Imaging doi: 10.3390/jimaging10010026
Authors: Stacy A. Doore David Istrati Chenchang Xu Yixuan Qiu Anais Sarrazin Nicholas A. Giudice
The lack of accessible information conveyed by descriptions of art images presents significant barriers for people with blindness and low vision (BLV) to engage with visual artwork. Most museums are not able to easily provide accessible image descriptions for BLV visitors to build a mental representation of artwork due to the vastness of collections, limitations of curator training, and current measures for what constitutes effective automated captions. This paper reports on the results of two studies investigating the types of information that should be included to provide high-quality accessible artwork descriptions based on input from BLV description evaluators. We report on: (1) a qualitative study asking BLV participants for their preferences for layered description characteristics; and (2) an evaluation of several current models for image captioning as applied to an artwork image dataset. We then provide recommendations for researchers working on accessible image captioning and museum engagement applications through a focus on spatial information access strategies.
]]>Journal of Imaging doi: 10.3390/jimaging10010025
Authors: Olga Cherepkova Seyed Ali Amirshahi Marius Pedersen
This paper investigates personalized image quality assessment with a focus on studying individual contrast preferences for natural images. To achieve this objective, we conducted an in-lab experiment with 22 observers who assessed 499 natural images and collected their contrast level preferences. We used a three-alternative forced choice comparison approach coupled with a modified adaptive staircase algorithm to dynamically adjust the contrast for each new triplet. Through cluster analysis, we clustered observers into three groups based on their preferred contrast ranges: low contrast, natural contrast, and high contrast. This finding demonstrates the existence of individual variations in contrast preferences among observers. To facilitate further research in the field of personalized image quality assessment, we have created a database containing 10,978 original contrast level values preferred by observers, which is publicly available online.
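The abstract does not specify the modified staircase rules, so the following is only a generic sketch of how an adaptive staircase adjusts the presented contrast level trial by trial (the 1-up/1-down rule and step size are assumptions, not the authors' algorithm):

```python
# Illustrative sketch of a basic adaptive staircase for contrast adjustment.
# The study used a modified algorithm driven by 3AFC triplets; the exact
# update rules are not given in the abstract, so this 1-up/1-down version
# with a fixed step is a hypothetical stand-in.
def staircase(responses, start=0.5, step=0.1, lo=0.0, hi=1.0):
    """responses: iterable of booleans, True = observer preferred more contrast.
    Returns the contrast level presented after each trial."""
    level, levels = start, []
    for preferred_higher in responses:
        level += step if preferred_higher else -step
        level = min(hi, max(lo, level))   # clamp to the displayable range
        levels.append(round(level, 10))
    return levels

trace = staircase([True, True, False, True])
```

In a real experiment the step size would typically shrink after reversals so the procedure converges on the observer's preferred contrast.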
]]>Journal of Imaging doi: 10.3390/jimaging10010024
Authors: Zejin Zhang Tao Wang Jian Wang Yao Sun
Camouflaged objects are not visually distinct from their surroundings, making the difference between foreground and background easy to overlook and placing higher demands on detection systems. In this paper, we present a new framework for Camouflaged Object Detection (COD) named FSANet, which consists mainly of three operations: spatial detail mining (SDM), cross-scale feature combination (CFC), and hierarchical feature aggregation decoder (HFAD). The framework simulates the three-stage detection process of the human visual mechanism when observing a camouflaged scene. Specifically, we extract five feature layers using the backbone and divide them into two parts with the second layer as the boundary. The SDM module simulates the human cursory inspection of the camouflaged objects to gather spatial details (such as edge, texture, etc.) and fuses the features to create a cursory impression. The CFC module is used to observe high-level features from various viewing angles and extracts the same features by thoroughly filtering features of various levels. We also design side-join multiplication in the CFC module to avoid detail distortion and use feature element-wise multiplication to filter out noise. Finally, we construct an HFAD module to deeply mine effective features from these two stages, direct the fusion of low-level features using high-level semantic knowledge, and improve the camouflage map using hierarchical cascade technology. Compared with nineteen deep-learning-based methods in terms of seven widely used metrics, our proposed framework has clear advantages on four public COD datasets, demonstrating the effectiveness and superiority of our model.
]]>Journal of Imaging doi: 10.3390/jimaging10010023
Authors: Reece Walsh Islam Osman Omar Abdelaziz Mohamed S. Shehata
Few-shot learning aims to identify unseen classes with limited labelled data. Recent few-shot learning techniques have shown success in generalizing to unseen classes; however, their performance has also been shown to degrade in out-of-domain settings. Previous work has also demonstrated an increasing reliance on supervised finetuning in an offline or online capacity. This paper proposes a novel, fully self-supervised few-shot learning technique (FSS) that utilizes a vision transformer and masked autoencoder. The proposed technique can generalize to out-of-domain classes by finetuning the model in a fully self-supervised manner for each episode. We evaluate the proposed technique using three datasets (all out-of-domain). Our results show that FSS has an accuracy gain of 1.05%, 0.12%, and 1.28% on the ISIC, EuroSat, and BCCD datasets, respectively, without the use of supervised training.
]]>Journal of Imaging doi: 10.3390/jimaging10010022
Authors: Purnomo Sidi Priambodo Toto Aminoto Basari Basari
Human body tissue disease diagnosis will become more accurate if transmittance images, such as X-ray images, are separated according to each constituent tissue. This research proposes a new image decomposition technique based on the matrix inverse method for biological tissue images. The fundamental idea of this research is based on the fact that when k different monochromatic lights penetrate a biological tissue, they will experience different attenuation coefficients. Furthermore, the same happens when monochromatic light penetrates k different biological tissues, as they will also experience different attenuation coefficients. The various attenuation coefficients are arranged into a unique k×k-dimensional square matrix. k-many images taken by k-many different monochromatic lights are then merged into an image vector entity; further, a matrix inverse operation is performed on the merged image, producing k-many tissue thickness images of the constituent tissues. This research demonstrates that the proposed method effectively decomposes images of biological objects into separate images, each showing the thickness distributions of different constituent tissues. In the future, this proposed new technique is expected to contribute to supporting medical imaging analysis.
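The per-pixel linear system described above can be sketched as follows, assuming Beer–Lambert attenuation (function names and the synthetic coefficient matrix are illustrative, not from the paper's implementation):

```python
import numpy as np

# Illustrative sketch (not the authors' code): Beer-Lambert attenuation gives,
# for each pixel, log(I0/I_i) = sum_j mu[i, j] * t[j], i.e. b = A @ t,
# where A is the k x k matrix of attenuation coefficients (light i, tissue j)
# and t holds the unknown tissue thicknesses. Inverting A recovers t per pixel.
def decompose(log_attenuation, mu):
    """log_attenuation: (k, H, W) stack of log(I0/I) images.
    mu: (k, k) attenuation-coefficient matrix.
    Returns (k, H, W) thickness images, one per constituent tissue."""
    k, h, w = log_attenuation.shape
    b = log_attenuation.reshape(k, -1)   # merge the k images into a per-pixel vector
    t = np.linalg.inv(mu) @ b            # matrix-inverse operation on the merged image
    return t.reshape(k, h, w)

# Synthetic check with known thicknesses and a made-up coefficient matrix
mu = np.array([[0.5, 0.2], [0.1, 0.4]])
t_true = np.random.rand(2, 4, 4)
b = np.einsum('ij,jhw->ihw', mu, t_true)   # forward model: simulated log-attenuation
t_est = decompose(b, mu)
```

With noise-free data the recovered thickness maps match the ground truth exactly; in practice the conditioning of the coefficient matrix limits how well the inversion tolerates measurement noise.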
]]>Journal of Imaging doi: 10.3390/jimaging10010021
Authors: Xuehai Zhang Wenbo Zhou Kunlin Liu Hao Tang Zhenyu Zhang Weiming Zhang Nenghai Yu
Face swapping is an intriguing and intricate task in the field of computer vision. Currently, most mainstream face swapping methods employ face recognition models to extract identity features and inject them into the generation process. Nonetheless, such methods often struggle to effectively transfer identity information, which leads to generated results failing to achieve a high identity similarity to the source face. Furthermore, if we can accurately disentangle identity information, we can achieve controllable face swapping, thereby providing more choices to users. In pursuit of this goal, we propose a new face swapping framework (ControlFace) based on the disentanglement of identity information. We disentangle the structure and texture of the source face, encoding and characterizing them in the form of feature embeddings separately. According to the semantic level of each feature representation, we inject them into the corresponding feature mapper and fuse them adequately in the latent space of StyleGAN. Owing to such disentanglement of structure and texture, we are able to controllably transfer parts of the identity features. Extensive experiments and comparisons with state-of-the-art face swapping methods demonstrate the superiority of our face swapping framework in terms of transferring identity information, producing high-quality face images, and controllable face swapping.
]]>Journal of Imaging doi: 10.3390/jimaging10010020
Authors: Parvaneh Aliniya Mircea Nicolescu Monica Nicolescu George Bebis
Mass segmentation is one of the fundamental tasks used when identifying breast cancer due to the comprehensive information it provides, including the location, size, and border of the masses. Despite significant improvement in the performance of the task, certain properties of the data, such as pixel class imbalance and the diverse appearance and sizes of masses, remain challenging. Recently, there has been a surge in articles proposing to address pixel class imbalance through the formulation of the loss function. While demonstrating an enhancement in performance, they mostly fail to address the problem comprehensively. In this paper, we propose a new perspective on the calculation of the loss that enables the binary segmentation loss to incorporate the sample-level information and region-level losses in a hybrid loss setting. We propose two variations of the loss to include mass size and density in the loss calculation. Also, we introduce a single loss variant using the idea of utilizing mass size and density to enhance focal loss. We tested the proposed method on benchmark datasets: CBIS-DDSM and INbreast. Our approach outperformed the baseline and state-of-the-art methods on both datasets.
]]>Journal of Imaging doi: 10.3390/jimaging10010019
Authors: Kacoutchy Jean Ayikpa Pierre Gouton Diarra Mamadou Abou Bakary Ballo
The quality of cocoa beans is crucial in influencing the taste, aroma, and texture of chocolate and consumer satisfaction. High-quality cocoa beans are valued on the international market, benefiting Ivorian producers. Our study uses advanced techniques to evaluate and classify cocoa beans by analyzing spectral measurements, integrating machine learning algorithms, and optimizing parameters through genetic algorithms. The results highlight the critical importance of parameter optimization for optimal performance. Logistic regression, support vector machines (SVM), and random forest algorithms demonstrate consistent performance. XGBoost shows improvements in the second generation, followed by a slight decrease in the fifth. On the other hand, the performance of AdaBoost is not satisfactory in generations two and five. The results are presented on three levels: first, using all parameters reveals that logistic regression obtains the best performance with a precision of 83.78%. Then, the results of the parameters selected in the second generation still show logistic regression with the best precision of 84.71%. Finally, the results of the parameters chosen in the fifth generation place random forest in the lead with a score of 74.12%.
]]>Journal of Imaging doi: 10.3390/jimaging10010018
Authors: Wissam AlKendi Franck Gechter Laurent Heyberger Christophe Guyeux
Handwritten Text Recognition (HTR) is essential for digitizing historical documents in different kinds of archives. In this study, we introduce a hybrid form archive written in French: the Belfort civil registers of births. The digitization of these historical documents is challenging due to their unique characteristics such as writing style variations, overlapped characters and words, and marginal annotations. The objective of this survey paper is to summarize research on handwritten text documents and provide research directions toward effectively transcribing this French dataset. To achieve this goal, we present a brief survey of several modern and historical HTR offline systems for different international languages, and the top state-of-the-art contributions reported for the French language specifically. The survey classifies the HTR systems based on techniques employed, datasets used, publication years, and the level of recognition. Furthermore, an analysis of the systems’ accuracies is presented, highlighting the best-performing approach. We also showcase the performance of some commercial HTR systems. In addition, this paper presents a summary of the HTR datasets that are publicly available, especially those identified as benchmark datasets in the International Conference on Document Analysis and Recognition (ICDAR) and the International Conference on Frontiers in Handwriting Recognition (ICFHR) competitions. This paper, therefore, presents updated state-of-the-art research in HTR and highlights new directions in the research field.
]]>Journal of Imaging doi: 10.3390/jimaging10010017
Authors: Sean S. Healy Carl N. Stephan
When an unidentified skeleton is discovered, a video superimposition (VS) of the skull and a facial photograph may be undertaken to assist identification. In the first instance, the method is fundamentally a photographic one, requiring the overlay of two 2D photographic images at transparency for comparison. Presently, mathematical and anatomical techniques used to compare skull/face anatomy dominate superimposition discussions; however, little attention has been paid to the equally fundamental photographic prerequisites that underpin these methods. This predisposes the method to error, as the optical parameters of the two comparison photographs are (presently) rarely matched prior to, or for, comparison. In this paper, we: (1) review the basic but critical photographic prerequisites that apply to VS; (2) propose a replacement for the current anatomy-centric searches for the correct ‘skull pose’ with a photographic-centric camera vantage point search; and (3) demarcate superimposition as a clear two-stage phased procedure that depends first on photographic parameter matching, as a prerequisite to undertaking any anatomical comparison(s).
]]>Journal of Imaging doi: 10.3390/jimaging10010016
Authors: Leon Eversberg Jens Lambrecht
Generating synthetic data is a promising solution to the challenge of limited training data for industrial deep learning applications. However, training on synthetic data and testing on real-world data creates a sim-to-real domain gap. Research has shown that the combination of synthetic and real images leads to better results than those that are generated using only one source of data. In this work, the generation of synthetic training images via physics-based rendering is combined with deep active learning for an industrial object detection task to iteratively improve model performance over time. Our experimental results show that synthetic images improve model performance, especially at the beginning of the model’s life cycle with limited training data. Furthermore, our implemented hybrid query strategy selects diverse and informative new training images in each active learning cycle, which outperforms random sampling. In conclusion, this work presents a workflow to train and iteratively improve object detection models with a small number of real-world images, leading to data-efficient and cost-effective computer vision models.
]]>Journal of Imaging doi: 10.3390/jimaging10010015
Authors: Spiros Papadopoulos Georgia Koukiou Vassilis Anastassopoulos
Existing signatures for various kinds of land cover in different spectral bands, i.e., optical, thermal infrared, and PolSAR, make it possible to infer the land cover type from a single decision in each of the spectral bands. By fusing these decisions, the reliability of the decision regarding each pixel can be radically improved, taking into consideration the correlation of the individual decisions for the specific pixel as well as additional information transferred from the pixel’s neighborhood. Different remotely sensed data contribute their own information regarding the characteristics of the materials lying in each separate pixel. Hyperspectral and multispectral images give analytic information regarding the reflectance of each pixel in a very detailed manner. Thermal infrared images give valuable information regarding the temperature of the surface covered by each pixel, which is very important for recording thermal locations in urban regions. Finally, SAR data provide structural and electrical characteristics of each pixel. Combining information from some of these sources further improves the capability for reliable categorization of each pixel. The necessary mathematical background regarding pixel-based classification and decision fusion methods is analytically presented.
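As a toy illustration of per-pixel decision fusion, the sketch below combines the class decisions of three spectral sources by majority vote; the paper's fusion additionally models decision correlation and neighbourhood information, which this minimal version omits, and all names here are assumptions:

```python
import numpy as np

# Each spectral source (optical, thermal infrared, PolSAR) yields a per-pixel
# integer class map; the fused label is the pixel-wise majority vote.
def majority_fuse(decisions):
    """decisions: (S, H, W) stack of integer class maps, one per source."""
    s, h, w = decisions.shape
    flat = decisions.reshape(s, -1)
    fused = np.array([np.bincount(flat[:, p]).argmax() for p in range(h * w)])
    return fused.reshape(h, w)

# Hypothetical 2x2 decision maps from three sources
optical = np.array([[0, 1], [2, 2]])
thermal = np.array([[0, 1], [1, 2]])
polsar  = np.array([[1, 1], [2, 0]])
fused = majority_fuse(np.stack([optical, thermal, polsar]))
```

Note that `np.bincount(...).argmax()` breaks ties in favour of the smaller class label; a correlation-aware fusion rule, as discussed in the paper, would weight the sources instead of counting them equally.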
]]>Journal of Imaging doi: 10.3390/jimaging10010014
Authors: Tamás Molnár Géza Király
Forest damage has become more frequent in Hungary in the last decades, and remote sensing offers a powerful tool for monitoring it rapidly and cost-effectively. A combined approach was developed that utilises high-resolution ESA Sentinel-2 satellite imagery, Google Earth Engine cloud computing, and field-based forest inventory data. Maps and charts were derived from vegetation indices (NDVI and Z∙NDVI) of satellite images to detect forest disturbances in the Hungarian study site for the period of 2017–2020. The NDVI maps were classified to reveal forest disturbances, and the cloud-based method successfully showed drought and frost damage in the oak-dominated Nagyerdő forest of Debrecen. Differences in the reactions to damage between tree species were visible on the index maps; therefore, a random forest machine learning classifier was applied to show the spatial distribution of dominant species. An accuracy assessment was accomplished with confusion matrices that compared classified index maps to field-surveyed data, demonstrating 99.1% producer, 71% user, and 71% total accuracies for forest damage and 81.9% for tree species. Based on the results of this study and the scalability of Google Earth Engine, the presented method has the potential to be extended to monitor all of Hungary in a faster, more accurate way using systematically collected field data, the latest satellite imagery, and artificial intelligence.
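The NDVI used above is the standard normalized difference of the near-infrared and red reflectances, NDVI = (NIR − Red) / (NIR + Red); a minimal sketch, where the disturbance threshold is a made-up value (the abstract does not state the classification thresholds):

```python
import numpy as np

# Standard NDVI computation; for Sentinel-2 the NIR and red reflectances
# come from bands B8 and B4, respectively.
def ndvi(nir, red, eps=1e-9):
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)   # eps guards against 0/0

# Toy reflectance patches and a purely hypothetical disturbance threshold
nir = np.array([[0.80, 0.30], [0.60, 0.10]])
red = np.array([[0.10, 0.25], [0.20, 0.09]])
index_map = ndvi(nir, red)
disturbed = index_map < 0.2   # assumed threshold, for illustration only
```

Healthy vegetation reflects strongly in the NIR band, so low NDVI over a forest pixel is a plausible disturbance signal, which is the basis of the classified index maps described above.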
]]>Journal of Imaging doi: 10.3390/jimaging10010013
Authors: Shion Ando Ping Yeap Loh
Ultrasound imaging has been used to investigate compression of the median nerve in carpal tunnel syndrome patients. Ultrasound imaging and the extraction of median nerve parameters from ultrasound images are crucial and are usually performed manually by experts. The manual annotation of ultrasound images relies on experience, and intra- and interrater reliability may vary among studies. In this study, two types of convolutional neural networks (CNNs), U-Net and SegNet, were used to extract the median nerve morphology. To the best of our knowledge, the application of these methods to ultrasound imaging of the median nerve has not yet been investigated. Spearman’s correlation and Bland–Altman analyses were performed to investigate the correlation and agreement between manual annotation and CNN estimation, namely, the cross-sectional area, circumference, and diameter of the median nerve. The results showed that the intersection over union (IoU) of U-Net (0.717) was greater than that of SegNet (0.625). A few images in SegNet had an IoU below 0.6, decreasing the average IoU. In both models, the IoU decreased when the median nerve was elongated longitudinally with a blurred outline. The Bland–Altman analysis revealed that, in general, both the U-Net- and SegNet-estimated measurements showed 95% limits of agreement with manual annotation. These results show that these CNN models are promising tools for median nerve ultrasound imaging analysis.
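The IoU reported above is the standard overlap ratio between the predicted and manually annotated binary masks; a minimal sketch (the mask values are toy data, not from the study):

```python
import numpy as np

# Intersection over union (Jaccard index) between two binary segmentation
# masks, as used to compare CNN output against manual annotation.
def iou(pred, target):
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0   # both empty counts as perfect

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = iou(pred, target)   # 2 overlapping pixels out of 4 in the union
```

An IoU of 0.6, the cut-off mentioned above for the weaker SegNet cases, means the overlap covers 60% of the combined area of the two masks.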
]]>Journal of Imaging doi: 10.3390/jimaging10010012
Authors: Keval Thaker Sumanth Chennupati Nathir Rawashdeh Samir A. Rawashdeh
Despite significant strides in achieving vehicle autonomy, robust perception under low-light conditions remains a persistent challenge. In this study, we investigate the potential of multispectral imaging, leveraging deep learning models to enhance object detection performance in the context of nighttime driving. Features encoded from the red, green, and blue (RGB) visual spectrum and thermal infrared images are combined to implement a multispectral object detection model. This has proven to be more effective than using visual channels only, as thermal images provide complementary information when discriminating objects in low-illumination conditions. However, there is a lack of studies on effectively fusing these two modalities for optimal object detection performance. In this work, we present a framework based on the Faster R-CNN architecture with a feature pyramid network. Moreover, we design various fusion approaches using concatenation and addition operators at varying stages of the network to analyze their impact on object detection performance. Our experimental results on the KAIST and FLIR datasets show that our framework outperforms the baseline experiments of the unimodal input source and the existing multispectral object detectors.
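The two fusion operators compared in the study can be illustrated on toy feature maps (the channel counts and shapes below are made up; inside the actual network these would be intermediate Faster R-CNN feature tensors):

```python
import numpy as np

# Toy (channels, H, W) feature maps from the two branches of a
# hypothetical two-stream detector.
rgb_feat = np.ones((8, 4, 4))          # RGB branch features
thermal_feat = 2 * np.ones((8, 4, 4))  # thermal branch features

# Concatenation fusion: channel counts add up, so the next layer must
# accept twice as many input channels.
concat_fused = np.concatenate([rgb_feat, thermal_feat], axis=0)

# Addition fusion: channel count is preserved, but both branches must
# produce feature maps of identical shape.
add_fused = rgb_feat + thermal_feat
```

This shape difference is the practical trade-off between the two operators: concatenation keeps the modalities separable for later layers at the cost of wider convolutions, while addition merges them immediately with no parameter overhead.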
]]>Journal of Imaging doi: 10.3390/jimaging10010011
Authors: Vladimir O. Alekseychuk Andreas Kupsch David Plotzki Carsten Bellon Giovanni Bruno
This study reports a strategy to use sophisticated, realistic X-ray Computed Tomography (CT) simulations to reduce Missing Wedge (MW) and Region-of-Interest (RoI) artifacts in FBP (Filtered Back-Projection) reconstructions. A 3D model of the object is used to simulate the projections that include the missing information inside the MW and outside the RoI. Such information augments the experimental projections, thereby drastically improving the reconstruction results. An X-ray CT dataset of a selected object is modified to mimic various degrees of RoI and MW problems. The results are evaluated in comparison to a standard FBP reconstruction of the complete dataset. In all cases, the reconstruction quality is significantly improved. Small inclusions present in the scanned object are better localized and quantified. The proposed method has the potential to improve the results of any CT reconstruction algorithm.
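Conceptually, the augmentation step keeps the experimental projections wherever they exist and substitutes simulated projections inside the missing wedge or outside the RoI; a minimal sketch with made-up shapes and names (the paper's simulation pipeline is far more sophisticated than this mask-based merge):

```python
import numpy as np

# Merge measured and simulated sinograms before FBP: measured values are
# kept where experimental data exist, and simulated values from the 3D
# model fill the missing-wedge / out-of-RoI regions.
def augment_projections(measured, simulated, measured_mask):
    """measured, simulated: (angles, detectors) sinograms.
    measured_mask: boolean array, True where experimental data exist."""
    return np.where(measured_mask, measured, simulated)

measured = np.full((4, 3), 2.0)     # toy experimental sinogram
simulated = np.full((4, 3), 1.0)    # toy simulated sinogram
mask = np.zeros((4, 3), dtype=bool)
mask[:3] = True                     # last angular row lies in the missing wedge
sino = augment_projections(measured, simulated, mask)
```

The completed sinogram can then be passed to any reconstruction algorithm, which is why the authors note the approach is not tied to FBP.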
]]>Journal of Imaging doi: 10.3390/jimaging10010010
Authors: Donatela Šarić Aditya Suneel Sole
The appearance of a surface depends on four main appearance attributes, namely color, gloss, texture, and translucency. Gloss is an important attribute that people use to understand surface appearance, right after color. In the past decades, extensive research has been conducted in the field of gloss and gloss perception, aiming to understand the complex nature of gloss appearance. This paper reviews the research conducted on gloss and gloss perception, discusses the results, and outlines potential future research. Our primary focus in this review is on research in the field of gloss and the setup of associated psychophysical experiments. Due to the industrial and application-oriented nature of this review, particular attention is paid to the gloss of dielectric materials, a critical aspect in various industries. This review not only summarizes the existing research but also highlights potential avenues for future research in the pursuit of a more comprehensive understanding of gloss perception.
]]>Journal of Imaging doi: 10.3390/jimaging10010009
Authors: Rodolfo Reda Dario Di Nardo Alessio Zanza Valentina Bellanova Rosemary Abbagnale Francesco Pagnoni Maurilio D’Angelo Ajinkya M. Pawar Massimo Galli Luca Testarelli
(1) Knowing the anatomy in advance, in particular the arrangement of the endodontic system, is crucial for successful treatment and for avoiding complications during endodontic therapy. The aim was to find a correlation between a minimally invasive endodontic access, less stressful on Ni-Ti rotary instruments but still allowing correct vision and identification of anatomical reference points, and a simplified typology based on the shape of the pulp chamber in coronal three-dimensional exam views. (2) Based on the inclusion criteria, 104 maxillary molars (52 maxillary first molars and 52 maxillary second molars) were included in the study after 26 Cone Beam Computed Tomography (CBCT) acquisitions (from 15 males and 11 females). Linear measurements were then taken with the CBCT-dedicated software for subsequent analysis. (3) The results of the present study show data similar to those already published on this topic. Pawar and Singh’s simplified classification seems to offer a schematic way of classification that includes almost all of the cases that have been analyzed. (4) A diagnostic examination with a wide Field of View (FOV) and low radiation dose is capable of providing a great deal of clinical information for endodontic treatment. Nevertheless, the endodontic anatomy of the upper second molar represents a major challenge for the clinician due to its complexity both in canal shape and in ramification.
]]>Journal of Imaging doi: 10.3390/jimaging10010008
Authors: Gigi Tăbăcaru Simona Moldovanu Elena Răducan Marian Barbu
Ensemble learning is a process that belongs to the artificial intelligence (AI) field. It helps to choose a robust machine learning (ML) model, usually used for data classification. AI has a strong connection with image processing and feature classification, and it can also be successfully applied to analyzing fundus eye images. Diabetic retinopathy (DR) is a disease that can cause vision loss and blindness and that, from an imaging point of view, can be revealed when screening the eyes. Image processing tools can analyze and extract the features from fundus eye images, and these are combined with ML classifiers that can classify them among different disease classes. The outcomes integrated into automated diagnostic systems can be a real success for physicians and patients. In this study, in the image processing stage, contrast manipulation with the gamma correction parameter was applied because DR affects the blood vessels and the structure of the eyes becomes disorderly. Therefore, the analysis of the texture with two types of entropies was necessary. Shannon and fuzzy entropies and contrast manipulation led to ten original features used in the classification process. The machine learning library PyCaret performs complex tasks, and the empirical process shows that, of the fifteen classifiers, the gradient boosting classifier (GBC) provides the best results. Indeed, the proposed model can classify the DR degrees as normal or severe, achieving an accuracy of 0.929, an F1 score of 0.902, and an area under the curve (AUC) of 0.941. The selected model was validated with a bootstrap statistical technique. The novelty of the study consists of the extraction of features from preprocessed fundus eye images, their classification, and the manipulation of the contrast in a controlled way.
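Two of the feature-extraction steps mentioned above can be sketched as follows; the fuzzy-entropy variant and the exact preprocessing are not specified in the abstract, so the gamma value and histogram settings here are assumptions for illustration:

```python
import numpy as np

# Gamma correction of a normalized fundus image: pixel values in [0, 1]
# are raised to the power gamma, compressing or stretching the contrast.
def gamma_correct(img, gamma):
    return np.power(img, gamma)

# Shannon entropy of the image's grey-level histogram, one of the two
# entropy-based texture descriptors used in the study.
def shannon_entropy(img, bins=256):
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                      # ignore empty bins: 0*log(0) := 0
    return float(-(p * np.log2(p)).sum())

# Toy image: a smooth gradient standing in for a preprocessed fundus image
img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
feat = shannon_entropy(gamma_correct(img, 1.5))
```

Computing the entropy at several gamma settings yields one scalar feature per setting, which is consistent with the ten-feature description above, though the actual feature list is the authors' own.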
]]>Journal of Imaging doi: 10.3390/jimaging10010007
Authors: Théo Barrios Stéphanie Prévost Céline Loscos
In the last decade, many neural network algorithms have been proposed to solve depth reconstruction. Our focus is on reconstruction from images captured by multi-camera arrays, which are grids of vertically and horizontally aligned cameras that are uniformly spaced. Training these networks using supervised learning requires data with ground truth. Existing datasets simulate specific configurations. For example, they represent a fixed-size camera array or a fixed space between cameras. When the distance between cameras is small, the array is said to have a short baseline. Light-field cameras, with a baseline of less than a centimeter, are for instance in this category. On the contrary, an array with large space between cameras is said to have a wide baseline. In this paper, we present a purely virtual data generator to create large training datasets: this generator can adapt to any camera array configuration. Parameters are, for instance, the size (number of cameras) and the distance between two cameras. The generator creates virtual scenes by randomly selecting objects and textures and following user-defined parameters like the disparity range or image parameters (resolution, color space). Generated data are used only for the learning phase. They are unrealistic but can present concrete challenges for disparity reconstruction, such as thin elements, and textures are randomly assigned to objects to avoid color bias. Our experiments focus on the wide-baseline configuration, which requires more data. We validate the generator by testing the generated datasets with known deep-learning approaches as well as depth reconstruction algorithms. The validation experiments have proven successful.
]]>Journal of Imaging doi: 10.3390/jimaging10010006
Authors: Ana Nunes Pedro Serranho Pedro Guimarães João Ferreira Miguel Castelo-Branco Rui Bernardes
Background: Retinal texture has gained momentum as a source of biomarkers of neurodegeneration, as texture analysis of the neuroretina is sensitive to subtle differences in the central nervous system. Sex differences in retina structure, as detected by layer thickness measurements from optical coherence tomography (OCT) data, have been discussed in the literature. However, the effect of sex on retinal interocular differences in healthy adults has been overlooked and remains largely unreported. Methods: We computed mean value fundus images for the neuroretina layers as imaged by OCT of healthy individuals. Texture metrics were obtained from these images to assess whether women and men have the same retina texture characteristics in both eyes. Texture features were tested for group mean differences between the right and left eye. Results: Corrected texture differences exist only in the female group. Conclusions: This work illustrates that the differences between the right and left eyes manifest differently in females and males. This further supports the need for tight control and minute analysis in studies where interocular asymmetry may be used as a disease biomarker, and highlights the potential of texture analysis applied to OCT imaging to spot differences in the retina.
]]>Journal of Imaging doi: 10.3390/jimaging10010005
Authors: Jing Zhang Rémi Synave Samuel Delepoulle Rémi Cozot
The composition of an image is a critical element chosen by the author to construct an image that conveys a narrative and related emotions. Other key elements include framing, lighting, and colors. Assessing classical and simple composition rules in an image, such as the well-known “rule of thirds”, has proven effective in evaluating the aesthetic quality of an image. It is widely acknowledged that composition is emphasized by the presence of leading lines. While these leading lines may not be explicitly visible in the image, they connect key points within the image and can also serve as boundaries between different areas of the image. For instance, the boundary between the sky and the ground can be considered a leading line in the image. Making the image’s composition explicit through a set of leading lines is valuable when analyzing an image or assisting in photography. To the best of our knowledge, no computational method has been proposed to trace image leading lines. We conducted user studies to assess the agreement among image experts when asked to draw leading lines on images. Based on these studies, which demonstrate that experts concur in identifying leading lines, this paper introduces a fully automatic computational method for recovering the leading lines that underlie the image’s composition. Our method consists of two steps: firstly, based on feature detection, potential weighted leading lines are established; secondly, these weighted leading lines are grouped to generate the leading lines of the image. We evaluate our method through both subjective and objective studies, and we propose an objective metric to compare two sets of leading lines.
]]>Journal of Imaging doi: 10.3390/jimaging10010004
Authors: Hannes Mareen Louis De Neve Peter Lambert Glenn Van Wallendael
Image manipulation is easier than ever, often facilitated using accessible AI-based tools. This poses significant risks when used to disseminate disinformation, false evidence, or fraud, which highlights the need for image forgery detection and localization methods to combat this issue. While some recent detection methods demonstrate good performance, there is still a significant gap to be closed to consistently and accurately detect image manipulations in the wild. This paper aims to enhance forgery detection and localization by combining existing detection methods that complement each other. First, we analyze these methods’ complementarity, with an objective measurement of complementariness, and calculation of a target performance value using a theoretical oracle fusion. Then, we propose a novel fusion method that combines the existing methods’ outputs. The proposed fusion method is trained using a Generative Adversarial Network architecture. Our experiments demonstrate improved detection and localization performance on a variety of datasets. Although our fusion method is hindered by a lack of generalization, this is a common problem in supervised learning, and hence a motivation for future work. In conclusion, this work deepens our understanding of forgery detection methods’ complementariness and how to harmonize them. As such, we contribute to better protection against image manipulations and the battle against disinformation.
]]>Journal of Imaging doi: 10.3390/jimaging10010003
Authors: Jing Ng David Arness Ashlee Gronowski Zhonglin Qu Chng Wei Lau Daniel Catchpoole Quang Vinh Nguyen
Biomedical datasets are usually large and complex, containing biological information about a disease. Computational analytics and the interactive visualisation of such data are essential decision-making tools for disease diagnosis and treatment. Oncology data models were observed in a virtual reality environment to analyse gene expression and clinical data from a cohort of cancer patients. The technology enables a new way to view information from the outside in (exocentric view) and the inside out (egocentric view), which is otherwise not possible on ordinary displays. This paper presents a usability study on the exocentric and egocentric views of biomedical data visualisation in virtual reality and their impact on human behaviour and perception. Our study revealed that the performance time was faster in the exocentric view than in the egocentric view. The exocentric view also received higher ease-of-use scores than the egocentric view. However, the influence of usability on time performance was only evident in the egocentric view. The findings of this study could be used to guide future development and refinement of visualisation tools in virtual reality.
]]>Journal of Imaging doi: 10.3390/jimaging10010002
Authors: Guzel Khayretdinova Dominique Apprato Christian Gout
In this paper, we propose a new model for image segmentation under geometric constraints. We define the geometric constraints and we give a minimization problem leading to a variational equation. This new model based on a minimal surface makes it possible to consider many different applications from image segmentation to data approximation.
]]>Journal of Imaging doi: 10.3390/jimaging10010001
Authors: Giansalvo Gusinu Claudia Frau Giuseppe A. Trunfio Paolo Solla Leonardo Antonio Sechi
Currently, Parkinson’s Disease (PD) is diagnosed by expert clinicians, primarily based on symptoms. Neuroimaging exams represent an important tool to confirm the clinical diagnosis. Among them, Brain Parenchyma Sonography (BPS) is used to evaluate the hyperechogenicity of the Substantia Nigra (SN), found in more than 90% of PD patients. In this article, we exploit a new dataset of BPS images to investigate an automatic segmentation approach for the SN that can increase the accuracy of the exam and its practicability in clinical routine. This study achieves state-of-the-art performance in SN segmentation of BPS images. Indeed, it is found that the modified U-Net network scores a Dice coefficient of 0.859 ± 0.037. The results presented in this study demonstrate the feasibility and usefulness of SN automatic segmentation in BPS medical images, to the point that this study can be considered the first stage of the development of an end-to-end CAD (Computer-Aided Detection) system. Furthermore, the used dataset, which will be further enriched in the future, has proven to be very effective in supporting the training of CNNs and may pave the way for future studies in the field of CAD applied to PD.
]]>Journal of Imaging doi: 10.3390/jimaging9120283
Authors: Rossana Buongiorno Giulio Del Corso Danila Germanese Leonardo Colligiani Lorenzo Python Chiara Romei Sara Colantonio
Imaging plays a key role in the clinical management of Coronavirus disease 2019 (COVID-19) as the imaging findings reflect the pathological process in the lungs. The visual analysis of High-Resolution Computed Tomography of the chest allows for the differentiation of parenchymal abnormalities of COVID-19, which must be detected and quantified in order to obtain an accurate disease stratification and prognosis. However, visual assessment and quantification represent a time-consuming task for radiologists. In this regard, tools for semi-automatic segmentation, such as those based on Convolutional Neural Networks, can facilitate the detection of pathological lesions by delineating their contour. In this work, we compared four state-of-the-art Convolutional Neural Networks based on the encoder–decoder paradigm for the binary segmentation of COVID-19 infections after training and testing them on 90 HRCT volumetric scans of patients diagnosed with COVID-19 collected from the database of the Pisa University Hospital. More precisely, we started from a basic model, the well-known UNet, then we added an attention mechanism to obtain an Attention-UNet, and finally we employed a recurrence paradigm to create a Recurrent–Residual UNet (R2-UNet). In the latter case, we also added attention gates to the decoding path of an R2-UNet, thus designing an R2-Attention UNet so as to make the feature representation and accumulation more effective. We compared them to gain understanding of both the cognitive mechanism that can lead a neural model to the best performance for this task and the good compromise between the amount of data, time, and computational resources required. We set up a five-fold cross-validation and assessed the strengths and limitations of these models by evaluating the performances in terms of Dice score, Precision, and Recall defined both on 2D images and on the entire 3D volume.
From the results of the analysis, it can be concluded that Attention-UNet outperforms the other models by achieving the best performance of 81.93%, in terms of 2D Dice score, on the test set. Additionally, we conducted statistical analysis to assess the performance differences among the models. Our findings suggest that integrating the recurrence mechanism within the UNet architecture leads to a decline in the model’s effectiveness for our particular application.
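For readers unfamiliar with the reported metrics, the 2D Dice score, Precision, and Recall can be sketched for a pair of binary masks as follows (an illustrative sketch of the standard definitions, not the paper's implementation):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

def precision_recall(pred: np.ndarray, target: np.ndarray) -> tuple:
    """Pixel-wise Precision and Recall for a predicted binary mask."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    precision = tp / max(int(pred.sum()), 1)   # TP / (TP + FP)
    recall = tp / max(int(target.sum()), 1)    # TP / (TP + FN)
    return float(precision), float(recall)
```

The 3D variants are obtained by applying the same formulas to entire stacked volumes instead of single slices.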
]]>Journal of Imaging doi: 10.3390/jimaging9120282
Authors: Mostafa Daneshgar Rahbar Seyed Ziae Mousavi Mojab
This study introduces EUGNet, an enhanced U-Net that incorporates GridMask image augmentation, a pixel-manipulation technique, to address U-Net’s limitations. EUGNet features a deep contextual encoder, residual connections, class-balancing loss, adaptive feature fusion, a GridMask augmentation module, efficient implementation, and multi-modal fusion. These innovations enhance segmentation accuracy and robustness, making it well-suited for medical image analysis. The GridMask algorithm is detailed, demonstrating its distinct approach to pixel elimination, which enhances model adaptability to occlusions and local features. A comprehensive dataset of robotic surgical scenarios and instruments is used for evaluation, showcasing the framework’s robustness. Specifically, there are improvements of 1.6 percentage points in balanced accuracy for the foreground, 1.7 points in intersection over union (IoU), and 1.7 points in mean Dice similarity coefficient (DSC). Inference speed, a critical factor in real-time applications, also improved markedly: inference time decreased from 0.163 milliseconds for the U-Net without GridMask to 0.097 milliseconds for the U-Net with GridMask.
]]>Journal of Imaging doi: 10.3390/jimaging9120281
Authors: Sahin Coskun Gokce Nur Yilmaz Federica Battisti Musaed Alhussein Saiful Islam
A three-dimensional (3D) video is a special video representation with an artificial stereoscopic vision effect that increases the depth perception of the viewers. The quality of a 3D video is generally measured based on its similarity to the stereoscopic vision obtained with the human vision system (HVS), typically through high-cost and time-consuming subjective tests, which remain necessary because no objective video Quality of Experience (QoE) evaluation method models the HVS. In this paper, we propose a hybrid 3D-video QoE evaluation method based on spatial resolution associated with depth cues (i.e., motion information, blurriness, retinal-image size, and convergence). The proposed method successfully models the HVS by considering the 3D video parameters that directly affect depth perception, which is the most important element of stereoscopic vision. Experimental results show that the measurement of the 3D-video QoE by the proposed hybrid method outperforms the widely used existing methods. It is also found that the proposed method has a high correlation with the HVS. Consequently, the results suggest that the proposed hybrid method can be conveniently utilized for 3D-video QoE evaluation, especially in real-time applications.
]]>Journal of Imaging doi: 10.3390/jimaging9120280
Authors: Vlad-Octavian Bolocan Mihaela Secareanu Elena Sava Cosmin Medar Loredana Sabina Cornelia Manolescu Alexandru-Ștefan Cătălin Rașcu Maria Glencora Costache George Daniel Radavoi Robert-Andrei Dobran Viorel Jinga
(1) Background: Computed tomography (CT) imaging challenges in diagnosing renal cell carcinoma (RCC) include distinguishing malignant from benign tissues and determining the likely subtype. The goal is to show the algorithm’s ability to improve renal cell carcinoma identification and treatment, improving patient outcomes. (2) Methods: This study uses the European Deep-Health toolkit’s Convolutional Neural Network with the ECVL (European Computer Vision Library) and the EDDL (European Distributed Deep Learning Library). Image segmentation utilized the U-net architecture, and classification used resnet101. The model’s clinical efficiency was assessed using kidney and tumor segmentation Dice scores and the quality of renal cell carcinoma categorization. (3) Results: The raw dataset contains 457 healthy right kidneys, 456 healthy left kidneys, 76 pathological right kidneys, and 84 pathological left kidneys. Preparing raw data for analysis was crucial to algorithm implementation. Kidney segmentation performance was 0.84, and the mean Dice score for tumor segmentation was 0.675 for the suggested model. Renal cell carcinoma classification accuracy was 0.885. (4) Conclusion and key findings: The present study focused on analyzing data from both healthy patients and diseased renal patients, with a particular emphasis on data processing. The method achieved a kidney segmentation accuracy of 0.84 and a mean Dice score of 0.675 for tumor segmentation. The system performed well in classifying renal cell carcinoma, achieving an accuracy of 0.885, results that indicate that the technique has the potential to improve the diagnosis of kidney pathology.
]]>Journal of Imaging doi: 10.3390/jimaging9120279
Authors: Suhong Yoo Namhoon Kim
This study presents a methodology for the coarse alignment of light detection and ranging (LiDAR) point clouds, which involves estimating the position and orientation of each station using the pinhole camera model and a position/orientation estimation algorithm. Ground control points are obtained using LiDAR camera images and the point clouds are obtained from the reference station. The estimated position and orientation vectors are used for point cloud registration. To evaluate the accuracy of the results, the positions of the LiDAR and the target were measured using a total station, and a comparison was carried out with the results of semi-automatic registration. The proposed methodology yielded an estimated mean LiDAR position error of 0.072 m, which was similar to the semi-automatic registration value of 0.070 m. When the point clouds of each station were registered using the estimated values, the mean registration accuracy was 0.124 m, while the semi-automatic registration accuracy was 0.072 m. The high accuracy of semi-automatic registration is due to its capability for performing both coarse alignment and refined registration. The comparison between the point cloud with refined alignment using the proposed methodology and the point-to-point distance analysis revealed that the average distance was measured at 0.0117 m. Moreover, 99% of the points exhibited distances within the range of 0.0696 m.
]]>Journal of Imaging doi: 10.3390/jimaging9120278
Authors: Paolo Rota Miguel Angel Guevara Lopez Francesco Setti
In the rapidly evolving field of industrial machine learning, this Special Issue on Industrial Machine Learning Applications aims to shed light on the innovative strides made toward more intelligent, more efficient, and adaptive industrial processes [...]
]]>Journal of Imaging doi: 10.3390/jimaging9120277
Authors: Jiajun Zhang Georgina Cosma Sarah Bugby Jason Watkins
Image retrieval is the process of searching and retrieving images from a datastore based on their visual content and features. Recently, much attention has been directed towards the retrieval of irregular patterns within industrial or healthcare images by extracting features from the images, such as deep features, colour-based features, shape-based features, and local features. This has applications across a spectrum of industries, including fault inspection, disease diagnosis, and maintenance prediction. This paper proposes an image retrieval framework to search for images containing similar irregular patterns by extracting a set of morphological features (DefChars) from images. The datasets employed in this paper contain wind turbine blade images with defects, chest computerised tomography scans with COVID-19 infections, heatsink images with defects, and lake ice images. The proposed framework was evaluated with different feature extraction methods (DefChars, resized raw image, local binary pattern, and scale-invariant feature transforms) and distance metrics to determine the most efficient parameters in terms of retrieval performance across datasets. The retrieval results show that the proposed framework using the DefChars and the Manhattan distance metric achieves a mean average precision of 80% and a low standard deviation of ±0.09 across classes of irregular patterns, outperforming alternative feature–metric combinations across all datasets. Our proposed image retrieval framework performed better (by 8.71%) than Super Global, a state-of-the-art deep-learning-based image retrieval approach, across all datasets.
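The core retrieval step, ranking database images by Manhattan (L1) distance between their feature vectors and the query's, can be sketched as follows (array shapes and function names are illustrative assumptions, not the DefChars implementation):

```python
import numpy as np

def retrieve_top_k(query_feat: np.ndarray, db_feats: np.ndarray, top_k: int = 5):
    """Rank database feature vectors by Manhattan (L1) distance to the query.

    query_feat: shape (d,); db_feats: shape (n, d).
    Returns the indices of the top_k closest images and their distances.
    """
    dists = np.abs(db_feats - query_feat).sum(axis=1)  # L1 distance per database image
    order = np.argsort(dists)[:top_k]                  # indices of the closest matches
    return order, dists[order]
```

Swapping the distance metric (e.g., Euclidean) only changes the `dists` line, which is what makes the feature-metric comparison in the paper straightforward to run.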
]]>Journal of Imaging doi: 10.3390/jimaging9120276
Authors: Rina Buoy Masakazu Iwamura Sovila Srun Koichi Kise
Attention-based encoder–decoder scene text recognition (STR) architectures have proven effective in recognizing text in the real world, thanks to their ability to learn an internal language model. Nevertheless, the cross-attention operation that is used to align visual and linguistic features during decoding is computationally expensive, especially in low-resource environments. To address this bottleneck, we propose a cross-attention-free STR framework that still learns a language model. The framework we propose is ViTSTR-Transducer, which draws inspiration from ViTSTR, a vision transformer (ViT)-based method designed for STR, and the recurrent neural network transducer (RNN-T), initially introduced for speech recognition. The experimental results show that our ViTSTR-Transducer models outperform the baseline attention-based models in terms of the required decoding floating point operations (FLOPs) and latency while achieving a comparable level of recognition accuracy. Compared with the baseline context-free ViTSTR models, our proposed models achieve superior recognition accuracy. Furthermore, compared with the recent state-of-the-art (SOTA) methods, our proposed models deliver competitive results.
]]>Journal of Imaging doi: 10.3390/jimaging9120275
Authors: Amal El Kaid Karim Baïna
Three-dimensional human pose estimation has made significant advancements through the integration of deep learning techniques. This survey provides a comprehensive review of recent 3D human pose estimation methods, with a focus on monocular images, videos, and multi-view cameras. Our approach stands out through a systematic literature review methodology, ensuring an up-to-date and meticulous overview. Unlike many existing surveys that categorize approaches based on learning paradigms, our survey offers a fresh perspective, delving deeper into the subject. For image-based approaches, we not only follow existing categorizations but also introduce and compare significant 2D models. Additionally, we provide a comparative analysis of these methods, enhancing the understanding of image-based pose estimation techniques. In the realm of video-based approaches, we categorize them based on the types of models used to capture inter-frame information. Furthermore, in the context of multi-person pose estimation, our survey uniquely differentiates between approaches focusing on relative poses and those addressing absolute poses. Our survey aims to serve as a pivotal resource for researchers, highlighting state-of-the-art deep learning strategies and identifying promising directions for future exploration in 3D human pose estimation.
]]>Journal of Imaging doi: 10.3390/jimaging9120274
Authors: Maryam Zamanian Giorgio Treglia Iraj Abedi
Due to the importance of the correct and timely diagnosis of bone metastases in advanced breast cancer (BrC), we performed a meta-analysis evaluating the diagnostic accuracy of [18F]FDG or Na[18F]F PET(/CT) and PET(/MRI) versus [99mTc]Tc-diphosphonate bone scintigraphy (BS). The PubMed, Embase, Scopus, and Scholar electronic databases were searched. The results of the selected studies were analyzed using pooled sensitivity and specificity, the diagnostic odds ratio (DOR), positive and negative likelihood ratios (LR+ and LR−), and summary receiver-operating characteristic (SROC) curves. Eleven studies including 753 BrC patients were included in the meta-analysis. The patient-based pooled values of sensitivity, specificity, and area under the SROC curve (AUC) for BS (with 95% confidence intervals) were 90% (86–93), 91% (87–94), and 0.93, respectively. These indices for [18F]FDG PET(/CT) were 92% (88–95), 99% (96–100), and 0.99, respectively, and for Na[18F]F PET(/CT) were 96% (90–99), 81% (72–88), and 0.99, respectively. BS has good diagnostic performance in detecting BrC bone metastases. However, due to the higher and more balanced sensitivity and specificity of [18F]FDG PET(/CT) compared to BS and Na[18F]F PET(/CT), and its advantage in evaluating extra-skeletal lesions, [18F]FDG PET(/CT) should be the preferred multimodal imaging method for evaluating bone metastases of BrC, if available.
]]>Journal of Imaging doi: 10.3390/jimaging9120273
Authors: Yanxia Zheng Xiyuan Qian
Go is a game that can be won or lost based on the number of intersections surrounded by black or white pieces. The traditional counting method is manual, which is time-consuming and error-prone. In addition, the generalization of current Go-image-recognition methods is poor, and their accuracy needs to be further improved. To solve these problems, a Go-game image-recognition method based on an improved pix2pix was proposed. Firstly, a channel-coordinate mixed-attention (CCMA) mechanism was designed by effectively combining channel attention and coordinate attention, enabling the model to learn the target feature information. Secondly, in order to obtain long-distance contextual information, a deep dilated-convolution (DDC) module was proposed, which densely links dilated convolutions with different dilation rates. The experimental results showed that, compared with other existing Go-image-recognition methods, such as DenseNet, VGG-16, and Yolo v5, the proposed method could effectively improve the generalization ability and accuracy of a Go-image-recognition model, and the average accuracy rate was over 99.99%.
]]>Journal of Imaging doi: 10.3390/jimaging9120272
Authors: Sewon Lim Hayun Nam Hyemin Shin Sein Jeong Kyuseok Kim Youngjin Lee
In this study, we aimed to address the issue of noise amplification after scatter correction when using a virtual grid in breast X-ray images. To achieve this, we suggested an algorithm for estimating the noise level and developed a noise reduction algorithm based on generative adversarial networks (GANs). Breast X-ray images with synthetic scatter were collected using Sizgraphy equipment, and scatter correction was performed using dedicated software. After scatter correction, we determined the level of noise using noise-level function plots and trained a GAN using 42 noise combinations. Subsequently, we obtained the resulting images and quantitatively evaluated their quality by measuring the contrast-to-noise ratio (CNR), coefficient of variance (COV), and normalized noise–power spectrum (NNPS). The evaluation revealed an improvement in the CNR by approximately 2.80%, an enhancement in the COV by 12.50%, and an overall improvement in the NNPS across all frequency ranges. In conclusion, the application of our GAN-based noise reduction algorithm effectively reduced noise and yielded improved-quality breast X-ray images.
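The CNR and COV figures used in this kind of evaluation can be computed from signal and background regions of interest roughly as follows (one common set of definitions; the paper's exact formulas and ROI choices may differ):

```python
import numpy as np

def cnr(signal_roi: np.ndarray, background_roi: np.ndarray) -> float:
    """Contrast-to-noise ratio: contrast between ROIs over background noise."""
    return float(abs(signal_roi.mean() - background_roi.mean()) / background_roi.std())

def cov(roi: np.ndarray) -> float:
    """Coefficient of variation: relative dispersion within a single ROI."""
    return float(roi.std() / roi.mean())
```

A noise reduction method that works should raise the CNR (less noise in the denominator) and lower the COV of a uniform region.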
]]>Journal of Imaging doi: 10.3390/jimaging9120271
Authors: Johan Jönemo Anders Eklund
Brain age prediction from 3D MRI volumes using deep learning has recently become a popular research topic, as brain age has been shown to be an important biomarker. Training deep networks can be very computationally demanding for large datasets like the U.K. Biobank (currently 29,035 subjects). In our previous work, it was demonstrated that using a few 2D projections (mean and standard deviation along three axes) instead of each full 3D volume leads to much faster training at the cost of a reduction in prediction accuracy. Here, we investigated if another set of 2D projections, based on higher-order statistical central moments and eigenslices, leads to a higher accuracy. Our results show that higher-order moments do not lead to a higher accuracy, but that eigenslices provide a small improvement. We also show that an ensemble of such models provides further improvement.
]]>Journal of Imaging doi: 10.3390/jimaging9120270
Authors: Alessandro Wollek Sardi Hyska Bastian Sabel Michael Ingrisch Tobias Lasser
Public chest X-ray (CXR) data sets are commonly compressed to a lower bit depth to reduce their size, potentially hiding subtle diagnostic features. In contrast, radiologists apply a windowing operation to the uncompressed image to enhance such subtle features. While it has been shown that windowing improves classification performance on computed tomography (CT) images, the impact of such an operation on CXR classification performance remains unclear. In this study, we show that windowing strongly improves the CXR classification performance of machine learning models and propose WindowNet, a model that learns multiple optimal window settings. Our model achieved an average AUC score of 0.812 compared with the 0.759 score of a commonly used architecture without windowing capabilities on the MIMIC data set.
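The windowing operation referred to above, which maps a (center, width) intensity range to the displayable range, can be sketched as follows (a generic radiological windowing sketch, not WindowNet's learned multi-window variant):

```python
import numpy as np

def apply_window(img: np.ndarray, center: float, width: float) -> np.ndarray:
    """Clip intensities to [center - width/2, center + width/2] and rescale to [0, 1]."""
    lo, hi = center - width / 2.0, center + width / 2.0
    return (np.clip(img, lo, hi) - lo) / (hi - lo)
```

Intensities outside the window saturate to 0 or 1, which is exactly what concentrates display contrast on the diagnostically relevant range.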
]]>Journal of Imaging doi: 10.3390/jimaging9120269
Authors: Ammara Ammara Ghulam Abbas Francesco V. Pepe Muhammad Afzaal Muhammad Qamar Abdul Ghuffar
Nanoslits have various applications, including localized surface plasmon resonance (LSPR)-based nanodevices, optical biosensors, superfocusing, high-efficiency refractive index sensors, and chip-based protein detection. In this study, the effect of substrates on the optical properties of gold nanoslits placed in free space is discussed; for this purpose, BK7 glass and Al2O3 are used as substrates, and the wavelength of the incident light is set to 650 nm. The optical properties, power flow, and electric field enhancement of gold nanoslits are investigated using the finite element method (FEM) in COMSOL Multiphysics software. The effect of the polarization of an incident electromagnetic wave as it propagates from a gold nanoslit is also analyzed. As a special case, the effect of the glass and alumina substrates on the magnetic field, power flow, and electric field enhancement is discussed. The goal of this research is to investigate the phenomena of power flow and electric field enhancement. The study of power flow in gold nanoslits provides valuable insights into the behavior of light at the nanoscale and offers opportunities for developing novel applications in the fields of nanophotonics and plasmonics. The results of this study show the significance of gold nanoslits as optical nanosensors.
]]>Journal of Imaging doi: 10.3390/jimaging9120268
Authors: Rodrigo Dalvit Carvalho da Silva Ramin Soltanzadeh Chase R. Figley
Coronary artery disease is one of the leading causes of death worldwide, and medical imaging methods such as coronary artery computed tomography are vitally important in its detection. More recently, various computational approaches have been proposed to automatically extract important coronary artery features (e.g., vessel centerlines, cross-sectional areas along vessel branches, etc.) that may ultimately be able to assist with more accurate and timely diagnoses. The current study therefore validated and benchmarked a recently developed automated 3D centerline extraction method for coronary artery centerline tracking using synthetically segmented coronary artery models based on the widely used Rotterdam Coronary Artery Algorithm Evaluation Framework (RCAAEF) training dataset. Based on standard accuracy metrics and the ground truth centerlines of all 32 coronary vessel branches in the RCAAEF training dataset, this 3D divide-and-conquer Voronoi diagram method performed exceptionally well, achieving an average overlap accuracy (OV) of 99.97%, overlap until first error (OF) of 100%, overlap of the clinically relevant portion of the vessel (OT) of 99.98%, and an average error distance inside the vessels (AI) of only 0.13 mm. Accuracy was also found to be exceptionally high for all four coronary artery sub-types, with average OV values of 99.99% for right coronary arteries, 100% for left anterior descending arteries, 99.96% for left circumflex arteries, and 100% for large side-branch vessels. These results validate that the proposed method can be employed to quickly, accurately, and automatically extract 3D centerlines from segmented coronary arteries, and indicate that it is likely worthy of further exploration given the importance of this topic.
]]>Journal of Imaging doi: 10.3390/jimaging9120267
Authors: Evangelia Siomou Dimitrios K. Filippiadis Efstathios P. Efstathopoulos Ioannis Antonakos George S. Panayiotakis
This study establishes typical Diagnostic Reference Levels (DRL) values and assesses patient doses in computed tomography (CT)-guided biopsy procedures. The Effective Dose (ED), Entrance Skin Dose (ESD), and Size-Specific Dose Estimate (SSDE) were calculated using the relevant literature-derived conversion factors. A retrospective analysis of 226 CT-guided biopsies across five categories (Iliac bone, liver, lung, mediastinum, and para-aortic lymph nodes) was conducted. Typical DRL values were computed as median distributions, following guidelines from the International Commission on Radiological Protection (ICRP) Publication 135. DRLs for helical mode CT acquisitions were set at 9.7 mGy for Iliac bone, 8.9 mGy for liver, 8.8 mGy for lung, 7.9 mGy for mediastinal mass, and 9 mGy for para-aortic lymph nodes biopsies. In contrast, DRLs for biopsy acquisitions were 7.3 mGy, 7.7 mGy, 5.6 mGy, 5.6 mGy, and 7.4 mGy, respectively. Median SSDE values varied from 7.6 mGy to 10 mGy for biopsy acquisitions and from 11.3 mGy to 12.6 mGy for helical scans. Median ED values ranged from 1.6 mSv to 5.7 mSv for biopsy scans and from 3.9 mSv to 9.3 mSv for helical scans. The study highlights the significance of using DRLs for optimizing CT-guided biopsy procedures, revealing notable variations in radiation exposure between helical scans covering entire anatomical regions and localized biopsy acquisitions.
]]>Journal of Imaging doi: 10.3390/jimaging9120266
Authors: Luca Zedda Andrea Loddo Cecilia Di Ruberto
Malaria is a potentially fatal infectious disease caused by the Plasmodium parasite. The mortality rate can be significantly reduced if the condition is diagnosed and treated early. However, in many underdeveloped countries, the detection of malaria parasites from blood smears is still performed manually by experienced hematologists. This process is time-consuming and error-prone. In recent years, deep-learning-based object-detection methods have shown promising results in automating this task, which is critical to ensure diagnosis and treatment in the shortest possible time. In this paper, we propose a novel Transformer- and attention-based object-detection architecture designed to detect malaria parasites with high efficiency and precision, focusing on detecting several parasite sizes. The proposed method was tested on two public datasets, namely MP-IDB and IML. The evaluation results demonstrated a mean average precision exceeding 83.6% on distinct Plasmodium species within MP-IDB and reaching nearly 60% on IML. These findings underscore the effectiveness of our proposed architecture in automating malaria parasite detection, offering a potential breakthrough in expediting diagnosis and treatment processes.
]]>Journal of Imaging doi: 10.3390/jimaging9120265
Authors: Renato R. Maaliw
The advancement of medical prognoses hinges on the delivery of timely and reliable assessments. Conventional methods of assessment and diagnosis, often reliant on human expertise, lead to inconsistencies due to professionals’ subjectivity, knowledge, and experience. To address these problems head-on, we harnessed the power of artificial intelligence to introduce a transformative solution. We leveraged convolutional neural networks to engineer our SCOLIONET architecture, which can accurately identify Cobb angle measurements. Empirical testing on our pipeline demonstrated a mean segmentation accuracy of 97.50% (Sorensen–Dice coefficient) and 96.30% (Intersection over Union), indicating the model’s proficiency in outlining vertebrae. The level of quantification accuracy was attributed to the state-of-the-art design of the atrous spatial pyramid pooling, which better segments images. We also compared physicians’ manual evaluations against our machine-driven measurements to further validate our approach’s practicality and reliability. The results were remarkable, with a p-value (t-test) of 0.1713 and an average acceptable deviation of 2.86 degrees, suggesting no significant difference between the two methods. Our work holds the promise of enabling medical practitioners to conduct scoliosis examinations swiftly and consistently, improving and advancing the quality of patient care.
]]>Journal of Imaging doi: 10.3390/jimaging9120264
Authors: Thawatchai Prabsattroo Kanokpat Wachirasirikul Prasit Tansangworn Puengjai Punikhom Waraporn Sudchai
Computed tomography examinations can deliver high radiation doses to patients, especially CT scans of the brain. This study aimed to optimize the radiation dose and image quality of adult brain CT protocols. Images were acquired using a Catphan 700 phantom. Radiation doses were recorded as CTDIvol and the dose length product (DLP). CT brain protocols were optimized by varying parameters such as kVp, mAs, the signal-to-noise ratio (SNR) level, and Clearview iterative reconstruction (IR). The image quality was also evaluated using AutoQA Plus v.1.8.7.0 software. CT number accuracy and linearity had a robust positive correlation with the linear attenuation coefficient (µ) and showed more inaccurate CT numbers when using 80 kVp. The modulation transfer function (MTF) showed a higher value in the 100 and 120 kVp protocols (p < 0.001), while high-contrast spatial resolution showed a higher value in the 80 and 100 kVp protocols (p < 0.001). Low-contrast detectability and the contrast-to-noise ratio (CNR) tended to increase when using high mAs, a high SNR level, and the Clearview IR protocol. Noise decreased when using a high radiation dose and a high percentage of Clearview IR. CTDIvol and DLP increased with increasing kVp, mAs, and SNR levels, while an increasing percentage of Clearview did not affect the radiation dose. Optimized protocols, balancing radiation dose and image quality, should be evaluated to preserve diagnostic capability. The recommended parameter settings include a kVp set between 100 and 120 kVp, mAs ranging from 200 to 300 mAs, an SNR level within the range of 0.7–1.0, and an iterative reconstruction value of 30% Clearview to 60% or higher.
]]>Journal of Imaging doi: 10.3390/jimaging9120263
Authors: Ahmad Ihsan Khairul Muttaqin Rahmatul Fajri Mursyidah Mursyidah Islam Md Rizwanul Fattah
In this paper, we introduce a new and advanced multi-feature selection method for bacterial classification that uses the salp swarm algorithm (SSA). We improve the SSA’s performance by using opposition-based learning (OBL) and a local search algorithm (LSA). The proposed method has three main stages, which automate the categorization of bacteria based on their unique characteristics. The method uses a multi-feature selection approach augmented by an enhanced version of the SSA. The enhancements include using OBL to increase population diversity during the search process and LSA to address local optimization problems. The improved salp swarm algorithm (ISSA) is designed to optimize multi-feature selection by increasing the number of selected features and improving classification accuracy. We compare the ISSA’s performance to that of several other algorithms on ten different test datasets. The results show that the ISSA outperforms the other algorithms in terms of classification accuracy on three datasets with 19 features, achieving an accuracy of 73.75%. Additionally, the ISSA excels at determining the optimal number of features and producing a better fit value, with a classification error rate of 0.249. Therefore, the ISSA method is expected to make a significant contribution to solving feature selection problems in bacterial analysis.
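The opposition-based learning step used to diversify the salp population can be sketched in a few lines; this is a generic illustration of OBL on a continuous search space, not the authors' ISSA implementation:

```python
import numpy as np

def opposition(population, lower, upper):
    """Opposition-based learning: mirror each candidate solution across
    the centre of the search interval, x_opp = lower + upper - x."""
    return lower + upper - population

# Toy feature-selection setting: 5 candidates, 4 features in [0, 1].
rng = np.random.default_rng(0)
lower, upper = 0.0, 1.0
pop = rng.random((5, 4))
opp = opposition(pop, lower, upper)

# Evaluating both the population and its opposition, then keeping the
# fitter half, doubles search diversity without extra random sampling.
combined = np.vstack([pop, opp])
print(combined.shape)  # (10, 4)
```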
]]>Journal of Imaging doi: 10.3390/jimaging9120262
Authors: María Villa-Monedero Manuel Gil-Martín Daniel Sáez-Trigueros Andrzej Pomirski Rubén San-Segundo
Several sign language datasets are available in the literature. Most of them are designed for sign language recognition and translation. This paper presents a new sign language dataset for automatic motion generation. This dataset includes phonemes for each sign (specified in HamNoSys, a transcription system developed at the University of Hamburg, Hamburg, Germany) and the corresponding motion information. The motion information includes sign videos and the sequence of extracted landmarks associated with relevant points of the skeleton (including face, arms, hands, and fingers). The dataset includes signs from three different subjects in three different positions, performing 754 signs including the entire alphabet, numbers from 0 to 100, numbers for hour specification, months, and weekdays, and the most frequent signs used in Spanish Sign Language (LSE). In total, there are 6786 videos and their corresponding phonemes (HamNoSys annotations). From each video, a sequence of landmarks was extracted using MediaPipe. The dataset allows training an automatic system for motion generation from sign language phonemes. This paper also presents preliminary results in motion generation from sign phonemes obtaining a Dynamic Time Warping distance per frame of 0.37.
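The per-frame Dynamic Time Warping distance used to evaluate motion generation can be sketched as follows; the frame-to-frame cost and the per-frame normalisation here are generic assumptions, not necessarily the exact configuration used by the authors:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two landmark sequences
    (frames x features), using Euclidean frame-to-frame cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Dividing by the reference length gives a per-frame figure that is
# comparable across signs of different durations.
ref = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
gen = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.1]])
print(dtw_distance(ref, gen) / len(ref))
```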
]]>Journal of Imaging doi: 10.3390/jimaging9120261
Authors: Dimitris Kalatzis Ellas Spyratou Maria Karnachoriti Maria Anthi Kouri Ioannis Stathopoulos Nikolaos Danias Nikolaos Arkadopoulos Spyros Orfanoudakis Ioannis Seimenis Athanassios G. Kontos Efstathios P. Efstathopoulos
Raman spectroscopy (RS) techniques are attracting attention in the medical field as a promising tool for real-time biochemical analyses. The integration of artificial intelligence (AI) algorithms with RS has greatly enhanced its ability to accurately classify spectral data in vivo. This combination has opened up new possibilities for precise and efficient analysis in medical applications. In this study, Raman spectra were collected from healthy and cancerous specimens of 22 patients who underwent open colorectal surgery. Using these spectral data, we investigate an optimal preprocessing pipeline for statistical analysis with AI techniques. This exploration entails proposing preprocessing methods and algorithms to enhance classification outcomes. The research encompasses a thorough ablation study comparing machine learning and deep learning algorithms to advance the clinical applicability of RS. The results indicate substantial accuracy improvements from techniques such as baseline correction, L2 normalization, filtering, and PCA, yielding an overall accuracy enhancement of 15.8%. In comparing various algorithms, machine learning models such as XGBoost and Random Forest demonstrate effectiveness in classifying both normal and abnormal tissues. Similarly, deep learning models such as 1D-Resnet, and particularly the 1D-CNN model, exhibit superior performance in classifying abnormal cases. This research contributes valuable insights into the integration of AI in medical diagnostics and expands the potential of RS methods for achieving accurate malignancy classification.
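A spectral preprocessing chain of the kind studied here can be sketched on synthetic data; the crude minimum-subtraction baseline step below is a stand-in assumption (real pipelines typically use polynomial or asymmetric-least-squares baseline correction), and only the L2 normalisation and PCA steps mirror the techniques named above:

```python
import numpy as np

def preprocess(spectra, n_components=3):
    """Sketch of a spectral preprocessing chain: crude baseline removal,
    L2 normalisation, then PCA via SVD on the centred data."""
    x = spectra - spectra.min(axis=1, keepdims=True)  # crude baseline removal
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # L2 normalisation
    x = x - x.mean(axis=0)                            # centre for PCA
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T                    # PCA scores

rng = np.random.default_rng(1)
spectra = rng.random((22, 500))  # 22 spectra x 500 wavenumber bins
scores = preprocess(spectra)
print(scores.shape)  # (22, 3)
```

The low-dimensional PCA scores would then feed a downstream classifier such as XGBoost or a 1D-CNN.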
]]>Journal of Imaging doi: 10.3390/jimaging9120260
Authors: Dara Molloy Brian Deegan Darragh Mullins Enda Ward Jonathan Horgan Ciaran Eising Patrick Denny Edward Jones Martin Glavin
In advanced driver assistance systems (ADAS) or autonomous vehicle research, acquiring semantic information about the surrounding environment generally relies heavily on camera-based object detection. Image signal processors (ISPs) in cameras are generally tuned for human perception. In most cases, ISP parameters are selected subjectively and the resulting image differs depending on the individual who tuned it. While the installation of cameras on cars started as a means of providing a view of the vehicle’s environment to the driver, cameras are increasingly becoming part of safety-critical object detection systems for ADAS. Deep learning-based object detection has become prominent, but the performance impact of varying the ISP parameters is unknown. In this study, we analyze the performance of 14 popular object detection models in the context of changes in the ISP parameters. We consider eight ISP blocks: demosaicing, gamma, denoising, edge enhancement, local tone mapping, saturation, contrast, and hue angle. We investigate two raw datasets, PASCALRAW and a custom raw dataset collected from an ADAS perspective. We found that deviating from the default ISP parameters degrades object detection performance and that the models differ in their sensitivity to varying ISP parameters. Finally, we propose a novel methodology that increases object detection model robustness via ISP-variation data augmentation.
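One axis of the ISP-variation data augmentation idea can be sketched with the gamma block; the gamma range sampled below is a hypothetical choice for illustration, and the paper's augmentation covers the other ISP blocks (saturation, denoising, tone mapping, etc.) as well:

```python
import numpy as np

def gamma_augment(image, gamma):
    """Re-apply a different gamma curve to a [0, 1] image, mimicking
    an ISP tuned differently from the default."""
    return np.clip(image, 0.0, 1.0) ** gamma

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))  # toy RGB image in [0, 1]
# Sample a gamma per training image instead of fixing the default curve,
# so the detector sees a spread of plausible ISP tunings.
augmented = gamma_augment(img, gamma=rng.uniform(0.7, 1.4))
print(augmented.shape)
```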
]]>Journal of Imaging doi: 10.3390/jimaging9120259
Authors: Kaushik Roy Christian Simon Peyman Moghadam Mehrtash Harandi
Lifelong learning portrays learning gradually in nonstationary environments and emulates the process of human learning, which is efficient, robust, and able to learn new concepts incrementally from sequential experience. To equip neural networks with such a capability, one needs to overcome the problem of catastrophic forgetting, the phenomenon of forgetting past knowledge while learning new concepts. In this work, we propose a novel knowledge distillation algorithm that makes use of contrastive learning to help a neural network preserve its past knowledge while learning from a series of tasks. Our generalized contrastive distillation strategy tackles catastrophic forgetting of old knowledge, minimizes semantic drift by maintaining a similar embedding space, and ensures compactness in the feature distribution to accommodate novel tasks in the current model. Our comprehensive study shows that our method achieves improved performance in challenging class-incremental, task-incremental, and domain-incremental supervised learning scenarios.
]]>Journal of Imaging doi: 10.3390/jimaging9120258
Authors: Xingshuo Peng Keyuan Wang Zelin Zhang Nan Geng Zhiyi Zhang
The phenotyping of plant growth enriches our understanding of intricate genetic characteristics, paving the way for advancements in modern breeding and precision agriculture. Within the domain of phenotyping, segmenting 3D point clouds of plant organs is the basis for extracting plant phenotypic parameters. In this study, we introduce a novel method for point-cloud downsampling that adeptly mitigates the challenges posed by sample imbalance. We then design a deep learning framework founded on the principles of SqueezeNet for the segmentation of plant point clouds. In addition, we use time series as input variables, which effectively improves the segmentation accuracy of the network. Building on the semantic segmentation, the MeanShift algorithm is employed to perform instance segmentation on the point-cloud data of crops. In semantic segmentation, the average Precision, Recall, F1-score, and IoU for maize reached 99.35%, 99.26%, 99.30%, and 98.61%, while those for tomato reached 97.98%, 97.92%, 97.95%, and 95.98%. In instance segmentation, the accuracy for maize and tomato reached 98.45% and 96.12%, respectively. This research holds the potential to advance the fields of plant phenotypic extraction, ideotype selection, and precision agriculture.
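The MeanShift step that turns semantic labels into instances can be illustrated with a minimal implementation on toy 3D points; real pipelines would typically use an optimised library implementation and tune the bandwidth to organ size, so treat this purely as a sketch of the mechanism:

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, n_iter=30):
    """Minimal MeanShift: shift each point to the mean of its neighbours
    within `bandwidth` until the modes stabilise, then group points that
    share a mode into one instance."""
    modes = points.copy()
    for _ in range(n_iter):
        for i, p in enumerate(modes):
            mask = np.linalg.norm(points - p, axis=1) < bandwidth
            modes[i] = points[mask].mean(axis=0)
    # Points whose modes coincide (within a tolerance) form one instance.
    labels = np.full(len(points), -1)
    centres = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centres):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centres.append(m)
            labels[i] = len(centres) - 1
    return labels

# Two well-separated toy "organ" clusters of 3D points.
pts = np.vstack([np.random.default_rng(2).normal(0, 0.1, (20, 3)),
                 np.random.default_rng(3).normal(5, 0.1, (20, 3))])
labels = mean_shift(pts, bandwidth=1.0)
print(len(set(labels.tolist())))  # 2 instances
```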
]]>Journal of Imaging doi: 10.3390/jimaging9120257
Authors: Thomas Gerhard Wolf Samuel Basmaci Sven Schumann Andrea Lisa Waber
The aim of this study was to examine the root canal morphology of mandibular second premolars (Mn2P) of a mixed Swiss-German population by means of micro-computed tomography (micro-CT). The root canal configurations (RCCs) of 102 Mn2Ps were investigated using a micro-CT unit (µCT 40; SCANCO Medical AG, Brüttisellen, Switzerland) with 3D imaging software (VGStudio Max 2.2; Volume Graphics GmbH, Heidelberg, Germany) and described with a four-digit system code indicating the main root canal from the coronal to the apical third and the number of main foramina. A total of 12 different RCCs were detected. 1-1-1/1 (54.9%) was the most frequently observed RCC, followed by 1-1-1/2 (14.7%), 1-1-2/2 (10.8%), 1-2-2/2 (4.9%), 1-1-3/3 (3.9%), 1-1-1/3 (2.9%), and 2-1-1/1 (2.9%), with 1-1-2/3, 1-2-1/2, 2-1-2/2, 1-1-2/5, and 1-1-1/4 each observed less frequently (1.0%). No accessory foramina were present in 35.3% of teeth, one in 35.3%, two in 21.6%, three and four in 2.9% each, and five in 2.0%. Accessory root canals were present in the apical third of 55.9% of Mn2Ps and in the middle third of 8.8%. Connecting canals were observed less frequently, in 6.9% in the apical third and 2.9% in the middle third, with no accessory or connecting canals in the coronal third. Every tenth tooth showed three or more main foramina. Almost two thirds of the sample showed accessory root canals, predominantly in the apical third. The mainly single-rooted sample of Mn2Ps showed less frequent morphological diversification than Mn1Ps.
]]>Journal of Imaging doi: 10.3390/jimaging9120256
Authors: Roser Viñals Jean-Philippe Thiran
Ultrafast ultrasound imaging, characterized by high frame rates, generates low-quality images. Convolutional neural networks (CNNs) have demonstrated great potential to enhance image quality without compromising the frame rate. However, CNNs have been mostly trained on simulated or phantom images, leading to suboptimal performance on in vivo images. In this study, we present a method to enhance the quality of single plane wave (PW) acquisitions using a CNN trained on in vivo images. Our contribution is twofold. Firstly, we introduce a training loss function that accounts for the high dynamic range of the radio frequency data and uses the Kullback–Leibler divergence to preserve the probability distributions of the echogenicity values. Secondly, we conduct an extensive performance analysis on a large new in vivo dataset of 20,000 images, comparing the predicted images to the target images resulting from the coherent compounding of 87 PWs. Applying a volunteer-based dataset split, the peak signal-to-noise ratio and structural similarity index measure increase, respectively, from 16.466 ± 0.801 dB and 0.105 ± 0.060, calculated between the single PW and target images, to 20.292 ± 0.307 dB and 0.272 ± 0.040, between predicted and target images. Our results demonstrate significant improvements in image quality, effectively reducing artifacts.
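The distribution-preserving idea behind the loss can be sketched as a histogram-based Kullback–Leibler term; the binning and the histogram formulation below are illustrative assumptions, and the paper's actual loss additionally accounts for the high dynamic range of the RF data:

```python
import numpy as np

def kl_echogenicity_loss(pred, target, bins=32, eps=1e-8):
    """Histogram the echogenicity values of predicted and target images
    over a shared range and penalise their KL divergence KL(p || q)."""
    lo = min(pred.min(), target.min())
    hi = max(pred.max(), target.max())
    p, _ = np.histogram(target, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(pred, bins=bins, range=(lo, hi), density=True)
    p, q = p + eps, q + eps           # avoid log(0) on empty bins
    p, q = p / p.sum(), q / q.sum()   # renormalise to probabilities
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(4)
target = rng.normal(0.0, 1.0, (64, 64))  # toy echogenicity maps
pred = rng.normal(0.0, 2.0, (64, 64))
print(kl_echogenicity_loss(target, target))  # 0.0 for identical images
```

A differentiable variant (e.g. with soft histogram binning) would be needed to use such a term directly in CNN training.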
]]>Journal of Imaging doi: 10.3390/jimaging9120255
Authors: Julian Meißner Michael Kisiel Nagarajan M. Thoppey Michael M. Morlock Sebastian Bannwarth
Three-dimensional body scanners are attracting increasing interest in various application areas. To evaluate their accuracy, their 3D point clouds must be compared to a reference system using a reference object. Since different scanning systems use different coordinate systems, an alignment is required for their evaluation. However, this process can result in translational and rotational misalignment. To understand the effects of alignment errors on the accuracy of measured circumferences of the human lower body, such misalignment is simulated in this paper and the resulting characteristic error patterns are analyzed. The results show that the total error consists of two components, a translational and a tilt component. Linear correlations were found between the translational error (R2 = 0.90–0.97) and the change in circumferences, as well as between the tilt error (R2 = 0.55–0.78) and the change in the body’s mean outline. Finally, through systematic analysis of the error patterns, recommendations were derived and applied to 3D body scans of human subjects, resulting in error reductions of 67% and 84%.
]]>