Search Results (1,318)

Search Parameters:
Keywords = RGB camera

31 pages, 34773 KB  
Article
Learning Domain-Invariant Representations for Event-Based Motion Segmentation: An Unsupervised Domain Adaptation Approach
by Mohammed Jeryo and Ahad Harati
J. Imaging 2025, 11(11), 377; https://doi.org/10.3390/jimaging11110377 - 27 Oct 2025
Abstract
Event cameras provide microsecond temporal resolution, high dynamic range, and low latency by asynchronously capturing per-pixel luminance changes, thereby introducing a novel sensing paradigm. These advantages render them well-suited for high-speed applications such as autonomous vehicles and dynamic environments. Nevertheless, the sparsity of event data and the absence of dense annotations are significant obstacles to supervised learning for motion segmentation from event streams. Domain adaptation is also challenging due to the considerable domain shift in intensity images. To address these challenges, we propose a two-phase cross-modality adaptation framework that translates motion segmentation knowledge from labeled RGB-flow data to unlabeled event streams. A dual-branch encoder extracts modality-specific motion and appearance features from RGB and optical flow in the source domain. Using reconstruction networks, event voxel grids are converted into pseudo-image and pseudo-flow modalities in the target domain. These modalities are subsequently re-encoded using frozen RGB-trained encoders. Multi-level consistency losses are implemented on features, predictions, and outputs to enforce domain alignment. Our design enables the model to acquire domain-invariant, semantically rich features through the use of shallow architectures, thereby reducing training costs and facilitating real-time inference with a lightweight prediction path. The proposed architecture, alongside the utilized hybrid loss function, effectively bridges the domain and modality gap. We evaluate our method on two challenging benchmarks: EVIMO2, which incorporates real-world dynamics, high-speed motion, illumination variation, and multiple independently moving objects; and MOD++, which features complex object dynamics, collisions, and dense 1kHz supervision in synthetic scenes. The proposed UDA framework achieves 83.1% and 79.4% accuracy on EVIMO2 and MOD++, respectively, outperforming existing state-of-the-art approaches, such as EV-Transfer and SHOT, by up to 3.6%. Additionally, it is lighter and faster and also delivers enhanced mIoU and F1 Score. Full article
(This article belongs to the Section Image and Video Processing)
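
As a rough illustration of the multi-level consistency alignment described in this abstract, the sketch below combines feature-, prediction-, and output-level terms into one loss. It is a minimal PyTorch example under assumed tensor shapes and weightings, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): multi-level consistency losses for
# aligning event-derived pseudo-modalities with frozen RGB/flow encoders.
# Weightings and tensor layouts are illustrative assumptions.
import torch
import torch.nn.functional as F

def consistency_loss(feat_src, feat_tgt, pred_src, pred_tgt, out_src, out_tgt,
                     w_feat=1.0, w_pred=1.0, w_out=1.0):
    """Align features, intermediate predictions, and final segmentation outputs."""
    l_feat = F.mse_loss(feat_tgt, feat_src.detach())              # feature-level
    l_pred = F.kl_div(pred_tgt.log_softmax(dim=1),                # prediction-level
                      pred_src.softmax(dim=1), reduction="batchmean")
    l_out = F.binary_cross_entropy_with_logits(out_tgt, out_src.sigmoid())  # output-level
    return w_feat * l_feat + w_pred * l_pred + w_out * l_out
```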

37 pages, 14970 KB  
Article
Research on Strawberry Visual Recognition and 3D Localization Based on Lightweight RAFS-YOLO and RGB-D Camera
by Kaixuan Li, Xinyuan Wei, Qiang Wang and Wuping Zhang
Agriculture 2025, 15(21), 2212; https://doi.org/10.3390/agriculture15212212 - 24 Oct 2025
Viewed by 166
Abstract
Improving the accuracy and real-time performance of strawberry recognition and localization algorithms remains a major challenge in intelligent harvesting. To address this, this study presents an integrated approach for strawberry maturity detection and 3D localization that combines a lightweight deep learning model with an RGB-D camera. Built upon the YOLOv11 framework, an enhanced RAFS-YOLO model is developed, incorporating three core modules to strengthen multi-scale feature fusion and spatial modeling capabilities. Specifically, the CRA module enhances spatial relationship perception through cross-layer attention, the HSFPN module performs hierarchical semantic filtering to suppress redundant features, and the DySample module dynamically optimizes the upsampling process to improve computational efficiency. By integrating the trained model with RGB-D depth data, the method achieves precise 3D localization of strawberries through coordinate mapping based on detection box centers. Experimental results indicate that RAFS-YOLO surpasses YOLOv11n, improving precision, recall, and mAP@50 by 4.2%, 3.8%, and 2.0%, respectively, while reducing parameters by 36.8% and computational cost by 23.8%. The 3D localization attains millimeter-level precision, with average RMSE values ranging from 0.21 to 0.31 cm across all axes. Overall, the proposed approach achieves a balance between detection accuracy, model efficiency, and localization precision, providing a reliable perception framework for intelligent strawberry-picking robots. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
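
The 3D localization step, mapping a detection box centre plus the RGB-D depth value to camera coordinates, reduces to pinhole back-projection. A minimal sketch, assuming metric depth and known intrinsics fx, fy, cx, cy; the function name and inputs are illustrative, not the paper's code.

```python
# Minimal sketch: back-project a detection box centre into 3D camera coordinates
# using an RGB-D depth map and pinhole intrinsics (assumed inputs).
import numpy as np

def localize_box_center(box_xyxy, depth_m, fx, fy, cx, cy):
    x1, y1, x2, y2 = box_xyxy
    u, v = int((x1 + x2) / 2), int((y1 + y2) / 2)   # detection box centre (pixels)
    Z = float(depth_m[v, u])                         # depth at the centre (metres)
    X = (u - cx) * Z / fx                            # pinhole back-projection
    Y = (v - cy) * Z / fy
    return np.array([X, Y, Z])
```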

25 pages, 6045 KB  
Article
Energy-Aware Sensor Fusion Architecture for Autonomous Channel Robot Navigation in Constrained Environments
by Mohamed Shili, Hicham Chaoui and Khaled Nouri
Sensors 2025, 25(21), 6524; https://doi.org/10.3390/s25216524 - 23 Oct 2025
Viewed by 308
Abstract
Navigating autonomous robots in confined channels is inherently challenging due to limited space, dynamic obstacles, and energy constraints. Existing sensor fusion strategies often consume excessive power because all sensors remain active regardless of environmental conditions. This paper presents an energy-aware adaptive sensor fusion framework for channel robots that deploys RGB cameras, laser range finders, and IMU sensors according to environmental complexity. Sensor data are fused using an adaptive Extended Kalman Filter (EKF), which selectively integrates multi-sensor information to maintain high navigation accuracy while minimizing energy consumption. An energy management module dynamically adjusts sensor activation and computational load, enabling significant reductions in power consumption while preserving navigation reliability. The proposed system is implemented on a low-power microcontroller and evaluated through simulations and prototype testing in constrained channel environments. Results show a 35% reduction in energy consumption with minimal impact on navigation performance, demonstrating the framework’s effectiveness for long-duration autonomous operations in pipelines, sewers, and industrial ducts. Full article
(This article belongs to the Section Sensors and Robotics)
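
To make the idea of energy-aware adaptive fusion concrete, the sketch below gates sensors on a scene-complexity score and applies a standard (here simplified, linear) Kalman measurement update per active sensor. The thresholds, noise models, and complexity measure are assumptions, not the published design.

```python
# Illustrative sketch only: gate sensors by environmental complexity, then fuse
# the active measurements with a simplified Kalman measurement update.
import numpy as np

def select_sensors(complexity):
    """Activate more sensors as the environment gets harder (assumed thresholds)."""
    active = {"imu": True}                     # IMU always on (low power)
    active["laser"] = complexity > 0.3         # range finder in cluttered sections
    active["camera"] = complexity > 0.6        # RGB camera only when really needed
    return active

def kalman_update(x, P, z, H, R):
    """Measurement update applied for each active sensor."""
    y = z - H @ x                              # innovation
    S = H @ P @ H.T + R                        # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```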

20 pages, 2508 KB  
Article
An Attention-Enhanced Network for Person Re-Identification via Appearance–Gait Fusion
by Zelong Yu, Yixiang Cai, Hanming Xu, Lei Chen, Mingqian Yang, Huabo Sun and Xiangyu Zhao
Electronics 2025, 14(21), 4142; https://doi.org/10.3390/electronics14214142 - 22 Oct 2025
Viewed by 239
Abstract
The objective of person re-identification (Re-ID) is to recognize a given target pedestrian across different cameras. However, perspective variations, resulting from differences in shooting angles, often significantly impact the accuracy of person Re-ID. To address this issue, this paper presents an attention-enhanced person Re-ID algorithm based on appearance–gait information interaction. Specifically, appearance features and gait features are first extracted from RGB images and gait energy images (GEIs), respectively, using two ResNet-50 networks. Then, a multimodal information exchange module based on the attention mechanism is designed to build a bridge for information exchange between the two modalities during the feature extraction process. This module enhances feature extraction through mutual guidance and reinforcement between the two modalities, thereby improving the model’s effectiveness in integrating the two types of modal information. Subsequently, to further balance the signal-to-noise ratio, importance weight estimation is employed to map perspective information into importance weights for the two features. Finally, based on an autoencoder structure, the two features are weighted and fused under the guidance of these importance weights to generate fused features that are robust to perspective changes. Experimental results on the CASIA-B dataset indicate that, under viewpoint variation, the proposed method achieved an average accuracy of 94.9%, 1.1% higher than the next best method, and obtained the smallest variance of 4.199, suggesting that it is not only more accurate but also more stable. Full article
(This article belongs to the Special Issue Artificial Intelligence and Microsystems)
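
A minimal PyTorch sketch of the importance-weighted fusion idea: a small head maps the concatenated appearance and gait features to two weights that gate the fusion. The dimensions, the weighting head, and the omission of the autoencoder stage are simplifying assumptions, not the authors' architecture.

```python
# Illustrative importance-weighted fusion of appearance and gait features.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, dim=2048):
        super().__init__()
        self.weight_head = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))

    def forward(self, f_app, f_gait):
        w = self.weight_head(torch.cat([f_app, f_gait], dim=-1))   # importance weights
        return w[:, :1] * f_app + w[:, 1:] * f_gait                # weighted fusion

# usage: fused = WeightedFusion()(appearance_feat, gait_feat)  # both (B, 2048)
```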

28 pages, 10678 KB  
Article
Deep-DSO: Improving Mapping of Direct Sparse Odometry Using CNN-Based Single-Image Depth Estimation
by Erick P. Herrera-Granda, Juan C. Torres-Cantero, Israel D. Herrera-Granda, José F. Lucio-Naranjo, Andrés Rosales, Javier Revelo-Fuelagán and Diego H. Peluffo-Ordóñez
Mathematics 2025, 13(20), 3330; https://doi.org/10.3390/math13203330 - 19 Oct 2025
Viewed by 332
Abstract
In recent years, SLAM, visual odometry, and structure-from-motion approaches have widely addressed the problems of 3D reconstruction and ego-motion estimation. Of the many input modalities that can be used to solve these ill-posed problems, the pure visual alternative using a single monocular RGB camera has attracted the attention of multiple researchers due to its low cost and widespread availability in handheld devices. One of the best proposals currently available is the Direct Sparse Odometry (DSO) system, which has demonstrated the ability to accurately recover trajectories and depth maps using monocular sequences as the only source of information. Given the impressive advances in single-image depth estimation using neural networks, this work proposes an extension of the DSO system, named DeepDSO. DeepDSO effectively integrates the state-of-the-art NeW CRF neural network as a depth estimation module, providing depth prior information for each candidate point. This reduces the point search interval over the epipolar line. This integration improves the DSO algorithm’s depth point initialization and allows each proposed point to converge faster to its true depth. Experimentation carried out in the TUM-Mono dataset demonstrated that adding the neural network depth estimation module to the DSO pipeline significantly reduced rotation, translation, scale, start-segment alignment, end-segment alignment, and RMSE errors. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
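
The benefit of the depth prior is that it shrinks the interval searched along the epipolar line. A minimal sketch of that step in inverse-depth terms, with an assumed relative uncertainty; the notation and defaults are illustrative rather than taken from DSO or DeepDSO.

```python
# Illustrative sketch: use a CNN depth prior to narrow the inverse-depth
# interval searched along the epipolar line for a candidate point.
def search_interval(prior_depth, rel_uncertainty=0.2, full_range=(0.01, 10.0)):
    """Return (min_idepth, max_idepth); falls back to the full range without a prior."""
    if prior_depth is None or prior_depth <= 0:
        return full_range
    d_min = prior_depth * (1.0 - rel_uncertainty)
    d_max = prior_depth * (1.0 + rel_uncertainty)
    return (1.0 / d_max, 1.0 / d_min)      # inverse depth: near/far bounds swap
```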

22 pages, 3941 KB  
Article
A Novel Approach of Pig Weight Estimation Using High-Precision Segmentation and 2D Image Feature Extraction
by Yan Chen, Zhiye Li, Ling Yin and Yingjie Kuang
Animals 2025, 15(20), 2975; https://doi.org/10.3390/ani15202975 - 14 Oct 2025
Viewed by 421
Abstract
In modern livestock production, obtaining accurate body weight measurements for pigs is essential for feeding management and economic assessment, yet conventional weighing is laborious and can stress animals. To address these limitations, we developed a contactless image-based pipeline that first uses BiRefNet for high-precision background removal and YOLOv11-seg to extract the pig dorsal mask from top-view RGB images; from these masks we designed and extracted 17 representative phenotypic features (for example, dorsal area, convex hull area, major/minor axes, curvature metrics and Hu moments) and included camera height as a calibration input. We then compared eight machine-learning and deep-learning regressors to map features to body weight. The segmentation pipeline achieved mAP@50–95 = 0.995 on the validation set, and the XGBoost regressor gave the best test performance (MAE = 3.9350 kg, RMSE = 5.2372 kg, R² = 0.9814). These results indicate the method provides accurate, low-cost and computationally efficient weight prediction from simple RGB images, supporting frequent, noninvasive monitoring and practical deployment in smart-farming settings. Full article
(This article belongs to the Section Pigs)
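
The regression stage, mapping the 17 shape descriptors plus camera height to body weight with XGBoost, can be sketched as below. File names, feature layout, and hyperparameters are placeholders, not the study's settings.

```python
# Minimal sketch of the feature-to-weight regression stage (assumed data files).
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

X = np.load("pig_features.npy")      # hypothetical file: (N, 18) shape + camera-height features
y = np.load("pig_weights_kg.npy")    # hypothetical file: (N,) ground-truth body weights

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBRegressor(n_estimators=400, max_depth=6, learning_rate=0.05)
model.fit(X_tr, y_tr)
mae = np.abs(model.predict(X_te) - y_te).mean()
print(f"MAE: {mae:.2f} kg")
```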

23 pages, 10835 KB  
Article
Evaluation of Post-Fire Treatments (Erosion Barriers) on Vegetation Recovery Using RPAS and Sentinel-2 Time-Series Imagery
by Fernando Pérez-Cabello, Carlos Baroja-Saenz, Raquel Montorio and Jorge Angás-Pajas
Remote Sens. 2025, 17(20), 3422; https://doi.org/10.3390/rs17203422 - 13 Oct 2025
Viewed by 331
Abstract
Post-fire soil and vegetation changes can intensify erosion and sediment yield by altering the factors controlling the runoff–infiltration balance. Erosion barriers (EBs) are widely used in hydrological and forest restoration to mitigate erosion, reduce sediment transport, and promote vegetation recovery. However, precise spatial assessments of their effectiveness remain scarce, requiring validation through operational methodologies. This study evaluates the impact of EB on post-fire vegetation recovery at two temporal and spatial scales: (1) Remotely Piloted Aircraft System (RPAS) imagery, acquired at high spatial resolution but limited to a single acquisition date coinciding with the field flight. These data were captured using a MicaSense RedEdge-MX multispectral camera and an RGB optical sensor (SODA), from which NDVI and vegetation height were derived through aerial photogrammetry and digital surface models (DSMs). (2) Sentinel-2 satellite imagery, offering coarser spatial resolution but enabling multi-temporal analysis, through NDVI time series spanning four consecutive years. The study was conducted in the area of the Luna Fire (northern Spain), which burned in July 2015. A paired sampling design compared upstream and downstream areas of burned wood stacks and control sites using NDVI values and vegetation height. Results showed slightly higher NDVI values (0.45) upstream of the EB (p < 0.05), while vegetation height was, on average, ~8 cm lower than in control sites (p > 0.05). Sentinel-2 analysis revealed significant differences in NDVI distributions between treatments (p < 0.05), although mean values were similar (~0.32), both showing positive trends over four years. This study offers indirect insight into the functioning and effectiveness of EB in post-fire recovery. The findings highlight the need for continued monitoring of treated areas to better understand environmental responses over time and to inform more effective land management strategies. Full article
(This article belongs to the Special Issue Remote Sensing for Risk Assessment, Monitoring and Recovery of Fires)
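
Both the RPAS and Sentinel-2 analyses rest on NDVI, which is computed per pixel from the red and near-infrared bands with the standard formula; a short sketch with assumed array inputs:

```python
# NDVI = (NIR - Red) / (NIR + Red), computed per pixel (standard formula;
# the array names and epsilon guard are assumptions).
import numpy as np

def ndvi(nir, red, eps=1e-6):
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (nir - red) / (nir + red + eps)   # values in [-1, 1]
```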

38 pages, 1548 KB  
Perspective
RGB-D Cameras and Brain–Computer Interfaces for Human Activity Recognition: An Overview
by Grazia Iadarola, Alessandro Mengarelli, Sabrina Iarlori, Andrea Monteriù and Susanna Spinsante
Sensors 2025, 25(20), 6286; https://doi.org/10.3390/s25206286 - 10 Oct 2025
Viewed by 696
Abstract
This paper provides a perspective on the use of RGB-D cameras and non-invasive brain–computer interfaces (BCIs) for human activity recognition (HAR). It then explores the potential of integrating both technologies for active and assisted living (AAL). RGB-D cameras can offer monitoring of users in their living environments, preserving their privacy in human activity recognition through depth images and skeleton tracking. Concurrently, non-invasive BCIs can provide access to users’ intent and control by decoding neural signals. The synergy between these technologies may allow a holistic understanding of both the physical context and the cognitive state of users, to enhance personalized assistance inside smart homes. Successful deployment of the two technologies together requires addressing critical technical hurdles, including the computational demands of real-time multi-modal data processing and user acceptance challenges related to data privacy, security, and BCI illiteracy. Continued interdisciplinary research is essential to realize the full potential of RGB-D cameras and BCIs as AAL solutions, in order to improve the quality of life for independent or impaired people. Full article
(This article belongs to the Special Issue Computer Vision-Based Human Activity Recognition)

19 pages, 8850 KB  
Article
Intelligent Defect Recognition of Glazed Components in Ancient Buildings Based on Binocular Vision
by Youshan Zhao, Xiaolan Zhang, Ming Guo, Haoyu Han, Jiayi Wang, Yaofeng Wang, Xiaoxu Li and Ming Huang
Buildings 2025, 15(20), 3641; https://doi.org/10.3390/buildings15203641 - 10 Oct 2025
Viewed by 200
Abstract
Glazed components in ancient Chinese architecture hold profound historical and cultural value. However, over time, environmental erosion, physical impacts, and human disturbances gradually lead to various forms of damage, severely impacting the durability and stability of the buildings. Preventive protection of glazed components is therefore crucial, and its key lies in the early detection and repair of damage, which extends the component’s service life and prevents significant structural damage. To address this challenge, this study proposes a Restoration-Scale Identification (RSI) method that integrates depth information. By combining RGB-D images acquired from a depth camera with the intrinsic camera parameters, and embedding a Convolutional Block Attention Module (CBAM) into the backbone network, the method dynamically enhances critical feature regions. It then employs a scale restoration strategy to accurately identify damage areas and recover the physical dimensions of glazed components from a global perspective. In addition, we constructed a dedicated semantic segmentation dataset for glazed tile damage, focusing on cracks and spalling. Both qualitative and quantitative evaluations demonstrate that, compared with various high-performance semantic segmentation methods, our approach significantly improves the accuracy and robustness of damage detection in glazed components. The achieved accuracy deviates by only ±10 mm from high-precision laser scanning, a level of precision essential for reliably identifying and assessing subtle damage in complex glazed architectural elements. By integrating depth information, real-scale information is obtained during recognition, so the type and size of damage to glazed components can be identified efficiently and accurately and two-dimensional (2D) pixel coordinates can be converted to local three-dimensional (3D) coordinates. This provides a scientific basis for the protection and restoration of ancient buildings and supports the long-term stability of cultural heritage and the transmission of its historical value. Full article
(This article belongs to the Section Building Materials, and Repair & Renovation)
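
The scale-restoration idea, recovering physical size from pixel extent, depth, and camera intrinsics, follows the similar-triangles relation of the pinhole model. A minimal sketch with assumed variable names, not the paper's code:

```python
# Recover the physical extent of a segmented damage region from its pixel
# extent, the depth at the region, and the camera intrinsics (assumed inputs).
def pixel_extent_to_mm(pixel_width, pixel_height, depth_mm, fx, fy):
    width_mm = pixel_width * depth_mm / fx     # similar-triangles relation
    height_mm = pixel_height * depth_mm / fy
    return width_mm, height_mm
```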

24 pages, 6407 KB  
Article
Lightweight SCC-YOLO for Winter Jujube Detection and 3D Localization with Cross-Platform Deployment Evaluation
by Meng Zhou, Yaohua Hu, Anxiang Huang, Yiwen Chen, Xing Tong, Mengfei Liu and Yunxiao Pan
Agriculture 2025, 15(19), 2092; https://doi.org/10.3390/agriculture15192092 - 8 Oct 2025
Viewed by 317
Abstract
Harvesting winter jujubes is a key step in production, yet traditional manual approaches are labor-intensive and inefficient. To overcome these challenges, we propose SCC-YOLO, a lightweight method for winter jujube detection, 3D localization, and cross-platform deployment, aiming to support intelligent harvesting. In this study, RGB-D cameras were integrated with an improved YOLOv11 network optimized by ShuffleNetV2, CBAM, and a redesigned C2f_WTConv module, which enables joint spatial–frequency feature modeling and enhances small-object detection in complex orchard conditions. The model was trained on a diversified dataset with extensive augmentation to ensure robustness. In addition, the original localization loss was replaced with DIoU to improve bounding box regression accuracy. A robotic harvesting system was developed, and an Eye-to-Hand calibration-based 3D localization pipeline was implemented to map fruit coordinates to the robot workspace for accurate picking. To validate engineering applicability, the SCC-YOLO model was deployed on both desktop (PyTorch and ONNX Runtime) and mobile (NCNN with Vulkan+FP16) platforms, and FPS, latency, and stability were comparatively analyzed. Experimental results showed that SCC-YOLO improved mAP by 5.6% over YOLOv11, significantly enhanced detection precision and robustness, and achieved real-time performance on mobile devices while maintaining peak throughput on high-performance desktops. Field and laboratory tests confirmed the system’s effectiveness for detection, localization, and harvesting efficiency, demonstrating its adaptability to diverse deployment environments and its potential for broader agricultural applications. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
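
The Eye-to-Hand step amounts to applying the calibrated base-from-camera transform to each fruit point detected in the camera frame. A minimal sketch with an assumed 4×4 homogeneous matrix, not the authors' pipeline:

```python
# Map a fruit point from the camera frame into the robot base frame using the
# Eye-to-Hand calibration result (assumed to be a 4x4 homogeneous transform).
import numpy as np

def camera_to_base(p_cam_xyz, T_base_cam):
    """p_cam_xyz: (3,) point in camera frame; T_base_cam: (4, 4) calibration matrix."""
    p_h = np.append(np.asarray(p_cam_xyz, dtype=float), 1.0)   # homogeneous point
    return (T_base_cam @ p_h)[:3]                               # point in robot base frame
```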

36 pages, 20759 KB  
Article
Autonomous UAV Landing and Collision Avoidance System for Unknown Terrain Utilizing Depth Camera with Actively Actuated Gimbal
by Piotr Łuczak and Grzegorz Granosik
Sensors 2025, 25(19), 6165; https://doi.org/10.3390/s25196165 - 5 Oct 2025
Viewed by 759
Abstract
Autonomous landing capability is crucial for fully autonomous UAV flight. Currently, most solutions use either color imaging from a camera pointed down, lidar sensors, dedicated landing spots, beacons, or a combination of these approaches. Classical strategies can be limited by either no color data when lidar is used, limited obstacle perception when only color imaging is used, a low field of view from a single RGB-D sensor, or the requirement for the landing spot to be prepared in advance. In this paper, a new approach is proposed where an RGB-D camera mounted on a gimbal is used. The gimbal is actively actuated to counteract the limited field of view while color images and depth information are provided by the RGB-D camera. Furthermore, a combined UAV-and-gimbal-motion strategy is proposed to counteract the low maximum range of depth perception to provide static obstacle detection and avoidance, while preserving safe operating conditions for low-altitude flight, near potential obstacles. The system is developed using a PX4 flight stack, CubeOrange flight controller, and Jetson nano onboard computer. The system was flight-tested in simulation conditions and statically tested on a real vehicle. Results show the correctness of the system architecture and possibility of deployment in real conditions. Full article
(This article belongs to the Special Issue UAV-Based Sensing and Autonomous Technologies)

16 pages, 1698 KB  
Article
Fall Detection by Deep Learning-Based Bimodal Movement and Pose Sensing with Late Fusion
by Haythem Rehouma and Mounir Boukadoum
Sensors 2025, 25(19), 6035; https://doi.org/10.3390/s25196035 - 1 Oct 2025
Viewed by 492
Abstract
The timely detection of falls among the elderly remains challenging. Single-modality sensing approaches using inertial measurement units (IMUs) or vision-based monitoring systems frequently exhibit high false-positive rates and compromised accuracy under suboptimal operating conditions. We propose a novel deep learning-based bimodal sensing framework to address the problem, leveraging a memory-based autoencoder neural network for inertial abnormality detection and an attention-based neural network for visual pose assessment, with late fusion at the decision level. Our experimental evaluation with a custom dataset of simulated falls and routine activities, captured with waist-mounted IMUs and RGB cameras under dim lighting, shows significant performance improvement by the described bimodal late-fusion system, with an F1-score of 97.3% and, most notably, a false-positive rate of 3.6%, significantly lower than the 11.3% and 8.9% of the IMU-only and vision-only baselines, respectively. These results confirm the robustness of the described fall detection approach and validate its applicability to real-time fall detection under different light settings, including nighttime conditions. Full article
(This article belongs to the Special Issue Sensor-Based Human Activity Recognition)
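
Decision-level (late) fusion of the two branches can be as simple as a weighted combination of per-branch fall probabilities. The weights and threshold below are illustrative assumptions, not the paper's tuned values:

```python
# Minimal decision-level fusion sketch: each branch outputs a fall probability,
# and a weighted rule makes the final call (weights/threshold are assumptions).
def late_fusion(p_imu, p_vision, w_imu=0.5, w_vision=0.5, threshold=0.5):
    """p_imu: anomaly score from the IMU autoencoder; p_vision: pose-based score."""
    p_fused = w_imu * p_imu + w_vision * p_vision
    return p_fused >= threshold, p_fused

# usage: is_fall, score = late_fusion(0.82, 0.64)
```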

34 pages, 9527 KB  
Article
High-Resolution 3D Thermal Mapping: From Dual-Sensor Calibration to Thermally Enriched Point Clouds
by Neri Edgardo Güidi, Andrea di Filippo and Salvatore Barba
Appl. Sci. 2025, 15(19), 10491; https://doi.org/10.3390/app151910491 - 28 Sep 2025
Viewed by 481
Abstract
Thermal imaging is increasingly applied in remote sensing to identify material degradation, monitor structural integrity, and support energy diagnostics. However, its adoption is limited by the low spatial resolution of thermal sensors compared to RGB cameras. This study proposes a modular pipeline to generate thermally enriched 3D point clouds by fusing RGB and thermal imagery acquired simultaneously with a dual-sensor unmanned aerial vehicle system. The methodology includes geometric calibration of both cameras, image undistortion, cross-spectral feature matching, and projection of radiometric data onto the photogrammetric model through a computed homography. Thermal values are extracted using a custom parser and assigned to 3D points based on visibility masks and interpolation strategies. Calibration achieved 81.8% chessboard detection, yielding subpixel reprojection errors. Among twelve evaluated algorithms, LightGlue retained 99% of its matches and delivered a reprojection accuracy of 18.2% at 1 px, 65.1% at 3 px, and 79% at 5 px. A case study on photovoltaic panels demonstrates the method’s capability to map thermal patterns with low temperature deviation from ground-truth data. Developed entirely in Python, the workflow integrates into Agisoft Metashape or other software. The proposed approach enables cost-effective, high-resolution thermal mapping with applications in civil engineering, cultural heritage conservation, and environmental monitoring. Full article
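
The projection step, estimating a homography from cross-spectral matches and warping the thermal frame onto the RGB geometry, can be sketched with OpenCV as below; the inputs and RANSAC threshold are assumptions, not the published workflow:

```python
# Estimate a homography from matched cross-spectral keypoints and warp the
# thermal frame into the RGB image geometry (assumed inputs).
import cv2
import numpy as np

def warp_thermal_to_rgb(thermal_img, pts_thermal, pts_rgb, rgb_shape):
    """pts_*: (N, 2) matched keypoints, e.g. from LightGlue; rgb_shape: (h, w)."""
    H, inliers = cv2.findHomography(np.float32(pts_thermal), np.float32(pts_rgb),
                                    cv2.RANSAC, 5.0)
    h, w = rgb_shape
    return cv2.warpPerspective(thermal_img, H, (w, h)), H
```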

19 pages, 4445 KB  
Article
Hyperspectral Imaging-Based Deep Learning Method for Detecting Quarantine Diseases in Apples
by Hang Zhang, Naibo Ye, Jingru Gong, Huajie Xue, Peihao Wang, Binbin Jiao, Liping Yin and Xi Qiao
Foods 2025, 14(18), 3246; https://doi.org/10.3390/foods14183246 - 18 Sep 2025
Viewed by 650
Abstract
Rapid detection of quarantine diseases in apples is essential for import–export control but remains difficult because routine inspections rely on manual visual checks that limit automation at port scale. A fast, non-destructive system suitable for deployment at customs is therefore needed. In this study, three common apple quarantine pathogens were targeted using hyperspectral images acquired by a close-range hyperspectral camera and analyzed with a convolutional neural network (CNN). Symptoms of these diseases often appear similar in RGB images, making reliable differentiation difficult. Reflectance from 400 to 1000 nm was recorded to provide richer spectral detail for separating subtle disease signatures. To quantify stage-dependent differences, average reflectance curves were extracted for apples infected by each pathogen at early, middle, and late lesion stages. A CNN tailored to hyperspectral inputs, termed HSC-Resnet, was designed with an increased number of convolutional channels to accommodate the broad spectral dimension and with channel and spatial attention integrated to highlight informative bands and regions. HSC-Resnet achieved a precision of 95.51%, indicating strong potential for fast, accurate, and non-destructive detection of apple quarantine diseases in import–export management. Full article
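
The channel and spatial attention described for HSC-Resnet is in the spirit of CBAM; a compact, generic PyTorch version is sketched below (the reduction ratio and kernel size are assumptions, not the authors' configuration):

```python
# Generic channel + spatial attention block (CBAM-style) for feature maps.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                                    # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                   # channel attention
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(dim=1, keepdim=True),           # spatial attention
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```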

26 pages, 1882 KB  
Article
TAT-SARNet: A Transformer-Attentive Two-Stream Soccer Action Recognition Network with Multi-Dimensional Feature Fusion and Hierarchical Temporal Classification
by Abdulrahman Alqarafi and Bassam Almogadwy
Mathematics 2025, 13(18), 3011; https://doi.org/10.3390/math13183011 - 17 Sep 2025
Viewed by 524
Abstract
(1) Background: Soccer action recognition (SAR) is essential in modern sports analytics, supporting automated performance evaluation, tactical strategy analysis, and detailed player behavior modeling. Although recent advances in deep learning and computer vision have enhanced SAR capabilities, many existing methods remain limited to coarse-grained classifications, grouping actions into broad categories such as attacking, defending, or goalkeeping. These models often fall short in capturing fine-grained distinctions, contextual nuances, and long-range temporal dependencies. Transformer-based approaches offer potential improvements but are typically constrained by the need for large-scale datasets and high computational demands, limiting their practical applicability. Moreover, current SAR systems frequently encounter difficulties in handling occlusions, background clutter, and variable camera angles, which contribute to misclassifications and reduced accuracy. (2) Methods: To overcome these challenges, we propose TAT-SARNet, a structured framework designed for accurate and fine-grained SAR. The model begins by applying Sparse Dilated Attention (SDA) to emphasize relevant spatial dependencies while mitigating background noise. Refined spatial features are then processed through the Split-Stream Feature Processing Module (SSFPM), which separately extracts appearance-based (RGB) and motion-based (optical flow) features using ResNet and 3D CNNs. These features are temporally refined by the Multi-Granular Temporal Processing (MGTP) module, which integrates ResIncept Patch Consolidation (RIPC) and Progressive Scale Construction Module (PSCM) to capture both short- and long-range temporal patterns. The output is then fused via the Context-Guided Dual Transformer (CGDT), which models spatiotemporal interactions through a Bi-Transformer Connector (BTC) and Channel–Spatial Attention Block (CSAB); (3) Results: Finally, the Cascaded Temporal Classification (CTC) module maps these features to fine-grained action categories, enabling robust recognition even under challenging conditions such as occlusions and rapid movements. (4) Conclusions: This end-to-end architecture ensures high precision in complex real-world soccer scenarios. Full article
(This article belongs to the Special Issue Artificial Intelligence: Deep Learning and Computer Vision)
