
AI-Based Computer Vision Sensors & Systems

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 10 May 2025 | Viewed by 10297

Special Issue Editors


Prof. Dr. Xuefeng Liang
Guest Editor
School of Artificial Intelligence, Xidian University, Xi'an, China
Interests: visual cognitive computing; computer vision; visual big data mining; intelligent algorithms

Dr. Di Yuan
Guest Editor
Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, China
Interests: computer vision; object tracking; machine learning; self-supervised learning; active learning

Special Issue Information

Dear Colleagues,

Artificial intelligence (AI) in computer vision sensors and systems is a specialized field that encompasses both current and historical AI advancements, as well as their potential impacts and future prospects within sensor technology and its applications. This Special Issue explores the innovative landscape of AI-based computer vision sensors and systems, emphasizing their transformative potential across a variety of applications. These technologies harness advanced imaging techniques to facilitate real-time analysis and intelligent decision-making. We invite researchers to submit original articles investigating the use of RGB cameras, depth cameras (e.g., LiDAR), and thermal cameras in conjunction with image processing units (GPUs, TPUs, FPGAs) and object detection frameworks (e.g., YOLO, SSD, Faster R-CNN) in areas such as environmental monitoring, healthcare imaging, autonomous navigation, and security systems. This issue aims to highlight innovative methodologies that enhance object detection, gesture recognition, and real-time analytics, ultimately advancing the capabilities of computer vision.

Prof. Dr. Xuefeng Liang
Dr. Di Yuan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • RGB cameras
  • depth cameras (e.g., LiDAR)
  • thermal cameras
  • image processing units (GPUs, TPUs, FPGAs)
  • YOLO (You Only Look Once)
  • gesture recognition systems
  • autonomous navigation systems
  • augmented reality (AR)
  • industrial automation
  • smart surveillance systems

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (12 papers)


Research


21 pages, 3436 KiB  
Article
A Multi-Modal Light Sheet Microscope for High-Resolution 3D Tomographic Imaging with Enhanced Raman Scattering and Computational Denoising
by Pooja Kumari, Björn Van Marwick, Johann Kern and Matthias Rädle
Sensors 2025, 25(8), 2386; https://doi.org/10.3390/s25082386 - 9 Apr 2025
Viewed by 222
Abstract
Three-dimensional (3D) cellular models, such as spheroids, serve as pivotal systems for understanding complex biological phenomena in histology, oncology, and tissue engineering. In response to the growing need for advanced imaging capabilities, we present a novel multi-modal Raman light sheet microscope designed to capture elastic (Rayleigh) and inelastic (Raman) scattering, along with fluorescence signals, in a single platform. By leveraging a shorter excitation wavelength (532 nm) to boost Raman scattering efficiency and incorporating robust fluorescence suppression, the system achieves label-free, high-resolution tomographic imaging without the drawbacks commonly associated with near-infrared modalities. An accompanying Deep Image Prior (DIP) seamlessly integrates with the microscope to provide unsupervised denoising and resolution enhancement, preserving critical molecular details and minimizing extraneous artifacts. Altogether, this synergy of optical and computational strategies underscores the potential for in-depth, 3D imaging of biomolecular and structural features in complex specimens and sets the stage for future advancements in biomedical research, diagnostics, and therapeutics.
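To make the computational ingredient concrete, here is a minimal sketch of the Deep Image Prior idea in PyTorch: a randomly initialized CNN is fit to a single noisy image, and early stopping acts as the regularizer. The architecture, input code, and step count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Minimal Deep Image Prior sketch: plausible image structure is learned
# before noise, so stopping early yields a denoised estimate.
def dip_denoise(noisy: torch.Tensor, steps: int = 1200, lr: float = 1e-3) -> torch.Tensor:
    # noisy: (1, C, H, W) tensor; network depth/width are illustrative
    net = nn.Sequential(
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, noisy.shape[1], 3, padding=1),
    )
    z = torch.randn(1, 32, *noisy.shape[-2:])  # fixed random input code
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):  # in practice, stop early to avoid refitting noise
        opt.zero_grad()
        loss = ((net(z) - noisy) ** 2).mean()
        loss.backward()
        opt.step()
    return net(z).detach()
```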

38 pages, 18311 KiB  
Article
Design of an Interactive Exercise and Leisure System for the Elderly Integrating Artificial Intelligence and Motion-Sensing Technology
by Chao-Ming Wang, Cheng-Hao Shao and Yu-Ching Lin
Sensors 2025, 25(7), 2315; https://doi.org/10.3390/s25072315 - 5 Apr 2025
Viewed by 225
Abstract
In response to the global trend of population aging, providing elderly individuals with suitable leisure and entertainment has become increasingly important. This study aims to utilize artificial intelligence (AI) technology to offer the elderly a healthy and enjoyable exercise and leisure experience. A human–machine interactive system is designed using computer vision, a subfield of AI, to promote positive physical adaptation for the elderly. The relevant literature on the needs of the elderly, technology, exercise, leisure, and AI techniques is reviewed. Case studies of interactive devices for exercise and leisure for the elderly, both domestic and international, are summarized to establish the prototype concept for the system design. The proposed interactive exercise and leisure system is developed by integrating motion-sensing interfaces and real-time object detection using the YOLO algorithm. The system’s effectiveness is evaluated through questionnaire surveys and participant interviews, with the collected survey data analyzed statistically using IBM SPSS 26 and AMOS 23. Findings indicate that (1) AI technology provides new and enjoyable interactive experiences for the elderly’s exercise and leisure; (2) positive impacts are made on the elderly’s health and well-being; and (3) the system’s acceptance and attractiveness increase when elements related to personal experiences are incorporated into the system.
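As a rough illustration of the vision backbone such a motion-sensing system builds on, the sketch below couples a webcam feed with an off-the-shelf YOLOv8 model via the ultralytics package; the model choice and display logic are assumptions, not the paper's system.

```python
import cv2
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Hypothetical real-time loop: detect objects in a webcam feed and draw
# boxes, the kind of sensing layer a motion-sensing exergame needs.
model = YOLO("yolov8n.pt")  # model choice is an assumption
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("interactive view", frame)
    if cv2.waitKey(1) == 27:  # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```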

18 pages, 976 KiB  
Article
TipSegNet: Fingertip Segmentation in Contactless Fingerprint Imaging
by Laurenz Ruzicka, Bernhard Kohn and Clemens Heitzinger
Sensors 2025, 25(6), 1824; https://doi.org/10.3390/s25061824 - 14 Mar 2025
Viewed by 284
Abstract
Contactless fingerprint recognition systems offer a hygienic, user-friendly, and efficient alternative to traditional contact-based methods. However, their accuracy heavily relies on precise fingertip detection and segmentation, particularly under challenging background conditions. This paper introduces TipSegNet, a novel deep learning model that achieves state-of-the-art performance in segmenting fingertips directly from grayscale hand images. TipSegNet leverages a ResNeXt-101 backbone for robust feature extraction, combined with a Feature Pyramid Network (FPN) for multi-scale representation, enabling accurate segmentation across varying finger poses and image qualities. Furthermore, we employ an extensive data augmentation strategy to enhance the model’s generalizability and robustness. This model was trained and evaluated using a combined dataset of 2257 labeled hand images. TipSegNet outperforms existing methods, achieving a mean intersection over union (mIoU) of 0.987 and an accuracy of 0.999, representing a significant advancement in contactless fingerprint segmentation. This enhanced accuracy has the potential to substantially improve the reliability and effectiveness of contactless biometric systems in real-world applications.
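A ResNeXt-101 encoder paired with an FPN decoder can be assembled in a few lines with the third-party segmentation_models_pytorch package. The sketch below shows this architecture family with channel and class counts assumed from the abstract; it is not the authors' TipSegNet code.

```python
import torch
import segmentation_models_pytorch as smp  # third-party package; an assumption

# ResNeXt-101 backbone + FPN decoder for binary fingertip segmentation.
model = smp.FPN(
    encoder_name="resnext101_32x8d",
    encoder_weights="imagenet",
    in_channels=1,   # grayscale hand images
    classes=1,       # fingertip vs. background
)
x = torch.randn(1, 1, 512, 512)   # dummy hand image
mask_logits = model(x)            # (1, 1, 512, 512); threshold for a mask
```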

18 pages, 117603 KiB  
Article
A Novel Framework for Remote Sensing Image Synthesis with Optimal Transport
by Jinlong He, Xia Yuan, Yong Kou and Yanci Zhang
Sensors 2025, 25(6), 1792; https://doi.org/10.3390/s25061792 - 13 Mar 2025
Viewed by 330
Abstract
We propose a Generative Adversarial Network (GAN)-based method for image synthesis from remote sensing data. Remote sensing images (RSIs) are characterized by large intraclass variance and small interclass variance, which pose significant challenges for image synthesis. To address these issues, we design and incorporate two distinct attention modules into our GAN framework. The first attention module is designed to enhance similarity measurements within label groups, effectively handling the large intraclass variance by reinforcing consistency within the same class. The second module addresses the small interclass variance by promoting diversity between adjacent label groups, ensuring that different classes are distinguishable in the generated images. These attention mechanisms play a critical role in generating more realistic and visually coherent images. Our GAN-based framework consists of an advanced image encoder and a generator, which are both enhanced by these attention modules. Furthermore, we integrate optimal transport (OT) to approximate human perceptual loss, further improving the visual quality of the synthesized images. Experimental results demonstrate the effectiveness of our approach, highlighting its advantages in the remote sensing field by significantly enhancing the quality of generated RSIs.
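The optimal-transport ingredient is often approximated with entropy-regularized Sinkhorn iterations. The sketch below is a generic formulation of that approximation, not the paper's exact perceptual loss.

```python
import torch

# Entropy-regularized optimal transport via Sinkhorn iterations, a common
# way to approximate an OT-based loss between two feature distributions.
def sinkhorn_distance(a, b, cost, eps=0.05, iters=200):
    # a: (n,) and b: (m,) probability vectors; cost: (n, m) pairwise costs
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]   # approximate transport plan
    return (plan * cost).sum()
```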

21 pages, 5199 KiB  
Article
Enhanced U-Net with Multi-Module Integration for High-Exposure-Difference Image Restoration
by Bo-Lin Jian, Hong-Li Chang and Chieh-Li Chen
Sensors 2025, 25(4), 1105; https://doi.org/10.3390/s25041105 - 12 Feb 2025
Viewed by 626
Abstract
Machine vision systems have become key unmanned aerial vehicle (UAV) sensing systems. However, under different weather conditions, the lighting direction and the selection of exposure parameters often lead to insufficient or missing object features in images, which can cause various tasks to fail. As a result, images need to be restored to recover information in environments with large exposure differences. Many applications require real-time, high-quality images; therefore, restoring images efficiently is also important for subsequent tasks. This study adopts supervised learning to address lighting discrepancies, using a U-Net as the main network architecture and adding suitable modules to its encoder and decoder, such as inception-like blocks, dual attention units, selective kernel feature fusion, and denoising blocks. In addition to the ablation study, we also compared the quality of light restoration against other network models on the BAID dataset and considered the overall trainable parameters of the model to construct a lightweight, high-exposure-difference image restoration model. The performance of the proposed network was demonstrated by enhancing image detection and recognition.
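For flavor, here is a minimal channel-attention unit of the kind commonly inserted into U-Net encoders and decoders; the reduction ratio and layer sizes are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

# Minimal channel-attention block: pool global context per channel, then
# learn per-channel weights that rescale the feature map.
class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(self.pool(x))   # reweight feature channels

features = torch.randn(1, 64, 128, 128)
attended = ChannelAttention(64)(features)
```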

26 pages, 8033 KiB  
Article
Time-Series Image-Based Automated Monitoring Framework for Visible Facilities: Focusing on Installation and Retention Period
by Seonjun Yoon and Hyunsoo Kim
Sensors 2025, 25(2), 574; https://doi.org/10.3390/s25020574 - 20 Jan 2025
Cited by 2 | Viewed by 783
Abstract
In the construction industry, ensuring the proper installation, retention, and dismantling of temporary structures, such as jack supports, is critical to maintaining safety and project timelines. However, inconsistencies between on-site data and construction documentation remain a significant challenge. To address this, this study proposes an integrated monitoring framework that combines computer vision-based object detection and document recognition techniques. The system utilizes YOLOv5 for detecting jack supports in both construction drawings and on-site images captured through wearable cameras, while optical character recognition (OCR) and natural language processing (NLP) extract installation and dismantling timelines from work orders. The proposed framework enables continuous monitoring and ensures compliance with retention periods by aligning on-site data with documented requirements. The analysis includes 23 jack supports monitored daily over 28 days under varying environmental conditions, including lighting changes and structural configurations. The results demonstrate that the system achieves an average detection accuracy of 94.1%, effectively identifying discrepancies and reducing misclassifications caused by structural similarities and environmental variations. To further enhance detection reliability, methods such as color differentiation, construction plan overlays, and vertical segmentation were implemented, significantly improving performance. This study validates the effectiveness of integrating visual and textual data sources in dynamic construction environments. The study supports the development of automated monitoring systems by improving accuracy and safety measures while reducing manual intervention, offering practical insights for future construction site management.
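The document-recognition step can be pictured as OCR followed by a date-extraction pass. The sketch below uses pytesseract and a simple regex; the library choice, file name, and ISO date format are assumptions for illustration, not the paper's pipeline.

```python
import re
from datetime import datetime

import pytesseract          # OCR engine binding; an assumed choice
from PIL import Image

# OCR a scanned work order, then pull installation/dismantling dates.
def extract_dates(image_path: str):
    text = pytesseract.image_to_string(Image.open(image_path))
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
    return [datetime.strptime(d, "%Y-%m-%d").date() for d in dates]

# installed, dismantled = extract_dates("work_order.png")  # hypothetical file
```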

26 pages, 29211 KiB  
Article
Performance Evaluation of Deep Learning Image Classification Modules in the MUN-ABSAI Ice Risk Management Architecture
by Ravindu G. Thalagala, Oscar De Silva, Dan Oldford and David Molyneux
Sensors 2025, 25(2), 326; https://doi.org/10.3390/s25020326 - 8 Jan 2025
Viewed by 715
Abstract
The retreat of Arctic sea ice has opened new maritime routes, offering faster shipping opportunities; however, these routes present significant navigational challenges due to the harsh ice conditions. To address these challenges, this paper proposes a deep learning-based Arctic ice risk management architecture with multiple modules, including ice classification, risk assessment, ice floe tracking, and ice load calculations. A comprehensive dataset of 15,000 ice images was created using public sources and contributions from the Canadian Coast Guard, and it was used to support the development and evaluation of the system. The performance of the YOLOv8n-cls model was assessed for the ice classification modules due to its fast inference speed, making it suitable for resource-constrained onboard systems. The training and evaluation were conducted across multiple platforms, including Roboflow, Google Colab, and Compute Canada, allowing for a detailed comparison of their capabilities in image preprocessing, model training, and real-time inference generation. The results demonstrate that Image Classification Module I achieved a validation accuracy of 99.4%, while Module II attained 98.6%. Inference times were found to be less than 1 s in Colab and under 3 s on a stand-alone system, confirming the architecture’s efficiency in real-time ice condition monitoring.
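Single-image inference with a YOLOv8 classification model takes only a few lines with the ultralytics package; the weights file and image path below are hypothetical.

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Classification inference of the kind the ice classification modules run.
model = YOLO("yolov8n-cls.pt")
result = model("ice_image.jpg", verbose=False)[0]
top1 = result.probs.top1                        # index of the most likely class
print(result.names[top1], float(result.probs.top1conf))
```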

18 pages, 7569 KiB  
Article
Design and Validation of an Obstacle Contact Sensor for Aerial Robots
by Victor Vigara-Puche, Manuel J. Fernandez-Gonzalez and Matteo Fumagalli
Sensors 2024, 24(23), 7814; https://doi.org/10.3390/s24237814 - 6 Dec 2024
Viewed by 902
Abstract
Obstacle contact detection is not commonly employed in autonomous robots, which mainly depend on avoidance algorithms, limiting their effectiveness in cluttered environments. Current contact-detection techniques suffer from blind spots or discretized detection points, and rigid platforms further limit performance by merely detecting the presence of a collision without providing detailed feedback. To address these challenges, we propose an innovative contact sensor design that improves autonomous navigation through physical contact detection. The system features an elastic collision platform integrated with flex sensors to measure displacements during collisions. A neural network-based contact-detection algorithm converts the flex sensor data into actionable contact information. The collision system was validated with collisions through manual flights and autonomous contact-based missions, using sensor feedback for real-time collision recovery. The experimental results demonstrated the system’s capability to accurately detect contact events and estimate collision parameters, even under dynamic conditions. The proposed solution offers a robust approach to improving autonomous navigation in complex environments and provides a solid foundation for future research on contact-based navigation systems.
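A small regressor from flex-sensor readings to contact parameters might look like the sketch below; the sensor count, output dimension, and layer widths are assumptions, not the paper's trained network.

```python
import torch
import torch.nn as nn

# Illustrative mapping from flex-sensor displacements to contact
# parameters (e.g., contact direction and magnitude).
n_sensors, n_outputs = 8, 2
contact_net = nn.Sequential(
    nn.Linear(n_sensors, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, n_outputs),
)
readings = torch.randn(1, n_sensors)    # one frame of flex-sensor data
contact_estimate = contact_net(readings)
```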

32 pages, 6180 KiB  
Article
Improving Sewer Damage Inspection: Development of a Deep Learning Integration Concept for a Multi-Sensor System
by Jan Thomas Jung and Alexander Reiterer
Sensors 2024, 24(23), 7786; https://doi.org/10.3390/s24237786 - 5 Dec 2024
Cited by 1 | Viewed by 1409
Abstract
The maintenance and inspection of sewer pipes are essential to urban infrastructure but remain predominantly manual, resource-intensive, and prone to human error. Advancements in artificial intelligence (AI) and computer vision offer significant potential to automate sewer inspections, improving reliability and reducing costs. However, the existing vision-based inspection robots fail to provide data quality sufficient for training reliable deep learning (DL) models. To address these limitations, we propose a novel multi-sensor robotic system coupled with a DL integration concept. Following a comprehensive review of the current 2D (image) and 3D (point cloud) sewage pipe inspection methods, we identify key limitations and propose a system incorporating a camera array, front camera, and LiDAR sensor to optimise surface capture and enhance data quality. Damage types are assigned to the sensor best suited for their detection and quantification, while tailored DL models are proposed for each sensor type to maximise performance. This approach enables the optimal detection and processing of relevant damage types, achieving higher accuracy for each compared to single-sensor systems.
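The integration concept, each damage type routed to the sensor best suited to detect it, can be summarized as a simple mapping; the assignments below are illustrative assumptions, not the paper's final design.

```python
# Sketch of the routing idea behind the multi-sensor integration concept:
# the assignments here are hypothetical examples.
SENSOR_FOR_DAMAGE = {
    "surface_crack": "camera_array",   # fine texture -> high-resolution imagery
    "deformation": "lidar",            # geometric change -> point cloud
    "obstacle": "front_camera",
    "infiltration": "camera_array",
}

def route(damage_type: str) -> str:
    return SENSOR_FOR_DAMAGE.get(damage_type, "front_camera")
```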

14 pages, 4606 KiB  
Article
Research on Multi-Scale Spatio-Temporal Graph Convolutional Human Behavior Recognition Method Incorporating Multi-Granularity Features
by Yulin Wang, Tao Song, Yichen Yang and Zheng Hong
Sensors 2024, 24(23), 7595; https://doi.org/10.3390/s24237595 - 28 Nov 2024
Viewed by 866
Abstract
To address the problem that existing human skeleton behavior recognition methods are insensitive to local human movements and inaccurate in distinguishing similar behaviors, a multi-scale spatio-temporal graph convolution method incorporating multi-granularity features is proposed for human behavior recognition. Firstly, a skeleton fine-grained partitioning strategy is proposed, which initializes the skeleton data into data streams of different granularities. An adaptive cross-scale feature fusion layer is designed using a normalized Gaussian function to perform feature fusion among different granularities, guiding the model to focus on discriminative feature representations among similar behaviors through fine-grained features. Secondly, a sparse multi-scale adjacency matrix is introduced to solve the biased weighting problem that amplifies the multi-scale spatial domain modeling process under multi-granularity conditions. Finally, an end-to-end graph convolutional neural network is constructed to improve the feature expression ability of spatio-temporal receptive field information and enhance the robustness of recognition between similar behaviors. The feasibility of the proposed algorithm was verified on the public behavior recognition dataset MSR Action 3D, achieving an accuracy of 95.67%, which is superior to existing behavior recognition methods.
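The spatial core of such methods is a normalized graph convolution over the skeleton adjacency. The sketch below shows that single step with illustrative shapes; it is not the paper's full multi-granularity architecture.

```python
import torch

# One normalized spatial graph-convolution step over a skeleton graph:
# aggregate each joint's neighbors, then project the features.
def graph_conv(x, adj, weight):
    # x: (N, V, C) joint features; adj: (V, V) adjacency; weight: (C, C_out)
    deg = adj.sum(-1, keepdim=True).clamp(min=1)
    return (adj / deg) @ x @ weight

x = torch.randn(4, 20, 64)          # batch of 4 skeletons, 20 joints each
adj = torch.eye(20)                 # stand-in adjacency matrix
out = graph_conv(x, adj, torch.randn(64, 128))   # (4, 20, 128)
```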

16 pages, 5030 KiB  
Article
YOLO-APDM: Improved YOLOv8 for Road Target Detection in Infrared Images
by Song Ling, Xianggong Hong and Yongchao Liu
Sensors 2024, 24(22), 7197; https://doi.org/10.3390/s24227197 - 10 Nov 2024
Cited by 2 | Viewed by 2303
Abstract
A new algorithm called YOLO-APDM is proposed to address low image quality and multi-scale target detection issues in infrared road scenes. The method reconstructs the neck section of the algorithm using the multi-scale attentional feature fusion idea. Based on this reconstruction, the P2 detection layer is established, which optimizes the network structure, enhances multi-scale feature fusion performance, and expands the detection network’s capacity for multi-scale complicated targets. Replacing YOLOv8’s C2f module with C2f-DCNv3 increases the network’s ability to focus on the target region while lowering the number of model parameters. The MSCA mechanism is added after the backbone’s SPPF module to improve the model’s detection performance by directing the network’s detection resources to the major road target detection zone. Experimental results show that on the FLIR_ADAS_v2 dataset retaining eight main categories, YOLO-APDM improves mAP@0.5 and mAP@0.5:0.95 by 6.6% and 5.0%, respectively, compared with YOLOv8n. On the M3FD dataset, mAP@0.5 and mAP@0.5:0.95 increased by 8.1% and 5.9%, respectively. The number of model parameters and the model size were reduced by 8.6% and 4.8%, respectively. The design requirements for high-precision detection of infrared road targets were met while keeping model complexity under control.
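The reported mAP@0.5 and mAP@0.5:0.95 figures are the standard outputs of YOLO-family validation. As a sketch with the ultralytics package, using a hypothetical dataset YAML:

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Standard validation run that produces the mAP metrics reported above;
# the baseline weights and dataset YAML path are hypothetical.
model = YOLO("yolov8n.pt")
metrics = model.val(data="flir_adas_v2.yaml")
print(metrics.box.map50, metrics.box.map)   # mAP@0.5, mAP@0.5:0.95
```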

Review


46 pages, 2791 KiB  
Review
YOLO Object Detection for Real-Time Fabric Defect Inspection in the Textile Industry: A Review of YOLOv1 to YOLOv11
by Makara Mao and Min Hong
Sensors 2025, 25(7), 2270; https://doi.org/10.3390/s25072270 - 3 Apr 2025
Cited by 1 | Viewed by 532
Abstract
Automated fabric defect detection is crucial for improving quality control, reducing manual labor, and optimizing efficiency in the textile industry. Traditional inspection methods rely heavily on human oversight, which makes them prone to subjectivity, inefficiency, and inconsistency in high-speed manufacturing environments. This review systematically examines the evolution of the You Only Look Once (YOLO) object detection framework from YOLOv1 to YOLOv11, emphasizing architectural advancements such as attention-based feature refinement and Transformer integration and their impact on fabric defect detection. Unlike prior studies focusing on specific YOLO variants, this work comprehensively compares the entire YOLO family, highlighting key innovations and their practical implications. We also discuss the challenges, including dataset limitations, domain generalization, and computational constraints, proposing future solutions such as synthetic data generation, federated learning, and edge AI deployment. By bridging the gap between academic advancements and industrial applications, this review is a practical guide for selecting and optimizing YOLO models for fabric inspection, paving the way for intelligent quality control systems.
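As a sketch of the workflow the review compares across YOLO versions, fine-tuning a recent release on a fabric-defect dataset with the ultralytics package might look like this; the dataset YAML and hyperparameters are hypothetical.

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Fine-tune a recent YOLO model on a (hypothetical) fabric-defect dataset,
# then report detection mAP on the held-out split.
model = YOLO("yolo11n.pt")
model.train(data="fabric_defects.yaml", epochs=100, imgsz=640)
metrics = model.val()
```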
