
AI-Based Computer Vision Sensors & Systems

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: 10 May 2025 | Viewed by 10297

Special Issue Editors


Prof. Dr. Xuefeng Liang
Guest Editor
School of Artificial Intelligence, Xidian University, Xi'an, China
Interests: visual cognitive computing; computer vision; visual big data mining; intelligent algorithms

Dr. Di Yuan
Guest Editor
Guangzhou Institute of Technology, Xidian University, Guangzhou 510555, China
Interests: computer vision; object tracking; machine learning; self-supervised learning; active learning

Special Issue Information

Dear Colleagues,

Artificial intelligence (AI) in computer vision sensors and systems is a specialized field that encompasses both current and historical AI advancements, as well as their potential impacts and future prospects within sensor technology and its applications. This Special Issue explores the innovative landscape of AI-based computer vision sensors and systems, emphasizing their transformative potential across a variety of applications. These technologies harness advanced imaging techniques to facilitate real-time analysis and intelligent decision-making. We invite researchers to submit original articles investigating the use of RGB cameras, depth cameras (e.g., LiDAR), and thermal cameras in conjunction with image processing units (GPUs, TPUs, FPGAs) and object detection frameworks (e.g., YOLO, SSD, Faster R-CNN) in areas such as environmental monitoring, healthcare imaging, autonomous navigation, and security systems. This issue aims to highlight innovative methodologies that enhance object detection, gesture recognition, and real-time analytics, ultimately advancing the capabilities of computer vision.

Prof. Dr. Xuefeng Liang
Dr. Di Yuan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • RGB cameras
  • depth cameras (e.g., LiDAR)
  • thermal cameras
  • image processing units (GPUs, TPUs, FPGAs)
  • YOLO (You Only Look Once)
  • gesture recognition systems
  • autonomous navigation systems
  • augmented reality (AR)
  • industrial automation
  • smart surveillance systems

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (12 papers)


Research


21 pages, 3436 KiB  
Article
A Multi-Modal Light Sheet Microscope for High-Resolution 3D Tomographic Imaging with Enhanced Raman Scattering and Computational Denoising
by Pooja Kumari, Björn Van Marwick, Johann Kern and Matthias Rädle
Sensors 2025, 25(8), 2386; https://doi.org/10.3390/s25082386 - 9 Apr 2025
Viewed by 222
Abstract
Three-dimensional (3D) cellular models, such as spheroids, serve as pivotal systems for understanding complex biological phenomena in histology, oncology, and tissue engineering. In response to the growing need for advanced imaging capabilities, we present a novel multi-modal Raman light sheet microscope designed to capture elastic (Rayleigh) and inelastic (Raman) scattering, along with fluorescence signals, in a single platform. By leveraging a shorter excitation wavelength (532 nm) to boost Raman scattering efficiency and incorporating robust fluorescence suppression, the system achieves label-free, high-resolution tomographic imaging without the drawbacks commonly associated with near-infrared modalities. An accompanying Deep Image Prior (DIP) seamlessly integrates with the microscope to provide unsupervised denoising and resolution enhancement, preserving critical molecular details and minimizing extraneous artifacts. Altogether, this synergy of optical and computational strategies underscores the potential for in-depth, 3D imaging of biomolecular and structural features in complex specimens and sets the stage for future advancements in biomedical research, diagnostics, and therapeutics.
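To make the computational ingredient concrete, here is a minimal sketch of the Deep Image Prior idea in PyTorch: a randomly initialized CNN is fit to a single noisy image, and early stopping acts as the regularizer. The architecture, input code, and step count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Minimal Deep Image Prior sketch: plausible image structure is learned
# before noise, so stopping early yields a denoised estimate.
def dip_denoise(noisy: torch.Tensor, steps: int = 1200, lr: float = 1e-3) -> torch.Tensor:
    # noisy: (1, C, H, W) tensor; network depth/width are illustrative
    net = nn.Sequential(
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, noisy.shape[1], 3, padding=1),
    )
    z = torch.randn(1, 32, *noisy.shape[-2:])  # fixed random input code
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):  # in practice, stop early to avoid refitting noise
        opt.zero_grad()
        loss = ((net(z) - noisy) ** 2).mean()
        loss.backward()
        opt.step()
    return net(z).detach()
```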

38 pages, 18311 KiB  
Article
Design of an Interactive Exercise and Leisure System for the Elderly Integrating Artificial Intelligence and Motion-Sensing Technology
by Chao-Ming Wang, Cheng-Hao Shao and Yu-Ching Lin
Sensors 2025, 25(7), 2315; https://doi.org/10.3390/s25072315 - 5 Apr 2025
Viewed by 225
Abstract
In response to the global trend of population aging, providing elderly individuals with suitable leisure and entertainment has become increasingly important. This study aims to utilize artificial intelligence (AI) technology to offer the elderly a healthy and enjoyable exercise and leisure experience. A human–machine interactive system is designed using computer vision, a subfield of AI, to promote positive physical adaptation for the elderly. The relevant literature on the needs of the elderly, technology, exercise, leisure, and AI techniques is reviewed. Case studies of interactive devices for exercise and leisure for the elderly, both domestic and international, are summarized to establish the prototype concept for the system design. The proposed interactive exercise and leisure system is developed by integrating motion-sensing interfaces and real-time object detection using the YOLO algorithm. The system’s effectiveness is evaluated through questionnaire surveys and participant interviews, with the collected survey data analyzed statistically using IBM SPSS 26 and AMOS 23. Findings indicate that (1) AI technology provides new and enjoyable interactive experiences for the elderly’s exercise and leisure; (2) positive impacts are made on the elderly’s health and well-being; and (3) the system’s acceptance and attractiveness increase when elements related to personal experiences are incorporated into the system.
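As a rough illustration of the vision backbone such a motion-sensing system builds on, the sketch below couples a webcam feed with an off-the-shelf YOLOv8 model via the ultralytics package; the model choice and display logic are assumptions, not the paper's system.

```python
import cv2
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Hypothetical real-time loop: detect objects in a webcam feed and draw
# boxes, the kind of sensing layer a motion-sensing exergame needs.
model = YOLO("yolov8n.pt")  # model choice is an assumption
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("interactive view", frame)
    if cv2.waitKey(1) == 27:  # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```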

18 pages, 976 KiB  
Article
TipSegNet: Fingertip Segmentation in Contactless Fingerprint Imaging
by Laurenz Ruzicka, Bernhard Kohn and Clemens Heitzinger
Sensors 2025, 25(6), 1824; https://doi.org/10.3390/s25061824 - 14 Mar 2025
Viewed by 284
Abstract
Contactless fingerprint recognition systems offer a hygienic, user-friendly, and efficient alternative to traditional contact-based methods. However, their accuracy heavily relies on precise fingertip detection and segmentation, particularly under challenging background conditions. This paper introduces TipSegNet, a novel deep learning model that achieves state-of-the-art performance in segmenting fingertips directly from grayscale hand images. TipSegNet leverages a ResNeXt-101 backbone for robust feature extraction, combined with a Feature Pyramid Network (FPN) for multi-scale representation, enabling accurate segmentation across varying finger poses and image qualities. Furthermore, we employ an extensive data augmentation strategy to enhance the model’s generalizability and robustness. This model was trained and evaluated using a combined dataset of 2257 labeled hand images. TipSegNet outperforms existing methods, achieving a mean intersection over union (mIoU) of 0.987 and an accuracy of 0.999, representing a significant advancement in contactless fingerprint segmentation. This enhanced accuracy has the potential to substantially improve the reliability and effectiveness of contactless biometric systems in real-world applications.
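A ResNeXt-101 encoder paired with an FPN decoder can be assembled in a few lines with the third-party segmentation_models_pytorch package. The sketch below shows this architecture family with channel and class counts assumed from the abstract; it is not the authors' TipSegNet code.

```python
import torch
import segmentation_models_pytorch as smp  # third-party package; an assumption

# ResNeXt-101 backbone + FPN decoder for binary fingertip segmentation.
model = smp.FPN(
    encoder_name="resnext101_32x8d",
    encoder_weights="imagenet",
    in_channels=1,   # grayscale hand images
    classes=1,       # fingertip vs. background
)
x = torch.randn(1, 1, 512, 512)   # dummy hand image
mask_logits = model(x)            # (1, 1, 512, 512); threshold for a mask
```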

18 pages, 117603 KiB  
Article
A Novel Framework for Remote Sensing Image Synthesis with Optimal Transport
by Jinlong He, Xia Yuan, Yong Kou and Yanci Zhang
Sensors 2025, 25(6), 1792; https://doi.org/10.3390/s25061792 - 13 Mar 2025
Viewed by 330
Abstract
We propose a Generative Adversarial Network (GAN)-based method for image synthesis from remote sensing data. Remote sensing images (RSIs) are characterized by large intraclass variance and small interclass variance, which pose significant challenges for image synthesis. To address these issues, we design and incorporate two distinct attention modules into our GAN framework. The first attention module is designed to enhance similarity measurements within label groups, effectively handling the large intraclass variance by reinforcing consistency within the same class. The second module addresses the small interclass variance by promoting diversity between adjacent label groups, ensuring that different classes are distinguishable in the generated images. These attention mechanisms play a critical role in generating more realistic and visually coherent images. Our GAN-based framework consists of an advanced image encoder and a generator, which are both enhanced by these attention modules. Furthermore, we integrate optimal transport (OT) to approximate human perceptual loss, further improving the visual quality of the synthesized images. Experimental results demonstrate the effectiveness of our approach, highlighting its advantages in the remote sensing field by significantly enhancing the quality of generated RSIs.
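The optimal-transport ingredient is often approximated with entropy-regularized Sinkhorn iterations. The sketch below is a generic formulation of that approximation, not the paper's exact perceptual loss.

```python
import torch

# Entropy-regularized optimal transport via Sinkhorn iterations, a common
# way to approximate an OT-based loss between two feature distributions.
def sinkhorn_distance(a, b, cost, eps=0.05, iters=200):
    # a: (n,) and b: (m,) probability vectors; cost: (n, m) pairwise costs
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]   # approximate transport plan
    return (plan * cost).sum()
```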

21 pages, 5199 KiB  
Article
Enhanced U-Net with Multi-Module Integration for High-Exposure-Difference Image Restoration
by Bo-Lin Jian, Hong-Li Chang and Chieh-Li Chen
Sensors 2025, 25(4), 1105; https://doi.org/10.3390/s25041105 - 12 Feb 2025
Viewed by 626
Abstract
Machine vision systems have become key unmanned aerial vehicle (UAV) sensing systems. However, under different weather conditions, the lighting direction and the selection of exposure parameters often lead to insufficient or missing object features in images, which can cause various tasks to fail. As a result, images need to be restored to recover information in environments with large exposure differences. Many applications require real-time, high-quality images; therefore, restoring images efficiently is also important for subsequent tasks. This study adopts supervised learning to address lighting discrepancies, using a U-Net as the main network architecture and adding suitable modules to its encoder and decoder, such as inception-like blocks, dual attention units, selective kernel feature fusion, and denoising blocks. In addition to the ablation study, we also compared the quality of light restoration against other network models on the BAID dataset and considered the overall trainable parameters of the model to construct a lightweight, high-exposure-difference image restoration model. The performance of the proposed network was demonstrated by enhancing image detection and recognition.
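For flavor, here is a minimal channel-attention unit of the kind commonly inserted into U-Net encoders and decoders; the reduction ratio and layer sizes are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

# Minimal channel-attention block: pool global context per channel, then
# learn per-channel weights that rescale the feature map.
class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(self.pool(x))   # reweight feature channels

features = torch.randn(1, 64, 128, 128)
attended = ChannelAttention(64)(features)
```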

26 pages, 8033 KiB  
Article
Time-Series Image-Based Automated Monitoring Framework for Visible Facilities: Focusing on Installation and Retention Period
by Seonjun Yoon and Hyunsoo Kim
Sensors 2025, 25(2), 574; https://doi.org/10.3390/s25020574 - 20 Jan 2025
Cited by 2 | Viewed by 783
Abstract
In the construction industry, ensuring the proper installation, retention, and dismantling of temporary structures, such as jack supports, is critical to maintaining safety and project timelines. However, inconsistencies between on-site data and construction documentation remain a significant challenge. To address this, this study proposes an integrated monitoring framework that combines computer vision-based object detection and document recognition techniques. The system utilizes YOLOv5 for detecting jack supports in both construction drawings and on-site images captured through wearable cameras, while optical character recognition (OCR) and natural language processing (NLP) extract installation and dismantling timelines from work orders. The proposed framework enables continuous monitoring and ensures compliance with retention periods by aligning on-site data with documented requirements. The analysis includes 23 jack supports monitored daily over 28 days under varying environmental conditions, including lighting changes and structural configurations. The results demonstrate that the system achieves an average detection accuracy of 94.1%, effectively identifying discrepancies and reducing misclassifications caused by structural similarities and environmental variations. To further enhance detection reliability, methods such as color differentiation, construction plan overlays, and vertical segmentation were implemented, significantly improving performance. This study validates the effectiveness of integrating visual and textual data sources in dynamic construction environments. The study supports the development of automated monitoring systems by improving accuracy and safety measures while reducing manual intervention, offering practical insights for future construction site management.
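The document-recognition step can be pictured as OCR followed by a date-extraction pass. The sketch below uses pytesseract and a simple regex; the library choice, file name, and ISO date format are assumptions for illustration, not the paper's pipeline.

```python
import re
from datetime import datetime

import pytesseract          # OCR engine binding; an assumed choice
from PIL import Image

# OCR a scanned work order, then pull installation/dismantling dates.
def extract_dates(image_path: str):
    text = pytesseract.image_to_string(Image.open(image_path))
    dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
    return [datetime.strptime(d, "%Y-%m-%d").date() for d in dates]

# installed, dismantled = extract_dates("work_order.png")  # hypothetical file
```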

26 pages, 29211 KiB  
Article
Performance Evaluation of Deep Learning Image Classification Modules in the MUN-ABSAI Ice Risk Management Architecture
by Ravindu G. Thalagala, Oscar De Silva, Dan Oldford and David Molyneux
Sensors 2025, 25(2), 326; https://doi.org/10.3390/s25020326 - 8 Jan 2025
Viewed by 715
Abstract
The retreat of Arctic sea ice has opened new maritime routes, offering faster shipping opportunities; however, these routes present significant navigational challenges due to the harsh ice conditions. To address these challenges, this paper proposes a deep learning-based Arctic ice risk management architecture with multiple modules, including ice classification, risk assessment, ice floe tracking, and ice load calculations. A comprehensive dataset of 15,000 ice images was created using public sources and contributions from the Canadian Coast Guard, and it was used to support the development and evaluation of the system. The performance of the YOLOv8n-cls model was assessed for the ice classification modules due to its fast inference speed, making it suitable for resource-constrained onboard systems. The training and evaluation were conducted across multiple platforms, including Roboflow, Google Colab, and Compute Canada, allowing for a detailed comparison of their capabilities in image preprocessing, model training, and real-time inference generation. The results demonstrate that Image Classification Module I achieved a validation accuracy of 99.4%, while Module II attained 98.6%. Inference times were found to be less than 1 s in Colab and under 3 s on a stand-alone system, confirming the architecture’s efficiency in real-time ice condition monitoring.
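Single-image inference with a YOLOv8 classification model takes only a few lines with the ultralytics package; the weights file and image path below are hypothetical.

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Classification inference of the kind the ice classification modules run.
model = YOLO("yolov8n-cls.pt")
result = model("ice_image.jpg", verbose=False)[0]
top1 = result.probs.top1                        # index of the most likely class
print(result.names[top1], float(result.probs.top1conf))
```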

18 pages, 7569 KiB  
Article
Design and Validation of an Obstacle Contact Sensor for Aerial Robots
by Victor Vigara-Puche, Manuel J. Fernandez-Gonzalez and Matteo Fumagalli
Sensors 2024, 24(23), 7814; https://doi.org/10.3390/s24237814 - 6 Dec 2024
Viewed by 902
Abstract
Obstacle contact detection is not commonly employed in autonomous robots, which mainly depend on avoidance algorithms, limiting their effectiveness in cluttered environments. Current contact-detection techniques suffer from blind spots or discretized detection points, and rigid platforms further limit performance by merely detecting the presence of a collision without providing detailed feedback. To address these challenges, we propose an innovative contact sensor design that improves autonomous navigation through physical contact detection. The system features an elastic collision platform integrated with flex sensors to measure displacements during collisions. A neural network-based contact-detection algorithm converts the flex sensor data into actionable contact information. The collision system was validated with collisions through manual flights and autonomous contact-based missions, using sensor feedback for real-time collision recovery. The experimental results demonstrated the system’s capability to accurately detect contact events and estimate collision parameters, even under dynamic conditions. The proposed solution offers a robust approach to improving autonomous navigation in complex environments and provides a solid foundation for future research on contact-based navigation systems.
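A small regressor from flex-sensor readings to contact parameters might look like the sketch below; the sensor count, output dimension, and layer widths are assumptions, not the paper's trained network.

```python
import torch
import torch.nn as nn

# Illustrative mapping from flex-sensor displacements to contact
# parameters (e.g., contact direction and magnitude).
n_sensors, n_outputs = 8, 2
contact_net = nn.Sequential(
    nn.Linear(n_sensors, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, n_outputs),
)
readings = torch.randn(1, n_sensors)    # one frame of flex-sensor data
contact_estimate = contact_net(readings)
```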

32 pages, 6180 KiB  
Article
Improving Sewer Damage Inspection: Development of a Deep Learning Integration Concept for a Multi-Sensor System
by Jan Thomas Jung and Alexander Reiterer
Sensors 2024, 24(23), 7786; https://doi.org/10.3390/s24237786 - 5 Dec 2024
Cited by 1 | Viewed by 1409
Abstract
The maintenance and inspection of sewer pipes are essential to urban infrastructure but remain predominantly manual, resource-intensive, and prone to human error. Advancements in artificial intelligence (AI) and computer vision offer significant potential to automate sewer inspections, improving reliability and reducing costs. However, the existing vision-based inspection robots fail to provide data quality sufficient for training reliable deep learning (DL) models. To address these limitations, we propose a novel multi-sensor robotic system coupled with a DL integration concept. Following a comprehensive review of the current 2D (image) and 3D (point cloud) sewage pipe inspection methods, we identify key limitations and propose a system incorporating a camera array, front camera, and LiDAR sensor to optimise surface capture and enhance data quality. Damage types are assigned to the sensor best suited for their detection and quantification, while tailored DL models are proposed for each sensor type to maximise performance. This approach enables the optimal detection and processing of relevant damage types, achieving higher accuracy for each compared to single-sensor systems.
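The integration concept, each damage type routed to the sensor best suited to detect it, can be summarized as a simple mapping; the assignments below are illustrative assumptions, not the paper's final design.

```python
# Sketch of the routing idea behind the multi-sensor integration concept:
# the assignments here are hypothetical examples.
SENSOR_FOR_DAMAGE = {
    "surface_crack": "camera_array",   # fine texture -> high-resolution imagery
    "deformation": "lidar",            # geometric change -> point cloud
    "obstacle": "front_camera",
    "infiltration": "camera_array",
}

def route(damage_type: str) -> str:
    return SENSOR_FOR_DAMAGE.get(damage_type, "front_camera")
```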

14 pages, 4606 KiB  
Article
Research on Multi-Scale Spatio-Temporal Graph Convolutional Human Behavior Recognition Method Incorporating Multi-Granularity Features
by Yulin Wang, Tao Song, Yichen Yang and Zheng Hong
Sensors 2024, 24(23), 7595; https://doi.org/10.3390/s24237595 - 28 Nov 2024
Viewed by 866
Abstract
To address the problem that existing human skeleton behavior recognition methods are insensitive to local human movements and inaccurate in distinguishing similar behaviors, a multi-scale spatio-temporal graph convolution method incorporating multi-granularity features is proposed for human behavior recognition. Firstly, a skeleton fine-grained partitioning strategy is proposed, which initializes the skeleton data into data streams of different granularities. An adaptive cross-scale feature fusion layer is designed using a normalized Gaussian function to perform feature fusion among different granularities, guiding the model to focus on discriminative feature representations among similar behaviors through fine-grained features. Secondly, a sparse multi-scale adjacency matrix is introduced to solve the biased weighting problem that amplifies the multi-scale spatial domain modeling process under multi-granularity conditions. Finally, an end-to-end graph convolutional neural network is constructed to improve the feature expression ability of spatio-temporal receptive field information and enhance the robustness of recognition between similar behaviors. The feasibility of the proposed algorithm was verified on the public behavior recognition dataset MSR Action 3D, achieving an accuracy of 95.67%, which is superior to existing behavior recognition methods.
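The spatial core of such methods is a normalized graph convolution over the skeleton adjacency. The sketch below shows that single step with illustrative shapes; it is not the paper's full multi-granularity architecture.

```python
import torch

# One normalized spatial graph-convolution step over a skeleton graph:
# aggregate each joint's neighbors, then project the features.
def graph_conv(x, adj, weight):
    # x: (N, V, C) joint features; adj: (V, V) adjacency; weight: (C, C_out)
    deg = adj.sum(-1, keepdim=True).clamp(min=1)
    return (adj / deg) @ x @ weight

x = torch.randn(4, 20, 64)          # batch of 4 skeletons, 20 joints each
adj = torch.eye(20)                 # stand-in adjacency matrix
out = graph_conv(x, adj, torch.randn(64, 128))   # (4, 20, 128)
```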

16 pages, 5030 KiB  
Article
YOLO-APDM: Improved YOLOv8 for Road Target Detection in Infrared Images
by Song Ling, Xianggong Hong and Yongchao Liu
Sensors 2024, 24(22), 7197; https://doi.org/10.3390/s24227197 - 10 Nov 2024
Cited by 2 | Viewed by 2303
Abstract
A new algorithm called YOLO-APDM is proposed to address low image quality and multi-scale target detection issues in infrared road scenes. The method reconstructs the neck section of the algorithm using the multi-scale attentional feature fusion idea. Based on this reconstruction, the P2 detection layer is established, which optimizes the network structure, enhances multi-scale feature fusion performance, and expands the detection network’s capacity for multi-scale complicated targets. Replacing YOLOv8’s C2f module with C2f-DCNv3 increases the network’s ability to focus on the target region while lowering the number of model parameters. The MSCA mechanism is added after the backbone’s SPPF module to improve the model’s detection performance by directing the network’s detection resources to the major road target detection zone. Experimental results show that on the FLIR_ADAS_v2 dataset retaining eight main categories, YOLO-APDM improves mAP@0.5 and mAP@0.5:0.95 by 6.6% and 5.0%, respectively, compared with YOLOv8n. On the M3FD dataset, mAP@0.5 and mAP@0.5:0.95 increased by 8.1% and 5.9%, respectively. The number of model parameters and the model size were reduced by 8.6% and 4.8%, respectively. The design requirements for high-precision detection of infrared road targets were met while keeping model complexity under control.
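The reported mAP@0.5 and mAP@0.5:0.95 figures are the standard outputs of YOLO-family validation. As a sketch with the ultralytics package, using a hypothetical dataset YAML:

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Standard validation run that produces the mAP metrics reported above;
# the baseline weights and dataset YAML path are hypothetical.
model = YOLO("yolov8n.pt")
metrics = model.val(data="flir_adas_v2.yaml")
print(metrics.box.map50, metrics.box.map)   # mAP@0.5, mAP@0.5:0.95
```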

Review


46 pages, 2791 KiB  
Review
YOLO Object Detection for Real-Time Fabric Defect Inspection in the Textile Industry: A Review of YOLOv1 to YOLOv11
by Makara Mao and Min Hong
Sensors 2025, 25(7), 2270; https://doi.org/10.3390/s25072270 - 3 Apr 2025
Cited by 1 | Viewed by 532
Abstract
Automated fabric defect detection is crucial for improving quality control, reducing manual labor, and optimizing efficiency in the textile industry. Traditional inspection methods rely heavily on human oversight, which makes them prone to subjectivity, inefficiency, and inconsistency in high-speed manufacturing environments. This review systematically examines the evolution of the You Only Look Once (YOLO) object detection framework from YOLOv1 to YOLOv11, emphasizing architectural advancements such as attention-based feature refinement and Transformer integration and their impact on fabric defect detection. Unlike prior studies focusing on specific YOLO variants, this work comprehensively compares the entire YOLO family, highlighting key innovations and their practical implications. We also discuss the challenges, including dataset limitations, domain generalization, and computational constraints, proposing future solutions such as synthetic data generation, federated learning, and edge AI deployment. By bridging the gap between academic advancements and industrial applications, this review is a practical guide for selecting and optimizing YOLO models for fabric inspection, paving the way for intelligent quality control systems.
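As a sketch of the workflow the review compares across YOLO versions, fine-tuning a recent release on a fabric-defect dataset with the ultralytics package might look like this; the dataset YAML and hyperparameters are hypothetical.

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Fine-tune a recent YOLO model on a (hypothetical) fabric-defect dataset,
# then report detection mAP on the held-out split.
model = YOLO("yolo11n.pt")
model.train(data="fabric_defects.yaml", epochs=100, imgsz=640)
metrics = model.val()
```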
