Sensors

2024

Jump to: 2023, 2022, 2021

26 pages, 7432 KiB

Open AccessArticle

Research on Deep Learning Detection Model for Pedestrian Objects in Complex Scenes Based on Improved YOLOv7

by Jun Hu, Yongqi Zhou, Hao Wang, Peng Qiao and Wenwei Wan

Sensors 2024, 24(21), 6922; https://doi.org/10.3390/s24216922 - 29 Oct 2024

Viewed by 1581

Abstract

Objective: Pedestrian detection is very important for the environment perception and safety action of intelligent robots and autonomous driving, and is the key to ensuring the safe action of intelligent robots and auto assisted driving. Methods: In response to the characteristics of pedestrian [...] Read more.

Objective: Pedestrian detection is very important for the environment perception and safety action of intelligent robots and autonomous driving, and is the key to ensuring the safe action of intelligent robots and auto assisted driving. Methods: In response to the characteristics of pedestrian objects occupying a small image area, diverse poses, complex scenes and severe occlusion, this paper proposes an improved pedestrian object detection method based on the YOLOv7 model, which adopts the Convolutional Block Attention Module (CBAM) attention mechanism and Deformable ConvNets v2 (DCNv2) in the two Efficient Layer Aggregation Network (ELAN) modules of the backbone feature extraction network. In addition, the detection head is replaced with a Dynamic Head (DyHead) detector head with an attention mechanism; unnecessary background information around the pedestrian object is also effectively excluded, making the model learn more concentrated feature representations. Results: Compared with the original model, the log-average miss rate of the improved YOLOv7 model is significantly reduced in both the Citypersons dataset and the INRIA dataset. Conclusions: The improved YOLOv7 model proposed in this paper achieved good performance improvement in different pedestrian detection problems. The research in this paper has important reference significance for pedestrian detection in complex scenes such as small, occluded and overlapping objects. Full article

► Show Figures

Figure 1

18 pages, 10520 KiB

Open AccessArticle

Fusion Attention for Action Recognition: Integrating Sparse-Dense and Global Attention for Video Action Recognition

by Hyun-Woo Kim and Yong-Suk Choi

Sensors 2024, 24(21), 6842; https://doi.org/10.3390/s24216842 - 24 Oct 2024

Viewed by 1099

Abstract

Conventional approaches to video action recognition perform global attention over the entire video patches, which may be ineffective due to the temporal redundancy of video frames. Recent works on masked video modeling adopt a high-ratio tube masking and reconstruction strategy as a pre-training [...] Read more.

Conventional approaches to video action recognition perform global attention over the entire video patches, which may be ineffective due to the temporal redundancy of video frames. Recent works on masked video modeling adopt a high-ratio tube masking and reconstruction strategy as a pre-training method to mitigate the problem of focusing on spatial features well but not on temporal features. Inspired by this pre-training method, we propose Fusion Attention for Action Recognition (FAR), which fuses the sparse-dense attention patterns specialized for temporal features with global attention during fine-tuning. FAR has three main components: head-split sparse-dense attention (HSDA), token–group interaction, and group-averaged classifier. First, HSDA splits the head of multi-head self-attention to fuse global and sparse-dense attention. The sparse-dense attention is divided into groups of tube-shaped patches to focus on temporal features. Second, token–group interaction is used to improve information exchange between divided patch groups. Finally, the group-averaged classifier uses spatio-temporal features from different patch groups to improve performance. The proposed method uses the weight parameters that are pre-trained with VideoMAE and MVD, and achieves higher performance (+0.1–0.4%) with less computation than models fine-tuned with global attention on Something-Something V2 and Kinetics-400. Moreover, qualitative comparisons show that FAR captures temporal features quite well in highly redundant video frames. The FAR approach demonstrates improved action recognition with efficient computation, and exploring its adaptability across different pre-training methods presents an interesting direction for future research. Full article

► Show Figures

Figure 1

19 pages, 10893 KiB

Open AccessArticle

An Improved YOLOv8 OBB Model for Ship Detection through Stable Diffusion Data Augmentation

by Sang Feng, Yi Huang and Ning Zhang

Sensors 2024, 24(17), 5850; https://doi.org/10.3390/s24175850 - 9 Sep 2024

Cited by 6 | Viewed by 3410

Abstract

Unmanned aerial vehicles (UAVs) with cameras offer extensive monitoring capabilities and exceptional maneuverability, making them ideal for real-time ship detection and effective ship management. However, ship detection by camera-equipped UAVs faces challenges when it comes to multi-viewpoints, multi-scales, environmental variability, and dataset scarcity. [...] Read more.

Unmanned aerial vehicles (UAVs) with cameras offer extensive monitoring capabilities and exceptional maneuverability, making them ideal for real-time ship detection and effective ship management. However, ship detection by camera-equipped UAVs faces challenges when it comes to multi-viewpoints, multi-scales, environmental variability, and dataset scarcity. To overcome these challenges, we proposed a data augmentation method based on stable diffusion to generate new images for expanding the dataset. Additionally, we improve the YOLOv8n OBB model by incorporating the BiFPN structure and EMA module, enhancing its ability to detect multi-viewpoint and multi-scale ship instances. Through multiple comparative experiments, we evaluated the effectiveness of our proposed data augmentation method and the improved model. The results indicated that our proposed data augmentation method is effective for low-volume datasets with complex object features. The YOLOv8n-BiFPN-EMA OBB model we proposed performed well in detecting multi-viewpoint and multi-scale ship instances, achieving the mAP (@0.5) of 92.3%, the mAP (@0.5:0.95) of 77.5%, a reduction of 0.8 million in model parameters, and a detection speed that satisfies real-time ship detection requirements. Full article

► Show Figures

Figure 1

22 pages, 17310 KiB

Open AccessArticle

Adjacent Image Augmentation and Its Framework for Self-Supervised Learning in Anomaly Detection

by Gi Seung Kwon and Yong Suk Choi

Sensors 2024, 24(17), 5616; https://doi.org/10.3390/s24175616 - 29 Aug 2024

Cited by 1 | Viewed by 1314

Abstract

Anomaly detection has gained significant attention with the advancements in deep neural networks. Effective training requires both normal and anomalous data, but this often leads to a class imbalance, as anomalous data is scarce. Traditional augmentation methods struggle to maintain the correlation between [...] Read more.

Anomaly detection has gained significant attention with the advancements in deep neural networks. Effective training requires both normal and anomalous data, but this often leads to a class imbalance, as anomalous data is scarce. Traditional augmentation methods struggle to maintain the correlation between anomalous patterns and their surroundings. To address this, we propose an adjacent augmentation technique that generates synthetic anomaly images, preserving object shapes while distorting contours to enhance correlation. Experimental results show that adjacent augmentation captures high-quality anomaly features, achieving superior AU-ROC and AU-PR scores compared to existing methods. Additionally, our technique produces synthetic normal images, aiding in learning detailed normal data features and reducing sensitivity to minor variations. Our framework considers all training images within a batch as positive pairs, pairing them with synthetic normal images as positive pairs and with synthetic anomaly images as negative pairs. This compensates for the lack of anomalous features and effectively distinguishes between normal and anomalous features, mitigating class imbalance. Using the ResNet50 network, our model achieved perfect AU-ROC and AU-PR scores of 100% in the bottle category of the MVTec-AD dataset. We are also investigating the relationship between anomalous pattern size and detection performance. Full article

► Show Figures

Figure 1

18 pages, 5410 KiB

Open AccessArticle

Transformer and Adaptive Threshold Sliding Window for Improving Violence Detection in Videos

by Fernando J. Rendón-Segador, Juan A. Álvarez-García and Luis M. Soria-Morillo

Sensors 2024, 24(16), 5429; https://doi.org/10.3390/s24165429 - 22 Aug 2024

Viewed by 1277

Abstract

This paper presents a comprehensive approach to detect violent events in videos by combining CrimeNet, a Vision Transformer (ViT) model with structured neural learning and adversarial regularization, with an adaptive threshold sliding window model based on the Transformer architecture. CrimeNet demonstrates exceptional performance [...] Read more.

This paper presents a comprehensive approach to detect violent events in videos by combining CrimeNet, a Vision Transformer (ViT) model with structured neural learning and adversarial regularization, with an adaptive threshold sliding window model based on the Transformer architecture. CrimeNet demonstrates exceptional performance on all datasets (XD-Violence, UCF-Crime, NTU-CCTV Fights, UBI-Fights, Real Life Violence Situations, MediEval, RWF-2000, Hockey Fights, Violent Flows, Surveillance Camera Fights, and Movies Fight), achieving high AUC ROC and AUC PR values (up to 99% and 100%, respectively). However, the generalization of CrimeNet to cross-dataset experiments posed some problems, resulting in a 20–30% decrease in performance, for instance, training in UCF-Crime and testing in XD-Violence resulted in 70.20% in AUC ROC. The sliding window model with adaptive thresholding effectively solves these problems by automatically adjusting the violence detection threshold, resulting in a substantial improvement in detection accuracy. By applying the sliding window model as post-processing to CrimeNet results, we were able to improve detection accuracy by 10% to 15% in cross-dataset experiments. Future lines of research include improving generalization, addressing data imbalance, exploring multimodal representations, testing in real-world applications, and extending the approach to complex human interactions. Full article

► Show Figures

Figure 1

21 pages, 12734 KiB

Open AccessArticle

Improved Unsupervised Stitching Algorithm for Multiple Environments SuperUDIS

by Haoze Wu, Chun Bao, Qun Hao, Jie Cao and Li Zhang

Sensors 2024, 24(16), 5352; https://doi.org/10.3390/s24165352 - 19 Aug 2024

Viewed by 1415

Abstract

Large field-of-view images are increasingly used in various environments today, and image stitching technology can make up for the limited field of view caused by hardware design. However, previous methods are constrained in various environments. In this paper, we propose a method that [...] Read more.

Large field-of-view images are increasingly used in various environments today, and image stitching technology can make up for the limited field of view caused by hardware design. However, previous methods are constrained in various environments. In this paper, we propose a method that combines the powerful feature extraction capabilities of the Superpoint algorithm and the exact feature matching capabilities of the Lightglue algorithm with the image fusion algorithm of Unsupervised Deep Image Stitching (UDIS). Our proposed method effectively improves the situation where the linear structure is distorted and the resolution is low in the stitching results of the UDIS algorithm. On this basis, we make up for the shortcomings of the UDIS fusion algorithm. For stitching fractures of UDIS in some complex situations, we optimize the loss function of UDIS. We use a second-order differential Laplacian operator to replace the difference in the horizontal and vertical directions to emphasize the continuity of the structural edges during training. Combined with the above improvements, the Super Unsupervised Deep Image Stitching (SuperUDIS) algorithm is finally formed. SuperUDIS has better performance in both qualitative and quantitative evaluations compared to the UDIS algorithm, with the PSNR index increasing by 0.5 on average and the SSIM index increasing by 0.02 on average. Moreover, the proposed method is more robust in complex environments with large color differences or multi-linear structures. Full article

► Show Figures

Figure 1

20 pages, 4884 KiB

Open AccessArticle

Deep Learning-Based Dynamic Region of Interest Autofocus Method for Grayscale Image

by Yao Wang, Chuan Wu, Yunlong Gao and Huiying Liu

Sensors 2024, 24(13), 4336; https://doi.org/10.3390/s24134336 - 4 Jul 2024

Cited by 3 | Viewed by 2152

Abstract

In the field of autofocus for optical systems, although passive focusing methods are widely used due to their cost-effectiveness, fixed focusing windows and evaluation functions in certain scenarios can still lead to focusing failures. Additionally, the lack of datasets limits the extensive research [...] Read more.

In the field of autofocus for optical systems, although passive focusing methods are widely used due to their cost-effectiveness, fixed focusing windows and evaluation functions in certain scenarios can still lead to focusing failures. Additionally, the lack of datasets limits the extensive research of deep learning methods. In this work, we propose a neural network autofocus method with the capability of dynamically selecting the region of interest (ROI). Our main work is as follows: first, we construct a dataset for automatic focusing of grayscale images; second, we transform the autofocus issue into an ordinal regression problem and propose two focusing strategies: full-stack search and single-frame prediction; and third, we construct a MobileViT network with a linear self-attention mechanism to achieve automatic focusing on dynamic regions of interest. The effectiveness of the proposed focusing method is verified through experiments, and the results show that the focusing MAE of the full-stack search can be as low as 0.094, with a focusing time of 27.8 ms, and the focusing MAE of the single-frame prediction can be as low as 0.142, with a focusing time of 27.5 ms. Full article

► Show Figures

Figure 1

24 pages, 6026 KiB

Open AccessArticle

Monocular Depth Estimation via Self-Supervised Self-Distillation

by Haifeng Hu, Yuyang Feng, Dapeng Li, Suofei Zhang and Haitao Zhao

Sensors 2024, 24(13), 4090; https://doi.org/10.3390/s24134090 - 24 Jun 2024

Cited by 3 | Viewed by 1699

Abstract

Self-supervised monocular depth estimation can exhibit excellent performance in static environments due to the multi-view consistency assumption during the training process. However, it is hard to maintain depth consistency in dynamic scenes when considering the occlusion problem caused by moving objects. For this [...] Read more.

Self-supervised monocular depth estimation can exhibit excellent performance in static environments due to the multi-view consistency assumption during the training process. However, it is hard to maintain depth consistency in dynamic scenes when considering the occlusion problem caused by moving objects. For this reason, we propose a method of self-supervised self-distillation for monocular depth estimation (SS-MDE) in dynamic scenes, where a deep network with a multi-scale decoder and a lightweight pose network are designed to predict depth in a self-supervised manner via the disparity, motion information, and the association between two adjacent frames in the image sequence. Meanwhile, in order to improve the depth estimation accuracy of static areas, the pseudo-depth images generated by the LeReS network are used to provide the pseudo-supervision information, enhancing the effect of depth refinement in static areas. Furthermore, a forgetting factor is leveraged to alleviate the dependency on the pseudo-supervision. In addition, a teacher model is introduced to generate depth prior information, and a multi-view mask filter module is designed to implement feature extraction and noise filtering. This can enable the student model to better learn the deep structure of dynamic scenes, enhancing the generalization and robustness of the entire model in a self-distillation manner. Finally, on four public data datasets, the performance of the proposed SS-MDE method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy (

δ_{1}

) of 89% while minimizing the error (AbsRel) by 0.102 in NYU-Depth V2 and achieving an accuracy (

δ_{1}

) of 87% while minimizing the error (AbsRel) by 0.111 in KITTI. Full article

► Show Figures

Figure 1

14 pages, 9137 KiB

Open AccessArticle

Learning-Based Non-Intrusive Electric Load Monitoring for Smart Energy Management

by Nian He, Dengfeng Liu, Zhichen Zhang, Zhiquan Lin, Tiesong Zhao and Yiwen Xu

Sensors 2024, 24(10), 3109; https://doi.org/10.3390/s24103109 - 14 May 2024

Viewed by 1562

Abstract

State-of-the-art smart cities have been calling for economic but efficient energy management over a large-scale network, especially for the electric power system. It is a critical issue to monitor, analyze, and control electric loads of all users in the system. In this study, [...] Read more.

State-of-the-art smart cities have been calling for economic but efficient energy management over a large-scale network, especially for the electric power system. It is a critical issue to monitor, analyze, and control electric loads of all users in the system. In this study, a non-intrusive load monitoring method was designed for smart power management using computer vision techniques popular in artificial intelligence. First of all, one-dimensional current signals are mapped onto two-dimensional color feature images using signal transforms (including the wavelet transform and discrete Fourier transform) and Gramian Angular Field (GAF) methods. Second, a deep neural network with multi-scale feature extraction and attention mechanism is proposed to recognize all electrical loads from the color feature images. Third, a cloud-based approach was designed for the non-intrusive monitoring of all users, thereby saving energy costs during power system control. Experimental results on both public and private datasets demonstrate that the method achieves superior performances compared to its peers, and thus supports efficient energy management over a large-scale Internet of Things network. Full article

► Show Figures

Figure 1

17 pages, 3364 KiB

Open AccessArticle

A Novel Lightweight Model for Underwater Image Enhancement

by Botao Liu, Yimin Yang, Ming Zhao and Min Hu

Sensors 2024, 24(10), 3070; https://doi.org/10.3390/s24103070 - 11 May 2024

Cited by 3 | Viewed by 2526

Abstract

Underwater images suffer from low contrast and color distortion. In order to improve the quality of underwater images and reduce storage and computational resources, this paper proposes a lightweight model Rep-UWnet to enhance underwater images. The model consists of a fully connected convolutional [...] Read more.

Underwater images suffer from low contrast and color distortion. In order to improve the quality of underwater images and reduce storage and computational resources, this paper proposes a lightweight model Rep-UWnet to enhance underwater images. The model consists of a fully connected convolutional network and three densely connected RepConv blocks in series, with the input images connected to the output of each block with a Skip connection. First, the original underwater image is subjected to feature extraction by the SimSPPF module and is processed through feature summation with the original one to be produced as the input image. Then, the first convolutional layer with a kernel size of 3 × 3, generates 64 feature maps, and the multi-scale hybrid convolutional attention module enhances the useful features by reweighting the features of different channels. Second, three RepConv blocks are connected to reduce the number of parameters in extracting features and increase the test speed. Finally, a convolutional layer with 3 kernels generates enhanced underwater images. Our method reduces the number of parameters from 2.7 M to 0.45 M (around 83% reduction) but outperforms state-of-the-art algorithms by extensive experiments. Furthermore, we demonstrate our Rep-UWnet effectively improves high-level vision tasks like edge detection and single image depth estimation. This method not only surpasses the contrast method in objective quality, but also significantly improves the contrast, colorimetry, and clarity of underwater images in subjective quality. Full article

► Show Figures

Figure 1

22 pages, 16578 KiB

Open AccessArticle

YOLOv8-RMDA: Lightweight YOLOv8 Network for Early Detection of Small Target Diseases in Tea

by Rong Ye, Guoqi Shao, Yun He, Quan Gao and Tong Li

Sensors 2024, 24(9), 2896; https://doi.org/10.3390/s24092896 - 1 May 2024

Cited by 15 | Viewed by 3169

Abstract

In order to efficiently identify early tea diseases, an improved YOLOv8 lesion detection method is proposed to address the challenges posed by the complex background of tea diseases, difficulty in detecting small lesions, and low recognition rate of similar phenotypic symptoms. This method [...] Read more.

In order to efficiently identify early tea diseases, an improved YOLOv8 lesion detection method is proposed to address the challenges posed by the complex background of tea diseases, difficulty in detecting small lesions, and low recognition rate of similar phenotypic symptoms. This method focuses on detecting tea leaf blight, tea white spot, tea sooty leaf disease, and tea ring spot as the research objects. This paper presents an enhancement to the YOLOv8 network framework by introducing the Receptive Field Concentration-Based Attention Module (RFCBAM) into the backbone network to replace C2f, thereby improving feature extraction capabilities. Additionally, a mixed pooling module (Mixed Pooling SPPF, MixSPPF) is proposed to enhance information blending between features at different levels. In the neck network, the RepGFPN module replaces the C2f module to further enhance feature extraction. The Dynamic Head module is embedded in the detection head part, applying multiple attention mechanisms to improve multi-scale spatial location and multi-task perception capabilities. The inner-IoU loss function is used to replace the original CIoU, improving learning ability for small lesion samples. Furthermore, the AKConv block replaces the traditional convolution Conv block to allow for the arbitrary sampling of targets of various sizes, reducing model parameters and enhancing disease detection. the experimental results using a self-built dataset demonstrate that the enhanced YOLOv8-RMDA exhibits superior detection capabilities in detecting small target disease areas, achieving an average accuracy of 93.04% in identifying early tea lesions. When compared to Faster R-CNN, MobileNetV2, and SSD, the average precision rates of YOLOv5, YOLOv7, and YOLOv8 have shown improvements of 20.41%, 17.92%, 12.18%, 12.18%, 10.85%, 7.32%, and 5.97%, respectively. Additionally, the recall rate (R) has increased by 15.25% compared to the lowest-performing Faster R-CNN model and by 8.15% compared to the top-performing YOLOv8 model. With an FPS of 132, YOLOv8-RMDA meets the requirements for real-time detection, enabling the swift and accurate identification of early tea diseases. This advancement presents a valuable approach for enhancing the ecological tea industry in Yunnan, ensuring its healthy development. Full article

► Show Figures

Figure 1

13 pages, 1668 KiB

Open AccessArticle

AcquisitionFocus: Joint Optimization of Acquisition Orientation and Cardiac Volume Reconstruction Using Deep Learning

by Christian Weihsbach, Nora Vogt, Ziad Al-Haj Hemidi, Alexander Bigalke, Lasse Hansen, Julien Oster and Mattias P. Heinrich

Sensors 2024, 24(7), 2296; https://doi.org/10.3390/s24072296 - 4 Apr 2024

Viewed by 1370

Abstract

In cardiac cine imaging, acquiring high-quality data is challenging and time-consuming due to the artifacts generated by the heart’s continuous movement. Volumetric, fully isotropic data acquisition with high temporal resolution is, to date, intractable due to MR physics constraints. To assess whole-heart movement [...] Read more.

In cardiac cine imaging, acquiring high-quality data is challenging and time-consuming due to the artifacts generated by the heart’s continuous movement. Volumetric, fully isotropic data acquisition with high temporal resolution is, to date, intractable due to MR physics constraints. To assess whole-heart movement under minimal acquisition time, we propose a deep learning model that reconstructs the volumetric shape of multiple cardiac chambers from a limited number of input slices while simultaneously optimizing the slice acquisition orientation for this task. We mimic the current clinical protocols for cardiac imaging and compare the shape reconstruction quality of standard clinical views and optimized views. In our experiments, we show that the jointly trained model achieves accurate high-resolution multi-chamber shape reconstruction with errors of <13 mm HD95 and Dice scores of >80%, indicating its effectiveness in both simulated cardiac cine MRI and clinical cardiac MRI with a wide range of pathological shape variations. Full article

► Show Figures

Figure 1

19 pages, 4886 KiB

Open AccessArticle

Burst-Enhanced Super-Resolution Network (BESR)

by Jiaao Li, Qunbo Lv, Wenjian Zhang, Yu Zhang and Zheng Tan

Sensors 2024, 24(7), 2052; https://doi.org/10.3390/s24072052 - 23 Mar 2024

Cited by 1 | Viewed by 1718

Abstract

Multi-frame super-resolution (MFSR) leverages complementary information between image sequences of the same scene to increase the resolution of the reconstructed image. As a branch of MFSR, burst super-resolution aims to restore image details by leveraging the complementary information between noisy sequences. In this [...] Read more.

Multi-frame super-resolution (MFSR) leverages complementary information between image sequences of the same scene to increase the resolution of the reconstructed image. As a branch of MFSR, burst super-resolution aims to restore image details by leveraging the complementary information between noisy sequences. In this paper, we propose an efficient burst-enhanced super-resolution network (BESR). Specifically, we introduce Geformer, a gate-enhanced transformer, and construct an enhanced CNN-Transformer block (ECTB) by combining convolutions to enhance local perception. ECTB efficiently aggregates intra-frame context and inter-frame correlation information, yielding an enhanced feature representation. Additionally, we leverage reference features to facilitate inter-frame communication, enhancing spatiotemporal coherence among multiple frames. To address the critical processes of inter-frame alignment and feature fusion, we propose optimized pyramid alignment (OPA) and hybrid feature fusion (HFF) modules to capture and utilize complementary information between multiple frames to recover more high-frequency details. Extensive experiments demonstrate that, compared to state-of-the-art methods, BESR achieves higher efficiency and competitively superior reconstruction results. On the synthetic dataset and real-world dataset of BurstSR, our BESR achieves PSNR values of 42.79 dB and 48.86 dB, respectively, outperforming other MFSR models significantly. Full article

► Show Figures

Figure 1

31 pages, 3474 KiB

Open AccessReview

A Survey of Deep Learning Road Extraction Algorithms Using High-Resolution Remote Sensing Images

by Shaoyi Mo, Yufeng Shi, Qi Yuan and Mingyue Li

Sensors 2024, 24(5), 1708; https://doi.org/10.3390/s24051708 - 6 Mar 2024

Cited by 13 | Viewed by 5264

Abstract

Roads are the fundamental elements of transportation, connecting cities and rural areas, as well as people’s lives and work. They play a significant role in various areas such as map updates, economic development, tourism, and disaster management. The automatic extraction of road features [...] Read more.

Roads are the fundamental elements of transportation, connecting cities and rural areas, as well as people’s lives and work. They play a significant role in various areas such as map updates, economic development, tourism, and disaster management. The automatic extraction of road features from high-resolution remote sensing images has always been a hot and challenging topic in the field of remote sensing, and deep learning network models are widely used to extract roads from remote sensing images in recent years. In light of this, this paper systematically reviews and summarizes the deep-learning-based techniques for automatic road extraction from high-resolution remote sensing images. It reviews the application of deep learning network models in road extraction tasks and classifies these models into fully supervised learning, semi-supervised learning, and weakly supervised learning based on their use of labels. Finally, a summary and outlook of the current development of deep learning techniques in road extraction are provided. Full article

► Show Figures

Figure 1

2023

Jump to: 2024, 2022, 2021

22 pages, 2825 KiB

Open AccessArticle

Thermal Image Super-Resolution Based on Lightweight Dynamic Attention Network for Infrared Sensors

by Haikun Zhang, Yueli Hu and Ming Yan

Sensors 2023, 23(21), 8717; https://doi.org/10.3390/s23218717 - 25 Oct 2023

Cited by 6 | Viewed by 2790

Abstract

Infrared sensors capture infrared rays radiated by objects to form thermal images. They have a steady ability to penetrate smoke and fog, and are widely used in security monitoring, military, etc. However, civilian infrared detectors with lower resolution cannot compare with megapixel RGB [...] Read more.

Infrared sensors capture infrared rays radiated by objects to form thermal images. They have a steady ability to penetrate smoke and fog, and are widely used in security monitoring, military, etc. However, civilian infrared detectors with lower resolution cannot compare with megapixel RGB camera sensors. In this paper, we propose a dynamic attention mechanism-based thermal image super-resolution network for infrared sensors. Specifically, the dynamic attention modules adaptively reweight the outputs of the attention and non-attention branches according to features at different depths of the network. The attention branch, which consists of channel- and pixel-wise attention blocks, is responsible for extracting the most informative features, while the non-attention branch is adopted as a supplement to extract the remaining ignored features. The dynamic weights block operates with 1D convolution instead of the full multi-layer perceptron on the global average pooled features, reducing parameters and enhancing information interaction between channels, and the same structure is adopted in the channel attention block. Qualitative and quantitative results on three testing datasets demonstrate that the proposed network can superior restore high-frequency details while improving the resolution of thermal images. And the lightweight structure of the proposed network with lower computing cost can be practically deployed on edge devices, effectively improving the imaging perception quality of infrared sensors. Full article

► Show Figures

Figure 1

19 pages, 1503 KiB

Open AccessArticle

Scale-Hybrid Group Distillation with Knowledge Disentangling for Continual Semantic Segmentation

by Zichen Song, Xiaoliang Zhang and Zhaofeng Shi

Sensors 2023, 23(18), 7820; https://doi.org/10.3390/s23187820 - 12 Sep 2023

Viewed by 1352

Abstract

Continual semantic segmentation (CSS) aims to learn new tasks sequentially and extract object(s) and stuff represented by pixel-level maps of new categories while preserving the original segmentation capabilities even when the old class data is absent. Current CSS methods typically preserve the capacities [...] Read more.

Continual semantic segmentation (CSS) aims to learn new tasks sequentially and extract object(s) and stuff represented by pixel-level maps of new categories while preserving the original segmentation capabilities even when the old class data is absent. Current CSS methods typically preserve the capacities of segmenting old classes via knowledge distillation, which encounters the limitations of insufficient utilization of the semantic knowledge, i.e., only distilling the last layer of the feature encoder, and the semantic shift of background caused by directly distilling the entire feature map of the decoder. In this paper, we propose a novel CCS method based on scale-hybrid distillation and knowledge disentangling to address these limitations. Firstly, we propose a scale-hybrid group semantic distillation (SGD) method for encoding, which transfers the multi-scale knowledge from the old model’s feature encoder with group pooling refinement to improve the stability of new models. Then, the knowledge disentangling distillation (KDD) method for decoding is proposed to distillate feature maps with the guidance of the old class regions and reduce incorrect guides from old models towards better plasticity. Extensive experiments are conducted on the Pascal VOC and ADE20K datasets. Competitive performance compared with other state-of-the-art methods demonstrates the effectiveness of our proposed method. Full article

► Show Figures

Figure 1

19 pages, 4308 KiB

Open AccessArticle

Deep Sensing for Compressive Video Acquisition

by Michitaka Yoshida, Akihiko Torii, Masatoshi Okutomi, Rin-ichiro Taniguchi, Hajime Nagahara and Yasushi Yagi

Sensors 2023, 23(17), 7535; https://doi.org/10.3390/s23177535 - 30 Aug 2023

Cited by 1 | Viewed by 1784

Abstract

A camera captures multidimensional information of the real world by convolving it into two dimensions using a sensing matrix. The original multidimensional information is then reconstructed from captured images. Traditionally, multidimensional information has been captured by uniform sampling, but by optimizing the sensing [...] Read more.

A camera captures multidimensional information of the real world by convolving it into two dimensions using a sensing matrix. The original multidimensional information is then reconstructed from captured images. Traditionally, multidimensional information has been captured by uniform sampling, but by optimizing the sensing matrix, we can capture images more efficiently and reconstruct multidimensional information with high quality. Although compressive video sensing requires random sampling as a theoretical optimum, when designing the sensing matrix in practice, there are many hardware limitations (such as exposure and color filter patterns). Existing studies have found random sampling is not always the best solution for compressive sensing because the optimal sampling pattern is related to the scene context, and it is hard to manually design a sampling pattern and reconstruction algorithm. In this paper, we propose an end-to-end learning approach that jointly optimizes the sampling pattern as well as the reconstruction decoder. We applied this deep sensing approach to the video compressive sensing problem. We modeled the spatio–temporal sampling and color filter pattern using a convolutional neural network constrained by hardware limitations during network training. We demonstrated that the proposed method performs better than the manually designed method in gray-scale video and color video acquisitions. Full article

► Show Figures

Figure 1

22 pages, 898 KiB

Open AccessArticle

Small Object Detection and Tracking: A Comprehensive Review

by Behzad Mirzaei, Hossein Nezamabadi-pour, Amir Raoof and Reza Derakhshani

Sensors 2023, 23(15), 6887; https://doi.org/10.3390/s23156887 - 3 Aug 2023

Cited by 40 | Viewed by 19063

Abstract

Object detection and tracking are vital in computer vision and visual surveillance, allowing for the detection, recognition, and subsequent tracking of objects within images or video sequences. These tasks underpin surveillance systems, facilitating automatic video annotation, identification of significant events, and detection of [...] Read more.

Object detection and tracking are vital in computer vision and visual surveillance, allowing for the detection, recognition, and subsequent tracking of objects within images or video sequences. These tasks underpin surveillance systems, facilitating automatic video annotation, identification of significant events, and detection of abnormal activities. However, detecting and tracking small objects introduce significant challenges within computer vision due to their subtle appearance and limited distinguishing features, which results in a scarcity of crucial information. This deficit complicates the tracking process, often leading to diminished efficiency and accuracy. To shed light on the intricacies of small object detection and tracking, we undertook a comprehensive review of the existing methods in this area, categorizing them from various perspectives. We also presented an overview of available datasets specifically curated for small object detection and tracking, aiming to inform and benefit future research in this domain. We further delineated the most widely used evaluation metrics for assessing the performance of small object detection and tracking techniques. Finally, we examined the present challenges within this field and discussed prospective future trends. By tackling these issues and leveraging upcoming trends, we aim to push forward the boundaries in small object detection and tracking, thereby augmenting the functionality of surveillance systems and broadening their real-world applicability. Full article

► Show Figures

Figure 1

19 pages, 6755 KiB

Open AccessArticle

Pixel Intensity Resemblance Measurement and Deep Learning Based Computer Vision Model for Crack Detection and Analysis

by Nirmala Paramanandham, Kishore Rajendiran, Florence Gnana Poovathy J, Yeshwant Santhanakrishnan Premanand, Sanjeeve Raveenthiran Mallichetty and Pramod Kumar

Sensors 2023, 23(6), 2954; https://doi.org/10.3390/s23062954 - 8 Mar 2023

Cited by 5 | Viewed by 3634

Abstract

This research article is aimed at improving the efficiency of a computer vision system that uses image processing for detecting cracks. Images are prone to noise when captured using drones or under various lighting conditions. To analyze this, the images were gathered under [...] Read more.

This research article is aimed at improving the efficiency of a computer vision system that uses image processing for detecting cracks. Images are prone to noise when captured using drones or under various lighting conditions. To analyze this, the images were gathered under various conditions. To address the noise issue and to classify the cracks based on the severity level, a novel technique is proposed using a pixel-intensity resemblance measurement (PIRM) rule. Using PIRM, the noisy images and noiseless images were classified. Then, the noise was filtered using a median filter. The cracks were detected using VGG-16, ResNet-50 and InceptionResNet-V2 models. Once the crack was detected, the images were then segregated using a crack risk-analysis algorithm. Based on the severity level of the crack, an alert can be given to the authorized person to take the necessary action to avoid major accidents. The proposed technique achieved a 6% improvement without PIRM and a 10% improvement with the PIRM rule for the VGG-16 model. Similarly, it showed 3 and 10% for ResNet-50, 2 and 3% for Inception ResNet and a 9 and 10% increment for the Xception model. When the images were corrupted from a single noise alone, 95.6% accuracy was achieved using the ResNet-50 model for Gaussian noise, 99.65% accuracy was achieved through Inception ResNet-v2 for Poisson noise, and 99.95% accuracy was achieved by the Xception model for speckle noise. Full article

► Show Figures

Figure 1

17 pages, 5156 KiB

Open AccessArticle

Blind Video Quality Assessment for Ultra-High-Definition Video Based on Super-Resolution and Deep Reinforcement Learning

by Zefeng Ying, Da Pan and Ping Shi

Sensors 2023, 23(3), 1511; https://doi.org/10.3390/s23031511 - 29 Jan 2023

Cited by 6 | Viewed by 2966

Abstract

Ultra-high-definition (UHD) video has brought new challenges to objective video quality assessment (VQA) due to its high resolution and high frame rate. Most existing VQA methods are designed for non-UHD videos—when they are employed to deal with UHD videos, the processing speed will [...] Read more.

Ultra-high-definition (UHD) video has brought new challenges to objective video quality assessment (VQA) due to its high resolution and high frame rate. Most existing VQA methods are designed for non-UHD videos—when they are employed to deal with UHD videos, the processing speed will be slow and the global spatial features cannot be fully extracted. In addition, these VQA methods usually segment the video into multiple segments, predict the quality score of each segment, and then average the quality score of each segment to obtain the quality score of the whole video. This breaks the temporal correlation of the video sequences and is inconsistent with the characteristics of human visual perception. In this paper, we present a no-reference VQA method, aiming to effectively and efficiently predict quality scores for UHD videos. First, we construct a spatial distortion feature network based on a super-resolution model (SR-SDFNet), which can quickly extract the global spatial distortion features of UHD videos. Then, to aggregate the spatial distortion features of each UHD frame, we propose a time fusion network based on a reinforcement learning model (RL-TFNet), in which the actor network continuously combines multiple frame features extracted by SR-SDFNet and outputs an action to adjust the current quality score to approximate the subjective score, and the critic network outputs action values to optimize the quality perception of the actor network. Finally, we conduct large-scale experiments on UHD VQA databases and the results reveal that, compared to other state-of-the-art VQA methods, our method achieves competitive quality prediction performance with a shorter runtime and fewer model parameters. Full article

► Show Figures

Figure 1

19 pages, 3561 KiB

Open AccessArticle

Privacy Preserving Image Encryption with Optimal Deep Transfer Learning Based Accident Severity Classification Model

by Uddagiri Sirisha and Bolem Sai Chandana

Sensors 2023, 23(1), 519; https://doi.org/10.3390/s23010519 - 3 Jan 2023

Cited by 28 | Viewed by 2801

Abstract

Effective accident management acts as a vital part of emergency and traffic control systems. In such systems, accident data can be collected from different sources (unmanned aerial vehicles, surveillance cameras, on-site people, etc.) and images are considered a major source. Accident site photos [...] Read more.

Effective accident management acts as a vital part of emergency and traffic control systems. In such systems, accident data can be collected from different sources (unmanned aerial vehicles, surveillance cameras, on-site people, etc.) and images are considered a major source. Accident site photos and measurements are the most important evidence. Attackers will steal data and breach personal privacy, causing untold costs. The massive number of images commonly employed poses a significant challenge to privacy preservation, and image encryption can be used to accomplish cloud storage and secure image transmission. Automated severity estimation using deep-learning (DL) models becomes essential for effective accident management. Therefore, this article presents a novel Privacy Preserving Image Encryption with Optimal Deep-Learning-based Accident Severity Classification (PPIE-ODLASC) method. The primary objective of the PPIE-ODLASC algorithm is to securely transmit the accident images and classify accident severity into different levels. In the presented PPIE-ODLASC technique, two major processes are involved, namely encryption and severity classification (i.e., high, medium, low, and normal). For accident image encryption, the multi-key homomorphic encryption (MKHE) technique with lion swarm optimization (LSO)-based optimal key generation procedure is involved. In addition, the PPIE-ODLASC approach involves YOLO-v5 object detector to identify the region of interest (ROI) in the accident images. Moreover, the accident severity classification module encompasses Xception feature extractor, bidirectional gated recurrent unit (BiGRU) classification, and Bayesian optimization (BO)-based hyperparameter tuning. The experimental validation of the proposed PPIE-ODLASC algorithm is tested utilizing accident images and the outcomes are examined in terms of many measures. The comparative examination revealed that the PPIE-ODLASC technique showed an enhanced performance of 57.68 dB over other existing models. Full article

► Show Figures

Figure 1

2022

Jump to: 2024, 2023, 2021

15 pages, 4035 KiB

Open AccessArticle

Lightweight Super-Resolution with Self-Calibrated Convolution for Panoramic Videos

by Fanjie Shang, Hongying Liu, Wanhao Ma, Yuanyuan Liu, Licheng Jiao, Fanhua Shang, Lijun Wang and Zhenyu Zhou

Sensors 2023, 23(1), 392; https://doi.org/10.3390/s23010392 - 30 Dec 2022

Cited by 4 | Viewed by 2419

Abstract

Panoramic videos are shot by an omnidirectional camera or a collection of cameras, and can display a view in every direction. They can provide viewers with an immersive feeling. The study of super-resolution of panoramic videos has attracted much attention, and many methods [...] Read more.

Panoramic videos are shot by an omnidirectional camera or a collection of cameras, and can display a view in every direction. They can provide viewers with an immersive feeling. The study of super-resolution of panoramic videos has attracted much attention, and many methods have been proposed, especially deep learning-based methods. However, due to complex architectures of all the methods, they always result in a large number of hyperparameters. To address this issue, we propose the first lightweight super-resolution method with self-calibrated convolution for panoramic videos. A new deformable convolution module is designed first, with self-calibration convolution, which can learn more accurate offset and enhance feature alignment. Moreover, we present a new residual dense block for feature reconstruction, which can significantly reduce the parameters while maintaining performance. The performance of the proposed method is compared to those of the state-of-the-art methods, and is verified on the MiG panoramic video dataset. Full article

► Show Figures

Figure 1

23 pages, 5688 KiB

Open AccessArticle

Automatic Recognition of Road Damage Based on Lightweight Attentional Convolutional Neural Network

by Han Liang, Seong-Cheol Lee and Suyoung Seo

Sensors 2022, 22(24), 9599; https://doi.org/10.3390/s22249599 - 7 Dec 2022

Cited by 12 | Viewed by 3930

Abstract

An efficient road damage detection system can reduce the risk of road defects to motorists and road maintenance costs to traffic management authorities, for which a lightweight end-to-end road damage detection network is proposed in this paper, aiming at fast and automatic accurate [...] Read more.

An efficient road damage detection system can reduce the risk of road defects to motorists and road maintenance costs to traffic management authorities, for which a lightweight end-to-end road damage detection network is proposed in this paper, aiming at fast and automatic accurate identification and classification of multiple types of road damage. The proposed technique consists of a backbone network based on a combination of lightweight feature detection modules constituted with a multi-scale feature fusion network, which is more beneficial for target identification and classification at different distances and angles than other studies. An embedded lightweight attention module was also developed that can enhance feature information by assigning weights to multi-scale convolutional kernels to improve detection accuracy with fewer parameters. The proposed model generally has higher performance and fewer parameters than other representative models. According to our practice tests, it can identify many types of road damage based on the images captured by vehicle cameras and meet the real-time detection required when piggybacking on mobile systems. Full article

► Show Figures

Figure 1

11 pages, 7541 KiB

Open AccessArticle

Industrial Anomaly Detection with Skip Autoencoder and Deep Feature Extractor

by Ta-Wei Tang, Hakiem Hsu, Wei-Ren Huang and Kuan-Ming Li

Sensors 2022, 22(23), 9327; https://doi.org/10.3390/s22239327 - 30 Nov 2022

Cited by 6 | Viewed by 2918

Abstract

Over recent years, with the advances in image recognition technology for deep learning, researchers have devoted continued efforts toward importing anomaly detection technology into the production line of automatic optical detection. Although unsupervised learning helps overcome the high costs associated with labeling, the [...] Read more.

Over recent years, with the advances in image recognition technology for deep learning, researchers have devoted continued efforts toward importing anomaly detection technology into the production line of automatic optical detection. Although unsupervised learning helps overcome the high costs associated with labeling, the accuracy of anomaly detection still needs to be improved. Accordingly, this paper proposes a novel deep learning model for anomaly detection to overcome this bottleneck. Leveraging a powerful pre-trained feature extractor and the skip connection, the proposed method achieves better feature extraction and image reconstructing capabilities. Results reveal that the areas under the curve (AUC) for the proposed method are higher than those of previous anomaly detection models for 16 out of 17 categories. This indicates that the proposed method can realize the most appropriate adjustments to the needs of production lines in order to maximize economic benefits. Full article

► Show Figures

Figure 1

20 pages, 3364 KiB

Open AccessArticle

Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network

by Shima Javanmardi, Ali Mohammad Latif, Mohammad Taghi Sadeghi, Mehrdad Jahanbanifard, Marcello Bonsangue and Fons J. Verbeek

Sensors 2022, 22(21), 8376; https://doi.org/10.3390/s22218376 - 1 Nov 2022

Cited by 7 | Viewed by 3506

Abstract

In image captioning models, the main challenge in describing an image is identifying all the objects by precisely considering the relationships between the objects and producing various captions. Over the past few years, many methods have been proposed, from an attribute-to-attribute comparison approach [...] Read more.

In image captioning models, the main challenge in describing an image is identifying all the objects by precisely considering the relationships between the objects and producing various captions. Over the past few years, many methods have been proposed, from an attribute-to-attribute comparison approach to handling issues related to semantics and their relationships. Despite the improvements, the existing techniques suffer from inadequate positional and geometrical attributes concepts. The reason is that most of the abovementioned approaches depend on Convolutional Neural Networks (CNNs) for object detection. CNN is notorious for failing to detect equivariance and rotational invariance in objects. Moreover, the pooling layers in CNNs cause valuable information to be lost. Inspired by the recent successful approaches, this paper introduces a novel framework for extracting meaningful descriptions based on a parallelized capsule network that describes the content of images through a high level of understanding of the semantic contents of an image. The main contribution of this paper is proposing a new method that not only overrides the limitations of CNNs but also generates descriptions with a wide variety of words by using Wikipedia. In our framework, capsules focus on the generation of meaningful descriptions with more detailed spatial and geometrical attributes for a given set of images by considering the position of the entities as well as their relationships. Qualitative experiments on the benchmark dataset MS-COCO show that our framework outperforms state-of-the-art image captioning models when describing the semantic content of the images. Full article

► Show Figures

Figure 1

21 pages, 13912 KiB

Open AccessArticle

Deep Learning-Based Synthesized View Quality Enhancement with DIBR Distortion Mask Prediction Using Synthetic Images

by Huan Zhang, Jiangzhong Cao, Dongsheng Zheng, Ximei Yao and Bingo Wing-Kuen Ling

Sensors 2022, 22(21), 8127; https://doi.org/10.3390/s22218127 - 24 Oct 2022

Cited by 6 | Viewed by 2928

Abstract

Recently, deep learning-based image quality enhancement models have been proposed to improve the perceptual quality of distorted synthesized views impaired by compression and the Depth Image-Based Rendering (DIBR) process in a multi-view video system. However, due to the lack of Multi-view Video plus [...] Read more.

Recently, deep learning-based image quality enhancement models have been proposed to improve the perceptual quality of distorted synthesized views impaired by compression and the Depth Image-Based Rendering (DIBR) process in a multi-view video system. However, due to the lack of Multi-view Video plus Depth (MVD) data, the training data for quality enhancement models is small, which limits the performance and progress of these models. Augmenting the training data to enhance the synthesized view quality enhancement (SVQE) models is a feasible solution. In this paper, a deep learning-based SVQE model using more synthetic synthesized view images (SVIs) is suggested. To simulate the irregular geometric displacement of DIBR distortion, a random irregular polygon-based SVI synthesis method is proposed based on existing massive RGB/RGBD data, and a synthetic synthesized view database is constructed, which includes synthetic SVIs and the DIBR distortion mask. Moreover, to further guide the SVQE models to focus more precisely on DIBR distortion, a DIBR distortion mask prediction network which could predict the position and variance of DIBR distortion is embedded into the SVQE models. The experimental results on public MVD sequences demonstrate that the PSNR performance of the existing SVQE models, e.g., DnCNN, NAFNet, and TSAN, pre-trained on NYU-based synthetic SVIs could be greatly promoted by 0.51-, 0.36-, and 0.26 dB on average, respectively, while the MPPSNRr performance could also be elevated by 0.86, 0.25, and 0.24 on average, respectively. In addition, by introducing the DIBR distortion mask prediction network, the SVI quality obtained by the DnCNN and NAFNet pre-trained on NYU-based synthetic SVIs could be further enhanced by 0.02- and 0.03 dB on average in terms of the PSNR and 0.004 and 0.121 on average in terms of the MPPSNRr. Full article

► Show Figures

Figure 1

17 pages, 4653 KiB

Open AccessArticle

Semi-Supervised Defect Detection Method with Data-Expanding Strategy for PCB Quality Inspection

by Yusen Wan, Liang Gao, Xinyu Li and Yiping Gao

Sensors 2022, 22(20), 7971; https://doi.org/10.3390/s22207971 - 19 Oct 2022

Cited by 16 | Viewed by 3199

Abstract

Printed circuit board (PCB) defect detection plays a crucial role in PCB production, and the popular methods are based on deep learning and require large-scale datasets with high-level ground-truth labels, in which it is time-consuming and costly to label these datasets. Semi-supervised learning [...] Read more.

Printed circuit board (PCB) defect detection plays a crucial role in PCB production, and the popular methods are based on deep learning and require large-scale datasets with high-level ground-truth labels, in which it is time-consuming and costly to label these datasets. Semi-supervised learning (SSL) methods, which reduce the need for labeled samples by leveraging unlabeled samples, can address this problem well. However, for PCB defects, the detection accuracy on small numbers of labeled samples still needs to be improved because the number of labeled samples is small, and the training process will be disturbed by the unlabeled samples. To overcome this problem, this paper proposed a semi-supervised defect detection method with a data-expanding strategy (DE-SSD). The proposed DE-SSD uses both the labeled and unlabeled samples, which can reduce the cost of data labeling, and a batch-adding strategy (BA-SSL) is introduced to leverage the unlabeled data with less disturbance. Moreover, a data-expanding (DE) strategy is proposed to use the labeled samples from other datasets to expand the target dataset, which can also prevent the disturbance by the unlabeled samples. Based on the improvements, the proposed DE-SSD can achieve competitive results for PCB defects with fewer labeled samples. The experimental results on DeepPCB indicate that the proposed DE-SSD achieves state-of-the-art performance, which is improved by 4.7 mAP at least compared with the previous methods. Full article

► Show Figures

Figure 1

15 pages, 9839 KiB

Open AccessFeature PaperArticle

Low-Light Image Enhancement Using Photometric Alignment with Hierarchy Pyramid Network

by Jing Ye, Xintao Chen, Changzhen Qiu and Zhiyong Zhang

Sensors 2022, 22(18), 6799; https://doi.org/10.3390/s22186799 - 8 Sep 2022

Cited by 2 | Viewed by 3113

Abstract

Low-light image enhancement can effectively assist high-level vision tasks that often fail in poor illumination conditions. Most previous data-driven methods, however, implemented enhancement directly from severely degraded low-light images that may provide undesirable enhancement results, including blurred detail, intensive noise, and distorted color. [...] Read more.

Low-light image enhancement can effectively assist high-level vision tasks that often fail in poor illumination conditions. Most previous data-driven methods, however, implemented enhancement directly from severely degraded low-light images that may provide undesirable enhancement results, including blurred detail, intensive noise, and distorted color. In this paper, inspired by a coarse-to-fine strategy, we propose an end-to-end image-level alignment with pixel-wise perceptual information enhancement pipeline for low-light image enhancement. A coarse adaptive global photometric alignment sub-network is constructed to reduce style differences, which facilitates improving illumination and revealing under-exposure area information. After the learned aligned image, a hierarchy pyramid enhancement sub-network is used to optimize image quality, which helps to remove amplified noise and enhance the local detail of low-light images. We also propose a multi-residual cascade attention block (MRCAB) that involves channel split and concatenation strategy, polarized self-attention mechanism, which leads to high-resolution reconstruction images in perceptual quality. Extensive experiments have demonstrated the effectiveness of our method on various datasets and significantly outperformed other state-of-the-art methods in detail and color reproduction. Full article

► Show Figures

Figure 1

3 pages, 166 KiB

Open AccessEditorial

Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing

by Yun Zhang, Sam Kwong, Long Xu and Tiesong Zhao

Sensors 2022, 22(16), 6192; https://doi.org/10.3390/s22166192 - 18 Aug 2022

Cited by 4 | Viewed by 2579

Abstract

Deep learning techniques have shown their capabilities to discover knowledge from massive unstructured data, providing data-driven solutions for representation and decision making [...] Full article

14 pages, 3795 KiB

Open AccessArticle

A Timestamp-Independent Haptic–Visual Synchronization Method for Haptic-Based Interaction System

by Yiwen Xu, Liangtao Huang, Tiesong Zhao, Ying Fang and Liqun Lin

Sensors 2022, 22(15), 5502; https://doi.org/10.3390/s22155502 - 23 Jul 2022

Cited by 3 | Viewed by 2366

Abstract

The booming haptic data significantly improve the users’ immersion during multimedia interaction. As a result, the study of a Haptic-based Interaction System has attracted the attention of the multimedia community. To construct such a system, a challenging task is the synchronization of multiple [...] Read more.

The booming haptic data significantly improve the users’ immersion during multimedia interaction. As a result, the study of a Haptic-based Interaction System has attracted the attention of the multimedia community. To construct such a system, a challenging task is the synchronization of multiple sensorial signals that is critical to the user experience. Despite audio-visual synchronization efforts, there is still a lack of a haptic-aware multimedia synchronization model. In this work, we propose a timestamp-independent synchronization for haptic–visual signal transmission. First, we exploit the sequential correlations during delivery and playback of a haptic–visual communication system. Second, we develop a key sample extraction of haptic signals based on the force feedback characteristics and a key frame extraction of visual signals based on deep-object detection. Third, we combine the key samples and frames to synchronize the corresponding haptic–visual signals. Without timestamps in the signal flow, the proposed method is still effective and more robust in complicated network conditions. Subjective evaluation also shows a significant improvement of user experience with the proposed method. Full article

► Show Figures

Figure 1

17 pages, 13643 KiB

Open AccessArticle

Inspection of Underwater Hull Surface Condition Using the Soft Voting Ensemble of the Transfer-Learned Models

by Byung Chul Kim, Hoe Chang Kim, Sungho Han and Dong Kyou Park

Sensors 2022, 22(12), 4392; https://doi.org/10.3390/s22124392 - 10 Jun 2022

Cited by 14 | Viewed by 3258

Abstract

In this study, we propose a method for inspecting the condition of hull surfaces using underwater images acquired from the camera of a remotely controlled underwater vehicle (ROUV). To this end, a soft voting ensemble classifier comprising six well-known convolutional neural network models [...] Read more.

In this study, we propose a method for inspecting the condition of hull surfaces using underwater images acquired from the camera of a remotely controlled underwater vehicle (ROUV). To this end, a soft voting ensemble classifier comprising six well-known convolutional neural network models was used. Using the transfer learning technique, the images of the hull surfaces were used to retrain the six models. The proposed method exhibited an accuracy of 98.13%, a precision of 98.73%, a recall of 97.50%, and an F₁-score of 98.11% for the classification of the test set. Furthermore, the time taken for the classification of one image was verified to be approximately 56.25 ms, which is applicable to ROUVs that require real-time inspection. Full article

► Show Figures

Figure 1

25 pages, 10159 KiB

Open AccessArticle

A Hybrid Deep Learning and Visualization Framework for Pushing Behavior Detection in Pedestrian Dynamics

by Ahmed Alia, Mohammed Maree and Mohcine Chraibi

Sensors 2022, 22(11), 4040; https://doi.org/10.3390/s22114040 - 26 May 2022

Cited by 13 | Viewed by 4197

Abstract

Crowded event entrances could threaten the comfort and safety of pedestrians, especially when some pedestrians push others or use gaps in crowds to gain faster access to an event. Studying and understanding pushing dynamics leads to designing and building more comfortable and safe [...] Read more.

Crowded event entrances could threaten the comfort and safety of pedestrians, especially when some pedestrians push others or use gaps in crowds to gain faster access to an event. Studying and understanding pushing dynamics leads to designing and building more comfortable and safe entrances. Researchers—to understand pushing dynamics—observe and analyze recorded videos to manually identify when and where pushing behavior occurs. Despite the accuracy of the manual method, it can still be time-consuming, tedious, and hard to identify pushing behavior in some scenarios. In this article, we propose a hybrid deep learning and visualization framework that aims to assist researchers in automatically identifying pushing behavior in videos. The proposed framework comprises two main components: (i) Deep optical flow and wheel visualization; to generate motion information maps. (ii) A combination of an EfficientNet-B0-based classifier and a false reduction algorithm for detecting pushing behavior at the video patch level. In addition to the framework, we present a new patch-based approach to enlarge the data and alleviate the class imbalance problem in small-scale pushing behavior datasets. Experimental results (using real-world ground truth of pushing behavior videos) demonstrate that the proposed framework achieves an 86% accuracy rate. Moreover, the EfficientNet-B0-based classifier outperforms baseline CNN-based classifiers in terms of accuracy. Full article

► Show Figures

Figure 1

17 pages, 2998 KiB

Open AccessArticle

Color-Dense Illumination Adjustment Network for Removing Haze and Smoke from Fire Scenario Images

by Chuansheng Wang, Jinxing Hu, Xiaowei Luo, Mei-Po Kwan, Weihua Chen and Hao Wang

Sensors 2022, 22(3), 911; https://doi.org/10.3390/s22030911 - 25 Jan 2022

Cited by 5 | Viewed by 3549

Abstract

The atmospheric particles and aerosols from burning usually cause visual artifacts in single images captured from fire scenarios. Most existing haze removal methods exploit the atmospheric scattering model (ASM) for visual enhancement, which inevitably leads to inaccurate estimation of the atmosphere light and [...] Read more.

The atmospheric particles and aerosols from burning usually cause visual artifacts in single images captured from fire scenarios. Most existing haze removal methods exploit the atmospheric scattering model (ASM) for visual enhancement, which inevitably leads to inaccurate estimation of the atmosphere light and transmission matrix of the smoky and hazy inputs. To solve these problems, we present a novel color-dense illumination adjustment network (CIANet) for joint recovery of transmission matrix, illumination intensity, and the dominant color of aerosols from a single image. Meanwhile, to improve the visual effects of the recovered images, the proposed CIANet jointly optimizes the transmission map, atmospheric optical value, the color of aerosol, and a preliminary recovered scene. Furthermore, we designed a reformulated ASM, called the aerosol scattering model (ESM), to smooth out the enhancement results while keeping the visual effects and the semantic information of different objects. Experimental results on both the proposed RFSIE and NTIRE’20 demonstrate our superior performance favorably against state-of-the-art dehazing methods regarding PSNR, SSIM and subjective visual quality. Furthermore, when concatenating CIANet with Faster R-CNN, we witness an improvement of the objection performance with a large margin. Full article

► Show Figures

Figure 1

2021

Jump to: 2024, 2023, 2022

16 pages, 28869 KiB

Open AccessArticle

Small Object Detection in Traffic Scenes Based on YOLO-MXANet

by Xiaowei He, Rao Cheng, Zhonglong Zheng and Zeji Wang

Sensors 2021, 21(21), 7422; https://doi.org/10.3390/s21217422 - 8 Nov 2021

Cited by 34 | Viewed by 5273

Abstract

In terms of small objects in traffic scenes, general object detection algorithms have low detection accuracy, high model complexity, and slow detection speed. To solve the above problems, an improved algorithm (named YOLO-MXANet) is proposed in this paper. Complete-Intersection over Union (CIoU) is [...] Read more.

In terms of small objects in traffic scenes, general object detection algorithms have low detection accuracy, high model complexity, and slow detection speed. To solve the above problems, an improved algorithm (named YOLO-MXANet) is proposed in this paper. Complete-Intersection over Union (CIoU) is utilized to improve loss function for promoting the positioning accuracy of the small object. In order to reduce the complexity of the model, we present a lightweight yet powerful backbone network (named SA-MobileNeXt) that incorporates channel and spatial attention. Our approach can extract expressive features more effectively by applying the Shuffle Channel and Spatial Attention (SCSA) module into the SandGlass Block (SGBlock) module while increasing the parameters by a small number. In addition, the data enhancement method combining Mosaic and Mixup is employed to improve the robustness of the training model. The Multi-scale Feature Enhancement Fusion (MFEF) network is proposed to fuse the extracted features better. In addition, the SiLU activation function is utilized to optimize the Convolution-Batchnorm-Leaky ReLU (CBL) module and the SGBlock module to accelerate the convergence of the model. The ablation experiments on the KITTI dataset show that each improved method is effective. The improved algorithm reduces the complexity and detection speed of the model while improving the object detection accuracy. The comparative experiments on the KITTY dataset and CCTSDB dataset with other algorithms show that our algorithm also has certain advantages. Full article

► Show Figures

Figure 1

11 pages, 2880 KiB

Open AccessArticle

Deepfake Detection Using the Rate of Change between Frames Based on Computer Vision

by Gihun Lee and Mihui Kim

Sensors 2021, 21(21), 7367; https://doi.org/10.3390/s21217367 - 5 Nov 2021

Cited by 25 | Viewed by 14500

Abstract

Recently, artificial intelligence has been successfully used in fields, such as computer vision, voice, and big data analysis. However, various problems, such as security, privacy, and ethics, also occur owing to the development of artificial intelligence. One such problem are deepfakes. Deepfake is [...] Read more.

Recently, artificial intelligence has been successfully used in fields, such as computer vision, voice, and big data analysis. However, various problems, such as security, privacy, and ethics, also occur owing to the development of artificial intelligence. One such problem are deepfakes. Deepfake is a compound word for deep learning and fake. It refers to a fake video created using artificial intelligence technology or the production process itself. Deepfakes can be exploited for political abuse, pornography, and fake information. This paper proposes a method to determine integrity by analyzing the computer vision features of digital content. The proposed method extracts the rate of change in the computer vision features of adjacent frames and then checks whether the video is manipulated. The test demonstrated the highest detection rate of 97% compared to the existing method or machine learning method. It also maintained the highest detection rate of 96%, even for the test that manipulates the matrix of the image to avoid the convolutional neural network detection method. Full article

► Show Figures

Figure 1

16 pages, 7505 KiB

Open AccessArticle

Improving the Ability of a Laser Ultrasonic Wave-Based Detection of Damage on the Curved Surface of a Pipe Using a Deep Learning Technique

by Byoungjoon Yu, Kassahun Demissie Tola, Changgil Lee and Seunghee Park

Sensors 2021, 21(21), 7105; https://doi.org/10.3390/s21217105 - 26 Oct 2021

Cited by 10 | Viewed by 4785

Abstract

With the advent of the Fourth Industrial Revolution, the economic, social, and technological demands for pipe maintenance are increasing due to the aging of the infrastructure caused by the increase in industrial development and the expansion of cities. Owing to this, an automatic [...] Read more.

With the advent of the Fourth Industrial Revolution, the economic, social, and technological demands for pipe maintenance are increasing due to the aging of the infrastructure caused by the increase in industrial development and the expansion of cities. Owing to this, an automatic pipe damage detection system was built using a laser-scanned pipe’s ultrasonic wave propagation imaging (UWPI) data and conventional neural network (CNN)-based object detection algorithms. The algorithm used in this study was EfficientDet-d0, a CNN-based object detection algorithm which uses the transfer learning method. As a result, the mean average precision (mAP) was measured to be 0.39. The result found was higher than COCO EfficientDet-d0 mAP, which is expected to enable the efficient maintenance of piping used in construction and many industries. Full article

► Show Figures

Figure 1

15 pages, 2472 KiB

Open AccessArticle

Compressed Video Quality Index Based on Saliency-Aware Artifact Detection

by Liqun Lin, Jing Yang, Zheng Wang, Liping Zhou, Weiling Chen and Yiwen Xu

Sensors 2021, 21(19), 6429; https://doi.org/10.3390/s21196429 - 26 Sep 2021

Cited by 6 | Viewed by 6943

Abstract

Video coding technology makes the required storage and transmission bandwidth of video services decrease by reducing the bitrate of the video stream. However, the compressed video signals may involve perceivable information loss, especially when the video is overcompressed. In such cases, the viewers [...] Read more.

Video coding technology makes the required storage and transmission bandwidth of video services decrease by reducing the bitrate of the video stream. However, the compressed video signals may involve perceivable information loss, especially when the video is overcompressed. In such cases, the viewers can observe visually annoying artifacts, namely, Perceivable Encoding Artifacts (PEAs), which degrade their perceived video quality. To monitor and measure these PEAs (including blurring, blocking, ringing and color bleeding), we propose an objective video quality metric named Saliency-Aware Artifact Measurement (SAAM) without any reference information. The SAAM metric first introduces video saliency detection to extract interested regions and further splits these regions into a finite number of image patches. For each image patch, the data-driven model is utilized to evaluate intensities of PEAs. Finally, these intensities are fused into an overall metric using Support Vector Regression (SVR). In experiment section, we compared the SAAM metric with other popular video quality metrics on four publicly available databases: LIVE, CSIQ, IVP and FERIT-RTRK. The results reveal the promising quality prediction performance of the SAAM metric, which is superior to most of the popular compressed video quality evaluation models. Full article

► Show Figures

Figure 1

18 pages, 8410 KiB

Open AccessArticle

MSF-Net: Multi-Scale Feature Learning Network for Classification of Surface Defects of Multifarious Sizes

by Pengcheng Xu, Zhongyuan Guo, Lei Liang and Xiaohang Xu

Sensors 2021, 21(15), 5125; https://doi.org/10.3390/s21155125 - 29 Jul 2021

Cited by 14 | Viewed by 3451

Abstract

In the field of surface defect detection, the scale difference of product surface defects is often huge. The existing defect detection methods based on Convolutional Neural Networks (CNNs) are more inclined to express macro and abstract features, and the ability to express local [...] Read more.

In the field of surface defect detection, the scale difference of product surface defects is often huge. The existing defect detection methods based on Convolutional Neural Networks (CNNs) are more inclined to express macro and abstract features, and the ability to express local and small defects is insufficient, resulting in an imbalance of feature expression capabilities. In this paper, a Multi-Scale Feature Learning Network (MSF-Net) based on Dual Module Feature (DMF) extractor is proposed. DMF extractor is mainly composed of optimized Concatenated Rectified Linear Units (CReLUs) and optimized Inception feature extraction modules, which increases the diversity of feature receptive fields while reducing the amount of calculation; the feature maps of the middle layer with different sizes of receptive fields are merged to increase the richness of the receptive fields of the last layer of feature maps; the residual shortcut connections, batch normalization layer and average pooling layer are used to replace the fully connected layer to improve training efficiency, and make the multi-scale feature learning ability more balanced at the same time. Two representative multi-scale defect data sets are used for experiments, and the experimental results verify the advancement and effectiveness of the proposed MSF-Net in the detection of surface defects with multi-scale features. Full article

► Show Figures

Figure 1

21 pages, 8853 KiB

Open AccessArticle

Wheat Ear Recognition Based on RetinaNet and Transfer Learning

by Jingbo Li, Changchun Li, Shuaipeng Fei, Chunyan Ma, Weinan Chen, Fan Ding, Yilin Wang, Yacong Li, Jinjin Shi and Zhen Xiao

Sensors 2021, 21(14), 4845; https://doi.org/10.3390/s21144845 - 16 Jul 2021

Cited by 54 | Viewed by 4854

Abstract

The number of wheat ears is an essential indicator for wheat production and yield estimation, but accurately obtaining wheat ears requires expensive manual cost and labor time. Meanwhile, the characteristics of wheat ears provide less information, and the color is consistent with the [...] Read more.

The number of wheat ears is an essential indicator for wheat production and yield estimation, but accurately obtaining wheat ears requires expensive manual cost and labor time. Meanwhile, the characteristics of wheat ears provide less information, and the color is consistent with the background, which can be challenging to obtain the number of wheat ears required. In this paper, the performance of Faster regions with convolutional neural networks (Faster R-CNN) and RetinaNet to predict the number of wheat ears for wheat at different growth stages under different conditions is investigated. The results show that using the Global WHEAT dataset for recognition, the RetinaNet method, and the Faster R-CNN method achieve an average accuracy of 0.82 and 0.72, with the RetinaNet method obtaining the highest recognition accuracy. Secondly, using the collected image data for recognition, the R² of RetinaNet and Faster R-CNN after transfer learning is 0.9722 and 0.8702, respectively, indicating that the recognition accuracy of the RetinaNet method is higher on different data sets. We also tested wheat ears at both the filling and maturity stages; our proposed method has proven to be very robust (the R² is above 90). This study provides technical support and a reference for automatic wheat ear recognition and yield estimation. Full article

► Show Figures

Figure 1

12 pages, 3924 KiB

Open AccessCommunication

Bionic Birdlike Imaging Using a Multi-Hyperuniform LED Array

by Xin-Yu Zhao, Li-Jing Li, Lei Cao and Ming-Jie Sun

Sensors 2021, 21(12), 4084; https://doi.org/10.3390/s21124084 - 14 Jun 2021

Cited by 4 | Viewed by 3161

Abstract

Digital cameras obtain color information of the scene using a chromatic filter, usually a Bayer filter, overlaid on a pixelated detector. However, the periodic arrangement of both the filter array and the detector array introduces frequency aliasing in sampling and color misregistration during [...] Read more.

Digital cameras obtain color information of the scene using a chromatic filter, usually a Bayer filter, overlaid on a pixelated detector. However, the periodic arrangement of both the filter array and the detector array introduces frequency aliasing in sampling and color misregistration during demosaicking process which causes degradation of image quality. Inspired by the biological structure of the avian retinas, we developed a chromatic LED array which has a geometric arrangement of multi-hyperuniformity, which exhibits an irregularity on small-length scales but a quasi-uniformity on large scales, to suppress frequency aliasing and color misregistration in full color image retrieval. Experiments were performed with a single-pixel imaging system using the multi-hyperuniform chromatic LED array to provide structured illumination, and 208 fps frame rate was achieved at 32 × 32 pixel resolution. By comparing the experimental results with the images captured with a conventional digital camera, it has been demonstrated that the proposed imaging system forms images with less chromatic moiré patterns and color misregistration artifacts. The concept proposed verified here could provide insights for the design and the manufacturing of future bionic imaging sensors. Full article

► Show Figures

Figure 1

19 pages, 2182 KiB

Open AccessArticle

Attention Networks for the Quality Enhancement of Light Field Images

by Ionut Schiopu and Adrian Munteanu

Sensors 2021, 21(9), 3246; https://doi.org/10.3390/s21093246 - 7 May 2021

Cited by 2 | Viewed by 2687

Abstract

In this paper, we propose a novel filtering method based on deep attention networks for the quality enhancement of light field (LF) images captured by plenoptic cameras and compressed using the High Efficiency Video Coding (HEVC) standard. The proposed architecture was built using [...] Read more.

In this paper, we propose a novel filtering method based on deep attention networks for the quality enhancement of light field (LF) images captured by plenoptic cameras and compressed using the High Efficiency Video Coding (HEVC) standard. The proposed architecture was built using efficient complex processing blocks and novel attention-based residual blocks. The network takes advantage of the macro-pixel (MP) structure, specific to LF images, and processes each reconstructed MP in the luminance (Y) channel. The input patch is represented as a tensor that collects, from an MP neighbourhood, four Epipolar Plane Images (EPIs) at four different angles. The experimental results on a common LF image database showed high improvements over HEVC in terms of the structural similarity index (SSIM), with an average Y-Bjøntegaard Delta (BD)-rate savings of

36.57 %,

and an average Y-BD-PSNR improvement of

2.301

dB. Increased performance was achieved when the HEVC built-in filtering methods were skipped. The visual results illustrate that the enhanced image contains sharper edges and more texture details. The ablation study provides two robust solutions to reduce the inference time by

44.6 %

and the network complexity by

74.7 %

. The results demonstrate the potential of attention networks for the quality enhancement of LF images encoded by HEVC. Full article

► Show Figures

Figure 1

20 pages, 4137 KiB

Open AccessArticle

DNet: Dynamic Neighborhood Feature Learning in Point Cloud

by Fujing Tian, Zhidi Jiang and Gangyi Jiang

Sensors 2021, 21(7), 2327; https://doi.org/10.3390/s21072327 - 26 Mar 2021

Cited by 5 | Viewed by 2636

Abstract

Neighborhood selection is very important for local region feature learning in point cloud learning networks. Different neighborhood selection schemes may lead to quite different results for point cloud processing tasks. The existing point cloud learning networks mainly adopt the approach of customizing the [...] Read more.

Neighborhood selection is very important for local region feature learning in point cloud learning networks. Different neighborhood selection schemes may lead to quite different results for point cloud processing tasks. The existing point cloud learning networks mainly adopt the approach of customizing the neighborhood, without considering whether the selected neighborhood is reasonable or not. To solve this problem, this paper proposes a new point cloud learning network, denoted as Dynamic neighborhood Network (DNet), to dynamically select the neighborhood and learn the features of each point. The proposed DNet has a multi-head structure which has two important modules: the Feature Enhancement Layer (FELayer) and the masking mechanism. The FELayer enhances the manifold features of the point cloud, while the masking mechanism is used to remove the neighborhood points with low contribution. The DNet can learn the manifold features and spatial geometric features of point cloud, and obtain the relationship between each point and its effective neighborhood points through the masking mechanism, so that the dynamic neighborhood features of each point can be obtained. Experimental results on three public datasets demonstrate that compared with the state-of-the-art learning networks, the proposed DNet shows better superiority and competitiveness in point cloud processing task. Full article

► Show Figures

Figure 1

18 pages, 8556 KiB

Open AccessArticle

NRA-Net—Neg-Region Attention Network for Salient Object Detection with Gaze Tracking

by Hoijun Kim, Soonchul Kwon and Seunghyun Lee

Sensors 2021, 21(5), 1753; https://doi.org/10.3390/s21051753 - 4 Mar 2021

Cited by 5 | Viewed by 2732

Abstract

In this paper, we propose a detection method for salient objects whose eyes are focused on gaze tracking; this method does not require a device in a single image. A network was constructed using Neg-Region Attention (NRA), which predicts objects with a concentrated [...] Read more.

In this paper, we propose a detection method for salient objects whose eyes are focused on gaze tracking; this method does not require a device in a single image. A network was constructed using Neg-Region Attention (NRA), which predicts objects with a concentrated line of sight using deep learning techniques. The existing deep learning-based method has an autoencoder structure, which causes feature loss during the encoding process of compressing and extracting features from the image and the decoding process of expanding and restoring. As a result, a feature loss occurs in the area of the object from the detection results, or another area is detected as an object. The proposed method, that is, NRA, can be used for reducing feature loss and emphasizing object areas with encoders. After separating positive and negative regions using the exponential linear unit activation function, converted attention was performed for each region. The attention method provided without using the backbone network emphasized the object area and suppressed the background area. In the experimental results, the proposed method showed higher detection results than the conventional methods. Full article

► Show Figures

Figure 1

Journal Menu

Journal Browser

Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing

Share This Topical Collection

Editors

Topical Collection Information

Keywords

Published Papers (43 papers)

2024

Jump to: 2023, 2022, 2021

2023

Jump to: 2024, 2022, 2021

2022

Jump to: 2024, 2023, 2021

2021

Jump to: 2024, 2023, 2022

Further Information

Guidelines

MDPI Initiatives

Follow MDPI