Advances in Computer Vision and Deep Learning and Its Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 May 2024 | Viewed by 31,948

Special Issue Editors


Prof. Dr. Yuji Iwahori
Guest Editor
Department of Computer Science, Chubu University, 1200 Matsumoto-cho, Kasugai 487-8501, Aichi, Japan
Interests: computer vision; neural networks; machine learning; medical image analysis

Dr. Haibin Wu
Guest Editor
Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
Interests: machine vision; visual detection and image processing; medical virtual reality

Dr. Aili Wang
Guest Editor
Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
Interests: remote sensing image processing; deep learning

Special Issue Information

Dear Colleagues,

By some estimates, up to 80% of the neurons in the human brain are involved in processing visual information and cognition, so image processing is inevitably involved in nearly every aspect of human life and work. Image processing technology currently lies at the core of artificial intelligence, which aims to design computer programs that achieve or mimic human-like perception and inference in the real world. With the rapid development of visual sensors and imaging technologies, image analysis and pattern recognition techniques have been applied extensively in artificial intelligence-related fields, from industry and agriculture to surveillance and public safety. Deep-learning-based image processing has developed rapidly and has become one of the most successful applied intelligent technologies. Pattern recognition is an important research field within image processing and includes image preprocessing, feature extraction and selection, classifier design, and classification decision-making.

In this context, for this Special Issue, entitled “Advances in Computer Vision and Deep Learning and Its Applications”, we invite original research articles and comprehensive reviews on topics including, but not limited to, the following:

  • Advances in computer vision;
  • Advances in feature extraction and selection for images;
  • Advances in pattern recognition in image processing technology;
  • Image processing in intelligent transportation;
  • Neural networks, machine learning, and deep learning;
  • Hyperspectral image processing;
  • Biomedical image processing and recognition;
  • Speech processing and video processing;
  • Image processing in intelligent monitoring;
  • Deep learning for image processing;
  • Deep learning-based methods for image and video analysis;
  • Image analysis and pattern recognition for robotics and unmanned systems;
  • AI-based image processing, understanding, recognition, compression, and reconstruction.

Prof. Dr. Yuji Iwahori
Dr. Haibin Wu
Dr. Aili Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • pattern recognition
  • deep learning
  • machine learning
  • computational intelligence
  • media information technology

Published Papers (22 papers)


Research

23 pages, 5805 KiB  
Article
Research on 3D Visualization of Drone Scenes Based on Neural Radiance Fields
by Pengfei Jin and Zhuoyuan Yu
Electronics 2024, 13(9), 1682; https://doi.org/10.3390/electronics13091682 - 26 Apr 2024
Viewed by 305
Abstract
Neural Radiance Fields (NeRFs), as an innovative method employing neural networks for the implicit representation of 3D scenes, have been able to synthesize images from arbitrary viewpoints and successfully apply them to the visualization of objects and room-level scenes (<50 m²). However, due to the capacity limitations of neural networks, the rendering of drone-captured scenes (>10,000 m²) often appears blurry and lacks detail. Merely increasing the model’s capacity or the number of sample points can significantly raise training costs. Existing space contraction methods, designed for forward-facing trajectories or 360° object-centric trajectories, are not suitable for the unique trajectories of drone footage. Furthermore, anomalies and cloud fog artifacts, resulting from complex lighting conditions and sparse data acquisition, can significantly degrade the quality of rendering. To address these challenges, we propose a framework specifically designed for drone-captured scenes. Within this framework, while using a feature grid and multi-layer perceptron (MLP) to jointly represent 3D scenes, we introduce a Space Boundary Compression method and a Ground-Optimized Sampling strategy to streamline spatial structure and enhance sampling performance. Moreover, we propose an anti-aliasing neural rendering model based on Cluster Sampling and Integrated Hash Encoding to optimize distant details and incorporate an L1 norm penalty for outliers, as well as entropy regularization loss to reduce fluffy artifacts. To verify the effectiveness of the algorithm, experiments were conducted on four drone-captured scenes. The results show that, with only a single GPU and less than two hours of training time, photorealistic visualization can be achieved, significantly improving upon the performance of the existing NeRF approaches. Full article
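
As a rough illustration of the outlier penalty and entropy regularization named in this abstract, the sketch below shows one common way such terms are written over per-ray NeRF sample weights. The exact formulations and the 0.01 weighting are assumptions; the paper's own definitions are not reproduced here.

```python
import torch

def entropy_regularization(weights: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Shannon entropy of per-ray sample weights; low entropy favors crisp, opaque surfaces."""
    p = weights / (weights.sum(dim=-1, keepdim=True) + eps)
    return -(p * torch.log(p + eps)).sum(dim=-1).mean()

def outlier_l1_penalty(weights: torch.Tensor) -> torch.Tensor:
    """L1 penalty discouraging stray density (floaters) along each ray."""
    return weights.abs().mean()

w = torch.rand(4096, 128)  # 4096 rays, 128 samples per ray (illustrative shapes)
loss = entropy_regularization(w) + 0.01 * outlier_l1_penalty(w)  # 0.01 is an assumed weight
```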

13 pages, 4454 KiB  
Article
A High-Precision Fall Detection Model Based on Dynamic Convolution in Complex Scenes
by Yong Qin, Wuqing Miao and Chen Qian
Electronics 2024, 13(6), 1141; https://doi.org/10.3390/electronics13061141 - 20 Mar 2024
Viewed by 608
Abstract
Falls can cause significant harm, and even death, to elderly individuals. Therefore, it is crucial to have a highly accurate fall detection model that can promptly detect and respond to changes in posture. The YOLOv8 model may not effectively address the challenges posed by deformation, targets at different scales, and occlusion in complex scenes during human falls. This paper presents ESD-YOLO, a new high-precision fall detection model based on dynamic convolution that improves upon the YOLOv8 model. The C2f module in the backbone network is replaced with the C2Dv3 module to enhance the network’s ability to capture complex details and deformations. The neck section uses the DyHead block to unify multiple attentional operations, enhancing the detection accuracy of targets at different scales and improving performance in cases of occlusion. Additionally, the proposed algorithm utilizes the EASlideloss loss function to increase the model’s focus on hard samples and address the problem of sample imbalance. The experimental results demonstrate a 1.9% increase in precision, a 4.1% increase in recall, a 4.3% increase in mAP0.5, and a 2.8% increase in mAP0.5:0.95 compared to YOLOv8. In particular, the model significantly improves the precision of human fall detection in complex scenes. Full article
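
EASlideloss is only named in the abstract; it appears to build on the Slide Loss idea of reweighting each sample's loss by its IoU relative to a threshold. A minimal sketch of that weighting follows, with the piecewise form and mu = 0.5 as assumptions:

```python
import torch

def slide_weight(iou: torch.Tensor, mu: float = 0.5) -> torch.Tensor:
    """Piecewise weighting that emphasizes hard samples whose IoU sits near the threshold mu."""
    w = torch.ones_like(iou)                     # easy negatives keep weight 1
    mid = (iou > mu - 0.1) & (iou < mu)
    w[mid] = torch.exp(torch.tensor(1.0 - mu))   # boost samples just below the threshold
    high = iou >= mu
    w[high] = torch.exp(1.0 - iou[high])         # decays toward 1 as samples get easier
    return w

iou = torch.rand(8)                              # per-sample IoU with matched ground truth
per_sample_loss = torch.rand(8)                  # e.g., per-sample BCE values
weighted_loss = (per_sample_loss * slide_weight(iou)).mean()  # mu = 0.5 is assumed
```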

25 pages, 9298 KiB  
Article
Research on the Car Searching System in the Multi-Storey Garage with the RSSI Indoor Locating Based on Neural Network
by Jihui Ma, Lijie Wang, Xianwen Zhu, Ziyi Li and Xinyu Lu
Electronics 2024, 13(5), 907; https://doi.org/10.3390/electronics13050907 - 27 Feb 2024
Viewed by 529
Abstract
To solve the problem of reverse car searching in intelligent multi-story garages and parking lots, a reverse car searching method for intelligent garages was studied for both PC and mobile clients, and the interface design and function development of the system’s PC client and mobile APP were carried out. The YOLOv5 and LPRNet networks were used for license plate location and recognition to realize parking and entry detection. An indoor pedestrian location method fusing RSSI fingerprint signals with a BP neural network and the KNN algorithm was studied, and its location accuracy within 2.5 m was found to be 100%. An A* algorithm based on spatial accessibility was investigated to realize the reverse car search function. The results indicate that vehicle-finding path guidance can be completed while the number of invalid search nodes on the example maps is reduced by more than 55.0% and the operating efficiency of the algorithm is increased by 28.5%. Full article
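
For readers unfamiliar with RSSI fingerprint positioning, the sketch below shows the weighted-KNN half of such a locator; the BP-network fusion from the paper is not shown, and the array shapes and inverse-distance weighting are illustrative assumptions.

```python
import numpy as np

def wknn_locate(fingerprints: np.ndarray, positions: np.ndarray,
                rssi: np.ndarray, k: int = 4) -> np.ndarray:
    """Weighted KNN over an RSSI fingerprint database.

    fingerprints: (N, B) offline RSSI vectors at N reference points from B beacons
    positions:    (N, 2) coordinates of those reference points
    rssi:         (B,)   online measurement to localize
    """
    d = np.linalg.norm(fingerprints - rssi, axis=1)  # distances in signal space
    idx = np.argsort(d)[:k]                          # k nearest fingerprints
    w = 1.0 / (d[idx] + 1e-9)                        # inverse-distance weights
    return (w[:, None] * positions[idx]).sum(axis=0) / w.sum()

db = np.random.uniform(-90, -40, size=(100, 6))      # hypothetical fingerprint database
pos = np.random.uniform(0, 50, size=(100, 2))
estimate = wknn_locate(db, pos, db[3] + np.random.randn(6))
```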

16 pages, 62019 KiB  
Article
MM-NeRF: Large-Scale Scene Representation with Multi-Resolution Hash Grid and Multi-View Priors Features
by Bo Dong, Kaiqiang Chen, Zhirui Wang, Menglong Yan, Jiaojiao Gu and Xian Sun
Electronics 2024, 13(5), 844; https://doi.org/10.3390/electronics13050844 - 22 Feb 2024
Viewed by 870
Abstract
Reconstructing large-scale scenes using Neural Radiance Fields (NeRFs) is a research hotspot in 3D computer vision. Existing MLP (multi-layer perceptron)-based methods often suffer from underfitting and a lack of fine details when rendering large-scale scenes. Popular solutions are to divide the scene into small areas for separate modeling or to increase the layer scale of the MLP network. However, the consequent problem is that the training cost increases. Moreover, reconstructing large scenes, unlike object-scale reconstruction, involves a considerably larger quantity of view data if the prior information of the scene is not effectively utilized. In this paper, we propose an innovative method named MM-NeRF, which integrates efficient hybrid features into the NeRF framework to enhance the reconstruction of large-scale scenes. We propose employing a dual-branch feature capture structure, comprising a multi-resolution 3D hash grid feature branch and a multi-view 2D prior feature branch. The 3D hash grid feature models geometric details, while the 2D prior feature supplements local texture information. Our experimental results show that such integration is sufficient to render realistic novel views with fine details, forming a more accurate geometric representation. Compared with representative methods in the field, our method significantly improves the PSNR (Peak Signal-to-Noise Ratio) by approximately 5%. This remarkable progress underscores the outstanding contribution of our method in the field of large-scene radiance field reconstruction. Full article
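
A multi-resolution hash grid of the kind used here stores trainable features in hash tables indexed by spatially hashed voxel coordinates (in the spirit of Instant-NGP). The sketch below shows a single level with a nearest-vertex lookup; real implementations interpolate eight corners across many levels, and the table size, feature width, and resolution are assumptions.

```python
import torch

PRIMES = (1, 2654435761, 805459861)  # spatial-hashing primes from the Instant-NGP paper

def hash_grid_lookup(xyz: torch.Tensor, table: torch.Tensor, resolution: int) -> torch.Tensor:
    """Single-level hash-grid feature lookup (nearest vertex, no trilinear blending)."""
    T = table.shape[0]
    ijk = torch.floor(xyz * resolution).long()  # integer voxel coordinates
    h = torch.zeros_like(ijk[..., 0])
    for d in range(3):
        h ^= ijk[..., d] * PRIMES[d]            # XOR spatial hash
    return table[h % T]

table = torch.randn(2**16, 2)                         # assumed: T entries, 2 features each
pts = torch.rand(1024, 3)                             # query points in the unit cube
feats = hash_grid_lookup(pts, table, resolution=128)  # (1024, 2)
```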

21 pages, 11283 KiB  
Article
A Method for Unseen Object Six Degrees of Freedom Pose Estimation Based on Segment Anything Model and Hybrid Distance Optimization
by Li Xin, Hu Lin, Xinjun Liu and Shiyu Wang
Electronics 2024, 13(4), 774; https://doi.org/10.3390/electronics13040774 - 16 Feb 2024
Viewed by 648
Abstract
Six degrees of freedom (6-DoF) pose estimation technology constitutes the cornerstone for precise robotic control and similar tasks. Addressing the limitations of current 6-DoF pose estimation methods in handling object occlusions and unknown objects, we have developed a novel two-stage 6-DoF pose estimation method that integrates RGB-D data with CAD models. First, targeting high-quality zero-shot object instance segmentation, we developed the CAE-SAM model based on the Segment Anything Model (SAM) framework. To address the SAM model’s boundary blur, mask voids, and over-segmentation issues, this paper introduces strategies such as local spatial-feature-enhancement modules, global context markers, and a bounding box generator. Subsequently, we propose a registration method optimized through a hybrid distance metric to diminish the dependency of point cloud registration algorithms on sensitive hyperparameters. Experimental results on the HQSeg-44K dataset substantiate the notable improvements in instance segmentation accuracy and robustness rendered by the CAE-SAM model. Moreover, the efficacy of this two-stage method is further corroborated using a 6-DoF pose dataset of workpieces constructed with CloudCompare and RealSense. For unseen targets, the ADD metric reached 2.973 mm, and the ADD-S metric reached 1.472 mm. The proposed method significantly enhances pose estimation performance and streamlines the algorithm’s deployment and maintenance procedures. Full article
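
The registration stage rests on rigid-alignment primitives such as the classic Kabsch/SVD solution sketched below, which computes the least-squares rotation and translation between matched point sets; the paper's hybrid distance metric itself is not reproduced.

```python
import numpy as np

def kabsch(P: np.ndarray, Q: np.ndarray):
    """Least-squares rigid transform (R, t) mapping matched points P onto Q."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

P = np.random.rand(50, 3)
Q = P + np.array([0.1, -0.2, 0.3])  # trivial sanity check: pure translation
R, t = kabsch(P, Q)                 # R ~ identity, t ~ (0.1, -0.2, 0.3)
```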

17 pages, 10797 KiB  
Article
Multi-Branch Spectral Channel Attention Network for Breast Cancer Histopathology Image Classification
by Lu Cao, Ke Pan, Yuan Ren, Ruidong Lu and Jianxin Zhang
Electronics 2024, 13(2), 459; https://doi.org/10.3390/electronics13020459 - 22 Jan 2024
Viewed by 860
Abstract
Deep-learning-based breast cancer image diagnosis is currently a prominent and increasingly popular area of research. Existing convolutional-neural-network-related methods mainly capture breast cancer image features based on spatial domain characteristics for classification. However, according to digital signal processing theory, texture images usually contain repeated patterns and structures, which appear as intense energy at specific frequencies in the frequency domain. Motivated by this, we explore breast cancer histopathology classification in the frequency domain and further propose a novel multi-branch spectral channel attention network, i.e., the MbsCANet. It expands the interaction of frequency domain attention mechanisms from a multi-branch perspective via combining the lowest frequency features with selected high-frequency information from the two-dimensional discrete cosine transform, thus preventing the loss of phase information and gaining richer context information for classification. We thoroughly evaluate and analyze the MbsCANet on the publicly accessible BreakHis breast cancer histopathology dataset. It achieves optimal image-level and patient-level classification results of 99.01% and 98.87%, respectively, outperforming the spatial-domain-dominated models by a large margin on average, and visualization results also demonstrate the effectiveness of the MbsCANet for this medical image application. Full article
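
Spectral channel attention of this kind builds on two-dimensional DCT pooling (in the spirit of FcaNet). The single-branch sketch below shows how one frequency component can replace global average pooling inside a channel attention block; the multi-branch fusion, the 7x7 feature size, and the reduction ratio are assumptions.

```python
import math
import torch
import torch.nn as nn

def dct_basis(h: int, w: int, u: int, v: int) -> torch.Tensor:
    """2D DCT-II basis of frequency (u, v) on an h x w grid."""
    b = torch.zeros(h, w)
    for i in range(h):
        for j in range(w):
            b[i, j] = math.cos(math.pi * (i + 0.5) * u / h) * math.cos(math.pi * (j + 0.5) * v / w)
    return b

class SpectralChannelAttention(nn.Module):
    """One branch of frequency channel attention; freq=(0, 0) reduces to global average pooling."""
    def __init__(self, channels: int, h: int = 7, w: int = 7, freq=(0, 0), reduction: int = 16):
        super().__init__()
        self.register_buffer("basis", dct_basis(h, w, *freq))  # assumed 7x7 feature size
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):                       # x: (B, C, h, w)
        b, c = x.shape[:2]
        y = (x * self.basis).sum(dim=(2, 3))    # per-channel spectral pooling
        return x * self.fc(y).view(b, c, 1, 1)

out = SpectralChannelAttention(64)(torch.randn(2, 64, 7, 7))
```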

23 pages, 8433 KiB  
Article
Application of Improved YOLOv5 Algorithm in Lightweight Transmission Line Small Target Defect Detection
by Zhilong Yu, Yanqiao Lei, Feng Shen and Shuai Zhou
Electronics 2024, 13(2), 305; https://doi.org/10.3390/electronics13020305 - 10 Jan 2024
Viewed by 992
Abstract
With the development of UAV automatic cruising along power transmission lines, intelligent defect detection in aerial images has become increasingly important. In the process of target detection for aerial photography of transmission lines, insulator defects often pose challenges due to complex backgrounds, resulting in noisy images and issues such as slow detection speed, missed detections, and the misidentification of small targets. To address these challenges, this paper proposes an insulator defect detection algorithm called DFCG_YOLOv5, which focuses on improving both accuracy and speed by enhancing the network structure and optimizing the loss function. First, the input part is optimized, and a High-Speed Adaptive Median Filtering (HSMF) algorithm is introduced to preprocess the images captured by the UAV system, effectively reducing the noise interference in target detection. Second, the original Ghost backbone structure is further optimized, and the DFC attention mechanism is incorporated to strike a balance between target detection accuracy and speed. Additionally, the original CIOU loss function is replaced with the Poly Loss, which addresses the issue of imbalanced positive and negative samples for small targets. By adjusting the parameters for different datasets, this modification effectively suppresses background positive samples and enhances the detection accuracy. To align with real-world engineering applications, the dataset utilized in this study consists of unmanned aircraft system patrol images from the Yunnan Power Supply Bureau Company. The experimental results demonstrate a 9.2% improvement in algorithm accuracy and a 26.2% increase in inference speed compared to YOLOv5s. These findings hold significant implications for the practical implementation of target detection in engineering scenarios. Full article
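
The HSMF step is described only by name; below is a sketch of the textbook adaptive median filter that such a method presumably accelerates (the "high-speed" optimizations are not reproduced). It shows why impulse noise is removed while edges and fine details survive.

```python
import numpy as np

def adaptive_median_filter(img: np.ndarray, max_win: int = 7) -> np.ndarray:
    """Textbook adaptive median filter: grow the window until the median is not an impulse."""
    img = np.asarray(img, dtype=float)
    out = img.copy()
    pad = max_win // 2
    padded = np.pad(img, pad, mode="edge")
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            for win in range(3, max_win + 1, 2):
                r = win // 2
                patch = padded[y + pad - r:y + pad + r + 1, x + pad - r:x + pad + r + 1]
                zmin, zmed, zmax = patch.min(), np.median(patch), patch.max()
                if zmin < zmed < zmax:                 # median is not an impulse
                    if not (zmin < img[y, x] < zmax):  # the pixel itself is one
                        out[y, x] = zmed
                    break                              # otherwise keep the pixel
    return out

denoised = adaptive_median_filter(np.random.rand(32, 32))
```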

21 pages, 2728 KiB  
Article
Depth-Quality Purification Feature Processing for Red Green Blue-Depth Salient Object Detection
by Shijie Feng, Li Zhao, Jie Hu, Xiaolong Zhou and Sixian Chan
Electronics 2024, 13(1), 93; https://doi.org/10.3390/electronics13010093 - 25 Dec 2023
Viewed by 671
Abstract
With the advances in deep learning technology, Red Green Blue-Depth (RGB-D) Salient Object Detection (SOD) based on convolutional neural networks (CNNs) is gaining more and more attention. However, the accuracy of current models remains a challenge. It has been found that the quality of the depth features profoundly affects the accuracy. Several current RGB-D SOD techniques do not consider the quality of the depth features and directly fuse the original depth features and Red Green Blue (RGB) features for training, which limits the precision of the model. To address this issue, we propose a depth-quality purification feature processing network for RGB-D SOD, named DQPFPNet. First, we design a depth-quality purification feature processing (DQPFP) module to filter the depth features in a multi-scale manner and fuse them with RGB features in a multi-scale manner. This module can explicitly control and enhance the depth features in the process of cross-modal fusion, avoiding the injection of noisy or misleading depth features. Second, to prevent overfitting and avoid neuron inactivation, we utilize the RReLU activation function in the training process. In addition, we introduce the pixel position adaptive importance (PPAI) loss, which integrates local structure information to assign different weights to each pixel, thus better guiding the network’s learning process and producing clearer details. Finally, a dual-stage decoder is designed to utilize contextual information to improve the modeling ability of the model and enhance the efficiency of the network. Extensive experiments on six RGB-D datasets demonstrate that DQPFPNet outperforms recent efficient models and delivers cutting-edge accuracy. Full article
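
The PPAI loss is described as weighting each pixel by local structure information. One widely used pattern of this kind, borrowed from the salient-object-detection literature (e.g., F3Net's weighted BCE) rather than from this paper, is sketched below; the kernel size and weighting factor are assumptions.

```python
import torch
import torch.nn.functional as F

def structure_weighted_bce(pred: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """BCE with per-pixel weights from local structure (boundary regions weigh more)."""
    # kernel 31 and factor 5 are assumptions in the style of F3Net
    weit = 1 + 5 * torch.abs(F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction="none")
    return ((weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))).mean()

pred = torch.randn(2, 1, 64, 64)                  # logits from the decoder
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()   # binary saliency ground truth
loss = structure_weighted_bce(pred, mask)
```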

19 pages, 4395 KiB  
Article
LezioSeg: Multi-Scale Attention Affine-Based CNN for Segmenting Diabetic Retinopathy Lesions in Images
by Mohammed Yousef Salem Ali, Mohammed Jabreel, Aida Valls, Marc Baget and Mohamed Abdel-Nasser
Electronics 2023, 12(24), 4940; https://doi.org/10.3390/electronics12244940 - 08 Dec 2023
Cited by 3 | Viewed by 908
Abstract
Diagnosing some eye pathologies, such as diabetic retinopathy (DR), depends on accurately detecting retinal eye lesions. Automatic lesion-segmentation methods based on deep learning involve heavy-weight models and have yet to produce the desired quality of results. This paper presents a new deep learning method for segmenting the four types of DR lesions found in eye fundus images. The method, called LezioSeg, is based on multi-scale modules and gated skip connections. It has three components: (1) two multi-scale modules, the first being atrous spatial pyramid pooling (ASPP), which is inserted at the neck of the network, while the second is added at the end of the decoder to improve fundus image feature extraction; (2) a MobileNet encoder pre-trained on ImageNet; and (3) a gated skip connection (GSC) mechanism for improving the ability to obtain information about retinal eye lesions. Experiments using affine-based transformation techniques showed that this architecture improved the performance in lesion segmentation on the well-known IDRiD and E-ophtha datasets. Considering the AUPR standard metric, for the IDRiD dataset, we obtained 81% for soft exudates, 86% for hard exudates, 69% for hemorrhages, and 40% for microaneurysms. For the E-ophtha dataset, we achieved an AUPR of 63% for hard exudates and 37.5% for microaneurysms. These results show that our model with affine-based augmentation achieved competitive results compared to several cutting-edge techniques, but with a model with far fewer parameters. Full article
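
The ASPP module named in component (1) is a standard construct; a minimal PyTorch sketch is given below. Full DeepLab-style ASPP adds batch normalization and an image-level pooling branch, which are omitted here, and the dilation rates are the usual defaults rather than necessarily the paper's.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated convs gather multi-scale context."""
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False) for r in rates)
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)
    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

y = ASPP(256, 64)(torch.randn(1, 256, 32, 32))  # spatial size preserved: (1, 64, 32, 32)
```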

14 pages, 15025 KiB  
Article
Consistent Weighted Correlation-Based Attention for Transformer Tracking
by Lei Liu, Genwen Fang, Jun Wang, Shuai Wang, Chun Wang, Longfeng Shen, Kongfen Zhu and Silas N. Melo
Electronics 2023, 12(22), 4648; https://doi.org/10.3390/electronics12224648 - 15 Nov 2023
Viewed by 686
Abstract
The attention mechanism plays a crucial role among the key technologies in transformer-based visual tracking. However, current methods for computing attention neglect the correlation between the query and the key, which results in erroneous correlations. To address this issue, a CWCTrack framework is proposed in this study for transformer visual tracking. To balance the weights of the attention module and enhance the feature extraction of the search region and template region, a consistent weighted correlation (CWC) module is introduced into the cross-attention block. The CWC module computes the correlation score between each query and all keys. This correlation is then multiplied by the consistency weights of the other query–key pairs to acquire the final attention weights. The consistency weights are computed from the relevance of the query–key pairs. The correlation is enhanced for relevant query–key pairs and suppressed for irrelevant query–key pairs. Experimental results on four prevalent benchmarks demonstrate that the proposed CWCTrack yields favorable performance. Full article
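
The CWC module reweights the usual query–key correlation scores by a consistency term. The sketch below shows only the plain scaled dot-product cross-attention that produces those scores; the consistency weighting itself is not reproduced, and the token shapes are hypothetical.

```python
import torch

def cross_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Plain scaled dot-product cross-attention; CWC reweights these scores (not shown)."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # query-key correlation
    return torch.softmax(scores, dim=-1) @ v

search = torch.randn(1, 400, 256)    # hypothetical search-region tokens
template = torch.randn(1, 64, 256)   # hypothetical template tokens
fused = cross_attention(search, template, template)
```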

19 pages, 33702 KiB  
Article
Detection of Fittings Based on the Dynamic Graph CNN and U-Net Embedded with Bi-Level Routing Attention
by Zhihui Xie, Min Fu and Xuefeng Liu
Electronics 2023, 12(22), 4611; https://doi.org/10.3390/electronics12224611 - 11 Nov 2023
Viewed by 947
Abstract
Accurate detection of power fittings is crucial for identifying defects or faults in these components, which is essential for assessing the safety and stability of the power system. However, the accuracy of fittings detection is affected by a complex background, small target sizes, and overlapping fittings in the images. To address these challenges, a fittings detection method based on the dynamic graph convolutional neural network (DGCNN) and U-shaped network (U-Net) is proposed, which combines three-dimensional detection with two-dimensional object detection. Firstly, the bi-level routing attention mechanism is incorporated into the lightweight U-Net network to enhance feature extraction for detecting the fittings boundary. Secondly, pseudo-point cloud data are synthesized by transforming the depth map generated by the Lite-Mono algorithm and its corresponding RGB fittings image. The DGCNN algorithm is then employed to extract obscured fittings features, contributing to the final refinement of the results. This process helps alleviate the issue of occlusions among targets and further enhances the precision of fittings detection. Finally, the proposed method is evaluated using a custom dataset of fittings, and comparative studies are conducted. The experimental results illustrate the promising potential of the proposed approach in enhancing features and extracting information from fittings images. Full article
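
The pseudo-point cloud mentioned above comes from back-projecting a depth map through the pinhole camera model, a standard operation sketched below; the intrinsics in the usage lines are hypothetical.

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project a depth map (meters) into a pseudo point cloud via the pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

depth = np.random.uniform(1.0, 5.0, size=(480, 640))  # stand-in for Lite-Mono output
cloud = depth_to_points(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```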

15 pages, 2535 KiB  
Article
Toward Unified and Quantitative Cinematic Shot Attribute Analysis
by Yuzhi Li, Feng Tian, Haojun Xu and Tianfeng Lu
Electronics 2023, 12(19), 4174; https://doi.org/10.3390/electronics12194174 - 08 Oct 2023
Viewed by 702
Abstract
Cinematic Shot Attribute Analysis aims to analyze the intrinsic attributes of movie shots, such as movement and scale. In previous methods, specialized architectures were designed for each specific task and relied on the use of optical flow maps. In this paper, we consider shot attribute analysis as a unified task of motion–static weight allocation, and propose a motion–static dual-path architecture for recognizing various shot attributes. In this architecture, we design a new action cue generation module for adapting the end-to-end training process instead of a pre-trained optical flow network; and, to address the issue of limited samples in movie shot datasets, we design a fixed-size adjustment strategy to enable the network to directly utilize pre-trained vision transformer models while adapting to shot data inputs at arbitrary sample rates. In addition, we quantitatively analyze the sensitivity of different shot attributes to motion and static features for the first time. Subsequent experimental results on two datasets, MovieShots and AVE, demonstrate that our proposed method outperforms all previous approaches without increasing computational cost. Full article
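
The fixed-size adjustment strategy is not specified in detail here; one plausible ingredient, sketched below purely as an assumption, is evenly resampling a shot of arbitrary length to a fixed number of frame indices so that a pre-trained vision transformer always sees the same input size.

```python
import numpy as np

def sample_fixed_length(num_frames: int, target_len: int = 16) -> np.ndarray:
    """Evenly spaced frame indices mapping shots of any length to a fixed input size."""
    return np.linspace(0, num_frames - 1, target_len).round().astype(int)

print(sample_fixed_length(37))  # 16 indices drawn evenly from a 37-frame shot (assumed length)
```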

19 pages, 3408 KiB  
Article
Convolutional Neural Networks Adapted for Regression Tasks: Predicting the Orientation of Straight Arrows on Marked Road Pavement Using Deep Learning and Rectified Orthophotography
by Calimanut-Ionut Cira, Alberto Díaz-Álvarez, Francisco Serradilla and Miguel-Ángel Manso-Callejo
Electronics 2023, 12(18), 3980; https://doi.org/10.3390/electronics12183980 - 21 Sep 2023
Cited by 1 | Viewed by 1393
Abstract
Arrow signs found on roadway pavement are an important component of modern transportation systems. Given the rise in autonomous vehicles, public agencies are increasingly interested in accurately identifying and analysing detailed road pavement information to generate comprehensive road maps and decision support systems that can optimise traffic flow, enhance road safety, and provide complete official road cartographic support (that can be used in autonomous driving tasks). As arrow signs are a fundamental component of traffic guidance, this paper aims to present a novel deep learning-based approach to identify the orientation and direction of arrow signs on marked roadway pavements using high-resolution aerial orthoimages. The approach is based on convolutional neural network architectures (VGGNet, ResNet, Xception, and DenseNet) that are modified and adapted for regression tasks with a proposed learning structure, together with an ad hoc model, specially introduced for this task. Although the best-performing artificial neural network was based on VGGNet (VGG-19 variant), it only slightly surpassed the proposed ad hoc model in the average values of the R² score, mean squared error, and angular error by 0.005, 0.001, and 0.036, respectively, using the training set (the ad hoc model delivered an average R² score, mean squared error, and angular error of 0.9874, 0.001, and 2.516, respectively). Furthermore, the ad hoc model's predictions using the test set were the most consistent (a standard deviation of the R² score of 0.033 compared with the score of 0.042 achieved using VGG19), while being almost eight times more computationally efficient when compared with the VGG19 model (2,673,729 parameters vs. VGG19's 20,321,985 parameters). Full article
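
Angular error is a natural headline metric for this task because plain MSE mishandles the wrap-around at 360°. The sketch below assumes the network regresses (sin, cos) of the angle, one common way to sidestep the discontinuity; whether the paper does exactly this is not stated here.

```python
import torch

def angular_error_deg(pred_sincos: torch.Tensor, target_deg: torch.Tensor) -> torch.Tensor:
    """Mean absolute angular error for a network that regresses (sin, cos) of the angle."""
    pred = torch.atan2(pred_sincos[:, 0], pred_sincos[:, 1])  # back to radians
    diff = pred - torch.deg2rad(target_deg)
    diff = torch.atan2(torch.sin(diff), torch.cos(diff))      # wrap into (-pi, pi]
    return torch.rad2deg(diff.abs()).mean()

pred = torch.nn.functional.normalize(torch.randn(4, 2), dim=1)  # hypothetical predictions
target = torch.tensor([10.0, 95.0, 180.0, 270.0])
err = angular_error_deg(pred, target)
```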

17 pages, 15056 KiB  
Article
Improving Monocular Depth Estimation with Learned Perceptual Image Patch Similarity-Based Image Reconstruction and Left–Right Difference Image Constraints
by Hyeseung Park and Seungchul Park
Electronics 2023, 12(17), 3730; https://doi.org/10.3390/electronics12173730 - 04 Sep 2023
Viewed by 1156
Abstract
This paper introduces a novel approach for self-supervised monocular depth estimation. The model is trained on stereo-image (left–right pair) data and incorporates carefully designed perceptual image quality assessment-based loss functions for image reconstruction and left–right image difference. The fidelity of the reconstructed images, obtained by warping the input images using the predicted disparity maps, significantly influences the accuracy of depth estimation in self-supervised monocular depth networks. The suggested LPIPS (Learned Perceptual Image Patch Similarity)-based evaluation of image reconstruction accurately emulates human perceptual mechanisms to quantify the quality of reconstructed images, serving as an image reconstruction loss. Consequently, it facilitates the gradual convergence of the reconstructed images toward a greater similarity with the target images during the training process. Stereo image pairs often exhibit slight discrepancies in brightness, contrast, color, and camera angle due to factors like lighting conditions and camera calibration inaccuracies. These factors limit the improvement of image reconstruction quality. To address this, the left–right difference image loss is introduced, aimed at aligning the disparities between the actual left–right image pair and the reconstructed left–right image pair. Due to the tendency of distant pixel values to approach zero in the difference images derived from the left and right source images of stereo pairs, this loss progressively steers the distant pixel values of the reconstructed difference images toward a convergence with zero. Hence, the use of this loss has demonstrated its efficacy in mitigating distortions in distant regions while enhancing overall performance. The primary objective of this study is to introduce and validate the effectiveness of LPIPS-based image reconstruction and left–right difference image losses in the context of monocular depth estimation. To this end, the proposed loss functions have been seamlessly integrated into a straightforward single-task stereo-image learning framework, incorporating simple hyperparameters. Notably, our approach achieves superior results compared to other state-of-the-art methods, even those adopting more intricate hybrid data and multi-task learning strategies. Full article
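
An LPIPS-based reconstruction loss can be assembled from the publicly available lpips package; the sketch below blends it with an L1 term in the style of common photometric losses. The blending weight alpha = 0.85 and the L1 term are assumptions, not the paper's exact recipe.

```python
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net="vgg")  # perceptual distance network, frozen weights

def reconstruction_loss(recon: torch.Tensor, target: torch.Tensor, alpha: float = 0.85):
    """Blend LPIPS perceptual distance with plain L1; inputs are expected in [-1, 1]."""
    perceptual = loss_fn(recon, target).mean()
    return alpha * perceptual + (1 - alpha) * (recon - target).abs().mean()  # alpha assumed

recon = torch.rand(1, 3, 64, 64) * 2 - 1   # stand-in for a warped/reconstructed view
target = torch.rand(1, 3, 64, 64) * 2 - 1
loss = reconstruction_loss(recon, target)
```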

21 pages, 7872 KiB  
Article
YOLO-Drone: An Optimized YOLOv8 Network for Tiny UAV Object Detection
by Xianxu Zhai, Zhihua Huang, Tao Li, Hanzheng Liu and Siyuan Wang
Electronics 2023, 12(17), 3664; https://doi.org/10.3390/electronics12173664 - 30 Aug 2023
Cited by 10 | Viewed by 9233
Abstract
With the widespread use of UAVs in commercial and industrial applications, UAV detection is receiving increasing attention in areas such as public safety. As a result, object detection techniques for UAVs are also developing rapidly. However, the small size of drones, complex airspace backgrounds, and changing light conditions still pose significant challenges for research in this area. Based on the above problems, this paper proposes a tiny UAV detection method based on the optimized YOLOv8. First, in the detection head component, a high-resolution detection head is added to improve the detection capability for small targets, while the large-target detection head and redundant network layers are cut off to effectively reduce the number of network parameters and improve the UAV detection speed; second, in the feature extraction stage, SPD-Conv is used instead of Conv to extract multi-scale features, reducing the loss of fine-grained information and enhancing the model’s feature extraction capability for small targets. Finally, the GAM attention mechanism is introduced in the neck to enhance the model’s fusion of target features and improve the model’s overall performance in detecting UAVs. Relative to the baseline model, our method improves performance by 11.9%, 15.2%, and 9% in terms of P (precision), R (recall), and mAP (mean average precision), respectively. Meanwhile, it reduces the number of parameters and model size by 59.9% and 57.9%, respectively. In addition, our method demonstrates clear advantages in comparison experiments and self-built dataset experiments and is more suitable for engineering deployment and the practical applications of UAV object detection systems. Full article
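
SPD-Conv, used here in place of strided convolution, rearranges each 2x2 spatial block into channels before convolving, so downsampling discards no pixels. A minimal sketch follows; the output channel count and kernel size are assumptions.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth then conv: halves resolution without discarding any pixels."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(4 * in_ch, out_ch, 3, padding=1)  # kernel size assumed
    def forward(self, x):  # H and W must be even
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

y = SPDConv(32, 64)(torch.randn(1, 32, 64, 64))  # (1, 64, 32, 32)
```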

20 pages, 4890 KiB  
Article
ESD-YOLOv5: A Full-Surface Defect Detection Network for Bearing Collars
by Jiale Li, Haipeng Pan and Junfeng Li
Electronics 2023, 12(16), 3446; https://doi.org/10.3390/electronics12163446 - 15 Aug 2023
Cited by 3 | Viewed by 1036
Abstract
To address the varied forms and sizes of bearing collar surface defects, the uneven distribution of defect positions, and complex backgrounds, we propose ESD-YOLOv5, an improved algorithm for bearing collar full-surface defect detection. First, a hybrid attention module, ECCA, was constructed by combining an efficient channel attention (ECA) mechanism and a coordinate attention (CA) mechanism and was introduced into the YOLOv5 backbone network to enhance the network’s ability to localize object features. Second, the original neck was replaced by the constructed Slim-neck, which reduces the model’s parameters and computational complexity without sacrificing object detection accuracy. Furthermore, the original head was replaced by the decoupled head from YOLOX, which separates the classification and regression tasks of object detection. Finally, we constructed a dataset of defective bearing collars using images collected from industrial sites and conducted extensive experiments. The results demonstrate that our proposed ESD-YOLOv5 detection model achieved an mAP of 98.6% on our self-built dataset, a 2.3% improvement over the YOLOv5 base model, and outperformed mainstream one-stage object detection algorithms. Additionally, the bearing collar surface defect detection system developed based on the proposed method has been successfully applied in the industrial domain for bearing collar inspection. Full article
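
The ECCA module combines efficient channel attention (ECA) with coordinate attention (CA). The sketch below shows the ECA half, which replaces the usual squeeze-and-excitation MLP with a cheap 1D convolution across channels; the coordinate attention part is not reproduced, and the kernel size is a typical choice rather than the paper's.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: a 1D conv over pooled channels, no dimension reduction."""
    def __init__(self, k: int = 3):  # k = 3 is a common default, assumed here
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)
    def forward(self, x):
        y = x.mean(dim=(2, 3))                    # (B, C) global average pooling
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # local cross-channel interaction
        return x * torch.sigmoid(y)[:, :, None, None]

out = ECA()(torch.randn(1, 64, 40, 40))  # same shape, channels rescaled
```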

17 pages, 8743 KiB  
Article
Digital Twin 3D System for Power Maintenance Vehicles Based on UWB and Deep Learning
by Mingju Chen, Tingting Liu, Jinsong Zhang, Xingzhong Xiong and Feng Liu
Electronics 2023, 12(14), 3151; https://doi.org/10.3390/electronics12143151 - 20 Jul 2023
Cited by 1 | Viewed by 907
Abstract
To address the issue of insufficient safety monitoring of power maintenance vehicles during power operations, this study proposes a vehicle monitoring scheme based on ultra-wideband (UWB) and deep learning. The UWB localization algorithm employs Chaotic Particle Swarm Optimization (CPSO) to optimize the Time Difference of Arrival (TDOA)/Angle of Arrival (AOA) locating scheme in order to overcome the adverse effects of non-line-of-sight and multipath effects in substations and significantly improve the positioning accuracy of vehicles. To handle the large aspect ratio and varying angle of the maintenance vehicle’s mechanical arm during operation situational awareness, the arm recognition network is based on You Only Look Once version 5 (YOLOv5) and modified with the Convolutional Block Attention Module (CBAM). The long-edge definition method with circular smooth labels, the SIoU loss function, and the HardSwish activation function enhance the precision and processing speed of arm state recognition. The experimental results show that the proposed CPSO-TDOA/AOA outperforms other algorithms in localization accuracy and effectively attenuates the non-line-of-sight and multipath effects. The recognition accuracy of the YOLOv5-CSL-CBAM network is substantially improved; the mAP value for the vehicle’s arm reaches 85.04%. The detection speed meets the real-time requirement, and the digital twin of the maintenance vehicle is effectively realized in the 3D substation model. Full article
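
TDOA localization, which the CPSO step optimizes, reduces to a small nonlinear least-squares problem over range differences. The noise-free sketch below uses generic anchors and SciPy's solver; the paper's CPSO optimizer and AOA fusion are not reproduced.

```python
import numpy as np
from scipy.optimize import least_squares

C = 299_792_458.0  # propagation speed of UWB pulses (m/s)

def tdoa_residuals(p, anchors, tdoa):
    """Measured minus predicted range differences relative to anchor 0."""
    d = np.linalg.norm(anchors - p, axis=1)
    return (d[1:] - d[0]) - C * tdoa

anchors = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 30.0], [30.0, 30.0]])  # hypothetical layout
true_p = np.array([12.0, 7.5])
d = np.linalg.norm(anchors - true_p, axis=1)
tdoa = (d[1:] - d[0]) / C  # ideal, noise-free measurements
est = least_squares(tdoa_residuals, x0=np.array([15.0, 15.0]), args=(anchors, tdoa)).x
```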

13 pages, 8618 KiB  
Article
Lightweight Strawberry Instance Segmentation on Low-Power Devices for Picking Robots
by Leilei Cao, Yaoran Chen and Qiangguo Jin
Electronics 2023, 12(14), 3145; https://doi.org/10.3390/electronics12143145 - 20 Jul 2023
Cited by 1 | Viewed by 1089
Abstract
Machine vision plays a key role in localizing strawberries in complex orchards or greenhouses for picking robots. Due to the variation among strawberries in shape, size, and color, and to the occlusion of strawberries by leaves and stems, precisely locating each strawberry poses a great challenge to the vision system of picking robots. Several methods based on the well-known Mask R-CNN network have been developed for localizing strawberries, which, however, are not efficient when running on picking robots. In this paper, we propose a simple and highly efficient framework for strawberry instance segmentation running on low-power devices for picking robots, termed StrawSeg. Instead of using the common paradigm of “detection-then-segment”, we directly segment each strawberry in a single-shot manner without relying on object detection. In our model, we design a novel feature aggregation network to merge features with different scales, which employs a pixel shuffle operation to increase the resolution and reduce the channels of features. Experiments on the open-source dataset StrawDI_Db1 demonstrate that our model can achieve a good trade-off between accuracy and inference speed on a low-power device. Full article
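
The pixel shuffle operation at the heart of such a feature aggregation network trades channels for resolution, as the sketch below illustrates on hypothetical feature shapes; the actual StrawSeg aggregation wiring is not reproduced.

```python
import torch
import torch.nn as nn

class ShuffleUp(nn.Module):
    """Pixel shuffle upsampling: trades channels (C*r*r -> C) for resolution (H -> H*r)."""
    def __init__(self, r: int = 2):
        super().__init__()
        self.shuffle = nn.PixelShuffle(r)
    def forward(self, low_res, high_res):
        return torch.cat([self.shuffle(low_res), high_res], dim=1)

low = torch.randn(1, 256, 16, 16)   # deep, low-resolution features (illustrative shapes)
high = torch.randn(1, 64, 32, 32)   # shallow, high-resolution features
fused = ShuffleUp()(low, high)      # (1, 64 + 64, 32, 32)
```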

10 pages, 2419 KiB  
Article
Content-Aware Image Resizing Technology Based on Composition Detection and Composition Rules
by Bo Wang, Hongyang Si, Huiting Fu, Ruao Gao, Minjuan Zhan, Huili Jiang and Aili Wang
Electronics 2023, 12(14), 3096; https://doi.org/10.3390/electronics12143096 - 17 Jul 2023
Viewed by 800
Abstract
A novel content-aware image resizing mechanism based on composition detection and composition rules is proposed to address the lack of esthetic perception in current content-aware resizing algorithms. The proposed algorithm introduces a composition detection module to identify the composition type of the input image. According to the classification result, the corresponding composition rules from computational esthetics are selected. Finally, the algorithm performs seam carving operations under the corresponding esthetic rules. The resized image not only preserves the important content of the image but also satisfies the composition rules, optimizing the overall visual effect of the image. The simulation results show that the proposed algorithm achieves a better visual effect. Compared with existing algorithms, the proposed algorithm not only effectively protects important image content but also preserves important structures and improves the overall esthetic quality of the image. Full article
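
The seam carving operation underlying this method removes minimal-energy seams found by dynamic programming. The sketch below shows the classic energy map and vertical-seam search; the composition-rule guidance that the paper adds on top is not reproduced.

```python
import numpy as np

def energy_map(gray: np.ndarray) -> np.ndarray:
    """Gradient-magnitude energy: seam carving removes seams through low-energy regions."""
    g = gray.astype(float)
    gx = np.abs(np.diff(g, axis=1, append=g[:, -1:]))
    gy = np.abs(np.diff(g, axis=0, append=g[-1:, :]))
    return gx + gy

def vertical_seam(energy: np.ndarray) -> np.ndarray:
    """Dynamic programming for the minimal-energy 8-connected vertical seam."""
    h, w = energy.shape
    cost = energy.astype(float).copy()
    for y in range(1, h):
        left = np.roll(cost[y - 1], 1)
        left[0] = np.inf
        right = np.roll(cost[y - 1], -1)
        right[-1] = np.inf
        cost[y] += np.minimum(np.minimum(left, right), cost[y - 1])
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):  # backtrack through adjacent columns
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam

seam = vertical_seam(energy_map(np.random.rand(64, 48)))  # one column index per row
```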

15 pages, 2518 KiB  
Article
Automatic Fabric Defect Detection Method Using AC-YOLOv5
by Yongbin Guo, Xinjian Kang, Junfeng Li and Yuanxun Yang
Electronics 2023, 12(13), 2950; https://doi.org/10.3390/electronics12132950 - 05 Jul 2023
Cited by 7 | Viewed by 2414
Abstract
In the face of detection problems posed by complex textile texture backgrounds and defects of different sizes and types, commonly used object detection networks have limitations in handling targets of varying sizes. Furthermore, their stability and anti-jamming capabilities are relatively weak. Therefore, when the target types are more diverse, false detections or missed detections are likely to occur. In order to meet the stringent requirements of textile defect detection, we propose a novel AC-YOLOv5-based textile defect detection method. This method fully considers the optical properties, texture distribution, imaging properties, and detection requirements specific to textiles. First, the Atrous Spatial Pyramid Pooling (ASPP) module is introduced into the YOLOv5 backbone network, and the feature map is pooled using convolution kernels with different dilation rates. Multiscale feature information is obtained from feature maps of different receptive fields, which improves the detection of defects of different sizes without changing the resolution of the input image. Second, a convolution squeeze-and-excitation (CSE) channel attention module is proposed, and the CSE module is introduced into the YOLOv5 backbone network. The weights of each feature channel are obtained through self-learning to further improve the defect detection and anti-jamming capability. Finally, a large number of fabric images were collected using an inspection system built on a circular knitting machine at an industrial site, and extensive experiments were conducted using this self-built fabric defect dataset. The experimental results showed that AC-YOLOv5 can achieve an overall detection accuracy of 99.1% on the fabric defect dataset, satisfying the requirements for application in industrial settings. Full article
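
The CSE module is described as a convolutional variant of squeeze-and-excitation. The sketch below shows the classic SE baseline it builds on, with channel weights learned from globally pooled context; the convolutional modification itself and the reduction ratio are not taken from the paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels using globally pooled context."""
    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio assumed
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):
        b, c, _, _ = x.shape
        return x * self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)

out = SEBlock(64)(torch.randn(1, 64, 40, 40))  # same shape, channels rescaled
```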

18 pages, 1895 KiB  
Article
An Improved YOLOv5 Underwater Detector Based on an Attention Mechanism and Multi-Branch Reparameterization Module
by Jian Zhang, Hongda Chen, Xinyue Yan, Kexin Zhou, Jinshuai Zhang, Yonghui Zhang, Hong Jiang and Bingqian Shao
Electronics 2023, 12(12), 2597; https://doi.org/10.3390/electronics12122597 - 08 Jun 2023
Cited by 4 | Viewed by 2144
Abstract
Underwater target detection is a critical task in various applications, including environmental monitoring, underwater exploration, and marine resource management. As the demand for underwater observation and exploitation continues to grow, there is a greater need for reliable and efficient methods of detecting underwater targets. However, the unique underwater environment often leads to significant degradation of image quality, which results in reduced detection accuracy. This paper proposes an improved YOLOv5 underwater-target-detection network to enhance accuracy and reduce missed detections. First, we added the global attention mechanism (GAM) to the backbone network, which could retain the channel and spatial information to a greater extent and strengthen cross-dimensional interaction so as to improve the ability of the backbone network to extract features. Then, we introduced the fusion block based on DAMO-YOLO for the neck, which enhanced the system’s ability to extract features at different scales. Finally, we used the SIoU loss to measure the degree of matching between the target box and the regression box, which accelerated the convergence and improved the accuracy. The results obtained from experiments on the URPC2019 dataset revealed that our model achieved an mAP@0.5 score of 80.2%, representing a 1.8% and 2.3% increase in performance compared to YOLOv7 and YOLOv8, respectively, which means our method achieved state-of-the-art (SOTA) performance. Moreover, additional evaluations on the MS COCO dataset indicated that our model’s mAP@0.5:0.95 reached 51.0%, surpassing advanced methods such as ViDT and RF-Next, demonstrating the versatility of our enhanced model architecture. Full article
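
The global attention mechanism (GAM) applies a channel submodule and a spatial submodule in sequence. The sketch below follows the published GAM design at a high level, though the reduction ratio and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Global attention mechanism sketch: channel submodule then spatial submodule."""
    def __init__(self, channels: int, reduction: int = 4):  # reduction/kernels assumed
        super().__init__()
        hidden = channels // reduction
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True), nn.Linear(hidden, channels))
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, hidden, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 7, padding=3))
    def forward(self, x):
        b, c, h, w = x.shape
        ca = self.channel_mlp(x.permute(0, 2, 3, 1).reshape(-1, c))   # per-position channel MLP
        x = x * torch.sigmoid(ca.view(b, h, w, c).permute(0, 3, 1, 2))
        return x * torch.sigmoid(self.spatial(x))                     # spatial gating

out = GAM(64)(torch.randn(1, 64, 20, 20))
```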

25 pages, 1204 KiB  
Article
Research on Railway Dispatcher Fatigue Detection Method Based on Deep Learning with Multi-Feature Fusion
by Liang Chen and Wei Zheng
Electronics 2023, 12(10), 2303; https://doi.org/10.3390/electronics12102303 - 19 May 2023
Cited by 2 | Viewed by 1139
Abstract
Traffic command and scheduling are the core monitoring aspects of railway transportation. Detecting the fatigue state of dispatchers is, therefore, of great significance to ensure the safety of railway operations. In this paper, we present a multi-feature fatigue detection method based on key points of the human face and body posture. Considering unfavorable factors, such as facial occlusion and angle changes, that have limited single-feature fatigue state detection methods, we developed our model based on the fusion of body postures and facial features for better accuracy. Using facial key points and eye features, we calculate the percentage of time for which the eyes are more than 80% closed (PERCLOS), as well as the blinking and yawning frequencies, and we analyze fatigue behaviors, such as yawning, a bowed head (which could indicate a sleep state), and lying down on a table, using a behavior recognition algorithm. We fuse five facial features and behavioral postures to comprehensively determine the fatigue state of dispatchers. The results show that, on the 300W dataset as well as a hand-crafted dataset, the inference time of the improved facial key point detection algorithm based on the RetinaFace model was 100 ms and the normalized mean error (NME) was 3.58. On our own dataset, the classification accuracy based on the Bi-LSTM-SVM adaptive enhancement model reached 97%. Video data of volunteers who carried out scheduling operations in the simulation laboratory were used for our experiments, and our multi-feature fusion fatigue detection algorithm showed an accuracy rate of 96.30% and a recall rate of 96.30% in fatigue classification, both of which were higher than those of existing single-feature detection methods. Our multi-feature fatigue detection method offers a potential solution for fatigue level classification in vital areas of industry, such as railway transportation. Full article
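
The eye-closure measure described here is essentially PERCLOS (the fraction of time the eyes are at least 80% closed), which is straightforward to compute from per-frame eye openness, as sketched below.

```python
import numpy as np

def perclos(eye_openness: np.ndarray, open_thresh: float = 0.2) -> float:
    """Fraction of frames in which the eyes are at least 80% closed (openness < 0.2)."""
    return float((eye_openness < open_thresh).mean())  # 0.2 mirrors the 80%-closed criterion

openness = np.clip(np.random.rand(300), 0, 1)  # per-frame eye aspect ratio, normalized
score = perclos(openness)                      # higher values indicate fatigue
```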
