Deep Learning-Based Object Detection/Classification

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 January 2025

Special Issue Editor


Dr. Kuo-Kun Tseng
Guest Editor
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
Interests: biomedical wearable systems; mobile device identity recognition systems; commodity big data analysis systems; 3D printing and scanning application systems

Special Issue Information

Dear Colleagues,

Object detection and classification are two core tasks in computer vision, spanning a wide range of algorithms, architectures, systems and applications. Some common aspects and application scenarios are outlined below:

  1. Algorithm: Object detection and classification algorithms identify and classify objects in images or videos. Common object detection algorithms include YOLO, Faster R-CNN and SSD, while common classification networks include ResNet and VGG. These algorithms are used in fields such as human–computer interaction, security monitoring, autonomous driving, smart homes and manufacturing (a brief illustrative sketch follows this list).
  2. Architecture: Object detection and classification architecture refers to the computing systems used to process images or videos, covering both hardware and software. The hardware typically includes processors, GPUs and FPGAs, while the software stack includes operating systems, programming languages and libraries. Selecting an appropriate architecture improves the efficiency and accuracy of detection and classification tasks.
  3. System: Object detection and classification systems are applied in many fields. Human–computer interaction systems detect and classify human actions and other behaviors in images or videos to enable communication between people and machines; security monitoring systems identify and classify the behavior of people in video to trigger intelligent alarms; autonomous driving systems detect and classify objects on the road to help the vehicle recognize traffic signals and obstacles; smart home systems recognize the location and activity of users to adjust the indoor environment automatically; and manufacturing systems inspect product images for defects or errors, then classify and count the defect types and quantities to help enterprises improve production quality.
  4. Application: Applications of object detection and classification are accordingly very extensive, including but not limited to human–computer interaction, security monitoring, autonomous driving, smart homes, manufacturing and other fields, where the detection and classification capabilities described in the systems above are put into practice.
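
To make the algorithm point above concrete, the snippet below is a minimal, illustrative sketch of off-the-shelf object detection with a pretrained model. It assumes PyTorch/torchvision and uses Faster R-CNN as one representative choice; the image filename is hypothetical, and any of the detectors listed in item 1 could be swapped in.

```python
# Minimal sketch: running a pretrained detector from torchvision (illustrative only).
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()

img = read_image("street_scene.jpg")          # hypothetical input image
with torch.no_grad():
    prediction = model([preprocess(img)])[0]  # dict with boxes, labels, scores

labels = [weights.meta["categories"][int(i)] for i in prediction["labels"]]
for label, score, box in zip(labels, prediction["scores"], prediction["boxes"]):
    if score > 0.5:                           # keep confident detections only
        print(label, round(score.item(), 2), box.tolist())
```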

In short, object detection and classification have broad application prospects across computer vision, helping people better understand the visual world and improving work efficiency and quality of life.

Dr. Kuo-Kun Tseng
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection
  • object classification

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)


Research

13 pages, 5820 KiB  
Article
Optic Nerve Sheath Ultrasound Image Segmentation Based on CBC-YOLOv5s
by Yonghua Chu, Jinyang Xu, Chunshuang Wu, Jianping Ye, Jucheng Zhang, Lei Shen, Huaxia Wang and Yudong Yao
Electronics 2024, 13(18), 3595; https://doi.org/10.3390/electronics13183595 - 10 Sep 2024
Abstract
The diameter of the optic nerve sheath is an important indicator for assessing the intracranial pressure in critically ill patients. The methods for measuring the optic nerve sheath diameter are generally divided into invasive and non-invasive methods. Compared to the invasive methods, the non-invasive methods are safer and have thus gained popularity. Among the non-invasive methods, using deep learning to process the ultrasound images of the eyes of critically ill patients and promptly output the diameter of the optic nerve sheath offers significant advantages. This paper proposes a CBC-YOLOv5s optic nerve sheath ultrasound image segmentation method that integrates both local and global features. First, it introduces the CBC-Backbone feature extraction network, which consists of dual-layer C3 Swin-Transformer (C3STR) and dual-layer Bottleneck Transformer (BoT3) modules. The C3STR backbone’s multi-layer convolution and residual connections focus on the local features of the optic nerve sheath, while the Window Transformer Attention (WTA) mechanism in the C3STR module and the Multi-Head Self-Attention (MHSA) in the BoT3 module enhance the model’s understanding of the global features of the optic nerve sheath. The extracted local and global features are fully integrated in the Spatial Pyramid Pooling Fusion (SPPF) module. Additionally, the CBC-Neck feature pyramid is proposed, which includes a single-layer C3STR module and three-layer CReToNeXt (CRTN) module. During upsampling feature fusion, the C3STR module is used to enhance the local and global awareness of the fused features. During downsampling feature fusion, the CRTN module’s multi-level residual design helps the network to better capture the global features of the optic nerve sheath within the fused features. The introduction of these modules achieves the thorough integration of the local and global features, enabling the model to efficiently and accurately identify the optic nerve sheath boundaries, even when the ocular ultrasound images are blurry or the boundaries are unclear. The Z2HOSPITAL-5000 dataset collected from Zhejiang University Second Hospital was used for the experiments. Compared to the widely used YOLOv5s and U-Net algorithms, the proposed method shows improved performance on the blurry test set. Specifically, the proposed method achieves precision, recall, and Intersection over Union (IoU) values that are 4.1%, 2.1%, and 4.5% higher than those of YOLOv5s. When compared to U-Net, the precision, recall, and IoU are improved by 9.2%, 21%, and 19.7%, respectively. Full article
(This article belongs to the Special Issue Deep Learning-Based Object Detection/Classification)
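
As an illustration of the general "local convolution plus global self-attention" idea discussed in the abstract above, the following PyTorch sketch combines a small residual convolution branch with multi-head self-attention over spatial positions. It is a generic hybrid block written for this page, not the authors' C3STR or BoT3 implementation; all layer sizes are assumptions.

```python
# Generic "local + global" feature block: residual convolutions plus self-attention.
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: small residual convolution stack (local texture/edge cues).
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Global branch: multi-head self-attention over flattened spatial positions.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = x + self.local(x)                         # residual local features
        tokens = local.flatten(2).transpose(1, 2)         # (B, H*W, C)
        q = self.norm(tokens)
        attn_out, _ = self.attn(q, q, q)                  # global context mixing
        fused = (tokens + attn_out).transpose(1, 2).reshape(b, c, h, w)
        return fused

feats = torch.randn(1, 64, 32, 32)
print(LocalGlobalBlock(64)(feats).shape)  # torch.Size([1, 64, 32, 32])
```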

14 pages, 1280 KiB  
Article
Multihead-Res-SE Residual Network with Attention for Human Activity Recognition
by Hongbo Kang, Tailong Lv, Chunjie Yang and Wenqing Wang
Electronics 2024, 13(17), 3407; https://doi.org/10.3390/electronics13173407 - 27 Aug 2024
Abstract
Human activity recognition (HAR) typically uses wearable sensors to identify and analyze the time-series data they collect, enabling recognition of specific actions. As such, HAR is increasingly applied in human–computer interaction, healthcare, and other fields, making the accurate and efficient recognition of various human activities increasingly important. In recent years, deep learning methods have been extensively applied in sensor-based HAR, yielding remarkable results. However, complex HAR research, which involves specific human behaviors in varied contexts, still faces several challenges. To solve these problems, we propose a multi-head neural network based on the attention mechanism. This framework contains three convolutional heads, with each head designed using a one-dimensional CNN to extract features from sensory data. The model uses a channel attention module (squeeze–excitation module) to enhance the representational capabilities of convolutional neural networks. We conducted experiments on two publicly available benchmark datasets, UCI-HAR and WISDM, to evaluate our model. The results were satisfactory, with overall recognition accuracies of 96.72% and 97.73% on the two datasets, respectively. The experimental results demonstrate the effectiveness of the network structure for HAR, ensuring a higher level of accuracy. Full article
(This article belongs to the Special Issue Deep Learning-Based Object Detection/Classification)
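
The squeeze-and-excitation channel attention mentioned in the abstract above can be sketched in a few lines of PyTorch. The block below is a generic SE module for 1D convolutional features; the reduction ratio and tensor shapes are assumptions for illustration, not values taken from the paper.

```python
# Generic squeeze-and-excitation (SE) channel attention for 1D sensor features.
import torch
import torch.nn as nn

class SE1d(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)        # squeeze: global average over time
        self.fc = nn.Sequential(                   # excitation: channel-wise gating
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, channels, time)
        w = self.fc(self.pool(x).squeeze(-1))              # (batch, channels) gate values
        return x * w.unsqueeze(-1)                         # reweight each channel

features = torch.randn(4, 64, 128)   # e.g. 64 feature channels over 128 time steps
print(SE1d(64)(features).shape)      # torch.Size([4, 64, 128])
```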

28 pages, 12322 KiB  
Article
An Efficient Transformer–CNN Network for Document Image Binarization
by Lina Zhang, Kaiyuan Wang and Yi Wan
Electronics 2024, 13(12), 2243; https://doi.org/10.3390/electronics13122243 - 7 Jun 2024
Abstract
Color image binarization plays a pivotal role in image preprocessing and significantly impacts subsequent tasks, particularly for text recognition. This paper concentrates on document image binarization (DIB), which aims to separate an image into a foreground (text) and background (non-text content). We thoroughly analyze conventional and deep-learning-based approaches and conclude that prevailing DIB methods leverage deep learning technology. Furthermore, we explore the receptive fields of pre- and post-network training to underscore the Transformer model’s advantages. Subsequently, we introduce a lightweight model based on the U-Net structure and enhanced with the MobileViT module to better capture global information in document images. Given its adeptness at learning both local and global features, our proposed model demonstrates competitive performance on two standard datasets (DIBCO2012 and DIBCO2017) and good robustness on the DIBCO2019 dataset. Notably, our proposed method presents a straightforward end-to-end model devoid of additional image preprocessing or post-processing, eschewing the use of ensemble models. Moreover, its parameter count is less than one-eighth of that of the model that achieves the best results on most DIBCO datasets. Finally, two sets of ablation experiments are conducted to verify the effectiveness of the proposed binarization model. Full article
(This article belongs to the Special Issue Deep Learning-Based Object Detection/Classification)
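
To illustrate the overall encoder-decoder structure the abstract above describes, here is a minimal U-Net-style sketch in PyTorch that maps a document image to a per-pixel foreground probability. The layer sizes are placeholders, and the bottleneck comment only marks where a global-context block such as MobileViT could sit; this is not the authors' network.

```python
# Minimal U-Net-style binarization sketch: encoder, bottleneck, decoder with skips.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class TinyBinarizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(32, 64)   # a global-context block (e.g. MobileViT) could sit here
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = conv_block(64, 32)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)
        self.head = nn.Conv2d(16, 1, 1)        # one foreground logit per pixel

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection from enc2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection from enc1
        return torch.sigmoid(self.head(d1))                   # probability of text pixels

img = torch.randn(1, 3, 256, 256)
mask = TinyBinarizer()(img)
print(mask.shape)  # torch.Size([1, 1, 256, 256]); threshold at 0.5 for a binary map
```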

19 pages, 5278 KiB  
Article
A Foam Line Position Detection Algorithm for A/O Pool Based on YOLOv5
by Yubin Xu, Yihao Wu and Yinzhang Guo
Electronics 2024, 13(10), 1834; https://doi.org/10.3390/electronics13101834 - 9 May 2024
Abstract
During the biochemical pretreatment process of leachate in urban landfill sites, if the foam in the A/O pool is not promptly addressed, it can lead to overflow, posing hazards to the surrounding environment and personnel. Therefore, a real-time foam line detection algorithm based on YOLOv5x was proposed, which enhances feature information and improves anchor box regression prediction to accurately detect the position of foam lines. Firstly, in the preprocessing stage, a rectangular box is employed to simultaneously label the foam line and the edge of the A/O pool within the same region, which enhances the feature information of the foam line. Then, the C3NAM module was proposed, which applies weight sparsity penalties to the attention modules in the feature extraction section to enhance the capability of extracting foam line features. Subsequently, a B-SPPCSPC module was proposed to enhance the fusion of shallow and deep feature information, addressing the issue of susceptibility to background interference during foam line detection. Next, the Focal_EIOU loss was introduced to ameliorate the issue of class imbalance in detection, providing more accurate bounding box predictions. Lastly, optimizing the detection layer scale improves the detection performance for smaller targets. The experimental results demonstrate that the accuracy of this algorithm reaches 98.9% and the recall reaches 88.1%, with a detection frame rate of 26.2 frames per second, which meets the detection requirements of real-world application scenarios. Full article
(This article belongs to the Special Issue Deep Learning-Based Object Detection/Classification)
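
The Focal_EIOU mentioned in the abstract above is a bounding-box regression loss; the sketch below is a simplified reconstruction of an EIoU-style loss with an IoU-based focusing weight. The exact weighting and the gamma value are assumptions for illustration, not the paper's implementation.

```python
# Simplified EIoU-style box regression loss with an IoU-based focusing weight (illustrative).
import torch

def eiou_loss(pred, target, gamma: float = 0.5, eps: float = 1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2)."""
    # Plain IoU from intersection and union.
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box, center distance, and width/height differences.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((center_p - center_t) ** 2).sum(dim=1)
    dw = (pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])
    dh = (pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])

    eiou = 1 - iou + rho2 / (cw**2 + ch**2 + eps) + dw**2 / (cw**2 + eps) + dh**2 / (ch**2 + eps)
    # "Focal" variant: weight each box's loss by its IoU raised to gamma (assumed form).
    return (iou.detach().clamp(min=eps) ** gamma * eiou).mean()

pred = torch.tensor([[10., 10., 50., 60.]])
gt = torch.tensor([[12., 8., 48., 62.]])
print(eiou_loss(pred, gt).item())
```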
