Deep Learning-Based Object Detection/Classification

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 January 2025

Special Issue Editor


Dr. Kuo-Kun Tseng
Guest Editor
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
Interests: biomedical wearable systems; mobile device identity recognition systems; commodity big data analysis systems; 3D printing and scanning application systems

Special Issue Information

Dear Colleagues,

Object detection and classification are two core tasks in computer vision, spanning a wide range of algorithms, architectures, systems and applications. Some common aspects and application scenarios are outlined below:

  1. Algorithm: Object detection and classification algorithms identify and classify objects in images or videos. Common object detection algorithms include YOLO, Faster R-CNN and SSD, while common classification networks include ResNet and VGG. These algorithms are used in fields such as human–computer interaction, security monitoring, autonomous driving, smart homes and manufacturing (a brief illustrative sketch follows this list).
  2. Architecture: Object detection and classification architecture refers to the computing systems used to process images or videos, covering both hardware and software. The hardware typically includes processors, GPUs and FPGAs, while the software stack includes operating systems, programming languages and libraries. Selecting an appropriate architecture improves the efficiency and accuracy of detection and classification tasks.
  3. System: Object detection and classification systems are applied in many fields. Human–computer interaction systems detect and classify human actions and other behaviors in images or videos to enable communication between people and machines; security monitoring systems identify and classify the behavior of people in video to trigger intelligent alarms; autonomous driving systems detect and classify objects on the road to help the vehicle recognize traffic signals and obstacles; smart home systems recognize the location and activity of users to adjust the indoor environment automatically; and manufacturing systems inspect product images for defects or errors, then classify and count the defect types and quantities to help enterprises improve production quality.
  4. Application: Applications of object detection and classification are accordingly very extensive, including but not limited to human–computer interaction, security monitoring, autonomous driving, smart homes, manufacturing and other fields, where the detection and classification capabilities described in the systems above are put into practice.
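
To make the algorithm point above concrete, the snippet below is a minimal, illustrative sketch of off-the-shelf object detection with a pretrained model. It assumes PyTorch/torchvision and uses Faster R-CNN as one representative choice; the image filename is hypothetical, and any of the detectors listed in item 1 could be swapped in.

```python
# Minimal sketch: running a pretrained detector from torchvision (illustrative only).
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()

img = read_image("street_scene.jpg")          # hypothetical input image
with torch.no_grad():
    prediction = model([preprocess(img)])[0]  # dict with boxes, labels, scores

labels = [weights.meta["categories"][int(i)] for i in prediction["labels"]]
for label, score, box in zip(labels, prediction["scores"], prediction["boxes"]):
    if score > 0.5:                           # keep confident detections only
        print(label, round(score.item(), 2), box.tolist())
```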

In short, object detection and classification have broad application prospects across computer vision, helping people better understand the visual world and improving work efficiency and quality of life.

Dr. Kuo-Kun Tseng
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection
  • object classification

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)


Research

13 pages, 5820 KiB  
Article
Optic Nerve Sheath Ultrasound Image Segmentation Based on CBC-YOLOv5s
by Yonghua Chu, Jinyang Xu, Chunshuang Wu, Jianping Ye, Jucheng Zhang, Lei Shen, Huaxia Wang and Yudong Yao
Electronics 2024, 13(18), 3595; https://doi.org/10.3390/electronics13183595 - 10 Sep 2024
Abstract
The diameter of the optic nerve sheath is an important indicator for assessing the intracranial pressure in critically ill patients. The methods for measuring the optic nerve sheath diameter are generally divided into invasive and non-invasive methods. Compared to the invasive methods, the non-invasive methods are safer and have thus gained popularity. Among the non-invasive methods, using deep learning to process the ultrasound images of the eyes of critically ill patients and promptly output the diameter of the optic nerve sheath offers significant advantages. This paper proposes a CBC-YOLOv5s optic nerve sheath ultrasound image segmentation method that integrates both local and global features. First, it introduces the CBC-Backbone feature extraction network, which consists of dual-layer C3 Swin-Transformer (C3STR) and dual-layer Bottleneck Transformer (BoT3) modules. The C3STR backbone’s multi-layer convolution and residual connections focus on the local features of the optic nerve sheath, while the Window Transformer Attention (WTA) mechanism in the C3STR module and the Multi-Head Self-Attention (MHSA) in the BoT3 module enhance the model’s understanding of the global features of the optic nerve sheath. The extracted local and global features are fully integrated in the Spatial Pyramid Pooling Fusion (SPPF) module. Additionally, the CBC-Neck feature pyramid is proposed, which includes a single-layer C3STR module and three-layer CReToNeXt (CRTN) module. During upsampling feature fusion, the C3STR module is used to enhance the local and global awareness of the fused features. During downsampling feature fusion, the CRTN module’s multi-level residual design helps the network to better capture the global features of the optic nerve sheath within the fused features. The introduction of these modules achieves the thorough integration of the local and global features, enabling the model to efficiently and accurately identify the optic nerve sheath boundaries, even when the ocular ultrasound images are blurry or the boundaries are unclear. The Z2HOSPITAL-5000 dataset collected from Zhejiang University Second Hospital was used for the experiments. Compared to the widely used YOLOv5s and U-Net algorithms, the proposed method shows improved performance on the blurry test set. Specifically, the proposed method achieves precision, recall, and Intersection over Union (IoU) values that are 4.1%, 2.1%, and 4.5% higher than those of YOLOv5s. When compared to U-Net, the precision, recall, and IoU are improved by 9.2%, 21%, and 19.7%, respectively. Full article
(This article belongs to the Special Issue Deep Learning-Based Object Detection/Classification)
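
As an illustration of the general "local convolution plus global self-attention" idea discussed in the abstract above, the following PyTorch sketch combines a small residual convolution branch with multi-head self-attention over spatial positions. It is a generic hybrid block written for this page, not the authors' C3STR or BoT3 implementation; all layer sizes are assumptions.

```python
# Generic "local + global" feature block: residual convolutions plus self-attention.
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: small residual convolution stack (local texture/edge cues).
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Global branch: multi-head self-attention over flattened spatial positions.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = x + self.local(x)                         # residual local features
        tokens = local.flatten(2).transpose(1, 2)         # (B, H*W, C)
        q = self.norm(tokens)
        attn_out, _ = self.attn(q, q, q)                  # global context mixing
        fused = (tokens + attn_out).transpose(1, 2).reshape(b, c, h, w)
        return fused

feats = torch.randn(1, 64, 32, 32)
print(LocalGlobalBlock(64)(feats).shape)  # torch.Size([1, 64, 32, 32])
```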

14 pages, 1280 KiB  
Article
Multihead-Res-SE Residual Network with Attention for Human Activity Recognition
by Hongbo Kang, Tailong Lv, Chunjie Yang and Wenqing Wang
Electronics 2024, 13(17), 3407; https://doi.org/10.3390/electronics13173407 - 27 Aug 2024
Abstract
Human activity recognition (HAR) typically uses wearable sensors to identify and analyze the time-series data they collect, enabling recognition of specific actions. As such, HAR is increasingly applied in human–computer interaction, healthcare, and other fields, making the accurate and efficient recognition of various human activities increasingly important. In recent years, deep learning methods have been extensively applied in sensor-based HAR, yielding remarkable results. However, complex HAR research, which involves specific human behaviors in varied contexts, still faces several challenges. To solve these problems, we propose a multi-head neural network based on the attention mechanism. This framework contains three convolutional heads, with each head designed using a one-dimensional CNN to extract features from sensory data. The model uses a channel attention module (squeeze–excitation module) to enhance the representational capabilities of convolutional neural networks. We conducted experiments on two publicly available benchmark datasets, UCI-HAR and WISDM, to evaluate our model. The results were satisfactory, with overall recognition accuracies of 96.72% and 97.73% on the two datasets, respectively. The experimental results demonstrate the effectiveness of the network structure for HAR, ensuring a higher level of accuracy. Full article
(This article belongs to the Special Issue Deep Learning-Based Object Detection/Classification)
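
The squeeze-and-excitation channel attention mentioned in the abstract above can be sketched in a few lines of PyTorch. The block below is a generic SE module for 1D convolutional features; the reduction ratio and tensor shapes are assumptions for illustration, not values taken from the paper.

```python
# Generic squeeze-and-excitation (SE) channel attention for 1D sensor features.
import torch
import torch.nn as nn

class SE1d(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)        # squeeze: global average over time
        self.fc = nn.Sequential(                   # excitation: channel-wise gating
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, channels, time)
        w = self.fc(self.pool(x).squeeze(-1))              # (batch, channels) gate values
        return x * w.unsqueeze(-1)                         # reweight each channel

features = torch.randn(4, 64, 128)   # e.g. 64 feature channels over 128 time steps
print(SE1d(64)(features).shape)      # torch.Size([4, 64, 128])
```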

28 pages, 12322 KiB  
Article
An Efficient Transformer–CNN Network for Document Image Binarization
by Lina Zhang, Kaiyuan Wang and Yi Wan
Electronics 2024, 13(12), 2243; https://doi.org/10.3390/electronics13122243 - 7 Jun 2024
Abstract
Color image binarization plays a pivotal role in image preprocessing and significantly impacts subsequent tasks, particularly for text recognition. This paper concentrates on document image binarization (DIB), which aims to separate an image into a foreground (text) and background (non-text content). We thoroughly analyze conventional and deep-learning-based approaches and conclude that prevailing DIB methods leverage deep learning technology. Furthermore, we explore the receptive fields of pre- and post-network training to underscore the Transformer model’s advantages. Subsequently, we introduce a lightweight model based on the U-Net structure and enhanced with the MobileViT module to better capture global information in document images. Given its adeptness at learning both local and global features, our proposed model demonstrates competitive performance on two standard datasets (DIBCO2012 and DIBCO2017) and good robustness on the DIBCO2019 dataset. Notably, our proposed method presents a straightforward end-to-end model devoid of additional image preprocessing or post-processing, eschewing the use of ensemble models. Moreover, its parameter count is less than one-eighth of that of the model that achieves the best results on most DIBCO datasets. Finally, two sets of ablation experiments are conducted to verify the effectiveness of the proposed binarization model. Full article
(This article belongs to the Special Issue Deep Learning-Based Object Detection/Classification)
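
To illustrate the overall encoder-decoder structure the abstract above describes, here is a minimal U-Net-style sketch in PyTorch that maps a document image to a per-pixel foreground probability. The layer sizes are placeholders, and the bottleneck comment only marks where a global-context block such as MobileViT could sit; this is not the authors' network.

```python
# Minimal U-Net-style binarization sketch: encoder, bottleneck, decoder with skips.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class TinyBinarizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(32, 64)   # a global-context block (e.g. MobileViT) could sit here
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = conv_block(64, 32)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)
        self.head = nn.Conv2d(16, 1, 1)        # one foreground logit per pixel

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection from enc2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection from enc1
        return torch.sigmoid(self.head(d1))                   # probability of text pixels

img = torch.randn(1, 3, 256, 256)
mask = TinyBinarizer()(img)
print(mask.shape)  # torch.Size([1, 1, 256, 256]); threshold at 0.5 for a binary map
```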

19 pages, 5278 KiB  
Article
A Foam Line Position Detection Algorithm for A/O Pool Based on YOLOv5
by Yubin Xu, Yihao Wu and Yinzhang Guo
Electronics 2024, 13(10), 1834; https://doi.org/10.3390/electronics13101834 - 9 May 2024
Abstract
During the biochemical pretreatment process of leachate in urban landfill sites, if the foam in the A/O pool is not promptly addressed, it can lead to overflow, posing hazards to the surrounding environment and personnel. Therefore, a real-time foam line detection algorithm based on YOLOv5x was proposed, which enhances feature information and improves anchor box regression prediction to accurately detect the position of foam lines. Firstly, in the preprocessing stage, a rectangular box is employed to simultaneously label the foam line and the edge of the A/O pool within the same region, which enhances the feature information of the foam line. Then, the C3NAM module was proposed, which applies weight sparsity penalties to the attention modules in the feature extraction section to enhance the capability of extracting foam line features. Subsequently, a B-SPPCSPC module was proposed to enhance the fusion of shallow and deep feature information, addressing the issue of susceptibility to background interference during foam line detection. Next, the Focal_EIOU loss was introduced to ameliorate the issue of class imbalance in detection, providing more accurate bounding box predictions. Lastly, optimizing the detection layer scale improves the detection performance for smaller targets. The experimental results demonstrate that the accuracy of this algorithm reaches 98.9% and the recall reaches 88.1%, with a detection frame rate of 26.2 frames per second, which meets the detection requirements of real-world application scenarios. Full article
(This article belongs to the Special Issue Deep Learning-Based Object Detection/Classification)
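
The Focal_EIOU mentioned in the abstract above is a bounding-box regression loss; the sketch below is a simplified reconstruction of an EIoU-style loss with an IoU-based focusing weight. The exact weighting and the gamma value are assumptions for illustration, not the paper's implementation.

```python
# Simplified EIoU-style box regression loss with an IoU-based focusing weight (illustrative).
import torch

def eiou_loss(pred, target, gamma: float = 0.5, eps: float = 1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2)."""
    # Plain IoU from intersection and union.
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box, center distance, and width/height differences.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((center_p - center_t) ** 2).sum(dim=1)
    dw = (pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])
    dh = (pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])

    eiou = 1 - iou + rho2 / (cw**2 + ch**2 + eps) + dw**2 / (cw**2 + eps) + dh**2 / (ch**2 + eps)
    # "Focal" variant: weight each box's loss by its IoU raised to gamma (assumed form).
    return (iou.detach().clamp(min=eps) ** gamma * eiou).mean()

pred = torch.tensor([[10., 10., 50., 60.]])
gt = torch.tensor([[12., 8., 48., 62.]])
print(eiou_loss(pred, gt).item())
```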
