1. Introduction
As the complexity of PCB (printed circuit board) manufacturing processes escalates, even minor defects can significantly impair product performance, leading to diminished yield rates [1]. It is critical to precisely identify defects such as shorts, open circuits, spurs, spurious copper, mouse bites, and missing holes [2] in PCB production and usage to enhance product yield. Traditional PCB defect detection methods encompass visual inspection and automated defect detection technology [3,4]. The former hinges on manual observation for the identification of small surface defects, demanding sustained attention and being prone to visual fatigue and distraction-induced errors [5]. The latter comprises methods like optical inspection, X-ray inspection, and infrared thermal imaging [6], providing high detection efficiency and accuracy but necessitating costly equipment and time; these methods may even inflict damage on the PCB. Utilizing symmetry features for PCB defect detection can effectively identify and classify defects such as scratches, cracks, stains, and missing solder joints, enabling timely rectification of potential issues. Zhang proposed a method based on convolutional neural networks for accurately classifying and recognizing symmetry in planar engineering structures [7]. However, the classification accuracy achieved was 86.69%, indicating a need for further improvement. Hence, devising high-precision and high-efficiency methods for detecting PCB defects continues to pose a challenge.
Deep learning image recognition algorithms have demonstrated impressive success across various fields [8,9,10,11]. Their integration with PCB defect detection displays good adaptability [12,13,14] and has shown progress. The deep learning object detection method involves training neural network models to identify PCB surface defects, including steps such as constructing annotated datasets, designing network architectures, and adjusting training parameters and strategies. During training, weight parameters are continuously updated to reduce loss values and improve accuracy, and model performance is ultimately evaluated on validation or test sets. Research results indicate that this method is effective in detecting PCB defects. Wei et al. utilized Automatic Optical Inspection (AOI) technology and CNNs for PCB defect detection [15], demonstrating higher accuracy and stability compared with traditional methods. However, because AOI technology spans multiple fields such as computer science, optoelectronics, machine vision, and pattern recognition, further improvements are needed in algorithm real-time performance, preprocessing accuracy, and defect classification precision. Kaya et al. proposed a method utilizing a deep-learning-based hybrid optical sensor for the noncontact detection and classification of small-sized defects on large PCBs, addressing issues of time consumption and fatigue [16]. However, the method involves the use of optical microscopic sensors, which introduces operational complexity. Kim et al. proposed a PCB detection system based on a skip-connection convolutional autoencoder, achieving efficient defect detection [17]. However, this method uses manually generated PCB defect images, so the model requires further verification on actual PCB defect data, and its inference speed needs to be improved.
Ding et al. [2] proposed a small defect detection network called TDD-Net, achieving notable results by leveraging the multiscale and pyramid-level characteristics of deep convolutional networks to construct a highly portable feature pyramid. Hu et al. [18] proposed an improved Faster R-CNN model, utilizing ResNet50 as the backbone network and fusing GARPN (Region Proposal by Guided Anchoring) and ShuffleNetV2 residual units. Wu et al. [19] proposed an improved multiscale small target detection method and validated its efficacy and satisfactory results on the PCB dataset. Liao et al. [20] enhanced the YOLOv4 network, proposed the YOLOv4-MN3 model, and achieved accurate surface defect detection on PCBs using the MobileNetV3 backbone network and optimization strategies; the experimental results reveal that the model attained up to 98.64% mAP and 56.98 FPS (Frames Per Second) on six different surface defects. Cheng et al. [21] proposed a fast Tiny RetinaNet network to address the class imbalance problem by mitigating the influence of easily classified samples on classification results. Yu et al. [22] improved small defect detection by proposing a DFP (diagonal feature pyramid) to enhance performance. Other deep-learning-based PCB defect detection algorithms include [23,24], but most of these methods suffer from large model sizes and slow detection speeds. For instance, the parameters of the aforementioned models usually exceed 60 MB, and the detection speed typically falls below 90 FPS, making it challenging to meet the high-performance demands of PCB industrial production.
To effectively augment the efficiency of deep-learning-based PCB defect detection algorithms and achieve lightweight detection models, this paper posits the following improvement strategies based on previous work. Firstly, we design a multiscale feature fusion structure founded on the characteristics of the BiFPN (Bidirectional Feature Pyramid Network) [25] network structure to tackle the issue of missing and displaced features in the FPN (Feature Pyramid Network) [26] and PAN (Path Aggregation Network) [27] structures, thereby enhancing model accuracy while decreasing model size. Secondly, to address the large calculation load and slow inference speed caused by the Bottleneck structure used in C2f (CSPDarknet53 to 2-Stage FPN), we utilize PConv (Partial Convolution) to optimize the conventional convolution structure based on the FasterNet [28] network structure characteristics, effectively reducing the calculation load and boosting model inference speed. Thirdly, to solve the problem that the CIoU (Complete Intersection over Union) loss function cannot optimize prediction boxes and ground truth boxes when they share the same aspect ratio, and that its computation process is complex, we optimize the loss function based on MPDIoU (Minimum Point Distance Intersection over Union) [29] to improve accuracy and simplify the computation process.
Drawing on the above research and thinking in addressing the issue of defect detection in printed circuit boards, this study is grounded in the relevant literature [25,26,28,30], with a focus on exploring the optimization of detection accuracy and inference speed. A novel improved model, LW-YOLO, is proposed for this problem, which integrates three optimization strategies, including network structure enhancement and algorithm optimization, leading to improved detection accuracy and inference speed. The primary contributions of this paper are as follows:
We propose a straightforward and efficient bidirectional feature pyramid network to effectively fuse features of different scales, thereby improving object detection accuracy.
We optimize the Bottleneck convolution structure based on the FasterNet module network structure design and introduce it into the new model, reducing redundant calculations and memory access while enhancing detection speed.
We introduce a new loss measurement method based on MPDIoU, which overcomes existing loss function limitations and improves object detection task convergence speed and regression result accuracy.
The rest of this paper is organized as follows: Section 2 introduces the proposed method in detail, and Section 3 introduces image preprocessing and related information about the dataset used in this paper. In Section 4, we evaluate the performance of this model through ablation experiments and comparative experiments. Finally, Section 5 summarizes the entire paper.
2. Methodology
2.1. YOLOv8 Model
The YOLOv8 algorithm is an efficient one-stage object detection method that consists of several key modules: input preprocessing, backbone network, neck module, and output processing. In the input preprocessing stage, the algorithm processes the input image through techniques such as data augmentation, adaptive anchor calculation, and adaptive grayscale filling. The backbone network and neck module are the core of the YOLOv8 network, extracting features of different scales from the input image through a series of convolution and feature extraction operations. The C2f module used in these components, an improvement on the C3 module, leverages the advantages of the ELAN structure [31] and introduces bottleneck modules to enhance gradient branches for better capture of gradient flow information. Overall, the YOLOv8 algorithm retains its lightweight characteristics while enhancing the richness of gradient information.
Figure 1 illustrates the basic structure of this algorithm.
The backbone network and neck section integrate the design principles of the YOLOv7 ELAN, replacing the C3 structure in YOLOv5 with the C2f structure to enhance feature representation capability. Additionally, they employ a decoupled head structure to separate classification and detection tasks. The introduction of the TaskAlignedAssigner positive sample assignment strategy and Distribution Focal Loss for loss calculation comprehensively improves object detection performance.
2.2. LW-YOLO Model
This study draws upon the YOLOv8 model, proposing three enhancements to address its limitations, resulting in a novel network model termed LW-YOLO. The network structure of LW-YOLO is depicted in Figure 2, while Figure 3 demonstrates its detailed structure, which comprises three components: Backbone, Neck, and Head. These are elaborated upon in the following sections.
The Backbone extracts the feature representation of the input image and conveys it to subsequent layers for detection. This component incorporates a convolutional neural network, specifically CSPDarknet53 [32], renowned for its high precision, lightweight design, robust feature extraction capability, and transferability. The C2f module employs the compact FasterNet module structure to minimize the model’s size without significantly compromising accuracy.
The Neck is utilized to extract multiscale features based on the backbone network. In LW-YOLO, we integrated the BiFPN structure, which effectively disseminates both high-level semantic information and low-level detail information. This integration aids in accurately locating and recognizing objects, delivering more precise detection results across various scales. Furthermore, BiFPN incorporates a lightweight feature fusion module, reducing network parameters and computational complexity while enhancing efficiency and speed.
The Head, the central component of the LW-YOLO model, generates the output results of object detection. It comprises a series of convolutional layers and fully connected layers, amalgamating classifiers and regressors. The classifier identifies the presence of objects in the image and categorizes them accordingly. The regressor predicts the bounding box position and size of the objects. LW-YOLO employs the MPDIoU loss function, facilitating accurate prediction of the position and shape of bounding boxes, thereby boosting the model’s performance and precision.
To facilitate readability, we provide annotations for the abbreviations used in this article, as shown in Table 1.
2.3. Feature Extraction Network
In deep learning network models, an increase in the number of layers can lead to information distortion or compression at each layer, resulting in feature loss. The integration of feature information from multiple scales can augment the model’s detection capability and enhance detection result accuracy. In the YOLOv8 model, the Neck network employs the FPN and PAN network structures. While FPN propagates semantically rich features from deep layers to shallow layers via upsampling, fusing feature information from different scales, PAN propagates position information contained in shallow feature layers to deep feature layers through downsampling, augmenting position and semantic correlations. Despite enhancing the network’s feature extraction capability, this combination introduces issues: PAN’s input primarily depends on the multiscale features supplied by FPN, potentially leading to the loss or partial replacement of some low-level original features. Consequently, unique details and local information in the backbone network may not be fully transmitted to the PAN network. To rectify these limitations of the original feature extraction network, we introduce an adjusted network structure termed BiFPN. This novel structure combines top-down and bottom-up feature fusion to enhance object detection task accuracy and performance by addressing shallow feature loss.
Figure 4 contrasts the structures of FPN, PAN, and BiFPN, clearly illustrating BiFPN’s advantages.
BiFPN is an improved structure based on the PAN network. With respect to bidirectional cross-scale connections, we implement several key steps to enhance the feature fusion effect. Firstly, we eliminate nodes with only one input edge due to their relatively minor contribution to the results, thereby simplifying the network structure and reducing computational costs. Secondly, we establish a connection between the original input node and the output node, enabling more feature information to be fused at a lower cost, increasing feature diversity, and enhancing overall expressive power. Lastly, we merge top-down and bottom-up paths into a single module, allowing the module to be stacked repeatedly for higher-level feature fusion. These enhancements render BiFPN more potent and efficient in terms of feature fusion capability. To fuse weighted features, BiFPN introduces a rapid normalization fusion method, as demonstrated in Equation (1). In this method, each feature learns a weight to represent its importance and differentiate between various features for discriminative fusion.

$$O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i \quad (1)$$

In Equations (1)–(3), $w_i$ denotes the weight learned through the Rectified Linear Unit (ReLU) activation function, which modulates the contribution levels of the various input features, and $I_i$ denotes the $i$-th input feature. The symbol $\epsilon$ represents a small value (for instance, 0.0001), employed to augment the stability of the denominator in the equation. The learned weights $w_i$ are confined to non-negative ranges by the ReLU activation function, ensuring weight stability and rationality. This fusion methodology intensifies feature fusion effects and bolsters the model’s adaptability to diverse features.
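As an illustration, the following is a minimal PyTorch sketch of this fast normalized fusion, assuming inputs of identical shape; the module and variable names are ours, not taken from the original implementation:

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Weighted fusion per Equation (1): O = sum_i (w_i / (eps + sum_j w_j)) * I_i."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        # One learnable scalar weight per input feature.
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        # ReLU keeps the learned weights non-negative.
        w = torch.relu(self.weights)
        # Normalize the weights; eps stabilizes the denominator.
        w = w / (self.eps + w.sum())
        # Inputs must already share resolution and channel count
        # (Resize/Conv handle any mismatch before fusion).
        return sum(wi * x for wi, x in zip(w, inputs))

# Example: fuse two feature maps of shape (N, C, H, W).
fuse = FastNormalizedFusion(num_inputs=2)
p_in = torch.randn(1, 64, 40, 40)
p_td = torch.randn(1, 64, 40, 40)
out = fuse([p_in, p_td])
```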
For performing cross-scale connections and weighted feature fusion, BiFPN takes four feature levels of different scales extracted by the backbone as input. Using an intermediate level $P_i$ as an illustration, the fusion process of two features is depicted in Figure 4c:

$$P_i^{td} = \mathrm{Conv}\!\left(\frac{w_1 \cdot P_i^{in} + w_2 \cdot \mathrm{Resize}(P_{i+1}^{in})}{w_1 + w_2 + \epsilon}\right) \quad (2)$$

$$P_i^{out} = \mathrm{Conv}\!\left(\frac{w_1' \cdot P_i^{in} + w_2' \cdot P_i^{td} + w_3' \cdot \mathrm{Resize}(P_{i-1}^{out})}{w_1' + w_2' + w_3' + \epsilon}\right) \quad (3)$$

Here, $P_i^{td}$ symbolizes the intermediate feature on the top-down path (the blue circle in the middle); $P_i^{out}$ signifies the output feature on the bottom-up path (the blue circle on the right); and $P_i^{in}$ and $P_{i+1}^{in}$ represent the input features on the left. Resize is typically used for resolution matching through upsampling or downsampling interpolation, while Conv is generally employed for feature processing via convolution operations. By leveraging mutual connections and fusion between different levels, BiFPN successfully achieves bidirectional cross-scale transmission and efficiently integrates information via rapid normalization fusion.
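Continuing the sketch above (and reusing its imports and FastNormalizedFusion module), Equation (2) for one top-down node might look as follows; the Resize choice and channel counts are illustrative assumptions:

```python
import torch.nn.functional as F

def top_down_node(p_i_in, p_ip1_in, fuse, conv):
    """Equation (2): fuse P_i^in with the resized P_{i+1}^in, then convolve."""
    # Resize: bring the coarser level up to level i's resolution by interpolation.
    p_ip1_resized = F.interpolate(p_ip1_in, size=p_i_in.shape[-2:], mode="nearest")
    return conv(fuse([p_i_in, p_ip1_resized]))

# Usage with the FastNormalizedFusion module defined earlier:
conv = nn.Conv2d(64, 64, 3, padding=1)
p_td = top_down_node(torch.randn(1, 64, 40, 40),   # P_i^in
                     torch.randn(1, 64, 20, 20),   # P_{i+1}^in (coarser level)
                     FastNormalizedFusion(2), conv)
```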
2.4. Bottleneck Lightweighting
As the number of layers in a neural network increases, the number of feature map channels also expands. Because multiple channels of a feature map may contain similar or identical information, redundancy between feature maps can occur. This redundancy necessitates additional floating-point operations (FLOPs), resulting in increased computational delay.
For an input feature of size $h \times w \times c$, when performing convolutional operations using a $k \times k$ convolutional kernel, the required memory access can be calculated using Equation (4), where $c$ is the number of input data channels:

$$h \times w \times 2c + k^2 \times c^2 \approx h \times w \times 2c \quad (4)$$

DWConv (Depthwise Convolution) [33] computes output channel features using sliding operations on each input channel, and the required memory access can be calculated as Equation (5), where $c'$ is the number of input data channels:

$$h \times w \times 2c' + k^2 \times c' \approx h \times w \times 2c' \quad (5)$$

DWConv is often used with PWConv (Pointwise Convolution) [34] to improve model accuracy, which typically means an expanded channel count $c' > c$; the memory access of DWConv combined with PWConv is then larger than that of Conv (standard convolution), leading to increased latency. To improve detection speed and reduce memory access, a new convolution module needs to be introduced to replace the conventional convolution and solve the inefficiency problem of DWConv.
Unlike Conv and DWConv, PConv in FasterNet applies regular convolution to a subset of input channels to extract spatial features, rather than convolving the entirety of the input channels. If the feature maps are stored in a continuous or regular pattern in memory, the first or last contiguous channels can adequately represent the entire feature map. Experimental findings [28] demonstrate that PConv, when compared with standard Conv, markedly diminishes computational complexity, necessitating only 1/16 of the computational operations; intuitively, with a partial ratio of $c_p/c = 1/4$, the FLOPs of PConv, $h \times w \times k^2 \times c_p^2$, are $(1/4)^2 = 1/16$ of those of standard Conv. Similarly, PConv also curtails memory access, approximately $h \times w \times 2c_p$, to merely 1/4 of that required by traditional convolution.
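A minimal PyTorch sketch of partial convolution follows; this is our own simplification of the FasterNet design [28], and the split ratio and layer names are illustrative:

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: apply a k x k conv to only the first
    c_p = c / ratio channels; pass the remaining channels through untouched."""

    def __init__(self, channels: int, ratio: int = 4, kernel_size: int = 3):
        super().__init__()
        self.c_p = channels // ratio  # channels that receive spatial convolution
        self.conv = nn.Conv2d(self.c_p, self.c_p, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Split along the channel dimension: the first c_p channels are convolved,
        # the rest are forwarded as-is (still visible to later PWConv layers).
        x1, x2 = torch.split(x, [self.c_p, x.size(1) - self.c_p], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

# Example: only 16 of 64 channels undergo the 3 x 3 convolution.
x = torch.randn(1, 64, 40, 40)
y = PConv(64)(x)
```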
Drawing parallels to DWConv, FasterNet introduces PWConv based on PConv to capture correlations between input channels. As depicted in Figure 5, by integrating PWConv with PConv, two distinct structures are formed: a T-shaped Conv structure and two independent convolution structures.
The T-shaped convolution structure incorporates a Pointwise Convolution (PWConv) layer on the foundation of Partial Convolution (PConv), performing convolutional operations on the spatial dimension to further extract features. Compared with conventional Conv, the T-shaped Conv places greater emphasis on features at the central position. Although the T-shaped Conv can be implemented directly as a single operation, it demands more FLOPs and is computationally more complex than the decoupled PConv and PWConv. For identical input and output feature maps, the FLOPs of the T-shaped Conv and of the two independent convolutions are demonstrated in Equation (6) and Equation (7), respectively:

$$h \times w \times (k^2 \times c_p + c - c_p) \times c \quad (6)$$

$$h \times w \times (k^2 \times c_p^2 + c^2) \quad (7)$$

Here, $c_p$ ($c_p < c$) is the number of channels on which the $k \times k$ convolution operates, taken as the first or last channels stored contiguously in memory.
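As a concrete check (illustrative numbers of our own choosing): for $h = w = 40$, $c = 64$, $c_p = c/4 = 16$, and $k = 3$, Equation (6) gives $40 \times 40 \times (9 \times 16 + 48) \times 64 \approx 19.7$ MFLOPs, whereas Equation (7) gives $40 \times 40 \times (9 \times 16^2 + 64^2) \approx 10.2$ MFLOPs, so the decoupled PConv + PWConv pair costs roughly half as much as the T-shaped Conv.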
The FasterNet module comprises three convolutional layers, as depicted in Figure 6, one of which is a PConv layer and the remaining two are PWConv layers. The Bottleneck module of C2f is supplanted by the FasterNet module, as illustrated in Figure 7. By implementing PConv, we can reduce model computation while preserving accuracy. Relative to the original Bottleneck structure, each FasterNet module incurs a computational cost of approximately 1/16. This is attributable to the fact that, in the FasterNet module, we only convolve 1/4 of the original channel number, and the computational cost of the subsequent 1 × 1 convolutions is relatively marginal. Consequently, the computational cost of the entire FasterNet module can be roughly 1/16 of that of the Bottleneck structure. By optimizing the Bottleneck structure in YOLOv8, our enhancement scheme yielded significant results in reducing computational load and improving inference speed, a fact that is thoroughly corroborated by the experiments in Section 4.
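A sketch of how such a block might look in PyTorch, reusing the PConv sketch above; the channel expansion factor, normalization, and activation choices are our assumptions rather than the exact configuration used in this study:

```python
class FasterNetBlock(nn.Module):
    """PConv followed by two PWConv (1 x 1) layers, with a residual connection,
    standing in for the C2f Bottleneck described in the text."""

    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.pconv = PConv(channels)                            # spatial mixing on 1/4 of channels
        self.pw1 = nn.Conv2d(channels, hidden, 1, bias=False)   # PWConv: expand channels
        self.bn = nn.BatchNorm2d(hidden)
        self.act = nn.ReLU(inplace=True)
        self.pw2 = nn.Conv2d(hidden, channels, 1, bias=False)   # PWConv: project back

    def forward(self, x):
        y = self.pconv(x)
        y = self.pw2(self.act(self.bn(self.pw1(y))))
        return x + y  # residual connection preserves gradient flow
```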
2.5. Loss Function
Given that defects on printed circuit boards are typically minuscule, a judiciously designed loss function can significantly enhance the model’s target detection performance. In target detection, the bounding box regression loss function is pivotal and directly influences the model’s performance. While the Complete Intersection over Union (CIoU) [35] loss function factors in a penalty term for aspect ratio during the computation of the bounding box regression loss, it exhibits a limitation: when the actual bounding box and predicted bounding box share the same aspect ratio but differ in terms of width and height values, the aspect-ratio term becomes zero, and the CIoU loss function fails to accurately represent the true disparities between these two bounding boxes. This leads to a reduction in the convergence speed and accuracy of bounding box regression. The formula for CIoU can be articulated as Equation (8):

$$\mathcal{L}_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{w_c^2 + h_c^2} + \alpha v, \quad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \quad \alpha = \frac{v}{(1 - IoU) + v} \quad (8)$$

As Equation (8) shows, the Intersection over Union (IoU) denotes the ratio of the intersecting area of the predicted bounding box and the actual bounding box to their combined area. The parameters implicated in the formula are illustrated in Figure 8. The term $\rho^2(b, b^{gt})$ signifies the squared Euclidean distance between the centers of the actual box and the predicted box; $h$ and $w$, respectively, correspond to the height and width of the predicted box; $h^{gt}$ and $w^{gt}$, respectively, represent the height and width of the actual box; and $h_c$ and $w_c$, respectively, denote the height and width of the smallest enclosing box formed by the predicted box and actual box.
MPDIoU is a bounding box similarity measurement technique predicated on minimum point distance. It holistically considers various related factors in existing loss functions. Compared with other loss functions, MPDIoU simplifies the similarity comparison between bounding boxes, is suitable for bounding box regression in both overlapping and nonoverlapping scenarios, and attains superior efficiency and accuracy in bounding box regression tasks. MPDIoU employs a novel IoU-based measurement approach, which directly minimizes the distances between the upper-left and lower-right corner points of the predicted bounding box and the actual bounding box to assess their similarity. This method circumvents traditional area measurement and concentrates on specific positional information of bounding boxes, thereby measuring their similarity with greater precision. The computation method of MPDIoU is as follows:

$$d_1^2 = (x_1^B - x_1^A)^2 + (y_1^B - y_1^A)^2 \quad (9)$$

$$d_2^2 = (x_2^B - x_2^A)^2 + (y_2^B - y_2^A)^2 \quad (10)$$

$$MPDIoU = \frac{A \cap B}{A \cup B} - \frac{d_1^2}{w^2 + h^2} - \frac{d_2^2}{w^2 + h^2} \quad (11)$$

$$\mathcal{L}_{MPDIoU} = 1 - MPDIoU \quad (12)$$

As Equations (9)–(12) show, $A$ and $B$ are two arbitrary convex polygons, and the parameters involved in the formula are shown in Figure 9. $w$ and $h$ are, respectively, the width and height of the input image; $(x_1^A, y_1^A)$ and $(x_2^A, y_2^A)$, respectively, represent the coordinates of the upper-left and lower-right corner points of $A$, and $(x_1^B, y_1^B)$ and $(x_2^B, y_2^B)$ those of $B$.
The MPDIoU loss function is used in target detection tasks to help the model accurately predict the position and shape of bounding boxes by measuring their correlation and overlap. Compared with the original CIoU loss function, the MPDIoU loss function provides a more accurate way to measure bounding boxes, which can effectively improve the performance and accuracy of detection models during training and optimization.
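To make Equations (9)–(12) concrete, the following is a minimal PyTorch sketch of the MPDIoU loss; the function name, the (x1, y1, x2, y2) box layout, and the epsilon term are our assumptions:

```python
import torch

def mpdiou_loss(pred, target, img_w: int, img_h: int, eps: float = 1e-7):
    """MPDIoU loss per Equations (9)-(12); pred/target: (N, 4) boxes as (x1, y1, x2, y2)."""
    # Intersection area.
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # Union area and IoU.
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distances between matching corner points, Equations (9)-(10).
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2

    # Normalize by the squared image diagonal, Equation (11), then Equation (12).
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1 / norm - d2 / norm
    return (1.0 - mpdiou).mean()
```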
3. Image Preprocessing and Dataset
The dataset employed for experimentation in this study is derived from the PCB defect dataset released by the Open Laboratory of Peking University. The dataset comprises 1386 PCB images with an average size of 2777 × 2138 pixels. It encompasses six distinct types of defects: missing hole, open circuit, short, spur, spurious copper, and mouse bite.
Figure 10 presents sample images of these defects. Given the relatively limited number of training samples in the original dataset, the model is susceptible to overfitting on these sparse samples, leading to issues such as poor generalization capability, imprecise parameter estimation, and an overly intricate network. These challenges can be effectively mitigated by suitably enhancing the original images and augmenting the number of images [36]. Concurrently, various transformations can be applied to the images to bolster the model’s generalization capability and robustness and curb overfitting, thereby enhancing the model’s performance and efficacy (Figure 11). In this study, following random flipping, rotation, translation, scaling, and cropping operations (a sketch of such a pipeline is given below), the quantity of images in the dataset was expanded to 2272. To train and evaluate the model, the dataset was partitioned into a training set, validation set, and test set in a ratio of 7:2:1. Table 2 exhibits the distribution of each type of defect image data in the dataset. After data augmentation, the mean Average Precision (mAP) rose from 91.81% to 94.20%, as shown in Table 3.
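For illustration, one possible torchvision version of the augmentation pipeline described above; this is a sketch only, and the specific transforms and parameter ranges are our assumptions rather than the exact settings used in this study:

```python
from torchvision import transforms

# Random flip, rotation, translation/scale (affine), and crop, applied to PCB
# images to expand the training set and curb overfitting.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.8, 1.2)),
    transforms.RandomResizedCrop(size=640, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
```

Note that, for detection tasks, the bounding box annotations must be transformed consistently with the image; detection frameworks typically handle this pairing internally.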
5. Conclusions
In contrast to prevalent object detection network models, this paper introduces a lightweight model, LW-YOLO, specifically designed for detecting PCB defects. Initially, this paper addresses the issue of feature information loss in the backbone network by fully leveraging feature information at various levels, thereby enhancing the model’s detection capability for objects of different scales. Concurrently, lightweight feature fusion modules are employed in the BiFPN to diminish the network parameter count and computational complexity, thus enhancing network computational efficiency and speed. Subsequently, the lightweight FasterNet module structure is utilized to rectify the redundancy of channel information in feature maps and augment the model’s inference speed. Finally, a novel MPDIoU loss function is employed to guide the model in accurately predicting the position and shape of bounding boxes, effectively boosting model performance and accuracy.
The experimental results demonstrate that the LW-YOLO object detection model developed in this study exhibits significant advantages on the PCB dataset. This model not only performs well in terms of detection accuracy and speed but also fulfills the requirements for lightweight deployment. This implies that our model can deliver superior performance in practical industrial applications. It can effectively detect defects or anomalies on PCBs, meet the high-precision object detection requirements of the industrial field, and provide reliable solutions for other domains such as automated production processes, quality control, and fault detection.