1. Introduction
The rapid development of electronic technology has made printed circuit boards (PCBs) a basic component of electronic devices, and their performance, stability, and reliability directly affect the effectiveness of the overall equipment [1]. Therefore, PCB defect detection technology plays a vital role in electronic manufacturing in ensuring the stability of electronic products [2]. Traditional PCB testing methods include manual visual inspection, online testing, and flying probe testing. However, manual visual inspection is highly subjective, relies on the experience and visual acuity of operators, and exhibits a high rate of misjudgments [3]. The online test method has limited ability to detect physical defects (such as poor solder joints and cracks); in addition, online testing equipment requires professional operation and maintenance, and its cost of use is high [4]. The flying probe test method is relatively slow and inefficient when applied to large-scale, high-density, and high-precision circuit board inspection. At present, these traditional detection methods struggle to meet the needs of modern industrial production [5]. Therefore, developing an efficient and accurate PCB defect detection method has become an urgent problem.
At present, the methods for PCB defect detection mainly fall into two categories: conventional feature extraction techniques and deep learning approaches. Traditional feature extraction methods capture target information through structured steps, such as edge detection and texture analysis [6]. Acciani et al. [7] fed geometric features extracted from the input solder joint images into a multilayer perceptron to obtain the classification result. Although this method improves the overall classification rate for five types of solder joint defects, the reference image is obtained through multiple image-matching steps, the calculation process is complex, and the method performs only defect classification without defect localization. Annaby et al. [8] proposed an optimized, low-complexity algorithm of the one-dimensional category: during matching, the two-dimensional sub-image is converted into a one-dimensional descriptor, and the descriptors are then transformed. The algorithm improves computational efficiency and reduces the impact of noise; however, its detection accuracy remains limited. The swift advancement of deep learning technology provides new solutions in the field of PCB defect detection [
9]. Li et al. [10] leveraged the XGBoost model and neural transformation networks to capture dynamic facial and bodily features, objectively linking classroom participation to academic outcomes. While these approaches demonstrate the efficacy of multimodal fusion in assessing engagement, their application has focused predominantly on general educational settings, leaving room for further exploration in specialized domains such as PCB defect detection, where multimodal data could similarly enhance detection accuracy and robustness. Zhang et al. [11] introduced a dual-domain feature extraction framework that combines spatial- and frequency-domain features, along with a resource allocation mechanism to enhance feature integration. Despite these advancements, this approach relies heavily on handcrafted mechanisms, such as channel and position attention, which may not fully adapt to diverse scenarios or effectively exploit latent feature relationships. Li et al. [12] introduced an extended feature pyramid network detection model, which effectively combines high-level semantic details with low-level geometric features and introduces a focal loss function. The experimental results demonstrate that the model exhibits strong transferability. However, in actual industrial production environments, problems such as noise interference and uneven exposure keep its accuracy low, and the algorithm's generalization ability is poor, making it unsuitable for industrial production applications. Wang et al. [
13] developed a deep large-kernel backbone to broaden the effective receptive field and capture global information more efficiently, and employed 1 × 1 convolutions to balance the model's depth, enhancing the efficiency of feature extraction through reparameterization techniques. They also introduced a bidirectional weighted feature fusion network, along with a noise filter and feature enhancement extractor, to eliminate noise generated during information fusion and to recalibrate features across different channels, thereby improving the quality of the deep features. Additionally, the aspect ratio of the bounding box was simplified to address the issue of specificity values. After multiple experiments, they achieved a mean average precision of 97.3%. Wang et al. [14] proposed a SparseBEV-based framework for enhancing 3D object detection accuracy by incorporating a hybrid discrete–continuous loss function. This method effectively mitigates planar–stereo misidentifications and improves 3D direction regression. However, its reliance on limited contextual information and its difficulty in handling diverse geometries reduce its robustness in real-world applications. Song et al. [15] proposed a coordinate attention dynamic mechanism based on YOLOv7 that performs convolution operations with deformable convolutional network v2 (DCNv2) using coordinate attention, together with a dynamic head diverse (DyHead-d) module that prioritizes spatial awareness over scale awareness, building on DyHead. The resulting WDC-YOLO achieved a mean average precision of 98.4% on public datasets. These two algorithms enhance the accuracy of PCB defect detection, but they exhibit poor generalization ability and complicated training processes. In the referenced studies, the authors used rotation, random cropping, and other data augmentation methods to expand the original dataset to 10,668 PCB defect images, which improved the models' performance on a common benchmark. However, this large sample demand not only increases the cost of data collection and processing but also results in significant computational overheads.
To address the above problems of limited PCB defect detection accuracy, weak generalization ability, and complex model training, this article proposes a small-sample, high-precision PCB defect detection algorithm: SSHP-YOLO. The main contributions are as follows.
(1) Design a small-target feature information extraction module, the ELAN-C module: the convolutional block attention module (CBAM) is combined with the efficient layer aggregation network (ELAN) to form the ELAN-C module, increasing attention to tiny PCB defect information.
(2) Propose the ASPPCSPC structure: by combining dilated convolution, spatial pyramid pooling, and concatenation operations, it extracts multi-scale features of PCB defects to improve detection accuracy.
(3) Use the SIoU loss function to enhance the soft-matching relationship between the detection box and the ground-truth box in object detection, improving the prediction accuracy of the bounding box.
(4) On the PKU-Market-PCB public dataset, verify the effectiveness of each component of this article and the performance advantages of the proposed algorithm through ablation experiments, comparative experiments, and visual comparison analysis.
(5) On the NEU-DET public dataset, verify the generalization capability of the proposed algorithm through comparative experiments.
2. Related Work
The YOLO algorithm was first proposed by Joseph Redmon et al. [
16] in 2015. The network structure of YOLOv1 mainly includes 24 convolutional layers and two fully connected layers. In the detection process, the image is divided into 7 × 7 grids, each grid predicts two bounding boxes and class probabilities, and the output is a 7 × 7 × 30 tensor. YOLOv1 is superior to Faster R-CNN in terms of detection speed and can process 45 frames per second, but its detection performance is poor for small objects and for objects that appear close together. The following year, Joseph Redmon et al. [17] introduced the anchor mechanism, added multi-scale training and joint training strategies, and used the Darknet-19 network structure to propose the YOLOv2 algorithm. YOLOv2 enhances detection precision while maintaining a fast detection speed. Then, Redmon et al. [
18] made improvements based on YOLOv2 and proposed YOLOv3. The feature extraction component utilizes the Darknet-53 network architecture, and the feature pyramid network (FPN) [19] is used for multi-scale detection, which maintains object detection precision while remaining practical. In 2020, YOLOv4 was proposed by Alexey Bochkovskiy et al. [
20]. Several improvements were made over YOLOv3, including the use of CSPNet as the backbone network, the introduction of the Mish activation function, and the adoption of the CIoU loss function to enhance detection performance on small objects. One month after YOLOv4's release, YOLOv5 was officially introduced. YOLOv5 optimizes both the network training and inference processes, emphasizes industrial applicability, and has become the most widely studied algorithm for surface defect detection on PCBs.
Based on YOLOv3, Ge et al. [
21] developed YOLOX in 2021 by implementing methods such as decoupled head, data augmentation, anchor free, and SimOTA sample matching to create an anchor-free, end-to-end object detection framework. YOLOv6 was proposed by Li et al. [
22]. The algorithm removes the anchor boxes used from YOLOv2 to YOLOv5, optimizes the backbone and neck networks, and decouples the detection head to separate bounding box regression from classification. Wang et al. [
23] proposed YOLOv7 in 2022. They reconstructed the head structure, introduced a new attention mechanism and feature fusion method, and achieved strong performance on multiple standard datasets, making it one of the most studied algorithms in the field of PCB surface defect detection.
Table 1 summarizes the network structure of the YOLO family of algorithms.
In view of the problems in the field of PCB defect detection, many scholars have proposed their own methods based on YOLOv5 and YOLOv7. Du et al. [
24] proposed YOLO-MBBi, using mobile inverted residual bottleneck block (MBConv) modules, CBAM attention and depth-wise convolutions to substitute layers in the YOLOv5s network. Zhou et al. [
25] combined the lightweight MobileNet-v3 network with the CSPDarknet53 network and replaced the coupling detection head with a decoupling detection head. Yuan et al. [
26] proposed YOLO-HMC, which adopts HorNet as the improved backbone, designs a multiple convolutional block attention module, and uses content-aware reassembly of features to replace the up-sampling layer of the original model. Xiao et al. [
27] introduced the coordinate attention mechanism to improve the backbone and neck network of YOLOv7-tiny, used DS-Conv to replace part of the common convolution in YOLOv7-tiny and used Inner-CIoU as the bounding box regression loss function. Luo et al. [
28] proposed EC-YOLO, in which ACmix (a mixed model that enjoys the benefits of both self-attention and convolution) was used as a substitute for the 3 × 3 convolutional modules in the extended ELAN (E-ELAN) architecture, the ResNet-ACmix module was engineered, and the dynamic head (DyHead) was utilized. Multi-modality object detection, especially involving infrared and visible imagery, has drawn significant attention due to its potential to integrate complementary features for improved robustness. Zhang et al. [
29] proposed an antagonistic-learning-based framework, utilizing modules like AFED for extracting differential features and ADFF for attention-based feature fusion. This method demonstrates state-of-the-art performance on infrared and visible detection benchmarks. However, its focus on large-scale datasets and general-purpose tasks makes it less effective for small-sample scenarios or domain-specific applications, such as PCB defect detection.
In the above works, some scholars have improved the accuracy of the model, some have reduced the number of model parameters, and some have improved the generalization ability of the model to a certain extent. However, these methods rely on large numbers of samples, often reaching thousands or even tens of thousands, resulting in a significant waste of computational resources. Therefore, SSHP-YOLO is proposed, which has strong generalization ability while maintaining high accuracy; the model can be trained on a small sample using an entry-level RTX 4060 graphics card.
3. Proposed Method
3.1. Overall Network Structure
The YOLOv7 algorithm is a one-stage target detection method. Based on YOLOv7, we propose SSHP-YOLO, and the overall network structure is illustrated in
Figure 1.
3.1.1. Input Module
The input module preprocesses the defective PCB images. Each input image is resized to 640 × 640 so that it matches the model's input requirements and can be processed correctly by the backbone network.
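A minimal sketch of this preprocessing step is shown below, assuming OpenCV-style image loading and plain resizing; the paper only specifies that inputs are unified to 640 × 640, so the function name and any letterboxing details are illustrative assumptions.

```python
import cv2
import numpy as np

def preprocess_pcb_image(path: str, size: int = 640) -> np.ndarray:
    """Load a PCB image and unify it to the fixed 640 x 640 input resolution.

    Illustrative sketch only: plain resizing (rather than letterboxing) is assumed.
    """
    img = cv2.imread(path)                           # BGR image, H x W x 3
    img = cv2.resize(img, (size, size))              # unify spatial size to 640 x 640
    img = img[:, :, ::-1].astype(np.float32) / 255   # BGR -> RGB, scale to [0, 1]
    return np.ascontiguousarray(img.transpose(2, 0, 1))  # C x H x W for the network
```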
3.1.2. Backbone Module
The backbone module extracts features from the input defective PCB image and ultimately yields three feature layers. The backbone network is mainly composed of the CBS module (convolution + batch normalization + SiLU activation function), the MP-1 module, and the ELAN-C module (a multi-branch stack of CBS modules combined with CBAM).
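For clarity, a minimal PyTorch sketch of the CBS building block (convolution + batch normalization + SiLU) is given below; the default kernel size and stride are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Convolution + BatchNorm + SiLU, the basic block of the backbone."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, s: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Example: a 640 x 640 RGB input mapped to 32 feature channels.
print(CBS(3, 32)(torch.rand(2, 3, 640, 640)).shape)  # torch.Size([2, 32, 640, 640])
```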
3.1.3. Neck Module
The neck module enhances feature extraction, primarily incorporating the ASPPCSPC, ELAN-W, UpSample, and Cat structures. The ASPPCSPC module employs SPP with dilated convolution and a CSP structure to expand the receptive field, facilitating feature map fusion, enriching the feature information, and aiding the detection of PCB image defects. Compared with the ELAN module, the ELAN-W module incorporates two additional concatenation operations. The UpSample module is used to achieve efficient fusion of features at different levels, and the Cat structure further optimizes the performance of the convolutional layers.
3.1.4. Head Module
The head module performs detection: based on its output structure, it determines the number of output channels, generates prediction boxes, and estimates their locations, confidences, and categories. These predictions serve as the model's final output.
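As a worked illustration of how the number of output channels is determined in an anchor-based YOLO-style head: each detection scale outputs num_anchors × (4 box offsets + 1 objectness score + num_classes) channels. The values below (three anchors per scale, six PCB defect classes) are assumptions for illustration, not figures stated in this paper.

```python
# Illustrative only: channel count of one detection scale in an anchor-based head.
num_anchors, num_classes = 3, 6                  # assumed values
out_channels = num_anchors * (5 + num_classes)   # 3 * (4 + 1 + 6) = 33
print(out_channels)                              # 33 channels per detection scale
```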
3.2. ELAN-C Module
In YOLOv7, ELAN can learn complex features by effectively managing both the shortest and longest gradient pathways, demonstrating significant robustness [
30]. It effectively aggregates features from different levels, which significantly improves the model's ability to capture multi-scale target information, and it shows a particularly strong performance advantage when processing tasks with significant scale differences. Nonetheless, when it comes to identifying defects in PCBs, detection remains difficult for ELAN because of the small defect sizes and the low contrast of PCB images.
We propose a small target feature information extraction module: ELAN-C module, as shown in
Figure 2. The ELAN-C module adds the CBAM after the ELAN, which decouples the convolution process and thereby constructs a new, highly efficient layer attention network. The ELAN-C module fully leverages the surrounding contextual information. By dynamically adjusting the weights of the PCB feature map, it enhances the utilization of global information and significantly improves the sensitivity to local, subtle defect details. This capability is particularly advantageous for detecting high-density, highly detailed PCB defects, where precision in identifying subtle variations is crucial. Furthermore, the module's ability to effectively extract and prioritize essential features reduces the reliance on large training datasets, which makes SSHP-YOLO highly data-efficient.
The CBAM [
31] is an attention mechanism that employs dual pathways, consisting of a channel attention module (CAM) and a spatial attention module (SAM), as depicted in
Figure 3.
The CAM focuses on the channel characteristics of the PCB feature map. This module applies maximum pooling and average pooling to the input feature map in parallel, and then it sends the two one-dimensional tensors after pooling to the fully connected layer to obtain the channel attention weight $M_C$. Finally, $M_C$ is element-wise multiplied by the input feature map $F$ to generate the feature map $F'$ adjusted by the CAM.
The specific calculation is shown in Equation (1):
$M_{C}(F) = \sigma\big(W_{1}(W_{0}(\mathrm{AvgPool}(F))) + W_{1}(W_{0}(\mathrm{MaxPool}(F)))\big) = \sigma\big(W_{1}(W_{0}(F^{c}_{avg})) + W_{1}(W_{0}(F^{c}_{max}))\big)$ (1)
Among them, $\mathrm{AvgPool}(\cdot)$ is the average pooling operation, $\mathrm{MaxPool}(\cdot)$ is the maximum pooling operation, $W_{1}$ is the weight of the hidden layer to the output layer, $W_{0}$ is the weight of the input layer to the hidden (dimension-reduction) layer, $F^{c}_{avg}$ is the vector after average pooling, $F^{c}_{max}$ is the vector after maximum pooling, and $\sigma$ is the sigmoid function.
The SAM focuses on the spatial characteristics of the PCB feature map. This module concatenates the max-pooled and average-pooled tensors along the channel dimension, and then it performs convolution and sigmoid activation in turn to normalize the weights of the PCB feature maps and generate the spatial attention weight $M_{S}$. Finally, $M_{S}$ is element-wise multiplied by the input feature map $F'$ to produce the feature map $F''$ adjusted by the SAM. The specific calculation is shown in Equation (2):
$M_{S}(F') = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big) = \sigma\big(f^{7\times 7}([F^{s}_{avg};\ F^{s}_{max}])\big)$ (2)
Among them, $M_{S}$ is the two-dimensional spatial attention vector; $f^{7\times 7}$ is a convolutional layer with a 7 × 7 kernel; $F^{s}_{avg}$ is the average-pooled feature, with a size of 1 × H × W; and $F^{s}_{max}$ is the max-pooled feature, with a size of 1 × H × W.
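As a concrete illustration of Equations (1) and (2), the following PyTorch sketch chains the CAM and the SAM; the reduction ratio r = 16 and the class name are assumptions, and in ELAN-C such a block would simply be applied to the ELAN output.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention (Eq. 1) followed by spatial attention (Eq. 2). Sketch only."""
    def __init__(self, channels: int, r: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Shared MLP: W0 (channel reduction) then W1 (channel restoration).
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // r, 1, bias=False),  # W0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1, bias=False),  # W1
        )
        self.spatial_conv = nn.Conv2d(2, 1, spatial_kernel,
                                      padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # ---- Channel attention, Eq. (1) ----
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))   # MLP(AvgPool(F))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))    # MLP(MaxPool(F))
        mc = torch.sigmoid(avg + mx)                               # M_C
        x = x * mc                                                 # F' = M_C (x) F
        # ---- Spatial attention, Eq. (2) ----
        avg_s = torch.mean(x, dim=1, keepdim=True)                 # 1 x H x W
        max_s = torch.amax(x, dim=1, keepdim=True)                 # 1 x H x W
        ms = torch.sigmoid(self.spatial_conv(torch.cat([avg_s, max_s], dim=1)))  # M_S
        return x * ms                                              # F'' = M_S (x) F'

# Example: attention applied to an assumed 256-channel ELAN output.
print(CBAM(256)(torch.randn(2, 256, 40, 40)).shape)  # torch.Size([2, 256, 40, 40])
```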
3.3. ASPPCSPC Module
In YOLOv7, the SPPCSPC structure adopts multiple branches [
32], which combine maximum pooling at different scales with the PCB defect features obtained by conventional convolution. This multi-scale processing enhances the model's sensitivity to defects of different sizes. By introducing the spatial pyramid pooling (SPP) layer, the SPPCSPC structure significantly increases the effective receptive field and enables the algorithm to capture global information more effectively, which is particularly important for identifying defects that span larger areas. However, the SPP structure inevitably loses feature map resolution while increasing the receptive field, which weakens the algorithm's ability to capture fine features and thus affects the accuracy and robustness of PCB defect detection. This resolution loss becomes a key factor limiting detection performance, particularly when handling small defects or those with blurred edges.
The structure of the proposed ASPPCSPC is illustrated in
Figure 4. Dilated convolution [33] is employed so that the resolution of the feature map remains unchanged while the receptive field is increased. The receptive field is further expanded through multiple dilated convolutions, pooling, parallel operations, and concatenation, enabling multi-scale feature extraction that enhances the precision and robustness of PCB defect identification and ensures that the network effectively captures fine details and contextual information, which is critical for detecting small or subtle PCB defects. Dilated convolution adds a dilation rate parameter (d) to conventional convolution, inserting zero-weighted positions between the elements of the convolution kernel. The receptive field of the kernel is expanded by increasing the dilation rate, so that each convolution operation covers a wider input area and captures more contextual information. At the same time, combined with an appropriate padding (p), dilated convolution maintains the resolution of the output feature map, allowing the network to gather information effectively at multiple scales. This high-resolution feature extraction reduces the reliance on large-scale annotated datasets, improving the model's adaptability and efficiency in data-scarce environments.
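The padding rule described above can be checked directly: with stride 1, a 3 × 3 convolution whose padding p equals its dilation rate d leaves the feature map resolution unchanged. The short PyTorch check below uses an assumed 64-channel, 80 × 80 feature map purely for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 80, 80)   # an assumed PCB feature map
for d in (1, 6, 12, 18):
    conv = nn.Conv2d(64, 64, kernel_size=3, dilation=d, padding=d)  # p = d
    print(d, conv(x).shape)       # spatial size stays 80 x 80 for every dilation rate
```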
The ASPPCSPC structure is mainly composed of the ASPP and CSP structures. The ASPP structure contains one 1 × 1 convolution, which is equivalent to an ordinary convolution, and three 3 × 3 convolutions with dilation rates d of {6, 12, 18} and padding p of {6, 12, 18}, producing PCB feature maps with the same number of channels, height, and width as the input. In this structure, four dilated convolutions with different values of d sample the feature map in parallel, which effectively expands the receptive field while maintaining the image resolution. This high-resolution feature enables the algorithm to precisely locate PCB defects. Moreover, the use of varying dilation rates allows the network to perceive information at multiple scales, providing a more comprehensive understanding of the details and contextual information within the image and thereby enhancing PCB defect detection performance. This approach is particularly advantageous for small-sample datasets, as it ensures high-quality feature representation without sacrificing detail, enabling accurate defect localization with fewer labeled samples. In parallel with the ASPP branches, image-level feature representation is performed: global average pooling is applied, followed by a CBS structure, and then bilinear up-sampling is performed to generate PCB feature maps that match the number of channels, height, and width of the input.
This enhances the ability of the module to extract features effectively. The CSP structure divides the feature map into two parts: one part is processed by the ASPP structure and the parallel image-level feature representation, and the other part is processed by a conventional CBS. Finally, a CBS structure outputs a PCB feature map with the same number of channels, height, and width as the input feature map. By leveraging these dual pathways, the ASPPCSPC structure enhances the feature extraction capability, enabling robust and precise detection of PCB defects. Moreover, this efficient architecture significantly reduces the dependency on extensive training datasets, ensuring that SSHP-YOLO performs effectively even in small-sample settings.
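The following condensed PyTorch sketch illustrates the ASPPCSPC idea described above: a CSP-style split, an ASPP path with one 1 × 1 convolution and three 3 × 3 dilated convolutions (d = p in {6, 12, 18}), an image-level branch with global average pooling and bilinear up-sampling, and a final CBS-style fusion. The channel widths, helper names, and exact block placement are simplifying assumptions rather than the configuration in Figure 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cbs(cin: int, cout: int, k: int = 1, d: int = 1) -> nn.Sequential:
    """Conv + BN + SiLU helper; padding keeps the spatial size at stride 1."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, padding=d * (k // 2), dilation=d, bias=False),
        nn.BatchNorm2d(cout),
        nn.SiLU(inplace=True),
    )

class ASPPCSPC(nn.Module):
    """Simplified sketch: ASPP branches inside a CSP-style split (widths assumed)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        mid = in_ch // 2
        self.split_aspp = cbs(in_ch, mid)     # path processed by ASPP
        self.split_plain = cbs(in_ch, mid)    # path processed by a plain CBS
        # One 1x1 conv plus three 3x3 dilated convs with d = p in {6, 12, 18}.
        self.branches = nn.ModuleList(
            [cbs(mid, mid, k=1)] + [cbs(mid, mid, k=3, d=d) for d in (6, 12, 18)]
        )
        self.image_branch = cbs(mid, mid)     # applied after global average pooling
        self.fuse = cbs(mid * 5, mid)         # merge the 4 ASPP branches + image branch
        self.out = cbs(mid * 2, out_ch)       # merge the two CSP paths

    def forward(self, x):
        a = self.split_aspp(x)
        feats = [branch(a) for branch in self.branches]
        img = F.interpolate(self.image_branch(F.adaptive_avg_pool2d(a, 1)),
                            size=a.shape[2:], mode="bilinear", align_corners=False)
        aspp = self.fuse(torch.cat(feats + [img], dim=1))
        return self.out(torch.cat([aspp, self.split_plain(x)], dim=1))

# Example: the spatial resolution of the input feature map is preserved.
m = ASPPCSPC(256, 256).eval()
print(m(torch.randn(1, 256, 20, 20)).shape)   # torch.Size([1, 256, 20, 20])
```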
3.4. SIoU Loss Function
In object detection tasks, the bounding box loss function plays a vital role [34]. The coordinate loss in the YOLOv7 network model uses the CIoU loss function, which is calculated as in Equation (3):
$L_{CIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v$ (3)
Among them, $b$ represents the prediction box; $b^{gt}$ represents the real box; $IoU$ quantifies the overlap between the prediction box and the real box, and the higher its value, the more accurate the prediction; $\rho(b, b^{gt})$ represents the Euclidean distance between the centers of the predicted box and the real box; and $c$ refers to the diagonal length of the minimum rectangular closed area that can fully cover the prediction box and the real box. The term $v$ measures the consistency of the aspect ratios of the prediction box and the real box, as defined in Equation (4):
$v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$ (4)
Among them, $w^{gt}$ and $h^{gt}$ represent the width and height of the real box; $w$ and $h$ represent the width and height of the prediction box; and $\alpha$ is a trade-off parameter that weights $v$, defined as shown in Equation (5):
$\alpha = \frac{v}{(1 - IoU) + v}$ (5)
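To make Equations (3)–(5) concrete, the following minimal sketch evaluates the CIoU loss for two axis-aligned boxes given in (center x, center y, width, height) format; the function name and the example box values are illustrative assumptions, not taken from the paper.

```python
import math

def ciou_loss(pred, gt):
    """CIoU loss for boxes given as (cx, cy, w, h), per Equations (3)-(5)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # IoU of the two boxes.
    inter_w = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    inter_h = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter)

    # Squared center distance rho^2 and squared diagonal c^2 of the enclosing box.
    rho2 = (px - gx) ** 2 + (py - gy) ** 2
    cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)
    c2 = cw ** 2 + ch ** 2

    # Aspect-ratio term v (Eq. 4) and trade-off weight alpha (Eq. 5).
    v = 4 / math.pi ** 2 * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / (1 - iou + v)

    return 1 - iou + rho2 / c2 + alpha * v   # Eq. (3)

print(ciou_loss((50, 50, 20, 10), (55, 52, 18, 12)))
```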
Although the CIoU loss function introduces the aspect ratio of the detection box, it only measures the consistency of the width–height ratio rather than the actual differences between the predicted and real widths and heights, which hinders the regression optimization of the prediction box. The SIoU [35] loss function instead starts from the direction of the regression vector and introduces the angle between the real box and the predicted box. The calculation is shown in Equation (6):
$L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2}$ (6)
Among them, $\Delta$ is the distance loss (which incorporates the angle cost between the regression vectors) and $\Omega$ is the shape loss. By introducing the vector angle between the prediction box and the real box, the SIoU loss function makes the model converge faster and locate target objects more accurately, thereby improving the model's accuracy. Therefore, this article uses the SIoU loss function.
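For reference, the sketch below evaluates the SIoU loss of Equation (6), following the angle-, distance-, and shape-cost formulation of [35]; the function name, the shape-cost exponent theta = 4, and the example boxes are assumptions for illustration rather than the paper's training implementation.

```python
import math

def siou_loss(pred, gt, theta: float = 4.0, eps: float = 1e-7):
    """SIoU loss (Eq. 6) for boxes given as (cx, cy, w, h); theta = 4 is assumed."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # IoU of the two boxes.
    inter_w = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    inter_h = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Width/height of the smallest enclosing box.
    cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)

    # Angle cost: penalizes regression vectors far from the coordinate axes.
    sigma = math.hypot(gx - px, gy - py) + eps
    sin_alpha = min(abs(gy - py) / sigma, 1.0)
    angle = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost Delta, re-weighted by the angle cost.
    gamma = 2 - angle
    rho_x = ((gx - px) / (cw + eps)) ** 2
    rho_y = ((gy - py) / (ch + eps)) ** 2
    delta = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost Omega: penalizes width/height mismatch directly.
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    omega = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (delta + omega) / 2   # Eq. (6)

print(siou_loss((50, 50, 20, 10), (55, 52, 18, 12)))
```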
5. Conclusions
This paper tackles the challenges of PCB defect detection, including its complexity and high rates of false positives and negatives, by introducing an improved PCB defect detection algorithm, SSHP-YOLO, which builds upon YOLOv7. First, the CBAM attention mechanism is incorporated into the ELAN to form the ELAN-C module, thereby enhancing the model's ability to focus on critical features of PCB defects. Subsequently, the SPPCSPC module is replaced with the ASPPCSPC module, which captures contextual information across different scales through dilated convolutions applied to the input PCB feature map at varying sampling rates; this modification substantially improves the detection accuracy and robustness of PCB defect identification. Furthermore, the SIoU loss function is employed instead of the traditional CIoU loss function, improving the soft-matching relationship between the predicted bounding box and the ground truth and thereby enhancing the precision of bounding box predictions. Finally, ablation experiments are performed on the PKU-Market-PCB public dataset to assess the effectiveness of each improvement. The proposed model achieves a precision of 96.91%, a recall of 96.16%, and a mean average precision (mAP) of 97.80%. The superiority of the improved model is further validated through visual comparative analysis. To assess the model's generalization ability, additional experiments are conducted on the NEU-DET public steel defect dataset, confirming the robustness and adaptability of the enhanced algorithm. In summary, this work addresses the low detection accuracy, frequent false positives and negatives, large training-sample requirements, and complexity of PCB defect detection. Future work will focus on reducing the model complexity and computational cost while maintaining the detection accuracy.