1. Introduction
In recent years, semiconductor chips have played a critical role in various industries, including military and defense, electrical engineering, automotive manufacturing, and healthcare [1]. Ongoing advances in manufacturing have led to the miniaturization and increased integration of semiconductor chips, significantly enhancing their computational capabilities [2]. However, these advances have also produced more intricate chip architectures, which make high-quality chip packaging considerably more difficult. Chip packaging entails securely attaching an integrated circuit chip to a substrate and then soldering a protective cover plate onto it. This process safeguards the chip from physical damage and chemical corrosion while ensuring reliable electrical connections and thermal management. Differences in manufacturing techniques and thermal stress between materials can cause voids to form within the solder layer. These voids can lead to electrical failures, thermal issues, and impaired interconnectivity, all of which can significantly degrade the performance and lifespan of the chip [3]. Conventional detection techniques use X-ray imaging systems to obtain internal chip images, which are then manually analyzed to identify defects. However, this approach is labor-intensive and time-consuming, and the low contrast of X-ray grayscale images further reduces the efficiency and accuracy of defect identification. It is therefore imperative to develop advanced, reliable technologies that detect internal defects accurately and efficiently.
The advent of machine vision has mitigated some of the constraints of manual inspection. Dunderdale et al. [4] combined Scale-Invariant Feature Transform (SIFT) descriptors with a random forest classifier for defect detection in photovoltaic modules, achieving a classification accuracy of 91.2%. Li et al. [5] developed a method for classifying wafer maps by integrating supervised Support Vector Machine (SVM) classifiers with unsupervised Self-Organizing Map (SOM) clustering, achieving a classification accuracy exceeding 90%. Liu et al. [6] employed a hybrid recognition method integrating mathematical morphology with pattern recognition to analyze the causes and types of Printed Circuit Board (PCB) defects. The authors processed and binarized the raw color images and then proposed a distortion detection algorithm for the threshold segmentation of PCB defect images. This approach achieved high detection accuracy and reduced detection time. However, traditional machine vision methods rely on handcrafted features, which are often domain-specific and lack generalizability [7]. Furthermore, manual feature design is time-consuming and labor-intensive, making it difficult to adapt to new products and production lines. With the advent of Industry 4.0 [8], there is a growing need for flexible and adaptable defect detection systems.
In recent years, supervised deep learning methods have been extensively employed in industrial product defect detection [9]. These methods fall into three categories according to the granularity of the computer vision task involved: image-level classification, region-level object detection, and pixel-level semantic segmentation. Image-level classification, the most basic of the three, uses Convolutional Neural Networks (CNNs) to extract features from samples, which are then classified. Alvarenga et al. [10] proposed a method for classifying rail defects by analyzing wavelet-transformed eddy current signals with a CNN. The method achieved a classification accuracy of 98% when evaluated in real-world settings. Deng et al. [11] proposed an automated defect verification system combining a fast circuit comparison algorithm with deep-neural-network-based classification. This approach achieved high accuracy in PCB defect classification and significantly reduced false positive and false negative rates. Shu et al. [12] introduced the Parallel Spatial Pyramid Pooling Network (PSPP-net), which combines online and offline deep CNN feature extraction streams and uses GPU-integrated features and softmax regression to classify LED chip defects. Batool et al. [13] proposed a CNN approach that addresses class imbalance through data undersampling, offering a robust solution for wafer defect classification. The method achieved an accuracy of 90.44% in the testing phase. While image-level classification methods can identify the defect type of an individual sample, they are not suited to images that contain multiple defects.
In contrast, region-level object detection techniques can identify and localize multiple defects within a single image. Tang et al. [14] used MobileNetV3 as a baseline model and combined it with a dual-domain attention mechanism to propose a lightweight PCB defect detection network designed to efficiently detect small defect features. Dlamini et al. [15] developed a detection system for PCB surface mount technology using a feature pyramid network in conjunction with MobileNetV2. Chen et al. [16] enhanced the YOLOv3 model with DenseNet to improve SMD LED chip detection and optimized it using the Taguchi method. In response to the complex architecture and slow inference of anchor-based object detection algorithms, Chen et al. [17] proposed an anchor-free object detection algorithm (LGCL-CentreNet) for PCB defect detection. A lightweight module was designed to augment the local and global contexts, thereby achieving better defect detection performance with lower computational complexity. However, region-level object detection methods output only bounding boxes, whereas defects are irregular and vary widely in shape and size. As a result, it is difficult to describe the area and shape of complex defects accurately.
Consequently, research has concentrated on pixel-level segmentation methods, which aim at precise defect localization. Precise defect descriptions also facilitate subsequent defect identification and improvements in manufacturing processes. In industrial defect detection, researchers have typically designed models based on basic semantic segmentation networks, including FCN [18], U-Net [19], SegNet [20], and the DeepLab series [21,22,23]. These models have primarily focused on multi-scale and lightweight architectures. Ling et al. [24] developed a deep twin semantic segmentation network for the detection of PCB soldering defects, using similarity metrics derived from the twin network to enhance detection accuracy. Wu et al. [25] proposed a Mask R-CNN-based deep learning method for PCB solder joint defect segmentation, using ResNet-101 as the backbone network and achieving an mAP of over 95% in segmentation tasks. Yang et al. [26] introduced a novel non-destructive defect segmentation network, NDD-Net, which uses an Attention Fusion Block (AFB) and a Residual Dense Connection Convolution Block (RDCCB). When applied to the publicly available GDXray and RSSDs X-ray datasets, NDD-Net outperformed other state-of-the-art segmentation networks in defect localization, particularly in overcoming class imbalance challenges.
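The gap between a bounding box and a pixel-level mask when describing an irregular defect can be illustrated with a small sketch. The toy mask and the helper names (`mask_area`, `bounding_box`, `box_area`) are assumptions made for illustration only and do not come from any of the cited methods.

```python
# Illustrative only: compare the area reported by a bounding box with the
# true pixel area of an irregular (diagonal, crack-like) defect.

def mask_area(mask):
    """Count the defect pixels in a binary mask (list of rows of 0/1)."""
    return sum(1 for row in mask for value in row if value)

def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, or None if the mask is empty."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

def box_area(box):
    """Area in pixels of an inclusive (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)

# Hypothetical 6x6 mask containing one diagonal defect.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

box = bounding_box(toy_mask)
pixels = mask_area(toy_mask)
fill_ratio = pixels / box_area(box)
print(f"box={box}, box area={box_area(box)}, mask area={pixels}, fill={fill_ratio:.4f}")
```

In this toy case the box covers 16 pixels while the defect occupies only 7, so a detector that reports the box alone would overstate the defect area by more than a factor of two and would say nothing about its shape.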
Despite the widespread application and significant achievements of deep learning algorithms in industrial defect detection, defect segmentation in X-ray images of chip interiors remains challenging. First, the small size of chips and variations in manual placement often leave captured images with excessive background information, which adds unnecessary computational load and reduces detection speed. Second, defects vary significantly in scale, which calls for further research into effective multi-scale feature representation methods. Finally, the lack of semantic information in X-ray images, the low contrast between defects and background, and noise interference often blur defect boundaries. Efficient and accurate segmentation of void defects in chip X-ray images is therefore still an open problem.
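The cost of excess background, the first challenge above, can be made concrete with a rough sketch. It assumes a foreground mask of the chip region is already available and uses a simple axis-aligned crop. The function `crop_to_foreground`, the `margin` parameter, and the toy image sizes are hypothetical, and the crop does not handle rotated chips, so this is not the segmentation-based approach proposed in this work.

```python
# Illustrative only: crop an image to the bounding box of a known foreground
# region and measure how many pixels a downstream model would still process.

def crop_to_foreground(image, foreground, margin=0):
    """Crop `image` to the bounding box of nonzero `foreground` pixels,
    expanded by `margin` pixels and clipped to the image borders."""
    height, width = len(image), len(image[0])
    rows = [r for r, row in enumerate(foreground) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in foreground)]
    if not rows:
        return image  # nothing to crop to; keep the full image
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, height - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, width - 1)
    return [row[left:right + 1] for row in image[top:bottom + 1]]

# Hypothetical 8x12 image in which the chip occupies rows 2-5 and columns 3-8.
HEIGHT, WIDTH = 8, 12
toy_foreground = [
    [1 if 2 <= r <= 5 and 3 <= c <= 8 else 0 for c in range(WIDTH)]
    for r in range(HEIGHT)
]
toy_image = [[r * WIDTH + c for c in range(WIDTH)] for r in range(HEIGHT)]

cropped = crop_to_foreground(toy_image, toy_foreground, margin=1)
kept_fraction = (len(cropped) * len(cropped[0])) / (HEIGHT * WIDTH)
print(f"cropped size: {len(cropped)}x{len(cropped[0])}, pixels kept: {kept_fraction:.2f}")
```

Even with a one-pixel margin, the toy crop keeps only half of the original pixels, which suggests why isolating the chip region before defect segmentation can reduce computation.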
To address these challenges, we propose a deep-learning-based framework for chip packaging void defect segmentation that achieves high detection accuracy and speed. The main contributions of this work are as follows:
We propose a lightweight U-Net network with rectangular spatial attention (RSALite-UNet) for segmenting chip cover plates with different rotation angles. This network uses MobileNetV3 as the feature extraction backbone and incorporates a rectangular spatial attention module. This design significantly increases the segmentation speed while improving the segmentation accuracy for chip cover plates.
We propose a dual decoder Mamba U-Net (DM-UNet) network for void defect segmentation in chip weld areas. This network uses Vision State Space (VSS) blocks as the core units of the encoder to cope with the diversity of defect shapes and sizes. It also introduces the Feature Correlation Cross Gate (FCCG) module, which effectively integrates boundary-aware features with defect segmentation features in the dual decoders, significantly improving chip void defect segmentation performance.