Article

LDDP-Net: A Lightweight Neural Network with Dual Decoding Paths for Defect Segmentation of LED Chips

Mechanical and Vehicle Engineering, Hunan University, Changsha 411082, China
*
Author to whom correspondence should be addressed.
Sensors 2025, 25(2), 425; https://doi.org/10.3390/s25020425
Submission received: 13 November 2024 / Revised: 13 December 2024 / Accepted: 20 December 2024 / Published: 13 January 2025

Abstract

Chip defect detection is a crucial aspect of the semiconductor production industry, given its significant impact on chip performance. This paper proposes a lightweight neural network with dual decoding paths for LED chip segmentation, named LDDP-Net. Within the LDDP-Net framework, the receptive field of the MobileNetv3 backbone is modified to mitigate information loss. In addition, dual decoding paths consisting of a coarse decoding path and a fine-grained decoding path in parallel are developed. Specifically, the former employs a straightforward upsampling approach, emphasizing macro information. The latter is more detail-oriented, using multiple pooling and convolution techniques to focus on fine-grained information after deconvolution. Moreover, the integration of intermediate-layer features into the upsampling operation enhances boundary segmentation. Experimental results demonstrate that LDDP-Net achieves an mIoU (mean Intersection over Union) of 90.29% on the chip dataset, with parameter numbers and FLOPs (Floating Point Operations) of 2.98 M and 2.24 G, respectively. Comparative analyses with advanced methods reveal varying degrees of improvement, affirming the effectiveness of the proposed method.

1. Introduction

LED chips serve as the foundational components in LED lighting systems and find widespread applications in diverse fields such as lighting, displays, indicators, and signage. Typically crafted from semiconductor materials such as gallium arsenide (GaAs), gallium phosphide (GaP), gallium nitride (GaN), or silicon carbide (SiC), LED chips offer distinct advantages over conventional lighting technologies. One notable advantage lies in their exceptional energy efficiency, as LED chips convert a higher percentage of electrical energy into light compared to incandescent bulbs or fluorescent lamps. Additionally, LED chips exhibit a compact form factor and can be arranged in arrays to generate various light patterns and intensities. This inherent flexibility in design allows for a broad spectrum of lighting solutions. The impact of LED chips on the lighting industry is transformative, introducing unparalleled energy efficiency, durability, and versatility, qualities that traditional lighting technologies struggle to match [1,2,3]. This innovation continues to propel advancements in lighting technology, gaining increased adoption in residential, commercial, and industrial settings worldwide. As the utilization of LED chips continues to rise, the detection of defects becomes paramount during the production process to ensure the overall quality of LED chips [4,5].
In recent decades, significant research attention has been devoted to computer vision processing methods for chip defect detection [6,7,8]. Early studies employed automatic optical inspection (AOI) to identify chip defects. Lin [9] introduced a wavelet-based multivariate statistical approach for detecting ripple defects on chip surfaces. Xie et al. [10] innovatively proposed a golden-template self-generating technique for detecting potential defects in periodic two-dimensional wafer images. This technique directly generates a golden template from the patterned wafer image under examination, requiring no prior knowledge. Zhang et al. [11] presented a hybrid approach for in-tray chip defect inspection, incorporating an image alignment algorithm and a hybrid method for defect detection. Chen et al. [12] developed a microscopic 3D volumetric topographic method based on the infrared confocal principle. In this method, confocal microscopy using infrared (IR) illumination projects active structured light onto the IC sample, enabling 3D volumetric inspection of its internal structure or detected defects. While traditional image processing methods can satisfy the requirements of chip defect detection, they face challenges in meeting the demands of large-scale LED chip detection due to issues such as low efficiency, poor accuracy, and limited robustness.
With the development of deep learning, a series of applications in the field of chip defect detection have emerged that exploit its advantages, such as stronger learning ability and a data-driven nature. Nowadays, target detection algorithms are mainly classified into end-to-end-based methods and region-nomination-based methods [13,14]. End-to-end-based methods such as SSD [15] and YOLO [16] obviate the need for region nomination and directly provide the class probability and location coordinates of the target. Huang et al. [17] proposed a novel small-object detection approach based on YOLOv4 that aims to enhance detection accuracy. It incorporates expanded feature fusion, anchor box parameters optimized via k-means++ clustering [18], and streamlined YOLO head network branches to improve efficiency. Li et al. [19] proposed an improved chip detection model based on the YOLOv5 network, namely YOLO-STPN. It adds a prediction head to the original model and integrates the swin transformer block [20] and a convolutional block attention module into the path aggregation network. The mean average precision at 50% IoU (mAP50) of YOLO-STPN is 10.05% higher than that of the original YOLOv5, while the decrease in frames per second (FPS) is acceptable. Wang et al. [21] proposed an improved bubble defect detection model, YOLO-Xray, based on the YOLOv5 algorithm for chip X-ray images. The mean average precision (mAP) of the YOLO-Xray algorithm on the CXray dataset reaches 93.5%, which is 5.1% higher than that of the original YOLOv5. Cao et al. [22] proposed a real-time chip package surface defect detection method based on the YOLOv7 model to address the challenge of detecting small targets and introduced a confidence propagation cluster (CP-Cluster) to further increase detection accuracy and result confidence. In general, region-nomination-based (two-stage) algorithms achieve higher detection accuracy, whereas end-to-end (single-stage) regression algorithms offer faster detection. Region-nomination-based methods, such as R-CNN [23] and Faster R-CNN [24], first nominate regions of interest and then perform classification. Wang et al. [25] combined a CNN-based super-resolution (SR) technique with a classification model for intelligent scanning acoustic microscopy (SAM) inspection of flip chips: the SR network enhances the resolution of the SAM images, and a CNN-based classifier identifies defective solder joints without manual feature extraction. These methods often attain high accuracy, but they entail substantial computational expense, rendering them unsuitable for many real-time applications. While the R-CNN algorithm significantly enhanced accuracy compared to traditional detection algorithms, its major drawback is evident: redundant computation of features for overlapping region proposals slows down detection across the whole network. Although Faster R-CNN further improves detection accuracy, its two-stage region-proposal pipeline still results in relatively slow inference, it continues to grapple with computational redundancy, and selecting an inappropriate IoU threshold can lead to noisy detections or overfitting. Hybrid algorithms have also been proposed. For example, Zheng et al. [26] proposed a hybrid algorithm based on geometric computation and a convolutional neural network for LED chip defect detection, which improved the SPP network model and performed coarse detection of defects on preprocessed chip lithography graphs in the form of grid segmentation.
Although these target detection algorithms perform well, they can only determine the bounding box of a defect and cannot quantify it. It is important to emphasize that not every defect categorically disqualifies an LED chip. Certain defects must be evaluated to ascertain the extent of the damage, such as the loss of surface area, so that the chip's qualification can be determined according to the nature of the defect. Consequently, conventional object detection methods are limited in addressing this particular issue.
The primary objective of addressing the defect detection challenge involves establishing a comprehensive framework capable of autonomously analyzing the visual data pertaining to the objects of interest. Due to the irregular nature of defect occurrence, semantic segmentation methods offer a promising alternative. Utilizing the segmentation network, the chip categories are precisely delineated pixel by pixel [27], enabling the detection of damaged areas and evaluation of damage severity. In this way, the quantitative evaluation of defects can be realized. Deep learning has become widely adopted in image processing, with segmentation tasks emerging as a key criterion for evaluating the performance of deep learning models [28,29,30,31]. Shankar et al. [32] introduced a novel defect pattern segmentation scheme designed for enhanced flexibility, leveraging known properties of the human visual system, which remains optimal for image analysis in various applications. Zhang et al. [33] proposed a Non-local Aggregation Network (NANet) with a well-designed Multi-modality Non-local Aggregation Module (MNAM) to better exploit the non-local context of RGB-D features at multiple stages and integrate them effectively. Niu et al. [34] proposed a simple plug-and-play data augmentation method based on the requirements of the CNN defect segmentation task to improve the accuracy of the segmentation model. Zhang et al. [35] proposed a lightweight deep learning-based algorithm for chip image segmentation named Hybridformer. It generates scale-aware semantic features using feature maps of different scales as input and uses CrossNorm and SelfNorm to improve the model's robustness to distribution bias. The network extracts global contextual information and fuses EfficientNetV2 features to achieve accurate localization.
While semantic segmentation methods have demonstrated significant accomplishments in defect detection, our research identifies certain drawbacks. Firstly, compared to target detection algorithms, semantic segmentation algorithms exhibit higher parameter counts and slower inference speeds, making it challenging to ensure real-time performance on less powerful computing devices. Secondly, numerous segmentation networks directly employ 8× or 16× up-sampling to restore the feature map to the original resolution, which often compromises segmentation accuracy. Additionally, during the up-sampling process, the neglect of fine-grained information results in incomplete fusion of feature information. To address these limitations, we propose a lightweight convolutional neural network with dual decoding paths for LED chip segmentation. This approach aims to overcome the aforementioned drawbacks, providing a more efficient and accurate solution for defect detection in semiconductor chip analysis. The main contributions of this paper can be summarized as follows:
  • A lightweight convolutional neural network with dual decoding paths (LDDP-Net), which achieves good results for LED chip image defect segmentation, is proposed.
  • The backbone of MobileNetV3 is modified for feature extraction to reduce the number of parameters, and depthwise convolution and pointwise convolution are effectively applied to achieve a lightweight design.
  • We propose a novel decoder containing dual decoding paths (LDDP), which comprises a coarse decoding path and a fine-grained decoding path operating in parallel. Our LDDP can serve as a plug-and-play structure that improves the efficacy and efficiency of the upsampling process.
The remaining sections of the paper are organized as follows. Section 2 describes the details of the proposed method. The network training strategy, the evaluation metrics, and the experimental results are presented in Section 3. Finally, conclusions are presented in Section 4.

2. Methodology

2.1. Overall Structure of LDDP-Net

Figure 1 illustrates the comprehensive architecture of LDDP-Net, primarily comprised of an encoder and a decoder. The encoder leverages a modified MobileNetV3 backbone, while the decoder is predominantly composed of two intricately designed dual decoding paths to facilitate resolution recovery. Initially, the encoder engages in feature extraction from an input RGB image earmarked for segmentation. During feature extraction, two feature maps are preserved for resolution recovery. The first is the high-layer feature, retaining extensive semantic information for providing global information during up-sampling. The second is the intermediate-layer feature, maintaining intricate details as this location has not undergone complex mapping operations yet. The judicious application of intermediate-layer features serves as a detailed reference for up-sampling. Recognizing that high-level features inherently have a greater number of channels, leading to increased computational load in the decoder, we implement pointwise convolution [36] to reduce the dimension of high-level features. The dimensionally reduced feature map then undergoes the first step of up-sampling through the proposed dual decoding paths. These paths include a simple coarse decoding path, offering macro-information for decoding operations, and a specially designed fine-grained decoding path dedicated to detail reconstruction with the assistance of the coarse path. Subsequently, the up-sampled features are concatenated with the intermediate-layer feature, and the SE block [37] is employed to adjust channel weights. The resulting feature map is fed into the dual decoding paths for the second step of up-sampling. Finally, the required channel number is obtained using pointwise convolution, followed by the bilinear interpolation to ensure the final output aligns with the input.
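To make the data flow described above concrete, the following is a minimal structural sketch of the forward pass in PyTorch. The module names, the reduced channel widths (64 and 32), and the class count are illustrative assumptions rather than the authors' implementation; the actual building blocks are detailed in the following subsections.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LDDPNet(nn.Module):
    def __init__(self, backbone, dual_path1, dual_path2, se_block, num_classes=5):
        super().__init__()
        self.backbone = backbone              # modified MobileNetV3 (8x downsampling)
        self.reduce = nn.Conv2d(160, 64, 1)   # pointwise conv shrinks high-level channels
        self.dual_path1 = dual_path1          # first up-sampling step (coarse + fine paths)
        self.se = se_block                    # reweights channels after concatenation
        self.dual_path2 = dual_path2          # second up-sampling step
        self.classifier = nn.Conv2d(32, num_classes, 1)  # pointwise conv to class logits

    def forward(self, x):
        mid_feat, high_feat = self.backbone(x)        # intermediate- and high-layer features
        y = self.dual_path1(self.reduce(high_feat))   # first 2x resolution recovery
        y = self.se(torch.cat([y, mid_feat], dim=1))  # fuse with intermediate-layer detail
        y = self.dual_path2(y)                        # second 2x resolution recovery
        y = self.classifier(y)
        # bilinear interpolation restores the input resolution
        return F.interpolate(y, size=x.shape[2:], mode="bilinear", align_corners=False)
```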

2.2. Modified MobileNetV3 Backbone

Compared to large-scale networks, lightweight networks [38,39,40] are characterized by fewer parameters, reduced computational demands, and shorter inference times, making them more suitable for scenarios with limited storage space and power consumption. Consequently, the study of lightweight networks has garnered widespread attention, with MobileNet [38] serving as a prominent example. MobileNetV3, in particular, stands out for its outstanding performance and speed, showcasing unique advantages across various applications. The lightweight nature of MobileNet can be attributed primarily to depthwise convolution [41] and pointwise convolution [36], whose principles are illustrated in Figure 2. For an input feature map Xin with the channel number of Cin, we can adopt two ways to get an Xout with the channel number of Cout. Let the kernel size be k. The number of parameters introduced by conventional convolution can be calculated as
$C_{in} \times C_{out} \times k \times k$
However, by decomposing the conventional convolution into a depthwise convolution and a pointwise convolution, the parameter number introduced can be calculated as
$C_{in} \times k \times k + C_{out} \times C_{in}$
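As a quick sanity check of the two expressions above, the snippet below counts the parameters of a conventional convolution versus a depthwise plus pointwise pair in PyTorch; the channel and kernel sizes are arbitrary examples.

```python
import torch.nn as nn

c_in, c_out, k = 160, 160, 5

conventional = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in, bias=False)
pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conventional))                  # C_in * C_out * k * k = 640000
print(count(depthwise) + count(pointwise))  # C_in * k * k + C_out * C_in = 29600
```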
Furthermore, the detailed structure of MobileNetV3 is determined by a network architecture search method. However, blindly adopting the model may not yield optimal performance, as the network was originally designed for classification tasks, which emphasize a large receptive field. In the context of segmentation tasks, pursuing an excessively large receptive field can be counterproductive, leading to feature information loss. In this study, we have modified the MobileNetV3 backbone to serve as the encoder for chip segmentation, with the detailed structural composition presented in Table 1. Specifically, we set the input image height to 160 pixels and the width to 512 pixels. Within Table 1, the ‘3 × 3’ and ‘5 × 5’ entries in the operator column indicate the kernel size of the depthwise convolution in each block. ‘#Out’ is the number of output channels of the block, ‘Exp size’ denotes the number of expanded transition channels between input and output, and the ‘SE’ column indicates whether an SE block is added after the depthwise convolution in Block2. In contrast to the original MobileNetV3, we have reduced the downsampling multiple from 32 to 8 to preserve sufficient fine-grained information. Additionally, to reduce network parameters, we have eliminated the last convolution operation in the backbone, so the encoder output has 160 channels instead of 960.
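For illustration, one way to approximate this modification starting from torchvision's MobileNetV3-Large is sketched below. This is not the authors' implementation; the exact block configuration they use is the one given in Table 1.

```python
import torch.nn as nn
from torchvision.models import mobilenet_v3_large

# Pretrained MobileNetV3-Large feature extractor (ImageNet-1K weights).
features = mobilenet_v3_large(weights="IMAGENET1K_V1").features
features = features[:-1]  # drop the last 1x1 conv so the output has 160 channels, not 960

# MobileNetV3-Large downsamples 32x via five stride-2 convolutions; setting the
# last two of them to stride 1 yields the 8x downsampling used here.
stride2_convs = [m for m in features.modules()
                 if isinstance(m, nn.Conv2d) and m.stride == (2, 2)]
for conv in stride2_convs[-2:]:
    conv.stride = (1, 1)
```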

2.3. Dual Decoding Paths

The prevalent upsampling methods primarily fall into two categories: interpolation and deconvolution [42]. Interpolation introduces no additional learnable parameters, whereas deconvolution requires its parameters to be learned during training. The relative performance superiority between the two methods remains ambiguous. Hence, in formulating our dual decoding paths, we synergistically integrate both approaches.
As illustrated in detail in Figure 3, the dual decoding paths encompass a coarse decoding path and a fine-grained decoding path operating in parallel. The coarse decoding path simplifies channel reduction and feature map sampling, focusing on furnishing macro-level information, which is crucial for the decoding process. Conversely, the fine-grained decoding path exhibits heightened attention to detail. Employing multiple pooling and convolution techniques, this path focuses the network on spatial locations and channels subsequent to the deconvolution operation. Notably, to curtail the number of parameters and computational load, we exclusively employ depthwise convolution and pointwise convolution within this pathway. This strategic amalgamation of interpolation and deconvolution, combined with the nuanced approaches of coarse and fine-grained decoding, collectively enhances the efficacy and efficiency of our dual decoding framework.
Clearly, the coarse decoding path contains only one convolution combination and one interpolation. For a given input feature X ∈ R^{C×H×W}, the output of the coarse decoding path can be expressed as
$Y_c = \mathrm{Up}_b(\mathrm{ReLU}(\mathrm{BN}(\mathrm{PConv}(X))))$
where Upb(·) represents the bilinear interpolation. The number of output channels of PConv is reduced to 1/2 of the input channels. Therefore, Yc is the feature map with the channel number, height, and width of 1/2C, 2H, and 2W, respectively.
In the fine-grained decoding path, deconvolution with a stride of 2 effectively reduces the number of channels while doubling the height and width. To enhance the quality of the decoded feature map, we initially employ deconvolution for upsampling. While interpolation optimally utilizes adjacent pixel information, deconvolution results may contain some noise. Consequently, secondary processing of feature maps obtained through deconvolution becomes essential. This secondary processing involves continuous convolution and feature fusion. Let Y1 ∈ R^{(C/2)×2H×2W} be the output after deconvolution. All subsequent operations on Y1 will not change the output size but will enhance the spatial and channel details. The approach for enhancing the channel details can be expressed as
$Y_{ch} = \mathrm{Sig}\left(\mathrm{PRP}(\mathrm{MP}_{ch}(\mathrm{DBA}(Y_1))) + \mathrm{PRP}(\mathrm{AP}_{ch}(\mathrm{DBA}(Y_1)))\right) \cdot \mathrm{DBA}(Y_1)$
where MPch and APch denote the global max pooling and average pooling along the channel direction, and Sig(·) denotes the sigmoid function. The output Ych has the same size as Y1. Then, spatial detail enhancement is conducted as follows:
$Y^{*} = \mathrm{PBA}\left(\mathrm{Sig}\left(\mathrm{Conv}(\mathrm{CAT}(\mathrm{MP}_{sp}(Y_{ch}),\ \mathrm{AP}_{sp}(Y_{ch})))\right) \cdot Y_{ch}\right)$
where MPsp and APsp denote the global max pooling and average pooling along the spatial direction, and Y* represents the transition feature map. This enhancement process is applied twice, and the result is added to Y1 after Dropout; the whole procedure is repeated twice to fully enhance the fine-grained information.
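The sketch below condenses one dual-decoding-path step under the interpretation given above: a coarse path (pointwise convolution plus bilinear upsampling) in parallel with a fine path (deconvolution followed by CBAM-style channel and spatial attention). The PRP/PBA compositions, reduction ratio, kernel sizes, and the additive merge of the two paths are assumptions made for this sketch rather than the exact design in Figure 3.

```python
import torch
import torch.nn as nn

class DualDecodingPath(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        out_ch = in_ch // 2
        # Coarse path: pointwise conv + BN + ReLU, then 2x bilinear upsampling
        self.coarse = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False))
        # Fine path: stride-2 deconvolution halves channels and doubles resolution
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(in_ch, out_ch, 2, stride=2, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # Channel attention: shared pointwise-ReLU-pointwise on pooled descriptors
        self.prp = nn.Sequential(
            nn.Conv2d(out_ch, out_ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 4, out_ch, 1))
        # Spatial attention: conv over concatenated max/avg maps along channels
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)
        self.dropout = nn.Dropout2d(0.1)

    def enhance(self, y):
        ca = torch.sigmoid(self.prp(torch.amax(y, dim=(2, 3), keepdim=True))
                           + self.prp(torch.mean(y, dim=(2, 3), keepdim=True)))
        y = ca * y
        sa = torch.sigmoid(self.spatial(torch.cat(
            [y.amax(dim=1, keepdim=True), y.mean(dim=1, keepdim=True)], dim=1)))
        return sa * y

    def forward(self, x):
        y1 = self.deconv(x)
        y = self.enhance(self.enhance(y1))   # detail enhancement, applied twice
        fine = self.dropout(y) + y1          # residual connection to the deconv output
        return fine + self.coarse(x)         # merge fine and coarse paths (assumed additive)
```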

2.4. Multi-Layer Feature Fusion

The high-level feature encapsulates significant semantic information, while the intermediate-level feature is replete with intricate details. Many segmentation networks resort to 8× or 16× direct up-sampling to restore the feature map to its original resolution, which makes high segmentation accuracy difficult to achieve. Furthermore, during the up-sampling process, the neglect of details hinders the complete fusion of feature information. To address this challenge, we first concatenate multiple layers of features. Recognizing that simple feature concatenation alone may not effectively adjust their relative importance, we subsequently incorporate the SE block to modulate the weight of individual channels. The schematic representation of the multi-layer feature fusion we employ is depicted in Figure 4.
Given the intermediate-layer feature XI with the size of c1 × h × w and the high-layer feature up-sampled by the dual decoding paths XH with the size of c2 × h × w, the features concatenated along the channel direction are denoted as Xcat, Xcat ∈ R^{(c1+c2)×h×w}. Next, we perform a self-attention operation on Xcat, which requires squeeze and excitation to get a set of channel weights. The channel weights w can be calculated as follows:
$w = F_{ex}(F_{sq}(AP(X_{cat})))$
where AP denotes global average pooling. Fsq(·) and Fex(·) denote squeeze and excitation operations, respectively. Both squeeze and excitation operations reduce and increase the data dimension through linear mapping. The channel numbers of w are the same as Xcat. Finally, Xcat is multiplied by the weights of the corresponding channels to obtain the result of feature fusion.
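A compact sketch of this fusion step, assuming a standard squeeze-and-excitation block with an illustrative reduction ratio r, is given below.

```python
import torch
import torch.nn as nn

class SEFusion(nn.Module):
    def __init__(self, c1, c2, r=4):
        super().__init__()
        c = c1 + c2
        # Squeeze (c -> c/r) and excitation (c/r -> c) via linear mappings
        self.fc = nn.Sequential(
            nn.Linear(c, c // r), nn.ReLU(inplace=True),
            nn.Linear(c // r, c), nn.Sigmoid())

    def forward(self, x_intermediate, x_high):
        x_cat = torch.cat([x_intermediate, x_high], dim=1)  # (c1 + c2) x h x w
        w = self.fc(x_cat.mean(dim=(2, 3)))                 # global average pooling -> channel weights
        return x_cat * w[:, :, None, None]                  # reweight each channel
```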

3. Experimental Results

This section begins by presenting the experiment setup, encompassing implementation details, dataset specifications, and evaluation metrics. Subsequently, the experimental results are comprehensively reported. A detailed analysis of the results is conducted, leveraging both the experimental data and visualizations of the segmentation results from different networks for a qualitative performance comparison.

3.1. Experiment Setup

3.1.1. Implementation Details

The camera was a Hikvision MV-CE200-10UC (Hikvision, Hangzhou, China) fitted with a 2× magnification WWK20-110C telecentric lens (Vicoimaging, Huizhou, China) with a field of view of 6.4 mm × 4.8 mm. The light sources were an FG-DVR3W-W white point light and an FGTHP100-W white backlight (both Fugenmv, Shanghai, China), and the DD motor was a direct-drive motor, model DMN71-P0 (Shangyin, Shanghai, China).
LDDP-Net was built under the PyTorch 2021.3.3 framework, and all networks were loaded with pretrained weights from ImageNet-1K [43] before training. SGD [44] with a weight decay of 1 × 10−4 and a momentum of 0.9 was adopted as the optimizer for training. We used the poly learning-rate schedule lr = base_lr × (1 − iter/total_iter)^power, with the base learning rate set to 0.025 and the power set to 0.9. The training epoch count on the chip dataset was set to 50 for all networks. A warm-up strategy [45] was used in the first 40 iterations to accelerate convergence. In addition, the hardware configuration was a single NVIDIA GTX 1650 SUPER GPU (4 GB memory). The loss function was the dice loss [46]. To balance the segmentation difficulty, we set different weight coefficients for different categories. Let P be the prediction of the network after softmax and T be the true segmentation label. The final loss function is as follows:
$\mathrm{Dice}_i = \dfrac{2 P_i T_i}{P_i + T_i}$
$L = \dfrac{1}{N} \sum_{i=1}^{N} \alpha_i (1 - \mathrm{Dice}_i)$
where i is the category index, N is equal to the total category number, and α is the weight coefficient.
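The poly learning-rate schedule and the class-weighted dice loss described above can be written as follows. The weight coefficients α are illustrative placeholders, since their exact values are not listed here.

```python
import torch

def poly_lr(base_lr, cur_iter, total_iter, power=0.9):
    """Poly schedule: lr = base_lr * (1 - iter/total_iter)^power."""
    return base_lr * (1 - cur_iter / total_iter) ** power

def weighted_dice_loss(probs, target_onehot, alpha, eps=1e-6):
    """probs: softmax output (B, N, H, W); target_onehot: (B, N, H, W); alpha: (N,)."""
    dims = (0, 2, 3)
    inter = (probs * target_onehot).sum(dims)
    dice = (2 * inter + eps) / (probs.sum(dims) + target_onehot.sum(dims) + eps)
    return (alpha * (1 - dice)).mean()   # (1/N) * sum_i alpha_i * (1 - Dice_i)

# Example with 5 categories (background, lead, electrode, light-emitting area, defect);
# the weights below are assumed values, heavier on the harder classes.
alpha = torch.tensor([0.5, 1.5, 1.0, 1.0, 2.0])
```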

3.1.2. Dataset

We collected 150 defective chip images and annotated them using Labelme. Each image was segmented into five categories: background, lead, electrode, light-emitting area, and defect. To ensure uniformity, all chip images were resized to 160 × 512, based on their shape characteristics, before being fed into the segmentation network. While the shape and position of defects varied significantly, the other categories exhibited strong consistency, leading us to forgo additional preprocessing methods and data augmentation strategies. To mitigate potential variations arising from any specific test split, we relied on the average results obtained through five-fold cross-validation as the benchmark for performance comparison.
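The five-fold protocol can be reproduced with a standard split, for example as sketched below; the file names are placeholders for the annotated images.

```python
from sklearn.model_selection import KFold

image_paths = [f"chips/img_{i:03d}.png" for i in range(150)]   # hypothetical paths
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, val_idx) in enumerate(kfold.split(image_paths)):
    train_files = [image_paths[i] for i in train_idx]
    val_files = [image_paths[i] for i in val_idx]
    # train on train_files, evaluate on val_files, then average the five
    # validation scores to obtain the reported benchmark
    print(f"fold {fold}: {len(train_files)} train / {len(val_files)} val images")
```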

3.1.3. Evaluation Metrics

The application of semantic segmentation entails pixel classification within the input chip image, and as a result, the accuracy of segmentation plays a crucial role in subsequent defect detection tasks. To enhance the evaluation of different models, we introduced metrics such as Intersection over Union (IoU), mean Intersection over Union (mIoU), and mean Pixel Accuracy (mPA). IoU quantifies the overlap between the predictions of the segmentation network and the ground truth, while mIoU represents the average IoU value calculated individually for each category. Similarly, mPA gauges the average accuracy of pixel classification, calculated separately for each category. They are defined as follows:
$IoU_i = \dfrac{p_{ii}}{\sum_{j=0}^{m} p_{ij} + \sum_{j=0}^{m} p_{ji} - p_{ii}},$
$mIoU = \dfrac{1}{m+1} \sum_{i=0}^{m} IoU_i,$
$mPA = \dfrac{1}{m+1} \sum_{i=0}^{m} \dfrac{p_{ii}}{\sum_{j=0}^{m} p_{ji}}$
where m represents the category number excluding the background, pij denotes that a pixel of category i is incorrectly predicted to be category j, pji denotes that a pixel of category j is incorrectly predicted to be category i, and pii means that a pixel of category i is correctly predicted to be category i.
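Given a confusion matrix accumulated over the test set, these metrics follow directly from the definitions above; the example matrix below is synthetic.

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i, j] = number of pixels of class i predicted as class j."""
    tp = np.diag(conf).astype(float)                          # p_ii
    iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)     # p_ii / (sum_j p_ij + sum_j p_ji - p_ii)
    pa = tp / conf.sum(axis=0)                                # per-class accuracy as defined in the text
    return iou, iou.mean(), pa.mean()                         # IoU_i, mIoU, mPA

# Example with 5 categories: background, lead, electrode, light-emitting area, defect
conf = np.random.randint(1, 100, size=(5, 5))
iou, miou, mpa = segmentation_metrics(conf)
```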

3.2. Overall Performance

3.2.1. Analysis Based on the Performance Indicators

Table 2 presents a performance comparison of the proposed method against several state-of-the-art models. Notably, LDDP-Net exhibits remarkable advantages across all performance indicators. The mIoU reaches 90.29%, surpassing FCN [47] by 10.9%, PSPNet [48] by 12.49%, EncNet [49] by 5.96%, GCNet [50] by 5.99%, DeepLabV3 [51] by 7.98%, and SegNet [52] by 2.77%. Among these, FCN, PSPNet, and DeepLabV3 are conventional segmentation networks employing direct 8× or 4× up-sampling for resolution recovery. This approach sacrifices detailed features, leading to suboptimal segmentation accuracy for challenging categories. For instance, the slender structure of the lead and the intricate nature of the defect make it challenging to precisely define the boundary area, resulting in DeepLabV3 achieving IoU values of only 60.30% and 67.98% for these cases, respectively.
EncNet and GCNet represent two state-of-the-art segmentation models leveraging advanced attention mechanisms, yielding mIoU values of 84.33% and 84.30%, respectively. Remarkably, SegNet, an earlier design, achieves high accuracy through multi-stage upsampling, trailing the proposed LDDP-Net by only 2.77%. In terms of model scale, LDDP-Net boasts a mere 2.98 M trainable parameters, just 1/10th of SegNet, with the fewest FLOPs at 2.24 G. This signifies LDDP-Net’s capability for rapid inference on less powerful computing devices. In conclusion, the proposed network showcases superior overall performance.

3.2.2. Analysis Based on Visualization

The segmentation results obtained using FCN, PSPNet, EncNet, GCNet, DeepLabv3, SegNet, and LDDP-Net are illustrated in Figure 5c–i. It is evident that the segmentation quality aligns with the data distribution provided in the table above for each respective category. Notably, leads and defects prove difficult to distinguish among the five categories. Specifically, leads exhibit a slender appearance, while defects manifest in a variety of shapes and positions. In contrast, the background, light-emitting area, and electrode exhibit relatively fixed positions and shapes.
Observably, FCN, PSPNet, and DeepLabV3 produce broken leads, primarily attributable to direct upsampling. GCNet and EncNet incorporate attention mechanisms and context encoding, respectively, resulting in more continuous outcomes. However, their lead segmentation remains less accurate because the segmented lead width significantly exceeds that of the ground truth.
Regarding defect segmentation, all methods, except for SegNet and LDDP-Net, struggle with boundary reconstruction. LDDP-Net outperforms SegNet, albeit with some shortcomings. Notably, when confronted with a complex defect pattern, as illustrated in Figure 6, most methods fail to achieve complete segmentation. Despite some limitations in the results obtained by LDDP-Net, it exhibits stronger robustness compared to other advanced methods.

3.2.3. Ablation Experiments

Comparison of Different Downsampling Multiples: In this paper, we introduce a lightweight neural network with dual decoding paths for the segmentation of LED chips. To achieve large receptive fields while preserving low-level semantic information in the feature maps, we modified the downsampling multiple of the feature extraction layers. Ablation experiments were performed to identify the optimal network configuration. By controlling the number of stride-2 downsampling operations, various downsampling multiples were obtained. Figure 7 illustrates the curves of mIoU and mPA at 4×, 8×, 16×, and 32×. The original MobileNetV3 utilizes 32× downsampling, retaining a substantial amount of high-level semantic information crucial for classification tasks. However, semantic segmentation tasks emphasize resolution reconstruction, and using 32× downsampling leads to a noticeable decline in mIoU and mPA. Notably, when the downsampling multiple is set to 8×, the network achieves optimal performance. This configuration strikes a balance between capturing essential high-level features and preserving the details necessary for accurate semantic segmentation.
To evaluate the individual contributions of each pathway in the dual decoding architecture and assess the performance gains introduced by the dual decoding design, we conducted ablation experiments on the decoding process. As can be seen in Table 3, the fine-grained decoding path scored significantly higher than the coarse decoding path on the two key indicators, mIoU and mPA, by 4.6% and 1.33%, respectively. The IoU of the fine-grained decoding path on the background, lead, electrode, and light-emitting area was also significantly higher than that of the coarse decoding path, especially on the lead, for which the IoU was 21.67% higher. The results demonstrate that the fine-grained decoding path outperforms in most aspects. However, for target regions characterized by a small proportion of features, such as defects, the coarse decoding path exhibits superior performance due to its ability to retain more of the original features critical for accurate segmentation. As shown in Table 3, the dual decoding path effectively integrates the strengths of both decoding paths, resulting in a significant improvement in the overall segmentation performance metrics.

4. Conclusions

This paper proposes a lightweight neural network, LDDP-Net, for LED chip segmentation. Within the LDDP-Net framework, modifications to the receptive field of the MobileNetv3 backbone address the challenge of information loss. Additionally, dual decoding paths are introduced, consisting of a coarse decoding path and a fine-grained decoding path. The incorporation of the intermediate-layer feature into the upsampling process significantly enhances the effectiveness of detailed boundary segmentation. Experimental results demonstrate that LDDP-Net boasts the fewest trainable parameters of 2.98M and achieves the highest mIoU and mPA of 90.29% and 94.79%, respectively, on the established chip dataset. The visualization of segmentation results shows that, with the contribution of the dual decoding paths, LDDP-Net can keep the continuity of the segmented target and present the advantage of detail recovery. Although lightweight chip segmentation is realized in this paper, defects are not quantitatively evaluated according to defect types. In future work, further research will be carried out based on the proposed network to realize the deployment and application of the project.

Author Contributions

Conceptualization, J.Z. and N.C.; Methodology, J.Z. and N.C.; Software, J.Z. and Y.Z.; Validation, M.L.; Formal analysis, M.L. and R.L.; Investigation, J.Z. and X.S.; Resources, J.Z., X.S. and R.L.; Data curation, J.Z., M.L., X.S. and J.L.; Writing—original draft, M.L. and Y.Z.; Writing—review & editing, M.L. and Y.Z.; Visualization, J.Z. and Y.Z.; Supervision, N.C.; Project administration, N.C. and J.L.; Funding acquisition, N.C., R.L. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the project of Yuelu Mountain Industrial Innovation Center (Grant No. 2023YCII0112), the Hunan Provincial Science and Technology Department (Grant No. 2023GK2008, 2023RC1046).

Data Availability Statement

Data available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shi, J.; Sun, Z. Large-Scale Three-Dimensional Measurement Based on LED Marker Tracking. Vis. Comput. 2016, 32, 179–190. [Google Scholar] [CrossRef]
  2. Petkovic, M.; Bajovic, D.; Vukobratovic, D.; Machaj, J.; Brida, P.; McCutcheon, G.; Stankovic, L.; Stankovic, V. Smart Dimmable LED Lighting Systems. Sensors 2022, 22, 8523. [Google Scholar] [CrossRef] [PubMed]
  3. Satorres Martínez, S.; Martínez Gila, D.M.; Rico, S.I.; Teba Camacho, D. Machine Vision System for Automatic Adjustment of Optical Components in LED Modules for Automotive Lighting. Sensors 2023, 23, 8988. [Google Scholar] [CrossRef]
  4. Shen, J.; Liu, N.; Sun, H. Defect Detection of Printed Circuit Board Based on Lightweight Deep Convolution Network. IET Image Process. 2020, 14, 3932–3940. [Google Scholar] [CrossRef]
  5. Wu, C.; Zhang, W.; Jia, Q.; Liu, Y. Hardware Efficient Multiplier-Less Multi-Level 2D DWT Architecture without off-Chip RAM. IET Image Process. 2017, 11, 362–369. [Google Scholar] [CrossRef]
  6. Jayaram, K.; Joshi, S.S. Design and Development of a Vision-Based Micro-Assembly System. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2016, 230, 1164–1168. [Google Scholar] [CrossRef]
  7. Wang, J.; Zhou, X.; Wu, J. Chip Appearance Defect Recognition Based on Convolutional Neural Network. Sensors 2021, 21, 7076. [Google Scholar] [CrossRef] [PubMed]
  8. Zhou, S.; Yao, S.; Shen, T.; Wang, Q. A Novel End-to-End Deep Learning Framework for Chip Packaging Defect Detection. Sensors 2024, 24, 5837. [Google Scholar] [CrossRef] [PubMed]
  9. Lin, H.-D. Computer-Aided Visual Inspection of Surface Defects in Ceramic Capacitor Chips. J. Mater. Process. Technol. 2007, 189, 19–25. [Google Scholar] [CrossRef]
  10. Xie, P.; Guan, S.-U. A Golden-Template Self-Generating Method for Patterned Wafer Inspection. Mach. Vis. Appl. 2000, 12, 149–156. [Google Scholar] [CrossRef]
  11. Zhang, M.; Li, M.; Zhang, J.; Liu, L.; Li, H. Onset Detection of Ultrasonic Signals for the Testing of Concrete Foundation Piles by Coupled Continuous Wavelet Transform and Machine Learning Algorithms. Adv. Eng. Inform. 2020, 43, 101034. [Google Scholar] [CrossRef]
  12. Chen, L.-C.; Le, M.-T.; Phuc, D.C.; Lin, S.-T. In-Situ Volumetric Topography of IC Chips for Defect Detection Using Infrared Confocal Measurement with Active Structured Light. Meas. Sci. Technol. 2014, 25, 094013. [Google Scholar] [CrossRef]
  13. Zhang, D.; Wang, A.; Mo, R.; Wang, D. End-to-End Acceleration of the YOLO Object Detection Framework on FPGA-Only Devices. Neural Comput. Appl. 2024, 36, 1067–1089. [Google Scholar] [CrossRef]
  14. Liu, X.; Lin, Y. YOLO-GW: Quickly and Accurately Detecting Pedestrians in a Foggy Traffic Environment. Sensors 2023, 23, 5539. [Google Scholar] [CrossRef] [PubMed]
  15. Kang, S.-H.; Park, J.-S. Aligned Matching: Improving Small Object Detection in SSD. Sensors 2023, 23, 2589. [Google Scholar] [CrossRef] [PubMed]
  16. Hu, M.; Li, Z.; Yu, J.; Wan, X.; Tan, H.; Lin, Z. Efficient-Lightweight Yolo: Improving Small Object Detection in Yolo for Aerial Images. Sensors 2023, 23, 6423. [Google Scholar] [CrossRef] [PubMed]
  17. Huang, H.; Tang, X.; Wen, F.; Jin, X. Small Object Detection Method with Shallow Feature Fusion Network for Chip Surface Defect Detection. Sci. Rep. 2022, 12, 3914. [Google Scholar] [CrossRef] [PubMed]
  18. Capó, M.; Pérez, A.; Lozano, J.A. An Efficient Split-Merge Re-Start for the K-Means Algorithm. IEEE Trans. Knowl. Data Eng. 2020, 34, 1618–1627. [Google Scholar] [CrossRef]
  19. Li, K.; Xu, L.; Su, L.; Gu, J.; Ji, Y.; Wang, G.; Ming, X. X-Ray Detection of Ceramic Packaging Chip Solder Defects Based on Improved YOLOv5. NDT E Int. 2024, 143, 103048. [Google Scholar] [CrossRef]
  20. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference On Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  21. Wang, J.; Lin, B.; Li, G.; Zhou, Y.; Zhong, L.; Li, X.; Zhang, X. YOLO-Xray: A Bubble Defect Detection Algorithm for Chip X-Ray Images Based on Improved YOLOv5. Electronics 2023, 12, 3060. [Google Scholar] [CrossRef]
  22. Cao, Y.; Ni, Y.; Zhou, Y.; Li, H.; Huang, Z.; Yao, E. An Auto Chip Package Surface Defect Detection Based on Deep Learning. IEEE Trans. Instrum. Meas. 2023, 73, 3507115. [Google Scholar] [CrossRef]
  23. Wang, S.; Xia, X.; Ye, L.; Yang, B. Automatic Detection and Classification of Steel Surface Defect Using Deep Convolutional Neural Networks. Metals 2021, 11, 388. [Google Scholar] [CrossRef]
  24. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  25. Wang, W.; Lu, X.; He, Z.; Shi, T. Using Convolutional Neural Network for Intelligent SAM Inspection of Flip Chips. Meas. Sci. Technol. 2021, 32, 115022. [Google Scholar] [CrossRef]
  26. Zheng, P.; Lou, J.; Wan, X.; Luo, Q.; Li, Y.; Xie, L.; Zhu, Z. LED Chip Defect Detection Method Based on a Hybrid Algorithm. Int. J. Intell. Syst. 2023, 2023, 4096164. [Google Scholar]
  27. Li, M.; Chen, N.; Suo, X.; Yin, S.; Liu, J. An Efficient Defect Detection Method for Nuclear-Fuel Rod Grooves through Weakly Supervised Learning. Measurement 2023, 222, 113708. [Google Scholar] [CrossRef]
  28. Ge, P.; Chen, Y.; Wang, G.; Weng, G. An Active Contour Model Based on Jeffreys Divergence and Clustering Technology for Image Segmentation. J. Vis. Commun. Image Represent. 2024, 99, 104069. [Google Scholar] [CrossRef]
  29. Wang, G.; Li, Z.; Weng, G.; Chen, Y. An Optimized Denoised Bias Correction Model with Local Pre-Fitting Function for Weak Boundary Image Segmentation. Signal Process. 2024, 220, 109448. [Google Scholar] [CrossRef]
  30. Li, M.; Chen, N.; Hu, Z.; Li, R.; Yin, S.; Liu, J. A Global Feature Interaction Network (GFINet) for Image Segmentation of GaN Chips. Adv. Eng. Inform. 2024, 62, 102670. [Google Scholar] [CrossRef]
  31. Ge, P.; Chen, Y.; Wang, G.; Weng, G.; Chen, H. A Level Set Approach Using Adaptive Local Pre-Fitting Energy for Image Segmentation with Intensity Non-Uniformity. J. Intell. Fuzzy Syst. 2024, 46, 11003–11024. [Google Scholar]
  32. Shankar, N.; Zhong, Z. A Rule-Based Computing Approach for the Segmentation of Semiconductor Defects. Microelectron. J. 2006, 37, 500–509. [Google Scholar] [CrossRef]
  33. Zhang, G.; Xue, J.-H.; Xie, P.; Yang, S.; Wang, G. Non-Local Aggregation for RGB-D Semantic Segmentation. IEEE Signal Process. Lett. 2021, 28, 658–662. [Google Scholar] [CrossRef]
  34. Niu, S.; Peng, Y.; Li, B.; Qiu, Y.; Niu, T.; Li, W. A Novel Deep Learning Motivated Data Augmentation System Based on Defect Segmentation Requirements. J. Intell. Manuf. 2024, 35, 687–701. [Google Scholar] [CrossRef]
  35. Zhang, C.; Liu, X.; Ning, X.; Bai, Y. Hybridformer: An Efficient and Robust New Hybrid Network for Chip Image Segmentation. Appl. Intell. 2023, 53, 28592–28610. [Google Scholar] [CrossRef]
  36. Singh, P.; Mazumder, P.; Namboodiri, V.P. Context Extraction Module for Deep Convolutional Neural Networks. Pattern Recognit. 2022, 122, 108284. [Google Scholar] [CrossRef]
  37. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  38. Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for Mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  39. Jia, P.; Liu, F. Lightweight Feature Enhancement Network for Single-Shot Object Detection. Sensors 2021, 21, 1066. [Google Scholar] [CrossRef]
  40. Huang, X.; Mao, Y.; Li, J.; Wu, S.; Chen, X.; Lu, H. CRUN: A Super Lightweight and Efficient Network for Single-Image Super Resolution. Appl. Intell. 2023, 53, 29557–29569. [Google Scholar] [CrossRef]
  41. Gao, H.; Yang, Y.; Li, C.; Gao, L.; Zhang, B. Multiscale Residual Network with Mixed Depthwise Convolution for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 3396–3408. [Google Scholar] [CrossRef]
  42. Noh, H.; Hong, S.; Han, B. Learning Deconvolution Network for Semantic Segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1520–1528. [Google Scholar]
  43. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  44. Yan, Z.; Chen, J.; Hu, R.; Huang, T.; Chen, Y.; Wen, S. Training Memristor-Based Multilayer Neuromorphic Networks with SGD, Momentum and Adaptive Learning Rates. Neural Netw. 2020, 128, 142–149. [Google Scholar] [CrossRef] [PubMed]
  45. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  46. He, Z.; Pei, Z.; Li, E.; Zhou, E.; Huang, Z.; Xing, Z.; Li, B. An Image Segmentation-Based Localization Method for Detecting Weld Seams. Adv. Eng. Softw. 2024, 194, 103662. [Google Scholar] [CrossRef]
  47. Li, G.; Liu, Q.; Zhao, S.; Qiao, W.; Ren, X. Automatic Crack Recognition for Concrete Bridges Using a Fully Convolutional Neural Network and Naive Bayes Data Fusion Based on a Visual Detection System. Meas. Sci. Technol. 2020, 31, 075403. [Google Scholar] [CrossRef]
  48. Sun, Y.; Zheng, W. HRNet-and PSPNet-Based Multiband Semantic Segmentation of Remote Sensing Images. Neural Comput. Appl. 2023, 35, 8667–8675. [Google Scholar] [CrossRef]
  49. Zhang, H.; Dana, K.; Shi, J.; Zhang, Z.; Wang, X.; Tyagi, A.; Agrawal, A. Context Encoding for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7151–7160. [Google Scholar]
  50. Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. Gcnet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
  51. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected Crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
  52. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The structure of the proposed LDDP-Net.
Figure 2. The structure of the depthwise convolution and the pointwise convolution.
Figure 3. The structure of the proposed dual decoding paths.
Figure 4. The strategy of the feature fusion.
Figure 5. The segmentation result comparison of different models. (a) Original image. (b) Ground truth. (c) FCN. (d) PSPNet. (e) EncNet. (f) GCNet. (g) DeepLabV3. (h) SegNet. (i) LDDP-Net.
Figure 6. The segmentation result comparison of different models on the complex defect pattern. (a) Original image. (b) Ground truth. (c) FCN. (d) PSPNet. (e) EncNet. (f) GCNet. (g) DeepLabV3. (h) SegNet. (i) LDDP-Net.
Figure 7. Curves of mIoU and mPA when different downsampling multiples are adopted.
Table 1. The detailed structural composition of the modified MobileNetV3.
| Input | Operator | Exp Size | #Out | SE | Activation | Stride |
|---|---|---|---|---|---|---|
| 160 × 512 × 3 | CBA | - | 16 | - | Hardswish | 2 |
| 80 × 256 × 16 | Block1, 3 × 3 | 16 | 16 | - | ReLU | 1 |
| 80 × 256 × 16 | Block2, 3 × 3 | 64 | 24 | - | ReLU | 2 |
| 40 × 128 × 24 | Block2, 3 × 3 | 72 | 24 | - | ReLU | 1 |
| 40 × 128 × 24 | Block2, 5 × 5 | 72 | 40 | ✓ | ReLU | 2 |
| 20 × 64 × 40 | Block2, 5 × 5 | 120 | 40 | ✓ | ReLU | 1 |
| 20 × 64 × 40 | Block2, 5 × 5 | 120 | 40 | ✓ | ReLU | 1 |
| 20 × 64 × 40 | Block2, 3 × 3 | 240 | 80 | - | Hardswish | 1 |
| 20 × 64 × 80 | Block2, 3 × 3 | 200 | 80 | - | Hardswish | 1 |
| 20 × 64 × 80 | Block2, 3 × 3 | 184 | 80 | - | Hardswish | 1 |
| 20 × 64 × 80 | Block2, 3 × 3 | 184 | 80 | - | Hardswish | 1 |
| 20 × 64 × 80 | Block2, 3 × 3 | 480 | 112 | ✓ | Hardswish | 1 |
| 20 × 64 × 112 | Block2, 3 × 3 | 672 | 112 | ✓ | Hardswish | 1 |
| 20 × 64 × 112 | Block2, 5 × 5 | 672 | 160 | ✓ | Hardswish | 1 |
| 20 × 64 × 160 | Block2, 5 × 5 | 960 | 160 | ✓ | Hardswish | 1 |
| 20 × 64 × 160 | Block2, 5 × 5 | 960 | 160 | ✓ | Hardswish | 1 |
Table 2. Comparison of performance indicators of different networks on the chip dataset.
| Model | IoU % (Background) | IoU % (Lead) | IoU % (Electrode) | IoU % (Light-Emitting Area) | IoU % (Defect) | mIoU % | mPA % | Params | FLOPs |
|---|---|---|---|---|---|---|---|---|---|
| FCN [47] | 96.21 | 52.90 | 93.99 | 94.30 | 59.55 | 79.39 | 84.93 | 47.11 M | 61.46 G |
| PSPNet [48] | 96.57 | 48.91 | 93.80 | 93.87 | 55.84 | 77.80 | 83.49 | 46.60 M | 55.50 G |
| EncNet [49] | 96.54 | 62.19 | 95.23 | 95.12 | 72.58 | 84.33 | 93.62 | 33.49 M | 43.68 G |
| GCNet [50] | 96.46 | 61.64 | 94.81 | 95.01 | 73.55 | 84.30 | 94.37 | 47.24 M | 61.46 G |
| DeepLabV3 [51] | 97.02 | 60.30 | 91.51 | 94.74 | 67.98 | 82.31 | 87.61 | 39.64 M | 51.23 G |
| SegNet [52] | 97.18 | 79.59 | 95.22 | 96.46 | 69.18 | 87.52 | 92.33 | 29.44 M | 50.35 G |
| LDDP-Net | 97.26 | 82.30 | 95.70 | 97.03 | 78.75 | 90.29 | 94.79 | 2.98 M | 2.24 G |
Table 3. Comparison of performance metrics for different decoding paths on the chip dataset.
| Model | IoU % (Background) | IoU % (Lead) | IoU % (Electrode) | IoU % (Light-Emitting Area) | IoU % (Defect) | mIoU % | mPA % |
|---|---|---|---|---|---|---|---|
| Only coarse decoding path | 88.68 | 56.13 | 77.79 | 92.17 | 61.95 | 75.34 | 82.35 |
| Only fine-grained decoding path | 96.46 | 77.81 | 91.38 | 94.91 | 39.15 | 79.94 | 83.78 |
| Dual decoding paths | 97.26 | 82.68 | 95.70 | 97.03 | 78.75 | 90.29 | 94.79 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
