Article

A Defect Detection Algorithm for Optoelectronic Detectors Utilizing GLV-YOLO

Xinfang Zhao, Qinghua Lyu, Hui Zeng, Zhuoyi Ling, Zhongsheng Zhai, Hui Lyu, Saffa Riffat, Benyuan Chen and Wanting Wang
1 National “111 Research Center” Microelectronics and Integrated Circuits, School of Science, Hubei University of Technology, Wuhan 430068, China
2 School of Mechanical Engineering, Hubei University of Technology, Wuhan 430068, China
3 College of Physics and Mechanical Engineering, Hubei University of Education, Wuhan 430205, China
4 Department of Architecture & Built Environment, The University of Nottingham, Nottingham NG7 2RD, UK
* Author to whom correspondence should be addressed.
Micromachines 2025, 16(3), 267; https://doi.org/10.3390/mi16030267
Submission received: 25 December 2024 / Revised: 24 February 2025 / Accepted: 25 February 2025 / Published: 26 February 2025

Abstract

Photodetectors are indispensable in a multitude of applications, with the detection of surface defects serving as a cornerstone for their production and advancement. To meet the demands of real-time and accurate defect detection, this paper introduces an optimization algorithm based on the GLV-YOLO model tailored for photodetector defect detection in manufacturing settings. The algorithm achieves a reduction in the model complexity and parameter count by incorporating the GhostC3_MSF module. Additionally, it enhances feature extraction capabilities with the integration of the LSKNet_3 attention mechanism. Furthermore, it improves generalization performance through the utilization of the WIoU loss function, which minimizes geometric penalties. The experimental results showed that the proposed algorithm achieved 98.9% accuracy, with 2.1 million parameters and a computational cost of 7.0 GFLOPs. Compared to other methods, our approach outperforms them in both performance and efficiency, fulfilling the real-time and precise defect detection needs of photodetectors.

1. Introduction

Photodiodes, which function to convert optical radiation into electrical signals, find extensive application across numerous fields [1,2,3]. Among the various types of photodiodes, positive–intrinsic–negative (PIN) photodiodes are particularly esteemed for their exceptional frequency response, high sensitivity, and superior signal-to-noise ratio. These attributes render PIN photodiodes ideal for monitoring and imaging tasks within the visible and infrared wavelength ranges [4,5,6]. Due to these qualities, PIN photodiodes have established themselves as indispensable components in a wide range of fields, including aerospace [7], defense and security, optical communications [8], medical devices [9], and scientific instruments. Despite their simple structure and well-established manufacturing processes, the production of photodiodes is still susceptible to defects caused by factors like manufacturing techniques, environmental conditions, and equipment variability [10,11]. As the semiconductor industry progresses and device dimensions continue to shrink, the demand for higher quality photodiodes has intensified [12]. Consequently, surface defect detection is crucial not only for enhancing the device quality and yield but also for driving innovation in the semiconductor industry.
The need for automated surface defect detection in semiconductor manufacturing has grown significantly. Common defects include foreign particles, fractures, and organic contamination, all of which can compromise device performance [13]. Traditionally, these defects were identified manually by skilled inspectors, a process that is inefficient, error-prone, and costly. Machine vision methods have provided an alternative, offering reliable detection for simple and well-defined defects, but they often fail when addressing complex defect patterns [14,15]. The advent of computer vision technologies has resolved many of these limitations, enabling their application in semiconductor defect detection. For example, Jiabin Jiang et al. developed a U-Net-based convolutional neural network (CNN) with an encoder–decoder architecture to detect surface defects on screen-printed smartphone back glass, achieving over 91% accuracy and a recall rate exceeding 95% [16]. Similarly, Hang Zhang et al. proposed a deep learning-based three-stage approach for TO56 semiconductor laser defect detection, encompassing localization, segmentation, and defect pattern recognition [17]. Shang Wu et al. introduced the Dense Skip Connection U-Net (DSCU-Net), which optimized skip connections between the encoder and decoder to enhance high-order feature integration, effectively addressing defects in semiconductor chip manufacturing [18]. However, these methods are often limited by high computational demands, complex models, and slower detection speeds.
The You Only Look Once (YOLO) framework has gained popularity for its balance of speed and accuracy in object detection [19,20,21]. For instance, Fei Ren et al. introduced the ECA-SimSPPF-SIoU-YOLOv5 algorithm for steel defect detection, achieving a 7.1% improvement in the mean average precision (mAP) compared to the original YOLOv5s model [22]. Moyun Liu et al. developed the LF-YOLO algorithm for detecting welding defects in X-ray images, incorporating an efficient feature extraction (EFE) module that achieved an mAP of 92.9% and a frame-per-second (FPS) rate of 61.5 [21]. Yunchang Zheng et al. proposed the GBCD-YOLO model for high-precision, lightweight, and real-time wood surface defect detection, reporting a 13.45% improvement in the mAP(0.5) and an 11.95% improvement in the mAP(0.5:0.95), along with a 6.25 FPS increase compared to YOLOv5s, while reducing the model parameters by 15.49% [23]. These studies demonstrate the potential of YOLO-based models for surface defect detection in PIN photodiodes.
Despite these advances, limited research has focused specifically on using deep learning for detecting surface defects in PIN photodiodes. In this study, we compiled a labeled dataset tailored to PIN photodiode surface defects and applied diverse data preprocessing techniques to improve the model robustness in practical scenarios. To address the challenges of maintaining detection accuracy while minimizing the model complexity and computational cost, this paper introduces a GLV-YOLO-based optimization algorithm tailored for PIN photodiode surface defect detection. The proposed model outperforms YOLOv8 in terms of accuracy, with fewer parameters and reduced computational requirements.

2. Related Work

One-stage detection models are renowned for their speed and accuracy, with the YOLO series being a prime example of this category. YOLOv8, developed by Ultralytics, represents one of the latest advancements in the YOLO series of object detection algorithms. The YOLOv8 framework is available in five variants, YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x, ordered by increasing model weight, width, and depth. As illustrated in Figure 1, the overall network architecture of YOLOv8 comprises four main components: the input layer, Backbone layer, Neck layer, and Head layer.
Backbone: YOLOv8 utilizes CSPDarkNet as its Backbone network. The Conv module consists of convolutional layers, Batch Normalization (BN) layers [24], and an SiLU activation function [25]. The convolutional layers extract feature information from the input image, which is then processed through the BN layers to accelerate the network’s convergence. Finally, the SiLU activation function helps the network better adapt to complex data and model more intricate functions. The BN layer operations are detailed in Formulas (1)–(4), which include calculating the sample mean, computing the sample variance, standardizing the data, and applying translation and scaling, where γ and β are learnable parameters.
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \qquad (1)$$
$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2 \qquad (2)$$
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} \qquad (3)$$
$$y_i = \gamma\,\hat{x}_i + \beta = \mathrm{BN}_{\gamma,\beta}(x_i) \qquad (4)$$
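As a concrete illustration (not taken from the paper's code), the training-time behavior of Formulas (1)–(4) can be sketched in a few lines of NumPy; the momentum-based running statistics used at inference time are omitted:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization over a mini-batch x of shape (m, C).

    Implements Formulas (1)-(4): sample mean, sample variance,
    standardization, and the learnable scale/shift (gamma, beta).
    """
    mu = x.mean(axis=0)                     # Formula (1): sample mean
    var = ((x - mu) ** 2).mean(axis=0)      # Formula (2): sample variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # Formula (3): standardization
    return gamma * x_hat + beta             # Formula (4): scale and shift

# Example: normalize a mini-batch of 8 feature vectors with 4 channels.
x = np.random.randn(8, 4)
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```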
The SiLU activation function is smooth, and its derivative approaches 1 as the input becomes large, which allows it to retain more feature information during forward propagation. The computation process for SiLU is shown in Formula (5).
$$\mathrm{SiLU}(x) = x \cdot \mathrm{sigmoid}(x) = \frac{x}{1 + e^{-x}} \qquad (5)$$
The C2f module is introduced as a residual connection module to better preserve the feature information of the original image. This module splits the output into two parts. One part is passed directly to the output, while the other part is processed through multiple Bottleneck modules. Finally, the results from both parts are concatenated along the channel dimension and passed through a second convolutional layer (Conv) to obtain the final output. The SPPF module is then used to capture features at different scales.
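For readers who prefer code, the Conv block (convolution + BN + SiLU) and the split/concatenate pattern of the C2f module can be sketched in PyTorch roughly as follows; the layer sizes and exact wiring are simplified assumptions rather than the Ultralytics implementation:

```python
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    """Conv module: convolution + Batch Normalization + SiLU activation."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()  # SiLU(x) = x * sigmoid(x), Formula (5)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """Residual bottleneck used inside the C2f block."""
    def __init__(self, c):
        super().__init__()
        self.cv1 = ConvBNSiLU(c, c, 3)
        self.cv2 = ConvBNSiLU(c, c, 3)

    def forward(self, x):
        return x + self.cv2(self.cv1(x))

class C2f(nn.Module):
    """C2f-style block: split, pass one branch through n bottlenecks, concat."""
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.c = c_out // 2
        self.cv1 = ConvBNSiLU(c_in, 2 * self.c, 1)
        self.m = nn.ModuleList(Bottleneck(self.c) for _ in range(n))
        self.cv2 = ConvBNSiLU(2 * self.c + n * self.c, c_out, 1)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))   # split into two parts
        for m in self.m:
            y.append(m(y[-1]))                  # bottleneck branch outputs
        return self.cv2(torch.cat(y, dim=1))    # concat along the channel dim
```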
Neck: This layer adopts the FPN+PAN feature pyramid structure [26], which performs feature fusion across different scale levels. This effectively extracts feature information at various scales, constructing a feature pyramid with rich information across all scales.
Head: The Head consists of three different scale layers that output information such as the object confidence, class scores, and anchor box coordinates for objects of various sizes. Non-Maximum Suppression [27] (NMS) is then applied to obtain the final object coordinates.
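For reference, a minimal greedy NMS routine of the kind applied here might look as follows; the IoU threshold of 0.45 is an illustrative value, and torchvision's box IoU utility is used for the overlap computation:

```python
import torch
from torchvision.ops import box_iou

def nms(boxes, scores, iou_thr=0.45):
    """Greedy Non-Maximum Suppression.

    boxes:  (N, 4) tensor in (x1, y1, x2, y2) format
    scores: (N,) confidence scores
    Returns the indices of the boxes that are kept.
    """
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        ious = box_iou(boxes[i].unsqueeze(0), boxes[order[1:]]).squeeze(0)
        order = order[1:][ious <= iou_thr]  # drop boxes that overlap too much
    return torch.tensor(keep)
```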
Loss Function: YOLOv8 uses CIoU [28] as the loss function for anchor boxes, as shown in Formulas (6)–(8), where α represents the weight function and ν measures the aspect ratio. A diagram of the CIoU loss is illustrated in Figure 2.
$$\alpha = \frac{v}{(1 - IoU) + v} \qquad (6)$$
$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \qquad (7)$$
$$\mathrm{CIoU\ Loss} = 1 - IoU + \frac{\rho^2\left(b, b^{gt}\right)}{c^2} + \alpha v \qquad (8)$$
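A hedged PyTorch translation of Formulas (6)–(8), assuming boxes in (x1, y1, x2, y2) format and adding small epsilon terms for numerical stability, is given below:

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Intersection and union -> IoU
    iw = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
    ih = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = iw * ih
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared centre distance rho^2 and diagonal c^2 of the enclosing box
    cx_p, cy_p = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx_t, cy_t = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term v (Formula 7) and weight alpha (Formula 6)
    w_p, h_p = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w_t, h_t = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v   # Formula (8)
```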

3. Proposed Method

The GLV-YOLO model framework, shown in Figure 3, consists of four main components: the input layer, Backbone layer, Neck layer, and Head layer. The input layer performs image preprocessing before the images are passed into the model. Various augmentation techniques, including geometric transformations, exposure adjustments, field-of-view darkening, and noise addition, are applied to enhance the dataset and improve the model’s robustness.
In the Backbone layer, the GhostConv [29] and GhostC3 models are used to reduce the model’s parameter count and computational complexity, achieving a more lightweight design. For the last two layers, which have more channels, GhostC3_MSF redistributes the weights across these channels to improve feature extraction. Additionally, the LSKNet [30] module is incorporated to capture contextual information at multiple scales, further enhancing the model’s feature extraction capabilities.
The Neck layer uses an FPN+PAN feature pyramid structure [26] to fuse features across multiple scales. This structure efficiently extracts information at various scales and constructs a feature-rich pyramid. Lightweight modules, such as GSConv [31] and VoVGSCSP, replace the original Neck modules, reducing the number of parameters while maintaining accuracy.
Finally, in the Head layer, predictions are made for class labels and anchor box locations. The WIoU loss [32] function is used for the localization loss, which more accurately estimates the model performance and leads to improved predictions with lower loss.

3.1. Lightweight Backbone Network

In convolutional neural networks (CNNs), frequent convolution operations often lead to increased model parameters and computational costs. To address this issue, the present study replaced the original YOLOv8 Backbone with the GhostConv and GhostC3 modules, resulting in a lightweight and efficient architecture. The model uses multiple 1 × 1 convolutional kernels instead of the larger kernels from the original network and incorporates depthwise convolutions. This design reduces the number of parameters while improving computational efficiency.
Although the use of depthwise separable convolutions improves the efficiency of the model, it may lead to a loss in accuracy. To strike a balance between model efficiency and accuracy, we adopted a multi-scale selective fusion approach, which redistributes weight information for comprehensive fusion. This approach reduces the model parameters while enhancing accuracy. In the latter half of the Backbone, we designed the GhostC3_MSF (multi-scale selective fusion) network. In this module, we replaced the original GhostBottleneck structure in the GhostC3 module with our proposed GhostBottleneck_MSF structure.
For the input feature layer Fin of the GhostBottleneck_MSF, we first replaced the residual structure in the original YOLOv8 with the GhostBottleneck [29], aiming to reduce the computational complexity using lightweight convolution operations while ensuring no significant performance degradation, thereby preserving the model’s ability to extract spatial features effectively. The output feature maps FM1 and FM2 from the GhostBottleneck are then processed through global average pooling and 1 × 1 convolution operations, transforming spatial features into channel-wise representations. These representations are used to generate the channel attention weights W1 and W2.
Next, the weights W1 and W2 are concatenated, and the Softmax and sigmoid activation functions are applied to compute the normalized probabilities of the two weights, resulting in the redistributed weights Wnew1 and Wnew2. This process enhances the model’s focus on the most important channels, emphasizing relevant features while suppressing less significant ones. The updated weight information is then applied to the feature maps through weighted fusion, yielding refined spatial feature information. Finally, a residual connection is established between the two feature layers to enable the learning of deeper, more abstract features. The specific architecture of the GhostBottleneck_MSF structure is illustrated in Figure 4.
In summary, by replacing the original GhostBottleneck structure in the GhostC3 network with our newly designed GhostBottleneck_MSF, we created the GhostC3_MSF network, which combines the benefits of multi-scale fusion with efficient computation and enhanced accuracy.
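To make the idea concrete, the sketch below shows one plausible PyTorch form of a GhostConv block and the channel-reweighting fusion used inside GhostBottleneck_MSF; the kernel sizes, the softmax normalization, and the residual wiring are assumptions rather than the authors' exact design:

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a cheap depthwise conv generates 'ghost' features
    from a primary convolution, halving the expensive channel count."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(  # depthwise "cheap" operation
            nn.Conv2d(c_half, c_half, 5, padding=2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class MSFFusion(nn.Module):
    """Multi-scale selective fusion sketch: redistribute channel weights
    between two feature maps FM1 and FM2 and fuse them with a residual."""
    def __init__(self, c):
        super().__init__()
        self.fc1 = nn.Conv2d(c, c, 1)
        self.fc2 = nn.Conv2d(c, c, 1)

    def forward(self, fm1, fm2):
        # Global average pooling + 1x1 conv -> channel attention weights W1, W2
        w1 = self.fc1(fm1.mean(dim=(2, 3), keepdim=True))
        w2 = self.fc2(fm2.mean(dim=(2, 3), keepdim=True))
        # Normalize the two weights per channel (softmax over the branch axis)
        w = torch.softmax(torch.stack([w1, w2], dim=0), dim=0)
        fused = w[0] * fm1 + w[1] * fm2   # weighted fusion of the two maps
        return fused + fm1                # residual connection (assumed wiring)
```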

3.2. Lightweight Modules in the Neck Layer

Within the Neck layer, the feature scales are reduced, concentrating feature information within the channels with minimal redundancy. This allows the model to maintain performance while reducing the number of parameters. In this study, the GSConv and VoVGSCSP lightweight modules were introduced to replace the convolutional components in the Neck layer of the original model. Compared to other lightweight modules, GSConv effectively preserves hidden connections between channels, which reduces the network’s complexity while maintaining high accuracy. This approach balances model performance and speed.
In the GSConv module, the input has C1 channels, and the output has C2 channels. First, a standard convolution operation is applied to the input feature layer Fin, producing a hidden feature layer with C2/2 channels, which reduces the number of parameters. Next, depthwise separable convolution (DWConv) is applied to the hidden feature layer, keeping the number of channels at C2/2. Then, a concatenation operation (Concat) is used to merge the results of the standard convolution and depthwise convolution, ensuring the final output has C2 channels. Finally, a channel shuffle operation is applied to enhance the fusion of information across different channels, improving the model’s ability to extract semantic information. The specific steps are illustrated in Figure 5a,b.
Next, the GSConv modules are used to construct the GSBottleNeck module, as shown in Figure 5c. The module consists of two paths: the first path uses two GSConv layers, with the first outputting half of the target output channels; the second path is composed of a 1 × 1 point-wise convolution (PWConv). The outputs of both paths are then fused through a residual connection to generate an output feature map with the desired C2 channels.
The final VoVGSCSP module is constructed by combining the GSBottleNeck module with several PWConv layers. In the VoVGSCSP module, the PWConv layers reduce the input channels by half, and the results of the GSBottleNeck module are concatenated with the output from the PWConv layers. This process produces the final feature map with C2 output channels. The detailed operation is illustrated in Figure 5d.
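A minimal PyTorch sketch of the GSConv operation described above (standard convolution to C2/2 channels, a depthwise convolution on the hidden layer, concatenation, and a channel shuffle) is shown below; the kernel sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    """Interleave channels so standard-conv and depthwise-conv features mix."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class GSConv(nn.Module):
    """GSConv sketch: standard conv to C2/2 channels, depthwise conv on the
    hidden layer, concatenation back to C2 channels, then a channel shuffle."""
    def __init__(self, c1, c2, k=3, s=1):
        super().__init__()
        c_half = c2 // 2
        self.conv = nn.Sequential(
            nn.Conv2d(c1, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dwconv = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y1 = self.conv(x)        # standard convolution branch (C2/2 channels)
        y2 = self.dwconv(y1)     # depthwise branch on the hidden layer
        return channel_shuffle(torch.cat([y1, y2], dim=1))

# Example: 64 -> 128 channels on a 40x40 feature map
out = GSConv(64, 128)(torch.randn(1, 64, 40, 40))
print(out.shape)  # torch.Size([1, 128, 40, 40])
```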

3.3. The Introduction of the LSKNet_3 Attention Mechanism

In the dataset used in this study, many defect types occurred at the edges of the images, making edge detection particularly crucial. To improve the network’s sensitivity to edge defects, larger convolutional kernels were employed to expand the receptive field. A larger receptive field typically enhances the network’s ability to capture global features, enabling it to learn more comprehensive, broader, and semantically richer representations. However, large convolutional kernels pose challenges, including the potential loss of fine details and an increase in the number of parameters.
To overcome these challenges, this study utilized the LSKNet model, which decomposes large convolutional kernels into dilated convolutions with varying sizes and dilation rates. This approach allows the network to generate feature representations with different receptive fields, effectively capturing edge and other defect information. Moreover, the decomposition of large kernels into smaller ones reduces the parameter count, improving the network’s efficiency while maintaining strong semantic information extraction.
The LSKNet_3 network is composed of three key components: the decomposition of large convolutional kernels, spatial feature interaction, and feature-weighted summation. The decomposed convolutional layers are first applied to the input feature map X∈RC×H×W, enabling the extraction of contextual information at various scales.
$$U_0 = X, \qquad U_{i+1} = F_i^{dK}(U_i) \qquad (9)$$
The ith depthwise separable convolution layer, with a kernel size of $K_i$ and a dilation rate of $d_i$, is denoted by $F_i^{dK}(\cdot)$, and $U_{i+1}$ denotes the output of the (i + 1)th feature layer. A subsequent 1 × 1 convolution, $F_i^{1\times 1}$, is applied to divide the input convolution channels into N groups, facilitating the extraction of distinct features from different receptive fields.
$$\widetilde{U}_i = F_i^{1\times 1}(U_i) \qquad (10)$$
Here, $\widetilde{U}_i \in \mathbb{R}^{C\times H\times W}$ denotes the features at the ith scale.
To direct the network’s focus toward the most relevant spatial regions, features at multiple scales are extracted using three distinct receptive fields and concatenated along the channel dimension. Then, global average pooling and global max pooling operations are performed along the channel axis to compress the global channel information.
$$\widetilde{U} = \mathrm{Concat}\left(\widetilde{U}_1, \ldots, \widetilde{U}_N\right) \qquad (11)$$
$$SA_{avg} = \mathrm{AvgPool}(\widetilde{U}), \qquad SA_{avg} \in \mathbb{R}^{1\times H\times W} \qquad (12)$$
$$SA_{max} = \mathrm{MaxPool}(\widetilde{U}), \qquad SA_{max} \in \mathbb{R}^{1\times H\times W} \qquad (13)$$
To facilitate the interaction between the two types of spatial information, their respective feature layers are concatenated. A convolutional layer, followed by a sigmoid function, is then used to compute the corresponding weight $W_i \in \mathbb{R}^{1\times H\times W}$ for each scale. These N weights are applied to the N feature layers, where they are used for weighted feature fusion.
$$W_i = \mathrm{sigmoid}\left(F^{2\rightarrow N}\left(\mathrm{Concat}\left(SA_{avg}, SA_{max}\right)\right)\right), \qquad W_i \in \mathbb{R}^{1\times H\times W} \qquad (14)$$
$$S = F\left(\sum_{i=1}^{N} W_i \cdot \widetilde{U}_i\right), \qquad S \in \mathbb{R}^{C\times H\times W} \qquad (15)$$
The final output is obtained as the element-wise product between the input feature $X \in \mathbb{R}^{C\times H\times W}$ and the output weight layer $Y \in \mathbb{R}^{C\times H\times W}$.
The LSKNet_3 model is illustrated in Figure 6.
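The following PyTorch sketch captures the flow of Equations (9)–(15): cascaded depthwise convolutions with growing receptive fields, channel-wise average/max pooling, sigmoid spatial weights, weighted fusion, and the final element-wise product with the input. The kernel sizes and dilation rates are illustrative assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class LSKAttention(nn.Module):
    """Sketch of an LSKNet_3-style selective large-kernel attention block."""
    def __init__(self, c):
        super().__init__()
        self.dw = nn.ModuleList([
            nn.Conv2d(c, c, 3, padding=1, groups=c),               # small RF
            nn.Conv2d(c, c, 5, padding=4, dilation=2, groups=c),   # medium RF
            nn.Conv2d(c, c, 7, padding=9, dilation=3, groups=c)])  # large RF
        self.proj = nn.ModuleList([nn.Conv2d(c, c, 1) for _ in range(3)])
        self.weight = nn.Conv2d(2, 3, 7, padding=3)  # 2 descriptors -> 3 scale weights
        self.out = nn.Conv2d(c, c, 1)

    def forward(self, x):
        u, feats = x, []
        for dw, pw in zip(self.dw, self.proj):
            u = dw(u)               # Eq. (9): cascade of decomposed convolutions
            feats.append(pw(u))     # Eq. (10): 1x1 projection per scale
        cat = torch.cat(feats, dim=1)
        sa_avg = cat.mean(dim=1, keepdim=True)         # Eq. (12): channel avg pooling
        sa_max = cat.max(dim=1, keepdim=True).values   # Eq. (13): channel max pooling
        w = torch.sigmoid(self.weight(torch.cat([sa_avg, sa_max], dim=1)))  # Eq. (14)
        s = sum(w[:, i:i + 1] * feats[i] for i in range(3))                 # Eq. (15)
        return x * self.out(s)      # element-wise product with the input feature

# Example: apply the attention block to a 64-channel 20x20 feature map
y = LSKAttention(64)(torch.randn(1, 64, 20, 20))
```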

3.4. Loss Function

The CIoU loss function considers the overlap between the predicted and ground truth bounding boxes, the distance between their centers, and the aspect ratio. However, low-quality samples in training datasets may be penalized more heavily due to geometric factors such as the distance and aspect ratio, which can negatively impact the model’s generalization ability. To address this issue, the WIoU loss function introduces a distance-aware mechanism (RWIoU) to reduce the influence of these geometric factors. Furthermore, while the CIoU loss function increases the computational complexity, the WIoU loss function effectively reduces the computational burden, thereby enhancing the detection speed. The WIoU loss function is described in Equations (16) and (17).
$$L_{WIoU} = R_{WIoU}\, L_{IoU} \qquad (16)$$
$$R_{WIoU} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{W_g^2 + H_g^2}\right) \qquad (17)$$
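A short sketch of the basic WIoU formulation in Equations (16) and (17) is given below; following common WIoU implementations, the enclosing-box size is detached from the gradient graph, which is an assumption not stated explicitly in the equations above:

```python
import torch

def wiou_loss(pred, target, eps=1e-7):
    """WIoU sketch: distance-aware factor R_WIoU times the IoU loss.

    pred, target: (N, 4) boxes in (x1, y1, x2, y2) format.
    """
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = iw * ih
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    l_iou = 1 - inter / (area_p + area_t - inter + eps)

    # Centre distance normalized by the enclosing-box diagonal, Eq. (17)
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    r_wiou = torch.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) /
                       ((wg ** 2 + hg ** 2).detach() + eps))
    return r_wiou * l_iou   # Eq. (16)
```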

4. Experiments and Analysis

4.1. Dataset

The primary detectors used in optical communication are PIN photodetectors and avalanche photodiodes (APDs). This study collected a dataset comprising nine different defect types occurring during the production of PIN photodiodes. The types and corresponding sample counts of photodiode surface defects collected in this study are presented in Table 1. The images in the dataset had a resolution of 237 × 239 pixels. In total, 3450 images of surface defects on PIN photodiodes were collected and split into training, validation, and test sets in an 8:1:1 ratio. Specifically, the dataset contained 2760 images for training, 345 for validation, and 345 for testing. The dataset was collected under a microscope, and the training set was labeled using a manual annotation method.
The defects in the diodes were categorized as follows: an image of a complete photodetector is shown in Figure 7a, while other defect types included breakdown (Figure 7b), electrode loss (Figure 7c), missing parts (Figure 7d), organic contamination (Figure 7e), redundancy (Figure 7f), damage (Figure 7g), foreign objects (Figure 7h), cracks (Figure 7i), and film damage (Figure 7j). A sample of the collected images depicting these defects is shown in Figure 7.

4.2. Image Preprocessing

The dataset was collected under normal conditions with the microscope sampling equipment functioning properly, resulting in high-quality samples. However, to simulate potential real-world issues, such as noise and variations in the field of view, we applied several image enhancement techniques to artificially expand the dataset. These techniques included (b) geometric transformations, (c) noise addition, (d) field-of-view brightening, and (e) field-of-view darkening. These preprocessing steps aimed to improve the robustness of the model, enabling it to handle the complex scenarios and variations often encountered during real-world defect detection. Figure 8 illustrates examples of these preprocessing steps.
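The augmentation pipeline is not given in code in the paper; a simple offline sketch with OpenCV and NumPy, in which uniform gain/bias changes stand in for the field-of-view brightness variations, could look like the following (the file name and parameter values are hypothetical):

```python
import cv2
import numpy as np

def augment(img):
    """Create simple augmented variants of one defect image:
    geometric transform, additive Gaussian noise, brightening, darkening."""
    variants = {}
    # (b) geometric transformation: horizontal flip + 90-degree rotation
    variants["geometric"] = cv2.rotate(cv2.flip(img, 1), cv2.ROTATE_90_CLOCKWISE)
    # (c) additive Gaussian noise
    noise = np.random.normal(0, 15, img.shape).astype(np.float32)
    variants["noise"] = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # (d) field-of-view brightening / (e) darkening via gain and bias
    variants["bright"] = cv2.convertScaleAbs(img, alpha=1.3, beta=30)
    variants["dark"] = cv2.convertScaleAbs(img, alpha=0.7, beta=-30)
    return variants

img = cv2.imread("defect_sample.png")  # hypothetical file name
augmented = augment(img)
```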

4.3. Computational Environment and Simulation Parameters

The computational environment used for model training and evaluation consisted of the following hardware and software components: a Windows 10 operating system, Intel(R) Xeon(R) W-2235 CPU @ 3.80 GHz, NVIDIA RTX A2000 GPU (12GB), 64 GB of RAM, PyTorch 2.2.2 with CUDA 12.1, and Python 3.10. The simulation parameters are detailed in Table 2.

4.4. Model Evaluation Metrics

To objectively evaluate the performance of defect detection tasks across various defect types, the following metrics were considered: P (precision), R (recall), the mean average precision at an IoU of 0.5 (mAP@0.5), the mean average precision at an IoU of 0.5–0.95 (mAP@0.5–0.95), and the number of model parameters. The precision represents the proportion of true positive samples among all detected targets; the recall measures the proportion of true positives correctly identified from all actual positive samples; the mAP@0.5 refers to the average precision (AP) at an Intersection over Union (IoU) threshold of 0.5, where the IoU is the ratio of the overlap between the predicted and ground truth boxes to their union; and the parameter count measures the size of the model. The specific formulas are provided in Equations (18)–(21).
$$P = \frac{TP}{TP + FP} \qquad (18)$$
$$R = \frac{TP}{TP + FN} \qquad (19)$$
$$AP = \int_0^1 P(R)\,dR \qquad (20)$$
$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i \qquad (21)$$
In this context, TP denotes true positives, where the label is positive and the prediction is also positive; FP represents false positives, where the label is negative but the prediction is positive; and FN refers to false negatives, where the label is positive but the prediction is negative. $AP_i$ denotes the average precision of the ith class, N is the number of classes, and the mAP is the mean of the per-class AP values.
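For clarity, Equations (20) and (21) amount to the following small NumPy computation (trapezoidal integration of the precision–recall curve over recall; the numbers in the example are made up):

```python
import numpy as np

def average_precision(precision, recall):
    """Area under the precision-recall curve, Eq. (20)."""
    p = np.asarray(precision, dtype=float)
    r = np.asarray(recall, dtype=float)
    order = np.argsort(r)
    p, r = p[order], r[order]
    # trapezoidal integration of P(R) over recall
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2))

def mean_average_precision(ap_per_class):
    """mAP, Eq. (21): mean of the per-class average precisions."""
    return float(np.mean(ap_per_class))

# Toy example with two classes
ap1 = average_precision([1.0, 0.9, 0.8], [0.2, 0.5, 0.9])
ap2 = average_precision([1.0, 0.7, 0.6], [0.3, 0.6, 0.8])
print(mean_average_precision([ap1, ap2]))
```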

4.5. The Experimental Results of the Model Architecture

To validate the effectiveness of the proposed improvements, experiments were conducted using different model architectures on the baseline model. First, GhostNet was used to replace the Backbone network for feature extraction. Next, the GSConv and VoVGSCSP layers were introduced in the Neck module. Finally, the LSKNet attention mechanism was applied. The results of the ablation study are presented in Table 3. The original YOLOv8 model achieved an mAP of 95.8%. After introducing GhostNet into the Backbone (Model 1), the model parameters decreased by 22.5%, with a slight reduction in the mAP. In Model 2, the addition of GSConv and VoVGSCSP layers in the Neck module led to a 13.3% reduction in the model parameters. The introduction of the LSKNet attention mechanism and changes in the network structure resulted in a slight increase in the model parameters, but the mAP improved by 3.2%. However, this led to an increase in the computational complexity. The improved YOLOv8 model demonstrated an accuracy increase of 0.8%, a recall rate improvement of 4.9%, an mAP@0.5 increase of 3.1%, and an mAP@0.5–0.95 increase of 10.5%. These results indicate that the proposed modifications significantly enhanced the performance of the YOLOv8 model in detecting surface defects in optoelectronic detectors.
To evaluate the advantages of the proposed WIoU loss function over other commonly used loss functions, a comparison was made between the WIoU and the SIoU, CIoU, EIoU, GIoU, and DIoU loss functions. As shown in the experimental results in Figure 9a for the mAP@50 and Figure 9b for the mAP@50:95, the WIoU loss function consistently outperformed the others in terms of the average precision after 25 training epochs. Specifically, as indicated in Table 4, the WIoU loss function achieved a 1.5% improvement over CIoU, a 2.6% improvement over SIoU, and a 2% improvement over EIoU in terms of the mAP@50:95.

4.6. Results Analysis

Figure 10 depicts a comparison of heatmaps for different models. Although the YOLOv8 model focuses on surface defect locations, it also assigns weights to irrelevant areas. This results in false positives and missed detections. In contrast, the GLV-YOLO model more effectively targets defect locations, assigns higher weights to these areas, and reduces the weights given to irrelevant regions. Consequently, GLV-YOLO has a lower rate of false positives and missed detections compared to YOLOv8. In these heatmaps, the colors represent the weight assigned by the model to each region, where cold colors (e.g., blue) correspond to areas with lower weights, indicating less importance or attention, and warm colors (e.g., red and yellow) represent areas with higher weights, suggesting a greater significance or focus for defect detection.
The accuracy–recall curves for the nine types of defects detected by the proposed model are shown in Figure 11, with an average accuracy of 98.6%. These results demonstrate that the model exhibits strong defect detection capabilities for optoelectronic diode surfaces. Figure 12 presents the defect detection results of the proposed model under various challenging conditions, including noise, normal imaging, field-of-view brightness variation, and geometric transformation. As shown in Figure 12, the proposed algorithm accurately identified the various defect categories on the surface of optoelectronic diodes across these detection environments.

4.7. Comparison with Other Models

In this section, we compare the proposed model with several other models, including YOLOv5s, YOLOv7-tiny, YOLOv8n, YOLOv8s, and YOLOv3-tiny. The results clearly show that our proposed model outperforms these methods. On the dataset of optoelectronic detector defects, our algorithm achieved higher accuracy than YOLOv5s, YOLOv7-tiny, and YOLOv8n. Although the mAP value of our model was comparable to that of YOLOv8s, our algorithm had fewer parameters and a lower computational cost. Table 5 presents the mAP values and computational costs of the different models. Our proposed algorithm achieved high accuracy at a small computational cost, reaching an average precision of 98.9% with a computational load of 7.0 GFLOPs. As shown in Table 5, our algorithm had 2.18 million parameters and a weight file size of 4.7 MB. Overall, the proposed algorithm maintained high precision with fewer parameters, allowing for the efficient and accurate detection of surface defects on optoelectronic diodes. However, the FPS of our model was 51, which is 19 frames lower than that of the original model.

5. Conclusions

This study presents a lightweight detection method, GLV-YOLO, designed to efficiently identify small surface defects on optoelectronic diodes. The method integrated the LSKNet_3 attention mechanism into the original model to enhance the extraction of contextual information. By using convolution kernels of varying sizes, it sampled different receptive fields, effectively capturing the context across a range of defect sizes. Furthermore, the inclusion of lightweight GSConv and VoVGSCSP modules reduced the model’s parameters from 3.1 million to 2.18 million, representing a 30.5% reduction compared to the original algorithm. The improved method boosted accuracy by 3.1%, while reducing the computational cost by 20%, ensuring precise defect detection on optoelectronic detectors’ surfaces. When compared to mainstream algorithms, the proposed method demonstrated significant improvements, making it more suitable for real-time monitoring and high-precision detection. Additionally, the model’s weight file size decreased from 5.94 MB to 4.71 MB, further enhancing its compactness. Overall, the GLV-YOLO model represents a substantial advancement in surface defect detection for optoelectronic diodes, with broad applications in real-time monitoring and quality control in semiconductor manufacturing. The findings demonstrate the feasibility of implementing more efficient, lightweight models for industrial use, and future work will focus on further optimizing the model depth and detection speed.

Author Contributions

H.L. was responsible for conceptualization, supervision, and project administration. S.R. contributed to the methodology. X.Z. contributed to the software, formal analysis, validation (with H.Z.), writing of the original draft, and data curation. Z.L. conducted the investigation. Z.Z. provided the resources. W.W. was involved in data curation. Q.L. was responsible for writing—review and editing—and funding acquisition. B.C. contributed to the visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Wuhan Key Research and Development Plan (Grant No. 2022012202015034) and the China-Sudan Joint Laboratory of New Photovoltaic Ecological Agriculture (Grant No. 2023YFE0126400).

Data Availability Statement

The data are available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Oanh Vu, T.K.; Tran, M.T.; Tu, N.X.; Thanh Bao, N.T.; Kim, E.K. Electronic transport mechanism and defect states for p-InP/i-InGaAs/n-InP photodiodes. J. Mater. Res. Technol. 2022, 19, 2742–2749.
2. Kim, B.C.; Park, K.; Jung, B. An FIR Filter Using Segmented Photodiodes for Silicon Photodiode Equalization. IEEE Photonics Technol. Lett. 2016, 28, 2515–2518.
3. Sarcan, F.; Doğan, U.; Althumali, A.; Vasili, H.B.; Lari, L.; Kerrigan, A.; Kuruoğlu, F.; Lazarov, V.K.; Erol, A. A novel NiO-based p-i-n ultraviolet photodiode. J. Alloys Compd. 2023, 934, 167806.
4. Pelamatti, A.; Goiffon, V.; Moreira, A.D.I.; Magnan, P.; Virmontois, C.; Saint-Pé, O.; Boisanger, M.B. Comparison of Pinning Voltage Estimation Methods in Pinned Photodiode CMOS Image Sensors. IEEE J. Electron Devices Soc. 2016, 4, 99–108.
5. Lee, J.; Georgitzikis, E.; Hermans, Y.; Papadopoulos, N.; Chandrasekaran, N.; Jin, M.; Siddik, A.B.; De Roose, F.; Uytterhoeven, G.; Kim, J.H.; et al. Thin-film image sensors with a pinned photodiode structure. Nat. Electron. 2023, 6, 590–598.
6. Fortsch, M.; Zimmermann, H.; Pless, H. 220-MHz monolithically integrated optical sensor with large-area integrated PIN photodiode. IEEE Sens. J. 2006, 6, 385–390.
7. Smithard, J.; Rajic, N.; Van der Velden, S.; Norman, P.; Rosalie, C.; Galea, S.; Mei, H.; Lin, B.; Giurgiutiu, V. An Advanced Multi-Sensor Acousto-Ultrasonic Structural Health Monitoring System: Development and Aerospace Demonstration. Materials 2017, 10, 832.
8. Zaman Sarker, M.S.; Itoh, S.; Hamai, M.; Takai, I.; Andoh, M.; Yasutomi, K.; Kawahito, S. Design and Implementation of A CMOS Light Pulse Receiver Cell Array for Spatial Optical Communications. Sensors 2011, 11, 2056–2076.
9. Jursinic, P. A PIN photodiode ionizing radiation detector with small angular dependence and low buildup. Radiat. Meas. 2023, 166, 106963.
10. Roch, A.L.; Virmontois, C.; Goiffon, V.; Tauziède, L.; Belloir, J.M.; Durnez, C.; Magnan, P. Radiation-Induced Defects in 8T-CMOS Global Shutter Image Sensor for Space Applications. IEEE Trans. Nucl. Sci. 2018, 65, 1645–1653.
11. Wei, Z.; Zhang, W.; Wang, D.; Jin, G.Y. Structural, optical and electrical behavior of millisecond pulse laser damaged silicon-based positive-intrinsic-negative photodiode. Optik 2017, 131, 110–115.
12. Ye, J.; El Desouky, A.; Elwany, A. On the applications of additive manufacturing in semiconductor manufacturing equipment. J. Manuf. Process. 2024, 124, 1065–1079.
13. Kim, J.; Nam, Y.; Kang, M.C.; Kim, K.; Hong, J.; Lee, S.; Kim, D.N. Adversarial Defect Detection in Semiconductor Manufacturing Process. IEEE Trans. Semicond. Manuf. 2021, 34, 365–371.
14. Park, J.; Lee, J. Automated visual inspection of particle defect in semiconductor packaging. J. Mech. Sci. Technol. 2024, 38, 4447–4453.
15. Chia, J.Y.; Thamrongsiripak, N.; Thongphanit, S.; Nuntawong, N. Machine learning-enhanced detection of minor radiation-induced defects in semiconductor materials using Raman spectroscopy. J. Appl. Phys. 2024, 135, 025701.
16. Jiang, J.; Cao, P.; Lu, Z.; Lou, W.; Yang, Y. Surface Defect Detection for Mobile Phone Back Glass Based on Symmetric Convolutional Neural Network Deep Learning. Appl. Sci. 2020, 10, 3621.
17. Zhang, H.; Li, R.; Zou, D.; Liu, J.; Chen, N. An automatic defect detection method for TO56 semiconductor laser using deep convolutional neural network. Comput. Ind. Eng. 2023, 179, 109148.
18. Wu, S.; Zhu, Y.; Liang, P. DSCU-Net: MEMS Defect Detection Using Dense Skip-Connection U-Net. Symmetry 2024, 16, 300.
19. Yang, S.; Xie, Y.; Wu, J.; Huang, W.; Yan, H.; Wang, J.; Wang, B.; Yu, X.; Wu, Q.; Xie, F. CFE-YOLOv8s: Improved YOLOv8s for Steel Surface Defect Detection. Electronics 2024, 13, 2771.
20. Wang, Z.; Zhao, L.; Li, H.; Xue, X.; Liu, H. Research on a Metal Surface Defect Detection Algorithm Based on DSL-YOLO. Sensors 2024, 24, 6268.
21. Liu, M.; Chen, Y.; Xie, J.; He, L.; Zhang, Y. LF-YOLO: A Lighter and Faster YOLO for Weld Defect Detection of X-Ray Image. IEEE Sens. J. 2023, 23, 7430–7439.
22. Ren, F.; Fei, J.; Li, H.; Doma, B.T. Steel Surface Defect Detection Using Improved Deep Learning Algorithm: ECA-SimSPPF-SIoU-Yolov5. IEEE Access 2024, 12, 32545–32553.
23. Zheng, Y.; Wang, M.; Zhang, B.; Shi, X.; Chang, Q. GBCD-YOLO: A High-Precision and Real-Time Lightweight Model for Wood Defect Detection. IEEE Access 2024, 12, 12853–12868.
24. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; Volume 37, pp. 448–456.
25. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 2018, 107, 3–11.
26. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
27. Neubeck, A.; Gool, L.V. Efficient Non-Maximum Suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; pp. 850–855.
28. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Trans. Cybern. 2022, 52, 8574–8586.
29. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1577–1586.
30. Li, Y.; Hou, Q.; Zheng, Z.; Cheng, M.M.; Yang, J.; Li, X. Large Selective Kernel Network for Remote Sensing Object Detection. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 16748–16759.
31. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A lightweight-design for real-time detector architectures. J. Real-Time Image Process. 2024, 21, 62.
32. Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023.
33. Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and Efficient IOU Loss for Accurate Bounding Box Regression. arXiv 2021.
34. Rezatofighi, H.; Tsoi, N.; Gwak, J.Y.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
35. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. arXiv 2019.
36. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740.
Figure 1. Schematic diagram of YOLOv8 architecture.
Figure 2. Schematic diagram of the CIoU loss function.
Figure 3. The overall framework of the GLV-YOLO model.
Figure 4. Schematic of the GhostBottleneck_MSF structure.
Figure 5. Structure of the GSConv and VoVGSCSP modules.
Figure 6. Structure of the LSKNet_3 model.
Figure 7. Examples of various diode defect types.
Figure 8. Examples of image enhancement techniques.
Figure 9. Comparison of various IoU loss functions. (a) Curve of mAP@50. (b) Curve of mAP@50:95.
Figure 10. Comparison of heatmaps from different models.
Figure 11. Accuracy–recall results of GLV-YOLO.
Figure 12. Defect detection results of the proposed model.
Table 1. Defect types and sample counts.

Defect Type | Sample Count
breakdown | 60
electrode loss | 690
missing parts | 360
organic contamination | 480
redundancy | 585
damage | 3384
foreign objects | 1782
cracks | 1755
film damage | 1947
Table 2. Simulation parameters.

Simulation Parameter | Parameter Value
Epochs | 300
Batch size | 32
Learning rate | 0.01
Optimizer | SGD
Table 3. Experimental results using different model architectures on our dataset.

Model | GhostNet (Backbone) | GSConv + VoV (Neck) | LSKNet_3 | mAP0.5 (%) | mAP0.5–0.95 (%) | Param. | P (%) | R (%) | GFLOPs
YOLOv8n | × | × | × | 95.8 | 72.9 | 3,157,200 | 95.6 | 90.9 | 8.9
Model 1 | ✓ | × | × | 94.9 | 70.6 | 2,445,167 | 93.2 | 89.3 | 7.2
Model 2 | × | ✓ | × | 96.3 | 76.3 | 2,737,107 | 94.3 | 92.6 | 7.9
Model 3 | × | × | ✓ | 98.5 | 84.1 | 3,199,881 | 96.9 | 96.6 | 9.0
Model 4 | ✓ | ✓ | × | 95.3 | 72.8 | 2,169,671 | 95.4 | 90.3 | 6.3
Model 5 | ✓ | ✓ | ✓ | 98.9 | 83.4 | 2,187,390 | 96.4 | 97.0 | 7.0
Table 4. Comparison of the loss functions in the experimental study.

Loss Function | CIoU [28] | WIoU | EIoU [33] | GIoU [34] | DIoU [35] | SIoU [36]
mAP@50 (%) | 98.2 | 98.9 | 98.2 | 98.2 | 97.9 | 98.0
mAP@50:95 (%) | 81.9 | 83.4 | 81.4 | 82.3 | 80.7 | 80.8
Table 5. Comparison of different models.

Model | mAP0.5 (%) | mAP0.5–0.95 (%) | FLOPs (G) | Parameters (Million) | Weight File (MB) | FPS
YOLOv3-tiny | 91.8 | 58.6 | 19.1 | 12.1 | 11.7 | 140
YOLOv5s | 91.1 | 59.2 | 16 | 7.0 | 13.6 | 24
YOLOv7-tiny | 89.2 | 54.4 | 3.46 | 6.03 | 11.7 | 55
YOLOv8n | 95.8 | 72.9 | 8.9 | 3.16 | 5.94 | 65
YOLOv8s | 98.7 | 83.7 | 28.7 | 11.2 | 21.4 | 60
GLV-YOLO | 98.9 | 83.4 | 7.0 | 2.18 | 4.7 | 46
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

