1. Introduction
With the widespread adoption of information technology, digital devices, such as portable laptops, smartphones, and tablets, have experienced significant growth. Liquid Crystal Display (LCD) monitors, known for their low power consumption and lack of radiation pollution, have found extensive applications in these domains. However, in the LCD industry, manual inspection is predominantly employed to comprehensively detect defects in finished LCD products, resulting in time wastage, missed detections, and reduced production efficiency [1, 2].
This research focuses on the automatic visual inspection of low-contrast micro defects on LCD surfaces [3, 4, 5].
Figure 1 illustrates the brightness non-uniformity defects on the low-contrast LCD surface. The left section (a1–a3) depicts the defects, while the right section (b1–b3) shows the enhanced effects of these defects. It is evident that the surrounding areas of these defects possess low-contrast characteristics, making it challenging to identify these brightness non-uniform defects.
In recent years, deep learning methods have gained prominence and have gradually been employed for LCD defect detection [6, 7, 8]. However, the drawback of current deep learning methods is their reliance on a significant number of positive and negative samples for model training. Additionally, the laborious and time-consuming process of labeling defective samples (i.e., creating masks) poses a challenge. Furthermore, the issue of imbalanced samples in production arises, where it is challenging to gather a sufficient quantity of defective samples. Therefore, it is crucial to research a new method based on generative adversarial networks to solve these problems.
We propose a method to automatically generate samples and masks simultaneously using deep generative network models. The method can accomplish the complex and laborious task of labeling and obtaining image masks while solving the difficult problem of positive and negative sample imbalance.
2. Related Work
Common machine-vision-based methods for surface defect detection can be categorized into the following classes [9]: statistical methods; feature-based methods; spectral-based methods; subspace-based methods [10, 11]; and the emerging deep learning-based methods.
Statistical methods require the collection of a certain number of qualified samples and use statistical models to perform calculations to establish a fixed template of qualified samples. During inspection, the sample to be inspected is matched with a fixed template, and the differences between the sample and the template are highlighted and defined as defects.
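The template idea described above can be illustrated with a minimal sketch. It assumes that the fixed template is the pixel-wise mean of the qualified samples and that a pixel is flagged when it deviates from the template by more than a tolerance; both choices, as well as the names `build_template` and `find_defects`, are illustrative assumptions rather than the procedure of any cited work.

```python
from statistics import mean


def build_template(good_images):
    """Pixel-wise mean of equally sized grayscale images (lists of rows)."""
    rows, cols = len(good_images[0]), len(good_images[0][0])
    return [[mean(img[r][c] for img in good_images) for c in range(cols)]
            for r in range(rows)]


def find_defects(image, template, tolerance):
    """Return (row, col) positions deviating from the template by more than tolerance."""
    return [(r, c)
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if abs(value - template[r][c]) > tolerance]


# Two tiny "qualified" samples and one sample to inspect (values are made up).
good = [
    [[100, 101], [99, 100]],
    [[102, 99], [101, 100]],
]
template = build_template(good)
sample = [[101, 100], [100, 140]]
defects = find_defects(sample, template, tolerance=10)
```

In this toy case only the bottom-right pixel differs strongly from the template, so it is the only position reported. The sketch also makes the limitation visible: a single global tolerance is hard to choose when the defect contrast is low.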
Zhong [12] analyzed a certain number of defect samples and calculated the grayscale threshold separating the defect from the background image; this threshold was then applied to subsequent inspection images to enhance the image contrast. Calculating the probability between the defect and the background edge pixels enables the detection of impurity defects in flexible integrated circuit packaging substrates. Since the threshold value of this method relies on manual calculation, it must be recalculated repeatedly when facing multiple categories of inspection objects or multiple defect types.
The feature-based method involves processing image pixels to obtain defect information and has a certain degree of simplicity when facing detection objects with obvious defect characteristics and high recognition. When detecting low-contrast surface defects, it is usually impossible to calculate an effective threshold due to the high randomness of the defect appearance and the complex background image.
Tu [13] proposed a printed circuit board inspection and sorting method. The PCB images collected by the camera are processed in sub-pixels and then registered with the corresponding template based on grayscale information to detect mis-soldering and missing soldering of surface components. The PCB samples are automatically positioned and sorted. Since this type of method only requires qualified samples to construct a fixed template, it solves the problems of difficulties in defect feature extraction and sample imbalances in feature detection algorithms, and can effectively detect targets with too many defect types and insufficient defect samples.
However, the defects encountered in LCD manufacturing exhibit localized brightness non-uniformity and smooth brightness variations. Traditional methods are inadequate for detecting low-contrast surface brightness non-uniform defects, as investigated in this study. Consequently, in addition to employing approaches such as adaptive thresholding [14], sophisticated machine learning algorithms [15, 16, 17], including deep convolutional neural networks (CNN) [18, 19], have been introduced. In recent years, deep learning methods have made great advancements in classification, detection [20, 21], and instance segmentation [22, 23]. Consequently, deep learning methods are increasingly used in LCD defect detection.
Shuang Mei et al. [18] proposed a Mura defect identification method based on feature-level fusion of unsupervised learning. The method relies on a joint feature representation: it fuses hand-crafted features with features obtained by unsupervised learning. Experimental results show that this method identifies Mura defects in thin-film-transistor LCD panels using visual inspection equipment with strong robustness and accuracy.
In recent deep learning methods such as Faster R-CNN [20, 21] and the instance segmentation method Mask R-CNN [22], a pivotal element is the region proposal network (RPN). The fundamental concept underlying RPN is the dense sampling of the entire input image using a multitude of overlapping bounding boxes of various shapes and sizes. Subsequently, the network is trained to generate multiple object proposals, also referred to as regions of interest (RoI). This architectural choice enables RPN to effectively explore features across diverse scales. RPN comprises a convolutional neural network that takes feature maps as input and produces output bounding boxes along with associated probabilities for contained objects.
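The dense sampling with overlapping boxes of various shapes and sizes can be sketched as follows. This is a simplified illustration, not the reference implementation of Faster R-CNN or Mask R-CNN: the function names, the stride, and the particular scales and aspect ratios are assumptions chosen for the example.

```python
import math


def make_anchors(cx, cy, scales, ratios):
    """Return (x1, y1, x2, y2) boxes centred at (cx, cy).

    Each box has area scale**2 and width/height equal to ratio.
    """
    boxes = []
    for scale in scales:
        for ratio in ratios:
            w = scale * math.sqrt(ratio)
            h = scale / math.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def anchor_grid(feat_h, feat_w, stride, scales, ratios):
    """Place one set of anchors at the centre of every feature-map cell."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            anchors.extend(make_anchors(cx, cy, scales, ratios))
    return anchors


# A 2 x 3 feature map with stride 16, two scales and three aspect ratios.
anchors = anchor_grid(2, 3, stride=16, scales=(32, 64), ratios=(0.5, 1.0, 2.0))
```

Every feature-map cell contributes one box per (scale, ratio) pair, which is why neighbouring anchors overlap heavily. The network then scores and refines these candidates to obtain the RoIs.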
Ramya et al. [24] introduced the utilization of the state-of-the-art Single Shot Multibox Detector (SSD) network for both classifying and localizing Mura defects, achieving simultaneous defect classification and localization. In comparison, the Mask R-CNN method [25, 26, 27] offers higher accuracy compared to the aforementioned deep learning-based object detection methods. Moreover, it possesses the advantage of simultaneous classification, localization, and instance segmentation. Therefore, improving and applying the Mask R-CNN method in the detection of micro defects in LCDs will make it possible to identify defect categories and segment defect shapes. However, the utilization of the aforementioned deep learning methods faces challenges in collecting a substantial number of defect samples for training and testing, as well as in the complex and time-consuming task of annotating defect samples with masks. To address these issues, a deep network model capable of automatically generating samples and masks is proposed as follows.
In practical LCD manufacturing processes, a large quantity of qualified samples can be easily obtained, while gathering a lot of defect samples within a short time is challenging. Data augmentation techniques are used to augment the defect dataset [28, 29]. This involves synthesizing defect regions onto normal images through operations, such as rotation, cropping, and duplication, to generate defect samples.
As an unsupervised network model, generative adversarial networks (GAN) can adaptively generate similar samples through the input unlabeled dataset by setting the generator and discriminator. A precisely designed GAN network can be trained with a small number of defect samples and automatically generate a large number of similar samples for subsequent training of the recognition network. In defect detection, the problem of insufficient defect samples can be solved.
Yi [30] used a variational autoencoder to improve the generative adversarial network, successfully expanded the MNIST dataset, and verified that there was no significant difference between the generated samples and the original samples. Li [31] performed 3D modeling of the belt conveyor and then input it into the CycleGAN [32] network to expand the fault samples. The expanded samples were used to fully train the YOLOv5 object detection network, achieving an improvement in detection accuracy. Liu [33] used CycleGAN to expand the defect samples of LCD screens to construct a balanced sample dataset. Mask R-CNN, when fully trained on this dataset, achieved an improvement in detection accuracy. These previous studies demonstrate the effectiveness of deep generative networks in sample dataset expansion. However, when the original CycleGAN network faces welding images with complex backgrounds, the visual effects and authenticity of the samples it generates have not yet been verified. Thus, it is proposed to employ a Cycle-Consistent Generative Adversarial Network (CycleGAN) to address the issue of sample imbalance.
After a significant number of samples have been generated, the time-consuming and labor-intensive process of annotating and acquiring defect image masks persists when training the aforementioned deep learning-based surface defect detection methods such as Mask R-CNN. Existing annotation tools like LabelMe [34] add to the complexity and effort required for mask annotation. To overcome this challenge, a method based on generative adversarial networks (GANs) [35, 36, 37] is proposed to automatically annotate and acquire image masks. Specifically, the generated defect sample images and their corresponding input defect-free images are fed into a CycleGAN model. The defect sample images serve as the target images, while the defect-free input images are treated as the input images during the training process. Through iterative steps, the defect information is progressively accumulated in the defect-free input images until the generation of defect samples is achieved. This iterative process represents the generation of sample masks.
3. Simultaneous Generation of Training Samples and Masks Based on the GAN Model
We propose a technique that leverages a generative adversarial network (GAN) to autonomously generate defect samples. This method requires only a limited quantity of real samples to enhance and expand the LCD sample dataset. Subsequently, the generated dataset is employed to enhance the detection results.
After a sufficient defect sample dataset has been generated, the labeling and acquisition of masks of defect images is time-consuming and laborious. A new method of automatically acquiring image masks is proposed in which the generated defect sample image and the corresponding input non-defective image are input into the new CycleGAN model and trained as target images and input images in the training process of this model. During the training process, the superimposed defect information is accumulated in the defect-free input image in a step-by-step iterative manner until the defect sample is generated, and this superposition process is the generation process of the sample mask.
3.1. CycleGAN Model
The CycleGAN model is an image style transformation technique, and the ultimate goal of this model is to complete the image style transformation between two domains without the one-to-one corresponding training data. Image style conversion refers to the conversion of a picture from one style to another.
As shown in Figure 2, the CycleGAN model maps from the X domain to the Y domain through the mapping G. D_Y is the discriminator corresponding to this generator; it distinguishes real data from generated data G(x), forming a single generative adversarial process. To avoid invalid conversions, the authors of the CycleGAN model proposed the cycle consistency loss. A second mapping F maps from the Y domain to the X domain, and its discriminator, D_X, distinguishes real data from generated data F(y). The CycleGAN model learns both the G and F mappings while satisfying the cycle consistency requirement F(G(x)) ≈ x: after two opposite mappings, a sample from the X domain returns to the X domain.
3.2. CycleGAN Loss Function
CycleGAN combines an adversarial loss and a cycle consistency loss to create an output image. The adversarial loss penalizes a generated distribution that does not match the target distribution, and the cycle consistency loss prevents the two learned mappings from contradicting each other. Because the model is trained with unpaired samples, it is well suited to defect detection. The loss terms are described below.
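- 1. Adversarial loss
The adversarial loss brings the generated data distribution closer to the real data distribution. As in a GAN, G implements the mapping X→Y, so G(x) should be as close to Y as possible during training, and the discriminator D_Y judges whether a sample is real or generated. The same formula as in GAN is used:
$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p_{data}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right]$$
Similarly, for the mapping F, which implements Y→X:
$$\mathcal{L}_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D_X(x)\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\log\left(1 - D_X(F(y))\right)\right]$$
- 2. Cycle consistency loss
The adversarial loss only ensures that the generated samples follow the same distribution as the real samples; in addition, images in the two domains are required to correspond one-to-one.
We want F(G(x)) ≈ x, called forward cycle consistency, and G(F(y)) ≈ y, called backward cycle consistency.
To ensure consistency as much as possible, the corresponding loss is set as:
$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim p_{data}(y)}\left[\lVert G(F(y)) - y \rVert_1\right]$$
- 3. Overall loss
Generator G tries to achieve the transfer from X to Y, generator F tries to achieve the transfer from Y to X, and the two generators should be mutually inverse, that is, map a sample back to itself:
$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F)$$
where λ is the weight that controls the balance between the adversarial loss and the cycle consistency loss.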
The background of the defective image generated by CycleGAN is similar to the real image with the defect. CycleGAN can generate synthetic defective samples by simply entering new defect-free samples.
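To make the interplay of the loss terms in this section concrete, the following sketch evaluates them on toy numbers. It follows the standard CycleGAN formulation and is not the training code used in this work: the function names are hypothetical, discriminator outputs are assumed to be probabilities in (0, 1), and the L1 term is averaged over pixels, as is common in implementations.

```python
import math


def adversarial_loss(d_real, d_fake):
    """Mean log D(real) plus mean log(1 - D(fake)) for discriminator outputs in (0, 1)."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def l1_distance(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def cycle_loss(x, x_rec, y, y_rec):
    """Forward term |F(G(x)) - x| plus backward term |G(F(y)) - y|."""
    return l1_distance(x_rec, x) + l1_distance(y_rec, y)


def total_loss(adv_g, adv_f, cyc, lam):
    """Both adversarial terms plus the cycle term weighted by lam."""
    return adv_g + adv_f + lam * cyc


# Toy values: flattened 4-pixel images and made-up discriminator outputs.
x, x_rec = [0.2, 0.4, 0.6, 0.8], [0.2, 0.5, 0.6, 0.7]
y, y_rec = [0.1, 0.1, 0.9, 0.9], [0.1, 0.1, 0.9, 0.9]
cyc = cycle_loss(x, x_rec, y, y_rec)
adv_g = adversarial_loss(d_real=[0.9, 0.8], d_fake=[0.3, 0.2])
adv_f = adversarial_loss(d_real=[0.7, 0.9], d_fake=[0.4, 0.1])
loss_small_weight = total_loss(adv_g, adv_f, cyc, lam=1.0)
loss_large_weight = total_loss(adv_g, adv_f, cyc, lam=100.0)
```

With a larger weight, the same reconstruction error contributes far more to the total, so the generators are pushed harder to keep the cycle-reconstructed image identical to the input.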
3.3. The Proposed Automatic Sample and Mask Generation Method
The proposed method's workflow is depicted in Figure 3: the input, a defect-free LCD sample image, is used to generate a large number of defective LCD sample images through CycleGAN1.
The defect-free sample and the corresponding generated defective sample are used as the input and target of CycleGAN2. During training, the defect part of the generated sample is gradually superimposed onto the defect-free input, and once enough defect information has accumulated, the mask of the image can be obtained by simple image processing and a binarization operation.
In this investigation, a learning rate of 0.0002 was utilized for CycleGAN1. This exceptionally low value was chosen to ensure that the synthetic defects closely resemble real defects. In CycleGAN2, the cycle consistency weight λ takes a large value (e.g., 100) so that the background texture of the defect-free input and that of the output are as close as possible.
To identify defect regions for the image mask generated by CycleGAN2, the differences between the image generated at each intermediate iteration t and the final image generated at the last iteration T are accumulated, namely:
$$\Delta I = \sum_{t=3}^{T-1} \left| I_t - I_T \right| \tag{6}$$
where $I_t$ is the output generated by CycleGAN2 at epoch t, for t = 3, 4, …, T − 1. Since the background texture was not reconstructed very well during the first two epochs, t = 1, 2 was discarded. Empirical studies have shown that ten iterations (T = 10) are usually sufficient to segment the defect regions in the generated image. Background pixels, which are common to all iterations, produce small differences $\Delta I$, while defective pixels produce large differences. A simple image processing operation and binary thresholding were applied to segment defects in the difference image $\Delta I$.
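The accumulation and thresholding step can be sketched in a few lines. The sketch assumes that the per-pixel differences are absolute differences between each intermediate output and the final output, that the first two epochs are skipped, and that a single global threshold is used for binarization; the function names and the threshold value are hypothetical.

```python
def accumulate_differences(outputs, first_epoch=3):
    """Sum |output_t - output_T| per pixel for t = first_epoch, ..., T - 1.

    outputs[t - 1] is the grayscale image (list of rows) generated at epoch t,
    so outputs[-1] is the final image at epoch T.
    """
    final = outputs[-1]
    rows, cols = len(final), len(final[0])
    acc = [[0.0] * cols for _ in range(rows)]
    for image in outputs[first_epoch - 1:-1]:
        for r in range(rows):
            for c in range(cols):
                acc[r][c] += abs(image[r][c] - final[r][c])
    return acc


def binarize(diff, threshold):
    """Mark pixels whose accumulated difference exceeds the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in diff]


# Five epochs of a 1 x 2 image: the left pixel (background) is stable from
# epoch 3 on, while the right pixel (defect) keeps changing until epoch 5.
outputs = [
    [[0, 0]],
    [[50, 10]],
    [[100, 40]],
    [[100, 70]],
    [[100, 100]],
]
diff = accumulate_differences(outputs)
mask = binarize(diff, threshold=20)
```

In this toy sequence the background pixel accumulates no difference, whereas the defect pixel accumulates a large one, so only the defect pixel is set in the mask.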
4. Experimental Results and Analysis
The experiment aims to evaluate the performance of a GAN-based sample and mask generation scheme by using LCD images with and without defects. First, the experiment validates the generation of samples to augment the dataset using a CycleGAN1 model. Then, the second experiment generates masks using the CycleGAN2 model, and finally, its performance is evaluated using Mask R-CNN. The hardware and software configuration for the experiment includes Nvidia RTX4000 GPU and Python 3.
4.1. Dataset Augmentation Using CycleGAN to Generate Image Samples
To address the limited number of images in the original dataset, which is insufficient for effective training, data augmentation is necessary. Initially, common techniques such as rotation and mirroring are applied to the original images for data augmentation, as frequently used in deep learning. However, even with these techniques, the dataset remains limited in size. Therefore, a GAN-based data sample augmentation method is employed.
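The rotation and mirroring operations mentioned above can be sketched as follows. This is a dependency-free illustration on nested lists; the function names are hypothetical, and a real pipeline would typically rely on an image library instead.

```python
def rotate90(image):
    """Rotate a grayscale image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def mirror(image):
    """Flip a grayscale image left to right."""
    return [row[::-1] for row in image]


def augment(image):
    """Return the eight variants obtained from four rotations, each with and without mirroring."""
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(mirror(current))
        current = rotate90(current)
    return variants


original = [[1, 2], [3, 4]]
variants = augment(original)
```

Each original image yields at most eight variants in this way, which illustrates why such geometric augmentation alone leaves the dataset limited in size.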
Since the number of existing defective LCD sample images is limited, conducting effective experiments poses a challenge. CycleGAN is utilized to expand the sample dataset by leveraging the capabilities of generative adversarial networks (GANs). CycleGAN can generate additional datasets based on the features extracted from a small amount of existing data, thus compensating for the scarcity of the original dataset.
Here, λ is the weight in the loss function that balances the adversarial loss and the consistency loss. The results in Figure 4 show that a smaller λ is more inclined to generate images that highlight local defects while the background is similar. An excessively large λ value proves ineffective in generating defects, while an excessively small λ value hinders the accurate reconstruction of background texture. To strike a balance that aligns with practical requirements and ensures overall performance in both defect synthesis and background preservation, this study employs λ = 45 for CycleGAN1.
4.2. Generating Image Masks Using the CycleGAN2 Model
The existing defect-free image and the corresponding defect image generated from it by the CycleGAN1 model form an input pair for the CycleGAN2 model, and CycleGAN2 is trained on each pair of images separately. As shown in Figure 5, (a1–d1) are defect-free sample images, while (a2–d2) are the corresponding defective sample images generated by CycleGAN1. The generated defect sample image and the defect-free image were input into the CycleGAN2 model as the target image and the input image, respectively, in the training process of this model. During the training process, the defect-free input image gradually iterates, accumulating and superimposing defect information until the defect samples are generated, and this superposition process is the generation process of sample masks.
In terms of CycleGAN2's loss function parameter selection, a larger λ value places more weight on the consistency loss and therefore tends to preserve the global background texture of the object surface. Figure 6 shows the comparison results for different λ values. Our purpose is to make the background of the defect-free input image and that of the generated defect image as similar as possible, so CycleGAN2 needs a larger regularization value.
T is the number of epochs for which CycleGAN2 is trained on a pair of samples to generate the intermediate process images, corresponding to Formula (6). The larger the value of T, the closer the intermediate images are to the image generated by the final model; λ is the weight in the loss function that balances the adversarial loss and the consistency loss. The experimental results show that a smaller value of λ does not reconstruct the background in the early stage, and defects only appear in the later stage. A larger value of λ has a better background reconstruction effect and synthesizes defects in the early stage. Therefore, CycleGAN2 uses λ = 100 in this paper.
In order to highlight defective pixels in the synthetic images in CycleGAN2, we integrated the process image and accumulated the differences in the generated images. Since the background texture is not well reconstructed in the early stage, a floating integral lower bound was chosen to compare the effect of different integral lower bounds on segmenting defect regions in the synthesized image.
4.3. Segmentation Results
Simple image processing and binarization were carried out on the superimposed process image of defects to obtain the segmentation results. The experimental results are shown in Figure 7: from the segmentation image, it can be seen that the position information of the defects is relatively obvious, which can be used to generate a mask.
4.4. Employing the Mask R-CNN Model for Recognition
The Mask R-CNN model was employed for training and testing the LCD dataset. Initially, the dataset was prepared to include defect sample images obtained previously and their corresponding segmented mask images. Subsequently, the Mask R-CNN model was used for training and testing. For a single detection target, we compared the surface area of the detected defect with an empirically determined surface area to calculate the recognition rate for a single image.
Figure 8 illustrates the test results using 30 samples, demonstrating a varying recognition rate ranging from 0.730 to 0.999. Notably, the majority of these recognition rates surpass 0.95. We tested two sets of data: the pre-expansion sample set and the dataset expanded with GAN-generated samples. The detailed results are shown in Table 1.
As shown in Table 1, the initial Group1 experiments were performed on a set of unexpanded defective image samples. This group of experiments was trained with 30 random samples and tested with 14 samples. Its recognition rate mainly exceeded 0.99, albeit with one instance falling below 0.9. The average recognition rate across all test samples was 0.988. The model demonstrated excellent performance on real samples, but the limitations imposed by the limited sample size must be recognized.
Group2 experiments were performed on LCD samples generated by an expanded defect image dataset. To ensure that the model was trained to the same extent, we similarly used 30 random samples for training and then tested on 50 samples. From the experimental results, it can be seen that most of the test results are above 0.95; however, 11 images were less than 0.9. This was related to the increase in the number of sample sets. The average recognition rate of all the test samples was 0.9463. Overall, the performance of the detection results of the generated sample dataset decreased; however, it is easy to obtain a sufficient data sample set.
In order to quantitatively evaluate the performance of our proposed unsupervised automatic mask generation method, we compared it with manually labeled masks created with LabelMe. We input the masks generated by these two methods into Mask R-CNN separately. As shown in Table 2, the final test results were used to compare the impact of the two mask generation methods on the algorithm's recognition performance.
4.5. Comparison of Segmentation Results
Figure 9 illustrates the experimental results that juxtapose our proposed method with the Gabor filter method, wavelet method, and U-Net segmentation method. The experiments reveal that the Gabor and wavelet methods struggle to effectively segment images of defective LCD surfaces, whereas our proposed method excels in successfully detecting these defects. The U-Net segmentation method can successfully segment defect areas, but the segmentation results are not very accurate. In addition, the U-Net method only belongs to semantic segmentation and does not mark specific instance information of pixels. Our proposed method can effectively solve this problem.