1. Introduction
Traditional machine learning methods can effectively solve the problem of defect detection of a variety of industrial products, such as bearings [
1], mobile screens [
2], coiled materials [
3], rails [
4], steel beams [
5], etc. These methods rely on manually designed feature extractors adapted to a specific product image dataset and feed the extracted features into classifiers such as SVM (support vector machine) [
6] and NN (neural network) [
7] to determine whether the product has defects. However, when product surfaces exhibit a complex background texture, large variation in defect scale, or defect regions whose features resemble the background (as shown in
Figure 1), traditional machine learning methods cannot meet the detection requirements.
Since AlexNet [
8] was proposed, deep learning methods based on convolutional neural networks (CNNs) have become the mainstream approach to surface defect detection [
9,
10,
11,
12]. A CNN not only learns image features automatically but also extracts more abstract features by stacking multiple convolution layers, which gives it better feature representation ability than manually designed feature extraction algorithms. According to the form of the network output, deep learning-based defect detection algorithms can be divided into defect classification, defect recognition and defect segmentation methods.
Algorithms based on defect classification usually train a classical classification network on the samples, and the trained model distinguishes defective from defect-free samples. Tian [
13] used two CNNs to detect defects in six types of images; Xu [
14] proposed a CNN classification network integrating VGG (Visual Geometry Group) and ResNet to detect and classify the surface defects of rollers; Weimer [
15] also used a CNN to identify defect categories. Such methods usually do not locate the defect area.
To locate the defect area accurately, some researchers have adapted networks that perform well in object detection tasks and applied them to surface defect detection. Such algorithms are mostly based on R-CNN [
16], SSD (single-shot multibox detector) [
17], YOLO (You Only Look Once) [
18] and other networks. Chen [
19] applied deep CNN (DCNN) to accelerate defect detection.
To achieve pixel-level detection accuracy, some researchers have used segmentation networks. For example, Huang [
20] built a detection network on U-Net that turns defect detection into a semantic segmentation task, which improves the accuracy of magnetic tile surface detection. Long [
21] used a fully convolutional network (FCN) to segment the defect area. These methods all rely on a certain number of defect samples.
In many cases, the type of product defect is unpredictable, and it is difficult to collect a large number of defect samples. To address this, researchers have turned to small-sample and unsupervised learning methods. For example, Yu [
22] trained a YOLOv3 network on a small number of defective samples to achieve high-accuracy detection. Autoencoder (AE)-based methods are also used for surface defect detection, such as the convolutional autoencoder (CAE) [
23], the Fisher criterion-based stacked denoising autoencoder (FCSDA) [
24], robust autoencoder (RCAE) [
25], sparse denoising autoencoder network fused with gradient difference information [
26], etc. Mei [
27] proposed using the multi-scale convolution autoencoder network (MSCDAE) to reconstruct the image and generate the detection result by using the reconstruction residual. Compared with the traditional unsupervised algorithms, such as PHOT (phase-only transform) [
28] and DCT (discrete cosine transformation) [
29], MSCDAE has greatly improved the model evaluation index. Yangh [
30] used feature clustering on the basis of MSCDAE to improve the reconstruction accuracy of the texture background. The samples used by the above reconstruction networks are mostly regular textures, and differences between image textures are not considered, so the reported detection accuracy neither fully reflects the performance of these methods nor measures their generalization ability.
In addition to the autoencoder, the generative adversarial network (GAN) [
31] is also applied to the unsupervised defect detection method. By learning a large number of normal samples, GAN enables the generator in the network to learn the texture features of normal samples. Zhao [
32] combined a GAN and an autoencoder: defects are added to defect-free samples, and the GAN is trained to restore the images. He [
33] used SGAN and an autoencoder to train on unlabeled steel surface defect samples, extract fine-grained image features and classify them. Schlegl [
34] proposed AnoGAN for unsupervised anomaly detection in lesion images. However, GANs suffer from unstable performance [
35] in applications.
Considering the scarcity of defect samples in practice, this paper proposes a method based on a lightweight reconstruction network for low-complexity texture (LRN-L). The method trains the reconstruction network on only a small number of defect-free samples, so that the network learns to reconstruct defect-free texture. When an abnormal sample is input, the trained model can detect its abnormal region. In addition to the experimental analysis of the network structure, loss function and algorithm efficiency, this paper introduces a texture complexity index and uses a texture complexity model to grade the texture samples in order to evaluate the detection ability and applicability of LRN-L.
2. LRN-L
LRN-L is divided into two stages: a texture reconstruction stage and a defect location stage. In the texture reconstruction stage, the reconstruction network (LRN) is designed based on CAE and trained on only a small number of defect-free samples, so that it generates defect-free images. In the defect location stage, the residual image between the reconstructed image and the original image is computed, and the defect is located by a segmentation algorithm. The LRN-L model is shown in
Figure 2.
2.1. Texture Complexity
Texture complexity reflects the difficulty of operations such as image enhancement and defect detection. It serves two purposes: first, to measure the performance of an algorithm; second, to classify textures or measure the similarity between textures. The structure of the reconstruction network is closely related to texture complexity, so textures with different complexity levels call for different network structures. Texture complexity can be measured in different ways [
36,
37,
38,
39,
40]. The GLCM (gray level co-occurrence matrix) [
41] is used to statistically analyze the features of texture to reflect the complexity.
If the image has N gray levels, the gray level co-occurrence matrix P is an N × N matrix, where the element in the i-th row and j-th column represents the probability that two pixels with gray levels i and j, separated by the displacement δ = (Δx, Δy), occur together in the image. δ determines the distance and direction between the two pixels. Four directions θ are commonly used: 0°, δ = (Δx, 0); 45°, δ = (Δx, Δy); 90°, δ = (0, Δy); and 135°, δ = (−Δx, Δy).
Generally, the five most commonly used parameters are extracted from the GLCM to describe texture features: Energy
J, Entropy
H, Contrast
G, Deficit
Q and Correlation
COV, which are defined as follows:
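For reference, the standard GLCM forms of these five statistics can be sketched as follows. Reading "Deficit" Q as the inverse difference moment, and taking μ and σ as the means and standard deviations of the row and column marginals of P, are assumptions made here for illustration; the exact normalization used by the authors may differ.

```latex
\begin{align*}
J   &= \sum_{i=1}^{N}\sum_{j=1}^{N} P(i,j)^{2} \\
H   &= -\sum_{i=1}^{N}\sum_{j=1}^{N} P(i,j)\,\log P(i,j) \\
G   &= \sum_{i=1}^{N}\sum_{j=1}^{N} (i-j)^{2}\,P(i,j) \\
Q   &= \sum_{i=1}^{N}\sum_{j=1}^{N} \frac{P(i,j)}{1+(i-j)^{2}} \\
COV &= \sum_{i=1}^{N}\sum_{j=1}^{N} \frac{(i-\mu_x)(j-\mu_y)\,P(i,j)}{\sigma_x\,\sigma_y}
\end{align*}
```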
GLCMs in four directions are extracted from the texture image, and
J,
H,
G,
Q and
COV in the four directions are calculated, respectively, denoted as
Ji,
Hi,
Gi,
Qi and
COVi, where
i = 1, 2, 3, 4. To make the texture features independent of direction, the harmonic mean of each feature parameter over the four directions is calculated by Formula (6). Taking the parameter
J as an example, the energy values of the four directions are
J1,
J2,
J3 and
J4, respectively, and the energy value
J of the texture image is obtained from Formula (6).
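The directional GLCM statistics and their harmonic mean can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the symmetric GLCM, the unit pixel distance, the standard textbook definitions of the five statistics (with Q taken as the inverse difference moment) and the offset convention for the four directions are all assumptions.

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Normalized, symmetric co-occurrence matrix for the pixel offset (dx, dy)."""
    h, w = img.shape
    P = np.zeros((levels, levels), dtype=np.float64)
    # Source-pixel ranges chosen so that (row + dy, col + dx) stays inside the image.
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    src = img[r0:r1, c0:c1]
    dst = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    np.add.at(P, (src.ravel(), dst.ravel()), 1.0)
    P = P + P.T  # count every pair in both orders (assumed symmetric GLCM)
    return P / P.sum()

def glcm_features(P):
    """Energy J, entropy H, contrast G, inverse difference moment Q, correlation COV."""
    n = P.shape[0]
    i, j = np.indices((n, n))
    nz = P[P > 0]  # skip empty cells so that log() is defined
    mu_x, mu_y = np.sum(i * P), np.sum(j * P)
    sd_x = np.sqrt(np.sum((i - mu_x) ** 2 * P))
    sd_y = np.sqrt(np.sum((j - mu_y) ** 2 * P))
    return {
        "J": np.sum(P ** 2),
        "H": -np.sum(nz * np.log(nz)),
        "G": np.sum((i - j) ** 2 * P),
        "Q": np.sum(P / (1.0 + (i - j) ** 2)),
        # Undefined (division by zero) for a perfectly constant image.
        "COV": np.sum((i - mu_x) * (j - mu_y) * P) / (sd_x * sd_y),
    }

def harmonic_mean(values):
    """Harmonic mean n / sum(1 / v); undefined if any value is zero."""
    values = np.asarray(values, dtype=np.float64)
    return len(values) / np.sum(1.0 / values)

def directional_features(img, levels, d=1):
    """Harmonic mean of each statistic over the 0, 45, 90 and 135 degree GLCMs."""
    offsets = [(d, 0), (d, d), (0, d), (-d, d)]  # assumed (dx, dy) convention
    per_dir = [glcm_features(glcm(img, dx, dy, levels)) for dx, dy in offsets]
    return {k: harmonic_mean([f[k] for f in per_dir]) for k in per_dir[0]}
```

The image passed to `glcm` is expected to be an integer array already quantized to `levels` gray levels. Because the harmonic mean is undefined for zero values and behaves poorly for values of mixed sign, a practical implementation would need to guard the contrast and correlation terms for very regular textures.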
Among the five parameters,
J,
H and
G are positively correlated with texture complexity, while
Q and
COV are negatively correlated. Inspired by SSIM [
42],
G,
Q and
COV are selected as indicators of texture complexity based on the texture features of industrial product surface images. The mean square error (MSE) is used to assign weights to the parameters of
G,
Q, and
COV, and the formula of texture complexity
f is constructed, as shown in Formula (7). In Formula (7),
PCi represents
G,
Q and
COV, respectively, and
i = 1, 2 and 3; ā,
MSEi and
wi represent the average, the variance and the weight assigned to parameters, respectively.
Mario [
44] divided image textures into three levels according to complexity: low-complexity, medium-complexity and high-complexity textures, represented by L, M and H, respectively, as shown in
Figure 3.
2.2. Lightweight Reconstruction Network Model (LRN)
The core idea of the lightweight reconstruction network is to design the network with both structure and detection speed in mind while maintaining accuracy. According to the characteristics of industrial product texture samples, several improvements are made on the basis of CAE. The structure of LRN is shown in
Figure 4.
First, the original image is input into the network, and three convolution kernels of size 1 × 1, 3 × 3 and 5 × 5 are used to obtain multi-scale features, which are then fed to the CAE module. The output of the decoding module is then deconvolved with kernels of different sizes to obtain reconstructed images at three scales, and the final reconstructed image is obtained via feature fusion. Like MSCDAE [
27], LRN obtains multi-scale features, but at a reduced computational cost.
The CAE module of LRN includes four convolution sub-modules and four deconvolution sub-modules. Each convolution sub-module includes a convolution layer, a BN layer [
43] and a nonlinear activation layer. The first three convolution sub-modules also include a pooling layer that changes the image scale. The activation function is ReLU6. The first three convolution layers use a 5 × 5 kernel, and the last layer uses a 3 × 3 kernel.
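The structure described above can be sketched in PyTorch as follows. This is only an illustrative reading of the text: the class name `LRNSketch`, the channel widths, the use of max pooling, the concatenation of the three multi-scale feature maps, the stride-2 transposed convolutions for upsampling and the fusion of the three output scales by averaging are all assumptions that the text does not specify.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k, pool):
    """Convolution + BN + ReLU6, optionally followed by 2x2 max pooling (assumed)."""
    layers = [
        nn.Conv2d(c_in, c_out, k, padding=k // 2),
        nn.BatchNorm2d(c_out),
        nn.ReLU6(inplace=True),
    ]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

def deconv_block(c_in, c_out, upsample):
    """Transposed convolution + BN + ReLU6; stride 2 doubles the spatial size."""
    if upsample:
        deconv = nn.ConvTranspose2d(c_in, c_out, kernel_size=2, stride=2)
    else:
        deconv = nn.ConvTranspose2d(c_in, c_out, kernel_size=3, padding=1)
    return nn.Sequential(deconv, nn.BatchNorm2d(c_out), nn.ReLU6(inplace=True))

class LRNSketch(nn.Module):
    def __init__(self, in_ch=1, width=16):  # width is a hypothetical channel count
        super().__init__()
        # Multi-scale input features from 1x1, 3x3 and 5x5 kernels.
        self.heads = nn.ModuleList(
            [nn.Conv2d(in_ch, width, k, padding=k // 2) for k in (1, 3, 5)]
        )
        # Four convolution sub-modules: 5x5, 5x5, 5x5 (with pooling), then 3x3.
        self.encoder = nn.Sequential(
            conv_block(3 * width, width, 5, pool=True),
            conv_block(width, 2 * width, 5, pool=True),
            conv_block(2 * width, 4 * width, 5, pool=True),
            conv_block(4 * width, 4 * width, 3, pool=False),
        )
        # Four deconvolution sub-modules that undo the three poolings.
        self.decoder = nn.Sequential(
            deconv_block(4 * width, 4 * width, upsample=False),
            deconv_block(4 * width, 2 * width, upsample=True),
            deconv_block(2 * width, width, upsample=True),
            deconv_block(width, width, upsample=True),
        )
        # Three output deconvolutions with different kernel sizes.
        self.tails = nn.ModuleList(
            [nn.ConvTranspose2d(width, in_ch, k, padding=k // 2) for k in (1, 3, 5)]
        )

    def forward(self, x):
        feats = torch.cat([head(x) for head in self.heads], dim=1)
        z = self.decoder(self.encoder(feats))
        recons = [tail(z) for tail in self.tails]
        # Fuse the three reconstructions by averaging (assumed fusion rule).
        return torch.stack(recons, dim=0).mean(dim=0)
```

With three pooling steps, the input height and width must be divisible by 8; a 32 × 32 patch is reduced to 4 × 4 at the bottleneck and restored to 32 × 32 at the output.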
The depth of the reconstruction network determines the reconstruction ability of the autoencoder. A model with a complex network structure extracts texture features better, but it also extracts, and therefore reconstructs, the features of defect regions better. When the residual is then computed, detection fails because the difference between the reconstructed image and the original image is not pronounced enough. LRN uses a lightweight network structure with limited texture reconstruction ability. However, through the design of multi-scale features and the loss function, the network can fully learn the characteristics of normal texture and still reconstruct defective areas as normal texture.
2.3. Loss Function
The LRN takes the reconstruction error between the original image and the reconstructed image as the loss function to drive the convergence of the network. Let the input image be x and the reconstructed image be y.
(1) L1 Loss
L1 loss is also known as MAE (mean absolute error) loss, which is defined as:
where
ω represents the set of weight matrices in the reconstructed network,
λ represents the penalty factor of the regularization term, and 0 <
λ < 1.
(2) L2 Loss
L2 loss is also called MSE (mean squared error) loss, which is a common loss function to evaluate the difference between the reconstructed image and original image, and it is defined as follows:
Compared with
L1,
L2 is more sensitive to abnormal areas and over-penalizes large errors, as in MSCDAE [
27], so LRN adopts the
L1 loss.
(3) Structural Loss
Neither
L1 nor
L2 considers the structural characteristics of texture, so LRN introduces
SSIM (structural similarity index) [
44] to build a loss function.
SSIM compares two images in terms of luminance, contrast and structure [
45], as shown in Formulas (10) and (11). The larger the
SSIM is, the more similar the images are. When the two images are identical,
SSIM = 1. Therefore, to use it as a loss function, we add a minus sign.
where μx and μy are the average brightness of x and y, σx and σy are the standard deviations of their pixel values, σxy is the covariance of x and y, and C1, C2 and C3 are constant terms that are added to avoid a zero denominator.
(4) Loss Function of LRN
The loss function designed in this paper,
LLRN, combines the advantages of
L1 and
LSSIM and adopts a combined form, as shown in Formula (12), where
α is a weight factor with the range of (0, 1) to balance the proportion of
L1 loss and
LSSIM.
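The combined loss can be illustrated with a short NumPy sketch. Several points are assumptions rather than details given in the text: SSIM is computed once over the whole patch instead of over sliding windows, the constants follow the common defaults C1 = (0.01·R)², C2 = (0.03·R)² and C3 = C2/2 for data range R, the regularization term λ‖ω‖ of the L1 loss is omitted, and the combination is read as α·L1 + (1 − α)·LSSIM with LSSIM = −SSIM.

```python
import numpy as np

def l1_loss(x, y):
    """Mean absolute error between the input x and the reconstruction y."""
    return float(np.mean(np.abs(x - y)))

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM = luminance * contrast * structure (simplified)."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return float(luminance * contrast * structure)

def lrn_loss(x, y, alpha=0.15):
    """Assumed form: alpha * L1 + (1 - alpha) * (-SSIM), with alpha in (0, 1)."""
    return alpha * l1_loss(x, y) + (1.0 - alpha) * (-ssim_global(x, y))
```

For identical images the L1 term is 0 and SSIM is 1, so the loss reaches its minimum of −(1 − α). A training implementation would express the same computation with differentiable tensor operations and, typically, a windowed SSIM.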
2.4. Defect Location
(1) Residual Image
The difference between the original image (as shown in
Figure 5a, the red circle area is the defect area) and the image reconstructed by LRN (as shown in
Figure 5b, the red circle area is the reconstruction of the defect area) is computed using Formula (13). The residual image is shown in
Figure 5c, which contains the location information of the abnormal area.
(2) Noise Removal
The residual image contains considerable noise, which forms pseudo defects that interfere with locating the real defect area. A mean filter is used for denoising, and the result is shown in
Figure 5d.
(3) Threshold Segmentation and Defect Location
An adaptive threshold method is then used to segment and locate the defect; the final result is shown in
Figure 5e.
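The three steps above can be sketched as a small NumPy pipeline. The absolute difference used for the residual, the 5 × 5 filter size and the threshold rule (mean plus k standard deviations of the filtered residual) are assumptions chosen for illustration; the text does not give the exact form of Formula (13) or of the adaptive threshold.

```python
import numpy as np

def residual_image(original, reconstructed):
    """Pixel-wise absolute difference between input and reconstruction (assumed)."""
    return np.abs(original.astype(np.float64) - reconstructed.astype(np.float64))

def mean_filter(img, size=5):
    """Mean filter with reflect padding; the output has the same shape as img."""
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))

def locate_defects(original, reconstructed, size=5, k=3.0):
    """Binary defect mask: smoothed residual above a data-dependent threshold."""
    res = mean_filter(residual_image(original, reconstructed), size)
    threshold = res.mean() + k * res.std()  # hypothetical threshold rule
    return res > threshold
```

The returned mask has the same shape as the input image, and connected regions of `True` pixels correspond to candidate defect areas.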
3. Experiment
In this paper, LRN-L is tested on surface texture datasets of industrial products. The factors that influence LRN-L, including the loss function, network structure and texture complexity, are analyzed in detail. Finally, LRN-L is compared with similar unsupervised algorithms. The method was implemented in Python 3.6 with the PyTorch framework, and performance testing was carried out with CUDA 9.0 and cuDNN 5.1. The workstation has an Intel Xeon X5 CPU @2.9 GHz, 128 GB of DDR4 memory and Ubuntu 16.04, and the GPU is an NVIDIA GTX-1080Ti with 11 GB of video memory.
3.1. Dataset Introduction
The texture samples are shown in
Figure 6.
Figure 6a–j are from the dataset DAGM2007 [
46], which contains 10 kinds of texture samples.
Figure 6k–n are from the dataset MVTec [
35].
Figure 6o is from AITEX [
47]. Each of the 15 texture types contains 100 defect-free positive samples for training and 10 defect samples for testing. The image size is 512 × 512.
3.2. Evaluation Index
This paper uses
Recall,
Precision and
F1 Measure to evaluate the performance of LRN-L, which are defined as follows:
where
TP is the number of defect samples whose defects are correctly segmented,
FP is the number of normal samples in which a defect area is detected, and
FN is the number of defect samples in which no defect area is detected.
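The three indices follow the usual definitions Precision = TP / (TP + FP), Recall = TP / (TP + FN) and F1 = 2 · Precision · Recall / (Precision + Recall), with TP, FP and FN taken as true-positive, false-positive and false-negative counts. A minimal sketch, in which the function name and the convention of returning 0 for an empty denominator are assumptions:

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, 8 correct detections with 2 false alarms and 2 misses give a precision, recall and F1 of 0.8 each.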
3.3. Network Structure Comparison Experiment
The network structure affects the training results of the reconstruction network. In this experiment, the structure of LRN is compared with classic networks such as FCN [
21] and U-Net [
48]. The experimental results are shown in
Figure 7.
The results show that a reconstruction network used to detect texture surface defects should not have too many layers. Although a deep network structure has a strong feature extraction ability, it tends to reconstruct the defect area as well, so the residual between the reconstructed image and the original image is almost zero and the defect cannot be located. A lightweight reconstruction network, by contrast, can fully learn the texture features of positive samples while reconstructing the defect area as normal texture, which produces an obvious reconstruction error. Therefore, LRN needs neither many layers nor complex structural components such as GRL (Global Residual Learning) [
49], sub-pixel layer [
50] and residual connection [
51].
3.4. Loss Function Comparison Experiment
L1, L2, LSSIM and their combinations are selected as loss functions for comparative experiments. During training, the image block (patch) size is 32 × 32 pixels and the batch size is 256; after 1000 iterations, the model outputs are passed to the defect location module.
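A training batch with these settings can be assembled as in the sketch below. The text does not say how patches are drawn from the 512 × 512 images, so the uniform random sampling used here, like the function name, is an assumption.

```python
import numpy as np

def sample_patches(image, patch=32, batch=256, rng=None):
    """Draw `batch` random patch x patch crops from a 2-D grayscale image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    # Top-left corners chosen so that every crop lies fully inside the image.
    rows = rng.integers(0, h - patch + 1, size=batch)
    cols = rng.integers(0, w - patch + 1, size=batch)
    return np.stack([image[r:r + patch, c:c + patch] for r, c in zip(rows, cols)])
```

Called on a 512 × 512 image with the defaults, the function returns an array of shape (256, 32, 32).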
Figure 8a,b show the experimental results for two types of surface defect samples under the various loss functions; the red circle marks the defect area.
Figure 8a shows the defect samples with irregular surface texture. From the comparison of residual results, it can be seen that the residual results obtained by using
L2 as the loss function contain more noise outside the real defect area, which forms pseudo defects; using
LSSIM alone as the loss function yields a detected defect area slightly smaller than the real one; compared with the other loss functions,
LLRN achieves a better result.
Figure 8b shows the defect samples with regular surface texture. From the comparison of residual results, it can be seen that the integrity of the defect area obtained by using
L2 is poor, which is similar to the detection result obtained by using
L2 +
LSSIM.
LLRN achieves a good result, and the result is similar to that using only
L1.
Table 1 shows the
Precision,
Recall and
F1 Measure of LRN under different loss functions. For the defect samples with an irregular surface texture in
Figure 8a,
LLRN achieved maximum values of 0.75 and 0.82 for
Recall and
F1 Measure, respectively, and is slightly inferior to
LSSIM in terms of
Precision. For the defect samples with a regular surface texture, shown in
Figure 8b, when only
L1 is used,
Recall achieves the highest value of 0.76, followed by LLRN at 0.71.
Precision achieved by using only
LSSIM is the highest, which is 0.96, and
LLRN is second with a slightly lower value of 0.87. For
F1 Measure,
L1 performs best.
The results show the following: (1) For a regular texture, L1 alone, LSSIM alone and their combination (LLRN) all achieve good results that differ only slightly. (2) For an irregular texture, LLRN is recommended because it obtains better results. (3) LLRN handles more types of texture surface anomalies than the other loss functions and is the best overall choice among those compared.
3.5. Experiment of Texture Complexity
In the face of defect detection tasks with different texture complexities, it is necessary to evaluate the applicability of LRN-L. The characteristic parameters of texture samples shown in
Figure 6 are calculated according to Formulas (6) and (7) and are shown in
Table 2. The experimental results are shown in
Table 3.
From the results presented in
Table 3, LRN-L reconstructs low-complexity and medium-complexity textures well and achieves a higher defect detection rate on them, whereas its performance drops on high-complexity textures. Notably, within the low-complexity and medium-complexity range there is no direct linear relationship between the evaluation indices and texture complexity. For instance, the defects in samples (d) and (j) are largely undetected despite the low texture complexity of these samples, because their texture structures are irregular and inhomogeneous; both samples show low values across all three indices.
Overall, LRN-L yields superior results when applied to samples with low-complexity and medium-complexity textures, particularly those with relatively uniform texture structures. On the other hand, samples with low-complexity and medium-complexity textures but non-uniform texture structures have low detection indices. LRN-L is unsuited to deal with high-complexity textures.
3.6. Experiment of Loss Function under Different Weight Factors
LLRN is a combination of
L1 and
LSSIM, as shown in Formula (12). Weight factor
α was used to balance the relative importance of these two components. Using sample (g) in
Figure 6, with
α from 0.15 to 0.85, we conducted a series of comparative experiments in increments of 0.1. The results are shown in
Figure 9 and
Table 4.
As illustrated in
Figure 9, the results vary significantly with changes in
α. As
α increases, the
LSSIM proportion decreases, which reduces the structural influence. The results obtained at
α = 0.15 exhibit the least amount of noise and yield more accurate defect localization.
Table 4 demonstrates that
α = 0.15 produces the highest
Recall and
F1 Measure, which are 0.79 and 0.73, respectively, as well as the second-highest
Precision among the evaluation indices.
3.7. Comparison Experiment with Related Algorithms
In this experiment, LRN-L is compared with traditional unsupervised methods (LCA [
52], PHOT [
28]) and an autoencoder-based unsupervised method (MSCDAE [
27]); the literature reports that MSCDAE outperforms other autoencoder-based methods such as ACAE [
9] and RCAE [
25]. The experiment uses the texture samples (b), (e), (j), (n) and (o) in
Figure 6. These five texture types are low-complexity or medium-complexity textures. Each of the five types contains 100 defect-free positive samples for training and 10 defect samples for testing. The default network parameters are as follows: the block size is 32 × 32, the batch size is 256, the number of epochs is 1000 and the weight
α is 0.15. The results are shown in
Figure 10.
LCA eliminates the high-frequency part of the image, which represents the background, and retains the low-frequency part, which represents the defect; it is therefore not suitable for irregular textures, as shown in No. 3 in
Figure 10. For PHOT, only the detection on No. 3 is effective. MSCDAE detects the defect areas of all samples but also flags some defect-free areas as suspected defects, as in No. 1, No. 3 and No. 5. LRN-L achieves good detection results on all of the tested defect and texture types.
In addition,
Recall,
Precision and
F1 Measure are used to quantitatively analyze the experimental results of the above four methods, as shown in
Table 5 (the optimal result is highlighted in bold font).
As can be seen from
Table 5, the three metrics of LRN-L are superior to other algorithms in almost all types of samples. The
Recall on sample No. 3 is slightly lower than that of MSCDAE, but MSCDAE will simultaneously detect defect-free areas and generate pseudo defects.
The efficiency of the algorithms is also compared. Sample images measuring 1024 × 1024 pixels were used in the experiment. Under the same computational performance, the processing time of the four methods is compared, as shown in
Table 6. The average detection time of LRN-L is 2.82 ms, which meets the requirements of industrial real-time detection. The other methods are more time-consuming, which limits their practical application.
4. Conclusions
In this paper, a texture defect detection method based on a lightweight reconstruction network (LRN-L) is proposed. LRN uses a CAE with a lightweight structure as the reconstruction network. In the texture reconstruction stage, only defect-free samples are used for training, which addresses the shortage of defective samples in industry. In the defect location stage, the defect region is located accurately by a segmentation algorithm. The LLRN loss function is designed for defect detection, which improves the detection efficiency. An evaluation index of image complexity is established, the texture complexity of the texture samples is calculated, and the samples are graded by complexity level. This paper discusses the influence of network structure, loss function, texture complexity and other factors on unsupervised defect detection and compares the proposed LRN-L with other unsupervised algorithms on multiple types of texture samples. The results show that LRN-L is robust, accurate and generalizes well, and that it is well suited for deployment in industrial inspection. Because the network is lightweight, LRN-L is best suited to detecting surface defects of industrial products with low-complexity and medium-complexity textures.