Article

Knowledge Embedding Relation Network for Small Data Defect Detection

1 China Waterborne Transport Research Institute, Beijing 100088, China
2 Shandong Maritime Safety Administration, Qingdao 266002, China
3 China National Offshore Oil Corporation, Tianjin 300459, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2024, 14(17), 7922; https://doi.org/10.3390/app14177922
Submission received: 9 August 2024 / Revised: 29 August 2024 / Accepted: 3 September 2024 / Published: 5 September 2024
(This article belongs to the Special Issue Object Detection and Image Classification)

Abstract

In industrial vision, the scarcity of defect samples is one of the key constraints on deep-vision quality inspection. This paper studies defect detection under a small training set, trying to reduce the model's dependence on defect samples by exploiting normal samples. To this end, we propose a Knowledge-Embedding Relational Network (KRN): first, unsupervised clustering over convolutional features is used to model the knowledge contained in normal samples; at the same time, the conv feature is obtained from a backbone network whose feature extraction is assisted by image segmentation; then, we build the relationship between the knowledge and the predicted sample through covariance, embed the knowledge, further mine the correlation using a Gram operation, power normalize the resulting high-order features, and finally send them to the prediction network. Our KRN has three attractive characteristics: (I) Knowledge Modeling uses an unsupervised clustering algorithm to statistically model the standard samples so as to reduce the model's dependence on defect data. (II) Covariance-based Knowledge Embedding and the Gram Operation capture the second-order statistics of knowledge features and predicted image features to deeply mine robust correlations. (III) Power Normalizing suppresses the burstiness introduced by the covariance modules and the complexity of the feature space. KRN outperformed several advanced baselines under small training sets on the DAGM 2007, KSDD, and Steel datasets.

1. Introduction

Industrial quality inspection is one of the important application directions of computer vision in smart factories. However, in the industrial environment, defect samples are very scarce, and there may even be zero samples for some defect categories [1]. This makes small training sets a key constraint on the industrial implementation of many data-driven algorithms [2].
In the context of small training datasets, industry scholars have explored two main approaches: increasing the amount of data and reducing the dependency of algorithms on data. The former approach mainly generates new data through augmentation or introduces new data from other datasets [3,4], while the latter focuses on model improvement [5] and algorithm optimization [6] to enhance the feature extraction capabilities of small training sets. However, these methods from other machine learning applications in industrial manufacturing may not be directly applicable to industrial computer vision problems. For instance, in defect detection, defects may not be consistently present, or only a limited number of samples may be available over an extended period. Publicly available defect datasets also suffer from a scarcity of defect samples: the KolektorSDD dataset [7] contains only 52 defect samples out of 399 total, the AITEX dataset [8] has only 105 defect samples, and each category in the RSDDs dataset [9] has only 300 samples. This makes it challenging to effectively apply the aforementioned methods in these contexts.
Therefore, we believe that small-training-set industrial vision requires the introduction of external knowledge. Crucially, we noticed an important difference between defect detection and generic object detection: the support set for defect detection contains not only defect samples but also standard samples of the product, whereas object detection only has a small number of object samples [10].
Most existing automatic inspection equipment manufacturers do not use data-driven AI algorithms for automatic optical inspection; their equipment generally still relies on hand-designed features, although the associated thresholds test an engineer's ability to tune parameters [11]. Deep learning algorithms can capture latent features and therefore far exceed traditional algorithms on a single dataset, but in an actual industrial production environment they cannot mine the correct features from limited samples, that is, defect characteristics outside the dataset. The traditional "statistical modeling + similarity matching" approach can be more robust in this respect.
Inspired by this, we propose performing statistical modeling on standard samples as an auxiliary knowledge representation. This standard sample participates in prediction and assists product quality inspection. Unlike other algorithms, we use standard samples as a priori knowledge rather than as an input. Specifically, we introduce the Gram Matrix for Knowledge Modeling. The Gram Matrix has achieved outstanding results in the field of style transfer [12], and some scholars have introduced it into few-shot classification with impressive results [13]. We believe that the covariance operation in the Gram Matrix can deeply mine pixel-level feature correlations, which can also be considered related to texture features. Therefore, we introduce it into Knowledge Modeling and use it as an adjunct to enhance the identification of known and unknown defects.
The main contributions of this paper are as follows:
  • Aiming at the lack of defect samples in industrial quality inspection scenarios, we start from external knowledge, use statistical modeling to build standard templates, treat them as prior knowledge, and design a defect detection network enhanced by this prior knowledge;
  • To measure the difference between the standard sample and the predicted sample, and to embed this difference into the features for subsequent defect identification in the head, we designed a Knowledge-Embedding module based on self-attention;
  • To capture the relationships between features in the vector space and mine weak clues, we designed an uncentered covariance matrix that extracts the characteristics of each statistical dimension; unnecessary information is automatically down-weighted during extraction, avoiding interference from cluttered backgrounds;
  • We demonstrated the effectiveness of this method on the public DAGM2007 [14], KolektorSDD [15], and Severstal Steel defect detection datasets.

2. Literature Review

Small Training Sets. Small training sets have always been a huge challenge in the application of deep machine vision. A small training set turns machine learning into the task of "decomposing the dataset into different meta-tasks to understand the generalization ability of the model when the category changes", otherwise known as Few-Shot Learning [16]. There are three families of methods for small training sets: data augmentation, model improvement, and algorithm optimization [17]. Data augmentation expands the training data through various image operations to increase the effective number of training samples, such as mixup [18], adding noise [19], and generating samples with GANs [20]; in recent years, pseudo-labeling [21,22] has also become an effective way to improve performance. Model improvement refers to adjusting the model structure to enhance the feature extraction ability [23]. Algorithm optimization adjusts learning strategies to improve algorithm performance. Semi-supervised and unsupervised learning have also become popular ideas for solving small-training-set problems [24]. This paper focuses on introducing a priori knowledge and improving the model to reduce the algorithm's dependence on defect data.
Active Shape Model. A statistical model of the standard PCB board has a strong positive impact on defect detection. Statistical shape modeling was proposed by Cootes et al. [25] in 1995; it is a deformable model in computer vision used to model shapes in images. The method only needs to establish a flexible mathematical model and compare each inspected sample against it. Using this method, the debugging efficiency of AOI equipment is improved, and the misjudgment rate is reduced. Inspired by this, this paper encodes standard images through a CNN, statistically analyzes the standard samples using a clustering algorithm to obtain representative standard samples, and constructs the standard template as the representation of a priori knowledge.
Self-Attention Modules. These have been successfully applied in NLP [26] and physical system modeling [27]. In natural language processing, the self-attention mechanism captures the relationship between the source sentence and the target sentence and replaces the recurrent neural network with an attention model, enabling parallel implementation and more efficient learning. These works inspired us to derive a knowledge-embedding variant based on correlation mining: we converted the original elements from words to conv features and employed the knowledge model of the predicted image, using this mechanism to establish the knowledge embedding in the feature mapping from low to high dimensions.
Gram Matrix. The Gram Matrix can be regarded as an uncentered covariance matrix between features, that is, a covariance matrix computed without mean subtraction. Second-order statistics have been studied in the context of texture recognition through so-called regional covariance descriptors (RCDs), which were further applied to object class recognition [28]. Co-occurrence patterns can also be used in the CNN setting: a recent approach [29] extracted feature vectors at two separate locations in a feature map and performed an outer product to form a CNN co-occurrence layer. Higher-order statistics have also been used for fine-grained image classification [30] and domain adaptation [31]. SoSN utilizes second-order information and power normalization for end-to-end training in one- or few-shot learning. Based on the second-order statistics applied to these matrices, we designed a multi-relational feature descriptor that captures deep relationships between proposals before they are passed to the classification network for defect identification.

3. Research Methods

Below, we introduce our deep template matching defect detector network and then describe its individual components.

3.1. Overview

The method in this paper addresses so-called small-training-set defect detection, which is essentially a classification task. In some scenarios, however, segmenting the detected defects is also required. Taking classification as the main goal, we also evaluate defect segmentation and detection.
Different from some defect detection schemes that simply add negative samples, we use standard samples as a priori knowledge and identify defects by mining feature relationships. Our Knowledge-Embedding Relational Network (KRN) consists of (i) an Encoding Network, (ii) Knowledge Modeling, (iii) Knowledge Embedding, (iv) a Gram Operation, and (v) a Prediction Network. Figure 1 shows the overall architecture.
The role of the Encoding Network is to generate image-level convolutional feature vectors (descriptors), and our Encoding Network includes the segmentation part. The task of the Knowledge Modeling part is to perform statistical modeling on multiple standard samples in order to obtain a knowledge representation that can assist in enhancing defect detection. The knowledge association embedding module is an operation of mining the relationship between prior knowledge and predicted samples, aiming to promote the fusion of prior knowledge and predicted samples. The task of the Gram Operation is to use the Gram Matrix to mine the latent relationship between each feature vector so as to make the defect salient. Finally, the Predictive Network learns and recognizes this knowledge-embedded relation mining feature.

3.2. Encoding Network

The feature encoding network is responsible for generating convolutional feature vectors, which serve as image descriptors. To address the challenges of scarce samples, high resolution, and small targets in industrial visual defect detection tasks, this paper utilizes a convolutional neural network architecture based on the ESDN [32]. Specifically, we employed the Segmentation Network and Decision Network, which perform downsampling by a factor of 32, as the feature encoding network. It is important to note that the segmentation component is used as an auxiliary module for feature extraction.
The Encoding Network can be described as $f : (\mathbb{R}^{W \times H}; \mathbb{R}^{|F|}) \rightarrow \mathbb{R}^{W \times H}$, where $W$ and $H$ represent the width and height of the input image. The Encoding Network $f$ is a convolutional neural network specifically designed for feature extraction in industrial visual defect detection tasks. It takes an input image of size $W \times H$ and produces a feature map of the same spatial dimensions. This network includes multiple convolutional layers, which downsample the input by a factor of 8. After downsampling, the output is split into two branches: one branch undergoes segmentation using a 1 × 1 convolution, and the result is concatenated with the original downsampled feature map. This architecture is optimized to retain defect details by operating at a middle scale, balancing the trade-off between computational efficiency and the preservation of important features.
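To make the encoder concrete, the following is a minimal PyTorch sketch of the structure just described; the channel widths, kernel sizes, and the exact placement of the segmentation branch are illustrative assumptions rather than the exact ESDN configuration:

```python
import torch
import torch.nn as nn

class EncodingNetwork(nn.Module):
    """Sketch of the encoder: three stride-2 stages downsample by 8; a 1x1
    conv produces an auxiliary segmentation map that is concatenated back
    onto the downsampled feature map (assumed widths, not the ESDN's)."""
    def __init__(self, in_ch: int = 1, width: int = 32):
        super().__init__()
        def stage(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        self.backbone = nn.Sequential(
            stage(in_ch, width),
            stage(width, width * 2),
            stage(width * 2, width * 4),
        )
        self.seg_head = nn.Conv2d(width * 4, 1, kernel_size=1)  # auxiliary segmentation branch

    def forward(self, x):
        feat = self.backbone(x)            # (B, 4*width, H/8, W/8)
        seg = self.seg_head(feat)          # (B, 1, H/8, W/8) segmentation logits
        fused = torch.cat([feat, seg], 1)  # concat segmentation result with features
        return fused, seg
```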

3.3. Correlation Knowledge Embedding

3.3.1. Knowledge Modeling

Defect detection is a small training sets recognition task, and a large number of normal samples can be used as a reference. Therefore, we specially designed the Knowledge Modeling module, which aims to perform knowledge mining on normal samples for the reference of defect recognition.
Specifically, we used a ResNet-50, whose weights were trained on ImageNet and therefore have broad classification ability, to map standard samples to high-dimensional features and convert them into tensors. Then, n representative images are selected from a large number of standard samples by a clustering algorithm, and the multi-dimensional image composed of the overlapping standard images is defined as $X_{norm} \in \mathbb{R}^{W \times H}$, that is, the constructed statistical knowledge.
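The paper does not pin down the clustering algorithm, so the sketch below assumes k-means over ImageNet-pretrained ResNet-50 features and takes the image nearest each centroid as that cluster's representative; the function and variable names are ours:

```python
import numpy as np
import torch
import torchvision.models as models
from torchvision.models import ResNet50_Weights
from sklearn.cluster import KMeans

def build_knowledge(normal_images: torch.Tensor, n_clusters: int = 8) -> torch.Tensor:
    """Knowledge Modeling sketch. normal_images: (N, 3, H, W) defect-free samples."""
    backbone = models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
    backbone.fc = torch.nn.Identity()  # keep the 2048-d pooled features
    backbone.eval()
    with torch.no_grad():
        feats = backbone(normal_images).cpu().numpy()  # (N, 2048)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(feats)
    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dist = np.linalg.norm(feats[members] - km.cluster_centers_[c], axis=1)
        reps.append(int(members[dist.argmin()]))  # sample nearest to the centroid
    # Stack the n representative images into the knowledge tensor X_norm.
    idx = torch.as_tensor(reps, dtype=torch.long)
    return normal_images[idx]
```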

3.3.2. Knowledge Embedding

Then, we designed a knowledge embedding method based on self-attention, as shown in Figure 2.
The simplest knowledge fusion uses concat and add operations, but we hoped the fusion could also excavate the correlation between the two inputs to some extent. Referring to the self-attention mechanism, we designed a knowledge embedding method based on relationship mining. In our Knowledge Embedding module, the input consists of the conv feature $\Phi_{pre}$ and the knowledge feature $\Phi_{norm}$; $\Phi_{pre}$ comes from the last convolution output of the Encoding Network, while $\Phi_{norm}$ is the shallow tensor obtained from $X_{norm}$ through three max-pooling layers and three groups of convolutions, as shown in Formula (1).
$$\Phi_{norm} = f_{norm}(X_{norm}; F), \quad \Phi_{norm} \in \mathbb{R}^{K \times N} \qquad (1)$$
where $F$ denotes the learnable parameters of the three convolution layers in the Knowledge Model.
A dot product is performed between the conv features and knowledge features to obtain their correlation, and a softmax function is applied to obtain the weights on the values. Given the matrix $\phi_{norm}$ (by flattening $\Phi_{norm}$) and the matrix $\phi_{pre}$ (by flattening $\Phi_{pre}$), their correlation is computed as follows:
$$M = s(\Phi_{norm}, \Phi_{pre}) = \mathrm{softmax}\left(\phi_{norm}\,\phi_{pre}^{T}\right) \qquad (2)$$
This correlation will be embedded into the original conv feature through multiplication. The output is computed as
$$\phi_{out} = \mathrm{mul}(M, \phi_{pre}) = \mathrm{softmax}\left(\phi_{norm}\,\phi_{pre}^{T}\right)\phi_{pre} \qquad (3)$$
Finally, the feature matrix after Knowledge Embedding is reshaped back into a feature map.
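Equations (2) and (3) amount to a single attention step. Below is a minimal sketch; the tensor shapes (rows = feature vectors, columns = channels) are our assumption:

```python
import torch
import torch.nn.functional as F

def knowledge_embed(phi_norm: torch.Tensor, phi_pre: torch.Tensor) -> torch.Tensor:
    """phi_norm: flattened knowledge features (K, C);
    phi_pre: flattened conv features of the predicted image (L, C)."""
    corr = phi_norm @ phi_pre.t()   # (K, L) dot-product correlation
    M = F.softmax(corr, dim=-1)     # attention weights, Eq. (2)
    out = M @ phi_pre               # correlation embedded into conv features, Eq. (3): (K, C)
    return out
```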
In addition, after visiting an electronics factory, we speculate that adding the standard template would also help the consumer electronics industry with certain wrong-part defects: the products had no appearance damage, but the soldered components were inconsistent with the design drawings. Being limited by the present datasets, we leave testing this to future practice.

3.4. Gram Operation

The Gram Matrix is an operation that deeply mines the correlation between features, as shown in Figure 3. We used it as a feature mining tool for defect textures and normal textures. Its input is a feature vector from Knowledge Embedding, which we define as $\Phi = \{\phi_n\}_{n \in \mathcal{N}}$. We then denote by $\phi\phi^{T}$ the (uncentered) covariance operation on the feature vectors. Taking $\Phi$ as an example,
$$\Psi(\Phi) = \frac{1}{N}\sum_{n \in \mathcal{N}_s} r(\phi_n) = \Psi\left(\{\phi_n\}_{n \in \mathcal{N}_s}\right) = \frac{1}{N}\sum_{n \in \mathcal{N}_s} \phi_n \phi_n^{T} \qquad (4)$$
KE and the GO capture feature correlation through covariance, which essentially introduces second-order statistics. Second-order statistics have to deal with so-called burstiness, "the property that a given visual element appears more times in an image than a statistically independent model would predict". Power Normalization [13] is known to suppress this burstiness and has been extensively studied and evaluated in the context of Bag-of-Words and Few-Shot Learning. Therefore, we adopted SigmE PN, which is defined as
$$G_{SigmE}(M, \eta) = \frac{2}{1 + e^{-\eta M / (\mathrm{Tr}(M) + \lambda)}} - 1 \qquad (5)$$
where $1 \leq \eta \leq N$ interpolates between counting and detection, $\lambda = 1\mathrm{e}{-6}$ is a regularization constant, and the trace $\mathrm{Tr}(M)$ stops the diagonal from exceeding 1.
After calculating the Gram Matrix, it is replicated 16 times along the channel dimension to ensure compatibility with the subsequent prediction network layers, which are designed to process a specific number of input channels. While a single-channel Gram Matrix would reduce computational redundancy, it would require significant architectural changes to the prediction network, potentially affecting its performance and stability. The replication maintains the continuity and integrity of the convolutional bottleneck structure without altering the network's existing architecture.
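Putting Equations (4) and (5) together with the replication step, a sketch of the full Gram Operation might look as follows; the value of eta is an assumed placeholder within the stated range 1 ≤ η ≤ N:

```python
import torch

def gram_operation(phi: torch.Tensor, eta: float = 10.0, lam: float = 1e-6) -> torch.Tensor:
    """phi: (N, C) flattened feature vectors from Knowledge Embedding."""
    N = phi.shape[0]
    M = phi.t() @ phi / N                         # (C, C) uncentered covariance, Eq. (4)
    M = M / (torch.trace(M) + lam)                # trace normalization keeps the diagonal <= 1
    G = 2.0 / (1.0 + torch.exp(-eta * M)) - 1.0   # SigmE power normalization, Eq. (5)
    return G.unsqueeze(0).repeat(16, 1, 1)        # replicate 16x along the channel dimension
```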

3.5. Predictive Network

The function of the Prediction Network is to mine and judge the Knowledge-Embedding features and to realize detection on the target image. We did not predict directly from the Knowledge-Embedding features but used three Conv Blocks, specifically two 1 × 1 convolutions with 32 channels and one 3 × 3 convolution with 16 channels, implemented as convolution operations to facilitate feature mining between standard templates and predicted samples.
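A minimal sketch of these three Conv Blocks follows; the input channel count, activations, and the pooled classification head are illustrative assumptions, since the text only fixes the convolution shapes:

```python
import torch.nn as nn

class PredictionNetwork(nn.Module):
    """Sketch of the Prediction Network: two 1x1/32 convs and one 3x3/16 conv,
    followed by an assumed global-pool + linear defect/no-defect classifier."""
    def __init__(self, in_ch: int = 16, num_classes: int = 2):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 16, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes),
        )

    def forward(self, x):
        return self.head(self.blocks(x))  # classification logits
```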

3.6. Loss Function

The loss function of the Knowledge-Embedding Relation Network (KRN) consists of two parts: the auxiliary segmentation loss $L_{seg}$ and the classification loss $L_{cls}$. The total loss can be denoted as $L = \lambda L_{seg} + \delta (1 - \lambda) L_{cls}$, where $\lambda$ is a simple linear function of training progress, and $\delta$ is a weight coefficient with a small value. The detailed settings follow [32].
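As a sketch of how the two terms combine, assuming (following [32]) that lambda decays linearly with training progress; the concrete segmentation and classification criteria here are our assumptions:

```python
import torch.nn as nn

seg_criterion = nn.BCEWithLogitsLoss()   # assumed segmentation loss
cls_criterion = nn.CrossEntropyLoss()    # assumed classification loss

def total_loss(seg_logits, seg_target, cls_logits, cls_target,
               epoch: int, total_epochs: int, delta: float = 0.01):
    lam = 1.0 - epoch / total_epochs     # linear schedule, 1 -> 0 over training
    l_seg = seg_criterion(seg_logits, seg_target)
    l_cls = cls_criterion(cls_logits, cls_target)
    return lam * l_seg + delta * (1.0 - lam) * l_cls
```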

4. Experiments

Below, we experimentally demonstrate the merits of our Knowledge-Embedding Relation Network. Our method was mainly evaluated on the DAGM2007, KolektorSDD, and Severstal Steel datasets. We compared it with other advanced algorithms, designed small-training-set tests, and conducted ablation experiments.

4.1. Datasets

DAGM2007 contains texture data of 10 categories; each category contains 1000 negative samples and 150 positive samples saved as grayscale 8-bit PNG images. The training and test sets of each category were split 1:1, with an image size of 512 × 512. In addition, we explored small-training-set scenarios with 5, 10, 15, 20, and 25 positive samples, sampled proportionally. It should be pointed out that the standard-sample modeling in all training runs was built from the standard samples of the complete dataset.
The KolektorSDD dataset includes eight non-overlapping images collected from the surface of each of 50 defective electronic commutators, for a total of 399 images: 52 defective and 347 defect-free. All data settings follow those for DAGM, except that the image size was 1408 × 512 for this smaller dataset. In addition, our ablation experiments were evaluated on the KSDD dataset with five positive samples.
The Severstal Steel dataset is from the Kaggle challenge; it contains 12,568 images and involves four kinds of defects. In the effectiveness demonstration of this method, we adopted a scheme of 1000 positive samples, with an input image size of 256 × 1600. In the small-training-set scenario, our data settings were the same as above, but the number of test samples was not changed (in order to be more consistent with the real scenario).

4.2. Implementation Details

All code was implemented in PyTorch. All experiments were run in the PyTorch framework under Ubuntu, with two Titan X GPUs for acceleration. For the learning rate, we followed the learning scheme of [32]: DAGM adopted LR = 0.01 and δ = 1; KSDD adopted LR = 0.5 and δ = 0.01; and Steel adopted LR = 0.1 and δ = 0.1. For the number of training iterations, because the high-dimensional mapping introduced by the high-order moments slowed convergence at the beginning of training, we adjusted the number of epochs: for the small training sets of 5, 10, 15, 20, and 25 positive samples, we trained for 350, 190, 170, 150, and 140 epochs, respectively. In the complete experiments, all three datasets were trained for 150 epochs.
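For reference, the schedule described above can be summarized in a single configuration sketch; the assignment of the LR = 0.1, δ = 0.1 setting to the Steel dataset is our assumption, since the text does not name the third dataset explicitly:

```python
# Hedged summary of the training schedule described above.
CONFIG = {
    "DAGM":  {"lr": 0.01, "delta": 1.0},
    "KSDD":  {"lr": 0.5,  "delta": 0.01},
    "Steel": {"lr": 0.1,  "delta": 0.1},   # assumed mapping of the third setting
    # epochs per small-training-set size (positive samples -> epochs)
    "epochs_small": {5: 350, 10: 190, 15: 170, 20: 150, 25: 140},
    "epochs_full": 150,
}
```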
We explored training tricks less and paid more attention to training with few samples. In the following experiments, we compare our results with several advanced methods, and we report the commonly used metrics for the KRN, such as AP, FP, and FN.

4.3. Comparison with the State of the Art

4.3.1. DAGM2007

The proposed KRN was evaluated on the DAGM 2007 dataset, and the obtained true positive rate (TPR) and true negative rate (TNR) are shown in Table 1. Our method achieved 100% TPR and TNR on all folds, which means it completely solved this dataset. In practice, the ESDN had achieved this goal before. Some other explorations also achieved high scores: Racki et al. [33] obtained nine 100% outcomes and 98.5% on one fold, and Kim et al. [34] obtained 100% except on fold 1 and fold 4. DAGM, as a classic material texture dataset, has sufficient data samples; we tested on it to show that the KRN maintains a high score on this complete dataset. We visualize some results in Figure 4.

4.3.2. KolektorSDD

The proposed KRN is compared with the ESDN, SDN, and EfficientNet in Table 2. The KSDD is a genuinely industrial few-shot dataset with only 52 positive samples. Its authors define it as a classification problem and specially add small-scale segmentation annotations as an auxiliary. Assisted by segmentation, the SDN obtained a high score of 99.00%. After a series of optimizations such as dynamic balance loss and gradient adjustment, the ESDN became an end-to-end defect detection model and achieved a 99.49% AP with 1 + 2 (FP + FN) in our experiments. According to Table 2, our KRN achieved a further improvement to 100.00% AP with 0 + 0 (FP + FN).

4.3.3. Severstal Steel

Table 3 compares our KRN with the ESDN, SDN, and EfficientNet on Severstal Steel with 1000 positive samples. As shown in the table, the KRN was the best among all methods: its AP was 1.28%, 7.38%, and 8.17% higher than that of the ESDN, SDN, and EfficientNet, respectively.

4.4. Ablation Study

Below, we analyze the effectiveness of each component of the proposed KRN approach. Besides the per-component ablations, we designed six groups of experimental variants, including a variant using all components and a variant using none of them. The following ablation studies were based on the KSDD dataset with five positive samples. For some components, we further designed comparative experiments for in-depth analysis; for example, for the Knowledge Embedding part we also tested add, concat, and mul operations.
Knowledge Modeling (KM): We conducted KM on flawless samples. In this part of the ablation, we replaced the KM output with the conv feature of the predicted image to simulate the absence of KM. The experimental results show that KM has a great impact on the KRN (97.39% vs. 94.88%). Table 4 shows that, without statistical modeling, the AP scores of our KRN on fold 0 and fold 1 decreased by 6.2% and 3.28%, respectively. The analysis shows that KM brings additional knowledge and enhances the recognition performance of the model.
Knowledge Embedding (KE): In the KE ablation, our KRN was compared with a variant using regular concat fusion (97.39% vs. 96.49%) in Table 4. Concat had the best AP on fold 0 and fold 2, but only a 91.58% AP on fold 1. The KE module is designed to capture the relationship between knowledge features and predicted-sample features, and it significantly enhanced defect identification on fold 1 (97.98% vs. 91.58%).
We additionally analyzed the impact of the concat, add, and mul operations relative to KE. The experimental results are shown in Table 5. Among them, the add operation is the roughest, fusing features through simple addition, and had the lowest AP score of 95.26%. The concat operation retains both the original features and the knowledge features and scored 1.23% higher than add. Mul enlarges local differences, and the multiplication between the conv feature and the knowledge feature is conducive to mining the relationship between the two; its AP score was close to KE's. Our KE not only retains the original features but also constructs the relationship between the standard sample and the predicted sample, deeply excavating the latent differences, and showed the best performance (97.39% vs. 95.26%, 96.49%, and 97.26%).
Gram Operation (GO): The GO supplements KE by mining the relationships between features after KE. As shown in Table 4, removing the GO caused a 2.27% drop in performance (97.39% vs. 95.12%). The GO mines the relationships within the final combined features. Based on the results on the three folds (94.28%, 95.00%, and 96.08%), the GO was one of the main components improving the model's performance. The results on fold 0 show that its contribution to the KRN is second only to KM's.
Power Normalizing (PN): We also analyzed the improvement brought by the PN, which acts as a burst suppressor on second-order statistics. The previous experiments show that KE and the GO can introduce negative effects; analysis indicates this is due to the burstiness of high-order statistics. As seen in Tables 4 and 6, after adding the PN operation, the negative effects brought by KE and the GO were suppressed: the AP on fold 0 and fold 1 increased by 2.54% and 6.77%, respectively, and the mean AP increased by 2.60%.
In particular, we explored two different PN strategies: AsinhE and SigmE. Without any power normalization, the AP score was only 94.79%, with an FP + FN of 2 + 2. After adding Power Normalization, performance improved: false and missed detections were reduced from 4 to 2, and the AP increased to 96.89% and 97.39%, respectively. Our knowledge-model relationship detector with SigmE pooling is thus beneficial for small-training-set defect detection.
In summary, KM, the GO, and PN have a significant impact on the three folds. Among them, KM and the GO bring greater gains on fold 0, KE and PN have a greater impact on fold 1, and KE seems to have a negative effect on fold 2. KM introduces external knowledge, KE embeds and fuses the knowledge, and the GO and PN further promote the integration. The four modules cooperate to produce gains, mine latent features, and improve the performance of the small-training-set defect detection model.

4.5. Small Training Sets

We paid special attention to the scenario where defect samples are difficult to obtain in industrial settings, that is, where only a small number of positive samples is available. We used 5, 10, 15, 20, and 25 positive samples for each dataset. The results are shown in Table 7.
For the DAGM2007 dataset, originally with 150 positive samples, EfficientNet and the ESDN perform well, but as the number of positive samples decreased, the detection performance of all algorithms declined to varying degrees; the details can be seen in Table 7. When the number of positive samples decreased from 150 to 25 and then to 5, the AP of the baseline fell from 100% to 99.18% and then to 85.58%, while the AP of the KRN fell from 100% to 99.11% and then to 90.12%. When the training samples were sufficient, both AP scores were very high; when the number of positive samples dropped to five, the AP of the baseline decreased significantly (by about 14%), whereas the KRN performed better, decreasing by only about 10%.
For the KSDD dataset, both methods achieved good scores. After analysis, we believe this is because KSDD is itself a small dataset and its test set is not complex, so a good recognizer can be obtained with only a small number of samples. However, five positive samples still differentiated the methods: according to Table 7, when the number of positive samples was 20 or more, the AP of each method was close to 100%. As the number of positive samples decreased to five, the baseline dropped by 4.18%, while the KRN, aided by the standard-template KE and relationship mining modules, was least affected, dropping by only 2.65%.
Regarding the Steel dataset, as with DAGM, all methods degraded considerably: in the five-positive-sample scenario, the AP of the baseline decreased to 58.45%, with false and missed detections of 490 + 25; the KRN performed best with an AP of 63.28% and false and missed detections of only 467 + 31. Although these gains are smaller than on DAGM and KSDD, they are consistent and significant considering that the Steel dataset is more challenging in complexity and size.
We visualized the experimental results in Figure 5. The AP and FP + FN improved to varying degrees as the number of positive samples increased. This reveals the fragile generalization of the baseline under small training sets: without adequate training images, it detected defects poorly. In contrast, for our KRN, the KE and GO demonstrated superior performance in small-training-set defect detection.

4.6. Visualization of Detection Results

The following shows some test images from the DAGM, KSDD, and Steel datasets. We output the segmentation results, including the original image, ground truth, baseline, and KRN. Our model is primarily classification-based, and segmentation is the auxiliary part of conv feature extraction; therefore, only shallow segmentation at small and medium scales is required. In the visual results, the defect region in the segmented image is larger than the ground truth, which is expected. In addition, in Figure 4, in the first sample of DAGM, the third sample of KSDD, and the second sample of Steel, it can be seen that the difference was amplified by the KRN after introducing the second-order covariance moment. At the same time, the higher-order features increased burstiness (the noise in the upper right of the second Steel sample).

5. Conclusions

In this paper, we proposed a Knowledge-Embedding Relation Network (KRN) to address few-shot, small-training-set defect detection. Our model extends the ESDN by embedding standard templates and second-order statistics into segmentation-assisted CNN features. The standard template provides external knowledge beyond the defect samples, while KE and the GO provide high-dimensional latent relationship features. To demonstrate the effectiveness of the KRN, we conducted extensive quantitative and qualitative experiments on several datasets. In particular, we simulated small-training-set industrial inspection scenes and carried out the corresponding experiments.

Author Contributions

Resources, Y.W.; Data curation, Y.T., Y.F. and L.Q.; Writing—original draft, J.R.; Writing—review & editing, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key Research and Development Program of China, No. 2023YFC3107903, and in part by the National Key Research and Development Program of China, No. 2023YFB4302302.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available in publicly accessible repositories. The DAGM2007 dataset presented in this study is available at https://www.kaggle.com/datasets/mhskjelvareid/dagm-2007-competition-dataset-optical-inspection (accessed on 4 September 2024); the KolektorSDD dataset presented in this study is available at https://www.vicos.si/resources/kolektorsdd/ (accessed on 4 September 2024). No new data were created or analyzed in this study.

Conflicts of Interest

Author Liang Qu was employed by the China National Offshore Oil Corporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Bao, Y.; Song, K.; Liu, J.; Wang, Y.; Yan, Y.; Yu, H.; Li, X. Triplet-graph reasoning network for few-shot metal generic surface defect segmentation. IEEE Trans. Instrum. Meas. 2021, 70, 5011111. [Google Scholar] [CrossRef]
  2. Yu, W.; Zhang, Y.; Shi, H. Surface Defect Inspection Under a Small Training Set Condition. In Proceedings of the International Conference on Intelligent Robotics and Applications, Shenyang, China, 8–11 August 2019; pp. 517–528. [Google Scholar]
  3. Saha, S.; Sheikh, N. Ultrasound image classification using ACGAN with small training dataset. In Proceedings of the International Symposium on Signal and Image Processing, Kolkata, India, 18–19 March 2020; pp. 85–93. [Google Scholar]
  4. Si, C.; Zhang, Z.; Qi, F.; Liu, Z.; Wang, Y.; Liu, Q.; Sun, M. Better Robustness by More Coverage: Adversarial and Mixup Data Augmentation for Robust Finetuning. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 1569–1576. [Google Scholar]
  5. Hsu, M.-J.; Chien, Y.-H.; Wang, W.-Y.; Hsu, C.-C. A convolutional fuzzy neural network architecture for object classification with small training database. Int. J. Fuzzy Syst. 2020, 22, 1–10. [Google Scholar] [CrossRef]
  6. Wu, Y.; Lin, Y.; Dong, X.; Yan, Y.; Ouyang, W.; Yang, Y. Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5177–5186. [Google Scholar]
  7. Tabernik, D.; Šela, S.; Skvarč, J.; Skočaj, D. Segmentation-based deep-learning approach for surface-defect detection. J. Intell. Manuf. 2020, 31, 759–776. [Google Scholar] [CrossRef]
  8. Silvestre-Blanes, J.; Albero, T.; Miralles, I.; Pérez-Llorens, R.; Moreno, J. A public fabric database for defect detection methods and results. Autex Res. J. 2019, 19, 363–374. [Google Scholar] [CrossRef]
  9. Gan, J.; Li, Q.; Wang, J.; Yu, H. A hierarchical extractor-based visual rail surface inspection system. IEEE Sens. J. 2017, 17, 7935–7944. [Google Scholar] [CrossRef]
  10. Malamas, E.N.; Petrakis, E.G.M.; Zervakis, M.; Petit, L.; Legat, J.-D. A survey on industrial vision systems, applications and tools. Image Vis. Comput. 2003, 21, 171–188. [Google Scholar] [CrossRef]
  11. Abd Al Rahman, M.; Mousavi, A. A review and analysis of automatic optical inspection and quality monitoring methods in electronics industry. IEEE Access 2020, 8, 183192–183271. [Google Scholar]
  12. Li, Y.; Wang, N.; Liu, J.; Hou, X. Demystifying neural style transfer. arXiv 2017, arXiv:1701.01036. [Google Scholar]
  13. Zhang, H.; Koniusz, P. Power normalizing second-order similarity network for few-shot learning. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 7–11 January 2019; pp. 1185–1193. [Google Scholar]
  14. Jager, M.; Knoll, C.; Hamprecht, F.A. Weakly supervised learning of a classifier for unusual event detection. IEEE Trans. Image Process. 2008, 17, 1700–1708. [Google Scholar] [CrossRef] [PubMed]
  15. Ghatnekar, S. Use Machine Learning to Detect Defects on the Steel Surface. 2018. Available online: https://insiders.intel.com/projects/using-machine-learning-to-detect-defects-on-the-steel-surface (accessed on 2 September 2024).
  16. Wang, W.; Zheng, V.W.; Yu, H.; Miao, C. A survey of zero-shot learning: Settings, methods, and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–37. [Google Scholar] [CrossRef]
  17. Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. (CSUR) 2020, 53, 1–34. [Google Scholar] [CrossRef]
  18. Fu, Y.; Fu, Y.; Jiang, Y.-G. Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 5326–5334. [Google Scholar]
  19. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  20. Zhang, R.; Che, T.; Ghahramani, Z.; Bengio, Y.; Song, Y. Metagan: An adversarial approach to few-shot learning. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar]
  21. Renz, K.; Stache, N.C.; Fox, N.; Varol, G.; Albanie, S. Sign Segmentation with Changepoint-Modulated Pseudo-Labelling. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19–25 June 2021. [Google Scholar]
  22. Lau, S.L.; Lew, J.; Ho, C.C.; Su, S. Exploratory Investigation on a Naive Pseudo-labelling Technique for Liquid Droplet Images Detection using Semi-supervised Learning. In Proceedings of the 2021 IEEE International Conference on Computing (ICOCO), Kuala Lumpur, Malaysia, 17–19 November 2021; pp. 353–359. [Google Scholar]
  23. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the ICML Deep Learning Workshop, Lille, France, 6–11 July 2015; Volume 2. [Google Scholar]
  24. Božič, J.; Tabernik, D.; Skočaj, D. Mixed supervision for surface-defect detection: From weakly to fully supervised learning. Comput. Ind. 2021, 129, 103459. [Google Scholar] [CrossRef]
  25. Cootes, T.F.; Taylor, C.J.; Cooper, D.H.; Graham, J. Active shape models-their training and application. Comput. Vis. Image Underst. 1995, 61, 38–59. [Google Scholar] [CrossRef]
  26. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017. [Google Scholar]
  27. Battaglia, P.; Pascanu, R.; Lai, M.; Jimenez Rezende, D. Interaction networks for learning about objects, relations and physics. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
  28. Koniusz, P.; Yan, F.; Gosselin, P.-H.; Mikolajczyk, K. Higher-order occurrence pooling for bags-of-words: Visual concept detection. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 313–326. [Google Scholar] [CrossRef] [PubMed]
  29. Shih, Y.-F.; Yeh, Y.-M.; Lin, Y.-Y.; Weng, M.-F.; Lu, Y.-C.; Chuang, Y.-Y. Deep co-occurrence feature learning for visual object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4123–4132. [Google Scholar]
  30. Koniusz, P.; Zhang, H.; Porikli, F. A Deeper Look at Power Normalizations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  31. Koniusz, P.; Tas, Y.; Porikli, F. Domain adaptation by mixture of alignments of second-or higher-order scatter tensors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4478–4487. [Google Scholar]
  32. Božič, J.; Tabernik, D.; Skočaj, D. End-to-end training of a two-stage neural network for defect detection. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 5619–5626. [Google Scholar]
  33. Racki, D.; Tomazevic, D.; Skocaj, D. A compact convolutional neural network for textured surface anomaly detection. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1331–1339. [Google Scholar]
  34. Kim, S.; Kim, W.; Noh, Y.-K.; Park, F.C. Transfer learning for automated optical inspection. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 2517–2524. [Google Scholar]
  35. Scholz-Reiter, B.; Weimer, D.; Thamer, H. Automated surface inspection of cold-formed micro-parts. CIRP Ann. 2012, 61, 531–534. [Google Scholar] [CrossRef]
Figure 1. The architecture of the proposed Defect Detection model. (1) Encoding Network; (2) Knowledge Embedding, consisting of the knowledge model, correlation fusion module, and Gram Operation; (3) Predictive Network.
Figure 2. Knowledge mining based on embedded relation module. The input is the conv characteristics of the predicted samples and the prior knowledge processed into tensors. The correlation between the two is captured with the help of covariance operation and then fused in the form of attention.
Figure 3. Gram Operation module. We flatten conv features into feature vectors, capture second-order features by the covariance operation, and then send them to PN. Finally, the result is self-replicated for the subsequent diversity and fusion-promotion operations.
Figure 4. Examples of images, defects, and detections with segmentation output from the DAGM (top), KolektorSDD (middle), and Steel (bottom) datasets.
Figure 5. Results for smaller training set sizes on DAGM, KSDD, and Steel. The three figures above show the mAP curves against the number of positive samples, and the three figures below show the corresponding FP + FN.
Table 1. TPR and TNR of five methods (DAGM, 150 positive samples).

Surface   Our TPR/TNR   ESDN TPR/TNR   Racki et al. TPR/TNR   Kim et al. TPR/TNR   Scholz et al. [35] TPR/TNR
1         100/100       100/100        100/98.8               99.8/100             99.7/99.4
2         100/100       100/100        100/99.8               100/100              80.0/94.3
3         100/100       100/100        100/96.3               100/100              100/99.5
4         100/100       100/100        98.5/99.8              99.9/100             96.1/92.5
5         100/100       100/100        100/100                100/100              96.1/96.9
6         100/100       100/100        100/100                100/100              96.1/100
7         100/100       100/100        100/100                -                    -
8         100/100       100/100        100/100                -                    -
9         100/100       100/100        100/99.9               -                    -
10        100/100       100/100        100/100                -                    -
Table 2. mAP on four methods (KSDD, 33 positive samples).

Method         AP/%     FP + FN
EfficientNet   -        -
SDN            99.00    1 + 0
ESDN           99.49    1 + 2
Ours           100.00   0 + 0
Table 3. mAP on four methods (Steel, 1000 positive samples).

Method         AP/%    FP + FN
EfficientNet   91.56   -
SDN            92.35   -
ESDN           98.45   68 + 85
Ours           99.73   40 + 32
Table 4. Ablation study on the KSDD dataset with 5 positive samples (AP/%). KM = Knowledge Modeling, KE = Knowledge Embedding, GO = Gram Operation, PN = Power Normalizing.

KM   KE       GO   PN   Fold 0   Fold 1   Fold 2   Mean
-    -        -    -    94.90    90.01    98.47    94.46
-    ✓        ✓    ✓    93.18    94.70    96.75    94.88
✓    concat   ✓    ✓    99.43    91.58    98.47    96.49
✓    ✓        -    ✓    94.28    95.00    96.08    95.12
✓    ✓        ✓    -    96.84    91.21    96.31    94.79
✓    ✓        ✓    ✓    99.38    97.98    94.80    97.39
Table 5. mAP on four variants of KE (KSDD, 5 positive samples).

Method   AP/%    FP + FN
Add      95.26   2 + 1
Concat   96.49   1 + 1
Mul      97.26   1 + 1
KE       97.39   0 + 1
Table 6. mAP on two PNs (KSDD, 5 positive samples).

Method   AP/%    FP + FN
None     94.79   2 + 2
AsinhE   96.89   0 + 2
SigmE    97.39   0 + 1
Table 7. mAP of the baseline and our method under 5–25 positive samples (DAGM, KSDD, STEEL). Each cell shows AP/% (FP + FN).

Model      Dataset   5                  10                 15                 20                 25
baseline   DAGM      85.58 (43 + 12)    91.02 (43 + 5)     98.09 (2 + 3)      99.27 (2 + 2)      99.18 (1 + 2)
Ours       DAGM      90.12 (23 + 16)    96.23 (3 + 7)      98.04 (2 + 3)      99.65 (1 + 1)      99.11 (1 + 1)
baseline   KSDD      95.82 (2 + 2)      97.25 (2 + 2)      97.96 (1 + 2)      98.74 (1 + 1)      99.78 (0 + 1)
Ours       KSDD      97.35 (1 + 2)      97.84 (1 + 2)      99.15 (1 + 1)      99.29 (0 + 1)      100.00 (0 + 0)
baseline   STEEL     58.45 (490 + 25)   54.27 (508 + 18)   64.81 (334 + 65)   62.87 (419 + 56)   67.04 (401 + 61)
Ours       STEEL     63.28 (467 + 31)   60.13 (438 + 53)   65.57 (346 + 70)   68.13 (349 + 48)   75.46 (315 + 83)
