Article

Defect Detection for Wear Debris Based on Few-Shot Contrastive Learning

School of Computer and Communication Engineering, University of Science & Technology Beijing, Beijing 100083, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(23), 11893; https://doi.org/10.3390/app122311893
Submission received: 26 October 2022 / Revised: 11 November 2022 / Accepted: 12 November 2022 / Published: 22 November 2022

Abstract

In industrial defect detection tasks, severe defects occur with low probability under normal production conditions, which poses a great challenge for data-driven deep learning models that must learn from only a few samples. Contrastive learning based on sample pairs makes it possible to obtain a large number of training samples and to learn effective features quickly. In industrial defect detection, the features of some defect instances have small inter-category variance, while the scales of defect instances vary greatly. We propose a few-shot object detection network based on contrastive learning and multi-scale feature fusion. An aligned contrastive loss is adopted to increase instance-level intra-class compactness and inter-class variance while alleviating the misalignment problem to a certain extent. A multi-scale fusion module is designed to recognize multi-scale defects by adaptively fusing features from different resolutions, exploiting the information of the support branch. The robustness and efficiency of the proposed method were evaluated on an industrial wear debris defect dataset and the MS COCO dataset.

1. Introduction

Wear debris comprises the friction particles found in lubricating oil as a result of friction on equipment, and it carries much information about the running state of a machine. The shape, amount, and size of wear debris can reflect and reveal the degree and mechanism of wear of a machine. Wear debris techniques can be used to monitor the status and long-term trends of the wear of parts of steel production equipment, and they play an important role in preventive measures, which are helpful for avoiding faults in steel production equipment.
Convolutional neural networks (CNNs) have made great progress in general object detection, but deep learning requires a large amount of annotated training data. Few-shot learning aims to train a model with strong generalization ability from only a small amount of data. Existing few-shot learning methods can mainly be grouped into two categories based on the model architecture: single-branch models [1,2,3] and two-branch models [4,5,6,7,8,9,10]. Single-branch models attempt to exploit transfer learning to achieve quick adaptation to novel domains. Two-branch models mimic a Siamese network [11] in order to handle the query and support branches, and the distance between the two branches is naturally learned through logistic regression. The key in a two-branch model is to make good use of the guidance information of the support branch.
Unlike object detection in a natural scene, wear debris defect detection has the following difficulties: (1) The probability of the occurrence of serious defects in wear debris is low, which makes it difficult to collect enough defect samples; wear debris defect detection is therefore a typical few-shot object detection task. (2) The features of wear debris defects have small inter-category variance, and it is difficult for commonly used object detection models to learn distinguishing features. (3) Recognizing objects at vastly different scales is a fundamental challenge in computer vision, and wear debris defects have excessive variations in scale. The shape and size of wear debris defects are unpredictable, which makes them difficult to classify and locate, so it is necessary to fuse multi-scale features.
We propose an improved two-branch model that combines contrastive learning and multi-scale feature fusion for few-shot learning. We improve the two-branch model from the following perspectives: (1) An aligned contrastive loss is designed to reduce the variance of object proposal embeddings of the same category while pushing instances of different categories away from each other, and the misalignment problem is handled. (2) Multi-scale fusion is proposed to adaptively fuse features from different resolutions in the query branch. Comprehensive experiments are conducted to assess each method fairly, and the extensive experimental results show the effectiveness and robustness of the proposed model.

2. Related Works

Object detection has made great progress in recent years. Previous approaches can be broadly divided into two categories: anchor-based and anchor-free approaches. An anchor-based approach incorporates anchors as prior information, and such approaches can be subdivided into one-stage and two-stage models. Common one-stage models include YOLO [12], RetinaNet [13], etc.; common two-stage models include Faster RCNN [14], etc. An anchor-free approach replaces anchors with reference points obtained through prediction/regression; common anchor-free models include CentreNet [15], FCOS [16], etc. General detectors fail to establish robust feature representations from limited shots, resulting in the mislabeling of localized objects and a low average precision. Few-shot learning addresses such issues, and several approaches have been studied.
Over the last few years, few-shot learning has been widely applied to object detection. Wang et al. [1] showed that a model trained on a base dataset and fine-tuned on a novel dataset can also achieve good accuracy. On this basis, Wu et al. [2] used multi-scale positive samples to refine the proposal features. Sun et al. [3] utilized contrastive learning to constrain the distance between sample pairs in a projection space. In this area, Kang et al. [4] first proposed a two-branch method consisting of twin weight-sharing networks, one fed with a support image and the other with a query image; its detector matches image features from the two branches. This matching strategy captures the inherent variations between the support and query regardless of their categories. Their two-branch method contains positive and negative pairs, and learning from sample pairs makes the model converge quickly. One of the key issues in this line of research is making full use of the information of the support branch. Subsequent works exploited delicate modules [5,6,7,8,9] or transformers [17,18] to enhance the feature embeddings of the query and support branches.
In the field of industrial defect detection, complicated backgrounds pose a greater challenge to the model. Wear debris was classified using traditional features, such as LBP and Tamura coarseness features, by Huang et al. [19]. Peng et al. [20] proposed a hybrid search-tree discriminant technique for wear debris analysis. An automatic wear particle detection and classification process was developed in [21] using a cascade of two convolutional neural networks and a support vector machine (SVM) classifier. Yang et al. [22] proposed a pre-classification and cascade detection framework to detect welding defects. Wang et al. [23] combined traditional features and CNN features for digit number detection at industrial sites. A few methods apply few-shot learning to industrial defect detection: Wang et al. [24] added an ECA module and a transformer module to handle few-shot defect detection tasks by enhancing the diversity of defect shapes and scales; Xu et al. [25] applied a two-branch model to defect classification; and Yu et al. [26] utilized few-shot learning for defect segmentation.
Severe sliding is infrequent in production, so the number of severe defects is quite small. It is necessary to conduct a study of few-shot wear debris defect detection.

3. Method

3.1. Problem Definition

A few-shot object detection scenario involves two types of dataset: a base dataset and a novel dataset. The base dataset contains sufficient data, while the novel dataset contains only a few samples, and the categories of the two datasets do not overlap. Few-shot object detection learns category-specific embeddings and must be fine-tuned for novel categories. Our method consists of two training phases: a pre-training phase, which uses the base dataset, and a fine-tuning phase, which uses the novel dataset.
The model adopts a meta-learning approach. During each phase, the dataset is further subdivided into a query set and a support set whose images are disjoint. Each episode contains a query image Qc (where c is the target category), K−1 support images of the same category (Sc) sampled from the support set, and K support images of different categories (Sd). The task is to locate all the targets in Qc that belong to category c and classify them correctly. In the fine-tuning phase, if the novel dataset contains N categories with K targets per category, the task is defined as an N-way-K-shot few-shot object detection task.
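To make the episode construction concrete, the following sketch shows one way such an episode could be sampled. It is a minimal illustration: the mapping `images_by_category` and the function name are hypothetical, not the paper's actual data loader.

```python
import random

def sample_episode(images_by_category, categories, n_way, k_shot):
    """Sample one N-way-K-shot episode: a query image Qc for a target
    category c, K-1 same-category support images (Sc), and K support
    images from the other categories (Sd)."""
    ways = random.sample(categories, n_way)       # the N categories of this episode
    c = ways[0]                                   # target category of the query
    query = random.choice(images_by_category[c])  # Qc: image containing category c
    # K-1 support images of the same category, excluding the query itself
    same_category = [img for img in images_by_category[c] if img is not query]
    supports_same = random.sample(same_category, k_shot - 1)
    # K support images drawn from the other N-1 categories
    supports_diff = [random.choice(images_by_category[d])
                     for d in random.choices(ways[1:], k=k_shot)]
    return query, supports_same, supports_diff
```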

3.2. Overall Architecture

Compared with the baseline, we jointly optimize the model with the Aligned Contrastive Loss and Multi-Scale Fusion, as shown in Figure 1. Specifically, we construct positive pairs, in which the query image and support image belong to the same category, and negative pairs, in which they belong to different categories. We build a weight-shared framework consisting of two branches. The query branch is a Faster RCNN network, which contains an RPN and a detector. The Aligned Contrastive Loss promotes instance-level intra-class compactness and inter-class variance. Multi-Scale Fusion fuses the query-branch features at different resolutions with information from the support branch. We use this framework to train the matching relationship between support and query features, so that the network learns general knowledge from the positive/negative pairs.

3.3. Aligned Contrastive Loss

The features of different wear debris defect categories are too similar to distinguish easily, which burdens the model's detector, as shown in Figure 2. Traditional methods fuse query features and support features by channel-wise multiplication [7] or element-wise subtraction [6], without treating the support features as supervision information.
Inspired by instance discrimination, the pretext task of the unsupervised contrastive model MoCo [27], we introduce the Aligned Contrastive Loss, which is based on the InfoNCE loss [28] and runs in parallel with RoIPooling, into the primary model, as shown in Equation (1).
$$L_{Aligned\ Contrastive} = -\log \frac{\exp\left(f\left(\mathrm{Aligned}(q, k^{+}) \ast q,\; k^{+}\right)/\tau\right)}{\sum_{i=1}^{k} \exp\left(f\left(\mathrm{Aligned}(q, k_{i}) \ast q,\; k_{i}\right)/\tau\right)} \qquad (1)$$
Specifically, the model learns to match the foreground proposals (Pf) and background proposals (Pb) generated by the RPN in the query image with the positive support objects (Sp) and negative support objects (Sn) in the support images. Thus, the model learns not only to match the objects in the positive pair (Pf, Sp) but also to distinguish the objects in the negative pairs (Pf, Sn), (Pb, Sp), and (Pb, Sn). The ratio of positive to negative pairs is 1:3. The InfoNCE loss is shown in Equation (2).
$$L_{InfoNCE} = -\log \frac{\exp(pos)}{\exp(pos) + \sum \exp(neg)} \qquad (2)$$
Following SimCLR [29] and MoCo, the features from the query and support branches are regularized with the L2 norm and projected by an MLP projector. Another form of the InfoNCE loss is shown in Equation (3).
$$L_{InfoNCE} = -\log \frac{\exp\left(f(q, k^{+})/\tau\right)}{\sum_{i=1}^{k} \exp\left(f(q, k_{i})/\tau\right)} \qquad (3)$$
In Equation (3), $k$ is the total number of positive and negative pairs within the current episode; $q$ is the query feature of a proposal; $k^{+}$ is the support feature of the positive pair; $k_{i}$ is the support feature of the i-th positive/negative pair in the batch; and $f$ is a distance metric function, such as the cosine distance or the Euclidean distance. The hyper-parameter $\tau$ controls the concentration level of the distribution. We performed a hyper-parameter search for $\tau$ and selected 0.2 as its default value in this paper.
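As an illustration, Equation (3) can be computed with a few lines of PyTorch, taking the cosine similarity as the metric f. This is a hedged sketch: the tensor shapes and the function name are our assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k_pos, k_negs, tau=0.2):
    """InfoNCE loss of Equation (3) for one proposal.
    q:      (D,)   query proposal embedding
    k_pos:  (D,)   support embedding of the positive pair
    k_negs: (M, D) support embeddings of the negative pairs"""
    # L2-normalize so that dot products become cosine similarities
    q = F.normalize(q, dim=-1)
    keys = F.normalize(torch.cat([k_pos.unsqueeze(0), k_negs], dim=0), dim=-1)
    logits = keys @ q / tau                    # (1 + M,) temperature-scaled similarities
    target = torch.zeros(1, dtype=torch.long)  # the positive key sits at index 0
    # cross-entropy over the 1 + M pairs equals -log(exp(pos) / sum(exp(all)))
    return F.cross_entropy(logits.unsqueeze(0), target)
```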
However, due to the inaccurate localization of the proposals generated by the RPN, the spatial misalignment between proposals and support examples has a negative effect on matching. Instance discrimination is a classification task, so it has no misalignment problem. FSCE [3] filters out heavily misaligned proposals using an IoU threshold. Inspired by Meta-Faster-RCNN [8], we design an alignment module that establishes soft correspondences between query features and support features, from which the regions to suppress and highlight are deduced; the misalignment problem is thus alleviated to a certain extent. In the alignment module, the Aligned Mask is calculated by a multiplication and a sigmoid operation, as in Equation (4):
$$\mathrm{Aligned\ Mask} = \mathrm{Aligned}(Q, K) = \mathrm{sigmoid}\left(\frac{QK^{T}}{\sqrt{d}}\right) \qquad (4)$$
In Equation (4), $Q$ is the query feature of the proposal; $K^{T}$ is the transpose of the support feature in the positive/negative pair; and $\sqrt{d}$ is the scaling factor of the features. The InfoNCE loss with the alignment module (the Aligned Contrastive Loss) is shown in Figure 3. Each value in the Aligned Mask represents the soft correspondence between a position in the query feature map and a position in the support feature map.
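Under the assumption of flattened (positions × channels) feature maps, Equation (4) can be sketched as follows; the $\sqrt{d}$ scaling follows the scaled dot-product attention convention, which is our reading of the formula rather than released code.

```python
import torch

def aligned_mask(q_feat, s_feat):
    """Aligned Mask of Equation (4).
    q_feat: (Nq, d) flattened query RoI feature; s_feat: (Ns, d)
    flattened support feature. Returns an (Nq, Ns) mask in (0, 1)."""
    d = q_feat.size(-1)
    # sigmoid(Q K^T / sqrt(d)): values near 1 highlight corresponding
    # positions, values near 0 suppress non-corresponding ones
    return torch.sigmoid(q_feat @ s_feat.transpose(0, 1) / d ** 0.5)
```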
Specifically, in the Faster RCNN framework, the RPN takes the backbone feature maps as input and generates region proposals; the RoI head feature extractor first pools the region proposals to a fixed size and then encodes them as vector embeddings known as RoI features. The alignment module then adjusts the RoI features to handle the misalignment problem. Finally, we use the InfoNCE loss function to optimize the model parameters.
Optimizing the above loss function increases the instance-level similarity between proposals with the same label while pushing proposals with different labels apart in the projection space. As a result, the instances of each category form a tighter cluster, and the margins around the periphery of the clusters are enlarged. The total loss function of our model thus consists of five parts: the classification loss $L_{rpn\,cls}$ and regression loss $L_{rpn\,reg}$ of the RPN, the classification loss $L_{head\,cls}$ and regression loss $L_{head\,reg}$ of the detection head, and $L_{Aligned\ Contrastive}$, as shown in Equation (5).
$$L_{total} = L_{rpn\,cls} + L_{rpn\,reg} + L_{head\,cls} + L_{head\,reg} + L_{Aligned\ Contrastive} \qquad (5)$$

3.4. Multi-Scale Fusion

The statistics of the bounding box areas for each category in the wear debris defect dataset show that the sizes of defect instances are highly diverse. Even instances of the same category vary greatly in size, perspective, and possible occlusion, as shown in Figure 4.
Feature fusion is one of the most useful methods for detecting multi-scale defects. However, previous two-branch object detection methods did not use an FPN, because it is difficult to select the most representative features from the FPN. To address this issue, we propose Multi-Scale Fusion, which not only avoids this selection but also fuses features from different resolutions, as shown in Figure 5.
When fusing features of different resolutions, a common approach is to first resize them to the same resolution and then sum them. However, we observe that input features at different resolutions usually contribute unequally to the output features. We also notice that the support branch has not been exploited in previous methods; its information lay idle. We therefore propose an attention module that uses the information of the support branch to learn the fusion weights. The attention module is a bottleneck network consisting of a 3×3 convolution, a 1×1 convolution, and a sigmoid operation. Each normalized fusion weight falls between 0 and 1.
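The sketch below shows one plausible form of this attention module. Only the 3×3 convolution, 1×1 convolution, and sigmoid are taken from the text; the bottleneck width, the pooling to one scalar per scale, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ScaleAttention(nn.Module):
    """Predicts one fusion weight per resolution from the support feature."""
    def __init__(self, in_channels, num_scales):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // 4, 3, padding=1),  # 3x3 bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // 4, num_scales, 1),              # 1x1 projection
            nn.AdaptiveAvgPool2d(1),  # collapse to one scalar weight per scale
            nn.Sigmoid(),             # normalize each weight into (0, 1)
        )

    def forward(self, support_feat, query_feats):
        # query_feats: list of num_scales query maps, already resized
        # to a common resolution before fusion
        w = self.body(support_feat)   # (B, num_scales, 1, 1)
        return sum(w[:, i:i + 1] * f for i, f in enumerate(query_feats))
```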
The support branch should provide diverse features and perspectives at different resolutions so that the attention module can learn more effective fusion weights. We therefore apply multi-scale training, random flipping, color jitter, and other augmentations to the images in the support branch.
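A plausible torchvision pipeline for the support crops might look like the following; the text names only the augmentation types, so the parameter values here are illustrative assumptions.

```python
import random
import torchvision.transforms as T

def support_transform():
    """Build the augmentation for one support crop; a new support size N
    is sampled per call (multi-scale training)."""
    n = random.randint(256, 384)
    return T.Compose([
        T.Resize((n, n)),               # resize the padded crop to N x N
        T.RandomHorizontalFlip(p=0.5),  # random flipping
        T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # color jitter
        T.ToTensor(),
    ])
```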

4. Experiment

4.1. Dataset

The wear debris defect dataset and the weld defect dataset used in this paper were collected from a domestic company. The weld defect dataset contains round, crack, icf, lop, and bar defects, as shown in Figure 6; the number of instances of each category is shown in Table 1. Its training set contains 555 images and its validation set contains 138 images. The wear debris defect dataset contains copper, cutting, fatigue, severe sliding, and spherical defects, as shown in Figure 7; the number of instances of each category is shown in Table 2. It contains 2643 images in the training set and 882 images in the validation set.
We use the weld defect dataset as the base dataset and the wear debris defect dataset as the novel dataset to carry out 5way-5shot, 5way-10shot, and 5way-30shot few-shot object detection tasks according to the settings in Section 3.1. The data used in the few-shot setting are sampled from the wear debris defect dataset.

4.2. Implementation

Our model is trained end-to-end on 4 NVIDIA 1080Ti GPUs in PyTorch. The framework consists of two training phases. In the pre-training phase, the training dataset is the weld defect dataset; the batch size is 8 (query images), and the learning rate is 0.004 for the first 25,000 iterations and 0.0004 for the next 25,000 iterations. In the fine-tuning phase, the training dataset is the wear debris defect dataset; the batch size is 8 (query images), and the learning rate is 0.001 for the first 2000 iterations and 0.0001 for the next 2000 iterations. Unless otherwise specified, all accuracy results in this paper are calculated on the validation set of the wear debris defect dataset. The query images are resized so that the short side is M, where M is sampled from 440–600, and the maximum long side is set to 1000. The support images are cropped from the annotations, padded with 16 pixels on each side, and then resized to N × N, where N is sampled from 256–384. We fix the weights of the Res1–3 blocks and train only the high-level layers in order to utilize the low-level basic features and avoid over-fitting. We adopt the COCO-style AP, AP50, and AP75 for evaluation.
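Freezing the low-level stages can be done as in the sketch below, assuming a torchvision-style ResNet in which the stem and first two residual stages correspond to Res1–3; the backbone variant chosen here is an illustrative assumption, since the paper does not name its exact model.

```python
import torchvision

# Illustrative backbone; the actual ResNet variant is an assumption.
backbone = torchvision.models.resnet101(weights="IMAGENET1K_V1")

# Freeze Res1-3 (stem + first two residual stages in torchvision naming)
# so that only the high-level layers are trained.
for name, param in backbone.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1", "layer2")):
        param.requires_grad = False
```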

4.3. Evaluation

4.3.1. Comparison between Single-Branch and Two-Branch Object Detection Models

The accuracy results of Faster RCNN [14], the few-shot model FSCE [3], and the baseline are shown in Figure 8. The three models were trained with the same data and the same experimental settings. It is worth mentioning that both Faster RCNN and FSCE use an FPN, but the baseline does not. The accuracy of the baseline is higher than that of Faster RCNN and FSCE, which shows that the two-branch object detection model is more suitable for few-shot defect detection tasks.

4.3.2. Evaluation of Aligned Contrastive Loss

The few-shot detection results for the Aligned Contrastive Loss are shown in the first and second rows of each setting in Table 3. The model with the Aligned Contrastive Loss improves the accuracy in terms of AP50, AP75, APs, APm, and APl. The lowest APs accuracy of the baseline is only 0.69% due to the small amount of data, which shows that the model does not fully exploit the data. When the Aligned Contrastive Loss is added, the APs accuracy increases by 14.76% (5shot), 15.32% (10shot), and 15.22% (30shot).
The AP accuracy for each category is shown in the first and second rows of each setting in Table 4. The features and sizes of the copper and fatigue defects are very similar, which puts a burden on model learning, as shown in Figure 2. With the Aligned Contrastive Loss, the AP accuracy for these two defects is improved, indicating that our model learns more distinguishable features.

4.3.3. Evaluation of Multi-Scale Fusion

The few-shot detection results for Multi-Scale Fusion are shown in the first and third rows of each setting in Table 3. The results indicate that the combination of multi-scale features greatly alleviates the problem of excessive scale variation among defects.
We analyzed the wear debris defect dataset; the average bounding box size of each defect category is shown in Figure 9. The spherical defect has the smallest average size, and severe sliding has the largest. The AP accuracy of both defects is improved in Table 4, demonstrating the effectiveness of Multi-Scale Fusion.

4.3.4. Result on MS COCO Dataset

It is worth mentioning that the model with the Aligned Contrastive Loss and Multi-Scale Fusion also achieves an accuracy improvement on the MS COCO dataset, as shown in Figure 10.

4.4. Ablation

4.4.1. Ablation for Aligned Contrastive Loss and Multi-Scale Fusion

Table 5 shows the AP, AP50, and AP75 accuracy of the model with and without the Aligned Contrastive Loss and Multi-Scale Fusion on the validation set of the wear debris defect dataset. Constraining the distance between query and support features in the projection space with the Aligned Contrastive Loss is effective, and fusing features from different resolutions is also effective. Table 5 shows that both modules help to improve detection performance, and both work better as the number of shots increases.

4.4.2. Ablation for Alignment Module in Aligned Contrastive Loss

We compare the accuracy of the InfoNCE loss with and without the alignment module in the 5shot, 10shot, and 30shot settings in Table 6. After the spatial misalignment is alleviated, the Aligned Contrastive Loss enables the model to achieve greater accuracy improvements.

4.4.3. Ablation for Using Information of Support Branch in Multi-Scale Fusion

To verify the effectiveness of our method, we also implemented a variant of the fusion module that does not exploit the information of the support branch; its fusion weights are learned from the features of different resolutions in the query branch. A comparative experiment on using the support-branch information was conducted on the wear debris defect dataset, and the results are shown in Table 7. In the 5way-5shot setting, the support branch contains too few positive and negative pairs to provide enough effective information, which explains the accuracy decrease. As the number of shots increases, the detection performance improves.

5. Future Work

We evaluated our model after training it on the base dataset and fine-tuning it on the novel dataset. If the network learns to match the given support images with the query image well after training, detection could be performed with training on the base dataset alone. If this characteristic is well exploited, it may become advantageous for detecting severe industrial defects for which no training data are available.

6. Conclusions

The defects in the wear debris defect dataset have small inter-category variance and excessive scale variation, and it is difficult for common object detection models to deal with this situation. In this paper, we propose a meta-learning-based few-shot object detection method. First, the Aligned Contrastive Loss is proposed to address the misalignment between proposals and novel-class examples and to constrain the distance in the projection space. Then, a novel Multi-Scale Fusion module, which learns knowledge from the support branch, is proposed to address the multi-scale problem. Compared with the baseline, these methods achieve AP accuracy improvements of 2.08% (5shot), 3.17% (10shot), and 3.49% (30shot) on the wear debris defect dataset. In addition, the Aligned Contrastive Loss and Multi-Scale Fusion also significantly surpass the baseline in the 20way-10shot setting of the MS COCO dataset, proving the effectiveness of our methods.

Author Contributions

Conceptualization, H.L. and H.W.; Methodology, H.L. and H.W.; Investigation, H.L.; Validation, H.L.; Writing—original draft preparation, H.L.; Writing—review and editing, H.L., H.W. and L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The MS COCO dataset is available at https://cocodataset.org (accessed on 3 September 2022).

Acknowledgments

The authors would like to thank all the reviewers who participated in the review.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wang, X.; Huang, T.E.; Darrell, T.; Gonzalez, J.E.; Yu, F. Frustratingly simple few-shot object detection. arXiv 2020, arXiv:2003.06957.
2. Wu, J.; Liu, S.; Huang, D.; Wang, Y. Multi-scale positive sample refinement for few-shot object detection. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 456–472.
3. Sun, B.; Li, B.; Cai, S.; Yuan, Y.; Zhang, C. FSCE: Few-shot object detection via contrastive proposal encoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 7352–7362.
4. Kang, B.; Liu, Z.; Wang, X.; Yu, F.; Feng, J.; Darrell, T. Few-shot object detection via feature reweighting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 8420–8429.
5. Fan, Q.; Zhuo, W.; Tang, C.K.; Tai, Y.W. Few-shot object detection with attention-RPN and multi-relation detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 4013–4022.
6. Xiao, Y.; Marlet, R. Few-shot object detection and viewpoint estimation for objects in the wild. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 192–210.
7. Wu, X.; Sahoo, D.; Hoi, S. Meta-RCNN: Meta learning for few-shot object detection. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1679–1687.
8. Han, G.; Huang, S.; Ma, J.; He, Y.; Chang, S.F. Meta Faster R-CNN: Towards accurate few-shot object detection with attentive feature alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 780–789.
9. Chen, T.I.; Liu, Y.C.; Su, H.T.; Chang, Y.C.; Lin, Y.H.; Yeh, J.F.; Chen, W.C.; Hsu, W. Dual-awareness attention for few-shot object detection. IEEE Trans. Multimed. 2021, 1.
10. Lee, H.; Lee, M.; Kwak, N. Few-shot object detection by attending to per-sample-prototype. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 2445–2454.
11. Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the ICML Deep Learning Workshop, Lille, France, 10–11 July 2015; Volume 2.
12. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
13. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
14. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28.
15. Ibbett, R.N.; Edwards, D.A.; Hopkins, T.; Cadogan, C.; Train, D. Centrenet—A high performance local area network. Comput. J. 1985, 28, 231–242.
16. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: A simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1922–1933.
17. Hsieh, T.I.; Lo, Y.C.; Chen, H.T.; Liu, T.L. One-shot object detection with co-attention and co-excitation. Adv. Neural Inf. Process. Syst. 2019, 32.
18. Doersch, C.; Gupta, A.; Zisserman, A. CrossTransformers: Spatially-aware few-shot transfer. Adv. Neural Inf. Process. Syst. 2020, 33, 21981–21993.
19. Wang, H.; Huang, R.; Gao, L.; Wang, W.; Xu, A.; Yuan, F. Wear debris classification of steel production equipment using feature fusion and case-based reasoning. ISIJ Int. 2018, 58, 1293–1299.
20. Peng, Y.; Wu, T.; Cao, G.; Huang, S.; Wu, H.; Kwok, N.; Peng, Z. A hybrid search-tree discriminant technique for multivariate wear debris classification. Wear 2017, 392, 152–158.
21. Peng, Y.; Cai, J.; Wu, T.; Cao, G.; Kwok, N.; Peng, Z. WP-DRnet: A novel wear particle detection and recognition network for automatic ferrograph image analysis. Tribol. Int. 2020, 151, 106379.
22. Yang, H.; Wang, H.; Li, H.; Song, X. Weld defect cascaded detection model based on bidirectional multi-scale feature fusion and shape pre-classification. ISIJ Int. 2022, 62, 1485–1492.
23. Wang, H.; Wei, S.; Huang, R.; Deng, S.; Yuan, F.; Xu, A.; Zhou, J. Recognition of plate identification numbers using convolution neural network and character distribution rules. ISIJ Int. 2019, 59, 2044–2051.
24. Wang, H.; Li, Z.; Wang, H. Few-shot steel surface defect detection. IEEE Trans. Instrum. Meas. 2021, 71, 1–12.
25. Xu, J.; Ma, J. Auto parts defect detection based on few-shot learning. In Proceedings of the 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China, 20–22 May 2022; pp. 943–946.
26. Yu, R.; Guo, B.; Yang, K. Selective prototype network for few-shot metal surface defect segmentation. IEEE Trans. Instrum. Meas. 2022, 71, 1–10.
27. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9729–9738.
28. Oord, A.v.d.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748.
29. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 13–18 July 2020; pp. 1597–1607.
Figure 1. The overall architecture.
Figure 2. The features of wear debris defects have small inter-category variance, e.g., between fatigue and copper.
Figure 3. The InfoNCE loss with the alignment module.
Figure 4. The size of cutting defects in the wear debris defect dataset varies widely.
Figure 5. Multi-Scale Fusion.
Figure 6. The weld defect dataset.
Figure 7. The wear debris defect dataset.
Figure 8. The detection results of single-branch and two-branch object detection models.
Figure 9. Average bounding box size by defect category.
Figure 10. The detection results of the Aligned Contrastive Loss and Multi-Scale Fusion on the MS COCO dataset.
Table 1. The number of each category in the weld defect dataset.

| Defect Name | Round | Crack | Icf | Lop | Bar |
| number | 7564 | 397 | 342 | 353 | 372 |
Table 2. The number of each category in the wear debris defect dataset.

| Defect Name | Copper | Cutting | Fatigue | Severe Sliding | Spherical |
| number | 275 | 248 | 402 | 253 | 619 |
Table 3. Detection results for the Aligned Contrastive Loss and Multi-Scale Fusion.

| Setting | Method | AP | AP50 | AP75 | APs | APm | APl |
| 5way-5shot | Baseline | 10.46 | 18.93 | 11.12 | 0.96 | 5.60 | 13.99 |
| 5way-5shot | Baseline+Aligned 1 | 11.47 | 20.62 | 11.34 | 15.72 | 6.80 | 15.13 |
| 5way-5shot | Baseline+Fusion 2 | 12.47 | 21.87 | 13.32 | 12.74 | 5.69 | 16.08 |
| 5way-10shot | Baseline | 11.94 | 21.50 | 12.53 | 5.94 | 7.22 | 15.94 |
| 5way-10shot | Baseline+Aligned | 12.62 | 23.60 | 12.58 | 21.26 | 7.36 | 16.56 |
| 5way-10shot | Baseline+Fusion | 14.46 | 25.59 | 15.50 | 13.30 | 7.99 | 18.78 |
| 5way-30shot | Baseline | 16.02 | 27.89 | 17.22 | 0.69 | 7.90 | 20.53 |
| 5way-30shot | Baseline+Aligned | 17.87 | 32.07 | 18.77 | 15.91 | 9.87 | 22.20 |
| 5way-30shot | Baseline+Fusion | 18.98 | 32.31 | 20.65 | 11.33 | 9.88 | 24.81 |

1 Aligned: Aligned Contrastive Loss. 2 Fusion: Multi-Scale Fusion.
Table 4. AP accuracy for each category.

| Setting | Method | Copper | Cutting | Fatigue | Severe Sliding | Spherical |
| 5way-5shot | Baseline | 17.926 | 3.155 | 11.529 | 14.821 | 4.868 |
| 5way-5shot | Baseline+Aligned 1 | 20.572 | 2.356 | 12.776 | 16.697 | 4.940 |
| 5way-5shot | Baseline+Fusion 2 | 21.065 | 4.751 | 15.082 | 16.515 | 4.932 |
| 5way-10shot | Baseline | 20.077 | 7.880 | 11.174 | 14.137 | 6.433 |
| 5way-10shot | Baseline+Aligned | 24.154 | 6.239 | 12.052 | 14.886 | 5.785 |
| 5way-10shot | Baseline+Fusion | 29.270 | 7.755 | 14.299 | 14.304 | 6.654 |
| 5way-30shot | Baseline | 31.969 | 14.150 | 13.690 | 14.191 | 6.085 |
| 5way-30shot | Baseline+Aligned | 34.841 | 14.859 | 15.697 | 17.677 | 6.149 |
| 5way-30shot | Baseline+Fusion | 36.083 | 13.202 | 18.857 | 18.474 | 8.290 |

1 Aligned: Aligned Contrastive Loss. 2 Fusion: Multi-Scale Fusion.
Table 5. Ablation results.

| Setting | Aligned Contrastive Loss | Multi-Scale Fusion | AP | AP50 | AP75 |
| 5way-5shot | | | 10.46 | 18.93 | 11.12 |
| 5way-5shot | ✓ | | 11.47 | 20.62 | 11.34 |
| 5way-5shot | ✓ | ✓ | 11.85 | 21.43 | 12.28 |
| 5way-10shot | | | 11.94 | 21.50 | 12.53 |
| 5way-10shot | ✓ | | 12.62 | 23.60 | 12.58 |
| 5way-10shot | ✓ | ✓ | 13.94 | 25.04 | 14.08 |
| 5way-30shot | | | 16.02 | 27.89 | 17.22 |
| 5way-30shot | ✓ | | 17.84 | 32.07 | 18.77 |
| 5way-30shot | ✓ | ✓ | 19.08 | 34.54 | 19.29 |
Table 6. The ablation results for the alignment module in the Aligned Contrastive Loss.

| Setting | Alignment Module | AP | AP50 | AP75 |
| 5way-5shot | w/o | 11.14 | 20.44 | 11.43 |
| 5way-5shot | w/ | 11.47 | 20.62 | 11.34 |
| 5way-10shot | w/o | 12.14 | 22.16 | 12.38 |
| 5way-10shot | w/ | 12.62 | 23.60 | 12.58 |
| 5way-30shot | w/o | 17.52 | 30.71 | 18.49 |
| 5way-30shot | w/ | 17.84 | 32.07 | 18.77 |
Table 7. The ablation results for using support information in Multi-Scale Fusion.

| Setting | Support Information | AP | AP50 | AP75 |
| 5way-5shot | w/o | 12.53 | 22.15 | 13.50 |
| 5way-5shot | w/ | 12.47 | 21.87 | 13.32 |
| 5way-10shot | w/o | 14.31 | 25.46 | 15.10 |
| 5way-10shot | w/ | 14.46 | 25.59 | 15.50 |
| 5way-30shot | w/o | 17.05 | 30.26 | 18.97 |
| 5way-30shot | w/ | 18.98 | 32.31 | 20.65 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
