Article

A Pseudo-Labeling Multi-Screening-Based Semi-Supervised Learning Method for Few-Shot Fault Diagnosis

1 College of Mechanical Engineering and Automation, Foshan University, Foshan 528200, China
2 School of Automation, Central South University, Changsha 410083, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(21), 6907; https://doi.org/10.3390/s24216907
Submission received: 27 September 2024 / Revised: 24 October 2024 / Accepted: 25 October 2024 / Published: 28 October 2024
(This article belongs to the Section Fault Diagnosis & Sensors)

Abstract

In few-shot fault diagnosis tasks, where effectively labeled samples are scarce, existing semi-supervised learning (SSL)-based methods have obtained impressive results. However, in industry, some low-quality labeled samples are hidden in the collected dataset, which can cause a serious shift in model training and degrade the performance of SSL-based methods. To address this issue, the latest prototypical network-based SSL techniques are studied. However, most prototypical network-based methods assume that every sample contributes equally to the class prototype, which ignores the impact of individual differences. This article proposes a new SSL method based on pseudo-labeling multi-screening for few-shot bearing fault diagnosis. In the proposed work, a pseudo-labeling multi-screening strategy is explored to screen the pseudo-labels accurately and thereby improve the generalization ability of the prototypical network. In addition, an AdaBoost adaptation-based weighting technique is employed to obtain accurate class prototypes by clustering multiple samples, which recovers the performance lost to low-quality samples. Furthermore, the squeeze and excitation block technique is used to enhance useful feature information and suppress non-useful feature information so that accurate features are extracted. Finally, three well-known bearing datasets are selected to verify the effectiveness of the proposed method. The experiments show that our method achieves better performance than state-of-the-art methods.

1. Introduction

As an indispensable component in industrial applications, mechanical equipment is an important force in promoting sustainable development and industrial upgrading [1,2]. However, even a tiny failure may cause production downtime or catastrophic consequences. Furthermore, mechanical equipment is prone to component failure when it operates under high loads for a long time [3]. Equipment fault diagnosis is therefore of great significance for improving equipment safety and reliability, and it has attracted increasing attention in the industrial safety community [4,5].
In the last decade, the rapid development of information technology has brought new perspectives and challenges to traditional fault diagnosis methods for rotating machinery and has promoted the shift of fault diagnosis from traditional shallow models to deep learning models. Zhang et al. [6] proposed an improved residual network (ResNet) based on hybrid attention for wind turbine gearbox fault diagnosis. Shao et al. [7] presented a novel convolutional deep belief network (DBN) for bearing fault diagnosis. He et al. [8] explored a transfer learning fault diagnosis method based on a convolutional neural network (CNN). Shao et al. [9] provided a modified stacked autoencoder (SAE) based on the adaptive Morlet wavelet for rotating machinery fault diagnosis. Nie et al. [10] developed a fault diagnosis framework based on recurrent neural networks (RNN) to reduce the impact of noisy labels. These deep learning models achieve good results and overcome the main shortcoming of shallow models, namely their heavy reliance on manual feature extraction. However, the accuracy of a deep model depends strongly on the number of samples available for training. Moreover, in real industrial applications, it is difficult or even impossible to collect a large amount of labeled data, which leaves deep learning-based methods with poor generalization ability. It is therefore essential to capture discriminative knowledge from limited training data to obtain a generalized deep model.
Few-shot learning (FSL) uses limited labeled samples to learn quickly and achieve stable classification results; it has received widespread attention and made encouraging progress [11,12]. Several FSL methods have been reported to date, such as the Prototypical Network (ProNet) [13,14], Match Network (MatNet) [15,16], and Siamese Network (SiaNet) [17,18]. Among them, ProNet transforms the classification problem into a distance measurement problem in the feature embedding space; it has lower time complexity and is widely applied in pattern recognition. Chowdhury et al. [19] used the maximum mean discrepancy (MMD) to evaluate the difference between the distributions including and excluding a given sample, and obtained the sample weight simply by subtracting this value from 1. Wang et al. [20] presented a weighted prototypical network for bearing fault diagnosis, in which the Kullback–Leibler (KL) divergence was adopted to estimate the influence of a specific sample on the sample distribution. Gao et al. [21] designed a novel prototypical network for noisy few-shot problems based on instance-level and feature-level attention schemes that accentuate the significance of instances and features, respectively. Ye et al. [22] proposed a learning-with-a-strong-teacher framework for few-shot learning, in which a strong classifier was constructed to supervise the few-shot learner for image recognition. Zhao et al. [23] employed a dual adaptive representation alignment network for cross-domain few-shot learning, which updates the support instances as prototypes and renews the prototypes in a differentiable manner. In summary, the above-mentioned FSL-based methods provide a new idea and make certain progress in solving the problem of training with scarce samples. However, they focus only on how to evaluate the weight of samples and do not overcome the limitation of small sample sizes.
To tackle the FSL problem at its root, semi-supervised learning (SSL) uses a few labeled samples together with massive unlabeled samples to improve learning performance. SSL methods can be divided into three categories: adversarial generation, consistency regularization, and pseudo-labeling. Pseudo-labeling techniques, which assign labels to the more easily obtained unlabeled samples in order to expand the training set, have received increasing attention recently. He et al. [24] proposed a semi-supervised prototypical network based on pseudo-labeling for bearing fault diagnosis, in which a fixed threshold was used to select pseudo-labels and the optimal threshold was obtained through a large number of experiments. Fan et al. [25] presented a semi-supervised fault diagnosis method that screens pseudo-labels with thresholds and adjusts the dependence of the model on pseudo-labels through learning. Zhang et al. [26] explored a self-training semi-supervised method that selects unlabeled data with high predictive confidence on a trained model and extracts pseudo-labels iteratively. Zhang et al. [27] adopted Monte Carlo uncertainty as the threshold to screen pseudo-labels and built a small-sample gearbox fault diagnosis method based on a momentum prototypical network. Zhou et al. [28] designed an adaptive prototypical network for few-shot learning with sample quality and pseudo-labeling screening to weaken the impact of unreliable pseudo-labels.
Most existing SSL methods based on pseudo-labeling achieve impressive accuracy in few-shot learning tasks by labeling unlabeled samples and thereby increasing the number of training samples. However, the following limitations remain: (1) A single threshold for pseudo-labeling screening cannot guarantee the accuracy of the selected pseudo-labels, which dramatically degrades the performance of SSL methods. (2) Insufficient consideration of iteration-stopping conditions can easily lead to the propagation and accumulation of incorrect information during the iteration process. To resolve the network degradation caused by inaccurate pseudo-labeling screening and an insufficient number of selected samples, a semi-supervised learning method based on a pseudo-labeling multi-screening strategy is proposed for few-shot bearing fault diagnosis. In this paper, a composite threshold that combines Monte Carlo uncertainty and classification probability is explored to overcome the limitations of pseudo-labeling screening with a single threshold. Then, a multi-pseudo-labeling accumulation model based on network optimization is employed to solve the problem of network degradation caused by mislabeling. Finally, three well-known bearing datasets are used to verify the effectiveness of the proposed model. The main contributions are summarized as follows:
(1) A multi-screening strategy based on Monte Carlo uncertainty and classification probability is proposed for pseudo-labeling selection, which helps ensure the accuracy of pseudo-labeling screening and improves generalization ability.
(2) A semi-supervised learning method based on AdaBoost adaptation is explored to integrate multiple samples into a more accurate class prototype, which overcomes the drawbacks caused by low-quality labeled samples hidden in the dataset.
(3) An estimation strategy for the individual sample contribution rate is presented to obtain accurate individual sample weights and improve the performance of AdaBoost adaptation, which addresses the problem that individual differences between samples are ignored.

2. Theoretical Background

2.1. Few-Shot Learning

FSL learns classification information from limited labeled samples and transfers it to new samples to improve generalization ability; it has attracted increasing attention in pattern recognition and has made notable progress [29,30].
For a typical FSL task $T$, the whole dataset is described as $S_T = (S_{trn}, S_{tst})$, where $S_{trn} = \{(x_i, y_i)\}_{i=1}^{N_{trn}}$ and $S_{tst} = \{x_j\}$ denote the training set and the testing set, respectively, with $x_i, x_j \in X_T \subset X$ and $y_i \in Y_T \subset Y$. The samples $x_i, x_j$ come from a particular domain $D_T = \{X_T, P(X_T)\}$, which consists of a data space $X_T$ and a marginal probability distribution $P(X_T)$. In task $T$, $K$ samples are randomly selected for each of $N$ classes to form $S_{trn}$, i.e., the $N$-way $K$-shot setting [31], with the aim of learning a prediction function $f \in F: X \to Y$ that predicts the samples in $S_{tst}$. Achieving a high-accuracy model from the limited samples in $S_{trn}$ is an enormous challenge. Consequently, in the majority of instances, a supervised query dataset $S_Q = \{(x_i^q, y_i^q)\}_{i=1}^{N_{sup}}$, with $x_i^q \in X_\varrho \subset X$ and $y_i^q \in Y_\varrho \subset Y$, is randomly selected to assess task $T$.
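As a rough illustration of the N-way K-shot setting described above, the following sketch draws one episode from a labeled pool. The function name `sample_episode`, the `n_query` parameter, and the toy data are assumptions made for this example and do not come from the article.

```python
import random
from collections import defaultdict


def sample_episode(samples, labels, n_way, k_shot, n_query, seed=None):
    """Draw one N-way K-shot episode: a support set and a query set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    # Pick N classes, then K support and n_query query samples per class.
    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(x, c) for x in picked[:k_shot]]
        query += [(x, c) for x in picked[k_shot:]]
    return support, query


# Toy pool (hypothetical): 6 classes with 20 scalar "samples" each.
samples = list(range(120))
labels = [i // 20 for i in range(120)]
support, query = sample_episode(samples, labels, n_way=5, k_shot=1, n_query=3, seed=0)
```

Here the support set plays the role of the few labeled training samples, and the query set is the supervised set used to assess the task.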

2.2. Prototypical Network

As one of the most widely studied FSL techniques, the prototypical network (PN) generates a prototype for each class from labeled data and is widely applied to image recognition and fault diagnosis [32,33].
Specifically, given a support set $S_s$, the prototype is defined as the average of the feature vectors from the same class; thus, the prototype $P_l$ of class $l$ is expressed as:
$$P_l = \frac{1}{N} \sum_{x_n^l \in S_s} f_\varphi(x_n^l) \tag{1}$$
where $f_\varphi(\cdot)$ is the feature extractor, $x_n^l$ represents the $n$-th sample of class $l$, and $N$ denotes the number of samples in class $l$.
For unlabeled samples, the Euclidean distance between the sample and the prototypes of each category is calculated, which is normalized by the Softmax function to obtain the probability that the sample belongs to each class. The metric-based meta-learning method of prototypical networks is an effective means of alleviating the overfitting problem that can arise when insufficient data are available.
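The prototype and distance-based classification steps can be sketched as follows. This is a minimal illustration that works on already-extracted feature vectors; the two-dimensional toy features and the class names are assumptions, and the softmax is taken over negative distances so that closer prototypes receive higher probability.

```python
import math


def mean_vector(vectors):
    """Class prototype: the element-wise mean of the class's feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def class_probabilities(query, prototypes):
    """Softmax over negative Euclidean distances to each prototype."""
    logits = {c: -math.dist(query, p) for c, p in prototypes.items()}
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Toy support features (hypothetical), two classes with two samples each.
support = {"IF": [[0.0, 0.1], [0.2, -0.1]], "OF": [[2.0, 2.1], [1.8, 1.9]]}
prototypes = {c: mean_vector(v) for c, v in support.items()}
probs = class_probabilities([0.3, 0.2], prototypes)
predicted = max(probs, key=probs.get)
```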

2.3. AdaBoost

Ensemble learning is a type of method that combines multiple models through specific mechanisms to obtain a more robust model. As a classic ensemble algorithm, AdaBoost constructs a strong classifier from a linear combination of several weak classifiers [34]. The performance requirements for the weak classifiers are modest: they only need to be better than random guessing. Weak classifiers with high accuracy are given higher weights, and those with low accuracy are given lower weights. Assuming the error rate of a weak classifier is $\rho$, its weight is:
$$\alpha = \frac{1}{2} \log \frac{1 - \rho}{\rho} \tag{2}$$
Thus, multiple base classifiers are weighted and combined to improve classification performance:
$$G(x) = \mathrm{sign}(H(x)) \tag{3}$$
where $H(x)$ denotes the weighted combination of the base classifiers.
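To make the weighting rule concrete, the sketch below computes the weight of a weak classifier from its error rate and combines binary votes by the sign of their weighted sum. It assumes that $H(x)$ is the weighted sum of the weak classifiers' votes in $\{-1, +1\}$, which is the standard AdaBoost form but is not spelled out in the text; the clipping constant `eps` is likewise an assumption added to avoid division by zero.

```python
import math


def classifier_weight(error_rate, eps=1e-12):
    """Weight of a weak classifier: 0.5 * log((1 - rho) / rho)."""
    rho = min(max(error_rate, eps), 1.0 - eps)  # keep the log argument finite
    return 0.5 * math.log((1.0 - rho) / rho)


def strong_classify(votes, weights):
    """Sign of the weighted vote sum; votes are in {-1, +1}."""
    h = sum(a * v for a, v in zip(weights, votes))
    return 1 if h >= 0 else -1


errors = [0.1, 0.3, 0.45]
alphas = [classifier_weight(e) for e in errors]
# The most accurate classifier votes +1 and outweighs the two weaker ones.
label = strong_classify([+1, -1, -1], alphas)
```

A classifier with an error rate of exactly 0.5 receives zero weight, which matches the requirement that weak classifiers only need to be better than random guessing to contribute.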

3. Proposed Semi-Supervised Learning Method

In this section, a pseudo-labeling multi-screening-based semi-supervised learning method for few-shot fault diagnosis is proposed. The overall structure is presented in Figure 1, which includes three main components: (1) AdaBoost-based adaptive weighted prototypical network (AWPN); (2) pseudo-labeling multi-screening strategy; and (3) semi-supervised learning-based fault diagnosis.

3.1. Squeeze and Excitation-Based Feature Extractor

To make the model pay attention to the differences between different perspectives in the learning process and automatically learn the importance of features from different perspectives, Roy et al. [35] proposed spatial and channel squeeze and channel excitation (scSE) for achieving feature recalibration in both space and channel.
The application of scSE to one-dimensional data is given in Figure 2. An input feature set $U = [u_1, u_2, \dots, u_C] \in \mathbb{R}^{D \times C}$ is a combination of $C$ channels $u_i \in \mathbb{R}^{D \times 1}$ and can also be rewritten as a combination of $D$ feature layer slices $U = [u^1, u^2, \dots, u^D]$. The vector $z \in \mathbb{R}^{1 \times C}$ is generated by spatial squeeze, which is executed by the global average pooling layer, and the vector $o \in \mathbb{R}^{D \times 1}$ is generated by channel squeeze, obtained by the convolution $W_s * U$ with $W_s \in \mathbb{R}^{1 \times C \times 1}$; it is a projection of the multi-channel features at the feature level. The vector $z$ is converted into $\hat{z}$ through two fully connected layers $W_1^c \in \mathbb{R}^{C \times \frac{C}{r_1}}$ and $W_2^c \in \mathbb{R}^{\frac{C}{r_1} \times C}$, where $r_1$ represents the bottleneck in the channel excitation. To ensure that the channel excitation remains within an appropriate range, $\hat{z}$ is mapped to $[0, 1]$ through a sigmoid function $\sigma(\hat{z})$. It is worth noting that after channel squeeze, the obtained feature projection is still applicable to the encode–decode operation. Therefore, two fully connected layers $W_1^s \in \mathbb{R}^{D \times \frac{D}{r_2}}$ and $W_2^s \in \mathbb{R}^{\frac{D}{r_2} \times D}$ are used to convert $o$ to $\hat{o}$, where $r_2$ represents the bottleneck in the spatial excitation, and the sigmoid function $\sigma(\hat{o})$ is also used to keep $\hat{o}$ within an appropriate range. Finally, the calibrated features are combined in a max-out manner.
The calculation process of scSE is shown in Equations (4) to (10):
$$z = \frac{1}{D} \sum_{i=1}^{D} u(i) \tag{4}$$
$$\hat{z} = W_1^c(\delta(W_2^c z)) \tag{5}$$
$$\hat{U}_c = [\sigma(\hat{z}_1) u_1, \sigma(\hat{z}_2) u_2, \dots, \sigma(\hat{z}_C) u_C] \tag{6}$$
$$o = W_s * U \tag{7}$$
$$\hat{o} = W_1^s(\delta(W_2^s o)) \tag{8}$$
$$\hat{U}_s = [\sigma(\hat{o}_1) u^1, \sigma(\hat{o}_2) u^2, \dots, \sigma(\hat{o}_D) u^D] \tag{9}$$
$$\hat{U}_{sc} = \max(\hat{U}_c, \hat{U}_s) \tag{10}$$
where $\delta(\cdot)$ and $\sigma(\cdot)$ are the ReLU function and the sigmoid function, respectively, and $*$ represents the convolution operation.
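The scSE recalibration of a one-dimensional feature map can be sketched in plain Python as below. This is an illustrative forward pass only: the weights are random rather than trained, the channel squeeze is assumed to be a convolution with kernel size 1 (a weighted sum across channels at each position), bias terms are omitted, and all function names are hypothetical.

```python
import math
import random


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def scse_1d(U, W1c, W2c, ws, W1s, W2s):
    """U is a D x C feature map (D positions, C channels)."""
    D, C = len(U), len(U[0])
    # Spatial squeeze (global average pooling) and channel excitation.
    z = [sum(U[d][c] for d in range(D)) / D for c in range(C)]
    gate_c = sigmoid(matvec(W1c, relu(matvec(W2c, z))))
    # Channel squeeze (kernel-size-1 convolution) and spatial excitation.
    o = [sum(ws[c] * U[d][c] for c in range(C)) for d in range(D)]
    gate_s = sigmoid(matvec(W1s, relu(matvec(W2s, o))))
    # Max-out of the channel-recalibrated and spatially recalibrated maps.
    return [[max(gate_c[c] * U[d][c], gate_s[d] * U[d][c]) for c in range(C)]
            for d in range(D)]


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


D, C, r1, r2 = 8, 4, 2, 4
U = rand_matrix(D, C)
out = scse_1d(U,
              rand_matrix(C, C // r1), rand_matrix(C // r1, C),  # W1c, W2c
              [rng.uniform(-1, 1) for _ in range(C)],            # ws
              rand_matrix(D, D // r2), rand_matrix(D // r2, D))  # W1s, W2s
```

Because both gates lie in (0, 1), every recalibrated value has a magnitude no larger than the corresponding input value.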

3.2. Adaptive Weighted Prototypical Network

Inspired by the AdaBoost theory that weak classifiers can be integrated into strong classifiers, a prototypical network is proposed that adaptively weights the features into a strong feature representation. Each sample is treated as a weak classifier, and its weight is calculated by measuring how much the absence of that sample changes the whole sample distribution. The weighted samples are then combined into a strong feature representation, that is, the class prototype.
As a commonly used criterion, the maximum mean discrepancy is widely adopted to measure the distribution discrepancy between two domains. For a given feature set $U = \{f_\varphi(x_1), f_\varphi(x_2), \dots, f_\varphi(x_n)\}$, $f_\varphi(\cdot)$ is the feature extractor based on the squeeze and excitation mechanism, and $U_t = \{f_\varphi(x_1), f_\varphi(x_2), \dots, f_\varphi(x_{t-1}), f_\varphi(x_{t+1}), \dots, f_\varphi(x_n)\}$ represents the feature set $U$ without the feature $f_\varphi(x_t)$. Therefore, the influence of sample $x_t$ on the distribution of the sample set can be measured by the maximum mean discrepancy between $U$ and $U_t$, as shown below:
$$m_t = \mathrm{MMD}^2(U, U_t) = \left\| \frac{1}{n} \sum_{i=1}^{n} \phi(f_\varphi(x_i)) - \frac{1}{n-1} \sum_{i=1, i \neq t}^{n} \phi(f_\varphi(x_i)) \right\|_H^2 \tag{11}$$
where $\phi(\cdot)$ represents the mapping function into the space $H$, with $\|x\|_H^2 = \langle x, x \rangle_H$ for any $x \in H$. The smaller $m_t$ is, the closer sample $x_t$ is to the rest of the samples; conversely, a large $m_t$ means that the sample deviates from the sample distribution. When $m_t = 0$, $U$ and $U_t$ have the same distribution. AdaBoost does not have high performance requirements for weak classifiers, which only need to be better than random guessing, so $m_t$ is projected to $[0, 0.5)$ by (12).
$$m_t = \frac{1}{1 + e^{-m_t}} - 0.5 \tag{12}$$
Therefore, the weight of the feature $f_\varphi(x_t)$ is:
$$\alpha_t = \frac{1}{2} \ln \frac{1 - m_t}{m_t} \tag{13}$$
Since $m_t$ may equal 0, a sufficiently small positive number $\varepsilon$ should be added to the denominator in (13).
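The leave-one-out weighting can be sketched as follows. For simplicity, this example assumes that the mapping into $H$ is the identity (a linear kernel), so the squared MMD reduces to the squared distance between the mean of all features and the mean with one feature removed; the article does not state which kernel is used. The toy features and the value of `eps` are also assumptions.

```python
import math


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def leave_one_out_mmd(features, t):
    """Squared MMD between the full set and the set without sample t (linear kernel)."""
    full = mean_vector(features)
    rest = mean_vector(features[:t] + features[t + 1:])
    return sum((a - b) ** 2 for a, b in zip(full, rest))


def sample_weight(m_t, eps=1e-8):
    """Project m_t to [0, 0.5), then apply the AdaBoost-style weight."""
    m_proj = 1.0 / (1.0 + math.exp(-m_t)) - 0.5
    return 0.5 * math.log((1.0 - m_proj) / (m_proj + eps))


# Three tightly clustered features and one outlier (hypothetical values).
features = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [3.0, 3.0]]
weights = [sample_weight(leave_one_out_mmd(features, t)) for t in range(len(features))]
```

In this toy set, removing the outlier shifts the mean the most, so the outlier receives the smallest weight, which is the intended down-weighting of low-quality samples.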
Define the support set $S_s = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where $y_i \in \{1, 2, \dots, L\}$ is the label for $L$-class samples. $S_l$ is the set of samples labeled with class $l$, and the prototype $P_l$ of support class $l$ can be calculated by
$$P_l = \frac{\sum_{i=1, (x_i, y_i) \in S_l}^{n} \alpha_i f_\varphi(x_i)}{\sum_{i=1}^{n} \alpha_i} \tag{14}$$
Assuming that sample $x$ needs to be classified, the feature extractor is used to map it into the feature space, and then the Euclidean distance $d(\cdot)$ between $f_\varphi(x)$ and each of the $L$ prototype vectors is computed. The probability that sample $x$ belongs to category $l$ is:
$$\Pr(y = l \mid x) = \frac{\exp(-d(f_\varphi(x), P_l))}{\sum_{i=1}^{L} \exp(-d(f_\varphi(x), P_i))} \tag{15}$$
Therefore, the loss function $J(x, y)$ of the query set $S_q$ is:
$$J(x, y) = -\sum_{(x, y) \in S_q} \log p(y = l \mid x) \tag{16}$$
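A weighted prototype and the query-set loss can be sketched as below. The example assumes that the weights are normalized over the samples of the class itself, so that each prototype is a weighted average of its own class features; the weights, features, and class names are made-up toy values.

```python
import math


def weighted_prototype(features, alphas):
    """Weighted average of one class's features, using per-sample weights."""
    total = sum(alphas)
    return [sum(a * f[d] for a, f in zip(alphas, features)) / total
            for d in range(len(features[0]))]


def log_prob(query, label, prototypes):
    """Log-softmax over negative Euclidean distances to the prototypes."""
    logits = {c: -math.dist(query, p) for c, p in prototypes.items()}
    top = max(logits.values())
    log_z = top + math.log(sum(math.exp(v - top) for v in logits.values()))
    return logits[label] - log_z


def query_loss(query_set, prototypes):
    """Negative log-likelihood summed over the query set."""
    return -sum(log_prob(x, y, prototypes) for x, y in query_set)


class_features = {
    "IF": [[1.0, 1.0], [1.2, 0.8], [3.0, 3.0]],  # last sample is low quality
    "OF": [[5.0, 5.0], [5.2, 4.8]],
}
class_alphas = {"IF": [2.0, 2.0, 1.0], "OF": [1.0, 1.0]}
prototypes = {c: weighted_prototype(class_features[c], class_alphas[c])
              for c in class_features}
loss = query_loss([([1.1, 1.0], "IF"), ([5.1, 5.0], "OF")], prototypes)
```

With these weights, the "IF" prototype is pulled less toward the low-quality sample than the plain mean would be.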

3.3. Pseudo-Labeling Multi-Screening Strategy

For data-driven classification models, the number of trainable samples affects the accuracy of the model, especially in semi-supervised learning. In this paper, a pseudo-labeling multi-screening strategy based on uncertainty and classification probability is proposed, which can effectively expand the training set and improve the model training accuracy.
There is a positive correlation between network correction and model output uncertainty. The lower the uncertainty of the model, the smaller the network correction error, and the higher the accuracy of the model [36]. Therefore, the uncertainty of the model is used as one of the indicators of pseudo-label screening. The uncertainty of the output of each sample is calculated by using the Monte Carlo dropout model. In the forward propagation of dropout layers in the testing phase, the Monte Carlo dropout was employed to generate output distributions that emulate the variability observed across different network architectures. The predictive outcomes and the uncertainty of the model are calculated by averaging the outputs and the statistical variance.
Supposing $f_{d_t}^{PN} = \{f_{d_0}^{PN}(x), \dots, f_{d_t}^{PN}(x)\}$ is the output of the PN after $t$ iterations of random dropout, its uncertainty can be calculated as in Equation (17) [37].
$$q = \frac{1}{t} \sum_{j=0}^{t} \left[ f_{d_j}^{PN}(x) - \mu \right]^2 \tag{17}$$
where $\mu = \frac{1}{t} \sum_{j=0}^{t} f_{d_j}^{PN}(x)$ represents the predicted posterior mean. This approach assesses the predictive mean and the model uncertainty by collecting the results of stochastic forward passes. The model architecture does not need to be modified, which can reduce the overfitting of the network and improve the computation efficiency.
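The Monte Carlo dropout estimate can be sketched with a toy linear classifier standing in for the network. Everything specific here is an assumption for illustration: dropout is applied to the input features, the per-class variances are summed into one scalar uncertainty, and the weights and input are made up. The article does not specify how the per-class variances are aggregated.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_pass(x, weights, drop_rate, rng):
    """One forward pass with (inverted) dropout kept active at test time."""
    keep = 1.0 - drop_rate
    dropped = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    return softmax([sum(w * xi for w, xi in zip(row, dropped)) for row in weights])


def mc_dropout(x, weights, passes=50, drop_rate=0.3, seed=0):
    """Return the mean prediction and a scalar uncertainty over several passes."""
    rng = random.Random(seed)
    outputs = [stochastic_pass(x, weights, drop_rate, rng) for _ in range(passes)]
    n_classes = len(weights)
    mean = [sum(o[c] for o in outputs) / passes for c in range(n_classes)]
    variance = [sum((o[c] - mean[c]) ** 2 for o in outputs) / passes
                for c in range(n_classes)]
    return mean, sum(variance)


weights = [[2.0, -1.0, 0.5], [-1.0, 2.0, 0.5]]  # toy 2-class linear "network"
x = [1.0, 0.2, 0.3]
mean, q = mc_dropout(x, weights)
mean0, q0 = mc_dropout(x, weights, drop_rate=0.0)  # no dropout: no spread
```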
A pseudo-labeling multi-screening strategy based on the dual threshold of Monte Carlo dropout uncertainty and softmax output probability is constructed as:
$$g_i = \begin{cases} 1, & q_i \le \tau_q \ \text{and} \ p_i \ge \tau_p \\ 0, & \text{otherwise} \end{cases} \tag{18}$$
where $q_i$ is the estimated uncertainty of sample $i$, and $p_i$ is the maximum value of the predicted probability. $\tau_q$ and $\tau_p$ represent the thresholds of uncertainty and prediction probability, respectively. When $g_i$ is 1, sample $i$ is selected as a pseudo-labeled sample.
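The dual-threshold rule can be written as a short filter. In this sketch, each candidate is a pair of predicted class probabilities and an uncertainty value; the function name, the candidate format, and the example numbers are assumptions, and the default thresholds are only placeholders.

```python
def screen_pseudo_labels(candidates, tau_q=0.1, tau_p=0.9):
    """Keep samples with low uncertainty AND high maximum class probability.

    candidates: list of (class_probabilities, uncertainty) pairs.
    Returns a list of (sample_index, pseudo_label) for the selected samples.
    """
    selected = []
    for idx, (probs, q) in enumerate(candidates):
        p_max = max(probs)
        if q <= tau_q and p_max >= tau_p:
            selected.append((idx, probs.index(p_max)))
    return selected


candidates = [
    ([0.95, 0.03, 0.02], 0.01),  # confident and stable: kept
    ([0.55, 0.40, 0.05], 0.02),  # stable but not confident: rejected
    ([0.02, 0.97, 0.01], 0.50),  # confident but uncertain: rejected
    ([0.01, 0.04, 0.95], 0.05),  # confident and stable: kept
]
selected = screen_pseudo_labels(candidates)
```

The second and third candidates show why a single criterion is not enough: each passes one threshold but fails the other.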
In order to select as many trainable samples as possible on the premise of ensuring the accuracy of pseudo-labeling, a multi-accumulation strategy is proposed in this paper. The pseudo-label samples selected in the previous round were combined with the training samples to update the AWPN, and the updated AWPN was used for a new round of pseudo-label screening, which was accumulated layer-by-layer until the conditions for stopping were met. If incorrect pseudo-labeling is added to the shallow network, the error information will accumulate in the iteration up to the deep network layer, greatly reducing the accuracy of fault diagnosis. Therefore, a timely stop accumulation strategy is the key to ensure the accuracy of the model. The labeled training set is used to judge whether the network is degraded. If the addition of pseudo-labeling reduces the accuracy of the AWPN on the training set, it indicates that the false pseudo-labeling causes network degradation, and the accumulation strategy is stopped. The accumulation strategy stops in two cases: 1. the filtered pseudo-labeling sample is empty; 2. network degradation.
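The accumulation loop with its two stopping conditions can be sketched generically. The `fit`, `screen`, and `accuracy` callables below are hypothetical stand-ins for training the AWPN, dual-threshold screening, and evaluation on the labeled training set; here they are replaced by a toy one-dimensional nearest-mean classifier. Discarding the degraded candidate model when the loop stops is also an assumption.

```python
def accumulate_pseudo_labels(labeled, unlabeled, fit, screen, accuracy, max_rounds=10):
    """Grow the training set with screened pseudo-labels until a stop condition fires."""
    train_set, pool = list(labeled), list(unlabeled)
    model = fit(train_set)
    best_acc = accuracy(model, labeled)
    for _ in range(max_rounds):
        hits = screen(model, pool)  # [(pool_index, pseudo_label), ...]
        if not hits:                # stop 1: no sample passed the screening
            break
        candidate_set = train_set + [(pool[i], y) for i, y in hits]
        candidate = fit(candidate_set)
        acc = accuracy(candidate, labeled)
        if acc < best_acc:          # stop 2: network degradation
            break
        model, train_set, best_acc = candidate, candidate_set, acc
        taken = {i for i, _ in hits}
        pool = [x for i, x in enumerate(pool) if i not in taken]
    return model, train_set


# Toy stand-ins (hypothetical): scalar samples and a nearest-mean "network".
def fit(data):
    sums = {}
    for x, y in data:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict(model, x):
    return min(model, key=lambda c: abs(x - model[c]))


def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)


def screen(model, pool, margin=6.0):
    hits = []
    for i, x in enumerate(pool):
        dists = sorted(abs(x - m) for m in model.values())
        if dists[1] - dists[0] >= margin:  # clearly closer to one class
            hits.append((i, predict(model, x)))
    return hits


labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [0.5, 1.5, 8.5, 9.5, 5.1]
model, train_set = accumulate_pseudo_labels(labeled, unlabeled, fit, screen, accuracy)
```

In this toy run, the four unambiguous samples are added in the first round, and the loop stops in the second round because the remaining ambiguous sample does not pass the screening.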

3.4. Overview of the Proposed Method

The semi-supervised few-shot learning based on an adaptive prototypical network and multiple accumulation of pseudo-labeling samples is proposed in this article, with the pseudo code shown in Algorithm 1.
Algorithm 1: The proposed learning strategy

4. Results of Experiments

4.1. Dataset Description

Dataset A: The CWRU dataset is a classic and widely used dataset in bearing fault diagnosis [38]. The fault types are divided into inner race fault (IF), outer race fault (OF), roller fault (RF), and normal state (N). Each type of fault is composed of vibration signals collected under different working conditions, with a sampling frequency of 12 kHz. In this experiment, a sliding window with a length of 2048 and a step size of 80 was used to obtain vibration data samples. The CWRU dataset experimental platform is shown in Figure 3, and the data introduction is shown in Table 1.
Dataset B: The petrochemical dataset is collected in industrial environments. Under industrial operating conditions, the collected vibration signals usually contain more noise [39]. For example, environmental noise, temperature changes, equipment aging, and other factors may affect the vibration signals, which makes the data more representative of actual industrial operation. The petrochemical experiment platform is shown in Figure 4. The dataset contains six different states: normal, defect(s) in gearwheels (F1), defect(s) in gearwheels along with outer-ring wear of the left-hand side bearing (F2), defect(s) in gearwheels along with inner-ring wear of the left-hand side bearing (F3), defect(s) in gearwheels along with the absence of balls on the left-hand side bearing (F4), and defect(s) in pinions along with defect(s) in gearwheels (F5). A vibration sensor mounted on the bearing seat is used to collect vibration acceleration signal data, and the data introduction is shown in Table 2.
Dataset C: The IMS bearing dataset was constructed by the Intelligent Maintenance Systems Center at the University of Cincinnati in the United States [40]. The bearings are subjected to simulated degradation tests on a bearing test bench consisting of a 2000 RPM motor and four bearings mounted on the same shaft. During the experiment, the vibration signal generated by the bearing during the operation of the test bench is collected by an accelerometer installed on the bearing seat, with a sampling frequency of 20 kHz. This dataset contains the full life cycle data of the bearings. Across multiple run-to-failure tests, three types of bearing failure occurred: outer race failure (ORF), inner race failure (IRF), and ball failure (BF). The data introduction is shown in Table 3.
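The sliding-window segmentation used to cut vibration signals into samples can be sketched as follows, using the window length of 2048 and step size of 80 quoted for Dataset A as defaults. The function name and the synthetic signal are assumptions; a real signal would be loaded from the dataset files.

```python
def sliding_windows(signal, length=2048, step=80):
    """Cut a 1-D signal into overlapping windows of fixed length."""
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, step)]


# Synthetic stand-in for a vibration record (hypothetical): 4096 points.
signal = list(range(4096))
windows = sliding_windows(signal)
n_windows = len(windows)  # (4096 - 2048) // 80 + 1 = 26
```

A small step relative to the window length produces heavily overlapping samples, which increases the number of samples that can be drawn from a short record.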

4.2. Results and Analysis

In order to prove the effectiveness of the proposed method, three methods are selected for comparative experiments:
The kernel principal component analysis-based semi-supervised prototypical network (K-kernel PN) [24], in which the original vibration signal is processed by principal component analysis with a Gaussian kernel function and the classical prototypical network is used for fault diagnosis.
The robust re-weighting prototypical networks (WProNet) [41], which incorporate a re-weighting mechanism to set a weight for each summation item.
The influential prototypical network (IPNet) [19], which adjusts the weights of prototypical networks based on the maximum mean discrepancy between data.
In order to ensure the fairness of comparison, the parameters of the three comparison methods are set to the optimal values provided in their papers, and all results are obtained from 10 runs. A time window of length 1024 and step length 128 is used to divide the original vibration signal. For each fault type, 500 samples are randomly selected as the training set and 1400 samples as the test set. The unlabeled test set is also used for the pseudo-label screening. Table 4, Table 5 and Table 6 show the results of the four methods on the three datasets.
It can be seen from the results in Table 4, Table 5 and Table 6 that the proposed method is superior to the other models in all small-sample classification tasks. WProNet and IPNet use weighted prototypical networks, but their weighting methods are too simple, and they do not use pseudo-labeling technology to increase the number of available samples, which results in poor performance on the petrochemical dataset. Although pseudo-labeling technology is used in K-kernel PN, a single criterion cannot guarantee the accuracy of screening, so its accuracy is lower than that of the proposed method. These results demonstrate the effectiveness of the AWPN and the dual-threshold multi-accumulation pseudo-labeling screening strategy.

4.3. Ablation Experiment

To demonstrate the effectiveness of the adaptive weighting module and the pseudo-labeling screening module, we conducted ablation experiments on the three datasets, and the experimental results are shown in Table 7. The original prototypical network incorporating pseudo-labeling screening (PNIPS) is used to assess the effectiveness of the adaptive weighting module. The AWPN without the pseudo-labeling module is used to assess the effectiveness of the pseudo-labeling screening. The experimental results show that the addition of adaptive weights generates a more representative class prototype, and that the screening of pseudo-labels increases the amount of valid data. Therefore, both modules improve the classification accuracy of the model.

4.4. Visualization Analysis

The t-distributed stochastic neighbor embedding (t-SNE) is used to visualize the features of the CWRU dataset in the 10-way 5-shot scenario under the four models mentioned in Section 4.2. The experimental results are shown in Figure 5. As can be seen from the figure, features of the same type in our model are close to each other, the overall boundaries are relatively clear, and features of different classes are far from each other, resulting in the best classification performance. In contrast, the classification performance of the other models still needs to be improved, with a few features of different classes crossing or overlapping each other. Possible reasons are misclassification by the model and prototypes, generated from the extracted features, that do not sufficiently maximize the distance between different classes.
Figure 6 shows the confusion matrices obtained from the classification of the CWRU dataset in the 10-way 5-shot case. According to the testing results, the WProNet and K-kernel PN models performed poorly in classifying label 4 (inner race fault with a fault diameter of 0.007) and label 7 (outer race fault with a fault diameter of 0.007), while performing well for the other categories. The classification performance of IPNet is better than that of WProNet and K-kernel PN, but it still performs poorly in classifying label 4. WProNet and IPNet do not use pseudo-labeling techniques to increase the number of trainable samples, which can lead to overfitting during training. K-kernel PN conducts pseudo-label screening but uses the original prototypical network, which causes the network performance to degrade. Among these four methods, the proposed method had the best classification effect.
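For readers who want to reproduce this kind of analysis, a confusion matrix and the per-class recall can be computed as in the sketch below. The label lists are made-up toy values, not results from the article, and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Toy labels (hypothetical) for a 3-class problem.
true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(true_labels, predicted_labels, n_classes=3)
recall = per_class_recall(matrix)
```

A low recall for one row corresponds to the kind of class-specific weakness discussed above for labels 4 and 7.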

4.5. Related Parameters

The hyperparameters involved here are mainly the bottlenecks $r_1$ and $r_2$ of the channel excitation and spatial excitation, and the thresholds $\tau_q$ and $\tau_p$ for uncertainty and category probability. Appropriate values of $r_1$ and $r_2$ improve the efficiency of feature extraction, enabling the extracted features to accurately depict different types of fault expressions, while $\tau_q$ and $\tau_p$ determine the accuracy of pseudo-labeling screening. Therefore, the selection of parameters greatly affects the performance of the model. For $r_1$ and $r_2$, the values are usually taken from $\{2, 4, 8, 16\}$. In order to remove the influence of the pseudo-labeling screening parameters, the adaptive weighted prototypical network (without pseudo-labeling screening) and the CWRU dataset were selected for parameter evaluation. The results are shown in Figure 7; the optimal parameters are $r_1 = 2$ and $r_2 = 16$. The detailed system parameter settings in this article are shown in Table 8 and Table 9.

4.6. Validity of Dual-Threshold Pseudo-Labeling Screen

The candidate values of the uncertainty threshold $\tau_q$ are $\{10, 1, 0.1, 0.01\}$, and the candidate values of the output probability threshold $\tau_p$ are $\{0.6, 0.7, 0.8, 0.9\}$. The results obtained by adding a pseudo-labeling screening module to the AWPN with determined parameters are shown in Figure 8, and the optimal parameters are $\tau_q = 0.1$ and $\tau_p = 0.9$.

4.7. Effectiveness of Dual-Threshold Pseudo-Labeling Screen

The output probability is usually selected as the criterion for pseudo-label screening, but a single threshold is prone to producing false pseudo-labels. In this paper, a dual threshold of output probability and uncertainty is adopted, together with a multiple selection strategy, to screen out more samples without reducing the accuracy of pseudo-labeling. To demonstrate the effectiveness of the dual-threshold strategy, a single output probability threshold and a single uncertainty threshold were selected as comparison baselines. The accuracy of the model on the CWRU dataset is shown in Table 10.

5. Conclusions

This article proposed a pseudo-labeling multi-screening-based semi-supervised learning method for few-shot fault diagnosis to improve the generalization ability under limited labeled samples. The proposed method consists of a squeeze and excitation block-based feature extractor, AdaBoost adaptation-based prototypical clustering, and a pseudo-labeling multi-screening strategy. The feature extractor obtains effective information through squeeze and excitation, and high-accuracy pseudo-labeled samples are generated by pseudo-labeling screening to expand the number of trainable data points. The effectiveness of the proposed method was verified against existing methods in two settings (5-way and 10-way) with three shots (1-shot, 5-shot, and 10-shot). Comparisons of feature visualizations and confusion matrices with other related methods illustrated that the proposed method performs well across the different few-shot diagnosis tasks. However, the proposed method still has certain limitations, in particular the time cost of the pseudo-labeling multi-screening process. Therefore, future research should study methods with a simpler structure and lower complexity.

Author Contributions

Conceptualization, S.L. and Z.Z.; methodology, Z.C. (Zibin Chen) and J.H.; validation, X.C.; writing—original draft preparation, S.L.; writing—review and editing, J.H.; supervision, Z.C. (Zhiwen Chen). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gawde, S.; Patil, S.; Kumar, S.; Kamat, P.; Kotecha, K.; Abraham, A. Multi-fault diagnosis of Industrial Rotating Machines using Data-driven approach: A review of two decades of research. Eng. Appl. Artif. Intell. 2023, 123, 106139.
  2. Zhou, D.; Zhao, Y.; Wang, Z.; He, X.; Gao, M. Review on Diagnosis Techniques for Intermittent Faults in Dynamic Systems. IEEE Trans. Ind. Electron. 2020, 67, 2337–2347.
  3. Liu, D.; Cui, L.; Cheng, W. A review on deep learning in planetary gearbox health state recognition: Methods, applications, and dataset publication. Meas. Sci. Technol. 2024, 35, 012002.
  4. Brito, L.C.; Susto, G.A.; Brito, J.N.; Duarte, M.A.V. An explainable artificial intelligence approach for unsupervised fault detection and diagnosis in rotating machinery. Mech. Syst. Signal Process. 2022, 163, 108105.
  5. Zhao, C.; Shen, W. Adversarial Mutual Information-Guided Single Domain Generalization Network for Intelligent Fault Diagnosis. IEEE Trans. Ind. Inform. 2023, 19, 2909–2918.
  6. Zhang, K.; Tang, B.; Deng, L.; Liu, X. A hybrid attention improved ResNet based fault diagnosis method of wind turbines gearbox. Measurement 2021, 179, 109491.
  7. Shao, H.; Jiang, H.; Zhang, H.; Liang, T. Electric Locomotive Bearing Fault Diagnosis Using a Novel Convolutional Deep Belief Network. IEEE Trans. Ind. Electron. 2018, 65, 2727–2736.
  8. He, J.; Li, X.; Chen, Y.; Chen, D.; Guo, J.; Zhou, Y. Deep Transfer Learning Method Based on 1D-CNN for Bearing Fault Diagnosis. Shock Vib. 2021, 2021, 6687331.
  9. Shao, H.; Xia, M.; Wan, J.; de Silva, C.W. Modified Stacked Autoencoder Using Adaptive Morlet Wavelet for Intelligent Fault Diagnosis of Rotating Machinery. IEEE-ASME Trans. Mechatron. 2022, 27, 24–33.
  10. Nie, X.; Xie, G. A Fault Diagnosis Framework Insensitive to Noisy Labels Based on Recurrent Neural Network. IEEE Sens. J. 2021, 21, 2676–2686.
  11. Wang, J.; Liu, K.; Zhang, Y.; Leng, B.; Lu, J. Recent advances of few-shot learning methods and applications. Sci. China Technol. Sci. 2023, 66, 920–944.
  12. Liang, X.; Zhang, M.; Feng, G.; Wang, D.; Xu, Y.; Gu, F. Few-Shot Learning Approaches for Fault Diagnosis Using Vibration Data: A Comprehensive Review. Sustainability 2023, 15, 14975.
  13. Zhan, Z.; Zhou, J.; Xu, B. Fabric defect classification using prototypical network of few-shot learning algorithm. Comput. Ind. 2022, 138, 103628.
  14. Su, Z.; Zhang, X.; Wang, G.; Wang, S.; Luo, M.; Wang, X. The Semisupervised Weighted Centroid Prototype Network for Fault Diagnosis of Wind Turbine Gearbox. IEEE-ASME Trans. Mechatron. 2023, 29, 1567–1578.
  15. Gou, Y.; Fu, X. Broadband Electrical Impedance Matching of Transmitter Transducer for Acoustic Logging While Drilling Tool. IEEE Sens. J. 2022, 22, 1382–1390.
  16. Peng, Q.; Wang, W.; Liu, H.; Wang, Y.; Xu, H.; Shao, M. Towards comprehensive expert finding with a hierarchical matching network. Knowl.-Based Syst. 2022, 257, 109933.
  17. Zhang, H.; Xing, W.; Yang, Y.; Li, Y.; Yuan, D. SiamST: Siamese network with spatio-temporal awareness for object tracking. Inf. Sci. 2023, 634, 122–139.
  18. Chiplunkar, R.; Huang, B. Siamese Neural Network-Based Supervised Slow Feature Extraction for Soft Sensor Application. IEEE Trans. Ind. Electron. 2021, 68, 8953–8962.
  19. Chowdhury, R.R.; Bathula, D.R. Influential Prototypical Networks for Few Shot Learning: A Dermatological Case Study. arXiv 2021, arXiv:2111.00698.
  20. Wang, Z.; Shen, H.; Xiong, W.; Zhang, X.; Hou, J. Method for Diagnosing Bearing Faults in Electromechanical Equipment Based on Improved Prototypical Networks. Sensors 2023, 23, 4485.
  21. Gao, T.; Han, X.; Liu, Z.; Sun, M. Hybrid attention-based prototypical networks for noisy few-shot relation classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6407–6414.
  22. Ye, H.J.; Ming, L.; Zhan, D.C.; Chao, W.L. Few-Shot Learning With a Strong Teacher. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 1425–1440.
  23. Zhao, Y.; Zhang, T.; Li, J.; Tian, Y. Dual Adaptive Representation Alignment for Cross-Domain Few-Shot Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 11720–11732.
  24. He, J.; Zhu, Z.; Fan, X.; Chen, Y.; Liu, S.; Chen, D. Few-Shot Learning for Fault Diagnosis: Semi-Supervised Prototypical Network with Pseudo-Labels. Symmetry 2022, 14, 1489.
  25. Fan, C.; Liu, X.; Xue, P.; Wang, J. Statistical characterization of semi-supervised neural networks for fault detection and diagnosis of air handling units. Energy Build. 2021, 234, 110733.
  26. Zhang, L.; Yang, L.; Ma, T.; Shen, F.; Cai, Y.; Zhou, C. A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data. Geoderma 2021, 384, 114809.
  27. Zhang, X.; Su, Z.; Hu, X.; Han, Y.; Wang, S. Semisupervised Momentum Prototype Network for Gearbox Fault Diagnosis Under Limited Labeled Samples. IEEE Trans. Ind. Inform. 2022, 18, 6203–6213.
  28. Zhou, F.; Xu, W.; Wang, C.; Hu, X.; Wang, T. A Semi-Supervised Federated Learning Fault Diagnosis Method Based on Adaptive Class Prototype Points for Data Suffered by High Missing Rate. J. Intell. Robot. Syst. 2023, 109, 93.
  29. Che, C.C.; Wang, H.W.; Xiong, M.L.; Ni, X.M. Few-shot fault diagnosis of rolling bearing under variable working conditions based on ensemble meta-learning. Digit. Signal Process. 2022, 131, 103777.
  30. Kang, S.; Liang, X.; Wang, Y.; Wang, Q.; Qiao, C.; Mikulovich, V.I. Few-shot rolling bearing fault classification method based on improved relation network. Meas. Sci. Technol. 2022, 33, 125020.
  31. Wang, S.; Wang, D.; Kong, D.; Wang, J.; Li, W.; Zhou, S. Few-Shot Rolling Bearing Fault Diagnosis with Metric-Based Meta Learning. Sensors 2020, 20, 6437.
  32. Wang, X.; Liang, J.; Xiao, Y.; Wang, W. Prototypical Concept Representation. IEEE Trans. Knowl. Data Eng. 2023, 35, 7357–7370.
  33. Tian, X.M.; Chen, L.; Zhang, X.L.; Chen, E.R. Improved Prototypical Network Model for Forest Species Classification in Complex Stand. Remote Sens. 2020, 12, 3839.
  34. Wang, R. AdaBoost for Feature Selection, Classification and Its Relation with SVM, A Review. Phys. Procedia 2012, 25, 800–807.
  35. Roy, A.G.; Navab, N.; Wachinger, C. Recalibrating Fully Convolutional Networks With Spatial and Channel Squeeze and Excitation Blocks. IEEE Trans. Med. Imaging 2019, 38, 540–549.
  36. Rizve, M.N.; Duarte, K.; Rawat, Y.S.; Shah, M. In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning. arXiv 2021, arXiv:2101.06329.
  37. Islam, M.F.; Zabeen, S.; Islam, M.A.; Rahman, F.B.; Ahmed, A.; Karim, D.Z.; Rasel, A.A.; Manab, M.A. How certain are tansformers in image classification: Uncertainty analysis with monte carlo dropout. In Proceedings of the Fifteenth International Conference on Machine Vision, Rome, Italy, 18–20 November 2022; Volume 12701.
  38. Smith, W.A.; Randall, R.B. Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study. Mech. Syst. Signal Process. 2015, 64–65, 100–131.
  39. Zhang, J.; Zhang, Q.; He, X.; Sun, G.; Zhou, D. Compound-Fault Diagnosis of Rotating Machinery: A Fused Imbalance Learning Method. IEEE Trans. Control. Syst. Technol. 2021, 29, 1462–1474.
  40. Qiu, H.; Lee, J.; Lin, J.; Yu, G. Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics. J. Sound Vib. 2006, 289, 1066–1090.
  41. Zhu, J.; Yi, X.; Guan, N.; Cheng, H. Robust Re-weighting Prototypical Networks for Few-Shot Classification. In Proceedings of the 2020 6th International Conference on Robotics and Artificial Intelligence, Singapore, 20–22 November 2020.

Figure 1. The overall structure diagram of the proposed method.

Figure 2. Architectural configuration of scSE.

Figure 3. CWRU bearing test bed.

Figure 4. Petrochemical experiment platform.

Figure 5. Visualization of CWRU data set features under four models.

Figure 6. Confusion matrix of CWRU data set under four models.

Figure 7. Accuracy of CWRU data set under different values of hyperparameters r_1 and r_2.

Figure 8. Accuracy of CWRU data set under different values of hyperparameters τ_q and τ_p.

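Table 1. Description of CWRU dataset.

| Fault Location | Load | Fault Diameter (Inch) | Fault Label |
|---|---|---|---|
| N | 0, 1, 2, 3 | 0 | 0 |
| RF | 0, 1, 2, 3 | 0.007 | 1 |
| RF | 0, 1, 2, 3 | 0.014 | 2 |
| RF | 0, 1, 2, 3 | 0.021 | 3 |
| IF | 0, 1, 2, 3 | 0.007 | 4 |
| IF | 0, 1, 2, 3 | 0.014 | 5 |
| IF | 0, 1, 2, 3 | 0.021 | 6 |
| OF | 0, 1, 2, 3 | 0.007 | 7 |
| OF | 0, 1, 2, 3 | 0.014 | 8 |
| OF | 0, 1, 2, 3 | 0.021 | 9 |
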
Table 2. Description of petrochemical dataset.

| Fault Location | Normal | F1 | F2 | F3 | F4 | F5 |
|---|---|---|---|---|---|---|
| Fault label | 0 | 1 | 2 | 3 | 4 | 5 |

Table 3. Description of IMS dataset.

| Fault Location | Normal | ORF | IRF | BF |
|---|---|---|---|---|
| Fault label | 0 | 1 | 2 | 3 |

Table 4. Average accuracies of the results for CWRU (%).

| CWRU | 5-Way 1-Shot | 5-Way 5-Shot | 5-Way 10-Shot | 10-Way 1-Shot | 10-Way 5-Shot | 10-Way 10-Shot |
|---|---|---|---|---|---|---|
| WProNet | 92.76 ± 0.11 | 92.14 ± 0.08 | 95.18 ± 0.06 | 87.57 ± 0.09 | 89.84 ± 0.04 | 90.13 ± 0.03 |
| IPNet | 93.42 ± 0.11 | 95.03 ± 0.06 | 94.48 ± 0.07 | 75.96 ± 0.12 | 91.41 ± 0.04 | 93.82 ± 0.03 |
| K-kernel PN | 92.63 ± 0.12 | 95.24 ± 0.07 | 97.81 ± 0.04 | 89.72 ± 0.38 | 94.65 ± 0.16 | 97.05 ± 0.07 |
| Ours | 98.94 ± 0.05 | 99.29 ± 0.02 | 98.19 ± 0.03 | 97.40 ± 0.05 | 98.19 ± 0.02 | 99.09 ± 0.01 |

Table 5. Average accuracies of the results for petrochemical (%).

| Petrochemical | 5-Way 1-Shot | 5-Way 5-Shot | 5-Way 10-Shot | 10-Way 1-Shot | 10-Way 5-Shot | 10-Way 10-Shot |
|---|---|---|---|---|---|---|
| WProNet | 88.86 ± 0.19 | 89.88 ± 0.14 | 90.99 ± 0.12 | 80.74 ± 0.15 | 83.54 ± 0.06 | 82.04 ± 0.04 |
| IPNet | 88.55 ± 0.19 | 93.33 ± 0.11 | 91.14 ± 0.11 | 80.60 ± 0.13 | 83.62 ± 0.06 | 85.04 ± 0.04 |
| K-kernel PN | 91.89 ± 0.11 | 97.07 ± 0.01 | 98.12 ± 0.08 | 89.61 ± 0.30 | 96.23 ± 0.06 | 97.77 ± 0.08 |
| Ours | 92.11 ± 0.16 | 97.36 ± 0.11 | 98.46 ± 0.03 | 91.67 ± 0.13 | 96.50 ± 0.06 | 97.72 ± 0.04 |

Table 6. Average accuracies of the results for IMS (%).

| IMS | 5-Way 1-Shot | 5-Way 5-Shot | 5-Way 10-Shot | 10-Way 1-Shot | 10-Way 5-Shot | 10-Way 10-Shot |
|---|---|---|---|---|---|---|
| WProNet | 92.97 ± 0.18 | 99.59 ± 0.02 | 99.86 ± 0.01 | 93.35 ± 0.12 | 99.23 ± 0.02 | 99.43 ± 0.01 |
| IPNet | 96.53 ± 0.13 | 97.62 ± 0.06 | 99.83 ± 0.01 | 95.16 ± 0.10 | 99.24 ± 0.02 | 99.35 ± 0.01 |
| K-kernel PN | 98.92 ± 0.07 | 99.56 ± 0.01 | 99.95 ± 0.00 | 99.38 ± 0.04 | 99.79 ± 0.00 | 97.77 ± 0.01 |
| Ours | 99.66 ± 0.04 | 99.82 ± 0.01 | 99.97 ± 0.01 | 99.72 ± 0.02 | 99.91 ± 0.01 | 99.98 ± 0.00 |

Table 7. Results of ablation experiment (%).

| Method | CWRU (10-Way 5-Shot) | IMS (4-Way 5-Shot) | Petrochemical (6-Way 5-Shot) |
|---|---|---|---|
| PNIPS | 94.14 ± 0.06 | 98.11 ± 0.01 | 98.67 ± 0.06 |
| AWPN | 96.89 ± 0.03 | 99.03 ± 0.02 | 97.98 ± 0.06 |
| Ours | 98.19 ± 0.02 | 99.71 ± 0.01 | 99.50 ± 0.06 |

Table 8. The feature extractor parameters.

| Layer | Parameter | Out Shape |
|---|---|---|
| Input | | 1024 × 1 |
| Conv1D/BN/ReLu/Maxp | channels = 64, c_size = [3], c_str = [1], p_size = [2], p_str = [1] | 512 × 64 |
| Conv1D/BN/ReLu/Maxp | channels = 64, c_size = [3], c_str = [1], p_size = [2], p_str = [1] | 256 × 64 |
| Conv1D/BN/ReLu/Maxp | channels = 64, c_size = [3], c_str = [1], p_size = [2], p_str = [1] | 128 × 64 |
| Conv1D/BN/ReLu/Maxp | channels = 64, c_size = [3], c_str = [1], p_size = [2], p_str = [1] | 64 × 64 |
| Conv1D/BN/ReLu/Maxp | channels = 64, c_size = [3], c_str = [1], p_size = [2], p_str = [1] | 31 × 64 |
| Dropout/Flatten | Units = 1984, dropout_rate = 0.2 | 1984 |

Table 9. The scSE parameters.

| Operation | Parameter | Out Shape |
|---|---|---|
| Channel_input | | 1984 |
| Expand_dim | dim = −1 | 1984 × 1 |
| Channel_expansion | channel = 64 | 1984 × 64 |
| Avgpool | dim = 0 | 1 × 64 |
| Squeeze | dim = −1 | 1 × 4 |
| Excitation/ReLu | units = 64 (Dense1) | 1 × 64 |
| Fusion/Sigmoid | units = 1984 (Dense2) | 1984 |
| Spatial_input | | 1984 |
| Expand_dim | dim = −1 | 1984 × 1 |
| Channel_expansion | channel = 64 | 1984 × 64 |
| Conv/Sigmoid | channels = 1, c_size = [1], c_str = [1] | 1984 × 1 |
| Squeeze | dim = −1 | 992 × 4 |
| Excitation/ReLu | units = 1984 (Dense3) | 1984 × 1 |
| Fusion/Sigmoid | units = 64 (Dense4) | 1984 × 4 |
| Max(Channel_input, Spatial_input) | | 1984 × 1 |
| Dim_reduction | Units = −1 | 1984 |

Table 10. Accuracy of the model under different hyperparameters (%).

| | Without τ_p | τ_p = 0.6 | τ_p = 0.7 | τ_p = 0.8 | τ_p = 0.9 |
|---|---|---|---|---|---|
| Without τ_q | None | 96.76 ± 0.026 | 98.03 ± 0.021 | 97.59 ± 0.023 | 97.70 ± 0.023 |
| τ_q = 10 | 96.84 ± 0.026 | 97.84 ± 0.022 | 98.08 ± 0.021 | 97.16 ± 0.026 | 97.34 ± 0.024 |
| τ_q = 1 | 96.94 ± 0.026 | 97.15 ± 0.025 | 97.98 ± 0.021 | 96.51 ± 0.027 | 97.60 ± 0.023 |
| τ_q = 0.1 | 96.92 ± 0.026 | 97.02 ± 0.025 | 97.47 ± 0.024 | 97.41 ± 0.024 | 98.24 ± 0.019 |
| τ_q = 0.01 | 97.73 ± 0.022 | 97.35 ± 0.024 | 97.36 ± 0.024 | 97.48 ± 0.024 | 97.71 ± 0.022 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Liu, S.; Zhu, Z.; Chen, Z.; He, J.; Chen, X.; Chen, Z. A Pseudo-Labeling Multi-Screening-Based Semi-Supervised Learning Method for Few-Shot Fault Diagnosis. Sensors 2024, 24, 6907. https://doi.org/10.3390/s24216907