4.1. Data Set
The experimental data in this paper were recorded from rolling bearings in which single-point faults of different severities had been seeded on the inner ring, the outer ring and the rolling element by electric spark machining.
Figure 5 shows the test stand. The fault diameters were 0.18 mm, 0.36 mm, 0.54 mm and 0.71 mm. Vibration signals were collected under load conditions of 0 HP, 1 HP, 2 HP and 3 HP at a sampling frequency of 12 kHz.
In this experiment, faults are divided into 10 categories. To ensure that the statistical distribution of the fault samples follows that of the overall fault population, segments of 1024 consecutive sample points are extracted from the original vibration signal of each small fault sample set, which produces a large number of fault samples. These samples are organized into three data sets: A, B and C. Data set A is mainly used to train C-DCGAN to generate high-quality fault samples, so it contains original fault samples of every fault type in the original fault data set. Data set B is mainly used to train the classification model for fault classification and diagnosis: the simulated fault samples generated from data set A are mixed into the original data set A, and the mixed fault data are used as the training set. Data set C is the test set used to evaluate the converged classification model. The number of experimental samples is shown in Table 4.
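The segmentation step described above can be sketched as follows. This is a minimal illustration and not the authors' preprocessing code: the function name `segment_signal` and the `stride` parameter are assumptions, since the text states only that each sample holds 1024 consecutive points and does not say whether neighbouring segments overlap.

```python
def segment_signal(signal, window=1024, stride=1024):
    """Cut a 1-D vibration signal into fixed-length segments.

    `stride` is an assumption: the text only says each sample holds
    1024 consecutive points, not whether segments overlap.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    segments = []
    # Slide over the signal and keep only complete windows.
    for start in range(0, len(signal) - window + 1, stride):
        segments.append(signal[start:start + window])
    return segments


# Synthetic stand-in for one second of data sampled at 12 kHz.
signal = [float(i % 7) for i in range(12_000)]
non_overlapping = segment_signal(signal)           # stride equals window
overlapping = segment_signal(signal, stride=512)   # hypothetical 50% overlap
print(len(non_overlapping), len(overlapping))
```

A smaller stride yields more samples from the same recording, at the cost of correlated segments.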
4.2. Experimental Results and Comparative Analysis
To verify the feasibility of the fault diagnosis method based on the C-DCGAN-expanded data set, the method is evaluated with a composite index that combines the accuracy on positive cases with the accuracy on negative cases.
“Accuracy” is the ratio of the number of correctly classified fault samples to the total number of fault samples in the fault test set, (TP + TN)/(TP + TN + FP + FN).
“Recall” is the ratio of the number of correctly classified positive samples to the number of actual positive samples, TP/(TP + FN).
“Specificity” is the ratio of the number of correctly classified negative samples to the number of actual negative samples, TN/(TN + FP).
G-mean, the geometric mean of recall and specificity, combines the accuracy on positive cases and negative cases and is usually used to evaluate classification performance when the data distribution is unbalanced.
TP is the number of positive samples classified as positive; TN is the number of negative samples classified as negative; FP is the number of negative samples classified as positive, that is, the number of false alarms; FN is the number of positive samples classified as negative, that is, the number of positive samples that were missed.
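As a worked illustration of these definitions, the sketch below computes the four indexes from confusion counts. It uses the standard formulas and assumes they match the ones in the paper; the function name `binary_metrics` and the example counts are hypothetical.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, recall, specificity and G-mean from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # G-mean is the geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical counts for a reasonably balanced test set.
balanced = binary_metrics(tp=90, tn=160, fp=40, fn=10)

# Hypothetical unbalanced case: every sample is predicted negative.
# Accuracy stays high while G-mean drops to zero.
skewed = binary_metrics(tp=0, tn=95, fp=0, fn=5)
print(balanced)
print(skewed)
```

The second case shows why G-mean suits unbalanced data: a classifier that ignores the minority class still scores 0.95 on accuracy but 0 on G-mean.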
Figure 8a–c shows the time-domain waveforms of the original fault samples of the rolling element, inner ring and outer ring, together with the corresponding time-domain waveforms of the samples generated by C-DCGAN. Figure 8 shows that the fault samples generated by C-DCGAN are not identical to the original data, but their overall distribution is similar to that of the original samples. The generated data also add diversity to the real sample data, which indicates that the generation network can be used to expand the unbalanced data set effectively and thereby address the data imbalance problem.
As shown in Figure 9, the original unbalanced fault data set and the expanded fault data set are each used for fault diagnosis with the same classifier, and the results are visualized with t-SNE. Figure 9a shows the diagnostic classification result for the original unbalanced fault data set; samples of different fault types overlap in many places. Figure 9b shows the diagnostic classification result after the data set has been balanced and expanded by the generative model proposed in this paper. C-DCGAN can generate multi-category fault samples, and the generated simulated fault data are mixed into the original fault data set. Using the mixed fault data set for fault diagnosis and classification improves the clustering of each fault type and the accuracy of fault diagnosis and classification. The classification result in Figure 9 shows that samples of the same bearing fault category are gathered together, samples of different categories are clearly separated by fault category, and most samples are correctly classified.
By definition, G-mean takes values in [0, 1], and values close to 1 indicate better classification performance. According to Figure 10, when the original unbalanced bearing fault data are input into the 1-D-CNN network for fault diagnosis, most of the resulting G-mean values are low. This indicates that the small, unbalanced original fault data set cannot train 1-D-CNN to a high accuracy, so the classification performance is poor. CGAN performs supervised data generation under constraints. The simulated fault samples generated under this constraint guidance are mixed into the original fault data and input into the classifier for training; once training has reached Nash equilibrium, the classifier is used for classification. The G-mean values show that generating valid fault data with CGAN and expanding the data set reduces the impact of the uneven distribution of data samples. C-DCGAN performs best of the three across the ten fault categories, with an average G-mean above 0.8, which indicates that the network structure described in this paper balances and expands the original fault sample data set more effectively. With the balanced fault data set, the classifier can be trained to achieve high-accuracy fault classification and diagnosis, which reduces the classification errors caused by uneven data distribution.
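The balancing step can be illustrated with a small bookkeeping sketch that computes how many generated samples each class needs in order to match the largest class. The function name `synthetic_needed` and the class counts are hypothetical; the actual sample counts are those listed in Table 4, and the generator itself is not shown.

```python
def synthetic_needed(class_counts, target=None):
    """Generated samples required per class to reach `target` samples.

    `target` defaults to the size of the largest class, which is an
    assumption about how the expanded data set is balanced.
    """
    if target is None:
        target = max(class_counts.values())
    return {label: max(0, target - n) for label, n in class_counts.items()}


# Hypothetical counts with a 1:20 ratio between fault and majority classes.
counts = {"class_0": 400, "class_1": 20, "class_2": 20}
deficit = synthetic_needed(counts)
balanced_counts = {label: counts[label] + deficit[label] for label in counts}
print(deficit)
print(balanced_counts)
```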
To verify the validity and authenticity of the fault samples generated by the model proposed in this paper, the maximum mean discrepancy (MMD) is used for evaluation. This metric evaluates the authenticity of the generated simulated samples by calculating the distance between the probability distributions of the simulated samples and the original samples. The calculation is given in Formula (18), where K denotes the Gaussian kernel function, which maps the original and generated data sets into a reproducing kernel Hilbert space in which the distance between them is measured. Fault data of categories 0, 1, 4 and 7 were randomly selected. As shown in Figure 11, while data are generated for these four fault categories, the MMD shows an overall downward trend as the number of training iterations increases, and the distance between the probability distributions of the original fault data and the generated fault samples gradually shrinks. This supports the authenticity of the simulated fault samples generated by the generative adversarial network proposed in this paper.
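The following sketch shows one common way to estimate MMD with a Gaussian kernel. It uses the standard biased estimator of the squared MMD and is not necessarily identical to Formula (18); the kernel bandwidth `sigma`, the function names and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(real, generated, sigma=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(real, real)
            + mean_kernel(generated, generated)
            - 2.0 * mean_kernel(real, generated))


# Synthetic stand-ins: 50 four-dimensional samples per set.
rng = random.Random(0)
real = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]
close = [[rng.gauss(0.1, 1.0) for _ in range(4)] for _ in range(50)]
far = [[rng.gauss(3.0, 1.0) for _ in range(4)] for _ in range(50)]

mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
print(mmd_close < mmd_far)
```

A generator whose output distribution approaches the real one should move from the `far` regime toward the `close` regime during training, which is the downward trend the text describes.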
To verify that the C-DCGAN fault diagnosis model proposed in this paper is suitable for bearing fault diagnosis with unbalanced sample data, the model is compared with two other diagnostic models based on generative adversarial networks. The imbalance ratio of the fault data set is set to 1:20, and the highly unbalanced data set is input into the three fault diagnosis models, each trained to Nash equilibrium; the trained diagnostic models are then used to classify the faults in the test set. The classification results are summarized as confusion matrices, as shown in Figure 12. When the sample distribution is unbalanced, the classification accuracy of fault diagnosis reflects the quality of the data produced by the generative model. In Figure 12, the fault diagnosis model proposed in this paper achieves high accuracy and can therefore be applied to bearing fault diagnosis scenarios with small fault data sets.
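For readers unfamiliar with confusion matrices, the sketch below builds one from true and predicted labels and derives the per-class recall from its rows. The row-equals-true-class orientation is an assumption about how Figure 12 is laid out, and the labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    recalls = []
    for i, row in enumerate(matrix):
        row_total = sum(row)
        recalls.append(row[i] / row_total if row_total else 0.0)
    return recalls


# Hypothetical labels for three fault categories.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
recalls = per_class_recall(cm)
print(cm)
print(recalls)
```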
To further verify that the conditional deep convolutional generative adversarial network proposed in this paper improves fault diagnosis classification accuracy through balanced expansion of the small-sample fault data set, several common fault diagnosis methods, namely C-DCGAN+SVM, C-DCGAN+LSTM, C-DCGAN+1-D-CNN, infoGAN+1-D-CNN and CGAN+1-D-CNN, were compared on rolling bearing fault diagnosis. LSTM is a long short-term memory network that detects and classifies faults by learning the temporal information between fault features. In C-DCGAN+SVM, the small-sample data set is first expanded by the generative adversarial network proposed in this paper and then input to an SVM classifier for training; the trained classifier performs fault diagnosis and classification on the test set. infoGAN+1-D-CNN uses an information-maximizing generative adversarial network for data generation and then a one-dimensional convolutional network for fault feature extraction and fault classification. CGAN builds on GAN by adding constraints to data generation.
Table 5 shows that the generative adversarial model proposed in this paper effectively expands the fault data set and can be combined with a variety of classifiers for fault diagnosis, with classification accuracy reaching 90%. The results show that the proposed fault diagnosis method improves the fault diagnosis accuracy of rolling bearings compared with several common fault diagnosis methods based on data augmentation.