4.2.1. Comparison of Diagnostic Performance of Different Input Images
The transformation equation of the SDP image indicates the importance of the parameters θ, L, and g. Numerous studies have shown that the number of symmetrical figures is most suitable when θ is set to 60°, which makes the symmetry and shape characteristics of the image more prominent. Properly selected values of L and g can enhance the resolution of the graph and intensify the differences between signals, making different vibration signals easier to distinguish. This paper employs the image correlation coefficient to analyze the correlation between different images. For two images of the same size, the correlation coefficient R is computed from A and B, the two-dimensional greyscale matrices of the images. The calculated R takes values between 0 and 1, where R = 0 means that the two images are completely different and R = 1 means that they are identical. To select the optimal L and g for distinguishing SDP images of different fault states, this paper uses the sum of the correlation coefficients of the SDP images of the four fault states, Rsum, as the image evaluation index.
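The correlation coefficient and the evaluation index can be written out explicitly. The block below is a hedged reconstruction based on the standard two-dimensional correlation coefficient. The image size m × n, the mean grey levels \bar{A} and \bar{B}, the image labels I_1 to I_4, and the pairwise form of R_sum are assumptions, not notation confirmed by the paper.

```latex
R = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}\left(A_{ij}-\bar{A}\right)\left(B_{ij}-\bar{B}\right)}
         {\sqrt{\left(\sum_{i=1}^{m}\sum_{j=1}^{n}\left(A_{ij}-\bar{A}\right)^{2}\right)
                \left(\sum_{i=1}^{m}\sum_{j=1}^{n}\left(B_{ij}-\bar{B}\right)^{2}\right)}},
\qquad
R_{\mathrm{sum}} = \sum_{1 \le p < q \le 4} R\left(I_p, I_q\right)
```

In this mean-centred form R can in principle be negative, so the 0 to 1 range quoted above would apply to positively correlated images.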
First, bearing fault vibration signals with a load of 1 hp and a fault diameter of 0.007 inches are selected. L is varied from 1 to 10 in steps of 1, and g from 10° to 60° in steps of 5°. The signals are then converted into SDP images, and the sum of correlation coefficients Rsum of the four fault-state images is computed. The results are presented in Table 4 and Figure 7.
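The parameter sweep described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the mean-centred two-dimensional correlation coefficient and a pairwise-sum definition of Rsum, and `make_images` is a hypothetical callback that returns the fault-state SDP images for a given L and g.

```python
from itertools import combinations

import numpy as np


def corr2(a, b):
    """Mean-centred 2-D correlation coefficient of two equal-sized grey images."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    da, db = a - a.mean(), b - b.mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant image")
    return float((da * db).sum() / denom)


def r_sum(images):
    """Assumed R_sum: sum of corr2 over all unordered pairs of images."""
    return sum(corr2(a, b) for a, b in combinations(images, 2))


def grid_search(make_images, lags=range(1, 11), gains=range(10, 61, 5)):
    """Return (R_sum, L, g) for the grid point with the smallest R_sum.

    make_images(L, g) is a hypothetical callback that returns the
    fault-state images generated with lag L and gain g (in degrees).
    """
    return min((r_sum(make_images(L, g)), L, g) for L in lags for g in gains)
```

The default ranges mirror the grid stated in the text (L from 1 to 10 in steps of 1, g from 10° to 60° in steps of 5°). Ties are resolved in favour of the smallest L and then the smallest g.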
Rsum reaches its minimum value of 2.7851 when L = 3 and g = 30°. This indicates that, with these parameters, the correlation between the different fault images is minimal and the best identifiability is achieved. To visualize the impact of parameter selection on the SDP image, some of the images for different combinations of L and g are shown in Figure 8. The highest image quality is obtained when L = 3 and g = 30°. As L gradually increases, the arms of the SDP image become progressively fuller. As g increases, the angle between the center of mass of each SDP image arm and the horizontal axis becomes progressively larger. Therefore, the SDP parameters were chosen as θ = 60°, L = 3, and g = 30°.
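To make the role of the three parameters concrete, the following sketch generates SDP points and a simple binary image. It assumes the commonly used form of the SDP transform (radius from the normalized sample i, angular offset from the normalized sample i + L scaled by g, mirrored about symmetry planes spaced θ apart). The function names, the binary rasterization, and the 64 × 64 image size are illustrative choices, not taken from the paper.

```python
import numpy as np


def sdp_points(x, lag=3, gain_deg=30.0, theta_deg=60.0):
    """Return (radius, angle_deg) of the SDP points of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    if not 1 <= lag < x.size:
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    span = x.max() - x.min()
    if span == 0.0:
        raise ValueError("signal must not be constant")
    norm = (x - x.min()) / span          # amplitudes scaled to [0, 1]
    radius = norm[:-lag]                 # r(i) from sample i
    deviation = norm[lag:] * gain_deg    # angular offset from sample i + L
    planes = np.arange(0.0, 360.0, theta_deg)  # mirror-symmetry planes
    radii, angles = [], []
    for plane in planes:
        for sign in (1.0, -1.0):         # the two mirrored arms per plane
            radii.append(radius)
            angles.append(plane + sign * deviation)
    return np.concatenate(radii), np.concatenate(angles)


def sdp_image(x, size=64, **params):
    """Rasterize the SDP points into a binary size x size image (assumed layout)."""
    radius, angle = sdp_points(x, **params)
    rad = np.deg2rad(angle)
    col = np.rint((radius * np.cos(rad) + 1.0) / 2.0 * (size - 1)).astype(int)
    row = np.rint((1.0 - radius * np.sin(rad)) / 2.0 * (size - 1)).astype(int)
    image = np.zeros((size, size))
    image[row, col] = 1.0
    return image
```

With θ = 60° the loop produces six symmetry planes and therefore twelve arms. In this form g scales the angular spread of each arm about its plane, while L sets which delayed sample drives that spread.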
Based on the above analysis, the optimal parameters are used to generate SDP images for RBFD. To clarify the benefits of SDP images for RBFD tasks, rolling bearing vibration signals were converted into SDP images, Short-Time Fourier Transform (STFT) images, Wigner-Ville Distribution (WVD) images, Hilbert-Huang Transform (HHT) images, and greyscale images of vibration signals [33], and each image type was used as input to CBAM-DRN for fault diagnosis. Examples of the different input images for early faults of an RB under a 1 hp load (i.e., a fault diameter of 0.007″ for inner ring, outer ring, and rolling element faults) are shown in Figure 9. Each experiment was repeated ten times to reduce the effect of random error, and performance was assessed using the average accuracy over the ten runs. Figure 10 shows the detailed diagnostic accuracies for the ten experiments. The mean accuracies and standard deviations are listed in Table 5.
The experimental results show that, with the same diagnostic model, the highest diagnostic accuracy is achieved with SDP images as input. The accuracy is around 95% for both STFT and WVD images. The accuracy with HHT images is lower than with SDP images but higher than with STFT and WVD images, while the accuracy with greyscale images of vibration signals is lower. These results indicate the excellent fault diagnosis capability of the presented approach. Since the SDP images are obtained by transforming the vibration signal into a polar coordinate system, the fault features of the vibration signal are not lost, and the different bearing faults can be characterized very well.
In contrast, STFT and WVD both produce time-frequency images. STFT truncates the vibration signal with a window function, and the choice of window influences the resulting time-frequency image. WVD is a nonlinear time-frequency analysis method that generates cross-terms when dealing with complex non-stationary signals and therefore cannot accurately reflect the time and frequency information of the signal. Consequently, both the STFT and the WVD images miss some information about the fault characteristics of the bearing, which degrades diagnostic accuracy.
Unlike STFT and WVD images, HHT images are obtained by applying the Hilbert Transform (HT) to the Intrinsic Mode Functions (IMFs) produced by Empirical Mode Decomposition (EMD) of the vibration signal, and the HHT method is not limited by the Heisenberg uncertainty principle. Furthermore, EMD is adaptive, offers time-frequency localization, and can effectively extract local features of the original signal. Therefore, the diagnostic accuracy with HHT images is higher than with STFT and WVD images. However, EMD suffers from mode mixing and end effects, which affect the fault feature information in the HHT image and make its diagnostic accuracy slightly lower than that of the SDP image.
Compared with the above methods, the greyscale image of the vibration signal is obtained simply by converting the signal amplitude into the corresponding greyscale value, and it therefore contains insufficient information about the bearing fault characteristics. Thus, the diagnostic accuracy is low when the greyscale image is used as input.
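The amplitude-to-greyscale conversion can be illustrated with a short sketch. The exact procedure of [33] is not given in this section, so the version below is an assumption: it takes the first side × side samples, scales them linearly to 0–255, and reshapes them row by row. The function name and the default image side are hypothetical.

```python
import numpy as np


def signal_to_greyscale(x, side=64):
    """Map the first side*side samples of a signal to a side x side uint8 image."""
    x = np.asarray(x, dtype=float)
    n = side * side
    if x.size < n:
        raise ValueError("signal is too short for the requested image size")
    segment = x[:n]
    span = segment.max() - segment.min()
    if span == 0.0:
        raise ValueError("segment must not be constant")
    # Linear amplitude-to-grey mapping (assumed): min -> 0, max -> 255.
    grey = np.rint((segment - segment.min()) / span * 255.0)
    return grey.astype(np.uint8).reshape(side, side)
```

Each pixel here encodes only a single amplitude value, which is consistent with the observation that such images carry limited fault information.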
It is worth noting that the average diagnostic accuracy exceeds 90% for all of the above input types, which indicates that the presented CBAM-DRN is an efficient approach with good generalization and robustness. It also shows that the vibration signals of rolling bearings can be transformed into images for fault diagnosis.
4.2.2. Performance Comparison of Various Fault Diagnosis Approaches
To demonstrate the effectiveness of the presented approach, its diagnostic performance was compared with that of traditional approaches using a similar data set. Among them, the CNN diagnostic model consists of five convolutional layers alternating with pooling layers, an FC layer, and a softmax classifier. The convolutional kernel size is 3 × 3, the five convolutional layers have 16, 32, 32, 64, and 128 kernels, respectively, the pooling layers use 2 × 2 max pooling, and the FC layer has 1024 nodes. The SVM diagnostic model adopts a Gaussian radial basis function (RBF) kernel. The BPNN diagnostic model is structured as 512-256-128-64-4 with ReLU activation. The inputs to the SVM and BPNN are the texture feature parameters of the SDP images. All experiments were repeated ten times to reduce the effect of random error, and the mean accuracy over the ten runs was used to evaluate each approach. Figure 11 shows the detailed diagnostic accuracies for the ten experiments. The mean accuracies and standard deviations are listed in Table 6.
The experimental results show that the proposed method A achieves the highest mean diagnostic accuracy (99.32%), followed by method C (96.72%), method B (95.51%), method D (93.36%), method E (77.56%), and method F (67.16%). The small standard deviation of the accuracy across the ten experiments also indicates the stability of the presented approach based on SDP images and CBAM-DRN. The methods using the DRN model (A, C) outperform those using the CNN model (B, D), indicating that the DRN model is better suited to the RBFD task than the CNN model. This is because DRNs have more intermediate layers than CNN models, which allows them to extract deeper fault features. Moreover, the DRN employs residual connections that allow the model to keep learning features as depth increases, which addresses the performance degradation that ordinary CNNs suffer when the number of layers increases.
As presented in Table 6, methods A, B, C, and D all achieve good diagnostic accuracy. However, method A is more accurate than method C, and method B is more accurate than method D. This is because methods A and B both add the CBAM attention mechanism, which assigns different weights to different regions of the feature map, allowing the diagnostic model to locate and focus on the image regions with more prominent fault features and thereby improving diagnostic accuracy.
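The two mechanisms discussed here, attention re-weighting and the residual shortcut, can be sketched in NumPy. This is a simplified illustration of the general CBAM formulation (channel attention from a shared two-layer MLP over average- and max-pooled descriptors, followed by spatial attention from a convolution over channel-pooled maps), not the paper's network. The weights are passed in as plain arrays, the convolutional layers of the residual branch are omitted, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W); w1: (C // r, C) and w2: (C, C // r) are shared MLP weights."""
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer

    return sigmoid(mlp(avg) + mlp(mx))        # one weight per channel


def spatial_attention(feat, kernel):
    """kernel: (2, k, k) filter applied to the [mean, max] channel-pooled maps."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = feat.shape[1:]
    out = np.zeros((h, w))
    for dy in range(k):                       # "same" cross-correlation
        for dx in range(k):
            window = padded[:, dy:dy + h, dx:dx + w]
            out += (kernel[:, dy, dx, None, None] * window).sum(axis=0)
    return sigmoid(out)                       # one weight per spatial location


def cbam_residual(feat, w1, w2, kernel):
    """Apply channel then spatial attention and add the identity shortcut."""
    refined = feat * channel_attention(feat, w1, w2)[:, None, None]
    refined = refined * spatial_attention(refined, kernel)[None, :, :]
    return feat + refined                     # residual connection
```

Because the shortcut adds the input back unchanged, the block can fall back to passing features through even when the attention branch contributes little, which is the property that eases training as depth grows.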
Notably, methods A, B, C, and D, which employ DRN or CNN models, all have average accuracies above 90%, while methods E and F, which use SVM or BPNN models, both have accuracies below 80%. This is because both DRN and CNN are deep diagnostic models. Compared with shallow models such as SVM and BPNN, deep models can extract more and deeper fault features, and thus effectively characterize the complex mapping between bearing vibration signals and fault states. Moreover, since the data samples used in the experiments were randomly selected from a composite fault dataset with different loads and fault diameters, the deep models, which generalize better, achieve higher diagnostic accuracy. In addition, whereas the deep models take SDP images directly as input, the diagnostic performance of conventional approaches such as SVM and BPNN depends on manual feature extraction and complex signal processing. This experiment uses the texture feature parameters of the SDP images as the input to the SVM and BPNN, which introduces uncertainty due to human intervention in the extraction process. Furthermore, texture features are manually extracted and designed for a specific diagnostic model, which makes them time-consuming and labor-intensive to obtain and not universally applicable.
To analyze the classification behaviour of the above fault diagnosis methods in more detail, the classification results of each method in the first experiment were tallied to obtain the confusion matrices shown in Figure 12. The vertical axis of each confusion matrix gives the actual label, and the horizontal axis gives the predicted label. The elements on the main diagonal give the number of samples of each category that were correctly classified. The confusion matrix of the presented approach shows better classification of the samples of every health state and the highest number of correctly classified samples, indicating its high classification accuracy. This illustrates the validity and applicability of the presented diagnostic approach for distinguishing between different rolling bearing health states.
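The layout of the confusion matrix described above (rows for actual labels, columns for predicted labels, correct classifications on the main diagonal) can be illustrated with a minimal sketch. The function names and the integer label encoding are assumptions for illustration only.

```python
import numpy as np


def confusion_matrix(actual, predicted, n_classes):
    """Rows index the actual label, columns the predicted label."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm


def accuracy(cm):
    """Fraction of samples on the main diagonal (correctly classified)."""
    return float(np.trace(cm) / cm.sum())
```

Off-diagonal entries in row i show which classes the samples of class i were mistaken for, which is what makes the matrix more informative than a single accuracy figure.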
To further demonstrate the effectiveness of the presented diagnostic approach and give a more intuitive view of its feature extraction and fault classification capabilities, the t-SNE technique [34] was used to reduce the dimensionality of, and visualize, the fault features extracted by CBAM-DRN and the other deep-model methods, as presented in Figure 13.
The visualization of the CBAM-DRN shows that features of the same type are aggregated with small intra-class distances, while the fault features of different states are effectively separated with large inter-class distances. In contrast, the other methods show overlap between the features of different fault categories, which leads to misclassification. This indicates that the presented approach has better fault feature extraction and classification capability under multi-load and multi-fault-diameter conditions. It also supports the conclusion that different bearing faults can be well represented by transforming bearing vibration signals into SDP images without losing the critical fault feature information in the vibration signals, and that the SDP image carries more representative and discriminative fault feature information than manually extracted fault features.
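The qualitative reading of the t-SNE plot (small intra-class distances, large inter-class distances) can be complemented by a simple numerical check on the embedded features. The sketch below is an illustrative assumption rather than a metric reported in the paper: it measures the mean distance of points to their own class centroid and the mean distance between class centroids.

```python
import numpy as np


def class_distances(features, labels):
    """Return (mean intra-class distance, mean inter-class centroid distance)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if classes.size < 2:
        raise ValueError("at least two classes are required")
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])
    # Average distance of each class's points to that class's centroid.
    intra = np.mean([
        np.linalg.norm(features[labels == c] - centroids[k], axis=1).mean()
        for k, c in enumerate(classes)
    ])
    # Average distance over all unordered pairs of class centroids.
    pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    inter = pairwise[np.triu_indices(classes.size, k=1)].mean()
    return float(intra), float(inter)
```

A well-separated embedding gives an inter-class value much larger than the intra-class value. Because t-SNE distorts global distances, such numbers are best treated as a rough aid to the visual comparison.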