1. Introduction
Against the backdrop of high-efficiency modern enterprises, equipment failures cause large economic losses and personal safety risks [1,2,3]. Rolling bearings are among the most important components of rotating equipment, and their failure not only creates serious safety risks for machinery and equipment but also increases enterprise maintenance costs [4,5]. Therefore, health monitoring of rolling bearings in mechanical equipment is an effective way to improve the overall safety of equipment and reduce industrial production costs [6].
Vibration signals are collected from running rolling bearings and contain information about the health of the equipment. Fault diagnosis based on vibration analysis is divided mainly into feature extraction, feature selection and fault identification [7,8]. Several studies have demonstrated that deep learning approaches can extract valuable fault features from raw data without requiring any preconceived signal processing methods. However, given the heavy background noise, complex operating behaviour and insufficient vibration data encountered in modern equipment, advanced signal processing techniques are still required to extract features from the raw data [9,10]. When the operating conditions of rolling bearings are analysed in the time domain or frequency domain alone, the diagnosis can be affected by nonlinear factors that act on the vibration signals, such as stiffness, clearance and load friction. Time-frequency analysis methods are more effective for nonlinear and nonstationary signals because they can accurately describe the local time-frequency characteristics of such signals by revealing the frequency components and their time-varying properties [11,12,13,14]. Hence, time-frequency analysis methods have been widely used in fault diagnosis in the past few years. Li et al. [15] proposed a two-direction two-dimensional linear discriminative analysis (TD-2DLDA), which combines the advantages of the short-time Fourier transform (STFT) and wavelet transform. Cai et al. [16] combined the generalized S transform and singular value decomposition (SVD) for time-frequency analysis of bearing fault vibration signals. Dhamande et al. [17] employed continuous and discrete wavelet transforms of the vibration signal to extract compound fault features in gear systems. Chandra et al. [18] investigated the performance of different time-frequency analysis methods in fault diagnosis of rotor bearing systems. Wang et al. [19] developed a method based on sparse and low-rank decomposition of the time-frequency representation (TFR) for bearing fault diagnosis. Israel et al. [20] combined local mean decomposition (LMD) and the Wigner–Ville distribution (WVD) to propose an efficient time-frequency analysis method.
The aforementioned traditional time-frequency methods can be roughly divided into two categories: linear time-frequency methods (represented by the STFT and the wavelet transform) and nonlinear time-frequency methods (represented by the Wigner–Ville distribution). Methods in both categories can characterize faults in the time-frequency domain but have obvious limitations. The methods in the former category are limited by the Heisenberg uncertainty principle, resulting in poor sparsity and low time-frequency resolution in the analysis results. The methods in the latter category suffer from cross-term interference, which reduces the accuracy of the TFR.
The development of sparse representation theory allows the above problems to be solved. With the application of sparse representation theory in time-frequency analysis, time-frequency methods based on sparse representation and the STFT have emerged. A time-frequency analysis problem can be transformed into a sparse optimization problem with an L0 norm constraint, and solving this problem yields a higher-resolution time-frequency representation. Although there is no interference from cross-terms, this model is a nonconvex optimization problem that is nondeterministic polynomial hard (NP-hard) to solve. To address this, this paper proposes a time-frequency analysis method based on PGD and an SSTFT. First, the convex relaxation technique is applied to relax the nonconvex L0 norm problem to an L1 norm minimization problem, and then PGD is used to solve the problem.
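The relaxation step can be illustrated with a minimal sketch. It assumes that PGD denotes proximal gradient descent and that the relaxed problem takes the common form of minimising 0.5·||Ax − y||² + λ·||x||₁; the matrix `A`, the weight `lam` and the function names are hypothetical placeholders, and a random matrix stands in for the STFT-based operator used in the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage towards zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pgd_l1(A, y, lam=0.1, n_iter=500):
    """Minimise 0.5 * ||A x - y||_2^2 + lam * ||x||_1 by proximal gradient descent."""
    # Step size 1 / L, where L is the Lipschitz constant of the smooth term's gradient.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                       # gradient of the data-fit term
        x = soft_threshold(x - step * grad, step * lam)  # gradient step, then shrinkage
    return x

def demo():
    """Recover a 3-sparse vector from 40 random measurements (illustrative only)."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100))  # hypothetical dictionary, not an STFT operator
    x_true = np.zeros(100)
    x_true[[5, 37, 80]] = [2.0, -1.5, 1.0]
    y = A @ x_true
    x_hat = pgd_l1(A, y, lam=0.05, n_iter=2000)
    return A, y, x_true, x_hat
```

The shrinkage step is what makes the result sparse: coefficients whose magnitude stays below the threshold are set exactly to zero at every iteration.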
Intelligent bearing fault identification algorithms that take the TFR as input have achieved satisfactory results. Zhang et al. [21] used the scaled exponential linear unit to improve the learning ability of the convolutional model for TFR. Ma et al. [22] proposed a fault diagnosis model based on time-frequency analysis and a deep residual network. Liang et al. [23] utilized parallel convolutional neural networks (CNNs) to learn from TFRs obtained by a continuous wavelet transform (CWT). Akhenia et al. [24] applied the single image generative adversarial network (SinGAN) to generate additional TFRs as training samples for the classifier. Wang et al. [25] employed deformable atrous convolution to extract bearing fault features in TFR. Udmale et al. [26] used CNNs to learn fault features in Kurtogram TFRs. Wang et al. [27] proposed a fault diagnosis method based on a multitask CNN by taking the time-domain signals, the frequency-domain signals and the TFR as the input of the CNN at the same time.
However, the abovementioned traditional intelligent diagnosis methods based on TFR still have two inherent shortcomings:
(a) When the TFR is used as input, the entire TFR is usually used as the learning object of the model and the understanding of the TFR is lacking. Many irrelevant vibration components also establish relationships with labels, reducing model interpretability.
(b) As one of the cutting-edge ideas in image processing, the application of object detection theory in fault diagnosis is rarely addressed.
To this end, the Faster RCNN algorithm is introduced into traditional intelligent fault diagnosis based on time-frequency analysis in this paper; the algorithm can accurately mark the fault components in the TFR and thereby realize the fault diagnosis of rolling bearings [28]. Faster RCNN, one of the extensions of the RCNN, is one of the state-of-the-art solutions in general object detection [29]. The method can be divided into two main parts. (1) Region proposal network: This network is used mainly to generate a list of region proposals that may contain objects. (2) Object localization and classification network: This is used mainly for classifying a region of an image into objects (and the background) and refining the boundaries of these regions. The proposed method follows a framework similar to Faster RCNN. Here, the fault component is taken as the target to be identified so that the fault category of the original TFR can be identified with higher recall and accuracy. Compared with the traditional method (which uses the entire TFR as the model mapping object), the proposed method can directly point out the fault components in the TFR, thus improving the interpretability of the intelligent diagnosis method.
To address the abovementioned problems, a bearing fault diagnosis method based on PGD-SSTFT and Faster RCNN is proposed in this paper. The PGD-SSTFT algorithm is adopted to obtain a sparse TFR without cross-term interference from the vibration signal, so that the fault characteristics of rolling bearings can be displayed with high resolution. A fault diagnosis model is then built based on Faster RCNN, one of the representative algorithms in object detection theory, and the model is used to learn the fault feature components from the TFR. Finally, the fault components can be accurately marked in bearing fault samples of unknown categories and the fault type can be identified.
The main contributions of this paper are as follows:
(1) A time-frequency analysis model of the bearing vibration signal is built by using the STFT and sparse constraints are introduced into this model, thereby improving the time-frequency resolution and time-frequency aggregation of the time-frequency analysis method. Additionally, the model is transformed into an easy-to-solve unconstrained problem by using the PGD algorithm.
(2) The object detection method is introduced in fault diagnosis and the fault feature components are more accurately and pertinently labelled from the original TFR, thereby improving the interpretability of the fault diagnosis method.
(3) The effectiveness of the proposed method is validated by using the simulated signal and the actual bearing vibration signal. The results indicate that the TFR of the proposed method is more accurate than that of the traditional method and the fault identification accuracy is higher.
The remaining parts are organized as follows: Section 2 introduces the basic theory of sparse time-frequency analysis methods based on PGD-SSTFT. Section 3 presents a fault diagnosis model based on object detection theory. Section 4 establishes the end-to-end fault diagnosis model based on the proposed method. Section 5 conducts experiments to test the effectiveness of the proposed method. The conclusions are summarized in Section 6.
3. Fault Diagnosis Model Based on Fault Feature Detection
In the second section, the time-frequency decomposition results of PGD-SSTFT are presented. It can be clearly seen from the results that different types of faults have different frequency components, but relying on manual identification of the fault components from the TFR is undoubtedly complex and labor-intensive. Hence, our focus is to automatically identify the fault components in the TFR.
To this end, Faster RCNN is introduced to directly extract the regions related to the fault components from the original TFR, and an identification model is then established to achieve fault diagnosis. Traditional intelligent diagnosis methods usually use the entire TFR corresponding to a fault type and therefore lack an understanding of the time-frequency information. In contrast, the proposed method automatically identifies the regions containing the fault frequency in the original TFR and determines the fault type implied in each TFR. It considers not only the location of the fault frequency (the position in the TFR is given by the horizontal coordinate, time, and the vertical coordinate, frequency) but also the energy of the fault frequency (given by the color displayed in the TFR). The proposed method can be divided into the following steps: data preprocessing, fault region proposal, screening algorithm and fault identification.
3.1. Data Preprocessing
In this paper, when the original time-frequency dataset is processed, existing annotation software is used to mark the main fault frequency features in the TFR of each fault type. As shown in the figure, the boxes enclose as much of each fault feature as possible, and the region selected in each box reflects the position and energy of the corresponding frequency. More details on this process can be found in the results.
3.2. Fault Region Proposal
The input TFR first undergoes feature extraction and then enters the region suggestion step. This process is completed with multiple convolution layer structures and k anchors are randomly generated on the input feature map. As illustrated in
Figure 5, the image size is
and
anchors will be generated. Then, nine anchor boxes are generated with each anchor as the centre. This process obtains multiple suggested regions on the same input feature map. Therefore, this process outputs the coordinates of these regions and the probability that the box contains different features; this probability is calculated with Softmax.
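The anchor step can be sketched as follows. The sketch assumes the standard Faster RCNN layout, in which anchors lie on a regular grid (one per feature-map cell) and the nine boxes come from three scales and three aspect ratios; the stride of 16 and the scale and ratio values are the defaults of the original Faster RCNN design and are not confirmed as the settings used in this paper.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchor boxes as (x1, y1, x2, y2) in pixels."""
    # Width and height for every scale/ratio pair; ratio = height / width,
    # and each box keeps an area of scale ** 2.
    sizes = [(s / np.sqrt(r), s * np.sqrt(r)) for s in scales for r in ratios]
    # Anchor centres: the middle of each feature-map cell, mapped back to the image.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    boxes = []
    for y in cy:
        for x in cx:
            for w, h in sizes:
                boxes.append([x - w / 2, y - h / 2, x + w / 2, y + h / 2])
    return np.array(boxes)
```

For a feature map of 2 × 3 cells, `generate_anchors(2, 3)` returns 2 × 3 × 9 = 54 boxes.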
This process can obtain the best region containing fault features but also generates many overlapping anchor regions, which can lower the performance of the model. Therefore, redundant regions with lower scores are deleted according to the intersection-over-union (IoU) condition. The operation can be summarized as follows: the box with the highest confidence is selected, every remaining box whose IoU with the selected box is higher than the threshold is deleted, and the procedure is repeated on the boxes that are left until the best rectangles are selected to give the position output of the target. These selected regions are called regions of interest (RoIs).
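This screening operation is commonly known as non-maximum suppression. A minimal sketch is given below; the function names and the default threshold of 0.7 are illustrative assumptions, not values taken from this paper.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.7):
    """Return the indices of the boxes that are kept, highest score first."""
    order = np.argsort(scores)[::-1]      # candidate indices, best score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # drop boxes overlapping the kept one
    return keep
```

With a lower threshold more boxes are suppressed, so the threshold trades duplicate detections against the risk of discarding a distinct neighbouring fault component.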
3.3. Standardization of Regions of Interest
The above RoIs are defined on the feature map obtained after feature extraction, so they do not directly reflect the features in the original TFR. In addition, the above steps produce suggested regions of different dimensions, which makes it impossible to complete the subsequent fault classification directly. For this purpose, a normalization layer similar to spatial pyramid pooling is introduced to map the resulting RoIs to the original TFR and adjust their size for classification. This part consists of multiple convolutional layers and pooling layers.
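The size adjustment can be illustrated with RoI max pooling, the fixed-size pooling used in standard Faster RCNN. This is a simplified sketch under that assumption; the 7 × 7 output size and the function name are hypothetical and the layer used in the paper may differ.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=(7, 7)):
    """Max-pool the region roi = (x1, y1, x2, y2) of a (C, H, W) map to (C, oh, ow)."""
    x1, y1, x2, y2 = roi
    region = feature_map[:, y1:y2, x1:x2]
    c, h, w = region.shape
    oh, ow = out_size
    # Bin edges that split the region into oh x ow roughly equal cells.
    ys = np.linspace(0, h, oh + 1).astype(int)
    xs = np.linspace(0, w, ow + 1).astype(int)
    out = np.empty((c, oh, ow), dtype=feature_map.dtype)
    for i in range(oh):
        for j in range(ow):
            # Each cell covers at least one element, even for small regions.
            cell = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out
```

Regions of any size are reduced to the same output shape, which is what allows a fully connected classifier to follow.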
3.4. Fault Identification
After the RoIs are obtained, the fault identification part is entered. The fault identification part completes two main tasks. First, the position regression calculation is performed to calculate the loss of the predicted and the actual regions. Then, the loss will be backpropagated to optimize the parameters.
Suppose the coordinates of the box are , P is the proposed box and G is the actual box. Then, the goal of model regression is to find a mapping that satisfies .
For this purpose, the prediction box is translated and scaled separately.
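For reference, the parameterization commonly used in the RCNN family is sketched below. It is assumed, not confirmed by the source, that the same form is intended here: a box is written as centre coordinates, width and height, $P = (P_x, P_y, P_w, P_h)$, the learned functions $d_x, d_y, d_w, d_h$ are hypothetical names for the regression outputs, and $\hat{G}$ denotes the predicted box that should approximate the actual box $G$.

```latex
% Translation of the box centre, scaled by the proposal size
\hat{G}_x = P_w \, d_x(P) + P_x, \qquad
\hat{G}_y = P_h \, d_y(P) + P_y
% Scaling of the box width and height (log-space)
\hat{G}_w = P_w \exp\bigl(d_w(P)\bigr), \qquad
\hat{G}_h = P_h \exp\bigl(d_h(P)\bigr)
% Regression targets derived from the actual box G
t_x = \frac{G_x - P_x}{P_w}, \quad
t_y = \frac{G_y - P_y}{P_h}, \quad
t_w = \log\frac{G_w}{P_w}, \quad
t_h = \log\frac{G_h}{P_h}
```

Under this form, training minimises the difference between $d_\ast(P)$ and the targets $t_\ast$, so that $\hat{G} \approx G$.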
The gradient descent algorithm is used to continuously move the predicted box towards the position of the real box G. The second task is fault classification, which is realized by adding Softmax after the fully connected layer. The final classification part integrates all RoIs in the entire feature map, sorts the class probabilities corresponding to these RoIs and outputs the category with the highest probability among all RoIs. During this process, the loss between the predicted box and the ground-truth box is calculated and backpropagated to optimize the parameters.
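The final decision rule can be sketched as follows, assuming that each RoI yields one vector of class scores and that the TFR receives the class of the single most confident RoI. The class names are hypothetical, and the handling of a background class is omitted for brevity.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classify_tfr(roi_logits, class_names):
    """Return (class name, RoI index, probability) of the most confident RoI."""
    probs = softmax(np.asarray(roi_logits, dtype=float))  # shape (n_rois, n_classes)
    roi_idx, cls_idx = np.unravel_index(np.argmax(probs), probs.shape)
    return class_names[cls_idx], int(roi_idx), float(probs[roi_idx, cls_idx])
```

For example, `classify_tfr([[2, 0, 0], [0, 5, 0], [0, 0, 1]], ["inner", "outer", "ball"])` selects the second RoI and the class `"outer"`, because that RoI carries the largest softmax probability.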
3.5. The Framework of Proposed Model
In this paper, a fault diagnosis model is built on the basis of Faster RCNN in object detection theory, thus making the neural network more refined with respect to finding fault features from the TFR. As mentioned above, this method can be divided into four main parts (as shown in Figure 6). First, the feature extraction part performs a preliminary extraction of fault features and then the region proposal part extracts proposal regions, which are then processed by the RoI pooling part. The final classification part will integrate the preselected boxes obtained from a TFR to identify the fault type.
4. General Procedure of Proposed Fault Diagnosis Method
To improve the accuracy and stability of intelligent diagnosis, a rolling bearing fault diagnosis method based on sparse time-frequency decomposition and object detection theory is proposed in this paper. As shown in Figure 7, the procedure of the proposed method is as follows:
(1) Vibration signals are collected by using acquisition equipment, such as sensors.
(2) The vibration signal is then converted to a TFR by using sparse time-frequency decomposition based on PGD-SSTFT; then, these images are labelled.
(3) A fault diagnosis model is established based on object detection theory and the optimal related hyperparameters and model structure are determined.
(4) The model is fully trained by using training samples.
(5) The trained model is applied to identify test samples and then the diagnosis results are output and the model performance is evaluated.
Figure 7.
The general diagnosis procedure of the proposed method.