1. Introduction
Bearings are among the most important mechanical components in rotating machinery. According to statistics, 40% of motor failures are bearing failures [1]. To reduce maintenance costs and prevent harmful or even destructive consequences of faults and failures, faulty bearings must be found and replaced early. There are many ways to detect motor bearing faults, such as fault diagnosis through stray flux analysis [2], the Park's vector approach (PVA) [3], instantaneous power factor (IPF) monitoring [4], and so on. Among them, using the acceleration of the motor housing to determine the existence of faults is the most accurate method, and this has become a well-developed field in recent years [5]. Real-time fault detection methods based on bearing vibration signals, including both traditional methods and deep learning methods, have become a hot topic recently [6].
Traditional methods require feature extraction from the bearing vibration signal, dimensionality reduction, and classification. Feature extraction mainly operates on signals in the time domain, frequency domain, and time-frequency domain. Traditional time-domain techniques extract features such as peak-to-peak value, root mean square, crest factor, kurtosis, and pulse factor from time-domain vibration signals. Frequency-domain techniques extract the corresponding features by transforming the time-domain signal into the frequency domain using the Fast Fourier Transform (FFT) or other methods. At present, many time-frequency techniques, such as the atomic-decomposition-based wavelet transform (WT) and empirical mode decomposition (EMD) [7], have been applied in mechanical fault diagnosis. Dimensionality reduction algorithms mainly include principal component analysis (PCA) [8] and independent component analysis (ICA) [9]. Classification methods include k-nearest neighbors (KNN) [10], support vector machines (SVM) [11], and so on. However, selecting the best features from the original feature set is a blind and subjective task. Moreover, because electromechanical equipment has many monitoring points and the sensors sample at high frequencies, the detection system acquires a large amount of data [12]. Correspondingly, fault diagnosis faces two major challenges: (1) the amount of data is large and must be processed automatically; (2) the data types are diverse, and the features are difficult to extract. In recent years, deep learning has developed rapidly. Since a CNN can detect faults by learning optimal filters, it can extract and learn the best features directly from the original signals, and it has been applied to behavior recognition [13], classification of ECG signals [14], speech recognition [15,16], and so on. CNNs are superior to traditional methods not only in accuracy but also in speed, and another key feature of CNNs is their adaptive design. Based on these advantages, researchers have tried to apply deep learning to bearing fault diagnosis [17,18].
In [19], the time-frequency diagram of the rolling bearing signal is extracted as the input of a two-dimensional CNN, and fault diagnosis is realized under the same network structure. Jia [20] proposed an FFT-DNN fault diagnosis method, which uses the preprocessed FFT spectrum image as the input of a DNN. Generally speaking, a two-dimensional image is equivalent to a surface, while a one-dimensional vibration signal is more like a line. We believe that the main reason CNNs with two-dimensional convolution structures succeed in image analysis is the two-dimensional spatial correlation in images, whereas most bearing fault diagnosis measurement data are related only to one-dimensional time. If we directly convert such a signal into a two-dimensional form, the correlation in the original sequence will be destroyed and information related to the failure may be lost [21].
Furthermore, bearings usually operate under different loads. Under these different loads, problems such as different periods and phases of the vibration signals, different amplitudes, and large waveform differences between different fault depths pose great challenges for bearing fault diagnosis [22]. Zhang [23] proposed the WDCNN algorithm for bearing fault type recognition and fault size assessment. It handles raw noisy signals and diagnoses bearing faults under different workloads. However, it still suffers from over-fitting, which prevents the network from recognizing faults under variable load conditions. In addition, due to the complexity of the acquired vibration signals, and even the imbalance between different fault samples [24], the limitations of individual models are highlighted [25]. At present, deep learning research mainly focuses on individual models, which also suffer from weak diagnostic generalization. The emerging technology of ensemble learning can combine multiple learners through voting-like combination strategies to obtain better results [26]. To further improve fault diagnosis ability, we propose a One-Dimensional Fusion Neural Network (OFNN).
The proposed One-Dimensional Fusion Neural Network combines D-S evidence theory with Adaptive one-dimensional Convolution Neural Networks with Wide Kernel (ACNN-W). Diagnostic models based on a single source of information may lead to misdiagnosis, while multi-sensor information sources can yield more reliable fault diagnoses through D-S evidence fusion. In recent years, D-S evidence theory and its variants have been widely used in multi-sensor data fusion, decision analysis, fault detection, and other industrial fields [27]. Most input signals for current deep learning fault diagnosis come from a single sensor. However, different sensors have different adaptability and anti-interference ability, and may fail to provide reliable information in a harsh working environment, so it makes sense to develop a learning model that combines the advantages of deep learning and ensemble learning. The proposed method therefore uses D-S evidence theory to ensemble deep neural networks fed with information from multiple sensors, constructing a data fusion model for bearing fault diagnosis. The proposed fusion model can combine multiple uncertain pieces of evidence and provide fusion results by combining consensus information and exclusion information. It has the following characteristics:
- (1)
Compared with traditional machine-learning-based damage assessment methods, the proposed method can adaptively extract features directly from the original vibration signal without manual feature extraction. Traditional machine learning damage detection methods use hand-crafted features that are not only sub-optimal but also computationally expensive.
- (2)
The ACNN-W can effectively suppress over-fitting and improve the generalization performance of the network.
- (3)
To overcome the limitations of individual deep learning models, reduce the impact of random initialization of neural networks, and make full use of information from different domains, we use D-S evidence theory to make the comprehensive decision of the OFNN.
The remainder of this paper is organized as follows: Section 2 introduces related work. Section 3 presents our method. Section 4 evaluates the ACNN-W model and the selection of optimizer and learning rate. Section 5 presents the experimental evaluation and analysis of the OFNN network. Section 6 concludes the paper.
2. Related Work
Classification based on manually extracted features can be used for bearing fault diagnosis, but such methods rely on manually acquired prior knowledge and cumbersome manual feature extraction, and the final accuracy is greatly affected by the selected features. Therefore, in this paper we adopt a deep learning scheme that combines feature extraction and feature classification into one step. Some existing methods of bearing fault diagnosis based on deep learning are described in the following paragraphs.
Xie [28] proposed rotating machinery fault diagnosis based on a convolutional neural network and empirical mode decomposition (EMD): EMD extracts the frequency-domain information of the signal, and a CNN serves as the classifier. Guo et al. [29] proposed a multi-scale continuous wavelet transform combined with a CNN for bearing fault detection, but they used the CNN only for classification, without fully exploiting its powerful feature extraction capability. Ince [30] proposed combining the feature extraction and classification tasks into one network. Abdeljaber [31] used a one-dimensional convolutional neural network for vibration detection of steel frames, directly exploiting the powerful learning ability of one-dimensional convolutional neural networks to classify faults.
Pan [32] argued that the above works used the same load to collect the signals for training and testing the CNN, which limits further application of the models, and therefore proposed the LiftingNet deep learning network to train and test on signals with different sampling frequencies and different loads. However, this method normalizes the amplitude, which prevents the model from recognizing the severity of the fault.
Zhang [23] proposed WDCNN, an end-to-end diagnostic method for the variable-load case. However, this method is a single model, which has certain limitations and is more susceptible to random initialization. In addition, they applied min-max regularization to the input, which makes the signal distribution susceptible to individual extreme values, resulting in excessive distribution differences under different loads. Li [33] proposed treating the original signal as a two-dimensional frequency-domain input and using a multi-sensor classifier ensemble for fault diagnosis. This can overcome the limitations of a single model well, but the original signal is a one-dimensional time-domain signal, and transforming it into a two-dimensional signal may destroy some spatial information.
Therefore, we propose a one-dimensional fusion neural network based on D-S evidence theory, which takes the original signal without min-max regularization as input and directly uses a one-dimensional convolutional neural network to adaptively extract features from, and classify, the original time-domain signal. Moreover, RMSprop optimization and BN are used in the network to suppress over-fitting.
3. Proposed OFNN for Bearing Fault Diagnosis
We propose the Adaptive one-dimensional Convolution Neural Networks with Wide Kernel (ACNN-W) as the basis of the One-dimensional Fusion Neural Network (OFNN). As shown in Figure 1, ACNN-W is designed to learn features adaptively from raw mechanical data without prior knowledge. We use ACNN-W networks as the sub-classifiers of OFNN, combined with D-S evidence theory for evidence fusion. In this section, we introduce the design of the ACNN-W framework and the application of D-S evidence theory in the one-dimensional fusion neural network.
3.1. Architecture Design for ACNN-W
A CNN consists of three types of layers [34]: convolutional layers, pooling layers, and fully connected layers. As shown in Figure 2, the first few layers of a typical ACNN-W consist of alternating pairs of the first two layer types, each convolutional layer followed by a pooling layer, and the last layer is a fully connected layer. We describe them in more detail below.
The structural parameters of ACNN-W are shown in Table 1. The one-dimensional convolutional neural network extracts feature information in the first eight layers. The first nine layers use the ReLU function as the activation function, and the final Softmax layer converts the output of the neural network into a probability distribution. Compared with WDCNN [23], the ACNN-W network has fewer layers, requires fewer parameters, and has a stronger ability to avoid over-fitting.
To effectively suppress over-fitting and improve the expressive ability of the neural network, we use the Dropout algorithm in the fully connected layer. The algorithm sets the values of a layer's neurons to 0 with a certain probability, thereby preventing co-adaptation between neurons. Similarly, Zhang [23] showed that when the first layer of a one-dimensional convolutional network uses a large convolution kernel, a larger receptive field is obtained, features useful for diagnosis can be learned autonomously, and over-fitting is suppressed; the network we propose therefore uses a large convolution kernel in the first layer.
The convolutional layer consists of a number of 1-D filters with weight parameters. These filters convolve with the input data to produce outputs called feature maps, and each filter shares the same weight parameters across all slices of the input data, which reduces training time and model complexity.
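As a minimal sketch of this weight sharing (the filter values here are hypothetical, not weights learned by ACNN-W), a single 1-D filter slides along the signal and reuses the same weights at every position:

```python
import numpy as np

def conv1d(signal, kernel, stride=1):
    """Valid 1-D convolution: the same kernel weights are reused at every position."""
    n = (len(signal) - len(kernel)) // stride + 1
    return np.array([np.dot(signal[i * stride:i * stride + len(kernel)], kernel)
                     for i in range(n)])

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])   # a toy vibration snippet
w = np.array([1.0, -1.0])                           # one shared 2-tap filter
feature_map = conv1d(x, w)
print(feature_map)   # [-1. -1. -1.  1.  1.  1.]: first differences of the signal
```

A real convolutional layer simply applies many such filters in parallel, producing one feature map per filter.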
To improve the training efficiency of the network, reduce internal covariate shift, and enhance the generalization ability of the neural network, we use Batch Normalization (BN), which accelerates deep network training by reducing internal covariate shift, reduces fluctuation, and improves the recognition rate [35].
Commonly used neural network activation functions are the sigmoid function (Sigmoid), the hyperbolic tangent function (Tanh), and the rectified linear unit (ReLU). When the absolute value of the input is relatively large, the derivatives of the Sigmoid and Tanh functions are close to 0, which attenuates the error value propagated downward when the weights are updated by error back-propagation and causes the vanishing gradient problem when training the lower layers of the network. Conversely, when the input of the ReLU function is greater than 0, its derivative is 1, which overcomes the vanishing gradient problem well.
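A quick numerical sketch of this saturation behavior (standard textbook definitions, not code specific to ACNN-W):

```python
import numpy as np

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25 and -> 0 for large |x|."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: exactly 1 for any positive input, 0 otherwise."""
    return (np.asarray(x) > 0).astype(float)

print(sigmoid_grad(10.0))   # ~4.5e-05: the gradient nearly vanishes
print(relu_grad(10.0))      # 1.0: the gradient passes through unchanged
```

This is why stacking many sigmoid layers starves the lower layers of gradient, while ReLU keeps the gradient intact for active units.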
3.1.1. The Cost Function of the ACNN-W
Two cost functions are commonly used to measure the error between the output vector and the target vector: the traditional mean square error cost function and the more recently developed cross-entropy cost function. Compared with the mean square error cost function, the cross-entropy cost function exhibits faster convergence and stronger global optimization ability, so cross entropy is a widely used loss function. It characterizes the distance between two probability distributions: the smaller the cross entropy, the closer the two distributions are. Given two probability distributions p and q, the cross entropy of p with respect to q is:

H(p, q) = −Σ_x p(x) log q(x)

where the probability distribution function satisfies:

Σ_x p(x) = 1, 0 ≤ p(x) ≤ 1

Combined with mini-batch training, the cross-entropy loss function can be expressed as:

L = −(1/m) Σ_{i=1}^{m} Σ_x p_i(x) log q_i(x)

where m is the mini-batch size, q is the actual output of the Softmax layer, and p is the target distribution [36].
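As a small numerical sketch of this loss (a hypothetical mini-batch of m = 2 softmax outputs with one-hot targets, not data from the paper):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Mean cross entropy over a mini-batch: -(1/m) * sum_i sum_x p_i(x) * log(q_i(x))."""
    q = np.clip(q, eps, 1.0)              # guard against log(0)
    return -np.mean(np.sum(p * np.log(q), axis=1))

p = np.array([[1.0, 0.0, 0.0],            # one-hot target distributions
              [0.0, 1.0, 0.0]])
q = np.array([[0.7, 0.2, 0.1],            # softmax outputs of the network
              [0.1, 0.8, 0.1]])
print(cross_entropy(p, q))                # ~0.29: smaller as q approaches p
```

With one-hot targets the inner sum reduces to -log of the probability assigned to the correct class, which is the usual form implemented in deep learning frameworks.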
3.1.2. The Optimizer of the ACNN-W
The optimizer plays an important role in the training speed and classification accuracy of a neural network model. Commonly used optimizers include RMSprop, Adam, and Adadelta. Qu [21] demonstrated the superiority of the RMSprop optimizer in bearing fault diagnosis. This optimizer effectively prevents premature convergence in deep learning by adapting the learning rate based on a moving average of recent squared weight gradients, and it is suitable for processing non-stationary data such as vibration signals. We therefore use the RMSprop optimizer; its training steps are as follows:
Input: global learning rate ε, decay rate ρ, initial parameters θ, small constant δ = 10⁻⁶ (for numerical stability)
Initialize the accumulation variable r = 0
While the stopping criterion is not met do
Take a mini-batch of m samples {x⁽¹⁾, …, x⁽ᵐ⁾} from the training set, with corresponding labels y⁽ⁱ⁾
Gradient computation: g ← (1/m) ∇_θ Σ_i L(f(x⁽ⁱ⁾; θ), y⁽ⁱ⁾)
Accumulate squared gradient: r ← ρ r + (1 − ρ) g ⊙ g
Compute update: Δθ ← −(ε / √(δ + r)) ⊙ g (applied element by element)
Apply update: θ ← θ + Δθ
End while
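The update steps above can be sketched in a few lines of numpy (minimizing a simple quadratic as a stand-in for the network loss; the ε, ρ, and δ values are illustrative, not the paper's tuned settings):

```python
import numpy as np

def rmsprop_step(theta, grad, r, lr=0.01, rho=0.9, delta=1e-6):
    """One RMSprop update: accumulate the squared gradient, then scale the step element-wise."""
    r = rho * r + (1.0 - rho) * grad * grad      # r <- rho*r + (1 - rho) * g (.) g
    theta = theta - lr * grad / np.sqrt(delta + r)
    return theta, r

# Minimize f(theta) = ||theta||^2, whose gradient is 2 * theta
theta = np.array([3.0, -2.0])
r = np.zeros_like(theta)
for _ in range(1000):
    theta, r = rmsprop_step(theta, 2.0 * theta, r)
print(theta)   # both components end up close to the minimum at [0, 0]
```

Note how components with persistently large gradients get their effective step size shrunk by √r, which is what keeps the learning rate well scaled per parameter.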
3.2. The Application of D-S Evidence Theory in OFNN
To reduce the impact of random initialization of the network, we use at least two ACNN-W networks trained on the same data set. As shown in Figure 1, the proposed method constructs four sub-classifiers based on ACNN-W. The vibration data measured by the two sensors over the same period is segmented and used as the input of classifiers 1, 2 and classifiers 3, 4, respectively. In addition, classifiers 1, 2, 3, and 4 are randomly initialized and trained individually. After training, the vibration signals to be verified are input into each trained network, and the class vectors from the Softmax layers of the four networks are extracted for D-S evidence fusion, yielding the final diagnosis result.
D-S evidence theory, proposed by the Harvard mathematician Dempster and his student Shafer, is mainly used for uncertainty reasoning. Among currently used combination strategies, voting is simple and convenient and has been widely applied in different ensemble learning methods. However, the main disadvantage of majority-voting rules is that all individual models have the same weight, so hidden information cannot be fully exploited. D-S evidence theory can be considered a generalization of Bayesian theory that can effectively deal with incomplete data. Rather than computing the probability of a proposition, D-S evidence theory computes the degree to which the evidence supports the proposition; it provides an alternative method for uncertainty inference based on incomplete information, addressing the prior probability problem by tracking explicit probability measures over possibly incomplete information [37,38,39].
We use the fault types as the identification framework of D-S evidence theory, Θ = {A1, A2, …, An}, and the probability values output by the Softmax layer of each sub-classifier satisfy the conditions of a basic probability assignment (BPA) function m:

m(∅) = 0, Σ_{A⊆Θ} m(A) = 1

where proposition A is a non-empty subset of the identification framework, m is a measure on the subsets of Θ, and m(A) represents the degree of trust in A.
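As a quick check that a softmax output forms a valid BPA over the singleton fault classes (the raw scores here are hypothetical, not outputs of the trained sub-classifiers):

```python
import numpy as np

def softmax(z):
    """Standard softmax with a max-shift for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw network outputs for 3 fault classes
bpa = softmax(scores)                # mass assigned to each singleton {A_i}
print(bpa.sum())                     # masses are nonnegative and sum to one (m(emptyset) = 0 implicitly)
```

Because the softmax assigns mass only to singletons, the BPA conditions above hold by construction.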
For A ⊂ Θ in the recognition framework, the Dempster combination rule for the finite basic probability assignment functions m1, m2, …, mn is:

m(A) = (1 / (1 − k)) Σ_{A1 ∩ A2 ∩ … ∩ An = A} m1(A1) m2(A2) … mn(An)

with

k = Σ_{A1 ∩ A2 ∩ … ∩ An = ∅} m1(A1) m2(A2) … mn(An)

where k represents the degree of conflict between the pieces of evidence, and the coefficient 1/(1 − k) is called the normalization factor, ensuring that the masses of the combined BPA sum to 1.
According to the Dempster combination rule, we fuse the class vectors output by the Softmax layer of each ACNN-W, and take the category with the highest fused probability as the final diagnostic category.
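A minimal sketch of this fusion for two sub-classifiers (hypothetical softmax vectors over three fault classes; since each BPA lives on singletons only, the combination reduces to an element-wise product followed by normalization by 1 − k):

```python
import numpy as np

def dempster_combine(m1, m2):
    """Combine two BPAs defined on singleton hypotheses only."""
    joint = m1 * m2            # mass of agreeing intersections {A_i} ∩ {A_i}
    k = 1.0 - joint.sum()      # conflict: total mass landing on empty intersections
    return joint / (1.0 - k)   # normalization factor 1/(1 - k)

# Softmax class vectors of two sub-classifiers (hypothetical values)
m1 = np.array([0.6, 0.3, 0.1])
m2 = np.array([0.7, 0.2, 0.1])
fused = dempster_combine(m1, m2)
print(fused)                   # ~[0.857, 0.122, 0.020]: the shared top class is reinforced
print(int(np.argmax(fused)))   # final diagnosis: class 0
```

Fusing the four OFNN sub-classifiers would apply this pairwise combination repeatedly, which is valid because the Dempster rule is associative.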
6. Conclusions
In this paper, we propose OFNN, a method combining Dempster-Shafer evidence theory with the proposed Adaptive One-dimensional Convolution Neural Networks with Wide Kernel, for intelligent fault diagnosis of rolling bearings. It achieves higher accuracy under variable load conditions than the FFT-SVM, FFT-DNN, WDCNN, TICNN, Ensemble TICNN, and IDSCNN models. Under the assumption that the evidence provided by each classifier is independent, the results show that D-S evidence theory can effectively synthesize the output information of multiple classifiers and better overcome the local optimum problem caused by random initialization of the networks. The proposed method removes the dependence on manual feature extraction when diagnosing experimental bearing vibration data, and it gives full play to the advantages of one-dimensional neural networks for feature extraction from one-dimensional raw vibration signals. It is more effective and more robust than existing intelligent diagnosis methods, overcoming the limitations of individual deep learning models. The new application of combining deep learning with ensemble learning is very promising. However, as the number of ensembled individual models increases, so does the amount of computational resources consumed. How to use ensemble learning for faster and cheaper diagnosis is a future research direction for bearing fault diagnosis. We will continue to study this topic in the future, together with hardware implementations of neural networks.