## **1. Introduction**

As a critical component of rotating machinery, rolling bearings offer high efficiency, low friction resistance and convenient assembly, and their performance directly affects the operation of the whole machine. Therefore, fully exploiting the fault features in complex vibration signals and carrying out pattern recognition is of great significance [1,2].

Mainstream fault diagnosis methods rely on a single sensor [3–5]. The most commonly used sensor is the vibration acceleration sensor, which records vibration amplitude as a function of time. However, a growing number of studies have shown that, for a complex mechanical system, the fault information captured by a single sensor is limited, so accurate condition monitoring and fault diagnosis cannot be achieved [6–8]. Advances in multisensor technology make it possible to base fault diagnosis on multiple sensors. Wang et al. [9] proposed a mixture of Gaussians and variational auto-encoders (Mix-VAEs) fault diagnosis method, which can fully utilize the redundancy and complementarity of multisensor information. Chen et al. [10] proposed a multisensor fusion method based on a stacked auto-encoder and deep belief network (SAE-DBN), and verified its effectiveness through a bearing fault experiment. Shi et al. [11] proposed a two-stage multisensor fusion method to achieve accurate diagnosis of hydraulic directional valve faults. These studies show that, compared with a single sensor, multisensor information fusion can further improve the accuracy and reliability of diagnosis.

Multisensor information fusion technology includes data-level fusion, feature-level fusion and decision-level fusion, each of which has its own advantages and limitations [12–14]. The advantage of data-level fusion is that the raw signals of multiple sensors can be fused directly. Unfortunately, the raw data usually contain a lot of redundant information, and the data-level fusion method cannot take full advantage of the complementarity between the information of multiple sensors. Furthermore, the interpretability of the data is poor. Jing et al. [15] directly fused data from multiple sensors to construct a deep network for planetary gearbox fault diagnosis. Huang et al. [16] proposed a multisensor data fusion method to solve the problem of multisource remote sensing data fusion.

<sup>\*</sup> Correspondence: lqdlzheng@126.com

To make up for the deficiencies of data-level fusion and eliminate redundant information from multiple sensors, data-level fusion can be combined with feature extraction. First, the data from each sensor are transformed into a high-dimensional feature representation; fusion is then performed at the feature level. This approach is called feature-level fusion [17,18]. Li et al. [19] proposed a fault diagnosis method based on a feature fusion covariance matrix and Riemann kernel ridge regression. Wang et al. [20] proposed a multisource sensor feature fusion method based on a convolutional neural network for mechanical fault diagnosis. Jiang et al. [21] extracted various entropy values of vibration signals using information entropy theory, and established a feature-level fusion model to classify faults. One advantage of feature-level fusion is that the fusion location can be chosen flexibly, but it cannot eliminate the effect of high correlations between the features of different sensors.

In decision-level fusion, a base learning model is first trained on the signals of each sensor, and the outputs of the models are then fused through a decision strategy. The errors of the fused model come from the different base learning models, which are typically unrelated and do not affect one another, so the errors do not accumulate further. For this reason, the decision-level fusion method is favored. Common decision fusion methods [22,23] include the voting method and D-S evidence theory. Li et al. [24] proposed an enhanced weighted voting combination strategy with category-specific thresholds to realize multisensor decision fusion. Basir et al. [25] constructed a multisensor-based model according to D-S evidence theory to solve the problem of engine fault diagnosis. Zhao et al. [26] proposed a new distributed distance measurement method to measure the conflict between evidence based on an improved evidence theory algorithm. However, the decision-level fusion method is very sensitive to the selection of voting fusion rules, which directly determines the fusion result.

For multisensor-fusion fault diagnosis, a unified and effective fusion model and algorithm has not yet been established, and the proposed models are still in the exploratory stage. From the above discussion, it can be seen that feature-level fusion is more flexible and convenient: it can select the information that characterizes fault features, and it can fuse at multiple locations. Furthermore, deep learning can learn features directly from raw signals, which largely overcomes the loss of effective information in feature-level fusion. Therefore, this paper proposes a multisensor feature fusion method that combines feature-level fusion with deep learning, and applies it to the fault diagnosis of rolling bearings under different working conditions. The proposed feature fusion method provides a more effective means for the deep mining of fault signals. The main contributions of this paper are as follows:


The rest of the paper is organized as follows. Section 2 reviews the autoencoder (AE). Section 3 describes the proposed model in detail. Section 4 analyzes and discusses the experimental diagnosis results for rolling bearings. Section 5 presents the conclusions and possible future research directions.

## **2. Theoretical Basis**

*Autoencoder*

An autoencoder (AE) is an unsupervised neural network trained to minimize the reconstruction error between its input and its output. The structure of the AE is shown in Figure 1. It consists of an input layer, a hidden layer and an output layer. The input layer and the hidden layer constitute the encoder, and the hidden layer and the output layer constitute the decoder. The encoder converts the high-dimensional input data into a low-dimensional feature representation, and the decoder converts the feature representation into a reconstructed form of the input data.

**Figure 1.** Structure of AE.

The encoder maps raw input signal **X** to the hidden layer feature **H**. The process is as follows:

$$\mathbf{H} = r_f(\mathbf{W}\mathbf{X} + \mathbf{b})\tag{1}$$

The decoder reconstructs the hidden layer feature **H** to obtain the output vector $\hat{\mathbf{X}}$. The process is as follows:

$$
\hat{\mathbf{X}} = r_g(\mathbf{W}^\prime \mathbf{H} + \mathbf{b}^\prime) \tag{2}
$$

where **W** and $\mathbf{W}'$ are the weight matrices, **b** and $\mathbf{b}'$ are the bias vectors, and $r_f$ and $r_g$ are the activation functions.

The reconstruction error of AE is:

$$L(\mathbf{X}, \hat{\mathbf{X}}) = \frac{1}{2} \left\| \mathbf{X} - \hat{\mathbf{X}} \right\|^2 \tag{3}$$

where $\|\cdot\|$ represents the norm.

Therefore, the total loss function over *S* samples is:

$$J(\mathbf{W}, \mathbf{b}) = \frac{1}{S} \sum_{n=1}^{S} L(\mathbf{X}_n, \hat{\mathbf{X}}_n) \tag{4}$$
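As an illustration only, Equations (1)–(4) can be sketched in NumPy as follows. The sketch assumes Sigmoid activations for both $r_f$ and $r_g$, stores one sample per row (so the products are written `X @ W` rather than **WX**), and uses randomly initialized weights; the names `ae_forward`, `ae_loss`, `W_dec` and `b_dec`, as well as the layer sizes, are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, assumed here for both r_f and r_g."""
    return 1.0 / (1.0 + np.exp(-z))

def ae_forward(X, W, b, W_dec, b_dec):
    """Eqs. (1)-(2): encode X into H, then decode H into X_hat."""
    H = sigmoid(X @ W + b)              # encoder, Eq. (1)
    X_hat = sigmoid(H @ W_dec + b_dec)  # decoder, Eq. (2)
    return H, X_hat

def ae_loss(X, X_hat):
    """Eqs. (3)-(4): average over samples of 0.5 * ||x - x_hat||^2."""
    per_sample = 0.5 * np.sum((X - X_hat) ** 2, axis=1)
    return per_sample.mean()

# Toy data: S = 8 samples, 6 input features, 3 hidden units (all hypothetical).
rng = np.random.default_rng(0)
S, d_in, d_hid = 8, 6, 3
X = rng.random((S, d_in))
W = rng.normal(scale=0.1, size=(d_in, d_hid))
b = np.zeros(d_hid)
W_dec = rng.normal(scale=0.1, size=(d_hid, d_in))
b_dec = np.zeros(d_in)

H, X_hat = ae_forward(X, W, b, W_dec, b_dec)
loss = ae_loss(X, X_hat)
```

Training would then adjust the weights and biases to reduce `loss`; the optimizer is left out of this sketch.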

## **3. Proposed Method**

In this section, a feature fusion model based on multisensor signals is proposed and applied to rolling bearing fault diagnosis.

#### *3.1. Fusion Model Architecture for Multisensor Signals*

The proposed method consists of two steps. The first step is multisensor feature fusion: the intrinsic mode functions (IMFs) of each sensor's vibration signal are calculated by variational mode decomposition (VMD) [27], and time-domain, frequency-domain and multiscale entropy features are then extracted from the preferred IMF and fused into a multidomain feature dataset. In the second step, the DAEN is constructed and the multisensor fusion features from the first step are used as its inputs, so that they are further extracted and classified.
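The first step can be illustrated with a heavily simplified NumPy sketch. It assumes that each input array already stands in for the preferred IMF of one sensor (the VMD step is not implemented here), it computes only a few example features (RMS, kurtosis, peak-to-peak value and spectral centroid), and it omits multiscale entropy entirely. The function names, the feature choice and the synthetic signals are assumptions for illustration, not the feature set used in the paper.

```python
import numpy as np

def time_features(x):
    """Example time-domain features: RMS, kurtosis, peak-to-peak."""
    rms = np.sqrt(np.mean(x ** 2))
    centered = x - x.mean()
    kurtosis = np.mean(centered ** 4) / (np.mean(centered ** 2) ** 2)
    peak_to_peak = x.max() - x.min()
    return np.array([rms, kurtosis, peak_to_peak])

def freq_features(x, fs):
    """Example frequency-domain feature: amplitude-weighted spectral centroid."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
    return np.array([centroid])

def fuse_features(signals, fs):
    """Concatenate the per-sensor feature vectors into one fused vector."""
    per_sensor = [np.concatenate([time_features(x), freq_features(x, fs)])
                  for x in signals]
    return np.concatenate(per_sensor)

# Two synthetic "sensor" signals standing in for the preferred IMFs.
fs = 1000.0
t = np.arange(1000) / fs
sensor_a = np.sin(2 * np.pi * 50 * t)
sensor_b = 0.5 * np.sin(2 * np.pi * 120 * t)

fused = fuse_features([sensor_a, sensor_b], fs)  # 4 features per sensor
```

Simple concatenation is only one possible fusion rule; it is used here because it keeps every sensor's features visible to the downstream network.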

#### *3.2. Implementation Process*

#### 3.2.1. Multisensor Feature Fusion

The proposed feature fusion method is as follows:


#### 3.2.2. Deep Feature Learning and Classification

To enhance the performance of multisensor feature fusion, this section proposes the DAEN model for deep feature learning and classification. The proposed DAEN model is a multilayer neural network composed of multiple stacked AEs and a Softmax classification layer. The structure of DAEN is shown in Figure 2.

**Figure 2.** Structure of the proposed DAEN.

DAEN uses the Sigmoid activation function for nonlinear mapping [30]. The Sigmoid activation function is defined as follows:

$$\text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}\tag{5}$$

The output of a DAEN hidden layer is:

$$h_i = \frac{1}{1 + \exp\left(-\left(\sum_{j=1}^{N} w_{ij} x_j + b_j\right)\right)} \tag{6}$$

where $w_{ij}$ is the connection weight between node *i* at layer *L* and node *j* at layer *L* + 1, and $b_j$ is the bias of the hidden layer node *j*.
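The forward pass through such a network can be sketched as below: each stacked encoder layer applies the Sigmoid mapping of Equation (6), and a final Softmax layer turns the deepest features into class probabilities. This is a minimal sketch under stated assumptions, not the authors' implementation: the layer sizes, the number of classes, the random weights and the names `daen_forward`, `encoder_layers`, `W_out` and `b_out` are all hypothetical, and samples are stored one per row.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, Eq. (5)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Row-wise Softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def daen_forward(X, encoder_layers, W_out, b_out):
    """Pass X through the stacked encoders, then the Softmax classifier."""
    h = X
    for W, b in encoder_layers:
        h = sigmoid(h @ W + b)  # hidden layer output, Eq. (6)
    return softmax(h @ W_out + b_out)

# Hypothetical sizes: 8 fused features -> 5 -> 3 hidden units, 4 fault classes.
rng = np.random.default_rng(0)
sizes = [8, 5, 3]
encoder_layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
                  for m, n in zip(sizes[:-1], sizes[1:])]
n_classes = 4
W_out = rng.normal(scale=0.1, size=(sizes[-1], n_classes))
b_out = np.zeros(n_classes)

X = rng.random((10, sizes[0]))       # 10 samples of fused features
probs = daen_forward(X, encoder_layers, W_out, b_out)
predicted = probs.argmax(axis=1)     # most probable class per sample
```

With untrained random weights the predictions are meaningless; the sketch only shows how the data flow through the stacked layers.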

The most commonly used loss function of AE is the mean square error [31], which is defined as:

$$\mathcal{L}(\mathbf{x}, \hat{\mathbf{x}}) = \sum_{i=1}^{S} \left(\mathbf{x}^{i} - \hat{\mathbf{x}}^{i}\right)^2 \tag{7}$$

Then the loss function of the proposed DAEN model can be expressed as:

$$J(w, b) = \sum_{i=1}^{S} \left(\mathbf{x}^{i} - \hat{\mathbf{x}}^{i}\right)^{2} + rR(w, b) \tag{8}$$

where the first term is the mean square error loss, the second term is the penalty term and *r* is the sparse penalty factor.
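Equation (8) can be illustrated numerically as follows. The text does not give the form of the penalty $R(w, b)$, so this sketch assumes a Kullback-Leibler sparsity penalty on the mean hidden activations, a common choice for sparse autoencoders; that choice, the target activation `rho`, the value of `r` and the function names are assumptions rather than details from the paper. The penalty depends on the weights and biases through the hidden activations `H`.

```python
import numpy as np

def kl_sparsity(H, rho=0.05):
    """Assumed penalty R: KL divergence between target rho and mean activations."""
    rho_hat = np.clip(H.mean(axis=0), 1e-8, 1 - 1e-8)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def daen_loss(X, X_hat, H, r=0.1, rho=0.05):
    """Eq. (8): summed squared reconstruction error plus r times the penalty."""
    reconstruction = np.sum((X - X_hat) ** 2)
    return reconstruction + r * kl_sparsity(H, rho)

# Tiny hand-made example: two samples, two features, three hidden units.
X = np.array([[0.2, 0.8], [0.6, 0.4]])
X_hat = np.array([[0.1, 0.9], [0.6, 0.4]])
H_sparse = np.full((2, 3), 0.05)  # activations already at the target rho
H_dense = np.full((2, 3), 0.5)    # activations far from the target

loss_sparse = daen_loss(X, X_hat, H_sparse)  # penalty term vanishes
loss_dense = daen_loss(X, X_hat, H_dense)    # penalty term is positive
```

Both cases share the same reconstruction error, so the difference between the two losses comes entirely from the penalty term scaled by `r`.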

The training process of DAEN consists of unsupervised training and fine-tuning. The process is as follows:


#### *3.3. Rolling Bearing Fault Diagnosis Process Based on the Proposed Method*

Based on the proposed method, the rolling bearing fault diagnosis process is as follows:

