*5.1. Data Preprocessing*

The observed data in the process of the bearing operation show obvious periodicity, which needs to be eliminated. Taking normal data *f*<sup>0</sup> as an example, the main frequency in the observed signal can be obtained by fast Fourier transform (FFT), and the result of the FFT is shown in Figure 3.

**Figure 3.** Single-sided amplitude spectrum of *f*0.

Figure 3 indicates that the main frequency is approximately 1036 Hz, and thus, the basis function is constructed as

$$f(t) = \begin{bmatrix} 1 & \sin(1036 \times 2\pi t) & \cos(1036 \times 2\pi t) \end{bmatrix}^{\mathrm{T}}.$$

The estimation of *β* calculated using Equation (7) is

$$
\hat{\mathcal{B}} = \begin{bmatrix}
0.0116 & -0.0158 & 0.0548 \\
0.0280 & 0.0326 & -0.0396
\end{bmatrix}.
$$

Thus, the data after removing the intrinsic signal are shown in Figure 4, where Figure 4a represents the acceleration data of the drive end and Figure 4b represents the acceleration data of the fan end.

(**b**) Acceleration data of fan end of *f*<sup>0</sup>

**Figure 4.** Preprocessed data to remove trends by fast Fourier transform (FFT).

In the later fault detection process, the data of all modes are similar to the above operation, and the results are recorded as *fi*.

#### *5.2. Fault Detection Effect*

#### 5.2.1. Norm Data and Known Fault

For the norm data *f*<sup>0</sup> and the known fault *f*1, *f*2, the first 20,480 sample points are selected as the training set, which are recorded as *fi*−train. The last 81,920 sample points are taken as the testing set, which are recorded as *fi*−test. A total of 128 sample points are used as detection objects in each test. The training set data are shown in Figure 5, where Figure 5a,b represent data *fi*−train, *i* = 1, 2 of the two dimensions, respectively.

**Figure 5.** Training data *f*1, *f*<sup>2</sup> after being preprocessed.

Figure 5 shows that the bearing data have high frequency, and the fault does not change the observed mean value; however, it changes the dispersion characteristics or the correlation of data.

### 5.2.2. Unknown Fault

The training data does not necessarily contain all types of patterns, and the detection of unknown faults is always a difficult problem. *f*<sup>3</sup> is used as an unknown fault for fault detection; the training set sample does not contain any information about *f*3. The unknown fault data are shown in Figure 6, where in Figure 6a represents the acceleration data at the driving end and Figure 6b represents the acceleration data at the fan end.

Figure 6 shows that the data of unknown faults is close to the other two types of fault data. If the fault detection method is not sensitive, the detection rate will be reduced significantly.

(**b**) Acceleration data at the fan end of *f*<sup>3</sup>

**Figure 6.** Training data *f*<sup>3</sup> after preprocessed.

#### 5.2.3. Detection Effect

The characteristics of bearing data make bearing fault detection extremely challenging. The input of the training set is *<sup>f</sup>*0−train, the estimation accuracy is *<sup>ε</sup>* <sup>=</sup> <sup>10</sup>−4, and the maximum number of iterations is *kmax* = 100, according to Algorithm 1, the optimal bandwidth is

$$h\_m = 0.0445.$$

The KDE of the training set is obtained by Equation (15), and the results are shown in Figure 7, where Figure 7a,c,e represent the two-dimensional frequency histograms of the training data *fi*−train, *i* = 0, 1, 2, and Figure 7b,d,f represent the two-dimensional KDE of the training data *fi*−train, *i* = 0, 1, 2.

Figure 7 further shows that the bearing fault changes the dispersion characteristics and data correlation. Meanwhile, Figure 7 shows that the KDE of the training data obtained by Equation (15) is in good agreement with the data distribution of the training data, and therefore, this method can really describe the distribution of multidimensional data.

The JS divergence of the training data and KDE of the distribution are obtained by Equations (51) and (58); the results are shown in Figure 8.

**Figure 8.** The results of detection threshold.

When the significance level is *α* = 95%, the detection thresholds of the training set, which are calculated using Equation (58), are

$$\begin{cases} \begin{array}{l} f\_0:JS\_{\text{high}} < 0.1375 \\ f\_1:JS\_{\text{high}} < 0.0995 \\ f\_2:JS\_{\text{high}} < 0.1225 \end{array} \end{cases}$$

Thus, the detection results of using JS divergence methods on the testing data are shown in Figure 9. If the detection points fall within the threshold, the data set to be detected is in the same pattern; otherwise, the data have different patterns.

**Figure 9.** Fault detection effect using JS divergence as index.

Furthermore, detection rates using different methods are shown in Table 1.

**Table 1.** Detection rate of normal and different failure modes using different methods.


For the known fault, Table 1 indicates that the bearing fault identification based on multidimensional KDE and JS divergence achieves better results compared to those obtained using the *T*<sup>2</sup> statistics detection methods in the testing data. The detection rate of normal data *f*<sup>0</sup> increases from 95.08% to 97.03%, the detection rate of fault data *f*<sup>1</sup> increases from 81.33% to 95.81%, and the detection rate of fault data *f*<sup>2</sup> increases from 70.69% to 95.36%. Meanwhile, compared with the cross-entropy methods, the detection rate of normal data *f*<sup>0</sup> increased from 96.95% to 97.03%; of fault data *f*<sup>1</sup> increased from 94.41% to 95.81%; and of fault data *f*<sup>2</sup> increased from 94.19% to 95.36%.

For the unknown fault *f*3, Table 1 shows that the *T*<sup>2</sup> statistics detection method cannot detect the unknown faults. The method using cross entropy as a measure can only detect unknown faults with a detection rate of 53.16%, which is not obvious. The JS divergence method constructed in this study can identify the unknown fault accurately, and the detection rate reaches 69.49%. This is because JS divergence is more accurate at measuring the differences between distributions.

#### *5.3. Influence of Window Width on Fault Diagnosis*

The fault diagnosis effect is related to the data window width; therefore, the fault diagnosis effect under different window widths is investigated. The results are shown in Figure 10.

**Figure 10.** Fault diagnosis effect under different window width *hm*.

Figure 10 indicates that, with the increase in the detection window, the detection performance of the proposed method for the known fault detection first rises, and then, it tends to be stable. This is because when the length of the detection window increases to a certain extent, the data to be detected already contains sufficient information. Meanwhile, if the detection window continues to increase, the contribution rate to the improvement of the fault detection rate is not large. Meanwhile, for unknown faults, the detection rate increases rapidly with the length of the detection window because the longer the detection window, the higher the amount of information contained in the data to be detected, and the better is the difference characterized between the fault and the known fault.

#### **6. Conclusions**

In this study, a method of bearing fault detection and identification was constructed using multidimensional KDE and JS divergence. The distribution characteristics of JS divergence between the sample density distribution and population density distribution were derived using the sliding sampling window method. Thus, the threshold of fault detection was provided, and therefore, different faults, especially unknown faults, could be identified. The theory showed that the multidimensional KDE method could reduce information loss caused by processing each dimension; the JS divergence is more accurate than the traditional cross entropy to measure the difference in density distribution. The experimental results verified the above conclusions.

For a known fault, the detection effect of this method was obviously better than that of the traditional method, and it also had a certain degree of improvement compared with the cross-entropy method. Second, for unknown faults, the traditional method could not detect

the distribution difference accurately, while the detection effect of the proposed method was significantly improved.

Furthermore, the detection effect of this method depends on the window width. The detection effect improved with a growth in the detection window. In this paper, under the condition of a given window width, the estimation formula for the optimal bandwidth of a multidimensional KDE was provided. The experimental results showed that the formula was applicable to any mode of data, and therefore, it had a certain universality.

However, this study has certain limitations. Firstly, although the calculation formula of multidimensional KDE is given in this study, the computational complexity will increase when the dimension is large, which may restrict the further application of the method. Secondly, the calculation of JS divergence is time consuming, which is not conducive to rapid fault diagnosis.

In future research, we can try to use the PCA dimension reduction method to solve the computational complexity caused by very large dimension, and optimize the algorithm flow of JS divergence to expedite the calculation. In the latest study Ginzarly et al. [24], prognosis of the vehicle's electrical machine is treated using a hidden Markov model after modeling the electrical machine using the finite element method. Therefore, we will try to combine this method in future work and apply it to the fault detection of other systems.

**Author Contributions:** Conceptualization and methodology, J.W. (Juhui Wei); formal analysis and visualization, Z.H.; validation and data curation, J.W. (Jiongqi Wang); resources, D.W.; writing review and editing X.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China grant number 61903366, 61903086, 61773021, Natural Science Foundation of Hunan Province grant number 2019JJ50745, 2019JJ20018, 2020JJ4280 and Foundation of Beijing Institute of Control Engineering grant number HTKJ2019KL502007.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors express their appreciation to the Associate Editor and anonymous reviewers for their helpful suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

KDE kernel density estimation

JS Jensen–Shannon

PCA principal component analysis

MISE mean integral square error

#### **References**

