*Article* **A Comprehensive Diagnosis Method of Rolling Bearing Fault Based on CEEMDAN-DFA-Improved Wavelet Threshold Function and QPSO-MPE-SVM**

**Yi Wang, Chuannuo Xu, Yu Wang and Xuezhen Cheng \***

College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China; skd992631@sdust.edu.cn (Y.W.); xuchuannuo@sdust.edu.cn (C.X.); wangyu007@sdust.edu.cn (Y.W.)

**\*** Correspondence: zhenxc6411@163.com or chengxuezhen@sdust.edu.cn; Tel.: +86-135-0532-4619

**Abstract:** A comprehensive rolling bearing fault diagnosis method covering noise suppression, fault feature extraction, and fault identification was proposed. Based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), detrended fluctuation analysis (DFA), and an improved wavelet threshold function, a CEEMDAN-DFA-improved wavelet threshold denoising method was presented to reduce the distortion of the denoised signal. Based on quantum-behaved particle swarm optimization (QPSO), multiscale permutation entropy (MPE), and a support vector machine (SVM), the QPSO-MPE-SVM method was presented to construct the fault-feature sets and realize fault identification. Simulation and experimental platform verification showed that the proposed comprehensive diagnosis method not only removes noise interference while maintaining the original characteristics of the signal through the CEEMDAN-DFA-improved wavelet threshold function, but also overcomes the overlapping of MPE values by optimizing the MPE parameters with QPSO, thereby separating the features of different fault types. The experimental results showed that the fault identification accuracy can reach 95%, a great improvement compared with existing methods.

**Keywords:** rolling bearing fault; CEEMDAN; DFA; improved wavelet threshold; QPSO; MPE; SVM

#### **1. Introduction**

Rolling bearings are the most important parts of rotating machinery and are widely used in modern mechanized equipment [1,2]. The fault of rolling bearing components can cause serious damage to the running status of the machine and the production process. Therefore, it is important to explore a new fault diagnosis technique. However, in practical engineering, the collected bearing vibration signals contain various interference signals (e.g., white noise, harmonic interference), and have nonlinear and nonstationary properties, which makes it difficult to distinguish the bearing fault types and to identify them [3]. Therefore, to improve the fault diagnosis accuracy of rolling bearings, this paper researched three aspects of signal denoising, fault feature extraction, and fault type identification of rolling bearings.

Common vibration signal denoising methods include empirical mode decomposition (EMD) [4], discrete wavelet transform (DWT) [5], singular-value decomposition (SVD) [6–8], and so on. Abdelkader et al. [9], based on the EMD algorithm, proposed a method to select the relevant IMF components based on the energy values of the IMF components of each order, and used an improved wavelet threshold to denoise the selected IMF components of the bearing vibration signal. Zhao et al. [10] fused the EMD algorithm with the select-and-test (ST) algorithm to denoise the signal, overcoming the low accuracy at low bearing speeds of the EMD denoising method based on the cross-correlation number and the kurtosis criterion. Gao et al. [11] applied the CEEMDAN algorithm, which adds adaptive noise, further reducing the mode-mixing effect, yielding better convergence, and realizing the fault diagnosis of rolling bearings.

**Citation:** Wang, Y.; Xu, C.; Wang, Y.; Cheng, X. A Comprehensive Diagnosis Method of Rolling Bearing Fault Based on CEEMDAN-DFA-Improved Wavelet Threshold Function and QPSO-MPE-SVM. *Entropy* **2021**, *23*, 1142. https://doi.org/10.3390/e23091142

Academic Editors: Giuseppe Fusco, Quanmin Zhu, Jing Na, Weicun Zhang and Ahmad Taher Azar

Received: 30 July 2021; Accepted: 27 August 2021; Published: 31 August 2021

The hard and soft threshold functions are widely used in wavelet threshold denoising, but both have shortcomings and can cause a certain degree of distortion when the signal is reconstructed. Tajeddini et al. [12] proposed a generic threshold function based on an unbiased risk estimation method to carry out wavelet-packet transform preprocessing of the vibration signal, followed by generic threshold denoising. Kumar et al. [13] proposed an improved modified kurtosis-mixing threshold rule to denoise the vibration signal, which improved the signal-to-noise ratio. Yang et al. [14] used a particle-swarm algorithm to adaptively optimize the initial parameters of the wavelet threshold function, which freed the initial parameters from dependence on empirical values and overcame defects such as incomplete noise removal or removal of useful information. Chegini et al. [15] proposed a new bearing vibration signal denoising method based on the empirical wavelet transform (EWT) and a threshold function, which better overcame the constant-deviation defect of the traditional soft threshold function. However, the performance of wavelet threshold denoising depends closely on the wavelet basis function and the threshold function, and the method is not adaptive. Although the CEEMDAN algorithm is adaptive, its demarcation between noise and useful signal is relatively vague, resulting in high distortion of the denoised signal. Shi et al. [3] proposed a detrended fluctuation analysis technique to determine the critical values of the noise components and the useful signal components, which ensures the accuracy of signal correlation discrimination to a certain extent. Based on CEEMDAN decomposition of the vibration signal, Chaabi et al. [16] used wavelet analysis, principal component analysis, and order tracking analysis to perform multi-method denoising.
To improve the identification accuracy of rolling bearings with nonlinear and nonstationary vibration signals, Chen et al. [17] proposed a novel fault diagnosis method based on wavelet threshold denoising and CEEMDAN. To verify the denoising effectiveness of combining wavelet thresholding with CEEMDAN, Bie et al. [18] added random Gaussian white noise to the bearing fault signal to simulate the actual noise disturbance of rolling bearings, and adopted the CEEMDAN-wavelet threshold denoising method. The experimental results showed that the combined denoising method could effectively remove the noise interference. Therefore, based on the application characteristics of each denoising method, this paper proposes a denoising method based on a CEEMDAN-DFA-improved wavelet threshold function. The method uses the CEEMDAN algorithm to decompose the vibration signal, performs detrended fluctuation analysis (DFA) on the obtained intrinsic mode functions (IMFs), calculates the scalar function value of each IMF component, selects the noise-dominated IMF components, and applies an improved wavelet threshold function to denoise them.

The current fault feature extraction methods for rolling bearings mainly comprise time-domain analysis, frequency-domain analysis, and time-frequency-domain analysis; among them, time-frequency-domain analysis is the most commonly used. Zhen et al. [19] decomposed the vibration signal using wavelet packets, selected a suitable bandwidth according to the kurtosis spectrum correlation theory, and finally applied envelope spectrum analysis to extract the characteristic frequencies. Han et al. [20] used the Teager energy operator to enhance the signal after wavelet denoising and then extracted the bearing fault features by the CEEMD algorithm. Li et al. [21] used permutation entropy (PE) for bearing fault feature extraction, but it could not measure multi-scale signals. Multiscale permutation entropy (MPE) [22,23] was introduced into fault diagnosis by Yin et al. [24]. Du et al. [25] used MPE to extract fault features and combined it with a self-organizing fuzzy classifier based on the harmonic mean difference (HMDSOF) to classify the fault features. MPE responds very well to the changes of vibration signals, but its parameters have a great influence on the calculated entropy values. If the parameters are not selected properly, the permutation entropy values of multiple bearing signals will overlap, and the type of bearing failure cannot be identified. To achieve complete separation of the permutation entropy values under different operating conditions of the bearings, this paper adopted quantum-behaved particle swarm optimization (QPSO) to optimize the initial parameters of MPE, and then selected the appropriate MPE values to construct the rolling bearing fault feature vector.

Common fault classification and identification methods include artificial neural networks (ANN) [26], extreme learning machines (ELM) [27,28], and support vector machines (SVM) [29,30]. The ANN has enabled many achievements in the field of pattern recognition, but its identification performance is strongly influenced by its parameters, and it easily falls into local minima during optimization. Although the ELM runs fast, its generalization performance is poor. The SVM has few adjustable parameters, runs stably, and can achieve high diagnostic accuracy with few training samples [24]. Therefore, this paper used an SVM for fault identification of rolling bearings, and proposed a comprehensive rolling bearing fault feature extraction and identification method based on the combination of QPSO-MPE-SVM.

The main work and contributions of this paper are summarized as follows:


The rest of this paper is organized as follows. The CEEMDAN-DFA-improved wavelet thresholding denoising method is introduced, and its validity is verified in Section 2. The QPSO-MPE-SVM fault feature extraction and diagnosis method is presented, and its validity is verified in Section 3. The effectiveness of the proposed method is proven by experimental example in Section 4. In addition, contrastive analysis among the different methods is conducted. Finally, some conclusions are summarized in Section 5.

#### **2. Denoising Algorithm of the CEEMDAN-DFA-Improved Wavelet Threshold Function**

#### *2.1. Basic Algorithm Related to CEEMDAN-DFA-Improved Wavelet Threshold Function*

#### 2.1.1. CEEMDAN Algorithm

To better deal with the mode-mixing problem of empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD) was proposed by Liu et al. in 2009 [31]. The EEMD algorithm effectively suppresses mode mixing, but it suffers from high computing time and residual added noise in the IMFs, and a certain degree of distortion is produced when the signal is reconstructed. The improved CEEMDAN algorithm was proposed by Colominas et al. in 2014 [32] and is summarized as follows.

(1) The *j*th IMF component generated by EMD decomposition is denoted as *E<sub>j</sub>*(·), and the *j*th IMF obtained by CEEMDAN is denoted as *IMF*′*<sub>j</sub>*(*t*). *n<sup>i</sup>*(*t*) is Gaussian white noise. CEEMDAN performs *I* EMD decompositions on the noisy signals *x*(*t*) + *ε*<sub>0</sub>·*n<sup>i</sup>*(*t*) formed by combining the original signal with white noise. The first IMF component decomposed by CEEMDAN can then be expressed as:

$$IMF\_1'(t) = \frac{1}{I} \sum\_{i=1}^{I} IMF\_1^i(t) \tag{1}$$

(2) First residual sequence of the first stage (*j* = 1) is expressed as:

$$r\_1(t) = \mathbf{x}(t) - IMF\_1'(t) \tag{2}$$

(3) The signals *r*<sub>1</sub>(*t*) + *ε*<sub>1</sub>*E*<sub>1</sub>(*n<sup>i</sup>*(*t*)) (*i* = 1, 2, · · ·, *I*) are each processed by the EMD algorithm until their first IMF components are generated. The second IMF component is expressed as:

$$IMF\_2'(t) = \frac{1}{I} \sum\_{i=1}^{I} E\_1\left(r\_1(t) + \varepsilon\_1 E\_1\left(n^i(t)\right)\right) \tag{3}$$

(4) Perform step (3) above for the remaining stages (*j* = 2, 3, · · ·, *J*); the (*j* + 1)th IMF component is then expressed as:

$$r\_j(t) = r\_{j-1}(t) - IMF\_j'(t) \tag{4}$$

$$IMF\_{j+1}'(t) = \frac{1}{I} \sum\_{i=1}^{I} E\_1\left(r\_j(t) + \varepsilon\_j E\_j\left(n^i(t)\right)\right) \tag{5}$$

(5) Increment *j* and repeat step (4) until the residual sequence can no longer be decomposed. The number of IMF components obtained is *J*. The final residual sequence is expressed as:

$$r(t) = \mathbf{x}(t) - \sum\_{j=1}^{J} IMF\_j'(t) \tag{6}$$

(6) The original signal *x*(*t*) represented by the IMF component and the residual component is expressed as:

$$\mathbf{x}(t) = \sum\_{j=1}^{J} IMF\_j'(t) + r(t) \tag{7}$$

The CEEMDAN algorithm introduces paired positive and negative white noise at each processing stage and overcomes the reconstruction-error defect of the EEMD algorithm by processing only one residual component to find each IMF component. The CEEMDAN algorithm can reconstruct the decomposed signal with close to zero deviation. However, it also has shortcomings; for example, the demarcation between noise and useful signal in the noise-rich IMF components is relatively ambiguous, and directly removing these components would cause signal distortion.
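As an illustration only, the ensemble bookkeeping of Equations (1)–(7) can be sketched in Python. This is a minimal sketch, not the authors' implementation: the EMD sifting operator *E*<sub>1</sub>(·) is replaced by a crude stand-in (signal minus a moving average), so only the noise-ensemble averaging and residual updates are shown; a practical version would substitute a real EMD routine for `first_imf`.

```python
import numpy as np

def first_imf(sig):
    # Stand-in for E_1(.): a crude high-frequency "detail" extractor
    # (signal minus a 5-point moving average). A real implementation
    # would use EMD sifting; this placeholder only illustrates the
    # CEEMDAN bookkeeping of Equations (1)-(6).
    kernel = np.ones(5) / 5
    trend = np.convolve(sig, kernel, mode="same")
    return sig - trend

def ceemdan_sketch(x, n_ensemble=50, eps=0.2, n_imfs=3, seed=0):
    rng = np.random.default_rng(seed)
    noises = [rng.standard_normal(len(x)) for _ in range(n_ensemble)]
    imfs = []
    residual = x.copy()
    for j in range(n_imfs):
        # Average the first IMF of I noise-assisted copies (Eqs. (1), (3), (5))
        trials = [first_imf(residual + eps * first_imf(n)) for n in noises]
        imf_j = np.mean(trials, axis=0)
        imfs.append(imf_j)
        residual = residual - imf_j          # Eq. (4): update the residue
    return np.array(imfs), residual          # Eq. (7): x = sum(imfs) + residual
```

By construction, summing the extracted components and the final residue recovers the original signal, which is the near-zero reconstruction deviation property noted above.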

#### 2.1.2. DFA Algorithm

To resolve the ambiguous demarcation between noise and useful signal in CEEMDAN, the more accurate DFA algorithm is adopted [33].

(1) *x̄* is defined as the average of the time series *x*(*i*) over the interval [1, *N*], denoted as:

$$\overline{x} = \frac{1}{N} \sum\_{i=1}^{N} x(i) \tag{8}$$

where *N* is the length of the time series, and *N<sub>n</sub>* = *N*/*n* is the number of segments of length *n*.

(2) The integrated profile *y*(*k*) is constructed from the mean-removed series and is later divided into segments of length *n*. It is denoted as:

$$y(k) = \sum\_{i=1}^{k} [x(i) - \overline{x}], \ k = 1, 2, \dots, N \tag{9}$$

(3) The trend *ys*(*i*) of each series segment is calculated as:

$$y\_s(i) = \sum\_{p=0}^{P} a\_p i^p \tag{10}$$

(4) After removing the uncertain trend in each series segment, the second-order fluctuation coefficient of the segment series is expressed as:

$$F^2(n,s) = \frac{1}{n} \sum\_{i=1}^{n} \left( y[(s-1)n+i] - y\_s(i) \right)^2 \tag{11}$$

$$F(n) = \left[\frac{1}{N\_n} \sum\_{s=1}^{N\_n} F^2(n, s)\right]^{1/2} \tag{12}$$

(5) Change the segment length *n*, and repeat steps (2)–(4) to obtain the variation of the fluctuation function of the time series. The correlation of the time series, represented by the Hurst exponent, is expressed as:

$$H = \frac{\log\_2 F(n)}{\log\_2(n)} = \frac{\log\_2 \left[ \frac{1}{N\_n} \sum\_{s=1}^{N\_n} F^2(n, s) \right]^{1/2}}{\log\_2(n)} \tag{13}$$

(6) The relationship between the scalar function *α* and the time series fluctuation function *F*(*n*) is expressed as:

$$F(n) \propto n^{\alpha} \tag{14}$$

It can be seen that *α* is related to *F*(*n*), and *F*(*n*) is related to *H*; therefore, *α* is related to *H*. The value of *α* is proportional to the smoothness of the signal. When 0 < *α* < 0.5, the proportion of noise in the signal is very large; *α* = 0.5 indicates that the signal is uncorrelated; and the signal has a large correlation when *α* > 0.5 [34]. By comparing the values of *α* of the IMF components, the noise-dominated IMF components and the useful-information-dominated IMF components can be distinguished.

DFA is an algorithm for correlation discrimination of non-smooth signals. It is used to confirm the critical value of the noise component and the useful signal component, which can guarantee the accuracy of signal correlation discrimination to a certain extent.
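Under the definitions above, a first-order DFA estimate of the scaling exponent *α* can be sketched as follows. This is a minimal illustration; the window sizes and the linear (first-order) detrending are our assumptions.

```python
import numpy as np

def dfa_alpha(x, scales=(16, 32, 64, 128, 256)):
    """Estimate the DFA scaling exponent alpha of a 1-D series.

    Follows Eqs. (8)-(14): integrate the mean-removed series, split it
    into non-overlapping windows of length n, remove a linear trend
    from each window, and fit the log-log slope of F(n) versus n.
    """
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())                  # profile y(k), Eq. (9)
    fs = []
    for n in scales:
        n_seg = len(y) // n
        segs = y[: n_seg * n].reshape(n_seg, n)
        i = np.arange(n)
        rms = []
        for seg in segs:
            coef = np.polyfit(i, seg, 1)         # local trend y_s(i), Eq. (10)
            detrended = seg - np.polyval(coef, i)
            rms.append(np.mean(detrended ** 2))  # F^2(n, s), Eq. (11)
        fs.append(np.sqrt(np.mean(rms)))         # F(n), Eq. (12)
    # alpha is the slope of log F(n) against log n, Eq. (14)
    slope, _ = np.polyfit(np.log(scales), np.log(fs), 1)
    return slope
```

Consistent with the interpretation in this section, white noise yields *α* ≈ 0.5 (uncorrelated), while an integrated (strongly correlated) series yields a markedly larger exponent.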

#### 2.1.3. Improved Wavelet Threshold Function

To solve the signal distortion caused by CEEMDAN, a wavelet threshold function is introduced to denoise the noise-dominated components. Traditional threshold functions include the hard threshold function and the soft threshold function. The hard threshold function maintains the signal characteristics well, but it is discontinuous at the set threshold *λ*, which causes serious oscillations when the signal is reconstructed. The soft threshold function has no discontinuity at the threshold, but the wavelet coefficients after threshold quantization have a constant deviation, which leads to a large deviation of the signal characteristics. Therefore, based on the advantages of the hard and soft threshold functions, an improved wavelet threshold function is presented to reduce the reconstruction error.

The improved threshold function must meet the following conditions.


The improved wavelet threshold function is expressed as:

$$\hat{\psi}\_{j,k} = \begin{cases} \operatorname{sign}\left(\psi\_{j,k}\right) \left[ \left| \psi\_{j,k} \right|^2 - \left( \lambda e^{-\left(\left|\psi\_{j,k}\right| - \lambda\right)/k} \right)^2 \right]^{1/2} & \left| \psi\_{j,k} \right| \ge \lambda\\ a \psi\_{j,k} & \left| \psi\_{j,k} \right| < \lambda \end{cases} \tag{15}$$

where *k* and *a* are adjustable parameters with *k* > 0 and *a* ∈ (0.05, 0.5); *ψ<sub>j,k</sub>* is the original wavelet coefficient and *ψ̂<sub>j,k</sub>* is the thresholded coefficient.

The characteristics of the improved wavelet threshold function are as follows:

(1) When *ψ<sub>j,k</sub>* → *λ*, *e*<sup>−(|*ψ<sub>j,k</sub>*|−*λ*)/*k*</sup> → 1 and *ψ̂<sub>j,k</sub>* → 0, so the improved wavelet threshold function is continuous at the threshold *λ*. When *ψ<sub>j,k</sub>* → ∞, *e*<sup>−(|*ψ<sub>j,k</sub>*|−*λ*)/*k*</sup> → 0 and *ψ̂<sub>j,k</sub>* → *ψ<sub>j,k</sub>*, so the function approaches the identity asymptote, which makes the reconstructed signal closer to the actual value.


The selection of the threshold value has a large impact on the signal denoising effect. The formula for the threshold selection rule used in this paper is expressed as:

$$
\lambda\_i = \frac{\delta \sqrt{2 \ln(N)}}{\ln(i+1)} \tag{16}
$$

where *δ* is the noise intensity, *i* is the decomposition level, and *N* is the signal length. The threshold *λ<sub>i</sub>* varies with the decomposition level: the wavelet coefficients of the noise decrease as the number of decomposition levels increases, so to keep the reconstruction error of the denoised signal small, the threshold *λ<sub>i</sub>* should decrease dynamically as the level increases.
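Equations (15) and (16) translate directly into code. The sketch below is a minimal illustration; the parameter values `k=1.0` and `a=0.1` in the usage are arbitrary examples within the stated ranges.

```python
import numpy as np

def improved_threshold(w, lam, k=1.0, a=0.1):
    """Improved wavelet threshold function of Eq. (15).

    w   : wavelet coefficients psi_{j,k}
    lam : threshold lambda
    k   : decay rate of the exponential correction, k > 0
    a   : small-coefficient scaling factor, a in (0.05, 0.5)
    """
    w = np.asarray(w, dtype=float)
    out = a * w                              # |psi| <  lambda branch
    big = np.abs(w) >= lam
    shrink = (lam * np.exp(-(np.abs(w[big]) - lam) / k)) ** 2
    out[big] = np.sign(w[big]) * np.sqrt(w[big] ** 2 - shrink)
    return out

def level_threshold(delta, i, n):
    """Level-dependent threshold lambda_i of Eq. (16)."""
    return delta * np.sqrt(2 * np.log(n)) / np.log(i + 1)
```

At the threshold the output is exactly zero, large coefficients are nearly unchanged (the asymptote property discussed above), and `level_threshold` shrinks as the decomposition level grows.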

#### *2.2. The Validation of CEEMDAN-DFA-Improved Wavelet Threshold Function Denoising Algorithm*

The operational flow of the CEEMDAN-DFA-improved wavelet threshold function denoising algorithm is shown in Figure 1. Firstly, the CEEMDAN is used to decompose the vibration signal of the rolling bearing to obtain a series of IMF components. Then, the IMF is analyzed by DFA, and the scaling function values of each order of IMF are calculated. According to the correlation of DFA, the IMF component dominated by noise is selected, and the improved wavelet threshold function is used for denoising. Finally, the IMF component after denoising and the IMF component dominated by useful signals are reconstructed, and the reconstructed signal is the bearing vibration signal after denoising.

The verification data for the denoising algorithm come from the rolling bearing database provided by the Electrical Engineering Laboratory of Case Western Reserve University (CWRU) [35]. The validation data selected for this paper were the inner-ring fault signal of a bearing with a damage size of 0.007 inch. To match the actual working environment, random Gaussian white noise was superimposed on the bearing inner-ring fault signal. Figure 2 shows the time-domain waveform of the noise-containing inner-ring fault signal. The noise-containing signal was first decomposed using CEEMDAN into 12 IMF components, and the scaling function value *α* of each IMF was then calculated by DFA. The calculation results are shown in Table 1. From the table, it can be seen that the scaling function values *α* of the first four IMF components were less than 0.5; therefore, the first four IMF components are noise-dominated, while the remaining IMFs are dominated by useful information. The first four IMF components were denoised and then used, together with the remaining IMFs, for signal reconstruction.

To compare the advantages of the proposed denoising algorithms, four denoising methods were used, including CEEMDAN-DFA, CEEMDAN-DFA-hard thresholding, CEEMDAN-DFA-soft thresholding and the CEEMDAN-DFA-improved wavelet thresholding. The denoising results are shown in Figure 3.

**Figure 1.** The operational flow of the CEEMDAN-DFA-improved wavelet threshold function algorithm.

**Figure 2.** The time domain waveform of the inner-ring fault noise signal.

**Figure 3.** The denoising results of the inner-ring fault signal under the four methods.


**Table 1.** Scalar function values of each order IMF of the bearing inner ring fault signal.

The results show that both the CEEMDAN-DFA and CEEMDAN-DFA-hard threshold denoising methods cause large signal distortion. The signal-to-noise ratio (SNR) and root-mean-square error (RMSE) values of the four denoising methods applied to the inner-ring fault signal are shown in Table 2. The proposed CEEMDAN-DFA-improved wavelet threshold denoising method improved the SNR by 11.4% and reduced the RMSE by 16.2% compared with the CEEMDAN-DFA-wavelet soft threshold denoising method. It can be seen that the denoising method proposed in this paper performs better than the other denoising methods.

**Table 2.** The SNR and RMSE of the inner-ring fault denoising.
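The SNR and RMSE figures of merit used in Table 2 can be computed with the standard definitions below. The paper does not spell out its exact formulas, so these common forms are an assumption.

```python
import numpy as np

def snr_db(clean, denoised):
    # SNR = 10*log10( signal power / residual-error power ), in dB
    err = np.asarray(clean) - np.asarray(denoised)
    return 10 * np.log10(np.sum(np.asarray(clean) ** 2) / np.sum(err ** 2))

def rmse(clean, denoised):
    # Root-mean-square error between the reference and the denoised signal
    return np.sqrt(np.mean((np.asarray(clean) - np.asarray(denoised)) ** 2))
```

A higher SNR and a lower RMSE both indicate a denoised signal closer to the clean reference.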


#### **3. Fault Feature Extraction and Identification Algorithm of the QPSO-MPE-SVM**

*3.1. Basic Algorithm Related to the QPSO-MPE-SVM*

3.1.1. MPE Algorithm

The theoretical foundation of MPE is multiscale analysis combined with permutation entropy. MPE coarse-grains the initial time series to create multiscale time series [35]. The calculation procedure is described as follows:

(1) The initial time series *x*(*i*) is coarse-grained to obtain the coarse-grained series *y<sup>τ</sup><sub>j</sub>*, which is calculated as follows:

$$y\_j^{\tau} = \frac{1}{\tau} \sum\_{i=(j-1)\tau+1}^{j\tau} x\_i, \ 1 \le j \le \frac{N}{\tau} \tag{17}$$

(2) Calculate the multiscale permutation entropy based on the sequence *y<sup>τ</sup><sub>j</sub>*:

$$\text{MPE}(x, \tau, m, \lambda) = \text{PE}\left(y\_j^{\tau}, m, \lambda\right) \tag{18}$$

The MPE calculation requires the signal length *N*, delay time *λ*, embedding dimension *m*, and scale factor *τ*. The MPE can respond very well to the changes of vibration signals, but if the parameters are not selected properly, the permutation entropy values of multiple vibration signals will be overlapped, and thus the fault characteristics cannot be extracted [36].
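The coarse-graining of Equation (17) and the permutation entropy of Equation (18) can be sketched as follows. This is a minimal illustration; the normalization by ln(*m*!) is a common convention and an assumption here, and the parameter defaults are arbitrary.

```python
import math
import numpy as np

def permutation_entropy(x, m=3, delay=1):
    """Normalized permutation entropy of a 1-D series.

    Counts the ordinal patterns of m-dimensional delay vectors and
    returns their Shannon entropy divided by ln(m!), so the result
    lies in [0, 1].
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * delay
    patterns = {}
    for i in range(n):
        pat = tuple(np.argsort(x[i : i + m * delay : delay]))
        patterns[pat] = patterns.get(pat, 0) + 1
    p = np.array(list(patterns.values()), dtype=float) / n
    return float(-np.sum(p * np.log(p)) / math.log(math.factorial(m)))

def coarse_grain(x, tau):
    """Coarse-grained series y_j^tau of Eq. (17)."""
    x = np.asarray(x, dtype=float)
    n = len(x) // tau
    return x[: n * tau].reshape(n, tau).mean(axis=1)

def mpe(x, max_tau=5, m=3, delay=1):
    """Multiscale permutation entropy, Eq. (18): PE of each coarse-grained series."""
    return [permutation_entropy(coarse_grain(x, tau), m, delay)
            for tau in range(1, max_tau + 1)]
```

A perfectly ordered (monotonic) series has zero permutation entropy, while white noise approaches the maximum of 1, which is why overlapping MPE curves from poorly chosen parameters destroy the separability of fault classes.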

#### 3.1.2. QPSO Algorithm

In order to achieve complete separation of the MPE values for different operating conditions of the bearings, the QPSO is introduced in this paper to optimize the initial parameters of the MPE. To find the global optimal solution, the calculation formula is expressed as [37]:

$$m\_{\text{best}} = \frac{1}{M} \sum\_{j=1}^{M} P\_j \tag{19}$$

$$P = rP\_j + (1 - r)P\_g \tag{20}$$

$$L\_j(t\_1 + 1) = P \pm \varsigma \left| m\_{\text{best}} - L\_j(t\_1) \right| \ln(1/u) \tag{21}$$

where *m*<sub>best</sub> is the mean of the particles' personal best positions, *M* is the swarm size, *P<sub>j</sub>* is the personal best position of particle *j*, *P<sub>g</sub>* is the global best position, *ς* is the contraction-expansion factor, and *r* and *u* are random numbers uniformly distributed in (0, 1).
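The update rules of Equations (19)–(21) can be sketched as a minimal QPSO loop. The sphere function used in the usage below is only a stand-in test objective (in the paper, the skewness-based fitness of Equation (24) would be minimized/maximized instead), and the swarm size, iteration count, and *ς* value are illustrative assumptions.

```python
import numpy as np

def qpso_minimize(f, dim=2, n_particles=20, iters=100, bound=5.0, zeta=0.75, seed=0):
    """Minimal QPSO loop implementing Eqs. (19)-(21) to minimize f."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-bound, bound, (n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        mbest = pbest.mean(axis=0)                       # Eq. (19)
        for j in range(n_particles):
            r = rng.uniform(size=dim)
            u = rng.uniform(size=dim)
            attractor = r * pbest[j] + (1 - r) * gbest   # Eq. (20)
            sign = np.where(rng.uniform(size=dim) < 0.5, -1.0, 1.0)
            # Eq. (21): quantum-behaved position update around the attractor
            pos[j] = attractor + sign * zeta * np.abs(mbest - pos[j]) * np.log(1 / u)
            val = f(pos[j])
            if val < pbest_val[j]:
                pbest_val[j], pbest[j] = val, pos[j].copy()
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())
```

Unlike classical PSO, the particle has no velocity term; the |*m*<sub>best</sub> − *L<sub>j</sub>*| ln(1/*u*) factor shrinks automatically as the swarm contracts around the attractor.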

The fitness function has a large impact on the optimization results of QPSO when studying the trend of the data.

The MPE values of the time series *x*(*i*) form a sequence *H<sub>P</sub>*(*X*), described as follows:

$$H\_P(X) = \{H\_P(1), H\_P(2), \dots, H\_P(n)\}\tag{22}$$

Skewness formula is described as follows:

$$\text{ske} = \frac{E\left[H\_P(X) - H\_P^m(X)\right]^3}{\left[H\_P^d(X)\right]^3} \tag{23}$$

where E is the expectation operator; *H<sub>P</sub><sup>m</sup>*(*X*) and *H<sub>P</sub><sup>d</sup>*(*X*) are the mean and standard deviation of *H<sub>P</sub>*(*X*), respectively.

The fitness function about QPSO is described as follows:

$$F(X) = \frac{1}{ske^2 + 1} \tag{24}$$

The skewness value is inversely related to the fitness function; therefore, by finding the minimum of the skewness, we obtain the maximum fitness value. Using this value to optimize the MPE parameters makes the distinction between the permutation entropy values of different fault types more obvious and makes fault diagnosis and classification easier.
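Equations (23) and (24) translate directly into code; a minimal sketch:

```python
import numpy as np

def skewness(h):
    """Skewness of an MPE sequence H_P(X), Eq. (23)."""
    h = np.asarray(h, dtype=float)
    return float(np.mean((h - h.mean()) ** 3) / h.std() ** 3)

def qpso_fitness(h):
    """Fitness of Eq. (24): maximal (= 1) when the MPE sequence has zero skew."""
    return 1.0 / (skewness(h) ** 2 + 1.0)
```

A symmetric MPE sequence gives zero skewness and hence the maximum fitness of 1; any asymmetry reduces the fitness, which is what QPSO exploits when tuning the MPE parameters.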

#### 3.1.3. SVM Algorithm

Assume that {(*x<sub>i</sub>*, *y<sub>i</sub>*) | *i* = 1, 2, · · ·, *k*} is the input training set, where *k* is the number of training samples, *x<sub>i</sub>* ∈ **R**<sup>d</sup> is the d-dimensional feature vector, and *y<sub>i</sub>* ∈ {−1, 1} denotes the category of sample *x<sub>i</sub>*.

In order to maximize the sample interval, the optimization formula was constructed and expressed as follows [38]:

$$\begin{array}{c} \min\_{w,b} \frac{||w||^2}{2} + C \sum\_{i=1}^{k} \xi\_i \\ \text{s.t. } y\_i[(w \cdot x\_i) + b] \ge 1 - \xi\_i, \ i = 1, 2, \dots, k \end{array} \tag{25}$$

where *w* is the weight vector of the optimal classification surface and *C* (> 0) is the penalty parameter of the slack term *ξ<sub>i</sub>*; *ξ<sub>i</sub>* measures the error incurred when a sample point is misclassified.

The above nonlinear planning problem was transformed into a linear planning problem using the Lagrangian equation. The Lagrangian equation fuses the objective function with the constraint function and then finds the optimum value of that equation, which is the optimal classification surface. The classification decision function was obtained as:

$$f(\mathbf{x}) = \text{sgn}\left(\sum\_{i=1}^{k} \alpha\_i y\_i \mathcal{K}(\mathbf{x}\_i, \mathbf{x}) + b^\*\right) \tag{26}$$

where *α<sub>i</sub>* is a Lagrange multiplier and *K*(*x<sub>i</sub>*, *x*) is a kernel function satisfying the Mercer condition, which is denoted as follows:

$$K(x, y) = \exp\left(-\frac{||x - y||^2}{2g^2}\right) \tag{27}$$

where *g* indicates the complexity of the subspace distribution of the sample data.
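Equations (26) and (27) can be sketched as follows. The support vectors, multipliers, and bias in the usage example are hand-set toy values, not trained ones; in practice the *α<sub>i</sub>* and *b*\* come from solving the dual of Equation (25).

```python
import numpy as np

def rbf_kernel(x, y, g=1.0):
    """Gaussian (RBF) kernel of Eq. (27)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.exp(-np.sum((x - y) ** 2) / (2 * g ** 2)))

def svm_decision(x, support_x, support_y, alphas, b, g=1.0):
    """Decision function of Eq. (26), given trained multipliers alpha_i and bias b*."""
    s = sum(a * yi * rbf_kernel(xi, x, g)
            for a, yi, xi in zip(alphas, support_y, support_x))
    return int(np.sign(s + b))
```

The kernel equals 1 when the two points coincide and decays with their distance at a rate set by *g*, which controls the complexity of the induced decision surface.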

#### *3.2. The Validation of the QPSO-MPE-SVM Algorithm*

The process of QPSO-MPE-SVM fault feature extraction and identification is shown in Figure 4, which is as follows.

**Figure 4.** QPSO-MPE-SVM fault identification flow chart.


The correlation coefficient value is proportional to the fault-information content of the IMF component. The kurtosis value reflects the amount of impact information in the IMF component: the larger the kurtosis value, the more impact information the IMF component contains, and the smaller the kurtosis value, the less impact information it contains.
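The selection rule described above (correlation coefficient above 10% and kurtosis above 3, as used later in this section) can be sketched as follows; the thresholds are taken from the text, while the test signals are synthetic.

```python
import numpy as np

def kurtosis(x):
    """Pearson kurtosis E[(x-mu)^4] / sigma^4 (about 3 for Gaussian data)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x - x.mean()) ** 4) / x.var() ** 2)

def select_imfs(signal, imfs, corr_min=0.1, kurt_min=3.0):
    """Keep the indices of IMF components whose correlation with the raw
    signal exceeds corr_min and whose kurtosis exceeds kurt_min."""
    keep = []
    for j, imf in enumerate(imfs):
        corr = abs(np.corrcoef(signal, imf)[0, 1])
        if corr > corr_min and kurtosis(imf) > kurt_min:
            keep.append(j)
    return keep
```

An impulsive (fault-like) component passes both tests, while a smooth sinusoidal component is rejected: the kurtosis of a pure sine is 1.5, below the Gaussian reference of 3.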

The validation data came from the rolling bearing database provided by the Electrical Engineering Laboratory of Case Western Reserve University (CWRU) [35]. The correlation coefficient and kurtosis values of each IMF component of the four bearing signals (normal signal, inner-ring fault signal, outer-ring fault signal, and rolling-element fault signal) at 1797 rpm were calculated, and the results are shown in Figure 5.

**Figure 5.** Plots of IMF component correlation coefficients and Kurtosis values. (**a**) The correlation coefficients of IMF; (**b**) The Kurtosis values of IMF.

From Figure 5, it can be seen that the correlation coefficients of the four vibration signals differed; the correlation coefficients of the first five IMFs of the four fault signals were all above 10%, and their kurtosis values were all greater than 3. Therefore, the first five IMF components were selected for signal reconstruction.

To verify that the bearing fault characteristic information can be accurately extracted from the first five IMFs, the inner-ring fault signal was taken as an example. CEEMDAN decomposed the inner-ring fault signal, and the first five IMF components were selected for signal reconstruction. The reconstructed signal was then analyzed by the Hilbert-Huang transform (HHT) envelope spectrum. The envelope spectra of the original signal and the reconstructed signal are shown in Figure 6.

**Figure 6.** The HHT envelope spectrum of the inner-ring fault signal. (**a**) The envelope spectrum analysis of the original signal; (**b**) The envelope spectrum analysis of the reconstructed signal.

The theoretical value of the characteristic frequency of the inner-ring fault signal is 162.1 Hz, and the actual value is 161.9 Hz. An error of no more than 10% between the theoretical and actual frequency values can be taken to indicate the same fault. From the original signal envelope spectrum shown in Figure 6a, it can be seen that there were other interference peaks near the peak of the fault characteristic frequency (161.9 Hz), so the fault characteristic frequency could not be extracted effectively. In the HHT envelope spectrum of the reconstructed signal shown in Figure 6b, the characteristic frequency (161.9 Hz) can be easily extracted, and the double frequency (323 Hz) and triple frequency (484.9 Hz) of the fault features are also easy to extract.

In summary, it was proved that the first five IMF components retain the fault feature information, which further verifies the effectiveness of the proposed denoising method. This scheme can realize real-time processing of fault signals and can be executed cyclically.

#### 3.2.1. Optimize MPE Values Using QPSO

The initial parameters of MPE were set to *N* = 1024, *λ* = 1, *m* = 6, and *τ* = 12. The first five IMF components of the normal signal, inner-ring fault signal, outer-ring fault signal, and rolling-element fault signal were selected for signal reconstruction. The MPE values of the reconstructed signals were calculated, and the results are shown in Figure 7. It can be seen that the MPE values of the four signals were not distinguishable, so they could not be used as an effective feature set for fault classification.

**Figure 7.** The MPE values without QPSO optimization.

To separate the MPE values of the four operating conditions of the bearings, the QPSO was used to optimize the initial MPE parameters. The parameters of QPSO were set to 30 for the swarm size, 200 for the maximum number of iterations, and 10 and 0.2 for the maximum and minimum inertia weights, respectively. The optimized MPE parameters are shown in Table 3. The MPE values were calculated using the optimized parameters, and the results are shown in Figure 8. It can be seen that the MPE values of the four signals were separated from each other, which allows a better fault feature set to be constructed. In this paper, the scale factors with stable and large separation were selected to construct the fault feature vector, so the selected scale factors were 5, 6, 7, 8, and 9.

**Figure 8.** The MPE values optimized after QPSO.


**Table 3.** Parameters of MPE after QPSO optimization.

#### 3.2.2. Fault Feature Extraction and Identification Using the QPSO-MPE-SVM

Sixty sets of vibration data were selected for each of the four operating conditions of the bearings, for a total of 240 sets. These data were divided equally into two groups, one for the training set and the other for the test set. Two fault-identification experiments were conducted: one with the initial MPE parameters, whose results are shown in Figure 9, and the other with the MPE parameters optimized by QPSO, whose results are shown in Figure 10. The labels in the figures are 1 for the normal state, 2 for the inner-ring fault state, 3 for the outer-ring fault state, and 4 for the rolling-element fault state.

**Figure 9.** MPE-SVM fault identification results.

**Figure 10.** QPSO-MPE-SVM fault identification results.

Analysis of Figures 9 and 10 reveals that when the MPE values without optimization were directly input to the SVM for fault identification, the accuracy was 85.83%, whereas the accuracy reached 97.5% after optimization by the QPSO. To verify the reliability of the proposed method, the simulation experiments were repeated several times, and the results consistently showed that the proposed method can effectively improve the accuracy of bearing fault diagnosis. Thus, the effectiveness of the proposed QPSO-MPE-SVM method is confirmed.
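The classification stage feeds the MPE feature vectors to an SVM. As a self-contained illustration of the principle (a toy linear SVM trained by subgradient descent on the hinge loss, not the authors' kernelized implementation), consider:

```python
import random

def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01, seed=0):
    """Tiny linear SVM trained by subgradient descent on the hinge loss.
    X: list of feature vectors, y: labels in {+1, -1}."""
    rng = random.Random(seed)
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            for d in range(dim):
                # L2 regularization always; hinge gradient only inside the margin
                grad = lam * w[d] - (y[i] * X[i][d] if margin < 1 else 0.0)
                w[d] -= lr * grad
            if margin < 1:
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    """Sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In practice the four bearing states form a multi-class problem, typically handled by one-vs-one or one-vs-rest combinations of such binary classifiers.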

#### **4. Fault Diagnosis of Rolling Bearing of Sine Roller Screen Based on Vibration Signal**

The experimental platform is shown in Figure 11. The platform performs the tasks of installing sensors, driving bearings, applying radial loads, and acquiring vibration signals. The test bearing was a grease-lubricated deep-groove ball bearing, and the rolling bearing parameters are shown in Table 4. The working limit temperature of the bearing is 400 °C. The rotational speed is 10–800 r/min, with a limit speed of 2000 r/min; the speed can be continuously adjusted, and the error is less than ±1%. The applied radial load is 120 kg, and the ultimate load is 500 kg; the load sensor is model MZLF-2 with a sensitivity of 2.0 ± 0.01 mV/V. The acceleration sensor is model KH-HZD-B-2-12 with a sensitivity of 20 ± 0.05 mV/mm/s. The data acquisition card is a USB3200N (32 bit, 4 channels, 20 MHz). The vibration sensor (MIC-HZB-F-2-12) collected the vibration acceleration of the bearing in real time and transferred the vibration signal to the data-acquisition card, from which the signal was finally transferred to the computer for data processing. All experimental data were within the allowable error range.

**Figure 11.** Sine roller screen rolling bearing fault experimental platform.

**Table 4.** Rolling bearing parameters.


*4.1. Rolling Bearing Feature Frequency Calculation*

The rolling bearing speed is 500 r/min. The theoretical fault characteristic frequencies are given by the following formulas:

(1) Journal rotation frequency:

$$f_r = \frac{v}{60} \tag{28}$$

(2) Inner-ring fault characteristic frequency:

$$f_a = \frac{nf_r}{2} \left( 1 + \frac{d}{D} \cos \theta \right) \tag{29}$$

(3) Outer-ring fault characteristic frequency:

$$f_b = \frac{nf_r}{2} \left( 1 - \frac{d}{D} \cos \theta \right) \tag{30}$$

(4) Rolling-element fault characteristic frequency:

$$f_d = \frac{Df_r}{2d} \left[ 1 - \left(\frac{d}{D}\right)^2 \cos^2\theta \right] \tag{31}$$

where *v* is the journal rotation speed (r/min), *n* is the number of rolling elements, *d* is the rolling-element diameter, *D* is the pitch diameter, and *θ* is the contact angle.

The calculation results by the formulas are shown in Table 5.

**Table 5.** The fault feature frequency of rolling bearing.
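Equations (28)–(31) can be evaluated directly with a short helper. The geometry values used in the test are hypothetical placeholders, since the Table 4 parameters are not reproduced in the text:

```python
import math

def bearing_fault_freqs(v, n, d, D, theta_deg):
    """Fault characteristic frequencies from Equations (28)-(31).
    v: shaft speed (r/min), n: number of rolling elements,
    d: rolling-element diameter, D: pitch diameter, theta_deg: contact angle."""
    fr = v / 60.0                                   # Eq. (28): journal rotation frequency
    ratio = (d / D) * math.cos(math.radians(theta_deg))
    fa = n * fr / 2 * (1 + ratio)                   # Eq. (29): inner-ring fault
    fb = n * fr / 2 * (1 - ratio)                   # Eq. (30): outer-ring fault
    fd = D * fr / (2 * d) * (1 - ratio ** 2)        # Eq. (31): rolling-element fault
    return fr, fa, fb, fd
```

A useful sanity check is that the inner- and outer-ring frequencies always sum to *n·f<sub>r</sub>*, which follows directly from adding Equations (29) and (30).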


#### *4.2. Rolling Bearing Fault Signal Denoising and Feature Extraction*

Taking the inner-ring fault signal as an example, the original vibration signal and its HHT envelope spectrum are shown in Figures 12 and 13, respectively. The signal was denoised using the CEEMDAN-DFA-improved wavelet threshold method.

**Figure 12.** Time domain diagram of the original signal of the inner-ring fault.

**Figure 13.** Original signal envelope spectrum of the inner-ring fault.

The IMFs were generated by CEEMDAN decomposition, and the scaling function value *α* of each IMF was calculated. The *α* values of the first eight IMFs are shown in Table 6.

**Table 6.** Scaling function value of each IMF component of the inner-ring signal.


According to the DFA correlation analysis, the first four IMFs were judged to be dominated by noise and therefore needed to be denoised. The time-domain signal of the inner-ring fault after denoising is shown in Figure 14. The denoised IMFs and the remaining IMFs were merged for reconstruction, and the reconstructed signal is the fault signal of the bearing inner ring.

**Figure 14.** Time domain diagram of the inner-ring fault signal after denoising.

The denoised signal was decomposed again by CEEMDAN, and the correlation coefficient and kurtosis value of each IMF were calculated. The results are shown in Table 7.

**Table 7.** Correlation coefficients and kurtosis values of IMFs of the inner-ring signal.


From Table 7, it can be found that the correlation coefficients of the first five IMF components were greater than 0.1 and their kurtosis values were greater than three. Therefore, the first five IMF components were used to reconstruct the signal, and the HHT envelope spectrum of the reconstructed signal was analyzed. The result is shown in Figure 15. It can be seen that the characteristic frequency of the inner-ring fault signal was 39.55 Hz, which is close to the theoretically calculated value (40.143 Hz).

**Figure 15.** Envelope spectrum of the inner-ring fault signal after denoising.
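The selection rule behind Table 7 (correlation coefficient above 0.1 and kurtosis above 3) can be sketched as follows; the thresholds are those stated above, while the helper functions themselves are only an illustration:

```python
import math

def corrcoef(a, b):
    """Pearson correlation coefficient between an IMF and the original signal."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def kurtosis(a):
    """Kurtosis (non-excess): E[(x - mu)^4] / sigma^4; about 3 for Gaussian
    noise, and noticeably larger for impulsive fault signals."""
    n = len(a)
    mu = sum(a) / n
    var = sum((x - mu) ** 2 for x in a) / n
    return sum((x - mu) ** 4 for x in a) / n / var ** 2

def select_imfs(imfs, signal, r_min=0.1, k_min=3.0):
    """Keep the indices of IMFs whose correlation with the signal exceeds
    r_min and whose kurtosis exceeds k_min."""
    return [i for i, imf in enumerate(imfs)
            if corrcoef(imf, signal) > r_min and kurtosis(imf) > k_min]
```

The kurtosis threshold of three reflects the Gaussian baseline: components dominated by smooth or Gaussian content fall at or below it, while impulsive fault-bearing components rise well above it.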

The initial parameters of the MPE of the four running signals were all set to *N* = 1024, *λ* = 1, *m* = 6, and *τ* = 12. The MPE values were calculated using these initial parameters, and the results are shown in Figure 16a. It can be seen that the MPE values of the four signals differed little and overlapped with each other, which is not conducive to constructing the fault feature set.

**Figure 16.** The MPE values of the four running signals. (**a**) The MPE values using the initial parameters; (**b**) The MPE values using the optimized parameters.

The initial parameters were optimized using the QPSO, and the results are shown in Table 8.


**Table 8.** The optimized parameters of MPE after QPSO.

The MPE values of the four running signals were calculated using the optimized parameters, and the results are shown in Figure 16b. It can be seen that the MPE values of the four signals achieved separation without overlap, making them suitable for constructing fault feature sets for fault-type identification. The MPE values with scale factors 5, 6, 7, 8, 9, and 10 were selected to construct the fault feature vectors.

#### *4.3. Rolling Bearing Failure Identification and Control Experiment*

A total of 160 sets of experimental vibration data were selected, 40 sets for each of the four running conditions; 100 sets were used as training samples and the remaining 60 as test samples. Four test configurations were compared, with the recognition results shown in Figure 17: the fault identification result of MPE-SVM on the original signal is shown in Figure 17a; the result of QPSO-MPE-SVM on the original signal in Figure 17b; the result of MPE-SVM on the signal denoised by the CEEMDAN-DFA-improved wavelet threshold function in Figure 17c; and the result of QPSO-MPE-SVM on the denoised signal in Figure 17d.

The following conclusions can be drawn from the analysis of Figure 17.

When the MPE-SVM method was used for fault identification on the original (non-denoised) signal, there were 23 false identifications and the identification accuracy was only 61.67%. When the QPSO-MPE-SVM method was used on the same signal, there were 17 false identifications and the accuracy was 71.67%, which shows that the noise in the collected vibration signal strongly interferes with bearing fault diagnosis.

**Figure 17.** Four types of test sets identify results. (**a**) The results of MPE-SVM fault identification on the original signal; (**b**) the results of the QPSO-MPE-SVM fault identification on the original signal; (**c**) the results of MPE-SVM fault identification on the denoised signal; (**d**) the results of QPSO-MPE-SVM fault identification on the denoised signal.

When the MPE-SVM method was used on the denoised signal, the number of false identifications was reduced to 10 and the accuracy increased to 83.33%. When the QPSO-MPE-SVM method was used on the denoised signal, there were only three false identifications and the accuracy reached 95%.

By applying the proposed comprehensive diagnosis method to the rolling bearing experimental platform, fault diagnosis of the measured inner-ring fault signal was achieved: denoising and fault-feature extraction were completed, and the fault feature set was constructed for fault identification. The experimental results showed that the fault identification accuracy reached 95%. Several experiments verified that the method not only has high accuracy and reliability but is also not limited to inner-ring fault diagnosis; it is applicable to other nonstationary, nonlinear vibration signals, such as outer-ring fault signals. Therefore, the method has good practical application value.

#### **5. Conclusions**

This paper proposed a CEEMDAN-DFA-improved wavelet threshold denoising method and a QPSO-MPE-SVM fault feature extraction and identification method to realize fault diagnosis from rolling-bearing vibration signals.

The denoising process with the CEEMDAN-DFA-improved wavelet threshold method is as follows: Firstly, the vibration signal is decomposed into IMFs by CEEMDAN, and DFA is performed on the IMF components. Then, the scaling function of each IMF component is calculated to select the noise-dominated IMF components. Finally, the improved wavelet threshold function is applied to denoise the noise-dominated IMFs, which are merged with the remaining IMFs to obtain the reconstructed signal. The validation results showed that, compared with traditional denoising methods, the proposed CEEMDAN-DFA-improved wavelet threshold method could better remove noise, effectively reduce signal distortion, and preserve the original characteristics of the signal.

The fault feature extraction and identification with the QPSO-MPE-SVM method are as follows: Firstly, the reconstructed signal is decomposed again by CEEMDAN, and the correlation coefficient and kurtosis value of each IMF component are calculated; the IMF components with larger correlation coefficients and kurtosis values are selected for signal reconstruction, and HHT envelope spectrum analysis is performed on the reconstructed signal to extract the fault characteristic frequencies. Then, the initial parameters of MPE are optimized with QPSO, the MPE values of the reconstructed signal are calculated, and appropriate MPE values are selected to construct the rolling-bearing fault feature vectors. Finally, the fault feature vectors are input to the trained SVM for rolling-bearing fault-type identification. The validation results showed that optimizing the MPE parameters with QPSO separates the MPE values of the four signals without overlap, making them more suitable for constructing fault feature sets for fault-type identification.

The algorithms proposed in this paper were validated on a purpose-built rolling bearing experimental platform. The experimental results showed that the fault identification accuracy of rolling bearings could reach 95%; the method not only has high accuracy and reliability but is also applicable to nonstationary and nonlinear vibration signals, giving it good practical application value.

**Author Contributions:** Conceptualization, Y.W. (Yi Wang); data curation, Y.W. (Yi Wang); formal analysis, Y.W. (Yi Wang); funding acquisition, X.C.; Investigation, C.X.; methodology, Y.W. (Yi Wang); project administration, X.C.; resources, X.C.; software, Y.W. (Yu Wang) and C.X.; supervision, X.C. and C.X.; validation, Y.W. (Yu Wang) and C.X.; writing—original draft, Y.W. (Yi Wang); writing—review and editing, Y.W. (Yu Wang), X.C. and C.X. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China Program (No.62073198), the Major Research Development Program of Shandong Province of China (No.2016GSF117009).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** https://csegroups.case.edu/bearingdatacenter/pages/12k-drive-end-bearing-fault-data accessed on 30 August 2021.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


*Article* **Sensor and Actuator Fault Diagnosis for Robot Joint Based on Deep CNN**

**Jinghui Pan 1,\*, Lili Qu <sup>2</sup> and Kaixiang Peng <sup>1</sup>**


**Abstract:** This paper proposes a data-driven fault diagnosis method using a deep convolutional neural network (DCNN). The DCNN is used to deal with sensor and actuator faults of robot joints, such as gain error, offset error, and malfunction for both sensors and actuators, and different fault types are diagnosed using the trained neural network. To achieve this, the fused data of sensors and actuators are used, so that both types of fault are described in one formulation. Then, the deep convolutional neural network is applied to learn characteristic features from the merged data in order to find discriminative information for each kind of fault. After that, the fully connected layer performs prediction based on the learned features. To verify the effectiveness of the proposed deep convolutional neural network model, different fault diagnosis methods, including the support vector machine (SVM), artificial neural network (ANN), convolutional neural network (CNN) based on LeNet-5, and long-term memory network (LTMN), are investigated and compared with the DCNN method. The results show that the DCNN fault diagnosis method achieves high fault recognition accuracy while requiring less model training time.

**Keywords:** fault diagnosis; sensor fault; actuator fault; deep convolutional neural network; robot joints

### **1. Introduction**

With the development of robot and control technology, various robots are widely used in industry. Different applications present specific requirements for robot systems, such as rapidity, robustness, and safety [1–3]. However, among all the indices required by applications, the controllability of the robot in a fault state has become the most critical factor. Fault identification is the precondition for achieving this goal, which motivates our research [4–6].

Robot systems cannot work without the support of different kinds of sensors and actuators, whose development demands miniaturization and multi-functionality. The rapid development of sensors, material science, and micro-electro-mechanical technology allows modern robot joint modules (such as the hollow motor, servo driver, harmonic reducer, brake, and encoder) to be integrated within a limited space [7]. Sensors and actuators are key components of the robot system, but their working environment is complex, with electromagnetic interference, vibration, etc., which affects the outputs of the sensors and, in turn, the actuators. Moreover, the variable load on manipulators is also a challenge for system state feedback or estimation. All of the above factors make fault diagnosis of robot system sensors and actuators an urgent task [8].

**Citation:** Pan, J.; Qu, L.; Peng, K. Sensor and Actuator Fault Diagnosis for Robot Joint Based on Deep CNN. *Entropy* **2021**, *23*, 751. https://doi.org/10.3390/e23060751

Academic Editor: José A. Tenreiro Machado

Received: 4 May 2021; Accepted: 8 June 2021; Published: 15 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In most robot system faults, sensor and actuator malfunctions are the main causes of robot system failure; therefore, diagnosis of the sensors and actuators is very important. To improve the reliability of robot joints and realize fault detection and fault-tolerant control of robot systems, researchers have focused on fault detection and fault-tolerant control of robot joints for many years, and many practical fault diagnosis methods have been proposed. In [9], redundant sensors are used on the robot joint, and fuzzy rules are designed to adjust the threshold of the fault signal adaptively to carry out fault diagnosis. In [10], for a six-degree-of-freedom robot joint system, low-cost MEMS magnetic, angular velocity, and gravity sensors are used to estimate the joint angle of a rotating manipulator. In [11], a discrete-time framework for fault diagnosis of robot joint sensors, force or torque sensors, and actuators is proposed: redundant sensors are mounted on the robot joint, and the feedback values from the redundant sensors and the estimated values calculated by two isolation observers are input into the fault decision system. The data from the redundant sensors provide information for a group of diagnostic observers to detect, isolate, and identify faults of joint actuators and force or torque sensors.

However, there may be another consideration when using redundant sensors for fault diagnosis. A robot fault diagnosis system based on redundant sensors not only increases structural complexity, but also increases the hardware cost of the system. In addition, redundant sensors also increase the probability of a sensor fault when the running time of a robot system approaches the sensor's life cycle.

To overcome the shortcomings of using redundant sensors for fault diagnosis, observers have been widely used, and many novel theories are available for designing state observers for robot fault diagnosis. A robot fault diagnosis method using fuzzy logic is proposed in [12] to evaluate residuals; fuzzy logic applied to robot fault diagnosis does not require any redundant sensors, but it relies on a fault model of the robot system. The sliding mode method is ubiquitous in robot fault diagnosis. Daniele uses a second-order sliding mode observer for fault detection and isolation of the rigid manipulator of the COMAU robot, applying the suboptimal second-order sliding mode algorithm to design the input law of the proposed observer, which can detect a single fault on a specific brake or a specific sensor of the manipulator [13]. Since the high-order sliding mode observer can detect possible faults in specific parts of the robot system, the sliding mode method has been greatly extended [14]. The observer design methods mentioned above are just some typical representatives; many other methods can be used for robot fault diagnosis, such as the output feedback method [15], the nonlinear disturbance observer [16], and the feedback linearization disturbance observer design method [17]. As is well known, the difficulty of observer-based robot fault diagnosis lies in the gain design process [18].

Machine learning introduces an effective solution to the above problems caused by redundant sensors and observers. Typical application methods include, but are not limited to, genetic algorithm [19], support vector machine [20], cluster analysis [21], and neural network [22]. Among them, the neural network is widely used in the field of fault diagnosis because of its superior nonlinear fitting ability. Traditional methods of fault diagnosis manually realize feature extraction, so prior knowledge about fault information is needed, which increases the difficulty of analyzing the results. Neural networks, especially deep learning methods, can learn representations and patterns hierarchically from the input data, and realize effective feature extraction, so the deep learning method has the ability to model complex working conditions and output accurate predictions. Several typical deep learning methods have been successfully applied to fault diagnosis [23–26], including autoencoders [27], deep belief networks [28], and CNN [29]. The autoencoders and feature ensemble method is applied in actuator fault diagnosis [30]. Furthermore, the one-layer autoencoder-based neural network is proven to be effective in the task of fault classification [31]. The deep belief nets model is successfully applied for fault diagnosis in actuators using vibration signals [32]. One-dimensional CNN is used to analyze the raw time series data of the actuator and proved to be successful in diagnosing fault states [33], and a new CNN architecture based on LeNet-5 is set to process the bearing data set [34].

Considering that the outputs of the sensor and actuator are similar when faults occur, ordinary neural network fault diagnosis methods cannot reliably tell the difference between them. In this paper, the DCNN is used to diagnose sensor and actuator faults of robot joints. The DCNN can extract features from the input data and realize fault classification by increasing the depth of the network. In addition, the flexible selection of the convolution kernel width makes it an efficient way to deal with classification problems with weak characteristics. In practice there may be many types of sensors and actuators; our research mainly focuses on fault diagnosis of the position sensor of the robot joint and the torque sensor of the actuator. The robot joint is forced to move in a sinusoidal trajectory under the control of the actuator, and the position sensor feeds back corresponding signals under different sensor states. In the main text below, "sensor" and "actuator" denote the position sensor and torque sensor, respectively. The main contributions of this paper are as follows.


The rest of this paper is organized as follows: Section 2 introduces the basic structure of the DCNN, Section 3 introduces the neural network training method based on the deep CNN, Section 4 presents the simulation experiments and compares the results, and the conclusions are given at the end.

#### **2. Basic Structure of DCNN**

Generally, a CNN consists of five parts: the convolutional layer, batch normalization, the activation layer, the pooling layer, and the dropout layer, as shown in Figure 1. In the CNN fault diagnosis architecture, each layer plays a different role. The remainder of this section briefly introduces the function of each layer.

**Figure 1.** Diagram for CNN fault diagnosis system.

#### *2.1. Convolution Layer*

The convolution layer is one of the most important parts of a CNN. It performs effective feature extraction through the convolution operation, whose expression in discrete form is as follows.

$$S(l, k) = X_l^k \ast W_l^k + b_l \tag{1}$$

where *S*(*l*, *k*) is the output of the convolution kernel, *X<sub>l</sub><sup>k</sup>* is its input, *W<sub>l</sub><sup>k</sup>* is the kernel function, and *b<sub>l</sub>* is the bias term. The numbers *l* and *k* are the serial numbers of the layer and convolution kernel, respectively. The operator ∗ in the above equation denotes the sum of the element-wise products. Figure 2 gives an example of a convolution operation with a kernel dimension of 2 × 2. The 4 × 4 input is divided into 9 subsets when the sliding step is 1, each subset having the same dimensions as the kernel. The element-wise multiplication between the kernel and the subsets generates a 3 × 3 matrix, called the output of the convolution kernel.

**Figure 2.** Diagram for convolution calculation.
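The 4 × 4 input, 2 × 2 kernel, stride-1 example of Figure 2 can be reproduced in a few lines; this is an illustrative valid cross-correlation, which is what the element-wise multiply-and-sum in Equation (1) describes:

```python
def conv2d(x, k, stride=1):
    """Valid 2-D convolution (multiply-accumulate per Eq. (1), no padding).
    A 4x4 input with a 2x2 kernel and stride 1 yields a 3x3 output."""
    kh, kw = len(k), len(k[0])
    oh = (len(x) - kh) // stride + 1       # output height
    ow = (len(x[0]) - kw) // stride + 1    # output width
    return [[sum(x[i * stride + a][j * stride + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]
```

The output dimension formula `(input - kernel) / stride + 1` is the general rule the text's 4 → 3 example instantiates.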

#### *2.2. Batch Normalization*

Batch normalization is one of the most computation-intensive layers in the CNN architecture. It is also a widely used method to accelerate and stabilize the CNN training process. Its main purpose is to force the input data back to a standard normal distribution with a mean of zero and a variance of one, so that the input of the nonlinear transformation function falls into its sensitive region, which avoids gradient vanishing.

The input of batch normalization comes from the output of the convolution kernel, and the output *y<sub>i</sub>* of batch normalization is given by the following equation:

$$y_i = \gamma \hat{x}_i + \beta \tag{2}$$

where the gain *γ* and the bias term *β* are used to accelerate the convergence process, and *x̂<sub>i</sub>* is a function of the mean value *u<sub>β</sub>* and the standard deviation *σ<sub>β</sub>*, as follows:

$$\hat{x}_i = \frac{1}{\sigma_\beta}(x_i - u_\beta) \tag{3}$$

$$\sigma_\beta = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(x_i - u_\beta\right)^2 + \varepsilon}, \qquad u_\beta = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{4}$$

where *ε* is a very small positive value, *x<sub>i</sub>* is the output of the convolution operation module, and *m* is the length of the data set.
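Equations (2)–(4) amount to the following per-batch computation (a sketch; in practice *γ* and *β* are learned parameters):

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization per Eqs. (2)-(4): standardize, then scale and shift."""
    m = len(xs)
    u = sum(xs) / m                                              # u_beta, Eq. (4)
    sigma = math.sqrt(sum((x - u) ** 2 for x in xs) / m + eps)   # sigma_beta, Eq. (4)
    return [gamma * ((x - u) / sigma) + beta for x in xs]        # Eqs. (2)-(3)
```

With *γ* = 1 and *β* = 0 the output has zero mean and (up to the *ε* term) unit variance, which is exactly the "standard normal" target the text describes.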

#### *2.3. Activation Layers*

A convolutional neural network consists of stacked layers, with two basic parts in each layer: trainable linear convolutional filters and a fixed nonlinear activation function. The activation function used in our research is ReLU [35], and its expression is as follows.

$$g(t) = \max(t, 0) = \begin{cases} 0, & t \le 0 \\ t, & t > 0 \end{cases} \tag{5}$$

The ReLU function in Equation (5) is nonlinear; its derivative is one when *t* > 0 and zero when *t* ≤ 0. The graph of the ReLU function is shown in Figure 3. Using the ReLU function alleviates the gradient vanishing problem compared with the Sigmoid and tanh functions.

**Figure 3.** Graph of ReLU.

#### *2.4. Pooling Layer*

The pooling layer is used to reduce the amount of feature data and to enhance the computational performance of the network. The main pooling methods fall into two categories: maximum value pooling and mean value pooling. Figure 4 shows the basic operation of both methods, where the input dimension is 4 × 4 and the pooling kernel dimension is 2 × 2. The pooling kernel slides over the input data with a step of 2. Maximum value pooling chooses the maximum of the four values in each window (shown in the same color), while mean value pooling takes their average.

**Figure 4.** Operation of pooling process.
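The two pooling variants of Figure 4 can be sketched together (an illustration, not the paper's implementation):

```python
def pool2d(x, size=2, stride=2, mode="max"):
    """Max or mean pooling over size x size windows, as in Figure 4."""
    oh = (len(x) - size) // stride + 1
    ow = (len(x[0]) - size) // stride + 1
    out = []
    for i in range(oh):
        row = []
        for j in range(ow):
            window = [x[i * stride + a][j * stride + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out
```

With a 2 × 2 kernel and stride 2 the windows do not overlap, so a 4 × 4 input collapses to a 2 × 2 output, quartering the feature data as the text describes.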

#### *2.5. Dropout Layer*

To alleviate overfitting of the neural network, some neurons in the hidden layers are temporarily discarded at a certain rate during training, and all neurons are restored at inference. When the discard rate is fifty percent, each neuron has a one-half probability of being removed, so the training of one neuron does not rely on another; the interaction between features is thus weakened, which alleviates the overfitting caused by excessive network depth [36]. The dropout process is shown in Figure 5.

**Figure 5.** Process of dropout.
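A sketch of inverted dropout, the variant that rescales surviving activations during training so that inference needs no change (the text above describes the classic formulation, but the idea is the same):

```python
import random

def dropout(xs, rate=0.5, training=True, seed=None):
    """Inverted dropout: zero each activation with probability `rate` during
    training and rescale the survivors by 1/(1 - rate); identity at inference."""
    if not training:
        return list(xs)
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in xs]
```

The rescaling keeps the expected activation magnitude constant between training and inference, which is why all neurons can simply be "restored when used again".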

#### *2.6. Fully Connected Layer*

The fully connected layer is a classifier that maps the features extracted by the filter modules to the labeled data classes [37]. Its input is the output of the last pooling layer, and each of its neurons is connected to all input neurons. The data on each neuron are processed using the ReLU activation function, and the final output is classified through the Softmax regression method. Taking one-dimensional convolution as an example, the full connection process is shown in Figure 6.

**Figure 6.** Full connection process.

#### *2.7. Loss Function*

An evaluation method for the precision of the model is needed during the training process. To make the prediction outputs of the neural network consistent with the real values, the cross-entropy between them is calculated to evaluate their difference. This cross-entropy is called the loss function, and its expression is as follows.

$$\mathrm{loss\_ce} = -\sum_{n} P(y) \log P(\hat{y}) \tag{6}$$

where *y* denotes the known result, *ŷ* is the output of the network, *P*(*y*) is the probability distribution of the known results, *P*(*ŷ*) is the probability distribution of the network predictions, and loss_ce is the cross-entropy loss. The cross-entropy emphasizes the difference between the probability distributions of each data class and is often used in multi-classification problems of neural networks.

The Softmax regression method makes the outputs of the neural network conform to a probability distribution. Assuming the outputs are divided into *n* classes, (*ŷ*<sub>1</sub>, *ŷ*<sub>2</sub>, . . . , *ŷ<sub>n</sub>*) is obtained; after the Softmax function of Equation (7) is applied, the outputs of the neural network meet the desired probability distribution of Equation (8). The calculation of the loss function is shown in Figure 7.

$$\mathrm{Softmax}(\hat{y}_i) = \frac{e^{\hat{y}_i}}{\sum_{j=1}^{n} e^{\hat{y}_j}} \tag{7}$$

$$\forall y,\; P(\hat{Y} = y) \in [0, 1] \quad \text{and} \quad \sum P(\hat{Y} = y) = 1 \tag{8}$$

**Figure 7.** Calculation of loss function.
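Equations (6)–(8) compose as follows (a sketch; the max-subtraction is a standard numerical-stability step not stated in the text):

```python
import math

def softmax(logits):
    """Eq. (7): exponentiate and normalize so the outputs form a probability
    distribution satisfying Eq. (8)."""
    mx = max(logits)                         # subtract max for numerical stability
    exps = [math.exp(z - mx) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p_true, p_pred):
    """Eq. (6): -sum over classes of P(y) * log P(y_hat)."""
    return -sum(p * math.log(q) for p, q in zip(p_true, p_pred) if p > 0)
```

For a one-hot label, the loss reduces to the negative log-probability the network assigns to the true class, so it is zero only when the prediction is perfectly confident and correct.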

#### **3. Sensor and Actuator Fault Diagnosis Framework Using DCNN**

The proposed robot joint sensor and actuator fault diagnosis framework based on the DCNN is shown in Figure 8. The whole fault diagnosis framework can be divided into data fusion, multi-level feature extraction, dropout of neurons, full connection, and diagnosis output parts. There are six "Blocks" in the framework, and each block consists of a convolution layer, a batch normalization layer, an activation layer, and a pooling layer.

**Figure 8.** Fault diagnosis framework of DCNN.

The convolution layer parameters of Block1 are [32 \* 1, 8, 4], where 32 \* 1 is the dimension of the one-dimensional convolution kernel, 8 is the convolution depth, and 4 is the sliding step. The convolution parameters of Block2 to Block6 are [3 \* 1, 16, 1], [3 \* 1, 32, 1], [3 \* 1, 64, 1], [3 \* 1, 128, 1], and [3 \* 1, 128, 1]. The pooling parameters of Block1 are [2 \* 1, 8, 2], where 2 \* 1 is the dimension of the pooling kernel, 8 is the pooling depth, and 2 is the sliding step. The pooling layer parameters from Block2 to Block6 are [2 \* 1, 16, 2], [2 \* 1, 32, 2], [2 \* 1, 64, 2], [2 \* 1, 128, 2], and [2 \* 1, 128, 2]. The pooled output of Block6 is used as the input of the dropout layer, whose rejection rate is set to 50% to slow down overfitting of the model. The number of neurons in the fully connected layer is 100, and they are connected to 10 output categories.

#### *3.1. Data Fusion*

This paper addresses the fault diagnosis problem of the robot sensor and actuator, so the input data contain several kinds of fault information. Therefore, a data fusion scheme is adopted to unify the sensor fault and actuator fault in one expression, so that the output of the fused model contains both sensor and actuator fault characteristics. The first step is to establish the mathematical models of the sensor and actuator, respectively. According to the laws of mechanics, the mathematical model of the actuator is as follows.

$$S_1: \begin{cases} \dot{x} = Ax + Bf(q, \dot{q}, \tau) \\ y = Ex \end{cases} \tag{9}$$

where,

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad E = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad f(q, \dot{q}, \tau) = D^{-1}(q)\left[\tau - C(q, \dot{q})\dot{q} - G(q)\right],$$

$x = [x_1 \;\; x_2]^T = [q \;\; \dot{q}]^T$, and *q*, *q̇* ∈ *R<sup>n</sup>* are the state variables of angular position and angular velocity.

From Equation (9), the torque equation of the actuator can be obtained, as follows.

$$D(q)\ddot{q} + \mathcal{C}(q, \dot{q})\dot{q} + \mathcal{G}(q) = \rho \tau + f\_a \tag{10}$$

where *τ* is the torque vector with dimension *n*; *D*(*q*) and *C*(*q*, *q̇*) are square matrices of dimension *n*, denoting the inertia matrix and the Coriolis force matrix, respectively; and *G*(*q*) ∈ *R<sup>n</sup>* is the gravity moment vector. *ρ* ∈ [0,1] is the effective factor of the actuator; *ρ* = 0 means the actuator is completely broken. *f<sub>a</sub>* is the bias term, and its value is positively correlated with the degree of actuator damage. The actuator faults with different combinations of *ρ* and *f<sub>a</sub>* are listed in Table 1.
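To make Equation (10) concrete, the sketch below simulates a hypothetical single-joint (1-DOF) arm, where *D*(*q*) reduces to a scalar inertia, the Coriolis term vanishes, and *G*(*q*) is the gravity moment; the mass and length values are illustrative assumptions, not parameters from the paper. Setting *ρ* < 1 or *f<sub>a</sub>* ≠ 0 injects the actuator faults of Table 1.

```python
import numpy as np

# Hypothetical 1-DOF arm: D(q) = m*l^2, C(q, qd) = 0, G(q) = m*g*l*cos(q).
def joint_acceleration(q, tau, rho=1.0, f_a=0.0, m=1.0, l=0.5, g=9.81):
    """Solve Eq. (10) for the joint acceleration under an actuator fault:
    D(q)*qdd + G(q) = rho*tau + f_a  (the Coriolis term is zero for 1 DOF)."""
    D = m * l * l                     # scalar inertia
    G = m * g * l * np.cos(q)         # gravity moment
    return (rho * tau + f_a - G) / D
```

With *ρ* = 1 and *f<sub>a</sub>* = 0 the joint is healthy; *ρ* = 0 reproduces the "completely broken" case of Table 1.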

**Table 1.** Actuator fault type.


It can be seen from Equation (9) that the output variable *y* of the system *S*<sub>1</sub> is the state variable *x* multiplied by the coefficient matrix *E*. Thus, the robot sensor fault can be directly expressed in the output equation, as follows.

$$\mathcal{S}_2: \begin{cases} \dot{x} = Ax + Bf(q,\dot{q},\tau) \\ y = \lambda Ex + f_b \end{cases} \tag{11}$$

where *λ* ∈ [0,1] is the effective factor of the sensor; *λ* = 0 means the sensor no longer works. *f<sub>b</sub>* is the bias term and is positively correlated with the degree of sensor damage. The sensor faults with different combinations of *λ* and *f<sub>b</sub>* are listed in Table 2.

**Table 2.** Sensor fault type.


The robot joint sensor fault model and actuator fault model show that the two fault types affect the system in different ways. However, through model derivation and transformation, a sensor fault can be transformed into an actuator fault through a first-order filter [38], which simplifies the model. The sensor and actuator fault data are fused according to the following formula.

$$Fault = a\,\Delta Sensor + b\,Actuator \tag{12}$$

where *Fault* denotes the required fault data set, ∆*Sensor* represents the difference between the sensor output and its setting, *Actuator* is the output of the actuator, and *a* and *b* are the sensor and actuator fault coefficients, respectively. Equation (12) unifies the two kinds of faults from the sensor and the actuator, which helps to obtain the training data sets.
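The fusion rule of Equation (12) can be sketched in one line; the coefficient values *a* and *b* below are illustrative assumptions.

```python
import numpy as np

def fuse_faults(sensor_out, sensor_set, actuator_out, a=1.0, b=1.0):
    """Eq. (12): Fault = a * (sensor output - setting) + b * actuator output."""
    d_sensor = np.asarray(sensor_out) - np.asarray(sensor_set)
    return a * d_sensor + b * np.asarray(actuator_out)
```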

The robot sensor and actuator mixed faults studied in this paper are listed in Table 3. For ease of use, F1 to F10 will be used to represent the different fault labels in the following description.


**Table 3.** Investigated Sensor and actuator fault type.

#### *3.2. Training of Model and Diagnosis*

The fault diagnosis model needs to be well trained before it is used to realize fault diagnosis. The basic training process of the fault diagnosis model based on the DCNN proposed in this paper is as follows.

The fused data containing single or mixed faults of the sensor and actuator are input into Block1 (refer to Figure 8). Convolution layer 1 uses kernel 1 to carry out convolution operations. The result of the convolution operation is input into the batch standardization module, and the extracted data features are standardized so that they conform to the standard normal distribution. The activation function is then used to activate the neurons. The activation function used here is the ReLU function; its piecewise-linear property overcomes the saturation effect of the Sigmoid or Tanh functions.

Finally, the activated features are input into the pooling layer. The pooling method used here is maximum pooling, which extracts the maximum features. The above operations complete the main steps of Block1. The output of Block1 is propagated to Block2, and the steps conducted in Block1 are repeated until all six Blocks have finished the corresponding operations.

After feature extraction is finished, the data flow into the fully connected layer. However, in order to reduce overfitting, the dropout layer is introduced to inactivate some neurons with a certain probability; the remaining neurons enter the fully connected layer, and finally the fault diagnosis model gives the predicted labels.

In the training stage of the model, updating the network weight parameters is the key step. Figure 9 shows the flowchart of the parameter updating process, where the network outputs are evaluated using the loss function to determine the direction of the parameter update. After the model is well trained, real-time data from the robot sensor and actuator can be propagated into the fault diagnosis model to realize fault diagnosis.

**Figure 9.** Process of training based on DCNN.

#### **4. Experiment and Analysis**

The data used are based on the multi-joint collaborative robot AUBO i3, which has 6 DOF and six revolute joints with a maximum working radius of 625 mm. Our MATLAB model is constructed based on the above platform. There are several position sensors and one actuator for each joint, so it is necessary to monitor their working states. Experiments using different fault diagnosis methods are conducted to demonstrate the effectiveness of the proposed DCNN. A PC with an i7-10510U 1.8 GHz processor and 16 GB of RAM is used. The PyCharm IDE is used together with a Python 3.6 interpreter, and all the algorithms are implemented in Keras with TensorFlow as its backend.

The basic architecture of several experiments is shown in Figure 10, where the fused data is expanded through the data set enhancement method, and then several neural network fault diagnosis methods are investigated and their results are compared.

**Figure 10.** The basic architecture of the comparative experiments.

#### *4.1. Data Sets Enhancement*

Considering that the number of fault samples is not sufficient in a real robot joint system, the data set enhancement method is used in this paper to expand the fault sample data sets and improve the generalization ability of the model. In order to expand the acquired robot joint fault data, a sliding sampling data set enhancement method is proposed, and its schematic diagram is shown in Figure 11.

**Figure 11.** Diagram of data set enhancement.

The system data over a period are obtained, and a sample data segment with *N*<sub>1</sub> points is needed for a single training pass. Assuming that the length of the obtained data is *N*, the network could be trained *N*/*N*<sub>1</sub> times by the above method. In order to increase the utilization coefficient of the data, the start point of the second data segment is shifted back by *h* points compared with the first one, and the rest is roughly the same. The difference in the number of samples before and after the data set enhancement method is used is shown in the following equation.

$$\frac{N-h-1}{N_1-h-1} - \frac{N}{N_1} = \frac{(N-N_1)(h+1)}{N_1(N_1-h-1)} \ge 0 \tag{13}$$

It is obvious that when the sliding step size is small, that is, when *h* is large, more data samples can be obtained, which meets the needs of the data sets in the training process. In our research, the value selected is *h* = 29, so plenty of data samples can be obtained for model training and validation.
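The sliding sampling described above can be sketched as follows; the window length and step below are illustrative values, since the paper specifies only *h* = 29.

```python
import numpy as np

def sliding_samples(data, n1, step):
    """Cut overlapping training samples of length n1 from a long record,
    advancing the start point by `step` points each time."""
    starts = range(0, len(data) - n1 + 1, step)
    return np.stack([data[s:s + n1] for s in starts])
```

For a record of *N* points this yields (*N* − *n1*)//*step* + 1 samples instead of the *N*//*n1* non-overlapping ones, which is the gain expressed by Equation (13).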

We constructed the robot fault model in MATLAB/Simulink according to Equation (12) and obtained the fault data needed in our study. In the data acquisition process, the sampling rate is set to 1000 Hz and the sampling time is 8 s, so 8000 sample points are obtained for each condition; see Figure 12. With the aforementioned data set enhancement method, we obtain 2000 samples for each condition. The 2000 samples are divided into three parts (training, verification, and testing samples) in proportions of 70%, 20%, and 10%, which are used in the later model training and verification process. It should be noted that the input data set is one-dimensional, which is different from conventional two-dimensional image data.

From Figure 12, it is clear that some of the fault types can be easily distinguished, such as F1 and F3, or F2 and F9, while some cannot, such as F3 and F4. When an "F3" fault happens, a diagnosis model output of "F4" may bring adverse effects. Thus, realizing high-accuracy prediction matters.

#### *4.2. Hyper Parameters of DCNN*

The hyperparameters of the neural network include the learning rate, regularization parameters, and number of iterations. These hyperparameters control the values of the weight coefficients. According to existing research, hyperparameters in deep learning algorithms affect not only the performance of the algorithm itself but also the expressive ability of the trained model. This paper follows the advice of Bengio [39]: the hyperparameters are set according to whether they increase or decrease the capacity of the model.

In the parameter updating process, an exponentially decaying learning rate is adopted. At first, a large learning rate is used to approach the optimal solution quickly, and then the learning rate is gradually reduced to keep the model stable in the later stage of training. The initial learning rate *η*<sub>0</sub> is set to 0.2 and the decay rate *ξ* is set to 0.99, so the learning rate is updated every round. The expression for the decayed learning rate is as follows.

$$\begin{cases} \eta = \eta_0 \times \xi^{(H/L)} \\ H = Epoch/Batch\_k \end{cases} \tag{14}$$

where *η* denotes the exponentially decayed learning rate, *H* stands for the number of the current round, *L* represents the number of rounds after which the decay is executed once, and *Batch\_k* is the number of iterations. One complete pass of the data set through the neural network is called an *Epoch*.
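Equation (14) can be sketched directly:

```python
def decayed_lr(eta0, xi, h, L):
    """Eq. (14): exponentially decayed learning rate eta = eta0 * xi**(h / L)."""
    return eta0 * xi ** (h / L)
```

With eta0 = 0.2 and xi = 0.99 as in the text, the learning rate shrinks by 1% every *L* rounds.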

**Figure 12.** Data samples under different working conditions.

In order to alleviate overfitting of the neural network, the *l*<sub>2</sub> regularization method is used in this paper. Regularization introduces a model complexity index into the loss function and suppresses the noise in the training data set by penalizing the weights of the neural network. The expression of the loss function is as follows.

$$\begin{cases} Loss = Loss\_all + REGULARIZER \times Loss(w) \\ Loss(w) = \sum_i w_i^2 \end{cases} \tag{15}$$

where *Loss\_all* represents the loss function over all parameters in the model, *REGULARIZER* is the regularization weight, *w* refers to the weights in the forward propagation of the neural network, and *Loss(w)* is the result of *l*<sub>2</sub> regularization of the parameters *w*.
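A minimal sketch of the regularized loss in Equation (15), with the *l*<sub>2</sub> penalty summed over all weight arrays; the regularization weight below is an illustrative value.

```python
import numpy as np

def total_loss(loss_all, weights, regularizer=0.01):
    """Eq. (15): base loss plus the weighted l2 penalty of all weights."""
    penalty = sum(float(np.sum(w * w)) for w in weights)
    return loss_all + regularizer * penalty
```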

The Adam optimization algorithm is used [40], and the weight updating procedure is as follows.

Step 1: Give the iteration step size *ε* = 0.001.

Step 2: Set the decay rates for the moment estimates, *ρ*<sub>1</sub> = 0.99, *ρ*<sub>2</sub> = 0.999.

Step 3: Determine the convergence threshold *δ* = 10<sup>−8</sup>.

Step 4: Initialize the network weights *θ* and the first and second moment variables *s*, *r*, and set *s* = 0, *r* = 0.

Step 5: Set the simulation time step to 0.0001.

Step 6: Collect a small batch of *m* samples from the training set, denoted {*x*<sup>(1)</sup>, *x*<sup>(2)</sup>, . . . , *x*<sup>(m)</sup>}, with corresponding targets *y*<sup>(i)</sup>.

Step 7: Calculate the gradient $g \leftarrow \frac{1}{m}\nabla_\theta \sum_i L\left(f(x^{(i)};\theta), y^{(i)}\right)$, and update the biased first moment estimate $s \leftarrow \rho_1 s + (1-\rho_1)g$ as well as the biased second moment estimate $r \leftarrow \rho_2 r + (1-\rho_2)\,g \odot g$.

Step 8: Correct the deviation of the first moment $\hat{s} \leftarrow \frac{s}{1-\rho_1^t}$ and the deviation of the second moment $\hat{r} \leftarrow \frac{r}{1-\rho_2^t}$.

Step 9: Calculate the incremental weight error $\Delta\theta = -\varepsilon\,\frac{\hat{s}}{\sqrt{\hat{r}}+\delta}$, and apply it to the network weights: $\theta \leftarrow \theta + \Delta\theta$.

Step 10: If the convergence threshold in Step 3 is not met, return to Step 6; otherwise, end the iterative process.
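Steps 7 to 9 above form the core Adam update and can be condensed into a single function. This is a sketch of the standard Adam rule using the hyperparameter values from Steps 1 to 3, not the authors' exact implementation.

```python
import numpy as np

def adam_step(theta, g, s, r, t, eps=0.001, rho1=0.99, rho2=0.999, delta=1e-8):
    """One Adam update: biased moments (Step 7), bias correction (Step 8),
    and the parameter update (Step 9). t is the 1-based iteration count."""
    s = rho1 * s + (1.0 - rho1) * g         # biased first moment estimate
    r = rho2 * r + (1.0 - rho2) * g * g     # biased second moment estimate
    s_hat = s / (1.0 - rho1 ** t)           # bias-corrected first moment
    r_hat = r / (1.0 - rho2 ** t)           # bias-corrected second moment
    theta = theta - eps * s_hat / (np.sqrt(r_hat) + delta)
    return theta, s, r
```

Iterating `adam_step` with the gradient of the loss drives *θ* toward a minimum, e.g. minimizing *θ*<sup>2</sup> starting from *θ* = 1.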

#### *4.3. Simulation and Results*

In order to verify the feasibility and effectiveness of the DCNN used in this paper for robot joint sensor and actuator fault diagnosis, the ANN, SVM, CNN, and LTMN methods are studied for comparative analysis and verification. The diagnosis accuracy of the different networks is shown in Figure 13.


**Figure 13.** Fault diagnosis accuracy on training process of the model.

As shown in Table 3, there are nine fault states for the sensor and actuator. For each fault state F1 to F9, 2000 samples are obtained and then divided into training, testing, and verification subsets. The accuracy of the different diagnosis methods is summarized in Figure 13. It can be seen that the accuracy of fault recognition using the DCNN is significantly improved compared with the ANN and SVM. The average accuracy of the DCNN in the training process is over 99%. However, it is worth noting that the lowest accuracy of the five methods appears for F5. This can be interpreted as follows: F5 is a mixed fault of the sensor and actuator, and Figure 12 shows that there is no obvious difference between the fault curves of the sensor and the actuator, so the fault diagnosis models cannot effectively tell them apart.

The confusion matrix of the DCNN used for robot sensor and actuator fault diagnosis is shown in Figure 14. It can be seen that none of the fault types is recognized 100% of the time; the confusion matrix also shows that the waveforms of the misjudged categories look alike.

A further experiment is conducted to compare the fault diagnosis performance of the CNN, LTMN, and DCNN on robot joint sensor and actuator faults, and the accuracy and loss curves of each method on the training and testing sets are drawn with the help of TensorFlow.

The three of the five fault diagnosis methods with an accuracy over 90% are investigated. From Figure 15, it can be seen that the LTMN and DCNN achieve a fault recognition rate of 100% on both the training and testing data sets, and there is no gap between the training and testing loss functions, which shows the robustness of the LTMN and DCNN.

**Figure 14.** Confusion matrix of DCNN.

**Figure 15.** Accuracy and loss of three kinds of neural networks.

The model training time and accuracy are shown in Figure 16. The DCNN needs less training time but still achieves the maximum diagnosis accuracy, which demonstrates the high performance of the proposed DCNN fault diagnosis method.

**Figure 16.** Accuracy and training time of three kinds of neural networks.

The initial values of the neural network are given randomly. In order to eliminate the influence of accidental factors, this paper uses cross-validation to train the model 10 times, and each time the network is initialized with random values. The accuracies of the CNN, LTMN, and DCNN are shown in Figure 17.

**Figure 17.** Cross-validation result.

As can be seen from Figure 17, the recognition accuracy of the DCNN in each experiment is more than 99%, with a lowest accuracy of 99.6%. Thus, the initial values of the model have little effect on the fault diagnosis accuracy, which also demonstrates the robustness of the DCNN.

#### **5. Conclusions**

The DCNN fault diagnosis method is used to recognize the sensor and actuator faults of a robot system. The robot sensor and actuator output data are fused. In order to increase the number of training samples, the fault data set is expanded by means of the data set enhancement method, and then fault diagnosis is carried out using a deep convolutional neural network. SVM-, ANN-, CNN-, and LTMN-based fault diagnosis methods are compared with the proposed DCNN, and it can be concluded that the DCNN better extracts the fault information from the original input data and makes a more accurate classification of the sensor and actuator fault types.

**Author Contributions:** Conceptualization, J.P.; methodology, J.P., L.Q. and K.P.; software, L.Q.; validation, J.P.; formal analysis, K.P.; investigation, J.P. and K.P.; resources, L.Q. and K.P.; data curation, J.P. and L.Q.; writing—original draft preparation, J.P.; writing—review and editing, J.P. and K.P.; visualization, K.P. and L.Q.; supervision, L.Q.; project administration, L.Q.; funding acquisition, L.Q. and K.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key R&D Program of China, grant number 2019YFB1309900.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **X-ray Pulsar Signal Denoising Based on Variational Mode Decomposition**

**Qiang Chen, Yong Zhao and Lixia Yan \***

School of Automation Science and Electrical Engineering, Beihang University (BUAA), Beijing 100191, China; qiangchen@buaa.edu.cn (Q.C.); zhaoyong1996@buaa.edu.cn (Y.Z.)

**\*** Correspondence: yanlixia@buaa.edu.cn

**Abstract:** Pulsars, especially X-ray pulsars detectable by small-size detectors, are highly accurate natural clocks, suggesting potential applications such as interplanetary navigation control. Due to various complex cosmic background noises, the original pulsar signals, namely photon sequences, observed by detectors have low signal-to-noise ratios (SNRs) that obstruct practical use. This paper presents a pulsar denoising strategy developed based on the variational mode decomposition (VMD) approach; it is the initial work of our interplanetary navigation control research. The original pulsar signals are decomposed into intrinsic mode functions (IMFs) via VMD, by which the Gaussian noise contaminating the pulsar signals can be attenuated because of the filtering effect during signal decomposition and reconstruction. Comparison experiments based on both simulated and HEASARC-archived X-ray pulsar signals are carried out to validate the effectiveness of the proposed pulsar denoising strategy.

**Keywords:** X-ray pulsar; signal denoising; variational mode decomposition

**Citation:** Chen, Q.; Zhao, Y.; Yan, L. X-ray Pulsar Signal Denoising Based on Variational Mode Decomposition. *Entropy* **2021**, *23*, 1181. https:// doi.org/10.3390/e23091181

Academic Editor: Quanmin Zhu

Received: 31 July 2021 Accepted: 2 September 2021 Published: 8 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Pulsars are rapidly rotating neutron stars that emit electromagnetic signals and have periods ranging from milliseconds to thousands of seconds [1,2]; see Figure 1 for an illustration.

**Figure 1.** Illustration of the pulsar model.

The pulsar emission mechanism is complex, featuring chaotic characteristics [3,4]. Various methods have been developed to calculate the Lyapunov exponent and complementary measures to characterize the time scales on which chaotic systems, such as pulsars, become unpredictable [5–7]. Fortunately, the potential applications of pulsars, especially accurate pulsar timing, do not rely on a comprehensive understanding of pulsar emission models. Researchers have found that the rotation periods of pulsars over long timescales are as precise as state-of-the-art terrestrial atomic clocks, which suggests pulsars are a perfect choice for precise timing and autonomous interplanetary navigation [8–11]. The precise timing properties of pulsars can be used for unmanned extraterrestrial vehicles, for instance, by incorporating advanced robust control algorithms, such as those reported in [12,13], with the pulsar TOA sequence as well as the pulsar positioning algorithm. Nowadays, there are already over 2000 known pulsars, which builds a firm base for governmental and civilian use, as well as space applications not only for the military but also for astronomy exploration [14,15].

Among all types of pulsars, X-ray pulsars appear to be the most favorable ones, as they are detectable by small-size detectors [1,8,9,16–19]. However, the X-ray pulsar radiation, namely the effective signal, degenerates into a photon sequence with very low intensity by the time it arrives at a terrestrial detector after long-distance space travel. An even more troublesome problem arises from the complex noisy photons, up to about nine times the effective photons in quantity, including the diffuse X-ray background and the cosmic X-ray background, which inherently contaminate the originally observed signal [11]. These facts imply a low signal-to-noise ratio (SNR) for the pulsar signal and, hence, signify the necessity of pulsar denoising research, especially when the observation time is short.

Pulsar denoising refers to filtering the pulsar profile, generally obtained via epoch folding performed on the photon sequence that stores the times of arrival (TOAs) of the photons [1,17,20]. Fourier filtering in the frequency domain was the original attempt to denoise pulsar profiles, and it has already been proven invalid when the signal profile comprises nonsinusoidal components or non-Gaussian noise [16,18,21]. The modified kernel regression denoising method reported in [22] develops a second-order derivative compensation and reveals that a clear positive relationship between the TOA accuracy and the profile SNR does not exist. In [23], a wavelet analysis method using the noise dispersion to determine the threshold value is applied to denoise the pulsar profile via the inverse wavelet transform. The denoising strategy based on biorthogonal lifting wavelet transforms (WT) reported in [17] studied the statistical properties of the X-ray background noise. The core idea of wavelet transform denoising is choosing an appropriate wavelet library and decomposition level, in terms of finding a suitable wavelet basis and threshold function [24]. In particular, the wavelet basis plays an essential role in denoising performance, as it is the base of wavelet denoising. However, at present, there is no universal wavelet basis that accommodates different pulsar signals. It has also been shown in [21] that the wavelet denoising methodology relies heavily on expertise and experience and is not compatible with an average processor. New approaches to increase the SNR of X-ray pulsar signals can be found in [25], with a machine learning method, and in [26], with a recurrent neural network method. These two kinds of denoising strategies also require high computational costs.

Recently, empirical mode decomposition (EMD) has achieved great success in removing both white and fractional Gaussian noise by reconstructing the signal from thresholded intrinsic mode functions (IMFs) [27]. Nevertheless, the traditional EMD framework only allows for analyzing a single signal, due to possible mode mixing phenomena when analyzing multiple signals simultaneously. This problem can be overcome by the multivariate strategy reported in [28], in which multiple pulsar signals are processed at the same time while mode mixing is avoided. The ensemble empirical mode decomposition (EEMD) synthesizes white Gaussian noise into the input signal for the later decomposition, in which the averaging operation can also avoid the mode mixing problem that exists in the output IMFs after EMD [29–31]. Given the limitation of fixed threshold functions, the adaptive threshold mechanism together with the EMD denoising method shown in [32] remarkably increases the SNR of X-ray pulsar profiles. By creating Hausdorff measures between the probability density functions of the input signal and the modes, the filtering approach in [9] aliases and reconstructs the X-ray pulsar profiles to achieve the purpose of denoising. Though the EMD framework allows for analyzing nonstationary and nonlinear signals, it lacks a firm mathematical foundation and is sensitive to sampling noise, which, as a particular result, would lead to mode mixing when it is applied to pulsar denoising processing. Additionally, one might find that eliminating the mode mixing problem would build a firm base for performing autonomous mode decomposition and signal denoising, which is essential for embedding advanced control algorithms, such as in [33], into a pulsar-based spacecraft navigation system. Regarding the possible deficiencies of EMD in pulsar signal denoising, Dragomiretskiy developed a novel signal decomposition technique called variational mode decomposition (VMD) that nonrecursively decomposes the signal into several quasi-orthogonal IMFs [34]. The VMD has already been applied in many areas, such as seismic wave analysis [35], pipeline leakage detection [36], and vibration estimation of rotor-stator machinery [37]. For VMD-based applications, it is important to determine an accurate number of modes, which has an essential effect on denoising performance [38]. The detrended fluctuation analysis (DFA) appears to be a favorable choice for this issue, as it is a systematic scaling analysis approach to evaluate long-term dependency for nonstationary signal series [35,36]. In our previous work [39], a brief VMD-based denoising algorithm was developed for an X-ray pulsar profile after epoch folding, without applying DFA to determine the mode number. To the authors' best knowledge, no reported literature has shown a detailed discussion of applying the VMD framework to denoising X-ray pulsar signals.

Motivated by the discussion above, this brief undertakes further endeavors on the research of denoising designs of X-ray pulsar signals so as to increase the SNR of the pulsar profiles and pave the way for potential applications. More precisely, a VMD-based denoising strategy is developed. We first perform the epoch folding method on the faint X-ray pulsar signals and obtain the noisy pulsar profile, followed by applying DFA to obtain the number of IMFs, after which the reconstruction of the pulsar signals achieves the denoising goal.

The contribution of this paper is a denoising algorithm for X-ray pulsar signals based on the VMD method. Compared with those in [18,20,40], prior knowledge of pulsar profiles is unnecessary for the proposed VMD-based pulsar denoising strategy. In comparison with wavelet analysis [17,23,24], the denoising algorithm in this paper removes the reliance on choosing a perfect basis and threshold function. The VMD-based design also allows for processing original pulsar signals that contain nonstationary background noise from many sources without the mode mixing problems of the EMD-based analyses in [9,28,32].

The rest of the paper is organized as follows. Section 2 introduces basic knowledge of the VMD framework. Section 3 involves the VMD-based denoising design for the contaminated faint X-ray pulsar signals. Section 4 presents the comparison experiments with simulation and measured data. Section 5 concludes this work briefly.

#### **2. Preliminaries**

#### *Variational Mode Decomposition*

Variational Mode Decomposition (VMD) is a novel framework for signal processing that outperforms the traditional decomposition approaches, allowing for a systematic analysis of nonstationary and nonlinear signals [34]. It is built upon classical theories such as the Wiener filter, Hilbert transformation, frequency mixing, and decomposition method. Compared with the traditional empirical mode decomposition (EMD) technique, the VMD theoretically eliminates the potential problem of mode mixing by decomposing the signal into a sum of IMFs with the limited center frequency and bandwidth calculated analytically. It is worth noting that the reconstruction of the VMD-based decomposed signals, namely summing the obtained IMFs, would attenuate noise and increase SNR.

The VMD-based decomposition performed on the input signal *f* intends to achieve

$$\min_{\{h_k\},\{\omega_k\}} \left\{ \sum_{k=1}^K \left\| \frac{\partial}{\partial t}\left[\left(\delta(t) + \frac{j}{\pi t}\right) \ast h_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \right\}, \tag{1}$$

so that *f* can be reconstructed by the sum $\sum_{k=1}^K h_k = f$, where $\{h_k\}$ and $\{\omega_k\}$ denote the sets of all modes and center frequencies, respectively; $\delta(t)$ stands for the Dirac function, and the notation $\ast$ denotes convolution. For definitions and discussions of the modes, refer to [34,41]. For the sake of completeness and simplicity, the VMD algorithm renders the constrained problem (1) into an unconstrained one by introducing a quadratic penalty term $\alpha$ and the Lagrangian multiplier $\lambda(t)$, resulting in the following augmented Lagrangian.

$$\begin{split} \mathcal{L}(\{h_k\},\{\omega_k\},\lambda) &= \alpha \sum_{k=1}^K \left\| \frac{\partial}{\partial t}\left[\left(\delta(t)+\frac{j}{\pi t}\right) \ast h_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \\ &+ \left\| f(t) - \sum_{k=1}^K h_k(t) \right\|_2^2 \\ &+ \left\langle \lambda(t),\, f(t) - \sum_{k=1}^K h_k(t) \right\rangle. \end{split} \tag{2}$$

It then converts the original minimization problem (1) into the saddle point problem of (2). Iterative sub-optimizations in terms of the alternating direction method of multipliers (ADMM) can be used to obtain the saddle point of (2) and the optimal solution of (1). Let $\omega_k$ and $h_{i \neq k}$ denote the most recently updated values; the minimization problem for $h_k^{n+1}$ then becomes

$$\begin{split} h_k^{n+1} &= \underset{h_k}{\arg\min}\Bigg\{ \alpha \left\| \frac{\partial}{\partial t}\left[\left(\delta(t)+\frac{j}{\pi t}\right) \ast h_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \\ &+ \left\| f(t) - \sum_{i \neq k} h_i(t) + \frac{\lambda(t)}{2} \right\|_2^2 \Bigg\}. \end{split} \tag{3}$$

By applying the Parseval/Plancherel formula for Fourier transform together with Hermitian properties under the *L*<sup>2</sup> norm, we can find the optimal solution in the spectrum domain as follows,

$$\hat{h}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{h}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2}. \tag{4}$$

Implementing the same technical principles above, we obtain the center frequency in the form of

$$
\omega\_k^{n+1} = \frac{\int\_0^\infty \omega \left| \widehat{h}\_k(\omega) \right|^2 \mathrm{d}\omega}{\int\_0^\infty \left| \widehat{h}\_k(\omega) \right|^2 \mathrm{d}\omega}. \tag{5}
$$

The specific calculation steps of variational mode decomposition, which intuitively include the denoising functionality, can be summarized as follows.

1. Initialize $\{\hat{h}_k^1\}$, $\{\omega_k^1\}$, and $\hat{\lambda}^1$, and set $n = 0$;
2. Update $\hat{h}_k^{n+1}(\omega)$ for each mode according to (4);
3. Update $\omega_k^{n+1}$ for each mode according to (5);
4. Use $\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^n(\omega) + \tau\big[\hat{f}(\omega) - \sum_k \hat{h}_k^{n+1}(\omega)\big]$ to update $\hat{\lambda}$, $\forall \omega \ge 0$;
5. Stop the iteration when $\sum_{k=1}^K \big\| \hat{h}_k^{n+1} - \hat{h}_k^n \big\|_2^2 / \big\| \hat{h}_k^n \big\|_2^2 < \varepsilon$ for a chosen criterion $\varepsilon$; otherwise, set $n \leftarrow n + 1$ and return to step 2.
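As a concrete illustration, the iteration above can be sketched in a few dozen lines of NumPy. This is a minimal sketch operating on the analytic (non-negative-frequency) spectrum; the mode number K is assumed to be supplied externally (e.g., by DFA), and refinements of the reference implementation, such as signal mirroring at the boundaries, are omitted.

```python
import numpy as np

def vmd(f, K, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal VMD sketch following Eqs. (4)-(5) and the steps above.
    tau=0 disables the Lagrangian update (pure penalty-based decomposition)."""
    T = len(f)
    nu = np.fft.fftfreq(T)                  # normalized frequencies (cycles/sample)
    f_hat = np.fft.fft(f)
    f_hat[nu < 0] = 0.0                     # keep the analytic half-spectrum
    h_hat = np.zeros((K, T), dtype=complex)
    omega = np.linspace(0.0, 0.5, K, endpoint=False)  # initial center frequencies
    lam = np.zeros(T, dtype=complex)
    half = slice(0, T // 2)                 # non-negative frequency bins
    for _ in range(max_iter):
        h_old = h_hat.copy()
        for k in range(K):
            others = h_hat.sum(axis=0) - h_hat[k]
            # Wiener-filter mode update, Eq. (4)
            h_hat[k] = (f_hat - others + lam / 2) / (1.0 + 2.0 * alpha * (nu - omega[k]) ** 2)
            # center-frequency update, Eq. (5)
            power = np.abs(h_hat[k, half]) ** 2
            omega[k] = np.sum(nu[half] * power) / (np.sum(power) + 1e-14)
        lam = lam + tau * (f_hat - h_hat.sum(axis=0))   # dual ascent (step 4)
        diff = np.sum(np.abs(h_hat - h_old) ** 2) / (np.sum(np.abs(h_old) ** 2) + 1e-14)
        if diff < tol:                       # convergence criterion (step 5)
            break
    modes = 2.0 * np.real(np.fft.ifft(h_hat, axis=1))   # real time-domain modes
    return modes, omega
```

On a two-tone test signal, the recovered center frequencies settle near the true normalized tone frequencies, and summing the modes approximately reconstructs the input.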

In practice, one should calculate an appropriate number of modes *K* before applying VMD to decompose the input signals. For this purpose, the DFA approach is applicable to determine *K*, as it can characterize the different components contained in the signal and provides a threshold functionality for the calculation of *K* [21,35,36].

**Remark 1.** *Adopting the same simulation conditions as our previous work [42], we perform here a comparison experiment of EMD and VMD on decomposing the analog signal defined by*

$$f(t) = \begin{cases} \cos(20\pi t), \forall t \in [0, 0.5], \\ \cos(20\pi t) + \cos(6\pi t) + \cos(120\pi t), \\ \forall t \in (0.5, 0.8], \\ \cos(20\pi t), \forall t \in (0.8, 1]. \end{cases} \tag{6}$$

*The simulation results are drawn in Figure 2. As shown in Figure 2a, the EMD decomposes f*(*t*) *into four intrinsic mode functions and a residue, producing mode mixing between the 10 Hz and 60 Hz components. This mixing phenomenon distorts the intrinsic mode functions obtained later, lowering the signal decomposition performance. Comparing the results in Figure 2b with those in Figure 2a, the VMD shows its superiority by accurately decomposing f*(*t*) *into three types of signals with different time frames and frequencies.*

**Figure 2.** Decomposing results of *f*(*t*) via EMD and VMD.

#### **3. VMD-Based Denoising Design for X-ray Pulsar Signals**

The X-ray pulsar signal observed by detectors contains the times of arrival and the number of photons and is contaminated by various types of electromagnetic noise. Denoising X-ray pulsar signals appears essential for characterizing different pulsars and building a base for accurate timing in space. In this section, we present a VMD-based denoising method for X-ray pulsars step by step. First, we apply epoch folding to obtain the noisy pulsar profile. Second, we use the DFA approach to calculate the number of modes of the pulsar profile. Third, we apply the VMD method to decompose and reconstruct the pulsar profile, which achieves the purpose of denoising.

#### *3.1. X-ray Pulsar Profile*

Epoch folding is applied in this subsection to obtain the pulsar profile. Sort the TOAs of photons into,

$$t\_0' \le t\_1' \le t\_2' \le \dots \le t\_{N-1}', \tag{7}$$

where *t*′<sub>*i*</sub>, *i* ∈ **Z**, denotes the time of arrival of the (*i* + 1)-th photon. In (7), the symbol "≤" indicates that multiple photons may arrive at the detector area simultaneously within a certain observing window.

To eliminate the effects of the Earth-observatory motions and the interstellar medium along the pulsar signal transmission path, we convert the photon TOAs from the observed time *t*obs to the solar system barycenter (SSB) as follows,

$$t\_{\rm SSB} = t\_{\rm obs} + \Delta\_{\rm E} + \Delta\_{\rm R} + \Delta\_{\rm S}, \tag{8}$$

where ∆<sub>E</sub> denotes the Einstein delay, ∆<sub>R</sub> is the Roemer delay and ∆<sub>S</sub> represents the Shapiro delay [43]. Many tools can be applied to perform the time conversion (8). For example, one can combine *t*obs with the orbit parameters and invoke the barycorr function provided by the High Energy Astrophysics Science Archive Research Center (HEASARC) to complete the time conversion automatically.

Denote the TOA sequences in SSB by

$$t\_0 \le t\_1 \le t\_2 \le \dots \le t\_{N-1} \tag{9}$$

Supposing that the pulsar period is *T*0, the phase of (9) in the normalized time frame [0, 1) with respect to the initial time instant can be calculated as

$$
\varphi\_i = \frac{t\_i - t\_0}{T\_0} \bmod 1. \tag{10}
$$

Let us evenly divide each period into *m* bins and count the number of photons *N<sub>i</sub>* in each bin. Construct

$$\chi^2 = \sum\_{i=1}^{m} \frac{\left(N\_i - \bar{N}\right)^2}{\bar{N}}, \quad \bar{N} = \frac{N}{m},\tag{11}$$

where *N* is the total number of photons and *N̄* stands for the mean photon count per bin. For a flat (pulsation-free) profile, the variable *χ*<sup>2</sup> in (11) follows the *χ*<sup>2</sup>-distribution with *m* − 1 degrees of freedom. Different trial periods yield different phase assignments and photon counts per bin and, as a result, different *χ*<sup>2</sup> values. A period with errors makes the estimated phases inaccurate, flattening the profile and reducing *χ*<sup>2</sup>. In contrast, an accurately estimated period yields reliable phases, so the bin counts *N<sub>i</sub>* deviate strongly from *N̄*, which increases *χ*<sup>2</sup>. Therefore, achieving the best estimation of the period *T* amounts to finding the maximum of *χ*<sup>2</sup>. Let

$$T = \underset{T \in \left[T\_{\min}, T\_{\max}\right]}{\arg\max} \; \chi^2. \tag{12}$$

The estimated phase of the arrival time of every photon is assigned to a certain bin according to the pulsar period, resulting in the estimated pulsar profile. The relationship between the phase and the number of photons in each bin is drawn. For instance, the horizontal axis of the pulsar profile is the phase with multiple bins, while the vertical axis denotes the number of photons. Theoretically speaking, the longer the observing time frame and the more the photon quantity, the more accurate the pulsar profile would become [17,20].
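The folding and period-search procedure of (10)-(12) can be sketched as follows. This is a minimal numpy illustration; the function and variable names are our own, and practical implementations would scan a much finer period grid.

```python
import numpy as np

def fold_profile(toas, period, m=100):
    """Fold photon TOAs at a trial period into an m-bin phase histogram, Eq. (10)."""
    phases = ((toas - toas[0]) / period) % 1.0
    counts, _ = np.histogram(phases, bins=m, range=(0.0, 1.0))
    return counts

def chi2_statistic(counts):
    """Pearson chi-square of a folded profile against a flat profile, Eq. (11)."""
    mean = counts.mean()
    return ((counts - mean) ** 2 / mean).sum()

def estimate_period(toas, trial_periods, m=100):
    """Epoch-folding period search: the trial period maximizing chi^2, Eq. (12)."""
    chi2 = [chi2_statistic(fold_profile(toas, p, m)) for p in trial_periods]
    return trial_periods[int(np.argmax(chi2))]
```

With synthetic photons clustered around a fixed phase of a known period, the search recovers that period, since any trial period with errors smears the phases over many folded cycles and lowers *χ*².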

#### *3.2. Denoise of Pulse Profile Based on VMD*

Without causing any confusion, let *f*, defined below, be the pulse profile,

$$f = x + d,\tag{13}$$

where *x* is the original pulsar signal, and *d* denotes noise.

The number of modes *K* must be determined before performing VMD on *f*. *K* plays an essential role in the VMD decomposition; in the view of this brief, a wrong choice of *K* would lower the denoising performance. For this reason, we apply the detrended fluctuation analysis (DFA) method to determine *K*. DFA is a favorable scaling tool generally utilized to analyze nonstationary signals, in which the scaling exponent *α* depicts how rough the signal is; a large *α* means that the volatility is small [21]. *K* can then be calculated according to *α*.

Given any signal {*u*(*i*), *i* = 1, 2, . . . , *N*}, let us apply DFA to calculate *α* by the steps below.

1. Calculate the sum

$$y(k) = \sum\_{i=1}^{k} u(i) - k\bar{u}, \quad k = 1, 2, \dots, N,\tag{14}$$

where $\bar{u} = \frac{1}{N}\sum\_{i=1}^{N} u(i)$.

2. Divide the sequence *y*(*k*) into *N<sub>n</sub>* = ⌊*N*/*n*⌋ non-overlapping segments of length *n*. For each segment, an *l*-order polynomial can be fitted to obtain the local trend *y<sub>n</sub>*(*k*). For example, let *l* = 2 and define

$$y\_n(k) = a\_n k^2 + b\_n k + c\_n \tag{15}$$

where *a<sub>n</sub>*, *b<sub>n</sub>* and *c<sub>n</sub>* denote the fitting coefficients.

3. Define the root-mean-square (RMS) function by

$$F(n) = \sqrt{\frac{1}{N} \sum\_{k=1}^{N} \left[ y(k) - y\_n(k) \right]^2}. \tag{16}$$

4. Finally, calculate the scaling exponent *α* by the least square regression approach as follows,

$$
\ln(F(n)) = \alpha \ln(n) + C.\tag{17}
$$
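Steps 1-4 can be sketched in Python as follows. This is a minimal illustration; the scale grid and helper names are our own choices rather than part of the method's specification.

```python
import numpy as np

def dfa_alpha(u, scales=None, order=2):
    """Detrended fluctuation analysis following steps (14)-(17)."""
    u = np.asarray(u, dtype=float)
    N = len(u)
    y = np.cumsum(u - u.mean())               # profile y(k), Eq. (14)
    if scales is None:                        # log-spaced box sizes n
        scales = np.unique(np.logspace(np.log10(10),
                                       np.log10(N // 4), 12).astype(int))
    F = []
    for n in scales:
        Nn = N // n                           # number of non-overlapping segments
        resid = []
        for s in range(Nn):
            seg = y[s * n:(s + 1) * n]
            k = np.arange(n)
            coeffs = np.polyfit(k, seg, order)  # local trend y_n(k), Eq. (15)
            resid.append(seg - np.polyval(coeffs, k))
        F.append(np.sqrt(np.mean(np.concatenate(resid) ** 2)))  # RMS, Eq. (16)
    # slope of the log-log regression gives the scaling exponent, Eq. (17)
    alpha, _ = np.polyfit(np.log(scales), np.log(F), 1)
    return alpha
```

As a sanity check, white Gaussian noise yields *α* ≈ 0.5 and its cumulative sum (a random walk) yields *α* ≈ 1.5, matching the reference values quoted in the text.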

The relationship between *K* and *α* can be understood in the sense that *K* is chosen such that the number of scaling exponents among *α*<sub>1:*K*</sub> satisfying *α<sub>k</sub>* ≥ *θ* equals *J*, where *θ* = 0.25 denotes the threshold, noting that white Gaussian noise corresponds to *α* = 0.5. Without loss of generality, we suppose that the noise of the pulsar signal received by the detectors follows a Gaussian distribution.

The variable *J* is determined by the scaling exponent *α*<sub>0</sub> of the input signal via,

$$J = \begin{cases} 1, & \alpha\_0 \le 0.8 \\ 2, & 0.8 < \alpha\_0 \le 1.0 \\ 3, & 1.0 < \alpha\_0 \le 1.5 \\ 4, & \alpha\_0 > 1.5 \end{cases} \tag{18}$$

Then, we utilize the obtained *K* to decompose the signal *f* via VMD, after which we reconstruct the denoised signal by summing the retained modes; see (19),

$$x = \sum\_{k \in \mathcal{K}} h\_k, \quad \mathcal{K} = \{k \mid \alpha\_k \ge \theta\}. \tag{19}$$
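Under our reading of (18) and (19), the mode screening and reconstruction can be sketched as below; the helper names are hypothetical and `theta = 0.25` follows the threshold stated above.

```python
import numpy as np

def j_from_alpha(alpha0):
    """Map the DFA exponent of the input signal to J, Equation (18)."""
    if alpha0 <= 0.8:
        return 1
    if alpha0 <= 1.0:
        return 2
    if alpha0 <= 1.5:
        return 3
    return 4

def reconstruct(modes, alphas, theta=0.25):
    """Sum only the modes whose DFA exponent alpha_k >= theta, Equation (19)."""
    kept = [h for h, a in zip(modes, alphas) if a >= theta]
    return np.sum(kept, axis=0)
```

Modes with exponents below the threshold are treated as noise-dominated and discarded, which is what attenuates the noise in the reconstructed profile.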

The flow chart in Figure 3 depicts the VMD denoising process on X-ray pulsar signals. The VMD denoising applied on pulsar signals consists of reconstructing the pulsar profile via summing intrinsic mode functions after decomposition. This process actually attenuates the noise and would remarkably increase the SNR of the pulsar profile. As can be seen, the presented X-ray pulsar denoising strategy does not require any prior knowledge, such as the known pulsar profiles required by the denoising algorithms in [18,20,40].

**Figure 3.** The denoising process of VMD on X-ray pulsar signals.

**Remark 2.** *The VMD approach provides a general framework for signal processing. In addition to X-ray pulsar denoising designs, various VMD-based applications, such as seismic wave analysis [35], pipeline leakage testing [36] and machinery vibration research [37], have achieved significant attention.*

#### **4. Experimental Analysis**

This section presents the experimental study focusing on validating the effectiveness of the proposed X-ray pulsar denoising strategies. Both simulated and measured data of the X-ray pulsar are considered. For performance evaluation purposes, various measures, including the Signal to Noise Ratio (SNR), Root-Mean-Square Error (RMSE), and Pearson's correlation coefficient (PCC), are considered as follows.

$$\text{SNR} = 10 \log \frac{\sum\_{i=1}^{M} x(i)^2}{\sum\_{i=1}^{M} \left[x(i) - \hat{x}(i)\right]^2}, \quad \text{RMSE} = \sqrt{\frac{1}{M} \sum\_{i=1}^{M} \frac{\left[x(i) - \hat{x}(i)\right]^2}{x(i)^2}},$$

$$\text{PCC} = \frac{M \sum\_{i=1}^{M} x(i)\hat{x}(i) - \sum\_{i=1}^{M} x(i) \sum\_{i=1}^{M} \hat{x}(i)}{\sqrt{M \sum\_{i=1}^{M} x(i)^2 - \left(\sum\_{i=1}^{M} x(i)\right)^2} \sqrt{M \sum\_{i=1}^{M} \hat{x}(i)^2 - \left(\sum\_{i=1}^{M} \hat{x}(i)\right)^2}},$$

where *x*(*i*) denotes the nominal signal, *x*ˆ(*i*) represents the estimated denoised signal, and *M* stands for the signal length. A higher SNR value or a lower RMSE value leads to better denoising performance. The PCC in the range [−1, 1] is generally utilized to address the linear relativity of paired variables. The larger PCC would suggest better denoising performance.
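The three indexes can be computed directly; a short numpy sketch follows. Note that the RMSE here follows the paper's normalized definition, which divides each squared error by *x*(*i*)², rather than the plain root-mean-square error.

```python
import numpy as np

def snr_db(x, xhat):
    """Signal-to-noise ratio of the denoised estimate, in dB."""
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - xhat) ** 2))

def rmse(x, xhat):
    """Normalized root-mean-square error, as defined above."""
    return np.sqrt(np.mean(((x - xhat) / x) ** 2))

def pcc(x, xhat):
    """Pearson's correlation coefficient between reference and estimate."""
    return np.corrcoef(x, xhat)[0, 1]
```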

#### *4.1. Experiments of Simulation Data*

For pulsar simulation purposes, we applied the Psrsigsim pulsar signal simulator to construct a virtual reference X-ray pulsar profile and the associated noisy pulsar profile. Psrsigsim is a Python toolbox developed for simulating pulsar signals based on known, open databases. Parameters including the pulsar profile type, the interstellar medium, and the telescope information should be set before propagating the virtual pulsar signals. In our settings, we saved the pulsar profile data in the portable '%.csv' format and processed the data via Matlab. Figure 4 depicts the overall flow of data propagation and processing.

**Figure 4.** Experiment flow chart of Psrsigsim-propagated pulsar data.

We adopted the default pulsar profile 'J1713+0747' provided by Psrsigsim as the reference profile, from which the noisy pulsar profile was constructed via the 'pulsar.make\_pulse' function of the Psrsigsim class with a 2000 s observing time. As a result, two million pulsar events in terms of phases were obtained and taken as noisy pulsar signals. The reference pulsar profile is plotted in Figure 5a, while the first 1000 noisy samples are drawn in Figure 5b. The vertical coordinate unit is omitted for convenient processing of the simulation data. Two hundred thousand noisy samples were adopted to perform the filtering techniques, which include epoch folding (EP), wavelet transform (WT), empirical mode decomposition (EMD), and variational mode decomposition (VMD). It is worth noting that the pulsar profile data after EP build the base for the latter three approaches; that is, the latter three use the folded profile as their input for filtering.

**Figure 5.** The reference and noisy pulsar profile of simulation data.

For EP filtering, we divided the adopted 200,000 noisy data into 1000 bins per period, following the routine of (7)–(11), and depicted the folding results in Figure 6a. For WT filtering, the Matlab wdenoise function was applied to denoise the pulsar profile after EP; we decomposed the profile signal into three layers, after which the high-frequency coefficients were eliminated manually. The reconstructed filtered profile after WT is shown in Figure 6b. For EMD filtering, the Matlab emd function was directly applied to the EP-folded profile data sequence, and eight IMFs were obtained, of which the last seven were summed to reconstruct the profile signal, as plotted in Figure 6c. Finally, we implemented the proposed VMD-based denoising strategy on the EP-folded profile data and plotted the results in Figure 6d. For the VMD decomposition, we determined the mode number *K* = 3 via the DFA, as shown in (13)–(19), after which the VMD function shown in [34] was invoked.

As shown in Figure 5b, the noisy pulsar profile lacks the basic profile shape features that are important for characterizing the pulsar. Figure 6a–d demonstrates the denoising performance of EP, WT, EMD, and VMD, respectively. Compared with the other three methods, our VMD-based denoising strategy obtains better denoising results, as the denoised profile features a smoother shape and fewer disturbances. To support this viewpoint, the SNRs, RMSEs, and PCCs of the original noisy profile and the denoised profile samples are shown in Table 1.

**Table 1.** Denoising performance with simulation data.


In Table 1, the term 'ORI' refers to the original noisy profile samples. As can be seen, all denoising methods chosen for comparison dramatically increase the SNR values compared with the original profiles of negative SNR, which means that the pulsar signal intensity is less than 1/10 of the noise. In particular, our VMD-based denoising strategy outperforms the compared methods, as it attains the highest SNR, smallest RMSE, and largest PCC.

**Figure 6.** Denoising results of simulation data via different methods.

#### *4.2. Experiments of HEASARC-Archived Data*

Real pulsar TOA data are implemented in this subsection to validate the proposed pulsar denoising strategy. From the XAMIN database, we chose the ni1011010301.cl file for the denoising experiments. The chosen '%.cl' file includes the TOAs of Crab Pulsar photons observed and archived by the NICER mission in standard FITS format. The ni1011010301.cl file was created on 20 November 2017 after a 99-min continuous observation of the Crab X-ray Pulsar and includes 1,680,000 observed photons.

Because the motions of the Earth and the observatory, as well as the interstellar medium, distort the pulsar signals, we needed to convert the recorded TOAs to the solar system barycenter (SSB). More precisely, Web Hera, an online HEASARC HEAsoft toolbox, was applied to invoke the barycorr function, which converts the TOA information in '%.cl' files into TOAs at the SSB; the associated orbit information must be set simultaneously. Later on, we performed the denoising analysis via EP, WT, EMD, and VMD, respectively. The overall process is depicted in Figure 7.

**Figure 7.** Experiment flow chart of HEASARC-archived pulsar data.

According to (12), we estimated the pulsar period by the *χ*<sup>2</sup> period-estimation method; the estimated pulsar period is 0.033742389697 s. Since the SNR grows as more periods are folded, we regard the pulse profile created from all 1,680,000 observed photons as the reference signal, while the pulse profile created from the first 40,000 photons serves as the noisy signal for experimental use. The reference profile is then scaled by the ratio of the noisy photon quantity to the total photon quantity, with the phase axis divided into 100 bins. The reference and noisy profiles are depicted in Figure 8a,b.

**Figure 8.** The reference and noisy pulsar profile archived pulsar data. (**a**) Reference pulsar profile obtained via folding all photons; (**b**) Noisy pulsar profile.

The EP method was first used to obtain the X-ray pulsar profile, and the results are shown in Figure 9a. Note that the obtained profile in Figure 9a includes 400 foldings, because the total photon quantity in use is 40,000 and each period is divided into 100 bins. For WT filtering, we followed the same routine as in the previous subsection and manually removed the high-frequency part; the results are depicted in Figure 9b. For EMD filtering, the EP-folded profile data were decomposed into six modes, and we summed the last five to reconstruct the profile. The EMD-based denoising result is shown in Figure 9c. For VMD filtering, we applied the method of (13)–(18) to calculate the number of modes, obtaining *K* = 3, based on which we performed the VMD decomposition and signal reconstruction. The VMD-based denoising results are shown in Figure 9d.

**Figure 9.** Denoising results of archived pulsar data via different methods.

It is observed from Figure 9a–d that the adopted approaches achieve denoising at different levels. Figure 9a depicts the results after EP, in which obvious sawtooth artifacts remain, even though the pulse shape matches that of the reference. It is hard to distinguish the best denoising algorithm among WT, EMD, and VMD from the plotted figures alone. We therefore numerically summarize the SNRs, RMSEs, and PCCs of the denoised signals in Table 2 for better comparison.

The performance indexes in Table 2 indicate that all chosen experimental approaches have realized the denoising purpose, and the SNRs are obviously increased compared with the original noisy signal. In particular, the proposed VMD-based denoising algorithm outperforms the other three approaches, just as it did with the virtual Psrsigsim-propagated pulsar data. The obtained experimental results validate the effectiveness of the proposed VMD-based X-ray pulsar denoising design.


**Table 2.** Denoising performance with measured data.

#### **5. Conclusions**

This paper has made further efforts on X-ray pulsar denoising techniques. The VMD approach, incorporated with EP and DFA, is used to derive a novel pulsar denoising algorithm that features strong filtering capacity while eliminating the mode-mixing problem of EMD decomposition. Experimental results with both simulated and HEASARC-archived pulsar data demonstrate that the proposed denoising algorithm outperforms the EP-, WT- and EMD-based denoising approaches. In the future, the authors will apply the proposed denoising algorithm to pulsar identification and pulsar-based navigation simulation of orbital satellites.

**Author Contributions:** Conceptualization, Q.C. and Y.Z.; methodology, Q.C., Y.Z. and L.Y.; software, Y.Z. and L.Y.; validation, Y.Z. and L.Y.; formal analysis, Q.C., Y.Z. and L.Y.; investigation, Q.C., Y.Z. and L.Y.; resources, Q.C., Y.Z. and L.Y.; data curation, Y.Z. and L.Y.; writing—original draft preparation, L.Y.; writing—review and editing, L.Y.; visualization, L.Y.; supervision, L.Y.; project administration, L.Y.; funding acquisition, Q.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Key R&D Program of China (No. 2017YFB0503304).

**Data Availability Statement:** The HEASARC-archived %.cl file is available at https://heasarc.gsfc.nasa.gov (accessed on 20 November 2017).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

