#### *2.1. Principles of Improved EWT*

#### 2.1.1. Principles of EWT

EWT is a widely used method for the adaptive segmentation of signals [31]. It adaptively segments the Fourier spectrum by locating local maxima in the frequency domain and then constructs a set of bandpass filters on the resulting segments to extract the amplitude-modulated and frequency-modulated (AM-FM) components of the signal.

The Fourier axis [0, π] is divided into *N* consecutive parts, Λ<sub>*n*</sub> = [ω<sub>*n*−1</sub>, ω<sub>*n*</sub>] (*n* = 1, 2, ···, *N*; ω<sub>0</sub> = 0, ω<sub>*N*</sub> = π), where ω<sub>*n*</sub> is the boundary between two adjacent parts and is placed at the minimum between two adjacent maxima of the Fourier spectrum of the signal. Figure 1 [32] shows the division of the Fourier axis. In the figure, a transition region of width *T<sub>n</sub>* = 2τ<sub>*n*</sub> is centered on each boundary ω<sub>*n*</sub>.

**Figure 1.** Region segmentation diagram of the Fourier axis.
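The boundary rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ewt_boundaries`, the use of NumPy, and the choice to keep the `n_segments` largest spectral maxima are not taken from the paper.

```python
import numpy as np

def ewt_boundaries(signal, n_segments):
    """Boundaries omega_0..omega_N on [0, pi], placed at spectrum minima
    between the n_segments largest local maxima (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal))
    # Normalized angular frequency of each rfft bin.
    omega = 2 * np.pi * np.arange(spectrum.size) / signal.size
    # Interior local maxima of the magnitude spectrum.
    idx = np.arange(1, spectrum.size - 1)
    is_peak = (spectrum[idx] > spectrum[idx - 1]) & (spectrum[idx] >= spectrum[idx + 1])
    peaks = idx[is_peak]
    # Keep the largest peaks, then sort them by frequency.
    largest = peaks[np.argsort(spectrum[peaks])[::-1][:n_segments]]
    keep = np.sort(largest)
    # Each inner boundary is the minimum between two adjacent kept maxima.
    inner = [lo + int(np.argmin(spectrum[lo:hi + 1]))
             for lo, hi in zip(keep[:-1], keep[1:])]
    inner = np.array(inner, dtype=int)
    return np.concatenate(([0.0], omega[inner], [np.pi]))
```

With `n_segments` retained maxima the function returns `n_segments + 1` boundaries, so that consecutive pairs delimit the segments Λ<sub>*n*</sub>. Noise-induced maxima are not filtered out here, which is exactly the weakness of the traditional rule that Section 2.1.2 addresses.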

Following the wavelet construction method of Littlewood–Paley and Meyer, the empirical wavelet function is constructed. After Λ<sub>*n*</sub> is determined, the empirical wavelet is used as a bandpass filter. The empirical wavelet function ψ̂<sub>*n*</sub>(ω) and the empirical scale function φ̂<sub>*n*</sub>(ω) are given as follows [33]:

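$$\hat{\psi}_n(\omega) = \begin{cases} 1, & (1 + \gamma)\omega_n \le |\omega| \le (1 - \gamma)\omega_{n+1} \\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_{n+1}}\left(|\omega| - (1 - \gamma)\omega_{n+1}\right)\right)\right], & (1 - \gamma)\omega_{n+1} \le |\omega| \le (1 + \gamma)\omega_{n+1} \\ \sin\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_n}\left(|\omega| - (1 - \gamma)\omega_n\right)\right)\right], & (1 - \gamma)\omega_n \le |\omega| \le (1 + \gamma)\omega_n \\ 0, & \text{otherwise} \end{cases} \tag{1}$$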

$$\hat{\phi}_n(\omega) = \begin{cases} 1, & |\omega| \le (1 - \gamma)\omega_n \\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_n}\left(|\omega| - (1 - \gamma)\omega_n\right)\right)\right], & (1 - \gamma)\omega_n \le |\omega| \le (1 + \gamma)\omega_n \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where

$$\begin{array}{l} \beta(x) = x^4 (35 - 84x + 70x^2 - 20x^3) \\ \tau_n = \gamma \omega_n, \qquad \gamma < \min\limits_n \left(\frac{\omega_{n+1} - \omega_n}{\omega_{n+1} + \omega_n}\right) \end{array} \tag{3}$$
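As a rough illustration of how these filters can be evaluated numerically, the sketch below implements β(x), the scale filter, and a wavelet filter in Python with NumPy. The function names are hypothetical. The wavelet filter is written in the usual Meyer-type bandpass form, with a rising sine edge at ω<sub>*n*</sub> and a falling cosine edge at ω<sub>*n*+1</sub>; that form is an assumption about what Equation (1) intends.

```python
import numpy as np

def beta(x):
    """Polynomial of Eq. (3); the argument is clipped to [0, 1]."""
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84 * x + 70 * x**2 - 20 * x**3)

def scaling_filter(omega, w1, gamma):
    """Low-pass empirical scale filter with cutoff boundary w1."""
    w = np.abs(np.asarray(omega, dtype=float))
    tau = gamma * w1
    out = np.zeros_like(w)
    out[w <= w1 - tau] = 1.0
    trans = (w > w1 - tau) & (w <= w1 + tau)
    out[trans] = np.cos(np.pi / 2 * beta((w[trans] - w1 + tau) / (2 * tau)))
    return out

def wavelet_filter(omega, wn, wn1, gamma):
    """Band-pass empirical wavelet filter on [wn, wn1] (assumed Meyer-type form)."""
    w = np.abs(np.asarray(omega, dtype=float))
    tn, tn1 = gamma * wn, gamma * wn1
    out = np.zeros_like(w)
    out[(w >= wn + tn) & (w <= wn1 - tn1)] = 1.0
    upper = (w > wn1 - tn1) & (w <= wn1 + tn1)   # falling cosine edge
    out[upper] = np.cos(np.pi / 2 * beta((w[upper] - wn1 + tn1) / (2 * tn1)))
    lower = (w >= wn - tn) & (w < wn + tn)       # rising sine edge
    out[lower] = np.sin(np.pi / 2 * beta((w[lower] - wn + tn) / (2 * tn)))
    return out
```

Because β(x) + β(1 − x) = 1, the cosine edge of one filter and the sine edge of its neighbour satisfy cos² + sin² = 1 across a shared transition region, so the squared filters add up to one there. The bound on γ in Equation (3) keeps neighbouring transition regions from overlapping.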

After the EWT, the approximation coefficient *W*<sub>*f*1</sub>(0, *t*) and the detail coefficient *W*<sub>*f*2</sub>(*n*, *t*) can be expressed as follows.

$$\begin{aligned} W_{f1}(0, t) &= \langle x, \phi_1 \rangle = \int x(\tau)\, \overline{\phi_1(\tau - t)}\, \mathrm{d}\tau \\ &= F^{-1}\left[\hat{x}(\omega)\, \hat{\phi}_1(\omega)\right] \end{aligned} \tag{4}$$

$$\begin{aligned} W_{f2}(n, t) &= \langle x, \psi_n \rangle = \int x(\tau)\, \overline{\psi_n(\tau - t)}\, \mathrm{d}\tau \\ &= F^{-1}\left[\hat{x}(\omega)\, \hat{\psi}_n(\omega)\right] \end{aligned} \tag{5}$$

Then, the original signal *x*(*t*) is reconstructed as follows:

$$\begin{aligned} x(t) &= W_{f1}(0, t) * \phi_1(t) + \sum_{n=1}^{N} W_{f2}(n, t) * \psi_n(t) \\ &= F^{-1}\left[\hat{W}_{f1}(0, \omega)\, \hat{\phi}_1(\omega) + \sum_{n=1}^{N} \hat{W}_{f2}(n, \omega)\, \hat{\psi}_n(\omega)\right] \end{aligned} \tag{6}$$

where "∗" denotes convolution, and *Ŵ*<sub>*f*1</sub>(0, ω) and *Ŵ*<sub>*f*2</sub>(*n*, ω) are the Fourier transforms of the approximation coefficient *W*<sub>*f*1</sub>(0, *t*) and the detail coefficient *W*<sub>*f*2</sub>(*n*, *t*), respectively. Finally, the signal *x*(*t*) is decomposed into the sum of several single-component signals.

$$\mathbf{x}(t) = \sum\_{k=0}^{N-1} \mathbf{x}\_k(t) \tag{7}$$
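The decomposition and reconstruction steps can be checked numerically with a short FFT-based sketch. It is a minimal illustration, not the authors' implementation: the names `filter_bank` and `ewt_decompose` are hypothetical, the boundaries are supplied by hand, and the last wavelet filter is assumed to stay at 1 up to π so that the bank covers the whole axis.

```python
import numpy as np

def beta(x):
    """Polynomial of Eq. (3); the argument is clipped to [0, 1]."""
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84 * x + 70 * x**2 - 20 * x**3)

def filter_bank(n_samples, inner_bounds, gamma):
    """Rows: scale filter, then one wavelet filter per inner boundary."""
    w = np.abs(2 * np.pi * np.fft.fftfreq(n_samples))   # |omega| in [0, pi]
    args = [np.pi / 2 * beta((w - wc + gamma * wc) / (2 * gamma * wc))
            for wc in inner_bounds]
    rise = [np.sin(a) for a in args]   # 0 -> 1 edge at each boundary
    fall = [np.cos(a) for a in args]   # 1 -> 0 edge at each boundary
    bank = [fall[0]]
    for k in range(len(inner_bounds)):
        # Assumption: the last band has no upper edge and extends to pi.
        upper = fall[k + 1] if k + 1 < len(inner_bounds) else np.ones_like(w)
        bank.append(rise[k] * upper)
    return np.array(bank)

def ewt_decompose(x, inner_bounds, gamma):
    """Return the coefficients of Eqs. (4)-(5) and the summands of Eq. (6)."""
    x = np.asarray(x, dtype=float)
    bank = filter_bank(x.size, inner_bounds, gamma)
    # Filtering in the frequency domain gives the coefficients.
    coeffs = np.fft.ifft(np.fft.fft(x) * bank, axis=-1).real
    # Filtering the coefficients again gives each single-component signal.
    modes = np.fft.ifft(np.fft.fft(coeffs, axis=-1) * bank, axis=-1).real
    return coeffs, modes
```

Summing the rows of `modes` recovers the input up to rounding error, because the squared filters add up to one at every frequency when γ satisfies Equation (3).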

#### 2.1.2. Principle of the Adaptive Frequency Window EWT Algorithm

In the traditional EWT algorithm, the spectral boundaries are determined by the extreme points of the spectrum. However, the vibration signal of a ball mill is easily contaminated by strong noise, which disorders the arrangement of these extreme points. To address this deficiency, this paper uses the adaptive frequency window EWT to divide the spectral boundaries, as shown in Figure 2.

**Figure 2.** Diagram of empirical wavelet transform (EWT) spectral boundary division with an adaptive frequency window.

In Figure 2, the frequency window is represented as [ω<sub>*a*</sub>, ω<sub>*b*</sub>], where ω<sub>*a*</sub> and ω<sub>*b*</sub> are the center frequencies of the lower and upper cutoff bands of the window, respectively. The shaded areas represent the transition regions at the edges of the segmented portion of the spectrum, each with a width of 2τ. The support interval is [0, π]. The frequency window can slide freely within this interval, and its width is adaptively variable.

With the improved frequency window segmentation, Equations (1) and (5) are modified as follows.

$$\hat{\psi}(\omega) = \begin{cases} 1, & \omega_a + \tau \le |\omega| \le \omega_b - \tau \\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\tau}\left(|\omega| - \omega_b + \tau\right)\right)\right], & \omega_b - \tau \le |\omega| \le \omega_b + \tau \\ \sin\left[\frac{\pi}{2}\beta\left(\frac{1}{2\tau}\left(|\omega| - \omega_a + \tau\right)\right)\right], & \omega_a - \tau \le |\omega| \le \omega_a + \tau \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

$$\begin{aligned} W'(t) &= \langle x, \psi \rangle = \int x(\tau)\, \overline{\psi(\tau - t)}\, \mathrm{d}\tau \\ &= F^{-1}\left[\hat{x}(\omega)\, \hat{\psi}(\omega)\right] \end{aligned} \tag{9}$$

Additionally, the parameters in Equation (8) must satisfy the following conditions.

$$\begin{cases} \beta(\mathbf{x}) = \mathbf{x}^4 (35 - 84\mathbf{x} + 70\mathbf{x}^2 - 20\mathbf{x}^3) \\ \tau = \gamma \omega\_a \qquad \gamma < (\omega\_b - \omega\_a) / (\omega\_b + \omega\_a) \end{cases} \tag{10}$$

Therefore, the modal component signal can be reconstructed as follows:

$$x^*(t) = W'(t) * \psi(t) = F^{-1}\left[\hat{W}'(\omega)\, \hat{\psi}(\omega)\right] \tag{11}$$

where ψ̂(ω) is the Fourier transform of ψ(*t*) and *x*<sup>∗</sup>(*t*) is the AM–FM component signal extracted by the improved EWT.
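A single frequency window and the component it extracts can be sketched as follows. The function names are hypothetical, the window edges are passed in by hand rather than found adaptively, and the transition argument is scaled by 1/(2τ) so that it runs from 0 to 1 across each transition region; these are assumptions made for illustration.

```python
import numpy as np

def beta(x):
    """Polynomial of Eq. (10); the argument is clipped to [0, 1]."""
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84 * x + 70 * x**2 - 20 * x**3)

def window_filter(omega, w_a, w_b, gamma):
    """Band-pass filter on the window [w_a, w_b] with tau = gamma * w_a."""
    tau = gamma * w_a
    w = np.abs(np.asarray(omega, dtype=float))
    # Rising sine edge centred on w_a, falling cosine edge centred on w_b.
    rise = np.sin(np.pi / 2 * beta((w - w_a + tau) / (2 * tau)))
    fall = np.cos(np.pi / 2 * beta((w - w_b + tau) / (2 * tau)))
    # The product equals the piecewise form when the two edges do not overlap.
    return rise * fall

def extract_component(x, w_a, w_b, gamma):
    """Filter once for the coefficients, Eq. (9), and again for x*(t), Eq. (11)."""
    x = np.asarray(x, dtype=float)
    psi = window_filter(2 * np.pi * np.fft.fftfreq(x.size), w_a, w_b, gamma)
    coeff = np.fft.ifft(np.fft.fft(x) * psi).real
    return np.fft.ifft(np.fft.fft(coeff) * psi).real
```

For example, with normalized frequencies in rad/sample, `extract_component(x, 0.15, 0.30, 0.2)` keeps spectral content between 0.18 and 0.27 unchanged and tapers it to zero over the two transition regions. The bound on γ in Equation (10) guarantees that the two edges stay separate.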

#### 2.1.3. Simulation and Comparative Analysis of Improved EWT

To verify the ability of the improved EWT method to extract the feature components of the signal, a simulation with the improved EWT approach is performed, and the results are compared with those of the traditional EWT. The simulation signal *x* (*t*) is constructed as follows in Equation (12):

$$\begin{cases} \begin{aligned} x_1(t) &= 2t^2 \\ x_2(t) &= 1.1 \sin(34\pi t) \\ x_3(t) &= \begin{cases} 0.7 \cos(56\pi t), & 0 < t < 0.5 \\ 0.8 \cos(64\pi t), & t \ge 0.5 \end{cases} \\ x(t) &= x_1(t) + x_2(t) + x_3(t) \end{aligned} \end{cases} \tag{12}$$

White noise is added to *x*(*t*), the signal-to-noise ratio (SNR) is set to 3, and *t* ∈ [0, 1]. Figure 3 shows the improved EWT and traditional EWT decomposition results for the simulation signal *x*(*t*).
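For readers who want to reproduce a comparable test signal, the sketch below builds Equation (12) and adds Gaussian white noise. Several choices are assumptions not stated in the text: the SNR of 3 is interpreted in decibels, 1000 samples are used on [0, 1), the noise is Gaussian, and the first branch of *x*<sub>3</sub>(*t*) is also applied at *t* = 0.

```python
import numpy as np

def simulation_signal(n_samples=1000, snr_db=3.0, seed=0):
    """Return t, the clean signal of Eq. (12), and a noisy copy.

    n_samples, snr_db (in dB) and seed are illustrative assumptions.
    """
    t = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    x1 = 2 * t**2
    x2 = 1.1 * np.sin(34 * np.pi * t)
    x3 = np.where(t < 0.5,
                  0.7 * np.cos(56 * np.pi * t),
                  0.8 * np.cos(64 * np.pi * t))
    clean = x1 + x2 + x3
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples)
    # Scale the noise so that signal power / noise power = 10^(snr_db / 10).
    target_noise_power = np.mean(clean**2) / 10 ** (snr_db / 10)
    noise *= np.sqrt(target_noise_power / np.mean(noise**2))
    return t, clean, clean + noise
```

The returned noisy signal can then be passed to any EWT implementation to compare how well each recovers the three components.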

**Figure 3.** The improved EWT and traditional EWT decomposition results (red dotted lines represent the original signal; blue solid lines represent the decomposition results). (**a**) Improved EWT; (**b**) EWT.

In Figure 3, the components f2–f5 correspond to the signals *x*<sub>3</sub>(*t*) to *x*<sub>1</sub>(*t*). Figure 3a shows that the improved EWT separates the noise contained in the signal well and that the degree of coincidence of each component with the original is close to 90%. The two modes that originally belonged to the same component are decomposed separately because they have clearly distinct energies and can be regarded as two independent modes. Figure 3b shows that the traditional EWT can separate the noise, but the components *x*<sub>1</sub>(*t*), *x*<sub>2</sub>(*t*), and *x*<sub>3</sub>(*t*) are deformed because the traditional EWT segmentation method is too simple. When noisy or nonstationary signals are analyzed, some local maxima generated by noise and nonstationary components may erroneously remain in the peak sequence, and some useful maxima may be dropped from it, resulting in improper segmentation. The improved EWT uses the adaptive frequency window for spectrum segmentation, which reduces the effects of noise and nonstationary components and greatly increases the reliability of spectrum segmentation.

This comparative study of simulated signals indicates that the improved EWT method can effectively detect the modal components in power spectra, extract components similar to the original signal components, and suppress modal aliasing. Thus, the decomposition effect of the improved EWT method is better than that of the traditional EWT method.

#### *2.2. Principle of Multiscale Fuzzy Entropy*

#### 2.2.1. Principle of Fuzzy Entropy

Fuzzy entropy (FE) measures the probability that a time series generates a new pattern when the embedding dimension changes, which reflects the complexity and irregularity of the time series. The larger this probability, the greater the FE value [34]. During the operation of a ball mill, a change in the load state causes an obvious change in the characteristics of the cylinder vibration signal, and FE can effectively characterize the state of the signal in each frequency band during the sampling time. Therefore, it is feasible to use FE as the characteristic parameter of the vibration signal of a ball mill cylinder. The algorithm steps are as follows.

1. For a time series *u*(*i*) of length *N*, the *m*-dimensional vectors are constructed as follows:

$$\begin{aligned} X_i^m &= \{ u(i), u(i+1), \dots, u(i+m-1) \} - u_0(i) \\ u_0(i) &= \frac{1}{m} \sum_{j=0}^{m-1} u(i+j), \qquad i = 1, 2, \dots, N-m+1 \end{aligned} \tag{13}$$

where *X*<sub>*i*</sub><sup>*m*</sup> is the vector of *m* consecutive values starting at *u*(*i*) with their mean *u*<sub>0</sub>(*i*) removed.

2. Calculate the distance between *X*<sub>*i*</sub><sup>*m*</sup> and *X*<sub>*j*</sub><sup>*m*</sup> as the maximum absolute difference of their corresponding elements:

$$d_{ij}^m = d[X_i^m, X_j^m] = \max_{k \in (0, m-1)} \left| u(i+k) - u_0(i) - \left(u(j+k) - u_0(j)\right) \right| \tag{14}$$

where *i*, *j* = 1, 2, ···, *N* − *m*, *i* ≠ *j*.

3. The similarity between *X*<sub>*i*</sub><sup>*m*</sup> and *X*<sub>*j*</sub><sup>*m*</sup> is defined by a fuzzy function as follows:

$$D_{ij}^m = \mu(d_{ij}^m, n, r) = e^{-(d_{ij}^m / r)^n} \tag{15}$$

where μ(*d*<sub>*ij*</sub><sup>*m*</sup>, *n*, *r*) is an exponential fuzzy membership function, and *n* and *r* are the gradient and the width of its boundary, respectively.

4. Define the functions as follows:

$$\phi^m(n,r) = \frac{1}{N-m} \sum\_{i=1}^{N-m} \left( \frac{1}{N-m-1} \sum\_{\substack{j=1 \\ j \neq i}}^{N-m} D\_{i,j}^m \right) \tag{16}$$

where *i*, *j* = 1, 2, ···, *N* − *m*, *i* ≠ *j*.

5. The above steps are repeated for dimension *m* + 1 to obtain φ<sup>*m*+1</sup>:

$$\phi^{m+1}(n,r) = \frac{1}{N-m} \sum\_{i=1}^{N-m} \left( \frac{1}{N-m-1} \sum\_{\substack{j=1 \\ j \neq i}}^{N-m} D\_{i,j}^{m+1} \right) \tag{17}$$

6. The calculation formula of the FE value can be summarized as follows:

$$FuzzyEn(m,n,r) = \lim_{N \to \infty} \left[ \ln \phi^m(n,r) - \ln \phi^{m+1}(n,r) \right] \tag{18}$$

where *i*, *j* = 1, 2, ···, *N* − *m*, *i* ≠ *j*.

7. When *N* is finite, Equation (18) becomes the following formula.

$$FuzzyEn(m, n, r, N) = \ln \phi^m(n, r) - \ln \phi^{m+1}(n, r) \tag{19}$$
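The seven steps above can be condensed into a short NumPy sketch. It is an illustration under stated assumptions rather than the authors' code: the function name is hypothetical, the defaults *m* = 2, *n* = 2, and *r* = 0.2 are placeholders and not the paper's parameter selection, and *N* − *m* vectors are used for both dimensions so that the sums match Equations (16) and (17).

```python
import numpy as np

def fuzzy_entropy(u, m=2, n=2, r=0.2):
    """FuzzyEn(m, n, r, N) of Eq. (19) for a 1-D series u.

    r is an absolute tolerance; the defaults are placeholder values.
    """
    u = np.asarray(u, dtype=float)
    N = u.size

    def phi(dim):
        # Step 1: N - m vectors of length dim with their own mean removed.
        idx = np.arange(N - m)[:, None] + np.arange(dim)[None, :]
        X = u[idx]
        X = X - X.mean(axis=1, keepdims=True)
        # Step 2: maximum absolute element-wise difference for every pair.
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        # Step 3: exponential fuzzy membership.
        D = np.exp(-(d / r) ** n)
        # Steps 4-5: average over all pairs with j != i.
        np.fill_diagonal(D, 0.0)
        return D.sum() / ((N - m) * (N - m - 1))

    # Step 7: finite-N estimate.
    return np.log(phi(m)) - np.log(phi(m + 1))
```

In practice *r* is usually expressed as a fraction of the standard deviation of the series, for example `fuzzy_entropy(u, r=0.2 * np.std(u))`. The pairwise distance array grows with the square of the series length, so long records may need to be processed in blocks.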

#### 2.2.2. Principle of Multiscale Fuzzy Entropy

The characteristic frequency band and complexity of the cylinder vibration signal of a ball mill under different load conditions differ across scales. Because considering the FE of the vibration signal at different scales can improve the recognition accuracy, the concept of multiple scales is introduced on the basis of FE. The steps in the multiscale fuzzy entropy (MFE) algorithm are as follows.

1. Construct a new coarse-grained vector from the original time series *X* = {*x*<sub>1</sub>, *x*<sub>2</sub>, ···, *x<sub>N</sub>*} as follows:

$$y\_j(\tau) = \frac{1}{\tau} \sum\_{i=\left(j-1\right)\tau+1}^{j\tau} x\_i \quad 1 \le j \le \frac{N}{\tau} \tag{20}$$

where τ = 1, 2, ···, *n* represents the scale factor. When τ = 1, the coarse-grained vector is the original sequence. For a given τ, the original sequence is averaged over consecutive, non-overlapping windows of τ points, which gives a coarse-grained vector of length *N*/τ. Figure 4 shows the coarse-graining process for τ = 2 and τ = 3.

**Figure 4.** The coarse granularity process of multiscale fuzzy entropy (MFE).

2. The FuzzyEn of each coarse-grained sequence is calculated with the same similarity tolerance *r*, which is determined by the standard deviation of the original sequence. The FuzzyEn value can then be expressed as a function of the scale factor τ in MFE analysis.
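The two MFE steps can be sketched as follows. The function names are hypothetical, and the defaults (*m* = 2, *n* = 2, tolerance equal to 0.15 times the standard deviation of the original sequence) are placeholder assumptions, not the paper's selected parameters. A compact fuzzy entropy routine is included so that the block runs on its own.

```python
import numpy as np

def coarse_grain(x, tau):
    """Eq. (20): mean of consecutive, non-overlapping windows of tau points."""
    x = np.asarray(x, dtype=float)
    n_out = x.size // tau
    return x[:n_out * tau].reshape(n_out, tau).mean(axis=1)

def fuzzy_entropy(u, m, n, r):
    """Finite-N fuzzy entropy of a 1-D series with absolute tolerance r."""
    u = np.asarray(u, dtype=float)
    N = u.size

    def phi(dim):
        idx = np.arange(N - m)[:, None] + np.arange(dim)[None, :]
        X = u[idx]
        X = X - X.mean(axis=1, keepdims=True)          # remove local mean
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        D = np.exp(-(d / r) ** n)                      # fuzzy similarity
        np.fill_diagonal(D, 0.0)                       # exclude j == i
        return D.sum() / ((N - m) * (N - m - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))

def multiscale_fuzzy_entropy(x, max_scale, m=2, n=2, r_factor=0.15):
    """FuzzyEn of the coarse-grained series for tau = 1..max_scale."""
    x = np.asarray(x, dtype=float)
    # The tolerance is fixed once from the original sequence (assumed factor).
    r = r_factor * np.std(x)
    return np.array([fuzzy_entropy(coarse_grain(x, tau), m, n, r)
                     for tau in range(1, max_scale + 1)])
```

For instance, `coarse_grain([1, 2, 3, 4, 5, 6], 2)` gives `[1.5, 3.5, 5.5]`. Because the coarse-grained series shortens as τ grows, `max_scale` should be chosen so that enough points remain at the largest scale for a stable estimate.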

#### 2.2.3. Parameter Selection for MFE

According to the definition of MFE, the calculation of MFE is related to the embedding dimension *m*, similarity tolerance *r*, exponential function gradient *n,* and data length *N*. The selection rules are as follows.


#### *2.3. Principle of the AEPSO\_PNN*
