**1. Introduction**

Fault diagnosis and detection for complex industrial systems has been widely investigated and rapidly developed in recent years [1–5]. In essence, fault diagnosis in industrial environments is pattern recognition based on fault features. In engineering systems, fault diagnosis is usually carried out in two aspects: model-based and data-based [6]. With the progress of science and technology, intelligent pattern recognition algorithms for fault signals have been developed vigorously, such as neural networks [7–9], K-nearest neighbor [10–12], and LSSVM [13–15]. Neural networks have the advantage of being able to approximate arbitrary complex nonlinearities and have good robustness [6,16]. For example, Xu et al. [17] proposed a fault diagnosis method based on neural networks and fuzzy theory for rotating machinery. In [4], a performance degradation and fault detection model for industrial systems was proposed based on transfer learning and federated neural networks, and the analysis illustrated its effectiveness and feasibility for industrial systems. For the purpose of fault detection, Chen et al. [9] established a data-driven fault detection scheme based on two neural networks, which can construct the optimal model adaptively. These methods demonstrate the effectiveness of neural network algorithms in fault diagnosis for dynamic industrial systems [18]. In another respect, vibration signals can be converted into two-dimensional digital images representing the patterns of

**Citation:** Guan, S.; Huang, D.; Guo, S.; Zhao, L.; Chen, H. An Improved Fault Diagnosis Approach Using LSSVM for Complex Industrial Systems. *Machines* **2022**, *10*, 443. https://doi.org/10.3390/ machines10060443

 Academic Editor: Ahmed Abu-Siada

Received: 7 May 2022 Accepted: 31 May 2022 Published: 4 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the permutation entropy of those signals, as in [19], where a deep neural network was established for pattern recognition. Usually, a neural network algorithm needs a large amount of data training to establish a model with high diagnostic accuracy. However, it is difficult to obtain a large amount of fault data from complex systems in practice. K-nearest neighbor is one of the simplest algorithms based on data-driven classification technology, and it is easy to implement and requires no parameter estimations. It is widely applied in pattern recognition, fault diagnosis, and the multiple classification problem [20–22]. Ma et al. [23] proposed a multilabel learning algorithm based on the K-nearest neighbor algorithm for managing the prognostics and health of rolling bearings, and Gan et al. [24] used the K-nearest classifier to identify different rolling bearing conditions for industrial systems. Nevertheless, K-nearest neighbor is highly dependent on samples, the effect of this defect on classification accuracy cannot be neglected.

Support vector machine (SVM), as a classical pattern identification method, is widely used in various fields. For example, a temporal-based support vector machine for the detection and identification of several toxic gases in a gas mixture was proposed in [25], which also indicates the grea<sup>t</sup> potential of SVM. LSSVM, which is a modification of the SVM, was proposed by Suykens and Vandewalle in [26]. Inequality constraints in SVM are replaced by equality constraints in LSSVM, reducing the difficulty of calculation. Zhang et al. [27] combined a generalized frequency response function and LSSVM to achieve fault classification for a nonlinear analog circuit. The results showed that the fault diagnosis method can obtain high recognition accuracy. Product function correntropy and LSSVM were presented in [28] to improve the fault diagnosis performance for rolling bearings in varying industrial conditions. In order to further improve the effectiveness of LSSVM, Zhang et al. [29] used PSO to optimize LSSVM, and their proposed PSO-LSSVM fault diagnosis method had a high recognition rate. Similarly, a fault identification method for rolling bearings in industrial systems was proposed in [30]. In addition, Ren et al. achieved fault detection and diagnosis in complex industrial systems based on PSO-LSSVM, and their experimental results showed that this method can be applied well in the field of industry. As mentioned above, as a classical intelligent optimization algorithm, PSO is widely used due to its convenience of implementation: it does not require that extra attention be paid to parameter tuning. However, the PSO algorithm also has many disadvantages, such as a poor ability to search locally, and its tendency to fall easily into the local extremum [31–33]. To solve this problem, many scholars have made grea<sup>t</sup> efforts. For example, Zhang et al. [34] introduced dynamic inertia weights and gradient information to improve PSO. At the same time, a bearing fault diagnosis method via an LSSVM identification model was presented. Liu et al. [35] established a fault detection model based on a chaotic PSO algorithm and a kernel-independent component analysis, and the simulation results showed that the optimization method can avoid the phenomenon of the PSO algorithm's susceptibilty to falling into a local extremum. Furthermore, an improved PSO- and SVM-based fault diagnosis methodology was presented in [36] to predict faults in nuclear power plants.

Motivated by the above observations, the first contribution of this study is to design a novel fault diagnosis method based on WMPSO-LSSVM that can achieve a high classification accuracy. The second contribution is to solve the problems of the PSO algorithm's susceptibilty to falling into a local extremum and its low search precision. In addition, this study adopts the data-driven method to realize the fault diagnosis and prognostics of the actual complex parts in an industrial system, and a contrast experiment shows that the established joint optimization scheme has superior performance and strong robustness, which can promote the development of mechanical fault diagnosis.

The remaining parts of this study include Section 2, which introduces the signal preprocessing and feature extraction methods, which are based on an orthogonal wavelet packet algorithm ( WPT); Section 3, in which the WMPSO-LSSVM-based fault diagnosis scheme is presented; Section 4, where the effectiveness of this study is verified by actual fault data and comparison experiments; and finally, Section 5, in which the conclusion is given.

### **2. Signal Decomposition and Feature Extraction-Based Orthogonal Wavelet Packet Transform**

Wavelet transforms have been widely used for vibration signal pre-processing for industrial systems. Generally, wavelet transforms only decompose the low-frequency part of the signal, and do not treat the high-frequency portion of the signal at all. However, the detailed information that can characterize the vibration signal usually exists in the high-frequency section. Therefore, the orthogonal wavelet packet transform is introduced to solve this problem. Furthermore, the vibration signal of industrial systems can be decomposed in this way without information loss, which lays a foundation for obtaining high fault diagnosis accuracy. The theoretical basis is described as follows.

In multiresolution analysis, *L*<sup>2</sup>(*R*) is a square-integrable space and *L*<sup>2</sup>(*R*) = ⊕ *j*∈*ZWj*,

indicating that the multiresolution analysis decomposes *L*<sup>2</sup>(*R*) into the orthogonal sum of all subspaces *Wj*(*j* ∈ *<sup>Z</sup>*), according to the different scale factors *j*. *Wj*(*j* ∈ *Z*) is the wavelet subspace of the wavelet function *ψ*(*t*). Then, we hope to further subdivide *Wj*(*j* ∈ *Z*) through a binary fraction. Therefore, the scale subspace *Vj* and the wavelet subspace *Wj* can be represented through a new subspace *Un j*, if there are the following conditions:

$$\begin{cases} \ \mathcal{U}^0\_j = V\_j j \in Z \\ \ \mathcal{U}^1\_j = \mathcal{W}\_j j \in Z \end{cases} \tag{1}$$

Then, the orthogonal decomposition of the Hilbert space can be expressed as follows:

$$
\mathcal{U}\_{j+1}^0 = \mathcal{U}\_j^0 \oplus \mathcal{U}\_j^1 \tag{2}
$$

Suppose *Un j* is the wavelet subspace of *un*(*t*), *U*2*n j* is the wavelet subspace of *<sup>u</sup>*2*n*(*t*), and *un*(*t*) is:

$$\begin{cases} \boldsymbol{\mu}\_{2u}(t) = \sqrt{2} \sum\_{k \in \mathbb{Z}} h(k) \boldsymbol{\mu}\_{\boldsymbol{n}}(2t - k) \\\ \boldsymbol{\mu}\_{2u+1}(t) = \sqrt{2} \sum\_{k \in \mathbb{Z}} \boldsymbol{\mathcal{g}}(k) \boldsymbol{\mu}\_{\boldsymbol{n}}(2t - k) \end{cases} \tag{3}$$

where *h*(*k*) represents the low-pass filter coefficients and *g*(*k*) represents the high-pass filter coefficients, and *g*(*k*) = (−<sup>1</sup>) *k h*(1 − *k*). Then, Formula (3) can be rewritten as follows:

$$\begin{cases} \boldsymbol{\mu}\_{2\boldsymbol{u}}(t) = \sqrt{2} \sum\_{\substack{k \in \mathbb{Z} \\ \boldsymbol{\mu}\_{2\boldsymbol{u}+1}(t)}} \boldsymbol{h}\_{k} \boldsymbol{\mu}\_{\boldsymbol{u}}(2t - k) \\\ \boldsymbol{\mu}\_{2\boldsymbol{u}+1}(t) = \sqrt{2} \sum\_{k \in \mathbb{Z}} \boldsymbol{g}\_{k} \boldsymbol{\mu}\_{\boldsymbol{u}}(2t - k) \end{cases} \tag{4}$$

where *<sup>u</sup>*0(*t*) = *φ*(*t*) (*φ*(*t*) is the scale function), *<sup>u</sup>*1(*t*) = *ψ*(*t*) (*ψ*(*t*) is the wavelet basis function), and the sequence {*un*(*t*)}*n*∈*Z*+ is the orthogonal wavelet packet basis.

Suppose *f*(*n*) is the signal to be decomposed. In fact, a wavelet packet transform of *f*(*n*) is a projection coefficient on the wavelet packet basis {*un*(*t*)}*n*∈*Z*+ :

$$p\_f(n,j,k) = \langle f(t), u\_n(t) \rangle = \int\_{-\infty}^{+\infty} f(t) \left[ 2^{-j/2} \bar{u}\_{\text{fl}} \left( 2^{-j} \bar{\mathbf{r}} - k \right) \right] dt \tag{5}$$

where {*ps*(*<sup>n</sup>*, *j*, *<sup>k</sup>*)}*<sup>k</sup>*∈*<sup>Z</sup>* is the sequence of transformation coefficients of *f*(*n*) on *Un j*.

Usually, the transformation coefficients {*ps*(*<sup>n</sup>*, *j*, *<sup>k</sup>*)}*<sup>k</sup>*∈*<sup>Z</sup>* can be calculated through the Mallat algorithm:

$$\begin{cases} p\_f(2n,j,k) = \sum\_{l \in \mathbb{Z}} h\_{l-2k} p\_f(n,j-1,l) \\\ p\_f(2n+1,j,k) = \sum\_{l \in \mathbb{Z}} g\_{l-2k} p\_f(n,j-1,l) \end{cases} \tag{6}$$

According to the above discussion, the decomposition processing of the original signal is depicted and illustrated in Figure 1.

**Figure 1.** Wavelet decomposition for original signals.

Because of the integrity and orthogonality of the wavelet packet space, the original signal *f*(*n*) is almost completely intact after wavelet decomposition, which provides conditions for analyzing signal characteristics.

According to the above definition of the orthogonal wavelet packet transform, the signal *f*(*n*) has been projected adaptively into the orthogonal wavelet packet space; then, the obtained component can be regarded as the energy distributed in the corresponding space. If the energy distribution of signals in the space of each orthogonal wavelet packet can be calculated at a certain decomposition level, then the characteristics can be extracted by sorting these energies according to the frequency index of *Unj* . The energy distribution in the time-frequency localization space can be interpreted as follows:

$$E(j, n) = \sum\_{k \in \mathbb{Z}} \left[ p\_f(n, j, k) \right]^2 \tag{7}$$

Therefore, if the original signal *f*(*t*) is decomposed by *P* levels, the energy feature vector extracted from the original signal can be expressed as follows:

$$E^\*(P, f) = \begin{bmatrix} E(P, 0), E(P, 1), \dots, E\left(P, 2^P - 1\right) \end{bmatrix} \tag{8}$$

### **3. Improved Fault Diagnosis Approach Using WMPSO-LSSVM**

*3.1. Least Squares Support Vector Machine*

The literature of various fields shows that the LSSVM model performs well on various datasets, so it can process the data generated under unknown working conditions in complex industrial systems well. In addition, the complete theoretical basis of LSSVM can also ensure its stability. The principle of LSSVM is as follows:

$$\min\_{w,b} \frac{1}{2} \left\| w \right\|^2 + C \sum\_{i=1}^{m} \zeta\_i^2 \tag{9}$$

$$s.t.y\_i(w^Tx\_i + b) = 1 - \zeta\_{i\prime}i = 1,2,\ldots,m\tag{10}$$

where {(*<sup>x</sup>*1, *y*1),(*<sup>x</sup>*2, *y*2),... ,(*xl*, *yl*)} are the samples to be observed, *w* is the perpendicular vector of the line, *b* is the offset of the hyperplane, *C* is the regularization parameter, and *ζi* represents the fluctuations in the error of each sample.

To obtain an accurate solution to the above optimal problem, the Lagrange function with slack variables can be established as follows:

$$L(\boldsymbol{w}, b, \boldsymbol{\zeta}, \boldsymbol{a}, \boldsymbol{\lambda}) = \frac{1}{2} \left\| \boldsymbol{w} \right\|^2 + \mathbb{C} \sum\_{i} \boldsymbol{\zeta}\_i^2 + \sum\_{i} a\_i (1 - \boldsymbol{\zeta}\_i y\_i \left( \boldsymbol{w}^T \boldsymbol{q}\_{\text{lswm}}(\mathbf{x}\_i) + b \right)) - \sum\_{i} \boldsymbol{\lambda}\_i \boldsymbol{\zeta}\_i \tag{11}$$

where *αi* is the Lagrange multiplier of the original problem, and *λi* is the Lagrange multiplier of the additional slack variables.

Take the derivative of each variable in Formulas (9) and (10) and let them be 0. The following equalities hold:

$$\begin{cases} w = \sum\_{i}^{n} \boldsymbol{\alpha}\_{i} \mathbf{x}\_{i} \boldsymbol{y}\_{i} \\ \sum\_{i=1}^{n} \boldsymbol{\alpha}\_{i} \mathbf{y}\_{i} = 0 \\ \mathbf{C} - \boldsymbol{\alpha}\_{i} - \boldsymbol{\lambda}\_{i} = 0 \end{cases} \tag{12}$$

Thus, Formula (11) can be rewritten as follows:

$$L(\zeta, \mathfrak{a}, \lambda) = \sum\_{i} \mathfrak{a}\_{i}^{2} + \sum\_{i,j} \mathfrak{a}\_{i} \mathfrak{a}\_{j} \mathfrak{y}\_{i} \mathfrak{y}\_{j} \mathfrak{X}\_{i}^{T} \mathfrak{X}\_{j} \tag{13}$$

Therefore, the optimal problem of Formulas (9) and (10) can be expressed as follows:

$$\begin{cases} \max\_{\mathfrak{a}} \mathcal{W}(\mathfrak{a}) = \sum\_{i=1}^{n} \mathfrak{a}\_{i} - \frac{1}{2} \sum\_{i,j=1}^{n} y\_{i} y\_{j} \mathfrak{a}\_{i} \mathfrak{a}\_{j} \langle \mathfrak{x}\_{i}, \mathfrak{x}\_{j} \rangle \\ \text{ s.t. } \sum\_{i=1}^{n} \mathfrak{a}\_{i} y\_{i} = 0 \end{cases} \tag{14}$$

Given the varying conditions of industrial systems, the vibration signal of equipment follows a nonlinear relationship. In order to solve the problem of linear indivisibility in primordial space, it is necessary to transform the failure samples into multi-dimensional distinguishable space by introducing kernel functions. Therefore, Formula (14) can be written as follows:

$$\begin{cases} \max\_{\boldsymbol{\alpha}} \mathcal{W}(\boldsymbol{\alpha}) = \sum\_{i=1}^{n} \boldsymbol{\alpha}\_{i} - \frac{1}{2} \sum\_{i,j}^{n} \mathcal{Y}^{(i)} \mathcal{Y}^{(j)} \boldsymbol{\alpha}\_{i} \boldsymbol{\alpha}\_{j} k (\boldsymbol{x}\_{i}, \boldsymbol{x}\_{j}) \\ \quad \text{s.t. } \sum\_{i=1}^{n} \boldsymbol{a}\_{i} \boldsymbol{y}\_{i} = 0 \end{cases} \tag{15}$$

where *<sup>k</sup> xi*, *xj* is the kernel function, and the selection of the kernel function has grea<sup>t</sup> flexibility. The common kernel functions are described as follows:

1. Linear kernel function:

$$K(\mathbf{x}\_{i\prime}, \mathbf{x}\_{j}) = \mathbf{x}\_{i} \cdot \mathbf{x}\_{j} \tag{16}$$

2. Polynomial kernel function:

$$\mathcal{K}(\mathbf{x}\_{i}, \mathbf{x}\_{j}) = \left(\mathbf{x}\_{i} \cdot \mathbf{x}\_{j} + 1\right)^{l}, l = 1, 2, \dots \tag{17}$$

3. Gaussian kernel function:

$$K(\mathbf{x}\_{i}, \mathbf{x}\_{j}) = \exp\left[-\frac{\left\|\mathbf{x}\_{i} - \mathbf{x}\_{j}\right\|^{2}}{2\sigma^{2}}\right] \tag{18}$$

The Gaussian kernel function selected in this paper can effectively transform the data from the low-dimensional non-separable space to the high-dimensional separable space, and it can further improve the classification accuracy of the model. Another advantage of Gaussian kernels, compared to other kernels, is that the more complex the model, the stronger the performance. In addition, no matter how many dimensions are the

characteristics of each sample point, each sample can be transformed into the total sample quantity dimension after processing by the Gaussian kernel function, which expands the dimension and the diversity of data.

It is natural to notice that LSSVM's classification accuracy is closely related to the penalty factor and parameter *σ* of the kernel functions. If the kernel function is too small, there will be an over-fitting phenomenon in the classification; otherwise, there will be an under-fitting phenomenon. Similarly, the larger the penalty factor, the more likely it is to overfit; and the smaller the penalty factor is, the more likely it is to underfit. Thus, in order to improve the accuracy of fault diagnosis for industrial systems, an optimized approach, named WMPSO-LSSVM, is proposed in the next section.

### *3.2. WMPSO-Based Parameters Optimization of LSSVM*

As mentioned above, the regularization parameter and kernel functions play an important role in LSSVM. Thus, in this paper, we adopt the proposed WMPSO algorithm to optimize the parameters and establish a desirable model with high classification accuracy. Firstly, the basic model of PSO is as follows:

$$C\_i = m \times C\_i + c\_1 \times rand \times (qbest - \sigma\_i) + c\_2 \times rand \times (qbest - \sigma\_i) \tag{19}$$

$$
\sigma\_{\bar{i}} = \sigma\_{\bar{i}} + \mathbb{C}\_{\bar{i}} \tag{20}
$$

where *Ci* is the regulation parameter of the LSSVM as well as the current velocity of PSO, and *σi* is the kernel function of the LSSVM as well as the location of particles in PSO. *m* indicates the weight coefficient, *c*1 and *c*2 are learning factors, and *rand* is a random number between 0 and 1. Meanwhile, *gbest* and *qbest* store the optimal values corresponding to the penalty coefficient *C* and the kernel parameter *σ*, respectively.

Suppose there is a group of particle swarms *S* = (*<sup>S</sup>*1, *S*2,... , *Sn*) in an n-dimensional space; *C* and *σ* can be presented as follows:

$$\mathbb{C} = (\mathbb{C}\_1, \mathbb{C}\_2, \dots, \mathbb{C}\_i) \tag{21}$$

$$
\sigma = (\sigma\_1, \sigma\_2, \dots, \sigma\_i) \tag{22}
$$

In this paper, the wavelet function *μ*∗ is used to conduct a random perturbation of all the dimensions of the contemporary optimal value *Qm g* (*t*) particles, and the perturbation result is taken as the position of the particles. The calculation model is given as follows:

$$
\sigma^{\mathfrak{m}}(t) = \mu^\* Q\_{\mathcal{F}}^{\mathfrak{m}}(t) \tag{23}
$$

For the sake of the accuracy of the WMPSO algorithm, the Morlet function was selected as the wavelet base in this study, as shown in Figure 2.

The Morlet wavelet has more accurate and high-resolution spectral estimation, and has thus been widely used. Compared with the Gaussian and Cauchy variations often used in particle swarm optimizations, the Morlet wavelet searches more effectively in the solution space because there is an equal probability of producing positive and negative numbers.

In addition, the Morlet wavelet function changes the local solution more frequently in the solution space, and it is easier to obtain the optimal solution in the local optimization. The Morlet wavelet function can fine-tune the particle, so it is a remarkable choice to select the Morlet wavelet for mutation.

Thus, the wavelet function value applied is expressed as follows:

$$
\mu^\* = \frac{1}{\sqrt{a}} e^{-\left(\frac{p^\*}{a^\*}\right)^2/2} \cos\left(5\left(\frac{p^\*}{a^\*}\right)\right) \tag{24}
$$

Meanwhile, the scale parameter *a*<sup>∗</sup> is calculated by Formula (25):

$$a^\* = e^{-\ln(\mathcal{g}) \times \left(1 - \frac{\mu}{\tau}\right)^{\gamma\_{\omega \times \text{rw}}}} + \ln(\mathcal{g})\tag{25}$$

where *γωm* is the shape parameter, *t* is the current iteration number, *T* is the maximum number of iterations, and *g* is the limit of *<sup>a</sup>*<sup>∗</sup>.

**Figure 2.** Morlet-wavelet function.

Therefore, after the perturbance by using the wavelet mutation function, the new positions of the particles are *σ*¯ *m* = *σ*¯ *m*1 , *σ*¯ *m*2 ,... , *σ*¯ *mn* . Once the position and kernel parameter *σ* are determined, the regularization factor *C* can be confirmed according to Formula (19). The optimization process for the parameters in this study is given in Algorithm 1:

### **Algorithm 1** The process of the WMPSO parameters' optimization

**Initialize** *σi* \\ *σi* is the position of the *ith* particle

Calculate fitness function \\ Individual extreme values of particles can be calculated by fitness function **while** *i* <= *T* **do** \\ *T* is the maximum number of iterations performed by the algorithm *i* = *i* + 1 **for** *j* = 1 to *n* **do** Update velocity *Ci* based on Equation (19) Update position *σi* based on Equation (20) **if** *pm* > *rand* **then** Calculate *a*<sup>∗</sup> based on Equation (25) *ϕ*∗ = 2.5 ∗ *a*<sup>∗</sup> ∗ *rand*(1, 30) Calculate *μ*∗ based on Equation (24) Update position *σi* based on Equation (23) **end if** Calculate fitness function Update *Qi* and *Qg* **end for endwhile**

### *3.3. Design of WMPSO-LSSVM-Based Fault Diagnosis Scheme for Industrial Systems*

Based on the above analysis, the WMPSO-LSSVM-based data-driven fault diagnosis approach is designed as follows:

	- • Initialize the following parameters: the evolution algebra of the particles, the learning factors *c*1 and *c*2, the regularization factor *C*, the kernel parameter *σ*, and the historical optimal kernel parameter *Qσ*;
	- • Calculate the new information of the *C* and *σ*, and update a new generation of the particles;
	- • Calculate the fitness value of the particles according to the fitness function, and update the individual and global optimal values of *C* and *σ* on this basis;
	- • Evaluate whether the maximum number of iterations or searching boundaries has been reached. If so, store the *C* and *σ*, and construct the WMPSO-LSSVMbased identification model;

The corresponding flowchart is presented in Figure 3.

**Figure 3.** The flowchart of the proposed WMPSO-LSSVM algorithm.

### **4. Experimental Applications for Industrial Systems Based on WMPSO-LSSVM**

The effectiveness and superiority of this study for industrial systems are evaluated on a database taken from the Guangdong Provincial Key Laboratory of Petrochemical Equipment Fault Diagnosis of China. Meanwhile, some comparative experiments are used to further prove the fault diagnosis performance of the proposed method.

As shown in Figure 4, the industrial system studied in this section is the main fan motor of a steam turbine, and the specific research object of this system is the gearbox containing the rolling bearings. The actual data of the gearbox and bearings are obtained from the intelligent fault diagnosis system, which consists of an acceleration sensor, a preamplifier (PMP), an explosion-proof BOX (BOX), a data collector (butylated hydroxytoluene), and a server (PC-1).

**Figure 4.** Schematic diagram of the main fan system.

In addition, the acceleration sensor is installed on the generator to obtain the vibration signals; the role of the BOX is to protect the preamplifier; the preamplifier is installed in the BOX for signal amplification; the data collector is installed in the steam turbine of the main fan for signal acquisition and processing; and the server is used for data storage and management.

The accelerometer used to measure the vibration acceleration mainly contains the following information. The highest amplitude is 50 g, the channel number is 6, the maximum transmission distance is 300 m, the working power supply is 18–30 VDC, and the working current is constant (2–10 mA). The actual industrial system operation environment and data collection situation are shown in Figures 5–8.

**Figure 5.** The on-site industrial environment.

**Figure 6.** The local-data acquisition system.

**Figure 7.** The data acquisition base station.

**Figure 8.** The data acquisition platform.

The data collected by the intelligent fault diagnosis system mainly include seven states, which are different fault combinations of gears and bearings. Their fault modes and corresponding indicators are shown in Table 1, and the waveforms of the part of the original vibration signals are shown in Figures 9–12.

**Figure 9.** The original signals of the inner race fault of the bearings and the tooth loss of gear-box.

**Figure 10.** The original signals of the outer race fault of the bearings and the tooth loss of the gear-box.

**Figure 12.** The original signals of balls that are missing bearings and the tooth loss of the gear-box.


**Table 1.** Seven fault states of the key components for entire systems.

The numbers in bold in the table represent the time domain index of the faulty component. Look at the numbers in the table. If the data in the table appears to be significantly asynchronous, this can be used to distinguish component failures. Taking the waveform indicator as an example,1.2920 is obviously out of sync with all the numbers in the second row of the waveform column, and 1.3007 is also out of sync with the numbers in the first row of the waveform column, so it can be used as the basis for division.

Therefore, according to the indicators in bold in Table 1, the following analysis can be obtained.


Then, the original signals are decomposed into three layers using the wavelet packet decomposition algorithm, and the node coefficients are calculated according to Formula (5). The corresponding results are given in Figure 13. In addition, the wavelet packet coefficients of the third layer, consisting of nodes 7 to 14 and calculated according to Formula (6), are shown in Figure 14.

The spectral distributions of the non-stationary vibration signals of the gearbox and bearings are closely related to their characteristic structures. Therefore, the energy distributions in the wavelet packet space of the original vibration signals decomposed by the wavelet packet are the fault features of the gears and bearings to be extracted. The parts of the characteristic extraction results are shown in Figure 15.

Finally, by using 75% of the extracted fault features as the input to establish the optimal WMPSO-LSSVM and by inputting the test samples into the model, the classification results can be obtained. The experimental results of LSSVM, PSO-LSSVM, and WMPSO-LSSVM are given in Figures 16–18, respectively.

**Figure 13.** The decomposition results of the vibration signals.


**Figure 14.** The node coefficients of the wavelet-packet algorithm.

**Figure 15.** The fault characteristic extraction results of the gear-box and bearings.

**Figure 16.** The classification results of LSSVM.

**Figure 17.** The classification results of PSO-LSSVM.

In order to further verify the superiority of the WVPSO-LSSVM classification model for key components of industrial systems, ELM and the traditional BP network are used for comparison purposes; the experimental results are shown in Table 2 and Figures 19–22.

**Figure 19.** Fault diagnosis results based on ELM (1).

**Figure 20.** Fault diagnosis results based on ELM (2).

**Figure 21.** Training performance of the neural-networks.

**Figure 22.** Training state of the neural-netwoks.


**Table 2.** This table contrasts the results of the three mechanisms.

To evaluate the performance of the WMPSO-LSSVM classification model, the confusion matrices of the WMPSO-LSSVM and ELM are presented, respectively, in Figures 23 and 24.

In the Figures 23 and 24, the blue square represents the number of correctly classified samples, while the pink square represents the number of incorrectly classified samples. For example, in Figure 23, there is only one incorrectly classified sample for the second type, and the remaining nine are correctly classified. The more diagonally distributed samples in the matrix, the better the performance of the model. And according to the results, the WMPSO-LSSVM has a higher precision than ELM.

In order to further verify the effectiveness of the proposed algorithm, the corresponding WVPSO-LSSVM regression model for the bearings and gearbox is established, and the composite fault characteristic trend is predicted. The comparative results are shown in Figures 25–28 and Table 3.

**Figure 23.** The confusion matrix of the WMPSO-LSSVM model.

**Figure 24.** The confusion matrix result of the ELM model.

**Figure 25.** Bearing inner ring wear and gear tooth loss.

**Figure 26.** Bearing outer ring wear and gear tooth loss.

**Figure 27.** Bearing missing balls and gear tooth loss.

**Table 3.** Comparison between the WMPSO-LSSVM regression model and the linear regression model.


Since the weight and the deviation of ELM are randomly generated, the inconsistent networks generated each time will eventually lead to a large performance difference, although the learning speed of ELM is fast and its generalization performance is good. Furthermore, because the BP neural network is a gradient descent method, its optimized objective function is extremely complex, and there will be a zig-zag phenomenon in the training process, which makes the BP algorithm inefficient. The accuracy of the BP neural network also depends largely on the sample size, and the number of fault samples obtained from industrial systems is small. Thus, it is not suitable for limited fault data of complex industrial systems.

In addition, it can be seen from the comparative experimental results that the WMPSO-LSSVM model has strong performance. The introduction of the Gaussian kernel function in WMPSO-LSSVM can expand the diversity and dimension of limited data and solve the defect of traditional neural networks' unsuitability for small samples. At the same time, the model can not only classify complex fault data effectively, but can also predict the complex fault characteristic trend, which has good applicability to complex fault data in industrial systems.
