## *2.1. PFCM Algorithm*

In the fuzzy C-means (FCM) clustering algorithm, the membership values of each sample point across all clusters must sum to 1, so noise points carry the same total weight as normal points; the algorithm is therefore sensitive to noise and its classification results can be inaccurate. The possibilistic C-means (PCM) algorithm, in turn, is sensitive to the initial cluster centers and tends to produce coincident clusters, since its objective can be minimized when several centers collapse to the same position [34]. To overcome these shortcomings, Pal et al. proposed the possibilistic fuzzy C-means (PFCM) algorithm, which combines both approaches [35,36]. The PFCM algorithm overcomes the noise sensitivity of FCM and the initialization sensitivity of PCM, and it improves the accuracy of the classification results. The objective function of the PFCM algorithm is as follows:

$$f(U, T, V; X) = \sum_{i=1}^{c} \sum_{j=1}^{n} \left(a u_{ij}^{m} + b t_{ij}^{p}\right) d_{ij}^{2} + \sum_{i=1}^{c} \eta_i \sum_{j=1}^{n} \left(1 - t_{ij}\right)^{p} \tag{1}$$

where $1 \le i \le c$, $1 \le j \le n$, and $\sum_{i=1}^{c} u_{ij} = 1$; *a* and *b* define the relative importance of the fuzzy membership and typicality values in the objective function, with *a* > 0 and *b* > 0; *m* and *p* are the fuzzy parameters; $d_{ij} = \|x_j - v_i\|$ is the Euclidean distance from sample point $x_j$ to cluster center $v_i$; *c* is the number of cluster centers; and *n* is the number of sample points.

The penalty coefficient of the PFCM algorithm is as follows:

$$\eta_i = K \frac{\sum_{j=1}^{n} \left(a u_{ij}^{m} + b t_{ij}^{p}\right) d_{ij}^{2}}{\sum_{j=1}^{n} \left(a u_{ij}^{m} + b t_{ij}^{p}\right)}, \quad K > 0 \tag{2}$$

where $\eta_i$ is the penalty coefficient and, generally, *K* = 1. The first-order optimality conditions of Equation (1) give the following update Equations (3)–(5):

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left(\frac{d_{ij}}{d_{kj}}\right)^{2/(m-1)}} \tag{3}$$

$$t_{ij} = \frac{1}{1 + \left(\frac{b d_{ij}^{2}}{\eta_i}\right)^{1/(p-1)}} \tag{4}$$

$$v_i = \frac{\sum_{j=1}^{n} \left(a u_{ij}^{m} + b t_{ij}^{p}\right) x_j}{\sum_{j=1}^{n} \left(a u_{ij}^{m} + b t_{ij}^{p}\right)} \tag{5}$$
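To make these update rules concrete, the following is a minimal NumPy sketch of Equations (2)–(5); the function name `pfcm_updates`, the array shapes, and the small numerical floor on the distances are illustrative assumptions rather than part of the original formulation.

```python
import numpy as np

def pfcm_updates(X, V, U, T, a=1.0, b=1.0, m=2.0, p=2.0, K=1.0):
    """One round of PFCM updates, following Equations (2)-(5).

    X: (n, dim) samples; V: (c, dim) centers;
    U, T: (c, n) membership and typicality matrices.
    """
    # Squared Euclidean distances d_ij^2 from each center v_i to each sample x_j
    D2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # shape (c, n)
    D2 = np.maximum(D2, 1e-12)        # avoid division by zero at exact centers

    # Penalty coefficients eta_i (Equation (2)), using the current U and T
    W = a * U**m + b * T**p
    eta = K * (W * D2).sum(axis=1) / W.sum(axis=1)            # shape (c,)

    # Membership update (Equation (3)); (d_ij/d_kj)^(2/(m-1)) = (D2_ij/D2_kj)^(1/(m-1))
    ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))
    U_new = 1.0 / ratio.sum(axis=1)

    # Typicality update (Equation (4))
    T_new = 1.0 / (1.0 + (b * D2 / eta[:, None]) ** (1.0 / (p - 1.0)))

    # Cluster center update (Equation (5))
    W_new = a * U_new**m + b * T_new**p
    V_new = (W_new @ X) / W_new.sum(axis=1, keepdims=True)
    return U_new, T_new, V_new
```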

The steps of the PFCM algorithm are as follows:

**Step 1** Set the fuzzy parameters *m* and *p*, the terminating threshold ε, the maximum number of iterations *L*, and the initial iteration counter *l* = 0; initialize the cluster centers $V^{(0)}$, the membership matrix $U^{(0)}$, and the typicality matrix $T^{(0)}$;

**Step 2** According to Formula (2), calculate the penalty coefficient η*i*;

**Step 3** According to Formula (3), calculate and update the membership matrix $u_{ij}^{(l+1)}$;

$$u_{ij}^{(l+1)} = \frac{1}{\sum_{k=1}^{c} \left(\frac{d_{ij}^{(l)}}{d_{kj}^{(l)}}\right)^{2/(m-1)}} \tag{6}$$

**Step 4** According to Formula (4), calculate and update the typicality matrix $t_{ij}^{(l+1)}$;

$$t_{ij}^{(l+1)} = \frac{1}{1 + \left(\frac{b \left(d_{ij}^{(l)}\right)^{2}}{\eta_i}\right)^{1/(p-1)}} \tag{7}$$

**Step 5** According to Formula (5), calculate and update the cluster center matrix $v_i^{(l+1)}$;

$$v_i^{(l+1)} = \frac{\sum_{j=1}^{n} \left(a \left(u_{ij}^{(l+1)}\right)^{m} + b \left(t_{ij}^{(l+1)}\right)^{p}\right) x_j}{\sum_{j=1}^{n} \left(a \left(u_{ij}^{(l+1)}\right)^{m} + b \left(t_{ij}^{(l+1)}\right)^{p}\right)} \tag{8}$$

**Step 6** If $\|V^{(l+1)} - V^{(l)}\| < \varepsilon$ or $l \ge L$, output the cluster centers, the membership matrix, and the typicality matrix; otherwise, set *l* = *l* + 1 and skip to Step 2. The flow chart of the PFCM algorithm is shown in Figure 1, and a minimal code sketch of the whole procedure is given after the figure.

**Figure 1.** The flow chart of the possibilistic fuzzy C-means (PFCM) algorithm.
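Under the same assumptions as before, Steps 1–6 reduce to a short iteration loop around the `pfcm_updates` sketch above; the initialization choices (random samples as initial centers, uniform random $U$ and $T$) are illustrative, not prescribed by the algorithm.

```python
def pfcm(X, c, a=1.0, b=1.0, m=2.0, p=2.0, K=1.0, eps=1e-5, L=100, seed=0):
    """PFCM main loop (Steps 1-6): returns centers V, memberships U, typicalities T."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: initialize V from random samples; U with columns summing to 1; T randomly
    V = X[rng.choice(n, size=c, replace=False)]
    U = rng.random((c, n)); U /= U.sum(axis=0)
    T = rng.random((c, n))
    for l in range(L):                                      # at most L iterations
        V_old = V
        U, T, V = pfcm_updates(X, V, U, T, a, b, m, p, K)   # Steps 2-5
        if np.linalg.norm(V - V_old) < eps:                 # Step 6: ||V_new - V_old|| < eps
            break
    return V, U, T
```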

## *2.2. Multi-Parameter Optimization of Support Vector Machine (SVM)*

The support vector machine (SVM) is a general machine learning method proposed by Vapnik [37]. Traditional statistical methods assume sufficient, or even infinitely many, samples, whereas in practical problems samples are often scarce. Based on the principle of structural risk minimization and Vapnik–Chervonenkis (VC) dimension theory, the SVM balances model complexity against learning ability, finds the optimal solution, and achieves good generalization. SVM classification theory was developed from the linearly separable binary classification problem. During classification, an optimal separating hyperplane is constructed: the training samples are classified correctly according to the principle of least empirical risk, and the classification margin is maximized to minimize the confidence interval of the risk bound. The SVM therefore has advantages in small-sample, nonlinear, and high-dimensional problems [38–40]. The objective function of the SVM optimization problem (in dual form) is as follows:

$$\begin{cases} \max\limits_{\alpha} L(\alpha) = \sum\limits_{i=1}^{N} \alpha_i - \frac{1}{2} \sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \\ \text{s.t.} \quad \sum\limits_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \cdots, N \end{cases} \tag{9}$$

where $x_i, x_j \in R^n$; $\alpha_i$ is the Lagrange multiplier; *C* is the penalty factor; and $K(x_i, x_j)$ is the kernel function, which implicitly maps the samples into a high-dimensional space and returns their inner product there.

The corresponding optimal classification function is as follows:

$$f(x) = \mathrm{sgn}\left[\sum_{i=1}^{N} \alpha_i^{*} y_i K(x_i, x) + b^{*}\right] \tag{10}$$

where $\alpha^{*}$ is the optimal solution of Equation (9) and $b^{*} = y_j - \sum_{i=1}^{N} \alpha_i^{*} y_i K(x_i, x_j)$ for any support vector $x_j$.

In the above optimization problem, the kernel function $K(x_i, x_j)$ must be specified. Four kernel functions are commonly used in SVM: the linear kernel $K(x_i, x_j) = x_i \cdot x_j$; the polynomial kernel $K(x_i, x_j) = \left[(x_i \cdot x_j) + b\right]^{d}$; the hyperbolic tangent kernel $K(x_i, x_j) = \tanh\left[v(x_i \cdot x_j) + c\right]$; and the radial basis kernel $K(x_i, x_j) = \exp\left(-g\|x_i - x_j\|^{2}\right)$, where *g* is the kernel parameter.
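As a quick reference, the four kernels can be written as one-liners; the NumPy forms below, for a pair of sample vectors, are an illustrative sketch with assumed default parameter values.

```python
import numpy as np

# Illustrative kernel implementations for two sample vectors xi, xj
linear  = lambda xi, xj: xi @ xj                                     # linear kernel
poly    = lambda xi, xj, b=1.0, d=3: (xi @ xj + b) ** d              # polynomial kernel
sigmoid = lambda xi, xj, v=1.0, c=0.0: np.tanh(v * (xi @ xj) + c)    # hyperbolic tangent kernel
rbf     = lambda xi, xj, g=0.5: np.exp(-g * np.sum((xi - xj) ** 2))  # radial basis kernel
```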

Many studies show that the radial basis kernel is a good choice when little prior knowledge is available [41], so it is adopted as the kernel function of the SVM here. The kernel parameter *g* and the penalty factor *C* must then be selected, since both are decisive for establishing a well-optimized SVM model.

The artificial bee colony (ABC) algorithm is an intelligent optimization algorithm inspired by biological behavior, proposed by Karaboga [42,43]. It solves practical problems by simulating how bees collect honey: the global optimal solution emerges from the local search behavior of individual bees. It is often used to solve multi-parameter optimization problems [44]. In this paper, the ABC algorithm is used to obtain the optimal penalty factor *C* and kernel parameter *g*. Compared with the genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm, the ABC algorithm has stronger global optimization ability and fewer control parameters.

The multi-parameter optimization of SVM is as follows:

**Step 1** Initialize the parameters in the ABC algorithm and SVM, i.e., the number of bee colonies, the number of honey sources, the maximum search number of honey sources (*Limit*), the current search number of honey sources, the maximum number of iterations (MaxIter), the search range of penalty factors *C,* and the search range of kernel function parameter *g*.

**Step 2** Select the fitness function of the ABC algorithm. The purpose of optimizing the SVM parameters is to improve the accuracy of fault classification, so solving the optimization problem can be regarded as the bees' search for the best honey source. The fitness function is as follows:

$$fitness_i = \begin{cases} \frac{1}{1+f_i} & (f_i \ge 0) \\ 1+\left| f_i \right| & (f_i < 0) \end{cases} \tag{11}$$

where $fitness_i$ is the fitness value of the *i*-th honey source and $f_i$ is its objective function value.
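Equation (11) is a one-line mapping in code; the sketch below assumes, purely for illustration, that $f_i$ is something like the cross-validation error of the SVM built from the *i*-th honey source.

```python
def fitness(f_i: float) -> float:
    """Map an objective value f_i to a fitness value (Equation (11))."""
    return 1.0 / (1.0 + f_i) if f_i >= 0 else 1.0 + abs(f_i)
```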

**Step 3** Employed bees search the neighborhood of the current honey source according to Formula (12) and calculate the fitness of the new honey source according to Formula (11). If the fitness of the new honey source is better than that of the original one, the new position replaces the original position; otherwise, the original honey source remains unchanged. A sketch of this move is given below.

$$new\_x_{id} = x_{id} + R\left(x_{id} - x_{kd}\right) \tag{12}$$

where $new\_x_{id}$ is the value of the *d*-th dimension of the *i*-th new honey source; $x_{id}$ is the value of the *d*-th dimension of the *i*-th original honey source; *R* is a random number in [−1, 1]; and *k* indexes any honey source other than the *i*-th.
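A minimal sketch of this move, assuming (as is common in ABC implementations) that a single randomly chosen dimension *d* is perturbed:

```python
import numpy as np

def neighborhood_search(x, x_k, rng):
    """Employed-bee move (Equation (12)): perturb one dimension relative to partner x_k."""
    d = rng.integers(x.size)       # randomly chosen dimension d
    R = rng.uniform(-1.0, 1.0)     # random number in [-1, 1]
    new_x = x.copy()
    new_x[d] = x[d] + R * (x[d] - x_k[d])
    return new_x
```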

**Step 4** After the employed bees complete their search, onlooker bees select a honey source according to Formula (13) and then search its neighborhood according to Formula (12) to obtain a new honey source. As in Step 3, if the fitness of the new honey source is better than that of the original one, the new position replaces the original position; otherwise, the original honey source remains unchanged.

$$P_i = \frac{fitness_i}{\sum_{n=1}^{N} fitness_n} \tag{13}$$

where $P_i$ is the probability that the *i*-th honey source is selected, $fitness_i$ is the fitness value of the *i*-th honey source, and *N* is the total number of honey sources.
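Formula (13) amounts to roulette-wheel selection; a sketch assuming a NumPy array of fitness values:

```python
def select_source(fits, rng):
    """Onlooker-bee roulette-wheel selection (Equation (13))."""
    P = fits / fits.sum()              # P_i for every honey source
    return rng.choice(len(fits), p=P)  # index of the selected source
```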

**Step 5** Judge whether the current search number of a honey source exceeds the maximum search number *Limit*. If it does, abandon that source and let a scout bee generate a new honey source according to Formula (14):

$$x_{ij} = x_j^{\min} + R_{ij}\left(x_j^{\max} - x_j^{\min}\right) \tag{14}$$

where $x_{ij}$ is the value of the *j*-th dimension of the *i*-th honey source, $j \in \{1, 2\}$ (the two dimensions being *C* and *g*); $x_j^{\min}$ and $x_j^{\max}$ are the lower and upper bounds of the *j*-th dimension; and $R_{ij}$ is a random number in [0, 1].
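A sketch of this regeneration step, assuming `lower` and `upper` are NumPy arrays holding the search ranges of *C* and *g* from Step 1:

```python
def regenerate_source(lower, upper, rng):
    """Scout-bee regeneration (Equation (14)): new random source within the bounds."""
    R = rng.uniform(0.0, 1.0, size=lower.shape)  # R_ij in [0, 1]
    return lower + R * (upper - lower)
```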

**Step 6** Record the current optimal honey source and judge whether the termination condition (e.g., reaching MaxIter) is met. If it is met, skip to Step 7; otherwise, skip to Step 3.

**Step 7** Obtain the global optimal honey source, which yields the penalty factor *C* and kernel parameter *g* used to establish the optimized SVM model. A minimal end-to-end sketch of Steps 1–7 follows.
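The sketch below is one possible assembly of the procedure, not the authors' implementation. It assumes scikit-learn's `SVC` with five-fold cross-validation accuracy as the objective, reuses the hypothetical helpers `fitness`, `neighborhood_search`, `select_source`, and `regenerate_source` sketched above, and uses illustrative colony sizes, bounds, and iteration counts.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def svm_error(params, X, y):
    """Objective f_i: cross-validation error of an RBF-SVM with candidate (C, g)."""
    C, g = params
    return 1.0 - cross_val_score(SVC(C=C, gamma=g, kernel="rbf"), X, y, cv=5).mean()

def abc_svm(X, y, n_sources=10, max_iter=30, limit=5, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.array([0.01, 1e-4]), np.array([100.0, 10.0])  # ranges of C and g
    # Step 1: random initial honey sources, i.e., candidate (C, g) pairs
    sources = lower + rng.random((n_sources, 2)) * (upper - lower)
    fits = np.array([fitness(svm_error(s, X, y)) for s in sources])
    trials = np.zeros(n_sources, dtype=int)

    def try_improve(i):
        k = (i + 1 + rng.integers(n_sources - 1)) % n_sources  # partner k != i
        cand = np.clip(neighborhood_search(sources[i], sources[k], rng), lower, upper)
        f = fitness(svm_error(cand, X, y))
        if f > fits[i]:
            sources[i], fits[i], trials[i] = cand, f, 0        # greedy replacement
        else:
            trials[i] += 1

    for _ in range(max_iter):
        for i in range(n_sources):                  # Step 3: employed-bee phase
            try_improve(i)
        for _ in range(n_sources):                  # Step 4: onlooker-bee phase
            try_improve(select_source(fits, rng))
        for i in np.where(trials > limit)[0]:       # Step 5: scout-bee phase
            sources[i] = regenerate_source(lower, upper, rng)
            fits[i] = fitness(svm_error(sources[i], X, y))
            trials[i] = 0
    C, g = sources[np.argmax(fits)]                 # Steps 6-7: best source found
    return C, g
```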
