*Article* **Partial Discharge Fault Diagnosis Based on Multi-Scale Dispersion Entropy and a Hypersphere Multiclass Support Vector Machine**

**Haikun Shang 1,\*, Feng Li <sup>2</sup> and Yingjie Wu <sup>3</sup>**


Received: 24 December 2018; Accepted: 15 January 2019; Published: 17 January 2019

**Abstract:** Partial discharge (PD) fault analysis is an important tool for insulation condition diagnosis of electrical equipment. In order to conquer the limitations of traditional PD fault diagnosis, a novel feature extraction approach based on variational mode decomposition (VMD) and multi-scale dispersion entropy (MDE) is proposed. Besides, a hypersphere multiclass support vector machine (HMSVM) is used for PD pattern recognition with extracted PD features. Firstly, the original PD signal is decomposed with VMD to obtain intrinsic mode functions (IMFs). Secondly proper IMFs are selected according to central frequency observation and MDE values in each IMF are calculated. And then principal component analysis (PCA) is introduced to extract effective principle components in MDE. Finally, the extracted principle factors are used as PD features and sent to HMSVM classifier. Experiment results demonstrate that, PD feature extraction method based on VMD-MDE can extract effective characteristic parameters that representing dominant PD features. Recognition results verify the effectiveness and superiority of the proposed PD fault diagnosis method.

**Keywords:** PD; fault diagnosis; variational mode decomposition; multi-scale dispersion entropy; HMSVM

#### **1. Introduction**

Partial discharge (PD) is an important symptom of insulation degradation for electrical equipment. PD fault diagnosis plays an irreplaceable role in the evaluation of insulation condition [1]. PD feature extraction is an important step in insulation fault diagnosis. The common methods include statistical atlas (SA) [2], wave analysis (WA) [3] and wavelet transform (WT) [4]. However, SA has the limitations of high request of sampling rate, large data size and slow speed of data processing which are not suitable for on-line monitoring. Besides, it is difficult to extract PD phase information during statistical atlas construction. WA is easily influenced by electromagnetic interference. WT has some inherent limitations such as the difficulty of selection of the wavelet basis, wavelet thresholds, decomposition levels, and so on [5].

Empirical mode decomposition (EMD), as an adaptive signal processing method that decomposes a time series into some limited intrinsic mode functions (IMFs). It is widely used in the areas of fault detection, signal processing and data compression [6–8]. However, due to the problems of ending effects and mode mixing in non-stationary signal decomposition, EMD is limited in practical applications. Variational mode decomposition (VMD) is a new signal decomposition method, which is widely applied in electrical fault feature extraction [9]. It is a non-recursive variational decomposition model. In VMD, the central frequency and bandwidth of each mode are determined by searching the optimal solution of the variation model. VMD can solve the problems of mode mixing and ending effects in traditional EMD methods [10]. In this paper, VMD is employed for PD signal decomposition to extract effective IMFs from PD signals.

In order to quantify the PD feature information extracted by VMD, entropy theory is introduced. Entropy, as a measure of uncertainty or irregularity, was widely applied in fault diagnosis recently [11]. It was first introduced by Shannon in 1948 [12]. Afterwards, approximate entropy (AE) was put forward by Pincus [13], which provided one dimensionless index representing signal features. It was suitable for both deterministic and random signals. However, AE is heavily relied on the data length. Moreover, its estimated value is uniformly lower than expected ones when processing the short dataset [14]. To overcome the weakness of AE, Richman and Moorman proposed sample entropy (SE) [15]. Due to the insensitivity to the data length and immunity to the noise in data, SE has attracted a great deal of attention. However, SE is not fast enough for some real-time applications, especially for long signals [16]. Another widely used regularity indicator is permutation entropy (PE), which is based on the order relations among values of a signal [17]. Although PE is conceptually simple and computationally fast, the method does not consider the mean value of amplitudes and differences between amplitude values [18]. In this paper, a new irregularity indicator is introduced, named dispersion entropy (DE) [19]. The method tackles the abovementioned PE and SE limitations [20]. Because of the relevance and the possible usefulness of DE in several signal analyses, it is important to understand the behavior of the technique for various kinds of classical signal concepts such as amplitude, frequency, noise power, and signal band-width. However, DE estimates the complexity at a single scale [21], which gives rise to unacceptable result when applied to analyze the multiple time scales data [22]. Regarding this disadvantage, a multi-scale dispersion entropy (MDE) procedure was put forward to estimate the complexity of the original time series over a range of scales [23]. In this work, MDE is employed to quantify the PD feature information.

In recent years, a great number of intelligent algorithms have been used in PD fault diagnosis. Support vector machine (SVM) [24], as a learning machine based on kernel functions, that has the property of global optimization and strong generalization ability. However, using hyperplane recognition model, SVM can't accurately classify the samples with nonuniform state distribution. In addition, SVM is restricted in practical application for its inherent binary classification properties [25].

Hypersphere Support Vector Machine (HSSVM), based on SVM, was first proposed by Scholkopf [26]. Instead of the hyperplane, HSSVM uses a hypersphere for pattern recognition. HSSVM can not only separate two different classes, but also divide the sample space into two different parts [27]. Moreover, in order to overcome the limitations of inherent binary classification properties, hypersphere multiclass SVM (HMSVM) was introduced [28]. In HMSVM classification, the samples in same class are assigned to a hypersphere, therefore, the data space is composed of several hyperspheres [29]. Using HMSVM, the multi-class classification is realized directly. The quadratic programming calculation of HMSVM is less than that of one-class SVM, which causes shorter training and testing time. In this paper, particle swarm optimization (PSO) [30] is employed for parameter selection in HMSVM. Then the optimized classification model is applied to PD fault pattern recognition, using extracted PD features.

In this work, the proposed PD fault diagnosis method is combined with the excellent properties of both VMD and MDE. The characteristic parameters representing dominant PD features are effectively extracted. Besides, it can solve the problems in traditional PD feature extraction methods, such as the limitations of high request of sampling rate, slow speed of data processing, difficulties to extract PD phase information, influences by electromagnetic interference, difficulty of selection of wavelet basis, and so on. Finally, HMSVM is employed for PD pattern recognition with extracted parameters. To verify the effectiveness and superiority of the proposed method, different PD feature extraction methods and diverse classifiers are introduced. Results verify the exactness of the conclusion.

The rest of this paper is organized as follows: Section 2 describes the theories of VMD, MDE and HMSVM, and presents the PD fault diagnosis procedure. Section 3 presents a brief introduction to the experimental setup used to generate PD signals. In Section 4 we show the results with their validation. The paper ends with conclusions in Section 5.

#### **2. PD Fault Diagnosis Based on VMD-MDE and HMSVM**

#### *2.1. VMD Algorithm*

VMD decomposes one real signal into *K* independent sub-signal *u*k, which has specific sparsity. This procedure gets the minimum bandwidth estimation of each modal [31]. The procedure of signal decomposition is to solve the variational problem. The variational model with constraint condition is as follows:

$$\begin{cases} \min\_{\{u\_k\}, \{w\_k\}} \left\{ \sum\_{l} \left\| \partial\_l [(\delta(t) + \frac{j}{\pi l}) u\_k(t)] e^{-jw\_k t} \right\|\_2^2 \right\} \\ \text{ s.t. } \sum\_k u\_k = f \end{cases} \tag{1}$$

where {*uk*} = {*u*1, *u*2, ··· , *uK*} demonstrates the modal components, {*wk*} = {*w*1, *w*2, ··· , *wK*} is the center frequency of each modal component, *δ*(*t*) represents impulse function, *∂<sup>t</sup>* means the partial derivatives of *t*, and *f* is the original signal.

In order to obtain the optimal solution of such constrained variational problem, Lagrangian multiplier *λ*(*t*) is introduced. The constrained variational problem is transformed into non-constrained problem:

$$L(\{u\_k\}, \{\omega\_k\}, \lambda) = a \sum\_k \left\| \partial\_t [ (\delta(t) + \frac{j}{\pi t}) u\_k(t) ] e^{-j u\_k t} \right\|\_2^2 + \left\| f(t) - \sum\_k u\_k(t) \right\|\_2^2 + \left< \lambda(t), f(t) - \sum\_k u\_k(t) \right> \tag{2}$$

where *α* is the quadratic penalty factor. Alternate direction method of multipliers (ADMM) is introduced to obtain the saddle point of such Lagrangian function, which is the optimal solution.

The procedure of VMD can be summarized in the following steps:


$$\mathfrak{u}\_{k}^{n+1}(\omega) \leftarrow \frac{\hat{f}(\omega) - \sum\_{ik} \mathfrak{u}\_{i}^{n}(\omega) + \frac{\hat{\lambda}^{n}(\omega)}{2}}{1 + 2a\left(\omega - \omega\_{k}^{\text{n}}\right)^{2}} \tag{3}$$

(3) Update *ωk*.

$$
\omega\_k^{n+1} \leftarrow \frac{\int\_0^\infty \omega \left| \hat{u}\_k^{n+1}(\omega) \right|^2 d\omega}{\int\_0^\infty \left| \hat{u}\_k^{n+1}(\omega) \right|^2 d\omega} \tag{4}
$$

(4) Update *λ* in non-negative frequency intervals:

$$
\hat{\lambda}^{n+1} \gets \hat{\lambda}^n + \tau(\hat{f}(\omega) - \sum\_{k} \hat{\mu}\_i^{n+1}(\omega)) \tag{5}
$$

(5) For a given precision *ε* > 0, if ∑ *k u*ˆ *n*+1 *<sup>k</sup>* <sup>−</sup>*u*ˆ*<sup>n</sup> k* 2 2 *u*ˆ*<sup>n</sup> k* 2 2 < *ε*, then stop iteration. Otherwise, return to (2).

#### *2.2. Theory of Multiscale Dispersion Entropy*

#### 2.2.1. Dispersion Entropy

For a univariate signal *x* = *x*1, *x*2, ··· , *xN*, dispersion entropy method can be described in following steps [32]:

*Entropy* **2019**, *21*, 81

(1) Map *xj*(*j* = 1, 2, ··· , *N*) into *y* = {*y*1, *y*2, ··· , *yN*} from 0 to 1 with the normal cumulative distribution function:

$$y\_j = \frac{1}{\sigma\sqrt{2\pi}} \int\_{-\infty}^{x\_j} e^{\frac{-\left(t-\mu\right)^2}{2\sigma^2}} dt\tag{6}$$

where *σ* and *μ* represent the standard deviation and mean of *x*, respectively.

(2) Assign each *y*<sup>j</sup> to an integer from Label 1 to *c* using a linear algorithm. The mapped signal can be defined as follows:

$$z\_j^c = round(c.y\_j + 0.5) \tag{7}$$

(3) Define embedding vector *zm*,*<sup>c</sup> <sup>i</sup>* with embedding dimension *m* and time delay *d* as:

$$z\_i^{m,c} = \left\{ z\_i^c, z\_{i+d'}^c, \dots, z\_{i+(m-1)d}^c \right\}\_{\prime} \; i = 1, 2, \dots, N - (m-1)d \tag{8}$$

Each time series *zm*,*<sup>c</sup> <sup>i</sup>* is mapped to a dispersion pattern *πv*0*v*1···*vm*−<sup>1</sup> , where:

$$z\_i^\circledcirc = \upsilon\_0.z\_{i+d}^\circledcirc = \upsilon\_1.\cdots \cdot \iota\_{i+(m-1)d} = \upsilon\_{m-1}$$

(4) For each dispersion pattern, the relative frequency can be obtained as:

$$p(\pi\_{v\_0 v\_1 \cdots v\_{m-1}}) = \frac{Number\{i | i \le N - (m-1)d, z\_i^{m,c} \text{ has type } \pi\_{v\_0 v\_1 \cdots v\_{m-1}}\}}{N - (m-1)d} \tag{9}$$

where *<sup>p</sup>*(*πv*0*v*1···*vm*−<sup>1</sup> ) represent the number of dispersion pattern *<sup>π</sup>v*0*v*1···*vm*−<sup>1</sup> , which is assigned to *zm*,*<sup>c</sup> <sup>i</sup>* divided by the total number of embedding signals with embedding dimension *m*.

(5) Based on Shannon's definition of entropy, dispersion entropy with embedding dimension *m*, time delay *d*, and the number of classes *c* can be defined as

$$DE(\mathbf{x}, m, \mathbf{c}, d) = -\sum\_{\pi=1}^{c^m} p(\pi\_{\pi\_0 \upsilon\_1 \cdots \upsilon\_{m-1}}) \cdot \ln(p(\pi\_{\pi\_0 \upsilon\_1 \cdots \upsilon\_{m-1}})) \tag{10}$$

#### 2.2.2. Multiscale Dispersion Entropy

Multiscale Dispersion Entropy (MDE) is the combination of the coarse-graining with dispersion entropy. In MDE, the original signal *x* = *x*1, *x*2, ··· , *xN* of length *N* is first divided into non-overlapping scale factor *τ*. Then the new coarse-grained signals can be shown as follows:

$$\mathbf{x}\_{j}^{(\tau)} = \frac{1}{\tau} \sum\_{i=(j-1)\tau+1}^{j\tau} \mathbf{x}\_{i\prime} \mathbf{1} \le j \le N/\tau \tag{11}$$

Calculate the entropy value of each coarse-grained signal of length *N*/τ with dispersion entropy method:

$$MDE(\mathbf{x}, \mathbf{\bar{x}}, m, \mathbf{c}, d) = DE(\mathbf{x}^{(\mathbf{\bar{x}})}, m, \mathbf{c}, d) \tag{12}$$

#### *2.3. Theory of HMSVM*

#### 2.3.1. HMSVM

HMSVM can classify the samples directly. Each type of samples needs only one-hypersphere training. All training samples are mapped into high-dimension space. Each type of training samples searches for one hypersphere that has small radius and more target samples. HMSVM classification model is shown in Figure 1.

**Figure 1.** Classification model of HMSVM.

For an *M*-class problem, a collection of elements *Xm* (*m* = 1, 2, ... , *M*) is given. Assume that each *Xm* contains *m*-dimension sample *xmi*, *i* = 1, 2 . . . *lm*, which represents *i*-th element in *m*-class.

Assign one hypersphere (*am*,*Rm*) for each sample *Xm*, where *am* is the center of sphere, *Rm* is the radius of suprasphere. The objective function of *m*-th suprasphere can be defined as follows:

$$\begin{array}{l}\min\_{\mathcal{R}\_{\text{m}}} (\mathcal{R}\_{\text{m}}^2 + \mathcal{C}\_{\text{m}} \sum\_{i=1}^{l\_{\text{m}}} \zeta\_{\text{m},i})\\\text{s.t.} ||\Phi(\mathbf{x}\_{\text{m},i}) - a\_{\text{m}}|| \le R\_{\text{m}}^2 + \zeta\_{\text{m},i\prime}\zeta\_{\text{m},i} \ge 0\end{array} \tag{13}$$

where *C*<sup>m</sup> is the penalty factor, representing the trade-off between *Rm* and target samples. *ξm*,*<sup>i</sup>* is the slack variable of HMSVM allowing remote samples staying outside the sphere.

Lagrange function can be obtained after Lagrange multiplier is introduced:

$$L(R, a, \underline{\mathfrak{x}}, a, \gamma) = R\_m^{\cdot 2} + \mathbb{C}\_m \sum\_{i=1}^{l\_m} \underline{\mathfrak{x}}\_{mi} - \sum\_{i=1}^{l\_m} a\_i \left\{ R^2 + \underline{\mathfrak{x}}\_{mi} - \left( \left\| \underline{\mathfrak{x}}^2 \right\| - 2a \cdot \underline{\mathfrak{x}}\_i + \left\| \underline{a}^2 \right\| \right) \right\} - \sum\_{i=1}^{l\_m} \gamma\_i \underline{\mathfrak{x}}\_{mi} \tag{14}$$

The derivative operation of Equation (14) is processed to obtain the dual optimization problem as follows:

$$\min\_{a\_m} \sum\_{i} \sum\_{j} a\_{m,i} a\_{m,j} \mathcal{K}(\mathbf{x}\_{m,i}, \mathbf{x}\_{m,j}) - \sum\_{i=1}^{l\_m} a\_{m,j} \mathcal{K}(\mathbf{x}\_{m,i}, \mathbf{x}\_{m,j}) \tag{15}$$

The restricting condition that the target function should satisfy is shown as follows:

$$\sum\_{i=1}^{l\_m} a\_{m,i} = 1, \ 0 \le a\_{m,i} \le C\_m \tag{16}$$

For an unknown fault sample *d*, we first calculate the square of the distance between *d* and *am* using the formula below:

$$D^2(d) = \left\|d - a\_m\right\|^2 = (d \cdot d) - 2\sum\_{i=1}^{l\_m} a\_i(d \cdot \mathbf{x}\_i) + \sum\_{i=1}^{l\_m} \sum\_{j=1}^{l\_m} a\_i a\_j(\mathbf{x}\_i \cdot \mathbf{x}\_j) \tag{17}$$

The radius of the suprasphere is defined as *Rm = D*(*x*i), where *x*<sup>i</sup> represents the support vector. Therefore, the category assigned to the unknown sample *d* can be determined according to the comparison between *R*m and *D*(*d*).

#### 2.3.2. Kernel Function Selection

Due to the complexity among different PD fault samples, the spherical distribution will not appear in low-dimensional space. PD fault samples need to be mapped into high-dimension space using kernel functions to obtain the optimal hypersphere. In recent time, the common kernel functions include radial basic function (RBF) [33], polynomial kernel function and sigmoid function. After repeating tests, RBF shows outstanding performance. Therefore, RBF is selected as the kernel function for HMSVM. It can be defined in Equation (18):

$$K(\mathbf{x}, \mathbf{x}\_i) = \exp\{-\frac{\left|\mathbf{x} - \mathbf{x}\_i\right|^2}{\sigma^2}\}\tag{18}$$

#### *2.4. PD Fault Diagnosis Based on VMD-MDE and HMSVM*

In this paper, the proposed PD fault diagnosis method combines feature extraction and pattern recognition. Firstly, the original PD signal is decomposed using VMD to obtain the intrinsic mode functions. Secondly MDE value of each intrinsic mode function is calculated. And then principal component analysis (PCA) [34] is introduced to select principal components of MDE as PD feature vectors. Finally, the extracted vectors are sent to HMSVM pattern classifier to recognize different PD faults. The fault diagnosis procedure is as follows:

Step 1: Extract different types of PD signals in experimental environment, including floating discharge (FD), needle-surface discharge (ND), ball-surface discharge (BD) and corona discharge (CD).

Step 2: Select proper initial number of IMF according to the center frequency observation and decompose PD signals using VMD into intrinsic mode functions with different characteristic scales.

Step 3: Calculate the correlation coefficients between each IMF and original PD signal to select effective IMFs [35,36]. If the coefficient is greater than the threshold value, then keep the IMF as effective one. Otherwise, abandon the IMF. In this paper, the threshold value of the correlation coefficient is set to 0.3.

Step 4: Fix the decomposition scale for IMF and calculate the MDE value of extracted IMFs as original PD feature vectors.

Step 5: Analyze the PD vectors by PCA and extract fewer representative principal components as PD characteristic parameters.

Step 6: Send extracted PD characteristic parameters into HMSVM classifier to diagnose different PD fault modes and obtain the final diagnosis result.

The flow chart of PD fault diagnosis with proposed method is shown in Figure 2.

**Figure 2.** PD fault diagnosis procedure based on VMD-MDE and HMSVM.

#### **3. Experiments and Analysis**

#### *3.1. Experimental Setup*

Different PD types can produce different effects in insulation materials, but the range may be diverse. To analyze the characteristics of different PD types, PD signals of different models are extracted in the laboratory [37]. According to the inner insulation structure of power transformers, there are four possible different PD types, including FD, ND, BD and CD. PD models are shown in Figure 3. The experimental setup is shown in Figure 4.

=> ; =**)**>\$;

=>A; =>!;

**Figure 4.** Photograph of experimental setup.

PD signals are detected in the simulated transformer tank in the laboratory. The pulse current is collected by a current sensor with a 500 kHz–16 MHz bandwidth. The UHF signal is extracted by a UHF sensor with a 10–1000 MHz bandwidth. The signal received is imported into the PD analyzer. The test condition is shown in Table 1 and the experimental connection diagram is shown in Figure 5.


**Table 1.** Test condition of PD models.

**Figure 5.** The connection diagram of PD experiment. 1—AC power source; 2—step up transformer; 3—resistance; 4—capacitor; 5—high voltage bushing; 6—small bushing; 7—PD model; 8—UHF sensor; 9—current sensor; 10—console.

#### *3.2. Signal Extraction*

In this paper, four different types of PD signals are extracted with above experimental setup. The extracted PD waveforms are shown in Figure 6.

**Figure 6.** *Cont.*

**Figure 6.** PD signals.

#### **4. Results and Analysis**

#### *4.1. VMD Decomposition*

In this paper, float discharge is taken as an example for VMD decomposition. The number of IMFs, represented as *K*, is determined according to the central frequency observation. The central frequency of IMF with the variation of *K* is shown in Table 2.


6 0.0059 6. 7855 11.7785 13.5579 13.2602 13.9348

**Table 2.** Central frequency.

Table 2 shows that the IMFs with similar central frequency arise from *K* = 5, which means excessive decomposition. Therefore *K* = 4 is selected as the number of IMF. In this paper, the balancing parameter *α* = 2000 and bandwidth parameter *τ* = 0.1. The decomposition results with EMD and VMD are shown in Figures 7 and 8.

7 0.0053 6. 8034 12.1379 13.7877 13.9021 13.9975 14.2814


=>


=**)**>

**Figure 7.** Results of EMD decomposition. (**a**) IMF of decomposition; (**b**) Frequency spectrum of decomposition.

**Figure 8.** *Cont.*

**Figure 8.** Results of VMD decomposition. (**a**) IMF of decomposition; (**b**) Frequency spectrum of decomposition.

Figure 7 shows the EMD decomposition results containing IMF components and frequency spectrum. From the figure we can see that eight IMF components and one remaining component are obtained. However, the problem of mode mixing occurs in EMD decomposition. Besides, IMF component in each decomposition level is different from that of original signal. Figure 8 describes the results of VMD decomposition. It can be seen from this figure that the modal components in VMD approach to the real signal. Figures 7 and 8 verify the effectiveness of VMD and the superiority over EMD. It can be concluded that VMD is more suitable for PD signal decomposition.

#### *4.2. IMF Selection*

In order to obtain the effective IMF, the correlation coefficient (CC) between each IMF and original PD signal is calculated. Given a threshold *T*, if the CC is greater than *T*, the IMF will be selected as effective component; otherwise it will be regarded as false component and abandoned. In this work, *T* is set to 0.3. The CC values of IMF for VMD and EMD are shown as Table 3.


**Table 3.** CC values.

Table 3 shows that the CC value of first three IMFs is larger than the given threshold, which means these IMFs could represent the real components of PD signals. Therefore, the first three IMFs are selected and analyzed for VMD decomposition. Similarly, we can see that the CC value is smaller than the threshold from the fourth IMF, which means these IMFs contain less information of PD signals. Consequently, the first four IMFs are kept for EMD decomposition.

#### *4.3. Feature Extraction*

In this paper four different types of PD signals are decomposed using VMD method. The VMD decomposition parameters are shown in Table 4. *K*<sup>s</sup> is the number of effective IMFs calculated as described in Section 4.2.


**Table 4.** VMD decomposition parameters.

Using the above parameters, the corresponding IMFs of different types of PD are obtained by VMD decomposition. Then the MDE value of each IMF is calculated. During MDE calculation, some preset parameters need to be given, including scale factor *s*, number of classification *c*, time delay *d* and embedded dimension *m*. But considering that aliasing may occur when *d* > 1, *d* is set to 1 as recommended. In order to avoid the trivial case of only one dispersion pattern, *c* is set to 2. For better detection on dynamic change of signals, *m* is set to 6. To analyze the variation of MDE values with different scales, *s* is set to 20. With above parameters, MDE values of four different types of PD signals extracted in the laboratory are calculated. For each type of PD, MDE values are averaged with different IMFs, shown in Figure 9.

**Figure 9.** MDE variation with scale factors.

Figure 9 shows that different types of PD signals have diverse MDE values with variations of scale factors. The reason is that the randomness of PD signals is changing when PD fault occurs, which could change the MDE values. It also indicates that a single scale cannot completely reflect all the signal information and much more important information distributes in other scales. MDE can effectively detect the dynamic variation of PD signals which represent the fault characteristics with different scales. It can be found from the figure that MDE values start to level off after Scale 12. Therefore, the scale factor is set to 12 in this paper. In the case of FD, MDE values of IMFs using VMD and EMD are shown in Figure 10.

**Figure 10.** MDE values of IMFs using VMD and EMD.

Figure 10 shows that with the variation of scales, MDE values extracted by VMD are different. However, MDE values extracted by EMD seems to be same with the increase of decomposition scales which makes it difficult to distinguish different IMFs. The initial FD feature vectors combined with the MDE of all IMFs using VMD decomposition are shown in Table 5.

**Table 5.** Initial feature vectors.


#### *4.4. PCA-Based Dimension Reduction*

Due to the high dimension of extracted feature vectors, it will cause big burden for pattern classifiers which can directly affect the recognition accuracy. In this paper, the PCA method is employed for dimension reduction of initial feature vectors. In the case of *K*1, the covariance matrix is constructed to obtain the principal components. The eigenvalue and eigenvector of the covariance matrix are solved for linear transformation of original vectors. To achieve the goal of dimension reduction, those factors whose eigenvalues are greater than 1 are selected as principal components. The eigenvalue and corresponding contribution rates of the covariance matrix are shown in Table 6.

**Table 6.** Eigenvalues and corresponding contribution rates.


Table 6 shows that first two eigenvalues are greater than 1, and the accumulated contribution rate is larger than 90%. The contribution rate changes with the variation of principle components, shown in Figure 11.

**Figure 11.** The variation of contribution rate with principle components.

It can be concluded from above figure that, the contribution rate from the third principle component starts to level off. In addition, the contribution rates are decreasing gradually which can be ignored. Therefore, first two principle components are suitable for further analysis which represent most of the vector information. To do so, the original 12 indicators are reduced to 2 new ones. With a similar method, the principle components of *K*2, *K*<sup>3</sup> and *K*<sup>4</sup> can be obtained, shown in Table 7.


*K*<sup>4</sup> 0.752 83.368 *R*1, *R*<sup>2</sup>

**Table 7.** Principle components with different IMFs.

It can be seen from Table 7 that nine principle components factors are extracted from 48 feature vectors. And the contribution rate in each IMF is greater than 80%. Given the above, the dimension of feature vectors is reduced to nine after dimension reduction using PCA. Similarly, with above procedure, the calculated PD parameters of different PD types are shown in Table 8.


**Table 8.** Principle components with different IMFs.

#### *4.5. PD Pattern Recognition*

In this paper, 400 PD samples, including FD, ND, BD and CD, are extracted in the laboratory containing 100 samples in each PD type. MDE values of four different PD types are calculated and 50 samples in each type constitute the initial feature vectors. To verify the effectiveness and superiority of the proposed method, the feature extraction methods based on multi-scale sample entropy (MSE) and multi-scale permutation entropy (MPE) are introduced. The calculation method of MSE and MPE is similar with that of MDE. Firstly, PD signals are decomposed using EMD or VMD. After that

MSE or MPE values of extracted IMFs are calculated. Finally, PCA is applied to dimension reduction. The parameters during signal decomposition are shown in Table 9.


**Table 9.** Parameters selection.

PD feature vectors extracted with the above three methods are sent to the HMSVM classifier. Due to the big impact on the fault diagnosis result, HMSVM parameters need optimal configuration with PSO. In the case of VMD-MDE method, first of all, PD samples are divided into training and testing samples. After multiple experimental trials, the number of particle population is set to 20, *w* = 1, *c*<sup>1</sup> = 2, *c*<sup>2</sup> = 2, the maximum number of iterations *N* = 200. The penalty parameter *C* is between 1/n and 1, while the searching range of the kernel parameter *σ* is between 1 and 100. The optimum fitness reaches the maximum value of 96.98% after 19 iterations, when *σ* = 12.26 and *C* = 0.35. Similarly, HMSVM parameters with different feature extraction methods are obtained as follows.

Using the parameters in Table 10, HMSVM classifier is constructed for fault diagnosis of three different PD features. The recognition results with EMD and VMD decomposition are shown in Figures 12 and 13.


**Table 10.** HMSVM parameters.

\$ < = -

**Figure 12.** Recognition results using EMD decomposition.

**Figure 13.** Recognition results using VMD decomposition.

Figures 12 and 13 demonstrate that the recognition result using EMD decomposition is significantly different with that using VMD decomposition. Figure 12 illustrates that the recognition accuracy in each PD type is not less than 80% but no more than 90%, which means, using EMD decomposition, extracted PD features cannot represent most of signal characteristics. In contrary, Figure 13 shows that the recognition accuracy in each PD type is no less than 90%. Moreover, in each PD type, there's no misjudged sample with MDE. This means that, with VMD decomposition, PD features can effectively represent most of signal information. Besides, from above two figures, it gets a satisfactory result with MDE parameters.

To compare the diagnosis results of PD features with different classifiers, artificial neural network (ANN) [38] and support vector machine (SVM) classifiers are introduced for PD pattern recognition. In ANN, back-propagation network is employed as the recognition model, which trains the weight with differentiable nonlinear functions. The classifier parameters are shown in Table 11. *σ* is the kernel parameter of RBF and *C* is the penalty factor in SVM.


**Table 11.** Parameters of ANN and SVM.

With the parameters shown in Tables 10 and 11, ANN, SVM and HMSVM classifiers are constructed for PD pattern recognition. Using diverse classifiers, the recognition result with VMD-MDE can be seen in Figure 14. Table 12 shows the integrative result using different PD features, in which running time means the time used for PD fault diagnosis.

**Figure 14.** Recognition results using VMD-MDE method.



As can be illustrated in Figure 14, using the same PD feature extraction method, the recognition results with different classifiers are significantly different. The average classification accuracy achieved using HMSVM is 100.00%. HMSVM shows great advantages over ANN and SVM. Table 12 shows diverse diagnostic results with different PD features. Compared with different PD feature types, VMD-MDE gives less running time and higher recognition accuracy. It means parameters using VMD-MDE can represent most of PD signal components. The quadratic programming calculation of HMSVM is less than that of SVM, which causes shorter training and testing time. In addition, HMSVM shows better classification ability than other two classifiers, ANN and SVM.

#### **5. Conclusions**

In this paper, a novel PD fault diagnosis method is proposed. This method combines PD feature extraction based on VMD-MDE and PD pattern recognition based on HMSVM. First of all, four types of PD signals are extracted in the experimental environment, including FD, ND, BD and CD. Then VMD is employed for PD signal decomposition. Secondly, proper IMFs are selected according to central frequency observation and MDE values in each IMF are calculated. Afterwards PCA is introduced to select effective principle components in MDE as final PD characteristic parameters. Finally, the extracted principle factors are used as PD features and sent to the HMSVM classifier. Experiment results show the following advantages: the proposed method can extract effective IMFs according to VMD decomposition. PD feature information in IMFs can be quantified successfully with MDE. Using PCA, the principle components which represent prominent characteristics are effectively selected. With small data size and low computational complexity, this approach overcomes the limitations in traditional PD feature extraction methods. Compared with PD feature extraction methods based on EMD-MSE, EMD-MPE, EMD-MDE, VMD-MSE and VMD-MPE, this proposed approach based on VMD-MDE achieves higher recognition accuracy and needs less running time, which can improve the diagnosis efficiency to satisfy real time requirements.

HMSVM uses one hypersphere for pattern recognition. HMSVM can not only separate two different classes, but also divide the sample space into two different parts. Using HMSVM, the classification of multi-classes was realized directly. Compared with ANN and SVM classifiers, HMSVM obtains higher recognition rate and improves the accuracy and efficiency in PD fault diagnosis. On the whole, this proposed method provided a new scheme for PD fault diagnosis. For further consideration, the proposed fault diagnosis method can be employed in PD on-line monitoring and diagnosis.

**Author Contributions:** Y.W. and H.S. conceived and designed the experiments; H.S. performed the experiments; F.L. analyzed the data and contributed to analysis tools; H.S. wrote the paper. All authors have read and approved the final manuscript.

**Funding:** This research was funded by the Science and Technology Project of the State Grid Corporation of China (SGLNDK00KJJS1500008), and the Doctoral Scientific Research Foundation of Northeast Electric Power University (No. BSJXM-201406), China.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Entropy* Editorial Office E-mail: entropy@mdpi.com www.mdpi.com/journal/entropy

MDPI St. Alban-Anlage 66 4052 Basel Switzerland

Tel: +41 61 683 77 34 Fax: +41 61 302 89 18