Article

A Convolutional Neural Network-Based Model for Multi-Source and Single-Source Partial Discharge Pattern Classification Using Only Single-Source Training Set

by Sara Mantach, Ahmed Ashraf, Hamed Janani and Behzad Kordi
1 Department of Electrical & Computer Engineering, University of Manitoba, Winnipeg, MB R3T 5V6, Canada
2 Verint Systems, Vancouver, BC V6E 4E6, Canada
* Author to whom correspondence should be addressed.
Energies 2021, 14(5), 1355; https://doi.org/10.3390/en14051355
Submission received: 5 January 2021 / Revised: 20 February 2021 / Accepted: 25 February 2021 / Published: 2 March 2021

Abstract

Classification of the sources of partial discharges has been a standard procedure for assessing the status of insulation in high voltage systems. One of the challenges in classifying these sources is deciding on the distinct properties of each one, a task that often requires trained human experts. Machine learning offers a solution to this problem by allowing models to be trained on extracted features. The performance of such algorithms depends heavily on the choice of features. This dependence can be overcome by using deep learning, where feature extraction is performed automatically by the algorithm and the input is the raw data. In this work, an enhanced convolutional neural network is proposed that is capable of classifying single sources as well as multiple sources of partial discharges without introducing multiple sources in the training phase. Training is performed using only single-source phase-resolved partial discharge (PRPD) patterns, while testing is performed on both single- and multi-source PRPD patterns. The proposed model is compared with a single-branch CNN architecture. The average classification accuracies of the proposed architecture for single-source PDs and multi-source PDs are 99.6% and 96.7%, respectively, compared to 96.2% and 77.3% for the traditional single-branch CNN architecture.

1. Introduction

Effective insulation degradation diagnosis is a key prerequisite for monitoring the integrity of any electrical system. An accepted diagnostic method that has been used over the years is the measurement of partial discharges (PD) [1]. Different parameters have been employed for PD classification throughout the years, including the maximum discharge magnitude and the number of discharges as a function of time, PD pulses on an elliptic time-base, the phase of the positive half cycle of the PRPD patterns, features extracted using different dimensionality reduction techniques, and the application of mixed Weibull functions and the wavelet transform to discharge patterns. For deep learning approaches, the inputs used are waveform spectrograms, time-domain waveform signals, and PRPD patterns. Okamoto and Tanaka were among the first to work on developing techniques to measure partial discharges in 1986 [2]. Their work demonstrated the existence of a correlation between the distribution profile of the charge against the phase angle and the level of insulation degradation by analysing the skewness of the profile. Another approach for determining partial discharge sources was based on the analysis of different quantities of discharge as a function of time, including the maximum discharge magnitude, the number of discharges, and the inception voltage [3]. By the 1990s it was evident that distinctive characteristic behaviors, such as increases, decreases, and strong or weak fluctuations of these quantities, could be correlated to discharge sources.
With the advancements in the field of pattern recognition, interest increased in automating partial discharge recognition and classification. In 1993, one of the first successful applications of neural networks for automatic recognition of a partial discharge source was reported [4]. The input was extracted from commercial partial discharge detectors that would display PD pulses on an elliptic time-base. The phase position and the spread of the pulses were shown to be correlated with the nature of the PD source, suggesting that PD pulses on an elliptic time-base provided important features for characterization. The rate of correct classification varied between 70% and 90% depending on the number of layers in the neural network and the classes to be classified. The choice of the neural network architecture has been an open question since that time. Poor generalization was recorded on real patterns compared to that on synthesised training patterns [5]. In [6], phase resolved partial discharge (PRPD) patterns were considered as inputs to the neural network, wherein the phase of the positive half cycle was considered. The study proposed a way to separate superimposed charge-phase patterns based on separating contours before passing them to the neural network. The limitation of this method is that it required the patterns to be non-overlapping. More progress was made by Krivda, who used dimensionality reduction techniques to derive low-dimensional representations of different partial discharge patterns [7]. Krivda [7] concluded that in order to decide on the right features, a balance should be set between the number of features and the time needed to compute them; moreover, new types of neural network could yield better results. In [8], the authors discussed automatic recognition of multiple PD sources. A stochastic method based on applying mixed Weibull functions to the pulse-height distribution patterns was investigated. The study concluded that in the case of partially or completely superimposed PD patterns, separation was impossible.
Up to the year 2000, automatic recognition of multiple PD sources was yet to be resolved. The authors in [9] introduced the application of the wavelet transform to PD detection, proposing the use of the Daubechies mother wavelet. Features were extracted from the third-level reconstructed horizontal (H) and vertical (V) component images. A feature vector was composed by averaging the H and V images in the magnitude and phase directions, resulting in 150 elements. The neural network used in this model had one hidden dense layer, and multiple-source patterns were used during training. The overall classification accuracy was 88%. However, the authors concluded that further study of actual multiple-source PD was required for a more accurate assessment of the proposed method. In [10], stochastic procedures and a fuzzy classifier were implemented to identify different PD pulses; however, it was noted that the fuzzy classifier was not efficient when the PD pulses had similar shapes.
Historically, the input data for any machine learning algorithm had to be pre-processed by using the user’s knowledge of the domain and an assessment of which features are important for the specific problem. By 2006, automatic feature extraction became possible through the use of deep artificial neural networks which could accept raw data as input. The first application of deep learning to PD diagnosis was reported in 2015 [11]. In [11], the authors recorded the PRPD patterns for six different PD defects in oil, where the patterns were treated as 50 × 64 dimensional images. The classification accuracy increased as the number of hidden layers increased, reaching 86% for five hidden layers. The authors in [12] were among the first to use a deep learning architecture called a recurrent neural network (RNN) for the classification of PRPD patterns. Trials were performed to decide on the best values for the number of layers and the number of power cycles. They achieved an accuracy of 96.62%, which outperformed simple deep neural networks (with an accuracy of 93.01%) and traditional machine learning techniques using a support vector machine (with an accuracy of 88.63%). Recently, a number of authors have reported the use of deep learning models, such as convolutional neural networks (CNN), for classifying PD sources [13,14,15]. Among these works, various formats of input have been used for PD source identification; these inputs include waveform spectrograms, time-domain waveform signals, and PRPD patterns. For the waveform spectrogram data, the authors in [16] used a CNN to detect PD signals with varying noise and interference signals. The input to the network was an image showing the time-frequency spectrum of sound clips, which were measurements recorded from a switchgear using the transient earth voltage (TEV) method. The CNN showed superior performance in terms of detection accuracy and detection time compared to other methods prevalent in the industry. In [17], Che et al. used a 2D-CNN to classify three PD sources in XLPE cable, namely internal PDs, corona PDs, and surface PDs, in addition to noise. Acoustic signals were generated using an optical fiber distributed acoustic sensing system. The 1D signals were converted to a 2D spectral representation by applying mel-frequency cepstrum coefficient (MFCC) analysis. For the time-domain waveform data, the authors in [18] used signals from an analog transformer model which consisted of impulse fault current waveforms for different fault conditions. Each waveform was represented by a 2500-dimensional vector. The training was performed using the PD data from sources co-occurring simultaneously at two different locations within the winding. This resulted in a total of 20,304 classes corresponding to the different fault conditions at different winding locations. The classification accuracy attained was 99.2%. Wang et al. were interested in UHF signals for partial discharge detection in GIS [19]. They collected time-series data from lab experiments and from simulations using the finite-difference time-domain (FDTD) method. The input to the CNN consisted of 64 × 64 images that were downsampled from a 600 × 438 time-resolved partial discharge (TRPD) image. The classification accuracy was compared with that of conventional methods based on statistical features as input. It was concluded that the CNN outperforms the traditional methods when the number of training examples is greater than 500.
For the PRPD input data, the authors in [20] obtained mixed onsite and experimental PRPD patterns for six different sources of partial discharge. The input data were represented as a 72 × 50 matrix, and an accuracy of 89.7% was achieved. In [21], the authors used a CNN to detect the deterioration of the insulation in high voltage systems using PRPD images. Four classes were distinguished: start, middle, end, and noise. The tested specimens were aged by undergoing high electric stress in a lab setup. Different CNN architectures were investigated by changing hyper-parameters such as the number and the size of the kernels. The results were reported in terms of the confusion matrix and the accuracy percentage. In [22], an algorithm was presented to identify multi-source PDs based on a two-step logistic regression model.
It is noteworthy that all of the prior methods reported above depend on the availability of training data from multi-source PD inputs [23]. There are a number of drawbacks associated with this choice. Such training data are difficult and time consuming to collect in practice and, by their very combinatorial nature, preclude the collection of examples for all possible combinations of concurrently occurring defects. In this paper, to address these drawbacks, we propose a novel convolutional architecture for single-source PD and multi-source PD classification using training data with ground truth available only at the level of single-source PDs. Our proposed architecture consists of a convolutional backbone feeding into multiple fully connected neural networks (FCNs). The input to the convolutional part of the network is the PRPD pattern matrix (Section 2.1). The output of this CNN stage is a common feature representation which is broadcast to the different FCNs, wherein each FCN is trained to output the probability of occurrence of a specific PD. Thus, the proposed hybrid architecture moves from extracting general representations to more fine-tuned representations in a hierarchical fashion. The overall loss of the network is the combination of the individual binary cross entropy losses from each of the FCNs. This loss is jointly optimized with respect to the parameters of the CNN stage and the FCNs. At testing time, our network produces a multi-label output vector signifying the probability of the presence of the respective PDs. We show superior performance compared to models trained independently on single-source PDs, demonstrating the value of the shared convolutional stage and the joint optimization of the FCNs.

2. Material and Methods

2.1. Experimental Setup

Several experiments have been performed by different groups to classify partial discharges by the use of their phase resolved partial discharge patterns. PD classification and identification using laboratory data has been used to establish proof of concept for a number of techniques available in the literature (e.g., [24,25]). Lab experiments were performed by Janani et al. [26,27] to simulate artificial defects. The experimental setup consisted of a high voltage transformer, a capacitive divider to measure the AC voltage, the test cell, and the PD measurement system, as shown in Figure 1. The lab setups simulate common sources of PD in air, oil, and SF6. PD data collection was conducted in accordance with the IEC 60270 standard [28]. The test cells include three sources of partial discharge in SF6 (floating electrode, moving particle, and fixed protrusion; Figure 2), two sources of PD in transformer oil (free particle and needle electrode; Figure 3), and corona in air, which uses the same setup as the floating electrode but filled with air. For the floating electrode, the gap between the two electrodes is 1 mm. For the free particle, a small bearing with a diameter of 3.17 mm was placed on a concave dish ground electrode. For the point plane electrode, the needle has a diameter of 20 μm [27]. More details on the experimental setups are given in [29]. In total, six different single-source PD patterns are generated. In addition, four different combinations of multiple partial discharges were simulated by using two or three test cells simultaneously. A commercial PD measurement system (Omicron MPD 600) was used to acquire the PRPD pattern of each test cell.
The output data from the Omicron software are exported as binary files. These data include information about the partial discharges taking place relative to the applied phase voltage. The discharge magnitude and phase are divided into 400 and 500 bins, respectively. This results in a 400 × 500 matrix $M(x, y)$, where the number in each bin represents the number of discharges occurring at a specific phase angle ($x$) and a specific discharge magnitude ($y$). Figure 4 shows a visual representation of the six single-source PRPD patterns. The 400 × 500 matrix is reduced to 100 × 100 by summing up the counts in each 4 × 5 sub-matrix. In addition, background noise is unavoidable even under ideal measurement conditions; it appears as an offset charge over all the phase windows and, in this work, has been removed from all the PRPD patterns. In addition to the six classes, an additional no-pattern class, corresponding to cases in which no PD is present, is added. In order to encourage the model to learn features related to the shape of the PRPDs, the samples were converted into binary samples, where a zero threshold is used for binarization. A visual representation of the binary matrices is shown beside the Omicron representation of each of the six single PD source classes. Figure 5 shows the PRPD patterns of the four multi-source PD classes.
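As an illustration of this preprocessing chain, the following sketch, written in Python with NumPy, downsamples a 400 × 500 count matrix to 100 × 100 by block summation, subtracts a constant background offset, and binarizes the result with a zero threshold. The function name, the noise_offset parameter, and the dummy input are illustrative assumptions and are not part of the Omicron export format.

```python
import numpy as np

def preprocess_prpd(prpd_counts, noise_offset=0):
    """Downsample a 400 x 500 PRPD count matrix to 100 x 100, remove a
    constant background-noise offset, and binarize with a zero threshold."""
    m = np.asarray(prpd_counts, dtype=np.int64)
    # Sum the counts in each 4 x 5 block (400 / 100 = 4, 500 / 100 = 5).
    reduced = m.reshape(100, 4, 100, 5).sum(axis=(1, 3))
    # Background noise appears as an offset charge over all phase windows.
    reduced = np.clip(reduced - noise_offset, 0, None)
    # Keep only the shape of the pattern.
    return (reduced > 0).astype(np.float32)

# Example with a random dummy histogram standing in for real Omicron data.
dummy = np.random.poisson(0.05, size=(400, 500))
binary_pattern = preprocess_prpd(dummy)
print(binary_pattern.shape)  # (100, 100)
```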
To make the system insensitive to changes in the charge magnitude settings of the Omicron software and to different applied voltages, samples are extracted using multiple magnitude settings, hence introducing variability into the dataset. In particular, the charge magnitude settings employed for each class are summarized in Table 1. The numbering of the six single-source classes is done as follows:
  • Class 1 (corona in air)
  • Class 2 (floating electrode in SF6)
  • Class 3 (free particle in oil)
  • Class 4 (free particle in SF6)
  • Class 5 (point plane electrode in oil)
  • Class 6 (point plane electrode in SF6)
The numbering of the four multiple-source classes is done as follows:
  • Class 14 (corona in air and free particle in SF6)
  • Class 16 (corona in air and point plane electrode in SF6)
  • Class 46 (free particle in SF6 and point plane electrode in SF6)
  • Class 146 (corona in air, free particle in SF6 and point plane electrode in SF6)

2.2. Method

Convolutional neural networks (CNNs) represent a class of deep neural networks that were originally designed for visual images, and have shown state-of-the-art performance for a range of applications [30,31,32,33]. Typically, CNNs consist of a cascade of alternating convolutional and pooling layers as shown in Figure 6. A convolutional layer comprises a bank of linear 2D or 3D filters which are convolved with a multi-channel input image to produce a multi-channel output of feature maps. The output of the convolutions is often passed through a non-linear activation function such as a rectified linear unit (ReLU). A pooling layer subsamples the input in a non-linear fashion (e.g., taking the maximum value in a local window). The successive convolutional and pooling layers, coupled with non-linear activations, confer on CNNs the capability to automatically learn feature representations at different spatial scales of an image in a hierarchical fashion. Common applications include classification, regression, and matrix-to-matrix transformations [34,35].
In classification problems, a data-point can belong to a single class (mutually exclusive membership) or it can belong to multiple categories at the same time. The latter is usually referred to as multi-label classification. Since PRPD patterns from multiple sources can occur concurrently, PD detection is essentially a multi-label classification problem. In the presence of training data with various combinations of co-occurring multi-source PD labels, building a multi-label classification model is tenable. However, as mentioned in Section 1, collection of such a dataset is expensive, time consuming, and may not span all possible combinations of PDs. On the other hand, it is practically more feasible to collect single-source PD data in large quantities. We therefore focus on methods to capitalize on single-source training data for solving the multi-label classification problem.
Let $K$ be the number of PD sources. To enable explicit detection of cases with no PDs, we define a separate category representing the absence of all the PDs. Let the training data consisting of $N$ examples be represented as $\{X_i, y_i\}_{i=1}^{N}$, where $X_i \in \mathbb{R}^{H \times W}$ is the $i$th PRPD pattern image, and $y_i \in \{0, 1\}^{K+1}$ is the corresponding $(K+1)$-dimensional binary label vector signifying the presence or absence of each PD; the label vector is $(K+1)$-dimensional because, as described above, we have defined an additional class for cases with no PDs. Since only single-source examples are considered during training, each $y_i$ is a one-hot vector. At testing time, the label vector for a test case can contain multiple 1s.
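The label convention can be made concrete with a short sketch, assuming the class numbering of Section 2.1 ($K = 6$ sources plus the no-PD class); the helper below is hypothetical and only illustrates how the one-hot training labels differ from the multi-hot label vectors encountered at test time.

```python
import numpy as np

K = 6                    # number of PD sources (classes 1-6)
NUM_CLASSES = K + 1      # plus the explicit no-PD class (class 7)

def one_hot(class_index):
    """Training label: exactly one class present (single-source or no-PD)."""
    y = np.zeros(NUM_CLASSES, dtype=np.int64)
    y[class_index - 1] = 1
    return y

train_label = one_hot(4)                        # single-source class 4
test_label = np.array([1, 0, 0, 1, 0, 1, 0])    # multi-source class 146
```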

2.2.1. Multiple Single-Source Classifiers (Baseline)

To achieve multi-label classification, a traditional way has been to learn multiple ($K+1$) independent binary classifiers, each trained to detect an individual PD defect. The loss for the $k$th model, given the training dataset, is given by
$$\mathcal{L}_k(\theta_k) = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_{ik}\log\big(F_{\theta_k}(X_i)\big) + (1 - y_{ik})\log\big(1 - F_{\theta_k}(X_i)\big) \Big] \qquad (1)$$
where $F_{\theta_k}$ is the function encoded by the $k$th model with parameters $\theta_k$, and $y_{ik}$ is the $k$th element of $y_i$. After the training phase, given a test case $X^{(\mathrm{test})}$, one then needs to invoke all $K+1$ models to build a multi-label output, $\hat{y}^{(\mathrm{test})} = \{F_{\theta_k}(X^{(\mathrm{test})})\}_{k=1}^{K+1}$. An example convolutional architecture that accepts a PRPD pattern image and performs a binary classification for the presence of a specific single-source PD is shown in Figure 7a.
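A minimal Keras sketch of one such independent binary classifier is given below, following the layer sizes in Table 2; the builder function, the optimizer choice, and the final usage line are assumptions made for illustration rather than the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_single_source_classifier():
    """One independent binary classifier F_theta_k (Figure 7a / Table 2)."""
    inputs = layers.Input(shape=(100, 100, 1))
    x = layers.BatchNormalization()(inputs)
    # Two conv/pool/batch-norm/ReLU blocks with 36 filters of size 3 x 3.
    for _ in range(2):
        x = layers.Conv2D(36, (3, 3), padding="same")(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    x = layers.Flatten()(x)              # (None, 25 * 25 * 36) = (None, 22500)
    for units in (128, 64):
        x = layers.Dense(units)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(inputs, outputs)
    # Binary cross-entropy corresponds to the per-class loss in Eq. (1).
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# The baseline uses K + 1 = 7 such models, one per class (incl. the no-PD class).
classifiers = [build_single_source_classifier() for _ in range(7)]
```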

2.2.2. Joint Model with Shared CNN Parameters (Proposed)

While the baseline approach described above may learn excellent single-source classifiers, it is not expected to generalize to the multi-label classification task. This is due primarily to over-tuned class-specific parameters $\{\theta_k\}_{k=1}^{K+1}$ learned independently for each single-source PD. To address this issue, we propose to decompose the network parameters into two sets: a shared set of common parameters, $\rho_{\mathrm{CNN}}$ (for the convolutional part), and class-specific parameters, $\{\phi_{\mathrm{FCN}}^{k}\}_{k=1}^{K+1}$ (for the fully connected networks). In particular, our proposed architecture has a shared convolutional stage for feature extraction. These features are then distributed to multiple FCNs. Our motivation is to encourage the CNN to learn to extract more general feature representations which are useful for all classes. The FCNs accept these general features to learn class-specific models in a joint fashion. Our architecture is shown in Figure 7b. Let the CNN part be represented by the network $G$, and each of the fully connected networks be represented by $H_k$. Our joint loss function is then given by
$$\mathcal{L}\big(\rho_{\mathrm{CNN}}, \{\phi_{\mathrm{FCN}}^{k}\}_{k=1}^{K+1}\big) = -\frac{1}{N(K+1)}\sum_{k=1}^{K+1}\sum_{i=1}^{N}\Big[\, y_{ik}\log H_k\big(G(X_i)\big) + (1 - y_{ik})\log\big(1 - H_k(G(X_i))\big) \Big] \qquad (2)$$

2.2.3. Design Details for Network Layers

The network used in this study consists of two convolutional layers, each with 36 filters of kernel size 3 × 3, followed by two dense layers with 128 and 64 nodes, respectively, and ending with a classification layer of seven nodes. Batch normalization has been used in order to decrease the effect of over-fitting. The schematic for one of the classifiers in Figure 7a is shown in Figure 8. The design details for the implemented classifiers are shown in Table 2. The hyperparameters of the neural network, such as the number of layers, the number of nodes per layer, and the kernel size, were chosen by running experiments for different values of the parameters and plotting the training and validation accuracy curves as a function of epochs.
The design of the layers is kept the same for both the baseline model and the proposed model so that the difference in performance due to the proposed parameter-sharing based architecture can be investigated. The proposed model architecture is shown in Figure 9.
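To make the parameter sharing concrete, the following sketch, again in Keras, builds a single convolutional backbone (the parameters $\rho_{\mathrm{CNN}}$) feeding seven small fully connected heads (the parameters $\phi_{\mathrm{FCN}}^{k}$), with the joint loss of (2) obtained by combining the per-head binary cross-entropies. Layer names, the optimizer, and the loss weighting are illustrative assumptions, not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_shared_backbone_model(num_classes=7):
    """Shared CNN backbone G with one fully connected head H_k per class
    (Figure 7b / Figure 9)."""
    inputs = layers.Input(shape=(100, 100, 1))

    # Shared convolutional stage (common parameters rho_CNN).
    x = layers.BatchNormalization()(inputs)
    for _ in range(2):
        x = layers.Conv2D(36, (3, 3), padding="same")(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    features = layers.Flatten()(x)

    # Class-specific fully connected heads (parameters phi_FCN_k).
    outputs = []
    for k in range(num_classes):
        h = layers.Dense(128, activation="relu")(features)
        h = layers.Dense(64, activation="relu")(h)
        outputs.append(
            layers.Dense(1, activation="sigmoid", name=f"class_{k + 1}")(h))

    model = models.Model(inputs, outputs)
    # Keras sums the per-head binary cross-entropies; the loss weights
    # rescale that sum to the 1/(K+1) average in Eq. (2).
    model.compile(optimizer="adam",
                  loss=["binary_crossentropy"] * num_classes,
                  loss_weights=[1.0 / num_classes] * num_classes)
    return model

model = build_shared_backbone_model()
# The training targets are a list of 7 binary vectors (one per head), built
# from the one-hot single-source labels; at test time each head outputs the
# probability that its PD source is present, giving a multi-label prediction.
```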

3. Performance Metrics

Since the model is trained using single-source PRPD patterns only, the generalization of the model is tested by evaluating the performance on a new hybrid dataset that includes PRPD patterns from single as well as from multiple partial discharge sources. In addition, samples with different charge magnitude specifications in the Omicron software are tested. Different standard multi-label classification metrics have been used in the literature to evaluate the performance of trained models. Some of these metrics include mean average precision, 0-1 exact match, macro and micro F1, per-class precision, per-class recall, overall precision, and overall recall [36]. In this paper, the individual recall (Recall(k)) and the individual precision (Precision(k)) are calculated for each of the classes 1 to 7 by taking into account both single-source PDs and multi-source PDs. The recall reflects the proportion of the positive examples that are correctly classified, and the precision reflects the proportion of the examples predicted to be positive that are actually positive. PCR and PCP represent the arithmetic means of the per-class recall and precision, respectively,
$$\mathrm{PCR} = \frac{1}{K}\sum_{k=1}^{K}\mathrm{Recall}(k) \qquad (3)$$

$$\mathrm{PCP} = \frac{1}{K}\sum_{k=1}^{K}\mathrm{Precision}(k) \qquad (4)$$
In addition, the classification accuracy and the false negative accuracy are evaluated for single as well as for multiple classes. The classification accuracy is calculated considering equal weights for all classes, while the false negative accuracy is calculated taking into consideration only the true class or classes to which the sample belongs. The importance of calculating the false negative accuracy metric in this context comes from the fact that it is of high importance to detect the correct source of PD in high voltage systems. Consistent false identification of a PD source will put the high voltage apparatus at risk of failure, in addition to posing a safety risk for employees working near this apparatus. The false negative accuracy reflects on the performance of the model by quantitatively evaluating single classes and multiple classes separately, in comparison to the individual recall and precision. If a PRPD pattern belongs to classes one, four and six, then the ground truth is [1 0 0 1 0 1 0]. The classification accuracy is then calculated by checking how many elements of the seven-element prediction vector match this ground truth. The classification accuracy for a single sample is calculated as
$$P_{\mathrm{classification}} = \frac{\sum_{k=1}^{7} M_k}{7} \times 100 \qquad (5)$$
where $M_k$ is equal to one when element $k$ of the ground truth vector agrees with the prediction of the model for the corresponding class $k$, and zero otherwise. The ideal classification accuracy is 100% and the worst is 0%.
The false negative accuracy for a single sample is calculated as:
$$P_{\mathrm{false\ negative}} = \frac{\sum_{j=1}^{T} N_j}{T} \times 100 \qquad (6)$$
where $N_j$ equals one when element $j$ of the ground truth vector, which is equal to one, does not agree with the prediction of the model for the corresponding class $j$, and zero otherwise. In this metric, checking the matching prediction is performed only on the class or classes to which the sample truly belongs. $T$ is the number of classes to which the sample truly belongs; in our test dataset, $T$ can be 1 for single-source PDs, and 2 or 3 for multi-source PDs. Hence, the ideal false negative accuracy is 0% and the worst is 100%. Calculating the classification and false negative accuracies over a number of samples is done by averaging (5) and (6) over the number of samples.
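For concreteness, a small NumPy sketch of the two per-sample metrics in (5) and (6) is given below, using the class-146 example above; the function names are illustrative.

```python
import numpy as np

def classification_accuracy(y_true, y_pred):
    """Eq. (5): percentage of the 7 label elements predicted correctly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(y_true == y_pred)

def false_negative_accuracy(y_true, y_pred):
    """Eq. (6): percentage of the truly present classes that were missed."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    present = y_true == 1
    return 100.0 * np.mean(y_pred[present] != 1)

# Example: a sample from class 146 (classes 1, 4 and 6 present).
y_true = np.array([1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 0, 0])   # class 6 is missed
print(classification_accuracy(y_true, y_pred))   # 6/7 * 100 = 85.7
print(false_negative_accuracy(y_true, y_pred))   # 1/3 * 100 = 33.3
```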

4. Results and Discussion

4.1. Independent Classifiers

Figure 10 shows the calculated loss of the trained models, using (1), as a function of epochs (or iterations) for both the training and the validation datasets on a log scale. An epoch is one pass of the entire dataset forward and backward through the neural network. Over-fitting is clearly seen for classes six and seven, where the gap between the training loss and the validation loss increases at epoch 1000. Table 3 shows the classification and the false negative accuracies.
As seen in Table 3, the model does not generalize well to the multiple classes, especially Class 16 and Class 146. The per-class recall and precision, together with their arithmetic means, are shown in Table 4. The precision for classes three and four is low, as is the recall for class six. In Table 5 we show the hybrid confusion matrix, in which the rows and columns represent the input and predicted classes, respectively. The true positives are highlighted for better visibility.

4.2. Proposed Model

As we proceed with training the model, a trade-off takes place between generalization and learning deeper features about single partial discharges. Generalization here refers to the correct classification of multi-source PRPD patterns. The training of the model is terminated when the validation accuracy is observed to start diverging from the training accuracy. During the training phase, a portion of the dataset is held out for validation and is used to calculate the validation loss at each epoch. The decision is made collectively by analyzing the average validation and training losses of the seven classes. For epoch 4000, the percentage difference between the validation and the training loss is 0.8%, compared to 2.7% for epoch 8000, as shown in Figure 11; consequently, the training is stopped at iteration (epoch) 4000.
The calculated loss of the trained model as a function of epochs or iterations for both the training and the validation dataset in log scale, using (1), is shown in Figure 12.
The classification accuracy and false negative accuracy are shown in Table 6. In comparison with Table 3, better performance is recorded: the average classification accuracy for the single classes increased from 96.2% to 99.6%, and the average false negative accuracy for the multiple classes dropped from 23.5% to 8.7%. The per-class recall and precision, together with their arithmetic means, are given in Table 7. Comparing Table 7 with Table 4, ideal recall is now recorded for class 6, and ideal or near-ideal precision is recorded for all classes. This indicates that our proposed model enhanced the prediction of true positives. The hybrid confusion matrix for the proposed model is shown in Table 8. As seen in this table, compared to Table 5, our proposed model has enhanced classification ability not only for single-source PDs, but also for multi-source PDs; this is shown by comparing the last four rows of Table 8, corresponding to the multiple classes, with those of Table 5, and indicates that our proposed model decreased false negative predictions.

5. Conclusions

In this paper, a customized approach based on a deep learning algorithm, particularly a CNN, has been developed in order to identify single-source and multi-source PDs which can occur in high voltage insulation systems. The difficulty of identifying multi-source PDs using a training set of single-source PDs results from the fact that the PRPD patterns are partially overlapping. As a result, traditional machine learning techniques, which are based on the manual extraction of features, are easily confused when multi-source PRPD patterns are to be classified, and additional algorithms must be deployed to decide on the separation criteria between these overlapping PRPDs. A customized CNN model has been shown to be useful for this problem through the proposed enhanced architecture based on sharing the convolutional weights among the different classes. The essence of the proposed model is that the training is done on single sources of PDs only. This is appreciated in industry, where additional financial resources and time are needed to acquire data from simultaneous sources of PDs. The model is robust to electrical interference as well as to the applied phase voltage. The average classification accuracies of the proposed architecture for single-source PDs and multi-source PDs are 99.6% and 96.7%, respectively, compared to 96.2% and 77.3% for the independent classifiers architecture.

Author Contributions

Conceptualization: S.M., A.A., H.J. and B.K.; methodology, S.M., A.A.; software, S.M.; formal analysis, A.A. and H.J. and B.K.; investigation, S.M., A.A., H.J. and B.K.; original draft preparation, S.M.; writing, S.M.; review and editing, A.A., H.J. and B.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The Natural Sciences and Engineering Research Council of Canada.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Luo, Y.; Li, Z.; Wang, H. A review of online partial discharge measurement of large generators. Energies 2017, 10, 1694. [Google Scholar] [CrossRef] [Green Version]
  2. Okamoto, T.; Tanaka, T. Novel partial discharge measurement computer-aided measurement systems. IEEE Trans. Electr. Insul. 1986, 6, 1015–1019. [Google Scholar] [CrossRef]
  3. Gulski, E.; Kreuger, F. Determination of discharge sources by analysis of discharge quantities as a function of time. In Proceedings of the IEEE International Symposium on Electrical Insulation, Baltimore, MD, USA, 7–10 June 1992; pp. 397–400. [Google Scholar]
  4. Satish, L.; Gururaj, B. Partial discharge pattern classification using multilayer neural networks. IET IEE Proc. A Sci. Meas. Technol. 1993, 140, 323–330. [Google Scholar] [CrossRef] [Green Version]
  5. Satish, L.; Zaengl, W.S. Artificial neural networks for recognition of 3-d partial discharge patterns. IEEE Trans. Dielectr. Electr. Insul. 1994, 1, 265–275. [Google Scholar] [CrossRef]
  6. Cachin, C.; Wiesmann, H.J. PD recognition with knowledge-based preprocessing and neural networks. IEEE Trans. Dielectr. Electr. Insul. 1995, 2, 578–589. [Google Scholar] [CrossRef]
  7. Krivda, A. Automated recognition of partial discharges. IEEE Trans. Dielectr. Electr. Insul. 1995, 2, 796–821. [Google Scholar] [CrossRef]
  8. Cacciari, M.; Contin, A.; Mazzanti, G.; Montanari, G. Identification and separation of two concurrent partial discharge phenomena. In Proceedings of the Conference on Electrical Insulation and Dielectric Phenomena, Millbrae, CA, USA, 23–23 October 1996; Volume 2, pp. 476–479. [Google Scholar]
  9. Lalitha, E.; Satish, L. Wavelet analysis for classification of multi-source PD patterns. IEEE Trans. Dielectr. Electr. Insul. 2000, 7, 40–47. [Google Scholar] [CrossRef] [Green Version]
  10. Contin, A.; Cavallini, A.; Montanari, G.; Pasini, G.; Puletti, F. Digital detection and fuzzy classification of partial discharge signals. IEEE Trans. Dielectr. Electr. Insul. 2002, 9, 335–348. [Google Scholar] [CrossRef]
  11. Catterson, V.; Sheng, B. Deep neural networks for understanding and diagnosing partial discharge data. In Proceedings of the 2015 IEEE Electrical Insulation Conference, Seattle, WA, USA, 7–10 June 2015; pp. 218–221. [Google Scholar]
  12. Nguyen, M.T.; Nguyen, V.H.; Yun, S.J.; Kim, Y.H. Recurrent neural network for partial discharge diagnosis in gas-insulated switchgear. Energies 2018, 11, 1202. [Google Scholar] [CrossRef] [Green Version]
  13. Tuyet-Doan, V.N.; Tran-Thi, N.D.; Youn, Y.W.; Kim, Y.H. One-Shot Learning for Partial Discharge Diagnosis Using Ultra-High-Frequency Sensor in Gas-Insulated Switchgear. Sensors 2020, 20, 5562. [Google Scholar] [CrossRef]
  14. Puspitasari, N.; Khayam, U.; Kakimoto, Y.; Yoshikawa, H.; Kozako, M.; Hikita, M. Partial Discharge Waveform Identification using Image with Convolutional Neural Network. In Proceedings of the 54th International Universities Power Engineering Conference (UPEC), Bucharest, Romania, 3–6 September 2019; pp. 1–4. [Google Scholar]
  15. Barrios, S.; Buldain, D.; Comech, M.P.; Gilbert, I.; Orue, I. Partial discharge classification using deep learning methods—Survey of recent progress. Energies 2019, 12, 2485. [Google Scholar] [CrossRef] [Green Version]
  16. Lu, Y.; Wei, R.; Chen, J.; Yuan, J. Convolutional neural network based transient earth voltage detection. In Proceedings of the 2016 15th International Symposium on Parallel and Distributed Computing (ISPDC), Fuzhou, China, 8–10 July 2016; pp. 386–389. [Google Scholar]
  17. Che, Q.; Wen, H.; Li, X.; Peng, Z.; Chen, K.P. Partial discharge recognition based on optical fiber distributed acoustic sensing and a convolutional neural network. IEEE Access 2019, 7, 101758–101764. [Google Scholar] [CrossRef]
  18. Dey, D.; Chatterjee, B.; Dalai, S.; Munshi, S.; Chakravorti, S. A deep learning framework using convolution neural network for classification of impulse fault patterns in transformers with increased accuracy. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 3894–3897. [Google Scholar] [CrossRef]
  19. Wang, Y.; Yan, J.; Yang, Z.; Liu, T.; Zhao, Y.; Li, J. Partial Discharge Pattern Recognition of Gas-Insulated Switchgear via a Light-Scale Convolutional Neural Network. Energies 2019, 12, 4674. [Google Scholar] [CrossRef] [Green Version]
  20. Song, H.; Dai, J.; Sheng, G.; Jiang, X. GIS partial discharge pattern recognition via deep convolutional neural network under complex data source. IEEE Trans. Dielectr. Electr. Insul. 2018, 25, 678–685. [Google Scholar] [CrossRef]
  21. Florkowski, M. Classification of partial discharge images using deep convolutional neural networks. Energies 2020, 13, 5496. [Google Scholar] [CrossRef]
  22. Janani, H.; Kordi, B.; Jozani, M.J. Classification of simultaneous multiple partial discharge sources based on probabilistic interpretation using a two-step logistic regression algorithm. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 54–65. [Google Scholar] [CrossRef]
  23. Ganguly, B.; Chaudhury, S.; Biswas, S.; Dey, D.; Munshi, S.; Chatterjee, B.; Dalai, S.; Chakravorti, S. Wavelet Kernel based Convolutional Neural Network for Localization of Partial Discharge Sources within a Power Apparatus. IEEE Trans. Ind. Inform. 2020, 17, 1831–1841. [Google Scholar]
  24. Gulski, E.; Kreuger, F. Computer-aided recognition of discharge sources. IEEE Trans. Electr. Insul. 1992, 27, 82–92. [Google Scholar] [CrossRef]
  25. Tang, J.; Jin, M.; Zeng, F.; Zhang, X.; Huang, R. Assessment of PD severity in gas-insulated switchgear with an SSAE. IET Sci. Meas. Technol. 2017, 11, 423–430. [Google Scholar] [CrossRef]
  26. Janani, H.; Jacob, N.D.; Kordi, B. Automated recognition of partial discharge in oil-immersed insulation. In Proceedings of the IEEE Electrical Insulation Conference (EIC), Seattle, WA, USA, 7–10 June 2015; pp. 467–470. [Google Scholar]
  27. Janani, H.; Kordi, B. Towards automated statistical partial discharge source classification using pattern recognition techniques. IET High Volt. 2018, 3, 162–169. [Google Scholar] [CrossRef]
  28. IEC 60270 Standard. High-Voltage Test Techniques: Partial Discharge Measurements. Available online: https://webstore.iec.ch/publication/1247 (accessed on 21 January 2021).
  29. Janani, H. Partial Discharge Source Classification Using Pattern Recognition Algorithms. Ph.D. Thesis, University of Manitoba, Winnipeg, MB, Canada, 2016. [Google Scholar]
  30. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Volume 1. [Google Scholar]
  31. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377. [Google Scholar] [CrossRef] [Green Version]
  32. Duan, L.; Hu, J.; Zhao, G.; Chen, K.; Wang, S.X.; He, J. Method of inter-turn fault detection for next-generation smart transformers based on deep learning algorithm. IET High Volt. 2019, 4, 282–291. [Google Scholar] [CrossRef]
  33. Polisetty, S.; El-Hag, A.; Jayram, S. Classification of common discharges in outdoor insulation using acoustic signals and artificial neural network. IET High Volt. 2019, 4, 333–338. [Google Scholar] [CrossRef]
  34. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
  35. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI); Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351, pp. 234–241. [Google Scholar]
  36. Durand, T.; Mehrasa, N.; Mori, G. Learning a deep convnet for multi-label classification with partial labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 647–657. [Google Scholar]
Figure 1. Experimental setup for PD measurement [27].
Figure 2. SF6 test cells: (a) floating electrode; (b) free particle; (c) point-plane electrode [22] (used with permission).
Figure 3. Oil test cells: (a) free particle; (b) point-plane electrode [27].
Figure 4. PRPD patterns and their binary representation of various single defects: (a) class 1; (b) class 2; (c) class 3; (d) class 4; (e) class 5; (f) class 6.
Figure 5. PRPD patterns and their binary representation of various multiple defects: (a) class 14; (b) class 16; (c) class 46; (d) class 146.
Figure 6. Simple CNN architecture.
Figure 7. (a) baseline model with independent model for each class; (b) proposed model with a common convolutional backbone shared across all classes.
Figure 8. CNN architecture of an independent classifier.
Figure 9. Proposed deep learning model architecture: common convolutional backbone shared across all classes.
Figure 10. Training and validation log losses vs. number of iterations for independent classifiers (a) class 1; (b) class 2; (c) class 3; (d) class 4; (e) class 5; (f) class 6; (g) class 7.
Figure 11. Decision on stopping criteria: the percentage difference between the validation loss and the training loss is minimal in the marked region.
Figure 12. Training and validation log losses vs. number of iterations for the proposed model (a) class 1; (b) class 2; (c) class 3; (d) class 4; (e) class 5; (f) class 6; (g) class 7.
Table 1. Different levels of charge magnitude scale setting on the Omicron software.
Class   | Charge magnitude scale setting
Class 1 | 100, 200, 250, 500, and 1000 pC
Class 2 | 70, 100, and 200 nC
Class 3 | 200, 300, and 350 pC
Class 4 | 70, 150, and 250 pC
Class 5 | 10, 50, and 100 nC
Class 6 | 20, 50, 100, and 200 pC
Table 2. Design specification of an independent classifier.
Layer type           | Output shape
Input Layer          | (None, 100, 100, 1)
Batch Normalization  | (None, 100, 100, 1)
Convolution1 2D      | (None, 100, 100, 36)
Max-pooling1 2D      | (None, 50, 50, 36)
Batch Normalization1 | (None, 50, 50, 36)
Activation1          | (None, 50, 50, 36)
Convolution2 2D      | (None, 50, 50, 36)
Max-pooling2 2D      | (None, 25, 25, 36)
Batch Normalization2 | (None, 25, 25, 36)
Activation2          | (None, 25, 25, 36)
Flatten              | (None, 22500)
Dense1               | (None, 128)
Batch Normalization3 | (None, 128)
Activation3          | (None, 128)
Dense2               | (None, 64)
Batch Normalization4 | (None, 64)
Activation4          | (None, 64)
Dense3               | (None, 1)
Activation5          | (None, 1)
Table 3. Accuracy of single and multiple source PRPD patterns.
Class     | Classification accuracy | False negative accuracy
Class 1   | 87.71%  | 0%
Class 2   | 97.7%   | 0%
Class 3   | 100%    | 0%
Class 4   | 99.82%  | 0%
Class 5   | 100%    | 0%
Class 6   | 88.44%  | 0%
Class 7   | 100%    | 0%
Class 16  | 79.42%  | 44%
Class 46  | 78.28%  | 0%
Class 14  | 85.71%  | 0%
Class 146 | 65.71%  | 50%
Table 4. PCR and PCP for independent classifiers.
Class     | Recall(i) | Precision(i)
Class 1   | 1    | 0.83
Class 2   | 1    | 1
Class 3   | 1    | 0.38
Class 4   | 0.89 | 0.6
Class 5   | 1    | 0.89
Class 6   | 0.65 | 0.99
Class 7   | 1    | 0.95
Arithmetic mean | PCR: 0.93 | PCP: 0.8
Table 5. Hybrid confusion matrix of independent classifiers.
Input class \ Predicted class | 1 | 2 | 3 | 4 | 5 | 6 | 7
1   | 100 | 0   | 49  | 37 | 0   | 0   | 0
2   | 0   | 140 | 10  | 0  | 1   | 0   | 10
3   | 0   | 0   | 130 | 0  | 0   | 0   | 0
4   | 0   | 0   | 10  | 97 | 0   | 1   | 0
5   | 0   | 0   | 0   | 0  | 130 | 0   | 0
6   | 0   | 0   | 5   | 90 | 10  | 120 | 0
7   | 0   | 0   | 0   | 0  | 0   | 0   | 200
14  | 50  | 0   | 50  | 50 | 0   | 0   | 0
16  | 50  | 0   | 14  | 14 | 0   | 6   | 0
46  | 0   | 0   | 26  | 50 | 0   | 50  | 0
146 | 50  | 0   | 45  | 25 | 0   | 0   | 0
Table 6. Accuracy of single and multiple source PRPD patterns.
Class     | Classification accuracy | False negative accuracy
Class 1   | 100%    | 0%
Class 2   | 100%    | 0%
Class 3   | 100%    | 0%
Class 4   | 97.17%  | 0%
Class 5   | 100%    | 0%
Class 6   | 100%    | 0%
Class 7   | 100%    | 0%
Class 16  | 100%    | 0%
Class 46  | 100%    | 0%
Class 14  | 96.28%  | 13%
Class 146 | 90.57%  | 22%
Table 7. PCR and PCP for the proposed model.
Class     | Recall(i) | Precision(i)
Class 1   | 1    | 1
Class 2   | 1    | 1
Class 3   | 1    | 1
Class 4   | 0.86 | 0.89
Class 5   | 1    | 1
Class 6   | 1    | 1
Class 7   | 1    | 1
Arithmetic mean | PCR: 0.98 | PCP: 0.99
Table 8. Hybrid confusion matrix of the proposed model.
Input class \ Predicted class | 1 | 2 | 3 | 4 | 5 | 6 | 7
1   | 100 | 0   | 0   | 0  | 0   | 0   | 0
2   | 0   | 140 | 0   | 0  | 0   | 0   | 0
3   | 0   | 0   | 130 | 0  | 0   | 0   | 0
4   | 0   | 0   | 0   | 97 | 0   | 12  | 0
5   | 0   | 0   | 0   | 0  | 130 | 0   | 0
6   | 0   | 0   | 0   | 0  | 0   | 120 | 0
7   | 0   | 0   | 0   | 0  | 0   | 0   | 200
14  | 50  | 0   | 0   | 37 | 0   | 0   | 0
16  | 50  | 0   | 0   | 0  | 0   | 50  | 0
46  | 0   | 0   | 0   | 50 | 0   | 50  | 0
146 | 50  | 0   | 0   | 17 | 0   | 50  | 0
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
