Article

A Semi-Supervised Approach for Partial Discharge Recognition Combining Graph Convolutional Network and Virtual Adversarial Training

by Yi Zhang 1,*, Yang Yu 1, Yingying Zhang 1, Zehuan Liu 1 and Mingjia Zhang 2
1 State Grid Economic and Technological Research Institute Co., Ltd., Beijing 102200, China
2 State Grid Zhejiang Electric Power Co., Ltd. Construction Company, Hangzhou 310009, China
* Author to whom correspondence should be addressed.
Energies 2024, 17(18), 4574; https://doi.org/10.3390/en17184574
Submission received: 2 August 2024 / Revised: 3 September 2024 / Accepted: 8 September 2024 / Published: 12 September 2024
(This article belongs to the Section F6: High Voltage)

Abstract:
With the digital transformation of the grid, partial discharge (PD) recognition using deep learning (DL) and big data has become essential for intelligent transformer upgrades. However, labeling on-site PD data poses challenges, even necessitating the removal of covers for internal examination, which makes it difficult to train DL models. To reduce the reliance of DL models on labeled PD data, this study proposes a semi-supervised approach for PD fault recognition by combining the graph convolutional network (GCN) and virtual adversarial training (VAT). The approach introduces a novel PD graph signal to effectively utilize phase-resolved partial discharge (PRPD) information by integrating numerical data and region correlations of PRPD. Then, GCN autonomously extracts features from PD graph signals and identifies fault types, while VAT learns from unlabeled PD samples and improves the robustness during training. The approach is validated using test and on-site data. The results show that the approach significantly reduces the demand for labeled samples and that its PD recognition rates have increased by 6.14% to 14.72% compared with traditional approaches, which helps to reduce the time and labor costs of manually labeling on-site PD faults.

1. Introduction

Partial discharge (PD) is a critical indicator of insulation degradation in power transformers and plays a significant role in the deterioration process [1,2]. Precise recognition of PD types is essential for identifying fault locations and assessing risk levels. Past research studies have employed machine learning (ML) techniques like the back-propagation neural network (BP network) [3] and support vector machine (SVM) [4] to classify PD types. However, these approaches depend on manually extracted features as inputs, which may not consistently offer the optimal representation for the selected classifier, thus impacting accuracy.
The advancement of deep learning (DL) presents innovative approaches for autonomous feature extraction and intelligent diagnosis of PD faults. Many scholars have introduced techniques like the deep belief network (DBN) [5,6], sparse auto-encoder (SAE) [7], and long short-term memory (LSTM) [8] to learn the complex mapping between PD data and their categories. Moreover, numerous studies have combined convolutional neural network (CNN) image processing with the time-frequency spectrum or statistical spectrum of PD signals, showing superior performance to traditional ML [9,10]. However, these investigations are typical of supervised learning (SL) and rely solely on labeled PD data to train DL models, which limits the effective utilization of unlabeled samples. Although unlabeled data lack fault categories, they can still provide valuable insights into the data distribution and thereby boost the training of DL models [11]. Laboratory discharge tests can generate labeled data with relative ease, but labeling on-site data often requires professional analysis or, in some cases, internal inspection by lifting transformers’ covers to guarantee precise labeling. Consequently, such on-site data are often recorded in an unlabeled format. There is an urgent need for a novel DL-based PD recognition approach that can exploit unlabeled data to reduce the reliance on labeled data.
Unsupervised learning (USL) and semi-supervised learning (SSL) can extract knowledge from unlabeled data. Unlike SL, USL operates exclusively on unlabeled data. However, the recognition capability of USL is often limited, particularly in complex situations with low signal-to-noise ratios or unclear features, which makes it difficult to apply USL to PD identification tasks. SSL is an effective approach for addressing the above challenges of SL and USL, as it leverages a small quantity of labeled data in conjunction with a substantial volume of unlabeled data to enhance the training of classifiers. The mainstream pseudo-labeling (PL) method [12] uses the model to label unlabeled samples during the training process and incorporates them into the next round of training, which helps the model learn the hidden information in unlabeled samples. Virtual adversarial training (VAT) is another SSL method, based on consistency regularization. It applies perturbations to both labeled and unlabeled data and minimizes the variation in model outputs before and after perturbation, which enhances the model’s robustness [13]. Furthermore, the original graph convolutional network (GCN), a semi-supervised approach based on PL, can simultaneously learn rich numerical and structural features from unlabeled graph signals [14,15]. Currently, the supervised form of the GCN has been applied to mechanical fault diagnosis, such as rolling bearings, and offers advantages over CNN models on small-scale datasets [16].
This study will introduce GCN and VAT to propose a semi-supervised recognition approach for PD faults. The remaining parts are organized as follows: Section 2 delves into the creation of PD graph signals, which considers both numerical information and region correlation in PRPD. Section 3 elaborates on the semi-supervised approach using GCN and VAT, along with the PD fault diagnosis procedure. Section 4 introduces the process of transformer PD experiments, including platform construction, simulation of typical faults, data collection, and other related processes. Section 5 validates the efficacy of the proposed approach through multifaceted comparisons and applies it to on-site data. Finally, Section 6 concludes this paper.

2. PD Graph Signals

The phase-resolved partial discharge (PRPD) pattern represents the distribution of discharge pulse amplitude (q) and pulse count (n) in relation to phase (φ), making it the preferred representation in PD detection. A graph signal consists of nodes and edges, with the nodes as the main subject of analysis and the edges representing the correlations between these nodes. These correlations can be represented by a matrix A, where $A_{ij}$ is the connection strength between a pair of nodes. Alternatively, the Laplacian matrix L (L = D − A) can serve as a representation, where D is the degree matrix, with $D_{ii}$ indicating the total count of edges incident to the ith node [17].
To convert a PRPD into a graph signal, this study proposes a method called the granularity window scan to obtain graph nodes. The process is depicted in Figure 1. For instance, the granularity window employs a granularity of g = 6 and a scanning step of l = 2 to scan the PRPD matrix in a top-down and left-to-right manner. At each scan, a 6 × 6 submatrix can be generated as a node, with its value serving as the node’s features. After the scan, a total of 3844 nodes are obtained and arranged spatially.
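As a concrete illustration, the scan described above can be sketched in a few lines of NumPy (function and variable names are ours, not from the paper):

```python
import numpy as np

def granularity_window_scan(prpd, g=6, l=2):
    """Scan the PRPD matrix top-down and left-to-right with a g x g
    window moving in steps of l; every window position yields one graph
    node whose features are the flattened submatrix values."""
    rows = (prpd.shape[0] - g) // l + 1
    cols = (prpd.shape[1] - g) // l + 1
    nodes = np.empty((rows * cols, g * g))
    for i in range(rows):
        for j in range(cols):
            patch = prpd[i * l:i * l + g, j * l:j * l + g]
            nodes[i * cols + j] = patch.ravel()
    return nodes, (rows, cols)
```

For a 128 × 128 PRPD matrix with g = 6 and l = 2, this yields ((128 − 6)//2 + 1)² = 62² = 3844 nodes, matching the count in the text.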
Subsequently, it is imperative to associate each node with its neighboring nodes to establish the topological connections among them. As shown in Figure 2, the first-order neighbors consist of two parts: one part takes into account adjacency in Euclidean space by selecting eight spatial adjacent nodes as neighbors, while the other part considers similarity in non-Euclidean space by including eight nodes with the highest similarity in features. The quantification of the similarity between nodes [17] is as follows:
$A_{ij} = \exp\left( -\dfrac{\| x_i - x_j \|_2^2}{2\sigma^2} \right) \qquad (1)$
where $x_i$ denotes the features of the ith node and σ is the Gaussian kernel bandwidth. To reduce computation, only the first-, second-, and third-order spatial neighbors are selected as candidate nodes.
Once the first-order adjacency relationships of all nodes are determined, the similarity between adjacent nodes determines the connection strength, yielding the complete PD graph signal. The graph signal comprises not only the numerical information of the PRPD but also the topological associations among its local regions, collectively representing the entire PRPD.
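Under the same assumptions (one feature vector per node, a candidate set drawn from the first- to third-order spatial neighbors), the similarity half of the neighbor selection might look like the following sketch (names are illustrative):

```python
import numpy as np

def gaussian_similarity(x_i, x_j, sigma=1.0):
    """Gaussian-kernel connection strength: exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x_i - x_j) ** 2) / (2.0 * sigma ** 2))

def top_k_similar(features, idx, candidates, k=8, sigma=1.0):
    """Among the candidate nodes, keep the k most similar to node idx
    (the non-Euclidean half of the first-order neighborhood)."""
    sims = np.array([gaussian_similarity(features[idx], features[c], sigma)
                     for c in candidates])
    order = np.argsort(sims)[::-1][:k]
    return [candidates[i] for i in order]
```

Identical feature vectors give a similarity of 1, and increasingly distant features decay toward 0, so the k retained candidates are the most PRPD-similar regions.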

3. Semi-Supervised Recognition Approach for PD Faults Combining GCN and VAT

This section will elaborate on the principle of GCN and VAT and then present a semi-supervised recognition approach for PD faults of power equipment. The approach aims to utilize the data distribution information from unlabeled PD samples, as depicted in Figure 3. The use of unlabeled samples occurs in two phases. In the first phase, the GCN model identifies unlabeled samples and assigns pseudo-labels to those with high confidence. These samples are added to the training data, and the GCN model is updated in the next iteration. The second phase applies maximum perturbation to both labeled and unlabeled samples using the VAT technique and updates the model to ensure minimal output variation before and after perturbation.

3.1. Principle of GCN

In this study, the GCN model comprises GCN layers and self-attention graph pooling (SAGPool) layers and takes PD graph signals as inputs. The GCN layers autonomously extract PD features, whereas the SAGPool layers discard unimportant features to emphasize critical information.

3.1.1. GCN Layers

To avoid the complexity of decomposing L in traditional graph convolution [18], the literature [19] introduced Chebyshev polynomials truncated at K terms (K ≪ S) to simplify graph convolution:
$y = \sum_{k=0}^{K} \theta'_k \, T_k(\tilde{L}) \, x \qquad (2)$
where $I_S$ is the S-order identity matrix; $\tilde{L}$ can be calculated as $\tilde{L} = L - I_S = -D^{-1/2} A D^{-1/2}$, with L here the symmetrically normalized Laplacian; $\theta' = [\theta_1, \theta_2, \ldots, \theta_K]$ represents the Chebyshev coefficients; and $T_k(\tilde{L})$ denotes the Chebyshev polynomials, which are expressed as follows:
$T_k(\tilde{L}) = 2 \tilde{L} \, T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}), \quad T_0(\tilde{L}) = I_S, \quad T_1(\tilde{L}) = \tilde{L} \qquad (3)$
The Chebyshev polynomials not only reduce the number of convolution kernel parameters from S to K but also provide a new spatial-domain perspective on graph convolution. To further simplify the convolution, Kipf et al. [14] set K = 1 and θ0 = −θ1 = 1 in Equation (2):
$y = \theta_0 I_S x + \theta_1 \tilde{L} x = (I_S + D^{-1/2} A D^{-1/2}) \, x \qquad (4)$
The term $(I_S + D^{-1/2} A D^{-1/2})$ is normalized to $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ to prevent the gradient vanishing or exploding that may occur during training, and a trainable weight W is added to augment the network’s capacity for fitting data. Finally, the matrix form of the GCN layer can be expressed as follows:
$Y = \mathrm{ReLU}\left( (\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}) X W \right), \quad \tilde{A} = A + I_S, \quad \tilde{D}_{ii} = \sum_j \tilde{A}_{ij} \qquad (5)$
where $X \in \mathbb{R}^{N \times C}$ represents the node features; N and C denote the number of nodes and the feature dimension of each node, respectively; ReLU is the activation function; and $Y \in \mathbb{R}^{N \times F}$ is the output with F-dimensional features.
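A minimal dense-matrix NumPy sketch of this propagation rule (no DL framework; the renormalization builds $\tilde{A}$ and $\tilde{D}$ exactly as defined above):

```python
import numpy as np

def gcn_layer(X, A, W):
    """One GCN layer: Y = ReLU(D~^{-1/2} A~ D~^{-1/2} X W), with the
    renormalization trick A~ = A + I and D~_ii = sum_j A~_ij."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ X @ W, 0.0)  # ReLU
```

Stacking such layers aggregates features from progressively larger PRPD neighborhoods; a real implementation would use sparse matrices for the 3844-node graphs.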

3.1.2. SAGPool Layer

Through pooling, the GCN layer can integrate local detailed features to construct a larger perspective representation.
The principle of SAGPool is as follows: a GCN layer is utilized to compute an importance score z for each node in the graph signal, and the scores are arranged in descending order [20]. The top pN nodes with the highest scores are preserved as crucial nodes for the next stage, and the remaining nodes are discarded, where p ∈ (0, 1] is the pooling rate [21]. This enables the GCN to concentrate on crucial information within the PRPD. Assuming the vector m indicates the retention status of all nodes, the updates of the variables X and A are given in Equation (6).
$X' = X_{m,:} \odot \tanh(z_m), \quad A' = A_{mm} \qquad (6)$
where $X_{m,:}$ represents the crucial nodes determined by m; $\tanh(z_m)$ serves as the attention mask for the retained nodes; and $A_{mm}$ denotes the update of matrix A restricted to the retained nodes.
The updated graph signals can serve as inputs for the next stages or can be globally aggregated through global average and maximum operations to generate the final outputs. For more details, refer to the reference [21].
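The scoring-and-slicing step can be sketched as follows (assuming the attention scores z have already been produced by an auxiliary GCN layer; names are ours):

```python
import numpy as np

def sag_pool(X, A, z, p=0.5):
    """Keep the top ceil(p*N) nodes by attention score z, gate their
    features with tanh(z), and restrict A to the retained nodes."""
    k = max(1, int(np.ceil(p * X.shape[0])))
    m = np.argsort(z)[::-1][:k]                # indices of retained nodes
    X_out = X[m, :] * np.tanh(z[m])[:, None]   # X_{m,:} gated by tanh(z_m)
    A_out = A[np.ix_(m, m)]                    # A_mm
    return X_out, A_out, m
```

The gating by tanh(z) keeps the scores in the computation graph, which is what lets the scoring layer itself be trained end-to-end.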

3.2. Semi-Supervised Training Based on VAT

3.2.1. Overview of Semi-Supervised Training

As shown in Figure 3, $x_l$ and $x_{ul}$ represent the labeled and unlabeled PD samples, respectively. The semi-supervised approach initially employs $x_l$ to train the GCN model. After each iteration, the unlabeled $x_{ul}$ are tested, and those with a confidence level b > 0.7 ($\bar{x}_{ul}$) are assigned pseudo-labels $\bar{y}_{ul}$, which dynamically enlarge the training set for the next iteration. During training, VAT seeks the maximum-perturbation boundary for each sample to enhance the robustness of the GCN. Its principle is introduced below.
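The confidence filter in the first phase is straightforward; a sketch, assuming softmax class probabilities as input and the 0.7 threshold from the text:

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.7):
    """Return indices and pseudo-labels of unlabeled samples whose
    maximum class probability exceeds the confidence threshold."""
    conf = probs.max(axis=1)
    keep = np.nonzero(conf > threshold)[0]
    return keep, probs.argmax(axis=1)[keep]
```

The selected samples and their pseudo-labels are appended to the training set before the next iteration; low-confidence samples simply wait for a later round.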

3.2.2. VAT and Loss Functions

Adding small perturbations to samples can cause DL models to produce incorrect outputs with high confidence [22]. These perturbed samples are called adversarial examples. VAT introduces adversarial examples and identifies the most sensitive perturbation direction of DL models [13]. By minimizing the output differences before and after perturbation, the model strengthens decision boundaries and improves its robustness.
Assuming $x^*$ includes both $x_l$ and $x_{ul}$, VAT can be described as follows:
$L_{DS}(x^*, W) = D_{\mathrm{KL}}\left( p(y \mid x^*, \hat{W}) \parallel p(y \mid x^* + r_{\mathrm{vadv}}, W) \right) \qquad (7)$
where $D_{\mathrm{KL}}$, $\hat{W}$, and $r_{\mathrm{vadv}}$ represent the Kullback–Leibler (KL) divergence, the model parameters of the current state, and the adversarial perturbation, respectively.
The core of VAT lies in searching for the optimal direction and magnitude of $r_{\mathrm{vadv}}$. If the search boundary is defined as $\| r \|_2 < \varepsilon$, the search process can be expressed as follows:
$r_{\mathrm{vadv}} := \arg\max_{\| r \|_2 < \varepsilon} D_{\mathrm{KL}}\left( p(y \mid x^*, \hat{W}) \parallel p(y \mid x^* + r, \hat{W}) \right) \qquad (8)$
This equation indicates that solving for $r_{\mathrm{vadv}}$ is essentially a search for the perturbation that maximizes the KL divergence. Hence, the gradient ascent direction is the optimal perturbation direction, as shown in Equation (9).
$\xi = \nabla_r \, D_{\mathrm{KL}}\left( p(y \mid x^*, \hat{W}) \parallel p(y \mid x^* + r, \hat{W}) \right) \qquad (9)$
Further normalizing ξ can lead to the optimal perturbation amplitude, as shown in Equation (10).
$r_{\mathrm{vadv}} = \varepsilon \dfrac{\xi}{\| \xi \|_2} \qquad (10)$
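In practice, the search of Equations (8)–(10) is commonly approximated by one step of power iteration on the local curvature of the KL divergence. A framework-free sketch using finite-difference gradients (adequate for small toy inputs; real implementations differentiate through the network instead):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete probability distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def vat_direction(predict, x, epsilon=0.1, xi=1e-2, h=1e-6, seed=0):
    """One power-iteration step for r_vadv: probe the KL divergence at a
    small random perturbation xi*d, take its gradient with respect to
    the perturbation (finite differences here), then normalize and
    scale to the search boundary epsilon, as in Eq. (10)."""
    rng = np.random.default_rng(seed)
    p0 = predict(x)
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)
    g = np.zeros_like(x)
    base = kl(p0, predict(x + xi * d))
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        g.flat[i] = (kl(p0, predict(x + xi * d + e)) - base) / h
    return epsilon * g / (np.linalg.norm(g) + 1e-12)
```

The random start is needed because the KL divergence has zero gradient exactly at r = 0; probing at ξ·d exposes the most sensitive curvature direction.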
The total loss function of GCN can be formulated as follows:
$J = J_{\mathrm{Class}} + \gamma \cdot \mathrm{Mean}\left( L_{DS}(x^*, W) \right) \qquad (11)$
$J_{\mathrm{Class}} = \mathrm{CE}(x_l, y_l) + \alpha(acc_{\mathrm{val}}) \cdot \mathrm{CE}(\bar{x}_{ul}, \bar{y}_{ul}), \quad \text{where } \alpha(acc_{\mathrm{val}}) = 0.5 \, acc_{\mathrm{val}} \qquad (12)$
where Mean() and CE() denote the mean function and cross-entropy function, respectively; γ is a penalty coefficient; and α(accval) reflects the attention to pseudo-labeled samples in training, where accval is the recognition rate of the validation set.
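Assembled, the total objective might be computed as follows (a sketch; the class probabilities and per-sample VAT losses are assumed to be precomputed, and the names are ours):

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy of predicted class probabilities vs. labels."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def total_loss(p_l, y_l, p_ul, y_ul, lds, acc_val, gamma=1.0):
    """J = J_Class + gamma * Mean(L_DS), with the pseudo-label term
    weighted by alpha(acc_val) = 0.5 * acc_val."""
    alpha = 0.5 * acc_val
    j_class = cross_entropy(p_l, y_l) + alpha * cross_entropy(p_ul, y_ul)
    return j_class + gamma * float(np.mean(lds))
```

Making the pseudo-label weight grow with the validation accuracy means unreliable early-iteration pseudo-labels contribute little, while later, more trustworthy ones contribute up to half the weight of true labels.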

3.3. Algorithm Flow

The flow of the semi-supervised PD recognition method is illustrated in Figure 4. The steps are as follows.
Step One: Data preprocessing, involving compressing the PRPD into a 128 × 128 two-dimensional matrix.
Step Two: Construction of a PD graph signal. This employs a granularity window to scan the PRPD matrix, thereby extracting the PD graph signals along with their corresponding graph nodes and topological structure. These graph signals are categorized into labeled and unlabeled samples, with the labeled subset further divided into training, validation, and test sets.
Step Three: Building and training the GCN model.
Step Four: Testing the model, involving utilizing the trained GCN to identify PD faults in the test set and outputting the types of PD faults.

4. Transformer PD Experiment

4.1. Data Collection

A PD test platform is established following the standard IEC 60270:2000 [23], as shown in Figure 5. The experiment uses a high-frequency current transformer (HFCT) with a detection band of 1–35 MHz to detect PD signals. The signals are recorded on an oscilloscope with a bandwidth of 1 to 10 MHz and a sampling rate of 20 MS/s.
Four fault models have been developed to simulate PD faults in power transformers: point discharge, surface discharge, gap discharge, and floating discharge. These models, illustrated in Figure 6, are constructed using various electrodes: a needle-plate electrode, a ball-plate electrode with insulating paper (to simulate discharge along the insulating surface), a plate-to-plate electrode with an air gap in the middle (to simulate bubble discharge in oil), and another plate-to-plate electrode with a suspended copper block (to simulate floating-potential discharge). Each model is submerged in mineral oil during the experiment.

4.2. PRPD and Preprocessing

A PRPD spectrum is produced by collecting time-domain signals in 2 s intervals (100 power-frequency cycles). The initial step converts the moments of PD pulses into phases ranging from 0° to 360°. To optimize the computational efficiency of the GCN, the discharge phase and magnitude (0~qmax) are evenly divided into 128 intervals each, resulting in a 128 × 128 matrix. The values in the matrix are the counts of PD pulses within each phase and magnitude interval, and they undergo linear ‘max-min’ normalization. The preprocessing is diagrammed in Figure 7.
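The binning and normalization can be sketched with `numpy.histogram2d` (pulse phases in degrees and amplitudes are assumed as inputs; names are ours):

```python
import numpy as np

def prpd_matrix(phases_deg, amplitudes, bins=128):
    """Count PD pulses in a bins x bins grid (phase 0-360 degrees vs.
    amplitude 0-q_max), then apply max-min linear normalization."""
    H, _, _ = np.histogram2d(phases_deg, amplitudes, bins=bins,
                             range=[[0.0, 360.0], [0.0, amplitudes.max()]])
    lo, hi = H.min(), H.max()
    return (H - lo) / (hi - lo) if hi > lo else H
```

The resulting 128 × 128 matrix is exactly the input the granularity window scan of Section 2 expects.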
Following preprocessing, four types of PRPD shown in Figure 8 are obtained, consisting of 400, 110, 270, and 310 samples. These PRPDs undergo enhancement through a generative adversarial network (GAN) to address class imbalance, which achieves 400 samples per class. This approach has been validated by several research teams as effective in addressing class imbalance [24,25]. Furthermore, the PD graph sample set, which consists of 1000 labeled samples, has been divided into training, validation, and test sets, containing 420, 100, and 480 samples, respectively. Moreover, 600 samples with hidden labels have been allocated for inclusion in the semi-supervised training.

5. Discussion

5.1. GCN Structure and Graph Signal Parameters

An increased number of GCN layers can result in feature over-smoothing and reduced computational efficiency. Hence, it is advisable to restrict GCN depth to four or fewer. To test the impact of varying GCN depths, a four-layer GCN model is trained, and the outputs of each layer are visualized in Figure 9.
The remaining parameters, such as the graph signal parameters g and l, the number of iterations, the learning rate, the size of the GCN hidden layer, and the weight decay coefficient, are determined by searching within a certain range. The search ranges and the selected parameters (highlighted in bold) are presented in Table 1.

5.2. Effectiveness Analysis of the Semi-Supervised PD Recognition

This section aims to validate the effectiveness of the proposed approach in using unlabeled PD samples by means of iterative curves and ablation experiments. The experimental setup includes a control group consisting of a supervised model with a limited number of samples (LS model), a supervised model with all available samples (AS model), and a semi-supervised model with pseudo-labels (PL model), as outlined in Table 2.
Figure 10 displays the iteration curves. The LS model demonstrates a PD recognition rate of approximately 90%, while the AS model achieves around 96%. Through successive iterations, the proposed model consistently converges towards the recognition rate of the AS model. These findings indicate that the proposed model can extract features from unlabeled PD samples, thereby outperforming the LS model, which disregards unlabeled samples.
We maintained the number of labeled samples while varying the number of unlabeled samples to 100, 300, and 500 in order to further compare the proposed approach with the AS model. This means the AS model incorporates the labels of unlabeled samples into the supervised training process. The outcomes are shown in Figure 11. As the number of unlabeled PD samples increases, the average error rates for the four types of PD faults gradually decrease to 9.17%, 7.08%, and 5.83%. This trend exhibits a resemblance to the performance of the AS model when utilizing all available samples and their labels.

5.3. Ablation Experiment

The control groups of the ablation experiment consist of the LS model and the PL model. The experiment utilizes 600 unlabeled samples, 100 labeled validation samples, and 480 test samples. These samples are used to create a new training set by randomly adding 10%, 20%, 30%, 50%, 70%, and 100% of the labeled training data. The recognition rates of these approaches are depicted in Figure 12, where the line chart shows the PD recognition rate, and the bar chart demonstrates the improvement compared to the LS model.
The results presented in Figure 12 illustrate a progressive increase in the PD recognition rate as the number of labeled samples grows. In comparison to the LS model, the PL model and the proposed model exhibit improvements ranging from 3.33% to 5.42% and from 5.83% to 10.83%, respectively. Moreover, the proposed VAT-based approach enhances semi-supervised performance by more than 2% compared with the PL model. This is because VAT can identify the most sensitive perturbation direction of PD samples and generate adversarial examples that are dynamically added to the next iteration, thus decreasing the need for a large amount of labeled training data.

5.4. Comparison with Traditional PD Recognition Approaches

The comparative approaches are as follows: (1) The ‘Statistical features + SVM’ includes 16 statistics of the PRPD, such as discharge amplitude, skewness, and steepness [26]. The kernel parameters of the SVM are determined by grid search. (2) The ‘Image moment features + BP network’ extracts grayscale image moments as PRPD features and employs a three-layer BP network with 10 neurons in the hidden layer and 100 iterations. (3) The ‘PRPD + CNN’ directly feeds the PRPD matrix into a CNN that comprises two stacks of a ‘convolution layer + pooling layer’. (4) The ‘PD graph signal + GCN’ is a supervised approach that, compared to the proposed model, uses neither unlabeled samples nor VAT. The comparative results are detailed in Table 3 (100% of labeled training samples) and Table 4 (50% of labeled training samples). In these tables, the metric ‘Average’ denotes the average recall rate of the four PD types, where the recall rate ‘Rate’ is the proportion of accurately identified samples relative to the total number of samples within the class.
The initial two approaches show the lowest recognition rates due to the separation between manual feature extraction and classifier training. The average rate of ‘PRPD + CNN’ exceeds that of the first approach by approximately 4.28%, mainly owing to the self-learning capacity of the CNN. However, the CNN inherently relies on large amounts of data, resulting in a notable decline in recognition accuracy as data decrease. In contrast, the average rate of ‘PD graph signal + GCN’ exceeds that of the CNN approach by 3.37%, as shown in Table 4. The proposed approach further introduces unlabeled samples and VAT, which yields an improvement of more than 6% over ‘PD graph signal + GCN’. Even with a 50% decrease in labeled samples, the average rate still reaches 93.34%. This suggests that the proposed approach reduces the reliance of DL models on PD labels, which may play a crucial role in reducing the time and labor costs associated with annotating fault data for field maintenance staff.

5.5. Field Case Analysis

This part selects on-site PD data of power equipment for further validation. The primary data sources include (1) offline detection data from operating power transformers and oil-immersed reactors and (2) factory test data of power transformers. The equipment voltage levels range from 35 kV to 220 kV, with additional 1000 kV equipment. Typically, on-site PD detection stores data in the form of PRPD images, so the images must be aligned with the input dimensions of the GCN before recognition. The main process is illustrated in Figure 13. First, the foreground information of these PRPD images is obtained through image segmentation, gradually removing redundant elements such as grid lines, power-frequency sinusoidal waves, and coordinate axes. Finally, each image is converted into a 128 × 128 PRPD matrix.
The proposed approach successfully recognizes on-site PD data, as shown in Table 5. For surface discharge, internal inspections found the cause to be discharge on insulating paper surfaces; the approach accurately identifies all nine samples of this discharge type. Gap discharge, primarily originating from power transformer bulges and bubbles on insulating cardboard surfaces, was identified in 13 out of 15 samples. Floating discharge, linked to loosely fixed nuts on transformers or reactors, was diagnosed in over 70% of the 23 samples. Moreover, this approach outperforms the mainstream CNN. The above analysis indicates that the approach is effective for on-site data with complex sources and can offer a valuable reference for the maintenance and operation of field equipment.

6. Conclusions

To effectively utilize unlabeled PD samples for DL models, this paper proposes a semi-supervised recognition approach for PD faults. The proposed approach is evaluated against conventional approaches and applied to the analysis of field cases. The main works and conclusions are as follows.
  • A PD graph signal and the GCN are introduced for PD recognition. This approach autonomously learns the node information and topological associations of the PRPD and diagnoses fault types, demonstrating clear advantages over the CNN-based approach, particularly in scenarios with limited sample sizes.
  • A semi-supervised framework is constructed by further integrating VAT into GCN. The findings from ablation experiments indicate that this framework effectively improves the PD recognition rate by 5.83% to 10.83% with the support of unlabeled samples.
  • Compared to traditional approaches based on SVM, BPNN, and CNN, the proposed approach attains enhancements of 12.12%, 14.72%, and 6.14%, respectively, in recognition rate when dealing with limited labeled training data. This advancement reduces the reliance on the PD category information during the training process of common DL models.
  • For on-site detection data, the proposed approach can still obtain superior PD recognition performance and help to reduce the time and labor costs of manually labeling on-site PD faults.

Author Contributions

Conceptualization, Y.Z. (Yi Zhang) and M.Z.; methodology, Y.Z. (Yi Zhang) and Y.Y.; software, Y.Z. (Yi Zhang) and Y.Y.; validation, Y.Z. (Yi Zhang), Y.Z. (Yingying Zhang), and Z.L.; writing—original draft preparation, Y.Z. (Yi Zhang); writing—review and editing, M.Z. and Z.L.; funding acquisition, Y.Z. (Yingying Zhang). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of State Grid Corporation of China, grant number 5200-202456149A-1-1-ZN.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Yi Zhang, Yang Yu, Yingying Zhang and Zehuan Liu were employed by State Grid Economic and Technological Research Institute Co., Ltd. Author Mingjia Zhang was employed by State Grid Zhejiang Electric Power Co., Ltd., Construction Company. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Liu, F.; Du, J.; Shi, Y.; Zhang, S.; Wang, W.; Xiao, M.A. Localization of Dual Partial Discharge in Transformer Windings Using Fabry–Pérot Optical Fiber Sensor Array. Energies 2024, 17, 2537. [Google Scholar] [CrossRef]
  2. Sekatane, P.M.; Bokoro, P. Time Reversal vs. Integration of Time Reversal with Convolution Neural Network in Diagnosing Partial Discharge in Power Transformer. Energies 2023, 16, 7872. [Google Scholar] [CrossRef]
  3. Candela, R.; Mirelli, G.; Schifani, R. PD Recognition by Means of Statistical and Fractal Parameters and A Neural Network. IEEE Trans. Dielectr. Electr. Insul. 2000, 7, 87–94. [Google Scholar] [CrossRef]
  4. Mas’ud, A.A.; Albarracín, R.; Ardila-Rey, J.A.; Muhammad-Sukki, F.; Illias, H.A.; Bani, N.A.; Munir, A.B. Artificial Neural Network Application for Partial Discharge Recognition: Survey and Future Directions. Energies 2016, 9, 574. [Google Scholar] [CrossRef]
  5. Karimi, M.; Majidi, M.; Mirsaeedi, H.; Arefi, M.M.; Oskuoee, M. A Novel Application of Deep Belief Networks in Learning Partial Discharge Patterns for Classifying Corona, Surface, and Internal Discharges. IEEE Trans. Ind. Electron. 2019, 67, 3277–3287. [Google Scholar] [CrossRef]
  6. Dai, J.; Song, H.; Sheng, G.; Jiang, X. Dissolved Gas Analysis of Insulating Oil for Power Transformer Fault Diagnosis with Deep Belief Network. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 2828–2835. [Google Scholar] [CrossRef]
  7. Duan, L.; Hu, J.; Zhao, G.; Chen, K.; He, J.; Wang, S.X. Identification of Partial Discharge Defects Based on Deep Learning Method. IEEE Trans. Power Deliv. 2019, 34, 1557–1568. [Google Scholar] [CrossRef]
  8. Balouji, E.; Hammarström, T.; McKelvey, T. Classification of Partial Discharges Originating from Multilevel PWM Using Machine Learning. IEEE Trans. Dielectr. Electr. Insul. 2022, 29, 287–294. [Google Scholar] [CrossRef]
  9. Peng, X.; Yang, F.; Wang, G.; Wu, Y.; Li, L.; Li, Z.; Bhatti, A.A.; Zhou, C.; Hepburn, D.M.; Reid, A.J. A Convolutional Neural Network-Based Deep Learning Methodology for Recognition of Partial Discharge Patterns from High-Voltage Cables. IEEE Trans. Power Deliv. 2019, 34, 1460–1469. [Google Scholar] [CrossRef]
  10. Song, H.; Dai, J.; Sheng, G.; Jiang, X. GIS Partial Discharge Pattern Recognition via Deep Convolutional Neural Network Under Complex Data Source. IEEE Trans. Dielectr. Electr. Insul. 2018, 25, 678–685. [Google Scholar] [CrossRef]
  11. Ziraki, N.; Bosaghzadeh, A.; Dornaika, F. Semi-supervised Learning for Multi-View Data Classification and Visualization. Information 2024, 15, 421. [Google Scholar] [CrossRef]
  12. Zhang, J.; You, S.; Liu, A.; Xie, L.; Huang, C.; Han, X.; Li, P.; Wu, Y.; Deng, J. Winter Wheat Mapping Method Based on Pseudo-Labels and U-Net Model for Training Sample Shortage. Remote Sens. 2024, 16, 2553. [Google Scholar] [CrossRef]
  13. Miyato, T.; Maeda, S.-i.; Koyama, M.; Ishii, S. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1979–1993. [Google Scholar] [CrossRef] [PubMed]
  14. Kipf, T.N.; Welling, M. Semi-supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar] [CrossRef]
  15. Jiménez-Aparicio, M.; Hernández-Alvidrez, J.; Montoya, A.Y.; Reno, M.J. Embedded, Real-time, and Distributed Traveling Wave Fault Location Method Using Graph Convolutional Neural Networks. Energies 2022, 15, 7785. [Google Scholar] [CrossRef]
  16. Zhang, D.; Stewart, E.; Entezami, M.; Roberts, C.; Yu, D. Intelligent Acoustic-based Fault Diagnosis of Roller Bearings Using a Deep Graph Convolutional Network. Measurement 2020, 156, 107585. [Google Scholar] [CrossRef]
  17. Manoj, B.; Chakraborty, A.; Singh, R. Complex Networks: A Networking and Signal Processing Perspective, 1st ed.; Mechanical Industry Press: Beijing, China, 2018; pp. 10–38. [Google Scholar]
  18. Shuman, D.I.; Narang, S.K.; Frossard, P.; Ortega, A.; Vandergheynst, P. The Emerging Field of Signal Processing on Graphs: Extending High-dimensional Data Analysis to Networks and Other Irregular Domains. IEEE Signal Process. Mag. 2013, 30, 83–98. [Google Scholar] [CrossRef]
  19. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, In Proceedings of Conference and Workshop on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016.
  20. Zhang, S.; Wang, J.; Yu, S.; Wang, R.; Han, J.; Zhao, S.; Liu, T.; Lv, J. An Explainable Deep Learning Framework for Characterizing and Interpreting Human Brain States. Med. Image Anal. 2023, 83, 102665. [Google Scholar] [CrossRef]
  21. Lee, J.; Lee, I.; Kang, J. Self-attention graph pooling. Statistics 2019, 3, 3–10. [Google Scholar]
  22. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing Properties of Neural Networks. In Proceedings of the International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
  23. IEC 60270; Partial Discharge Measurement. IEC: Geneva, Switzerland, 2000.
  24. Wang, Y.; Yan, J.; Yang, Z.; Jing, Q.; Wang, J.; Geng, Y.J.H.V. GAN and CNN for imbalanced partial discharge pattern recognition in GIS. High Volt. 2022, 7, 452–460. [Google Scholar] [CrossRef]
  25. Barrios, S.; Buldain, D.; Comech, M.P.; Gilbert, I.; Orue, I. Partial Discharge Classification Using Deep Learning Methods—Survey of Recent Progress. Energies 2019, 12, 2485. [Google Scholar] [CrossRef]
  26. Liang, H.; Ju, T.; Chao, L.; Zhang, X.X. Pattern Recognition for Partial Discharge Based on Multi-feature Fusion Technology. High Volt. Eng. 2015, 41, 947–955. [Google Scholar] [CrossRef]
Figure 1. The granularity window scan and nodes.
Figure 2. The topological connections among nodes.
Figure 3. The diagram of the semi-supervised PD recognition approach combining GCN and VAT.
Figure 4. Flowchart of the proposed semi-supervised PD recognition approach.
Figure 5. PD test platform.
Figure 6. Discharge fault models.
Figure 7. PD data preprocessing.
Figure 8. Four types of PRPD.
Figure 9. Feature visualization results for varying GCN layers.
Figure 10. Comparison of iteration curves with supervised methods.
Figure 11. The results with varying numbers of unlabeled samples.
Figure 12. The results of the ablation experiment.
Figure 13. Preprocessing of on-site PRPD images.
Table 1. The parameters for GCN and graph signals.

Hyperparameter | Range
g-l | 3-1, 6-2, 12-4
Number of Iterations | 50, 100, 200
Learning Rate | 1 × 10−2, 1 × 10−3, 1 × 10−4
Hidden Layer Size | 32, 64, 128
Weight Decay Coefficient | 1 × 10−3, 1 × 10−4, 5 × 10−4
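The GCN at the core of the approach propagates node features through the normalized adjacency of the PD graph signal, in the style of Kipf and Welling. The sketch below is a minimal, illustrative single layer only: the 5-node toy graph, the 8 input features, and the random weights are assumptions for demonstration, not the paper's PD graph signals; only the hidden-layer size 32 is taken from Table 1.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d = A_hat.sum(axis=1)                      # node degrees of A + I
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D^-1/2
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    return np.maximum(0.0, A_norm @ H @ W)     # ReLU activation

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0, 1],                 # toy 5-node ring graph
              [1, 0, 1, 0, 0],                 # (illustrative, not a PRPD graph)
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)
H = rng.standard_normal((5, 8))                # 8 assumed input features per node
W = rng.standard_normal((8, 32))               # hidden layer size 32 (Table 1)
out = gcn_layer(A, H, W)
print(out.shape)                               # (5, 32)
```

Stacking such layers (e.g., the 3, 6, or 12 layers implied by Table 1's g-l settings) lets each node aggregate information from progressively larger graph neighborhoods.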
Table 2. Description of control groups and the proposed approach.

Approach | Labeled samples (x_l / labels) | Unlabeled samples (x_ul / labels) | Pseudo-labels | VAT
LS model | | | |
AS model | | | |
PL model | | | |
Proposed model | | | |
Table 3. The comparative results with 100% of labeled training samples.

PD Type | Statistical Features + SVM (%) | Image Moment Features + BP (%) | PRPD Matrix + CNN (%) | Graph Signal + GCN (%) | Proposed Approach (%)
Tip | 83.46 | 81.10 | 89.76 | 90.55 | 96.06
Surface | 84.68 | 79.03 | 87.10 | 87.90 | 94.35
Gap | 81.25 | 82.14 | 87.50 | 88.39 | 94.64
Floating | 88.03 | 85.47 | 90.60 | 91.45 | 98.29
Average | 84.36 | 81.94 | 88.74 | 89.57 | 95.84
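The Average row in Table 3 is the simple mean of the four per-type recognition rates for each method (reported to two decimals, apparently rounded half up). A quick arithmetic check using the table's own values:

```python
# Recognition rates from Table 3 (Tip, Surface, Gap, Floating), in percent
rates = {
    "Statistical features + SVM": [83.46, 84.68, 81.25, 88.03],
    "Image moment features + BP": [81.10, 79.03, 82.14, 85.47],
    "PRPD matrix + CNN": [89.76, 87.10, 87.50, 90.60],
    "Graph signal + GCN": [90.55, 87.90, 88.39, 91.45],
    "Proposed approach": [96.06, 94.35, 94.64, 98.29],
}
reported = {  # the Average row of Table 3
    "Statistical features + SVM": 84.36,
    "Image moment features + BP": 81.94,
    "PRPD matrix + CNN": 88.74,
    "Graph signal + GCN": 89.57,
    "Proposed approach": 95.84,
}
for name, r in rates.items():
    mean = sum(r) / len(r)
    # reported averages are rounded to two decimals, so allow half-ULP slack
    assert abs(mean - reported[name]) <= 0.0051, name
    print(f"{name}: {mean:.4f}")
```

The same check applies to Table 4, whose averages are consistent with its per-type rates as well.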
Table 4. The comparative results with 50% of labeled training samples.

PD Type | Statistical Features + SVM (%) | Image Moment Features + BP (%) | PRPD Matrix + CNN (%) | Graph Signal + GCN (%) | Proposed Approach (%)
Tip | 78.74 | 76.38 | 85.83 | 89.76 | 92.91
Surface | 83.06 | 76.61 | 81.89 | 86.29 | 92.74
Gap | 75.89 | 78.57 | 82.14 | 82.14 | 91.96
Floating | 87.18 | 82.91 | 85.47 | 90.60 | 95.73
Average | 81.22 | 78.62 | 83.83 | 87.20 | 93.34
Table 5. Recognition results of on-site PD data.

PD Type | Internal Inspection | Count | Approach | Point | Surface | Gas | Floating
Surface | Creepage discharge along the insulating paper | 9 | Proposed | 0 | 9 | 0 | 0
 | | | CNN | 2 | 5 | 1 | 1
Gap | Bulges and bubbles on insulating cardboard | 15 | Proposed | 0 | 2 | 13 | 0
 | | | CNN | 0 | 3 | 12 | 0
Floating | Loose nuts on transformers or reactors | 23 | Proposed | 0 | 0 | 6 | 17
 | | | CNN | 1 | 0 | 8 | 14
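Each block of Table 5 is effectively one row of an on-site confusion matrix: the count falling in the true class's column, divided by the sample count, gives the per-type recognition rate. A small check with the table's numbers (the overall rates are computed here for illustration and are not reported in the table itself):

```python
# Per-type on-site counts from Table 5: (true type, samples, correctly
# recognized by the proposed approach, correctly recognized by CNN)
results = [
    ("Surface", 9, 9, 5),
    ("Gap", 15, 13, 12),
    ("Floating", 23, 17, 14),
]
for name, n, prop_ok, cnn_ok in results:
    print(f"{name}: Proposed {prop_ok / n:.1%} vs CNN {cnn_ok / n:.1%}")

total = sum(n for _, n, _, _ in results)
prop_total = sum(p for _, _, p, _ in results)
cnn_total = sum(c for _, _, _, c in results)
print(f"Overall: Proposed {prop_total / total:.1%} vs CNN {cnn_total / total:.1%}")
```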

Share and Cite

MDPI and ACS Style

Zhang, Y.; Yu, Y.; Zhang, Y.; Liu, Z.; Zhang, M. A Semi-Supervised Approach for Partial Discharge Recognition Combining Graph Convolutional Network and Virtual Adversarial Training. Energies 2024, 17, 4574. https://doi.org/10.3390/en17184574
