A Study on Partial Discharge Fault Identification in GIS Based on Swin Transformer-AFPN-LSTM Architecture

Li, Jiawei; Ma, Shangang; Jin, Fubao; Zhao, Ruiting; Zhang, Qiang; Xie, Jiawen

doi:10.3390/info16020110

by Jiawei Li, Shangang Ma ^*, Fubao Jin, Ruiting Zhao, Qiang Zhang and Jiawen Xie

School of Energy and Electrical Engineering, Qinghai University, Xining 810016, China

^* Author to whom correspondence should be addressed.

Information 2025, 16(2), 110; https://doi.org/10.3390/info16020110

Submission received: 8 January 2025 / Revised: 28 January 2025 / Accepted: 31 January 2025 / Published: 6 February 2025

(This article belongs to the Special Issue Emerging Research on Neural Networks and Anomaly Detection)


Abstract:

Aiming at the problem of manual feature extraction and insufficient mining of feature information for partial discharge pattern recognition under different insulation faults in GIS, a deep learning model based on phase and timing features with Swin Transformer-AFPN-LSTM architecture is proposed. Firstly, a GIS insulation fault simulation experimental platform is constructed, and the PRPD phase data and TRPD timing data under different faults are obtained; secondly, the TRPD timing data are converted into MTF; then the PRPD phase data and MTF timing data are input into the Swin Transformer-AFPN-LSTM model and other deep learning models for performance comparison. The experimental results show that the Swin Transformer-AFPN-LSTM model improves the performance by 14.09–21.23% compared with the traditional CNN model and LSTM model. Moreover, using this model to extract phase features and timing features simultaneously improves the accuracy by 10.67% and 8.66%, respectively, compared with single feature extraction, and the overall accuracy reaches 98.82%, which provides a new idea for GIS insulation fault identification.

Keywords:

MTF; Swin Transformer-AFPN-LSTM model; multi-feature extraction; multi-feature fusion; GIS fault identification

1. Introduction

Gas-insulated metal-enclosed switchgear (GIS) is widely utilized in high-voltage substations due to its advantages of high reliability, small footprint, and strong anti-interference, showing an increasing development trend year by year. However, partial discharge (PD) will inevitably occur during the pre-assembly and later charged operation of GIS due to various insulation faults, and long-term partial discharge will lead to weakening of the insulation capacity of GIS or even an insulation breakdown, which brings great challenges to the stable operation of GIS [1]. Therefore, an intensive study of PD under different insulation faults is of great significance for assessing the internal state of GIS, discovering and eliminating various faults in time, and improving the reliability of GIS equipment and ensuring the stable operation of substations.

PD identification for different insulation faults in GIS is mainly based on the phase-based PRPD (Phase Resolved Partial Discharge) mode and the time-based TRPD (Time Resolved Partial Discharge) mode [2]; on this basis, PD features are extracted and then passed to classification algorithms that identify the type of insulation fault. Traditional PD identification relies mainly on manual selection and extraction of features. The main features include statistical features, fractal features, grayscale image features, moment features, and time-frequency domain features of the discharge waveform. A study [3] formed a feature matrix by extracting statistical parameters such as skewness, steepness, and sixth-order moments of the PRPD spectrogram. Another study [4] performed time-frequency feature extraction based on the energy distribution spectrogram obtained from STFT (short-time Fourier transform) analysis. Because different features contain different discharge information, the manual selection of PD features is often subjective and limited. Moreover, previous GIS fault identification has often relied only on the phase features contained in the PRPD mode or only on the temporal features contained in the TRPD mode. Failing to use the phase and temporal features of the PD together tends to lose feature information from the samples and limits the multi-feature, multi-scale potential of the PD data. In addition, traditional machine-learning classification algorithms, such as support vector machines, decision trees, and cluster analysis, require manual parameter tuning, which greatly increases the time cost of recognition, and their recognition accuracy needs further investigation [5,6,7].

In recent years, with the rapid development of deep learning in semantic analysis, machine vision, pattern recognition, and other fields, methods such as the CNN (convolutional neural network), DRNN (deep residual neural network), LSTM (long short-term memory neural network), and the Transformer have been applied successfully and have outperformed traditional machine-learning recognition methods, opening a new direction for GIS fault identification [8,9,10,11]. The authors of [12] studied pattern recognition of PRPS (Phase Resolved Pulse Sequence) spectrograms based on a deep CNN, and the recognition accuracy was significantly improved compared with the SVM (support vector machine) and the BPNN (back propagation neural network); however, the CNN is prone to exploding or vanishing gradients as convolutional kernels are stacked. Another study [13] used a deep residual neural network for feature extraction from raw partial discharge signals, which overcomes exploding and vanishing gradients through skip (residual) connections but to some extent ignores other feature representations of the partial discharge data. A study [14] combined LSTM extraction of temporal features with CNN extraction of image features for GIS fault detection, which compensated for the tendency of single feature extraction to miss other features, and the accuracy reached 97.79%. The authors of [15] demonstrated the effectiveness of multi-scale feature fusion by using parallel CNN branches to extract features at different scales from the fused information of the original fault signals.

Observation of the PRPD phase data and TRPD timing data of different defects shows that the characteristics of partial discharges are reflected in how the discharge magnitude is distributed over phase and over discharge moments. For example, tip discharges are concentrated mainly in the negative half cycle of the discharge phase, with a small number occurring in the positive half cycle; particle discharges are irregularly distributed over the entire phase; and the phase distributions of suspension discharges and air gap discharges are similar, but their amplitudes differ significantly. The discharge magnitude of different faults also differs at each discharge moment and discharge time interval. For a complete partial discharge caused by a GIS internal insulation fault, the discharge phase and the discharge pulse time contain different feature information, and there is an inseparable phase synchronization relationship between the two. Therefore, multi-feature extraction both enriches the feature information and lays the foundation for fusing the two kinds of features.

First proposed by Google in 2017, the Transformer model was initially used for natural language tasks and incorporates residual connections internally to overcome the exploding and vanishing gradient problems. With the demand for image-based tasks, the Vision Transformer came into existence. A study [16] successfully implemented fault recognition of bearings using the Vision Transformer. The core advantage of the Vision Transformer stems from its multi-head self-attention mechanism, which facilitates the accurate extraction of image spatial features [17]. However, the Vision Transformer computes self-attention over global image blocks and relatively neglects the detailed information of local features. To solve this problem, the Swin Transformer, proposed in 2021, focuses better on features in local regions through window-based self-attention computation, thus capturing local structures and details in images more effectively. A study [18] proposed a fault diagnosis method based on MCADCRN and the Swin Transformer, which fuses local features with global features and was applied to rolling bearing fault diagnosis. Another study [19] proposed a multi-scale lightweight Swin MLP model with adaptive channel soft thresholding to realize fault identification for power transformers.

In this paper, a GIS insulation fault classification method based on Swin Transformer-AFPN-LSTM with phase features and timing features is proposed. First, the experimental platform and experimental circuit for GIS insulation faults are built, and the PRPD data and TRPD data under each fault are collected. Second, the TRPD data are converted into MTF. Then the PRPD data and MTF data are input into the Swin Transformer-AFPN-LSTM model for feature extraction and classification. The model contains three key components: a two-way Swin Transformer feature extraction module, an AFPN feature fusion module, and an LSTM classification module. The two-way Swin Transformer feature extraction module extracts phase and timing features from the PRPD and MTF spectra, respectively; the AFPN feature fusion module performs multi-scale fusion of the phase and timing features; and the LSTM classification module classifies the fused features. In addition, the Swin Transformer-AFPN-LSTM model is compared with CNN and LSTM deep learning models in performance experiments, which demonstrate the superiority of the method and verify the advantages of multi-feature extraction and multi-scale fusion.

2. GIS Insulation Fault Experiment Design and Dataset Acquisition

2.1. GIS Typical Fault Model

For the GIS field operation fault experiment, four typical fault models were designed: a tip discharge model, a particle discharge model, a suspension discharge model, and an air gap discharge model. The physical model of each defect is shown in Figure 1. Figure 1a shows the tip discharge model, which consists of a needle electrode and a circular plate electrode and simulates tip discharge in the GIS shell. Figure 1b shows the particle discharge model, which consists of two circular plate electrodes and metal particles with a diameter of 3 mm placed between the two plate electrodes. Figure 1c shows the suspension discharge model, which consists of upper and lower circular plate electrodes, epoxy resin, and screws. The epoxy resin is 46 mm high with a radius of 15 mm, and the nut is placed in the middle of its interior, where poor contact may produce a suspended potential during discharge. Figure 1d shows the air gap discharge model, which consists of upper and lower circular plate electrodes and epoxy resin containing air bubbles. The epoxy resin is also 46 mm high with a radius of 15 mm, and its interior contains a number of air bubbles with a diameter of about 2 mm. The four fault models are encapsulated in a metal cavity that serves as the GIS experimental platform.

2.2. GIS Experimental Circuit

The equivalent experimental circuit of the GIS is shown in Figure 2. The circuit consists mainly of a boost circuit, the GIS experimental platform, and a partial discharge detection and analysis part. The boost circuit consists of a voltage regulator and a transformer and provides the power-frequency AC voltage for the whole experiment; the GIS experimental platform is used to simulate the insulation faults; and the partial discharge detection and analysis part consists of UHF (ultra-high frequency) sensors, partial discharge detectors, and portable computers, which collect and analyze the partial discharge signals from the GIS experimental platform. The overall experimental platform is shown in Figure 3. It consists mainly of two cylindrical cavities separated from each other by basin insulators, with an overall length of 3862 mm, a cavity diameter of 760 mm, and a height of 1620 mm. The fault models are introduced by manually rotating the model lifting bar in Figure 3, which acts as the fault master switch for the entire experimental platform. During the experiment, after the model lifting bar is unscrewed, each clockwise revolution of the black knob on top of the model reduces the electrode spacing by 1 mm. The spacing of the four fault models was set to 46 mm during the experiment.

2.3. Experimental Pressurization and PD Dataset Acquisition

For the experiments, the model lifting bar in Figure 3 was placed in the on state, and each fault was introduced by manually adjusting the black threaded bar. The voltage at which the partial discharge detector first detected a repeated PD signal was taken as the partial discharge inception voltage of the discharge model. At the beginning of the experiment, the voltage was gradually increased to 252 kV, 548 kV, and 753 kV and maintained at each voltage level for 10 min. PRPD and TRPD data at the different voltage levels were recorded using the partial discharge detector. The PRPD analysis mode, also called the Q–φ–N mode, characterizes the discharge pulses through the discharge magnitude, phase, and number of discharges.

In this paper, the Q–φ relationship in the PRPD mode is used to construct a PRPD dataset based on phase characteristics. Nine hundred PRPD spectra are collected in each fault case, totaling 3600 fault samples. The typical PRPD phase data for each type of fault are shown in Figure 4.

The TRPD analysis mode contains information on the temporal distribution of discharge pulses. Data were collected over 300 power-frequency cycles at each voltage level, giving 900 cycles for each fault and 3600 fault samples in total. With advances in mining time series information, techniques represented by the MTF and the GAF (Gramian angular field) have achieved great success and are well suited as image inputs for deep learning models [20,21,22]. Since the GAF conversion involves trigonometric calculations, which are computationally expensive for large-scale time series data, this paper uses the MTF to represent the discharge-versus-time information in the collected TRPD data.

3. Markovian Transfer Field Representation Based on Partial Discharge Time Series Data

3.1. Markov Transfer Field Construction Principle

3.1.1. Markov Transfer Probability Matrix

For a given one-dimensional PD time sequence $X = \{x_{1}, x_{2}, \dots, x_{t}, \dots, x_{n}\}$, $x_{t}$ ($t = 1, 2, \dots, n$) denotes the discharge magnitude at each moment. The sequence $X$ is divided into $q$ quantile intervals $Q_{i}$, and each $x_{t}$ is mapped to the quantile interval $Q_{i}$ that contains it. Calculating the transfer probabilities between the intervals in order of increasing time step $t$ according to the first-order Markov chain approach yields a $q \times q$ Markov transfer probability matrix $M$, as shown in Equation (1). The element $m_{i,j}$ of $M$ denotes the probability that a point located in the quantile interval $Q_{j}$ is followed by a point within the interval $Q_{i}$.

M = \begin{bmatrix} m_{1,1 \mid P(x_{t} \in Q_{1} \mid x_{t-1} \in Q_{1})} & m_{1,2 \mid P(x_{t} \in Q_{1} \mid x_{t-1} \in Q_{2})} & \cdots & m_{1,q \mid P(x_{t} \in Q_{1} \mid x_{t-1} \in Q_{q})} \\ m_{2,1 \mid P(x_{t} \in Q_{2} \mid x_{t-1} \in Q_{1})} & m_{2,2 \mid P(x_{t} \in Q_{2} \mid x_{t-1} \in Q_{2})} & \cdots & m_{2,q \mid P(x_{t} \in Q_{2} \mid x_{t-1} \in Q_{q})} \\ \vdots & \vdots & \ddots & \vdots \\ m_{q,1 \mid P(x_{t} \in Q_{q} \mid x_{t-1} \in Q_{1})} & m_{q,2 \mid P(x_{t} \in Q_{q} \mid x_{t-1} \in Q_{2})} & \cdots & m_{q,q \mid P(x_{t} \in Q_{q} \mid x_{t-1} \in Q_{q})} \end{bmatrix}

(1)

Since the Markov chain specifies that the transfer probability of the current state depends only on its previous moment, the Markov transfer probability matrix, although able to maintain the Markov’s temporal dynamics, ignores the dependence of the time series on the time step, and needs to be further improved.
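The construction of Equation (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `quantile_bins` and `transition_matrix` are hypothetical, the bin edges are taken as empirical quantiles of the sequence, and the indexing follows the convention of Equation (1), where rows index the current interval and columns index the previous one.

```python
import numpy as np

def quantile_bins(x, q):
    """Assign each sample to one of q quantile intervals (labels 0..q-1)."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, e.g. q=4 gives the 25%, 50% and 75% quantiles.
    edges = np.quantile(x, np.linspace(0.0, 1.0, q + 1)[1:-1])
    return np.searchsorted(edges, x, side="left")

def transition_matrix(x, q):
    """First-order matrix with M[i, j] = P(x_t in Q_i | x_{t-1} in Q_j)."""
    states = quantile_bins(x, q)
    counts = np.zeros((q, q))
    for prev, curr in zip(states[:-1], states[1:]):
        counts[curr, prev] += 1.0
    col_sums = counts.sum(axis=0, keepdims=True)
    # Columns for intervals that are never left stay all-zero.
    return np.divide(counts, col_sums, out=np.zeros_like(counts), where=col_sums > 0)

if __name__ == "__main__":
    # Toy periodic sequence standing in for a PD charge signal.
    M = transition_matrix([1, 2, 3, 4, 1, 2, 3, 4], q=4)
    print(M)
```

With this convention each non-empty column of `M` sums to 1, because every point in a given interval must be followed by a point in some interval.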

3.1.2. Markov Transfer Fields

Expanding the Markov transfer probability matrix by sorting each transfer probability in chronological order yields the MTF, thus realizing the full utilization of the time series information [23]. The specific definition of the MTF is shown in Equation (2).

M = \begin{bmatrix} m_{i,i+1 \mid P(x_{1} \in Q_{i} \mid x_{1} \in Q_{i+1})} & m_{i,i+1 \mid P(x_{1} \in Q_{i} \mid x_{2} \in Q_{i+1})} & \cdots & m_{i,i+1 \mid P(x_{1} \in Q_{i} \mid x_{n} \in Q_{i+1})} \\ m_{i,i+1 \mid P(x_{2} \in Q_{i} \mid x_{1} \in Q_{i+1})} & m_{i,i+1 \mid P(x_{2} \in Q_{i} \mid x_{2} \in Q_{i+1})} & \cdots & m_{i,i+1 \mid P(x_{2} \in Q_{i} \mid x_{n} \in Q_{i+1})} \\ \vdots & \vdots & \ddots & \vdots \\ m_{i,i+1 \mid P(x_{n} \in Q_{i} \mid x_{1} \in Q_{i+1})} & m_{i,i+1 \mid P(x_{n} \in Q_{i} \mid x_{2} \in Q_{i+1})} & \cdots & m_{i,i+1 \mid P(x_{n} \in Q_{i} \mid x_{n} \in Q_{i+1})} \end{bmatrix}

(2)

In the MTF, $Q_{i}$ and $Q_{i+1}$ represent the quantile intervals of time steps $i$ and $i+1$, respectively, and the transfer probability from $Q_{i}$ to $Q_{i+1}$ is $m_{i,i+1}$. Each element $m_{i,i+1}$ assigns the transfer probability associated with time step $i$ to time step $i+1$, so that the MTF encodes the transfer probabilities over multiple time spans.
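As a rough illustration of Equation (2), the sketch below expands the transfer probability matrix into an $n \times n$ field by looking up, for every pair of time steps, the probability attached to the quantile intervals those two steps occupy. The function name `markov_transfer_field` is hypothetical, and the row/column convention (row for the first time step, column for the second, read against the convention of Equation (1)) is an assumed reading of the equation rather than the authors' code.

```python
import numpy as np

def markov_transfer_field(x, q):
    """Return an n x n field built from a q x q first-order transfer matrix."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0.0, 1.0, q + 1)[1:-1])
    states = np.searchsorted(edges, x, side="left")  # interval label per time step
    counts = np.zeros((q, q))
    # counts[curr, prev]: number of times interval `prev` is followed by `curr`.
    np.add.at(counts, (states[1:], states[:-1]), 1.0)
    col_sums = counts.sum(axis=0, keepdims=True)
    m = np.divide(counts, col_sums, out=np.zeros_like(counts), where=col_sums > 0)
    # Entry (k, l) holds the transfer probability between the intervals
    # occupied at time steps k and l.
    return m[states[:, None], states[None, :]]

if __name__ == "__main__":
    field = markov_transfer_field([1, 2, 3, 4, 1, 2, 3, 4], q=4)
    print(field.shape)
```

Because the field has one row and one column per time step, its size depends on the sequence length $n$, not on $q$; $q$ only controls how many distinct probability levels can appear in it.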

3.2. MTF Representation of GIS Time Series Data

Taking the tip discharge as an example, the charge signal is first segmented with 20 ms as the unit step, so that each segment contains 2880 data points; the quantile intervals are then set based on the sliding window approach. Different values of the number of quantile intervals $q$ affect the MTF image coding differently. When $q$ is small, the sequence is divided into fewer states, resulting in a smaller transfer probability matrix. This simplifies the transfer relationships between states and loses some of the detail needed to capture subtle changes in the time series. The generated MTF images therefore have lower resolution and cannot effectively mine the detailed information about state changes in the sequence. When $q$ is larger, the time series is divided into more states, and the transfer matrix is larger. This captures more detail but also introduces more noise, which obscures the main features. In addition, increasing $q$ increases the computation, making the image coding less efficient. Weighing these factors, the experiments in this paper found that $q = 4$ ensured computational efficiency while capturing the dynamically changing features of the signal. Therefore, the number of bins was set to $q = 4$ for MTF image coding, the MTF was constructed according to the Markov transfer field principle, and the MTF representation of the GIS time series data was obtained. Figure 5 shows the MTF representation under different insulation faults.
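The segmentation and binning steps described above can be sketched as follows. This is only an assumed reading of the procedure: the helper names are hypothetical, the segments are taken as non-overlapping blocks of 2880 points (one 20 ms cycle each), any trailing partial segment is dropped, and the $q = 4$ bin edges are computed per segment from its empirical quartiles.

```python
import numpy as np

def segment_signal(signal, points_per_segment=2880):
    """Cut a 1-D charge signal into non-overlapping segments of one cycle each."""
    signal = np.asarray(signal, dtype=float)
    n_segments = signal.size // points_per_segment
    # Trailing samples that do not fill a whole segment are discarded.
    trimmed = signal[: n_segments * points_per_segment]
    return trimmed.reshape(n_segments, points_per_segment)

def bin_segment(segment, q=4):
    """Label every point of one segment with its quantile interval (0..q-1)."""
    edges = np.quantile(segment, np.linspace(0.0, 1.0, q + 1)[1:-1])
    return np.searchsorted(edges, segment, side="left")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synthetic = rng.normal(size=3 * 2880 + 100)  # synthetic stand-in, not PD data
    segments = segment_signal(synthetic)
    labels = bin_segment(segments[0], q=4)
    print(segments.shape, np.bincount(labels, minlength=4))
```

Each labelled segment would then be passed to the MTF construction; with $q = 4$ the underlying transfer matrix has only 16 entries, which is where the computational saving of a small $q$ comes from.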

4. GIS Fault Identification Algorithm Based on Swin Transformer-AFPN-LSTM Model

The Swin Transformer-AFPN-LSTM model proposed in this study is shown in Figure 6. It integrates a Swin Transformer-based phase and timing feature extraction module, an AFPN feature fusion module, and an LSTM timing classification module. First, the PRPD spectrograms containing phase features and the MTF spectrograms containing time-sequence features are input into the Swin Transformer feature extraction module, which extracts and reorganizes the phase and time-sequence features using the shifted-window-based self-attention mechanism, capturing the subtle differences in the phase features and the temporal changes in the time-sequence features of the PD data. In the AFPN feature fusion module, feature maps at different scales are extracted through the AFPN structure, and the phase distribution information and temporal dynamics information are combined to realize multi-scale feature fusion. Finally, the multi-scale fused features are integrated at the fully connected layer and input into the LSTM time series classification module, which exploits its ability to capture long-term dependencies; the classifier then decodes these features and generates the predicted fault category.

4.1. Swin Transformer Module

Each Swin Transformer feature extraction branch is divided into four stages, and the numbers of Swin Transformer blocks in the stages are (2, 2, 6, 2) in order. In the first stage, the Swin Transformer phase feature extraction branch takes the PRPD feature image $X_{φ} \in R^{H \times W \times C}$, where $H$ denotes the height of the feature image, $W$ denotes its width, and $C$ denotes the number of channels. The Patch Partition layer slices the image into patches of size $P = M \times M$, and each token in the feature map is rearranged and flattened along the channel dimension, so that the phase feature map is output as $X_{φ} \in R^{H \times W \times D_{φ}}$, where $D_{φ}$ is the number of output channels of the phase features. Because the temporal features are continuous in time, and in order to maintain dimensional consistency with the phase feature map, the Swin Transformer temporal feature extraction branch concatenates the input MTF images $X_{t} \in R^{H \times W \times (C \times T)}$ over the entire discharge time dimension. This operation connects the MTFs of the initial discharge, stable discharge, and critical breakdown along the time dimension, which ensures that the temporal feature extraction branch acquires complete information. After passing through the same $M \times M$ Patch Partition layer, the timing feature map is output as $X_{t} \in R^{H \times W \times D_{t}}$, where $D_{t}$ is the number of output channels of the timing features. Subsequently, the phase feature map and the timing feature map are linearly transformed by the Linear Embedding layer, and the $P = M \times M$ patches are unfolded; $N = (H \times W) / P$ is the number of unfolded image patches. Pixels at the same position in each patch are then input into the Swin Transformer for positional encoding, which yields the encoded phase feature map, $X_{φ} \in R^{P_{φ} \times N_{φ} \times D_{φ}}$, and the encoded temporal feature map, $X_{t} \in R^{P_{t} \times N_{t} \times D_{t}}$. These are input into the subsequent second-, third-, and fourth-stage Swin Transformer blocks for feature extraction. Each pass through a Patch Merging layer reduces the size of the features from the previous stage, thus producing image features at different scales, and the AFPN structure extracts and fuses the features at these scales.
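The patch partition step can be illustrated with a short NumPy sketch. It assumes non-overlapping $M \times M$ patches and a plain row-major flattening of each patch; the function name `patch_partition` and the 256 × 256 × 3 example image are illustrative choices, and the learned Linear Embedding that follows is not modeled.

```python
import numpy as np

def patch_partition(image, m=4):
    """Split an H x W x C image into non-overlapping m x m patches, one row per patch."""
    h, w, c = image.shape
    if h % m or w % m:
        raise ValueError("image height and width must be divisible by the patch size")
    # (h/m, m, w/m, m, c) -> (h/m, w/m, m, m, c): group pixels by patch.
    patches = image.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    # N = (H * W) / (m * m) patches, each flattened to m * m * c values.
    return patches.reshape((h // m) * (w // m), m * m * c)

if __name__ == "__main__":
    image = np.zeros((256, 256, 3))
    tokens = patch_partition(image, m=4)
    print(tokens.shape)  # (4096, 48)
```

For a 256 × 256 × 3 input and $M = 4$, this gives $N = (256 \times 256) / 16 = 4096$ patches, each carrying $16 \times 3 = 48$ values before the linear embedding.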

The Swin Transformer block is the core component of the Swin Transformer feature extraction module and contains two parts: the encoder and the decoder. The encoder and decoder are coupled with each other and are composed of a normalization layer (Layer Norm, LN), window-based multi-head self-attention (W-MSA), shifted-window multi-head self-attention (SW-MSA), and a multilayer perceptron (MLP), with residual connections embedded internally [24]. The encoder module and decoder module are shown in Figure 7.

The encoder component divides the feature map into windows, linearly projects the vectors within each window into a query matrix Q, a key matrix K, and a value matrix V, and uses these matrices to perform the W-MSA calculation within each individual window, as shown in Equation (3).

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V

(3)

where $d_{k}$ is the row vector dimension of the key matrix K.
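Equation (3) can be written out directly in NumPy for a single window and a single attention head. The sketch below is illustrative only: the function names are hypothetical, the inputs are random stand-ins for the projected window tokens, and the multi-head split and relative position bias used in full Swin Transformer implementations are left out.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(q, k, v):
    """Scaled dot-product attention of Equation (3) for the tokens of one window."""
    d_k = k.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)  # (tokens, tokens)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ v, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A 4 x 4 window holds 16 tokens; 8 is an arbitrary per-head dimension.
    q, k, v = rng.normal(size=(3, 16, 8))
    out, weights = window_attention(q, k, v)
    print(out.shape, weights.sum(axis=-1))
```

Dividing by $\sqrt{d_{k}}$ keeps the scores at a scale where the softmax does not saturate as the key dimension grows.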

Since the W-MSA module computes within independent windows, GPU resources can be fully used to process multiple windows in parallel, which greatly improves the computational efficiency of the network. Moreover, as the height and width of the input feature map increase, the computational complexity of the traditional multi-head self-attention (MSA) mechanism grows quadratically with the number of tokens, whereas the window size processed by the W-MSA is fixed, so its computation grows only linearly and is greatly reduced. In the model of this paper, the encoder window size is set to P = 4 × 4. This size takes scale information of different sizes into account, helps distribute the subsequent window information evenly, and also reduces the amount of computation. The computational complexities of the MSA and the W-MSA are given in Equations (4) and (5).

Ω (MSA) = 4 h w C^{2} + 2 {(h w)}^{2} C

(4)

Ω (W - MSA) = 4 h w C^{2} + 2 M^{2} h w C

(5)

where $h$ represents the height of the feature map, $w$ is the width of the feature map, $C$ represents the depth of the feature map, and $M$ is the size of each window.
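To make the difference between Equations (4) and (5) concrete, the following sketch evaluates both for one illustrative setting. The values $h = w = 32$ and $C = 32$ are borrowed from the first-stage output size and feature dimension reported in Section 5.1, and $M = 4$ is read as the window side length in tokens; both the function names and this choice of values are assumptions made for the example.

```python
def msa_complexity(h, w, c):
    """Equation (4): global multi-head self-attention over all h * w tokens."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c

def wmsa_complexity(h, w, c, m):
    """Equation (5): self-attention restricted to m x m windows."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c

if __name__ == "__main__":
    h, w, c, m = 32, 32, 32, 4
    full = msa_complexity(h, w, c)           # 71,303,168
    windowed = wmsa_complexity(h, w, c, m)   # 5,242,880
    print(full, windowed, full / windowed)   # ratio is 13.6
```

The first term is identical in both expressions; the saving comes entirely from replacing the $(hw)^{2}$ factor with $M^{2}hw$, so the gap widens as the feature map grows while the window stays fixed.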

Since the W-MSA modules compute within their own separate windows, which greatly limits the flow of information between windows, the SW-MSA operation is performed within the decoder component. Specifically, the 4 × 4 windows of the W-MSA module in the encoder component are offset by two tokens toward the lower right and rearranged, which constructs SW-MSA windows that cover information from the original W-MSA windows. The schematic diagram is shown in Figure 8. Suppose a window is divided into 4 small windows represented by different colors. Offsetting toward the lower right corner forms 4 regions A, B, C, and D, and after these regions are rearranged, each resulting window contains information from different regions before the offset. When the SW-MSA windows are computed, the information of the individual W-MSA windows is fused, which gives the decoder component a global perceptual view and effectively enhances its feature extraction ability.
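The window offset and rearrangement can be sketched with a cyclic shift, which is one common way to realize shifted windows. The code below is a simplified illustration on a 2-D map of token labels: the function names are hypothetical, moving the window grid two tokens toward the lower right is implemented by rolling the map two tokens toward the upper left, and the attention mask that full implementations use to keep wrapped-around tokens from attending to each other is omitted.

```python
import numpy as np

def window_partition(x, m):
    """Split an h x w map into non-overlapping m x m windows."""
    h, w = x.shape
    return x.reshape(h // m, m, w // m, m).transpose(0, 2, 1, 3).reshape(-1, m, m)

def shifted_windows(x, m, shift):
    """Partition after a cyclic shift, so each new window straddles old window borders."""
    # Rolling the content up and left is equivalent to moving the
    # window grid down and right by `shift` tokens.
    rolled = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(rolled, m)

if __name__ == "__main__":
    # Label every token of an 8 x 8 map with the index of its original 4 x 4 window.
    rows = np.arange(8)[:, None] // 4
    cols = np.arange(8)[None, :] // 4
    window_ids = rows * 2 + cols
    regular = window_partition(window_ids, 4)
    shifted = shifted_windows(window_ids, 4, 2)
    print(np.unique(regular[0]), np.unique(shifted[0]))
```

In this toy example, the first regular window contains tokens from a single original window, while the first shifted window contains tokens from all four, which is the cross-window information flow the SW-MSA step is meant to provide.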

In the feature output section, both the encoder and decoder use LN to normalize the inputs of each layer to maintain the stability of the feature distribution. The MLP uses the ReLU function, which is responsible for the linear transformation and up-sampling operations on the feature maps of different resolutions processed by the W-MSA module and the SW-MSA. The residual connection is embedded in the middle of the whole structure to avoid gradient vanishing and explosion.

4.2. AFPN Module

The AFPN extracts the last layer of features from the feature layers output by each Swin Transformer block, resulting in a set of features at different scales, denoted as $\{C2, C3, C4, C5\}$. Before feature fusion is performed, the bottom features $C2$ and $C3$ are first input into the feature pyramid, after which the deep features $C4$ and the top features $C5$ are added to the network, resulting in a set of multi-scale features $\{P2, P3, P4, P5\}$. During the fusion process, the adaptive spatial feature fusion (ASFF) module within the AFPN adaptively learns the spatial fusion weights for the feature map at each scale. This module enhances the importance of key layer features and mitigates the effect of conflicting information between layers to better capture feature details [25]. The structure of the ASFF module is shown in Figure 9.

Taking a three-layer network ($N = 3$) as an example, the ASFF operation is divided into the following two main steps:

Feature transformation

Define the feature map of the current layer $L$ ($L \leq N$) as $X_{L}$; the feature map $X_{E}$ of every other layer $E$ ($E \neq L$) is transformed to the dimensions of $X_{L}$ by up-sampling or down-sampling. This sampling is performed by 1 × 1, 2 × 2, and 4 × 4 convolution operations.

Adaptive fusion

Let $x_{ij}^{E \to L}$ denote the feature vector at position $(i, j)$ transformed from feature layer $E$ to feature layer $L$, and let $Y_{ij}^{L}$ denote the fused feature vector at position $(i, j)$; the feature fusion formula for layer $L$ is then given by Equation (6). The adaptive fusion operation of Equation (6) uses a set of weight parameters, which are normalized with Softmax and then assigned to the different feature layers. Finally, the weighted features are summed to obtain the final output.

Y_{i j}^{L} = α_{i j}^{L} \cdot x_{i j}^{1 \to L} + β_{i j}^{L} \cdot x_{i j}^{2 \to L} + γ_{i j}^{L} \cdot x_{i j}^{3 \to L}

(6)

where $α_{ij}^{L}$, $β_{ij}^{L}$, and $γ_{ij}^{L}$ denote the weight parameters learned adaptively for layer 1, layer 2, and layer 3, respectively; each lies between 0 and 1, and $α_{ij}^{L} + β_{ij}^{L} + γ_{ij}^{L} = 1$.
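A minimal NumPy sketch of the fusion in Equation (6) is given below. It assumes the three feature maps have already been resampled to the resolution of layer $L$, and it takes the unnormalized weights (`logits`) as a given array, whereas in the model they would be produced by learned layers. The function name `asff_fuse` and the array shapes are illustrative assumptions.

```python
import numpy as np

def asff_fuse(features, logits):
    """Fuse three same-sized feature maps with per-position softmax weights.

    features: list of three arrays of shape (H, W, C), already resized to layer L.
    logits:   array of shape (3, H, W) holding unnormalized fusion weights.
    """
    logits = np.asarray(logits, dtype=float)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)  # alpha + beta + gamma = 1 at each (i, j)
    stacked = np.stack(features, axis=0)           # (3, H, W, C)
    fused = (weights[..., None] * stacked).sum(axis=0)
    return fused, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = [rng.normal(size=(8, 8, 16)) for _ in range(3)]
    fused, weights = asff_fuse(feats, rng.normal(size=(3, 8, 8)))
    print(fused.shape, weights.sum(axis=0).min(), weights.sum(axis=0).max())
```

The softmax over the layer axis is what enforces the constraint that the three weights lie between 0 and 1 and sum to 1 at every position; with equal logits the fusion reduces to a plain average of the three maps.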

4.3. LSTM Module

The partial discharge data under each fault may present different discharge amount cases in the same phase region at different discharge moments. Based on this, the LSTM temporal classification module is introduced. Before classification, the LSTM layer deals with features that have been fused by the AFPN, thus ensuring that the model not only absorbs the phase change information but also understands the dynamic information of the discharge state over time.

The LSTM consists of an input gate, a forget gate, an output gate, and a cell state, which together determine how the state of the model is updated [26]. $f_{t}$ is the forget gate of the LSTM cell; it determines which information should be removed from the cell state by taking the previous hidden state output $h_{t-1}$ and the current input $x_{t}$ and generating a value between 0 and 1 for each element of the cell state $C_{t}$, as calculated in Equation (7). $i_{t}$ is the input gate, which determines how much information is passed on to update the long-term memory, as calculated in Equation (8). $o_{t}$ is the output gate, which determines how much of the memory is passed on as the hidden state output, as calculated in Equation (9). $c_{t}$ is the candidate update information, calculated as in Equation (10). $C_{t-1}$ and $C_{t}$ are the cell states at the previous and current time steps, related by Equation (11). $h_{t}$ is the hidden state output, calculated as in Equation (12) [27].

f_{t} = σ (W_{f} \times [h_{t - 1}, x_{t}] + b_{f})

(7)

i_{t} = σ (W_{i} \times [h_{t - 1}, x_{t}] + b_{i})

(8)

o_{t} = σ (W_{o} \times [h_{t - 1}, x_{t}] + b_{o})

(9)

c_{t} = \tanh (W_{c} \times [h_{t - 1}, x_{t}] + b_{c})

(10)

C_{t} = f_{t} \times C_{t - 1} + i_{t} \times c_{t}

(11)

h_{t} = o_{t} \times \tanh (C_{t})

(12)

where x_{t} is the input vector at moment t; h_{t} is the hidden state output; h_{t - 1} is the hidden state output at moment t - 1; b_{f}, b_{i}, b_{c}, and b_{o} are the bias vectors of the forget gate, the input gate, the candidate cell state, and the output gate, respectively; and W_{f}, W_{i}, W_{c}, and W_{o} are the corresponding weight matrices. σ is the sigmoid activation function, and tanh is the hyperbolic tangent function.
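To make the gate equations concrete, the following is a minimal sketch of a single LSTM time step in plain Python, written directly from Equations (7) to (12) and using the previous cell state C_{t - 1} in the cell update, as the text describes. The helper names (`sigmoid`, `affine`, `lstm_step`) and the toy all-zero parameters are illustrative assumptions, not the authors' implementation, which is built on PyTorch.

```python
import math

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W.v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, C_prev, p):
    """Advance one time step; p maps 'W_f', 'b_f', ... to weights and biases."""
    z = h_prev + x_t                                              # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]     # Eq. (7), forget gate
    i_t = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]     # Eq. (8), input gate
    o_t = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]     # Eq. (9), output gate
    c_t = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]   # Eq. (10), candidate
    C_t = [f * C + i * c for f, C, i, c in zip(f_t, C_prev, i_t, c_t)]  # Eq. (11)
    h_t = [o * math.tanh(C) for o, C in zip(o_t, C_t)]            # Eq. (12)
    return h_t, C_t

# Hypothetical toy setup: 1 hidden unit, 2 inputs, all-zero parameters.
hidden, n_in = 1, 2
p = {}
for g in "fioc":
    p["W_" + g] = [[0.0] * (hidden + n_in) for _ in range(hidden)]
    p["b_" + g] = [0.0] * hidden

h_t, C_t = lstm_step([0.3, -0.1], [0.0], [2.0], p)
print(h_t, C_t)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate to 0, so the new cell state is simply half of the previous one, which gives a quick sanity check on the update rule.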

5. Experimental Analysis

5.1. Experimental Environment

The experiments were run on the Windows 10 64-bit operating system with an Intel Core i7-14700HX CPU and an NVIDIA GeForce RTX 4060 GPU with 8 GB of video memory. A deep learning environment based on Python 3.10 and the PyTorch 2.1.0 framework was built for model training and testing.

The 3600 MTF and 3600 PRPD spectra are divided into a training set and a test set, which account for 70% and 30% of the data, respectively. Model training uses category-balanced sampling: in each round, 100 samples are randomly drawn from each of the four fault categories, and each round is iterated 200 times, giving a total of 1000 iterations. Adam is used as the optimizer, with an initial learning rate of 1 × 10⁻⁶ and a weight decay coefficient of 1 × 10⁻². The cross-entropy loss is used as the loss function. The input image size is 256 × 256 × 3. The output feature maps of the four stages in the Swin Transformer feature extraction branch have spatial sizes of 32 × 32, 16 × 16, 8 × 8, and 4 × 4 and feature dimensions of 32, 64, 128, and 256, respectively, with 4 attention heads.
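The data split and the category-balanced sampling described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether the 70/30 split is stratified or how many samples each class contains, so the sketch assumes 900 samples per class and a per-class split; the function names `stratified_split` and `balanced_batch` are hypothetical.

```python
import random

def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into train/test, keeping the class ratio (assumed)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        cut = round(train_frac * len(idxs))
        train += idxs[:cut]
        test += idxs[cut:]
    return train, test

def balanced_batch(indices, labels, per_class=100, seed=0):
    """Draw the same number of samples from every fault category."""
    rng = random.Random(seed)
    by_class = {}
    for idx in indices:
        by_class.setdefault(labels[idx], []).append(idx)
    batch = []
    for y in sorted(by_class):
        batch += rng.sample(by_class[y], per_class)
    return batch

# Assumption: 3600 spectra spread evenly over the four fault types (900 each).
labels = [c for c in range(4) for _ in range(900)]
train_idx, test_idx = stratified_split(labels)
batch = balanced_batch(train_idx, labels)
print(len(train_idx), len(test_idx), len(batch))
```

Under these assumptions the split yields 2520 training and 1080 test samples, and each balanced draw contains 400 samples, 100 per fault category.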

5.2. Evaluation Indicators

To measure the classification performance of the model, Accuracy, Precision, Recall, and F1 Score are used as evaluation metrics. Accuracy reflects the overall classification performance of the model and is calculated as Equation (13); Precision is the proportion of samples predicted as positive that are actually positive and is calculated as Equation (14); Recall is the proportion of actually positive samples that are correctly recognized and is calculated as Equation (15); F1 Score is the harmonic mean of Precision and Recall, balancing the classifier's need to recognize minority classes against the requirement to avoid too many false positives, and is calculated as Equation (16).

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(13)

Precision = \frac{TP}{TP + FP}

(14)

Recall = \frac{TP}{TP + FN}

(15)

F1 Score = \frac{2 \times Precision \times Recall}{Precision + Recall}

(16)

where TP is the number of samples that are actually positive and predicted to be positive; FN is the number of samples that are actually positive and predicted to be negative; TN is the number of samples that are actually negative and predicted to be negative; and FP is the number of samples that are actually negative and predicted to be positive.
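As a worked illustration of these four metrics, the sketch below derives TP, FN, TN, and FP for one class from a confusion matrix in a one-vs-rest fashion and then evaluates Accuracy, Precision, Recall, and F1 Score, with Accuracy taken in its standard form as the share of correct predictions, (TP + TN) over all samples. The confusion matrix values are hypothetical and are not results from this study, and the one-vs-rest treatment of the four fault classes is an assumption, since the text does not state how the per-class values are aggregated.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, FN, TN, FP) for class k; rows are actual, columns predicted."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # actual k, predicted as another class
    fp = sum(row[k] for row in cm) - tp     # another class, predicted as k
    tn = total - tp - fn - fp
    return tp, fn, tn, fp

def binary_metrics(tp, fn, tn, fp):
    """Accuracy, Precision, Recall, and F1 Score from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical 4-class confusion matrix (100 test samples per class).
cm = [
    [90, 5, 3, 2],
    [4, 92, 2, 2],
    [6, 1, 88, 5],
    [10, 0, 0, 90],
]

counts = one_vs_rest_counts(cm, 0)
accuracy, precision, recall, f1 = binary_metrics(*counts)
print(counts, accuracy, precision, recall, f1)
```

For class 0 of this made-up matrix the counts are TP = 90, FN = 10, TN = 280, and FP = 20, so Precision is 90/110 and Recall is 90/100.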

5.3. Comparison of Different Deep Learning Methods

The performance of the Swin Transformer-AFPN-LSTM model is compared with that of other common deep learning networks, namely the convolutional neural network (CNN) and the long short-term memory (LSTM) neural network. The CNN is applied separately to the PRPD spectrograms, to extract phase features, and to the MTF spectrograms, to extract temporal features, for each type of fault. In addition, to study whether converting one-dimensional temporal data into MTF images benefits classification, the original one-dimensional temporal data are input into the LSTM neural network for comparison. The accuracy of each network over the training iterations is shown in Figure 10. The recognition performance metrics for each network are shown in Table 1 and Figure 11.

As can be seen from Figure 10, Figure 11, and Table 1, the Swin Transformer-AFPN-LSTM model stabilizes after about 200 iterations, and its final recognition accuracy reaches 98.82%, which is significantly better than the CNN (PRPD), CNN (MTF), and LSTM models. Relative to these three models, it improves the overall accuracy by 14.09%, 16.43%, and 21.23%; the precision by 13.87%, 14.94%, and 23.1%; the recall by 14.71%, 18.32%, and 21.22%; and the F1 Score by 14.28%, 16.62%, and 22.21%, respectively. These gains have two sources. First, the Swin Transformer-AFPN-LSTM model extracts both phase features and temporal features in the feature extraction stage, and the W-MSA mechanism in the encoder and the SW-MSA mechanism in the decoder let the feature information interact fully across different windows, which enhances the information expression. Second, the AFPN module fuses the feature maps of different resolutions, further strengthening the feature interaction across scales. It is worth noting that, alongside the gain in overall accuracy, the model improves precision markedly: its precision is 23.1% higher than that of the LSTM model, the largest single gain in Table 1. This supports the view that the self-attention mechanism in the encoder and decoder learns to weight the key features, which raises the probability that a sample predicted as a given partial discharge type actually belongs to that type.
In addition, the CNN (MTF) model improves on the LSTM model by 4.8%, 8.16%, 2.9%, and 5.59% in accuracy, precision, recall, and F1 Score, respectively, which demonstrates that the MTF visualization effectively exposes the intrinsic feature information of the PD time-series data.
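The improvements quoted in this comparison are absolute differences between rows of Table 1, and they can be recomputed with a few lines of Python. The sketch below copies the Table 1 values; the dictionary layout and the helper name `gain` are illustrative choices, not part of the original study.

```python
# Values copied from Table 1 (Accuracy, Precision, Recall, F1 Score, in %).
TABLE_1 = {
    "LSTM (TRPD)": (77.59, 73.46, 77.83, 75.58),
    "CNN (MTF)": (82.39, 81.62, 80.73, 81.17),
    "CNN (PRPD)": (84.73, 82.69, 84.34, 83.51),
    "STAL (MTF/PRPD)": (98.82, 96.56, 99.05, 97.79),
}
METRICS = ("accuracy", "precision", "recall", "f1")

def gain(model, baseline, table=TABLE_1):
    """Absolute difference (percentage points) of model over baseline."""
    return {
        m: round(a - b, 2)
        for m, a, b in zip(METRICS, table[model], table[baseline])
    }

for baseline in ("CNN (PRPD)", "CNN (MTF)", "LSTM (TRPD)"):
    print(baseline, gain("STAL (MTF/PRPD)", baseline))
print("CNN (MTF) vs LSTM", gain("CNN (MTF)", "LSTM (TRPD)"))
```

For example, the F1 Score gain over CNN (MTF) follows from 97.79 − 81.17 = 16.62, and the precision gain over the LSTM from 96.56 − 73.46 = 23.10.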

Synthesizing the experimental results further, the LSTM, owing to its gating structure, can only capture the long- and short-term temporal features in one-dimensional temporal PD data. The CNN can, to a certain extent, mine the phase features embedded in the PRPD data or the temporal features embodied in the MTFs by adjusting the size of the convolution kernel, but the choice of kernel size and the computational overhead of the convolution operation cannot be ignored as the data volume grows. In contrast, the Swin Transformer-AFPN-LSTM model extracts phase features and timing features simultaneously, and its window-based self-attention mechanism enables parallel computation while reducing the computational overhead. In feature processing, the feature maps of each size are fused across scales, and the PD information in the fused features is then captured by the LSTM, so every performance metric improves over the other models, reflecting the superiority of this model for the PD recognition task.

5.4. Experiments to Validate Multi-Feature Extraction and Fusion

To further validate the advantages of multi-feature extraction and multi-scale feature fusion, ablation experiments are designed for the different input sources and the AFPN fusion module, using Swin Transformer-LSTM as the base model. The recognition results of each network are shown in Figure 12, Figure 13, and Table 2.

As can be seen from Figure 12, Figure 13, and Table 2, the Swin Transformer-LSTM model with multi-feature extraction reaches a recognition accuracy of 93.37%. Compared with the Swin Transformer-LSTM model that extracts only the phase feature or only the temporal feature, this is 5.22% and 3.21% higher in overall accuracy; 5.85% and 3.11% higher in precision; 3.28% and 3.2% higher in recall; and 4.57% and 3.15% higher in F1 Score, respectively. This shows that extracting multiple features at the same time effectively improves the recognition performance of the model. Further, adding the AFPN fusion module to the Swin Transformer-LSTM model improves the accuracy, precision, recall, and F1 Score by 5.45%, 3.42%, 6.4%, and 4.9%, respectively; notably, the improvement in recall is the largest. The accuracy obtained when both phase and temporal features are input into the Swin Transformer-AFPN-LSTM model is 10.67% and 8.66% higher than that of the Swin Transformer-LSTM model with the phase feature or the temporal feature alone, respectively. This indicates that the model effectively strengthens the information connection between different features, so that the subsequent LSTM module learns richer feature representations for classification. It therefore performs best on PD recognition.
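The ablation gains above can be read as two consecutive steps: moving from a single input to both inputs, and then adding the AFPN module. The sketch below recomputes both steps from the accuracy column of Table 2 and shows that they add up to the overall gain over each single-input model. The variable and function names are illustrative assumptions.

```python
# Accuracy values (%) copied from Table 2.
TABLE_2_ACCURACY = {
    "STL (MTF)": 90.16,
    "STL (PRPD)": 88.15,
    "STL (MTF/PRPD)": 93.37,
    "STAL (MTF/PRPD)": 98.82,
}

def decompose(single_input):
    """Split the total accuracy gain into multi-feature and AFPN contributions."""
    single = TABLE_2_ACCURACY[f"STL ({single_input})"]
    fused = TABLE_2_ACCURACY["STL (MTF/PRPD)"]
    full = TABLE_2_ACCURACY["STAL (MTF/PRPD)"]
    multi_feature_gain = round(fused - single, 2)   # single input -> both inputs
    afpn_gain = round(full - fused, 2)              # adding the AFPN module
    total_gain = round(full - single, 2)            # overall improvement
    return multi_feature_gain, afpn_gain, total_gain

for source in ("PRPD", "MTF"):
    print(source, decompose(source))
```

Starting from the phase-only model, the two steps are 5.22 and 5.45 points, which sum to the 10.67-point total; starting from the temporal-only model, they are 3.21 and 5.45 points, which sum to 8.66.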

In addition, combining Table 1 and Table 2, when the input source is the same (PRPD or MTF, respectively), the Swin Transformer-LSTM model improves on the CNN model by 3.42% and 7.77% in overall accuracy; 4.6% and 8.41% in precision; 5.03% and 8.72% in recall; and 4.81% and 8.57% in F1 Score. Here too the largest improvement is in recall, demonstrating that the Swin Transformer-LSTM model suits the PD recognition task better than the CNN model. Moreover, with the MTF as the input source, the recognition accuracy is 82.39% for the CNN model and 90.16% for the Swin Transformer-LSTM model, an improvement of 7.77%, which exceeds the 3.42% improvement obtained with PRPD as the input source. This indicates that the Swin Transformer module mines features from the MTF more effectively than from PRPD. Two reasons can be suggested: on the one hand, the MTF may contain more feature information than PRPD; on the other hand, the shifted-window self-attention mechanism in the Swin Transformer module mines features more effectively than the convolution kernels of the CNN.

6. Conclusions

In this paper, a GIS partial discharge fault identification model based on the Swin Transformer-AFPN-LSTM architecture with phase features and timing features was proposed. Using the constructed GIS fault experimental circuits, phase-based PRPD data and timing-based TRPD data were obtained, and the TRPD data were transformed into the MTF to ease pattern recognition. The performance of the proposed model was compared with that of other deep learning models, and the advantages of multi-feature extraction were verified. The following conclusions can be drawn from the experimental results:

After transforming one-dimensional time-series PD data into the MTF, the overall accuracy of feature extraction and recognition using the CNN is improved by 4.8% compared with inputting the one-dimensional time-series data directly into the LSTM. In addition, when the CNN model is compared with the Swin Transformer-LSTM model, the improvement on the MTF is greater than that on PRPD, which suggests that the MTF transformation effectively enriches the feature representations and is suitable as an input source for the Swin Transformer-LSTM model.
The multi-feature extraction Swin Transformer-LSTM neural network is suitable for GIS insulation fault identification, and its accuracy is improved by 5.22% and 3.21% compared with single-feature extraction from phase or time-series data, respectively. The accuracy of the Swin Transformer-LSTM neural network with phase or time-series input is also improved by 3.42% and 7.77%, respectively, compared with the traditional CNN using the same single input.
The addition of the AFPN multi-scale feature fusion module strengthens the information connectivity between multi-scale feature maps in the Swin Transformer-AFPN-LSTM neural network and combines the complementary advantages of high-resolution fine-grained features and low-resolution global features. Compared with the model before the improvement, recall is improved by 6.4%, the most significant gain, and the overall accuracy is improved by 5.45%, reaching 98.82%, which can satisfy the requirements of the GIS fault identification task.
Compared with traditional classification methods, the Swin Transformer-AFPN-LSTM model avoids the subjective manual selection of features and achieves high recognition accuracy, giving good results in the fault detection of GIS power equipment. Partial discharge similar to that in GIS equipment also occurs in other equipment along the power transmission chain, such as power transformers and transmission line insulators, and it adversely affects power transmission, so migrating the model to other power equipment for fault detection is worth considering. In addition, because the dataset was collected in laboratory experiments, it may differ from data collected in the field; this is a limitation, and further engineering verification is required to test and improve the generalization ability of the model. Therefore, in future work, the Swin Transformer-AFPN-LSTM model will be applied to the fault detection of power transformers, transmission line insulators, and other power equipment to test its generalization ability and to optimize it further, so as to improve fault detection for different power equipment in the field.

Author Contributions

Conceptualization, J.L. and S.M.; methodology, J.L. and S.M.; software, J.L.; validation, J.L., F.J., and R.Z.; formal analysis, R.Z., Q.Z., and J.X.; investigation, Q.Z.; resources, J.L. and F.J.; data curation, S.M. and R.Z.; writing—original draft preparation, J.L.; writing—review and editing, J.L., S.M., and F.J.; visualization, J.L., J.X., R.Z., and Q.Z.; supervision, S.M.; project administration, S.M. and F.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wang, Y.; Yan, J.; Zhang, W.; Yang, Z.; Wang, J.; Geng, Y.; Srinivasan, D. Mutitask Learning Network for Partial Discharge Condition Assessment in Gas-Insulated Switchgear. IEEE Trans. Ind. Inform. 2024, 20, 11998–12009.
Long, J.C.; Xie, L.J.; Wang, X.P.; Zhang, J.; Lu, B.; Wei, C.; Dai, D.D.; Zhu, G.W.; Tian, M. A Comprehensive Review of Signal Processing and Machine Learning Technologies for UHF PD Detection and Diagnosis (II): Pattern Recognition Approaches. IEEE Access 2024, 12, 29850–29890.
Mas’ud, A.A.; Stewart, B.G.; McMeekin, S.G. An investigative study into the sensitivity of different partial discharge φ-q-n pattern resolution sizes on statistical neural network pattern classification. Measurement 2016, 92, 497–507.
Li, G.Y.; Wang, X.H.; Li, X.; Yang, A.J.; Rong, M.Z. Partial Discharge Recognition with a Multi-Resolution Convolutional Neural Network. Sensors 2018, 18, 3512.
Jiang, J.; Chen, J.D.; Li, J.S.; Yang, X.P.; Bie, Y.F.; Ranjan, P.; Zhang, C.H.; Schwarz, H. Partial Discharge Detection and Diagnosis of Transformer Bushing Based on UHF Method. IEEE Sens. J. 2021, 21, 16798–16806.
Morette, N.; Heredia, L.C.C.; Ditchi, T.; Mor, A.R.; Oussar, Y. Partial discharges and noise classification under HVDC using unsupervised and semi-supervised learning. Int. J. Electr. Power Energy Syst. 2020, 121, 106129.
Sun, S.Y.; Sun, Y.Y.; Xu, G.D.; Zhang, L.A.; Hu, Y.R.; Liu, P. Partial Discharge Pattern Recognition of Transformers Based on the Gray-Level Co-Occurrence Matrix of Optimal Parameters. IEEE Access 2021, 9, 102422–102432.
Do, T.D.; Tuyet-Doan, V.N.; Cho, Y.S.; Sun, J.H.; Kim, Y.H. Convolutional-Neural-Network-Based Partial Discharge Diagnosis for Power Transformer Using UHF Sensor. IEEE Access 2020, 8, 207377–207388.
Nguyen, M.T.; Nguyen, V.H.; Yun, S.J.; Kim, Y.H. Recurrent Neural Network for Partial Discharge Diagnosis in Gas-Insulated Switchgear. Energies 2018, 11, 1202.
Wang, Y.X.; Yan, J.; Sun, Q.F.; Li, J.Y.; Yang, Z. A MobileNets Convolutional Neural Network for GIS Partial Discharge Pattern Recognition in the Ubiquitous Power Internet of Things Context: Optimization, Comparison, and Application. IEEE Access 2019, 7, 150226–150236.
Yongyong, J.; Min, D.; Yujie, L.I.; Chun, A.I.; Jinggang, Y.; Chengbao, L. Research on GIS Partial Discharge Pattern Recognition Based on Deep Residual Network. High Volt. Appar. 2018, 54, 123–129.
Song, H.; Dai, J.; Sheng, G.; Jiang, X. GIS partial discharge pattern recognition via deep convolutional neural network under complex data source. IEEE Trans. Dielectr. Electr. Insul. 2018, 25, 678–685.
Tian, J.P.; Song, H.; Sheng, G.H.; Jiang, X.C. Knowledge-Driven Recognition Methodology of Partial Discharge Patterns in GIS. IEEE Trans. Power Deliv. 2022, 37, 3335–3344.
Liu, T.L.; Yan, J.; Wang, Y.X.; Xu, Y.F.; Zhao, Y.M. GIS Partial Discharge Pattern Recognition Based on a Novel Convolutional Neural Networks and Long Short-Term Memory. Entropy 2021, 23, 774.
Zou, Z.; Zeng, Z.; Wen, Y.; Wang, W.; Xu, Y.; Jin, T. Information Fusion Model Based Improved Multi-Scale Convolutional Neural Network for Fault Diagnosis in EV V2G Charging Pile. In Proceedings of the 2023 IEEE 7th Conference on Energy Internet and Energy System Integration (EI2), Hangzhou, China, 15–18 December 2023; pp. 4068–4073.
Zhang, Z.C.; Li, J.; Cai, C.Z.; Ren, J.H.; Xue, Y.F. Bearing Fault Diagnosis Based on Image Information Fusion and Vision Transformer Transfer Learning Model. Appl. Sci. 2024, 14, 2706.
Han, K.; Wang, Y.H.; Chen, H.T.; Chen, X.H.; Guo, J.Y.; Liu, Z.H.; Tang, Y.H.; Xiao, A.; Xu, C.J.; Xu, Y.X.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 87–110.
Guo, H.K.; Zhao, X.Q. Intelligent Diagnosis of Dual-Channel Parallel Rolling Bearings Based on Feature Fusion. IEEE Sens. J. 2024, 24, 10640–10655.
Liu, X.Y.; He, Y.G. A multi-stream multi-scale lightweight SwinMLP network with an adaptive channel-spatial soft threshold for online fault diagnosis of power transformers. Meas. Sci. Technol. 2023, 34, 075014.
Tong, A.; Zhang, J.; Xie, L. Intelligent Fault Diagnosis of Rolling Bearing Based on Gramian Angular Difference Field and Improved Dual Attention Residual Network. Sensors 2024, 24, 2156.
Wang, M.J.; Wang, W.J.; Zhang, X.N.; Iu, H.H.C. A New Fault Diagnosis of Rolling Bearing Based on Markov Transition Field and CNN. Entropy 2022, 24, 751.
Yan, J.L.; Kan, J.M.; Luo, H.F. Rolling Bearing Fault Diagnosis Based on Markov Transition Field and Residual Network. Sensors 2022, 22, 3936.
Lei, C.L.; Miao, C.X.; Wan, H.Y.; Zhou, J.Y.; Hao, D.F.; Feng, R.C. Rolling bearing fault diagnosis method based on MTF-MFACNN. Meas. Sci. Technol. 2024, 35, 035007.
Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
Yang, G.; Lei, J.; Zhu, Z.; Cheng, S.; Feng, Z.; Liang, R. AFPN: Asymptotic Feature Pyramid Network for Object Detection. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USA, 1–4 October 2023; pp. 2184–2189.
Wu, C.M.; Zheng, S.P. Fault Diagnosis Method of Rolling Bearing Based on MSCNN-LSTM. Comput. Mater. Contin. 2024, 79, 4395–4411.
Yu, Y.; Si, X.S.; Hu, C.H.; Zhang, J.X. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Comput. 2019, 31, 1235–1270.

Figure 1. GIS insulation fault model. (a) tip discharge model, (b) particle discharge model, (c) suspension discharge model, (d) air gap discharge model.

Figure 2. Experimental equivalent circuit of GIS.

Figure 3. Experimental platform. (1) Built-in UHF transducer, (2) model lifting bar, (3) suspension discharge model, (4) air gap discharge model, (5) tip discharge model, (6) particle discharge model, (7) external UHF transducer, (8) pulse calibration bar, (9) coupling capacitance, (10) detecting impedance, (11) local amplitude free experimental transformer.

Figure 4. PRPD for each type of fault. (a) Tip discharge, (b) particle discharge, (c) suspension discharge, (d) air gap discharge.

Figure 5. MTF for each type of fault. (a) Tip discharge, (b) particle discharge, (c) suspension discharge, (d) air gap discharge.

Figure 6. The Swin Transformer-AFPN-LSTM model.

Figure 7. Encoder module and decoder module.

Figure 8. Shift window flow diagram.

Figure 9. ASFF structure.

Figure 10. Accuracy and iteration of each deep learning model.

Figure 11. Comparison of the performance of different deep learning models. (a) Accuracy comparison, (b) precision comparison, (c) recall comparison, (d) F1 Score comparison.

Figure 12. Accuracy and iteration of ablation experiments.

Figure 13. Comparison of the advantages of multi-feature extraction and fusion. (a) accuracy comparison, (b) precision comparison, (c) recall comparison, (d) F1 Score comparison.

Table 1. Comparison of performance metrics of different deep learning network models.

Models	Input	Accuracy/%	Precision/%	Recall/%	F1 Score/%
LSTM	TRPD	77.59	73.46	77.83	75.58
CNN	MTF	82.39	81.62	80.73	81.17
CNN	PRPD	84.73	82.69	84.34	83.51
Swin Transformer-AFPN-LSTM *	MTF/PRPD	98.82	96.56	99.05	97.79

* Swin Transformer-AFPN-LSTM abbreviated as STAL.

Table 2. Comparison of performance metrics of ablation experiments.

Models	Input	Accuracy/%	Precision/%	Recall/%	F1 Score/%
Swin Transformer-LSTM *	MTF	90.16	90.03	89.45	89.74
Swin Transformer-LSTM *	PRPD	88.15	87.29	89.37	88.32
Swin Transformer-LSTM	MTF/PRPD	93.37	93.14	92.65	92.89
Swin Transformer-AFPN-LSTM *	MTF/PRPD	98.82	96.56	99.05	97.79

* Swin Transformer-LSTM abbreviated as STL; Swin Transformer-AFPN-LSTM abbreviated as STAL.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
