1. Introduction
Modern power systems are rapidly growing in complexity as power transmission and supply expand to satisfy increasing energy requirements [1]. As an important component, transmission lines transmit power from source areas to distribution networks. However, faults frequently occur in transmission lines, disrupting the power supply and degrading the reliability of power systems [2,3]. Therefore, to repair damaged components and reduce downtime, it is of great importance to accurately diagnose transmission line faults and rapidly eliminate them. Against this background, efficient fault diagnosis schemes for transmission lines are urgently needed to remove these faults and guarantee the safe operation of power systems [4].
With the progress in data measurement and collection systems, massive amounts of transmission line operating data have become available. Based on the gathered data, data-driven approaches have a greater ability to diagnose transmission line faults [5,6]. Many data-driven transmission line recognition models use the original datasets directly as model inputs. However, because these input datasets are collected directly from the original process variables, such models cannot cope with the redundant correlations between different original variables [7]. In practice, these redundant correlations cause the process variables to be interrelated and mutually influential, which degrades the fault recognition effect of the abovementioned methods. Under this circumstance, the performance of these diagnosis approaches for identifying transmission line faults is seriously affected by the variables' redundant correlated properties [8].
Recently, principal component analysis (PCA) has been applied to remove the redundant correlations of original variables and extract the intrinsic latent variables by preserving the main variance information [9,10]. In view of PCA's superiority, a PCA-based approach is applied in this paper to characterize the transmission line's running state by retaining the key latent statistics. However, when eliminating redundant features among the original variables, the traditional PCA-based approach only mines global structure features and neglects important local structure features [11]. Local structure features are also significant for feature extraction and dimension reduction because they indicate detailed neighbor relationships between different samples [12]. Missing the local structures has an adverse effect on eliminating redundant information and extracting key latent features. Recently, local structure-preserving methods have been suggested to excavate the local neighbor structure of samples for feature extraction [13,14] by exploiting the underlying geometrical manifold of the dataset. However, without preserving variance information, the outer shape of the dataset may be broken after the dimension reduction procedure. Hence, the performance of local feature extraction may be degraded if a dataset has significant directions of variance.
In order to mine both the global and local data structures of process data during feature extraction, a novel dimension reduction technique termed comprehensive feature preservation (CFP) is proposed in our work by combining the advantages of PCA and the local structure preservation technique. In this way, the developed CFP-based feature extraction model exploits the useful global structure information of process data and excavates the original variables' important local data structure features. As a result, the transmission line's fault diagnosis performance is significantly improved by employing the constructed CFP-based feature extraction model to exploit the global and local structure data features during dimension reduction.
After the global and local structure features of the snapshot dataset are extracted, effectively analyzing these mined fault features is also very significant in recognizing the fault patterns of the transmission line. Recently, deep learning-based approaches have displayed good effectiveness in the fields of fault diagnosis, speech recognition and intelligent transportation due to their superior capability to capture more valuable and useful information from the input data [15,16,17]. For instance, Guo et al. [18] employed a convolutional neural network (CNN)-based technique to diagnose aircraft sensor faults. Lu et al. [19] adopted modified long short-term memory (LSTM)-based neural networks to identify bearing faults. Based on the deep belief network, Guo et al. [20] suggested a novel fault recognition approach for the variable refrigerant flow (VRF) system. Furthermore, by comparing five classification approaches, Zhou [21] discovered that the deep neural network-based approach is more suitable for the VRF system's multiple-category fault diagnosis. However, most deep learning-based fault diagnosis models, such as the CNN and LSTM, contain intricate structures and need numerous preset parameters [22], which can result in overfitting. As the number of network layers increases, the computational complexity grows rapidly, which places much higher demands on hardware devices.
In recent years, convolutional networks have revealed an advantageous ability to deal with sequence modeling. As a typical convolution-based method, temporal convolutional networks (TCNs) have been widely adopted for time-series prediction and pattern recognition [23,24] owing to their simple network structure and abundant neurons. The convolution kernel sharing and parallel computing modules give the TCN model reduced computational complexity and shorter computation time. Inspired by the merits of the TCN, many TCN-based approaches have been proposed to enhance fault diagnosis effectiveness in various applications. For example, Li et al. [7] reported that a TCN-based diagnosis model achieved a much better effect than other neural networks in identifying chiller faults by managing the coupled dynamics of the process data. On account of the TCN's successful applications in fault diagnosis, the fault diagnosis results of the transmission line can be further improved by developing a TCN-based fault recognition method to classify the mined global and local structure features.
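As background for the TCN's convolutional machinery, the sketch below illustrates the dilated causal convolution at the core of a TCN residual block: each output depends only on current and past inputs, and increasing the dilation widens the receptive field without adding parameters. This is a minimal numpy illustration, not the EATCN implementation; the kernel values and toy sequence are arbitrary choices for demonstration.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """Causal 1-D convolution with dilation: y[t] depends only on
    x[t], x[t-d], x[t-2d], ... (left-padded with zeros)."""
    k = len(w)
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        for i in range(k):
            idx = t - i * dilation
            if idx >= 0:
                y[t] += w[i] * x[idx]
    return y

x = np.arange(8, dtype=float)                 # toy input sequence
w = np.array([1.0, 1.0])                      # kernel of size 2
y1 = dilated_causal_conv(x, w, dilation=1)    # receptive field 2
y2 = dilated_causal_conv(x, w, dilation=2)    # receptive field 3
```

Stacking such layers with exponentially growing dilations is what lets a TCN cover long histories with few layers.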
The above-introduced TCN-based fault diagnosis algorithms pay the same attention to all fault samples during classification instead of affording more attention to the fault samples that are more difficult to diagnose [25]. Indeed, these hard-to-identify fault samples usually have a significant adverse impact on the final fault diagnosis results. Inspired by the emerging transformer network [26], the attention mechanism can be regarded as an effective way to pay more attention to specific fault samples. To utilize the key features exploited from the original data more effectively, the attention mechanism enables fault diagnosis models to pay close attention to different fault samples according to their diverse fault patterns, and it has been widely applied in the fault diagnosis field. For instance, Deng et al. [27] combined the attention mechanism with the LSTM network to set up a sentiment dictionary. Nath et al. [28] employed an amalgamation of a modified attention mechanism-based sensor and transformers to gain improved performance in rotor fault diagnosis. In addition, based on feature extraction from time-series images, Fahim et al. [29] integrated another revised attention mechanism into the CNN model to identify the fault pattern of transmission lines. However, existing attention-based networks often encounter the problem of gradient vanishing when the number of layers is large, which results in poor learning ability. Therefore, improvements are needed to overcome this limitation of traditional attention-based networks. If the improved attention mechanism is then infused into the existing TCN network for transmission line fault diagnosis, more attention will be concentrated on the fault samples with a significant influence on the diagnosis effectiveness, which will further enhance the overall performance of the transmission line's fault identification results.
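To make the idea of attaching a skip connection to an attention block concrete, the following minimal numpy sketch applies scaled dot-product self-attention and then adds the input back through a skip (residual) connection, the generic mechanism for easing gradient flow through deep attention stacks. It is a schematic illustration under assumed shapes and random weights, not the SCA network proposed in this paper.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_skip(x, wq, wk, wv):
    """Scaled dot-product self-attention followed by a skip connection,
    so the block's output is attention(x) + x."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[1]))
    return scores @ v + x          # skip connection: add the input back

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))    # 5 samples, 4 features (illustrative)
wq = rng.standard_normal((4, 4))
wk = rng.standard_normal((4, 4))
wv = rng.standard_normal((4, 4))
out = attention_with_skip(x, wq, wk, wv)
```

Because the input is added back unchanged, the identity path carries gradients around the attention weights, which is the property motivating the skip connection discussed above.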
To exploit much more comprehensive and detailed feature information of the transmission line fault data, an effective combination of the PCA method and the local structure-preserving technique is proposed in our work to mine the data's global and local structure features during the dimension reduction. Furthermore, a novel skip connection attention (SCA) network is established to effectively select and train the parameters of the attention mechanism network. Finally, to achieve much better fault diagnosis effectiveness for the transmission line, the constructed SCA network is integrated into the TCN, referred to as the enhanced attention-based TCN (EATCN), to pay more attention to the extracted global and local structure features of fault samples that are difficult to recognize and that influence the fault identification performance. Based on the above motivations, a comprehensive feature extraction-based EATCN is proposed in this paper to identify the transmission line's faults. The main innovations and contributions are given as follows.
(1) To remove the superfluous correlation information among the process variables, a novel CFP-based dimension reduction model is developed to carry out the feature extraction. The CFP model is capable of mining the global and local structure features of the original process data, which are utilized as the input of the developed fault recognition approach. Specifically, considering that the traditional PCA only maintains the global structure features during dimension reduction, the local structure-preserving technique is combined with the PCA to establish the CFP model;
(2) To train the network's parameters more effectively and to improve the network's learning capability, the skip connection attention (SCA) network is proposed. Specifically, to alleviate the issue of gradient vanishing across a large number of layers when training the parameters, the skip connection is incorporated into the conventional attention mechanism to construct the SCA network. In addition, two fully connected layers are adopted in the SCA to enhance the network's generalization ability;
(3) To further reinforce the transmission line's fault recognition performance, the established SCA network is integrated into the TCN model to classify the global and local structure features extracted by the CFP. By combining the SCA network with the TCN's residual blocks, the suggested EATCN is set up by deriving and stacking the modified residual blocks. The fault diagnosis model's input, i.e., the mined global and local structure features, is recoded by the developed EATCN, which guarantees that more attention is focused on the valuable information and internal correlations of the imported features;
(4) To prove the fault diagnosis effect of the proposed EATCN-based transmission line diagnosis scheme, the EATCN model is compared with some closely related fault diagnosis models. The experiments on the suggested EATCN and the comparisons with related approaches are carried out on datasets from a simulated power system. In contrast to the related diagnosis models, the fault diagnosis results display the superior effectiveness of the EATCN-based recognition scheme.
3. The Developed CFP-Based Feature Extraction Technique
As introduced in Section 2, the PCA cannot grasp the local structure features of the transmission line data. Motivated by this, to further remove the redundant correlation information of the original process variables, a novel comprehensive feature-preserving (CFP)-based dimension reduction technology, which incorporates the local structure-preserving technique into the PCA model, is proposed to mine the global and local structure features of the original process data.
Let the matrix $X = [x_1^T, x_2^T, \ldots, x_n^T]^T \in \mathbb{R}^{n \times m}$ with $n$ normal samples and $m$ variables denote the normal operating dataset. The training matrix $X$ is first scaled by subtracting its mean and dividing by its standard deviation. Based on the normalized matrix $X$, the PCA model seeks a loading vector $p$ that maximizes the distance among all the samples in the PC space:

$$J_{PCA}(p) = \max_{p} \sum_{i=1}^{n} \left( x_i p - \bar{x} p \right)^2 = \max_{p}\ p^{T} X^{T} X p, \quad \text{s.t. } p^{T} p = 1 \quad (6)$$

where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the mean value of the $n$ samples, and $\bar{x} = 0$ after normalization.
From Equation (6), it can be found that the PCA only preserves the global structure features while removing the redundant correlations. To optimally preserve both the global and local structure features of the training matrix $X$, the local structure-preserving framework is combined with the PCA model in our work.
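The variance-maximization objective in Equation (6) can be checked numerically: the leading eigenvector of $X^{T}X$ captures at least as much projected variance as any other unit direction. The snippet below is a small self-contained numpy check with synthetic data; all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
# correlated 2-D toy data: variance concentrated along one direction
X = rng.standard_normal((200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # normalize as in the text

theta, V = np.linalg.eigh(X.T @ X)         # ascending eigenvalues
p = V[:, -1]                               # leading loading vector

# projected variance along p versus along a random unit direction
var_p = (X @ p) @ (X @ p)
q = rng.standard_normal(2)
q /= np.linalg.norm(q)
var_q = (X @ q) @ (X @ q)
```

By construction `var_p >= var_q` for any unit direction `q`, which is exactly the maximization in Equation (6).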
For the $i$-th sample $x_i$, its $k$ nearest neighbors are first searched to construct the local neighborhood subset $N(x_i)$ by means of the k-nearest neighbor approach. Therefore, the acquired neighbor set $N(x_i)$ has the ability to reveal whether the $j$-th sample $x_j$ belongs to the local neighborhood of $x_i$. To be specific, the element $w_{ij}$ of the similarity matrix $W$ is determined as:

$$w_{ij} = \begin{cases} 1, & x_j \in N(x_i)\ \text{or}\ x_i \in N(x_j) \\ 0, & \text{otherwise} \end{cases} \quad (7)$$

where $w_{ij}$ indicates the neighborhood relationship between the samples $x_i$ and $x_j$; thus, the matrix $W$ can represent the local neighbor relations of the training matrix $X$.
The local structure-preserving (LSP) framework aims for the loading vector $p$ to hold the local neighbor relations of the training matrix $X$ by minimizing the distances between neighboring samples in the PC space:

$$J_{LSP}(p) = \min_{p} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \left( x_i p - x_j p \right)^2 = \min_{p}\ p^{T} X^{T} L X p \quad (8)$$

where $L = D - W$ represents the Laplacian matrix [12,13,14], and $D$ is a diagonal matrix with the $i$-th diagonal element $d_{ii} = \sum_{j=1}^{n} w_{ij}$.
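The neighbor matrix $W$, the degree matrix $D$ and the Laplacian $L = D - W$ can be assembled as sketched below. This is a plain numpy sketch assuming binary 0/1 neighbor weights and symmetrization over mutual neighborhoods; the dataset size and $k$ are arbitrary illustrative choices.

```python
import numpy as np

def knn_laplacian(X, k):
    """Build the k-nearest-neighbor similarity matrix W, the degree
    matrix D and the Laplacian L = D - W for the rows of X."""
    n = X.shape[0]
    # pairwise squared Euclidean distances between samples
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        # indices of the k nearest neighbors of sample i (excluding itself)
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)          # symmetrize: neighbor in either direction
    D = np.diag(W.sum(axis=1))
    return W, D, D - W

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))    # toy data: 20 samples, 3 variables
W, D, L = knn_laplacian(X, k=3)
```

A useful sanity check on any such Laplacian is that each of its rows sums to zero, since the degree entry exactly cancels the neighbor weights.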
To exploit the local and global structure features of the training matrix during the dimension reduction, the optimization $J_{CFP}(p)$ of the proposed CFP technique is constructed in Equation (9) to derive the optimal loading vector $p$, which simultaneously maximizes the PCA's objective function and minimizes the optimization of the local structure-preserving framework:

$$J_{CFP}(p) = \max_{p} \left( p^{T} X^{T} X p - \lambda\, p^{T} X^{T} L X p \right) = \max_{p}\ p^{T} M p, \quad \text{s.t. } p^{T} p = 1 \quad (9)$$

where $p^{T} X^{T} X p$ is utilized to keep the global structure features of the training matrix, and the local structure features are retained by $p^{T} X^{T} L X p$. The $\lambda$ is a tradeoff parameter balancing the optimizations $J_{PCA}(p)$ and $J_{LSP}(p)$. The matrix $M$ is calculated as:

$$M = X^{T} \left( I - \lambda L \right) X \quad (10)$$
Equation (9) can be solved by computing the eigenvalue decomposition defined in Equation (11):

$$M p = \theta p \quad (11)$$

Suppose $p_1, p_2, \ldots, p_d$ are the eigenvectors of the first $d$ largest eigenvalues $\theta_1 \geq \theta_2 \geq \cdots \geq \theta_d$. Then, the loading matrix $P = [p_1, p_2, \ldots, p_d]$ of the CFP is built by retaining these $d$ eigenvectors. Since $M$ is symmetric, these loading vectors are mutually orthogonal, which can effectively improve the discriminative ability of the CFP-based dimension reduction method when extracting the global and local structure features of the original data.
The number of retained loading vectors $d$ in the PC space is determined based on the cumulative contribution rate, which is given as follows:

$$\frac{\sum_{i=1}^{d} \theta_i}{\sum_{i=1}^{m} \theta_i} \geq 95\% \quad (12)$$

In this paper, the value of $d$ in the CFP model is selected by the 95% cumulative contribution rate.
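The CFP loading computation described above can be sketched compactly: form $M = X^{T}(I - \lambda L)X$, take its eigendecomposition, and keep the leading eigenvectors until the 95% cumulative contribution rate is reached. In the demonstration below, the tradeoff value, the toy data and the zero Laplacian (which reduces CFP to plain PCA) are all illustrative assumptions.

```python
import numpy as np

def cfp_loadings(X, L, lam=0.5, ccr=0.95):
    """Form M = X^T (I - lam*L) X, eigendecompose it, and retain the
    leading eigenvectors until the cumulative contribution rate of the
    kept eigenvalues reaches `ccr`."""
    n = X.shape[0]
    M = X.T @ (np.eye(n) - lam * L) @ X
    theta, V = np.linalg.eigh(M)            # M is symmetric
    order = np.argsort(theta)[::-1]         # eigenvalues, descending
    theta, V = theta[order], V[:, order]
    contrib = np.cumsum(theta) / theta.sum()
    d = int(np.searchsorted(contrib, ccr) + 1)
    return V[:, :d], theta[:d]

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 4))            # toy normalized training matrix
L0 = np.zeros((50, 50))                     # zero Laplacian: reduces to PCA
P, theta = cfp_loadings(X, L0)
```

Because `np.linalg.eigh` returns orthonormal eigenvectors, the retained loading vectors are mutually orthogonal, matching the property noted above.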
Based on the loading matrix $P$, the training matrix $X$ is decomposed by the suggested CFP model:

$$X = T P^{T} + E \quad (13)$$

where $T = X P \in \mathbb{R}^{n \times d}$ is the score matrix, which is composed of the derived global and local structure features in the feature space, and $E$ indicates the residual matrix.
When a fault sample $x_{new} \in \mathbb{R}^{m \times 1}$ becomes available, the latent significant features $t_{new}$ of $x_{new}$ are then extracted by projecting $x_{new}$ into the feature space, which is expressed as follows:

$$t_{new} = P^{T} x_{new} \quad (14)$$
After the snapshot dataset is set up based on the gathered n fault samples, the dataset is then normalized using the training matrix X. Thereafter, the built CFP is applied to extract the global and local structure features of the snapshot dataset and the historical fault datasets. To gain improved fault diagnosis effectiveness, these exploited global and local structure features are regarded as the input of the subsequently developed recognition model.
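The normalization and score extraction steps above can be sketched as follows: normalize any new data with the training statistics, then project onto the loading matrix $P$. For self-containment, the sketch builds a stand-in $P$ from ordinary PCA-style eigenvectors; in the actual method, $P$ would come from the CFP eigenproblem instead.

```python
import numpy as np

def fit_scaler(X):
    """Mean/std of the training matrix, reused to normalize later data."""
    return X.mean(axis=0), X.std(axis=0)

def extract_features(X_new, mean, std, P):
    """Normalize with the training statistics, then project onto the
    loading matrix P to obtain the score (feature) matrix T = X P."""
    return ((X_new - mean) / std) @ P

rng = np.random.default_rng(3)
X_train = rng.standard_normal((100, 6)) * np.array([1.0, 2, 3, 1, 2, 3])
mean, std = fit_scaler(X_train)
Xn = (X_train - mean) / std

# stand-in loading matrix: leading eigenvectors of Xn^T Xn (PCA-like)
theta, V = np.linalg.eigh(Xn.T @ Xn)
P = V[:, ::-1][:, :2]                       # two leading loading vectors

T_train = extract_features(X_train, mean, std, P)           # training scores
t_new = extract_features(rng.standard_normal((1, 6)), mean, std, P)
```

Keeping the training mean and standard deviation fixed when normalizing the snapshot and historical fault datasets is what makes the extracted features comparable across datasets.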
6. The Experiments and Comparisons
6.1. Introduction of the Experimental Data
A benchmark power system is modeled in the MATLAB/Simulink environment [3,36,37] to simulate the normal operating and multiple fault datasets, where Simscape Electrical provides the component library to model electronic, mechatronic and electrical power systems. The simulated system is widely applied in power system studies, as discussed in the literature [3,37]. As displayed in Figure 7, the simulated power system includes two symmetrical areas, which are connected by a 220 km transmission line. Besides the transmission line, faults may occur on other components of the simulated power system, such as transformers, generators and incipient short circuits in underground cables. However, because our work focuses on identifying the patterns of short circuit faults on transmission lines, the eleven fault patterns listed in Table 1 are simulated and studied on the 220 km transmission line of the simulated power system, i.e., the region from V7 to V8 and the region from V8 to V9.
The power system is simulated under normal running and short circuit fault conditions on the transmission line. Comprehensive normal and fault datasets are constructed to build the developed EATCN-based fault diagnosis model by measuring and collecting the line voltages and currents of the power system. A total of 12,000 samples are generated and labeled for twelve types of operating conditions, which include one normal operating condition and eleven short circuit fault patterns. Thus, each type of operating condition contains 1000 samples. Before the normalization of the transmission line datasets, Gaussian noise with zero mean and 0.01 variance is added to the monitored variables to simulate actual measurement noise. As listed in Table 1, the eleven simulated fault patterns are {AG, BG, CG, AB, BC, AC, ABG, BCG, ACG, ABC, ABCG}, where the symbols A, B, C and G, respectively, stand for the phases A, B, C and ground. These eleven short circuit fault patterns are classified as line-to-ground (LG) faults, double line (LL) faults, double line-to-ground (LLG) faults, triple line (LLL) faults and triple line-to-ground (LLLG) faults, where only the LLL and LLLG fault patterns are symmetric faults and the remaining are asymmetric faults.
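The measurement-noise step described above can be sketched as follows: zero-mean Gaussian noise with variance 0.01 (standard deviation 0.1) is added elementwise to the clean simulated measurements before normalization. The matrix shape and random seed below are illustrative placeholders, not the actual simulation output.

```python
import numpy as np

def add_measurement_noise(data, variance=0.01, seed=0):
    """Add zero-mean Gaussian noise with the given variance to every
    monitored variable, mimicking sensor measurement noise."""
    rng = np.random.default_rng(seed)
    return data + rng.normal(0.0, np.sqrt(variance), size=data.shape)

clean = np.zeros((12000, 6))        # placeholder measurement matrix
noisy = add_measurement_noise(clean)
```

Note that `np.random.normal` takes a standard deviation, so the 0.01 variance must be passed through a square root.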
To diagnose the fault pattern of the transmission line, the first 400 collected fault samples are utilized to build the snapshot dataset. The remaining 600 fault samples from the same pattern are regarded as the historical fault dataset. The EATCN-based fault diagnosis model is first trained by feeding the mined historical fault dataset’s global and local structure features, which are extracted by the developed CFP approach, and the snapshot dataset’s global and local structure features are then extracted and imported to the established EATCN-based diagnosis model for the purpose of identifying the pattern of detected faults.
6.2. Compared Approaches and Effectiveness Evaluation Index
To train and deploy the proposed EATCN, the experiments are conducted in the MATLAB R2020a computational environment, running on a computer with an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz and 16.0 GB (15.7 GB usable) of installed RAM. In addition, to prove the effect of the proposed EATCN-based diagnosis approach, some traditional fault diagnosis methods, i.e., the support vector machine (SVM), the deep belief network (DBN) and the long short-term memory (LSTM) network, are contrasted with the suggested EATCN. The global and local structure features derived by the constructed CFP are imported to the SVM, DBN and LSTM.
To train the EATCN-based diagnosis model, the tradeoff parameter is chosen as 0.5 by experience, and the threshold of the cumulative contribution rate is set to 95% through trial and error. The learning rate is chosen as 0.001, the batch size is 64, the number of hidden units is 500, and the expansion factor is 2, according to cross-validation. During the LSTM training, the number of hidden units is set to 300, the batch size is 64 and a learning rate of 0.001 is utilized, determined through trial and error. In addition, the optimal model parameters are determined using the Adam optimizer. The node numbers of the DBN's three hidden layers are, respectively, selected as 32, 64 and 128 through trial and error. The batch size and the learning rate in the DBN are also, respectively, chosen as 64 and 0.001 for fairness. In the SVM, the Gaussian kernel function is adopted, and the parameter is set to 600 using the grid search method. In addition, the weight factor of the SVM is experientially determined as 50.
To assess the performance of the discussed EATCN for transmission line fault diagnosis, four performance indices are utilized: the fault diagnosis rate FDR_i of the fault samples in the i-th pattern, the average fault diagnosis rate FDR_average over all C patterns, the precision P_i for the i-th pattern and the average precision P_average over all C patterns.
Particularly, the index $FDR_i$ is defined as

$$FDR_i = \frac{N_i^{c}}{N_i} \times 100\% \quad (17)$$

where $N_i^{c}$ denotes the number of correctly diagnosed fault samples for the $i$-th pattern and $N_i$ is the total number of fault samples in the $i$-th pattern's snapshot dataset.

The index $FDR_{average}$ expressed in Equation (18) indicates the average value of all the acquired fault diagnosis rates for the $C$ fault snapshot datasets:

$$FDR_{average} = \frac{1}{C} \sum_{i=1}^{C} FDR_i \quad (18)$$
The precision $P_i$ for the $i$-th pattern is defined as:

$$P_i = \frac{N_i^{c}}{N_i^{c} + N_i^{w}} \times 100\% \quad (19)$$

where $N_i^{w}$ represents the number of fault samples wrongly identified as the $i$-th pattern.

The index average precision $P_{average}$ expressed in Equation (20) indicates the average value of all the computed precisions for the $C$ fault snapshot datasets:

$$P_{average} = \frac{1}{C} \sum_{i=1}^{C} P_i \quad (20)$$
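The four indices reduce to per-pattern recall (FDR) and precision plus their averages, which can be computed from true and predicted labels as sketched below; the label vectors are toy examples with two patterns.

```python
import numpy as np

def diagnosis_metrics(y_true, y_pred, num_patterns):
    """Per-pattern fault diagnosis rate (recall) and precision, plus
    their averages over all patterns, in percent."""
    fdr, prec = [], []
    for i in range(num_patterns):
        correct = np.sum((y_true == i) & (y_pred == i))
        fdr.append(100.0 * correct / np.sum(y_true == i))    # FDR_i
        prec.append(100.0 * correct / np.sum(y_pred == i))   # P_i
    return fdr, np.mean(fdr), prec, np.mean(prec)

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # toy ground-truth patterns
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # toy model predictions
fdr, fdr_avg, prec, prec_avg = diagnosis_metrics(y_true, y_pred, 2)
```

Here pattern 0 has 3 of 4 samples diagnosed correctly (FDR 75%) while all 3 samples predicted as pattern 0 are correct (precision 100%), illustrating why both indices are reported.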
6.3. Comparison of the Fault Diagnosis Results
(1) Fault diagnosis results comparison for the pattern ABC

After the snapshot dataset S_ABC of the short circuit fault ABC is gathered, the index FDR values of the SVM, DBN, LSTM and EATCN for the dataset S_ABC are computed as 64.25%, 71.75%, 79.00% and 93.50%, respectively. It is observed that the SVM displays the worst fault identification effect, while the DBN and LSTM have better FDR values, i.e., 71.75% and 79.00%, which still need to be enhanced. Different from the SVM, DBN and LSTM, the EATCN gains the best recognition capability for the dataset S_ABC with the largest FDR value, i.e., 93.50%. This is due to the EATCN's outstanding performance in exploiting and classifying the global and local structure features contained in the running data of the transmission line. The histogram of the fault identification results for these four fault diagnosis models is illustrated in Figure 8, which clearly reveals that the diagnosis effectiveness of the discussed EATCN is much better than that of the SVM, DBN and LSTM for discerning the fault pattern ABC.
(2) Fault diagnosis results comparison for the pattern BC

The index FDR values of the SVM, DBN, LSTM and EATCN for the short circuit fault BC turn out to be 76.50%, 70.75%, 81.75% and 92.50%, respectively. Obviously, these four diagnosis models produce different fault recognition results when discerning the dataset S_BC. Compared with the LSTM and EATCN, the DBN and SVM exhibit a much less satisfactory fault identification effect in diagnosing the fault pattern BC. In contrast, the LSTM reveals an improved diagnosis effect, with an FDR value of 81.75%. Finally, the EATCN reveals the best fault recognition effect, as its FDR value is 92.50%. To make a more vivid comparison, the index FDR values of the SVM, DBN, LSTM and EATCN are plotted in the histogram in Figure 9. According to the above analysis, the advantage of the EATCN over the LSTM, DBN and SVM is fully certified when identifying the pattern BC.
(3) Fault diagnosis results comparison for the pattern ABG

After the fault snapshot dataset S_ABG is constructed, the SVM-, DBN-, LSTM- and EATCN-based fault diagnosis models are employed to identify the pattern of the short circuit fault ABG by computing the performance index FDR. To be specific, the index FDR values of the SVM, DBN, LSTM and EATCN are, respectively, 71.50%, 83.50%, 87.25% and 94.25%, which proves that the EATCN possesses the highest FDR value among these four fault diagnosis models. For a more visualized analysis, the four different FDR values are further represented by a bar chart in Figure 10. It can thus be concluded that the EATCN-based diagnosis model outperforms the LSTM, DBN and SVM in terms of recognizing the fault pattern ABG.
(4) Fault diagnosis results comparison for the eleven patterns

After the above-introduced eleven fault patterns are detected, the fault diagnosis results of the SVM, DBN, LSTM and EATCN for these eleven fault patterns are established and exhibited in Figure 11 and Figure 12. Specifically, Figure 11 provides the confusion matrices of these four fault diagnosis approaches. In Figure 11, the percentages in the dark orange blocks denote the percentages of accurately diagnosed fault samples, while the percentages in the shallow orange blocks stand for the percentages of mistakenly diagnosed fault samples. As displayed in Figure 11a–c, the percentages in the shallow orange blocks of the sixth row for the SVM, DBN and LSTM are much greater than those in Figure 11d for the EATCN. This demonstrates that more fault data points pertaining to the fault AC are mistakenly identified by the SVM, DBN and LSTM. Moreover, in comparison with Figure 11d, many more shallow orange blocks appear in Figure 11a–c. This means that the SVM, DBN and LSTM inaccurately discern many more fault data points of these eleven faults than the EATCN does. For further graphical analysis and comparison, line charts of the index FDR values for the four approaches under the eleven fault patterns are exhibited in Figure 12. As revealed in Figure 12, the fault diagnosis rates of the EATCN are significantly improved compared with those of the SVM, DBN and LSTM. To be specific, the EATCN's index FDR values for the eleven fault patterns are all above 92.00%, and the largest FDR value even reaches 98.75%. As displayed in Figure 12, the great differences between the index FDR values of the four approaches prove the superiority of the EATCN for implementing the transmission line fault diagnosis.
The fault diagnosis rates of the SVM, DBN, LSTM and EATCN for the eleven fault patterns are further quantified in Table 2. In addition, for the sake of fairness, the index FDR_average values of the four diagnosis models on the eleven fault patterns are also exhibited in Table 2. From Table 2, the index FDR_average values of the SVM, DBN, LSTM and EATCN are, respectively, computed as 73.68%, 80.75%, 86.64% and 94.98%. Thus, the EATCN-based identification approach attains the largest FDR_average value for all eleven fault patterns among the four approaches, which testifies to the superiority of the EATCN's overall fault recognition effectiveness. Furthermore, in comparison with the SVM, DBN and LSTM, the suggested EATCN also exhibits more remarkable diagnosis performance in discerning particular faults among the eleven fault patterns. For example, the index FDR value of the fault pattern AB is 94.75% for the EATCN, in contrast to only 85.50% for the LSTM, 79.25% for the DBN and 70.50% for the SVM. Analogously, the index FDR value of the fault pattern ABCG is 97.25% for the EATCN, in comparison with only 89.25% for the LSTM, 84.25% for the DBN and even 80.00% for the SVM. It can be concluded that the presented EATCN approach is excellent at recognizing the short circuit fault patterns of the transmission line. This is because the mined global and local structure features promote an improvement in the transmission line's fault identification task. To facilitate further visualized analysis, the index FDR values of the four algorithms under the eleven fault patterns are plotted in a histogram in Figure 13, which also proves the outstanding recognition performance of the EATCN over the SVM, DBN and LSTM in discerning all eleven short circuit faults.
The index precision P values of the SVM, DBN, LSTM and EATCN are listed in Table 3. In addition, the index P_average values of the four diagnosis models are also exhibited in Table 3. From Table 3, the index P_average values of the SVM, DBN, LSTM and EATCN are, respectively, computed as 73.71%, 80.77%, 86.68% and 95.01%. Thus, the EATCN-based identification approach attains the largest P_average value for all eleven fault patterns, which testifies to the superiority of the EATCN's overall fault recognition effectiveness. In comparison with the SVM, DBN and LSTM, the suggested EATCN displays more remarkable diagnosis performance in discerning particular faults among the eleven fault patterns. For example, the precision value of the fault pattern ACG is 94.27% for the EATCN, in contrast to only 89.95% for the DBN, 84.92% for the LSTM and 82.03% for the SVM. Analogously, the precision value of the fault pattern AB is 93.81% for the EATCN, in comparison with only 85.71% for the LSTM, 81.28% for the DBN and 75.40% for the SVM. It can be concluded that the presented EATCN is excellent at recognizing the short circuit fault patterns of the transmission line.
6.4. Fault Diagnosis Effects of the Proposed EATCN under Different Noise Environments
To further verify the fault diagnosis effects of the proposed EATCN-based model under different noise environments, Gaussian noise with zero mean and different variances is introduced to the monitored variables. The specific variances of the Gaussian noise are set to 0.1, 0.01, 0.001 and 0.0001 through experience. In this way, the EATCN's fault diagnosis effects under different noise environments are tested and displayed in Table 4 and Table 5.
To be specific, Table 4 lists the EATCN's FDR and FDR_average values, while Table 5 exhibits the EATCN's P and P_average values for the eleven fault patterns, with the noise variance varying from 0.1 to 0.0001. When the noise variance is 0.1, the largest in our experiment, the EATCN achieves its worst fault diagnosis performance, as the FDR_average and P_average values are both the smallest, i.e., 92.23% and 92.61%, respectively. Nevertheless, the EATCN's FDR_average and P_average values in this largest-noise-variance environment are still acceptable, as both remain above 92.00%. As the noise variance decreases, the EATCN's fault diagnosis effect improves steadily. However, when the noise variance decreases from 0.001 to 0.0001, the diagnosis effectiveness of the EATCN improves only slightly, as the FDR_average merely varies from 96.55% to 97.20% and the P_average only increases from 96.71% to 97.55%. Based on the above analysis, the developed EATCN shows outstanding accuracy and robustness under different noise environments.