1. Introduction
The oil-immersed transformer serves as the central component of the power grid, playing a crucial role in the transmission of electrical power. When a transformer malfunctions, it poses a dual risk to public safety and the economy, incurring high costs for equipment repairs and significant financial losses from power disruptions. Hence, internal transformer faults must be identified precisely and in advance to ensure the secure and reliable operation of the power grid system [1].
Many electrical power experts developed diagnostic methods for transformers starting in the 1960s. These traditional methods include vibration testing, infrared testing, voltage and current measurements, and dissolved gas analysis (DGA) in oil [2]. Among these, dissolved gas analysis determines the type of transformer fault primarily from the relative concentrations of gases such as H2, CH4, C2H6, C2H4, and C2H2 produced during insulation failure or aging [3]. However, a limitation of this method is that the diagnostic results rely on expert experience and observation of equipment parameters. With the emergence of artificial intelligence, various machine learning methods, including Support Vector Machines (SVMs), Multi-Layer Perceptrons (MLPs), and Extreme Learning Machines (ELMs), have been applied to transformer fault diagnosis.
Jin et al. [4] enhance the BP neural network by stacking multiple residual network modules and introducing an SVM to evaluate the feature vectors extracted from each layer. Ivanov and Palyukh [5] present a method for creating a training dataset specifically tailored to fuzzy neural networks, which enables rapid probability estimation of abnormal critical events or accident causes within a transformer diagnostic system. Kari et al. [6] employ a deep belief network to extract features from dissolved gases in oil and perform mean clustering for transformer fault classification. Song et al. [7] propose a novel transformer fault diagnosis model called Meta–OSSA–KELM, which combines meta-learning with a Kernel Extreme Learning Machine and an opposition-based-learning Sparrow Search Algorithm; by integrating chaos mapping and opposition-based learning into the Sparrow Search Algorithm, their experiments show that the model provides a stable, high-performance approach to transformer fault diagnosis.
While the above methods have achieved useful results, determining the optimal structure and hyperparameters of the diagnostic model remains challenging, limiting further improvements in diagnostic performance. Furthermore, uncertainty regarding the types and number of features contained in fault samples restricts the widespread application of deep learning and other neural network methods [8]. Moreover, the input features of fault diagnosis models typically rely on IEC or IEEE standard gas concentrations, gas ratios, or relative percentages; nevertheless, a universally accepted feature set for diagnosing faults in oil-immersed transformers remains elusive. Consequently, previous studies have suffered from an excess of gas feature types and, as a result, inadequate feature analysis [9].
These shortcomings leave significant room for improving fault diagnosis performance. To address the two issues above, this paper proposes a transformer fault diagnosis model based on feature selection and IPSO–BP–AdaBoost. The model first establishes an optimal feature set, motivated by the differences in detection accuracy caused by varying numbers of input features, using the Random Forest (RF) method for feature selection. It then applies the multi-strategy Improved Particle Swarm Optimization (IPSO) method to optimize the key parameters of the backpropagation neural network (BPNN). Finally, the AdaBoost algorithm is applied to enhance the robustness and generalization of the diagnosis model.
2. Feature Selection Based on Random Forest
In the case of a transformer failure, gases of different compositions and concentrations are released, and the volume fractions of specific gases increase rapidly, mainly methane (CH4), ethane (C2H6), ethylene (C2H4), acetylene (C2H2), and hydrogen (H2). The relationship between dissolved gases in oil and faults is as follows:
- (1) When high-energy discharge occurs inside the transformer, the characteristic gases produced in the oil are mainly C2H2 and H2, followed by C2H4, CH4, and C2H6.
- (2) When low-energy discharge occurs inside the transformer, the dissolved gases in the oil are mainly C2H2 and H2, followed by CH4 and C2H4.
- (3) When partial discharge occurs inside the transformer, the characteristic gases produced in the oil vary with the discharge energy density. When the discharge energy density is low, the total hydrocarbon content is generally not high; the main component is H2, followed by CH4.
- (4) When the temperature at the fault point is low, the proportion of CH4 is high. As the hot-spot temperature increases (above 500 °C), the content of C2H4 and H2 rises sharply, with the C2H4 content exceeding that of CH4 but the H2 content generally not exceeding 30% of the total hydrogen-hydrocarbon content [10].
Considering the large dispersion of gas contents, the diagnostic accuracy of the traditional ratio method is only about 70% [11]. Therefore, this study constructed a comprehensive and diverse feature set. By surveying domestic and international standards and publications on dissolved gas analysis, the generation rules of dissolved gases in transformer oil under different working conditions were summarized, and non-code gas ratios were introduced into fault diagnosis. Ref. [12] verified that using non-code ratios as the input of a fault diagnosis model can effectively improve the accuracy of transformer fault diagnosis.
In this study, the fault sample dataset was collected from Refs. [13,14] and the IEC TC10 database; a total of 596 DGA samples with known transformer operating states were selected. Six sample classes are considered: normal condition (NC), partial discharge (PD), low-energy discharge (D1), high-energy discharge (D2), medium- and low-temperature thermal defect (T1), and high-temperature thermal defect (T2), as shown in Table 1.
H2, CH4, C2H6, C2H4, and C2H2 are selected as the original gases, and 26 gas features are obtained by combining the gas concentrations, the three-ratio method, and the non-code ratio method, as shown in Table 2.
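To make the feature construction concrete, the following Python sketch computes the eight features that are ultimately selected (Table 3) from the five gas concentrations. The definitions TH = CH4 + C2H6 + C2H4 + C2H2 (total hydrocarbons) and TG = TH + H2 (total gas) are assumptions based on standard DGA practice, as the extracted table does not define them.

```python
def dga_features(h2, ch4, c2h6, c2h4, c2h2, eps=1e-12):
    """Compute the eight optimal DGA features (Table 3) from gas concentrations (ppm).
    TH (total hydrocarbons) and TG (total gas) are assumed definitions."""
    th = ch4 + c2h6 + c2h4 + c2h2      # assumed: total hydrocarbons
    tg = th + h2                       # assumed: total gas
    return {
        "C2H2/C2H4": c2h2 / (c2h4 + eps),
        "C2H2/TH": c2h2 / (th + eps),
        "TH": th,
        "(H2 + C2H4)/TG": (h2 + c2h4) / (tg + eps),
        "H2": h2,
        "C2H6/TH": c2h6 / (th + eps),
        "C2H4/TH": c2h4 / (th + eps),
        "C2H2": c2h2,
    }
```

The remaining ratios of Table 2 follow the same pattern of dividing single gases or gas sums by TH or TG.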
Some of the 26 gas features may not be directly relevant to fault diagnosis, potentially introducing redundancy. In addition, a high-dimensional dataset increases the running time of the classification model. Therefore, it is crucial to determine an informative and concise subset of features. In this study, the out-of-bag (OOB) error rate in Random Forest was used to calculate the importance of each feature; a higher ranking indicates a greater contribution of the feature to fault diagnosis. The OOB error rate is the model error rate calculated on the out-of-bag data: each decision tree predicts its corresponding out-of-bag samples, and the proportion of misclassified samples is computed [15]. The specific process is as follows.
- (1) For each decision tree in the Random Forest, calculate its out-of-bag error rate, denoted as $E_1$.
- (2) Select a specific feature $X_j$ and randomly add noise to (i.e., randomly permute) the values of this feature across all out-of-bag samples.
- (3) After introducing the noise interference to feature $X_j$, calculate the out-of-bag error rate again, denoted as $E_2$.
- (4) The importance of feature $X_j$ is measured by the difference between $E_2$ and $E_1$; specifically, it is expressed as the mean of $E_2 - E_1$ over all trees. The greater this value, the greater the influence of feature $X_j$ on the model's prediction and therefore the greater its importance.
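A minimal sketch of this procedure, assuming a scikit-learn-style setup; the bootstrap is drawn manually so that each tree's OOB indices are explicit rather than relying on Random Forest internals.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rf_oob_importance(X, y, n_trees=400, min_leaf=8, seed=0):
    """Feature importance as the mean OOB error increase (E2 - E1) per tree."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    imp, used = np.zeros(d), 0
    for t in range(n_trees):
        boot = rng.integers(0, n, n)               # bootstrap sample (step 1)
        oob = np.setdiff1d(np.arange(n), boot)     # out-of-bag indices
        if oob.size == 0:
            continue
        tree = DecisionTreeClassifier(min_samples_leaf=min_leaf,
                                      max_features="sqrt",
                                      random_state=t).fit(X[boot], y[boot])
        e1 = np.mean(tree.predict(X[oob]) != y[oob])    # OOB error E1
        for j in range(d):                              # steps 2-3: permute feature j
            Xp = X[oob].copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            e2 = np.mean(tree.predict(Xp) != y[oob])    # perturbed OOB error E2
            imp[j] += e2 - e1
        used += 1
    return imp / max(used, 1)   # step 4: mean of (E2 - E1); larger = more important
```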
In the experiment, the RF algorithm was run 30 times to eliminate randomness, and the average importance of the 26 input features was calculated and ranked. The features were then used as inputs sequentially, both in increasing and in decreasing order of rank, until all features were considered. The feature importances and the corresponding diagnostic accuracies are depicted in Figure 1. In this research, the minimum leaf size is set to 8 and the number of decision trees to 400.
The red curve illustrates how diagnostic accuracy evolves as features are added, whereas the blue curve shows how accuracy changes as features are excluded. As can be seen from the red curve, the diagnostic accuracy increases from 51% to 87% as the first eight features are added sequentially, indicating that these gas combinations contribute greatly to the accuracy of the results. After the eighth feature, however, the accuracy remains relatively stable at around 83% ± 3%, suggesting that further additions do not significantly increase accuracy and that the additional features may be considered redundant. The blue curve shows that diagnostic accuracy gradually improves as the number of features decreases from 26 to 14; as the number of features declines further, the diagnostic performance fluctuates slightly but remains at about 85%.
These results indicate that not all gas combinations are crucial for an accurate fault diagnosis model. The inclusion of features 9 to 26 did not significantly enhance diagnostic accuracy, and removing some features could even improve it. In summary, this research selects a subset of the eight most critical gas features, which reduces the workload of the model while providing optimal diagnostic performance, as shown in Table 3.
3. BPNN Model
The backpropagation neural network (BPNN) is a multi-layer feedforward neural network trained with the error backpropagation algorithm. It comprises an input layer, hidden layers, and an output layer, offers classification capability for arbitrarily complex problems, and provides excellent multi-dimensional function mapping; notably, it solves the XOR problem, which simple perceptrons cannot handle. Its workflow consists of forward propagation and backpropagation, with the forward propagation given by

$$h_j = f\Big(\sum_{i=1}^{n} \omega_{ij} x_i + b_j\Big), \qquad y_k = f\Big(\sum_{j=1}^{m} \omega_{jk} h_j + b_k\Big)$$

where $x_i$ represents the input variable; $y_k$ represents the output variable; $h_j$ is the hidden-layer output; $f$ is the mapping relationship of the activation function; $\omega_{ij}$ is the weight between the $i$th input variable and the $j$th hidden-layer neuron; $b_j$ is the bias of the $j$th hidden-layer neuron; $\omega_{jk}$ is the weight between the $j$th hidden-layer neuron and output neuron $k$; and $b_k$ is the bias of output neuron $k$. The loss function is the mean squared error between the real and predicted values, and the objective function is

$$E = \frac{1}{N}\sum_{i=1}^{N}\big(\hat{y}_i - y_i\big)^2$$

where $\hat{y}_i$ is the predicted value and $y_i$ is the actual value. During backpropagation, gradient descent adjusts the weights and biases of each neuron according to the error. The update formulas for the output-layer weights and biases are

$$\omega_{jk} \leftarrow \omega_{jk} - \eta\,\delta_k h_j, \qquad b_k \leftarrow b_k - \eta\,\delta_k$$

where $\eta$ is the learning rate, which determines how quickly the parameters move toward the optimum, and $\delta_k$ is the error term of output neuron $k$.
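As an illustration of the forward and backward passes above, here is a minimal numpy sketch of one gradient step for a single-hidden-layer network with sigmoid activations; the paper's network uses two hidden layers, and the same pattern repeats per layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bpnn_step(x, y, W1, b1, W2, b2, lr=0.1):
    """One forward/backward pass for a single-hidden-layer BPNN (MSE loss)."""
    # Forward propagation
    h = sigmoid(W1 @ x + b1)          # hidden-layer output
    y_hat = sigmoid(W2 @ h + b2)      # network output
    # Backpropagation (gradient descent on 0.5 * ||y_hat - y||^2)
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)   # output-layer error term
    delta_hid = (W2.T @ delta_out) * h * (1 - h)    # hidden-layer error term
    W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, x); b1 -= lr * delta_hid
    return y_hat
```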
In addition, a BPNN typically contains one or more hidden layers that introduce nonlinear transformations to approximate complex nonlinear functions. Adding a hidden layer enhances the nonlinear fitting ability and expressive power of the model. For some complex problems, a single hidden layer may not suffice to capture the underlying structure of the data, whereas two hidden layers can provide a richer feature representation [16]. The backpropagation neural network structure used in this paper is shown in Figure 2.
4. Theoretical Basis of AdaBoost Algorithm
AdaBoost is an ensemble learning method that assigns initial weights to the training samples, trains the first weak classifier, and dynamically adjusts the sample weights based on the classification accuracy of that weak classifier, paying more attention to misclassified samples [17,18]. After each round, the weight of the weak classifier is adjusted based on its weighted error $e_t$. The weak classifier weight is calculated as

$$\alpha_t = \frac{1}{2}\ln\Big(\frac{1 - e_t}{e_t}\Big)$$

Before generating the next base classifier, AdaBoost amplifies the weights of the misclassified samples while reducing the weights of the correctly classified samples, so that the next iteration focuses more on the misclassified samples [19,20]. The sample weights are updated as

$$D_{t+1}(i) = \frac{D_t(i)}{Z_t}\exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)$$

where $Z_t$ is a normalization factor. Finally, a strong classifier is obtained from the weights of all weak classifiers. With the number of weak classifiers set to $T$, the strong classifier outputs

$$H(x) = \operatorname{sign}\Big(\sum_{t=1}^{T}\alpha_t\, h_t(x)\Big)$$

where $\alpha_t$ represents the weight of each weak classifier and $h_t(x)$ represents the output of each weak classifier.
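The following sketch shows the boosting loop corresponding to the three formulas above. Because the diagnosis task has six classes, it uses the multi-class SAMME form of the classifier weight (the formulas above are the binary form); `train_weak` is a hypothetical helper that fits a weak classifier (here, a BPNN) on weighted samples and returns a prediction function.

```python
import numpy as np

def adaboost_train(X, y, train_weak, T=10):
    """SAMME-style AdaBoost loop for labels y in {0..K-1}."""
    n = len(y)
    D = np.full(n, 1.0 / n)              # initial sample weights
    K = len(np.unique(y))
    learners, alphas = [], []
    for t in range(T):
        h = train_weak(X, y, D)          # weak classifier fitted on weighted samples
        pred = h(X)
        err = np.clip(np.sum(D * (pred != y)), 1e-10, 1 - 1e-10)   # weighted error e_t
        alpha = np.log((1 - err) / err) + np.log(K - 1)            # classifier weight
        D *= np.exp(alpha * (pred != y))                           # boost misclassified
        D /= D.sum()                                               # normalize
        learners.append(h); alphas.append(alpha)
    def strong(Xq):                      # weighted vote of all weak outputs
        votes = np.zeros((len(Xq), K))
        for a, h in zip(alphas, learners):
            votes[np.arange(len(Xq)), h(Xq)] += a
        return votes.argmax(axis=1)
    return strong
```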
The model structure based on BP–AdaBoost is shown in Figure 3.
5. The Improved Particle Swarm Optimization Algorithm Used in This Research
The performance of a backpropagation neural network is often highly sensitive to its initial weights: inappropriate initial weights may cause the network to converge to suboptimal solutions or fail to converge altogether [21]. To address this issue, this paper uses an enhanced IPSO algorithm to optimize the weights and biases of the neurons in the BP network. Taking the prediction error (the complement of the prediction accuracy) as the objective function, the IPSO algorithm minimizes the fitness values of the particles while achieving rapid convergence.
5.1. The Basic Principle of Particle Swarm Optimization
In a D-dimensional target search space, N particles form a community. Each particle is a D-dimensional vector whose spatial position can be represented as

$$X_i = (x_{i1}, x_{i2}, \ldots, x_{iD}), \quad i = 1, 2, \ldots, N$$

The flight velocity of the $i$th particle is also a D-dimensional vector, denoted as

$$V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$$

The best position found by the $i$th particle is called the individual optimal position, and the best position found by the population is called the global optimal position; they are respectively denoted as

$$P_i = (p_{i1}, p_{i2}, \ldots, p_{iD}), \qquad P_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$$

In the traditional particle swarm optimization algorithm, a particle's position is determined by its position and velocity in the previous iteration. Each particle exhibits both individual and collective behavior, adjusting its velocity and position based on the historical optima of the individual and of the population [22]. The velocity update formula is

$$v_{id}^{t+1} = \omega\, v_{id}^{t} + c_1 r_1 \big(p_{id} - x_{id}^{t}\big) + c_2 r_2 \big(p_{gd} - x_{id}^{t}\big)$$

and the position update formula is

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}$$

where $\omega$ is the inertia weight, $c_1$ and $c_2$ are the individual and social learning factors, and $r_1$ and $r_2$ are random numbers in [0, 1].
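A compact sketch of one PSO iteration implementing the two update formulas above; the bound handling by clipping is an assumption.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0, lb=-1.0, ub=1.0):
    """One velocity/position update for all N particles (rows of x)."""
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # velocity update
    x = np.clip(x + v, lb, ub)                                  # position update (clipped)
    return x, v
```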
Based on these formulas, it is evident that the initialization stage plays a crucial role in the optimization performance of a swarm intelligence algorithm: the random distribution of the particle swarm and its uncertain quality can lead to local optima in early iterations. Furthermore, subsequent iterations struggle to converge quickly because there is no mutation mechanism. To address these issues, this paper proposes four collaborative optimization strategies that enhance global search and local exploration, accelerate convergence, and improve the algorithm's accuracy.
5.2. SPM Chaos Mapping Based on Opposition-Based Learning
In the traditional particle swarm optimization algorithm, population initialization is highly random, often yielding poorly performing particles. Chaos mapping, known for its excellent space ergodicity, can distribute particles more evenly [23] and is commonly employed in swarm algorithms. Among these methods, the SPM chaotic function has been shown to enhance the diversity of particles in the population [24], so this paper adopts the SPM function as the chaotic map. In the SPM sequence, $x_{i+1}$ denotes the $(i+1)$th particle, and with suitable control-parameter settings the system is in a chaotic state. The generated initial solutions are mapped into the range defined by the upper and lower bounds $ub$ and $lb$ of the optimized weights and biases. The particle distributions and spectrograms of the SPM chaotic function and three other chaotic functions are shown in Figure 4.
When the population size is large, chaos mapping significantly enhances the convergence efficiency of the algorithm; however, when the population size is small, it is often difficult to obtain points of good quality. Therefore, this paper introduces an opposition-based learning (OBL) strategy [25], which aims to find as many high-quality points as possible to accelerate convergence. During population initialization, N solutions are first generated using SPM mapping, and N opposite solutions are then generated by opposition-based learning. Fitness evaluation and sorting are performed on all 2N solutions, and the N solutions with the best fitness are retained. This approach enhances the diversity and exploration capability of the population during the search, helps the algorithm escape local optima, and accelerates convergence; its effectiveness has been validated in improvements to various swarm intelligence optimization algorithms.
The SPM chaotic function is then reapplied to distribute the selected solutions evenly in the solution space. The opposition-based learning formula is

$$\tilde{x}_i = r\,(lb + ub) - x_i$$

where $lb$ and $ub$ are the lower and upper bounds of the solution variables, respectively, and $r$ is a random number in the range [0, 1].
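A sketch of the opposition-based initialization described above; `chaotic_sample` stands in for the SPM map (whose exact formula is not reproduced here) and is assumed to return values in [0, 1], and `fitness` is the objective being minimized.

```python
import numpy as np

def obl_init(N, dim, lb, ub, fitness, chaotic_sample):
    """Generate N chaotic solutions plus N opposites, keep the N best."""
    pop = lb + (ub - lb) * chaotic_sample(N, dim)   # chaotic initial solutions
    r = np.random.rand(N, dim)
    opp = r * (lb + ub) - pop                       # opposite solutions (OBL formula)
    cand = np.vstack([pop, opp])                    # 2N candidates in total
    f = np.array([fitness(c) for c in cand])
    return cand[np.argsort(f)[:N]]                  # top N by (lowest) fitness
```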
5.3. Nonlinear Adaptive Weight
According to the particle velocity update formula, the influence of $v_{id}^{t}$ on $v_{id}^{t+1}$ depends mainly on the inertia weight $\omega$. Therefore, this paper introduces a new nonlinear function to calculate $\omega$. During the initial phase of the iteration, $\omega$ should be relatively large with a small rate of change, enabling particles to explore widely across the entire solution space; during the later stages, $\omega$ should be smaller but change quickly, which facilitates local exploration and accelerates convergence. Equation (17), the proposed nonlinear schedule, possesses exactly these properties and thus satisfies the needs of both global and local particle exploration.
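For illustration only, the schedule below has the qualitative shape just described (large with a small slope early, dropping quickly late); it is a hypothetical stand-in, not the paper's Equation (17).

```python
def nonlinear_weight(t, T, w_max=0.9, w_min=0.1, k=2):
    """Illustrative nonlinear inertia-weight schedule (NOT the paper's Eq. (17)):
    near w_max with a small slope early, then decreasing rapidly late."""
    return w_max - (w_max - w_min) * (t / T) ** k
```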
Figure 5 compares the curve of $\omega$ over the iterations before and after the improvement.
5.4. Crossover Mutation Strategy
To enhance the diversity of the population and avoid becoming stuck in suboptimal solutions, this paper introduces a crossover mutation strategy applied to the population and the global optimal solutions [26]. In the early stages of iteration, when the individual positions in the swarm are still relatively dispersed, horizontal crossover further increases particle diversity, facilitates global search, and prevents premature convergence to local optima. The horizontal crossover is given by

$$x_{i,d}^{hc} = r_1\, x_{i,d} + (1 - r_1)\, x_{j,d} + c_1 \big(x_{i,d} - x_{j,d}\big)$$
$$x_{j,d}^{hc} = r_2\, x_{j,d} + (1 - r_2)\, x_{i,d} + c_2 \big(x_{j,d} - x_{i,d}\big)$$

where $x_{i,d}$ and $x_{j,d}$ represent parents $i$ and $j$ in dimension $d$, respectively; $x_{i,d}^{hc}$ and $x_{j,d}^{hc}$ are the offspring of $x_{i,d}$ and $x_{j,d}$ after crossing in the $d$th dimension; and $r_1$ and $r_2$ are random numbers in the range [0, 1] that control the weight distribution in the crossover operation. In horizontal crossover, these random numbers determine the degree of information mixing between the parent particles: $r_1$ controls the linear combination ratio between $x_{i,d}$ and $x_{j,d}$, while $r_2$ controls the linear combination ratio of the other offspring. Because $r_1$ and $r_2$ are random, each crossover operation generates different offspring particles, which helps increase population diversity. $c_1$ and $c_2$ are random numbers in the range [−1, 1] that introduce disturbance, or mutation, into the crossover operation; this disturbance helps the algorithm jump out of local optima and explore a broader search space. In the formula, $c_1 (x_{i,d} - x_{j,d})$ is a disturbance term that adjusts the offspring according to the difference between $x_{i,d}$ and $x_{j,d}$ and the random value of $c_1$. Through the randomness of $r_1$ and $r_2$, each crossover operation generates different offspring, and the disturbance introduced by $c_1$ and $c_2$ makes the offspring differ from their parents, further increasing the diversity of the population.
In the later stages of iteration, the particle swarm may gradually converge around local optima. Vertical crossover can break the positional inertia of particles and guide them toward new search areas by performing crossover operations across different dimensions [27]. The vertical crossover is given by

$$x_{i,d_1}^{vc} = r\, x_{i,d_1} + (1 - r)\, x_{i,d_2}$$

where $x_{i,d_1}^{vc}$ is generated by the vertical crossover of particle $i$ in dimensions $d_1$ and $d_2$, and $r$ is a random number in the range [0, 1]. By employing both horizontal and vertical crossover, particles share information and update their positions across many dimensions and particles, enabling them to explore the search space thoroughly.
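A sketch of both crossover operators under the crisscross-style form reconstructed above.

```python
import numpy as np

def horizontal_crossover(xi, xj):
    """Horizontal crossover of two parent particles (assumed crisscross form)."""
    d = xi.shape[0]
    r1, r2 = np.random.rand(d), np.random.rand(d)                       # weights in [0, 1]
    c1, c2 = np.random.uniform(-1, 1, d), np.random.uniform(-1, 1, d)   # disturbance in [-1, 1]
    oi = r1 * xi + (1 - r1) * xj + c1 * (xi - xj)
    oj = r2 * xj + (1 - r2) * xi + c2 * (xj - xi)
    return oi, oj

def vertical_crossover(x, d1, d2):
    """Vertical crossover between two dimensions of one particle (assumed form)."""
    r = np.random.rand()
    child = x.copy()
    child[d1] = r * x[d1] + (1 - r) * x[d2]
    return child
```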
5.5. Spiral Search Strategy
Inspired by the prey-encircling behavior of the leading whales in the Whale Optimization Algorithm (WOA), in which individual whales update their positions relative to the prey along a spiral path, a spiral search strategy both preserves convergence speed and enhances individual diversity [28]. In the early stages of the search phase, this paper assigns a higher probability to the traditional PSO particle update strategy to improve convergence speed, and in the later stages it selects the spiral search method with higher probability to accelerate convergence [29,30]. The spiral update follows the WOA form

$$x_{id}^{t+1} = \big| p_{gd} - x_{id}^{t} \big| \cdot e^{Z l} \cdot \cos(2\pi l) + p_{gd}$$

where $l$ is a random number in the range [−1, 1]. The spiral parameter $Z$ cannot remain constant, because a fixed value limits the search to a monotonic pattern, which may trap particles in local optima and impair the overall search capability of the algorithm. Consequently, $Z$ is designed as an adaptive variable that dynamically adjusts the spiral shape of the particle search trajectories, enhancing the particles' exploration of unknown regions and ultimately improving the algorithm's search efficiency and global search performance. $Z$ is computed from a variation coefficient and varies with the iteration count between its minimum and maximum values: when the iteration count approaches its maximum and $Z$ approaches its maximum value, the spiral term is large and strongly perturbs the particle position; conversely, when $Z$ approaches its minimum value, the spiral term is small and the particle position is only slightly affected.
Through the above stages, the whole IPSO procedure is complete, and its flowchart is shown in Figure 6.
6. Ablation Experiment
To assess the impact of the four strategies on the performance of the particle swarm model, this paper compares traditional PSO with improved versions that incorporate these strategies incrementally, each coupled with the backpropagation neural network. Table 4 lists the enhanced PSO variants with the incorporated strategies and the relevant parameter settings. The experiments were run on an Intel Core i7-12700H processor using the MATLAB 2020 software platform. In the ablation experiment, the number of model iterations was set to 30, the population size to 20, and both the individual and social learning factors to 2.
As can be seen from Figure 7, the introduction of the nonlinear weight and crossover strategy in IPSO1 enhances the algorithm's local optimization ability in the later stages. After introducing chaos mapping, IPSO2 converges faster in the early iterations, but its optimization ability improves little. By introducing chaos mapping based on opposition-based learning, IPSO3 finds a better global optimum in the later iterations. After introducing spiral search, IPSO4 obtains a better set of optimal parameters faster than IPSO3, which demonstrates the effectiveness of the proposed strategies.
7. Test Function Experiment
To validate the performance of the improved algorithm, the DBO, SSA, PSO, and HO algorithms and the proposed IPSO were tested on single-peak and multi-peak test functions from the CEC2017 test suite. Parameter settings for the DBO, SSA, PSO, and HO algorithms can be found in Appendix A.
The single-peak test functions F1, F3, and F4 were employed to assess the local exploitation capability of the algorithms, while the multi-peak test functions F5–F10 served to evaluate their global search ability. The functions and their search ranges are shown in Table 5. All algorithms ran for 500 iterations with a population size of 100. The dimension of test function F1 was set to 30, and the dimensions of F3, F4, F8, F9, and F10 were set to 50. To ensure reliable results, each algorithm was tested independently 30 times on each test function, yielding the best, worst, and average values and the standard deviations. The experimental results are summarized in Table 6.
The results on the single-peak test functions reveal that the IPSO algorithm exhibits superior optimization capability and convergence rate compared with the four benchmark algorithms. On the multimodal test functions, the IPSO algorithm stands out with exceptional optimization performance, particularly on complex, high-dimensional functions. Furthermore, analysis of the standard deviations underscores the significantly higher stability of the IPSO algorithm, attributable to its enhanced population diversity. The integration of the spiral search and crossover strategies ensures comprehensive exploration of the global search space, effectively mitigating the severe homogeneity among individuals in late iterations. Overall, the IPSO algorithm introduced in this study achieves substantial improvements in precision and stability, with notable gains in convergence accuracy on both unimodal and multimodal functions. To display the optimization behavior intuitively, the convergence plots of six functions are shown in Figure 8.
The iteration curves show that, compared with the improved IPSO, the DBO, SSA, PSO, and HO algorithms require more iterations to reach the optimal value or the same accuracy. This is because, during the iterations of IPSO, the multiple introduced strategies not only increase the diversity of high-quality particles in the population but also effectively balance global and local search, so that the particles find better global optima and the algorithm converges faster.
On the other hand, statistical analysis helps confirm the significance of the differences between the results obtained by different algorithms. This study uses the Wilcoxon rank sum test, a nonparametric statistical test whose output, the p-value, determines the significance level between two algorithms. In this study, two algorithms are considered statistically different only if the p-value from the Wilcoxon rank sum test is less than 0.05; details are shown in Table 7. The p-value is less than 0.05 in most cases, indicating that the optimization ability of IPSO is statistically superior to that of the other algorithms.
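The test itself is a one-liner with SciPy; `ipso_best` and `other_best` stand for the arrays of 30 best values per algorithm.

```python
from scipy.stats import ranksums

def compare_runs(ipso_best, other_best, alpha=0.05):
    """Wilcoxon rank sum test on 30 independent best values per algorithm."""
    stat, p = ranksums(ipso_best, other_best)
    return p, p < alpha   # statistically different if p < 0.05
```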
8. Example of Transformer Fault Data Diagnosis
8.1. Transformer Fault Diagnosis Model Based on Improved IPSO–BP–AdaBoost
To verify the effectiveness of the model in practical transformer fault diagnosis, DGA samples collected from the previously published literature and the IEC TC10 database were used to build a fault diagnosis model. Of the sample dataset, 70% (417 samples) was used as the training set and the remaining 30% (179 samples) as the test set. The model operates as follows:
Step 1: Initialize the relevant parameters of IPSO: population size, maximum iteration times, spatial dimensions, upper and lower bounds of the solution, and learning factors. Initialize AdaBoost-related parameters: weak classifier weight α and sample weight D.
Step 2: Use the SPM function to generate N solutions according to Equation (15).
Step 3: Use opposition-based learning to generate N opposite solutions according to Equation (16).
Step 4: Calculate the fitness of a total of 2N solutions and select the top N ranked solutions.
Step 5: Update the particle positions according to Equations (13) and (21).
Step 6: Determine whether the maximum iteration times have been reached. If so, output the global optimal individual position; otherwise, return to the loop.
Step 7: Calculate the weight of each sample according to Equation (7).
Step 8: Calculate the weight of the weak classifier according to Equation (6).
Step 9: Determine whether each weak classifier has been trained. If so, integrate them according to the weight of the weak classifier and output the result, which is the strong classifier result according to Equation (8). Otherwise, return to the loop.
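As a sketch of the fitness evaluation running inside Steps 4 through 6, each particle is decoded into the BPNN's initial weights and biases and scored by its diagnostic error; `build_bpnn` is a hypothetical constructor mapping a flat parameter vector onto the 8-12-12-6 network used in this paper.

```python
def particle_fitness(particle, X_tr, y_tr, X_te, y_te, build_bpnn):
    """Fitness of one IPSO particle: decode it into BPNN initial weights and
    biases, train, and return the diagnostic error rate (lower is better)."""
    net = build_bpnn(init_params=particle)   # hypothetical constructor
    net.fit(X_tr, y_tr)
    return (net.predict(X_te) != y_te).mean()
```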
The diagnostic model used in this paper is shown in Figure 9.
8.2. Diagnostic Accuracy of Different Models and Feature Combinations
First, the experiment determines the optimal number of neurons in the two hidden layers of the backpropagation neural network (BPNN). The initial configuration is 8 neurons, subsequently increased to 16. To mitigate randomness, the experiment was repeated 30 times and the average values were calculated. The results are presented in Figure 10.
It is evident that the BPNN achieves optimal diagnostic performance with 12 neurons; however, without algorithmic optimization, the diagnostic accuracy remains relatively low. Second, the experiment compares the optimization of the BPNN's key parameters by various algorithms, including the Sparrow Search Algorithm (SSA), Particle Swarm Optimization (PSO), the Dung Beetle Optimization (DBO) algorithm, and the Hippopotamus Optimization (HO) algorithm. Each optimization algorithm runs for 30 iterations with a population size of 20, and the BPNN is configured with 8 input nodes, 2 hidden layers of 12 neurons each, and 6 output nodes. The iteration curves of all models are shown in Figure 11.
The experimental results indicate that, compared with the other optimization algorithms, the proposed IPSO possesses excellent global and local search capabilities, achieving the fastest convergence and the smallest fitness value. Some of the other algorithms lack local search capability in the later iterations or require longer iteration times to complete the optimization. Notably, the optimization ability of IPSO is significantly enhanced compared with PSO, further substantiating the effectiveness of the four proposed strategies.
To further validate the effectiveness of the feature selection and the model for practical transformer fault diagnosis, a comparative experiment was conducted among the Duval triangle method, the Rogers ratio method [31], the IEC standard code method, the clustering method, the ANN method, the conditional probability method, the modified Rogers ratio method, the modified IEC code method, and the IPSO–BP–AdaBoost method. The conditional probability method uses a Multivariate Normal Probability Density Function (MVNPDF). Each method was tested independently 30 times, and average values were taken to minimize the randomness of single runs. The results are summarized in Figure 12, where the modified Rogers ratio method and the modified IEC code method are denoted *Rogers4 and *IEC60599, respectively. The closer a bar's color is to yellow, the higher the average accuracy; the closer to blue, the lower.
As seen in Figure 12, the fault diagnosis accuracies of the different methods are 64.34% for the Duval triangle, 47.87% for the Rogers 4-ratio method, 56.19% for the IEC code, 77.07% for clustering, 79.96% for the ANN technique, 74.02% for the conditional probability method, 68.93% for the modified IEC code, and 62.98% for the modified Rogers 4-ratio method. The IPSO–BP–AdaBoost used in this paper has the highest diagnostic accuracy (89.6%), which proves the effectiveness of the method.
To verify the benefit of feature selection, the PSO, DBO, SSA, HO, and IPSO algorithms were used to optimize BP–AdaBoost, and the accuracies with all 26 input features and with the 8 optimal input features were compared. All results are averaged over 30 experiments, as shown in Figure 13. The diagnostic accuracy of every model improves when the optimal feature set is used. Meanwhile, the IPSO–BP–AdaBoost proposed in this paper has the highest accuracy whether the input is the full feature set or the optimal feature set.
Figure 14 shows the best performance of each model over the 30 tests. As illustrated in Figure 14, IPSO demonstrates the highest accuracy in classifying the various fault categories, which not only validates the effectiveness of the improved strategies compared with traditional PSO but also highlights its advantages over the other intelligent algorithms.
In addition to the average diagnostic accuracy, this paper reports the kappa coefficient, F1-score, precision, and recall of the various models. The kappa coefficient measures the agreement between a classification model's predictions and the actual results, and it effectively addresses data imbalance and random prediction. Precision is the proportion of predicted positive instances that are actually positive, while recall is the proportion of actual positive instances that the model correctly identifies. The F1-score combines precision and recall; a higher F1-score means better model performance, making it more reliable than accuracy when the sample classes are unbalanced. The results are shown in Table 8.
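These metrics can be computed with scikit-learn as sketched below; macro averaging over the six classes is an assumption, since the averaging scheme is not stated.

```python
from sklearn.metrics import (cohen_kappa_score, f1_score,
                             precision_score, recall_score)

def diagnostic_metrics(y_true, y_pred):
    """Metrics reported in Table 8; macro averaging is an assumed choice."""
    return {
        "kappa": cohen_kappa_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "precision": precision_score(y_true, y_pred, average="macro"),
    }
```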
Table 8 shows that, among the five models, IPSO–BP–AdaBoost performs best, surpassing the other models in classification agreement (kappa coefficient), comprehensive performance (F1-score), ability to find all positive samples (recall), and prediction accuracy (precision). SSA–BP–AdaBoost and HO–BP–AdaBoost also perform well but are slightly inferior to IPSO–BP–AdaBoost on all indexes. DBO–BP–AdaBoost and PSO–BP–AdaBoost display a degree of effectiveness, yet their performance is comparatively less robust than that of the other three models, particularly IPSO–BP–AdaBoost.
This article further compares four deep learning models, namely DBNs (Deep Belief Networks), CNNs (Convolutional Neural Networks), ELMs (Extreme Learning Machines), and BiGRUs (Bidirectional Gated Recurrent Units), with IPSO–BP–AdaBoost. All models used the dataset after feature selection, and each model was run 30 times to obtain the average accuracy, as presented in Table 9.
From Table 9, the DBN has the lowest average accuracy and the largest gap between its highest and lowest accuracy, indicating poor stability. The average accuracies of the CNN and ELM are higher than that of the DBN, but the robustness of both models needs improvement. Although the BiGRU exhibits good robustness, its average accuracy is relatively low. The IPSO–BP–AdaBoost proposed in this paper maintains high accuracy with minimal fluctuation across runs, indicating good robustness and consistency and reliable diagnostic results.
9. Conclusions
In this article, the IPSO–BP–AdaBoost algorithm, integrated with a Random-Forest-based feature selection technique, is employed to diagnose faults in power transformers. The feature selection method uses Random Forests to construct a more compact, precise, and informative feature subset; the selected features are C2H2/C2H4, C2H2/TH, TH, (H2 + C2H4)/TG, H2, C2H6/TH, C2H4/TH, and C2H2. The IPSO algorithm, augmented with multiple strategies, optimizes the initial weights and biases of the BPNN. The application of chaos mapping and opposition-based learning to generate initial solutions proves advantageous for parameter optimization and mitigates the BPNN's sensitivity to initialization.
DGA samples sourced from domestic transformer data and the IEC TC10 database were used to validate the efficacy of the optimal feature subset. A comparison of fault diagnosis performance between the optimal feature subset and other gas ratio methods showed that the proposed feature subset delivers superior diagnostic performance, with a best accuracy of 91.06%, substantiating the advantages and effectiveness of the proposed methodology. Comparisons with other models further underscore the high accuracy and robustness of the IPSO–BP–AdaBoost diagnostic model. However, its high precision comes at the cost of a longer training time; future research will therefore focus on reducing the training duration while further enhancing accuracy. In addition, beyond the gas characteristics in the oil, the relationship between other physical parameters and transformer faults should be studied further, so that transformer faults can be judged comprehensively from multiple angles.
Author Contributions
Conceptualization, Z.F.; methodology, L.Z.; validation, L.Z.; formal analysis, L.Z.; data curation, K.L., H.R., and Y.W.; writing—original draft preparation, L.Z.; writing—review and editing, Z.F., K.L., Y.W., and H.R.; visualization, L.Z.; supervision, Z.F.; funding acquisition, Z.F. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.
Acknowledgments
We thank Ayman HobAllah and Ibrahim B. M. Taha for professional guidance.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Table A1. Parameter description of other algorithms in this paper.
Algorithm | Parameter Settings |
---|---|
PSO | Cognitive and social constants: 2 and 2; inertia weight: linear reduction from 0.9 to 0.1 |
DBO | Probability of encounter: 0.1; deflection coefficient: 0.1; natural coefficient: 1 or −1; constant: 0.3 |
SSA | Leader position update probability: 0.5; proportion of leaders: 0.2 |
References
- Faiz, J.; Soleimani, M. Dissolved gas analysis evaluation in electric power transformers using conventional methods a review. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 1239–1248. [Google Scholar] [CrossRef]
- Song, X. Research on D-S Evidence Theory and Its Application in Transformer Fault Diagnosis. Master’s Thesis, Huaibei Normal University, Huaibei, China, 2023. [Google Scholar] [CrossRef]
- Liu, Y.; Ni, Y.P. Transformer fault diagnosis method based on grey correlation analysis of three ratios. High-Volt. Technol. 2002, 10, 16–17, 27. [Google Scholar] [CrossRef]
- Jin, Y.; Wu, H.; Zheng, J.; Zhang, J.; Liu, Z. Power Transformer Fault Diagnosis Based on Improved BP Neural Network. Electronics 2023, 12, 3526. [Google Scholar] [CrossRef]
- Ivanov, V.K.; Palyukh, B.V. Application of Evidence Theory for Training Fuzzy Neural Networks in Diagnostic Systems. Pattern Recognit. Image Anal. 2023, 33, 354–359. [Google Scholar] [CrossRef]
- Kari, T.; He, Z.; Rouzi, A.; Zhang, Z.; Ma, X.; Du, L. Power transformer fault diagnosis using random forest and optimized kernel extreme learning machine. Intell. Autom. Soft Comput. 2023, 37, 691–705. [Google Scholar] [CrossRef]
- Yu, S.; Tan, W.; Zhang, C.; Tang, C.; Cai, L.; Hu, D. Power transformer’s fault diagnosis based on a meta-learning approach to kernel extreme learning machine with opposition-based learning sparrow search algorithm. J. Intell. Fuzzy Syst. 2023, 44, 455–466. [Google Scholar] [CrossRef]
- Bai, X.; Zang, Y.; Ge, L.; Li, C.; Li, J.; Yuan, X. Selection Method of Feature Derived from Dissolved Gas in Oil for Fault Diagnosis of Transformers. High Volt. Eng. 2022, 48, 1–15. [Google Scholar]
- Zhang, G.; Chen, K.; Fang, R.; Wang, K.; Zhang, X. Transformer fault diagnosis based on DGA and a whale algorithm optimizing a LogitBoost-decision tree. Power Syst. Prot. Control 2023, 51, 63–72. [Google Scholar] [CrossRef]
- Cao, H.; Zhou, C.; Meng, Y.; Shen, J.; Xie, X. Advancement in transformer fault diagnosis technology. Front. Energy Res. 2024, 12, 1437614. [Google Scholar] [CrossRef]
- JWG D1/A2.47; Advances in DGA Interpretation. CIGRE: Paris, France, 2019.
- Dai, J.; Song, H.; Yang, Y.; Chen, Y.; Sheng, G.; Jiang, X. ReLU-DBN method for transformer fault diagnosis based on gas analysis in oil. Power Grid Technol. 2018, 42, 658–664. [Google Scholar] [CrossRef]
- Sun, C. Transformer Fault Diagnosis Based on Machine Learning Algorithms. Master’s Thesis, Shanghai Jiao Tong University, Shanghai, China, 2019. [Google Scholar] [CrossRef]
- Hoballah, A.; Mansour, D.-E.A.; Taha, I.B.M. Hybrid Grey Wolf Optimizer for Transformer Fault Diagnosis Using Dissolved Gases Considering Uncertainty in Measurements. IEEE Access 2020, 8, 139176–139187. [Google Scholar] [CrossRef]
- Wang, Z. Research on Feature Selection Methods based on Random Forest. Teh. Vjesn. 2023, 30, 623–633. [Google Scholar]
- Yu, G.; Zhao, Y.; Fu, Z.; Chen, Z. Application of back propagation neural network in the analysis of isothermal elastohydrodynamic lubrication. Tribol. Int. 2024, 198, 109883. [Google Scholar] [CrossRef]
- Huang, S.; Zhang, J.; He, Y.; Fu, X.; Fan, L.; Yao, G.; Wen, Y. Short-Term Load Forecasting Based on the CEEMDAN-Sample Entropy-BPNN-Transformer. Energies 2022, 15, 3659. [Google Scholar] [CrossRef]
- Zuo, W.; He, Z.; Yang, Y. Transformer health prediction based on digital twins and SSA-BP. Comput. Digit. Eng. 2023, 51, 2457–2463. [Google Scholar]
- Zhang, Z.; Kong, W.; Li, L.; Zhao, H.; Xin, C. Prediction of Transformer Oil Temperature Based on an Improved PSO Neural Network Algorithm. Recent Adv. Electr. Electron. Eng. 2024, 17, e270423216280. [Google Scholar] [CrossRef]
- Li, J.; Li, G.; Hai, C.; Guo, M. Transformer Fault Diagnosis Based on Multi-Class AdaBoost Algorithm. IEEE Access 2022, 10, 1522–1532. [Google Scholar] [CrossRef]
- Wang, F.; Yuan, G.; Guo, C.; Li, Z. Research on fault diagnosis method of aviation cable based on improved AdaBoost. Adv. Mech. Eng. 2022, 14, 16878132221125762. [Google Scholar] [CrossRef]
- Zhang, Y.; Qu, J.; Fang, X.; Luo, G. Motor bearing fault diagnosis based on multi-feature fusion and PSO-BP. In Proceedings of the 2021 IEEE 4th Student Conference on Electric Machines and Systems (SCEMS), Huzhou, China, 1–3 December 2021; pp. 1–5. [Google Scholar] [CrossRef]
- Lu, W.; Shi, C.; Fu, H.; Xu, Y. A Power Transformer Fault Diagnosis Method Based on Improved Sand Cat Swarm Optimization Algorithm and Bidirectional Gated Recurrent Unit. Electronics 2023, 12, 672. [Google Scholar] [CrossRef]
- Lou, L.; Zhang, H. Grey Wolf Optimization algorithm based on Hybrid Multi-strategy. In Proceedings of the 2023 8th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 21–23 April 2023; pp. 1342–1345. [Google Scholar] [CrossRef]
- Hu, Y.; Xiong, R.; Li, J.; Zhou, C.; Wu, Q. An Improved Sand Cat Swarm Operation and Its Application in Engineering. IEEE Access 2023, 11, 68664–68681. [Google Scholar] [CrossRef]
- Tizhoosh, H.R. Opposition-Based Learning: A New Scheme for Machine Intelligence. In Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’06), Vienna, Austria, 28–30 November 2005; pp. 695–701. [Google Scholar] [CrossRef]
- Nazari, M.; Esnaashari, M.; Parvizimosaed, M.; Damia, A. A Noval Reduced Particle Swarm Optimization with Improved Learning Strategy and Crossover Operator. In Proceedings of the 2023 28th International Computer Conference, Computer Society of Iran (CSICC), Tehran, Iran, 25–26 January 2023; pp. 1–5. [Google Scholar] [CrossRef]
- Cao, J.; Lu, M. Industrial water consumption prediction based on hybrid strategy improved SSA-SVM. Hydroelectr. Energy Sci. 2023, 41, 28–31. [Google Scholar] [CrossRef]
- Zhang, D.; Zhao, Y.; Ding, J.; Wang, Z.; Xu, J. Multi-Strategy Fusion Improved Adaptive Hunger Games Search. IEEE Access 2023, 11, 67400–67410. [Google Scholar] [CrossRef]
- Gao, J.; Xing, Q.; Li, L.; Fan, C. Improved Particle Swarm Optimization Algorithm Using Projection Spiral Search. J. Xi’an Jiaotong Univ. 2018, 52, 48–54. [Google Scholar]
- Ibrahim, S.I.; Ghoneim, S.S.M.; Taha, I.B.M. DGALab: An extensible software implementation for DGA. IET Gener. Transm. Distrib. 2018, 12, 4117–4124. [Google Scholar] [CrossRef]
Figure 1. Feature importance evaluation and diagnostic accuracy curves.
Figure 2. Structure of BPNN.
Figure 3. AdaBoost algorithm model.
Figure 4. Particle distributions and spectrograms when using four chaotic functions.
Figure 5. The curve of inertial weight.
Figure 6. The workflow of IPSO.
Figure 7. Comparison of iteration curves of PSO with different strategies added.
Figure 8. Iteration curves of five models on different test functions.
Figure 9. Transformer fault diagnosis model.
Figure 10. Determination of the number of hidden-layer neurons.
Figure 11. Fitness curves based on different optimization algorithms.
Figure 12. Diagnostic results of different methods.
Figure 13. Average diagnostic accuracies of different feature sets.
Figure 14. The best diagnostic accuracies of five models.
Table 1. Fault sample distribution.
Working Status | Category Label | Total Number of Samples |
---|---|---|
D1 | 1 | 108 |
D2 | 2 | 86 |
T2 | 3 | 104 |
T1 | 4 | 119 |
PD | 5 | 113 |
NC | 6 | 66 |
Table 2. Feature sets.
Number | Feature | Number | Feature |
---|---|---|---|
1 | H2 | 14 | C2H2/TH |
2 | CH4 | 15 | (CH4 + C2H2)/TH |
3 | C2H6 | 16 | (CH4 + C2H4)/TH |
4 | C2H4 | 17 | (CH4 + C2H6)/TH |
5 | C2H2 | 18 | (C2H4 + C2H2)/TH |
6 | TH | 19 | (C2H6 + C2H2)/TH |
7 | H2/TH | 20 | (C2H4 + C2H6)/TH |
8 | CH4/TH | 21 | TG |
9 | C2H2/C2H4 | 22 | H2/TG |
10 | CH4/H2 | 23 | (H2 + CH4)/TG |
11 | C2H4/C2H6 | 24 | (H2 + C2H4)/TG |
12 | C2H6/TH | 25 | (H2 + C2H6)/TG |
13 | C2H4/TH | 26 | (H2 + C2H2)/TG |
Table 3. Optimal feature subset.
Number | Feature Item | Number | Feature Item |
---|---|---|---|
1 | C2H2/C2H4 | 5 | H2 |
2 | C2H2/TH | 6 | C2H6/TH |
3 | TH | 7 | C2H4/TH |
4 | (H2 + C2H4)/TG | 8 | C2H2 |
Table 4. Improved PSO with the introduction of different strategies.
Algorithm | Introduced Strategies | Parameter Setting |
---|---|---|
PSO | None | |
IPSO1 | Nonlinear weight; crossover strategy | |
IPSO2 | Nonlinear weight; crossover strategy; chaos mapping | |
IPSO3 | Nonlinear weight; crossover strategy; chaos mapping based on opposition-based learning | |
IPSO4 | Nonlinear weight; crossover strategy; chaos mapping based on opposition-based learning; spiral search strategy | |
Table 5. Standard test function table.
Function | Search Range |
---|---|
F1 | [−100, 100] |
F3 | [−100, 100] |
F4 | [−100, 100] |
F8 | [−100, 100] |
F9 | [−100, 100] |
F10 | [−100, 100] |
Table 6. Evaluation indicators of test function results.
Function | Result | IPSO | DBO | PSO | HO | SSA |
---|---|---|---|---|---|---|
F1 | Best value | 3.22 × 103 | 1.83 × 106 | 1.47 × 104 | 6.8 × 105 | 3.88 × 103 |
Standard deviation | 2.18 × 104 | 5.81 × 107 | 7.08 × 108 | 2.42 × 107 | 4.11 × 103 |
Average value | 2.08 × 104 | 4.83 × 107 | 3.17 × 108 | 1.44 × 107 | 9.87 × 103 |
Worst value | 5.79 × 104 | 1.41 × 108 | 1.58 × 109 | 5.73 × 107 | 1.5 × 104 |
F3 | Best value | 8.4 × 103 | 3.37 × 104 | 1.43 × 104 | 2.37 × 104 | 3.26 × 104 |
Standard deviation | 7.46 × 103 | 2.8 × 104 | 1.1 × 104 | 7.43 × 103 | 7.56 × 103 |
Average value | 2.16 × 104 | 9.49 × 104 | 3.16 × 104 | 4.14 × 104 | 5.25 × 104 |
Worst value | 4.28 × 104 | 1.65 × 105 | 6.46 × 104 | 5.47 × 104 | 6.8 × 104 |
F4 | Best value | 4.99 × 102 | 8.23 × 102 | 6.37 × 102 | 6.79 × 102 | 5.27 × 102 |
Standard deviation | 59.9 | 2.25 × 102 | 1.96 × 102 | 2.62 × 102 | 2.52 × 102 |
Average value | 5.9 × 102 | 1.21 × 103 | 9.1 × 102 | 1.06 × 103 | 7.75 × 102 |
Worst value | 7.06 × 102 | 1.77 × 103 | 1.45 × 103 | 1.75 × 103 | 1.59 × 103 |
F8 | Best value | 9.57 × 102 | 1.11 × 103 | 1.1 × 103 | 1.06 × 103 | 1.02 × 103 |
Standard deviation | 33.4 | 84.1 | 40.2 | 33 | 54.6 |
Average value | 1.03 × 103 | 1.24 × 103 | 1.20 × 103 | 1.14 × 103 | 1.17 × 103 |
Worst value | 1.08 × 103 | 1.41 × 103 | 1.29 × 103 | 1.2 × 103 | 1.28 × 103 |
F9 | Best value | 6.86 × 103 | 9.4 × 103 | 9.6 × 103 | 9.46 × 103 | 9.97 × 103 |
Standard deviation | 7.14 × 102 | 6.67 × 103 | 1.49 × 103 | 5.19 × 103 | 1.46 × 103 |
Average value | 7.99 × 103 | 1.92 × 104 | 1.36 × 104 | 1.78 × 104 | 1.36 × 104 |
Worst value | 8.62 × 103 | 3.84 × 104 | 1.80 × 104 | 2.87 × 104 | 1.76 × 104 |
F10 | Best value | 6.86 × 103 | 7.41 × 103 | 5.91 × 103 | 6.36 × 103 | 6.39 × 103 |
Standard deviation | 7.14 × 102 | 9.95 × 102 | 1.25 × 103 | 1.1 × 103 | 8.23 × 102 |
Average value | 7.99 × 103 | 9.22 × 103 | 8.15 × 103 | 8.36 × 103 | 8.31 × 103 |
Worst value | 8.62 × 103 | 1.16 × 104 | 1.12 × 104 | 1.1 × 104 | 1 × 104 |
Table 7. p-values obtained from the test functions.
Function | IPSO versus DBO | IPSO versus PSO | IPSO versus HO | IPSO versus SSA |
---|---|---|---|---|
F1 | 5.26 × 10−4 | 1.95 × 10−3 | 7.94 × 10−3 | 9.52 × 10−2 |
F3 | 2.61 × 10−10 | 0.42 | 7.62 × 10−3 | 1.68 × 10−3 |
F4 | 7.77 × 10−9 | 4.36 × 10−4 | 3.34 × 10−11 | 0.22 |
F8 | 3.67 × 10−3 | 1.46 × 10−10 | 1.17 × 10−3 | 4.24 × 10−2 |
F9 | 4.23 × 10−3 | 3.64 × 10−2 | 1.41 × 10−9 | 3.01 × 10−4 |
F10 | 3.55 × 10−1 | 7.22 × 10−6 | 4.55 × 10−1 | 9.63 × 10−2 |
Table 8. Comparative evaluation of model diagnostic performance.
Models | Kappa Coefficient | F1-Score | Recall | Precision |
---|---|---|---|---|
PSO–BP–AdaBoost | 0.7723 | 0.8090 | 0.8111 | 0.8068 |
DBO–BP–AdaBoost | 0.8126 | 0.8398 | 0.8379 | 0.8418 |
SSA–BP–AdaBoost | 0.8327 | 0.8617 | 0.8614 | 0.8620 |
HO–BP–AdaBoost | 0.8326 | 0.8589 | 0.8599 | 0.8579 |
IPSO–BP–AdaBoost | 0.8928 | 0.9145 | 0.9166 | 0.9124 |
Table 9. Comparison results with four deep learning models.
Models | Highest Accuracy Rate (%) | Minimum Accuracy Rate (%) | Average Accuracy Rate (%) |
---|---|---|---|
DBN | 80.00% | 50.83% | 64.27% |
CNN | 86.03% | 79.88% | 82.64% |
ELM | 89.38% | 80.44% | 85.27% |
BiGRU | 85.47% | 81.00% | 83.89% |
IPSO–BP–AdaBoost | 91.06% | 87.68% | 89.60% |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).