Article

Fault Diagnosis Method for Transformer Winding Based on Differentiated M-Training Classification Optimized by White Shark Optimization Algorithm

1 Electric Power Research Institute of Yunnan Power Grid, Kunming 650214, China
2 School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China
* Author to whom correspondence should be addressed.
Energies 2025, 18(9), 2290; https://doi.org/10.3390/en18092290
Submission received: 25 March 2025 / Revised: 23 April 2025 / Accepted: 25 April 2025 / Published: 30 April 2025

Abstract:
Transformers, serving as critical components in power systems, are predominantly affected by winding faults that compromise their operational safety and reliability. Frequency Response Analysis (FRA) has emerged as the prevailing methodology for the status assessment of transformer windings in contemporary power engineering practice. To mitigate the accuracy limitations of single-classifier approaches in winding status assessment, this paper proposes a differentiated M-training classification algorithm based on White Shark Optimization (WSO). The principal contributions are threefold. First, building upon the fundamental principles of the M-training algorithm, we establish a classification model incorporating diversified classifiers; for each base classifier, a parameter optimization method leveraging WSO is developed to enhance diagnostic precision. Second, an experimental platform for transformer fault simulation is constructed, capable of replicating various fault types with programmable severity levels; through controlled experiments, frequency response curves and associated characteristic parameters are systematically acquired under diverse winding statuses. Finally, the model undergoes comprehensive training and validation using the experimental datasets and is further verified against field test results from an actual transformer. The experimental findings demonstrate that implementing WSO for base classifier optimization enhances the M-training algorithm’s diagnostic precision by 8.92% in fault-type identification and 8.17% in severity-level recognition. The proposed differentiated M-training architecture achieves classification accuracies of 98.33% for fault-type discrimination and 97.17% for severity quantification, representing statistically significant improvements over standalone classifiers.

1. Introduction

Power transformers, functioning as pivotal infrastructure in electrical power systems, undertake critical responsibilities in energy generation, transmission, and distribution. Their operational integrity directly governs the quality of transmitted electrical energy across the grid [1,2]. Statistical analyses of failure patterns reveal that approximately 30% of transformer faults originate from winding-related anomalies [3]. Therefore, an accurate assessment of the transformer winding status is of great significance in enhancing the stability and reliability of the power system.
At present, the main detection methods for transformer winding status are the short-circuit impedance method, the vibration signal method, and FRA. FRA is more sensitive to transformer winding faults than the other methods [4] and is also one of the most widely used methods for on-site detection. For the frequency response method, standards such as IEC 60076-18-2012 and IEEE C57.149-2012 have been developed, which state that the correlation coefficient can be used to determine the status of the winding [5,6]. Beyond the correlation coefficients specified in the standards, researchers worldwide have carried out extensive work on feature extraction and fault identification methods for FRA results. In terms of feature extraction, studies span statistical features, waveform features, and image features. For example, researchers at the University of Toronto proposed the vector matching method [7], demonstrating that transformer status can be recognized from the fitted frequency response curve. Researchers at the University of Tehran used the mean square deviation and standard deviation of amplitude-frequency curves to judge winding state [8], showing that these features have strong immunity to external noise. Researchers at the University of Queensland converted the frequency response curve into a Nyquist diagram [9], exploring and verifying a correspondence between transformer winding state and the Nyquist diagram. Researchers at Southwest University binarized the image of the frequency response curve [10] and extracted feature coefficients from the similarity of the image matrices to identify the three common winding fault types.
In terms of fault diagnosis methods, early work relied mainly on threshold comparison of eigenvalues. In recent years, with the development of artificial intelligence, machine learning methods have been applied to transformer winding fault identification to improve accuracy. For example, researchers at Southwest Jiaotong University used a support vector machine (SVM) to analyze the area of the transformer crossover band and the centre-of-mass deviation to identify winding state [11]; researchers at Northwest A&F University used the Kernel Extreme Learning Machine (KELM) to identify the type and degree of winding deformation faults [12]. Researchers at the Amirkabir University of Technology proposed an artificial neural network (ANN) model for identifying the type and extent of winding faults [13], but the model performed poorly in recognizing fault extent and axial displacement faults. In addition, researchers at Northwest A&F University and the Amirkabir University of Technology used Random Forest (RF) [14] and K-Nearest Neighbors (KNN) [15] classifiers, respectively, to identify transformer winding faults. All of the above studies use a single classifier to identify winding faults, and each classifier involves the selection of key parameters. Randomness in these key parameters leads to uncertainty in the identification results, which ultimately affects the accuracy and reliability of winding fault identification. Meanwhile, the application of optimization algorithms in power systems has made significant progress.
For example, the decentralized stochastic recursive gradient method (DSRG) [16] achieves efficient parallel solution of optimal power flow in multi-area power systems through a distributed stochastic gradient update mechanism, and the beluga whale optimization algorithm (BWO) [17] demonstrates excellent global search capability in microgrid energy management and battery scheduling by simulating the collaborative group behaviors of beluga whales. However, the applicability of these methods to transformer fault diagnosis has not been fully explored: DSRG relies on gradient information of the objective function, making it difficult to apply directly to discrete parameter optimization, while BWO, although it excels in continuous-space optimization, adapts poorly to high-dimensional, nonlinear parameter spaces.
To address the above problems, this paper applies the M-training algorithm to transformer winding fault identification, combining multiple classifiers to reduce the impact of parameter selection and using WSO to solve the parameter optimization problem of each base classifier, thereby improving its classification performance. A transformer fault simulation experimental platform is then constructed to obtain frequency response data under different winding fault types. The M-training algorithm is trained and tested with these data and compared against the WSO-RF, SVM, and KNN algorithms; the results show that the WSO differentiated M-training classification algorithm proposed in this paper performs well.

2. M-Training Algorithm for Differentiated Classifiers

2.1. M-Training Algorithm Based on Differentiated Classifiers

2.1.1. Fundamentals of the M-Training Algorithm

There are N base classifiers in the M-training algorithm, denoted as ci, i = 1, 2, 3, …, N, where N is a positive integer and N > 3. Let L denote the set of all labeled samples in the initial sample set and |L| the total number of labeled samples. Let U denote the set of all unlabeled samples in the initial sample set and |U| the total number of unlabeled samples.
Assuming a training dataset containing m samples and a noise rate of η, the worst-case error rate ξ of the classifier satisfies:

$$m = \frac{c}{\xi^2 (1 - 2\eta)^2} \qquad (1)$$

where c is a constant. Equation (1) can be rearranged as:

$$u = \frac{c}{\xi^2} = m (1 - 2\eta)^2 \qquad (2)$$
The sample complexity m in Equation (1) derives from the Probably Approximately Correct (PAC) learning framework under label noise [18], where the term (1 − 2η)² accounts for the degradation of learning efficiency due to mislabeled samples. Equation (2) reformulates this relationship to define the normalized quantity u, which encapsulates the interplay between η and ξ.
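As a minimal numerical sketch of Equations (1) and (2): more samples or less label noise yields a larger u and hence a smaller worst-case error. The helper names and the constant c = 1.0 are illustrative assumptions, not part of the paper.

```python
# Hedged sketch of the PAC-style bound in Equations (1)-(2).
# The function names and c = 1.0 are illustrative assumptions.

def worst_case_error(m: int, eta: float, c: float = 1.0) -> float:
    """Solve Equation (1) for xi: m = c / (xi^2 * (1 - 2*eta)^2)."""
    return (c / m) ** 0.5 / (1.0 - 2.0 * eta)

def normalized_u(m: int, eta: float) -> float:
    """Equation (2): u = c / xi^2 = m * (1 - 2*eta)^2."""
    return m * (1.0 - 2.0 * eta) ** 2

# More samples or less label noise -> larger u -> smaller worst-case error.
assert normalized_u(1000, 0.1) > normalized_u(1000, 0.2)
assert worst_case_error(2000, 0.1) < worst_case_error(1000, 0.1)
```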
Let Li(t − 1) and Li(t) denote the sets of samples labeled by the auxiliary classifiers for the main classifier ci in the (t − 1)-th and t-th cycles, respectively; the main classifier’s training sets in these cycles are then L ∪ Li(t − 1) and L ∪ Li(t), with sizes |L ∪ Li(t − 1)| and |L ∪ Li(t)|.
Let ei(t) denote the upper bound on the classification error rate of the auxiliary classifiers for ci in the t-th cycle; each training cycle refines the main classifier ci and also changes the labeled and unlabeled sample sets. Let ηL denote the classification noise rate of L, so the number of mislabeled samples in L is ηL|L|. The classification noise rate for the t-th cycle is:
$$\eta_i(t) = \frac{\eta_L |L| + e_i(t)\,|L_i(t)|}{|L \cup L_i(t)|} \qquad (3)$$
Equation (3) computes the effective noise rate ηi(t) by combining the original noise and the auxiliary classifier’s noise. This weighted average ensures that the algorithm adapts to both historical and newly introduced label errors. For example, if the auxiliary classifier improves (ei(t) < ηL), the overall noise rate decreases, enabling more aggressive data expansion. Conversely, if ei(t) > ηL, subsampling is triggered to maintain robustness.
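A minimal sketch of Equation (3), assuming the labeled pool L and the pseudo-labeled set Li(t) are disjoint so that |L ∪ Li(t)| = |L| + |Li(t)|; all set sizes here are illustrative.

```python
# Hedged sketch of Equation (3): the effective noise rate seen by the main
# classifier after the auxiliary classifiers label new samples.

def effective_noise_rate(eta_L: float, L_size: int,
                         e_i: float, Li_size: int) -> float:
    """eta_i(t) = (eta_L*|L| + e_i(t)*|L_i(t)|) / |L union L_i(t)|,
    with the union size approximated as |L| + |L_i(t)| (disjoint sets)."""
    return (eta_L * L_size + e_i * Li_size) / (L_size + Li_size)

# If the auxiliary classifiers are cleaner than the labeled pool
# (e_i < eta_L), adding their pseudo-labels lowers the overall noise rate.
eta = effective_noise_rate(eta_L=0.10, L_size=300, e_i=0.05, Li_size=100)
assert eta < 0.10
```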
Substituting Equation (3) into Equation (2) yields:
$$u_i(t) = m_i(t)\bigl(1 - 2\eta_i(t)\bigr)^2 = |L \cup L_i(t)|\left(1 - 2\,\frac{\eta_L |L| + e_i(t)\,|L_i(t)|}{|L \cup L_i(t)|}\right)^2 \qquad (4)$$
According to Equation (2), in order to make ei(t) < ei(t − 1), ui(t) > ui(t − 1) needs to be satisfied:
$$|L \cup L_i(t)|\left(1 - 2\,\frac{\eta_L |L| + e_i(t)\,|L_i(t)|}{|L \cup L_i(t)|}\right)^2 > |L \cup L_i(t-1)|\left(1 - 2\,\frac{\eta_L |L| + e_i(t-1)\,|L_i(t-1)|}{|L \cup L_i(t-1)|}\right)^2 \qquad (5)$$
Assuming 0 ≤ ei(t − 1), ei(t) ≤ 0.5, Equation (5) can be simplified as:
$$0 < \frac{e_i(t)}{e_i(t-1)} < \frac{|L_i(t-1)|}{|L_i(t)|} < 1 \qquad (6)$$
The transition from Equation (5) to (6) formalizes the conditions for iterative refinement in M-training. By requiring the auxiliary classifier’s error rate to decrease (ei(t) < ei(t − 1)) while its labeled dataset grows (|Li(t)| > |Li(t − 1)|), Equation (6) ensures that each training cycle reduces noise and enhances model accuracy.
Since |Li(t)| may be much larger than |Li(t − 1)|, ei(t)|Li(t)| may still exceed ei(t − 1)|Li(t − 1)| even when ei(t) is smaller than ei(t − 1). The subsampling method proposed in [19] is therefore used to randomly sample Li(t). Let the integer Si denote the size of Li(t) after sampling; ei(t)Si < ei(t − 1)|Li(t − 1)| holds when Si satisfies the following condition:
$$S_i = \left\lceil \frac{e_i(t-1)\,|L_i(t-1)|}{e_i(t)} - 1 \right\rceil \qquad (7)$$
In addition, Li(t − 1) must satisfy the following condition to ensure that the size of Li(t) after subsampling still exceeds |Li(t − 1)|:
$$|L_i(t-1)| > \frac{e_i(t)}{e_i(t-1) - e_i(t)} \qquad (8)$$
Equations (7) and (8) formalize the subsampling mechanism. Equation (7) ensures that the retained Si samples do not amplify label noise, while Equation (8) guarantees dataset stability. Equation (8) requires that the classifier’s error rate improvement (ei(t) < ei(t − 1)) be sufficient to justify expanding the labeled set. When Si ≤ 0, subsampling is skipped and the algorithm retains all samples in Li(t − 1) to avoid data loss, but imposes a confidence threshold (τ = 0.9) to filter low-probability predictions, balancing stability and noise tolerance. Skipping subsampling increases overfitting by 4.8%, which highlights the importance of Equations (7) and (8) in maintaining robustness.
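A minimal sketch of the subsampling rule in Equations (7) and (8). The placement of the ceiling and the example error rates are illustrative assumptions following the tri-training convention.

```python
# Hedged sketch of Equations (7)-(8); the ceiling placement follows the
# tri-training convention and the numbers are illustrative.
import math

def subsample_size(e_prev: float, Li_prev: int, e_curr: float) -> int:
    """Equation (7): S_i = ceil(e_i(t-1)*|L_i(t-1)| / e_i(t) - 1)."""
    return math.ceil(e_prev * Li_prev / e_curr - 1)

def growth_condition(e_prev: float, e_curr: float, Li_prev: int) -> bool:
    """Equation (8): |L_i(t-1)| > e_i(t) / (e_i(t-1) - e_i(t))."""
    return Li_prev > e_curr / (e_prev - e_curr)

# Error rate improved from 0.20 to 0.15 with 40 previously labeled samples.
s = subsample_size(0.20, 40, 0.15)
assert 0.15 * s < 0.20 * 40        # noise is not amplified after sampling
assert growth_condition(0.20, 0.15, 40)
```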
When conditions (6), (7), and (8) are satisfied, unlabeled samples from U labeled by the auxiliary classifiers are added to the main classifier’s training set for classifier adjustment. The above process is then cycled with each remaining classifier serving in turn as the main classifier until none of the classifiers changes any further; finally, a voting method performs decision fusion. Each base classifier independently predicts the class label for a test sample, and the majority class among the four classifiers’ predictions is selected as the final output. In case of a tie, the algorithm prioritizes the prediction of the RF classifier with higher accuracy; this prioritization is based on empirical validation of individual classifier performance on the training set. Building on the foundational M-training framework [20], our method introduces differentiated classifiers and WSO-based optimization: we combine RF, SVM, and KNN to enhance robustness and adaptability, and WSO replaces manual parameter tuning to ensure optimal performance across diverse fault scenarios.
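The decision-fusion step above can be sketched as a plurality vote over the four base classifiers, with ties broken in favor of an RF prediction. The dictionary keys (`'RF1'`, `'RF2'`, `'SVM'`, `'KNN'`) and the choice of `'RF1'` as the tie-breaker are illustrative assumptions.

```python
# Hedged sketch of the voting-based decision fusion described above.
from collections import Counter

def fuse_predictions(preds: dict) -> str:
    """preds maps classifier name -> predicted label,
    e.g. {'RF1': 'AD', 'RF2': 'AD', 'SVM': 'SC', 'KNN': 'AD'}."""
    top = Counter(preds.values()).most_common()
    # Unambiguous plurality among the four base classifiers.
    if len(top) == 1 or top[0][1] > top[1][1]:
        return top[0][0]
    # Tie: fall back on the RF prediction (RF1 assumed the more accurate RF).
    return preds['RF1']

assert fuse_predictions({'RF1': 'AD', 'RF2': 'AD', 'SVM': 'SC', 'KNN': 'AD'}) == 'AD'
assert fuse_predictions({'RF1': 'SC', 'RF2': 'AD', 'SVM': 'AD', 'KNN': 'SC'}) == 'SC'
```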

2.1.2. Diversity of Base Classifiers

To overcome the lack of variability among the base classifiers of the traditional M-training algorithm, this paper combines multiple markedly different classifiers. The specific reasons for the classifier selection are as follows:
In the original feature space, if the data is not linearly separable, i.e., it is not possible to completely separate different classes of data points by a linear hyperplane, the SVM can map the data to a higher dimensional space through a kernel function to make the original data linearly separable in this higher dimensional space [21]. Therefore, using SVM as a base classifier can improve the M-training algorithm’s ability to handle nonlinear data. However, SVM classifies the dataset by means of support vectors, which also leads to the fact that SVM is more sensitive to noise and outliers. RF is more robust to noise and outliers due to the integration of multiple decision trees [22], and RF is more suitable for dealing with large-scale datasets compared to SVM. Therefore, combining RF with SVM not only improves the algorithm’s ability to handle noise in data but also improves the M-training algorithm’s ability to handle data of different sizes.
SVM is, in principle, trying to find a globally optimal solution from a global point of view, which will ignore the local features of the data, whereas KNN is based on samples from the nearest neighbors, so it can adapt well to local changes in the data. Therefore, combining KNN with SVM allows the algorithm to deal with the problem from both a global perspective and observe the local information of the data.
In summary, there are large differences between RF, SVM, and KNN, and the advantages and disadvantages of the algorithms are complementary. Combining them can effectively improve the differences between the base classifiers of the M-training algorithm. Since RF has better results in transformer fault identification compared to SVM and KNN [23], this paper uses two RFs, one SVM and one KNN, as the base classifiers of the M-training algorithm, as shown in Figure 1.

2.2. Base Classifier Based on WSO

When SVM deals with linearly inseparable problems, it maps the data from the original space to a high-dimensional space through the kernel function and constructs the optimal classification surface in that space to classify the data. Its kernel parameter δ therefore directly affects the performance of the kernel function and of the SVM. The penalty factor C controls the balance between confidence and error of the SVM in the feature space, thus governing the generalization ability of the model and affecting the classification results.
The random forest algorithm is an ensemble learning method based on decision trees, which improves model accuracy and robustness by constructing multiple decision trees and combining their predictions. Its main parameters are mtry and ntrees: mtry is the number of candidate features considered at each split, and values that are too large may cause the trees to learn noise and outliers and reduce the generalization ability of the model; ntrees is the number of decision trees, where too large a value leads to model redundancy and reduces estimation efficiency, while too small a value reduces accuracy.
KNN is a classification method based on a distance metric, where the closest K samples are selected for classification by calculating the distance between test samples and training samples. The value of K determines the number of nearest neighbors that are considered when making a decision [24]. Smaller values of K make the model more sensitive to local features and may lead to overfitting; larger values of K may smooth the model but may ignore important local information.
For the key parameters of each base classifier above, this paper adopts WSO, whose core idea is to mimic the acute hearing and sense of smell of great white sharks and their movement behaviors when tracking and foraging, mathematically modeling these hunting characteristics to search and exploit each potential region of the space for parameter optimization. The specific hyperparameters of the algorithm are configured as follows: a population size of 50, a maximum of 100 iterations, and an olfactory sensitivity coefficient of 0.8; these settings are referenced from the convergence experiments in [24]. Detailed mathematical models and formulations of the algorithm can be found in [24], with the specific workflow illustrated in Figure 2.
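To make the optimization target concrete, the sketch below shows the kind of fitness function a WSO run would minimize for the SVM base classifier: cross-validated error as a function of (C, γ). A plain random search stands in for the WSO update rules, which are beyond this sketch; the dataset, search bounds, and sample counts are illustrative assumptions.

```python
# Hedged sketch: the fitness landscape a metaheuristic such as WSO would
# search over for the SVM base classifier. Random search stands in for WSO.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

def fitness(log_C: float, log_gamma: float) -> float:
    """Cross-validated classification error for SVM parameters (C, gamma)."""
    clf = SVC(C=10 ** log_C, gamma=10 ** log_gamma)
    return 1.0 - cross_val_score(clf, X, y, cv=3).mean()

rng = np.random.default_rng(0)
candidates = rng.uniform([-2, -4], [3, 1], size=(30, 2))  # log10 bounds
best = min(candidates, key=lambda p: fitness(*p))
print("best (log10 C, log10 gamma):", best)
```

In the paper’s method, WSO’s position-update rules would replace the random candidate generation while the fitness evaluation stays the same.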
While the WSO algorithm incurs a computational cost proportional to population size and iterations, its rapid convergence and parallelizable fitness evaluations mitigate practical overhead. In our experiments, WSO achieved stable optimization within 20 iterations, outperforming PSO and GA in both speed and solution quality. Further efficiency gains are possible through early termination and distributed computing.
The differences between the proposed method and existing ensemble learning methods are mainly reflected in two aspects. First, unlike the stacking of homogeneous classifiers in traditional ensemble methods, this study selects significantly different classifiers, such as the globally optimized SVM and the locally sensitive KNN, improving model robustness through complementarity; in traditional methods, the base classifiers are trained independently and are only weighted at the output layer by meta-classifiers or simple stacking, which may leave the base classifiers insufficiently diverse. Second, WSO is applied for the first time to tune the base classifier parameters, which solves the performance fluctuation caused by parameter randomness in traditional ensemble methods.

2.3. Feature Extraction

According to the circuit equivalence theory, the transformer winding can be equivalent to a two-port network consisting of resistive, capacitive and inductive elements at frequencies of 1 kHz and above. When the internal structure of the transformer winding changes, the electrical parameters of the equivalent circuit model will also change, which is manifested in the amplitude change of the resonance point on the frequency response curve and the frequency shift. In this paper, the correlation coefficient, mathematical statistical features and waveform statistical features in the IEC 60076-18-2012 standard [5], which are commonly used in the frequency response method to diagnose winding deformation, are used as the characteristic information of the frequency response curve.
Let X(k) and Y(k), k = 0, 1, …, N − 1, denote the data points of the amplitude-frequency response curves of the normal transformer and the test transformer over 1–1000 kHz, respectively.
(1) The correlation coefficient R is calculated as follows:
Calculate the standardized variance D of the two series:
$$D_x = \frac{1}{N}\sum_{k=0}^{N-1}\left[X(k) - \frac{1}{N}\sum_{k=0}^{N-1}X(k)\right]^2 \qquad (9)$$
Calculate the covariance cxy of the two series:
$$c_{xy} = \frac{1}{N}\sum_{k=0}^{N-1}\left[X(k) - \frac{1}{N}\sum_{k=0}^{N-1}X(k)\right]\left[Y(k) - \frac{1}{N}\sum_{k=0}^{N-1}Y(k)\right] \qquad (10)$$
Calculate the normalized covariance coefficient LRxy of the two series:
$$LR_{xy} = \frac{c_{xy}}{\sqrt{D_x D_y}} \qquad (11)$$
where Dx and Dy are the standardized variances of X(k) and Y(k), respectively.
Calculate the correlation coefficient Rxy:
$$R_{xy} = \begin{cases} 10, & 1 - LR_{xy} < 10^{-10} \\ -\lg\left(1 - LR_{xy}\right), & \text{otherwise} \end{cases} \qquad (12)$$
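A minimal numerical sketch of Equations (9)–(12), assuming the traces are given as NumPy arrays; the function name and the synthetic test traces are illustrative.

```python
# Hedged sketch of the IEC-style correlation coefficient, Equations (9)-(12).
import numpy as np

def correlation_coefficient(X: np.ndarray, Y: np.ndarray) -> float:
    dx, dy = X - X.mean(), Y - Y.mean()
    Dx, Dy = np.mean(dx ** 2), np.mean(dy ** 2)     # Equation (9)
    cxy = np.mean(dx * dy)                           # Equation (10)
    LR = cxy / np.sqrt(Dx * Dy)                      # Equation (11)
    # Equation (12): saturate at 10 when the traces are nearly identical.
    return 10.0 if 1.0 - LR < 1e-10 else -np.log10(1.0 - LR)

f = np.linspace(0.0, 1.0, 500)
ref = np.sin(2 * np.pi * 3 * f)                      # synthetic reference trace
assert correlation_coefficient(ref, ref) == 10.0     # identical curves saturate
assert correlation_coefficient(ref, ref + 0.1 * f) < 10.0
```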
(2) Mathematical and statistical characteristics
Standard Deviation SD:
$$SD(X,Y) = \sqrt{\frac{\sum_{i=1}^{N}\left(X_i - Y_i\right)^2}{N-1}} \qquad (13)$$
Absolute Sum of Logarithmic Errors ALSE:
$$ALSE(X,Y) = \frac{1}{N}\sum_{i=1}^{N}\left|20\log_{10} X_i - 20\log_{10} Y_i\right| \qquad (14)$$
Mean Absolute Errors DABS:
$$DABS(X,Y) = \frac{1}{N}\sum_{i=1}^{N}\left|X_i - Y_i\right| \qquad (15)$$
Sum of Squared Relative Errors SSRE:
$$SSRE(X,Y) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{Y_i}{X_i} - 1\right)^2 \qquad (16)$$
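The four statistical features above can be sketched directly, assuming positive linear amplitudes for the logarithmic error; the function names are illustrative.

```python
# Hedged sketch of Equations (13)-(16) for two amplitude traces X, Y
# (positive linear amplitudes assumed for the logarithmic error).
import numpy as np

def sd(X, Y):    # Equation (13): standard deviation of the difference
    return np.sqrt(np.sum((X - Y) ** 2) / (len(X) - 1))

def alse(X, Y):  # Equation (14): absolute sum of logarithmic errors
    return np.mean(np.abs(20 * np.log10(X) - 20 * np.log10(Y)))

def dabs(X, Y):  # Equation (15): mean absolute error
    return np.mean(np.abs(X - Y))

def ssre(X, Y):  # Equation (16): sum of squared relative errors
    return np.mean((Y / X - 1) ** 2)

X = np.array([1.0, 2.0, 4.0])
assert sd(X, X) == alse(X, X) == dabs(X, X) == ssre(X, X) == 0.0
```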
(3) Waveform statistical characteristics
Kurtosis (Ku):
$$Ku = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(Z_i - \bar{Z}\right)^4}{\left[\frac{1}{N}\sum_{i=1}^{N}\left(Z_i - \bar{Z}\right)^2\right]^2} \qquad (17)$$
Crest Factor (Cf):
$$C_f = \frac{Z_{\max}}{RMS} \qquad (18)$$
Skewness (Sk):
$$Sk = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(Z_i - \bar{Z}\right)^3}{\left[\frac{1}{N}\sum_{i=1}^{N}\left(Z_i - \bar{Z}\right)^2\right]^{3/2}} \qquad (19)$$
where Zi is the amplitude at frequency point i of the amplitude-frequency response curve, Z̄ is the mean amplitude of the curve, RMS is the root-mean-square amplitude of the curve, and Zmax is its maximum amplitude.
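The waveform features of Equations (17)–(19) can be sketched as follows; the sampled sine trace is only a sanity check (a pure sine has crest factor √2 and zero skewness).

```python
# Hedged sketch of Equations (17)-(19) on one amplitude trace Z.
import numpy as np

def kurtosis(Z):      # Equation (17)
    d = Z - Z.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

def crest_factor(Z):  # Equation (18): peak over RMS
    return Z.max() / np.sqrt(np.mean(Z ** 2))

def skewness(Z):      # Equation (19)
    d = Z - Z.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

Z = np.sin(np.linspace(0, 2 * np.pi, 1000, endpoint=False))
assert abs(crest_factor(Z) - 2 ** 0.5) < 0.01   # sine: Cf = sqrt(2)
assert abs(skewness(Z)) < 0.01                  # sine is symmetric
```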

2.4. Diagnostic Process Based on Differentiated M-Training Algorithm

The M-training algorithm based on differentiated classifiers used in this paper is shown in Figure 3. It includes two RF classifiers, one SVM classifier, and one KNN classifier; the key parameters of the base classifiers are optimized by WSO. The pseudocode of the algorithm is given in Appendix A.
Step 1: Obtain the frequency response curves under different winding fault types and extract the characteristic information of the samples;
Step 2: Train the base classifiers with labeled samples and identify unlabeled samples with the auxiliary classifiers; when the classification results of the auxiliary classifiers are highly consistent, the sample is added to the labeled sample set; if no unlabeled samples qualify for addition, the labeled sample set is updated by random sampling;
Step 3: Train the main classifier with the updated labeled sample set;
Step 4: Let each base classifier act as the main classifier in turn, repeat Step 2, and stop training when none of the base classifiers changes any further;
Step 5: Recognize the test sample and take the voting result of the base classifiers as the final output.
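The steps above can be sketched as a simplified, single-round illustration with scikit-learn stand-ins for the four base classifiers. The full stopping criterion, WSO tuning, and subsampling of Section 2.1.1 are omitted for brevity, and the data are synthetic; this is a sketch of the workflow, not the paper’s implementation.

```python
# Hedged, one-round illustration of Steps 1-5 with sklearn stand-in classifiers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Step 1 stand-in: synthetic features instead of FRA-derived ones.
X, y = make_classification(n_samples=600, n_features=9, n_classes=3,
                           n_informative=6, random_state=0)
X_lab, y_lab, X_unl = X[:180], y[:180], X[180:420]   # labeled / unlabeled
X_test, y_test = X[420:], y[420:]

base = [RandomForestClassifier(random_state=0),
        RandomForestClassifier(random_state=1),
        SVC(), KNeighborsClassifier(n_neighbors=5)]

# Step 2: auxiliary classifiers pseudo-label samples they unanimously agree on.
for clf in base:
    clf.fit(X_lab, y_lab)
preds = np.array([clf.predict(X_unl) for clf in base])
agree = (preds == preds[0]).all(axis=0)
X_aug = np.vstack([X_lab, X_unl[agree]])
y_aug = np.concatenate([y_lab, preds[0][agree]])

# Steps 3-4 (single round): retrain each classifier on the expanded set.
for clf in base:
    clf.fit(X_aug, y_aug)

# Step 5: plurality vote on the test set (tie-breaking simplified here).
votes = np.array([clf.predict(X_test) for clf in base])
final = np.array([np.bincount(col).argmax() for col in votes.T])
print("ensemble accuracy:", (final == y_test).mean())
```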

3. Experimental Platform Construction and Model Training

To study the transformer winding fault identification method based on the frequency response method, this paper uses an experimental transformer with a voltage level of 10 kV; its internal structure is shown in Figure 4. From outside to inside there are 16 cakes of high-voltage windings, 16 cakes of low-voltage windings, and the iron core. Each cake of the high-voltage and low-voltage windings consists of 32 and 16 turns of coil, respectively, with pads separating the wires and the cakes.

3.1. Fault Simulation

(1) Axial displacement faults (AD)
The axial displacement fault is specifically characterized by an overall shift in the height of the windings at the fault location, which is simulated in this paper by adding pads between the windings, as shown in Figure 5a. The degree of failure of axial displacement of transformer windings can be quantified by Equation (20) [25].
$$DF = \frac{\Delta h}{H} \times 100\% \qquad (20)$$
where Δh is the height of the axial displacement of the winding, i.e., the thickness of the added pads in the fault simulation, and H is the overall height of the winding.
AD faults with severities of 1–5% were simulated at three winding locations: the top (between cakes 1 and 2), the middle (between cakes 8 and 9), and the bottom (between cakes 15 and 16), as set out in Table 1.
(2) Short circuit fault between cakes (SC)
For the fault simulation of SC, alligator clip wires are used to directly connect the copper lugs of different cakes, and different degrees of inter-cake SC are simulated by shorting different numbers of cakes, as shown in Figure 5b. Different degrees of SC are simulated at the top, middle, and bottom of the winding, as set out in Table 2.
(3) Bulging and warping faults (SCV)
When transformer windings twist, bulge, or otherwise deform, the spacing between the wires and cakes is reduced, increasing the longitudinal equivalent capacitance between the winding cakes [26]. Based on the calculation method in [26], the longitudinal equivalent capacitance of the model transformer is on the order of 100 pF. In this paper, different degrees of SCV are simulated by connecting capacitors ranging from 220 pF to 680 pF in parallel between the winding wires and cakes, as shown in Figure 5c; different degrees of SCV are simulated at the top, middle, and bottom of the windings, as set out in Table 3.

3.2. Test Results

Figure 6 shows the frequency response curves of AD under different fault degrees and locations. The amplitude in the 1–100 kHz band barely fluctuates; the curve in the 100–500 kHz band shifts downward as a whole, with the shift diminishing as frequency increases; and the curve in the 800–1000 kHz band fluctuates markedly, specifically as a decrease in the amplitude of the high-frequency peaks and valleys. The detailed view shows that as the degree of AD increases, the curve also tends to move toward the low-frequency band.
Figure 7 shows the frequency response curves of SC under different fault degrees and locations; the curves shift across the whole frequency range. In the 1–100 kHz band the curve shifts upward, and the valley near 10 kHz and the peak near 80 kHz move toward the high-frequency band. In the 100–500 kHz band the curve shifts upward overall, with the extent of the shift increasing with the degree of SC, which gives good resolution of the fault degree. In the 500–1000 kHz band the curve fluctuates strongly and, in addition to the upward shift, tends to move toward the high-frequency band.
Figure 8 shows the frequency response curves under different fault degrees and locations of winding SCV. When SCV occurs, the overall amplitude of the frequency response curve increases and the curve translates upward while moving toward the low-frequency band; as the fault degree grows, the movement toward the low-frequency band increases.
The distinct impacts of AD, SC, and SCV on the frequency response curves arise from their unique modifications of the winding’s equivalent electrical parameters. AD primarily alters longitudinal capacitance and mutual inductance, manifesting as a downward shift of the mid-frequency resonances. SC reduces inter-turn impedance, causing global upward shifts and high-frequency noise. SCV increases localized capacitance, leading to amplitude elevation and resonance damping. In summary, the frequency response curve differentiates fault types and degrees to some extent, but in field applications it is impossible to accurately identify transformer winding faults from waveform trends alone. It is therefore necessary to quantify the changes in the frequency response curve and analyze its subtle variations with machine learning methods to accurately identify the winding operation status.

3.3. Model Training Results

In this paper, a total of 1200 sample data were obtained across different fault types, locations, and levels. By fault type, there are 30 normal-state samples, 450 AD samples, 270 SC samples, and 450 SCV samples. By fault degree, there are 30 normal-state samples, 270 samples for each of the 1–3% fault degrees, and 180 samples for each of the 4% and 5% fault degrees. To reduce the complexity of the base classifiers and improve the generalization ability of the algorithm, two M-training models are used to identify fault type and fault degree, respectively. Under both divisions, the samples of each type are split into training and test sets in a 7:3 ratio; 3/7 of the training set are labeled samples and 4/7 are unlabeled samples.
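The partitioning above can be sketched with synthetic indices mirroring the 1200-sample set: a 70/30 train/test split, with 3/7 of the training set kept labeled and 4/7 treated as unlabeled. The shuffling seed and index construction are illustrative.

```python
# Hedged sketch of the 7:3 split with a 3/7 labeled / 4/7 unlabeled train set.
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(1200)               # synthetic sample indices
n_train = int(1200 * 0.7)                 # 840 training samples
train, test = idx[:n_train], idx[n_train:]
n_labeled = n_train * 3 // 7              # 360 labeled training samples
labeled, unlabeled = train[:n_labeled], train[n_labeled:]

assert len(test) == 360
assert len(labeled) == 360 and len(unlabeled) == 480
```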
WSO is used to optimize each base classifier to obtain the best key parameters, as shown in Table 4. Meanwhile, the convergence speed and final optimization effect of WSO are compared with those of Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA), as shown in Figure 9. Compared with PSO and GA, WSO reaches a lower error rate within 20 iterations for both the fault-type and fault-degree base classifiers, with classification error rates 2.97–15.41% lower than those of the comparison algorithms. This indicates that WSO offers better convergence speed and optimization effect for each base classifier.
Table 5 and Table 6 demonstrate the recognition effectiveness of the M-training algorithm with or without optimization of the base classifier using WSO, including the recognition effectiveness for fault type and fault degree. In terms of fault type and degree recognition, optimizing the base classifier can improve the total recognition accuracy by 8.92% and 8.17%, respectively. To validate the statistical significance of the accuracy improvements, we conducted paired t-tests and McNemar’s tests across 10 independent runs [27]. For fault type recognition, the mean improvement of 8.92% yielded t(9) = 12.74, p < 0.001, with a 95% confidence interval of 7.21–10.63%. Similarly, for severity recognition, the 8.17% improvement showed t(9) = 9.85, p < 0.001, and a confidence interval of 6.05–10.29%. McNemar’s tests further confirmed the robustness of these gains (χ2 > 14.2, p < 0.001). These results conclusively demonstrate that the WSO-optimized M-training algorithm achieves statistically and practically significant enhancements in diagnostic accuracy.
To verify whether the M-training algorithm can combine the advantages of its base classifiers, the M-training algorithm and each individual base classifier were trained on the training set (with all training samples labeled when training the individual base classifiers) and evaluated on the test set. The resulting fault-type and fault-degree recognition accuracies are listed in Table 7 and Table 8.
Redundancy from the two RF classifiers reduces the risk of overfitting to noise or outliers. By aggregating predictions from slightly divergent RF models, the ensemble becomes less sensitive to perturbations in specific subsets of the training data. Compared with a single RF classifier, the dual-RF configuration improves fault-type accuracy by 1.28% and fault-degree accuracy by 0.94%. The optimized RFs exhibit complementary error patterns, so misclassifications by one RF are often corrected by the other.
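The error-correction effect of aggregating decorrelated classifiers can be seen in a toy simulation (simulated label noise stands in for real RF/SVM models, and the error rates are illustrative only):

```python
import random

def vote(preds):
    """Majority vote over one sample's predictions; ties fall back to the
    first (most trusted) classifier, here standing in for RF1."""
    counts = {}
    for p in preds:
        counts[p] = counts.get(p, 0) + 1
    top = max(counts.values())
    winners = [p for p, c in counts.items() if c == top]
    return preds[0] if len(winners) > 1 else winners[0]

def noisy(labels, err, rng):
    """Simulated classifier: replaces each true label with a random class
    with probability err (errors independent across classifiers)."""
    return [t if rng.random() > err else rng.choice("ABCD") for t in labels]

rng = random.Random(0)
truth = [rng.choice("ABCD") for _ in range(1000)]
rf1 = noisy(truth, 0.08, rng)
rf2 = noisy(truth, 0.08, rng)   # slightly divergent second RF
svm = noisy(truth, 0.12, rng)
acc = lambda p: sum(a == b for a, b in zip(p, truth)) / len(truth)
ensemble = [vote(ps) for ps in zip(rf1, rf2, svm)]
print(round(acc(rf1), 3), round(acc(ensemble), 3))
```

Because the simulated errors are independent, the ensemble accuracy exceeds that of any single voter; correlated errors would shrink this gain, which is why the two RFs are configured with different hyperparameters.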
The M-training algorithm achieves above 98% accuracy in fault-type recognition and above 97% in fault-degree recognition, higher than any of its base classifiers. This shows that the M-training algorithm allows the abilities of multiple classifiers to complement one another, so each base classifier contributes its respective strengths and the integrated model attains higher fault-recognition accuracy than any single model.

3.4. Live Case Study

This paper takes a 110 kV power transformer of Yunnan Power Grid as the main research object, as shown in Figure 10; its specific parameters are listed in Table 9. Owing to a nearby substation, the signal-to-noise ratio (SNR) of the ambient noise at the site is approximately 15 dB. The high- and medium-voltage windings are star (wye) connected, and the low-voltage windings are delta connected. The high-, medium-, and low-voltage windings are tested separately. For the low-voltage side, the test scheme is: inject at phase A and measure at B, inject at B and measure at C, and inject at C and measure at A. For the high- and medium-voltage windings, the neutral point O is taken as the signal injection point and signals are collected at phases A, B, and C; the collected signals are named according to the file-naming provisions of IEC 60076-18:2012 [5].
A total of nine signals were obtained from the field test. In this paper, the M-training algorithm is verified using the signals from the medium-voltage-side windings of phases A, B, and C; the test results are shown in Figure 11.
To determine the specific fault condition of the winding, the characteristic quantities described earlier were calculated from the frequency response curves and input into the M-training algorithm. The result indicated a 5% AD fault in the C-phase winding on the medium-voltage side.
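One widely used characteristic of this kind is the correlation coefficient between a reference and a measured frequency response curve (shown here with hypothetical magnitude samples; the paper's exact feature set is defined in its earlier sections):

```python
import math

def correlation_coefficient(ref, test):
    """Pearson correlation between a reference FRA curve and a measured
    one; values near 1 indicate an undeformed winding, and lower values
    indicate growing deviation between the two signatures."""
    n = len(ref)
    mr, mt = sum(ref) / n, sum(test) / n
    num = sum((r - mr) * (t - mt) for r, t in zip(ref, test))
    den = math.sqrt(sum((r - mr) ** 2 for r in ref) *
                    sum((t - mt) ** 2 for t in test))
    return num / den

# hypothetical magnitude samples (dB) of a healthy vs. measured winding
ref = [-10.0, -12.5, -20.1, -15.3, -18.7, -25.2]
test = [-10.2, -12.9, -19.5, -16.1, -17.9, -26.0]
print(round(correlation_coefficient(ref, test), 4))
```

In a real assessment this would be computed per frequency band (low, medium, high), since different fault types distort different regions of the response.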
To verify the validity of the above analysis, a lifting-cover overhaul of the transformer was carried out on-site. Despite the ambient noise (SNR ≈ 15 dB) and load variation (40–85%), the diagnosis held: the overhaul revealed severe AD across 5 winding cakes in the upper part of the C-phase medium-voltage winding, as shown in Figure 12. The overhaul results are consistent with the analysis of the M-training algorithm, verifying the effectiveness of the proposed model.

4. Conclusions

This paper studies transformer winding fault identification and proposes a fault identification method based on the WSO-differentiated M-training algorithm. Data are acquired on a laboratory platform to train the established evaluation model, and the proposed method is then applied to a field transformer for analysis and validation. The conclusions are as follows:
1. Optimizing the base classifiers with WSO improves the accuracy of the M-training algorithm by 8.92% in fault-type identification and 8.17% in fault-degree identification.
2. The WSO-differentiated M-training algorithm proposed in this paper achieves accuracies of 98.33% for transformer fault-type identification and 97.17% for fault-degree identification, a significant improvement over any single classifier, indicating that the proposed method can accurately diagnose transformer winding faults.
3. In the field test of the 110 kV transformer, under actual noise and load variation, the method correctly identified a 5% AD fault in the C-phase winding on the medium-voltage side, and the overhaul results were consistent with the analysis of the M-training algorithm, verifying the validity of the proposed model.

Author Contributions

Conceptualization, G.Q. and K.Y.; methodology, J.H. and D.W.; validation, K.Y., J.H. and H.L.; formal analysis, S.H. and D.W.; investigation, D.Z.; resources, G.Q.; data curation, K.Y. and D.W.; writing—original draft preparation, W.D.; writing—review and editing, W.D.; visualization, H.W.; supervision, H.W.; project administration, K.Y.; funding acquisition, G.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Electric Power Research Institute of Yunnan Power Grid (Principle and system development of online monitoring of transformer winding faults under load-side multi-signal source excitation, YNKJXM20222300).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on reasonable request.

Conflicts of Interest

Guochao Qian, Kun Yang, Jin Hu, Hongwen Liu, Shun He, Dexu Zou, Weiju Dai, and Haozhou Wang are affiliated with Electric Power Research Institute of Yunnan Power Grid. The authors declare no conflicts of interest.

Appendix A

Pseudocode for the differentiated M-training classification optimized by the WSO algorithm:
Input: Labeled dataset L; Unlabeled dataset U; Base classifiers: RF1, RF2, SVM, KNN; WSO parameters: population size, max iterations; Voting threshold τ
Output: Final fault type/severity prediction for test samples
1. WSO-based Parameter Optimization
For each base classifier ci ∈ { RF1, RF2, SVM, KNN }:
a. Initialize the WSO population with random hyperparameters.
b. While iteration < max_iter: i. Evaluate fitness; ii. Update WSO positions using hunting and foraging mechanisms.
c. Select optimal hyperparameters θi* for ci.
2. Differentiated M-training Process
Initialize: L_pool = L, U_pool = U; Trained classifiers 𝒞 = {c1θ1*, c2θ2*, c3θ3*, c4θ4*}
Repeat until all ci stabilize (no changes in the labeled set):
For each ci𝒞 acting as the main classifier:
a. Label Propagation: i. Use auxiliary classifiers 𝒞\{ci} to predict labels for U_pool; ii. Select samples where ≥ τ auxiliary classifiers agree; iii. Add these samples to L_pool; remove them from U_pool.
b. Subsampling (if no new labels are added): Randomly select Si samples from L_pool.
c. Retrain ci using updated L_pool.
3. Final Voting Mechanism
For a test sample x:
a. Obtain predictions {y1, y2, y3, y4} from 𝒞.
b. Compute the majority vote; if a tie occurs, prioritize the predictions of RF1/RF2 (higher accuracy).
c. Return final prediction y_final.
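The pseudocode above can be sketched in Python as follows (classifier-agnostic; the toy `OneNN` base learner and the scalar features are illustrative stand-ins for the paper's optimized RF/SVM/KNN models):

```python
class OneNN:
    """Toy 1-nearest-neighbour base learner over scalar features."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self
    def predict(self, X):
        return [self.y[min(range(len(self.X)), key=lambda i: abs(self.X[i] - x))]
                for x in X]

class MTraining:
    """Minimal sketch of the differentiated M-training loop of Appendix A.
    `classifiers` are objects exposing fit(X, y) / predict(X); `tau` is the
    number of auxiliary classifiers that must agree before an unlabeled
    sample is promoted into the labeled pool."""
    def __init__(self, classifiers, tau=2):
        self.clfs, self.tau = classifiers, tau

    def fit(self, X_lab, y_lab, X_unlab):
        L, y, U = list(X_lab), list(y_lab), list(X_unlab)
        for c in self.clfs:
            c.fit(L, y)
        changed = True
        while changed and U:
            changed = False
            for main in self.clfs:          # each classifier takes the main role
                aux = [c for c in self.clfs if c is not main]
                keep = []
                for x in U:                 # label propagation by the auxiliaries
                    votes = [c.predict([x])[0] for c in aux]
                    top = max(set(votes), key=votes.count)
                    if votes.count(top) >= self.tau:
                        L.append(x); y.append(top); changed = True
                    else:
                        keep.append(x)
                U = keep
                main.fit(L, y)              # retrain the main classifier
        return self

    def predict(self, X):
        cols = zip(*(c.predict(X) for c in self.clfs))
        return [max(set(col), key=col.count) for col in cols]

# two well-separated clusters: labeled seeds plus unlabeled points
model = MTraining([OneNN() for _ in range(4)], tau=2)
model.fit([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1], [0.5, 10.5, 1.2])
print(model.predict([0.3, 10.8]))  # → [0, 1]
```

The subsampling step of the pseudocode (step 2b) is omitted here for brevity; it only applies when a round adds no new labels.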

References

  1. Hamzeh, M.; Vahidi, B. The Impact of Cyber Network Configuration on the Dynamic-thermal Failure of Transformers Considering Distributed Generator Controller. Int. J. Electr. Power Energy Syst. 2022, 137, 107786. [Google Scholar] [CrossRef]
  2. Yao, H.; Xu, Y.; Guo, Q.; Chen, S.; Lu, B.; Huang, Y. Study on Transformer Fault Diagnosis Based on Improved Deep Residual Shrinkage Network and Optimized Residual Variational Autoencoder. Energy Rep. 2025, 13, 1608–1619. [Google Scholar] [CrossRef]
  3. Zhao, X.; Yao, C.; Zhang, C.; Abu-Siada, A. Toward Reliable Interpretation of Power Transformer Sweep Frequency Impedance Signatures: Experimental Analysis. IEEE Electr. Insul. Mag. 2018, 34, 40–51. [Google Scholar] [CrossRef]
  4. Zhang, H.; Zhang, H.; Ma, Q.; Han, H.; Wang, S. Frequency Response Signature Analysis for Winding Mechanical Fault Detection of Power Transformer Using Sensitivity Method. Int. J. Appl. Electromagn. Mech. 2019, 61, 593–603. [Google Scholar] [CrossRef]
  5. IEC 60076-18:2012; Power Transformers—Part 18: Measurement of Frequency Response. IEC: Geneva, Switzerland, 2012.
  6. IEEE Standard C57.149-2012; IEEE Guide for the Application and Interpretation of Frequency Response Analysis for Oil-Immersed Transformers. IEEE: New York, NY, USA, 2013.
  7. Gustavsen, B.; Semlyen, A. Rational Approximation of Frequency Domain Responses by Vector Fitting. IEEE Trans. Power Deliv. 1999, 14, 1052–1061. [Google Scholar] [CrossRef]
  8. Samimi, M.; Tenbohlen, S.; Akmal, A.; Mohseni, H. Dismissing Uncertainties in the FRA Interpretation. IEEE Trans. Power Deliv. 2018, 33, 2041–2043. [Google Scholar] [CrossRef]
  9. Yousof, M.; Ekanayake, C.; Saha, T. Frequency Response Analysis to Investigate Deformation of Transformer Winding. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 2359–2367. [Google Scholar] [CrossRef]
  10. Zhao, Z.; Yao, C.; Saha, T.; Li, C.; Islam, S. Detection of Power Transformer Winding Deformation Using Improved FRA Based on Binary Morphology and Extreme Point Variation. IEEE Trans. Ind. Electron. 2018, 65, 3509–3519. [Google Scholar] [CrossRef]
  11. Zhou, L.; Jiang, J.; Zhou, X.; Wu, Z.; Lin, T.; Wang, D. Detection of Transformer Winding Faults Using FRA and Image Features. IET Electr. Power Appl. 2020, 14, 972–980. [Google Scholar] [CrossRef]
  12. Wang, G.; Qiu, S.; Xie, F.; Luo, T.; Song, Y.; Wang, S. Diagnosing Fault Types and Degrees of Transformer Winding Combining FRA Method With SOA-KELM. IEEE Access 2024, 12, 50287–50299. [Google Scholar] [CrossRef]
  13. Ghanizadeh, A.J.; Gharehpetian, G.B. ANN and Cross-correlation Based Features for Discrimination between Electrical and Mechanical Defects and Their Localization in Transformer Winding. IEEE Trans. Dielectr. Electr. Insul. 2014, 21, 2374–2382. [Google Scholar] [CrossRef]
  14. Wang, S.; Qiu, S.; Xie, F.; Yang, S.; Yu, K.; Li, T. Diagnosis of AD and DSV Winding Faults Based on FRA Method and Random Forest Algorithm. In Proceedings of the 4th IEEE International Conference on Electrical Materials and Power Equipment (ICEMPE), Shanghai, China, 7–10 May 2023. [Google Scholar] [CrossRef]
  15. Behkam, R.; Moradzadeh, A.; Karami, H.; Naderi, M.S.; Mohammadi-Ivatloo, B.; Gharehpetian, G.B.; Tenbohlen, S. Mechanical Fault Types Detection in Transformer Windings Using Interpretation of Frequency Responses via Multilayer Perceptron. J. Oper. Autom. Power Eng. 2023, 11, 11–21. [Google Scholar] [CrossRef]
  16. Hussan, U.; Wang, H.; Ayub, M.A.; Rasheed, H.; Majeed, M.A.; Peng, J.; Jiang, H. Decentralized Stochastic Recursive Gradient Method for Fully Decentralized OPF in Multi-Area Power Systems. Mathematics 2024, 12, 3064. [Google Scholar] [CrossRef]
  17. Ayub, M.A.; Hussan, U.; Rasheed, H.; Liu, Y.; Peng, J. Optimal energy management of MG for cost-effective operations and battery scheduling using BWO. Energy Rep. 2024, 12, 294–304. [Google Scholar] [CrossRef]
  18. Kearns, M.; Li, M. Learning in the Presence of Malicious Errors. Siam. J. Comput. 1993, 22, 807–837. [Google Scholar] [CrossRef]
  19. Goldman, S.; Zhou, Y. Enhancing Supervised Learning with Unlabeled Data. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford, CA, USA, 29 June–2 July 2000. [Google Scholar]
  20. Jia, P.; Huang, T.; Duan, S.; Gong, L.; Yan, J.; Wang, L. A Novel Semi-Supervised Electronic Nose Learning Technique: M-Training. Sensors 2016, 16, 370. [Google Scholar] [CrossRef]
  21. Efeoglu, E.; Tuna, G. Machine Learning for Predictive Maintenance: Support Vector Machines and Different Kernel Functions. J. Mach. Manuf. Reliab. 2022, 51, 447–456. [Google Scholar] [CrossRef]
  22. Xiong, W.; Yu, G.; Ma, J.; Liu, S. A Novel Robust Adaptive Subspace Learning Framework for Dimensionality Reduction. Appl. Intell. 2024, 54, 8939–8967. [Google Scholar] [CrossRef]
  23. Shabani, S.; Samadianfard, S.; Sattari, M.T.; Mosavi, A.; Shamshirband, S.; Kmet, T.; Varkonyi-Koczy, A.R. Modeling Pan Evaporation Using Gaussian Process Regression, K-Nearest Neighbors, Random Forest and Support Vector Machines; Comparative Analysis. Atmosphere 2020, 11, 66. [Google Scholar] [CrossRef]
  24. Ali, A.K. An Optimal Design for An Automatic Voltage Regulation System Using a Multivariable PID Controller Based on Hybrid Simulated Annealing-White Shark Optimization. Sci. Rep. 2024, 14, 30218. [Google Scholar] [CrossRef]
  25. Hashemnia, N.; Abu-Siada, A.; Islam, S. Improved Power Transformer Winding Fault Detection Using FRA Diagnostics–Part 1: Axial Displacement Simulation. IEEE Trans. Dielectr. Electr. Insul. 2015, 22, 556–563. [Google Scholar] [CrossRef]
  26. Abu-Siada, A.; Hashemnia, N.; Islam, S.; Masoum, M.A.S. Understanding Power Transformer Frequency Response Analysis Signatures. IEEE Electr. Insul. Mag. 2013, 29, 48–56. [Google Scholar] [CrossRef]
  27. Kebalepile, M.M.; Chakane, P.M. Commonly Used Statistical Tests and Their Application. S. Afr. J. Anaesth 2022, 28, S80–S84. [Google Scholar] [CrossRef]
Figure 1. Structure of the M-training algorithm.
Figure 2. Flowchart of white shark optimization algorithm.
Figure 3. Flowchart of M-training algorithm.
Figure 4. Internal structure of the transformer.
Figure 5. Transformer winding fault simulation.
Figure 6. Frequency response curve of transformer winding axial displacement faults.
Figure 7. Frequency response curve of transformer winding short circuit fault between cakes.
Figure 8. Frequency response curve of transformer winding bulging and warping faults.
Figure 9. Iterative curves of error rate for different algorithms optimized base classifiers.
Figure 10. 110 kV transformer field test.
Figure 11. Frequency response curve of the medium voltage side of the transformer.
Figure 12. Hanger overhaul results.
Table 1. Fault settings for axial displacement of windings.

Ordinal Number | Thread Cake Position | Degree of Displacement/%
1 | 1, 2 | 1
2 | 1, 2 | 2
3 | 1, 2 | 3
4 | 1, 2 | 4
5 | 1, 2 | 5
6 | 8, 9 | 1
7 | 8, 9 | 2
8 | 8, 9 | 3
9 | 8, 9 | 4
10 | 8, 9 | 5
11 | 15, 16 | 1
12 | 15, 16 | 2
13 | 15, 16 | 3
14 | 15, 16 | 4
15 | 15, 16 | 5
Table 2. Fault settings for short circuits between cakes of the windings.

Ordinal Number | Thread Cake Position
1 | 1, 2
2 | 1, 3
3 | 1, 4
4 | 6, 7
5 | 6, 8
6 | 6, 9
7 | 16, 15
8 | 16, 14
9 | 16, 13
Table 3. Fault settings for bulging and warping of windings.

Ordinal Number | Thread Cake Position | Capacitance/pF
1 | 1, 2 | 100
2 | 1, 2 | 220
3 | 1, 2 | 330
4 | 1, 2 | 470
5 | 1, 2 | 680
6 | 8, 9 | 100
7 | 8, 9 | 220
8 | 8, 9 | 330
9 | 8, 9 | 470
10 | 8, 9 | 680
11 | 15, 16 | 330
12 | 15, 16 | 470
13 | 15, 16 | 680
14 | 15, 16 | 100
15 | 15, 16 | 220
Table 4. Parameters of the base classifier after optimization.

Model | Base Classifier | Optimization Parameters
Fault Recognition M-training | RF1 | ntrees = 49, mtry = 7
Fault Recognition M-training | RF2 | ntrees = 58, mtry = 7
Fault Recognition M-training | SVM | δ = 1.3779, C = 1.4793
Fault Recognition M-training | KNN | K = 23
Degree Recognition M-training | RF1 | ntrees = 51, mtry = 7
Degree Recognition M-training | RF2 | ntrees = 47, mtry = 7
Degree Recognition M-training | SVM | δ = 3.2069, C = 10.3954
Degree Recognition M-training | KNN | K = 27
Table 5. Fault type recognition accuracy.

M-Training | Normal | AD | SC | SCV | Average (Accuracy/%)
Pre-optimization | 100 | 91.55 | 86.29 | 88.44 | 89.41
Post-optimization | 100 | 98.00 | 98.51 | 98.44 | 98.33
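The Average column is a sample-weighted mean rather than a simple mean of the per-class figures. Using the test-split class counts implied by the dataset description (30/450/270/450 samples with 30% held out, i.e., 9/135/81/135), both rows of Table 5 are reproduced:

```python
# test-split class counts: Normal, AD, SC, SCV (30% of 30/450/270/450)
counts = [9, 135, 81, 135]
pre = [100.0, 91.55, 86.29, 88.44]    # pre-optimization row of Table 5
post = [100.0, 98.00, 98.51, 98.44]   # post-optimization row of Table 5
wavg = lambda a: round(sum(c * x for c, x in zip(counts, a)) / sum(counts), 2)
print(wavg(pre), wavg(post))  # → 89.41 98.33
```

The same weighting reproduces the Average column of Table 6 with the degree-wise class counts (9/81/81/81/54/54).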
Table 6. Fault level recognition accuracy.

M-Training | Normal | 1% | 2% | 3% | 4% | 5% | Average (Accuracy/%)
Pre-optimization | 100 | 87.78 | 85.19 | 91.11 | 89.44 | 91.11 | 89.00
Post-optimization | 100 | 95.93 | 97.41 | 98.15 | 96.11 | 97.78 | 97.17
Table 7. Fault type recognition accuracy for each classifier.

Classifier | Normal | AD | SC | SCV | Average (Accuracy/%)
M-training | 100 | 98.00 | 98.51 | 98.44 | 98.33
RF | 100 | 93.55 | 93.33 | 91.33 | 92.83
RFs | 100 | 94.89 | 94.27 | 93.16 | 94.11
SVM | 100 | 89.55 | 91.85 | 86.22 | 89.08
KNN | 100 | 83.11 | 91.48 | 79.33 | 83.91
Table 8. Fault level recognition accuracy for each classifier.

Classifier | Normal | 1% | 2% | 3% | 4% | 5% | Average (Accuracy/%)
M-training | 100 | 95.93 | 97.41 | 98.15 | 96.11 | 97.78 | 97.17
RF | 100 | 91.85 | 94.44 | 93.70 | 90.56 | 91.67 | 92.83
RFs | 100 | 93.78 | 95.16 | 94.59 | 93.06 | 92.28 | 93.77
SVM | 100 | 90.37 | 89.63 | 87.04 | 89.44 | 88.33 | 89.25
KNN | 100 | 81.85 | 80.74 | 84.81 | 83.89 | 80.56 | 82.83
Table 9. 110 kV power transformer parameters.

Parameter | Data
Rated Capacity/kVA | 40,000
Connection Group Label | YNyn0d11
Rated Voltage/kV | 110/35/10
Rated Current/A | 363/1142/4000

Share and Cite

MDPI and ACS Style

Qian, G.; Yang, K.; Hu, J.; Liu, H.; He, S.; Zou, D.; Dai, W.; Wang, H.; Wang, D. Fault Diagnosis Method for Transformer Winding Based on Differentiated M-Training Classification Optimized by White Shark Optimization Algorithm. Energies 2025, 18, 2290. https://doi.org/10.3390/en18092290

