1. Introduction
In recent decades, electrical power generation resources have increased rapidly. Electrical power systems now include many different generation resources, especially renewable energy resources, which are often located far from the load centers for various reasons. The power transformer is one of the most important pieces of equipment in energy transmission systems. In the case of power transformer failures, utilities are subject to major economic losses, such as loss of revenue, and an electrical shortage at the end users causes the shutdown of industries, halting production and causing layoffs [1,2]. Therefore, transformer asset management has been extensively adopted to anticipate and avoid sudden failures.
Lately, the power transformer health index (HI) has become a suitable tool to combine current information about the power transformer. This information includes transformer operating observations, field tests, inspections, and laboratory testing [2]. The laboratory tests include three main tests. The first is dissolved gas analysis (DGA), which depends on seven gases: hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), carbon dioxide (CO2), and carbon monoxide (CO). The second is the oil quality (OQ) test, which depends on six factors of the transformer oil: dielectric strength or breakdown voltage (BD), interfacial tension (IFT), color, humidity (Mois.), insulation dissipation factor (DF), and acidity (Acid.). The third is the degree of polymerization (DP) test, which depends on the furfural content of the transformer oil. The output results of these three tests are combined to evaluate the overall value of the power transformer HI state. The power transformer HI state is used for planning routine maintenance, which affects the transformer's age and end of life [1,2].
Four main approaches have been introduced to evaluate the power transformer HI state: the scoring and ranking method [3,4,5,6]; the combination of scoring, ranking, and tier methods [7,8]; matrices [9]; and the multi-feature factor assessment model [10]. The scoring and ranking method is the most popular for evaluating the power transformer HI state. In this approach, the most common parameters used to assess the HI state are the DGA, OQ, and DP tests. The first step is determining each test's HI code or factor (HIC). The HICs for DGA, OQ, and DP are then standardized using the scoring values (4, 3, 2, 1, and 0). Finally, the HICs are combined into the final value of the power transformer HI state using a set of constant weighting factors. The main drawback of these approaches is the requirement for a high number of features (from 24 to 27), which implies high cost, effort, and testing time.
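The scoring-and-ranking calculation can be sketched as follows. The HIC scores (0–4) and the weighting factors used here are hypothetical placeholders for illustration, not the values of any specific standard or of this work:

```python
# Illustrative sketch of the scoring-and-ranking HI calculation: each test's
# HI code (HIC, scored 0..4) is combined with a constant weighting factor.
# The scores and weights below are hypothetical placeholders.

def health_index(hic_scores, weights, max_score=4):
    """Combine per-test HIC values (0..max_score) into a 0-100% health index."""
    weighted = sum(w * s for w, s in zip(weights, hic_scores))
    return 100.0 * weighted / (max_score * sum(weights))

# Example: assumed HIC scores for DGA, OQ, and DP with assumed weights.
hi = health_index(hic_scores=[4, 3, 2], weights=[5, 3, 2])
```

With these placeholder values, the index evaluates to 82.5% of the maximum attainable score; real schemes differ in their score tables and weights.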
Some researchers have used artificial intelligence approaches for predicting the power transformer HI state value [11,12,13,14,15,16]. In [11], a hybrid fuzzy-logic support vector machine (FLSVM) approach is used for indicating the HI of in-service transformers based on transformer test data. The FLSVM finds the relation between the insulation system test results and the output HI state value, and deals with the imbalanced dataset distribution caused by the small number of 'Poor' samples. The main drawback of this work is the accuracy of predicting the HI state, especially for the 'Poor' states. The researchers in [12] presented a general cost-effective ANN model to predict the power transformer HI state with approximately 95% accuracy using a subset of input features. The ANN model was tested with another dataset collected from another utility company, with about 89% prediction accuracy.
The main drawbacks are the low accuracy with new data and the absence of results on the prediction accuracy for the majority and minority classes. In [13], an assessment model of the transformer HI is developed based on the Neural-Fuzzy (NF) technique. Two datasets (an in-service assessment dataset and a Monte Carlo Simulation (MCS) dataset) are used to train the NF model. The results illustrate the high prediction accuracy of the NF model using the MCS dataset compared with that of scoring methods; still, it had poor detection accuracy with the in-service assessment dataset. The main drawbacks are the low detection accuracy and the lack of a solution for the poor data distribution between the different HI state classes. In [14], a machine learning (ML) model was presented for predicting the power transformer HI state. It used ML techniques such as ANN, decision tree, support vector machine, k-nearest neighbors, and random forest (RF) classification methods. The ML methods, especially the RF model, had high prediction accuracy for detecting the HI state. Moreover, feature-reduction techniques were used to reduce the number of input features in the ML models. However, the main drawbacks of this model were its poor detection accuracy for the minority class ('Poor' state) and the unaddressed imbalanced distribution between classes in the training dataset. In [15], fuzzy evidence fusion was presented to predict the transformer condition, introducing a more detailed fuzzy model for the transformer condition state. The main drawback of this model is the small number of cases that confirmed the accuracy of the suggested model (only 39 samples), with only 84.6% prediction accuracy. In [16], four optimized machine learning (ML) models were presented for predicting the transformer HI states, using 1365 dataset samples collected from two different electric utilities. High prediction accuracy was obtained with the four ML models (95.9% with the ensemble classification (EN) model). Feature reduction with the MRMR technique reduced the input features to only eight with good accuracy, especially with the EN model (95%). The drawbacks of this model are that the prediction model is a two-stage model requiring much time and effort to build, the detection accuracy for the minority class ('Poor' state) is low, and the imbalanced dataset distribution between classes is not addressed.
The main drawbacks of the previously suggested approaches for predicting the power transformer HI are the low overall prediction accuracy and, particularly, the low prediction accuracy for the minority HI class. The minority class is the 'Poor' class, which is the most important for predicting the transformer state, as misprediction of the 'Poor' class leads to fast failure of the power transformer. Therefore, correct prediction of the 'Poor' class is very important for the continued operation of the transformers and for planning the maintenance procedures that reduce power transformer failure rates in future periods.
This work proposes a new CNN model for power transformer HI state prediction. The proposed model uses the output results of the DGA, OQ, and DP tests as inputs to the CNN model, which predicts the final power transformer HI state. The imbalance between classes in the training data biases the trained model toward the majority class, which weakens the CNN model's prediction of the minority class. An oversampling generator is suggested to generate new samples for the minority class so that all classes have an equal number of samples. The prediction of the proposed CNN model is enhanced after applying the oversampling process to the training set: the overall prediction accuracy is improved from 89.92% without oversampling to 98.53% with oversampling.
The main contributions of this work are the enhancement of the prediction accuracy of the minority transformer HI state (which is very low in most previous works) and of the overall accuracy, using the CNN model and the suggested oversampling approach. Different feature-selection techniques are used to reduce the input features and thereby decrease the cost, effort, and time of testing. Five feature-selection approaches are compared, with the ReliefF technique proving the most effective at determining the most important parameters for predicting the HI state with the proposed CNN model. The robustness of the proposed CNN model is checked by applying uncertainty of up to ±25% to the input dataset, with the CNN model maintaining good prediction accuracy. The effectiveness of the proposed CNN model is verified by comparing its results with those of different optimized machine learning approaches and with recently published works, confirming the efficacy of the proposed CNN model.
This work is organized into four sections. Section 2 covers the mathematical analysis and system model, which consists of three subsections: power transformer HI state calculations, the suggested CNN model and solution procedure, and the suggested oversampling technique. Section 3 presents the results and discussion in three subsections: the oversampling approach, reduced-feature results, and the effectiveness of the CNN model. The conclusions are presented in Section 4.
3. Results and Discussion
The dataset samples were assembled from two regions. The first dataset (730 samples) was collected from the Gulf Region at the 66 kV medium-voltage sub-transmission stage. The second dataset (631 samples) was assembled from power transformers at the 220 kV transmission stage of the Malaysia Electricity Company. The two datasets are combined and then divided randomly into 65% (885 samples) for training and 35% (476 samples) for testing. The CNN model is built using MATLAB R2022a. The main training dataset samples (885 samples) are applied to the CNN model. The dataset samples are normalized before being inserted into the classification CNN model as follows:

x′_ij = (x_ij − x_j,min) / (x_j,max − x_j,min)  (13)

where x_ij is the ith dataset sample of the jth dataset feature, x_j,min is the minimum value of the jth dataset feature, and x_j,max is the maximum value of the jth dataset feature.
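A minimal sketch of this per-feature min-max normalization, assuming features are stored column-wise in a NumPy array:

```python
import numpy as np

# Per-feature min-max normalization as in Equation (13): each feature j is
# scaled by its own minimum and maximum over the samples.
def min_max_normalize(X):
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

# Toy example: two features with very different ranges.
X = np.array([[10.0, 0.1],
              [20.0, 0.3],
              [30.0, 0.5]])
Xn = min_max_normalize(X)
```

In practice, the minima and maxima computed on the training set should be reused to scale the testing samples, so that no information leaks from the test data.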
The testing dataset samples (476 samples), after applying the normalization of Equation (13), are used to check the CNN model prediction accuracy. The results of the CNN model during training and testing are presented in Table 3 and Table 4, respectively. The results illustrate that the prediction accuracy of the power transformer HI state is low, especially during the testing stage, due to the bias of the trained CNN model toward the majority class ('Good' state). The prediction accuracy of the majority class ('Good' state) is high (335/341 = 98.24%), while that of the minority class is very low (8/15 = 53.33%). The training dataset is also applied to optimized classification machine learning (ML) methods: decision tree (DT), discriminant analysis (DA), Naïve Bayes (NB), support vector machines (SVM), k-nearest neighbors (KNN), ensemble (EN), and artificial neural network (ANN). The ML methods are built using the MATLAB R2021a Classification Learner toolbox. Table 5 and Table 6 compare the proposed CNN model and the ML methods during the training and testing stages, respectively. The results illustrate that the accuracy of all ML methods for detecting the 'Good' state is better than that for the 'Fair' and 'Poor' states. The 'Poor' state (minority class) has a poor prediction accuracy compared to the 'Good' state (the majority class).
The accuracy of the CNN model and the other ML methods for the different stages (training, testing, and overall) is calculated as

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP and TN are the numbers of true positive and true negative samples, respectively, and FP and FN are the numbers of false positive and false negative samples, respectively. Other classification performance factors are presented for further comparison between the suggested CNN model and the different classification models:

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Precision = TP / (TP + FP)
F1-score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)
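These performance factors can be computed directly from a multiclass confusion matrix. The sketch below uses one-vs-rest counts per class; macro-averaging across classes is an assumption, since the paper does not state its averaging scheme:

```python
import numpy as np

# Per-class one-vs-rest metrics from a multiclass confusion matrix
# (rows = true class, columns = predicted class), macro-averaged.
def classification_metrics(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly predicted per class
    fp = cm.sum(axis=0) - tp         # predicted as class c but not class c
    fn = cm.sum(axis=1) - tp         # class c predicted as something else
    tn = cm.sum() - tp - fp - fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "sensitivity": sensitivity.mean(),
        "specificity": specificity.mean(),
        "precision": precision.mean(),
        "f1": f1.mean(),
    }

# Confusion matrix of the CNN model during testing without oversampling (Table 4).
m = classification_metrics([[335, 6, 0],
                            [30, 85, 5],
                            [0, 7, 8]])
```

Applied to the Table 4 confusion matrix, these macro averages reproduce the CNN row of Table 6 (accuracy 89.92%, sensitivity ≈ 0.74, specificity ≈ 0.91, precision ≈ 0.80, F1 ≈ 0.77).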
Table 3.
Confusion matrix of the CNN model during the training process without data oversampling.
| True Class \ Predicted | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| Good | 659 | 3 | 0 | 99.55 |
| Fair | 3 | 200 | 1 | 98.04 |
| Poor | 0 | 0 | 19 | 100 |
| Overall | | | | 99.21 |
Table 4.
Confusion matrix of the CNN model during the testing process without data oversampling.
| True Class \ Predicted | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| Good | 335 | 6 | 0 | 98.24 |
| Fair | 30 | 85 | 5 | 70.83 |
| Poor | 0 | 7 | 8 | 53.33 |
| Overall | | | | 89.92 |
Table 5.
Comparison between the results of the CNN model and the other methods during the training stage.
| HI | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| TSN * | 662 | 204 | 19 | |
| DT | 639 | 172 | 8 | 92.54 |
| DA | 650 | 158 | 11 | 92.54 |
| NB | 626 | 167 | 11 | 90.85 |
| SVM | 649 | 182 | 7 | 94.69 |
| KNN | 654 | 170 | 8 | 94.01 |
| EN | 653 | 187 | 4 | 95.37 |
| ANN | 649 | 175 | 7 | 93.90 |
| CNN | 659 | 200 | 19 | 99.21 |
Table 6.
Comparison between the results of the CNN model and the other methods during the testing stage.
| HI | Good | Fair | Poor | Sensitivity | Specificity | Precision | F1-Score | % Accuracy |
|---|---|---|---|---|---|---|---|---|
| TSN | 341 | 120 | 15 | | | | | |
| DT | 332 | 70 | 4 | 0.61 | 0.88 | 0.64 | 0.62 | 85.29 |
| DA | 338 | 65 | 5 | 0.62 | 0.87 | 0.70 | 0.65 | 85.71 |
| NB | 323 | 85 | 8 | 0.73 | 0.90 | 0.74 | 0.74 | 87.39 |
| SVM | 330 | 93 | 3 | 0.65 | 0.92 | 0.75 | 0.68 | 89.50 |
| KNN | 339 | 75 | 3 | 0.61 | 0.88 | 0.83 | 0.66 | 87.61 |
| EN | 331 | 82 | 2 | 0.60 | 0.88 | 0.89 | 0.63 | 87.18 |
| ANN | 332 | 70 | 4 | 0.66 | 0.91 | 0.81 | 0.71 | 89.92 |
| CNN | 335 | 85 | 8 | 0.74 | 0.91 | 0.80 | 0.77 | 89.92 |
3.1. Oversampling Technique
The imbalance of training dataset samples between classes biases the trained model toward the majority class, so the detection accuracy of the minority class is very poor. The training set distribution between the different HI state classes is highly imbalanced: 662 samples for the 'Good' state, 204 for the 'Fair' state, and only 19 for the 'Poor' state. This imbalance can be addressed using oversampling or undersampling. This work applies oversampling to the training dataset samples to balance the training set classes.
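A minimal random-oversampling sketch that duplicates minority-class samples (with replacement) until every class matches the majority-class count. The paper's generator may synthesize new samples rather than merely resample, so this is illustrative only:

```python
import numpy as np

# Random oversampling: for each minority class, draw samples with replacement
# until its count equals the majority-class count.
def oversample(X, y, rng=None):
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_out, y_out = [X], [y]
    for cls, count in zip(classes, counts):
        deficit = target - count
        if deficit > 0:
            idx = rng.choice(np.where(y == cls)[0], size=deficit, replace=True)
            X_out.append(X[idx])
            y_out.append(y[idx])
    return np.concatenate(X_out), np.concatenate(y_out)

# Class counts as in the training set: 662 'Good', 204 'Fair', 19 'Poor'.
y = np.array([0] * 662 + [1] * 204 + [2] * 19)
X = np.zeros((len(y), 1))          # placeholder features for the sketch
Xb, yb = oversample(X, y, rng=0)
```

After this step each class contains 662 samples, matching the balanced distribution shown in Figure 5.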
The numbers of training samples of the three HI states before and after the oversampling process are shown in Figure 5. It illustrates the imbalanced distribution of training samples before the oversampling process and the equal distribution after applying the suggested oversampling procedure.
Figure 6 illustrates the prediction accuracy and loss versus the iteration number during the training attempts after oversampling the training dataset to the CNN model. The results demonstrate that the training accuracy is near one hundred percent, while the loss is very low and close to zero, showing a good training accuracy of the suggested CNN model.
The CNN loss can be expressed as follows:

Loss = −(1/n) Σ_{i=1}^{n} Σ_{j=1}^{C} y_ij log(ŷ_ij)

where n is the number of samples, C is the number of classes, y_ij represents the probability that the ith sample belongs to the jth class, and ŷ_ij is the output of the CNN softmax layer for dataset sample i in class j.
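This loss is the standard categorical cross-entropy over the softmax outputs and can be sketched as:

```python
import numpy as np

# Categorical cross-entropy: y_true is one-hot (n x C), y_prob is the
# softmax output (n x C). Probabilities are clipped to avoid log(0).
def cross_entropy(y_true, y_prob, eps=1e-12):
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_prob), axis=1))

# Two samples, three classes (e.g., Good / Fair / Poor).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_prob = np.array([[0.9, 0.05, 0.05],
                   [0.2, 0.7, 0.1]])
loss = cross_entropy(y_true, y_prob)
```

The loss approaches zero as the softmax probability assigned to each true class approaches one, which matches the near-zero training loss reported in Figure 6.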
Table 7 presents the CNN hyperparameters used for classifying the power transformer HI state; they are selected to give good classification performance. Table 8 presents the prediction accuracy on the training dataset after applying the oversampling process. The prediction accuracy of the 'Poor' state is 100%, while those of the 'Good' and 'Fair' states are 99.85% and 97.73%, respectively, and the overall accuracy is 99.19%. Table 9 shows the prediction accuracy of the CNN model after the oversampling process on the testing dataset (476 samples). The prediction accuracies of the 'Poor' and 'Fair' states are 100% and 99.17%, respectively, while those of the model trained without oversampling are 53.33% and 70.83%, respectively. Moreover, the overall accuracy of the CNN model with the oversampling process is enhanced to 98.53%, compared to 89.92% for the CNN model without the oversampling process.
Table 7.
CNN model selected parameters.
| | Parameter | Value |
|---|---|---|
| Convolution Layer 1 | Filter size | 1 |
| | Number of filters | 32 |
| | Padding | 0 |
| Convolution Layer 2 | Filter size | 1 |
| | Number of filters | 175 |
| | Padding | 0 |
| Max-Pooling Layer | Stride | 1 |
| Fully Connected Layer | Outputs | 3 |
| Learning Algorithm Options | Step size, α | 10⁻³ |
| | Gradient threshold | 0.001 |
| | Training algorithm | Adam |
| | Max. epochs | 150 |
| | Verbose | 1 |
| | Activation | softmax |
| | CNN type | classification |
Table 8.
Confusion matrix of the CNN model during the training process.
| True Class \ Predicted | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| Good | 661 | 1 | 0 | 99.85 |
| Fair | 15 | 647 | 0 | 97.73 |
| Poor | 0 | 0 | 662 | 100 |
| Overall | | | | 99.19 |
Table 9.
Confusion matrix of the CNN model during the testing process.
| True Class \ Predicted | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| Good | 335 | 6 | 0 | 98.24 |
| Fair | 1 | 119 | 0 | 99.17 |
| Poor | 0 | 0 | 15 | 100 |
| Overall | | | | 98.53 |
Figure 7 compares the actual transformer HI states (Good, Fair, and Poor) against the predicted HI states to illustrate the prediction accuracy of the suggested CNN model during the testing process with the 476 dataset samples. The results demonstrate that the proposed CNN model has excellent prediction accuracy: 98.24%, 99.17%, and 100% for the 'Good', 'Fair', and 'Poor' states, respectively.
Figure 8 compares the CNN model's prediction accuracy of the power transformer HI state on the testing dataset (476 samples) with and without oversampling. It illustrates the enhancement of the HI state prediction after applying the oversampling process, especially for the 'Poor' and 'Fair' states.
After the oversampling process, the dataset is used to train the optimized ML models. The training process of the different ML models is presented in Figure 9.
Figure 7.
HI state prediction of the suggested CNN model during the testing process after oversampling.
Figure 8.
Comparison between the fault prediction of the CNN model during the testing stage without and with the oversampling process.
Figure 9.
Comparison among the ML classification methods against the iteration number during the optimization process.
Compared to the other models, the EN model has the minimum error during the training stage, while the NB model has the highest error. Five-fold cross-validation was used for training the optimized ML models. The optimization option in the Classification Learner toolbox is applied to select the suitable classification model and the matching parameters of the chosen methods. This work uses Bayesian optimization (BO) with the ML methods to determine their optimal parameters; the BO approach is useful for optimization problems and can be used with most ML techniques for optimal parameter selection [29,30,31]. The training parameters of the different ML models are introduced in Table 10.
Table 11 compares the results of the CNN model and the other ML methods after applying the oversampling process to the training dataset samples. It lists the number of correctly predicted samples for each power transformer HI state and the overall accuracy for the CNN and ML methods during the training stage. The results illustrate the high prediction accuracy of all models compared to those trained without oversampling. The results also demonstrate the effectiveness of the CNN model (overall accuracy of 99.19%) compared to the other ML methods (the best ML method, EN, achieves 97.94%).
Table 12 compares the results of the CNN model and the other ML methods (trained after applying the oversampling process) on the testing dataset samples. It introduces the number of correctly predicted samples for each power transformer HI state and the prediction performance factors for the CNN and ML methods during the testing stage. The results illustrate the high prediction accuracy of all models compared to those without the oversampling process (Table 6). The CNN model prediction accuracy is enhanced to 98.53%, compared to 89.92% for the model without oversampling. Moreover, all the other ML models also perform better than their counterparts trained without oversampling. The results again demonstrate the effectiveness of the CNN model (overall accuracy of 98.53%) compared to the other ML methods (the best ML method, SVM, achieves 96.43%).
Table 10.
Optimal parameters of the ML methods during optimization with the training dataset after oversampling processes.
| Method | Optimization Parameters |
|---|---|
| DT | Max. no. of splits: 120; split criterion: Twoing rule |
| DA | Discriminant type: Quadratic |
| NB | Distribution: Kernel; kernel type: Gaussian |
| SVM | Multiclass method: One-vs-All; box constraint level: 985.7716; kernel function: Cubic |
| KNN | Number of neighbors: 991; distance metric: Cosine; distance weight: Squared inverse; standardize data: true |
| EN | Ensemble method: AdaBoost; number of learners: 140; learning rate: 0.9897; max. number of splits: 18 |
| ANN | Fully connected layers: 2; activation: Sigmoid; standardize data: No; regularization strength (λ): 5.1411 × 10⁻⁹; first layer size: 138; second layer size: 248 |
Table 11.
Comparison between the results of the CNN model and the other methods during the training stage after the oversampling process.
| HI | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| SN | 662 | 662 | 662 | |
| DT | 627 | 639 | 647 | 96.32 |
| DA | 623 | 502 | 649 | 89.33 |
| NB | 460 | 551 | 642 | 83.23 |
| SVM | 638 | 638 | 652 | 97.08 |
| KNN | 639 | 618 | 655 | 96.27 |
| EN | 652 | 647 | 646 | 97.94 |
| ANN | 632 | 639 | 656 | 97.03 |
| CNN | 661 | 647 | 662 | 99.19 |
Table 12.
Comparison between the results of the CNN model and the ML methods during the testing stage after the oversampling process.
| HI | Good | Fair | Poor | Sensitivity | Specificity | Precision | F1-Score | % Accuracy |
|---|---|---|---|---|---|---|---|---|
| SN | 341 | 120 | 15 | | | | | |
| DT | 303 | 120 | 15 | 0.96 | 0.96 | 0.92 | 0.93 | 92.02 |
| DA | 324 | 81 | 15 | 0.88 | 0.91 | 0.77 | 0.80 | 88.24 |
| NB | 259 | 105 | 15 | 0.88 | 0.90 | 0.71 | 0.76 | 79.62 |
| SVM | 325 | 119 | 15 | 0.98 | 0.98 | 0.96 | 0.97 | 96.43 |
| KNN | 318 | 120 | 15 | 0.98 | 0.98 | 0.95 | 0.96 | 95.17 |
| EN | 272 | 120 | 15 | 0.93 | 0.94 | 0.88 | 0.89 | 85.50 |
| ANN | 322 | 120 | 15 | 0.98 | 0.98 | 0.95 | 0.97 | 96.01 |
| CNN | 335 | 119 | 15 | 0.99 | 0.99 | 0.98 | 0.99 | 98.53 |
3.2. Reduced-Feature Results
This section presents the results of the reduced-features CNN model. The selected features are carried out based on five feature-reduction techniques. These methods are MRMR, Chi2, RelifF, and Kruskal–Wallis. The minimum number of features that give good predicting results is eight, like that presented in [
16].
The rankings of the eight highest-ranked features under the different feature-reduction techniques are shown in Table 13. The training scores of the eight most important features for the MRMR, ReliefF, ANOVA, and Kruskal–Wallis approaches are shown in Figure 10.
The CNN model with the oversampling process is trained on the eight highest-ranked features from each of the five feature-reduction approaches. Table 14 presents the prediction accuracy of the CNN model for each power transformer HI state and overall, for each feature-reduction technique, during the training stage, and Table 15 presents the corresponding accuracies during the testing stage. Both tables illustrate the effectiveness of the ReliefF technique compared to the other feature-reduction techniques.
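As one example of these ranking criteria, the per-feature one-way ANOVA F-statistic used by the ANOVA technique can be sketched in plain NumPy (library implementations differ in details such as tie handling and p-value computation):

```python
import numpy as np

# One-way ANOVA F-score per feature: ratio of between-class to within-class
# variance. Features with the largest F separate the classes best and are kept.
def anova_f_scores(X, y):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    n, k = len(y), len(classes)
    overall_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Feature 0 separates the two classes; feature 1 is pure within-class noise.
X = np.array([[0.0, 1.0], [0.1, -1.0], [5.0, 1.0], [5.1, -1.0]])
y = np.array([0, 0, 1, 1])
scores = anova_f_scores(X, y)
```

Ranking the features by descending score and keeping the top eight mirrors the reduction step applied before retraining the CNN model.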
Figure 10.
High-ranked eight features of the MRMR, ReliefF, ANOVA, and Wallis methods.
Table 13.
High-ranked eight features corresponding to the five feature-reduction techniques.
| MRMR | Chi2 | ReliefF | ANOVA | Wallis |
|---|---|---|---|---|
| DF | DF | IF | IF | DF |
| Furan | Furan | BDV | Color | IF |
| IF | IF | Mois | DF | Color |
| Color | Color | Acid | Acid | Acid |
| Acid | Acid | C2H2 | Furan | C2H4 |
| CO2 | CO2 | CO | Mois | CO2 |
| C2H2 | C2H4 | C2H6 | CO2 | Mois |
| C2H4 | C2H2 | CH4 | CO | H2 |
Table 14.
Prediction accuracy corresponds to the five feature-reduction techniques against each transformer HI state during the training stage.
| HI State | MRMR | Chi2 | ReliefF | ANOVA | Wallis |
|---|---|---|---|---|---|
| Good | 97.89 | 97.89 | 97.89 | 97.89 | 94.11 |
| Fair | 87.31 | 87.31 | 96.68 | 96.68 | 94.86 |
| Poor | 99.7 | 99.7 | 100 | 100 | 99.7 |
| All | 94.96 | 94.96 | 98.19 | 98.19 | 96.22 |
Table 15.
Prediction accuracy corresponds to the five feature-reduction techniques against each transformer HI state during the testing stage.
| HI State | MRMR | Chi2 | ReliefF | ANOVA | Wallis |
|---|---|---|---|---|---|
| Good | 95.01 | 95.01 | 94.13 | 84.75 | 90.32 |
| Fair | 88.33 | 88.33 | 98.33 | 96.67 | 95.83 |
| Poor | 100 | 100 | 100 | 100 | 100 |
| All | 93.49 | 93.49 | 95.38 | 88.24 | 92.02 |
3.3. Effectiveness of the CNN Model
Two methods measure the effectiveness of the proposed CNN model: the first applies uncertainty to the input dataset inserted into the CNN model, while the second compares the results of the suggested model with recently published works.
3.3.1. CNN Model with Uncertainty
The datasets of the power transformers are collected offline in three major steps: obtaining oil samples from the power transformers, extracting the gases from the transformer oil, and detecting the power transformer HI state. Special syringes are used to extract the oil samples, which are stored and transported to laboratories. Storage time and temperature affect the measured gas concentrations, and air bubbles are the most critical factor affecting them [32]: air bubbles decrease the dissolved gases because gases diffuse from the oil into the bubbles [33]. Hence, uncertainty during the measurement process affects the detection of the power transformer HI state and must be considered by the classification methods used for this purpose. An uncertainty noise of about ±14% is produced by temperature effects and sample storage, and up to ±5% by the accuracy of the measurement process [34]. This study considers uncertainty noise of up to ±25%.
The uncertainty is applied to each testing sample x to generate a new sample x_new with a selected uncertainty level of up to ±25%, using the following equation adapted from [35]:

x_new = x ⊙ (1 + δ(2r − 1))

where δ is the maximum uncertainty level (up to ±25%), r is a 14 × 1 random vector with component values between 0 and 1, and ⊙ denotes element-by-element multiplication. For uncertainty noise levels of ±5% to ±25% in steps of ±5%, the original input feature vector and the noise factor are multiplied element by element to obtain datasets with uncertainty noise. These noisy datasets are fed into the proposed CNN model to measure its prediction performance under uncertainty.
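A sketch of this noise-injection step, assuming the multiplicative form above and the 14-element feature vector of this work:

```python
import numpy as np

# Inject multiplicative uncertainty noise into a feature vector: each element
# is perturbed by up to ±delta (e.g., delta = 0.25 for a ±25% level).
def add_uncertainty(x, delta, rng=None):
    rng = np.random.default_rng(rng)
    r = rng.random(len(x))                 # random vector in [0, 1)
    return x * (1.0 + delta * (2.0 * r - 1.0))

x = np.ones(14)                            # 14 input features, as in this work
x_noisy = add_uncertainty(x, delta=0.25, rng=0)
```

Repeating this for δ = 0.05, 0.10, ..., 0.25 yields the family of noisy testing datasets evaluated in Tables 16 and 17.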
Table 16 and Table 17 present the prediction accuracy for the different transformer HI states and the overall accuracy of the proposed CNN model against uncertainty from 0 up to ±25% during the testing stage, with full and reduced features, respectively. The results illustrate the robustness of the proposed model: the suggested CNN model's accuracy remains satisfactory under uncertainty noise of up to ±25%.
Figure 11 compares the overall accuracy of the CNN model with full and reduced-feature scenarios with the uncertainty of the input dataset up to ±25%. The results illustrate the robustness of the suggested model against uncertainty noise up to ±25%.
Figure 12 compares the overall accuracy of the CNN model with other ML learning models for the full-feature scenario with the uncertainty of the input dataset up to ±25%. The results illustrate the robustness of the suggested model against uncertainty noise up to ±25% compared to the other ML learning models.
Figure 13 compares the overall accuracy of the CNN model with other ML learning models for the reduced-feature scenario with the uncertainty of the input dataset up to ±25%. The results illustrate the robustness of the suggested model against uncertainty noise up to ±25% compared to the other ML learning models.
Figure 11.
Comparison between full and reduced-feature scenarios under uncertainty levels up to ±25%.
Figure 12.
Comparison between CNN model and other ML methods with full-feature scenarios based on the 476-testing dataset.
Figure 13.
Comparison between CNN model and other ML methods with reduced-feature scenarios based on the 476-testing dataset.
3.3.2. Comparisons with Recently Published Works
Table 18 compares the results obtained by the proposed CNN model with those presented in [14,16] for both the full-feature and reduced-feature scenarios. The comparison is based on dataset system 2 (Gulf Region), using the DT, SVM, KNN, and EN methods of [16] and the NN, MLR, J48, and RF methods of [14]. The proposed CNN model demonstrates higher accuracy than the techniques in [14,16] for both scenarios. For the full-feature scenario, the proposed CNN model achieves 98.4% accuracy, whereas the highest accuracies in [16] and [14] are 96.7% (EN model) and 96.6% (RF model), respectively. For the reduced-feature scenario, the proposed CNN model, with an accuracy of 96.9%, also outperforms the methods in [14,16].
4. Conclusions
The power transformer HI state was studied based on the results of three tests: dissolved gas analysis (DGA), oil quality (OQ), and degree of polymerization (DP). The power transformer HI state prediction was carried out using 1361 dataset samples collected from two different regions (730 samples from the Gulf Region and 631 samples from a Malaysian utility). The proposed CNN model was implemented to predict and diagnose the power transformer HI state. The imbalance between the training dataset classes produced high detection accuracy for the class with the majority of samples but low detection accuracy for the minority class. An oversampling approach was used to balance the training dataset samples and enhance the prediction accuracy of the classification methods. After applying the oversampling approach to the training dataset, the prediction accuracy of the proposed CNN model on the testing dataset was enhanced to 98.53%, compared to 89.92% without the oversampling process. The results of the proposed CNN model were compared with those of the optimized ML classification methods (DT, DA, NB, SVM, KNN, EN, and ANN) trained with the oversampling process, confirming the superiority of the CNN results: the prediction accuracies on the testing dataset were 92.02%, 88.24%, 79.62%, 96.43%, 95.17%, 85.50%, 96.01%, and 98.53% for the DT, DA, NB, SVM, KNN, EN, ANN, and proposed CNN models, respectively. Five feature-reduction techniques (MRMR, Chi2, ReliefF, ANOVA, and Kruskal–Wallis) were applied with the proposed CNN model to reduce the number of input features to only eight, minimizing the cost, time, and effort of testing. The results of the proposed CNN model were again superior to those of the ML classification methods: with the ReliefF reduced-feature approach, the prediction accuracies on the testing dataset were 93.70%, 79.41%, 85.08%, 92.23%, 95.17%, 95.38%, 93.70%, and 95.38% for the DT, DA, NB, SVM, KNN, EN, ANN, and proposed CNN models, respectively.

Furthermore, the proposed CNN model was checked under uncertainty noise of up to ±25% with both full and reduced features, maintaining good prediction of the power transformer HI state. Finally, the proposed model's results were compared with those of recently published works, confirming its efficacy for both the full-feature and reduced-feature approaches. The main contribution of this work is the enhancement of the prediction accuracy of the minority transformer HI state and of the overall accuracy using the CNN model and the suggested oversampling approach.