Article

Power Transformers Health Index Enhancement Based on Convolutional Neural Network after Applying Imbalanced-Data Oversampling

by
Ibrahim B. M. Taha
Department of Electrical Engineering, College of Engineering, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
Electronics 2023, 12(11), 2405; https://doi.org/10.3390/electronics12112405
Submission received: 7 April 2023 / Revised: 20 May 2023 / Accepted: 22 May 2023 / Published: 25 May 2023

Abstract

The transformer health index (HI) concept is an important asset-management tool used for the condition assessment and ranking of power transformers. The HI state is estimated from many power transformer oil parameters; the main difficulty of the HI procedure as a diagnostic method is that it requires routine measurements and accurate test results. In this work, the power transformer HI is predicted using 1361 dataset samples collected from two different utilities. The proposed model predicts and diagnoses the HI state of the power transformer using a convolutional neural network (CNN) approach. An imbalance between the training dataset classes yields good prediction of the class with the major number of samples but low detection of the class with the minor number of samples. An oversampling approach is used to balance the training samples and thus enhance the prediction accuracy of the classification methods. The proposed CNN model predicts the HI of the power transformers after the oversampling approach is applied to the training dataset samples. The results obtained with the proposed CNN model are compared with those obtained with optimized machine learning (ML) classification methods, and the CNN results are superior. Feature reduction is applied to minimize testing time, effort, and cost. Finally, the proposed CNN model is checked against uncertainty noise of up to ±25% in both the full and reduced feature sets, and it retains a good prediction diagnosis of the power transformer HI.

1. Introduction

In recent decades, electrical power generation resources have increased rapidly. Electrical power systems now include diverse generation resources, especially renewable energy resources, and these resources are often located far from the load centers. The power transformer is one of the most important pieces of equipment in energy transmission systems. In the case of power transformer failures, utilities suffer major economic losses such as lost revenue, while an electrical shortage at the end users shuts down industries, halts production, and causes layoffs [1,2]. Therefore, transformer asset management has been extensively adopted to anticipate and avoid sudden failures.
Lately, the power transformer health index (HI) state has become a suitable tool for combining current information about the power transformer. This information includes transformer operating observations, field tests, inspections, and laboratory testing [2]. The laboratory tests include three main tests. The first test is the dissolved gas analysis (DGA), which depends on seven gases (hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), carbon dioxide (CO2), and carbon monoxide (CO)). The second test is the oil quality (OQ) test, which depends on six factors of the transformer oil (the oil dielectric strength or breakdown voltage (BD), oil interfacial tension (IFT), oil color, oil humidity (Mois.), oil insulation dissipation factor (DF), and oil acidity (Acid.)). The third test is the depolarization (DP) test, which depends on the furfural content of the transformer oil. The output results of these three tests are combined to evaluate the overall value of the power transformer HI state. The power transformer HI state is used for planning routine maintenance, which affects the transformer's age and end of life [1,2].
Four main approaches have been introduced to evaluate the power transformer HI state: the scoring and ranking method [3,4,5,6]; the combination of scoring, ranking, and tier methods [7,8]; matrices [9]; and the multi-feature factor assessment model [10]. The scoring and ranking method is the most widely used for evaluating the power transformer HI state. In this approach, the most common parameters for assessing the HI state are the DGA, OQ, and DP tests. The first step is determining the HI code or factor (HIC) of each test. The HICs for DGA, OQ, and DP are then standardized using the scoring and weighting values (4, 3, 2, 1, and 0). Finally, the HICs are combined into the final value of the power transformer HI state using constant weighting factors ( K_j ). The main drawback of these approaches is that they require a large number of features (24 to 27), which entails high cost, effort, and testing time.
Some researchers have used artificial intelligence approaches for predicting the power transformer HI state value [11,12,13,14,15,16]. In [11], a hybrid fuzzy-logic support vector machine (FLSVM) approach is used to indicate the HI of in-service transformers based on transformer test data. The FLSVM finds the relation between the output results of the insulation system tests and the output HI state value, and it deals with the imbalanced dataset distribution caused by the small number of 'Poor' samples. The main drawback of this work is its accuracy in predicting the HI state, especially the 'Poor' states. The researchers in [12] presented a general cost-effective ANN model that predicts the power transformer HI state with approximately 95% accuracy using a subset of input features. The ANN model was tested with another dataset collected from another utility company, achieving about 89% prediction accuracy.
The main drawbacks are the low prediction accuracy with new data and the absence of any results on the prediction accuracy for the majority and minority classes. In [13], an assessment model of the transformer HI is developed based on the Neural-Fuzzy (NF) technique. Two datasets (an in-service assessment dataset and a Monte Carlo Simulation (MCS) dataset) are used for training the NF model. The results show high prediction accuracy of the NF model on the MCS dataset compared with scoring methods, but poor detection accuracy on the in-service assessment dataset. The main drawbacks are the low detection accuracy and the lack of any remedy for the poor data distribution between the different HI state classes. In [14], a machine learning (ML) model was presented for predicting the power transformer HI state using ML techniques such as ANN, decision tree, support vector machine, k-nearest neighbors, and random forest (RF) classification methods. The ML methods, especially the RF model, had high prediction accuracy for detecting the HI state, and feature-reduction techniques were used to reduce the number of input features in the ML models. However, the main drawbacks of this model were its poor detection accuracy on the minority class ('Poor' state) and its failure to address the imbalanced distribution between classes in the training dataset. In [15], a fuzzy evidence fusion method was presented to predict the transformer condition, introducing a more detailed fuzzy model of the transformer condition state. The main drawback of this model is the small number of cases that confirmed its accuracy (only 39 samples), with only 84.6% prediction accuracy. In [16], four optimized machine learning (ML) models were presented for predicting the transformer HI states using 1365 dataset samples collected from two different electric utilities. High prediction accuracy was obtained with the four ML models (95.9% with the ensemble classification (EN) model). Feature reduction with the MRMR technique reduced the input features to only eight with good accuracy, especially with the EN model (95%). The drawbacks of this model are that it is a two-stage model requiring considerable time and effort to build, its detection accuracy is low for the minority 'Poor' class, and it does not deal with the poor dataset distribution between classes.
The main drawbacks of the previously suggested approaches for predicting the power transformer HI are the low overall prediction accuracy and, in particular, the low prediction accuracy on the minor class of the HI states. The minor class is the 'Poor' class, which is critical for predicting the transformer state: mispredicting the 'Poor' class leads to fast failure of the power transformers. Therefore, correct prediction of the 'Poor' class is very important for the continued operation of the transformers and for planning the maintenance procedures that reduce power transformer failure rates in future periods.
This work proposes a new CNN model for power transformer HI state prediction. The proposed model uses the output results of the DGA, OQ, and DP tests as inputs to a CNN that predicts the final power transformer HI state. The imbalance between classes in the training data biases the trained model toward the class with the majority of samples, which weakens the CNN's prediction of the class with the minority of samples. An oversampling generator is suggested to create new samples for the minority classes so that the numbers of samples in the different classes are equal. The prediction of the proposed CNN model is enhanced after applying the oversampling process to the training set: the overall prediction accuracy improves from 89.92% without the oversampling process to 98.53% with it.
The main contributions of this work are the enhancement of the prediction accuracy of the minor transformer HI state (which is very low in most previous works) and of the overall accuracy by using the CNN model together with the suggested oversampling approach. Different feature-selection techniques are used to reduce the input features and thereby decrease the cost, effort, and time of testing. Five feature-selection approaches are compared, and the ReliefF technique proves effective for determining the most important parameters for predicting the HI state with the proposed CNN model. The robustness of the proposed CNN model is checked by applying uncertainty of up to ±25% to the input dataset, and the model retains good prediction accuracy. The effectiveness of the proposed CNN model is confirmed by comparing its results with different optimized machine learning approaches, and its accuracy in predicting the HI state also compares favorably with recently published works.
This work is organized into four sections. Section 2 covers the mathematical analysis and system model in three subsections: power transformer HI state calculations, the suggested CNN model and solution procedure, and the suggested oversampling technique. Section 3 presents the results and discussion in three subsections: the oversampling approach, reduced-feature results, and the effectiveness of the CNN model. The conclusions are presented in Section 4.

2. Mathematical Analysis and System Model

2.1. Power Transformer HI State Calculations

The power transformer HI state is generally known as a tool that combines operating state, field tests, and laboratory testing results. The mixture of results is transformed into an objective and measurable index, providing a complete transformer health assessment [1,2]. The HI plays a significant role in evaluating the condition and monitoring the health of the transformer, and it is an excellent index for considering capital investment, asset maintenance cost, and operating maintenance. Figure 1 illustrates the three stages of the HI state concept.
The three stages are input data, the mathematical technique/algorithm for HI evaluation, and the decision output. The output decision can be maintenance, repair and upgrades, replacement, monitoring, or contingency control, based on the value of the power transformer HI state.
Several algorithms evaluate the power transformer HI state [1,2,3,4,5,6,7,8,9,10,16]. The method of Naderian et al. [3] is considered the most common technique used to assess the HI state based on different tests of the transformer oil. The transformer tests are dissolved gas analysis (DGA), oil quality (OQ), the depolarization test (DP), and the load factor test (LF). In this work, the HI state is built on the first three tests using the method of Naderian et al. [3].

2.1.1. DGA Test

In the DGA test, seven gases dissolved in the transformer oil are used as inputs to obtain the output DGA index/factor ( DGA_i ). These gases are hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), carbon dioxide (CO2), and carbon monoxide (CO). The DGA_i can be estimated as follows [3,16]:
\[ DGA_i = \frac{\sum_{i=1}^{7} S_i \times W_i}{\sum_{i=1}^{7} W_i} \qquad (1) \]
where S_i is the scoring factor that takes a value of 1, 2, 3, 4, 5, or 6 according to the level of the dissolved gases. For example, for H2, S_1 = 1 when its value is less than 100 ppm and 2 for the 100 to 200 ppm range (see Appendix A).
W_i is the weighting factor, which is evaluated as follows [3,16]:
\[ W_i = \begin{cases} 1, & \text{for } CO \text{ and } CO_2 \\ 3, & \text{for } CH_4,\ C_2H_6 \text{ and } C_2H_4 \\ 5, & \text{for } C_2H_2 \\ 2, & \text{for } H_2 \end{cases} \qquad (2) \]
The transformer state condition according to the DGA_i index can be converted to a code that is used later in evaluating the HI state value, as follows [3,16]:
\[ DGA_{Code} = \begin{cases} 4, & \text{for } DGA_i \le 1.2 \\ 3, & \text{for } 1.2 < DGA_i \le 1.5 \\ 2, & \text{for } 1.5 < DGA_i \le 2.0 \\ 1, & \text{for } 2.0 < DGA_i \le 3.0 \\ 0, & \text{for } DGA_i > 3.0 \end{cases} \qquad (3) \]
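To make the scoring concrete, the short sketch below (an illustration in Python, not the author's MATLAB code) computes DGA_i and the DGA code from Equations (1)-(3), using the score bands of Table A1 in Appendix A; the function and variable names are assumptions made for this example.

```python
# Hypothetical helpers: score each gas, compute DGA_i (Eq. (1)) and the
# DGA code (Eq. (3)) using the bands of Appendix A, Table A1.
WEIGHTS = {"H2": 2, "CH4": 3, "C2H6": 3, "C2H4": 3, "C2H2": 5, "CO": 1, "CO2": 1}
BANDS = {  # upper bounds (ppm) of the score bands S_i = 1..5; above the last -> 6
    "H2": [100, 200, 300, 500, 700], "CH4": [75, 125, 200, 400, 600],
    "C2H6": [65, 80, 100, 120, 150], "C2H4": [50, 80, 100, 150, 200],
    "C2H2": [3, 7, 35, 50, 80], "CO": [350, 700, 900, 1100, 1400],
    "CO2": [2500, 3000, 4000, 5000, 7000],
}

def gas_score(gas: str, ppm: float) -> int:
    """S_i in 1..6: index of the first band whose upper bound covers ppm."""
    for s, upper in enumerate(BANDS[gas], start=1):
        if ppm <= upper:
            return s
    return 6

def dga_index(ppm: dict) -> float:
    """Equation (1): weighted average of the seven gas scores."""
    num = sum(gas_score(g, ppm[g]) * WEIGHTS[g] for g in WEIGHTS)
    return num / sum(WEIGHTS.values())

def dga_code(dga_i: float) -> int:
    """Equation (3): map the DGA index to the HI code 4..0."""
    for code, bound in zip((4, 3, 2, 1), (1.2, 1.5, 2.0, 3.0)):
        if dga_i <= bound:
            return code
    return 0

sample = {"H2": 120, "CH4": 60, "C2H6": 40, "C2H4": 30, "C2H2": 1, "CO": 300, "CO2": 2000}
print(dga_index(sample), dga_code(dga_index(sample)))  # ~1.11 -> code 4
```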

2.1.2. OQ Test

The oil quality (OQ) test is the second main test required to evaluate the power transformer HI state. The OQ test of the transformer oil depends on six factors: the oil dielectric strength or breakdown voltage (BD), oil interfacial tension (IFT), oil color, oil humidity (Mois.), oil insulation dissipation factor (DF), and oil acidity (Acid.). The oil quality index ( OQ_i ) can be evaluated like DGA_i, here over the six oil-quality factors, as follows [3,16]:
\[ OQ_i = \frac{\sum_{i=1}^{6} S_i \times W_i}{\sum_{i=1}^{6} W_i} \qquad (4) \]
where S_i is the scoring factor that takes a value of 1, 2, 3, or 4 according to the level of each factor. For example, for the oil humidity, S_i = 1 when its value is less than 20 ppm, 2 for the 20 to 30 ppm range, 3 for the 30 to 40 ppm range, and 4 for values greater than 40 ppm (see Appendix A).
W_i is the weighting factor, which is evaluated as follows [3,16]:
\[ W_i = \begin{cases} 1, & \text{for Acid.} \\ 2, & \text{for IFT and color} \\ 3, & \text{for BD} \\ 4, & \text{for Mois. and DF} \end{cases} \qquad (5) \]
The transformer state condition according to the OQ_i index can be converted to a code that is used later in evaluating the HI state value, as follows [3,16]:
\[ OQ_{Code} = \begin{cases} 4, & \text{for } OQ_i \le 1.2 \\ 3, & \text{for } 1.2 < OQ_i \le 1.5 \\ 2, & \text{for } 1.5 < OQ_i \le 2.0 \\ 1, & \text{for } 2.0 < OQ_i \le 3.0 \\ 0, & \text{for } OQ_i > 3.0 \end{cases} \qquad (6) \]

2.1.3. DP Test

The depolarization (DP) of the power transformer can be evaluated based on the value of the furan or furfural factor. The furfural content of the power transformer oil is used as an indicator of the degree of polymerization of the paper insulation. Furan levels in power transformers are normally less than 0.1 ppm. It is recommended to carry out the furfural test when the power transformer has high levels of carbon monoxide and carbon dioxide, is overheated, or is older than 25 years; in such cases, a furfural test is recommended periodically (see Appendix A).
The depolarization code ( DP_{Code} ) can be evaluated based on the furan value as follows [3,16]:
\[ DP_{Code} = \begin{cases} 4, & \text{for } Furan < 0.1 \\ 3, & \text{for } 0.1 \le Furan < 0.5 \\ 2, & \text{for } 0.5 \le Furan < 1.0 \\ 1, & \text{for } 1.0 \le Furan < 5.0 \\ 0, & \text{for } Furan \ge 5.0 \end{cases} \qquad (7) \]
The power transformer health index state ( HI ) can be evaluated based on the three previously mentioned HIC codes ( DGA_{Code}, OQ_{Code}, and DP_{Code} ). The score of each code is presented in Table 1. The weighting factor ( K_i ) of DGA_{Code} is taken as 10 due to its greater effect on evaluating the HI state compared with OQ_{Code} and DP_{Code}; the weighting values of OQ_{Code} and DP_{Code} are taken as 8 and 6, respectively.
The power transformer HI is evaluated based on the three codes as follows [4,14,16]:
\[ HI = 100 \times \frac{\sum_{i=1}^{3} K_i \times HIC_i}{\sum_{i=1}^{3} 4 K_i} \qquad (8) \]
where K_i is the weight factor for each transformer condition index code [1,17]. The DGA_{Code} is the most important code: it distinguishes the faulty HI state 'Poor' from the healthy 'Fair' and 'Good' states, and utility engineers use the values of the DGA features as indicators of faulty and healthy states of the power transformers.
The power transformer HI states and their corresponding threshold limits are presented in Table 2 [14,16]. The ‘Good’ state represents the state where the power transformer HI value is higher than or equal to 85%, while the ‘Fair’ state is where the power transformer HI varies from 85% to 50%. The ‘Poor’ state is achieved when the power transformer HI is smaller than 50%. The required decision corresponding to each HI state is also specified in Table 2 [14,16].
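A minimal sketch of Equation (8) and the Table 1 and Table 2 mappings follows (illustrative Python with hypothetical names, not the paper's implementation):

```python
# Combine the three HIC codes into the HI value (Eq. (8)) and map it to a
# state with the Table 2 thresholds. Weights K_i follow Table 1.
K = {"DGA": 10, "OQ": 8, "DP": 6}

def health_index(dga_code: int, oq_code: int, dp_code: int) -> float:
    codes = {"DGA": dga_code, "OQ": oq_code, "DP": dp_code}
    num = sum(K[t] * codes[t] for t in K)
    den = sum(4 * K[t] for t in K)      # 4 is the maximum HIC value
    return 100.0 * num / den

def hi_state(hi: float) -> str:
    if hi >= 85.0:
        return "Good"
    return "Fair" if hi >= 50.0 else "Poor"

hi = health_index(dga_code=4, oq_code=3, dp_code=4)
print(hi, hi_state(hi))  # ~91.67 -> 'Good'
```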

2.2. Suggested CNN Model and Solution Procedure

Figure 2 illustrates the main architecture of the CNN model used for the classification process. The CNN architecture has two main parts: feature extraction and classification. The first part has three layers: input, convolution, and pooling. The input features are converted to images and fed into the input layer, whose output is passed to the convolution layer. The objective of the convolution layer is to separate and identify the input features of the image for analysis, in a procedure called feature extraction. The input image is passed through N × N filters, which slide over the image to extract its various features. The convolution layer's output, termed the feature map, provides information about the image such as corners and edges; this feature map is fed to further layers to learn other features of the input image. The output of the convolution layer enters the pooling layer, which shrinks the feature map to minimize the computational cost. A fully connected layer takes the pooling layer's output and detects the image's class based on the features extracted in the convolution layer, and the fully connected layer's output classes are passed to the output layer, which produces the output class diagnosis.
Figure 3 presents the solution procedure of the power transformer HI state. The following steps summarize the solution procedure:
  • The collected dataset samples are divided into two main subsets (a training subset and a testing subset).
  • A new dataset is generated by oversampling to enhance the training set distribution between classes.
  • After the oversampling process, the training dataset is normalized before being applied to the suggested CNN model used for the classification process.
  • The training and testing datasets are inserted into the trained CNN model to obtain the diagnosis of each set.
  • Classification analysis is carried out on the obtained results.
Figure 3. Classification model procedure.

2.3. Suggested Oversampling Technique

An imbalanced distribution of training dataset samples between classes biases the trained model toward the majority class, and the detection accuracy of the minority class becomes very poor as a result. The training set distribution between the different HI state classes is highly unbalanced: 662 samples for the 'Good' state, 204 samples for the 'Fair' state, and 19 samples for the 'Poor' state. The imbalance in training dataset samples can be resolved by oversampling or undersampling; this work applies the oversampling process to the training dataset samples to enhance the balance between the training set classes.
The number of samples of each HI state is first determined, and the size of the HI state class with the majority of samples is defined as follows:
\[ m = \max(class_1, \ldots, class_n) \qquad (9) \]
where n is the number of HI state classes and class_1, class_2, ..., class_n are the numbers of samples of classes 1, 2, ..., n, respectively. The number of whole repetitions of each HI state class is evaluated as follows:
\[ m_i = \left\lfloor \frac{m}{class_i} \right\rfloor, \quad i = 1, 2, \ldots, n \qquad (10) \]
The remaining number of samples of each HI state class, which will be selected randomly from the dataset of that class, is calculated from (11):
\[ m_{ii} = m - m_i \times class_i, \quad i = 1, 2, \ldots, n \qquad (11) \]
The ith HI state dataset is finally enlarged according to:
\[ dataset_{class_i} = \left[ dataset_i^{(1)};\ dataset_i^{(2)};\ \ldots;\ dataset_i^{(m_i)};\ \mathrm{rand}(dataset_i, m_{ii}) \right] \qquad (12) \]
where dataset_i^{(1)}, ..., dataset_i^{(m_i)} are m_i copies of the class-i samples and rand(dataset_i, m_{ii}) selects m_{ii} samples at random from the class.
The procedure of oversampling is introduced in the flowchart shown in Figure 4.
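A minimal sketch of Equations (9)-(12) in Python follows, assuming each class is held as a NumPy array of feature rows; this illustrates the balancing step, not the author's MATLAB implementation.

```python
import numpy as np

def oversample(class_datasets, rng=np.random.default_rng(0)):
    """Grow every minority class to the size m of the largest class."""
    m = max(len(d) for d in class_datasets)            # Equation (9)
    balanced = []
    for data in class_datasets:
        m_i = m // len(data)                           # Equation (10): whole repeats
        m_ii = m - m_i * len(data)                     # Equation (11): remainder
        extra = data[rng.choice(len(data), m_ii, replace=False)]
        balanced.append(np.vstack([np.tile(data, (m_i, 1)), extra]))  # Equation (12)
    return balanced

rng = np.random.default_rng(1)
good, fair, poor = rng.random((662, 14)), rng.random((204, 14)), rng.random((19, 14))
good_b, fair_b, poor_b = oversample([good, fair, poor])
print(len(good_b), len(fair_b), len(poor_b))  # 662 662 662
```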
The input features are the 14 features obtained from the three tests (dissolved gas analysis, oil quality, and depolarization), which require considerable time, cost, and operations. Therefore, feature-reduction techniques are used to cut the number of features and hence the cost, time, and effort of testing. The feature-reduction techniques used are MRMR, Chi2, ANOVA, ReliefF, and Kruskal–Wallis.

2.3.1. MRMR Technique

Features are reduced using the Minimum Redundancy Maximum Relevance (MRMR) algorithm. The MRMR technique finds an optimal set of features that are mutually maximally dissimilar and can effectively represent the response variable; it minimizes the redundancy of the feature set and maximizes its relevance to the response variable. More details of the MRMR technique are presented in [16,18,19].

2.3.2. Chi2 Technique

Chi2 (Chi-square) is a feature-selection technique used to investigate whether each predictor variable is independent of the response variable. The features are then ranked using the p-values of the corresponding tests, with the feature scores given by −log(p). More details of Chi2 are presented in [17,20,21].

2.3.3. ANOVA Technique

ANOVA (analysis of variance) ranks features using the p-values of per-feature tests on the difference in means between feature groups. ANOVA tests the hypothesis that the predictor values grouped by the corresponding output classes are drawn from populations with the same mean against the alternative hypothesis that the population means are not all the same. The feature scores are given by −log(p). More details of ANOVA are presented in [22,23,24].

2.3.4. ReliefF Technique

ReliefF, a procedure created by Kira and Rendell in 1992, is a filter-based method for feature selection or feature reduction that is remarkably sensitive to feature interactions. The technique performs well in estimating feature importance for distance-based supervised models; it uses pairwise distances between observations to predict the corresponding output response. More details of ReliefF are presented in [25,26].

2.3.5. Kruskal–Wallis

The Kruskal–Wallis test can be viewed as a one-way ANOVA on ranks. It is a non-parametric technique for testing whether samples come from the same distribution, and it ranks features using the p-values returned by the test. For each predictor variable, the Kruskal–Wallis test tests the hypothesis that the predictor values grouped by the response output classes are drawn from populations with the same median against the alternative hypothesis that the population medians are not all the same. The test scores are given by −log(p). More details of Kruskal–Wallis are introduced in [27,28].
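Three of these five ranking criteria (Chi2, ANOVA, and Kruskal–Wallis) can be sketched with scikit-learn and SciPy, as below; MRMR and ReliefF need extra packages and are omitted here, and the placeholder data and names are purely illustrative.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.feature_selection import chi2, f_classif

rng = np.random.default_rng(0)
X = rng.random((885, 14))            # 14 oil-test features (chi2 needs X >= 0)
y = rng.integers(0, 3, 885)          # HI classes: 0=Good, 1=Fair, 2=Poor

_, p_chi2 = chi2(X, y)               # Section 2.3.2
_, p_anova = f_classif(X, y)         # Section 2.3.3

# Kruskal-Wallis, run per feature across the three class groups (Section 2.3.5).
p_kw = np.array([kruskal(*(X[y == c, j] for c in np.unique(y))).pvalue
                 for j in range(X.shape[1])])

for name, p in (("Chi2", p_chi2), ("ANOVA", p_anova), ("Kruskal-Wallis", p_kw)):
    order = np.argsort(p)            # smallest p = largest -log(p) = most relevant
    print(name, "top-8 features:", order[:8])
```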

3. Results and Discussion

The dataset samples were assembled from two regions. The first dataset (730 samples) was collected from the Gulf Region at the 66 kV medium-voltage sub-transmission stage, while the second dataset (631 samples) was assembled from the transmission regions of the Malaysian electricity company, from power transformers at the 220 kV transmission stage. The two datasets are combined and then divided randomly into 65% (885 samples) for training and 35% (476 samples) for testing. The CNN model is built in MATLAB R2022a, and the main training dataset samples (885 samples) are applied to the CNN model. The dataset samples are normalized before being inserted into the classification CNN model; the normalization is implemented as follows:
\[ x_{ij} = \frac{x_{ij} - x_j^{min}}{x_j^{max} - x_j^{min}} \qquad (13) \]
where x_{ij} is the ith dataset sample of the jth dataset feature, x_j^{min} is the minimum value of the jth dataset feature, and x_j^{max} is its maximum value.
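The feature-wise min-max normalization of Equation (13) can be sketched as follows; reusing the training-set minima and maxima on the test set is an assumption made for this sketch, as the paper does not detail that step.

```python
import numpy as np

def minmax_fit(X_train: np.ndarray):
    return X_train.min(axis=0), X_train.max(axis=0)   # x_j^min, x_j^max per feature

def minmax_apply(X: np.ndarray, x_min: np.ndarray, x_max: np.ndarray):
    return (X - x_min) / (x_max - x_min)              # Equation (13)

rng = np.random.default_rng(0)
X_train, X_test = rng.random((885, 14)) * 100, rng.random((476, 14)) * 100
x_min, x_max = minmax_fit(X_train)
X_train_n = minmax_apply(X_train, x_min, x_max)
X_test_n = minmax_apply(X_test, x_min, x_max)
```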
The testing dataset samples (476 samples), after applying the normalization of Equation (13), are used to check the CNN model's prediction accuracy. The results of the CNN model during training and testing are presented in Table 3 and Table 4, respectively. The results illustrate that the prediction accuracy of the power transformer HI state is low, especially during the testing stage, due to the bias of the trained CNN model toward the majority class ('Good' state). The prediction accuracy of the majority class ('Good' state) is high (335/341 = 98.24%), while the prediction accuracy of the minority class is very low (8/15 = 53.33%). The training dataset is also applied to optimized machine learning (ML) classification methods: decision tree (DT), discriminant analysis (DA), Naïve Bayes (NB), support vector machine (SVM), k-nearest neighbors (KNN), ensemble (EN), and artificial neural network (ANN). The ML methods are built using the MATLAB R2021a Classification Learner toolbox. Table 5 and Table 6 compare the proposed CNN model and the ML methods during the training and testing stages, respectively. The results illustrate that the accuracy of all ML methods in detecting the 'Good' state is better than that for the 'Fair' and 'Poor' states; the 'Poor' state (minority class) has poor prediction accuracy compared to the 'Good' state (the majority class).
The accuracy of the CNN model and the other ML methods for the different stages of calculation (training, testing, and overall) is calculated as
\[ \%Accuracy = \frac{TP + TN}{P + N} \times 100 \qquad (14) \]
where TP and TN are the numbers of true positive and true negative samples, respectively, FP and FN are the numbers of false positive and false negative samples, respectively, P = TP + FN, and N = FP + TN.
Other classification performance factors (sensitivity, specificity, precision, and F1-score) are presented for further comparison between the suggested CNN model and the different classification models; they are evaluated as follows:
\[ Sensitivity = \frac{TP}{P} \qquad (15) \]
\[ Specificity = \frac{TN}{N} \qquad (16) \]
\[ Precision = \frac{TP}{TP + FP} \qquad (17) \]
\[ F1\text{-}Score = \frac{2 \times Sensitivity \times Precision}{Sensitivity + Precision} \qquad (18) \]
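As a concrete check of these definitions, the sketch below (illustrative Python; the helper name is hypothetical) evaluates Equations (14)-(18) for one class in a one-vs-rest view, using the 'Poor' row of Table 4 (8 of 15 'Poor' samples detected, with 5 false 'Poor' predictions):

```python
# One-vs-rest evaluation of Eqs. (14)-(18) for a single class.
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    p, n = tp + fn, fp + tn
    sensitivity = tp / p                                          # Eq. (15)
    specificity = tn / n                                          # Eq. (16)
    precision = tp / (tp + fp)                                    # Eq. (17)
    f1 = 2 * sensitivity * precision / (sensitivity + precision)  # Eq. (18)
    accuracy = 100 * (tp + tn) / (p + n)                          # Eq. (14)
    return {"%accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# 'Poor' class of Table 4: tp=8, fn=7, fp=5, tn=476-15-5=456.
print(classification_metrics(tp=8, fn=7, fp=5, tn=456))
```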
Table 3. Confusion matrix of the CNN model during the training process without data oversampling (rows: true class; columns: predicted class).

True Class | Good | Fair | Poor | % Accuracy
Good       | 659  | 3    | 0    | 99.55
Fair       | 3    | 200  | 1    | 98.04
Poor       | 0    | 0    | 19   | 100
Overall    |      |      |      | 99.21
Table 4. Confusion matrix of the CNN model during the testing process without data oversampling (rows: true class; columns: predicted class).

True Class | Good | Fair | Poor | % Accuracy
Good       | 335  | 6    | 0    | 98.24
Fair       | 30   | 85   | 5    | 70.83
Poor       | 0    | 7    | 8    | 53.33
Overall    |      |      |      | 89.92
Table 5. Comparison between the results of the CNN model and the other methods during the training stage (correctly predicted samples per class).

HI    | Good | Fair | Poor | % Accuracy
TSN * | 662  | 204  | 19   |
DT    | 639  | 172  | 8    | 92.54
DA    | 650  | 158  | 11   | 92.54
NB    | 626  | 167  | 11   | 90.85
SVM   | 649  | 182  | 7    | 94.69
KNN   | 654  | 170  | 8    | 94.01
EN    | 653  | 187  | 4    | 95.37
ANN   | 649  | 175  | 7    | 93.90
CNN   | 659  | 200  | 19   | 99.21
* TSN is the total number of samples.
Table 6. Comparison between the results of the CNN model and the other methods during the testing stage (correctly predicted samples per class).

HI  | Good | Fair | Poor | Sensitivity | Specificity | Precision | F1-Score | % Accuracy
TSN | 341  | 120  | 15   |             |             |           |          |
DT  | 332  | 70   | 4    | 0.61        | 0.88        | 0.64      | 0.62     | 85.29
DA  | 338  | 65   | 5    | 0.62        | 0.87        | 0.70      | 0.65     | 85.71
NB  | 323  | 85   | 8    | 0.73        | 0.90        | 0.74      | 0.74     | 87.39
SVM | 330  | 93   | 3    | 0.65        | 0.92        | 0.75      | 0.68     | 89.50
KNN | 339  | 75   | 3    | 0.61        | 0.88        | 0.83      | 0.66     | 87.61
EN  | 331  | 82   | 2    | 0.60        | 0.88        | 0.89      | 0.63     | 87.18
ANN | 332  | 70   | 4    | 0.66        | 0.91        | 0.81      | 0.71     | 89.92
CNN | 335  | 85   | 8    | 0.74        | 0.91        | 0.80      | 0.77     | 89.92

3.1. Oversampling Technique

As described in Section 2.3, the training set distribution between the different HI state classes is highly unbalanced (662 'Good', 204 'Fair', and 19 'Poor' samples), which biases the trained model toward the majority class and yields very poor detection accuracy for the minority class. The oversampling process is therefore applied to the training dataset samples to balance the training set classes.
The numbers of training samples of the three HI states before and after the oversampling process are shown in Figure 5, which illustrates the poor distribution of the training samples before oversampling and an equal distribution after applying the suggested oversampling procedure.
Figure 6 shows the prediction accuracy and the loss versus the iteration number during training of the CNN model after oversampling the training dataset. The training accuracy approaches one hundred percent while the loss is very low and close to zero, showing good training accuracy of the suggested CNN model.
The CNN loss can be expressed as follows:
\[ Loss = -\sum_{i=1}^{n} \sum_{j=1}^{C} k_{ij} \ln\left(O_{ij}\right) \qquad (19) \]
where n is the number of samples, C is the number of classes, k_{ij} is the target probability that the ith sample belongs to the jth class, and O_{ij} is the output of the CNN softmax layer for dataset sample i and class j.
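A minimal NumPy sketch of this cross-entropy loss, assuming one-hot targets, is:

```python
import numpy as np

def cross_entropy(k: np.ndarray, O: np.ndarray) -> float:
    """Equation (19): Loss = -sum_i sum_j k_ij * ln(O_ij)."""
    return float(-(k * np.log(O)).sum())

k = np.array([[1, 0, 0], [0, 1, 0]])              # targets, n = 2, C = 3
O = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])  # softmax outputs
print(cross_entropy(k, O))                        # -(ln 0.8 + ln 0.7) ~ 0.580
```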
Table 7 presents the CNN hyperparameters used for the classification process of the power transformer HI state. The CNN hyperparameters are selected to have a good classification performance.
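For orientation, the following is a hedged PyTorch sketch of a network with the layer sizes of Table 7; the paper's model is built in MATLAB, so details such as the 1 × 14 input layout, the ReLU activations, and the pooling window are assumptions made to obtain a runnable example.

```python
import torch
import torch.nn as nn

class HICnn(nn.Module):
    """Layer sizes follow Table 7; everything else here is an assumption."""
    def __init__(self, n_features: int = 14, n_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, padding=1),    # conv layer 1: 32 filters, 3x1, padding 1x0
            nn.ReLU(),
            nn.Conv1d(32, 175, kernel_size=3, padding=1),  # conv layer 2: 175 filters
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, stride=1),         # max pooling with stride 1
            nn.Flatten(),
            nn.Linear(175 * (n_features - 1), n_classes),  # fully connected, 3 outputs
        )

    def forward(self, x):          # x: (batch, 1, 14)
        return self.net(x)         # class logits; softmax is applied in the loss

model = HICnn()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, step size 10^-3
criterion = nn.CrossEntropyLoss()                          # softmax + Eq. (19)
loss = criterion(model(torch.randn(8, 1, 14)), torch.randint(0, 3, (8,)))
print(loss.item())
```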
Table 8 presents the prediction accuracy on the training dataset after applying the oversampling process. The prediction accuracy of the 'Poor' state is 100%, while those of the 'Good' and 'Fair' states are 99.85% and 97.73%, respectively, and the overall accuracy is 99.19%.
Table 9 shows the prediction accuracy of the CNN model after the oversampling process on the testing dataset (476 samples). The prediction accuracies of the 'Poor' and 'Fair' states are 100% and 99.17%, respectively, compared with 53.33% and 70.83% for the model trained without the oversampling process. Moreover, the overall accuracy of the CNN model with the oversampling process is enhanced to 98.53%, compared to 89.92% for the CNN model without the oversampling process.
Table 7. CNN model selected parameters.

Parameter | Value
Convolution Layer 1 | Filter size: 3 × 1; Number of filters: 32; Padding: 1 × 0
Convolution Layer 2 | Filter size: 3 × 1; Number of filters: 175; Padding: 1 × 0
Max-Pooling Layer | Stride: 1
Fully Connected Layer | Outputs: 3
Learning Algorithm Options | Step size α: 10^−3; Gradient threshold: 0.001; Training algorithm: Adam; Max. epochs: 150; Verbose: 1
Activation | softmax
CNN type | classification
Table 8. Confusion matrix of the CNN model during the training process (rows: true class; columns: predicted class).

True Class | Good | Fair | Poor | % Accuracy
Good       | 661  | 1    | 0    | 99.85
Fair       | 15   | 647  | 0    | 97.73
Poor       | 0    | 0    | 662  | 100
Overall    |      |      |      | 99.19
Table 9. Confusion matrix of the CNN model during the testing process (rows: true class; columns: predicted class).

True Class | Good | Fair | Poor | % Accuracy
Good       | 335  | 6    | 0    | 98.24
Fair       | 1    | 119  | 0    | 99.17
Poor       | 0    | 0    | 15   | 100
Overall    |      |      |      | 98.53
Figure 7 compares the actual HI state of the transformer (Good, Fair, and Poor) against the predicted HI state to illustrate the prediction accuracy of the suggested CNN model during the testing process with 476 dataset samples. The results demonstrate that the proposed CNN model has excellent prediction accuracy: 98.24%, 99.17%, and 100% for the 'Good', 'Fair', and 'Poor' states, respectively.
Figure 8 compares the CNN model's prediction accuracy of the power transformer HI state on the testing dataset (476 samples) with and without oversampling. It illustrates the enhancement of the HI state prediction after applying the oversampling process, especially for the 'Poor' and 'Fair' states.
After the oversampling process, the dataset is used to train the optimized ML models. The training process of the different ML models is presented in Figure 9.
Figure 7. HI state prediction of the suggested CNN model during the testing process after oversampling.
Figure 8. Comparison between the fault prediction of the CNN model during the testing stage without and with the oversampling process.
Figure 9. Comparison among the ML classification methods against the iteration number during the optimization process.
Compared to the other models, the EN model has the minimum error during the training stages, while the NB model has the highest error. Five-fold cross-validation was used for training the optimized ML models, and the optimization option of the Classification Learner toolbox was applied to select the suitable classification model and the matching parameters of each chosen method. This work uses Bayesian optimization (BO) with the ML methods to determine their optimal parameters; the BO approach is well suited to such optimization problems and can be used with most ML techniques for optimal parameter selection [29,30,31]. The training parameters of the different ML models are introduced in Table 10.
Table 11 compares the results of the CNN model and the other ML methods after applying the oversampling process to the training dataset samples, listing the number of correctly predicted samples for each power transformer HI state and the overall accuracy of the CNN and ML methods during the training stage. The results illustrate the high prediction accuracy of all models compared to those trained without the oversampling process, and they demonstrate the effectiveness of the CNN model (overall accuracy of 99.19%) compared to the other ML methods (the best ML accuracy, obtained with the EN method, is 97.94%).
Table 12 compares the results of the CNN model and the other ML methods (trained after applying the oversampling process) on the testing dataset samples, listing the number of correctly predicted samples for each power transformer HI state and the prediction performance factors of the CNN and ML methods during the testing stage. The results illustrate the high prediction accuracy of all models compared to those without the oversampling process (Table 6): the CNN model's prediction accuracy is enhanced to 98.53% compared with 89.92% without oversampling, and all the other ML models likewise outperform their counterparts trained without oversampling. The results also demonstrate the effectiveness of the CNN model (overall accuracy of 98.53%) compared to the other ML methods (the best ML accuracy, obtained with the SVM method, is 96.43%).
Table 10. Optimal parameters of the ML methods during optimization with the training dataset after the oversampling process.

Method | Optimization Parameters
DT     | Max. no. of splits: 120; Split criterion: Twoing rule
DA     | Discriminant type: Quadratic
NB     | Distribution names: Kernel; Kernel type: Gaussian
SVM    | Multiclass method: One-vs-All; Box constraint level: 985.7716; Kernel function: Cubic
KNN    | Number of neighbors: 991; Distance metric: Cosine; Distance weight: Squared inverse; Standardize data: true
EN     | Ensemble method: AdaBoost; Number of learners: 140; Learning rate: 0.9897; Maximum number of splits: 18
ANN    | Number of fully connected layers: 2; Activation: Sigmoid; Standardize data: No; Regularization strength (Lambda): 5.1411 × 10^−9; First layer size: 138; Second layer size: 248
Table 11. Comparison between the results of the CNN model and the other methods during the training stage after the oversampling process (correctly predicted samples per class).

HI  | Good | Fair | Poor | % Accuracy
SN  | 662  | 662  | 662  |
DT  | 627  | 639  | 647  | 96.32
DA  | 623  | 502  | 649  | 89.33
NB  | 460  | 551  | 642  | 83.23
SVM | 638  | 638  | 652  | 97.08
KNN | 639  | 618  | 655  | 96.27
EN  | 652  | 647  | 646  | 97.94
ANN | 632  | 639  | 656  | 97.03
CNN | 661  | 647  | 662  | 99.19
Table 12. Comparison between the results of the CNN model and the ML methods during the testing stage after the oversampling process (correctly predicted samples per class).

HI  | Good | Fair | Poor | Sensitivity | Specificity | Precision | F1-Score | % Accuracy
SN  | 341  | 120  | 15   |             |             |           |          |
DT  | 303  | 120  | 15   | 0.96        | 0.96        | 0.92      | 0.93     | 92.02
DA  | 324  | 81   | 15   | 0.88        | 0.91        | 0.77      | 0.80     | 88.24
NB  | 259  | 105  | 15   | 0.88        | 0.90        | 0.71      | 0.76     | 79.62
SVM | 325  | 119  | 15   | 0.98        | 0.98        | 0.96      | 0.97     | 96.43
KNN | 318  | 120  | 15   | 0.98        | 0.98        | 0.95      | 0.96     | 95.17
EN  | 272  | 120  | 15   | 0.93        | 0.94        | 0.88      | 0.89     | 85.50
ANN | 322  | 120  | 15   | 0.98        | 0.98        | 0.95      | 0.97     | 96.01
CNN | 335  | 119  | 15   | 0.99        | 0.99        | 0.98      | 0.99     | 98.53

3.2. Reduced-Feature Results

This section presents the results of the reduced-feature CNN model. The feature selection is carried out with five feature-reduction techniques: MRMR, Chi2, ReliefF, ANOVA, and Kruskal–Wallis. The minimum number of features that gives good prediction results is eight, as in [16].
The eight highest-ranked features from the different feature-reduction techniques are listed in Table 13, and the training scores of the eight most important features for the MRMR, ReliefF, ANOVA, and Kruskal–Wallis approaches are shown in Figure 10.
The CNN model with the oversampling process is trained with the eight highest-ranked features of each of the five feature-reduction approaches, and the prediction accuracy for each power transformer HI state and overall is reported for each approach. Table 14 presents the prediction accuracy of the CNN model for each feature-reduction technique during the training stage, and Table 15 presents the corresponding accuracy during the testing stage. In both stages, the results illustrate the effectiveness of the ReliefF technique compared to the other feature-reduction techniques.
Figure 10. High-ranked eight features of the MRMR, ReliefF, ANOVA, and Kruskal–Wallis methods.
Table 13. High-ranked eight features corresponding to the five feature-reduction techniques.

MRMR  | Chi2  | ReliefF | ANOVA | Kruskal–Wallis
DF    | DF    | IF      | IF    | DF
Furan | Furan | BDV     | Color | IF
IF    | IF    | Mois    | DF    | Color
Color | Color | Acid    | Acid  | Acid
Acid  | Acid  | C2H2    | Furan | C2H4
CO2   | CO2   | CO      | Mois  | CO2
C2H2  | C2H4  | C2H6    | CO2   | Mois
C2H4  | C2H2  | CH4     | CO    | H2
Table 14. Prediction accuracy corresponding to the five feature-reduction techniques for each transformer HI state during the training stage.

HI State | MRMR  | Chi2  | ReliefF | ANOVA | Kruskal–Wallis
Good     | 97.89 | 97.89 | 97.89   | 97.89 | 94.11
Fair     | 87.31 | 87.31 | 96.68   | 96.68 | 94.86
Poor     | 99.7  | 99.7  | 100     | 100   | 99.7
All      | 94.96 | 94.96 | 98.19   | 98.19 | 96.22
Table 15. Prediction accuracy corresponding to the five feature-reduction techniques for each transformer HI state during the testing stage.

HI State | MRMR  | Chi2  | ReliefF | ANOVA | Kruskal–Wallis
Good     | 95.01 | 95.01 | 94.13   | 84.75 | 90.32
Fair     | 88.33 | 88.33 | 98.33   | 96.67 | 95.83
Poor     | 100   | 100   | 100     | 100   | 100
All      | 93.49 | 93.49 | 95.38   | 88.24 | 92.02

3.3. Effectiveness of the CNN Model

Two methods measure the effectiveness of the proposed model. The first is its behavior under uncertainty in the input dataset inserted into the CNN model, while the second is a comparison of the results of the suggested model with recently published works.

3.3.1. CNN Model with Uncertainty

The datasets of the power transformers are collected offline in three major steps: obtaining oil samples from the power transformers, extracting the gases from the transformer oil, and detecting the power transformer HI state. Special syringes are used to extract the oil samples, which are then stored and transported to laboratories. Storage time and temperature affect the gas concentration values, and air bubbles are the most critical factor affecting gas concentrations [32]: air bubbles decrease the dissolved gases because gas diffuses from the oil into the bubbles [33]. Hence, uncertainty during the measurement process affects the detection of the power transformer HI state, and this uncertainty must be studied for the classification methods used to detect the HI state. Uncertainty noise of about ±14% is produced by temperature effects and sample storage, and uncertainty noise of up to ±5% is introduced by the accuracy of the measurement process [34]. This study considers uncertainty noise of up to ±25%.
Uncertainty is applied to each testing sample R = [r_i]_{i=1}^{14} to generate a new sample R' = [r'_i]_{i=1}^{14} with a selected uncertainty level of up to ±25%, using the following equation adapted from [35]:
\[ r_i' = r_i \times \left( 1 + \frac{m \left( 2 n_i - 1 \right)}{100} \right) \qquad (20) \]
where m is the maximum uncertainty level (±25%) and N = [n_i]_{i=1}^{14} is a 14 × 1 random vector with component values between 0 and 1.
For uncertainty noise levels m from ±5% to ±25% in steps of ±5%, the original input feature vector and the random noise term are combined element by element to obtain a dataset with uncertainty noise. These noisy datasets are fed into the proposed CNN model to measure its prediction performance under uncertainty.
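The noise-injection step of Equation (20) can be sketched as follows (illustrative Python with assumed names):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_uncertainty(r: np.ndarray, m: float) -> np.ndarray:
    """Equation (20): perturb each feature by up to +/- m percent."""
    n = rng.random(r.shape)                  # n_i in [0, 1]
    return r * (1 + m * (2 * n - 1) / 100)   # relative error in [-m%, +m%]

sample = rng.random(14) * 100
for m in (5, 10, 15, 20, 25):                # noise levels of Tables 16 and 17
    noisy = add_uncertainty(sample, m)
    print(m, round(float(np.max(np.abs(noisy / sample - 1))) * 100, 2))  # max deviation (%)
```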
Table 16 and Table 17 present the prediction accuracy for the different transformer HI states and the overall accuracy of the proposed CNN model against uncertainty levels from 0 up to ±25% during the testing stage, with full and reduced features, respectively. The results illustrate the effectiveness of the proposed model: even with uncertainty noise of up to ±25%, the suggested CNN model's results remain satisfactory.
Figure 11 compares the overall accuracy of the CNN model in the full and reduced-feature scenarios under input-dataset uncertainty of up to ±25%, illustrating the robustness of the suggested model against this level of uncertainty noise.
Figure 12 compares the overall accuracy of the CNN model with the other ML models in the full-feature scenario under input-dataset uncertainty of up to ±25%, and Figure 13 does the same for the reduced-feature scenario. In both scenarios, the results illustrate the robustness of the suggested model against uncertainty noise of up to ±25% compared to the other ML models.
Figure 11. Comparison between full and reduced-feature scenarios under uncertainty levels up to ±25%.
Figure 11. Comparison between full and reduced-feature scenarios under uncertainty levels up to ±25%.
Electronics 12 02405 g011
Figure 12. Comparison between CNN model and other ML methods with full-feature scenarios based on the 476-testing dataset.
Figure 12. Comparison between CNN model and other ML methods with full-feature scenarios based on the 476-testing dataset.
Electronics 12 02405 g012
Figure 13. Comparison between CNN model and other ML methods with reduced-feature scenarios based on the 476-testing dataset.
Figure 13. Comparison between CNN model and other ML methods with reduced-feature scenarios based on the 476-testing dataset.
Electronics 12 02405 g013

3.3.2. Comparisons with Recently Published Works

Table 18 compares the results obtained by the proposed CNN model with those presented in [14,16] for both the full-feature and reduced-feature scenarios. The results are based on dataset system 2 (Gulf region) and on the DT, SVM, KNN, and EN methods of [16] and the ANN, MLR, J48, and RF methods of [14]. The proposed CNN model achieves higher accuracy than the techniques presented in [14,16] in both scenarios. For the full-feature scenario, the proposed CNN model reaches 98.4%, whereas the highest accuracy in [16] is 96.7% (EN model) and the highest in [14] is 96.6% (RF model). For the reduced-feature scenario, the proposed CNN model reaches 96.9%, which is again better than the results in [14,16].

4. Conclusions

The power transformer HI state was studied based on the results of three tests: dissolved gas analysis (DGA), oil quality (OQ), and the depolarization factor (DP). The power transformer HI state prediction was carried out using 1361 dataset samples collected from two different regions (730 samples from the Gulf Region and 631 samples from a Malaysian utility). The proposed CNN model was implemented to predict and diagnose the power transformer HI state. The imbalance between the training dataset classes produced high detection accuracy for the class with the major number of samples but low detection accuracy for the class with the minor number of samples. The oversampling approach was used to balance the training dataset samples and thus enhance the prediction accuracy of the classification methods. After applying the oversampling approach to the training dataset samples, the proposed CNN model predicted the power transformer HI state with an accuracy of 98.53%, compared with 89.92% without the oversampling process, based on the testing dataset samples.
The results obtained with the proposed CNN model were compared with those obtained with the optimized ML classification methods (DT, DA, NB, SVM, KNN, EN, and ANN) under the oversampling process, and the CNN results were superior: the prediction accuracies on the testing dataset samples were 92.02%, 88.24%, 79.62%, 96.43%, 95.17%, 85.50%, 96.01%, and 98.53% for DT, DA, NB, SVM, KNN, EN, ANN, and the proposed CNN model, respectively. Five feature-reduction techniques (MRMR, Chi2, ReliefF, ANOVA, and Kruskal–Wallis) were applied with the proposed CNN model to reduce the number of applied features to only eight, minimizing the testing time, effort, and cost of the prediction process. The results of the proposed CNN model were again compared with those of the ML classification methods, with the CNN model superior: the prediction accuracies on the testing dataset samples with the ReliefF reduced-feature approach were 93.70%, 79.41%, 85.08%, 92.23%, 95.17%, 95.38%, 93.70%, and 95.38% for DT, DA, NB, SVM, KNN, EN, ANN, and the proposed CNN model, respectively.
Furthermore, the proposed CNN model was checked with uncertainty noise of up to ±25% in the full and reduced feature sets, with a good prediction diagnosis of the power transformer HI state. Finally, the results of the proposed model were compared with those obtained in recently published works, confirming the efficacy of the proposed model for both the full and reduced-feature approaches. The main contribution of this work is the enhancement of the prediction accuracy of the minor transformer HI state, and of the overall accuracy, using the CNN model and the suggested oversampling approach.

Funding

This research received no external funding.

Acknowledgments

The author would like to acknowledge the Deanship of Scientific Research, Taif University, for funding this work.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

Table A1. Scoring and weight factors for gas levels (ppm).

Gas  | S_i = 1 | 2        | 3       | 4        | 5         | 6     | W_i
H2   | ≤100    | 100–200  | 200–300 | 300–500  | 500–700   | >700  | 2
CH4  | ≤75     | 75–125   | 125–200 | 200–400  | 400–600   | >600  | 3
C2H6 | ≤65     | 65–80    | 80–100  | 100–120  | 120–150   | >150  | 3
C2H4 | ≤50     | 50–80    | 80–100  | 100–150  | 150–200   | >200  | 3
C2H2 | ≤3      | 3–7      | 7–35    | 35–50    | 50–80     | >80   | 5
CO   | ≤350    | 350–700  | 700–900 | 900–1100 | 1100–1400 | >1400 | 1
CO2  | ≤2500   | ≤3000    | ≤4000   | ≤5000    | ≤7000     | >7000 | 1
Table A2. Grading method for oil quality test parameters.

Parameter | U < 69 kV | 69 < U < 230 kV | U > 230 kV | S_i | W_i
Dielectric strength, kV (2 mm gap) | ≥45 | ≥52 | ≥60 | 1 | 3
 | 35–45 | 45–52 | 50–60 | 2 |
 | 30–35 | 35–45 | 40–50 | 3 |
 | ≤30 | ≤35 | ≤40 | 4 |
IFT, dyne/cm | ≥25 | ≥30 | ≥32 | 1 | 2
 | 20–25 | 23–30 | 25–32 | 2 |
 | 15–20 | 18–23 | 20–25 | 3 |
 | ≤15 | ≤18 | ≤20 | 4 |
Acidity | ≤0.05 | ≤0.04 | ≤0.03 | 1 | 1
 | 0.05–0.1 | 0.04–0.1 | 0.03–0.07 | 2 |
 | 0.1–0.2 | 0.1–0.15 | 0.07–0.1 | 3 |
 | ≥0.2 | ≥0.15 | ≥0.1 | 4 |
Moisture (ppm), all voltage classes | ≤20 | | | 1 | 4
 | 20–30 | | | 2 |
 | 30–40 | | | 3 |
 | ≥40 | | | 4 |
Color, all voltage classes | ≤1.5 | | | 1 | 2
 | 1.5–2 | | | 2 |
 | 2–2.5 | | | 3 |
 | ≥2.5 | | | 4 |
Table A3. Furan test rating code or age rating when testing is not available.

Rating Code | Furan (ppm) | Age (Years)
A           | 0–0.1       | <20
B           | 0.1–0.5     | 20–40
C           | 0.5–1       | 40–60
D           | 1–5         | >60
E           | >5          | -

References

  1. Azmi, A.; Jasni, J.; Azis, N.; Kadir, M.Z.A.A. Evolution of transformer health index in the form of mathematical equation. Renew. Sustain. Energy Rev. 2017, 76, 687–700. [Google Scholar] [CrossRef]
  2. Zuo, W.; Yuan, H.; Shang, Y.; Liu, Y.; Chen, T. Calculation of a Health Index of Oil-Paper Transformers Insulation with Binary Logistic Regression. Hindawi Math. Probl. Eng. 2016, 2016, 6069784. [Google Scholar] [CrossRef]
  3. Naderian, A.; Cress, S.; Piercy, R.; Wang, F.; Service, J. An Approach to Determine the Health Index of Power Transformers. In Proceedings of the Conference Record of the 2008 IEEE International Symposium on Electrical Insulation, Vancouver, BC, Canada, 9–12 June 2008; pp. 192–196. [Google Scholar]
  4. Jahromi, A.; Piercy, R.; Cress, S.; Service, J.; Fan, W. An approach to power transformer asset management using health index. IEEE Electr. Insul. Mag. 2009, 25, 20–34. [Google Scholar] [CrossRef]
  5. Haema, J.; Phadungthin, R. Condition assessment of the health index for power transformer. In Proceedings of the Power Engineering and Automation Conference (IEEE PEAM 2012), Wuhan, China, 18–20 September 2012; pp. 1–4. [Google Scholar]
  6. Haema, J.; Phadungthin, R. Development of condition evaluation for power transformer maintenance. In Proceedings of the 4th International Conference on Power Engineering, Energy and Electrical Drives, Istanbul, Turkey, 13–17 May 2013; pp. 620–623. [Google Scholar]
  7. Yang, Y.; Talib, M.A.; Rosli, H. TNB experience in condition assessment and life management of distribution power transformers. In Proceedings of the CIRED 2009—20th International Conference and Exhibition on Electricity Distribution-Part 1, Prague, Czech Republic, 8–11 June 2009; pp. 1–4. [Google Scholar]
  8. Yang, Y.; Talib, M.A.; Rosli, H. Condition assessment of power transformers in TNB distribution system and determination of transformer condition index. In Proceedings of the Conference of the Electric Power Supply Industry (CEPSI), Macau, China, 27–31 October 2008. [Google Scholar]
  9. Zhou, Y.; Ma, L.; Yang, J.; Xia, C. Entropy weight health index method of power transformer condition assessment. In Proceeding of the 9th International Conference Reliability Maintainability Safety, Guiyang, China, 12–15 June 2011; pp. 426–431. [Google Scholar]
  10. Li, E.; Song, B. Transformer health status evaluation model based on multifeatured factors. In Proceedings of the 2014 International Conference on Power System Technology (POWERCON 2014), Chengdu, China, 20–22 October 2014; pp. 1417–1422. [Google Scholar]
  11. Ashkezari, A.D.; Ma, H.; Saha, T.K.; Ekanayake, C. Application of Fuzzy Support Vector Machine for Determining the Health Index of the Insulation System of In-service Power Transformers. IEEE Trans. Dielectr. Electr. Insul. 2013, 20, 965–973. [Google Scholar] [CrossRef]
  12. Alqudsi, H.; El-Hag, A. Assessing the Power Transformer Insulation Health Condition Using a Feature-Reduced Predictor Model. IEEE Trans. Dielectr. Electr. Insul. 2018, 25, 853–862. [Google Scholar] [CrossRef]
  13. Kadim, E.J.; Azis, N.; Jasni, J.; Ahmad, S.A.; Talib, M.A. Transformers Health Index Assessment Based on Neural-Fuzzy Network. Energies 2018, 11, 710. [Google Scholar] [CrossRef]
  14. Alqudsi, A.; El-Hag, A. Application of Machine Learning in Transformer Health Index Prediction. Energies 2019, 12, 2694. [Google Scholar] [CrossRef]
  15. Tian, F.; Jing, Z.; Zhao, H.; Zhang, E.; Liu, J. A Synthetic Condition Assessment Model for Power Transformers Using the Fuzzy Evidence Fusion Method. Energies 2019, 12, 857. [Google Scholar] [CrossRef]
  16. Ghoneim, S.S.M.; Taha, I.B.M. Comparative Study of Full and Reduced Feature Scenarios for Health Index Computation of Power Transformers. IEEE Access 2020, 8, 181326–181339. [Google Scholar] [CrossRef]
  17. Nikulin, M.S. Chi-squared test for normality. In Proceedings of the International Vilnius Conference on Probability Theory and Mathematical Statistics, Vilnius, Lithuania, 25–30 June 1973; Volume 2, pp. 119–122. [Google Scholar]
  18. Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol. 2005, 3, 185–205. [Google Scholar] [CrossRef] [PubMed]
  19. Darbellay, G.A.; Vajda, I. Estimation of the information by an adaptive partitioning of the observation space. IEEE Trans. Inf. Theory 1999, 45, 1315–1321. [Google Scholar] [CrossRef]
  20. Johannes, B. On the efficient calculation of a linear combination of chi-square random variables with an application in counting string vacua. J. Phys. A Math. Theor. 2013, 46, 505202. [Google Scholar]
  21. Bagdonavicius, V.; Nikulin, M.S. Chi-squared goodness-of-fit test for right censored data. Int. J. Appl. Math. Stat. 2011, 24, 30–50. [Google Scholar]
  22. Cox David, R. Principles of Statistical Inference; Cambridge University Press: Cambridge, NY, USA, 2006; ISBN 978-0-521-68567-2. [Google Scholar]
  23. Tabachnick, G.; Fidell, S. Using Multivariate Statistics, 5th ed.; Pearson International Edition: Boston, MA, USA, 2007; ISBN 978-0-205-45938-4. [Google Scholar]
  24. Moore, S.; McCabe, P. Introduction to the Practice of Statistics, 4th ed.; W H Freeman & Co: New York, NY, USA, 2003; ISBN 0-7167-9657-0. [Google Scholar]
  25. Kononenko, I.; Šimec, E.; Robnik-Šikonja, M. Overcoming the myopia of inductive learning algorithms with RELIEFF. Appl. Intell. 1997, 7, 39–55. [Google Scholar] [CrossRef]
  26. Urbanowicz, J.; Meeker, M.; LaCava, W.; Olson, R.; Moore, J.; Jason, H. Relief-Based Feature Selection: Introduction and Review. J. Biomed. Inform. 2018, 85, 189–203. [Google Scholar] [CrossRef] [PubMed]
  27. Spurrier, J.D. On the null distribution of the Kruskal–Wallis statistic. J. Nonparametr. Stat. 2003, 15, 685–691. [Google Scholar] [CrossRef]
  28. Corder, W.; Foreman, I. Nonparametric Statistics for Non-Statisticians; John Wiley & Sons: Hoboken, NJ, USA, 2009; pp. 99–105. [Google Scholar]
  29. Putatunda, S.; Rama, K. A Modified Bayesian Optimization based Hyper-Parameter Tuning Approach for Extreme Gradient Boosting. In Proceedings of the Fifteenth International Conference on Information Processing (ICINPRO), Bengaluru, India, 20–22 December 2019. [Google Scholar]
  30. William, W.; Burank, B.; Efstratios, P. Hyperparameter optimization of machine learning models through parametric programming. Comput. Chem. Eng. 2020, 139, 1–12. [Google Scholar]
  31. Jia, W.; Xiu, C.; Hao, Z.; Li-Diong, X.; Si-Hao, D. Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. [Google Scholar]
  32. Wang, M.-H. A novel extension method for transformer fault diagnosis. IEEE Trans. Power Del. 2003, 18, 164–169. [Google Scholar] [CrossRef]
  33. Zhu, Y.-L.; Wang, F.; Geng, L.-Q. Transformer fault diagnosis based on naive Bayesian classifier and SVR. In Proceedings of the ENCON 2006 IEEE Region 10 Conference, Hong Kong, China, 14–17 November 2006; pp. 1–4. [Google Scholar]
  34. Sarma, D.V.S.S.S.; Kalyani, G.N.S. ANN approach for condition monitoring of power transformers using DGA. In Proceedings of the 2004 IEEE Region 10 Conference TENCON 2004, Chiang Mai, Thailand, 24 November 2004; pp. 444–447. [Google Scholar]
  35. Taha, I.B.M.; Ibrahim, S.; Mansour, D.-E.A. Power Transformer Fault Diagnosis Based on DGA Using a Convolutional Neural Network with Noise in Measurements. IEEE Access 2021, 9, 111162–111170. [Google Scholar] [CrossRef]
Figure 1. Power transformer HI state concept.
Figure 2. CNN architecture.
Figure 4. Training dataset samples redistributing using the oversampling process.
Figure 5. Training dataset samples distribution before (a) and after (b) oversampling process.
Figure 6. Percentage accuracy and loss during the training process after oversampling the training dataset.
Table 1. Scoring of the transformer health index [16].

No. | Transformer Condition Index | K_i | HIC
1   | DGA                         | 10  | 4, 3, 2, 1, 0
2   | OQ                          | 8   | 4, 3, 2, 1, 0
3   | DP                          | 6   | 4, 3, 2, 1, 0
Table 2. Threshold limits of the HI state [16].

HI State | Limits | Output Operation Decision
Good     | ≥85    | No maintenance required
Fair     | 85–50  | More diagnostic testing is required
Poor     | <50    | The transformer must be out of service
Table 16. Accuracy of the full-feature scenario against uncertainty noise levels from 0% to ±25%.

HI State | 0%    | ±5%   | ±10%  | ±15%  | ±20%  | ±25%
Good     | 98.24 | 97.95 | 97.07 | 97.65 | 98.24 | 96.48
Fair     | 99.17 | 96.67 | 95.00 | 93.33 | 89.17 | 85.83
Poor     | 100   | 100   | 100   | 100   | 100   | 100
All      | 98.53 | 97.69 | 96.64 | 96.64 | 96.01 | 93.91
Table 17. Accuracy of the reduced-feature scenario (ReliefF) against uncertainty noise levels from 0% to ±25%.

HI State | 0%    | ±5%   | ±10%  | ±15%  | ±20%  | ±25%
Good     | 98.33 | 96.67 | 96.67 | 94.17 | 89.17 | 82.50
Fair     | 100   | 100   | 100   | 80.00 | 93.33 | 86.67
Poor     | 95.38 | 94.12 | 93.49 | 91.39 | 90.55 | 88.45
All      | 94.13 | 92.96 | 92.08 | 90.91 | 90.91 | 90.62
Table 18. Comparison of the proposed CNN model with the methods in [16] and [14] using the two dataset samples.

Scenario        | Proposed CNN | DT [16] | SVM [16] | KNN [16] | EN [16] | ANN [14] | MLR [14] | J48 [14] | RF [14]
Full-Feature    | 98.4         | 96      | 96.4     | 96.7     | 96.4    | 94.9     | 95.5     | 95.6     | 96.6
Reduced-Feature | 96.9         | 94.7    | 95.2     | 95.5     | 95.6    | 95.1     | 95.3     | 95.3     | 96.6