1. Introduction
In recent decades, electrical power generation resources have increased rapidly. Electrical power systems now include many different generation resources, especially renewable energy resources, which are often located far from the load centers for various reasons. The power transformer is one of the most important pieces of equipment in energy transmission systems. In the case of power transformer failures, utilities are subject to major economic losses, such as loss of revenue, and an electrical shortage at the end users causes the shutdown of industries, halting production and causing layoffs [1,2]. Therefore, transformer asset management has been extensively adopted to anticipate and avoid sudden failures.
Lately, the power transformer health index (HI) has become a suitable tool to combine current information about the power transformer. This information includes transformer operating observations, field tests, inspections, and laboratory testing [2]. The laboratory tests include three main tests. The first is dissolved gas analysis (DGA), which depends on seven gases: hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), carbon dioxide (CO2), and carbon monoxide (CO). The second is the oil quality (OQ) test, which depends on six factors of the transformer oil: dielectric strength or breakdown voltage (BD), interfacial tension (IFT), color, humidity (Mois.), insulation dissipation factor (DF), and acidity (Acid.). The third is the degree of polymerization (DP) test, which depends on the furfural content of the transformer oil. The output results of these three tests are combined to evaluate the overall value of the power transformer HI state. The power transformer HI state is used for planning routine maintenance, which affects the transformer's age and end of life [1,2].
Four main approaches have been introduced to evaluate the power transformer HI state: the scoring and ranking method [3,4,5,6]; the combination of scoring, ranking, and tier methods [7,8]; matrices [9]; and the multi-feature factor assessment model [10]. The scoring and ranking method is the most popular for evaluating the power transformer HI state. In this approach, the most common parameters used to assess the HI state are the DGA, OQ, and DP tests. The first step is determining each test's HI code or factor (HIC). The HICs for DGA, OQ, and DP are then standardized using the scoring values (4, 3, 2, 1, and 0). Finally, the HICs are combined into the final value of the power transformer HI state using a set of constant weighting factors. The main drawback of these approaches is the requirement for a high number of features (from 24 to 27), which implies high cost, effort, and testing time.
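The scoring-and-ranking calculation can be sketched as follows. The HIC scores (0–4) and the weighting factors used here are hypothetical placeholders for illustration, not the values of any specific standard or of this work:

```python
# Illustrative sketch of the scoring-and-ranking HI calculation: each test's
# HI code (HIC, scored 0..4) is combined with a constant weighting factor.
# The scores and weights below are hypothetical placeholders.

def health_index(hic_scores, weights, max_score=4):
    """Combine per-test HIC values (0..max_score) into a 0-100% health index."""
    weighted = sum(w * s for w, s in zip(weights, hic_scores))
    return 100.0 * weighted / (max_score * sum(weights))

# Example: assumed HIC scores for DGA, OQ, and DP with assumed weights.
hi = health_index(hic_scores=[4, 3, 2], weights=[5, 3, 2])
```

With these placeholder values, the index evaluates to 82.5% of the maximum attainable score; real schemes differ in their score tables and weights.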
Some researchers have used artificial intelligence approaches for predicting the power transformer HI state value [11,12,13,14,15,16]. In [11], a hybrid fuzzy-logic support vector machine (FLSVM) approach is used for indicating the HI of in-service transformers based on transformer test data. The FLSVM finds the relation between the insulation system test results and the output HI state value, and deals with the imbalanced dataset distribution caused by the small number of 'Poor' samples. The main drawback of this work is the accuracy of predicting the HI state, especially for the 'Poor' states. The researchers in [12] presented a general cost-effective ANN model to predict the power transformer HI state with approximately 95% accuracy using a subset of input features. The ANN model was tested with another dataset collected from another utility company, with about 89% prediction accuracy.
The main drawbacks are the low accuracy with new data and the absence of results on the prediction accuracy for the majority and minority classes. In [13], an assessment model of the transformer HI is developed based on the Neural-Fuzzy (NF) technique. Two datasets (an in-service assessment dataset and a Monte Carlo Simulation (MCS) dataset) are used to train the NF model. The results illustrate the high prediction accuracy of the NF model using the MCS dataset compared with that of scoring methods; still, it had poor detection accuracy with the in-service assessment dataset. The main drawbacks are the low detection accuracy and the lack of a solution for the poor data distribution between the different HI state classes. In [14], a machine learning (ML) model was presented for predicting the power transformer HI state. It used ML techniques such as ANN, decision tree, support vector machine, k-nearest neighbors, and random forest (RF) classification methods. The ML methods, especially the RF model, had high prediction accuracy for detecting the HI state. Moreover, feature-reduction techniques were used to reduce the number of input features in the ML models. However, the main drawbacks of this model were its poor detection accuracy for the minority class ('Poor' state) and the unaddressed imbalanced distribution between classes in the training dataset. In [15], fuzzy evidence fusion was presented to predict the transformer condition, introducing a more detailed fuzzy model for the transformer condition state. The main drawback of this model is the small number of cases that confirmed the accuracy of the suggested model (only 39 samples), with only 84.6% prediction accuracy. In [16], four optimized machine learning (ML) models were presented for predicting the transformer HI states, using 1365 dataset samples collected from two different electric utilities. High prediction accuracy was obtained with the four ML models (95.9% with the ensemble classification (EN) model). Feature reduction with the MRMR technique reduced the input features to only eight with good accuracy, especially with the EN model (95%). The drawbacks of this model are that the prediction model is a two-stage model requiring much time and effort to build, the detection accuracy for the minority class ('Poor' state) is low, and the imbalanced dataset distribution between classes is not addressed.
The main drawbacks of the previously suggested approaches for predicting the power transformer HI are the low overall prediction accuracy and, particularly, the low prediction accuracy for the minority HI class. The minority class is the 'Poor' class, which is the most important for predicting the transformer state, as misprediction of the 'Poor' class leads to fast failure of the power transformer. Therefore, correct prediction of the 'Poor' class is very important for the continued operation of the transformers and for planning the maintenance procedures that reduce power transformer failure rates in future periods.
This work proposes a new CNN model for power transformer HI state prediction. The proposed model uses the output results of the DGA, OQ, and DP tests as inputs to the CNN model, which predicts the final power transformer HI state. The imbalance between classes in the training data biases the trained model toward the majority class, which weakens the CNN model's prediction of the minority class. An oversampling generator is suggested to generate new samples for the minority class so that all classes have an equal number of samples. The prediction of the proposed CNN model is enhanced after applying the oversampling process to the training set: the overall prediction accuracy is improved from 89.92% without oversampling to 98.53% with oversampling.
The main contributions of this work are the enhancement of the prediction accuracy of the minority transformer HI state (which is very low in most previous works) and of the overall accuracy, using the CNN model and the suggested oversampling approach. Different feature-selection techniques are used to reduce the input features and thereby decrease the cost, effort, and time of testing. Five feature-selection approaches are compared, with the ReliefF technique proving the most effective at determining the most important parameters for predicting the HI state with the proposed CNN model. The robustness of the proposed CNN model is checked by applying uncertainty of up to ±25% to the input dataset, with the CNN model maintaining good prediction accuracy. The effectiveness of the proposed CNN model is verified by comparing its results with those of different optimized machine learning approaches and with recently published works, confirming the efficacy of the proposed CNN model.
This work is organized into four sections. Section 2 covers the mathematical analysis and system model, which consists of three subsections: power transformer HI state calculations, the suggested CNN model and solution procedure, and the suggested oversampling technique. Section 3 presents the results and discussion in three subsections: the oversampling approach, reduced-feature results, and the effectiveness of the CNN model. The conclusions are presented in Section 4.
3. Results and Discussion
The dataset samples were assembled from two regions. The first dataset (730 samples) was collected from the Gulf Region at the 66 kV medium-voltage sub-transmission stage. The second dataset (631 samples) was assembled from power transformers at the 220 kV transmission stage of the Malaysia Electricity Company. The two datasets are combined and then divided randomly into 65% (885 samples) for training and 35% (476 samples) for testing. The CNN model is built using MATLAB R2022a. The main training dataset samples (885 samples) are applied to the CNN model. The dataset samples are normalized before being inserted into the classification CNN model as follows:

x′_ij = (x_ij − x_j,min) / (x_j,max − x_j,min)  (13)

where x_ij is the ith dataset sample of the jth dataset feature, x_j,min is the minimum value of the jth dataset feature, and x_j,max is the maximum value of the jth dataset feature.
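A minimal sketch of this per-feature min-max normalization, assuming features are stored column-wise in a NumPy array:

```python
import numpy as np

# Per-feature min-max normalization as in Equation (13): each feature j is
# scaled by its own minimum and maximum over the samples.
def min_max_normalize(X):
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

# Toy example: two features with very different ranges.
X = np.array([[10.0, 0.1],
              [20.0, 0.3],
              [30.0, 0.5]])
Xn = min_max_normalize(X)
```

In practice, the minima and maxima computed on the training set should be reused to scale the testing samples, so that no information leaks from the test data.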
The testing dataset samples (476 samples), after applying the normalization of Equation (13), are used to check the CNN model prediction accuracy. The results of the CNN model during training and testing are presented in Table 3 and Table 4, respectively. The results illustrate that the prediction accuracy of the power transformer HI state is low, especially during the testing stage, due to the bias of the trained CNN model toward the majority class ('Good' state). The prediction accuracy of the majority class ('Good' state) is high (335/341 = 98.24%), while that of the minority class is very low (8/15 = 53.33%). The training dataset is also applied to optimized classification machine learning (ML) methods: decision tree (DT), discriminant analysis (DA), Naïve Bayes (NB), support vector machines (SVM), k-nearest neighbors (KNN), ensemble (EN), and artificial neural network (ANN). The ML methods are built using the MATLAB R2021a Classification Learner toolbox. Table 5 and Table 6 compare the proposed CNN model and the ML methods during the training and testing stages, respectively. The results illustrate that the accuracy of all ML methods for detecting the 'Good' state is better than that for the 'Fair' and 'Poor' states. The 'Poor' state (minority class) has a poor prediction accuracy compared to the 'Good' state (the majority class).
The accuracy of the CNN model and the other ML methods for the different stages (training, testing, and overall) is calculated as

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP and TN are the numbers of true positive and true negative samples, respectively, and FP and FN are the numbers of false positive and false negative samples, respectively. Other classification performance factors are presented for further comparison between the suggested CNN model and the different classification models:

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Precision = TP / (TP + FP)
F1-score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)
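These performance factors can be computed directly from a multiclass confusion matrix. The sketch below uses one-vs-rest counts per class; macro-averaging across classes is an assumption, since the paper does not state its averaging scheme:

```python
import numpy as np

# Per-class one-vs-rest metrics from a multiclass confusion matrix
# (rows = true class, columns = predicted class), macro-averaged.
def classification_metrics(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly predicted per class
    fp = cm.sum(axis=0) - tp         # predicted as class c but not class c
    fn = cm.sum(axis=1) - tp         # class c predicted as something else
    tn = cm.sum() - tp - fp - fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "sensitivity": sensitivity.mean(),
        "specificity": specificity.mean(),
        "precision": precision.mean(),
        "f1": f1.mean(),
    }

# Confusion matrix of the CNN model during testing without oversampling (Table 4).
m = classification_metrics([[335, 6, 0],
                            [30, 85, 5],
                            [0, 7, 8]])
```

Applied to the Table 4 confusion matrix, these macro averages reproduce the CNN row of Table 6 (accuracy 89.92%, sensitivity ≈ 0.74, specificity ≈ 0.91, precision ≈ 0.80, F1 ≈ 0.77).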
Table 3.
Confusion matrix of the CNN model during the training process without data oversampling.
| True Class \ Predicted | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| Good | 659 | 3 | 0 | 99.55 |
| Fair | 3 | 200 | 1 | 98.04 |
| Poor | 0 | 0 | 19 | 100 |
| Overall | | | | 99.21 |
Table 4.
Confusion matrix of the CNN model during the testing process without data oversampling.
| True Class \ Predicted | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| Good | 335 | 6 | 0 | 98.24 |
| Fair | 30 | 85 | 5 | 70.83 |
| Poor | 0 | 7 | 8 | 53.33 |
| Overall | | | | 89.92 |
Table 5.
Comparison between the results of the CNN model and the other methods during the training stage.
| HI | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| TSN * | 662 | 204 | 19 | |
| DT | 639 | 172 | 8 | 92.54 |
| DA | 650 | 158 | 11 | 92.54 |
| NB | 626 | 167 | 11 | 90.85 |
| SVM | 649 | 182 | 7 | 94.69 |
| KNN | 654 | 170 | 8 | 94.01 |
| EN | 653 | 187 | 4 | 95.37 |
| ANN | 649 | 175 | 7 | 93.90 |
| CNN | 659 | 200 | 19 | 99.21 |
Table 6.
Comparison between the results of the CNN model and the other methods during the testing stage.
| HI | Good | Fair | Poor | Sensitivity | Specificity | Precision | F1-Score | % Accuracy |
|---|---|---|---|---|---|---|---|---|
| TSN | 341 | 120 | 15 | | | | | |
| DT | 332 | 70 | 4 | 0.61 | 0.88 | 0.64 | 0.62 | 85.29 |
| DA | 338 | 65 | 5 | 0.62 | 0.87 | 0.70 | 0.65 | 85.71 |
| NB | 323 | 85 | 8 | 0.73 | 0.90 | 0.74 | 0.74 | 87.39 |
| SVM | 330 | 93 | 3 | 0.65 | 0.92 | 0.75 | 0.68 | 89.50 |
| KNN | 339 | 75 | 3 | 0.61 | 0.88 | 0.83 | 0.66 | 87.61 |
| EN | 331 | 82 | 2 | 0.60 | 0.88 | 0.89 | 0.63 | 87.18 |
| ANN | 332 | 70 | 4 | 0.66 | 0.91 | 0.81 | 0.71 | 89.92 |
| CNN | 335 | 85 | 8 | 0.74 | 0.91 | 0.80 | 0.77 | 89.92 |
3.1. Oversampling Technique
The imbalance of training dataset samples between classes biases the trained model toward the majority class, so the detection accuracy of the minority class is very poor. The training set distribution between the different HI state classes is highly imbalanced: 662 samples for the 'Good' state, 204 for the 'Fair' state, and only 19 for the 'Poor' state. This imbalance can be addressed using oversampling or undersampling. This work applies oversampling to the training dataset samples to balance the training set classes.
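A minimal random-oversampling sketch that duplicates minority-class samples (with replacement) until every class matches the majority-class count. The paper's generator may synthesize new samples rather than merely resample, so this is illustrative only:

```python
import numpy as np

# Random oversampling: for each minority class, draw samples with replacement
# until its count equals the majority-class count.
def oversample(X, y, rng=None):
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_out, y_out = [X], [y]
    for cls, count in zip(classes, counts):
        deficit = target - count
        if deficit > 0:
            idx = rng.choice(np.where(y == cls)[0], size=deficit, replace=True)
            X_out.append(X[idx])
            y_out.append(y[idx])
    return np.concatenate(X_out), np.concatenate(y_out)

# Class counts as in the training set: 662 'Good', 204 'Fair', 19 'Poor'.
y = np.array([0] * 662 + [1] * 204 + [2] * 19)
X = np.zeros((len(y), 1))          # placeholder features for the sketch
Xb, yb = oversample(X, y, rng=0)
```

After this step each class contains 662 samples, matching the balanced distribution shown in Figure 5.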
The numbers of training samples of the three HI states before and after the oversampling process are shown in Figure 5. It illustrates the imbalanced distribution of training samples before the oversampling process and the equal distribution after applying the suggested oversampling procedure.
Figure 6 illustrates the prediction accuracy and loss versus the iteration number during the training attempts after oversampling the training dataset to the CNN model. The results demonstrate that the training accuracy is near one hundred percent, while the loss is very low and close to zero, showing a good training accuracy of the suggested CNN model.
The CNN loss can be expressed as follows:

Loss = −(1/n) Σ_{i=1}^{n} Σ_{j=1}^{C} y_ij log(ŷ_ij)

where n is the number of samples, C is the number of classes, y_ij represents the probability that the ith sample belongs to the jth class, and ŷ_ij is the output of the CNN softmax layer for dataset sample i in class j.
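This loss is the standard categorical cross-entropy over the softmax outputs and can be sketched as:

```python
import numpy as np

# Categorical cross-entropy: y_true is one-hot (n x C), y_prob is the
# softmax output (n x C). Probabilities are clipped to avoid log(0).
def cross_entropy(y_true, y_prob, eps=1e-12):
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_prob), axis=1))

# Two samples, three classes (e.g., Good / Fair / Poor).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_prob = np.array([[0.9, 0.05, 0.05],
                   [0.2, 0.7, 0.1]])
loss = cross_entropy(y_true, y_prob)
```

The loss approaches zero as the softmax probability assigned to each true class approaches one, which matches the near-zero training loss reported in Figure 6.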
Table 7 presents the CNN hyperparameters used for classifying the power transformer HI state; they are selected to give good classification performance. Table 8 presents the prediction accuracy on the training dataset after applying the oversampling process. The prediction accuracy of the 'Poor' state is 100%, while those of the 'Good' and 'Fair' states are 99.85% and 97.73%, respectively, and the overall accuracy is 99.19%. Table 9 shows the prediction accuracy of the CNN model after the oversampling process on the testing dataset (476 samples). The prediction accuracies of the 'Poor' and 'Fair' states are 100% and 99.17%, respectively, while those of the model trained without oversampling are 53.33% and 70.83%, respectively. Moreover, the overall accuracy of the CNN model with the oversampling process is enhanced to 98.53%, compared to 89.92% for the CNN model without the oversampling process.
Table 7.
CNN model selected parameters.
| | Parameter | Value |
|---|---|---|
| Convolution Layer 1 | Filter size | 1 |
| | Number of filters | 32 |
| | Padding | 0 |
| Convolution Layer 2 | Filter size | 1 |
| | Number of filters | 175 |
| | Padding | 0 |
| Max-Pooling Layer | Stride | 1 |
| Fully Connected Layer | Outputs | 3 |
| Learning Algorithm Options | Step size, α | 10⁻³ |
| | Gradient threshold | 0.001 |
| | Training algorithm | Adam |
| | Max. epochs | 150 |
| | Verbose | 1 |
| | Activation | softmax |
| | CNN type | classification |
Table 8.
Confusion matrix of the CNN model during the training process.
| True Class \ Predicted | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| Good | 661 | 1 | 0 | 99.85 |
| Fair | 15 | 647 | 0 | 97.73 |
| Poor | 0 | 0 | 662 | 100 |
| Overall | | | | 99.19 |
Table 9.
Confusion matrix of the CNN model during the testing process.
| True Class \ Predicted | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| Good | 335 | 6 | 0 | 98.24 |
| Fair | 1 | 119 | 0 | 99.17 |
| Poor | 0 | 0 | 15 | 100 |
| Overall | | | | 98.53 |
Figure 7 compares the actual transformer HI states (Good, Fair, and Poor) against the predicted HI states to illustrate the prediction accuracy of the suggested CNN model during the testing process with the 476 dataset samples. The results demonstrate that the proposed CNN model has excellent prediction accuracy: 98.24%, 99.17%, and 100% for the 'Good', 'Fair', and 'Poor' states, respectively.
Figure 8 compares the CNN model's prediction accuracy of the power transformer HI state on the testing dataset (476 samples) with and without oversampling. It illustrates the enhancement of the HI state prediction after applying the oversampling process, especially for the 'Poor' and 'Fair' states.
After the oversampling process, the dataset is used to train the optimized ML models. The training process of the different ML models is presented in Figure 9.
Figure 7.
HI state prediction of the suggested CNN model during the testing process after oversampling.
Figure 8.
Comparison between the fault prediction of the CNN model during the testing stage without and with the oversampling process.
Figure 9.
Comparison among the ML classification methods against the iteration number during the optimization process.
Compared to the other models, the EN model has the minimum error during the training stage, while the NB model has the highest error. Five-fold cross-validation was used for training the optimized ML models. The optimization option in the Classification Learner toolbox is applied to select the suitable classification model and the matching parameters of the chosen methods. This work uses Bayesian optimization (BO) with the ML methods to determine their optimal parameters; the BO approach is useful for optimization problems and can be used with most ML techniques for optimal parameter selection [29,30,31]. The training parameters of the different ML models are introduced in Table 10.
Table 11 compares the results of the CNN model and the other ML methods after applying the oversampling process to the training dataset samples. It lists the number of correctly predicted samples for each power transformer HI state and the overall accuracy for the CNN and ML methods during the training stage. The results illustrate the high prediction accuracy of all models compared to those trained without oversampling. The results also demonstrate the effectiveness of the CNN model (overall accuracy of 99.19%) compared to the other ML methods (the best ML method, EN, achieves 97.94%).
Table 12 compares the results of the CNN model and the other ML methods (trained after applying the oversampling process) on the testing dataset samples. It introduces the number of correctly predicted samples for each power transformer HI state and the prediction performance factors for the CNN and ML methods during the testing stage. The results illustrate the high prediction accuracy of all models compared to those without the oversampling process (Table 6). The CNN model prediction accuracy is enhanced to 98.53%, compared to 89.92% for the model without oversampling. Moreover, all the other ML models also perform better than their counterparts trained without oversampling. The results again demonstrate the effectiveness of the CNN model (overall accuracy of 98.53%) compared to the other ML methods (the best ML method, SVM, achieves 96.43%).
Table 10.
Optimal parameters of the ML methods during optimization with the training dataset after oversampling processes.
| Method | Optimization Parameters |
|---|---|
| DT | Max. no. of splits: 120; split criterion: Twoing rule |
| DA | Discriminant type: Quadratic |
| NB | Distribution: Kernel; kernel type: Gaussian |
| SVM | Multiclass method: One-vs-All; box constraint level: 985.7716; kernel function: Cubic |
| KNN | Number of neighbors: 991; distance metric: Cosine; distance weight: Squared inverse; standardize data: true |
| EN | Ensemble method: AdaBoost; number of learners: 140; learning rate: 0.9897; max. number of splits: 18 |
| ANN | Fully connected layers: 2; activation: Sigmoid; standardize data: No; regularization strength (λ): 5.1411 × 10⁻⁹; first layer size: 138; second layer size: 248 |
Table 11.
Comparison between the results of the CNN model and the other methods during the training stage after the oversampling process.
| HI | Good | Fair | Poor | % Accuracy |
|---|---|---|---|---|
| SN | 662 | 662 | 662 | |
| DT | 627 | 639 | 647 | 96.32 |
| DA | 623 | 502 | 649 | 89.33 |
| NB | 460 | 551 | 642 | 83.23 |
| SVM | 638 | 638 | 652 | 97.08 |
| KNN | 639 | 618 | 655 | 96.27 |
| EN | 652 | 647 | 646 | 97.94 |
| ANN | 632 | 639 | 656 | 97.03 |
| CNN | 661 | 647 | 662 | 99.19 |
Table 12.
Comparison between the results of the CNN model and the ML methods during the testing stage after the oversampling process.
| HI | Good | Fair | Poor | Sensitivity | Specificity | Precision | F1-Score | % Accuracy |
|---|---|---|---|---|---|---|---|---|
| SN | 341 | 120 | 15 | | | | | |
| DT | 303 | 120 | 15 | 0.96 | 0.96 | 0.92 | 0.93 | 92.02 |
| DA | 324 | 81 | 15 | 0.88 | 0.91 | 0.77 | 0.80 | 88.24 |
| NB | 259 | 105 | 15 | 0.88 | 0.90 | 0.71 | 0.76 | 79.62 |
| SVM | 325 | 119 | 15 | 0.98 | 0.98 | 0.96 | 0.97 | 96.43 |
| KNN | 318 | 120 | 15 | 0.98 | 0.98 | 0.95 | 0.96 | 95.17 |
| EN | 272 | 120 | 15 | 0.93 | 0.94 | 0.88 | 0.89 | 85.50 |
| ANN | 322 | 120 | 15 | 0.98 | 0.98 | 0.95 | 0.97 | 96.01 |
| CNN | 335 | 119 | 15 | 0.99 | 0.99 | 0.98 | 0.99 | 98.53 |
3.2. Reduced-Feature Results
This section presents the results of the reduced-features CNN model. The selected features are carried out based on five feature-reduction techniques. These methods are MRMR, Chi2, RelifF, and Kruskal–Wallis. The minimum number of features that give good predicting results is eight, like that presented in [
16].
The rankings of the eight highest-ranked features under the different feature-reduction techniques are shown in Table 13. The training scores of the eight most important features for the MRMR, ReliefF, ANOVA, and Kruskal–Wallis approaches are shown in Figure 10.
The CNN model with the oversampling process is trained on the eight highest-ranked features from each of the five feature-reduction approaches. Table 14 presents the prediction accuracy of the CNN model for each power transformer HI state and overall, for each feature-reduction technique, during the training stage, and Table 15 presents the corresponding accuracies during the testing stage. Both tables illustrate the effectiveness of the ReliefF technique compared to the other feature-reduction techniques.
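As one example of these ranking criteria, the per-feature one-way ANOVA F-statistic used by the ANOVA technique can be sketched in plain NumPy (library implementations differ in details such as tie handling and p-value computation):

```python
import numpy as np

# One-way ANOVA F-score per feature: ratio of between-class to within-class
# variance. Features with the largest F separate the classes best and are kept.
def anova_f_scores(X, y):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    n, k = len(y), len(classes)
    overall_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Feature 0 separates the two classes; feature 1 is pure within-class noise.
X = np.array([[0.0, 1.0], [0.1, -1.0], [5.0, 1.0], [5.1, -1.0]])
y = np.array([0, 0, 1, 1])
scores = anova_f_scores(X, y)
```

Ranking the features by descending score and keeping the top eight mirrors the reduction step applied before retraining the CNN model.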
Figure 10.
High-ranked eight features of the MRMR, ReliefF, ANOVA, and Wallis methods.
Table 13.
High-ranked eight features corresponding to the five feature-reduction techniques.
| MRMR | Chi2 | ReliefF | ANOVA | Wallis |
|---|---|---|---|---|
| DF | DF | IF | IF | DF |
| Furan | Furan | BDV | Color | IF |
| IF | IF | Mois | DF | Color |
| Color | Color | Acid | Acid | Acid |
| Acid | Acid | C2H2 | Furan | C2H4 |
| CO2 | CO2 | CO | Mois | CO2 |
| C2H2 | C2H4 | C2H6 | CO2 | Mois |
| C2H4 | C2H2 | CH4 | CO | H2 |
Table 14.
Prediction accuracy corresponds to the five feature-reduction techniques against each transformer HI state during the training stage.
| HI State | MRMR | Chi2 | ReliefF | ANOVA | Wallis |
|---|---|---|---|---|---|
| Good | 97.89 | 97.89 | 97.89 | 97.89 | 94.11 |
| Fair | 87.31 | 87.31 | 96.68 | 96.68 | 94.86 |
| Poor | 99.7 | 99.7 | 100 | 100 | 99.7 |
| All | 94.96 | 94.96 | 98.19 | 98.19 | 96.22 |
Table 15.
Prediction accuracy corresponds to the five feature-reduction techniques against each transformer HI state during the testing stage.
| HI State | MRMR | Chi2 | ReliefF | ANOVA | Wallis |
|---|---|---|---|---|---|
| Good | 95.01 | 95.01 | 94.13 | 84.75 | 90.32 |
| Fair | 88.33 | 88.33 | 98.33 | 96.67 | 95.83 |
| Poor | 100 | 100 | 100 | 100 | 100 |
| All | 93.49 | 93.49 | 95.38 | 88.24 | 92.02 |
3.3. Effectiveness of the CNN Model
Two methods measure the effectiveness of the proposed CNN model: the first applies uncertainty to the input dataset inserted into the CNN model, while the second compares the results of the suggested model with recently published works.
3.3.1. CNN Model with Uncertainty
The datasets of the power transformers are collected offline in three major steps: obtaining oil samples from the power transformers, extracting the gases from the transformer oil, and detecting the power transformer HI state. Special syringes are used to extract the oil samples, which are stored and transported to laboratories. Storage time and temperature affect the measured gas concentrations, and air bubbles are the most critical factor affecting them [32]: air bubbles decrease the dissolved gases because gases diffuse from the oil into the bubbles [33]. Hence, uncertainty during the measurement process affects the detection of the power transformer HI state and must be considered by the classification methods used for this purpose. An uncertainty noise of about ±14% is produced by temperature effects and sample storage, and up to ±5% by the accuracy of the measurement process [34]. This study considers uncertainty noise of up to ±25%.
The uncertainty is applied to each testing sample x to generate a new sample x_new with a selected uncertainty level of up to ±25%, using the following equation adapted from [35]:

x_new = x ⊙ (1 + δ(2r − 1))

where δ is the maximum uncertainty level (up to ±25%), r is a 14 × 1 random vector with component values between 0 and 1, and ⊙ denotes element-by-element multiplication. For uncertainty noise levels of ±5% to ±25% in steps of ±5%, the original input feature vector and the noise factor are multiplied element by element to obtain datasets with uncertainty noise. These noisy datasets are fed into the proposed CNN model to measure its prediction performance under uncertainty.
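A sketch of this noise-injection step, assuming the multiplicative form above and the 14-element feature vector of this work:

```python
import numpy as np

# Inject multiplicative uncertainty noise into a feature vector: each element
# is perturbed by up to ±delta (e.g., delta = 0.25 for a ±25% level).
def add_uncertainty(x, delta, rng=None):
    rng = np.random.default_rng(rng)
    r = rng.random(len(x))                 # random vector in [0, 1)
    return x * (1.0 + delta * (2.0 * r - 1.0))

x = np.ones(14)                            # 14 input features, as in this work
x_noisy = add_uncertainty(x, delta=0.25, rng=0)
```

Repeating this for δ = 0.05, 0.10, ..., 0.25 yields the family of noisy testing datasets evaluated in Tables 16 and 17.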
Table 16 and Table 17 present the prediction accuracy for the different transformer HI states and the overall accuracy of the proposed CNN model against uncertainty from 0 up to ±25% during the testing stage, with full and reduced features, respectively. The results illustrate the robustness of the proposed model: the suggested CNN model's accuracy remains satisfactory under uncertainty noise of up to ±25%.
Figure 11 compares the overall accuracy of the CNN model with full and reduced-feature scenarios with the uncertainty of the input dataset up to ±25%. The results illustrate the robustness of the suggested model against uncertainty noise up to ±25%.
Figure 12 compares the overall accuracy of the CNN model with other ML learning models for the full-feature scenario with the uncertainty of the input dataset up to ±25%. The results illustrate the robustness of the suggested model against uncertainty noise up to ±25% compared to the other ML learning models.
Figure 13 compares the overall accuracy of the CNN model with other ML learning models for the reduced-feature scenario with the uncertainty of the input dataset up to ±25%. The results illustrate the robustness of the suggested model against uncertainty noise up to ±25% compared to the other ML learning models.
Figure 11.
Comparison between full and reduced-feature scenarios under uncertainty levels up to ±25%.
Figure 12.
Comparison between CNN model and other ML methods with full-feature scenarios based on the 476-testing dataset.
Figure 13.
Comparison between CNN model and other ML methods with reduced-feature scenarios based on the 476-testing dataset.
3.3.2. Comparisons with Recently Published Works
Table 18 compares the results obtained by the proposed CNN model with those presented in [14,16] for both the full-feature and reduced-feature scenarios. The comparison is based on dataset system 2 (Gulf Region), using the DT, SVM, KNN, and EN methods of [16] and the NN, MLR, J48, and RF methods of [14]. The proposed CNN model demonstrates higher accuracy than the techniques in [14,16] for both scenarios. For the full-feature scenario, the proposed CNN model achieves 98.4% accuracy, whereas the highest accuracies in [16] and [14] are 96.7% (EN model) and 96.6% (RF model), respectively. For the reduced-feature scenario, the proposed CNN model, with an accuracy of 96.9%, also outperforms the methods in [14,16].
4. Conclusions
The power transformer HI state was studied based on the results of three tests: dissolved gas analysis (DGA), oil quality (OQ), and degree of polymerization (DP). The power transformer HI state prediction was carried out using 1361 dataset samples collected from two different regions (730 samples from the Gulf Region and 631 samples from a Malaysian utility). The proposed CNN model was implemented to predict and diagnose the power transformer HI state. The imbalance between the training dataset classes produced high detection accuracy for the class with the majority of samples but low detection accuracy for the minority class. An oversampling approach was used to balance the training dataset samples and enhance the prediction accuracy of the classification methods. After applying the oversampling approach to the training dataset, the prediction accuracy of the proposed CNN model on the testing dataset was enhanced to 98.53%, compared to 89.92% without the oversampling process. The results of the proposed CNN model were compared with those of the optimized ML classification methods (DT, DA, NB, SVM, KNN, EN, and ANN) trained with the oversampling process, confirming the superiority of the CNN results: the prediction accuracies on the testing dataset were 92.02%, 88.24%, 79.62%, 96.43%, 95.17%, 85.50%, 96.01%, and 98.53% for the DT, DA, NB, SVM, KNN, EN, ANN, and proposed CNN models, respectively. Five feature-reduction techniques (MRMR, Chi2, ReliefF, ANOVA, and Kruskal–Wallis) were applied with the proposed CNN model to reduce the number of input features to only eight, minimizing the cost, time, and effort of testing. The results of the proposed CNN model were again superior to those of the ML classification methods: with the ReliefF reduced-feature approach, the prediction accuracies on the testing dataset were 93.70%, 79.41%, 85.08%, 92.23%, 95.17%, 95.38%, 93.70%, and 95.38% for the DT, DA, NB, SVM, KNN, EN, ANN, and proposed CNN models, respectively.

Furthermore, the proposed CNN model was checked under uncertainty noise of up to ±25% with both full and reduced features, maintaining good prediction of the power transformer HI state. Finally, the proposed model's results were compared with those of recently published works, confirming its efficacy for both the full-feature and reduced-feature approaches. The main contribution of this work is the enhancement of the prediction accuracy of the minority transformer HI state and of the overall accuracy using the CNN model and the suggested oversampling approach.