New Insights into Gas-in-Oil-Based Fault Diagnosis of Power Transformers

Laburú, Felipe M.; Cabral, Thales W.; Gomes, Felippe V.; de Lima, Eduardo R.; Filho, José C. S. S.; Meloni, Luís G. P.

doi:10.3390/en17122889

by Felipe M. Laburú ¹, Thales W. Cabral ¹, Felippe V. Gomes ², Eduardo R. de Lima ³, José C. S. S. Filho ¹ and Luís G. P. Meloni ¹,*

¹ Department of Communications, School of Electrical and Computer Engineering, University of Campinas, Campinas 13083-852, Brazil

² Transmissora Aliança de Energia Elétrica S.A. (TAESA), Praça Quinze de Novembro, Centro, Rio de Janeiro 20010-010, Brazil

³ Department of Hardware Design, Instituto de Pesquisa Eldorado, Campinas 13083-898, Brazil

* Author to whom correspondence should be addressed.

Energies 2024, 17(12), 2889; https://doi.org/10.3390/en17122889

Submission received: 19 April 2024 / Revised: 31 May 2024 / Accepted: 7 June 2024 / Published: 12 June 2024

(This article belongs to the Special Issue Application of Artificial Intelligence in Power System Monitoring and Fault Diagnosis II)

Abstract:

The dissolved gas analysis of insulating oil in power transformers can provide valuable information about fault diagnosis. Power transformer datasets are often imbalanced, worsening the performance of machine learning-based fault classifiers. A critical step is choosing the proper evaluation metric to select features, models, and oversampling techniques. However, no clear-cut, thorough guidance on that choice is available to date. In this work, we shed light on this subject by introducing new tailored evaluation metrics. Our results and discussions bring fresh insights into which learning setups are more effective for imbalanced datasets.

Keywords:

power transformers; DGA sensoring; fault diagnosis; dissolved gas analysis; evaluation metrics; artificial intelligence

1. Introduction

Power supply interruptions such as instability and blackouts, possibly caused by equipment malfunctions in electrical substations, demand attention from energy companies. These malfunctions can cause legal financial penalties for the utilities, and high-cost equipment may become useless due to failures. The power transformer is a piece of expensive and essential equipment in electrical systems. For this reason, there is an interest from energy utilities in power transformer monitoring strategies to minimize faults.

There are several strategies in the literature on the diagnosis of transformer failures. Some works use acoustic emission [1], vibration data [2], thermographic images [3], and dissolved gas analysis (DGA) for the diagnosis [4]. The DGA is the most commonly used approach for transformer fault diagnosis. As discussed in [5], it is relevant to emphasize that a DGA can address a wide variety of states regarding the transformer’s condition, such as low-energy discharge, high-energy discharge, low-temperature thermal fault (T < 300 °C), medium-temperature thermal fault (300 °C < T < 700 °C), high-temperature thermal fault (T > 700 °C), partial discharge, and the normal state. According to [1], the strategy based on acoustic waves is focused on detecting partial discharges. As per [3], thermal imaging only detects faults related to the heat emitted by the transformer, i.e., the thermal faults. On the other hand, the findings in [2] suggest that vibration analysis is a complementary diagnosis to DGA. Therefore, the advantages of DGA methods over other approaches are their robustness and the holistic view they give of the transformer’s operating status.

The DGA relies on the concentration of certain gases to diagnose the status of the transformer. Generally, the so-called seven key gases are employed: hydrogen ($H_{2}$), methane (${CH}_{4}$), ethylene ($C_{2} H_{4}$), ethane ($C_{2} H_{6}$), acetylene ($C_{2} H_{2}$), carbon monoxide (CO), and carbon dioxide (${CO}_{2}$). The high-temperature decomposition of oil may produce a higher concentration of $C_{2} H_{4}$, whereas low-temperature decomposition leads to higher concentrations of ${CH}_{4}$ and $C_{2} H_{6}$. Electric arc events are mainly correlated with the generation of $H_{2}$. The formation of CO and ${CO}_{2}$ may originate from the decomposition of the insulating paper [6].

According to [7], interpretation methods and artificial intelligence (AI) are the most widely used strategies within DGA for power transformer fault diagnosis. Earlier works employed interpretation methods, such as the Doernenburg ratios [8], the Rogers ratios [9], the IEC ratios [10], the gas ratio combinations [11], non-code ratios [12], and the Duval triangle and extensions [13]. Currently, machine learning (ML) strategies are employed using learning models such as the k-Nearest Neighbor (k-NN) [14], Decision Tree (DT) [15], Support Vector Machine (SVM) [16], and Artificial Neural Networks (ANNs) and their variants [17,18]. These AI-based methods can improve performance compared to the interpretation methods [19]. However, training the ML models requires a database that reports the historical state of the transformer.

Such public databases are scarce in the literature because this information is sensitive for concessionaires. Therefore, only a limited amount of high-quality data is available. For this reason, some authors employ dataset fusion. This work applies the fusion of six public databases, resulting in 551 samples. These databases can be found in [4,10,17,20,21,22]. However, another major problem remains, which is the imbalance between classes. The imbalance occurs due to the stochastic character of failures. There is no assurance that all kinds of failure events occur in the same proportion (they do not!). Consequently, it is natural that there is an imbalance between the normal state and the fault state and between fault classes. To minimize the imbalance across all classes, many researchers use additional techniques that address imbalanced learning.

In the literature, there are three main approaches for counteracting data imbalance: classifier-specific solutions [23,24], cost-sensitive methods [25,26], and oversampling techniques [27,28]. This work applies oversampling techniques to address the class imbalance in the fusion of datasets. Oversampling directly addresses the cause of the data imbalance problem, which is the lack of data from the minority class. By generating additional samples for the minority class, the oversampling technique increases the representativeness of this class in the dataset, allowing the machine learning model to learn more effectively from examples of this underrepresented class. Moreover, because oversampling is a preprocessing approach, it can be seamlessly integrated into existing machine learning pipelines. This flexibility makes it a versatile and widely applicable solution for many scenarios. Three oversampling algorithms are employed: the synthetic minority oversampling technique (SMOTE) [28], the adaptive synthetic minority oversampling technique (ASMOTE) [6], and the SMOTE combined with clustering using representatives (CURE-SMOTE) [29].

Chawla et al. [28] proposed the SMOTE algorithm in 2002. The SMOTE oversamples minority classes by creating synthetic samples. The technique generates the examples in feature space, placing them along the segments that connect each minority-class sample to its k nearest minority-class neighbors. In 2017, Ma et al. [29] proposed the CURE-SMOTE algorithm. This technique employs the CURE method, an efficient hierarchical clustering algorithm that is insensitive to outliers and able to recognize abnormal points [30]. The CURE-SMOTE first uses the CURE to cluster samples of the minority class, removing noise and outliers. Afterward, it generates synthetic examples between the center point and representative samples. In 2019, Tra et al. [6] proposed the ASMOTE algorithm. This method is a modification of the SMOTE that weights the minority classes differently. Unlike traditional approaches, it does not use a homogeneous weight. Instead, the ASMOTE distributes different weights to each minority class instance and, for this reason, it is considered adaptive.

Besides using an oversampling technique, it is necessary (critical, indeed) to establish an appropriate evaluation metric for the imbalanced learning problem at hand. In this context, our work investigates unexplored gaps in the literature. It is known that accuracy is a poor metric for imbalanced datasets, as it may bias the performance toward majority classes [31]. Alternative metrics (e.g., the F-score, G-score, MG-score, and minority recall) have been explored elsewhere, but it remains unclear which one is best, if any, and why. Currently, no comprehensive advice on that matter exists in the DGA-related literature. Our experiments, results, and discussions address this gap and should be especially interesting to newcomers to ML techniques for DGA. We address fundamental questions, including the following. How does one properly choose a feature set? Do classical learning models guarantee a sufficiently high accuracy? Which evaluation metric is suitable for unbalanced learning? How does one properly interpret that metric? To answer these questions, we cover the most used metrics, learning models, features, and oversampling algorithms in the DGA literature. To select the most suitable metric for the unbalanced learning problem, we consider the recall of each class (normal, N; discharge, D; partial discharge, PD; and temperature, T) and propose two dedicated metrics to analyze the overall performance of the oversampling algorithms. The proposed metrics bring new insights into the suitability of traditional metrics in an imbalanced learning problem.

To preview the contributions of our work, consider that an ML engineer selects k-NN as the learning model and wants to choose the best oversampling algorithm and feature engineering. To this end, they try the three most popular evaluation metrics: the accuracy, F1-score, and multiclass G (MG)-score. In the validation step, they obtain the three highlighted columns of these metrics for different features and oversampling algorithms, as shown in Figure 1. Usually, in the first step, the set of features with maximum accuracy is chosen, marked by a black rectangle in the figure (the logarithm of gas concentrations, as detailed later). The next step is to select the best oversampling algorithm for the problem. The accuracy suggests using no oversampling algorithm, i.e., the “as-is” case, in blue. The MG-score suggests the SMOTE, in red, and the F1-score suggests the ASMOTE, in green. But then, which metric should the engineer ultimately choose? The purpose of the unbalanced learning problem is to increase the recall of minority classes without considerably decreasing that of majority classes. So, to answer the previous question, we propose considering two metrics combined: the average level of recall per class (overall gain: the higher, the better), say $M_{Re}$, and the standard deviation of the recall per class around $M_{Re}$ (imbalance: the smaller, the better), say $σ_{Re}$. Accordingly, in Figure 1, among the three traditional metrics considered, the one whose choice maximizes $M_{Re}$ (i.e., $M_{Re} = 87.5$) while minimizing $σ_{Re}$ (i.e., $σ_{Re} = 2.4$) is the MG-score, in red in the SMOTE case. Still, it is possible to see the trade-offs between the recalls of the minority and majority classes by comparing the case without oversampling (as is) with the one chosen by the MG-score (SMOTE).

Considering what has been discussed previously (gap and proposed solution), the contributions of this work are summarized below:

We introduce two simple, dedicated evaluation metrics that, combined, unveil the actual performance of DGA-based learning algorithms on imbalanced datasets;
We show that the MG-score turns out to be a good proxy for the proposed dedicated metrics;
We propose a new set of features based on the classical non-code ratios, which outperforms them;
We run and analyze a series of empirical experiments to provide clear guidance on the choice of learning models, feature sets, and oversampling techniques for DGA-based fault diagnosis with imbalanced datasets.

The remainder of this paper is organized as follows. Section 2 provides a comprehensive review of the approaches in the literature, especially the related works. Section 3 introduces the databases utilized in this work and details the data fusion process. Section 4 describes the adopted learning framework. Section 5 delves into feature engineering and presents our proposed feature set. Section 6 outlines the evaluation metrics introduced in this manuscript and provides a rationale for their selection. Section 7 shows the experimental results, thoroughly discussing the outcomes and providing recommendations and interpretations regarding the findings. Finally, Section 8 concludes this paper by presenting the contributions and implications of the proposed strategy.

2. Related Works

This section seeks to review, in a comprehensive manner, published related works. Several strategies for diagnosing transformer failures are discussed in the literature, including the use of acoustic emission [1], vibration data [2], thermographic images [3], and DGA. According to [5], DGA is the most commonly used method, as it can diagnose a wide range of transformer conditions, such as low-energy discharge, high-energy discharge, various temperature-related faults, partial discharge, and normal states. Acoustic emission focuses on detecting partial discharges, while thermographic imaging is limited to identifying heat-related faults. Vibration analysis is considered complementary to DGA. The primary advantages of DGA over other methods are its robustness and comprehensive understanding of the transformer’s operating status. Moreover, this review also considers works involving oversampling techniques and different feature sets.

In 2012, Mirowski et al. [14] provided initial insights into DGA, suggesting logarithmic transformation for gas concentration preprocessing. They found that models like ANNs, Gaussian kernel SVMs, and local linear regression outperformed linear classifiers and regressors. Evaluation metrics included accuracy (Acc) and the area under the curve (AUC). In the same period, Ma et al. [32] proposed a study on power transformer fault diagnosis, addressing measurement uncertainties with a fuzzy c-means clustering-based Fuzzy Support Vector Machine (FCM-FSVM) and a kernel fuzzy c-means clustering-based FSVM (KFCM-FSVM). Their evaluation metrics were accuracy, G-score, sensitivity, and specificity. In 2013, Ashkezari et al. [33] used the FSVM to create an equipment health index, employing the SMOTE for the data imbalance. Although the FSVM-SMOTE achieved high average accuracy, the individual class accuracy varied. In 2014, Cui et al. [34] introduced the SMOTEBoost, a hybrid algorithm enhancing AI algorithms’ generalization capability. They employed the SVM, DT, Radial Basis Function (RBF), k-NN, and Neural Network (NN) architectures for fault classification, using accuracy for evaluation.

Over time, oversampling algorithms have become increasingly popular and are now commonly integrated into diagnosis methods. In 2016, Peimankar et al. [35] used a binary variant of the Multi-Objective Particle Swarm Optimization (bi-MOPSO) algorithm and adaptive synthetic (ADASYN) sampling to handle data imbalance, evaluating the SVM, Fuzzy k-NN, NN, Naive Bayes, and Random Forest (RF) models. In 2017, Peimankar et al. [36] proposed a two-step algorithm for power transformer fault diagnosis, exploring different architectural approaches and employing ADASYN for data imbalance. They used the accuracy, F1-score, and Matthews correlation coefficient (MCC) as evaluation metrics. In 2019, Tra et al. [6] introduced the adaptive synthetic minority oversampling technique (ASMOTE), focusing on concentrations as features and evaluating the SVM, k-NN, and ANN using the average accuracy. Also in 2019, Liao et al. [37] developed a fault diagnosis method based on four-stage data preprocessing, utilizing Gradient Boosting and comparing with the LDA, k-NN, ANN, SVM, and RF models, with accuracy as the evaluation metric. In 2020, Wu et al. [38] proposed a new diagnostic method for DGA, combining the CNN and LSTM models. They used the Dempster–Shafer (DS) evidence theory to integrate diagnostic matrices from both models, calling it deep parallel diagnosis. Addressing class imbalance, they employed ADASYN and tested various algorithms like the k-NN, DT, SVM, ANN, CNN, LSTM, and fuzzy c-means, with accuracy as the main metric. Dhini et al. [39] introduced a fault diagnosis method using an SVM for M-ary classification. They compared it with the BPNN and ELM-RBF, incorporating the SMOTE for oversampling and evaluating primarily based on accuracy.

In 2021, Zhang et al. [7] employed the self-paced ensemble (SPE) algorithm for DGA, addressing class imbalance with the SMOTE and its variants. They evaluated using the k-NN, DT, and SVM, with the minority recall and G-score as the principal metrics. During the same year, de Andrade et al. [40] introduced a novel deep neural network for power transformer fault diagnosis, incorporating k-fold cross-validation and the SMOTE. Their evaluation was based on accuracy. Jiang et al. [41] presented a BPNN-based strategy with Focal Loss for fault diagnosis, utilizing the SMOTE for oversampling. They compared performance with k-NN and RF, assessing it using accuracy, precision, recall, and the F1-score. Also in 2021, Dwiputranto et al. [42] combined the Genetic Algorithm and Artificial Neural Network (GA-ANN), employing the SMOTE and evaluating with accuracy, precision, and recall metrics. In 2022, Passos et al. [43] employed three strategies based on Optimum-Path Forest (OPF). One of them is a variant of the OPF that performs oversampling, another is the Optimum-Path Forest-based approach for undersampling (OPF-US), and the third is a hybrid technique that combines the OPF variant and OPF-US. They also investigated various oversampling techniques: the SMOTE, Borderline SMOTE, Agglomerative Hierarchical Clustering (AHC), ADASYN, Majority Weighted Minority Oversampling Technique (MWSMOTE), Self-Organizing Map Oversampling (SOMO), and k-means SMOTE. They evaluated using the F1-score. Also in 2022, Yong et al. [44] utilized subspace k-NN, boosted trees, and RUS boosted trees techniques, addressing imbalance with the SMOTETomek. Accuracy served as their performance evaluation metric. Finally, Li et al. [45] applied the SMOTE to balance classes and used Grey Wolf Optimization (GWO) for SVM parameter optimization, evaluating performance based on accuracy.

To facilitate the analysis of the considered papers, we structured Table 1, which contains the main elements of each work: the evaluation metrics, learning models, oversampling algorithms, classification type, and feature set.

3. Dataset

A limited number of public databases are available. This work employs the fusion of six datasets: Mang-Huy data [4], 2001 Duval data [10], data from an Egyptian utility [17], 2002 Duval data [20], Ganyun data [21], and data from the Gouda dataset [22]. The data fusion contains samples of the concentrations of seven different gases: hydrogen ($H_{2}$), methane (${CH}_{4}$), ethylene ($C_{2} H_{4}$), ethane ($C_{2} H_{6}$), acetylene ($C_{2} H_{2}$), carbon monoxide (CO), and carbon dioxide (${CO}_{2}$). Such samples have labels for the normal state and different types of faults. Initially, there are seven categories as defined in [46]: low-energy discharge ($D_{1}$); high-energy discharge ($D_{2}$); low-temperature (T < 300 °C) ($T_{1}$), medium-temperature (300 °C < T < 700 °C) ($T_{2}$), and high-temperature (T > 700 °C) ($T_{3}$) thermal faults; partial discharge (PD); and the normal state (N). The datasets use subsets and unions of these categories.

To unify all these datasets into one, it is necessary to merge some categories, resulting in four classes. So, classes $D_{1}$ and $D_{2}$ are re-organized into a single class, named discharges (D). $T_{1}$, $T_{2}$, and $T_{3}$ are also rearranged into a category covering all the thermal faults (T). The labels of the PD and N categories are unchanged. Table 2 presents the rearrangement of the labels.
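The label merge described above can be sketched as a simple lookup. This is only an illustration: the label spellings, the `MERGE_MAP` name, and the identity entries for datasets that already report the unions D and T are our assumptions, since each source dataset may encode its categories differently.

```python
from collections import Counter

# Hypothetical label spellings; each source dataset may encode them differently.
MERGE_MAP = {
    "D1": "D", "D2": "D",             # low- and high-energy discharges
    "T1": "T", "T2": "T", "T3": "T",  # low-, medium-, and high-temperature thermal faults
    "D": "D", "T": "T",               # assumed: datasets that already report the union
    "PD": "PD",                       # partial discharge, unchanged
    "N": "N",                         # normal state, unchanged
}


def merge_labels(labels):
    """Map the original fault categories onto the four merged classes."""
    return [MERGE_MAP[label] for label in labels]


original = ["D1", "D2", "T1", "T2", "T3", "PD", "N", "D2"]
merged = merge_labels(original)
class_counts = Counter(merged)  # per-class sample counts after the merge
print(merged)
print(class_counts)
```

Counting the merged labels, as in the last lines, is a quick way to inspect the class imbalance before any oversampling is applied.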

According to Table 2, there is a severe imbalance among all the categories. By default, this imbalance degrades the performance of the ML-based classifier. Section 7 applies the oversampling techniques for minimizing the imbalance problem.

4. Learning Framework

Choosing procedures to process the data and a learning model is the main challenge in the practice of machine learning. Figure 2 describes the learning framework adopted in this paper. The dataset consists of gas concentrations in ppm and transformer fault labels. We clipped all the concentrations below 1 ppm to 1 ppm, assuming that this is the limitation of the measuring device, as was done in [14]. It is worth noting that this strategy does not alter the labels that characterize the fault categories. Furthermore, as demonstrated in [14], it is a convenient way to avoid negative log values, which would otherwise render such samples unusable. In addition, a subsequent normalization phase adjusts all the data, minimizing potential effects of this clipping.
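As a minimal sketch of the clipping step, assuming concentrations are held in a plain list of ppm values (the function name `clip_to_floor` and the sample values are hypothetical), one may write:

```python
import math


def clip_to_floor(concentrations_ppm, floor_ppm=1.0):
    """Raise every concentration below the floor up to the floor (1 ppm here)."""
    return [max(value, floor_ppm) for value in concentrations_ppm]


sample = [0.0, 0.4, 1.0, 250.0]               # hypothetical readings in ppm
clipped = clip_to_floor(sample)               # [1.0, 1.0, 1.0, 250.0]
logs = [math.log10(value) for value in clipped]
print(clipped)
print(logs)
```

After clipping, every base-10 logarithm is defined and non-negative, so samples with readings of zero or below 1 ppm remain usable.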

In the preprocessing block, there are three steps: feature selection, normalization, and the oversampling algorithm. In the first step, we consider four feature sets (three that are widely used in the literature and one proposed here; see Section 5) to investigate which one is the most appropriate for the power transformer fault classification problem. A normalization step is used to keep the numerical operations stable and to help the convergence of the learning models. In this step, we standardize all the features to zero mean and unit variance. In the last step, we use an oversampling algorithm to balance the dataset that will be used in training the learning model.
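The normalization and oversampling steps can be illustrated with the following sketch. It is not the implementation used in this work: it shows plain SMOTE-style interpolation only (not the ASMOTE or CURE-SMOTE), it assumes that the standardization statistics are computed on the training data, and all function names, the neighbor count `k`, and the toy data are hypothetical.

```python
import math
import random


def fit_standardizer(rows):
    """Return per-feature means and standard deviations of the given rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # A constant feature has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return means, stds


def standardize(rows, means, stds):
    """Shift and scale each feature to zero mean and unit variance."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def smote_like(minority, n_new, k=2, rng=None):
    """Create n_new synthetic samples on segments joining a minority sample
    to one of its k nearest minority neighbors (needs at least two samples)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]  # toy feature rows
means, stds = fit_standardizer(train)
scaled = standardize(train, means, stds)
new_points = smote_like(scaled[:3], n_new=5)  # treat the first three rows as a minority class
print(new_points)
```

Because each synthetic sample is a convex combination of two existing minority samples, it always lies between them in feature space, which is the property that distinguishes this family of methods from simple duplication.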

In the training model, we use the SVM with the Gaussian kernel, k-NN, DT, and RF. All these models are individually evaluated for a specific metric and they are compared one by one. Each of these models represents a quaternary classifier that gives a diagnosis of the type of failure (D—discharge, PD—partial discharge, or T—temperature) or if the transformer is healthy (normal state—N).

5. Feature Engineering

Several studies employ gas concentrations or gas ratios as inputs to the AI-based models for DGA [6,16,47]. The present study uses four sets of features: (1) gas concentrations; (2) a logarithm of the gas concentrations [14]; (3) non-code ratios [6,12]; and (4) a proposed set of features.

Gas concentrations: This set of features presents the gas concentrations as an input to ML-based DGA. This is a naive choice as it does not require any kind of preprocessing. These features represent an initial scenario in this paper. Hence, the features consist solely of the concentrations of the seven gases in ppm: $f_{c, 1} = H_{2}$ , $f_{c, 2} = {CH}_{4}$ , $f_{c, 3} = C_{2} H_{4}$ , $f_{c, 4} = C_{2} H_{6}$ , $f_{c, 5} = C_{2} H_{2}$ , $f_{c, 6} = CO$ , and $f_{c, 7} = {CO}_{2}$ .
Logarithm of gas concentrations: Gas concentrations can vary over a wide range of values. Some samples present ppm values around unity, while others are in the vicinity of thousands of ppm. These variations tend to affect the convergence of some learning models. To reduce this issue, researchers apply the logarithmic transformation to gas concentrations [14]. This procedure results in a set with seven features ( $f_{l, 1}$ to $f_{l, 7}$ ) as $f_{l, i} = {log}_{10} f_{c, i}$ , $i = 1, \dots, 7$ .
Non-code ratios: Some works use non-code ratios as their main set of features [6,12]. The initial papers employed them as an empirical model, using thresholds to classify faults. Hence, from the gas concentrations present in the dataset fusion, nine ratios ( $f_{r, 1}, \dots, f_{r, 9}$ ) form the feature set for this case.

$f_{r, 1} = \frac{{CH}_{4}}{H_{2}}$ (1)

$f_{r, 2} = \frac{C_{2} H_{4}}{C_{2} H_{2}}$ (2)

$f_{r, 3} = \frac{C_{2} H_{4}}{C_{2} H_{6}}$ (3)

$f_{r, 4} = \frac{C_{2} H_{2}}{{CH}_{4} + C_{2} H_{4} + C_{2} H_{6} + C_{2} H_{2}}$ (4)

$f_{r, 5} = \frac{H_{2}}{H_{2} + {CH}_{4} + C_{2} H_{4} + C_{2} H_{6} + C_{2} H_{2}}$ (5)

$f_{r, 6} = \frac{C_{2} H_{4}}{{CH}_{4} + C_{2} H_{4} + C_{2} H_{6} + C_{2} H_{2}}$ (6)

$f_{r, 7} = \frac{{CH}_{4}}{{CH}_{4} + C_{2} H_{4} + C_{2} H_{6} + C_{2} H_{2}}$ (7)

$f_{r, 8} = \frac{C_{2} H_{6}}{{CH}_{4} + C_{2} H_{4} + C_{2} H_{6} + C_{2} H_{2}}$ (8)

$f_{r, 9} = \frac{{CH}_{4} + C_{2} H_{4}}{{CH}_{4} + C_{2} H_{4} + C_{2} H_{6} + C_{2} H_{2}}$ (9)
Proposed set of features: A redesign of the feature set is proposed based on the nine non-code ratios ( $f_{r, 1}, \dots, f_{r, 9}$ ). This redesign uses the distinct numerators and denominators of these ratios as features, so that it encompasses all the information of the non-code features. In addition, the logarithm is applied to each quantity, resulting in a set of eight features ( $f_{p, 1}, \dots, f_{p, 8}$ ). It is worth noting that applying the logarithmic function reduces the range of values, which is especially beneficial when the data span several orders of magnitude. This characteristic can enhance the separability between classes.

$f_{p, 1} = \log_{10} (H_{2})$ (10)

$f_{p, 2} = \log_{10} ({CH}_{4})$ (11)

$f_{p, 3} = \log_{10} (C_{2} H_{2})$ (12)

$f_{p, 4} = \log_{10} (C_{2} H_{4})$ (13)

$f_{p, 5} = \log_{10} (C_{2} H_{6})$ (14)

$f_{p, 6} = \log_{10} ({CH}_{4} + C_{2} H_{4} + C_{2} H_{6} + C_{2} H_{2})$ (15)

$f_{p, 7} = \log_{10} (H_{2} + {CH}_{4} + C_{2} H_{4} + C_{2} H_{6} + C_{2} H_{2})$ (16)

$f_{p, 8} = \log_{10} ({CH}_{4} + C_{2} H_{4})$ (17)
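Equations (1) to (17) translate directly into code. The sketch below computes both the nine non-code ratios and the eight proposed features for a single sample; the function names and the sample concentrations are hypothetical, and the inputs are assumed to be already clipped to at least 1 ppm, so that no division by zero or logarithm of zero occurs.

```python
import math


def non_code_ratios(h2, ch4, c2h4, c2h6, c2h2):
    """Nine non-code ratios, Equations (1) to (9); inputs in ppm, assumed >= 1."""
    hydrocarbons = ch4 + c2h4 + c2h6 + c2h2  # denominator of Eqs. (4) and (6) to (9)
    total = h2 + hydrocarbons                # denominator of Eq. (5)
    return [
        ch4 / h2,                     # f_r1
        c2h4 / c2h2,                  # f_r2
        c2h4 / c2h6,                  # f_r3
        c2h2 / hydrocarbons,          # f_r4
        h2 / total,                   # f_r5
        c2h4 / hydrocarbons,          # f_r6
        ch4 / hydrocarbons,           # f_r7
        c2h6 / hydrocarbons,          # f_r8
        (ch4 + c2h4) / hydrocarbons,  # f_r9
    ]


def proposed_features(h2, ch4, c2h4, c2h6, c2h2):
    """Eight proposed log features, Equations (10) to (17); inputs in ppm, assumed >= 1."""
    hydrocarbons = ch4 + c2h4 + c2h6 + c2h2
    quantities = (h2, ch4, c2h2, c2h4, c2h6,
                  hydrocarbons, h2 + hydrocarbons, ch4 + c2h4)
    return [math.log10(q) for q in quantities]


# Hypothetical sample, in ppm.
sample = dict(h2=60.0, ch4=20.0, c2h4=10.0, c2h6=5.0, c2h2=5.0)
ratios = non_code_ratios(**sample)
logs = proposed_features(**sample)
print(ratios)
print(logs)
```

Note that the four ratios sharing the hydrocarbon-sum denominator ($f_{r, 4}$, $f_{r, 6}$, $f_{r, 7}$, and $f_{r, 8}$) always sum to one, so the ratio set carries some redundancy that the proposed set avoids by keeping each numerator and denominator only once.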

6. Performance Metrics

Several metrics encompass different aspects of a learning problem. In the ML literature, researchers identified three families of evaluation metrics used in the context of classification: threshold metrics (e.g., the $F_{β}$-measure, accuracy, and G-score), probabilistic metrics, and ranking metrics (e.g., the area under the curve and receiver operating characteristics) [48]. In these families, the first natural choice is to measure the fraction of correctly classified instances in the test set $\mathcal{T}$, the well-known accuracy, defined as

$Acc = \frac{1}{|\mathcal{T}|} \sum_{j = 1}^{|\mathcal{T}|} \mathbf{1} \{{\hat{y}}_{j} = y_{j}\},$ (18)

where $\mathbf{1} \{\cdot\}$ is the indicator function, which yields 1 for a correct prediction (${\hat{y}}_{j} = y_{j}$) and 0 otherwise; $y_{j}$ is the target; and ${\hat{y}}_{j}$ is the prediction.

In an imbalanced data scenario, this metric is problematic because classical learning models tuned for it can bias the performance toward the majority classes [49]. This could be an issue if the minority class possesses valuable knowledge. The F-score was proposed as a variation of the Van Rijsbergen E-measure that combines precision (Pr) and recall (Re) while satisfying certain measurement-theoretic properties, and it is defined by [50]

$F_{β} = \frac{(β^{2} + 1) \Pr \times Re}{β^{2} \Pr + Re},$ (19)

where $β$ weights the Pr and Re trade-off. Pr measures the proportion of samples predicted as positive that are truly positive, defined as

$\Pr = \frac{TP}{TP + FP},$ (20)

and Re measures the proportion of truly positive samples that are correctly identified, defined as

$Re = \frac{TP}{TP + FN},$ (21)

where TP, FP, and FN indicate the true-positive, false-positive, and false-negative counts, respectively. The Pr and Re are single-class metrics that are better suited for imbalanced problems as they are less sensitive to the skewed data domain. The F-score can be extended to a multiclass problem by using either a micro-averaging approach (summing the numerator and denominator in Equations (20) and (21) over all classes and computing the F-score globally) or a macro-averaging approach (computing Pr and Re for each class and averaging those in a single F-score). When $β = 1$, the F1-score corresponds to the equal weighting of the Re and Pr, and the classifier will only obtain a high score when both the Re and Pr are high [51].

The G-score metric was originally proposed by [52], and it was conceived from an imbalanced point of view. The G-score is defined as the following geometric mean:

$G = \sqrt{\frac{TP}{TP + FN} \frac{TN}{TN + FP}},$ (22)

where TN stands for true-negative counts. This metric takes into account the balance of both the positive and negative classes. Hence, it incorporates the imbalanced nature of the data. Furthermore, like the F-score, the G-score can be extended to multiclass problems using micro- and macro-averaging. Alternatively, the extension of the G-score to a multiclass problem proposed by [53], which uses the geometric mean of the recall of all classes, is defined as

$MG = {\left(\prod_{j \in C} {Re}_{j}\right)}^{\frac{1}{|C|}},$ (23)

where $C = \{1, 2, \dots, n\}$ is the set of indexed labels with n classes, MG is the multiclass G-score, and ${Re}_{j}$ is calculated for each class separately.
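To make Equations (18) to (23) concrete, the following sketch computes the accuracy, the per-class precision and recall, the $F_{β}$-score of one class, and the MG-score from lists of true and predicted labels. The function names and the toy predictions are hypothetical and do not come from the experiments reported in this paper.

```python
import math


def per_class_precision_recall(y_true, y_pred, classes):
    """Return {class: (Pr, Re)} from one-vs-rest TP, FP, and FN counts."""
    result = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        result[c] = (precision, recall)
    return result


def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples, Equation (18)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_beta(precision, recall, beta=1.0):
    """F-score of a single class, Equation (19)."""
    denominator = beta ** 2 * precision + recall
    return (beta ** 2 + 1) * precision * recall / denominator if denominator else 0.0


def mg_score(recalls):
    """Geometric mean of the per-class recalls, Equation (23)."""
    return math.prod(recalls) ** (1.0 / len(recalls))


# Toy example: D and T are majority classes, PD and N are minority classes.
classes = ["D", "T", "PD", "N"]
y_true = ["D"] * 4 + ["T"] * 4 + ["PD"] * 2 + ["N"] * 2
y_pred = ["D", "D", "D", "T"] + ["T"] * 4 + ["PD", "D"] + ["N", "T"]

scores = per_class_precision_recall(y_true, y_pred, classes)
recalls = [scores[c][1] for c in classes]  # [0.75, 1.0, 0.5, 0.5]
acc = accuracy(y_true, y_pred)             # 0.75
mg = mg_score(recalls)                     # about 0.658
f1_d = f_beta(*scores["D"])                # 0.75 for class D
print(acc, mg, f1_d)
```

In this toy case the accuracy (0.75) is dominated by the well-classified majority classes, whereas the MG-score (about 0.658) is pulled down by the 0.5 recall of the two minority classes.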

These are the most often used global metrics in the literature, but what measure is best suited for evaluating the performance of DGA classifiers or an oversampling technique? What are the main ingredients that a metric should encompass to be chosen as the best one? Is there an ultimate metric that includes the performance of an imbalanced multiclass classification problem? How can one visualize the subtleties of the trade-offs in oversampling techniques?

We propose to answer all these questions with a holistic view of the aspects that matter the most in our problem. We use two intuitive combined metrics to assess the aspects of an unbalanced learning problem. The first aspect is the overall trade-off between the improvement in recall in the minority classes and the possible worsening in the majority classes. This trade-off can be observed by taking the mean of the recall over the classes as

$M_{Re} = \frac{1}{|C|} \sum_{j \in C} {Re}_{j},$ (24)

where $C$ is the set of all the transformer state classes, $|C|$ is its cardinality, and ${Re}_{j}$ is the recall of the j-th class. In this way, the average of each algorithm can be compared with the case without oversampling to see the total gain in the recall. The second aspect is the imbalance between classes around $M_{Re}$. This measure of imbalance is defined as

$σ_{Re} = \sqrt{\frac{1}{|C|} \sum_{j \in C} {({Re}_{j} - M_{Re})}^{2}} .$ (25)

With the metrics $M_{Re}$ and $σ_{Re}$ combined, one can analyze the performance of any oversampling algorithm from the point of view of unbalanced learning. These metrics, hereafter called combined mean-$σ$, bring insights into the oversampling algorithms by pointing out the average recall gain and how well the classes were balanced in terms of the recall of each class.
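As an illustration of Equations (24) and (25), the sketch below computes the combined mean-σ for two sets of per-class recalls. The recall values and the function names are hypothetical, chosen only to show how the two metrics are read together.

```python
import math


def mean_recall(recalls):
    """M_Re, Equation (24): mean of the per-class recalls."""
    return sum(recalls) / len(recalls)


def recall_spread(recalls):
    """sigma_Re, Equation (25): deviation of the per-class recalls around M_Re."""
    m = mean_recall(recalls)
    return math.sqrt(sum((r - m) ** 2 for r in recalls) / len(recalls))


# Hypothetical per-class recalls, not taken from the experiments in this paper.
as_is = {"D": 0.90, "T": 0.90, "PD": 0.50, "N": 0.50}
oversampled = {"D": 0.85, "T": 0.85, "PD": 0.75, "N": 0.75}

for name, recalls in (("as is", as_is), ("oversampled", oversampled)):
    values = list(recalls.values())
    print(name, round(mean_recall(values), 3), round(recall_spread(values), 3))
```

In this made-up case, oversampling lowers the majority recalls slightly but raises $M_{Re}$ from 0.70 to 0.80 and shrinks $σ_{Re}$ from 0.20 to 0.05, which is the kind of trade-off the combined metrics are meant to expose.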

7. Experimental Results and Discussions

In this section, we evaluate the results of each experiment through new insights from the combined metrics and the recall per class. The number of samples of each class is given in Table 2, where PD and N are the minority classes and D and T are the majority classes. The results for the four feature sets listed in Section 5 are shown in Section 7.1, Section 7.2, Section 7.3, and Section 7.4, respectively. For each feature set, we present the results for the learning models SVM, k-NN, DT, and RF combined with the oversampling algorithms SMOTE, ASMOTE, and CURE-SMOTE. The test set, comprising 20% of the total samples, had the same size in all of the presented experiments. For each algorithm and each feature space, we repeated the learning experiment 100 times and computed the mean value of each metric. These metrics were calculated according to the equations in Section 6, and the $F_{1}$ score was evaluated with a macro-average approach.
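The repetition protocol described above can be sketched as follows. This is a hedged illustration in plain Python, not the code used in the study: the stratified split is an assumption (the text only states that the test set holds 20% of the samples), and `evaluate` is a hypothetical callable that would train a model on the training indices, with oversampling applied to the training part only, and return one metric measured on the test indices.

```python
import random
from collections import defaultdict
from statistics import fmean


def stratified_split(labels, test_fraction=0.2, rng=None):
    """Split sample indices so each class keeps roughly test_fraction in the test set."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = max(1, round(test_fraction * len(shuffled)))
        test_idx.extend(shuffled[:n_test])
        train_idx.extend(shuffled[n_test:])
    return train_idx, test_idx


def repeated_holdout(labels, evaluate, repetitions=100, test_fraction=0.2, seed=0):
    """Average a metric over repeated random hold-out splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repetitions):
        train_idx, test_idx = stratified_split(labels, test_fraction, rng)
        scores.append(evaluate(train_idx, test_idx))
    return fmean(scores)


# Hypothetical usage: a dummy evaluate() that only reports the test-set share.
labels = ["D"] * 20 + ["N"] * 5
share = repeated_holdout(labels, lambda tr, te: len(te) / len(labels), repetitions=10)
print(share)
```

Fixing the seed makes the sequence of splits reproducible, so different oversampling algorithms can be compared on the same splits.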

7.1. Gas Concentrations

The concentration features in ppm are the naive selection of features. This scenario represents a benchmark and a simpler choice of features. Figure 3a shows that SVM using concentration features performs poorly for all the metrics. This model adequately separates the T class but not the D class, even though D has the largest number of samples. The oversampling algorithms increase the $M_{Re}$ level and decrease the deviation of each class from the mean. The best oversampling algorithm, in this case, is the ASMOTE because it has the highest $M_{Re}$ and the lowest deviation from the mean, $\sigma_{Re}$. The accuracy points to the case without an oversampling algorithm as the best choice. The F1-score and MG-score agree with the combined mean-σ, pointing to the ASMOTE as the best option.

Figure 3b shows that the majority classes D and T perform better than the minority classes N and PD in terms of recall for the k-NN learning model. The best oversampling algorithm indicated by the combined mean-σ is the SMOTE. That is, this algorithm has the highest average recall across the classes and the most balanced classes in terms of recall. In this case, the accuracy and the F1-score elect the case without oversampling as the best, and the MG-score points to the SMOTE.

Figure 3c depicts the results for the DT learning model and the trade-offs of the recall for the majority and minority classes. The combined mean-σ indicates the SMOTE as the best choice for the oversampling algorithm in this experiment. The accuracy points to the case without oversampling as the best option, the F1-score points to the ASMOTE, and the MG-score to the SMOTE.

Figure 3d reveals that the RF learning model has a higher performance in all the metrics than the other models. The SMOTE is the best choice for the oversampling algorithm in terms of the combined mean-σ. The accuracy points to the non-oversampling case as the best option, and the F1-score and MG-score point to the SMOTE as the best oversampling algorithm.

The SVM, k-NN, DT, and RF models show increasing performance in this order in terms of the analyzed metrics, with SVM being the worst and RF the best. The results show that the accuracy and F1-score do not adequately indicate the best oversampling algorithm: the accuracy contradicts the combined mean-σ metrics for all four models, and the F1-score contradicts them for the k-NN and DT. The MG-score, in contrast, agrees with the combined metrics in every case. The SMOTE performs better than its variants, except for the SVM model.
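One way to see why the accuracy tends to favor the case without oversampling is that accuracy equals the recall of each class weighted by its number of samples, whereas $M_{Re}$ weights all classes equally. The sketch below illustrates this identity in Python. The class counts are those listed in Table 2 and are used only as weights; the recall values are hypothetical and were chosen for illustration, not taken from Figure 3.

```python
from statistics import fmean

# Per-class sample counts as listed in Table 2 (used here only as weights).
COUNTS = {"D": 289, "T": 234, "PD": 43, "N": 62}


def accuracy_from_recalls(recalls, counts):
    """Accuracy written as the support-weighted mean of the per-class recalls."""
    total = sum(counts.values())
    return sum(recalls[c] * counts[c] for c in counts) / total


# Hypothetical recalls, invented for illustration only.
no_oversampling = {"D": 0.95, "T": 0.95, "PD": 0.40, "N": 0.50}
with_oversampling = {"D": 0.90, "T": 0.90, "PD": 0.70, "N": 0.70}

for name, recalls in [("none", no_oversampling), ("oversampled", with_oversampling)]:
    acc = accuracy_from_recalls(recalls, COUNTS)
    m_re = fmean(recalls.values())
    print(f"{name}: accuracy={acc:.4f}, M_Re={m_re:.4f}")
```

In this invented scenario, a large recall gain in PD and N barely moves the accuracy because those classes carry little weight, while a small loss in D and T is enough to lower it. The unweighted mean $M_{Re}$ registers the gain.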

7.2. Logarithm of Gas Concentrations

DGA concentrations typically present highly skewed distributions: most transformers have low concentrations of a few ppm, whereas faulty transformers can reach thousands of ppm [14]. The logarithmic transform is a natural mapping to rescale the concentrations for the ML models. Figure 4 depicts the mean value of each metric, model, and oversampling technique considering the logarithm of the DGA concentrations as features. With this choice of features, the SVM and k-NN perform better in all the metrics than with the concentration features. On the other hand, DT and RF are approximately insensitive to this new choice of features.
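The rescaling effect of the logarithm can be illustrated with a short Python sketch. Several choices here are assumptions that the text does not specify: the base-10 logarithm, the clipping of values below a floor of 1 ppm (needed because the logarithm of zero is undefined), the gas names, and the sample values, which are hypothetical.

```python
import math


def log_concentrations(sample_ppm, floor_ppm=1.0):
    """Map gas concentrations in ppm to a log10 scale.

    Values below floor_ppm are clipped to floor_ppm first (an assumption
    made here so that zero readings remain finite).
    """
    return {gas: math.log10(max(ppm, floor_ppm)) for gas, ppm in sample_ppm.items()}


# Hypothetical samples, invented for illustration only.
healthy = {"H2": 12.0, "CH4": 4.0, "C2H6": 3.0, "C2H4": 2.0, "C2H2": 0.0}
faulty = {"H2": 2500.0, "CH4": 800.0, "C2H6": 150.0, "C2H4": 1200.0, "C2H2": 3000.0}

print(log_concentrations(healthy))
print(log_concentrations(faulty))
```

After the transform, concentrations that differ by three orders of magnitude differ by about three units, which makes distance-based and margin-based models such as k-NN and SVM less dominated by the largest readings. Tree-based models split on thresholds, and a monotonic transform preserves the ordering of the values, which is consistent with the insensitivity of DT and RF noted above.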

In Figure 4a, the combined mean-σ points to the SMOTE as the best oversampling technique for the SVM learning model. The maximum accuracy points to the case without oversampling as the best one, contradicting the evidence of the combined metrics. The F1-score ambiguously elects the ASMOTE and CURE-SMOTE as the best algorithms. This ambiguity is due to the decimal rounding presented in the graphs. The MG-score elects the SMOTE as the best oversampling algorithm in this case, in agreement with $M_{Re}$ and $\sigma_{Re}$.

Figure 4b depicts the results for the k-NN learning model. The combined mean-σ indicates that the best oversampling algorithm is the SMOTE. The maximum accuracy points to the case without oversampling, the F1-score points to the ASMOTE, and the MG-score agrees with the combined metrics.

Figure 4c shows that the combined mean-σ points to the SMOTE as the best candidate for the oversampling algorithm for the DT learning model. The accuracy is maximum with the CURE-SMOTE. The maximum F1-score points ambiguously to the SMOTE and CURE-SMOTE due to the decimal rounding. The MG-score agrees with the combined mean-σ, pointing to the SMOTE as the best oversampling algorithm.

Figure 4d illustrates the results for the RF learning model. The SMOTE is the oversampling algorithm that obtained the highest average level of recall, $M_{Re}$, and the best balance of recalls by class, represented by $\sigma_{Re}$. The maximum accuracy criterion elects both the case without oversampling and the CURE-SMOTE. The F1-score and MG-score agree with the combined mean-σ, electing the SMOTE as the best algorithm.

For this set of experiments using the logarithm of gas concentrations, the oversampling algorithm that obtained the best performance in all the cases in terms of the combined mean-σ is the SMOTE. Although the SVM, k-NN, and RF present similar results in terms of $M_{Re}$, the k-NN is the learning model that presents the best balance of the recall between the classes, as pointed out by $\sigma_{Re}$. The learning model that obtained the worst result in all the metrics for the considered feature set is the DT.

7.3. Non-Code Ratios

The non-code ratios are defined as an empirical model for transformer diagnostics to circumvent the limitations of the classic models, i.e., the Doernenburg, IEC, and Rogers ratios [12]. These domain-specific ratios are used for comparison with the other candidates for the ML feature selection problem. Figure 5 illustrates the mean value of each metric, model, and oversampling technique considering the non-code ratios as features. Compared to the concentration features, this choice of features performs better for the SVM and worse for the DT and RF in terms of $M_{Re}$ and $\sigma_{Re}$. In the case of the k-NN, the non-code ratios yield a greater $M_{Re}$ but also a greater recall imbalance between the classes than the concentration features. The experiments using the logarithm of the concentrations as features present a higher $M_{Re}$ and a lower $\sigma_{Re}$ than the non-code ratios in all the cases.

Figure 5a depicts the results for the SVM learning model. In this case, the best oversampling algorithm pointed out by the combined mean-σ is the SMOTE. The accuracy and F1-score indicate the ASMOTE as the best oversampling algorithm. The MG-score agrees with the combined mean-σ, pointing to the SMOTE. There is a small difference of a few decimals between the $M_{Re}$ and $\sigma_{Re}$ values for the ASMOTE and SMOTE, and this small difference is also observed in the MG-score.

Figure 5b shows the results for the k-NN learning model. The oversampling algorithm with the highest average level of recall, $M_{Re} = 77.7$, is the ASMOTE, but it has a greater imbalance, $\sigma_{Re} = 6.7$, than the SMOTE, $\sigma_{Re} = 6.5$. This example shows that a higher $M_{Re}$ does not necessarily come with a lower $\sigma_{Re}$. The accuracy elects the case without oversampling as the best. The F1-score and MG-score point to the ASMOTE as the best algorithm. In this way, the MG-score points to the highest average level $M_{Re}$ but not necessarily to the most balanced case, i.e., the lowest $\sigma_{Re}$. This example is particularly interesting because the proposed metrics expose both the average level of recall and the balance when choosing the oversampling algorithm.

Figure 5c illustrates that the combined mean-σ elects the SMOTE as the best oversampling algorithm for the DT learning model. The maximum accuracy points to the case without the oversampling algorithm, the normal case. The maximum F1-score and MG-score agree with the combined mean-σ. The CURE-SMOTE decreases the $M_{Re}$ from 71.5 in the normal case to 71.0, despite reducing the imbalance $\sigma_{Re}$ from 17.3 to 16.6. This is the only case in which the use of an oversampling algorithm worsened the $M_{Re}$ metric.
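The CURE-SMOTE case above is one where the two combined metrics pull in opposite directions. A possible way to make such comparisons explicit, offered here as an illustration and not as a procedure stated by the authors, is a dominance check on the pair ($M_{Re}$, $\sigma_{Re}$): one setup dominates another when it is no worse on both metrics and strictly better on at least one. The Python sketch below uses the two value pairs reported in the text for the DT model; the `Setup` type and function names are hypothetical.

```python
from typing import NamedTuple


class Setup(NamedTuple):
    name: str
    m_re: float      # average recall, higher is better
    sigma_re: float  # recall deviation, lower is better


def dominates(a, b):
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.m_re >= b.m_re and a.sigma_re <= b.sigma_re
    strictly_better = a.m_re > b.m_re or a.sigma_re < b.sigma_re
    return no_worse and strictly_better


def non_dominated(setups):
    """Keep the setups that no other setup dominates."""
    return [s for s in setups if not any(dominates(o, s) for o in setups)]


# Values reported in the text for the DT model with non-code ratios.
baseline = Setup("no oversampling", 71.5, 17.3)
cure_smote = Setup("CURE-SMOTE", 71.0, 16.6)

print([s.name for s in non_dominated([baseline, cure_smote])])
```

With these two entries neither setup dominates the other, which reflects the trade-off described in the text. A setup that improves both metrics at once would leave a single non-dominated candidate.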

Figure 5d presents the results for the RF learning model. The SMOTE algorithm performs the best in terms of the combined mean-σ. The accuracy metric points to the case where there is no oversampling algorithm. The F1-score and MG-score elect the SMOTE as the best oversampling algorithm.

The learning model that presents the best performance in terms of $M_{Re}$ and $\sigma_{Re}$ is the SVM using the SMOTE. The MG-score agrees with $M_{Re}$ in all the cases and with $\sigma_{Re}$ in all the cases except for the k-NN model. This shows that a higher MG-score points to a higher $M_{Re}$ level but not necessarily to a lower $\sigma_{Re}$. The experiments, in this case, show that the accuracy and F1-score metrics are not adequate to indicate the best oversampling algorithm.

7.4. Logarithm of Non-Code Ratios

The logarithm of the non-code ratios is proposed in this work to obtain the smallest number of informative features from the well-established non-code ratios. Figure 6 illustrates the mean value of each metric, model, and oversampling algorithm considering this proposed set of features. The results show that the logarithm of the non-code ratios improves the performance, in terms of the combined mean-σ metrics and the other metrics, when compared to the non-code ratios and the concentration features. However, for the databases used, the logarithm of the concentrations performs better, in terms of the combined mean-σ, than the proposed feature set.

In Figure 6a, the combined mean-σ elects the SMOTE as the best oversampling algorithm for the SVM learning model. The maximum accuracy and F1-score point to the ASMOTE algorithm. The MG-score agrees with the combined mean-σ in pointing out the SMOTE as the best oversampling algorithm.

Figure 6b illustrates the results for the k-NN learning model. The SMOTE is the algorithm that presents the highest average recall, $M_{Re}$, and the smallest imbalance, represented by $\sigma_{Re}$. The maximum accuracy points to the case where there is no oversampling algorithm. The F1-score and MG-score point to the SMOTE as the best oversampling algorithm.

Figure 6c shows that the combined mean-σ elects the ASMOTE as the best candidate for the oversampling algorithm in the DT learning model. The accuracy, F1-score, and MG-score agree with the combined mean-σ, pointing to the ASMOTE as the best technique.

Figure 6d depicts the results for the RF learning model. The $M_{Re}$ ambiguously points to the SMOTE and ASMOTE due to the decimal rounding, but the SMOTE has a smaller imbalance, making it the best candidate. The accuracy points to both the ASMOTE and CURE-SMOTE candidates, the F1-score to the ASMOTE, and the MG-score to the SMOTE. In this way, only the MG-score agrees with the combined mean-σ.

The model that has the best performance in terms of $M_{Re}$ is the SVM using the SMOTE, although this model presents a greater imbalance $\sigma_{Re}$ than the k-NN model with the SMOTE or ASMOTE. This indicates that the maximum MG-score points to the maximum $M_{Re}$ but not to the case with the smallest imbalance $\sigma_{Re}$. The SMOTE is the best oversampling algorithm option for all the models tested, except for the DT.
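The observation that the MG-score follows $M_{Re}$ without guaranteeing the smallest $\sigma_{Re}$ can be illustrated numerically. The Python sketch below rests on a loudly stated assumption: it takes the MG-score to be the geometric mean of the per-class recalls, a common multi-class form of the G-mean, whereas the exact definition used in this work is the one given in Section 6. The two recall vectors are hypothetical.

```python
import math
from statistics import fmean, pstdev


def mg_score(recalls):
    """Geometric mean of per-class recalls (ASSUMED form of the MG-score)."""
    return math.prod(recalls) ** (1.0 / len(recalls))


def summarize(recalls):
    """Return M_Re, sigma_Re, and the assumed MG-score for one setup."""
    return {
        "M_Re": fmean(recalls),
        "sigma_Re": pstdev(recalls),
        "MG": mg_score(recalls),
    }


# Hypothetical per-class recalls, invented for illustration only.
setup_a = [0.90, 0.85, 0.80, 0.65]  # higher average, less balanced
setup_b = [0.78, 0.77, 0.76, 0.75]  # lower average, more balanced

print(summarize(setup_a))
print(summarize(setup_b))
```

Under this assumption, setup A obtains both the higher $M_{Re}$ and the higher MG-score even though setup B is far better balanced, so maximizing the MG-score alone does not reveal which setup has the smallest imbalance.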

7.5. General Analysis

The results show that the logarithm of the concentrations is the best option in terms of the average recall, $M_{Re}$, and the imbalance, $\sigma_{Re}$, in relation to the other features tested. In all the experiments of this study, the use of oversampling algorithms increases the $M_{Re}$ and decreases the imbalance $\sigma_{Re}$, except for the CURE-SMOTE with the DT model using the non-code ratios as features, where the $M_{Re}$ decreases (see Figure 5c). The SMOTE almost always performs better in terms of the combined mean-σ than its variants.

8. Conclusions

This study presents novel insights by using combined metrics to assess oversampling algorithms, learning models, and features in AI-driven DGA. Conventional metrics like the accuracy and F1-score are found inadequate for identifying a lower imbalance and a higher recall, with the MG-score proving more reliable in indicating cases with a higher average recall per class. Additionally, we introduce two effective evaluation metrics and propose a novel set of features, the logarithm of the classical non-code ratios, which outperforms the original non-code ratios and the gas concentrations. Through empirical experiments, we offer clear guidance on selecting models, features, and oversampling techniques for fault diagnosis using DGA-based approaches on imbalanced datasets. Our findings highlight the SMOTE as yielding the highest average gain in recall and effectively reducing the imbalance, with the logarithm of the concentrations emerging as the best feature choice across all metrics for the public databases used. This work advances fault diagnosis methodologies, providing practical recommendations for improving DGA-based learning algorithm performance in real-world applications.

Author Contributions

Conceptualization, F.M.L., T.W.C., J.C.S.S.F. and L.G.P.M.; methodology, F.M.L., T.W.C., J.C.S.S.F. and L.G.P.M.; software, F.M.L. and T.W.C.; validation, F.M.L., T.W.C., J.C.S.S.F. and L.G.P.M.; formal analysis, F.M.L., T.W.C., J.C.S.S.F. and L.G.P.M.; investigation, F.M.L., T.W.C., J.C.S.S.F. and L.G.P.M.; writing—original draft preparation, F.M.L. and T.W.C.; writing—review and editing, F.M.L., T.W.C., E.R.d.L., F.V.G., J.C.S.S.F. and L.G.P.M.; project administration, F.V.G. and E.R.d.L. All authors have read and agreed to the published version of this manuscript.

Funding

This research was supported by the Transmissora Aliança de Energia Elétrica S.A. under Grant TAESA Project: ANEEL-PD-07130-0062/2020.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

Author F.V.G. was employed by the company Transmissora Aliança de Energia Elétrica. Author E.R.d.L. was employed by the company Instituto de Pesquisa Eldorado. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest. The authors declare that this study received funding from Transmissora Aliança de Energia Elétrica. The funder was not involved in the study design, collection, analysis, interpretation of the data, and writing of this article; however, it was involved in the decision to submit it for publication.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ADASYN	Adaptive Synthetic
ANN	Artificial Neural Network
ASMOTE	Adaptive Synthetic Minority Oversampling Technique
AUC	Area Under The Curve
BPNN	Backpropagation Neural Network
bi-MOPSO	binary Multi-Objective Particle Swarm Optimization
CART	Classification and Regression Trees
CNN	Convolutional Neural Network
CURE-SMOTE	Clustering Using Representatives–SMOTE
DT	Decision Tree
DS	Dempster–Shafer
DGA	Dissolved Gas Analysis
ELM-RBF	Extreme Learning Machine–Radial Basis Function
FSVM	Fuzzy Support Vector Machine
FCM-FSVM	Fuzzy c-means Clustering-based FSVM
KFCM-FSVM	Kernel Fuzzy c-means Clustering-based FCM-FSVM
GA-ANN	Genetic Algorithm and Artificial Neural Network
GS	Grid Search
GWO	Grey Wolf Optimization
k-NN	k-Nearest Neighbors
LDA	Linear Discriminant Analysis
LSTM	Long Short-Term Memory
ML	Machine Learning
MCC	Matthews Correlation Coefficient
MWSMOTE	Majority Weighted Minority Oversampling Technique
NB	Naive Bayes
NN	Neural Network
OPF	Optimum-Path Forest
OPF-US	Optimum-Path Forest-based approach for Undersampling
RBF	Radial Basis Function
RF	Random Forest
SEP	Self-Paced Ensemble
SOMO	Self-Organizing Map Oversampling
SMOTE	Synthetic Minority Oversampling Technique
SVM	Support Vector Machine

References

1. Boczar, T.; Borucki, S.; Zmarzly, D. Application possibilities of artificial neural networks for recognizing partial discharges measured by the acoustic emission method. IEEE Trans. Dielectr. Electr. Insul. 2009, 16, 214–223.
2. Hong, K.; Huang, H.; Fu, Y.; Zhou, J. A vibration measurement system for health monitoring of power transformers. Measurement 2016, 93, 135–147.
3. Ullah, I.; Yang, F.; Khan, R.; Liu, L.; Yang, H.; Gao, B.; Sun, K. Predictive maintenance of power substation equipment by infrared thermography using a machine-learning approach. Energies 2017, 10, 1987.
4. Ganyun, L.V.; Haozhong, C.; Haibao, Z.; Lixin, D. Fault diagnosis of power transformer based on multi-layer SVM classifier. Electr. Power Syst. Res. 2005, 74, 1–7.
5. Golarz, J. Understanding dissolved gas analysis (DGA) techniques and interpretations. In Proceedings of the 2016 IEEE/PES Transmission and Distribution Conference and Exposition (T&D), Dallas, TX, USA, 3–5 May 2016; IEEE: New York, NY, USA, 2016; pp. 1–5.
6. Tra, V.; Duong, B.; Kim, J. Improving diagnostic performance of a power transformer using an adaptive over-sampling method for imbalanced data. IEEE Trans. Dielectr. Electr. Insul. 2019, 26, 1325–1333.
7. Zhang, Y.; Chen, H.C.; Du, Y.; Chen, M.; Liang, J.; Li, J.; Fan, X.; Yao, X. Power transformer fault diagnosis considering data imbalance and data set fusion. High Volt. 2021, 6, 543–554.
8. Dornenburg, E.; Strittmatter, W. Monitoring oil-cooled transformers by gas-analysis. Brown Boveri Rev. 1974, 61, 238–247.
9. Rogers, R.R. IEEE and IEC codes to interpret incipient faults in transformers, using gas in oil analysis. IEEE Trans. Electr. Insul. 1978, EI-13, 349–354.
10. Duval, M.; dePablo, A. Interpretation of gas-in-oil analysis using new IEC publication 60599 and IEC TC 10 databases. IEEE Electr. Insul. Mag. 2001, 17, 31–41.
11. Kim, S.W.; Kim, S.J.; Seo, H.D.; Jung, J.R.; Yang, H.J.; Duval, M. New methods of DGA diagnosis using IEC TC 10 and related databases Part 1: Application of gas-ratio combinations. IEEE Trans. Dielectr. Electr. Insul. 2013, 20, 685–690.
12. Dai, J.; Song, H.; Jiang, X. Dissolved gas analysis of insulating oil for power transformer fault diagnosis with deep belief network. IEEE Trans. Dielectr. Electr. Insul. 2017, 24, 2828–2835.
13. Irungu, G.K.; Akumu, A.O.; Munda, J.L. Fault diagnostics in oil filled electrical equipment: Review of duval triangle and possibility of alternatives. In Proceedings of the 2016 IEEE Electrical Insulation Conference (EIC), Montreal, QC, Canada, 19–22 June 2016; pp. 174–177.
14. Mirowski, P.; LeCun, Y. Statistical machine learning and dissolved gas analysis: A review. IEEE Trans. Power Deliv. 2012, 27, 1791–1799.
15. Senoussaoui, M.E.A.; Brahami, M.; Fofana, I. Combining and comparing various machine-learning algorithms to improve dissolved gas analysis interpretation. IET Gener. Transm. Distrib. 2018, 12, 3673–3679.
16. Zhang, X.; Zhang, G.; Paul, P.; Zhang, J.; Wu, T.; Fan, S.; Xiong, X. Dissolved Gas Analysis for Transformer Fault Based on Learning Spiking Neural P System with Belief AdaBoost. Int. J. Unconv. Comput. 2021, 16, 239–258.
17. Ibrahim, S.I.; Ghoneim, S.S.M.; Taha, I.B.M. DGALab: An extensible software implementation for DGA. IET Gener. Transm. Distrib. 2018, 12, 4117–4124.
18. Aciu, A.M.; Nicola, C.I.; Nicola, M.; Nitu, M.C. Complementary analysis for DGA based on duval methods and furan compounds using artificial neural networks. Energies 2021, 14, 588.
19. Miranda, V.; Castro, A.R.G. Improving the IEC table for transformer failure diagnosis with knowledge extraction from neural networks. IEEE Trans. Power Deliv. 2005, 20, 2509–2516.
20. Duval, M. A review of faults detectable by gas-in-oil analysis in transformers. IEEE Electr. Insul. Mag. 2002, 18, 8–17.
21. Wang, M.H. A novel extension method for transformer fault diagnosis. IEEE Trans. Power Deliv. 2003, 18, 164–169.
22. Gouda, O.E.; Saleh, S.M.; El-Hoshy, S.H. Power transformer incipient faults diagnosis based on dissolved gas analysis. Indones. J. Electr. Eng. Comput. Sci. 2016, 1, 10–16.
23. Li, Y.; Zhang, X. Improving k nearest neighbor with exemplar generalization for imbalanced classification. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Shenzhen, China, 24–27 May 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 321–332.
24. Laszlo, Z.; Torok, L.; Kovacs, G. Improving the performance of the k Rare Class Nearest Neighbor classifier by the ranking of point patterns. In Proceedings of the International Symposium on Foundations of Information and Knowledge Systems, Budapest, Hungary, 14–18 May 2018; Springer: Cham, Switzerland, 2018; pp. 265–283.
25. Kukar, M.; Kononenko, I. Cost-sensitive learning with neural networks. ECAI 1998, 15, 88–94.
26. Lomax, S.; Vadera, S. A survey of cost-sensitive decision tree induction algorithms. ACM Comput. Surv. (CSUR) 2013, 45, 1–35.
27. Fernández, A.; Garcia, S.; Herrera, F.; Chawla, N.V. SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. 2018, 61, 863–905.
28. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
29. Ma, L.; Fan, S. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests. BMC Bioinform. 2017, 18, 169.
30. Guha, S.; Rastogi, R.; Shim, K. CURE: An efficient clustering algorithm for large databases. ACM Sigmod Rec. 1998, 27, 73–84.
31. Brownlee, J. Imbalanced Classification with Python: Better Metrics, Balance Skewed Classes, Cost-Sensitive Learning; Machine Learning Mastery: San Juan, Puerto Rico, 2020; pp. 48–56.
32. Ma, H.; Ekanayake, C.; Saha, T.K. Power transformer fault diagnosis under measurement originated uncertainties. IEEE Trans. Dielectr. Electr. Insul. 2012, 19, 1982–1990.
33. Ashkezari, A.D.; Ma, H.; Saha, T.K.; Ekanayake, C. Application of fuzzy support vector machine for determining the health index of the insulation system of in-service power transformers. IEEE Trans. Dielectr. Electr. Insul. 2013, 20, 965–973.
34. Cui, Y.; Ma, H.; Saha, T. Improvement of power transformer insulation diagnosis using oil characteristics data preprocessed by SMOTEBoost technique. IEEE Trans. Dielectr. Electr. Insul. 2014, 21, 2363–2373.
35. Peimankar, A.; Weddell, S.J.; Jalal, T.; Lapthorn, A.C. Ensemble classifier selection using multi-objective PSO for fault diagnosis of power transformers. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada, 24–29 July 2016; pp. 3622–3629.
36. Peimankar, A.; Weddell, S.J.; Jalal, T.; Lapthorn, A.C. Evolutionary multi-objective fault diagnosis of power transformers. Swarm Evol. Comput. 2017, 36, 62–75.
37. Liao, W.; Wang, H.; Zhang, J.; Guo, C.; Yao, J.; Jin, Y. An Oil-Immersed Transformer Fault Diagnosis Method Based on Data Preprocessing and Gradient Boosting. In Proceedings of the 2019 IEEE Power and Energy Society General Meeting (PESGM), Atlanta, GA, USA, 4–8 August 2019; pp. 1–5.
38. Wu, X.; He, Y.; Duan, J. A deep parallel diagnostic method for transformer dissolved gas analysis. Appl. Sci. 2020, 10, 1329.
39. Dhini, A.; Faqih, A.; Kusumoputro, B.; Surjandari, I.; Kusiak, A. Data-driven fault diagnosis of power transformers using dissolved gas analysis (DGA). Int. J. Technol. 2020, 11, 388–399.
40. Lopes, S.M.A.; Flauzino, R.A.; Altafim, R.A.C. Incipient fault diagnosis in power transformers by data-driven models with over-sampled dataset. Electr. Power Syst. Res. 2021, 201, 107519.
41. Jian, T.; Huijuan, H.; Gehao, S.; Xiuchen, J. Transformer Fault Diagnosis Model with Unbalanced Samples Based on SMOTE Algorithm and Focal Loss. In Proceedings of the 2021 4th International Conference on Energy, Electrical and Power Engineering (CEEPE), Chongqing, China, 23–25 April 2021; pp. 693–697.
42. Dwiputranto, D.T.H.; Setiawan, N.A.; Adji, T.B. DGA-Based Early Transformer Fault Detection using GA-Optimized ANN. In Proceedings of the 2021 International Conference on Technology and Policy in Energy and Electric Power (ICT-PEP), Jakarta, Indonesia, 29–30 September 2021; pp. 342–347.
43. Passos, L.A.; Jodas, D.S.; Ribeiro, L.C.F.; Akio, M.; Souza, A.N.; Papa, J.P. Handling imbalanced datasets through Optimum-Path Forest. Knowl.-Based Syst. 2022, 242, 108445.
44. Jia Yong, H.; Mohd Yousof, M.F.; Abd Rahman, R.; Talib, M.A.; Azis, N. Classification of Fault and Stray Gassing in Transformer by Using Duval Pentagon and Machine Learning Algorithms. Arab. J. Sci. Eng. 2022, 47, 14355–14364.
45. Li, X.; Li, Y.; Xu, Y.; Li, R.; Zhang, G. Fault Diagnostics of Oil-immersed Power Transformer via SMOTE and GWO-SVM. In Proceedings of the 2022 4th Asia Energy and Electrical Engineering Symposium (AEEES), Chengdu, China, 25–28 March 2022; pp. 935–939.
46. IEEE Guide for the Interpretation of Gases Generated in Mineral Oil-Immersed Transformers. In IEEE Std C57.104-2019 (Revision of IEEE Std C57.104-2008); IEEE: New York, NY, USA, 2019; pp. 1–98.
47. Ghoneim, S.S.M.; Taha, I.B.M. A new approach of DGA interpretation technique for transformer fault diagnosis. Int. J. Electr. Power Energy Syst. 2016, 81, 265–274.
48. Japkowicz, N. Assessment metrics for imbalanced learning. In Imbalanced Learning: Foundations, Algorithms, and Applications; Wiley: Hoboken, NJ, USA, 2013; pp. 187–206.
49. Tanha, J.; Abdi, Y.; Samadi, N.; Razzaghi, N.; Asadpour, M. Boosting methods for multi-class imbalanced data classification: An experimental review. J. Big Data 2020, 7, 70.
50. Lewis, D.D.; Gale, W.A. A sequential algorithm for training text classifiers. In Proceedings of the SIGIR'94, Dublin, Ireland, 3–6 July 1994; Springer: London, UK, 1994; pp. 3–12.
51. Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O'Reilly Media: Sebastopol, CA, USA, 2019.
52. Kubat, M.; Matwin, S. Addressing the curse of imbalanced training sets: One-sided selection. ICML 1997, 97, 179.
53. Sun, Y.; Kamel, M.S.; Wang, Y. Boosting for learning multiple classes with imbalanced class distribution. In Proceedings of the Sixth International Conference on Data Mining (ICDM'06), Hong Kong, China, 18–22 December 2006; pp. 592–602.

Figure 1. Sample insights into evaluation metrics for dissolved gas analysis: k-NN experiments.

Figure 2. Learning framework.

Figure 3. Experiments with gas concentrations as features: recall of normal class, recall of PD class, recall of D class, recall of T class, accuracy, F1-score, MG-score, average recall $M_{Re}$, and recall deviation $\sigma_{Re}$.

Figure 4. Experiments with logarithm of gas concentrations as features: recall of normal class, recall of PD class, recall of D class, recall of T class, accuracy, F1-score, MG-score, average recall $M_{Re}$, and recall deviation $\sigma_{Re}$.

Figure 5. Experiments with non-code ratios as features: recall of normal class, recall of PD class, recall of D class, recall of T class, accuracy, F1-score, MG-score, average recall $M_{Re}$, and recall deviation $\sigma_{Re}$.

Figure 6. Experiments with logarithm of non-code ratios as features: recall of normal class, recall of PD class, recall of D class, recall of T class, accuracy, F1-score, MG-score, average recall $M_{Re}$, and recall deviation $\sigma_{Re}$.

Table 1. Related works on dissolved gas analysis.

Ref.	Metrics	Learning Model	Oversampling Algorithm	Classification	Features
[14]	AUC, Acc	k-NN, DT, SVM, low-dimensional scaling, NN	Random resampling	Binary	Log of concentrations
[6]	Average Acc	k-NN, SVM, NN	ASMOTE	Septenary	Non-coded ratios
[7]	Minority recall, G-score	k-NN, DT, SVM	SMOTE, BorSMOTE, SafeSMOTE, ADASYN, MWOTE, CGMOS, MAHAKIL	Binary	Log of concentrations
[43]	F1-score	Optimum-Path Forest (OPF)	OPF variations for oversampling, SMOTE, Borderline SMOTE, AHC, ADASYN, MWSMOTE, SOMO, k-means SMOTE	Binary	Concentrations
[38]	Mean Acc over five experiments	k-NN, DT, SVM, NN, CNN, LSTM, fuzzy c-means, deep parallel	ADASYN	Senary	Concentrations
[40]	Acc, precision, recall, FP rate	Deep NN	SMOTE, Borderline-SMOTE	Senary	Concentrations
[39]	Acc over ten experiments	SVM, NN, Extreme Learning Machine-RBF	SMOTE, Borderline-SMOTE	Multiple Binary	Concentrations
[34]	Acc	k-NN, SVM, DT, NN	SMOTE	Quaternary	Gas concentrations, transformer condition, water content, acidity, 2-furfuraldehyde, and others
[32]	Acc, G-score, sensitivity, specificity over ten experiments	Fuzzy SVM variants	Random oversampling	Quinary	Concentrations
[35]	Acc, AUC	SVM, Fuzzy k-NN, NN, Naive Bayes, RF	ADASYN	Quaternary	Selected ratios and concentra- tions
[44]	Acc	Ensemble Learning	SMOTETomek	Ternary	Concentrations
[45]	Acc	SVM, Grey Wolf Optimization-SVM	SMOTE	Senary	Normalized concentration
[36]	Acc, F1-score, Matthews correlation coefficient	Ensemble Learning	ADASYN	Quaternary	Feature selection from fourteen candidates of ratios and concentrations
[33]	Acc	Fuzzy SVM	SMOTE	Quinary	Gas concentrations, water content, dielectric dissipation factor, and others
[41]	Acc, precision, recall, F1-score	k-NN, RF, NN	SMOTE	Nonary	Concentrations
[37]	Acc	k-NN, RF, NN, Linear Discriminant Analysis	SMOTE	Septenary	Feature selection from fourteen candidates of ratios and concentrations
[42]	Acc, precision, recall	Genetic Algorithm + NN	SMOTE	Senary	Eighteen DGA ratios

Table 2. Sample distribution per fault category.

Original categories	Merged category	Number of samples
$D_{1}$, $D_{2}$	D	289
$T_{1}$, $T_{2}$, $T_{3}$	T	234
PD	PD	43
N	N	62
Total		551


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

