#### *2.1. Transformer Fault Classification Procedure*

Transformer faults can be recognized from the dissolved gases generated by the heating of the dielectric oil and from the gases that prevail at different temperatures, i.e., hydrogen (H2), carbon monoxide (CO), methane (CH4), ethane (C2H6), ethylene (C2H4), acetylene (C2H2), and carbon dioxide (CO2). Nevertheless, only five gases, i.e., H2, CH4, C2H4, C2H6, and C2H2, are considered in this research for classifying transformer faults. A set of five transformer conditions, including no fault (NF), partial discharge (PD), and thermal fault conditions, is discerned.
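As an illustration of the data representation assumed in this work, the following minimal Python sketch encodes one hypothetical oil sample as a five-gas feature vector with an associated condition label; the gas concentrations and the label shown are placeholders introduced for illustration, not measurements from the experimental dataset.

```python
# A minimal sketch (hypothetical values) of how a DGA record could be
# encoded as a five-gas feature vector with an associated condition label.
import numpy as np

GASES = ["H2", "CH4", "C2H4", "C2H6", "C2H2"]   # the five gases used as features

# One hypothetical oil sample: gas concentrations in ppm (illustrative only)
sample_ppm = {"H2": 120.0, "CH4": 45.0, "C2H4": 10.0, "C2H6": 30.0, "C2H2": 2.0}

x = np.array([sample_ppm[g] for g in GASES])    # feature vector x_i
y = "NF"                                        # class label y_i (placeholder)

print(x, y)
```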

#### *2.2. The Fundamental Principle of the SVM Algorithm*

SVM is a robust supervised learning technique for building a classifier from a dataset. An SVM aims to establish a decision boundary between two classes of data that facilitates the prediction of labels from one or several feature vectors [11–17]. A decision boundary can be described as the region of the problem space in which the output label of a classifier is ambiguous. This decision boundary is referred to as the hyperplane, and it is positioned such that it is furthest from the nearest data points of each class. These nearest data points are called support vectors. The learning procedure of the SVM is illustrated in Figure 1. The SVM learns from the training dataset it is given. The validation dataset is then applied to determine the learning performance of the trained SVM algorithm. Finally, the trained SVM model is applied to classify samples of unknown test datasets.
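The procedure of Figure 1 can be sketched with scikit-learn as follows; the feature matrix, labels, split ratios, and kernel choice are placeholders introduced for illustration and do not reproduce this study's configuration.

```python
# A minimal sketch of the train / validate / test procedure from Figure 1,
# using scikit-learn's SVC on placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((200, 5))          # placeholder: five gas concentrations per sample
y = rng.integers(0, 2, 200)       # placeholder: binary fault labels

# Split into training, validation, and unseen test subsets
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

clf = SVC(kernel="rbf")           # train the SVM on the training subset
clf.fit(X_train, y_train)

print("validation accuracy:", accuracy_score(y_val, clf.predict(X_val)))
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```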

A labeled SVM training dataset can be expressed as follows in Equation (1).

$$(\mathbf{x}\_1, y\_1), \dots, (\mathbf{x}\_n, y\_n), \quad \mathbf{x}\_i \in R^d \text{ and } y\_i \in \{-1, +1\} \tag{1}$$

Here,

*i*: Training sample index

*xi*: Feature vector (predictors)

*yi*: Class label

**Figure 1.** The learning procedure of an SVM classifier.

An optimal hyperplane can therefore be expressed as follows in Equation (2).

$$w\mathbf{x}^T + b = 0\tag{2}$$

Here, *w* is the weight vector, *x* is the input feature vector, and *b* is the bias.

For all elements of the training dataset, the values of *w* and *b* must satisfy the inequalities expressed in Equations (3) and (4).

$$w\mathbf{x}\_i^T + b \ge +1 \text{ if } y\_i = 1 \tag{3}$$

$$w\mathbf{x}\_i^T + b \le -1 \text{ if } y\_i = -1 \tag{4}$$

The purpose of training an SVM algorithm is to determine the *w* and *b* such that the hyperplane separates the data points and maximizes the margin 1/||*w*||<sup>2</sup>. In Figure 2, the vectors *xi* for which *yi*(*wxi<sup>T</sup>* + *b*) = 1 are termed support vectors [12].
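For illustration, the learned *w*, *b*, and support vectors of a linear SVM can be inspected with scikit-learn as in the following sketch, which uses a toy two-dimensional dataset rather than the DGA data.

```python
# A minimal sketch showing how the learned w, b, and support vectors of a
# linear SVM can be inspected (toy 2-D data, not the DGA dataset).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(+2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]                     # weight vector w of the hyperplane w·x + b = 0
b = clf.intercept_[0]                # bias b
margin = 1.0 / np.linalg.norm(w)     # distance from the hyperplane to a support vector

print("w =", w, "b =", b, "margin =", margin)
print("support vectors:\n", clf.support_vectors_)   # points with y_i(w·x_i + b) = 1
```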

An alternative to the linear SVM classifier for nonlinear applications of the SVM is the kernel technique, which enables the modelling of higher-dimensional, nonlinear problems [11–16]. For a nonlinear problem, a kernel function can be employed to add supplemental dimensions to the raw data and thereby create a linear problem in the resulting higher-dimensional space. In short, a kernel function, expressed as shown in Equation (5), allows computations to be carried out rapidly that would otherwise require expensive calculations in the high-dimensional space.

$$K(x, y) = \langle f(x), f(y) \rangle \tag{5}$$

Here,

*K*: Kernel function

*x*, *y*: *n*-dimensional inputs

*f*: Map of the input from the *n*-dimensional to the *m*-dimensional space

< *f*(*x*), *f*(*y*) >: Dot product
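As a small numeric check of Equation (5), the sketch below uses a simple polynomial kernel K(x, y) = (x·y)<sup>2</sup> on two-dimensional inputs, whose explicit map *f* is known in closed form; this kernel and mapping are chosen purely for illustration.

```python
# A minimal numeric check of Equation (5) for the polynomial kernel
# K(x, y) = (x·y)^2 on 2-D inputs, whose explicit map is
# f(x) = (x1^2, sqrt(2)·x1·x2, x2^2) into 3-D space.
import numpy as np

def f(v):
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def k(x, y):
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

print(k(x, y))                 # kernel computed directly in the 2-D input space
print(np.dot(f(x), f(y)))      # same value via the explicit 3-D mapping
```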

**Figure 2.** Linear SVM model classifying red vs. blue data points.

Using kernel functions, the scalar product of data points in a higher-dimensional space can be computed without explicitly evaluating the mapping from the input space to that higher-dimensional space. In many instances, calculating the kernel is straightforward, whereas calculating the inner product of the feature vectors in the high-dimensional space is complex. The feature vector of even simple kernels can grow enormously in dimensionality, and kernels such as the radial basis function (RBF) kernel can be expressed as shown in Equation (6).

$$K\_{RBF}(\mathbf{x}, \mathbf{y}) = e^{-\gamma\|\mathbf{x} - \mathbf{y}\|^2} \tag{6}$$

The corresponding feature vector is infinite-dimensional; even so, evaluating the kernel itself is almost trivial. Depending on the nature of the problem, one kernel may perform far better than others. An optimal kernel function can therefore be chosen from an established set of kernels by careful numerical evaluation using cross-validation.
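A common way to perform such cross-validated kernel selection is scikit-learn's GridSearchCV, sketched below; the candidate kernels, parameter ranges, and placeholder data are illustrative assumptions rather than the settings used in this study.

```python
# A minimal sketch of selecting a kernel (and its hyperparameters) by
# cross-validation; X and y are placeholders for the DGA features and labels.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.random((200, 5))
y = rng.integers(0, 2, 200)

param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},   # Eq. (6)
    {"kernel": ["poly"], "C": [0.1, 1, 10], "degree": [2, 3]},
]

search = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)

print("best kernel/parameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
```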

#### *2.3. Proposed BCSVM Algorithm*

In the current study, DGA oil samples were obtained from in-service mineral oil-immersed transformers owned by several local independent power utilities. The comprehensive flow diagram of the proposed BCSVM is illustrated in Figure 3.


• At the same time, if the fault is categorized as a thermal (T) fault, it is passed as input to the fourth SVM (SVM 4), which distinguishes whether the fault is a T1 or a T2 fault, as demonstrated in Figure 3; a simplified code sketch of this cascading logic is given after the figure.

**Figure 3.** Proposed BCSVM classification tree.
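The following simplified sketch illustrates the cascading logic; only the thermal branch (an upstream binary SVM routing thermal faults to SVM 4 for the T1/T2 decision) is shown, and the data, stage names, and routing are illustrative assumptions, with the full tree defined by Figure 3.

```python
# A simplified, illustrative sketch of the cascaded (binary-tree) SVM idea:
# one binary SVM routes thermal (T) faults to a second binary SVM ("SVM 4")
# that separates T1 from T2. Data and routing are placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.random((300, 5))                              # placeholder DGA features
is_thermal = rng.integers(0, 2, 300)                  # 1 = thermal (T), 0 = other
t_subtype = rng.integers(0, 2, 300)                   # 0 = T1, 1 = T2 (used if thermal)

svm_thermal = SVC(kernel="rbf").fit(X, is_thermal)    # upstream binary stage
svm_4 = SVC(kernel="rbf").fit(X[is_thermal == 1],     # "SVM 4": T1 vs. T2
                              t_subtype[is_thermal == 1])

def classify(sample):
    sample = sample.reshape(1, -1)
    if svm_thermal.predict(sample)[0] == 1:           # routed to SVM 4
        return "T1" if svm_4.predict(sample)[0] == 0 else "T2"
    return "non-thermal class"                        # handled by other SVMs in the tree

print(classify(rng.random(5)))
```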

To evaluate the statistical significance of the experimental DGA dataset, a single-factor ANOVA was employed. The results are illustrated in Table 2.


**Table 2.** Statistical analysis of experimental DGA dataset.

It can be observed from the *p*-value of 9.1433 × 10<sup>−5</sup> (< α = 0.05) that the experimental DGA results are statistically significant.
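For reference, a single-factor (one-way) ANOVA of this kind can be reproduced with scipy as sketched below; the gas readings used here are hypothetical placeholders, not the experimental values behind Table 2.

```python
# A minimal sketch of a single-factor (one-way) ANOVA on hypothetical gas
# concentration groups; the values are placeholders, not the paper's data.
from scipy.stats import f_oneway

h2   = [120, 95, 140, 110, 130]     # hypothetical ppm readings per gas
ch4  = [45, 60, 50, 40, 55]
c2h4 = [10, 14, 9, 12, 11]

f_stat, p_value = f_oneway(h2, ch4, c2h4)
print(f"F = {f_stat:.3f}, p = {p_value:.4g}")
print("significant at alpha = 0.05" if p_value < 0.05 else "not significant")
```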

The sensitivity analysis of the input gases was carried out using descriptive statistics sensitivity analysis (DSSA), as shown in Table 3. DSSA examines the quantitative characteristics of the input dataset features.
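A descriptive-statistics view of the input gases can be generated with pandas as in the following sketch; the values are placeholders standing in for the experimental DGA measurements summarized in Table 3.

```python
# A minimal sketch of descriptive statistics for the input gases;
# the gas values below are placeholders, not the experimental dataset.
import pandas as pd

df = pd.DataFrame({
    "H2":   [120, 95, 140, 110, 130],
    "CH4":  [45, 60, 50, 40, 55],
    "C2H4": [10, 14, 9, 12, 11],
    "C2H6": [30, 25, 35, 28, 32],
    "C2H2": [2, 1, 3, 2, 2],
})

# Mean, standard deviation, min/max, and quartiles for each input gas
print(df.describe().T)
```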

The complexity of the proposed solution design is illustrated in Figure 4. The assessment is based in part on the state of the related transformer fault diagnosis problem and on the DGA and artificial intelligence knowledge applied in the solution. The design is additionally presented as a solution design complexity heatmap comprising all of the main activities undertaken.


**Table 3.** Descriptive statistics sensitivity analysis of input gases.

**Figure 4.** Complexity of the proposed BCSVM classification tree.

The red regions indicate areas of comparatively high complexity, and the green regions indicate areas of comparatively low complexity.
