Article

Hybrid ML Algorithm for Fault Classification in Transmission Lines Using Multi-Target Ensemble Classifier with Limited Data

by
Abdallah El Ghaly
ECE Department, Faculty of Engineering, Beirut Arab University, Beirut 11-5020, Lebanon
Submission received: 5 December 2024 / Revised: 25 December 2024 / Accepted: 26 December 2024 / Published: 1 January 2025
(This article belongs to the Special Issue Artificial Intelligence for Engineering Applications)

Abstract

Fault detection and classification in transmission lines are critical for maintaining the reliability and stability of electrical power systems. Quick and accurate fault detection allows for timely intervention, minimizing equipment damage and reducing downtime. This study addresses the challenge of effective fault classification, particularly when dealing with smaller, more practical datasets. Initially, the study examined the performance of conventional machine learning algorithms on a comprehensive dataset of 7681 samples, demonstrating high accuracy owing to the inherent symmetry of sinusoidal voltage and current signals. However, the true efficacy of these algorithms was evaluated by minimizing the dataset to 231 training samples, with the remainder being used for testing. A novel Multi-Target Ensemble Classifier was developed to improve classification accuracy. The proposed algorithm achieved an impressive overall accuracy of 0.829165, outperforming traditional methods, including the K-Nearest Neighbors Classifier, support vector classification, random forest classifier, decision tree classifier, AdaBoost classifier, gradient boosting classifier, and Gaussian NB. This research highlights the importance of efficient fault classification techniques in power systems and proposes a superior solution in the form of a multitarget ensemble classifier.

1. Introduction

1.1. Background

In modern civilization, human actions are closely connected to the long-term viability of electrical power networks, especially those that rely on renewable energy sources. A continuous and steady provision of electricity is crucial, especially for vital sectors such as healthcare, transportation, and security, which depend on a strong and dependable power supply. Ensuring this reliability requires robust and redundant infrastructure, as well as the capability to sustain its performance. Hence, it is crucial to be able to identify various types of faults and react accordingly to uphold a high level of continuous network operation. It is essential to understand the precise type of short circuit, as it dictates the required response measures to reduce downtime and guarantee the continuity of the power system [1,2].
The method proposed in this study aims to enhance the accuracy of transmission line fault classification detection, thereby accelerating the maintenance process in the event of these faults. This is achieved through precise prediction of the location and type of faults in electrical cables [3]. Transmission line faults directly impact the reliability of power systems and the duration of power outages, making it essential to ensure the integrity of cables throughout their operational life [4,5]. When a defect occurs, a rapid response is necessary to minimize the clearing time and restore normal operation [6].
Transmission line fault detection is essential for maintaining the reliability and stability of electrical power systems [7]. As modern power grids become increasingly complex and interconnected, traditional fault-detection methods face challenges in terms of accuracy, speed, and adaptability to changing grid conditions. In response, there is growing interest in leveraging machine learning (ML) techniques to enhance fault detection capabilities [8]. ML offers efficient pattern recognition algorithms that can be applied to predict, locate, and classify faults. These techniques solve non-linear problems based on learned experience and accommodate different configurations of electrical distribution systems [8,9,10].
Recent research has resulted in the development of various techniques for detecting faults in transmission lines. These include time-domain reflectometry, impedance-based methods, knowledge-based methods, traveling wave methods, and hybrid approaches [11]. Each of these methods has distinct advantages and limitations. In [12], time-domain reflectometry was effective for single cables, but ineffective for systems with multiple branches. AI-based algorithms provide solutions capable of managing complex systems [13].
In three-phase electrical power systems, the fault types include interruptions and short circuits. Short circuits can be further categorized into single-line-to-ground faults, line-to-line faults, double-line-to-ground faults, three-line faults, and three-line-to-ground faults [14]. ML techniques are adept at addressing these types of faults, ensuring high accuracy and reducing the duration of power outages, thus contributing to the overall sustainability of electrical power distribution systems [15,16].
A short circuit in an electrical power system progresses through distinct stages: transient, subtransient, and steady states. During normal operation, the current and voltage signals exhibit stable patterns, but these signals undergo significant changes when a short circuit occurs, initially experiencing rapid fluctuations in the transient and sub-transient stages before stabilizing in the steady state. It is critical to act swiftly to preserve the transmission line because it can only withstand fault conditions for a limited time before damage occurs. Researchers have leveraged large datasets, often comprising over 1000 data points, to train ML models for fault detection. Even with a 70% training split, a substantial volume of training data typically leads to a high accuracy. However, short-circuit data from actual systems are not as extensive as those in the simulations. This reliance on extensive datasets represents a potential research gap, suggesting the need for effective fault-detection techniques that can achieve high accuracy with smaller, real-world datasets.

1.2. Literature Review

Researchers have extensively used ML algorithms for fault classification prediction in power systems. For instance, in [1], the authors employed a two-terminal fault classification approach and achieved an accuracy of 0.97. Although the exact number of data points was not specified, it was estimated from the graphs to be over 1500 data points used for training.
In another study, the authors in [17] utilized deep learning techniques and achieved an accuracy of 0.8285 with approximately 1198 training data points. Similarly, a deep learning method based on fast dynamic time warping was used in [18], achieving an accuracy of 0.9937 with a training dataset of 10,766 points. Another deep-learning approach mentioned in [19] utilized 70% of the data for training, estimated to be approximately 4900 points, and achieved an accuracy of 0.9661.
In [20], a training dataset was defined for each fault-classification type. The dataset consisted of 278 points for single line-to-ground faults in Phase A, 277 points in Phase B, and 263 points in Phase C. Double line faults A-B and A-B-G had 521 points, B-C and B-C-G had 524 points, and A-C and A-C-G had 540 points. This totaled 2661 training points for faulty cases, resulting in an accuracy of 0.994 for fault-type classification.
A hybrid approach for short-circuit fault detection in transmission lines was used in [9], with an estimated dataset size of over 10,000 points, achieving an accuracy of 0.9822. Additionally, artificial neural networks (ANN) were employed for predicting the fault location and type in electrical cables, as in [3], where trained models were based on data from 6150 simulations, reaching 0.98 accuracy. This study highlighted that a larger number of responses and training data are needed to achieve good accuracy.
A methodology in [21] utilizes a one-dimensional convolutional neural network (CNN) for classifying faults in transmission lines, demonstrating successful performance across real, synthetic, and publicly available datasets, thereby enhancing the reliability and protection planning of power systems. The accuracy achieved is 0.99 utilizing 16,000 points.
The authors in [22] propose a new fault classification scheme using deep learning techniques, specifically Long Short-Term Memory (LSTM) networks, to enhance fault detection in inverter-fed transmission lines, addressing the limitations of traditional phase-angle-based methods in systems with Inverter-Based Resources (IBRs); using a dataset of 3460 samples, the scheme achieves an accuracy of 0.9899.
In [23], an ML model using stacked Bi-LSTM cells was employed to classify faults in series-compensated transmission lines by analyzing voltage and current signals, enhancing fault detection reliability through local measurements and harmonic robustness via discrete Fourier transform processing; based on a dataset of 22,680 samples, it achieves an accuracy of 0.9998.
In [24], a machine-learning-based technique for fault detection and classification in transmission lines utilizes an optimized pretrained ensemble tree classifier. It achieves a high accuracy of 0.994 by adapting to different power generation types and continuously monitoring the system topology, based on 2400 samples overall.
These studies demonstrated that high accuracies can be achieved through simulations with large training datasets, leveraging the symmetric and sinusoidal nature of the voltage and current signals. This reliance on extensive simulated data underscores an opportunity for further contribution: developing fault-detection methods that maintain high performance even with limited real-world data. Addressing this would significantly enhance the practical applicability and robustness of ML-based fault detection systems. A summary of similar studies is presented in Table 1.

1.3. Contributions

In this study, a public dataset derived from MATLAB R2023a simulations for short-circuit classification was utilized, comprising a total of 7861 indices. This dataset includes data points for both normal operation and various fault types: single-line-to-ground faults, double-line faults, double-line-to-ground faults, three-line faults, and three-lines-to-ground faults. Initially, the commonly used approach of 80% training and 20% testing with conventional ML algorithms was employed, resulting in 6289 training data points and high accuracies. However, when the training dataset was limited to 236 data points, it became evident that the overall accuracy of the conventional ML algorithms significantly decreased. A workflow block diagram is shown in Figure 1.
To address this challenge, a Multi-Target Ensemble Classifier (MTEC) is proposed, which is an innovative algorithm designed to enhance predictive accuracy by combining multiple ML classifiers, each specializing in different aspects of a multi-target problem. Python 3.10 was used to implement and apply MTEC to a small dataset, resulting in significantly higher accuracies, demonstrating its effectiveness and contribution to the existing literature.
The contributions of this work are as follows:
  • Novel Application of MTEC: Introducing and validating MTEC for fault detection in electrical power systems, particularly under limited training data conditions.
  • Performance Benchmarking: A comparative analysis of different traditional ML algorithms and the MTEC algorithm on public data to establish a definitive evaluation of performance.
  • Improved Accuracy with Limited Data: Demonstrating the capability of MTEC to maintain high accuracy even with a significantly reduced training dataset, which highlights the potential of the proposed algorithm in practical applications where large datasets are not feasible.
  • Comprehensive Evaluation: Providing a thorough evaluation using various performance metrics, including accuracy, specificity, precision, recall, and F1 score, across multiple algorithms and fault types.
The remainder of this paper is organized as follows. Section 2 presents data visualization and pre-processing. Section 3 reviews the theory behind conventional ML algorithms and proposes the MTEC. The performance of these algorithms, along with the MTEC, is evaluated in Section 4. Finally, Section 5 presents the conclusions of this study.

2. Data Visualization and Pre-Processing

This section describes the processes needed to view and preprocess the dataset used for fault classification in electrical power systems. Data visualization offers valuable insights into the organization and attributes of the data, whereas pre-processing ensures that the dataset is appropriately prepared for training machine learning models.
The dataset utilized in this study is publicly available on Kaggle and comprises six features: current in Phase A (Ia), current in Phase B (Ib), current in Phase C (Ic), voltage in Phase A (Va), voltage in Phase B (Vb), and voltage in Phase C (Vc). The labels are binary outputs represented as [G C B A], where G indicates ground faults (1 when faulted, 0 when not faulted), C represents faults in Phase C, B represents faults in Phase B, and A represents faults in Phase A.
The dataset contains 7681 observations collected from MATLAB/Simulink R2023a simulations. To provide clarity regarding the power system setup, a reconstructed MATLAB/Simulink model representing the circuit has been included as Figure 2. The model in Figure 2 represents a short transmission line of 30 km length, with a resistance of 0.282 Ω/km and an inductance of 1.86 mH/km, typical values for short transmission lines. The circuit was simulated under normal conditions and various fault conditions, including single-phase-to-ground faults, double-phase faults, double-phase-to-ground faults, three-phase faults, and three-phase-to-ground faults. The measured line voltages and currents were collected and saved as a dataset of the power system. The dataset, sourced from Kaggle [25], has been widely utilized in fault classification studies due to its comprehensive representation of voltage and current signals under diverse scenarios.
Beginning with data visualization, the first step is to detect any missing values in the dataset. It is essential to address missing values as they might have a substantial impact on the performance of ML models. To tackle this issue, a missing data matrix was created using Python, which offers a concise visual depiction of the integrity of the dataset. This study confirmed the absence of any missing values in the observations, thereby ensuring the integrity and dependability of the dataset. Figure 3 confirms that the dataset is fully populated and suitable for additional pre-processing and analysis, supporting this finding.
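As an illustration, a missing-value check of this kind can be sketched in a few lines of pandas; the DataFrame below is a synthetic stand-in for the Kaggle dataset (the column names Ia, Ib, Ic, Va, Vb, Vc follow the feature list above, but the values are randomly generated):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the fault dataset: six signal features.
df = pd.DataFrame(
    np.random.default_rng(0).normal(size=(100, 6)),
    columns=["Ia", "Ib", "Ic", "Va", "Vb", "Vc"],
)

# Count missing entries per feature; the study found none.
missing_per_feature = df.isnull().sum()
fully_populated = bool((missing_per_feature == 0).all())
print(fully_populated)  # True for a fully populated dataset
```

A library such as missingno can render the same information as the visual missing-data matrix described above.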
The next step in the visualization process is to identify potential outliers in the dataset. Box plots are particularly useful for this purpose. Given that the voltage and current values operate on different scales within the per-unit (pu) system, each feature is visualized independently. For instance, Figure 4 illustrates a box plot of the current measurements.
From Figure 4, it is evident that the current values contain potential outliers. However, before considering techniques, such as capping or winsorization for these outliers, an engineering perspective is essential. Outliers in this context correspond to fault conditions that inherently involve current values that are significantly higher than those in normal operating scenarios. Therefore, what are typically classified as outliers in standard datasets are the critical data points necessary for fault classification. Consequently, these data points should not be removed because they are crucial for accurately identifying and classifying fault types.
Scatter plots are effective tools for visualizing data relationships. In this context, the output labels [G, C, B, A] are collectively referred to as “class”, representing different fault classifications. Figure 5 displays the scatter plots of Ia versus Va and Ia versus Vb. These plots reveal noticeable symmetry in the dataset, forming elliptical shapes. The scattered points that deviate from these elliptical trajectories represent the transient states observed during the transitions from normal operation to short circuits or between different fault types.
Another commonly used visualization method is histograms. Figure 6 presents the histogram distribution for phase Vc, specifically for cases classified as [0000], indicating non-faulted conditions. This histogram helps to illustrate the distribution and range of voltage values for the non-faulted state, providing insight into the typical variations in phase Vc when no faults are present.
When analyzing electrical signals, such as voltage and current, it is often beneficial to visualize these signals with respect to time to comprehensively understand their behavior. Figure 7 shows the voltage waveforms across the entire dataset, including both normal and defective situations. The time-domain representation exposes the transient reactions that arise during fault events, offering insights into the deviations of the signals from their usual patterns. The graph illustrates that the waveform transitions conformed to the anticipated pattern, thereby verifying the absence of anomalous data points within the signal. Figure 8 shows the data size distribution for each fault categorization in relation to the overall number of observations. This diagram presents a summary of the dataset composition in terms of different fault categories.
The focus of this study was to classify faults using reduced datasets. The minimization process was implemented by adjusting the proportions of data used for learning and training within the ML algorithms. This approach aims to explore the effectiveness of fault classification when operating with smaller subsets of data, while maintaining accuracy and reliability.
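The reduced split described above can be sketched with scikit-learn's train_test_split; the array shapes follow the dataset described in this study (7681 observations, six features, four binary targets [G, C, B, A]), but the data themselves are synthetic placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(7681, 6))           # six features: Ia, Ib, Ic, Va, Vb, Vc
y = rng.integers(0, 2, size=(7681, 4))   # four binary targets: [G, C, B, A]

# Reduced split: 231 samples for training, the rest held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=231, random_state=0
)
print(X_train.shape, X_test.shape)
```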

3. ML Algorithms for Fault Classification

This section discusses the different ML methodologies employed for fault categorization in electrical power systems. The aim is to assess the prediction accuracy of conventional algorithms and the proposed MTEC to accurately anticipate and identify fault classifications in power system transmission lines. ML has emerged as a crucial tool in fault identification because of its capacity to effectively handle intricate patterns and extensive datasets, which conventional methods may find challenging. The subsequent sections present a summary of traditional algorithms and outline their advantages and constraints. This is followed by an in-depth examination of MTEC, a novel method aimed at improving classification accuracy, especially in situations where there is a scarcity of training data.

3.1. Conventional ML Algorithms

This subsection provides a comprehensive examination of conventional ML algorithms used for fault classification in electrical power systems, including K-Nearest Neighbors, Support Vector Classification, Random Forest, Decision Tree, AdaBoost, Gradient Boosting, and Gaussian Naive Bayes. The operational principles, strengths, and limitations of each algorithm are discussed, along with their mathematical formulations. This establishes a baseline for evaluating performance.
K-Nearest Neighbors (KNN) is a simple and intuitive algorithm used for classification tasks. It operates on the principle of finding the k-nearest data points for a given sample and classifying the sample based on the majority class among these neighbors. The distance between the data points is typically calculated using the Euclidean distance formula [26]:
$d(x_i, x_j) = \sqrt{\sum_{l=1}^{n} (x_{i,l} - x_{j,l})^2}$
where $x_i$ and $x_j$ are the feature vectors of the two samples and $n$ is the number of features. KNN is advantageous because of its simplicity and adaptability to multiclass problems. However, it is computationally expensive for large datasets because of distance calculations, and can be sensitive to noisy data and irrelevant features. In addition, its performance may degrade in high-dimensional spaces owing to the curse of dimensionality.
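A minimal sketch of KNN classification, using the Euclidean distance defined above together with scikit-learn's KNeighborsClassifier on toy two-dimensional data (not the study's dataset):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Euclidean distance between two feature vectors, matching the formula above.
def euclidean(xi, xj):
    return float(np.sqrt(np.sum((np.asarray(xi) - np.asarray(xj)) ** 2)))

# Toy two-class problem standing in for one binary fault indicator.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(euclidean([0, 0], [3, 4]))   # 5.0
print(knn.predict([[0.95, 1.0]]))  # majority class among 3 nearest neighbors
```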
Support Vector Classification (SVC) is a powerful classification algorithm that aims to find the optimal hyperplane that separates data into different classes with the maximum margin. The mathematical formulation of SVC involves solving the following optimization problem [27,28]:
$\min_{w, b} \ \frac{1}{2} \|w\|^2$
subject to
$y_i (w^T x_i + b) \ge 1 \quad \forall i$
where $w$ is the weight vector, $b$ is the bias, and $y_i$ is the class label. SVC is effective in high-dimensional spaces and robust to overfitting, especially with clear separation margins. However, it requires careful tuning of the hyperparameters and may not perform well with noisy data or very large datasets because of its high computational cost.
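As an illustrative sketch, a linear SVC fitted on a small, linearly separable toy set; scikit-learn's SVC is used here as a stand-in and the data are assumptions, not the study's configuration:

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data; a linear SVC finds the maximum-margin hyperplane.
X = np.array([[-2.0, 0.0], [-1.5, 0.5], [2.0, 0.0], [1.5, -0.5]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# w and b from the fitted model define the hyperplane w^T x + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]
preds = clf.predict([[-3.0, 0.0], [3.0, 0.0]])
print(preds)
```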
Random Forest is an ensemble learning method that constructs a multitude of decision trees during training and outputs the mode of classes (for classification) of individual trees. The final prediction is given by [29]:
$\hat{y} = \mathrm{mode}\{\mathrm{Tree}_1(x), \mathrm{Tree}_2(x), \ldots, \mathrm{Tree}_T(x)\}$
where $\mathrm{Tree}_t(x)$ denotes the prediction of the $t$-th decision tree in the forest and $T$ is the total number of trees. Random Forest (RF) handles large datasets and high-dimensional features effectively while being robust to overfitting and noise. Nonetheless, it can be computationally expensive and slow to train and is less interpretable than single decision trees.
Decision Tree is a straightforward algorithm that splits data into subsets based on feature values to form a tree structure in which each node represents a decision based on a feature. The decision at a node is made by [30]:
$\mathrm{Decision} = \arg\max_{c} \sum_{i=1}^{N} I(y_i = c)$
where $I$ is an indicator function and $c$ represents the class label. Decision trees are easy to interpret and require minimal data pre-processing. However, they are prone to overfitting, particularly with deep trees, and are sensitive to small variations in the data.
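The node decision formula above amounts to a majority vote over the labels that reach a node, which can be illustrated directly (the labels below are arbitrary examples, not study data):

```python
from collections import Counter

# Majority-class decision at a tree node: argmax over class labels c of the
# count sum_i I(y_i = c), matching the formula above.
def node_decision(labels):
    counts = Counter(labels)
    return max(counts, key=counts.get)

leaf_labels = ["G", "G", "C", "G", "B"]
print(node_decision(leaf_labels))  # "G": the most frequent label at the node
```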
AdaBoost, or Adaptive Boosting, is an ensemble method that combines multiple weak classifiers to create a strong classifier. The prediction is given by:
$\hat{y} = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t \, \mathrm{Classifier}_t(x)\right)$
where $\alpha_t$ is the weight of each classifier $\mathrm{Classifier}_t$ and $T$ is the number of classifiers. AdaBoost improves the performance of weak classifiers and reduces bias and variance. However, it can be sensitive to noisy data and outliers, and requires careful parameter tuning.
Gradient Boosting (GB) is another ensemble method that builds models sequentially, where each model corrects the errors of its predecessor. The final prediction is obtained by [31]:
$\hat{y} = \sum_{m=1}^{M} \beta_m \, \mathrm{Model}_m(x)$
where $\beta_m$ is the weight of each model $\mathrm{Model}_m$ and $M$ is the number of models. GB is effective in capturing complex patterns and can be robust to overfitting if properly tuned. However, it is computationally intensive, slow to train, and requires careful tuning of hyperparameters.
Gaussian Naive Bayes (GNB) is a probabilistic classifier based on Bayes’ theorem, with the assumption of feature independence. The class probability is given by [32,33]:
$P(y \mid x) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x)}$
where $P(y)$ is the prior probability of the class and $P(x_i \mid y)$ is the conditional probability of feature $x_i$ given class $y$. GNB is simple, fast, and performs well with small datasets and categorical data. However, it assumes feature independence, which may not hold in practice, and its performance can degrade with correlated features.
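The seven baselines described above might be compared in a single loop as sketched below; this uses scikit-learn with default hyperparameters and a synthetic binary target, so it illustrates the benchmarking procedure rather than reproducing the study's results:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one binary fault indicator.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=231, random_state=0)

baselines = {
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    "GaussianNB": GaussianNB(),
}

# Fit each baseline on the small training split and score it on the held-out set.
scores = {name: accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
          for name, clf in baselines.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```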

3.2. Proposed MTEC

This subsection introduces MTEC, a novel approach developed to improve predictive accuracy by combining multiple ML classifiers. The MTEC design and methodology are outlined, focusing on how they address the limitations of conventional methods and enhance fault classification performance. MTEC is an innovative algorithm designed to improve predictive accuracy by leveraging the strengths of multiple ML classifiers, each specializing in different aspects of a multitarget problem. In the context of transmission-line fault detection, MTEC employs distinct classifiers for each fault indicator (G, C, B, and A), training them independently to optimize their specific target variables. This separation allows each classifier to specialize and fine-tune its performance, effectively handling the unique distribution and characteristics of each target variable without the influence or complexity of other targets.
After the independent training phase, MTEC combines the predictions from these individual classifiers to form a comprehensive set of predictions for all target variables. This strategy ensures that each prediction benefits from the specialized attention of its respective classifier, leading to more reliable and accurate outcomes. Furthermore, the MTEC provides flexibility in the selection of different types of classifiers for each target variable. For instance, one can use a Random Forest for one target and a Support Vector Machine for another, depending on which classifier performs the best for each specific target.
The benefits of MTEC are manifold. Specialization ensures that each target variable receives focused training, leading to improved performance on complex and diverse datasets. Flexibility allows the use of different models tailored to each target’s specific characteristics. Scalability is inherent because MTEC can include more target variables without compromising the performance of individual classifiers. Robustness is also enhanced relative to conventional ML algorithms, as decoupling the learning process for each target variable reduces the risk of overfitting and underfitting.
Incorporating MTEC into the fault detection process of transmission lines results in more accurate and reliable performance, making it a valuable approach for real-world applications. By adopting MTEC, researchers and practitioners can achieve significant advancements in the predictive accuracy of multi-target problems, particularly in complex and high-stakes environments, such as power transmission systems. The overall prediction for each target variable y i is determined by combining the outputs of the multiple classifiers as follows:
$\hat{y}_j = \arg\max_{c} \sum_{i=1}^{N} \alpha_{i,j} \, \mathrm{Classifier}_{i,j}(x)$
where:
$\hat{y}_j$: predicted class for target variable $j$.
$c$: class label.
$\alpha_{i,j}$: weight assigned to the $i$-th classifier for the $j$-th target variable.
$\mathrm{Classifier}_{i,j}(x)$: output of the $i$-th classifier for the $j$-th target variable given the feature vector $x$.
$\arg\max_{c}$: the class label that maximizes the weighted sum of the classifier outputs.
For target-specific prediction, each prediction is generated by a specific classifier tailored to that target:
$\hat{y}_{j,k} = \mathrm{Classifier}_k(x_j)$
Note that $\hat{y}_{j,k}$ is the predicted class of the $k$-th target-specific classifier for target $j$. Further, $\mathrm{Classifier}_k(x_j)$ is the output of the $k$-th classifier for the $j$-th target variable given the feature vector $x_j$.
The final prediction is made by aggregating the predictions from all classifiers:
$\hat{y}_j = \mathrm{mode}\{\mathrm{Classifier}_1(x), \mathrm{Classifier}_2(x), \ldots, \mathrm{Classifier}_T(x)\}$
where $\hat{y}_j$ is the final predicted class for target variable $j$, $\mathrm{mode}$ selects the most frequent class label among the classifier outputs, and $\mathrm{Classifier}_t(x)$ represents the output of the $t$-th classifier for the feature vector $x$.
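A minimal sketch of the per-target strategy captured by these equations, written as a small Python class: one independently trained scikit-learn classifier per fault indicator, with predictions stacked column-wise into [G, C, B, A]. The data and model choices here are illustrative assumptions; the full MTEC additionally weights and aggregates multiple classifiers per target.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Per-target sketch: one independent classifier per fault indicator
# [G, C, B, A]; a different model type may be chosen for each target.
class MultiTargetEnsemble:
    def __init__(self, classifiers):
        self.classifiers = classifiers   # one estimator per target column

    def fit(self, X, Y):
        for j, clf in enumerate(self.classifiers):
            clf.fit(X, Y[:, j])          # each classifier sees only its target
        return self

    def predict(self, X):
        # Stack per-target predictions column-wise into [G, C, B, A].
        return np.column_stack([clf.predict(X) for clf in self.classifiers])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
Y = (X[:, :4] > 0).astype(int)           # synthetic 4-target labels

mtec = MultiTargetEnsemble([
    RandomForestClassifier(random_state=0),  # target G
    SVC(),                                   # target C
    RandomForestClassifier(random_state=0),  # target B
    SVC(),                                   # target A
]).fit(X, Y)
pred = mtec.predict(X)
print(pred.shape)  # one binary prediction per target per sample
```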
The equations outlined above form the core of the MTEC algorithm, underscoring its advanced fault classification approach. The MTEC algorithm leverages the strengths of multiple classifiers by creating an ensemble that optimizes both the accuracy and specificity through a systematic combination of individual classifier outputs. This ensemble method integrates the decisions of different models, enhances the overall robustness, and mitigates the limitations associated with single classifiers. The MTEC algorithm utilizes an ensemble of base learners with weights optimized through convex optimization to ensure maximal margin separation between the fault classes. By incorporating techniques such as bagging and boosting, MTEC achieves superior variance reduction and bias correction, enhancing fault classification accuracy. A flowchart is shown in Figure 9.
The superior performance of the MTEC, as demonstrated by the comprehensive evaluation metrics, highlights its effectiveness in accurately classifying faults in electrical power systems. Compared to conventional ML algorithms, MTEC consistently achieves higher accuracy, specificity, precision, and recall, as shown in the next section, thereby offering a more reliable solution for fault detection. The ability of MTEC to handle complex and diverse fault scenarios with greater precision makes it a significant advancement over traditional methods, thereby ensuring higher reliability and efficiency in real-world applications. The mathematical foundation of MTEC, coupled with its empirical success, establishes it as a state-of-the-art approach for fault classification and promises improved operational safety and performance in power systems.

4. Results and Discussion

In this section, the results are discussed based on the entire dataset of 7861 observations to demonstrate that all basic algorithms perform well owing to the symmetry of the data. Subsequently, the performance of the conventional ML algorithms using a subset of the data was evaluated, where only 231 points were used for training and the remaining data were used for testing. This allows us to assess the performance of the basic algorithms and compare them with the proposed MTEC algorithm.
It is noteworthy that all faults are of the bolted type, with high-impedance faults leading to a decrease in current magnitude without impacting the pattern waveform used for ML algorithm training. This characteristic does not interfere with the ML process during training and testing.
To compare the performance of the different algorithms, several evaluation metrics were used, including Accuracy, Precision, Recall, F1 Score, and Specificity.
Accuracy is the ratio of the correctly predicted instances to the total number of instances. This is given by the equation:
$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
where TP, TN, FP, and FN represent the number of true positives, true negatives, false positives, and false negatives, respectively.
Precision is the ratio of the correctly predicted positive instances to the total number of predicted positives. It can be expressed as:
$\mathrm{Precision} = \frac{TP}{TP + FP}$
Recall, also known as Sensitivity or True Positive Rate, is the ratio of correctly predicted positive instances to actual positives. The formula for recall is as follows:
$\mathrm{Recall} = \frac{TP}{TP + FN}$
The F1 Score is the harmonic mean of the Precision and Recall, providing a single metric that balances both concerns. The F1 Score is calculated as follows:
$\mathrm{F1\ Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
Specificity, also known as the True Negative Rate, is the ratio of correctly predicted negative instances to the actual negatives. Specificity is defined by
$\mathrm{Specificity} = \frac{TN}{TN + FP}$
These metrics provide a comprehensive understanding of the model’s performance and address different aspects of prediction quality. Accuracy measures the overall correctness of the model, whereas Precision indicates the accuracy of positive predictions. Recall measures the model’s ability to identify positive cases, and the F1 Score balances Precision and Recall, making it particularly useful for imbalanced class distributions. Specificity, on the other hand, assesses a model’s ability to correctly identify negative cases. By examining these metrics, we can effectively compare the performance of the basic ML algorithms with the proposed MTEC.
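The five metrics can be computed directly from a confusion matrix, as sketched below; the predictions are an arbitrary worked example, not results from the study:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Arbitrary binary predictions for one target, to illustrate the formulas above.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels, ravel() yields (TN, FP, FN, TP).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)
f1          = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)
print(accuracy, precision, recall, f1, specificity)  # all 0.75 for this example
```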
The evaluation metrics were applied independently to each classification category (G, C, B, and A), yielding individual metric scores for each class; the overall classification accuracy was also computed. Table 2 presents a comparative analysis of the basic ML algorithms applied to the base dataset of 7681 instances with an 80/20 training-testing split.
It is clear from Table 2 that, except for GaussianNB, all algorithms achieve high overall accuracy and very high per-class accuracy, precision, recall, F1 score, and specificity. These results confirm that a dataset of this size yields high-performance classification regardless of the algorithm used. When the training dataset is limited to 231 points, however, most conventional ML algorithms perform poorly, especially in terms of overall accuracy. These results are reflected in each class separately: Table 3 presents a comparative analysis of the basic ML algorithms against the proposed MTEC for Class G, Table 4 uses the same minimized dataset to show the results for Class C, Table 5 compares the results of the proposed algorithm with the conventional ML algorithms for Class B, and Table 6 does the same for Class A.
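The minimized-dataset experiment can be sketched as follows with scikit-learn. The synthetic feature matrix (standing in for the six voltage/current features Ia, Ib, Ic, Va, Vb, Vc), the binary label rule, and the choice of classifiers are illustrative assumptions, not the paper’s actual dataset or configuration; only the deliberately small training split of 231 samples mirrors the setup described above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data: 1000 samples, 6 features, one binary fault flag.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

# Deliberately tiny training split (231 samples); the rest is for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=231, random_state=0, stratify=y)

results = {}
for clf in (KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)):
    clf.fit(X_train, y_train)
    results[type(clf).__name__] = accuracy_score(y_test, clf.predict(X_test))
```

With so few training points, the gap between algorithms becomes visible in `results`, which is the effect the minimized-dataset comparison in Tables 3–6 is designed to expose.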
The evaluation results clearly demonstrate the superior performance of MTEC compared with traditional classifiers. The overall accuracy of MTEC was 0.829, significantly surpassing the accuracies of the individual models: the K-Nearest Neighbors Classifier (0.669), SVC (0.676), Random Forest Classifier (0.787), Decision Tree Classifier (0.745), AdaBoost Classifier (0.774), Gradient Boosting Classifier (0.768), and Gaussian NB (0.665). The MTEC method predicts each fault indicator with remarkable precision, recall, and F1 scores, achieving consistently high accuracy across all fault indicators: 0.881 for G, 0.997 for C, 0.949 for B, and 0.999 for A. This is accompanied by exceptional precision and recall, particularly for C, B, and A, for which the performance of the MTEC is exemplary. The classifier also exhibits outstanding specificity across all targets, indicating robust performance in minimizing false positives. These results underscore the effectiveness of MTEC in handling multi-target classification problems, providing a comprehensive and reliable solution for transmission-line fault detection.
For a more dynamic representation of the results, the evaluation metrics for the overall accuracy and Class C are visualized as a heatmap in Figure 10; the overall accuracy and Class B results are shown as a bar graph in Figure 11; and the Class G results are presented as a parallel coordinates plot in Figure 12.
The performance metrics for the ‘C’ target variable across the different ML algorithms indicate varying degrees of efficacy in fault classification. The KNeighbors Classifier and SVC exhibit moderate performance, with recalls of 0.836 and 0.829, respectively, while their accuracy and specificity are slightly higher, suggesting better identification of non-fault cases. The Random Forest Classifier demonstrates strong performance across all metrics, particularly a precision of 0.995, indicating that nearly all of its positive predictions are correct. The Decision Tree Classifier maintains good overall performance, with balanced precision and recall of 0.910 and 0.903, along with high accuracy and specificity. The AdaBoost Classifier performs well, with an accuracy of 0.957 and a precision of 0.988, complemented by a recall of 0.907. The Gradient Boosting Classifier also shows robust results, with a C Accuracy of 0.962, an almost perfect C Precision of 0.999, and a recall of 0.908. The GaussianNB classifier shows balanced performance, with accuracy, specificity, precision, and recall all in the range of 0.907–0.926. However, the Multi-Target Ensemble Classifier (MTEC) clearly outperforms all other methods: it achieves the highest metrics across the board, including a near-perfect C Accuracy of 0.997 and C Specificity of 0.999, with precision and recall of 0.999 and 0.994. These results underscore the superiority of MTEC in fault classification tasks, highlighting its effectiveness and reliability compared with traditional machine learning algorithms.
The proposed method’s performance was evaluated for each fault type, demonstrating its robustness across all scenarios. MTEC achieved high classification accuracy for single-phase-to-ground faults, double-phase faults, double-phase-to-ground faults, three-phase faults, and three-phase-to-ground faults. The results highlight the algorithm’s adaptability to various fault types, even under limited training data conditions, ensuring reliable and timely fault classification in diverse operational scenarios. By focusing on the strengths of individual classifiers tailored to specific targets, MTEC achieves higher overall accuracy and more reliable predictions across all target variables. This approach is particularly beneficial in complex multi-target problems, where the relationships between targets are minimal or non-linear. In practical implementation for transmission-line fault detection, MTEC demonstrated significant improvements in accuracy, with an overall accuracy of 0.829 and individual accuracies of 0.881 for G, 0.997 for C, 0.949 for B, and 0.999 for A.
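The MTEC structure described above — one base classifier trained per fault indicator (G, C, B, and A), with the per-target outputs combined into a single multi-target prediction — can be sketched as follows. The per-target estimator choices and the synthetic data are assumptions for illustration only, not the paper’s tuned configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

class MultiTargetEnsemble:
    """One dedicated classifier per target; outputs combined per sample."""

    def __init__(self, estimators):
        self.estimators = estimators  # dict: target name -> classifier

    def fit(self, X, Y):
        # Y maps each target name (G, C, B, A) to its binary label vector.
        for name, clf in self.estimators.items():
            clf.fit(X, Y[name])
        return self

    def predict(self, X):
        # Combine the per-target outputs into one indicator set per sample.
        return {name: clf.predict(X) for name, clf in self.estimators.items()}

# Synthetic stand-in data sized like the minimized training set.
rng = np.random.default_rng(1)
X = rng.normal(size=(231, 6))
Y = {t: (rng.random(231) < 0.5).astype(int) for t in ("G", "C", "B", "A")}

mtec = MultiTargetEnsemble({
    "G": RandomForestClassifier(random_state=0),
    "C": KNeighborsClassifier(),
    "B": DecisionTreeClassifier(random_state=0),
    "A": AdaBoostClassifier(random_state=0),
}).fit(X, Y)
preds = mtec.predict(X)
```

Tailoring the estimator to each target, rather than forcing one model to predict all four indicators jointly, is what lets the ensemble exploit per-target strengths when the relationships between targets are weak or non-linear.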

5. Conclusions

This study addresses the critical need for the rapid and accurate fault classification of transmission lines, which is essential for maintaining the reliability and stability of electrical power systems. Quick fault detection and clearance are vital for preventing equipment damage, reducing downtime, and ensuring a continuous supply of electricity. Although conventional machine learning algorithms often achieve high accuracy with large datasets owing to the sinusoidal nature of voltage and current signals, this can mask their true effectiveness.
A gap was identified in effectively handling smaller, more practical datasets while maintaining a high classification accuracy. To address this, the dataset was minimized to 231 points for training, and the remaining data were used for testing. A novel MTEC that leverages ensemble learning techniques was proposed and developed to improve the classification performance. The MTEC algorithm works by training separate classifiers for each fault indicator (G, C, B, and A) and combining their outputs to provide a comprehensive fault classification. This approach leverages the strength of ensemble learning to improve the robustness and accuracy of fault detection, particularly when dealing with reduced training datasets.
The MTEC algorithm significantly outperformed traditional machine learning methods, achieving an overall accuracy of 0.829165. This result demonstrates the superiority of MTEC in accurately classifying faults in transmission lines even with a reduced dataset size. The individual accuracies for each fault indicator—0.881, 0.997, 0.949, and 0.999 for G, C, B, and A, respectively—further highlight its robustness and effectiveness.
Future work could involve further optimization of the MTEC algorithm and exploring its applicability to different types of faults and power system configurations. Additionally, integrating real-time data processing capabilities and improving computational efficiency are crucial for practical deployment in live power systems. Furthermore, applying the methodology to other datasets and real-world measurements could validate its robustness and adaptability across diverse scenarios.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available upon request.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Muzzammel, R.; Arshad, R.; Raza, A.; Sobahi, N.; Alqasemi, U. Two Terminal Instantaneous Power-Based Fault Classification and Location Techniques for Transmission Lines. Sustainability 2023, 15, 809.
  2. Jyothula, V.R.; Purohit, Y.; Paraye, M. Fault Detection in Power Transmission Line Using Machine Learning Techniques. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; pp. 1–5.
  3. Moldovan, A.-M.; Buzdugan, M.I. Prediction of Faults Location and Type in Electrical Cables Using Artificial Neural Network. Sustainability 2023, 15, 6162.
  4. Wang, L.; Liu, H.; Dai, L.V.; Liu, Y. Novel Method for Identifying Fault Location of Mixed Lines. Energies 2018, 11, 1529.
  5. Michau, G.; Hsu, C.-C.; Fink, O. Interpretable Detection of Partial Discharge in Power Lines with Deep Learning. Sensors 2021, 21, 2154.
  6. Ali, S.; Bhargava, A.; Saxena, A.; Kumar, P. A Hybrid Marine Predator Sine Cosine Algorithm for Parameter Selection of Hybrid Active Power Filter. Mathematics 2023, 11, 598.
  7. Faza, A.; Al-Mousa, A.; Alqudah, R. Optimal PMU Placement for Fault Classification and Localization Using Enhanced Feature Selection in Machine Learning Algorithms. Int. J. Energy Res. 2024, 2024, 5543160.
  8. Sarmiento, J.L.P.; Delfino, J.C.D.V.; Arboleda, E.R. Machine Learning Advances in Transmission Line Fault Detection: A Literature Review. Int. J. Sci. Res. Arch. 2024, 12, 2880–2887.
  9. Brito Palma, L. Hybrid Approach for Detection and Diagnosis of Short-Circuit Faults in Power Transmission Lines. Energies 2024, 17, 2169.
  10. Bouaziz, F.; Masmoudi, A.; Abdelkafi, A.; Krichen, L. Applying Machine Learning Algorithms for Fault Detection and Classification in Transmission Lines. In Proceedings of the 2023 IEEE 11th International Conference on Systems and Control (ICSC), Sousse, Tunisia, 18–20 December 2023; pp. 207–212.
  11. Moldovan, A.-M.; Oltean, S.; Buzdugan, M.I. Methods of Faults Detection and Location in Electrical Systems. In Proceedings of the 2021 9th International Conference on Modern Power Systems (MPS), Cluj-Napoca, Romania, 16–17 June 2021; pp. 1–4.
  12. Tariq, R.; Alhamrouni, I.; Rehman, A.U.; Tag Eldin, E.; Shafiq, M.; Ghamry, N.A.; Hamam, H. An Optimized Solution for Fault Detection and Location in Underground Cables Based on Traveling Waves. Energies 2022, 15, 6468.
  13. Mahafzah, K.A.; Obeidat, M.A.; Mansour, A.M.; Al-Shetwi, A.Q.; Ustun, T.S. Artificial-Intelligence-Based Open-Circuit Fault Diagnosis in VSI-Fed PMSMs and a Novel Fault Recovery Method. Sustainability 2022, 14, 16504.
  14. de Alencar, G.T.; dos Santos, R.C.; Neves, A. A New Robust Approach for Fault Location in Transmission Lines Using Single Channel Independent Component Analysis. Electr. Power Syst. Res. 2023, 220, 109281.
  15. Zhou, G.; Zhang, X.; Han, M.; Filizadeh, S.; Geng, Z. Single-Ended Fault Detection Scheme Using Support Vector Machine for Multi-Terminal Direct Current Systems Based on Modular Multilevel Converter. J. Mod. Power Syst. Clean Energy 2023, 11, 990–1000.
  16. Fang, J.; Chen, K.; Liu, C.; He, J. An Explainable and Robust Method for Fault Classification and Location on Transmission Lines. IEEE Trans. Ind. Inform. 2023, 19, 10182–10191.
  17. Maduako, I.; Igwe, C.F.; Abah, J.E.; Onwuasaanya, O.E.; Chukwu, G.A.; Ezeji, F.; Okeke, F.I. Deep Learning for Component Fault Detection in Electricity Transmission Lines. J. Big Data 2022, 9, 81.
  18. Yang, N.-C.; Yang, J.-M. Fault Classification in Distribution Systems Using Deep Learning With Data Preprocessing Methods Based on Fast Dynamic Time Warping and Short-Time Fourier Transform. IEEE Access 2023, 11, 63612–63622.
  19. Teimourzadeh, H.; Moradzadeh, A.; Shoaran, M.; Mohammadi-Ivatloo, B.; Razzaghi, R. High Impedance Single-Phase Faults Diagnosis in Transmission Lines via Deep Reinforcement Learning of Transfer Functions. IEEE Access 2021, 9, 15796–15809.
  20. Al Kharusi, K.; El Haffar, A.; Mesbah, M. Fault Detection and Classification in Transmission Lines Connected to Inverter-Based Generators Using Machine Learning. Energies 2022, 15, 5475.
  21. Turanlı, O.; Benteşen Yakut, Y. Classification of Faults in Power System Transmission Lines Using Deep Learning Methods with Real, Synthetic, and Public Datasets. Appl. Sci. 2024, 14, 9590.
  22. Etukuri, S.; Siva, M.; Varma, B.R.K. Enhanced Fault Classification in Inverter-Fed Transmission Lines Using Deep Learning. Eng. Res. Express 2024, 6, 045302.
  23. Ebrahimi, H.; Golshannavaz, S.; Yazdaninejadi, A.; Pouresmaeil, E. Improving Protection Reliability of Series-Compensated Transmission Lines by a Fault Detection Method through an ML-Based Model. IET Gener. Transm. Distrib. 2024, 18, 3452–3461.
  24. Kharusi, K.A.; Haffar, A.E.; Mesbah, M. Adaptive Machine-Learning-Based Transmission Line Fault Detection and Classification Connected to Inverter-Based Generators. Energies 2023, 16, 5775.
  25. Detection of the Electrical Faults. Available online: https://kaggle.com/code/yaarvnpatr/detection-of-the-electrical-faults (accessed on 20 December 2024).
  26. Jo, S.; Oh, J.-Y.; Lee, J.; Oh, S.; Moon, H.S.; Zhang, C.; Gadh, R.; Yoon, Y.T. Hybrid Genetic Algorithm With K-Nearest Neighbors for Radial Distribution Network Reconfiguration. IEEE Trans. Smart Grid 2024, 15, 2614–2624.
  27. Xiao, W.; Sun, Y.; Li, K.; Xu, M.; Li, H.; Yu, L.; Gao, L. A Modified Forecasting Algorithm for Wind Power Based on SVM. In Proceedings of the TENCON 2015—2015 IEEE Region 10 Conference, Macao, China, 1–4 November 2015; pp. 1–5.
  28. Wang, Z.; Li, Y.; Yin, X. Visual MMC Open Circuit Fault Real-Time Rapid Detection System. IEEE Access 2023, 11, 15030–15037.
  29. Tabanelli, E.; Tagliavini, G.; Benini, L. Optimizing Random Forest-Based Inference on RISC-V MCUs at the Extreme Edge. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2022, 41, 4516–4526.
  30. Meng, L.; Bai, B.; Zhang, W.; Liu, L.; Zhang, C. Research on a Decision Tree Classification Algorithm Based on Granular Matrices. Electronics 2023, 12, 4470.
  31. Singh, U.; Rizwan, M.; Alaraj, M.; Alsaidan, I. A Machine Learning-Based Gradient Boosting Regression Approach for Wind Power Production Forecasting: A Step towards Smart Grid Environments. Energies 2021, 14, 5196.
  32. Guo, W.; Wang, G.; Wang, C.; Wang, Y. Distribution Network Topology Identification Based on Gradient Boosting Decision Tree and Attribute Weighted Naive Bayes. Energy Rep. 2023, 9, 727–736.
  33. Venkata, P.; Pandya, V. Data Mining Model and Gaussian Naive Bayes Based Fault Diagnostic Analysis of Modern Power System Networks. Mater. Today Proc. 2022, 62, 7156–7161.
Figure 1. Workflow block diagram.
Figure 2. Simulink model.
Figure 3. Missing data matrix.
Figure 4. Boxplot for phase currents.
Figure 5. Scatter plot for (a) Ia vs. Va; (b) Ia vs. Vb.
Figure 6. Histogram of Vc for non-faulted classification.
Figure 7. Voltage waveforms.
Figure 8. Pie chart for fault classification data size (a) for Class A, (b) for Class B, (c) for Class C, and (d) for Class G.
Figure 9. MTEC flowchart.
Figure 10. Evaluation metrics results heatmap.
Figure 11. Evaluation metrics results bar graph.
Figure 12. Evaluation metrics results parallel coordinates plot for G.
Table 1. Similar Work Results.

| Reference | Algorithm | Training Dataset | Accuracy |
|---|---|---|---|
| [1] | ML | 1500 | 0.97 |
| [3] | ANN | 6150 | 0.98 |
| [9] | ML | 10,000 | 0.9822 |
| [17] | Deep Learning | 1198 | 0.8285 |
| [18] | Deep Learning | 10,766 | 0.9937 |
| [19] | Deep Learning | 4900 | 0.9661 |
| [20] | ML | 2661 | 0.994 |
| [21] | Deep Learning | 16,000 | 0.99 |
| [22] | ML | 3460 | 0.9899 |
| [23] | ML | 22,680 | 0.9998 |
| [24] | ML | 2400 | 0.994 |
Table 2. Basic dataset ML evaluation.

| Algorithm | Overall Accuracy | A Accuracy | A Precision | A Recall | A F1 Score | A Specificity |
|---|---|---|---|---|---|---|
| KNeighborsClassifier | 0.810553 | 0.996821 | 1 | 0.994382 | 0.997183 | 1 |
| SVC | 0.759059 | 0.961856 | 0.982558 | 0.949438 | 0.965714 | 0.978038 |
| RandomForestClassifier | 0.883662 | 0.999364 | 1 | 0.998876 | 0.999438 | 1 |
| DecisionTreeClassifier | 0.899555 | 1 | 1 | 1 | 1 | 1 |
| AdaBoostClassifier | 0.862047 | 1 | 1 | 1 | 1 | 1 |
| GradientBoostingClassifier | 0.839797 | 0.998093 | 1 | 0.996629 | 0.998312 | 1 |
| GaussianNB | 0.666243 | 0.94342 | 0.93675 | 0.965169 | 0.950747 | 0.915081 |
Table 3. Minimized dataset evaluation for overall and class G.

| Algorithm | Overall Accuracy | G Accuracy | G Precision | G Recall | G F1 Score | G Specificity |
|---|---|---|---|---|---|---|
| KNeighborsClassifier | 0.66942 | 0.821269 | 0.782089 | 0.813163 | 0.797323 | 0.827443 |
| SVC | 0.675846 | 0.821269 | 0.781596 | 0.814073 | 0.797504 | 0.82675 |
| RandomForestClassifier | 0.787175 | 0.851429 | 0.801561 | 0.872308 | 0.835439 | 0.835528 |
| DecisionTreeClassifier | 0.745083 | 0.83753 | 0.802469 | 0.828025 | 0.815047 | 0.844768 |
| AdaBoostClassifier | 0.774456 | 0.848807 | 0.827228 | 0.821959 | 0.824585 | 0.869254 |
| GradientBoostingClassifier | 0.767899 | 0.833464 | 0.788008 | 0.841068 | 0.813674 | 0.827674 |
| GaussianNB | 0.664962 | 0.759245 | 0.685831 | 0.817713 | 0.745988 | 0.714715 |
| MTEC | 0.829165 | 0.881306 | 0.883234 | 0.844466 | 0.863415 | 0.910755 |
Table 4. Minimized dataset evaluation for class C.

| Algorithm | C Accuracy | C Precision | C Recall | C F1 Score | C Specificity |
|---|---|---|---|---|---|
| KNeighborsClassifier | 0.902439 | 0.92 | 0.835564 | 0.826603 | 0.912651 |
| SVC | 0.900341 | 0.920736 | 0.829191 | 0.827 | 0.911646 |
| RandomForestClassifier | 0.968922 | 0.994545 | 0.929573 | 0.961414 | 0.945497 |
| DecisionTreeClassifier | 0.923682 | 0.910405 | 0.903442 | 0.903764 | 0.935946 |
| AdaBoostClassifier | 0.95712 | 0.98819 | 0.906628 | 0.945528 | 0.947671 |
| GradientBoostingClassifier | 0.96171 | 0.998948 | 0.907903 | 0.930406 | 0.952217 |
| GaussianNB | 0.926174 | 0.907308 | 0.913958 | 0.917681 | 0.914185 |
| MTEC | 0.997033 | 0.998921 | 0.993562 | 0.996477 | 0.99887 |
Table 5. Minimized dataset evaluation for class B.

| Algorithm | B Accuracy | B Precision | B Recall | B F1 Score | B Specificity |
|---|---|---|---|---|---|
| KNeighborsClassifier | 0.909127 | 1 | 0.836634 | 0.911051 | 1 |
| SVC | 0.912143 | 1 | 0.842056 | 0.914256 | 1 |
| RandomForestClassifier | 0.957776 | 0.964896 | 0.958982 | 0.96193 | 0.956265 |
| DecisionTreeClassifier | 0.965644 | 0.979287 | 0.95851 | 0.968787 | 0.974586 |
| AdaBoostClassifier | 0.961185 | 0.987883 | 0.941773 | 0.964277 | 0.98552 |
| GradientBoostingClassifier | 0.956596 | 0.974521 | 0.946723 | 0.960421 | 0.968972 |
| GaussianNB | 0.954891 | 0.997956 | 0.920792 | 0.957822 | 0.997636 |
| MTEC | 0.948707 | 1 | 0.905837 | 0.950592 | 1 |
Table 6. Minimized dataset evaluation for class A.

| Algorithm | A Accuracy | A Precision | A Recall | A F1 Score | A Specificity |
|---|---|---|---|---|---|
| KNeighborsClassifier | 0.910831 | 1 | 0.844394 | 0.915633 | 1 |
| SVC | 0.922764 | 1 | 0.865217 | 0.927739 | 1 |
| RandomForestClassifier | 0.973249 | 1 | 0.953318 | 0.976101 | 1 |
| DecisionTreeClassifier | 0.972594 | 0.99737 | 0.954691 | 0.975564 | 0.996622 |
| AdaBoostClassifier | 0.97102 | 0.997363 | 0.951945 | 0.974125 | 0.996622 |
| GradientBoostingClassifier | 0.973643 | 0.997375 | 0.956522 | 0.976521 | 0.996622 |
| GaussianNB | 0.942696 | 0.936709 | 0.965217 | 0.950749 | 0.912469 |
| MTEC | 0.999152 | 1 | 0.998506 | 0.999253 | 1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
