Advanced Detection of Abnormal ECG Patterns Using an Optimized LADTree Model with Enhanced Predictive Feature: Potential Application in CKD

Binsawad, Muhammad; Khan, Bilal

doi:10.3390/a17090406

by Muhammad Binsawad ¹ and Bilal Khan ²,*

¹ Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
² Department of Computer Science, City University of Science and Information Technology, Peshawar 25000, Pakistan
* Author to whom correspondence should be addressed.

Algorithms 2024, 17(9), 406; https://doi.org/10.3390/a17090406

Submission received: 10 June 2024 / Revised: 4 September 2024 / Accepted: 6 September 2024 / Published: 11 September 2024

Abstract

Detecting abnormal ECG patterns is a crucial area of study aimed at enhancing diagnostic accuracy and enabling early identification of Chronic Kidney Disease (CKD)-related abnormalities. This study compares a novel strategy for detecting abnormal ECG patterns, based on the LADTree model, with standard machine learning (ML) models. The study design includes data collection from the MIT-BIH Arrhythmia dataset, preprocessing to address missing values, and feature selection using the CfsSubsetEval method with Best First Search, Harmony Search, and Particle Swarm Optimization Search approaches. The performance assessment consists of two scenarios, percentage splitting and K-fold cross-validation, with several evaluation measures: Kappa statistic (KS), mean absolute error (MAE), recall, precision-recall curve (PRC) area, receiver operating characteristic (ROC) area, and accuracy. In scenario 1, LADTree outperforms other ML models in terms of MAE, KS, recall, ROC area, and PRC area. Notably, the Naïve Bayes (NB) model has the lowest MAE, but the Support Vector Machine (SVM) performs poorly. In scenario 2, NB has the lowest MAE but the highest KS, recall, ROC area, and PRC area, closely followed by LADTree. Overall, the findings indicate that the LADTree model, when optimized for ECG signal data, delivers promising results in detecting abnormal ECG patterns potentially related to CKD. This study advances predictive modeling tools for identifying abnormal ECG patterns, which could enhance early detection and management of CKD, potentially leading to improved patient outcomes and healthcare practices.

Keywords: Chronic Kidney Disease (CKD); machine learning (ML); ECG signal data; LADTree model; feature selection

1. Introduction

Kidney diseases are on the rise, and millions of people worldwide are affected by them. In some cases they become fatal, placing strain on both patients and the healthcare system [1]. Effective management and treatment of kidney diseases are only possible with timely and correct diagnosis [2]. Electrocardiogram (ECG) data is valuable not only for treating and predicting cardiovascular health but also holds promise for the early identification and prediction of renal diseases [3]. Traditionally used to diagnose heart-related issues, ECG readings contain complex patterns that can reflect various physiological states. The close relationship between kidney function and circulatory dynamics is well-known, suggesting that variations in ECG signals might indicate changes related to kidney health [4].

Machine learning (ML) has recently transformed medical science and diagnostics, proving extremely successful in tasks such as classification, pattern recognition, and prediction [5]. This has prompted research into using various machine-learning algorithms for kidney disease prediction from ECG data [6]. However, novel models are still required to fully realize the promise of ECG data and improve prediction accuracy.

ECGs are valued for their affordability, accessibility, and quick results in diagnosing medical conditions. Deep learning algorithms (DLA) are now being applied to ECG analysis, offering the potential to predict outcomes, identify subclinical disorders, and reveal systemic phenotypes. CKD patients often exhibit cardiovascular risk factors and subclinical cardiac changes, including myocardial fibrosis, which may appear early in the disease. Electrolyte disorders common in CKD, such as hypokalemia and hyperkalemia, manifest as distinct ECG patterns, aiding in their detection. However, these abnormalities are subtle and may not be easily discernible by humans, making DLA-assisted screening valuable for identifying asymptomatic CKD patients who require further evaluation [7].

The need for prompt and non-invasive diagnostic tools underscores the value of detecting abnormal ECG patterns potentially linked to CKD. Traditional CKD diagnosis primarily relies on urine and blood tests, such as serum creatinine and glomerular filtration rate, which can be time-consuming and invasive and may not immediately reflect disease progression. By leveraging ECG data, which is readily accessible, this approach offers the potential for early identification of cardiac issues related to CKD, facilitating timely treatment and ongoing monitoring of individuals at risk.

This work aims to advance the detection of abnormal ECG patterns by utilizing the LADTree model. This model incorporates logistic boosting to construct an alternating decision tree, where each iteration selects a single attribute test as the splitter node. The model retains per-class weights and a functional response for each training instance. By minimizing the least-squares error between examples, the functional response is adjusted to fit the mean value of instances within a specific subset, enhancing the model’s predictive capabilities for detecting abnormal ECG patterns. To assess its performance, the LADTree is compared against several well-established ML models, including K-nearest Neighbor (KNN), Multilayer Perceptron (MLP), Naïve Bayes (NB), Support Vector Machine (SVM), and J48-Decision Tree (J48). Two kinds of assessment measures are used to evaluate these models: accuracy and error rate. Accuracy is assessed based on Recall, Precision-Recall Curve (PRC) Area, Receiver Operating Characteristic Curve (ROC) Area, and Classification Accuracy (CA), while error rates are calculated using Kappa Statistics (KS) and Mean Absolute Error (MAE). Compared with these models, LADTree’s hierarchical structure and interpretability offer significant advantages, making it well suited to capturing complex relationships within ECG data for CKD prediction.

The contribution of this study lies in its detailed exploration of improving the detection of abnormal ECG patterns potentially linked to CKD through the application of machine learning approaches, with a particular emphasis on the LADTree model. By comparing the LADTree model to traditional ML models such as MLP, KNN, SVM, NB, and J48, the study sheds light on the usefulness of various CKD prediction approaches. The work selects the most relevant and non-redundant features for improved detection of abnormal ECG patterns by collecting and preprocessing the data and then applying feature selection with multiple search approaches. Additionally, this work investigates model performance under two training and testing procedures: K-fold cross-validation and percentage splitting. Both error rate and accuracy evaluation metrics are included in the analysis: MAE (mean absolute error), recall, ROC (receiver operating characteristic curve) area, PRC (precision-recall curve) area, KS (Kappa Statistics), and CA (classification accuracy). This method offers a clear picture of the advantages and disadvantages of each model. The LADTree model regularly performs better than the other machine learning models on a range of criteria, suggesting that it might be a top choice for abnormal ECG pattern detection. Overall, this study contributes to medical diagnostics by offering a method that uses ECG signal data for abnormal ECG signal prediction and by illuminating the relative efficacy of several ML models in this domain.

The subsequent sections of this paper are structured as follows: Section 2 presents the literature review, and Section 3 outlines the research design and methodology. Section 4 presents the analysis and discussion of the results, whereas Section 5 concludes the study.

2. Literature Study

Advanced artificial intelligence (AI) and ML techniques have greatly improved modern healthcare systems by enabling more accurate patient analysis and customized treatments. Revathy et al.’s work [8] addressed CKD by analyzing and predicting the condition using a variety of ML models. They conducted a comparative study based on accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC), and they gave a thorough description of these models. Their research demonstrated the potential of predictive models to enhance healthcare outcomes and identified the most effective model for CKD prediction. In their conclusion, they emphasized the role that machine learning plays in early CKD prediction.

E.P.B. Multia et al. found that renal illness may be anticipated through the use of classification algorithms to analyze patients’ ECG data. According to recent research, people with kidney illness frequently encounter cardiac problems called cardiorenal syndrome (CRS), which can result in unexpected cardiac death [9]. ML models can be used for patients with cardiovascular issues to assess whether cardiovascular illness and CRS are affecting their kidneys [10]. Another study by Nusinovici et al. examined the effectiveness of several machine learning models in forecasting hypertension (HTN), diabetes (DM), cardiovascular disease (CVD), and CKD using fundamental clinical markers [11]. Using data from 6762 Asian people, they compared five machine learning models, namely neural networks (NN), SVM, random forests (RF), gradient boosting machines, and k-nearest neighbor, with logistic regression (LR). According to their findings, LR was more successful at predicting CKD and DM, whereas NN and SVM were more successful at predicting CVD and HTN.

In the field of healthcare, clinical disorders must be recognized, prevented, and treated while simultaneously looking for economical and effective remedies. ML is an essential tool for accomplishing these objectives. Healthcare uses a variety of data types, including clinical and claims data. ML approaches analyze the links between diseases and the tests used to identify them by comparing various diagnostic procedures for each disease to their clinical relevance [12]. Patients with early-stage CKD might halt or even reverse the condition’s course by taking the appropriate medical measures. In one such study, the digitized ECG data were obtained from the Physionet Database (www.physionet.org, accessed on 30 March 2024), combining the publicly accessible Physikalisch-Technische Bundesanstalt (PTB) (for renal patients) and Fantasia (for healthy persons) databases. To assess the framework, more data were obtained from the same source. Testing produced accurate findings, differentiating between people who had kidney disease and those who did not. Using both QT and RR interval values, the research outperformed the accuracy attained with a single feature, achieving a precision of 97.6% [13,14]. In a different study, CKD patients were classified from a dataset using seven machine learning algorithms: NBTree, SVM, J48, MLP, LR, NB, and CHIRP. With an accuracy of 99.75%, CHIRP performed better than the other techniques, proving its usefulness in the early identification of CKD [15]. Density-based function selection (DFS) and ant colony-based optimization (D-ACO) were used by Elhoseny et al. [16] to provide an integrated CKD healthcare system. When used in conjunction with DFS to remove redundant features, the D-ACO method greatly increased classification accuracy on a benchmark CKD dataset compared to earlier techniques.

An effective method for the Kidney Disease Outcomes Quality Initiative (KDOQI) was developed by a team of internists and nephrologists to help primary care physicians diagnose and treat CKD, which is defined as having a glomerular filtration rate (GFR) of less than 60 mL/min/1.73 m² and/or signs of renal impairment for at least three months [17]. The estimated glomerular filtration rate (eGFR), which is mostly dependent on blood creatinine concentration, and the urine albumin-creatinine ratio (UACR) are the two most often utilized CKD tests in clinical settings. Testing for albuminuria and eGFR is advised for those with diabetes and/or hypertension, but not for the general public [18].

3. Research Design and Procedure

This study aims to enhance the detection of abnormal ECG patterns, known to be associated with CKD, by employing the LADTree model with enhanced predictive features. The LADTree model is compared to several popular machine learning models, including MLP, KNN, SVM, NB, and J48, using two distinct types of assessment measures: error rate and accuracy evaluation. The error rate is measured using KS and MAE, whereas accuracy is determined using recall, ROC area, PRC area, and CA. Figure 1 shows the entire procedure of this investigation, which is detailed in the following subsections.

3.1. Data Acquisition and Preprocessing

This investigation focused on a dataset from Data Hub that includes features derived from the MIT-BIH Arrhythmia dataset (Physionet), a two-lead ECG signal consisting of leads II and V. The dataset comprises 279 attributes across sixteen classes. Table 1 shows the statistics for each of the specified attributes, whereas Table 2 illustrates the classifications. The MIT-BIH Arrhythmia Dataset provides a range of ECG recordings that give meaningful information on cardiac abnormalities, which justifies its use here. Although the dataset primarily focuses on arrhythmias, the ECG patterns displayed in these recordings reflect the cardiac issues frequently observed in individuals with CKD, such as left ventricular hypertrophy and alterations in heart rate variability. Numerous studies have shown a connection between renal function and ECG findings, indicating that certain cardiac irregularities and arrhythmias may be indicators of the progression of chronic kidney disease [19,20]. Furthermore, the MIT-BIH dataset facilitates cross-study and technique comparisons by acting as a recognized standard in the field of cardiovascular research. The study investigates the association between ECG features and cardiac anomalies using this dataset, thereby supporting the hypothesis that ECG analysis might help predict and detect kidney disorders early.

Since a large ECG dataset from CKD patients is difficult to acquire, we instead used the publicly available MIT-BIH Arrhythmia dataset, which contains normal and abnormal ECG signals with known cardiac anomalies. This dataset is useful for CKD research because of the well-established associations between certain cardiac abnormalities and CKD, even though Table 1 lists distinct cardiac disorders rather than renal problems. Research has demonstrated that people with CKD often have characteristic cardiac problems, including left ventricular hypertrophy (LVH), reduced heart rate variability, and certain ECG abnormalities, such as longer QT intervals and peaked T waves as a result of hyperkalemia. These cardiac anomalies, which are included in the MIT-BIH dataset, offer important information about the cardiovascular consequences of chronic kidney disease. The extensive ECG recordings in the dataset enable researchers to examine trends and characteristics that could point to the development of CKD.

Since ECG can reveal critical cardiovascular issues related to kidney dysfunction, there is a notable correlation between ECG data and kidney diseases, particularly CKD. For instance, hyperkalemia, a common condition in CKD, can cause distinctive ECG changes such as peaked T waves and widened QRS complexes, potentially indicating severe arrhythmias. Additionally, CKD-induced autonomic nervous system dysfunction often leads to reduced heart rate variability, detectable via ECG. ECG can also identify LVH, which is prevalent among CKD patients, underscoring the importance of cardiovascular risk management. Moreover, due to rapid fluid and electrolyte shifts in dialysis patients, continuous ECG monitoring is crucial for detecting arrhythmias. Certain ECG patterns, such as prolonged QT intervals, have been linked to CKD progression and increased mortality risk. Consequently, integrating ECG monitoring into CKD management can enhance patient outcomes and facilitate earlier detection of complications.

The preprocessing approach consists of two phases. In the first phase, the data, which did not initially have a structured format, was converted into one. The second phase addressed missing values. For this purpose, mean imputation was used, which substitutes missing values with the mean (average) of the observed values in the associated feature (column). Mean imputation is a common method for handling missing data, especially numerical data, and may be formally described using the following equations:

To compute the mean of observed values within a column, the summation of all non-missing values is divided by the count of those values, denoted by n:

μ = \frac{\sum_{i = 1}^{n} x_{i}}{n}

(1)

Here, μ signifies the mean, x_i denotes each observed value, and n represents the number of observed values, excluding missing entries. Subsequently, to replace missing values with the computed mean, the imputed values (x_imputed) for the missing data points are set equal to the calculated mean (μ):

x_{imputed} = μ

(2)

This approach efficiently handles missing data by substituting them with a representative measure derived from existing observed values within the dataset.
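Equations (1) and (2) can be illustrated with a minimal sketch. The function name `mean_impute` and the use of `None` to mark a missing entry are assumptions made for this example only; they are not taken from the study's preprocessing code.

```python
def mean_impute(column):
    """Replace missing entries (None, an assumed marker) with the column mean."""
    # Equation (1): mean over the n observed (non-missing) values only.
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    mu = sum(observed) / len(observed)
    # Equation (2): every missing entry is set to mu.
    return [mu if x is None else x for x in column]


# One hypothetical feature column with a single missing value.
print(mean_impute([1.0, None, 3.0]))
```

Note that n counts only the observed values, so the missing entries do not pull the mean toward zero.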

3.2. Feature Selection

The dataset initially comprises 279 features, which poses a significant challenge for machine learning or data mining tasks due to its complexity. To tackle this, the CfsSubsetEval method was employed as an attribute evaluator, a common technique used to identify the most important features [22]. This method evaluates the importance of a set of features by considering both their predictive capability and the level of redundancy among them. The evaluation involves two key aspects:

Individual Predictive Power (IP) quantifies the predictive strength of each feature in isolation, that is, how well it can predict the target variable on its own. Several measures may be used for this, depending on the specifics of the problem. Common metrics for classification tasks include Mutual Information, Chi-square, and Information Gain. Correlation coefficients and the coefficient of determination (R²) can be applied to regression tasks. For simplicity, assume a classification task that uses Information Gain (IG). Information Gain quantifies the decrease in uncertainty, or entropy, when a feature is used to divide the data:

IP(i) = IG(i) = H(Y) - H(Y \mid X_{i})

(3)

where H(Y) is the entropy of the target variable Y, and H(Y \mid X_{i}) is the conditional entropy of Y given feature X_{i}.
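As an illustration of Equation (3), the following sketch computes Information Gain for a discrete feature. It assumes that feature values are categorical (continuous ECG attributes would first need to be discretized), and the function names are chosen for this example only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(Y) - H(Y | X) for one discrete feature X."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy within each feature value, weighted by frequency.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# A feature that separates the two classes perfectly removes all uncertainty.
print(information_gain(["a", "a", "b", "b"], [0, 0, 1, 1]))
```

A feature whose values are spread evenly over both classes would yield an Information Gain of zero.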

Redundancy (R) measures the similarity or duplication between pairs of features. Including two highly similar features in the model frequently reduces its efficacy. Redundancy can be quantified with Pearson’s correlation coefficient for continuous features and with other similarity measures for categorical features.

For continuous features, Pearson’s correlation coefficient is used:

R(i, j) = \frac{cov(X_{i}, X_{j})}{σ_{X_{i}} σ_{X_{j}}}

(4)

where cov(X_{i}, X_{j}) is the covariance between features X_{i} and X_{j}, and σ_{X_{i}} and σ_{X_{j}} are the standard deviations of X_{i} and X_{j}, respectively.
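Equation (4) can be sketched in a few lines. The function name `pearson` is chosen for this example; the sketch assumes both features have non-zero variance, since the coefficient is undefined otherwise.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Two hypothetical features where one is an exact multiple of the other.
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Values close to +1 or -1 indicate that two features carry largely the same information, which is what the redundancy term penalizes.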

Given a subset of features denoted as S, the Total Predictive Power (TP) of this subset is:

TP(S) = \sum_{i \in S} IP(i)

(5)

Total Redundancy (TR) within S:

TR(S) = \sum_{i, j \in S, i \neq j} R(i, j)

(6)

To strike a balance between predictive power and redundancy, a scoring function called the Consistency Score is employed. This score is calculated as the ratio of TP to the square root of the sum of TR and TP within the subset:

Consistency(S) = \frac{TP(S)}{\sqrt{TR(S) + TP(S)}}

(7)

By incorporating both predictive power and redundancy metrics, the Consistency Score facilitates the selection of feature subsets that offer valuable data for predictive modeling while minimizing redundancy. This iterative process of evaluating feature subsets based on their Consistency Scores aids in identifying the most informative and non-redundant set of features for subsequent machine learning or data mining tasks.

CfsSubsetEval finds the subset S that maximizes the Consistency score by applying sophisticated searching strategies such as evolutionary algorithms and greedy forward selection. By removing unnecessary data, this technique guarantees that the selected characteristics provide insightful information to the prediction model.
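The score and a greedy forward search over it can be sketched as follows. This is only an illustration of the formulas as stated in this section, not the Weka CfsSubsetEval implementation. Two assumptions are made: the absolute value of the correlation is used as the redundancy, so that the term under the square root cannot become negative, and the search stops as soon as no remaining feature improves the score. The inputs `ip` and `r` are hypothetical.

```python
import math


def consistency(subset, ip, r):
    """TP(S) / sqrt(TR(S) + TP(S)) for a list of feature indices."""
    tp = sum(ip[i] for i in subset)
    # Sum over ordered pairs i != j; abs() is an assumption of this sketch.
    tr = sum(abs(r[i][j]) for i in subset for j in subset if i != j)
    denom = math.sqrt(tr + tp)
    return tp / denom if denom > 0 else 0.0


def greedy_forward(ip, r):
    """Add one feature at a time while the score keeps improving."""
    selected, best = [], 0.0
    remaining = set(range(len(ip)))
    while remaining:
        cand = max(remaining, key=lambda i: consistency(selected + [i], ip, r))
        score = consistency(selected + [cand], ip, r)
        if score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best


# Hypothetical inputs: features 0 and 1 are strong but nearly duplicates,
# feature 2 is weak but independent of the other two.
ip = [0.9, 0.8, 0.1]
r = [[1.0, 0.95, 0.0], [0.95, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(greedy_forward(ip, r))
```

In this toy example the search keeps features 0 and 2 and leaves out feature 1, whose predictive power is outweighed by its redundancy with feature 0.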

Particle Swarm Optimization (PSO), Best First (BF), and Harmony Search (HS) are the three different search techniques used by CfsSubsetEval. These feature selection techniques were used to extensively examine and determine which ECG features were most helpful in CKD prediction. Every approach offers a different viewpoint on feature selection. PSO, as a population-based optimization strategy, allows dynamic feature subset creation; BF methodically adds or eliminates features to optimize a scoring function; and HS offers an element of unpredictability while repeatedly refining feature subsets. The objective was to thoroughly investigate the feature space, identify the most pertinent characteristics for improved CKD prediction with ECG data, reduce redundancy, and finally raise the accuracy and performance of the model.

a. Harmony Search: With CfsSubsetEval, attributes are chosen via Harmony Search (HS), a heuristic optimization approach. It starts from arbitrary feature subsets and improves them iteratively. Each subset is scored with the consistency score (CS) (see Equation (7)), where the degree of redundancy in subset S is measured by TR(S) and the predictive capacity is assessed by TP(S). In each cycle, new harmonies (feature subsets) are generated by combining existing ones, with small adjustments made to introduce variation. A new harmony is kept if it improves the CS. The procedure repeats for a set number of rounds or until the convergence criteria are met. In the end, the algorithm selects the feature subset with the highest CS.

b. Best First Search: Using CfsSubsetEval and Best First (BF) Search, feature selection is improved by repeatedly adding or removing features to increase the CS:

Let S be an empty feature subset to begin with. Compute CS(S) using the scoring function. Assess possible additions and removals of features: a feature is added if adding it improves CS(S), and a feature is removed from the subset if removing it improves CS(S). This procedure is repeated until a stopping criterion is satisfied. In the end, the feature subset with the highest CS is chosen.

TP(S) measures predictive power and TR(S) measures redundancy in the subset S, as defined in Equations (5) and (6).

c. Particle Swarm Optimization Search: Particle Swarm Optimization (PSO) for attribute selection in CfsSubsetEval works by creating a population of candidate feature subsets. Particles (the subsets) adjust their positions (the selected features) based on their velocity and on the best positions found so far. To maximize the CS, TP and TR must be balanced (see Equation (7)):

Particles update positions and velocities following:

v_i(t + 1) = w \cdot v_i(t) + c_1 \cdot rand_1 \cdot (pbest_i - x_i(t)) + c_2 \cdot rand_2 \cdot (gbest - x_i(t))

(8)

x_i (t + 1) = x_i (t) + v_i (t + 1)

(9)

After iterations, select the feature subset with the highest CS score, optimizing attribute selection.
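A single update of Equations (8) and (9) can be sketched as below. In the standard formulation, w is the inertia weight, c1 and c2 are acceleration coefficients, rand1 and rand2 are uniform random numbers in [0, 1], pbest_i is the best position found by particle i, and gbest is the best position found by the swarm. The default parameter values and the choice to draw fresh random numbers per dimension are assumptions of this sketch; mapping continuous positions to feature subsets (for example by thresholding) is not shown.

```python
import random


def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One velocity and position update for a single particle."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        # Equation (8): inertia + pull toward personal best + pull toward global best.
        vel = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        new_v.append(vel)
        # Equation (9): move the particle by its new velocity.
        new_x.append(xi + vel)
    return new_x, new_v


# A particle already sitting on both best positions only keeps its inertia term.
x = [1.0, 2.0]
print(pso_step(x, [0.5, -0.5], pbest=x, gbest=x))
```

When the particle coincides with both best positions, the two attraction terms vanish and the velocity simply decays by the factor w.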

Table 2 shows that each of these search methods found a different feature group. We next applied a final operation that chooses the subset of features by majority vote. We keep only those features that yield additional information for the purpose of the study and are selected by two or all three search strategies. This operation is expressed mathematically as follows:

Let F represent the collection of all possible features.
F₁, F₂, and F₃ are the feature subsets determined by the three distinct feature selection algorithms.
F_selected is the final subset of features selected using the majority vote method.

The final operation is given by the following equation:

F_{selected} = \{ f \in F : | \{ i \in \{1, 2, 3\} : f \in F_{i} \} | \geq 2 \}

(10)

This equation keeps every feature that appears in at least two of the three feature subsets produced by the different search strategies. It is a majority vote: the final subset contains the features selected by the majority of the approaches.
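Equation (10) translates directly into a small set operation. The sketch below uses hypothetical feature names; the actual subsets found by the three search methods are not reproduced here.

```python
from collections import Counter


def majority_vote(*subsets, min_votes=2):
    """Keep features that appear in at least min_votes of the given subsets."""
    # set(s) makes sure a feature is counted at most once per subset.
    counts = Counter(f for s in subsets for f in set(s))
    return {f for f, c in counts.items() if c >= min_votes}


# Hypothetical outputs of the three search strategies.
f1 = {"a", "b"}
f2 = {"b", "c"}
f3 = {"b", "c", "d"}
print(sorted(majority_vote(f1, f2, f3)))
```

Here "b" is chosen by all three methods and "c" by two, so both are kept, while "a" and "d" are dropped.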

3.3. ML Models, Training, and Performance Evaluation

This study seeks to enhance the accuracy of detecting abnormal ECG patterns using the LADTree model and ECG signal data. Table 3 compares the derived prediction models to many traditional machine learning methods. There are two approaches to training and evaluation. First, the percentage splitting strategy is used, with 70% of the dataset dedicated to training each model and the remaining 30% to testing. Second, a K-fold cross-validation approach is used, with K equal to 10. This approach enables a thorough evaluation of the models’ performance by systematically partitioning the dataset into subsets for training and testing, revealing insights into their prediction capabilities under different validation techniques. Table 4 lists the benchmark models used in this study.
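The two evaluation scenarios can be sketched as index partitions. The function names, the fixed random seed, and the sample count of 100 are assumptions made for this illustration; whether the study shuffled or stratified the data before splitting is not stated, so the sketch uses a plain shuffle.

```python
import random


def percentage_split(n, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n-1 and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def k_fold(n, k=10, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


train, test = percentage_split(100)
print(len(train), len(test))
print(sum(1 for _ in k_fold(100, k=10)))
```

In the cross-validation scenario every instance is used for testing exactly once, and the reported metric is typically the average over the K folds.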

Standard measures, namely recall, the Classification Accuracy (CA) [23,24], the Receiver Operating Characteristic (ROC) area, the Precision-Recall Curve (PRC) area [25,26], the Kappa Statistics (KS) [27], and the Mean Absolute Error (MAE) [28,29], are used to evaluate the performance. Several of these metrics are obtained from the confusion matrix produced by the machine learning model, whose four values are True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN).

Recall = \frac{TP}{TP + FN}

(11)

CA = \frac{TP + TN}{TP + FP + TN + FN}

(12)

KS = \frac{P_{o} - P_{e}}{1 - P_{e}}

(13)

In this equation, P_o (Observed Agreement) is the proportion of cases in which the predicted and actual classifications match, whereas P_e (Expected Agreement by Chance) is the proportion of cases in which agreement would be expected by chance alone.

MAE = \frac{\sum_{i = 1}^{n} | y_{i} - x_{i} |}{n}

(14)

In this case, n is the number of samples, x_i is the true (actual) label or value, and y_i is the predicted label or value.
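Equations (11) to (14) can be checked with a short sketch for the binary case. The confusion-matrix counts below are hypothetical, and the chance agreement P_e is computed from the row and column totals in the usual way for Cohen's kappa, which is an assumption about how the statistic was obtained in the study.

```python
def recall(tp, fn):
    return tp / (tp + fn)


def classification_accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def kappa(tp, fp, tn, fn):
    """Kappa statistic for a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    p_o = (tp + tn) / n
    # Chance agreement from the marginals: predicted-positive * actual-positive
    # plus predicted-negative * actual-negative, divided by n squared.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def mean_absolute_error(actual, predicted):
    return sum(abs(y - x) for x, y in zip(actual, predicted)) / len(actual)


# Hypothetical counts: 40 TP, 10 FP, 45 TN, 5 FN.
print(recall(40, 5))
print(classification_accuracy(40, 10, 45, 5))
print(kappa(40, 10, 45, 5))
print(mean_absolute_error([1, 0, 1, 1], [1, 1, 1, 0]))
```

With these counts the observed agreement is 0.85 and the chance agreement is 0.5, so kappa discounts the accuracy for the agreement that would arise by chance.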

Receiver Operating Characteristic (ROC) Area: The ROC area is the area under the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR) at different decision thresholds. It is usually computed by numerical integration or with software libraries.

Precision-Recall Curve (PRC) Area: The PRC area is the area under the precision-recall curve, which plots Precision against Recall at different decision thresholds. It is likewise computed by numerical integration or with software library tools.
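The numerical integration mentioned above can be sketched for the ROC area with the trapezoidal rule. The function name `roc_area`, the binary 0/1 labels, and the example scores are assumptions of this illustration; the PRC area can be obtained in the same way from precision and recall points.

```python
def roc_area(labels, scores):
    """Area under the ROC curve for 0/1 labels, by the trapezoidal rule."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]  # (FPR, TPR) pairs, starting at the origin
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold past every instance that shares this score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores: one negative instance is ranked above one positive.
print(roc_area([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

The result equals the fraction of positive-negative pairs in which the positive instance receives the higher score, which is three out of four in this example.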

Table 4. List of the Models Employed in this Study for Comparison with the Proposed Model.

ML Models	References
Multilayer Perceptron (MLP)	[30,31,32]
K-Nearest Neighbor (KNN)	[31,32,33]
Support Vector Machine (SVM)	[34,35,36]
Naïve Bayes (NB)	[31,37,38]
J48-Decision Tree (J48)	[29,32,39]

3.4. Classification Task Definition

In this work, the classification aim is to use ECG signal data to identify abnormal ECG patterns. The objective is to develop a model that can accurately categorize ECG signals into patterns indicative of potential CKD and non-CKD categories. Traditional machine learning models, such as NB, SVM, MLP, KNN, and J48, are contrasted with the LADTree model-based categorization approach. To guarantee the robustness and trustworthiness of the findings, these models are evaluated under two distinct scenarios: percentage splitting and K-fold cross-validation.

3.5. Proposed Methodology (LADTree)

The Least Absolute Deviation (LAD) method is used to set the error threshold when creating regression trees, whereas Logical Analysis of Data, which shares the same acronym, provides an alternative classification methodology from the optimization literature. In LAD, a binary classifier is created by learning logical expressions that successfully differentiate positive and negative samples in a dataset [40,41]. The LAD model’s central premise is that a binary point covered by positive patterns but not negative ones is considered positive, and vice versa for negative patterns. A LAD model is built by creating a wide collection of patterns and choosing a subset that matches these assumptions while ensuring that each pattern meets particular prevalence and homogeneity requirements. LADTree builds on this idea: it learns logical expressions to differentiate between positive and negative samples, employs a logistic boosting strategy to generate a multiclass alternating decision tree, and can therefore handle more than two classes. It also performs additive logistic regression [42,43]. The overall procedure of the LADTree is presented in Algorithm 1.

Algorithm 1: LADTree

Input:
  - Dataset: A set of instances with features and class labels
Output:
  - Decision tree model for classification
1. Start
2. Check if the stopping criteria for tree construction are met for the current dataset.
3. If stopping criteria are met:
   a. Create a leaf node for the current dataset containing the majority class.
   b. Return the created leaf node.
4. Else:
   a. Find the best split for the current dataset using logistic regression.
   b. If no optimal split is found:
      i. Create a leaf node for the current dataset containing the majority class.
      ii. Return the created leaf node.
   c. Else:
      i. Split the dataset into left and right subsets based on the best split.
      ii. Recursively apply the LADTree algorithm to the left and right subsets.
      iii. Create a decision node with the best split and its corresponding child nodes.
      iv. Return the created decision node.
5. End

Algorithm 1 describes how to build a decision tree model for classification using the LADTree technique. It begins by checking whether the stopping criteria for tree building are met for the current dataset. If they are, the algorithm generates a leaf node with the majority class and returns it. Otherwise, it applies logistic regression to determine the best split for the dataset. If no suitable split is found, a leaf node with the majority class is generated and returned. If a split is found, the dataset is partitioned into left and right subsets according to this split. The LADTree method is then applied recursively to each subset, resulting in decision nodes that hold the best split and their child nodes. This procedure continues until the stopping criteria are met for every subset, resulting in a decision tree model for classification. The tree-stopping criteria, finding the best split, splitting the dataset, and calculating deviance are presented in Algorithm 2:

Algorithm 2: Helper functions used by LADTree

Function: StoppingCriteria(Dataset)
1. Determine the stopping criteria for tree construction (e.g., maximum depth, minimum samples per node).
2. Return true if the stopping criteria are met; otherwise, return false.

Function: FindBestSplit(Dataset)
1. Initialize best_split as null and best_deviance as infinity.
2. For each feature in the dataset:
a. For each value in the feature:
i. Split the dataset into left and right subsets.
ii. Calculate the deviance using logistic regression.
iii. If the calculated deviance is less than the best_deviance:
A. Update best_deviance with the calculated deviance.
B. Update best_split with the current feature and value.
3. Return the best_split.

Function: SplitDataset(Dataset, feature, value)
1. Initialize left_subset and right_subset as empty subsets.
2. For each instance in the dataset:
a. If the feature value of the instance is less than or equal to the given value:
i. Add the instance to the left_subset.
b. Else:
i. Add the instance to the right_subset.
3. Return left_subset and right_subset.

Function: CalculateDeviance(left_subset, right_subset)
1. Calculate the deviance using logistic regression models based on the given subsets.
2. Return the calculated deviance.

These functions are the building blocks of the LADTree technique, which is used to build logistic regression-based decision trees. The “StoppingCriteria” function defines the conditions under which tree construction stops, such as reaching a maximum depth or falling below a minimum number of samples per node. It returns true if these conditions are met, meaning that no further splitting of the dataset is required. The “FindBestSplit” function searches for the split that best separates the classes. It iterates over every feature and value combination, splits the dataset, and computes the deviance with logistic regression. The split with the lowest deviance is returned as the best split. The “SplitDataset” function partitions the dataset into left and right subsets for a given feature and value. Instances whose feature value is less than or equal to the given value are placed in the left subset, and those with higher values are placed in the right subset. Both subsets are returned for further processing. Finally, the “CalculateDeviance” function calculates the deviance of logistic regression models fitted to the subsets created by the “SplitDataset” function. The deviance quantifies the difference between observed and expected class probabilities, so a lower value indicates a better split. Together, these functions support the recursive construction of a decision tree with the LADTree algorithm, locating good splits while respecting the stopping criteria.
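The procedure in Algorithms 1 and 2 can be sketched in a few lines of Python. This is a minimal sketch of the pseudocode as written, not the boosted alternating-tree implementation found in machine learning toolkits. It assumes binary labels (0/1), numeric features, and a deviance computed from an intercept-only logistic model in each subset; the default values of `max_depth` and `min_samples` and the extra "node is pure" stopping condition are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Tuple[List[float], int]  # (feature values, binary class label 0/1)


def binomial_deviance(rows: List[Row]) -> float:
    """Deviance of an intercept-only logistic model fitted to the rows (assumption)."""
    n = len(rows)
    if n == 0:
        return 0.0
    p = sum(label for _, label in rows) / n
    if p in (0.0, 1.0):
        return 0.0  # a pure subset is fitted perfectly
    return -2.0 * sum(math.log(p if label == 1 else 1.0 - p) for _, label in rows)


def split_dataset(rows: List[Row], feature: int, value: float):
    """SplitDataset: values <= threshold go left, larger values go right."""
    left = [r for r in rows if r[0][feature] <= value]
    right = [r for r in rows if r[0][feature] > value]
    return left, right


def find_best_split(rows: List[Row]) -> Optional[Tuple[int, float]]:
    """FindBestSplit: try every feature/value pair and keep the lowest deviance."""
    best_split, best_deviance = None, math.inf
    for feature in range(len(rows[0][0])):
        for value in sorted({r[0][feature] for r in rows}):
            left, right = split_dataset(rows, feature, value)
            if not left or not right:
                continue  # a split must produce two non-empty subsets
            deviance = binomial_deviance(left) + binomial_deviance(right)
            if deviance < best_deviance:
                best_split, best_deviance = (feature, value), deviance
    return best_split


def stopping_criteria(rows: List[Row], depth: int,
                      max_depth: int = 3, min_samples: int = 2) -> bool:
    """StoppingCriteria: depth limit, too few samples, or a pure node (assumed)."""
    pure = len({label for _, label in rows}) == 1
    return depth >= max_depth or len(rows) < min_samples or pure


def majority_class(rows: List[Row]) -> int:
    return Counter(label for _, label in rows).most_common(1)[0][0]


def build_tree(rows: List[Row], depth: int = 0) -> Dict:
    """Algorithm 1: recursive construction of leaf and decision nodes."""
    if stopping_criteria(rows, depth):
        return {"leaf": majority_class(rows)}
    split = find_best_split(rows)
    if split is None:
        return {"leaf": majority_class(rows)}
    feature, value = split
    left, right = split_dataset(rows, feature, value)
    return {"feature": feature, "value": value,
            "left": build_tree(left, depth + 1),
            "right": build_tree(right, depth + 1)}


def predict(tree: Dict, features: List[float]) -> int:
    """Follow decision nodes until a leaf is reached."""
    while "leaf" not in tree:
        branch = "left" if features[tree["feature"]] <= tree["value"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Toy data with one feature; the classes separate at a threshold of 2.0.
toy_rows: List[Row] = [([1.0], 0), ([2.0], 0), ([3.0], 1), ([4.0], 1)]
toy_tree = build_tree(toy_rows)
```

On the toy data the exhaustive search selects the threshold 2.0 on the only feature, because it is the one candidate that leaves both subsets pure and therefore gives a total deviance of zero.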

4. Results Analysis and Discussion

This study focuses on detecting abnormal ECG patterns using the LADTree model. The efficacy of the proposed LADTree model is compared to traditional ML models using error rate and accuracy analysis as benchmarks. Training and testing procedures are carried out in two different ways: scenario 1 involves percentage splitting, in which the dataset is partitioned into training and testing sets based on a specified percentage, and scenario 2 uses K-fold cross-validation, in which the dataset is divided into K subsets for training and testing iteratively. These different criteria enable a thorough evaluation and comparison of the LADTree model’s performance with known ML techniques in the context of CKD prediction using ECG data.
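The two evaluation protocols can be illustrated with index-based helpers. The sketch below is an assumption-laden illustration rather than the tooling used in the study: the function names, the fixed random seed, the shuffling step, and the rounding of the training-set size are all hypothetical choices, and the instance count of 452 is simply the sum of the entries in Table 1, assuming every instance is retained after preprocessing.

```python
import random
from typing import List, Tuple


def percentage_split(n_instances: int, train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the instance indices and cut them into training and testing parts."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    cut = round(n_instances * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(n_instances: int, k: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; each instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    pairs = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        pairs.append((train, folds[i]))
    return pairs


# 452 = total number of entries listed in Table 1 (assumed to be the dataset size).
train_idx, test_idx = percentage_split(452, train_fraction=0.7)
cv_pairs = k_fold_indices(452, k=10)
```

Percentage splitting produces a single estimate that depends on which instances fall into the test part, whereas K-fold cross-validation averages over K test folds, which is one plausible reason the reported figures differ between the two scenarios.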

4.1. Scenario 1

In this scenario, the models are trained and tested using percentage splitting: 70% of the data is used for training and the remaining 30% for testing. Figure 2 shows the MAE values of the ML models used to detect abnormal ECG patterns from ECG signal data. The NB model has the lowest MAE (0.048), that is, the smallest average prediction error among the models. In contrast, the SVM model has the highest MAE (0.132), indicating substantially weaker predictive ability. The MLP and LADTree models follow NB with MAE values of 0.057 and 0.059, respectively, while the J48 decision tree and KNN models have MAE values of 0.065 and 0.073.
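The article does not spell out how MAE is computed for a classifier. One common convention, assumed here, compares each predicted class-probability vector with the one-hot encoding of the true class and averages the absolute differences over classes and instances. The sketch below follows that assumption; the function name and the toy probabilities are illustrative only.

```python
from typing import List, Sequence


def classification_mae(prob_rows: Sequence[Sequence[float]],
                       true_labels: Sequence[int]) -> float:
    """Mean absolute difference between predicted probabilities and one-hot targets."""
    total = 0.0
    for probs, label in zip(prob_rows, true_labels):
        # Absolute error per class, averaged over the number of classes.
        per_class = [abs(p - (1.0 if c == label else 0.0)) for c, p in enumerate(probs)]
        total += sum(per_class) / len(probs)
    return total / len(true_labels)


# Two instances, two classes; both instances truly belong to class 0.
toy_probs: List[List[float]] = [[0.8, 0.2], [0.4, 0.6]]
toy_mae = classification_mae(toy_probs, [0, 0])
```

Under this definition the error is spread over all classes, so values are small when there are many classes, and a model that outputs confident probabilities can have a low MAE without having the highest accuracy. That would be consistent with NB having the lowest MAE while LADTree has the highest accuracy, although this reading rests on the assumed definition.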

Figure 3 shows the KS values for the ML models used to predict abnormal ECG patterns from ECG signal data. The Kappa Statistic measures the agreement between predicted and observed classes for categorical outcomes, corrected for chance, with values closer to one indicating better agreement and classification performance. In this scenario, the LADTree model has the highest KS value (0.634), indicating the best agreement between predicted and observed classes among the models tested. The MLP model follows with a KS value of 0.585, and the NB model also has a comparatively high KS value of 0.553. The J48 decision tree model shows moderate agreement with a KS value of 0.466, while the KNN and SVM models have the lowest KS values, 0.315 and 0.314, respectively. Overall, the analysis indicates that the LADTree, MLP, and NB models outperform the others in classification performance for CKD prediction, whereas the KNN and SVM models perform poorly according to the Kappa Statistic.
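For reference, Cohen's kappa is defined as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the class frequencies of the actual and predicted labels. The sketch below computes it from two label lists; the function name and the toy labels are illustrative, and returning NaN when p_e equals 1 is a choice made for this sketch.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Chance-corrected agreement between actual and predicted class labels."""
    n = len(actual)
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Expected agreement if predictions were independent of the actual labels.
    p_e = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when only one class ever occurs
    return (p_o - p_e) / (1.0 - p_e)


# Three of four predictions are correct: p_o = 0.75, p_e = 0.5, kappa = 0.5.
toy_kappa = cohen_kappa([1, 1, 1, 0], [1, 1, 0, 0])
```

Because p_e grows when one class dominates, a model can reach a reasonable accuracy on an imbalanced dataset and still obtain a modest kappa, which is why the two measures are reported separately.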

Figure 4 shows that the LADTree model surpasses the other models on all three metrics: recall, ROC area, and PRC area. With a recall of 0.779, it detects positive cases of renal disease better than any other model. Its ROC area of 0.864 indicates stronger discrimination between classes, and its PRC area of 0.75 indicates a good balance between precision and recall. While competing models such as MLP and NB are competitive on some measures, the LADTree model leads on all three. Based on these results, the LADTree model is the strongest candidate in this scenario for predicting renal disease.
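The three measures can be illustrated for a single positive class as follows. This is a sketch under assumptions: the multiclass figures in the article are presumably averaged over classes, which is not reproduced here; the ROC area is computed by pairwise comparison of positive and negative scores; and the PRC area is approximated by average precision rather than by integrating a full precision-recall curve. All function names and toy values are illustrative.

```python
from typing import Sequence


def recall(actual: Sequence[int], predicted: Sequence[int], positive: int = 1) -> float:
    """Fraction of truly positive instances that were predicted positive."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_area(actual: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(actual: Sequence[int], scores: Sequence[float],
                      positive: int = 1) -> float:
    """Mean of the precision values at the rank of each positive (PRC-area proxy)."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == positive:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("no positive instances")
    return sum(precisions) / len(precisions)


toy_actual = [1, 1, 0, 0]
toy_predicted = [1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]  # hypothetical scores for the positive class

toy_recall = recall(toy_actual, toy_predicted)
toy_roc = roc_area(toy_actual, toy_scores)
toy_ap = average_precision(toy_actual, toy_scores)
```

Recall depends on the chosen decision threshold, while the ROC and PRC areas summarize ranking quality over all thresholds, so a model can lead on one measure and trail on another.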

Figure 5 presents the accuracy of the applied ML models in predicting renal disease. The LADTree model achieves the highest accuracy, 77.942%, meaning that it classifies about 78% of the test instances correctly. The MLP model follows with an accuracy of 75%, and the NB model is also competitive at 71.323%. The remaining models, SVM, J48, and KNN, have lower accuracies of 65.441%, 64.706%, and 54.412%, respectively; SVM and J48 perform moderately, while KNN has the lowest accuracy in this task. Overall, the accuracy analysis underlines the need to choose an ML model suited to the dataset’s characteristics to obtain the best predictive performance in renal disease prediction.

4.2. Scenario 2

In this scenario, the models are trained and tested using K-fold cross-validation with K set to 10. K-fold cross-validation is a standard training and testing protocol used in earlier studies [15,29,44,45]. Figure 6 shows the MAE analysis of the ML models for detecting abnormal ECG patterns using ECG signal data. Each model’s MAE reflects its performance, with lower values indicating better performance. The NB model has the lowest MAE (0.054), followed by the MLP model (0.062), so these two models yield slightly more accurate CKD predictions from ECG readings than the others. In contrast, the SVM model has the highest MAE (0.132), indicating less accurate predictions than the other models. The decision tree-based models, J48 and LADTree, have MAE values of 0.068 and 0.069, respectively, and the KNN model is close behind with an MAE of 0.07.

Figure 7 shows the KS values for each ML model used to detect abnormal ECG patterns. KS measures the agreement between predicted and actual classes while accounting for agreement expected by chance. The NB model has the highest KS (0.477), indicating agreement clearly above chance and confirming its usefulness in predicting CKD outcomes. The LADTree model follows closely with a KS of 0.467. The MLP and J48 models show comparable agreement, with KS values of 0.446 and 0.453, respectively. In contrast, the SVM and KNN models have lower KS values of 0.297 and 0.326, respectively, indicating room for improvement in their predictive ability. Overall, the KS analysis shows that the models differ in how well their predictions agree with the actual abnormal ECG patterns.

Figure 8 shows the recall, ROC area (ROCA), and PRC area (PRCA) of the ML models used to detect abnormal ECG patterns. The NB model is reported as the best overall performer, with a recall of 0.662, a ROCA of 0.789, and a PRCA of 0.62, indicating that it identifies positive CKD cases, distinguishes between positive and negative instances, and balances precision and recall well. The LADTree model is ranked second, with recall, ROCA, and PRCA values of 0.673, 0.793, and 0.602, respectively; its recall and ROCA are in fact marginally higher than those of NB, while its PRCA is lower. The MLP, J48, and SVM models show moderate performance in detecting abnormal ECG patterns and distinguishing between classes, while the KNN model lags behind with lower scores on all three metrics.

Figure 9 shows the accuracy of the applied ML models in detecting abnormal ECG patterns using ECG signal data. The LADTree model has the highest accuracy (67.257%), followed by the MLP model (66.372%) and the NB model (66.15%), so these three models are the most reliable in identifying cases of CKD from ECG data under cross-validation. The J48 and SVM models have accuracies of 65.255% and 63.053%, respectively, suggesting reasonable performance. The KNN model has the lowest accuracy (56.195%), indicating weaker predictive ability than the other models.

4.3. Discussion

The study focuses on detecting abnormal ECG patterns from ECG data and evaluates the LADTree model against other standard ML models in two scenarios: scenario 1 uses percentage splitting for training and testing, and scenario 2 uses K-fold cross-validation. In scenario 1, the NB model has the lowest MAE, but the LADTree model achieves the highest KS, indicating the best agreement between predicted and actual classes. The LADTree model also leads on recall, ROC area, and PRC area, which shows that it detects abnormal ECG patterns effectively while keeping a balance between precision and recall. The NB and MLP models have slightly lower MAE values than LADTree, yet LADTree remains stronger on the other metrics. In scenario 2, the NB model has the lowest MAE and the highest KS, while the LADTree model has the highest recall and ROC area and a PRC area close to that of NB. In terms of accuracy, the LADTree model is the top performer in both scenarios, which supports its suitability for identifying abnormal ECG patterns.

The stronger performance of the LADTree model can be attributed to several properties. Its robustness to noise and outliers, combined with its ability to handle both numerical and categorical data, allows it to capture complex relationships within ECG signals and to aid reliably in identifying abnormal patterns. In addition, its hierarchical structure supports interpretable decision-making and helps reveal the variables that matter most for CKD prediction. However, a closer investigation of how LADTree’s decision-making process differs from that of the other models, and of its computational efficiency, would give a more complete explanation of its advantage. Addressing these aspects would deepen the understanding of the findings and clarify the LADTree model’s advantages in detecting abnormal ECG patterns.

A further point is the comparison between LADTree and J48, another decision tree. Both algorithms are known for their interpretability, which is essential in medical applications. The observed differences in performance can nonetheless be explained by significant differences in their underlying structures and processes. LADTree is a type of alternating decision tree that combines boosting with decision trees, which lets it model more complicated relationships and produce more accurate predictions. Because it aggregates many boosted decision rules rather than relying on a single tree, it captures a wider variety of patterns in the data, which is especially useful for complex datasets such as the ECG signals used to predict chronic kidney disease. J48, an implementation of C4.5, instead constructs a single decision tree using the information gain or gain ratio criterion. Although J48 works well on simpler datasets, it performs relatively poorly on the more complex and noisy data found in medical applications. LADTree’s boosted structure makes it better able to handle noise and outliers, which improves prediction accuracy and reliability.

The study’s limitations stem mainly from its dependence on a single dataset, the MIT-BIH Arrhythmia database, which may not fully represent the variety of ECG patterns found in the general population with CKD. As a result, the generalizability of the findings may be limited, since the model’s performance may vary across ECG datasets or populations. In addition, although the study compares multiple ML models, restricting the comparison to a selection of models may overlook other methods that could improve CKD prediction accuracy. The study recognizes the relevance of feature selection, yet advanced feature selection approaches other than those used here may lead to different outcomes. Finally, the LADTree model’s interpretability is promising, but its computational cost and resource consumption compared with simpler models such as Naïve Bayes may pose challenges for practical deployment in clinical settings. These limitations highlight the need for further studies that test and refine the findings on a wide range of datasets and improve the applicability of the proposed approach in realistic conditions.

5. Conclusions

This study aimed to improve the detection of abnormal ECG patterns from ECG signal data using ML models. It compared the LADTree model with several well-known ML algorithms: MLP, KNN, SVM, NB, and J48. The LADTree model achieved the highest accuracy in both evaluation scenarios and, under percentage splitting, also the highest KS, recall, ROC area (ROCA), and PRC area (PRCA); the NB model had the lowest MAE in both scenarios and was the closest competitor under K-fold cross-validation. The study also emphasized how important feature selection is for the predictive capacity of ML models. By applying several search methods for feature selection, namely Particle Swarm Optimization, Best First Search, and Harmony Search, we identified relevant, non-redundant features for CKD prediction from ECG data. Overall, the findings demonstrate the potential of ML models, especially the LADTree model, to support the early identification and diagnosis of CKD from ECG signals. They have implications for the development of diagnostic tools and individualized healthcare programs intended to improve CKD management and reduce the morbidity and mortality associated with the disease. To improve the clinical value and reliability of ML models, further research may concentrate on refining feature selection techniques, tuning model parameters, and validating model performance on larger and more varied datasets.

Future research might investigate ensemble approaches that combine different ML models, including the LADTree model, to improve prediction accuracy. Deep learning for CKD prediction from ECG data also shows potential. It would be valuable to conduct larger-scale investigations on varied datasets, validate model robustness, and improve interpretability using approaches such as attention mechanisms. In addition, adding real-time monitoring capabilities to the developed models might allow early identification and continuous monitoring of CKD progression, leading to improved diagnosis and management of this common condition.

Author Contributions

Conceptualization, B.K. and M.B.; methodology, B.K.; software, M.B.; validation, B.K. and M.B.; formal analysis, M.B.; investigation, B.K.; resources, M.B.; data curation, B.K.; writing—original draft preparation, B.K.; writing—review and editing, M.B.; visualization, B.K.; supervision, B.K.; project administration, M.B.; funding acquisition, M.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research work was funded by Institutional Fund Projects under grant no. (GPIP: 778-611-2024). The authors gratefully acknowledge the technical and financial support provided by the Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.

Data Availability Statement

The data used in this study are available online at: https://physionet.org/about/database/ (accessed on 30 March 2024).

Acknowledgments

This research was supported by the Institutional Fund Projects under grant number GPIP: 778-611-2024. The authors express their sincere gratitude to the Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia, for their invaluable technical and financial support. This study builds upon and extends our previous work, “Enhancing Kidney Disease Prediction with Optimized Forest and ECG Signals Data”, which can be accessed online at: https://www.cell.com/heliyon/fulltext/S2405-8440(24)06823-3, accessed on 4 September 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Rady, E.H.A.; Anwar, A.S. Prediction of kidney disease stages using data mining algorithms. Inform. Med. Unlocked 2019, 15, 100178.
2. Alshebly, O.Q.; Ahmed, R.M. Prediction and Factors Affecting of Chronic Kidney Disease Diagnosis using Artificial Neural Networks Model and Logistic Regression Model. Iraqi J. Stat. Sci. 2019, 16, 140–159.
3. Ghosh, P.; Shamrat, F.M.J.M.; Shultana, S.; Afrin, S.; Anjum, A.A.; Khan, A.A. Optimization of prediction method of chronic kidney disease using machine learning algorithm. In Proceedings of the 2020 15th international joint symposium on artificial intelligence and natural language processing (iSAI-NLP), Bangkok, Thailand, 18–20 November 2020; IEEE: New York, NY, USA, 2020; pp. 1–6.
4. Murat, F.; Yildirim, O.; Talo, M.; Baloglu, U.B.; Demir, Y.; Acharya, U.R. Application of deep learning techniques for heartbeats detection using ECG signals-analysis and review. Comput. Biol. Med. 2020, 120, 103726.
5. Gilani, V.N.M.; Hosseinian, S.M.; Ghasedi, M.; Nikookar, M. Data-driven urban traffic accident analysis and prediction using logit and machine learning-based pattern recognition models. Math. Probl. Eng. 2021, 2021, 1–11.
6. Aggarwal, R.; Podder, P.; Khamparia, A. Ecg classification and analysis for heart disease prediction using xai-driven machine learning algorithms. In Biomedical Data Analysis and Processing Using Explainable (XAI) and Responsive Artificial Intelligence (RAI); Springer: Berlin/Heidelberg, Germany, 2022; pp. 91–103.
7. McClellan, W.M. Epidemiology and risk factors for chronic kidney disease. Med. Clin. N. Am. 2005, 89, 419–445.
8. Revathy, S.; Bharathi, B.; Jeyanthi, P.; Ramesh, M. Chronic kidney disease prediction using machine learning models. Int. J. Eng. Adv. Technol. 2019, 9, 6364–6367.
9. Mulia, E.P.B.; Nugraha, R.A.; A’yun, M.Q.; Juwita, R.R.; Yofrido, F.M.; Julario, R.; Alkaff, F.F. Electrocardiographic abnormalities among late-stage non-dialysis chronic kidney disease patients. J. Basic. Clin. Physiol. Pharmacol. 2020, 32, 155–162.
10. Junho, C.V.C.; Trentin-Sonoda, M.; Panico, K.; Dos Santos, R.S.N.; Abrahão, M.V.; Vernier, I.C.S.; Fürstenau, C.R.; Carneiro-Ramos, M.S. Cardiorenal syndrome: Long road between kidney and heart. Heart Fail. Rev. 2022, 27, 2137–2153.
11. Nusinovici, S.; Tham, Y.C.; Yan, M.Y.C.; Ting, D.S.W.; Li, J.; Sabanayagam, C.; Yin, T.; Cheng, C. Logistic regression was as good as machine learning for predicting major chronic diseases. J. Clin. Epidemiol. 2020, 122, 56–69.
12. Sawhney, R.; Malik, A.; Sharma, S.; Narayan, V. A comparative assessment of artificial intelligence models used for early prediction and evaluation of chronic kidney disease. Decis. Anal. J. 2023, 6, 100169.
13. Singh, A.K.; Krishnan, S. ECG Signal Feature Extraction Trends in Methods and Applications. Biomed. Eng. Online 2023, 22, 22.
14. Rahman, T.M.; Siddiqua, S.; Rabby, S.E.; Hasan, N.; Imam, M.H. Early Detection of Kidney Disease Using ECG Signals Through Machine Learning Based Modelling. In Proceedings of the 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Dhaka, Bangladesh, 10–12 January 2019; pp. 319–323.
15. Khan, B.; Naseem, R.; Muhammad, F.; Abbas, G.; Kim, S. An empirical evaluation of machine learning techniques for chronic kidney disease prophecy. IEEE Access 2020, 8, 55012–55022.
16. Elhoseny, M.; Shankar, K.; Uthayakumar, J. Intelligent Diagnostic Prediction and Classification System for Chronic Kidney Disease. Sci. Rep. 2019, 9, 1–14.
17. Gudeti, B.; Mishra, S.; Malik, S.; Fernandez, T.F.; Tyagi, A.K.; Kumari, S. A Novel Approach to Predict Chronic Kidney Disease using Machine Learning Algorithms. In Proceedings of the 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 5–7 November 2020; pp. 1630–1635.
18. Battineni, G.; Sagaro, G.G.; Chinatalapudi, N.; Amenta, F. Applications of machine learning predictive models in the chronic disease diagnosis. J. Pers. Med. 2020, 10, 21.
19. Ajam, F.; Patel, S.; Alrefaee, A. Cardiac arrhythmias in patients with end stage renal disease (ESRD) on hemodialysis; recent update and brief literature review. Am J Int Med. 2019, 7, 22–26.
20. Buttar, H.S.; Li, T.; Ravi, N. Prevention of cardiovascular diseases: Role of exercise, dietary interventions, obesity and smoking cessation. Exp. Clin. Cardiol. 2005, 10, 229–249.
21. Binsawad, M. Enhancing kidney disease prediction with optimized forest and ECG signals data. Heliyon 2024, 10, e30792.
22. Erturan, A.M.; Karaduman, G.; Durmaz, H. Machine learning-based approach for efficient prediction of toxicity of chemical gases using feature selection. J. Hazard. Mater. 2023, 455, 131616.
23. Mishra, P.; Varadharajan, V.; Tupakula, U.; Pill, E.S. A Detailed Investigation and Analysis of using Machine Learning Techniques for Intrusion Detection. IEEE Commun. Surv. Tutorials 2018, 21, 686–728.
24. Abdulhammed, R.; Musafer, H.; Alessa, A.; Faezipour, M.; Abuzneid, A. Features Dimensionality Reduction Approaches for Machine Learning Based Network Intrusion Detection. Electronics 2019, 8, 322.
25. Iqbal, A.; Aftab, S.; Ali, U.; Nawaz, Z.; Sana, L.; Ahmad, M.; Husen, A. Performance analysis of machine learning techniques on software defect prediction using NASA datasets. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 300–308.
26. Balogun, A.O.; Bajeh, A.O.; Orie, V.A.; Yusuf-asaju, A.W. Software Defect Prediction Using Ensemble Learning: An ANP Based Evaluation Method. J. Eng. Technol. 2018, 3, 50–55.
27. Nahar, N.; Ara, F. Liver Disease Prediction by Using Different Decision Tree Techniques. Int. J. Data Min. Knowl. Manag. Process 2018, 8, 1–9.
28. Naseem, R.; Khan, B.; Shah, M.A.; Wakil, K.; Khan, A.; Alosaimi, W.; Uddin, M.I.; Alouffi, B. Performance Assessment of Classification Algorithms on Early Detection of Liver Syndrome. J. Healthc. Eng. 2020, 2020, 6680002.
29. Naseem, R.; Khan, B.; Ahmad, A.; Almogren, A.; Jabeen, S.; Hayat, B.; Shah, M.A. Investigating Tree Family Machine Learning Techniques for a Predictive System to Unveil Software Defects. Complexity 2020, 2020, 1–21.
30. Salo, F.; Nassif, A.B.; Essex, A. Dimensionality reduction with IG-PCA and ensemble classifier for network intrusion detection. Comput. Netw. 2019, 148, 164–175.
31. Khan, B.; Naseem, R.; Shah, M.A.; Wakil, K.; Khan, A.; Uddin, M.I.; Mahmoud, M. Software Defect Prediction for Healthcare Big Data: An Empirical Evaluation of Machine Learning Techniques. J. Healthc. Eng. 2021, 2021, 8899263.
32. Obiedat, R.; Qaddoura, R.; Ala’M, A.Z.; Al-Qaisi, L.; Harfoushi, O.; Alrefai, M.A.; Faris, H. Sentiment analysis of customers’ reviews using a hybrid evolutionary SVM-based approach in an imbalanced data distribution. IEEE Access 2022, 10, 22260–22273.
33. Zamir, A.; Khan, H.U.; Iqbal, T.; Yorsaf, N.; Aslam, F.; Anjum, A.; Hamdani, M. Phishing web site detection using diverse machine learning algorithms. Electron. Libr. 2020, 38, 65–80.
34. Irfan, M.; Ullah, K.; Muhammad, F.; Khan, S.; Althobiani, F.; Usman, M.; Alshareef, M.; Alghaffari, S.; Rahman, S. Automatic Detection of Outliers in Multi-Channel EMG Signals Using MFCC and SVM. Intell. Autom. Soft Comput. 2023, 36, 169–181.
35. Mehr, S.Y.; Ramamurthy, B. An SVM based DDoS attack detection method for Ryu SDN controller. In Proceedings of the 15th International Conference on Emerging Networking Experiments and Technologies, Orlando, FL, USA, 9–12 December 2019; pp. 72–73.
36. Khan, S.; Ullah, R.; Khan, A.; Wahab, N.; Bilal, M.; Ahmed, M. Analysis of dengue infection based on Raman spectroscopy and support vector machine (SVM). Biomed. Opt. Express 2016, 7, 2249.
37. Pham, B.T. A Novel Classifier Based on Composite Hyper-cubes on Iterated Random Projections for Assessment of Landslide Susceptibility. J. Geol. Soc. India 2018, 91, 355–362.
38. Alroobaea, R. An Empirical combination of Machine Learning models to Enhance author profiling performance. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 2130–2137.
39. Saritas, M.M.; Yasar, A. Performance Analysis of ANN and Naive Bayes Classification Algorithm for Data Classification. Int. J. Intell. Syst. Appl. Eng. 2019, 7, 88–91.
40. Holmstrom, L.; Christensen, M.; Yuan, N.; Hughes, J.W.; Theurer, J.; Jujjavarapu, M.; Fatehi, P.; Kwan, A.; Sandhu, R.K.; Ebinger, J.; et al. Deep learning based electrocardiographic screening for chronic kidney disease. Commun. Med. 2023, 3, 73.
41. Holmstrom, L.; Christensen, M.; Yuan, N.; Hughes, J.W.; Theurer, J.; Jujjavarapu, M.; Fatehi, P.; Kwan, A.; Sandhu, R.K.; Ebinger, J.; et al. Deep learning based electrocardiographic screening for chronic kidney disease. medRxiv 2023, 2022-03.
42. Holmes, G.; Pfahringer, B.; Kirkby, R.; Frank, E.; Hall, M. Multiclass alternating decision trees. Lect. Notes Comput. Sci. 2002, 2430, 161–172.
43. Kalmegh, S.R. Comparative Analysis of WEKA Data Mining Algorithm RandomForest, RandomTree and LADTree for Classification of Indigenous News Data. Int. J. Emerg. Technol. Adv. Eng. 2008, 9001, 507–517.
44. Raji, C.G.; Chandra, S.S.V. Graft survival prediction in liver transplantation using artificial neural network models. J. Comput. Sci. 2016, 16, 72–78.
45. Choudhury, S.; Bhowal, A. Comparative analysis of machine learning algorithms along with classifiers for network intrusion detection. In Proceedings of the 2015 International Congress on Information and Communication Technology, London, UK, 28–30 October 2015; pp. 89–95.

Figure 1. Overview of the study’s methodology for detecting abnormal ECG patterns using the LADTree model and ECG signal data, highlighting the comparison with various machine learning models and assessment measures.

Figure 2. Comparison of Mean Absolute Error (MAE) Values for Machine Learning Models in Predicting Chronic Kidney Disease (CKD) from Electrocardiogram (ECG) Signal Data Using Percentage Splitting Criteria.

Figure 3. Comparison of Kappa Statistic (KS) Values for Employed Models in Predicting Abnormal ECG Patterns Using the Percentage Splitting Criteria.

Figure 4. Comparison of Performance Metrics Across Employed Machine Learning Models for Abnormal ECG Patterns Prediction using the Percentage Splitting Criteria.

Figure 5. Comparison of Machine Learning Model Accuracy in Predicting Abnormal ECG Patterns Using the Percentage Splitting Criteria.

Figure 6. Mean Absolute Error Analysis of Abnormal ECG Patterns on Employed Machine Learning Models Using ECG Signal Data with K-fold Cross-Validation.

Figure 7. Kappa Statistic (KS) Values for Employed Machine Learning Models Used in Predicting Abnormal ECG Patterns Using K-Fold Cross-Validation.

Figure 8. Performance Measures (Recall, ROCA, PRCA) of Machine Learning Models for Predicting Abnormal ECG Patterns using K-Fold Cross-Validation.

Figure 9. Accuracy Values of Employed Machine Learning Models for Predicting Abnormal ECG Patterns Using Electrocardiogram (ECG) Signal Data Based on the K-Fold Cross-Validation Criteria.

Table 1. List of class attributes, descriptions, and number of entries [21].

S. No.	Class	Entries	Description
1	Normal	245	Normal
2	VPC	3	Ventricular Premature Contraction (PVC)
3	IC-CAD	44	Ischemic changes (Coronary Artery Disease)
4	1-DAVB	0	First-degree atrioventricular (AV) block
5	LBBB	9	Left bundle branch block
6	SB	25	Sinus bradycardia
7	ST	13	Sinus tachycardia
8	AF	5	Atrial Fibrillation or Flutter
9	OIMI	15	Old Inferior Myocardial Infarction
10	LVH	4	Left ventricular hypertrophy
11	SPC	2	Supraventricular Premature Contraction
12	RBBB	50	Right bundle branch block
13	OAMI	15	Old Anterior Myocardial Infarction
14	3-DAVB	0	Third-degree AV block
15	2-DAVB	0	Second-degree AV block
16	Others	22	Others
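
The class counts in Table 1 are heavily skewed toward the Normal class, which matters when reading accuracy figures. The following minimal sketch, using only the counts copied from the table, totals the records, lists the classes with no entries, and computes the accuracy a trivial always-predict-the-majority classifier would reach. The variable names are illustrative and are not taken from the study.

```python
# Class counts copied from Table 1 (class label -> number of entries).
class_counts = {
    "Normal": 245, "VPC": 3, "IC-CAD": 44, "1-DAVB": 0, "LBBB": 9,
    "SB": 25, "ST": 13, "AF": 5, "OIMI": 15, "LVH": 4, "SPC": 2,
    "RBBB": 50, "OAMI": 15, "3-DAVB": 0, "2-DAVB": 0, "Others": 22,
}

total = sum(class_counts.values())

# Classes that have a label but no records in the dataset.
empty_classes = [name for name, n in class_counts.items() if n == 0]

# Accuracy of a trivial classifier that always predicts the largest class.
majority_class = max(class_counts, key=class_counts.get)
majority_baseline = class_counts[majority_class] / total

print(f"total records:      {total}")
print(f"empty classes:      {empty_classes}")
print(f"majority class:     {majority_class}")
print(f"majority baseline:  {majority_baseline:.3f}")
```

Because such a baseline ignores the minority classes entirely, chance-corrected measures such as the Kappa statistic are a useful complement to raw accuracy on data with this distribution.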

Table 2. Statistics of Selected Attributes.

Attributes	Units	Description	Min	Max	Mean	StdDev
QRS_Duration	Milliseconds (ms)	Time taken for ventricular depolarization	55	188	88.92	15.364
BN	Microvolts (µV)	Baseline Noise	0	92	31.23	27.949
CJ	ms	Conduction Junctions	0	88	7.478	15.359
DK	ms	Duration of K-wave	0	132	6.327	20.984
DZ	ms	Delta Z (impedance change)	0	96	3.814	16.325
EB	Count	Ectopic Beat	0	112	41.681	16.425
EM	Arbitrary units (AU)	Ectopic Measure	0	88	3.239	11.531
HR	Beats per minute (bpm)	Heart Rate	−3.9	6.4	−1.144	1.116
IN	ms	Interval (possibly RR or QT interval)	−5.5	7	0.868	1.053
IV	ms	Intrinsic Variability	0	19.2	0.318	1.49
JB	µV	Junctional Beat	−216	268.9	−18.738	23.715
JO	µV	Junctional Origin	−32.9	0	−0.654	3.414
JV	Meters per second (m/s)	Junctional Velocity	−11.8	18.8	3.895	2.991
JY	µV	Junctional Yield	−242.4	165.4	−8.269	32.157
KS	Dimensionless	Kolmogorov-Smirnov (a statistical measure)	−5	8.3	1.722	1.708
LE	ms	Latency Event	−6	6	1.222	1.426

Table 3. List of Selected Features using each Searching Method [21].

Search Method	Selected Features	Count
PSO Search	AH, BO, BN, BV, CZ, CJ, DM, DK, DZ, DS, DO, EY, EM, EB, FT, FA, GO, GE, HR, HL, HT, IN, JD, JJ, JH, JB, JY, JP, JO, JV, KS, KH, KY, LE, r_wave, t_interval, qrs_duration	37
Best First Search	AU, CJ, DK, DD, DA, DZ, DN, EB, HR, HJ, IV, IN, IT, IH, JV, JB, JY, KS, LE, LG, q-t_interval, heart_rate, qrs_duration, t_interval, T	25
Harmony Search	BN, BI, BY, DK, DB, EF, EB, EN, EM, FC, FB, FO, GR, HR, HN, IV, IJ, JO, JB, KS, KO, KU	22
Ending Operation	BN, CJ, DK, EB, LE, JO, JV, JY, IN, EM, DZ, IV, JB, KS, HR, qrs_duration	16
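
Table 3 does not state how the "Ending Operation" row was derived from the three search results. As a consistency check only, the sketch below counts how many of the three searches selected each feature and compares the features chosen by at least two searches with the final row. The two-of-three vote is an assumption made for illustration, not the authors' documented procedure, and the variable names are hypothetical.

```python
from collections import Counter

# Feature lists copied from Table 3.
selected = {
    "PSO Search": (
        "AH BO BN BV CZ CJ DM DK DZ DS DO EY EM EB FT FA GO GE HR HL HT IN "
        "JD JJ JH JB JY JP JO JV KS KH KY LE r_wave t_interval qrs_duration"
    ).split(),
    "Best First Search": (
        "AU CJ DK DD DA DZ DN EB HR HJ IV IN IT IH JV JB JY KS LE LG "
        "q-t_interval heart_rate qrs_duration t_interval T"
    ).split(),
    "Harmony Search": (
        "BN BI BY DK DB EF EB EN EM FC FB FO GR HR HN IV IJ JO JB KS KO KU"
    ).split(),
}
final = "BN CJ DK EB LE JO JV JY IN EM DZ IV JB KS HR qrs_duration".split()

# Number of searches that selected each feature (each search votes once).
votes = Counter(f for feats in selected.values() for f in set(feats))

# ASSUMPTION: a feature "passes" if at least two searches selected it.
at_least_two = {f for f, v in votes.items() if v >= 2}
in_all_three = sorted(f for f, v in votes.items() if v == 3)

# Differences between the assumed vote and the final row of Table 3.
missing_from_final = sorted(at_least_two - set(final))
extra_in_final = sorted(set(final) - at_least_two)

print("selected by all three searches:", in_all_three)
print("voted in but not in final row: ", missing_from_final)
print("in final row but not voted in: ", extra_in_final)
```

Under this assumption, every feature in the final row was selected by at least two searches, and five of them (DK, EB, HR, JB, KS) by all three. The feature t_interval was also selected by two searches but is absent from the final row, so the actual rule is not exactly a two-of-three vote.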

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
