Explainable Machine Learning Model for Chronic Kidney Disease Prediction

Arif, Muhammad Shoaib; Rehman, Ateeq Ur; Asif, Daniyal

doi:10.3390/a17100443

by Muhammad Shoaib Arif ^1,2,*, Ateeq Ur Rehman ^1 and Daniyal Asif ^3,*

^1 Department of Mathematics and Sciences, College of Humanities and Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
^2 Department of Mathematics, Air University, PAF Complex E-9, Islamabad 44000, Pakistan
^3 Skolkovo Institute of Science and Technology (Skoltech), 121205 Moscow, Russia
^* Authors to whom correspondence should be addressed.

Algorithms 2024, 17(10), 443; https://doi.org/10.3390/a17100443

Submission received: 13 August 2024 / Revised: 29 September 2024 / Accepted: 30 September 2024 / Published: 3 October 2024

(This article belongs to the Special Issue Artificial Intelligence-based Algorithms with Potential Applications in Healthcare and Prediction of Disease Evolution)

Abstract

More than 800 million people worldwide suffer from chronic kidney disease (CKD). It is one of the primary causes of global mortality and stands out among non-communicable diseases for an increase in death rates over the past twenty years. Machine learning (ML) shows promise for forecasting such illnesses, but its opaque nature, the difficulty of explaining its predictions, and the difficulty of recognizing its mistakes limit its use in healthcare. To address these challenges, our research introduces an explainable ML model designed for the early detection of CKD. Using a multilayer perceptron (MLP) framework, we enhance the model’s transparency by integrating Local Interpretable Model-agnostic Explanations (LIME), which provide clear insights into the predictive process. This not only demystifies the model’s decision-making but also empowers healthcare professionals to identify and rectify errors, understand the model’s limitations, and ascertain its reliability. By improving the model’s interpretability, we aim to foster trust and expand the use of ML in predicting CKD, ultimately contributing to better healthcare outcomes.

Keywords:

explainable machine learning; multi-layer perceptron; chronic kidney disease; healthcare predictive modeling

1. Introduction

CKD is characterized by a progressive decline in kidney function over time and represents a significant and escalating public health concern globally. The kidneys filter excess water and waste from the bloodstream to produce urine and help keep the body’s fluid balance and blood pressure steady. The condition is called “chronic” because the kidneys are damaged gradually, and it frequently goes undetected until it is too late. This silent progression emphasizes the importance of early detection: the body can adapt to reduced kidney function in the early stages, which is why there are typically no symptoms. Many people do not receive treatment until their disease has become much worse, and the diagnosis is often made by chance during a routine blood or urine test [1,2,3,4].

Risk factors like hypertension, diabetes, and cardiovascular diseases significantly increase the likelihood of developing CKD [5,6]. The disease’s prevalence has increased globally, partly due to the rise in these risk factors, with an estimated 843.6 million individuals affected in 2017 [7]. Despite the progress made in the treatment of CKD, it remains a significant cause of mortality on a global scale, requiring vigilant monitoring and intervention to reduce its progression and impact [8].

Healthcare practitioners depend on essential diagnostic procedures, such as the glomerular filtration rate (GFR) and urine tests for albumin, to diagnose and monitor CKD [9]. Estimated from blood test results and patient information, the GFR indicates how well the kidneys clear waste from the body [10]. A GFR below 60 mL/min/1.73 m² may signal kidney disease, whereas values below 15 indicate kidney failure and the need for dialysis or a transplant [11]. A urine test for albumin is a valuable way to identify kidney disease because a properly functioning kidney does not let this protein pass into the urine [12]. While undoubtedly helpful, these tests are limited in how accurately they predict disease progression or reveal its underlying physiological mechanisms [13,14].

ML has demonstrated significant potential in improving the diagnosis of various medical conditions [15,16,17]. In particular, the analysis and diagnosis of CKD have benefited from the application of ML techniques [18,19,20]. ML models can sift through complex datasets in search of insights and patterns that would be impossible to uncover with more conventional forms of research [21,22]. However, many ML models are notoriously difficult to understand, leading some to call them “black boxes.” A black box model provides outputs without revealing or explaining the internal decision-making processes [23]. While these models excel at prediction and classification, their lack of interpretability and transparency can hinder therapeutic decision-making, leaving patients and doctors with unanswered questions about the diagnosis and treatment [24,25].

Introducing an explainable ML model is a significant advancement in this area. By integrating an MLP model with LIME, our suggested approach does more than just predict CKD; it also explains how and why it arrived at each prediction. This transparency allows for a better understanding of the disease’s origins and supports more informed and personalized clinical recommendations. The proposed explainable ML model is intended to overcome the main problem of current ML methods for diagnosing CKD, namely their opacity. The model exposes the variables that influence its predictions, making it a useful resource for nephrologists. This innovation makes CKD detection more precise and sheds light on how the disease develops. As a result, patients may get more effective, personalized care.

Here is the structure of the rest of the paper: Section 2 presents a comprehensive literature review on CKD from the last two years, highlighting the unique features of our study while also addressing their shortcomings. Our methodology, outlining the proposed system model, is described in Section 3. Analyzing the experimental results is the focus of Section 4. Section 5 presents a thorough discussion of our suggested model compared with previous research, including its strengths, weaknesses, and limitations. The conclusion is presented in Section 6, which also includes an examination of possible directions for further research inquiry.

2. Literature Review

In recent years, ML has made remarkable progress in illness prediction and clinical decision-making in the CKD field. Several studies propose ML models for the detection of CKD or review developments in this field. This literature review summarizes and analyzes the latest research from 2023 and 2024. It covers the methodologies, results, and limitations of these studies and assesses both the progress made in this field and the research problems that remain open.

R. K. Halder et al. (2024) developed an ML-based CKD prediction model. The model’s data preprocessing includes imputation of missing data, min–max scaling, and conversion of categorical variables to numerical values. Feature selection methods such as lasso regression, ridge regression, sequential forward selection, variance cutoff, correlation analysis, and chi-square tests are used to refine the datasets. Various predictive models were employed to forecast CKD, including decision trees (DTs), random forests (RF), adaptive boosting (AdaBoost), support vector machines (SVM), extreme gradient boosting (XgBoost), naïve Bayes (NB), and gradient boosting machine (GBM). RF and AdaBoost achieved 100% accuracy under 70:30 and 80:20 train–test splits and under 10-fold cross-validation. The study did not include explainability measures to increase model transparency [26].

N. Alturki et al. proposed the TrioNet Model for CKD in 2024, which takes an ensemble of extra tree classifiers, RF, and XgBoost as its base model. The K-nearest neighbors (KNN) imputation was employed by the researchers to fill in missing data, and they applied the synthetic minority over-sampling technique (SMOTE) for data balancing. This resulted in an accuracy of 98.97%. Two significant drawbacks of this study are that it does not use feature selection and hyperparameter optimization (HPO) techniques. Additionally, applying SMOTE to the entire dataset rather than just the training set introduced biases and constraints, and the study lacked explainability techniques to enhance model transparency [27].

A study by M. M. Rahman (2024) focused on CKD prediction using various ensemble learning classifiers, including AdaBoost, GBM, XgBoost, light GBM, RF, voting, stacking, and bagging. Multivariate imputation by chained equations was used to address missing data, and the borderline SVM-SMOTE method was employed for data balancing. Recursive feature elimination (RFE) and the Boruta method were employed to identify significant features, with RFE demonstrating superiority by selecting only 50% of the total features. Multiple performance metrics were used to identify the most effective classifiers for CKD detection. Light GBM outperformed other models with the lowest compilation time and highest accuracy, achieving an average accuracy of 99.75%. However, like the other studies reviewed here, this research applied the data balancing technique to the entire dataset, which might lead to biases, and did not integrate explainable techniques for model transparency [28].

P. Mahajan et al. (2024) used several datasets to compare different ensemble ML methods for illness prediction. The researchers tested bagging, boosting, and stacking ensemble ML algorithms with different base classifiers. Using grid search HPO, ensemble approaches obtained 100% accuracy on the UCI CKD dataset. The study did not use sophisticated imputation methods to manage missing data, feature selection methods, or cross-validation (CV) to evaluate model generalization. The absence of explainable procedures in their methodology raises concerns about model transparency and interpretability [29].

In their study, Kaur et al. (2023) sought to use the UCI CKD dataset to develop an ML model that can accurately identify CKD. To analyze the missing data, the researchers employed Little’s MCAR test. Ant colony optimization was used for feature selection. The classification was accomplished using DTs, RF, and KNN algorithms; the RF classifier yielded a 96% accuracy rate. The work did not address issues with explainability, HPO, or advanced ML approaches [30].

D. Swain (2023) conducted a study on predicting CKD using ML methods. The study specifically utilized the UCI CKD dataset. Mean imputation was used for numerical variables and mode imputation for categorical variables to fix the missing data issue. The researchers utilized the SMOTE technique to balance the data and determined nine crucial features based on the chi-squared score. By employing RF and SVM methods, they fine-tuned hyperparameters using grid search CV, resulting in an accuracy of 99.33% for SVM and 98.67% for RF. The 10-fold CV score demonstrated the model’s capacity to generalize. Nevertheless, applying SMOTE to the entire dataset instead of only the training set may induce biases. The study did not employ any explainable strategies to guarantee model transparency [31].

The study conducted by M. S. Arif et al. (2023) introduced an ML model for predicting CKD. The model used a sequential data scaling technique that combined robust, standard, and min–max scaling, and iterative imputation was used to deal with missing values. The researchers used Gaussian NB and KNN models, optimizing hyperparameters by grid search CV. As a result, they attained a 100% accuracy rate with KNN and a 97.5% accuracy rate with Gaussian NB. Their model’s generalization was validated through a 10-fold CV score. Yet, the absence of explainable techniques in their approach raises concerns about the model’s transparency [32].

In 2023, numerous further research projects also made contributions to the prediction of CKD using ML approaches on the UCI CKD dataset. A study that was carried out by A. Farjana et al. showed that Light GBM outperformed other models in predicting CKD, with an impressive accuracy rate of 99% [33]. Furthermore, various ML classifiers were examined by M. A. Islam et al., with the XgBoost classifier achieving the highest performance metrics at 98.3% [34]. V. K. Venkatesan, in a similar vein, compared the efficacy of XgBoost with a variety of base learners, such as SVM, KNN, RF, logistic regression, and DTs. The findings demonstrated that XGBoost surpassed its competitors with an accuracy of 98.00% [35]. S. M. Ganie et al. conducted a comparative analysis of the performance of various boosting algorithms: XgBoost, CatBoost, LightGBM, AdaBoost, and GBM. AdaBoost demonstrated superior overall performance, attaining an accuracy of 98.47% [36]. G. Shukla et al. used various ML techniques like KNN, DTs, and artificial neural networks, finding that the DTs showed the best result with 98.60% accuracy [37]. All these studies have limitations, such as a lack of advanced preprocessing steps, advanced ML models, and explainable techniques for transparency.

This review makes clear that several gaps and limitations must be addressed in order to improve CKD prediction. This study aims to fill these gaps in the existing literature and to add new perspectives. Our work primarily introduces the following innovations:

Effective preprocessing steps are used to improve the quality of the dataset, including KNN imputation for numerical features and mode imputation for categorical features.
Feature selection is performed using the SelectKBest method with mutual information score to select the top 12 features, enhancing decision-making and reducing computation time.
For the purpose of detecting and predicting CKD, an MLP model is suggested. The model is trained using 75% of the dataset and validated using 25%. By using LIME, one may better understand the model’s predictions and the reasoning behind them.
Several assessment metrics, including accuracy, precision, recall, F1-score, and curve analysis, are employed to confirm the efficacy of the proposed model. In addition, we evaluate the performance of the proposed model by comparing it to other models such as RF, DTs, KNN, ridge classifier, logistic regression, stochastic gradient descent (SGD), Bernoulli NB, and Gaussian NB.

3. Methodology

In this section, we provide a concise explanation of the process used to predict CKD using an ML model. As seen in Figure 1, the process can be broken down into six primary stages. Data collection is the initial stage, and data preprocessing is the second. The third is feature selection, which involves picking out the attributes that will be most useful for our model. The fourth is to train the model and then test it with different metrics. Following training, we incorporate explainable AI (XAI) approaches to interpret and clarify the model’s predictions. Finally, we check the model’s robustness and dependability with a performance evaluation.

3.1. Data Collection

The dataset used in this research was obtained from the UCI ML Repository [38]. It is intended primarily for classification tasks and contains real-valued features. The dataset includes 400 instances and 25 attributes, encompassing demographic, clinical, and laboratory data. Each sample has 24 predictive variables, of which 11 are numerical and 13 are nominal (categorical), plus the target class, which is binary: CKD or not CKD.

3.2. Preprocessing

Medical datasets often have various issues that can affect the accuracy and efficiency of modeling [39]. Therefore, it is necessary to perform preprocessing to improve data quality [40]. The preprocessing stage includes three main steps, as shown in Figure 2. These are data encoding, data imputation for missing values, and data scaling.

3.2.1. Data Encoding

Since our dataset contains both numerical and categorical values, we use a label encoder to handle this. The label encoder transforms categorical values into numerical values, enabling their integration into the model.
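As a minimal sketch of this encoding step, the snippet below applies scikit-learn's `LabelEncoder` to a single nominal column. The column values are hypothetical placeholders in the style of the dataset's categorical entries, not data from the study.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical nominal column (illustrative values only, not taken from the study).
rbc = ["normal", "abnormal", "normal", "normal"]

encoder = LabelEncoder()
rbc_encoded = encoder.fit_transform(rbc)

# Classes are sorted alphabetically, so "abnormal" -> 0 and "normal" -> 1.
print(list(encoder.classes_))
print(rbc_encoded.tolist())
```

The integer codes carry no ordering information of their own; they only make the categorical values usable as numerical inputs.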

3.2.2. Data Imputation

Handling missing values is a critically important step, as missing values can introduce biases, make data analysis more difficult, and reduce the efficiency of the model [41]. Simply deleting missing data can result in biases and impair the generalizability of the results. Therefore, imputation of missing values is crucial [42]. Using imputation, which involves replacing missing data with suitable values, much of the dataset’s information can be preserved [43]. Traditional techniques like mean or median imputation work well when there is a small amount of missing data [44]. However, in our case, we have a significant percentage of missing data, as shown in Figure 3. Thus, we use KNN imputation for numerical values and mode imputation for categorical values.

KNN imputation fills in missing values by leveraging the values of the nearest neighbors, identified using a similarity measure such as Euclidean distance. This multivariate technique captures the inherent relationships in the data, making it especially effective for numerical variables [45,46].

Algorithm 1 outlines the steps for imputing missing data. First, the dataset X is separated into numerical and categorical features. For each numerical feature j, if a sample i has a missing value, the Euclidean distance between sample i and all other samples is computed, identifying the 5 nearest neighbors (5-NN). The missing value is then imputed by averaging the values of these neighbors. For categorical features, missing values are filled with the mode. Finally, the imputed numerical and categorical features are combined to produce the complete dataset $\hat{X}$.

Algorithm 1 The pseudocode of missing values imputation

Require: Dataset X
Ensure: Imputed dataset $\hat{X}$
1: Separate numerical and categorical features
2: for each numerical feature j do
3:     for each sample i with missing value $X_{i,j}$ do
4:         Compute $d_{im} = \sqrt{\sum_{l} (X_{i,l} - X_{m,l})^{2}}$ and identify 5-NN
5:         Impute $X_{i,j}$ with $\hat{X}_{i,j} = \frac{1}{5} \sum_{m \in \text{5-NN}} X_{m,j}$
6:     end for
7: end for
8: for each categorical feature j do
9:     Find $\text{Mode}(X_{\cdot,j}) = \text{most frequent}(X_{\cdot,j})$ and fill missing values $\hat{X}_{i,j} = \text{Mode}(X_{\cdot,j})$
10: end for
11: Combine imputed numerical and categorical features
12: Return the imputed dataset $\hat{X}$
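The steps of Algorithm 1 can be sketched with scikit-learn's `KNNImputer` and `SimpleImputer`. This is an illustration under stated assumptions rather than the exact implementation used in the study: the toy arrays `X_num` and `X_cat` are hypothetical, and `KNNImputer` uses a NaN-aware Euclidean distance, which can differ slightly from the plain Euclidean distance written in step 4.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy data (hypothetical): two numerical columns and one label-encoded categorical column.
X_num = np.array([
    [1.0, 10.0],
    [2.0, np.nan],   # missing numerical value to impute
    [3.0, 30.0],
    [4.0, 40.0],
    [5.0, 50.0],
    [6.0, 60.0],
])
X_cat = np.array([[0.0], [1.0], [1.0], [np.nan], [1.0], [0.0]])

# Numerical features: average over the 5 nearest neighbours.
num_imputer = KNNImputer(n_neighbors=5)
X_num_hat = num_imputer.fit_transform(X_num)

# Categorical features: fill with the most frequent value (mode).
cat_imputer = SimpleImputer(strategy="most_frequent")
X_cat_hat = cat_imputer.fit_transform(X_cat)

# Combine the imputed blocks into the complete dataset.
X_hat = np.hstack([X_num_hat, X_cat_hat])
print(X_hat)
```

In this toy case exactly five other rows have a value in the second column, so the missing entry becomes their mean, and the missing categorical entry becomes the mode of its column.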

3.2.3. Data Scaling

Data scaling is a crucial component of preprocessing. It changes the range of the feature values while preserving the relative differences between them, so that features measured on different scales become comparable. Min–max scaling is a widely used technique that maps the data values into a predetermined range, typically 0 to 1. The formula for min–max scaling is as follows:

$$\text{Min-Max Scaling}(x) = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$

To improve and simplify the data analysis process, we used min–max scaling to normalize the feature values in our study.
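A short sketch of Equation (1) applied column by column is given below. The function name `min_max_scale` and the sample values are illustrative assumptions; scikit-learn's `MinMaxScaler` performs the equivalent transformation.

```python
import numpy as np

def min_max_scale(X):
    """Column-wise min-max scaling to [0, 1], following Equation (1)."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Guard against division by zero for constant columns.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span

# Hypothetical feature matrix with two columns on different scales.
X = np.array([[50.0, 1.0], [60.0, 3.0], [80.0, 2.0]])
X_scaled = min_max_scale(X)
print(X_scaled)
```

Each column's minimum maps to 0 and its maximum to 1, with intermediate values placed proportionally in between.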

3.3. Feature Selection

Feature selection enhances the accuracy and prediction capability of ML algorithms by identifying the most important variables and eliminating unnecessary ones. This technique is crucial because it helps simplify the model, reduce overfitting, and make it easier to interpret and present the results [47].

With a dataset of 24 features, the decision-making process becomes more complex. To streamline this process, we incorporated feature selection using the SelectKBest method and a mutual information score. Using a filter-based approach, SelectKBest assesses the significance of each feature, regardless of the ML model employed [48]. Statistical techniques are used to assess and rank the features based on their relationship with the output variable. Here, the mutual information score plays a crucial role, as it quantifies the level of dependence between each feature and the response variable.

The mutual information between feature X and response variable Y is defined as follows:

$$\text{Mutual Information} = \sum_{x \in X,\, y \in Y} p(x, y) \log\left(\frac{p(x, y)}{p(x)\, p(y)}\right) \qquad (2)$$

where $p(x, y)$ denotes the joint probability distribution function of X and Y, whereas $p(x)$ and $p(y)$ represent the marginal probability distribution functions of X and Y, respectively.
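To make Equation (2) concrete, the following sketch evaluates it for two small joint probability tables. The helper name `mutual_information` and the example tables are hypothetical; the natural logarithm is assumed, so the result is in nats.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()               # normalise to a probability table
    px = joint.sum(axis=1, keepdims=True)     # marginal p(x), column vector
    py = joint.sum(axis=0, keepdims=True)     # marginal p(y), row vector
    mask = joint > 0                          # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

# Independent variables: the joint equals the product of the marginals.
mi_independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])

# Perfectly dependent binary variables: knowing X determines Y.
mi_dependent = mutual_information([[0.5, 0.0], [0.0, 0.5]])

print(mi_independent, mi_dependent)
```

The score is zero when the feature carries no information about the response and grows as the dependence becomes stronger, which is why it can be used to rank features.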

Using SelectKBest with mutual information score, we identified the top 12 features, which are detailed in Table 1. These selected features, significant both statistically and clinically for CKD, are used for further analysis and model development. Adding them to the model makes it better at identifying and predicting CKD, which shows how useful it is for finding the disease early and treating it well.
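A minimal sketch of this selection step with scikit-learn follows. The data are a synthetic stand-in generated with `make_classification`, not the CKD dataset, and fixing `random_state` through `functools.partial` is an assumption made here only to keep the mutual information estimate reproducible.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in for a 400 x 24 feature matrix (not the real CKD data).
X, y = make_classification(
    n_samples=400, n_features=24, n_informative=8, random_state=0
)

# mutual_info_classif uses a randomised nearest-neighbour estimator,
# so the seed is fixed to make the ranking repeatable.
score_func = partial(mutual_info_classif, random_state=0)

selector = SelectKBest(score_func=score_func, k=12)
X_selected = selector.fit_transform(X, y)

# Column indices of the 12 retained features.
selected_idx = selector.get_support(indices=True)
print(selected_idx)
```

Because SelectKBest is a filter method, the ranking is computed once from the data and does not depend on the downstream classifier.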

A correlation heatmap (Figure 4) is used to evaluate the linear relationships between different features of CKD. The strength and direction of these associations are given by the correlation coefficient, which ranges from −1 to +1. A positive correlation means that the two variables tend to increase or decrease together, whereas a negative correlation means that one tends to increase as the other decreases. A value of 0 indicates no linear relationship. In the heatmap, dark blue denotes strong negative correlations, while dark red denotes strong positive correlations. The heatmap reveals that hemoglobin, packed cell volume, and specific gravity are strongly correlated with CKD. For example, specific gravity has a strong positive correlation with CKD (0.71), making it a significant predictor, while albumin shows a strong negative correlation (−0.61). This visual representation helps identify key clinical and laboratory features that play an important role in CKD detection and progression, highlighting variables that may warrant further investigation in clinical practice or ML models.
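The sign of the correlation coefficient can be illustrated with a small numerical sketch. The arrays below are hypothetical and chosen so that the relationships are exactly linear; they are not values from the dataset.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_same = 2.0 * x + 1.0        # rises whenever x rises
y_opposite = -0.5 * x + 20.0  # falls whenever x rises

# np.corrcoef returns the 2 x 2 correlation matrix; the off-diagonal entry is r.
r_positive = np.corrcoef(x, y_same)[0, 1]
r_negative = np.corrcoef(x, y_opposite)[0, 1]

print(r_positive, r_negative)
```

Applying `np.corrcoef` to all feature columns at once yields the matrix that a heatmap such as Figure 4 visualizes.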

3.4. Model Training

One kind of feedforward neural network that excels at handling non-linearly separable input is the Multi-layer Perceptron (MLP). Figure 5 shows an MLP model with two hidden layers. In each layer, there are neurons, and neurons in adjacent layers are fully connected to each other. Each connection has an associated weight, determining the strength of the connection. These weights are learned during the training process [49,50]. Mathematically, the operation of a neuron can be described as follows:

$$z = \sum_{i=1}^{n} w_{i} x_{i} + b \qquad (3)$$

where $z$ is the weighted sum of inputs, $w_{i}$ are the weights, $x_{i}$ are the input features, and $b$ is the bias term. Every neuron in the hidden layers and the output layer utilizes an activation function $f$ on its weighted input sum:

$$a = f(z) \qquad (4)$$
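Equations (3) and (4) can be traced for a single neuron as follows. The input, weight, and bias values are arbitrary illustrative numbers, and the choice of ReLU as the activation $f$ is an assumption for this sketch.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return np.maximum(0.0, z)

def neuron(x, w, b, activation=relu):
    """Return the weighted sum z (Eq. 3) and the activation a = f(z) (Eq. 4)."""
    z = float(np.dot(w, x) + b)
    a = float(activation(z))
    return z, a

x = np.array([1.0, 2.0, 3.0])      # input features (illustrative)
w = np.array([0.5, -1.0, 0.25])    # connection weights (illustrative)

z_low, a_low = neuron(x, w, b=0.1)    # z = 0.5 - 2.0 + 0.75 + 0.1 = -0.65
z_high, a_high = neuron(x, w, b=1.0)  # z = 0.5 - 2.0 + 0.75 + 1.0 = 0.25

print(z_low, a_low, z_high, a_high)
```

With this activation a negative weighted sum produces an output of zero, while a positive one is passed through unchanged.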

MLPs are trained using the backpropagation algorithm, which computes gradients of a loss function $L$ with respect to the model’s parameters and updates the parameters iteratively to minimize the loss [51]. The weight update rule can be expressed as:

$$w_{ij} \leftarrow w_{ij} - \eta \frac{\partial L}{\partial w_{ij}} \qquad (5)$$

where $w_{ij}$ is the weight between neuron i and neuron j, $\eta$ is the learning rate, and $\frac{\partial L}{\partial w_{ij}}$ is the gradient of the loss function with respect to the weight. In practice, these updates are carried out by a gradient-based optimization algorithm, allowing the MLP to learn complex patterns and perform tasks effectively.
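As a worked instance of the update rule in Equation (5), the sketch below performs one gradient step for a single linear neuron. The squared-error loss, the numerical values, and the absence of a bias term are simplifying assumptions; a full MLP would obtain the gradients for every layer through backpropagation.

```python
import numpy as np

# One training example and the current weights (illustrative values).
x = np.array([1.0, 2.0])
y = 1.0
w = np.array([0.5, -0.5])
eta = 0.1  # learning rate

def loss(weights):
    """Squared-error loss L = 0.5 * (w.x - y)^2 for a linear neuron."""
    return 0.5 * (weights @ x - y) ** 2

# Gradient of L with respect to w: dL/dw = (w.x - y) * x
grad = (w @ x - y) * x

# Equation (5): move the weights against the gradient.
w_new = w - eta * grad

loss_before, loss_after = loss(w), loss(w_new)
print(w_new, loss_before, loss_after)
```

Here the prediction moves from −0.5 to 0.25, closer to the target of 1, so the loss drops after a single step.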

In this study, we propose an MLP model for the prediction of CKD. The values of the hyperparameters utilized to construct the MLP model are presented in Table 2. The model consists of an output layer, two hidden layers, and an input layer. The input layer incorporates 12 features associated with CKD. The initial hidden layer comprises 59 neurons, the subsequent hidden layer contains 94 neurons, and the output layer is responsible for classifying CKD. The model employs the rectified linear unit (ReLU) activation function, which is defined as follows:

$$\text{ReLU}(\xi) = \max(0, \xi)$$

To prevent overfitting, we apply an L2 penalty (alpha) with a value of 0.005. The learning rate is set to 0.01, and the model is trained for 100 iterations. We set a momentum of 0.3 to help accelerate gradient vectors towards faster convergence and use a random state of 42. The Adam optimizer is used to minimize the loss function.
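The reported configuration could be expressed with scikit-learn's `MLPClassifier` roughly as follows. This is a sketch under assumptions: the mapping of the reported learning rate to `learning_rate_init` and of the iteration count to `max_iter` is a guess at how the hyperparameters in Table 2 were set, and the data are a synthetic stand-in rather than the CKD dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in with 12 input features (not the real CKD data).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6, random_state=0
)

# 75% training / 25% testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

mlp = MLPClassifier(
    hidden_layer_sizes=(59, 94),  # two hidden layers
    activation="relu",
    solver="adam",
    alpha=0.005,                  # L2 penalty
    learning_rate_init=0.01,      # assumed mapping of the reported learning rate
    max_iter=100,                 # assumed mapping of the reported iterations
    momentum=0.3,                 # scikit-learn only uses this when solver="sgd"
    random_state=42,
)
mlp.fit(X_train, y_train)
test_accuracy = mlp.score(X_test, y_test)
print(test_accuracy)
```

Note that in scikit-learn the `momentum` argument has an effect only with the SGD solver, so with Adam it is carried along without influencing training. The model may also emit a convergence warning if 100 iterations are not enough on a given dataset.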

XAI Integration Module

The XAI Integration step employs the LIME technique to offer lucid explanations for the outputs produced by our proposed model. LIME is a technique used to understand the relationship between input parameters and the output of a pre-trained model, regardless of its complexity [52]. Using a linear model or another smaller, more interpretable model trained on a portion of the original data centered on the instance of interest, this strategy approximates the behavior of a complicated model near a given data point. This subset is created by altering the attributes of the instance while maintaining the label unchanged. By analyzing the response of the simplified model to these altered examples, we can deduce the behavior of the original model. Mathematically, LIME explanations for a specific observation x are formulated as follows:

$$\psi(x) = \arg\min_{\lambda \in \Lambda} \; J(f, \lambda, \sigma_{x}) + \Gamma(\lambda) \qquad (6)$$

In this equation, the following is true:

$\Lambda$ is the set of interpretable models and $\lambda \in \Lambda$ denotes the explanation model.
The function $f$, the model being explained, maps from $\mathbb{R}^{d}$ to $\mathbb{R}$.
$\sigma_{x}(z)$ measures the proximity between $z$ and $x$.
$\Gamma(\lambda)$ quantifies the complexity of $\lambda$.
The loss function $J$ measures how poorly $\lambda$ approximates $f$ within the region defined by $\sigma_{x}$.

LIME operates by minimizing the loss function $J$ without making assumptions about $f$, which underscores its model-agnostic nature.

The XAI Integration phase comprises six primary steps, as illustrated in Algorithm 2. To begin, we load the pre-trained CKD model and choose a specific sample for both prediction and explanation. Next, we create perturbed variations of this sample and train a regression model on the resulting dataset. Finally, we compute the importance of each feature in the local model and produce an explanation that emphasizes the most influential features.

Algorithm 2 LIME for CKD prediction model

Require: Pre-trained CKD prediction model
Ensure: Explanation of the model prediction for a sample
1: procedure LIME_Explanation
2:     Load the CKD prediction model
3:     Select a sample
4:     Generate perturbations of the selected sample
5:     Fit a regression model on these perturbations
6:     Compute feature importance in the local model
7:     Provide an explanation by emphasizing the most influential features
8: end procedure
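The procedure in Algorithm 2 can be illustrated with a simplified, NumPy-only local surrogate. This is a sketch of the general idea and not the `lime` package used in the study: the function names `lime_explain` and `black_box`, the Gaussian perturbation scheme, the exponential proximity kernel, and the small ridge penalty (standing in for the complexity term) are all assumptions made for illustration. The perturbed samples are scored by querying the black-box model.

```python
import numpy as np

def lime_explain(predict, x, n_samples=2000, scale=0.1,
                 kernel_width=0.75, ridge=1e-3, seed=0):
    """Fit a proximity-weighted linear surrogate around x (LIME-style sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Step 4: generate perturbations of the selected sample.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # Query the black-box model on the perturbed samples.
    target = predict(Z)
    # Proximity weights: samples closer to x count more.
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # Step 5: weighted ridge regression on centred features plus an intercept.
    A = np.hstack([Z - x, np.ones((n_samples, 1))])
    reg = ridge * np.eye(A.shape[1])
    reg[-1, -1] = 0.0  # do not penalise the intercept
    lhs = A.T @ (weights[:, None] * A) + reg
    rhs = A.T @ (weights * target)
    coef = np.linalg.solve(lhs, rhs)
    # Step 6: the slopes act as local feature importances.
    return coef[:-1], coef[-1]

def black_box(Z):
    """Hypothetical stand-in for a trained classifier's positive-class probability."""
    logits = 3.0 * Z[:, 0] - 2.0 * Z[:, 1]  # feature 2 is ignored on purpose
    return 1.0 / (1.0 + np.exp(-logits))

x0 = np.array([0.2, 0.1, 0.5])  # step 3: the sample to explain
importance, intercept = lime_explain(black_box, x0)

# Step 7: rank features by the magnitude of their local effect.
ranking = np.argsort(-np.abs(importance))
print(importance, ranking)
```

In this toy setting the surrogate recovers that the first feature pushes the prediction up, the second pushes it down, and the third has almost no local effect. The real `lime` tabular explainer additionally discretizes features and selects a limited number of them for the explanation.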

3.5. Performance Metrics

To evaluate the efficacy of the proposed model, we use several metrics, including accuracy, precision, recall, and F1-score. These metrics are computed from the confusion matrix shown in Table 3, which records four possible outcomes for each prediction: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).

$$\text{Accuracy} = \frac{TN + TP}{TN + FP + TP + FN} \qquad (7)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (8)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (9)$$

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (10)$$
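Equations (7) to (10) can be computed directly from the four confusion-matrix counts, as in the sketch below. The function name `classification_metrics` and the counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for a test set of 100 samples.
metrics = classification_metrics(tp=60, tn=35, fp=3, fn=2)
print(metrics)
```

With these counts, accuracy is 95/100, precision is 60/63, recall is 60/62, and the F1-score is 120/125.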

4. Results

In this research, various tools were utilized, such as the Python programming language and several libraries, including NumPy for performing mathematical operations, Pandas for data manipulation and analysis, Matplotlib for creating various types of plots, LIME for explainable AI, and Scikit-learn for building ML models, performing data preprocessing, feature selection, and model evaluation.

4.1. Performance Evaluation

In this study, we propose an ML model for the effective prediction of CKD. The dataset that we used was taken from the UCI ML repository, and it has a considerable number of missing values. KNN imputation was applied for numerical values, and mode imputation was used for categorical values. Subsequently, we scaled the data using min–max scaling. The dataset comprises 24 predictive features, and we used the SelectKBest method with mutual information score to select the 12 most important features, keeping the model compact and easy to use. The model is an MLP with two hidden layers, the first with 59 neurons and the second with 94 neurons. We used the Adam solver and the ReLU activation function. The data were divided into 75% for training and 25% for testing. Accuracy, precision, recall, and F1-score were among the metrics used to evaluate the model.

To compare our proposed model, we also implemented other ML models such as ridge classifier, SGD Classifier, Bernoulli NB, logistic regression, Gaussian NB, random forest, and decision tree using the Scikit-learn library. The accuracy, precision, recall, and F1-score, among other performance indicators, are displayed in Table 4 for both our proposed model and the other models. We achieved 100% accuracy, precision, recall, and F1 score with our proposed model, which is superior to others. Figure 6 visually illustrates the performance comparison between the proposed model and other models.

To further evaluate the performance of our model, we calculated the area under the ROC curve (Figure 7) and the precision-recall curve (Figure 8). Our proposed model outperformed the others, achieving a score of 1.
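For reference, the two curve-based summaries can be computed with scikit-learn as sketched below. The labels and scores are small hypothetical examples, not outputs of the proposed model; they show that a score of 1 corresponds to every positive sample being ranked above every negative one.

```python
from sklearn.metrics import average_precision_score, roc_auc_score

y_true = [0, 0, 1, 1]

# Scores that rank every positive above every negative.
perfect_scores = [0.1, 0.2, 0.8, 0.9]
auc_perfect = roc_auc_score(y_true, perfect_scores)
ap_perfect = average_precision_score(y_true, perfect_scores)

# Scores where one negative (0.4) outranks one positive (0.35).
imperfect_scores = [0.1, 0.4, 0.35, 0.8]
auc_imperfect = roc_auc_score(y_true, imperfect_scores)

print(auc_perfect, ap_perfect, auc_imperfect)
```

In the second case three of the four positive-negative pairs are ordered correctly, so the area under the ROC curve falls to 0.75.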

4.2. Model Explainability

Our study utilizes a LIME model to gain insights into our ML model’s predictions. Figure 9 showcases this for a patient predicted with a high probability (0.98) of CKD. Elevated hemoglobin (>14.80) and serum creatinine (0.90–1.30) are key factors influencing this prediction, despite the absence of hypertension and diabetes. The patient’s specific values (hemoglobin 16.30, creatinine 1.00) align with these high-risk features.

Conversely, Figure 10 demonstrates a patient confidently (1.00) predicted to have normal kidney function. Here, the model emphasizes the importance of normal serum creatinine (>2.70) and albumin (>2.00) in excluding CKD. The patient’s values (creatinine 15.00, albumin 3.00) support this interpretation.

5. Discussion

CKD is a progressive condition that often goes undetected until advanced stages, making early diagnosis crucial for effective intervention. The analysis and diagnosis of CKD are being enhanced by applying ML techniques. ML models can sift through complex datasets in search of insights and patterns that would be impossible to uncover with more conventional forms of research. The research presented in this paper focuses on the development of an explainable ML model for the prediction of CKD. Our study aims to address key challenges in ML-based medical diagnosis, including dataset balancing, handling missing data, feature selection, and model transparency.

We employed the CKD dataset from the UCI ML Repository, which is known for its large proportion of missing values. We addressed this by applying KNN imputation to numerical features and mode imputation to categorical features, and then normalized the feature values with min–max scaling, which improved the performance of the model. The dataset contains 24 predictive features. Using the SelectKBest technique with the mutual information score, we selected the twelve most important features: specific gravity, albumin, blood glucose random, blood urea, serum creatinine, sodium, potassium, hemoglobin, packed cell volume, red blood cell count, hypertension, and diabetes mellitus (Table 1). These selected features are significant both statistically and clinically for CKD, and this selection procedure helps preserve our model’s robustness and generalizability. Our proposed MLP model comprises an input layer, two hidden layers, and an output layer. The first hidden layer consists of 59 neurons, and the second hidden layer has 94 neurons. The model uses the ReLU activation function and is optimized using the Adam solver. To prevent overfitting, we apply an L2 penalty with a value of 0.005. The learning rate is set to 0.01, and the model is trained for 100 iterations. We set the momentum to 0.3 to help accelerate convergence and use a random state of 42 (Table 2). To evaluate the model’s performance, the dataset was divided into 75% for training and 25% for testing. We assessed the model using accuracy, precision, recall, and F1-score. Compared with other popular ML models, it performs better than the Bernoulli NB, Gaussian NB, logistic regression, DT, RF, ridge, and SGD classifiers (Table 4). In addition, we compared our model to other models in the literature that utilize the same dataset. Table 5 shows that our model matches or exceeds the accuracy reported in these prior studies.
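
The preprocessing and feature-selection steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses a synthetic stand-in for the CKD table, and picks `n_neighbors=5` and the column layout arbitrarily, since the text does not state them. Fitting the transformers on the training split only is a common safeguard against leakage and is likewise an assumption here.

```python
from functools import partial

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for the CKD table (NOT the real data):
# 400 rows, 20 numeric columns and 4 binary-encoded categorical columns.
n_rows, n_num, n_cat = 400, 20, 4
y = rng.integers(0, 2, size=n_rows)
X_num = rng.normal(size=(n_rows, n_num)) + y[:, None] * np.linspace(0.0, 2.0, n_num)
X_cat = (rng.random((n_rows, n_cat)) < (0.3 + 0.4 * y[:, None])).astype(float)
X = np.hstack([X_num, X_cat])

# Knock out roughly 10% of the entries to mimic missing values.
X[rng.random(X.shape) < 0.1] = np.nan

# 75% / 25% train/test split, as stated in the text.
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

num_cols = list(range(n_num))
cat_cols = list(range(n_num, n_num + n_cat))

prep = Pipeline([
    # KNN imputation for numeric columns, mode imputation for categorical ones.
    ("impute", ColumnTransformer([
        ("num", KNNImputer(n_neighbors=5), num_cols),  # k=5 is an assumption
        ("cat", SimpleImputer(strategy="most_frequent"), cat_cols),
    ])),
    # Min-max scaling maps each training feature into [0, 1].
    ("scale", MinMaxScaler()),
    # Keep the 12 features with the highest mutual information with the label.
    ("select", SelectKBest(partial(mutual_info_classif, random_state=42), k=12)),
])

X_train = prep.fit_transform(X_train_raw, y_train)
X_test = prep.transform(X_test_raw)
print(X_train.shape, X_test.shape)
```

Wrapping the steps in a single `Pipeline` keeps the imputation, scaling, and selection parameters learned on the training data and reuses them unchanged on the test data.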

The main problem with ML models is their lack of transparency. Even when ML models are highly accurate, doctors and patients do not always trust them, which is a major issue in the healthcare industry. When healthcare providers are unable to comprehend or justify the reasoning behind a model’s prediction, such as a disease diagnosis, trust is lost, and this has limited the practical adoption of ML models in the medical sector. To combat this, XAI techniques can make decision-making processes more open and clear, which benefits patients and doctors alike and leads to more effective and ethical AI healthcare applications. Such transparency matters because predictions made by AI in healthcare, if not properly understood, might lead to wrong diagnoses, inappropriate treatment suggestions, or potentially fatal outcomes [54,55,56].

Our research makes a noteworthy contribution by incorporating XAI techniques to improve the transparency of the model. We used the LIME technique to offer clear and comprehensible insights into the predictions made by the model. With this integration, the model not only predicts CKD but also explains its predictions, addressing the crucial matter of transparency in ML-based medical diagnosis. The explanations produced by LIME help physicians and patients understand the factors that influence the model’s decisions, thereby enhancing trust and enabling better-informed medical choices. The results of this study have substantial implications for the development of ML models for medical diagnostics. By improving upon previous efforts in data imputation, feature selection, and model explainability, our technique provides a trustworthy and reliable tool for CKD prediction.
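
To make the idea behind LIME more concrete, the sketch below implements a simplified local surrogate from scratch with NumPy: it perturbs one instance, queries a black-box model, weights the perturbed points by proximity, and fits a weighted linear model whose coefficients serve as local feature contributions. It is not the `lime` package and not the authors' code; the function names, the toy black-box model, the Gaussian perturbation, and the kernel width are all illustrative assumptions, and the real library additionally discretizes tabular features by default.

```python
import numpy as np


def lime_style_explanation(predict_proba, x, n_samples=2000, scale=0.1,
                           kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around x.

    Returns one coefficient per feature; the sign shows whether the feature
    pushes the predicted probability up or down near x.
    """
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    # 2. Query the black-box model on the perturbed points.
    p = predict_proba(Z)
    # 3. Weight each perturbed point by its proximity to x (exponential kernel).
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # 4. Weighted least squares for an intercept plus one slope per feature.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], p * sw, rcond=None)
    return coef[1:]


def toy_black_box(Z):
    # Hypothetical model: the predicted probability falls with feature 0,
    # rises with feature 1, and ignores feature 2.
    logits = -3.0 * Z[:, 0] + 2.0 * Z[:, 1]
    return 1.0 / (1.0 + np.exp(-logits))


x = np.array([0.4, 0.6, 0.5])  # one scaled instance (made-up values)
weights = lime_style_explanation(toy_black_box, x)
for name, wt in zip(["feature_0", "feature_1", "feature_2"], weights):
    print(f"{name}: {wt:+.3f}")
```

In this toy setting the surrogate assigns a negative weight to the first feature, a positive weight to the second, and a weight near zero to the ignored third feature, which is the kind of per-patient signal a LIME plot visualizes.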

6. Conclusions

This paper presents an explainable ML model for the early identification of CKD, employing an MLP architecture alongside LIME to improve interpretability. The proposed model performed well in predicting CKD while providing transparency in its decision-making process, thereby addressing a significant obstacle to the integration of machine learning in healthcare. By elucidating the prediction process, the model enhances trust and enables healthcare practitioners to assess its dependability, which may lead to improved clinical decision-making and patient outcomes. The incorporation of explainable AI helps connect sophisticated technology with clinical application, fostering broader acceptance of machine learning models in medical practice. Subsequent efforts will concentrate on verifying the model’s efficacy using larger and more varied datasets and on enhancing its interpretability features to better assist medical practitioners in comprehending the model’s predictive behavior. Overall, the combination of LIME and our MLP model is a step toward trustworthy and transparent AI in healthcare and lays the groundwork for further advances in ML-based medical diagnostics.

Future research could apply our methodology to additional medical disorders to validate the generalizability and utility of explainable ML models in healthcare. Although our model exhibits promising results, it should be continuously updated and validated with a broader and more diverse array of datasets to support its long-term reliability and effectiveness. The model’s practical applicability could be further enhanced by integrating other advanced XAI techniques and developing user-friendly interfaces for clinical practitioners.

Author Contributions

D.A., conceptualization, data curation, methodology, software, validation, visualization, writing—original draft; M.S.A., conceptualization, methodology, validation, project administration, visualization, writing—original draft; A.U.R., funding acquisition, supervision, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication.

Data Availability Statement

The data used in this study are publicly available at https://archive.ics.uci.edu/dataset/336/chronic+kidney+disease (accessed on 10 June 2023).

Acknowledgments

We would like to express our sincere appreciation to Prince Sultan University, Riyadh, Saudi Arabia, for their invaluable support in facilitating the publication of this paper through the Theoretical and Applied Sciences Lab.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

1. Podkowińska, A.; Formanowicz, D. Chronic kidney disease as oxidative stress- and inflammatory-mediated cardiovascular disease. Antioxidants 2020, 9, 752.
2. Wadei, H.M.; Textor, S.C. The role of the kidney in regulating arterial blood pressure. Nat. Rev. Nephrol. 2012, 8, 602–609.
3. Romagnani, P.; Remuzzi, G.; Glassock, R.; Levin, A.; Jager, K.J.; Tonelli, M.; Massy, Z.; Wanner, C.; Anders, H.J. Chronic kidney disease. Nat. Rev. Dis. Prim. 2017, 3, 1–24.
4. Kalantar-Zadeh, K.; Jafar, T.H.; Nitsch, D.; Neuen, B.L.; Perkovic, V. Chronic kidney disease. Lancet 2021, 398, 786–802.
5. Hussain, S.; Jamali, M.C.; Habib, A.; Hussain, M.S.; Akhtar, M.; Najmi, A.K. Diabetic kidney disease: An overview of prevalence, risk factors, and biomarkers. Clin. Epidemiol. Glob. Health 2021, 9, 2–6.
6. Burnier, M.; Damianaki, A. Hypertension as cardiovascular risk factor in chronic kidney disease. Circ. Res. 2023, 132, 1050–1063.
7. Jager, K.J.; Kovesdy, C.; Langham, R.; Rosenberg, M.; Jha, V.; Zoccali, C. A single number for advocacy and communication—Worldwide more than 850 million individuals have kidney diseases. Nephrol. Dial. Transplant. 2019, 34, 1803–1805.
8. Kovesdy, C.P. Epidemiology of chronic kidney disease: An update 2022. Kidney Int. Suppl. 2022, 12, 7–11.
9. Vassalotti, J.A.; Centor, R.; Turner, B.J.; Greer, R.C.; Choi, M.; Sequist, T.D.; National Kidney Foundation Kidney Disease Outcomes Quality Initiative. Practical approach to detection and management of chronic kidney disease for the primary care clinician. Am. J. Med. 2016, 129, 153–162.
10. Inker, L.A.; Titan, S. Measurement and estimation of GFR for use in clinical practice: Core curriculum 2021. Am. J. Kidney Dis. 2021, 78, 736–749.
11. Webster, A.C.; Nagler, E.V.; Morton, R.L.; Masson, P. Chronic kidney disease. Lancet 2017, 389, 1238–1252.
12. Martin, H. Laboratory measurement of urine albumin and urine total protein in screening for proteinuria in chronic kidney disease. Clin. Biochem. Rev. 2011, 32, 97.
13. Levey, A.S.; Inker, L.A. Assessment of glomerular filtration rate in health and disease: A state of the art review. Clin. Pharmacol. Ther. 2017, 102, 405–419.
14. Stevens, L.A.; Levey, A.S. Current status and future perspectives for CKD testing. Am. J. Kidney Dis. 2009, 53, S17–S26.
15. Asif, D.; Bibi, M.; Arif, M.S.; Mukheimer, A. Enhancing Heart Disease Prediction through Ensemble Learning Techniques with Hyperparameter Optimization. Algorithms 2023, 16, 308.
16. Awan, M.Z.; Arif, M.S.; Abideen, M.Z.U.; Abodayeh, K. Comparative analysis of machine learning models for breast cancer prediction and diagnosis: A dual-dataset approach. Indones. J. Electr. Eng. Comput. Sci. 2024, 34, 2032–2044.
17. Kolasa, K.; Admassu, B.; Hołownia-Voloskova, M.; Kędzior, K.J.; Poirrier, J.E.; Perni, S. Systematic reviews of machine learning in healthcare: A literature review. Expert Rev. Pharmacoeconomics Outcomes Res. 2024, 24, 63–115.
18. Ho, Y.S.; Fülöp, T.; Krisanapan, P.; Soliman, K.M.; Cheungpasitporn, W. Artificial intelligence and machine learning trends in kidney care. Am. J. Med. Sci. 2024, 367, 281–295.
19. Almustafa, K.M. Prediction of chronic kidney disease using different classification algorithms. Inform. Med. Unlocked 2021, 24, 100631.
20. Poonia, R.C.; Gupta, M.K.; Abunadi, I.; Albraikan, A.A.; Al-Wesabi, F.N.; Hamza, M.A.; B, T. Intelligent Diagnostic Prediction and Classification Models for Detection of Kidney Disease. Healthcare 2022, 10, 371.
21. Greener, J.G.; Kandathil, S.M.; Moffat, L.; Jones, D.T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 2022, 23, 40–55.
22. Ghazal, T.M.; Hasan, M.K.; Alshurideh, M.T.; Alzoubi, H.M.; Ahmad, M.; Akbar, S.S.; Al Kurdi, B.; Akour, I.A. IoT for smart cities: Machine learning approaches in smart healthcare—A review. Future Internet 2021, 13, 218.
23. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting black-box models: A review on explainable artificial intelligence. Cogn. Comput. 2024, 16, 45–74.
24. Fröhlich, H.; Balling, R.; Beerenwinkel, N.; Kohlbacher, O.; Kumar, S.; Lengauer, T.; Maathuis, M.H.; Moreau, Y.; Murphy, S.A.; Przytycka, T.M.; et al. From hype to reality: Data science enabling personalized medicine. BMC Med. 2018, 16, 150.
25. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
26. Halder, R.K.; Uddin, M.N.; Uddin, M.A.; Aryal, S.; Saha, S.; Hossen, R.; Ahmed, S.; Rony, M.A.T.; Akter, M.F. ML-CKDP: Machine learning-based chronic kidney disease prediction with smart web application. J. Pathol. Inform. 2024, 15, 100371.
27. Alturki, N.; Altamimi, A.; Umer, M.; Saidani, O.; Alshardan, A.; Alsubai, S.; Omar, M.; Ashraf, I. Improving Prediction of Chronic Kidney Disease Using KNN Imputed SMOTE Features and TrioNet Model. Comput. Model. Eng. Sci. 2024, 139, 3513–3534.
28. Rahman, M.M.; Al-Amin, M.; Hossain, J. Machine learning models for chronic kidney disease diagnosis and prediction. Biomed. Signal Process. Control 2024, 87, 105368.
29. Mahajan, P.; Uddin, S.; Hajati, F.; Moni, M.A.; Gide, E. A comparative evaluation of machine learning ensemble approaches for disease prediction using multiple datasets. Health Technol. 2024, 14, 597–613.
30. Kaur, C.; Kumar, M.S.; Anjum, A.; Binda, M.B.; Mallu, M.R.; Al Ansari, M.S. Chronic Kidney Disease Prediction Using Machine Learning. J. Adv. Inf. Technol. 2023, 14, 384–391.
31. Swain, D.; Mehta, U.; Bhatt, A.; Patel, H.; Patel, K.; Mehta, D.; Acharya, B.; Gerogiannis, V.C.; Kanavos, A.; Manika, S. A Robust Chronic Kidney Disease Classifier Using Machine Learning. Electronics 2023, 12, 212.
32. Arif, M.S.; Mukheimer, A.; Asif, D. Enhancing the early detection of chronic kidney disease: A robust machine learning model. Big Data Cogn. Comput. 2023, 7, 144.
33. Farjana, A.; Liza, F.T.; Pandit, P.P.; Das, M.C.; Hasan, M.; Tabassum, F.; Hossen, M.H. Predicting Chronic Kidney Disease Using Machine Learning Algorithms. In Proceedings of the 2023 IEEE 13th Annual Computing and Communication Workshop and Conference, Virtual, 8–11 March 2023; pp. 1267–1271.
34. Islam, M.A.; Majumder, M.Z.H.; Hussein, M.A. Chronic kidney disease prediction based on machine learning algorithms. J. Pathol. Inform. 2023, 14, 100189.
35. Venkatesan, V.K.; Ramakrishna, M.T.; Izonin, I.; Tkachenko, R.; Havryliuk, M. Efficient data preprocessing with ensemble machine learning technique for the early detection of chronic kidney disease. Appl. Sci. 2023, 13, 2885.
36. Ganie, S.M.; Dutta Pramanik, P.K.; Mallik, S.; Zhao, Z. Chronic kidney disease prediction using boosting techniques based on clinical parameters. PLoS ONE 2023, 18, e0295234.
37. Shukla, G.; Dhuriya, G.; Pillai, S.K.; Saini, A. Chronic kidney disease prediction using machine learning algorithms and the important attributes for the detection. In Proceedings of the 2023 IEEE IAS Global Conference on Emerging Technologies (GlobConET), Warsaw, Poland, 19–21 May 2023; pp. 1–4.
38. Rubini, L.; Soundarapandian, P.; Eswaran, P. Chronic Kidney Disease. UCI Machine Learning Repository. 2015. Available online: https://archive.ics.uci.edu/dataset/336/chronic+kidney+disease (accessed on 10 May 2024).
39. Bellazzi, R.; Zupan, B. Predictive data mining in clinical medicine: Current issues and guidelines. Int. J. Med. Inform. 2008, 77, 81–97.
40. García, S.; Ramírez-Gallego, S.; Luengo, J.; Benítez, J.M.; Herrera, F. Big data preprocessing: Methods and prospects. Big Data Anal. 2016, 1, 9.
41. Salgado, C.M.; Azevedo, C.; Proença, H.; Vieira, S.M. Missing data. In Secondary Analysis of Electronic Health Records; Springer: Cham, Switzerland, 2016; pp. 143–162.
42. Carpenter, J.R.; Smuk, M. Missing data: A statistical framework for practice. Biom. J. 2021, 63, 915–947.
43. Emmanuel, T.; Maupong, T.; Mpoeleng, D.; Semong, T.; Mphago, B.; Tabona, O. A survey on missing data in machine learning. J. Big Data 2021, 8, 140.
44. Dong, Y.; Peng, C.Y.J. Principled missing data methods for researchers. SpringerPlus 2013, 2, 222.
45. Beretta, L.; Santaniello, A. Nearest neighbor imputation algorithms: A critical evaluation. BMC Med. Inform. Decis. Mak. 2016, 16, 197–208.
46. Zhang, S. Nearest neighbor selection for iteratively kNN imputation. J. Syst. Softw. 2012, 85, 2541–2552.
47. Pudjihartono, N.; Fadason, T.; Kempa-Liehr, A.W.; O’Sullivan, J.M. A review of feature selection methods for machine learning-based disease risk prediction. Front. Bioinform. 2022, 2, 927312.
48. Saeed, M.H.; Hama, J.I. Cardiac disease prediction using AI algorithms with SelectKBest. Med. Biol. Eng. Comput. 2023, 61, 3397–3408.
49. Sonawane, J.S.; Patil, D.R. Prediction of heart disease using multilayer perceptron neural network. In Proceedings of the International Conference on Information Communication and Embedded Systems (ICICES2014), Chennai, India, 27–28 February 2014; pp. 1–6.
50. Nayeem, M.O.G.; Wan, M.N.; Hasan, M.K. Prediction of disease level using multilayer perceptron of artificial neural network for patient monitoring. Int. J. Soft Comput. Eng. 2015, 5, 17–23.
51. Sengupta, S.; Basak, S.; Saikia, P.; Paul, S.; Tsalavoutis, V.; Atiah, F.; Ravi, V.; Peters, A. A review of deep learning with special emphasis on architectures, applications and recent trends. Knowl.-Based Syst. 2020, 194, 105596.
52. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
53. Rehman, A.; Saba, T.; Ali, H.; Elhakim, N.; Ayesha, N. Hybrid machine learning model to predict chronic kidney diseases using handcrafted features for early health rehabilitation. Turk. J. Electr. Eng. Comput. Sci. 2023, 31, 951–968.
54. Srinivasu, P.N.; Sandhya, N.; Jhaveri, R.H.; Raut, R. From blackbox to explainable AI in healthcare: Existing tools and case studies. Mob. Inf. Syst. 2022, 1, 8167821.
55. Chaddad, A.; Peng, J.; Xu, J.; Bouridane, A. Survey of explainable AI techniques in healthcare. Sensors 2023, 23, 634.
56. Loh, H.W.; Ooi, C.P.; Seoni, S.; Barua, P.D.; Molinari, F.; Acharya, U.R. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Comput. Methods Programs Biomed. 2022, 226, 107161.

Figure 1. Proposed workflow.

Figure 2. Data preprocessing steps, including data encoding, imputation, and scaling.

Figure 3. The missing values.

Figure 4. Correlation heatmap illustrating the linear relationships between different features relevant to CKD.

Figure 5. A multilayer perceptron model with two hidden layers.

Figure 6. The performance comparison of various models.

Figure 7. The area under the ROC curve.

Figure 8. The precision–recall curve.

Figure 9. LIME explanation for a patient predicted to have CKD, with orange highlighting features supporting CKD and blue representing features for a normal prediction.

Figure 10. LIME explanation for a patient predicted to be normal, with blue highlighting features contributing to a normal prediction and orange indicating features that are less relevant to CKD.

Table 1. Comprehensive details of selected features for predictive modeling.

Feature	Name	Description
sg	Specific gravity	Total concentration of all chemical particles in the urine
al	Albumin	Level of albumin in blood (0, 1, 2, 3, 4, 5)
bgr	Blood glucose random	Level of glucose in blood at a random time (mg/dL)
bu	Blood urea	Level of urea nitrogen in blood (mg/dL)
sc	Serum creatinine	Level of creatinine in blood (mg/dL)
sod	Sodium	Level of sodium in the blood (mEq/L)
pot	Potassium	Level of potassium in the blood (mEq/L)
hemo	Hemoglobin	Level of hemoglobin (oxygen-transporting red protein) in blood (gms)
pcv	Packed cell volume	Proportion of blood volume occupied by cells
rc	Red blood cell count	Number of red blood cells (millions/cmm)
htn	Hypertension	Persistently raised arterial blood pressure (yes, no)
dm	Diabetes mellitus	Inadequate control of blood levels of glucose (yes, no)

Table 2. Hyperparameters and their values used to develop the MLP model.

Hyperparameter	Value
Activation	ReLU
Alpha (L2 penalty)	0.005
Hidden layer sizes	(59, 94)
Learning rate	0.01
Iterations	100
Momentum	0.3
Random state	42
Solver	Adam
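
As a rough illustration, the hyperparameters in Table 2 can be mapped onto scikit-learn's `MLPClassifier` as shown below. The source does not state which library was used, so this mapping is an assumption, in particular reading "Learning rate" as `learning_rate_init` and "Iteration" as `max_iter`. The data come from `make_classification` and are synthetic, not the CKD dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic 12-feature binary problem standing in for the selected CKD features.
X, y = make_classification(n_samples=400, n_features=12, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Min-max scaling fitted on the training split only.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

mlp = MLPClassifier(
    hidden_layer_sizes=(59, 94),  # two hidden layers
    activation="relu",
    alpha=0.005,                  # L2 penalty
    learning_rate_init=0.01,      # assumed meaning of "Learning rate"
    max_iter=100,                 # assumed meaning of "Iteration"
    momentum=0.3,                 # scikit-learn only uses this when solver="sgd"
    solver="adam",
    random_state=42,
)
mlp.fit(X_train, y_train)

# Weight matrices: input -> hidden 1 -> hidden 2 -> single output unit.
print([w.shape for w in mlp.coefs_])
print("test accuracy:", mlp.score(X_test, y_test))
```

Note that in scikit-learn the `momentum` argument is ignored by the Adam solver, so under this mapping it would have no effect on training; it is kept here only to mirror the table.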

Table 3. Confusion matrix.

	Predict Positive	Predict Negative
Actual Positive	$T P$	$F N$
Actual Negative	$F P$	$T N$
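
The four cells of the confusion matrix are enough to compute accuracy, precision, recall, and F1-score using their standard definitions. The short sketch below shows the arithmetic; the function name and the counts in the usage example are hypothetical and chosen only for illustration, not taken from the study's results.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute standard metrics from confusion-matrix counts (as fractions)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a 100-sample test set.
m = classification_metrics(tp=34, fn=1, fp=0, tn=65)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which gives a quick sanity check on any reported row of metrics.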

Table 4. The performance metrics of various models.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
Ridge classifier	95	89.47	97.14	93.15
SGD Classifier	99	100	97.14	98.55
Bernoulli NB	93	86.84	94.28	90.41
Logistic Regression	98	97.14	97.14	97.14
Gaussian NB	96	89.74	100	94.59
Random Forest	98	100	94.28	97.05
Decision Tree	96	94.28	94.28	94.28
The Proposed Model	100	100	100	100

Table 5. The comparison of our proposed model with other studies on the same CKD dataset.

Authors	Method	Accuracy
R. K. Halder et al. (2024) [26]	Random Forest, AdaBoost	100%
N. Alturki et al. (2024) [27]	TrioNet (Extra Tree + Random Forest + XGBoost)	98.97%
M. M. Rahman et al. (2024) [28]	LightGBM	99.75%
P. Mahajan et al. (2024) [29]	Bagging, Boosting, Stacking	100%
C. Kaur et al. (2023) [30]	Random Forest	96%
D. Swain et al. (2023) [31]	Support Vector Machine	99.33%
M. S. Arif et al. (2023) [32]	k-Nearest Neighbors	100%
A. Farjana et al. (2023) [33]	LightGBM	98.30%
M. A. Islam et al. (2023) [34]	XGBoost	98.30%
V. K. Venkatesan et al. (2023) [35]	XGBoost	98%
S. M. Ganie et al. (2023) [36]	AdaBoost	98.47%
G. Shukla et al. (2023) [37]	Decision Tree	98.60%
A. Rehman et al. (2023) [53]	Logistic Regression	98.5%
Our proposed model	Multi-layer Perceptron	100%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

