1. Introduction
CKD is characterized by a progressive decline in kidney function over time and represents a significant and escalating public health concern globally. The kidneys filter excess water and waste from the bloodstream to produce urine, and they help regulate blood pressure and the body’s fluid balance. The disease is termed “chronic” because kidney damage accumulates gradually, and the condition frequently goes undetected until it is advanced. This silent progression underscores the importance of early detection: in the early stages the body can compensate for reduced kidney function, which is why there are typically no symptoms. Many people do not receive treatment until the disease has worsened considerably, and the diagnosis is often made incidentally through a routine blood or urine test [1,2,3,4].
Risk factors like hypertension, diabetes, and cardiovascular diseases significantly increase the likelihood of developing CKD [
5,
6]. The disease’s prevalence has increased globally, partly due to the rise in these risk factors, with an estimated 843.6 million individuals affected in 2017 [
7]. Despite the progress made in the treatment of CKD, it remains a significant cause of mortality on a global scale, requiring vigilant monitoring and intervention to reduce its progression and impact [
8].
Healthcare practitioners depend on essential diagnostic procedures, such as the glomerular filtration rate (GFR) and urine tests for albumin, to diagnose and monitor CKD [9]. Estimated from blood test results and patient information, the GFR indicates how well the kidneys clear waste from the body [10]. A GFR below 60 mL/min/1.73 m² may signal kidney disease, whereas values under 15 indicate kidney failure and the need for dialysis or a transplant [11]. A urine test for albumin is a valuable way to identify kidney disease because a properly functioning kidney does not let this protein pass into the urine [12]. While undoubtedly helpful, these tests are limited in how accurately they predict disease progression and in what they reveal about its underlying physiological mechanisms [13,14].
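The GFR cut-offs quoted above can be written as a small helper. This is only a minimal sketch: the function name `interpret_gfr` and its category labels are hypothetical, the thresholds (60 and 15) are the ones stated in the text, and the unit mL/min/1.73 m² is assumed as the conventional one for estimated GFR. It is not a diagnostic tool.

```python
def interpret_gfr(gfr: float) -> str:
    """Map a GFR value to the coarse categories quoted in the text.

    Assumes the value is given in mL/min/1.73 m^2.
    The labels are illustrative, not clinical terminology.
    """
    if gfr < 0:
        raise ValueError("GFR cannot be negative")
    if gfr < 15:
        return "kidney failure"
    if gfr < 60:
        return "possible kidney disease"
    return "no indication from GFR alone"


for value in (95, 45, 10):
    print(value, "->", interpret_gfr(value))
```

A single GFR value is only one input; as the paragraph notes, it is read together with the urine albumin test.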
ML has demonstrated significant potential in improving the diagnosis of various medical conditions [
15,
16,
17]. In particular, the analysis and diagnosis of CKD have benefited from the application of ML techniques [
18,
19,
20]. ML models can sift through complex datasets in search of insights and patterns that would be impossible to uncover with more conventional forms of research [
21,
22]. However, many ML models are notoriously difficult to understand, leading some to call them “black boxes.” A black box model provides outputs without revealing or explaining the internal decision-making processes [
23]. While these models excel at prediction and classification, their lack of interpretability and transparency can hinder therapeutic decision-making, leaving patients and doctors with unanswered questions about the diagnosis and treatment [
24,
25].
Introducing an explainable ML model is a significant advancement in this area. By integrating a multilayer perceptron (MLP) model with local interpretable model-agnostic explanations (LIME), our proposed approach does more than predict CKD; it also explains how and why it arrived at each prediction. This transparency supports a better understanding of the factors associated with the disease, allowing for more informed and personalized clinical recommendations. The proposed explainable ML model is intended to overcome the main shortcoming of current, opaque ML methods for diagnosing CKD. As a resource for nephrologists, the model exposes the variables that influence its predictions. This approach aims to make CKD detection more precise and to shed light on how the disease develops. As a result, patients may receive more effective, personalized care.
The rest of the paper is structured as follows. Section 2 presents a comprehensive literature review on CKD from the last two years, addressing the shortcomings of prior work and highlighting the unique features of our study. Section 3 describes our methodology and outlines the proposed system model. Section 4 analyzes the experimental results. Section 5 presents a thorough discussion of our proposed model compared with previous research, including its strengths, weaknesses, and limitations. Section 6 presents the conclusion and examines possible directions for further research.
2. Literature Review
In recent years, ML has made remarkable progress in disease prediction and clinical decision-making in the CKD field. Several studies propose ML models for the detection of CKD, and others review developments in this field. This literature review summarizes and analyzes the latest research from 2023 and 2024. It covers the methodologies, results, and limitations of these studies and offers an assessment of the progress made in this field and of the research problems that remain open.
R. K. Halder et al. (2024) developed an ML-based CKD prediction model. The model’s data preprocessing includes imputation of missing data, min–max scaling, and conversion of categorical variables to numerical values. Feature selection methods such as lasso regression, ridge regression, sequential forward selection, variance cutoff, correlation analysis, and chi-square tests were used to refine the datasets. Various predictive models were employed to forecast CKD, including decision trees (DTs), random forests (RF), adaptive boosting (AdaBoost), support vector machines (SVM), extreme gradient boosting (XgBoost), naïve Bayes (NB), and gradient boosting machine (GBM). RF and AdaBoost achieved 100% accuracy under validation schemes such as 70:30 and 80:20 splits and 10-fold cross-validation. The study did not include explainability measures to increase model transparency [26].
N. Alturki et al. proposed the TrioNet Model for CKD in 2024, which takes an ensemble of extra tree classifiers, RF, and XgBoost as its base model. The K-nearest neighbors (KNN) imputation was employed by the researchers to fill in missing data, and they applied the synthetic minority over-sampling technique (SMOTE) for data balancing. This resulted in an accuracy of 98.97%. Two significant drawbacks of this study are that it does not use feature selection and hyperparameter optimization (HPO) techniques. Additionally, applying SMOTE to the entire dataset rather than just the training set introduced biases and constraints, and the study lacked explainability techniques to enhance model transparency [
27].
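The bias mentioned above arises because balancing the whole dataset lets information derived from test samples leak into training. The sketch below shows the safer ordering (split first, then balance the training portion only). It is an illustration under stated assumptions: the data are synthetic, and plain random oversampling via `sklearn.utils.resample` is swapped in for SMOTE so that the example needs only NumPy and scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(0)
# Synthetic, imbalanced toy data (NOT the UCI CKD dataset): 80 vs 20 samples.
X = rng.normal(size=(100, 4))
y = np.array([0] * 80 + [1] * 20)

# 1) Split first, so the test set never influences the balancing step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2) Oversample the minority class in the training portion only.
#    (Random oversampling is a stand-in for SMOTE in this sketch.)
majority = X_train[y_train == 0]
minority = X_train[y_train == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)

X_train_bal = np.vstack([majority, minority_up])
y_train_bal = np.array([0] * len(majority) + [1] * len(minority_up))

print("balanced train class counts:", np.bincount(y_train_bal))
print("untouched test class counts:", np.bincount(y_test))
```

The test set keeps its original class distribution, so the reported metrics reflect data the balancing step never saw.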
A study by M. M. Rahman (2024) focused on CKD prediction using various ensemble learning classifiers, including AdaBoost, GBM, XgBoost, light GBM, RF, voting, stacking, and bagging. Multivariate imputation by chained equations was used to address missing data, and the borderline SVM-SMOTE method was employed for data balancing. Recursive feature elimination (RFE) and the Boruta method were employed to identify significant features, with RFE demonstrating superiority by selecting only 50% of the total features. Multiple performance metrics were utilized to identify the most effective classifiers for chronic kidney disease detection. Light GBM outperformed other models with the lowest compilation time and highest accuracy, achieving an average accuracy of 99.75%. However, like the different studies, this research applied the data balancing technique to the entire dataset, which might lead to biases, and did not integrate explainable techniques for model transparency [
28].
P. Mahajan et al. (2024) used a number of datasets to compare different ensemble ML methods for disease prediction. The researchers tested bagging, boosting, and stacking ensemble ML algorithms with different base classifiers. Using grid search HPO, ensemble approaches obtained 100% accuracy on the UCI CKD dataset. The study did not use sophisticated imputation methods to manage missing data, feature selection methods, or cross-validation (CV) to evaluate model generalization. The absence of explainability techniques in their methodology raises concerns about model transparency and interpretability [29].
In their study, Kaur et al. (2023) sought to utilize the UCI CKD dataset to develop a ML model that can accurately identify CKD. In the analysis of missing data, the researchers employed Little’s MCAR test. For the purpose of feature selection, ant colony optimization was utilized. The classification was accomplished using DTs, RF, and KNN algorithms; the RF classifier yielded a 96% accuracy rate. The work did not address issues with explainability, HPO, or advanced ML approaches [
30].
D. Swain (2023) conducted a study on predicting CKD using ML methods. The study specifically utilized the UCI CKD dataset. Mean imputation was used for numerical variables and mode imputation for categorical variables to fix the missing data issue. The researchers utilized the SMOTE technique to balance the data and determined nine crucial features based on the chi-squared score. By employing RF and SVM methods, they fine-tuned hyperparameters using grid search CV, resulting in an accuracy of 99.33% for SVM and 98.67% for RF. The 10-fold CV score demonstrated the model’s capacity to generalize. Nevertheless, applying SMOTE to the entire dataset instead of only the training set may induce biases. The study did not employ any explainable strategies to guarantee model transparency [
31].
The study conducted by M. S. Arif et al. (2023) introduced an ML model for predicting CKD. The model used a sequential data scaling technique that combined robust, standard, and min–max scaling, and iterative imputation was used to deal with missing values. The researchers used Gaussian NB and KNN models, optimizing hyperparameters by grid search CV. As a result, they attained a 100% accuracy rate with KNN and a 97.5% accuracy rate with Gaussian NB. Their model’s generalization was validated through a 10-fold CV score. Yet, the absence of explainable techniques in their approach raises concerns about the model’s transparency [
32].
In 2023, numerous further research projects also made contributions to the prediction of CKD using ML approaches on the UCI CKD dataset. A study that was carried out by A. Farjana et al. showed that Light GBM outperformed other models in predicting CKD, with an impressive accuracy rate of 99% [
33]. Furthermore, various ML classifiers were examined by M. A. Islam et al., with the XgBoost classifier achieving the highest performance metrics at 98.3% [
34]. V. K. Venkatesan, in a similar vein, compared the efficacy of XgBoost with a variety of base learners, such as SVM, KNN, RF, logistic regression, and DTs. The findings demonstrated that XgBoost surpassed its competitors with an accuracy of 98.00% [
35]. S. M. Ganie et al. conducted a comparative analysis of the performance of various boosting algorithms: XgBoost, CatBoost, LightGBM, AdaBoost, and GBM. AdaBoost demonstrated superior overall performance, attaining an accuracy of 98.47% [
36]. G. Shukla et al. used various ML techniques like KNN, DTs, and artificial neural networks, finding that the DTs showed the best result with 98.60% accuracy [
37]. All these studies have limitations, such as a lack of advanced preprocessing steps, advanced ML models, and explainable techniques for transparency.
This review makes clear that a number of gaps and limitations must be addressed to improve CKD prediction. This study aims to address these limitations and add new perspectives. Our work primarily makes the following contributions:
Effective preprocessing steps are used to improve the quality of the dataset, including KNN imputation for numerical features and mode imputation for categorical features.
Feature selection is performed using the SelectKBest method with mutual info score to select the top 12 features, enhancing decision-making and reducing computation time.
For the purpose of detecting and predicting CKD, an MLP model is suggested. The model is trained using 75% of the dataset and validated using 25%. By using LIME, one may better understand the model’s predictions and the reasoning behind them.
Several assessment metrics, including accuracy, precision, recall, F1-score, and curve analysis, are employed to confirm the efficacy of the proposed model. In addition, we evaluate the performance of the proposed model by comparing it to other models such as RF, DTs, KNN, ridge classifier, logistic regression, stochastic gradient descent (SGD), Bernoulli NB, and Gaussian NB.
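The preprocessing and feature selection steps listed above can be sketched with scikit-learn as follows. This is a hedged illustration, not the authors' code: the data are synthetic (11 numeric and 13 binary columns standing in for the 24 predictive features), `n_neighbors=5` and the seeds are assumed defaults, and for brevity the transformers are fitted on the whole toy array, whereas a real workflow would fit them on the training split only.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
n = 200
y = rng.integers(0, 2, size=n)

# Synthetic stand-in data: 11 numeric and 13 binary (0/1-encoded) columns.
X_num = rng.normal(size=(n, 11)) + 1.5 * y[:, None]
X_cat = (rng.random(size=(n, 13)) < (0.3 + 0.4 * y[:, None])).astype(float)

# Knock out roughly 10% of the entries to mimic missing values.
X_num[rng.random(X_num.shape) < 0.1] = np.nan
X_cat[rng.random(X_cat.shape) < 0.1] = np.nan

# KNN imputation for numeric columns, mode imputation for categorical ones.
X_num_imp = KNNImputer(n_neighbors=5).fit_transform(X_num)
X_cat_imp = SimpleImputer(strategy="most_frequent").fit_transform(X_cat)

# Min-max scaling of all 24 columns to the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(np.hstack([X_num_imp, X_cat_imp]))

# Keep the 12 features with the highest mutual information with the label.
score_fn = partial(mutual_info_classif, random_state=42)
selector = SelectKBest(score_func=score_fn, k=12)
X_selected = selector.fit_transform(X_scaled, y)

print("selected column indices:", selector.get_support(indices=True))
print("shape after selection:", X_selected.shape)
```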
4. Results
In this research, various tools were utilized, such as the Python programming language and several libraries, including NumPy for performing mathematical operations, Pandas for data manipulation and analysis, Matplotlib for creating various types of plots, LIME for explainable AI, and Scikit-learn for building ML models, performing data preprocessing, feature selection, and model evaluation.
4.1. Performance Evaluation
In this study, we propose an ML model for the effective prediction of CKD. The dataset was taken from the UCI ML repository and contains a considerable number of missing values. KNN imputation was applied to numerical values, and mode imputation was used for categorical values. Subsequently, we scaled the data using min–max scaling. The dataset comprises 24 predictive features, and we used the SelectKBest method with the mutual information score to select the 12 most important features, keeping the model compact and reducing computation time. The proposed model is an MLP with two hidden layers: the first has 59 neurons and the second has 94 neurons. We used the Adam solver and the ReLU activation function. The data were divided into 75% for training and 25% for testing. Accuracy, precision, recall, and F1-score were among the metrics used to evaluate the model.
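A minimal sketch of the training and evaluation setup described above is given here. The layer sizes (59, 94), the Adam solver, the ReLU activation, and the 75/25 split come from the text; everything else is an assumption made for illustration, including the synthetic 12-feature data from `make_classification`, the stratified split, `max_iter=500`, and the random seeds.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in with 12 features (NOT the UCI CKD dataset).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=8, random_state=0
)

# 75% training / 25% testing, as stated in the text (stratification assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two hidden layers with 59 and 94 neurons, ReLU activation, Adam solver.
mlp = MLPClassifier(
    hidden_layer_sizes=(59, 94),
    activation="relu",
    solver="adam",
    max_iter=500,  # assumed here so the toy problem converges
    random_state=0,
)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The scores printed for this toy data say nothing about the CKD results; the sketch only shows how the four metrics are obtained from a held-out test split.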
For comparison with our proposed model, we also implemented other ML models using the Scikit-learn library: ridge classifier, SGD classifier, Bernoulli NB, logistic regression, Gaussian NB, random forest, and decision tree. Table 4 reports the accuracy, precision, recall, and F1-score, among other performance indicators, for both our proposed model and the other models. Our proposed model achieved 100% accuracy, precision, recall, and F1-score, outperforming the other models. Figure 6 visually illustrates the performance comparison between the proposed model and the other models.
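A baseline comparison of this kind can be sketched as a loop over scikit-learn estimators. The list of classifiers follows the text; the synthetic data, default hyperparameters, and seeds are assumptions, so the numbers this prints are unrelated to Table 4.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier, SGDClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the UCI CKD dataset).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

baselines = {
    "ridge": RidgeClassifier(),
    "sgd": SGDClassifier(random_state=0),
    "bernoulli_nb": BernoulliNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gaussian_nb": GaussianNB(),
    "random_forest": RandomForestClassifier(random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in baselines.items():
    model.fit(X_train_s, y_train)
    y_pred = model.predict(X_test_s)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, scores in results.items():
    print(f"{name:20s} acc={scores['accuracy']:.3f} f1={scores['f1']:.3f}")
```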
To further evaluate the performance of our model, we calculated the area under the ROC curve (
Figure 7) and the precision-recall curve (
Figure 8). Our proposed model outperformed the others, achieving a score of 1.
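To make the meaning of a score of 1 concrete, the sketch below computes both summary values on two tiny hand-made score vectors. The labels and scores are invented for illustration, and `average_precision_score` is used as scikit-learn's summary of the precision-recall curve, which is an assumption about how the area was computed. A value of 1 occurs only when every positive sample is ranked above every negative one.

```python
from sklearn.metrics import average_precision_score, roc_auc_score

y_true = [0, 0, 1, 1]

# Perfect ranking: both positives score higher than both negatives.
perfect_scores = [0.1, 0.2, 0.8, 0.9]
# Imperfect ranking: one positive (0.35) falls below one negative (0.4).
imperfect_scores = [0.1, 0.4, 0.35, 0.8]

roc_perfect = roc_auc_score(y_true, perfect_scores)
ap_perfect = average_precision_score(y_true, perfect_scores)
roc_imperfect = roc_auc_score(y_true, imperfect_scores)

print("ROC AUC, perfect ranking:   ", roc_perfect)    # 1.0
print("Avg. precision, perfect:    ", ap_perfect)     # 1.0
print("ROC AUC, imperfect ranking: ", roc_imperfect)  # 0.75 (3 of 4 pairs ordered correctly)
```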
4.2. Model Explainability
Our study uses LIME to gain insight into the predictions of our ML model.
Figure 9 showcases this for a patient predicted with a high probability (0.98) of CKD. Elevated hemoglobin (>14.80) and serum creatinine (0.90–1.30) are key factors influencing this prediction, despite the absence of hypertension and diabetes. The patient’s specific values (hemoglobin 16.30, creatinine 1.00) align with these high-risk features.
Conversely,
Figure 10 demonstrates a patient confidently (1.00) predicted to have normal kidney function. Here, the model emphasizes the importance of normal serum creatinine (>2.70) and albumin (>2.00) in excluding CKD. The patient’s values (creatinine 15.00, albumin 3.00) support this interpretation.
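The idea behind such per-patient explanations can be sketched without the `lime` package. The code below is a simplified LIME-style local surrogate, not the library's implementation and not the authors' code: it perturbs one instance, queries a black-box probability function, weights the perturbed points by proximity, and fits a weighted linear model whose coefficients serve as local feature contributions. The function names, the stand-in black box, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge


def black_box_proba(X):
    """Hypothetical stand-in for a trained classifier's P(class = 1)."""
    logits = 4.0 * X[:, 0] - 2.0 * X[:, 1] + 0.0 * X[:, 2]
    return 1.0 / (1.0 + np.exp(-logits))


def explain_locally(predict_fn, instance, n_samples=2000, scale=0.5,
                    kernel_width=1.0, seed=0):
    """Return local linear weights for one instance (simplified LIME idea)."""
    rng = np.random.default_rng(seed)
    # 1) Perturb the instance with Gaussian noise.
    samples = instance + rng.normal(scale=scale, size=(n_samples, instance.size))
    # 2) Query the black box on the perturbed points.
    preds = predict_fn(samples)
    # 3) Weight each sample by its proximity to the instance.
    dists = np.linalg.norm(samples - instance, axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)
    # 4) Fit an interpretable weighted linear surrogate to the black-box output.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(samples, preds, sample_weight=weights)
    return surrogate.coef_


instance = np.array([0.2, 0.1, 0.5])
coefs = explain_locally(black_box_proba, instance)
for name, weight in zip(["feature_0", "feature_1", "feature_2"], coefs):
    print(f"{name}: {weight:+.3f}")
```

In this toy setup the surrogate assigns a positive weight to `feature_0`, a negative weight to `feature_1`, and a weight near zero to `feature_2`, mirroring how the black box uses them. The `lime` library additionally discretizes features into ranges, which is where intervals such as those quoted for Figures 9 and 10 come from.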
5. Discussion
CKD is a progressive condition that often goes undetected until advanced stages, making early diagnosis crucial for effective intervention. The analysis and diagnosis of CKD are being enhanced by applying ML techniques. ML models can sift through complex datasets in search of insights and patterns that would be impossible to uncover with more conventional forms of research. The research presented in this paper focuses on the development of an explainable ML model for the prediction of CKD. Our study aims to address key challenges in ML-based medical diagnosis, including dataset balancing, handling missing data, feature selection, and model transparency.
We employed the CKD dataset from the UCI ML Repository, which is known for its substantial proportion of missing values. We addressed this problem by using KNN imputation for numerical values and mode imputation for categorical values. Min–max scaling was then used to normalize the feature values, which improved the performance of the model. The dataset contains 24 predictive features. Using the SelectKBest technique with the mutual information score, we selected the twelve most important features, among them albumin, specific gravity, blood glucose random, sodium, potassium, hemoglobin, packed cell volume, red blood cell, hypertension, and diabetes mellitus. These selected features are significant both statistically and clinically for CKD, and this selection procedure helps preserve our model’s robustness and generalizability.
Our proposed MLP model comprises an input layer, two hidden layers, and an output layer. The first hidden layer consists of 59 neurons, and the second has 94 neurons. The model uses the ReLU activation function and was optimized using the Adam solver. To prevent overfitting, we apply an L2 penalty of 0.005. The learning rate is set to 0.01, and the model is trained for 100 iterations. We set a momentum of 0.3 to help accelerate the gradient vectors toward faster convergence and use a random state of 42.
To evaluate the model’s performance, the dataset was divided into 75% for training and 25% for testing. Several measures were used to assess our proposed model, including accuracy, precision, recall, and F1-score. Compared with other popular ML models, our model performs better than the Bernoulli NB, Gaussian NB, logistic regression, DT, RF, ridge, and SGD classifiers. In addition, we compared our model with other models in the literature that use the same dataset. Table 5 shows that our model outperforms the prior studies in terms of accuracy and overall performance.
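For readers who want to reproduce the configuration, the hyperparameters stated in this section map onto scikit-learn's `MLPClassifier` roughly as shown below. The mapping is an assumption on our part: "learning rate" is taken to mean `learning_rate_init` and "100 iterations" to mean `max_iter`. Note also that, per the scikit-learn documentation, the `momentum` parameter is only used when `solver="sgd"`, so with the Adam solver the value 0.3 is accepted but has no effect.

```python
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(59, 94),  # two hidden layers: 59 and 94 neurons
    activation="relu",
    solver="adam",
    alpha=0.005,                  # L2 penalty
    learning_rate_init=0.01,      # assumed meaning of "learning rate"
    max_iter=100,                 # assumed meaning of "100 iterations"
    momentum=0.3,                 # only used by scikit-learn when solver="sgd"
    random_state=42,
)

params = mlp.get_params()
for key in ("hidden_layer_sizes", "activation", "solver", "alpha",
            "learning_rate_init", "max_iter", "momentum", "random_state"):
    print(f"{key} = {params[key]}")
```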
The main problem with many ML models is their lack of transparency. Even when ML models are highly accurate, doctors and patients do not always trust them, which is a major issue in the healthcare industry. The opacity of their prediction processes limits the practical application of these models to healthcare tasks such as disease diagnosis. Trust erodes when healthcare providers are unable to comprehend or justify the reasoning behind an ML model’s diagnosis. As a result, such models have not been widely adopted in practice, especially in the medical sector. To address this, explainable AI (XAI) techniques can make decision-making processes more open and clear, which benefits patients and doctors alike. Such openness leads to more effective and ethical AI healthcare applications. Understanding AI predictions in healthcare is critical because misunderstood predictions might lead to wrong diagnoses, inappropriate treatment suggestions, or potentially fatal outcomes [54,55,56].
Our research makes a noteworthy contribution by incorporating XAI techniques to improve the transparency of the model. We used the LIME technique to offer clear and comprehensible insights into the predictions made by the model. This integration ensures that the model not only predicts CKD but also explains its predictions, addressing the crucial matter of transparency in ML-based medical diagnosis. The explanations produced by LIME help physicians and patients understand the factors that influence the model’s decisions, thereby enhancing trust and enabling better-informed medical choices. The results of this study have substantial implications for the development of ML models for medical diagnostics. Our technique provides a trustworthy and reliable tool for CKD prediction by improving upon previous efforts in data imputation, feature selection, and model explainability.
6. Conclusions
This paper presents an explainable ML model for the early identification of CKD, employing an MLP architecture alongside LIME to improve interpretability. The proposed model performed well in predicting CKD while providing transparency in its decision-making process, thereby addressing a significant obstacle to the integration of ML in healthcare. By elucidating the prediction process, the model enables healthcare practitioners to assess its dependability, which may lead to improved clinical decision-making and patient outcomes. The incorporation of explainable AI helps connect sophisticated technology with clinical application, fostering broader acceptance of ML models in medical practice. Subsequent efforts will concentrate on verifying the model’s efficacy using larger and more varied datasets and on enhancing its interpretability features to better assist medical practitioners in understanding the model’s predictive behavior. In sum, the combination of LIME and our MLP model addresses key difficulties in CKD prediction, lays groundwork for further advances in ML-based medical diagnostics, and is a step toward implementing trustworthy and transparent AI in healthcare, with the ultimate goal of enhancing patient outcomes and the clinical decision-making process.
Future research could apply our methodology to additional medical disorders to validate the generalizability and utility of explainable ML models in healthcare. Although our model exhibits promising results, it should be continuously updated and validated with a broader and more diverse array of datasets to ensure its long-term reliability and effectiveness. The model’s practical applicability could be further enhanced by integrating other advanced XAI techniques and developing user-friendly interfaces for clinical practitioners.