This section provides an overview of the CKD dataset, along with the performance metrics and a detailed analysis of the proposed system’s results.
5.3. Results Analysis
Examining model performance is essential to see how each algorithm classifies during training and testing. During training, a model learns the hidden patterns in the data; during testing, it makes predictions on unseen data. Seventy percent of the data was used to train the models, and the optimized models were tested on a held-out 20%. After optimization, we compared seven models to assess how well they performed. As shown in
Table 5, KNN imputation gives good results. This evaluation considered several metrics, including recall, accuracy, precision, and F1-score, when missing values were handled with traditional KNN imputation [
32]. However, Table 6 and Figure 7 show that the results improve when missing values are imputed with GANs. Both prototypical networks and MAML outperform the other models across all metrics. Prototypical networks demonstrated their efficacy in the training phase, achieving high accuracy (99.99%), outstanding precision (99.9%), and good recall (99.2%), which amounts to near-perfect training results. Like the prototypical networks model, MAML achieved good results. The few-shot learning models were therefore chosen as the top-performing models after considering these metrics, which is especially significant for CKD, where a precise diagnosis is essential. High precision minimizes false positives, sparing patients needless worry and additional testing, while high sensitivity (recall) is essential for ensuring that patients with CKD are accurately detected [
33]. A box plot for the ML model performance with GAN imputation is illustrated in
Figure 8.
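As a rough illustration of the KNN imputation baseline referred to above, the following sketch fills a missing value with the mean of that feature over the k nearest rows, measuring distance only on features that both rows have observed. The function name `knn_impute`, the toy rows, and k = 2 are assumptions made for illustration; the study does not specify its implementation or hyperparameters here.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is the root-mean-square difference over features that both
    rows have observed (a simplified, missing-value-aware Euclidean distance).
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed feature: treat as far apart
        return math.sqrt(sum(shared) / len(shared))

    imputed = [list(r) for r in rows]  # copy so the input stays untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have feature j observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical toy rows: [blood pressure, serum creatinine]; None marks a gap.
rows = [
    [80.0, 1.2],
    [82.0, 1.4],
    [120.0, 5.0],
    [81.0, None],
]
filled = knn_impute(rows, k=2)
```

In this toy example the missing creatinine value is filled from the two rows with similar blood pressure, giving roughly 1.3, while the distant third row is ignored. A GAN-based imputer would instead learn to generate plausible values from the joint distribution of the features.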
Confusion matrices highlight each model’s classification abilities in further depth (Figure 9). These matrices sort the classification results into four groups. True positives (TP) are instances in which a CKD patient was correctly detected. True negatives (TN) are instances in which a person without CKD was correctly identified. False positives (FP) are instances in which a person without CKD was mistakenly classified as having CKD. Finally, false negatives (FN) are cases in which a person with CKD was mistakenly classified as non-CKD [34].
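The four groups defined above, and the metrics derived from them, can be sketched as follows. The function names and the toy label vectors are hypothetical and serve only to show how TP, TN, FP, and FN feed into accuracy, precision, recall, and F1-score.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, TN, FP, FN for a binary problem (positive = CKD)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred != positive and truth != positive:
            tn += 1
        elif pred == positive:
            fp += 1  # predicted CKD, actually non-CKD
        else:
            fn += 1  # predicted non-CKD, actually CKD
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from the four counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 1 = CKD, 0 = non-CKD.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```

With these toy labels the counts are TP = 3, TN = 3, FP = 1, FN = 1, so all four metrics equal 0.75.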
These confusion matrices help in understanding model performance and detecting possible misclassifications. Prototypical networks and MAML achieved perfect classification, correctly identifying all CKD (TP = 29) and non-CKD cases (TN = 51) with zero misclassifications (FP = 0, FN = 0). This resulted in an MCC of 1.0, indicating optimal classification performance. The random forest (RF) and voting ensemble models also exhibited high accuracy, correctly classifying 49 non-CKD cases and 30 CKD cases, with only one misclassification (FP = 1, FN = 0). For these counts, both models attain an MCC of approximately 0.97, reflecting high reliability and a minimal error rate. The decision tree (DT) demonstrated strong classification performance, correctly predicting 49 non-CKD cases and 29 CKD cases, with one CKD case misclassified as non-CKD and one non-CKD case misclassified as CKD (FN = 1, FP = 1). Its MCC of 0.9467 shows that it remains a dependable classifier. Logistic regression (LR) and support vector machine (SVM) performed similarly, with 48 correct non-CKD classifications and 29 correct CKD predictions, but with two non-CKD cases misclassified as CKD (FP = 2). Their MCC scores of 0.9473 confirm that they remain robust but slightly less precise than the ensemble models.
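For reference, the Matthews correlation coefficient can be recomputed directly from the confusion counts. The sketch below uses the standard MCC formula; the function name `mcc` and the zero-denominator convention are choices made for this illustration, and the counts are those quoted in the text.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom

# Counts quoted in the text above.
perfect = mcc(tp=29, tn=51, fp=0, fn=0)             # no misclassifications
decision_tree = mcc(tp=29, tn=49, fp=1, fn=1)       # one FP and one FN
one_false_positive = mcc(tp=30, tn=49, fp=1, fn=0)  # a single FP
```

For the decision tree counts this gives 1420/1500, about 0.9467, which matches the reported value. The MCC equals 1.0 only when both FP and FN are zero; with TP = 30, TN = 49, FP = 1, FN = 0 the same formula gives roughly 0.974.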
The trade-off between true positive and false positive rates can be assessed from the ROC curves shown in Figure 10 and the corresponding AUC values for the models used to classify chronic kidney disease. The prototypical network, a few-shot learning model, and MAML achieved an exceptionally high AUC of 0.999, demonstrating outstanding generalization capability and superior classification performance. The decision tree (DT), with an AUC of 0.97, showed reliable classification performance, though slightly lower than the other models. The support vector machine (SVM), logistic regression (LR), ensemble learning, and random forest (RF) models all achieved an AUC of 0.98, confirming their strong ability to distinguish between CKD and non-CKD cases [
33]. These results indicate that few-shot learning models, particularly prototypical networks, provide the best classification performance.
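To make the AUC values easier to interpret, the sketch below computes AUC through its rank interpretation: the probability that a randomly chosen CKD case receives a higher score than a randomly chosen non-CKD case. The function name `roc_auc` and the toy scores are hypothetical, and this pairwise form is quadratic in the number of samples, so it is suited only to small test sets.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted CKD probabilities for six test cases.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.10]
auc = roc_auc(y_true, scores)
```

Here eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9. An AUC of 0.999 therefore means that almost every CKD case is scored above almost every non-CKD case.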
To understand the reasoning behind the CKD predictions, explanations were generated for the best model, the prototypical network. The SHAP (SHapley Additive exPlanations) global explanation of the CKD data is shown in Figure 11. Global explanations describe the model’s behavior across the entire dataset.
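The idea behind a global SHAP explanation can be sketched with exact Shapley values on a toy model: each instance receives per-feature attributions, and the global importance of a feature is the mean of its absolute attributions over the dataset. The code below enumerates all feature orderings, which is feasible only for a handful of features; it is not the approximation algorithm used by the SHAP library, and `toy_model`, the data, and the all-zero baseline are assumptions for illustration.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by averaging over feature orderings.

    Features not yet switched on are held at their baseline value.
    Cost grows factorially, so this is only viable for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = x[j]
            value = predict(current)
            phi[j] += value - previous  # marginal contribution of feature j
            previous = value
    return [p / len(orderings) for p in phi]

def global_importance(predict, data, baseline):
    """Mean absolute Shapley value per feature across all instances."""
    totals = [0.0] * len(baseline)
    for x in data:
        for j, p in enumerate(shapley_values(predict, x, baseline)):
            totals[j] += abs(p)
    return [t / len(data) for t in totals]

# Hypothetical stand-in model over three standardized features.
def toy_model(v):
    return 0.5 * v[0] + 0.2 * v[1] + 0.1 * v[0] * v[2]

data = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
baseline = [0.0, 0.0, 0.0]
importance = global_importance(toy_model, data, baseline)
```

For each instance the attributions sum to the difference between the model output and the baseline output, and the interaction term is split equally between the two features involved. Ranking features by `importance` is the toy analogue of a mean absolute SHAP bar chart.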
According to the performance analysis carried out using these statistical indices, prototypical networks are the most effective model for predicting CKD. They performed at least as well as MAML and outperformed the other machine learning models across all criteria, and the underlying process behind their outputs was examined using SHAP explanations.
Figure 11 displays the mean absolute SHAP values, or feature importance, for the prototypical networks model. The figure compares the interpretability of MAML, a voting ensemble, and prototypical networks. MAML and prototypical networks have more balanced and more structured feature importance distributions, while the voting ensemble model appears more scattered. Prototypical networks concentrate on a narrower set of key features, which helps ensure that predictions are based on the most relevant information. MAML, on the other hand, attributes importance to a wider spectrum of features while retaining stability, which suggests a capacity to generalize across different scenarios. In contrast, the voting ensemble model assigns very high importance to a few features and neglects others, which may indicate overfitting and limited adaptability to real-world data. The explanations obtained for MAML and prototypical networks are therefore considered more trustworthy and interpretable, as they suggest greater robustness and relevance in feature selection.
LIME is a powerful XAI technique that approximates a complex machine learning model with a simpler, interpretable model, so it can be used to understand the intricate relationships between CKD measures and their influence on overall kidney health. For CKD, this means that LIME can help determine which clinical measures are most important for a particular prediction, that is, which characteristics led the model to predict that the kidney was diseased or not. Researchers and decision-makers can benefit greatly from this information, since it helps pinpoint the specific issues that must be addressed to improve CKD prediction, as illustrated in
Figure 12.
The figure shows that the most relevant features have the strongest effects: “dm −0.52” (diabetes mellitus) has the most positive effect, and “cad −0.28” (coronary artery disease) has the most negative effect. Other important features include “htn > 1.26” (hypertension) and “bp > −0.02” (blood pressure), indicating their significant role in the model’s decision making. LIME decomposes individual predictions to make black-box models easier to understand, supporting transparency and interpretability in AI-based CKD detection.
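The local-surrogate idea behind LIME can be sketched as follows: perturb one instance, weight each perturbed sample by its proximity to the original, and fit a weighted linear model whose coefficients serve as the local explanation. This is a simplified, LIME-style illustration and not the API or exact procedure of the `lime` package; `black_box`, the kernel, and all parameter values are assumptions.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def local_surrogate(predict, instance, n_samples=200, scale=0.5,
                    width=1.0, seed=0):
    """Fit a proximity-weighted linear model around one instance.

    Returns [intercept, coef_1, ..., coef_d] of the local approximation.
    """
    rng = random.Random(seed)
    d = len(instance)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in instance]  # perturbed copy
        dist2 = sum((a - b) ** 2 for a, b in zip(z, instance))
        w = math.exp(-dist2 / (width ** 2))  # closer samples weigh more
        row = [1.0] + z
        y = predict(z)
        # Accumulate the weighted normal equations X^T W X and X^T W y.
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    return solve_linear(xtwx, xtwy)

# Hypothetical black-box score over two standardized features.
def black_box(v):
    return 1.0 / (1.0 + math.exp(-(2.0 * v[0] - 1.0 * v[1])))

weights = local_surrogate(black_box, instance=[0.2, -0.1])
```

The signs and magnitudes of `weights[1:]` play the role of the bars in a LIME plot: a positive coefficient pushes the local prediction toward the positive class and a negative one pushes it away. When the black box is itself linear, the surrogate recovers its coefficients, which is a useful sanity check.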
The study’s findings highlight how well the combination of prototypical networks, GAN-based imputation, and explainable artificial intelligence (XAI) works for diagnosing chronic kidney disease (CKD). Our study confirms the applicability of prototypical networks to CKD, which is consistent with earlier research showing their accuracy and efficiency in many medical scenarios. The current study, however, also addresses a critical gap in medical AI, where understanding the rationale behind a model is just as important as the diagnostic outcome: it goes beyond diagnostic accuracy and emphasizes model interpretability through XAI. This has noteworthy practical benefits, particularly in kidney care. The work demonstrates the reliability of explainable machine learning in diagnosing CKD, which may pave the way for the eventual integration of these technologies into routine clinical practice. Such integration could lead to faster and more accurate diagnosis of CKD, enabling prompt action to improve patient outcomes and slow the progression of the illness. Physicians can use SHAP and LIME in CKD diagnosis to obtain interpretable insights into the model’s predictions. By highlighting the importance of features such as serum creatinine, diabetes mellitus, and proteinuria, SHAP helps clinicians validate the predictions of an AI model. LIME explains an individual diagnosis by attributing a patient’s CKD classification to specific factors, such as high blood pressure or a history of diabetes. By flagging indicators such as deteriorating hemoglobin levels, SHAP could also support predictions of CKD progression, enabling early intervention. SHAP could also help identify biases in such AI models, thereby supporting unbiased risk assessment across different demographics.
The comparison with previous studies in Table 7 shows that the proposed technique outperforms the other methods in the literature.
The encouraging results of this study point to the need for additional research, particularly to examine the model’s performance in a range of clinical settings and patient types. To ensure the model’s adaptability and growth potential, future studies should concentrate on validating its performance through multicenter trials encompassing a broad range of clinical and demographic parameters. Furthermore, incorporating social determinants of health and genetic markers may improve the model’s predictive power and provide a more all-encompassing strategy for CKD management. The goal is to seamlessly incorporate AI-powered solutions into healthcare systems, revolutionizing the management of chronic kidney disease through models of customized, predictive, and preventative care.