**4. Discussion**

The results of our study demonstrate that, in an elderly, multimorbid cohort of hospitalized patients, laboratory values have the best performance for identifying CKD ≥ III and NKD from EHRs, compared to discharge summaries and ICD-10 billing codes. Combining classifiers based on laboratory values (creatinine/eGFR), ICD-10 billing codes or ICD-10 codes extracted from discharge summaries outperformed each component alone for identification of CKD ≥ III and NKD. Classification could be further improved by logistic regression and ML models if data were restricted to laboratory values (CKD ≥ III) or if additional values from previous hospital stays were added (NKD).
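The combination of component classifiers can be pictured as a simple rule union. The sketch below is illustrative only: the study's exact combination logic is not restated here, and the OR/AND rules are an assumption about how such binary flags are typically merged.

```python
# Illustrative sketch (not the study's exact rule): combine three binary
# CKD >= III flags -- from laboratory values, ICD-10 billing codes, and
# ICD-10 codes extracted from discharge summaries -- with a logical OR,
# which raises sensitivity at the cost of some specificity.

def ckd_flag_combined(lab_flag: bool, icd_flag: bool, summary_flag: bool) -> bool:
    """True if any single EHR component indicates CKD >= III."""
    return lab_flag or icd_flag or summary_flag

def nkd_flag_combined(lab_flag: bool, icd_flag: bool, summary_flag: bool) -> bool:
    """NKD (no known kidney disease): no component may indicate kidney disease."""
    return not (lab_flag or icd_flag or summary_flag)

# Example: laboratory values flag CKD while billing codes and summary miss it.
print(ckd_flag_combined(True, False, False))   # -> True
print(nkd_flag_combined(False, False, False))  # -> True
```

An OR-combination for CKD and the complementary all-negative rule for NKD mirror why the combined classifier can outperform each component alone: a case missed by one source can still be caught by another.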

Although each of the mentioned EHR components has been investigated before, we demonstrate the extent to which classification is improved by combining laboratory values with ICD-10 billing codes and discharge summaries. Furthermore, we are, to our knowledge, the first to describe classification performance for NKD.

The good sensitivity and specificity of laboratory values for the identification of CKD ≥ III and NKD can be explained by the fact that both entities are mainly defined by blood creatinine and eGFR values [3,26]. However, many epidemiological studies and clinical trials have utilized ICD-10 billing codes for defining CKD status [4]; more than 50% of cardiovascular trials do not report eGFR measurements in their study populations [45].

Previous studies have demonstrated a high specificity of billing codes. However, many CKD patients are overlooked when billing codes are used alone, and the identified cohort is biased towards more advanced CKD stages with higher creatinine values [5,46,47]. These results were replicated and confirmed in the current study. A sensitivity of 75% indicates that approximately one-quarter of patients with advanced CKD ≥ III were missed by ICD-10 billing codes. Patients recognized by ICD-10 billing codes had a lower eGFR and showed higher morbidity in comparison to the reference standard.

However, the sensitivity of ICD-10 billing codes was much better in our study than in a recent study by Diamantidis et al., who reported a very low sensitivity of ICD-10 billing codes for recognizing CKD ≥ III [43]. The discrepancy might be explained by differences in the patient cohorts, as the latter study included non-hospitalized patients.

Gomez-Salgado et al., in contrast, recently showed good correlation between ICD-10 billing codes and researchers' judgment based on clinical documentation [48]. A possible explanation for the conflicting results between our study and Gomez-Salgado et al. could be the extent to which laboratory values were considered for identification of CKD.

Our study also confirms previous findings of slight under-documentation of CKD in discharge summaries [49]. Indeed, approximately 20% of patients with advanced CKD ≥ III were not identified by discharge summaries. However, in line with the study of Singh et al., we could also show that the sensitivity of discharge summaries is higher than the sensitivity of billing codes for CKD [9]. The reduced specificity of discharge summaries could be explained by the fact that many patients with CKD stage I and II were counted as CKD ≥ III. Differing definitions of chronic kidney disease might also explain why a recent study by Hernandez-Boussard et al. observed better accuracy for unstructured discharge summaries in recognizing CKD compared to our study [50]. Other possible explanations are different information sources and a different study cohort.

In a study by Nadkarni et al., an algorithm was developed and evaluated to identify patients with CKD Stage III caused by hypertension or diabetes, using structured and unstructured information from EHRs [51]. The algorithm, based on keywords from medical notes and laboratory values, clearly outperformed phenotyping by ICD-10 billing codes. These results resonate with the outcome of our study, which included advanced CKD from any cause in hospitalized patients.

Missing previous health records are a common problem in clinical studies and might impair correct identification of diseases [52]. However, in contrast to the identification of patients with diabetes mellitus [53], we demonstrate good F1 scores (>0.8) for simple classifiers even when using datasets restricted to the current hospital stay. For CKD ≥ III, ML models based on laboratory values alone had a similar AUROC to the simple categorical classifiers including discharge summaries and ICD-10 billing codes. This indicates that ML models might be able to compensate, at least partly, for missing information.

The results of our study are encouraging, not only for stratification of patients for clinical and epidemiological studies, but also in the context of, e.g., Healthcare-Integrated Biobanking, where automated classifiers based on minimal clinical information are of great importance for early selection of samples of specific disease entities.

Structured information such as laboratory values and billing codes is often readily available. Results from our study show that a PPV of 0.77, 0.82 or 0.91 can be achieved for the identification of CKD by using eGFR values at admission, at discharge or from the complete hospital stay, respectively. This is in line with other studies demonstrating that a single measurement of eGFR might overestimate the number of CKD cases [54]. The slightly higher PPV when using eGFR values at discharge compared to admission can be explained by the fact that interfering acute kidney injury is more likely to be present at admission than at discharge, after successful treatment.
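The effect of data availability on PPV can be illustrated with a toy sketch. The patient data, the exact decision rules and the resulting numbers below are invented for illustration and are not the study's cohort or pipeline; only the eGFR cut-off of 60 mL/min/1.73m² follows the CKD ≥ III definition.

```python
# Toy sketch: PPV of an eGFR < 60 rule under three data-availability
# scenarios (admission only, discharge only, complete stay). All patient
# values are invented; only the cut-off of 60 mL/min/1.73m2 is standard.

EGFR_CUTOFF = 60.0

def predict_ckd(egfr_series, scenario):
    if scenario == "admission":
        return egfr_series[0] < EGFR_CUTOFF
    if scenario == "discharge":
        return egfr_series[-1] < EGFR_CUTOFF
    # "complete": require every eGFR of the stay below the cut-off,
    # which filters out transient dips caused by acute kidney injury.
    return max(egfr_series) < EGFR_CUTOFF

def ppv(pairs, scenario):
    tp = sum(1 for egfr, label in pairs if predict_ckd(egfr, scenario) and label)
    fp = sum(1 for egfr, label in pairs if predict_ckd(egfr, scenario) and not label)
    return tp / (tp + fp) if tp + fp else float("nan")

# (eGFR measurements over the stay, reference-standard CKD >= III label)
patients = [
    ([45, 48, 50], True),    # stable CKD
    ([40, 55, 70], False),   # acute kidney injury, recovered -> no CKD
    ([85, 80, 82], False),   # normal kidney function
    ([55, 58, 59], True),    # stable CKD
]
for scenario in ("admission", "discharge", "complete"):
    print(scenario, round(ppv(patients, scenario), 2))
# -> admission 0.67 / discharge 1.0 / complete 1.0 on this toy data
```

The recovered acute-kidney-injury patient is the false positive in the admission-only scenario, which reproduces the direction of the trend described above: eGFR values at discharge or over the complete stay yield a higher PPV than a single admission value.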

Suboptimal PPV values associated with false classification can significantly impact the phenotyping process and thus might cause severe bias in the outcomes of subsequent studies. Consequently, there is a need for further optimization of CKD and NKD classification.

Wei et al. combined different sources of information (primary notes, medication and billing codes) to improve phenotyping based on EHR for several chronic diseases (not CKD though) and demonstrated that PPV and F1 score can be increased by combining different information sources [55]. Results from Wei et al. can be confirmed in our study in relation to CKD and NKD with the caveat that eGFR should be included in any combination.

The addition of discharge summaries and/or ICD-10 billing codes to laboratory values not only increased the performance of correct identification of CKD ≥ III but also helped to further specify the cause of the disease in at least one-third of the cohort. More etiologies of CKD were documented in the discharge summaries than in the ICD-10 billing codes.

Another novelty of this study is that, to the best of the authors' knowledge, the entity of NKD (no known kidney disease) was investigated using EHRs for the first time. Identifying NKD is a challenging task because ICD-10 billing codes and discharge summaries are designed to describe the presence of illness rather than its absence. However, the question of NKD might be of particular interest for scientific reasons. The validity of association studies and clinical trials depends on the correct assignment of co-morbidities. If large cohorts of CKD patients are counted as NKD, studies might be biased and results might thus be flawed. Our study demonstrates that single EHR sources had low PPV and AUCPR for NKD assignment. Combining laboratory values with discharge summaries improved PPV and AUCPR. Interestingly, the further addition of ICD-10 billing codes to this combination did not result in a further improvement of PPV and AUCPR. Future epidemiological studies should take these results into consideration for classification of NKD.

Finally, we demonstrated that logistic regression and ML algorithms have the potential to improve recognition of CKD ≥ III and NKD, particularly in certain scenarios of data availability. This might be helpful for the development of clinical decision support systems (CDSS) in the near future that will ultimately allow clinicians and researchers to evaluate the chronic kidney status of patients almost instantly.

Direct comparison with other studies applying ML strategies for the detection of CKD is hampered by different definitions of CKD, different patient cohorts and different data variables used. Almansour et al. described an artificial neural network with an accuracy of more than 99% [20]. Salekin et al. used the same cohort, reduced the number of variables down to 12 and achieved an F1 score of 99% by using a wrapper approach to identify the best subset of attributes and a random forest classifier [56]. However, both studies rely on the same data source comprising 24 variables of 400 patients to build a predictive model. In contrast to our study, that dataset does not include series of creatinine measurements or information about CKD from discharge summaries or ICD-10 billing codes. Rashidian et al. used laboratory values, demographics and ICD-10 billing codes to identify patients with CKD, achieving an F1 score of approximately 0.8 [57]. In our study, AUROC and AUCPR for identification of CKD by ML algorithms surpassed 0.95 in all scenarios of unrestricted or restricted data availability. One reason for these differences could be that the study by Rashidian et al. did not use discharge letters as a source of information. As mentioned before, in our study discharge summaries can add valuable information to the classification process. This is also reflected by the result that ML algorithms did not significantly improve performance of CKD ≥ III identification (AUROC 0.97) compared to a simple classifier based on laboratory values, discharge summaries and ICD-10 billing codes (AUROC 0.96).

The ML algorithms used in our study failed to outperform rule-based classifiers for identification of NKD if data were restricted to the index hospital stay: although AUROC increased (non-significantly), PPV declined, and thus superiority of the models has to be rejected. An explanation for this result could be that correct assignment of NKD mainly depends on the availability of the complete dataset. Additionally, we cannot exclude that the low prevalence of NKD in our morbid patient cohort affected the efficacy of ML strategies.

To the best of our knowledge, this is the first study attempting to specifically detect CKD Stage ≥ III and NKD by ML methods. Therefore, the proof-of-concept presented here needs further elaboration in larger independent patient cohorts.

A strength of the study is its comprehensive dataset, which includes discharge summaries of the index hospital stay and laboratory values, together with a reviewed reference standard.

Several limitations need to be acknowledged. The patient cohort included in the study was quite morbid and not representative of a general hospital population or, even more so, an outpatient population. Therefore, the extent of improvement achieved by combining different information sources needs to be prospectively validated in other independent cohorts.

The Averbis Health Discovery software tool was used for the extraction of information attributes from discharge summaries that had been predefined by the authors. The use of natural language processing (NLP) methods for information extraction and automated feature selection could have further increased the performance of the data extraction method.

Similarly, the total number of patients was rather small for training ML classifiers. We can only speculate that, in a larger patient cohort, the performance of the different models might further increase. However, the scope of the present study was to demonstrate the feasibility and potential of using eHealth sources and ML models to improve phenotyping of CKD and NKD.

The models presented in this manuscript focus on the detection of advanced CKD (Stage III or higher) or on the absence of kidney disease. Patients with mild CKD (Stage I and II) were not taken into consideration, although the correct identification of this group might be important for clinical treatment and research purposes. Future studies with larger patient cohorts might be able to develop more granular models differentiating between mild and advanced CKD.

Another limitation is that neither a single rule nor any combination of rules achieved a sensitivity of 100% for identification of CKD ≥ III. This could be explained by the fact that most patients were treated primarily for non-nephrological reasons during the index hospital stay, and thus CKD was not mentioned at all in the current discharge summaries or by the ICD-10 billing codes, although these patients had a documented eGFR < 60 mL/min/1.73m<sup>2</sup> for a period longer than 90 days.
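The chronicity criterion just mentioned (eGFR < 60 mL/min/1.73m² documented for longer than 90 days) can be sketched as a minimal check over dated measurements. This is a simplification for illustration, not the study's reference-standard implementation, and the dates and values in the example are hypothetical.

```python
# Minimal sketch of the chronicity criterion for CKD >= III: an eGFR
# below 60 mL/min/1.73m2 sustained for more than 90 days, with no
# intervening value >= 60 (a simplification of the KDIGO definition).

from datetime import date

def ckd_stage3_or_higher(measurements):
    """measurements: list of (date, eGFR) tuples, assumed sorted by date."""
    first_low = None
    for day, egfr in measurements:
        if egfr >= 60:
            first_low = None            # chronicity interrupted
        elif first_low is None:
            first_low = day             # start of a low-eGFR period
        elif (day - first_low).days > 90:
            return True                 # eGFR < 60 sustained > 90 days
    return False

# Hypothetical series: persistently low eGFR from January to May.
series = [(date(2020, 1, 10), 52.0), (date(2020, 3, 2), 55.0), (date(2020, 5, 20), 49.0)]
print(ckd_stage3_or_higher(series))  # -> True (Jan 10 to May 20 spans > 90 days)
```

A rule of this form can only fire when at least two sufficiently separated low values are documented, which illustrates why missing measurements from before or after the index stay limit the achievable sensitivity.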

Furthermore, data included in the analysis were incomplete, since laboratory results from primary care or other institutions (for example, from general practitioners or other hospitals) were not available. Most importantly, albuminuria was available for less than 5% of the whole cohort and could therefore not be included in the analysis.

Missing data, however, reflect "real-world" conditions. As shown in our study, missing data can be compensated for, at least partly, by the extraction of unstructured information from the discharge summaries, which usually contain a multitude of pre-existing health data from other healthcare providers.
