**5. Conclusions**

In summary, combining laboratory results (creatinine and eGFR) with discharge summaries and ICD-10 billing codes gave the best performance in a simple categorical classifier for phenotyping of CKD ≥ III and NKD. Logistic regression or ML models had the potential to further improve the correct identification of CKD ≥ III if only laboratory values were used, and of NKD if data from previous hospital stays were included in the models.

**Supplementary Materials:** Supplementary Materials are available online at http://www.mdpi.com/2077-0383/9/9/2955/s1. Table S1: Characteristics of the study cohort; Additional characteristics of the study cohort; Table S2: ICD-10 billing codes for definition of CKD; Table S3: ICD-10 billing codes for exclusion of NKD; Table S4: Detailed performance characteristics for combinations of simple classifiers for identification of CKD and NKD; Table S5: Detailed AUC-ROC and -PR for combinations of different classifiers for identification of CKD and NKD; Table S6: Cause for CKD in the CKD>III cohort; detailed cause for CKD ≥ III and source of information; Table S7: Incidence of AKI and AKI recovery in the complete study cohort with creatinine values (n=780) and in the CKD>III cohort with creatinine values (n=372); Table S8: Source of information for etiologies of CKD>III; Table S9: Distribution of true positives and true negatives for CKD and NKD in the training and test datasets; Table S10: Detailed performance characteristics for combinations of different classifiers for identification of CKD and NKD; Table S11: Detailed AUC-ROC and -PR for combinations of different classifiers for identification of CKD and NKD; Table S12: Detailed performance characteristics for different generalized linear model networks for identification of CKD and NKD; Table S13: Detailed AUC-ROC and -PR for different generalized linear model networks for identification of CKD and NKD; Table S14: Detailed performance characteristics for different generalized linear model networks for identification of CKD and NKD; Table S15: Detailed AUC-ROC and -PR for different generalized linear model networks for identification of CKD and NKD; Table S16: Detailed performance characteristics for different random forest models for identification of CKD and NKD; Table S17: Detailed AUC-ROC and -PR for different random forest models for identification of CKD and NKD; Table S18: Detailed performance characteristics for different random forest models for identification of CKD and NKD; Table S19: Detailed AUC-ROC and -PR for different random forest models for identification of CKD and NKD; Table S20: Detailed performance characteristics for different neural network models for identification of CKD and NKD; Table S21: Detailed AUC-ROC and -PR for different neural network models for identification of CKD and NKD; Table S22: Detailed performance characteristics for different neural network models for identification of CKD and NKD; Table S23: Detailed AUC-ROC and -PR for different neural network models for identification of CKD and NKD; Table S24: Detailed performance characteristics for different generalized linear models for identification of CKD and NKD; Table S25: Detailed AUC-ROC and -PR for different generalized linear models for identification of CKD and NKD; Table S26: Detailed performance characteristics for different generalized linear models for identification of CKD and NKD; Table S27: Detailed AUC-ROC and -PR for different generalized linear models for identification of CKD and NKD; Table S28: Detailed hyperparameters of different machine learning models; Table S29: Detailed hyperparameters of different machine learning models.

**Author Contributions:** Conceptualization, U.H., B.B. and M.K.; data curation, C.W. and L.R.; formal analysis, C.W., L.R. and L.M.; funding acquisition, U.H. and M.K.; investigation, C.W., L.R., C.L., T.K., B.B. and M.K.; methodology, C.W., B.B. and M.K.; project administration, M.K.; resources, L.M., C.L., T.K., U.H., D.A. and M.K.; software, C.W.; supervision, U.H. and M.K.; validation, C.W., L.R. and B.B.; visualization, C.W., B.B. and M.K.; writing—original draft, C.W. and B.B.; writing—review & editing, U.H., B.B. and M.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Deutsche Forschungsgemeinschaft (DFG) under grants KI 564/2-1 and HA 2079/8-1 within the STAKI2B2 project (Semantic Text Analysis for Quality-controlled Extraction of Clinical Phenotype Information within the Framework of Healthcare-Integrated Biobanking).

**Conflicts of Interest:** The authors declare no conflict of interest.
