#### *2.1. Description of Naïve Bayes Classifiers*

All PROGRES-CKD models are NBCs. NBCs are probabilistic models based on the application of Bayes' theorem. Their basic assumption is conditional independence of the predictors given the outcome. NBCs are represented as directed acyclic graphs (Figure 1). They have previously been used in medical applications for diagnostic and prognostic reasoning in several therapeutic areas [24,25]. Once derived and validated, NBCs generate three kinds of metrics that inform medical prognostic reasoning. First, they generate a risk score representing the expected incidence of a disease/event given a vector of known patient characteristics. Second, NBCs can be used to generate value of information (VOI) statistics. VOI statistics represent the reduction in uncertainty (i.e., entropy) in the outcome variable that would be obtained had the values of the missing variables been observed [26]. Therefore, they can be used to prioritize additional diagnostic testing or biomarker assays for patients with incomplete medical records. Third, NBCs can provide impact metrics (i.e., Normalized Likelihood (NL) [27]) for each observed variable. Impact metrics can be interpreted as the magnitude of the association between different subsets of evidence and the outcome variable.
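The risk score and the VOI statistic described above can be illustrated with a minimal sketch. It is not the PROGRES-CKD implementation: the two predictors, their categories, and all probabilities below are hypothetical values chosen only for illustration, and VOI is computed here as the expected reduction in the Shannon entropy of the outcome, which is one common formulation.

```python
import math

# Hypothetical toy NBC: binary outcome and two discrete predictors.
# All probabilities are invented for illustration only.
PRIOR = {"yes": 0.10, "no": 0.90}
CPT = {  # CPT[variable][outcome][value] = P(value | outcome)
    "egfr": {
        "yes": {"low": 0.80, "high": 0.20},
        "no": {"low": 0.30, "high": 0.70},
    },
    "proteinuria": {
        "yes": {"present": 0.70, "absent": 0.30},
        "no": {"present": 0.20, "absent": 0.80},
    },
}


def posterior(evidence):
    """Return P(outcome | evidence) via Bayes' theorem.

    Under conditional independence, unobserved predictors are
    marginalized out simply by leaving them out of the product.
    """
    joint = {}
    for outcome, prior in PRIOR.items():
        p = prior
        for var, value in evidence.items():
            p *= CPT[var][outcome][value]
        joint[outcome] = p
    total = sum(joint.values())
    return {outcome: p / total for outcome, p in joint.items()}


def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def value_of_information(evidence, missing_var):
    """Expected entropy reduction in the outcome if missing_var were observed."""
    current = posterior(evidence)
    expected_entropy = 0.0
    for value in CPT[missing_var]["yes"]:
        # P(value | evidence) = sum over outcomes of P(value | o) * P(o | evidence)
        p_value = sum(CPT[missing_var][o][value] * current[o] for o in PRIOR)
        updated = posterior({**evidence, missing_var: value})
        expected_entropy += p_value * entropy(updated)
    return entropy(current) - expected_entropy


# Risk score for a patient with one observed and one missing predictor.
risk = posterior({"egfr": "low"})["yes"]
# How much would measuring the missing predictor reduce uncertainty?
voi = value_of_information({"egfr": "low"}, "proteinuria")
```

In this sketch, ranking the missing variables by `value_of_information` mirrors the way VOI statistics can be used to prioritize additional testing for patients with incomplete records.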

**Figure 1.** The Bayesian Network structure of PROGRES-CKD. (**a**) PROGRES-CKD-6; (**b**) PROGRES-CKD-24.

#### *2.2. PROGRES-CKD Training*

In this application of NBCs, we aimed to develop models predicting the risk of KRT initiation within 6 and 24 months. The risk score ranges from 0.00 (no risk at all) to 1.00 (certainty of failure within the prediction horizon).

We derived the model weights for PROGRES-CKD with a data-driven algorithm, exploiting the wealth of information collected in the European Clinical Database (EuCliD®, Fresenius Medical Care Deutschland GmbH, Bad Homburg, Germany), a large multinational database of CKD patients. All nephrology clinics belonging to the Fresenius Medical Care (FMC) NephroCare network contribute data collected during healthcare practice to this centralized data repository. EuCliD® is a fully codified database recording clinical, laboratory, socio-demographic, treatment and prescription data for each medical encounter [19,20]. Information is collected by healthcare professionals either manually or by means of interfaces to existing local data management systems.

All non-dialysis-dependent, stage 3–5 CKD patients receiving care in outpatient renal clinics belonging to the NephroCare network from 2017 to 2018 were screened for eligibility. We enrolled only patients who had at least one outpatient visit and one serum creatinine (s-cr) assessment. The endpoints of interest were KRT initiation within 6 and 24 months. We excluded patients who died before reaching the endpoint or before the end of follow-up (i.e., 6 or 24 months, depending on the endpoint of interest). Overall, 22,535 subjects met the inclusion criteria. This initial dataset was randomly partitioned into 2 analytical samples: development (70%, *n* = 15,775) and validation (30%, *n* = 6760). NBC weights were derived with Hugin 8.5.
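A random 70/30 partition of the kind described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used for PROGRES-CKD: the function name, the fixed seed, and the 1000 placeholder patient identifiers are all hypothetical.

```python
import random


def split_development_validation(patient_ids, dev_fraction=0.70, seed=0):
    """Randomly partition patient identifiers into development and validation sets.

    The seed is fixed only so that the illustrative split is reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)  # random order, so each patient is equally likely to land in either sample
    n_dev = round(len(ids) * dev_fraction)
    return ids[:n_dev], ids[n_dev:]


# Hypothetical identifiers standing in for eligible patients.
development, validation = split_development_validation(range(1000))
```

Splitting at the patient level, rather than at the level of individual visits, keeps all records of one patient in the same sample, so the validation sample stays independent of the data used to derive the weights.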
