#### *3.2. Model Performance*

All three versions of the model performed well over the validation period. Figure 3 shows the trend in AUC for the three model versions over a 1-year observation period. Variability in prediction accuracy decreased as retraining was applied: the average AUC was 0.73 (95% CI 0.55–0.91) for version 1 and 0.75 (95% CI 0.65–0.86) for version 2, while version 3 performed more stably, with an average AUC of 0.79 (95% CI 0.74–0.85). The ROC curves for the three model versions are reported in Figure 4.
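As an illustration of the metric reported above, the AUC can be read as the probability that a randomly chosen clinic with an outbreak receives a higher predicted risk than a randomly chosen clinic without one. The following is a minimal sketch of that rank-based definition, not the evaluation code used in the study; the function name `roc_auc` and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = outbreak observed, 0 = no outbreak (hypothetical encoding).
    scores: predicted risk for each clinic.
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not study data).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.01, 0.03, 0.02, 0.10, 0.20, 0.40]
auc = roc_auc(labels, scores)
```

Computing this value at each evaluation date and summarizing across dates would give an average AUC of the kind reported above; the way the confidence intervals were derived is not specified here, so it is left out of the sketch.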

**Figure 4.** Panels (**a**–**c**) show the ROC curves for Model 1, Model 2, and Model 3, evaluated on 15 April 2020, 1 August 2020, and 15 November 2020, respectively.

To demonstrate the potential use of the model, we geographically mapped the risk on a few exemplary dates, i.e., 2 August 2020, 4 October 2020, 1 November 2020, and 3 January 2020 (Figure 5). The maps highlight clinic clusters according to the risk of a COVID-19 outbreak within 2 weeks (Figure 5, left panels; colored circles denote the low-, medium-, and high-risk categories). There was substantial correlation between the predicted risk (Figure 5, left panels) and the actual outcome (Figure 5, right panels) on all of the validation dates.

**Figure 5.** COVID-19 outbreak risk mapping in European clinics of the Nephrocare network. Geographical risk maps were built considering epidemic data related to the following exemplary dates: (**a**) 2 August 2020, (**b**) 4 October 2020, (**c**) 1 November 2020, and (**d**) 3 January 2020. Panels on the left show clinic clusters according to the risk of a COVID-19 outbreak within 2 weeks: red circles, risk > 12.5%; yellow circles, 1.5% < risk ≤ 12.5%; green circles, risk ≤ 1.5%. Panels on the right report the actual incidence of COVID-19 outbreaks in the forecasting period.

Tables 1 and 2 report the classification performance in terms of P(Outbreak|Class), i.e., the probability of outbreak (Yes/No) given the assigned risk class (L/M/H), and P(Class|Outbreak), i.e., the probability of the assigned risk class given the outbreak status, for the two action thresholds chosen (0.015 and 0.125). To calculate P(Outbreak|Class) and P(Class|Outbreak), we artificially treated the problem as a binary decision at each threshold. We computed average probability values across the whole study period.
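The binary treatment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: a clinic is treated as flagged when its predicted risk exceeds the threshold (consistent with the class boundaries in the Figure 5 caption), outcomes are encoded as 1/0, and the names `risk_class` and `binary_performance` as well as the toy data are hypothetical.

```python
LOW_THRESHOLD = 0.015   # boundary between low and medium risk
HIGH_THRESHOLD = 0.125  # boundary between medium and high risk


def risk_class(risk):
    """Map a predicted risk to the L/M/H classes used in the risk maps."""
    if risk > HIGH_THRESHOLD:
        return "H"
    if risk > LOW_THRESHOLD:
        return "M"
    return "L"


def _ratio(num, den):
    # Return None when the conditioning group is empty.
    return num / den if den else None


def binary_performance(risks, outcomes, threshold):
    """Conditional probabilities when `risk > threshold` is a binary flag.

    outcomes: 1 = outbreak, 0 = no outbreak (assumed encoding).
    """
    tp = fp = fn = tn = 0
    for r, y in zip(risks, outcomes):
        flagged = r > threshold
        if flagged and y:
            tp += 1
        elif flagged:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return {
        "P(Outbreak=Yes|flagged)": _ratio(tp, tp + fp),
        "P(Outbreak=Yes|not flagged)": _ratio(fn, fn + tn),
        "P(flagged|Outbreak=Yes)": _ratio(tp, tp + fn),
        "P(flagged|Outbreak=No)": _ratio(fp, fp + tn),
    }


# Toy data for illustration only (not study data).
risks = [0.005, 0.010, 0.020, 0.050, 0.130, 0.200, 0.300, 0.012]
outcomes = [0, 0, 0, 1, 0, 1, 1, 0]
low = binary_performance(risks, outcomes, LOW_THRESHOLD)
high = binary_performance(risks, outcomes, HIGH_THRESHOLD)
```

Repeating this calculation at each evaluation date and averaging the resulting probabilities would correspond to the period averages described above.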

**Table 1.** Average classification performance in terms of P(Outbreak|Class) (i.e., probability of outbreak (Yes/No) given the assigned risk class, L) and P(Class|Outbreak) (i.e., probability of the assigned risk class given the outbreak) at the low action threshold (predicted risk = 0.015).


**Table 2.** Average classification performance in terms of P(Outbreak|Class) (i.e., probability of outbreak (Yes/No) given the assigned risk class, H) and P(Class|Outbreak) (i.e., probability of the assigned risk class given the outbreak) at the high action threshold (predicted risk = 0.125).


Overall, the risk score was strongly associated with the likelihood of COVID-19 outbreak, as demonstrated by the relative risk of outcome occurrence in the three risk classes over the study period (Table 3).

**Table 3.** Average classification performance in terms of relative risk of COVID-19 outbreak by risk class. The relative risk is calculated as $\mathrm{RR} = \dfrac{P(\mathit{Outbreak}=\mathit{Yes} \mid \mathit{Class})}{P(\mathit{Outbreak}=\mathit{Yes} \mid \mathit{Class}=L)}$.
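The relative risk defined in the Table 3 caption divides the outbreak rate in each class by the outbreak rate in the low-risk class. A minimal sketch of that ratio is given below; the function name `relative_risk_by_class`, the 1/0 outcome encoding, and the toy data are assumptions made for illustration and are not taken from the study.

```python
def relative_risk_by_class(classes, outcomes, reference="L"):
    """Relative risk of outbreak per class, with `reference` as the baseline.

    classes: assigned risk class per clinic ("L", "M", or "H").
    outcomes: 1 = outbreak, 0 = no outbreak (assumed encoding).
    """
    counts = {}  # class -> (number of clinics, number of outbreaks)
    for c, y in zip(classes, outcomes):
        n, k = counts.get(c, (0, 0))
        counts[c] = (n + 1, k + y)
    rates = {c: k / n for c, (n, k) in counts.items()}
    ref_rate = rates.get(reference)
    if not ref_rate:
        # Covers both a missing reference class and a zero outbreak rate.
        raise ValueError("relative risk is undefined for this reference class")
    return {c: rate / ref_rate for c, rate in rates.items()}


# Toy data for illustration only (not study data):
# 10 low-risk clinics with 1 outbreak, 5 medium with 1, 4 high with 2.
classes = ["L"] * 10 + ["M"] * 5 + ["H"] * 4
outcomes = [1] + [0] * 9 + [1] + [0] * 4 + [1, 1, 0, 0]
rr = relative_risk_by_class(classes, outcomes)
```

By construction the low-risk class has RR = 1, so values above 1 for the medium- and high-risk classes indicate a higher observed outbreak rate than in the reference class.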


#### *3.3. Model Feature Importance*

Feature analysis investigated the impact of each variable on model output (Figure 6). Although there are some differences among the model versions, overall, the most important variables relate to the epidemic dynamics in the clinic in the period immediately preceding the index date for risk evaluation. Regional data on the number of COVID-19 cases and deaths were likewise ranked high. The number of COVID-19 cases in adjacent clinics ranked among the top predictors in all three model versions. Of note, variables routinely measured in clinical practice, including changes in CRP and white blood cell count over the observation period, were also strongly associated with outbreak risk.

**Figure 6.** Panels (**a**–**c**) show the Shapley additive explanations (SHAP) for Model 1, Model 2, and Model 3, evaluated on 15 April 2020, 1 August 2020, and 15 November 2020, respectively. SHAP plots show relative feature importance. The blue bars represent the overall SHAP value of each variable and are interpreted as the relative importance of that variable to the risk estimates. On the right side, SHAP values show the direction of association between each predictor and the risk estimates. Each dot represents one individual clinic from the test dataset. Higher predictor values are shown in red and lower values in blue. The X axis represents the impact of each variable on risk in terms of SHAP values. Red dots at positive SHAP values suggest a direct correlation between a risk factor and the occurrence of a COVID-19 outbreak, while red dots at negative SHAP values suggest an inverse correlation.
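To make the notion of overall importance concrete, the sketch below ranks features by the mean absolute SHAP value across clinics, which is the usual convention for SHAP bar summaries. That the bars in Figure 6 use exactly this aggregation is an assumption, and the function name `mean_abs_shap`, the feature names, and the SHAP values are hypothetical.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value (assumed aggregation).

    shap_rows: one list of per-feature SHAP values for each clinic.
    Returns (feature, importance) pairs sorted from most to least important.
    """
    if not shap_rows:
        raise ValueError("at least one row of SHAP values is required")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("each row needs one SHAP value per feature")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    n = len(shap_rows)
    importance = [(name, t / n) for name, t in zip(feature_names, totals)]
    return sorted(importance, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature names and SHAP values, for illustration only.
features = ["cases_in_clinic", "regional_cases", "crp_change"]
shap_rows = [
    [0.4, -0.1, 0.05],
    [-0.2, 0.3, 0.0],
    [0.6, -0.2, -0.05],
]
ranking = mean_abs_shap(shap_rows, features)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, which is why the direction of association is read from the dot plot rather than from the bars.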
