#### 2.3.4. Feature Selection

All candidate features were included in the first model iteration (Supplementary Table S2). Features that contributed negligibly to model predictions, according to feature importance statistics, were excluded from subsequent training iterations. The final model included 46 features derived from 28 variables (Table 1).
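The iterative exclusion step can be illustrated with a minimal Python sketch. It assumes a hypothetical callable `fit_importance` that retrains the model on a given feature list and returns one importance score per feature (for an XGBoost model this could be gain-based or SHAP-based importance), and a hypothetical cut-off `min_share` for what counts as a negligible contribution. Neither the callable nor the cut-off is taken from this study.

```python
from typing import Callable, Dict, List


def prune_features(
    features: List[str],
    fit_importance: Callable[[List[str]], Dict[str, float]],
    min_share: float = 0.01,
    max_iter: int = 20,
) -> List[str]:
    """Repeatedly retrain and drop features whose share of total importance < min_share.

    `fit_importance` and `min_share` are hypothetical placeholders, not the study's settings.
    """
    current = list(features)
    for _ in range(max_iter):
        importance = fit_importance(current)  # retrain on the current feature set
        total = sum(importance.get(f, 0.0) for f in current)
        if total <= 0:
            break  # no usable importance signal; stop pruning
        kept = [f for f in current if importance.get(f, 0.0) / total >= min_share]
        if not kept or len(kept) == len(current):
            break  # nothing left to drop (or everything would be dropped)
        current = kept
    return current


# Toy importance function: fixed scores, renormalised over whatever features remain.
base_scores = {"a": 50.0, "b": 30.0, "c": 19.5, "d": 0.5}
selected = prune_features(list(base_scores), lambda fs: {f: base_scores[f] for f in fs})
print(selected)  # ['a', 'b', 'c']
```

Renormalising importance after each retraining means a feature's share can change once weaker features are removed, which is why the loop retrains until no further feature falls below the cut-off.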


**Table 1.** Patient Characteristics.



All variables were included in the AVF Failure Model. IQR, interquartile range; SD, standard deviation; AVF, arteriovenous fistula.

#### 2.3.5. Handling of Missing Values

Missing values in the input variables are handled natively by XGBoost, so no imputation or other data manipulation was required. This approach has been shown to achieve greater accuracy than standard sample-based or model-based statistical methods for handling missing data, as well as other machine learning techniques such as random forest or Bayesian ridge imputation. A detailed explanation of how XGBoost handles missing values under a wide range of missingness patterns is beyond the scope of this manuscript and has been thoroughly described in previous technical publications [32].
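To give some intuition for how missing values can be handled without imputation, the sketch below implements, for a single feature, the idea of a learned default direction used in XGBoost's sparsity-aware split finding. Rows with a missing value are excluded when enumerating candidate thresholds; for each threshold they are then sent to whichever child yields the higher gain. This is a simplified, stdlib-only illustration (second-order gain with an L2 penalty `lam`, other regularisation terms omitted), not XGBoost's actual implementation, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def split_gain(gl: float, hl: float, gr: float, hr: float, lam: float) -> float:
    """Second-order gain of splitting a node into left/right children (L2 penalty lam)."""
    g, h = gl + gr, hl + hr
    return 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam) - g**2 / (h + lam))


def best_split_with_default(
    x: List[float], grad: List[float], hess: List[float], lam: float = 1.0
) -> Tuple[float, Optional[float], bool]:
    """Return (gain, threshold, missing_goes_left) for one feature; NaN marks missing."""
    present = sorted((xi, gi, hi) for xi, gi, hi in zip(x, grad, hess) if not math.isnan(xi))
    g_miss = sum(gi for xi, gi in zip(x, grad) if math.isnan(xi))
    h_miss = sum(hi for xi, hi in zip(x, hess) if math.isnan(xi))
    g_tot = sum(p[1] for p in present)
    h_tot = sum(p[2] for p in present)

    best: Tuple[float, Optional[float], bool] = (-math.inf, None, True)
    gl = hl = 0.0
    for i in range(len(present) - 1):
        gl += present[i][1]
        hl += present[i][2]
        if present[i][0] == present[i + 1][0]:
            continue  # cannot place a threshold between equal values
        threshold = (present[i][0] + present[i + 1][0]) / 2
        gr, hr = g_tot - gl, h_tot - hl
        # Try both default directions for the rows with missing values.
        gain_left = split_gain(gl + g_miss, hl + h_miss, gr, hr, lam)
        gain_right = split_gain(gl, hl, gr + g_miss, hr + h_miss, lam)
        if gain_left > best[0]:
            best = (gain_left, threshold, True)
        if gain_right > best[0]:
            best = (gain_right, threshold, False)
    return best


nan = float("nan")
gain, thr, missing_left = best_split_with_default(
    [1, 2, 3, 4, nan, nan], [-1, -1, 1, 1, -1, -1], [1] * 6
)
print(thr, missing_left)  # 2.5 True
```

In this toy example the missing rows have gradients resembling the low-value rows, so the learned default direction sends them left; no value was ever imputed.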

#### *2.4. Statistical Analysis and Model Performance Evaluation*

Model derivation was conducted in a randomly selected partition comprising 70% of the patients in the original dataset. The final set of variables was obtained by backward stepwise feature selection [33]. Model performance and calibration were evaluated in the remaining 30% of patients. Discrimination was quantified by the concordance statistic, calculated as the area under the receiver operating characteristic curve (ROC AUC), and calibration was inspected visually by plotting the observed outcome incidence against the predicted risk score. To evaluate model stability, both training and testing were repeated over 30 random resamplings. All statistics are reported as pooled estimates with 95% confidence intervals, obtained by fixed-effect meta-analysis (inverse variance method) of the metrics from the 30 resampling exercises. The importance of input variables for risk prediction was computed using the SHapley Additive exPlanations (SHAP) method. All analyses were performed with Python version 3.7.10, MetaXL® and SAS 9.4®.
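The resampling and pooling scheme can be sketched as follows. `random_split` partitions patient indices 70/30 (splitting at the patient level is an assumption, consistent with the evaluation in "the remaining 30% of patients"), and `pool_fixed_effect` applies the standard inverse-variance fixed-effect formulas: each resample's estimate is weighted by $w_i = 1/\mathrm{SE}_i^2$, the pooled estimate is $\sum_i w_i \hat\theta_i / \sum_i w_i$, and its standard error is $1/\sqrt{\sum_i w_i}$. How the per-resample standard errors were obtained is not stated here, so they are left to the caller. Both function names are hypothetical, and this is not the MetaXL implementation.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def random_split(n_patients: int, test_frac: float = 0.3,
                 seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Randomly partition patient indices into (train, test) sets."""
    rng = random.Random(seed)
    idx = list(range(n_patients))
    rng.shuffle(idx)
    n_test = round(n_patients * test_frac)
    return idx[n_test:], idx[:n_test]


def pool_fixed_effect(estimates: Sequence[float], std_errors: Sequence[float],
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect pooling: returns (pooled, lower, upper)."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled


# 30 resamples of a hypothetical 1000-patient cohort (model fitting omitted).
splits = [random_split(1000, 0.3, seed=s) for s in range(30)]

# Two toy estimates: the more precise one (SE 1.0) dominates the pooled value.
pooled, lower, upper = pool_fixed_effect([1.0, 3.0], [1.0, 2.0])  # pooled is about 1.4
```

Weighting by inverse variance means resamples with more precise metric estimates contribute more to the pooled figure, and the pooled confidence interval narrows as resamples are added.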

#### **3. Results**

#### *3.1. Derivation & Test Dataset*

The final dataset consisted of 13,369 patients, contributing 113,592 patient-quarters. The AVF failure incidence density was 6.6 events/100 patient-quarters, or 26.4 events/100 patient-years. The AVF failure incidence density in the test set was 6.38 events/100 patient-quarters (95% CI: 6.33–6.43). A breakdown of AVF failure events by type is reported in Supplementary Table S3. Baseline characteristics of the participants are shown in Table 1.
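The rate conversion above can be checked with a short sketch: 100 patient-quarters correspond to 25 patient-years, so a rate per 100 patient-quarters is multiplied by 4 to express it per 100 patient-years. The confidence interval function uses a common log-scale Poisson approximation, rate × exp(±1.96/√events). It is shown only as an illustration and is not necessarily how the reported intervals were obtained (Section 2.4 describes pooling over 30 resamples). The event counts in the example are hypothetical.

```python
import math
from typing import Tuple


def incidence_density(events: int, patient_quarters: float, per: float = 100.0) -> float:
    """Events per `per` patient-quarters of follow-up."""
    return events / patient_quarters * per


def per_quarter_to_per_year(rate_per_100_quarters: float) -> float:
    """100 patient-quarters = 25 patient-years, so multiply by 4."""
    return rate_per_100_quarters * 4


def poisson_log_ci(events: int, patient_quarters: float, per: float = 100.0,
                   z: float = 1.96) -> Tuple[float, float]:
    """Approximate CI for a Poisson rate: rate * exp(+/- z / sqrt(events))."""
    rate = incidence_density(events, patient_quarters, per)
    half_width = z / math.sqrt(events)
    return rate * math.exp(-half_width), rate * math.exp(half_width)


print(per_quarter_to_per_year(6.6))     # 26.4 events/100 patient-years
rate = incidence_density(660, 10_000)   # hypothetical counts, about 6.6
low, high = poisson_log_ci(660, 10_000)
```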

#### *3.2. Discrimination and Calibration in the Validation Sample*

The final model showed very good discrimination: the area under the ROC curve (ROC AUC) for the AVF-FM was 0.80 (95% CI 0.79–0.81). Model calibration showed excellent agreement with the observed failure risk (Figure 2).
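For reference, the ROC AUC equals the probability that a randomly chosen observation with an event receives a higher risk score than a randomly chosen observation without one, with ties counted as one half. The stdlib sketch below computes it through the equivalent Mann–Whitney rank formula; in practice a library function such as scikit-learn's `roc_auc_score` would typically be used, and the function name here is hypothetical.

```python
from typing import List, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, using midranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1  # extend the block of tied scores
        midrank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative observation")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```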

**Figure 2. Calibration Plot.** The calibration plot represents the relationship between predicted probabilities and observed frequency of events in the test dataset. The shaded band represents the 95% confidence interval of the calibration curve. The dotted line represents perfect calibration. The observed calibration curve overlaps with the perfect calibration line over the whole predicted probability distribution.
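A calibration curve like the one in Figure 2 is commonly built by grouping predictions into bins of predicted risk and comparing the mean predicted probability with the observed event frequency in each bin. The sketch below uses equal-size quantile bins. The number of bins and the binning strategy behind Figure 2 are not stated, so both are assumptions here, and the confidence band is omitted.

```python
from typing import List, Sequence, Tuple


def calibration_table(y_true: Sequence[int], y_prob: Sequence[float],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Return (mean predicted risk, observed event rate, n) for equal-size quantile bins."""
    pairs = sorted(zip(y_prob, y_true))  # order observations by predicted risk
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer observations than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows


# Perfect calibration would put every (mean_pred, observed) point on the diagonal.
probs = [0.1] * 10 + [0.9] * 10
outcomes = [1] + [0] * 9 + [1] * 9 + [0]
for mean_pred, observed, n in calibration_table(outcomes, probs, n_bins=2):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={n}")
```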

Based on model calibration, we established three thresholds defining four risk classes; the prevalence and observed event incidence of each risk class are summarized in Table 2.


**Table 2.** Arteriovenous fistula risk score classes.

Risk classes are defined by three action thresholds of the AVF-FM risk score. The prevalence of each risk class, event rates and risk ratios were estimated in 30 test sets, each obtained by randomly partitioning the original cohort with a 70–30 split. Figures represent pooled estimates (inverse variance method) from the 30 random samplings of the original cohort. Source figures for each random sampling are reported in Supplementary Table S4. \* The AVF Failure Risk is the positive predictive value (PPV; events/100 patient-quarters) computed for patients classified in a given risk class, that is, PPV = P(Failure | Class). Note: AVF, arteriovenous fistula.
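The per-class quantities in Table 2 can be sketched as follows: each patient-quarter is assigned to one of four classes by three thresholds, and prevalence, PPV (events/100 patient-quarters) and a risk ratio are computed per class. The threshold values below are hypothetical, and using the lowest-risk class as the risk-ratio reference is an assumption; the actual thresholds and reference class are those defined for Table 2.

```python
import bisect
from typing import List, Sequence, Tuple


def assign_class(score: float, thresholds: Sequence[float]) -> int:
    """Class index 0..len(thresholds); a score equal to a threshold goes to the higher class."""
    return bisect.bisect_right(thresholds, score)


def class_summary(scores: Sequence[float], events: Sequence[int],
                  thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Per class: (prevalence, PPV as events/100 patient-quarters, risk ratio vs class 0)."""
    k = len(thresholds) + 1
    counts, n_events = [0] * k, [0] * k
    for s, e in zip(scores, events):  # one row per patient-quarter, e in {0, 1}
        c = assign_class(s, thresholds)
        counts[c] += 1
        n_events[c] += e
    rates = [n_events[c] / counts[c] if counts[c] else float("nan") for c in range(k)]
    ref = rates[0]  # assumed reference: lowest-risk class
    summary = []
    for c in range(k):
        rr = rates[c] / ref if ref > 0 else float("nan")
        summary.append((counts[c] / len(scores), rates[c] * 100, rr))
    return summary


hypothetical_thresholds = [0.05, 0.10, 0.20]
scores = [0.01, 0.02, 0.07, 0.08, 0.15, 0.30, 0.40, 0.50]
events = [0, 1, 0, 1, 1, 1, 1, 0]
table = class_summary(scores, events, hypothetical_thresholds)
```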

#### *3.3. Feature Analysis*

The 20 features contributing most to the predictions of the AVF failure risk score model are shown in Figures 3 and 4. A history of AVF complications on the vascular access under consideration was the most influential variable, followed by recirculation and other functional parameters, including metrics describing the temporal pattern of spKt/V, blood pump flow (Qb), and dynamic venous and arterial pressures. AVF vintage, diastolic blood pressure, serum albumin and C-reactive protein were also ranked among the top 20 risk contributors.

**Figure 3.** Shapley additive explanations (SHAP) plot showing relative feature importance. Each dot represents one individual subject from the test dataset. Colour coding: red represents higher values of the variable; blue represents lower values. The X axis represents the impact of each variable on predicted risk in terms of SHAP values. Positive SHAP values indicate that the variable's value increases the predicted risk of AVF failure; negative values indicate that it decreases the predicted risk. Note: AVF, arteriovenous fistula; DBP, diastolic blood pressure; SD, standard deviation; Qb, blood pump flow.
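To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values for a small function by enumerating all feature coalitions, with "absent" features replaced by a single baseline input. It is a brute-force illustration on a hypothetical toy model and is exponential in the number of features. It is not the algorithm used for tree ensembles: the shap package's TreeExplainer uses the polynomial-time TreeSHAP algorithm and does not rely on a single baseline input. The values satisfy the additivity property, summing to f(x) − f(baseline).

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float], x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = set(coalition)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Hypothetical toy risk model with an interaction between features 1 and 2.
def toy_model(v: List[float]) -> float:
    return 2 * v[0] + 3 * v[1] * v[2]


phi = shapley_values(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(phi)  # approximately [2.0, 1.5, 1.5]
```

The interaction term is split equally between the two features involved, which illustrates why a single dot in Figure 3 reflects both the feature's own effect and its share of interactions.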

**Figure 4.** Variable importance plot. Variable importance of the top 20 features in the final model, expressed as mean absolute SHAP values. Note: AVF, arteriovenous fistula; DBP, diastolic blood pressure; SD, standard deviation; Qb, blood pump flow.
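A ranking like the one in Figure 4 can be derived from a matrix of per-observation SHAP values. The sketch below assumes importance is the mean absolute SHAP value per feature, which is the convention of the shap package's bar summary plot. The function name and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_abs_shap_ranking(shap_matrix: Sequence[Sequence[float]],
                          feature_names: Sequence[str],
                          top_k: int = 20) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP| over observations (rows) and keep the top_k."""
    n = len(shap_matrix)
    scores = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy SHAP matrix: 2 observations x 3 features.
toy_shap = [[0.5, -1.0, 0.1],
            [-0.5, 1.0, 0.1]]
print(mean_abs_shap_ranking(toy_shap, ["a", "b", "c"], top_k=2))  # [('b', 1.0), ('a', 0.5)]
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling; feature "a" would otherwise score zero despite affecting every prediction.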
