*4.3. Data Collection and Analysis*

Data from each included study were collected using a pre-specified spreadsheet and an extraction table refined by all authors. Study design, single-center or multicenter patient recruitment, sample size, years of enrollment, urinary biomarker(s) of interest (i.e., index test), the Banff classification used for the histological diagnosis of AR (i.e., reference standard), and the addressed outcome(s) were summarized in a descriptive table. Studies were classified as either diagnostic or predictive. Diagnostic studies usually collected urine samples on the day of the diagnostic biopsy, whereas predictive studies analyzed urine samples collected before AR development. Studies that reported DTA data, such as sensitivity, specificity, PPV, NPV, and AUC, were evaluated for risk of bias and applicability concerns using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [77]. The most important items for a positive evaluation were: a cross-sectional study design; avoidance of patient selection bias and inappropriate exclusions; definition of the index test (biomarker) threshold in a training set and its validation in a separate set of patients; and compliance with the correct histological definition of AR as the reference standard for all patients included in the analysis. Because of the substantial heterogeneity among the included studies, a meta-analysis was not performed, and the results were instead summarized in a narrative synthesis.
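
For readers less familiar with the DTA measures named above, the following minimal sketch shows how sensitivity, specificity, PPV, and NPV follow from a 2×2 table that cross-classifies the index test (urinary biomarker) against the reference standard (biopsy-proven AR). It is an illustration only and not part of the extraction procedure used in this review; the names `TwoByTwo` and `dta_metrics` and the example counts are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying index test vs. reference standard."""
    tp: int  # biomarker positive, AR on biopsy
    fp: int  # biomarker positive, no AR on biopsy
    fn: int  # biomarker negative, AR on biopsy
    tn: int  # biomarker negative, no AR on biopsy


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a margin of the table is empty.
    return numerator / denominator if denominator else math.nan


def dta_metrics(t: TwoByTwo) -> dict:
    """Standard diagnostic test accuracy measures from a 2x2 table."""
    return {
        "sensitivity": _ratio(t.tp, t.tp + t.fn),
        "specificity": _ratio(t.tn, t.tn + t.fp),
        "ppv": _ratio(t.tp, t.tp + t.fp),
        "npv": _ratio(t.tn, t.tn + t.fn),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = TwoByTwo(tp=40, fp=10, fn=10, tn=140)
metrics = dta_metrics(example)
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of AR in the study population, which is one reason why values reported by studies with different designs are not directly comparable.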
