*2.4. Quality Assessment*

Studies reporting a diagnostic test accuracy (DTA) analysis (29/38) were graded for risk of bias and applicability concerns using the QUADAS-2 tool (Table 3). Risk of bias was frequently high in the patient selection (14/29, 48%) and index test (23/29, 79%) domains.
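As a quick arithmetic check, the proportions quoted above can be reproduced from the raw counts. This is only an illustrative sketch: the helper name `proportion` and the dictionary layout are assumptions made for this example and are not part of the review's methods. The counts themselves are taken from the text.

```python
def proportion(count, total):
    """Return the fraction and the percentage rounded to a whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = count / total
    return fraction, round(fraction * 100)


# Number of studies with a DTA analysis graded with QUADAS-2 (from the text).
TOTAL_DTA_STUDIES = 29

# Studies rated at high risk of bias per domain (from the text).
high_risk_counts = {
    "patient selection": 14,
    "index test": 23,
}

# Rounded percentage of high-risk studies for each domain.
summary = {
    domain: proportion(n, TOTAL_DTA_STUDIES)[1]
    for domain, n in high_risk_counts.items()
}

for domain, pct in summary.items():
    print(f"{domain}: {high_risk_counts[domain]}/{TOTAL_DTA_STUDIES} ({pct}%)")
```

Rounding to whole percentages matches the precision used in the text; a different rounding convention would need to be stated explicitly.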

**Table 3.** QUADAS-2 tool assessment for DTA studies. Risk of bias and applicability concerns, evaluated as per the QUADAS-2 tool, for the 29 studies providing diagnostic test accuracy data.


\*, Prediction study; -, Low Risk; , High Risk; ?, Unclear Risk.

The most frequent reasons for a high risk of bias were selection of the study population by case-control design (the case for the majority of the studies), exclusion of the confounding typical of a real-life setting, absence of a threshold definition, and absence of independent validation. For example, when control patients were selected among stable patients who had not undergone allograft biopsy, or only among patients with normal histology, and the obtained thresholds were not tested in a randomly selected validation group, the study was rated at high risk of bias in patient selection and index test (Table 3). Such designs raised the possibility of over-fitted associations and unrealistic DTA performance and, therefore, concerns for applicability. The ideal control patients were selected randomly (or in a cross-sectional fashion) and had all undergone an allograft biopsy (per indication or per protocol) with various histological diagnoses (e.g., normal histology; acute tubular necrosis, ATN; interstitial fibrosis and tubular atrophy, IFTA; chronic allograft nephropathy, CAN; BK virus nephropathy, BKVN; recurrence of the primary disease on the allograft). Only 5/29 studies were found to have a low risk of bias in both patient selection and index test. Allograft histology, according to the Banff classification, was the reference standard for AR diagnosis, with histology grading usually assigned in a blinded fashion with respect to the index test results. Since urinary samples were frequently obtained from all included patients prior to a diagnostic allograft biopsy, and all included patients were evaluated in the DTA analysis, a low risk of bias was frequently identified in the flow and timing domain.

The QUADAS-2 tool does not include publication bias (PB) among its domains and, in the context of this review, PB is difficult to assess formally. Given the broad variety of biomarkers assessed and the absence of a meta-analysis, a formal PB assessment such as Egger's test, Deeks' test or the construction of a funnel plot was not possible. It is also recognized that the assessment of PB in the synthesis of DTA data is challenging and of limited reliability [55].
