**3. Discussion**

With this systematic review, we critically summarize the results of the last five years of research and the latest advances, and highlight the most frequent limitations of studies assessing urinary biomarkers for the diagnosis or prediction of acute allograft rejection. We focused on study design, distinction between the TCMR and ABMR settings, evaluation of confounding (e.g., DGF, infections, calcineurin inhibitor nephrotoxicity), comparison with the gold standard of diagnosis (both for cases and controls), and the presence of estimates of biomarker performance in validation.

The main finding was the strengthening of the evidence for the clinical utility of urinary C-X-C motif chemokine ligands (in particular for the diagnosis of TCMR), alone or in combination with other biomarkers, as in the *Q score* (cell-free DNA, methylated cell-free DNA, clusterin, total protein, creatinine, and CXCL10) or in the CTOT-4 formula. CXCL9 and CXCL10 had AUCs ranging from 0.67 to 0.88 with an NPV ranging from 84% to 98% for AR diagnosis, and AUCs ranging from 0.50 to 0.97 with an NPV ranging from 71% to 96% for AR prediction. Signatures of urinary peptides and metabolites identified through unbiased proteomics and metabolomics, and a cluster of urinary cell pellet genes (*uCRM score*), were also established for the diagnosis of AR, although some limitations still hinder their introduction into clinical practice. Potential confounders must always be considered because of possible overlap in diagnosis. For example, urinary chemokines are also elevated in allograft BK virus nephropathy (as discussed below), urinary NGAL was proposed as an early predictor of DGF [56] and as a biomarker of CNI toxicity [57], while urinary miRNA dysregulation has been linked to interstitial inflammation and tubular atrophy [58]. Tinel and colleagues demonstrated for the first time that considering (instead of excluding) potential confounding factors (i.e., urinary tract infection and BK virus reactivation) in a diagnostic multi-parametric model could optimize its performance [16]. A model combining eight parameters (recipient age, sex, eGFR, DSA presence, signs of urinary tract infection, BKV blood viral load, CXCL9, and CXCL10) could diagnose AR with high accuracy (AUC: 0.85, 0.80–0.89), paving the way for new studies combining urinary biomarkers with clinical characteristics to reach the highest clinical relevance and provide targeted therapy for our patients.

Up to 2015, almost ninety non-redundant molecules had been identified as urinary biomarkers of AR, participating in different pathways such as complement activation, antigen presentation, and inflammation signaling [15]. Urine was the most frequent matrix of choice for these analyses, and studies were often limited by small sample size and case-control design, no histology in the control cohorts, lack of confounding adjustment, lack of a validation set, and technical difficulties with procedure standardization and costs [15]. Although serum creatinine levels and proteinuria monitoring are well-established biomarkers used by transplant physicians to suspect AR, they lack both sensitivity and specificity, and they are of little help in the prediction phase, in detecting subclinical rejection, and in the differential diagnosis between AR, infections, drug toxicity, and acute tubular necrosis [14,59]. In a study of 281 consecutive biopsies performed because of an increase in serum creatinine levels, only 27.8% revealed any sign of AR [51]. Conversely, subclinical rejection (i.e., rejection without clinical dysfunction) was found in over 40% of patients with normal renal function in the presence of anti-HLA de novo donor-specific antibodies (DSA) [60]. Proteinuria is common after kidney transplant and, although it is widely used as a biomarker of renal disease and is an independent predictor of long-term graft survival, it can also be a sign of post-transplant primary disease recurrence (e.g., focal-segmental glomerulosclerosis), infections (e.g., CMV), immunosuppressive medication toxicity, or systemic (e.g., new-onset diabetes) and urologic complications (e.g., ureteral stenosis) [59,61]. DSA monitoring is currently considered the primary biomarker for ABMR but, despite the increasing ability to detect low levels of DSA, its positive predictive value is low: up to 60% of patients with de novo DSA do not show any sign of AR at biopsy [60].

Continuous advances in molecular techniques and the "-omics" sciences have helped to identify many potential new blood and urine biomarkers for the diagnosis and prediction of kidney allograft AR in the last two decades. Of note, elevated pretransplant serum CXCL9 and CXCL10 levels were found to be associated with an increased risk of early and severe AR and graft failure [62–64]. Subsequently, among urine-derived proteins, a 2012 study found CXCL9 and CXCL10 to be considerably elevated in patients experiencing either AR (clinical or subclinical) or BK virus infection (86% sensitivity and 80% specificity for CXCL9; 80% sensitivity and 76% specificity for CXCL10), but the two chemokines could not distinguish between the two conditions [65]. These results were reinforced by the 2013 CTOT-1 study, which found that low urine CXCL9 measured at 6 months post-transplant identified a subset of patients at low risk for AR development (92% NPV for Banff ≥1A TCMR) and predicted allograft stability up to 24 months post-transplant (93–99% NPV) [66]. With the help of mass spectrometry, elevated beta2-microglobulin levels were identified as strongly correlated with AR (83% sensitivity, 80% specificity, 89% PPV, 71% NPV) and then validated by ELISA in the urine of AR patients [67]. Urinary mRNAs of the cytotoxic proteins perforin and granzyme B were proposed for the noninvasive diagnosis of AR (83% sensitivity and 83% specificity for perforin; 79% sensitivity and 77% specificity for granzyme B) [68], and the Treg marker FOXP3 was shown to predict reversal of AR (90% sensitivity, 73% specificity) [69]. Urinary cell mRNA expression of T-cell immunoglobulin and mucin domain 3 (Tim-3, also known as hepatitis A virus cellular receptor 2) was found to discriminate AR from other causes of acute graft dysfunction (calcineurin inhibitor nephrotoxicity or interstitial fibrosis and tubular atrophy) with an AUC of 0.96, 89% PPV and 94% NPV [70].
A 2013 multicenter study from the CTOT-4 study group later identified a 3-gene urinary mRNA signature (CD3ε mRNA, CXCL10 mRNA, 18S rRNA) able to discriminate acute TCMR from no rejection in indication biopsies, with an AUC of 0.74, 79% sensitivity and 78% specificity in a validation set [54]. Also, noncoding miRNAs (e.g., miRNA-10a, miRNA-10b, miRNA-210), although limited by their susceptibility to degradation, proved to be detectable in urine; in particular, low miRNA-210 levels discriminated patients with AR from stable transplant controls (74% sensitivity, 52% specificity) [71].

Our systematic analysis of the more recent literature details the accuracy of a variety of urinary biomarkers for allograft AR, with the objective of allowing transplant physicians early diagnosis and prediction of rejection episodes, and differential diagnosis with other causes of allograft dysfunction. A correct histologic diagnosis of AR is essential during the validation of new biomarkers, and the Banff criteria are considered the gold standard for biopsy evaluation. The diagnostic criteria for TCMR have undergone essentially no major change in the last decade, with lymphocytic infiltrate of tubules (tubulitis) and larger vessels (vasculitis) being the main descriptive features. The severity of these lesions is graded according to the degree of lymphocytic infiltrate per high-power field. On the other hand, the ABMR criteria have continuously evolved in recent years with the recognition of its variable histologic presentation, which highlights the great importance of applying an up-to-date classification in this setting [72,73]. The original criteria, established in the 2000s, included active tissue injury, immunohistologic evidence of peritubular capillary deposition of the complement split product C4d, and circulating DSA. Subsequent studies demonstrating the presence of ABMR also in biopsies lacking detectable C4d staining [74] prompted the Banff Working Group in 2013 to make a major change in the ABMR criteria, removing the requirement for C4d detection [75]. The most recent changes, in 2017, removed the requirement for documented circulating DSA in the setting of positive C4d staining and microvascular inflammation, and included the use of ABMR-associated gene transcript panels [10].

The ideal biomarker should be readily available, accurate, inexpensive, standardized, repeatable, and noninvasive, and would be useful to reduce the need for protocol biopsy and enable early targeted intervention. The chance of finding an ideal biomarker with high sensitivity, specificity, PPV and NPV is small. However, not all biomarkers need to be highly sensitive and highly specific at the same time, depending on the clinical question they are going to answer. Therefore, targeting specific populations and accepting lower values for some performance measures may be a better strategy. For example, to confirm the need for allograft biopsy in a population at high risk for AR (thus providing biopsy to the correct patients), a test with high sensitivity and a low false-negative rate would be the most useful. Conversely, to propose diagnostic biopsies in a population at low risk for AR (thus avoiding unnecessary per-protocol biopsies), a test with high specificity and a low false-positive rate would be the test of choice. Also, TCMR and ABMR are different clinical entities, and it is unrealistic, on current evidence, to hope for a biomarker that will accurately predict both forms of AR in a typical population of transplant patients with possible confounding.
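The trade-off described above can be made concrete with a minimal sketch that counts the expected errors of a test in a cohort of 1000 patients. All sensitivity, specificity, and prevalence values below are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies, and the function and class names are our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedErrors:
    missed_rejections: float     # false negatives in the cohort
    unnecessary_biopsies: float  # false positives in the cohort


def expected_errors(sensitivity, specificity, prevalence, cohort_size=1000):
    """Expected false negatives and false positives for a given test and cohort."""
    with_ar = cohort_size * prevalence
    without_ar = cohort_size - with_ar
    return ExpectedErrors(
        missed_rejections=with_ar * (1.0 - sensitivity),
        unnecessary_biopsies=without_ar * (1.0 - specificity),
    )


# Hypothetical high-risk population (40% AR) tested with a highly sensitive assay.
high_risk = expected_errors(sensitivity=0.95, specificity=0.70, prevalence=0.40)

# Hypothetical low-risk population (5% AR) tested with a highly specific assay.
low_risk = expected_errors(sensitivity=0.70, specificity=0.95, prevalence=0.05)

print(high_risk)
print(low_risk)
```

Under these assumed values, the sensitive test misses about 20 of 400 rejections in the high-risk cohort at the cost of about 180 false-positive results, whereas the specific test produces fewer than 50 false-positive results among 950 low-risk patients without AR.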

Our systematic review has some limitations. Because of the heterogeneity of the included studies, and in order to adhere to the systematic review question, we could not detail the many facets of individual study results, especially the more complex ones. Because of space constraints, the tables report only the major findings of each study, limited to urinary biomarkers. A narrative synthesis of the most promising results was used to improve readability, and a meta-analysis could not be performed. Overall, the studies we identified were of good quality, many with DTA analysis and some comprising a thorough validation process yielding very good to excellent diagnostic performance. Although specific forms of bias were assessed using QUADAS-2, publication bias could not be formally assessed, and we acknowledge that this may lead to overestimating the weight of positive results. Weaknesses of the included studies were often the use of small cohorts obtained by case-control selection, which yields inflated predictive values; the exclusion of confounders; unclear or out-of-date application of the Banff classification; the absence of validation cohorts; and the lack of a hypothesis-driven approach. In fact, the biomarker discovery process should not only consist of a training phase (i.e., a case-control study), but should also comprise independent validation in a prospective study and testing in a real-life clinical setting.
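The dependence of predictive values on the proportion of AR cases in the study sample can be illustrated with a short calculation based on Bayes' theorem. As a sketch, we take the beta2-microglobulin figures quoted earlier in this Discussion (83% sensitivity, 80% specificity, 89% PPV) [67], back-calculate the case proportion that would be arithmetically consistent with them, and then recompute PPV and NPV at a lower prevalence. The 10% prevalence is a hypothetical value chosen for illustration, the reported figures are rounded, and the function names are our own.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    fn = (1.0 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


def implied_prevalence(sensitivity, specificity, ppv):
    """Prevalence at which the given sensitivity and specificity yield this PPV.

    Obtained by solving ppv = se*p / (se*p + (1 - sp)*(1 - p)) for p.
    """
    fpr = 1.0 - specificity
    return ppv * fpr / (sensitivity * (1.0 - ppv) + ppv * fpr)


# Figures as quoted in the text for urinary beta2-microglobulin [67].
se, sp, reported_ppv = 0.83, 0.80, 0.89

# Case proportion consistent with the reported PPV (an inference, not a reported value).
p_sample = implied_prevalence(se, sp, reported_ppv)
ppv_sample, npv_sample = predictive_values(se, sp, p_sample)

# Hypothetical lower prevalence, assumed here only for illustration.
ppv_low, npv_low = predictive_values(se, sp, 0.10)

print(f"implied case proportion: {p_sample:.2f}")
print(f"at implied proportion:   PPV={ppv_sample:.2f}, NPV={npv_sample:.2f}")
print(f"at 10% prevalence:       PPV={ppv_low:.2f}, NPV={npv_low:.2f}")
```

The reported values are consistent with a sample in which roughly two thirds of patients had AR; the back-calculated NPV (about 0.71) matches the reported 71%. With the same sensitivity and specificity at the assumed 10% prevalence, the PPV falls to about 0.32 while the NPV rises to about 0.98, which is why predictive values from enriched case-control samples should not be transferred directly to unselected cohorts.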
