4. Discussion
The interpretation of diuretic renography is characterized by considerable variation. The main reasons for this are the different protocols applied among centers as well as patient factors, such as poor patient preparation, reduced renal function and a dilated renal collecting system. These can result in false positive or equivocal results, particularly in the diagnosis of obstruction [9]. Indeed, several studies, consensus reports and guidelines in the field have tried to address the issue of standardized acquisition and interpretation of the examination [2,3,4,10,11]. In the quest to reach, as far as possible, an objective scan reading, specific quantitative parameters, such as the DRF, Tmax and T1/2 calculated herein, have been introduced in the interpretation of diuretic renography [12]. Nevertheless, disagreements regarding the interpretation of scan results still arise often in clinical practice; this can occur in as many as 20% of cases, even between full-time nuclear medicine physicians [13]. Although the interpretation of results of diuretic renography was not the topic of the present work, we sought to address the clinically relevant issue of intra- and inter-observer agreement of the commonly derived indices of renal function by scintigraphy. A high level of agreement is a prerequisite for the reliable and robust assessment of renography data and is particularly desirable in patients undergoing renal function monitoring by means of this method.
To our knowledge, we have presented data for the largest patient cohort published to date. The main strengths of our analysis include the wide range of renal function values of our study participants, the application of two different quantification approaches by both an experienced and a junior operator, and the employment of a robust statistical methodology. The main results of the study can be outlined as follows. Regarding the calculation of DRF, despite the favorable results of the manual method, limitations were observed for the semi-automated approach, as reflected in the intra-observer repeatability of the junior radiographer and in the inter-observer repeatability. A certain degree of operator dependence was also observed in the assessment of Tmax, with higher levels of repeatability for the experienced radiographer and no distinct superiority of either software tool; nevertheless, the bias was low and the LoA were rather narrow for this parameter for both observers. Finally, concerning T1/2, very good levels of agreement were noted in intra- and inter-observer repeatability with both the manual and semi-automated techniques for both operators.
The calculation of DRF, which is the relative renal tracer uptake from the blood, is one of the most common indications for the performance of renography. In general, a DRF of 45–55% is considered to be in the normal range [14], although ranges of 42–58% have also been reported in normal adults [12,15,16]. A high level of repeatability in DRF evaluation is particularly desirable in terms of renal function monitoring, for example, in the determination of the effect of chronic obstruction on underlying renal function, since DRF changes may be important in clinical decision-making, in particular with regard to surgical management. Commonly applied thresholds for surgical treatment include a DRF decline of 10% (less often even 5%), while, as a rule of thumb, a kidney with a DRF < 10% is considered incapable of sustaining a dialysis-free life, and in such cases, nephrectomy is the suggested treatment strategy [9,17]. Interestingly, with regard to descriptive statistics of the population studied herein, the estimated SD of DRF was markedly higher than the SD documented in previous studies, such as the ones by Klingsmith III et al. [15] and Esteves et al. [12]. However, this can be explained by the characteristics of the enrolled cohorts: those studies included normal subjects and potential kidney donors, whereas the present study involves patients with a wide range of renal function values, many of whom had a known or suspected renal disease. A further repeatability assessment, after grouping patients based on the different referral causes, would probably clarify the potential impact of underlying pathologies on agreement of the renography parameters. However, the subpopulations formed according to clinical indication (Table 1) would be too small to afford such a subanalysis.
The results of the present study regarding intra- and inter-observer repeatability of DRF assessments demonstrate zero bias, narrow LoA and at least substantial agreement for the manual method by both radiographers, especially the experienced one. Lezaic et al. also investigated the intra- and inter-observer repeatability of diuretic renography in adults between three observers (nuclear medicine physicians without further clarification regarding their level of experience) using the manual method, but with statistical methods different from those in our study [17]. In particular, instead of using the Bland–Altman analysis, the authors quantified repeatability by the SD of the DRF measurements, and reported an excellent agreement based on an average intra-observer repeatability of 2.6% and an inter-observer repeatability of 4.2%. These results are in line with ours, where equal or lower SD levels were found in DRF assessments by the manual technique. Moreover, we performed renography assessments by applying a semi-automated approach. In comparison to the manual method, the semi-automated approach yielded worse results regarding intra-observer repeatability of the junior radiographer and inter-observer repeatability, demonstrating moderate agreement and wider 95% LoA, exceeding 9%, with potential influence on patient management. Based on these findings, we encourage cautious use of automated tools for DRF measurements and suggest adjunct validation by manual methods where possible.
A comparison of the manual and semi-automated approaches for DRF assessment was also performed. The two quantitative methods exhibited substantial levels of agreement for both observers with very small bias, while the LoA did not exceed 8%. A similar analysis was performed by Rewers et al., who also compared a semi-automated to a manual software package in 65 normal subjects for evaluation of suitability as renal donors [16]. Our findings can be considered in agreement with that study, although the biases and LoA presented herein are slightly wider than the ones reported by Rewers et al. (bias = −0.10%; LoA = −6.70–6.50%); this can, however, be attributed to the more heterogeneous composition of our studied population, which included patients with sometimes-marked renal impairment. Moreover, an older study of 21 patients with various renal disorders evaluated the relative kidney function obtained with the semi-automated and manual techniques [18]. The authors of that study reported almost identical values with the two methods based on correlation, not agreement, analyses. Correlation, however, is not recommended as a method to compare different techniques, since it simply indicates the degree of association between two sets of observations and not their agreement [19,20].
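The distinction between association and agreement can be illustrated with a minimal sketch. The DRF values below are hypothetical and are not taken from the study data; the function names `bland_altman` and `pearson_r` are likewise illustrative. The sketch assumes the usual Bland–Altman definitions (bias as the mean of the paired differences, 95% LoA as bias ± 1.96 SD of the differences). A constant offset between two methods leaves the correlation perfect while the Bland–Altman bias exposes the disagreement.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower 95% LoA, upper 95% LoA) for paired measurements."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical DRF values (%) from two processing methods; method B reads
# a constant 8 percentage points higher than method A.
method_a = [30.0, 40.0, 50.0, 60.0, 70.0]
method_b = [x + 8.0 for x in method_a]

r = pearson_r(method_a, method_b)
bias, loa_low, loa_high = bland_altman(method_a, method_b)
print(f"r = {r:.2f}, bias = {bias:.1f}, LoA = [{loa_low:.1f}, {loa_high:.1f}]")
```

In this constructed case the correlation coefficient is 1 although every paired measurement differs by 8 percentage points, a discrepancy of a size that could matter clinically; only the agreement analysis reveals it.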
Measurements of Tmax are performed routinely in the context of diuretic renography. Although no absolute values exist regarding the definition of a normal Tmax, renograms typically peak by 5 min after injection, while the Tmax is prolonged in obstructed kidneys [11]. In a study by Esteves et al., conducted to define the normal ranges of parameters derived by diuretic renography, Tmax mean values for both kidneys and genders ranged from 3.2 to 4.4 min, while the respective SD lay between 1.0 and 2.1 min [12]. Similarly, Rewers et al. reported normal Tmax mean values between 2.1 and 3.1 min (SD = 0.4–0.5 min) as derived by a semi-automated and a manual renography processing software package. In our study, we observed an operator-dependent influence on the calculation of Tmax, with the experienced radiographer exhibiting substantial agreement with both methods, and the junior radiographer only moderate to substantial agreement. It is, however, noteworthy that the bias was almost zero and the LoA were very narrow for both observers (≤0.44 min) and comparable to the respective values defined for normal subjects [12,16]. Neither software tool was distinctly superior. Interestingly, concerning inter-observer repeatability, the semi-automated method demonstrated substantial agreement in the assessment of the right kidney compared to moderate agreement with the manual approach, whereas repeatability in the evaluation of the left kidney (TmaxL) was moderate for both approaches. Further, the comparison of the manual and semi-automated methods revealed moderate levels of agreement between the techniques. Despite this seemingly problematic agreement between the two ROI assignment methods, the bias (≤0.1 min) was low and the 95% LoA (≤0.4 min) were rather narrow, comparable to the ones published by Rewers et al. in a similar agreement analysis in a normal cohort [16].
One of the main indications for performing diuretic renography is the determination of the presence of urinary obstruction. In this context, apart from the pattern of the time–activity renogram curve, which serves as the main interpretation tool in suggesting or excluding obstruction, the measurement of T1/2 is used as an aid for the further evaluation of the diuretic renogram. T1/2 refers to the time it takes for activity in the kidney to decrease to 50% of its maximum value. Although no consensus exists on the optimal methodology for T1/2 calculation, which remains, to a high degree, institute-dependent, it is generally recognized that urinary obstruction is associated with a prolonged T1/2 [4,11]. At our center, the standard diuretic renography protocol applied was the F + 10, where the diuretic furosemide was administered 10 min post-injection of 99mTc-MAG3, while the study was continued for another 10 min. Obstruction can be practically excluded when the time to half-peak counts in the renal cortex is reached before the administration of furosemide (T1/2 < 10 min); obstruction is considered highly unlikely in patients with T1/2 between 10 and 20 min (patients responding adequately to the diuretic), whereas it is highly suspected in those with T1/2 > 20 min. Thus, the parameter was handled as an ordinal variable after classification of patients in the following three groups: 0–10 min, 10–20 min and ≥20 min. Agreement analyses revealed that the assessment of drainage of both kidneys was highly reliable in terms of intra- and inter-observer repeatability. Importantly, these high levels of agreement applied to both radiographers and both quantification methods. Lezaic et al. also showed a high reproducibility of drainage assessment in adults and children by means of manual processing of the diuretic renograms [17]. Our findings support those of Lezaic et al., highlighting the very satisfactory repeatability of both the manual and semi-automated approaches separately as well as the high agreement between them, suggesting a conditional interchangeability of the two methods in the assessment of obstruction.
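The ordinal handling of T1/2 described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the study: the handling of values exactly at 10 or 20 min is assumed (the text gives the groups as 0–10, 10–20 and ≥20 min without specifying the lower boundary), the T1/2 readings are hypothetical, and unweighted Cohen's kappa is used only as a stand-in agreement statistic, since the statistic applied to the ordinal categories is not restated in this section. The names `t_half_category` and `cohen_kappa` are illustrative.

```python
from collections import Counter


def t_half_category(t_half_min):
    """Map a T1/2 value (min) to an ordinal group.

    0: < 10 min, 1: 10 to < 20 min, 2: >= 20 min.
    Boundary handling at exactly 10 min is an assumption.
    """
    if t_half_min < 10:
        return 0
    if t_half_min < 20:
        return 1
    return 2


def cohen_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa for two equally long lists of categories.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(ratings_1)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    expected = sum(counts_1[k] * counts_2[k] for k in counts_1) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical T1/2 readings (min) of the same six kidneys by two observers.
observer_1 = [4.0, 8.5, 12.0, 15.0, 22.0, 30.0]
observer_2 = [5.0, 9.0, 11.0, 21.0, 25.0, 28.0]

cats_1 = [t_half_category(t) for t in observer_1]
cats_2 = [t_half_category(t) for t in observer_2]
kappa = cohen_kappa(cats_1, cats_2)
print(cats_1, cats_2, round(kappa, 2))
```

Here the two observers disagree on a single kidney (15.0 versus 21.0 min, which falls on either side of the 20 min cut-off), illustrating that small numerical differences affect the categorical result only when they straddle a threshold. For ordinal categories a weighted kappa would penalize such adjacent-category disagreements less than distant ones.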