1. Introduction
Systemic vasculitis encompasses multiple subtypes of chronic inflammatory diseases affecting blood vessels of any size (artery, arteriole, vein, or capillary) [1] with widely variable clinical manifestations. Large vessel vasculitis (LVV) includes Takayasu arteritis and giant cell arteritis (GCA), both characterized by inflammation of the aorta itself or of its major branches [2]. GCA is more frequent in women, with incidence rates between 1.6 and 32.8 cases/100,000 persons over 50 years of age [3].
Takayasu arteritis has an early phase with non-specific symptoms such as fever, fatigue, or weight loss, and a later phase in which involvement of the carotid arteries and aorta causes carotidynia, neck pain, and back pain, and results in decreased or absent peripheral pulses. GCA manifests with abrupt headaches associated with other less frequent symptoms such as impaired vision and claudication of the masseter muscles [4,5].
Despite the new American College of Rheumatology classification criteria [6], the diagnosis of LVV remains challenging and still requires arterial biopsy in most cases, even though biopsy could frequently be avoided. The 2018 European League Against Rheumatism (EULAR) recommendations for the clinical use of imaging modalities in the diagnosis of vasculitis reviewed the available evidence and pointed to imaging techniques as a way to overcome the need for invasive measures [7].
[18F]-Fluorodeoxyglucose ([18F]FDG) positron emission tomography/computed tomography (PET/CT) has been proposed for the diagnosis and follow-up of LVV [8], as it detects LVV-associated inflammation with good sensitivity and specificity at the early stages of the disease [9,10]. In 2018, the joint procedural recommendations by the European Association of Nuclear Medicine (EANM), Society of Nuclear Medicine and Molecular Imaging (SNMMI), and the PET Interest Group for [18F]FDG PET/CT imaging in LVV aimed at standardizing all procedural steps, from patient preparation to image acquisition and reporting [11]. Visual assessment remains largely operator dependent and requires significant experience in the field. Although there is no standard method for visual interpretation in LVV, a grading system that compares vascular uptake to liver uptake is suggested. Methods of semi-quantitative image assessment have also been proposed, based on the measurement of the standardized uptake values (SUV) of the arterial wall compared to a reference organ (e.g., liver, venous wall, blood pool) [12,13]. Nevertheless, there is currently no clear consensus on the interpretation criteria for PET imaging in vasculitis.
We retrospectively evaluated the diagnostic accuracy of [18F]FDG PET/CT in suspected LVV using different image analysis approaches (i.e., qualitative and semi-quantitative) and testing different parameters that may influence the exam quality and image interpretation. Finally, we compared the performance of readers with different levels of experience to account for inter-operator variability.
2. Materials and Methods
2.1. Study Design
In this observational study, we retrospectively screened all patients who underwent a PET/CT with [18F]FDG for a suspected or established diagnosis of vasculitis at IRCCS Humanitas Research Hospital from January 2012 to December 2021. Subsequently, only patients with available PET/CT images and clinical follow-up were included in the study. Initially, 432 patients were selected by screening the medical reports, using the keywords “vasculite”, “vasculitico”, or “arterite” (Italian for “vasculitis”, “vasculitic”, and “arteritis”). Clinical diagnosis was used as the reference standard. This was established based on the clinical presentation, laboratory tests, and imaging. More complex and inconclusive cases were handled by interdisciplinary discussion, and the final diagnosis was established accordingly, in line with good clinical practice at our institution. One hundred and fifty-three patients were excluded, as these were external referrals for PET/CT with no follow-up data in our institution’s database. Overall, 279 patients were included in the analysis. The demographic characteristics of patients (i.e., gender, age, weight, height, body-mass index (BMI)) and the clinical and laboratory data (i.e., type of vasculitis, ongoing corticosteroid treatment, erythrocyte sedimentation rate (ESR), and concentration of C-reactive protein (CRP)) were retrieved from the electronic records of the hospital. All information regarding PET/CT imaging, including pre-injection blood glucose level, administered [18F]FDG activity, elapsed time between injection and acquisition, and scanner type, was also recorded. The study was approved by the Ethics Committee of IRCCS Istituto Clinico Humanitas (approval no. 53/21, date 14 December 2021).
2.2. [18F]FDG PET/CT Acquisition Protocol
Prior to [18F]FDG administration, the glucose levels were checked in fasting patients (at least 6 h) and, if lower than 200 mg/dL, intravenous injection of [18F]FDG (~6 MBq/kg) was performed. PET/CT images were acquired approximately 60 min after administration following the EANM guidelines [11] using one of two integrated EARL-accredited (http://earl.eanm.org/cms/website.php (accessed on 23 October 2022)) PET/CT scanners: a second generation Siemens Biograph LS 6 scanner (Siemens, Munich, Germany) equipped with LSO crystals and a six-slice CT scanner (denominated P1), or a third generation GE Discovery PET/CT 690 equipped with LYSO crystals and a 64-slice CT scanner (General Electric Healthcare, Waukesha, WI, USA) (denominated P2).
2.3. Image Analysis
PET/CT images were retrieved from the institutional picture archiving and communication system (PACS) and visually assessed by an experienced nuclear medicine physician (LA, reader 1) with extensive expertise in inflammation imaging reporting (7 years). A student in their final year of medical school acted as the second reader (MC, reader 2).
During visual assessment, both readers independently examined all vascular districts including extracranial arteries (e.g., temporal and vertebral ones). Each PET/CT was defined as positive for vasculitis if the linear circumferential [18F]FDG uptake of at least one vascular region of interest was increased compared to the physiological uptake of the liver [11]. Whenever the vascular target district was represented by femoral arteries, only markedly increased [18F]FDG uptake compared to the liver was considered positive. Semi-quantitative analysis of the PET/CT images was also performed by measuring the maximum standardized uptake value (SUVmax), drawing a region of interest (ROI) on the axial images in 14 different vessels of interest: right and left carotid, subclavian and axillary arteries, ascending aorta, aortic arch, descending aorta, abdominal aorta, and right and left iliac and femoral arteries. Background uptake, measured as SUVmax in the liver and in the inferior vena cava (IVC), was obtained by drawing a ROI in the right lobe of the liver and in a region of the venous wall at approximately the mid-lumbar level for the IVC, respectively. Vascular-to-liver and vascular-to-IVC ratios were calculated for each vessel. These ratios were compared to the cut-off values found in the literature, 1.0 for the vascular-to-liver ratio and 1.6 for the vascular-to-IVC ratio [12,13], and separate analyses were conducted for liver and IVC. First, to determine the performance of the semi-quantitative analysis, an exam was classified as positive if the vascular-to-liver (or, in the separate IVC analysis, the vascular-to-IVC) ratio was higher than the respective cut-off value in at least one vessel. Exams were defined as negative if all vessels had a ratio lower than the respective cut-off.
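The ratio-based rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names, the example SUVmax values, and the treatment of a ratio exactly equal to the cut-off (counted here as negative) are assumptions; only the two cut-off values (1.0 and 1.6) come from the text.

```python
from typing import Dict

# Cut-off values quoted in the text (from the cited literature).
LIVER_CUTOFF = 1.0
IVC_CUTOFF = 1.6


def vessel_ratios(vessel_suvmax: Dict[str, float], reference_suvmax: float) -> Dict[str, float]:
    """Return the vessel-to-reference SUVmax ratio for every vessel."""
    if reference_suvmax <= 0:
        raise ValueError("reference SUVmax must be positive")
    return {name: suv / reference_suvmax for name, suv in vessel_suvmax.items()}


def exam_is_positive(vessel_suvmax: Dict[str, float], reference_suvmax: float, cutoff: float) -> bool:
    """Positive if at least one vessel ratio is strictly higher than the cut-off.

    Assumption: a ratio exactly equal to the cut-off is treated as negative,
    since the text does not specify this boundary case.
    """
    ratios = vessel_ratios(vessel_suvmax, reference_suvmax)
    return any(ratio > cutoff for ratio in ratios.values())


# Hypothetical SUVmax values for three of the 14 vessels (illustration only).
example_vessels = {"ascending_aorta": 3.1, "aortic_arch": 2.4, "right_carotid": 1.9}
liver_suvmax = 2.8  # hypothetical liver background
ivc_suvmax = 2.0    # hypothetical IVC background

positive_by_liver = exam_is_positive(example_vessels, liver_suvmax, LIVER_CUTOFF)
positive_by_ivc = exam_is_positive(example_vessels, ivc_suvmax, IVC_CUTOFF)
print(positive_by_liver, positive_by_ivc)
```

With these made-up numbers the same exam is positive against the liver reference (3.1/2.8 is above 1.0) and negative against the IVC reference (3.1/2.0 is below 1.6), which shows why the two references are analyzed separately.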
Second, the added value of semi-quantitative analysis over visual analysis was also tested. When a PET exam was visually assessed as negative but presented a vascular-to-liver ratio higher than the reference cut-off for liver in any vascular region, the examination was re-classified as positive for LVV in the semi-quantitative analysis. The same switch from a negative to a positive exam was performed when the vascular-to-IVC ratio was higher than the IVC reference cut-off. Exams visually interpreted as positive by the readers were not re-classified, irrespective of the reference cut-off.
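The re-classification rule amounts to a logical OR between the visual call and the cut-off test. A minimal sketch, assuming a hypothetical helper `combined_call` that receives the reader's visual result and the highest vessel-to-reference ratio of the exam:

```python
def combined_call(visual_positive: bool, max_vessel_ratio: float, cutoff: float) -> bool:
    """Combine visual reading with the semi-quantitative cut-off.

    Visually positive exams are never re-classified; visually negative
    exams become positive if any vessel ratio exceeds the cut-off, which
    is equivalent to testing the highest ratio of the exam.
    """
    if visual_positive:
        return True
    return max_vessel_ratio > cutoff


# Hypothetical cases, using the liver cut-off of 1.0 quoted in the text.
kept_positive = combined_call(True, 0.8, 1.0)      # visual positive stays positive
switched = combined_call(False, 1.2, 1.0)          # visual negative switches to positive
kept_negative = combined_call(False, 0.9, 1.0)     # visual negative stays negative
print(kept_positive, switched, kept_negative)
```

Because exams can only move from negative to positive under this rule, the combined approach can raise sensitivity but can never raise specificity relative to the visual reading alone.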
The Xeleris™ workstation (General Electric Healthcare, Waukesha, WI, USA) was used for visual and semi-quantitative analysis.
2.4. Statistical Analysis
Statistical analysis was carried out using STATA 17 software [14]. First, we evaluated the diagnostic performance of each reader by comparing their findings with the clinical diagnosis (reference standard). Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated. Subsequently, Cohen’s Kappa coefficient was estimated to assess the inter-rater agreement, and the interpretation proposed by Landis and Koch for inter-rater reliability was adopted (Table 1) [15].
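For readers unfamiliar with Cohen's Kappa, the following sketch shows how it is computed for two readers making binary calls, together with the Landis and Koch labels. The analysis in the study was run in STATA; this Python version is only an illustration, the function names are hypothetical, and the agreement counts are invented for the example.

```python
def cohen_kappa(both_pos: int, r1_only_pos: int, r2_only_pos: int, both_neg: int) -> float:
    """Cohen's kappa for two raters and a binary outcome (2x2 agreement table)."""
    n = both_pos + r1_only_pos + r2_only_pos + both_neg
    if n == 0:
        raise ValueError("empty agreement table")
    observed = (both_pos + both_neg) / n
    r1_pos = (both_pos + r1_only_pos) / n
    r2_pos = (both_pos + r2_only_pos) / n
    # Agreement expected by chance from each rater's marginal rates.
    expected = r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Landis and Koch (1977) verbal interpretation of kappa."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented counts for 100 exams (not study data): observed agreement is 70%,
# chance agreement is 50%.
kappa_example = cohen_kappa(both_pos=40, r1_only_pos=10, r2_only_pos=20, both_neg=30)
print(round(kappa_example, 2), landis_koch(kappa_example))
```

The example illustrates why raw percentage agreement overstates reliability: 70% observed agreement shrinks to a kappa of about 0.4 once chance agreement is removed.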
Quantitative variables were tested for normal distribution by the Shapiro–Wilk test. To compare the distribution of the values, a non-parametric Mann–Whitney U test was performed.
Sub-analysis to test the diagnostic performance of each reader according to specific criteria was also performed. Specifically, we assessed whether the diagnostic performances might be affected by age (≤65 years versus >65 years), BMI (normal versus overweight), ongoing corticosteroid therapy (yes versus no), levels of pre-injection glycemia (≤126 versus >126 mg/dL), acquisition timing (early, timely, or delayed images, defined according to the timing suggested by EARL as a time between radiotracer injection and image acquisition of <60 min, between 60 and 65 min, and >65 min, respectively), and scanner (P1 versus P2).
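The stratification can be expressed as a small labelling function, sketched below. The function and key names are hypothetical; the thresholds for age, glycemia, and timing are those stated above, and the BMI threshold (normal if ≤25) is the one reported in the Results.

```python
def acquisition_timing(minutes_post_injection: float) -> str:
    """Classify acquisition as early (<60 min), timely (60 to 65 min), or delayed (>65 min)."""
    if minutes_post_injection < 60:
        return "early"
    if minutes_post_injection <= 65:
        return "timely"
    return "delayed"


def subgroup_labels(age_years: float, bmi: float, on_steroids: bool,
                    glucose_mg_dl: float, minutes_post_injection: float,
                    scanner: str) -> dict:
    """Return the subgroup of one exam for each stratification criterion."""
    return {
        "age": "<=65" if age_years <= 65 else ">65",
        "bmi": "normal" if bmi <= 25 else "overweight",
        "steroids": "yes" if on_steroids else "no",
        "glycemia": "<=126" if glucose_mg_dl <= 126 else ">126",
        "timing": acquisition_timing(minutes_post_injection),
        "scanner": scanner,  # "P1" or "P2"
    }


# Hypothetical patient, for illustration only.
labels = subgroup_labels(age_years=72, bmi=27.5, on_steroids=True,
                         glucose_mg_dl=98, minutes_post_injection=70, scanner="P2")
print(labels)
```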
The semi-quantitative analysis was performed first by applying the cut-off to each vascular district, regardless of the visual assessment and comparing it to clinical diagnosis (reference standard). Finally, the added value of semi-quantitative analysis was calculated for each reader, assessing how often an exam initially visually interpreted as negative switched to positive by applying the reference cut-off and using the clinical diagnosis as the reference standard.
3. Results
Elderly patients (>65 years) accounted for 172 out of 279 cases (62%). BMI was normal (≤25) in 141/279 patients (51%). One hundred and fifty patients were taking oral corticosteroid therapy at the time of imaging (54%). Pre-injection blood glucose levels were within normal range (≤126 mg/dL) in 240/279 patients (86%). Timely images were acquired in 79 cases.
Table 2 summarizes the main characteristics of patients included in the analysis. ESR and CRP were available only in a subgroup of patients (97/279 and 161/279 patients, respectively). Mean ESR and CRP values were both above the upper limit of the normal range (49 ± 32 mm/h and 9 ± 12 mg/L, respectively).
In total, 81 patients (29%) were clinically diagnosed with LVV. Reader 1 misclassified 76 patients (37 false negatives and 39 false positives) and reader 2 misclassified 92 patients (33 false negatives and 59 false positives).
Figure 1 and Figure 2 show examples of a true positive and a false positive PET exam. Among the 76 patients wrongly classified by reader 1, 51 were >65 years old, 44 were overweight, 46 were taking steroids, 12 presented with hyperglycemia, 29 were imaged early, 26 were imaged late, and 23 were scanned with P1.
Among the 92 patients wrongly classified by reader 2, 58 were >65 years old, 41 were overweight, 49 were taking steroids, eight presented with hyperglycemia, 30 were imaged early, 34 were imaged late, and 22 were scanned with P1.
Collectively, 22/76 patients wrongly classified by reader 1 and 22/92 patients misclassified by reader 2 were elderly overweight patients who were taking steroids at the time of imaging.
Figure 3 shows an example of a false negative PET exam during corticosteroid treatment.
The diagnostic performance of reader 1 and reader 2 is detailed in Table 3 and Table 4, respectively. In the entire population, the sensitivity, specificity, accuracy, PPV, and NPV of reader 1 were 54%, 80%, 73%, 53%, and 81%, respectively. The diagnostic performances of reader 2 were sensitivity = 59%, specificity = 70%, accuracy = 67%, PPV = 45%, and NPV = 81%.
Figure 4a is a radar chart summarizing the performances for both readers considering the clinical diagnosis of vasculitis.
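The overall figures for both readers can be re-derived from the counts reported earlier in this section (279 exams, 81 patients with LVV, and each reader's false negatives and false positives). The sketch below is a worked check rather than the authors' code; the function names are hypothetical, and the true positive and true negative counts are derived here on the assumption that every reader classified all 279 exams.

```python
def confusion_from_errors(n_total: int, n_diseased: int, fn: int, fp: int):
    """Derive (tp, fp, fn, tn) from cohort size, disease count, and error counts."""
    tp = n_diseased - fn
    tn = (n_total - n_diseased) - fp
    return tp, fp, fn, tn


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy metrics, as whole-number percentages."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": round(100 * tp / (tp + fn)),
        "specificity": round(100 * tn / (tn + fp)),
        "accuracy": round(100 * (tp + tn) / total),
        "ppv": round(100 * tp / (tp + fp)),
        "npv": round(100 * tn / (tn + fn)),
    }


# Counts taken from the Results: 279 exams, 81 with LVV.
reader1 = diagnostic_metrics(*confusion_from_errors(279, 81, fn=37, fp=39))
reader2 = diagnostic_metrics(*confusion_from_errors(279, 81, fn=33, fp=59))
print(reader1)  # sensitivity 54, specificity 80, accuracy 73, ppv 53, npv 81
print(reader2)  # sensitivity 59, specificity 70, accuracy 67, ppv 45, npv 81
```

The derived tables are 44 true positives and 159 true negatives for reader 1, and 48 true positives and 139 true negatives for reader 2, which reproduce the percentages given in the text.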
Reader 1’s best performance was attained in patients with a normal BMI (sensitivity of 65%, specificity of 83%, and 77% accuracy). Seventy out of 81 patients (86%) with a final diagnosis of vasculitis were taking steroids at the time of imaging, and 38 of them (54%) were correctly identified by reader 1. The lowest sensitivity (35%) was observed in patients scanned with P1, although in the same subgroup reader 1 was specific enough (89%) to be slightly more accurate than in patients imaged using P2 (76% versus 71%, respectively). Reader 2 performed similarly to reader 1, with a more pronounced decline in performance in the aforementioned subsets of patients (Table 4). Forty-one out of 70 patients (58%) who were taking steroids at the time of imaging and had a final diagnosis of vasculitis were correctly identified by reader 2.
Overall, the interrater agreement for accuracy between the experienced nuclear medicine physician and the student was 71% with a Cohen’s Kappa value of 0.37 (fair agreement). A radar chart summarizing the performances of reader 2 against reader 1 is presented in Figure 4b.
The agreement between the nuclear medicine physician and the student was limited (from fair to moderate), even when considering the sub-analyses (Supplementary Material Table S1). The best agreement was observed in patients ≤65 years old, in those with increased blood glucose level, and in those imaged early (Cohen’s Kappa = 0.50, 0.54, and 0.53, respectively).
In all vascular regions, the mean SUVmax values were slightly higher in patients with vasculitis, but only those calculated in the axillary arteries were significantly different compared to patients without vasculitis. Mean SUVmax values calculated in the liver and IVC were comparable in the two groups (Supplementary Material Table S2).
Using only semi-quantitative analysis, the sensitivity, specificity, accuracy, PPV, and NPV were 21%, 87%, 68%, 41%, and 73%, respectively, when using liver cut-off, while when applying the IVC cut-off, the sensitivity was 85%, specificity 21%, accuracy 39%, PPV 31%, and NPV 74%.
The diagnostic performance of both readers when applying the cut-off values for the liver and IVC, respectively, is summarized in Table S3 (Supplementary Material). Compared to the visual analysis, the liver cut-off approach slightly increased the sensitivity of both readers (54% versus 59% for reader 1 and 59% versus 62% for reader 2) and reduced their specificity (80% versus 73% for reader 1 and 70% versus 64% for reader 2). Using the IVC cut-off, the sensitivity of both readers markedly increased (54% versus 93% for reader 1 and 59% versus 93% for reader 2), but the specificity drastically dropped (80% versus 17% for reader 1 and 70% versus 15% for reader 2). When semi-quantitative analysis was added to the visual one, the accuracy of both readers was lower than with the visual approach alone, regardless of the reference used (liver or IVC).
4. Discussion
Our paper retrospectively evaluated the performance of [18F]FDG PET/CT in diagnosing vasculitis by comparing the reading performance of an experienced nuclear medicine physician with that of a medical student. As expected, the experienced nuclear medicine physician outperformed the medical student (accuracy of 73% and 67%, respectively), with higher specificity (80% versus 70%). The literature supporting the role of [18F]FDG PET/CT in identifying and monitoring LVV is continuously increasing [10,11,16,17], and our findings confirmed that an appropriate learning curve and significant expertise in the field are essential, since many factors may affect image analysis.
First, we considered age as a potential confounder of image interpretation. Although patient demographics were consistent with the expected age and sex distributions for LVV [3], some conditions such as atherosclerosis have been reported to affect the vascular [18F]FDG signal. We considered age as a surrogate of atherosclerosis and, accordingly, divided the population into young (≤65 years) and elderly (>65 years) subjects, although many other factors, conditions, and diseases, including cardiovascular ones, might influence the vascular [18F]FDG uptake beyond age [17,18,19,20]. Diagnostic accuracy was higher in the young than in the elderly patients for both readers, suggesting that age may act as a confounder. BMI, like aging, may influence vascular [18F]FDG uptake [21], and overweight patients are more prone to atherosclerosis. Moreover, the quality of images is poorer in overweight patients than in subjects with a normal BMI [22]. As expected, the sensitivity in overweight patients was lower than in patients with a normal BMI for both readers, even if the specificity was relatively high (78% for both readers in overweight versus 83% for reader 1 and 63% for reader 2 in normal-BMI patients). Steroids are listed among the factors that may affect the diagnostic performance of [18F]FDG PET/CT in LVV diagnosis, as they reduce vascular [18F]FDG uptake and increase the signal in the liver. Although Nielsen et al. [23] reported a very short diagnostic window (i.e., 3 days) between steroid initiation and [18F]FDG PET/CT examination, discontinuation or postponement of steroid therapy could expose patients with GCA or Takayasu arteritis to complications [11], making it difficult to comply with the appropriate timing. As expected, the accuracy of reader 1 was higher in patients not taking steroids, although in our cohort a high proportion of patients who were diagnosed with vasculitis were taking steroids at the time of imaging (86%). Reader 1 correctly identified 54% of the patients with a final diagnosis of vasculitis who were taking steroids at the time of PET/CT, suggesting that the proper and stringent use of diagnostic criteria as recommended by the guidelines [11] might partially overcome this limitation. The accuracy of reader 2 was comparable regardless of steroid use, and 41 out of 70 patients who were taking steroids at the time of PET/CT and had a final diagnosis of vasculitis were correctly identified by reader 2. Interestingly, the student labeled more patients on corticosteroid therapy as positive than the expert physician did, as denoted by the slightly increased sensitivity (64% versus 55%). This may be explained by considering that an inexperienced reader might tend to rate ambiguous cases as positive to avoid missing the diagnosis. It is also known that even a low maintenance dose of steroids is effective in suppressing [18F]FDG uptake in the vascular wall [24]. In our study we did not take into consideration the duration of treatment and/or the dosage of the steroid therapy, since these data were not available for all patients due to the retrospective design of the study.
Although pre-injection blood glucose levels seem to be less important in infection and inflammation than in oncology [11,25,26], a negative correlation between fasting glucose level and arterial [18F]FDG uptake has been reported [27], supporting the recommendation of pre-injection blood glucose values lower than 126 mg/dL [11]. In our series, glycemia did not impact the performance of the experienced reader (sensitivity of 54% in patients with low as well as in those with high blood glucose levels), but it affected reader 2’s performance, which was lower in the case of lower blood glucose levels (accuracy of 65% versus 79% in patients with low versus high blood glucose levels, respectively). These results seemed to contradict the above-mentioned evidence reported by Bucerius et al. [27], who suggested that, in the case of hyperglycemia, the vascular [18F]FDG uptake should be mathematically corrected. However, the study of Bucerius et al. [27] focused on semi-quantitative parameters and did not investigate the effect of pre-injection glycemia on visual assessment. Moreover, the distribution of data should be considered. Only 39 patients in our cohort had high levels of blood glucose, and only 13 of them had a final diagnosis of LVV, which prevents further speculation. The current guidelines [11] recommend at least 60 min between intravenous administration of [18F]FDG and acquisition, although it has been reported that delayed images, by reducing blood pool activity, resulted in a higher accuracy compared to timely acquisition [28]. This was not corroborated by our data, since we did not observe significant improvement in the performances in the case of delayed images, even when considering timing according to the EARL guidelines [25] (Supplementary Material Table S4). These findings suggest that the use of proper criteria for image interpretation might reduce the impact of factors related to image acquisition. Finally, the scanner might impact the quality of the images and therefore the diagnostic performance. Both readers were less sensitive in analyzing images acquired using the second generation P1 scanner than those obtained with the third generation P2 scanner. Currently, there are no specific recommendations for acquisition protocols in LVV. At our institution, we scanned LVV patients from vertex to knee (at least), and as a general rule, we preferred to scan heavy patients with P2.
Generally, when comparing the evaluations by both readers, there was a fair interrater agreement, as indicated by Cohen’s kappa. The student’s diagnostic performance was inferior but fair over all parameters compared to the experienced physician’s, except for evaluations made in patients who had their imaging studies performed on the scanner P1, and to a lesser extent in overweight patients.
Collectively, focusing only on patients wrongly classified by reader 1, just under a third were elderly overweight patients who were taking steroids at the time of imaging. When the patients misclassified by reader 2 were compared to those of reader 1, other factors such as the scanner seemed to interfere with image interpretation, suggesting that for an inexperienced reader all of these elements probably co-occurred and affected the image analysis.
The diagnostic performance of readers 1 and 2 was not improved by adding semi-quantitative analysis to the visual one. The use of semi-quantitative parameters to diagnose LVV is not recommended outside the context of clinical trials [11,17]. Indeed, there is no evidence that semi-quantitative [18F]FDG PET/CT metrics help to diagnose LVV better than visual scoring [17]. Moreover, a reduced diagnostic accuracy of target-to-background based semi-quantitative indices has been reported in patients under glucocorticoids [29]. In our hands, the use of the liver as the reference performed better than the IVC, confirming the literature data reported in naïve giant cell arteritis patients [29].
Some limitations should be acknowledged. First, this was a retrospective study in which clinical diagnosis was used as the reference standard. At the time of PET/CT, images were interpreted by different physicians with varying expertise in the field of inflammation, and their medical reports might have influenced the clinical diagnosis. Indeed, reader 1’s sensitivity was lower than that generally reported in the literature [10,11,20]. However, this was expected, since for the analysis we did not use medical reports but visually re-assessed all images, rigorously applying the diagnostic criteria recommended by the current guidelines [11]. Accordingly, only patients presenting a vascular uptake higher than that of the liver were scored as positive. On the other hand, the final clinical diagnosis was realistically influenced by a positive (or negative) medical report, since, as shown by the literature, the incorporation of [18F]FDG PET/CT significantly impacted the patients’ LVV management [23]. Bearing this in mind, it is possible that the performance of reader 1 as well as that of reader 2 could have been different if only biopsy had been considered as a valid final diagnosis, or if they had performed image analysis at the time of diagnosis. Second, as already indicated, the majority of our patients with vasculitis were taking corticosteroid therapy, which intrinsically reduces the diagnostic performance of [18F]FDG PET/CT if lasting for more than 3 days, even at low dosage. Third, the retrospective nature of the study prevented the evaluation of other conditions or comorbidities as confounders.