#### 3.2.3. Image Reconstruction

Among the technical factors that affect PET interpretation, image reconstruction is especially influential because it directly affects both image noise and lesion CR to a potentially high degree (Figure 6).

**Figure 6.** Fused transaxial and coronal PET/CT slices through residual mediastinal lymphoma tissue of a 23-year-old female patient, reconstructed with the OSEM algorithm (**A**,**B**) and with OSEM combined with TOF and PSF (**C**,**D**). While the lesional [18F]FDG uptake was rated as Deauville score 3 based on OSEM reconstruction, it would exceed the liver uptake when assessed on images reconstructed with TOF and PSF (i.e., Deauville score 4). This could alter the assessment from "adequate" to "inadequate" metabolic response.

Compared to standard ordered subset expectation maximization (OSEM) reconstruction, OSEM with TOF shows improved noise characteristics [49,94,156–158]. Surti et al. showed that TOF reconstruction improves detection rates of simulated liver and lung lesions by human readers. This improvement was pronounced in heavy patients with BMI ≥ 26 kg/m<sup>2</sup> [159]. The same group further demonstrated that TOF improved lesion detection, especially in low-contrast lesions [160].

PSF compensation can also increase the SNR [157,158] and thereby the subjectively rated image quality [157,161,162]. Investigations with an anthropomorphic phantom or with simulated liver and lung lesions have shown that TOF and PSF can have a supplementary effect on increasing lesion detection rates [161,163]. Conflicting results were reported by other authors who did not find higher lesion conspicuity or detection rates with PSF in small patient samples [162,164,165]. These discrepancies may stem from the fact that the ability of PSF to increase lesion CR is most prominent at the periphery of the transaxial FOV [98,166,167] and in small, high-contrast lesions [95], such as pulmonary nodules. This is illustrated by the observation of Schaefferkoetter et al. that the improvement in lesion detection rates with PSF was limited mainly to the lung, while the improvement achieved with TOF extended to both the liver and the lung [161].

Based on the potential of PL reconstruction to systematically improve SNR compared to OSEM-based algorithms [108,113,168], several authors have reported improved SNR and image quality with different radiopharmaceuticals, such as [18F]FDG [107,112,113], [18F]F-PSMA-1007 [143], [68Ga]Ga-PSMA-11 [169,170], [68Ga]Ga-DOTATATE [171] or 89Zr-labelled tracers [172]. PL reconstruction has repeatedly been shown to increase conspicuity and detection rates of pulmonary lesions, even compared to OSEM with PSF and/or TOF [112,113,164,173,174]. Figure 7 shows a case example.

**Figure 7.** Images of two [18F]FDG-PET/CT examinations in a 63-year-old man with hepatic and pulmonary aspergillosis. The earlier examination was performed with a scanner equipped with conventional photomultiplier tubes (PMT) and reconstructed with OSEM and TOF (**A**–**D**). The second examination after 5 months used a SiPM-equipped PET scanner and PL reconstruction with a penalization factor beta of 450 (**E**–**H**). Two pulmonary lesions that showed only moderate [18F]FDG uptake during the earlier examination (**A**,**B**) appeared substantially more intense on the second scan (**E**,**F**). However, uptake in hepatic lesions declined (not shown), and both pulmonary lesions were unaltered in the CT scan (**C**,**G**), which suggested that the higher conspicuity of the pulmonary lesions was a result of improved reconstructed spatial resolution and lesion contrast recovery (CR) with the SiPM scanner and PL reconstruction. Based on phantom measurements, reconstructed spatial resolution was estimated at 7.8 mm full width at half maximum (FWHM) with the PMT scanner and 4.7 mm with the SiPM scanner. The improvement in image sharpness can also be seen in the myocardium (**D**,**H**).

In contrast, in a small sample of 13 patients undergoing [18F]fluorocholine PET/CT for prostate cancer staging, PL with different beta values showed a comparable number of positive lymph nodes to that revealed by OSEM with PSF and TOF [175].

When estimating diagnostic accuracy from reported lesion detection rates, it is important to recognize that there is usually no gold standard available with which to assess the correctness of detected lesions and that such analyses are therefore unable to evaluate specificity. As an exception, Teoh et al. retrospectively investigated the diagnostic accuracy of OSEM + TOF and PL reconstruction using SUVmax and visual reading in 121 pulmonary nodules. Here, histological verification was available. Diagnostic sensitivity and accuracy were similar with both algorithms, while specificity tended to be lower with PL than with OSEM + TOF, especially in lesions >10 mm diameter [173].

Furthermore, no blanket conclusion on differences in image noise or lesion detection between reconstruction algorithms should be drawn from isolated results comparing two algorithms with only one set of parameters each (e.g., number of iterations or type of in-plane filter). Such parameters, namely the in-plane filter width or, in the case of PL reconstruction, the beta value, can have drastic effects on image noise and lesion CR (Figure 2). A higher filter width or beta value systematically decreases both image noise and CR. Reconstruction algorithms should therefore be compared with multiple sets of parameters to investigate real systematic differences between the methods [108]. It may otherwise be observed that such differences can only be detected under specific conditions [41,175].
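The dependence of both noise and CR on a smoothing parameter can be illustrated with a toy one-dimensional profile. The following is a minimal sketch under simplifying assumptions, not a model of any specific reconstruction: it applies a Gaussian post-filter of varying FWHM to a noise-free "lesion" on a uniform background and reports the peak CR together with the factor by which uncorrelated voxel noise is scaled by the filter (the square root of the sum of squared kernel weights). The beta value of PL reconstruction is not modelled here. All function names, the lesion size and the contrast are hypothetical choices for illustration.

```python
import math


def gaussian_kernel(fwhm_mm, voxel_mm=1.0):
    """Return a normalized 1D Gaussian kernel for the given FWHM."""
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0))) / voxel_mm
    half = max(1, int(math.ceil(4 * sigma)))  # truncate at about 4 sigma
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, kernel):
    """Convolve a 1D profile with the kernel, replicating edge values."""
    half = len(kernel) // 2
    n = len(profile)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - half, 0), n - 1)
            acc += w * profile[idx]
        out.append(acc)
    return out


def sweep_filter_width(fwhms_mm, lesion_voxels=8, background=1.0,
                       lesion=4.0, length=201):
    """Return (FWHM, peak CR, noise scaling factor) for each filter width."""
    centre = length // 2
    truth = [background] * length
    for i in range(centre - lesion_voxels // 2, centre + lesion_voxels // 2):
        truth[i] = lesion
    rows = []
    for fwhm in fwhms_mm:
        kernel = gaussian_kernel(fwhm)
        peak = max(smooth(truth, kernel))
        # Contrast recovery: measured contrast relative to true contrast.
        cr = (peak - background) / (lesion - background)
        # For uncorrelated input noise, the filtered SD scales with
        # sqrt(sum of squared kernel weights).
        noise_factor = math.sqrt(sum(w * w for w in kernel))
        rows.append((fwhm, cr, noise_factor))
    return rows


if __name__ == "__main__":
    for fwhm, cr, noise in sweep_filter_width([2.0, 4.0, 6.0, 8.0]):
        print(f"FWHM {fwhm:.0f} mm: CR = {cr:.3f}, noise factor = {noise:.3f}")
```

Because both quantities fall as the filter widens, a comparison of two algorithms at a single parameter setting each may mainly reflect where on this curve each setting happens to lie.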

In a study on 52 patients with lymphoma, 5 patients undergoing [18F]FDG-PET for restaging were divergently classified as non-responders (Deauville score 4–5) with PL reconstruction but as responders (Deauville 1–3) with OSEM (without TOF or PSF; compliant with the EARL1 standard) [176].
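How a reconstruction-dependent change in lesion uptake can shift the Deauville category may be sketched as follows. This is an illustrative simplification: the Deauville scale is a visual five-point scale that compares lesion uptake with the mediastinal blood pool and the liver, and the numeric comparison below, the `marked_factor` multiplier used to separate scores 4 and 5, and all uptake values are assumptions made for this example only. The sketch also assumes, for simplicity, that the reference uptake does not change between reconstructions.

```python
def deauville_score(lesion, mediastinum, liver, marked_factor=3.0,
                    has_uptake=True):
    """Map uptake values to a Deauville-like score (1 to 5).

    marked_factor is a hypothetical numeric stand-in for the visual
    criterion "markedly higher than liver" (score 5).
    """
    if not has_uptake:
        return 1  # no uptake
    if lesion <= mediastinum:
        return 2  # uptake not above mediastinal blood pool
    if lesion <= liver:
        return 3  # above mediastinum but not above liver
    if lesion <= marked_factor * liver:
        return 4  # moderately above liver
    return 5      # markedly above liver


def is_responder(score):
    """Deauville 1-3 counted as response, 4-5 as non-response."""
    return score <= 3


if __name__ == "__main__":
    # Hypothetical uptake values for the same lesion under two reconstructions.
    mediastinum, liver = 1.6, 2.5
    for label, lesion in (("OSEM", 2.3), ("OSEM + TOF + PSF", 3.0)):
        score = deauville_score(lesion, mediastinum, liver)
        print(label, "-> Deauville", score, "| responder:", is_responder(score))
```

In this toy case, a moderate increase in measured lesion uptake is sufficient to move the lesion across the liver reference and thereby change the response category.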

#### *3.3. Relationship between Objective and Subjective Image Quality*

Although CNR, SNR and NEC are surrogates for image quality, none of these single parameters sufficiently reflects subjective image quality as a whole [147,177]. However, adequately defined quantitative assessments may each measure specific aspects of subjective image quality, such as image sharpness, lesion contrast or image noise [178].

Several studies on subjective image quality in whole-body [18F]FDG-PET with PL reconstruction found that image quality was highest at beta values of 450 (to 600) despite lower lesion CR or "image sharpness" at these beta values compared to lower values [41,108,112,175,179]. This confirms that subjective image quality is a combination of lesion contrast and image noise and that readers may demand adequately low noise levels even if this comes at the expense of lesion CR (i.e., quantitative accuracy). In low-count conditions, this tendency to prefer smooth, low-noise images with beta values >600 over "sharper" images could become even more evident [112,170]. Thus, images that are rated best regarding subjective image quality are not necessarily those with the highest quantitative accuracy. Conversely, Zhang et al. reported that lesion SUVmax and detection rates in [18F]FDG-PET did not change significantly despite decreasing acquisition time per bed position from 900 s to 60 s and steadily decreasing subjective image quality [180]. Quantitative accuracy may therefore not necessarily require optimal (subjective) image quality. As these criteria may not be equally fulfilled by a single reconstructed dataset, reconstructing separate datasets for visual reading and for optimized quantification has been proposed for routine clinical practice [24,181,182].

#### *3.4. Relationship between PET Quantification and Image Interpretation*

#### 3.4.1. Quantitative or Visual Interpretation Criteria?

Interpretation of PET images in routine clinical practice primarily follows visual criteria, i.e., the assessment of generalized or focal pathologies in tracer accumulation, while quantitative parameters, including SUV, provide orientation or additional information at most [24]. As the use of SUV to quantify tracer uptake increased, it was anticipated that SUV would provide a standardized, reliable criterion to classify lesions by their biological properties and prognostic implications. Consequently, diagnostic SUV thresholds have been proposed for pulmonary nodules [183], lymph node staging in lung cancer [184], adrenal lesions [185], musculoskeletal tumors [186], tumor delineation in gliomas [187], and response assessment in lymphoma [188], among others. It would thus seem reasonable to assume that achieving quantitative accuracy brings certainty and correctness to lesion interpretation.

However, lesion SUVs in [18F]FDG-PET show a test-retest variability in the same patient with the same PET scanner of up to 20% [189] and are usually even less comparable between different scanners and centers or under routine clinical conditions [190]. This has undermined any attempts to establish widely adoptable SUV thresholds unless rigorous harmonization measures are followed [101,103,181]. Given the inability to derive generalizable SUV thresholds, it has not yet been possible to prove that SUV or any other quantitative measures used in clinical practice provide additional value over visual assessment alone for routine clinical diagnostics [24].

#### 3.4.2. SUV: Which Parameter?

If SUV measurements are taken to support the visual assessment of PET images in routine clinical practice, this probably occurs most often in assessments of the response to therapy. However, as stated above, the validity of these measures is determined by the test-retest variability. Despite the common use of SUVmax in clinical practice, arising from its convenience, SUVpeak and SUVmean have been shown to be slightly less variable under test-retest conditions [189,191]. However, SUVmean and SUVpeak are affected by the reproducibility of the size and placement of the volume of interest (VOI) [192], which requires appropriate standardization or automation. Consequently, the choice of SUV parameter can fundamentally change the assessment of disease progression or response to treatment in the majority of cases [193]. A consensus was therefore needed, and Wahl et al. proposed the PET Response Criteria In Solid Tumors (PERCIST 1.0) in 2009 with the aim of standardizing the SUV parameters (SULpeak = SUVpeak corrected for lean body mass), the VOI size, the definition and number of appropriate target lesions and thresholds for response categories [194]. Still, the repeatability of the liver SULmean under clinical conditions in the same patient during treatment has been shown to be only fair (intraclass correlation coefficient <0.6) [195]. Consequently, the use of SUV to support valid clinical response assessment outside of study conditions remains highly challenging.
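The difference between the SUV parameters discussed above can be sketched with a toy example. The code below is a simplified illustration under stated assumptions: SUV is computed as tissue activity concentration divided by injected activity per unit body mass (assuming a tissue density of 1 g/mL and decay-corrected activity), and SUL is obtained by passing lean body mass instead of body weight. The "VOI" is a one-dimensional list of voxel values, and SUVpeak is approximated as the highest mean over a sliding window of neighbouring voxels, which is only a stand-in for the small fixed-size volume around the hottest region used in practice. All function names and numbers are hypothetical.

```python
def suv(conc_kbq_per_ml, injected_kbq, mass_g):
    """SUV = concentration / (injected activity / mass), assuming 1 g/mL.

    Pass body weight for SUV or lean body mass for SUL.
    """
    return conc_kbq_per_ml / (injected_kbq / mass_g)


def suv_max(voxels):
    """Hottest single voxel in the VOI."""
    return max(voxels)


def suv_mean(voxels):
    """Average over all voxels in the VOI."""
    return sum(voxels) / len(voxels)


def suv_peak(voxels, window=3):
    """Highest mean over any run of `window` neighbouring voxels (1D toy)."""
    return max(sum(voxels[i:i + window]) / window
               for i in range(len(voxels) - window + 1))


def percent_change(baseline, follow_up):
    """Relative change from baseline in percent."""
    return 100.0 * (follow_up - baseline) / baseline


if __name__ == "__main__":
    # Same hypothetical lesion measured twice; only the hottest voxel
    # differs, mimicking noise in a single voxel.
    test = [2.0, 3.0, 5.0, 6.0, 5.0, 3.0, 2.0]
    retest = [2.0, 3.0, 5.0, 7.0, 5.0, 3.0, 2.0]
    for name, fn in (("SUVmax", suv_max), ("SUVpeak", suv_peak),
                     ("SUVmean", suv_mean)):
        change = percent_change(fn(test), fn(retest))
        print(f"{name}: {change:.1f}% apparent change")
```

In this toy case, a noise spike in a single voxel propagates fully into SUVmax but is damped in SUVpeak and SUVmean, whereas the latter two depend on how the window or VOI is placed.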

#### 3.4.3. MTV: Which Delineation Method?

Treatment decisions based on clinical risk stratification might be further improved by including the initial tumor volume in [18F]FDG-PET, as this parameter has been shown to have independent prognostic value for patient survival in conditions such as non-small cell lung cancer [196,197], different gynecological malignancies [198–201], and head and neck cancer [202,203]. Initial results also show a prognostic value of PSMA-PET tumor volume in prostate cancer prior to radioligand therapy with [177Lu]Lu-PSMA-617 [204].

However, measurement of tumor volume is not yet a standardized procedure because numerous methods have been described to delineate the tumor volume, and considerable differences have been reported between those methods [205,206]. The most convenient and common approaches range from the use of fixed absolute or relative activity thresholds to adaptive methods based on the local signal-to-background ratio. Consequently, optimal volume thresholds to separate prognostic groups may differ systematically and foster discordant assessments, although with optimized thresholds, each method on its own may retain its prognostic value [207–209]. Therefore, for both scientific and clinical use, tumor volume should be calculated by parameters that are readily available and promise high reproducibility between different readers and institutions.
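The three families of delineation methods mentioned above can be compared on a toy lesion. The following sketch is illustrative only: the fixed cutoff (SUV 2.5), the relative fraction (41% of SUVmax), the adaptive rule (background plus a fraction of the lesion-to-background difference), the voxel values and the voxel volume are all assumptions chosen for demonstration and are not taken from the cited studies.

```python
def mtv_ml(voxels, threshold, voxel_volume_ml):
    """Volume of all voxels at or above the threshold, in mL."""
    return sum(1 for v in voxels if v >= threshold) * voxel_volume_ml


def fixed_threshold(suv_cutoff=2.5):
    """Fixed absolute SUV threshold (illustrative default)."""
    return suv_cutoff


def relative_threshold(voxels, fraction=0.41):
    """Threshold as a fraction of the lesion SUVmax (illustrative default)."""
    return fraction * max(voxels)


def adaptive_threshold(voxels, background, fraction=0.5):
    """Background-adapted threshold: background + fraction * (max - background)."""
    return background + fraction * (max(voxels) - background)


if __name__ == "__main__":
    # Hypothetical SUV values along a lesion; 4 mm cubic voxels = 0.064 mL.
    lesion = [1.0, 1.2, 1.5, 2.0, 2.6, 3.5, 5.0, 8.0, 10.0,
              7.0, 4.0, 2.8, 2.2, 1.4, 1.1]
    voxel_ml = 0.064
    thresholds = {
        "fixed absolute": fixed_threshold(),
        "relative to SUVmax": relative_threshold(lesion),
        "background-adapted": adaptive_threshold(lesion, background=1.0),
    }
    for name, thr in thresholds.items():
        print(f"{name}: threshold {thr:.2f}, "
              f"MTV {mtv_ml(lesion, thr, voxel_ml):.3f} mL")
```

Even on this single toy lesion, the three rules yield clearly different volumes, which illustrates why volume cutoffs derived with one delineation method cannot simply be transferred to another.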
