*3.5. Inter-Reader Variability*

Inter-reader or -rater variability can limit the reliability of PET imaging (Figure 5). Numerous studies have analyzed the inter-reader agreement of visual PET assessments, particularly in [18F]FDG-PET for lymphoma and PSMA-PET for prostate cancer. In lymphoma patients, several studies have reported high inter-rater agreement using the standardized Deauville criteria [210,211], although conflicting results have also been reported [188,212]. It has been shown that reader training and discussions over divergent assessments increase agreement even among "expert" readers [213]. It has been suggested that SUV-based criteria in lymphoma might improve inter-reader agreement because they are unaffected by visual contrast effects [188]. However, both SUV measurements and visual Deauville criteria may be affected by image reconstruction, such as PL reconstruction [176]. Furthermore, besides the reader's subjective assessment of a certain lesion, in a setting in which several lesions of interest are present (e.g., in restaging in lymphoma or metastatic tumors), additional inter-reader variability can result from a divergent choice of the decisive target lesion [213].

A systematic comparison of inter-reader agreement based on quantitative measures and based on visual reading has rarely been performed, and there is still little evidence of any additional value of the quantitative approach. Furthermore, any quantitative criteria and diagnostic thresholds are a result of certain methodological and technical conditions, which may change over time and require adaptation. However, it has been demonstrated that inter-rater agreement in response assessment with [18F]FDG-PET in non-small cell lung cancer or metastatic breast cancer can be considerably improved through the use of the target lesion SULpeak (PERCIST 1.0 criteria) in comparison with a subjective assessment [194,214,215].
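As an aside, the quantitative measures invoked here (SUV and SULpeak) rest on simple normalizations of the measured activity concentration. The following is a minimal sketch, assuming a tissue density of 1 g/mL and the James formula for lean body mass (one common choice in PERCIST-style SUL calculations); all numerical inputs in the usage note below are hypothetical and chosen only for illustration:

```python
# Physical half-life of fluorine-18 in minutes.
F18_HALF_LIFE_MIN = 109.77


def decay_corrected_dose_bq(injected_dose_bq: float, minutes_elapsed: float) -> float:
    """Decay-correct the injected 18F dose to the scan time."""
    return injected_dose_bq * 2.0 ** (-minutes_elapsed / F18_HALF_LIFE_MIN)


def suv_bw(activity_conc_bq_per_ml: float, dose_bq: float, body_weight_kg: float) -> float:
    """Body-weight SUV; assuming a tissue density of 1 g/mL, the result is dimensionless."""
    return activity_conc_bq_per_ml / (dose_bq / (body_weight_kg * 1000.0))


def lean_body_mass_kg(weight_kg: float, height_cm: float, sex: str) -> float:
    """James formula for lean body mass (sex is 'male' or 'female')."""
    if sex == "male":
        return 1.10 * weight_kg - 128.0 * (weight_kg / height_cm) ** 2
    return 1.07 * weight_kg - 148.0 * (weight_kg / height_cm) ** 2


def sul(activity_conc_bq_per_ml: float, dose_bq: float,
        weight_kg: float, height_cm: float, sex: str) -> float:
    """SUV normalized to lean body mass instead of total body weight."""
    lbm = lean_body_mass_kg(weight_kg, height_cm, sex)
    return suv_bw(activity_conc_bq_per_ml, dose_bq, lbm)
```

For example, an activity concentration of 5000 Bq/mL after a 350 MBq injection in a 75 kg patient yields an SUV of about 1.07, whereas the SUL of a 175 cm male patient would be about 0.84, since lean body mass is lower than total body weight; this also makes explicit why errors in dose calibration, decay correction timing, or recorded patient weight propagate directly into the reported value.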

Regarding PSMA-PET for prostate cancer, several standardized evaluation criteria have recently been proposed [216–219] with the aim of improving inter-reader agreement [219] and of aiding inexperienced readers [220]. However, inter-reader concordance remains higher between experienced readers [221] and, depending on the specific diagnostic task, substantial to almost perfect agreement has usually been reported [220–223]. Similar degrees of inter-reader agreement were achieved with different reporting criteria [224]. However, standardized reading criteria do not negate the dissimilarities in the images obtained using different types of PET hardware and methods of image reconstruction, and both SiPM technology [79] and PSF reconstruction [225] have been found to result in systematically higher lesion conspicuity despite standardized reporting criteria (PSMA-RADS) [216].
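Agreement levels such as "substantial" and "almost perfect" conventionally refer to Cohen's kappa benchmarked against the Landis and Koch scale (roughly 0.61–0.80 and 0.81–1.00, respectively). A minimal sketch of the unweighted statistic for two readers scoring the same cases on a categorical scale; the example scores are hypothetical:

```python
from collections import Counter


def cohens_kappa(reader_a: list, reader_b: list) -> float:
    """Unweighted Cohen's kappa: chance-corrected agreement between two readers."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must score the same non-empty set of cases")
    n = len(reader_a)
    # Observed agreement: fraction of cases scored identically by both readers.
    p_o = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: expected overlap if the two readers scored independently.
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical 5-point (Deauville-style) scores from two readers on ten scans.
scores_a = [1, 2, 3, 4, 5, 2, 3, 1, 4, 5]
scores_b = [1, 2, 3, 4, 5, 2, 2, 1, 4, 4]
print(round(cohens_kappa(scores_a, scores_b), 3))  # → 0.75, "substantial"
```

In practice, a weighted kappa is often preferred for ordinal scores such as the Deauville scale, since it penalizes a 1-versus-5 disagreement more heavily than a 2-versus-3 one.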

## **4. Conclusions and Perspectives**

As we have demonstrated here, a variety of factors influence PET quantification and interpretation. All of these variables should be considered potential sources of error when interpreting clinical PET images. Although the added value of quantitative uptake parameters for clinical decision-making is still not well defined, it should be kept in mind that even simple quantitative measures such as the SUV are highly variable. The emergence of new PET technologies such as SiPM detectors and advanced image reconstruction algorithms further complicates the issue of image quality and quantitative accuracy. Stringent quality control measures and standardized imaging protocols should therefore be implemented to ensure robust and valid imaging results in routine clinical care. This will also be crucial for exploring and validating the clinical utility of machine learning-based image biomarkers. To ensure comparability, we recommend adhering to the EANM procedure guidelines. Furthermore, the EARL initiative has proposed standards for systematic standardization between imaging centers. This may include the reconstruction of separate data sets for image interpretation: (1) a data set optimized for visual lesion detection and (2) a data set for standardized, quantitative image interpretation.

**Author Contributions:** Conceptualization, J.M.M.R. and C.K.; methodology, J.M.M.R., L.v.H. and C.- A.V.; writing—original draft preparation, J.M.M.R., F.H. and C.K.; writing—review and editing, L.v.H., C.-A.V. and R.B.; visualization, J.M.M.R.; supervision, C.K., F.H. and R.B.; project administration, C.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Written informed consent has been obtained from the patients to publish this paper.

**Acknowledgments:** J.M.M.R. is a participant in the BIH-Charité Digital Clinician Scientist Program funded by the Charité–Universitätsmedizin Berlin, the Berlin Institute of Health, and the German Research Foundation (DFG).

**Conflicts of Interest:** The authors declare no conflict of interest.
