*3.3. Predictive Accuracy*

At univariate analysis (Table 4), all PET/CT semiquantitative parameters and RF obtained an AUC between 0.6 and 0.8 for scanner 1 (Discovery 690). For scanner 2 (Discovery STE), all of the semiquantitative PET/CT parameters and RF again reached an AUC between 0.6 and 0.8; in general, these values were lower than those reported for scanner 1. Furthermore, the evaluation of the *p*-value allowed the selection of the parameters with the best performance for the prediction of the final diagnosis of TIs, for both scanner 1 and scanner 2.

Considering both scanners together (scanner 1 + 2), PET/CT semiquantitative parameters generally showed a higher AUC than RF, with a significant *p*-value. Interestingly, the combined evaluation of both scanners revealed acceptable AUC values with a significant *p*-value for some RF, even though the same RF did not reach these values in the single-scanner analysis.
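As an illustration of the univariate screening described above, the following minimal sketch computes the AUC of a single feature from the Mann-Whitney U statistic and applies the AUC > 0.6 and *p* < 0.05 filter used for Table 4. This section does not state how the AUC and *p*-values were actually computed, so the normal-approximation *p*-value (without tie correction), the function names and the input layout are assumptions made only for this example.

```python
import math


def auc_and_p_value(benign, malignant):
    """AUC of one feature via the Mann-Whitney U statistic.

    The two-sided p-value uses the normal approximation without tie
    correction (an assumption of this sketch, not the study's method).
    """
    n0, n1 = len(benign), len(malignant)
    # U counts (malignant, benign) pairs where the malignant value is
    # larger; tied pairs contribute 0.5.
    u = sum(
        1.0 if m > b else 0.5 if m == b else 0.0
        for m in malignant
        for b in benign
    )
    auc = u / (n0 * n1)
    mean_u = n0 * n1 / 2.0
    sd_u = math.sqrt(n0 * n1 * (n0 + n1 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return auc, p_value


def screen_features(features, auc_min=0.6, alpha=0.05):
    """Keep only features with AUC > auc_min and p-value < alpha.

    `features` maps a (hypothetical) feature name to a pair of lists:
    values in benign lesions and values in malignant lesions.
    """
    kept = {}
    for name, (benign, malignant) in features.items():
        auc, p_value = auc_and_p_value(benign, malignant)
        if auc > auc_min and p_value < alpha:
            kept[name] = (auc, p_value)
    return kept
```

Calling `screen_features` once per scanner and once on the pooled data would mirror the three columns of analysis reported here. With the small per-scanner sample sizes typical of this setting, an exact test may be preferable to the normal approximation.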

**Table 4.** Univariate analysis for semiquantitative PET/CT parameters and for radiomics features for the single scanner and for both scanners considered together. Only values with AUC > 0.6 and *p*-value < 0.05 are reported.


AUC: area under the curve.

After performing a bivariate analysis, for each single scanner and for both scanners considered together, the best combinations of PET/CT semiquantitative parameters and RF are summarized in Table 5. As in the univariate analysis, none of the combinations reached an optimal AUC of 0.8, and the pairs of parameters generally obtained higher AUC values on scanner 1 than on scanner 2. Furthermore, in this analysis the *p*-values were more significant on scanner 1 than on scanner 2. Although a comparison between the pairs of variables obtained is complex given the heterogeneity between the two scanners, in general the GLCM-related parameters, variously combined, were the ones with the best performance. This holds for both scanner 1 and scanner 2, and these findings are consistent with the good results at univariate analysis described above. The GLRLM-related and GLZLM-related RF also showed good performance in this setting. Interestingly, PET/CT semiquantitative parameters were confirmed as good predictors only for scanner 2 (Figure 3).
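The bivariate search can be sketched as follows: every pair of features is combined into a single score and the pairs are ranked by the AUC of that score. This section does not specify the model used to combine two variables, so the two-feature logistic regression fitted by gradient descent, the function names and the hyperparameters below are assumptions chosen only for illustration. The AUC is computed on the same data used for fitting and is therefore optimistic.

```python
import math
from itertools import combinations


def auc(scores, labels):
    """Fraction of (malignant, benign) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def standardize(column):
    """Rescale a feature to zero mean and unit standard deviation."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return [(v - mean) / sd for v in column]


def fit_logistic(x1, x2, labels, lr=0.1, steps=2000):
    """Two-feature logistic regression fitted by batch gradient descent."""
    w0 = w1 = w2 = 0.0
    n = len(labels)
    for _ in range(steps):
        g0 = g1 = g2 = 0.0
        for a, b, y in zip(x1, x2, labels):
            p = 1.0 / (1.0 + math.exp(-(w0 + w1 * a + w2 * b)))
            g0 += p - y
            g1 += (p - y) * a
            g2 += (p - y) * b
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
    return w0, w1, w2


def rank_pairs(features, labels):
    """Return [((name_a, name_b), auc), ...] sorted by descending in-sample AUC."""
    results = []
    for name_a, name_b in combinations(features, 2):
        x1 = standardize(features[name_a])
        x2 = standardize(features[name_b])
        w0, w1, w2 = fit_logistic(x1, x2, labels)
        scores = [w0 + w1 * a + w2 * b for a, b in zip(x1, x2)]
        results.append(((name_a, name_b), auc(scores, labels)))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Here `features` maps a feature name to its per-lesion values and `labels` holds 0 for benign and 1 for malignant lesions. Running `rank_pairs` separately on scanner 1, on scanner 2 and on the pooled data would yield the three rankings that a table such as Table 5 summarizes.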

**Figure 3.** Visual representations of the three combinations ((**A**) GLCM Entropy\_log10+GLZLM\_SZHGE; (**B**) GLCM Entropy\_log2+GLZLM\_SZHGE; (**C**) GLCM Entropy\_log10+GLRLM\_HGRE) with the best performance at bivariate analysis for both scanners considered together.

**Table 5.** Bivariate analysis for clinical, semiquantitative PET/CT parameters and radiomics features for the single scanner and for both scanners considered together. For each analysis, only the couples with best performances are reported.




AUC: area under the curve.

#### **4. Discussion**

The aim of this study was to verify the ability of semiquantitative PET/CT parameters and of RF to discriminate between the benign and malignant nature of TIs revealed at 18F-FDG imaging.

On the basis of the resulting evidence we identified some remarkable points concerning the effect of different PET scanners on RF extraction and the predictive features and associated models.

In our experimental setting, we had to deal with images coming from different PET/CT tomographs, which required a preliminary investigation of how different technologies affect the images produced and the features subsequently extracted from them. The results showed that scanner technology concretely affects some RF, as previously underlined in the literature, and in daily clinical practice the use of different tomographs in the same department is frequent [20,28–34]. In particular, this was demonstrated by acquiring the same phantoms on different tomographs that differ in scintillator and in reconstruction algorithm (number of iterations, number of subsets, or presence of partial volume correction).

These findings suggest two relevant points. First, different scanners can potentially have different preferred features in terms of correlation with a clinical outcome; second, radiomics models coming from centers adopting different technologies must be considered critically. In other words, on the one hand a single best radiomic model trained on many scanners is probably suboptimal for each of them and, on the other hand, any radiomic model coming from a different center should be internally validated before being considered for use in daily practice. In particular, in the literature only one study that evaluated the predictive role of RF in TIs [23] used different scanners for the extraction of RF: this means that the reproducibility of the results (one of the biggest challenges in radiomics) still remains uninvestigated in this field. Furthermore, in our evaluation, only a small number of RF, together with SUVmax, proved to be significantly different between the two scanners; nevertheless, the cross-correlation maps were quite similar, adding value to our results. In this setting, of the parameters that showed the best performance at bivariate analysis, GLZLM SZHGE was the only one that differed significantly between the two scanners.

Regarding the predictive role of RF for the correct evaluation of TIs, at univariate and bivariate analysis a good percentage of the aforementioned parameters revealed an acceptable AUC between 0.6 and 0.8. However, none of them demonstrated an AUC above 0.8. These AUC values were also coupled with a significant *p*-value in a high percentage of cases. It is worth underlining that at the bivariate analysis performed for both scanners considered together, the AUC values and the *p*-values were the best in the whole study. This underlines the good predictive ability of some RF, namely the GLCM-related (in particular GLCM entropy\_log2 and GLCM entropy\_log10), GLRLM-related and GLZLM-related features.

Only a small number of studies investigating the predictive role of radiomics in the evaluation of TIs at 18F-FDG PET/CT are available in the literature [21–24].

Even if not clearly characterized by the presence of a proper texture analysis, the first study to evaluate the distributive heterogeneity of 18F-FDG in TIs was produced by Kim et al. [24]. In this work, the authors revealed that this heterogeneity was a promising parameter which was able to predict the final nature of these TIs.

Subsequently, Sollini et al. [23] were the first to evaluate the predictive abilities of texture analysis in this setting. Data of this study underlined the fact that SUVstd (the standard deviation of the distribution of SUV inside the considered VOI), SUVmax, MTV, TLG, Histo skewness, Histo kurtosis and GLCM correlation were the only parameters that were able to predict the final diagnosis of TIs, with a general positive predictive value of 54% and a general negative predictive value of 85%.

A similar analysis was also performed by Aksu et al. [22], who underlined how the semiquantitative PET/CT parameters and some shape-related, GLCM-related, GLRLM-related and GLZLM-related RF obtained AUC values above 0.7. These findings were partially confirmed in our study, where the same parameters showed similarly good results, with the exception of semiquantitative parameters and shape-related RF. Furthermore, the authors of that study developed a machine-learning algorithm using GLRLM RLNU and SUVmax with a good overall AUC value (0.731).

Lastly, Ceriani et al. [21] demonstrated that some PET/CT semiquantitative parameters (SUVmax, SUVmean, SUVpeak, MTV and TLG) and some RF were able to predict the final nature of TIs. In this case, the authors performed texture analysis with software other than LIFEx, so the RF were partly different from the ones used in our study. In general, some shape-related and GLCM-related features demonstrated good performance, and multivariate analysis confirmed TLG, SUVmax and Shape sphericity as able to predict the final nature of TIs.

It is interesting to underline that PET/CT semiquantitative parameters proved to be good predictors in all of these studies, while in our work only SUVmean showed some predictive role at bivariate analysis. In this setting, we reported that the AUC values of semiquantitative parameters were quite similar to those of RF only at univariate analysis. Given that the bivariate predictive model did not confirm this evidence, we can assume that these parameters do not perform well when building models with multiple variables, as in our case. Furthermore, as previously described, data in the literature about the role of these parameters for the assessment of TIs are highly heterogeneous, and our findings confirm these insights. Moreover, RF describe qualities and parameters of images that cannot be visually assessed, which is why we focused our attention on the evaluation of these features, allowing us to better understand the role of 18F-FDG PET/CT in the prediction of the nature of TIs.

Our study presents some limitations. First of all, this is a retrospective study that used tomographs that are no longer state of the art. Furthermore, the relatively small sample of patients included in the work, even if larger than in similar studies, appears sub-optimal to clearly evaluate the predictive abilities of texture analysis. In addition, RF extraction with a single software package is another limitation of our analysis. Lastly, the aforementioned problem of the reproducibility of radiomics analysis in terms of multicentric evaluation is still an open issue and, in this setting, further research in this field is mandatory.
