**4. Discussion**

In this study, we assessed the effect of the inclusion of central necrosis during tumour delineation on radiomic analysis in two cohorts of patients with NSCLC and PPGL. Around two-thirds of radiomic features showed significant differences between adaptive threshold delineation with and without manual addition of the region of central necrosis. Nevertheless, the predictive performance of radiomic models with and without central necrosis for the noradrenergic biochemical profile of PPGLs was not significantly different. Due to the low number of subjects, the predictive performance was not assessed for the NSCLC cohort.

After adjustment for multiple testing, at least 65% of all features were significantly affected by the difference in delineation method. Fewer features were affected in the NSCLC cohort than in the PPGL cohort (65% versus 82%, respectively), which is likely a result of the lower statistical power of the test in the smaller cohort (12 versus 31, respectively).
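The text does not state which multiple-testing adjustment was applied. As a minimal sketch, assuming a Benjamini-Hochberg false discovery rate procedure and hypothetical per-feature p-values (neither is taken from the study), the fraction of significantly affected features could be obtained along these lines:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Assumption: the study's actual correction method is not stated; this
    procedure is used for illustration only.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, one paired comparison per radiomic feature.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
affected = benjamini_hochberg(p_values)
fraction_affected = sum(affected) / len(affected)
print(f"{fraction_affected:.0%} of features affected after adjustment")
```

In this toy example, five unadjusted p-values fall below 0.05, but only two features remain significant after adjustment. This shows why the reported percentages depend on both the correction and the power of the underlying tests.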

More than 72% of the first order features, describing the distribution of voxel intensities in a histogram, significantly changed when central necrosis, i.e., lower intensity values, was added to the VOI. The number of affected first order features increased with a higher SUVmax, which can be explained by the larger range of voxel values in the intensity histogram in the case of a high SUVmax after the addition of a region with central necrosis.
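The effect on the intensity histogram can be illustrated with a toy example. The sketch below uses hypothetical SUV values and an assumed fixed bin width of 0.5 (not the study's settings) to compare a few first order features for a VOI containing only the viable rim with the same VOI after low-uptake necrotic voxels are added:

```python
import math
from collections import Counter


def first_order(values, bin_width=0.5):
    """Compute a few first order features from a list of voxel intensities.

    Assumption: fixed-bin-width discretisation with a hypothetical bin width.
    """
    n = len(values)
    # Discretise the intensities and count voxels per bin.
    bins = Counter(int(v // bin_width) for v in values)
    # Shannon entropy (in bits) of the discretised intensity histogram.
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return {
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "mean": sum(values) / n,
        "entropy": entropy,
    }


# Hypothetical SUVs: viable tumour rim and a low-uptake necrotic core.
rim = [6.0, 7.0, 8.0, 9.0, 10.0, 9.0, 8.0, 7.0]
necrosis = [1.0, 1.5, 2.0, 1.5]

without_necrosis = first_order(rim)
with_necrosis = first_order(rim + necrosis)

for name in without_necrosis:
    print(f"{name:8s} {without_necrosis[name]:6.2f} -> {with_necrosis[name]:6.2f}")
```

The maximum is unchanged, whereas the minimum, range, mean and entropy all shift. The higher the rim uptake in such an example, the wider the gap that the necrotic voxels open in the histogram.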

At least 66% of the texture features, describing the spatial relationships between individual voxels in terms of run lengths, size zones of the same voxel values or combinations of neighbouring voxel values, were affected by central necrosis, which resulted in a change of the spatial relationships between the voxels. It is beyond the scope of this study to detail the mathematical definition of all texture features, but we will highlight some of our findings and possible explanations.

Similarly to first order features, the number of affected texture features also increased with increasing SUVmax. The introduction of a region with low grey levels might result in longer run lengths and size zones with low values. In a tumour with relatively high grey levels, this might result in a larger run length or size zone matrix, with high incidences for the low grey values (central necrosis) and for the high grey levels (edge of the tumour) and low incidences in the middle range. For a tumour with a lower SUVmax, the matrices remain smaller, with incidences in the low and middle ranges and this results in different feature values.
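The run length argument can be made concrete with a one-dimensional toy profile through a tumour. The sketch below is a simplification under stated assumptions: the grey levels are hypothetical, only a single line of voxels is considered, and the matrix size is approximated as the number of distinct grey levels times the longest run rather than computed as in a full radiomics implementation.

```python
from itertools import groupby


def run_lengths(profile):
    """Return (grey_level, run_length) pairs along a line of voxels.

    Voxels outside the VOI are marked None and are skipped.
    """
    return [
        (level, len(list(group)))
        for level, group in groupby(profile)
        if level is not None
    ]


def approx_matrix_shape(runs):
    """Approximate run length matrix size: (distinct grey levels, longest run)."""
    n_levels = len({level for level, _ in runs})
    longest_run = max(length for _, length in runs)
    return n_levels, longest_run


# Hypothetical discretised grey levels along one line through the tumour.
# Without necrosis, the central voxels are outside the VOI (None).
without_necrosis = [8, 8, None, None, None, None, None, 8, 8]
with_necrosis = [8, 8, 1, 1, 1, 1, 1, 8, 8]

runs_without = run_lengths(without_necrosis)
runs_with = run_lengths(with_necrosis)

print(runs_without, approx_matrix_shape(runs_without))
print(runs_with, approx_matrix_shape(runs_with))
```

Adding the necrotic voxels introduces one long run at a low grey level, so the toy matrix grows from one grey level with a longest run of two voxels to two grey levels with a longest run of five voxels.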

Furthermore, it is remarkable that almost all normalised texture features were significantly different between both delineation methods for the NSCLC cohort, the PPGL cohort and all patients combined: the *normalised* GLCM inverse difference and inverse difference moment, the GLRLM grey level non-uniformity and run length non-uniformity and the GLSZM grey level non-uniformity and size zone non-uniformity were significantly different between both delineation methods. The normalised GLDM dependence non-uniformity was only different in the PPGL cohort and the combined cohort. Normalisation of GLCM features is performed to improve classification accuracy [29]. Normalised features are standardised for the number of elements in their respective matrix, i.e., the number of elements in the GLCM is the square of the number of discretised grey levels and the number of elements in the GLRLM is the product of the number of discretised grey levels and the maximal run length [14]. Since the number of discretised grey levels and the maximal run length increase with the addition of the region of central necrosis, this could result in a decrease in normalised feature values and hence in differences in feature values between delineation methods.

Compared to other feature classes, shape features were affected least frequently by the choice of the delineation method, but 50% of shape features were still affected. Some shape features consider the outer diameter or morphology of the VOI, which, in most cases, did not appreciably change when adding the region of central necrosis. Nevertheless, in some cases the region of central necrosis touched the outer surface of the volume of interest (3D U-shape) and caused some features to change. The number of affected shape features increased with the NTF as a result of a larger additional region of necrosis.

Several studies on repeatability and reproducibility showed that radiomic feature values are affected by delineation methods [30]. Unfortunately, overviews of repeatability and reproducibility on a feature level are scarce and are often limited to feature classes. Traverso et al. wrote a systematic review on repeatability and reproducibility of radiomic features and assessed to what extent (highly likely/probable/less likely) the different feature classes were affected by different processing steps of the radiomic pipeline [31]. They found that it is probable that semi-automatic VOI delineation exerts an adverse effect on repeatability and reproducibility of texture features. Moreover, in the case of shape features, this adverse effect is probable, but when compared to shape features derived from CT, PET shape features are more reproducible. According to Traverso et al., first order features are less likely to be affected by the delineation method, which is in sharp contrast with our study, which shows that first order features in particular are affected by the delineation method. They also report that entropy was consistently among the most repeatable and reproducible first order features [31]. However, in our study entropy is one of the features that is significantly affected by the delineation method in all cohorts and subgroups. Coarseness and contrast (GLCM as well as NGTDM), on the other hand, are considered among the least reproducible features [31,32], whereas our results show that coarseness and GLCM contrast are affected in only two and zero out of nine subgroups (Supplementary Table S1), respectively. This shows that non-repeatable or non-reproducible features are not the only features affected by central necrosis and therefore the choice of the delineation method concerning central necrosis should be considered in the design of radiomic studies.

Our study analysed the differences in radiomic features in two cohorts with different tumour types, showing a different tumour-to-background ratio. While the same image analysis (VOI delineation and radiomic feature extraction) was performed, acquisition and reconstruction settings were different between cohorts even though both protocols were in accordance with the EANM guidelines. The features affected by the delineation method, however, were highly similar between both cohorts, indicating that the effect of the delineation method concerning central necrosis is independent of the tumour type. Therefore, whether or not to include central necrosis in the tumour delineation is an important factor to consider when performing clinical radiomic studies. We hypothesise that this might also apply to other tumour types, but the number of affected features might vary as a result of tumour characteristics such as the tracer uptake and distribution, tumour geometry and NTF.

While almost two-thirds of radiomic feature values were significantly affected by the choice of delineation method, the predictive performances of the radiomic models, as assessed in the PPGL cohort, were not affected accordingly. The predictive performances, as assessed by the AUCs for the noradrenergic biochemical profile of PPGLs and found valid in a sham experiment, were not significantly different between radiomic features derived from VOIs with and without central necrosis. An explanation for this could be that the radiomic feature set describes many different types of heterogeneity and, as a result, feature sets from both delineation methods contain useful features in terms of predictive performance. Multicollinearity within one feature set is high, but might also be high between feature sets of different delineation methods. Additionally, the combination of radiomic features from both delineation methods resulted in an AUC similar to the ones of the different delineation methods separately. This indicates that, in this small dataset, the combination of features from the two delineation methods does not result in additional information that is favourable for the predictive performance. It should be taken into account that, in order to prevent overfitting of the model, only three features could be retained in factor analysis. In other tumour types, multicollinearity in and between feature sets of different delineation methods is expected to be high as well, but further research is needed to confirm this effect in a larger population as well as for different tumour types.
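One way to see how feature values can differ systematically while the AUC stays the same is a toy example with perfectly collinear features. The sketch below uses hypothetical feature values and labels, and a single-feature rank-based AUC rather than the study's modelling pipeline; the feature from the second delineation is assumed to be a shifted and rescaled copy of the first.

```python
def auc(scores, labels):
    """Rank-based AUC: probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = noradrenergic profile) and one radiomic feature.
labels = [1, 1, 1, 0, 0, 0]
feature_without_necrosis = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2]
# Assumed systematic change after adding necrosis: rescaled and shifted values.
feature_with_necrosis = [0.6 * v - 0.1 for v in feature_without_necrosis]

auc_without = auc(feature_without_necrosis, labels)
auc_with = auc(feature_with_necrosis, labels)
print(f"AUC without necrosis: {auc_without:.3f}, with necrosis: {auc_with:.3f}")
```

Every feature value changes, yet the ranking of the patients is preserved, so both AUCs are identical. Real feature sets are not perfectly collinear, so this only illustrates the direction of the argument.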

This study showed that inclusion or exclusion of the region of central necrosis in the delineation significantly affects radiomic feature values in PPGL and NSCLC, but does not affect the predictive performance of the PPGL radiomic model. We could not provide a guideline on whether to include or exclude central necrosis in the delineation. From a biological perspective, regions of central necrosis are part of the tumour and should therefore be included in the VOI, especially considering that central necrosis is associated with poor prognosis [6]. On the other hand, from a data-driven perspective, some features might already capture the presence of central necrosis without our awareness, since the features are investigated exploratively and without biological rationale. Both delineation methods can be used in radiomic studies, but feature values differ considerably between both methods. For reproducibility purposes, especially in the setting of external validation [33], future studies should report whether regions of central necrosis were included in the delineation.
