**4. Discussion**

Various studies have shown the good performance of ACR-TIRADS in selecting thyroid nodules deserving FNA, stressing its high "rule-out" role, as confirmed here and in our group's previous experience [1,6,8,16–18]. In the present work, the ACR and EU-TIRADS systems recommended FNA in 46.5% (223/480) and 51.9% (249/480) of the nodules, respectively. The rate of correct FNA indication as per ACR and EU (i.e., nodules with an indication for FNA and a cytology result ≥TIR3A) was 38.6% (86/223) and 36.1% (90/249), respectively, a difference that was not statistically significant (*p* = 0.5872), in accordance with the literature [1,19]. On the other hand, in the present cohort, the proportions of cytological tests that could have been avoided as per ACR-TIRADS and EU-TIRADS were 28.5% (137/480) and 33.1% (159/480), respectively, close to what has been recently reported in a meta-analysis (25% and 38%, respectively) [19]. Moreover, some studies have found no statistically significant differences in the pooled diagnostic performance of the two scores [20].
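As a quick arithmetic check, the proportions quoted above can be recomputed from the raw counts. The sketch below is illustrative only: the names `counts`, `pct`, and `summarize` are hypothetical, and treating the avoidable FNAs as the difference between indicated and correctly indicated FNAs is an inference from the reported numbers (223 − 86 = 137 and 249 − 90 = 159), not a definition stated in the text.

```python
TOTAL_NODULES = 480  # nodules in the cohort, as reported in the text

# Counts quoted in the text: nodules with an FNA indication, and those among
# them with a cytology result >= TIR3A ("correct" indications).
counts = {
    "ACR-TIRADS": {"fna_indicated": 223, "correct": 86},
    "EU-TIRADS": {"fna_indicated": 249, "correct": 90},
}


def pct(numerator, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def summarize(system):
    """Recompute the FNA-related proportions for one TIRADS system."""
    c = counts[system]
    # ASSUMPTION: avoidable FNAs = indicated FNAs with cytology below TIR3A.
    avoidable = c["fna_indicated"] - c["correct"]
    return {
        "fna_rate": pct(c["fna_indicated"], TOTAL_NODULES),
        "correct_rate": pct(c["correct"], c["fna_indicated"]),
        "avoidable": avoidable,
        "avoidable_rate": pct(avoidable, TOTAL_NODULES),
    }


for name in counts:
    print(name, summarize(name))
```

Under that assumption, the recomputed values match the percentages reported in the text for both systems.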

A meta-analysis showed a pooled sensitivity and specificity of 95% and 55%, respectively, for the TIR4/V and TIR5/VI classes with the ACR system, and of 96% and 52% for the same classes with EU-TIRADS [21]. Another recent meta-analysis showed better overall diagnostic performance for ACR than for EU (sensitivity 74% vs. 54%, specificity 64% vs. 53%, PPV 43% vs. 29%, NPV 84% vs. 81%) [22]. Other reports indicate that ACR-TIRADS had significantly higher specificity and PPV but lower sensitivity and similar NPV when compared to the EU-TIRADS system [1,10,23]. Our findings are in line with those of the latter studies, showing a slightly lower sensitivity (58.9% vs. 61.6%, *p* = 0.3173) but a significantly higher specificity (59.0% vs. 52.4%, *p* = 0.0012) for ACR, with similar PPV (38.6% vs. 36.1%, *p* = 0.1116) and NPV (76.7% vs. 75.8%, *p* = 0.5288).

Although the cytological classes already help to stratify patients through a predicted ROM known for each class, other ancillary tests or criteria might help to identify lesions with malignant behavior, especially in the indeterminate categories [12,13,24,25]. According to the routine cytopathological classifications, the expected rates of malignancy for classes TIR3A/III and TIR3B/IV are <10%/5–15% and 15–30%, respectively [12,13]. An Italian study estimated malignancy rates, based on surgical outcomes, of 17% and 40% for TIR3A/III and TIR3B/IV, respectively [26]. In our series, based on the cases with available surgical excision, the risk of malignancy was 22% (2/9) and 38% (5/13) in the TIR3A/III and TIR3B/IV classes, respectively, which is higher for TIR3A/III and lower for TIR3B/IV than the rates reported in a recent meta-analysis (TIR3A/III 10%, TIR3B/IV 52%) [27]. Wu et al. tried to better stratify the indeterminate nodules through a KRAS mutation assessment by polymerase chain reaction (PCR), assuming that this genetic alteration is usually associated with a moderate risk of malignancy, mainly represented by follicular tumors with a good prognosis [24]. However, this approach failed to improve the diagnostic performance of ACR-TIRADS alone, and the KRAS mutation was found exclusively in tumors classified as TIR3B/IV, with no malignant mutated cases in the TIR3A/III class. In this setting, the combination of the existing radiological scales (e.g., ACR and EU-TIRADS) with the routinely used cytological systems has been previously investigated in challenging nodules. Hong et al. combined ultrasound patterns with cytology, finding a lower risk of malignancy for TIR3A/III class nodules with a Korean TIRADS 3 score [28]. A meta-analysis evaluated the putative role of thyroid US in predicting the malignancy of TIR3A/III nodules, finding high variability in sensitivity and specificity among the studies analyzed, probably due to the heterogeneity of the US criteria employed to detect malignant nodules and to the variable prevalence of malignancy in the different cohorts [7]. However, the only feature with a significant influence on diagnostic accuracy was the increased vascularization of the nodules, which is not taken into account by either the ACR or the EU-TIRADS system. Other recent studies investigated the impact of different dimensional cutoffs, e.g., ≤2 and >2 cm, on the final performance of the available US systems, showing a range of values quite close to the ones obtained in the present cohort with the ACR and EU-TIRADS systems [8,29].
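The sensitivity, specificity, PPV, and NPV reported above for the present cohort can be illustrated with a standard 2×2 table, taking an FNA indication as the test result and a cytology result ≥TIR3A as the reference. The following is a minimal sketch under stated assumptions: the class `Confusion` and the helper `build_table` are hypothetical names, and the number of nodules with cytology ≥TIR3A (146) is not stated in the text but is inferred here from the reported sensitivities (86/0.589 ≈ 90/0.616 ≈ 146).

```python
from dataclasses import dataclass

TOTAL_NODULES = 480      # reported in the text
REFERENCE_POSITIVE = 146  # ASSUMPTION: inferred from the reported sensitivities


@dataclass(frozen=True)
class Confusion:
    """2x2 table: FNA indication (test) vs. cytology >= TIR3A (reference)."""
    tp: int
    fp: int
    fn: int
    tn: int

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def ppv(self):
        return self.tp / (self.tp + self.fp)

    def npv(self):
        return self.tn / (self.tn + self.fn)


def build_table(fna_indicated, correct):
    """Derive the four cells from the counts quoted in the text."""
    tp = correct                        # FNA indicated, cytology >= TIR3A
    fp = fna_indicated - correct        # FNA indicated, cytology < TIR3A
    fn = REFERENCE_POSITIVE - tp        # no FNA indication, cytology >= TIR3A
    tn = TOTAL_NODULES - fna_indicated - fn
    return Confusion(tp, fp, fn, tn)


acr = build_table(fna_indicated=223, correct=86)
eu = build_table(fna_indicated=249, correct=90)

for name, table in (("ACR-TIRADS", acr), ("EU-TIRADS", eu)):
    print(
        name,
        f"sens={table.sensitivity():.1%}",
        f"spec={table.specificity():.1%}",
        f"PPV={table.ppv():.1%}",
        f"NPV={table.npv():.1%}",
    )
```

With the inferred count of 146 reference-positive nodules, the sketch reproduces the reported percentages for both systems; it does not reproduce the *p* values, since the statistical test used for the comparison is not specified in this section.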
An innovative approach was proposed in 2017 by He et al., who created a new algorithm that significantly increased the predictive performance of US features [30]. In our case series, we found no improvement in diagnostic accuracy by combining either ACR or EU-TIRADS with the cytology class. Nevertheless, the combination of the cytological classes with specific US features extracted from the ACR system (namely, echogenic foci and margins with a non-zero score) led to a significant increase in specificity and PPV, with a slight reduction in sensitivity and NPV as compared to cytology alone (Table 4). The introduction of this new combined US-cytological approach allowed the correct identification of nodules with a ROM > 60%, which could certainly benefit from surgery, as well as those with a low ROM (<10%), still amenable to clinical follow-up as per SIAPEC and Bethesda operative indications (Figure 1) [12,13]. Although these promising results might represent a starting point for improving the current diagnostic performance of cytological classifications, we recognize some limitations in the present study: the limited number of cases with indeterminate cytology, i.e., TIR3A/III and TIR3B/IV, the short US follow-up period (12 months) for nodules that did not undergo surgery, the low number of cases that underwent surgery, and the high prevalence of benign nodules. This latter situation reflects the target population of multinodular hyperplastic goiters typical of a first-level general hospital, which only partly reflects the settings encountered in highly specialized centers with a large proportion of malignant cases and where molecular testing may be more easily performed [31]. 
However, as a description of real-life practice in first-level general hospitals, this study could help in validating the proposed combined approach in larger cohorts to further verify the reported diagnostic performance, eventually leading to its implementation in the clinical assessment of thyroid nodules.
