**4. Discussion**

Because colposcopy is a fundamental step in screening programs for the detection of pre-cancer cervical lesions, the success of the preventive strategy depends entirely on the diagnostic accuracy of the procedure. Assessing colposcopy accuracy, in other words the QC and QA processes, requires highly reliable figures in order to evaluate the performance and effectiveness of colposcopic practice correctly or to promote changes in the standard requirements for operators.

This practical need collides with an objective problem: the very wide range of colposcopy accuracy figures available in the literature. Meta-analyses have been published with the aim of providing statistically credible data to be used as comparison or reference values, thus allowing effective QC and QA processes in clinical practice. As an example, the most recently published meta-analysis, based on 15 studies and 22,764 cases, reports a combined sensitivity and specificity of 92% and 51%, respectively, for an LG-SIL+ threshold and of 68% and 93%, respectively, for an HG-SIL+ threshold [18].

Unfortunately, data obtained in this fashion suffer from a significant bias: the pooled papers have different study designs that influence the reported outcome, and widely different figures are in fact reported depending on how the outcome of colposcopy is evaluated. Some studies assess colposcopy outcome on the basis of the Colposcopic Impression (CI) that a CIN2+ is present; others assess it on the decision to take a biopsy because Disease Present (DP) is suspected, with the DP threshold usually being CIN1+. The choice of outcome measure therefore has a significant effect on accuracy evaluation [19], producing wide differences in both sensitivity and specificity.

That said, because its main objective is to investigate and analyze the performance of colposcopy mostly in terms of the QC of colposcopists and of the procedure, the present study has to be seen as CI-based. Therefore, the reported results are mainly discussed and compared with similar literature data. Nevertheless, some DP-based outcome assessments were possible and are similarly discussed and compared.

The combined CI sensitivity and specificity (CIN2+ threshold) values obtained from the survey were 73.7% and 87.7%, respectively (see Table 1), with no statistically significant differences between senior and junior colposcopists; in general, this can be seen as a favorable result of the teaching programs of the institutions involved. Compared with previous reviews [7,20], these figures lie above the weighted mean values for sensitivity and are fully comparable with the weighted mean values for specificity. Since the QC of Italian colposcopy/colposcopists was the major objective of the study, these figures, together with the absence of significant differences between juniors and seniors, in our opinion allow a more than satisfactory general evaluation of colposcopy/colposcopist performance. This impression is further supported by the difficulty of the survey and the workload required of participants.
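For readers who wish to reproduce accuracy figures of this kind, the following minimal sketch shows how sensitivity and specificity are derived from a 2×2 table of colposcopic impression versus histology at a CIN2+ threshold. The counts are hypothetical placeholders and not data from this survey; the function name is illustrative only.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 counts.

    tp: CIN2+ on histology, colposcopic impression positive
    fn: CIN2+ on histology, colposcopic impression negative
    tn: below CIN2 on histology, colposcopic impression negative
    fp: below CIN2 on histology, colposcopic impression positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    sensitivity = tp / (tp + fn)  # share of true CIN2+ correctly flagged
    specificity = tn / (tn + fp)  # share of non-CIN2+ correctly cleared
    return sensitivity, specificity


# Hypothetical counts for illustration only (not survey data).
sens, spec = sensitivity_specificity(tp=70, fn=30, tn=180, fp=20)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Changing the threshold (for example, from CIN2+ to CIN1+) changes which cases count as diseased, and therefore all four cells of the table, which is one reason CI-based and DP-based figures are not directly comparable.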

This is particularly interesting in view of the experience level of the participants: junior colposcopists showed better accuracy in each subset of thresholds, though without statistical significance, which may reflect either the good quality of the teaching programs in the institutions surveyed or the need for senior colposcopists to consider some form of self-improvement.

In terms of potential methodological biases, the use of static digital images of the cervix instead of live colposcopy to assess diagnostic accuracy and to perform QC evaluations does not represent a limitation concerning the reliability of the sensitivity/specificity figures; as reported by Liu [21], the recognition of colposcopic patterns and the colposcopic impression formulated on live colposcopy are reproducible on static digital images with high levels of agreement. Moreover, the use of a web-based program of digital colpophotographs, though with the different aim of assessing the accuracy of colposcopically directed biopsies, has already been proposed in Italy and shown to be effective for QA purposes [9,22–24].

Regarding the results specifically directed to the QC of colposcopists, we observed full agreement with the expert panel for the SCJ evaluation, following the 2011 IFCPC terminology [14], in 82.2% of *fully visible* SCJs, in 51.4% of *not fully visible* SCJs, and in 64.9% of *not visible* SCJs; in this analysis, a statistically significant difference was observed between seniors (67.5%) and juniors (60.7%) for the *not visible* SCJ subgroup (*p* = 0.01).

When the SCJ was categorized following the 2017 ASCCP proposal [15], grouping the *not fully visible* and the *not visible* SCJ into one single category named *not fully visible*, full agreement with the experts increased to 75.4%, with a statistically significant difference still present between seniors (77.1%) and juniors (72.8%) (*p* = 0.01).
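The effect of merging categories on agreement can be illustrated with a short sketch. The ratings below are hypothetical and not taken from the survey, and the helper names `collapse_scj` and `agreement_rate` are illustrative assumptions rather than the method used in the study.

```python
def collapse_scj(label):
    """Map the three SCJ visibility categories onto two categories."""
    return "fully visible" if label == "fully visible" else "not fully visible"


def agreement_rate(ratings, reference, mapping=lambda label: label):
    """Fraction of ratings equal to the reference after applying `mapping`."""
    if len(ratings) != len(reference) or not ratings:
        raise ValueError("ratings and reference must be non-empty and equal in length")
    matches = sum(mapping(r) == mapping(e) for r, e in zip(ratings, reference))
    return matches / len(ratings)


# Hypothetical expert and participant ratings for four images.
expert = ["fully visible", "not fully visible", "not visible", "not visible"]
rater = ["fully visible", "not visible", "not visible", "not fully visible"]

three_category = agreement_rate(rater, expert)              # 2 of 4 match
two_category = agreement_rate(rater, expert, collapse_scj)  # 4 of 4 match
print(three_category, two_category)
```

Because labels that agree before merging still agree afterwards, collapsing categories can only keep the raw agreement rate unchanged or raise it; the size of the increase depends on how often raters confuse the two merged categories.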

Comparable comments can be made regarding the Transformation Zone (TZ): full agreement with the expert panel was achieved in 73.2%, 53.8%, and 66.7% of Type 1, Type 2, and Type 3 TZs (2011 IFCPC terminology) [14], respectively; statistically significant differences were present between seniors and juniors for all three categories (see Table 4). The lowest rate of agreement for both SCJ visibility and TZ type was recorded in the intermediate category.

Several authors have addressed the issue and the practical implications of adopting uniform and standardized colposcopy terminology, underlining how the accuracy of the procedure improves when precise definitions of cervical patterns are widely used in clinical practice. In this view, the 2011 IFCPC terminology has represented a significant step forward in terms of colposcopy accuracy, having demonstrated better correlation with histology compared to traditional methods [25]. Despite that, the SCJ/TZ parameters have been repeatedly identified as the weak point of the process, as the intermediate categories, namely the *not fully visible* SCJ and the *Type 2* TZ, were always associated with the lowest accuracy and reproducibility [26,27].

Our results consistently confirm this analysis and support the 2017 ASCCP proposal, showing a significant increase in accuracy when a two-tier classification of the SCJ is adopted, as recently published articles report [15,28].

The analysis of the grade of the TZ (G) and of the colposcopic impression compared with histology allows some comments that, in our opinion, are particularly interesting because they provide accuracy figures relevant to both QC and QA.

In terms of minor/major acetic acid alterations, full agreement was achieved in 76.25% (negative), 60.59% (G1), 59.11% (G2), and 64.64% (cancer suspicious). It is noteworthy that a *negative* interpretation and a *G1* interpretation underestimated 5.05% and 19.26% of CIN2+ histologically proven lesions, respectively (Table 5).

As for the colposcopic impression, a *negative* impression and an *LG lesion* impression underestimated 3.06% and 21.2% of CIN2+ histologically proven lesions, respectively (Table 6).

The analysis of these figures, performed consistently with the DP (CIN1+ threshold) principles of QA assessment, provided the following results: Overall, overrating the colposcopic impression was 1.5 times more common than underrating. However, when histologically proven HG lesions (CIN2-CIN3) were considered, overestimation and underestimation were fully comparable. It is in some way reassuring that only 3.06% of CIN2+ were considered colposcopically negative. Less reassuring is the detected 21% underestimation rate of CIN2+ lesions that were colposcopically interpreted as *LG lesions*. In terms of colposcopy principles, this should not represent a serious issue since an *LG lesion* colposcopic impression represents an indication for targeted biopsy, though the option of non-biopsy is acceptable [29]. Unfortunately, the balancing effect of the targeted biopsy in reducing the negative effect of colposcopic underestimation is largely influenced by real-life practice.

As shown in Table 7, our survey identified a 36.4% non-biopsy rate in histologically non-negative cases (24.3% of HPV-CIN1, 9.1% of CIN2-CIN3, and 3% of cancers). As reported, non-biopsy rates decreased significantly with increasing severity of histology (*p* < 0.05). These findings are consistent with several population-based studies on colposcopy QA [30,31]. Further, restricting the analysis to cases with an *LG lesion* colposcopic impression and a CIN2+ histology, the non-biopsy rate was 13.6%, with a statistically significant difference between seniors and juniors (10.1% vs. 20%) (*p* = 0.01) (Table 8). Experience in colposcopy clearly plays an important role, significantly reducing the risk of lower CI accuracy by approximately 50%.
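The reduction of about 50% quoted above is a relative difference between the two reported non-biopsy rates (10.1% for seniors and 20% for juniors), not an absolute one. A minimal sketch of the arithmetic, with an illustrative function name, is:

```python
def relative_reduction(reference_rate, comparison_rate):
    """Relative reduction of comparison_rate with respect to reference_rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return (reference_rate - comparison_rate) / reference_rate


junior_rate = 0.20   # non-biopsy rate reported for juniors
senior_rate = 0.101  # non-biopsy rate reported for seniors

absolute_difference = junior_rate - senior_rate
relative = relative_reduction(junior_rate, senior_rate)
print(f"absolute difference = {absolute_difference:.1%}")
print(f"relative reduction  = {relative:.1%}")
```

The absolute difference is about 10 percentage points, while the relative reduction is about 49.5%, which corresponds to the rounded figure in the text.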

In parallel with the non-biopsy rates, our figures regarding the correctness of biopsy-taking deserve some comment; correctly performed biopsies accounted for 58.9% of HPV-CIN1, 77.3% of CIN2-CIN3, and 91.7% of cancers. In our data, incorrect-site biopsies accounted for 16.8% in HPV-CIN1, 13.6% in CIN2-CIN3, and 5.3% in cancers (*p* < 0.05); in the subgroup with an *LG lesion* colposcopic impression and CIN2+ histology, a biopsy was correctly performed in 71.4% of cases (seniors 73.9% vs. juniors 66.9%) (*p* < 0.05).

As reported by Sideri [9], potential biases can be addressed when the accuracy of colposcopically targeted biopsy is investigated for QA purposes. Some may favor accuracy (e.g., the artificial conditions that may facilitate recognition of colposcopic features), while others may have the opposite effect (e.g., the impossibility of increasing the magnification and the single-shot chance given to participants). Nonetheless, the overall sensitivity does not appear to be significantly influenced by these factors.

Despite an overall good performance of the decision-making process for taking a colposcopically targeted biopsy, our results provide another confirmation that the sensitivity of biopsy for HG lesions is a justified concern; a large amount of data is available on the subject, consistently pointing to the need for improving options [5,32–35]. Colposcopists' experience, though with marginal differences, has consistently been identified as positively influencing colposcopy accuracy [36,37].

Since the detection of cervical pre-cancer lesions is the primary objective of colposcopy within cervical cancer screening programs, the results of the present QC and QA assessments of colposcopy in Italy suggest some final considerations: (a) the overall sensitivity/specificity figures are in agreement with, and in some aspects better than, the mean figures reported by meta-analyses; (b) colposcopic underestimation is particularly relevant when an *LG lesion* colposcopic impression is formulated; (c) the recommendation of taking a colposcopically targeted biopsy in cases of *LG lesion* colposcopic impression is justified by the rate of missed CIN2+ cases; (d) the low rate of statistically significant differences between experienced and junior colposcopists allows a favorable judgment of teaching programs; and (e) continuous updating, improvement, and QC of colposcopists are recommended. In conclusion, the authors of the present article strongly believe that the adoption of colposcopy standards and quality recommendations by scientific societies is a fundamental step for effective cervical cancer prevention [10–13,29].

**Author Contributions:** Conceptualization, M.O. and F.C.; methodology: M.O., F.C., C.M. and L.I.; data curation: M.O., F.C., C.M. and L.I.; formal analysis, M.O., F.C., C.M. and L.I.; investigation, F.C., F.S., N.C., A.S., B.G., R.D.V., C.R., F.L., M.L.D.M., A.C., J.D.G., E.P., A.D.I., C.C., M.D., M.C. (Massimo Capodanno) and A.P.; supervision, M.O. and M.C. (Massimo Candiani); validation, M.O., F.C., A.C., F.S. and M.B.; writing—original draft, M.O., F.C., A.C. and F.S.; writing—review and editing, M.O., F.C., A.C. and F.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Due to the study design, no ethical approval was required.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Research data are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.
