**4. Discussion**

We found strong evidence that the density of a woman's breast significantly influences recall decisions in a population-based screening program. Mammographic abnormalities were more likely to be recalled when seen in dense breasts than in nondense breasts. We also found that a significant number of the lesions found in women with dense breasts recalled for assessment were benign, almost double the number of benign lesions recalled in women with nondense breasts. These findings suggest that around 1 in 3 women with dense breasts recalled for assessment had an unnecessary biopsy.

Several factors affect the interpretation of two-dimensional images and may be responsible for the high number of recalls, particularly in women with dense breasts. Summation artefacts caused by the superimposition of dense tissue on benign lesions may mimic breast cancer, which may have resulted in the high rate of unnecessary recalls [15,22]. False-positive or false-negative recall at screening may be due to perceptual or cognitive errors caused by factors such as poor lesion visibility and subtle or atypical cancer appearances [23]. It has been shown that breast density is more likely to cause perceptual errors such as false positives and false negatives because it can obscure subtle lesions or make lesions difficult to distinguish from distracting background breast tissue [14,24,25]. Such perceptual errors and the higher incidence of cancer in dense breasts may have contributed to the high recall of women with high breast density.

Mammographic abnormalities such as calcifications, masses with indistinct, spiculated or circumscribed margins, and asymmetries are frequent features of breast cancer [15,22,25]. Mammographic features such as calcifications and discrete masses constituted the largest proportion of benign biopsies. These two mammographic features are common findings in screening programs [26–28]. The high number of false-positive biopsies for these lesion types underscores the need for studies to establish which features of these lesions are associated with malignancy, to inform criteria for reducing unnecessary recall. Such studies may provide reasonable thresholds for identifying true positive lesions and reduce overtesting and unnecessary biopsies of benign lesions. Another factor that may have been responsible for the higher recall of benign lesions is lesion size. Screening quality can be judged by the detection of small cancers, defined as those with a diameter of ≤15 mm. Small calcifications (≤15 mm) and calcifications that cover a larger region of the breast are more likely to be malignant [29–31]. However, the diameters of calcifications varied widely in our data. Malignancy may be established by a complex combination of lesion features including size, morphology, and shape. Studies that combine these features to predict malignancy may better inform criteria for recall and biopsy.

A major focus of our study was to examine the potential role of DBT and ultrasound in reducing unnecessary biopsies. Previous pieces of work that compared DBT and ultrasound focused on women with mammographically negative dense breasts [19,20] and showed that ultrasound has a higher false-positive rate than DBT. Our study focuses on women with mammographically suspicious findings recalled for assessment and shows that ultrasound has significantly greater potential to decrease unnecessary biopsies than DBT in all breast compositions. We found no significant difference in true negative proportions between ultrasound and DBT in nondense breasts. In dense breasts, ultrasound showed a significantly higher proportion of true negatives than DBT. We also found that the number of cases that required assessment to prevent one unnecessary biopsy was significantly lower with ultrasound than DBT in heterogeneously dense and extremely dense breasts. These findings suggest that every benign lesion in heterogeneously and extremely dense breasts being unnecessarily recalled has approximately a 50% (1 out of 2 benign lesions) chance of receiving benefit from ultrasound.
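The arithmetic linking a roughly 50% chance of benefit to "1 out of 2 benign lesions" can be illustrated with a minimal sketch. It assumes that the number of cases needing assessment to prevent one unnecessary biopsy is the reciprocal of the proportion of benign recalled lesions that the modality correctly classifies as benign. This definition, the function name, and the counts used are illustrative assumptions and are not taken from the study data.

```python
import math


def number_needed_to_assess(benign_total: int, benign_cleared: int) -> float:
    """Benign cases assessed per unnecessary biopsy avoided.

    Assumption (not from the study): this is the reciprocal of the
    true-negative proportion among benign recalled lesions.
    """
    if benign_total <= 0:
        raise ValueError("benign_total must be positive")
    if not 0 <= benign_cleared <= benign_total:
        raise ValueError("benign_cleared must lie between 0 and benign_total")
    if benign_cleared == 0:
        # No benign lesion is cleared, so no biopsy is ever avoided.
        return math.inf
    return benign_total / benign_cleared


# Hypothetical counts: 100 benign recalled lesions, 50 correctly cleared.
print(number_needed_to_assess(100, 50))  # 2.0
```

Under this assumption, a true-negative proportion near 50% corresponds to about two benign lesions assessed for each unnecessary biopsy avoided, and a lower proportion gives a larger number.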

In women with nondense breasts, we found no significant difference between DBT and ultrasound in terms of the number of cases that required assessment to prevent one unnecessary biopsy. To the best of our knowledge, the current study is the first to compare DBT and ultrasound assessments of recalled lesions across dense and nondense breasts. Previous studies [15,25] that focused on ultrasound showed that mimickers of breast cancer with benign morphologic ultrasound features could be safely managed with ultrasound follow-up to establish stability and confirm benign status. In dense breasts, ultrasound was found to be a satisfactory alternative to biopsy for solid lesions with benign morphological ultrasound features because of the high negative predictive value (99.8%) [32]; this may reduce anxiety for women recalled for assessment.

Previous studies that sought to reduce unnecessary biopsies were based on DBT. In one of these studies [16], incorporating DBT into the diagnostic workup of mammographic abnormalities would have resulted in a reduction in the number of benign biopsies conducted during screening assessment. The authors reported that DBT enhances reader accuracy and confidence in judging whether mammographic abnormalities are cancerous or not, resulting in a decrease in biopsies from 69% to 36%. However, this study did not adjust for breast density and lesion characteristics. A study from the USA [33] showed that DBT has the potential to decrease unnecessary biopsies for all breast densities, with substantial reductions for women with heterogeneously dense breasts (21.3%) and extremely dense breasts (27.5%). Our study, based on an Australian population and radiologists, showed only modest potential of DBT to reduce unnecessary biopsies for women of all breast compositions: entirely fatty (5%), scattered fibroglandular (10%), heterogeneously dense (7%), and extremely dense breasts (9%). These discrepancies may be due to differences in study design and recall classification criteria. Unlike the USA study, we included women recalled for assessment following a suspicious finding on their screening mammograms that were read by two radiologists who worked independently. Additionally, the RANZCR grade 3 used by BreastScreen Australia is classified as a positive finding that combines the BI-RADS 3 and BI-RADS 4A categories in the American College of Radiology BI-RADS Atlas. These differences may have influenced the impact of DBT during assessment for recalled women.

Although ultrasound is an effective assessment tool to differentiate between benign and malignant lesions that appear suspicious on mammography, it is limited in accurately classifying calcifications. Therefore, mammography-recalled calcifications should not be wholly ruled out based on ultrasound findings. This is supported by a previous study [17] that suggested that women should be recalled for biopsy even if suspicious calcifications are considered normal during an ultrasound. This previous work also showed a decrease in the false-positive rate in screening mammography by incorporating ultrasound into the diagnostic work-up of suspicious findings. However, further studies are needed to estimate the benefit-to-harm ratio and costs of ultrasound and DBT as assessment tools.

Our study is not without limitations. First, it is a single-centre study. Second, the sample size is relatively small, and 60.4% of recalled lesions in our study were in dense breasts, representing a large proportion of recalled mammograms. Thus, a greater understanding of work-up for dense breasts might help screening programs better manage their assessment procedures and resources.
