**1. Introduction**

The overdiagnosis of breast cancer has been suggested by some to be the largest harm associated with breast cancer screening [1–4]. Overdiagnosis refers to the diagnosis of breast cancers that would not otherwise have surfaced during a woman's lifetime, i.e., would not have caused harm before she died of some other cause. In the case of breast screening, this would come about because of the lower threshold in lesion size (and presumably development) for detection provided by the screening modality. This lower threshold provides lead time, and it is this lead time that contributes to the reduction in mortality and morbidity that has been demonstrated in women who participate in screening compared to those who do not [1,5]. If cancers detected earlier through screening would not have surfaced or done harm had they remained undetected, then they can be considered to have been overdiagnosed or, more correctly, overdetected [6]. The term "overdetection" will be used throughout the remainder of this article when referring to detection by screening, while "overdiagnosis" will be used to refer to pathologic assessments.

Consider a cohort of women of the same age at a given time point. It would be expected that a certain number of breast cancers, illustrated schematically in Figure 1a as discs, would be initiated in the cohort each year. The cancers would vary in size and growth rate according to a variety of driving factors. At an early point, as illustrated on the left in the figure, they would not have yet reached the threshold for detectability; however, as time elapses, the cancers grow and at some point become large or noticeable enough to be detected by the women or by a clinician. Additionally, as shown in the figure, at a point that may occur before or after the threshold for clinical detectability is reached, they have become sufficiently advanced that they will be destined to become lethal (discs indicated with "x"s), or at least their treatment would impose considerable morbidity. The initiation of cancer is a continuous process over time with a rate that is age-dependent, so that as time progresses new, earlier cancers are added to the population of previously undetected cancers that have grown larger.

**Citation:** Yaffe, M.J.; Mainprize, J.G. Overdetection of Breast Cancer. *Curr. Oncol.* **2022**, *29*, 3894–3910. https://doi.org/10.3390/curroncol29060311

Received: 14 April 2022 Accepted: 17 May 2022 Published: 30 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Figure 1.** (**a**) Illustrates the initiation and growth of breast cancers in an unscreened population (e.g., 1000 women). Size of the lesion is represented by the diameter of the discs. X indicates cancers that will result in death. (**b**) The effect of screening. In (**a**,**b**), the lower rows indicate cancers that have been detected and treated, while upper rows show cancers in the cohort that have not yet been detected. (**c**) Difference (excess) in the cumulative number of cancers found in screened (black curve) versus unscreened (red curve) individuals, depicted in the graph as the dashed line, increases during the period of screening. In this example where there is no overdetection, cancers are found and treated earlier in the screened group; however, after screening ends at Sn, the number in the unscreened group will catch up over time, eliminating the excess.

The principle of screening is illustrated for an identical cohort of women in Figure 1b. If a suitable test is available to which the cohort is exposed at regular intervals and the threshold for lesion detectability is smaller than the clinical detection threshold, then cancers can be found and treated before they reach that clinical threshold. In this screened cohort, the cancers that are detected and treated are shown on the lower track, while those that remain undetected in those women are shown on the upper track. The time that it takes for cancers to grow from the size threshold of the screening test to the threshold for clinical detectability is the lead time afforded by screening. The term "size" is used loosely here as a surrogate for detectability because other factors that develop over time such as changes in morphology may also affect the detection threshold. The expectation is that there would have been less progression in size, a lower probability that metastasis has occurred and a greater chance of avoiding death from those earlier cancers. This paradigm has been demonstrated to be correct through multiple randomized trials, case–control and observational studies, e.g., [1,5,7].

Eventually, most of the equivalent cancers found earlier in the screened cohort would surface in the unscreened women due to symptoms or accidental detection. Very slow growing (indolent) cancers (grey discs in Figure 2), however, may not have the potential to progress beyond a certain point or to metastasize and, therefore, would not become lethal, or at least would not have become clinically detectable in the absence of screening, before the individual had died of some other cause. Under these conditions, the woman would never have been aware that she had cancer. This phenomenon of overdetection is illustrated in Figure 2b. More of these cancers with limited malignant potential will be detected in a screened population. To the extent that this occurs, the total numbers of cancers detected (and treated) in a cohort of women participating in screening will exceed the corresponding numbers in an unscreened cohort (Figure 2c).

**Figure 2.** Initiation and growth of breast cancers in the presence of cancers with limited malignant potential. (**a**) An unscreened population. Grey discs indicate cancers that are destined not to be lethal. (**b**) The effect of screening. Lower row indicates cancers that have been detected and treated, while upper row indicates cancers in the cohort that have not yet been detected. (**c**) Overdetection. After screening ends at Sn, the initial excess of cancers in the screened group will not be completely eliminated over time.

The actual diagnosis of breast cancer is performed by a pathologist on biopsied tissue. A concern regarding overdetection is that a woman who otherwise would not have experienced the anxiety and other negative factors associated with having breast cancer becomes a breast cancer patient.

Overdetected cancers are real cancers and should not be confused with the so-called false positive results of screening, where further imaging or biopsy, triggered by an equivocal screening examination, demonstrates that suspicious results on screening are not cancer. The main point is that there is no direct benefit to the individual from finding overdetected cancers. They are currently an unavoidable collateral finding associated with the earlier detection and treatment of other cancers that would indeed otherwise likely become lethal.

Overdetection by screening can be considered as having two components: (1) detection of nonprogressive cancers and (2) detection of cancers that are progressive, but where the progression is sufficiently slow that they would not have been detected in unscreened women before they died from a cause other than breast cancer. In a recent publication, for women in the age range 50–74 screened biennially who were monitored by the Breast Cancer Surveillance Consortium in the U.S., Ryser et al. estimated the rate of overdetection at 15% [8]. In their Bayesian inference study of 718 cancer diagnoses in 36,000 women, they estimated that one-third of overdetected cancers were indolent, while the other two-thirds were progressive but had not emerged before death had occurred due to another cause. Overdetection via the second mechanism is more likely to occur in older than in younger women at the time of screening because competing causes of death are more frequent in older women and, therefore, it is more likely that a cancer would not be detected in an unscreened counterpart before she dies.

Overdiagnosis, in its true sense, occurs when the pathology examination is not able to distinguish potentially aggressive from indolent cancers. The harms of overdiagnosis are the morbidities associated with overtreatment if this occurs. The same limitations can result in underdiagnosis and subsequent undertreatment, with a heightened probability of recurrence or death. Both of these are harms of the diagnostic process and the processes leading to therapeutic choices. It is worth mentioning that not all cancers that are overdetected are overdiagnosed. In some cases, the pathologist can identify disease at a very low risk for recurrence at biopsy. However, unlike the trend toward active surveillance in prostate cancer, where some men choose not to be treated for minimal disease, most women currently receive some level of treatment after a diagnosis of breast cancer. Some of these cancers are undoubtedly overtreated. It is, of course, also possible for cancers detected symptomatically to be over- or underdiagnosed.

The main difference between screen-detected and symptomatically detected cancers is that the former tend to be smaller and of earlier stage, making the diagnostic procedure more challenging. This implies that overdiagnosis is more likely to occur in in situ than in invasive disease. The probability of detecting in situ cancer is greatly increased with screening and, therefore, these lesions require special consideration.

#### *1.1. In Situ Cancers*

In situ cancers are rarely detected in unscreened women, whereas in a cohort of women routinely screened with mammography, they constitute 20–30% of detected cancers. It has been argued that in situ cancer (which, here, will be loosely referred to as ductal carcinoma in situ or DCIS) should not be considered as a cancer in that, in itself, it does not have the potential to be lethal. If this were the case, then it could be considered that in situ cancer alone might be responsible for an overdetection rate of 20–30%. Certainly, of those cancers overdetected because they are nonprogressive, in situ cancers likely represent a large proportion. Glasziou et al. used Australian registry data to estimate overdetection for various cancers and concluded that for breast cancer there was an overall 22% overdetection of which 9% was for in situ cancers [9].

Nevertheless, in situ disease cannot simply be dismissed as being innocuous. It is well established that, if treated by breast conserving surgery alone, there will be ipsilateral recurrence in about 28% of cases and half of these will appear as invasive cancer [10–12]. The use of radiation therapy reduces local recurrence by a factor of two. More recent work by Solin et al. showed recurrence rates of 25% for high grade lesions 1 cm or smaller and 14% for larger (2.5 cm or larger) low or intermediate grade in situ cancer. Again, in each case, about half the recurrences were as invasive cancer [13]. The optimization of strategies of how to manage in situ cancers detected by screening is, therefore, a topic of great interest and some efforts in this direction will be described later in this article.

#### *1.2. Estimating the Amount of Overdetection from Screening*

There have been many attempts to estimate the amount of overdetection that would result from screening. All of these suffer from various limitations, and this is responsible for wide variation among estimates [5,8,14,15]. For example, Bleyer and Welch [16] extrapolated historical breast cancer incidence data before the onset of breast cancer screening from the SEER Registry to predict what incidence should be in the current era and compared with actual incidence to estimate the excess that they assumed was attributable to overdetection from screening. While conceptually this approach is sound, uncertainties in the year-to-year increase in background age-specific incidence rates, lack of information in SEER on the mode of cancer detection and other sources of variability made their calculation extremely unreliable. Small differences in the assumptions of the values of some of the extrapolation parameters could result in very high estimates of overdetection or even of underdetection [8,17,18].

Puliti et al. and Etzioni and Gulati have identified several of the critical factors required in the estimation of overdetection [19,20]; these include accounting for the effects of lead time from screening and for differences in cancer risk between comparison groups [19]. Puliti et al. have suggested that when these effects have been appropriately accounted for, the fraction of cancers that have been overdetected is on the order of 1–10%.

Ideally, overdetection would be assessed through a randomized trial, where women in one arm receive screening and those in the other do not. This would eliminate possible differences between the two groups that could be responsible for differences in breast cancer incidence. Both arms would be followed for cancers detected during the period of the screening intervention and for a time period afterwards that is no less than the lead time provided by screening. To avoid bias, it is essential that the quality of the follow-up is identical for the two trial arms. During the post-intervention period, neither group would receive screening. The number of breast cancers occurring in each group would be carefully and thoroughly monitored and the difference would provide a measure of overdetection.

In such a trial, it would be expected that initially there would be an excess of cancers in the screened group due to their earlier detection. After a delay due to the screening lead time, the corresponding cancers would begin to appear in the control group and there would be a compensating decrease in the excess as illustrated in the graph in Figure 1c. If there was no overdetection occurring, then after the appropriate follow-up time, the excess would be neutralized.
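The expected pattern can be sketched with a deliberately simple deterministic model. The function names, the constant annual incidence, the single fixed lead time and all numerical values below are illustrative assumptions and are not taken from any trial. The sketch only shows how an excess arises during screening and disappears once the lead time has elapsed after screening stops, unless screening also finds cancers that would never have surfaced clinically.

```python
def cumulative_unscreened(t, rate):
    """Cancers that have surfaced clinically by year t (constant annual rate)."""
    return rate * t


def cumulative_screened(t, rate, lead_time, screen_years, indolent_rate=0):
    """Cancers found by year t when screening runs from year 0 to screen_years.

    Assumptions: screening advances every progressive cancer by exactly
    lead_time years, and finds indolent cancers (which would never surface
    clinically) at indolent_rate per year while screening is running.
    """
    if t <= screen_years:
        progressive = rate * (t + lead_time)
    else:
        # After screening stops, no new progressive cancers surface until
        # the cancers already found early would have appeared clinically.
        progressive = rate * max(t, screen_years + lead_time)
    indolent = indolent_rate * min(t, screen_years)
    return progressive + indolent


def excess(t, rate, lead_time, screen_years, indolent_rate=0):
    """Screened minus unscreened cumulative count at year t."""
    screened = cumulative_screened(t, rate, lead_time, screen_years, indolent_rate)
    return screened - cumulative_unscreened(t, rate)


if __name__ == "__main__":
    # Hypothetical values: 2 cancers/year, 3-year lead time, 5 years of screening.
    for year in (0, 5, 6, 8, 20):
        without_od = excess(year, rate=2, lead_time=3, screen_years=5)
        with_od = excess(year, rate=2, lead_time=3, screen_years=5, indolent_rate=1)
        print(year, without_od, with_od)
```

Under these assumed values, the excess without overdetection returns to zero at year 8 (the end of screening plus the lead time), whereas with one indolent cancer found per screening year a residual excess of five cancers persists indefinitely.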

Such an idealized trial is almost impossible to achieve. As in all randomized screening trials, there will be crossover effects due to the noncompliance of women assigned to screening as well as some women in the control group seeking screening outside the trial. If this occurs, it will cause an initial reduction in the measure of overdetection.

The screening behavior of women after the period of the intervention will also affect the estimated overdetection. The measure will be most accurate if neither group receives screening during the post-intervention follow-up. Given human behavior, this situation is unlikely to be achieved. If there is more post-intervention screening in the screening arm, overdetection will be overestimated and, if there is a greater degree of screening in the control group, the estimated overdetection fraction will be reduced.

#### *1.3. Example–Canadian National Breast Screening Study*

An example of the problems of estimating overdetection can be seen in the publication by Baines et al. of their revised estimates of overdetection of breast cancer using data from the two randomized controlled trials in the Canadian National Breast Screening Study (CNBSS) [21]. This analysis was an update from Miller et al., 2014, who had originally provided estimates based on the merged data from the two studies [22]. The revised estimates by Baines et al. were considerably larger than the previously published values.

The calculation used by Baines et al. was very simple. In each arm of the RCT, the number of breast cancers found, comprising screen-detected and other cancers, was totaled over the period of observation. The estimated overdetection at a given point in time was obtained by dividing the difference in these totals between the study and control groups (the excess cancers) by the number of screen-detected breast cancers in the study group during the period of intervention.

Different authors have employed other denominators in this calculation [1]. Those who wish to accentuate the effect tend to choose the smallest number and vice versa. It is not clear that any particular choice is most correct, but the effect on the estimate of overdetection can be large and comparison of studies requires that the same denominator be used in all cases.
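The calculation described above, and its sensitivity to the denominator, can be illustrated with a short sketch. The function name and all counts are hypothetical and are not CNBSS data; the two alternative denominators are shown only as examples of choices an author might make, not as the definitions used in any particular study.

```python
def overdetection_fraction(study_total, control_total, denominator):
    """Excess cancers in the study arm divided by the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return (study_total - control_total) / denominator


if __name__ == "__main__":
    # Hypothetical counts, not CNBSS data.
    study_total = 700    # all cancers in the study arm over the observation period
    control_total = 650  # all cancers in the control arm over the same period
    denominators = {
        # Denominator described in the text for Baines et al.
        "screen-detected cancers during intervention": 250,
        # Illustrative alternatives (assumptions for this sketch).
        "all study-arm cancers during intervention": 400,
        "all study-arm cancers over full follow-up": 700,
    }
    for label, value in denominators.items():
        fraction = overdetection_fraction(study_total, control_total, value)
        print(f"{label}: {fraction:.1%}")
```

With these assumed counts, the same excess of 50 cancers corresponds to 20%, 12.5% or about 7%, depending only on the denominator chosen.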

In CNBSS1, women in the age range 40–49 years in the intervention (MP) arm received annual mammography and physical examination, while the control group (UC) received a single physical examination at entry followed by "usual care" in the community, whose nature was undefined [23]. Women aged 50–59 at entry in the intervention (MP) arm of CNBSS2 received mammography plus clinical examination by a nurse (by a physician in Quebec) annually, while those in the control (PE) arm received annual clinical examination only [24].

The estimates of overdetection by Baines et al. are far higher than values published by other authors based on data from randomized trials or observational studies of service screening [19,21]. We, therefore, attempted to examine the results reported by Baines et al. to determine if these estimates were supported by their data. We also conducted microsimulation for the purpose of understanding mechanisms that could lead to the discrepancy in results.

**2. Materials and Methods**

For each of the two studies, CNBSS1 and CNBSS2, Baines et al. reported the cumulative number of invasive cancers and total (invasive and in situ) cancers that had accrued in each trial arm during the period that screening took place (denoted here as Years −4 to 0) and at 1, 2, 3, 4, 5, 10, 15 and 20 years after the study screening examinations terminated [21]. The differences between these two sets of numbers represented the in situ cancers. We observed the patterns in the accumulation of invasive, total and in situ cancers over time to assess if these patterns reflected those expected in a cohort of Canadian women.
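As a minimal sketch of the subtraction described above, the cumulative in situ counts can be derived from the reported total and invasive counts at each time point. The function name and the counts are hypothetical and serve only to show the bookkeeping; they are not the values reported by Baines et al.

```python
def in_situ_counts(total_by_year, invasive_by_year):
    """Cumulative in situ cancers per time point: total minus invasive."""
    if total_by_year.keys() != invasive_by_year.keys():
        raise ValueError("both series must cover the same time points")
    counts = {}
    for year in sorted(total_by_year):
        diff = total_by_year[year] - invasive_by_year[year]
        if diff < 0:
            raise ValueError(f"invasive exceeds total at year {year}")
        counts[year] = diff
    return counts


if __name__ == "__main__":
    # Hypothetical cumulative counts for one trial arm; year 0 = end of screening.
    total = {0: 120, 5: 210, 10: 330, 20: 560}
    invasive = {0: 100, 5: 180, 10: 290, 20: 500}
    print(in_situ_counts(total, invasive))
```

The consistency checks are a design choice of this sketch: a cumulative invasive count that exceeds the cumulative total at any time point would signal a transcription or reporting problem rather than a real pattern.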
