*4.2. Model Predictions*

The model predictions for CNBSS1 (Figure 4) demonstrate the expected characteristics of the events in a screening regimen. In this case, the model was configured to simulate a screening intervention in the form of a randomized controlled trial with five annual examinations. The curve for unscreened women serves as a reference; there is a gradual monotonic rise of incidence corresponding mainly to increasing age.

For CNBSS1, at the onset of screening (−4 y), the model predicts a sharp increase in incidence in the arm screened with mammography, corresponding to the detection of cancers whose state of development exceeded the threshold for screen detection but had not yet reached that required to surface symptomatically. Once this "prevalence" screen has occurred, the predicted incidence rates are lower because many of the cancers that had accumulated prior to the onset of screening have been removed from the pool for detection. It might be expected that if there are slow-growing or indolent cancers present, these would largely be found in that initial prevalence screen along with newer, more aggressive cancers.

A sharp increase compared to no screening is also seen at the single screen of the control (UC) group, but in this case the increase is smaller, under the assumption that mammography was more sensitive than physical examination (i.e., physical examination has a higher threshold for detection). While the mammography-screened group would include both invasive and in situ cancers, very few in situ cancers would be found in the controls. After the first examination at −4 y, the control group would no longer receive screening, so incidence would fall slightly below the level for unscreened women because some of the cancers had been found earlier due to the lead time provided by an expert clinical examination.
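The prevalence peak and the later dip below the unscreened level can be illustrated with a deliberately simplified toy calculation. This is not the model used in this paper: it assumes a constant rate at which cancers become screen-detectable, a single fixed sojourn time before symptomatic presentation, a perfectly sensitive screen, and no indolent cancers. The function name and all parameter values are hypothetical.

```python
def yearly_incidence(onset_rate, sojourn_years, screen_years, n_years):
    """Toy yearly incidence (screen-detected plus symptomatic cancers).

    Assumptions (illustrative only): cancers become screen-detectable at a
    constant rate, surface symptomatically after a fixed sojourn time, and
    every screen finds all cancers currently in the detectable pool.
    """
    # pool[a] = cancers that have been screen-detectable for a years;
    # start from the steady state that exists before any screening.
    pool = [onset_rate] * sojourn_years
    totals = []
    for year in range(n_years):
        clinical = pool[-1]              # oldest cohort surfaces symptomatically
        pool = [onset_rate] + pool[:-1]  # age the rest, add newly detectable cancers
        screened = 0
        if year in screen_years:
            screened = sum(pool)         # perfectly sensitive screen empties the pool
            pool = [0] * sojourn_years
        totals.append(clinical + screened)
    return totals


# Hypothetical parameters: 10 new detectable cancers per year, 4-year sojourn.
unscreened = yearly_incidence(10, 4, set(), 10)
five_screens = yearly_incidence(10, 4, {0, 1, 2, 3, 4}, 10)
single_screen = yearly_incidence(10, 4, {0}, 10)

print("unscreened:   ", unscreened)
print("five screens: ", five_screens)
print("single screen:", single_screen)
```

In this toy setting the first screen produces a spike because it harvests the accumulated pool, and incidence falls below the unscreened level once screening stops. Because every cancer here eventually surfaces, the cumulative totals of all three scenarios are equal over the horizon. The dip after a single screen is much deeper than the slight dip described above because the toy screen is perfectly sensitive, which a clinical examination is not.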

As shown in Figure 5 for CNBSS1, the lead time provided by mammography screening and the greater sensitivity for in situ cancer would be expected to cause an excess in the cumulative cancers detected in the screened group and this would build up gradually over the five screening examinations. After t = 0, the excess of invasive cancers in the MP arm is gradually compensated by a greater number in the control arm. Although half of the excess disappeared at 2.5 y and 75% by 7 y, the excess is not completely cancelled until 23 y, suggesting that there is a broad range of growth rates in these cancers detected by screening. At t = 20 y (25 years post entry), the predicted overdetection is 2% for invasive cancers and 13% for invasive plus in situ cancers. This provides a very different picture as to what happens to the excess cancers after the intervention period than that seen in Figure 3 in the data presented by Baines et al. [21] who estimated overdetection rates of 48% and 53%, respectively, at the same time point.
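One way to read the quoted fractions is as the share of the peak cumulative excess that remains at each follow-up time. The sketch below uses hypothetical cumulative counts, chosen only so that the excess halves by 2.5 y, falls to a quarter by 7 y, and vanishes at 23 y. They are not values from the model or from the trial, and the function names are illustrative.

```python
def cumulative_excess(cum_screened, cum_control):
    """Excess cumulative cancers in the screened arm at each time point."""
    return [s - c for s, c in zip(cum_screened, cum_control)]


def fraction_remaining(excess):
    """Share of the peak excess still present at each time point."""
    peak = max(excess)
    return [e / peak for e in excess]


# Hypothetical cumulative invasive cancer counts (NOT trial or model data).
years_after_screening = [0, 2.5, 7, 23]
cum_mp = [300, 350, 450, 800]  # mammography arm
cum_uc = [200, 300, 425, 800]  # usual care arm

excess = cumulative_excess(cum_mp, cum_uc)
remaining = fraction_remaining(excess)

for t, e, f in zip(years_after_screening, excess, remaining):
    print(f"t = {t:>4} y: excess = {e:>3}, fraction of peak remaining = {f:.2f}")
```

An overdetection estimate at a given follow-up time is then the remaining excess divided by a chosen denominator, so a curve that is still declining at the time of evaluation yields a larger estimate than one that has fully converged.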

The predicted mortality reduction for five annual screens at t = 10 y is 9% compared to the UC arm. This modest mortality reduction occurs because there were only five screens and the control group received an initial physical exam. Compared to no screening, the model predicts a 16.5% mortality reduction for five screens and a 50% reduction for annual screening between ages 50 and 74 [32].

The choice of denominator in the calculation also affects the estimate. If the number of cancers found in the MP arm (284) had been used rather than the number of screen-detected cancers (213), the overdetection estimate for invasive cancers would have dropped from 48% to 36%.
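The arithmetic behind this comparison can be sketched as follows. The excess count is not quoted in the text; here it is back-calculated from the reported 48% and the 213 screen-detected cancers, which is an assumption made purely for illustration.

```python
def overdetection_pct(excess_cancers, denominator):
    """Overdetection expressed as a percentage of the chosen denominator."""
    return 100.0 * excess_cancers / denominator


screen_detected = 213  # screen-detected invasive cancers, MP arm
all_mp_cancers = 284   # all invasive cancers found in the MP arm
reported_pct = 48.0    # estimate quoted with screen-detected cancers as denominator

# Assumption: recover the implied excess from the reported percentage.
implied_excess = reported_pct / 100.0 * screen_detected

pct_screen_detected = overdetection_pct(implied_excess, screen_detected)
pct_all_cancers = overdetection_pct(implied_excess, all_mp_cancers)

print(f"implied excess cancers: {implied_excess:.1f}")
print(f"denominator = screen-detected cancers: {pct_screen_detected:.0f}%")
print(f"denominator = all MP-arm cancers:      {pct_all_cancers:.0f}%")
```

The same excess of roughly 102 cancers gives 48% against the smaller denominator and 36% against the larger one, so reported overdetection figures are only comparable when the denominator is stated.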

In CNBSS2 (Figure 6), there is similar behavior at −4 y in both trial arms, again with a larger increase for the women receiving mammography. Screening continues for four more exams, again building up an excess in the mammography group (Figure 7) that reaches its maximum at the cessation of screening (0 y). This predicted excess is greater than for CNBSS1 despite there being fewer women in the cohort; however, the timing of the decrease in excess cancers is similar to that for CNBSS1.

At t = 20 y (25 years post-entry), the predicted overdetection is 1% for invasive cancers and 16% for invasive plus in situ cancers compared to 5% and 16% from Baines et al. The predicted mortality reduction for five annual screens is 6% compared to the PE arm and 16.5% compared to no screening.

It should be mentioned that, although the model was calibrated against empirical incidence data, it is not perfect, and the numerical predictions should not be overinterpreted. While it provides a mechanistic picture of the elements of screening, there are limited data available to describe the development of DCIS and the rate and degree of its transition to invasive cancer, so that there is considerable uncertainty in modeling the timing associated with the disappearance of the excess created by in situ cancers. Nevertheless, the modeling points to behavior similar to that seen in both CNBSS trials, in that the excess associated with in situ cancers persists for several decades, suggesting a significant sub-population of nonprogressive in situ cancers.

What are the most plausible explanations for the difference in behavior between the two trials and between the experimental and modeling results for CNBSS1? One possibility is that the women in the MP arm had a higher likelihood of having breast cancer. An anomaly of the CNBSS is that women received a clinical breast examination before they were entered in the register for the trial and, therefore, the suggestion has been put forward that non-randomness could have been introduced at this point, prompted by the findings of the nurse examiner [33]. There is now evidence that this did occur for some of the women, particularly those with advanced cancers at study entry, and this is a credible explanation for the fact that the CNBSS trials were the only RCTs that did not show a mortality benefit associated with routine mammography screening [5]. However, there is a suggestion that the imbalance was not limited to advanced cancers. Because in the prevalence (trial entry) episode of screening in both trials a physical examination of the breast was conducted on women in both trial arms, this provides a point in time where one can compare the number of cancers detectable by palpation between the arms. Although the differences did not reach statistical significance, the authors reported that there were approximately 10% more palpable cancers detected in the MP arm than in the usual care arm of CNBSS1 [23]. In CNBSS2, the excess was 13% [24]. Given that the procedure for detection of palpable cancers was the same in the two study arms and that the same nurses and physicians performed the examinations for both trial arms, such a difference is unexpected, whatever its cause, and would contribute in part to an apparent increase in overdetection.

Another possibility is suggested by the difference in the follow-up in the control groups between CNBSS1 and CNBSS2. In the former, after the initial clinical examination, women in the "Usual Care" arm reported on the incidence of breast cancer primarily through mailed, self-administered questionnaires, while women in the control arm of CNBSS2 received four more annual episodes of clinical examination during the screening epoch and presumably developed a closer relationship and commitment to communicating with study personnel during the subsequent follow-up. A similar situation existed in the study arms of both trials. It is possible that cancer arising after the single screening interaction in the Usual Care arm was under-reported, and this would lead to an overestimate of overdetection. Supporting this possibility is the observation in the publication by Baines et al. that virtually no DCIS was reported in the UC arm for the 20-year period following the screening intervention [21].

The CNBSS trials were heavily criticized due to the poor quality of the mammography. This is supported by the observation that the mean diameter of cancers in the screened group was only 2 mm smaller than that in the control group [22]. This suggests that the lead time afforded by mammography in CNBSS for actively growing cancers was relatively short. Additionally, while the mean diameter of cancers detected in CNBSS was about 2.1 cm, those cancers that were detected at the point where they were nonpalpable had a mean diameter of 1.4 cm. Therefore, most of the cancers detected by screening in the CNBSS were palpable and as mentioned above, there was a tendency for more of those cancers to be in the MP trial arms. In addition, the nonpalpable cancers most likely to be detected with poor quality mammography would be those that grew more slowly and, therefore, possibly would not reach the point of clinical detectability for many years, explaining the long tail in the curve for excess invasive cancers.

The observation that the excess invasive cancers increased over time after the period of screening intervention casts suspicion on the reliability of data from the CNBSS1 trial for estimating overdetection. Documented problems with fair randomization and the tendency toward more palpable cancers in the MP arms of both trials add to these concerns [23,24,33]. The one observation from these trials that does merit further consideration is the persistence of the excess that is associated with screen-detected in situ cancer.
