**1. Introduction**

*People talk about evidence as if it could really be weighed in scales by a blind Justice. No man can judge what is good evidence on any particular subject, unless he knows that subject well.* George Eliot (Mary Ann Evans), Middlemarch

Recent eyewitness accounts [1–3] of the Canadian National Breast Screening Studies (CNBSS) have finally confirmed what was long suspected about the biased allocation of symptomatic women in the screening arm of the trials. Clinical breast examination was performed before allocation at 14 out of 15 study sites, and witnesses confirm that in at least some of those sites, symptomatic women were preferentially placed in the mammography arm of the study. Additionally, symptomatic patients were recruited for mammographic assessment within the screening arm of the studies. This skewed the data, resulting in more late-stage cancers and deaths for women undergoing mammography than for women allocated to the non-mammography arm.

The results of CNBSS have created ongoing doubt about the benefit of screening mammography, particularly in the 40–49 age group, where there was little other research at the time. CNBSS have been used in the formulation of guidelines worldwide for decades, including those of the Canadian Task Force on Preventive Health Care (CTFPHC) [4], the US Preventive Services Task Force (USPSTF) [5], the European Commission [6], the World Health Organization (WHO) [7], and more. Yet, early on, CNBSS received extensive criticism about many aspects of their implementation.

The volunteer-based recruitment for CNBSS was fundamentally different from the remainder of the mammography randomized controlled trials (RCTs), which were population-based. As a result of the volunteer recruitment, there were high levels of contamination in CNBSS. Women allocated to the control arm of the trial, but who had volunteered because they were motivated to screen, were more likely to seek mammography outside the trial [8,9]. Difficulties in recruitment were even acknowledged by one of the studies' authors [10], lending plausibility to the eyewitness accounts of CNBSS accepting referrals of symptomatic patients.

**Citation:** Appavoo, S. How Did CNBSS Influence Guidelines for So Long and What Can That Teach Us? *Curr. Oncol.* **2022**, *29*, 3922–3932. https://doi.org/10.3390/curroncol29060313

Received: 27 March 2022; Accepted: 25 May 2022; Published: 30 May 2022

The study data also pointed to non-random allocation of women between the mammography and usual care arms. In CNBSS1 [11], equal numbers of women were randomized to either mammography or usual care. Twenty-four late-stage cancers were noted in total. Of these, 19 were allocated to mammography and 5 to usual care, nearly four times as many in the mammography arm. As an expected consequence of this overwhelming imbalance, the 7-year follow-up study demonstrated that 38 women had died in the mammography arm, and 28 women had died in the usual care arm. A study of enrollees at the Winnipeg study site demonstrated that eight out of nine enrolled women who had prior billing records for breast cancer (an exclusion criterion) were allocated to the mammography arm of the trial, further suggesting non-random allocation [12].

Several articles were published criticizing the allocation and skewed statistics, including a calculation that the imbalance of late-stage cancers between the mammography and non-mammography arms could have occurred randomly only 3.3 times out of 1000 [13–15]. The eyewitness accounts of flawed randomization confirm what has been evident in the data since early in the studies.
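That probability can be reproduced with a simple binomial tail calculation: if allocation were truly random and balanced, each of the 24 late-stage cancers would be equally likely to fall in either arm, so the chance of a split at least as lopsided as 19 versus 5 follows directly from the binomial distribution. A minimal sketch (the published calculation [13–15] may have used a different exact test):

```python
from math import comb

# Under random, equal allocation, each of the 24 late-stage cancers
# independently lands in either arm with probability 1/2.
n = 24          # total late-stage cancers observed in CNBSS1
observed = 19   # late-stage cancers in the mammography arm

# One-sided binomial tail: P(X >= 19) where X ~ Binomial(24, 0.5)
p = sum(comb(n, k) for k in range(observed, n + 1)) / 2 ** n
print(f"P(>= {observed} of {n} in one arm) = {p:.4f}")  # 0.0033, i.e. 3.3 per 1000
```

A two-sided version of the same test roughly doubles this probability; either way, the observed split lies far outside what chance allocation would plausibly produce.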

Unfortunately, very few RCTs specifically addressed the 40–49 age group, and, therefore, CNBSS1 has had a large influence on breast screening recommendations for women in this age range. The statistical problems are obvious, so why was this study not excluded by the statistics and epidemiology experts writing guidelines? Several factors may be at play and point to a larger problem with the practical application of evidence-based medicine.

#### **2. The Flaws in CNBSS Ignored**

CNBSS were criticized long before the results were published. The problematic implementation was questioned by external reviewers [16] and the studies' own physicists [17]. There were even attempts to explain away the implausible and unprecedented early finding of excess deaths in the screening arm of the trial [18]. No other study among the eight mammography RCTs ever demonstrated this finding. This lack of reproducibility, alone, should have resulted in skepticism about the results.

Early criticism of CNBSS was so widespread that a forensic assessment was published in 1997. This review was limited. Only 3 of 15 sites were assessed, and, importantly, the study staff was not interviewed at that time, despite this step being mandated in the study design [19]. In fact, the authors of this assessment suggested a confirmation bias in their own article, stating that, "We believe that there would be two advantages to publishing the 7-year follow-up data ... First, this criticism of the study would end ... ". Unfortunately, the quality of the forensic assessment was not questioned, and this study appeased those who would use CNBSS for future guidelines [20,21].

Interestingly, a recent modelling study used only CNBSS as the source material, choosing to focus on the outlier study and ignoring the remaining body of RCTs that converged on a significant benefit to screening [22]. The 2016 USPSTF guideline article went so far as to state, "[Malmo Mammographic Screening Trial I and the Canadian National Breast Screening Study 1 and 2] provided the least-biased estimates" [5].

Despite problematic recruitment and glaring statistical imbalances, recognized decades ago, CNBSS continue to influence research, guidelines, and worldwide guideline-based policy around breast screening. In Canada, CTFPHC guidelines strongly influence many provincial Clinical Practice Guidelines, which may, in turn, define patient access to screening through physician referral practices, programmatic screening structure, and billing restrictions.

How does a study that has been plagued by extensive international criticism over its design and skewed data manage to continue influencing recommendations for decades?

#### **3. Evidence-Based Medicine, Evidence Review, and Guidelines Methodology**

As a result of the evidence-based medicine movement, modern guidelines hinge on evidence review. This is performed by specialized bodies that conduct systematic searches for literature, decide which evidence is appropriate to include in the review, and then synthesize the data, often building upon older evidence reviews of the same topic. While this appears to be an ideal and objective way to expertly handle large amounts of research and perform the complicated statistical and epidemiological calculations involved, evidence review has some limitations.

Content experts have little to no substantial influence on evidence review. For example, no radiologist is included on the list of contributors for the 2018 CTFPHC breast screening evidence review [23].

Many members and frequently the chairs of evidence review and guideline bodies are non-physicians, and, thus, clinical experience and context are minimized. The continued inclusion of CNBSS in guideline evidence reviews is a stark example of the peril of minimizing content expert input. Had content experts been allowed appropriate input into the guideline processes, the well-documented imbalance in late-stage cancers and other significant problems with implementation could have been made clear to the reviewers.

Evidence review is expensive, and evidence reviews are built upon older reviews to save time and money. Once an error has been made, however, it may be perpetuated by copying that error into future versions of the review. This is what is known in radiology as "alliterative error", which is the tendency to perpetuate prior errors, particularly when the previous report has been viewed before assessment of the images—or evidence—one has been tasked with assessing [24].

In addition to the evidence review process, guideline methodology and guideline oversight are problematic. While the evidence review tool, GRADE [25], recommends including observational data, the evidence review team and guideline bodies may choose to ignore this, as seen in the 2018 CTFPHC breast screening recommendations [4]. In this guideline, the evidence review included only randomized controlled trials, largely performed between the 1960s and the 1980s, for the calculation of benefits. Decades of more recent screening program data were ignored. The largest observational study of screening program data in the world, known as the Pan Canadian Study, was published in 2014 [26]. It demonstrated an overall mortality benefit of 40% for women attending screening; in the 40–49 age group, the benefit was even higher, at 44%. This study is missing from the 2018 CTFPHC breast screening guideline references, and it is even absent from the list of excluded evidence [27]. It is difficult to explain the fact that landmark Canadian evidence is missing from a Canadian evidence review, but the near-complete absence of content experts from the evidence review process may contribute to this oversight.

The AGREE II [28] guideline development and appraisal instrument recommends the inclusion of content experts and patients as advisors on guideline panels, as do many other guideline methodology recommendations [29,30]. Again, however, oversight of the actual guideline process is lacking, and the systematic exclusion of content experts and patients from panels such as CTFPHC's has largely gone unnoticed.
