1. Introduction
As a major and increasing worldwide public health burden, breast cancer calls for the identification of early molecular alterations in breast tissue that could inform risk-tailored early diagnosis and effective primary prevention strategies. DNA methylation, the covalent addition of a methyl group to the cytosine of cytosine-phosphate-guanine (CpG) dinucleotides, reflects genetic, environmental, and stochastic events that contribute to inter-individual variation in gene expression and, ultimately, to variation in the risk of common complex diseases such as breast cancer [1,2]. Indeed, widespread DNA methylation alterations that become enriched with breast cancer progression have been identified in normal breast tissue adjacent to cancer, suggesting that DNA methylation alterations predate the emergence of breast cancer [3].
Demonstrating a mechanistic link between DNA methylation patterns and breast cancer occurrence remains a considerable challenge because DNA methylation is cell type-specific and the examined tissues are heterogeneous in their cell type composition. Such a link should be supported by the identification of tissue-specific DNA methylation changes in normal breast tissue before breast cancer occurs [1]. Many DNA methylation studies have been conducted on blood samples, and the methylation marks they detected were not consistent across studies, even when these marks were validated and reproduced in independent datasets within the same study. These inconsistencies can mainly be explained by methodological biases [4], but they may also reflect the limitations of blood samples for detecting methylation marks of tissue-specific cancers. The only feasible approach used to date to identify causative molecular alterations in normal breast tissue is to compare normal tissue from healthy individuals with normal tissue adjacent to cancer, and only three studies have applied this approach to detect methylation marks in normal breast tissue [4]. However, this approach is compromised by cancer field effects [3]: genetic and epigenetic field effects in histologically normal-appearing tissue adjacent to cancer have been reported as far as 4 cm from primary breast tumors [5,6]. These molecular alterations may reflect both precancerous alterations that led to breast cancer development and alterations induced by the microenvironment of the adjacent developing cancer [7].
To cancel out field effects secondary to the cancer environment, we used a nested case–control design to compare normal breast tissue adjacent to primary tumors between breast cancer patients who developed a contralateral breast cancer (i.e., a second primary breast cancer in the opposite breast) and those who did not. The rationale behind this design is that both breasts of the same patient presumably bear the same precancerous DNA methylation alterations, which reflect the complex interplay between the genetic and environmental factors associated with her individual risk of developing a primary breast cancer in either breast. To further confirm that the detected DNA methylation alterations predate a second primary breast cancer occurrence, we replicated our analyses in two independent sets of case–control pairs in which DNA from normal breast tissue or blood was obtained before or at the time of a first primary breast cancer.
2. Materials and Methods
2.1. Study Design and Population
We conducted a nested case–control study based on a cohort of 757 patients diagnosed between 2000 and 2007 with a primary invasive hormone receptor-positive and non-metastatic breast cancer at a breast cancer reference center, the “Centre des maladies du sein du CHU de Québec”. Biological characteristics of tumors were extracted from pathology reports. Demographic and clinical data collected at diagnosis were extracted from medical records and entered into a database by trained nurses and registrars. Women were eligible if they had no previous diagnosis of cancer other than non-melanoma skin cancer and did not receive any treatment prior to surgery. Using an incidence density sampling scheme, 20 patients diagnosed with a contralateral breast cancer (in situ or invasive) at least 12 months after their first breast cancer (cases) were matched (1:1) with 20 patients who did not develop a contralateral breast cancer (controls). Matching variables were the year of surgery (±2 years), age (±5 years), menopausal status, family history of breast cancer (yes/no), histologic type (ductal vs. lobular) of the primary tumor, human epidermal growth factor receptor 2 (HER2) status of the primary tumor, and hormone therapy (yes/no). For cases and controls, normal breast tissue was collected from breast surgery specimens.
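The incidence density (risk-set) sampling described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the `Patient` record, its field names, and the choice of one seeded random control per case are hypothetical; only the matching tolerances and the 12-month minimum delay for cases come from the text. Under strict risk-set sampling a future case may serve as a control, so the hypothetical `never_cases_only` flag restricts controls to patients who never developed a contralateral cancer, as the study's controls did.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    pid: str
    surgery_year: int
    age: float
    menopausal: bool
    family_history: bool
    ductal: bool                 # histologic type: ductal (True) vs. lobular (False)
    her2_positive: bool
    hormone_therapy: bool
    cbc_time: Optional[float]    # years from first diagnosis to contralateral cancer; None if none
    follow_up: float             # years of observed follow-up


def matches(case: Patient, cand: Patient) -> bool:
    """Matching criteria from the text: surgery year +/-2, age +/-5, exact on the rest."""
    return (abs(case.surgery_year - cand.surgery_year) <= 2
            and abs(case.age - cand.age) <= 5
            and case.menopausal == cand.menopausal
            and case.family_history == cand.family_history
            and case.ductal == cand.ductal
            and case.her2_positive == cand.her2_positive
            and case.hormone_therapy == cand.hormone_therapy)


def at_risk(cand: Patient, t: float) -> bool:
    """Still under follow-up and free of contralateral cancer at time t."""
    no_event_yet = cand.cbc_time is None or cand.cbc_time > t
    return no_event_yet and cand.follow_up >= t


def sample_controls(cohort: List[Patient], min_lag: float = 1.0,
                    never_cases_only: bool = True, seed: int = 0) -> List[Tuple[str, str]]:
    """Draw one matched control per case from the risk set at the case's event time."""
    rng = random.Random(seed)
    cases = sorted((p for p in cohort if p.cbc_time is not None and p.cbc_time >= min_lag),
                   key=lambda p: p.cbc_time)
    used, pairs = set(), []
    for case in cases:
        risk_set = [p for p in cohort
                    if p.pid != case.pid and p.pid not in used
                    and at_risk(p, case.cbc_time) and matches(case, p)
                    and not (never_cases_only and p.cbc_time is not None)]
        if risk_set:
            control = rng.choice(risk_set)   # 1:1 matching, sampled without replacement
            used.add(control.pid)
            pairs.append((case.pid, control.pid))
    return pairs
```

Sampling controls from the risk set at each case's event time, rather than from the whole cohort, is what makes the design "nested": each control was still under observation and cancer-free when her matched case was diagnosed.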
Two additional sets of case–control pairs were used to select differentially methylated sites that predate a second primary breast cancer occurrence. The first set consisted of four breast cancer patients diagnosed with a first invasive hormone receptor-positive and non-metastatic breast cancer (cases) drawn from the same cohort described above, and four women diagnosed with a benign breast lesion (controls) drawn from the tissue biobank of the “Centre des maladies du sein du CHU de Québec”. Women were matched (1:1) for the year of surgery (±2 years) and age (±5 years). For cases and controls, normal breast tissue was collected from breast surgery specimens prior to any other treatment.
The second set consisted of six women with high mammographic density (>65%) who eventually developed an invasive breast cancer (cases) and six women with low mammographic density (<15%) who had not developed breast cancer by the time their matched case did (controls). These women were drawn from a cohort of 737 women who underwent screening mammography at the “Clinique radiologique Audet” (Québec, QC, Canada) between February and December 2001 [8]. Anthropometric data (weight and height) were measured at enrollment by a qualified research nurse, participant characteristics were collected through standardized questionnaires administered by telephone interview, and clinical data were extracted from medical records. Cases and controls were matched (1:1) for age (±3 years), family history of breast cancer (yes/no), body mass index (BMI categories 18.5 to <25 and 25 to <30 kg/m²), number of full-term pregnancies, and breast biopsy (yes/no). Blood was collected at the time of mammography, on average 5.6 ± 1.7 years before breast cancer occurrence in cases (median 5.5 years; range 3.4 to 8.1 years).
All participants provided written informed consent. The study protocol was reviewed and approved by the research ethics committee of the CHU de Québec-Université Laval Research Center. The data that support the findings of this study are available upon reasonable request from the corresponding author (C.D.). The data are not publicly available due to legal restrictions to respect research participant privacy and consent.
The main analysis was designed to identify differentially methylated CpG sites in normal breast epithelium that are causally associated with breast cancer occurrence while canceling out cancer field effects, by comparing normal breast tissue from two groups of patients who were both exposed to similar cancer field effects. The rationale behind this design is that, once cancer field effects have been canceled out, normal breast epithelium from the affected breast and from the unaffected contralateral breast harbors the same epigenetic marks in each patient, because both breasts have been exposed to the same genetic and environmental factors. In other words, this design is comparable to sampling the unaffected (contralateral) breast and following patients for the development of a primary breast cancer in that breast. Matching for factors known to be associated with contralateral breast cancer occurrence made this setting similar to obtaining normal breast tissue from women who had never had breast cancer and comparing those who develop a primary breast cancer with those who do not.
Beyond a simple validation of our findings in the exact same study design, we chose to select differentially methylated CpG sites that replicate in a more traditional study design, which is less robust because of confounding by cancer field effects. In secondary dataset #1, we compared women who had developed a first primary breast cancer with women who had not (women with a benign breast lesion), which is the setting “artificially” created by the robust design of the main dataset.
Finally, secondary dataset #2 was chosen to select differentially methylated CpGs that could also be detected in blood-derived DNA, i.e., methylation marks that may have been induced early during development and propagated soma-wide, and that could be useful as non-invasive biomarkers for breast cancer screening. Here, we compared women who developed a primary breast cancer with women who did not, using blood samples prospectively collected several years before breast cancer occurrence in cases. These women underwent screening mammography at the time of blood collection, and those who developed breast cancer during follow-up had higher baseline breast density, a known risk factor for breast cancer, than those who did not.
Thus, our three datasets compared women who developed a breast cancer (first primary or second primary breast cancer) to women who did not develop breast cancer, using three different strategies.
2.2. DNA Methylation Measurement
For breast tissue samples, normal breast epithelium located at least 1.0 cm from the primary tumor of cases and controls was identified on the corresponding hematoxylin-eosin (H&E)-stained slides. Ten to fifteen 1.0 mm cores with at least 75% epithelial cell content were extracted from formalin-fixed paraffin-embedded (FFPE) tissue blocks and used to build a tissue microarray (TMA) block for each patient. TMA blocks were serially sectioned at 10.0 μm, and cellular content was verified on the first, every 10th, and the last H&E-stained section. DNA was extracted from each patient’s TMA sections with a column-based method, using GeneRead DNA FFPE deparaffinization solution (Qiagen, Mississauga, ON, Canada) and the QIAamp DNA FFPE kit (Qiagen, Mississauga, ON, Canada) for subsequent extraction steps. Deparaffinization was performed twice to ensure complete paraffin removal, and proteinase K digestion was carried out at 56 °C in ATL buffer for three days, with 20 µL of proteinase K added every 24 h.
For blood samples, DNA was extracted from buffy coats using the Gentra Puregene DNA extraction kit (QIAGEN Inc., Canada) following the manufacturer’s protocol.
Quantification of DNA methylation was carried out at McGill University and Génome Québec Innovation Centre (Montreal, QC, Canada) using the Infinium HumanMethylation450 (HM450k) BeadChip (Illumina Inc., San Diego, CA, USA), after bisulfite conversion, Infinium FFPE quality control, and DNA restoration, according to the manufacturer’s instructions. The HM450k has been extensively validated and provides reliable coverage of 485,512 CpG sites across 99% of RefSeq genes and 96% of CpG islands in the human genome [9]. To test for potential batch effects, eight samples were replicated between or within batches. Hybridized and processed arrays were scanned using the Illumina iScan system (Illumina Inc., San Diego, CA, USA) to produce IDAT files containing raw probe intensities.
2.3. Data Preprocessing and Statistical Analyses
Raw methylation data preprocessing and statistical analyses were performed using R software version 3.6.2 [10] and Bioconductor packages [11]. The same preprocessing steps were performed separately for breast tissue samples and blood samples.
Data from IDAT files were read using the minfi package [12]. Quality control plots for bisulfite conversion, extension, and hybridization were generated using the minfi and ENmix [13] packages. The following probes were filtered out: probes that failed (detection p-value > 0.01) in one or more samples; probes with single-nucleotide polymorphisms (SNPs) at the CpG site or at the single-base extension of the measured methylation locus; cross-reactive probes [14]; probes with multimodal methylation distributions identified using the ENmix package [13]; and probes from the X and Y chromosomes. X-chromosome probes were excluded because analyses of sex-chromosome data carry a higher probability of both type 1 and type 2 errors than analyses of autosomal data [15].
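The probe-exclusion rules above amount to a union of boolean masks. The Python sketch below is illustrative only (the study used R/Bioconductor): it assumes the detection p-values are available as a probes × samples array, such as the matrix returned by minfi's `detectionP()`, and that the SNP, cross-reactive, and multimodal flags have already been computed as boolean vectors. The function and argument names are hypothetical.

```python
import numpy as np


def keep_probes(det_p, chrom, snp_flag, cross_reactive, multimodal, threshold=0.01):
    """Return a boolean mask of probes that pass all exclusion rules.

    det_p: (n_probes, n_samples) detection p-values.
    chrom: (n_probes,) chromosome labels such as "chr1" or "chrX".
    snp_flag, cross_reactive, multimodal: (n_probes,) boolean exclusion flags.
    """
    det_p = np.asarray(det_p, dtype=float)
    failed = (det_p > threshold).any(axis=1)          # failed in one or more samples
    sex_chrom = np.isin(np.asarray(chrom), ["chrX", "chrY"])
    drop = (failed | sex_chrom
            | np.asarray(snp_flag, dtype=bool)
            | np.asarray(cross_reactive, dtype=bool)
            | np.asarray(multimodal, dtype=bool))
    return ~drop
```

Because a probe is dropped if it fails in any single sample, the mask is computed once across all arrays of a given tissue type, consistent with the separate preprocessing of breast tissue and blood samples.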
The data-driven separate normalization method from the wateRmelon package was used for background adjustment and between-array normalization [16], and the regression on correlated probes method from the ENmix package was used for probe type bias adjustment [17]. Multidimensional scaling (MDS) plots revealed no obvious batch, chip, or slide effect and no outlier sample. The intra- and inter-batch replicate samples (n = 4) were then removed from the analyses. Beta-values (β) were logit-transformed into M-values, M = log2[β/(1 − β)], for statistical analyses. Of the 485,512 CpG sites on the array, 409,741 autosomal CpG sites were included in the analyses of breast tissue samples (Figure S1) and 429,014 in the analyses of blood samples (Figure S2). In total, 40 samples (20 case–control pairs) were included in the main analysis of normal breast tissue samples (patients who did vs. did not develop a contralateral breast cancer), eight samples (four case–control pairs) in the secondary analysis of normal breast tissue samples (patients with a primary breast cancer vs. patients with a benign breast lesion), and twelve samples (six case–control pairs) in the analysis of blood samples (women who did vs. did not develop breast cancer).
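The beta-to-M conversion is a base-2 logit. The small Python/NumPy sketch below shows both directions of the transform; the clipping constant `eps` is an assumption added here to keep M finite when β is exactly 0 or 1 and is not a parameter reported in the text.

```python
import numpy as np


def beta_to_m(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)); beta is clipped away from 0 and 1 to keep M finite."""
    b = np.clip(np.asarray(beta, dtype=float), eps, 1.0 - eps)
    return np.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: beta = 2**M / (2**M + 1)."""
    m = np.asarray(m, dtype=float)
    return 2.0 ** m / (2.0 ** m + 1.0)


# Beta-values of 0.2, 0.5, and 0.8 map to M-values of -2, 0, and 2
# (up to floating-point rounding).
m_vals = beta_to_m([0.2, 0.5, 0.8])
```

M-values are unbounded and closer to homoscedastic than beta-values, which is why they are preferred for linear modeling, while beta-values remain easier to interpret as methylation proportions.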
For each set, global methylation differences between cases and controls were assessed by comparing mean beta-values with the Wilcoxon signed-rank test, both across all CpG sites and by location relative to CpG islands and genes.
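A minimal Python sketch of this paired global comparison, assuming a probes × samples beta-value matrix in which case column `case_cols[i]` is matched to control column `ctrl_cols[i]`, plus an optional per-probe region label (e.g. island vs. open sea). It uses `scipy.stats.wilcoxon`; the helper name and input layout are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon


def global_methylation_test(beta, case_cols, ctrl_cols, region_labels=None):
    """Paired Wilcoxon signed-rank test on per-sample mean beta-values.

    Returns {group: (median paired difference, p-value)} for all probes
    and, optionally, for each region label.
    """
    beta = np.asarray(beta, dtype=float)
    groups = {"all": np.ones(beta.shape[0], dtype=bool)}
    if region_labels is not None:
        labels = np.asarray(region_labels)
        for lab in np.unique(labels):
            groups[str(lab)] = labels == lab
    results = {}
    for name, mask in groups.items():
        sample_means = beta[mask].mean(axis=0)          # one mean beta per sample
        diffs = sample_means[case_cols] - sample_means[ctrl_cols]
        p_value = wilcoxon(sample_means[case_cols], sample_means[ctrl_cols]).pvalue
        results[name] = (float(np.median(diffs)), float(p_value))
    return results
```

With very few pairs the exact signed-rank test has a floor on attainable p-values (with four pairs, a two-sided p-value cannot fall below 0.125), so in the smallest secondary set a global difference could not reach p < 0.05 regardless of its size.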
Probe-wise differential methylation analysis of M-values was performed with robust linear models from the limma package, and a robust empirical Bayes method was used to compute moderated paired t-statistics and associated p-values for each CpG site. For blood samples, white blood cell type proportions were estimated within the minfi package using the method described by Houseman et al. [18]. Confounding by cell type proportions was identified, and differentially methylated CpGs associated with cell type proportions were excluded from further analyses. To assess biological plausibility, differentially methylated CpGs with Benjamini–Hochberg-adjusted p-values (FDR q-values) < 0.05 were selected for functional annotation (Gene Ontology, KEGG) using the gometh function of the missMethyl package [19]. CpG sites passing the FDR threshold in the main nested case–control analysis of normal breast tissue samples were considered replicated in the two secondary sets (normal breast tissue and blood samples) if their nominal p-value in these secondary analyses was < 0.05 with the same direction of association. CpG sites passing the strict Bonferroni correction in the main analysis (nominal p-value < 0.05/409,741 ≈ 1.22 × 10⁻⁷) and those replicated in the secondary analyses were further compared with the Cancer Genome Anatomy Project (CGAP) SAGE database, using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8, to select differentially methylated genes whose methylation changes are consistent with their differential expression in breast cancer.
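The testing and selection rules above can be illustrated with a short NumPy/SciPy sketch. The moderated paired t-statistic follows the empirical Bayes form popularized by limma, in which each probe's variance of paired differences is shrunk towards a prior value s0² carrying d0 prior degrees of freedom. Here d0 and s0² are supplied by hand as hypothetical values, whereas limma estimates them from the data (robustly, in this study), so this is not a reimplementation of limma. The Benjamini–Hochberg adjustment, the Bonferroni threshold (0.05/409,741), and the replication rule (nominal p < 0.05 with the same sign) follow the text; all function names are hypothetical.

```python
import numpy as np
from scipy import stats


def moderated_paired_t(case_m, ctrl_m, d0=4.0, s0_sq=0.05):
    """Moderated paired t-test per probe (rows); d0 and s0_sq are assumed, not estimated."""
    diff = np.asarray(case_m, float) - np.asarray(ctrl_m, float)   # (n_probes, n_pairs)
    n = diff.shape[1]
    d = n - 1                                        # residual degrees of freedom per probe
    s_sq = diff.var(axis=1, ddof=1)
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # variance shrunk towards the prior
    t = diff.mean(axis=1) / np.sqrt(s_tilde_sq / n)
    p = 2.0 * stats.t.sf(np.abs(t), df=d0 + d)
    return t, p


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (FDR q-values)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def bonferroni_threshold(alpha, n_tests):
    """Per-test nominal threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def replicated(passes_main, t_main, p_secondary, t_secondary, alpha=0.05):
    """Main-analysis hit, nominal p < alpha in the secondary set, and same sign."""
    return (np.asarray(passes_main, bool)
            & (np.asarray(p_secondary) < alpha)
            & (np.sign(t_secondary) == np.sign(t_main)))
```

Setting d0 = 0 removes the shrinkage and recovers the ordinary paired t-test; larger d0 borrows more strength across probes, which stabilizes variance estimates when only a few pairs are available.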
To identify concordant differentially methylated regions of several consecutive CpG sites (consecutive sites less than 1000 nucleotides apart), region-level differential methylation analysis with Benjamini–Hochberg correction for multiple comparisons was performed using the DMRcate package [20]. Regions with a Stouffer p-value < 0.05, a maximum difference > 0.05, and at least two CpG sites were selected. Genes identified by the region approach were compared with those identified by the individual CpG site approach.
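The region-selection rule can be sketched in Python as follows. This is not DMRcate's algorithm, which smooths per-CpG statistics with a Gaussian kernel before calling regions; it only illustrates grouping consecutive CpGs less than 1000 nucleotides apart and then applying the stated filters, combining per-CpG p-values with Stouffer's method via `scipy.stats.combine_pvalues`. Function names and inputs (per-CpG p-values and beta-value differences) are assumptions.

```python
import numpy as np
from scipy.stats import combine_pvalues


def group_regions(chrom, pos, max_gap=1000):
    """Group CpGs into runs where consecutive sites on a chromosome are < max_gap bp apart."""
    chrom = np.asarray(chrom)
    pos = np.asarray(pos)
    _, chrom_codes = np.unique(chrom, return_inverse=True)
    order = np.lexsort((pos, chrom_codes))      # sort by chromosome, then position
    regions, current = [], [int(order[0])]
    for prev, idx in zip(order[:-1], order[1:]):
        if chrom[idx] == chrom[prev] and pos[idx] - pos[prev] < max_gap:
            current.append(int(idx))
        else:
            regions.append(current)
            current = [int(idx)]
    regions.append(current)
    return regions


def select_regions(regions, pvals, beta_diff, alpha=0.05, min_diff=0.05, min_cpgs=2):
    """Keep regions with >= min_cpgs sites, Stouffer p < alpha, and max |diff| > min_diff."""
    pvals = np.asarray(pvals, dtype=float)
    beta_diff = np.asarray(beta_diff, dtype=float)
    selected = []
    for reg in regions:
        if len(reg) < min_cpgs:
            continue
        _, p_stouffer = combine_pvalues(pvals[reg], method="stouffer")
        max_diff = float(np.max(np.abs(beta_diff[reg])))
        if p_stouffer < alpha and max_diff > min_diff:
            selected.append((reg, float(p_stouffer), max_diff))
    return selected
```

Requiring both a combined significance threshold and a minimum effect size keeps regions that are statistically consistent but show only negligible methylation differences from being reported.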
4. Discussion
The present study aimed to identify normal breast tissue methylation patterns that may predispose to breast cancer development, using a robust study design unprecedented in previous breast cancer epigenome-wide association studies. To cancel out field effects, our main nested case–control analysis compared normal breast tissue adjacent to primary tumors between breast cancer patients who did and did not develop a contralateral breast cancer. We identified 7315 individual CpG sites with an FDR q-value < 0.05 and 52 CpG sites passing the strict Bonferroni threshold (nominal p-value < 1.22 × 10⁻⁷), 43 of which mapped to known genes involved in metabolic diseases. Pathway analysis of these 43 distinct genes identified six enriched pathways (p-value < 0.01) involving fatty acid metabolic processes.
One gene, LHX2, harbored significant methylation changes at two different CpG positions, and 15 genes harbored significant methylation changes consistent with their differential expression in breast cancer. Of these, LHX2, TFAP2B, JAKMIP1, and SEPT9 were also included in significantly differentially methylated regions. The LHX2 gene codes for the LIM homeobox 2 protein, a transcription factor downstream of p63 and NF-κB and upstream of Wnt/β-catenin, Bmp, and Shh [23] that plays a critical role in the epithelial–mesenchymal transition in normal and cancerous breast epithelial cells [24]. This gene has been shown to harbor aberrant methylation in primary breast tumors [25]. The TFAP2B gene codes for transcription factor AP-2 beta, a sequence-specific DNA-binding protein that has been recognized as an oncogene mediating cancer cell proliferation, apoptosis, invasion, and migration via the COX-2 signaling pathway in vitro and in vivo [26]. TFAP2B is also expressed in breast tissue, where it is thought to coordinate HER2 and ER [27], and it has been associated with breast cancer prognosis [28]. The JAKMIP1 gene codes for Janus kinase and microtubule interacting protein 1, which has been shown to be highly expressed in tumor samples, where its upregulation enhances cancer cell proliferation via the Wnt/β-catenin pathway [29]. The SEPT9 gene codes for Septin 9, a protein involved in cytokinesis and cell cycle control that has been implicated in early breast cancer development [30], and SEPT9 methylation has been detected in breast cancer tissue [31].
To further select DNA methylation alterations that predate a second primary breast cancer occurrence, we used two independent sets of case–control pairs in which DNA from normal breast tissue or blood was obtained before or at the time of a first primary breast cancer, and thus before any second primary breast cancer. Of the 7315 individual CpG sites identified in the main nested case–control analysis, six were also differentially methylated, with the same direction of association, in both secondary analyses, and five of these mapped to known reference genes. Of these, three genes, namely POM121L2, KCNQ1, and CLEC4C, harbored significant methylation changes consistent with their differential expression in breast cancer. The POM121L2 gene codes for POM121 transmembrane nucleoporin like 2, which has been shown to be upregulated in triple-negative breast cancer [32]. KCNQ1 codes for the potassium voltage-gated channel subfamily Q member 1, which has been shown to play important physiological roles in the mammary epithelium [33] and has been suggested to act as a tumor suppressor and regulator of the epithelial–mesenchymal transition in colorectal cancer [34,35]. CLEC4C codes for a lectin-type cell surface receptor that may play a role in antigen capture by dendritic cells, inflammation, and immune response, and has been shown to be upregulated in triple-negative breast cancer [36].
Many epigenome-wide studies have investigated the association between DNA methylation and breast cancer risk using blood-derived DNA and the HM450k BeadChip, whereas only three have measured DNA methylation in breast tissue [4]. These studies identified between 0 and 2761 differentially methylated CpGs, none of which overlapped across studies, and they suffered from major methodological issues, especially incomplete control of confounding and suboptimal preprocessing methods [4]. Nevertheless, four of the differentially methylated CpGs detected in our main analysis were also differentially methylated in the same direction (all hypomethylated in breast cancer) in previous epigenome-wide studies: cg07180460 (ZSWIM6), cg22731164 (GPR176), and cg18726036 (FKBP5) in a study of blood DNA methylation from the Sister Study [37], and cg02168584 (DLX2-AS1) in a study of genetically predicted DNA methylation in patients from the Breast Cancer Association Consortium [38]. All four genes have been shown to be dysregulated in breast cancer cell lines [39,40,41,42].
Taken together, our findings support the hypothesis that detectable methylation differences in cancer-related genes in normal breast tissue predate the occurrence of breast cancer. Some of these methylation changes were also detectable in blood DNA, suggesting that they may have been induced early during development and propagated soma-wide [2,43] and that they could be useful as biomarkers for non-invasive screening to identify women at increased risk of developing breast cancer. Methylation changes specific to normal breast tissue may have occurred during adulthood as a result of ageing and lifetime exposure to known and unknown risk factors [2,43]; they could help identify these unknown risk factors and guide targeted interventions based on epigenetic agents to prevent breast cancer occurrence [1].
Using a novel study design, we were able to assess methylation changes in normal breast epithelial tissue while minimizing the risk of confounding by cancer field effects. The main strengths of the study include the use of conventional epidemiological approaches to control for selection bias (nested case–control design) and confounding bias (matching for breast cancer risk and prognostic factors), two important drawbacks of previous epigenome-wide DNA methylation studies of breast cancer [36]. We selected up-to-date, recommended data preprocessing methods and workflow a priori, which prevents inflation of the false-positive rate resulting from data-driven selection of preprocessing methods. In addition, we conducted both site-specific and DMR analyses, and we replicated the analyses in two independent datasets. The main limitation of the study is its relatively small sample size, which may have limited the detection of genuine methylation differences (i.e., low statistical power). However, by combining appropriate data preprocessing methods with a doubly robust statistical modeling approach (robust linear models with robust empirical Bayes moderation), which reduces the false-negative rate, we were able to detect more differentially methylated CpG sites than larger studies [36].
While robust and promising, our results need to be validated in other populations and with other DNA methylation measurement methods. Epigenome-wide DNA methylation methods are particularly suitable for hypothesis generation because they capture methylation at many sites simultaneously across the entire genome, which makes them less prone to bias than candidate-gene methylation studies [44]. The next step would be to validate the differentially methylated sites and related genes detected here with a different measurement method, such as a PCR-based method, in a candidate-gene methylation study. Transcriptional or protein expression analyses should then be performed to confirm the functional impact of the detected methylation differences and their association with breast cancer occurrence [45].