Article

Methylation Risk Score Modelling in Endometriosis: Evidence for Non-Genetic DNA Methylation Effects in a Case–Control Study

1 Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
2 Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
3 Department of Pediatrics, Division of Neonatology, University of California San Francisco, San Francisco, CA 94143, USA
4 Center for Reproductive Sciences, Department of Obstetrics, Gynecology & Reproductive Sciences, University of California San Francisco, San Francisco, CA 94143, USA
5 Australian Women and Girls’ Health Research Centre, University of Queensland, Brisbane, QLD 4006, Australia
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(8), 3760; https://doi.org/10.3390/ijms26083760
Submission received: 6 February 2025 / Revised: 7 April 2025 / Accepted: 9 April 2025 / Published: 16 April 2025
(This article belongs to the Special Issue Molecular Studies of Endometriosis and Associated Diseases)

Abstract

Endometriosis is a chronic gynaecological disease characterised by endometrial-like tissue found outside the uterus. While several studies have reported strong evidence of a genetic contribution to the disease, studies on the environmental impact on endometriosis are limited. DNA methylation (DNAm) can be influenced by both genetic and environmental factors and serves as a useful biological marker of the effects of genetic and environmental exposures on complex diseases. This study aimed to develop a methylation risk score (MRS) for endometriosis to increase the power to detect DNAm signals associated with the disease and to enhance our understanding of its pathogenesis. Endometrial methylation and genotype data from 318 controls and 590 cases were analysed. MRSs were developed using several different models. MRS performance was evaluated by splitting samples into training and test sets based on independent cohort institutions and calculating the area under the receiver operating characteristic curve (AUC). The maximum AUC, obtained from the best-performing MRS derived from 746 DNAm sites, was 0.6748. The classification performance of MRS and polygenic risk score (PRS) combined was consistently higher than that of PRS alone. This study demonstrates that endometriosis is associated with DNAm signals that are independent of common genetic variants.

1. Introduction

Endometriosis is a chronic gynaecological disease characterised by endometrial-like tissue found outside the uterus [1]. An estimated 6–11% of reproductive-aged women worldwide are affected by endometriosis [2]. Patients can experience pelvic pain and/or infertility, which negatively impact their daily lives, personal relationships, and livelihoods and impose a significant economic burden [1]. Heritability estimates from twin studies indicate that genetic factors account for approximately 50% of the variation in endometriosis, with the remainder attributed to environmental factors [3,4]. While several studies have explored the link between common genetic variants and endometriosis and provided compelling evidence of a genetic contribution to the disease, studies on the environmental impact on endometriosis have been limited [5,6].
DNA methylation (DNAm) is known to be influenced by both genetic and environmental factors and, therefore, may be a useful biological marker and mediator of the effects of genetic and environmental exposures on complex diseases [7,8]. Evidence of the genetic influence on DNAm patterns is reflected in the estimated heritability of DNAm, which ranges from 0.1 to 0.3, and the discovery of genetic variants that affect DNAm levels, known as DNA methylation quantitative trait loci (mQTLs) [9,10]. Studies have also reported associations between various environmental exposures and changes in DNAm patterns, including effects of socioeconomic status [11], early life environment [12], traumatic events [13], pollutants [14], nutrition [15], and physical activity [16,17].
A link between DNAm and endometriosis has been reported in several studies, including hypermethylation of the HOXA10 and progesterone receptor-B (PR-B) promoters in the endometrium of women with endometriosis [18,19,20,21]; the latter provides a possible explanation for progesterone resistance and the reduced level of PR-B in endometriosis [19,20,21]. The promoters of steroidogenic factor-1, a transcription factor involved in oestrogen biosynthesis, and of oestrogen receptor-beta have been reported to be hypomethylated in endometriotic cells [22,23]. More recently, with the emergence of genome-wide methylation technology, DNAm sites mapped to 10 genes were shown to correlate with changes in gene expression in the endometrium, contributing to the development or progression of endometriosis [24]. Dyson et al. [25] also identified 403 genes with significantly different methylation patterns between healthy human endometrial and endometriotic stromal cells. However, because of the small sample sizes of these studies, further studies are needed to verify the results.
A recent study comparing DNAm profiles in the endometrium of patients with endometriosis and controls without disease identified a significant difference in DNAm profile between stage III/IV endometriosis and controls, as well as networks of methylation sites associated with disease risk [26]. Notably, the variance in endometriosis captured by DNAm was estimated to be 15.4%, and methylation differences between cases and controls ranged from 0.03 to 0.08, suggesting that many methylation signals with small effects may contribute to disease. Given these effect sizes, the study had limited power to reliably detect differences in individual methylation signals between cases and controls [26].
A methylation risk score (MRS) is a numerical value that quantifies an individual’s risk for a particular disease or trait based on their DNAm profile. Applications of MRSs include studying associations between DNAm and a phenotype [27], identifying biomarkers of environmental exposures [28,29], interaction analyses [30,31,32], mediation analyses [27], and predicting an individual’s risk of developing a disease or their treatment outcomes [33,34]. An MRS is particularly useful for detecting the combined association of many individual DNAm sites with a trait, especially when there is insufficient power to achieve statistical significance for individual loci [35]. Several studies have leveraged MRSs to validate findings from methylome-wide association studies, including for type 2 diabetes [36] and neurodegenerative disorders such as Alzheimer’s disease, amyotrophic lateral sclerosis, and Parkinson’s disease [37].
We aimed to develop an MRS for endometriosis using endometrial methylation data from 1074 individuals to detect DNAm signals associated with endometriosis and to investigate the unique non-genetic contribution of DNAm to endometriosis risk. We hypothesise that DNAm contributes to endometriosis risk independently of genetic variation and that an MRS derived from endometrial methylation data can effectively identify DNAm signals associated with the disease.
This study provides additional evidence that DNAm, independent of common genetic variants, is associated with endometriosis and emphasises the need for future research with larger sample sizes to explore this relationship further.

2. Results

2.1. Factors Contributing to Variation in Endometrial DNAm

To identify and address confounders contributing to variation in endometrial DNAm, the 908 samples retained after methylation quality control were used to test for associations between potential covariates and both endometriosis status and DNAm principal components (PCs). Age, the institution where the samples were processed, and genetic ancestry were significantly associated with endometriosis status (p-value < 0.05) (Table 1). Institution was also significantly associated with all top 15 DNAm PCs (Table S1), and when PCs were plotted, individuals partially clustered by institution (Figure S1). Thus, age and institution were used as covariates in subsequent analyses. Additionally, genetic PCs were included as covariates during MRS development to account for differences in genetic ancestry and population structure. Surrogate variable (SV) analysis was also conducted to remove batch effects and hidden sources of variation not accounted for by the selected covariates.

2.2. Variation in Endometriosis Status Captured by DNAm in Endometrium Independent of Common Genetic Variants

To determine whether DNAm contributes to the variation in endometriosis status among participants, we estimated the proportion of variance in endometriosis status captured by DNAm using omics residual maximum likelihood (OREML) analysis, which estimates variance components from an omics relationship matrix (ORM) generated from DNAm beta values of the endometrial tissue samples. To estimate the proportion of variance captured by DNAm independently of common genetic variants, we fitted the genomic relationship matrix (GRM) and the ORM simultaneously in the OREML model. Estimates from models with and without covariates are reported in Table 2 to show the impact of covariates on the relationship between endometriosis and DNAm in the endometrium.
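The ORM can be illustrated with a short sketch. This section does not specify how the ORM was constructed, so the code below assumes a common formulation: each probe's beta values are standardised to mean 0 and variance 1 across individuals, and the ORM is the average cross-product over probes, A = ZZ'/m. The function names are hypothetical, and the pure-Python loops favour readability over the speed needed for hundreds of thousands of probes.

```python
import math


def standardise_probes(beta):
    """Standardise each probe (column) to mean 0 and variance 1 across individuals.

    beta: list of rows, one per individual, each a list of probe beta values.
    Probes with zero variance are dropped because they cannot be standardised.
    Returns a new individuals x retained-probes matrix.
    """
    n = len(beta)
    columns = []
    for j in range(len(beta[0])):
        col = [beta[i][j] for i in range(n)]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n  # population variance
        if var == 0:
            continue
        sd = math.sqrt(var)
        columns.append([(x - mean) / sd for x in col])
    if not columns:
        raise ValueError("no probe has non-zero variance")
    # Transpose back to individuals x probes.
    return [[col[i] for col in columns] for i in range(n)]


def omics_relationship_matrix(beta):
    """ORM A = Z Z^T / m, where Z holds standardised probes and m is the probe count."""
    z = standardise_probes(beta)
    n, m = len(z), len(z[0])
    return [[sum(z[a][p] * z[b][p] for p in range(m)) / m for b in range(n)]
            for a in range(n)]


# Toy example: 4 individuals x 3 probes (beta values in [0, 1]).
beta = [
    [0.80, 0.10, 0.55],
    [0.75, 0.20, 0.50],
    [0.60, 0.15, 0.70],
    [0.90, 0.05, 0.40],
]
orm = omics_relationship_matrix(beta)
```

Under this standardisation the diagonal of the ORM averages exactly 1 and each row sums to 0, which is a convenient sanity check on real data.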
In the absence of covariates, the variance captured jointly by DNAm (12.35%) and common genetic variants (22.38%) in model 3 (34.73%) was higher than that captured by DNAm alone (model 1, 19.58%) or by common genetic variants alone (model 2, 28.83%) (Table 2). The proportion of endometriosis variance captured by DNAm changed when covariates (SVs, age, institution, and menstrual cycle phase) were added in models 4 and 5: with all covariates included, the variance captured by DNAm increased to 18.25% and that captured by common genetic variants increased to 23.78%. Conversely, the variance captured by DNAm fell from 19.58% (model 1) to 12.35% (model 3) when the GRM was added to the model. Together, these changes suggest that genetic regulation of methylation and population structure (as captured by the GRM), age, institution, and technical variation (SVs) contribute to variation in endometrial methylation, thereby influence the relationship between DNAm and endometriosis status, and should be accounted for in subsequent analyses. The final model, which includes all potential covariates (ORM + GRM + SVs + age + institution + menstrual cycle phase), was then used to develop the MRS.
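For reference, the joint model behind models 3 to 5 can be written in the standard two-component form below. This is a sketch under stated assumptions: X holds the fixed covariates (for example, age, institution, menstrual cycle phase, and SVs in models 4 and 5), A_G and A_O are the GRM and ORM, and the proportions are taken on the observed 0/1 scale because this section does not say whether the estimates were transformed to a liability scale.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \mathbf{o} + \mathbf{e},\\
\mathbf{g} &\sim N(\mathbf{0}, \mathbf{A}_G\sigma_g^2),\quad
\mathbf{o} \sim N(\mathbf{0}, \mathbf{A}_O\sigma_o^2),\quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2),\\
\operatorname{var}(\mathbf{y}) &= \mathbf{A}_G\sigma_g^2 + \mathbf{A}_O\sigma_o^2 + \mathbf{I}\sigma_e^2,\\
\rho_O &= \frac{\sigma_o^2}{\sigma_g^2+\sigma_o^2+\sigma_e^2},\qquad
\rho_G = \frac{\sigma_g^2}{\sigma_g^2+\sigma_o^2+\sigma_e^2}.
\end{aligned}
```

Models 1 and 2 correspond to dropping g or o, respectively. In this notation, model 3 gives rho_O + rho_G = 0.1235 + 0.2238 = 0.3473, the 34.73% reported in Table 2.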

2.3. MRS Captures DNAm Differences Between Endometriosis Cases and Controls

An outline of the MRS development and evaluation pipeline is illustrated in Figure 1. We estimated the effect sizes of DNAm probes on endometriosis via mixed linear model (MLM)-based omic association (MOA), multi-component MLM-based association excluding the target (MOMENT), and best linear unbiased prediction (BLUP) using four different training sets, each excluding one of the four institutions (1: Centre for Inflammation Research, University of Edinburgh (CIR); 2: University of California San Francisco (UCSF); 3: Oxford Endometriosis CaRe Centre (Oxford); 4: Institute for Molecular Bioscience (IMB)). This leave-one-institution-out approach helps ensure that observed associations are not driven by a single institution and accounts for institution-specific sources of bias. Covariates included age, institution, menstrual cycle phase, and genetic PCs. Manhattan plots show the statistical significance (p-value) of the associations between each DNAm probe (n = 762,651) and endometriosis for the MOA (Figure S2) and MOMENT (Figure S3) methods. As shown in Figures S2 and S3, probe cg04415176, located on chromosome 2 and annotated to the HOXD13 gene, was significantly associated with endometriosis at the Bonferroni-corrected threshold of p < 6.56 × 10−8 when MOA and MOMENT were performed on Training Set 1. However, no probes were significantly associated with endometriosis when MOA and MOMENT were performed on the other training sets.
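The leave-one-institution-out splits and the Bonferroni threshold can be sketched as follows. The threshold quoted above matches a family-wise alpha of 0.05 divided by the 762,651 probes tested; that alpha is inferred from the arithmetic rather than stated in this section. The function names and toy labels are hypothetical.

```python
def leave_one_institution_out(institutions):
    """Yield (held_out, train_idx, test_idx) for each institution, in first-seen order.

    institutions: list giving the institution label of each sample.
    """
    for held_out in dict.fromkeys(institutions):  # unique labels, order preserved
        test_idx = [i for i, inst in enumerate(institutions) if inst == held_out]
        train_idx = [i for i, inst in enumerate(institutions) if inst != held_out]
        yield held_out, train_idx, test_idx


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


labels = ["CIR", "UCSF", "CIR", "IMB", "Oxford", "UCSF", "IMB"]
folds = list(leave_one_institution_out(labels))
threshold = bonferroni_threshold(0.05, 762_651)  # roughly 6.56e-8
```

Each fold trains on three institutions and tests on the fourth, so no sample ever contributes to both the effect-size estimates and the evaluation of the same score.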
To assess the consistency and reliability of the MRSs, we calculated the correlations between effect sizes estimated by the three methods; the correlations between MOA and MOMENT, MOA and BLUP, and MOMENT and BLUP were all >0.92 (Figure S4). Summary statistics, including effect sizes and p-values, of all DNAm probes used to generate the MRSs with each method on each training set are presented in Tables S2–S5. We also computed the correlations between MRSs from different models applied to each test set (Figure S5). MRS models differ in the effect sizes and DNAm probes used to calculate the score. For effect sizes estimated via MOA and MOMENT, probes were selected using various p-value thresholds for the association between each DNAm probe and endometriosis, whereas all DNAm probes were included when effect sizes were estimated via BLUP. Across all test sets, the correlations between MRS models were all positive and were moderate to strong (r = 0.59–0.99) except for models with p-value thresholds of 1 × 10−4 and 1 × 10−5. This may reflect the smaller number of DNAm probes retained at these two thresholds, which yields less reliable and less stable estimates.
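The probe-selection rule described above can be sketched as a weighted sum of a test-set individual's beta values, using effect sizes estimated in the training set. This is a minimal sketch, assuming raw beta values are multiplied by the per-probe effect sizes; this section does not state whether beta values were centred or standardised before scoring, and all function names and toy values are hypothetical. A Pearson correlation helper is included for comparing effect sizes or scores across models.

```python
import math


def methylation_risk_score(beta_row, effects, pvalues=None, p_threshold=None):
    """Weighted sum of one individual's beta values (hypothetical helper).

    beta_row:    dict probe_id -> beta value for one test-set individual
    effects:     dict probe_id -> effect size estimated in the training set
    pvalues:     dict probe_id -> training-set association p-value
    p_threshold: if given, only probes with p < p_threshold contribute
                 (MOA/MOMENT-style selection); if None, every probe with an
                 effect size contributes (BLUP-style).
    Probes absent from beta_row are skipped.
    """
    score = 0.0
    for probe, weight in effects.items():
        if p_threshold is not None and pvalues[probe] >= p_threshold:
            continue
        if probe in beta_row:
            score += weight * beta_row[probe]
    return score


def pearson(x, y):
    """Pearson correlation, e.g. between effect sizes or scores from two models."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


effects = {"cg1": 0.5, "cg2": -1.0, "cg3": 2.0}
pvalues = {"cg1": 0.0005, "cg2": 0.01, "cg3": 0.3}
person = {"cg1": 0.8, "cg2": 0.4, "cg3": 0.1}
mrs_001 = methylation_risk_score(person, effects, pvalues, p_threshold=0.001)  # cg1 only
mrs_05 = methylation_risk_score(person, effects, pvalues, p_threshold=0.05)    # cg1, cg2
mrs_all = methylation_risk_score(person, effects)                               # all probes
```

A very strict threshold leaves only a handful of probes in the sum, which is one way to see why scores at 1 × 10−4 and 1 × 10−5 can correlate weakly with the others.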
To evaluate how well the MRSs from different models classified endometriosis case–control status, we calculated the area under the receiver operating characteristic curve (AUC) for each MRS model in each test set. Three classification models were used: (1) case–control status classified by MRS alone (AUC1), (2) case–control status classified by both MRS and polygenic risk score (PRS) (AUC2), and (3) case–control status classified by PRS alone (AUC3). AUCs for all MRS models across all test sets are shown in Figures S6–S9. Overall, the choice of MRS model did not produce a statistically significant difference in classification accuracy. Nevertheless, to allow a consistent comparison between test sets, the MRS model with the maximum performance within each test set was selected using the following criteria:
  • MRSs derived from p-value thresholds of 1 × 10−4 and 1 × 10−5 were excluded.
  • MRSs that yielded the highest AUC within the test set and classification model and demonstrated a significant association with endometriosis were selected.
  • If none of the MRSs had a significant association, the MRSs with the highest AUC were chosen.
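The AUC and the selection criteria listed above can be sketched in a few lines. The AUC is computed in its Mann–Whitney form: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The selection step assumes each candidate model is summarised by its p-value threshold, AUC, and the p-value of the MRS–endometriosis association (the test producing that p-value is not specified here); the dictionary keys and toy values are hypothetical.

```python
def auc(scores, labels):
    """AUC in Mann-Whitney form: P(case score > control score) + 0.5 * P(tie).

    labels: 1 for cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def select_model(results, alpha=0.05, excluded_thresholds=(1e-4, 1e-5)):
    """Apply the selection criteria to candidate MRS models from one test set.

    results: list of dicts with hypothetical keys 'name', 'p_threshold'
             (None for BLUP), 'auc', and 'p_assoc' (MRS-endometriosis p-value).
    """
    eligible = [r for r in results if r["p_threshold"] not in excluded_thresholds]
    significant = [r for r in eligible if r["p_assoc"] < alpha]
    # If no eligible model is significant, pick the highest AUC among eligible ones.
    pool = significant if significant else eligible
    return max(pool, key=lambda r: r["auc"])


candidates = [
    {"name": "MOA p<1e-5", "p_threshold": 1e-5, "auc": 0.80, "p_assoc": 0.001},
    {"name": "MOA p<0.001", "p_threshold": 0.001, "auc": 0.67, "p_assoc": 0.01},
    {"name": "MOMENT p<0.2", "p_threshold": 0.2, "auc": 0.70, "p_assoc": 0.20},
    {"name": "BLUP", "p_threshold": None, "auc": 0.60, "p_assoc": 0.03},
]
best = select_model(candidates)
```

In the toy list, the 1 × 10−5 model is excluded despite its high AUC, and the significant model with the higher AUC ("MOA p<0.001") is chosen over the non-significant MOMENT model.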
For the MRS-only classification model, the MRS generated on the CIR test set using MOA and a p-value threshold of 0.001 performed best (AUC1 = 0.6748 [CI = 0.5468–0.8029]). The maximum AUC in every test set exceeded 0.5, ranging from 0.5555 to 0.6748, and MRSs generated in three of the four test sets showed a significant association with endometriosis (p < 0.05). The best-performing MRS models also varied across test sets, involving both the MOA and MOMENT methods and p-value thresholds from 0.001 to 0.2 (Figure 2a).
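This section does not state how the AUC confidence intervals were computed. One classical large-sample approximation is the Hanley and McNeil (1982) standard error, sketched below; DeLong's method is another common choice, and the two can give different intervals, so this sketch should not be read as the procedure behind the reported values. The 1.96 multiplier assumes a 95% normal-approximation interval, and the function name is hypothetical.

```python
import math


def hanley_mcneil_ci(auc_value, n_cases, n_controls, z=1.959963984540054):
    """Approximate SE and 95% CI for an AUC (Hanley & McNeil, 1982).

    Returns (se, lower, upper); bounds are clipped to [0, 1].
    """
    a = auc_value
    q1 = a / (2.0 - a)
    q2 = 2.0 * a ** 2 / (1.0 + a)
    variance = (a * (1.0 - a)
                + (n_cases - 1) * (q1 - a ** 2)
                + (n_controls - 1) * (q2 - a ** 2)) / (n_cases * n_controls)
    se = math.sqrt(variance)
    return se, max(0.0, a - z * se), min(1.0, a + z * se)


se, lower, upper = hanley_mcneil_ci(0.5, 10, 10)
```

The interval narrows as the number of cases and controls in a test set grows, which is consistent with the wide intervals seen for the smaller test sets.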

2.4. Unique Contribution of MRS in the Case–Control Classification of Endometriosis

The performance of a combined-risk-score classification model was evaluated to determine whether the MRS adds classification value beyond the current PRS. Figure 2 and Figure 3 show that, in most instances, the MRS + PRS classification model was more accurate than the MRS-only model. CIR had the highest combined-model performance (AUC2 = 0.7284 [CI = 0.6075–0.8493]). In the combined model, the association between MRS and endometriosis was statistically significant only in the CIR test set (p = 0.0072), although marginal significance was observed in the UCSF (p = 0.0541) and IMB (p = 0.0697) test sets. To further assess the contribution of MRS independent of PRS, the performance of PRS alone was calculated and compared with the MRS classification models. The MRS + PRS model (AUC2 from 0.5539 [CI = 0.4439–0.6640] to 0.7284 [CI = 0.6075–0.8493]) consistently outperformed the PRS-only model (AUC3 from 0.5123 [CI = 0.4011–0.6235] to 0.6342 [CI = 0.5083–0.7601]) (Figure 2 and Figure 3). Overall, we observed that MRS contributes to the case–control classification of endometriosis beyond what is provided by the PRS.

2.5. Sensitivity Analyses: Performance of MRS Within European-Ancestry Samples

Weightings used to compute the PRS in this study were developed from European-ancestry cohorts. Therefore, the performance of PRS may be underestimated when applied to a multi-ancestry cohort. Hence, we aimed to verify our results by restricting the development and evaluation of MRS analyses to only participants of European genetic ancestry. Table S6 shows the distribution of participants’ genetic ancestry across institutions. After removing non-European-ancestry participants, a total of 79, 215, 126, and 188 samples from CIR, IMB, Oxford, and UCSF, respectively, that had both DNAm and genotyping data were used for subsequent analyses.
Overall, consistent with the results in the multi-ancestry samples, the maximum AUC for the MRS + PRS classification model was higher than the AUC for PRS across all test sets (Figure 4). The maximum AUC of the MRS + PRS model across test sets ranged from 0.5148 (CI = 0.4014–0.6282) to 0.6952 (CI = 0.5760–0.8144), while the maximum AUC of the PRS-only model ranged from 0.4870 (CI = 0.3706–0.6035) to 0.6366 (CI = 0.5090–0.7641). None of the MRSs developed showed a statistically significant association with endometriosis. Details of the DNAm probes used to develop the MRSs, including effect sizes estimated from MOA, MOMENT, and BLUP across all test sets, are provided in Tables S7–S10, and AUCs of each MRS model across all test sets are reported in Figures S10–S13.

3. Discussion

Studies of the association between common germline genetic variants and endometriosis have provided substantial evidence for a genetic contribution to the disease’s aetiology, but research on the environmental impact on endometriosis has been limited [5]. Both genetic effects [9,10] and environmental exposures [38,39,40,41] contribute to variation in DNAm, which makes DNAm a valuable candidate mediator of the effects of environmental exposures on complex diseases. However, previous methylation studies in the endometrium could not pinpoint or replicate individual DNAm sites associated with the disease because of limited sample sizes and small effect sizes. This study identifies associations between disease risk and effects aggregated across multiple DNAm sites, in the form of an MRS, and highlights contributions of methylation signals to endometriosis that are independent of common genetic variants.
Using endometrial methylation data from 881 women, we estimated that 18.25% of the variance in endometriosis case–control status is captured by DNAm. Notably, this estimate was computed after accounting for common genetic variants, which were estimated to capture 23.78% of endometriosis variance, suggesting that DNAm signals associated with endometriosis could be derived from non-genetic (i.e., environmental) factors. However, the results do not exclude the possibility that somatic mutations, rare variants, and structural variants also influence DNAm signals associated with endometriosis. Various studies have estimated the phenotypic variance explained by methylation, with values ranging from 2.88% to 61.14% depending on the trait. For example, body fat and adiposity-related biochemical traits, such as body fat percentage (61.14%) [42] and glucose level (29.07%) [42], which are known to be driven largely by non-genetic factors, show higher variance explained by DNAm than complex diseases such as endometriosis, Parkinson’s disease (21%) [43], and autism spectrum disorder (2.88%) [44]. Notably, the estimates for Parkinson’s disease and autism spectrum disorder do not account for genetic effects. The contribution of common genetic variants to endometriosis estimated in this study without accounting for DNAm (28.83%) was also similar to a previously published estimate of 26% [45]. Previous studies examining the interplay between DNAm and common genetic variants in disease variance have shown additive effects by comparing the variance explained by PRS and MRS individually and combined. For instance, in major depressive disorder, the variance explained by PRS and MRS combined (3.99%) was close to the sum of that explained by PRS (2.40%) and MRS (1.75%) alone [46].
Similarly, we observed that the variance in endometriosis captured jointly by DNAm and common genetic variants (ORM (12.35%) + GRM (22.38%) = 34.73%) was higher than that captured by DNAm (19.58%) or common genetic variants (28.83%) alone, suggesting that each factor contributes to endometriosis. However, these contributions were not strictly additive: the joint total (34.73%) is lower than the sum of the individual estimates (19.58% + 28.83% = 48.41%), indicating that part of the variance captured by DNAm overlaps with that captured by common genetic variants. For example, a recent discovery of 51 cis-mQTLs associated with endometriosis revealed genetic variants that influence DNAm levels, highlighting the interplay between genetic factors and DNAm in disease risk [26].
Aggregating the effects of multiple DNAm sites across the genome, we generated an endometriosis MRS using different computational approaches. The best-performing MRS with the highest AUC (0.68) was computed using MOA and a total of 746 DNAm probes (p-value threshold < 0.001), using CIR as the test set and the largest training set containing IMB, Oxford, and UCSF. This performance was comparable to other complex diseases, where previous studies have shown that MRSs developed for the prediction of breast cancer and amyotrophic lateral sclerosis had similar AUCs of 0.63 and 0.65, respectively [47,48]. We observed that the majority of MRSs with the highest AUC for each test set were generated using either MOA or MOMENT and not BLUP, suggesting that effect sizes estimated from MOA and MOMENT, combined with the targeted selection of DNAm probes, align more closely with the epigenetic architecture of endometriosis. This implies that the effects associated with endometriosis may be more concentrated within a specific subset of selected epigenetic markers [49]. Although pathway analysis could potentially provide more biological insights, it was not considered in this study as the number of probes used to calculate the MRSs is large, leading to an increased likelihood of identifying pathways that may not be relevant to endometriosis. Overall, the ability of MRS developed in this study to classify endometriosis cases and controls based on DNAm in the endometrium provides further evidence for DNAm differences between endometriosis cases and controls.
As reported previously, genetic risk variants summarised as a PRS capture an increased risk of endometriosis [50]. To assess the additional risk information captured by MRS independent of genetic risk, we calculated the PRS for participants in each test set and compared its classification performance with that of the MRS + PRS model. Because the PRS weightings were derived from predominantly European-ancestry datasets, the PRS was expected to underperform in mixed-ancestry samples. Hence, results were further validated by restricting risk score development and evaluation to European-ancestry samples only. A higher AUC for the MRS + PRS model than for the PRS-only model was consistently observed across test sets, including when models were restricted to European-ancestry participants. A similar trend was reported in a previous study of BMI comparing the prediction accuracy of an MRS + PRS model with single-score models: the MRS + PRS model had higher prediction accuracy than the PRS-only model, suggesting that the two scores acted additively and that MRS captured variance in BMI independent of genetic determinants [51].
A strength of this study is the ability to train and test the endometriosis MRS across multiple independent cohorts. However, differences between institutions in sample size, potential environmental exposures, cell composition, and independent processing of DNA samples may have influenced the results. Although SVA accounted for unknown sources of variation, training and test sets were processed separately, so residual institution-specific technical effects cannot be excluded. Limited power, small effect sizes, and the lack of significant p-values for individual DNAm sites and for some MRS models highlight the need for larger sample sizes and additional validation cohorts to verify the results. Additionally, although there is evidence of methylation differences between stage III/IV cases and controls, we did not develop an MRS for the severe group alone because its much smaller sample size would further reduce power during evaluation.
Several observational studies have reported an association between environmental exposures and endometriosis [52,53,54,55]. DNAm serves as a valuable biomarker for environmental exposures [56], providing insights into how pollutants, diet, and other external factors contribute to disease pathogenesis. Identifying these influences could enhance our understanding of how epigenetic modifications regulate key biological mechanisms, including hormonal regulation, cellular proliferation, and inflammation. This knowledge may lead to the discovery of novel therapeutic targets, ultimately improving endometriosis management.
Epigenetic markers also show promise as biomarkers for disease detection and risk stratification, alongside non-invasive approaches such as detecting microRNAs in liquid biopsy [57], autoantibodies in blood [58], or menstrual fluid analysis [59]. Such markers could facilitate early diagnosis and personalised risk assessment, enabling identification of high-risk individuals and informing targeted prevention and treatment strategies. However, further studies are needed to assess the utility of DNAm biomarkers for early diagnosis, given known challenges such as tissue-specific DNAm patterns [60], the sensitivity of DNAm signals to environmental and technical confounders, and dynamic changes in DNAm signals across the lifespan [61]. These issues will need to be addressed before DNAm biomarkers can be reliably implemented in clinical practice.
This study supports the hypothesis that DNAm influences endometriosis risk independently of genetic variation, emphasising the importance of molecular techniques in studying non-genetic factors. Integrating MRS with PRS has demonstrated an improved classification performance, reinforcing the predictive utility of epigenetic factors beyond common genetic variation. These findings underscore the need for comprehensive epigenetic studies to explore how environmental exposures contribute to endometriosis pathogenesis, paving the way for novel preventive and therapeutic approaches tailored to individual risk profiles.

4. Materials and Methods

4.1. Sample Collection and Processing

Data used for analyses in this study were generated from samples recruited as part of a previously published study [26]. Briefly, endometrial tissue samples were collected through case–control studies at four institutions, namely the University of California San Francisco, San Francisco, CA, USA (480 samples); the University of Melbourne, Melbourne, Australia (315 samples); the Oxford Endometriosis CaRe Centre, Oxford, UK (193 samples); and the EXPPECT Centre, The University of Edinburgh, Edinburgh, Scotland, UK (86 samples), and were processed as described previously by Mortlock et al. [26]. The recruitment sample size was based on previous power calculations by Rahmioglu et al. [62], who estimated that a sample size of 500 is needed to detect a 2% difference between cases and controls in 78% of endometrial DNAm probes. Participants who had used contraceptive steroids or gonadotropin-releasing hormone analogues in the 3 months before sampling, had undefined menstrual cycles, or showed signs of endometrial hyperplasia or cancer were excluded. A total of 679 surgically diagnosed endometriosis cases, 389 controls, and 6 individuals with unknown endometriosis status were recruited. Cases were defined as women surgically diagnosed with endometriosis, while controls were women with no endometriosis visualised during surgery and no history of endometriosis. To avoid bias, controls were recruited at all four institutions in approximately equal proportions to cases. DNAm was measured using Illumina Infinium MethylationEPIC BeadChips (Illumina, San Diego, CA, USA). Details of sample processing and preliminary QC are available in Mortlock et al. [26].

4.2. DNAm Quality Control

DNAm quality control and processing were performed as outlined in Nabais et al. [37] using the meffil R package. Low-quality samples and DNAm sites were excluded based on predetermined QC threshold parameters [37]. Technical variation was reduced by functional normalisation, which fits linear models of probe intensity quantiles against principal components of the control probe matrix. After normalisation, the most variable probes were tested for batch effects by regressing them against factors such as chip, chip column, and chip row. The significance threshold for detection p-values was 0.01. Probes on sex chromosomes, probes overlapping SNPs, and probes with non-unique hybridisation were removed following recommended masking guidelines [63]. Participants with unknown endometriosis status were removed. The final DNAm dataset comprised 318 controls and 590 cases with 762,651 DNAm sites.
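The detection p-value filtering and sex-chromosome masking described above can be sketched as follows. The 0.01 detection threshold comes from the text; the maximum fraction of failed measurements allowed per sample or probe (max_fail_fraction) is a hypothetical parameter, not a value stated in this section, and meffil implements its own QC with its own parameter names. The probe annotation mapping is also hypothetical.

```python
def detection_pvalue_qc(detp, threshold=0.01, max_fail_fraction=0.1):
    """Drop samples, then probes, whose share of failed detections is too high.

    detp: dict sample_id -> dict probe_id -> detection p-value.
    A measurement 'fails' when its detection p-value exceeds `threshold`.
    max_fail_fraction is a hypothetical cut-off, not a value from the text.
    Returns (kept_samples, kept_probes) as sorted lists.
    """
    probes = sorted(next(iter(detp.values())))
    kept_samples = sorted(
        s for s, pv in detp.items()
        if sum(pv[p] > threshold for p in probes) / len(probes) <= max_fail_fraction
    )
    if not kept_samples:
        return [], []
    kept_probes = [
        p for p in probes
        if sum(detp[s][p] > threshold for s in kept_samples) / len(kept_samples)
        <= max_fail_fraction
    ]
    return kept_samples, kept_probes


def mask_sex_chromosome_probes(probes, chromosome_of):
    """Remove probes annotated to chrX or chrY (annotation dict is hypothetical)."""
    return [p for p in probes if chromosome_of[p] not in ("chrX", "chrY")]


detp = {
    "s1": {"cg1": 0.001, "cg2": 0.002, "cg3": 0.5},
    "s2": {"cg1": 0.001, "cg2": 0.002, "cg3": 0.003},
    "s3": {"cg1": 0.9, "cg2": 0.8, "cg3": 0.7},
}
samples, probes = detection_pvalue_qc(detp, max_fail_fraction=0.4)
final = mask_sex_chromosome_probes(probes, {"cg1": "chr1", "cg2": "chrX", "cg3": "chr2"})
```

Filtering samples before probes keeps a single failed array from pushing otherwise good probes over the cut-off.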

4.3. Genotyping Data Quality Control

Samples passing the initial methylation QC were genotyped using the Axiom Precision Medicine Research Array (Thermo Fisher Scientific, Waltham, MA, USA). Quality-controlled genotyping data and genetic ancestry assignments used for analysis were obtained from Mortlock et al. 2023 [26]. Quality control was carried out separately for batch I and batch II samples. Individuals were removed if they had a genotype call rate < 95%, a heterozygosity rate more than 3 standard deviations from the mean heterozygosity rate, or high relatedness (IBD > 0.2); variants were removed if they had a minor allele frequency < 5%, a call rate < 95%, or a deviation from Hardy–Weinberg equilibrium (p-value < 1 × 10−5). The data were pre-phased with SHAPEIT2 and imputed using the 1000 Genomes reference panel. The final genotype dataset consisted of 953 individuals (614 cases and 339 controls) and 5,201,970 common genetic variants. The genetic ancestry of all participants with genotyping information was identified using the 1000 Genomes P3v5 reference data and principal component analysis (PCA), as described in Mortlock et al. 2023 [26]. The total number of samples assigned to each ancestry is shown in Table 1.
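The variant- and sample-level filters listed above can be sketched for a single variant and for a vector of per-sample heterozygosity rates. Thresholds are taken from the text. As an assumption for illustration, the Hardy–Weinberg test here is the one-degree-of-freedom chi-squared approximation; genotype QC tools such as PLINK commonly use an exact test instead, which behaves better at low allele counts. Function names are hypothetical.

```python
import math
import statistics


def variant_qc(genotypes, maf_min=0.05, call_rate_min=0.95, hwe_p_min=1e-5):
    """QC summary for one variant.

    genotypes: per-sample allele counts (0, 1, 2) or None when missing.
    Returns (passed, call_rate, maf, hwe_p).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    call_rate = len(called) / len(genotypes)
    n = len(called)
    observed = [called.count(0), called.count(1), called.count(2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    maf = min(p, 1.0 - p)
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-squared, 1 df
    passed = call_rate >= call_rate_min and maf >= maf_min and hwe_p >= hwe_p_min
    return passed, call_rate, maf, hwe_p


def heterozygosity_outliers(het_rates, n_sd=3.0):
    """Indices of samples whose heterozygosity rate lies > n_sd SDs from the mean."""
    mean = statistics.mean(het_rates)
    sd = statistics.stdev(het_rates)
    return [i for i, h in enumerate(het_rates) if abs(h - mean) > n_sd * sd]


ok = variant_qc([0] * 50 + [1] * 40 + [2] * 10)
bad_hwe = variant_qc([0] * 50 + [2] * 50)  # no heterozygotes: strong HWE deviation
low_call = variant_qc([None] * 10 + [0] * 45 + [1] * 40 + [2] * 5)
outliers = heterozygosity_outliers([0.30] * 20 + [0.60])
```

Because the heterozygosity outlier rule uses the sample's own mean and SD, a single extreme sample in a very small batch can inflate the SD enough to escape detection; with realistic batch sizes this is rarely an issue.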

4.4. Covariate Selection

Several studies have shown that age [64], ancestry [65], menstrual cycle [26], cell type proportion [11], and batch [66] contribute to variation in DNAm profiles between individuals. These factors should therefore be accounted for in the analyses to remove unwanted variation between samples that is not attributable to the variable of interest, in this case endometriosis status. The potential covariates evaluated were age, menstrual cycle phase, institution, genetic ancestry, chip, Sentrix ID, and batch. The main covariates were defined as follows:
  • Age: the participant’s self-reported chronological age, treated as a continuous variable.
  • Menstrual cycle phase: a categorical variable assigned to each specimen using the criteria of Noyes et al. [67], as described in Mortlock et al. 2023 [26]: menstrual, early proliferative (EP), mid-proliferative (MP), late proliferative (LP), early secretory (ESE), mid-secretory (MSE), and late secretory (LSE). All proliferative samples were grouped as proliferative (PE), and secretory samples without an assigned sub-phase were grouped as secretory (SE).
  • Institution: a categorical variable referring to the site where tissue samples were analysed (abbreviations in brackets). Samples obtained from the EXPPECT Centre were analysed at the Centre for Inflammation Research, University of Edinburgh (CIR); samples collected by the University of Melbourne were analysed at the Institute for Molecular Bioscience, University of Queensland (IMB); samples from the University of California San Francisco were both collected and analysed there (UCSF); and samples from the Oxford Endometriosis CaRe Centre were analysed at the same centre (Oxford).
  • Genetic ancestry: a categorical variable representing each participant’s ancestry, assigned from their genotype data.
Statistical tests were conducted to identify potential covariates significantly associated with endometriosis status: the t-test for continuous covariates and the chi-squared test for categorical covariates. When the assumptions of the chi-squared test were not met, Fisher’s exact test was used instead.
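The choice between the chi-squared test and Fisher’s exact test for a 2 × 2 table (covariate level by case–control status) can be sketched as below. The rule "any expected count below 5" is the common textbook criterion and is an assumption here, since the exact rule used is not stated. The chi-squared statistic is computed without a continuity correction, whereas R’s chisq.test() applies Yates’ correction to 2 × 2 tables by default, so p-values will differ slightly from R output.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]

def chi2_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) and 1-df p-value."""
    exp = expected_counts(table)
    stat = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))
    return stat, math.erfc(math.sqrt(stat / 2))

def fisher_2x2(table):
    """Two-sided Fisher's exact test: sum of tables no more likely than observed."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

def association_test(table, min_expected=5):
    """Fisher's exact test if any expected count < min_expected, else chi-squared."""
    if min(min(row) for row in expected_counts(table)) < min_expected:
        return "fisher", fisher_2x2(table)
    return "chi-squared", chi2_2x2(table)[1]

print(association_test([[3, 1], [1, 3]]))      # small counts, so Fisher's test
print(association_test([[30, 10], [20, 40]]))  # large counts, so chi-squared
```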
PCs of DNAm beta values were generated using the Omic-data-based Complex Trait Analysis (OSCA) software, and PCA plots were inspected for differences in sample clustering by covariate. Associations between the top 15 DNAm PCs and the potential covariates were tested using ANOVA for categorical covariates and linear regression for continuous covariates.
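The ANOVA step tests whether the mean score on a DNAm PC differs between the levels of a categorical covariate. A minimal sketch of the one-way F statistic is shown below on toy PC scores; only F and its degrees of freedom are computed, and the p-value would then come from the F distribution (for example via R’s aov()). This is an illustration, not the study’s code.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """groups: covariate level -> list of PC scores. Returns (F, df1, df2)."""
    values = [v for g in groups.values() for v in g]
    grand_mean = fmean(values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Toy PC1 scores grouped by a hypothetical two-level covariate.
print(one_way_anova_f({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}))  # (13.5, 1, 4)
```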
Potential covariates that were significantly associated with either endometriosis status (p-value < 0.05) or the DNAm PCs (p-value < 0.05, or differential clustering on PCA plots) were included as covariates in the downstream analyses. All statistical tests were performed using R version 4.3.2.

4.5. Surrogate Variable Analysis

Surrogate variable (SV) analysis was applied to remove batch effects and any hidden sources of variation not captured by the selected covariates, using the R package SmartSVA. Briefly, DNAm M-values were modelled as a function of endometriosis case–control status, and the residuals of this linear model were computed. A full model matrix, derived from the endometriosis case–control status of the samples, and a null model matrix were then generated. SVs are continuous variables that represent sources of variation in DNAm values not attributable to endometriosis status, and they were estimated from these three components together. All estimated SVs were used to adjust the DNAm values prior to analysis.
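The first step described above, converting beta values to M-values (M = log2(β / (1 − β))) and taking residuals from a per-probe regression on case–control status, can be sketched as follows. With a single binary predictor plus an intercept, the least-squares residuals are simply each value minus its group mean. This is not SmartSVA itself: estimating the SVs from these residuals is omitted, and the small offset `eps` that keeps the logarithm finite at β = 0 or 1 is an assumption.

```python
import math
from statistics import fmean

def beta_to_m(beta, eps=1e-6):
    """M-value = log2(beta / (1 - beta)); eps clips beta away from 0 and 1."""
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))

def status_residuals(m_values, status):
    """Residuals of m ~ status for one probe (status coded 0/1 per sample)."""
    means = {s: fmean([m for m, t in zip(m_values, status) if t == s])
             for s in set(status)}
    return [m - means[s] for m, s in zip(m_values, status)]

def residual_matrix(beta_matrix, status):
    """beta_matrix: list of probes, each a list of per-sample beta values."""
    return [status_residuals([beta_to_m(b) for b in probe], status)
            for probe in beta_matrix]

# Two toy probes, four samples (two controls, then two cases).
print(residual_matrix([[0.2, 0.4, 0.6, 0.8], [0.5, 0.5, 0.9, 0.1]], [0, 0, 1, 1]))
```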
SVs generated from all DNAm samples in this study (908 samples) were used when estimating the proportion of variance in endometriosis captured by DNAm. For MRS development, SVs were instead generated separately in the training and test samples.

4.6. Estimation of the Proportion of Variance in Endometriosis Captured by DNAm

The proportion of variance in endometriosis risk captured by common genetic variants (SNPs) alone was estimated using genome-based restricted maximum likelihood (GREML) in the GCTA software (version 1.94.1). Similarly, the proportion of variance in endometriosis risk captured by DNAm was estimated using OREML in the OSCA software (version 0.46) [49]. Briefly, an omics relationship matrix (ORM) was generated from DNAm beta values of endometrium samples, and the proportion of trait variance captured was then estimated from the ORM using the OREML model. To estimate the proportion of variance captured by DNAm and common genetic variants jointly, the ORM and the genomic relationship matrix (GRM) were generated from DNAm beta values and genotype data, respectively, and both were fitted simultaneously in the OREML model. The five OREML models used to estimate the proportion of variance in endometriosis captured were as follows:
  • ORM;
  • GRM;
  • ORM + GRM;
  • ORM + GRM + SV;
  • ORM + GRM + SV + age + institution + menstrual cycle phase;
and the results were compared. In the first model, the ORM was generated using DNAm beta values from 881 individuals and 762,651 DNAm sites. In the second model, the GRM was generated using 5,201,970 SNPs in the samples that passed both DNAm and genotyping quality control (881 samples). In the third model, the ORM and GRM were fitted simultaneously. In the fourth model, the ORM generated in the first model was adjusted for the SVs described above prior to the OREML analysis. Lastly, the covariates age, institution, and menstrual cycle phase were added in the fifth model.
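To make the ORM concrete, here is a minimal sketch assuming the GRM-style construction in which each probe is standardised across individuals and A = ZZᵀ/m over m probes. OSCA offers several ORM variants, and this is not its implementation; the beta values are toy numbers. Because every retained probe has mean 0 and (population) variance 1, the diagonal of A averages to 1, so its trace equals the number of individuals.

```python
from statistics import fmean, pstdev

def standardise(values):
    """Centre and scale one probe's values across individuals."""
    mu, sd = fmean(values), pstdev(values)
    return [(x - mu) / sd for x in values]

def omics_relationship_matrix(beta_matrix):
    """beta_matrix: list of probes, each a list of beta values over n samples.
    Returns the n x n matrix A = Z Z^T / m over the m non-constant probes."""
    z = [standardise(p) for p in beta_matrix if pstdev(p) > 0]
    m, n = len(z), len(z[0])
    return [[sum(probe[i] * probe[j] for probe in z) / m for j in range(n)]
            for i in range(n)]

betas = [[0.1, 0.5, 0.9], [0.2, 0.2, 0.8], [0.3, 0.7, 0.4], [0.5, 0.5, 0.5]]
A = omics_relationship_matrix(betas)  # the constant last probe is dropped
for row in A:
    print([round(x, 3) for x in row])
```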

4.7. Genetic PC Computation

We calculated genetic PCs from the genotype data of participants who passed both DNAm and genotyping quality control (881 samples) to correct for population stratification between samples from different institutions and for genetic ancestry in subsequent analyses. Genetic PCs were computed using the --pca option in GCTA (version 1.94.1) [68], and 11 PCs were included as covariates in the later analyses.

4.8. Methylation Risk Score (MRS) Development

A summary of MRS development and evaluation is illustrated in Figure 1. To evaluate the performance of the MRS, all samples that passed both DNAm and genotyping quality control were split into training and test sets by institution: samples from each institution were in turn held out as the test set, and the remaining samples were used as the training set. The training set was used to estimate the probe effects used to calculate the MRS, and these effects were then applied to the test set to evaluate MRS performance in an independent sample.
A total of four combinations of training and test sets were analysed using this approach, as shown in Figure 1. Training Set 1 comprised IMB, Oxford, and UCSF (800 samples); Training Set 2 comprised IMB, Oxford, and CIR (513 samples); Training Set 3 comprised IMB, UCSF, and CIR (741 samples); and Training Set 4 comprised Oxford, UCSF, and CIR (589 samples). The corresponding test sets for Training Sets 1, 2, 3, and 4 were CIR (81 samples), UCSF (368 samples), Oxford (140 samples), and IMB (292 samples), respectively. SV analyses were then performed separately on the training and test sets.
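The leave-one-institution-out scheme can be sketched as follows. Sample IDs are synthetic; only the per-institution sample counts are taken from the text. Each institution is held out once as the test set while the other three form the training set.

```python
def leave_one_institution_out(sample_institution):
    """sample_institution: dict of sample ID -> institution label.
    Yields (held-out institution, training IDs, test IDs)."""
    for held_out in sorted(set(sample_institution.values())):
        train = [s for s, inst in sample_institution.items() if inst != held_out]
        test = [s for s, inst in sample_institution.items() if inst == held_out]
        yield held_out, train, test

# Synthetic sample IDs with the per-institution counts reported above.
counts = {"IMB": 292, "Oxford": 140, "UCSF": 368, "CIR": 81}
samples = {f"{inst}_{k}": inst for inst, n in counts.items() for k in range(n)}

for held_out, train, test in leave_one_institution_out(samples):
    print(f"test={held_out} n_test={len(test)} n_train={len(train)}")
```

Running this reproduces the training-set sizes given above: 800 when CIR is held out, 589 for IMB, 741 for Oxford, and 513 for UCSF.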
Estimation of the effect sizes for each DNAm probe was performed using three different methods: MOA, MOMENT, and BLUP from the OSCA software [49]. MOA and MOMENT are reference-free mixed-linear models and are used for identifying DNAm sites that are associated with a complex trait.
The MOA and MOMENT models are given in Equations (1) and (2), respectively:
$$ y = w_i b_i + C\beta + Wu + e \qquad (1) $$
$$ y = w_i b_i + C\beta + \sum_j W_j u_j + e \qquad (2) $$
Here, y represents the phenotype values, w_i is the standardised DNAm values of the target probe i, b_i is the effect of probe i on the phenotype, C is a matrix of covariates, β is the effects of the covariates on the phenotype, W is a matrix of DNAm values of all probes, u denotes the joint effects of all probes on the phenotype, e represents the residuals, W_j is a matrix of DNAm values of the probes in the jth group (excluding probes that are highly correlated with the target probe), and u_j represents the joint effects of the probes in W_j on the phenotype. The summation over j indicates that the model can include more than one group of probes, depending on the probe effect distribution. The two models differ in how the effect size of each DNAm probe is estimated. MOMENT first partitions probes into groups based on their effect distribution, estimated using linear regression, and fits the groups as separate random-effect terms, whereas MOA assumes that all probe effects follow the same distribution and fits them as a single term. To reduce convergence problems caused by the first group of probes explaining too much variation, MOMENT applies a stepwise selection procedure to reduce the number of probes in the first group. In addition, MOMENT removes probes that are highly correlated with the target probe, defined as probes located within 50 kb of it, from the random-effect term. MOMENT has been shown to correct more effectively for potential confounders than MOA, at the cost of a slight loss in power [49]. This difference could reflect either a lack of power to detect true positives with MOMENT or potential false positives with MOA [47]. Hence, both models were used in this study and their results were compared.
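The MOMENT exclusion rule (probes within 50 kb of the target are left out of the random-effect term) amounts to a simple window filter, sketched below. Probe names and coordinates are hypothetical, and treating the 50 kb boundary as inclusive is an assumption; this shows only the window logic, not OSCA’s implementation.

```python
def random_effect_probes(target, probes, window_bp=50_000):
    """probes: dict probe ID -> (chromosome, position in bp).
    Returns probes kept in the random-effect term when testing `target`."""
    t_chr, t_pos = probes[target]
    return [p for p, (chrom, pos) in probes.items()
            if p != target and not (chrom == t_chr and abs(pos - t_pos) <= window_bp)]

probes = {
    "cg_a": ("chr1", 1_000_000),
    "cg_b": ("chr1", 1_030_000),  # 30 kb from cg_a: excluded when testing cg_a
    "cg_c": ("chr1", 1_200_000),  # 200 kb from cg_a: kept
    "cg_d": ("chr2", 1_010_000),  # different chromosome: kept
}
print(random_effect_probes("cg_a", probes))  # ['cg_c', 'cg_d']
```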
Effect sizes for each DNAm probe were estimated by running --moa (MOA) and --moment2 (MOMENT) in OSCA on the selected training samples, with endometriosis status as the phenotype. Covariates included in this analysis were age, institution, menstrual cycle phase, and genetic PCs. In addition to effect sizes, p-values for the association between each probe and endometriosis were generated and used as a criterion for probe selection. DNAm sites were mapped to the GRCh38 genome build and annotated to genes using GENCODE v41 to identify potential genes or regulatory regions associated with each site.
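Using p-values as a selection criterion means that only probes below a chosen threshold contribute their effect sizes as MRS weights. A minimal sketch follows; the thresholds and toy summary statistics are hypothetical, since the values used in the study are not listed in this section.

```python
def select_probes(summary, p_threshold):
    """summary: dict probe ID -> (effect size, p-value).
    Returns {probe: effect size} for probes with p < p_threshold."""
    return {probe: b for probe, (b, p) in summary.items() if p < p_threshold}

# Toy MOA/MOMENT-style summary statistics: (effect size, p-value).
summary = {"cg01": (0.12, 3e-6), "cg02": (-0.08, 0.004), "cg03": (0.01, 0.6)}
for threshold in (1e-5, 1e-2, 1.0):
    print(threshold, sorted(select_probes(summary, threshold)))
```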
We also applied a genome-wide approach, BLUP, to estimate effect sizes. BLUP is a statistical method that estimates the random effects in a mixed-linear model from the variance–covariance matrices of the random effects, the phenotype data, and the fixed-effect terms [69]. In this study, effect sizes for each DNAm probe were derived from the overall effect of all DNAm probes, as represented by the ORM of the training individuals. The ORM was first generated from the DNAm beta values of the training samples using OSCA. The joint effect of all probes on the phenotype of each training individual was then estimated using --reml-pred-rand in OSCA, with the ORM, endometriosis status, and covariates (age, institution, menstrual cycle phase, and genetic PCs) as input. The effect size of each probe was then predicted from the joint effect of all probes using the --blup-probe option. Because BLUP probe effects were estimated from the aggregated effects of all probes, all DNAm probes (n = 762,651) were included to generate the MRS.
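The link between individual-level joint effects and probe effects can be shown with standard mixed-model algebra. Assume y = Xβ + g + e with g = Wb, var(b) = Iσ²ₒ/m (so var(g) = Aσ²ₒ with A = WWᵀ/m), and V = Aσ²ₒ + Iσ²ₑ. Then the probe BLUPs are b̂ = σ²ₒWᵀV⁻¹r/m and the individual BLUPs are ĝ = σ²ₒAV⁻¹r = Wb̂, where r = y − Xβ. The sketch below checks that identity on toy numbers; the variance components are hypothetical stand-ins for REML estimates, and this is not OSCA’s implementation of --blup-probe.

```python
def solve(a, y):
    """Solve a x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * z for x, z in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def probe_blup(w, r, var_o, var_e):
    """w: n x m DNAm matrix; r: phenotype with fixed effects removed.
    Returns (probe effects b_hat, individual joint effects g_hat)."""
    n, m = len(w), len(w[0])
    a = [[sum(w[i][k] * w[j][k] for k in range(m)) / m for j in range(n)]
         for i in range(n)]                                  # A = W W^T / m
    v = [[a[i][j] * var_o + (var_e if i == j else 0.0) for j in range(n)]
         for i in range(n)]                                  # V = A s2_o + I s2_e
    p = solve(v, r)                                          # V^{-1} r
    b_hat = [var_o * sum(w[i][k] * p[i] for i in range(n)) / m for k in range(m)]
    g_hat = [var_o * sum(a[i][j] * p[j] for j in range(n)) for i in range(n)]
    return b_hat, g_hat

# Toy values standing in for standardised DNAm data (4 individuals, 3 probes).
w = [[1.0, -0.5, 0.2], [-1.2, 0.3, 0.9], [0.4, 1.1, -0.7], [-0.2, -0.9, -0.4]]
r = [0.6, -0.4, 0.3, -0.5]
b_hat, g_hat = probe_blup(w, r, var_o=0.3, var_e=0.7)
# Summing probe effects over each individual's DNAm values recovers g_hat.
print([round(sum(wi * b for wi, b in zip(row, b_hat)), 6) for row in w])
print([round(g, 6) for g in g_hat])
```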
The MRS is defined as the sum of an individual’s methylation beta values at a set of CpG sites, each weighted by its effect size, as indicated in the formula:
$$ \mathrm{MRS}_i = w_1 m_{i1} + \dots + w_k m_{ik} = \sum_{j=1}^{k} w_j m_{ij} \qquad (3) $$
where w_j is the weight (effect size) of DNAm site j, m_ij is the methylation beta value of individual i at site j, and k is the number of pre-selected methylation probes [35]. The score represents the collective effect of multiple DNAm sites in an individual. MRS was calculated for the test samples from their DNAm beta values according to Equation (3), with probe effect sizes estimated from the three models described above, using R version 4.2.1.
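Equation (3) translates directly into code. In this minimal sketch, the probe IDs, weights, and beta values are toy numbers, and skipping probes that are absent from a sample is an assumption, since the handling of missing values is not described here.

```python
def methylation_risk_score(weights, betas):
    """Equation (3): sum of w_j * m_ij over selected probes present in the sample.
    weights: probe -> effect size; betas: probe -> beta value for one sample."""
    return sum(w * betas[p] for p, w in weights.items() if p in betas)

weights = {"cg01": 0.5, "cg02": -0.25, "cg03": 1.0}  # from the training set
test_samples = {
    "s1": {"cg01": 0.8, "cg02": 0.4, "cg03": 0.1},
    "s2": {"cg01": 0.2, "cg02": 0.9, "cg03": 0.3},
}
scores = {s: methylation_risk_score(weights, b) for s, b in test_samples.items()}
print({s: round(v, 3) for s, v in scores.items()})  # {'s1': 0.4, 's2': 0.175}
```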

4.9. Correlation Between Effect Sizes Generated from MOA, MOMENT, and BLUP

To provide a more comprehensive assessment of the MRS, the correlation between effect sizes generated from the three different methods, MOA, MOMENT, and BLUP, was determined. For each training set, a pairwise comparison of effect sizes generated from all methods was performed. Pearson correlation coefficients were calculated using the cor() function in R version 4.3.2, and correlation plots were used to visualise the relationship.
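The pairwise comparison can be sketched as follows; the study used R’s cor(), and this stdlib version is only an illustration. Effect-size vectors must be aligned on the same probes before correlating, so each pair uses the probes the two methods share. The effect sizes below are toy values.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(effects):
    """effects: method -> {probe: effect}; each pair uses its shared probes."""
    methods = sorted(effects)
    result = {}
    for i, a in enumerate(methods):
        for b in methods[i + 1:]:
            shared = sorted(set(effects[a]) & set(effects[b]))
            result[(a, b)] = pearson([effects[a][p] for p in shared],
                                     [effects[b][p] for p in shared])
    return result

effects = {
    "MOA":    {"cg01": 0.12, "cg02": -0.08, "cg03": 0.03, "cg04": 0.05},
    "MOMENT": {"cg01": 0.10, "cg02": -0.06, "cg03": 0.01, "cg04": 0.04},
    "BLUP":   {"cg01": 0.002, "cg02": -0.001, "cg03": 0.0005, "cg04": 0.0},
}
for pair, r in pairwise_correlations(effects).items():
    print(pair, round(r, 3))
```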

4.10. MRS Evaluation

To assess how accurately the scores classified samples into endometriosis cases and controls, the area under the receiver-operator characteristic (ROC) curve (AUC) was used. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) across all classification thresholds of the logistic regression model. MRS was evaluated with case–control status as the outcome variable and MRS as the predictor in a logistic regression model. The pROC package in R version 4.2.1 was used to generate ROC curves and calculate the AUC for each score [70]. The 95% confidence intervals for the AUC were calculated using the ci.auc function with the DeLong method.
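The AUC and its DeLong confidence interval can be computed from the standard formulas: the AUC is the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half), and the DeLong variance combines the variances of each case’s and each control’s per-sample AUC contributions. This sketch is written from those formulas rather than from pROC’s code, so results may differ in edge cases, and the scores are toy values. With a single predictor and a positive fitted coefficient, the logistic model’s predicted probabilities rank samples exactly as the MRS does, so scoring the MRS directly gives the same AUC.

```python
import math
from statistics import fmean, variance

def psi(x, y):
    """Kernel: 1 if case score exceeds control score, 0.5 for a tie, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0

def auc_delong(case_scores, control_scores, z=1.959963984540054):
    """AUC with a DeLong 95% CI (clipped to [0, 1])."""
    v10 = [fmean([psi(x, y) for y in control_scores]) for x in case_scores]
    v01 = [fmean([psi(x, y) for x in case_scores]) for y in control_scores]
    auc = fmean(v10)
    var = variance(v10) / len(v10) + variance(v01) / len(v01)
    half = z * math.sqrt(var)
    return auc, (max(0.0, auc - half), min(1.0, auc + half))

auc, (lo, hi) = auc_delong([0.9, 0.8, 0.4], [0.3, 0.5, 0.1])
print(round(auc, 4), round(lo, 4), round(hi, 4))
```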

4.11. Correlation Between MRSs Generated from Different MRS Models

Pairwise correlation analysis was carried out on MRS calculated from different models for each test set using the cor() function in R version 4.2.1. Correlation matrix and coefficients were plotted to visualise the relationship between MRSs derived from different models.

4.12. PRS Development and Evaluation

To assess the utility of MRS in classifying endometriosis cases and controls with and without PRS, we calculated the PRS of participants in each test set and compared its classification performance with that of the corresponding MRS model using two logistic regression models: (1) case–control status as the outcome variable and PRS as the predictor, and (2) case–control status as the outcome variable and MRS and PRS as joint predictors. PRS was computed from each participant’s genotype data using the PLINK 2 score function with weights from McGrath et al. [6]. As for MRS, the classification performance of PRS was assessed by calculating the AUC of the logistic regression model with endometriosis case–control status as the outcome and PRS as the predictor. To ensure a rigorous comparison between classification models, the AUC of the PRS model was calculated separately for each test set, and comparisons were made within test sets.
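The two logistic regression models can be sketched as below, fitted by Newton-Raphson with step halving and scored by the AUC of the fitted linear predictor. The study fitted these models in R; this stdlib version is a stand-in for illustration, the MRS, PRS, and status values are toy data, and the models are fitted and scored in-sample.

```python
import math

def solve(a, y):
    """Solve a x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * z for x, z in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def design(predictors):
    """Rows of [1, x1, x2, ...]; predictors is a list of per-predictor lists."""
    return [[1.0] + list(vals) for vals in zip(*predictors)]

def log_lik(beta, rows, ys):
    total = 0.0
    for r, y in zip(rows, ys):
        eta = sum(b * v for b, v in zip(beta, r))
        total += y * eta - math.log1p(math.exp(eta))
    return total

def fit_logistic(rows, ys, iters=50):
    """Newton-Raphson for logistic regression, halving steps that lower the likelihood."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, y in zip(rows, ys):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for i in range(k):
                grad[i] += (y - p) * r[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * r[i] * r[j]
        step, t = solve(hess, grad), 1.0
        while True:
            cand = [b + t * s for b, s in zip(beta, step)]
            if log_lik(cand, rows, ys) >= log_lik(beta, rows, ys) or t < 1e-8:
                break
            t /= 2
        beta = cand
    return beta

def auc(scores, ys):
    cases = [s for s, y in zip(scores, ys) if y == 1]
    ctrls = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0 for c in cases for d in ctrls)
    return wins / (len(cases) * len(ctrls))

mrs = [0.2, 0.5, 0.1, 0.7, 0.4, 0.9, 0.3, 0.8, 0.6, 0.35]
prs = [0.1, -0.2, 0.4, 0.3, -0.1, 0.5, 0.0, -0.3, 0.2, 0.6]
status = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
for label, preds in {"PRS": [prs], "MRS + PRS": [mrs, prs]}.items():
    rows = design(preds)
    beta = fit_logistic(rows, status)
    eta = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    print(label, "AUC =", round(auc(eta, status), 3))
```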

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26083760/s1.

Author Contributions

Conceptualisation, S.M., G.W.M., A.F.M. and L.Y.T.; methodology, L.Y.T., S.M. and A.F.M.; software, L.Y.T.; validation, L.Y.T., S.M. and A.F.M.; formal analysis, L.Y.T.; investigation, L.Y.T.; resources, S.M., G.W.M., L.G. and M.S.; data curation, L.Y.T., S.M., G.W.M., L.G. and M.S.; writing—original draft preparation, L.Y.T.; writing—review and editing, S.M., G.W.M., A.F.M., L.G. and M.S.; visualisation, L.Y.T.; supervision, S.M., G.W.M. and A.F.M.; project administration, S.M. and G.W.M.; funding acquisition, G.W.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Australian National Health and Medical Research Council (NHMRC), grant number GNT1177194.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (or Ethics Committee) of University of Queensland (2020/HE002852, approved 19 August 2021).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

All data used in this study were obtained from Mortlock et al. [26]. Methylation data have been deposited and are available from GEO (GEO: GSE223817). Genotype data generated are available upon approval from dbGaP (phs003307.v1). Code used to run the analyses is available on GitHub: https://github.com/Li-Ying-Thong/Methylation-Risk-Score-DNA-Methylation-Endometriosis.git (accessed on 5 February 2025).

Acknowledgments

We gratefully acknowledge all study participants and authors who contributed to the original Mortlock et al. [26] dataset used in this study. L.Y.T. is supported by the University of Queensland Research Training Program Scholarship.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
DNAm: DNA methylation
MRS: Methylation risk score
AUC: Area under the receiver-operator curve
PRS: Polygenic risk score
mQTL: DNA methylation quantitative trait loci
SV: Surrogate variable
ORM: Omics relationship matrix
CIR: Centre for Inflammation Research, University of Edinburgh
IMB: Institute for Molecular Bioscience, University of Queensland
UCSF: University of California San Francisco
Oxford: Oxford Endometriosis CaRe Centre
OSCA: Omic-data-based Complex Trait Analysis
GREML: Genome-based restricted maximum likelihood
OREML: Omics residual maximum likelihood
GRM: Genomic relationship matrix
MOA: MLM-based omic association
MOMENT: Multi-component MLM-based association excluding the target
BLUP: Best linear unbiased prediction

References

  1. Chapron, C.; Marcellin, L.; Borghese, B.; Santulli, P. Rethinking mechanisms, diagnosis and management of endometriosis. Nat. Rev. Endocrinol. 2019, 15, 666–682.
  2. Rowlands, I.; Abbott, J.; Montgomery, G.; Hockey, R.; Rogers, P.; Mishra, G. Prevalence and incidence of endometriosis in Australian women: A data linkage cohort study. BJOG Int. J. Obstet. Gynaecol. 2021, 128, 657–665.
  3. Treloar, S.A.; O’Connor, D.T.; O’Connor, V.M.; Martin, N.G. Genetic influences on endometriosis in an Australian twin sample. Fertil. Steril. 1999, 71, 701–710.
  4. Saha, R.; Pettersson, H.J.; Svedberg, P.; Olovsson, M.; Bergqvist, A.; Marions, L.; Tornvall, P.; Kuja-Halkola, R. Heritability of endometriosis. Fertil. Steril. 2015, 104, 947–952.
  5. Rahmioglu, N.; Mortlock, S.; Ghiasi, M.; Møller, P.L.; Stefansdottir, L.; Galarneau, G.; Turman, C.; Danning, R.; Law, M.H.; Sapkota, Y.; et al. The genetic basis of endometriosis and comorbidity with other pain and inflammatory conditions. Nat. Genet. 2023, 55, 423–436.
  6. McGrath, I.M.; Montgomery, G.W.; Mortlock, S.; International Endometriosis Genetics Consortium. Polygenic risk score phenome-wide association study reveals an association between endometriosis and testosterone. BMC Med. 2023, 21, 482.
  7. Teschendorff, A.E.; Relton, C.L. Statistical and integrative system-level analysis of DNA methylation data. Nat. Rev. Genet. 2018, 19, 129–147.
  8. Sun, Y.V. The Influences of Genetic and Environmental Factors on Methylome-Wide Association Studies for Human Diseases. Curr. Genet. Med. Rep. 2014, 2, 261–270.
  9. Villicaña, S.; Bell, J.T. Genetic impacts on DNA methylation: Research findings and future perspectives. Genome Biol. 2021, 22, 127.
  10. Fujii, R.; Ando, Y.; Yamada, H.; Tsuboi, Y.; Munetsuna, E.; Yamazaki, M.; Mizuno, G.; Maeda, K.; Ohashi, K.; Ishikawa, H.; et al. Integration of methylation quantitative trait loci (mQTL) on dietary intake on DNA methylation levels: An example of n-3 PUFA and ABCA1 gene. Eur. J. Clin. Nutr. 2023, 77, 881–887.
  11. Lam, L.L.; Emberly, E.; Fraser, H.B.; Neumann, S.M.; Chen, E.; Miller, G.E.; Kobor, M.S. Factors underlying variable DNA methylation in a human community cohort. Proc. Natl. Acad. Sci. USA 2012, 109, 17253–17260.
  12. Szyf, M. The early life environment and the epigenome. Biochim. Biophys. Acta (BBA) Gen. Subj. 2009, 1790, 878–885.
  13. Labonté, B.; Suderman, M.; Maussion, G.; Navaro, L.; Yerko, V.; Mahar, I.; Bureau, A.; Mechawar, N.; Szyf, M.; Meaney, M.J.; et al. Genome-wide Epigenetic Regulation by Early-Life Trauma. Arch. Gen. Psychiatry 2012, 69, 722–731.
  14. Pogribny, I.P.; Rusyn, I. Environmental Toxicants, Epigenetics, and Cancer. In Epigenetic Alterations in Oncogenesis; Karpf, A.R., Ed.; Springer: New York, NY, USA, 2013; pp. 215–232.
  15. Waterland, R.A.; Jirtle, R.L. Transposable Elements: Targets for Early Nutritional Effects on Epigenetic Gene Regulation. Mol. Cell. Biol. 2003, 23, 5293–5300.
  16. Rönn, T.; Volkov, P.; Davegårdh, C.; Dayeh, T.; Hall, E.; Olsson, A.H.; Nilsson, E.; Tornberg, Å.; Dekker Nitert, M.; Eriksson, K.-F.; et al. A Six Months Exercise Intervention Influences the Genome-wide DNA Methylation Pattern in Human Adipose Tissue. PLoS Genet. 2013, 9, e1003572.
  17. Barrès, R.; Yan, J.; Egan, B.; Treebak, J.T.; Rasmussen, M.; Fritz, T.; Caidahl, K.; Krook, A.; O’Gorman, D.J.; Zierath, J.R. Acute Exercise Remodels Promoter Methylation in Human Skeletal Muscle. Cell Metab. 2012, 15, 405–411.
  18. Fambrini, M.; Sorbi, F.; Bussani, C.; Cioni, R.; Sisti, G.; Andersson, K.L. Hypermethylation of HOXA10 gene in mid-luteal endometrium from women with ovarian endometriomas. Acta Obstet. Gynecol. Scand. 2013, 92, 1331–1334.
  19. Wu, Y.; Strawn, E.; Basir, Z.; Halverson, G.; Guo, S.-W. Promoter Hypermethylation of Progesterone Receptor Isoform B (PR-B) in Endometriosis. Epigenetics 2006, 1, 106–111.
  20. Giudice, L.C.; Kao, L.C. Endometriosis. Lancet 2004, 364, 1789–1799.
  21. Attia, G.R.; Zeitoun, K.; Edwards, D.; Johns, A.; Carr, B.R.; Bulun, S.E. Progesterone receptor isoform A but not B is expressed in endometriosis. J. Clin. Endocrinol. Metab. 2000, 85, 2897–2902.
  22. Xue, Q.; Lin, Z.; Yin, P.; Milad, M.P.; Cheng, Y.-H.; Confino, E.; Reierstad, S.; Bulun, S.E. Transcriptional Activation of Steroidogenic Factor-1 by Hypomethylation of the 5′ CpG Island in Endometriosis. J. Clin. Endocrinol. Metab. 2007, 92, 3261–3267.
  23. Xue, Q.; Lin, Z.; Cheng, Y.-H.; Huang, C.-C.; Marsh, E.; Yin, P.; Milad, M.P.; Confino, E.; Reierstad, S.; Innes, J. Promoter Methylation Regulates Estrogen Receptor 2 in Human Endometrium and Endometriosis. Biol. Reprod. 2007, 77, 681–687.
  24. Naqvi, H.; Ilagan, Y.; Krikun, G.; Taylor, H.S. Altered Genome-Wide Methylation in Endometriosis. Reprod. Sci. 2014, 21, 1237–1243.
  25. Dyson, M.T.; Roqueiro, D.; Monsivais, D.; Ercan, C.M.; Pavone, M.E.; Brooks, D.C.; Kakinuma, T.; Ono, M.; Jafari, N.; Dai, Y.; et al. Genome-Wide DNA Methylation Analysis Predicts an Epigenetic Switch for GATA Factor Expression in Endometriosis. PLoS Genet. 2014, 10, e1004158.
  26. Mortlock, S.; Houshdaran, S.; Kosti, I.; Rahmioglu, N.; Nezhat, C.; Vitonis, A.F.; Andrews, S.V.; Grosjean, P.; Paranjpe, M.; Horne, A.W.; et al. Global endometrial DNA methylation analysis reveals insights into mQTL regulation and associated endometriosis disease risk and endometrial function. Commun. Biol. 2023, 6, 780.
  27. Wahl, S.; Drong, A.; Lehne, B.; Loh, M.; Scott, W.R.; Kunze, S.; Tsai, P.-C.; Ried, J.S.; Zhang, W.; Yang, Y.; et al. Epigenome-wide association study of body mass index, and the adverse outcomes of adiposity. Nature 2017, 541, 81–86.
  28. Elliott, H.R.; Tillin, T.; McArdle, W.L.; Ho, K.; Duggirala, A.; Frayling, T.M.; Davey Smith, G.; Hughes, A.D.; Chaturvedi, N.; Relton, C.L. Differences in smoking associated DNA methylation patterns in South Asians and Europeans. Clin. Epigenetics 2014, 6, 4.
  29. Hannon, E.; Dempster, E.; Viana, J.; Burrage, J.; Smith, A.R.; Macdonald, R.; St Clair, D.; Mustard, C.; Breen, G.; Therman, S.; et al. An integrated genetic-epigenetic analysis of schizophrenia: Evidence for co-localization of genetic associations and differential DNA Methylation. Genome Biol. 2016, 17, 176.
  30. Eze, I.C.; Imboden, M.; Kumar, A.; von Eckardstein, A.; Stolz, D.; Gerbase, M.W.; Künzli, N.; Pons, M.; Kronenberg, F.; Schindler, C.; et al. Air pollution and diabetes association: Modification by type 2 diabetes genetic risk score. Environ. Int. 2016, 94, 263–271.
  31. Rask-Andersen, M.; Karlsson, T.; Ek, W.E.; Johansson, Å. Gene-environment interaction study for BMI reveals interactions between genetic factors and physical activity, alcohol consumption and socioeconomic Status. PLoS Genet. 2017, 13, e1006977.
  32. Qi, Q.; Chu, A.Y.; Kang, J.H.; Huang, J.; Rose, L.M.; Jensen, M.K.; Liang, L.; Curhan, G.C.; Pasquale, L.R.; Wiggs, J.L.; et al. Fried food consumption, genetic risk, and body mass index: Gene-diet interaction analysis in three US cohort Studies. BMJ Br. Med. J. 2014, 348, g1610.
  33. Villanueva, A.; Portela, A.; Sayols, S.; Battiston, C.; Hoshida, Y.; Méndez-González, J.; Imbeaud, S.; Letouzé, E.; Hernandez-Gea, V.; Cornella, H.; et al. DNA methylation-based prognosis and epidrivers in hepatocellular carcinoma. Hepatology 2015, 61, 1945–1956.
  34. Moreaux, J.; Bruyer, A.; Veyrune, J.-L.; Goldschmidt, H.; Hose, D.; Klein, B. DNA methylation score is predictive of myeloma cell sensitivity to 5-azacitidine. Br. J. Haematol. 2014, 164, 613–616.
  35. Hüls, A.; Czamara, D. Methodological challenges in constructing DNA methylation risk scores. Epigenetics 2020, 15, 1–11.
  36. Do, W.L.; Gohar, J.; McCullough, L.E.; Galaviz, K.I.; Conneely, K.N.; Narayan, K.M.V. Examining the association between adiposity and DNA methylation: A systematic review and meta-analysis. Obes. Rev. 2021, 22, e13319.
  37. Nabais, M.F.; Laws, S.M.; Lin, T.; Vallerga, C.L.; Armstrong, N.J.; Blair, I.P.; Kwok, J.B.; Mather, K.A.; Mellick, G.D.; Sachdev, P.S.; et al. Meta-analysis of genome-wide DNA methylation identifies shared associations across neurodegenerative disorders. Genome Biol. 2021, 22, 90.
  38. Lee, K.W.; Pausova, Z. Cigarette smoking and DNA methylation. Front. Genet. 2013, 4, 132.
  39. Grönniger, E.; Weber, B.; Heil, O.; Peters, N.; Stäb, F.; Wenck, H.; Korn, B.; Winnefeld, M.; Lyko, F. Aging and Chronic Sun Exposure Cause Distinct Epigenetic Changes in Human Skin. PLoS Genet. 2010, 6, e1000971.
  40. Martin, E.M.; Fry, R.C. Environmental Influences on the Epigenome: Exposure- Associated DNA Methylation in Human Populations. Annu. Rev. Public Health 2018, 39, 309–333.
  41. Volkov, P.; Olsson, A.H.; Gillberg, L.; Jørgensen, S.W.; Brøns, C.; Eriksson, K.-F.; Groop, L.; Jansson, P.-A.; Nilsson, E.; Rönn, T.; et al. A Genome-Wide mQTL Analysis in Human Adipose Tissue Identifies Genetic Variants Associated with DNA Methylation, Gene Expression and Metabolic Traits. PLoS ONE 2016, 11, e0157776.
  42. Hatton, A.A.; Hillary, R.F.; Bernabeu, E.; McCartney, D.L.; Marioni, R.E.; McRae, A.F. Blood-based genome-wide DNA methylation correlations across body-fat- and adiposity-related biochemical traits. Am. J. Hum. Genet. 2023, 110, 1564–1573.
  43. Vallerga, C.L.; Zhang, F.; Fowdar, J.; McRae, A.F.; Qi, T.; Nabais, M.F.; Zhang, Q.; Kassam, I.; Henders, A.K.; Wallace, L.; et al. Analysis of DNA methylation associates the cystine–glutamate antiporter SLC7A11 with risk of Parkinson’s disease. Nat. Commun. 2020, 11, 1238.
  44. Yap, C.X.; Henders, A.K.; Alvares, G.A.; Giles, C.; Huynh, K.; Nguyen, A.; Wallace, L.; McLaren, T.; Yang, Y.; Hernandez, L.M.; et al. Interactions between the lipidome and genetic and environmental factors in autism. Nat. Med. 2023, 29, 936–949.
  45. Lee, S.H.; Harold, D.; Nyholt, D.R.; Goddard, M.E.; Zondervan, K.T.; Williams, J.; Montgomery, G.W.; Wray, N.R.; Visscher, P.M. Estimation and partitioning of polygenic variation captured by common SNPs for Alzheimer’s disease, multiple sclerosis and endometriosis. Hum. Mol. Genet. 2013, 22, 832–841.
  46. Barbu, M.C.; Shen, X.; Walker, R.M.; Howard, D.M.; Evans, K.L.; Whalley, H.C.; Porteous, D.J.; Morris, S.W.; Deary, I.J.; Zeng, Y.; et al. Epigenetic prediction of major depressive disorder. Mol. Psychiatry 2021, 26, 5112–5123.
  47. Nabais, M.F.; Lin, T.; Benyamin, B.; Williams, K.L.; Garton, F.C.; Vinkhuyzen, A.A.E.; Zhang, F.; Vallerga, C.L.; Restuadi, R.; Freydenzon, A.; et al. Significant out-of-sample classification from methylation profile scoring for amyotrophic lateral sclerosis. npj Genom. Med. 2020, 5, 10.
  48. Kresovich, J.K.; Xu, Z.; O’Brien, K.M.; Shi, M.; Weinberg, C.R.; Sandler, D.P.; Taylor, J.A. Blood DNA methylation profiles improve breast cancer prediction. Mol. Oncol. 2022, 16, 42–53.
  49. Zhang, F.; Chen, W.; Zhu, Z.; Zhang, Q.; Nabais, M.F.; Qi, T.; Deary, I.J.; Wray, N.R.; Visscher, P.M.; McRae, A.F.; et al. OSCA: A tool for omic-data-based complex trait analysis. Genome Biol. 2019, 20, 107.
  50. Kloeve-Mogensen, K.; Rohde, P.D.; Twisttmann, S.; Nygaard, M.; Koldby, K.M.; Steffensen, R.; Dahl, C.M.; Rytter, D.; Overgaard, M.T.; Forman, A.; et al. Polygenic Risk Score Prediction for Endometriosis. Front. Reprod. Health 2021, 3, 793226.
  51. Shah, S.; Bonder, M.J.; Marioni, R.E.; Zhu, Z.; McRae, A.F.; Zhernakova, A.; Harris, S.E.; Liewald, D.; Henders, A.K.; Mendelson, M.M.; et al. Improving Phenotypic Prediction by Combining Genetic and Epigenetic Associations. Am. J. Hum. Genet. 2015, 97, 75–85.
  52. Mahalingaiah, S.; Hart, J.E.; Laden, F.; Aschengrau, A.; Missmer, S.A. Air Pollution Exposures During Adulthood and Risk of Endometriosis in the Nurses’ Health Study II. Environ. Health Perspect. 2014, 122, 58–64.
  53. Szczęsna, D.; Wieczorek, K.; Jurewicz, J. An exposure to endocrine active persistent pollutants and endometriosis—A review of current epidemiological studies. Environ. Sci. Pollut. Res. 2023, 30, 13974–13993.
  54. Yamamoto, A.; Harris, H.R.; Vitonis, A.F.; Chavarro, J.E.; Missmer, S.A. A prospective cohort study of meat and fish consumption and endometriosis risk. Am. J. Obstet. Gynecol. 2018, 219, e171–e178.
  55. Farland, L.V.; Degnan, W.J.; Harris, H.R.; Han, J.; Cho, E.; VoPham, T.; Kvaskoff, M.; Missmer, S.A. Recreational and residential sun exposure and risk of endometriosis: A prospective cohort study. Hum. Reprod. 2020, 36, 199–210.
  56. Colwell, M.L.; Townsel, C.; Petroff, R.L.; Goodrich, J.M.; Dolinoy, D.C. Epigenetics and the exposome: DNA methylation as a proxy for health impacts of prenatal environmental exposures. Exposome 2023, 3, osad001.
  57. Ronsini, C.; Fumiento, P.; Iavarone, I.; Greco, P.F.; Cobellis, L.; De Franciscis, P. Liquid Biopsy in Endometriosis: A Systematic Review. Int. J. Mol. Sci. 2023, 24, 6116.
  58. Gajbhiye, R.; Sonawani, A.; Khan, S.; Suryawanshi, A.; Kadam, S.; Warty, N.; Raut, V.; Khole, V. Identification and validation of novel serum markers for early diagnosis of endometriosis. Hum. Reprod. 2011, 27, 408–417.
  59. Warren, L.A.; Shih, A.; Renteira, S.M.; Seckin, T.; Blau, B.; Simpfendorfer, K.; Lee, A.; Metz, C.N.; Gregersen, P.K. Analysis of menstrual effluent: Diagnostic potential for endometriosis. Mol. Med. 2018, 24, 1.
  60. Lokk, K.; Modhukur, V.; Rajashekar, B.; Märtens, K.; Mägi, R.; Kolde, R.; Koltšina, M.; Nilsson, T.K.; Vilo, J.; Salumets, A.; et al. DNA methylome profiling of human tissues identifies global and tissue-specific methylation patterns. Genome Biol. 2014, 15, r54.
  61. Luo, C.; Hajkova, P.; Ecker, J.R. Dynamic DNA methylation: In the right place at the right time. Science 2018, 361, 1336–1340.
  62. Rahmioglu, N.; Drong, A.W.; Helen, L.; Thomas, T.; Karin, H.; Merli, S.; Triin, L.-P.; Christine, D.; Emily, T.; George, N.; et al. Variability of genome-wide DNA methylation and mRNA expression profiles in reproductive and endocrine disease related tissues. Epigenetics 2017, 12, 897–908.
  63. Zhou, W.; Laird, P.W.; Shen, H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 2016, 45, e22.
  64. Horvath, S.; Zhang, Y.; Langfelder, P.; Kahn, R.S.; Boks, M.P.M.; van Eijk, K.; van den Berg, L.H.; Ophoff, R.A. Aging effects on DNA methylation modules in human brain and blood tissue. Genome Biol. 2012, 13, R97.
  65. Horvath, S.; Gurven, M.; Levine, M.E.; Trumble, B.C.; Kaplan, H.; Allayee, H.; Ritz, B.R.; Chen, B.; Lu, A.T.; Rickabaugh, T.M.; et al. An epigenetic clock analysis of race/ethnicity, sex, and coronary heart disease. Genome Biol. 2016, 17, 171. [Google Scholar] [CrossRef]
  66. Ross, J.P.; van Dijk, S.; Phang, M.; Skilton, M.R.; Molloy, P.L.; Oytam, Y. Batch-effect detection, correction and characterisation in Illumina HumanMethylation450 and MethylationEPIC BeadChip array data. Clin. Epigenetics 2022, 14, 58. [Google Scholar] [CrossRef]
  67. Noyes, R.W.; Hertig, A.T.; Rock, J. Dating the endometrial biopsy. Am. J. Obstet. Gynecol. 1975, 122, 262–263. [Google Scholar] [CrossRef]
  68. Yang, J.; Lee, S.H.; Goddard, M.E.; Visscher, P.M. GCTA: A Tool for Genome-wide Complex Trait Analysis. Am. J. Hum. Genet. 2011, 88, 76–82. [Google Scholar] [CrossRef]
  69. Robinson, G.K. That BLUP is a Good Thing: The Estimation of Random Effects. Stat. Sci. 1991, 6, 15–32, 18. [Google Scholar]
  70. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.-C.; Müller, M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 2011, 12, 77. [Google Scholar] [CrossRef]
Figure 1. Methylation risk score (MRS) development and evaluation pipeline. Outline of the steps used to generate the MRS. Samples (N = 881) were split into training and test sets by institution, as shown in the bottom left panel. Four training/test set combinations were formed (purple denotes the training set; green denotes the test set), and the same MRS development and evaluation process was applied to each combination. Asterisks mark the parts of the flowchart where the different training/test set combinations are applied. Effect sizes of DNAm probes were estimated using mixed linear model (MLM)-based omic association (MOA), multi-component MLM-based association excluding the target (MOMENT), and best linear unbiased prediction (BLUP). For MOA and MOMENT, DNAm probes were included in the MRS if their association p-value fell below a given threshold; for BLUP, all DNAm probes were included. The MRS was then calculated for the test samples using the features learned in the training set (selected probes and their effect sizes), and the area under the receiver operating characteristic curve (AUC) was computed to evaluate its performance. Figure created in BioRender, https://BioRender.com/n39u049 (accessed on 28 January 2025).
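The evaluation loop in Figure 1 can be sketched in a few lines of Python. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: it assumes the MRS is a weighted sum of DNAm values over the selected probes, that effect sizes and p-values have already been estimated in the training set (for example from an OSCA MOA or MOMENT run), and that each institution is held out once as the test set. The helper names (`institution_splits`, `select_probes`, `methylation_risk_score`, `evaluate_split`) and the probe IDs used with them are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout: one dict per sample mapping probe ID -> DNAm value.
Sample = Dict[str, float]


def institution_splits(institutions: Sequence[str]) -> List[Tuple[str, List[int], List[int]]]:
    """Hold out each institution in turn as the test set; the rest form the training set."""
    splits = []
    for held_out in sorted(set(institutions)):
        train = [i for i, inst in enumerate(institutions) if inst != held_out]
        test = [i for i, inst in enumerate(institutions) if inst == held_out]
        splits.append((held_out, train, test))
    return splits


def select_probes(pvalues: Dict[str, float], threshold: float) -> List[str]:
    """MOA/MOMENT-style selection: keep probes whose training-set p-value is below the threshold.

    For BLUP, where all probes are used, pass a threshold above 1.
    """
    return sorted(probe for probe, p in pvalues.items() if p < threshold)


def methylation_risk_score(sample: Sample, effects: Dict[str, float], probes: Sequence[str]) -> float:
    """Weighted sum of DNAm values over the selected probes (weights = training-set effect sizes)."""
    return sum(effects[probe] * sample[probe] for probe in probes)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case scores higher than a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def evaluate_split(samples: Sequence[Sample], labels: Sequence[int], test_idx: Sequence[int],
                   effects: Dict[str, float], pvalues: Dict[str, float], threshold: float) -> float:
    """Score held-out samples with features learned in training and return the test-set AUC."""
    probes = select_probes(pvalues, threshold)
    scores = [methylation_risk_score(samples[i], effects, probes) for i in test_idx]
    return auc(scores, [labels[i] for i in test_idx])
```

Repeating `evaluate_split` over thresholds and effect-size methods for each held-out institution yields the grid from which the best-performing combination per test set is reported in Figure 2.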
Figure 2. Maximum accuracy across endometriosis MRS methods for each test set. (a) Accuracy of the MRS-only classification model, in which case–control status was the outcome and MRS the predictor. CIR_MOA_0.001: n = 746, pMRS = 0.0073; UCSF_MOMENT_0.01: n = 7447, pMRS = 0.0440; Oxford_MOA_0.1: n = 76,350, pMRS = 0.2695; IMB_MOMENT_0.2: n = 152,686, pMRS = 0.0494. (b) Accuracy of the MRS + polygenic risk score (PRS) classification model, in which case–control status was the outcome and MRS and PRS were the predictors. CIR_MOA_0.001: n = 746, pMRS = 0.0072, pPRS = 2.15 × 10⁻²; UCSF_MOMENT_0.01: n = 7447, pMRS = 0.0541, pPRS = 3.41 × 10⁻²; Oxford_MOA_0.1: n = 76,350, pMRS = 0.2643, pPRS = 4.30 × 10⁻¹; IMB_MOMENT_0.5: n = 382,410, pMRS = 0.0697, pPRS = 2.15 × 10⁻². The X-axis labels give the test set, the method used to estimate DNAm probe effect sizes, and the p-value threshold for probe selection. “n” is the number of DNAm probes. The Y-axis shows AUCs, with error bars representing 95% confidence intervals. pMRS and pPRS are the p-values from logistic regression for the association of endometriosis with MRS and PRS, respectively.
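Both classification models in Figure 2 are logistic regressions of case–control status, on MRS alone and on MRS plus PRS. The sketch below fits such a model by Newton–Raphson, reports Wald p-values (an assumption: the caption does not name the test behind pMRS and pPRS), and scores the fitted linear predictor by AUC. It uses only the standard library for portability; a real analysis would more likely use R's `glm()` or a statistics package. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _score_and_information(rows: Matrix, labels: Sequence[int], beta: List[float]) -> Tuple[List[float], Matrix]:
    """Gradient of the log-likelihood and the Fisher information X^T W X."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, y in zip(rows, labels):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        w = p * (1.0 - p)
        for i in range(k):
            grad[i] += (y - p) * x[i]
            for j in range(k):
                info[i][j] += w * x[i] * x[j]
    return grad, info


def fit_logistic(features: Sequence[Sequence[float]], labels: Sequence[int],
                 iters: int = 25) -> Tuple[List[float], List[float]]:
    """Fit status ~ intercept + features; return (coefficients, two-sided Wald p-values)."""
    rows = [[1.0, *map(float, f)] for f in features]
    beta = [0.0] * len(rows[0])
    for _ in range(iters):
        grad, info = _score_and_information(rows, labels, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    _, info = _score_and_information(rows, labels, beta)
    pvalues = []
    for j in range(len(beta)):
        unit = [1.0 if i == j else 0.0 for i in range(len(beta))]
        se = math.sqrt(solve(info, unit)[j])  # sqrt of the [j, j] entry of the inverse information
        pvalues.append(math.erfc(abs(beta[j] / se) / math.sqrt(2.0)))
    return beta, pvalues


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def model_auc(features: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """AUC of the fitted linear predictor (same ranking as the fitted probabilities)."""
    beta, _ = fit_logistic(features, labels)
    scores = [beta[0] + sum(b * v for b, v in zip(beta[1:], f)) for f in features]
    return auc(scores, labels)
```

Because the logistic link is monotone, an MRS-only model with a positive coefficient has the same AUC as the MRS itself; any gain from adding PRS shows up by comparing, for hypothetical lists `mrs`, `prs` and `status`, `model_auc([[m, p] for m, p in zip(mrs, prs)], status)` with `model_auc([[m] for m in mrs], status)`.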
Figure 3. Accuracy of endometriosis case–control classification increased with the inclusion of both MRS and PRS: (a) CIR test set (PRS: n = 784,516, pPRS = 2.04 × 10⁻²; MRS: n = 746, pMRS = 0.0073; MRS + PRS: n = 746, pMRS = 0.0072, pPRS = 2.15 × 10⁻²); (b) UCSF test set (PRS: n = 784,516, pPRS = 0.0279; MRS: n = 7447, pMRS = 0.0440; MRS + PRS: n = 7447, pMRS = 0.0541, pPRS = 0.0341); (c) Oxford test set (PRS: n = 784,516, pPRS = 4.4 × 10⁻¹; MRS: n = 76,350, pMRS = 0.2695; MRS + PRS: n = 76,350, pMRS = 0.2643, pPRS = 4.3 × 10⁻¹); (d) IMB test set (PRS: n = 784,516, pPRS = 1.55 × 10⁻²; MRS: n = 152,686, pMRS = 0.0494; MRS + PRS: n = 382,410, pMRS = 0.0697, pPRS = 2.15 × 10⁻²). n denotes the number of DNAm probes in the MRS; for the PRS-only model it denotes the number of genetic variants in the PRS. Classification models are labelled on the X-axis; AUCs are plotted on the Y-axis and labelled at the bottom of each bar. Error bars indicate 95% confidence intervals, and red dashed lines mark an AUC of 0.5. pMRS and pPRS denote the p-values from the logistic regression model for the association of endometriosis with MRS and PRS, respectively.
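The captions do not say how the 95% confidence intervals on the AUCs were obtained. One common choice, and the default method of `ci.auc()` in the pROC package listed in the references, is DeLong's variance estimate. The sketch below assumes that method, uses a normal approximation on the AUC scale and clips the interval to [0, 1]; `delong_auc_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist, variance
from typing import Sequence, Tuple


def _psi(case_score: float, control_score: float) -> float:
    """Mann-Whitney kernel: 1 if the case ranks higher, 0.5 for a tie, else 0."""
    if case_score > control_score:
        return 1.0
    if case_score == control_score:
        return 0.5
    return 0.0


def delong_auc_ci(scores: Sequence[float], labels: Sequence[int],
                  level: float = 0.95) -> Tuple[float, float, float]:
    """Return (AUC, lower, upper) using DeLong's variance estimate."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    m, n = len(cases), len(controls)
    # Structural components: each case's mean kernel over controls, and vice versa.
    v10 = [sum(_psi(x, y) for y in controls) / n for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / m for y in controls]
    auc_hat = sum(v10) / m
    # statistics.variance uses the (k - 1) denominator, as in DeLong et al.
    var_auc = variance(v10) / m + variance(v01) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var_auc)
    return auc_hat, max(0.0, auc_hat - half_width), min(1.0, auc_hat + half_width)
```

With small test sets such as CIR (83 samples in Table 1) these intervals are wide, which is worth keeping in mind when comparing the bars in Figures 2 to 4.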
Figure 4. Accuracy of endometriosis case–control classification within European-ancestry samples increased with the inclusion of both MRS and PRS: (a) CIR test set (PRS: n = 784,516, pPRS = 2.46 × 10⁻²; MRS: n = 381,992, pMRS = 0.0605; MRS + PRS: n = 762,651, pMRS = 0.0953, pPRS = 2.93 × 10⁻²); (b) UCSF test set (PRS: n = 784,516, pPRS = 4.45 × 10⁻²; MRS: n = 7303, pMRS = 0.2367; MRS + PRS: n = 7303, pMRS = 0.1994, pPRS = 3.87 × 10⁻²); (c) Oxford test set (PRS: n = 784,516, pPRS = 4.36 × 10⁻¹; MRS: n = 752, pMRS = 0.4941; MRS + PRS: n = 762,651, pMRS = 0.5596, pPRS = 4.37 × 10⁻¹); (d) IMB test set (PRS: n = 784,516, pPRS = 1.65 × 10⁻²; MRS: n = 762,651, pMRS = 0.1009; MRS + PRS: n = 76,238, pMRS = 0.1808, pPRS = 2.32 × 10⁻²). n denotes the number of DNAm probes in the MRS; for the PRS-only model it denotes the number of genetic variants in the PRS. Classification models are labelled on the X-axis; AUCs are plotted on the Y-axis and labelled at the bottom of each bar. Error bars indicate 95% confidence intervals, and red dashed lines mark an AUC of 0.5. pMRS and pPRS denote the p-values from the logistic regression model for the association of endometriosis with MRS and PRS, respectively.
Table 1. Sample information and differences between cases and controls.
| Characteristic | Controls (N = 318) | Cases (N = 590) | p-Value |
|---|---|---|---|
| Age, mean [95% CI] (range); t-test | 37 [36.1–37.9] (18–55); N = 314 | 34.2 [33.6–34.8] (18–53); N = 587 | 1.29 × 10⁻⁶ |
| Menstrual cycle phase, N (%); chi-squared | | | 0.75 |
| Proliferative | 154 (48.4%) | 285 (48.3%) | |
| Secretory (undefined sub-phase) | 7 (2.2%) | 14 (2.4%) | |
| Early secretory | 41 (12.9%) | 71 (12.0%) | |
| Mid-secretory | 72 (22.6%) | 121 (20.5%) | |
| Late secretory | 33 (10.4%) | 66 (11.2%) | |
| Menstrual | 11 (3.5%) | 33 (5.6%) | |
| Institution, N (%); chi-squared | | | 8.11 × 10⁻⁵ |
| CIR ¹ | 31 (9.7%) | 52 (8.8%) | |
| IMB ² | 83 (26.1%) | 213 (36.1%) | |
| Oxford ³ | 41 (12.9%) | 110 (18.6%) | |
| UCSF ⁴ | 163 (51.3%) | 215 (36.4%) | |
| Genetic ancestry, N (%); chi-squared | | | 1.89 × 10⁻⁶ |
| ADMIX | 24 (7.5%) | 49 (8.3%) | |
| African | 33 (10.4%) | 13 (2.2%) | |
| American | 21 (6.6%) | 29 (4.9%) | |
| Eastern Asian | 25 (7.9%) | 47 (8.0%) | |
| European | 207 (65.1%) | 417 (70.7%) | |
| Southern Asian | 8 (2.5%) | 35 (5.9%) | |

¹ Centre for Inflammation Research, University of Edinburgh. ² Institute for Molecular Bioscience, University of Queensland. ³ Oxford Endometriosis CaRe Centre. ⁴ University of California San Francisco.
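The chi-squared p-values in Table 1 can be checked directly from the counts. The sketch below assumes a Pearson test of independence without continuity correction and evaluates the chi-squared upper tail with the closed-form series available for integer degrees of freedom, so it needs only the standard library. The function and variable names are illustrative, not taken from the authors' code.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared distribution with integer df (closed form)."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc term plus sqrt(2x/pi) exp(-x/2) * sum_j x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * total


def chi2_independence(table: Sequence[Sequence[int]]) -> Tuple[float, int, float]:
    """Pearson chi-squared test of independence; returns (statistic, df, p-value)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Counts from Table 1: first row controls, second row cases, columns in table order.
institutions = [[31, 83, 41, 163], [52, 213, 110, 215]]
ancestry = [[24, 33, 21, 25, 207, 8], [49, 13, 29, 47, 417, 35]]
cycle_phase = [[154, 7, 41, 72, 33, 11], [285, 14, 71, 121, 66, 33]]

for name, table in [("Institution", institutions), ("Genetic ancestry", ancestry),
                    ("Menstrual cycle phase", cycle_phase)]:
    stat, df, p = chi2_independence(table)
    print(f"{name}: chi2 = {stat:.2f}, df = {df}, p = {p:.3g}")
```

On these counts the sketch gives p-values that agree with Table 1 to the reported precision (8.11 × 10⁻⁵, 1.89 × 10⁻⁶ and 0.75), which supports reading the table as Pearson tests on the full case–control contingency tables.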
Table 2. Proportion of variance in endometriosis status captured by DNA methylation (DNAm) and common genetic variants.
| No. | OREML Model | Variance Captured ² by ORM ᵇ (s.e. ᵃ) | Variance Captured ² by GRM ᶜ (s.e. ᵃ) | Variance Captured ² by ORM + GRM ᵉ | Phenotypic Variance ¹ (s.e. ᵃ) |
|---|---|---|---|---|---|
| 1 | ORM ᵇ | 19.58% (0.07) | - | - | 0.2481 (0.02) |
| 2 | GRM ᶜ | - | 28.83% (0.17) | - | 0.2251 (0.01) |
| 3 | ORM ᵇ + GRM ᶜ | 12.35% (0.06) | 22.38% (0.15) | 34.73% | 0.2361 (0.01) |
| 4 | ORM ᵇ + GRM ᶜ + surrogate variables (SVs) ᵈ | 10.70% (0.07) | 27.94% (0.16) | 38.64% | 0.2251 (0.01) |
| 5 | ORM ᵇ + GRM ᶜ + SVs ᵈ + age + institution + menstrual cycle phase | 18.25% (0.08) | 23.78% (0.15) | 42.03% | 0.2187 (0.01) |
¹ Phenotypic variance is the variance in endometriosis status (case or control) across the sample.
² Proportions of variance captured are shown as percentages, calculated as (variance captured by the omics data / phenotypic variance) × 100%.
ᵃ s.e., standard error.
ᵇ ORM, omics relationship matrix derived from DNAm beta values of the endometrium samples; the ORM column gives the proportion of variance in endometriosis status captured by DNAm.
ᶜ GRM, genomic relationship matrix derived from the genotype data of the samples; the GRM column gives the proportion of variance in endometriosis status captured by common genetic variants.
ᵈ DNAm values were pre-adjusted for batch effects using surrogate variable analysis (SVA).
ᵉ Combined proportion of variance in endometriosis status captured by DNAm and common genetic variants, computed as the sum of the ORM and GRM estimates within the same model.
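The ORM in footnote b is a sample-by-sample similarity matrix built from DNAm values. The sketch below assumes the standardized form ZZᵀ/m commonly used for omics relationship matrices: each probe is centred and scaled across samples, and m is the number of informative probes. The GRM of footnote c is built analogously from genotypes, although GCTA standardizes each variant by its allele frequency rather than by the sample standard deviation. The function name is hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def omics_relationship_matrix(values: Sequence[Sequence[float]]) -> Matrix:
    """Return Z Z^T / m, where Z holds probe values standardized across samples.

    values[i][j] is the DNAm value of probe j in sample i.
    """
    n = len(values)
    standardized_cols = []
    for col in zip(*values):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        if sd > 0:  # constant probes carry no information and are skipped
            standardized_cols.append([(v - mean) / sd for v in col])
    m = len(standardized_cols)
    # Entry (a, b): average product of the standardized values of samples a and b.
    return [[sum(c[a] * c[b] for c in standardized_cols) / m for b in range(n)] for a in range(n)]
```

In an OREML fit, the variance component attached to this matrix, divided by the phenotypic variance and multiplied by 100%, gives the ORM percentages in Table 2; by construction the diagonal of the matrix averages 1 and each row sums to 0.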

Share and Cite

MDPI and ACS Style

Thong, L.Y.; McRae, A.F.; Sirota, M.; Giudice, L.; Montgomery, G.W.; Mortlock, S. Methylation Risk Score Modelling in Endometriosis: Evidence for Non-Genetic DNA Methylation Effects in a Case–Control Study. Int. J. Mol. Sci. 2025, 26, 3760. https://doi.org/10.3390/ijms26083760

AMA Style

Thong LY, McRae AF, Sirota M, Giudice L, Montgomery GW, Mortlock S. Methylation Risk Score Modelling in Endometriosis: Evidence for Non-Genetic DNA Methylation Effects in a Case–Control Study. International Journal of Molecular Sciences. 2025; 26(8):3760. https://doi.org/10.3390/ijms26083760

Chicago/Turabian Style

Thong, Li Ying, Allan F. McRae, Marina Sirota, Linda Giudice, Grant W. Montgomery, and Sally Mortlock. 2025. "Methylation Risk Score Modelling in Endometriosis: Evidence for Non-Genetic DNA Methylation Effects in a Case–Control Study" International Journal of Molecular Sciences 26, no. 8: 3760. https://doi.org/10.3390/ijms26083760

APA Style

Thong, L. Y., McRae, A. F., Sirota, M., Giudice, L., Montgomery, G. W., & Mortlock, S. (2025). Methylation Risk Score Modelling in Endometriosis: Evidence for Non-Genetic DNA Methylation Effects in a Case–Control Study. International Journal of Molecular Sciences, 26(8), 3760. https://doi.org/10.3390/ijms26083760

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.