1. Introduction
Aging is a natural and gradual process characterized by the deterioration of physiological functions and increased vulnerability to disease over time. It encompasses molecular, cellular, and organismal changes that lead to decreased tissue function, heightened susceptibility to stress, and a greater risk of age-related conditions such as reduced immune defense, cardiovascular disease, cancer, neurodegenerative disorders, and musculoskeletal diseases (e.g., arthritis or osteoporosis) [1]. Aging itself is therefore a major factor contributing to the onset of most diseases in older adults. The concept of ‘biological age’ was introduced in 1947 by H. Benjamin [2], who published one of the first papers proposing that biological age could measure an individual’s aging more accurately than chronological age, which only reflects the time elapsed since birth.
It is now well established that aging is influenced by genetic, environmental, and lifestyle factors, as well as by epigenetic mechanisms that regulate gene expression and cellular function; these factors contribute to differences between the sexes [3] and can accelerate aging [4,5]. Several tools have been developed to measure biological age using a variety of approaches, such as histology-based data, metabolomics, proteomics, and DNA methylation. Other tools combine multiple biomarkers, including clinical variables from blood analysis, hematology, anthropometry, organ function tests, functional aging indices, and frailty indices [6,7]. One of the first molecular strategies for assessing the biological clock was measuring telomere length. Telomeres are repetitive DNA structures at the ends of chromosomes that shorten with each cell division, a mechanism associated with cellular aging and health status [8]. Although various methods have been developed to measure telomere length, they show a variability that directly affects the results [9,10]. Moreover, some epidemiological studies have reported contradictory results regarding the relationship between telomere shortening and age [7].
Current approaches based on epigenetic clocks have shown potential to generate more accurate predictors of biological age, as most of these clocks correlate well with chronological age [11]. Some clocks (e.g., the Horvath, Hannum, and Levine clocks) are based on DNA methylation levels at CpG sites (a cytosine followed by a guanine), whereas others are based on histone modifications (e.g., the Rechsteiner clock) or miRNA expression levels (e.g., the Huan clock); some are currently used as biomarkers to predict morbidity and mortality [12,13,14]. Research on DNA methylation has shown that specific regulatory regions and promoters of several genes become progressively methylated with age, indicating a strong functional link between age and DNA methylation [15].
Despite the availability of various tools for measuring biological age, these technologies are not yet widely used in clinical practice, both because they still require validation in clinical trials with larger cohorts and because of their high cost [16], which in turn limits the design of these much-needed larger trials. To address this barrier, there is growing interest in developing clinical epigenetic clocks that use a limited number of CpG sites for age prediction. In this context, methylation levels of CpG islands within the promoter region of the very long chain fatty acid elongase 2 (ELOVL2) gene, which codes for a transmembrane protein involved in the synthesis of long-chain (C22 and C24) omega-3 and omega-6 polyunsaturated fatty acids (PUFAs), have been linked to chronological age across diverse populations, cell types, and tissues [17,18]. ELOVL2 elongates long-chain omega-3 and omega-6 PUFAs (LC-PUFAs), the precursors of docosahexaenoic acid (DHA, 22:6n-3) and very-long-chain PUFAs [19]; DHA reduces inflammation and has been proposed as a molecule that promotes healthy aging [20]. This is noteworthy because PUFAs play key roles in biological functions including energy production, modulation of inflammation, and maintenance of cell membrane integrity.
Further studies by Jung and colleagues confirmed the correlation between ELOVL2 methylation and age across various biospecimens, including blood, saliva, and buccal samples. They also investigated methylation levels in other genes, such as KLF14 and TRIM59, obtaining age prediction models that were consistent across tissues [21]. Similarly, Slieker et al. analyzed DNA methylation data from multiple tissues and identified tissue-dependent changes, with ELOVL2 methylation varying between tissues. For example, cerebellar and other brain tissues exhibited low methylation levels compared with skin, in which methylation increased with age [17].
Evaluating methylation of the ELOVL2 gene could provide a quantitative measure of biological aging, serving as a simplified or minimal clock for age prediction. This approach could help reduce costs and facilitate the assessment of biological aging in clinical trials with large patient cohorts. Predicting biological age in this way could indicate whether an individual is aging healthily or is in a state of accelerated aging relative to chronological age. A key challenge with such clocks is ensuring robustness. In this study, we propose a potentially simplified and robust biological clock that could provide a more practical and reliable measurement of biological age for widespread use.
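As a sketch of the idea, assuming the minimal clock is a multiple linear regression of age on per-site methylation (a form consistent with the adjusted R², MAE, and RMSE values reported in the Discussion, although the exact model specification is not given in this section), the predicted age and its deviation from chronological age can be written as below. The symbols are introduced here for illustration only: m_i is the methylation level (%) at CpG site i, S is the set of selected CpG sites, and the coefficients β are fitted on a training cohort.

```latex
\hat{A} = \beta_0 + \sum_{i \in S} \beta_i \, m_i ,
\qquad
\Delta A = \hat{A} - A_{\mathrm{chron}}
```

Under this reading, a positive ΔA would suggest accelerated aging and a negative ΔA slower aging than expected for a person of that chronological age in the training cohort.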
3. Discussion
Epigenetics has emerged as a key mechanism underlying the molecular processes that contribute to aging [24]. For this reason, epigenetic clocks are gaining importance because of their potential to evaluate biological aging, establish health status [13,14], and assess the effects of interventions aimed at promoting healthy aging [14,25]. In this context, we recently described an epigenetic clock based on miRNAs to evaluate the biological age of skin [26].
Among the different mechanisms of epigenetic regulation, DNA methylation is a remarkably stable epigenetic signature. During aging, global age-associated hypomethylation has been observed; however, locus-specific hypermethylation has also been reported, including at tumor suppressor genes and Polycomb target genes [1,27].
The ELOVL2 promoter is one of the most informative loci for human epigenetic age [28]. ELOVL2 methylation has been shown to correlate strongly with biological age in humans [29] as well as in rodents [20,30]. Furthermore, silencing ELOVL2 through methylation, or depleting the ELOVL2 protein, accelerates aging in the mouse retina [31,32] and has been linked to diabetes [33] and to an increased risk of breast cancer and of colorectal cancer in men [34]. Among other potential applications, this epigenetic marker could be used to predict the development of neurodegenerative diseases, as alterations in ELOVL2 methylation status have been observed in patients at early stages of Alzheimer’s disease [35]. Methylation of the ELOVL2 promoter can therefore be considered a good indicator of aging, frailty, and age-associated conditions.
In this study, we used DNA from buccal swabs of healthy people for age prediction with the simplified ELOVL2 epigenetic clock. Buccal swabs were chosen because they offer a simple, painless, and non-invasive method for DNA collection and are easier to transport and store. Collection can be performed by donors themselves or by a non-specialist, whereas obtaining blood samples is a more invasive procedure that requires a trained professional. Moreover, the use of this DNA source is supported by various studies reporting a chronological age-related increase in ELOVL2 methylation across different tissues, including blood, saliva, and buccal swabs [36,37,38]. Indeed, age prediction accuracy (R² and mean absolute error, MAE) for buccal swabs was comparable to that for blood and saliva. Buccal swab DNA is therefore a well-suited sample type for studying biological aging using ELOVL2 methylation as an epigenetic clock.
At the technical level, we found considerable variability in the experimental design of previous studies evaluating epigenetic clocks, particularly in sample size and DNA source (e.g., skin, blood, or buccal epithelium). Some studies used relatively few DNA samples, such as Richards et al., who used 28 peripheral blood samples [38], and Kampmann et al., who used 49 blood-derived DNA samples [39]. In contrast, other works used larger sample sizes, such as Horvath et al., who included 485 DNA samples from skin and blood [40], and McEwen et al., who included 1721 DNA samples from buccal epithelium [41]. According to Mayne et al., an epigenetic clock should ideally be calibrated with at least 70 samples, although a sample size of 134 individuals would yield more precise and accurate models for predicting epigenetic age [42].
Several studies, such as those by El-Shishtawy et al. and Sukawutthiya et al., used 100 blood samples to construct epigenetic clocks [43,44]. Our epigenetic clock, based on pyrosequencing analysis of ELOVL2 methylation, was developed with 83 training samples, exceeding the minimum threshold proposed by Mayne et al. [42]. Validation was conducted with a relatively small sample (n = 52), following the guidelines proposed by Archer et al. [45]. This brings the total sample size to 135, which we consider sufficient for generating a robust epigenetic clock. The strong predictive performance of the model, reflected in its high adjusted R² and low error metrics, supports its validity as a reliable approach for age estimation.
Regarding the number of CpGs, many current epigenetic clocks use large numbers of CpG sites [46,47]. However, other studies have used fewer CpGs, sometimes combining them with CpGs from other genes [38], and some epigenetic clocks use only two CpG sites [44] or just one [48]. It is therefore feasible to design epigenetic clocks with the smallest possible number of CpGs, focusing on those that change most with chronological age, to maximize information while avoiding unnecessary costs. Indeed, some studies have reported ELOVL2-based clocks that use a minimal number of CpG sites yet still provide significant information [43,44]. In our case, we aimed to identify the most robust and informative CpGs for our models, resulting in eight CpG sites for the non-sex-specific model and six CpG sites for each of the female and male models.
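This section does not state how the informative CpGs were selected. One common first-pass screen, shown here only as an assumption and not as the procedure used in this study, ranks candidate sites by the absolute Pearson correlation between methylation and chronological age. The function names, CpG labels, and methylation values below are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (n >= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rank_cpgs_by_age_correlation(methylation, ages):
    """Rank CpG sites by |r| between methylation (%) and age, strongest first.

    Assumption: a univariate |r| screen, not the study's documented method.
    methylation maps a CpG label to per-sample values in the same order as ages.
    """
    scores = {cpg: abs(pearson(values, ages)) for cpg, values in methylation.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical example data (not from the study).
ages = [25, 35, 45, 55, 65]
methylation = {
    "CpG_A": [40, 48, 55, 63, 70],  # rises steadily with age
    "CpG_B": [50, 52, 49, 53, 51],  # roughly flat
    "CpG_C": [60, 55, 52, 44, 40],  # falls with age
}
ranking = rank_cpgs_by_age_correlation(methylation, ages)
print(ranking)  # ['CpG_A', 'CpG_C', 'CpG_B']
```

A univariate screen like this ignores correlation between neighboring CpGs, so in practice it would be followed by multivariable model selection.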
An important aspect of our study was its technical validation. We analyzed variability within the same individual and changes in DNA methylation in a stored sample over 12 months. Few studies have conducted these types of technical validation, which are necessary to assess the robustness and performance of any analytical determination. Kampmann et al. evaluated inter-laboratory reproducibility by performing validations across different laboratories [39], while Zbieć-Piekarska et al. tested reproducibility between two laboratories and the stability of epigenetic analysis over time [37]. They compared bloodstains stored on tissue paper at room temperature for 5, 10, and 15 years, with age prediction success rates ranging from 60% to 78%. Our results align with these previous studies: the coefficient of variation (CV) was low, indicating good reproducibility for all CpG sites. Among the CpGs analyzed, CpG5 exhibited the highest CV and was therefore excluded from the models. For the CpG sites selected for the models, we can confirm good reproducibility and sample stability, even after one year of storage.
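The reproducibility check described above can be expressed through the coefficient of variation, CV = 100 × SD / mean, computed per CpG across repeated measurements of the same sample. The sketch below assumes a sample (n - 1) standard deviation and uses made-up replicate values with hypothetical site labels; it flags the least reproducible site, analogous to the exclusion of CpG5 here, but the study's replicate design and any CV cut-off are not specified in this section.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """CV (%) = sample standard deviation / mean * 100.

    Assumption: sample (n - 1) standard deviation, as computed by statistics.stdev.
    """
    m = mean(replicates)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(replicates) / m * 100


def cv_by_cpg(replicate_table):
    """Map each CpG label to the CV (%) of its replicate methylation values."""
    return {cpg: coefficient_of_variation(vals) for cpg, vals in replicate_table.items()}


def least_reproducible(replicate_table):
    """Return the CpG label with the highest CV."""
    cvs = cv_by_cpg(replicate_table)
    return max(cvs, key=cvs.get)


# Hypothetical triplicate methylation values (%) for three sites.
replicates = {
    "site_A": [50, 51, 49],  # mean 50, SD 1 -> CV 2.0 %
    "site_B": [30, 36, 24],  # mean 30, SD 6 -> CV 20.0 %
    "site_C": [80, 82, 78],  # mean 80, SD 2 -> CV 2.5 %
}
print(cv_by_cpg(replicates))
print(least_reproducible(replicates))  # site_B
```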
We obtained three models for epigenetic age prediction. The first was the non-sex-specific model using eight CpGs (CpG1, CpG2, CpG3, CpG4, CpG6, CpG7, CpG8, and CpG9), which showed an adjusted R² of 0.852, an MAE of 4.55, and a root mean square error (RMSE) of 6.01. These results suggest that the model performs well for age prediction.
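For reference, the three performance metrics reported for the models can be computed from observed and predicted ages as below. This is a generic sketch with a hypothetical function name: it assumes R² is computed as 1 - SS_res/SS_tot (rather than as a squared correlation) and applies the usual adjustment for p predictors, 1 - (1 - R²)(n - 1)/(n - p - 1). The ages in the example are invented.

```python
from math import sqrt


def regression_metrics(y_true, y_pred, n_predictors):
    """Return MAE, RMSE, R^2 and adjusted R^2 for age predictions.

    y_true: chronological ages; y_pred: predicted ages (same order);
    n_predictors: number of CpG sites (p) used by the model.
    Assumption: R^2 = 1 - SS_res / SS_tot, not a squared correlation.
    """
    n = len(y_true)
    if n != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("adjusted R^2 requires n > p + 1")
    residuals = [t - yhat for t, yhat in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": sqrt(ss_res / n),
        "R2": r2,
        "adj_R2": 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1),
    }


# Invented example: six people, a hypothetical two-CpG model.
observed = [20, 30, 40, 50, 60, 70]
predicted = [22, 28, 43, 47, 62, 68]
print(regression_metrics(observed, predicted, n_predictors=2))
```

With these invented values the function gives MAE ≈ 2.33, RMSE ≈ 2.38, R² ≈ 0.981, and adjusted R² ≈ 0.968. The adjustment penalizes additional predictors, which is relevant when comparing an eight-CpG model with six-CpG models.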
Interestingly, unlike other studies that observed no significant differences between males and females [43,49,50], we found that some CpGs performed better in sex-specific models for age prediction. We therefore stratified the training samples by sex and fitted a separate model for each sex. The best model for each sex used six CpGs, but the two models did not share all of them. In females, the best predictors of chronological age in control subjects were CpG1, CpG4, CpG6, CpG7, CpG8, and CpG9, whereas in males the best-fitting CpGs were CpG1, CpG2, CpG3, CpG7, CpG8, and CpG9. Thus, each sex-specific model included two CpGs not used in the other (CpG4 and CpG6 for females; CpG2 and CpG3 for males).
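The CpG panels reported for the three models can be cross-checked with simple set operations. The panel contents below are taken from the text; the helper name and sex codes are hypothetical, and the fitted coefficients of each model are not given in this section, so the sketch only selects which sites a model would use.

```python
# CpG panels as listed in the text for the three models.
NON_SEX_SPECIFIC = frozenset({"CpG1", "CpG2", "CpG3", "CpG4", "CpG6", "CpG7", "CpG8", "CpG9"})
FEMALE = frozenset({"CpG1", "CpG4", "CpG6", "CpG7", "CpG8", "CpG9"})
MALE = frozenset({"CpG1", "CpG2", "CpG3", "CpG7", "CpG8", "CpG9"})


def select_panel(sex=None):
    """Hypothetical helper: return the CpG panel for 'F', 'M', or None (non-sex-specific)."""
    panels = {"F": FEMALE, "M": MALE, None: NON_SEX_SPECIFIC}
    if sex not in panels:
        raise ValueError("sex must be 'F', 'M', or None")
    return panels[sex]


shared = FEMALE & MALE       # sites used by both sex-specific models
female_only = FEMALE - MALE  # {'CpG4', 'CpG6'}
male_only = MALE - FEMALE    # {'CpG2', 'CpG3'}
print(sorted(shared))        # ['CpG1', 'CpG7', 'CpG8', 'CpG9']
print((FEMALE | MALE) == NON_SEX_SPECIFIC)  # True
```

The last check shows that the two six-CpG panels together cover exactly the eight sites of the non-sex-specific model, and that CpG5, excluded for its high CV, appears in none of them.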
Recent advances in epigenetic aging research have highlighted the importance of accounting for sex-specific differences in methylation patterns. Although ELOVL2 methylation has long been recognized as a robust marker of chronological aging [28,31], emerging studies indicate that the rate and pattern of epigenetic changes differ between sexes. Kankaanpää et al. provided compelling evidence that epigenetic clocks yield distinct biological age estimates for men and women, potentially reflecting variations in hormonal milieu, lifestyle, and genetic factors [5]. Likewise, Yusipov et al. reported that males exhibit a markedly higher age-associated increase in methylation variability than females [51]. Our sex-specific minimal CpG-based models, which leverage ELOVL2 methylation, extend these observations by capturing subtle yet biologically relevant differences in the epigenetic aging trajectories of men and women. This refined approach not only enhances the accuracy of age estimation but also underscores the potential for personalized aging interventions aimed at mitigating sex-specific health disparities.
The adjusted R² was 0.852 for the six-CpG female model and 0.843 for the six-CpG male model, similar to that of the eight-CpG non-sex-specific model (adjusted R² 0.852, MAE 4.55, RMSE 6.01). However, the six-CpG male model had a lower MAE (4.22) and RMSE (5.82) than the eight-CpG non-sex-specific model, and the six-CpG female model had a lower RMSE (5.96) and more homogeneous MAE values (overall MAE 4.75), that is, MAE values that were more similar across subgroups. Thus, the sex-specific models offered better results for predicting biological age than the model not stratified by sex.
When we evaluated prediction accuracy across age ranges for the eight-CpG non-sex-specific model, the 20–29-year subgroup showed an MAE of 3.67, whereas the 70–79-year subgroup showed an MAE of 7.01. For the six-CpG female model, females aged 20–29 years had an MAE of 5.09, whereas females aged 70–79 years had an MAE of 7.01. The six-CpG male model showed particularly high errors for males aged 70–79 years (MAE 9.40, RMSE 11.49).
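The age-stratified errors above can be obtained by grouping absolute prediction errors into decades of chronological age. The sketch below uses a hypothetical function name and invented ages and predictions; it shows only the bookkeeping, not the study's data, and assumes 10-year bands such as 20–29 and 70–79.

```python
from collections import defaultdict


def mae_by_decade(ages, predicted):
    """Mean absolute error per 10-year band of chronological age.

    Assumption: bands start at multiples of 10 (20-29, 30-39, ...).
    Returns a dict keyed by band label, ordered from youngest to oldest.
    """
    if len(ages) != len(predicted):
        raise ValueError("ages and predicted must have the same length")
    errors = defaultdict(list)
    for age, pred in zip(ages, predicted):
        lower = int(age // 10) * 10  # e.g. 27 -> 20
        errors[lower].append(abs(pred - age))
    return {
        f"{lower}-{lower + 9}": sum(errs) / len(errs)
        for lower, errs in sorted(errors.items())
    }


# Invented example: errors are larger in the older band.
ages = [22, 27, 71, 76]
predicted = [25, 25, 63, 84]
print(mae_by_decade(ages, predicted))  # {'20-29': 2.5, '70-79': 8.0}
```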
All models showed excellent adjusted R² values, comparable to or even higher than those reported in published studies that did not build sex-specific models for age prediction [37,44,52]. MAE and RMSE values also indicated good performance. Hanafi et al. published a systematic analysis of different models and reported MAE values ranging from 0.33 to 7.01 [37,44,52,53]. Our results fall within this range, at the intermediate values described by Hanafi et al. [53].
When evaluating our results, we observed that MAE was greater in older subjects, suggesting that our models become less accurate as chronological age increases. One possible explanation is that comorbidities tend to arise with aging, or that individuals may be affected by age-related conditions that have not yet been diagnosed or become clinically apparent [53]. Recent studies have highlighted the association between ELOVL2 methylation and age-related pathologies, further supporting this hypothesis. For instance, ELOVL2 deficiency has been linked to age-related macular degeneration phenotypes in human retinal pigment epithelium cells [53], suggesting that methylation patterns may be influenced by underlying age-related conditions, potentially affecting prediction accuracy in older individuals. Similarly, research on chronic kidney disease has explored the role of ELOVL2 methylation in renal and cardiovascular events [54], indicating a possible interplay between this epigenetic marker and age-related health deterioration. Although that study found no independent associations after adjustment for covariates, it underscores the complex relationship between ELOVL2 methylation and comorbidities associated with aging. Additionally, evidence suggests that ELOVL2 methylation is associated with metabolic dysfunction and mitochondrial stress [20], both of which increase with age and could contribute to greater variability in methylation patterns among older individuals. Taken together, these findings support the hypothesis that undiagnosed age-related conditions may partly account for the lower prediction accuracy observed in older cohorts.
The three age prediction models generated in this study were also tested in a validation cohort and yielded reliable results, unlike many previously published models that were proposed but not validated [37,39,43]. In the validation cohort, the eight-CpG model showed an adjusted R² of 0.833, an MAE of 5.04, and an RMSE of 6.26. The six-CpG male model showed an R² of 0.771, an MAE of 4.37, and an RMSE of 5.33, and the six-CpG female model showed an R² of 0.714, an MAE of 5.38, and an RMSE of 7.93.
An important opportunity for future improvement in this study is the inclusion of greater diversity within the cohorts used. While the model has demonstrated strong predictive accuracy in estimating biological age, the lack of explicit stratification by ethnic background in the current dataset may limit the generalizability of the results to broader populations. DNA methylation profiles have been shown to vary across ethnic groups due to genetic, environmental, and socioeconomic factors. Therefore, future research would benefit from the inclusion of more diverse cohorts, allowing for a more comprehensive assessment of the model’s accuracy across varying genetic and environmental contexts. Validation in multi-ethnic datasets would not only enhance the robustness of the model but also expand its clinical applicability, ensuring that it performs effectively across a wider range of populations. This perspective provides a valuable direction for further research and clinical applications, contributing to the ongoing refinement and broader implementation of the model.
In conclusion, our analysis of up to nine methylation sites in the ELOVL2 gene proved reliable even after one year of sample storage and was validated through standard statistical procedures. This allowed us to develop robust models for biological age prediction, including an eight-CpG non-sex-specific model and two six-CpG sex-specific models, all with high R² and low MAE and RMSE values. Based on our findings, we recommend applying sex-specific DNA methylation models for age prediction, as different CpG sites are needed to optimize accuracy for each sex. Given the performance and cost-effectiveness of this approach, we propose its evaluation in clinical settings to enhance the precision of age prediction in personalized medicine.