*Article* **Synergistic Effects of Weighted Genetic Risk Scores and Resistin and sST2 Levels on the Prognostication of Long-Term Outcomes in Patients with Coronary Artery Disease**

**Hsin-Hua Chou 1,2, Lung-An Hsu <sup>3</sup> , Jyh-Ming Jimmy Juang 4,5, Fu-Tien Chiang 4,5,6, Ming-Sheng Teng <sup>7</sup> , Semon Wu 7,8 and Yu-Lin Ko 1,2,7,\***


**Abstract:** Resistin and soluble suppression of tumorigenicity 2 (sST2) are useful predictors in patients with coronary artery disease (CAD). Their serum levels are significantly attributed to variations in *RETN* and *IL1RL1* loci. We investigated candidate variants in the *RETN locus* for resistin levels and those in the *IL1RL1* locus for sST2 levels and evaluated the prognostication of these two biomarkers and the corresponding variants for long-term outcomes in the patients with CAD. We included 4652, 557, and 512 Chinese participants from the Taiwan Biobank (TWB), cardiovascular health examination (CH), and CAD cohorts, respectively. Candidate variants in *RETN* and *IL1RL1* were investigated using whole-genome sequence (WGS) and genome-wide association study (GWAS) data in the TWB cohort. The weighted genetic risk scores (WGRS) of *RETN* and *IL1RL1* with resistin and sST2 levels were calculated. Kaplan–Meier curves were used to analyze the prognostication of resistin and sST2 levels, WGRS of *RETN* and *IL1RL1*, and their combinations. Three *RETN* variants (rs3219175, rs370006313, and rs3745368) and two *IL1RL1* variants (rs10183388 and rs4142132) were independently associated with resistin and sST2 levels as per the WGS and GWAS data in the TWB cohort and were further validated in the CH and CAD cohorts. In combination, these variants explained 53.7% and 28.0% of the variation in resistin and sST2 levels, respectively. In the CAD cohort, higher resistin and sST2 levels predicted higher rates of all-cause mortality and major adverse cardiac events (MACEs) during long-term follow-up, but WGRS of *RETN* and *IL1RL1* variants had no impact on these outcomes. A synergistic effect of certain combinations of biomarkers with *RETN* and *IL1RL1* variants was found on the prognostication of long-term outcomes: Patients with high resistin levels/low *RETN* WGRS and those with high sST2 levels/low *IL1RL1* WGRS had significantly higher all-cause mortality and MACEs rates, and those with both these combinations had the poorest outcomes. Both higher resistin and sST2 levels, but not *RETN* and *IL1RL1* variants, predict poor long-term outcomes in patients with CAD. Furthermore, combining resistin and sST2 levels with the WGRS of *RETN* and *IL1RL1* genotyping exerts a synergistic effect on the prognostication of CAD outcomes. Future studies including a large sample size of participants with different ethnic populations are needed to verify this finding.

**Citation:** Chou, H.-H.; Hsu, L.-A.; Juang, J.-M.J.; Chiang, F.-T.; Teng, M.-S.; Wu, S.; Ko, Y.-L. Synergistic Effects of Weighted Genetic Risk Scores and Resistin and sST2 Levels on the Prognostication of Long-Term Outcomes in Patients with Coronary Artery Disease. *Int. J. Mol. Sci.* **2022**, *23*, 4292. https://doi.org/10.3390/ ijms23084292

Academic Editor: Elixabet Lopez-Lopez

Received: 20 March 2022 Accepted: 11 April 2022 Published: 13 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Keywords:** resistin; soluble suppression of tumorigenicity 2; weighted genetic risk score; Taiwan Biobank; coronary artery disease; all-cause mortality; major adverse cardiac events

#### **1. Introduction**

Despite advancements in guideline-directed medical therapy, cardiovascular diseases remain the leading cause of disease burden and mortality worldwide. The global prevalence of cardiovascular disease nearly doubled from 271 million in 1990 to 523 million in 2019, and the number of cardiovascular deaths steadily increased from 12.1 million in 1990 to 18.6 million in 2019, half of which were due to ischemic heart disease [1]. To provide new insights into cardiovascular disease pathophysiology and to improve patient prognosis, investigators in the past two decades have focused on the association of numerous adipokines/cytokines with cardiovascular disease. Among these, resistin and suppression of tumorigenicity 2 (ST2) have attracted increasing interest in the past 10 years.

Resistin, belonging to a family of cysteine-rich proteins known as resistin-like molecules, was originally shown to be synthesized only by adipocytes and was demonstrated to induce insulin resistance in mice [2]. In humans, however, this protein appears to be expressed mainly by monocyte-derived macrophages [3]. As a proinflammatory adipokine in humans, resistin markedly upregulates the expression of inflammatory cytokines and cellular adhesive molecules [4,5]. Resistin appears to mediate the pathogenesis of atherosclerosis by promoting endothelial dysfunction, vascular smooth muscle cell proliferation, arterial inflammation, and foam cell formation [6,7]. It is predictive of atherosclerosis and poor clinical outcomes in patients with coronary artery disease (CAD), ischemic stroke, and congestive heart failure [7,8].

Approximately 70% of the observed variation in circulating resistin levels may be attributable to genetic factors [9], and several candidate genetic loci for resistin levels have been identified in different ethnic populations [9–12]. Genetic polymorphisms around the *RETN* locus are also related to resistin levels and various metabolic phenotypes, with discrepant results [11–16].

The ST2 receptor, a member of the interleukin (IL)-1 receptor family encoded by the IL-1 receptor-like 1 (*IL1RL1*) gene, is expressed as a membrane-bound receptor variant form (ST2L) and a truncated soluble form (sST2) [17]. IL-33, a biomechanically induced protein predominantly synthesized by cardiac fibroblasts, is the functional ligand of ST2L [18,19]. The IL-33/ST2 signaling pathway is upregulated in cardiomyocytes and fibroblasts in response to mechanical stimulation or injury and is cardioprotective in terms of reducing myocardial fibrosis, preventing cardiomyocyte hypertrophy, reducing apoptosis, and improving myocardial function [19,20]. By contrast, sST2 binds to IL-33 and acts as a "decoy" receptor for IL-33 to inhibit IL-33/ST2L signaling [19]. An elevated sST2 level is an independent predictor of subsequent mortality in patients with heart failure, acute myocardial infarction, and stable CAD [21–25].

*IL1RL1* variations can affect sST2 levels. In the genome-wide association study (GWAS) of the Framingham offspring cohort, up to 45% of the variation in sST2 levels not explained by clinical variables was attributed to genetic factors [26]. The most significant single nucleotide polymorphism (SNP) is rs950880, accounting for 12% of the individual variability in circulating sST2 levels. SNPs in the *IL1RL1* gene have also been linked to the severity of several immune and inflammatory diseases [27]. However, the effects of *IL1RL1* on predicting the outcome of cardiovascular disease remain unclear.

Our recent study indicated that individuals with the rs950880 AA genotype tended to have lower sST2 levels [28]; this genotype was an independent predictor of all-cause mortality in patients with CAD and lower-extremity arterial disease, and patients with high sST2 levels and the rs950880 AA genotype had the lowest survival rate. However, whether other biomarker levels can predict long-term outcomes in patients with CAD when combined with level-determining genotypes remains unknown. The Taiwan Biobank

(TWB) conducted a large-scale population-based cohort study on 30–70-year-old volunteers with no history of cancer [29]. The genetic determinants of resistin and sST2 levels were derived from a regional association plot analysis using whole-genome sequence (WGS) data in a subgroup of 859 participants from the TWB cohort and from GWAS in 5000 participants from the TWB cohort. The weighted genetic risk scores (WGRS) of *RETN* and *IL1RL1* with resistin and sST2 levels were calculated using the data of independent leveldetermining genotypes. We hypothesized that WGRS of *RETN* and *IL1RL1* combined with both biomarker levels may better predict the long-term outcomes of patients with CAD.

#### **2. Results**

#### *2.1. WGS Revealed Candidate SNPs in RETN and IL1RL1 Gene Loci*

Given the previously reported *RETN* gene as the candidate locus for resistin levels, we first performed a regional association plot study with conditional analysis using data from 859 participants from the TWB cohort and the significance of resistin levels with 509 SNPs at positions between 7.715 and 7.755 Mb on chromosome 19p13.2 around the *RETN* gene region was assessed. Our data revealed that 17 SNPs exceeded the genome-wide significance threshold (*<sup>p</sup>* < 5 <sup>×</sup> <sup>10</sup>−<sup>8</sup> ), with rs3219175 being the lead SNP (*<sup>p</sup>* = 4.24 <sup>×</sup> <sup>10</sup>−72) (Supplementary Figure S1A). To clarify whether the association of other *RETN* SNPs was independent of the lead SNP, we performed a stepwise conditional analysis. With adjustment for rs3219175, rs370006313 in the regional plot at the *RETN* locus became more significant with resistin levels (*<sup>p</sup>* = 6.56 <sup>×</sup> <sup>10</sup>−68, Supplementary Figure S1B). Furthermore, with adjustment for both rs3219175 and rs370006313, rs3745368 remained significant (*<sup>p</sup>* = 1.24 <sup>×</sup> <sup>10</sup>−<sup>27</sup> , Supplementary Figure S1C). With adjustment for all the three SNPs, none of the SNPs in the regional plot near the *RETN* locus exhibited significance at *p* < 0.01 (Supplementary Figure S1D), indicating that in this chromosomal region, variation in resistin concentrations was mainly explained by at least three signals.

We further evaluated the candidate gene variants for sST2 levels in the *IL1RL1* gene region using a regional association plot study with conditional analysis. In brief, 307 SNPs at positions between 102.915 and 102.985 Mb on chromosome 2q12.1 around the *IL1RL1* gene region were analyzed, and 242 SNPs exceeded the genome-wide significance threshold (*<sup>p</sup>* < 5 <sup>×</sup> <sup>10</sup>−<sup>8</sup> ), with rs6543115 being the lead SNP (*<sup>p</sup>* = 2.35 <sup>×</sup> <sup>10</sup>−85) (Supplementary Figure S2A). With adjustment for rs6543115, rs1420091 in the regional plot at the *IL1RL1* locus became more significant with sST2 levels (*<sup>p</sup>* = 3.57 <sup>×</sup> <sup>10</sup>−15, Supplementary Figure S2B). With adjustment for the two aforementioned SNPs, none of the SNPs initially showed genome-wide significance in the regional plot near the *IL1RL1* locus exhibited significance at *p* < 0.01 (Supplementary Figure S2C).

#### *2.2. GWAS and Replication Genotyping Results for Resistin and sST2 Levels*

We then performed GWAS using the data of 4652 participants from the TWB cohort on a TWB genotype array for resistin levels. Because of the absence of the three lead aforementioned *RETN* SNPs (i.e., rs3219175, rs370006313, and rs3745368) in the TWB genotype array, we performed genotyping of the three *RETN* SNPs using TaqMan SNP Genotyping Assays. GWAS with conditional analysis indicated that the genome-wide significance threshold was exceeded only on chromosome 19p13.2, where *RETN* is located, and serial conditional analysis revealed the lead SNPs to be rs3219175, rs370006313, and rs3745368 (*<sup>p</sup>* < 1.00 <sup>×</sup> <sup>10</sup>−<sup>307</sup> , *<sup>p</sup>* = 2.99 <sup>×</sup> <sup>10</sup>−241, and *<sup>p</sup>* = 1.21 <sup>×</sup> <sup>10</sup>−126, respectively) (Figure 1).

We then performed GWAS using the data of 4652 participants from the TWB cohort on a TWB genotype array for sST2 levels. GWAS with conditional analysis indicated that the genome-wide significance threshold was exceeded only on chromosome 2q12.1, where *IL1RL1* is located, and serial conditional analysis revealed that the lead SNPs were rs10183388 and rs4142132 (*<sup>p</sup>* < 1.00 <sup>×</sup> <sup>10</sup>−<sup>307</sup> and *<sup>p</sup>* = 8.12 <sup>×</sup> <sup>10</sup>−51, respectively) (Figure 2). Further analysis revealed that rs10183388 and rs4142132 were in nearly complete linkage disequilibrium with the *IL1RL1* lead SNPs rs6543115 and rs1420091, respectively, in the

WGS study (r<sup>2</sup> = 0.98 and 0.96, respectively) (Supplementary Figure S3). The three lead *RETN* SNPs explained 53.7% of the variation in resistin levels, and the two lead *IL1RL1* SNPs explained 28.0% of the variation in sST2 levels (Supplementary Table S1)**.** The SNP table of *RETN* and *IL1RL1* is provided in Supplementary Table S2. *Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 4 of 19

**Figure 1.** Conditional analysis of *RETN* candidate SNPs using GWAS data in the TWB cohort. (**A**) Manhattan plots for resistin levels from the genome-wide association study of 4652 Taiwan Biobank participants depict the only one peak above genome-wide significance on chromosome 19p13.2, where the *RETN* gene is located. (**B**) Before conditional analysis, regional association plots for resistin level surrounding the *RETN* locus show rs3219175 as the lead SNP. (**C**) After the first conditional analysis adjusting the rs3219175 genotypes, rs370006313 in the regional plot at the *RETN* locus becomes a significant association with resistin levels. (**D**) After the second conditional analysis adjusting for both rs3219175 and rs370006313 genotypes, rs3745368 is significantly associated with resistin levels. (**E**) After the third conditional analysis adjusting for the aforementioned SNPs, no more single SNP is found to be genome-wide significantly associated with resistin levels. **Figure 1.** Conditional analysis of *RETN* candidate SNPs using GWAS data in the TWB cohort. (**A**) Manhattan plots for resistin levels from the genome-wide association study of 4652 Taiwan Biobank participants depict the only one peak above genome-wide significance on chromosome 19p13.2, where the *RETN* gene is located (arrow). (**B**) Before conditional analysis, regional association plots for resistin level surrounding the *RETN* locus show rs3219175 as the lead SNP. (**C**) After the first conditional analysis adjusting the rs3219175 genotypes, rs370006313 in the regional plot at the *RETN* locus becomes a significant association with resistin levels. (**D**) After the second conditional analysis adjusting for both rs3219175 and rs370006313 genotypes, rs3745368 is significantly associated with resistin levels. (**E**) After the third conditional analysis adjusting for the aforementioned SNPs, no more single SNP is found to be genome-wide significantly associated with resistin levels.

table of *RETN* and *IL1RL1* is provided in Supplementary Table S2.

**Figure 2.** Conditional analysis of *IL1RL1* candidate SNPs using GWAS data in the TWB cohort. (**A**) Manhattan plots for sST2 levels from the genome-wide association study of 4652 Taiwan Biobank participants depict the only one peak above genome-wide significance on chromosome 2q12.1, where the *IL1RL1* gene is located. (**B**) Before conditional analysis, regional association plots for sST2 levels around the *IL1RL1* locus show rs10183388 as the lead SNP. (**C**) After the first conditional analysis adjusting for rs10183388 genotype, rs4142131 in the regional plot at *the IL1RL1* locus becomes a significant association with sST2 levels. (**D**) After the second conditional analysis adjusting for both rs10183388 and rs4142131, no more single SNP is found to be associated with sST2 levels significantly. **Figure 2.** Conditional analysis of *IL1RL1* candidate SNPs using GWAS data in the TWB cohort. (**A**) Manhattan plots for sST2 levels from the genome-wide association study of 4652 Taiwan Biobank participants depict the only one peak above genome-wide significance on chromosome 2q12.1, where the *IL1RL1* gene is located (arrow). (**B**) Before conditional analysis, regional association plots for sST2 levels around the *IL1RL1* locus show rs10183388 as the lead SNP. (**C**) After the first conditional analysis adjusting for rs10183388 genotype, rs4142131 in the regional plot at *the IL1RL1* locus becomes a significant association with sST2 levels. (**D**) After the second conditional analysis adjusting for both rs10183388 and rs4142131, no more single SNP is found to be associated with sST2 levels significantly.

We then performed GWAS using the data of 4652 participants from the TWB cohort on a TWB genotype array for sST2 levels. GWAS with conditional analysis indicated that the genome-wide significance threshold was exceeded only on chromosome 2q12.1, where *IL1RL1* is located, and serial conditional analysis revealed that the lead SNPs were rs10183388 and rs4142132 (*p* < 1.00 × 10−307 and *p* = 8.12 × 10−51, respectively) (Figure 2). Further analysis revealed that rs10183388 and rs4142132 were in nearly complete linkage disequilibrium with the *IL1RL1* lead SNPs rs6543115 and rs1420091, respectively, in the WGS study (r2 = 0.98 and 0.96, respectively) (Supplementary Figure S3). The three lead *RETN* SNPs explained 53.7% of the variation in resistin levels, and the two lead *IL1RL1* SNPs explained 28.0% of the variation in sST2 levels (Supplementary Table S1)**.** The SNP

For the validation study, we analyzed the three lead *RETN* SNPs for resistin levels and two lead *IL1RL1* SNPs for sST2 levels in the cardiovascular health examination cohort (CH cohort). As shown in Table 1**,** the three *RETN* SNPs were significantly associated with resistin levels (rs3219175, rs370006313, and rs3745368 genotypes, *p* = 6.77 × 10−41, *p* = 1.36 × 10−12, and *p* = 2.70 × 10−5, respectively), and the two *IL1RL1* SNPs were significantly associated with sST2 levels (rs10183388 and rs4142132 genotypes, *p* = 9.11 × 10−16 and *p* = 2.73 × 10−8, respectively). For the validation study, we analyzed the three lead *RETN* SNPs for resistin levels and two lead *IL1RL1* SNPs for sST2 levels in the cardiovascular health examination cohort (CH cohort). As shown in Table 1**,** the three *RETN* SNPs were significantly associated with resistin levels (rs3219175, rs370006313, and rs3745368 genotypes, *<sup>p</sup>* = 6.77 <sup>×</sup> <sup>10</sup>−<sup>41</sup> , *<sup>p</sup>* = 1.36 <sup>×</sup> <sup>10</sup>−12, and *<sup>p</sup>* = 2.70 <sup>×</sup> <sup>10</sup>−<sup>5</sup> , respectively), and the two *IL1RL1* SNPs were significantly associated with sST2 levels (rs10183388 and rs4142132 genotypes, *<sup>p</sup>* = 9.11 <sup>×</sup> <sup>10</sup>−<sup>16</sup> and *<sup>p</sup>* = 2.73 <sup>×</sup> <sup>10</sup>−<sup>8</sup> , respectively).

We evaluated the association of the *RETN* and *IL1RL1* SNPs with resistin and sST2 levels in the patients with CAD (Table 1), and the results revealed significant associations, except for rs370006313 genotypes for resistin levels (rs3219175, rs370006313, and rs3745368 genotypes for resistin levels, *<sup>p</sup>* = 4.89 <sup>×</sup> <sup>10</sup>−<sup>15</sup> , *p* = 0.055, and *p* = 0.001, respectively, and rs10183388 and rs4142132 genotypes for sST2 levels, *<sup>p</sup>* = 4.49 <sup>×</sup> <sup>10</sup>−<sup>8</sup> and *<sup>p</sup>* = 8.00 <sup>×</sup> <sup>10</sup>−<sup>6</sup> , respectively).

#### *2.3. Baseline Characteristics of TWB, CH, and CAD Cohorts*

The baseline characteristics of the participants in the TWB, CH, and CAD cohorts are provided in Table 2. The participants in the CAD cohort were older, with male predominance, and they tended to have a higher prevalence of smoking, obesity, hypertension, diabetes mellitus, and dyslipidemia. Compared with the TWB and CH cohorts, the CAD cohort had higher fasting glucose, triglyceride, aspartate aminotransferase (AST), and uric acid levels; higher leukocyte counts; lower total cholesterol, high-density lipoprotein (HDL)-cholesterol, and low-density lipoprotein (LDL)-cholesterol levels; and lower hematocrit, platelet counts, and estimated glomerular filtration rate (eGFR). The patients with CAD also had higher resistin levels but lower sST2 levels.

**Table 1.** Associations of *RETN* SNPs with resistin levels and *IL1RL1* SNPs with sST2 levels in the study populations.


TWB, Taiwan Biobank; CH, cardiovascular healthy examination; CAD, cardiovascular disease; MAF, minor allele frequency; M, major allele; m, minor allele; sST2, soluble suppression of tumorigenicity 2; WGRS, weighted genetic risk score. *p*1: unadjusted, *p*2: adjusted for age, sex, body mass index, and current smoking status.

**Table 2.** Baseline characteristics between TWB, CH, and CAD groups.


Continuous data are presented as the mean ± SD or median (interquartile range), and categorical data are presented as numbers (%). Abbreviations: AST, aspartate aminotransferase; BMI, body mass index; CAD, cardiovascular disease; CH, cardiovascular healthy examination; eGFR, estimated glomerular filtration rate; HDL, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; TWB, Taiwan Biobank. AST, leukocyte count, hematocrit, and platelet counts were not available in the CH cohort. \* Data with skew distribution are logarithmically transformed before statistical testing to meet the assumption of normal distribution. § *p* < 0.01 for χ2 test. <sup>a</sup> *p* < 0.05 vs. Q1 by the Bonferroni method. <sup>b</sup> *p* < 0.001 vs. Q1 by Bonferroni method. <sup>c</sup> *p* < 0.001 vs. Q2 by the Bonferroni method.

#### *2.4. Correlations of Resistin and sST2 Levels with Clinical, Metabolic, Biochemical, Hematological Parameters, and WGRS of SNPs in All Cohorts*

The correlations of resistin and sST2 levels with clinical, metabolic, biochemical, and hematological parameters and WGRS of SNPs are summarized in Supplementary Tables S3–S5. Resistin levels were strongly correlated with *RETN* WGRS in all three cohorts. Resistin levels were also significantly correlated with the lipid profile, eGFR, leukocyte count, and platelet count in the TWB cohort and with eGFR, leukocyte count, and hematocrit in the CAD cohort. Similarly, sST2 levels were strongly correlated with *IL1RL1* WGRS in all three cohorts. sST2 levels were significantly correlated with fasting glucose, lipid profile, AST, uric acid, eGFR, and leukocyte count in the TWB cohort and with eGFR and leukocyte count in the CAD cohort.

#### *2.5. Association of Resistin Levels and RETN SNPs with Long-Term Outcomes for the Patients with CAD*

In the CAD cohort, the follow-up duration was 1017 ± 324 days; 32 patients died and 58 patients developed major adverse cardiac events (MACEs). Kaplan–Meier survival analysis indicated that the patients with higher resistin levels had significantly higher rates of all-cause mortality and MACEs (Figure 3A,D, *<sup>p</sup>* = 3.10 <sup>×</sup> <sup>10</sup>−<sup>4</sup> and *<sup>p</sup>* = 0.007, respectively). The patients with lower *RETN* WGSR also had a higher rate of all-cause mortality, but *RETN* WGRS did not predict the MACEs rate (Figure 3B,E, *p* = 0.042 and *p* = 0.228, respectively). When the patients with CAD were further divided into four subgroups according to the resistin levels and *RETN* WGRS, the combination of high resistin levels and low *RETN* WGRS was a strong predictor of all-cause mortality and MACEs (Figure 3C,F, *<sup>p</sup>* = 4.0 <sup>×</sup> <sup>10</sup>−<sup>6</sup> and *p* = 0.001, respectively). The results of Cox regression analysis of all-cause mortality and MACEs between the groups stratified by the resistin levels and *RETN* WGRS are presented in Supplementary Table S6. *Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 8 of 19 respectively). The patients with lower *RETN* WGSR also had a higher rate of all-cause mortality, but *RETN* WGRS did not predict the MACEs rate (Figure 3B,E, *p* = 0.042 and *p* = 0.228, respectively). When the patients with CAD were further divided into four subgroups according to the resistin levels and *RETN* WGRS, the combination of high resistin levels and low *RETN* WGRS was a strong predictor of all-cause mortality and MACEs (Figure 3C,F, *p* = 4.0 × 10−6 and *p* = 0.001, respectively). The results of Cox regression analysis of all-cause mortality and MACEs between the groups stratified by the resistin levels and *RETN* WGRS are presented in Supplementary Table S6.

**Figure 3.** Kaplan–Meier curve analysis of resistin levels and *RETN* WGRS with long-term outcome in the patients with CAD. (**A**–**C**) Freedom from all-cause mortality in patients stratified by resistin levels, *RETN* WGRS, and combination of resistin levels and *RETN* WGRS. (**D**–**F**) Freedom from MACEs in patients stratified by resistin levels, *RETN* WGRS, and combination of resistin levels and *RETN* WGRS. WGRS, weighted genetic risk scores; MACEs, major adverse cardiac events. Group 1: low resistin levels/low *RETN* WGRS; Group 2: low resistin levels/high *RETN* WGRS; Group 3: high resistin levels/high *RETN* WGRS; Group 4: high resistin levels/low *RETN* WGRS. *2.6. Associations of sST2 Levels and IL1RL1 SNPs with Long-Term Outcome for the Patients with CAD*  **Figure 3.** Kaplan–Meier curve analysis of resistin levels and *RETN* WGRS with long-term outcome in the patients with CAD. (**A**–**C**) Freedom from all-cause mortality in patients stratified by resistin levels, *RETN* WGRS, and combination of resistin levels and *RETN* WGRS. (**D**–**F**) Freedom from MACEs in patients stratified by resistin levels, *RETN* WGRS, and combination of resistin levels and *RETN* WGRS. WGRS, weighted genetic risk scores; MACEs, major adverse cardiac events. Group 1: low resistin levels/low *RETN* WGRS; Group 2: low resistin levels/high *RETN* WGRS; Group 3: high resistin levels/high *RETN* WGRS; Group 4: high resistin levels/low *RETN* WGRS.

The associations of sST2 levels and *IL1RL1* SNPs with long-term outcomes for the patients with CAD are provided in Figure 4. The patients with higher sST2 levels had

MACEs (Figure 4B,E, *p* = 0.331 and 0.971, respectively). When these patients were further divided into four subgroups according to the sST2 levels and *IL1RL1* WGRS, the combination of high sST2 levels and low *IL1RL1* WGRS was a strong predictor of all-cause mortality and MACEs (Figure 4C,F, *p* = 0.001 and 0.005, respectively). The results of Cox regression analysis of all-cause mortality and MACEs between the groups stratified by the

sST2 levels and *IL1RL1* WGRS are provided in Supplementary Table S7.

#### *2.6. Associations of sST2 Levels and IL1RL1 SNPs with Long-Term Outcome for the Patients with CAD*

The associations of sST2 levels and *IL1RL1* SNPs with long-term outcomes for the patients with CAD are provided in Figure 4. The patients with higher sST2 levels had significantly higher rates of all-cause mortality and MACEs (Figure 4A,D, *p* = 0.008 and 0.009, respectively). However, the *IL1RL1* WGRS did not predict all-cause mortality and MACEs (Figure 4B,E, *p* = 0.331 and 0.971, respectively). When these patients were further divided into four subgroups according to the sST2 levels and *IL1RL1* WGRS, the combination of high sST2 levels and low *IL1RL1* WGRS was a strong predictor of all-cause mortality and MACEs (Figure 4C,F, *p* = 0.001 and 0.005, respectively). The results of Cox regression analysis of all-cause mortality and MACEs between the groups stratified by the sST2 levels and *IL1RL1* WGRS are provided in Supplementary Table S7. *Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 9 of 19

**Figure 4.** Kaplan–Meier curve analysis of sST2 levels and *IL1RL1* WGRS with long-term outcomes in the patients with CAD. (**A**–**C**) Freedom from all-cause mortality in patients stratified by sST2 levels, *IL1RL1* WGRS, and combination of sST2 levels and *IL1RL1* WGRS. (**D**–**F**) Freedom from MACEs in patients stratified by sST2 levels, *IL1RL1* WGRS, and combination of sST2 levels and *IL1RL1* WGRS. WGRS, weighted genetic risk scores; MACEs, major adverse cardiac events. Group 1: low sST2 levels/low *IL1RL1* WGRS; Group 2: low sST2 levels/high *IL1RL1* WGRS; Group 3: high sST2 levels/high *IL1RL1* WGRS; Group 4: high sST2 levels/low *IL1RL1* WGRS. *2.7. Synergistic Effects of WGRS with Resistin and sST2 on Predicting Long-Term Outcomes of*  **Figure 4.** Kaplan–Meier curve analysis of sST2 levels and *IL1RL1* WGRS with long-term outcomes in the patients with CAD. (**A**–**C**) Freedom from all-cause mortality in patients stratified by sST2 levels, *IL1RL1* WGRS, and combination of sST2 levels and *IL1RL1* WGRS. (**D**–**F**) Freedom from MACEs in patients stratified by sST2 levels, *IL1RL1* WGRS, and combination of sST2 levels and *IL1RL1* WGRS. WGRS, weighted genetic risk scores; MACEs, major adverse cardiac events. Group 1: low sST2 levels/low *IL1RL1* WGRS; Group 2: low sST2 levels/high *IL1RL1* WGRS; Group 3: high sST2 levels/high *IL1RL1* WGRS; Group 4: high sST2 levels/low *IL1RL1* WGRS.

#### The patients with CAD were further divided into three subgroups according to the presence of high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS *2.7. Synergistic Effects of WGRS with Resistin and sST2 on Predicting Long-Term Outcomes of the Patients with CAD*

(Figure 5). The patients with either high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS had significantly higher rates of all-cause mortality and MACEs, and the patients with both high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS had the worst outcomes (Figure 5A,B, *p* = 3.12 × 10−11 for all-cause mortality and *p* = 4.00 × 10−6 for MACEs). The results of Cox regression analysis of all-cause mortality and MACEs between the groups stratified by the presence of high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS are provided in Table 3. The patients with either high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS and the patients with both high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS had significantly higher rates of all-cause mortality and MACEs after adjustment for baseline characteristics and diseases such as hypertension, diabetes mellitus, or hyperlipidemia. However, predictive power was attenuated after further adjustment for uric acid level, eGFR, and inflammatory markers, including C-reactive protein (CRP), chemerin, and growth differentiation factor (GDF)-15 levels. Furthermore, sig-The patients with CAD were further divided into three subgroups according to the presence of high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS (Figure 5). The patients with either high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS had significantly higher rates of all-cause mortality and MACEs, and the patients with both high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS had the worst outcomes (Figure 5A,B, *<sup>p</sup>* = 3.12 <sup>×</sup> <sup>10</sup>−<sup>11</sup> for all-cause mortality and *<sup>p</sup>* = 4.00 <sup>×</sup> <sup>10</sup>−<sup>6</sup> for MACEs). The results of Cox regression analysis of all-cause mortality and MACEs between the groups stratified by the presence of high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS are provided in Table 3. The patients with either high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS and the patients with both high resistin levels/low *RETN* WGRS and high sST2

> nificantly higher levels of inflammatory biomarkers and lower eGFR were observed in the patients with high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS

*the Patients with CAD* 

levels/low *IL1RL1* WGRS had significantly higher rates of all-cause mortality and MACEs after adjustment for baseline characteristics and diseases such as hypertension, diabetes mellitus, or hyperlipidemia. However, predictive power was attenuated after further adjustment for uric acid level, eGFR, and inflammatory markers, including C-reactive protein (CRP), chemerin, and growth differentiation factor (GDF)-15 levels. Furthermore, significantly higher levels of inflammatory biomarkers and lower eGFR were observed in the patients with high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS (Supplementary Table S8). *Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 10 of 19

**Figure 5.** Kaplan–Meier curve analysis of combining resistin and sST2 levels with corresponding WGRS in predicting freedom from all-cause mortality (**A**) and freedom from MACEs (**B**) in the patients with CAD. WGRS, weighted genetic risk scores; MACEs, major adverse cardiac events. Group 1: patients without high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS; Group 2: patients with either high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS; Group 3, patients with both high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS. **Table 3.** Cox regression analysis of all-cause mortality and MACEs rate between the groups strati-**Figure 5.** Kaplan–Meier curve analysis of combining resistin and sST2 levels with corresponding WGRS in predicting freedom from all-cause mortality (**A**) and freedom from MACEs (**B**) in the patients with CAD. WGRS, weighted genetic risk scores; MACEs, major adverse cardiac events. Group 1: patients without high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS; Group 2: patients with either high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS; Group 3, patients with both high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS.

 **Group 1 \* Group 2 \* Group 3 \***  All-cause mortality **Table 3.** Cox regression analysis of all-cause mortality and MACEs rate between the groups stratified by the presence of high resistin level/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS.

fied by the presence of high resistin level/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS.


*IL1RL1* WGRS; Group 3, patients with both high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS. Abbreviations: HR, hazard ratio; CI, confidence interval. Model 1, Unadjusted. Model 2, Adjusted for age, sex, BMI, and smoking status. Model 3: Adjusted for age, sex, BMI, smoking status, diabetes mellitus, hypertension, and dyslipidemia. Model 4: Adjusted for age, sex, BMI, smoking status, diabetes mellitus, hypertension, dyslipidemia, uric acid level, estimated glomerular filtration rate, CRP levels, chemerin levels, and GDF-15 levels. \* Group 1, patients without high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS; Group 2, patients with either high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS; Group 3, patients with both high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS. Abbreviations: HR, hazard ratio; CI, confidence interval. Model 1, Unadjusted. Model 2, Adjusted for age, sex, BMI, and smoking status. Model 3: Adjusted for age, sex, BMI, smoking status, diabetes mellitus, hypertension, and dyslipidemia. Model 4: Adjusted for age, sex, BMI, smoking status, diabetes mellitus, hypertension, dyslipidemia, uric acid level, estimated glomerular filtration rate, CRP levels, chemerin levels, and GDF-15 levels.

#### **3. Discussion**

In this study, we performed two GWAS analyses for 4652 participants in the TWB cohort as well as regional association plot analyses for a subgroup of 859 participants with WGS data, aiming to elucidate the genetic basis of resistin and sST2 levels in the Chinese population. Three *RETN* variants (rs3219175, rs370006313, and rs3745368) and two *IL1RL1* variants (rs10183388 and rs4142132) were found to be independently associated with resistin and sST2 levels. In combination, these variants explained 53.7% and 28.0% of the variation in resistin and sST2 levels, respectively. These results were confirmed in validation studies in the CH and CAD cohorts. To the best of our knowledge, the three *RETN* variants were the strongest genetic determinants of resistin levels compared with the results of other GWAS analyses in ethnic population studies [13–15,30]. We further evaluated the effect of the combination of biomarker levels and genetic variants on predicting long-term outcomes in the patients with CAD. Both higher resistin and sST2 levels predicted higher rates of all-cause mortality and MACEs in the patients with CAD during long-term follow-up, but not WGRS of *RETN* and *IL1RL1* variants, except a borderline significance of low *RENT* WGRS on all-cause mortality. Notably, we observed a synergistic effect of the combination of biomarkers with WGRS of *RETN* and *IL1RL1* on the prediction of the long-term outcomes in the patients with CAD. The patients with high resistin levels/low *RETN* WGRS and the patients with high sST2 levels/low *IL1RL1* WGRS had significantly higher rates of all-cause mortality and MACEs. Furthermore, the patients with both high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS had the poorest outcomes over long-term follow-up.

#### *3.1. Novel Variants Associated with Resistin Levels*

GWAS of the data of 4652 participants in the TWB cohort revealed the *RETN* gene region as the only locus associated with resistin levels, with rs3219175 as the lead SNP (*<sup>p</sup>* < 1.00 <sup>×</sup> <sup>10</sup>−307). This result confirmed the data from East Asian [11,12,16,30] and African populations [14] but was different from those reported in Caucasian populations, in which *TYW3/CRYZ* and *NDST4* loci, but not *RETN* variants, showed a genome-wide significant association with circulating resistin levels [10]. We used the GWAS data to analyze the *RETN* gene locus and found three *RETN* SNPs, namely rs3219175, rs370006313, and rs3745368, to be independently associated with resistin levels, which contributed to 53.7% of the variation in resistin levels in the TWB cohort. The data were replicated in the CH cohort (41.1% of variations). Notably, minor allele frequencies (MAFs) of two SNPs, rs3219175 and rs3745368, were significantly lower in the Caucasian population than that in East Asian and African populations [11–15]. By contrast, the MAF of the rs370006313 variant was 0.9% in our cohorts. According to the pubmed.gov website, the MAF of rs370006313 was 0% for 1006 European participants and 1322 African participants from the 1000-Genome project. Furthermore, the *RETN* –420C > G (rs1862513) genotypes, initially considered candidate SNPs for resistin levels [9,11,12,16], were not found to be independent determinants of resistin levels in our study population.

These results indicate ethnic heterogeneity not only at the level of gene loci but also at the level of each gene variant. rs370006313 is a novel SNP that is localized within the minimal promoter of human *RETN,* and it has not been previously reported to be associated with resistin levels. Although rs370006313 is a rare variant in the Taiwanese population, the resistin level increased by 4.7 times in the heterozygous state; thus, this SNP contributed to a 13.8% variation in resistin level in the relatively healthy population. Further research is warranted to elucidate the functional significance of this variant.

#### *3.2. Genetic Determinants of sST2 Levels in Taiwan*

In the GWAS analysis, our data revealed that chromosome 2q12.1, where the *IL1RL1* gene is located, was the only gene locus associated with sST2 levels. Further GWAS analysis revealed that the two lead *IL1RL1* SNPs, rs10183388 and rs4142131, were independently associated with sST2 levels and contributed to 28% of the variation; thus, these SNPs are

strong genetic determinants of sST2 levels in the Chinese population. Consistently, the Framingham offspring cohort study reported that all candidate SNPs associated with the sST2 concentration were on chromosome 2q12.1 and that the 11 "independent" genomewide significant SNPs across the *IL1RL1* locus accounted for 36% of the heritability of sST2 levels, with rs950880 as the lead SNP [26]. In the TWB cohort, GWAS analysis also revealed a significant association of rs950880 with sST2 levels (*<sup>p</sup>* = 1.73 <sup>×</sup> <sup>10</sup>−216). Notably, this association disappeared after serial conditional analysis with rs10183388 and rs4142131, suggesting ethnic heterogeneity in *IL1RL1* variants.

#### *3.3. Association of Resistin Levels with Cardiometabolic Phenotypes and Long-Term Outcomes in the Patients with CAD*

As a proinflammatory adipokine, resistin is associated with several cardiometabolic phenotypes, including obesity, diabetes, and hyperlipidemia [31,32]. Elevated resistin levels are also associated with impaired renal function and several inflammatory biomarkers [33,34]. Resistin may also contribute to cholesterol and triglyceride accumulation in macrophages, arterial inflammation, endothelial dysfunction, and angiogenesis [31], leading to accelerated atherogenesis and CAD. In line with these findings, our study also found a strong association of resistin levels with BMI, diabetes mellitus, dyslipidemia, impaired renal function, and elevated leukocyte counts in the TWB cohort. Furthermore, elevated resistin levels were a strong predictor of poor long-term outcomes in the patients with CAD, and the participants with higher resistin levels had significantly higher rates of all-cause mortality and MACEs.

#### *3.4. Association of sST2 Levels with Cardiometabolic Phenotypes and Long-Term Outcome in the Patients with CAD*

sST2, a protein secreted by cultured myocytes subjected to mechanical strain, is a well-known biomarker of severe heart failure and a strong predictor of mortality [21,22]. IL-33, the ligand of sST2, is also induced and released by stretched myocytes. In patients presenting to the emergency department with myocardial infarction with ST elevation and dyspnea, sST2 levels are strongly predictive of heart failure and mortality [23]. In patients with stable CAD, increased sST2 levels also predict long-term MACEs and allcause mortality [24,25]. Recently, sST2 levels have also been found to be a novel biomarker and clinical predictor of metabolic syndrome. In patients without CAD or heart failure, Zong et al. reported that increased sST2 levels were significantly associated with a higher prevalence of hypertension, diabetes mellitus, hypertriglyceridemia, and lower HDL cholesterol levels [35]. Consistently, in the present study, elevated sST2 levels were associated with a higher prevalence of diabetes mellitus, higher BMI, total cholesterol, and lower HDL cholesterol levels in the TWB cohort. Furthermore, elevated sST2 levels were significantly associated with higher AST, uric acid, leukocyte counts, and lower eGFR. For the patients with CAD, elevated sST2 levels predicted poor long-term clinical outcomes. The patients with higher sST2 levels had higher rates of all-cause mortality and MACEs during the follow-up period.

#### *3.5. RENT and IL1RL1 Variants on Long-Term Outcomes in the Patients with CAD*

The results of the association between *RENT* variants and CAD risk were inconsistent in previous studies. The association of *RETN* –420C > G with the risk of CAD was not significant in the Caucasian population [36,37]. By contrast, Tang et al. evaluated the association of *RETN* –420C > G with the presence of CAD and reported a 62% increased risk of CAD in participants with variant genotypes (CG and GG) [38]. Together, these findings suggest the ethnic heterogeneity of *RETN* variants with the risk of CAD. Importantly, none of these studies evaluated the prognostication ability of *RETN* variants for CAD outcomes. In the present study, the patients with CAD with low *RETN* WGRS had an increased all-cause mortality rate during the 4-year follow-up, but low *RETN* WGRS did not predict a high MACEs rate.

The evidence of the association between *IL1RL1* variants and the risk of CAD remains limited. In a case-control association analysis of 4521 individuals with CAD and 4809 controls in the Chinese Han population, Tu et al. reported a strong association of the *IL1RL1* SNP rs11685424 with the risk of CAD and disease severity [39]. Our previous study reported that among patients with CAD and lower-extremity arterial disease, those with the *IL1RL1* SNP rs950880 AA genotype tended to have lower sST2 levels and a lower survival rate [28]. The present study further evaluated the effect of *IL1RL1* variants on the prediction of long-term outcomes in the patients with CAD using the WGS and GWAS data from the TWB cohort. However, the WGRS of the two *IL1RL1* lead SNPs did not predict the outcomes of the patients with CAD, including all-cause mortality and MACEs.

#### *3.6. Synergistic Effects of Genetic Variants and Resistin and sST2 Levels on Predicting Long-Term Outcomes in the Patients with CAD*

When we combined the biomarkers with the WGRS of *RETN* and *IL1RL1*, we observed a synergistic effect on predicting long-term outcomes in the patients with CAD. The patients with high resistin levels/low *RETN* WGRS had the highest all-cause mortality and MACEs rates during the follow-up period (6.2 times for all-cause mortality and 3.1 times for MACEs compared with patients with low resistin levels/low *RETN* WGRS, Figure 3 and Supplementary Table S6). In agreement with this finding, the patients with high sST2 levels/low *IL1RL1* WGRS also had the poorest prognosis during the follow-up period (14.1 times for all-cause mortality and 6.0 times for MACEs compared with the patients with low sST2 level/low *IL1RL1* WGRS, Figure 4 and Supplementary Table S7).

We further analyzed the synergistic effects of combining resistin levels with *RETN* WGRS and sST2 levels with *IL1RL1* WGRS in the prognostication of long-term outcomes in the patients with CAD. The patients with high resistin levels/low *RETN* WGRS or high sST2 levels/low *IL1RL1* WGRS had 5.0 times higher all-cause mortality and 2.6 times higher MACEs rates than those without these presentations (Table 3). Furthermore, the patients with both high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS had the poorest outcomes (19.6 times for all-cause mortality and 5.9 times for the MACEs). These results were further adjusted for traditional cardiovascular risk factors, including age, sex, BMI, smoking status, diabetes mellitus, hypertension, and dyslipidemia, and adjusted for known predictors of poorer cardiovascular outcomes, such as uric acid levels [40], eGFR [41], and inflammatory biomarkers. These findings remained unchanged after adjustment for traditional cardiovascular risk factors but were attenuated after further adjustment of uric acid levels, eGFR, and inflammatory markers such as CRP, chemerin, and GDF-15 levels. Furthermore, significantly higher levels of CRP, chemerin, and GDF-15 and lower eGFR were observed in the patients with high resistin levels/low *RETN* WGRS and/or high sST2 levels/low *IL1RL1* WGRS (Supplementary Table S8). These findings indicate that inflammation and impaired renal function may explain the poorer outcomes in the patients with high biomarker levels and low WGRS for the corresponding genetic variants.

#### *3.7. Study Limitations*

Although we had enrolled a large sample size with available WGS and GWAS data in the TWB cohort, the sample sizes in the validation group (CH cohort) and study group (CAD cohort) were relatively small. However, the candidate SNPs in the *RETN* and *IL1RL1* genes were consistent in the three groups, providing strong evidence of the lead SNPs for *RETN* and *IL1RL1* in the Chinese population. Second, the event rates of all-cause mortality and MACEs were low in the CAD cohort. Nevertheless, we found a significant prediction power of resistin and sST2 levels for outcomes in patients with CAD, as well as a synergistic effect when the resistin and sST2 levels were combined with the WGRS of *RETN* and *IL1RL1*. Future studies should include a large sample size of patients with CAD to verify this finding. Finally, because we only included individuals of Han Chinese ethnicity, our results cannot be extended to other ethnic groups.

#### **4. Materials and Methods**

#### *4.1. Study Populations*

#### 4.1.1. TWB Cohort

The study cohort for the GWAS comprised 5000 participants from the TWB cohort, and WGS data were available for 859 of them. The data were collected from recruitment centers across Taiwan between 2008 and 2015. The inclusion criteria were participants without a history of cancer, stroke, CAD, or systemic disease. All participants self-reported having Han Chinese ethnicity. After participants were excluded based on the exclusion criteria, 4652 participants remained and were included in the GWAS (Supplementary Figure S4). Supplementary Table S9 shows the definitions of hypertension, diabetes mellitus, hyperlipidemia, and current smoking status. The Research Ethics Committee of Taipei Tzu Chi Hospital (approval number: 05-X04-007), Buddhist Tzu Chi Medical Foundation, and Ethics and Governance Council of the Taiwan Biobank (approval number: TWBR10507-02 and TWBR10611-03) approved our study. Written informed consent was obtained from all participants before participation.

#### 4.1.2. CH Cohort

The validation group was recruited during routine cardiovascular health examinations from October 2003 to September 2005 at Chang Gung Memorial Hospital and comprised 617 Han Chinese participants (327 men with a mean age of 45.2 ± 10.5 years and 290 women with a mean age of 46.8 ± 10.1 years), who responded to a questionnaire on their medical history and lifestyle characteristics. A total of 60 participants were excluded from the current study, and 557 participants were enrolled in the analysis (Supplementary Figure S4). All of the participants provided written informed consent, and the study was approved by the Ethics Committee of Chang Gung Memorial Hospital and the Ethics Committee of Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation.

#### 4.1.3. CAD Cohort

In the CAD cohort, 565 patients who received coronary angiography, had at least 50% stenosis of one major coronary artery, and had available blood samples for DNA and biomarker analyses were recruited between July 2010 and September 2013 from National Taiwan University Hospital. Among these, 53 patients were excluded from the current study, and 512 patients were included in the analysis (Supplementary Figure S4). All clinical data were obtained from the patients' medical records. The primary enpoint was all-cause mortality. The secondary endpoint was MACEs, including the composite endpoints of all-cause mortality, hospitalization for heart failure, nonfatal myocardial infarction, or nonfatal stroke. Seven patients who were lost to follow-up after enrollment were contacted by telephone before the end of the study. Three of these patients had died, and the cause of death was provided by the relatives. All of the participants provided written informed consent, and the study was approved by the Research Ethics Committee of National Taiwan University Hospital.

#### *4.2. Laboratory Examination*

We examined the following clinical phenotypes: body height, body weight, body mass index (BMI), and systolic, mean, and diastolic blood pressure. In addition, we collected the following biochemical data: lipid profile, including total cholesterol, HDL cholesterol, LDL cholesterol, and triglyceride levels; fasting plasma glucose; and liver and renal functional test-related parameters such as serum creatinine, eGFR, uric acid, and AST. Hematological parameters included white and red blood cell counts, platelet counts, and hematocrit. Circulating levels of resistin, sST2, and inflammatory markers such as chemerin and GDF-15 were measured using commercially available enzyme-linked immunosorbent assay kits (R&D, Minneapolis, MN, USA). Circulating plasma levels of CRP were measured using the particle-enhanced turbidimetric immunoassay technique (Siemens Healthcare Diagnostics, Camberley, UK). The increase in turbidity that accompanies aggregation is proportional

to the CRP concentration. Overall, the intra- and interassay coefficients of variation were 1.2–9.5% (Supplementary Table S10).

#### *4.3. Genomic DNA Extraction and Genotyping*

DNA of the participants was isolated from blood samples using a QIAamp DNA blood kit following the manufacturer's instructions (Qiagen, Valencia, CA, USA). SNP genotyping was conducted using custom TWB chips and was run on the Axiom Genome-Wide Array Plate System (Affymetrix, Santa Clara, CA, USA). Genotyping for *RETN* rs3219175, rs370006313, and rs3745368 genotypes in the participants from the TWB cohort, the participants from the CH cohort, and the patients with CAD as well as for *IL1RL1* rs10183388 and rs4142132 genotypes were performed using TaqMan SNP Genotyping Assays (Applied Biosystems, Foster City, CA, USA). For quality control purposes, approximately 10% of the samples were re-genotyped blindly, and identical results were obtained.

#### *4.4. TWB Whole-Genome Sequencing: RETN and IL1RL1 Gene Region*

The WGS data of 859 participants from the TWB cohort were evaluated using an ultrafast whole-genome secondary analysis on Illumina sequencing platforms [42] (Illumina HiSeq 2500/4000). The resulting reads were aligned to the hg19 reference genome with iSAAC 01.13.10.21. iSAAC Variant Caller 2.0.17 was used to perform SNP and insertiondeletion variant discovery and genotyping [42]. An in-house protocol written in shell script was performed to combine 880 vcf files. A union table of all detected variants in 880 vcf files was used for further analysis. Two gene loci linked to resistin and sST2 levels, according to the GWAS results, were included in the analysis, namely 20 Kb around the *RETN* and *IL1RL1* gene regions. The association between SNPs and resistin and sST2 levels was then analyzed using the GWAS method.

#### *4.5. GWAS Analysis*

The Axiom Genome-Wide CHB 1 Array Plate (Affymetrix) designed by the Taiwan Biomarker Study Group is a TWB genotype array used for GWAS analysis. In this genotyping platform, SNPs with minor allele frequencies of ≥5% in a set of 1950 samples were selected from Taiwan Han Chinese populations previously genotyped at the National Center of Genome Medicine of the Academia Sinica, Taipei, Taiwan [43]. Each genomic DNA was genotyped on the Axiom TWB genome-wide array comprising 642,832 SNPs with assistance from the National Center of Genome Medicine of Academia Sinica. All the samples in the analysis had a call rate of ≥97%. For SNP quality control, SNP call rate < 97%, minor allele frequency <0.01, and violation of Hardy–Weinberg equilibrium (*p* < 10−<sup>6</sup> ) were criteria for exclusion from subsequent analyses. In total, 4652 participants and 614,821 SNPs were included in the GWAS analysis after quality control.

#### *4.6. Statistical Analysis*

Before analysis, resistin and sST2 levels were logarithmically transformed to adhere to a normality assumption. A generalized linear model was used to analyze resistin and sST2 levels in relation to the investigated genotypes and confounders. We assumed the genetic effect to be additive after adjustment for age, sex, BMI, and current smoking status. The software package PLINK was used to conduct genome-wide scans, and *<sup>p</sup>* < 5 <sup>×</sup> <sup>10</sup>−<sup>8</sup> was considered genome-wide significant. For GWAS, a conditional analysis was conducted to assess the residual association with all remaining SNPs after adjustment for the most strongly associated SNP at a locus by adding the SNP as a covariate into the regression model.

The baseline characteristics of all participants were evaluated using analysis of variance. Continuous variables are expressed as mean ± standard deviation or median and interquartile ranges when the distribution is strongly skewed. Differences in categorical data distribution were examined using a chi-square test or a chi-square test for trends. The Bonferroni method was used for post-hoc analysis after ANOVA. The genetic risk score

was calculated using the weighted method, which assumed each SNP to be independently associated with resistin or sST2 levels (i.e., no interaction between each SNP) [44]. We assumed an additive effect of risk alleles for each SNP and applied linear weighting of 0, 1, or 2 to genotypes containing a corresponding number of risk alleles. WGRS were calculated by multiplying the estimated beta-coefficient of each SNP by the number of corresponding risk alleles (0, 1, or 2). Pearson's correlation coefficients were used to examine the relationship of resistin and sST2 levels with clinical and biochemical factors in addition to WGRS. Each variable with a significant association with resistin and sST2 was entered into the multivariate linear regression model with the stepwise method to identify the independent correlates of resistin and sST2.

The rates of freedom from primary and secondary endpoints in the patients with CAD, stratified by the levels and resistin and sST2 and the WGRS of *RETN* and *IL1RL1* genotyping, were assessed using Kaplan–Meier curves and compared using the longrank test. The outcomes were further analyzed by combining WGRS according to *RETN* and *IL1RL1* genotyping with both biomarker levels. Cox regression analysis was used to determine the hazard ratio of primary and secondary endpoints in each group. IBM SPSS Statistics version 24.0 (IBM Corp., Armonk, NY, USA) was used to perform all calculations, with two-sided *p* < 0.05 set as the statistically significant level.

#### **5. Conclusions**

In the GWAS of the TWB cohort, we found that three *RETN* lead SNPs (rs3219175, rs370006313, and rs3745368) were strongly associated with resistin levels and that two *IL1RL1* lead SNPs (rs10183388 and rs4142132) were significantly associated with sST2 levels in the Chinese population. These results were validated in the CH cohort and CAD cohort. Serum resistin and sST2 levels were significantly associated with cardiometabolic risk factors in the TWB and CAD cohorts and were strong predictors of poor clinical outcomes in patients with CAD. Furthermore, a synergistic effect was noted when combining resistin and sST2 levels with the WGRS of *RETN* and *IL1RL1* in the prognostication of CAD outcomes. Patients with high resistin levels/low *RETN* WGRS and high sST2 levels/low *IL1RL1* WGRS had the poorest outcome during long-term follow-up. This study provides insights into the effects of biomarkers and corresponding genetic variants on the prognosis of long-term clinical outcomes in patients with CAD.

**Supplementary Materials:** The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/ijms23084292/s1.

**Author Contributions:** Conceptualization, Y.-L.K. and H.-H.C.; methodology, Y.-L.K., H.-H.C., M.-S.T. and S.W.; formal analysis, H.-H.C., Y.-L.K., M.-S.T. and S.W.; resources, L.-A.H., J.-M.J.J. and F.-T.C.; data curation, L.-A.H., J.-M.J.J. and F.-T.C.; writing—original draft preparation, H.-H.C.; writing review and editing, Y.-L.K.; visualization, Y.-L.K.; project administration, Y.-L.K.; supervision, Y.-L.K.; funding acquisition, H.-H.C. and Y.-L.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by grants from Buddhist Tzu Chi Medical Foundation Academic Advancement (TCMF-EP 111-02, TCMF-A 107-01-15), grants from the Ministry of Science and Technology (MOST 108-2314-B-303-026-MY3) and Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation (TCRD-TPE-MOST-109-05) to Y.-L.K., and Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation (TCRD-TPE-109-RT-1) to H.-H.C.

**Institutional Review Board Statement:** The study was conducted in accordance with the Declaration of Helsinki. The Research Ethics Committee of Taipei Tzu Chi Hospital (approval number: 05-X04- 007), Buddhist Tzu Chi Medical Foundation, had approved the present study. The TWB study cohort had been approved by the Ethics and Governance Council of the Taiwan Biobank (approval number: TWBR10507-02 and TWBR10611-03). The Cardiovascular Healthy examination cohort had been approved by the Ethics Committee of Chang Gung Memorial Hospital, and the CAD study cohort was approved by the Research Ethics Committee of National Taiwan University Hospital. Written informed consent was obtained from all participants before participation.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** We greatly appreciate technical support from the Core Laboratory of the Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, and expert statistical analysis assistance from Tsung-Han Hsieh.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Differential Genetic and Epigenetic Effects of the** *KLF14* **Gene on Body Shape Indices and Metabolic Traits**

**Semon Wu <sup>1</sup> , Lung-An Hsu <sup>2</sup> , Ming-Sheng Teng <sup>3</sup> , Hsin-Hua Chou 4,5 and Yu-Lin Ko 3,4,5,\***

	- New Taipei City 23142, Taiwan; vincent@tzuchi.com.tw

**Abstract:** The *KLF14* gene is a key metabolic transcriptional transregulator with monoallelic maternal expression. *KLF14* variants are only associated with adipose tissue gene expression, and *KLF14* promoter methylation is strongly associated with age. This study investigated whether age, sex, and obesity mediate the effects of *KLF14* variants and DNA methylation status on body shape indices and metabolic traits. In total, the data of 78,742 and 1636 participants from the Taiwan Biobank were included in the regional plot association analysis for *KLF14* variants and *KLF14* methylation, respectively. Regional plot association studies revealed that the *KLF14* rs4731702 variant and the nearby strong linkage disequilibrium polymorphisms were the lead variants for lipid profiles, blood pressure status, insulin resistance surrogate markers, and metabolic syndrome mainly in female participants and for body shape indices mainly in obese women. Significant age-dependent associations between *KLF14* promoter methylation levels and body shape indices, and metabolic traits were also noted predominantly in female participants. *KLF14* variants and *KLF14* hypermethylation status were associated with metabolically healthy and unhealthy phenotypes, respectively, in obese individuals, and only the *KLF14* variants demonstrated a significant association with both higher adiposity and lower cardiometabolic risk in the same allele, revealing uncoupled excessive adiposity from its cardiometabolic comorbidities, especially in obese women. Variations of *KLF14* are associated with body shape indices, metabolic traits, insulin resistance, and metabolically healthy status. Differential genetic and epigenetic effects of *KLF14* are age-, sex- and obesity-dependent. These results provided a personalized reference for the management of cardiometabolic diseases in precision medicine.

**Keywords:** *KLF14*; body shape indices; metabolic traits; differential effect; genetic variants; DNA methylation

#### **1. Introduction**

Krüpple-like factors (KLFs), belonging to the SP/KLF family, are a group of evolutionarily conserved transcription factors with C-terminal zinc finger domains that are crucial for the recognition of and binding to the GT/GC-rich or CACCC box cis-regulatory sites in gene promoters and enhancers [1,2]. KLFs play a critical role in homeostasis maintenance by regulating the gene expression in many organ systems, such as cardiovascular, renal, digestive, respiratory, immunological, and hematopoietic systems [3–6]. KLFs also regulate cell signaling pathways controlling cell proliferation, apoptosis, migration, differentiation, and other physiological activities [4]. As a member of the KLF family, KLF14 is induced by TGF-β in intraembryonic and ectodermal tissues [4,6] and forms a corepressor transcription

**Citation:** Wu, S.; Hsu, L.-A.; Teng, M.-S.; Chou, H.-H.; Ko, Y.-L. Differential Genetic and Epigenetic Effects of the *KLF14* Gene on Body Shape Indices and Metabolic Traits. *Int. J. Mol. Sci.* **2022**, *23*, 4165. https://doi.org/10.3390/ ijms23084165

Academic Editor: Elixabet Lopez-Lopez

Received: 9 March 2022 Accepted: 6 April 2022 Published: 9 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

complex with Sin3A and HDAC2 to inhibit the expression of TGF-II receptors. KLF14 is only expressed in mammalian tissues, including the muscle, brain, fat, and liver, and it has monoallelic maternal expression in all tissues of both humans and mice [7]. In the livers of high-fat-diet–fed mice and db/db obese mice, KLF14 activates the PI3K/Akt signaling pathway to increase insulin sensitivity [8]. It also mediates lipid signaling, and KLF14-mediated metabolic phenotypes are attributed to the regulation of lipid signaling through the dysregulation of sphingokinase 1, a critical lipid signaling molecule [5,9,10]. With the lack of *Klf14*-mediated apoA-I promoter activation, hepatic-specific *Klf14* deletions in *Apoe*−*/*<sup>−</sup> mice cause marked acceleration of atherosclerotic lesion development in combination with decreased high-density lipoprotein (HDL) cholesterol levels, and cholesterol efflux, whereas Klf14 activation reduces atherosclerosis [11]. Together, these findings indicate that epigenetic regulation–induced *KLF14* loss-of-function may trigger the onset of metabolic diseases. Furthermore, KLF14 is a master metabolic transcriptional regulator that mediates adipogenesis, insulin signaling, lipid metabolism, inflammatory and immune responses, and cell proliferation and differentiation, thus acting as a potential novel therapeutic target to reduce the risk of atherosclerotic cardiovascular disease [12].

The *KLF14* gene is a single exon imprinted gene localized in the human chromosome 7q32.2, with a total length of 1059 bp; it encodes 323 amino acids, and only the allele inherited from the mother is expressed [13]. Using chromatin-state maps, a 1.6-kb enhancer region approximately 5 kb upstream of the *KLF14* transcription start site has been identified in adipose tissue [14]. Genome-wide association studies (GWASs) have identified *KLF14* variants to be associated with a multitude of metabolic pathologies, such as insulin resistance (IR), diabetes mellitus, coronary artery disease, ischemic stroke, and myocardial infarction [14–17]. Lead single-nucleotide polymorphisms (SNPs) in GWASs map to a 3–48-kb region upstream of *KLF14*, and these mapped genetic variants are also associated with the abundance of the *KLF14* transcript only in adipose tissue [14,18]. The associations are usually stronger in women than in men. Thus, the potential interactive effect of sex and obesity on the association between *KLF14* variants and cardiometabolic phenotypes required further elucidation.

Aging is an important risk factor for chronic metabolic and inflammatory disorders such as atherosclerosis, cancer, and type 2 diabetes mellitus [19–21]. CpG sites located in *KLF14* are associated with aging-related alteration of hypermethylation in whole blood samples, thus providing an accurate estimate of chronological age and serving as agerelated epigenetic biomarkers [22–24]. In addition, the age-related DNA methylation level of *KLF14* may be a risk marker for diabetes mellitus [25]. Recently, extensive studies have also focused on the genetic loci of metabolically healthy phenotype in obese individuals and of genetic loci that uncouple excessive adiposity from its cardiometabolic comorbidities [26,27]. The Taiwan Biobank (TWB) population-based cohort study enrolled >100,000 volunteers aged 30–70 years with no history of cancer [28,29]. In the current study, we included participants from the TWB cohort study to elucidate the effects of age, sex, and obesity on the association of *KLF14* variants and methylation status with conventional and allometric body shape indices, IR surrogate marker levels, various metabolic traits, and metabolic syndrome. We also assessed whether *KLF14* variants and methylation levels are markers of metabolically healthy obese phenotype and whether these variations uncouple excessive adiposity from the adverse metabolic traits in Taiwan.

#### **2. Results**

*2.1. Regional Plot Association Studies for Genetic Variants at Positions between 130.3 and 130.5 Mb on Chromosome 7q32.2 and Study Phenotypes*

The flowchart of participant enrollment is presented in Figure 1. We performed regional plot association studies in 78,742 TWB participants to examine the association between genetic variants at positions between 130.3 and 130.5 Mb on chromosome 7q32.2 and the studied phenotypes. Lead SNPs with genome-wide significance were noted for the HDL cholesterol level (rs4731702, *<sup>p</sup>* = 6.69 <sup>×</sup> <sup>10</sup>−14), triglyceride level (rs13240528,

*<sup>p</sup>* = 1.46 <sup>×</sup> <sup>10</sup>−16), metabolic syndrome (rs3996352, *<sup>p</sup>* = 1.30 <sup>×</sup> <sup>10</sup>−13), mean blood pressure (BP) (rs1364422, *<sup>p</sup>* = 3.59 <sup>×</sup> <sup>10</sup>−<sup>8</sup> ), and hip index (HI) (rs35057928, *<sup>p</sup>* = 3.48 <sup>×</sup> <sup>10</sup>−14), whereas associations with *<sup>p</sup>* < 1.41 <sup>×</sup> <sup>10</sup>−<sup>4</sup> were noted for body mass index (BMI) (rs3996352, *<sup>p</sup>* = 1.58 <sup>×</sup> <sup>10</sup>−<sup>6</sup> ), a body shape index (ABSI) (rs34072724, *<sup>p</sup>* = 2.14 <sup>×</sup> <sup>10</sup>−<sup>6</sup> ), hip circumference (rs35057928, *<sup>p</sup>* = 3.54 <sup>×</sup> <sup>10</sup>−<sup>7</sup> ), and waist circumference (rs34072724, *<sup>p</sup>* = 1.41 <sup>×</sup> <sup>10</sup>−<sup>5</sup> ) (Table 1 and Supplementary Figures S1–S10). Except for rs1364422, which is the lead SNP for mean BP, all other lead SNPs were in nearly complete LD with rs4731702 (r<sup>2</sup> > 0.95) (Supplementary Figure S11). Our data revealed that the lead SNP for each phenotype was located upstream of the *KLF14* gene region, revealing pleiotropic effects on this gene locus. ciations with *p* < 1.41 × 10−4 were noted for body mass index (BMI) (rs3996352, *p* = 1.58 × 10−6), a body shape index (ABSI) (rs34072724, *p* = 2.14 × 10−6), hip circumference (rs35057928, *p* = 3.54 × 10−7), and waist circumference (rs34072724, *p* = 1.41 × 10−5) (Table 1 and Supplementary Figures S1–S10). Except for rs1364422, which is the lead SNP for mean BP, all other lead SNPs were in nearly complete LD with rs4731702 (r<sup>2</sup> > 0.95) (Supplementary Figure S11). Our data revealed that the lead SNP for each phenotype was located upstream of the *KLF14* gene region, revealing pleiotropic effects on this gene locus.

*2.1. Regional Plot Association Studies for Genetic Variants at Positions between 130.3 and 130.5* 

The flowchart of participant enrollment is presented in Figure 1. We performed regional plot association studies in 78,742 TWB participants to examine the association between genetic variants at positions between 130.3 and 130.5 Mb on chromosome 7q32.2 and the studied phenotypes. Lead SNPs with genome-wide significance were noted for the HDL cholesterol level (rs4731702, *p* = 6.69 × 10−14), triglyceride level (rs13240528, *p* = 1.46 × 10−16), metabolic syndrome (rs3996352, *p* = 1.30 × 10−13), mean blood pressure (BP) (rs1364422, *p* = 3.59 × 10−8), and hip index (HI) (rs35057928, *p* = 3.48 × 10−14), whereas asso-

*Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 3 of 22

*Mb on Chromosome 7q32.2 and Study Phenotypes*

**2. Results**


**Figure 1.** Study inclusion and exclusion criteria flowchart. This flowchart presents the inclusion and exclusion criteria used to screen for Taiwan Biobank (TWB) participants. GWAS—genome-wide association study; QC—quality control; HL—hyperlipidemia, HTN—hypertension; DM—diabetes mellitus; FPG—fasting plasma glucose; HbA1c—glycated hemoglobin; SBP—systolic blood pressure; DBP—diastolic blood pressure; MBP—mean blood pressure; LDL-C:—low-density lipoprotein cholesterol; HDL-C—high-density lipoprotein cholesterol; TG—triglyceride; T-CHO—total cholesterol. Other phenotypes include age, body mass index (BMI), waist circumference, hip circumference, waist–hip ratio, a body shape index (ABSI), waist–hip index (WHI), hip index (HI), the product of triglyceride and fasting plasma glucose (the TyG index), TyG with adiposity status (TyG-BMI), and TyG with waist circumference (TyG-WC), current smoking status, and metabolic syn-**Figure 1.** Study inclusion and exclusion criteria flowchart. This flowchart presents the inclusion and exclusion criteria used to screen for Taiwan Biobank (TWB) participants. GWAS—genome-wide association study; QC—quality control; HL—hyperlipidemia, HTN—hypertension; DM—diabetes mellitus; FPG—fasting plasma glucose; HbA1c—glycated hemoglobin; SBP—systolic blood pressure; DBP—diastolic blood pressure; MBP—mean blood pressure; LDL-C:—low-density lipoprotein cholesterol; HDL-C—high-density lipoprotein cholesterol; TG—triglyceride; T-CHO—total cholesterol. Other phenotypes include age, body mass index (BMI), waist circumference, hip circumference, waist–hip ratio, a body shape index (ABSI), waist–hip index (WHI), hip index (HI), the product of triglyceride and fasting plasma glucose (the TyG index), TyG with adiposity status (TyG-BMI), and TyG with waist circumference (TyG-WC), current smoking status, and metabolic syndrome.

drome.


**Table 1.** Lead single nucleotide polymorphisms (SNPs) for various phenotypes at the *KLF14* gene region.

HDL—high-density lipoprotein; Ref—Reference allele; Alt—Alternate allele; MAF—Minor allele frequency; \* LD—Linkage disequilibrium between rs4731702 and lead SNPs.

#### *2.2. Association of the KLF14 rs4731702 Genotype with Parameters of Body Shape Indices, IR Surrogate Markers, and Metabolic Traits*

Because of nearly complete LD between lead *KLF14* polymorphisms, we selected *rs4731702*, a polymorphism associated with adipocyte size and body composition traits on *KLF14* [14], for further genotype–phenotype analysis of the various clinical phenotypes and laboratory parameters of 78,742 participants. By using an additive model, after adjustment for age, sex, BMI, and smoking status, we observed genome-wide significant associations of *rs4731702* with HI, HDL cholesterol, and triglyceride levels and metabolic syndrome, whereas associations with *<sup>p</sup>* < 2.65 <sup>×</sup> <sup>10</sup>−<sup>4</sup> were noted for hip circumference, waist circumference, body fat percentage (BFP), BMI, ABSI, systolic BP, diastolic BP, and mean BP. The products of triglyceride (TG) and fasting plasma glucose (the TyG index), TyG and body mass index (TyG-BMI), TyG and waist circumference (TyG-WC), and hypertension (Supplementary Table S1) influence type 2 diabetes risk via a female-specific effect on adipocyte size and body composition.

#### *2.3. Interactive Effects of Sex and Obesity on the Association between the KLF14 rs4731702 Genotype and Various Phenotypes*

We investigated whether sex and obesity affect the association between the *rs4731702* genotype and the studied phenotypes. As presented in Table 2, for female participants, genome-wide significant associations were found between the *rs4731702* genotype and hip circumference, HI, HDL cholesterol and triglyceride levels, TyG index, TyG-BMI, and metabolic syndrome. Significant associations with *<sup>p</sup>* < 2.65 <sup>×</sup> <sup>10</sup>−<sup>4</sup> were found between the *rs4731702* genotype and waist circumference, BMI, ABSI, diastolic and mean BP, TyG-WC, and diabetes mellitus in women. However, no significant associations were found between the *rs4731702* genotype and the studied phenotypes in men. The results of the two-sample *t* tests revealed that sex affected the association between the *KLF14* rs4731702 genotype and waist circumference, hip circumference, ABSI, HI, HDL cholesterol, and triglyceride levels (Table 2). These results suggest that the association between the *rs4731702* genotype and several phenotypes is sex dependent. We further evaluated the genotype–phenotype associations of *rs4731702* in obese and nonobese participants. Differential associations according to adiposity status were noted only in body shape indices, including waist and hip circumferences and ABSI, with significant associations occurring only in obese participants but not in nonobese participants (Supplementary Table S2). We then divided the participants into four groups according to sex and obesity. No obvious associations were noted between the *rs4731702* genotype and various phenotypes in men, whether obese or nonobese (Supplementary Table S3). In women, differential associations with adiposity status were noted between the *rs4731702* genotype and body shape indices but not between the *rs4731702* genotype and metabolic traits (Table 3).

*Int. J. Mol. Sci.* **2022**, *23*, 4165

**Clinical and Laboratory Parameters Male (***n* **= 28,483) Female (***n* **= 50,259) Median (IQR) Beta SE** *p* **Value \* Median (IQR) Beta SE** *p* **Value \*** Age (years) 51.0 (40.0–59.0) −0.0034 0.1001 0.9733 51.0 (41.0–58.0) 0.0311 0.0705 0.6596 Body shape indices Body height (cm) 169.5 (165.5–173.5) 0.0041 0.0524 0.9377 157.5 (153.5–161.0) −0.0984 0.0360 0.0063 Body weight (kg) 71.7 (65.1–79.2) 0.0108 0.0457 0.8131 56.9 (51.7–63.4) −0.0719 0.0272 0.0081 Hip circumference (cm) 97.0 (93.0–101.3) 0.0175 0.0328 0.5927 94.0 (90.0–99.0) 0.1483 0.0261 1.40 × 10−8 ¶ Waist circumference (cm) 87.0 (82.0–93.0) 0.0295 0.0395 0.4549 79.6 (74.0–86.0) 0.1718 0.0368 2.98 × 10−6 ¶ Waist–hip ratio 0.90 (0.86–0.94) 0.0001 0.0004 0.8457 0.84 (0.80–0.89) 0.0003 0.0004 0.4090 Body fat percentage (%) 22.9 (19.6–26.1) 0.0424 0.0253 0.0939 31.6 (27.6–35.8) 0.0540 0.0153 4.17 × 10−4 Body mass index (kg/m2 ) 25.0 (23.0–27.3) 0.0701 0.0315 0.0262 23.0 (21.0–25.5) 0.0995 0.0251 7.19 × 10−5 ABSI 7.8 (7.6–8.1) 0.0015 0.0032 0.6524 7.8 (7.5–8.2) 0.0177 0.0034 2.34 × 10−7 ¶¶ WHI 4.0 (3.9–4.1) 0.0002 0.0017 0.9198 3.8 (3.7–4.0) 0.0014 0.0018 0.4300 HI 14.6 (14.3–14.9) 0.0022 0.0041 0.5900 15.5 (15.1–15.8) 0.0303 0.0036 2.16 × 10−17 ¶¶ Blood pressure and heart rate Mean heart rate † (/min) 34.5 (32.0–38.0) −0.0969 0.0952 0.3088 35.0 (32.5–38.0) 0.0109 0.0620 0.8599 Systolic BP † (mmHg) 121.0 (111.5–131.3) −0.4146 0.1412 0.0033 111.0 (102.0–123.0) −0.3622 0.1026 4.15 × 10−4 Diastolic BP † (mmHg) 76.5 (70.0–83.0) −0.3409 0.0952 3.45 × 10−4 69.0 (62.7–76.0) −0.2618 0.0659 7.10 × 10−5 Mean BP † (mmHg) 91.2 (84.3–98.7) −0.3654 0.1028 3.78 × 10−4 83.1 (76.3–91.3) −0.2953 0.0731 5.40 × 10−5 Lipid profiles Total cholesterol § (mmol/L) 190.0 (169.0–213.0) 0.0000 0.0007 0.9856 194.0 (172.0–218.0) 0.0004 0.0005 0.4241 HDL cholesterol § (mmol/L) 47.0 (40.0–54.0) 0.0002 0.0009 0.7872 57.0 (49.0–66.0) 0.0058 0.0006 9.26 × 10−20 ¶¶ LDL cholesterol § (mmol/L) 121.0 (101.0–142.0) 0.0004 0.0011 0.6975 118.0 (98.0–140.0) −0.0008 0.0008 0.3344 Triglyceride § (mmol/L) 106.0 (74.0–155.0) −0.0030 0.0021 0.1659 82.0 (59.0–118.0) −0.0127 0.0014 4.47 × 10−19 ¶¶ Glucose metabolism Fasting plasma glucose ‡ (mmol/L) 94.0 (89.0–99.0) −0.1248 0.1576 0.4284 80.0 (86.0–95.0) −0.3007 0.0938 0.0013 HbA1c ‡ (%) 5.6 (5.4–5.9) −0.0062 0.0063 0.3223 5.6 (5.4–5.8) −0.0129 0.0037 5.53 × 10−4 Insulin resistance surrogate markers TyG index §,‡ 10080.0 (6887.0–14952.0) −161.99 117.87 0.1694 7384.0 (5200.0–10878.0) −332.47 54.04 7.71 × 10−10 TyG-BMI §,‡ (×103 ) 252.5 (164.3–393.0) −4483 3138 0.1531 170.1 (113.4–266.2) −8072 1385 5.59 × 10−9 TyG-WC §,‡ (×103 ) 879.6 (578.6–1353.6) −14,439 10905 0.1855 585.6 (395.9–900.1) −25,264 4760 1.12 × 10−7 Atherosclerotic risk factors Diabetes mellitus (%) 12.13% −0.0362 0.0293 0.2161 7.73% −0.0998 0.0270 2.23 × 10−4 Hypertension (%) 30.56% −0.0665 0.0217 0.0021 17.22% −0.0510 0.0198 0.0101 Metabolic syndrome (%) 26.52% −0.0640 0.0239 0.0073 19.34% −0.1414 0.0200 1.47 × 10−12 Participants were analyzed with the exclusion of those with a history of † hypertension, ‡ diabetes mellitus, and § hyperlipidemia. Abbreviations as in Figure 1; SE; —standard error;

IQR—Inter Quartile range. Data are presented as median (interquartile range). *\* p*—adjusted for age, BMI, and current smoking; Age—adjusted for BMI and current smoking; and

.

*t*-test: \*\* *p* < 0.001, \* *p* < 0.01.

BMI—adjusted for age and smoking. Significance was defined as a *p* value of <0.05/(131 + 27) = 3.16 × 10−4

**Table 2.** Association between *KLF14* rs4731702 genotype and body shape indices and metabolic traits according to sex.

*Int. J. Mol. Sci.* **2022**, *23*, 4165



mellitus, and § hyperlipidemia.

#### *2.4. Association between KLF14 Promoter Methylation Status and KLF14 Variants and Age*

We examined whether *KLF14* variants are associated with the DNA methylation status of the *KLF14* promoter region. We observed no significant association between *KLF14* variants and methylation levels in 32 *KLF14* methylation sites (Supplementary Table S4). By contrast, after adjustment for sex, BMI, and smoking, genome-wide significant associations were observed between age and 15 *KLF14* promoter DNA methylation sites, with hypermethylation associated with increasing age and with the minimum *p* value of 3.83 <sup>×</sup> <sup>10</sup>−<sup>253</sup> for *KLF14* cg08097417 (Figure 2). Both male and female participants exhibited a highly significant association between age and *KLF14* promoter methylation status (Supplementary Table S4). *Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 8 of 22

**Figure 2.** *KLF14* methylation and age: (**A**) Association between *KLF14* promoter methylation cg08097417 level and chronologic age and (**B**) Genomic structure of *KLF14* and the association between *KLF14* methylation status and age. **Figure 2.** *KLF14* methylation and age: (**A**) Association between *KLF14* promoter methylation cg08097417 level and chronologic age and (**B**) Genomic structure of *KLF14* and the association between *KLF14* methylation status and age.

nificant positive associations between cg08097417 methylation and various body shape indices and metabolic traits, such as waist circumference, waist–hip ratio, ABSI, waist–hip index (WHI), systolic and mean BP, total cholesterol levels, and hypertension. Negative associations were observed in cg08097417 methylation with body height, body weight, and hip circumference, and positive associations with *p* < 2.65 × 10−4 between cg08097417 methylation and diastolic BP, low-density lipoprotein (LDL) cholesterol, triglyceride, and hemoglobin A1c (HbA1c) levels and diabetes mellitus, which are in the same direction with age-related adiposity and metabolic changes (Table 4). In addition, after further adjustment for age, significance levels markedly decreased, and all the associations became nonsignificant, suggesting that the associations are predominantly mediated by chronologic

We evaluated the association between a *KLF14* promoter methylation site,

*2.5. Associations between KLF14 Methylation Status at the Promoter Region and Various* 

*Phenotypes Are Mediated by Chronologic Age*

#### *2.5. Associations between KLF14 Methylation Status at the Promoter Region and Various Phenotypes Are Mediated by Chronologic Age*

We evaluated the association between a *KLF14* promoter methylation site, cg08097417, which displayed the strongest association with age and various phenotypes. After adjustment for sex, BMI, and current smoking, our data revealed genome-wide significant positive associations between cg08097417 methylation and various body shape indices and metabolic traits, such as waist circumference, waist–hip ratio, ABSI, waist–hip index (WHI), systolic and mean BP, total cholesterol levels, and hypertension. Negative associations were observed in cg08097417 methylation with body height, body weight, and hip circumference, and positive associations with *<sup>p</sup>* < 2.65 <sup>×</sup> <sup>10</sup>−<sup>4</sup> between cg08097417 methylation and diastolic BP, low-density lipoprotein (LDL) cholesterol, triglyceride, and hemoglobin A1c (HbA1c) levels and diabetes mellitus, which are in the same direction with age-related adiposity and metabolic changes (Table 4). In addition, after further adjustment for age, significance levels markedly decreased, and all the associations became nonsignificant, suggesting that the associations are predominantly mediated by chronologic age (Table 4). We further extended the analysis to other *KLF14* promoter methylation sites with regional plot association studies. The results revealed similar association trends between the *KLF14* promoter methylation levels and body shape indices and metabolic traits, with subsidence of significant associations after adjustment for age (Figure 3).

#### *2.6. Associations between KLF14 Methylation Status at the Promoter Region and Various Phenotypes: Subgroup Analysis with Sex and Obesity*

Subgroup analysis revealed an interactive effect of sex on the associations between cg08097417 methylation levels and several metabolic traits, including systolic BP, total and LDL cholesterol and triglyceride levels, and HbA1c, with the association occurring predominantly in the female sex (Supplementary Table S5 and Figure 4). By contrast, no interactive effect was noted on the associations between cg08097417 methylation levels and all study phenotypes according to adiposity status (Supplementary Table S6).

*Int. J. Mol. Sci.* **2022**, *23*, 4165


and § hyperlipidemia.

**Table 4.** Association of the cg08097417 methylation site and age with body shape indices and metabolic traits.

**Figure 3.** Regional plot association analysis between*KLF14* methylation status and the studied phenotypes (**A**: Age, **B**: Body height, **C**: Body weight, **D**: Hip circumference, **E**: Waist circumference, **F**: Waist hip ratio, **G**: ABSI, **H**: WHI, **I**: Systolic BP, **J**: Mean BP, **K**: Total cholesterol, **L**: LDL-cholesterol, **M**: Triglyceride, **N**: HbA1c, **O**: Diabetes mellitus, **P**: Hypertension, **Q**: Metabolic syndrome) with (blue circle) or without (red circle) adjustment for age. The results showed subsidence of significant associations after adjustment for age. **Figure 3.** Regional plot association analysis between*KLF14* methylation status and the studied phenotypes (**A**) Age, (**B**) Body height, (**C**) Body weight, (**D**) Hip circumference, (**E**) Waist circumference, (**F**) Waist hip ratio, (**G**) ABSI, (**H**) WHI, (**I**) Systolic BP, (**J**) Mean BP, (**K**) Total cholesterol, (**L**) LDLcholesterol, (**M**) Triglyceride, (**N**) HbA1c, (**O**) Diabetes mellitus, (**P**) Hypertension, (**Q**) Metabolic syndrome, with (blue circle) or without (red circle) adjustment for age. The results showed subsidence of significant associations after adjustment for age.

Subgroup analysis revealed an interactive effect of sex on the associations between cg08097417 methylation levels and several metabolic traits, including systolic BP, total and LDL cholesterol and triglyceride levels, and HbA1c, with the association occurring

*2.6. Associations between KLF14 Methylation Status at the Promoter Region and Various* 

*Phenotypes: Subgroup Analysis with Sex and Obesity*

**Figure 4.** Regional plot association analysis between the studied phenotypes (**A**: Systolic BP, **B**: Total cholesterol, **C**: LDL cholesterol, **D**: Triglyceride, **E**: HbA1c, **F**: Metabolic syndrome) and DNA methylation status according to sex, with red circles representing female participants and blue circles representing male participants. **Figure 4.** Regional plot association analysis between the studied phenotypes (**A**) Systolic BP, (**B**) Total cholesterol, (**C**) LDL cholesterol, (**D**) Triglyceride, (**E**) HbA1c, (**F**) Metabolic syndrome, and DNA methylation status according to sex, with red circles representing female participants and blue circles representing male participants.

predominantly in the female sex (Supplementary Table S5 and Figure 4). By contrast, no interactive effect was noted on the associations between cg08097417 methylation levels

and all study phenotypes according to adiposity status (Supplementary Table S6).

#### *2.7. KLF14 rs4731702 Variant as a Marker of Metabolically Healthy Obese Phenotype with the Adiposity-Increasing Allele Associated with a Favorable Cardiometabolic Risk Profile 2.7. KLF14 rs4731702 Variant as a Marker of Metabolically Healthy Obese Phenotype with the Adiposity-Increasing Allele Associated with a Favorable Cardiometabolic Risk Profile*

We further evaluated whether the *KLF14* rs4731702 genotypes and *KLF14* promoter methylation levels are involved in the uncoupling of adiposity from its adverse metabolic risk profiles using the standardized effect size analysis in the Radar plot. From the Radar plots of different subgroups of participants, the *KLF14* rs4731702 C allele was the most strongly and positively associated with HI, ABSI, hip and waist circumferences, BMI, and BFP, but no association with waist–hip ratio and WHI was found (Figure 5A–C). By contrast, the *KLF14* rs4731702 C allele was negatively associated with nearly all metabolic traits, positively associated with HDL cholesterol levels, and had no association with total and LDL cholesterol levels. These results were provided by analyses for total participants, female participants, and obese female participants (Figure 5A–C). By contrast, cg08097417 hypermethylation showed a positive association with both body shape indices and metabolic traits, except for body height, hip circumference, and HI (Figure 5D). In the analysis conducted using the criteria of Park et al. [27], our data indicated that the rs4731702 C allele has a lower risk of metabolically unhealthy status compared with metabolically We further evaluated whether the *KLF14* rs4731702 genotypes and *KLF14* promoter methylation levels are involved in the uncoupling of adiposity from its adverse metabolic risk profiles using the standardized effect size analysis in the Radar plot. From the Radar plots of different subgroups of participants, the *KLF14* rs4731702 C allele was the most strongly and positively associated with HI, ABSI, hip and waist circumferences, BMI, and BFP, but no association with waist–hip ratio and WHI was found (Figure 5A–C). By contrast, the *KLF14* rs4731702 C allele was negatively associated with nearly all metabolic traits, positively associated with HDL cholesterol levels, and had no association with total and LDL cholesterol levels. These results were provided by analyses for total participants, female participants, and obese female participants (Figure 5A–C). By contrast, cg08097417 hypermethylation showed a positive association with both body shape indices and metabolic traits, except for body height, hip circumference, and HI (Figure 5D). In the analysis conducted using the criteria of Park et al. [27], our data indicated that the rs4731702 C allele has a lower risk of metabolically unhealthy status compared with metabolically healthy status in obese participants (odds ratio: 0.88, 95% confidence interval [CI]: 0.85–0.92, *<sup>p</sup>* = 1.92 <sup>×</sup> <sup>10</sup>−11), whereas hypermethylation of the cg08097417 methylation site was associated with a higher risk of metabolically unhealthy status in obese participants (odds ratio 4.04, 95% CI 1.95–8.41, *<sup>p</sup>* = 1.81 <sup>×</sup> <sup>10</sup>−<sup>4</sup> ) (Supplementary Table S7).

healthy status in obese participants (odds ratio: 0.88, 95% confidence interval [CI]: 0.85– 0.92, *p* = 1.92 × 10−11), whereas hypermethylation of the cg08097417 methylation site was associated with a higher risk of metabolically unhealthy status in obese participants (odds

ratio 4.04, 95% CI 1.95–8.41, *p* = 1.81 × 10−4) (Supplementary Table S7).

**Figure 5.** *KLF14* rs4731702 genotype (**A**–**C**) and *KLF14* promoter methylation site cg08097417 (**D**) and their associations with body shape indices (left panel) and metabolic traits (right panel) in the **Figure 5.** *KLF14* rs4731702 genotype (**A**–**C**) and *KLF14* promoter methylation site cg08097417 (**D**) and their associations with body shape indices (left panel) and metabolic traits (right panel) in the Taiwan Biobank population. Standardized coefficients (beta) are shown in Radar plots for total participants (**A**), female participants (**B**,**D**), and obese female participants (**C**). The middle of the three concentric circles is labeled "0," representing no association between the *KLF14* variant and methylation status and trait. Points falling outside the middle octagon or dodecagon represent positive variant–trait or methylation–trait associations, whereas those inside it represent negative variant–trait or methylation–trait associations.

#### **3. Discussion**

This study investigated the differential genetic and epigenetic effects of *KLF14* on the association between body shape indices and metabolic traits. Several novel results were identified. First, we observed an independent association of genetic and epigenetic effects of *KLF14* on both multiple body shape indices and metabolic traits. Second, the association of *KLF14* variants with body shape indices and metabolic traits occurred predominantly in the female sex, and the association of *KLF14* variants with body shape indices occurred predominantly in obese participants. Thus, obese women were the only subgroup associated with both body shape indices and metabolic traits. Third, *KLF14* methylation is a marker of age in forensic medicine for both sexes. In addition to diabetes mellitus, *KLF14* promoter methylation status showed a significant age-dependent association with various body shape indices and metabolic traits, and the associations with lipid profiles, HbA1c, and metabolic syndrome occurred predominantly in female participants. Finally, differential genetic and epigenetic effects were noted in crossphenotype associations using the adiposity–cardiometabolic trait pairs. *KLF14* rs4731702 variant, but not methylation status, served as a marker involved in the uncoupling of adiposity from its adverse metabolic risk profiles predominantly in subgroups of obese female participants. To the best of our knowledge, this is the first report investigating the mediated and interactive effects of age, sex, and obesity on the association of genetic and epigenetic *KLF14* variations with body shape indices and metabolic traits. These results support the critical role of *KLF14* as an age-, sex-, and obesity-specific key transcriptional regulator affecting a large transregulatory network of metabolic traits and adiposity status, and the results also provided evidence for the differential genetic and epigenetic effects of *KLF14* on the risk of cardiometabolic disorders.

#### *3.1. Association between KLF14 Variants and Metabolic Traits: The Role of Sex and Obesity*

By including the *KLF14* rs4731702 genotypes in the analysis, Small et al. [14] demonstrated colocalization of the GWAS signal for metabolic disorders and the expression quantitative trait locus (eQTL) for nearly 400 genes in trans to the *KLF14* locus. The results revealed *KLF14* to be one of the largest trans-eQTL hotspots in the human genome [30]. Despite the nearly ubiquitous expression of KLF14 in many tissues [31], *KLF14* variants regulate gene expression only in adipose tissue [14]. *KLF14* is also an imprinted gene with monoallelic maternal expression [14]. Our data indicated a significant association of *KLF14* variants with various metabolic traits, such as lipid profiles, BP status, IR surrogate marker levels, and metabolic syndrome, but only in female participants. By contrast, no interactive effect for such associations according to adiposity status was noted in our study population.

#### *3.2. Association of KLF14 Variants with Adiposity Indices, including Body Shape and Body Fat Distribution*

Body shape plays a critical role in influencing cardiometabolic complications of obesity, with the abdominal size showing positive associations and gluteofemoral size showing inverse associations with complications [32]. Conventional body shape indices, such as WHR and waist and hip circumferences, are strongly associated with BMI, and even after adjustment for BMI, waist and hip circumferences are strongly associated with height [14]. Allometric body shape indices, such as ABSI and HI, are alternatives to body shape indices and are independent of body size and general obesity [33,34]. In analogy to ABSI as an allometric index of waist circumference, WHI was created as an allometric index of WHR [35]. ABSI has been shown to be significantly associated with cardiometabolic risk factors and mortality [33,36]. ABSI also achieves better mortality risk stratification than other body shape indices, which are correlated with BMI [37]. Similar to our results with sexual dimorphism, GWAS of allometric body shape indices in the UK Biobank Resources of 406,697 participants with British ancestry indicated a genome-wide significant association of *KLF14* variants with HI in women but not in men [35]. We further revealed the interactive effect of obesity on the association between *KLF14* variants and body shape

indices. In subgroup analysis, the association between *KLF14* variants and body shape indices occurred only in obese women, showing the critical role of both sex and obesity in associations between *KLF14* variants and body shape indices.

#### *3.3. Epigenetic Effects of KLF14 on Adiposity Indices and Metabolic Traits Occur Only in the Female Sex and Are Dependent on Age*

Aging generally induces global hypomethylation with a genome-wide decrease in the average DNA methylation level. By contrast, aging-related hypermethylation events occurred in 13% of the CpG sites among the millions of sites in the genome [38]. *KLF14* has a large CpG island that spans almost its entire open reading frame. By analyzing KLF14 in offspring from a DNA methyltransferase 3a conditional knockout mouse, *KLF14* expression was detected within the hypermethylated region inherited from the mother, which is a typical feature of the KLF family [7]. Using pathway analyses of adipose tissue from 190 individuals, altered DNA methylation in 1050 genes, including *KLF14*, has been reported as epigenetic markers of chronological age in blood [39]. DNA methylation in *KLF14* is linearly correlated with human age with an increase in the methylation level of the *KLF14* promoter region in multiple tissues in mice, such as the whole blood, adipose tissue, kidney, spleen, lung, and colon, which is significantly associated with inflammatory marker levels and with decreased *KLF14* downstream gene methylation in the whole blood, the adipose tissue, and the spleen [40]. More importantly, DNA methylation loci in *KLF14* is an age-related epigenetic biomarker; it can be used as an age prediction model to predict chronological age and may be considered an important factor in the development of aging-associated diseases, such as cancer, diabetes mellitus, and cardiovascular diseases [38,41].

Our data revealed the hypermethylation of the *KLF14* promoter region with age in both sexes. We further demonstrated significant associations of *KLF14* promoter methylation with body shape indices and metabolic traits, and the association occurred predominantly in the female sex, which was similar to the results of *KLF14* variants. However, no significant association between *KLF14* variants and *KLF14* promoter methylation was noted. These results are consistent with those of a previous study reporting a significant association between the *KLF14* rs4731702 genotype and *KLF14* gene expression and DNA methylation in adipose tissue, but not in whole blood. The finding suggested that as a blood biomarker, *KLF14* exerts both genetic and epigenetic effects on body shape indices and metabolic traits independently [14]. The association of *KLF14* methylation with various phenotypes was abolished after adjustment for age, revealing an age-dependent effect. Thus, *KLF14* methylation can serve not only as an epigenetic biomarker of age but also as an epigenetic biomarker of age-related changes in adiposity status, body shape indices, and metabolic phenotypes.

#### *3.4. KLF14 rs4731702 Variant as a Marker of Metabolically Healthy Obese Phenotype That Uncouples Excessive Adiposity from Its Comorbidities*

Obesity is typically associated with chronic, low-grade inflammation and a constellation of metabolic abnormalities, which are important risk factors for type 2 diabetes and cardiovascular diseases [42,43]. For a given fat mass, individuals who are obese may not develop metabolic comorbidities, which was termed metabolically healthy obesity (MHO). The mechanism of the discrepancy has been suggested to be related to disproportionate adipose tissue distribution in metabolically unhealthy obesity (MUO) with visceral adiposity [44,45] or an impaired ability to expand subcutaneous fat in the lower part of the body. Pathway analyses have suggested that pathophysiological mechanisms for the differential status of metabolically healthy or unhealthy individuals may involve adipogenesis, angiogenesis, transcriptional regulation, and IR [46]. Cumulative evidence from several large-scale prospective studies suggests that MHO individuals have higher risks than healthy normal-weight individuals of many cardiometabolic outcomes and cardiovascular diseases, such as hypertension, IR, diabetes mellitus, and cerebrovascular disease [47–56]; yet, their risk is lower than that of MUO individuals [46].

Recently, GWASs have identified genetic variants that are associated with increased adiposity in conjunction with a healthy metabolic profile [46]. GWAS investigating genetic determinants of insulin concentrations in individuals without diabetes highlighted that many identified loci showed associations with markers suggestive of clinical IR, such as high triglyceride and low HDL concentrations [57]. Favorable adiposity alleles were also associated with a higher subcutaneous fat mass and lower liver fat content, supporting the hypothesis that storing excess triglycerides in metabolically low-risk depots has beneficial effects on metabolism [57]. Waist-specific and hip-specific polygenic risk scores, which are related to visceral and subcutaneous abdominal fat and gluteofemoral and leg fat, respectively, increased the risks of an adverse metabolic profile, type 2 diabetes, and coronary artery disease [58]. In a genome-wide discovery of genetic loci that uncouple excess adiposity from its comorbidities, Huang et al. [26] identified 62 loci with the same allele significantly associated with both higher adiposity and lower cardiometabolic risk. These loci are enriched for genes expressed in adipose tissue and for regulatory variants influencing nearby genes affecting adipocyte differentiation. In a Korean study examining the genetic determinants of MHO and metabolically unhealthy normal weight phenotypes, the genes common to both models are related to lipid metabolism, and those associated with MHO are related to insulin or glucose metabolism [27]. Our data indicated an inverse association between the rs4731702 genotypes and multiple body shape indices when compared with the association with metabolic traits and IR surrogate markers. These results supported *KLF14* variants as a marker for uncoupling excessive adiposity from its cardiometabolic risk profiles, which occurred predominantly in obese women. By contrast, *KLF14* hypermethylation is associated with a metabolically unhealthy obese phenotype coupling of adiposity with its comorbidities. Interestingly, Cannataro et al. [59] found that microRNAs may modify promoter by methylation on targeted genes and regulate their expression, which affect body shape under Ketogenic Diet. Therefore, we definitely consider investigating the microRNA candidates that regulate KLF14 associated body shape in the near future.

#### *3.5. Limitation*

This study has limitations. First, ethnic genetic heterogeneity has been noted in the genetic association studies, and our data may not be applied to other ethnic populations. Second, several regional plot association studies did not reach genome-wide significance for the associations. Further studies with larger sample size and using different ethnic populations may help to strengthen the validity of our analysis.

#### **4. Participants and Methods**

#### *4.1. TWB Participants*

We included 107,494 TWB participants with no history of cancer who had undergone genotyping from Axiom Genome-Wide CHB 1 or 2 Array (Affymetrix, Santa Clara, CA, USA). Of them, 28,752 were excluded because of the absence of imputation data (12,289 participants) and after quality control (QC) for the GWAS with identity by descent (IBD) > 0.187 revealing relative pairs of second-degree relatives or closer (10,956 participants). Participants with fasting <6 h (2862 participants) and the absence of any of the study phenotypes (2645 participants) were also excluded. When examining the lipid profile, blood pressure status, and glucose metabolism parameters, the participants with a history of hyperlipidemia, hypertension, and diabetes mellitus were also excluded. The flowchart of participant enrollment is presented in Figure 1. The definitions of hypertension, diabetes mellitus, obesity, current smoking, and metabolic syndrome are provided in Supplementary Methods. Ethical approval was received from the Research Ethics Committee of Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation (approval number: 05-X04-007), and Ethics and Governance Council of the Taiwan Biobank (approval number: TWBR10507-02 and TWBR10611-03). All participants signed an informed consent form before participating in the study.

#### *4.2. Clinical Phenotypes and Laboratory Examinations*

We measured the following clinical parameters: body height, body weight, and conventional body shape indices such as waist circumference, hip circumference, waist–hip ratio, and BMI, mean heart rate, and systolic, mean, and diastolic BP. We also collected the following biochemistry data: lipid profiles and glucose metabolism parameters such as total HDL and LDL cholesterol and triglyceride levels, fasting plasma glucose levels, and HbA1c. TyG index, TyG-BMI, and TyG-WC were used as IR surrogate markers [60]. BFP was measured using the Body Composition Analyzer BC-420MA (TANITA, TANITA Corporation, Sportlife Tokyo, Japan), a device that sends a weak electric current through the body to measure the impedance (electrical resistance) of the body. ABSI, WHI, and HI were calculated as parameters of allometric body shape indices [35]. The measurement details of BFP and the calculation of conventional and allometric body shape indices and TyG-related parameters are provided in Supplementary Methods. We also performed cross-phenotype analysis for *KLF14* variants and *KLF14* methylation levels and subsequently calculated the standard deviation effect sizes *(β)* of their effects on the body shape indices and metabolic traits to elucidate whether *KLF14* variations may uncouple excessive adiposity from its comorbidities [26]. The resulting effect sizes are on the same scale, making them comparable across all variants, methylation levels, and traits. By using the criteria of Park et al. [27], we further examined the association of *KLF14* variants and *KLF14* methylation status with metabolically healthy and unhealthy obese phenotypes in TWB participants.

#### *4.3. DNA Isolation and Genotyping of KLF14 Variants*

DNA was isolated from blood samples using a PerkinElmer Chemagic 360 instrument (PerkinElmer, Waltham, MA, USA). SNP genotyping was conducted using custom TWB chips and was performed on the Axiom genome-wide array plate system (Affymetrix, Santa Clara, CA, USA).

#### *4.4. Regional Plot Association Studies*

To determine the lead SNPs around the *KLF14* region for various studied phenotypes, we performed a QC for regional plot association analysis by including TWB participants enrolled after applying the exclusion criteria (Figure 1). The Axiom genomewide CHB 1 and 2 array plates, each with 27,720 and 79,774 participants and comprising 611,656 and 640,160 SNPs, respectively, were used for imputation analysis. Genomewide genotype imputation was performed using SHAPEIT and IMPUTE2, with the data of the East Asian population from the 1000 Genome Project Phase study used as the reference panel. After imputation, indels were removed using VCFtools, and the QC was performed by filtering SNPs with an IMPUTE2 imputation quality score of >0.3. All samples enrolled for the analysis had the identity by descent PI\_HAT > 0.1875. For SNP QC, three criteria were used for exclusion from subsequent analyses: (1) an SNP call rate of <3%, (2) a minor allele frequency of <0.01, and (3) the violation of the Hardy–Weinberg equilibrium with *p* < 10−<sup>6</sup> . After QC, the data of 85,508 participants and 131 SNPs at positions between 130.3 and 130.5 Mb on chromosome 7q32.2 were included in the regional plot association analysis.

#### *4.5. DNA Methylation Analysis*

DNA methylation was assessed using sodium-bisulfite-treated DNA from whole blood using the Infinium MethylationEPIC BeadChipEPIC array (Illumina, San Diego, CA, USA). Four criteria were used to perform the QC for DNA methylation analysis: (1) normalization across batches for correction of dye bias, (2) background signal removal, (3) using the median absolute deviation method to eliminate outliers, and (4) elimination of probes with poor detection (*p* > 0.05) and those whose bead counts were <3. Finally, the data of 1702 participants were included in the *KLF14* methylation analysis. Sixty-six participants were further excluded due to fasting for <6 h (*n* = 16) or the absence of any study phenotype (*n* = 50).

#### *4.6. Statistical Analysis*

Continuous variables were expressed as median and interquartile ranges when the distribution was strongly skewed. Categorical data distributions were expressed as a percentage. Before analysis, all study parameters with a skew distribution were logarithmically transformed to adhere to the normality assumption. To test for significant differences in the association between different subgroups of sex and obesity, we used a two-sample *t* test [32]. We assumed the genetic effect to be additive, and a general linear model was used to analyze the studied phenotypes in relation to the predictors of investigated genotypes and confounders. Multivariable logistic regression analysis was used to evaluate the independent effect of genotypes on the risk of hypertension, diabetes mellitus, and metabolic syndrome. The Radar plots were presented with the standardized coefficients (beta) in the linear regression. Genome-wide scans were performed, and the beta coefficient, standard error, and *p* values by general linear model or logistic regression for genome-wide data were analyzed using the analysis software package PLINK. Genome-wide significance was defined as a *<sup>p</sup>* value of <5 <sup>×</sup> <sup>10</sup>−<sup>8</sup> . For Bonferroni correction of regional plot association studies, the significant value was defined as *<sup>p</sup>* < 1.41 <sup>×</sup> <sup>10</sup>−<sup>5</sup> calculated as 0.05/(131 × 27) according to a total of 131 variants and 27 traits analyzed. For Bonferroni correction of each genotype–phenotype and methylation–phenotype analysis, the significant value was defined as *<sup>p</sup>* < 2.65 <sup>×</sup> <sup>10</sup>−<sup>4</sup> , calculated as 0.05/(27 × 7) according to 27 traits and 7 subgroups analyzed. Linkage disequilibrium (LD) between each SNP was analyzed using LDmatrix software (https://analysistools.nci.nih.gov/LDlink/?tab=ldmatrix, accessed on 19 July 2021). All statistical analyses were performed using SPSS (version 22; SPSS, IBM Corp., Armonk, NY, USA) or PLINK. Missing data were approached with listwise deletion.

#### **5. Conclusions**

Taken together, our data revealed that variations of the *KLF14* gene, including gene variants and methylation levels for body shape indices, IR status, and metabolic traits, are highly dependent on age, sex, or obesity. These results support the critical role of *KLF14* as a key age-, sex-, and obesity-specific transcriptional regulator affecting a large adipose-specific transregulatory network of metabolic traits and adiposity status, which provide further evidence for the differential genetic and epigenetic effects of *KLF14* on the risk of cardiometabolic disorders.

**Supplementary Materials:** The Supplementary Materials are available online at https://www.mdpi. com/article/10.3390/ijms23084165/s1.

**Author Contributions:** Conceptualization, Y.-L.K. and S.W.; methodology, Y.-L.K. and M.-S.T.; software and validation, M.-S.T. and S.W.; formal analysis, H.-H.C., Y.-L.K. and L.-A.H.; resources, Y.-L.K. and H.-H.C.; data curation, S.W., M.-S.T. and H.-H.C.; writing—original draft preparation, S.W.; writing—review and editing, Y.-L.K. and S.W.; visualization, M.-S.T. and S.W.; project administration, Y.-L.K. and S.W.; supervision, Y.-L.K.; funding acquisition, M.-S.T., H.-H.C. and Y.-L.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by grants from Buddhist Tzu Chi Medical Foundation Academic Advancement (TCMF-EP 111-02, CMF-A 107-01-15), grants from the Ministry of Science and Technology (MOST 108-2314-B-303-026-MY3) and Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation (TCRD-TPE-MOST-109-05) to Y. L. Ko, and Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation (TCRD-TPE-109-RT-1) to H. H. Chou.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Research Ethics Committee of Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation (approval number: 05-X04-007), and Ethics and Governance Council of the Taiwan Biobank (approval number: TWBR10507-02 and TWBR10611-03).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** We greatly appreciate technical support from the Core Laboratory of Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, and expert statistical analysis assistance from Tsung-Han Hsieh.

**Conflicts of Interest:** No potential conflict of interest relevant to this article was reported.

#### **Abbreviations**


#### **References**


## *Article* **CD33 rs2455069 SNP: Correlation with Alzheimer's Disease and Hypothesis of Functional Role**

**Fabiana Tortora 1,† , Antonella Rendina 1,† , Antonella Angiolillo <sup>2</sup> , Alfonso Di Costanzo <sup>2</sup> , Francesco Aniello <sup>3</sup> , Aldo Donizetti 1,3,\* , Ferdinando Febbraio 1,‡ and Emilia Vitale 1,\* ,‡**


**Abstract:** The *CD33* gene encodes for a member of the sialic-acid-binding immunoglobulin-type lectin (Siglec) family, and is one of the top-ranked Alzheimer's disease (AD) risk genes identified by genome-wide association studies (GWAS). Many *CD33* polymorphisms are associated with an increased risk of AD, but the function and potential mechanism of many CD33 single-nucleotide polymorphisms (SNPs) in promoting AD have yet to be elucidated. We recently identified the *CD33* SNP rs2455069-A>G (R69G) in a familial form of dementia. Here, we demonstrate an association between the G allele of the rs2455069 gene variant and the presence of AD in a cohort of 195 patients from southern Italy. We carried out in silico analysis of the 3D structures of *CD33* carrying the identified SNP to provide insights into its functional effect. Structural models of the CD33 variant carrying the R69G amino acid change were compared to the CD33 wild type, and used for the docking analysis using sialic acid as the ligand. Our analysis demonstrated that the CD33-R69G variant may bind sialic acid at additional binding sites compared to the wild type, thus potentially increasing its affinity/specificity for this molecule. Our results led to a new hypothesis of rs2455069-A>G SNP as a risk factor for AD, suggesting that a long-term cumulative effect of the CD33-R69G variant results from the binding of sialic acid, acting as an enhancer of the CD33 inhibitory effects on amyloid plaque degradation.

**Keywords:** CD33; Alzheimer's disease; sialic acid; phagocytosis

#### **1. Introduction**

According to recent reports, it is estimated that the number of cases of dementia in the world is expected to rise to 152 million by 2050 as the population ages [1,2], with AD being the most common form of dementia currently [3,4]. Most AD cases are sporadic, with late-onset AD (LOAD) occurring at the age of 65 years or older. Approximately 5% of cases are classified as early-onset AD (EOAD), as reported in a systematic review and meta-analysis [5], where the age of onset is before 65 years and a percentage is considered as familial [6,7]. Except for a small number of early-onset cases who are afflicted due to highly penetrant single-gene mutations, AD, especially in the late-onset form, is genetically heterogeneous, with a polygenic or oligogenic risk inheritance [7]. Genome-wide association studies (GWAS) identified several genetic loci associated with increased susceptibility to LOAD affecting lipid metabolism, tau binding proteins, amyloid precursor protein (APP) metabolism [8], protein catabolism, immune cells, and microglia [9]. Of these, microglial

**Citation:** Tortora, F.; Rendina, A.; Angiolillo, A.; Di Costanzo, A.; Aniello, F.; Donizetti, A.; Febbraio, F.; Vitale, E. CD33 rs2455069 SNP: Correlation with Alzheimer's Disease and Hypothesis of Functional Role. *Int. J. Mol. Sci.* **2022**, *23*, 3629. https://doi.org/10.3390/ ijms23073629

Academic Editor: Elixabet Lopez-Lopez

Received: 25 February 2022 Accepted: 22 March 2022 Published: 26 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

cells are a key participant in LOAD pathogenesis, with many susceptibility loci, including *CD33*, playing a role [10–15]. CD33 belongs to a subfamily of the Ig superfamily comprising a single N-terminal V-set Ig domain that binds sialic-acid-containing glycans through a conserved positively charged arginine that interacts with the negatively charged C-1 carboxyl group on the sialic acid [16]. CD33 may be activated continuously by sialic-acid-containing glycoproteins and glycolipids, which are structural elements of amyloid plaques in the brains of Alzheimer's disease patients, resulting in an inhibition of microglia-mediated immune activation [17,18]. CD33 can affect microglial phagocytosis through crosstalk with the transmembrane receptor TREM2. Knockout of CD33 resulted in improved cognition and decreased amyloid β pathology in 5xFAD mice, whereas these effects were completely abrogated by additional knockout of TREM2 [19]. TREM2, through an interaction with DAP12/TYROBP, stimulates the tyrosine–protein kinase SYK, which in turn triggers the activation of microglial cell phagocytosis [20,21]. This activation could be counteracted by sialic-acid-activated CD33, which stimulates SHP1/SHP2 tyrosine phosphatases, thereby inhibiting microglial phagocytosis [13,21,22].

In accordance with the effect of *CD33* on the inhibition of microglial phagocytosis, primary microglial cells from *CD33* knockout mice [12] or deletion of *CD33* in human macrophages and microglia resulted in an increased phagocytosis of amyloid β1-42 [23]. This was further supported by targeting *CD33* in a mouse model of AD using an adenoassociated virus (AAV) vector encoding a *CD33* microRNA (miRCD33), which demonstrated a reduction in amyloid β40 and β42 levels in brain extracts [24].

Different *CD33* polymorphisms have been positively or negatively correlated with AD susceptibility. These SNPs can affect the expression level, the structure, and the function of CD33, leading to changes in microglia-mediated clearance of amyloid β, likely contributing to the accumulation of senile plaques in the brain [12,25–27]. The CD33 variant rs3865444(A) negatively correlates with AD, has a reduced *CD33* expression [12], and is coinherited with rs12459419(T), which modulates the splicing efficiency of the IgV domain exon involved in sialic acid binding [13,28].

We recently identified the SNP rs2455069 in exon 2 of the human *CD33* gene on chromosome 19, which is involved in an unusual familial form of dementia [29]. This SNP has been previously identified in a block of eight correlated SNPs in *CD33* significantly associated with cognitive decline in univariate analysis in the all-female cohort, with highly significant effects on hyaluronan synthase-1 (HAS1) expression in the temporal cortex [30]. The presence of the minor allele at rs2455069 leads to a change from an arginine to a glycine in the position 69 (R69G). Here, we demonstrate a positive correlation between the *CD33* SNP rs2455069 and Alzheimer's disease in a cohort of patients from a region of southern Italy, and secondly, that the R69G amino acid change affects the CD33 structure in a way that alters its binding affinity for sialic acid residues. We propose that the mechanism of the long-term cumulative effects of the CD33-R69G variant are mediated through an increased binding of sialic acid, which then acts as a more effective enhancer of the CD33 inhibitory effects on amyloid plaque degradation.

#### **2. Results**

#### *2.1. Genetic Analysis*

Based on the evidence of the possible involvement in an unusual form of dementia of the SNP rs2455069-A>G in exon 2 of *CD33* gene [29], leading to a change from arginine to glycine in position 69, we widened the analysis of this SNP to a cohort of AD patients from southern Italy. We found that the genotype frequency for the rs2455069 polymorphism in the control group was 11.1% (*n* = 15) for the GG, 53.0% (*n* = 77) for the AG, and 31.8% (*n* = 43) for the AA genotypes. In AD patients, the frequency of GG, AG, and AA genotypes were 17.9% (*n* = 35), 65.1% (*n* = 127), and 16.9% (*n* = 33), respectively. While genotypes in the LOAD sample were not in Hardy–Weinberg equilibrium (HWE, *p* = 0.00013), the genotype frequencies in controls did not deviate significantly from the Hardy–Weinberg equilibrium (HWE) (*p* = 0.08301). The G allele frequency was 39.6% in controls and 50.5% in AD patients (*p* = 0.00582). A significant association between AD and the GG or AG genotype was found (Table 1), particular in the dominant model (AG + GG vs. AA), leading to the hypothesis that the R69G change may have had an impact on the increased risk of LOAD in the analyzed cohort.


**Table 1.** Association between the genetic polymorphism CD33-rs2455069 and the risk of AD. *Int. J. Mol. Sci.* **2022**, *23*, 3629 3 of 13

AG + GG 2.294 (1.363–3.862) 10.03 *p* = 0.00154

<sup>a</sup> OR = odds ratio; <sup>b</sup> C.I. = confidence interval. GG 3.040 (1.428–6.476) 8.58 *p* = 0.00341

#### *2.2. In Silico Structural Analysis* A Ref

To evaluate the impact of the R69G change on the CD33 protein function, we carried out a series of in silico analyses. Six human CD33 partial structures (Siglec-3) were available in the RCSB Protein Data Bank [31] (Supplementary Table S1, Supplementary Figure S1). We prepared the monomeric form of each CD33 structure, summarized in Supplementary Table S1, using the web-based platform CHARMM-GUI. To reduce possible artifacts due to binding with mimetic substrates, we used a protein model based on the CD33 structure PDB-ID 5IHB, representing the protein region interacting with NAG as a nonspecific ligand. We compared the two structures obtained after CHARMM minimization carrying either arginine or glycine at position 69 of the CD33 protein. We did not observe significant changes in the protein backbone after basic energy minimization, but small rearrangements, both in the α-helix containing the glycine residue (red arrow in Figure 1A) and in a β-sheet of the repeated structural type C2 Ig domain (black arrow in Figure 1A) were evident. Most importantly, few differences were observed in the structures of sidechains, which showed small changes in the connecting region between the two domains (Figure 1B). G 1.555 (1.135–2.130) 7.61 *p* = 0.00582 <sup>a</sup> OR = odds ratio; <sup>b</sup> C.I. = confidence interval. *2.2. In Silico Structural Analysis* To evaluate the impact of the R69G change on the CD33 protein function, we carried out a series of in silico analyses. Six human CD33 partial structures (Siglec‐3) were available in the RCSB Protein Data Bank [31] (Supplementary Table S1, Supplementary Figure S1). We prepared the monomeric form of each CD33 structure, summarized in Supplementary Table S1, using the web‐based platform CHARMM‐GUI. To reduce possible artifacts due to binding with mimetic substrates, we used a protein model based on the CD33 structure PDB‐ID 5IHB,representing the protein region interacting with NAG as a nonspecific ligand. We compared the two structures obtained after CHARMM minimization carrying either arginine or glycine at position 69 of the CD33 protein. We did not observe significant changes in the protein backbone after basic energy minimization, but smallrearrangements, both in the α‐helix containing the glycine residue (red arrow in Figure 1A) and in a β‐sheet of the repeated structural type C2 Ig domain (black arrow in Figure 1A) were evident. Most importantly, few differences were observed in the structures of sidechains, which showed small changes in the connecting region between the two domains (Figure 1B).

**Figure 1.** Structural representation of CD33 variants. (**A**) Ribbon representation of the backbone of superimposed CD33 monomers (PDB ID 5IHB) with Arg69 (cyan) and Gly69 (orange) after minimization

by CHARMM. (**B**) Stick representation of backbone and sidechains of structures in (**A**). The arrows indicate the area with major changes in the structure of sidechains after basic energy minimization. by CHARMM. (**B**) Stick representation of backbone and sidechains of structures in (**A**). The arrows indicate the area with major changes in the structure of sidechains after basic energy minimization.

**Figure 1.** Structural representation of CD33 variants. (**A**) Ribbon representation of the backbone of superimposed CD33 monomers (PDB ID 5IHB) withArg69 (cyan) and Gly69 (orange) after minimization

*Int. J. Mol. Sci.* **2022**, *23*, 3629 4 of 13

These predictions led us to hypothesize that the effect of the arginine-to-glycine change at position 69 could predominantly affect the reorganization of polar sidechains, and only locally, the rearrangement of the backbone. In agreement with these observations, the analysis carried out on 200 structural models of CD33 (PDB ID 5IHB), 100 with the Arg69 residue and 100 with the Gly69 residue, indicated the presence of few changes in the boundary spatial coordinates (min and max) of the entire CD33 structure (Figure 2). These differences were hypothesized to be due to the increased length of the sidechain consisting of a three-carbon aliphatic straight chain ending in a guanidino group as a consequence of the change from a glycine to an arginine residue in position 69. These predictions led us to hypothesize that the effect of the arginine‐to‐glycine change at position 69 could predominantly affect the reorganization of polar sidechains, and only locally, the rearrangement of the backbone. In agreement with these observations, the analysis carried out on 200 structural models of CD33 (PDB ID 5IHB), 100 with the Arg69 residue and 100 with the Gly69 residue, indicated the presence of few changes in the boundary spatial coordinates (min and max) of the entire CD33 structure (Figure 2). These differences were hypothesized to be due to the increased length of the sidechain consisting of a three‐carbon aliphatic straight chain ending in a guanidino group as a consequence of the change from a glycine to an arginine residue in position 69.

**Figure 2.** Variations of the maximum and minimum dimensional values in the XYZ axes of the two CD33 variants (PDB ID 5IHB). The variations were obtained after minimization by CHARMM. Blue dots refer to the structures with the arginine residue, whereas the orange dots refer to the structures with a glycine residue in position 69. **Figure 2.** Variations of the maximum and minimum dimensional values in the XYZ axes of the two CD33 variants (PDB ID 5IHB). The variations were obtained after minimization by CHARMM. Blue dots refer to the structures with the arginine residue, whereas the orange dots refer to the structures with a glycine residue in position 69.

#### *2.3. Docking Analysis*

The predicted binding sites for sialic acid in the N-terminal domain (aa19-135) of the monomeric form of CD33 (PDB ID 5IHB) carrying either an arginine or a glycine residue in position 69 are reported in Figure 3. The affinity of each binding site toward the ligand is expressed as the binding energy (kcal/mol), with lower values indicating a greater bond affinity. In the Arg69 variant, the values were −4.8 kcal/mol near Arg119 and −5.3 kcal/mol near Arg91, while the values were slightly increased in the form with Gly69, with values of −5.2 kcal/mol near Arg119 and −5.7 kcal/mol near Arg91 (Figure 3). *2.3. Docking Analysis* The predicted binding sites for sialic acid in the N‐terminal domain (aa19‐135) of the monomeric form of CD33 (PDB ID 5IHB) carrying either an arginine or a glycine residue in position 69 are reported in Figure 3. The affinity of each binding site toward the ligand is expressed as the binding energy (kcal/mol), with lower values indicating a greater bond affinity. In the Arg69 variant, the values were −4.8 kcal/mol near Arg119 and −5.3 kcal/mol near Arg91, while the values were slightly increased in the form with Gly69, with values of −5.2 kcal/mol near Arg119 and −5.7 kcal/mol near Arg91 (Figure 3).

*Int. J. Mol. Sci.* **2022**, *23*, 3629 5 of 13

**Figure 3.** Sialic acid interactions with the two SNPs of CD33 structures. (**A**) Surface representation of functional domain of the CD33 predominant form with Arg in position 69, colored in red (PDB ID 5IHB). (**B**) Surface representation of functional domain of the CD33 variant form with Gly in position 69, colored in red (PDB ID 5IHB). The areas in orange indicate the binding sites of sialic acid, in stick representation, in correspondence with the positively charged arginine residues with which the ligand establishes saline bridges. The values of the binding energy in kcal/mol are reported for each binding site of sialic acid. **Figure 3.** Sialic acid interactions with the two SNPs of CD33 structures. (**A**) Surface representation of functional domain of the CD33 predominant form with Arg in position 69, colored in red (PDB ID 5IHB). (**B**) Surface representation of functional domain of the CD33 variant form with Gly in position 69, colored in red (PDB ID 5IHB). The areas in orange indicate the binding sites of sialic acid, in stick representation, in correspondence with the positively charged arginine residues with which the ligand establishes saline bridges. The values of the binding energy in kcal/mol are reported for each binding site of sialic acid.

We found similar results when we performed docking analyses on the other two

structures: the N‐terminal domain (aa19‐135) (PDB ID 5J06, 5J0B), which carried a glycine residue in position 69; and the repeated structural type C2 Ig domain (aa145‐228). In the PDB‐ID 5J06 structure, the values were −4.7 kcal/mol in Arg119 proximity and −5.2 kcal/mol in Arg91 proximity; while in the PDB‐ID 5J0B structure, the predicted binding affinity values were −4.7 kcal/mol in Arg119 proximity and −5.3 kcal/mol in Arg91 proximity (Supplementary Figure S2). We confirmed these results through the docking analyses carried out on the other three CD33 structures having only the N‐terminal domain (PDB ID 6D48, 6D49, and 6D4A). In fact, we always observed a greater binding affinity in the Arg91 residue part than in the Arg119 part of the functional domain (Supplementary Figure S3), suggesting a better chance of binding at this site. This result was in contrast with the evidence from the binding of a sialic acid mimetic ligand FVP in the proximity of Arg119, as found by others [32]. However, by analyzing the asymmetric unit assembly of each resolved structure in that study, we observed the formation of dimers and tetramers involving interfaces near Arg91, which likely prevented the binding of ligands near this residue. Considering that the most reactive side of the D2:IgV domain is represented by the positively charged patch comprising arginine residues 89 and 91, the oligomerization trend involving this surface could be explained easily. Notwithstanding We found similar results when we performed docking analyses on the other two structures: the N-terminal domain (aa19-135) (PDB ID 5J06, 5J0B), which carried a glycine residue in position 69; and the repeated structural type C2 Ig domain (aa145-228). In the PDB-ID 5J06 structure, the values were−4.7 kcal/mol in Arg119 proximity and −5.2 kcal/mol in Arg91 proximity; while in the PDB-ID 5J0B structure, the predicted binding affinity values were −4.7 kcal/mol in Arg119 proximity and −5.3 kcal/mol in Arg91 proximity (Supplementary Figure S2). We confirmed these results through the docking analyses carried out on the other three CD33 structures having only the N-terminal domain (PDB ID 6D48, 6D49, and 6D4A). In fact, we always observed a greater binding affinity in the Arg91 residue part than in the Arg119 part of the functional domain (Supplementary Figure S3), suggesting a better chance of binding at this site. This result was in contrast with the evidence from the binding of a sialic acid mimetic ligand FVP in the proximity of Arg119, as found by others [32]. However, by analyzing the asymmetric unit assembly of each resolved structure in that study, we observed the formation of dimers and tetramers involving interfaces near Arg91, which likely prevented the binding of ligands near this residue. Considering that the most reactive side of the D2:IgV domain is represented by the positively charged patch comprising arginine residues 89 and 91, the oligomerization trend involving this surface could be explained easily. Notwithstanding the site of binding,

our results suggested a general increase in the binding affinity and specificity of the glycine variant of CD33 toward ligands.

#### *2.4. Virtual Screening Analysis*

Plotting the carbon 1 atomic coordinate of the best sialic acid poses against the affinity values (kcal/mol) predicted by the docking analysis, we obtained a conceptual understanding of the ligand distributions on the protein surface with respect to the arginine or glycine variants in position 69 of the CD33 protein. The results highlighted a noteworthy increase in the affinity of several binding sites for sialic acid in the glycine 69 variant of CD33 (Supplementary Figure S4). Indeed, the analysis of the distributions of the best ligands on the D2:IgV domain of CD33, obtained by plotting the XYZ coordinates of the C1 atom of sialic acid structures (Supplementary Figure S5), showed binding in specific sites of CD33 receptors for both variants with arginine and glycine residues in position 69. When comparing the position of arginine residues in the D2:IgV domain of the PDB-ID 5IHB structure with the best ligand poses by overlaying the YZ coordinates of the Cζ atom of arginine sidechains with the coordinates of C1 atom of sialic acid poses (Supplementary Figure S6), we observed a number of sialic-acid-binding molecules close to arginine residues 91, 98, and 119. However, the C1 atom of sialic acid near the approximate Z coordinate 70,000, included in the range of Y coordinates 85,000–95,000 and 110,000–115,000, were instead close to the arginine 62 and 89 residues in the 3D representation, respectively. It was apparent that ligands basically spread out on the surface of the Arg69 structure of CD33, with most of the sialic acid binding near arginine residues 91, 119, 89, 98, and 62, ordered by the number of bonded ligands (Supplementary Figure S6A). In contrast, an increased affinity and number of sialic acid molecules was predicted near the position of arginine 91 in the Gly69 variant of CD33 (Supplementary Figure S6B).

These data suggested the possibility of the presence of more than one binding site for sialic acid derivatives with a similar specificity. In fact, it was not surprising that the more reactive domain near arginine 91 was involved in the binding at the interfaces between the CD33 structures (single and double domains) in the crystallographic unit cell, preventing the recognition of this other sialic-acid-binding site in the crystallographic structures resolved in presence of ligands [32].

The 3D representations in Figure 4 illustrate the best orientations of sialic acid molecules in the CD33 binding sites obtained by virtual screening analysis. In the CD33 structure presenting an arginine residue in position 69, we observed the prevalent distribution of the ligands in the two already identified binding sites involving arginine 91 and 119 (Figure 4A). However, two additional binding sites were predicted near arginine 62 and 89. The number of ligands in these two secondary sites increased in the representation of the second-best poses (Figure 4B), having a lower affinity with respect to the previous ones. In contrast, the best poses of sialic acid molecules were located exclusively near arginine 91 and 119 in the Gly69 variant of CD33 (Figure 4C), and only a few ligand poses were predicted near arginine 62 and 89 (Figure 4D). In addition, when observing the manner the sialic acid binds to the sites on the CD33 surface, we can point out that the orientation of sialic acid molecules was very often preferentially maintained in the binding sites of the Gly69 variant of CD33 (Figure 4). Thus, in addition to the increase in affinity, it was likely that the Gly69 variant of CD33 bound to the sialic acid with a higher specificity than the Arg69 variant. These results supported the idea that the balance of the binding equilibrium constant for CD33–sialic acid interaction changed slightly in a positive direction in the Gly69 variants. This suggested a more efficient binding of sialic acid at a lower concentration by this CD33 variant.

**Figure 4.** Detail of the binding sites in the D2:IgV domain for sialic acid molecules, as derived from docking analyses. Surface representation, in light gray, of the N‐terminal D2:IgV functional domain of arginine 69 models of CD33, in which the docked poses of sialic acid molecules with the best (**A**) and the second best (**B**) ligand mode rmsd are shown in stick figures. Surface representation, in light gray, of the N‐terminal D2:IgV functional domain of glycine 69 models of CD33, in which the docked poses of sialic acid molecules with the best (**C**) and the second best (**D**) ligand mode rmsd are shown in stick figures. The surfaces of arginine residues 62, 69, 89, 91, and 119 are presented in light blue. **3. Discussion Figure 4.** Detail of the binding sites in the D2:IgV domain for sialic acid molecules, as derived from docking analyses. Surface representation, in light gray, of the N-terminal D2:IgV functional domain of arginine 69 models of CD33, in which the docked poses of sialic acid molecules with the best (**A**) and the second best (**B**) ligand mode rmsd are shown in stick figures. Surface representation, in light gray, of the N-terminal D2:IgV functional domain of glycine 69 models of CD33, in which the docked poses of sialic acid molecules with the best (**C**) and the second best (**D**) ligand mode rmsd are shown in stick figures. The surfaces of arginine residues 62, 69, 89, 91, and 119 are presented in light blue.

#### Although the association of CD33 with Alzheimer's diseases has not been thoroughly **3. Discussion**

elucidated, literature data supports the hypothesis that through the modulation of immune cell functions (such as phagocytosis, cytokine release, apoptosis, etc.), CD33 could be implicated in the suppression of amyloid fragment uptake [12,18,23,25–27]. Here, we demonstrated a correlation between an SNP variant of the *CD33* gene (SNP rs2455069‐ A>G) and Alzheimer's disease risk, and proposed a new hypothesis for its functional role. Using different in silico approaches, we demonstrated that the binding site for sialic acid could involve additional regions of the protein to the previously documented Arg119 residue. This residue was demonstrated to be essential for the binding of the sialic acid mimetic [32]. Indeed, it was assumed that the binding site for sialic acid was located at the D2:IgV domain of the CD33 receptor, and contained the arginine 119 residue, which was positively charged at physiological pH. Because sialic acid molecules are typically terminal residues on glycoconjugates, the positively charged arginine forms salt bridges with the sialic acid portion of sugar residues, enabling stable interactions [18,33–35]. Due to the presence of FVP (a subtype‐selective sialic acid mimetic ligand) in the 6D49 and 6D4A structures (recently also demonstrated in the 7AW6 structure), the putative binding site was located near the Arg69 and Arg119 positions (Supplementary Figure S7). In agreement with those results, we confirmed the binding of sialic acid to Arg119, but also demonstrated the possibility that the ligand may bind to other regions of the protein with higher affinity that include Arg91. The presence of an additional binding site was recently reported for another Siglec family member [36], pointing out the need to further explore the binding properties of CD33 toward sialic‐acid‐containing molecules. A mutation of arginine 119 to alanine did not affect phagocytosis of U937 cells [27], raising a question on the role of this residue in sialic‐acid‐containing glycan binding, and supporting a hypothesis of additional binding sites for sialic acid. Our *in‐silico* analyses indeed supported the possibility of additional binding sites for sialic acid. In addition, our studies Although the association of CD33 with Alzheimer's diseases has not been thoroughly elucidated, literature data supports the hypothesis that through the modulation of immune cell functions (such as phagocytosis, cytokine release, apoptosis, etc.), CD33 could be implicated in the suppression of amyloid fragment uptake [12,18,23,25–27]. Here, we demonstrated a correlation between an SNP variant of the *CD33* gene (SNP rs2455069- A>G) and Alzheimer's disease risk, and proposed a new hypothesis for its functional role. Using different in silico approaches, we demonstrated that the binding site for sialic acid could involve additional regions of the protein to the previously documented Arg119 residue. This residue was demonstrated to be essential for the binding of the sialic acid mimetic [32]. Indeed, it was assumed that the binding site for sialic acid was located at the D2:IgV domain of the CD33 receptor, and contained the arginine 119 residue, which was positively charged at physiological pH. Because sialic acid molecules are typically terminal residues on glycoconjugates, the positively charged arginine forms salt bridges with the sialic acid portion of sugar residues, enabling stable interactions [18,33–35]. Due to the presence of FVP (a subtype-selective sialic acid mimetic ligand) in the 6D49 and 6D4A structures (recently also demonstrated in the 7AW6 structure), the putative binding site was located near the Arg69 and Arg119 positions (Supplementary Figure S7). In agreement with those results, we confirmed the binding of sialic acid to Arg119, but also demonstrated the possibility that the ligand may bind to other regions of the protein with higher affinity that include Arg91. The presence of an additional binding site was recently reported for another Siglec family member [36], pointing out the need to further explore the binding properties of CD33 toward sialic-acid-containing molecules. A mutation of arginine 119 to alanine did not affect phagocytosis of U937 cells [27], raising a question on the role of this residue in sialic-acid-containing glycan binding, and supporting a hypothesis of additional binding sites for sialic acid. Our *in-silico* analyses indeed supported the possibility of additional

also supported a hypothesis that the CD33‐R69G variant may bind sialic acid with greater

binding sites for sialic acid. In addition, our studies also supported a hypothesis that the CD33-R69G variant may bind sialic acid with greater specificity and affinity, and results in a more efficient binding of the variant to sialic acid, at lower concentrations than that of the wild type. Interestingly, the dependence of binding on sialic acid concentrations was consistent with observations that sialic acid content increases in old-age females [37], who have an increased risk of LOAD compared to males of the same age [38]. In the light of these observations, new perspectives have opened up in the study of CD33, with novel and interesting understandings of the interactions leading to late-onset Alzheimer's disease. Several polymorphisms in the CD33 gene could affect the structure of this receptor, and in turn, the binding affinity for sialic-acid-containing gangliosides. We hypothesized that the microglial phagocytosis, and in turn, the efficient removal of neurotoxic protein aggregates, might also depend on a variety of factors, including dietary habits, sex-specific accumulation of sialic acid, and aging, among others, affecting ganglioside expression in the brain. This may establish a complex inter-relationship between genetic risk factors and additional predictors in the progression of LOAD. Overall, our results led us to hypothesize that the rs2455069-A>G SNP is a risk factor for LOAD, and that the mechanism of the long-term cumulative effects of the CD33-R69G variant are mediated through an increased binding of sialic acid, which then acts as a more effective enhancer of the CD33 inhibitory effects on amyloid plaque degradation (Figure 5). specificity and affinity, and results in a more efficient binding of the variant to sialic acid, at lower concentrations than that of the wild type. Interestingly, the dependence of binding on sialic acid concentrations was consistent with observations that sialic acid content increases in old‐age females [37], who have an increased risk of LOAD compared to males of the same age [38]. In the light of these observations, new perspectives have opened up in the study of CD33, with novel and interesting understandings of the inter‐ actions leading to late‐onset Alzheimer's disease. Several polymorphisms in the CD33 gene could affect the structure of this receptor, and in turn, the binding affinity for sialic‐ acid‐containing gangliosides. We hypothesized that the microglial phagocytosis, and in turn, the efficient removal of neurotoxic protein aggregates, might also depend on a vari‐ ety of factors, including dietary habits, sex‐specific accumulation of sialic acid, and aging, among others, affecting ganglioside expression in the brain. This may establish a complex inter‐relationship between genetic risk factors and additional predictors in the progression of LOAD. Overall, our results led us to hypothesize that the rs2455069‐A>G SNP is a risk factor for LOAD, and that the mechanism of the long‐term cumulative effects of the CD33‐ R69G variant are mediated through an increased binding of sialic acid, which then acts as a more effective enhancer of the CD33 inhibitory effects on amyloid plaque degradation (Fig‐ ure 5).

*Int. J. Mol. Sci.* **2022**, *23*, 3629 8 of 13

**Figure 5.** Model describing the plaque accumulation during aging. The increased effect of the CD33‐ R69G variant in inhibiting the microglial phagocytosis results in a lower extent of clearance of am‐ yloid plaques, supporting late‐onset AD. **Figure 5.** Model describing the plaque accumulation during aging. The increased effect of the CD33-R69G variant in inhibiting the microglial phagocytosis results in a lower extent of clearance of amyloid plaques, supporting late-onset AD.

#### **4. Conclusions 4. Conclusions**

We analyzed the functional role of the SNP rs2455069‐A>G, which we found to be associated with Alzheimer's disease (AD) in a cohort of patients from southern Italy. The SNP changed a single amino acid in the sequence of CD33, and resulted in a local change in its protein structure, which we derived by in silico analysis. Our data demonstrated that this small change resulted in a slight increase in the binding affinity and specificity of CD33 toward sialylated molecules, its natural ligands. We propose that this change in binding enhanced the inhibitory effects of CD33 on amyloid plaque degradation, and to‐ gether with other genetic, epigenetic, and environmental factors, could predispose to the We analyzed the functional role of the SNP rs2455069-A>G, which we found to be associated with Alzheimer's disease (AD) in a cohort of patients from southern Italy. The SNP changed a single amino acid in the sequence of CD33, and resulted in a local change in its protein structure, which we derived by in silico analysis. Our data demonstrated that this small change resulted in a slight increase in the binding affinity and specificity of CD33 toward sialylated molecules, its natural ligands. We propose that this change in binding enhanced the inhibitory effects of CD33 on amyloid plaque degradation, and together with other genetic, epigenetic, and environmental factors, could predispose to the accumulation of amyloid plaques and the onset of AD pathology. These observations support a new approach to the investigation of protein polymorphisms involved in AD, and propose a need for structural analyses to complements genetic studies.

#### **5. Materials and Methods**

#### *5.1. Participant Recruitment and Analysis*

The participants in the study (*n* = 330) were recruited at the Centre for Research and Training in Medicine of Aging (CeRMA) at the University of Molise (Italy). The AD patients (*n* = 195) fulfilled the National Institute of Aging and Alzheimer's Association (NIA-AA) diagnostic criteria for "probable AD with documented decline" [39]. They scored <24 on the Mini Mental State Examination (MMSE) and >0.5 on the Clinical Dementia Rating (CDR). To rule out other potential causes of cognitive impairment, all patients underwent blood tests (including full blood count, erythrocyte sedimentation rate, blood urea nitrogen and electrolytes, thyroid function, vitamin B12, and folate) and brain imaging. One hundred and thirty-five sex/age-matched cognitively healthy subjects (HS) were recruited as a control group. DNA from all participants was analyzed for the presence of the rs2455069 SNP by high-resolution melting analysis (HRM). Genomic DNA was extracted from whole blood with a QIAamp DNA Blood Mini Kit (QIAGEN). PCR amplifications were performed on a SureCycler 8800 Thermal Cycler (Agilent, Santa Clara, CA, USA). The study was conducted in accordance with ethical principles stated in the Declaration of Helsinki, and with approved national and international guidelines for human research. The Institutional Review Board (IRB) of the University of Molise approved the study (IRB Prot. n. 16/2020). Written informed consent was obtained from each participant or caregiver.

#### *5.2. High-Resolution Melting Analysis (HRM)*

The high-resolution melting analysis (HRM) was performed as previously described [29] on a LightCycler 480 Instrument II (Roche, Basel, Switzerland) and analyzed using LightCycler® 480 Gene Scanning Software 1.5.1. High-resolution melting Mastermix (Roche) was used. Specific primers used for amplification and sequencing by the Sanger method and the electropherograms were analyzed with Chromas 2.22 software and aligned according to reference sequences present in GenBank (http://www.ncbi.nlm.nih.gov/GenBank/index. html; accessed on 1 September 2019). Templates were amplified in 96-well plates and ~30 ng analyzed in a 20 µL final volume reaction as previously specified [29].

#### *5.3. CD33 Structure Editing and Minimization*

Computer simulations were carried out on the 3D crystallographic structures available in the Protein Data Bank [31] with ID numbers 5IHB, 5J06, 5J0B, 6D48, 6D49, 6D4A, and 7AW6. The first four included the N-terminal domain (aa19-135) and the repeated structural domain Ig of type C2 (aa145-228), while the last three structures included only the N-terminal domain. All PDB structures of CD33 (excluding 6D48) incorporated ligands. Characteristics of the seven human CD33 structures (Siglec-3) are summarized in Supplementary Table S1. The PDB files were edited by using Pymol [40]. Any molecules bound in the protein catalytic sites and water molecules bound to the protein during the crystallization process were removed. The CHARMM-GUI web-based platform (http://www.charmm-gui.org; accessed on 15 February 2022) was used to extract a single monomer from each of the CD33 structures contained in the PDB files, and to generate the two single-nucleotide polymorphisms (SNPs) at position 69 for each monomer. All generated structures were optimized offline using the CHARMM program [41] by performing a basic energy minimization with the steepest descent (SD) algorithm of the initial 25–50 steps to remove bad van der Waals contact. A more precise minimization was carried out subsequently with the Newton–Raphson method (ABNR) of 1,000,000 steps to remove potential problems, such as incorrect contacts, collisions, and nonphysical contacts/interactions. When the step-to-step energy change was less than 0.001 kcal/mol, the structure was considered sufficiently minimized.

#### *5.4. Ligand Structure Editing and Minimization*

The sialic acid and similar ligand (30 -sialyllactose, 60 -sialyllactose) 3D structures were generated using Avogadro software (http://avogadro.cc; accessed on 15 February 2022), and the structures were optimized through the MMFF94 force field with the steepest descent algorithm until the ∆E reach a value approaching 0 (<10–10).

#### *5.5. Docking Analysis*

The minimized monomeric structures of CD33 were used for the docking analysis of sialic acid and similar ligand structures. The docking analysis was carried out using Autodock Vina [42]. The ADT software package was used to determine the grid sizes, visualize the ligand poses, and add hydrogen atoms to the templates [43]. We superimposed the different CD33 structures and generated a single grid box that included all the CD33 structures. The coordinates of the grid box were put in a configuration file (config file in Supplementary Materials) used to perform both the docking and the virtual screening analysis. During the docking procedure, both the proteins and ligands were considered as rigid, and the only torsion that was allowed was for the carboxylic and alcoholic groups of the ligand. A bash shell command was used to pass receptors, ligands, and grid box parameters to Autodock Vina, producing an output file containing the predicted models (ligand poses) and a log file for each analysis. The structures were analyzed and the images were produced by using the PyMOL [40] and the UCSF Chimera 1.13.1 (http://www.cgl.ucsf.edu/chimera; accessed on 15 February 2022) [44] molecular graphics software programs.

#### *5.6. Virtual Screening Analysis*

We prepared two datasets of 100 models each of the Arg69 and Gly69 variants of the CD33 structure (PDB ID 5IHB). We performed a basic energy minimization using the CHARMM program [41], as previously described, to make them more relaxed for subsequent analysis. Using an ad hoc script (see Supplementary Materials) to transfer receptors and grid box parameters for automating the docking process, we submitted the two structure datasets (200 structures total) for analysis by Autodock Vina against the sialic acid as ligand, retrieving at least eight poses for each CD33 structure.

#### *5.7. Statistical Analysis*

Data were analyzed using the SPSS (v. 17.0) statistical software package (SPSS Inc., Chicago, IL, USA). The frequencies of genotypes and allelotypes of the CD33 gene polymorphism were calculated and determined for deviation from the Hardy–Weinberg equilibrium (HWE). The Chi-squared test and odds ratio (OR) were used to evaluate the association of AD risk with different genotypes and alleles. A *p*-value of <0.05 was considered statistically significant.

**Supplementary Materials:** The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/ijms23073629/s1.

**Author Contributions:** F.T. and F.F. performed the molecular modeling and simulation; A.R. performed the in vitro experiments; A.A. and A.D.C. performed the collection of data from patients; A.R., A.D. and E.V. performed the analysis and comparison of experimental data; F.F. performed the analysis of in silico simulations; F.F. and E.V. conceived and designed the research; A.D., F.A., A.D.C., F.F. and E.V. interpreted the results; F.T., A.D., F.F. and E.V. wrote the manuscript; all authors edited the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** A.R. was supported by the EMBO Short-Term Fellowship 7532-2018-2019. F.T. was supported by the FESR Campania 2014-2020 (Asse 1—Obiettivo Specifico 1.2—Azione 1.2.2) (ex PON03PE\_00060\_4). A.D.C. and A.A. were supported by the MIUR/PRIN grant N. 2017T9JNLT.

**Institutional Review Board Statement:** The study was conducted in accordance with ethical principles stated in the Declaration of Helsinki, and with approved national and international guidelines for human research. The Institutional Review Board (IRB) of the University of Molise approved the study (IRB Prot. n. 16/2020).

**Informed Consent Statement:** Written informed consent was obtained from each participant or caregiver.

**Data Availability Statement:** All data are available in the main text or the Supplementary Materials.

**Acknowledgments:** Figure 5 contains graphs provided by SMART (Servier Medical Art) (https://smart. servier.com; accessed on 15 February 2022).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Review* **rs7041 and rs4588 Polymorphisms in Vitamin D Binding Protein Gene (VDBP) and the Risk of Diseases**

**Dominika Rozmus 1,\* , Janusz Płomi ´nski 2,3, Klaudia Augustyn <sup>4</sup> and Anna Cie´sli ´nska <sup>1</sup>**


**Abstract:** The purpose of the study was to investigate the role of vitamin D binding protein (VDBP, DBP) and its polymorphism in the vitamin D pathway and human health. This narrative review shows the latest literature on the most popular diseases that have previously been linked to VDBP. Vitamin D plays a crucial role in human metabolism, controlling phosphorus and calcium homeostasis. Vitamin D binding protein bonds vitamin D and its metabolites and transports them to target tissues. The most common polymorphisms in the VDBP gene are rs4588 and rs7041, which are located in exon 11 in domain III of the VDBP gene. rs4588 and rs7041 may be correlated with differences not only in vitamin D status in serum but also with vitamin D metabolites. This review supports the role of single nucleotide polymorphisms (SNPs) in the VDBP gene and presents the latest data showing correlations between VDBP variants with important human diseases such as obesity, diabetes mellitus, tuberculosis, chronic obstructive pulmonary disease, and others. In this review, we aim to systematize the knowledge regarding the occurrence of diseases and their relationship with vitamin D deficiencies, which may be caused by polymorphisms in the VDBP gene. Further research is required on the possible influence of SNPs, modifications in the structure of the binding protein, and their influence on the organism. It is also important to mention that most studies do not have a specific time of year to measure accurate vitamin D metabolite levels, which can be misleading in conclusions due to the seasonal nature of vitamin D.

**Keywords:** VDBP; vitamin D binding protein; rs7041; rs4588; bone density; diabetes; obesity; COPD; pulmonary tuberculosis; SNP; MD; PD

#### **1. Introduction**

Recent data suggest that vitamin D deficiency is widespread across Europe. Analysis of 14 population studies revealed that 13% of the 55,844 European individuals had average yearly serum 25(OH)D concentrations <30 nmol/L, regardless of age group, ethnic mix, and the latitude of study populations [1]. According to the US Endocrine Society definition of vitamin D deficiency (<50 nmol/L), the prevalence was 40.4% and dark-skinned ethnic subgroups were more likely to be vitamin D deficient [2]. Due to the darker skin color, melanin blocks the UVB radiation, which is necessary for vitamin D synthesis. Hypovitaminosis D was highly prevalent among pregnant Bangladeshi women, and parity and gestational age were common risk factors of vitamin D deficiency [3].

1,25(OH)2D is considered to be the most powerful physiological agent. It stimulates the active transport of calcium, phosphorus, and magnesium. Disorders in vitamin D action can lead to a decrease in the net flux of minerals to the extracellular compartment,

**Citation:** Rozmus, D.; Płomi ´nski, J.; Augustyn, K.; Cie´sli ´nska, A. rs7041 and rs4588 Polymorphisms in Vitamin D Binding Protein Gene (VDBP) and the Risk of Diseases. *Int. J. Mol. Sci.* **2022**, *23*, 933. https:// doi.org/10.3390/ijms23020933

Academic Editor: Elixabet Lopez-Lopez

Received: 29 November 2021 Accepted: 13 January 2022 Published: 15 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

which can lead to hypocalcemia and secondary hyperparathyroidism [4]. In addition, low concentrations of calcium and phosphorus will lead to defective mineralization of the bone matrix and rickets [5,6]. Vitamin D is also a regulator of the immune system, where the expression of CYP27B1 in macrophages leads to local production of 1,25-dihydroxyvitamin D ((1,25[OH]2D)), which induces the expression of genes encoding antimicrobial peptides [7]. (1,25[OH]2D) induces and stimulates autophagy resulting in enhanced bacterial killing, suppresses production of pro-inflammatory cytokines, and prevents overstimulated immune response [8].

The purpose of the study is to investigate and systematize the current knowledge regarding the impact of VDBP polymorphisms on the risk of incidence of various diseases and human health. The "Vitamin D" (calciferol) term refers to two secosteroids: vitamin D2 (ergocalciferol) and vitamin D3 (cholecalciferol). They are both produced from sterol precursors with light in the UVB spectrum of 280 to 320 nm. In fungi and plants, ergosterol is the vitamins' D2 precursor, while the vitamins' D3 precursor is 7 dehydrocholesterol (7-DHC), and its high concentration was found in the skin [9]. Vitamin D2 and D3 differ in side chains, but both are converted to 25-hydroxyvitamin D 25(OH)D and 1,25-dihydroxyvitamin D [9,10]. 25OHD is considered to be the best reflection of the vitamin D level in serum [10]. Vitamin D3 can be synthesized endogenously under ultraviolet (UV) light [11].

Vitamin D photoproduction starts with 7-dehydrocholesterol (7-DHC), which is synthesized and built into the membranes of the epidermis and dermis [12]. During sunlight exposure, the epidermal 7-DHC is converted into pre-vitamin D3 [12,13]. Pre-D3 is thermoisomerized to the vitamin D3 in the cell membrane. The produced cholecalciferol is removed into extracellular space and reaches the skin's capillary by diffusion [12]. Prolonged exposure to solar ultraviolet radiation while pre-vitamin D3 synthesis reaches a plateau of 10 to 15 percent of the original 7-DHC has an increasing effect only on lumisterol and tachysterol, two biologically inactive photoisomers [13]. Lumisterol can revert back to pre-D3 in the dark, but maximum levels of pre-D3 lead to the accumulation of inactive luminsterol with continued UV exposure. The production of lumisterol and tachysterol has a protective effect against the production of toxic amounts of D3 [9]. During activation and inactivation processes, cytochrome P450 (CYP) enzymes are involved throughout the vitamin D3 pathway [14,15]. The pathway is presented in Figure 1 (based on [12,15–18].

The first step is the conversion of vitamin D to 25OHD in the liver by 25-hydroxylase enzymes. Enzymes with this activity are identified as: CYP3A4, CYP2R1, CYP27A1, CYP2J1, CYP2C11, CYP2D25 [19]. The production of the hormonally active form: 1α,24 dixydroxyvitamin D (1α,25(OH)2D3) is catalyzed by CYP27B1: 25-hydroxyvitamin D-1 α-hydroxylase. An active form acts with the vitamin D receptor (VDR) [19]. There is also an alternative pathway that starts with the action of CYP11A1 on D3 and produces 20(OH)D3, 22(OH)D3, 20,23(OH)2D3, 20,22(OH)2D3, and 17,20,23(OH)3D3. Hydroxylation of some of these metabolites can occur through the activity of CYP27B1 at C1α, by CYP24A1 at C24 and C25, and by CYP27A1 at C25 and C26 [16,20]. Possible pathways are shown in Figure 2 based on SNPedia base [21]. Mitochondrial CYP24A1 catalyzes the first step of 25(OH)D3 and 1,25(OH)2D3 degradation by 24- or 23-hydroxylation [22]. CYP11A1 was also found to be expressed in extrarenal and extragonadal tissues [23] and also in the immune system [17]. CYP11A1 vitamin D metabolites are also detectable in the serum [16].

Vitamin D receptor is a member of the nuclear receptor superfamily and plays a crucial role in the actions of vitamin D [24]. VDR mediates many genomic and nongenomic effects of vitamin D. Many biological pathways and networks are influenced by VDR, for example metabolism [25], including bone metabolism and remodeling [26,27], immunity and immune response [24,28], cell proliferation and differentiation [24], and cell health [29]. VDR regulates the expression of numerous genes and is involved in calcium/phosphate homeostasis [24]. Many genes are up-regulated (CYP24A1, osteocalcin or *RankI*) or down-regulated (parathormone—PTH, CYP27B1) due to VDR activation [24]. Slominski et al. reported the alternatives to VDR nuclear receptors that are involved in

vitamin D metabolite signaling pathways: retinoic acid-related orphan receptors (RORα–γ; NR1F1–3) [30], AhR [31], and LXR [32]. *Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 3 of 19 *Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 3 of 19

**Figure 1.** Vitamin D metabolic pathways based on Bikle (2014), Slominski et al. (2012, 2015, 2020), Rozmus et al. (2020) [12,15–18]. Figure fully created with biorender.com (accessed on 1 November 2021). **Figure 1.** Vitamin D metabolic pathways based on Bikle (2014), Slominski et al. (2012, 2015, 2020), Rozmus et al. (2020) [12,15–18]. Figure fully created with biorender.com

**Figure 2.** VDBP gene and its polymorphisms (based on the SNPedia database [21]).

**Figure 2.** VDBP gene and its polymorphisms (based on the SNPedia database [21]).

**Figure 2.** VDBP gene and its polymorphisms (based on the SNPedia database [21]). Vitamin D binding protein was initially named the 'group-specific component' (Gc) by Hirschfield in 1959 after isolation from the α2-globulin portion of plasma [33]. As a result of the binding and transport of vitamin D analogs, the DBP name was adopted. After discovering macrophage-stimulating activities, VDBP was renamed as the macrophageactivating factor (GcMAF/DBP-MAF) [34]. The name has been changed several times, as

many different biological functions of VDBP have been discovered [34,35]. VDBP binds to fatty acids and actin monomers and also has immune functions independent of vitamin D transport [35], such as binding to leukocyte membrane proteoglycans and the activation of the complement C5 system [18]. VDBP is well known for its single nucleotide polymorphisms (SNP), and the most common are rs7041 and rs4588, located in exon 11 of the VDBP gene [18,36]. SNPs are the most abundant genetic variants in genomes [37]. SNPs may affect protein stability, folding, flexibility, and aggregation; functional sites, reaction kinetics, and dependence on environmental parameters, such as pH, salt concentration, and temperature; protein expression and subcellular localization; and protein–small molecule, protein–protein, protein–DNA, and protein–membrane interactions [38]. Many studies have shown associations between SNPs and the concentration of protein as well as substance protein transport via VDBP in this particular case [18,39,40]. The effects of vitamin D supplementation according to the most common polymorphisms of the vitamin D binding protein was studied by Al-Daghri et al., and it was shown that 25[OH]D concentrations were significantly higher among people with the major homozygous rs7041 genotype. Post supplementation 25[OH]D was higher in participants carrying homozygous major genotypes in rs4588 and rs7041 compared to other genotypes [41].

The vitamin D binding protein has a single binding site for all vitamin D metabolites and has a high affinity for 25OHD and 1,25(OH)2D [42], but has no affinity for lumisterol and minimum affinity to tachysterol [13].

Possible haplotypes are as follows: Gc1f(1f): rs7041(T) + rs4588(C); Gc1s(1s): rs7041(G) + rs4588(C); Gc2(2): rs7041(T) + rs4588(A). According to some publications, in which rs2282679 is used as a proxy for rs4588, rs2282679(A) is typically coinherited with rs4588(C) and vice versa. Since human have two copies of each gene, it leads to six possible VDBP phenotypes [21] presented in Table 1. Table 2 present SNPs of VDBP, and chromosome location. Table 3 present the frequency of rs7041 and rs4588 among populations and geographic regions. Figure 2 show the location of the most common SNPs of the VDBP gene, and their loci in exons and introns are pointed out with arrows.

**Table 1.** Characteristics of vitamin D binding protein polymorphisms in most common variants (based on Rozmus et al. 2020 with modifications [18,21]).


**Table 2.** SNPs alleles and chromosome location (based on SNPedia [21]).



**Table 3.** Frequencies of alleles in rs4588 and rs7041 among different populations and geographic regions based on [43,44].

\* [42]. \*\* [43]. \*\*\* nd means no data.

Our review focuses on the most important correlations between VDBP polymorphisms and selected diseases described in the latest scientific reports garding obesity, polycystic ovary syndrome, metabolic syndrome, diabetes mellitus, asthma, pulmonary tuberculosis, chronic obstructive pulmonary disease, coronary artery disease, multiple sclerosis, and Parkinson's Disease. This study is supplementation of our previous review concerning the role of VDBP on malignant tumors [18].

#### **2. Diseases**

#### *2.1. Bone Density*

Osteoporosis is a skeletal disease that affects women older than 50 years of age. During the past few years, it has been a serious public health problem because of the high socioeconomic burden. Patients suffer from deterioration of bone microarchitecture, low bone mineral density (BMD), and increased risk of fragility fractures [45]. The first studies on the effect of vitamin D supplementation on bone density showed that vitamin D (with calcium) reduced bone loss measured in the femoral neck, spine, and total body during the 3-year study and reduced the incidence of non-vertebral fractures [46]. The study by Martinez-Aguilar et al. supports the correlation of low serum VDBP levels with low BMD (osteopenic and osteoporotic). VDBP could be considered a novel, potential, and non-invasive biomarker for the early detection of osteoporosis [45]. The study by Rivera-Paredez et al. supports the association of VDBP and bone health. The article showed that the rs7041 G allele is associated with a higher level of VDBP and BMD compared to homozygous TT. The A allele of rs4588 was associated with a lower VDBP and BMD

compared to homozygous CC. Among men, no association was found between these polymorphisms and VDBP, but GC variants were associated with VDBP levels. In both the women and men subgroup, no association was observed between free and bioavailable 25(OH)D and BMD [47]. Among women and adolescents, the GC genotype was associated with susceptibility to low 25(OH)D levels. The study included 198 healthy girls aged 10–18 years. The AA genotype of rs4588, TT genotype of rs7041, and CT-AT/AT-AT (GC 1f-2/2-2) genotypes were significantly associated with lower 25(OH)D levels, even after adjustment for age and season at the time of blood collection [48]. Studies of Lauridsen et al. also support the VDBP role in premenopausal bone fracture risk among white women aged 45–58, as Gc2-2 is considered to increase the risk of bone fracture compared to Gc1- 1 [49]. The results of Ezura et al. indicated a complex combined effect of VDBP SNPs that underlie susceptibility to low BMD and osteoporosis. The genotyping of 13 SNPs among 384 participants and the analysis of results showed that not only a single SNP, but also a combination of them could act as a risk factor of osteoporosis. Five SNPs (39C > T, IVS1 + 827C > T, IVS1 + 1916C > T, IVS1 − 1154A > G, and IVS11 + 1097G > C) had a significant correlation with radial BMD, and IVS11 + 1097G > C located in intron 11 was the most correlated [50].

#### *2.2. Obesity*

Obesity contributes to reduced life expectancy, poor quality of life, cardiovascular diseases, type 2 diabetes, osteoarthritis, and cancer [51]. Serum vitamin D was found to be lower in obese people [52]. Obesity increases the risk of vitamin D deficiency among different population groups. Higher body mass index (BMI), waist circumference, and the sum of skinfolds were statistically significantly associated with lower 25[OH]D levels and with higher levels of PTH [53]. According to genetic studies, higher adiposity also causes an increased concentration of 25-hydroxyvitamin D, which is used as a vitamin D status indicator [54]. Another study showed that among the haplotypes rs7041 and rs4588, GC2-2 (rs7041 AA and rs4588 TT) has the lowest 25[OH]D levels compared to other haplotypes that contained at least one copy of the Gc1 allele (*p* < 0.0001) [55]. Interestingly, it was also observed that VDBP gene rs7041 polymorphism might be associated with the risk of obesity. In obese patients, a difference was found in the gene type TG + GG and TT frequency of rs7041 between obesity and control groups (*p* = 0.020). The G allele frequency was higher compared to the control group (*p* = 0.023). The TG and TG + GG of VDBP gene rs7041 polymorphism increased the risk of obesity after including age and gender [39]. The study of Almesri et al. showed that rs7041-G and the rare GG genotype were associated with an increase in BMI (*p* = 0.007 and *p* = 0.012, respectively) and had no influence on 25OHD3 levels. On the other hand, rs2282679 (A) and rs4588 (C) were associated with low 25[OH]D3 plasma levels (*p* = 0.039 and *p* = 0.021, respectively). There was no association between rs2282679 (A), rs4588 (C), and BMI in general, but after categorizing patients into subgroups based on their sex, it was shown that rs7041 GG was associated with high BMI in females (*p* = 0.003), and rs4588 CC was associated with high BMI in females (*p* = 0.034) and low levels of 25OHD3 in males (*p* = 0.009). Furthermore, rs12721377 AA was associated with low 25[OH]D3 levels in females (*p* = 0.039) [56].

#### *2.3. Polycystic Ovarian Syndrome (PCOS) and Metabolic Syndrome (MetS)*

All participants from Santos et al. were genotyped for polymorphisms rs2282679, rs4588, and rs7041, and serum 25(OH)D levels were determined. Women with PCOS were at a younger age and had significantly higher body mass index, blood pressure, and insulin resistance than the control group (*p* < 0.05). The 25(OH)D levels were lower among PCOS women with MetS, but no association was observed between PCOS and polymorphisms of VDBP. Above that, PCOS participants with MetS had a higher frequency of the TT genotype in rs7041 [57]. The study by Haldar et al. did not show significant differences in the frequency of rs7041, rs4588, and rs2060793 genotypes in PCOS and control women. The GT allele of rs7041, as well as the allelic combination of Gc1F/1F (T allele of rs4588 and C

allele of rs7041; *p* value = 0.03), were associated with an increased risk of developing PCOS in vitamin D deficient women [36].

#### *2.4. Postmenopausal Women*

Postmenopausal women can exhibit biochemical signs of vitamin D insufficiency. Vitamin D is related to bone integrity, and 25-hydroxyvitamin D is a reliable clinical indicator of vitamin D status. Low levels of vitamin D have been linked to secondary hyperparathyroidism, increased bone turnover, reduced BMD, and increased risk of osteoporotic fractures [58]. As VDBP plays a crucial role in vitamin D transport, the study of Pop et al. showed that lower estradiol levels are associated with lowering VDBP levels [59]. Studies by Sinotte et al. showed that 25(OH)D concentrations in premenopausal women are strongly associated with higher VDBP polymorphisms. Rs7041 and rs4588 were associated with lower 25[OH]D concentrations. Rare alleles of rs7041 (TT genotype) and rs4588 (AA genotype) were associated with the lowest levels of vitamin D3 in a period of low (November to April) and high (May to October) vitamin D load [60]. The study by Alharazy et al. shows that rs7041 among postmenopausal women in Saudi Arabia was associated with total 25(OH)D, and rs4588 did not show an association with total or free levels of 25(OH)D [40]. The results presented by Santos et al. suggest that rs2282679 and the DBP GC2 isoform are related to lower serum levels of DBP and with susceptibility to 25(OH)D deficiency in adults and postmenopausal women [57]. Lauridsen et al. showed that the DBP-phenotype is linked with premenopausal bone fracture risk in perimenopausal white women (595 subjects, age 45–58). There was a significant difference in bone fracture risk among women with different DBP-phenotypes (relative risk of 0.32 in Gc2-2, compared with Gc1-1) [49].

#### *2.5. Diabetes Mellitus (DB)*

Diabetes mellitus is a group of dysfunctions as a result of hyperglycemia characterized by insulin resistance (IR) and secretion or excessive secretion of glucagon. Of the two types of diabetes, type 2 (T2D) is more common and connects a problem of impaired glucose regulation with a combination of dysfunctional pancreatic beta cells and insulin resistance. However, the main risk is obesity (where abdominal is the highest risk of all types). Type 1 (T1D) is an autoimmune disorder that leads to the destruction of pancreatic beta-cells [61].

#### 2.5.1. Diabetes Type 1 (T1D)

The meta-analysis of the five studies presented showed no association between rs7041 and rs4588 polymorphisms with the risk of T1D [62].

#### 2.5.2. Diabetes Type 2 (T2DM)

The study by Zhao et al. showed that there was a significant multiplicative interaction between rs7041 and body mass index (BMI) associated with elevated blood glucose levels and a higher BMI (>28.47), and the carrying allele G was given a stronger effect than the genotype of TT. In conclusion, the interactions between GC rs7041–CYP2R1 (enzyme in the vitamin D metabolic pathway), rs1993116, and GC rs7041-BMI may explain the mechanisms by which these may increase the risk of developing T2DM [63]. Another study by Fawzy et al. examined the frequency distribution of GC-rs7041 and showed no difference between patients and healthy controls, while GC-rs4588 showed an association with T2DM in all genetic models. The rs4588 AA variant was correlated with higher serum GC globulin, albuminuria, and poor glycemic control. On the other hand, a higher frequency of rs7041\*TT and rs4588\*AA was noticed in the macroalbuminuria vs. normoalbuminuric group. Patients with the GC-2 haplotype were approximately 2.5 times more likely to develop diabetes and had higher levels of albuminuria [64]. Other results were obtained in the experiment by Klahold et al. during a case-control cohort study that was conducted to investigate an association of SNPs in the vitamin D metabolic pathway with T2D. Up to 464 T2D patients and 292 healthy controls were genotyped. Patients with

genotypes CYP27B1 rs10877012 "CC" (pc = 4 <sup>×</sup> <sup>10</sup>−<sup>5</sup> ), VDBP rs7041 "GG" (pc = 0.003), rs4588 "CC" (pc = 3 <sup>×</sup> <sup>10</sup>−<sup>4</sup> ), CYP24A1 rs2585426 "CG" (pc = 0.006), and rs2248137 "CG" (pc = 0.001) showed lower 25(OH)D3 and VDBP rs4588 "CC" lower 1,25(OH)2D3 levels (pc = 0.005). This study supports that vitamin D deficiency is highly prevalent in type 2 diabetes and most patients are also functionally affected by low levels of the active metabolite 1,25(OH)2D3. Furthermore, vitamin D system genes can affect the risk of type 2 diabetes and 25(OH)D3 concentration when compared to the healthy group. Despite this, the underlying mechanism has not been clarified, and trials, as well as functional studies, appear to be necessary to identify mechanisms by which the vitamin D system affects the pathophysiology of T2DM [65].

#### *2.6. Asthma*

Asthma is a chronic disease affecting inflammation in the lungs and airways. Common symptoms of the disease are a cough, chest tightness, and shortness of breath. Inflammation causes an overabundance of eosinophils, mast cells, activated T helper lymphocytes, and aids in identifying inflammatory mediators [66,67]. Vitamin D has been studied in asthma progression. Levels of vitamin D were significantly decreased in asthmatic patients in comparison to control patients. Results of genotyping the rs7041 among Kurdish patients showed that the GG genotype, as well as VDBP levels, was increased among the asthmatic group compared to the healthy controls (*p* = 0.003). Asthma progression was increased among patients carrying the rs7041 GG genotype [68]. A study by Fawzy et al. on s group of Egyptian patients showed that the rs7041 GG genotype is correlated significantly with asthma disease, while rs4588CA and AA genotypes were found as protective [69]. Another study supported the importance of rs7041 with the GC1S haplotype, especially among children diagnosed with asthma. The GC1S haplotype was considered to increase the risk of respiratory syncytial virus bronchiolitis in childhood and subsequent asthma development. The GC1s haplotype is associated with higher VDBP levels, which results in less free available vitamin D [70]. A study by Paraskakis et al. on a group of children with asthma showed a higher frequency of the rs7041 G allele and the A allele in rs4588 as a lower frequency allele in children with controlled asthma [71].

#### *2.7. Pulmonary Tuberculosis (PTB, TB)*

Tuberculosis is most often acquired through mycobacterial inhalation. It starts as an infection focus in the lung parenchyma (primary tuberculosis). It begins with necrotizing bronchopneumonia and progresses rapidly to necrotizing granuloma. Mycobacteria from the lungs spread only through the lymphatic system to the lymph nodes of the hilum drainage, where they cause necrotizing granulomatous inflammation [72]. As innate immunity plays an important role in the pathophysiology of tuberculosis, vitamin D with its transporter protein VDBP and its nuclear receptor vitamin D receptor can play a potential role in altering host defense against Mycobacterium tuberculosis. Decreased serum levels of vitamin D were observed in active TB patients as compared to healthy controls (*p* < 0.001) 209 [73]. The study by Zhang et al. showed [74] two less common VDBP polymorphisms, rs3733359 GA and rs16847024 CT, that were significantly associated with a reduced risk of PTB, as well as alleles rs3733359 A and rs16847024 T that were associated with a decreased susceptibility to PTB. In one of the most common polymorphisms, rs4588, the GT genotype was significantly higher in patients with PTB when compared to controls. The findings of Harishankar et al. [75] support those mentioned above, as the CA genotype rs4588 was associated with susceptibility to TB [OR: 1.47 (0.85–2.55); *p* = 0.049] and associated with 47.4% deficiency of 25(OH)D in patients with PTB, but the AA genotype was significantly associated with protection from TB [OR: 0.14 (0.02–1.29); *p* = 0.042]. No association was found with rs7041 polymorphism. Gene variants with 25(OH)D deficiency did not reveal a significant association due to the limited sample size, but the results showed a tendency towards 25(OH)D deficiency in rs7041 TG and rs4588 CA [75].

#### *2.8. Chronic Obstructive Pulmonary Disease*

Vitamin D deficiency was associated with increased risks of chronic obstructive pulmonary disease (COPD). However, the mechanism remains unknown. The vitamin D metabolite 1,25(OH)2D3 reinforced physical interactions between the vitamin D receptor with NF-κB p65 and c-Jun. The results of Fu et al. show that vitamin D is inversely correlated with inflammatory signaling in patients with COPD, and vitamin D may be a vital mediator of the progress of COPD in patients with low vitamin D levels [76]. The study of Li et al. also shows that COPD patients are at high risk of vitamin D deficiency, and the severity of COPD is inversely correlated with vitamin D levels. Furthermore, the homozygous carrier of the rs7041 T allele influences serum levels of 25OHD and is related to the susceptibility of COPD, which could be a potential candidate gene for screening COPD [77]. Among COPD smokers, high frequencies of rs7041/rs4588 haplotypes were homozygous GC1S/1S (42.5%), and higher levels of VDBP in the sputum were observed in stage I and II of COPD, only in the genotype GC1S/1S compared to non-smokers (*p* = 0.034 and *p* = 0.002, respectively) [78]. The studies of Horita et al. included 1712 patients and 1181 non COPD controls among Asians and Caucasians. The prevalence of each allele among the control group was: GC-1F 14.0%, GC-1S 53.8%, and GC-2 31.9%. Compared to GC-1S, the GC-1F allele and the GC-2 allele were associated with the risk of COPD with pooled odds ratios of 1.44 (95% CI 1.14–1.83, *p* = 0.002) and 0.83 (95% CI 0.69–0.996, *p* = 0.045), respectively. In comparison to the 1S-1S genotype, the 1F-1F genotype was a risk factor of COPD with a pooled odds ratio of 2.64 (95% CI 1.29–5.39, *p* = 0.008). The VDBP GC-1F allele was a risk for COPD in the recessive model [79]. Another study showed that patients carrying C allele at rs4588 exhibited a higher frequency of exacerbations (*p* = 0.0048), and a greater susceptibility to chronic obstructive pulmonary disease (*p* = 0.0003), as well as emphysema (*p* = 0.0029), and a tendency for rapid decline of airflow obstruction (*p* = 0.0927) [80]. The meta-analysis of Khanna et al. proves that VDBP is a major determinant of vitamin-D metabolism and transport, showing that alleles GC1F and GC1F/1F posed a risk of COPD. GC1S-1S was found to be a risk only among European participants in these studies [81].

#### *2.9. Coronary Artery Disease*

Cardiovascular diseases (CVDs) are the leading cause of death worldwide, and among CVD coronary artery disease (CAD), they are almost half of all cardiovascular deaths and the most common cause of death [82]. VDBP and its genetic polymorphisms have been linked as susceptible components for CAD [83]. The study by Peri et al. showed evidence of the association of rs4588 and rs7051 with CAD cases among patients after acute myocardial infarction and correlations of these polymorphisms with serum levels of 25-hydroxyvitamin D 25(OH)D. Rs4588 T/T was determined as a susceptibility factor for anteroseptal myocardial infarction, where the same genotype was generally more prevalent in smokers [82]. A study by Tarighi et al. among the Iranian population showed a significant association between the GG genotype (rs7041) and CAD (*p* = 0.02, OR = 0.5737 95% CI = 0.304–0.944) [83]. Daffara et al., in a group of 1080 patients, proved that 57% carried the mutated G allele of rs7041, while 22% carried the A allele of rs4588. In addition, higher C-reactive protein levels were observed in the carriers of the G allele of rs7041 (*p* = 0.02), and 25-hydroxyvitamin D levels were similar between the groups. The rs4588 A allele was associated with a higher prevalence of lesions in the left anterior descending artery and a longer lesion length (*p* = 0.04 and *p* = 0.03, respectively). Rs7041 and rs4588 did not affect the prevalence of CAD [84].

#### *2.10. Multiple Sclerosis (MS)*

Numerous studies suggest that vitamin D levels affect the risk of multiple sclerosis development and modify disease activity [85–89]. Munger et al. found an inverse relationship between serum 25(OH)D level and risk of MS among Caucasians. No similar association was found among those with darker skin or Hispanics [88]. According to the research, these groups have lower 25-hydroxyvitamin D levels with no significant health

consequences [90,91]. There are racial/ethnical differences in the polymorphism of the vitamin D-binding protein-SNPs at rs7041 and rs4588. The dominant allele in rs7041 in Caucasians is the minor allele in those with African heritage. The VDBP isoform most commonly found in individuals with darker skin is the most efficient transporter of 25 hydroxyvitamin D and its metabolites. The MS Sunshine study found that there is a strong association between higher lifetime sun exposure and MS risk across racial/ethnic groups. Low ultraviolet radiation from the sun should lead to low vitamin D status and can explain the geographic distribution of the disease. There is a lack of association between 25-hydroxyvitamin D and MS risk among those with African heritage and Hispanics [92]. Langer-Gould et al. suggested that these differences cannot be explained by racial/ethnic variations in bioavailable vitamin D [92]. Xin Zhang et al. provide evidence that VDBP rs7041 and 4588 polymorphism may not be associated with an increased risk of multiple sclerosis in the meta-analysis [62].

#### *2.11. Parkinson's Disease*

Low vitamin D status is suggested to be associated with Parkinson's Disease [93–95]. Knekt et al. found that individuals with a serum vitamin D concentration of at least 50 nmol/L had a 65% lower risk than those with values below 25 nmol/L, after the adjustment of several potential cofounders, in the follow-up for Parkinson's disease occurrence of the Mini-Finland Health survey [96]. Zheng Lv et al. suggested that patients with 25(OH)D level <50 nmol/L experienced a twofold increased risk of PD in the meta-analysis [97]. In the prospective cohort study of 137 patients with Parkinson's disease, circulating 25 hydroxivitamin D levels were deficient in one-half of the patients [98]. GC polymorphism was associated with 25(OH)D levels-TT carriers for GC1, and AA carriers for GC2 had lower vitamin D status. There was no significant association between GC polymorphism and 1,25(OH)D levels. SNPs of VDBP showed no significant association with the severity of PD. In another study of 382 patients and 242 controls in a Turkish cohort, only rs7041 was associated with PD risk [94]. Significantly higher levels of serum 25-hydroxyvitamin D was observed in the group of homozygous major allele carriers for rs2282679, rs3755967, and rs2298850 with slower progression of the disease. In the proteomic studies, a decreased level of VDBP in the CSF has been suggested to be a biomarker of disease [99].

All the information above in the Section 2 is gathered into summary table—Table 4.


**Table 4.** Summary of mentioned diseases in the Section 2. Most common VDBP polymorphisms with effects.


#### **Table 4.** *Cont.*


**Table 4.** *Cont.*

#### **3. Research Limitations on Vitamin D**

Vitamin D and its metabolites have some measurement limitations. Even if both serum and plasma can be used for vitamin D metabolite measurements, serum is preferred due to the fact that it is free from anticoagulants (heparin, EDTA, citrate). EDTA, heparin, and citrate may interfere with measurements [100]. The second problem in measuring is the stability of vitamin D metabolites because metabolites are only stable due to the binding to VDBP and proper storing: room temperature, 4 ◦C, or frozen. In case of separation from VDBP, storing at −70 ◦C is required [101]. Seasonal variation of vitamin D also needs to be considered as vitamin D biosynthesis is sun-dependent and depends on geographical locations and seasons (highest levels of vitamin D during the summer, lowest during the winter) [102]. This information may provide many significant differences in studies concerning vitamin D and its metabolites and its connection to polymorphisms occurring in the vitamin D pathway. Genetic variants of VDBP [103], DHCR7 [103], CYP2R1 [104], CYP24A1 [104], VDR [104], CYP3A4 [105], CYP2R1 [105], CYP27B [105], and LRP2 [105] were found to be associated with 25(OH)D levels. There are still many polymorphisms that have not yet been included in any studies.

Other important factors are age, sex, BMI, and lifestyle. Age can play a role, mostly in the group of elderly people >75 years old, due to reduced calcium absorption, intestinal resistance of calcium absorption to circulating 1,25(OH)2D, decreased ability of skin in vitamin D producing, deficiency of vitamin D substrate [106], and less sunshine exposure [107]. BMI is also considered to increase with age. Vitamin D deficiency is prevalent in a group of obese people which suggests the correlation of higher levels of adipose tissue and lower levels of vitamin D status [108]. Skin color also plays a role in vitamin D levels, as darker skin tones protect from UVB irradiation and, consequently, increase the risk of vitamin D deficiency. In addition, darker-skinned individuals have slower vitamin D synthesis [100].

The liver and kidneys are the two most important organs in the metabolism of vitamin D. Decreased kidney and liver functions may provide calcitriol deficiency and disruptions in overall vitamin D catabolism [100]. Diseases associated with these organs and their relationship with vitamin D and its metabolic pathway still require more research.

We suggest that all experiments that include vitamin D and its metabolism should contain more specific information concerning the studied group, age, sex, and geographical location. Articles without such information may lead to misleading conclusions.

#### **4. Conclusions**

Vitamin D and vitamin D binding protein have an undeniable impact on human health. Polymorphisms occurring in the VDBP gene can be a significant risk factor or prevalence factor in many diseases associated with obesity, diabetes, PCOS, MD, or PD. Many studies have shown VDBP and vitamin D level as a biomarker in many diseases, and if that is so, knowing the role of SNPs in proteins may contribute to finding new approaches for many syndromes and diseases associated with the vitamin D metabolic pathway.

In bone density, low levels of VDBP were associated with lowering BMD in the rs7041 "G" allele., while rs4588 "A" was linked to lower 25(OH)D levels and increased bone fracture risk. In obesity, the rs4588 "C" allele was linked to lower 25(OH)D levels and higher BMI scores among females. Rs4588 "T" and rs7041 "C" alleles increased risk in developing PCOS among women who had vitamin D deficiency. In COPD, the rs7041 "T" allele was associated with a higher risk of disease.

We believe that VDBP polymorphisms may affect the levels of vitamin D metabolites and thus contribute to the development of certain diseases. However, we are aware that most studies do not have a specific time of the year to measure vitamin D accurate metabolite levels, which can be misleading due to its seasonal nature. Research to date, although linking some diseases with vitamin D deficiency or VDBP polymorphisms, is not sufficient if the underlying mechanism is not elucidated. For this purpose, further research is required regarding the possible influence of SNP polymorphisms, their modifications in the structure of the binding protein, and their influence on the organism.

**Author Contributions:** Conceptualization, A.C. and D.R.; investigation, J.P., D.R. and K.A.; writing D.R., K.A. and J.P.; writing—review and editing, D.R. and A.C.; visualization, D.R.; supervision, A.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** Figure 1 was created thanks to Natalia Kordulewska and biorender.com.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Review* **Genetic Variants of the NF-**κ**B Pathway: Unraveling the Genetic Architecture of Psoriatic Disease**

**Rubén Queiro 1,2,\* , Pablo Coto <sup>3</sup> , Leire González-Lara <sup>4</sup> and Eliecer Coto 2,5**


**Abstract:** Psoriasis is a multifactorial genetic disease for which the genetic factors explain about 70% of disease susceptibility. Up to 30–40% of psoriasis patients develop psoriatic arthritis (PsA). However, PsA can be considered as a "disease within a disease", since in most cases psoriasis is already present when joint complaints begin. This has made studies that attempt to unravel the genetic basis for both components of psoriatic disease enormously difficult. Psoriatic disease is also accompanied by a high burden of comorbid conditions, mainly of the cardiometabolic type. It is currently unclear whether these comorbidities and psoriatic disease have a shared genetic basis or not. The nuclear factor of kappa light chain enhancer of activated B cells (NF-κB) is a transcription factor that regulates a plethora of genes in response to infection, inflammation, and a wide variety of stimuli on several cell types. This mini-review is focused on recent findings that highlight the importance of this pathway both in the susceptibility and in the determinism of some features of psoriatic disease. We also briefly review the importance of genetic variants of this pathway as biomarkers of pharmacological response. All the above may help to better understand the etiopathogenesis of this complex entity.

**Keywords:** psoriasis; psoriatic arthritis; NF-κB; comorbidities; genetic architecture

#### **1. Introduction**

Psoriasis is a chronic immune-mediated inflammatory dermopathy that affects 2–3% of the general population. Its most common companion is psoriatic arthritis (PsA), a chronic arthritis included under the spondyloarthritis concept that affects one-third of psoriasis patients [1]. According to recent estimates, PsA can affect almost 0.6% of the adult population [2]. Both psoriasis and PsA are the main poles of what is now regarded as psoriatic disease, a systemic nature entity, where apart from the skin and musculoskeletal condition, there are a wide variety of comorbidities, the most relevant being those of cardiometabolic type [1,3]. In fact, we know of the tight connections between the inflammatory burden of psoriatic disease and higher cardiovascular risk [1,4].

Psoriatic disease itself, as well as its pleomorphic clinical manifestations, result from complex stochastic interactions between elements of genetic predisposition, immunopathological alterations, and environmental factors [1]. In the case of PsA, it is currently thought that the genetic substrate of the disease favors certain alterations of the gut microbiome of these patients, which gives rise to an IL17-type immune response, which have the potential capacity to migrate to distant sites such as entheses and joints, where the activation of certain lineages of innate cellularity (ILC-3 and γδ T cells), responsible for the beginning

**Citation:** Queiro, R.; Coto, P.; González-Lara, L.; Coto, E. Genetic Variants of the NF-κB Pathway: Unraveling the Genetic Architecture of Psoriatic Disease. *Int. J. Mol. Sci.* **2021**, *22*, 13004. https://doi.org/ 10.3390/ijms222313004

Academic Editor: Elixabet Lopez-Lopez

Received: 14 November 2021 Accepted: 29 November 2021 Published: 30 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

of the manifestations of the disease, would prevail (gut–joint axis theory) [1,5]. Adaptive immunity would act, in this case, as positive feedback for the primal responses of innate immunity [1,5].

Psoriasis is a multifactorial genetic disease for which the genetic factors explain about 70% of disease susceptibility [1,6,7]. However, PsA can be considered as a "disease within a disease", since in most cases psoriasis is already present when arthritis begins [1]. This has made studies that attempt to unravel the genetic basis for both components of psoriatic disease enormously difficult. Moreover, this information is essential as it would help predict which psoriasis patients are at increased risk of developing arthritis over time. Genetic factors are evident from the high prevalence of PsA among first-degree relatives of PsA probands, and a recurrence risk ratio of 30–35 [1,6,7]. Also, the psoriatic disease concordance rate for monozygotic twins is higher compared to that of dizygotic twins [1]. Like psoriasis, both dominant and recessive inheritance, as well as an excessive paternal transmission pattern, has been proposed for PsA but neither apply, thus PsA has also been considered a multifactorial polygenetic disease [1].

Genome wide association studies (GWAS) have revealed more than 80 risk loci associated with psoriatic disease that are basically incorporated into three fundamental networks: (i) genes involved in the maintenance of the cutaneous barrier function, (ii) genes that control innate immune responses mediated by NF-κB and interferon signaling, (iii) genes that control adaptive immune responses that involves CD8 lymphocytes and Th17 signaling [1,7]. However, most of them are related to psoriasis risk, while only a few seem specific to PsA. Genes with genome-wide significance for PsA include *HLA-B/C, HLA-B, IL12B, IL23R, TNP1, TRAF3IP3*, and *REL* [1,7]. Although recent GWAS studies have identified numerous risk loci of psoriatic disease, almost 50% of the heritability of the disease remains unknown, and most of the genes identified so far have a small weight in the genetic etiology of the disease. This "lost" heritability can be attributed, among others, to common variants that have a very small weight in risk, copy number variants, epigenetic or epistatic interactions, or lack of power of current detection tools [1,7].

As we have commented so far, the concept of psoriatic disease not only refers to the skin or musculoskeletal domains of the disease, but also to the wide variety of comorbidities that accompany it [1,3]. Therefore, it would be interesting to know if this complex network of manifestations presents common or different genetic restriction elements. This knowledge could be key in the future to lay the foundations of truly personalized medicine for these patients.

This mini-review will focus on the importance of the NF-κB pathway, both in the predisposition to psoriatic disease, as well as in the differentiation of the genetic architecture of its different components, including the determinism of some of its comorbidities.

#### **2. The NF-**κ**B Signaling Pathway**

The nuclear factor of kappa light chain enhancer of activated B cells (NF-κB) is a transcription factor that regulates a plethora of genes in response to infection, inflammation, and a wide variety of stimuli on immune cells [8,9]. This transcription factor represents a family of structurally related proteins (p100 or NFκB2, p105 or NFκB1, p65 or RelA, RelB, or c-Rel), which exist as homo- or heterodimers [8,9]. Effects of NF-κB are mediated through three pathways. In the canonical pathway, (i) phosphorylation of inhibitory IκB proteins (IkBα) leads to release of NF-κB and its nuclear translocation to promote inflammation and cell survival. The p105 pathway (ii) is dependent on phosphorylation of p105 proteins, leading to nuclear translocation of p52 heterodimer complexes to promote inflammation. Different to the above two pathways, the alternate p100 pathway (iii) does not depend on the NF-κB essential modulator (NEMO)-IKKa-IKKb (NEMO-IKK) complex for phosphorylation, but rather on NF-κB inducing kinase (NIK), and IKKa heterodimers phosphorylate p100 and allows nuclear translocation of p52/RelB heterodimers [8,9].

The pathogenetic role of NF-κB has been demonstrated in many diseases including cancer, immune-mediated inflammatory diseases (IMIDs), as well as cardiovascular

diseases [9]. Genome-wide association studies have revealed several psoriatic disease susceptibility genes associated with the NF-κB pathway [10]. In addition, the role of NF-κB in psoriasis is supported by studies that reported its differential expression in normal vs. affected skin, but also by studies that showed that the inhibition of this transcriptional factor by several compounds ameliorated skin inflammation [11,12]. ceptibility genes associated with the NF-κB pathway [10]. In addition, the role of NF-κB in psoriasis is supported by studies that reported its differential expression in normal vs. affected skin, but also by studies that showed that the inhibition of this transcriptional factor by several compounds ameliorated skin inflammation [11,12]. Due to its relevance to this review, we will refer in detail to a specific pathway, which

The pathogenetic role of NF-κB has been demonstrated in many diseases including cancer, immune-mediated inflammatory diseases (IMIDs), as well as cardiovascular diseases [9]. Genome-wide association studies have revealed several psoriatic disease sus-

Due to its relevance to this review, we will refer in detail to a specific pathway, which involves TNF-α and the NF-κB transcription complex (Figure 1) with special focus on genetic variations in *CARD14*, *TNFSRF1B*, *NFKB1*, *NFKBIA*, and *NFKBIZ*. involves TNF-α and the NF-κB transcription complex (Figure 1) with special focus on genetic variations in *CARD14*, *TNFSRF1B*, *NFKB1*, *NFKBIA*, and *NFKBIZ*.

*Int. J. Mol. Sci.* **2021**, *22*, x FOR PEER REVIEW 3 of 11

**Figure 1.** The TNF-α/TNFRSF1A,B/NF-KB signaling pathway. TNF-α is a pro-inflammatory cytokine synthesized by various cell lineages, mainly macrophages, dendritic cells, keratinocytes as well as various T cell types. It exerts its functions through binding to type 1 (TNFR1) and 2 (TNFR2) cell membrane receptors. TNFR2 (TNFRSF1B) shows higher affinity for TNF-α. NF-κB is a dimer made up of Rel family proteins (p50, p52, Rel A/p65, cRel, and Rel B), the p50/p65 heterodimer being the most common. In the absence of TNF-α stimulation, the activity of NF-κB is regulated at cytoplasmic level by an inhibitory complex of proteins called IKB or Kappa Beta inhibitors (IkB alpha, beta, and epsilon) that regulate its function by blocking its translocation to the nucleus. The binding of TNFα to its receptors triggers the activation of the inhibitor kinases of NF-κB or "IKK complex" consisting of IKK-alpha, IKK-beta and IKK-gamma or NEMO. The characteristic event is the phosphorylation of IKB-alpha by the IKKa/b complex, inducing its ubiquitination and degradation by the proteasome; and thus, allowing the release of NF-κB. Once in the nucleus, NF-κB meets another inhibitory protein called IKBZ. Likewise, IKBZ can act as a transcription factor regulated by IL-17A, IL-1beta, and to a lesser extent, by TNF-α. In turn, IKBZ plays an important role in the development and expansion of Th17 lineages. Finally, CARD14 activates NF-κB by partially known mechanisms (see text for more details), although some authors suggest that CARD14 would activate the IKK complex, consequently regulating the activity of NF-κB, leading to an increase in its transcriptional activity. TNF-α: tumor necrosis factor-alpha. TNFR1 (TNFRSF1A): TNF-α type 1 receptor. TNFR2 (TNFRSF1B): TNF-α type 2 receptor. NF-κB: nuclear factor of kappa light chain enhancer of activated B cells. IKB: kappa-beta inhibitors. IKK: inhibitor kinases of NF-κB. IKBZ: z-inhibitor protein of NF-κB. IL: interleukin. Th: T-helper. CARD14: caspase recruitment domain family member 14. Activation of the NF-κB pathway leads to increased transcription of numerous genes including proinflammatory cytokines, chemokines, and growth factors, all of which are **Figure 1.** The TNF-α/TNFRSF1A,B/NF-KB signaling pathway. TNF-α is a pro-inflammatory cytokine synthesized by various cell lineages, mainly macrophages, dendritic cells, keratinocytes as well as various T cell types. It exerts its functions through binding to type 1 (TNFR1) and 2 (TNFR2) cell membrane receptors. TNFR2 (TNFRSF1B) shows higher affinity for TNF-α. NF-κB is a dimer made up of Rel family proteins (p50, p52, Rel A/p65, cRel, and Rel B), the p50/p65 heterodimer being the most common. In the absence of TNF-α stimulation, the activity of NF-κB is regulated at cytoplasmic level by an inhibitory complex of proteins called IKB or Kappa Beta inhibitors (IkB alpha, beta, and epsilon) that regulate its function by blocking its translocation to the nucleus. The binding of TNF-α to its receptors triggers the activation of the inhibitor kinases of NF-κB or "IKK complex" consisting of IKK-alpha, IKK-beta and IKK-gamma or NEMO. The characteristic event is the phosphorylation of IKB-alpha by the IKKa/b complex, inducing its ubiquitination and degradation by the proteasome; and thus, allowing the release of NF-κB. Once in the nucleus, NFκB meets another inhibitory protein called IKBZ. Likewise, IKBZ can act as a transcription factor regulated by IL-17A, IL-1beta, and to a lesser extent, by TNF-α. In turn, IKBZ plays an important role in the development and expansion of Th17 lineages. Finally, CARD14 activates NF-κB by partially known mechanisms (see text for more details), although some authors suggest that CARD14 would activate the IKK complex, consequently regulating the activity of NF-κB, leading to an increase in its transcriptional activity. TNF-α: tumor necrosis factor-alpha. TNFR1 (TNFRSF1A): TNF-α type 1 receptor. TNFR2 (TNFRSF1B): TNF-α type 2 receptor. NF-κB: nuclear factor of kappa light chain enhancer of activated B cells. IKB: kappa-beta inhibitors. IKK: inhibitor kinases of NF-κB. IKBZ: z-inhibitor protein of NF-κB. IL: interleukin. Th: T-helper. CARD14: caspase recruitment domain family member 14.

involved in the onset and perpetuation of the inflammatory response in psoriatic disease [11,12]. Interestingly, the activation of NF-κB induces the production of TNF-α, which in Activation of the NF-κB pathway leads to increased transcription of numerous genes including proinflammatory cytokines, chemokines, and growth factors, all of which are involved in the onset and perpetuation of the inflammatory response in psoriatic disease [11,12]. Interestingly, the activation of NF-κB induces the production of TNF-α, which

in turn can stimulate this pathway, thus resulting in positive proinflammatory feedback, and playing a key role in the chronicity of the skin/joint inflammatory response [11,12]. Moreover, an association between high levels of TNF-α and the activation of NF-κB has been found in the skin of patients with psoriasis; and therefore, a possible mechanism of action of anti-TNFα drugs would be, at least in part, the inhibition of the transcriptional activity of NF-κB [13], as we will see later.

#### **3. Role in the Etiology of Psoriatic Disease**

The caspase recruitment domain family member 14 (CARD14) is an intracellular scaffold protein that is prominently expressed in keratinocytes and mediates NF-κB activation through the formation of a CBM (CARD14-BCL10-MALT1) signaling complex [14]. It has been shown that gain-of-function mutations of *CARD14* enhance CBM complex formation in keratinocytes driving hyperactivation of the NF-κB pathway [15]. This leads to transcription of several chemokines (CCL20, CXCL1, and CXCL2), cytokines (IL-36 and IL-19), and antimicrobial peptides, followed by recruitment and activation of neutrophils, dendritic cells, and T cells [15]. Activated dendritic cells in turn release IL-23, which drives the downstream expression of IL-17 and IL-22, two central cytokines in the pathogenesis of the disease [15]. Variations in the *CARD14* gene have been associated with psoriatic disease. Jordan et al. identified 15 nucleotide variants in the *CARD14* gene significantly associated with the risk of psoriasis [16]. In a meta-analysis, they confirmed the association between rsll652075 (p. Arg820Trp) and psoriasis [17]. This association was later confirmed by two large meta-analysis [18,19]. We also found a statistically significant association between rsll652075 (CC genotype) and the risk of developing psoriasis (OR 1.59), regardless of the Cw6 genotype. However, we did not find genotypic differences related to psoriasis severity, age at disease onset (above or below 40 years), PsA, or family history of psoriasis [20].

The *TNFSRF1B* gene encodes the type 2 membrane receptor for TNF-α. The SNP TN-FRSF1B rs1061622 (p. Met196Arg) has been implicated in the risk of several IMIDs [21,22]. This variant has hardly been studied in psoriatic disease. In our population, we did not find differences in allelic and genotypic frequencies between patients and controls for the SNP rs1061622. Neither were differences found in psoriasis severity, nor between the presence or absence of PsA. However, we did find an increased frequency of the G allele in the Cw6 + patients (OR 1.69), which suggests an increased risk of developing psoriasis conditioned by the HLA-Cw6 allele [23].

The *NFKB1* gene encodes the p50 protein, responsible for binding to the consensus sequence of the promotor region of many genes [8,9]. The most studied and best characterized polymorphism of this gene is a 4 nucleotide indel in the promoter region (rs28362491; -94 of the ATTG). The copies of the gene with the deletion would have less promoter activity and, therefore, translate into lower protein levels, which could explain the genetic associations described for this variant [24]. Other variants located at the chromosomal region where *NFKB1* is found have been investigated and related to the risk of psoriasis, specifically rs1020760 and rs1609798 [25]. A study in a Han Chinese population has recently established the association between the SNP rs1020760, psoriasis, and a positive family history of psoriasis [26]. We studied the SNP rs230526 A/G in the *NFKB1* gene, which is in complete linkage disequilibrium (LD) with rs28362491 (-94 of ATTG). Therefore, the genotypes of rs230526 would allow us to infer those of rs28362491. Despite this, we did not find any difference between allelic or genotypic frequencies in patients and controls. We also did not see a significant effect on its severity, PsA, or the age of onset of psoriasis; nor differences in relation to the presence of HLA-Cw6 [27].

*NFKBIA* gene encodes IκBα, a cytoplasmic inhibitor of the NF-κB transcriptional complex. That is, its binding to the complex keeps it inactive in the cytoplasm by blocking its ability to move to the cell nucleus. In response to certain stimuli, IκBα is phosphorylated, and this releases it from the complex, which can now migrate to the nucleus and act as a multigenic transcription factor [8,9]. Variants in this gene have been associated with the risk of myocardial infarction and several IMIDs [28,29]. In 2010, two GWAS established

the association of two *NFKBIA* SNPs and the risk of psoriasis, while in 2015 a third GWAS confirmed the association of an additional SNP with psoriasis [30–32]. We have studied the rs7152376 SNP, which is in complete LD with rs12883343, a SNP related in the literature with both psoriasis and PsA. We did not find any association between rs7152376 and the risk of psoriasis, psoriasis severity, or the age at disease onset. However, we did find a statistically significant association between the rare G allele and the risk of PsA, with a dominant effect (AG + GG vs. AA). The rare NFKBIA rs7152376 C was significantly more frequent in PsA vs. controls (OR 2.03). The difference was even higher between PsA vs. "pure" psoriasis (OR 3.2) [33]. These findings are in line with the results of a meta-analysis that described the association of rs12883343, in complete LD with rs7152376, and the risk of PsA [34]. In this meta-analysis, the rare allele was also more frequent in PsA than in controls (meta-OR 1.22). Therefore, this variation in *NFKBIA* could be a true risk marker for PsA within the psoriatic complex.

The *NFKBIZ* gene encodes the IKBζ protein. The activity of NF-κB is regulated by nuclear inhibitors, such as IKBζ, which block its binding to DNA [8,9]. On the other hand, these peptides have a dual effect since by themselves they can act as transcription factors activating the expression of genes regulated by IL-17 or TNFα [35]. In 2015, Tsoi et al. described the genetic association between the *NFKBIZ* SNP rs7637230 and psoriasis [36]. This SNP is located outside the gene at the 3 'end, so it could be an indirect marker of some variant in *NFKBIZ* associated with its expression and/or function. The best candidate described so far would be an indel-type polymorphism in intron 10, rs3217713. Given its proximity to the 5 'end, it could influence the processing of total RNA into messenger RNA, an aspect that should be further investigated in vitro. We have investigated this indel SNP in psoriatic disease. When studying rs3217713, we did not find significant differences between patients and controls, and between the different clinical groups of patients, except for Cw6. Interestingly, the risk of developing psoriasis conferred by the combination of Cw6 + and rs3217713 ins/ins was statistically higher (OR 3.61) than the risk conferred by each genotype separately, which suggests some mechanism of genetic interaction or epistasis between both loci [37].

#### **4. NF-**κ**B Pathway Variants and Drug Response**

Due to the various levels of regulation, the NF-κB signaling pathway can be potentially targeted at various levels including kinases, phosphatases, ubiquitination, nuclear translocation, DNA binding, protein acetyl transferases, and methyl transferases [38]. Currently available drugs for the treatment of psoriasis and PsA interact in some way with these targets within the NF-kB signaling pathway. For example, NSAIDs are commonly used in the treatment of PsA and have been shown to inhibit the activity of IKKβ, preventing the degradation of IκB and blocking NF-κB nuclear translocation. Glucocorticoids are highly effective in the topical treatment of active psoriatic lesions but also as systemic agents in PsA and have been shown to inhibit NF-κB through both indirect and direct mechanisms. Indirectly, glucocorticoids induce the transcription and synthesis of IκBα, enhancing the retention of NF-κB in the cytoplasm and effectively inhibiting its activation. However, under certain conditions, glucocorticoids can directly inhibit activated NF-κB via competition between p65 and the glucocorticoid receptor for limited nuclear supplies of coactivator proteins. Both receptors require these coactivators for maximal activity, and by sequestering the available stores, glucocorticoids interfere with p65-dependent gene activation. Stimulation of human synovial cells with TNFα results in NF-κB activation and subsequent cell proliferation, with NF-κB blockade able to activity inhibit this TNFαinduced proliferation. Anti-TNFα biologics effectively remove these upstream activator stimuli, and as such, the inhibition of NF-κB activation can be considered part of their mechanism of action [38]. Secukinumab, an anti–IL-17A antibody, mediates some of its antipsoriatic effects by rapidly inhibiting IκBζ and subsequently IκBζ signature genes, which highly suggests that IL-17A/IκBζ signaling is a key driver of the complex psoriatic phenotype. These data strongly indicate that an important and very early mechanism of

action of anti–IL-17A therapy in patients with psoriasis is a reduction in IκBζ expression and a concomitant reduction in expression of IκBζ signature genes. It is possible that the IL-17A/IκBζ signaling axis also plays a role in the pathogenesis of psoriatic arthritis [39].

The search for genetic markers of good/bad response to drugs is a booming field, but still in its infancy. In this case, we will focus on the associations between NF-κB pathway variants and the response to anti-TNFα drugs. Anti-TNFα blocking agents could negatively modulate the signaling of the NF-κB pathway [13]. Genetic variations in the components of this pathway, particularly those that are related to the risk of psoriatic disease, could serve as pharmacogenetic markers. With respect to *CARD14* gene, we did find a significant association between rsll652075 (CC genotype) and the response to anti-TNFα drugs measured as PASI 75 at 24 weeks (OR 3.71) [40]. In our study cohorts, carriers of the G allele (GT + GG genotypes) of rs1061622 *TNFRSF1B* were associated with a worse response to anti-TNFα drugs (OR 2.34) [23]. However, the literature shows disparate results both in psoriatic disease and in other IMIDs, in this regard [41]. In our experience with anti-TNFα drugs, we did not observe any association between the genetic variants in *NFKB1*, *NFKBIA*, and drug response. However, in the case of *NFKBIZ*, the insertion allele was found more frequently in the group of non-responders, with a recessive effect (II vs. ID + DD *p* = 0.02), and OR for the negative drug response of 3.01 (95%CI: 1.15–7.88) [42]. However, the results of the literature in this regard are also disparate [43]. Anyway, it must be considered that this insertion allele is associated with the risk of psoriasis conditioned by the presence of HLA-Cw6, and the latter has also been linked to drug response in this disease [44].

#### **5. NF-**κ**B Genetic Variants and Cardiometabolic Comorbidity**

Psoriatic disease is accompanied by extracutaneous and extra-articular manifestations, as well as by a wide range of comorbidities. Compared with controls, patients with psoriatic disease have a higher incidence rate of other autoimmune diseases, cardiovascular disease, obesity/overweight, depression, anxiety, smoking, cancer, diabetes, alcohol abuse, osteoporosis, uveitis, and fatty liver disease [45]. One of the common connectors between the disease and its comorbidities is inflammation. For example, chronic inflammation in both psoriatic disease and atherosclerosis promotes increased production of adipokines and pro-inflammatory cytokines (e.g., TNF) with consequent insulin resistance and endothelial dysfunction. Also, data from animal models and human studies highlight the proatherogenic role of IL-17 on vascular inflammation, acting in synergy with TNF and IL-6. IL-17 is involved in endothelial dysfunction, hypertension, plaque progression and destabilization, stroke, and myocardial infarction. IL-17 also links inflammation with insulin resistance and adipocytes dysfunction (two common aspects of the so-called psoriatic metainflammation). In mice models with systemic inflammation and insulin resistance, there is accumulation of IL-6/IL-17 co-expressing T cells in the adipose tissue; these cells enhance leptin production, which eventually acts in synergy with IL-6 and IL-17 to promote Th17 differentiation. This complex network drives local and systemic insulin resistance, and, in fact, IL-17 neutralization improves glucose uptake. Another example of these relationships is seen in osteoporosis (OP), another common comorbidity among psoriatic patients. Cytokines involved in PsA pathogenesis have an effect on bone cell activity with possible inhibitory or stimulatory stimuli on osteoclasts and osteoblasts. TNF, IL-6, and IL-1, for example, exert stimulatory activity toward osteoclasts and inhibitory activity toward osteoblasts. IL-12 and IL-23 are also critical to inflammation-induced bone resorption. Specifically, IL-23 upregulates the receptor activator of nuclear factor kappa- B (RANK) on preosteoclasts and induces Th17 cells to produce IL-17. IL-17, one of PsA signature cytokines, promotes bone resorption via RANK ligand upregulation. Indeed, Th17 cells have been found to be highly increased in blood and tissues of patients with OP [45].

In any case, for the purposes of this review, we will focus on the most important comorbidity in these patients, such as cardiovascular disease. Although psoriatic disease is frequently accompanied by cardiovascular comorbidity, the potential genetic connections

between this comorbidity and psoriatic disease have received little attention. Furthermore, we do not know whether the genetic pathways to both cardiovascular comorbidity and psoriatic disease are common or differ. Although some studies suggest shared genes between psoriasis and cardiometabolic comorbidity, others suggest that the genetic architecture of psoriatic disease and cardiometabolic traits are largely distinct [46,47]. The NF-κB pathway might play a role in the pathogenesis of renal disease and type 2 diabetes mellitus (T2DM). In that sense, we studied 487 individuals, all Caucasian and aged 65–85 years. A total of 104 (21%) had impaired renal function and 30% were diabetics. The NFKB1 variants were significantly associated with T2DM: rs7667496 (OR 1.68) and rs28362491 (OR 1.67). There was a trend toward the association of these variants with impaired renal function. The two NFKB1 variants were in LD, and homozygous for the two non-risk alleles (rs7667496 CC + rs28362491 II), were significantly more common in the non-diabetics [48]. Therefore, *NFKB1* gene variations may be related to T2DM and an impaired renal function, both aspects clearly present in psoriatic patients [1,3].

Several reports show that the transcription factor NF-κB also activates genes involved in various cardiovascular diseases, in the pathogenesis of cardiac remodeling, and heart failure [49]. NF-κB is activated in the heart in many conditions: during acute ischemia and reperfusion, during unstable angina or in response to preconditioning [49]. An association between inflammation and cardiovascular risk has been suggested by the evidence that inflammatory cytokines, including IL-1β, IL-6, IL-18, and TNF-α, are increased in patients with heart failure, and increased inflammatory markers, such as C-reactive protein, predict a worse survival during acute coronary syndromes [49]. The association between psoriatic disease and cardiovascular risk has been clearly established over the last decade [1,3,4]. We determined whether common variants in three NF-κB genes were associated with early-onset coronary artery disease (CAD). Allele and genotype frequencies for the NFKB1 rs28362491 (-94 delATTG) and NFKBIA rs8904 were not significantly different between patients and controls. For the NFKBIZ rs3217713, the deletion allele was significantly more frequent in early-onset CAD patients. Deletion-carriers were more frequent in CAD (OR 1.48). Therefore, NFKBIZ variant was an independent risk factor for developing early-onset CAD [50]. However, in a recent meta-analysis of 13 case-control studies with 17 individual cohorts containing 9378 cases and 10,738 controls, the mutant D allele in NFKB1 rs28362491 locus increased the risk of CAD [51].

Table 1 summarizes the main results of this review.


**Table 1.** Genetic variants within the TNF-α/NF-κB pathway and psoriatic disease features.

**Table 1.** *Cont.*


\* Other variants located at the chromosomal region where *NFKB1* is found have been investigated and related to psoriasis risk, specifically rs1020760 (also associated with a positive family history) and rs1609798 [25,26]. LD: linkage disequilibrium. IMIDs: immune-mediated inflammatory diseases. CARD14: caspase recruitment domain family member 14 gene. TNFSRF1B: Type 2-TNFα receptor gene. NFKB1: NFκB subunit 1 gene. NFKBIA: NFκB alpha-inhibitor gene. NFKBIZ: NFκB zeta-inhibitor gene.

Obesity, the core element of metabolic syndrome (MetS), is common among patients with psoriatic disease [1,3]. The NF-κB pathway is involved in the low-grade chronic inflammation associated with obesity. The noncanonical IkB kinase ε and TBK1 are upregulated by overnutrition and may therefore be suitable potential therapeutic targets for MetS [52]. Also, NF-κB modulates apical components of metabolic processes including metabolic hormones such as insulin and glucagon, and therefore this pathway is also involved in both insulin resistance and sensitivity, another key component of psoriatic disease-associated MetS [3,49]. The importance between metabolic derangements and NFκB-mediated inflammation associated with psoriatic disease is already moving to the practical realm. Thus, it has recently been set that obese, diabetic, and hypertensive patients with PsA tend to present a better response and persistence to secukinumab, an anti-IL17A therapy [53].

#### **6. Conclusions**

The NF-κB is an essential regulator of inflammatory and metabolic processes associated with psoriatic disease. Genetic variants of this pathway are associated not only with the risk of suffering the condition but may also be at the base of the comorbidities that frequently accompany it. These variants can also help improve treatment selection. In recent years, genetic variant analysis of the NF-κB pathway has contributed to a finer dissection of the genetic architecture of psoriatic disease. However, we still need further analysis of these genetic variants, especially with an emphasis on non-canonical regulators of this pathway, which currently have a direct implication in the appearance of important comorbidities of psoriatic disease such as diabetes and obesity. Thus, for example, an IKK/TBK1 inhibitor has been tested in obese individuals with type 2 diabetes showing encouraging results [52]. These findings may have future implications for a disease approach focused on the needs of each psoriatic patient.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

