*Article* **Set-Based Rare Variant Expression Quantitative Trait Loci in Blood and Brain from Alzheimer Disease Study Participants**

**Devanshi Patel 1,2, Xiaoling Zhang <sup>2</sup> , John J. Farrell <sup>2</sup> , Kathryn L. Lunetta <sup>3</sup> and Lindsay A. Farrer 1,2,3,4,5,6,\***

	- Boston, MA 02118, USA; zhangxl@bu.edu (X.Z.); farrell@bu.edu (J.J.F.)

**Abstract:** Because studies of rare variant effects on gene expression have limited power, we investigated set-based methods to identify rare expression quantitative trait loci (eQTL) related to Alzheimer disease (AD). Gene-level and pathway-level cis rare-eQTL mapping was performed genome-wide using gene expression data derived from blood donated by 713 Alzheimer's Disease Neuroimaging Initiative participants and from brain tissues donated by 475 Religious Orders Study/Memory and Aging Project participants. The association of gene or pathway expression with a set of all cis potentially regulatory low-frequency and rare variants within 1 Mb of genes was evaluated using SKAT-O. A total of 65 genes expressed in the brain were significant targets for rare expression single nucleotide polymorphisms (eSNPs) among which 17% (11/65) included established AD genes *HLA-DRB1* and *HLA-DRB5*. In the blood, 307 genes were significant targets for rare eSNPs. In the blood and the brain, *GNMT*, *LDHC*, *RBPMS2*, *DUS2*, and *HP* were targets for significant eSNPs. Pathway enrichment analysis revealed significant pathways in the brain (*n* = 9) and blood (*n* = 16). Pathways for apoptosis signaling, cholecystokinin receptor (CCKR) signaling, and inflammation mediated by chemokine and cytokine signaling were common to both tissues. Significant rare eQTLs in inflammation pathways included five genes in the blood (*ALOX5AP*, *CXCR2*, *FPR2*, *GRB2*, *IFNAR1*) that were previously linked to AD. This study identified several significant gene- and pathway-level rare eQTLs, which further confirmed the importance of the immune system and inflammation in AD and highlighted the advantages of using a set-based eQTL approach for evaluating the effect of low-frequency and rare variants on gene expression.

**Keywords:** Alzheimer disease; expression quantitative trait loci (eQTL); rare variants; set-based eQTL; SKAT-O; pathways; immune system; inflammation; ROSMAP; ADNI

#### **1. Introduction**

Late-onset Alzheimer disease (AD) is the most common type of dementia that affects an estimated 5.7 million individuals aged 65 years and older in the United States, with the number projected to rise to 14 million by 2050 [1]. AD is highly heritable (h2 = 58–79%) [2], but common variants explain only one-third of the genetic portion of AD risk [2]. Highly penetrant rare variants may account for some of the missing heritability [3]. Whole-exome sequencing studies have identified robust AD associations with rare missense variants in *TREM2, AKAP9, UNC5C, ZNF655, IGHG3, CASP7* and *NOTCH3* [4–9], and it is expected that more AD-related rare variants will be identified by whole-genome sequencing (WGS) studies, because some rare variants, including those in non-coding regions, likely contribute

**Citation:** Patel, D.; Zhang, X.; Farrell, J.J.; Lunetta, K.L.; Farrer, L.A. Set-Based Rare Variant Expression Quantitative Trait Loci in Blood and Brain from Alzheimer Disease Study Participants. *Genes* **2021**, *12*, 419. https://doi.org/10.3390/ genes12030419

Academic Editor: Laura Ibanez

Received: 12 February 2021 Accepted: 10 March 2021 Published: 15 March 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

<sup>1</sup> Bioinformatics Graduate Program, Boston University, Boston, MA 02215, USA; dpatel@bu.edu

to AD risk. However, identification of genes that are impacted by these rare variants, and thus likely have a functional role in AD, remains challenging.

Some AD risk variants are associated with gene expression, as demonstrated by recent expression quantitative trait locus (eQTL) studies [10,11]. Rare variants may contribute to extreme gene expression within a single tissue or across multiple tissues [12–15]. However, genome-wide studies of rare eQTLs are generally underpowered to obtain significant results. Although gene-based tests, which test the aggregate effects of multiple variants, are commonly used to evaluate the association of a disease with rare variants, only a few studies have applied this approach to the analysis of rare eQTLs. Several eQTL studies employed set-based approaches including testing gene expression with multiple single nucleotide polymorphisms (SNPs) chosen by variable selection [16,17] using a gene-based partial least-squares method to correlate multiple gene transcript probes with multiple SNPs [18], and identifying variants associated with transcript and protein modules [19]. These applications were not focused on rare variants, but still afforded higher power with a potential to find significant associations with low-frequency variants.

Few studies have applied a set-based eQTL method for rare variants. Recently, Lutz et al. applied burden and set-based (sequence) kernel association (SKAT) tests to normalize read counts in RNA-sequence (RNA-seq) studies [20]. In this study, we performed a gene-based cis-eQTL analysis using expression data derived from human blood and brain tissue to identify genes that contain a set of potentially regulatory low-frequency and rare variants (minor allele frequency (MAF) < 0.05) that are significantly associated with their expression. Although this design focused on rare variants, and thus has low power to detect expression differences between AD cases and controls, the set-based method can potentially discriminate AD-related targets among a group of genes located within 1 Mb from the expression single nucleotide polymorphisms (eSNPs) that were previously associated with the risk of AD. We also applied a pathway-based approach to determine which genes contribute most to the overall gene expression profile of a significant pathway containing a set of co-expressed functionally related genes.

#### **2. Materials and Methods**

#### *2.1. Study Cohorts*

The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a multisite longitudinal study that began enrolling subjects in 2004, and includes persons with AD, mild cognitive impairment (MCI), and normal cognitive functioning [21]. Affymetrix Human Genome U219 array gene expression data derived from whole blood, whole-genome sequence (WGS) data, and phenotype data were downloaded from a public-access database (http: //www.loni.usc.edu (accessed on 11 December 2018)). The portion of the sample included in this study included 207 AD cases, 284 MCI cases, 194 controls, and 28 individuals with missing dementia status.

The Religious Orders Study (ROS)/Memory and Aging Project (MAP) also contributed to this research. ROS enrolled older nuns and priests from across the US without known dementia for a longitudinal clinical analysis and brain donation. MAP enrolled older subjects without dementia from retirement homes, who agreed to brain donation at the time of death [22,23]. RNA-sequence data, including gene expression information derived from dorsolateral prefrontal cortex area tissue donated by 475 participants (281 autopsyconfirmed AD cases and 194 controls), as well as WGS data included in this study, were obtained from the AMP-AD knowledge portal (https://www.synapse.org/#!Synapse: syn3219045 (accessed on 1 July 2018)) [24]. Characteristics of subjects from both cohorts are provided in Table 1.


**Table 1.** Characteristics of subjects in the Religious Orders Study/Memory and Aging Project (ROSMAP) and Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets.

NHW—non-Hispanic white, AA—African American. \* mean (standard deviation).

#### *2.2. Data Processing*

ADNI microarray gene expression data were normalized and log-transformed using limma [25]. ROSMAP RNA-seq data were normalized and then log-transformed using a previously described pipeline [26]. The log-transformed expression data were evaluated using surrogate variable analysis (SVA) [27] to obtain surrogate variables for global technical effects and hidden effects, which were included as covariates in the analysis models for eQTL discovery. Additional filtering steps of GWAS and gene expression data included eliminating 167 ROSMAP and 96 ADNI subjects with missing data (resulting in the sample sizes reported in Table 1), restricting gene expression data to protein-coding genes (12,971 genes in ROSMAP and 16,025 genes in ADNI), and selecting only bi-allelic low-frequent and rare variants (MAF ≤ 0.05) with a variant call rate of >95%.

#### *2.3. Functional Annotation of Variants*

Variants in the ADNI and ROSMAP WGS datasets were annotated using CADD v1.6 [28] and GWAVA v1.0 software [29]. Combined Annotation-Dependent Depletion (CADD) scores prioritize functional, deleterious, and disease-causal coding and noncoding variants by integrating multiple annotations into one score by contrasting variants that survived natural selection with simulated mutations [28]. A scaled CADD score of 10 or greater indicates a raw score in the top 10% of all possible reference genome single nucleotide variants (SNVs), and a score of 20 or greater indicates a raw score in the top 1% [28]. Genome-Wide Annotation of Variants (GWAVA) scores predict the functional impact of non-coding genetic variants based on annotations of non-coding elements and genome-wide properties, such as evolutionary conservation and GC-content, in the range of 0–1 with mutations scored >0.5 identified as "functional" and those scored ≤0.5 as "non-functional" [29]. Genomic coordinates of variants in the ADNI dataset that were established using genome build GRCh38 were converted to build hg19 using liftOver software (https://genome.ucsc.edu/cgi-bin/hgLiftOver (accessed on 3 November 2018)). Both ADNI and ROSMAP WGS variants were matched by chromosome, position, reference, and alternate alleles. Variants having a CADD score >15 or a GWAVA region score >0.5 were annotated as having a potential regulatory function.

#### *2.4. Set-Based eQTL Analysis*

The sequence of steps to identify set-based eQTLs in the blood and brain is shown in Figure 1.

**Figure 1.** Overview of set-based rare expression quantitative trait loci (eQTL) analysis. Gene-level tests were performed for each protein-coding gene using an aggregate of all potentially regulatory single nucleotide polymorphisms (SNPs) with minor allele frequency ≤0.05 within 1 Mb of each gene. Pathway-level analysis was carried out in two steps. First, the weighted gene co-expression network analysis (WCGNA) method was applied to identify co-expressed gene modules. Next, pathway enrichment analysis was conducted using the Protein Analysis Through Evolutionary Relationships (PAN-THER) tool to identify significantly enriched pathways in these gene modules, and pathway-level tests were then performed on each enriched pathway, including the aggregated SNPs for each gene in the module. Results were considered significant (*p* < 0.05) after applying a Bonferroni correction. **Figure 1.** Overview of set-based rare expression quantitative trait loci (eQTL) analysis. Gene-level tests were performed for each protein-coding gene using an aggregate of all potentially regulatory single nucleotide polymorphisms (SNPs) with minor allele frequency ≤0.05 within 1 Mb of each gene. Pathway-level analysis was carried out in two steps. First, the weighted gene co-expression network analysis (WCGNA) method was applied to identify co-expressed gene modules. Next, pathway enrichment analysis was conducted using the Protein Analysis Through Evolutionary Relationships (PANTHER) tool to identify significantly enriched pathways in these gene modules, and pathway-level tests were then performed on each enriched pathway, including the aggregated SNPs for each gene in the module. Results were considered significant (*p* < 0.05) after applying a Bonferroni correction.

#### 2.4.1. Gene-Level cis-eQTL Analysis 2.4.1. Gene-Level cis-eQTL Analysis

For common variants, eQTL analysis entails testing the association of expression of one gene with one variant. Gene-level eQTL analysis was performed by testing the association of expression of one gene with aggregated cis-regulatory variants, limited to those with a frequency of <0.05 and located in or within 1 Mb of the gene. Gene-based tests were performed using the SKAT-O method, which combines the variance component (SKAT) approach and burden tests into one test with optimal power [30]. We implemented SKAT-O tests for set-based eQTL analysis by considering the gene expression value as the outcome, with the aggregated rare variant count as the predictor. The regression model for analyses of the ROSMAP data also included covariates for age, sex, post-mortem interval (PMI), study (ROS or MAP), and a term for a surrogate variable (SV1), derived from the gene expression data matrix to account for unmeasured/hidden technical effects on gene expression using surrogate variable analysis (SVA) [27]. Model covariates for analyses of the ADNI data included baseline age, sex, RNA integrity number (RIN), year of blood sample collection, and SV1. SKAT-O was implemented with group-wise tests using EPACTS software (https://genome.sph.umich.edu/wiki/EPACTS (accessed on 9 March 2021)) with the following parameter specifications: epacts group —vcf [specific chr genome vcf.gz file]\—groupf [file of aggregated rare variants] —out [out file]\—ped [gene expression file]—max-maf 0.05\—pheno \$gene—cov Age\_baseline—cov Sex—cov RIN cov YearofCollection—cov SV1—test skat—skat-o—run 8. The significance threshold after adjusting for the number of genes tested was 3.86 × 10−6 (0.05/12,971) for analyses of For common variants, eQTL analysis entails testing the association of expression of one gene with one variant. Gene-level eQTL analysis was performed by testing the association of expression of one gene with aggregated cis-regulatory variants, limited to those with a frequency of <0.05 and located in or within 1 Mb of the gene. Gene-based tests were performed using the SKAT-O method, which combines the variance component (SKAT) approach and burden tests into one test with optimal power [30]. We implemented SKAT-O tests for set-based eQTL analysis by considering the gene expression value as the outcome, with the aggregated rare variant count as the predictor. The regression model for analyses of the ROSMAP data also included covariates for age, sex, post-mortem interval (PMI), study (ROS or MAP), and a term for a surrogate variable (SV1), derived from the gene expression data matrix to account for unmeasured/hidden technical effects on gene expression using surrogate variable analysis (SVA) [27]. Model covariates for analyses of the ADNI data included baseline age, sex, RNA integrity number (RIN), year of blood sample collection, and SV1. SKAT-O was implemented with group-wise tests using EPACTS software (https://genome.sph.umich.edu/wiki/EPACTS (accessed on 9 March 2021)) with the following parameter specifications: epacts group—vcf [specific chr genome vcf.gz file]\—groupf [file of aggregated rare variants]—out [out file]\—ped [gene expression file]—max-maf 0.05\—pheno \$gene—cov Age\_baseline—cov Sex—cov RIN—cov Year of Collection—cov SV1—test skat—skat-o—run 8. The significance threshold after adjusting for the number of genes tested was 3.86 <sup>×</sup> <sup>10</sup>−<sup>6</sup> (0.05/12,971) for analyses of the ROSMAP

data and 3.12 <sup>×</sup> <sup>10</sup>−<sup>6</sup> (0.05/16,024) for analyses of the ADNI data (Figure 1). To identify sentinel variants that contribute the majority of the evidence for significant gene-based results, eQTL tests were performed for all significant genes and each individual potentially regulatory rare variant (MAF ≤ 0.05) within 1Mb of the gene using linear regression models with the above covariates in R [31] for each cis-regulatory variant. The significance threshold after adjusting for the number of unique gene-SNP eQTLs was 1.83 <sup>×</sup> <sup>10</sup>−<sup>6</sup> (0.05/27,393) for analyses of the ROSMAP data and 1.17 <sup>×</sup> <sup>10</sup>−<sup>7</sup> (0.05/425,995) for analyses of the ADNI data.

#### 2.4.2. Pathway-Level cis-eQTL Analysis

Pathway-level eQTL analysis was employed to test the association of a pathway, containing many genes, with sets of variants in each of the genes in the pathway one at a time. First, modules of co-expressed genes were identified using the Weighted Gene Co-expression Network Analysis (WGCNA) method implemented in R [32], including all protein-coding genes that were expressed in the ADNI and ROSMAP datasets. Analyses were conducted using the default parameters (soft-threshold power β = 6.00, deepSplit = 2 (medium sensitivity), a minimum module size of 20, and a merge cut height of 0.15) that were recommended by the developers of the software [32] and applied in another AD study [33]. Each gene module can be summarized quantitatively by a module eigengene (ME) value derived from principal component analysis. The ME is considered to be representative of gene expression profiles in a gene module. Next, gene-set pathway enrichment analysis was performed using the Protein Analysis Through Evolutionary Relationships (PANTHER) software tool [34] to determine which pathways were significantly enriched in the gene modules identified from the WGCNA for pathway-level eQTL analysis. Significance of the enriched pathways was determined by the Fisher's Exact test with a false discovery rate (FDR) of <0.05. Pathway-level eQTL analysis was performed for each significantly enriched pathway. The association of the ME value and each gene in the module was tested individually using all potentially regulatory rare cis-SNPs (MAF < 0.05). Models included the same covariates and parameter specifications as described for the gene-level eQTL tests and were analyzed using the SKAT-O method implemented in EPACTS. A total of 77 genes in 9 enriched pathways were evaluated in the ROSMAP dataset, and 100 genes in 16 enriched pathways were evaluated in the ADNI dataset. After correction for the number of genes that were tested, the thresholds for significant pathway-level rare eQTLs were *<sup>p</sup>* < 6.49 <sup>×</sup> <sup>10</sup>−<sup>4</sup> in the ROSMAP dataset and *<sup>p</sup>* < 5.0 <sup>×</sup> <sup>10</sup>−<sup>4</sup> in the ADNI dataset (Figure 1).

#### 2.4.3. Comparison of Rare and Common eQTLs

To determine whether both common variants and gene-level aggregated rare/lowfrequency variants target expression of the same genes, we evaluated the overlap in significant gene-based cis-eQTLs with those involving common variants (MAF > 0.05) within 1 Mb of protein-coding genes that were obtained previously from the Framingham Heart Study (blood) and ROSMAP (brain) gene expression datasets [26]. These comparisons not only indicated which eGenes are regulated by rare and/or common variants, but also determined whether multiple variants can separately up- or down-regulate expression of the same gene.

#### **3. Results**

#### *3.1. Gene-Level eQTL Associations*

In the gene-level eQTL analysis, aggregating on average 416 unique low-frequency and rare variants for each gene, 65 significant gene-level eQTLs (*<sup>p</sup>* < 3.86 <sup>×</sup> <sup>10</sup>−<sup>6</sup> ) were identified in the brain (Figure 1, Table S1). Eight of these genes, including established AD genes *HLA-DRB1* [35] and *HLA-DRB5* [36], are located in or near the major histocompatibility locus. By comparison, 307 significant gene-level eQTLs, with an average of 678 unique variants, were observed in blood at *<sup>p</sup>* < 3.12 <sup>×</sup> <sup>10</sup>−<sup>6</sup> (Figure 1, Table S2). Among these genes, *ABCA7*, *ECHDC3*, and *MS4A6A* are known AD loci [35,36]. The genes *GNMT*, *LDHC*, *RBPMS2*, *DUS2*, and *HP* were significant in both the brain and blood (Table 2), noting that the evidence for *RBPMS2* was stronger in the blood (*<sup>p</sup>* = 1.69 <sup>×</sup> <sup>10</sup>−36) than the brain (*<sup>p</sup>* = 9.90 <sup>×</sup> <sup>10</sup>−<sup>8</sup> ).


**Table 2.** Significant gene-level eQTLs common to blood and brain.

+ Cumulative number of variants. ˆ Number of unique variants. Chromosome and map position according to GRCh37 assembly.

#### *3.2. Variant-Level eQTL Associations*

Examination of the variant-level eQTL associations for the 65 significant genes in the brain identified 61 significant eGene-eSNP eQTL pairs, involving 22 unique eGenes (Table S3). By a very wide margin, the most significant eQTL pair featured rs772849040 located in *NFAT5*, which targeted *DDX19A-DDX19B* (*<sup>p</sup>* <sup>≤</sup> 1.0 <sup>×</sup> <sup>10</sup>−314). *DDX19A-DDX19B* was also a significant eGene for rs17881635 located in COG4 (*<sup>p</sup>* = 6.26 <sup>×</sup> <sup>10</sup>−23). *COPZ1* and *TMPRSS6* were both significant eGenes for seven eSNPs each. A much larger number of eQTL pairs (*n* = 832) were significant in the blood, in which 185 eGenes were unique (Table S4). Four of these genes had 20 or more significant eSNPs: *KRT79* (*n* = 36), *TAC3* (*n* = 32), *CDK12* (*n* = 24), and *SOS1* (*n* = 20). *LDHC* was a significant eGene for two eSNPs in the blood (rs117652970, *<sup>p</sup>* = 1.12 <sup>×</sup> <sup>10</sup>−<sup>21</sup> and rs17579565, *<sup>p</sup>* = 8.26 <sup>×</sup> <sup>10</sup>−21) and a third eSNP in the brain rs773835421, *<sup>p</sup>* = 1.60 <sup>×</sup> <sup>10</sup>−<sup>6</sup> ). Adjacent genes *DHRS4* and its homolog *DHRS4L2* were significant eGene targets for 17 eSNPs. Similarly, three SNPs were each significant eQTLs paired with *ATP6V0D1* and *CMTM2*, and four SNPs were each significant eQTLs paired with *IKZF3* and *GSDMA*. In the brain, rs1260874991 and rs1405001784 were significant eSNPs for two zinc finger protein genes (*ZNF101* and *ZNF103*).

#### *3.3. Pathways Enriched in the Brain and Blood*

Pathway enrichment analysis of each gene module revealed 9 significant enriched pathways in the brain and 16 in the blood (Table 3). The apoptosis signaling, cholecystokinin receptor (CCKR) signaling map, and inflammation mediated by chemokine and cytokine signaling pathways were enriched in both the brain and blood. Focusing on genes in the significantly enriched pathways in the brain, the aggregated rare variants in *CCL7* and *CCL8* were associated with the inflammation mediated by chemokine and cytokine signaling pathway (*<sup>p</sup>* = 1.84 <sup>×</sup> <sup>10</sup>−<sup>5</sup> and *<sup>p</sup>* = 4.50 <sup>×</sup> <sup>10</sup>−<sup>4</sup> , respectively, Table 4). In total, 6 of the 22 genes that contained significant aggregated rare eQTLs associated with pathway expression in the blood were members of the same inflammation pathway: *ALOX5AP* (*<sup>p</sup>* = 1.26 <sup>×</sup> <sup>10</sup>−<sup>4</sup> ), *CXCR2* (*<sup>p</sup>* = 1.53 <sup>×</sup> <sup>10</sup>−<sup>6</sup> ), *FPR2* (*<sup>p</sup>* = 1.25 <sup>×</sup> <sup>10</sup>−<sup>4</sup> ), *GRB2* (*<sup>p</sup>* = 6.04 <sup>×</sup> <sup>10</sup>−<sup>7</sup> ), *IFNAR1* (*<sup>p</sup>* = 1.98 <sup>×</sup> <sup>10</sup>−<sup>5</sup> ), and *RAF1* (*<sup>p</sup>* = 2.11 <sup>×</sup> <sup>10</sup>−<sup>5</sup> ) (Table 4). Furthermore, *CFLAR* (*<sup>p</sup>* = 2.42 <sup>×</sup> <sup>10</sup>−<sup>4</sup> ), *TMBIM6* (*<sup>p</sup>* = 4.48 <sup>×</sup> <sup>10</sup>−<sup>4</sup> ), and *TNFRSF10C* (*<sup>p</sup>* = 8.77 <sup>×</sup> <sup>10</sup>−<sup>5</sup> ) were significant rare variant eQTLs in apoptosis signaling pathways in the blood. Significant aggregated rare variant eQTLs were observed with *ALOX5AP* in both gene-level (*<sup>p</sup>* = 2.20 <sup>×</sup> <sup>10</sup>−10) and pathway-level (*<sup>p</sup>* = 1.26 <sup>×</sup> <sup>10</sup>−<sup>4</sup> ) analyses (Table 4).

*Genes* **2021**, *12*, 419


pathway/expected number of genes in module.

*Genes* **2021**, *12*, 419


**Table 4.** Significant pathway-level eQTLs in the brain or blood by aggregating cis rare variants.

#### *3.4. Gene Targets of eQTLs in the Brain and Blood PDLIM5* [54] *S100A12* [55]

*KF1B* [50,51] *LRRC2* [52] *MS4A6A \** [36] *PADI2* [53]

*Genes* **2021**, *12*, x FOR PEER REVIEW 10 of 17

*3.4. Gene Targets of eQTLs in the Brain and Blood*

mon SNPs.

Comparison of significant rare and common eQTLs in each tissue (Figure 2) revealed 203 genes in the blood and 40 genes in the brain that were targets of rare and common eSNPs (Table S5), including 19 in the blood and 9 in the brain that have both been previously implicated in AD (Table 5). Three genes (*LDHC*, *RBPMS2*, and *HP*) are targets that were observed in significant rare and common eQTLs in the brain and the blood. *SPPL3* [56] *TMEM51* [57] *TREML4* [58] *UBE4B* [59] \* AD locus established by GWAS.

Comparison of significant rare and common eQTLs in each tissue (Figure 2) revealed 203 genes in the blood and 40 genes in the brain that were targets of rare and common eSNPs (Table S5), including 19 in the blood and 9 in the brain that have both been previously implicated in AD (Table 5). Three genes (*LDHC*, *RBPMS2*, and *HP*) are targets that

were observed in significant rare and common eQTLs in the brain and the blood.

**Table 5.** Genes previously implicated in AD whose expression is influenced by both rare and com-

**eQTLs in Blood eQTLs in Brain eGene Reference eGene Reference**  *ABCA7 \** [36] *ACOT1* [37] *ADAMTSL4* [38] *HLA-A* [39] *ARRB2* [40] *HLA-DOB \** [26] *ATG7* [41] *HLA-DRB1 \** [35,41] *CD36* [42] *HLA-DRB5 \** [36] *CREB5* [43] *HP* [44] *CTNNAL1* [45] *POMC* [46] *ECHDC3 \** [35] *RNF39* [47,48] *HP* [44] *ZNF253* [49]

**Figure 2.** Overlap of significant genes in rare gene-level and common eQTLs in (**A**) the blood and (**B**) the brain.


**Figure 2.** Overlap of significant genes in rare gene-level and common eQTLs in (**A**) the blood and (**B**) the brain. **Table 5.** Genes previously implicated in AD whose expression is influenced by both rare and common SNPs.

\* AD locus established by GWAS.

#### **4. Discussion**

Our study demonstrates that low-frequency and rare variants have a significant impact on both the expression of genes considered individually and the co-expression of genes in pathways. Our study highlights the value of the set-based rare-eQTL method because, similar to gene-based association tests, many novel significant genes we identified were not detected by the analysis of rare variants individually, which requires a much larger sample size. In addition, many of the most significant rare-variant findings involved genes with prior connections to AD through case-control comparisons using GWAS, gene expression, and functional studies.

Several of the most significant gene-level eQTL findings in the blood have previously been implicated in AD. *MS4A6A* (*<sup>p</sup>* = 1.77 <sup>×</sup> <sup>10</sup>−22) is among a family of genes containing many SNPs that are associated with AD risk at the genome-wide level [35,36]). A meta-analysis of gene expression studies found that *NUMA1* (*p =* 6.01 <sup>×</sup> <sup>10</sup>−76) was significantly upregulated in the hippocampus of AD cases [60], and another study showed that downregulation of *GAD1* (*p =* 1.49 <sup>×</sup> <sup>10</sup>−58) was associated with reduced neuronal activity [61]. Follistatin, encoded by *FST* (*<sup>p</sup>* = 4.02 <sup>×</sup> <sup>10</sup>−30), is a gonadal protein that inhibits the follicle-stimulating protein. The transmembrane protein, tomoregulin-2, contains follistatin-like modules and is found extensively in amyloid plaques in AD brains [62]. *KIF1B* (*p =* 4.49 <sup>×</sup> <sup>10</sup>−21) expression is significantly increased in AD and is associated with accelerated progression in neurodegenerative diseases [50,51]. The established AD gene *ADAM10* [35] is downregulated by *SFRP1* (*p =* 2.16 <sup>×</sup> <sup>10</sup>−20), which is significantly increased in the brain and cerebrospinal fluid (CSF) of AD patients [63]. *EXOC2* (*p =* 6.19 <sup>×</sup> <sup>10</sup>−<sup>9</sup> ) was identified as an AD age-of-onset modifier [64] and contains a rare missense variant that was observed in seven AD cases in an AD whole-exome sequencing study [9].

Four of the five significant gene-level rare eQTLs in the brain and blood (Table 1) have also been implicated in AD. *GNMT* expression has been detected in the hippocampus and its deficiency results in reduced neurogenic capacity, spatial learning, and memory impairment [65]. *LDHC* has differentially methylated regions in the blood in AD cases [66]. The overexpression of *DUS2* reduces Aβ<sup>42</sup> toxicity [67]. The acute-phase protein haptoglobin, encoded by *HP*, is significantly elevated among AD patients compared to healthy controls in serum [44,68] and CSF [69] in Asians and persons of European ancestry. The *HP* 1/1 genotype was associated with poorer cognitive function and greater cognitive decline than other *HP* genotypes in a sample of 466 African Americans with type 2 diabetes [70]. The RNA-binding protein *RBPMS2* has not been linked to AD but is a constituent of a leukocyte signature for traumatic brain injury [71].

We identified several pathways that are significantly enriched with genes involved in the CCKR signaling map, apoptosis signaling, and inflammation mediated by chemokine and cytokine signaling pathways, all of which have been linked to AD [72–74]. Wnt signaling, one of the significant pathways we observed in brain, suppresses tau phosphorylation and Aβ production/aggregation, inhibits *BACE1* expression, and promotes neuronal survival [75]. *HSPA5* (*<sup>p</sup>* = 7.91 <sup>×</sup> <sup>10</sup>−<sup>5</sup> ), one of the significant pathway-level eQTL findings, is involved in both amyloid precursor protein metabolism and neuronal death in AD [76].

Our rare-eQTL gene-level and pathway-level results confirm the substantial immune and inflammatory component to AD. Significant gene-level rare eQTLs in the brain included several HLA region loci linked to AD by GWAS (*HLA-DRB1* and *HLA-DRB5* [35,36]) and cell-type specific eQTL analysis (*HLA-DOB* [26]). *IL27* (*<sup>p</sup>* = 1.69 <sup>×</sup> <sup>10</sup>−30) is a cytokine, and *CARD17* (*p =* 6.73 <sup>×</sup> <sup>10</sup>−13) encodes a regulatory protein of inflammasomes, which are responsible for the activation of inflammatory responses [77]. Overall, 8 of the 21 significant pathway-level rare eQTLs involved genes which have roles in the inflammation mediated by the chemokine and cytokine signaling pathway. Chemokine levels were found to be significantly increased in serum, CSF, and brain tissue from AD cases [78]. Chemokine receptor *CXCR2* induces Aβ peptides [79]. Another gene in this group, *IFNAR1*, encodes the interferon α and β receptor subunit 1. Primary microglia isolated from the brains of *APP/PS1* mutant mice with ablated type-I interferon signaling have shown reduced levels of Aβ1–42 [80]. In addition to being a significant pathway-level rare eQTL, *FPR2* is also very significant eQTL in the blood (*p =* 1.22 <sup>×</sup> <sup>10</sup>−240), and more specifically, in interferon and anti-bacterial cells (*p =* 3.81 <sup>×</sup> <sup>10</sup>−17) [26]. It is involved in the uptake and clearance of Aβ and contributes to innate immunity and inflammation [81]. *ALOX5AP* (a.k.a. *FLAP)* is expressed in microglia and encodes a protein which, with 5-lipoxygenase, is required for leukotriene synthesis. Leukotrienes are arachidonic acid metabolites which have been implicated in neuroinflammatory and amyloidogenesis processes in AD [82]. Pharmacological inhibition of FLAP in Tg2576 mice significantly reduced tau phosphorylation at

multiple sites and increased post-synaptic density protein-95 and microtubule-associated protein 2 [83]. Growth factor receptor-bound protein 2, encoded by *GRB2,* is an adaptor protein that is involved in the trafficking of Aβ [84]. Although the inflammation pathway was implicated in the eQTL analysis in both the brain and blood, our results showed that the genes significantly contributing to pathway expression differed between the tissues. This suggests that AD-related inflammatory processes may differ in the blood and brain.

We observed significant eQTLs involving 27 target genes, previously implicated in AD through genetic and experimental approaches, which were paired with rare variants identified in this study and previously reported common variants [26] (Table 5). *HP* was the only gene in this group whose expression was influenced by rare and common eSNPs in both the blood and brain, and thus, it has notable potential as a blood-based biomarker reflecting AD-related gene expression changes in brain.

Although the set-based rare-eQTL method employed in this study has multiple strengths in comparison to the analysis of individual rare eQTLs (e.g., higher power, reduced multiple testing burden, and ability to detect the effects of variants with lower frequency), our results should be interpreted cautiously in light of several limitations. Comparisons between the brain and blood were not conducted using data from the same subjects, and thus may underestimate similarities across tissues. Also, brain expression patterns may reflect post-mortem changes unrelated to disease or cell-type specific expression [85]. The set-based method using SKAT-O allows for opposite effect directions of the constituent SNPs in the test; however, closer scrutiny of the individual SNPs is necessary to draw conclusions about the collective influence of rare variants on expression, as well as consistency of the effect direction across tissues. Our results, which were generated from analyses at the tissue level, do not account for patterns that are cell-type specific within the blood and brain, as we recently demonstrated for common individual variant eQTLs in these datasets [26]. In addition, it is unclear whether the set-based eQTL method applied in this study would behave similarly for rare (MAF < 0.01) and low-frequency (0.01 < MAF < 0.05) variants analyzed separately. Finally, although this investigation was conducted using tissue obtained from participants enrolled in studies of AD, the direct testing of the relevance of findings from the set-based tests of rare variants to AD status was not feasible, because the sample size was insufficient to have representation of the sentinel variants in both the case and control groups. This limitation is analogous to the difficulty encountered in the replication of the aggregated rare variant test findings in AD genetic association studies [7,8]. Thus, further studies of some genes are needed to establish their role in AD. Nonetheless, our study provided evidence favoring specific genes under previously established AD-association peaks whose expression may be differentially or concordantly regulated in the blood and brain (Table 5).

#### **5. Conclusions**

This study of gene-based and pathway-level rare eQTLs implicated novel genes that may have important roles in AD, found additional evidence supporting the contribution of immune/inflammatory pathways in AD, and demonstrated the utility of a set-based eQTL approach for assessing the role of rare variants in molecular mechanisms underlying the disease. The relevance of these findings to AD should be validated in larger samples with sufficient power for comparing patterns between AD cases and controls, as well as with functional experiments.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2073-4 425/12/3/419/s1, Table S1: Gene-level rare cis-eQTLs in the brain (*<sup>p</sup>* < 3.86 <sup>×</sup> <sup>10</sup>−<sup>6</sup> ), Table S2: Individual SNP eQTLs in the brain (*<sup>p</sup>* < 1.83 <sup>×</sup> <sup>10</sup>−<sup>6</sup> ), Table S3: Gene-level rare cis-eQTLs in the blood (*<sup>p</sup>* < 3.12 <sup>×</sup> <sup>10</sup>−<sup>6</sup> ), Table S4: Individual SNP eQTLs in the blood (*<sup>p</sup>* < 1.17 <sup>×</sup> <sup>10</sup>−<sup>7</sup> ), Table S5: eGene targets of both rare and common eQTLs in the blood and brain.

**Author Contributions:** D.P. analyzed the data and prepared the figures and tables. J.J.F. provided database management and bioinformatics support. X.Z. and K.L.L. provided expert advice on the design and execution of the statistical genetic analyses. D.P. and L.A.F. wrote the manuscript. L.A.F. obtained the funding for this study. X.Z. and L.A.F. supervised the project. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by NIH grants RF1-AG057519, 2R01-AG048927 U01-AG058654, P30-AG13846, 3U01-AG032984, U01-AG062602, and U19-AG068753. Collection of study data provided by the Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, U01AG61358, a grant from the Illinois Department of Public Health, and the Translational Genomics Research Institute. Collection and sharing of ADNI data were funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81 X WH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer's Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroR x Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institute of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org (accessed on 9 March 2021)). The grantee organization is the Northern Alzheimer's Disease Cooperative Study at the University of Southern California, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuroimaging at the University of Southern California.

**Institutional Review Board Statement:** This study was approved by the Boston University Institutional Review Board.

#### **Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The results published here are in whole or in part based on data obtained from the AD Knowledge Portal (https://adknowledgeportal.synapse.org (accessed on 1 July 2018)). ROSMAP study data were provided by the Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago. ADNI data can be obtained by study investigators (adni.loni.usc.edu (accessed on 9 March 2021)).

**Acknowledgments:** Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how\_to\_apply/ADNI\_ Acknowledgement\_List.pdf (accessed on 9 March 2021).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**



#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Genes* Editorial Office E-mail: genes@mdpi.com www.mdpi.com/journal/genes

MDPI St. Alban-Anlage 66 4052 Basel Switzerland

Tel: +41 61 683 77 34 Fax: +41 61 302 89 18

www.mdpi.com ISBN 978-3-0365-2932-5