Article

Mining High-Level Imaging Genetic Associations via Clustering AD Candidate Variants with Similar Brain Association Patterns

1
University of Pennsylvania, Philadelphia, PA 19104, USA
2
The Catholic University of Korea, Seoul 06591, Korea
3
Indiana University, Indianapolis, IN 46202, USA
4
Cedars-Sinai, West Hollywood, CA 90048, USA
*
Author to whom correspondence should be addressed.
Membership of the ADNI is provided in the Acknowledgments.
Genes 2022, 13(9), 1520; https://doi.org/10.3390/genes13091520
Submission received: 16 July 2022 / Revised: 12 August 2022 / Accepted: 17 August 2022 / Published: 24 August 2022

Abstract

Brain imaging genetics examines associations between imaging quantitative traits (QTs) and genetic factors such as single nucleotide polymorphisms (SNPs) to provide important insights into the pathogenesis of Alzheimer’s disease (AD). The individual-level SNP-QT signals are high dimensional and typically have small effect sizes, making them hard to detect and replicate. To overcome this limitation, this work proposes a new approach that identifies high-level imaging genetic associations by applying multigraph clustering to the SNP-QT association maps. Given an SNP set and a brain QT set, the association between each SNP and each QT is evaluated using a linear regression model. Based on the resulting SNP-QT association map, five SNP–SNP similarity networks (or graphs) are created, one for each of five scoring functions. Multigraph clustering is applied to these networks to identify SNP clusters with similar association patterns across all the brain QTs. After that, functional annotation is performed for each identified SNP cluster and its corresponding brain association pattern. We applied this pipeline to an AD imaging genetic study, which yielded promising results. For example, in an association study between 54 AD SNPs and 116 amyloid QTs, we identified two SNP clusters, with one responsible for amyloid beta clearance and the other regulating amyloid beta formation. These high-level findings have the potential to provide valuable insights into relevant genetic pathways and brain circuits, which can help form new hypotheses for more detailed imaging and genetics studies in independent cohorts.

1. Introduction

Alzheimer’s Disease (AD) is a complex neurodegenerative disorder characterized by continuous cognitive impairment and the eventual appearance of amyloid plaques, neurofibrillary tangles, and atrophy patterns in the brain [1,2,3]. As the most common type of dementia, AD is responsible for approximately 5.8 million dementia cases in the US [4]. AD has a heritability ranging from 60% to 80%, as estimated from a twin study [5]. The most widely used approach to identify the genetic basis of AD is to perform a genome-wide association study (GWAS) or GWAS-based meta-analysis on case-control phenotypes. Over 50 AD-related single nucleotide polymorphisms (SNPs) have been identified [6,7].
Many previous AD studies use GWAS and pathway enrichment analysis to explore the genetic basis of the AD diagnosis [3,7,8,9,10,11,12,13,14]. However, these case-control genetic association studies cannot directly reveal the biological pathways from genetic determinants, molecular signatures, and brain traits to cognitive and clinical outcomes. To bridge this gap, brain imaging genetics [15,16,17] is emerging as a new research field, where quantitative traits (QTs) extracted from brain imaging data are used as intermediate phenotypes to study genetics. These imaging QTs have the potential to not only link genetics with disease outcomes but also capture neuropathological heterogeneity of AD [18,19].
Conventional brain imaging genetics studies perform massive pairwise association analyses across all SNP-QT pairs. These individual-level SNP-QT signals are high dimensional and typically have small effect sizes, making them hard to detect and replicate. To address this limitation, some studies attempt to interpret these results on a macroscopic level or derive high-level understandings. For example, Yao et al. used a two-dimensional enrichment analysis to address this challenge, grouping similar brain regions and genes together via a biclustering approach [20]. Yao’s work identified various high-level two-dimensional imaging genetic modules, which were predefined based on the brain transcriptome data from the Allen Human Brain Atlas.
In this work, instead of using the knowledge-driven, predefined imaging genetic modules, we propose an alternative data-driven approach to identify high-level imaging genetic patterns. Based on the detailed SNP-QT associations, we develop a graph-cut algorithm to cluster similar SNPs together so that SNPs within the same cluster tend to have similar associations with QTs across the brain. We construct multiple SNP networks based on different similarity measurements. Each similarity network can be viewed as a weighted graph with a specific similarity measure defined as the edge weight. We employ a multigraph clustering method derived from min-max graph cut to discover SNP clusters that take into consideration all the studied similarity measures. After that, functional annotation is performed for each identified SNP cluster and its corresponding brain association pattern to provide valuable biological insights at a high level.
We applied this pipeline to an AD imaging genetic study, which yielded promising results. For example, in an association study between 54 AD SNPs and 116 amyloid QTs, we identified two SNP clusters, with one responsible for amyloid beta clearance and the other regulating amyloid beta formation. These high-level findings have the potential to provide valuable insights into relevant genetic pathways and brain circuits, which can help form new hypotheses for subsequent imaging and genetics studies in independent cohorts.

2. Materials and Methods

2.1. Data Description

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu) [21]. The ADNI was launched in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early AD. For up-to-date information, see www.adni-info.org. In this study, participants (N = 971) include 202 AD, 218 late MCI (LMCI), 296 early MCI (EMCI), and 255 healthy control (HC) subjects. The baseline structural magnetic resonance imaging (MRI) scans, AV45, and FDG positron-emission tomography (PET) scans, genotyping data, demographic information and clinical assessments are downloaded from the ADNI database (adni.loni.usc.edu). Table A1 shows participant characteristics.

2.2. Data Preprocessing

The genotyping data were downloaded and analyzed using PLINK v1.90 [22]. We perform quality control using the following criteria: genotyping call rate > 95%, minor allele frequency > 5%, and Hardy–Weinberg Equilibrium > $1.00 \times 10^{-6}$. Then, we select 54 risk variants identified by recent AD genome-wide association studies (GWAS) or GWAS meta-analysis [3,6,7]. Table A2 shows the list of risk variants investigated in this study.
Structural MRI scans are processed with voxel-based morphometry (VBM) using the Statistical Parametric Mapping (SPM) software. All scans are aligned to a T1-weighted template image, segmented into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) maps, and normalized to the standard Montreal Neurological Institute (MNI) space as 2 × 2 × 2 mm³ voxels. The GM maps are extracted and smoothed with an 8 mm FWHM kernel. We then extract the average regional GM measurements from 116 regions of interest (ROIs) defined by the automated anatomical labeling (AAL) atlas.
Preprocessed F-18 florbetapir (AV45) PET scans are collected and aligned to the MNI space as 2 × 2 × 2 mm³ voxels using SPM. The standard uptake value ratio is computed by intensity normalization based on a cerebellar crus reference region. We then extract the average regional AV45 measurements from 116 AAL ROIs.
The (18)F-fluorodeoxyglucose (FDG) PET measurements are also registered into the same MNI space as 2 × 2 × 2 mm³ voxels by SPM. We then extract the average regional FDG measurements from 116 AAL ROIs.

2.3. Method Overview

Figure 1 shows the flowchart of the analyses performed in this study, including six steps. Step 1 generates detailed SNP-QT association maps for five different subject sets examined in our prior study [23], respectively. Step 2 constructs five SNP similarity networks using different scoring functions. Step 3 performs multigraph clustering on the five SNP networks with a range of cluster numbers. Step 4 examines the clustering quality of each cluster through Silhouette analysis. Based on the Silhouette scoring results, two cluster groups are selected for the subsequent analysis in Steps 5 and 6. We perform functional annotation for (1) each identified SNP cluster in Step 5 using pathway analysis and (2) its corresponding brain association pattern in Step 6 using Neurosynth and Neurovault.

2.4. Step 1: Imaging Genetic Association Analysis

The relationship between each ROI-based imaging QT and each SNP can be obtained by performing a linear regression. Let $G$ be a set of SNPs and $Y$ be a set of imaging QTs (AV45, FDG and VBM). We fit a linear regression model to estimate the additive effect of each SNP $g \in G$ on each QT $y \in Y$. The analysis is performed for all possible SNP-QT pairs for each of the five comparison groups (i.e., EMCI vs. HC, LMCI vs. HC, AD vs. HC, MCI vs. HC, ALL vs. HC) within each of the three imaging modalities (i.e., AV45, FDG, and VBM). The regression is thus repeated 54 × 116 times for each comparison group and modality. The linear regression model is defined as follows:
$$y = \alpha g + \Gamma Z + \epsilon,$$
where $Z = (z_1, \ldots, z_k)^T$ includes the variables whose effects we want to exclude, such as age, sex, and education; $\alpha$ and $\Gamma = (\gamma_1, \ldots, \gamma_k)$ are the coefficients; $\epsilon$ is the error term. Our goal is to estimate $\alpha$ and also test if the SNP $g$ has a significant effect (i.e., $\alpha \neq 0$) on each QT $y \in Y$.
Thus, in Step 1 we generate an ROI-based p-value map to quantify the significance of SNP effects on imaging data. Specifically, in this work, each element of the significance map records the “negative log p-value” $-\log_{10}(p)$ at the corresponding ROI. At the end of this step, we have 5 SNP-QT maps of size 54 (number of studied SNPs) × 116 (number of ROIs) for each of the three modalities.
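As an illustration of Step 1, the following is a minimal sketch of how one SNP-QT significance map could be computed with ordinary least squares. It is not the authors' code. The function name `neg_log10_p_map`, the explicit intercept column, and the use of a normal approximation to the t distribution for the two-sided p-value are all assumptions made here to keep the example dependency-light; with several hundred subjects the two distributions are nearly identical.

```python
import math
import numpy as np

def neg_log10_p_map(genotypes, qts, covariates):
    """Return an (n_snps, n_qts) map of -log10(p) for the SNP coefficient alpha.

    genotypes:  (n_subjects, n_snps) additive allele counts (0, 1, 2)
    qts:        (n_subjects, n_qts) imaging quantitative traits
    covariates: (n_subjects, k) nuisance variables such as age, sex, education
    """
    n, n_snps = genotypes.shape
    out = np.zeros((n_snps, qts.shape[1]))
    intercept = np.ones((n, 1))  # assumption: the model includes an intercept
    for i in range(n_snps):
        # Design matrix: SNP first, then intercept and covariates.
        X = np.hstack([genotypes[:, [i]], intercept, covariates])
        dof = n - X.shape[1]
        xtx_inv = np.linalg.inv(X.T @ X)
        beta = xtx_inv @ X.T @ qts                 # all QTs fitted at once
        resid = qts - X @ beta
        sigma2 = (resid ** 2).sum(axis=0) / dof    # residual variance per QT
        se_alpha = np.sqrt(sigma2 * xtx_inv[0, 0])
        z = beta[0] / se_alpha
        # Two-sided p-value under a normal approximation (assumption).
        p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
        out[i] = -np.log10(np.clip(p, 1e-300, 1.0))
    return out
```

Running this once per comparison group and modality would give the 5 × 3 maps of size 54 × 116 described above.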

2.5. Step 2: SNP Networks with Different Similarity Measurements

Step 1 explores the lower-level relationship between imaging and genetic data. In order to aggregate the individual effects of multiple SNP–ROI pairs into high-level imaging genetic patterns, we transform the SNP-QT maps into an SNP network that models the SNP similarity in terms of their effects on all the QTs across the entire brain. From Step 1, a 54-by-116 SNP-QT map is constructed for each of the five comparison groups within each of the three modalities. For each SNP, there is a 116-dimensional feature representation that maps its effect on the brain. The similarity measurement is applied to all pairs of 116-dimensional normalized SNP vectors to create a 54-by-54 SNP network. Five scoring functions shown in Table 1 are used, resulting in five distinct 54-by-54 SNP networks for each comparison group. The three SNP networks formed by the Pearson correlation, the Spearman correlation, and the cosine similarity are each normalized by taking the absolute value of every entry. The two SNP networks formed by the Manhattan and Euclidean distances are each transformed to normalized similarity networks by applying a Gaussian radial basis function centered at distance = 0 with a standard deviation of (maximum − minimum)/3. After normalization, all the entries in each 54-by-54 SNP network have a value between 0 and 1.
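The construction of the five networks can be sketched as follows. This is an illustrative reading of the description above rather than the authors' implementation. The function names are hypothetical, the Spearman ranks ignore ties, and the maximum and minimum used for the Gaussian width are taken over the whole distance matrix (including the zero diagonal), which is an assumption.

```python
import numpy as np

def _rank(v):
    # Ordinal ranks; ties are broken by position (adequate for continuous data).
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def _rbf(dist):
    # Gaussian radial basis function centered at distance 0 with
    # standard deviation (max - min) / 3.
    sigma = (dist.max() - dist.min()) / 3.0
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2))

def snp_similarity_networks(assoc):
    """assoc: (n_snps, n_rois) map; returns five (n_snps, n_snps) networks in [0, 1]."""
    # Normalize each SNP vector to unit Euclidean norm.
    unit = assoc / np.linalg.norm(assoc, axis=1, keepdims=True)
    ranks = np.apply_along_axis(_rank, 1, unit)
    diff = unit[:, None, :] - unit[None, :, :]
    return {
        "pearson": np.abs(np.corrcoef(unit)),
        "spearman": np.abs(np.corrcoef(ranks)),
        "cosine": np.clip(np.abs(unit @ unit.T), 0.0, 1.0),
        "manhattan": _rbf(np.abs(diff).sum(axis=2)),
        "euclidean": _rbf(np.sqrt((diff ** 2).sum(axis=2))),
    }
```

For a 54-by-116 map, each returned matrix is 54-by-54, symmetric, and has ones on the diagonal.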

2.6. Step 3: Multigraph Min-Max Graph Clustering

Although an SNP network describes the similarity between each pair of SNPs, a high-level understanding can be obtained by grouping similar SNPs together and studying their collective effects. From Step 2, five 54-by-54 normalized similarity SNP networks are created for each comparison group within each of the three modalities. The network can be viewed as a graph so that the connected components output from graph cut algorithms are viewed as network clusters. Ding et al. proposed a min-max graph cut algorithm that improves cluster quality and balance by minimizing similarity between pairwise subgraphs and maximizing similarity within each subgraph [24]. The min-max graph cut takes a single similarity network as input, so it clusters one network and examines the effect of one scoring function. Wang et al. generalized the single-graph min-max graph cut into a multigraph min-max graph cut, which is used in this study to evaluate the combined effect of five scoring functions [25]. The objective functions of both min-max graph cut models are shown in Table 2. In this study, the multigraph min-max graph cut algorithm is implemented through a gradient descent method with convergence conditions. The implication of multigraph min-max clustering is that it combines the effects of multiple scoring functions at the same time. The clustering results of the multigraph min-max graph cut algorithm have features that resemble the clustering results of single-graph min-max clustering from the best scoring function. Multigraph min-max clustering with five 54-by-54 SNP networks as inputs is performed with the number of clusters ranging from 2 to 9 to produce clustering results for each comparison group within each modality.
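To make the min-max idea concrete, the sketch below evaluates the min-max cut criterion for a given partition: for each cluster, the similarity leaving the cluster is divided by the similarity inside it, and the ratios are summed. It only scores a candidate partition and does not perform the gradient descent optimization mentioned above. The multigraph combination shown here (an unweighted sum over graphs) is an assumption for illustration and may differ from the formulation in [25] and Table 2.

```python
import numpy as np

def min_max_cut_objective(W, labels):
    """Sum over clusters of between-cluster similarity / within-cluster similarity."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        inside = labels == k
        within = W[np.ix_(inside, inside)].sum()    # similarity inside cluster k
        between = W[np.ix_(inside, ~inside)].sum()  # similarity cut by cluster k
        total += between / within
    return total

def multigraph_objective(networks, labels):
    # Assumption: graphs are combined by an unweighted sum of per-graph objectives.
    return sum(min_max_cut_objective(W, labels) for W in networks)
```

A lower value indicates a partition with weak similarity between clusters and strong similarity within them, which is the property the clustering step seeks.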

2.7. Step 4: Silhouette Scoring Analysis

The goal of this step is to determine the optimal number of clusters. Silhouette refers to a method of interpretation and validation of consistency within clusters of data and provides a graphical representation of cluster quality [26]. The Silhouette value ranges between −1 and 1. A value close to 1 indicates good clustering quality: the objects are close to their assigned clusters and far from neighboring clusters. A value close to −1 suggests that the number of clusters selected is not appropriate. The scoring functions are listed in Table 3. The Silhouette scoring analysis is performed on the clustering results of multigraph clustering with the number of clusters ranging from 2 to 9. The normalized similarity networks in Step 3 are transformed to distance matrices by converting a similarity measure of $x$ into a distance measure of $1 - x$. For a given number of clusters, there are 5 similarity measurements × 5 comparison groups within each of the three modalities. The resulting 5 × 5 = 25 Silhouette scores are averaged for comparison. The clustering result with the highest averaged Silhouette score is selected for further analysis. The Silhouette scoring analysis is also performed on the clustering results of single-graph clustering with the number of clusters ranging from 2 to 9. The 5 Silhouette scores from 5 comparison groups are averaged and compared with the averaged Silhouette score of the multigraph clustering to analyze the effectiveness of multigraph clustering.
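The Silhouette computation on a similarity network can be sketched as below, using the 1 − x conversion described in this step. The function name is hypothetical, and two conventions are assumptions: a singleton cluster receives a score of 0, and the similarity matrix is assumed to have ones on its diagonal so that each self-distance is 0.

```python
import numpy as np

def silhouette_from_similarity(S, labels):
    """Mean Silhouette value using distances d = 1 - similarity."""
    D = 1.0 - np.asarray(S, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: singleton clusters score 0
        # Mean distance to the other members of the same cluster
        # (the self-distance D[i, i] is 0, so it does not affect the sum).
        a = D[i, own].sum() / (n_own - 1)
        # Smallest mean distance to any other cluster.
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return scores.mean()
```

Averaging this score over the five networks and five comparison groups for each candidate number of clusters would reproduce the 25-score average used for model selection.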

2.8. Step 5: EnrichR Elsevier Pathway Analysis

A high-level result of two SNP groups is produced from the previous analysis. The genetic domain of each SNP group can be analyzed through pathway analysis using Enrichr. Enrichr is an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3) [27,28,29]. The software can also be embedded into any tool that performs gene list analysis. The 54 AD-related SNPs in this study are mapped to their closest gene, upstream or downstream. Each SNP cluster from multigraph clustering is mapped to a group of genes and uploaded to Enrichr for pathway analysis. The Elsevier pathway analysis results of each SNP cluster are recorded and compared because this library contains various AD-related pathways.

2.9. Step 6: Neurovault Brain Region Analysis

After analyzing the genetic domain, the brain pattern corresponding to each SNP cluster can be analyzed by mapping the average effect of each SNP group onto the brain. This brain association pattern can be analyzed by NeuroVault and Neurosynth [30], which give us functional and structural information about the affected brain regions. NeuroVault is an open-science neuroinformatics online repository of brain statistical maps, atlases, and parcellations [30]. Neurosynth is a platform for large-scale, automated synthesis of functional magnetic resonance imaging (fMRI) data. It takes thousands of published articles reporting the results of fMRI studies and outputs brain maps with calculated correlation coefficients given the uploaded MRI data. The SNPs that are grouped together are expected to affect similar brain regions; thus, the averaged SNP effect on 116 QTs from each SNP group is calculated and mapped onto the brain. The resulting brain map is functionally annotated using NeuroVault and Neurosynth.
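The averaging described here amounts to a per-cluster mean over the rows of the SNP-QT map. A minimal sketch, with a hypothetical function name and assuming the map rows are ordered like the cluster labels:

```python
import numpy as np

def cluster_mean_patterns(assoc, labels):
    """Average the (n_snps, n_rois) association map over the SNPs in each cluster.

    Returns a dict mapping cluster label -> (n_rois,) mean association pattern.
    """
    assoc = np.asarray(assoc, dtype=float)
    labels = np.asarray(labels)
    return {int(k): assoc[labels == k].mean(axis=0) for k in np.unique(labels)}
```

Each resulting 116-element vector can then be written back onto the AAL regions to form the brain map that is uploaded for annotation.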

3. Results

3.1. Imaging Genetic Association Maps

Figure 2 shows all 15 resulting imaging genetic association maps, arranged by three modalities (AV45, FDG, VBM) against five comparisons (EMCI vs. HC, LMCI vs. HC, AD vs. HC, MCI vs. HC, ALL vs. HC). Each map consists of 54 SNPs on the vertical axis and 116 ROIs on the horizontal axis. The order of SNPs on the vertical axis follows the list shown in Table A2. The order of ROIs on the horizontal axis follows the list shown in Table A3.
Each entry of the map corresponds to $-\log_{10}$(p-value) from the linear regression before normalization. After an initial SNP-QT map is created, each 116-dimensional vector of a given SNP is normalized such that the Euclidean norm is 1. This step is performed so that each SNP is represented as a directional unit vector to facilitate subsequent analysis.
While such an imaging genetic map describes detailed associations for each SNP-QT pair, it is not straightforward to detect any general trend in these maps. The goal of the subsequent steps is to extract high-level information from these maps and help provide biological interpretation to aid biomarker discovery and therapeutic target identification.

3.2. Multigraph vs. Single-Graph Silhouette Analysis

The multigraph vs. single-graph averaged Silhouette scores are shown in Figure 3. The multigraph averaged Silhouette score is calculated by taking the mean of 25 Silhouette scores (5 scoring functions × 5 comparison groups) from the multigraph clustering result at a given number of clusters for a given modality. The single-graph averaged Silhouette score is calculated by also taking the mean of 5 × 5 = 25 Silhouette scores. Instead of using the same clustering result across five scoring functions for the multigraph case, a single-graph clustering is performed on each of the scoring functions. The Silhouette scores are calculated based on the clustering result of a specific scoring function.
A higher Silhouette score indicates a better clustering quality. A lower number of clusters is preferred in this study when the Silhouette scores are similar since our goal is to provide a high-level understanding. As a result, cluster number = 2 is chosen for the subsequent analyses.

3.3. Clustering Results

The SNP networks constructed by the normalized cosine scoring function are shown in Figure 4. The two resulting SNP clusters are separated by two black lines. The cluster with a smaller number of SNPs is reordered in the top left corner with the cluster with a larger number of SNPs in the bottom right corner.
The similarity network entries are normalized so that the minimum is 0 and the maximum is 1. Each SNP has a maximum similarity of 1 with itself as observed from the diagonal. Good partition of SNPs is indicated by strong similarity within each cluster and weak similarity between the clusters. A balanced size of the two clusters is preferred so that we can identify multiple high-level patterns instead of one single high-level pattern coupled with a small number of outliers; therefore, the clustering result on the AV45 measures for the LMCI vs. HC comparison group as well as the clustering result on the VBM measures for the AD vs. HC comparison group are selected for subsequent analysis.

3.4. Case Study: Example AV45 Result

Among all the results in modality AV45, the most balanced one is generated by analyzing the LMCI vs. HC comparison group, and this result is shown in Table A4. The functional annotation and pathway analysis of the identified SNP clusters and the corresponding brain maps are shown in Figure 5. The SNPs in each of the two groups are mapped to their closest genes and uploaded as two gene sets to Enrichr. The Elsevier pathway analysis is used in this study because this library includes multiple AD-related pathways, which is helpful for understanding AD pathogenesis. The average normalized brain significance maps corresponding to the two SNP groups are shown in Figure 5c. Neurosynth analysis results of these two brain maps are shown in Figure 5d.

3.5. Case Study: Example VBM Result

Among all the results in the modality VBM, the most significant and balanced result is generated by analyzing the AD vs. HC comparison group, and this result is shown in Table A5. The functional annotation and pathway analysis of the identified SNP clusters and the corresponding brain maps are shown in Figure 6. The analysis is similar to the previous case study on the AV45 measures for the LMCI vs. HC comparison group. This clustering result has a lower Silhouette score (0.158) than that in the previous case study (0.293). So a less distinct pattern is observed in the network, along with less differentiated pathways, brain regions, and brain map visualization.

4. Discussion

4.1. Comparison between Single-Graph and Multigraph Clusterings

In this study, multiple scoring functions have been selected to evaluate the similarity between different AD-related SNPs in terms of their effects on 116 ROIs across the brain. Each scoring function quantifies the similarity between SNPs from a specific perspective. Multigraph clustering is used to output a clustering result that combines the effects of multiple scoring functions. The purpose of building SNP–SNP networks through different scoring methods is to evaluate the SNP similarity in terms of their effects on 116 ROIs traits across the brain from multiple perspectives. Given two vectors (1, 2, 3) and (0.001, 0.002, 0.003), their Pearson correlation, Spearman correlation, and cosine similarity are all 1 (corresponding to the largest similarity), since they focus on comparing the vector directionality instead of the vector magnitude; however, their Manhattan distance and Euclidean distance are very sensitive to the vector magnitude, and thus are both large, leading to very small similarity. Our multigraph approach combines the effects of all these scoring functions, and takes into consideration both vector directionality and magnitude when performing multigraph clustering.
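The two-vector example above can be checked directly. The short sketch below uses only the standard library; the helper names are ours, and the Spearman helper ignores ties, which is sufficient for this example.

```python
import math

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den

def ranks(v):
    # Ordinal ranks (no tie handling).
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(u, v):
    return pearson(ranks(u), ranks(v))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u = (1, 2, 3)
v = (0.001, 0.002, 0.003)
for name, fn in [("pearson", pearson), ("spearman", spearman), ("cosine", cosine),
                 ("manhattan", manhattan), ("euclidean", euclidean)]:
    print(f"{name:10s} {fn(u, v):.4f}")
```

The three direction-based measures evaluate to 1 up to floating-point rounding, while the Manhattan and Euclidean distances are about 5.994 and 3.738, so the two vectors look identical under the first three measures and far apart under the last two.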
Several single-graph and multigraph clusterings with the number of clusters varying from 2 to 9 are performed. Averaged Silhouette analysis scores are used to quantify clustering quality under a given cluster condition. In Figure 3, the plot of averaged Silhouette analysis for single-graph clustering shows that clustering quality improves in general as the number of clusters increases for FDG and VBM; however, for AV45 a higher number of clusters leads to a lower cluster quality. There is an inconsistency in the optimal number of clusters for different imaging modalities. The goal of this study is to acquire a high-level understanding of imaging genetic associations. Beyond the inconsistency of clustering quality, a large number of clusters also makes subsequent analysis complicated. Only a few brain regions and pathways will be present when the number of SNPs in each cluster decreases, which downgrades the high-level understanding back to individual-level analysis.
Given these difficulties with single-graph clustering, the use of multigraph clustering is very promising for various reasons. The first advantage of multigraph clustering is that at a given number of clusters, it is able to selectively use scoring functions that behave well. For example, at cluster number = 2, the Pearson and Spearman methods have low Silhouette scores (<0.062) across all three modalities, while the Manhattan, Euclidean, and cosine methods have high ones (>0.11). In this case, the multigraph clustering yields an average Silhouette score of 0.1016 (Figure 3), resulting in prominent patterns when mapped to Manhattan, Euclidean, and cosine networks (e.g., Figure 5a).
The second advantage of multigraph clustering for this study is that it behaves the best for AV45 and VBM at the number of clusters = 2 (see Figure 3). As discussed above, a small number of clusters is well suited to high-level analysis. For FDG, the Silhouette score for the cluster number of 2 is also close to the score for the cluster number of 8. The result for the cluster number of 2 is therefore reported for all three modalities in this study and coupled with subsequent functional annotation and pathway analysis.
The third advantage of multigraph clustering is that the analysis is more efficient and consistent than a collection of single-graph clusterings. Instead of doing five single-graph clusterings with inconsistent results among different scoring functions, multigraph clustering is able to return a single set of clustering results. This feature provides a novel way of analysis for future studies with a large number of candidate evaluation functions and no prior knowledge of their performance.

4.2. AV45 Clustering Result

In the AV45 row of Figure 4, comparison group AD vs. HC and ALL vs. HC both have one cluster group of 1 SNP and another cluster group of 53 SNPs. The two clusters can be viewed as one group because the multigraph clustering algorithm explicitly enforces each cluster to be nonempty. While these two results are not significant, rs11278892 with its minor allele G is classified to be the most distant from the other 53 SNPs.
Comparison group EMCI vs. HC has one cluster group of 2 SNPs and another cluster group of 52 SNPs. Again, this can be roughly viewed as a single group. The smaller cluster group contains rs4575098 and rs4663105. There is no prior research on rs4575098, but rs4663105, mapped to the BIN1 gene, was identified as having a significant association among APOE ε4+ and ε4− subjects [31]. Future research can be conducted on the association between rs4575098 and rs4663105 as well as their collective role in early MCI development.
Comparison group LMCI vs. HC has the most balanced cluster group for AV45 with one cluster of 20 SNPs and another cluster of 34 SNPs (with APOE rs429358). The partition provides insights into how the two groups of SNPs each play a different role in the LMCI stage. This finding is promising given that (1) LMCI is the transitional stage between EMCI and AD, (2) there are no significant partitions at EMCI and AD, and (3) there is a significant pattern at LMCI. This suggests a potential stage-specific imaging genetic pattern during AD progression, which warrants further investigation. See Section 4.5 for additional discussion on the functional annotation of this high-level imaging genetic pattern.

4.3. FDG Clustering Result

In the FDG row of Figure 4, for the smaller cluster group, the EMCI vs. HC group has rs10498633 and rs12881735, the LMCI vs. HC group has rs10498633 and rs12881735, and the AD vs. HC group has rs6656401, rs2093760, and rs4844610. The MCI vs. HC group has eight SNPs and the ALL vs. HC group has six SNPs. In general, the clustering patterns in the networks do not seem as significant as those for AV45 and VBM. The Silhouette score of FDG (0.076) is also lower than those of AV45 (0.102) and VBM (0.0879); yet, one observation stands out: rs10498633 is present in the smaller cluster group for both EMCI and LMCI. Previous studies have shown that rs10498633 in SLC24A4 was significantly associated with anisotropy and with the total number and length of fibers, including some connecting brain hemispheres [32].

4.4. VBM Clustering Result

In the VBM row of Figure 4, comparison group MCI vs. HC has one group of 2 SNPs (rs4236673 and rs9331896) and another group of 52 SNPs. Comparison group ALL vs. HC has one group of 1 SNP (rs9271058) and another group of 53 SNPs. These cases can be viewed as having one group instead of two partitions.
Comparison group EMCI vs. HC has a smaller group of six SNPs: rs10808026, rs7810606, rs10498633, rs12881735, rs12590654, and rs113260531. Comparison group LMCI vs. HC has a smaller group of five SNPs: rs4236673, rs9331896, rs10498633, rs12881735, and rs12590654. The SNPs rs10498633, rs12881735, and rs12590654 lie in the intersection of these two groups, potentially having an impact throughout the MCI stage. As mentioned in the FDG section, rs10498633 is also found to be distant from the other AD-related SNPs for VBM modality, which reinforces its unique role associated with anisotropy in the MCI stage.
Comparison group AD vs. HC has the most balanced cluster result with one group of 16 SNPs and another group of 38 SNPs. This provides us with insights about how the two groups of AD-related SNPs each play a different role in AD patients. Functional annotation of this high-level imaging genetic pattern is discussed in Section 4.6.

4.5. AV45 Case Study

In Figure 5a,b, the Elsevier pathway analysis reveals some promising results on our genetic analysis of AV45 measures in the LMCI vs. HC comparison: (1) the pathway of amyloid beta clearance in AD is enriched by genes associated with the SNP Group 1, and (2) the pathway of amyloid beta formation in AD is enriched by genes associated with the SNP Group 2. AD pathogenesis is widely believed to be driven by the production and decomposition of β -amyloid peptide [33]. The disease state of AD is closely related to the solubility and the quantity of β -amyloid. Our pathway analysis suggests that the SNPs in Group 1 have potential to be related to the decomposition of amyloid beta while the SNPs in Group 2 to be related to its production. Since AD is characterized by accumulation of β -amyloid, it warrants further investigation that the SNPs involved here can be studied as suppressors and/or promoters to minimize the amount of β -amyloid present [34].
A relevant observation from our pathway analysis is Group 1’s association with amyloid beta and APP intracellular transport in AD and amyloid beta traffic and degradation in extracellular matrix in AD and Group 2’s association with APP processing. β -amyloid is released by sequential proteolytic processing of the amyloid precursor protein, so the inhibition of APP processing and the excitation of intracellular transport, traffic, and degradation together minimize the accumulation of β -amyloid in the extracellular matrix.
Another indicator of Group 1’s role in β-amyloid regulation is the MBP immunal pathway, which is responsible for amyloid beta degradation [35]. The most correlated pathway of Group 2 is complement activation in AD. Complement proteins are integral components of amyloid plaques and cerebral vascular amyloid in the brains of AD patients, and they can be found at the earliest stages of amyloid deposition [36]. Complement activation also coincides with the clinical expression of Alzheimer’s dementia. Aside from the two groups’ direct associations with β-amyloid, the pathway analysis also shows that AD is correlated with different diseases including Tangier disease, cancer, psoriasis, and asthma. Previous studies have shown that Tangier disease is caused by mutations of ABCA1, which is closely related to β-amyloid [37].
In Figure 5c,d, the terms most correlated with the brain map of SNP Group 1 include cerebellar, cerebellum, vi, lobules, and vermis (see https://neurosynth.org/analyses/terms/, accessed on 16 June 2022 for definition of these terms). The cerebellum is responsible for motor functions and balance, and it is also associated with the visual system. The vermis and some of the subsequent correlated brain regions are also associated with maintaining posture. Thus, this group is primarily associated with brain regions that are responsible for balance, motor functions, and visual functions. Group 2 is correlated with prefrontal, medial prefrontal, medial, prefrontal cortex, and social. All these regions are involved in cognitive ability, memory management, and emotional impulse control. The brain regions affected by the two SNP groups, and their respective functions, differ markedly, which demonstrates the promise of our clustering result.

4.6. VBM Case Study

Figure 6a,b shows the results of the Elsevier pathway analysis on our genetic study of VBM measures in the AD vs. HC comparison. SNP Group 1 is associated with complement activation in AD and with various pathways that are associated with the immune system and systemic lupus erythematosus, a disease characterized by the immune system attacking the body’s own tissues. SNP Group 2 is associated with both amyloid clearance and amyloid formation pathways, so its downstream function is more ambiguous than in the AV45 results. Thus, the previous AV45 result shows a better partition, which can also be verified by visually inspecting the SNP networks and comparing the averaged Silhouette scores (0.1015 vs. 0.0879).
In Figure 6c,d, the brain association pattern corresponding to SNP Group 1 includes cerebellum, cerebellar, vi, lobules, and putamen. The cerebellum governs motor functions and balance (see https://neurosynth.org/analyses/terms/, accessed on 16 June 2022 for definition of these terms). The putamen is involved in learning and motor control, including speech articulation, language functions, and cognitive functions. Similar to the Group 1 result of the AV45 analysis above, this group is associated with balance, motor functions, and visual functions. The brain association pattern corresponding to SNP Group 2, on the other hand, is related to premotor, parietal motor, movements, and primary motor. The primary function of the premotor cortex is to assist in the integration of sensory and motor information for the performance of an action. The parietal lobes integrate somatosensory signals and information from different modalities. The difference between the two brain maps in this case is less pronounced than in the AV45 analysis above.

5. Conclusions

A data-driven analysis pipeline has been proposed in this work to identify high-level imaging genetic patterns. Based on the detailed SNP-QT associations, we develop a graph-cut algorithm to cluster similar SNPs together so that SNPs within the same cluster tend to have similar associations with QTs across the brain. We construct multiple SNP networks based on different similarity measurements. Each similarity network can be viewed as a weighted graph with a specific similarity measure defined as the edge weight. We employ a multigraph clustering method derived from min-max graph cut to discover SNP clusters that take all the studied similarity measures into consideration. After that, functional annotation is performed for each identified SNP cluster and its corresponding brain association pattern to provide valuable biological insights at a high level.
Our genetic analysis of the AV45 imaging QTs in the LMCI vs. HC comparison yields a prominent clustering pattern in the cosine SNP network. The pathway analysis shows that the identified SNP Group 1 is associated with amyloid beta clearance while SNP Group 2 is related to amyloid beta formation. The functional annotation using Neurosynth shows that the brain regions associated with SNP Group 1 are related to motor and balance functions while the brain regions associated with SNP Group 2 are related to memory and cognitive functions. These high-level findings have the potential to provide valuable insights into relevant genetic pathways and brain circuits, which can help form new hypotheses for more detailed imaging and genetics studies in independent cohorts.

Author Contributions

Conceptualization, R.W., A.J.S., J.H.M. and L.S.; methodology, R.W., J.B. and M.K.; software, R.W.; validation, R.W., J.B. and L.S.; formal analysis, R.W. and L.S.; investigation, R.W. and L.S.; resources, A.J.S., J.H.M. and L.S.; data curation, R.W., J.B. and M.K.; writing—original draft preparation, R.W.; writing—review and editing, J.B. and L.S.; visualization, R.W. and J.B.; supervision, L.S.; project administration, L.S.; funding acquisition, A.J.S., J.H.M. and L.S. The investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data, but did not participate in analysis or writing of this report. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institutes of Health grant numbers R01 LM013463, U01 AG068057, R01 AG071470, R01 AG058854, and P30 AG010133, and the National Science Foundation grant number IIS 1837964.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of the University of Pennsylvania (protocol code 831893 and date of approval 26 October 2018).

Informed Consent Statement

Study subjects gave written informed consent at the time of enrollment for data collection and completed questionnaires approved by each participating site’s IRB. The authors state that they have obtained approval from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) Data Sharing and Publications Committee for use of the data.

Data Availability Statement

The datasets used and analyzed during the study are available in the ADNI LONI repository, https://adni.loni.usc.edu/, accessed on 16 June 2022.

Acknowledgments

Data collection and sharing for this project were funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org, accessed on 16 June 2022). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu, accessed on 16 June 2022). 
A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf, accessed on 16 June 2022.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| AD | Alzheimer’s Disease |
| GWAS | genome-wide association study |
| SNP | single nucleotide polymorphism |
| QT | quantitative trait |
| ROI | region of interest |
| MRI | magnetic resonance imaging |
| PET | positron emission tomography |
| HC | healthy control |
| EMCI | early mild cognitive impairment |
| LMCI | late mild cognitive impairment |
| AV45 | F-18 florbetapir |
| FDG | (18)F-fluorodeoxyglucose |
| VBM | voxel-based morphometry |

Appendix A

Table A1. Participant characteristics.
| | HC | EMCI | LMCI | AD | Total |
|---|---|---|---|---|---|
| Number of subjects | 255 | 296 | 218 | 202 | 971 |
| Age | 76.35 ± 6.54 | 71.78 ± 7.28 | 74.71 ± 8.39 | 75.85 ± 7.67 | 74.48 ± 7.67 |
| Sex (Male/Female) | 132/123 | 167/129 | 129/89 | 123/79 | 551/420 |
| Education (Year) | 16.37 ± 2.64 | 12.12 ± 2.64 | 16.12 ± 2.94 | 15.83 ± 2.81 | 16.13 ± 2.75 |
The list includes 54 susceptibility loci identified by recent landmark AD genetic studies [3,6,7]. The SNP-QT association maps shown in Figure 2 have a vertical axis that follows the order below.
Table A2. Selected AD-related SNPs.
| rs-ID | Chromosome | Position | Gene Symbol | rs-ID | Chromosome | Position | Gene Symbol |
|---|---|---|---|---|---|---|---|
| rs4575098 | chr1 | 161155392 | ADAMTS4 | rs7920721 | chr10 | 11720308 | ECHDC3 |
| rs6656401 | chr1 | 207692049 | CR1 | rs3740688 | chr11 | 47380340 | SPI1 |
| rs2093760 | chr1 | 207786828 | CR1 | rs10838725 | chr11 | 47557871 | CELF1 |
| rs4844610 | chr1 | 207802552 | CR1 | rs983392 | chr11 | 59923508 | MS4A6A |
| rs4663105 | chr2 | 127891427 | BIN1 | rs7933202 | chr11 | 59936926 | MS4A2 |
| rs6733839 | chr2 | 127892810 | BIN1 | rs2081545 | chr11 | 59958380 | MS4A6A |
| rs10933431 | chr2 | 233981912 | INPP5D | rs867611 | chr11 | 85776544 | PICALM |
| rs35349669 | chr2 | 234068476 | INPP5D | rs10792832 | chr11 | 85867875 | PICALM |
| rs6448453 | chr4 | 11026028 | CLNK | rs3851179 | chr11 | 85868640 | PICALM |
| rs190982 | chr5 | 88223420 | MEF2C-AS1 | rs17125924 | chr14 | 53391680 | FERMT2 |
| rs9271058 | chr6 | 32575406 | HLA-DRB1 | rs17125944 | chr14 | 53400629 | FERMT2 |
| rs9473117 | chr6 | 47431284 | CD2AP | rs10498633 | chr14 | 92926952 | SLC24A4 |
| rs9381563 | chr6 | 47432637 | CD2AP | rs12881735 | chr14 | 92932828 | SLC24A4 |
| rs10948363 | chr6 | 47487762 | CD2AP | rs12590654 | chr14 | 92938855 | SLC24A4 |
| rs2718058 | chr7 | 37841534 | GPR141 | rs442495 | chr15 | 59022615 | ADAM10 |
| rs4723711 | chr7 | 37844263 | GPR141 | rs59735493 | chr16 | 31133100 | KAT8 |
| rs1859788 | chr7 | 99971834 | PILRA | rs113260531 | chr17 | 5138980 | SCIMP |
| rs1476679 | chr7 | 100004446 | ZCWPW1 | rs28394864 | chr17 | 47450775 | ABI3 |
| rs12539172 | chr7 | 100091795 | NYAP1 | rs111278892 | chr19 | 1039323 | ABCA7 |
| rs10808026 | chr7 | 143099133 | EPHA1 | rs3752246 | chr19 | 1056492 | ABCA7 |
| rs7810606 | chr7 | 143108158 | EPHA1-AS1 | rs4147929 | chr19 | 1063443 | ABCA7 |
| rs11771145 | chr7 | 143110762 | EPHA1-AS1 | rs41289512 | chr19 | 45351516 | PVRL2 |
| rs28834970 | chr8 | 27195121 | PTK2B | rs3865444 | chr19 | 51727962 | CD33 |
| rs73223431 | chr8 | 27219987 | PTK2B | rs6024870 | chr20 | 54997568 | CASS4 |
| rs4236673 | chr8 | 27464929 | CLU | rs6014724 | chr20 | 54998544 | CASS4 |
| rs9331896 | chr8 | 27467686 | CLU | rs7274581 | chr20 | 55018260 | CASS4 |
| rs11257238 | chr10 | 11717397 | ECHDC3 | rs429358 | chr19 | 45411941 | APOE |
Table A3. Region of interest order. This table includes 116 regions of interest in the brain. The SNP-QT association maps shown in Figure 2 have a horizontal axis that follows the order below.
| Index | Name | Index | Name | Index | Name | Index | Name |
|---|---|---|---|---|---|---|---|
| 1 | Precentral_L | 30 | Insula_R | 59 | Parietal_Sup_L | 88 | Temporal_Pole_Mid_R |
| 2 | Precentral_R | 31 | Cingulum_Ant_L | 60 | Parietal_Sup_R | 89 | Temporal_Inf_L |
| 3 | Frontal_Sup_L | 32 | Cingulum_Ant_R | 61 | Parietal_Inf_L | 90 | Temporal_Inf_R |
| 4 | Frontal_Sup_R | 33 | Cingulum_Mid_L | 62 | Parietal_Inf_R | 91 | Cerebelum_Crus1_L |
| 5 | Frontal_Sup_Orb_L | 34 | Cingulum_Mid_R | 63 | SupraMarginal_L | 92 | Cerebelum_Crus1_R |
| 6 | Frontal_Sup_Orb_R | 35 | Cingulum_Post_L | 64 | SupraMarginal_R | 93 | Cerebelum_Crus2_L |
| 7 | Frontal_Mid_L | 36 | Cingulum_Post_R | 65 | Angular_L | 94 | Cerebelum_Crus2_R |
| 8 | Frontal_Mid_R | 37 | Hippocampus_L | 66 | Angular_R | 95 | Cerebelum_3_L |
| 9 | Frontal_Mid_Orb_L | 38 | Hippocampus_R | 67 | Precuneus_L | 96 | Cerebelum_3_R |
| 10 | Frontal_Mid_Orb_R | 39 | ParaHippocampal_L | 68 | Precuneus_R | 97 | Cerebelum_4_5_L |
| 11 | Frontal_Inf_Oper_L | 40 | ParaHippocampal_R | 69 | Paracentral_Lobule_L | 98 | Cerebelum_4_5_R |
| 12 | Frontal_Inf_Oper_R | 41 | Amygdala_L | 70 | Paracentral_Lobule_R | 99 | Cerebelum_6_L |
| 13 | Frontal_Inf_Tri_L | 42 | Amygdala_R | 71 | Caudate_L | 100 | Cerebelum_6_R |
| 14 | Frontal_Inf_Tri_R | 43 | Calcarine_L | 72 | Caudate_R | 101 | Cerebelum_7b_L |
| 15 | Frontal_Inf_Orb_L | 44 | Calcarine_R | 73 | Putamen_L | 102 | Cerebelum_7b_R |
| 16 | Frontal_Inf_Orb_R | 45 | Cuneus_L | 74 | Putamen_R | 103 | Cerebelum_8_L |
| 17 | Rolandic_Oper_L | 46 | Cuneus_R | 75 | Pallidum_L | 104 | Cerebelum_8_R |
| 18 | Rolandic_Oper_R | 47 | Lingual_L | 76 | Pallidum_R | 105 | Cerebelum_9_L |
| 19 | Supp_Motor_Area_L | 48 | Lingual_R | 77 | Thalamus_L | 106 | Cerebelum_9_R |
| 20 | Supp_Motor_Area_R | 49 | Occipital_Sup_L | 78 | Thalamus_R | 107 | Cerebelum_10_L |
| 21 | Olfactory_L | 50 | Occipital_Sup_R | 79 | Heschl_L | 108 | Cerebelum_10_R |
| 22 | Olfactory_R | 51 | Occipital_Mid_L | 80 | Heschl_R | 109 | Vermis_1_2 |
| 23 | Frontal_Sup_Medial_L | 52 | Occipital_Mid_R | 81 | Temporal_Sup_L | 110 | Vermis_3 |
| 24 | Frontal_Sup_Medial_R | 53 | Occipital_Inf_L | 82 | Temporal_Sup_R | 111 | Vermis_4_5 |
| 25 | Frontal_Med_Orb_L | 54 | Occipital_Inf_R | 83 | Temporal_Pole_Sup_L | 112 | Vermis_6 |
| 26 | Frontal_Med_Orb_R | 55 | Fusiform_L | 84 | Temporal_Pole_Sup_R | 113 | Vermis_7 |
| 27 | Rectus_L | 56 | Fusiform_R | 85 | Temporal_Mid_L | 114 | Vermis_8 |
| 28 | Rectus_R | 57 | Postcentral_L | 86 | Temporal_Mid_R | 115 | Vermis_9 |
| 29 | Insula_L | 58 | Postcentral_R | 87 | Temporal_Pole_Mid_L | 116 | Vermis_10 |
Table A4. SNP clustering result on the AV45 measures for the LMCI vs. HC comparison. The SNP and the corresponding closest genes are listed for each resulting cluster or group.
| Group 1: Index | Group 1: SNP | Group 1: Gene | Group 2: Index | Group 2: SNP | Group 2: Gene |
|---|---|---|---|---|---|
| 1 | rs4575098_A | ADAMTS4 | 1 | rs6656401_A | CR1 |
| 2 | rs4663105_C | RP11-138I18.2 | 2 | rs2093760_A | CR1 |
| 3 | rs6733839_T | RP11-138I18.2 | 3 | rs4844610_A | CR1 |
| 4 | rs6448453_A | AP001257.1 | 4 | rs10933431_G | SPI1 |
| 5 | rs9381563_C | RNU6-560P | 5 | rs35349669_T | CELF1 |
| 6 | rs2718058_G | FERMT2 | 6 | rs190982_G | MS4A6A |
| 7 | rs11257238_C | PVRL2 | 7 | rs9271058_A | MS4A6A |
| 8 | rs7920721_G | APOE | 8 | rs9473117_C | PICALM |
| 9 | rs10838725_C | BIN1 | 9 | rs10948363_G | RNU6-560P |
| 10 | rs983392_G | BIN1 | 10 | rs4723711_T | FERMT2 |
| 11 | rs7933202_C | INPP5D | 11 | rs1859788_A | SLC24A4 |
| 12 | rs2081545_A | INPP5D | 12 | rs1476679_C | SLC24A4 |
| 13 | rs867611_G | CASS4 | 13 | rs12539172_T | SLC24A4 |
| 14 | rs10792832_A | CASS4 | 14 | rs10808026_A | ADAM10 |
| 15 | rs3851179_T | CASS4 | 15 | rs7810606_T | KAT8 |
| 16 | rs10498633_T | HLA-DRB1 | 16 | rs11771145_A | RP11-333E1.1 |
| 17 | rs12881735_C | AL355353.1 | 17 | rs28834970_C | RP11-81K2.1 |
| 18 | rs12590654_A | AL355353.1 | 18 | rs73223431_T | CNN2 |
| 19 | rs113260531_A | EPDR1 | 19 | rs4236673_A | ABCA7 |
| 20 | rs28394864_A | GPR141 | 20 | rs9331896_C | ABCA7 |
| | | | 21 | rs3740688_G | CD33 |
| | | | 22 | rs17125924_G | RP11-61G19.1 |
| | | | 23 | rs17125944_C | MEF2C-AS1 |
| | | | 24 | rs442495_C | CD2AP |
| | | | 25 | rs59735493_A | GPR141 |
| | | | 26 | rs111278892_G | EPDR1 |
| | | | 27 | rs3752246_G | PILRA |
| | | | 28 | rs4147929_A | ZCWPW1 |
| | | | 29 | rs41289512_G | NYAP1 |
| | | | 30 | rs3865444_A | EPHA1 |
| | | | 31 | rs6024870_A | EPHA1-AS1 |
| | | | 32 | rs6014724_G | EPHA1-AS1 |
| | | | 33 | rs7274581_C | PTK2B |
| | | | 34 | rs429358_C | PTK2B |
Table A5. SNP clustering result on the VBM measures for the AD vs. HC comparison. The SNP and the corresponding closest genes are listed for each resulting cluster or group.
| Group 1: Index | Group 1: SNP | Group 1: Gene | Group 2: Index | Group 2: SNP | Group 2: Gene |
|---|---|---|---|---|---|
| 1 | rs6656401_A | CR1 | 1 | rs4575098_A | ADAMTS4 |
| 2 | rs2093760_A | CR1 | 2 | rs4663105_C | RP11-138I18.2 |
| 3 | rs4844610_A | CR1 | 3 | rs6733839_T | RP11-138I18.2 |
| 4 | rs1859788_A | SLC24A4 | 4 | rs10933431_G | SPI1 |
| 5 | rs1476679_C | SLC24A4 | 5 | rs35349669_T | CELF1 |
| 6 | rs12539172_T | SLC24A4 | 6 | rs6448453_A | AP001257.1 |
| 7 | rs11771145_A | RP11-333E1.1 | 7 | rs190982_G | MS4A6A |
| 8 | rs28834970_C | RP11-81K2.1 | 8 | rs9271058_A | MS4A6A |
| 9 | rs73223431_T | CNN2 | 9 | rs9473117_C | PICALM |
| 10 | rs4236673_A | ABCA7 | 10 | rs9381563_C | RNU6-560P |
| 11 | rs9331896_C | ABCA7 | 11 | rs10948363_G | RNU6-560P |
| 12 | rs3740688_G | CD33 | 12 | rs2718058_G | FERMT2 |
| 13 | rs113260531_A | EPDR1 | 13 | rs4723711_T | FERMT2 |
| 14 | rs3752246_G | PILRA | 14 | rs10808026_A | ADAM10 |
| 15 | rs4147929_A | ZCWPW1 | 15 | rs7810606_T | KAT8 |
| 16 | rs3865444_A | EPHA1 | 16 | rs11257238_C | PVRL2 |
| | | | 17 | rs7920721_G | APOE |
| | | | 18 | rs10838725_C | BIN1 |
| | | | 19 | rs983392_G | BIN1 |
| | | | 20 | rs7933202_C | INPP5D |
| | | | 21 | rs2081545_A | INPP5D |
| | | | 22 | rs867611_G | CASS4 |
| | | | 23 | rs10792832_A | CASS4 |
| | | | 24 | rs3851179_T | CASS4 |
| | | | 25 | rs17125924_G | RP11-61G19.1 |
| | | | 26 | rs17125944_C | MEF2C-AS1 |
| | | | 27 | rs10498633_T | HLA-DRB1 |
| | | | 28 | rs12881735_C | AL355353.1 |
| | | | 29 | rs12590654_A | AL355353.1 |
| | | | 30 | rs442495_C | CD2AP |
| | | | 31 | rs59735493_A | GPR141 |
| | | | 32 | rs28394864_A | GPR141 |
| | | | 33 | rs111278892_G | EPDR1 |
| | | | 34 | rs41289512_G | NYAP1 |
| | | | 35 | rs6024870_A | EPHA1-AS1 |
| | | | 36 | rs6014724_G | EPHA1-AS1 |
| | | | 37 | rs7274581_C | PTK2B |
| | | | 38 | rs429358_C | PTK2B |

References

  1. Jack, C.R., Jr.; Bennett, D.A.; Blennow, K.; Carrillo, M.C.; Feldman, H.H.; Frisoni, G.B.; Hampel, H.; Jagust, W.J.; Johnson, K.A.; Knopman, D.S.; et al. A/T/N: An unbiased descriptive classification scheme for Alzheimer disease biomarkers. Neurology 2016, 87, 539–547.
  2. Hardy, J.A.; Higgins, G.A. Alzheimer’s disease: The amyloid cascade hypothesis. Science 1992, 256, 184–185.
  3. Jansen, I.E.; Savage, J.E.; Watanabe, K.; Bryois, J.; Williams, D.M.; Steinberg, S.; Sealock, J.; Karlsson, I.K.; Hägg, S.; Athanasiu, L.; et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 2019, 51, 404–413.
  4. Alzheimer’s Association. 2020 Alzheimer’s disease facts and figures. Alzheimer’s Dement. 2020, 16, 391–460.
  5. Gatz, M.; Reynolds, C.A.; Fratiglioni, L.; Johansson, B.; Mortimer, J.A.; Berg, S.; Fiske, A.; Pedersen, N.L. Role of genes and environments for explaining Alzheimer disease. Arch. Gen. Psychiatry 2006, 63, 168–174.
  6. Kunkle, B.W.; Grenier-Boley, B.; Sims, R.; Bis, J.C.; Damotte, V.; Naj, A.C.; Boland, A.; Vronskaya, M.; Van Der Lee, S.J.; Amlie-Wolf, A.; et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat. Genet. 2019, 51, 414–430.
  7. Lambert, J.C.; Ibrahim-Verbaas, C.A.; Harold, D.; Naj, A.C.; Sims, R.; Bellenguez, C.; Jun, G.; DeStefano, A.L.; Bis, J.C.; Beecham, G.W.; et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 2013, 45, 1452–1458.
  8. Ramanan, V.; Saykin, A. Pathways to neurodegeneration: Mechanistic insights from GWAS in Alzheimer’s disease, Parkinson’s disease, and related disorders. Am. J. Neurodegener. Dis. 2013, 2, 145.
  9. Gandhi, S.; Wood, N.W. Genome-wide association studies: The key to unlocking neurodegeneration? Nat. Neurosci. 2010, 13, 789–794.
  10. Pihlstrom, L.; Wiethoff, S.; Houlden, H. Chapter 22—Genetics of neurodegenerative diseases: An overview. In Handbook of Clinical Neurology; Kovacs, G.G., Alafuzoff, I., Eds.; Elsevier: Amsterdam, The Netherlands, 2018; Volume 145, pp. 309–323.
  11. Tsuji, S. Genetics of neurodegenerative diseases: Insights from high-throughput resequencing. Hum. Mol. Genet. 2010, 19, R65–R70.
  12. Chung, J.; Wang, X.; Maruyama, T.; Ma, Y.; Zhang, X.; Mez, J.; Sherva, R.; Takeyama, H.; The Alzheimer’s Disease Neuroimaging Initiative; Lunetta, K.L.; et al. Genome-wide association study of Alzheimer’s disease endophenotypes at prediagnosis stages. Alzheimer’s Dement. 2018, 14, 623–633.
  13. Waring, S.C.; Rosenberg, R.N. Genome-Wide Association Studies in Alzheimer Disease. Arch. Neurol. 2008, 65, 329–334.
  14. Harold, D.; Abraham, R.; Hollingworth, P.; Sims, R.; Gerrish, A.; Hamshere, M.L.; Pahwa, J.S.; Moskvina, V.; Dowzell, K.; Williams, A.; et al. Genome-Wide Association Study identifies variants at CLU and PICALM associated with Alzheimer’s disease. Nat. Genet. 2009, 41, 1088–1093.
  15. Shen, L.; Thompson, P.M. Brain Imaging Genomics: Integrated Analysis and Machine Learning. Proc. IEEE 2020, 108, 125–162.
  16. Shen, L.; Thompson, P.M.; Potkin, S.G.; Bertram, L.; Farrer, L.A.; Foroud, T.M.; Green, R.C.; Hu, X.; Huentelman, M.J.; Kim, S.; et al. Genetic analysis of quantitative phenotypes in AD and MCI: Imaging, cognition and biomarkers. Brain Imaging Behav. 2014, 8, 183–207.
  17. Shen, L.; Kim, S.; Risacher, S.L.; Nho, K.; Swaminathan, S.; West, J.D.; Foroud, T.; Pankratz, N.; Moore, J.H.; Sloan, C.D.; et al. Whole genome association study of brain-wide imaging phenotypes for identifying quantitative trait loci in MCI and AD: A study of the ADNI cohort. Neuroimage 2010, 53, 1051–1063.
  18. Ferreira, D.; Nordberg, A.; Westman, E. Biological subtypes of Alzheimer disease. Neurology 2020, 94, 436–448.
  19. Jellinger, K.A. Pathobiological Subtypes of Alzheimer Disease. Dement. Geriatr. Cogn. Disord. 2021, 49, 321–333.
  20. Yao, X.; Yan, J.; Kim, S.; Nho, K.; Risacher, S.L.; Inlow, M.; Moore, J.H.; Saykin, A.J.; Shen, L. Two-dimensional enrichment analysis for mining high-level imaging genetic associations. Brain Inform. 2017, 4, 27–37.
  21. Weiner, M.W.; Veitch, D.P.; Aisen, P.S.; Beckett, L.A.; Cairns, N.J.; Green, R.C.; Harvey, D.; Jack, C.R.; Jagust, W.; Liu, E.; et al. The Alzheimer’s Disease Neuroimaging Initiative: A review of papers published since its inception. Alzheimer’s Dement. 2013, 9, e111–e194.
  22. Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M.A.; Bender, D.; Maller, J.; Sklar, P.; De Bakker, P.I.; Daly, M.J.; et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007, 81, 559–575.
  23. Kim, M.; Wu, R.; Yao, X.; Saykin, A.J.; Moore, J.H.; Shen, L.; Alzheimer’s Disease Neuroimaging Initiative. Identifying genetic markers enriched by brain imaging endophenotypes in Alzheimer’s disease. BMC Med. Genom. 2022, 15, 168.
  24. Ding, C.; He, X.; Zha, H.; Gu, M.; Simon, H. A min-max cut algorithm for graph partitioning and data clustering. In Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, CA, USA, 29 November–2 December 2001; pp. 107–114.
  25. De, W.; Wang, Y.; Nie, F.; Yan, J.; Cai, W.; Saykin, A.J.; Shen, L.; Huang, H. Human connectome module pattern detection using a new multi-graph MinMax cut model. Med. Image Comput. Comput. Assist. Interv. 2014, 17, 313–320.
  26. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
  27. Chen, E.Y.; Tan, C.M.; Kou, Y.; Duan, Q.; Wang, Z.; Meirelles, G.V.; Clark, N.R.; Ma’ayan, A. Enrichr: Interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 2013, 14, 128.
  28. Kuleshov, M.V.; Jones, M.R.; Rouillard, A.D.; Fernandez, N.F.; Duan, Q.; Wang, Z.; Koplev, S.; Jenkins, S.L.; Jagodnik, K.M.; Lachmann, A.; et al. Enrichr: A comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016, 44, W90–W97.
  29. Xie, Z.; Bailey, A.; Kuleshov, M.V.; Clarke, D.J.B.; Evangelista, J.E.; Jenkins, S.L.; Lachmann, A.; Wojciechowicz, M.L.; Kropiwnicki, E.; Jagodnik, K.M.; et al. Gene Set Knowledge Discovery with Enrichr. Curr. Protoc. 2021, 1, e90.
  30. Gorgolewski, K.J.; Varoquaux, G.; Rivera, G.; Schwarz, Y.; Ghosh, S.S.; Maumet, C.; Sochat, V.V.; Nichols, T.E.; Poldrack, R.A.; Poline, J.B.; et al. NeuroVault.org: A web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Front. Neuroinform. 2015, 9, 8.
  31. Jun, G.; Ibrahim-Verbaas, C.A.; Vronskaya, M. A novel Alzheimer disease locus located near the gene encoding tau protein. Mol. Psychiatry 2016, 21, 108–117.
  32. Yan, J.; Raja V, V.; Huang, Z.; Amico, E.; Nho, K.; Fang, S.; Sporns, O.; Wu, Y.C.; Saykin, A.; Goni, J.; et al. Brain-wide structural connectivity alterations under the control of Alzheimer risk genes. Int. J. Comput. Biol. Drug Des. 2020, 13, 58–70.
  33. Murphy, M.P.; LeVine, H. Alzheimer’s disease and the amyloid-beta peptide. J. Alzheimer’s Dis. 2010, 19, 311–323.
  34. Grimm, M.O.; Mett, J.; Stahlmann, C.P.; Grösgen, S.; Haupenthal, V.J.; Blümel, T.; Hundsdörfer, B.; Zimmer, V.C.; Mylonas, N.T.; Tanila, H.; et al. APP intracellular domain derived from amyloidogenic β- and γ-secretase cleavage regulates neprilysin expression. Front. Aging Neurosci. 2015, 7, 77.
  35. Papuć, E.; Rejdak, K. The role of myelin damage in Alzheimer’s disease pathology. Arch. Med. Sci. 2020, 16, 345–351.
  36. Kolev, M.V.; Ruseva, M.M.; Harris, C.L.; Morgan, B.P.; Donev, R.M. Implication of complement system and its regulators in Alzheimer’s disease. Curr. Neuropharmacol. 2009, 7, 1–8.
  37. Koldamova, R.; Fitz, N.F.; Lefterov, I. The role of ATP-binding cassette transporter A1 in Alzheimer’s disease and neurodegeneration. Biochim. Biophys. Acta 2010, 1801, 824–830.
Figure 1. Flowchart of our analysis pipeline. Step 1 generates detailed SNP-QT association maps (54 SNPs by 116 QTs) for each of the five subject sets examined in our previous study [23]. Step 2 transforms each SNP-QT map into SNP networks by applying different similarity scoring functions to each pair of 116-dimensional SNP vectors. Step 3 uses the multigraph min-max cut algorithm to generate clustering results, which are evaluated by the Silhouette scoring analysis in Step 4. In Step 5, the SNPs in each cluster are mapped to their nearest genes and uploaded to EnrichR for Elsevier pathway analysis to identify relevant biological pathways. In Step 6, NeuroVault and Neurosynth are used to functionally annotate the average brain association pattern for all the SNPs in each cluster.
Figure 2. Detailed imaging genetic association maps (54 SNPs by 116 ROIs) with each entry as a normalized $-\log_{10}$(p-value) from linear regression of ROI vs. SNP within each comparison group. Normalization was performed so that each row has a squared norm of 1. The vertical axis follows the SNP order listed in Table A2. The horizontal axis follows the ROI order listed in Table A3.
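The row normalization described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes that each entry is $-\log_{10}$ of a strictly positive p-value and that p-values have already been computed elsewhere; the function name `association_map` and the toy p-values are hypothetical and are not taken from the study.

```python
import math

def association_map(p_values):
    """Convert a SNP-by-ROI matrix of p-values into a row-normalized map.

    Assumption: every p-value is in (0, 1], so -log10(p) is defined and >= 0.
    """
    rows = []
    for snp_p in p_values:
        # Larger scores correspond to more significant SNP-ROI associations.
        scores = [-math.log10(p) for p in snp_p]
        norm = math.sqrt(sum(s * s for s in scores))
        # Scale the row so that its squared norm is 1; all-zero rows are left as is.
        rows.append([s / norm for s in scores] if norm > 0 else scores)
    return rows

# Toy input: 2 hypothetical SNPs by 3 hypothetical ROIs (made-up p-values).
toy_p = [[0.01, 0.1, 1.0],
         [0.001, 0.001, 0.5]]
toy_map = association_map(toy_p)
```

After this step every SNP row has unit length, so rows can be compared by their pattern across ROIs rather than by their overall significance level.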
Figure 3. Averaged Silhouette scoring of single-graph and multigraph clustering results across 5 scoring functions × 5 comparison groups at each number of clusters. The results of analyzing AV45, FDG, and VBM data are shown from left to right. In the subsequent analyses, we report the multigraph results of clustering SNPs into two groups, which is the optimal case for both AV45 and VBM.
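As a rough illustration of the averaged Silhouette score reported in Figure 3, the sketch below follows the standard definition by Rousseeuw [26]: for node i, a(i) is the mean distance to the other members of its own cluster, b(i) is the smallest mean distance to any other cluster, and s(i) = (b(i) − a(i)) / max(a(i), b(i)). The function names, the toy distance matrix, and the choice of distance are assumptions for illustration; the distance actually used in the study is not specified here.

```python
def silhouette_scores(dist, labels):
    """Per-node Silhouette values from a symmetric distance matrix.

    Assumption: labels contain at least two distinct clusters.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab_i in enumerate(labels):
        own = clusters[lab_i]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a(i): mean distance to the other nodes in the same cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the nodes of any other cluster.
        b = min(sum(dist[i][j] for j in members) / len(members)
                for lab, members in clusters.items() if lab != lab_i)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def average_silhouette(dist, labels):
    s = silhouette_scores(dist, labels)
    return sum(s) / len(s)

# Toy example: four points on a line at 0, 1, 10, and 11 (made-up data).
points = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(p - q) for q in points] for p in points]
good = average_silhouette(toy_dist, [0, 0, 1, 1])  # compact clusters
bad = average_silhouette(toy_dist, [0, 1, 0, 1])   # mixed clusters
```

A higher average indicates tighter, better separated clusters, which is how the candidate numbers of clusters are compared in the figure.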
Figure 4. The SNP networks (54 by 54) constructed by the normalized cosine scoring function. Each entry is the cosine similarity of two corresponding SNP representations (measuring their association patterns with 116 ROIs in the brain). The black line indicates the partition of two clusters.
Figure 5. (a) Cosine SNP network derived from genetic analysis of the AV45 data in the LMCI vs. HC comparison. (b) The Elsevier pathway analysis from EnrichR of SNP group 1 (20 SNPs) and SNP group 2 (34 SNPs). (c) The average normalized brain significance maps corresponding to SNP group 1 (left) and SNP group 2 (right), respectively. (d) Neurosynth analysis results of the two brain maps shown in (c).
Figure 6. (a) Cosine SNP network derived from analyzing VBM data in the AD vs. HC comparison. (b) The Elsevier pathway analysis from EnrichR of SNP group 1 (16 SNPs) and SNP group 2 (38 SNPs). (c) The average normalized brain significance maps corresponding to SNP group 1 (left) and SNP group 2 (right), respectively. (d) Neurosynth analysis results of the two brain maps shown in (c).
Table 1. Let X be the 54-by-116 genetic-imaging matrix. Scoring functions are applied to $X_i, X_j \in \mathbb{R}^{116}$, the 116-dimensional row vectors of X that map the effect of a given SNP to 116 brain regions of interest (ROIs). $X_{ik}$ denotes the entry in the i-th row and k-th column of X, $\bar{X}_i$ denotes the mean of row i, and n = 116 is the number of ROIs. Note that the Manhattan and Euclidean distances need to be transformed into the corresponding similarity measures using the Gaussian radial basis function shown in the third column, where $d_{min}$ and $d_{max}$ are the smallest and largest pairwise distances.

| Measurement | Scoring Function | Normalized Similarity |
|---|---|---|
| Pearson correlation | $r(i,j) = \dfrac{\sum_{k=1}^{n} (X_{ik} - \bar{X}_i)(X_{jk} - \bar{X}_j)}{\sqrt{\sum_{k=1}^{n} (X_{ik} - \bar{X}_i)^2}\,\sqrt{\sum_{k=1}^{n} (X_{jk} - \bar{X}_j)^2}}$ | $\lvert r(i,j) \rvert$ |
| Spearman correlation | $\rho(i,j) = 1 - \dfrac{6 \sum_{k=1}^{n} \left( \mathrm{rank}(X_{ik}) - \mathrm{rank}(X_{jk}) \right)^2}{n(n^2 - 1)}$ | $\lvert \rho(i,j) \rvert$ |
| Manhattan distance | $d(i,j) = \lVert X_i - X_j \rVert_1$ | $\exp\left( -0.5 \left( \dfrac{d(i,j) - d_{min}}{(d_{max} - d_{min})/3} \right)^2 \right)$ |
| Euclidean distance | $d(i,j) = \lVert X_i - X_j \rVert_2$ | $\exp\left( -0.5 \left( \dfrac{d(i,j) - d_{min}}{(d_{max} - d_{min})/3} \right)^2 \right)$ |
| Cosine | $\cos(i,j) = \dfrac{X_i \cdot X_j}{\lVert X_i \rVert \, \lVert X_j \rVert}$ | $\lvert \cos(i,j) \rvert$ |
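The scoring functions in Table 1 can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy 3-by-4 matrix, the ordinal ranking without ties, and the choice to take $d_{min}$ and $d_{max}$ over all pairwise entries (including the zero diagonal) are assumptions made only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mean_x) ** 2 for a in x))
           * math.sqrt(sum((b - mean_y) ** 2 for b in y)))
    return num / den

def ranks(x):
    """Ordinal ranks 1..n (assumption: the vector has no tied values)."""
    order = sorted(range(len(x)), key=lambda k: x[k])
    r = [0] * len(x)
    for position, k in enumerate(order, start=1):
        r[k] = position
    return r

def spearman(x, y):
    """Spearman correlation via the rank-difference formula of Table 1."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def correlation_network(X, score):
    """Similarity = absolute value of a correlation-type score."""
    m = len(X)
    return [[abs(score(X[i], X[j])) for j in range(m)] for i in range(m)]

def distance_network(X, dist):
    """Similarity = Gaussian RBF of a distance, sigma = (d_max - d_min) / 3.

    Assumption: d_min and d_max are taken over all pairwise entries,
    and at least two rows of X differ so that sigma > 0.
    """
    m = len(X)
    D = [[dist(X[i], X[j]) for j in range(m)] for i in range(m)]
    d_min = min(min(row) for row in D)
    d_max = max(max(row) for row in D)
    sigma = (d_max - d_min) / 3
    return [[math.exp(-0.5 * ((D[i][j] - d_min) / sigma) ** 2)
             for j in range(m)] for i in range(m)]

def similarity_networks(X):
    """Build the five SNP-SNP similarity networks from the rows of X."""
    return {
        "pearson": correlation_network(X, pearson),
        "spearman": correlation_network(X, spearman),
        "manhattan": distance_network(X, manhattan),
        "euclidean": distance_network(X, euclidean),
        "cosine": correlation_network(X, cosine),
    }

# Toy stand-in for the 54-by-116 matrix: 3 "SNPs" by 4 "ROIs".
X = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.5],
     [4.0, 3.0, 2.0, 1.0]]
networks = similarity_networks(X)
```

Each entry of `networks` is a symmetric matrix with values between 0 and 1 and ones on the diagonal, so the five networks are on a comparable scale before clustering. For real data, a numerical library would typically replace the nested loops, and tied values would need an average-rank treatment in the Spearman score.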
Table 2. Objective functions of single-graph and multigraph clustering. A is the adjacency matrix, which is equivalent to the similarity network in this study. D is the diagonal matrix of A. Q contains the output clustering labels, with $q_k$ denoting its k-th column. K is the number of clusters. In the multigraph case, $A_v$ and $D_v$ are the corresponding matrices of the v-th of m graphs.

| Graph Cut Algorithm for Cluster Analysis | Objective Function |
|---|---|
| Single-graph min-max cut | $\min_{Q^T Q = I} \sum_{k=1}^{K} \dfrac{q_k^T D q_k}{q_k^T A q_k}$ |
| Multigraph min-max cut | $\min_{Q^T Q = I} \sum_{v=1}^{m} \sum_{k=1}^{K} \dfrac{q_k^T D_v q_k}{q_k^T A_v q_k}$ |
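To make the objectives in Table 2 concrete, the sketch below evaluates them for a given hard partition. It does not solve the minimization, and it is not the authors' code: the function names and the toy graph are hypothetical, and D is assumed to be the diagonal degree matrix whose entries are the row sums of A. Because each term is a ratio, the scaling of $q_k$ cancels, so unnormalized 0/1 indicator vectors give the same value as orthonormal ones.

```python
def minmax_cut_objective(A, labels):
    """Sum over clusters of (q_k^T D q_k) / (q_k^T A q_k) for one graph.

    A is a symmetric similarity matrix (list of lists) and labels[i] is
    the cluster of node i. Assumption: D = diag(row sums of A).
    """
    n = len(A)
    degree = [sum(row) for row in A]
    total = 0.0
    for k in sorted(set(labels)):
        members = [i for i in range(n) if labels[i] == k]
        # q_k^T D q_k: within-cluster weight plus weight cut by the cluster.
        q_d_q = sum(degree[i] for i in members)
        # q_k^T A q_k: within-cluster weight only.
        q_a_q = sum(A[i][j] for i in members for j in members)
        total += q_d_q / q_a_q
    return total

def multigraph_objective(graphs, labels):
    """Multigraph objective: one shared labeling scored on every graph."""
    return sum(minmax_cut_objective(A, labels) for A in graphs)

# Toy similarity graph with two tightly connected pairs of nodes.
A = [[1.0, 0.9, 0.1, 0.1],
     [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.9],
     [0.1, 0.1, 0.9, 1.0]]

good = minmax_cut_objective(A, [0, 0, 1, 1])   # respects the two pairs
bad = minmax_cut_objective(A, [0, 1, 0, 1])    # splits both pairs
both = multigraph_objective([A, A], [0, 0, 1, 1])
```

Each ratio equals one plus the cut weight divided by the within-cluster weight, so the objective is never below K and is smaller for partitions that cut little weight relative to what they keep inside clusters. In the multigraph case the same labeling is scored on all graphs, which is what forces a single consensus partition.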
Table 3. Silhouette scoring functions. Let $C_I$ be the cluster which node i belongs to.

| Measure | Calculation |
|---|---|
| Mean distance | $a(i) = \dfrac{1}{\lvert C_I \rvert - 1} \sum_{j \in C_I,\, j \neq i} d(i,j)$ |
| Mean dissimilarity | $b(i) = \min_{J \neq I} \dfrac{1}{\lvert C_J \rvert} \sum_{j \in C_J} d(i,j)$ |
| Silhouette value | $s(i) = \dfrac{b(i) - a(i)}{\max(a(i), b(i))}$ |
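The silhouette calculation in Table 3 can be sketched as follows, assuming a precomputed pairwise distance matrix and at least two clusters. The function name and toy data are hypothetical, and assigning a silhouette value of 0 to a node that is alone in its cluster is a common convention rather than something stated in the table.

```python
def silhouette_values(D, labels):
    """Per-node silhouette values s(i) from a pairwise distance matrix D.

    D[i][j] is the distance d(i, j); labels[i] is the cluster of node i.
    Assumption: there are at least two distinct clusters.
    """
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)

    values = []
    for i, own_label in enumerate(labels):
        own = clusters[own_label]
        if len(own) == 1:
            # a(i) is undefined for a singleton; use 0 by convention.
            values.append(0.0)
            continue
        # a(i): mean distance to the other members of the node's cluster.
        a = sum(D[i][j] for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(sum(D[i][j] for j in members) / len(members)
                for c, members in clusters.items() if c != own_label)
        values.append((b - a) / max(a, b))
    return values

# Toy example: four points on a line at positions 0, 1, 10 and 11.
positions = [0.0, 1.0, 10.0, 11.0]
D = [[abs(p - q) for q in positions] for p in positions]

s_good = silhouette_values(D, [0, 0, 1, 1])  # nearby points grouped together
s_bad = silhouette_values(D, [0, 1, 0, 1])   # nearby points separated
mean_good = sum(s_good) / len(s_good)
```

Values close to 1 indicate that a node sits much nearer to its own cluster than to the next closest one, while negative values, as for the second labeling here, suggest the node would fit better elsewhere. Averaging s(i) over all nodes gives a single score for comparing candidate partitions.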

Wu, R.; Bao, J.; Kim, M.; Saykin, A.J.; Moore, J.H.; Shen, L.; on behalf of ADNI. Mining High-Level Imaging Genetic Associations via Clustering AD Candidate Variants with Similar Brain Association Patterns. Genes 2022, 13, 1520. https://doi.org/10.3390/genes13091520
