Article

On the Analysis of a Repeated Measure Design in Genome-Wide Association Analysis

1
The Center for Genome Science, Korea National Institute of Health, KCDC, Osong 361-951, Korea
2
Department of Applied Statistics, Chung-Ang University, Seoul 156-756, Korea
3
Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, USA
4
Department of Statistics, Inha University, Incheon 402-751, Korea
5
Department of Public Health Science, Seoul National University, Seoul 151-742, Korea
*
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2014, 11(12), 12283-12303; https://doi.org/10.3390/ijerph111212283
Submission received: 31 July 2014 / Revised: 7 November 2014 / Accepted: 18 November 2014 / Published: 28 November 2014
(This article belongs to the Special Issue Genetic Epidemiology)

Abstract

Longitudinal data enable detection of the effects of aging and time and, as a repeated measures design, are statistically more efficient than cross-sectional data when the correlations between repeated measurements are not large. In particular, when genotyping costs more than phenotyping, collecting longitudinal data can be an efficient strategy for genetic association analysis. However, in spite of these advantages, genome-wide association studies (GWAS) have rarely been analyzed in a way that takes the longitudinal structure of the data into account. In this report, we calculate the sample size required to achieve 80% power at the genome-wide significance level for both longitudinal and cross-sectional data, and compare their statistical efficiency. Furthermore, we conducted GWAS of eight phenotypes with three observations on each individual in the Korean Association Resource (KARE) cohort. A linear mixed model allowing for the correlations between observations on each individual was applied to the longitudinal data, and linear regression was used to analyze the first observation on each individual as cross-sectional data. We found 12 novel genome-wide significant disease susceptibility loci that were then confirmed in the Health Examinee (HEXA) cohort, as well as some significant interactions between age/sex and SNPs.

1. Introduction

Disease prognosis and personalized medicine require the identification of genetic and non-genetic risk factors and, with the rapid improvement of genotyping technology, more than ten thousand genome-wide association studies (GWAS) have been conducted to discover disease susceptibility loci. Since the first such successful study in 2005 [1], more than ten thousand disease susceptibility loci have been successfully identified and these findings have improved our understanding of the genetic background of human diseases. However, in spite of these successes in GWAS, causal genetic variants identified by GWAS explain only a small proportion of the heritability [2,3]. Various reasons, including the common disease/rare variant hypothesis, have been put forward to explain this so-called missing heritability [4]. However, the missing heritability is partially attributable to a large number of false negative findings induced by insufficient sample sizes when controlling for multiple testing [5], and various strategies, such as GWAS using multiple phenotypes or longitudinal data [6,7], have been considered to overcome these problems. The analysis of multiple phenotypes can suffer from their inherent heterogeneity, but the analysis of the multiple measures of the same phenotype provided by longitudinal data may avoid this issue and, if measurement errors are substantial, GWAS with longitudinal data can be expected to mitigate the sample size problem.
Although few GWAS have used longitudinal data [8,9,10], longitudinal data have several useful features compared to cross-sectional data. First, although phenotyping is sometimes more expensive than genotyping, in situations where genotyping is the more expensive of the two, repeated measurements at different time points have the effect of virtually enlarging the sample size. Second, with longitudinal data, the total phenotypic variance can be decomposed into among-subject and within-subject components. Third, phenotypes at different time points can be compared with baseline phenotypes, so that any confounding effect due to age can be avoided. Fourth, the onset of some diseases is sometimes affected by genetic variants, and gene × age interaction can be estimated with better accuracy. In this report, we conducted GWAS with longitudinal data in the Korean Association Resource (KARE) cohort. Phenotypes in the KARE cohort were measured every two years from 2001 to 2005, and we performed GWAS for eight phenotypes with three repeated measurements: systolic blood pressure (SBP), diastolic blood pressure (DBP), fasting plasma glucose (GLU0), 2-h OGTT glucose (GLU120), height, body mass index (BMI), high-density lipoprotein (HDL) and aspartate aminotransferase (AST). Results from the longitudinal GWAS were compared with those from GWAS using cross-sectional data, and the longitudinal analysis provided more significant results. We identified 12 novel variants associated with phenotypes: rs11067763 (near MED13L) for DBP; rs12991703 (near MARCO) and rs7197218 (in XYLT1) for GLU0; rs17178527 (in AK097143) for BMI; rs12292858 (in SIK3), rs11066280 (in HECTD4) and rs183786 (near ALDH1A2) for HDL; and rs10849915 (in CCDC63), rs3782889 (in MYL2), rs12229654 (near MYL2-CUX2), rs11066280 (in HECTD4) and rs2072134 (in OAS3) for log-transformed AST. These variant associations were found to replicate in the Health Examinee (HEXA) cohort and thus illustrate the practical value of a longitudinal data analysis.

2. Materials and Methods

2.1. The Korean Association Resource (KARE) Cohort

The KARE cohort consists of a total of 10,038 individuals (5018 and 5020 individuals from Ansung and Ansan, respectively). Participants ranged from 40 to 69 years old, and their phenotypes were measured at two-year intervals from 2001 to 2005. Among the 10,038 participants, 10,004 individuals were genotyped for 500,568 SNPs with the Affymetrix Genome-Wide Human SNP array 5.0. Individuals and SNPs with call rates less than 95% were excluded from the analysis. SNPs with p-values for Hardy-Weinberg equilibrium (HWE) less than $10^{-6}$, or with minor allele frequencies (MAF) less than 0.01, were eliminated. Furthermore, individuals with tumors, gender inconsistencies, heterozygosity rates of more than 30%, or identity in state (IBS) of more than 0.8 were excluded from the analysis [11]. In total, 8842 individuals with 352,228 SNPs were available at the baseline time-point. At the second and third time-points, some phenotypes were missing, and phenotypes for 7568 and 6675 individuals, respectively, were available.
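The SNP-level filters above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the thresholds (call rate 95%, HWE p-value $10^{-6}$, MAF 0.01) come from the text, but the function names are hypothetical and the 1-df chi-square goodness-of-fit test for HWE is an assumption, since the text does not say which HWE test was used.

```python
import math

def maf(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1 - freq_a)

def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square goodness-of-fit test for HWE (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chisq / 2))

def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, min_hwe_p=1e-6, min_maf=0.01):
    """Apply the three SNP filters; assumes at least one called genotype."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / (n_called + n_missing)
    return (call_rate >= min_call_rate
            and hwe_chisq_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p
            and maf(n_aa, n_ab, n_bb) >= min_maf)

# Hypothetical genotype counts, for illustration only
print(passes_snp_qc(250, 500, 250, n_missing=10))   # in HWE, common, well called
print(passes_snp_qc(500, 0, 500, n_missing=0))      # no heterozygotes at all
```

An exact HWE test would be a reasonable substitute for rare variants, where the chi-square approximation is less reliable.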

2.2. The Health Examinee (HEXA) Cohort

Independent individuals in the HEXA cohort were from a second population-based cohort sample provided by the Health study. This study combines subjects from the Wonju, Pyeong Chang, Gangneung, Geumsan, and Naju regional cohorts in Korea. There are 120,000 participants in the HEXA cohort, 4302 of whom, between 40 and 69 years old, were randomly selected for genotyping with the Affymetrix Genome-Wide Human SNP array 6.0. Individuals in the HEXA cohort were used to replicate the significant findings from the KARE cohort, and individuals with tumors, large heterozygosity rates, gender inconsistencies, evidence of non-Asian ancestry, IBS of more than 0.8, or call rates of less than 95% were excluded from analysis [12]. SNPs with p-values for HWE less than $10^{-6}$, genotype call rates less than 95%, or MAF less than 0.01 were excluded, and the remaining SNPs for 3703 individuals were used for the analysis. Note that the HEXA cohort is a cross-sectional study; if some SNPs have an effect on phenotypes that progresses over time, results from the HEXA and KARE cohorts could be heterogeneous.

2.3. Sample Size Calculation for a Longitudinal Study

Sample sizes required to achieve 0.8 power at the $10^{-8}$ significance level were calculated in both the presence and absence of population substructure. We denote the number of individuals and the number of measurements for each individual by $n$ and $t$, respectively, and assume that $t$ is the same for all individuals. The additively coded value of the genotype for individual $i$ is denoted by $X_i$. We assume Hardy-Weinberg equilibrium, so that each individual's genotype follows a trinomial distribution. The effect of the disease allele is assumed to be $\beta$. We assume a matrix of environmental effects that does not contain time-varying covariates for individual $i$, denoted by $Z_i$, whose columns correspond to the covariates and whose coefficient is the column vector $\alpha$. The phenotype for individual $i$ at time-point $j$ is denoted by $Y_{ij}$, and the corresponding $t$-dimensional vector by $Y_i$. Letting $1_t$ be the $t$-dimensional column vector with all elements equal to 1, $Y_i$ was assumed to be:
$$Y_i = Z_i\alpha + (X_i\beta)1_t + g_i 1_t + c_i 1_t + \varepsilon_i, \quad c_i \sim N(0, \sigma_c^2), \quad \varepsilon_i \sim MVN(0, \sigma_\varepsilon^2 I)$$
where $c_i$ indicates the random effect which explains the similarity of repeated measurements for each individual attributable to a non-polygenic effect. We denote $\sigma_p^2 = \sigma_g^2 + \sigma_c^2 + \sigma_\varepsilon^2$ and $\rho = \sigma_c^2/\sigma_p^2$. Thus $\rho$ measures the proportion of the phenotypic variance attributable to $c_i$. In particular, we included $g_i$ in the model to provide a polygenic effect for individual $i$, and assumed as an approximation that $g = (g_1, g_2, \ldots, g_n)$ follows the multivariate normal distribution with mean vector 0 and variance-covariance matrix $\sigma_g^2\Phi$, where $\Phi$ corresponds to the genetic relationship matrix. If $\sigma_g^2$ is 0, the correlation matrix $R$ of $Y_i$ becomes compound symmetric. The null hypothesis $H_0$ is $\beta = 0$, and $\beta$ is assumed to be $\beta_a$ under the alternative hypothesis.
For sample size calculation, we assumed that there is no covariate effect other than that of the genotype, so that the environmental variable $Z_i$ is $1_t$. Then, if we let $\pi_l = P(X_i = \delta_l, Z_i = 1) = P(X_i = \delta_l)$ for the coded genotype $\delta_l$, where $\delta_1 = 0$, $\delta_2 = 1$ and $\delta_3 = 2$, and $K = \pi_1(\pi_2 + 2\pi_3)^2 + \pi_2(\pi_3 - \pi_1)^2 + \pi_3(1 + \pi_1 - \pi_3)^2$, the required sample size for power $1 - \phi$ at the significance level $\alpha$ can be derived to be:
$$n_\alpha = \frac{(z_{1-\alpha/2} + z_{1-\phi})^2\,\sigma_p^2}{K\beta_a^2}\left(\frac{1 + (t-1)\rho}{t}\right)$$
(see the Appendix for the detailed derivation). Liu and Liang [13] derived the required sample sizes when $X_i$ is binary, and our results are based on their derivations; we extend their result to $X_i$ with arbitrarily many levels. For $n_\alpha$, we assume that $\sigma_c^2 + \sigma_\varepsilon^2 = 1$ and:
$$\frac{2\beta^2 p(1-p)}{2\beta^2 p(1-p) + \sigma_c^2 + \sigma_\varepsilon^2} = 0.005$$
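The closed-form sample size can be evaluated numerically. The sketch below makes several assumptions: $K$ is read as the variance of the additively coded genotype, which under HWE equals $2p(1-p)$ with $p$ taken to be the disease allele frequency; $\sigma_p^2 = 1$ in the absence of population substructure; and the quadratic form of the $z$ term follows the standard power calculation. The function names are hypothetical. Under these assumptions the output should come out close to the figures quoted in Section 3.1 (5158 individuals for $\rho = 0.4$ and $t = 3$).

```python
import math
from statistics import NormalDist

def genotype_variance(p):
    """K = Var(X) for an additively coded genotype (0, 1, 2) under HWE."""
    pi = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)   # P(X = 0), P(X = 1), P(X = 2)
    mean = pi[1] + 2 * pi[2]
    return sum(w * (x - mean) ** 2 for x, w in zip((0, 1, 2), pi))

def required_n(k_beta_sq, t, rho, alpha=1e-8, power=0.8, sigma_p_sq=1.0):
    """Required number of individuals; k_beta_sq stands for K * beta_a**2."""
    std_normal = NormalDist()
    z_alpha = -std_normal.inv_cdf(alpha / 2)   # z_{1 - alpha/2}, taken from the lower tail
    z_power = std_normal.inv_cdf(power)        # z_{1 - phi}
    design = (1 + (t - 1) * rho) / t           # effect of t repeated measures
    return (z_alpha + z_power) ** 2 * sigma_p_sq / k_beta_sq * design

# The variant explains 0.5% of the variance and sigma_c^2 + sigma_eps^2 = 1,
# so K * beta^2 = 0.005 / 0.995 whatever the allele frequency is.
k_beta_sq = 0.005 / (1 - 0.005)
n_cross = math.ceil(required_n(k_beta_sq, t=1, rho=0.0))
n_long = math.ceil(required_n(k_beta_sq, t=3, rho=0.4))
print(n_cross, n_long)
```

Because only the product $K\beta_a^2$ enters the formula, the allele frequency does not need to be specified once the proportion of variance explained is fixed.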
In the presence of population substructure, $\sigma_g^2$ is assumed to be larger than 0. The required sample size cannot then be calculated directly, and we calculated $n_\alpha$ by Monte Carlo simulation. In our simulations, we assumed that $\sigma_g^2 + \sigma_c^2 + \sigma_\varepsilon^2 = 1$, and the effect of the disease allele, $\beta$, was calculated under the following assumptions:
$$\frac{2\beta^2 p(1-p)}{2\beta^2 p(1-p) + \sigma_g^2 + \sigma_c^2 + \sigma_\varepsilon^2} = 0.005 \quad \text{and} \quad h^2 = \frac{2\beta^2 p(1-p) + \sigma_g^2}{2\beta^2 p(1-p) + \sigma_g^2 + \sigma_c^2 + \sigma_\varepsilon^2} = 0.3$$
where these equations indicate that the proportion of genetic variance explained by the causal genotype is 0.005 and heritability is 0.3. For convenience, Φ was assumed to be a compound symmetric matrix with off-diagonal elements 0.1. The off-diagonal elements are asymptotically equivalent to twice the kinship coefficient between two individuals, and 0.1 means that the individuals are genetically remote relatives.
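The two constraints above determine $\beta$ and $\sigma_g^2$ once an allele frequency is chosen. The following sketch solves them under the stated normalization $\sigma_g^2 + \sigma_c^2 + \sigma_\varepsilon^2 = 1$. The allele frequency `p = 0.2` and the value `rho = 0.4` are hypothetical inputs chosen for illustration, and the function name is not from the paper.

```python
import math

def simulation_parameters(p, rho, var_explained=0.005, h2=0.3):
    """Solve the two variance constraints for beta and the variance components."""
    g = var_explained / (1 - var_explained)   # 2 * beta^2 * p * (1 - p)
    beta = math.sqrt(g / (2 * p * (1 - p)))
    sigma_g_sq = h2 * (g + 1) - g             # from h^2 = (g + sigma_g^2) / (g + 1)
    sigma_c_sq = rho                          # rho = sigma_c^2 / sigma_p^2, sigma_p^2 = 1
    sigma_eps_sq = 1 - sigma_g_sq - sigma_c_sq
    if sigma_eps_sq < 0:
        raise ValueError("rho is too large for the requested heritability")
    return {"beta": beta, "sigma_g_sq": sigma_g_sq,
            "sigma_c_sq": sigma_c_sq, "sigma_eps_sq": sigma_eps_sq}

params = simulation_parameters(p=0.2, rho=0.4)   # hypothetical p and rho
print({name: round(value, 4) for name, value in params.items()})
```

With these constraints $\sigma_g^2$ is a little under 0.3, so $\rho$ cannot exceed roughly 0.7 without making the residual variance negative.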

2.4. Genome-Wide Association Studies (GWAS) Using Longitudinal and Cross-Sectional Data

In the Korean Association Resource (KARE) data, eight phenotypes (SBP, DBP, GLU0, GLU120, height, BMI, HDL, and AST) were observed every two years from 2001 to 2005, so that three observations were available for each phenotype. Genotypes and phenotypes for 8842 individuals were initially available. However, there were some missing phenotypes for follow-up observations, and only 7568 and 6675 individuals were observed for the second and third time-points, respectively. Reasons for dropout were not known, but may include death, immigration, and non-response.
GWAS using longitudinal data were performed by generalized least squares using the nlme package in the R software. The phenotype for individual $i$ is denoted by $Y_i$, which is a three-dimensional vector. The matrix $Z_i$ contains the covariates for environmental effects, with the intercept as the first column; sex, age and age$^2$ at the first time-point were included as covariates for all eight phenotypes. In particular, weight is known to be related to glucose levels, and thus it was included as an additional covariate for the GWAS of GLU0 and GLU120 [14,15]. The coefficient vector of $Z_i$ is denoted by $\alpha$. The effect of the time interval can be understood as the effect of aging, and it was denoted by the vector $w$. Here, $w$ is a three-dimensional vector and its coefficient is $\eta$. The population substructure between individuals was adjusted for with the EIGENSTRAT approach [16], and the remaining potential bias unadjusted by EIGENSTRAT was further adjusted for by the genomic control method [17]. In particular, the IBS matrix is often better than the identity-by-descent matrix for capturing the long-distance relationships that result from variations at the population level [18], and we used the IBS matrix for EIGENSTRAT. The first five principal component (PC) scores accounted for 75% of the variation in the IBS matrix, and they were used as covariates to adjust for any population substructure. The PC score vector for individual $i$ and its coefficient vector are $PC_i$ and $\gamma$, respectively. The additively coded value of the genotype for individual $i$ is denoted by $X_i$. The effect of the disease allele is assumed to be $\beta$. The variance-covariance matrix for $\varepsilon_i$ is denoted by $\Sigma$ and assumed to be an unstructured symmetric matrix. For longitudinal analysis, our final model is:
$$Y_i = Z_i\alpha + (X_i\beta)1_3 + PC_i\gamma + w\eta + \varepsilon_i$$
where $w = (0, 2, 4)$ and $\varepsilon_i = (\varepsilon_{i1}, \varepsilon_{i2}, \varepsilon_{i3})$ is distributed as $MVN(0, \Sigma)$ for $i = 1, 2, \ldots, 8842$.
Furthermore, we conducted GWAS using cross-sectional data for comparison with the GWAS using longitudinal data. For this we took the phenotypes at the first time-point in the KARE cohort, and the GWAS were conducted with linear regression. For the cross-sectional analysis, we used the same covariates except for the time interval, with the linear model:
$$Y_i = Z_i\alpha + X_i\beta + \sum_{k=1}^{5} PC_{ik}\gamma_k + \varepsilon_i$$
where $\varepsilon_i$ is distributed as $N(0, \sigma^2)$ for $i = 1, 2, \ldots, 8842$. The results from longitudinal and cross-sectional data were compared.
We tested whether there were any interaction effects between our significant findings and environmental variables by adding interaction terms as covariates. We considered age, sex and time interval as environmental variables, and their statistical interactions with the SNPs were tested. Significant results were further tested for replication in the HEXA cohort (other than for time interval, because the HEXA cohort only has cross-sectional data). Finally, we tested all the significant findings from the KARE cohort, as a discovery dataset, in the HEXA cohort, as a replication dataset.

3. Results

3.1. Sample Size for a Longitudinal Cohort Design

We calculated the sample size required to achieve 0.8 power at the genome-wide significance level $\alpha = 10^{-8}$ in both the absence and presence of population substructure. Figure 1 and Figure 2 respectively show the required sample size $n_\alpha$, in the absence and presence of population substructure, as a function of the number of time points $t$ and the common correlation $\rho$ between phenotype measures on the same person. We found that the required sample size $n_\alpha$ decreases as $t$ increases and increases with $\rho$. Sample size is minimized for small $\rho$ and large $t$, and the effect of $t$ on sample size is maximal when $\rho = 0$. These results illustrate the practical efficiency of GWAS with longitudinal data. For instance, if $\rho = 0.4$ and $t = 3$, then 5158 individuals are sufficient to achieve 0.8 power at the genome-wide significance level; compared to cross-sectional data, the genotyping costs for 3438 individuals are saved at the expense of obtaining 6878 additional phenotype measurements.
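The arithmetic behind this trade-off can be made explicit. In the sketch below the cross-sectional sample size of 8596 is inferred as 5158 + 3438 rather than quoted directly, and the break-even ratio assumes a simple cost model with one fixed unit cost per genotyped individual and one per phenotype measurement. Both the cost model and the function names are assumptions made for illustration.

```python
def design_tradeoff(n_cross, n_long, t):
    """Return (genotyped individuals saved, additional phenotype measurements)."""
    genotypes_saved = n_cross - n_long
    extra_phenotypes = n_long * t - n_cross
    return genotypes_saved, extra_phenotypes

def break_even_cost_ratio(n_cross, n_long, t):
    """Genotype-to-phenotype unit cost ratio above which the longitudinal design is cheaper."""
    saved, extra = design_tradeoff(n_cross, n_long, t)
    return extra / saved

saved, extra = design_tradeoff(n_cross=8596, n_long=5158, t=3)
print(saved, extra, round(break_even_cost_ratio(8596, 5158, 3), 2))
```

Under this cost model, the longitudinal design with $\rho = 0.4$ and $t = 3$ is the cheaper one whenever genotyping an individual costs more than about twice a single phenotype measurement.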
Figure 1. Required sample size in the absence of population substructure. The sample size is indicated by $n$. The required sample size to achieve 0.8 power at the significance level $\alpha = 10^{-8}$ has been calculated as a function of $t$, the number of time points, and $\rho$, the correlation between measurements at two different time-points.

3.2. Genome-Wide Association Studies (GWAS) with Longitudinal Data for Eight Phenotypes

Table 1 and Table 2 provide descriptive statistics for sex, age and other available phenotypes from the KARE and HEXA cohorts. These show that the distributions of phenotypes in the HEXA cohort are similar to those in the KARE cohort at each time point. We checked the normality of the eight phenotypes with histograms. In particular, AST was not normally distributed and so was log-transformed. Figure 3 shows that log-transformed AST and the other seven phenotypes on the original scale are approximately normally distributed, so these were used for the GWAS.
Figure 2. Required sample size in the presence of population substructure. The sample size is indicated by $n$. The required sample size to achieve 0.8 power at the significance level $\alpha = 10^{-8}$ has been calculated as a function of $t$, the number of time points, and $\rho$, the correlation between measurements at two different time-points.
Table 1. Sample sizes for Korean Association Resource (KARE) cohort.

| | Time point 1: N (Ansan/Ansung) | Time point 1: Age (s.d.) | Time point 2: N (Ansan/Ansung) | Time point 2: Age (s.d.) | Time point 3: N (Ansan/Ansung) | Time point 3: Age (s.d.) |
|---|---|---|---|---|---|---|
| Male | 2374/1809 | 51.78 (8.79) | 1967/1642 | 53.71 (8.82) | 1758/1424 | 55.58 (8.71) |
| Female | 2263/2396 | 52.61 (9.02) | 1764/2213 | 54.60 (8.99) | 1543/1950 | 56.48 (8.90) |
| Total | 4637/4205 | 52.22 (8.92) | 3731/3855 | 54.18 (8.92) | 3301/3374 | 56.05 (8.82) |
Table 2. Descriptive statistics for eight quantitative phenotypes examined in the Korean Association Resource (KARE) and Health Examinee (HEXA) cohorts.

| Phenotype | KARE time point 1: Mean (s.d.) | N | KARE time point 2: Mean (s.d.) | N | KARE time point 3: Mean (s.d.) | N | HEXA: Mean (s.d.) | N |
|---|---|---|---|---|---|---|---|---|
| SBP | 121.65 (18.61) | 8842 | 118.6 (17.3) | 7504 | 116.6 (16.62) | 6646 | 121.69 (14.36) | 3703 |
| DBP | 80.26 (11.46) | 8842 | 78.49 (10.96) | 7504 | 77.69 (10.25) | 6646 | 77.05 (9.84) | 3703 |
| GLU0 | 87.66 (21.88) | 8581 | 92.74 (15.14) | 6688 | 92.31 (15.15) | 5985 | 94.10 (24.56) | 3703 |
| GLU120 | 126.76 (51.03) | 8387 | 125.77 (41.59) | 4865 | 134.07 (50.56) | 5985 | Not available | |
| height | 160 (8.67) | 8842 | 159.93 (8.74) | 7461 | 159.95 (8.76) | 6596 | 161.49 (8.10) | 3703 |
| BMI | 24.6 (3.12) | 8838 | 24.59 (3.09) | 7456 | 24.52 (3.05) | 6596 | 23.96 (2.88) | 3703 |
| HDL | 44.65 (10.09) | 8841 | 46.27 (9.90) | 7495 | 44.04 (10.25) | 6640 | 54.60 (13.27) | 3703 |
| AST | 29.81 (18.41) | 8841 | 24.67 (14.95) | 7495 | 25.87 (19.02) | 6640 | 24.51 (12.94) | 3703 |
The results of the GWAS with longitudinal data in the KARE cohort were compared with the results from GWAS using cross-sectional data. For the cross-sectional data we used the phenotypes at the first time-point, applying linear regression. Population substructure was adjusted for with the EIGENSTRAT method, and five principal component (PC) scores were included as covariates in both the longitudinal and cross-sectional data analyses. We found that five PC scores explain roughly 75% of the variation in the IBS matrix, and Table 3 shows the estimated variance inflation factors, $\lambda$, obtained by genomic control [17]. The estimated variance inflation factors from the longitudinal data analyses were always slightly larger than those from the cross-sectional data analyses, which suggests that longitudinal data analysis tends to be more sensitive to population substructure. Even though more detailed sensitivity analyses are necessary to confirm whether the model assumptions for the longitudinal data analyses are satisfied, our findings are probably not affected by population substructure, because the quantile-quantile (QQ) and Manhattan plots for the eight phenotypes in Supplementary Figures 1–4 consistently support the validity of our analysis.
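For readers unfamiliar with genomic control, the inflation factor $\lambda$ is commonly estimated as the median of the observed 1-df chi-square association statistics divided by the median of the chi-square distribution with 1 df (about 0.455). The sketch below uses that common estimator. The text does not state which estimator was actually used, so this is an assumption, and all function names are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square variable with 1 df, about 0.455
NULL_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2

def chisq_from_p(p_value):
    """Convert a two-sided p-value in (0, 1) to a 1-df chi-square statistic."""
    return _STD_NORMAL.inv_cdf(p_value / 2) ** 2

def inflation_factor(chisq_stats):
    """Median-based estimate of the genomic control inflation factor."""
    return median(chisq_stats) / NULL_MEDIAN

def genomic_control(chisq_stats):
    """Divide the statistics by lambda, never inflating them (lambda >= 1)."""
    lam = max(inflation_factor(chisq_stats), 1.0)
    return lam, [s / lam for s in chisq_stats]

# Synthetic example: null p-values on a regular grid, then inflated by 5%
null_stats = [chisq_from_p((i - 0.5) / 1000) for i in range(1, 1001)]
inflated = [1.05 * s for s in null_stats]
lam, corrected = genomic_control(inflated)
print(round(inflation_factor(null_stats), 3), round(lam, 3))
```

Values such as those in Table 3, all between 1.02 and 1.08, indicate mild inflation under this estimator.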
Figure 3. Histograms for SBP, DBP, GLU0, GLU120, HEIGHT, BMI, HDL and log AST in the KARE cohort.
Table 3. Inflation factors by genomic control.

| Phenotype | Cross-Sectional Data | Longitudinal Data |
|---|---|---|
| SBP | 1.040 | 1.051 |
| DBP | 1.028 | 1.053 |
| GLU0 | 1.022 | 1.026 |
| GLU120 | 1.026 | 1.037 |
| height | 1.069 | 1.071 |
| BMI | 1.046 | 1.052 |
| HDL | 1.034 | 1.039 |
| log AST | 1.023 | 1.030 |
We calculated the correlations between the different time-points for each phenotype and they are presented in Table 4. The correlations for height and BMI are very large and those for log AST are the smallest. Therefore, the improvement in power from using longitudinal data is expected to be the most substantial for log AST, and it seems to be almost negligible for height and BMI. Table 5 and Table 6 show the results from GWAS using longitudinal data and cross-sectional data in the KARE cohort, and the significant results were further tested in the HEXA cohort. The cross-sectional data for the KARE cohort are the first measurements for each individual in the longitudinal data. Cross-sectional and longitudinal data in the KARE cohort were analyzed with linear regression and a linear mixed model, respectively, and SNPs with p-values less than $10^{-6}$ from either the cross-sectional or the longitudinal data analysis were selected for the replication studies. For the discovery analyses, the genome-wide significance level by Bonferroni correction is $1.4 \times 10^{-7}$. For replication, we calculated the one-sided p-value for the direction from the longitudinal analysis using the KARE data, and used 0.05 as the significance level. Whenever the results from the two cohorts were in different directions, the p-values from the HEXA cohort were larger than 0.5. In Table 5 and Table 6, we added results from previous studies. If a SNP has not itself been reported as significant but SNPs in genes in linkage disequilibrium with it have been, it is denoted by "*". Table 5 and Table 6 show that GWAS using the longitudinal data in the KARE cohort identified 29 significant SNPs, 20 of which have been reported in previous GWAS, while the cross-sectional data identified only 19 genome-wide significant SNPs. Therefore we can conclude that the longitudinal data lead to substantial power improvement. In our GWAS using the longitudinal data, nine SNPs were newly detected, six of which were significantly replicated in the HEXA cohort.
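The two thresholds used above can be illustrated with a short sketch. The Bonferroni level assumes a family-wise error rate of 0.05 over the 352,228 SNPs that passed quality control. The text does not state the family-wise level, but this choice gives the quoted value. The one-sided replication p-value assumes a normal (Wald) approximation, and the effect estimates passed to it are hypothetical, not taken from the tables.

```python
from statistics import NormalDist

def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test significance level controlling the family-wise error rate."""
    return family_alpha / n_tests

def one_sided_p(beta_rep, se_rep, discovery_sign):
    """One-sided p-value for an effect in the direction seen in discovery."""
    z = beta_rep / se_rep
    if discovery_sign < 0:
        z = -z                     # test for a negative effect
    return 1 - NormalDist().cdf(z)

threshold = bonferroni_threshold(352_228)
print(f"{threshold:.1e}")

# Hypothetical replication estimates, discovery effect negative
print(one_sided_p(-0.5, 0.25, discovery_sign=-1))   # same direction
print(one_sided_p(0.5, 0.25, discovery_sign=-1))    # opposite direction
```

The second call shows why an effect in the opposite direction always yields a one-sided p-value above 0.5.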
Table 4. Correlations between different time-points for each phenotype.

| Phenotype | Time-points 1–2 | Time-points 2–3 | Time-points 1–3 |
|---|---|---|---|
| SBP | 0.608 | 0.604 | 0.552 |
| DBP | 0.550 | 0.596 | 0.517 |
| GLU0 | 0.700 | 0.795 | 0.822 |
| GLU120 | 0.675 | 0.748 | 0.715 |
| height | 0.984 | 0.985 | 0.984 |
| BMI | 0.942 | 0.941 | 0.916 |
| HDL | 0.690 | 0.684 | 0.667 |
| log AST | 0.444 | 0.432 | 0.468 |
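As a rough guide to which phenotypes gain most from repeated measures, the correlations in Table 4 can be plugged into the factor $(1 + (t-1)\rho)/t$ from Section 2.3 with $t = 3$. This is only a sketch: averaging the three pairwise correlations as a stand-in for a single common $\rho$ is an assumption made here, and the observed correlations also include any polygenic component.

```python
from statistics import mean

# Pairwise correlations (time-points 1-2, 2-3, 1-3) copied from Table 4
TABLE4 = {
    "SBP": (0.608, 0.604, 0.552),
    "DBP": (0.550, 0.596, 0.517),
    "GLU0": (0.700, 0.795, 0.822),
    "GLU120": (0.675, 0.748, 0.715),
    "height": (0.984, 0.985, 0.984),
    "BMI": (0.942, 0.941, 0.916),
    "HDL": (0.690, 0.684, 0.667),
    "log AST": (0.444, 0.432, 0.468),
}

def relative_sample_size(rho, t=3):
    """Required n with t repeated measures relative to a cross-sectional design."""
    return (1 + (t - 1) * rho) / t

ratios = {name: relative_sample_size(mean(r)) for name, r in TABLE4.items()}
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:8s} {ratio:.3f}")
```

The ordering agrees with the expectation stated in the text: the factor is smallest for log AST and close to 1 for height and BMI, where repeated measurements add little information.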
Table 5 shows that rs2401887 located in CALM1 is more significantly associated with SBP in the longitudinal data analysis. GWAS of DBP identified three significant SNPs: rs3025047 in the VEGFA gene, rs7100467 near SORCS1 and rs11067763 near MED13L. It has been reported that VEGFA is related to type-2 diabetes, coronary artery disease, age-related macular degeneration and body fat [19,20,21]. For GLU0, rs12991703 located near the MARCO gene was genome-wide significant using the cross-sectional data, and rs2191346 and rs6494306, which are respectively in linkage disequilibrium with DGKB and VPS13C, were more significant using the longitudinal data.
rs7197218 in the XYLT1 gene, which is related to corneal astigmatism [22], was genome-wide significant using the cross-sectional data. rs6031492, located in GDAPL1L, is more significantly associated with GLU120 in the cross-sectional data analysis. Table 6 shows that we detected rs17178527 in AK097143 and rs11000212 in ANAPC16 as associated with BMI. rs12292858 in SIK3 was more significant using the cross-sectional data, and rs2238153 in ATXN2, rs11066280 in HECTD4 and rs183786 near ALDH1A2 were more significantly associated with HDL by longitudinal data analysis. ATXN2, HECTD4 and ALDH1A2 have been reported to have significant associations for phenotypes related to HDL [12,23,24,25,26,27,28,29,30,31,32,33,34], and the significant associations for HECTD4 and ALDH1A2 were successfully replicated in the HEXA cohort. We also performed GWAS of log-transformed AST, and Table 6 shows nine significant SNPs, rs9837421 in SH3BP5, rs10849915 in CCDC63, rs3782889 in MYL2, rs12229654 near MYL2-CUX2, rs11066280 in HECTD4, rs11066453 in OAS1, rs2072134 in OAS3, rs12483959 in PNPLA3 and rs2143571 in SAMM50. Previous studies have reported that SH3BP5, CCDC63, MYL2 and OAS3 are related to alcohol dependence phenotypes [35,36,37], and PNPLA3 and SAMM50 are related to nonalcoholic fatty liver disease [38,39], and so our results strengthen their importance in liver disease.
We also performed association analysis to detect gene × environment interaction, and SNPs that interact with aging, sex and time interval were identified by using the longitudinal data in the KARE cohort. Table 7 and Table 8 list SNPs with p-values for gene × environment interaction less than $10^{-6}$. Table 7 shows that rs7197218 seems to be a promising candidate SNP for interaction with aging for GLU0, and Figure 4 shows that the age effects are substantially different for this SNP. However, the MAF of rs7197218 is 0.01456, and neither it nor any other SNP in linkage disequilibrium with it was found in the HEXA cohort. Thus the significant association of this SNP could not be confirmed, and it will need to be further investigated in follow-up studies. Table 8 shows that rs2074356 and rs11066280 interact significantly with sex for HDL, and rs2074356, rs11066280 and rs12229654 do so for log-transformed AST. Interestingly, rs2074356 and rs11066280 have significant interaction effects with sex for both HDL and AST. We further confirmed these significant gene × environment interactions in the HEXA cohort. Based on the direction of the coefficients for these interactions, we calculated one-sided p-values, and the combined p-values by Fisher's and Liptak's methods [40,41]. It has been shown that Liptak's method is the most efficient when the effect sizes are expected to be the same [42]. Table 9 shows that these significant interactions were further replicated in the HEXA cohort, and the combined p-values become smaller. Figure 4 shows that the effects of these SNPs are substantially different for males and females and, therefore, we can conclude that the effects of these SNPs are significantly different for males and females.
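The two combination rules mentioned above can be sketched as follows, for independent one-sided p-values. The function names and the example p-values are hypothetical. Weighting Liptak's statistic by the square root of each cohort's sample size is a common choice and is assumed here, not taken from the paper.

```python
import math
from statistics import NormalDist

def fisher_combined(p_values):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under H0."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)   # test statistic divided by 2
    # Closed-form survival function of a chi-square variable with 2k (even) df
    return math.exp(-half_stat) * sum(half_stat ** j / math.factorial(j) for j in range(k))

def liptak_combined(p_values, weights=None):
    """Liptak's (weighted Z) method for one-sided p-values."""
    std_normal = NormalDist()
    weights = weights or [1.0] * len(p_values)
    z_scores = [std_normal.inv_cdf(1 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    combined_z = numerator / math.sqrt(sum(w * w for w in weights))
    return 1 - std_normal.cdf(combined_z)

# Hypothetical one-sided p-values from a discovery and a replication cohort
p_discovery, p_replication = 0.05, 0.05
print(fisher_combined([p_discovery, p_replication]))
print(liptak_combined([p_discovery, p_replication]))
# Assumed weights: square roots of the KARE and HEXA sample sizes
print(liptak_combined([p_discovery, p_replication],
                      weights=[math.sqrt(8842), math.sqrt(3703)]))
```

When both cohorts point in the same direction, either rule gives a combined p-value smaller than the individual ones, which is the pattern described for Table 9.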
In summary, we can conclude that GWAS with longitudinal data provide an efficient strategy, and our overall results show that the improvement in power is substantial, its effect being inversely proportional to ρ.
Table 5. Results for SBP, DBP, GLU0 and GLU120. SNPs with p-values less than $10^{-6}$ from cross-sectional or longitudinal data are listed.

| Phenotype | SNP | Chr | Position | Nearby Gene | Minor Allele | MAF | Discovery, cross-sectional: beta ± s.e. | Discovery, cross-sectional: P | Discovery, longitudinal: beta ± s.e. | Discovery, longitudinal: P | Replication, cross-sectional: beta ± s.e. | Replication: one-sided P | Previously Published |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SBP | rs17249754 | 12 | 88584717 | ATP2B1 | A | 0.3732 | −1.63 ± 0.27 | 9.73E−10 | −1.27 ± 0.22 | 1.11E−08 | −0.86 ± 0.49 | 5.30E−03 | Cho et al. NG 2009 [11] |
| SBP | rs11066280 | 12 | 111302166 | in HECTD4 | T | 0.1717 | −1.45 ± 0.34 | 2.52E−05 | −1.59 ± 0.29 | 2.95E−08 | −1.65 ± 0.43 | 6.35E−05 | Kato et al. NG 2011 [27] |
| SBP | rs2401887 | 14 | 89952963 | CALM1 | C | 0.02125 | −3.19 ± 0.93 | 5.89E−04 | −3.84 ± 0.78 | 8.58E−07 | | | |
| DBP | rs10030362 | 4 | 102841866 | in BANK1 | C | 0.2081 | −0.72 ± 0.21 | 4.34E−04 | −0.87 ± 0.17 | 3.21E−07 | −0.22 ± 0.27 | 2.11E−01 | * Zhang et al. Hypertension Res 2012 [43] |
| DBP | rs3025047 | 6 | 43854388 | in VEGFA | A | 0.01024 | −2.65 ± 0.2 | 1.75E−03 | −3.64 ± 0.71 | 3.07E−07 | | | |
| DBP | rs7100467 | 10 | 108153198 | SORCS1 | T | 0.02356 | −2.43 ± 0.53 | 1.81E−04 | −2.82 ± 0.55 | 3.43E−07 | | | |
| DBP | rs17249754 | 12 | 88584717 | ATP2B1 | A | 0.3732 | −0.94 ± 0.17 | 4.33E−08 | −0.8 ± 0.14 | 2.01E−08 | −0.56 ± 0.23 | 8.50E−03 | Cho et al. NG 2009 [11] |
| DBP | rs11066280 | 12 | 111302166 | in HECTD4 | T | 0.1717 | −0.94 ± 0.22 | 1.97E−05 | −0.98 ± 0.18 | 9.25E−08 | −0.76 ± 0.38 | 5.35E−03 | Kato et al. NG 2011 [27] |
| DBP | rs11067763 | 12 | 114682724 | MED13L | G | 0.3297 | −0.78 ± 0.18 | 1.04E−05 | −0.79 ± 0.15 | 8.75E−08 | 0 ± 0.35 | 5.04E−01 | |
| GLU0 | rs12991703 | 2 | 119536716 | MARCO | A | 0.05655 | 3.62 ± 0.68 | 1.18E−07 | 2.51 ± 0.58 | 1.55E−05 | −1.14 ± 0.7 | 9.11E−01 | |
| GLU0 | rs7754840 | 6 | 20769229 | in CDKAL1 | C | 0.4761 | 1.8 ± 0.32 | 1.72E−08 | 1.78 ± 0.27 | 5.16E−11 | 0.98 ± 0.74 | 3.99E−02 | Kwak et al. Diabetes 2012 [44] |
| GLU0 | rs9460546 | 6 | 20771611 | in CDKAL1 | G | 0.4808 | 1.75 ± 0.32 | 3.38E−08 | 1.76 ± 0.27 | 3.76E−11 | | | |
| GLU0 | rs2191346 | 7 | 15020403 | DGKB | C | 0.2891 | −1.72 ± 0.36 | 1.33E−06 | −1.53 ± 0.3 | 3.64E−07 | −0.62 ± 0.62 | 1.55E−01 | |
| GLU0 | rs6494306 | 15 | | VPS13C | A | 0.3435 | −1.43 ± 0.33 | 1.71E−05 | −1.46 ± 0.28 | 1.92E−07 | −1.21 ± 0.59 | 2.08E−02 | * Manning et al. NG 2012 [45] |
| GLU0 | rs7197218 | 16 | 17319136 | in XYLT1 | G | 0.01456 | 7.23 ± 0.68 | 1.23E−07 | 4.29 ± 1.17 | 2.48E−04 | | | |
| GLU120 | rs7754840 | 6 | 20769229 | in CDKAL1 | C | 0.4761 | 4.73 ± 0.78 | 1.51E−09 | 4.73 ± 0.73 | 1.10E−10 | | | Kwak et al. Diabetes 2012 [44] |
| GLU120 | rs12229654 | 12 | 109898844 | MYL2-CUX2 | G | 0.1426 | −4.84 ± 1.11 | 1.21E−05 | −5.16 ± 1.03 | 5.84E−07 | | | Go et al. J Hum Genet 2013 [46] |
| GLU120 | rs2074356 | 12 | 111129784 | in HECTD4 | T | 0.1467 | −5.19 ± 1.09 | 2.02E−06 | −5.2 ± 1.02 | 3.44E−07 | | | Go et al. J Hum Genet 2013 [46] |
| GLU120 | rs6031492 | 20 | 42330963 | in GDAPL1L | G | 0.4949 | 3.84 ± 0.77 | 6.91E−07 | 3.03 ± 0.72 | 2.81E−05 | | | |
| GLU120 | rs2868088 | 20 | 42347066 | GDAPL1L | A | 0.4377 | −3.99 ± 0.78 | 2.68E−07 | −3.54 ± 0.72 | 1.04E−06 | | | |
Table 6. Results for Height, BMI, HDL and log AST. SNPs with p-values less than $10^{-6}$ from cross-sectional or longitudinal data are listed.
SNPChrPositionNearby geneMinor AlleleMAFDiscoveryReplicationPreviously Published
Cross-sectionalLongitudinalCross-Sectional
beta ± s.ePbeta ± s.ePbeta ± s.eone-side P
Height
rs170381821118669928SPAG17G0.4188−0.45 ± 0.084.08E − 08−0.45 ± 0.085.58E − 08−0.13 ± 0.131.53E − 01Cho et al. Nat Genet 2009 [11]
rs105131373142626120in ZBTB38A0.26050.49 ± 0.098.14E − 080.49 ± 0.095.85E − 080.43 ± 0.147.40E − 04Kim et al. J Hum Genet 2009 [47]
rs6918981634346492RPL35P2-NUDT3G0.20920.55 ± 0.12.98E − 080.55 ± 0.11.72E − 080.1 ± 0.152.51E − 01Kim et al. J Hum Genet 2009 [47]
BMI
rs171785276141947773in AK097143A0.2486−0.32 ± 0.052.96E − 09−0.31 ± 0.056.35E-090.05 ± 0.087.47E − 01
rs110002121073625658in ANAPC16 in ASCC1G0.20570.27 ± 0.061.85E − 060.28 ± 0.065.14E − 070.05 ± 0.082.90E − 01
rs99396091652378028in FTOT0.12620.34 ± 0.071.29E − 060.34 ± 0.077.36E − 070.23 ± 0.11.29E − 02Cho et al. Nat Genet 2009 [11]
HDL
rs271819857982in LPLT0.20641.15 ± 0.194.84E − 101.12 ± 0.172.15E − 111.81 ± 0.387.70E − 07
rs17482753819876926LPLT0.12431.95 ± 0.238.83E − 181.91 ± 0.211.42E − 203.48 ± 0.462.71E − 14Heid et al. Circ Cardiovasc Genet 2008 [48]
rs17410962819892360LPLA0.12441.95 ± 0.238.25E − 181.91 ± 0.211.76E − 203.48 ± 0.462.35E − 14
rs126860049106693247in ABCA1T0.2136−1.26 ± 0.27.01E − 12−1.37 ± 0.171.62E − 16−1.19 ± 0.326.10E − 04Kim et al. Nat Genet 2011 [12]
rs1121612611116122450BUD13C0.20271.43 ± 0.192.69E − 141.36 ± 0.171.54E − 151.44 ± 0.57.45E − 05Kim et al. Nat Genet 2011 [12]
rs658956611116157633in ZNF259C0.2176−1.25 ± 0.181.10E − 11−1.15 ± 0.174.47E − 12−1.89 ± 0.328.15E − 08* Waterworth et al. Arteriosclear Thromb Vasc Biol 2010 [49]
rs1229285811116319189in SIK3C0.17591.05 ± 0.27.73E − 080.88 ± 0.187.68E − 070.9 ± 0.351.11E − 02
rs1222965412109898844MYL2-CUX2G0.1426−1.25 ± 0.246.42E − 09−1.21 ± 0.27.35E − 10−1.66 ± 0.461.25E − 04Kim et al. Nat Genet 2011 [12]
rs223815312110423930in ATXN2A0.4579−0.68 ± 0.168.76E − 06−0.71±0.143.29E − 07
rs1106628012111302166in HECTD4T0.1717−1.4 ± 0.153.10E − 12−1.35±0.181.17E − 13−1.92 ± 0.48.95E − 07
rs207213412111893559in OAS3A0.1143−1.39 ± 0.194.53E − 09−1.31 ± 0.221.25E − 09−1.36 ± 0.42.33E − 03Kim et al. Nat Genet 2011 [12]
rs1837861556455402ALDH1A2T0.305−0.8 ± 0.168.53E − 07−0.83 ± 0.152.33E − 08−0.61 ± 0.333.13E-02
rs169402121556481312LIPCT0.34051.27 ± 0.161.05E − 151.3 ± 0.141.95E − 191.11 ± 0.472.04E − 04Kim et al. Nat Genet 2011 [12]
rs64940051556511816in LIPCG0.2678−0.89 ± 0.21.34E − 07−0.89 ± 0.155.82E − 09−0.79 ± 0.391.05E − 02
rs127089801655569880in CETPC0.0984−1.67 ± 0.253.63E − 11−1.65 ± 0.235.61E − 13−1.88 ± 0.57.40E − 05Kim et al. Nat Genet 2011 [12]
rs21565521845435666LIPGA0.164−0.89 ± 0.211.53E − 05−0.93 ± 0.195.90E − 07−1.15 ± 0.42.29E − 03Waterworth et al. Arteriosclear Thromb Vasc Biol 2010 [49]
rs44206381950114786APOC1C0.1121−1.3 ± 0.164.21E − 08−1.14 ± 0.211.23E − 07−2.01 ± 0.471.88E − 05Willer et al. Nat Genet 2013 [50]
AST
rs9837421315322297in SH3BP5G0.193−0.02 ± 0.012.14E − 04−0.03 ± 0.015.79E − 070 ± 05.32E − 01
rs1084991512109818005in CCDC63G0.1758−0.03 ± 0.012.00E − 06−0.03 ± 0.011.81E − 08−0.02 ± 03.68E − 02
rs378288912109835038in MYL2C0.1726−0.03 ± 0.013.79E − 06−0.03 ± 0.017.26E − 09−0.02 ± 02.02E − 02
rs1222965412109898844MYL2-CUX2G0.1426−0.04 ± 0.017.34E − 08−0.04 ± 0.014.74E − 11−0.02 ± 02.80E − 02
rs1106628012111302166in HECTD4T0.1717−0.05 ± 0.018.17E − 13−0.05 ± 0.011.70E − 18−0.03 ± 01.94E − 04
rs1106645312111850004in OAS1G0.1265−0.03 ± 0.015.26E − 06−0.03±0.018.72E − 07−0.02 ± 07.75E − 02
rs207213412111893559in OAS3A0.1143−0.04 ± 0.017.31E − 08−0.04 ± 0.018.42E − 09−0.02 ± 06.75E − 02
rs124839592242657329in PNPLA3A0.41570.03 ± 01.79E − 090.03 ± 01.02E − 120.03 ± 02.14E − 06* Kamatani et al. Nat Genet 2010 [38]
rs21435712242723019in SAMM50T0.41360.02 ± 0.016.45E − 070.03 ± 07.73E − 100.03 ± 04.22E − 05* Kawaguchi et al. PLoS One 2012 [39]
Table 7. Gene × environment interaction effect in the KARE cohort. Interactions of time interval with SNP were tested, and p-values for SNPs with genome-wide significant interaction are listed.

rs7197218 for GLU0:

| Effect | beta | Std. Error | p-value |
|---|---|---|---|
| SNP | 5.92 | 1.21 | 1.08E−06 |
| time × SNP | −1.27 | 0.25 | 2.65E−07 |
Table 8. Gene × environment interaction effect in the KARE cohort. Interactions of sex with SNPs were tested, and p-values for SNPs with genome-wide significant interaction are listed.

| SNP and phenotype | Effect | beta | Std. Error | p-value |
|---|---|---|---|---|
| rs2074356 for HDL | SNP | −4.80 | 0.62 | 7.28E−15 |
| rs2074356 for HDL | sex × SNP | 2.26 | 0.39 | 4.46E−09 |
| rs11066280 for HDL | SNP | −4.69 | 0.58 | 4.60E−16 |
| rs11066280 for HDL | sex × SNP | 2.21 | 0.36 | 1.06E−09 |
| rs2074356 for Log(AST) | SNP | −0.13 | 0.02 | 5.52E−13 |
| rs2074356 for Log(AST) | sex × SNP | 0.06 | 0.01 | 5.82E−08 |
| rs11066280 for Log(AST) | SNP | −0.17 | 0.02 | 2.81E−20 |
| rs11066280 for Log(AST) | sex × SNP | 0.08 | 0.01 | 8.25E−12 |
| rs12229654 for Log(AST) | SNP | −0.15 | 0.02 | 1.32E−19 |
| rs12229654 for Log(AST) | sex × SNP | 0.07 | 0.01 | 3.24E−11 |
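One way to read the interaction estimates in Tables 7 and 8 is to compute the SNP effect implied at a given value of the modifier (time or sex). The sketch below is only illustrative: the helper name `conditional_snp_effect` is hypothetical, and coding the three KARE time points as 0, 1 and 2 is an assumption, since the coding of the time interval is not restated here. It returns point estimates only, because a standard error would require the covariance between the two coefficients, which the tables do not report.

```python
def conditional_snp_effect(beta_snp, beta_interaction, modifier_value):
    """Per-allele SNP effect implied by a model with a SNP main effect and a
    SNP x modifier interaction, evaluated at one value of the modifier."""
    return beta_snp + beta_interaction * modifier_value


# Table 7 estimates for rs7197218 on GLU0: SNP = 5.92, time x SNP = -1.27.
# ASSUMPTION: time points coded 0, 1, 2 (the actual coding may differ).
for time_point in (0, 1, 2):
    effect = conditional_snp_effect(5.92, -1.27, time_point)
    print(f"time point {time_point}: implied SNP effect = {effect:.2f}")
```

Under this assumed coding the implied per-allele effect shrinks from 5.92 at the first time point to 3.38 at the third, which is the direction of the negative time × SNP estimate. The same reading applies to the sex × SNP rows of Table 8 once the coding of sex is known.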
Figure 4. Interaction effects of SNPs with sex or time interval. (a) Mean of GLU0 at each time-point for each of two rs7197218 genotypes (circles indicate homozygous genotypes with no minor alleles, triangles indicate heterozygous genotypes); (b-f) phenotypic mean of each genotype for males and females (blue and red lines indicate male and female, respectively).

4. Discussion

It is well known that longitudinal analysis is useful for detecting aging effects and is statistically efficient for detecting associations. In this report, we numerically calculated the sample sizes required to achieve 80% power at the genome-wide significance level, and our results showed that the power is proportionally related to the number of observations on each individual and inversely related to the correlation between pairs of observations on an individual. In a large-scale genetic analysis, genotyping cost may exceed phenotyping cost, in which case analyzing longitudinal data is an efficient strategy for reducing the rate of false negative findings. However, if the proportion of missing data is large, the loss of statistical power can be substantial; and if the data are not missing at random, even a small proportion of missing phenotypes can generate serious bias [51]. Despite the statistical efficiency of longitudinal data analysis, potential bias arising from the missingness pattern should be carefully investigated, because even a minor oversight can lead to substantial bias.
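As a rough numerical illustration of this relationship: by the final expression in the Appendix, and assuming an exchangeable correlation ρ among the t observations on an individual, the required sample size relative to a design with a single observation per individual scales as (1 + (t − 1)ρ)/t. The worked values below take t = 3, matching the three observations per individual in KARE; the ρ values are illustrative assumptions, not estimates from the data.

$$
\frac{n_a(t,\rho)}{n_a(1,\rho)} = \frac{1+(t-1)\rho}{t}, \qquad t = 3:\quad
\rho = 0.2 \Rightarrow \frac{1.4}{3} \approx 0.47, \quad
\rho = 0.5 \Rightarrow \frac{2}{3} \approx 0.67, \quad
\rho = 0.8 \Rightarrow \frac{2.6}{3} \approx 0.87 .
$$

Under these assumptions, three weakly correlated measurements roughly halve the number of genotyped individuals required, whereas three strongly correlated measurements save comparatively little.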
Table 9. Combined p-values for gene × environment interaction. For replication, interactions of sex with SNPs were tested in the HEXA cohort and a combined p-value was calculated using both Fisher’s and Liptak’s methods.

| Sex × SNP | HEXA cohort p-value | Combined p-value using Fisher’s method | Combined p-value using Liptak’s method |
|---|---|---|---|
| rs2074356 for HDL | 3.58E−01 | 3.39E−08 | 2.52E−07 |
| rs11066280 for HDL | 2.184E−01 | 5.37E−09 | 2.52E−08 |
| rs2074356 for Log(AST) | 4.86E−02 | 5.86E−08 | 4.41E−08 |
| rs11066280 for Log(AST) | 1.23E−02 | 3.13E−12 | 3.10E−12 |
| rs12229654 for Log(AST) | 7.42E−03 | 7.22E−12 | 4.96E−12 |
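The two combination rules named in Table 9 can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the Liptak weights default to equal weights as an assumption, because the weighting used for Table 9 is not stated in this section (weights proportional to the square root of each cohort's sample size are a common choice).

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(log p_i) follows a chi-square
    distribution with 2k degrees of freedom under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    # Closed-form chi-square survival function for even df = 2k:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 0.0
    for i in range(k):
        total += term
        term *= half_stat / (i + 1)
    return math.exp(-half_stat) * total


def liptak_combined_p(p_values, weights=None):
    """Liptak's (weighted Z) method for one-sided p-values.
    ASSUMPTION: equal weights unless the caller supplies others."""
    if weights is None:
        weights = [1.0] * len(p_values)
    std_normal = NormalDist()
    z_scores = [-std_normal.inv_cdf(p) for p in p_values]
    z = sum(w * z for w, z in zip(weights, z_scores))
    z /= math.sqrt(sum(w * w for w in weights))
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


# sex x SNP for rs2074356 on HDL: KARE p = 4.46E-09 (Table 8), HEXA p = 3.58E-01 (Table 9)
print(f"Fisher: {fisher_combined_p([4.46e-9, 3.58e-1]):.3g}")
```

With the KARE and HEXA p-values for rs2074356 on HDL, the Fisher calculation gives about 3.39E−08, in line with the first row of Table 9. The Liptak value depends on the weights, so the equal-weight default here should not be expected to reproduce the table.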
Furthermore, we performed GWAS with both longitudinal and cross-sectional data, and significant results from the longitudinal analysis in the KARE cohort were further tested in the HEXA cohort. Twelve SNPs that have not been reported elsewhere were identified, and the significant p-values from the replication study strengthen the possibility that they are causal. In particular, GWAS with longitudinal data showed that rs3025047 is significantly associated with DBP, even though it is not significantly associated in GWAS with cross-sectional data. The MAF of rs3025047 is 0.01, so it is a relatively low-frequency variant. In the HEXA cohort, neither rs3025047 nor any SNP in linkage disequilibrium with it was available. Although further studies are necessary to confirm whether rs3025047 is a true causal variant, our results illustrate that GWAS using longitudinal data can be an efficient strategy for rare variant association analysis.
During the last decade, more than ten thousand GWAS have successfully identified disease susceptibility loci, and these findings have increased our understanding of diseases. However, the so-called missing heritability [4] indicates that more efficient analysis methods are needed, and GWAS of longitudinal data seem to provide a useful strategy that may help bridge the gap.

5. Conclusions

When longitudinal data are analyzed as a repeated measures design, the power is proportionally related to the number of observations on each individual and inversely related to the correlation between the multiple observations on an individual. This facilitates finding causal SNPs and their interactions with environmental variables, as well as with age and sex. In two Korean cohorts, it enabled us to find 12 novel genome-wide significant SNPs associated with eight phenotypes, as well as significant gene × environment interactions. We therefore conclude that longitudinal data seem to provide an efficient strategy for GWAS.

Supplementary Files

Supplementary File 1

Acknowledgments

Data were provided through a grant from the Korea Centers for Disease Control and Prevention (4845-301, 4851-302, 4851-307). This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2013R1A1A2010437), and by an intramural grant from the Korean National Institute of Health (2013-NG73002-00). This research was also supported by the Basic Science Research Program through the NRF funded by the Ministry of Science, ICT & Future Planning (NRF-2013R1A1A2010437), and by the Basic Science Research Program through the NRF funded by the Ministry of Science, ICT & Future Planning (NRF-2014S1A2A2028559). Woojoo Lee was supported by the Basic Science Research Program through the NRF funded by the Ministry of Education, Science and Technology (NRF-2013R1A1A1061332).

Author Contributions

The study was designed by Woojoo Lee and Sungho Won. Statistical analysis was performed by Young Lee, Suyeon Park, Sanghoon Moon and Juyoung Lee. The manuscript was written by Young Lee, Robert C. Elston and Sungho Won. All authors reviewed the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix

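We assume that the null hypothesis is $\beta = 0$, and that $\beta = \beta_a$ under the alternative hypothesis. According to Liu and Liang [13], the score for our hypothesis can be defined by:

$$S_\beta = \frac{1}{\sigma_P^2} \sum_{i=1}^{n} X_i \mathbf{1}_t^{T} R^{-1} \left( Y_i - \mathbf{1}_t X_i \beta - \mathbf{1}_t Z_i^{T} \alpha \right)$$

and we denote by $\bar{\Sigma}$ the expected variance-covariance matrix of the score function $S_\beta$ under the alternative hypothesis in the absence of population substructure. Letting $\nu$ be the chi-square noncentrality parameter needed to achieve power $1 - \phi$ at the $\alpha$ significance level, Liu and Liang [13] showed that the required sample size for the score test becomes:

$$n_a = \nu / (\beta_a^2 \bar{\Sigma})$$

We let $\mathbf{1}_w$ be a $w$-dimensional column vector with all elements equal to 1. We assume that $(X_i, Z_i)$ can take the values $(\delta_l, \psi_l)$, where $l = 1, \ldots, L$, and define $\pi_l = P(X_i = \delta_l, Z_i = \psi_l)$. If we let $u_l = \delta_l \mathbf{1}_t$, let $D_l = \mathrm{diag}(\psi_l)$ be a diagonal matrix, and let $J_t = \mathbf{1}_t \mathbf{1}_d^{T}$ and $v_l = J_t D_l$, where $d$ denotes the number of rows of $D_l$, the elements of the Fisher information matrices for $\beta$ and $\alpha$ are found to be:

$$I_{\beta\alpha} = \sigma_P^{-2} \sum_l \pi_l u_l^{T} R^{-1} v_l \quad \text{and} \quad I_{\alpha\alpha} = \sigma_P^{-2} \sum_l \pi_l v_l^{T} R^{-1} v_l .$$

In the absence of population substructure, $\bar{\Sigma}$ can be shown to be:

$$\bar{\Sigma} = \sigma_P^{-2} \sum_l \pi_l \left( u_l^{T} - I_{\beta\alpha} I_{\alpha\alpha}^{-1} v_l^{T} \right) R^{-1} \left( u_l - v_l I_{\alpha\alpha}^{-1} I_{\beta\alpha}^{T} \right)$$

Furthermore:

$$u_l^{T} R^{-1} v_l = \delta_l \mathbf{1}_t^{T} R^{-1} J_t D_l = \left( \mathbf{1}_t^{T} R^{-1} \mathbf{1}_t \right) \delta_l \left( \mathbf{1}_d^{T} D_l \right)$$

leads to:

$$I_{\beta\alpha} = \sigma_P^{-2} \sum_l \pi_l u_l^{T} R^{-1} v_l = \sigma_P^{-2} \left( \mathbf{1}_t^{T} R^{-1} \mathbf{1}_t \right) \sum_l \pi_l \delta_l \left( \mathbf{1}_d^{T} D_l \right)$$

We let $J_d = \mathbf{1}_d \mathbf{1}_d^{T}$, and, because $v_l = J_t D_l = \mathbf{1}_t \mathbf{1}_d^{T} D_l$, we have:

$$v_l^{T} R^{-1} v_l = D_l \left( J_t^{T} R^{-1} J_t \right) D_l = \left( \mathbf{1}_t^{T} R^{-1} \mathbf{1}_t \right) D_l J_d D_l$$

Thus, we have:

$$I_{\alpha\alpha} = \sigma_P^{-2} \sum_l \pi_l v_l^{T} R^{-1} v_l = \sigma_P^{-2} \left( \mathbf{1}_t^{T} R^{-1} \mathbf{1}_t \right) \sum_l \pi_l \left( D_l J_d D_l \right)$$

Consequently, if we let

$$\Omega = I_{\beta\alpha} I_{\alpha\alpha}^{-1} = \left( \sum_l \pi_l \delta_l \left( \mathbf{1}_d^{T} D_l \right) \right) \left( \sum_l \pi_l \left( D_l J_d D_l \right) \right)^{-1},$$

some tedious algebraic manipulations lead to:

$$
\begin{aligned}
\bar{\Sigma} &= \sigma_P^{-2} \sum_l \pi_l \left( u_l^{T} - I_{\beta\alpha} I_{\alpha\alpha}^{-1} v_l^{T} \right) R^{-1} \left( u_l - v_l I_{\alpha\alpha}^{-1} I_{\beta\alpha}^{T} \right) \\
&= \sigma_P^{-2} \sum_l \pi_l \left( \delta_l^2 \left( \mathbf{1}_t^{T} R^{-1} \mathbf{1}_t \right) - 2 \delta_l \mathbf{1}_t^{T} R^{-1} J_t D_l \Omega^{T} + \Omega D_l J_t^{T} R^{-1} J_t D_l \Omega^{T} \right) \\
&= \sigma_P^{-2} \left( \mathbf{1}_t^{T} R^{-1} \mathbf{1}_t \right) \sum_l \pi_l \left( \delta_l^2 - 2 \delta_l \mathbf{1}_d^{T} D_l \Omega^{T} + \Omega D_l J_d D_l \Omega^{T} \right)
\end{aligned}
$$

If we denote $\sum_l \pi_l \left( \delta_l^2 - 2 \delta_l \mathbf{1}_d^{T} D_l \Omega^{T} + \Omega D_l J_d D_l \Omega^{T} \right)$ by $K$, the required sample size becomes:

$$n_a = \frac{\nu}{\beta_a^2 \bar{\Sigma}} = \frac{\left( z_{1-\alpha/2} + z_{1-\phi} \right)^2 \sigma_P^2}{K \beta_a^2} \cdot \frac{1 + (t-1)\rho}{t}$$

because $\mathbf{1}_t^{T} R^{-1} \mathbf{1}_t = t / (1 + (t-1)\rho)$ when $R$ is an exchangeable correlation matrix with common correlation $\rho$. This completes the derivation for $n_a$.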
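The closing sample-size expression can be evaluated directly. The sketch below is a hedged illustration, not the authors' implementation: the function names are hypothetical, and the choice of K assumes the simplest case of an intercept-only covariate with an additively coded (0/1/2) genotype in Hardy-Weinberg equilibrium, for which K reduces to the genotype variance 2p(1 − p). The effect size, minor allele frequency and correlation in the example are arbitrary illustrative values.

```python
from statistics import NormalDist


def additive_genotype_variance(maf):
    """K under the stated ASSUMPTION: intercept-only covariate, additive
    0/1/2 genotype coding and Hardy-Weinberg equilibrium, so K = 2p(1 - p)."""
    return 2.0 * maf * (1.0 - maf)


def required_sample_size(beta_a, sigma_p2, K, t, rho, alpha=5e-8, power=0.8):
    """n_a = (z_{1-alpha/2} + z_{1-phi})^2 * sigma_P^2 / (K * beta_a^2)
             * (1 + (t - 1) * rho) / t,
    where power = 1 - phi and rho is the exchangeable within-person correlation."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    nu = (z_alpha + z_power) ** 2          # noncentrality parameter
    design_factor = (1.0 + (t - 1) * rho) / t
    return nu * sigma_p2 / (K * beta_a ** 2) * design_factor


# Illustrative inputs only: effect of 0.1 phenotypic SD per allele, MAF 0.2.
K = additive_genotype_variance(0.2)
n_cross = required_sample_size(beta_a=0.1, sigma_p2=1.0, K=K, t=1, rho=0.5)
n_long = required_sample_size(beta_a=0.1, sigma_p2=1.0, K=K, t=3, rho=0.5)
print(f"cross-sectional: {n_cross:.0f}, longitudinal (t=3): {n_long:.0f}")
```

With t = 3 and ρ = 0.5, the longitudinal design needs two-thirds as many individuals as the cross-sectional design, whatever the other inputs, because only the factor (1 + (t − 1)ρ)/t differs between the two calls.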

References

  1. Klein, R.J.; Zeiss, C.; Chew, E.Y.; Tsai, J.Y.; Sackler, R.S.; Haynes, C.; Henning, A.K.; SanGiovanni, J.P.; Mane, S.M.; Mayne, S.T.; et al. Complement factor H polymorphism in age-related macular degeneration. Science 2005, 308, 385–389.
  2. Maher, B. Personal genomes: The case of the missing heritability. Nature 2008, 456, 18–21.
  3. Manolio, T.A.; Collins, F.S.; Cox, N.J.; Goldstein, D.B.; Hindorff, L.A.; Hunter, D.J.; McCarthy, M.I.; Ramos, E.M.; Cardon, L.R.; Chakravarti, A.; et al. Finding the missing heritability of complex diseases. Nature 2009, 461, 747–753.
  4. Visscher, P.M.; Brown, M.A.; McCarthy, M.I.; Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 2012, 90, 7–24.
  5. Pearson, T.A.; Manolio, T.A. How to interpret a genome-wide association study. J. Am. Med. Assoc. 2008, 299, 1335–1344.
  6. Diggle, P.; Heagerty, P.; Liang, K.Y.; Zeger, S. Analysis of Longitudinal Data, 2nd ed.; Oxford University Press: Oxford, UK/New York, NY, USA, 2002.
  7. O'Reilly, P.F.; Hoggart, C.J.; Pomyen, Y.; Calboli, F.C.; Elliott, P.; Jarvelin, M.R.; Coin, L.J. MultiPhen: Joint model of multiple phenotypes can increase discovery in GWAS. PLoS One 2012, 7.
  8. Gray, D.J.; Thrift, A.P.; Williams, G.M.; Zheng, F.; Li, Y.S.; Guo, J.; Chen, H.; Wang, T.; Xu, X.J.; Zhu, R.; et al. Five-year longitudinal assessment of the downstream impact on schistosomiasis transmission following closure of the Three Gorges Dam. PLoS Negl. Trop. Dis. 2012, 6.
  9. Smith, E.N.; Chen, W.; Kahonen, M.; Kettunen, J.; Lehtimaki, T.; Peltonen, L.; Raitakari, O.T.; Salem, R.M.; Schork, N.J.; Shaw, M.; et al. Longitudinal genome-wide association of cardiovascular disease risk factors in the Bogalusa Heart Study. PLoS Genet. 2010, 6.
  10. Zhu, W.; Cho, K.; Chen, X.; Zhang, M.; Wang, M.; Zhang, H. A genome-wide association analysis of Framingham Heart Study longitudinal data using multivariate adaptive splines. BMC Proc. 2009, 3.
  11. Cho, Y.S.; Go, M.J.; Kim, Y.J.; Heo, J.Y.; Oh, J.H.; Ban, H.J.; Yoon, D.; Lee, M.H.; Kim, D.J.; Park, M.; et al. A large-scale genome-wide association study of Asian populations uncovers genetic factors influencing eight quantitative traits. Nat. Genet. 2009, 41, 527–534.
  12. Kim, Y.J.; Go, M.J.; Hu, C.; Hong, C.B.; Kim, Y.K.; Lee, J.Y.; Hwang, J.Y.; Oh, J.H.; Kim, D.J.; Kim, N.H.; et al. Large-scale genome-wide association studies in East Asians identify new genetic loci influencing metabolic traits. Nat. Genet. 2011, 43, 990–995.
  13. Liu, G.; Liang, K.Y. Sample size calculations for studies with correlated observations. Biometrics 1997, 53, 937–947.
  14. Camastra, S.; Bonora, E.; Del Prato, S.; Rett, K.; Weck, M.; Ferrannini, E. Effect of obesity and insulin resistance on resting and glucose-induced thermogenesis in man. EGIR (European Group for the Study of Insulin Resistance). Int. J. Obesity Relat. Metab. Disord. 1999, 23, 1307–1313.
  15. McCarthy, M.I. Genomics, type 2 diabetes, and obesity. N. Engl. J. Med. 2010, 363, 2339–2350.
  16. Price, A.L.; Patterson, N.J.; Plenge, R.M.; Weinblatt, M.E.; Shadick, N.A.; Reich, D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006, 38, 904–909.
  17. Devlin, B.; Roeder, K.; Wasserman, L. Genomic control, a new approach to genetic-based association studies. Theor. Popul. Biol. 2001, 60, 155–166.
  18. Kang, H.M.; Sul, J.H.; Service, S.K.; Zaitlen, N.A.; Kong, S.Y.; Freimer, N.B.; Sabatti, C.; Eskin, E. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 2010, 42, 348–354.
  19. Davies, R.W.; Wells, G.A.; Stewart, A.F.; Erdmann, J.; Shah, S.H.; Ferguson, J.F.; Hall, A.S.; Anand, S.S.; Burnett, M.S.; Epstein, S.E.; et al. A genome-wide association study for coronary artery disease identifies a novel susceptibility locus in the major histocompatibility complex. Circ. Cardiovasc. Genet. 2012, 5, 217–225.
  20. Yu, Y.; Bhangale, T.R.; Fagerness, J.; Ripke, S.; Thorleifsson, G.; Tan, P.L.; Souied, E.H.; Richardson, A.J.; Merriam, J.E.; Buitendijk, G.H.; et al. Common variants near FRK/COL10A1 and VEGFA are associated with advanced age-related macular degeneration. Hum. Mol. Genet. 2011, 20, 3699–3709.
  21. Zeggini, E.; Scott, L.J.; Saxena, R.; Voight, B.F.; Marchini, J.L.; Hu, T.; de Bakker, P.I.; Abecasis, G.R.; Almgren, P.; Andersen, G.; et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat. Genet. 2008, 40, 638–645.
  22. Lopes, M.C.; Hysi, P.G.; Verhoeven, V.J.; Macgregor, S.; Hewitt, A.W.; Montgomery, G.W.; Cumberland, P.; Vingerling, J.R.; Young, T.L.; van Duijn, C.M.; et al. Identification of a candidate gene for astigmatism. Investig. Ophthalmol. Vis. Sci. 2013, 54, 1260–1267.
  23. Adeyemo, A.; Gerry, N.; Chen, G.; Herbert, A.; Doumatey, A.; Huang, H.; Zhou, J.; Lashley, K.; Chen, Y.; Christman, M.; et al. A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet. 2009, 5.
  24. Ganesh, S.K.; Zakai, N.A.; van Rooij, F.J.; Soranzo, N.; Smith, A.V.; Nalls, M.A.; Chen, M.H.; Kottgen, A.; Glazer, N.L.; Dehghan, A.; et al. Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nat. Genet. 2009, 41, 1191–1198.
  25. Hunt, K.A.; Zhernakova, A.; Turner, G.; Heap, G.A.; Franke, L.; Bruinenberg, M.; Romanos, J.; Dinesen, L.C.; Ryan, A.W.; Panesar, D.; et al. Newly identified genetic risk variants for celiac disease related to the immune response. Nat. Genet. 2008, 40, 395–402.
  26. Ikram, M.K.; Sim, X.; Jensen, R.A.; Cotch, M.F.; Hewitt, A.W.; Ikram, M.A.; Wang, J.J.; Klein, R.; Klein, B.E.; Breteler, M.M.; et al. Four novel loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo. PLoS Genet. 2010, 6.
  27. Kato, N.; Takeuchi, F.; Tabara, Y.; Kelly, T.N.; Go, M.J.; Sim, X.; Tay, W.T.; Chen, C.H.; Zhang, Y.; Yamamoto, K.; et al. Meta-analysis of genome-wide association studies identifies common variants associated with blood pressure variation in East Asians. Nat. Genet. 2011, 43, 531–538.
  28. Kottgen, A.; Albrecht, E.; Teumer, A.; Vitart, V.; Krumsiek, J.; Hundertmark, C.; Pistis, G.; Ruggiero, D.; O'Seaghdha, C.M.; Haller, T.; et al. Genome-wide association analyses identify 18 new loci associated with serum urate concentrations. Nat. Genet. 2013, 45, 145–154.
  29. Kottgen, A.; Pattaro, C.; Boger, C.A.; Fuchsberger, C.; Olden, M.; Glazer, N.L.; Parsa, A.; Gao, X.; Yang, Q.; Smith, A.V.; et al. New loci associated with kidney function and chronic kidney disease. Nat. Genet. 2010, 42, 376–384.
  30. Lu, X.; Wang, L.; Chen, S.; He, L.; Yang, X.; Shi, Y.; Cheng, J.; Zhang, L.; Gu, C.C.; Huang, J.; et al. Genome-wide association study in Han Chinese identifies four new susceptibility loci for coronary artery disease. Nat. Genet. 2012, 44, 890–894.
  31. Newton-Cheh, C.; Johnson, T.; Gateva, V.; Tobin, M.D.; Bochud, M.; Coin, L.; Najjar, S.S.; Zhao, J.H.; Heath, S.C.; Eyheramendy, S.; et al. Genome-wide association study identifies eight loci associated with blood pressure. Nat. Genet. 2009, 41, 666–676.
  32. Wain, L.V.; Verwoert, G.C.; O'Reilly, P.F.; Shi, G.; Johnson, T.; Johnson, A.D.; Bochud, M.; Rice, K.M.; Henneman, P.; Smith, A.V.; et al. Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure. Nat. Genet. 2011, 43, 1005–1011.
  33. Wu, C.; Hu, Z.; He, Z.; Jia, W.; Wang, F.; Zhou, Y.; Liu, Z.; Zhan, Q.; Liu, Y.; Yu, D.; et al. Genome-wide association study identifies three new susceptibility loci for esophageal squamous-cell carcinoma in Chinese populations. Nat. Genet. 2011, 43, 679–684.
  34. Yang, X.; Lu, X.; Wang, L.; Chen, S.; Li, J.; Cao, J.; Chen, J.; Hao, Y.; Li, Y.; Zhao, L.; et al. Common variants at 12q24 are associated with drinking behavior in Han Chinese. Am. J. Clin. Nutr. 2013, 97, 545–551.
  35. Baik, I.; Cho, N.H.; Kim, S.H.; Han, B.G.; Shin, C. Genome-wide association studies identify genetic loci related to alcohol consumption in Korean men. Am. J. Clin. Nutr. 2011, 93, 809–816.
  36. Zuo, L.; Gelernter, J.; Zhang, C.K.; Zhao, H.; Lu, L.; Kranzler, H.R.; Malison, R.T.; Li, C.S.; Wang, F.; Zhang, X.Y.; et al. Genome-wide association study of alcohol dependence implicates KIAA0040 on chromosome 1q. Neuropsychopharmacology 2012, 37, 557–566.
  37. Zuo, L.; Zhang, F.; Zhang, H.; Zhang, X.Y.; Wang, F.; Li, C.S.; Lu, L.; Hong, J.; Lu, L.; Krystal, J.; et al. Genome-wide search for replicable risk gene regions in alcohol and nicotine co-dependence. Am. J. Med. Genet. Part B Neuropsychiatr. Genet. 2012, 159B, 437–444.
  38. Kamatani, Y.; Matsuda, K.; Okada, Y.; Kubo, M.; Hosono, N.; Daigo, Y.; Nakamura, Y.; Kamatani, N. Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat. Genet. 2010, 42, 210–215.
  39. Kawaguchi, T.; Sumida, Y.; Umemura, A.; Matsuo, K.; Takahashi, M.; Takamura, T.; Yasui, K.; Saibara, T.; Hashimoto, E.; Kawanaka, M.; et al. Genetic polymorphisms of the human PNPLA3 gene are strongly associated with severity of non-alcoholic fatty liver disease in Japanese. PLoS One 2012, 7.
  40. Fisher, R.A. Statistical Methods for Research Workers, 11th ed.; Oliver and Boyd: Edinburgh, UK, 1950.
  41. Liptak, T. On the combination of independent tests. Magyar Tudományos Akadémia Matematikai Kutató Intézetének Közleményei 1958, 3, 171–197.
  42. Won, S.; Morris, N.; Lu, Q.; Elston, R.C. Choosing an optimal method to combine p-values. Stat. Med. 2009, 28, 1537–1553.
  43. Zhang, D.; Pang, Z.; Li, S.; Jiang, W.; Wang, S.; Thomassen, M.; Hjelmborg, J.V.; Kruse, T.A.; Ohm Kyvik, K.; Christensen, K.; et al. Genome-wide linkage and association scans for pulse pressure in Chinese twins. Hypertens. Res. 2012, 35, 1051–1057.
  44. Kwak, S.H.; Kim, S.H.; Cho, Y.M.; Go, M.J.; Cho, Y.S.; Choi, S.H.; Moon, M.K.; Jung, H.S.; Shin, H.D.; Kang, H.M.; et al. A genome-wide association study of gestational diabetes mellitus in Korean women. Diabetes 2012, 61, 531–541.
  45. Manning, A.K.; Hivert, M.F.; Scott, R.A.; Grimsby, J.L.; Bouatia-Naji, N.; Chen, H.; Rybin, D.; Liu, C.T.; Bielak, L.F.; Prokopenko, I.; et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nat. Genet. 2012, 44, 659–669.
  46. Go, M.J.; Hwang, J.Y.; Kim, Y.J.; Hee Oh, J.; Kim, Y.J.; Heon Kwak, S.; Soo Park, K.; Lee, J.; Kim, B.J.; Han, B.G.; et al. New susceptibility loci in MYL2, C12orf51 and OAS1 associated with 1-h plasma glucose as predisposing risk factors for type 2 diabetes in the Korean population. J. Hum. Genet. 2013, 58, 362–365.
  47. Kim, J.J.; Lee, H.I.; Park, T.; Kim, K.; Lee, J.E.; Cho, N.H.; Shin, C.; Cho, Y.S.; Lee, J.Y.; Han, B.G.; et al. Identification of 15 loci influencing height in a Korean population. J. Hum. Genet. 2010, 55, 27–31.
  48. Heid, I.M.; Boes, E.; Muller, M.; Kollerits, B.; Lamina, C.; Coassin, S.; Gieger, C.; Doring, A.; Klopp, N.; Frikke-Schmidt, R.; et al. Genome-wide association analysis of high-density lipoprotein cholesterol in the population-based KORA study sheds new light on intergenic regions. Circ. Cardiovasc. Genet. 2008, 1, 10–20.
  49. Waterworth, D.M.; Ricketts, S.L.; Song, K.; Chen, L.; Zhao, J.H.; Ripatti, S.; Aulchenko, Y.S.; Zhang, W.; Yuan, X.; Lim, N.; et al. Genetic variants influencing circulating lipid levels and risk of coronary artery disease. Arterioscler. Thromb. Vasc. Biol. 2010, 30, 2264–2276.
  50. Global Lipids Genetics Consortium; Willer, C.J.; Schmidt, E.M.; Sengupta, S.; Peloso, G.M.; Gustafsson, S.; Kanoni, S.; Ganna, A.; Chen, J.; Buchkovich, M.L.; et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 2013, 45, 1274–1283.
  51. Graham, J.W. Missing data analysis: Making it work in the real world. Annu. Rev. Psychol. 2009, 60, 549–576.

Share and Cite

MDPI and ACS Style

Lee, Y.; Park, S.; Moon, S.; Lee, J.; Elston, R.C.; Lee, W.; Won, S. On the Analysis of a Repeated Measure Design in Genome-Wide Association Analysis. Int. J. Environ. Res. Public Health 2014, 11, 12283-12303. https://doi.org/10.3390/ijerph111212283
