Article

XCMAX4: A Robust X Chromosomal Genetic Association Test Accounting for Covariates

Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Genes 2022, 13(5), 847; https://doi.org/10.3390/genes13050847
Submission received: 12 April 2022 / Revised: 30 April 2022 / Accepted: 5 May 2022 / Published: 9 May 2022
(This article belongs to the Special Issue Statistical Genetics in Human Diseases)

Abstract

Although the X chromosome accounts for about 5% of human genes, it is routinely excluded from genome-wide association studies, probably because of its unique structure and complex biological patterns. While some statistical methods have been proposed for testing the association between X chromosomal markers and diseases, very few of them can adjust for covariates. Moreover, the methods that can incorporate covariates either need to specify an X chromosome inactivation model or require a permutation procedure to compute the p value. In this article, we propose a novel analytic approach based on logistic regression that allows for covariates and does not need to specify the underlying X chromosome inactivation pattern. Simulation studies showed that our proposed method controls the size well and has robust power across various practical scenarios. We applied the proposed method to Graves’ disease data to show its usefulness in practice.

1. Introduction

Many diseases exhibit a gender preference, such as autoimmune diseases, cardiovascular diseases, psychiatric diseases, and cancer, implying that genetic variants on the X chromosome play an important role in sex differences [1,2,3,4,5]. However, most genome-wide association studies (GWAS) routinely exclude X-chromosomal variants, probably because the X chromosome has a unique structure and complex biological patterns [6,7,8]. Females have one more X chromosome than males, and to balance X-linked gene expression between the two sexes, one of the female X chromosomes is inactivated in the early embryo [9]. Usually, the process of X chromosome inactivation (XCI) is considered random (XCI-R) [10], i.e., for an X-linked gene, nearly 50% of the cells have the paternal allele active while the remaining cells have the maternal allele active. However, studies have shown that skewed XCI (XCI-S) is more biologically plausible [11]. XCI-S is a non-random process, which has been defined as a significant deviation from XCI-R, for instance, the inactivation of one of the alleles in more than 75% of cells [12]. In addition, up to 25% of X-linked genes can escape from XCI (XCI-E) [9]. Both alleles of a gene under XCI-E are active, so such genes behave like autosomal genes.
To account for the unique characteristics of the X chromosome, several statistical methods have been developed for testing the association between X chromosomal markers and diseases [13,14,15,16,17,18]. However, very few of them can adjust for covariates. In large-scale GWAS, spurious associations may occur due to the influence of additional covariates, such as sex, age, and population structure [19,20]. Particularly on the X chromosome, if the sex ratios differ between cases and controls, then sex will be a confounder when the allele frequency in females is unequal to that in males. In practice, a natural way to adjust for covariates is to build a regression model, and logistic regression is generally adopted for binary traits. Based on the logistic regression framework, Gao et al. [15] integrated four tests ($\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$) in the software toolset XWAS. In both $\mathrm{FM}_{01}$ and $\mathrm{FM}_{02}$, the three female genotypes are coded as 0, 1, and 2, while the two male genotypes are coded as 0 and 1 for $\mathrm{FM}_{01}$ and as 0 and 2 for $\mathrm{FM}_{02}$. In the latter, males are treated as homozygous females to reflect the dosage compensation relationship between the two sexes. Hence, $\mathrm{FM}_{01}$ and $\mathrm{FM}_{02}$ assume that the underlying XCI patterns are XCI-E and XCI-R, respectively. On the other hand, $\mathrm{FM}_{F}$ and $\mathrm{FM}_{S}$ build logistic regressions for females and males separately and then combine the two p values using Fisher’s and Stouffer’s methods, respectively. However, these two methods do not take any XCI patterns into consideration and thus may suffer from substantial power loss if the test marker is undergoing XCI. Wang et al. [14] proposed another approach (denoted by maxLR) that can consider four special XCI patterns simultaneously: XCI-R, XCI-E, XCI-S fully toward the normal allele (XCI-SN), and XCI-S fully toward the risk allele (XCI-SR). In their method, the three female genotypes are coded as 0, $\gamma$, and 2 under XCI, where $\gamma \in [0, 2]$ measures the degree of skewness of XCI. For instance, $\gamma = 0$ ($2$) means that all the risk (normal) alleles are inactivated in heterozygous females, which corresponds to the XCI-SN (XCI-SR) pattern. While maxLR has robust performance in power, its p value is evaluated by a permutation procedure, which is very computationally intensive, especially in GWAS. Hence, it is still desirable to develop a robust method that can both adjust for covariates and calculate the p value analytically.
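To make the combination step used by $\mathrm{FM}_{F}$ and $\mathrm{FM}_{S}$ concrete, the following Python sketch combines a female-specific and a male-specific p value with Fisher's and Stouffer's methods. It is only an illustration: the function names are hypothetical, the Stouffer version is unweighted and treats its inputs as one-sided p values, and the XWAS implementation may weight by sample size or handle effect directions differently.

```python
import math
from statistics import NormalDist


def fisher_combine(p_values):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under H0."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # half of the chi-square statistic
    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-x/2) * sum_{j<k} (x/2)^j / j!, so no external library is needed.
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))


def stouffer_combine(p_values):
    """Unweighted Stouffer's method applied to one-sided p values (assumption)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in p_values) / math.sqrt(len(p_values))
    return 1.0 - nd.cdf(z)


# Hypothetical sex-specific p values, for illustration only.
p_female, p_male = 0.05, 0.05
p_fisher = fisher_combine([p_female, p_male])
p_stouffer = stouffer_combine([p_female, p_male])
```

Neither combination uses the genotype coding across sexes, which is why these two tests carry no information about the XCI pattern.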
To fill this gap, this article proposes a novel statistical method to test the association between X chromosomal markers and a specific disease. Our method, which is also based on logistic regression, is robust because it does not require assigning a specific XCI pattern. Further, our method can compute the p value without any resampling procedure by directly using the rhombus formula. We conducted an extensive simulation study to compare the performance of our approach with the existing ones. Simulation results showed that our method controls the size well and maintains relatively high power across a variety of scenarios. Finally, we applied our proposed approach to Graves’ disease data to demonstrate its practical use.

2. Method

Consider an X-linked SNP with deleterious allele A and normal allele a. Then, there are three possible genotypes for females (aa, Aa, and AA) and two for males (a and A). We assume a binary variable $D$ for the disease of interest, with $D = 1$ ($0$) representing individuals with (without) the disease. $X = (x_1, \ldots, x_p)$ denotes the $p$ covariates that need to be adjusted for in the model, where $x_1 \equiv 1$ is the model intercept and $x_2$ is the binary sex variable, with 1 being female and 0 being male. We further assume that the relationship between the phenotype and genotype for individual $i$ can be described by the following logistic regression model:

$$\log \frac{\Pr(D_i = 1 \mid G_i, X_i)}{\Pr(D_i = 0 \mid G_i, X_i)} = X_i \alpha + \beta G_i, \tag{1}$$

where the subscript $i$ denotes the $i$th individual, $G$ is the genotypic score, and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)^{T}$ and $\beta$ are the regression coefficients for the covariates and the genotypic score, respectively. Note that the genotypic score depends on the underlying XCI pattern. According to the coding strategy of Wang et al. [14], $G_i$ can be written in the following uniform form:

$$G_i(Z_1, Z_2) = 2 I_i(AA) + Z_1 I_i(Aa) + Z_2 I_i(A),$$

where $I(\cdot)$ is the indicator function, and $Z_1$ and $Z_2$ are unknown parameters depending on the underlying XCI pattern. For instance, when the SNP is undergoing XCI, $Z_1$ and $Z_2$ are set to $\gamma$ and 2, respectively. In this coding strategy, $\gamma$ is a measure of the skewness of XCI, and males are treated as homozygous females to reflect the dosage compensation. Table 1 lists the genotypic scores for all five genotypes and the corresponding values of $Z_1$ and $Z_2$ under the four special XCI patterns.
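As a small illustration of this coding, the Python sketch below maps each of the five genotypes to its score $G(Z_1, Z_2)$. The dictionary and function names are hypothetical; the $(Z_1, Z_2)$ pairs are the ones implied by the text, namely $(\gamma, 2)$ with $\gamma = 0, 1, 2$ for XCI-SN, XCI-R, and XCI-SR, and $(1, 1)$ for XCI-E.

```python
# (Z1, Z2) for the four special XCI patterns, as implied by the coding above.
XCI_PATTERNS = {
    "XCI-SN": (0.0, 2.0),  # gamma = 0: risk allele inactivated in heterozygotes
    "XCI-R": (1.0, 2.0),   # gamma = 1: random inactivation
    "XCI-SR": (2.0, 2.0),  # gamma = 2: normal allele inactivated in heterozygotes
    "XCI-E": (1.0, 1.0),   # escape: males coded 0/1, females 0/1/2
}


def genotypic_score(genotype, z1, z2):
    """G(Z1, Z2) = 2*I(AA) + Z1*I(Aa) + Z2*I(A); genotypes aa and a score 0."""
    scores = {"AA": 2.0, "Aa": z1, "A": z2, "aa": 0.0, "a": 0.0}
    return scores[genotype]


# Scores of all five genotypes under each pattern.
score_table = {
    name: [genotypic_score(g, z1, z2) for g in ("aa", "Aa", "AA", "a", "A")]
    for name, (z1, z2) in XCI_PATTERNS.items()
}
```

Only the heterozygous-female score and the male risk-allele score change across patterns, which is what the two parameters $Z_1$ and $Z_2$ capture.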
We chose the score statistic to test the null hypothesis $\beta = 0$ because the association tests for all the SNPs share the same null model. For a total sample size of $n$, the score function can be derived as

$$U(Z_1, Z_2) = \sum_{i=1}^{n} G_i(Z_1, Z_2) \left[ D_i - \Pr(D_i = 1 \mid X_i) \right],$$

where $\Pr(D_i = 1 \mid X_i) = \exp(X_i \hat{\alpha}) / \left[ 1 + \exp(X_i \hat{\alpha}) \right]$ is the disease probability estimated for individual $i$ without considering the genotype (details of the derivation are given in Appendix A). The information matrix of (1) can be written as follows:

$$I(Z_1, Z_2) = \begin{pmatrix} I_{\beta}(Z_1, Z_2) & I_{\beta\alpha}(Z_1, Z_2) \\ I_{\beta\alpha}(Z_1, Z_2)^{T} & I_{\alpha} \end{pmatrix},$$

where

$$I_{\beta}(Z_1, Z_2) = \sum_{i=1}^{n} G_i(Z_1, Z_2)^2 \left[ 1 - \Pr(D_i = 1 \mid X_i) \right] \Pr(D_i = 1 \mid X_i),$$

$$I_{\beta\alpha}(Z_1, Z_2) = \left( \sum_{i=1}^{n} X_{i1} G_i(Z_1, Z_2) \left[ 1 - \Pr(D_i = 1 \mid X_i) \right] \Pr(D_i = 1 \mid X_i), \; \ldots, \; \sum_{i=1}^{n} X_{ip} G_i(Z_1, Z_2) \left[ 1 - \Pr(D_i = 1 \mid X_i) \right] \Pr(D_i = 1 \mid X_i) \right),$$

and

$$I_{\alpha} = \begin{pmatrix} \sum_{i=1}^{n} X_{i1}^2 \left[ 1 - \Pr(D_i = 1 \mid X_i) \right] \Pr(D_i = 1 \mid X_i) & \cdots & \sum_{i=1}^{n} X_{i1} X_{ip} \left[ 1 - \Pr(D_i = 1 \mid X_i) \right] \Pr(D_i = 1 \mid X_i) \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} X_{ip} X_{i1} \left[ 1 - \Pr(D_i = 1 \mid X_i) \right] \Pr(D_i = 1 \mid X_i) & \cdots & \sum_{i=1}^{n} X_{ip}^2 \left[ 1 - \Pr(D_i = 1 \mid X_i) \right] \Pr(D_i = 1 \mid X_i) \end{pmatrix}.$$

Under the null hypothesis $\beta = 0$, we have

$$W_S = U(Z_1, Z_2) \, V(Z_1, Z_2)^{-1} \, U(Z_1, Z_2) \sim \chi^2_1,$$

where $V(Z_1, Z_2) = I_{\beta}(Z_1, Z_2) - I_{\beta\alpha}(Z_1, Z_2) \, I_{\alpha}^{-1} \, I_{\beta\alpha}(Z_1, Z_2)^{T}$ is the estimated variance of $U(Z_1, Z_2)$. Therefore, the statistic

$$S(Z_1, Z_2) = \frac{U(Z_1, Z_2)}{\sqrt{V(Z_1, Z_2)}}$$

asymptotically follows a standard normal distribution under the null hypothesis.
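The computation of $S(Z_1, Z_2)$ can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the R function provided with this article: the function names are hypothetical, and the null model is fitted by a plain Newton-Raphson iteration, which is one common choice. The null fit is done once and its fitted probabilities are reused for every SNP and every coding.

```python
import numpy as np


def fit_null_logistic(X, D, n_iter=25, tol=1e-10):
    """Newton-Raphson fit of logit Pr(D=1|X) = X @ alpha (no genotype term).

    Returns the fitted null probabilities Pr(D_i = 1 | X_i).
    """
    alpha = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ alpha))
        w = mu * (1.0 - mu)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (D - mu))
        alpha += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-X @ alpha))


def score_statistic(G, X, D, mu):
    """S = U / sqrt(V) for one genotype coding G, given null probabilities mu."""
    w = mu * (1.0 - mu)                 # [1 - Pr(D=1|X)] * Pr(D=1|X)
    U = np.sum(G * (D - mu))            # score function
    I_b = np.sum(G ** 2 * w)            # I_beta (scalar)
    I_ba = X.T @ (G * w)                # I_beta_alpha (length-p vector)
    I_a = X.T @ (X * w[:, None])        # I_alpha (p x p matrix)
    V = I_b - I_ba @ np.linalg.solve(I_a, I_ba)
    return U / np.sqrt(V)
```

Here `X` is an n x p array whose first column is the intercept, `D` is a 0/1 float array, and `G` holds the scores $G_i(Z_1, Z_2)$ for one choice of $(Z_1, Z_2)$.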
Note that the calculation of the test statistic relies on the underlying XCI pattern. Unfortunately, this is generally unknown for a specific SNP. We therefore propose a robust test, referred to as XCMAX4, to account for the four special XCI models. The XCMAX4 statistic is defined as follows:

$$\mathrm{XCMAX4} = \max \left\{ |S(0, 2)|, \; |S(1, 2)|, \; |S(2, 2)|, \; |S(1, 1)| \right\}.$$

Due to the correlation between the four score tests, XCMAX4 does not follow any classical distribution. We assume that $S(0, 2)$, $S(1, 2)$, $S(2, 2)$, and $S(1, 1)$ jointly asymptotically follow a multivariate normal distribution $N(0, \Sigma)$, where $0$ is a four-dimensional vector with all elements being 0, and $\Sigma$ is the correlation matrix with

$$\Sigma = \begin{pmatrix} 1 & \rho_{(0,2),(1,2)} & \rho_{(0,2),(2,2)} & \rho_{(0,2),(1,1)} \\ \rho_{(1,2),(0,2)} & 1 & \rho_{(1,2),(2,2)} & \rho_{(1,2),(1,1)} \\ \rho_{(2,2),(0,2)} & \rho_{(2,2),(1,2)} & 1 & \rho_{(2,2),(1,1)} \\ \rho_{(1,1),(0,2)} & \rho_{(1,1),(1,2)} & \rho_{(1,1),(2,2)} & 1 \end{pmatrix}.$$

In the above correlation matrix, $\rho_{(z_{11}, z_{21}),(z_{12}, z_{22})}$ is the correlation coefficient between $S(Z_{11}, Z_{21})$ and $S(Z_{12}, Z_{22})$. Given $\Sigma$, we can analytically derive the p value of XCMAX4. Particularly, let $f(y, 0, \Sigma)$ be the density function of the multivariate normal distribution $N(0, \Sigma)$; then, for a given $z > 0$, the p value of XCMAX4 is calculated by

$$\Pr(\mathrm{XCMAX4} > z) = 1 - \int_{-z}^{z} \cdots \int_{-z}^{z} f(y, 0, \Sigma) \, dy.$$
Next, we need to accurately estimate the correlation matrix $\Sigma$. To this end, we first build a new model that contains two parameters representing genetic effects as follows:

$$\log \frac{\Pr(D_i = 1 \mid G_i, X_i)}{\Pr(D_i = 0 \mid G_i, X_i)} = X_i \alpha + \beta_1 G_i(Z_{11}, Z_{21}) + \beta_2 G_i(Z_{12}, Z_{22}). \tag{2}$$

The information matrix of (2) can be expressed as follows:

$$I(Z_{11}, Z_{21}, Z_{12}, Z_{22}) = \begin{pmatrix} I_{\beta_1 \beta_2}(Z_{11}, Z_{21}, Z_{12}, Z_{22}) & I_{\beta_1 \beta_2 \alpha}(Z_{11}, Z_{21}, Z_{12}, Z_{22}) \\ I_{\beta_1 \beta_2 \alpha}(Z_{11}, Z_{21}, Z_{12}, Z_{22})^{T} & I_{\alpha} \end{pmatrix},$$

where, writing $w_i = \left[ 1 - \Pr(D_i = 1 \mid X_i) \right] \Pr(D_i = 1 \mid X_i)$ for brevity,

$$I_{\beta_1 \beta_2}(Z_{11}, Z_{21}, Z_{12}, Z_{22}) = \begin{pmatrix} \sum_{i=1}^{n} G_i(Z_{11}, Z_{21})^2 \, w_i & \sum_{i=1}^{n} G_i(Z_{11}, Z_{21}) \, G_i(Z_{12}, Z_{22}) \, w_i \\ \sum_{i=1}^{n} G_i(Z_{11}, Z_{21}) \, G_i(Z_{12}, Z_{22}) \, w_i & \sum_{i=1}^{n} G_i(Z_{12}, Z_{22})^2 \, w_i \end{pmatrix},$$

and

$$I_{\beta_1 \beta_2 \alpha}(Z_{11}, Z_{21}, Z_{12}, Z_{22}) = \begin{pmatrix} \sum_{i=1}^{n} X_{i1} G_i(Z_{11}, Z_{21}) \, w_i & \cdots & \sum_{i=1}^{n} X_{ip} G_i(Z_{11}, Z_{21}) \, w_i \\ \sum_{i=1}^{n} X_{i1} G_i(Z_{12}, Z_{22}) \, w_i & \cdots & \sum_{i=1}^{n} X_{ip} G_i(Z_{12}, Z_{22}) \, w_i \end{pmatrix}.$$

Under the null hypothesis $\beta_1 = \beta_2 = 0$, the statistic

$$W_S = \left( U(Z_{11}, Z_{21}), \, U(Z_{12}, Z_{22}) \right) C(Z_{11}, Z_{21}, Z_{12}, Z_{22})^{-1} \left( U(Z_{11}, Z_{21}), \, U(Z_{12}, Z_{22}) \right)^{T}$$

asymptotically follows a chi-square distribution with two degrees of freedom, where

$$C(Z_{11}, Z_{21}, Z_{12}, Z_{22}) = I_{\beta_1 \beta_2}(Z_{11}, Z_{21}, Z_{12}, Z_{22}) - I_{\beta_1 \beta_2 \alpha}(Z_{11}, Z_{21}, Z_{12}, Z_{22}) \, I_{\alpha}^{-1} \, I_{\beta_1 \beta_2 \alpha}(Z_{11}, Z_{21}, Z_{12}, Z_{22})^{T}$$

is the covariance matrix of $U(Z_{11}, Z_{21})$ and $U(Z_{12}, Z_{22})$. Therefore, the correlation coefficient between $S(Z_{11}, Z_{21})$ and $S(Z_{12}, Z_{22})$ can be estimated as

$$\hat{\rho}_{(z_{11}, z_{21}),(z_{12}, z_{22})} = \frac{(1, 0) \, C(Z_{11}, Z_{21}, Z_{12}, Z_{22}) \, (0, 1)^{T}}{\sqrt{(1, 0) \, C(Z_{11}, Z_{21}, Z_{12}, Z_{22}) \, (1, 0)^{T} \times (0, 1) \, C(Z_{11}, Z_{21}, Z_{12}, Z_{22}) \, (0, 1)^{T}}}.$$
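The pairwise correlation estimates can be collected in one pass, as in the Python sketch below (NumPy assumed; the function name is hypothetical). Instead of building a separate 2 x 2 matrix for each pair of codings, it forms the K x K analogue of $C$ for all codings at once. Each off-diagonal entry depends only on the two codings involved, so it coincides with the pairwise estimate described above.

```python
import numpy as np


def score_correlation(G_mat, X, mu):
    """Estimated correlation matrix of the score statistics.

    G_mat : n x K array, one column of genotypic scores per coding (Z1, Z2)
    X     : n x p covariate matrix (first column is the intercept)
    mu    : fitted null probabilities Pr(D_i = 1 | X_i)
    """
    w = mu * (1.0 - mu)
    I_bb = G_mat.T @ (G_mat * w[:, None])   # K x K block for the genetic terms
    I_ba = G_mat.T @ (X * w[:, None])       # K x p cross block
    I_a = X.T @ (X * w[:, None])            # p x p covariate block
    C = I_bb - I_ba @ np.linalg.solve(I_a, I_ba.T)   # covariance of the scores
    sd = np.sqrt(np.diag(C))
    return C / np.outer(sd, sd)
```

With the four codings $(0, 2)$, $(1, 2)$, $(2, 2)$, and $(1, 1)$ as columns of `G_mat`, the result is an estimate of the 4 x 4 matrix $\Sigma$.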
Once $\Sigma$ is estimated, we can calculate the p value of XCMAX4. Although the four-dimensional integral can be calculated in commonly used software (e.g., the mvtnorm package in R, https://cran.r-project.org/web/packages/mvtnorm/index.html, (accessed on 10 April 2022)), the algorithm based on the Quasi-Monte-Carlo procedure needs a lot of computing resources to achieve relatively high accuracy. Hence, it is still desirable to obtain an analytic form if possible. Fortunately, we can use the rhombus formula [13,21] to obtain an upper bound of the p value of XCMAX4 as follows:

$$P(\mathrm{XCMAX4} > z) \le 2 \left[ \Phi(z) - \Phi(-z) - 1 \right] + \frac{4 \phi(z)}{z} \sum_{i=1}^{3} \left[ \Phi\!\left( \frac{L_{i,i+1} \, z}{2} \right) + \Phi\!\left( \frac{(\pi - L_{i,i+1}) \, z}{2} \right) - 1 \right],$$

where $\Phi(x)$ and $\phi(x)$ denote the cumulative distribution function and probability density function of the standard normal distribution, respectively, and $L_{i,i+1} = \arccos(\rho_{i,i+1})$, where $\rho_{i,i+1}$ is the correlation coefficient between the $i$th and $(i+1)$th score statistics. Note that the order of the four test statistics $S(0, 2)$, $S(1, 2)$, $S(2, 2)$, and $S(1, 1)$ is not specified in the above formula, so 12 different upper bounds can be obtained. We adopt the smallest of these bounds as the approximation of the p value. As shown in Wang et al. [13], this approximation is very accurate for small p values, which is quite useful in GWAS because the significance level is generally very stringent (e.g., $5 \times 10^{-8}$) in such studies.

3. Simulation Study

3.1. Simulation Settings

We conducted comprehensive simulation studies to compare the performance of XCMAX4 with $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$, all of which can adjust for covariates. Note that we did not include maxLR in our simulations because this method is permutation-based and would be too time-consuming for GWAS. The data are simulated from the following model:

$$\log \frac{\Pr(D_i = 1 \mid G_i, x_{i2}, x_{i3})}{\Pr(D_i = 0 \mid G_i, x_{i2}, x_{i3})} = \alpha_1 + \alpha_2 x_{i2} + \alpha_3 x_{i3} + \beta G_i,$$

where $x_2$ is the binary covariate sex, $x_3$ is a continuous covariate sampled from the uniform distribution $U(0, 1)$, and $G$ is the genotypic score. The ratio of males to females is assumed to be $1:1$ in the general population, so $x_2$ follows a binomial distribution $B(0.5)$. Further, we assume that the genotype of females (aa, Aa, AA) follows a trinomial distribution with probabilities $(q_{f0}, q_{f1}, q_{f2})$, while the genotype of males (a, A) follows a binomial distribution with probabilities $(1 - q_m, q_m)$. Let $q_f$ and $F$ be the risk allele frequency and the inbreeding coefficient for females, respectively. Then, we have $q_{f0} = (1 - q_f)^2 + F q_f (1 - q_f)$, $q_{f1} = 2 (1 - F) q_f (1 - q_f)$, and $q_{f2} = q_f^2 + F q_f (1 - q_f)$. The values of $q_f$ and $q_m$ are each set to 0.1, 0.2, and 0.3, so there are nine combinations in total. $F$ is set to 0 and 0.05, where the former implies Hardy–Weinberg equilibrium (HWE) and the latter represents a scenario of Hardy–Weinberg disequilibrium (HWD). The intercept $\alpha_1$ is fixed at $-5$. For the coefficients of $x_2$ and $x_3$, we consider two cases each: $\alpha_2 = -0.4005, 0.4005$ and $\alpha_3 = 0.5, 1.5$. The genetic effect $\beta$ is set to 0, 0.1116, 0.15, and 0.1858, where $\beta = 0$ means no association between the SNP and the disease status, and the other three values of $\beta$ correspond to odds ratios of about 1.25, 1.35, and 1.45 for females with genotype AA. The case of $\beta = 0$ is used to study the size, while the empirical power is investigated in the non-zero $\beta$ cases.
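Two of these settings are easy to check numerically. The Python sketch below (function names are hypothetical) computes the female genotype probabilities for a given $q_f$ and $F$, and the odds ratio of AA versus aa females, which equals $\exp(2\beta)$ because the genotypic score of AA is 2 and that of aa is 0.

```python
import math


def female_genotype_probs(q_f, F):
    """Return (q_f0, q_f1, q_f2) for the female genotypes (aa, Aa, AA)."""
    het = q_f * (1.0 - q_f)
    return ((1.0 - q_f) ** 2 + F * het,
            2.0 * (1.0 - F) * het,
            q_f ** 2 + F * het)


def odds_ratio_AA(beta):
    """Odds ratio of AA versus aa females under the logistic model: exp(2 * beta)."""
    return math.exp(2.0 * beta)


probs_hwe = female_genotype_probs(0.3, 0.0)    # Hardy-Weinberg equilibrium
probs_hwd = female_genotype_probs(0.3, 0.05)   # Hardy-Weinberg disequilibrium
odds_ratios = [odds_ratio_AA(b) for b in (0.1116, 0.15, 0.1858)]
```

A positive $F$ moves probability mass from the heterozygote to the two homozygotes while leaving the allele frequency $q_f$ unchanged.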
Note that, when studying the power, we only consider three combinations of $(q_f, q_m)$ for convenience: $(0.3, 0.3)$, $(0.3, 0.2)$, and $(0.2, 0.3)$. Both the scenario in which the SNP undergoes XCI and the scenario in which it escapes from XCI are considered. For the former, we let $\gamma$ range from 0 to 2 in increments of 0.5. As such, we have considered various XCI patterns, including XCI-SN, XCI-R, and XCI-SR. Once the XCI pattern is assumed, we can assign the corresponding value of the genotypic score $G$.
Given the covariates, the genotypic score, and the regression coefficients, we generate the disease status from the binomial distribution for a large population. Then, we randomly sample 2500 cases and 2500 controls from this population. We find that when $\alpha_2 = \pm 0.4005$, the proportion of females among cases varied from 40% to 60% in the simulated data. The size is estimated at three nominal levels, $\alpha = 1 \times 10^{-3}$, $1 \times 10^{-4}$, and $1 \times 10^{-5}$, based on 1,000,000 replicates, while the power is only estimated at the nominal level $\alpha = 1 \times 10^{-4}$ based on 10,000 replicates. The p value of XCMAX4 is evaluated by using the rhombus formula.

3.2. Results

3.2.1. Size

Table 2 shows the estimated type I error rates at the nominal significance level $\alpha = 1 \times 10^{-4}$ when HWE holds in the female population. As expected, all the methods controlled the size well in all the scenarios. Although XCMAX4 appears slightly conservative in some scenarios, its estimated type I error rates are close to the nominal level. We also simulated the scenarios of HWD ($F = 0.05$). The performances of all the tests were similar to those in Table 2, so HWD in females had little impact on the size. Therefore, the simulation results with non-zero $F$ are presented in the Supplementary Material (Table S1). The type I error rates estimated at the nominal levels $\alpha = 1 \times 10^{-3}$ and $\alpha = 1 \times 10^{-5}$ are also given in the Supplementary Material (Tables S2–S5). As can be seen, XCMAX4 still had the correct size in general, except for being slightly conservative at $\alpha = 1 \times 10^{-3}$.

3.2.2. Power

Figure 1, Figure 2 and Figure 3 plot the powers of XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ under various XCI patterns when $\beta = 0.15$, $F = 0$, and $(q_f, q_m) = (0.3, 0.3)$, $(0.3, 0.2)$, and $(0.2, 0.3)$, respectively. Within each figure, the four subfigures exhibited a similar pattern in power, indicating that the covariates had a very limited impact on the performance of all methods.
In Figure 1, we can see that $\mathrm{FM}_{01}$ and $\mathrm{FM}_{F}$ were generally less powerful than the other methods in all situations. XCMAX4 performed best when $\gamma = 0$ (XCI-SN) and $\gamma = 2$ (XCI-SR). However, when $\gamma = 1$ (XCI-R), $\mathrm{FM}_{02}$ was the most powerful, followed by $\mathrm{FM}_{S}$ and XCMAX4. This was expected because $\mathrm{FM}_{02}$ is designed exactly for XCI-R. We also observed that XCMAX4 had better power than $\mathrm{FM}_{S}$ when $\gamma = 0.5$, while $\mathrm{FM}_{S}$ performed slightly better than XCMAX4 when $\gamma = 1.5$. In both scenarios, $\mathrm{FM}_{02}$ was still the most powerful method, but the differences in power between these three methods were generally very small. The results in Figure 2 and Figure 3 are analogous to those in Figure 1, so the allele frequencies of females and males did not noticeably change the power profiles of the methods.
Figure 4 plots the powers of XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ under the XCI-E pattern with $\beta = 0.15$. Based on this figure, $\mathrm{FM}_{01}$ was uniformly the most powerful in all scenarios, as expected, followed by $\mathrm{FM}_{S}$ and XCMAX4. $\mathrm{FM}_{02}$ was generally less powerful than $\mathrm{FM}_{01}$, $\mathrm{FM}_{S}$, and XCMAX4, but still performed better than $\mathrm{FM}_{F}$. The power results with $\beta = 0.15$ and $F = 0.05$ are provided in the Supplementary Material (Figures S1–S4); they are similar to those in Figure 1, Figure 2, Figure 3 and Figure 4, indicating that HWD in females had little effect on the power results. The power results with $\beta = 0.1116$ and $0.1858$ are generally consistent with those in Figure 1, Figure 2, Figure 3 and Figure 4, implying that the properties of XCMAX4 did not vary with the magnitude of the genetic effect (see Figures S5–S20 in the Supplementary Material). As expected, when the value of $\beta$ increased, the powers of all methods uniformly increased.
In conclusion, $\mathrm{FM}_{01}$ and $\mathrm{FM}_{02}$ can have high power if the underlying XCI pattern is modelled correctly but may be less powerful in other scenarios. In contrast, XCMAX4 retained relatively good power across a variety of scenarios. Compared to XCMAX4, $\mathrm{FM}_{S}$ may suffer from power loss if the SNP is undergoing XCI but will be more powerful under XCI-E. $\mathrm{FM}_{F}$ had the worst overall performance and thus is not recommended. It should be noted that $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ fit a logistic regression for each SNP, which is more computationally intensive than XCMAX4 in GWAS because fitting the logistic regression requires additional iterations. When testing 2000 SNPs, XCMAX4 took half the time of the other four methods. The details of the time comparisons are given in the Supplementary Material (Table S6).

4. Application to Graves’ Disease Data

Graves’ disease (GD) is an autoimmune disease causing hyperthyroidism that is four times more common in women than in men [22,23]. Numerous studies have shown that the genetic background explains about four-fifths of the susceptibility to GD.
Considering the distinct gender bias, it is reasonable to speculate that genes on the X chromosome play an important role in the development of GD. Recently, two independent studies found that rs3827440, a non-synonymous SNP of the GPR174 gene on the X chromosome, was associated with GD. A two-stage GWAS, focused on the Han population in China, first reported this finding, which was further validated in two Caucasian cohorts. There are two alleles at rs3827440, with T being the risk allele and C being the normal one. Table 3 displays the four datasets on rs3827440 used in these two studies. We applied XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ to each dataset; the results are shown in Table 4. Note that sex was included as a covariate when calculating the p values of XCMAX4, $\mathrm{FM}_{01}$, and $\mathrm{FM}_{02}$.
Table 4 indicates that none of these methods uniformly performed the best across all four datasets. For the two datasets from the Chinese population, all methods consistently showed that rs3827440 was associated with GD at the $1 \times 10^{-4}$ significance level. Among these tests, XCMAX4 consistently had the second smallest p values. However, the p values of all the methods from both Caucasian datasets suggested no such association at the same significance level, probably because of their relatively small sample sizes. We also observed that XCMAX4 appeared slightly conservative in these scenarios, but this was not surprising because the rhombus formula is less accurate when the p value is greater than 0.01.
Because both the Han population and the Caucasian population contained two datasets, we also tested the association at the population level by treating the data source as an additional covariate. The corresponding results are given in Table 5 and are similar to those in Table 4.

5. Discussion

This paper proposed a novel robust method, XCMAX4, to test the association between a marker on the X chromosome and a specific disease in a case-control design. Our method is an extension of the CMAX3 test [24] to the X chromosome, which can both incorporate the information of XCI and allow for covariates. Unlike the maxLR proposed by Wang et al. [14], XCMAX4 is constructed by using the score test, which is more efficient in GWAS because we only need to fit the null model once. Moreover, the maxLR requires permutation to calculate the p value, which makes it unappealing in GWAS. In contrast, the p value of XCMAX4 can be computed analytically by using the rhombus formula. On the other hand, although $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ can also adjust for covariates, they do not take various XCI models into consideration and thus may suffer from substantial power loss in some scenarios. However, XCMAX4 can retain relatively high power by accounting for four special XCI patterns simultaneously. Simulation results showed that XCMAX4 controlled the size well and had robust performance in power. Therefore, we recommend using XCMAX4 for its effectiveness, robustness, and generality. Finally, to help implement XCMAX4 in practice, we provide an R function XCMAX4, which is available at https://github.com/YoupengSU/XCMAX4.git (accessed on 12 April 2022).

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes13050847/s1, Table S1: Estimated type I error rate at the nominal significance level $1 \times 10^{-4}$ for XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ against $q_f$, $q_m$, $\alpha_2$, and $\alpha_3$ based on 1,000,000 replicates when $F = 0.05$; Tables S2–S5: Estimated type I error rates at the nominal significance levels $1 \times 10^{-3}$ and $1 \times 10^{-5}$; Table S6: Time used to test 2000 SNPs with a sample size of 5000; Figures S1–S4: Powers of XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ when $F = 0.05$; Figures S5–S20: Powers of XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ when $\beta = 0.1116$ and $0.1858$.

Author Contributions

Conceptualization, P.W. and Y.S.; methodology, Y.S.; validation, P.W. and J.H.; formal analysis, Y.S. and J.H.; writing—original draft, Y.S. and J.H.; writing—review and editing, P.W. and J.H.; visualization S.C., Z.C., and M.D.; supervision, P.W.; funding acquisition, P.Y. and H.J. All authors have read and agreed to the published version of the manuscript.

Funding

The study was supported by the National Natural Science Foundation of China (No. 82173628) and the National Key R&D Program of China (No. 2018YFE0206900).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The real data used in this study are available from two published papers at https://dx.doi.org/10.1136%2Fjmedgenet-2013-101595 and https://doi.org/10.1111/tan.12259 (accessed on 7 April 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Detailed Derivation of the Score Statistic

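The log-likelihood function of Model (1) can be written as

$$\begin{aligned} l(\beta, \alpha) &= \log \prod_{i=1}^{n} \mu_i^{D_i} (1 - \mu_i)^{1 - D_i} = \sum_{i=1}^{n} \left[ D_i \log \mu_i + (1 - D_i) \log (1 - \mu_i) \right] \\ &= \sum_{i=1}^{n} \left[ D_i \log \frac{\mu_i}{1 - \mu_i} + \log (1 - \mu_i) \right] \\ &= \sum_{i=1}^{n} \left[ D_i (X_i \alpha + \beta G_i) - \log \left( 1 + e^{X_i \alpha + \beta G_i} \right) \right], \end{aligned}$$

where $\mu_i = \exp(X_i \alpha + \beta G_i) / \left[ 1 + \exp(X_i \alpha + \beta G_i) \right]$ represents the probability of having the disease for the $i$th individual.
Assume that $\hat{\theta}_0 = (0, \hat{\alpha}^{T})^{T}$ is the restricted maximum likelihood estimate of $\theta = (\beta, \alpha^{T})^{T}$ under the condition $\beta = 0$. Then the score function and Fisher’s information matrix can be given as

$$U(\hat{\theta}_0) = \left. \frac{\partial l(\beta, \alpha)}{\partial \theta} \right|_{\theta = \hat{\theta}_0} = \left. \left( \frac{\partial l(\alpha, \beta)}{\partial \beta}, \frac{\partial l(\alpha, \beta)}{\partial \alpha^{T}} \right)^{T} \right|_{\theta = \hat{\theta}_0} = \left. \left( \frac{\partial l(\alpha, \beta)}{\partial \beta}, 0 \right)^{T} \right|_{\theta = \hat{\theta}_0} = \left( \sum_{i=1}^{n} G_i \left[ D_i - \frac{e^{X_i \hat{\alpha}}}{1 + e^{X_i \hat{\alpha}}} \right], 0 \right)^{T} = \left( U(Z_1, Z_2), 0 \right)^{T},$$

and

$$I(\hat{\theta}) = \left. -E \left[ \frac{\partial^2 l(\theta)}{\partial \theta \, \partial \theta^{T}} \right] \right|_{\theta = \hat{\theta}_0} = \begin{pmatrix} \sum_{i=1}^{n} \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2} G_i G_i & \cdots & \sum_{i=1}^{n} \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2} X_{ip} G_i \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2} G_i X_{ip} & \cdots & \sum_{i=1}^{n} \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2} X_{ip} X_{ip} \end{pmatrix} = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix},$$

where

$$I_{11} = \sum_{i=1}^{n} \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2} G_i G_i, \quad I_{12} = \left( \sum_{i=1}^{n} X_{i1} G_i \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2}, \; \ldots, \; \sum_{i=1}^{n} X_{ip} G_i \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2} \right), \quad I_{21} = I_{12}^{T},$$

and

$$I_{22} = \begin{pmatrix} \sum_{i=1}^{n} \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2} X_{i1} X_{i1} & \cdots & \sum_{i=1}^{n} \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2} X_{i1} X_{ip} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2} X_{ip} X_{i1} & \cdots & \sum_{i=1}^{n} \frac{e^{X_i \hat{\alpha}}}{(1 + e^{X_i \hat{\alpha}})^2} X_{ip} X_{ip} \end{pmatrix}.$$

Following Cox and Hinkley [25], and writing $A = I_{11} - I_{12} I_{22}^{-1} I_{21}$ for brevity, we can obtain the score test statistic as

$$\begin{aligned} W_S &= U(\hat{\theta}_0)^{T} I(\hat{\theta})^{-1} U(\hat{\theta}_0) = U(\hat{\theta}_0)^{T} \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix}^{-1} U(\hat{\theta}_0) \\ &= U(\hat{\theta}_0)^{T} \begin{pmatrix} A^{-1} & -A^{-1} I_{12} I_{22}^{-1} \\ -I_{22}^{-1} I_{21} A^{-1} & I_{22}^{-1} + I_{22}^{-1} I_{21} A^{-1} I_{12} I_{22}^{-1} \end{pmatrix} U(\hat{\theta}_0) \\ &= U(Z_1, Z_2) \left( I_{11} - I_{12} I_{22}^{-1} I_{21} \right)^{-1} U(Z_1, Z_2), \end{aligned}$$

which asymptotically follows a chi-square distribution with one degree of freedom. In Model (2), $\beta$ becomes a two-dimensional vector, and the proofs are similar, so the details are omitted.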

References

  1. Voskuhl, R. Sex differences in autoimmune diseases. Biol. Sex Differ. 2011, 2, 1.
  2. Appelman, Y.; van Rijn, B.B.; Ten Haaf, M.E.; Boersma, E.; Peters, S.A. Sex differences in cardiovascular risk factors and disease prevention. Atherosclerosis 2015, 241, 211–218.
  3. Riecher-Rossler, A. Sex and gender differences in mental disorders. Lancet Psychiatry 2017, 4, 8–9.
  4. Dong, M.; Cioffi, G.; Wang, J.; Waite, K.A.; Ostrom, Q.T.; Kruchko, C.; Lathia, J.D.; Rubin, J.B.; Berens, M.E.; Connor, J.; et al. Sex Differences in Cancer Incidence and Survival: A Pan-Cancer Analysis. Cancer Epidemiol. Biomark. Prev. 2020, 29, 1389–1397.
  5. Erol, A.; Winham, S.J.; McElroy, S.L.; Frye, M.A.; Prieto, M.L.; Cuellar-Barboza, A.B.; Fuentes, M.; Geske, J.; Mori, N.; Biernacka, J.M.; et al. Sex differences in the risk of rapid cycling and other indicators of adverse illness course in patients with bipolar I and II disorder. Bipolar Disord. 2015, 17, 670–676.
  6. Lu, Z.; Carter, A.C.; Chang, H.Y. Mechanistic insights in X-chromosome inactivation. Philos. Trans. R Soc. Lond. B Biol. Sci. 2017, 372, 356.
  7. Lyon, M.F. Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 1961, 190, 372–373.
  8. Carrel, L.; Willard, H.F. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 2005, 434, 400–404.
  9. Fang, H.; Disteche, C.M.; Berletch, J.B. X Inactivation and Escape: Epigenetic and Structural Features. Front. Cell Dev. Biol. 2019, 7, 219.
  10. Disteche, C.M. Dosage compensation of the sex chromosomes and autosomes. Semin. Cell Dev. Biol. 2016, 56, 9–18.
  11. Cantone, I.; Fisher, A.G. Human X chromosome inactivation and reactivation: Implications for cell reprogramming and disease. Philos. Trans. R Soc. Lond. B Biol. Sci. 2017, 372, 358.
  12. Wang, P.; Zhang, Y.; Wang, B.Q.; Li, J.L.; Wang, Y.X.; Pan, D.; Wu, X.B.; Fung, W.K.; Zhou, J.Y. A statistical measure for the skewness of X chromosome inactivation based on case-control design. BMC Bioinform. 2019, 20, 11.
  13. Wang, P.; Xu, S.Q.; Wang, B.Q.; Fung, W.K.; Zhou, J.Y. A robust and powerful test for case-control genetic association study on X chromosome. Stat. Methods Med. Res. 2019, 28, 3260–3272.
  14. Wang, J.; Yu, R.; Shete, S. X-chromosome genetic association test accounting for X-inactivation, skewed X-inactivation, and escape from X-inactivation. Genet. Epidemiol. 2014, 38, 483–493.
  15. Gao, F.; Chang, D.; Biddanda, A.; Ma, L.; Guo, Y.; Zhou, Z.; Keinan, A. XWAS: A Software Toolset for Genetic Data Analysis and Association Studies of the X Chromosome. J. Hered. 2015, 106, 666–671.
  16. Clayton, D. Testing for association on the X chromosome. Biostatistics 2008, 9, 593–600.
  17. Zheng, G.; Joo, J.; Zhang, C.; Geller, N.L. Testing association for markers on the X chromosome. Genet. Epidemiol. 2007, 31, 834–843.
  18. Chen, Z.; Ng, H.K.; Li, J.; Liu, Q.; Huang, H. Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies. Stat. Methods Med. Res. 2017, 26, 567–582.
  19. Clayton, D.G. Sex chromosomes and genetic association studies. Genome Med. 2009, 1, 110.
  20. Price, A.L.; Patterson, N.J.; Plenge, R.M.; Weinblatt, M.E.; Shadick, N.A.; Reich, D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006, 38, 904–909.
  21. Li, Q.Z.; Zheng, G.; Li, Z.H.; Yu, K. Efficient approximation of P-value of the maximum of correlated tests, with applications to genome-wide association studies. Ann. Hum. Genet. 2008, 72, 397–406.
  22. Chu, X.; Shen, M.; Xie, F.; Miao, X.J.; Shou, W.H.; Liu, L.; Yang, P.P.; Bai, Y.N.; Zhang, K.Y.; Yang, L.; et al. An X chromosome-wide association analysis identifies variants in GPR174 as a risk factor for Graves’ disease. J. Med. Genet. 2013, 50, 479–485.
  23. Szymanski, K.; Miskiewicz, P.; Pirko, K.; Jurecka-Lubieniecka, B.; Kula, D.; Hasse-Lazar, K.; Krajewski, P.; Bednarczuk, T.; Ploski, R. rs3827440, a nonsynonymous single nucleotide polymorphism within GPR174 gene in X chromosome, is associated with Graves’ disease in Polish Caucasian population. Tissue Antigens 2014, 83, 41–44.
  24. Chen, Z.; Zang, Y. CMAX3: A Robust Statistical Test for Genetic Association Accounting for Covariates. Genes 2021, 12, 1723.
  25. Cox, D.R.; Hinkley, D.V. Theoretical Statistics; CRC Press: Boca Raton, FL, USA, 1979.
Figure 1. Estimated powers of XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ under various XCI models. The simulation was based on 10,000 replicates with $\beta = 0.15$, $\alpha_1 = 5$, $F = 0$, and $q_f = q_m = 0.3$.
Figure 2. Estimated powers of XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ under various XCI models. The simulation was based on 10,000 replicates with $\beta = 0.15$, $\alpha_1 = 5$, $F = 0$, $q_f = 0.3$, and $q_m = 0.2$.
Figure 3. Estimated powers of XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ under various XCI models. The simulation was based on 10,000 replicates with $\beta = 0.15$, $\alpha_1 = 5$, $F = 0$, $q_f = 0.2$, and $q_m = 0.3$.
Figure 4. Powers of XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ under XCI-E. The simulation was based on 10,000 replicates with $\beta = 0.15$, $\alpha_1 = 5$, and $F = 0$. On the horizontal axis, “A”, “B”, and “C” represent three combinations of $(q_f, q_m)$: $(0.3, 0.3)$, $(0.3, 0.2)$, and $(0.2, 0.3)$, respectively, and the numbers 1–4 represent four combinations of $(\alpha_2, \alpha_3)$: $(0.4055, 0.5)$, $(0.4055, 1.5)$, $(0.4055, 0.5)$, and $(0.4055, 1.5)$, respectively.
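The figure captions fix the allele frequencies $q_f$ (females) and $q_m$ (males) and set $F = 0$. As a minimal sketch of how genotypes could be drawn under such a setting, the Python snippet below assumes that $F$ is an inbreeding coefficient, so that $F = 0$ corresponds to Hardy-Weinberg equilibrium in females, and that each male carries the allele A with probability $q_m$. The function names `female_genotype_probs` and `simulate_genotypes` are hypothetical, and this is not claimed to be the simulation code behind the figures.

```python
import random


def female_genotype_probs(q, F=0.0):
    """Return (P(aa), P(Aa), P(AA)) for allele-A frequency q.

    Assumption: F is an inbreeding coefficient; F = 0 gives
    Hardy-Weinberg proportions.
    """
    p = 1.0 - q
    return (p * p + F * p * q, 2.0 * p * q * (1.0 - F), q * q + F * p * q)


def simulate_genotypes(n_female, n_male, q_f, q_m, F=0.0, seed=0):
    """Draw genotypes coded as the number of A alleles carried.

    Females take values in {0, 1, 2}; males take values in {0, 1}.
    """
    rng = random.Random(seed)
    probs = female_genotype_probs(q_f, F)
    females = rng.choices([0, 1, 2], weights=probs, k=n_female)
    males = [1 if rng.random() < q_m else 0 for _ in range(n_male)]
    return females, males


if __name__ == "__main__":
    # Setting of Figure 2: q_f = 0.3, q_m = 0.2, F = 0.
    females, males = simulate_genotypes(1000, 1000, q_f=0.3, q_m=0.2)
    print(sum(females) / (2 * len(females)), sum(males) / len(males))
```

The printed values are the sample frequencies of A in females and males, which should lie near the chosen $q_f$ and $q_m$ for reasonably large samples.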
Table 1. The genotypic scores for five genotypes and their corresponding values of $Z_1$ and $Z_2$ under the four special XCI patterns.

| XCI Pattern | aa (female) | Aa (female) | AA (female) | a (male) | A (male) | $Z_1$ | $Z_2$ |
|---|---|---|---|---|---|---|---|
| XCI-SN | 0 | 0 | 2 | 0 | 2 | 0 | 2 |
| XCI-R | 0 | 1 | 2 | 0 | 2 | 1 | 2 |
| XCI-SR | 0 | 2 | 2 | 0 | 2 | 2 | 2 |
| XCI-E | 0 | 1 | 2 | 0 | 1 | 1 | 1 |
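The coding in Table 1 can be written as a small lookup. The sketch below reads the table as follows: female aa and AA always score 0 and 2, female Aa scores $Z_1$, male a scores 0, and male A scores $Z_2$. The dictionary `XCI_PATTERNS` and the function `genotype_score` are hypothetical names introduced only for illustration.

```python
# (Z1, Z2) for each XCI pattern, as read from Table 1.
XCI_PATTERNS = {
    "XCI-SN": (0, 2),
    "XCI-R": (1, 2),
    "XCI-SR": (2, 2),
    "XCI-E": (1, 1),
}


def genotype_score(sex, n_a_alleles, pattern):
    """Return the genotypic score for one subject.

    sex: "F" or "M"; n_a_alleles: number of A alleles carried
    (0, 1, or 2 for females; 0 or 1 for males).
    """
    z1, z2 = XCI_PATTERNS[pattern]
    if sex == "F":
        return {0: 0, 1: z1, 2: 2}[n_a_alleles]
    if sex == "M":
        return {0: 0, 1: z2}[n_a_alleles]
    raise ValueError("sex must be 'F' or 'M'")


if __name__ == "__main__":
    for name in XCI_PATTERNS:
        row = [genotype_score("F", k, name) for k in (0, 1, 2)]
        row += [genotype_score("M", k, name) for k in (0, 1)]
        print(name, row)
```

Each printed row lists the scores for aa, Aa, AA, a, and A, in the same column order as Table 1.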
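Table 2. Estimated type I error rates (in units of $10^{-4}$) at the nominal significance level $1 \times 10^{-4}$ for XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ against $q_f$, $q_m$, $\alpha_2$, and $\alpha_3$ based on 1,000,000 replicates under HWE.

| $q_f$ | $q_m$ | $\alpha_3$ | $\alpha_2 = 0.4005$ | | | | | $\alpha_2 = 0.4005$ | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | XCMAX4 | $\mathrm{FM}_{01}$ | $\mathrm{FM}_{02}$ | $\mathrm{FM}_{F}$ | $\mathrm{FM}_{S}$ | XCMAX4 | $\mathrm{FM}_{01}$ | $\mathrm{FM}_{02}$ | $\mathrm{FM}_{F}$ | $\mathrm{FM}_{S}$ |
| 0.1 | 0.1 | 0.5 | 1.03 | 0.74 | 0.86 | 0.95 | 0.82 | 0.87 | 0.93 | 0.93 | 0.66 | 0.98 |
| 0.1 | 0.2 | 0.5 | 0.88 | 0.96 | 0.84 | 0.86 | 0.96 | 1.07 | 1.02 | 1.10 | 0.99 | 1.02 |
| 0.1 | 0.3 | 0.5 | 0.91 | 0.87 | 0.84 | 1.10 | 0.84 | 0.86 | 1.02 | 0.88 | 0.95 | 1.00 |
| 0.2 | 0.1 | 0.5 | 0.94 | 0.96 | 0.98 | 0.93 | 0.95 | 0.86 | 0.85 | 0.74 | 0.78 | 0.79 |
| 0.2 | 0.2 | 0.5 | 1.02 | 1.02 | 0.93 | 1.12 | 1.00 | 1.12 | 1.43 | 1.24 | 1.22 | 1.39 |
| 0.2 | 0.3 | 0.5 | 0.90 | 1.09 | 0.99 | 0.76 | 1.02 | 1.10 | 0.99 | 1.03 | 0.98 | 1.01 |
| 0.3 | 0.1 | 0.5 | 0.87 | 0.88 | 0.87 | 0.88 | 0.87 | 0.79 | 1.01 | 0.93 | 0.96 | 0.91 |
| 0.3 | 0.2 | 0.5 | 0.93 | 1.01 | 1.04 | 0.77 | 0.99 | 0.83 | 0.94 | 0.89 | 0.91 | 0.93 |
| 0.3 | 0.3 | 0.5 | 0.83 | 1.13 | 0.92 | 0.95 | 0.91 | 0.96 | 1.15 | 1.05 | 1.14 | 1.06 |
| 0.1 | 0.1 | 1.5 | 0.88 | 1.06 | 0.93 | 0.79 | 1.01 | 0.88 | 1.00 | 0.83 | 0.77 | 0.92 |
| 0.1 | 0.2 | 1.5 | 0.84 | 0.77 | 0.82 | 0.79 | 0.75 | 0.92 | 0.87 | 0.96 | 0.98 | 0.86 |
| 0.1 | 0.3 | 1.5 | 0.88 | 1.22 | 1.08 | 0.96 | 1.16 | 0.99 | 1.17 | 1.14 | 1.07 | 1.09 |
| 0.2 | 0.1 | 1.5 | 0.92 | 0.93 | 1.06 | 0.92 | 1.03 | 0.81 | 1.03 | 0.91 | 0.96 | 0.91 |
| 0.2 | 0.2 | 1.5 | 0.84 | 1.01 | 0.92 | 0.91 | 0.98 | 0.86 | 0.80 | 0.88 | 0.85 | 0.78 |
| 0.2 | 0.3 | 1.5 | 0.94 | 1.08 | 1.14 | 1.08 | 1.08 | 0.85 | 0.94 | 0.98 | 0.98 | 0.97 |
| 0.3 | 0.1 | 1.5 | 1.00 | 1.08 | 1.03 | 0.93 | 1.00 | 0.99 | 0.85 | 1.01 | 0.96 | 0.94 |
| 0.3 | 0.2 | 1.5 | 0.88 | 1.08 | 0.91 | 0.93 | 0.92 | 0.93 | 0.76 | 0.90 | 0.83 | 0.94 |
| 0.3 | 0.3 | 1.5 | 0.82 | 0.95 | 0.86 | 1.01 | 0.88 | 0.98 | 1.06 | 0.97 | 1.10 | 1.09 |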
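When reading Table 2, it helps to know how far an estimated rate can drift from the nominal level through Monte Carlo noise alone. The sketch below is an illustrative calculation, not part of the original analysis: it treats the number of rejections among the replicates as binomial and uses a normal approximation to get a rough 95% band around the nominal level. The function name `monte_carlo_band` is hypothetical.

```python
import math


def monte_carlo_band(alpha, n_replicates, z=1.96):
    """Approximate band for an empirical rejection rate.

    Assumes rejections ~ Binomial(n_replicates, alpha) and uses the
    normal approximation alpha +/- z * sqrt(alpha * (1 - alpha) / n).
    """
    se = math.sqrt(alpha * (1.0 - alpha) / n_replicates)
    return alpha - z * se, alpha + z * se


if __name__ == "__main__":
    lower, upper = monte_carlo_band(1e-4, 1_000_000)
    # Report in the units of Table 2 (multiples of 1e-4).
    print(round(lower * 1e4, 2), round(upper * 1e4, 2))
```

With a nominal level of $1 \times 10^{-4}$ and 1,000,000 replicates, the band is roughly 0.80 to 1.20 in the units of Table 2, so most entries in the table are within simulation error of the nominal level.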
Table 3. Data of rs3827440 related to Graves’ disease in two independent studies.

| Dataset | Race | Female case CC | Female case TC | Female case TT | Male case C | Male case T | Female control CC | Female control TC | Female control TT | Male control C | Male control T |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Chu et al. (stage I) | Han | 163 | 508 | 444 | 109 | 232 | 219 | 541 | 367 | 172 | 186 |
| Chu et al. (stage II) | Han | 471 | 1606 | 1298 | 284 | 606 | 584 | 1344 | 957 | 396 | 526 |
| Szymanski et al. (Warsaw) | Caucasian | 146 | 205 | 85 | 53 | 51 | 188 | 229 | 81 | 146 | 104 |
| Szymanski et al. (Gliwice) | Caucasian | 58 | 78 | 30 | 20 | 11 | 71 | 73 | 27 | 20 | 10 |
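Genotype counts such as those in Table 3 translate directly into sex-specific allele frequencies. The sketch below is a hedged illustration: the function `c_allele_frequency` is a hypothetical helper, and the example counts are the female-case and male-case entries as read from the Chu et al. (stage I) row. Females contribute two alleles each and males one.

```python
def c_allele_frequency(female_counts, male_counts):
    """Return the C allele frequency in females and in males.

    female_counts: (CC, TC, TT) genotype counts.
    male_counts: (C, T) genotype counts.
    """
    cc, tc, tt = female_counts
    c, t = male_counts
    freq_female = (2 * cc + tc) / (2 * (cc + tc + tt))
    freq_male = c / (c + t)
    return freq_female, freq_male


if __name__ == "__main__":
    # Cases in Chu et al. (stage I), as read from Table 3.
    freq_f, freq_m = c_allele_frequency((163, 508, 444), (109, 232))
    print(round(freq_f, 3), round(freq_m, 3))
```

Applying the same function to the control columns gives the corresponding control frequencies, which is a quick way to see the direction of the case-control difference in each sex.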
Table 4. p values of the XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ tests from four datasets. Entries in each row are multiplied by the scale shown.

| Dataset | Scale | XCMAX4 | $\mathrm{FM}_{01}$ | $\mathrm{FM}_{02}$ | $\mathrm{FM}_{F}$ | $\mathrm{FM}_{S}$ |
|---|---|---|---|---|---|---|
| Chu et al. (stage I) | $\times 10^{-8}$ | 1.573 | 9.513 | 0.507 | 1.832 | 1.731 |
| Chu et al. (stage II) | $\times 10^{-15}$ | 0.847 | 7.764 | 0.561 | 4.108 | 1.144 |
| Szymanski et al. (Warsaw) | $\times 10^{-1}$ | 1.083 | 0.491 | 0.395 | 1.038 | 0.410 |
| Szymanski et al. (Gliwice) | $\times 10^{-1}$ | 5.967 | 2.515 | 2.800 | 5.500 | 2.628 |
Table 5. p values of the XCMAX4, $\mathrm{FM}_{01}$, $\mathrm{FM}_{02}$, $\mathrm{FM}_{F}$, and $\mathrm{FM}_{S}$ tests from Han and Caucasian populations. Entries in each row are multiplied by the scale shown.

| Population | Scale | XCMAX4 | $\mathrm{FM}_{01}$ | $\mathrm{FM}_{02}$ | $\mathrm{FM}_{F}$ | $\mathrm{FM}_{S}$ |
|---|---|---|---|---|---|---|
| Han | $\times 10^{-22}$ | 1.275 | 55.347 | 0.285 | 1.792 | 1.444 |
| Caucasian | $\times 10^{-2}$ | 6.932 | 2.571 | 2.553 | 5.795 | 1.993 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Citation: Su, Y.; Hu, J.; Yin, P.; Jiang, H.; Chen, S.; Dai, M.; Chen, Z.; Wang, P. XCMAX4: A Robust X Chromosomal Genetic Association Test Accounting for Covariates. Genes 2022, 13, 847. https://doi.org/10.3390/genes13050847
