Abstract
Skewed X chromosome inactivation (XCI-S) has been reported to be associated with some X-linked diseases. Several methods have been proposed to estimate the degree of XCI-S (denoted as ) for quantitative and qualitative traits based on unrelated females. However, there is no method available for estimating based on general pedigrees. Therefore, in this paper, we propose a Bayesian method to obtain the point estimate and the credible interval of based on the mixture of general pedigrees and unrelated females (called mixed data for brevity), which is also suitable for only general pedigrees. We consider the truncated normal prior and the uniform prior for . Further, we apply the eigenvalue decomposition and Cholesky decomposition to our proposed methods to accelerate the computation speed. We conduct extensive simulation studies to compare the performances of our proposed methods and two existing Bayesian methods which are only applicable to unrelated females. The simulation results show that the incorporation of general pedigrees can improve the efficiency of the point estimation and the precision and the accuracy of the interval estimation of . Finally, we apply the proposed methods to the Minnesota Center for Twin and Family Research data for their practical use.
1. Introduction
X chromosome inactivation (XCI) is an important epigenetic phenomenon, which was described by Lyon [1] for the first time. In mammals, females have two X chromosomes, whereas males have only one X chromosome. During the early development of embryos in females, one of the two X chromosomes becomes a Barr body and remains inactivated in subsequent somatic cells to ensure the balance of transcriptional dosages on the X chromosome between females and males [2]. In general, the process of XCI is random. Specifically, in females, approximately 50% of the cells have the paternal allele at an X-chromosomal locus inactivated, and the remaining approximately 50% of the cells keep the maternal allele inactivated, which is called random XCI (XCI-R) [3]. However, there are still two other patterns of XCI: the escape from XCI (XCI-E) and the skewed XCI (XCI-S) [3]. XCI-E means that a female has a region of the X chromosome without inactivation, i.e., the alleles on both X chromosomes are kept active. In humans, approximately 15–30% of X-linked loci have been reported to undergo XCI-E [4,5]. As for XCI-S, more than 75% of the cells inactivate the same allele at an X-chromosomal locus in females [6,7,8]. In some extremely skewed cases, it is possible that more than 90% of the cells have the same allele being inactivated [9].
Some X-linked diseases have been reported to be associated with XCI-S, such as esophageal carcinoma, recurrent spontaneous abortion, and Klinefelter’s syndrome [10,11,12,13]. The degree of XCI-S can affect the severity of X-linked diseases in heterozygous females [14]. A larger proportion of the cells with the activated deleterious allele in heterozygous females will cause more severe expression of the related diseases, whereas a smaller proportion can protect the females from the diseases [6,7]. For example, the XCI-S towards mutant alleles on the F9 gene may cause moderately severe haemophilia B, whereas the XCI-S against the same mutant alleles may cause mild haemophilia B in heterozygous females [15]. Thus, the incorporation of the XCI-S information into association analysis may improve the test power of the X-chromosomal association tests [16]. In fact, some methods have been proposed to test for the association between X-chromosomal single nucleotide polymorphisms (SNPs) and traits, which consider the XCI patterns [17,18,19,20,21,22,23,24,25,26]. For unrelated data, Wang et al. [24] proposed a permutation-based maximum likelihood ratio association test for qualitative traits, which takes account of all the XCI patterns. More specifically, for XCI, three female genotypes (, and ) are encoded as 0, , and 2, respectively, meanwhile two male genotypes ( and ) are encoded as 0 and 2, respectively, where and are the normal and deleterious alleles, respectively. is the unknown genotypic value used to measure the degree of XCI-S. For XCI-E, three female genotypes are encoded as 0, 1, and 2, and two male genotypes are encoded as 0 and 1. For pedigree data, Ding et al. [21] put forward a Monte Carlo pedigree disequilibrium test for X-linked qualitative traits and Zhang et al. [25] constructed the orthogonal model and used the kinship matrix to represent the correlation between the individuals in pedigrees for X-linked quantitative traits. Both methods take XCI-R or XCI-E into account, however, they are not suitable for XCI-S. Furthermore, the method of Ding et al. [21] cannot directly incorporate covariates and the method of Zhang et al. [25] is time-consuming. On the other hand, there is an autosomal association test, named GEMMA, which can incorporate covariates and is computationally efficient for pedigree data [27]. Moreover, GEMMA can be easily extended to accommodate the XCI-R and XCI-E patterns.
Recently, some methods to measure the degree of XCI-S have become available. Based on family trios (parents and their affected daughter), Xu et al. [28] proposed a statistical index to measure for qualitative traits. Wang et al. [29] and Li et al. [30] used unrelated females to estimate and derive the corresponding confidence interval (CI) for qualitative and quantitative traits, respectively. In Wang et al. [29] and Li et al. [30], was expressed as the ratio of two regression coefficients, and the CI was obtained using Fieller’s method. However, these methods may yield unbounded CIs when the denominator in the ratio is close to zero. It should be noted that Wang et al. [31] put forward a penalized Fieller’s method which can obtain the bounded CI of a ratio by penalizing the denominator of the ratio away from zero. Therefore, Yu et al. [32] applied the penalized Fieller’s method to the estimation of the degree of XCI-S for unrelated females. However, the penalized Fieller’s method does not consider the constraint condition of , and just simply uses the interval [0, 2] to truncate the point estimate and the CI of , which may result in extreme point estimates (0 or 2), empty sets, non-information intervals (i.e., [0, 2]), and discontinuous intervals. Therefore, Yu et al. [32] considered the constraint condition as the prior, and further proposed a Bayesian method for estimating the degree of XCI-S based on unrelated females. The Bayesian method can avoid the generation of extreme point estimates, empty sets, non-information intervals, and discontinuous intervals. However, the above-mentioned methods are all based on family trios or unrelated females and cannot accommodate general pedigrees. It should be noted that general pedigrees are increasingly popular because pedigree designs are naturally equipped to control for population stratification [33,34]. Therefore, it is necessary to suggest a method for estimating the degree of XCI-S based on general pedigrees or the mixture of general pedigrees and unrelated females.
In this paper, we propose a Bayesian method to estimate the degree of XCI-S based on the mixture of general pedigrees and unrelated females for both quantitative and qualitative traits, which is also suitable for only general pedigrees. We use the kinship matrix to represent the correlation between females in general pedigrees and construct the generalized linear mixed model. The prior of is set to be a truncated normal distribution and a uniform distribution. The posterior distribution of is drawn using a Hamiltonian Monte Carlo (HMC) sampling algorithm. We regard the mode of the sample from the posterior distribution as the point estimate of , and consider the corresponding highest posterior density interval (HPDI) as the credible interval of [35]. Because the posterior sampling process of the generalized linear mixed model is very computationally intensive [36], we additionally employ the eigenvalue decomposition (EVD) and Cholesky decomposition to accelerate the computation speed. Further, we conduct extensive simulation studies to compare the performances of our proposed methods and the existing Bayesian methods. Finally, we apply our proposed methods to Minnesota Center for Twin and Family Research (MCTFR) data for their practical use.
2. Materials and Methods
2.1. Notations
Consider an X-chromosomal locus with alleles and being the normal allele and the deleterious allele, respectively. Suppose that we have collected the X-linked traits (quantitative or qualitative), the genotypes at the locus of pedigrees (including individuals, males or females), and additional independent/unrelated females. Note that the individuals in the same pedigree are genetically correlated. Since XCI only exists in females, we only select females in these pedigrees and additional unrelated females to build the model, and we assume that . Let be the trait of the th female and indicate the genotype of the th female . Then, is the trait vector of all the females, and is the genotype vector of all the females. According to Wang et al. [24], we encode the genotypes as the genotypic values , where represents the degree of XCI-S. As such, the genotypic value vector of all the females can be expressed as . Considering the correlations among females selected from pedigrees, we utilize the kinship matrix to measure the correlations of this kind. To be specific, we first use both the males and the females in the pedigrees to construct an kinship matrix , which can be obtained using the algorithm of Lange [37] through the “kinship2” package of R software [38]. Then, we select the corresponding rows and columns of females in matrix and obtain the matrix of these females. As for unrelated females, the genetic relatedness matrix can be expressed as the identity matrix . Finally, the genetic relatedness matrix of can be denoted as the following block matrix:
We build the generalized linear mixed model to describe the association between and
where is the regression coefficient of ; is the vector of covariates of the th female including 1 as the first element and is an covariate matrix; is the vector of the regression coefficients of with being the intercept; is a random effect, and the random variable is generated by the multivariate normal distribution, i.e., , where is the variance of the polygenic effects; is the link function; and is the conditional mean of given and .
To estimate , we decompose in Equation (1) into according to Wang et al. [29], where and are two indicator variables. indicates whether the th female contains at least one deleterious allele , and denotes whether the th female has two deleterious alleles. Then, we can rewrite Equation (1) as follows:
For quantitative traits, is the identity function and has the residual error , so Equation (2) becomes a linear mixed model
where and is the variance of . For qualitative traits, is the logit function, and Equation (2) can be written as
2.2. Building Bayesian Models
For quantitative traits, follows a multivariate normal distribution according to Equation (3), i.e.,
where and . The unknown parameters are , and let be the likelihood function of based on expression (5). So, the posterior distribution of can be expressed as , where is the joint prior of .
As for qualitative traits, follows a Bernoulli distribution based on Equation (4), i.e.,
where and
The unknown parameters are , and let be the likelihood function of based on expression (6). The posterior distribution of can be expressed as , where is the joint prior of .
2.3. Eigenvalue Decomposition and Cholesky Decomposition for Accelerating Computation Speed
It should be noted that, due to the high-dimensional matrix , the Bayesian posterior sampling processes of and would be computationally intensive, especially when is large [36,39]. So, according to Runcie and Crawford [40] and Zhao et al. [36], we use the EVD and Cholesky decomposition to accelerate the sampling process for quantitative and qualitative traits, respectively. The transformed posterior distributions of and are denoted by and , respectively, where , and , respectively, are the transformed , and based on by the EVD; is a lower triangular matrix satisfying by Cholesky decomposition; and follows and satisfies . The details refer to Supplementary Appendices SA and SB. From Table 1, we find that using the EVD and Cholesky decomposition in the posterior sampling process can greatly reduce running time (the details can be seen in Section 3).
Table 1.
Mean running time of the BNP method with a posterior sampling process based on EVD or Cholesky decomposition and an original posterior sampling process for general pedigrees.
2.4. HMC Algorithm and Priors
Note that it is difficult to derive the closed forms of the posterior distributions and , so we use the HMC algorithm [35] to sample the parameters from the approximate posterior distributions, which can be efficiently implemented through the “cmdstanr” package in R software. We choose the HMC algorithm because it can improve the independence of the samples and has higher efficiency than the other Markov-Chain Monte Carlo methods [35].
According to Yu et al. [32], we set the priors of and as follows: For nuisance parameters and , we select non-informative priors to reduce their influence on the posterior distributions. Specifically, we assume that and [41] so that and can be sampled from the positive and negative values with equal probabilities. For the standard deviation of polygenic effects, we choose the exponential distribution with mean being 1, i.e., [35]. For based on quantitative traits, there is an extra parameter , and we also suppose that . For the parameter of interest, by considering the constraint condition of , we set two priors. One is a uniform distribution from 0 to 2, i.e., , which is a non-informative prior. The other is to assume that the more skewed values of have the lower probability and the probability of being 1 is the highest, which is consistent with the genetic background [3]. In this way, we set to obey a truncated normal distribution with both the parameters being fixed at 1 and the values ranging from 0 to 2. The probability density function of the prior of is
where is the probability density function of the standard normal distribution and is its cumulative distribution function. We assume that the unknown parameters are unrelated to each other because the HMC algorithm does not dramatically suffer from the correlated parameters in the model [35]. Therefore, the prior distributions and can be calculated as the product of the priors of all the parameters. Moreover, and can also be flexibly set according to practical background.
After we obtain the posterior samples of and through the HMC algorithm, we calculate the mode of the samples as the point estimate of , and compute the HPDI of the samples as the credible interval of . We denote the Bayesian methods with the truncated normal distribution and the uniform distribution for the mixture of general pedigrees and additional unrelated females as BNM and BUM, and the corresponding point estimates yielded by these two methods as and , respectively.
2.5. Situations When Considering General Pedigrees and Unrelated Females, Respectively
Notice that our proposed methods are also applicable to the situation with only general pedigrees and that with only unrelated females. For the situation with only general pedigrees, the genetic relatedness matrix degenerates to twice the kinship matrix of all the females from the pedigrees, i.e., . We denote the Bayesian methods with the truncated normal distribution and the uniform distribution for general pedigrees as BNP and BUP, and the corresponding point estimates as and , respectively. For the situation with only unrelated females, our proposed methods still work where the genetic relatedness matrix is reduced to be the identity matrix . However, compared with the existing BN and BU methods having the prior of being the truncated normal distribution and the uniform distribution, respectively [32], our proposed methods require additionally estimating the random effects ’s, which may reduce the estimation accuracy and be time-consuming. Therefore, in practice, for unrelated females, we recommend using the existing BN and BU methods. Furthermore, just like Yu et al. [32], the point estimates of based on the BN and BU methods are represented as and , respectively.
2.6. Situation When There Are Missing Genotypes for Some Individuals from General Pedigrees
It should be noted that our proposed methods are also suitable for the situation where the genotypes of some individuals from some pedigrees are missing, by simply excluding the individuals with missing genotypes and deleting the corresponding rows and columns of these individuals from the genetic relatedness matrix .
2.7. Simulation Settings
To evaluate the performance of our proposed methods (BNM and BUM for the mixture of general pedigrees and additional unrelated females, and BNP and BUP for only general pedigrees) and compare them with the existing methods (BN and BU for only unrelated females) [32] when estimating the degree of the XCI-S, we conduct the following extensive simulation studies. When simulating general pedigrees, we consider three pedigree structures: (1) the nuclear family with 4 people, (2) the three-generation family with 10 people and (3) the four-generation family with 12 people, as shown in Figure 1. We fix the sex ratio at 1:1 in our simulation study. A total of 50 pedigrees under each pedigree structure are simulated, which leads to being 150, being 1300, and being approximately 650. For a larger sample size, we simulate 200 pedigrees under each pedigree structure, and the corresponding is 600, is 5200, and is approximately 2600. Because there are two X chromosomes in females and only one in males, we first generate the genotypes of the female founders using probabilities and the genotypes of the male founders using probabilities , where and are the frequencies of the deleterious allele in females and males, respectively. We first set to be 0.3 and 0.1 and keep consistent with . To simulate the situations with and being different, we further set and . Then, we simulate the genotypes of the nonfounders according to Mendelian inheritance. We consider a covariate , which is generated from the standard normal distribution. Note that the estimation of the degree of XCI-S only needs the females. As such, let be the value of for the th female () and we only simulate the quantitative trait values of all the females in the pedigrees, which are generated based on the following multivariate normal distribution:
where is the vector of the quantitative trait values of these females; is the vector of their genotypic values with the elements being 0, , or 2 respectively corresponding to genotypes , where the value of represents the degree of XCI-S and is randomly sampled from ; ; and is the kinship matrix of the females and is an identity matrix. is the intercept and is the regression coefficient of the covariate , which are both fixed at 0.5 [42]. According to Schifano et al. [43], we set and , which means that the values of the polygenic heritability . Furthermore, we set so that the heritability due to the causal SNP, , remains less than 2% for the chosen values of , , and mentioned above. As for a qualitative trait, we generate the corresponding values using the threshold model [44]. Specifically, once the quantitative trait values in Equation (8) are generated, they are transformed to be affected if they are less than the threshold and otherwise to be unaffected. Here, we fix the prevalence of the disease under study at 0.3, and the threshold is then taken as the 30% quantile of the distribution of the quantitative trait. In addition, to consider the situation in which the genotypes of some individuals in the pedigrees are missing, the missing rate () is set to be 0 and 0.4. means that the genotypes of all the individuals in the pedigrees are collected and indicates that the genotype of an individual is randomly missing with probability 0.4.
Figure 1.
Pedigree structure used for the simulation studies. The squares are males, the circles are females and the rhombus could be any gender. The numbers are used to encode the family members. (a) Nuclear family; (b) three-generation family; and (c) four-generation family.
When simulating unrelated females, we directly generate their genotypes using probabilities . For comparing BNP and BUP for only general pedigrees with BN and BU for only unrelated females, respectively, we set the number of unrelated females () to be 650 and 2600, which is almost equal to the number of the females in 150 and 600 pedigrees mentioned above, and we fix the variance of the residual error in the unrelated females at [45], which is the same as the total variance of the quantitative trait value in the females from the general pedigrees. Other parameters and simulation settings are kept the same as those when simulating general pedigrees. Specifically, the quantitative trait values of the unrelated females are generated according to the following multivariate normal distribution:
where is the vector of the quantitative trait values of the unrelated females; is the vector of their genotypic values with the elements being 0, , or 2 corresponding to genotypes ; is the covariate vector, where is the value of the covariate for the th female (); and is an identity matrix. As for a qualitative trait, just like simulating the general pedigrees, we also generate the corresponding values using the threshold model [44]. By combining the females in the general pedigrees and additional unrelated females, we can obtain the mixed data. We use the BNM and BUM methods, the BNP and BUP methods, and the existing BN and BU methods to obtain the point estimates and the HPDIs of based on the mixed data, only general pedigrees, and only unrelated females, respectively.
Ma et al. [23] claimed that the variance of the quantitative trait under study for heterozygous females () may be higher than those for homozygous females ( and ) due to the XCI and other factors (e.g., gene-gene interactions and gene mutation), and the increase ratio can be up to 20%. However, so far, in our model, we do not consider the heteroscedasticity of this kind because of the potential computation cost in Bayesian inference. To investigate whether our proposed methods are still robust in the presence of the heteroscedasticity, we additionally simulate the mixed data for quantitative traits with the heteroscedasticity. Specifically, we use , , and to represent the residual variance in females with genotypes , , and , respectively. The simulation settings for the mixed data are the same as those under the homoscedasticity, except that we assume here. Furthermore, for comparison, we utilize to represent that the variances across different genotypes are the same. We apply the BNM and BUM methods to the mixed data, and apply the BNP and BUP methods to only general pedigrees.
We conduct 500 replicates for each simulation setting. For each replicate, we set 4 chains for extracting the samples simultaneously. For each chain, we extract 3000 samples, and the first 1000 samples are used for warming up. Therefore, we finally obtain 8000 samples in each replicate. To ensure the convergence, the target acceptance rate is taken as 0.9. We assess the convergence of Markov chains by calculating the convergence diagnostic [46]. Note that the ’s of our proposed methods are all less than 1.05, which indicates good convergence and also means that drawing 8000 samples is enough. The above posterior sampling process is implemented using the “cmdstanr” package in R software (version 4.1.2, http://r-project.org, accessed on 2 February 2023). To evaluate the accuracy of the point estimates, we calculate their mean squared errors (MSEs). Here, , where is the th true value of , and is the estimate of (). We also draw scatter plots to visually display the six point estimates (, , , , , and ) against the true values of . To compare the performances of the interval estimation of all the six methods (BNM, BUM, BNP, BUP, BN, and BU), we calculate the coverage probability (CP) as well as the median, the mean, the interquartile range, and the standard deviation of the widths of the 95% HPDIs of (respectively denoted by , , , and ). Moreover, we draw scatter plots of the interval widths of all the six methods against the true values of .
3. Results
3.1. Simulation Results under the Situations of Homoscedasticity and Allele Frequencies in Females and Males Being the Same
To assess the computation efficiency of our proposed methods based on the EVD and Cholesky decomposition, we considered the BNP method for only general pedigrees as an example. Here, was taken to be 150 and 600, , , and (i.e., there were no missing genotypes in all the pedigrees) for both quantitative and qualitative traits. The other parameters were fixed in the same way as in the “Simulation Settings” subsection. A total of 500 replicates were conducted for each simulation setting. There were two kinds of BNP methods that we wanted to compare: (1) the BNP method with the posterior sampling process based on the EVD (for quantitative traits) or Cholesky decomposition (for qualitative traits), and (2) the BNP method with the posterior sampling process based on the posterior distribution (for quantitative traits) or (for qualitative traits), which is called the original posterior sampling process in this paper. We computed the mean running time of the BNP method based on the EVD or Cholesky decomposition for all 500 replicates. However, it is important to note that the original posterior sampling process may take up a huge amount of time. Therefore, we only calculated the mean running time of the original posterior sampling process over the first 10 replicates. All the computations were performed on a Tsinghua Tongfang Z900 personal computer (Microsoft Windows 7 Enterprise (Service Pack 1), 4 GB of RAM and 3.60 GHz Intel(R) Core(TM) i7-4790 CPU). The results of the mean running time are given in Table 1. As shown in Table 1, the EVD and Cholesky decomposition can greatly speed up the Bayesian sampling process, especially when is 600.
The MSEs of the six point estimates (, , , , , and ) of under and are listed in Table 2. We found that the MSEs of and based on the mixed data are the smallest under all the simulated scenarios, which means that it is more efficient to estimate the degree of XCI-S by simultaneously using general pedigrees and additional unrelated females. The MSEs of and for only general pedigrees are slightly larger than those of and for only unrelated females in all the simulated situations. This probably demonstrates that general pedigrees provide less information for estimating the degree of XCI-S than unrelated females when the total number of the females in all the pedigrees and that of the unrelated females are the same. As for the two priors of , the point estimates (, , and ) with the truncated normal distribution have the MSEs similar to those (, , and ) with the uniform distribution, with , , and performing slightly better than , , and , respectively. Furthermore, it can be observed from Table 2 that the MSEs of the six point estimates decrease when and increase, and (the frequency of the deleterious allele ) increase, and (the variance of the polygenic effects) decreases. As expected, compared to (i.e., there are no missing genotypes in all the pedigrees), the MSEs of the six point estimates increase when (i.e., the genotypes of about 40% individuals in general pedigrees are missing). In addition, the six point estimates have smaller MSEs for quantitative traits than for qualitative traits.
Table 2.
Mean squared errors (MSEs) of point estimates , , , , , and under and among 500 replicates for mixed data, only general pedigrees, and only unrelated females, respectively.
Figure 2 and Figure 3 show the scatter plots of the six point estimates (, , , , , and ) against the true values of with , , , , , and for quantitative and qualitative traits, respectively. Supplementary Figures S1–S14 show the corresponding scatter plots under other simulation settings. The six rows of each figure represent the results of the six point estimates, and the two columns of each figure denote the corresponding results with and , respectively (i.e., subplots (a), (c), (e), (g), (i) and (k) are the scatter plots of , , , , , and with , respectively, whereas subplots (b), (d), (f), (h), (j) and (l) are the corresponding scatter plots with ). The upper side and the right side of each subplot are the distribution of the true value of and that of the point estimate of , respectively. By comparing the six subplots in the same column of each figure, we found that and based on the mixed data are closer to the true value of than , , , and . Moreover, noting that the distribution of the true value of is , it can be seen that the distributions of and are more uniform than those of the four other point estimates. These indicate that it is necessary to combine general pedigrees with unrelated females when estimating . The dispersion of is slightly smaller than that of , and the dispersions of and are slightly less than those of and , although differences of these kinds are not so obvious in most figures. By comparing the two subplots in the same row of each figure, it can be seen that the estimates with (in the subplot of the second column) have larger dispersion than those with (in the subplot of the first column), implying that the missing genotypes of some individuals in the collected pedigrees would increase the MSEs of the point estimates. Furthermore, comparing Figure 2 with Figure 3 (or comparing Supplementary Figures S1–S7 with Supplementary Figures S8–S14, respectively) shows that the six point estimates have better performance for quantitative traits than for qualitative traits. In addition, from these figures, the trend of the six point estimates with respect to , , , , and is consistent with that in Table 2. Finally, it is observed from these figures that most of the point estimates can be evenly distributed on both sides of the true value of , except for the situations with , , and , where the six point estimates may underestimate (Supplementary Figures S2, S3, S9 and S10). However, when , , and , we can obtain point estimates which are much more evenly distributed around the true value of (Supplementary Figures S6, S7, S13 and S14). This suggests that when analyzing the SNPs with low frequencies of the deleterious allele, our proposed point estimates need large sample sizes.
Figure 2.
Scatter plots of six point estimates of against true values of with , , , , , and for quantitative trait. The upper side and the right side of each subplot are the distribution of the true value of and that of the point estimate of , respectively. (a) with ; (b) with ; (c) with ; (d) with ; (e) with ; (f) with ; (g) with ; (h) with ; (i) with ; (j) with ; (k) with ; and (l) with .
Figure 3.
Scatter plots of six point estimates of against true values of with , , , , , and for qualitative trait. The upper side and the right side of each subplot are the distribution of the true value of and that of the point estimate of , respectively. (a) with ; (b) with ; (c) with ; (d) with ; (e) with ; (f) with ; (g) with ; (h) with ; (i) with ; (j) with ; (k) with ; and (l) with .
Table 3 describes the CPs of the six interval estimation methods (BNM, BUM, BNP, BUP, BN, and BU) under and . From Table 3, we can find that all six methods can control the CPs around 95% in all the simulated situations, which verifies their accuracy when estimating the degree of XCI-S. Table 4 and Supplementary Table S1 display the medians and the means of the widths of the 95% HPDIs ( and ), respectively, obtained by the six methods under and . From these tables, we can see that the BNM and BUM methods based on the mixed data have smaller and than the other four methods (BNP, BUP, BN, and BU) under all the simulated scenarios, which indicates that simultaneously using general pedigrees and additional unrelated females can improve the precision of the interval estimation of the degree of XCI-S. The and of the BNP and BUP methods for only general pedigrees are slightly larger than those of the BN and BU methods for only unrelated females, which is consistent with the findings based on the MSEs of their corresponding point estimates from Table 2. For two priors of , the interval estimation with the truncated normal prior (the BNM, BNP, and BN methods) and that with the uniform prior (the BUM, BUP, and BU methods) have a similar performance, whereas the BNM, BNP, and BN methods respectively obtain slightly smaller and than the BUM, BUP, and BU methods. When and increase, and (the frequency of the deleterious allele ) increase, (the variance of the polygenic effects) decreases, (the probability of the genotype of an individual in a pedigree being missing) changes from 0.4 to 0, or the trait changes from qualitative to quantitative, the and of the six methods decrease.
Table 3.
Coverage probabilities (CPs, in %) of the BNM, BUM, BNP, BUP, BN, and BU methods under and among 500 replicates for mixed data, only general pedigrees, and only unrelated females, respectively a.
Table 4.
s of the BNM, BUM, BNP, BUP, BN, and BU methods under and among 500 replicates for mixed data, only general pedigrees, and unrelated females, respectively.
Table 5 and Supplementary Table S2 show the interquartile range and the standard deviation of the widths of the 95% HPDIs ( and ), respectively, of the six methods under and . Figure 4 and Figure 5 display the scatter plots of the widths of the 95% HPDIs based on the six interval estimation methods (BNM, BUM, BNP, BUP, BN, and BU) against the true values of with , , , , , and for quantitative and qualitative traits, respectively. Supplementary Figures S15–S28 give the corresponding scatter plots under other simulation settings. The six rows of each figure represent the results of the six methods, and the two columns of each figure denote the corresponding results with and , respectively. From Table 5, we find that when the prior of is fixed to be the truncated normal distribution, the BNM method generally obtains smaller than the BNP and BN methods under all the simulated scenarios except for the cases of , , and for quantitative traits and those of and for qualitative traits. Similarly, when the prior of is taken as the uniform distribution, the of the BUM method are less than those of the BUP and BU methods under all the simulated scenarios in general, except for the situations mentioned above. It can be seen in Supplementary Table S2 that the BNM (BUM) method generally derives smaller than the BNP and BN (BUP and BU) methods except for the cases with for both quantitative and qualitative traits and those with , , and for qualitative traits. This may be explained by the fact that the largest width of the 95% HPDIs of the six methods is 2, and when and or , the widths of the intervals obtained by the BNP, BUP, BN, and BU methods are very close to 2 (as can be observed in Supplementary Figures S16, S17 and S22–S24), which make the dispersion of the widths of the intervals of the BNP, BUP, BN, and BU methods smaller and cause smaller and of the BNP, BUP, BN, and BU methods. It is important to note that the width of the 95% HPDI of does not follow the normal distribution under most of the simulated scenarios (Supplementary Figures S15–S17, S20–S24, S27 and S28), so the trend of the results of the is not exactly the same as that of the . On the other hand, the and of the BNP (BUP) method are larger than those of the BN (BU) method. In addition, the and of the six methods decrease with higher and when and , and increase when and and other parameters are unchanged. As for two priors, the BNM, BNP, and BN methods obtain slightly smaller and than the BUM, BUP, and BU methods. It is shown in some subplots of Figure 4 and Figure 5 and Supplementary Figures S15–S28 that the scatter plots look like an inverted V shape. This indicates that shorter intervals are obtained when the true values of are close to 0 or 2, by noting . On the other hand, in some figures (e.g., Supplementary Figures S16, S17 and S22–S24), most of the widths of the intervals based on the BNP, BUP, BN, and BU methods are very close to 2, which leads to the smaller dispersion of the interval widths. Other findings are similar to those from Table 4 and Table 5, and Supplementary Tables S1 and S2, and we do not discuss them here for brevity.
Table 5.
s of the BNM, BUM, BNP, BUP, BN, and BU methods under and among 500 replicates for mixed data, only general pedigrees, and only unrelated females, respectively.
Figure 4.
Scatter plots of widths of HPDIs based on six methods against true values of with , , , , , and for quantitative trait. The upper side and the right side of each subplot are the distribution of the true value of and that of the width of the HPDI of , respectively. (a) BNM with ; (b) BNM with ; (c) BUM with ; (d) BUM with ; (e) BNP with ; (f) BNP with ; (g) BUP with ; (h) BUP with ; (i) BN with ; (j) BN with ; (k) BU with ; and (l) BU with .
Figure 5.
Scatter plots of widths of HPDIs based on six methods against true values of with , , , , , and for qualitative trait. The upper side and the right side of each subplot are the distribution of the true value of and that of the width of the HPDI of , respectively. (a) BNM with ; (b) BNM with ; (c) BUM with ; (d) BUM with ; (e) BNP with ; (f) BNP with ; (g) BUP with ; (h) BUP with ; (i) BN with ; (j) BN with ; (k) BU with ; and (l) BU with .
3.2. Simulation Results When Allele Frequencies in Females and Males Being Different
Supplementary Table S3 shows the MSEs of the point estimates , , , and under and , and . Supplementary Tables S4–S8 give the CP, , , , and , respectively, of the BNM, BUM, BNP, and BUP methods under and , and . It can be observed from Supplementary Table S4 that when and , all four methods control the CPs around 95%. From Supplementary Tables S3 and S5–S8, the MSE, , , and of the four methods with and are generally smaller than those with and larger than those with (compared with Table 2 and Table 4, Supplementary Table S1, Table 5 and Supplementary Table S2, respectively), implying that our proposed methods still work when there are differences in the frequencies of the deleterious alleles between females and males.
3.3. Simulation Results under Heteroscedasticity
Supplementary Table S9 displays the MSEs of the point estimates , , , and under , and and . Supplementary Tables S10–S14 show the CP, , , , and , respectively, of the BNM, BUM, BNP, and BUP methods under , and and . As shown in Supplementary Table S10, our four proposed methods all control the CPs around 95% well when heteroscedasticity exists (i.e., ). From Supplementary Table S9 and Supplementary Tables S11–S14, we can find that the MSE, ,, , and of our proposed methods under heteroscedasticity are similar to the corresponding results under homoscedasticity (i.e., ) for all simulated situations, which indicates that our proposed methods are still robust when heteroscedasticity is present.
3.4. Application to MCTFR Data
The MCTFR Genome-Wide Association Study of Behavioral Disinhibition is a family-based study of substance abuse and related psychopathology [47]. The dataset can be downloaded from the database of Genotypes and Phenotypes with the accession number phs000620.v1.p1 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000620.v1.p1, accessed on 2 February 2023). This dataset contains 2183 families, 7377 participants (3831 females and 3546 males), and 527,829 SNPs. There are five quantitative traits in the dataset: the nicotine composite score, the alcohol consumption composite score, the alcohol dependence composite score (DEP), the illicit drug composite score, and the behavioral disinhibition composite score [48]. Because we only use females for measuring the degree of XCI-S, 3831 females and 12,354 SNPs on the X chromosome were selected. We filtered the data using the following quality control criteria: (1) excluding SNPs with a missing rate > 10%, (2) removing SNPs with a minor allele frequency < 5%, and (3) excluding individuals with a genotype missing rate > 10%. After quality control, 850 families, 3195 females (including 1959 females from 850 families and additional 1236 unrelated females), and 11,344 SNPs were kept to conduct the subsequent analyses.
It is important to note that estimating requires the SNPs on the X chromosome to be associated with the traits under study. Therefore, borrowing the idea of the GEMMA method for association analysis on autosomes based on only general pedigrees [27], we propose an improved linear mixed model to test for association on the X chromosome based on the mixed data. We made the following two main modifications: Firstly, we set the relatedness matrix as the block matrix in the Materials and Methods section so that the proposed linear mixed model is applicable to the mixture of general pedigrees and additional unrelated females. Secondly, the parameter is generally unknown. To consider the XCI, referring to Wang et al. [24], we utilized the grid search method and was taken to be {0, 0.5, 1, 1.5, 2} in the increments of 0.5. We used the improved linear mixed model to calculate the p-value for each value of , and then combined these five p-values using Cauchy’s method [49] to obtain the final test statistic. We conducted some simulation studies and found that the proposed improved linear mixed model can control the type I error rate well (the details can be seen in Supplementary Appendix SC and Supplementary Table S15). It should be noted that the five quantitative traits in the MCTFR dataset do not follow normal distributions. Therefore, we transformed the traits using the rank-based inverse normal transformation [50] before carrying out association analysis. Furthermore, we incorporated two covariates, age and year of birth, into the improved linear mixed model. The significance level of the association tests was set to be based on the Bonferroni correction.
The proposed linear mixed model identified three SNPs, rs10522027, rs12860832, and rs12849233, which are associated with the DEP trait at the level. The positions, alleles, minor allele frequencies, corresponding traits, p-values, and genes which the three SNPs belong to are presented in Table 6. SNP rs10522027 is found within the gene transmembrane protein 47 (TMEM47), which may be associated with the chemoresistance of breast cancer cells and hepatocellular carcinoma [51]. SNPs rs12860832 and rs12849233 are found in the gene PAS domain containing repressor 1 (PASD1), which might serve as a new target for the prognosis and the future treatment of glioma [52]. Furthermore, we calculated the point estimates (, , , , , and ) and the 95% HPDIs of based on the proposed BNM, BUM, BNP, and BUP methods and the existing BN and BU methods for these three SNPs, where the BNM and BUM methods use the mixed data (850 families and an additional 1236 unrelated females), the BNP and BUP methods utilize only 850 families with 1959 females, and the BN and BU methods are applied to only the additional 1236 unrelated females. The point estimates and the corresponding 95% HPDIs of obtained by the six methods for these SNPs are listed in Table 7. It is shown that the six point estimates of for the three SNPs are not far away from one, and the corresponding 95% HPDIs all contain one, which means that the XCI patterns of the three SNPs are the XCI-R or the XCI-E. In addition, we can observe the advantage of the BNM and BUM methods because they generally obtain smaller credible intervals than the other four methods, which is consistent with our simulation results. However, the BNP and BUP methods can give shorter HPDIs than the BN and BU methods, which does not coincide with our simulation results. This could be because the number of females in the 850 families is 1959, which is much larger than the number of additional unrelated females (1236).
Table 6.
SNPs detected in association analysis for the MCTFR data.
Table 7.
Application of the six methods to SNPs detected in association analysis for the MCTFR data.
4. Discussion
In this paper, we consider a generalized linear mixed model and propose two Bayesian methods (BNM and BUM) to estimate the degree of XCI-S (i.e., ) based on the mixture of general pedigrees and additional unrelated females for both quantitative and qualitative traits, where the BNM method uses the prior of the truncated normal distribution and the BUM method utilizes the prior of the uniform distribution, which both make full use of the constraint condition of . When only general pedigrees were available, the BNM and BUM methods were reduced to the BNP and BUP methods, respectively. We do not propose the corresponding Fieller’s method and the Penalized Fieller’s method to estimate the degree of XCI-S based on general pedigrees in this paper, as it has been confirmed that the performance of the above two methods is worse than Bayesian methods for only unrelated females [32]. It is important to note that that the closed form of the posterior distribution of is not easily derived, so we applied the HMC algorithm to conduct the posterior sampling process, calculated the mode of the resulting samples as the point estimate of , and regarded the HPDI of as the credible interval of . However, the posterior sampling process based on general pedigrees is very computationally intensive, especially when the dimension of the relatedness matrix (i.e., in this paper) is over 1000 [36]. As such, we used the EVD and Cholesky decomposition of to speed up the posterior sampling process for quantitative and qualitative traits, respectively. On the other hand, we also considered the median and the percentile interval (the 2.5th and 97.5th percentiles) of the posterior samples as the point estimate and the credible interval of , respectively. However, they performed less well than the mode and the HPDI (data not shown for brevity), and then we selected the latter instead.
The simulation results demonstrate that the EVD and Cholesky decomposition can greatly speed up the posterior sampling process, which is important to allow our proposed methods to accommodate large sample sizes, and may be referenced by other Bayesian researchers. The simulation results under and also show that the BNM and BUM methods have similar performances and are advantageous over the other four methods, which indicates that it is necessary to simultaneously analyze general pedigrees and additional unrelated females when estimating the degree of XCI-S in practice. More specifically, for the point estimation, the MSEs of and are close to each other and are smaller than those of the four other point estimates. The MSE of is the smallest in all the simulated situations. The MSEs of the existing point estimates and for unrelated females are slightly smaller than those of and for general pedigrees when the number of females is fixed. This suggests that general pedigrees provide less information for estimating the degree of XCI-S than unrelated females when the total number of the females in all the pedigrees and that of the unrelated females are the same. For the interval estimation, all six methods (BNM, BUM, BNP, BUP, BN, and BU) control the CPs around 95%. The BNM and BUM methods perform similarly to each other and both obtain much smaller credible intervals ( and ) than the other four methods under all the simulated scenarios. The BNP and BUP methods perform slightly worse than the BN and BU methods when the number of females is fixed, which is consistent with the findings based on the point estimation (, , , and ). For two priors of , the performances of the BNM, BNP, and BN methods with the truncated normal prior are slightly better than those of the BUM, BUP, and BU methods with the uniform prior, whereas differences of these kinds are not so obvious in our proposed methods, suggesting that our proposed methods are not as sensitive to the choice of priors. Furthermore, our proposed methods perform better when and increase, and (the frequency of the deleterious allele ) increase, (the variance of the polygenic effects) decreases, or the trait changes from qualitative to quantitative. When there are missing genotypes for some individuals in pedigrees, the SLINK software based on the peeling algorithm [53] could be used to impute these missing genotypes. However, to make the test statistics in hypothesis testing robust, or the parameter estimation accurate and precise, one may repeatedly impute the missing genotypes using the SLINK software (e.g., 50 imputations), which is very time-consuming for our proposed Bayesian methods. On the other hand, it is easy to combine 50 resulting point estimates of by taking the mean, median, or mode of them as the final point estimate; however, there appears to be an issue with the process of combining the 50 resulting credible intervals. Therefore, when the genotypes of some individuals in the collected pedigrees were missing, we did not impute these missing genotypes. Instead, we chose to delete the individuals with missing genotypes directly. In fact, the simulation results show that, even when the genotypes of approximately 40% of individuals in general pedigrees are missing, our proposed methods can still control the CPs well, indicating that our proposed methods are robust when there are missing genotypes in the data. The simulation results also show that our proposed methods still work when the frequency of the deleterious allele in females and that in males are different (i.e., and ). Furthermore, when heteroscedasticity exists (i.e., ), our proposed methods remain robust.
The proposed methods have the following issues to be discussed: Firstly, it is well known that the prior distributions of unknown parameters are important in Bayesian inference, and the choice of them may affect the results. In this paper, we consider two priors for , a non-informative prior which has little effect on the posterior distribution of , and a truncated normal distribution based on the genetic background of XCI. We also take account of non-informative priors for regression coefficients and weak priors for variances. In practical applications, researchers can choose appropriate priors according to their research background. Secondly, the Bayesian method adopts the HMC algorithm for the posterior sampling process, which is not greatly influenced by the correlations among unknown parameters. Therefore, for computational efficiency, we assume that all unknown parameters are unrelated. However, Bayesian methods should have better performance if the correlation between parameters is considered. Thirdly, the HPDIs that contain the number one can only indicate that the SNP undergoes the XCI-R or XCI-E pattern. The process of further distinguishing the XCI-R and XCI-E patterns is a potential problem to be solved. Fourthly, Ma et al. [23] claimed that the variance of the quantitative trait under study for heterozygous females may be higher than that for homozygous females in some cases. For computational efficiency, we assumed that the variances of quantitative traits for different genotypes in females are the same in our proposed methods.
To address the issues mentioned above, we will consider the following improvements in the future: Firstly, we will take into account non-informative priors for variances, such as non-informative Gamma prior or inverse-Gamma prior [41], to improve our proposed methods. Secondly, we will use the Gibbs sampling algorithm [54] to conduct the Bayesian posterior sampling process when the parameters are correlated. Thirdly, the information from the XCI-E can be estimated using the difference of transcriptional dosage on the X chromosome between male hemizygotes and female homozygotes. Therefore, we will incorporate the information from males into our model to further distinguish the XCI-E from the XCI-R. Fourthly, although we have completed some simulation studies showing that our proposed methods are still robust in the presence of heteroscedasticity (Supplementary Tables S9–S14), we will extend our proposed methods to manage the situation of heteroscedasticity to further improve the precision and the accuracy of estimating the degree of XCI-S in the future. Finally, besides the GEMMA [27], we understand that the REGENIE method for autosomal SNPs [55] could take into account population stratification. Therefore, we will extend it to test for the association between X chromosomal SNPs and traits based on the mixed data in the future.
5. Conclusions
In summary, we propose a Bayesian method with two priors (the truncated normal prior and the uniform prior) to estimate the degree of XCI-S based on the mixture of general pedigrees and additional unrelated females, which are denoted by the BNM and BUM methods, respectively. We also develop the corresponding Bayesian method, which is suitable for only general pedigrees, denoted by the BNP and BUP methods. We conducted an extensive simulation study to compare the performance of our four proposed methods with the two existing BN and BU methods. The simulation results show that the BNM method obtains the smallest MSE, the shortest width of the HPDIs, and the most stable CPs, which indicates that it is more efficient in estimating the degree of XCI-S by simultaneously using general pedigrees and additional unrelated females. Finally, we applied the proposed methods to the MCTFR data, and found that three associated SNPs (rs10522027, rs12860832, and rs12849233) undergo the XCI-R or XCI-E pattern.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom13030543/s1. Supplementary Appendix SA: Using the EVD to speed up the posterior sampling process for quantitative traits; Supplementary Appendix SB: Using Cholesky decomposition to speed up the posterior sampling process for qualitative traits; Supplementary Appendix SC: Simulation study of the type I error rate for our proposed improved linear mixed model; Tables S1 and S2: s and s of the BNM, BUM, BNP, BUP, BN, and BU methods under and among 500 replicates for mixed data, only general pedigrees, and only unrelated females, respectively; Table S3: Mean squared errors (MSEs) of point estimates , , , and under and , and among 500 replicates for mixed data and only general pedigrees, respectively; Tables S4–S8: Coverage probabilities (CPs, in %), s, s, s and s of the BNM, BUM, BNP, and BUP methods under and , and among 500 replicates for mixed data and only general pedigrees, respectively; Table S9: Mean squared errors (MSEs) of point estimates , , , and under , and and among 500 replicates for quantitative traits; Tables S10–S14: Coverage probabilities (CPs, in %), s, s, s, and s of the BNM, BUM, BNP, and BUP under , and and methods among 500 replicates for quantitative traits; Table S15: Type I error rate of our proposed improved linear mixed model under and among 1000 replicates. Figures S1–S14: Scatter plots of six point estimates of against true values of with and 600, and 2600, and 0.1, and 1, , and for the quantitative or qualitative traits; Figures S15–S28: Scatter plots of widths of HPDIs based on six methods against true values of with and 600, and 2600, and 0.1, and 1, , and for the quantitative or qualitative traits.
Author Contributions
Conceptualization, J.-Y.Z.; methodology, Y.-F.K. and S.-Z.L.; software, Y.-F.K., S.-Z.L. and J.-Y.Z.; validation, K.-W.W., B.Z., Y.-X.Y. and M.-K.L.; writing original draft, Y.-F.K., S.-Z.L. and K.-W.W.; review and edit, B.Z., Y.-X.Y., M.-K.L. and J.-Y.Z.; supervision, J.-Y.Z.; project administration, J.-Y.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China, grant number 82173619, the Guangdong Basic and Applied Basic Research Foundation, grant number 2023A1515011242, and the Science and Technology Planning Project of Guangdong Province, grant number 2020B1212030008.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The R package BEMXCIS for the BNM, BUM, BNP and BUP methods is freely available at https://github.com/Yi-FanKong/BEMXCIS (accessed on 2 February 2023), which is implemented by R software (version 4.1.2). The MCTFR data used for this study can be found on the database of Genotypes and Phenotypes (dbGaP) with the accession number phs000620.v1.p1 and the dbGaP request number 86747-7 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000620.v1.p1, accessed on 2 February 2023).
Acknowledgments
The Minnesota Center for Twin and Family Research (MCTFR) was supported by the National Institute on Drug Abuse grant no. U01 DA024417. The sample ascertainment and data collection in MCTFR data were supported by the National Institute on Drug Abuse grant nos. R37 DA05147 and R01 DA13240, the National Institute on Alcohol Abuse and Alcoholism grant nos. R01 AA09367 and R01 AA11886, and the National Institute of Mental Health grant no. R01 MH66140.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Lyon, M.F. Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 1961, 190, 372–373. [Google Scholar] [CrossRef]
- Zito, A.; Davies, M.N.; Tsai, P.C.; Roberts, S.; Andres-Ejarque, R.; Nardone, S.; Bell, J.T.; Wong, C.; Small, K.S. Heritability of skewed X-inactivation in female twins is tissue-specific and associated with age. Nat. Commun. 2019, 10, 5339. [Google Scholar] [CrossRef]
- Amos-Landgraf, J.M.; Cottle, A.; Plenge, R.M.; Friez, M.; Schwartz, C.E.; Longshore, J.; Willard, H.F. X chromosome-inactivation patterns of 1,005 phenotypically unaffected females. Am. J. Hum. Genet. 2006, 79, 493–499. [Google Scholar] [CrossRef]
- Peeters, S.B.; Cotton, A.M.; Brown, C.J. Variable escape from X-chromosome inactivation: Identifying factors that tip the scales towards expression. Bioessays 2014, 36, 746–756. [Google Scholar] [CrossRef]
- Posynick, B.J.; Brown, C.J. Escape from X-chromosome inactivation: An evolutionary perspective. Front. Cell Dev. Biol. 2019, 7, 241. [Google Scholar] [CrossRef]
- Deng, X.; Berletch, J.B.; Nguyen, D.K.; Disteche, C.M. X chromosome regulation: Diverse patterns in development, tissues and disease. Nat. Rev. Genet. 2014, 15, 367–378. [Google Scholar] [CrossRef] [PubMed]
- Medema, R.H.; Burgering, B.M. The X factor: Skewing X inactivation towards cancer. Cell 2007, 129, 1253–1254. [Google Scholar] [CrossRef] [PubMed]
- Shvetsova, E.; Sofronova, A.; Monajemi, R.; Gagalova, K.; Draisma, H.; White, S.J.; Santen, G.; Chuva, D.S.L.S.; Heijmans, B.T.; van Meurs, J.; et al. Skewed X-inactivation is common in the general female population. Eur. J. Hum. Genet. 2019, 27, 455–465. [Google Scholar]
- Bajic, V.; Mandusic, V.; Stefanova, E.; Bozovic, A.; Davidovic, R.; Zivkovic, L.; Cabarkapa, A.; Spremo-Potparevic, B. Skewed X-chromosome inactivation in women affected by Alzheimer’s disease. J. Alzheimers Dis. 2015, 43, 1251–1259. [Google Scholar] [CrossRef]
- Giliberto, F.; Radic, C.P.; Luce, L.; Ferreiro, V.; de Brasi, C.; Szijan, I. Symptomatic female carriers of duchenne muscular dystrophy (DMD): Genetic and clinical characterization. J. Neurol. Sci. 2014, 336, 36–41. [Google Scholar] [CrossRef] [PubMed]
- Li, G.; Jin, T.; Liang, H.; Tu, Y.; Zhang, W.; Gong, L.; Su, Q.; Gao, G. Skewed X-chromosome inactivation in patients with esophageal carcinoma. Diagn. Pathol. 2013, 8, 55. [Google Scholar] [CrossRef] [PubMed]
- Sangha, K.K.; Stephenson, M.D.; Brown, C.J.; Robinson, W.P. Extremely skewed X-chromosome inactivation is increased in women with recurrent spontaneous abortion. Am. J. Hum. Genet. 1999, 65, 913–917. [Google Scholar] [CrossRef] [PubMed]
- Simmonds, M.J.; Kavvoura, F.K.; Brand, O.J.; Newby, P.R.; Jackson, L.E.; Hargreaves, C.E.; Franklyn, J.A.; Gough, S.C. Skewed X chromosome inactivation and female preponderance in autoimmune thyroid disease: An association study and meta-analysis. J. Clin. Endocrinol. Metab. 2014, 99, E127–E131. [Google Scholar] [CrossRef] [PubMed]
- Sun, Z.; Fan, J.; Wang, Y. X-chromosome inactivation and related diseases. Genet. Res. 2022, 2022, 1391807. [Google Scholar] [CrossRef]
- Okumura, K.; Fujimori, Y.; Takagi, A.; Murate, T.; Ozeki, M.; Yamamoto, K.; Katsumi, A.; Matsushita, T.; Naoe, T.; Kojima, T. Skewed X chromosome inactivation in fraternal female twins results in moderately severe and mild haemophilia B. Haemophilia 2008, 14, 1088–1093. [Google Scholar] [CrossRef]
- Ozbalkan, Z.; Bagislar, S.; Kiraz, S.; Akyerli, C.B.; Ozer, H.T.; Yavuz, S.; Birlik, A.M.; Calguneri, M.; Ozcelik, T. Skewed X chromosome inactivation in blood cells of women with scleroderma. Arthritis Rheum. 2005, 52, 1564–1570. [Google Scholar] [CrossRef]
- Chen, Z.; Ng, H.K.; Li, J.; Liu, Q.; Huang, H. Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies. Stat. Methods Med. Res. 2017, 26, 567–582. [Google Scholar] [CrossRef]
- Chen, B.; Craiu, R.V.; Strug, L.J.; Sun, L. The X factor: A robust and powerful approach to X-chromosome-inclusive whole-genome association studies. Genet. Epidemiol. 2021, 45, 694–709. [Google Scholar] [CrossRef]
- Clayton, D. Testing for association on the X chromosome. Biostatistics 2008, 9, 593–600. [Google Scholar] [CrossRef]
- Deng, W.Q.; Mao, S.; Kalnapenkis, A.; Esko, T.; Sun, L. Analytical strategies to include the X-chromosome in variance heterogeneity analyses: Evidence for trait-specific polygenic variance structure. Genet. Epidemiol. 2019, 43, 815–830. [Google Scholar] [CrossRef]
- Ding, J.; Lin, S.; Liu, Y. Monte carlo pedigree disequilibrium test for markers on the X chromosome. Am. J. Hum. Genet. 2006, 79, 567–573. [Google Scholar] [CrossRef]
- Gao, F.; Chang, D.; Biddanda, A.; Ma, L.; Guo, Y.; Zhou, Z.; Keinan, A. XWAS: A software toolset for genetic data analysis and association studies of the X chromosome. J. Hered. 2015, 106, 666–671. [Google Scholar] [CrossRef]
- Ma, L.; Hoffman, G.; Keinan, A. X-inactivation informs variance-based testing for X-linked association of a quantitative trait. BMC Genom. 2015, 16, 241. [Google Scholar] [CrossRef]
- Wang, J.; Yu, R.; Shete, S. X-chromosome genetic association test accounting for X-inactivation, skewed X-inactivation, and escape from X-inactivation. Genet. Epidemiol. 2014, 38, 483–493. [Google Scholar] [CrossRef] [PubMed]
- Zhang, L.; Martin, E.R.; Morris, R.W.; Li, Y.J. Association test for x-linked QTL in family-based designs. Am. J. Hum. Genet. 2009, 84, 431–444. [Google Scholar] [CrossRef] [PubMed]
- Zheng, G.; Joo, J.; Zhang, C.; Geller, N.L. Testing association for markers on the X chromosome. Genet. Epidemiol. 2007, 31, 834–843. [Google Scholar] [CrossRef]
- Zhou, X.; Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 2012, 44, 821–824. [Google Scholar] [CrossRef] [PubMed]
- Xu, S.Q.; Zhang, Y.; Wang, P.; Liu, W.; Wu, X.B.; Zhou, J.Y. A statistical measure for the skewness of X chromosome inactivation based on family trios. BMC Genet. 2018, 19, 109. [Google Scholar] [CrossRef] [PubMed]
- Wang, P.; Zhang, Y.; Wang, B.Q.; Li, J.L.; Wang, Y.X.; Pan, D.; Wu, X.B.; Fung, W.K.; Zhou, J.Y. A statistical measure for the skewness of X chromosome inactivation based on case-control design. BMC Bioinform. 2019, 20, 11. [Google Scholar] [CrossRef]
- Li, B.H.; Yu, W.Y.; Zhou, J.Y. A statistical measure for the skewness of X chromosome inactivation for quantitative traits and its application to the MCTFR data. BMC Genom. Data 2021, 22, 24. [Google Scholar] [CrossRef] [PubMed]
- Wang, P.; Xu, S.; Wang, Y.X.; Wu, B.; Fung, W.K.; Gao, G.; Liang, Z.; Liu, N. Penalized fieller’s confidence interval for the ratio of bivariate normal means. Biometrics 2021, 77, 1355–1368. [Google Scholar] [CrossRef] [PubMed]
- Yu, W.Y.; Zhang, Y.; Li, M.K.; Yang, Z.Y.; Fung, W.K.; Zhao, P.Z.; Zhou, J.Y. BEXCIS: Bayesian methods for estimating the degree of the skewness of X chromosome inactivation. BMC Bioinform. 2022, 23, 193. [Google Scholar] [CrossRef] [PubMed]
- Zhou, H.; Zhou, J.; Sobel, E.M.; Lange, K. Fast genome-wide pedigree quantitative trait loci analysis using mendel. BMC Proc. 2014, 8, S93. [Google Scholar] [CrossRef] [PubMed]
- Zhou, H.; Blangero, J.; Dyer, T.D.; Chan, K.K.; Lange, K.; Sobel, E.M. Fast genome-wide QTL association mapping on pedigree and population data. Genet. Epidemiol. 2017, 41, 174–186. [Google Scholar] [CrossRef]
- Annis, J.; Miller, B.J.; Palmeri, T.J. Bayesian inference with stan: A tutorial on adding custom distributions. Behav. Res. Methods 2017, 49, 863–886. [Google Scholar] [CrossRef]
- Zhao, J.H.; Luan, J.A.; Congdon, P. Bayesian linear mixed models with polygenic effects. J. Stat. Softw. 2018, 85, 1–27. [Google Scholar] [CrossRef]
- Lange, K. Mathematical and Statistical Methods for Genetic Analysis, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2002; p. 384. [Google Scholar]
- Sinnwell, J.P.; Therneau, T.M.; Schaid, D.J. The kinship2 R package for pedigree data. Hum. Hered. 2014, 78, 91–93. [Google Scholar] [CrossRef]
- Bae, H.T.; Perls, T.T.; Sebastiani, P. An efficient technique for Bayesian modeling of family data using the bugs software. Front. Genet. 2014, 5, 390. [Google Scholar] [CrossRef] [PubMed]
- Runcie, D.E.; Crawford, L. Fast and flexible linear mixed models for genome-wide genetics. PLoS Genet. 2019, 15, e1007978. [Google Scholar] [CrossRef]
- Kruschke, J.K. Bayesian data analysis. Wiley Interdiscip. Rev.-Cogn. Sci. 2010, 1, 658–676. [Google Scholar] [CrossRef]
- Ma, C.; Boehnke, M.; Lee, S. Evaluating the calibration and power of three gene-based association tests of rare variants for the X chromosome. Genet. Epidemiol. 2015, 39, 499–508. [Google Scholar] [CrossRef] [PubMed]
- Schifano, E.D.; Epstein, M.P.; Bielak, L.F.; Jhun, M.A.; Kardia, S.L.; Peyser, P.A.; Lin, X. SNP set association analysis for familial data. Genet. Epidemiol. 2012, 36, 797–810. [Google Scholar] [CrossRef]
- Won, S.; Lange, C. A general framework for robust and efficient association analysis in family-based designs: Quantitative and dichotomous phenotypes. Stat. Med. 2013, 32, 4482–4498. [Google Scholar] [CrossRef] [PubMed]
- Saad, M.; Wijsman, E.M. Association score testing for rare variants and binary traits in family data with shared controls. Brief. Bioinform. 2019, 20, 245–253. [Google Scholar] [CrossRef] [PubMed]
- Vehtari, A.; Gelman, A.; Simpson, D.; Carpenter, B.; Bürkner, P. Rank-normalization, folding, and localization: An improved R for assessing convergence of MCMC. Bayesian Anal. 2020, 16, 667–718. [Google Scholar] [CrossRef]
- Miller, M.B.; Basu, S.; Cunningham, J.; Eskin, E.; Malone, S.M.; Oetting, W.S.; Schork, N.; Sul, J.H.; Iacono, W.G.; Mcgue, M. The Minnesota Center for Twin and Family Research genome-wide association study. Twin Res. Hum. Genet. 2012, 15, 767–774. [Google Scholar] [CrossRef]
- Mcgue, M.; Zhang, Y.; Miller, M.B.; Basu, S.; Vrieze, S.; Hicks, B.; Malone, S.; Oetting, W.S.; Iacono, W.G. A genome-wide association study of behavioral disinhibition. Behav. Genet. 2013, 43, 363–373. [Google Scholar] [CrossRef]
- Liu, Y.; Chen, S.; Li, Z.; Morrison, A.C.; Boerwinkle, E.; Lin, X. ACAT: A fast and powerful p value combination method for rare-variant analysis in sequencing studies. Am. J. Hum. Genet. 2019, 104, 410–421. [Google Scholar] [CrossRef]
- Mccaw, Z.R.; Lane, J.M.; Saxena, R.; Redline, S.; Lin, X. Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics 2020, 76, 1262–1272. [Google Scholar] [CrossRef]
- Ng, K.T.; Yeung, O.W.; Liu, J.; Li, C.X.; Liu, H.; Liu, X.B.; Qi, X.; Ma, Y.Y.; Lam, Y.F.; Lau, M.Y.; et al. Clinical significance and functional role of transmembrane protein 47 (TMEM47) in chemoresistance of hepatocellular carcinoma. Int. J. Oncol. 2020, 57, 956–966. [Google Scholar] [CrossRef]
- Li, R.; Guo, M.; Song, L. PAS domain containing repressor 1 (PASD1) promotes glioma cell proliferation through inhibiting apoptosis in vitro. Med. Sci. Monitor 2019, 25, 6955–6964. [Google Scholar] [CrossRef] [PubMed]
- Weeks, D.E.; Ott, J.; Lathrop, G.M. SLINK: A general simulation program for linkage analysis. Am. J. Hum. Genet. 1990, 47, A204. [Google Scholar]
- Cheng, H.; Qu, L.; Garrick, D.J.; Fernando, R.L. A fast and efficient Gibbs sampler for Bayes in whole-genome analyses. Genet. Sel. Evol. 2015, 47, 80. [Google Scholar] [CrossRef] [PubMed]
- Mbatchou, J.; Barnard, L.; Backman, J.; Marcketta, A.; Kosmicki, J.A.; Ziyatdinov, A.; Benner, C.; O’Dushlaine, C.; Barber, M.; Boutkov, B.; et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 2021, 53, 1097–1103. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).




