Next Article in Journal
α-Bisabolol, a Dietary Sesquiterpene, Attenuates Doxorubicin-Induced Acute Cardiotoxicity in Rats by Inhibiting Cellular Signaling Pathways, Nrf2/Keap-1/HO-1, Akt/mTOR/GSK-3β, NF-κB/p38/MAPK, and NLRP3 Inflammasomes Regulating Oxidative Stress and Inflammatory Cascades
Next Article in Special Issue
Evolution and Expression of the Expansin Genes in Emmer Wheat
Previous Article in Journal
Endoplasmic Reticulum Stress-Related Ten-Biomarker Risk Classifier for Survival Evaluation in Epithelial Ovarian Cancer and TRPM2: A Potential Therapeutic Target of Ovarian Cancer
Previous Article in Special Issue
Molecular Characterization and Marker Development of the HMW-GS Gene from Thinopyrum elongatum for Improving Wheat Quality
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Local and Bayesian Survival FDR Estimations to Identify Reliable Associations in Whole Genome of Bread Wheat

INRES-Plant Breeding, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113 Bonn, Germany
*
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2023, 24(18), 14011; https://doi.org/10.3390/ijms241814011
Submission received: 3 August 2023 / Revised: 2 September 2023 / Accepted: 7 September 2023 / Published: 12 September 2023
(This article belongs to the Special Issue Wheat Genetics and Genomics 2.0)

Abstract

:
Estimating the FDR significance threshold in genome-wide association studies remains a major challenge in distinguishing true positive hypotheses from false positive and negative errors. Several comparative methods for multiple testing comparison have been developed to determine the significance threshold; however, these methods may be overly conservative and lead to an increase in false negative results. The local FDR approach is suitable for testing many associations simultaneously based on the empirical Bayes perspective. In the local FDR, the maximum likelihood estimator is sensitive to bias when the GWAS model contains two or more explanatory variables as genetic parameters simultaneously. The main criticism of local FDR is that it focuses only locally on the effects of single nucleotide polymorphism (SNP) in tails of distribution, whereas the signal associations are distributed across the whole genome. The advantage of the Bayesian perspective is that knowledge of prior distribution comes from other genetic parameters included in the GWAS model, such as linkage disequilibrium (LD) analysis, minor allele frequency (MAF) and call rate of significant associations. We also proposed Bayesian survival FDR to solve the multi-collinearity and large-scale problems, respectively, in grain yield (GY) vector in bread wheat with large-scale SNP information. The objective of this study was to obtain a short list of SNPs that are reliably associated with GY under low and high levels of nitrogen (N) in the population. The five top significant SNPs were compared with different Bayesian models. Based on the time to events in the Bayesian survival analysis, the differentiation between minor and major alleles within the association panel can be identified.

1. Introduction

Bread wheat (Triticum aestivum L.) is among one of the major crops for food security worldwide which alone contributes 20% of the protein and calories in our daily diet. Increasing Grain yield (GY) is the main goal of wheat breeding programs. GY is a complex trait controlled by many genes with small effects. Under field conditions, both genetic and environmental factors, and interaction between genotype and environment (G × E) affect GY [1]. However it is observed that the analysis of GY related parameters presents challenge. Because the observations should be independent and identical and the residuals must follow normal distribution. In practice, they are affected by genetic and environmental factors and the distribution of data is far from normal [2]. There is only one mean vector obtained from the observations, which is not sufficiently informative to confirm whether the distribution is normal or not. One of the most popular tools to compute true confidence interval for the mean and standard error is bootstrap [3], which is a resampling statistical device to measure the accuracy of observed bias and variance [4] in complex traits like GY. In field experiments, genotypes have random effects, which lead to some outliers among the observations. The best linear unbiased predictors (BLUPs) is utilized in phenotypic mixed linear models to estimate random effects. BLUPs with frequentist perspective can be an analogy to Bayesian inference in dealing with outliers [5]. In parallel, genetic estimated breeding value (GEBV) describes the individual genetic merit to produce superior offspring, and correlation between GEBVs and BLUPs is a common approach to check the quality of GY vector [6]. GEBVs of complex traits can be computed by BLUPs via genome wide association study (GWAS) models. So, GWAS is a successful and fast strategy for genetic dissection of complex traits in plants [7]. The objective of GWAS is to determine the presence of a significant relationship between genotypes and traits of interest. Therefore, understanding variations in rare phenotypes among complex traits and associations between large number of markers (typically SNPs) and a given trait are of great interest [8]. It is particularly useful when little information on the genetic control of a quantitative complex trait is available. The main challenge is to find significant reliable associations in GWAS results. Population structure and genomic relationship (kinship) matrix are two main components involved in the GWAS model that can reduce structural and systematic effects. Furthermore, there are some genetic hyper-parameters such as marker effect, minor allele frequency (MAF), number of call rate for each marker [9] and epistasis effects between loci [10], which are not presented in usual GWAS models, but can improve the power and accuracy of models significantly. Moreover, various GWAS models, such as single locus association, multi-locus association [11,12], Bayesian whole genome regression and whole genome variance, have potential to attain reliable significant associations. In the single locus association model, population structure and environmental factors are taken as fixed effects while phenotypic values and markers are considered as random effects [13]. It further requires Bonferroni correction with false discovery rate (FDR) of 0.05 for pairwise tests between the complex trait and SNPs. At this threshold level, although the risk of a false positive has been reduced, the chance of producing true positive associations is very low. As such, this model has led to many pseudo associations in the results of some studies [14]. Moreover, high over or low under-fitting in single locus association model due to high-density SNPs makes optimization difficult. Meanwhile, the multi-locus association model based on the restricted maximum likelihood (REML) method applies better FDR correction to reduce selection criterion, but due to pairwise comparisons, collinearity in this model remains high. It has also led to over- or under-fitting in the results of GWAS model [15]. Particularly, when the epistasis effects between loci are high, due to low bias and high variance in the model parameters, many pseudo associations arise in the results [16,17,18]. The whole genome variance model provides the analysis of genetic variance based on whole genome regression [19]. The probability of receiving reliable significant associations in this model is still low, due to multicollinearity and heteroscedasticity in model components. The single, multi-locus and whole genome variance models are based on linear regression algorithm, which has low efficiency for big genomic datasets with large-p-value-small effects. However, the Bayesian whole genome regression model based on prior and posterior probability distributions is a convenient algorithm to deal with these difficulties [20]. Therefore, the model’s accuracy can be higher than three previous frequentist models, because the complex trait and SNPs are taking prior distribution [21]. In this model, SNP effects and dimensions of genomic dataset are interpreted with Bayes factor (BF), which is easier and more reliable against SNP p-values [22]. BF is a summary measure that provides an alternative to the p-value for ranking significant associations [23]. In addition, in this model, false coverage rate (FCR) is considered analogous to FDR [24]. FCR covers more dimensions among the selected intervals in the genomic file [25]. Nevertheless, determining the actual prior distribution for the GY vector and SNPs represents a difficult part of computation [26]. Genomic prediction (GP) as a promising technique in molecular breeding provides the possibility to consider the performance of GWAS models. Bayesian survival FDR analysis is a robust strategy based on probability theory to determine the true prior distribution in large-scale genomic datasets [17,27]. Moreover, this analysis is a regularization approach that can be applied as a GP model with high performance [28]. It attempts to estimate all genomic effects among GY vector, while the pseudo effects of covariates are reduced to zero within SNPs [29].

2. Local and Bayesian Survival FDR Analysis in GWAS

In the last decade, GWAS has been made feasible by high-throughput genomic technologies based on next-generation sequencing (NGS) techniques. When multiple pairwise association tests are performed between case and control individuals, it becomes difficult to minimize type I error (false positive error) in hypothesis testing simultaneously. Some efforts to control the family wise error rate (FWER) versus multiple pairwise errors were proposed by Bonferroni, then Benjamini [30], who proposed FDR control as an alternative approach to overall type I error. GWAS is very sensitive in identifying the genetic basis of complex traits such as GY with many agronomic components with small effects. The reasons for this sensitivity could be the many genetic variations with small effects among individuals and the variation in genetic structure of sub population. Thus, the power of common GWAS models, such as genome regression models, to control for the FDR threshold is very low to detect non null hypotheses for associations. Efron and Tibshirani developed an extension of FDR for large-scale simultaneous hypothesis testing, called local FDR based on MLE in scale of z-value versus p-value [31]. Large-scale genetic association studies are conducted in the hope of discovering signal SNPs involved in the association of complex traits. Identifying the correct FDR to decide between signal associations is the critical key of GWAS. Based on frequent perspective, the final extension of the correct threshold in GWAS is the local FDR approach proposed by Efron [32]. The local FDR approach is suitable for simultaneously testing many associations, based on the empirical Bayes perspective.
In the local FDR framework, the FDR is a measurement of posterior signal SNPs with rule:
P ( z i π 0 f 0 + π 1 f 1 + + π n f n > 1 α
where z i is a t-statistic comparing pairwise SNP associations with N 0 , 1 distribution under null hypothesis, π is proportion of true null hypothesis, f the mixture density estimate at midpoint of bin, calculated based on empirical Bayes rule of z i with degree of freedom (df) for fitting the MLE of z i , α is significant level of association test. The local FDR (locFDR) is then defined as:
l o c F D R z i = z i π 0 f 0 z i π i f i
Thus in locFDR, maximum likelihood estimator is sensitive to bias when the GWAS model includes two or more explanatory variables as genetic parameters. The results are based on maximum likelihood of marker effects, which is only a point estimation and there is no further information about prior distribution of the signal and followers of the signal associations.
However, the advantage of the Bayesian perspective is that the knowledge of prior distribution comes from other genetic parameters included in the GWAS model, such as linkage disequilibrium (LD) analysis, MAF and call rate of significant associations. In the current study, we propose to solve the multicollinearity and large-scale problems in the GY vector as a complex trait in bread wheat and large-scale SNP information with Bayesian survival analysis. The aim of this study is to obtain a short list of SNPs that are reliably associated to GY vector among three levels of N. The hypothesis to test, for each SNP = SNP1,…, SNPp when pairwise comparisons between censored and relapsed observations, are conditionally independent, x S N P ~ P θ S N P , so x S N P is GY values for each genotype. We suppose the SNP effect as parameter space of θ S N P Θ , where Θ is an unobserved scalar parameter that can be partitioned in two parts ( Θ 0 ,   Θ 1 ) such that:
H0: θ S N P Θ 0 or SNPi is not significantly associated with GY among population.
Ha: θ S N P Θ 1 or SNPi is significantly associated with GY among population.
In Bayesian survival framework, the FDR is a measurement of posterior signal SNPs with rule:
l P ( θ S N P Θ 1   | x S N P ) > 1 α
Where α takes MAF of SNP in the pairwise comparisons of censored and returned observations.
The Bayesian survival FDR (bsFDR) is then defined:
b s F D R λ S N P = S N P P ( θ S N P Θ 1   | λ S N P . θ S N P ) S N P ( λ S N P .   θ S N P )
where λ S N P is Bayes survived factor that indicate SNP effect based on MAF value for GY of each genotype.
Estimating a significant FDR threshold in GWAS is still a major challenge to distinguish true positive hypotheses from false positives and negative errors. Several comparative methods for multiple testing have been developed to determine the significance threshold; however, these methods may be overly conservative and lead to an increase in false negative errors. Here, we developed two empirical methods to determine the statistical significance threshold based on SNP- heritability of GY as a target trait. To test the locFDR and bsFDR methods for significance threshold under different distinguished GWAS models, we used the mean of results from three years of field experiments on GY vector in bread wheat under low N (LN) and high N (HN) applications.

3. Results

3.1. Grain Yield Quality Control

The GY observations obtained from 221 bread wheat genotypes under HN treatment were averaged across three years with a mean value of 6400 and standard division of 145.16 (gr/m2). In the histogram plot, the Shapiro p-value = 0.0891, which indicates the GY vector follows a normal distribution. However, we considered this test inadequate, so to focus more on the quality of observations, the random vectors were repeated 2000 times with replacement from original GY vector was generated and Bayesian bootstrap p-value = 0.00325, which shows the Shapiro p-value is accurate (Figure 1b). Consequently, a confidence interval of 95% and standard error in simulated samples are close to original observations. Correlation between BLUPs and BLUEs as analog for EBVs is high (r = 0.814), showing that the effects of genotypes are distributed randomly and the quality of GY vector is acceptable to use in GWAS and GP models. Moreover, under LN treatment after removing outliers the Bayesian bootstrap p-value = 0.3048, which shows the Shapiro p-value = 0.2462 follows a normal distribution and is ready to apply in GWAS models (Figure 1a).

3.2. Population Structure Analysis

In Figure 2, based on the population of 221 genotypes, genetic data are divided into three clusters which implies these genotypes differ from each other in genetic content.
The variance is explained by PCA being normally distributed in ten PCs and the first three PCs with explained variance 20.5, 10.8 and 9.7%, respectively, are considerable as PC number in the GWAS analysis. The allocation of genotypes in related clusters could be attributed to their genetic descent and common ancestry. The Fst plot based on 150 K SNPs with MAF greater than 0.05, indicates how heterozygosity varies in sub-populations in comparison with the whole population. It is thus considered a threshold of calculated proportion of homozygosity in whole genome, which is clearly observed in Figure 2d and the proportion of homozygosity in the population is acceptable.

3.3. GWAS and GP Model Selection

To determine the best GWAS model, local FDR analysis on SNP p-values for each model was run.
In Figure 3a, the results under low N show that in the variance component model (sommer) the standard error is minimum (sigma = 0.048), but Bayesian whole genome regression (NAM) exhibits the highest Delta ( μ ) = 0.589 and highest Proportion (H0) = 0.625, which shows that the false positive rate to reject H0 via sommer under LN treatment is not high, so this minimum sigma = 0.048 is not reliable. For all GWAS models, the local FDR plot was revealed in one tail.
In Figure 3b, the results under high N show that for Bayesian whole genome regression (NAM) the standard error is minimum (sigma = 0.062), the Delta is highest ( μ ) = 0.661 and Proportion is highest (H0) = 1.034, in front in variance component model, sigma = 0.664 is highest and the higher sigma with lower Delta ( μ ) = 0.358, and lower Proportion (H0) = 0.433, representing lower model accuracy, so the sommer model was removed from further analysis.
For rrBLUP and NAM, the local FDR plot was one tail but in mlmm model the two tails of significant SNPs was revealed. To check the performance of the remaining three GWAS models, the Bayesian survival analysis based on highest significant association was performed for GY vector under LN and HN treatments separately, and the mean and standard error of survived SNPs (S(SNPi)) were calculated. To measure the accuracy of each model, Akaike information criterion (AIC) and Bayesian information criterion (BIC) were calculated and the results are shown in Figure 4. Under LN, GY vector in the rrBLUP and mlmm models was the only major allele that survived with mean of S(SNPi) 0.547 and 0.481. However, in the NAM model both major and minor alleles during survival analysis have remained. Moreover, the NAM displayed the minimum standard error (SE S(SNPi) = 0.189) in comparison to the other two models. The AIC and BIC were 150,408 and 150,867, respectively, which are lower than rrBLUP and mlmm. For GY vector under HN condition, the bootstrapping method was applied and in the mlmm model only the major allele survived with mean of S(SNPi) 0.597. However, in the rrBLUP both major and minor alleles and in the NAM all major, minor, heterozygous and missing (NA) alleles during survival analysis remained. In addition, the NAM exhibited minimum SE S(SNPi) = 0.104 in comparison to the other two models. Here, the AIC and BIC values were also less than those against rrBLUP and mlmm. Therefore, both local FDR and Bayesian survival analyses have confirmed NAM to be the best GWAS model with minimum residual errors on the SNPs. The Manhattan and QQ-plots for NAM model for GY vector under HN are shown in Figure 5a,b.
The association points well above the gray area in QQ-plot correspond to GWAS hits and in Manhattan plot by threshold −log10 (p = 4), the continuous points on chromosome region 3A specify the reliable SNPs, which show association with investigated trait (Table 1 and Table 2).

3.4. SNP Effect Estimation

In this study, we were also interested in examining the impact of prior distribution on the SNP p-values of NAM as the best model of GWAS and estimation accuracy. Therefore, the GP models were assessed with six different Bayesian prior distributions: A, B, C, LASSO, ridge regression and survival. For all prior distributions, the markers with null effects were removed from analysis. For Bayesian survival prior distribution, λ 0 = 1 and exp β G Y = 0.001 , was assigned to obtain flat prior. To calculate λ S N P i and p-values, a Markov chain Monte Carlo (MCMC) of length 150,000 iterations was considered after a burning bootstrap period of 2000 iterations. The p-values generated from all prior distributions were converted to z-values for easier interpretation and minimizing the effect of the SNP dimension. By comparing the box plots of the SNP Z-scores generated from six different Bayesian prior distributions (Figure 6a), the Bayesian LASSO and Bayesian survival models exhibit highest accuracy by correlation between predicted genetic values ( g ^ ) with true breeding values ( g ) ,   r B a y e s L g ^ , g = 0.8722 and r B a y e s S u r v g ^ , g = 0.8876. The top five significant SNPs received from NAM model (Figure 6a) were compared with different Bayesian models. Only the Bayesian survival analysis can identify the differentials within them due to time to events among minor and major alleles within association panel. With this approach, the first two signal SNPs show a trend, and share reliable associations. However, for other Bayesian platforms there is no trend among SNP effects (Figure 6b).

4. Discussion

Nitrogen is the main nutrient for canopy growth and photosynthesis, which is responsible for GY and quality. Allelic variation for GY under low and high N could be high due to large mutations in the signal SNPs within candidate genes. Moreover, considerable effort is required to identify all involved variants among complex traits. Therefore, the main challenge in GWAS is to find significant and reliable associations related to a given complex trait. To address this dilemma, local FDR correction and Bayesian survival analysis, two precise and efficient computational approaches, serve as different filters to determine the best GWAS and GP models and consequently obtain a reliable association in the result of the best selected model. Mixed populations and variants with small effect sizes or rare alleles in kinship coefficients derived from marker information or outliers in the phenotypic vector, may lead to causal signals and false association between marker and complex trait. In parallel with these genetic parameters, there are some genetic hyper-parameters such as panel size, number of markers, MAF and number of call rates for each marker, that are not represented in the given model, but which have effects on the performance of the model and accuracy of results. In the GWAS and GP, one side of the model is allocated to the complex trait. To reduce the multicollinearity and heteroscedasticity in phenotypic observations, a tradeoff between bias and variance is required. Outliers in the GY vector are the first cause of bias results in GWAS. We used bootstrapping as a powerful and well-known tool to deal with outliers in the GY vector. However, in this parametric resampling method, the GY vector is simulated to obtain the confidence interval of the mean and standard error of the phenotypic observations. However, randomization with global replacement in parametric bootstrapping leads to underestimation of the variation of the complex trait. In this study, in addition to dealing with outliers with usual procedures, we applied Bayesian bootstrap using MCMC state resampling, which can be a good alternative to control underestimation of variance. Because the observations appeared with a certain probability with bias and variance tradeoff on the results. This probability based on prior distribution offers more precise estimation. In the results, Bayesian bootstrap p-value shows more significance with lower sampling error among the GY vector (Figure 1a). Therefore, the correlation between BLUPs and BLUEs is higher than 0.8 among prior distribution of the GY vector. However, the low replicability and reliability of results in the linear GWAS models, such as single-locus, multi-locus association and whole-genome variance models, pose a major challenge. Basically, in self-pollinated wheat crops, there are some non-additive relationships between loci, such as codominant and epistasis, due to heterogeneity within the locus. However, linear GWAS models do not account for this problem, and the genomic relationship matrix is estimated based only on additive fixed effects between SNPs. Consequently, the false positive rate and type I error are high, and many SNPs with true non-zero effects behave like null effects and, conversely, many SNPs with null effects show significant associations in the results. In contrast, in Bayesian whole genome regression, SNPs are considered as random effects, which introduces MAF as the main genetic hyper-parameter in the model. Additionally, the effects of minor and major alleles are accounted for via the X Z α β component in the model, which reveals the allelic effects from the interaction between population structure and kinship matrix. All these different types of information, including genetic parameters and hyper-parameters, bear implications for assessing reliable associations in the GWAS results. With the Bayesian framework, it is possible to include all this information in both sides of the model, which greatly reduces the type I and II errors due to the posterior specifications for each component of the model. In the present work, we have studied two precise and efficient computational approaches as different filters to determine the best GWAS and GP models and consequently obtain reliable association in the results of the best selected model.
Local FDR as frequentist inference with Fisher information background was applied to check the distribution of SNPs especially in the tails as critical regions of the distribution. This method, based on maximum likelihood estimation, is more precise for determining the effect size and strength of a large-scale genomic file. As can be observed in the results, the expected mean likelihoods based on SNP information are acceptable, but the standard error in the models with a scale of −log10 (SNP) is still high. The main criticism of this approach is that it locally focuses only on SNP effects in tails, whereas signaling associations were distributed throughout the genome. Moreover, the magnitude of SNP effects alone may be insufficient to inform a conclusion about the performance of the model. For example, the degree of freedom (df) to fit the estimated density among SNPs is affected by the MAF and the quality of SNP chip (e.g., NAs and imputation). A larger df is required for sharper tails in f z z , but in the mlmm model df is low, and consequently the sigma is partially low, which may lead to underestimation or overestimation of the model. With this limitation, we also considered whether we should use this approach because the proportion of false rejection hypotheses is larger in this approach than in the family wise error rate (FWER) approach. The most interesting result of this approach is that in the variance component GWAS model, proportion of H0 (sigma/mean) = 1.034 is the highest due to the highest sigma (SNP standard error), thus this model can clearly be removed due to its very low power. In the Bayesian whole genome regression with a low proportion of H0 (sigma/mean) = 0.633, the putative signal SNPs in the tails were very well covered and the model produced the lowest error (sigma = 0.062).
Bayesian survival FDR analysis as a modern inference based on posterior estimation of SNP effects is commonly used for large-scale genomic data with small effects. In the survival part with the ( Δ S N P i = 0 n component of the approach, we included the minor and major alleles in each SNP based on the time to events, which is necessary to achieve the expected prior distribution. Therefore, the MAF and the quality of SNP chip were considered in the computation. In GWAS with the original SNP information among GY vector, only in the NAM model were both minor and major alleles revealed in the results with the highest AIC and BIC criteria versus two models. In GWAS with SNP simulation, all allelic effects including major, minor, heterozygous and missing were revealed in the same model, which may present good evidence supporting this model as the best performing. Thus, we found that this is in good agreement with [29] to minimize the type I error at the significance level of MAF. As shown in Figure 5 and Figure 6b, the relationship between effect size and MAF is not strong for SNPs with higher MAF value. The difference is claimed for SNPs with low MAFs (from SNP3: AX-110385692 to SNP5: BS00094057_51). In this case, it indicates that the informative nature of l o c F D R z i is lower than b s F D R λ S N P to determine reliable associations, which closely aligns with the results of [23,33]. Binary MCMC sampling is a renewable technique in survival Bayesian inference to generate large-scale genomic datasets; however, it is affected by long computation times in the linear algorithms. In the Bayesian part, we solved this technical difficulty using the exponential form exp ( Δ S N P i = 0 n to predict the SNP prior distribution. For both the local FDR and Bayesian survival approaches, the standard error of the NAM model was lower than the others. To compare the performance of the Bayesian survival model with other commonly used Bayesian platforms, we proposed λ S N P i based on covariate effects between SNPs. These covariate effects have negative implications for the expected prior distribution in all Bayesian approaches, including survival. To solve this problem, β G Y should take an exponential form based on λ 0 which allows the GP algorithms to examine SNP effects with minimal bias and variance. To compare the accuracy of GP models, we used Z-score because it increases the mean and variance of genetic value for each genotype. However, it makes interpretation easier and more accurate than p-value and Bayes factor with a smaller comparison interval. The five top significant SNPs were compared with different Bayesian models. Only Bayesian survival analysis can identify the difference between minor and major alleles based on time to events. In b s F D R λ S N P ,     λ S N P is a semi-parametric statistic because it is based on MAF and allelic scores of genotypes simultaneously [34]. It seems that the semi-parametric empirical Bayes factor better controls both false positive and false negative errors in GWAS to identify allelic variations and find templates for significant SNPs. Therefore, we propose the utilization of different GWAS models at more N levels to increase the replicability of the posterior probability [35] of the identified signal associations.

5. Materials and Methods

5.1. Plant Materials and Field Experiment

In this study, a set of 221 bread wheat genotypes from Breeding Innovations In Wheat for resilient Cropping Systems (BRIWECS) project were cultivated as split-plots design, in three cropping seasons 2018, 2019 and 2020 at the agricultural research station Campus Klein-Altendorf, University of Bonn, Germany. We recorded the GY value of each genotype under high and low N fertilizer annually. Then, the average GY values of the genotypes were used as GY vector for subsequent analysis. The outliers in the GY vector were checked. To deal with outliers, they were kept since they reflected the actual field values across all years. The distribution of the residuals was checked using Shapiro normality test. To improve the quality of the vector, 2000 repeated random samples with replacement were generated from the original GY vector using R/bootstrap, and then Bayesian bootstrap p-value was calculated. To also control genotype random effect, the EBVs and BLUPs were based on broad sense heritability [36] plotted.

5.2. SNP Quality Control

A platform of 150 K affymetrix SNP Chip was used to apply the GY vector in the GWAS and GP models. The SNPs with MAF ≤ 0.05 were removed due to monomorphism in the marker. After checking SNPs that deviated from the Hardy–Weinberg equilibrium (HWE), only 22,489 polymorphic SNP markers remained, which were used in GWAS and GP analyses.

5.3. Population Structure

Cumulative variance explained by the eigenvalues of the principal components was calculated, and then discriminant analysis of principal components (DAPC), which minimizes genetic variation within clusters, was also conducted using R/adegenet for assessing the population structure. To determine the actual number of PCs retained in the DAPC, cross validation (CV) was performed to simulate the low and high numbers of PCs in the model. To control outliers in the population clusters, information from the identical-by-state (IBS), which measures differences in allelic states, and the fixation index ( F s t ) , which refers to the amount of heterozygosity at different levels of population structure, were calculated based on covariance estimation [37]. To detect the regions that might be involved in the linkage pattern of the population, the whole genome wide plot of F s t was constructed.

5.4. Construction of Genomic Relationship Matrix (GRM)

Basically, the GRM as kinship matrix is used in GWAS and GP models. Based on the method suggested by (Mathew et al. [38]), the covariance between individuals g i and g j can be equal by covariance of S N P i j , therefore the GRM was calculated using G = Σ k = 1 L g i k p k 2 Σ k = 1 L p k 1 p k , and p k = 1 n Σ i = 1 n g i k where, L is number of loci, p k is MAF for the locus k and g i .

5.5. GWAS and GP Models

A total of four different GWAS models, including single-locus association with R/rrBLUP [39], multi-locus association with R/mlmm.gwas [11], variance component association with R/sommer [40] and Bayesian whole genome regression with R/NAM [41] at low and high N levels were performed to detect the association between SNPs and GY as a target trait. The single-locus association model was fitted in the adjusted form the mixed linear model as y = X β + Z u + e , where y is the trait vector, X is the fixed effects matrix, β is the vector of coefficients including principal components and population structure, Z the matrix of random SNP effects coded as (−1, 0 and 1), V u = K σ g 2 , where K is the GRM as kinship matrix and σ g 2 is additive genetic variance with IBS basis. It was removed from the model due to convergence of N × Y to zero. Multi-locus association model, form y i = 1 n = μ + j = 1 m M . j β j + e , where y i = 1 n is trait vector with n genotypes, m is total number of SNPs, M i j is the matrix of random SNP effects coded as (0, 1 and 2) and β j is the vector of SNP effects and H 0 is given in terms of β = σ g 2 = 0 . Once the −log (p-value) is above the FDR threshold line, H 0 is rejected. In the variance component model, y = i = 1 c K i σ g 2 + e , where c is non-overlapping classes of SNPs, K i is the class of kinship matrix based on genomic data and the components of genetic variance are from REML of the mean of SNP information. Bayesian whole genome regression model was fitted to y = μ + X α + Z β + X Z α β + e , where y is trait vector, X is matrix of genotypes and SNPs, α is corresponding vector of SNP effects that captures small effects of all SNPs, β is vector that captures additional effects of SNPs with large effect based on Bayes factor.
GP models were performed with Bayesian whole genome regression based on the following model: y = μ + X β + Z 1 Z 2 u + e , where, y is the GY vector, X , is SNP information for the genotypes, β is regression coefficient for each SNP, indicating the SNP effect in the model, Z 1 is effect of major allele, Z 2 is effect of minor allele, u is vector of Bayes factor for SNP matrix and e is implies to residual term [42]. Bayesian models including Bayes A, B, C, LASSO, RR and Bayesian survival analysis, were then applied to predict genomic estimated breeding values (GEBVs). The CV approach was used to evaluate the accuracy of each model. To determine the best model, correlations ρ y ˇ E B V ,   y ˇ G E B V were estimated for each GP model.

5.6. SNP Effect Estimation under locFDR and bsFDR

To check the distribution of SNPs p-value, for each GWAS model under low and high N levels, zz vector was created under the null hypothesis N (0, 1), which is necessary, when SNPs p-value is very far from normal distribution. To fit the density of f (zz) with heavy tails, degree of freedom (df) was determined based on SNP sample size. Then, empirical null hypotheses were used to estimate the parameters of f (zz) by maximum likelihood, indicating the accuracy of each GWAS model, and the results were visualized using R/locfdr [32]. Based on l o c F D R z i function, the significant associations above FDR threshold line of 0.05 were selected for the GWAS models. The survival function was represented as a cumulative hazard function using R/survival [43],   S S N P i = exp Δ S N P i = 0 n , where, S s n p is survived coefficient for SNP, which estimates the tendency between uniformity in the bottom of SNPs p-value distribution and the peaks, while there were no other observations in the bottom of distribution S y S N P . Based on the MAF and missing values of each SNP, the prior distribution of significant associations were calculated using the function b s F D R λ S N P , for the GWAS models. To check for covariate effects within SNPs in the GP models, the Bayesian survival function was applied in the form of λ S N P i = λ 0 S N P exp β G Y , where, λ S N P i is the Bayes survived factor indicating the SNP effect, λ 0 is the baseline for function and β G Y is regression coefficient of the whole genome under the GY vector.

Author Contributions

M.B.S. literature and problem statement, methodology, formal analysis, writing original draft; A.B. validation, review, editing, funding acquisition; J.L. editing and supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to acknowledge the kind support of DAAD who generously provided us the opportunity to pursue our research and financing the study. Also, the authors are grateful to the Institute for Crop Science and Resource Conservation (INRES)-Plant Breeding, University of Bonn, Germany, for responsible supervision and execution plan during the study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Eltaher, S.; Baenziger, P.S.; Belamkar, V.; Emara, H.A.; Nower, A.A.; Salem, K.F.; Alqudah, A.M.; Sallam, A. GWAS revealed effect of genotype × environment interactions for grain yield of Nebraska winter wheat. BMC Genom. 2021, 22, 2. [Google Scholar] [CrossRef]
  2. Nehe, A.; Akin, B.; Sanal, T.; Evlice, A.K.; Ünsal, R.; Dinçer, N.; Demir, L.; Geren, H.; Sevim, I.; Orhan, Ş.; et al. Genotype x environment interaction and genetic gain for grain yield and grain quality traits in Turkish spring wheat released between 1964 and 2010. PLoS ONE 2019, 14, e0219432. [Google Scholar] [CrossRef]
  3. Efron, B.; Hastie, T.J. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science; Cambridge University Press: Cambridge, UK, 2019. [Google Scholar]
  4. Chernick, M.R.; LaBudde, R.A. An Introduction to Bootstrap Methods with Applications to R; Wiley-Blackwell: Hoboken, NJ, USA, 2011. [Google Scholar]
  5. Gianola, D.; Cecchinato, A.; Naya, H.; Schön, C.C. Prediction of Complex Traits: Robust Alternatives to Best Linear Unbiased Prediction. Front. Genet. 2018, 9, 195. [Google Scholar] [CrossRef] [PubMed]
  6. Ma, X.; Christensen, O.F.; Gao, H.; Huang, R.; Nielsen, B.; Madsen, P.; Jensen, J.; Ostersen, T.; Li, P.; Shirali, M.; et al. Prediction of breeding values for group-recorded traits including genomic information and an individually recorded correlated trait. Heredity 2020, 126, 206–217. [Google Scholar] [CrossRef] [PubMed]
  7. Deja-Muylle, A.; Parizot, B.; Motte, H.; Beeckman, T. Exploiting natural variation in root system architecture via genome-wide association studies. J. Exp. Bot. 2020, 71, 2379–2389. [Google Scholar] [CrossRef]
  8. Gondro, C.; Van der Werf, J.; Hayes, B. Genome-Wide Association Studies and Genomic Prediction; Humana Press: Totowa, NJ, USA, 2017. [Google Scholar]
  9. Maldonado, C.; Mora-Poblete, F.; Contreras-Soto, R.I.; Ahmar, S.; Chen, J.T.; do Amaral Júnior, A.T.; Scapim, C.A. Genome-Wide Prediction of Complex Traits in Two Outcrossing Plant Species Through Deep Learning and Bayesian Regularized Neural Network. Front. Plant Sci. 2020, 11, 593897. [Google Scholar] [CrossRef]
  10. Wei, W.H.; Hemani, G.; Haley, C.S. Detecting epistasis in human complex traits. Nat. Rev. Genet. 2014, 15, 722–733. [Google Scholar] [CrossRef]
  11. Segura, V.; Vilhjálmsson, B.J.; Platt, A.; Korte, A.; Seren, Ü.; Long, Q.; Nordborg, M. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat. Genet. 2012, 44, 825–830. [Google Scholar] [CrossRef]
  12. Würschum, T.; Kraft, T. Evaluation of multi-locus models for genome-wide association studies: A case study in sugar beet. Heredity 2014, 114, 281–290. [Google Scholar] [CrossRef]
  13. Müller, B.U.; Stich, B.; Piepho, H.P. A general method for controlling the genome-wide type I error rate in linkage and association mapping experiments in plants. Heredity 2010, 106, 825–831. [Google Scholar] [CrossRef]
  14. Mutshinda, C.M.; Sillanpää, M.J. Swift block-updating EM and pseudo-EM procedures for Bayesian shrinkage analysis of quantitative trait loci. Theor. Appl. Genet. 2012, 125, 1575–1587. [Google Scholar] [CrossRef]
  15. Wang, S.B.; Feng, J.Y.; Ren, W.L.; Huang, B.; Zhou, L.; Wen, Y.J.; Zhang, J.; Dunwell, J.M.; Xu, S.; Zhang, Y.M. Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci. Rep. 2016, 6, 19444. [Google Scholar] [CrossRef] [PubMed]
  16. Mathew, B.; Léon, J.; Sannemann, W.; Sillanpää, M.J. Detection of Epistasis for Flowering Time Using Bayesian Multilocus Estimation in a Barley MAGIC Population. Genetics 2018, 208, 525–536. [Google Scholar] [CrossRef] [PubMed]
  17. Kärkkäinen, H.P.; Li, Z.; Sillanpää, M.J. An Efficient Genome-Wide Multi locus Epistasis Search. Genetics 2015, 201, 865–870. [Google Scholar] [CrossRef]
  18. Li, Z.; Sillanpää, M.J. Estimation of Quantitative Trait Locus Effects with Epistasis by Variation Bayes Algorithms. Genetics 2012, 190, 231–249. [Google Scholar] [CrossRef]
  19. Svishcheva, G.R.; Axenovich, T.I.; Belonogova, N.M.; van Duijn, C.M.; Aulchenko, Y.S. Rapid variance components-based method for whole-genome association analysis. Nat. Genet. 2012, 44, 1166–1170. [Google Scholar] [PubMed]
  20. Banerjee, S.; Zeng, L.; Schunkert, H.; Söding, J. Bayesian multiple logistic regression for case-control GWAS. PLoS Genet. 2018, 14, e1007856. [Google Scholar] [CrossRef]
  21. Chen, C.; Steibel, J.P.; Tempelman, R.J. Genome Wide Association Analyses Based on Broadly Different Specifications for Prior Distributions, Genomic Windows, and Estimation Methods. Genetics 2017, 206, 1791–1806. [Google Scholar] [CrossRef]
  22. He, L.; Kulminski, A.M. Fast Algorithms for Conducting Large-Scale GWAS of Age-at-Onset Traits Using Cox Mixed-Effects Models. Genetics 2020, 215, 41–58. [Google Scholar] [CrossRef] [PubMed]
  23. Wakefield, J. Bayes factors for genome-wide association studies: Comparison with p-values. Genet. Epidemiol. 2009, 33, 79–86. [Google Scholar] [CrossRef]
  24. Lee, Y.; Luca, F.; Pique-Regi, R.; Wen, X. Bayesian Multi-SNP Genetic Association Analysis: Control of FDR and Use of Summary Statistics. bioRxiv 2018. [Google Scholar] [CrossRef]
  25. McDaid, A.F.; Joshi, P.K.; Porcu, E.; Komljenovic, A.; Li, H.; Sorrentino, V.; Litovchenko, M.; Bevers, R.P.; Rüeger, S.; Reymond, A.; et al. Bayesian association scan reveals loci associated with human lifespan and linked biomarkers. Nat. Commun. 2017, 8, 15842. [Google Scholar] [CrossRef]
  26. Hughey, J.J.; Rhoades, S.D.; Fu, D.Y.; Bastarache, L.; Denny, J.C.; Chen, Q. Cox regression increases power to detect genotype-phenotype associations in genomic studies using the electronic health record. BMC Genom. 2019, 20, 805. [Google Scholar] [CrossRef] [PubMed]
  27. Theodoratou, E.; Farrington, S.M.; Timofeeva, M.; Din, F.V.N.; Svinti, V.; Tenesa, A.; Liu, T.; Lindblom, A.; Gallinger, S.; Campbell, H.; et al. Genome-wide scan of the effect of common nsSNPs on colorectal cancer survival outcome. Br. J. Cancer 2018, 119, 988–993. [Google Scholar] [CrossRef] [PubMed]
  28. Habier, D.; Fernando, R.L.; Kizilkaya, K.; Garrick, D.J. Extension of the Bayesian alphabet for genomic selection. BMC Bioinform. 2011, 12, 186. [Google Scholar] [CrossRef]
  29. Bi, W.; Fritsche, L.G.; Mukherjee, B.; Kim, S.; Lee, S. A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank. Am. J. Hum. Genet. 2020, 107, 222–233. [Google Scholar] [CrossRef] [PubMed]
  30. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 1995, 57, 289–300. [Google Scholar] [CrossRef]
  31. Efron, B.; Tibshirani, R. Empirical Bayes methods and false discovery rates for microarrays. Genet. Epidemiol. 2002, 23, 70–86. [Google Scholar] [CrossRef]
  32. Efron, B. Correlation and Large-Scale Simultaneous Significance Testing. J. Am. Stat. Assoc. 2007, 102, 93–103. [Google Scholar] [CrossRef]
  33. Ouyang, H. Bayesian Approach for Nonlinear Dynamic System and Genome-Wide Association Study. Ph.D. Thesis, North Carolina State University, Raleigh, NC, USA, 2010. [Google Scholar]
  34. Morisawa, J.; Otani, T.; Nishino, J.; Emoto, R.; Takahashi, K.; Matsui, S. Semi-parametric empirical Bayes factor for genome-wide association studies. Eur. J. Hum. Genet. 2021, 29, 800–807. [Google Scholar] [CrossRef]
  35. McGuire, D.; Jiang, Y.; Liu, M.; Weissenkampen, J.D.; Eckert, S.; Yang, L.; Chen, F.; Berg, A.; Vrieze, S.; Jiang, B.; et al. Model-based assessment of replicability for genome-wide association meta-analysis. Nat. Commun. 2021, 12, 1964. [Google Scholar] [CrossRef] [PubMed]
  36. Schmidt, P.; Hartung, J.; Bennewitz, J.; Piepho, H.P. Heritability in Plant Breeding on a Genotype-Difference Basis. Genetics 2019, 212, 991–1008. [Google Scholar] [CrossRef] [PubMed]
  37. Dadshani, S.; Mathew, B.; Ballvora, A.; Mason, A.S.; Léon, J. Detection of breeding signatures in wheat using a linkage disequilibrium-corrected mapping approach. Sci. Rep. 2021, 11, 5527. [Google Scholar] [CrossRef]
  38. Mathew, B.; Léon, J.; and Sillanpää, M.J. A novel linkage-disequilibrium corrected genomic relationship matrix for SNP-heritability estimation and genomic prediction. Heredity 2017, 120, 356–368. [Google Scholar] [CrossRef]
  39. Endelman, J.B. Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome 2011, 4, 250–255. [Google Scholar] [CrossRef]
  40. Covarrubias-Pazaran, G. Software update: Moving the R package sommer to multivariate mixed models for genome-assisted prediction. bioRxiv 2018, 354639. [Google Scholar] [CrossRef]
  41. Xavier, A.; Xu, S.; Muir, W.M.; Rainey, K.M. NAM: Association studies in multiple populations. Bioinformatics 2015, 31, 3862–3864. [Google Scholar] [CrossRef]
  42. de los Campos, G.; Hickey, J.M.; Pong-Wong, R.; Daetwyler, H.D.; Calus, M.P. Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding. Genetics 2013, 193, 327–345. [Google Scholar] [CrossRef]
  43. Rizvi, A.A.; Karaesmen, E.; Morgan, M.; Preus, L.; Wang, J.; Sovic, M.; Hahn, T.; Sucheston-Campbell, L.E. gwasurvivr: An R package for genome wide survival analysis. Bioinformatics 2018, 35, 1968–1970. [Google Scholar] [CrossRef]
Figure 1. (a) Left, GY vector under LN treatment after removing outliers the Bayesian bootstrap p-value = 0.3048, which shows the Shapiro p-value = 0.2462 is following normal distribution, but on the right, the correlation between BLUPs and BLUEs as analog for EBVs is low (r = 0.107). (b) Left, GY vector under HN treatment, mean of three agronomic years 2018, 2019 and 2020, quality control via Shapiro p-value and Bayesian bootstrap p-value, both p-values represent acceptable management of outliers in the vector; right, the correlation between BLUPs and BLUEs as analog for EBVs is high (r = 0.814).
Figure 1. (a) Left, GY vector under LN treatment after removing outliers the Bayesian bootstrap p-value = 0.3048, which shows the Shapiro p-value = 0.2462 is following normal distribution, but on the right, the correlation between BLUPs and BLUEs as analog for EBVs is low (r = 0.107). (b) Left, GY vector under HN treatment, mean of three agronomic years 2018, 2019 and 2020, quality control via Shapiro p-value and Bayesian bootstrap p-value, both p-values represent acceptable management of outliers in the vector; right, the correlation between BLUPs and BLUEs as analog for EBVs is high (r = 0.814).
Ijms 24 14011 g001
Figure 2. (a,b) PCA-explained variance and number of clusters by PCs based on 150 K SNP Chip on 221 bread wheat genotypes, (c) discriminate analysis of principle components with alpha-score based on genetic diversity between clusters, (d) fixation index using 150 K SNP Chips of the genetic variation among and within population.
Figure 2. (a,b) PCA-explained variance and number of clusters by PCs based on 150 K SNP Chip on 221 bread wheat genotypes, (c) discriminate analysis of principle components with alpha-score based on genetic diversity between clusters, (d) fixation index using 150 K SNP Chips of the genetic variation among and within population.
Ijms 24 14011 g002
Figure 3. (a) GWAS model selection based on local FDR approach, in each model: Delta shows maximum likelihood for mean estimation, Sigma is standard error and Proportion of H0 refers to proportion of false rejection hypotheses. All three estimators are important to determine the accuracy and performance of GWAS model for grain yield under low N fertilizer. (b) GWAS model selection based on local FDR approach, in each model: Delta shows maximum likelihood for mean estimation, Sigma is standard error and Proportion of H0 refers proportion of false rejection hypotheses. All three estimators are important to determine the accuracy and performance of GWAS model for grain yield under high N fertilizer.
Figure 3. (a) GWAS model selection based on local FDR approach, in each model: Delta shows maximum likelihood for mean estimation, Sigma is standard error and Proportion of H0 refers to proportion of false rejection hypotheses. All three estimators are important to determine the accuracy and performance of GWAS model for grain yield under low N fertilizer. (b) GWAS model selection based on local FDR approach, in each model: Delta shows maximum likelihood for mean estimation, Sigma is standard error and Proportion of H0 refers proportion of false rejection hypotheses. All three estimators are important to determine the accuracy and performance of GWAS model for grain yield under high N fertilizer.
Ijms 24 14011 g003
Figure 4. GWAS model selection based on Bayesian survival analysis, the minor and major alleles in each SNP were involved in computation based on time to events, which is necessary to attain expected SNP prior distribution. SNP1: AX-158576714, SNP2: AX-109898892, SNP3: AX-110385692, SNP4: AX-109470057 and SNP5: BS00094057_51. (a) Under LN, GY vector in the rrBLUP and mlmm models only major allele survived with mean of S(SNPi) 0.547 and 0.481, but in the NAM model both major and minor alleles during survival analysis remained. (b) Under HN condition, in the mlmm model only the major allele survived with mean of S(SNPi) 0.597, but in the rrBLUP both major and minor alleles and in the NAM all major, minor, heterozygous and missing (NA) alleles during survival analysis remained.
Figure 4. GWAS model selection based on Bayesian survival analysis, the minor and major alleles in each SNP were involved in computation based on time to events, which is necessary to attain expected SNP prior distribution. SNP1: AX-158576714, SNP2: AX-109898892, SNP3: AX-110385692, SNP4: AX-109470057 and SNP5: BS00094057_51. (a) Under LN, GY vector in the rrBLUP and mlmm models only major allele survived with mean of S(SNPi) 0.547 and 0.481, but in the NAM model both major and minor alleles during survival analysis remained. (b) Under HN condition, in the mlmm model only the major allele survived with mean of S(SNPi) 0.597, but in the rrBLUP both major and minor alleles and in the NAM all major, minor, heterozygous and missing (NA) alleles during survival analysis remained.
Ijms 24 14011 g004
Figure 5. (a) Manhattan upon Bayesian whole genome regression GWAS model, the top five significant SNPs include Ax-110385692 in chr. 3A, Ax-158547970 in chr. 3A, Ax-158538619 in chr. 4A, Ax-158522989 in chr. 3A and Ax-109470057 in chr. 3A, were plotted with NAM package. (b) In QQ plot, all top five significant SNPs closely follow the Chi-square association line trend.
Figure 5. (a) Manhattan upon Bayesian whole genome regression GWAS model, the top five significant SNPs include Ax-110385692 in chr. 3A, Ax-158547970 in chr. 3A, Ax-158538619 in chr. 4A, Ax-158522989 in chr. 3A and Ax-109470057 in chr. 3A, were plotted with NAM package. (b) In QQ plot, all top five significant SNPs closely follow the Chi-square association line trend.
Ijms 24 14011 g005
Figure 6. (a) Accuracy of GP models based on Bayesian inference, Z-score was used due to increase mean deviation in genetic value for each genotype. (b) The top five significant SNPs were compared with different Bayesian models. Only the Bayesian survival analysis can identify the differentials within them due to time to events among minor and major alleles within association panel. SNP1: AX-158576714, SNP2: AX-109898892, SNP3: AX-110385692, SNP4: AX-109470057 and SNP5: BS00094057_51.
Figure 6. (a) Accuracy of GP models based on Bayesian inference, Z-score was used due to increase mean deviation in genetic value for each genotype. (b) The top five significant SNPs were compared with different Bayesian models. Only the Bayesian survival analysis can identify the differentials within them due to time to events among minor and major alleles within association panel. SNP1: AX-158576714, SNP2: AX-109898892, SNP3: AX-110385692, SNP4: AX-109470057 and SNP5: BS00094057_51.
Ijms 24 14011 g006
Table 1. GWAS results using rrBLUP, mlmm, Sommer and NAM models for GY vector of 221 wheat genotypes under low N treatment.
Table 1. GWAS results using rrBLUP, mlmm, Sommer and NAM models for GY vector of 221 wheat genotypes under low N treatment.
GWAS ModelN LevelSNPChr aPos bMaj. Allele cMAF d−log(p-Value) eML
( l o c F D R z i ) f
P
( b s F D R λ S N P ) g
rrBLUPLNAX-1585966442A136,652,249NA---4.4564076310.5020.195
LNAX-1585683027D55,688,624NA---4.2565073840.179
LNAX-1098659276D160,078,273AA0.1763.6691683140.235
LNAX-1585767143A69,099,202CC0.1813.2888850.237
LNAX-1098988923A73,588,989CC0.1273.1287320.261
mlmmLNAX-1089156134A79,319,176NA---3.1896610.5410.112
LNAX-1106005226A111,630,421CC0.0643.0998510.178
LNAX-1585442057D24,335,511CC0.0642.9001640.181
LNAX-1586123233A3,508,832NA---2.7788970.233
LNAX-1092998945A97,047,183TT0.2882.6839410.163
SommerLNAX-1585442056A24,335,511CC0.0645.30268110.5570.188
LNAX-1087411006B67,958,573NA---5.16788020.175
LNAX-1089156134A79,319,176NA---4.60117480.110
LNAX-1092998945A97,047,183CC0.1083.67398100.122
LNAX-1106005226A121,657,421NA---3.54612840.135
NAMLNAX-1103856923A47,067,572TT0.1153.5418400.5890.288
LNAX-1585479703A46,623,733AA0.1123.1742080.276
LNAX-1586089422A64,279,945GG0.2242.1103740.234
LNAX-861786232A248,384,372TT0.2132.0285630.260
LNwsnp_Ex_c351_6894157B706,808,922CC0.1152.0165660.251
a Chr: Chromosome, b Pos: Position (bp), c Maj. Allele: Major Allele, d MAF: Minor Allele Frequency, e −log(p Value) for five top significant SNPs received from different GWAS models, f ML l o c F D R z i : maximum likelihood of five significant SNPs based on local FDR threshold in the GWAS model, g P ( b s F D R λ S N P ): posterior probability of SNP when alternative hypothesis is true.
Table 2. GWAS results using rrBLUP, mlmm, Sommer and NAM models for GY vector of 221 wheat genotypes under high N treatment.
Table 2. GWAS results using rrBLUP, mlmm, Sommer and NAM models for GY vector of 221 wheat genotypes under high N treatment.
GWAS ModelN LevelSNPChr aPos bMaj. Allele cMAF d−log (p-Value) eML
( l o c F D R z i )  f
P
( b s F D R λ S N P ) g
rrBLUPHNAX-1585767143A69,099,202CC0.1813.2888850.5610.389
HNAX-1098988923A73,588,989CC0.1273.1287320.391
HNAX-1115632001B15,596TT0.1122.7644300.269
HNAX-1100389795B13,056NA---2.1733670.266
HNAX-1088529225B14,184NA---2.0392610.266
mlmmHNAX-1098988923A73,588,989CC0.1273.2888850.4540.372
HNAX-1585442057D24,335,511CC0.0642.9001640.284
HNAX-1586123233A3,508,832NA---2.7788970.357
HNAX-1089156134A79,319,176NA---2.1532610.251
HNAX-1092998945A97,047,183TT0.2882.0739410.272
SommerHNAX-1089156134A79,319,176NA---4.60117480.6610.248
HNAX-1092998945A97,047,183CC0.1083.67398100.223
HNAX-1585442056A24,335,511CC0.0643.2114530.217
HNAX-1585767143A69,099,202CC0.1813.2888850.281
HNIACX25405A619,684,824CC0.2132.782840.272
NAMHNAX-1585767143A69,099,202CC0.1813.2888850.5580.415
HNAX-1098988923A73,588,989CC0.1273.28459150.427
HNAX-1103856923A47,067,572TT0.1153.18522910.414
HNAX-1094700573A61,095,990CC0.1622.59538620.427
HNBS00094057_513A7,437,103TT0.0631.9380220.412
a Chr: Chromosome, b Pos: Position (bp), c Maj. Allele: Major Allele, d MAF: Minor Allele Frequency, e −log(p Value) for top five significant SNPs received from different GWAS models, f ML l o c F D R z i : maximum likelihood of five significant SNPs based on local FDR threshold in the GWAS model, g P ( b s F D R λ S N P ): posterior probability of SNP when alternative hypothesis is true.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sadeqi, M.B.; Ballvora, A.; Léon, J. Local and Bayesian Survival FDR Estimations to Identify Reliable Associations in Whole Genome of Bread Wheat. Int. J. Mol. Sci. 2023, 24, 14011. https://doi.org/10.3390/ijms241814011

AMA Style

Sadeqi MB, Ballvora A, Léon J. Local and Bayesian Survival FDR Estimations to Identify Reliable Associations in Whole Genome of Bread Wheat. International Journal of Molecular Sciences. 2023; 24(18):14011. https://doi.org/10.3390/ijms241814011

Chicago/Turabian Style

Sadeqi, Mohammad Bahman, Agim Ballvora, and Jens Léon. 2023. "Local and Bayesian Survival FDR Estimations to Identify Reliable Associations in Whole Genome of Bread Wheat" International Journal of Molecular Sciences 24, no. 18: 14011. https://doi.org/10.3390/ijms241814011

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop