Next Article in Journal
Important Role and Properties of Granular Nanocellulose Particles in an In Vitro Simulated Gastrointestinal System and in Lipid Digestibility and Permeability
Previous Article in Journal
REST Is Not Resting: REST/NRSF in Health and Disease
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Communication

Comparison of the Accuracy of Epistasis and Haplotype Models for Genomic Prediction of Seven Human Phenotypes

Department of Animal Science, University of Minnesota, Saint Paul, MN 55108, USA
*
Author to whom correspondence should be addressed.
Biomolecules 2023, 13(10), 1478; https://doi.org/10.3390/biom13101478
Submission received: 7 August 2023 / Revised: 26 September 2023 / Accepted: 28 September 2023 / Published: 3 October 2023
(This article belongs to the Section Bioinformatics and Systems Biology)

Abstract

:
The accuracy of predicting seven human phenotypes of 3657–7564 individuals using global epistasis effects was evaluated and compared to the accuracy of haplotype genomic prediction using 380,705 SNPs and 10-fold cross-validation studies. The seven human phenotypes were the normality transformed high density lipoproteins (HDL), low density lipoproteins (LDL), total cholesterol (TC), triglycerides (TG), weight (WT), and the original phenotypic observations of height (HTo) and body mass index (BMIo). Fourth-order epistasis effects virtually had no contribution to the phenotypic variances, and third-order epistasis effects did not affect the prediction accuracy. Without haplotype effects in the prediction model, pairwise epistasis effects improved the prediction accuracy over the SNP models for six traits, with accuracy increases of 2.41%, 3.85%, 0.70%, 0.97%, 0.62% and 0.93% for HDL, LDL, TC, HTo, WT and BMIo respectively. However, none of the epistasis models had higher prediction accuracy than the haplotype models we previously reported. The epistasis model for TG decreased the prediction accuracy by 2.35% relative to the accuracy of the SNP model. The integrated models with epistasis and haplotype effects had slightly higher prediction accuracy than the haplotype models for two traits, HDL and BMIo. These two traits were the only traits where additive × dominance effects increased the prediction accuracy. These results indicated that haplotype effects containing local high-order epistasis effects had a tendency to be more important than global pairwise epistasis effects for the seven human phenotypes, and that the genetic mechanism of HDL and BMIo was more complex than that of the other traits.

1. Introduction

Epistasis effects are gene interaction effects and global epistasis effects of single nucleotide polymorphism (SNP) markers are interaction effects of genome-wide SNPs. Several studies showed that global epistasis effects increased the accuracy of genomic prediction [1,2,3,4]. Haplotype effects may contain local high-order epistasis effects beyond pairwise epistasis effects and improved the prediction accuracy of genomic prediction in some cases [5,6,7,8,9,10]. For the Framingham Heart Study (FHS) data in this study, we previously reported that haplotypes improved the accuracy of predicting the phenotypic values of seven human phenotypes, with accuracy increases of 1.86–8.12% due to the use of haplotypes relative to the best SNP models, and the haplotype epistasis heritability showed that the epistasis effects contained in the haplotypes were responsible for the accuracy increases due to haplotypes [8]. The epistasis effects contained in the haplotypes were local high-order epistasis effects beyond pairwise epistasis effects because the average number of SNPs in haplotypes for the best haplotype prediction model was mostly more than 12 SNPs per haplotype block for the seven human phenotypes, although the exact number of SNPs involved in those high-order epistasis effects were unknown. Local high-order epistasis effects have the limitation of covering proximal SNPs only in a chromosome region or gene and cannot study interactions between distal variants on the same chromosome or variants on different chromosomes. In contrast, global epistasis effects can be used to study interaction effects between genetic variants anywhere on the genome, and potentially could utilize epistasis effects not covered by haplotypes for improving the accuracy of predicting phenotypic values. Similarly, local high-order epistasis effects could cover epistasis effects that are not feasible to study using the approach for global epistasis effects, as shown by the results of third- and fourth-order epistasis effects in this study. However, the comparison between local and global epistasis effects for their contributions to the phenotypic variance and the accuracy of genomic prediction was unavailable. We hypothesized that local epistasis can be more important than global epistasis for some traits, the reverse is true for some other traits, and some traits may involve both local and global epistasis effects. To test those hypothesis, and to discover and utilize the complex genetic mechanisms associated with those hypothesis, we developed a multifactorial method that integrates haplotype effects that contain local epistasis effects with global epistasis effects for genomic estimation and prediction [11] and implemented this method by the EPIHAP computer program [2]. In this study, we investigated the relationship between the accuracy of genomic prediction of seven human phenotypes and global low-order epistasis effects as well as local high-order epistasis effects contained in haplotypes to answer two questions: whether global epistasis effects or haplotype effects were more important for predicting the seven human phenotypes, and whether the integration of global epistasis and haplotype effects would improve the accuracy of genomic prediction for any of the seven phenotypes using the multifactorial method implemented in the EPIHAP program.

2. Materials and Methods

2.1. Phenotypic and SNP Data

The Framingham Heart Study (FHS) data (2019 version) had 7565 individuals with genotypes of the 500 K SNP panel that had 488,146 autosomal SNPs. The seven phenotypes analyzed in this study were the normality transformed high density lipoproteins (HDL), low density lipoproteins (LDL), total cholesterol (TC), triglycerides (TG), weight (WT) using the Box-Cox transformation; and the original phenotypic observations without normality transformation of height (HTo) and body mass index (BMIo), with 3657–7564 observations, as previously described [8]. The SNP coordinates were converted to those of GRCh38.p13, and 486,356 SNPs had known chromosome positions on GRCh38.p13. From these 486,356 autosome SNPs, we previously studied eight SNP sets including the 380 K SNP set with 380,705 SNPs requiring minor allele frequency of 0.05, and this 380 SNP set was used in this study for epistasis and haplotype analysis. The haplotype blocking method for each trait was that with the highest prediction accuracy among over 140 haplotype models using structural and function genomic information in our previous study [8], i.e., the gene-based blocks for HDL, 12 SNPs per block for LDL, the 50 Kb blocks for TG, the 200 Kb blocks for HTo, 12 SNPs per block for WT, and the 100 Kb blocks for BMIo.

2.2. Mixed Model for GBLUP and GREML

Genomic best linear unbiased prediction (GBLUP) of genetic values was used for calculating the accuracy of predicting the phenotypic values, and genomic restricted maximum likelihood estimation (GREML) was used for estimating the heritability of each type of genetic effects using a multifactorial mixed model. The effect types evaluated by GBLUP and GREML included SNP additive and dominance effects, haplotype additive effects, and global epistasis effects up to the fourth-order. Details of the multifactorial model and the GBLUP and GREML based on this model are available in our methodology article [11]. The multifactorial mixed model can be briefly described as:
y = X b + Z g + e = X b + Z i = 1 f u i + e
V = Z G Z +   σ e 2 I N = Z ( i = 1 f G i ) Z +   σ e 2 I N = Z ( i = 1 f σ i 2 S i ) Z +   σ e 2 I N
G = Var ( i = 1 f u i ) = i = 1 f G i = i = 1 f σ i 2 S i
where y = N × 1 column vector of phenotypic observations, Z = N × n incidence matrix allocating phenotypic observations to each individual = identity matrix for one observation per individual (N = n), N = number of observations, n = number of individuals, b = c × 1 column vector of fixed effects, c = number of fixed effects, X = N × c model matrix of b, e = N × 1 column vector of random residuals, σ e 2 = residual variance, u i = n × 1 column vector of the genetic effects of the i th effect type, σ i 2 = variance of the genetic effects of the i th effect type, S i = n × n genomic relationship matrix of the i th effect type, and f = 15 in this study. Subscript i = 1 14 in Equations (1) and (2) represent SNP and epistasis effects (Table 1), and subscript i = 15 represents haplotype additive effect. Fixed effects included sex and cholesterol treatment as classification variables, and age, glucose and BMIo as covariables for HDL, TC, LDL and TG; and sex as classification variable and age as covariable for HTo, WT and BMIo. The calculations of the genomic epistasis relationship matrices ( S i ) can use either the approximate method that is the genomic version of Henderson’s Hadamard products between additive and dominance relationship matrices [12,13,14,15] or the exact method that removes intra-locus epistasis that should not exist from the approximate method [16]. The EPIHAP package implemented both methods, and this study used the exact method although the approximate and exact methods had similar results for a swine dataset [11] and the same results for a Holstein dataset [2].

2.3. Evaluation of Prediction Accuracy Using Cross-Validation

A 10-fold validation study was used to evaluate the accuracy of predicting the phenotypic values of each trait for each model. Individuals with phenotypic observations were randomly divided into 10 validation populations. Each validation poluation was predicted by the training population that consists of all the remaining nine validation populations. The first nine validation populations had equal sample size each and the tenth population had the remaining individuals. In each validation population, phenotypic values were omitted when calculating GBLUP for training and validation individuals. The observed accuracy of predicting the phenotypic values (predictive ability, [17]) was defined as the correlation between GBLUP of the genotypic values and the phenotypic values in each validation population and then averaged over all validation populations, i.e.,:
R ^ 0 p = corr ( g ^ 0 ,   y 0 ) = [ k = 1 10 corr ( g ^ 0 k ,   y 0 k ) ] / 10
where g ^ 0 = GBLUP of g 0 ; g 0 = unobservable genetic values; y 0 = phenotypic observations; subscript ‘0’ denotes validation population; ‘corr’ stands for correlation.

2.4. Initial Selection of Epistasis Effects for Prediction Models

The purpose of the initial selection of epistasis models was to exclude effect types with little or no contribution to the phenotypic variance from further evaluation for prediction accuracy. The initial model selection required each effect type to have a heritability estimate greater than 1% for the effect type to be included in the prediction model for further evaluation.

3. Results and Discussion

3.1. Initial Epistasis Models for Predicting Phenotypic Values

Heritability estimates using the model with SNP and epistasis effects up to the fourth-order showed that the fourth-order virtually had no contribution to the phenotypic variances of the seven phenotypes (Table 1). Among the 35 heritability estimates of the fourth-order epistasis effects for seven traits, only six were nonzero in the range of 0.001–0.004, with five nonzero estimates for WT and one for TC. Since none of the heritability estimates of the fourth-order epistasis effects reached the 0.01 threshold we required, the fourth-order epistasis effects were excluded from the prediction model for any trait. For four traits (LDL, TC, TG and HTo), A × A was the only epistasis effect with heritability greater than the 0.01 threshold, ranging from 0.135 for TG to 0.423 for LDL. Therefore, the prediction model for these four traits was A + D + AA. HDL and BMIo had A × D heritability estimates but no third-order epistasis heritability estimates exceeded the 0.01 threshold value, and the initial prediction model for these two traits were the A + D + AA + AD models. For the third-order epistasis effects, only WT had heritability estimates greater than the 0.01 threshold, 0.017 for A × A × A and 0.014 for A × A × D, and the initial prediction model was A + D + AA + AD + DD + AAA + AAD (Table 1).
Table 1. Heritability estimates of the full model with SNP and epistasis effects up to the fourth-order.
Table 1. Heritability estimates of the full model with SNP and epistasis effects up to the fourth-order.
Subscript in Equations (1) and (2)HDLLDLTCTGHToWTBMIo
A10.2410.2840.3310.2210.6480.4240.320
D20.0550.0340.0850.0790.1340.0710.012
A × A30.3560.4230.1450.1350.1920.1060.219
A × D40.1110.0000.0030.0000.0000.0570.126
D × D50.0010.0000.0010.0000.0000.0150.008
A × A × A60.0020.0000.0040.0000.0000.0170.000
A × A × D70.0020.0000.0010.0000.0000.0140.000
A × D × D80.0000.0000.0010.0000.0000.0080.000
D × D × D90.0000.0000.0000.0000.0000.0040.000
A × A × A × A100.0000.0000.0010.0000.0000.0040.000
A × A × A × D110.0000.0000.0000.0000.0000.0030.000
A × A × D × D120.0000.0000.0000.0000.0000.0020.000
A × D × D × D130.0000.0000.0000.0000.0000.0010.000
D × D × D × D140.0000.0000.0000.0000.0000.0010.000
Total heritability 0.7680.7420.5730.4370.9740.7280.686
Initial model A + D + AA + ADA + D + AAA + D + AAA + D + AAA + D + AAA + D + AA
+ AD + DD
+ AAA + AAD
A + D +
AA + AD
A is additive effect. D is dominance effect. A × A, A × D and D × D are second-order (pairwise) epistasis effects. A × A × A, A × A × D, A × D × D and D × D × D are third-order epistasis effects. A × A × A × A, A × A × A × D, A × A × D × D, A × D × D × D and D × D × D × D are fourth-order epistasis effects. Entries in italic are heritability estimates greater than 0.01 for the initial prediction models. HDL is the normality transformed high density lipoproteins. LDL is the normality transformed low density lipoproteins. TC is the normality transformed total cholesterol. TG is the normality transformed triglycerides. WT is the normality transformed weight (WT). HTo is the original phenotypic observations without normality transformation of height, BMIo is the original phenotypic observations without normality transformation of body mass index.

3.2. Accuracy of Epistasis Models for Predicting Phenotypic Values

The comparison of seven models with epistasis effects for WT showed the model with A × A × A and A × A × D had the same observed prediction accuracy as the five models with pairwise epistasis effects with observed prediction accuracy of 0.325, a 0.62% increase over the A + D SNP model (Table S1). These results indicated that A × A × A and A × A × D effects had no contribution to the observed accuracy of predicting WT. Among the five models with pairwise epistasis effects for WT, the A + AD and A + AD + DD models had the same prediction accuracy of 0.325, indicating D × D had no contribution to the prediction accuracy of WT. Therefore, the A + D + AD model was the best prediction model for WT with the smallest number of effect types. For BMIo, the A + AD and A + AD + DD models had the highest observed prediction accuracy of 0.325, indicating that D × D effects had the same contribution to the observed prediction accuracy of BMIo as the A × D (Table S2). Therefore, either A × D or D × D could be included in the model but including both were unnecessary, and we used the A + AD mode for BMIo. For HDL, the A + D + AA + AD model had the highest prediction accuracy of 0.297 (Table S3) and was the best epistasis model. Among the seven traits, HDL, WT and BMIo were the only three traits that had A × D effects in the final prediction models, and WT and BMIo were the only two traits that did not have A × A effects in the prediction models. These prediction models with A×D effects indicated that the genetic mechanism of HDL, WT and BMIo likely was more complex than that of LDL, TC and HTo.
The final epistasis models with the highest prediction accuracy and smallest number of effect types from 10-fold cross-validations for the seven traits are summarized in Table 2 and Figure 1. Of these seven traits, epistasis effects achieved accuracy increases over the SNP model for predicting the phenotypic values for six of the seven traits. Among these seven traits, LDL and HDL had the largest accuracy increases of 3.85% and 2.41% respectively, and four traits had minor accuracy increases for four traits, TC, HTo, WT and BMIo, with accuracy increases of 0.70%, 0.97%, 0.62% and 0.93% respectively. However, epistasis effects (A × A) decreased the prediction accuracy for TG (−2.35%) for unknown reasons. Compared to the observed prediction accuracy of the best haplotype models, none of the epistasis models reached the prediction accuracy of the best haplotype models (Table 2, Figure 1), showing that epistasis effects were not as accurate as haplotype effects for predicting the seven human phenotypes, or, local high-order epistasis effects contained in haplotypes were more important than global low-order epistasis effects. This conclusion was further confirmed by the results of the integrated prediction models with both epistasis and haplotype effects.

3.3. Accuracy and Heritability Estimates of Integrated Models with Epistasis and Haplotype Effects

The integrated model for each trait includes the haplotypes of the best haplotype blocking methods identified from our previous study [8] and the best epistasis model identified in this study. The results showed that the integration of epistasis and haplotype effects slightly increased the prediction accuracy over the best haplotype model for two traits, HDL (3.10% accuracy increase of the integrated model versus 2.76% accuracy increase of the best haplotype model), and BMIo (2.48% versus 2.17%), but did not improve the prediction accuracy of the haplotype models for five traits, LDL (7.69% versus 8.12%), TC (3.15% versus 3.15%), TG (0% versus 3.29%), HTo (2.18% versus 2.18%), and WT (1.86% versus 1.86%) (Table 2). Haplotype additive effects almost accounted for all the SNP additive heritability for three traits, with additive heritability reduced to 0.01 from 0.286 for LDL, from 0.327 for TC and from 0.460 for WT, whereas the haplotype additive heritability was 0.517 for LDL, 0.403 for TC, and 0.427 for WT (Table 3, or Table S4 with standard deviations). Haplotype additive effects also accounted for most of the A × A heritability for the five traits in the integrated model, with A × A heritability reduced to 0.042, 0.134, 0.017, 0.082 and 0.001 from 0.358, 0.429. 0.162, 0.171 and 0.183 for HDL, LDL, TC, TG and HTo, respectively (Table 3). These results of integrated models further confirmed that local high-order epistasis effects in haplotypes were more important than global pairwise epistasis effects for all seven human phenotypes in this study.

4. Conclusions

Heritability estimates and the accuracy of genomic prediction of seven human phenotypes showed that global fourth-order epistasis effects virtually had no contribution to the phenotypic variances, and global third-order epistasis effects did not affect the prediction accuracy. Global low-order (pairwise) epistasis effects improved the accuracy of predicting six of the seven phenotypes but not as much as the accuracy of haplotype effects. The integrated models with global low-order (pairwise) epistasis effect and haplotype effects had slightly higher prediction accuracy than the haplotype models for two traits whereas the haplotype models had the better prediction accuracy for the remaining five traits. These results indicated that local high-order epistasis effects in haplotypes had a tendency to be more important than global pairwise epistasis effects for the seven human phenotypes.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom13101478/s1, Table S1: Prediction accuracy of alternative models for weight (WT) from 10-fold validations; Table S2: Prediction accuracy of alternative models for body mass index (BMIo) from 10-fold validations; Table S3: Prediction accuracy of alternative models for high density lipoproteins (HDL) from 10-fold validations; Table S4: Genomic heritability estimates with standard deviations from 10-fold validations.

Author Contributions

Conceptualization, YD; Methodology: Z.L., D.P. and Y.D.; software, Z.L. and D.P.; validation, Z.L.; formal analysis, Z.L.; writing—original draft preparation, Y.D. and Z.L..; writing—review and editing, Y.D., Z.L. and D.P.; visualization, Z.L. and Y.D.; supervision, Y.D.; project administration, Y.D.; funding acquisition, Y.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Institutes of Health’s National Human Genome Research Institute, grant R01HG012425 as part of the NSF/NIH Enabling Discovery through GEnomics (EDGE) Program; grant 2020-67015-31133 from the USDA National Institute of Food and Agriculture; and project MIN-16-124 of the Agricultural Experiment Station at the University of Minnesota. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement

The use of the Framingham Heart Study data in this study was approved by the Institutional Review Board (IRB ID 1305M32563).

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this study are available from dbGaP, and data access requires approval from the National Heart, Lung, and Blood Institute (NHLBI).

Acknowledgments

The Framingham Heart Study is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University. This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI. Supercomputer computing time and storage for data analysis were provided by the Minnesota Supercomputer Institute at the University of Minnesota.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Derbyshire, M.C.; Khentry, Y.; Severn-Ellis, A.; Mwape, V.; Saad, N.S.M.; Newman, T.E.; Taiwo, A.; Regmi, R.; Buchwaldt, L.; Denton-Giles, M. Modeling first order additive× additive epistasis improves accuracy of genomic prediction for sclerotinia stem rot resistance in canola. Plant Genome 2021, 14, e20088. [Google Scholar] [CrossRef] [PubMed]
  2. Liang, Z.; Prakapenka, D.; Parker Gaddis, K.L.; VandeHaar, M.J.; Weigel, K.A.; Tempelman, R.J.; Koltes, J.E.; Santos, J.E.P.; White, H.M.; Peñagaricano, F. Impact of epistasis effects on the accuracy of predicting phenotypic values of residual feed intake in U. S Holstein cows. Front. Genet. 2022, 13, 1017490. [Google Scholar] [CrossRef] [PubMed]
  3. Vojgani, E.; Pook, T.; Martini, J.W.; Hölker, A.C.; Mayer, M.; Schön, C.-C.; Simianer, H. Accounting for epistasis improves genomic prediction of phenotypes with univariate and bivariate models across environments. Theor. Appl. Genet. 2021, 134, 2913–2930. [Google Scholar] [CrossRef] [PubMed]
  4. Calleja-Rodriguez, A.; Chen, Z.; Suontama, M.; Pan, J.; Wu, H.X. Genomic predictions with nonadditive effects improved estimates of additive effects and predictions of total genetic values in Pinus sylvestris. Front. Plant Sci. 2021, 12, 666820. [Google Scholar] [CrossRef] [PubMed]
  5. Akdemir, D.; Jannink, J.-L. Locally epistatic genomic relationship matrices for genomic association and prediction. Genetics 2015, 199, 857–871. [Google Scholar] [CrossRef] [PubMed]
  6. Won, S.; Park, J.-E.; Son, J.-H.; Lee, S.-H.; Park, B.H.; Park, M.; Park, W.-C.; Chai, H.-H.; Kim, H.; Lee, J. Genomic Prediction Accuracy Using Haplotypes Defined by Size and Hierarchical Clustering Based on Linkage Disequilibrium. Front. Genet. 2020, 11, 134. [Google Scholar] [CrossRef] [PubMed]
  7. Gao, N.; Martini, J.W.; Zhang, Z.; Yuan, X.; Zhang, H.; Simianer, H.; Li, J. Incorporating gene annotation into genomic prediction of complex phenotypes. Genetics 2017, 207, 489–501. [Google Scholar] [CrossRef] [PubMed]
  8. Liang, Z.; Tan, C.; Prakapenka, D.; Ma, L.; Da, Y. Haplotype analysis of genomic prediction using structural and functional genomic information for seven human phenotypes. Front. Genet. 2020, 11, 588907. [Google Scholar] [CrossRef] [PubMed]
  9. Difabachew, Y.F.; Frisch, M.; Langstroff, A.L.; Stahl, A.; Wittkop, B.; Snowdon, R.J.; Koch, M.; Kirchhoff, M.; Cselényi, L.; Wolf, M. Genomic prediction with haplotype blocks in wheat. Front. Plant Sci. 2023, 14, 1168547. [Google Scholar] [CrossRef] [PubMed]
  10. Alemu, A.; Batista, L.; Singh, P.K.; Ceplitis, A.; Chawade, A. Haplotype-tagged SNPs improve genomic prediction accuracy for Fusarium head blight resistance and yield-related traits in wheat. Theor. Appl. Genet. 2023, 136, 92. [Google Scholar] [CrossRef] [PubMed]
  11. Da, Y.; Liang, Z.; Prakapenka, D. Multifactorial methods integrating haplotype and epistasis effects for genomic estimation and prediction of quantitative traits. Front. Genet. 2022, 13, 922369. [Google Scholar] [CrossRef] [PubMed]
  12. Henderson, C. Best linear unbiased prediction of nonadditive genetic merits in noninbred populations. J. Anim. Sci. 1985, 60, 111–117. [Google Scholar] [CrossRef]
  13. Su, G.; Christensen, O.F.; Ostersen, T.; Henryon, M.; Lund, M.S. Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. PLoS ONE 2012, 7, e45293. [Google Scholar] [CrossRef] [PubMed]
  14. Muñoz, P.R.; Resende, M.F.; Gezan, S.A.; Resende, M.D.V.; de los Campos, G.; Kirst, M.; Huber, D.; Peter, G.F. Unraveling additive from nonadditive effects using genomic relationship matrices. Genetics 2014, 198, 1759–1768. [Google Scholar] [CrossRef] [PubMed]
  15. Vitezica, Z.G.; Legarra, A.; Toro, M.A.; Varona, L. Orthogonal estimates of variances for additive, dominance, and epistatic effects in populations. Genetics 2017, 206, 1297–1307. [Google Scholar] [CrossRef] [PubMed]
  16. Jiang, Y.; Reif, J.C. Efficient algorithms for calculating epistatic genomic relationship matrices. Genetics 2020, 216, 651–669. [Google Scholar] [CrossRef] [PubMed]
  17. Legarra, A.; Robert-Granié, C.; Manfredi, E.; Elsen, J.-M. Performance of genomic selection in mice. Genetics 2008, 180, 611–618. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Accuracy of four prediction models for predicting phenotypic values of seven human phenotypes from 10-fold cross-validation studies. The accuracy measure was the correlation between the GBLUP genetic values and the phenotypic values in each validation population and then averaged over all validation populations (Equation (4)). Each error bar was one standard deviation of the prediction accuracies above and below each average.
Figure 1. Accuracy of four prediction models for predicting phenotypic values of seven human phenotypes from 10-fold cross-validation studies. The accuracy measure was the correlation between the GBLUP genetic values and the phenotypic values in each validation population and then averaged over all validation populations (Equation (4)). Each error bar was one standard deviation of the prediction accuracies above and below each average.
Biomolecules 13 01478 g001
Table 2. Accuracy of different models for predicting the phenotypic values in the validation populations.
Table 2. Accuracy of different models for predicting the phenotypic values in the validation populations.
HDLLDLTCTGHTOWTBMIO
SNP model (A + D)
Prediction accuracy0.2900.2340.286 0.2130.4130.3230.322
Global epistasis model
Prediction modelA + D + AA + ADA + D + AAA + D + AAA + D + AAA + D + AAA + D + ADA + AD
Prediction accuracy0.2970.2430.2880.208 0.4170.3250.325
Accuracy increase over SNP model (%)2.413.850.70−2.350.970.620.93
Haplotype model [8]
Prediction modelA + D + HHD + HHA + D + HA + D + HA + H
Haplotype blocking methodGenes12 SNPs50 Kb50 Kb200 Kb12 SNPs100 Kb
Prediction accuracy0.2980.2530.2950.2200.4220.3290.329
Accuracy increase over SNP model (%)2.768.123.153.292.181.862.17
Integrated model with epistasis and haplotype effects
Prediction modelA + D + AA + AD
+ H
A + D + AA
+ H
A + D + AA
+ H
A + D + AA
+ H
A + D + AA
+ H
A + D + AD
+ H
A + AD
+ H
Prediction accuracy0.2990.2520.2950.2130.4220.3290.330
Accuracy increase over SNP model (%)3.107.693.1502.181.862.48
Prediction accuracy is observed accuracy of predicting phenotypic values (Equation (4)). A is SNP additive values. D is SNP dominance values. H is haplotype additive values. AA is A×A values. AD is A×D values.
Table 3. Genomic heritability estimates as averages from 10-fold validations.
Table 3. Genomic heritability estimates as averages from 10-fold validations.
HDLLDLTCTGHToWTBMIo
SNP model (A + D)
A0.3860.4080.3890.2600.7400.4720.415
D0.1240.1770.1040.1240.2020.0930.046
Total0.5100.5850.4930.3830.9420.5650.462
Global epistasis model
ModelA + D + AA + ADA + D + AAA + D + AAA + D + AAA + D + AAA + D + ADA + AD
A0.2430.2860.3270.2110.6540.4600.398
D0.0560.0370.0870.069 0.1370.064-
A × A0.3580.4290.1620.1710.183--
A × D0.115----0.1880.265
Total0.7710.7520.5760.4510.9750.7120.663
Integrated model with global epistasis effects and haplotype effects
ModelA + D + AA + AD + HA + D + AA + HA + D + AA + HA + D + AA + HA + D + AA + HA + D + AD + HA + AD + H
A0.1020.0100.0100.0720.3610.0100.119
D0.0410.0400.0780.0620.1450.043-
A × A0.0420.1340.0170.0820.001--
A × D0.264----0.1050.135
H0.3320.5170.4030.2060.4940.4270.356
Total0.7800.7020.5380.421 1.00.6870.61
A is SNP additive values. D is SNP dominance values. H is haplotype additive values. AA is A × A values. AD is A × D values.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liang, Z.; Prakapenka, D.; Da, Y. Comparison of the Accuracy of Epistasis and Haplotype Models for Genomic Prediction of Seven Human Phenotypes. Biomolecules 2023, 13, 1478. https://doi.org/10.3390/biom13101478

AMA Style

Liang Z, Prakapenka D, Da Y. Comparison of the Accuracy of Epistasis and Haplotype Models for Genomic Prediction of Seven Human Phenotypes. Biomolecules. 2023; 13(10):1478. https://doi.org/10.3390/biom13101478

Chicago/Turabian Style

Liang, Zuoxiang, Dzianis Prakapenka, and Yang Da. 2023. "Comparison of the Accuracy of Epistasis and Haplotype Models for Genomic Prediction of Seven Human Phenotypes" Biomolecules 13, no. 10: 1478. https://doi.org/10.3390/biom13101478

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop