Article

GWAS Enhances Genomic Prediction Accuracy of Caviar Yield, Caviar Color and Body Weight Traits in Sturgeons Using Whole-Genome Sequencing Data

1 Fisheries Science Institute, Beijing Academy of Agriculture and Forestry Sciences & Beijing Key Laboratory of Fisheries Biotechnology, Beijing 100068, China
2 Key Laboratory of Sturgeon Genetics and Breeding, Ministry of Agriculture and Rural Affairs, Hangzhou 311799, China
3 National Innovation Center for Digital Seed Industry, Beijing 100097, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Int. J. Mol. Sci. 2024, 25(17), 9756; https://doi.org/10.3390/ijms25179756
Submission received: 3 August 2024 / Revised: 5 September 2024 / Accepted: 7 September 2024 / Published: 9 September 2024

Abstract
Caviar yield, caviar color, and body weight are crucial economic traits in sturgeon breeding. Understanding the molecular mechanisms behind these traits is essential for their genetic improvement. In this study, we performed whole-genome sequencing on 673 Russian sturgeons, renowned for their high-quality caviar. With an average sequencing depth of 13.69×, we obtained approximately 10.41 million high-quality single nucleotide polymorphisms (SNPs). Using a genome-wide association study (GWAS) with a single-marker regression model, we identified SNPs and genes associated with these traits. Our findings revealed several candidate genes for each trait: caviar yield: TFAP2A, RPS6KA3, CRB3, TUBB, H2AFX, morc3, BAG1, RANBP2, PLA2G1B, and NYAP1; caviar color: NFX1, OTULIN, SRFBP1, PLEK, INHBA, and NARS; body weight: ACVR1, HTR4, fmnl2, INSIG2, GPD2, ACVR1C, TANC1, KCNH7, SLC16A13, XKR4, GALR2, RPL39, ACVR2A, ADCY10, and ZEB2. Additionally, using the genomic feature BLUP (GFBLUP) method, which combines linkage disequilibrium (LD) pruning markers with GWAS prior information, we improved genomic prediction accuracy by 2%, 1.9%, and 3.1% for caviar yield, caviar color, and body weight traits, respectively, compared to the GBLUP method. In conclusion, this study enhances our understanding of the genetic mechanisms underlying caviar yield, caviar color, and body weight traits in sturgeons, providing opportunities for genetic improvement of these traits through genomic selection.

1. Introduction

Sturgeons, with 27 species distributed across the Northern Hemisphere, are ancient fish that represent a remarkable evolutionary relic and are often referred to as “living fossils” [1]. All of these species are listed as Appendix II species under the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). Sturgeon breeding programs focus on three key traits: caviar yield, caviar color, and body weight. Caviar yield is directly correlated with caviar and fry production, so enterprises have a strong demand for breeding populations with high caviar yields. Among caviar products, golden caviar is more costly than caviar of other colors, possibly because gold is associated with luxury and quality. Additionally, sturgeon is highly valued for its meat quality, and a fast-growing breeding population can accelerate sturgeon production for the meat market. Therefore, caviar yield, caviar color, and body weight should be the primary breeding objectives for sturgeon in China. Because sturgeon reach sexual maturity late, typically at about 6–8 years, the breeding cycle is prolonged, and traditional pedigree-based methods suffer from lengthy generation intervals and low efficiency. Molecular marker-based breeding, especially genomic selection [2], is therefore an effective approach to expedite genetic progress in sturgeon breeding. However, no population-scale, genome-wide scan for key molecular markers associated with all three traits has been reported to date.
Currently, with the rapid development of whole-genome sequencing technology and the reduction of costs, GWAS has gradually become a mainstream strategy for genetic analysis and identification of important candidate genes related to economic traits in livestock [3], plants [4], and aquatic animals [5]. In aquaculture, GWAS have been used for genetic dissection of meat quality in common carp [6], Atlantic salmon [7] and large yellow croaker [8], growth in catfish [9,10], disease resistance in Atlantic salmon [11] and large yellow croaker [12,13]. However, there have been no reports on GWAS for traits such as caviar yield, caviar color, and body weight in sturgeons.
The concept of genomic selection (GS) was initially proposed by Meuwissen et al. [2] in 2001. This method involves deriving genomic estimated breeding values (GEBV) from high-density markers across the entire genome, premised on the assumption that at least one SNP is in LD with quantitative trait loci (QTLs) affecting the target trait. In recent years, a growing body of research on GS in aquaculture animals has positioned it as a cutting-edge technology in aquaculture breeding, highlighting its potential to expedite breeding cycles and reduce associated costs. Enhancing the accuracy of genomic prediction is a prevalent challenge in GS, and incorporating prior information from GWAS to enhance this accuracy has been reported in rainbow trout [14], dairy cattle [15] and pigs [16]. However, many studies have found that incorporating GWAS prior information does not improve the accuracy of genomic prediction [17,18,19]. Therefore, this study implemented a unique strategy that combines LD-pruned markers and GWAS prior information to improve the accuracy of genomic prediction for caviar yield, caviar color, and body weight traits in sturgeons.

2. Results

2.1. Whole-Genome Sequencing and SNP Calling

Whole-genome sequencing was conducted for 673 fish, yielding a total of 67.84 billion reads with an average of 0.10 billion reads per individual. Among these, 92.60% of reads successfully aligned to the reference genome, resulting in an average sequencing depth of 13.69× for 673 individuals (ranging from 5.06× to 25.85×). After stringent quality control, a total of 10.41 million SNPs were identified. Figure 1 illustrates the histogram of SNP distribution and SNP density plots across all chromosomes. The number of high-quality SNPs per chromosome varied from 385 (Chr60) to 794,151 (Chr1) (Figure 1B), with an average density of 5345.31 SNPs/Mb (Figure 1A).

2.2. Population Structure Analysis

A plot of the first three principal components (Figure 1C) shows that individuals in the Russian sturgeon population have similar genetic backgrounds, which is favorable for conducting GWAS. The LD pattern depicted in Figure 1D indicates that the average genome-wide LD (r²) between adjacent pairs of markers was 0.049 and that LD decayed to r² = 0.05 within 20 kb, suggesting that candidate genes can be mapped effectively by searching the region 20 kb upstream and downstream of significant SNPs.

2.3. Phenotype Statistics and Heritability Estimation

Descriptive statistical data for the analysis of traits in the Russian sturgeon population are shown in Table 1. The mean (standard deviation) caviar yield, caviar color, and body weight were 0.19 (0.057), 2.453 (0.653), and 19.933 (4.029), respectively. Coefficients of variation were high for caviar yield, caviar color, and body weight, 30.00%, 26.62%, and 20.21%, respectively. In addition, as shown in Table 2, SNP-based heritability was estimated through genome-wide association analysis, with heritabilities for caviar yield, caviar color, and weight being 0.497, 0.614, and 0.627, respectively, indicating moderate to high levels of heritability for each trait, which is advantageous for selective breeding programs.
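As a quick arithmetic check, the coefficients of variation quoted above can be reproduced from the reported means and standard deviations. The snippet below is only an illustrative sketch; the dictionary layout and the function name are our own choices, not part of the original analysis.

```python
# Means and standard deviations as quoted in the text (Table 1).
traits = {
    "caviar yield": (0.19, 0.057),
    "caviar color": (2.453, 0.653),
    "body weight": (19.933, 4.029),
}


def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation (sd / mean) as a percentage."""
    return 100.0 * sd / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in traits.items()}

for name, value in cv.items():
    print(f"{name}: CV = {value:.2f}%")
```

Rounded to two decimals, the results match the 30.00%, 26.62%, and 20.21% given in the text.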

2.4. Genome-Wide Association Study

Manhattan plots of the GWAS for the three traits and the corresponding QQ plots are shown in Figure 2. For caviar yield, no genome-wide significant SNPs were detected, and 31 SNPs reached the suggestive significance level (Figure 2A). These 31 suggestive SNPs were located on chromosomes 1, 2, 4, 6, 8, 9, 10, 12, 19, 22, 44, 46, and 56. For caviar color (Figure 2C), 1 genome-wide significant SNP and 36 suggestive SNPs were observed. The genome-wide significant SNP was located on chromosome 22, and the 36 suggestive SNPs were located on chromosomes 1–7, 9, 12, 15, 17–22, 25, 36, 39, 45, and 51. For body weight (Figure 2E), 1 genome-wide significant SNP was detected on chromosome 12, and 225 SNPs reached the suggestive significance level, with 198 of them located on chromosome 12 and the remainder on chromosomes 1, 2, 4–7, 10, 15–17, 21, 37, 41, 45, and 56.
The QQ plots show that the influence of population stratification was negligible (Figure 2B–D). Moreover, the average genomic inflation factors (λ) for the three traits were close to 1 (0.99, 0.98, and 0.99 for caviar yield, caviar color, and body weight, respectively). Together, the QQ plots and the λ values suggest that little or no residual population structure inflated the test statistics.
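For readers unfamiliar with λ, the sketch below illustrates one common definition: the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.4549). The text does not state how λ was computed in this study, so this definition, the function name, and the evenly spaced example p-values are all assumptions made for illustration.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(p_values):
    """Median-based inflation factor (an assumed, commonly used definition).

    Each two-sided p-value p is converted to a 1-df chi-square statistic
    z**2, where z is the standard normal quantile at 1 - p/2.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    # Median of a chi-square(1) variable, roughly 0.4549.
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Hypothetical, evenly spaced p-values mimicking a well-calibrated test.
n = 1001
uniform_p = [(i + 0.5) / n for i in range(n)]
lam = genomic_inflation_factor(uniform_p)
print(f"lambda = {lam:.3f}")
```

A value near 1, as reported for all three traits, indicates that the observed test statistics follow the null distribution for the bulk of markers.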

2.5. Identification of Candidate Genes

GWAS based on whole-genome sequencing data was used to detect candidate functional genes. Based on functional annotation, candidate genes were searched within 20 kb upstream and downstream of each significant or suggestive SNP. As shown in Table 3, 29 genes were found for caviar yield, of which 10 were potential candidate genes. For caviar color (Table 4), 22 genes were detected, of which 6 had functions related to caviar color. For body weight (Table 5), 77 genes were detected, of which 15 were potential candidate genes.

2.6. Genomic Prediction Performance

To assess the effect of incorporating GWAS results on genomic prediction, the accuracy of genomic prediction for caviar yield, caviar color, and body weight traits was evaluated using the GBLUP, GLDBLUP, and GFBLUP methods, as shown in Figure 3. GBLUP and GLDBLUP produced similar predictive accuracy, demonstrating that reducing SNP density to 50 K by LD pruning can yield prediction accuracy comparable to utilizing all markers. Additionally, GFBLUP produced the highest predictive accuracy in all cases, with GFBLUP improving by 2%, 1.9%, and 3.1% over GBLUP for caviar yield, caviar color, and body weight, respectively. For prediction bias, as shown in Figure 3, GFBLUP produces similar or lower prediction bias compared to GBLUP and GLDBLUP methods, e.g., for the body weight trait, the prediction biases of GFBLUP, GBLUP, and GLDBLUP are 0.269, 0.325, and 0.323, respectively. For MSE, GLDBLUP and GFBLUP produced lower values than GBLUP for caviar yield, while for the other two traits, all three methods produced similar MSE. Additionally, all three methods produced similar MAE in all cases.

3. Discussion

In this study, we conducted GWAS and identified several candidate genes related to caviar yield, caviar color, and body weight in Russian sturgeon. Furthermore, to verify the reliability of the GWAS results, we evaluated the accuracy of genomic prediction for the three traits by combining LD-pruned markers with GWAS prior information. The results showed that this combination improved the accuracy of genomic prediction for caviar yield, caviar color, and body weight in sturgeons.

3.1. Potential Candidate Genes for Caviar Yield

For caviar yield, a number of candidate genes located within 20 kb of genome-wide significant and suggestive SNPs were identified. Among them, the TFAP2A gene plays a vital role in mouse oocyte maturation [20]. Overexpression of TFAP2A may upregulate p300, increasing levels of histone acetylation and lactylation, which in turn impede spindle assembly and chromosome alignment, ultimately hindering nuclear meiotic division in mouse oocytes [20]. Niu et al. [21] reported that the RPS6KA3 gene was associated with reproduction pathways in Xiang pigs. The presence of CRB3 in many organs and its distribution pattern during mouse embryonic development suggest that CRB3 plays a significant role in establishing and maintaining polarity in mouse embryos [22]. Zhao et al. [23] reported that TUBB regulates spindle assembly and chromosome dynamics during mouse oocyte maturation. The H2AFX gene has a role in germ cell loss: histone H2AFX links meiotic chromosome asynapsis to prophase I oocyte loss in mammals [24]. The morc3 gene is related to the regulation of animal reproduction, and the deletion of morc3 reduced the pregnancy rate of male mice and led to low fertility [25]. The BAG1 gene was found to have potential efficacy in ameliorating oocyte maturation [26]. RANBP2 acts as an inhibitor of premature maturation-promoting factor activation and of the untimely degradation of securin during oocyte maturation, thereby preserving the accurate timing of the resumption of maturation and meiotic progression in mouse oocytes [27]. The PLA2G1B gene was identified as a possible new factor affecting the efficacy of horse IVM/IVF [28]. NYAP1 plays a key role in ovarian development by regulating target genes related to the oxytocin signaling pathway, and its differential expression in Han sheep may contribute to improving fecundity [29].

3.2. Potential Candidate Genes for Caviar Color

For caviar color, within 20 kb of the genome-wide significant and suggestive SNPs, only two genes, SRFBP1 and INHBA, have been reported to be directly associated with pigment formation. The SRFBP1 gene was reported to be associated with skin pigmentation in an Ogye × White Leghorn F2 chicken population [30]. The INHBA gene strongly controls skin pigmentation and also influences serum vitamin D levels in African Americans [31]. Surprisingly, the functions of the other genes identified are directly related to immunity rather than pigment formation. Among them, NFX1 was found to encode a repressor of gene expression, suggesting that NFX1 limits the immune response following infection [32]. Fiil et al. [33] reported that OTULIN restricts Met1-Ub formation after immune receptor stimulation to prevent unwarranted proinflammatory signaling. The PLEK gene is related to the immune system, suggesting an inactive immune regulation [34]. The NARS gene plays a role in oxidative stress/hypoxia and endoplasmic reticulum stress/unfolded protein response, and its mutation leads to melanoma susceptibility [35]. This suggests that the immune response, as a protective mechanism, may indirectly lead to the formation of pigment. Similar results have been reported in a large number of studies; e.g., Linher-Melville and Li [36] demonstrated that melanocytes could engulf exogenous beads and then recruit immune cells to protect against injury in zebrafish (Danio rerio). Similarly, the INHBA gene participates in biological processes related to pigmentation [31] and in processes significantly related to hematopoiesis and the immune system [34].

3.3. Potential Candidate Genes for Body Weight

For body weight, 15 potential candidate genes were identified within 20 kb of the genome-wide significant and suggestive SNPs. The ACVR1 gene was identified in multiple regions and belongs to the transforming growth factor (TGF)-β superfamily, which can inhibit muscle differentiation [37]. Zhao et al. [38] reported that the ACVR1 gene might contribute to later myogenesis and more muscle fibers in Landrace (LR, lean) than in Lantang (LT, obese) pig breeds. A mutation, g.101220 C > T, located in the fifth intron of the ovine HTR4 gene, was detected, and association analysis showed that it was significantly associated with growth traits in sheep [39]. The fmnl2 gene is a candidate gene responsible for facioscapulohumeral muscular dystrophy, and it is critical for muscle development [40]. A polymorphism of the INSIG2 gene is associated with increased subcutaneous fat in women and poor resistance training response in men [41]. The GPD2 gene could catalyze the esterification of fatty acids to triglycerides [42]. ACVR1C is one of the type I transforming growth factor-β (TGF-β) receptors and can be used as an adipocyte developmental marker [43]. The TANC1 gene is essential for mammalian myoblast fusion [44]. Xie et al. [45] reported that KCNH7 is a candidate gene related to growth in the Licha Black Pig. Loss of the SLC16A13 gene increases mitochondrial respiration in the liver, leading to reduced hepatic lipid accumulation and increased hepatic insulin sensitivity in high-fat diet-fed SLC16A13 knockout mice [46]. The XKR4 gene is related to feed intake and average daily gain in cattle [47]. SNPs near the XKR4 gene are also associated with subcutaneous fat, and the gene has been considered a candidate for carcass traits [48]. The GALR2 gene is a regulator of insulin resistance, and activation of GALR2 represents a promising strategy against obesity-induced insulin resistance [49]. RPL39 is a crucial candidate gene associated with growth in farm animals [50]. Goh et al. [51] reported that ACVR2A directly and negatively regulates bone mass in osteoblasts through activin receptor signaling. Dong et al. [52] reported that the ADCY10 gene could be one of the key regulatory switches for energy metabolism in the Yili goose. The ZEB2 gene was also reported to be associated with body weight in Hu sheep [53].

3.4. Genomic Prediction Incorporating GWAS Prior Information

Whole-genome sequencing data include most causal mutations that affect traits of interest, making genomic prediction less limited by the LD between SNPs and causal mutations. Simulation studies have shown that whole-genome sequencing data can improve the accuracy of genomic prediction within populations by 40% [54]. However, a substantial amount of empirical data suggests that whole-genome sequencing does not always provide greater prediction accuracy than SNP chips [17]. The primary reason is the presence of a large number of noisy loci in the genome, which adversely affect the accuracy of genomic prediction. Accordingly, some studies have reported that LD pruning of whole-genome sequencing data can reduce the number of noisy loci and improve the accuracy of genomic prediction [17,55,56]. However, our previous research showed that using LD pruning to reduce SNP density to different levels does not necessarily improve the accuracy of genomic prediction [57]. One possible reason is that, while noisy loci are removed, functional loci may also be inadvertently eliminated, so prediction accuracy does not improve. In addition, GWAS priors have been used to improve the accuracy of genomic prediction [14,15,16]; e.g., Yoshida and Yáñez [14] reported that the accuracy of genomic prediction can be improved using preselected variants from GWAS for growth under chronic thermal stress in rainbow trout. However, many studies have reported that utilizing prior information from GWAS does not improve the accuracy of genomic prediction [17,18,19]. This may be because, although functional sites are included, noisy sites have not been effectively removed, so prediction accuracy does not improve. Therefore, this study combined the advantages of both methods. First, noisy loci were removed by performing LD pruning on the whole-genome sequencing data. Then, functional loci were screened using GWAS based on whole-genome sequencing and combined with the LD-pruned loci. The results showed that genomic prediction accuracy improved for all three traits (caviar yield, caviar color, and body weight), further verifying the reliability of the GWAS results in this study. This study provides a new approach for enhancing the accuracy of genomic prediction based on whole-genome sequencing data.

4. Materials and Methods

4.1. Population and Phenotyping Measurement

The Russian sturgeons used in this study were from Hangzhou Qiandaohu Xunlong Sci-tech Co., Ltd. (Hangzhou, China). Details regarding fish rearing and phenotyping procedures have been provided in our previous study [58]. In 2012, 6 dams and 26 sires were mated by artificial insemination to create 26 full-sib families. At the age of 8, the developmental status of fish roe was assessed using in vitro puncture. Fish with an average roe diameter exceeding 2.8 mm were individually tagged with passive integrated transponder (PIT) electronic markers, and a fin sample was collected and preserved in absolute ethanol. Subsequently, these tagged fish were processed for caviar production at Hangzhou Qiandaohu Xunlong Sci-tech Co., Ltd. The body weight (BW), total caviar weight (CW), and caviar color (CC) of each fish were recorded. Caviar yield (CY) was calculated relative to the female body weight using the formula CY = CW/BW. A subjective color score from 1 to 4 was assigned to the caviar based on color depth, with gold scored as 4, light as 3, middle as 2, and black as 1. All caviar color scores were recorded by the same operator, who used a reference image as a guide for classification. In total, 673 fish with phenotype records were selected for subsequent analysis. The descriptive statistics of phenotypes are presented in Table 1.
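The two derived phenotypes described above can be written down in a few lines. This is a minimal sketch: the function name, the dictionary, and the example weights are hypothetical, and the only facts taken from the text are the formula CY = CW/BW and the 1 to 4 color scale.

```python
# Color scale from the text: gold = 4, light = 3, middle = 2, black = 1.
COLOR_SCORE = {"black": 1, "middle": 2, "light": 3, "gold": 4}


def caviar_yield(caviar_weight, body_weight):
    """CY = CW / BW; both weights must be expressed in the same unit."""
    if body_weight <= 0:
        raise ValueError("body weight must be positive")
    return caviar_weight / body_weight


# Hypothetical fish: 20.0 units of body weight, 3.8 units of caviar.
example_cy = caviar_yield(3.8, 20.0)
example_cc = COLOR_SCORE["light"]
print(f"CY = {example_cy:.2f}, CC = {example_cc}")
```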

4.2. Whole-Genome Sequencing

Genomic DNA extraction followed the standard phenol-chloroform method. Whole-genome sequencing was conducted for 673 fish on the BGI-T7 platform. Libraries were constructed, and sequencing was performed using 150 bp paired-end reads on DNBSEQ-T7 (MGI Technology Co., Ltd., Shenzhen, China). After sequencing, raw reads with a minimum average quality greater than 20 were subjected to trimming. Reads passing the filtering step were aligned against the reference genome of sterlet (Acipenser ruthenus) assembly ASM1064508v1 [1] using the Burrows-Wheeler Alignment (BWA, version 0.7.17) [59]. The alignment files were then converted to BAM format using SAMtools (version 1.2) [60]. To eliminate potential PCR duplicates, Picard MarkDuplicates (http://broadinstitute.github.io/picard/, accessed on 26 October 2023) was employed. SNP calling was performed using the UnifiedGenotyper utility of GATK (version 3.5) [61]. Variants were subsequently filtered using GATK Variant Filtration (version 3.5) with the following criteria: DP (Depth) ≥ 4, FS (FisherStrand) < 60, QUAL (Quality) ≥ 50, and QD (Quality by Depth) ≥ 2.0. Further details on the 673 sequenced fish are available in Supplementary Table S1.
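The hard-filter thresholds listed above amount to a simple conjunction of four conditions. The sketch below expresses that logic in plain Python for illustration only; it is not GATK syntax, and the record layout and example values are hypothetical.

```python
def passes_hard_filters(dp, fs, qual, qd):
    """Apply the thresholds quoted in the text:
    DP >= 4, FS < 60, QUAL >= 50, and QD >= 2.0."""
    return dp >= 4 and fs < 60 and qual >= 50 and qd >= 2.0


# Hypothetical variant records (values invented for illustration).
variants = [
    {"id": "snp1", "dp": 14, "fs": 3.2, "qual": 812.0, "qd": 18.5},
    {"id": "snp2", "dp": 3, "fs": 1.0, "qual": 95.0, "qd": 12.0},   # fails DP
    {"id": "snp3", "dp": 20, "fs": 71.4, "qual": 300.0, "qd": 9.1},  # fails FS
]

kept = [
    v["id"]
    for v in variants
    if passes_hard_filters(v["dp"], v["fs"], v["qual"], v["qd"])
]
print(kept)
```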

4.3. Genotype Imputation and Population Structure Analysis

Imputation for missing genotypes of whole-genome sequencing data was performed with Beagle (version 4.1) [62]. Variants with a minor allele frequency (MAF) lower than 0.05 and deviation from the Hardy-Weinberg equilibrium (HWE) (p value < 10⁻⁷) were excluded using the PLINK software (version 1.90) [63]. Furthermore, due to the high level of LD in the genome, most SNPs are redundant; LD pruning was performed using PLINK [63] to remove variants in high LD (r² > 0.9). After LD pruning, 10,409,793 SNPs were retained for the whole-genome sequencing data. Principal component analysis (PCA) was performed on the genomic relationship matrix using GCTA software (version 1.25.3) [64]. This resulted in a matrix of eigenvectors in descending order that represented principal components (PCs), where PC1 had the largest eigenvalue. The overall structuring of genetic variation was visualized in a scatterplot of the top few PCs. LD between a pair of SNPs was measured as r², and LD decay analysis based on r² was conducted using PopLDdecay (version 3.42) [65] to assess LD patterns.
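To make the MAF and r² quantities concrete, the sketch below computes both from genotypes coded as 0, 1, or 2 allele counts. Here r² is taken as the squared Pearson correlation of allele counts, a common genotype-based approximation; whether the tools named above use exactly this estimator is an assumption, and the function names and example genotypes are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 allele counts of one SNP across diploid individuals."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1.0 - p)


def ld_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs' allele counts."""
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


# Hypothetical genotypes for six individuals.
snp1 = [0, 1, 2, 1, 0, 2]
snp2 = [0, 1, 2, 1, 0, 2]   # identical to snp1, so r2 = 1 (would be pruned)
snp3 = [0, 0, 0, 1, 0, 0]   # rare variant, MAF = 1/12 (kept at MAF >= 0.05)

print(minor_allele_frequency(snp3), ld_r2(snp1, snp2), ld_r2(snp1, snp3))
```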

4.4. Genome-Wide Association Study

A single-marker regression model was implemented to detect the association of SNP with caviar yield, caviar color, and body weight traits. The model includes a random polygenic effect to account for shared genetic effects of related individuals and to control population stratification. The statistical model is described below:
$$y = \mathbf{1}\mu + bx + Zg + e,$$
in which $y$ is the vector of phenotypes; $\mathbf{1}$ is a vector of ones; $\mu$ is the overall mean; $b$ is the average effect of the gene substitution of a particular SNP; $x$ is a vector of the SNP genotype (coded as 0, 1, or 2); $g$ is a vector of random polygenic effects with a normal distribution $g \sim N(0, G\sigma_a^2)$, in which $\sigma_a^2$ is the polygenic variance and $G$ is the genomic relationship matrix, constructed using all markers following VanRaden [66]; $Z$ is an incidence matrix relating phenotypes to the corresponding random polygenic effects; and $e$ is a vector of residual effects with a normal distribution $N(0, I\sigma_e^2)$, in which $\sigma_e^2$ is the residual variance. The software GCTA (version 1.25.3) [64] was used to fit the model.
To control false positives, we used 5 × 10⁻⁸ as the genome-wide significance level, which is also applied in human GWAS [67]. We adopted 5 × 10⁻⁶ as the suggestive level [68]. The Manhattan and quantile-quantile (QQ) plots were drawn with the CMplot package (https://github.com/YinLiLin/R-CMplot, accessed on 5 February 2024) in R (http://www.r-project.org/, accessed on 5 February 2024).
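The two thresholds define three classes of SNPs, which can be sketched as follows. The function name, the example p-values, and the choice of strict inequalities at the boundaries are assumptions made for illustration.

```python
GENOME_WIDE_LEVEL = 5e-8   # genome-wide significance level from the text
SUGGESTIVE_LEVEL = 5e-6    # suggestive level from the text


def classify_snp(p_value):
    """Label a GWAS p-value against the two thresholds.

    Using '<' rather than '<=' at the boundary is an assumption.
    """
    if p_value < GENOME_WIDE_LEVEL:
        return "genome-wide significant"
    if p_value < SUGGESTIVE_LEVEL:
        return "suggestive"
    return "not significant"


# Hypothetical p-values.
labels = [classify_snp(p) for p in (3e-9, 1e-6, 0.02)]
print(labels)
```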

4.5. Functional Genomic Analysis

Functional annotation of all coding genes of Acipenser ruthenus was performed using eggNOG-mapper (version 2) [69], which offers higher accuracy compared to traditional sequence similarity search methods such as BLAST search, as it avoids annotating from collateral homology. Genes located in the region between the 20 kb upstream and 20 kb downstream of the significant and suggestive SNPs were retrieved for data mining.
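The retrieval of genes around each SNP is an interval-overlap query. A minimal sketch is given below, assuming a simple in-memory gene list; the gene names, coordinates, and function name are hypothetical, and only the 20 kb window comes from the text.

```python
WINDOW_BP = 20_000  # 20 kb upstream and downstream, as stated in the text


def genes_near_snp(chrom, pos, genes, window=WINDOW_BP):
    """Return names of genes overlapping [pos - window, pos + window]."""
    lower, upper = pos - window, pos + window
    return [
        g["name"]
        for g in genes
        if g["chrom"] == chrom and g["start"] <= upper and g["end"] >= lower
    ]


# Hypothetical annotation records.
annotation = [
    {"name": "geneA", "chrom": "Chr12", "start": 1_000_000, "end": 1_015_000},
    {"name": "geneB", "chrom": "Chr12", "start": 1_050_000, "end": 1_060_000},
    {"name": "geneC", "chrom": "Chr22", "start": 1_000_000, "end": 1_015_000},
]

hits = genes_near_snp("Chr12", 1_030_000, annotation)
print(hits)
```

In this example geneA ends 15 kb upstream of the SNP and geneB starts exactly 20 kb downstream, so both fall inside the window, while geneC is on another chromosome.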

4.6. Genomic Prediction Incorporating GWAS Prior Information

In order to evaluate the genomic prediction effect of caviar yield, caviar color, and body weight traits, the genomic best linear unbiased prediction (GBLUP) based on the genomic relationship matrix and genomic feature BLUP (GFBLUP) including GWAS prior information were implemented to predict GEBV for each genotyped individual.

4.6.1. GBLUP

The GBLUP [66] model was used to predict the GEBV of all genotyped individuals:
$$y = \mathbf{1}\mu + Zg + e,$$
where $y$ is the vector of phenotypes, $\mu$ is the overall mean, $\mathbf{1}$ is a vector of ones, and $g$ is the vector of genomic breeding values, following a normal distribution $N(0, G\sigma_g^2)$, where $\sigma_g^2$ is the additive genetic variance and $G$ is the marker-based genomic relationship matrix [66]. $Z$ is an incidence matrix linking $g$ to $y$, and $e$ is the vector of random errors, following a normal distribution $N(0, I\sigma_e^2)$, where $\sigma_e^2$ is the residual variance.
For GBLUP, G was constructed using all whole-genome sequencing markers. According to our previous study [57], reducing SNP density to 50 K through LD pruning yielded prediction accuracy similar to that obtained with all markers. GBLUP with G constructed from these 50 K LD-pruned SNPs is termed GLDBLUP.
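The genomic relationship matrix at the core of both GBLUP and GLDBLUP can be sketched in a few lines. The example below assumes VanRaden's first method, G = WW'/(2Σp(1−p)) with W the allele counts centered by twice the allele frequency; the text only says G follows VanRaden [66], so the choice of this variant, the function name, and the toy genotypes are assumptions. Fitting the mixed model itself is left to dedicated software.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix, assuming VanRaden's first method.

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    p = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Center each marker column by 2p.
    w = [[ind[j] - 2 * p[j] for j in range(m)] for ind in genotypes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(w[i][k] * w[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]


# Toy data: 4 individuals x 3 markers (invented for illustration).
toy_genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
G = vanraden_g(toy_genotypes)
for row in G:
    print([round(v, 3) for v in row])
```

Under this sketch, GBLUP and GLDBLUP differ only in which marker columns are passed in: all sequencing markers for the former, the 50 K LD-pruned subset for the latter.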

4.6.2. GFBLUP

The GFBLUP [70] model, which uses prior information about genomic features, is based on a linear mixed model with two random genomic effects:
$$y = \mathbf{1}\mu + Zf + Zr + e,$$
where $y$, $\mathbf{1}$, $\mu$, and $e$ are the same as in the GBLUP model; $f$ is the vector of genomic values captured by genetic markers associated with a genomic feature of interest, following a normal distribution $N(0, G_f\sigma_f^2)$; $r$ is the vector of genomic effects captured by the remaining set of genetic markers, following a normal distribution $N(0, G_r\sigma_r^2)$; and $Z$ is an incidence matrix that links $f$ and $r$ to $y$. Matrices $G_f$ and $G_r$ were constructed in the same way as $G$: $G_f$ was based on the significant markers selected at a false discovery rate (FDR) threshold of 0.05, and $G_r$ used the 50 K SNPs obtained through LD pruning, excluding the markers used in $G_f$. Note that the GWAS was performed using the reference data only.
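The split of markers into a feature set and a remaining set can be illustrated as follows. The text specifies an FDR threshold of 0.05 but not the procedure; the sketch assumes the Benjamini-Hochberg step-up method, and the function names, p-values, and marker indices are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of tests declared significant by the BH step-up procedure
    (the choice of BH is an assumption; the text only gives FDR = 0.05)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank  # keep the largest rank meeting the bound
    return sorted(order[:cutoff_rank])


def split_markers(feature_idx, pruned_idx):
    """Feature markers feed G_f; LD-pruned markers not in G_f feed G_r."""
    feature = set(feature_idx)
    remaining = [i for i in pruned_idx if i not in feature]
    return sorted(feature), remaining


# Hypothetical GWAS p-values for five markers (indices 0..4).
gwas_p = [0.001, 0.045, 0.025, 0.2, 0.012]
selected = benjamini_hochberg(gwas_p, fdr=0.05)

# Hypothetical LD-pruned marker set.
gf_markers, gr_markers = split_markers(selected, pruned_idx=[1, 2, 3])
print(selected, gf_markers, gr_markers)
```

Because the GWAS is run on the reference data only, this selection would be repeated inside each cross-validation fold rather than once on the full dataset.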
To assess prediction efficiency, genomic prediction was carried out through 10-fold cross-validation (CV). The genotyped individuals were randomly split into ten folds; phenotypes from one fold (the validation population) were removed from the dataset, and the remaining folds (the reference population) were used to predict the GEBV of the validation population. This 10-fold CV was replicated 20 times, resulting in 20 average accuracies of genomic prediction. Within each replicate, the validation populations were the same for all three methods (GBLUP, GLDBLUP, and GFBLUP). Prediction accuracy was calculated as Pearson’s correlation between phenotypic values y and GEBV for the validation individuals, i.e., r(y, GEBV). The regression coefficient of y on GEBV was used to evaluate the bias of predictions, expressed as the absolute value of one minus the regression coefficient, i.e., abs(1 − b(y, GEBV)). In addition, mean squared error (MSE) and mean absolute error (MAE) were used to compare model performance. MSE (MAE) was the average squared (absolute) difference between y and GEBV, with both centered on zero.
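The four evaluation metrics can be computed for one validation fold as sketched below. The function name and the toy values are hypothetical, and reading "centered on zero" as mean-centering both y and GEBV before taking differences is our interpretation.

```python
from statistics import mean


def prediction_metrics(y, gebv):
    """Accuracy r(y, GEBV), bias |1 - b(y, GEBV)|, MSE, and MAE."""
    n = len(y)
    mean_y, mean_g = mean(y), mean(gebv)
    yc = [v - mean_y for v in y]        # centered phenotypes
    gc = [v - mean_g for v in gebv]     # centered GEBV
    sxy = sum(a * b for a, b in zip(yc, gc))
    sxx = sum(b * b for b in gc)
    syy = sum(a * a for a in yc)
    accuracy = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx                   # regression of y on GEBV
    bias = abs(1.0 - slope)
    mse = sum((a - b) ** 2 for a, b in zip(yc, gc)) / n
    mae = sum(abs(a - b) for a, b in zip(yc, gc)) / n
    return {"accuracy": accuracy, "bias": bias, "mse": mse, "mae": mae}


# Toy validation fold (values invented for illustration).
metrics = prediction_metrics([1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 1.5, 2.0])
print(metrics)
```

In this toy case the GEBV are perfectly correlated with y but shrunk by half, so accuracy is 1 while the regression slope of 2 gives a bias of 1, which shows why the two metrics are reported separately.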

5. Conclusions

In this study, the GWAS based on whole-genome sequencing was performed for caviar yield, caviar color, and body weight in Russian sturgeon. Combining the results of GWAS and bioinformatics annotation analysis, 10 genes were identified as potential candidate genes associated with the caviar yield trait; 6 genes were considered potential candidate genes related to the caviar color trait; and 15 genes were detected as potential candidate genes related to the body weight trait. In addition, combining LD-pruned markers and GWAS prior information could improve the accuracy of genomic prediction for caviar yield, caviar color, and body weight traits in sturgeons. These findings provide valuable insights into the genetic mechanisms underlying these important traits and demonstrate the potential for their genetic improvement through advanced genomic selection methods. Future studies could further enhance this understanding by integrating advanced microscopical techniques, such as transmission electron microscopy, to provide a comprehensive morpho-functional analysis of Russian sturgeons.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms25179756/s1.

Author Contributions

Conceptualization, H.S. and H.H.; Methodology, H.S.; Writing—Original Draft Preparation, H.S.; Formal Analysis, T.D.; Data Curation, H.S., T.D., W.W., X.Y., C.G. and S.B.; Writing—Review and Editing, H.S., T.D. and H.H.; Supervision, H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (32341059, 32202915), the Beijing Natural Science Foundation (6222014), and the Reform and Development Project of the Fisheries Science Institute at the Beijing Academy of Agriculture and Forestry Sciences (JJPY-2025-01).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets analyzed during this study are available from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

SNPs: Single nucleotide polymorphisms
GWAS: Genome-wide association study
GFBLUP: Genomic feature BLUP
LD: Linkage disequilibrium
CITES: Convention on International Trade in Endangered Species of Wild Fauna and Flora
GS: Genomic selection
GEBV: Genomic estimated breeding values
QTLs: Quantitative trait loci
PIT: Passive integrated transponder
BW: Body weight
CW: Caviar weight
CC: Caviar color
CY: Caviar yield
DP: Depth
QUAL: Quality
QD: Quality by depth
MAF: Minor allele frequency
HWE: Hardy-Weinberg equilibrium
PCA: Principal component analysis
PCs: Principal components
QQ: Quantile-quantile
GBLUP: Genomic best linear unbiased prediction
CV: Cross-validation
Mse: Mean squared error
Mae: Mean absolute error

References

  1. Du, K.; Stock, M.; Kneitz, S.; Klopp, C.; Woltering, J.M.; Adolfi, M.C.; Feron, R.; Prokopov, D.; Makunin, A.; Kichigin, I.; et al. The sterlet sturgeon genome sequence and the mechanisms of segmental rediploidization. Nat. Ecol. Evol. 2020, 4, 841–852. [Google Scholar] [CrossRef]
  2. Meuwissen, T.H.E.; Hayes, B.J.; Goddard, M.E. Prediction of total genetic value using genome-wide dense marker maps. Genetics 2001, 157, 1819–1829. [Google Scholar] [CrossRef] [PubMed]
  3. Sharma, A.; Lee, J.S.; Dang, C.G.; Sudrajad, P.; Kim, H.C.; Yeon, S.H.; Kang, H.S.; Lee, S.H. Stories and Challenges of Genome Wide Association Studies in Livestock—A Review. Asian-Australas. J. Anim. Sci. 2015, 28, 1371–1379. [Google Scholar] [CrossRef] [PubMed]
  4. Tibbs Cortes, L.; Zhang, Z.; Yu, J. Status and prospects of genome-wide association studies in plants. Plant Genome 2021, 14, e20077. [Google Scholar] [CrossRef]
  5. Yáñez, J.M.; Barría, A.; López, M.E.; Moen, T.; Garcia, B.F.; Yoshida, G.M.; Xu, P. Genome-wide association and genomic selection in aquaculture. Rev. Aquacult. 2023, 15, 645–675. [Google Scholar] [CrossRef]
  6. Zheng, X.H.; Kuang, Y.Y.; Lv, W.H.; Cao, D.C.; Sun, Z.P.; Sun, X.W. Genome-Wide Association Study for Muscle Fat Content and Abdominal Fat Traits in Common Carp (Cyprinus carpio). PLoS ONE 2016, 11, e0169127. [Google Scholar] [CrossRef]
  7. Horn, S.S.; Ruyter, B.; Meuwissen, T.H.E.; Moghadam, H.; Hillestad, B.; Sonesson, A.K. GWAS identifies genetic variants associated with omega-3 fatty acid composition of Atlantic salmon fillets. Aquaculture 2020, 514, 734494. [Google Scholar] [CrossRef]
  8. Xiao, S.J.; Wang, P.P.; Dong, L.S.; Zhang, Y.G.; Han, Z.F.; Wang, Q.R.; Wang, Z.Y. Whole-genome single-nucleotide polymorphism (SNP) marker discovery and association analysis with the eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content in Larimichthys crocea. PeerJ 2016, 4, e2664. [Google Scholar] [CrossRef]
  9. Li, N.; Zhou, T.; Geng, X.; Jin, Y.L.; Wang, X.Z.; Liu, S.K.; Xu, X.Y.; Gao, D.Y.; Li, Q.; Liu, Z.J. Identification of novel genes significantly affecting growth in catfish through GWAS analysis. Mol. Genet. Genom. 2018, 293, 587–599. [Google Scholar] [CrossRef]
  10. Geng, X.; Liu, S.K.; Yao, J.; Bao, L.S.; Zhang, J.R.; Li, C.; Wang, R.J.; Sha, J.; Zeng, P.; Zhi, D.G.; et al. A Genome-Wide Association Study Identifies Multiple Regions Associated with Head Size in Catfish. G3 Genes Genomes Genet. 2016, 6, 3389–3398. [Google Scholar] [CrossRef]
  11. Robledo, D.; Matika, O.; Hamilton, A.; Houston, R.D. Genome-Wide Association and Genomic Selection for Resistance to Amoebic Gill Disease in Atlantic Salmon. G3 Genes Genomes Genet. 2018, 8, 1195–1203. [Google Scholar] [CrossRef]
  12. Wu, Y.; Zhou, Z.; Pan, Y.; Zhao, J.; Bai, H.; Chen, B.; Zhang, X.; Pu, F.; Chen, J.; Xu, P. GWAS identified candidate variants and genes associated with acute heat tolerance of large yellow croaker. Aquaculture 2021, 540, 736696. [Google Scholar] [CrossRef]
  13. Zeng, J.; Zhao, J.; Wang, J.; Bai, Y.; Long, F.; Deng, Y.; Jiang, P.; Xiao, J.; Qu, A.; Tong, B.; et al. Genetic linkage between swimming performance and disease resistance enables multitrait breeding strategies in large yellow croaker. Agric. Commun. 2023, 1, 100019. [Google Scholar] [CrossRef]
  14. Yoshida, G.M.; Yáñez, J.M. Increased accuracy of genomic predictions for growth under chronic thermal stress in rainbow trout by prioritizing variants from GWAS using imputed sequence data. Evol. Appl. 2021, 15, 537–552. [Google Scholar] [CrossRef]
  15. Song, H.; Li, L.; Ma, P.; Zhang, S.; Su, G.; Lund, M.S.; Zhang, Q.; Ding, X. Short communication: Improving the accuracy of genomic prediction of body conformation traits in Chinese Holsteins using markers derived from high-density marker panels. J. Dairy Sci. 2018, 101, 5250–5254. [Google Scholar] [CrossRef] [PubMed]
  16. Liu, Y.; Zhang, Y.; Zhou, F.; Yao, Z.; Zhan, Y.; Fan, Z.; Meng, X.; Zhang, Z.; Liu, L.; Yang, J.; et al. Increased Accuracy of Genomic Prediction Using Preselected SNPs from GWAS with Imputed Whole-Genome Sequence Data in Pigs. Animals 2023, 13, 3871. [Google Scholar] [CrossRef] [PubMed]
  17. Song, H.L.; Ye, S.P.; Jiang, Y.F.; Zhang, Z.; Zhang, Q.; Ding, X.D. Using imputation-based whole-genome sequencing data to improve the accuracy of genomic prediction for combined populations in pigs. Genet. Sel. Evol. 2019, 51, 58. [Google Scholar] [CrossRef]
  18. Song, H.; Hu, H. Strategies to improve the accuracy and reduce costs of genomic prediction in aquaculture species. Evol. Appl. 2021, 15, 578–590. [Google Scholar] [CrossRef]
  19. Lu, S.; Liu, Y.; Yu, X.J.; Li, Y.Z.; Yang, Y.M.; Wei, M.; Zhou, Q.; Wang, J.; Zhang, Y.P.; Zheng, W.W.; et al. Prediction of genomic breeding values based on pre-selected SNPs using ssGBLUP, WssGBLUP and BayesB for Edwardsiellosis resistance in Japanese flounder. Genet. Sel. Evol. 2020, 52, 49. [Google Scholar] [CrossRef]
  20. Lin, J.; Ji, Z.; Di, Z.; Zhang, Y.; Yan, C.; Zeng, S. Overexpression of Tfap2a in Mouse Oocytes Impaired Spindle and Chromosome Organization. Int. J. Mol. Sci. 2022, 23, 14376. [Google Scholar] [CrossRef]
  21. Niu, X.; Huang, Y.; Lu, H.; Li, S.; Huang, S.; Ran, X.; Wang, J. CircRNAs in Xiang pig ovaries among diestrus and estrus stages. Porc. Health Manag. 2022, 8, 29. [Google Scholar] [CrossRef]
  22. Yin, Y.; Sheng, J.; Hu, R.; Yang, Y.; Qing, S. The expression and localization of Crb3 in developmental stages of the mice embryos and in different organs of 1-week-old female mice. Reprod. Domest. Anim. 2014, 49, 824–830. [Google Scholar] [CrossRef] [PubMed]
  23. Zhao, J.; Wang, L.; Zhou, H.-X.; Liu, L.; Lu, A.; Li, G.-P.; Schatten, H.; Liang, C.-G. Clathrin Heavy Chain 1 is Required for Spindle Assembly and Chromosome Congression in Mouse Oocytes. Microsc. Microanal. 2013, 19, 1364–1373. [Google Scholar] [CrossRef]
  24. Cloutier, J.M.; Mahadevaiah, S.K.; ElInati, E.; Nussenzweig, A.; Toth, A.; Turner, J.M. Histone H2AFX Links Meiotic Chromosome Asynapsis to Prophase I Oocyte Loss in Mammals. PLoS Genet. 2015, 11, e1005462. [Google Scholar] [CrossRef]
  25. Liu, J.; Qi, N.; Xing, W.; Li, M.; Qian, Y.; Luo, G.; Yu, S. The TGF-beta/SMAD Signaling Pathway Prevents Follicular Atresia by Upregulating MORC2. Int. J. Mol. Sci. 2022, 23, 10657. [Google Scholar] [CrossRef] [PubMed]
  26. Mahmoodi, M.; Cheraghi, E.; Riahi, A. The Effect of Wharton’s Jelly-Derived Conditioned Medium on the In Vitro Maturation of Immature Oocytes, Embryo Development, and Genes Expression Involved in Apoptosis. Reprod. Sci. 2024, 31, 190–198. [Google Scholar] [CrossRef]
  27. Kim, H.J.; Lee, S.Y.; Lee, H.S.; Kim, E.Y.; Ko, J.J.; Lee, K.A. Zap70 and downstream RanBP2 are required for the exact timing of the meiotic cell cycle in oocytes. Cell Cycle 2017, 16, 1534–1546. [Google Scholar] [CrossRef]
  28. Shen, Y.; Ulaangerel, T.; Ren, H.; Davshilt, T.; Yi, M.; Li, X.; Xing, J.; Du, M.; Bai, D.; Dugarjav, M.; et al. Proteomic Differences Between the Ovulatory and Anovulatory Sides of the Mare’s Follicular and Oviduct Fluid. J. Equine Vet. Sci. 2023, 121, 104207. [Google Scholar] [CrossRef]
  29. Miao, X.; Luo, Q.; Zhao, H.; Qin, X. Co-expression analysis and identification of fecundity-related long non-coding RNAs in sheep ovaries. Sci. Rep. 2016, 6, 39398. [Google Scholar]
  30. Cha, J.; Jin, D.; Kim, J.H.; Kim, S.C.; Lim, J.A.; Chai, H.H.; Jung, S.A.; Lee, J.H.; Lee, S.H. Genome-wide association study revealed the genomic regions associated with skin pigmentation in an Ogye x White Leghorn F2 chicken population. Poult. Sci. 2023, 102, 102720. [Google Scholar] [CrossRef]
  31. Batai, K.; Cui, Z.; Arora, A.; Shah-Williams, E.; Hernandez, W.; Ruden, M.; Hollowell, C.M.P.; Hooker, S.E.; Bathina, M.; Murphy, A.B.; et al. Genetic loci associated with skin pigmentation in African Americans and their effects on vitamin D deficiency. PLoS Genet. 2021, 17, e1009319. [Google Scholar] [CrossRef] [PubMed]
  32. Mussig, C.; Schroder, F.; Usadel, B.; Lisso, J. Structure and putative function of NFX1-like proteins in plants. Plant Biol. 2010, 12, 381–394. [Google Scholar] [CrossRef]
  33. Fiil, B.K.; Damgaard, R.B.; Wagner, S.A.; Keusekotten, K.; Fritsch, M.; Bekker-Jensen, S.; Mailand, N.; Choudhary, C.; Komander, D.; Gyrd-Hansen, M. OTULIN restricts Met1-linked ubiquitination to control innate immune signaling. Mol. Cell. 2013, 50, 818–830. [Google Scholar] [CrossRef]
  34. Chen, W.; Zhang, J. Potential molecular characteristics in situ in response to repetitive UVB irradiation. Diagn. Pathol. 2016, 11, 129. [Google Scholar] [CrossRef] [PubMed]
  35. April, C.S.; Barsh, G.S. Distinct pigmentary and melanocortin 1 receptor-dependent components of cutaneous defense against ultraviolet radiation. PLoS Genet. 2007, 3, e9. [Google Scholar] [CrossRef]
  36. Linher-Melville, K.; Li, J.L. The roles of glial cell line-derived neurotrophic factor, brain-derived neurotrophic factor and nerve growth factor during the final stage of folliculogenesis: A focus on oocyte maturation. Reproduction 2013, 145, R43–R54. [Google Scholar] [CrossRef] [PubMed]
  37. Shore, E.M.; Xu, M.; Feldman, G.J.; Fenstermacher, D.A.; Cho, T.J.; Choi, I.H.; Connor, J.M.; Delai, P.; Glaser, D.L.; LeMerrer, M.; et al. A recurrent mutation in the BMP type I receptor ACVR1 causes inherited and sporadic fibrodysplasia ossificans progressiva. Nat. Genet. 2006, 38, 525–527. [Google Scholar] [CrossRef] [PubMed]
  38. Zhao, X.; Mo, D.; Li, A.; Gong, W.; Xiao, S.; Zhang, Y.; Qin, L.; Niu, Y.; Guo, Y.; Liu, X.; et al. Comparative Analyses by Sequencing of Transcriptomes during Skeletal Muscle Development between Pig Breeds Differing in Muscle Growth Rate and Fatness. PLoS ONE 2011, 6, e19774. [Google Scholar] [CrossRef] [PubMed]
  39. Xu, D.; Wang, X.; Wang, W.; Zhang, D.; Li, X.; Zhang, Y.; Zhao, Y.; Cheng, J.; Zhao, L.; Wang, J.; et al. Detection of single nucleotide polymorphism in HTR4 and its relationship with growth traits in sheep. Anim. Biotechnol. 2023, 34, 4600–4607. [Google Scholar] [CrossRef]
  40. Zhang, T.; Chen, C.; Han, S.; Chen, L.; Ding, H.; Lin, Y.; Zhang, G.; Xie, K.; Wang, J.; Dai, G. Integrated Analysis Reveals a lncRNA-miRNA-mRNA Network Associated with Pigeon Skeletal Muscle Development. Genes 2021, 12, 1787. [Google Scholar] [CrossRef]
  41. Orkunoglu-Suer, F.E.; Gordish-Dressman, H.; Clarkson, P.M.; Thompson, P.D.; Angelopoulos, T.J.; Gordon, P.M.; Moyna, N.M.; Pescatello, L.S.; Visich, P.S.; Zoeller, R.F.; et al. INSIG2 gene polymorphism is associated with increased subcutaneous fat in women and poor response to resistance training in men. BMC Med. Genet. 2008, 9, 117. [Google Scholar] [CrossRef]
  42. Bao, G.; Li, S.; Zhao, F.; Wang, J.; Liu, X.; Hu, J.; Shi, B.; Wen, Y.; Zhao, L.; Luo, Y. Comprehensive Transcriptome Analysis Reveals the Role of lncRNA in Fatty Acid Metabolism in the Longissimus Thoracis Muscle of Tibetan Sheep at Different Ages. Front. Nutr. 2022, 9, 847077. [Google Scholar] [CrossRef]
  43. Song, Y.; Ahn, J.; Suh, Y.; Davis, M.E.; Lee, K. Identification of novel tissue-specific genes by analysis of microarray databases: A human and mouse model. PLoS ONE 2013, 8, e64483. [Google Scholar] [CrossRef] [PubMed]
  44. Avirneni-Vadlamudi, U.; Galindo, K.A.; Endicott, T.R.; Paulson, V.; Cameron, S.; Galindo, R.L. Drosophila and mammalian models uncover a role for the myoblast fusion gene TANC1 in rhabdomyosarcoma. J. Clin. Investig. 2012, 122, 403–407. [Google Scholar] [CrossRef]
  45. Xie, Q.; Zhang, Z.; Chen, Z.; Sun, J.; Li, M.; Wang, Q.; Pan, Y. Integration of Selection Signatures and Protein Interactions Reveals NR6A1, PAPPA2, and PIK3C2B as the Promising Candidate Genes Underlying the Characteristics of Licha Black Pig. Biology 2023, 12, 500. [Google Scholar] [CrossRef]
  46. Schumann, T.; Konig, J.; von Loeffelholz, C.; Vatner, D.F.; Zhang, D.; Perry, R.J.; Bernier, M.; Chami, J.; Henke, C.; Kurzbach, A.; et al. Deletion of the diabetes candidate gene Slc16a13 in mice attenuates diet-induced ectopic lipid accumulation and insulin resistance. Commun. Biol. 2021, 4, 826. [Google Scholar]
  47. Terakado, A.P.N.; Costa, R.B.; de Camargo, G.M.F.; Irano, N.; Bresolin, T.; Takada, L.; Carvalho, C.V.D.; Oliveira, H.N.; Carvalheiro, R.; Baldi, F.; et al. Genome-wide association study for growth traits in Nelore cattle. Animal 2018, 12, 1358–1362. [Google Scholar] [CrossRef] [PubMed]
  48. Ramayo-Caldas, Y.; Ballester, M.; Fortes, M.R.S.; Esteve-Codina, A.; Castelló, A.; Noguera, J.L.; Fernández, A.I.; Pérez-Enciso, M.; Reverter, A.; Folch, J.M. From SNP co-association to RNA co-expression: Novel insights into gene networks for intramuscular fatty acid composition in porcine. BMC Genom. 2014, 15, 232. [Google Scholar] [CrossRef] [PubMed]
  49. Fang, P.; Zhang, L.; Yu, M.; Sheng, Z.; Shi, M.; Zhu, Y.; Zhang, Z.; Bo, P. Activiated galanin receptor 2 attenuates insulin resistance in skeletal muscle of obese mice. Peptides 2018, 99, 92–98. [Google Scholar] [CrossRef]
  50. Mohammadabadi, M.; Bordbar, F.; Jensen, J.; Du, M.; Guo, W. Key Genes Regulating Skeletal Muscle Development and Growth in Farm Animals. Animals 2021, 11, 835. [Google Scholar] [CrossRef]
  51. Goh, B.C.; Singhal, V.; Herrera, A.J.; Tomlinson, R.E.; Kim, S.; Faugere, M.C.; Germain-Lee, E.L.; Clemens, T.L.; Lee, S.J.; DiGirolamo, D.J. Activin receptor type 2A (ACVR2A) functions directly in osteoblasts as a negative regulator of bone mass. J. Biol. Chem. 2017, 292, 13809–13822. [Google Scholar] [CrossRef] [PubMed]
  52. Dong, H.; Zhang, J.; Li, Y.; Ahmad, H.I.; Li, T.; Liang, Q.; Li, Y.; Yang, M.; Han, J. Liver Transcriptome Profiling Identifies Key Genes Related to Lipid Metabolism in Yili Geese. Animals 2023, 13, 3473. [Google Scholar] [CrossRef]
  53. Zhang, D.; Zhang, X.; Li, F.; La, Y.; Li, G.; Zhang, Y.; Li, X.; Zhao, Y.; Song, Q.; Wang, W. The association of polymorphisms in the ovine PPARGC1B and ZEB2 genes with body weight in Hu sheep. Anim. Biotechnol. 2022, 33, 90–97. [Google Scholar] [CrossRef]
  54. Iheshiulor, O.O.M.; Woolliams, J.A.; Yu, X.J.; Wellmann, R.; Meuwissen, T.H.E. Within- and across-breed genomic prediction using whole-genome sequence and single nucleotide polymorphism panels. Genet. Sel. Evol. 2016, 48, 15. [Google Scholar] [CrossRef]
  55. Zhu, D.; Zhao, Y.; Zhang, R.; Wu, H.; Cai, G.; Wu, Z.; Wang, Y.; Hu, X. Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population. Genet. Sel. Evol. 2023, 55, 72. [Google Scholar] [CrossRef] [PubMed]
  56. Ye, S.; Gao, N.; Zheng, R.; Chen, Z.; Teng, J.; Yuan, X.; Zhang, H.; Chen, Z.; Zhang, X.; Li, J.; et al. Strategies for Obtaining and Pruning Imputed Whole-Genome Sequence Data for Genomic Prediction. Front. Genet. 2019, 10, 673. [Google Scholar] [CrossRef]
  57. Song, H.; Dong, T.; Wang, W.; Jiang, B.; Yan, X.; Geng, C.; Bai, S.; Xu, S.; Hu, H. Cost-effective genomic prediction of critical economic traits in sturgeons through low-coverage sequencing. Genomics 2024, 116, 110874. [Google Scholar] [CrossRef]
  58. Song, H.; Xu, S.; Luo, K.; Hu, M.; Luan, S.; Shao, H.; Kong, J.; Hu, H. Estimation of genetic parameters for growth and egg related traits in Russian sturgeon (Acipenser gueldenstaedtii). Aquaculture 2022, 546, 737299. [Google Scholar] [CrossRef]
  59. Li, H.; Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25, 1754–1760. [Google Scholar] [CrossRef]
  60. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R.; Proc, G.P.D. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25, 2078–2079. [Google Scholar] [CrossRef]
  61. McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, K.; Kernytsky, A.; Garimella, K.; Altshuler, D.; Gabriel, S.; Daly, M.; et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20, 1297–1303. [Google Scholar] [CrossRef]
  62. Browning, B.L.; Browning, S.R. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am. J. Hum. Genet. 2009, 84, 210–223. [Google Scholar] [CrossRef] [PubMed]
  63. Chang, C.C.; Chow, C.C.; Tellier, L.C.A.M.; Vattikuti, S.; Purcell, S.M.; Lee, J.J. Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 2015, 4, 7. [Google Scholar] [CrossRef] [PubMed]
  64. Yang, J.; Lee, S.H.; Goddard, M.E.; Visscher, P.M. GCTA: A Tool for Genome-wide Complex Trait Analysis. Am. J. Hum. Genet. 2011, 88, 76–82. [Google Scholar] [CrossRef] [PubMed]
  65. Zhang, C.; Dong, S.S.; Xu, J.Y.; He, W.M.; Yang, T.L. PopLDdecay: A fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics 2019, 35, 1786–1788. [Google Scholar] [CrossRef]
  66. VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008, 91, 4414–4423. [Google Scholar] [CrossRef]
  67. Pe’er, I.; Yelensky, R.; Altshuler, D.; Daly, M.J. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet. Epidemiol. 2008, 32, 381–385. [Google Scholar] [CrossRef]
  68. Wang, X.; Wang, L.; Shi, L.; Zhang, P.; Li, Y.; Li, M.; Tian, J.; Wang, L.; Zhao, F. GWAS of Reproductive Traits in Large White Pigs on Chip and Imputed Whole-Genome Sequencing Data. Int. J. Mol. Sci. 2022, 23, 13338. [Google Scholar] [CrossRef]
  69. Huerta-Cepas, J.; Szklarczyk, D.; Forslund, K.; Cook, H.; Heller, D.; Walter, M.C.; Rattei, T.; Mende, D.R.; Sunagawa, S.; Kuhn, M.; et al. eggNOG 4.5: A hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2016, 44, D286–D293. [Google Scholar] [CrossRef]
  70. Edwards, S.M.; Sorensen, I.F.; Sarup, P.; Mackay, T.F.C.; Sorensen, P. Genomic Prediction for Quantitative Traits Is Improved by Mapping Variants to Gene Ontology Categories in Drosophila melanogaster. Genetics 2016, 203, 1871–1873. [Google Scholar] [CrossRef]
Figure 1. SNP distribution and population structure of Russian sturgeon. (A) Distribution of SNPs in 10 Mb windows across the genome; (B) number of SNPs on each chromosome; (C) principal component analysis showing the first three principal components (PCs); (D) genome-wide LD decay.
Figure 2. Manhattan and QQ plots of the genome-wide association studies for caviar yield, caviar color, and body weight in the Russian sturgeon population. (A,B) Caviar yield; (C,D) caviar color; (E,F) body weight. In the Manhattan plots, the dashed and solid lines indicate the genome-wide and suggestive significance thresholds, respectively. Each dot corresponds to a SNP, and its color indicates its chromosome.
Figure 3. Genomic prediction performance. (A) Accuracy, (B) bias, (C) Mse, and (D) Mae of genomic prediction for caviar yield, caviar color, and body weight traits based on GBLUP, GLDBLUP, and GFBLUP methods.
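
The four panels summarize prediction quality with standard validation statistics. A minimal sketch of how such statistics are commonly computed is given below. The definitions used here (accuracy as the Pearson correlation between predicted and observed values, bias as the slope of the regression of observed on predicted values) are assumptions for illustration and may differ from the exact definitions in the Methods; the function name and the toy numbers are hypothetical.

```python
import numpy as np


def prediction_metrics(observed, predicted):
    """Return accuracy, bias, Mse and Mae for one validation set (assumed definitions)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = observed - predicted
    return {
        "accuracy": float(np.corrcoef(observed, predicted)[0, 1]),  # Pearson correlation
        "bias": float(np.polyfit(predicted, observed, 1)[0]),       # slope of observed on predicted
        "mse": float(np.mean(error ** 2)),                          # mean squared error
        "mae": float(np.mean(np.abs(error))),                       # mean absolute error
    }


# Toy values only, not data from this study.
metrics = prediction_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(metrics)
```

Under these definitions, a bias (slope) close to 1 indicates predictions that are neither inflated nor deflated relative to the observed values.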
Table 1. The descriptive statistics of caviar yield, caviar color, and body weight.
Trait | Number | Mean | SD | CV | Max | Min
Caviar yield | 673 | 0.190 | 0.057 | 30.00% | 0.439 | 0.021
Caviar color | 673 | 2.453 | 0.653 | 26.62% | 4 | 1
Body weight | 673 | 19.933 | 4.029 | 20.21% | 35.400 | 10.400
SD, standard deviation; CV, coefficient of variation.
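
The CV column can be reproduced from the Mean and SD columns as CV = SD / Mean. The short check below assumes this standard definition and uses only the values printed in Table 1; the variable names are illustrative.

```python
# (mean, SD) pairs copied from Table 1
table1 = {
    "Caviar yield": (0.190, 0.057),
    "Caviar color": (2.453, 0.653),
    "Body weight": (19.933, 4.029),
}

# Coefficient of variation in percent, rounded to two decimals
cv_percent = {
    trait: round(100.0 * sd / mean, 2) for trait, (mean, sd) in table1.items()
}

for trait, cv in cv_percent.items():
    print(f"{trait}: CV = {cv:.2f}%")
```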
Table 2. Estimated variance components and heritability for caviar yield, caviar color, and body weight.
Trait | V(G) | V(e) | h2
Caviar yield | 0.00159 | 0.00161 | 0.497
Caviar color | 0.242 | 0.152 | 0.614
Body weight | 11.493 | 6.844 | 0.627
V(G), random polygenic variance; V(e), residual variance; h2, heritability.
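
The heritability column is consistent with the usual ratio h2 = V(G) / (V(G) + V(e)). The check below assumes that definition and uses only the variance components printed in Table 2; the variable names are illustrative.

```python
# (V(G), V(e)) pairs copied from Table 2
table2 = {
    "Caviar yield": (0.00159, 0.00161),
    "Caviar color": (0.242, 0.152),
    "Body weight": (11.493, 6.844),
}

# Heritability as the share of total variance explained by the polygenic term
heritability = {
    trait: round(v_g / (v_g + v_e), 3) for trait, (v_g, v_e) in table2.items()
}

for trait, h2 in heritability.items():
    print(f"{trait}: h2 = {h2:.3f}")
```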
Table 3. Genome-wide significant and suggestive SNPs associated with the caviar yield trait, identified using whole-genome sequencing data.
Chr | SNP_R (bp) | SNP_N | Position_Top (bp) | p Value_Top | Candidate Gene
4 | 43,712,625–43,752,625 | 2 | 43,733,654 | 5.12 × 10⁻⁸ | TFAP2A*
4 | 25,562,496–25,602,496 | 1 | 25,582,496 | 5.87 × 10⁻⁷ | C8orf34
56 | 2,391,026–2,431,026 | 1 | 2,411,026 | 1.18 × 10⁻⁶ | PCOLCE, PFN2, RNF167
9 | 55,004,692–55,044,692 | 1 | 55,024,692 | 1.76 × 10⁻⁶ | RPS6KA3*
46 | 4,382,490–4,422,490 | 1 | 4,402,490 | 2.03 × 10⁻⁶ | CRB3*, DENND1C, TUBB*
44 | 5,849,293–5,889,293 | 1 | 5,869,293 | 2.12 × 10⁻⁶ | ARCN1, H2AFX*, HMBS
1 | 78,013,802–78,053,802 | 1 | 78,033,802 | 2.49 × 10⁻⁶ | RORB
9 | 17,568,460–17,608,460 | 1 | 17,588,460 | 2.49 × 10⁻⁶ | SLC5A7, STK24
9 | 19,246,701–19,286,701 | 1 | 19,266,701 | 2.67 × 10⁻⁶ | SETD4, morc3*
4 | 40,684,761–40,724,761 | 1 | 40,704,761 | 3.98 × 10⁻⁶ | BAG1*, C7orf25
22 | 3,969,669–4,009,669 | 1 | 3,989,669 | 4.10 × 10⁻⁶ | DPY19L3, ZNF507
4 | 36,848,652–36,888,652 | 1 | 36,868,652 | 4.20 × 10⁻⁶ | ABHD3
9 | 17,393,602–17,433,602 | 1 | 17,413,602 | 4.25 × 10⁻⁶ | RANBP2*
12 | 2,846,164–2,886,164 | 1 | 2,866,164 | 4.28 × 10⁻⁶ | CRYBA4, CRYBB1, PLA2G1B*, TPST2
56 | 1,992,379–2,032,379 | 1 | 2,012,379 | 4.64 × 10⁻⁶ | NYAP1*
8 | 25,489,572–25,529,572 | 1 | 25,509,572 | 4.83 × 10⁻⁶ | CCKBR
Chr, chromosome. SNP_R, range of the region containing significant and suggestive SNPs. SNP_N, number of significant and suggestive SNPs. Position_Top, position (bp) of the top SNP within the region. p Value_Top, p value of the top SNP. Genes marked with an asterisk (*) are the potential candidate genes associated with caviar yield, identified through functional annotation with eggNOG-mapper.
Table 4. Genome-wide significant and suggestive SNPs associated with the caviar color trait, identified using whole-genome sequencing data.
Chr | SNP_R (bp) | SNP_N | Position_Top (bp) | p Value_Top | Candidate Gene
22 | 2,458,265–2,498,265 | 1 | 2,478,265 | 4.23 × 10⁻⁸ | OGFOD1
3 | 16,905,242–16,945,242 | 1 | 16,925,242 | 7.84 × 10⁻⁸ | NFX1*, OTULIN*
7 | 63,424,745–63,464,745 | 2 | 63,444,808 | 1.96 × 10⁻⁷ | ALDH18A1, CRYGB, ENTPD1
2 | 19,438,473–19,478,473 | 1 | 19,458,473 | 5.26 × 10⁻⁷ | SRFBP1*
6 | 51,347,092–51,387,092 | 1 | 51,367,092 | 8.82 × 10⁻⁷ | CNRIP1, PLEK*
19 | 28,444,118–28,484,118 | 1 | 28,464,118 | 9.64 × 10⁻⁷ | HIC2
21 | 12,773,362–12,813,362 | 1 | 12,793,362 | 1.26 × 10⁻⁶ | ZFYVE20
1 | 85,267,483–85,307,483 | 1 | 85,287,483 | 1.54 × 10⁻⁶ | HCN1
7 | 10,056,229–10,096,229 | 1 | 10,076,229 | 2.42 × 10⁻⁶ | GPR85
25 | 24,937,562–24,977,562 | 1 | 24,957,562 | 2.45 × 10⁻⁶ | CDK16
6 | 60,924,314–60,964,314 | 1 | 60,944,314 | 2.96 × 10⁻⁶ | FNDC4
3 | 17,152,507–17,192,507 | 1 | 17,172,507 | 3.43 × 10⁻⁶ | INHBA*
1 | 87,476,833–87,516,833 | 1 | 87,496,833 | 3.57 × 10⁻⁶ | NARS*
51 | 4,080,177–4,120,177 | 1 | 4,100,177 | 3.66 × 10⁻⁶ | PLXNB3
20 | 13,829,146–13,869,146 | 1 | 13,849,146 | 4.07 × 10⁻⁶ | TMEM164
5 | 9,376,351–9,416,351 | 1 | 9,396,351 | 4.23 × 10⁻⁶ | FBXL4
36 | 405,705–445,705 | 1 | 425,705 | 4.64 × 10⁻⁶ | APOBEC3G
1 | 53,107,797–53,147,797 | 1 | 53,127,797 | 4.96 × 10⁻⁶ | IQCM
Chr, chromosome. SNP_R, range of the region containing significant and suggestive SNPs. SNP_N, number of significant and suggestive SNPs. Position_Top, position (bp) of the top SNP within the region. p Value_Top, p value of the top SNP. Genes marked with an asterisk (*) are the potential candidate genes associated with caviar color, identified through functional annotation with eggNOG-mapper.
Table 5. Genome-wide significant and suggestive SNPs associated with the body weight trait, identified using whole-genome sequencing data.
Chr | SNP_R (bp) | SNP_N | Position_Top (bp) | p Value_Top | Candidate Gene
12 | 32,000,256–32,040,256 | 5 | 32,035,778 | 3.54 × 10⁻⁸ | BAZ2B
12 | 32,955,799–32,995,799 | 4 | 32,979,846 | 1.95 × 10⁻⁷ | ARL6IP6, PRPF40A
12 | 32,292,886–32,332,886 | 6 | 32,331,576 | 1.96 × 10⁻⁷ | ACVR1*, UPP2
12 | 33,242,523–33,282,523 | 5 | 33,281,526 | 1.97 × 10⁻⁷ | KALRN
12 | 33,735,020–33,775,020 | 3 | 33,755,020 | 2.83 × 10⁻⁷ | GMPPA, PNKD
12 | 33,222,041–33,262,041 | 6 | 33,256,286 | 2.90 × 10⁻⁷ | KALRN
1 | 90,268,253–90,308,253 | 1 | 90,288,253 | 2.93 × 10⁻⁷ | HTR4*
12 | 32,383,120–32,423,120 | 10 | 32,411,784 | 3.09 × 10⁻⁷ | CYTIP, ERMN, GALNT5
12 | 32,313,856–32,353,856 | 5 | 32,333,856 | 4.12 × 10⁻⁷ | ACVR1*
12 | 33,088,046–33,128,046 | 3 | 33,126,288 | 4.89 × 10⁻⁷ | MYLK
12 | 32,359,131–32,399,131 | 4 | 32,379,835 | 5.00 × 10⁻⁷ | CYTIP
12 | 32,989,267–33,029,267 | 3 | 33,009,267 | 5.21 × 10⁻⁷ | fmnl2*
12 | 33,759,093–33,799,093 | 3 | 33,779,093 | 5.90 × 10⁻⁷ | DARS, MCM6, PNKD, TMBIM1
10 | 44,336,019–44,376,019 | 1 | 44,356,019 | 6.05 × 10⁻⁷ | INSIG2*
12 | 35,795,634–35,835,634 | 2 | 35,832,970 | 7.09 × 10⁻⁷ | LYPD6
12 | 32,500,094–32,540,094 | 2 | 32,520,094 | 7.64 × 10⁻⁷ | GPD2*
12 | 32,930,958–32,970,958 | 6 | 32,959,646 | 7.85 × 10⁻⁷ | ARL6IP6
12 | 32,849,926–32,889,926 | 3 | 32,881,001 | 8.75 × 10⁻⁷ | GALNT13
12 | 32,870,402–32,910,402 | 3 | 32,890,482 | 9.58 × 10⁻⁷ | RPRM
12 | 32,520,728–32,560,728 | 2 | 32,541,704 | 1.01 × 10⁻⁶ | NR4A2
17 | 4,638,093–4,678,093 | 1 | 4,658,093 | 1.10 × 10⁻⁶ | ZNF536
12 | 33,874,939–33,914,939 | 7 | 33,894,939 | 1.11 × 10⁻⁶ | THSD7B
12 | 36,126,431–36,166,431 | 1 | 36,146,431 | 1.12 × 10⁻⁶ | UBXN4, enc
12 | 34,918,833–34,958,833 | 4 | 34,956,534 | 1.20 × 10⁻⁶ | GTDC1
16 | 20,101,682–20,141,682 | 1 | 20,121,682 | 1.23 × 10⁻⁶ | DNAJC17
12 | 32,335,824–32,375,824 | 4 | 32,363,598 | 1.27 × 10⁻⁶ | ACVR1C*
15 | 11,074,569–11,114,569 | 1 | 11,094,569 | 1.32 × 10⁻⁶ | ADAP1, COX19
12 | 31,713,257–31,753,257 | 2 | 31,747,294 | 1.44 × 10⁻⁶ | TBR1
12 | 31,739,419–31,779,419 | 1 | 31,759,419 | 1.63 × 10⁻⁶ | PSMD14, TBR1
12 | 33,184,361–33,224,361 | 3 | 33,213,805 | 1.79 × 10⁻⁶ | KALRN, ROPN1
12 | 32,082,038–32,122,038 | 2 | 32,117,571 | 1.82 × 10⁻⁶ | TANC1*
12 | 31,495,949–31,535,949 | 2 | 31,527,074 | 1.83 × 10⁻⁶ | KCNH7*
45 | 2,422,442–2,462,442 | 1 | 2,442,442 | 1.90 × 10⁻⁶ | RNF39, ZKSCAN8
56 | 338,814–378,814 | 1 | 358,814 | 2.14 × 10⁻⁶ | BCL6B, SLC16A13*
12 | 32,031,529–32,071,529 | 3 | 32,051,529 | 2.14 × 10⁻⁶ | WDSUB1
4 | 29,653,490–29,693,490 | 1 | 29,673,490 | 2.19 × 10⁻⁶ | XKR4*
12 | 36,422,230–36,462,230 | 1 | 36,442,230 | 2.20 × 10⁻⁶ | ESYT3, FAIM
12 | 32,158,497–32,198,497 | 5 | 32,181,649 | 2.21 × 10⁻⁶ | TANC1*
4 | 19,975,042–20,015,042 | 1 | 19,995,042 | 2.29 × 10⁻⁶ | TOP1MT
1 | 112,313,498–112,353,498 | 1 | 112,333,498 | 2.33 × 10⁻⁶ | ARID3C
37 | 1,687,436–1,727,436 | 1 | 1,707,436 | 2.35 × 10⁻⁶ | AMIGO1, CYB561D1
17 | 30,419,784–30,459,784 | 1 | 30,439,784 | 2.47 × 10⁻⁶ | GALR2*
21 | 24,163,402–24,203,402 | 1 | 24,183,402 | 2.48 × 10⁻⁶ | AKAP14, NDUFA1, NKAP, RPL39*, SOWAHD, UPF3B
12 | 36,200,620–36,240,620 | 1 | 36,220,620 | 2.57 × 10⁻⁶ | MAP3K19, RAB3GAP1
12 | 34,895,887–34,935,887 | 3 | 34,932,449 | 2.70 × 10⁻⁶ | GTDC1
12 | 32,262,572–32,302,572 | 2 | 32,299,191 | 2.86 × 10⁻⁶ | CCDC148, UPP2
12 | 36,763,187–36,803,187 | 1 | 36,783,187 | 2.90 × 10⁻⁶ | DDX18, Htr5b
12 | 35,557,188–35,597,188 | 1 | 35,577,188 | 3.02 × 10⁻⁶ | ACVR2A*
12 | 34,667,954–34,707,954 | 1 | 34,687,954 | 3.04 × 10⁻⁶ | KYNU
12 | 32,780,874–32,820,874 | 1 | 32,800,874 | 3.09 × 10⁻⁶ | KCNJ3
12 | 32,198,180–32,238,180 | 2 | 32,218,180 | 3.11 × 10⁻⁶ | PKP4, dapl1
12 | 31,634,734–31,674,734 | 1 | 31,654,734 | 3.19 × 10⁻⁶ | ADCY10*, GCG
6 | 79,213,869–79,253,869 | 1 | 79,233,869 | 3.33 × 10⁻⁶ | MARCKS
12 | 31,950,349–31,990,349 | 1 | 31,970,349 | 3.44 × 10⁻⁶ | MARCH7
12 | 36,500,960–36,540,960 | 1 | 36,520,960 | 3.54 × 10⁻⁶ | C2orf76, DBI, STEAP3, TMEM37
12 | 34,988,988–35,028,988 | 1 | 35,008,988 | 3.69 × 10⁻⁶ | ZEB2*
4 | 34,744,439–34,784,439 | 1 | 34,764,439 | 3.75 × 10⁻⁶ | B4GALT6, TTR
41 | 5,614,428–5,654,428 | 1 | 5,634,428 | 3.82 × 10⁻⁶ | INA
12 | 33,800,017–33,840,017 | 1 | 33,820,017 | 4.56 × 10⁻⁶ | CXCR4
Chr, chromosome. SNP_R, range of the region containing significant and suggestive SNPs. SNP_N, number of significant and suggestive SNPs. Position_Top, position (bp) of the top SNP within the region. p Value_Top, p value of the top SNP. Genes marked with an asterisk (*) are the potential candidate genes associated with body weight, identified through functional annotation with eggNOG-mapper.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Song, H.; Dong, T.; Wang, W.; Yan, X.; Geng, C.; Bai, S.; Hu, H. GWAS Enhances Genomic Prediction Accuracy of Caviar Yield, Caviar Color and Body Weight Traits in Sturgeons Using Whole-Genome Sequencing Data. Int. J. Mol. Sci. 2024, 25, 9756. https://doi.org/10.3390/ijms25179756

AMA Style

Song H, Dong T, Wang W, Yan X, Geng C, Bai S, Hu H. GWAS Enhances Genomic Prediction Accuracy of Caviar Yield, Caviar Color and Body Weight Traits in Sturgeons Using Whole-Genome Sequencing Data. International Journal of Molecular Sciences. 2024; 25(17):9756. https://doi.org/10.3390/ijms25179756

Chicago/Turabian Style

Song, Hailiang, Tian Dong, Wei Wang, Xiaoyu Yan, Chenfan Geng, Song Bai, and Hongxia Hu. 2024. "GWAS Enhances Genomic Prediction Accuracy of Caviar Yield, Caviar Color and Body Weight Traits in Sturgeons Using Whole-Genome Sequencing Data" International Journal of Molecular Sciences 25, no. 17: 9756. https://doi.org/10.3390/ijms25179756
