Integrated Analysis Reveals Genetic Basis of Growth Curve Parameters in an F2 Designed Pig Population Based on Genome and Transcriptome Data

Key Laboratory of Agricultural Animal Genetics, Breeding, and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan 430070, China
Authors to whom correspondence should be addressed.
Agriculture 2024, 14(10), 1704
Submission received: 29 July 2024 / Revised: 25 September 2024 / Accepted: 26 September 2024 / Published: 28 September 2024
(This article belongs to the Section Farm Animal Production)


Appropriate growth curves can reflect the growth patterns of animals in more detail than body weight alone; thus, identifying genes and variants related to growth curve parameter traits helps to reveal the fine growth and development characteristics of livestock. However, the ability of a single genome-wide association study (GWAS) or transcriptome analysis to identify valuable genes and variants is limited. In this study, the growth curve parameter traits of hybrid pigs were analyzed based on genome and transcriptome data, and a set of genes and variants was identified. The Gompertz–Laird growth curve model, an optimized form of the Gompertz model, was fitted to reveal the growth pattern of F2 individuals of Duroc × Erhualian pigs over four time points. Five growth parameters were estimated: initial body weight (W0), instantaneous growth rate per day (L), coefficient of relative growth or maturing index (k), body weight at the inflection point (Wi), and average growth rate (GR). These five parameters were subjected to GWAS, differential gene expression analysis, and weighted gene co-expression network analysis (WGCNA). In total, 336 pigs were genotyped, and 39,494 SNP markers per pig were used in the analysis; 30 of these pigs were also included in the transcriptome analysis. The integrated analyses of genome and transcriptome data identified five putative SNPs, corresponding to four unique markers because INRA0056460 on chromosome X was detected for both W0 and L (INRA0056460 on chromosome X, DRGA0004151 on chromosome 3, H3GA0049324 on chromosome 17, and H3GA0037747 on chromosome 13), and 15 candidate genes (PDGFA, VEGFD, CSPP1, EFHC1, PIK3C3, ZZZ3, GCC2, MAPK14, ZPR1, ISG15, ANG, CEBPD, ZHX3, CTBP2, and MYNN). Functional analysis indicated that these candidate genes play important roles in cell division and differentiation, development and aging, and skeletal muscle and fat formation.
Our results provide insight into the genetic mechanisms underlying the growth and development of hybrid pigs and offer a theoretical basis for genomic breeding.

1. Introduction

Growth is a complex quantitative trait with moderate-to-high heritability in pigs, and it is of great economic value [1,2]. Improving growth or fattening efficiency is an important goal of pig genetic improvement. It has been reported that the fattening efficiency of pigs is significantly affected by the growth performance of piglets [3]. Several genetic studies have addressed immunity in piglets [4,5,6]; however, the genetic basis of the growth pattern of piglets remains largely unexplored. Traditional genetic studies of growth traits analyze either body weight at representative time points or growth performance indicators such as average daily gain [7,8]. Although the quantitative trait loci (QTLs) and genes detected by these two approaches are helpful for animal breeding, they are not adequate to elucidate animal growth patterns. Because animal growth is nonlinear, growth patterns can be described by nonlinear functions, and the functions that fit growth data well are referred to as growth curves [9]. The parameters of these curves, such as instantaneous growth rate and maturing index, capture the growth characteristics of the animal, and genetic analysis based on these parameters helps to reveal the genetic basis of animal growth patterns [10].
The Gompertz model has been reported to be one of the most commonly used growth curve models [11]. The complete growth curve of a pig is close to S-shaped, and the Gompertz model describes the growth pattern of pigs well. However, the original Gompertz model depends on mature weight and therefore cannot fit the growth curve well over a specific period; researchers thus developed an optimized model, the Gompertz–Laird model [12,13]. The Gompertz–Laird model is a flexible Gompertz model with a variable inflection point that does not depend on mature weight [14]. It has been successfully used to fit growth data of poultry, livestock, and marine animals [14,15,16].
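For reference, the Gompertz–Laird model is often written in the form below. This is a sketch of the standard parameterization and is assumed, not confirmed, to match the paper's Table 1: W(t) is body weight at age t, W0 the initial weight, L the initial specific growth rate, and k the rate at which that specific growth rate decays. The derivation shows why mature weight is not a fitted parameter: the asymptote W0·e^(L/k) is implied by the three parameters, and the inflection point follows from setting the second derivative to zero.

```latex
\begin{aligned}
W(t) &= W_0 \exp\!\left[\frac{L}{k}\left(1 - e^{-kt}\right)\right] \\
\frac{1}{W}\frac{dW}{dt} &= L\,e^{-kt}, \qquad \lim_{t \to \infty} W(t) = W_0\,e^{L/k} \\
\frac{d^2W}{dt^2} &= W\,L\,e^{-kt}\left(L\,e^{-kt} - k\right) = 0
  \;\Rightarrow\; t_i = \frac{1}{k}\ln\frac{L}{k}, \qquad W_i = W_0 \exp\!\left(\frac{L}{k} - 1\right)
\end{aligned}
```

The average growth rate (GR) is not derived here, because its definition is given only in Table 1.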
Genome-wide association studies (GWAS) have been widely used to reveal the genetic basis of growth patterns in common poultry and livestock species such as chickens, cattle, pigs, and rabbits [10,14,17,18]. However, given the complex genetic architecture of growth traits, the statistical power of the traditional mixed linear model (MLM) is limited [19]. Liu et al. introduced FarmCPU, an improved multi-locus model that increased statistical power and detected more candidate genes [20,21,22]. Subsequently, Huang et al. developed BLINK, which further improved the computational efficiency and statistical power of GWAS [23]. Despite the development of these powerful models, no single GWAS model is universally suitable for growth traits with complex genetic architecture. Hence, combining these models for the genetic analysis of complex traits has become increasingly popular [24,25].
GWAS uses genomic information to identify genetic variants associated with a target trait. However, the genomic region around a significant variant often contains more than one gene, which hampers the identification of candidate genes. An effective solution is to identify candidate genes based on both genome and transcriptome data [25,26]. Therefore, to reveal the genetic basis of the growth pattern of piglets, this study integrated genomic and transcriptomic data for the genetic analysis of pig growth curve parameters during the corresponding period. This research helps fill the gap in knowledge regarding the genetic basis of the piglet growth pattern and provides valuable information for improving pig growth efficiency.

2. Materials and Methods

2.1. Ethics Statement

All animal experiments were approved by the Science Ethics Committee of Huazhong Agricultural University, China, and were conducted following the relevant protocols and guidelines (approval number: HZAUSW-2018-008).

2.2. Animal and Genotyping Data

The population used in this study was a hybrid F2 population of 393 pigs derived from a cross between Duroc and Erhualian pigs [27]. All animals were raised by Qingyuan Wenshi Pig Breeding Technology Co., Ltd. (Qingyuan City, Guangdong Province, China). The body weight of each pig was measured at four time points: days 1, 21, 35, and 80.
Genomic DNA was extracted from frozen ear or tail tissue samples of 336 pigs. All samples were genotyped with the Illumina PorcineSNP60 BeadChip, yielding a total of 62,163 SNPs. Quality control (QC) of the resulting SNPs was performed using PLINK 1.9 [28]. SNPs with a call rate < 0.95 were excluded, missing genotypes were then imputed using Beagle 5.4 [29], and SNPs with a minor allele frequency (MAF) < 0.01 were subsequently excluded. After genotype data processing, 39,494 SNPs were retained for subsequent analysis.
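The two SNP filters above can be sketched in plain Python as follows. This is an illustrative re-implementation, not PLINK 1.9 code; in PLINK the corresponding options would typically be `--geno 0.05` and `--maf 0.01`. Genotypes are assumed to be coded as 0/1/2 allele counts with `None` for missing calls, the Beagle imputation step is not reproduced (so MAF is computed from observed calls only), and the SNP IDs are hypothetical.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls for one SNP."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF from observed 0/1/2 allele counts (missing calls ignored)."""
    observed = [g for g in calls if g is not None]
    p = sum(observed) / (2 * len(observed))
    return min(p, 1.0 - p)


def qc_snps(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Return IDs of SNPs passing the call-rate and MAF filters, in input order."""
    kept = []
    for snp, calls in genotypes.items():
        if call_rate(calls) < min_call_rate:
            continue  # too many missing calls
        if minor_allele_frequency(calls) < min_maf:
            continue  # (nearly) monomorphic
        kept.append(snp)
    return kept


if __name__ == "__main__":
    genotypes = {
        "snp_a": [0, 1, 2, 1] * 25,       # common variant, no missing calls
        "snp_b": [0] * 90 + [None] * 10,  # call rate 0.90
        "snp_c": [0] * 100,               # monomorphic
        "snp_d": [0] * 99 + [1],          # MAF 0.005
        "snp_e": [0] * 96 + [1] * 4,      # MAF 0.02
    }
    print(qc_snps(genotypes))
```

In a real pipeline the MAF filter would be applied to the imputed genotypes, as described in the text; the order of the two checks here mirrors that sequence.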

2.3. Animals and Transcriptome Data

This study used transcriptome data published in our previous studies [27,30]. RNA was extracted from blood samples collected from the anterior vena cava of piglets at 33 days of age, and RNA labeling and hybridization were performed by a commercial Affymetrix array service company (GeneTech Biotechnology Limited Company, Shanghai, China). Raw expression data were processed and normalized using the Robust Multichip Average (RMA) method [31]. The Affymetrix Pig microarray was annotated using the Affymetrix Pig Genome Array Annotation file (Release 36, accessed on 26 February 2024). The pipeline was run in R: the affy package was first used to process the chip data, and the annotation file was then used to obtain the gene expression matrix [32]. The transcriptome data were derived from 30 genotyped F2 hybrid pigs, and 10,261 genes were annotated.

2.4. Growth Curve Construction

The Gompertz–Laird model was fitted to the body weight records of the 336 genotyped pigs. Five estimated growth curve parameters were used as parameter traits: initial body weight (W0), instantaneous growth rate per day (L), coefficient of relative growth or maturing index (k), body weight at the inflection point (Wi), and average growth rate (GR). The equations of the Gompertz–Laird model and the growth curve parameters are presented in Table 1.
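A minimal sketch of fitting the Gompertz–Laird curve to one pig's four weighings is shown below. It assumes the standard form W(t) = W0·exp[(L/k)(1 − e^(−kt))] and uses a profile grid search over k with closed-form least squares for ln W0 and L/k on the log scale. This fitting technique is swapped in for illustration only (the text does not state the fitting routine), and it minimizes error on the log-weight scale rather than the raw weight scale. The weights are simulated from hypothetical parameters, not data from this study, and GR is omitted because its definition appears only in Table 1.

```python
import math


def gompertz_laird(t, w0, L, k):
    """Body weight at age t (days) under W(t) = W0 * exp((L/k) * (1 - exp(-k*t)))."""
    return w0 * math.exp((L / k) * (1.0 - math.exp(-k * t)))


def fit_gompertz_laird(ages, weights, k_grid=None):
    """Return (W0, L, k) minimizing squared error of ln W over a grid of k.

    For a fixed k, ln W(t) = ln W0 + (L/k) * x with x = 1 - exp(-k*t),
    so ln W0 and L/k follow from ordinary linear regression on x.
    """
    if k_grid is None:
        k_grid = [i / 10000 for i in range(10, 1001)]  # k from 0.001 to 0.1 per day
    y = [math.log(w) for w in weights]
    n = len(ages)
    best = None
    for k in k_grid:
        x = [1.0 - math.exp(-k * t) for t in ages]
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        b = sxy / sxx          # slope = L / k
        a = my - b * mx        # intercept = ln W0
        rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or rss < best[0]:
            best = (rss, math.exp(a), b * k, k)
    _, w0, L, k = best
    return w0, L, k


def inflection_point(w0, L, k):
    """Age t_i and weight W_i at the inflection point (requires L > k)."""
    return math.log(L / k) / k, w0 * math.exp(L / k - 1.0)


if __name__ == "__main__":
    ages = [1, 21, 35, 80]  # weighing days used in the study
    # Simulated weights from hypothetical parameters W0 = 1.3 kg, L = 0.08, k = 0.02.
    weights = [gompertz_laird(t, 1.3, 0.08, 0.02) for t in ages]
    w0, L, k = fit_gompertz_laird(ages, weights)
    t_i, w_i = inflection_point(w0, L, k)
    print(f"W0={w0:.2f} kg, L={L:.3f}/day, k={k:.3f}/day, t_i={t_i:.1f} d, Wi={w_i:.1f} kg")
```

A nonlinear least-squares routine on the raw weight scale (for example, R's `nls`) would be the more conventional choice; the grid version is used here only because it is dependency-free and easy to verify.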

2.5. Weighted Gene Co-Expression Network Analysis (WGCNA)

We used the R package WGCNA to construct the gene co-expression network [33]. First, the 10,261 genes and 30 samples passed the initial data availability check, and three outlier samples were removed by sample clustering. Second, the optimal soft threshold was selected; based on it, an adjacency matrix (power = 6) was constructed to describe the correlation strength between nodes and was then transformed into a topological overlap matrix (TOM). The TOM quantifies the similarity between two nodes by comparing their weighted correlations with all other nodes. Next, hierarchical clustering was performed to identify modules, each containing at least 30 genes (minClusterSize = 30). We then calculated the similarity of module eigengenes, hierarchically clustered the modules, and merged similar modules (eigengene dissimilarity cut height = 0.2). A module was defined as a gene set with high topological overlap similarity, and genes in the same module tend to be highly co-expressed. Finally, we identified the co-expression modules most associated with the target traits and selected the most important genes in these modules as WGCNA candidate genes (WCGs). Candidate genes were identified using the correlation between in-module gene expression and the module eigengene (module membership, MM) and the correlation between in-module gene expression and the corresponding phenotype (gene significance, GS), with thresholds of MM > 0.8 and GS > 0.2 [34].
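The adjacency, TOM, and MM/GS screening steps described above can be sketched in plain Python as follows. This is a toy re-implementation for small matrices, not the WGCNA package code. It assumes an unsigned network (adjacency |cor|^β), the standard unsigned TOM formula, and a module eigengene that has already been computed (WGCNA derives it as the first principal component of the module's expression). The MM and GS thresholds are applied as written in the text; many analyses compare absolute values instead. Gene names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(profiles, power=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)|**power with a zero diagonal.

    The zero diagonal makes the sums in tom() exclude self-connections.
    """
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(profiles[i], profiles[j])) ** power
    return adj


def tom(adj):
    """Unsigned topological overlap: (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    conn = [sum(row) for row in adj]  # k_i: connectivity of node i
    out = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n))  # l_ij
                out[i][j] = (shared + adj[i][j]) / (min(conn[i], conn[j]) + 1.0 - adj[i][j])
    return out


def module_candidates(module_expr, eigengene, trait, mm_min=0.8, gs_min=0.2):
    """Genes with MM = cor(gene, eigengene) > mm_min and GS = cor(gene, trait) > gs_min."""
    return [
        gene
        for gene, values in module_expr.items()
        if pearson(values, eigengene) > mm_min and pearson(values, trait) > gs_min
    ]
```

Dissimilarity for clustering would then be 1 − TOM; the package additionally handles large matrices blockwise, which this sketch does not attempt.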

2.6. Differential Gene Expression Analysis

For each parameter trait, all individuals with both transcriptome and genome information (30 pigs) were ranked by trait value. The ten individuals at each end of the distribution (20 pigs in total) were selected and assigned to high (treatment) and low (control) groups for differential gene expression analysis. Differentially expressed genes (DEGs) were identified with the empirical Bayes moderated t-test in the limma package, using thresholds of p < 0.05 and |log2 fold change (log2FC)| > 0.5 [35].
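The grouping and thresholding steps can be sketched as follows. The sketch covers only the extreme-group split and the dual cut-off; the moderated t-test itself is left to limma and is represented by an input p-value. Expression values are assumed to be on the log2 scale, as produced by Robust Multichip Average (RMA) normalization, so log2FC is simply the difference between group means. Animal IDs and values are hypothetical.

```python
def extreme_groups(trait, n_each=10):
    """Return (high, low) lists of animal IDs: the n_each highest and lowest trait values."""
    ranked = sorted(trait, key=trait.get)  # ascending by trait value
    return ranked[-n_each:], ranked[:n_each]


def log2_fold_change(expr_log2, high, low):
    """Mean log2 expression of the high group minus that of the low group."""
    mean_high = sum(expr_log2[a] for a in high) / len(high)
    mean_low = sum(expr_log2[a] for a in low) / len(low)
    return mean_high - mean_low


def is_deg(p_value, log2fc, p_max=0.05, lfc_min=0.5):
    """Apply the thresholds used in the text: p < 0.05 and |log2FC| > 0.5."""
    return p_value < p_max and abs(log2fc) > lfc_min


if __name__ == "__main__":
    trait = {f"pig{i:02d}": float(i) for i in range(30)}  # hypothetical trait values
    high, low = extreme_groups(trait)
    expr = {a: (8.0 if a in high else 7.0) for a in trait}  # one hypothetical gene
    lfc = log2_fold_change(expr, high, low)
    print(len(high), len(low), lfc, is_deg(0.01, lfc))
```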

2.7. Genome-Wide Association Analysis

In this study, GWAS was performed using the FarmCPU [20] and BLINK [23] models implemented in the R packages rMVP [36] and GAPIT [37]. Two genome-wide thresholds were used, p = 0.05/N for significance and p = 0.01/N for extreme significance, where N is the number of SNPs. To avoid overlooking potential linkage signals, the suggestive threshold was set at p = 1/N. An intersection strategy was employed to reduce false positives: SNPs reaching at least the suggestive threshold in both models (FarmCPU and BLINK) were defined as the final putative SNPs. Genes located within 1 Mb upstream or downstream of each putative SNP were retrieved using the biomaRt (v2.58.2) package [38] and defined as significant GWAS genes (SGGs).
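The threshold arithmetic and the intersection and window steps can be sketched in plain Python. With N = 39,494 SNPs, the three thresholds correspond to −log10(p) of about 5.90 (significant), 6.60 (extremely significant), and 4.60 (suggestive). The sketch assumes the per-model p-values have already been produced by rMVP or GAPIT; SNP IDs, gene names, and coordinates are hypothetical, and the gene lookup stands in for the biomaRt query rather than reproducing it.

```python
import math


def thresholds(n_snps):
    """Bonferroni-style genome-wide thresholds used in the text."""
    return {
        "significant": 0.05 / n_snps,
        "extremely_significant": 0.01 / n_snps,
        "suggestive": 1.0 / n_snps,
    }


def putative_snps(p_farmcpu, p_blink, n_snps):
    """SNPs that reach at least the suggestive threshold in both models."""
    cut = thresholds(n_snps)["suggestive"]
    hits_farmcpu = {snp for snp, p in p_farmcpu.items() if p < cut}
    hits_blink = {snp for snp, p in p_blink.items() if p < cut}
    return hits_farmcpu & hits_blink


def genes_near(chrom, pos, genes, window=1_000_000):
    """Genes on the same chromosome overlapping [pos - window, pos + window]."""
    return [
        name
        for name, (g_chrom, start, end) in genes.items()
        if g_chrom == chrom and start <= pos + window and end >= pos - window
    ]


if __name__ == "__main__":
    for label, p in thresholds(39_494).items():
        print(f"{label}: p < {p:.3g} (-log10 p > {-math.log10(p):.2f})")
```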

2.8. Gene Ontology Enrichment Analysis

Gene Ontology (GO) enrichment analysis was carried out using the clusterProfiler package [39]. For each trait, WCGs, DEGs, and SGGs were merged into an overall gene set, on which GO enrichment analysis was performed. GO terms containing SGGs were then screened for candidate gene identification.
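clusterProfiler's over-representation analysis is based on a hypergeometric test followed by multiple-testing correction (Benjamini–Hochberg by default in `enrichGO`). The sketch below re-implements that idea in plain Python for illustration only: it is not clusterProfiler's code, it ignores the GO graph structure and term-size filters, and the gene universe and term memberships are hypothetical.

```python
from math import comb


def hypergeom_sf(hits, universe_size, term_size, list_size):
    """P(X >= hits) for X ~ Hypergeometric(universe_size, term_size, list_size)."""
    upper = min(term_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(universe_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(universe_size, list_size)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def go_enrichment(gene_set, term_genes, universe):
    """Return {term: (p, p_adj, overlapping genes)} for terms hit by gene_set."""
    genes = set(gene_set) & universe
    terms, pvals, overlaps = [], [], []
    for term, members in term_genes.items():
        members = set(members) & universe
        hits = genes & members
        if hits:
            terms.append(term)
            overlaps.append(hits)
            pvals.append(hypergeom_sf(len(hits), len(universe), len(members), len(genes)))
    padj = benjamini_hochberg(pvals)
    return {t: (p, q, h) for t, p, q, h in zip(terms, pvals, padj, overlaps)}
```

Returning the overlapping genes per term is what makes the later step possible: it lets one check which terms contain SGGs.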

2.9. Identification of Candidate Genes

The GO terms containing SGGs were named SGH-GO terms (Figure S2). Genes involved in more trait-associated biological processes may play a more important role in regulating the trait. Therefore, for each trait, the five genes with the highest enrichment frequency (term ratio) in the biological process (BP) SGH-GO terms were defined as candidate genes, and genes sharing the enrichment frequency of the fifth-ranked gene were also defined as candidate genes (Figure S2).
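The SGH-GO screening and the top-five-with-ties rule can be sketched as follows. The sketch assumes that "enrichment frequency (term ratio)" is the number of BP SGH-GO terms containing a gene divided by the number of such terms; this reading is an assumption, but because ranking by this ratio is the same as ranking by the raw count, the selected genes do not depend on the exact denominator. Term and gene names are hypothetical.

```python
from collections import Counter


def sgh_terms(bp_terms, sggs):
    """Keep BP GO terms (term -> set of enriched genes) that contain at least one SGG."""
    return {term: genes for term, genes in bp_terms.items() if genes & sggs}


def candidate_genes(terms, top_n=5):
    """Top-n genes by term ratio, plus every gene tied with the n-th ranked gene."""
    counts = Counter(gene for genes in terms.values() for gene in genes)
    if not counts:
        return []
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    cutoff = counts[ranked[min(top_n, len(ranked)) - 1]]
    return [(g, counts[g] / len(terms)) for g in ranked if counts[g] >= cutoff]
```

For example, if three SGH-GO terms remain and three genes each appear in two of them while three more appear in one, the fifth-ranked gene has a count of one, so all six genes are kept.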

3. Results

3.1. Growth Curve Fitting and Definition of Parameter Traits

The goodness-of-fit indices (R² and adjusted R²) of the Gompertz–Laird model were higher than 0.99 (Table 1), indicating that the growth curves were properly fitted by this nonlinear growth curve model [40]. Descriptions of the growth curve parameter traits are presented in Table 2. The heritabilities of the five parameter traits, estimated using the BGLR package, ranged from 0.31 to 0.49 [41]. Histograms of the five parameter traits are shown in Figure S1; all five traits followed a normal distribution. Pearson correlations among the growth curve parameters are shown in Figure 1. Some studies treat traits with a correlation coefficient > 0.8 as the same trait [42], whereas other researchers disagree with this view [14]; we therefore analyzed the five traits separately.
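For reference, the two fit indices can be computed as below. R² is the usual 1 − RSS/TSS. For adjusted R², this sketch assumes the common nonlinear-regression form 1 − (1 − R²)(n − 1)/(n − p) with p = 3 Gompertz–Laird parameters; that choice is an assumption, since Table 1 may use a different denominator or pool records across pigs. The example values are made up.

```python
def fit_indices(observed, fitted, n_params=3):
    """Return (R^2, adjusted R^2) for one fitted growth curve."""
    n = len(observed)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    mean = sum(observed) / n
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    return r2, adj_r2
```

With four weighings per pig and three parameters, this form multiplies 1 − R² by 3, which is why the definition used matters when comparing reported values.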

3.2. Significant SNPs and Nearby Genes Identified by GWAS

Significant SNPs associated with the five traits were identified separately with two GWAS models, FarmCPU and BLINK (Figure 2). To reduce false positives, SNPs that were significant or suggestive in both FarmCPU and BLINK were defined as putative SNPs. A total of five putative SNPs were obtained, corresponding to four unique markers because INRA0056460 on chromosome X was detected for two traits (INRA0056460 on chromosome X, DRGA0004151 on chromosome 3, H3GA0049324 on chromosome 17, and H3GA0037747 on chromosome 13). These putative SNPs associated with the growth curve parameters and the 30 genes near them are listed in Table 3. For the parameter trait initial body weight (W0), one genome-wide significant SNP (INRA0056460) was detected on chromosome X; it was extremely significant in both FarmCPU and BLINK. For the parameter trait body weight at the inflection point (Wi), H3GA0037747 was detected on chromosome 13.
For the parameter trait instantaneous growth rate (L), three genome-wide significant SNPs (DRGA0004151, H3GA0049324, and INRA0056460) were detected on chromosomes 3, 17, and X, respectively. DRGA0004151 was significant in FarmCPU but only suggestive in BLINK, whereas INRA0056460 was significant in FarmCPU and extremely significant in BLINK. Notably, H3GA0049324 was associated with both L and k and was extremely significant in both FarmCPU and BLINK, suggesting that it might play an important role in the growth and development of Duroc × Erhualian F2 pigs. No putative SNP was identified for the parameter trait average growth rate (GR), which might be due to the small sample size.

3.3. WGCNA Candidate Genes and Differentially Expressed Genes Identified by Transcriptome Analysis

After data preprocessing, an expression matrix of 10,261 genes was obtained. Samples were clustered using hierarchical clustering, and 27 samples were retained after outlier removal; the resulting sample clustering tree is shown in Figure 3A. To construct a scale-free network, the soft threshold was set to 6 according to the scale-free topology fit index (Figure 3B) and the mean connectivity (Figure 3C) at various soft thresholds. The adjacency matrix and the topological overlap matrix were then constructed. Finally, 32 modules were identified using average-linkage hierarchical clustering and the Dynamic Tree Cut method (Figure 3D). The one or two modules most relevant to each trait were selected to identify WGCNA candidate genes (WCGs) (Figure 4). The identified candidate genes are presented in Tables S1–S5.
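The soft-threshold choice relies on how closely the connectivity distribution follows a power law. The sketch below is a simplified version of WGCNA's scale-free fit index: connectivities are split into 10 equal-width bins, log10 of each bin's frequency is regressed on log10 of its mean connectivity, and the signed index −sign(slope)·R² is returned. Unlike the package, empty bins are skipped here. A common heuristic, matching the default `RsquaredCut` of `pickSoftThreshold`, is the lowest power whose index reaches 0.85. The data used to exercise it are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def connectivity(profiles, power):
    """k_i = sum over j != i of |cor(x_i, x_j)|**power (unsigned network)."""
    n = len(profiles)
    k = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(profiles[i], profiles[j])) ** power
            k[i] += a
            k[j] += a
    return k


def scale_free_fit(k, n_bins=10):
    """Signed scale-free fit index -sign(slope) * R^2 of log10 p(k) ~ log10 k."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal connectivity
    bins = [[] for _ in range(n_bins)]
    for value in k:
        bins[min(int((value - lo) / width), n_bins - 1)].append(value)
    xs, ys = [], []
    for b in bins:
        if b and sum(b) > 0:  # simplification: skip empty bins
            xs.append(math.log10(sum(b) / len(b)))
            ys.append(math.log10(len(b) / len(k)))
    if len(xs) < 2:
        return 0.0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    r2 = sxy ** 2 / (sxx * syy)
    return -math.copysign(r2, sxy)  # the slope has the sign of sxy
```

Scanning powers 1, 2, ... and recording both this index and the mean of `connectivity(...)` reproduces the two quantities plotted when a soft threshold such as 6 is chosen.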
For each trait, all individuals were ranked by trait value, and the top ten and bottom ten individuals were assigned to high-value and low-value groups, respectively, for differential gene expression analysis. In the high-value vs. low-value comparison, 289, 109, 68, 128, and 46 DEGs were identified for W0, L, k, Wi, and GR, respectively (Figure 5; Tables S6–S10).

3.4. Integrated Analysis of Genome and Transcriptome Data and Gene Function Annotation

Because a single-omics analysis rarely reveals the genetic basis of a trait thoroughly, we combined genomic and transcriptomic data to identify key genes influencing the parameter traits. First, WGCNA candidate genes (WCGs), differentially expressed genes (DEGs), and significant GWAS genes (SGGs) were integrated into an overall gene set, and GO enrichment analysis was performed on this set (Tables S11–S15). Here, the genes surrounding each putative SNP were defined as SGGs. GO terms containing SGGs were then screened for the identification of candidate genes; these terms hit by SGGs were named SGH-GO terms.
For traits W0, L, k, and Wi, 6, 16, 4, and 2 SGH-GO terms were retained, respectively (Tables S16–S19). GO enrichment analysis showed that the selected genes were involved in multiple biological processes related to growth and development, such as cell division, osteoblast differentiation, protein transport, and immune-related processes (mainly the adaptive immune response and leukemia-related processes) (Figure 6). For each trait, the five genes with the highest enrichment frequency in the biological process (BP) SGH-GO terms, together with any genes tied with them in enrichment frequency, were selected as candidate genes (Table 4). A total of 15 candidate genes were identified. Traits L and k were highly correlated (r = 0.9), and the candidate genes related to trait k were contained in the SGH-GO term gene set related to trait L. However, the candidate genes for the two traits differed, suggesting that highly correlated traits may still have different genetic bases. Interestingly, the candidate genes derived from DEGs were up-regulated in the high-value group, suggesting a positive role of these genes in pig growth and development (Table S20).

4. Discussion

Although GWAS has been widely used for the genetic dissection of complex traits, false positives are a long-standing problem, especially when the sample size is insufficient. Compared with classical single-locus GWAS methods such as MLM, multi-locus GWAS methods can better control false positives [43,44,45]. In addition, previous research has indicated that when multiple GWAS methods are applied to the same dataset, putative SNPs jointly detected by multiple methods tend to be more reliable [46,47]. In this study, 31 and 16 significant SNPs were identified by FarmCPU and BLINK, respectively, of which five overlapping SNP signals, corresponding to four unique markers (INRA0056460, detected for both W0 and L; DRGA0004151; H3GA0049324; and H3GA0037747), were retained as candidate markers. W0 and Wi are body weight traits, and one SNP marker was associated with each of them. For trait L, three candidate SNP markers were identified. Notably, H3GA0049324 on chromosome 17 was associated with both L and k, which are growth rate traits, underscoring the potential of this marker. A common method for identifying candidate genes is to select genes near significant GWAS SNPs, usually within a distance threshold of 1 Mb [7,14,48,49]. Therefore, in this study, the genes within 1 Mb upstream or downstream of these SNP markers were included in the subsequent integrated analyses as significant GWAS genes (SGGs).
Based on transcriptome data, differential gene expression analysis and WGCNA were performed to identify DEGs and WCGs, respectively. Transcriptome analysis can identify actively expressed genes related to the trait of interest [50]. Because the expression microarray does not cover all genes, it is difficult to identify candidate genes by directly intersecting the transcriptome-derived genes with SGGs. In this study, a new integration strategy was therefore adopted: SGGs, DEGs, and WCGs were combined for GO enrichment analysis, and the GO terms hit by SGGs (SGH-GO terms) were identified. The genes enriched in the SGH-GO terms were closely related to growth, development, and immune processes. We then screened candidate genes according to their enrichment frequency in the biological process (BP) SGH-GO terms and finally obtained 15 candidate genes (VEGFD, PDGFA, CSPP1, EFHC1, PIK3C3, ZZZ3, MYNN, CTBP2, GCC2, MAPK14, ZPR1, ISG15, ANG, ZHX3, and CEBPD).
For trait W0, six candidate genes were identified (VEGFD, PDGFA, CSPP1, EFHC1, PIK3C3, and ZZZ3), five of which were derived from DEGs and one from SGGs. The vascular endothelial growth factor (VEGF) family and the platelet-derived growth factor (PDGF) family are two major families of vascular growth factors with close evolutionary relationships [51,52], and the two top candidate genes in terms of enrichment frequency, VEGFD and PDGFA, belong to these two families, respectively. VEGFD has been reported to have both angiogenic and lymphangiogenic activity in porcine skeletal muscle [53]. PDGFA has been found to be involved in human testicular development [54]; in this study, PDGFA was associated with initial body weight (W0), suggesting a potential role in the early growth and development of pigs. Both VEGFD and PDGFA were enriched in GO terms related to cell division and fibroblast proliferation, consistent with their known functions and suggesting important roles in the early growth and development of piglets.
The other four candidate genes associated with W0, although enriched relatively infrequently, may also play important roles in weight regulation. Two splice isoforms of CSPP1 have been identified, with functions including cell division, cilia formation, cell cycle control, and other biological processes necessary for animal growth [55,56,57,58]. A study in rats indicated that EFHC1 regulates cell division and cortical development [59]. PIK3C3 is involved in pig body weight regulation through processes such as anti-lipolysis and glucose output in the liver and muscle [60]. A genome-wide meta-analysis revealed that ZZZ3 is associated with human obesity, suggesting that it may also be relevant to pig body weight [61]. Interestingly, EFHC1, a pig-specific duplicated gene, is associated with the differentiation between wild boar and domestic pig populations, but its function in pigs has rarely been reported [62].
For the parameter trait body weight at the inflection point (Wi), two candidate genes, MYNN and CTBP2, were identified. MYNN is highly expressed in the porcine cerebellum, stomach, and longissimus dorsi, implying a role in digestion, absorption, and skeletal muscle growth [63]. Another study demonstrated that depletion of MYNN reduces BMP signaling activity, in turn affecting bone formation, bone development, and other processes [64,65]. Bone is the scaffold of body tissues and develops before muscle and fat, so MYNN may be important for bone development in piglets. Because bone development is likely to affect fattening efficiency, MYNN may have potential for improving fattening efficiency. CTBP2 promotes adipogenesis and regulates body size and development [66,67]. Another study identified CTBP2 as a candidate gene related to pork quality improvement and to multiple muscle pathological processes such as muscle atrophy and malnutrition [68]. Together, these findings suggest that CTBP2 might play an important role in the body weight and meat quality traits of pigs.
Among the growth parameter traits, L and k are two key traits. In this study, GCC2, MAPK14, ZPR1, and ANG were identified as candidate genes related to trait L; these genes have important cellular functions that maintain the life activities of the organism. In the enrichment analysis, they were enriched in GO terms such as phosphatase activity, protein transport, and osteoblast differentiation, suggesting important roles in piglet growth and immune processes. Among them, GCC2 participates in maintaining Golgi structure [69]; MAPK14 is involved in the survival, apoptosis, proliferation, differentiation, and migration of different cell types [70]; ZPR1 is associated with transcriptional defects and cell cycle progression [71]; and ANG encodes the ribonuclease 5 protein, which can regulate protein levels and inhibit cell apoptosis [72,73,74]. ISG15 was also identified as a candidate gene; it has been reported to be an important regulatory factor in the immune pathway and plays an important role in regulating antiviral activity [75,76]. During the piglet period, pigs experience weaning, transport, and other conditions that make them prone to stress and infection, so their growth inevitably involves activating genes with essential cellular functions, as well as immune genes, to resist external stimuli.
The maturing index k is a growth rate-related trait with important implications for animal production. In this study, CEBPD and ZHX3 were identified as candidate genes related to trait k. In pigs, the transcription factor CCAAT/enhancer-binding protein delta (CEBPD) promotes triglyceride synthesis by inhibiting the expression of the FGF21 gene [77]. ZHX3, a member of the zinc fingers and homeoboxes (ZHX) family, is associated with human development, aging, and osteoblast differentiation, consistent with the results of the enrichment analysis [78,79,80,81]. Given its important functions in humans, ZHX3 is worth further study as a candidate gene in piglets.
In summary, previous studies have confirmed or implied important roles in growth and development for the above-mentioned candidate genes associated with the growth curve parameters of piglets. Among them, the role of MYNN in bone formation and development suggests an important and unique role in piglets. In addition, several candidate genes, such as PDGFA, EFHC1, and ZZZ3, have been found to be involved in growth and development in other species, suggesting potential effects on growth and development in piglets. Because pig growth is a complex and continuous process, many of the identified candidate genes are likely to be involved in general growth and development. Although it is difficult to determine whether these candidate genes are stage-specific regulators in piglets (or perform stage-specific functions), the findings of this study provide valuable insights into the genetic basis of piglet growth and information for improving fattening efficiency in practical production.

5. Conclusions

Our genetic analysis of growth curve parameters in piglets identified several SNPs and candidate genes associated with the parameter traits. These findings offer valuable information on the genetic mechanisms underlying pig growth patterns during the piglet period. Such knowledge can support the formulation of pig breeding strategies aimed at high fattening efficiency and provides a preliminary foundation for exploring the genetic factors influencing the fattening efficiency of pigs.

