**3. Discussion**

#### *3.1. Metabolites in the PLS-DA and Metabolic Pathways of Pigs*

The previous study reported that different breed types performed differently in RFI variation [8], so RFI-related metabolomics could be breed specific. Therefore, different breeds tend to exhibit different metabolite abundance values, for example, in studies involving the colostrum of pigs [12,13], the milk and plasma of cattle [8,14], the yolk and albumen of chickens [15,16], the plasma of dogs [17], and the fruit metabolite content of tomatoes [18]. In pigs, the heritability and genetic correlation of production traits of Duroc, Landrace, and Yorkshire pigs vary. Duroc pigs showed lower heritability of feed efficiency but greater performance of growth traits [19,20]. The metabolomics of this study showed that metabolite values vary between two pig breeds and between the sampling times (Figures 1A and 4C), as the metabolite profiles would change according to the breeds and time points [6]. Metabolites of Duroc from first sampling time and Landrace from second sampling time were apparently stratified, probably because metabolite values of these two groups and their metabolite profiles were different. However, metabolites of Duroc from second sampling time and Landrace from first sampling time were very close, probably because metabolite values of these two groups and their metabolite profiles

were very similar (Figure 1A). Hence, the breed di fferences between Duroc and Landrace pigs were driven both by genetic and metabolic factors.

The arginine biosynthesis pathway (ssc00220), where arginine, aspartic acid, citrulline, glutamic acid, and ornithine were significantly enriched in our study (Table 1), plays a crucial role in amino acid metabolism, particularly in the synthesis of citrulline and proline in pigs [21]. By linking arginine, glutamate, and proline in a bidirectional way, the arginine and proline metabolism pathway (ssc00330) biosynthesizes arginine and proline by glutamate. It is observed that proline metabolism is associated with metastasis formation of breast cancer [22]. In dairy cattle, the alanine, aspartate, and glutamate metabolism (ssc00250) identified in the gene-based pathways of our study (Table 1) is the potential metabolic biomarker between the low and high feed e fficient conditions [8].

#### *3.2. Genome-Wide Significant SNP-Related Genes Associated with Metabolites*

The previous GWAS for Duroc pigs identified two pleiotropic QTLs on chromosome one and seven for feed e fficiency [20]. Do et al. (2014) [10] revealed 19 significant SNPs located on several chromosomes (e.g., one, three, seven, eight, nine, ten, fourteen, and fifteen) that were highly associated with feed e fficiency in Yorkshire pigs. In addition, other studies also found significant SNPs associated with RFI on other chromosomes, for example, SNPs on chromosome five in the growing Piétrain–Large White pigs [23], on chromosome two in a crossed populations [24], on chromosome six in Large White pigs [25], etc. [26,27].

In this study, significant SNPs were mainly located on chromosome one (58/152), but the associated metabolites only mapped to 1-hexadecyl-sn-glycero-3-phosphocholine (1.7%), 1-myristoyl-sn-glycero-3-phosphocholine (1.7%), isovalerylcarnitine (47.0%), isoleucyl proline (0.9%), propionylcarnitine (47.0%), and lysoPC(16:0) (1.7%). Obviously, isovalerylcarnitine and propionylcarnitine primarily derived from amino acid catabolism were the major metabolites that associated with nine significant SNP-related genes (i.e., *CCNC*, *FBXL4*, *FHL5*, *LOC780435*, *MAT2B*, *MCHR2*, *PNISR*, *RRAGD*, and *SIM1*) on chromosome one (Supplementary Table S1). A previous study indicated that the amount of isovalerylcarnitine could decrease in the plasma and liver tissues but greatly increased in the muscle tissue, as a byproduct of leucine catabolism [28]. The isovalerylcarnitine compound was reported to be found in high amounts in the colostrum and milk of sows [29]. As a key role in the mitochondrial fatty acid transport and high-energy phosphate exchange, propionylcarnitine could improve cardiac dysfunction by reducing myocardial ischaemia [30].

#### *3.3. Gene and Metabolite Interaction Network*

Based on the gene interaction node *N6AMT1*, one gene–gene interaction was found to connect *CCNC* with *FBXL4* (Figure 4B), in which significant SNPs were annotated to gene components and associated with isovalerylcarnitine and propionylcarnitine (Table 3). As the members of *CDK8* mediator complex that can regulate β-catenin-driven transcription, *CCNC* encodes the cell cycle regulatory protein cyclin C and results in the protein dysfunction due to the mutations of *CCNC* [31,32]. *CCNC* is also believed to increase the quiescent cells to maintain *CD34* expression after knocking down *CCNC* expression in human cord blood [33], while the amplification of *CCNC* was in a relationship with the unfavorable prognosis [34]. *FBXL4* is considered to participate in oxidative phosphorylation, mitochondrial dynamics, cell migration, prostate cancer metastasis, circadian GABAergic cyclic alteration, etc. [35–39]. The association results in pigs found that blood and immune traits were associated with the SNPs of *FBXL4* [40]. The node *N6AMT1* is responsible for DNA 6mA methylation modification as the first glutamine-specific protein methyltransferase characterized in mammals; thus, glutamine could be regulated by the genes that promote porcine growth performance [41,42].

#### *3.4. Associations Linking SNP Genotypes, Metabolites, and RFI*

To investigate the direct association between SNP genotypes and RFIs, we also conducted GWAS for RFI (i.e., where the GWAS model included RFI as phenotype and SNPs as fixed e ffect/explanatory variable) in the mixed linear model. Unfortunately but as expected due to small sample size, the results showed that no genome-wide significant associations were found between SNPs and RFI values (*p*-values ≥ 2.09 × <sup>10</sup>−4). However, the top SNP was DRGA0008061 (*p*-values = 2.09 × <sup>10</sup>−4), and we found five genes (*ANGPTL2*, *AUTS2*, *GRIFIN*, *PTRH1*, and *SIRT5*) in which the top ten SNPs were annotated (Supplementary Table S3). In our previous studies, Banerjee et al. (2020) [11] also revealed that DRGA0008061 was one of the top significant SNPs associated with RFI after genome-wide epistatic interaction network analysis for feed efficiency using the same genotypes and pig populations as used in our current study. Meanwhile, Carmelo et al. (2020) [6] used Kolmogorov–Smirnov test to identify the significant metabolites associated with feed efficiency traits at two time points in Duroc and Landrace pigs. They found that 1-hexadecyl-sn-glycero-3-phosphocholine, 1-myristoyl-sn-glycero-3-phosphocholine, isovalerylcarnitine, lysoPC(16:0), and phosphocholine were significantly (*p*-value < 0.05) associated with RFI and other feed efficiency traits [6]. By matching the results from Carmelo et al. (2020) [6] with our results, we found that these five metabolites were also our main significant SNP-associated findings in GWAS (Table 3). Therefore, the triangular association of genotypes (SNP), metabolomics (metabolite), and feed efficiency (RFI) is established via our mGWAS (SNPs affecting metabolites) and GWAS (SNPs affecting RFI) and is linked with the previous studies [6,11].

#### **4. Materials and Methods**

#### *4.1. Animals, Metabolites, and Genotypes*

A total of 108 pigs were involved in this study including 59 Duroc and 49 Landrace pigs that were part of our own previous published studies [6,11]. The detailed description of the animal experiment and phenotype, metabolite, and genotypes data collection are available from our previously published studies, and all data used in this study were derived from our datasets that were already made public. Metabolite data [6] were accessed using MetaboLights accession ID MTBLS1384 with a link: https://www.ebi.ac.uk/metabolights/MTBLS1384. Genotype data [11] were accessed from National Center for Biotechnology Information (NCBI) GEO accession number: GSE144064 with the following link: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE144064. The genotype data was sequenced using GeneSeek-Neogen PorcineSNP80 BeadChip containing 68,528 loci based on the version Sscrofa10.2/susScr3 [11].

As in Carmelo et al. (2020) [6], all the pigs were purebred uncastrated males derived from sixteen-sire families in four generations and fed on the same diets. They had RFI values calculated for each pig as the difference between the observed daily feed intake (DFI) and the predicted daily feed intake (pDFI) [6] following the method of Nguyen et al. (2001) [43]. Nguyen et al. (2001) [43] firstly corrected the DFI for batch and sex and their interaction effects (i.e., fixed effects) and then estimated the pDFI from different regression models including growth rate and back fat after adjustments for above fixed effects; hence, Carmelo et al. (2020) [6] could compute RFIs in the same way by correcting fixed effects in their study. Finally, our study directly used RFIs together with other phenotypes by accessing the public dataset with a link: https://www.ebi.ac.uk/metabolights/MTBLS1384. The range of actual RFI values of Duroc were from −10 to 14, while Landrace's RFI value range was from −14 to 17 (Figure 5). The previous study conducted the metabolite–trait association analysis for RFI, so it was suggested that fatness or other factors should be adjusted in the calculation of RFI to achieve more accurate association results in their study [6]. As similar means of RFI for Duroc and Landrace pigs were observed in Figure 5 of our study, we assumed that fatness was adjusted in the calculation of RFI, but we cannot determine it. In this study, we selectively chose the extreme left and extreme right tails of distribution of feed efficiency (i.e., actual RFI values) distribution of all the pigs (n = 108) with one standard deviation (SD) from the mean [11,44] of actual RFI values. Then, they were defined as high RFI pigs (RFI ≤ −5.23, n = 11) and low RFI pigs (RFI ≥ 5.23, n = 16), respectively (Figure 5). The overview of the analysis workflow is shown in Figure 6 and included five scenarios of phenotypes

in the GWAS analysis based on different transformations of metabolites. The five types of phenotypes were (1) the metabolites from first sampling time, (2) the metabolites from second sampling time, (3) the metabolites from combined two sampling times (i.e., metabolite values from first and second sampling times were combined as an integrated dataset, where each pig had two metabolic values for one metabolite, but genotypes were same for the metabolite values between first and the second sampling times from the same pig), (4) the first component score (component 1) from partial least squares-discriminant analysis (PLS-DA), and (5) the metabolites enriched in the significant metabolic pathways (Figure 6).

**Figure 5.** Distribution of actual RFI values of Duroc (n = 59) and Landrace (n = 49) pigs.

**Figure 6.** Overall analysis workflow.

Metabolite data was downloaded by accessing MetaboLights accession ID MTBLS1384 with a link, https://www.ebi.ac.uk/metabolights/MTBLS1384, and were collected in two sampling times (i.e., the first sampling time was at the age when pig weighted approximately 28 kg, and the second sampling time was 45 days after the first sampling time) for each pig by the previous study [6]. Finally, 45 metabolites were used in this study (Figure 7) including 16 annotated metabolites (i.e., 1-hexadecyl-sn-glycero-3-phosphocholine, 1-myristoyl-sn-glycero-3-phosphocholine, (3-Carboxypropyl)trimethylammonium, 5-methyl-5,6-dihydrouracils, acetaminophen, acetylcarnitine, benzoic acid, cotinine, creatinine, indoleacrylic acid, isoleucyl proline, isovalerylcarnitine, leucyl methionine, lysoPC(16:0), manNAc, and propionylcarnitine) and 29 identified metabolites (i.e., 4-aminobenzoic acid, alanine, arginine, aspartic acid, carnitine, citrulline, cytidine, disaccharide, glutamic acid, guanine, guanosine, hypoxanthine, inosine, isoleucine, lactic acid, methionine, monosaccharide, nicotine amide, ornithine, phenylalanine, proline, pyruvic acid, riboflavine, sorbitol, thiamine, threonine, thymidine, uridine, and xanthine).

**Figure 7.** Statistical description of 45 metabolites from combined two sampling times. Note: The red solid circle indicates the limit of detection (LOD) relative value of each metabolite. LOD refers to the lowest value of a metabolite that the LC-MS method can detect. M1 to M45 indicate the metabolites of 1-hexadecyl-sn-glycero-3-phosphocholine, 1-myristoyl-sn-glycero-3-phosphocholine, (3-Carboxypropyl)trimethylammonium, 4-aminobenzoic acid, 5-methyl-5,6-dihydrouracils, acetaminophen, acetylcarnitine, alanine, arginine, aspartic acid, benzoic acid, carnitine, citrulline, cotinine, creatinine, cytidine, disaccharide, glutamic acid, guanine, guanosine, hypoxanthine, indoleacrylic acid, inosine, isoleucine, isoleucyl proline, isovalerylcarnitine, lactic acid, leucyl methionine, lysoPC(16:0), manNAc, methionine, monosaccharide, nicotine amide, ornithine, phenylalanine, proline, propionylcarnitine, pyruvic acid, riboflavine, sorbitol, thiamine, threonine, thymidine, uridine, and xanthine, respectively.

The genotype data was downloaded from NCBI GEO database with accession number: GSE144064 with a link, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE144064, that was issued by the previous study [11]. After removing the markers with duplicated SNP positions (i.e., coordinates) (n = 274), unannotated SNP positions (n = 2618), and no genotypes (n = 3903) from GeneSeek-Neogen PorcineSNP80 BeadChip (68,516 SNP markers here), 61,721 SNP markers remained. Afterwards, we performed the imputation for missing markers using pedigree (i.e., all the pigs were derived from sixteen-sire families in four generations) by FImpute software (version 3) [45], as the closer relatives usually share longer haplotypes; therefore, pedigree information could contribute towards the FImpute software, achieving more accurate imputation [45,46]. Quality control (QC) for the imputed genotypes

was conducted based on the criteria of Hardy–Weinberg equilibrium (HWE > <sup>10</sup>−7) and minor allele frequency (MAF ≥ 0.001) by PLINK software (version 1.9) [47].

In this study, we also combined the metabolite values from the first sampling time and the second sampling time as an integrated dataset, so each pig had two values in one metabolite. However, the genotypes for two metabolite values were the same if one metabolite value was from the first sampling time of one pig while the other metabolite value was from the second sampling time of the same pig. In other words, each pig had two di fferent metabolite values but the same genotypes; thus, QC results of the integrated dataset (n = 206) were di fferent from the unintegrated dataset (n = 108), especially for HWE but not for MAF. Finally, the genotypes for the first sampling time and the second sampling time retained 47,109 SNP markers after removing unqualified 9337 (HWE ≤ <sup>10</sup>−7) and 5275 (MAF < 0.001) markers, while the genotypes for the combined two sampling times retained 42,393 SNP markers after removing unqualified 14,053 (HWE ≤ <sup>10</sup>−7) and 5275 (MAF < 0.001) markers.

#### *4.2. Partial Least Squares-Discriminant Analysis and Metabolic Pathway Enrichment for 45 Metabolites*

The partial least squares-discriminant analysis (PLS-DA) and metabolic pathway analysis for the 45 metabolites were performed by *MetaboAnalyst* software (version 4.0) [48] using *Homo sapiens* as the library. Fishers' exact test and relative betweenness centrality were used for the overrepresented analysis and the pathway impact value calculation (i.e., sum of importance measures of the matched metabolites divided by the sum of the importance measures of all the metabolites), respectively [49]. The first component scores (component 1) and metabolites enriched in the significant metabolic pathways after false discovery rate (FDR) correction of multiple hypothesis testing (FDR < 0.1) were selected as phenotypes of the transformed metabolites for GWAS. The mean calculated for the metabolites enriched in each significant metabolic pathway was considered as transformed metabolite values; thus, each pathway had one transformed metabolite value (i.e., the mean).

#### *4.3. Mixed Linear Model Based Association Analysis*

In this study, we considered other environmental factors (e.g., age) the same among all the pigs, so we only used breed and RFI as the fixed e ffects to directly link metabolites with genotypes. GWAS for 45 single metabolites and transformed metabolites (i.e., component 1 and enriched metabolites) was conducted by mixed linear model based association analysis in GCTA software (version 1.93.0) [50]. The mixed linear model is as follows:

$$\mathbf{y}\_{\ \cdot} = \mathbf{X}b + \mathbf{g} + \mathbf{e}\_{\prime} \tag{1}$$

where y is the vector of phenotypes (i.e., metabolites from the first, second, and combined two sampling times and the transformed metabolites); *b* is the vector of fixed e ffects including intercept, breed e ffects (i.e., Duroc and Landrace pigs), RFI e ffects (i.e., actual RFI values included as covariates), and SNP effects (i.e., candidate SNPs included as covariates) to be tested; *X* is the design matrix for fixed e ffects that includes SNP genotype indicators (i.e., 0, 1, or 2); *g* is the vector of polygenic e ffects as random effects that are the accumulated e ffects of all SNPs; and *e* is the vector of residual e ffects. The polygenic and residual variances are *Var*[*g*] = *G*σ<sup>2</sup> *g* and *Var*[*e*] = *I*σ<sup>2</sup> *e*, where *G* and *I* are the genetic relationship matrix (GRM) calculated using all SNPs and identity matrix, respectively.

#### *4.4. Significant SNPs and Their Annotated Genes*

The significant SNPs for GWAS were defined when the *p*-values were less than the threshold after Bonferroni correction for multiple hypothesis testing on genome level. The threshold for metabolites from the first and second sampling times was 1.06 × 10−<sup>6</sup> (i.e., 0.05/47109), while the threshold for combined two sampling times was 1.18 × 10−<sup>6</sup> (i.e., 0.05/42393). Then, the significant SNPs were annotated to the genes and gene components (i.e., promoters, exons, and introns) of porcine RefSeq database (Sscrofa10.2/susScr3) downloaded from University of California Santa Cruz (UCSC) genome browser (https://genome.ucsc.edu/cgi-bin/hgTables), where a window of 500 kb was used for the

annotation of intergenic regions to neighboring genes. In addition, we used the reference SNP (rsfSNP) ID (i.e., specific rs number) of significant SNPs to annotate them to the genes and gene components of latest porcine RefSeq database (Sscrofa11.1/susScr11).

Linkage disequilibrium (LD) analysis to display the potential haplotypes for the significant SNPs was performed using Haploview software (version 4.2) [51]. SNPs with a distance larger than 500 kb were ignored in the pairwise comparisons for LD analysis.

#### *4.5. Gene-Based Pathway Enrichment Analysis and Gene–Metabolite Interaction Network*

We used R package *KEGG.db* (version 3.2.3) of *Sus scrofa* species to annotate SNP-related genes in the gene-based pathway enrichments. Based on the adjusted *p*-values (p.adjust) < 0.2 under FDR control, the gene-based pathways were finally realized by R package *clusterprofiler* (version 3.12.0) [52]. The gene–gene interaction networks were created by *GeneMANIA* server [53,54] with default settings using *Homo sapiens* as the library. Then, the gene–metabolite networks for interactions between SNP-related genes and phenotype-related metabolites were created by *MetaboAnalyst* tool [55] with default settings using the same library of *Homo sapiens*. Significant SNP-associated metabolites based on the low-high RFI pigs were hierarchically clustered by Ward's method in Euclidean distance [56]. Then, a heat map for averaged metabolite clustering was visualized by *MetaboAnalyst* tool [48].
