Genome-Wide Identification of Specific Genetic Loci Common to Sheep and Goat

Zuoxiang Liang; Xiaoyu Yue; Yangxiu Liu; Mengyan Ye; Ling Zhong; Yue Luan; Qin Wang

doi:10.3390/biom14060638

,

and

¹

State Key Laboratory of Animal Biotech Breeding, National Engineering Laboratory for Animal Breeding, Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China

²

Department of Animal Science, University of Minnesota, Saint Paul, MN 55108, USA

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Biomolecules2024, 14(6), 638;https://doi.org/10.3390/biom14060638

This article belongs to the Section Bioinformatics and Systems Biology

Version Notes

Order Reprints

Abstract

Sheep and goat may become carriers of some zoonotic diseases. They are important livestock and experimental model animals for human beings. The fast and accurate identification of genetic materials originating from sheep and goat can prevent and inhibit the spread of some zoonotic diseases, monitor market product quality, and maintain the stability of animal husbandry and food industries. This study proposed a methodology for identifying sheep and goat common specific sites from a genome-wide perspective. A total of 150 specific sites were selected from three data sources, including the coding sequences of single copy genes from nine species (sheep, goat, cow, pig, dog, horse, human, mouse, and chicken), the dbSNPs for these species, and human 100-way alignment data. These 150 sites exhibited low intraspecific heterogeneity in the resequencing data of 1450 samples from five species (sheep, goat, cow, pig, and chicken) and high interspecific divergence in the human 100-way alignment data after quality control. The results were proven to be reliable at the data level. Using the process proposed in this study, specific sites of other species can be screened, and genome-level species identification can be performed using the screened sites.

Keywords:

SGCSSs; species identification; PCA; MNSAs; D values

1. Introduction

Sheep (Ovis aries) and goat (Capra hircus), as two popular livestock, are not only important sources of meat, pelt, wool fiber, and dairy for human consumption but also have been utilized for scientific research on genetics and disease models [1,2,3,4,5]. However, sheep and goat are also potential carriers of many zoonotic pathogens [6]. A famous example is bovine spongiform encephalopathy (BSE), one type of transmissible spongiform encephalitis (TSE), leads to brain damage or death [7]. It was reported that sheep and goat BSE can be transmitted more efficiently than cow BSE in transgenic mice with human prion protein (PrP) [8]. Another serious zoonosis, brucellosis, also could be transmitted from sheep or goat to humans [9]. This disease, which is caused by the genus Brucella, threatens public health in many parts of the world [10,11]. Therefore, detecting genetic materials originating from sheep and goat is important for both food and public health surveillance.

Over the past few decades, the mitochondrial DNA (mtDNA) is widely utilized for species identification [12]. mtDNA, a small DNA located in mitochondria, is characterized by maternal inheritance, a high mutation rate, a low recombination level, and a high level of interspecies divergences [13]. Haider et al. reported that the cytochrome c oxidase subunit I (COI) gene of mtDNA, along with the polymerase chain reaction (PCR)–restriction fragment length polymorphism (RFLP) technique, can be used to efficiently distinguish eight species, including cow, chicken, turkey, sheep, pig, buffalo, camel, and donkey [14]. Recently, Jahura et al. found that two conserved fragments of the mitochondrial cytochrome b gene were able to accurately distinguish products originating from sheep and goat [15]. Although some successful examples exist, the application of mtDNA is limited due to its relatively low resolution and inability for quantification.

As next-generation sequencing (NGS) technologies become more and more popular, high-quality genome assemblies and accurate genome annotations are rapidly accumulating for various organisms, as well as for whole-genome resequencing data [16,17,18,19,20]. An increasing number of studies prefer to identify molecular markers in nuclear genomes for species identification due to the limitation of mtDNA. Although the data volumes increase over many orders of magnitude, requiring more effort in data analysis, the application of nuclear genomes still provides the following advantages: (1) It provides plenty of regions featuring high intraspecific concordance and interspecific divergences. For example, ultra-conserved elements (UCEs), known for their high conservation, exhibit superior performance to infer species phylogeny [21,22]. (2) It offers a collection of single copy genes. Single copy genes evolve slower than duplicated genes, which helps to increase the accuracy in sequence alignments [23,24,25]. Furthermore, compared to mtDNA that has high copy number variations in different individuals or tissues, the sites from single copy genes without any copy number variation are more suitable for quantitative analysis using real-time PCR [26,27]. (3) It provides high reliability at the data level. Huge numbers of single-nucleotide polymorphisms (SNPs) for more and more species can be used to verify whether the species-specific sites selected from single copy genes for species identification are credible or not. In addition, SNP genotyping is a very cost-effective way for species identification and exhibits highly accurate performance under modern techniques including real-time PCR and mass spectrum.

In this study, we developed a pipeline to identify highly confident sheep and goat common specific sites (SGCSSs) to distinguish the genetic materials derived from sheep–goat (sheep and goat) and seven other species. By doing this, high-quality assemblies and annotated genomic data as well as SNP data from nine species were used, including sheep, goat, cow (Bos taurus), pig (Sus scrofa), dog (Canis lupus familiaris), horse (Equus caballus), human (Homo sapiens), mouse (Mus musculus), and chicken (Gallus gallus) for comprehensive searching and filtering. Subsequently, we obtained 150 candidate SGCSSs.

Next, we performed a verification test using the resequencing data to investigate whether these particular sites belong only to sheep–goat or not. The resequencing data contained a total of 1450 samples from five species: chicken, pig, cow, sheep, and goat. In addition, we also analyzed the interspecific differences in SGCSS with high confidence. This study has two important meanings: (1) The first is accurate quantification. The use of 150 SGCSS to distinguish sheep and goat from other common mammals could reduce testing costs and increase efficiency. (2) A new screening strategy was proposed, in which multi-parameter screening was carried out step by step based on a large amount of data, and then specific loci were found to be common between sheep and goat but almost non-existent in other common mammals. This method of screening large amounts of data to obtain specific sites is novel and innovative.

2. Materials and Methods

2.1. Data Collection

Three data sources were involved in this study: (1) High-quality genomic data for sheep, goat, cow, pig, dog, horse, human, mouse, and chicken were downloaded from the Ensembl database (release 92, http://www.ensembl.org/). These included gene transfer format (GTF) files, dbSNP (vcf format) files, soft masked genomic sequences (genome versions in Supplementary Table S1), and coding sequences (CDS). (2) Human 100-way alignments were obtained from the UCSC database (http://genome.ucsc.edu/). (3) The SNP information from 1450 whole-genome resequencing (WGS) individuals were collected from published data and the data from our lab (data not released). This dataset contained five species: Ovis aries (sheep), Capra hircus (goat), Bos taurus (cow), Sus scrofa (pig), and Gallus (chicken). Since Ovis canadensis (bighorn sheep), Ovis dalli (dall sheep), Ovis orientalis (mouflon), and Ovis vignei (urial) are closely related to Ovis aries (sheep), all of them were assigned to sheep populations. Similarly, Capra aegagrus (bezoar) and Capra hircus (goat) were both assigned to goat populations; Bos taurus × indicus (cow) and Bos taurus (cow) were both assigned to cow populations; and Sus cebifrons (Visayan warty pig), Sus barbatus (bearded pig), Sus celebensis (Celebes wild boar), Sus verrucosus (Java warty pig), Phacochoerus africanus (common warthog), and Sus scrofa (pig) were assigned to pig populations.

In total, sheep populations contained 377 samples from more than 51 breeds, goat populations contained 217 samples, cow populations contained 34 samples, pig populations contained 366 samples from more than 46 breeds, and chicken populations contained 456 samples from 21 breeds. Additional details of the dataset are described in Supplementary Table S2.

2.2. Gene Family Clusters Construction

The TreeFam methodology was applied to define a gene family as a group of genes descended from a single gene in the recent common ancestor of all animals [28]. The TreeFam pipeline with same parameters from the previous sheep genome project was used to infer the orthologs among the nine species [29]. In detail, the perl script “bp_translate_seq.pl” in BioPerl v1.69 was first used to translate CDS to the corresponding peptide sequence [30]. Next, “all-against-all” BLASTP (E-value < 1 × 10⁻⁷) was performed to compute the similarities of all the protein sequences of the nice species. High-scoring segment pairs (HSPs) were then concatenated for each gene pair by SOLAR. When the region aligned to both genes exceeded 1/3 of the sequence length, a connection (edge) between the two nodes (genes) was added. Finally, the hcluster_sg algorithm (https://github.com/douglasgscofield/hcluster) was used to cluster the homologous genes into the corresponding gene families.

2.3. Sequence Alignment Processing

Based on the constructed gene family clusters, single copy orthologs were first extracted from species reference genomes. Next, multiple peptide sequence alignment (MPSA) for each ortholog was conducted using SATé v2.2.7 and MAFFT v7.408 [31,32]. To select high-quality MPSAs, Gblocks v0.91b (parameters: -t=p -b3=15 -b4=5 -b5=n) was applied to extract conserved domain sequences from each MPSA [33]. An MPSA, in which the length of the aligned segment was shorter than 100 amino acids or the proportion of gaps was more than 20% of the total alignments, was discarded. Finally, the remaining MPSAs were converted back to the format of multiple nucleotide sequence alignments (MNSAs).

2.4. Searching Candidate SGCSSs

Next, we carried out a sliding window strategy with window size = 20 bp and step size = 1 bp on each MNSA to screen the candidate SGCSSs. The criteria for a candidate site extracted from MNSAs were as follows: (1) The nucleobase in sheep must be identical to that in goat. (2) The nucleobases in sheep and goat must be different from those in the other seven species (cow, pig, dog, horse, human, mouse, and chicken). (3) The nucleobases located at the corresponding genomic positions of the other seven species must be consistent. (4) The nucleotide relationship between the candidate SGCSS and its corresponding sites in the other seven species must not follow the principle of DNA base pair complementarity. For example, suppose that the nucleobase of one candidate SGCSS was “C” for sheep–goat, the corresponding nucleobases in the other seven species must be “A” or “T” but not “G” (Figure 1). (5) Gaps within 20 bp of the candidate SGCSS are not permitted.

Figure 1. One MNSA window centered on candidate SGCSS (green) without gaps. The corresponding sites in the other seven species are shown in gray. The length of MNSA fragment surrounded by candidate SGCSS is 20 bp (red rectangle).

The candidate SGCSS containing variants in the population genetic data for each species must be removed from the list obtained from the previous step. Therefore, we filtered out candidate SGCSSs that had SNPs using the dbSNP data for all nine species. The number of SNPs recording in the dbSNP database ranged from 5,019,522 for horse to 324,781,098 for human (Supplementary Table S3). The candidate SGCSSs, along with their flanking sequences, were then mapped back to the genome to obtain the exact genomic location of the SGCSSs and their corresponding sites in seven other species. To do this, probes containing the candidate SGCSS or the corresponding sites in other species were trimmed from each sequence of each MNSA (Supplementary Figure S1), with flanking sequences that ranged from 21 to 41 bp. For all nine species, only the unique match was retained for further analysis. Finally, we retained the results where both candidate SGCSSs and corresponding sites had no SNP records in dbSNP.

2.5. Scoring Candidate SGCSSs

In this step, we scored the candidate SGCSSs to select sites that have maximum discrimination power using the human 100-way alignment data. After filtering out SGCSSs that were not recorded in the human 100-way alignments, we then removed the locus with a frequency of missing sites more than 5%, and the species with a frequency of missing sites more than 10%. To do this, the items, including nucleobases and gaps, that occurred in the alignments for each site were divided into three groups. In detail, for each species, a nucleobase was assigned to group 1 if it was identical to that in the candidate SGCSS and to group 2 if it differed from that in the candidate SGCSS. The missing site was assigned to group 3. Therefore, the frequency for each group could be calculated as follows:

p_{1 i} = \frac{n_{1 i}}{N}

(1)

p_{2 i} = \frac{n_{2 i}}{N}

(2)

p_{3 i} = \frac{n_{3 i}}{N}

(3)

where

n_{1 i}

is the counts of group 1 for the ith site,

n_{2 i}

is the counts of group 2 for the ith site,

n_{3 i}

is the counts of group 3 for the ith site, and N is the total number of species retained after quality control. We then utilized a

D_{i}

value which is the difference between

p_{2 i}

and

p_{1 i}

to score ith candidate SGCSS. The formula could be expressed as follows:

D_{i} = p_{2 i} - p_{1 i}

(4)

Finally, the top-ranked 150 SGCSSs with the highest D values were selected as the most informative SGCSSs.

2.6. The Specificity of SGCSS Analysis

To evaluate the performance of these 150 SGCSSs, we first merged these sites and their corresponding sites among other species from the human 100-way alignments into a pseudo-MNSA. We then conducted principal component analysis (PCA) and phylogeny construction on the MNSA. For the PCA, we first transformed the MNSA into an integer format, where each nucleobase in group 1 was given an integer value of 1, those in group 2 were set to −1, and those in group 3 were set to NA for each site. This step was implemented in scikit-learn v0.18 [34].We also built a maximum-likelihood phylogenetic tree for the MNSA using MEGA v7.0.18 [35] software and utilized FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/) to visualize the tree.

2.7. Tests for Selection Signals

To investigate the selection signals for SGCSSs, we performed the nucleotide diversity and Tajima’s D statistics for each of the five species (sheep, goat, cow, pig, and chicken) using the 1450 WGS individuals described above. All data processing was carried out using Plink v1.9 and the VCFtools v0.1.13 software [36,37]. For each species population, the nucleotide diversity and Tajima’s D were calculated applying a sliding window strategy with window of 1000 bp and shifting step size of 250 bp by using the scikit-allel v0.20.3 package (https://github.com/cggh/scikit-allel).

3. Results and Discussion

3.1. Identification of the Highly Confident SGCSSs

The overall process for searching SGCSSs is shown in Figure 2. We started with inferring the orthologs among nine species (sheep, goat, cow, pig, dog, horse, human, mouse, and chicken) using TreeFam. As a result, a total of 2916 single copy gene families were identified from 12,361 orthologous groups, with an average of 14.27% in nine genomes (ranging from 12.79% in human to 16.27% in goat, Figure 3). After translating the CDS to peptide sequences, we performed multiple sequence alignments for each single copy ortholog. Moreover, 92% of the length of MPSAs exceeded 100 amino acids, indicating high-quality assemblies and annotations for the nine investigated genomes. However, there were only 810 (27.7%) MPSAs with a length longer than 100 amino acids and proportion of gaps less than 20%. This might be explained by the reason that the nine species were evolutionarily distant.

Figure 2. Overview of the highly confident SGCSSs identification. MPSA, multiple peptide sequence alignment; MNSA, multiple nucleotide sequence alignment.

Figure 3. Single copy orthologs examined in the nine species. The proportion of single copy genes in whole genes of every species are represented in blue color and other types are shown in green.

To obtain highly confident SGCSSs, as shown in Table 1, we applied a serial of filtering steps, starting with 3245 candidate SGCSSs from 810 MNSAs. We then found that 3230 out of 3245 candidate SGCSSs, along with their corresponding sites among the other seven species, had exact genomic coordinates. After that, the sites that had SNP records in the dbSNP data were filtered, resulting in 2136 candidate SGCSSs.

Table 1. The number of remaining candidate SGCSSs after each filter step.

These candidate SGCSSs were only identified from nine species, but it is still unclear whether these sheep and goat specific sites exist in other species. Therefore, we downloaded the human 100-way alignment data, which contain the alignment information for 100 species from UCSC, to filter out sites that were not specific to sheep and goat. After filtering out non-recorded sites and performing quality control for missing data, the number of candidates were reduced to 1158. Meanwhile, the species in which the frequency of missing sites exceeded 10% during the quality control step were discarded, and 96 species remained. Finally, the top-ranked 150 sites based on the D value were selected as the SGCSSs list (Supplementary Table S4).

An ideal SGCSS should be in accordance with the rules of low intraspecific heterogeneity and high interspecific divergence. In this study, low intraspecific heterogeneity was defined as follows: the allele at the SGCSS locus, where the base was identical to the base (or its complementary base) on the sheep reference genome, must have exhibited a very high frequency (>0.95), whereas another allele, whose base differed from the sheep reference genome should have had a sharply low frequency (<0.05). High interspecific divergence was defined as follows: these SGCSSs could be clearly separated from other species using some clustering algorithms, such as PCA. Therefore, we next conducted a performance evaluation on the 150 SGCSSs based on these two principles.

3.2. Evaluation of Intraspecific Heterogeneity

Our first step was to verify whether the 150 SGCSSs adhered to the principle of low intraspecific heterogeneity. The resequencing data containing 1450 animals from five species were used to carry out this test. We calculated the frequency of the site that was homozygous and consistent with the reference genome sequence (homozygous reference genotype). We found high intraspecific consistency for these 150 SGCSSs. As shown in Table 2, only 4 (2.7%) out of the 150 SGCSSs exhibited allele variation in certain species. Nevertheless, we found an extremely high proportion of homozygous reference genotypes in all of four SGCSSs (0.950–1). This suggests that the final SGCSSs we selected were highly reliable.

Table 2. The frequency of homozygous reference genotypes of 4 sites of 150 SGCSSs. The coordinates of these four sites are based on sheep reference genome.

3.3. Evaluation of Interspecific Divergence

Our second step was to verify whether these SGCSSs exhibited high interspecific divergence. We performed PCA on these 150 SGCSSs and their corresponding sites in the other 94 species, which were extracted from the human 100-way alignment data after quality control and converted from the ACGT format to integers. As expected, the result clearly exhibited separation between sheep–goat and the other 94 species (Figure 4a). It is worth noting that the PCA result also revealed that the Tibetan antelope, which is a member of the same Caprinae subfamily as sheep and goat, could be clearly distinguished from other species. Although these SGCSSs were screened from the datasets without genomic information about Tibetan antelope, they clustered closer with sheep and goat rather than other species. This indicated that these SGCSSs could also be used for species identification between Tibetan antelope and sheep–goat. Phylogenetic analysis provided a consistent result. As shown in Figure 4b, the maximum-likelihood phylogenetic tree revealed that sheep, goat, and Tibetan antelope were grouped more closely together than other species. Interestingly, we also observed that sheep, goat, and Tibetan antelope clustered closely to many aquatic animals, such as spotted gar, zebrafish, and southern platy fish, in the phylogenetic tree. This thus implied that these 150 SGCSSs might have appeared very early in evolution.

Figure 4. The PCA and phylogenetic analysis among 96 species by using 150 SGCSSs and their corresponding sites from the other 94 species. (a) First two principal components are used for these 96 species. (b) The unrooted phylogenetic tree constructed using maximum-likelihood algorithm in MEGA v7.0.18. The lines in red color refer to sheep, goat, and Tibetan antelope.

Overall, our analysis showed that these 150 SGCSSs meet the principles of low intraspecific heterogeneity and high interspecific divergence.

3.4. Characterization of Selected SGCSSs

We then analyzed the genes containing these SGCSSs using sheep genomic coordinates. There were 103 genes with these sites. Most genes contained only one SGCSS, while a few contained more than one, such as UBR4 (n = 9), NCOA6 (n = 6), HELZ (n = 5), TRIP12 (n = 4), and BIRC6 (n = 4) (Supplementary Table S4).

We defined an SGCSS as a non-synonymous site if amino acid alternations occurred in any of the other seven species or a synonymous site if no amino acid alternation was observed in either of the other seven species. (details in Supplementary Table S5). It was found that 123 (82%) sites were synonymous, while the remaining 27 (18%) sites were non-synonymous substitutions. Among the 27 non-synonymous sites, cow shared the highest number of identical amino acids with sheep and goat (n = 23), while chicken shared the lowest number (n = 7) (Figure 5a). Considering the evolutionary distance among the nine species, it was implied that 27 non-synonymous substitutions had the power to effectively distinguish these nine species (Figure 5b).

Figure 5. The matrix for the number of common non-synonymous substitutions between nine species (a) and the phylogenetic tree constructed using 27 non-synonymous sites in nine species (b).

3.5. Case Study for Functionality of SGCSSs

As a case study, we observed that the SGCSSs located on the NCOA6 gene exhibited a high level of differentiation between sheep–goat and the other seven species (Supplementary Table S5). As a nuclear receptor coactivator, NCOA6 plays an essential role in many biological functions, such as the regulation of transcription, pigmentation, adipocyte differentiation, etc. [38,39,40,41]. NCOA6 is located on sheep chromosome 13, and two non-synonymous substitutions associated with sheep and goat common specific peptides (SGCSPs) were detected from the 150 SGCSSs (Supplementary Table S5). The sites for these two peptides were located at chr13:63638684 and chr13:63638708 of the sheep reference genome. We further detected the selection signals by using 1450 resequencing individuals from five species. Since the distance between these two sites was very close (24 bp), 200 kb regions centered at chr 13:63638708 of the sheep reference genome or its corresponding sites on goat, cow, pig, and chicken reference genomes were respectively selected. Low levels of nucleotide diversity were observed in all species (Supplementary Figure S2 and Supplementary Table S6), especially near chr 13:63638708 or its corresponding sites. Negative Tajima’s D values, indicating positive or negative selection, were also observed in the goat, cow, pig, and chicken populations, particularly in the left regions of the SGCSS site (Figure 6 and Supplementary Table S6). Thus, the analysis of nucleotide diversity and Tajima’s D indicated that the regions near these two SGCSSs or its corresponding sites might have experienced natural selection in five species.

Figure 6. The Tajima’s D analysis for an SGCSS and its corresponding sites on five species. The SGCSS is located at chr13:63638708 of sheep reference genome. To perform this analysis, a 200 kb region centered on the SGCSS is extracted from sheep genome (red). Similarly, the 200 kb regions centered on the corresponding sites are extracted from goat (green), cow (blue), pig (blue), and chicken (blue) reference genomes.

4. Conclusions

In conclusion, we built a pipeline and identified 150 SGCSSs through extensive genome scanning. The SGCSSs effectively distinguished sheep and goat from the other seven species with both low intraspecific heterogeneity and high interspecific divergence. Our results showed that species identification could be obtained from nuclear genomes, such as these SGCSSs, which was an efficient and cost-effective strategy. Since these SGCSSs were selected from single copy genes, this thus provided the potential for quantitative analysis, although this was not included in the current study. The existence of non-synonymous SGCSSs could be due to genetic drift or functional alternation. Our approach also provided an example for an SGCSS of selection signal identification in the genome.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom14060638/s1, Supplementary Table S1. The reference genome versions used for identifying candidate SGCSSs. Supplementary Table S2. Detailed information of 1450 resequencing samples. Supplementary Table S3. The number of SNP recorded in Ensembl dbSNP database (version 92). Supplementary Table S4. Summary of 150 SGCSSs and their corresponding sites among the other seven species. The position for each SGCSS is based on sheep reference genome. The nucleobases are extracted from nine species reference genomes. Supplementary Table S5. The substitution types of 150 SGCSSs. The position for each SGCSS is based on sheep reference genome. The nucleobases are extracted from nine species CDS data and then translated to peptides. Supplementary Table S6. The nucleotide diversity and Tajimas’D analysis for an SGCSS and its corresponding sites on five species. Supplementary Figure S1. A schematic diagram showing a group of probes in a species. Supplementary Figure S2. The nucleotide diversity analysis for an SGCSS and its corresponding sites on five species.

Author Contributions

Conceptualization, Q.W.; writing—original draft preparation, Z.L.; methodology and formal analysis, Z.L.; investigation and data curation, Z.L., X.Y. and Y.L. (Yangxiu Liu); writing—review and editing, Q.W., Z.L., X.Y. and Y.L. (Yangxiu Liu); visualization, X.Y., M.Y., L.Z. and Y.L. (Yue Luan); supervision, funding acquisition, and project administration, Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China, No. 2022YFD1600104; Chinese Universities Scientific Fund, No. 2023RC039.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article.

Acknowledgments

We acknowledge the Major Scientific and Technological Project in the Inner Mongolia Autonomous Region (2021ZD0019-2) for the support with the high-performance computing platform. Particularly, we thank Yiqiang Zhao and Ruifei Yang for their help with technical assistance and visualization. Thanks also to Yuzhe Wang for providing chicken genome resequencing data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Tedeschi, L.O.; Cannas, A.; Fox, D.G. A nutrition mathematical model to account for dietary supply and requirements of energy and other nutrients for domesticated small ruminants: The development and evaluation of the Small Ruminant Nutrition System. Small Rumin. Res. 2010, 89, 174–184. [Google Scholar] [CrossRef]
Alberto, F.J.; Boyer, F.; Orozco-terWengel, P.; Streeter, I.; Servin, B.; de Villemereuil, P.; Benjelloun, B.; Librado, P.; Biscarini, F.; Colli, L.; et al. Convergent genomic signatures of domestication in sheep and goats. Nat. Commun. 2018, 9, 813. [Google Scholar] [CrossRef] [PubMed]
Hammer, R.E.; Pursel, V.G.; Rexroad, C.E., Jr.; Wall, R.J.; Bolt, D.J.; Ebert, K.M.; Palmiter, R.D.; Brinster, R.L. Production of transgenic rabbits, sheep and pigs by microinjection. Nature 1985, 315, 680–683. [Google Scholar] [CrossRef]
Shumbusho, F.; Raoul, J.; Astruc, J.M.; Palhiere, I.; Lemarie, S.; Fugeray-Scarbel, A.; Elsen, J.M. Economic evaluation of genomic selection in small ruminants: A sheep meat breeding program. Animal 2016, 10, 1033–1041. [Google Scholar] [CrossRef] [PubMed]
Wilkes, D.; Li, G.; Angeles, C.F.; Patterson, J.T.; Huang, L.Y. A large animal neuropathic pain model in sheep: A strategy for improving the predictability of preclinical models for therapeutic development. J. Pain Res. 2012, 5, 415–424. [Google Scholar] [CrossRef] [PubMed]
Ganter, M. Zoonotic risks from small ruminants. Vet. Microbiol. 2015, 181, 53–65. [Google Scholar] [CrossRef] [PubMed]
Scott, M.R.; Will, R.; Ironside, J.; Nguyen, H.O.; Tremblay, P.; DeArmond, S.J.; Prusiner, S.B. Compelling transgenetic evidence for transmission of bovine spongiform encephalopathy prions to humans. Proc. Natl. Acad. Sci. USA 1999, 96, 15137–15142. [Google Scholar] [CrossRef] [PubMed]
Padilla, D.; Beringue, V.; Espinosa, J.C.; Andreoletti, O.; Jaumain, E.; Reine, F.; Herzog, L.; Gutierrez-Adan, A.; Pintado, B.; Laude, H.; et al. Sheep and Goat BSE Propagate More Efficiently than Cattle BSE in Human PrP Transgenic Mice. PLoS Pathog. 2011, 7, e1001319. [Google Scholar] [CrossRef] [PubMed]
Reviriego, F.J.; Moreno, M.A.; Dominguez, L. Risk factors for brucellosis seroprevalence of sheep and goat flocks in Spain. Prev. Vet. Med. 2000, 44, 167–173. [Google Scholar] [CrossRef]
Seleem, M.N.; Boyle, S.M.; Sriranganathan, N. Brucellosis: A re-emerging zoonosis. Vet. Microbiol. 2010, 140, 392–398. [Google Scholar] [CrossRef]
Wareth, G.; Melzer, F.; Tomaso, H.; Roesler, U.; Neubauer, H. Detection of Brucella abortus DNA in aborted goats and sheep in Egypt by real-time PCR. BMC Res. Notes 2015, 8, 212. [Google Scholar] [CrossRef]
Atig, R.K.; Hsouna, S.; Beraud-Colomb, E.; Abdelhak, S. Mitochondrial DNA: Properties and applications. Arch. De L’Institut Pasteur De Tunis 2009, 86, 3–14. [Google Scholar]
Hebert, P.D.N.; Ratnasingham, S.; deWaard, J.R. Barcoding animal life: Cytochrome oxidase subunit 1 divergences among closely related species. Proc. R. Soc. B Boil. Sci. 2003, 270, S96–S99. [Google Scholar] [CrossRef]
Haider, N.; Nabulsi, I.; Al-Safadi, B. Identification of meat species by PCR-RFLP of the mitochondrial COI gene. Meat Sci. 2012, 90, 490–493. [Google Scholar] [CrossRef]
Jahura, F.; Munira, S.; Bhuiyan, A.; Hoque, M.; Bhuiyan, M. Molecular detection of goat and sheep meat origin using mitochondrial cytochrome b gene. Bangladesh J. Anim. Sci. 2016, 45, 41–45. [Google Scholar] [CrossRef]
Altshuler, D.M.; Durbin, R.M.; Abecasis, G.R.; Bentley, D.R.; Chakravarti, A.; Clark, A.G.; Donnelly, P.; Eichler, E.E.; Flicek, P.; Gabriel, S.B.; et al. An integrated map of genetic variation from 1092 human genomes. Nature 2012, 491, 56–65. [Google Scholar]
Elsik, C.G.; Tellam, R.L.; Worley, K.C.; Gibbs, R.A.; Abatepaulo, A.R.R.; Abbey, C.A.; Adelson, D.L.; Aerts, J.; Ahola, V.; Alexander, L.; et al. The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution. Science 2009, 324, 522–528. [Google Scholar]
Groenen, M.A.M.; Archibald, A.L.; Uenishi, H.; Tuggle, C.K.; Takeuchi, Y.; Rothschild, M.F.; Rogel-Gaillard, C.; Park, C.; Milan, D.; Megens, H.J.; et al. Analyses of pig genomes provide insight into porcine demography and evolution. Nature 2012, 491, 393–398. [Google Scholar] [CrossRef] [PubMed]
Li, R.; Fan, W.; Tian, G.; Zhu, H.; He, L.; Cai, J.; Huang, Q.; Cai, Q.; Li, B.; Bai, Y.; et al. The sequence and de novo assembly of the giant panda genome. Nature 2010, 463, 311–317. [Google Scholar] [CrossRef]
Wang, J.; Wang, W.; Li, R.; Li, Y.; Tian, G.; Goodman, L.; Fan, W.; Zhang, J.; Li, J.; Zhang, J.; et al. The diploid genome sequence of an Asian individual. Nature 2008, 456, 60–65. [Google Scholar] [CrossRef]
Green, R.E.; Braun, E.L.; Armstrong, J.; Earl, D.; Nguyen, N.; Hickey, G.; Vandewege, M.W.; St John, J.A.; Capella-Gutierrez, S.; Castoe, T.A.; et al. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs. Science 2014, 346, 1254449. [Google Scholar] [CrossRef] [PubMed]
Jarvis, E.D.; Mirarab, S.; Aberer, A.J.; Li, B.; Houde, P.; Li, C.; Ho, S.Y.; Faircloth, B.C.; Nabholz, B.; Howard, J.T.; et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 2014, 346, 1320–1331. [Google Scholar] [CrossRef] [PubMed]
Kosiol, C.; Vinar, T.; da Fonseca, R.R.; Hubisz, M.J.; Bustamante, C.D.; Nielsen, R.; Siepel, A. Patterns of positive selection in six Mammalian genomes. PLoS Genet. 2008, 4, e1000144. [Google Scholar] [CrossRef] [PubMed]
Liu, Y.; Zhou, Q.; Wang, Y.; Luo, L.; Yang, J.; Yang, L.; Liu, M.; Li, Y.; Qian, T.; Zheng, Y.; et al. Gekko japonicus genome reveals evolution of adhesive toe pads and tail regeneration. Nat. Commun. 2015, 6, 10033. [Google Scholar] [CrossRef] [PubMed]
Wu, H.; Guang, X.; Al-Fageeh, M.B.; Cao, J.; Pan, S.; Zhou, H.; Zhang, L.; Abutarboush, M.H.; Xing, Y.; Xie, Z.; et al. Camelid genomes reveal evolution and adaptation to desert environments. Nat. Commun. 2014, 5, 5188. [Google Scholar] [CrossRef] [PubMed]
Köppel, R.; Daniels, M.; Felderer, N.; Brünen-Nieweler, C. Multiplex real-time PCR for the detection and quantification of DNA from duck, goose, chicken, turkey and pork. Eur. Food Res. Technol. 2013, 236, 1093–1098. [Google Scholar] [CrossRef]
Laube, I.; Zagon, J.; Broll, H. Quantitative determination of commercially relevant species in foods by real-time PCR. Int. J. Food Sci. Technol. 2007, 42, 336–341. [Google Scholar] [CrossRef]
Li, H.; Coghlan, A.; Ruan, J.; Coin, L.J.; Hériché, J.K.; Osmotherly, L.; Li, R.Q.; Liu, T.; Zhang, Z.; Bolund, L.; et al. TreeFam: A curated database of phylogenetic trees of animal gene families. Nucleic Acids Res. 2006, 34, D572–D580. [Google Scholar] [CrossRef] [PubMed]
Jiang, Y.; Xie, M.; Chen, W.B.; Talbot, R.; Maddox, J.F.; Faraut, T.; Wu, C.H.; Muzny, D.M.; Li, Y.X.; Zhang, W.G.; et al. The sheep genome illuminates biology of the rumen and lipid metabolism. Science 2014, 344, 1168–1173. [Google Scholar] [CrossRef]
Stajich, J.E.; Block, D.; Boulez, K.; Brenner, S.E.; Chervitz, S.A.; Dagdigian, C.; Fuellen, G.; Gilbert, J.G.R.; Korf, I.; Lapp, H.; et al. The bioperl toolkit:: Perl modules for the life sciences. Genome Res. 2002, 12, 1611–1618. [Google Scholar] [CrossRef]
Katoh, K.; Kuma, K.; Toh, H.; Miyata, T. MAFFT version 5: Improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005, 33, 511–518. [Google Scholar] [CrossRef]
Liu, K.; Raghavan, S.; Nelesen, S.; Linder, C.R.; Warnow, T. Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees. Science 2009, 324, 1561–1564. [Google Scholar] [CrossRef] [PubMed]
Talavera, G.; Castresana, J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 2007, 56, 564–577. [Google Scholar] [CrossRef] [PubMed]
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
Kumar, S.; Stecher, G.; Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol. Biol. Evol. 2016, 33, 1870–1874. [Google Scholar] [CrossRef] [PubMed]
Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M.A.R.; Bender, D.; Maller, J.; Sklar, P.; de Bakker, P.I.W.; Daly, M.J.; et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007, 81, 559–575. [Google Scholar] [CrossRef] [PubMed]
Danecek, P.; Auton, A.; Abecasis, G.; Albers, C.A.; Banks, E.; DePristo, M.A.; Handsaker, R.E.; Lunter, G.; Marth, G.T.; Sherry, S.T.; et al. The variant call format and VCFtools. Bioinformatics 2011, 27, 2156–2158. [Google Scholar] [CrossRef] [PubMed]
Olsen, H.G.; Knutsen, T.M.; Kohler, A.; Svendsen, M.; Gidskehaug, L.; Grove, H.; Nome, T.; Sodeland, M.; Sundsaasen, K.K.; Kent, M.P.; et al. Genome-wide association mapping for milk fat composition and fine mapping of a QTL for de novo synthesis of milk fatty acids on bovine chromosome 13. Genet. Sel. Evol. 2017, 49, 20. [Google Scholar] [CrossRef] [PubMed]
Puig-Oliveras, A.; Revilla, M.; Castelló, A.; Fernández, A.I.; Folch, J.M.; Ballester, M. Expression-based GWAS identifies variants, gene interactions and key regulators affecting intramuscular fatty acid content and composition in porcine meat. Sci. Rep. 2016, 6, 31803. [Google Scholar] [CrossRef]
Singh, K.; Davis, S.R.; Dobson, J.M.; Molenaar, A.J.; Wheeler, T.T.; Prosser, C.G.; Farr, V.C.; Oden, K.; Swanson, K.M.; Phyn, C.V.C.; et al. cDNA microarray analysis reveals that antioxidant and immune genes are upregulated during involution of the bovine mammary gland. J. Dairy Sci. 2008, 91, 2236–2246. [Google Scholar] [CrossRef]
Sturm, R.A.; Duffy, D.L. Human pigmentation genes under environmental selection. Genome Biol. 2012, 13, 248. [Google Scholar] [CrossRef] [PubMed]

Figure 1. One MNSA window centered on candidate SGCSS (green) without gaps. The corresponding sites in the other seven species are shown in gray. The length of MNSA fragment surrounded by candidate SGCSS is 20 bp (red rectangle).

Figure 2. Overview of the highly confident SGCSSs identification. MPSA, multiple peptide sequence alignment; MNSA, multiple nucleotide sequence alignment.

Figure 3. Single copy orthologs examined in the nine species. The proportion of single copy genes in whole genes of every species are represented in blue color and other types are shown in green.

Figure 4. The PCA and phylogenetic analysis among 96 species by using 150 SGCSSs and their corresponding sites from the other 94 species. (a) First two principal components are used for these 96 species. (b) The unrooted phylogenetic tree constructed using maximum-likelihood algorithm in MEGA v7.0.18. The lines in red color refer to sheep, goat, and Tibetan antelope.

Figure 5. The matrix for the number of common non-synonymous substitutions between nine species (a) and the phylogenetic tree constructed using 27 non-synonymous sites in nine species (b).

Figure 6. The Tajima’s D analysis for an SGCSS and its corresponding sites on five species. The SGCSS is located at chr13:63638708 of sheep reference genome. To perform this analysis, a 200 kb region centered on the SGCSS is extracted from sheep genome (red). Similarly, the 200 kb regions centered on the corresponding sites are extracted from goat (green), cow (blue), pig (blue), and chicken (blue) reference genomes.

Table 1. The number of remaining candidate SGCSSs after each filter step.

Before	After	Filter Factors
-	3245	Non-candidate SGCSSs
3245	3230	No genomic coordinates
3230	2136	SNPs observed in dpSNP
2136	1158	No records or quality control (human 100-way alignments)
1158	150	D value

Table 2. The frequency of homozygous reference genotypes of 4 sites of 150 SGCSSs. The coordinates of these four sites are based on sheep reference genome.

Chr	Position	Genotype1	Genotype2	Sheep	Goat	Cow	Pig	Chicken
5	47181590	GG, CC	AA, TT	1	1	1	0.984	1
12	10283861	TT	GG, CC	1	1	1	1	0.993
12	40543138	AA	GG, CC	1	1	1	1	0.993
21	39262113	CC	AA, TT	0.95	1	1	1	1

Genotype1 represents the homozygous reference genotype in sheep and goat. Genotype2 represents the homozygous reference genotype in cow, pig, and chicken.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Genome-Wide Identification of Specific Genetic Loci Common to Sheep and Goat

Abstract

1. Introduction

2. Materials and Methods

2.1. Data Collection

2.2. Gene Family Clusters Construction

2.3. Sequence Alignment Processing

2.4. Searching Candidate SGCSSs

2.5. Scoring Candidate SGCSSs

2.6. The Specificity of SGCSS Analysis

2.7. Tests for Selection Signals

3. Results and Discussion

3.1. Identification of the Highly Confident SGCSSs

3.2. Evaluation of Intraspecific Heterogeneity

3.3. Evaluation of Interspecific Divergence

3.4. Characterization of Selected SGCSSs

3.5. Case Study for Functionality of SGCSSs

4. Conclusions

Supplementary Materials

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics