1. Introduction
Animal husbandry is a cornerstone of modern agriculture. The identification, protection, and evaluation of germplasm resources are prerequisites for resolving the “stuck neck” (bottleneck) problem at the source of China’s livestock industry. Over the course of their evolution, livestock breeds have developed diverse production types and rich genetic diversity [
1]. China is rich in waterfowl. However, with the continuous expansion of human activities and increasing environmental pressures, waterfowl breeds are facing serious threats and declines [
2,
3]. The current “List of Reusable Breeds of Animal Genetic Resources in China” includes 30 local goose breeds. Thirteen goose breeds are close to endangered status, fourteen local breeds are on the brink of extinction, and three breeds have already become extinct, including
Caohai,
Wenshan, and
Simao geese [
4,
5]. A national census of genetic resources, conducted to protect the diversity of rare and endemic endangered goose breeds, found that some local goose breeds, such as the Lingxian White (LX), Yangjiang (YJ), Yan (YE), Wuzong (WZ), Baizi (BZ), and Xupu (XP) geese (
Anser cygnoides), have fewer than 1000 hens [
6]. In some regions, the production efficiency and economic output of local breeds are low, so high-yield breeds are often introduced for hybridization. The resulting uncontrolled crossbreeding is a serious problem: it reduces the number of purebred birds and has pushed some existing goose breeds close to extinction.
The genetic diversity of organisms is the basis for ensuring the survival and evolution of breeds and is of great significance for evolutionary polymorphism analyses, genetic relationship analyses, germplasm resource optimization, and the protection of existing breeds [
7]. Currently, microsatellite markers and whole genome sequencing methods are used to study the genetic diversity and population structure of endangered goose breeds [
8,
9,
10]. Wen et al. also used whole genome sequencing to trace the origins of Chinese domestic geese [
11]. Most domestic goose breeds in China came from
Anser cygnoides, whereas European domestic geese were derived from
Anser anser. Among Chinese breeds, only the Ili goose, which is distributed throughout Xinjiang, originated from
Anser anser [
11]. Heikkinen et al. selected 14 breeds (
Anser anser) from the Eurasian continent and conducted the first genome-based inference using whole genome markers of European geese. They reported a fixation index of 0.15800 for European geese (greylag and domestic geese) [
12]. In these studies, because few markers were known, the investigators used markers developed in other species or breeds. As a result, few alleles were detected at many loci, or the target sequence failed to amplify by polymerase chain reaction (
PCR). Weiß et al. isolated several microsatellite markers from greylag geese (
Anser anser) [
13]. However, most of those markers revealed low levels of polymorphism in endemic endangered goose breeds. Therefore, the use of microsatellite markers and whole genome sequencing methods to investigate the genetic diversity of species has certain limitations. The mitochondrial genome has unique characteristics that distinguish it from the nuclear genome.
Mitochondrial DNA (mtDNA) has a simple structure, is maternally inherited, evolves rapidly, and undergoes almost no recombination, making it one of the most useful molecular genetic markers for studying a species’ origin, evolution, and classification [
14]. mtDNA is present in nearly all animal cells. It is a circular genome (16,569 base pairs [bp] in humans) that is passed down through the maternal line [
15]. Each cell contains many copies of the mitochondrial genome, which typically comprises 13 protein-coding genes,
22 tRNA genes,
2 rRNA genes, and non-coding regions. The mtDNA cytochrome b (
CYTB) gene has a moderate evolutionary rate and is suitable for detecting genetic differences at the population level. It is an ideal marker for studying a population’s genetic structure and diversity [
16,
17]. Previous studies have mostly focused on the mitochondrial
D-loop,
ND6, and
COI regions. Abdel-Kafy et al. used the
D-loop region to study the phenotype and genetic characteristics of Egyptian geese and found that the potential heritability of the head, stem, tarsal length, and live weight was relatively low [
18]. Jia et al. used the mitochondrial
ND6 region of chickens to explore the sequence combinations of several different regions between breeds, which can provide a more comprehensive and accurate understanding of the maternal origin of chickens [
19]. Zhang et al. suggested that the mitochondrial
COI region could be amplified to identify goose breeds [
20]. However, no systematic study has examined the genetic diversity and evolution of the mitochondrial
CYTB gene in the six locally endangered goose breeds (
Anser cygnoides) included in our study.
In this study, blood samples from six locally endangered goose breeds were collected, and the DNA was purified.
PCR amplification and Sanger sequencing were performed on
CYTB mtDNA molecular markers to establish a DNA barcode database. The complete sequence of the mitochondrial
CYTB gene has been used as a molecular marker to evaluate the genetic diversity of mtDNA and identify specific polymorphic sites [
21]. This mitochondrial marker approach is expected to provide a theoretical basis for the identification, protection, breeding, and utilization of the genetic resources of endemic endangered geese in China.
4. Discussion
Biodiversity includes genetic, species, and ecosystem diversity [
31]. Research on the genetic diversity of organisms is a prerequisite for exploring the evolution of life and species diversity. The genetic diversity of animals can be analyzed at the genetic and molecular levels [
32]. By studying genetic diversity with molecular markers, including nuclear DNA markers (microsatellites and internal transcribed spacer 2 [
ITS2]) and mitochondrial markers (
cytochrome b,
COI, and
ND4), we can reveal the variants or genes underlying phenotypic changes in breeds that may have evolved rapidly after domestication and acquired breed-specific characteristics. Regions or loci that have undergone selection show characteristic signatures, including high population differentiation, markedly reduced nucleotide diversity, and long-range haplotype homozygosity [
33,
34]. Therefore, based on these principles, we studied different parameters by measuring specific sites in the mtDNA
CYTB region. The highest haplotype diversity and average number of nucleotide differences detected within the six goose breeds included in our study were 0.993 ± 0.011 and 8.463 ± 0.202, respectively. Li et al. determined the mtDNA
D-loop sequences of 26 goose breeds
(Anser cygnoides) and six
Landes geese (
Anser anser), with an average
Hd of 0.1384 and
Pi of 0.00029 for common breeds of Chinese geese [
35]. By comparison, the genetic diversity of the endemic endangered goose breeds in our study is considerably higher than that reported for other goose breeds (
Anser cygnoides). Therefore, we believe that the mitochondrial
CYTB region shows strict maternal inheritance, a conserved genetic structure, and a moderate evolutionary rate, making it an effective molecular marker for studying the genetic structure of endangered waterfowl. This difference may be owing to site-specific differences between the coding and non-coding regions of the mtDNA fragment. When the obtained
CYTB sequences of the six goose breeds were compared with GenBank data using BLAST, the species was identified as
Anser cygnoides. A study of greylag geese (
Anser anser) using 204 bp fragments of the mitochondrial control region showed that the A + T content was slightly higher than the G + C content [
36]. This pattern is consistent with our results. One possible explanation is that the mutated codon positions in the protein-coding sequence were under weaker selective pressure. In the intraspecific sequence comparison of endemic endangered goose breeds, the similarity between partial sequences of individuals clustered at approximately 97%, 98%, and 100%. This result is similar to that obtained by Ran et al. for mountain plum chickens [
37]. In our research, we chose a specific gene segment (the
CYTB region) rather than the entire mitochondrial genome for sequencing. Sequencing longer stretches spanning several genes of the mtDNA may therefore provide more comprehensive information on specific mutation sites than sequencing shorter fragments, a hypothesis that requires further testing. Based on the genetic diversity information obtained for the endemic endangered geese, we selected geese with desirable mutation site sequences as breeding targets and formulated corresponding protection and restoration plans, including the establishment of artificial incubation centers and the implementation of habitat restoration, to increase population size and maintain genetic diversity. These results provide further data for optimizing the population structure of endemic endangered breeds and improving breeding methods.
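To make the diversity indices discussed above concrete, the following minimal Python sketch computes Nei's haplotype diversity (Hd), the average number of nucleotide differences (k), and nucleotide diversity (Pi) from aligned sequences. The function names and toy sequences are hypothetical; they are not the data or the software of this study, and the sketch assumes equal-length, gap-free alignments.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: Hd = n / (n - 1) * (1 - sum(p_i ** 2))."""
    n = len(seqs)
    counts = Counter(seqs)  # identical sequences count as one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def mean_pairwise_differences(seqs):
    """Average number of nucleotide differences (k) over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Pi: mean pairwise differences per site (assumes equal-length sequences)."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Hypothetical toy alignment (not real CYTB data): 4 sequences, 3 haplotypes.
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGA"]
hd = haplotype_diversity(toy)
k = mean_pairwise_differences(toy)
pi = nucleotide_diversity(toy)
print(f"Hd={hd:.4f}, k={k:.4f}, Pi={pi:.5f}")
```

Because Hd depends only on haplotype frequencies while k and Pi depend on how many sites differ, a sample can show a high Hd together with a low Pi when many haplotypes differ by only one or two substitutions.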
The assembly of haplotype genomes is of great significance for the analysis of structural variation between haplotypes, the study of the evolution of breeds’ genetic origins, the evolution of sex chromosomes, deleterious mutations, and the exploration of the molecular mechanisms of hybrid vigor [
38,
39]. The number of mtDNA haplotypes at trapping sites positively correlates with the number of individuals examined per population [
40]. Although mtDNA is inherited maternally, the number of haplotypes in males is not closely related to the number in females [
41]. In this study, we collected blood samples from 180 geese, and 81 haplotypes were detected based on nucleotide variation between sequences (a high proportion of female haplotypes). A phylogenetic tree constructed from
CYTB gene haplotypes showed that the breeds could be divided into six major branches, indicating limitations in their genetic lineage. A study of
Neocaridina denticulata showed that the phylogenetic relationships between different groups are cross-nested and did not show distinct geographical aggregation [
42]. This indicates that haplotype differentiation between those groups was not apparent. Because haplotypes were markedly differentiated in our goose breeds, our results differ from those reported for other poultry breeds [
37]. We found that the number of unique haplotypes in YE, XP, and WZ geese exceeded 10 (
N > 10). Haplotype sharing was observed between samples collected from different breeds, except for Hap_10 (1.23%). Other endemic endangered breeds formed relatively distinct geographical structures, and groups in different geographical locations mostly had unique haplotypes. The results for the variable sites of the different haplotypes showed that Hap_4 and Hap_10 were the dominant haplotypes among the groups (
N > 12). This may be owing to the widespread genetic mixing between and within goose breeds; moreover, human factors cannot be ruled out (artificial introduction of foreign genes and isolation and protection). After checking for base insertions and deletions, the proportion of mutation sites involving adenine (A) was found to be relatively high (43.33%), indicating a strong transversion bias, which is unusual because animal mtDNA typically shows a transition bias. The conclusion reached by Boman et al. [
43] is contrary to that of our study. This unusual pattern may reflect the short evolutionary time involved; both female relatedness and limited male-mediated gene flow significantly reduce mtDNA genetic variation in breeding colonies or small breeds [
44]. In addition, insertions and deletions introduced during sequence processing can alter the mtDNA sequence, so the types of variation (transitions and transversions) require further study. In summary, our results suggest that large geographical distances and human activities jointly restricted the spread of haplotypes in the goose populations studied, thereby increasing the differentiation between the northern and southern haplotype branches of the six endemic endangered breeds.
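The distinction between transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse) can be illustrated with a short Python sketch that classifies the substitutions between two aligned sequences. The function names and example sequences are hypothetical, and gapped or ambiguous positions are simply skipped, which is an assumption of this sketch rather than the procedure used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'transition', 'transversion', or None if the bases are identical."""
    if a == b:
        return None
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumption of this sketch)
        kind = classify_substitution(a, b)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv


# Hypothetical example: A>G and T>C are transitions, C>A is a transversion.
ts, tv = count_ts_tv("ACGTAC", "GCGCAA")
print(f"transitions={ts}, transversions={tv}")
```

A transition-to-transversion ratio computed in this way gives a direct check on whether a data set shows a transition or a transversion bias.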
The average genetic distance between breeds reflects how closely the groups are related, that is, their genetic similarity [
45]. In our research, BLAST sequence comparison and inter-breed clustering analysis of 180 samples showed that the six goose breeds were roughly divided into two large evolutionary groups. BZ, XP, LX, and YE geese were divided into three subgroups. These were all local breeds in China; therefore, they were grouped into one major category. BLAST analysis of all 81 haplotypes against GenBank data identified
Anser cygnoides as the most likely species. Our results were consistent with those of previous studies that used mitochondrial data to determine the origins of domestic geese in China and Europe [
11,
46]. The larger the value of
Fst, the greater the genetic divergence between breeds and the higher the degree of population differentiation [
47]. We observed the relationships between locally endangered goose breeds. The LX, XP, and BZ geese had the highest degree of genetic differentiation (
Fst > 0.8), and gene flow was low (
Nm < 1). There is substantial genetic differentiation between breeds when
Fst ≥ 0.33 [
48]. Overall, genetic exchange among the LX, XP, and BZ geese was low compared with that among the other three breeds. Substantial genetic differentiation caused by genetic drift may have occurred, which is consistent with the results of the genetic distance analysis. The degree of genetic differentiation among the LX, YJ, YE, and WZ goose breeds was relatively low (
Nm > 15.00). Frequent gene exchange can effectively inhibit genetic drift and reduce the risk of genetic differentiation among groups [
49]. However, overall genetic exchange between groups was uneven and was strongly affected by geographical isolation. No significant difference was observed between breeds that maintained a high level of genetic diversity (
p > 0.05). In this study, the overall Wright’s fixation index among endemic endangered goose breeds in China was low (
Fst = 0.07099), indicating that the overall genetic structures of the breeds were largely similar. We speculate that this reflects the gradual expansion of goose farming in northern China, in which superior sire breeds (
Landes and
Rhine) are used for crossbreeding [
50]. Frequent human activities (such as setting up corridors) promote male-mediated gene flow between adjacent breeds, thereby promoting dispersal and genetic exchange among goose breeds.
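The inverse relationship between Fst and Nm discussed above can be sketched with Wright's island-model approximation. For a haploid, maternally inherited marker such as mtDNA, a commonly used form is Nm ≈ (1/Fst − 1)/2, whereas for diploid nuclear markers the divisor is 4. This section does not state which formula was used to obtain the reported Nm values, so the function below (with a hypothetical name and parameters) is an assumption for illustration only.

```python
def nm_from_fst(fst, maternal=True):
    """Estimate gene flow (Nm) from Fst under Wright's island model.

    maternal=True  -> haploid, maternally inherited marker: Nm = (1/Fst - 1) / 2
    maternal=False -> diploid nuclear marker:              Nm = (1/Fst - 1) / 4
    """
    if not 0 < fst <= 1:
        raise ValueError("Fst must lie in the interval (0, 1]")
    divisor = 2 if maternal else 4
    return (1 / fst - 1) / divisor


# Values taken from the text; the mapping to Nm relies on the assumed formula.
for fst in (0.8, 0.33, 0.07099):
    print(f"Fst={fst}: Nm={nm_from_fst(fst):.3f}")
```

Under this assumed formula, Fst > 0.8 corresponds to Nm < 0.125, and Nm > 15 corresponds to an Fst below roughly 0.032, which matches the qualitative pattern described above: strongly differentiated breeds exchange few migrants, and weakly differentiated breeds exchange many.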
The observation that genetic diversity varies far less among species than census population size does is known as Lewontin’s paradox [
51]. The reason for this discrepancy is unclear; however, multiple factors are likely involved (overharvesting, habitat fragmentation, and chance events). To determine whether a population has experienced expansion, we used two approaches to infer demographic history: mismatch distribution analysis and neutrality tests. After an expansion, genetic differences between most individuals are small because they derive mainly from a small ancestral group. In this case, the mismatch distribution is unimodal and resembles a Poisson distribution. Tajima’s D test is more likely to reveal ancient population expansion, whereas Fu’s Fs test is more sensitive to recent population expansion [
52]. In our study, Fu’s Fs for the BZ, LX, and YE geese was negative, whereas Tajima’s D was positive. However, neither value was statistically significant, so neutrality could not be rejected. Similar results were obtained in a diversity analysis of slender fish in the Yalu River Basin [
53]. In the mismatch distribution plots for the six goose breeds, the expected values for pairwise differences in the
CYTB region formed a smooth curve, whereas the observed values showed a single dominant peak. The mismatch distributions of the mitochondrial
CYTB sequences of the BZ, LX, YE, and XP breeds were typically unimodal, consistent with population expansion, from which we infer that these four breeds have recently expanded in population size. This pattern may result from human intervention in the introduction and breeding of endemic endangered breeds and from the designation of nature reserves that protect their natural reproduction from environmental threats. The increasing frequency of extreme weather events caused by global warming and climate change might also affect the population fluctuation patterns of endemic endangered breeds [
54]. For the WZ and YJ geese, Fu’s Fs was close to zero and Tajima’s D was positive, suggesting that these two groups were relatively stable. In the mismatch distribution plots, the expected values for pairwise differences formed a smooth curve, whereas the observed values in the two breeds were multimodal. We therefore speculate that the WZ and YJ breeds were relatively stable and did not experience population expansions or bottlenecks. These findings differ from those of previous studies [
55]. Sampling sites differ in geography, and therefore in temperature and environment, which may influence local adaptation. Human factors such as culling and release have also caused differences between endemic endangered groups in different geographical locations and habitats [
56]. The current climate may alter population fluctuation cycles, affecting the dispersal patterns of WZ and YJ geese. Future research should compare the genetic diversity revealed by hybridization studies and different molecular markers (including genomic markers) between current and future populations at the same research sites.
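The two demographic approaches discussed above can be sketched in a few lines of Python: the observed mismatch distribution is the histogram of pairwise differences, and Tajima's D compares the mean number of pairwise differences with the number of segregating sites. The function names and toy alignment are hypothetical, the sketch assumes equal-length and gap-free sequences, and it does not reproduce the software or significance testing used in this study.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pairwise_differences(seqs):
    """Number of differing sites for every pair of aligned sequences."""
    return [sum(a != b for a, b in zip(s1, s2))
            for s1, s2 in combinations(seqs, 2)]


def mismatch_distribution(seqs):
    """Observed number of sequence pairs at each count of differences."""
    return dict(sorted(Counter(pairwise_differences(seqs)).items()))


def tajimas_d(seqs):
    """Tajima's D; None if undefined (fewer than 4 sequences or no variation)."""
    n = len(seqs)
    seg_sites = sum(len(set(column)) > 1 for column in zip(*seqs))
    if n < 4 or seg_sites == 0:
        return None
    diffs = pairwise_differences(seqs)
    k = sum(diffs) / len(diffs)  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k - seg_sites / a1) / sqrt(variance)


# Hypothetical toy alignment (not real CYTB data).
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGA"]
print(mismatch_distribution(toy))
print(tajimas_d(toy))
```

A unimodal histogram from `mismatch_distribution` is the pattern associated with expansion, whereas a ragged or multimodal one is the pattern associated with a stable population; the sign of D alone is not evidence of either without a significance test.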