Article

A Variant-Centric Analysis of Allele Sharing in Dogs and Wolves

by Matthew W. Funk 1 and Jeffrey M. Kidd 1,2,*
1 Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
2 Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
* Author to whom correspondence should be addressed.
Genes 2024, 15(9), 1168; https://doi.org/10.3390/genes15091168
Submission received: 8 August 2024 / Revised: 28 August 2024 / Accepted: 30 August 2024 / Published: 5 September 2024
(This article belongs to the Section Animal Genetics and Genomics)

Abstract

Canines are an important model system for genetics and evolution. Recent advances in sequencing technologies have enabled the creation of large databases of genetic variation in canines, but analyses of allele sharing among canine groups have been limited. We applied GeoVar, an approach originally developed to study the sharing of single nucleotide polymorphisms across human populations, to assess the sharing of genetic variation among groups of wolves, village dogs, and breed dogs. Our analysis shows that wolves differ from each other at an average of approximately 2.3 million sites while dogs from the same breed differ at nearly 1 million sites. We found that 22% of the variants are common across wolves, village dogs, and breed dogs, that ~16% of variable sites are common across breed dogs, and that nearly half of the differences between two dogs of different breeds are due to sites that are common in all clades. These analyses represent a succinct summary of allele sharing across canines and illustrate the effects of canine history on the apportionment of genetic variation.

1. Introduction

Domestic dogs (Canis lupus familiaris) are a powerful model system for genetics and evolutionary biology. Genetic evidence from modern and ancient samples indicates that dogs were domesticated from a now-extinct lineage of wolves 20,000 to 30,000 years ago somewhere in Eurasia [1,2,3]. Global surveys of dog genetic diversity show a clear separation between samples of western Eurasian and eastern Eurasian origin [4]. Since domestication, dogs have experienced a complex demographic history, including waves of global migration that parallel the movement of human populations [3]. These demographic events include bottlenecks, expansions, and periods of interbreeding and population replacement that have left distinct signatures on the patterns of canine genetic diversity [5,6]. Modern dog breeds, which are genetically closed populations bred towards specified phenotypes, are a relatively recent development: they originated within the past 400 years, and most arose during the 19th century [7]. The unique genetic structure of modern dog breeds facilitates the identification of alleles affecting disease susceptibility, morphology, and behavior [8].
The severe bottlenecks and sustained, small, effective population sizes associated with breed formation have had profound effects on canine genomes, including an increased load of deleterious alleles [6], increased presence of recessively inherited disorders [9], and an associated decrease in overall fitness [10,11,12]. Despite their large phenotypic diversity and popularity as companion animals, most canine genetic variation is not found among breed dogs. Rather, most genetic diversity is found in populations of dogs that live as semi-feral human commensals around the world [13,14]. These populations, known as village dogs or street dogs, better represent the state of dogs throughout their long history prior to the formation of modern breeds. Importantly, genetic studies have confirmed that village dogs represent distinct populations that have retained the genetic diversity lost during the formation of breeds and are not simply mixtures of breed dogs that have “escaped” [15]. Thus, breed dogs, village dogs, and wolves reflect distinct aspects of canine history.
The relevance of dogs to human disease studies has led to the growth of a robust canine genetics research community. This research community has developed valuable resources, including a well-annotated reference genome [16,17], detailed phenotype information coupled with databases of known genetic variation that enable efficient genome-wide trait mapping [18], and publicly available collections of samples with short-read whole genome sequence data [19,20,21]. This strong research foundation has been expanded by the recent availability of multiple high-quality reference genomes derived from long-read sequencing technologies [22,23,24,25,26,27,28,29]. Recently, the Dog10K consortium released whole genome sequencing data from a diverse collection of nearly 2000 samples, including wolves, village dogs, and breed dogs [21]. This collection offers an unbiased view of canine genome variation and is a valuable resource for trait mapping and evolutionary studies. Importantly, this sequence-based resource represents a genome-wide perspective on patterns of genetic variation in canines without the biases associated with genotyping arrays [30,31]. However, effectively visualizing the sharing of genetic variation among multiple sample categories in this large collection remains a challenge.
Visualizing the large genetic data sets generated from geographically diverse human populations has also been a challenge. In 2020 Biddanda, Rice, and Novembre developed a technique, known as GeoVar, to summarize the sharing patterns of alleles of different frequencies [32]. The GeoVar approach bins alleles into the following three categories based on their frequency in a population: unobserved, coded as U; rare, coded as R; and common, coded as C. The joint distribution of frequencies across populations is then conveyed using the encoding for each variant. For example, a variant that is common in each of four populations will be coded as ‘CCCC’, while a variant that is rare in the first population and unobserved in the other three will be coded as ‘RUUU’. The overall pattern of allele frequencies can then be depicted as the relative frequency of each encoding, which is represented as a GeoVar plot [32]. Essentially, this scheme represents a discretization of the multi-population site frequency spectrum and allows for a greater understanding of how rare and common variations are distributed among various groups. The application of this approach to human data revealed that the majority of variants are rare in one geographic location and unobserved elsewhere; variants that are common in one region are likely to also be found globally; that most of the differences between two individuals are due to globally common alleles; and that genotyping arrays are heavily biased toward globally common alleles.
In this study, we analyze genome-wide single nucleotide polymorphism (SNP) data from the Dog10K project to investigate patterns of allele sharing among canines. First, we estimate the average number of differences found at accessible SNP positions between samples, confirming that wolves show the greatest amount of genetic variation. We then apply the GeoVar approach to breed dogs, village dogs, and wolves, as well as to smaller sub-groupings among breed and village dogs. We find that globally common alleles are more frequent in canines as compared to similarly processed human samples; that ~71% of the sites that differ between two breed dogs or two village dogs are common throughout canines; and that canine genotyping arrays are strongly biased toward common alleles.

2. Materials and Methods

2.1. Canine Variant Data Processing

Single nucleotide polymorphism (SNP) data were obtained from the Dog10K consortium based on the alignment of Illumina sequence data to the canFam4/UU_Cfam_GSD_1.0 reference assembly [21]. We utilized the final ‘strict filtering’ sample set that includes 1929 individuals, consisting of 1579 breed dogs, 281 village dogs, 57 wolves, and 12 dogs that have a mixed origin or are not recognized by any international registering body. We further filtered the sites to remove variation on the sex chromosomes, non-biallelic SNPs, SNPs with missing data, and SNPs that were not variable among the analyzed samples. A total of 26,585,484 SNPs (79% of the original SNPs marked as ‘PASS’ in the VCF file) were retained. This filtered SNP set, with no missing data, was used for all analyses, including distances between dog samples and patterns of allele sharing.
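The four site filters described above can be sketched as a single predicate. This is only an illustration, not the pipeline used in the study: the function name `keep_snp`, the genotype coding (alternate-allele counts 0, 1, 2, with `None` for a missing call), and the sex chromosome labels `chrX` and `chrY` are all assumptions introduced here.

```python
from typing import Optional, Sequence

# Assumed coding: number of alternate alleles (0, 1, 2); None marks a missing call.
Genotype = Optional[int]


def keep_snp(
    chrom: str,
    alleles: Sequence[str],
    genotypes: Sequence[Genotype],
    sex_chroms: Sequence[str] = ("chrX", "chrY"),  # hypothetical labels
) -> bool:
    """Return True if a site passes the four filters listed in the text."""
    # 1. Remove variation on the sex chromosomes.
    if chrom in sex_chroms:
        return False
    # 2. Keep only biallelic single-nucleotide sites.
    if len(alleles) != 2 or any(len(a) != 1 for a in alleles):
        return False
    # 3. Require a genotype call in every sample (no missing data).
    if any(g is None for g in genotypes):
        return False
    # 4. Require the site to be variable among the analyzed samples.
    alt_count = sum(genotypes)
    return 0 < alt_count < 2 * len(genotypes)
```

In practice these filters would be applied while streaming a VCF file; the predicate form simply makes each criterion explicit.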

2.2. SNP Distance Calculation

An estimate of SNP distances between a pair of samples over k biallelic SNPs was calculated, as in [33], as $\sum_{i=1}^{k} d_i$, where $d_i$ is 1 when the pair has opposite homozygous genotypes; ½ when one sample is homozygous and the other heterozygous; and 0 when both samples have the same genotype. This is a conservative estimate of the number of differences expected between two alleles randomly sampled from two diploid individuals, as heterozygotes are assumed to be concordant. For village dogs, wolves, and within breeds, the distance was calculated between all possible pairs of individuals. For the full breed dog analysis, one sample from each breed was selected.
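The distance defined above can be written compactly if genotypes are coded as alternate-allele counts (0, 1, 2), because the per-site term then equals half the absolute difference between the two codes. The sketch below assumes that coding; the function names `snp_distance` and `mean_pairwise_distance` are illustrative and do not come from the original analysis code.

```python
from itertools import combinations
from typing import Dict, Sequence


def snp_distance(g1: Sequence[int], g2: Sequence[int]) -> float:
    """Sum of per-site distances d_i between two samples.

    d_i = 1   for opposite homozygotes (0 vs 2),
    d_i = 0.5 for homozygote vs heterozygote (0 vs 1, or 1 vs 2),
    d_i = 0   for identical genotypes (including two heterozygotes).
    """
    if len(g1) != len(g2):
        raise ValueError("samples must be genotyped at the same sites")
    return sum(abs(a - b) for a, b in zip(g1, g2)) / 2.0


def mean_pairwise_distance(samples: Dict[str, Sequence[int]]) -> float:
    """Mean SNP distance over all possible pairs of samples."""
    pairs = list(combinations(samples.values(), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(snp_distance(a, b) for a, b in pairs) / len(pairs)
```

Note that two heterozygous samples contribute 0 at a site, which is what makes the estimate conservative.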

2.3. Canine GeoVar Analysis

Allele sharing analysis was performed based on the minor allele frequency of each SNP found in the analyzed sample set using the GeoVar software (version 1.0.2), as previously described [32]. Each analysis followed the same workflow and was based on the filtered SNP set with no missing data. First, SNP genotypes were extracted for the samples used in the analysis and SNPs that were not polymorphic among the analyzed samples were discarded. Next, each SNP was categorized as being absent, rare (minor allele frequency < 5%) or common in each sample grouping, and the total count of SNPs having each combination of frequency categories was determined. Since the variation discovered is sensitive to differences in sample size, the analysis was performed based on a fixed number of individuals per category using a random selection of samples. For the analysis of sharing among canines, 50 wolves, 50 village dogs, and 50 breed dogs were randomly selected. For the analysis of village dog groups, 35 samples in each category were randomly selected. For breeds, 30 samples in each breed clade were analyzed. For each analysis, five different random samples were analyzed to assess the effect of sample selection on the overall result.
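The categorization step above can be illustrated with a minimal sketch. It is not the GeoVar software's API: the function names are hypothetical, genotypes are assumed to be coded as alternate-allele counts (0, 1, 2), and the minor allele is assumed to be defined from the pooled set of analyzed samples and then tracked in each group.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def frequency_code(allele_count: int, n_chrom: int, threshold: float = 0.05) -> str:
    """Code one group: U (unobserved), R (rare, < threshold), or C (common)."""
    if allele_count == 0:
        return "U"
    return "R" if allele_count / n_chrom < threshold else "C"


def geovar_code(
    group_genotypes: Sequence[Sequence[int]], threshold: float = 0.05
) -> Optional[str]:
    """Return a code such as 'RUU' for one SNP, or None if it is not polymorphic.

    group_genotypes holds one genotype list per sample group, in a fixed order.
    """
    alt_counts = [sum(g) for g in group_genotypes]
    n_chroms = [2 * len(g) for g in group_genotypes]
    total_alt, total = sum(alt_counts), sum(n_chroms)
    if total_alt == 0 or total_alt == total:
        return None  # monomorphic among the analyzed samples; discarded
    # Assumption: follow the allele that is minor in the pooled sample.
    if 2 * total_alt > total:
        counts = [n - a for a, n in zip(alt_counts, n_chroms)]
    else:
        counts = alt_counts
    return "".join(frequency_code(c, n, threshold) for c, n in zip(counts, n_chroms))


def tally_codes(sites: Iterable[Sequence[Sequence[int]]]) -> Counter:
    """Count how many polymorphic SNPs fall into each frequency-code combination."""
    codes = (geovar_code(site) for site in sites)
    return Counter(code for code in codes if code is not None)
```

With groups ordered as wolves, village dogs, and breed dogs, a site carried by a single heterozygous wolf among 50 samples per group would be coded 'RUU'.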

2.3.1. Village Dog Classification

Sample groups for village dog analyses were defined based on principal component analysis (PCA) and the reported sample location for all village dogs in the strict filter sample set (Table S1). Prior to PCA, SNPs were filtered with PLINK, Version 1.9 [34] to remove SNPs with a minor allele frequency of <5% and SNPs in high linkage disequilibrium (using PLINK --indep-pairwise 50 10 0.1). PCA was performed using PLINK.

2.3.2. Breed Dog Classification

Breed dog clades were defined as in Meadows et al. [21]. Only clades with more than 30 individuals were retained for analysis (Table 1).

2.3.3. Microarray and Two-Sample Comparisons

To assess the effect of variant ascertainment, the analysis was repeated based on sites included on the Illumina Canine HD array. The analysis was also performed based on variants that differ between two individuals. For this, a sample from each of the two categories being analyzed was selected from the samples not included in the randomly selected subset used in the sharing analysis. For the breed dogs, comparisons were made between the largest clade, Scenthounds, and the three clades with the most rare variation.
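The two-sample comparison can be sketched as a tally of frequency codes restricted to sites where two held-out individuals differ. The text does not specify exactly how "differ" was defined, so the sketch assumes it means unequal genotypes; the function name and the inputs (per-site codes computed beforehand from the subsampled groups) are likewise illustrative.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def code_proportions_at_differences(
    g1: Sequence[int],
    g2: Sequence[int],
    codes: Sequence[Optional[str]],
) -> Dict[str, float]:
    """Proportion of each frequency code among sites where two samples differ.

    g1, g2 : genotypes of the two held-out individuals (0, 1, 2 coding assumed).
    codes  : per-site frequency code from the group analysis, or None if the
             site was not polymorphic in the subsampled groups.
    """
    if not (len(g1) == len(g2) == len(codes)):
        raise ValueError("inputs must cover the same sites")
    # Assumption: a site "differs" when the two genotypes are not identical.
    tally = Counter(
        code for a, b, code in zip(g1, g2, codes) if a != b and code is not None
    )
    total = sum(tally.values())
    return {code: n / total for code, n in tally.items()} if total else {}
```

Sites that differ between the two individuals but are monomorphic in the subsampled groups receive no code and are skipped here; whether the original analysis handled them the same way is not stated.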

2.4. Human GeoVar Analysis

The analysis of human allele sharing was performed to assess the effect of down sampling to 50 individuals per group. Human variation data from the high coverage 1000 Genomes Project sample collection were obtained from [35]. The analysis was limited to passing biallelic variants in the autosomes with no missing data. The same five continental populations used by Biddanda et al. [32] were assessed as follows: AFR (Africa); EUR (Europe); EAS (East Asia); SAS (South Asia); and AMR (admixed Americas). To match the dog sample sizes, 50 individuals were randomly selected from each category. To assess the impact of a variable number of categories, human analysis was also performed with the following three categories: Africa, Asia (combined EAS and SAS), and European populations.

3. Results

3.1. Average Number of Differences between Canines

We analyzed patterns of allele sharing among canines based on the genome-wide SNP variation map obtained by the Dog10K consortium [21]. Genetic variants were identified based on the alignment of Illumina sequencing reads to the canFam4/UU_Cfam_GSD_1.0 reference assembly derived from a German Shepherd dog [22]. Following filtering, the analyzed variant data set included 26,585,484 autosomal, biallelic SNPs that were variable among 1929 samples (79% of the original ‘PASS’ SNPs in the data), corresponding to a density of ~120 SNPs/10 kb (Figure S1).
The 1000 Genomes Project data report variation from 2504 humans [35], with a total of 98,188,417 PASS SNPs, or about 357 SNPs/10 kb. After performing the same filtering described above, this rate was reduced to about 312 SNPs/10 kb (Figure S2), which retains 87% of the total PASS SNPs. The analyzed human SNP density is therefore about 2.6-fold higher than the analyzed canine SNP density.
To assess the genetic differences among samples, we determined the number of SNP differences between pairs of samples across categories (Figure 1). As expected, wolves show the greatest amount of genetic variation; the 57 wolf samples differ from each other at a mean of 2,346,972 SNPs. This is followed by village dogs, where the 281 samples show a mean difference of 1,806,946 SNPs. We selected one sample from each of the 321 breeds analyzed, finding a mean difference of 1,702,814 SNPs between breeds. A subset of pairwise breed comparisons shows increased SNP distances of around 2 million. This included comparisons of Japanese breeds with wolfdogs, notably the Czechoslovakian Wolfdog and Saarloos Wolfdog. The breed with the largest mean SNP distance to other breeds is the Shikoku (Table S2). Other breeds that have a high average SNP distance to other breeds include the Norwegian Lundehund, consistent with the extreme population bottleneck this breed experienced [21,36,37].
As a comparison, we performed the same analysis comparing samples within a breed. We selected two breeds for comparison: Basset Hounds and Bernese Mountain Dogs. We found a mean of 1,137,228 SNP differences among seven Basset Hounds and 992,700 SNP differences among ten Bernese Mountain Dogs. For comparison with a geographic cluster of village dogs, we analyzed the 15 Congo village dogs and found a mean SNP difference of 1,560,901 between pairs of samples, consistent with the greater amount of genetic diversity found in village dogs [13,15].

3.2. Sharing of Alleles among Breed Dogs, Wolves, and Village Dogs

We constructed GeoVar plots to assess patterns of allele sharing among breed dogs, wolves, and village dogs. These plots offer a visual description of allele sharing as a function of allele frequency in a group and can be considered as a discretized representation of the multi-dimensional site frequency spectrum. Variants are classified as being common (maf ≥ 5%), rare (maf < 5%), or unobserved in each population, and are plotted based on combinations of categories across sample sets. Since only 50 samples from each category were selected, the minimum possible allele frequency is 1%.
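The 1% floor follows directly from the number of chromosomes sampled per group. As a worked illustration (the symbols n, f_min, and c are introduced here for convenience and assume the frequency is computed within each group of 50 diploid samples):

```latex
% n = 50 diploid individuals per group, so 2n = 100 sampled chromosomes.
f_{\min} = \frac{1}{2n} = \frac{1}{2 \times 50} = 0.01
% With c copies of the minor allele observed among the 100 chromosomes:
%   unobserved: c = 0
%   rare:       1 \le c \le 4   (frequency < 5%)
%   common:     c \ge 5         (frequency \ge 5%)
```

Under these assumptions, an allele needs to be seen on only five chromosomes in a group to be classified as common there.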
To correct for the uneven size of the groups, we randomly selected 50 breed dogs, wolves, and village dogs (Figure 2). The most common GeoVar classification is for alleles to be common in all three groups: 22% of sites have a minor allele frequency ≥ 5% in all three categories. This is closely followed by sites that are rare in wolves and unobserved in breed dogs and village dogs (21% of sites). An additional ~10% of sites are common in wolves and unobserved in breed dogs and village dogs; thus, ~31% of sites are only variable in wolves. Approximately 10% of sites are rare in village dogs and unobserved in the other samples, while ~5% of sites are rare in breed dogs and unobserved elsewhere. To assess the effect of random sample selection, we repeated this process with five different random sample selections (Figure 2). In general, the effect of using a different random sample on the frequency of the categories is small, varying the size of the categories by about ±1% of the total SNP set, although in certain situations, it does affect the ordering of the categories.
Since the effect of analyzing 50 samples per category, rather than the hundreds of samples per category used in the human analysis [32], is unclear, we repeated the GeoVar analysis of the 1000 Genomes Project samples using a random selection of 50 individuals per group. The analysis in Biddanda et al. found that most variants in humans are rare in a single population and unobserved elsewhere. In our human analyses with a reduced sample size, the most frequent category is variants that are rare in samples from Africa and unobserved elsewhere (Figure S3). Unsurprisingly, with smaller sample sizes, the proportion of variants that are globally common increases, accounting for ~11% of sites. Since the Dog10K analysis used only three groups instead of the five used for humans, we repeated the analysis by classifying human samples into three groups (Figure S4). This increases the percentage of globally common alleles to ~15%, still notably lower than the ~22% found in canines.

3.3. Allele Sharing within Village Dogs and Breed Clades

Next, we assessed allele sharing within sample groups. First, we analyzed village dogs. Based on the sample location and principal component analysis, we assigned 281 village dog samples into the following three geographic groups (Figure S5, Table S1): Africa (50 samples from Kenya, Liberia, and Congo); Central Asia (40 samples from Azerbaijan, Bulgaria, Iran, Tajikistan, and Uzbekistan); and East Asia (139 samples from China, Cambodia, Myanmar, and Nepal). We performed GeoVar analysis across these three groups based on a random sampling of 35 individuals per group (Figure 3). The most common pattern was variants that were common in all three groupings, accounting for ~35% of SNPs. This was followed by variants that were rare in one population but absent in the other two, with the highest proportion of such variants present in the village dogs from East Asia.
To explore sharing among dog breeds, we analyzed 17 breed clades as defined by the Dog10K project (Table 1) [21]. We selected 30 samples from each clade for analysis and found that variants that are common in all 17 clades were the most common category, accounting for ~16% of sites (Figure 4). This was followed by variants that were rare in a single clade and absent elsewhere, with the Asian, German Shepherd, and Flockguard Sighthound clades possessing the most rare variants.

3.4. SNP Microarrays Are Skewed toward Globally Common Sites

To assess the effect of ascertainment of sites present on genotyping arrays, we repeated the GeoVar analysis using only the sites present on the Illumina Canine HD SNP array. After filtering, this resulted in 150,299 sites. As expected, the proportion of sites that are common increased dramatically, with ~65% of all variants being globally common among breed dogs, village dogs, and wolves (Figure 5). Variants that are rare in one population and common in the others are also more abundant than those found in the sequencing data.

3.5. Common Variants Account for Most of the Differences between Individual Dogs and Wolves

To assess the frequency spectrum of sites that differ between two individuals, we repeated GeoVar analysis based on sites that differ between two samples. First, we identified one breed dog (a Golden Retriever), one wolf (from Russia), and one village dog (from Nepal) that were not in the randomly selected set used for the analysis (which is the same as that in the bottom right plot in Figure 3). We then performed GeoVar analysis based on the SNP differences found between each pair of these three samples (Figure 6). Sites that are common in all three groups accounted for ~56–73% of the differences between samples.
We repeated GeoVar analysis based on sites that differed between two samples belonging to the same group, as follows: two breed dogs (a Hellenic Hound and a Swedish White Elkhound); two village dogs (from China and French Polynesia); and two wolves (from Russia and China) (Figure 7). We found that ~49% of the sites that differ between two wolves are common across the breed dog, village dog, and wolf categories, while ~71% of the sites that differ between two breed dogs, or two village dogs, are globally common. A similar pattern holds when considering the geographic distribution of variants found between two village dogs, with 71–77% of differences common in each of the East Asia, Central Asia, and Africa groups (Figure 8). Finally, we assessed the distribution of variation across breed clades, finding that ~49% of the differences between two breed dogs are common across all 17 clades (Figure 9). An additional ~4% of variants are rare in the German Shepherd clade but common in all others.

4. Discussion

The availability of genome sequencing data from diverse samples allows for an unprecedented view of genetic variation in a species, including variation that is rare or restricted to particular subpopulations. However, the scale of the data sets that are now produced presents barriers for the efficient analysis and visualization of the resulting data, necessitating the use of multiple complementary approaches. In this study, we analyzed patterns of allele sharing among wolves, village dogs, and breed dogs analyzed by the Dog10K consortium [21].
As an initial summary of the data, we determined the average SNP distances between samples. As expected, wolves are the most genetically diverse group, with an average of ~2.3 million SNP differences between samples, compared to ~1.8 million SNPs between village dogs and 1.7 million SNPs between dogs of different breeds. Dogs from the same breed show a lower, but still substantial, amount of diversity with ~1 million SNPs, although this differs by breed. We note that these distances were calculated based on SNPs that were identified by Illumina short-read sequencing following a standardized filtering approach [21] and were also reduced by our removal of variants with any missing data across all samples. The direct comparison of high-quality long-read assemblies identifies additional SNP differences [28], as well as other types of variation, such as small indels and mobile element insertions, that make important contributions to canine genetic diversity [27].
To assess the sharing of SNPs across sample groups, we used GeoVar, a visualization method developed for the analysis of human genetic variation [32]. The canine samples we analyzed include fewer individuals per group as well as fewer groupings relative to collections of human genetic variation, resulting in an increased proportion of common alleles. A difference remains, however, when the human data are analyzed in the same way: using a sample size of 50 individuals per group, we found that 22% of the variants are common across wolves, village dogs, and breed dogs, whereas ~15% of alleles are common across three continental groupings of human samples of the same size. This suggests that globally common alleles are approximately 50% more abundant across the three groupings of canines than found in humans. This likely reflects the evolutionary history of canines: dogs were domesticated from a now-extinct wolf population somewhere in Asia [3] and ancestral wolves had a large effective population size [38]. In contrast, the pattern of allele sharing in humans reflects groups that have recently diverged from ancestors with a smaller effective population size [32].
We found that ~16% of sites are common across 17 breed clades, followed by variants that are rare in the Asian, German Shepherd, or Flockguard Sighthound clades and unobserved elsewhere. The high proportion of variants that are unique to the Asian clade (~6%) likely reflects the deep ancestral separation found between dogs across Eurasia [4]. The Dog10K data were generated by aligning reads to the canFam4/UU_Cfam_GSD_1.0 assembly, which was derived from a German Shepherd Dog. Although one may speculate about the effects of reference bias on variation patterns found among German Shepherd-like dogs, we note that similar breed relationship patterns are observed when Illumina sequencing data are analyzed relative to a reference assembly from a Greenland wolf outgroup [39]. Apart from German Shepherds, the top three clades for rare variation (Asian, Flockguard Sighthound, and Spitz) are all located within the same superclade in the Dog10K analysis, though this also contains the Hungary clade [21]. This suggests that rare variation may be enriched in specific clades.
We categorized the village dogs into three categories based on PCA and geography. The village dogs showed remarkable similarity between the categories, with ~35% of alleles being globally common. Village dogs from East Asia tend to have more genetic diversity than village dogs from Central Asia or Africa. However, the differences between the groups are minor.
As expected, most of the differences between any two samples represented variants common across all categories. We found that nearly half the sites that differed between two wolves were common across the wolf, village dog, and breed dog categories and that almost half of the differences between two dogs of different breeds were due to sites that are common across all 17 breed clades. Among the village dog categories, nearly three-quarters of the differences are globally common across all three categories.
These results give a more complete picture of how genetic variation is distributed across breeds and among breed dogs, wolves, and village dogs. The GeoVar plots provide a simple visualization that complements other techniques such as SNP distances and PCA. Importantly, GeoVar plots offer a glimpse into the sharing patterns of rare variants that have only been accessible since the advent of large-scale whole genome sequencing studies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes15091168/s1 Supplementary Table S1. Village Dog Population Assignments, Supplementary Table S2. Breed Dog SNP Distances, Figure S1. Density of SNPs included in analysis, Figure S2. Density of SNPs from the human 1000 Genomes Project, Figure S3. Human GeoVar analysis with reduced sample sizes, Figure S4. Human GeoVar analysis with three groups, and Figure S5. Principal component analysis of village dogs.

Author Contributions

Conceptualization, M.W.F. and J.M.K.; methodology, M.W.F.; formal analysis, M.W.F.; writing, M.W.F. and J.M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part through computational resources and services provided by Advanced Research Computing at the University of Michigan, Ann Arbor.

Data Availability Statement

SNP data from the Dog10K consortium are available in the Zenodo archive at https://zenodo.org/record/8084059. Data from the human 1000 Genomes Project high-coverage sequencing are available at https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20190425_NYGC_GATK/ (accessed on 28 May 2024).

Acknowledgments

We thank the members of the Dog10K Consortium for making canine variation data publicly available. We thank Mathew Blacksmith, Emily Koch, Anthony Nguyen, and Peter Schall for helpful feedback on manuscript drafts.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Skoglund, P.; Ersmark, E.; Palkopoulou, E.; Dalen, L. Ancient wolf genome reveals an early divergence of domestic dog ancestors and admixture into high-latitude breeds. Curr. Biol. 2015, 25, 1515–1519.
  2. Botigue, L.R.; Song, S.; Scheu, A.; Gopalan, S.; Pendleton, A.L.; Oetjens, M.; Taravella, A.M.; Seregely, T.; Zeeb-Lanz, A.; Arbogast, R.M.; et al. Ancient European dog genomes reveal continuity since the Early Neolithic. Nat. Commun. 2017, 8, 16082.
  3. Bergstrom, A.; Frantz, L.; Schmidt, R.; Ersmark, E.; Lebrasseur, O.; Girdland-Flink, L.; Lin, A.T.; Stora, J.; Sjogren, K.G.; Anthony, D.; et al. Origins and genetic legacy of prehistoric dogs. Science 2020, 370, 557–564.
  4. Frantz, L.A.; Mullin, V.E.; Pionnier-Capitan, M.; Lebrasseur, O.; Ollivier, M.; Perri, A.; Linderholm, A.; Mattiangeli, V.; Teasdale, M.D.; Dimopoulos, E.A.; et al. Genomic and archaeological evidence suggest a dual origin of domestic dogs. Science 2016, 352, 1228–1231.
  5. Freedman, A.H.; Gronau, I.; Schweizer, R.M.; Ortega-Del Vecchyo, D.; Han, E.; Silva, P.M.; Galaverni, M.; Fan, Z.; Marx, P.; Lorente-Galdos, B.; et al. Genome sequencing highlights the dynamic early history of dogs. PLoS Genet. 2014, 10, e1004016.
  6. Marsden, C.D.; Ortega-Del Vecchyo, D.; O'Brien, D.P.; Taylor, J.F.; Ramirez, O.; Vila, C.; Marques-Bonet, T.; Schnabel, R.D.; Wayne, R.K.; Lohmueller, K.E. Bottlenecks and selective sweeps during domestication have increased deleterious genetic variation in dogs. Proc. Natl. Acad. Sci. USA 2016, 113, 152–157.
  7. Parker, H.G.; Shearin, A.L.; Ostrander, E.A. Man’s best friend becomes biology’s best in show: Genome analyses in the domestic dog. Annu. Rev. Genet. 2010, 44, 309–336.
  8. Shearin, A.L.; Ostrander, E.A. Leading the way: Canine models of genomics and disease. Dis. Model. Mech. 2010, 3, 27–34.
  9. Axelsson, E.; Ljungvall, I.; Bhoumik, P.; Conn, L.B.; Muren, E.; Ohlsson, A.; Olsen, L.H.; Engdahl, K.; Hagman, R.; Hanson, J.; et al. The genetic consequences of dog breed formation-Accumulation of deleterious genetic variation and fixation of mutations associated with myxomatous mitral valve disease in cavalier King Charles spaniels. PLoS Genet. 2021, 17, e1009726.
  10. Mooney, J.A.; Yohannes, A.; Lohmueller, K.E. The impact of identity by descent on fitness and disease in dogs. Proc. Natl. Acad. Sci. USA 2021, 118, e2019116118.
  11. Bannasch, D.; Famula, T.; Donner, J.; Anderson, H.; Honkanen, L.; Batcher, K.; Safra, N.; Thomasy, S.; Rebhun, R. The effect of inbreeding, body size and morphology on health in dog breeds. Canine Med. Genet. 2021, 8, 12.
  12. Yordy, J.; Kraus, C.; Hayward, J.J.; White, M.E.; Shannon, L.M.; Creevy, K.E.; Promislow, D.E.L.; Boyko, A.R. Body size, inbreeding, and lifespan in domestic dogs. Conserv. Genet. 2020, 21, 137–148.
  13. Shannon, L.M.; Boyko, R.H.; Castelhano, M.; Corey, E.; Hayward, J.J.; McLean, C.; White, M.E.; Abi Said, M.; Anita, B.A.; Bondjengo, N.I.; et al. Genetic structure in village dogs reveals a Central Asian domestication origin. Proc. Natl. Acad. Sci. USA 2015, 112, 13639–13644.
  14. Boyko, A.R. The domestic dog: Man’s best friend in the genomic era. Genome Biol. 2011, 12, 216.
  15. Boyko, A.R.; Boyko, R.H.; Boyko, C.M.; Parker, H.G.; Castelhano, M.; Corey, L.; Degenhardt, J.D.; Auton, A.; Hedimbi, M.; Kityo, R.; et al. Complex population structure in African village dogs and its implications for inferring dog domestication history. Proc. Natl. Acad. Sci. USA 2009, 106, 13903–13908.
  16. Lindblad-Toh, K.; Wade, C.M.; Mikkelsen, T.S.; Karlsson, E.K.; Jaffe, D.B.; Kamal, M.; Clamp, M.; Chang, J.L.; Kulbokas, E.J., 3rd; Zody, M.C.; et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 2005, 438, 803–819.
  17. Hoeppner, M.P.; Lundquist, A.; Pirun, M.; Meadows, J.R.; Zamani, N.; Johnson, J.; Sundstrom, G.; Cook, A.; FitzGerald, M.G.; Swofford, R.; et al. An improved canine genome and a comprehensive catalogue of coding genes and non-coding transcripts. PLoS ONE 2014, 9, e91172.
  18. Hayward, J.J.; Castelhano, M.G.; Oliveira, K.C.; Corey, E.; Balkman, C.; Baxter, T.L.; Casal, M.L.; Center, S.A.; Fang, M.; Garrison, S.J.; et al. Complex disease and phenotype mapping in the domestic dog. Nat. Commun. 2016, 7, 10460.
  19. Plassais, J.; Kim, J.; Davis, B.W.; Karyadi, D.M.; Hogan, A.N.; Harris, A.C.; Decker, B.; Parker, H.G.; Ostrander, E.A. Whole genome sequencing of canids reveals genomic regions under selection and variants influencing morphology. Nat. Commun. 2019, 10, 1489.
  20. Jagannathan, V.; Drogemuller, C.; Leeb, T.; Dog Biomedical Variant Database Consortium. A comprehensive biomedical variant catalogue based on whole genome sequences of 582 dogs and eight wolves. Anim. Genet. 2019, 50, 695–704.
  21. Meadows, J.R.S.; Kidd, J.M.; Wang, G.D.; Parker, H.G.; Schall, P.Z.; Bianchi, M.; Christmas, M.J.; Bougiouri, K.; Buckley, R.M.; Hitte, C.; et al. Genome sequencing of 2000 canids by the Dog10K consortium advances the understanding of demography, genome function and architecture. Genome Biol. 2023, 24, 187.
  22. Wang, C.; Wallerman, O.; Arendt, M.L.; Sundstrom, E.; Karlsson, A.; Nordin, J.; Makelainen, S.; Pielberg, G.R.; Hanson, J.; Ohlsson, A.; et al. A novel canine reference genome resolves genomic architecture and uncovers transcript complexity. Commun. Biol. 2021, 4, 185. [Google Scholar] [CrossRef]
  23. Jagannathan, V.; Hitte, C.; Kidd, J.M.; Masterson, P.; Murphy, T.D.; Emery, S.; Davis, B.; Buckley, R.M.; Liu, Y.H.; Zhang, X.Q.; et al. Dog10K_Boxer_Tasha_1.0: A Long-Read Assembly of the Dog Reference Genome. Genes 2021, 12, 847. [Google Scholar] [CrossRef] [PubMed]
  24. Edwards, R.J.; Field, M.A.; Ferguson, J.M.; Dudchenko, O.; Keilwagen, J.; Rosen, B.D.; Johnson, G.S.; Rice, E.S.; Hillier, D.; Hammond, J.M.; et al. Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome. BMC Genom. 2021, 22, 188. [Google Scholar] [CrossRef]
  25. Player, R.A.; Forsyth, E.R.; Verratti, K.J.; Mohr, D.W.; Scott, A.F.; Bradburne, C.E. A novel canis lupus familiaris reference genome improves variant resolution for use in breed-specific GWAS. Life Sci. Alliance 2021, 4. [Google Scholar] [CrossRef]
  26. Sinding, M.S.; Gopalakrishnan, S.; Raundrup, K.; Dalen, L.; Threlfall, J.; Darwin Tree of Life Barcoding Collective; Wellcome Sanger Institute Tree of Life Programme; Wellcome Sanger Institute Scientific Operations: DNA Pipelines Collective; Tree of Life Core Informatics Collective; Darwin Tree of Life Consortium; et al. The genome sequence of the grey wolf, Canis lupus Linnaeus 1758. Wellcome Open Res. 2021, 6, 310. [Google Scholar] [CrossRef] [PubMed]
  27. Halo, J.V.; Pendleton, A.L.; Shen, F.; Doucet, A.J.; Derrien, T.; Hitte, C.; Kirby, L.E.; Myers, B.; Sliwerska, E.; Emery, S.; et al. Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes. Proc. Natl. Acad. Sci. USA 2021, 118, e2016274118. [Google Scholar] [CrossRef]
  28. Schall, P.Z.; Winkler, P.A.; Petersen-Jones, S.M.; Yuzbasiyan-Gurkan, V.; Kidd, J.M. Genome-wide methylation patterns from canine nanopore assemblies. G3 Genes Genomes Genet. 2023, 13, jkad203. [Google Scholar] [CrossRef]
  29. Bredemeyer, K.R.; vonHoldt, B.M.; Foley, N.M.; Childers, I.R.; Brzeski, K.E.; Murphy, W.J. The value of hybrid genomes: Building two highly contiguous reference genome assemblies to advance Canis genomic studies. J. Hered. 2024, 115, 480–486. [Google Scholar] [CrossRef]
  30. Clark, A.G.; Hubisz, M.J.; Bustamante, C.D.; Williamson, S.H.; Nielsen, R. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res. 2005, 15, 1496–1502. [Google Scholar] [CrossRef]
  31. Lachance, J.; Tishkoff, S.A. SNP ascertainment bias in population genetic analyses: Why it is important, and how to correct it. Bioessays 2013, 35, 780–786. [Google Scholar] [CrossRef] [PubMed]
  32. Biddanda, A.; Rice, D.P.; Novembre, J. A variant-centric perspective on geographic patterns of human allele frequency variation. Elife 2020, 9, e60107. [Google Scholar] [CrossRef] [PubMed]
  33. Gronau, I.; Hubisz, M.J.; Gulko, B.; Danko, C.G.; Siepel, A. Bayesian inference of ancient human demography from individual genome sequences. Nat. Genet. 2011, 43, 1031–1034. [Google Scholar] [CrossRef]
  34. Chang, C.C.; Chow, C.C.; Tellier, L.C.; Vattikuti, S.; Purcell, S.M.; Lee, J.J. Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 2015, 4, 7. [Google Scholar] [CrossRef] [PubMed]
  35. Byrska-Bishop, M.; Evani, U.S.; Zhao, X.; Basile, A.O.; Abel, H.J.; Regier, A.A.; Corvelo, A.; Clarke, W.E.; Musunuri, R.; Nagulapalli, K.; et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 2022, 185, 3426–3440.e19. [Google Scholar] [CrossRef]
  36. Pfahler, S.; Distl, O. Effective population size, extended linkage disequilibrium and signatures of selection in the rare dog breed lundehund. PLoS ONE 2015, 10, e0122680. [Google Scholar] [CrossRef] [PubMed]
  37. Kettunen, A.; Daverdin, M.; Helfjord, T.; Berg, P. Cross-Breeding Is Inevitable to Conserve the Highly Inbred Population of Puffin Hunter: The Norwegian Lundehund. PLoS ONE 2017, 12, e0170039. [Google Scholar] [CrossRef]
  38. Fan, Z.; Silva, P.; Gronau, I.; Wang, S.; Armero, A.S.; Schweizer, R.M.; Ramirez, O.; Pollinger, J.; Galaverni, M.; Ortega Del-Vecchyo, D.; et al. Worldwide patterns of genomic variation and admixture in gray wolves. Genome Res. 2016, 26, 163–173. [Google Scholar] [CrossRef]
  39. Nguyen, A.K.; Schall, P.Z.; Kidd, J.M. A map of canine sequence variation relative to a Greenland wolf outgroup. Mamm. Genome 2024. [Google Scholar] [CrossRef]
Figure 1. SNP distances between samples. Histograms of pairwise SNP distances are shown for groups of wolves, breed dogs, village dogs, village dogs from Congo, Bernese Mountain dogs, and Basset hounds. The vertical scales differ between histograms because the number of pairs differs between groups. The mean pairwise distance within each group is given by ‘d’ at the top of each histogram.
Figure 2. Allele sharing among wolves, village dogs, and breed dogs. The GeoVar plots show the proportion of variants with each pattern of allele sharing across the three major categories, shown from left to right as breed dogs, wolves, and village dogs. The rows are arranged so that the most common pattern is at the bottom, with decreasing frequency going toward the top of the figure. The five plots show five separate random samples of 50 canines each; the larger plot on the left was arbitrarily chosen for easier viewing. Alleles with a minor allele frequency below 5% are classified as rare and all others as common. The total number of SNPs analyzed for each sample set is given at the top of each plot.
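The tabulation behind a plot such as Figure 2 can be illustrated with a minimal Python sketch. It is not the GeoVar implementation. It assumes each variant is described by its minor allele frequency in breed dogs, wolves, and village dogs, and it uses the 5% rare/common boundary from the caption. The extra "unobserved" code (U) for a frequency of zero, the function names, and the toy frequencies are assumptions made for illustration.

```python
from collections import Counter

# Boundary taken from the caption: a minor allele frequency below 5% is "rare".
RARE_MAF = 0.05


def frequency_code(maf: float) -> str:
    """Map a minor allele frequency to U (unobserved), R (rare), or C (common).

    The U code for a frequency of exactly zero is an assumption of this sketch.
    """
    if maf == 0.0:
        return "U"
    return "R" if maf < RARE_MAF else "C"


def sharing_pattern(mafs) -> str:
    """Join the per-group codes in a fixed group order into one pattern string."""
    return "".join(frequency_code(m) for m in mafs)


def pattern_proportions(variants):
    """Return (pattern, proportion) pairs, most frequent pattern first."""
    counts = Counter(sharing_pattern(m) for m in variants)
    total = sum(counts.values())
    return [(pattern, n / total) for pattern, n in counts.most_common()]


# Hypothetical frequencies per variant, ordered (breed dogs, wolves, village dogs).
toy_variants = [
    (0.20, 0.10, 0.30),  # common in all three groups
    (0.00, 0.02, 0.00),  # rare in wolves, unobserved elsewhere
    (0.25, 0.40, 0.06),  # common in all three groups
    (0.01, 0.00, 0.08),  # rare in breed dogs, common in village dogs
]

proportions = pattern_proportions(toy_variants)
for pattern, share in proportions:
    print(pattern, share)
```

Sorting the patterns by proportion mirrors the row ordering described in the caption, where the most frequent pattern is drawn at the bottom of the plot.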
Figure 3. Allele sharing among village dog groups. Five GeoVar plots showing the allele sharing between village dogs based on the geographic groupings, with 35 random samples in each category. Only SNPs showing polymorphism in village dogs are depicted.
Figure 4. Allele sharing among breed clades. Five GeoVar plots showing allele sharing between breed clades. Only SNPs showing polymorphism in breed dogs are depicted. Thirty samples from each population were collected. Abbreviations: AmT = American Terriers, BeH = Belgian Herders, CoT = Continental Terriers, EnT = English Terriers, FlS = Flockguard Sighthound, GeS = German Shepherd, ScoT = Scottish Terriers. The order of the clades is the same in each plot.
Figure 5. Allele sharing among wolves, village dogs, and breed dogs based on sites present on genotyping arrays. A GeoVar plot using only SNPs found on the Canine HD Illumina array. The five random samples of breeds are the same as in Figure 2.
Figure 6. Distribution of alleles that differ between individuals from different categories. Each panel shows the SNPs that differ between two individuals drawn from different categories: a breed dog and a wolf (left); a breed dog and a village dog (Congolese) (middle); and a village dog and a wolf (right). The 50 individuals used in each category are the same as in the bottom right subplot in Figure 4. The individuals that are compared to find differences are not in this random sample. Breed dog = GOLD000007 (Golden Retriever), village dog = VILLNP000001 (Nepal), wolf = CLUPRU000001 (Russia).
Figure 7. Distribution of alleles that differ between individuals from the same category. The same analysis as in Figure 6, except using two samples from the same category. Breed dogs = CRTR000009 (Hellenic Hound) and SWWE000006 (Swedish White Elkhound), wolves = CLUPRU000001 (Russia) and CLUPCN000001 (China), village dogs = VILLCN000091 (China) and VILLPF000004 (French Polynesia).
Figure 8. Distribution of alleles that differ between village dogs. Comparisons of SNPs that are different between two village dogs. The dogs used are not found in the random samples of 35 each. Samples used for the comparison: East Asia = VILLCN000099 (China), Central Asia = VILLIR000022 (Iran), Africa = VILLCG000006 (Congo).
Figure 9. Distribution of alleles that differ between breed dogs. An analysis based on variants that differ between dogs from different breed clades is shown. Comparisons are given for a Scenthound compared to three clades of breed dogs: the Asian clade, Flockguard Sighthound, and Spitz. Abbreviations: AmT = American Terrier, BeH = Belgian Herder, CoT = Continental Terrier, EnT = English Terrier, FlS = Flockguard Sighthound, GeS = German Shepherd, ScoT = Scottish Terrier. Samples used: Scenthound = BMSH0000002, Flockguard Sighthound = CIRN000003, Asian = TIBT000003, Spitz = GSVM000005.
Table 1. Breed clades used for GeoVar analysis.
Clade                    Number of Samples    Clade               Number of Samples
Alpine                   67                   Mastiff             97
American Terriers        31                   Pointer             138
Asian                    135                  Retriever           36
Belgian Herders          43                   Scenthound          222
Continental Herders      38                   Scottish Terrier    40
English Terriers         53                   Spaniel             62
Flockguard Sighthound    145                  Spitz               152
German Shepherd          32                   UK Herding          46
Hungary                  43