*2.3. Microsatellite Genotyping*

The samples were genotyped for 11 highly polymorphic microsatellite loci (BM1818, BM2113, BM1824, ETH10, ETH225, INRA023, SPS115, TGLA53, TGLA122, TGLA126, and TGLA227) which were recommended by the International Society of Animal Genetics (ISAG) [35]. The multiplex PCRs were carried out in a final volume of 10 μL in a PCR buffer with 200 mM dNTPs, 1.0 mM MgCl2, a 0.5 mM primer mix, 1 unit of Taq polymerase (Dialat Ltd., Moscow, Russia), and 1 μL of genomic DNA. The initial denaturation at 95 ◦C for 4 min was followed by 35 cycles of PCR amplification (95 ◦C, 20 s; 63 ◦C, 30 s; 72 ◦C, 1 min). Additionally, the final extension was performed at 72 ◦C for 10 min. The negative controls (PCR reaction without DNA template) were included in each PCR experiment to check for possible DNA contaminations. PCR fragments were separated by capillary electrophoresis on an ABI3130xl genetic analyzer (Applied Biosystems, Beverly, MA, USA) using GeneScan™-350 ET ROX as a fragment standard. The fragment lengths were determined using the Gene Mapper software v4 (Applied Biosystems, Beverly, MA, USA). Allele sizes were standardized according to an ISAG International Bovine (Bos Taurus) STR typing comparison test (2018–2019).

A modified multiple-tube approach, proposed by Mondol et al. [37] and Modi et al. [38], was used to determine the consensus genotypes as previously described [29]. Briefly, each DNA sample was analyzed in five replicates. Initially, multiplex amplification of microsatellite loci was performed in duplicates and only the samples in which at least six loci were successfully amplified (positive multiplex PCRs) were selected for further analysis. For such samples, three additional independent PCR replicates were performed using the same DNA extractions. The genotyping results of the five PCR replicates were used to calculate the "quality indices" (QIs) for each sample/locus as described by Miquel et al. [39]. For the samples with QI values less than 0.75, three additional multiplex PCRs were carried out using the same DNA extraction. Genotypes with a QI value of 0.75 and higher at each locus

were considered reliable and selected for further analysis. The number of positive PCRs from the number of replicates for each locus/sample was defined as amplification failure. Those genotypes different from the most frequent genotype were defined as genotyping errors and included allelic dropout (ADO) or false alleles (FAs). The overall error rates were calculated as the ratio of genotyping errors from the total number of positive PCRs. For calculation of the ADO and FA rates, we used the protocol proposed by Broquet and Petit [40]. For the quality control of genotyping, the probability values of correct genotyping (*p*) for each locus were calculated according He et al. [41]. The threshold for *p*-values at each locus was set as <0.001.

#### *2.4. Data Analysis*

FreeNA [42] and MICRO-CHECKER 2.2.3 software [43] were used to check the presence of null alleles. We applied GenAIEx 6.5 software [44] to calculate the number and frequency of alleles. The packages diveRsity [45], adegenet [46], and ggplot2 [47] of the R software (http://cran.r-project.org, accessed on 12 May 2021) were used for calculating the main statistics, including observed heterozygosity (Ho), unbiased expected heterozygosity (uHe), unbiased inbreeding coefficient (u*F*IS), and rarefied number of alleles (AR) [48], performing principal component analysis (PCA), and visualizing breed relationships. Pairwise differences among studied populations for uHe and AR estimates were tested using Wilcoxon rank-sum test [49] using Statistica 10 software (www.statsoft.com, accessed on 18 July 2021). The software environment R3.5.0 was used to prepare the data files [50].

Pairwise Jost's D genetic distances [51] and paired *FST* genetic distances [52] were calculated to evaluate the degree of genetic differentiation of the studied breeds. The pairwise Jost's D and *FST* matrices were used to construct the phylogenetic trees using the Neighbor-Net algorithm in SplitsTree 4.14.5 [53].

STRUCTURE 2.3.4 software [54] was applied to investigate the genetic structure of studied populations. We used the admixture model with the option of correlated allele frequencies. The burn-in period was set to 10,000 iterations, followed by 100,000 Markov chain Monte Carlo (MCMC) repetitions for each run. We performed 10 independent runs for a number of possible clusters (*K*) ranging from 2 to 8. CLUMPAK software [55], available at http://clumpak.tau.ac.il (accessed on 7 May 2021), was used to analyze multiple independent runs at a single *K* value. For measuring the constancy over runs, we calculated average pairwise similarity scores using CLUMPAK [55].

To infer demographic history with admixture, we applied the TreeMix software [56] to the microsatellite data. We computed the mean and variance in length at each microsatellite locus and used them to run TreeMix v1.13. We tested up to five migration events in twenty iterations per migration edge. The optimal number of migration events was determined using R package "optM" [57] based on TreeMix output files. We added the optimal number of migrations to the phylogenetic model and estimated the consistency between migration edges using TreeMix for 20 independent runs.
